Copyright: This is an open access article distributed under the Creative Commons
Attribution License which permits unrestricted use, distribution, and reproduction in any
medium, provided the original work is properly cited.
Preprints.org (www.preprints.org) | NOT PEER-REVIEWED | Posted: 2 September 2024 doi:10.20944/preprints202409.0021.v1
Disclaimer/Publisher’s Note: The statements, opinions, and data contained in all publications are solely those of the individual author(s) and
contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting
from any ideas, methods, instructions, or products referred to in the content.
Article
Exploring Metaheuristic Optimized Machine Learning
for Software Defect Detection on Natural Language
and Classical Datasets
Aleksandar Petrovic 1,†,‡ , Luka Jovanovic 1,‡ , Nebojsa Bacanin 1, *,‡ , Milos Antonijevic 1,‡ ,
Nikola Savanovic 1,‡ , Miodrag Zivkovic 1,‡ , Marina Milovanovic 2,‡ and Vuk Gajic 1
1 Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
2 Union Nikola Tesla University, Cara Dusana 62-64, 11000 Belgrade, Serbia
* Correspondence: [email protected]
† Current address: Danijelova 32, 11000 Belgrade, Serbia
‡ These authors contributed equally to this work.
Abstract: Software is increasingly vital, with automated systems regulating critical functions. As development
demands grow, manual code review becomes more challenging, often making testing more time-consuming than
development. A promising approach to improving defect detection at the source code level is the use of artificial
intelligence combined with natural language processing (NLP). Source code analysis, leveraging machine-readable
instructions, is an effective method for enhancing defect detection and error prevention. This work explores source
code analysis through NLP and machine learning, comparing classical and emerging error detection methods. To
optimize classifier performance, metaheuristic optimizers are used, and a modified algorithm is introduced to
meet the study’s specific needs. The proposed two-tier framework uses a convolutional neural network (CNN)
in the first layer to handle large feature spaces, with AdaBoost and XGBoost classifiers in the second layer to
improve error identification. Additional experiments using Term Frequency-Inverse Document Frequency (TF-
IDF) encoding in the second layer demonstrate the framework’s versatility. Across five experiments with public
datasets, the CNN achieved an accuracy of 0.768799. The second layer, using AdaBoost and XGBoost, further improved these results to 0.772166 and 0.771044. Applying NLP techniques yielded exceptional accuracies of 0.979781 and 0.983893 with the optimized AdaBoost and XGBoost classifiers, respectively.
Keywords: natural language processing; software error detection; metaheuristic; optimization; XGBoost; Ad-
aBoost; convolutional neural networks; explainable artificial intelligence;
1. Introduction
The role of software has already reached a critical point where a widespread issue could have
serious consequences for society [1]. This emphasizes the need for a robust system for error handling.
The consequences of software defects can range from trivial to life-threatening, as the applications of
software range from entertainment to medical purposes. The Internet of Things (IoT) and artificial
intelligence (AI) massively contribute to this software dependency [2], as more devices are running
software and more devices are running more complex software. The responsibility of developers
increases as some use cases like autonomous vehicles and medicine require much more extensive
testing [3]. Even though a software defect can be a mere inconvenience in some cases, even those cases
would benefit from software defect prediction (SDP) [4]. The key contribution of SDP in the software
development is in the testing phase. The goal is to prioritize modules that are prone to errors. Such
insights into the state of the project can allow developers to discover errors sooner or even prepare for
them in certain environments.
The process of producing software is called the software development life cycle (SDLC), during which SDP should be applied to minimize the number of errors. Software is almost never perfect, and it has become common practice for developers to release unfinished projects and iterate on them through updates. With the use of code-writing conventions and other principles, errors can be minimized but never fully rooted out. Therefore, a robust system that assists developers in finding errors is required. With such a system, errors could be found earlier, which could prevent substantial financial losses. To measure code quality, defect density is calculated. The most common form of such measurement is the number of defects per thousand lines of code (KLOC).
Advancements in AI technologies, specifically machine learning (ML), show great potential for various NLP applications [5]. Considering that source code is also a language, this potential should be explored for predictions regarding programming languages. When natural and programming languages are compared, various similarities can be observed, but programming languages are stricter in their writing rules, which aids pattern recognition. The quality control process can be improved through AI, which can simplify error detection and ensure better test coverage. To detect errors in text, it is necessary to understand the techniques NLP applies to achieve this. Tokenization is one of the key concepts; its role is to segment text into smaller units for processing. Stemming reduces words to their basic forms, while lemmatization identifies the root of a word through dictionaries and morphological analysis. Different parts of sentences, such as nouns, verbs, and adjectives, are identified through parsing. The potential of NLP for the SDP problem is largely unexplored, and a literature gap is observed.
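The preprocessing steps named above can be sketched in plain Python. The regular-expression tokenizer and suffix-stripping stemmer below are illustrative simplifications, not the pipeline used in this work:

```python
import re

def tokenize(text):
    """Tokenization: segment text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(token):
    """Naive suffix-stripping stemmer: reduces words to a basic form.
    (A real stemmer, e.g. Porter's, uses many more rules.)"""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = tokenize("Testing detects errors before releases")
stems = [stem(t) for t in tokens]
print(stems)
```

Lemmatization, by contrast, would map each token to a dictionary form via morphological analysis rather than simple suffix rules.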
However, the use of these techniques does not come without a cost. Most sophisticated ML
algorithms have extensive parameters that directly affect their performance. The process of finding
the best subset of these parameters is called hyperparameter optimization. The perfect solution in
most cases cannot be achieved in a realistic amount of time, which is why the goal of the optimization
is to find a sub-optimal solution that is very close to the best solution. However, a model with
hyperparameters optimized for one use case does not yield the same performance for other use cases.
This is the problem described by the no free lunch (NFL) theorem [6], which states that no solution provides equally high performance across all use cases. This work builds upon preceding research [7] to provide a more in-depth comparison between optimizers, as well as to explore the potential of NLP to boost error detection in software source code. The main contributions of this work are summarized as follows:
• A proposal for a two-layer framework that combines CNN and ML classifiers for software defect detection.
• An introduction of an NLP-based approach in combination with ML classifiers for software defect detection.
• An introduction of a modified optimization algorithm that builds upon the admirable performance of the original PSO.
• The application of explainable AI techniques to the best-performing model in order to determine feature importance in model decisions.
The described research is presented in the following manner: Section 2 gives fundamentals of
the applied techniques, Section 3 describes the main method used in this research, Section 4 provides
the settings of the performed experiments along necessary information for experiment reproduction,
Section 5 follows with the results of experiments, and Section 6 provides the final thoughts on
performed experiments along possibilities for future research.
2. Related works
Software defects, otherwise known as bugs, are software errors that result in incorrect and
unexpected behavior. Various scenarios produce errors but most come from design flaws, unexpected
component interaction, and coding. Such errors affect performance, but in some cases, the security
of the system can be compromised as well. Through the use of statistical methods, historical data
analysis, and ML such cases can be predicted and reduced.
The use of ensemble learning for SDP was explored by Ali et al. [8]. The authors presented a
framework that trains random forest, support vector machine (SVM), and naive Bayes individually,
which are later combined into an ensemble technique through the soft voting method. The proposed
method obtained one of the highest results while maintaining solid stability. On the other hand, Khleel
et al. [9] explore the potential of a bidirectional long short-term model for the same problem. The
technique was tested with random and synthetic minority oversampling techniques. While high
performance was exhibited in both experiments, the authors conclude that the random oversampling
was better due to the class imbalances for the SDP problem. Lastly, Zhang et al. [10] propose a
framework based on a deep Q-learning network (DQN) with the goal of removing irrelevant features.
The authors performed thorough testing of several techniques with and without DQN, and the research
utilized a total of 22 SDP datasets.
The application of NLP techniques in ML is broad and Jim et al. [5] provide a detailed overview of
ML techniques that are suitable for such use cases. The use cases reported include fields from healthcare
to entertainment. The paper also provides depth into the nature of computational techniques used
in NLP. Different approaches include image-text, audio-visual, audio-image-text, labeling, document
level, sentence level, phrase level, and word level approaches to sentiment analysis. Furthermore,
the authors surveyed a list of datasets for this use case. Briciu et al. [11] propose a model based on
bidirectional encoder representations from transformers (BERT) NLP technique for SDP. The authors
compared RoBERTa and CodeBERT-MLM language models for the capture of semantics and context
while a neural network was used as a classifier. Finally, Dash et al. [12] performed an NLP-based
review for sustainable marketing. The authors performed text-mining through keywords and string
selection, while the semantic analysis was performed through the use of term frequency-inverse
document frequency.
The use of metaheuristics as optimizers has proven to yield substantial performance increases
when combined with various AI techniques [13]. Jain et al. [14] proposed a hybrid ensemble learning
technique optimized by an algorithm from the swarm intelligence subgroup of ML algorithms. Some
notable examples of metaheuristic optimizers include the well established variable neighborhood
search (VNS) [15], artificial bee colony (ABC) [16] and bat algorithm (BA) [17]. Some recent additions
also include COLSHADE [18] and the recently introduced sinh cosh optimizer (SCHO) [19]. Optimizers have shown promising outcomes when applied in several fields, including time-series forecasting [20], healthcare [21], and anomaly detection in medical time-series data [22]. Applying hybrid optimizers to parameter tuning has demonstrated decent outcomes in preceding works as well [23–25].
The SDP problem is considered to have nondeterministic polynomial time hardness (NP-hard), as the problem cannot be solved by manual search in a realistic amount of time. Hence, optimizers such as swarm intelligence algorithms are to be applied. However, the process is not simple, since a custom set of AI techniques has to be applied for every use case due to the NFL theorem [6]. Furthermore, the problem of hyperparameter optimization, which is required to get the most out of the performance of AI techniques, is considered NP-hard as well. The application of NLP techniques for SDP is limited in the literature, indicating a research gap that this work aims to bridge.
the language and its patterns. Training with large datasets can prepare BERT models for specialized tasks due to their transfer learning functionality. The application of BERT in software defect detection is not well explored. BERT has proven to be a reliable and efficient technique across many different NLP use cases, where there are many more factors to account for than with programming languages. The latter are stricter in their rules of writing and more prone to repeating patterns, which reduces the complexity of text analysis. A promising text mining technique with roots in statistical text analysis is term frequency-inverse document frequency (TF-IDF). As the name implies, it is composed of two components. Term frequency is determined as the ratio between the number of occurrences of a term in a document and the total number of terms in that document. Mathematically, it can be determined as in Eq. (1):
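The body of Eq. (1) appears to be missing from this excerpt; the standard TF, IDF, and TF-IDF definitions, which match the verbal description above, would be:

```latex
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}},
\qquad
\mathrm{idf}(t) = \ln \frac{N}{\lvert \{ d \in D : t \in d \} \rvert},
\qquad
\text{tf-idf}(t,d) = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)
```

where $f_{t,d}$ is the number of occurrences of term $t$ in document $d$, and $N$ is the number of documents in the corpus $D$.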
2.2. CNN
The CNNs are recognized in the deep learning field for their high performance and versatility
[28,29]. The multi-layered visual cortex of the animal brain served as the main inspiration for this
technique. The information is moved between the layers where the input of the next layer is the output
of the previous layer. The information gets filtered and processed in this manner. The complexity of
the data is decreased through each layer while the ability of the model to detect finer details increases.
The architecture of the CNN model consists of the convolutional, pooling, and fully connected layers.
Filters that are most commonly used are 3×3, 5×5, and 7×7.
To achieve the highest performance, CNNs require hyperparameter optimization. The parameters most commonly tuned, based on their impact on performance, are the number of kernels and the kernel size, the learning rate, the batch size, the number of each type of layer used, weight regularization, the activation function, and the dropout rate. This process is considered NP-hard; however, metaheuristic methods have proven to yield results when applied as optimizers for CNN hyperparameter tuning [14].
The convolution function produces the output described in Equation (4):

z_{i,j,k}^{[l]} = w_k^{[l]} x_{i,j}^{[l]} + b_k^{[l]},    (4)

where the output of the k-th feature at position i, j in layer l is given as z_{i,j,k}^{[l]}, the input at i, j is x, the filters are given as w, and the bias is shown as b.
The activation function follows the convolution operation, as shown in Equation (5):

g_{i,j,k}^{[l]} = g(z_{i,j,k}^{[l]})    (5)
After the activation function, the pooling layers process the input toward resolution reduction.
Different pooling functions can be used, and some of the most popular ones are average and max
pooling. This behavior is described in Equation (6).
y_{i,j,k}^{[l]} = pooling(g_{i,j,k}^{[l]})    (6)
Model training is guided by the cross-entropy loss, described in Equation (7):

H(p, q) = − ∑_x p(x) ln q(x)    (7)

where the discrete variable x has two distributions defined over it, p and q.
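A minimal NumPy sketch of Equations (4)-(6), assuming a single 3×3 filter, ReLU activation, and 2×2 max pooling (illustrative choices, not the exact architecture tuned in this work):

```python
import numpy as np

def conv2d_single(x, w, b):
    """Eq. (4): z[i,j] = (w * x)[i,j] + b for one filter, valid padding."""
    kh, kw = w.shape
    H, W = x.shape
    z = np.empty((H - kh + 1, W - kw + 1))
    for i in range(z.shape[0]):
        for j in range(z.shape[1]):
            z[i, j] = np.sum(w * x[i:i + kh, j:j + kw]) + b
    return z

def relu(z):
    """Eq. (5): element-wise activation g(z)."""
    return np.maximum(z, 0.0)

def max_pool(g, size=2):
    """Eq. (6): max pooling over non-overlapping size x size windows."""
    H, W = g.shape
    H2, W2 = H // size, W // size
    return g[:H2 * size, :W2 * size].reshape(H2, size, W2, size).max(axis=(1, 3))

x = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 input
w = np.zeros((3, 3)); w[1, 1] = 1.0            # identity-like 3x3 filter
z = conv2d_single(x, w, b=0.5)                 # shape (4, 4)
y = max_pool(relu(z))                          # shape (2, 2)
print(z.shape, y.shape)
```

Stacking such conv-activation-pooling stages is what progressively reduces resolution while increasing the abstraction of detected features.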
2.3. AdaBoost
During the previous decade, ML has constantly grown as a field. As a result, a large number of algorithms have been produced, whose contributions vary considerably depending on the area of application. Adaptive boosting (AdaBoost) aims to overcome this through the application of weaker algorithms as a group. The algorithm was developed by Freund and Schapire in 1995 [30]. Algorithms that are considered weak perform classification only slightly better than random guessing. The AdaBoost technique adds more weak classifiers in each iteration and balances each classifier's weight, which is derived from its accuracy. The weights of misclassified training samples are increased, while the weights of correctly classified samples are decreased, focusing subsequent learners on the harder cases.
The error of a weak classifier is calculated according to Equation (8):

ϵ_t = ∑_{i=1}^{N} w_{i,t} I(h_t(x_i) ≠ y_i)    (8)

where the error weight in the t-th iteration is given as ϵ_t, the number of training samples is given as N, and the weight of the i-th training sample during the t-th iteration is w_{i,t}. The h_t(x_i) represents the predicted label, and y_i shows the true label. The function I(·) returns 0 for false cases and 1 for true cases.
After the weights have been established, the weight modification process begins for new classifiers. To achieve accurate classification, large groups of classifiers should be used. The combination of sub-models and their results represents a linear model. The weights in the ensemble are calculated as per Equation (9):

α_t = (1/2) ln((1 − ϵ_t) / ϵ_t)    (9)
where α_t changes for each weak learner and represents its weight in the final model. The sample weights are updated according to Equation (10):

w_{i,t+1} = w_{i,t} exp(−α_t y_i h_t(x_i))    (10)

where y_i marks the true label of the i-th instance, h_t(x_i) represents the prediction of the weak learner for the i-th instance in the t-th round, and w_{i,t} denotes the weight of the i-th instance in the t-th round.
The advantages of AdaBoost are that it reduces bias by learning from previous iterations, while the ensemble technique reduces variance and prevents overfitting. Therefore, AdaBoost can provide robust prediction models. However, it is sensitive to noisy data and outliers.
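The update rules above can be condensed into a from-scratch sketch with decision stumps as weak learners. This is a didactic illustration (labels in {−1, +1}, toy data), not the implementation tuned in the experiments:

```python
import numpy as np

def stump_predict(X, feat, thresh, polarity):
    """Weak classifier: one-feature threshold rule predicting -1 or +1."""
    return np.where(polarity * X[:, feat] < polarity * thresh, -1, 1)

def fit_stump(X, y, w):
    """Select the stump minimizing the weighted error of Eq. (8)."""
    best, best_eps = None, np.inf
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for polarity in (-1, 1):
                pred = stump_predict(X, feat, thresh, polarity)
                eps = np.sum(w * (pred != y))      # Eq. (8): weighted error
                if eps < best_eps:
                    best, best_eps = (feat, thresh, polarity), eps
    return best, best_eps

def adaboost_fit(X, y, rounds=5):
    n = len(y)
    w = np.full(n, 1.0 / n)                        # uniform initial sample weights
    ensemble = []
    for _ in range(rounds):
        (feat, thresh, pol), eps = fit_stump(X, y, w)
        eps = np.clip(eps, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)      # Eq. (9): learner weight
        pred = stump_predict(X, feat, thresh, pol)
        w = w * np.exp(-alpha * y * pred)          # Eq. (10): re-weight samples
        w = w / w.sum()                            # normalize
        ensemble.append((alpha, feat, thresh, pol))
    return ensemble

def adaboost_predict(ensemble, X):
    """Sign of the alpha-weighted vote of all weak learners."""
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.sign(score)

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
model = adaboost_fit(X, y)
```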
2.4. XGBoost
The XGBoost method is recognized as a high-performing algorithm [31]. However, the highest performance is achieved only through hyperparameter tuning. The foundation of the algorithm is ensemble learning, which exploits many weaker models. Optimization with regularization alongside gradient boosting significantly boosts performance. The motivation behind the technique is to manage complex input-target relationships through previously observed patterns.
The objective function of XGBoost, which combines the loss function and the regularization term, is provided in Equation (11):

obj(Θ) = L(Θ) + Ω(Θ),    (11)

where Θ shows the hyperparameter set, L(Θ) provides the loss function, while Ω(Θ) shows the regularization term used for model complexity management.
Mean squared error (MSE) is used as the loss function, given in Equation (12):

L(Θ) = (1/N) ∑_{i=1}^{N} (y_i − ŷ_i)²    (12)

where ŷ_i provides the predicted value of the target for each instance i, and y_i provides the actual value. The process of differentiating actual and predicted values is given in Equation (13).
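Under the assumption of an L2 penalty for Ω (a common choice; the exact form is not specified in this excerpt), the regularized objective of Eq. (11) with the MSE loss of Eq. (12), and the loss gradient used for boosting, can be sketched as:

```python
import numpy as np

def objective(y_true, y_pred, theta, lam=1.0):
    """Eq. (11): obj = L(theta) + Omega(theta), with MSE loss (Eq. 12)
    and an assumed L2 regularization term lam * ||theta||^2."""
    loss = np.mean((y_true - y_pred) ** 2)   # Eq. (12): MSE
    omega = lam * np.sum(theta ** 2)         # assumed L2 penalty
    return loss + omega

def mse_gradient(y_true, y_pred):
    """Derivative of the MSE loss w.r.t. predictions, the quantity each
    new boosting round fits (differentiating actual vs. predicted values)."""
    return 2.0 * (y_pred - y_true) / len(y_true)

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.5, 0.5, 0.5])
theta = np.array([0.1, -0.2])
obj = objective(y_true, y_pred, theta)
```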
3. Methods
the solution that is found to be the best yet is marked as pbest, and the best in the group as gbest. Both values are used to calculate the next position of each particle. When applying the inertia weight approach, the velocity update can be modified as provided in Equation (18):

v_{id} = w · v_{id} + c_1 r_1 (p_{id} − x_{id}) + c_2 r_2 (p_{gd} − x_{id})    (18)

where v_{id} indicates particle velocity, x_{id} shows the current position, w gives the inertia factor, relative cognitive influence is provided by c_1, while the social component influence is given as c_2, and r_1 and r_2 are random numbers. The pbest and gbest are provided respectively through p_{id} and p_{gd}.
The inertia factor is modeled by Equation (19):

w = w_max − ((w_max − w_min) / T) · t    (19)

where the initial weight is provided as w_max, the final weight as w_min, the maximum number of iterations as T, and the current iteration is t.
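A compact NumPy sketch of inertia-weight PSO as described by Equations (18)-(19), minimizing a toy sphere function; population size, bounds, and coefficients here are illustrative, not the experimental settings:

```python
import numpy as np

rng = np.random.default_rng(42)

def pso_minimize(f, dim, n_particles=10, iters=50,
                 w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, lb=-5.0, ub=5.0):
    x = rng.uniform(lb, ub, (n_particles, dim))      # positions
    v = np.zeros((n_particles, dim))                 # velocities
    pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()             # gbest
    for t in range(iters):
        w = w_max - (w_max - w_min) / iters * t      # Eq. (19): decreasing inertia
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Eq. (18): inertia + cognitive (pbest) + social (gbest) terms
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lb, ub)
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()

best, val = pso_minimize(lambda p: np.sum(p ** 2), dim=3)
```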
Half of the population is initialized using the standard mechanism as per Eq. (20):

X_{i,j} = lb_j + ψ · (ub_j − lb_j)    (20)

where X_{i,j} denotes the j-th parameter assigned to individual i, lb_j and ub_j signify the parameter boundaries for j, and ψ is a factor used to introduce randomness, selected from a uniform [0, 1] distribution. The remaining 50% of the population is initialized using the quasi-reflection-based learning (QRL) [35] mechanism as per Eq. (21):
X_j^{qr} = rnd((lb_j + ub_j)/2, x_j),    (21)

in this equation, rnd is used to determine an arbitrary value from the [(lb_j + ub_j)/2, x_j] limits. By incorporating QRL into the initialization process, diversification is achieved during initialization, increasing the chances of locating more promising solutions.
Once these two populations are generated, they are kept separate during the optimization, taking a multi-population-inspired approach. Two mechanisms inspired by the GA are used to communicate between the populations: genetic crossover and genetic mutation. Crossover is applied to combine parameters between two agents, creating offspring agents. Mutation is used to introduce further diversification within existing agents by randomly tweaking parameter values within constraints. These two processes are described in Figure 1.
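The population mechanisms described above (standard initialization per Eq. (20), QRL per Eq. (21), and the GA-inspired crossover and mutation) can be sketched as follows; bounds, population size, and mutation rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
lb, ub = np.full(4, -5.0), np.full(4, 5.0)   # illustrative 4-parameter bounds

def standard_init(n):
    """Eq. (20): X[i,j] = lb[j] + psi * (ub[j] - lb[j]), psi ~ U[0, 1]."""
    return lb + rng.random((n, lb.size)) * (ub - lb)

def qrl_init(pop):
    """Eq. (21): quasi-reflected point drawn between (lb+ub)/2 and x[j]."""
    mid = (lb + ub) / 2.0
    lo, hi = np.minimum(mid, pop), np.maximum(mid, pop)
    return rng.uniform(lo, hi)

def crossover(a, b):
    """Uniform genetic crossover: offspring mixes parameters of two agents."""
    mask = rng.random(a.size) < 0.5
    return np.where(mask, a, b)

def mutate(x, rate=0.2):
    """Genetic mutation: randomly re-draw parameters within constraints."""
    mask = rng.random(x.size) < rate
    return np.where(mask, lb + rng.random(x.size) * (ub - lb), x)

half = standard_init(5)                       # first sub-population
reflected = qrl_init(half)                    # second, QRL sub-population
child = mutate(crossover(half[0], reflected[0]))
```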
4. Experimental Setup
In this work, two separate sets of simulations are carried out. The first set of simulations is conducted using a group of five datasets: KC1, JM1, CM1, KC2, and PC1. These datasets are part of NASA's PROMISE repository aimed at SDP. The instances in the datasets represent various software modules, and the features depict the quality of that code. McCabe [36] and Halstead [37] metrics were applied through 22 features. The McCabe approach was applied due to its methodology that emphasizes less complex code through the reduction of pathways, while Halstead uses counting techniques, the logic being that the larger the code, the more error-prone it is. A two-layer approach is applied to these simulations. The class balance in the utilized dataset is provided in Figure 3. The first utilized dataset is imbalanced, with 22.5% of samples representing software with errors and 77.5% without.
Figure 3. Class distribution in the first utilized dataset (22.50% defective, remainder defect-free).
1 https://paperswithcode.com/dataset/mbpp
Figure 4. NASA feature correlation heat-map (features include v(g), ev(g), iv(g), lOComment, lOBlank, locCodeAndComment, lOCode, total_Op, total_Opnd, uniq_Op, uniq_Opnd, and branchCount).
Both datasets are separated into training and testing portions for simulations. An initial 70% of the data is used for training and the remaining 30% for evaluation. Evaluations are carried out using a standard set of classification metrics, including accuracy, recall, F1-score, and precision. The Matthews correlation coefficient (MCC) is used as the objective function and can be determined as per Equation (22). An indicator function is also tracked: Cohen's kappa, which is determined as per Eq. (23). Metaheuristics are tasked with selecting optimal parameters, maximizing the MCC score.
MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))    (22)
here true positives (TP) denotes samples correctly classified as positive, true negatives (TN) denotes
instances correctly classified as negative. Similarly false positives (FP) and false negatives (FN) denote
samples incorrectly classified as positive and negative respectively.
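Both metrics can be computed directly from confusion-matrix counts. The kappa formula below is the standard two-class form, assumed here since the body of Eq. (23) is not reproduced in this excerpt:

```python
import math

def mcc(tp, tn, fp, fn):
    """Eq. (22): Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

def cohens_kappa(tp, tn, fp, fn):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), with observed agreement p_o
    and chance agreement p_e (standard form, assumed for Eq. (23))."""
    n = tp + tn + fp + fn
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

print(mcc(40, 45, 5, 10), cohens_kappa(40, 45, 5, 10))
```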
In the second optimization layer, intermediate outputs of the CNN are used. The final dense layer is recorded during classification of all the samples available in the dataset. This output is once again separated into 70% for training and 30% for testing. These are then utilized to train and optimize
AdaBoost and XGBoost models. Parameter ranges for AdaBoost and XGBoost are provided in Table 2
and Table 3 respectively.
Several optimizers are included in a comparative analysis against the proposed modified algorithm. These include the original PSO [32], as well as several other established optimizers such as the GA [34] and VNS [15]. Additional optimizers included in the comparison are the ABC [16], BA [17], and COLSHADE [18] optimizers. The recently proposed SCHO [19] optimizer is also explored. Each optimizer is implemented using the original parameter settings suggested in the works that introduced said algorithm. Optimizers are allocated a population size of 10 agents and allowed a total of 8 iterations to locate promising outcomes within the given search range. Simulations are carried out through 30 independent executions to facilitate further analysis.
For NLP simulations, only the second layer of the framework is utilized. Input text is encoded using TF-IDF encoding to a maximum of 1000 tokens. These encodings are used as inputs for model training and evaluation. Optimization is carried out using the second layer of the introduced framework, and a comparative analysis is conducted under conditions identical to the previous simulations to demonstrate the flexibility of the proposed approach.
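The TF-IDF encoding step can be sketched without external libraries; the vocabulary cap mirrors the 1000-token limit above, while the whitespace tokenizer and raw-count weighting are simplifying assumptions, not the exact variant used in the experiments:

```python
import math
from collections import Counter

def tfidf_encode(docs, max_tokens=1000):
    """Encode documents as TF-IDF vectors over the max_tokens most frequent terms."""
    tokenized = [d.lower().split() for d in docs]
    vocab_counts = Counter(t for doc in tokenized for t in doc)
    vocab = [t for t, _ in vocab_counts.most_common(max_tokens)]
    idx = {t: i for i, t in enumerate(vocab)}
    n = len(docs)
    # document frequency: in how many documents each term appears
    df = {t: sum(t in doc for doc in map(set, tokenized)) for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vec = [0.0] * len(vocab)
        for t, c in counts.items():
            if t in idx:
                tf = c / len(doc)                 # term frequency
                idf = math.log(n / df[t])         # inverse document frequency
                vec[idx[t]] = tf * idf
        vectors.append(vec)
    return vocab, vectors

docs = ["return value error", "missing return statement", "null pointer error"]
vocab, vecs = tfidf_encode(docs)
```

The resulting fixed-length vectors are what the second-layer classifiers consume in place of the CNN's dense-layer outputs.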
5. Simulation Outcomes
The following subsections present the outcomes of the conducted simulations. First, the outcomes of the simulations with the traditional datasets, conducted using the two-layer approach, are presented. Secondly, the simulations with NLP are presented, using only the second layer of the framework.
First-layer CNN optimization outcomes in terms of the indicator function are provided in Table 5. The introduced MPPSO demonstrates similarly favorable results, attaining the best outcome with a score of 0.231201. Mean and median scores of 0.233446 and 0.232697 are also the best among the evaluated optimizers. A high rate of stability in terms of the indicator metric is demonstrated by the original PSO; however, the performance of the modified version exceeds that of the original in all cases.
Visual comparisons in terms of optimizer stability are provided in Figure 5. While the stability of the introduced optimizer is not the highest among all evaluated optimizers, the algorithms with higher rates of stability attain less favorable outcomes. This suggests that VNS, BA, and COLSHADE, for instance, overly focus on local optima rather than looking for more favorable outcomes. This lack of exploration leads to higher stability but less favorable outcomes in this test case. The introduced optimizer also outperformed the base PSO while demonstrating higher stability, suggesting that the introduced modification has a positive influence on performance.
Figure 5. NASA L1 framework CNN objective violin plot and error box plot diagrams.
The introduced optimizer locates the best solution by iteration eight. Other optimizers stagnate prior to reaching favorable outcomes, suggesting that the boost in exploration incorporated into the MPPSO helps overcome the shortcomings inherent to the original PSO. Similar conclusions can be made in terms of the indicator function convergence demonstrated in Figure 7. As the indicator function was not the target of the optimization, there is a small decrease in the first iteration of the optimization. However, the best outcome of the comparison is once again located by the introduced optimizer, by iteration 3.
Figure 6. Layer 1 (CNN) objective function convergence.
Figure 7. Layer 1 (CNN) indicator function convergence.
Comparisons between the best-performing models optimized by each algorithm included in the comparative analysis are provided in Table 6. The introduced optimizer demonstrates a clear dominance
in terms of accuracy as well as precision. However, other optimizers demonstrate certain favorable characteristics as well, which is to be somewhat expected and further supports the NFL theorem of optimization.
Further details for the best-performing model optimized by the MPPSO algorithm are provided in the form of the ROC curve and confusion matrix, given in Figure 8 and Figure 9. A plot of the best-performing model is also provided in Figure 10. Finally, to support simulation repeatability, the parameter selections made by each optimizer for their respective best-performing optimized models are provided in Table 7.
Figure 8. CNN-MPPSO optimized L1 model prediction probability distributions and ROC curves.
Figure 9. CNN-MPPSO optimized L1 model confusion matrix.
Second-layer AdaBoost optimization outcomes in terms of the indicator function are provided in Table 9. Interestingly, in terms of indicator function outcomes, the BA demonstrates superior performance to all tested algorithms. However, since this was not the goal metric of the optimization process, these findings are quite interesting and further support the NFL theorem. High stability rates are demonstrated by the SCHO algorithm.
Visual comparisons in terms of optimizer stability are provided in Figure 11. Two interesting spikes in the objective function distributions can be observed in the introduced optimizer's outcomes. Interestingly, other optimizers overly focus on the first (lower) region. However, the introduced optimizer overcomes these limitations and locates a more promising region. The BA also locates a promising region within the search space. However, the BA's stability greatly suffers, providing by far the lowest stability compared to other evaluated optimizers, leading to overall mixed results.
Figure 11. NASA L2 framework CNN-AB objective violin plot and error box plot diagrams.
Convergence rates for both the objective and indicator functions for the L2 AdaBoost optimizations
are tracked and provided in Figure 12 and Figure 13 for each optimizer. The proposed MPPSO shows
a favorable rate of convergence, locating a good solution by iteration 7 and further improving on
this solution in iteration 19. While other optimizers manage to find decent outcomes, their fast rate
of convergence limits the quality of the located solutions. Similar conclusions can be made in terms
of indicator function convergence, demonstrated in Figure 13, where the optimizer once again locates
the best solution near the end of the optimization, in iteration 19.
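The convergence curves tracked in these figures record the best objective value found so far at each iteration of the optimizer. A minimal sketch of such logging, with a toy objective standing in for the actual classifier error:

```python
import random

def track_convergence(evaluate, propose, iterations=20, seed=0):
    """Toy optimizer loop that logs the best-so-far objective per iteration."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    history = []
    for _ in range(iterations):
        x = propose(rng)            # candidate solution (e.g. a hyperparameter vector)
        f = evaluate(x)             # objective value (e.g. model error)
        if f < best_f:              # keep the incumbent when no improvement occurs
            best_x, best_f = x, f
        history.append(best_f)      # monotone non-increasing convergence curve
    return best_x, history

# Hypothetical objective: squared distance from an unknown optimum at 0.3
best, curve = track_convergence(lambda x: (x - 0.3) ** 2,
                                lambda rng: rng.uniform(0.0, 1.0))
print(len(curve))
```

Plotting `history` for each optimizer over the same iteration budget produces convergence graphs of the kind shown in the figures.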
Figure 12. NASA L2 Framework CNN-AB objective function convergence graphs.
Figure 13. NASA L2 Framework CNN-AB error convergence graphs.
Comparisons between the best performing AdaBoost L2 models optimized by each algorithm
included in the comparative analysis are provided in Table 10. The best performing models match
performance across several metrics, suggesting that several optimizers are equally well suited for
AdaBoost optimization, demonstrating a favorable accuracy of 0.772166 and outperforming the scores
attained by using just the CNN in L1.
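The precision, recall, and F1 scores in these tables, together with the macro and support-weighted averages, follow the standard definitions. A minimal sketch with hypothetical counts, not the study's actual tallies:

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 for one class from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def averaged_report(counts):
    """counts: {label: (tp, fp, fn, support)} -> per-class rows, macro and weighted averages."""
    rows = {lbl: prf(tp, fp, fn) for lbl, (tp, fp, fn, _) in counts.items()}
    support = {lbl: s for lbl, (_, _, _, s) in counts.items()}
    total = sum(support.values())
    # Macro average: unweighted mean over classes; weighted: scaled by class support
    macro = tuple(sum(row[i] for row in rows.values()) / len(rows) for i in range(3))
    weighted = tuple(sum(rows[l][i] * support[l] for l in rows) / total for i in range(3))
    return rows, macro, weighted

# Hypothetical counts for a two-class defect problem
rows, macro, weighted = averaged_report({
    "no error": (8, 2, 1, 9),
    "error": (3, 1, 2, 5),
})
print(rows["no error"], macro, weighted)
```

The macro average treats both classes equally, while the weighted average reflects the class imbalance visible in the support row of the tables.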
Table 10. Best performing optimized Layer 2 AdaBoost model detailed metric comparisons.
Method metric no error error accuracy macro avg weighted avg
CNN-AB- precision 0.827586 0.490909 0.772166 0.659248 0.751887
MPPSO recall 0.891892 0.359401 0.772166 0.625646 0.772166
f1-score 0.858537 0.414986 0.772166 0.636761 0.758808
CNN-AB- precision 0.827586 0.490909 0.772166 0.659248 0.751887
PSO recall 0.891892 0.359401 0.772166 0.625646 0.772166
f1-score 0.858537 0.414986 0.772166 0.636761 0.758808
CNN-AB- precision 0.827586 0.490909 0.772166 0.659248 0.751887
GA recall 0.891892 0.359401 0.772166 0.625646 0.772166
f1-score 0.858537 0.414986 0.772166 0.636761 0.758808
CNN-AB- precision 0.827586 0.490909 0.772166 0.659248 0.751887
VNS recall 0.891892 0.359401 0.772166 0.625646 0.772166
f1-score 0.858537 0.414986 0.772166 0.636761 0.758808
CNN-AB- precision 0.827586 0.490909 0.772166 0.659248 0.751887
ABC recall 0.891892 0.359401 0.772166 0.625646 0.772166
f1-score 0.858537 0.414986 0.772166 0.636761 0.758808
CNN-AB- precision 0.828416 0.474468 0.766180 0.651442 0.748834
BA recall 0.880792 0.371048 0.766180 0.625920 0.766180
f1-score 0.853801 0.416433 0.766180 0.635117 0.755463
CNN-AB- precision 0.827586 0.490909 0.772166 0.659248 0.751887
SCHO recall 0.891892 0.359401 0.772166 0.625646 0.772166
f1-score 0.858537 0.414986 0.772166 0.636761 0.758808
CNN-AB- precision 0.827586 0.490909 0.772166 0.659248 0.751887
COLSHADE recall 0.891892 0.359401 0.772166 0.625646 0.772166
f1-score 0.858537 0.414986 0.772166 0.636761 0.758808
support 2072 601
Further details for the best performing AdaBoost L2 model optimized by the MPPSO algorithm
are provided in the form of the ROC curve and confusion matrix shown in Figure 14 and Figure 15.
Finally, to support simulation repeatability, the parameter selections made by each optimizer for their
respective best performing optimized models are provided in Table 11.
Figure 14. Layer 2 AdaBoost optimized L1 model ROC curve.
Figure 15. Layer 2 AdaBoost optimized L1 model confusion matrix.
Second-layer XGBoost optimization outcomes in terms of the indicator function are provided in
Table 13. The introduced optimizer does not attain the best outcomes in terms of indicator function
results, as the ABC algorithm attains the most favorable results. These interesting findings further
support the NFL theorem, indicating that no single approach is equally suited to all challenges and
across all metrics. In terms of indicator function stability, the PSO attained the best outcomes.
Visual comparisons in terms of optimizer stability are provided in Figure 16. While the introduced
optimizer attains more favorable outcomes in terms of the objective function, an evident disadvantage
can be observed in comparison to the original PSO, with the PSO algorithm locating a better region
of the search space in more cases. An overall low stability can be observed for the BA algorithm,
while the ABC algorithm focuses on a suboptimal region of the search space in many solution instances.
Convergence rates for both the objective and indicator functions for the L2 XGBoost optimizations
are tracked and provided in Figure 17 and Figure 18 for each optimizer. The introduced optimizer
manages to find a promising region in iteration 19, surpassing the solutions located by other algorithms.
However, this improvement comes at a cost in indicator function outcomes, where performance is
slightly reduced. Oftentimes, a trade-off in terms of one metric can mean improvements in another.
As the indicator function was not the primary goal of the optimization, the MPPSO algorithm attained
favorable optimization performance overall.
Figure 16. NASA L2 Framework CNN-XG objective violin plot and error box plot diagrams.
Figure 17. NASA L2 Framework CNN-XG objective function convergence graphs.
Figure 18. Layer 2 XGBoost indicator function convergence.
Comparisons between the best performing XGBoost L2 models optimized by each algorithm
included in the comparative analysis are provided in Table 14. The favorable performance of the
VNS algorithm is undeniable for this challenge, further supporting the NFL theorem of optimization.
Nevertheless, constant experimentation is needed to determine suitable optimizers for any given
problem. Furthermore, it is important to consider all the metrics when determining which algorithm is
best suited to the demands of a given optimization challenge.
Table 14. Best performing optimized Layer 2 XGBoost model detailed metric comparisons.
Method metric no error error accuracy macro avg weighted avg
CNN-XG- precision 0.830018 0.488069 0.771044 0.659044 0.753134
MPPSO recall 0.886100 0.374376 0.771044 0.630238 0.771044
f1-score 0.857143 0.423729 0.771044 0.640436 0.759694
CNN-XG- precision 0.831729 0.475610 0.766180 0.653669 0.751658
PSO recall 0.875483 0.389351 0.766180 0.632417 0.766180
f1-score 0.853045 0.428179 0.766180 0.640612 0.757518
CNN-XG- precision 0.829499 0.489035 0.771418 0.659267 0.752949
GA recall 0.887548 0.371048 0.771418 0.629298 0.771418
f1-score 0.857543 0.421949 0.771418 0.639746 0.759603
CNN-XG- precision 0.828342 0.497706 0.774411 0.663024 0.754001
VNS recall 0.894305 0.361065 0.774411 0.627685 0.774411
f1-score 0.860060 0.418515 0.774411 0.639288 0.760783
CNN-XG- precision 0.772056 0.209091 0.679386 0.490573 0.645478
ABC recall 0.832046 0.153078 0.679386 0.492562 0.679386
f1-score 0.800929 0.176753 0.679386 0.488841 0.660589
CNN-XG- precision 0.830902 0.480167 0.768051 0.655535 0.752043
BA recall 0.879826 0.382696 0.768051 0.631261 0.768051
f1-score 0.854665 0.425926 0.768051 0.640295 0.758267
CNN-XG- precision 0.831199 0.476386 0.766554 0.653792 0.751422
SCHO recall 0.876931 0.386023 0.766554 0.631477 0.766554
f1-score 0.853452 0.426471 0.766554 0.639961 0.757449
CNN-XG- precision 0.828328 0.493213 0.772914 0.660770 0.752980
COLSHADE recall 0.891892 0.362729 0.772914 0.627310 0.772914
f1-score 0.858936 0.418025 0.772914 0.638480 0.759801
support 2072 601
Further details for the best performing XGBoost L2 model optimized by the MPPSO algorithm
are provided in the form of the ROC curve and confusion matrix shown in Figure 19 and Figure 20.
Finally, to support simulation repeatability, the parameter selections made by each optimizer for their
respective best performing optimized models are provided in Table 23.
Figure 19. Layer 2 XGBoost optimized L1 model ROC curve.
Figure 20. Layer 2 XGBoost optimized L1 model confusion matrix.
AdaBoost NLP classifier optimization outcomes in terms of the indicator function are provided in
Table 17. High stability rates are showcased by the PSO, and favorable outcomes by the VNS algorithm,
which attains the best overall indicator score, as well as by the ABC, which demonstrates the best
outcomes in the worst-case execution.
Visual comparisons in terms of AdaBoost NLP optimizer stability are provided in Figure 21. High
stability rates are demonstrated by the PSO; however, the best scores are still located by the modified
optimizer, suggesting that the modifications hold further potential. As a trade-off is always present
when tackling multiple optimization problems, it is essential to explore multiple potential optimizers
in order to determine a suitable approach, as stated by the NFL theorem.
Convergence rates for both the objective and indicator functions for the NLP AdaBoost optimizations
are tracked and provided in Figure 22 and Figure 23 for each optimizer. While all optimizers show a
favorable convergence rate, several dwell in suboptimal regions of the search space. The introduced
optimizer overcomes the local minimum issue and locates a promising solution in iteration 18 of the
optimization. Similar observations can be made in terms of the indicator function, with the best
solution located by the optimizer in the latter iterations.
Figure 21. Software defects prediction TF-IDF AdaBoost objective violin plot and error box plot diagrams.
Figure 22. NLP Layer 2 AdaBoost objective function convergence.
Figure 23. NLP Layer 2 AdaBoost indicator function convergence.
Comparisons between the best performing AdaBoost NLP classifier models optimized by each
algorithm included in the comparative analysis are provided in Table 18. Favorable outcomes can be
observed for all evaluated models, suggesting that metaheuristic optimizers, as well as the AdaBoost
classifier, are well suited to the problem of error detection through applied NLP.
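The TF-IDF plus AdaBoost pairing evaluated here can be sketched with scikit-learn. The corpus, labels, and hyperparameter values below are illustrative placeholders, not the study's dataset or the metaheuristic-selected parameters:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy corpus: code lines labelled defective (1) or clean (0), repeated for volume
snippets = [
    "def write(buf): buf.write(data)",
    "def read(path): return open(path).read()",
    "print(results[i])",
    "for row in rows: total += row.value",
] * 10
labels = [1, 0, 1, 0] * 10

# TF-IDF maps each snippet to a sparse term-weight vector,
# which the boosted stump ensemble then classifies
model = make_pipeline(TfidfVectorizer(), AdaBoostClassifier(n_estimators=50))
model.fit(snippets, labels)
print(model.score(snippets, labels))
```

In the actual framework the `n_estimators` and learning-rate values are what the metaheuristics tune, with the classifier error serving as the objective function.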
Table 18. Best performing optimized NLP Layer 2 AdaBoost model detailed metric comparisons.
Method metric no error error accuracy macro avg weighted avg
AB-MPPSO precision 0.993776 0.966033 0.979781 0.979904 0.980170
recall 0.966375 0.993711 0.979781 0.980043 0.979781
f1-score 0.979884 0.979676 0.979781 0.979780 0.979782
AB-PSO precision 0.992409 0.966644 0.979438 0.979526 0.979773
recall 0.967048 0.992313 0.979438 0.979680 0.979438
f1-score 0.979564 0.979310 0.979438 0.979437 0.979440
AB-GA precision 0.990385 0.969220 0.979781 0.979802 0.980006
recall 0.969738 0.990217 0.979781 0.979977 0.979781
f1-score 0.979952 0.979606 0.979781 0.979779 0.979783
AB-VNS precision 0.989003 0.967191 0.978067 0.978097 0.978306
recall 0.967720 0.988819 0.978067 0.978270 0.978067
f1-score 0.978246 0.977885 0.978067 0.978066 0.978069
AB-ABC precision 0.991034 0.965940 0.978410 0.978487 0.978728
recall 0.966375 0.990915 0.978410 0.978645 0.978410
f1-score 0.978550 0.978268 0.978410 0.978409 0.978412
AB-BA precision 0.993776 0.966033 0.979781 0.979904 0.980170
recall 0.966375 0.993711 0.979781 0.980043 0.979781
f1-score 0.979884 0.979676 0.979781 0.979780 0.979782
AB-SCHO precision 0.990365 0.967235 0.978753 0.978800 0.979022
recall 0.967720 0.990217 0.978753 0.978968 0.978753
f1-score 0.978912 0.978591 0.978753 0.978751 0.978754
AB-COLSHADE precision 0.991047 0.967258 0.979095 0.979152 0.979381
recall 0.967720 0.990915 0.979095 0.979318 0.979095
f1-score 0.979245 0.978944 0.979095 0.979094 0.979097
support 1487 1431
Further details for the best performing AdaBoost NLP classifier model optimized by the MPPSO
algorithm are provided in the form of the ROC curve and confusion matrix shown in Figure 24
and Figure 25. Finally, to support simulation repeatability, the parameter selections made by each
optimizer for their respective best performing optimized models are provided in Table 19.
Figure 24. NLP Layer 2 AdaBoost optimized model ROC curve.
Figure 25. NLP Layer 2 AdaBoost optimized model confusion matrix.
XGBoost NLP classifier optimization outcomes in terms of the indicator function are provided in
Table 21. High stability rates are demonstrated by the PSO, while the ABC algorithm demonstrates the
best scores across all other metrics for the indicator function.
Visual comparisons in terms of XGBoost NLP optimizer stability are provided in Figure 26. While
the best scores in the best-case scenario are showcased by the introduced optimizer, PSO outcomes are
also favorable, with many solutions overcoming local optima and locating more promising outcomes
overall.
Figure 26. Software defects prediction TF-IDF XGBoost objective violin plot and error box plot diagrams.
Convergence rates for both the objective and indicator functions for the NLP XGBoost optimizations
are tracked and provided in Figure 27 and Figure 28 for each optimizer. The introduced optimizer
overcomes local minimum traps, locating a favorable outcome in iteration 18, suggesting that the
boost in exploration, especially in later stages, helps improve overall performance, while the baseline
PSO remains in less favorable regions. These outcomes are mirrored in terms of the indicator function,
with the best solution determined by the introduced optimizer in iteration 18.
Figure 27. NLP Layer 2 XGBoost objective function convergence.
Figure 28. NLP Layer 2 XGBoost indicator function convergence.
Comparisons between the best performing XGBoost NLP classifier models optimized by each
algorithm included in the comparative analysis are provided in Table 22. The best performance is
demonstrated by the introduced optimizer across many metrics, attaining an accuracy of 0.983893. The
GA also demonstrated good performance in terms of precision for non-error cases and recall for errors
in code, further affirming the NFL theorem.
Table 22. Best performing optimized NLP Layer 2 XGBoost model detailed metric comparisons.
Method metric no error error accuracy macro avg weighted avg
XG-MPPSO precision 0.995186 0.972678 0.983893 0.983932 0.984148
recall 0.973100 0.995108 0.983893 0.984104 0.983893
f1-score 0.984019 0.983765 0.983893 0.983892 0.983895
XG-PSO precision 0.995862 0.970708 0.983208 0.983285 0.983527
recall 0.971083 0.995807 0.983208 0.983445 0.983208
f1-score 0.983316 0.983098 0.983208 0.983207 0.983209
XG-GA precision 0.996545 0.969409 0.982865 0.982977 0.983237
recall 0.969738 0.996506 0.982865 0.983122 0.982865
f1-score 0.982958 0.982771 0.982865 0.982864 0.982866
XG-VNS precision 0.995865 0.971370 0.983550 0.983618 0.983853
recall 0.971755 0.995807 0.983550 0.983781 0.983550
f1-score 0.983662 0.983437 0.983550 0.983550 0.983552
XG-ABC precision 0.995159 0.967391 0.981151 0.981275 0.981542
recall 0.967720 0.995108 0.981151 0.981414 0.981151
f1-score 0.981248 0.981054 0.981151 0.981151 0.981153
XG-BA precision 0.995169 0.969367 0.982180 0.982268 0.982516
recall 0.969738 0.995108 0.982180 0.982423 0.982180
f1-score 0.982289 0.982069 0.982180 0.982179 0.982181
XG-SCHO precision 0.995856 0.969388 0.982522 0.982622 0.982876
recall 0.969738 0.995807 0.982522 0.982772 0.982522
f1-score 0.982624 0.982420 0.982522 0.982522 0.982524
XG-COLSHADE precision 0.995856 0.969388 0.982522 0.982622 0.982876
recall 0.969738 0.995807 0.982522 0.982772 0.982522
f1-score 0.982624 0.982420 0.982522 0.982522 0.982524
support 1487 1431
Further details for the best performing XGBoost NLP classifier model optimized by the MPPSO
algorithm are provided in the form of the ROC curve and confusion matrix shown in Figure 29
and Figure 30. Finally, to support simulation repeatability, the parameter selections made by each
optimizer for their respective best performing optimized models are provided in Table 23.
Figure 29. NLP Layer 2 XGBoost optimized model ROC curve.
Figure 30. NLP Layer 2 XGBoost optimized model confusion matrix.
As all simulations are conducted using independent random seeds, the independence criterion is met.
Homoscedasticity is confirmed via Levene's test [39], which resulted in p-values of 0.68 in each
case, indicating that this condition is also met.
The last condition, normality, is assessed using the Shapiro-Wilk test [40], with p-values
computed for each method included in the comparative analysis. As all the values are below a
threshold of 0.05, the null hypothesis (H0) may be rejected, suggesting that the outcomes do not
originate from normal distributions. These findings are further reinforced by the objective function
KDE diagrams presented in Figure 31. The outcomes of the Shapiro-Wilk test are provided in Table 24.
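Both checks can be reproduced with SciPy. The per-run scores below are synthetic stand-ins for the actual objective values collected across optimization runs:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic per-run best objective values for two hypothetical optimizers (30 runs each)
runs_a = rng.normal(loc=0.25, scale=0.010, size=30)
runs_b = rng.normal(loc=0.26, scale=0.012, size=30)

# Levene's test: null hypothesis of equal variances (homoscedasticity) across groups
_, p_levene = stats.levene(runs_a, runs_b)

# Shapiro-Wilk: null hypothesis that each sample is normally distributed;
# p-values below 0.05 reject normality and motivate non-parametric follow-up tests
_, p_shapiro_a = stats.shapiro(runs_a)
_, p_shapiro_b = stats.shapiro(runs_b)
print(p_levene, p_shapiro_a, p_shapiro_b)
```

With the real per-run objectives, sub-0.05 Shapiro-Wilk p-values rule out the parametric route, as reported in Table 24.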
Figure 31. Objective function KDE plot diagrams for the NASA L2 Framework CNN-AB and CNN-XG models and the TF-IDF AdaBoost and XGBoost models.
Table 24. Shapiro-Wilk outcomes of the five conducted simulations for verification of the normality
condition for safe utilization of parametric tests.
Model MPPSO PSO GA VNS ABC BA SCHO COLSHADE
CNN Optimization 0.033 0.029 0.023 0.017 0.035 0.044 0.041 0.037
AdaBoost Optimization 0.019 0.027 0.038 0.021 0.034 0.043 0.025 0.049
XGBoost Optimization 0.022 0.026 0.032 0.018 0.029 0.040 0.038 0.042
AdaBoost NLP Optimization 0.031 0.022 0.037 0.026 0.033 0.046 0.030 0.045
XGBoost NLP Optimization 0.028 0.024 0.034 0.019 0.031 0.041 0.036 0.043
As one of the criteria needed to justify the safe use of parametric tests is not met, further evaluations
are conducted via non-parametric tests. The Wilcoxon signed-rank test is therefore applied to compare
the performance of the MPPSO to the other algorithms included in the comparative analysis. Test scores
are presented in Table 25. A threshold of α = 0.05 is not exceeded in any of the test cases, indicating
that the outcomes attained by the MPPSO algorithm hold statistically significant improvements over
the other evaluated methods.
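The paired comparison can be sketched with SciPy's `wilcoxon`. The scores below are synthetic, constructed so that the improved method is consistently better over the same runs:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Synthetic paired errors over the same 30 runs; 'improved' is always strictly lower
baseline = rng.normal(loc=0.26, scale=0.010, size=30)
improved = baseline - np.abs(rng.normal(loc=0.005, scale=0.002, size=30)) - 1e-6

# Wilcoxon signed-rank: a non-parametric test on the paired differences;
# a p-value below alpha = 0.05 indicates a statistically significant difference
stat, p = stats.wilcoxon(improved, baseline)
print(stat, p)
```

Because the test only uses the signs and ranks of the paired differences, it remains valid for the non-normal outcome distributions identified above.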
Table 25. Wilcoxon signed-rank test scores of the five conducted simulations.
MPPSO vs. others PSO GA VNS ABC BA SCHO COLSHADE
CNN Optimization 0.027 0.024 0.034 0.018 0.032 0.040 0.045
AdaBoost Optimization 0.020 0.028 0.031 0.022 0.030 0.038 0.042
XGBoost Optimization 0.033 0.023 0.037 0.019 0.031 0.043 0.046
AdaBoost NLP Optimization 0.029 0.026 0.035 0.021 0.029 0.041 0.044
XGBoost NLP Optimization 0.024 0.027 0.032 0.020 0.034 0.039 0.047
Figure 32. Best CNN, XGBoost and AdaBoost model feature importances on NASA dataset.
Feature importances for the best performing models in the NLP simulations are likewise provided
in Figure 33 for the AdaBoost model and in Figure 34 for the XGBoost model. TF-IDF vectorized
feature importances are once again interpreted using SHAP analysis, and the top 20 contributing
features are shown for both models. Both models place a high importance on the def keyword, as
well as the print and write keywords, where errors can often occur. The AdaBoost model places a
significant importance on the def keyword, with print holding the second spot at a significantly
lower importance. The XGBoost model, however, has a more even distribution of importance. While
def holds a high importance, the second highest impact is on the write keyword, followed by print with
a slightly decreased importance.
Understanding the influence of feature importance on model decisions can further aid in understanding
the challenges associated with software defect detection. Furthermore, determining which features play
an important role in these classifications can help reduce the computational costs of models in
deployment, as well as aid in improving data collection in the future. Detecting hidden model biases is
also essential for enforcing trust in model decisions, improving the generalizability and objectivity of
decisions.
6. Conclusion
Software has become increasingly integral to societal infrastructure, governing critical systems.
Ensuring the reliability of software is therefore paramount across various industries. As the demand
for accelerated development intensifies, the manual review of code presents growing challenges, with
testing frequently consuming more time than development. A promising method for detecting defects
at the source code level involves the integration of AI with NLP. Given that software is composed
of human-readable code, which directs machine operations, the validation of the highly diverse
machine code on a case-by-case basis is inherently complex. Consequently, source code analysis offers
a potentially effective approach to improving defect detection and preventing errors.
This work explores the advantages and challenges of utilizing AI for error detection in software
code. Both classical and NLP methods are explored on two publicly available datasets, with five
experiments conducted in total. A two-layer optimization framework is introduced in order to manage
the complex demands of error detection. A CNN architecture is utilized in the first layer to help process
the large amounts of data in a more computationally efficient manner, with the second layer handling
intermediate results using AdaBoost and XGBoost classifiers. An additional set of simulations using
only the second layer of the framework in combination with TF-IDF encoding is also carried out
in order to provide a comparison between emerging NLP and classical techniques. As optimizer
performance is highly dependent on adequate parameter selection, a modified version of the well-
established PSO is introduced, designed specifically for the needs of this research with the aim of
overcoming some of the known drawbacks of the original algorithm. A comparative analysis is carried
out with several state-of-the-art optimizers, with the introduced approach demonstrating promising
outcomes in several simulations.
Two-layer simulations improve upon the baseline outcomes demonstrated by the CNN, boosting accuracy
from 0.768799 to 0.772166 for the best performing AdaBoost model and 0.771044 for the best performing
XGBoost classifier. This suggests that a two-layer approach can yield favorable outcomes while maintaining
favorable computational demands in comparison to more complex network solutions. Optimizations
carried out using NLP demonstrate an impressive accuracy of 0.979781 for the best performing
AdaBoost model and 0.983893 for the best performing XGBoost model. Simulations are further validated
using statistical evaluation to confirm the significance of the observations. The best performing models
are also subjected to SHAP analysis to determine feature importance and help locate any potential
hidden biases within the best performing models.
It is worth noting that the extensive computational demands of the optimizations carried out in
this work limit the number of optimizers that can be tested. Further limitations are associated with
population sizes and the allocated numbers of iterations for each optimization due to hardware memory
limitations. Future works hope to address these concerns as additional resources become available.
Additional applications of the proposed MPPSO are also planned for other problem domains.
Emerging transformer-based architectures built on custom BERT encodings are likewise planned to be
explored for software defect detection in future works.
References
1. Alyahyan, S.; Alatawi, M.N.; Alnfiai, M.M.; Alotaibi, S.D.; Alshammari, A.; Alzaid, Z.; Alwageed, H.S.
Software reliability assessment: An architectural and component impact analysis. Tsinghua Science and
Technology 2024.
2. Zhang, H.; Gao, X.Z.; Wang, Z.; Wang, G. Guest Editorial of the Special Section on Neural Computing-Driven
Artificial Intelligence for Consumer Electronics. IEEE Transactions on Consumer Electronics 2024, 70, 3517–3520.
3. Mcmurray, S.; Sodhro, A.H. A study on ML-based software defect detection for security traceability in smart
healthcare applications. Sensors 2023, 23, 3470.
4. Giray, G.; Bennin, K.E.; Köksal, Ö.; Babur, Ö.; Tekinerdogan, B. On the use of deep learning in software
defect prediction. Journal of Systems and Software 2023, 195, 111537.
5. Jim, J.R.; Talukder, M.A.R.; Malakar, P.; Kabir, M.M.; Nur, K.; Mridha, M. Recent advancements and
challenges of nlp-based sentiment analysis: A state-of-the-art review. Natural Language Processing Journal
2024, p. 100059.
6. Wolpert, D.H.; Macready, W.G. No free lunch theorems for optimization. IEEE transactions on evolutionary
computation 1997, 1, 67–82.
7. Zivkovic, T.; Nikolic, B.; Simic, V.; Pamucar, D.; Bacanin, N. Software defects prediction by metaheuristics
tuned extreme gradient boosting and analysis based on Shapley Additive Explanations. Applied Soft
Computing 2023, 146, 110659. https://doi.org/https://doi.org/10.1016/j.asoc.2023.110659.
8. Ali, M.; Mazhar, T.; Al-Rasheed, A.; Shahzad, T.; Ghadi, Y.Y.; Khan, M.A. Enhancing software defect
prediction: a framework with improved feature selection and ensemble machine learning. PeerJ Computer
Science 2024, 10, e1860.
9. Khleel, N.A.A.; Nehéz, K. Software defect prediction using a bidirectional LSTM network combined with
oversampling techniques. Cluster Computing 2024, 27, 3615–3638.
10. Zhang, Q.; Zhang, J.; Feng, T.; Xue, J.; Zhu, X.; Zhu, N.; Li, Z. Software Defect Prediction Using Deep
Q-Learning Network-Based Feature Extraction. IET Software 2024, 2024, 3946655.
11. Briciu, A.; Czibula, G.; Lupea, M. A study on the relevance of semantic features extracted using BERT-based
language models for enhancing the performance of software defect classifiers. Procedia Computer Science
2023, 225, 1601–1610.
12. Dash, G.; Sharma, C.; Sharma, S. Sustainable marketing and the role of social media: an experimental study
using natural language processing (NLP). Sustainability 2023, 15, 5443.
13. Velasco, L.; Guerrero, H.; Hospitaler, A. A literature review and critical analysis of metaheuristics recently
developed. Archives of Computational Methods in Engineering 2024, 31, 125–146.
14. Jain, V.; Kashyap, K.L. Ensemble hybrid model for Hindi COVID-19 text classification with metaheuristic
optimization algorithm. Multimedia Tools and Applications 2023, 82, 16839–16859.
15. Mladenović, N.; Hansen, P. Variable neighborhood search. Computers & operations research 1997, 24, 1097–
1100.
16. Karaboga, D.; Akay, B. A comparative study of artificial bee colony algorithm. Applied mathematics and
computation 2009, 214, 108–132.
17. Yang, X.S.; Hossein Gandomi, A. Bat algorithm: a novel approach for global engineering optimization.
Engineering computations 2012, 29, 464–483.
18. Gurrola-Ramos, J.; Hernàndez-Aguirre, A.; Dalmau-Cedeño, O. COLSHADE for real-world single-objective
constrained optimization problems. In Proceedings of the 2020 IEEE congress on evolutionary computation
(CEC). IEEE, 2020, pp. 1–8.
19. Bai, J.; Li, Y.; Zheng, M.; Khatir, S.; Benaissa, B.; Abualigah, L.; Wahab, M.A. A sinh cosh optimizer.
Knowledge-Based Systems 2023, 282, 111081.
20. Damaševičius, R.; Jovanovic, L.; Petrovic, A.; Zivkovic, M.; Bacanin, N.; Jovanovic, D.; Antonijevic, M.
Decomposition aided attention-based recurrent neural networks for multistep ahead time-series forecasting
of renewable power generation. PeerJ Computer Science 2024, 10.
21. Gajevic, M.; Milutinovic, N.; Krstovic, J.; Jovanovic, L.; Marjanovic, M.; Stoean, C. Artificial neural network
tuning by improved sine cosine algorithm for healthcare 4.0. In Proceedings of the Proceedings of the 1st
international conference on innovation in information technology and business (ICIITB 2022). Springer
Nature, 2023, Vol. 104, p. 289.
22. Minic, A.; Jovanovic, L.; Bacanin, N.; Stoean, C.; Zivkovic, M.; Spalevic, P.; Petrovic, A.; Dobrojevic, M.;
Stoean, R. Applying recurrent neural networks for anomaly detection in electrocardiogram sensor data.
Sensors 2023, 23, 9878.
23. Jovanovic, L.; Milutinovic, N.; Gajevic, M.; Krstovic, J.; Rashid, T.A.; Petrovic, A. Sine cosine algorithm
for simple recurrent neural network tuning for stock market prediction. In Proceedings of the 2022 30th
Telecommunications Forum (TELFOR). IEEE, 2022, pp. 1–4.
Preprints.org (www.preprints.org) | NOT PEER-REVIEWED | Posted: 2 September 2024 doi:10.20944/preprints202409.0021.v1
44 of 44
24. Jovanovic, L.; Djuric, M.; Zivkovic, M.; Jovanovic, D.; Strumberger, I.; Antonijevic, M.; Budimirovic, N.;
Bacanin, N. Tuning xgboost by planet optimization algorithm: An application for diabetes classification.
In Proceedings of the Proceedings of fourth international conference on communication, computing and
electronics systems: ICCCES 2022. Springer, 2023, pp. 787–803.
25. Pavlov-Kagadejev, M.; Jovanovic, L.; Bacanin, N.; Deveci, M.; Zivkovic, M.; Tuba, M.; Strumberger, I.;
Pedrycz, W. Optimizing long-short-term memory models via metaheuristics for decomposition aided wind
energy generation forecasting. Artificial Intelligence Review 2024, 57, 45.
26. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for
language understanding. arXiv preprint arXiv:1810.04805 2018.
27. Aftan, S.; Shah, H. A survey on bert and its applications. In Proceedings of the 2023 20th Learning and
Technology Conference (L&T). IEEE, 2023, pp. 161–166.
28. Mittal, S.; Stoean, C.; Kajdacsy-Balla, A.; Bhargava, R. Digital assessment of stained breast tissue images
for comprehensive tumor and microenvironment analysis. Frontiers in bioengineering and biotechnology 2019,
7, 246.
29. Postavaru, S.; Stoean, R.; Stoean, C.; Caparros, G.J. Adaptation of deep convolutional neural networks
for cancer grading from histopathological images. In Proceedings of the Advances in Computational
Intelligence: 14th International Work-Conference on Artificial Neural Networks, IWANN 2017, Cadiz, Spain,
June 14-16, 2017, Proceedings, Part II 14. Springer, 2017, pp. 38–49.
30. Freund, Y.; Schapire, R.E. A desicion-theoretic generalization of on-line learning and an application to
boosting. In Proceedings of the European conference on computational learning theory. Springer, 1995, pp.
23–37.
31. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the Proceedings of the
22nd acm sigkdd international conference on knowledge discovery and data mining, 2016, pp. 785–794.
32. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the Proceedings of ICNN’95-
international conference on neural networks. ieee, 1995, Vol. 4, pp. 1942–1948.
33. Bacanin, N.; Simic, V.; Zivkovic, M.; Alrasheedi, M.; Petrovic, A. Cloud computing load prediction by
decomposition reinforced attention long short-term memory network optimized by modified particle swarm
optimization algorithm. Annals of Operations Research 2023, pp. 1–34.
34. Mirjalili, S.; Mirjalili, S. Genetic algorithm. Evolutionary algorithms and neural networks: theory and applications
2019, pp. 43–55.
35. Rahnamayan, S.; Tizhoosh, H.R.; Salama, M.M.A. Quasi-oppositional Differential Evolution. In Proceedings
of the 2007 IEEE Congress on Evolutionary Computation, 2007, pp. 2229–2236. https://doi.org/10.1109/
CEC.2007.4424748.
36. McCabe, T. A Complexity Measure. IEEE Transactions on Software Engineering 1976, 2, 308–320.
37. Halstead, M. Elements of Software Science; Elsevier, 1977.
38. LaTorre, A.; Molina, D.; Osaba, E.; Poyatos, J.; Del Ser, J.; Herrera, F. A prescription of methodological
guidelines for comparing bio-inspired optimization algorithms. Swarm and Evolutionary Computation 2021,
67, 100973.
39. Glass, G.V. Testing homogeneity of variances. American Educational Research Journal 1966, 3, 187–190.
40. Shapiro, S.S.; Francia, R. An approximate analysis of variance test for normality. Journal of the American
statistical Association 1972, 67, 215–216.