
symmetry

Article
Evolutionary Multilabel Classification Algorithm Based on
Cultural Algorithm
Qinghua Wu 1 , Bin Wu 2 , Chengyu Hu 3 and Xuesong Yan 3,4, *

1 Faculty of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China;
[email protected]
2 School of Economics and Management, Nanjing Tech University, Nanjing 211816, China; [email protected]
3 School of Computer Science, China University of Geosciences, Wuhan 430074, China; [email protected]
4 Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education,
Jilin University, Changchun 130012, China
* Correspondence: [email protected]

Abstract: As one of the most common methods for constructing classifiers, naïve Bayes has become one of the most popular classification methods because of its solid theoretical basis, its capacity to learn prior knowledge, its distinctive form of knowledge representation, and its high classification accuracy. This classification method exhibits a symmetry phenomenon in the process of data classification. Although the naïve Bayes classifier achieves high classification performance on single-label classification problems, whether this performance carries over to multilabel classification is worth studying. In this paper, taking the naïve Bayes classifier as the basic research object and addressing the shortcomings of the naïve Bayes classification algorithm with respect to its conditional independence assumption and its label class selection strategy, a weighted naïve Bayes framework is adopted to provide a better multilabel classifier, and cultural algorithms are introduced to search for and determine the optimal weights, yielding the weighted naïve Bayes multilabel classification algorithm. Experimental results show that the algorithm proposed in this paper is superior to other algorithms in classification performance.

Keywords: multilabel classification; naïve Bayesian algorithm; cultural algorithms; weighted Bayesian; evolutionary multilabel classification

Citation: Wu, Q.; Wu, B.; Hu, C.; Yan, X. Evolutionary Multilabel Classification Algorithm Based on Cultural Algorithm. Symmetry 2021, 13, 322. https://doi.org/10.3390/sym13020322

Academic Editor: Juan Luis García Guirao
Received: 28 January 2021; Accepted: 8 February 2021; Published: 16 February 2021

1. Introduction

The multilabel learning problem draws its origins from the text classification problem [1–3]. For example, a text may belong to one of several predetermined topics such as hygiene and governance. Today, problems of this type are extremely widespread in everyday applications. For example, in video indexing, audio clips can be divided according to emotion-related labels such as “happiness” and “joy” [4]. In functional genomics, multiple function labels can be assigned to each gene, such as “a large and tall body” and “fair skin” [5]. In image recognition, an image can simultaneously contain several scene labels such as “big tree” and “tall building” [6]. Because multilabel classification is becoming increasingly widespread in real applications, an in-depth study of this subject can be significantly beneficial for our everyday lives [7–12].
Many methods are available to construct multilabel classifiers, such as naïve Bayes [13], decision tree [14], k-nearest neighbors [15], support vector machines (SVMs) [16], instance-based learning [17], artificial neural networks [18], and genetic algorithm-based methods [19]. The naïve Bayes classifier (NBC) is a learning method incorporating supervision and guidance mechanisms and is simple and efficient [20]. These features have aided the NBC in becoming highly popular for classifier learning. However, the NBC is based on a simple albeit unrealistic assumption that the attributes are mutually independent. This begs the question: when the NBC is used to construct classifiers, is it feasible to improve the accuracy of the resulting classifiers by making corrections to this assumption?



In 2004, Gao et al. proposed a multiclass (MC) classification approach to text catego-
rization (TC) [21]. McCallum et al. proposed the use of conditional random fields (CRF)
to predict the classification of unlabeled test data [22]. Zhang and Zhou proposed the
multilabel K-nearest neighbors (ML-KNNs) algorithm for the classic multilabel classifica-
tion problem [23]. Zhang et al. converted the NBC model, which is meant for single-label
datasets, into a multilabel naïve Bayes (MLNB) algorithm that is suitable for multilabel
datasets [13]. Xu et al. proposed an ensemble based on the conditional random field
(En-CRF) method for multilabel image/video annotation [24]. Qu et al. proposed the appli-
cation of Bayes’ theorem to the multilabel classification problem [25]. Wu et al. proposed a
weighted naïve Bayes algorithm based on differential evolution (DE-WNB) and a naïve Bayes
algorithm based on self-adaptive differential evolution (SAMNB) for classifying
single-label datasets [26,27]. In 2014, Sucar et al. proposed the use of Bayesian
network-based chain classifiers for multilabel classification [28].
For data mining researchers, methods for improving the accuracy of multilabel classi-
fiers have become an important subject in studies on the multilabel classification problem.
The problem with the NBC model is that it is exceptionally challenging for the attributes of
real datasets to be mutually independent. The assumption of mutual independence will
significantly affect classification accuracy in datasets sensitive to feature combinations and
when the dimensionality of class labels is very large. There are two problems that must be
considered when constructing a multilabel classifier: (1) the processing of the relationships
between the different labels, label sets, and attribute sets and the different attributes, and (2)
the selection of the final label set for predicting the classification of real data. The available
strategies for solving the label selection problem in NBC-based multilabel classification
generally overlook the interdependencies between the labels. This is because they rely only
on the maximization of posterior probability to perform label selections.
As naïve Bayes multilabel classification can be considered an optimization problem,
many researchers have attempted to apply intelligent optimization algorithms to it [29–35].
Intelligent optimization algorithms have wide applications [36–57]. Cultural algo-
rithms are a type of intelligent search algorithm; compared to the conventional genetic
algorithm, cultural algorithms add a so-called “belief space” to the population component.
This component stores the experience and knowledge learned by the individuals during the
population’s evolution process. The evolution of the individuals in the population space
is then guided by this knowledge. Cultural algorithms are established to be particularly
well-suited to the optimization of multimodal problems [58,59]. Based on the characteristics
of the samples in this work, a cultural algorithm was used to search for the optimal naïve
Bayes multilabel classifier. It was then used to predict the class labels of test samples.

2. Bayesian Multilabel Classification and Cultural Algorithms


2.1. Bayesian Multilabel Classification
The naïve Bayes approach has become highly popular for constructing classifiers.
This is owing to its robust theoretical foundations, the capability of NBCs to learn prior
knowledge, the unique knowledge representation of NBCs, and the accuracy of NBCs.
Although NBCs are capable of remarkable classification performance, questions remain
with regard to their performance in multilabel classification. Furthermore, there are a few
additional questions with regard to naïve Bayes multilabel classifiers compared to naïve
Bayes single-label classifiers: First, do different attributes exert different levels of influence
on the prediction of each class label? Secondly, is it feasible to extract the interdependencies
among the various labels of the label set and, thus, optimize classifier performance?
The multilabel classification problem can be converted into m single-label binary classi-
fication problems, according to the dimensionality of the class label set, m. The naïve Bayes
algorithm is then used to solve m single-label binary classification problems, thereby solv-
ing the multilabel classification problem. This approach is called the naïve Bayes multilabel
classification (NBMLC) algorithm. As binary classifiers are used in this algorithm, a class
label may take a value of zero or one. For example, if a sample belongs to a class label Ck ,
the class label acquires a value of one in the sample instance. This is expressed as Ck = 1 or C_k^1. Conversely, if a sample does not belong to the class label Ck, the class label acquires a value of zero in the sample instance. This is expressed as Ck = 0 or C_k^0.
The training dataset of a multilabel classification problem is presented in Table 1. Here, A = {A1, A2, A3, A4} represents the attribute set of the training dataset, whereas C = {C1, C2} represents the label set of the training dataset. This model has four training instances.

Table 1. Training dataset of a multilabel data model.

     A1     A2     A3     A4     C1     C2
1    x11    x12    x13    x14    0      1
2    x21    x22    x23    x24    1      0
3    x31    x32    x33    x34    0      0
4    x41    x42    x43    x44    1      1

The testing instance is as follows: Given Y = <A1 = y1, A2 = y2, A3 = y3, A4 = y4>, the objective is to solve for the values corresponding to the class labels C1 and C2.
The problem is solved as follows: First, construct a naïve Bayes network classifier (Figure 1). The nodes C1 and C2 in Figure 1 represent the class attributes C1 and C2. The four other nodes (A1, A2, A3, A4) represent the four attribute values, A1, A2, A3, and A4. The class nodes C1 and C2 are the parent nodes of the attribute nodes A1, A2, A3, and A4. The following three assumptions have been adopted in the abovementioned NBMLC algorithm:
A. All attribute nodes exhibit an equal level of importance for the selection of a class node.
B. The attribute nodes (A1, A2, A3, and A4) are mutually independent and completely unrelated to each other.
C. Assume that the class nodes C1 and C2 are unrelated and independent.

Figure 1. A naïve Bayes classifier.

However, these assumptions tend to be untrue in real problems. In the case of Assumption A, it is feasible for different attributes to contribute differently to the selection of a class label; different conditional attributes may not necessarily exhibit an equal level of importance in the classification of decision attributes. For example, in real data, if the attribute A1 has a value larger than 0.5, this instance must belong to C1, and the value of the attribute A2 has no bearing on whether this data instance belongs to C1 or C2. Hence, the value of A2 does not significantly affect the selection of the class label.

2.2. Cultural Algorithms

Cultural algorithms (CAs) are inspired by the processes of cultural evolution that occur in the natural world. The effectiveness of the CA framework has already been established in many applications. Figure 2 illustrates the general architecture of a CA. A CA consists of a population space (POP) and a belief space (BLF). POP and BLF have independent evolution processes. In a CA, these spaces are connected by a communication protocol, i.e., a functional function, which enables these spaces to cooperatively drive the evolution and optimization of the individuals in the population. The functional functions of a CA are the “accept function” and “influence function”.
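To make the interplay between the two spaces concrete, the following minimal Python sketch mirrors this architecture. It is only an illustration under stated assumptions: the real-vector representation, the Gaussian perturbation, and the toy fitness are placeholders, not the specific operators used later in this paper.

```python
import random

def run_cultural_algorithm(fitness, dim, pop_size=30, generations=50, accept_ratio=0.2):
    """Minimal cultural-algorithm skeleton: a population space (POP) evolves under
    the guidance of a belief space (BLF) via the accept and influence functions."""
    # Population space: candidate solutions in [0, 1]^dim.
    pop = [[random.random() for _ in range(dim)] for _ in range(pop_size)]
    # Belief space (here reduced to situational knowledge: best individual seen so far).
    belief = {"best": None, "best_fit": float("-inf")}

    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        # Accept function: the top fraction of individuals updates the belief space.
        accepted = scored[: max(1, int(accept_ratio * pop_size))]
        if fitness(accepted[0]) > belief["best_fit"]:
            belief["best"], belief["best_fit"] = accepted[0][:], fitness(accepted[0])
        # Influence function: the belief space guides the generation of new individuals.
        pop = [
            [min(1.0, max(0.0, b + random.gauss(0.0, 0.1))) for b in belief["best"]]
            for _ in range(pop_size)
        ]
    return belief["best"], belief["best_fit"]

# Example: maximize a toy fitness (negative distance to an arbitrary target weight vector).
best, fit = run_cultural_algorithm(lambda w: -sum((x - 0.7) ** 2 for x in w), dim=5)
```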

Figure 2. Fundamental architecture of cultural algorithms.

As CAs possess many evolutionary processes, a different hybrid cultural algorithm can be developed by including a different evolutionary algorithm in the POP space. Theoretically, any evolutionary algorithm can be incorporated within the POP space as an evolutionary rule. However, a systematic theoretical foundation has yet to be established for applying CAs as an intelligent optimization algorithm.
3. Cultural-Algorithm-Based Evolutionary Multilabel Classification Algorithm

3.1. Weighted Bayes Multilabel Classification Algorithm
In Assumption A of the NBMLC algorithm, all the attribute nodes exhibit an equal level of importance for the selection of a class label node. In single-label problems, many researchers incorporate feature weighting in the NBMLC algorithm to correct this assumption. This has been demonstrated to improve classification accuracy [26,60,61]. In this work, we apply the weighting approach to the multilabel classification problem and, thus, obtain the weighted naïve Bayes multilabel classifier (WNBMLC). Here, wj represents the weight of the attribute xj, i.e., the importance of xj for the class label set. Equation (1) shows the mathematical expression of the WNBMLC algorithm.

P(C_i \mid X) = \arg\max_{C_i} P(C_i) \prod_{j=1}^{d} P(x_j \mid C_i)^{w_j}    (1)

Here, it is illustrated that the key to solving the multilabel classification problem lies in the weighting of sample features. First, we constructed a WNBMLC (see Figure 3), where the nodes C1 and C2 correspond to the class attributes C1 and C2. The nodes A1, A2, A3, and A4 represent the four attributes, A1, A2, A3, and A4. The class nodes C1 and C2 are the parent nodes of the attribute nodes A1, A2, A3, and A4. The weights of the conditional attributes A1, A2, A3, and A4 for the selection of a class label from the class label set C = {C1, C2} are w1, w2, w3, and w4, respectively. In this work, a CA was used to iteratively optimize the selection of feature weights.

Figure 3. Weighted naïve Bayes classifier.
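As a minimal illustration of Equation (1), the sketch below scores one class label with per-attribute weights. The Gaussian form of P(xj | Ci) and all numerical values are assumptions chosen for the example, not the estimates used in this work.

```python
import math

def gaussian_pdf(x, mean, var):
    """Gaussian estimate of the class-conditional density P(x_j | C_i)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def weighted_nb_score(x, prior, cond_params, weights):
    """Equation (1): P(C_i | X) is proportional to P(C_i) * prod_j P(x_j | C_i) ** w_j."""
    score = prior
    for x_j, (mean, var), w_j in zip(x, cond_params, weights):
        score *= gaussian_pdf(x_j, mean, var) ** w_j
    return score

# Hypothetical two-value example: pick the value (0 or 1) of label C1 with the larger weighted posterior.
x = [0.2, 0.8, 0.5, 0.1]                    # attribute values A1..A4
weights = [0.9, 0.3, 0.6, 0.5]              # w1..w4, e.g., produced by the CA
params_c1 = {1: ([0.3, 0.7, 0.5, 0.2], [0.1] * 4), 0: ([0.6, 0.4, 0.5, 0.8], [0.1] * 4)}
priors_c1 = {1: 0.5, 0: 0.5}
prediction = max(
    (0, 1),
    key=lambda c: weighted_nb_score(x, priors_c1[c], list(zip(*params_c1[c])), weights),
)
```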


3.1.1. Correction of the Prior Probability Formula


An NBC occasionally calculates probabilities that are zero or very close to zero. There-
fore, it is necessary to consider cases where the denominator becomes zero and to prevent
underflows caused by the multiplication of small probabilities. Furthermore, while calcu-
lating conditional probabilities, an extreme situation can occasionally arise where all the
training instances either belong to or do not belong to a class label. Thereby, the class label
has a value of one or zero, respectively. This can occur if there are very few samples in the
training set or when the dimensionalities of the attributes and class labels are very large.
Consequently, the sample instances in the training set do not fully cover the relationship
between the attributes and class labels, and it becomes infeasible to classify the records of
the test set using the NBC. For example, it is feasible for the training set class label C1 = 1
to have a probability of zero. The denominator then becomes zero in the equations for
calculating the average and variance of the conditional probability, resulting in erroneous
calculations. We circumvented this problem by using the M-estimate while calculating
prior probabilities. Equation (2) shows the definition of the M-estimate.
P(X_i \mid Y_j) = \frac{n_c + m \cdot p}{n + m}    (2)
In this equation, n is the total number of instances that belong to the class label Yj
in the training set; nc is the number of sample instances in the training set that belong
to the class label Yj and have an attribute value of Xi ; m is a deterministic parameter
called the equivalent sample size; p is a self-defined parameter. According to this equation,
if the training set is absent or the number of sample instances in the training set is zero,
nc = n = 0, and P (Xi | Yj ) = p. Therefore, p may be considered the prior probability for
the appearance of the attribute Xi in the class Yj . In addition, prior probability (p) and
observation probability (nc /n) are determined by the equivalent sample size (m). In this
work, m is defined as one, and the value of p is 1/|C|, where |C| is the total number of
class labels, i.e., the dimensionality of the class label set. Therefore, Equation (3) shows the
equation for calculating prior probability.

P(C_i) = \frac{|C_{i,D}| + 1/|C|}{|D| + 1}    (3)
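A small sketch of this M-estimate smoothing: following the text, m is fixed at one and p at 1/|C|, so Equation (3) follows from Equation (2). The function names are illustrative only.

```python
def m_estimate(n_c, n, m=1.0, p=0.5):
    """Equation (2): smoothed probability (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)

def smoothed_prior(count_ci, total_instances, num_labels):
    """Equation (3): P(C_i) = (|C_i,D| + 1/|C|) / (|D| + 1), i.e., m = 1 and p = 1/|C|."""
    return m_estimate(count_ci, total_instances, m=1.0, p=1.0 / num_labels)

# Example: a label selected in 0 of 150 training instances still gets a nonzero prior.
print(smoothed_prior(0, 150, num_labels=14))   # ~0.00047 instead of 0.0
```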

3.1.2. Correction of the Conditional Probability Formula


Correction for Assumption A in the conditional probability formula: Suppose that the
instances of the training set adhere to a Gaussian density function. If all the training set
instances belong to (or do not belong to) a certain class label, the number of elements in the
average and variance formulae that correspond to the class label will be zero. Therefore,
the denominator becomes zero. In this scenario, the calculated conditional probabilities are
meaningless. In this work, it is assumed that for each class label, there is a minimum of
one instance in the training set that belongs to that class label and also a minimum of one
instance in the training set that does not belong to that class label. Therefore, if a class label
was not selected in the sample instances of a training set, the algorithm will still assume
that it was selected in one instance. This does not affect the results of the classification
because it is a low-probability event compared to the number of training set instances.
However, it ensures that the denominator in the conditional probability formula will not
become zero. That is, regardless of the number of sample instances wherein a class label
was evaluated as zero under a certain specified set of conditions, the selection count will
always be incremented by one to ensure that this class label is selected in a minimum of
one instance or not selected in one instance.
Correction for Assumption B: Suppose that conditional probability is being calculated
by discretizing the continuous attribute values. If all the sample instances of a training
set belong to (or do not belong to) a class label, Nj (Ck ) = 0. Therefore, the denominator
of P(x_{i,j} \mid C_k) = \frac{N_{j,n}(C_k)}{N_j(C_k)} becomes zero. This data would then be considered invalid by
the classifier. To resolve this issue, we used the M-estimate to smooth the conditional
probability formula, as in Equation (4).

P(x_{i,j} \mid C_k) = \frac{N_{j,n}(C_k) + 1/|A_j|}{N_j(C_k) + 1}    (4)
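The same idea applied to Equation (4), assuming discretized attribute values; the counts and the number of intervals |A_j| below are illustrative.

```python
def smoothed_conditional(n_jk_match, n_k, num_intervals):
    """Equation (4): P(x_{i,j} | C_k) = (N_{j,n}(C_k) + 1/|A_j|) / (N_j(C_k) + 1)."""
    return (n_jk_match + 1.0 / num_intervals) / (n_k + 1.0)

# Even when no training instance of class C_k falls in this interval (or N_j(C_k) = 0),
# the smoothed estimate stays finite and nonzero.
print(smoothed_conditional(0, 0, num_intervals=10))   # 0.1
```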

3.1.3. Correction of the Posterior Probability Formula


In this work, the logarithm summation (log-sum) method was used to prevent underflows caused by the multiplication of small probabilities. In P(X \mid C_i) = P(x_1, x_2, \ldots, x_n \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i), even if all of the factors in the multiplication product are not zero, if n is large, the final result of P(X \mid C_i) can be zero, or an underflow may occur and prevent the evaluation of P(X \mid C_i). In this scenario, it is infeasible to classify the test samples via stringent pairwise probability comparisons. It is, therefore, necessary to convert the probabilities through Equations (5) and (6).

P(X \mid C_i) P(C_i) = P(C_i) \prod_{k=1}^{n} P(x_k \mid C_i)    (5)

\log\left(P(X \mid C_i) P(C_i)\right) = \log\left(P(C_i) \prod_{k=1}^{n} P(x_k \mid C_i)\right) = \log P(C_i) + \sum_{k=1}^{n} \log P(x_k \mid C_i)    (6)

The product calculation is transformed into a log-sum to solve this problem. This solves
the underflow problem effectively, improves the accuracy of the calculation, and facilitates
stringent pairwise comparisons. To ensure accurate calculations in this work, M-estimate-
smoothed equations were used to calculate prior probability and conditional probability,
whereas the log-sum method was used to calculate posterior probability in all the experi-
ments described in this paper.
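Putting the pieces together, the sketch below shows the log-sum scoring used in place of the raw product; carrying the attribute weights from Equation (1) along is an assumption of this example, and the smoothed probabilities are taken as given inputs.

```python
import math

def log_posterior(log_prior, cond_probs, weights):
    """Equations (5)-(6), with weights: log P(C_i) + sum_j w_j * log P(x_j | C_i)."""
    return log_prior + sum(w * math.log(p) for w, p in zip(weights, cond_probs))

# Many tiny factors underflow as a product but remain comparable as a log-sum.
tiny = [1e-200, 1e-180, 1e-150]
weights = [0.8, 0.5, 0.9]
print(math.prod(p ** w for p, w in zip(tiny, weights)))     # 0.0 (underflow)
print(log_posterior(math.log(0.5), tiny, weights))          # finite, usable for comparison
```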
3.2. Improved Cultural Algorithm

In the proposed algorithm, the individuals (chromosomes) in the POP space are designed using real-number coding. The variables of the individuals are randomly initialized in the (0.0, 1.0) range of real numbers so that each chromosome consists of a set of real numbers. The dimensionality of the chromosome is equal to the dimensionality of the conditional attributes in the sample data. Moreover, each real number corresponds to a conditional attribute in the dataset. Suppose that the population size is N and that the attribute dimensionality of an individual in the population is n. Then, each individual in the population, Wi, may be expressed as an n-dimensional vector such that Wi = {w1, w2, ..., wj, ..., wn}. In this equation, wj is the weight of the j-th attribute of individual Wi, which is within (0.0, 1.0). The structure of each chromosome is shown in Figure 4.

Figure 4. Structure of a chromosome in the cultural algorithm.

Here, wi ∈ (0, 1), and n represents the dimensionality of the conditional attributes in the multilabel classification problem.
The structure of the POP space is shown in Figure 5.

Figure 5. Structure of the population space in the cultural algorithm.
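A minimal sketch of this encoding: each chromosome is a vector of n attribute weights drawn from the (0.0, 1.0) range, and the population holds N such chromosomes. The seed and sizes below are illustrative.

```python
import random

def init_population(pop_size, n_attributes, seed=0):
    """Population space (Figure 5): pop_size chromosomes, each an n-dimensional
    vector of attribute weights from the (0.0, 1.0) range (Figure 4)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_attributes)] for _ in range(pop_size)]

pop = init_population(pop_size=100, n_attributes=103)   # e.g., 103 attributes, as in the yeast dataset
print(len(pop), len(pop[0]))                             # 100 103
```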


3.2.1. Definition and Update Rules of the Belief Space

The BLF space in our algorithm uses the <S, N> structure. Here, S is situational knowledge (SK), which is mainly used to record the exemplar individuals in the evolutionary process. The structure of SK may be expressed as SK = {S1, S2, ..., SS}. In this equation, S represents the capacity of SK, and the structure of each individual in the SK set has the expression Si = {xi | f(Si)}. In this equation, Si is the i-th exemplar individual in the SK set, and f(xi) is the fitness of individual xi in the population. The structure of SK is shown in Figure 6, and the update rules for SK are shown in Equation (7).

s^{t+1} = \begin{cases} x_{best}^{t} & f(x_{best}^{t}) > f(s^{t}) \\ s^{t} & \text{otherwise} \end{cases}    (7)

Figure 6. Structure of situational knowledge.

N is normative knowledge (NK). In the BLF space, it is the information that is effectively carried by the range of values of the variable. When a CA is used to optimize a problem of dimensionality n, the expression of NK is NK = {N1, N2, ..., Nn}. In this equation, Ni = {(li, ui), (Li, Ui)}. Here, i ≤ n, and li and ui are the lower and upper limits of the i-th dimensional variable, which are initialized as zero and one, respectively. Li and Ui are the individual fitness values that correspond to the li lower limit and ui upper limit, respectively, of variable xi. Li and Ui are initialized as positively infinite values. The structure of NK is shown in Figure 7, and the update rules for NK are shown in Equation (8).

l_i^{t+1} = \begin{cases} x_{j,i} & x_{j,i} \le l_i^{t} \text{ or } f(x_j) < L_i^{t} \\ l_i^{t} & \text{otherwise} \end{cases}, \qquad L_i^{t+1} = \begin{cases} f(x_j) & x_{j,i} \le l_i^{t} \text{ or } f(x_j) < L_i^{t} \\ L_i^{t} & \text{otherwise} \end{cases}
u_i^{t+1} = \begin{cases} x_{k,i} & x_{k,i} \ge u_i^{t} \text{ or } f(x_k) > U_i^{t} \\ u_i^{t} & \text{otherwise} \end{cases}, \qquad U_i^{t+1} = \begin{cases} f(x_k) & x_{k,i} \ge u_i^{t} \text{ or } f(x_k) > U_i^{t} \\ U_i^{t} & \text{otherwise} \end{cases}    (8)

Figure 7. Structure of normative knowledge.
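A sketch of the two knowledge sources and their update rules (Equations (7) and (8)); plain dictionaries stand in for the SK and NK structures of Figures 6 and 7, and maximization of the fitness (classification accuracy) is assumed.

```python
def update_situational(sk, x_best, f_best):
    """Equation (7): keep the best individual seen so far as situational knowledge."""
    if sk["s"] is None or f_best > sk["f"]:
        sk["s"], sk["f"] = list(x_best), f_best
    return sk

def update_normative(nk, x, f_x):
    """Equation (8): adjust the per-dimension bounds (l_i, u_i) and record the
    fitness values (L_i, U_i) associated with them."""
    for i, x_i in enumerate(x):
        if x_i <= nk["l"][i] or f_x < nk["L"][i]:
            nk["l"][i], nk["L"][i] = x_i, f_x
        if x_i >= nk["u"][i] or f_x > nk["U"][i]:
            nk["u"][i], nk["U"][i] = x_i, f_x
    return nk

# Initialization as described in the text: l = 0, u = 1, L and U positively infinite.
n = 4
sk = {"s": None, "f": float("-inf")}
nk = {"l": [0.0] * n, "u": [1.0] * n, "L": [float("inf")] * n, "U": [float("inf")] * n}
sk = update_situational(sk, [0.2, 0.9, 0.4, 0.7], f_best=0.81)
nk = update_normative(nk, [0.2, 0.9, 0.4, 0.7], f_x=0.81)
```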
3.2.2. Fitness Function

In this work, the fitness function is defined as the accuracy of the class labels predicted by the algorithm. Therefore, the value of the fitness function of the t-th generation of the i-th individual, f(X_i^t), is equal to the classification accuracy that is obtained by substituting the weight of individual X_i^t into the weighted naïve Bayes posterior probability formula. Therefore, substituting the weight of the dimension corresponding to individual X_i^t into the weighted naïve Bayes posterior probability formula (Equations (1)–(3)) yields the theoretical class label. This theoretical class label, J_{i,k}^d, is then compared to the real class label, J_{i,k}. A score of one is assigned if they are equal, and zero otherwise. If there are n test instances and m class label dimensions, the equation for calculating the fitness of an individual is shown in Equation (9).

f(X_i^t) = \frac{\sum_{i=1}^{n} \sum_{k=1}^{m} \mathbb{1}\left(J_{i,k}^{d} = J_{i,k}\right)}{n \times m}    (9)
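A sketch of Equation (9): the fitness of one weight vector is the fraction of (instance, label) pairs whose predicted value matches the true value over n instances and m label dimensions. The predict callback is assumed to wrap the weighted naïve Bayes scoring shown earlier.

```python
def fitness(weights, instances, true_labels, predict):
    """Equation (9): share of correctly predicted label entries over n instances
    and m label dimensions, for one weight vector (individual)."""
    n, m = len(instances), len(true_labels[0])
    correct = sum(
        1
        for x, y in zip(instances, true_labels)
        for k in range(m)
        if predict(x, k, weights) == y[k]      # indicator: J^d_{i,k} equals J_{i,k}
    )
    return correct / (n * m)

# Illustrative call with a dummy predictor that always answers 0.
acc = fitness([0.5, 0.5], [[0.1, 0.2], [0.3, 0.4]], [[0, 1], [0, 0]], lambda x, k, w: 0)
print(acc)   # 0.75
```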

3.2.3. Influence Function


The influence function (Equation (10)) is the channel by which the BLF space guides the evolution of individuals in the POP space. That is, the influence function enables the various knowledge categories of the BLF space to influence the evolution of individuals in the POP space. To adapt the CA to the multilabel classification problem, NK was used to adjust the step-length of the individuals’ evolution. Meanwhile, SK was used to direct the evolution of the individuals.

x_{j,i}^{t+1} = \begin{cases} x_{j,i}^{t} + \left| size(I_i) \cdot N(0,1) \right| & x_{j,i}^{t} < s_i^{t} \\ x_{j,i}^{t} - \left| size(I_i) \cdot N(0,1) \right| & x_{j,i}^{t} > s_i^{t} \\ x_{j,i}^{t} + size(I_i) \cdot N(0,1) & x_{j,i}^{t} = s_i^{t} \end{cases}    (10)

3.2.4. Selection Function


In the CA-based weighted naïve Bayes multilabel classification (CA-WNB) algorithm,
the greedy search strategy is used to determine whether a newly generated test individual,
vj,i (t + 1), will replace the parent individual in the t-th generation, xj,i (t), to form a new
individual in the t + 1-st generation, xj,i (t + 1). The algorithm compares the fitness value of
vj,i (t + 1), f (vj,i (t + 1)) to that of xj,i (t), f (xj,i (t)). vj,i (t + 1) will be selected for the next genera-
tion only if f (vj,i (t + 1)) is strictly superior to f (xj,i (t)). Otherwise, xj,i (t) will be retained in
the t + 1-th generation. This approach systematically selects superior individuals for reten-
tion in the next generation. The implementation of this approach can be mathematically
expressed as Equation (11).

x_{j,i}(t+1) = \begin{cases} v_{j,i}(t+1) & \text{if } f(v_{j,i}(t+1)) > f(x_{j,i}(t)) \\ x_{j,i}(t) & \text{otherwise} \end{cases}    (11)

The process of our improved CA is shown in Figure 8.

Figure 8. Flowchart of the improved cultural algorithm.
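The sketch below combines one generation of the improved CA: the NK/SK-driven influence function of Equation (10) followed by the greedy selection of Equation (11). Taking size(I_i) as the interval width u_i − l_i is an assumption consistent with the notation above, not a detail stated in the text.

```python
import random

def influence(x, sk_best, nk, rng=random):
    """Equation (10): step each dimension toward the situational-knowledge exemplar,
    with a step length scaled by the normative interval size |I_i| = u_i - l_i."""
    child = []
    for i, x_i in enumerate(x):
        step = (nk["u"][i] - nk["l"][i]) * rng.gauss(0.0, 1.0)
        if x_i < sk_best[i]:
            x_new = x_i + abs(step)
        elif x_i > sk_best[i]:
            x_new = x_i - abs(step)
        else:
            x_new = x_i + step
        child.append(min(1.0, max(0.0, x_new)))   # keep the weight in [0, 1]
    return child

def select(parent, child, fitness):
    """Equation (11): greedy selection - keep the child only if it is strictly better."""
    return child if fitness(child) > fitness(parent) else parent

def next_generation(pop, sk_best, nk, fitness):
    """Produce generation t+1 from generation t (influence + greedy selection)."""
    return [select(x, influence(x, sk_best, nk), fitness) for x in pop]

# Toy usage with a stand-in fitness function.
pop = [[random.random() for _ in range(4)] for _ in range(10)]
nk = {"l": [0.0] * 4, "u": [1.0] * 4}
pop = next_generation(pop, sk_best=[0.7, 0.2, 0.5, 0.9], nk=nk, fitness=lambda w: -sum(w))
```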
3.3. CA-Based Evolutionary Multilabel Classification Algorithm
In the CA-WNB algorithm, the main purpose of the CA is to determine the weight of the attributes for label selection, as the weight-searching process is effectively a label-learning process. Once the optimal weights have been determined, the attribute weights may be used to classify the test set’s instances. The architecture of the CA-WNB algorithm is shown in Figure 9. The procedure of the CA-WNB algorithm is described below. It provides a detailed explanation of the algorithm’s architecture. The training of the CA-WNB algorithm is performed according to the following procedure:

Figure 9. Architecture of the cultural algorithm-based weighted naïve Bayes multilabel classification (CA-WNB) algorithm.

Step 1: The data is preprocessed using stratified sampling. In this step, 70% of the sample dataset is randomly imported into the training set. The other 30% of the dataset is imported into the test set. The prior and posterior probabilities of the sample data in the training set (with M-estimate smoothing applied) are then calculated.
Step 2: Initialize the POP space. The individuals in the POP are randomly initialized, with each individual corresponding to a set of feature weights. The size of the population is NP.
Step 3: Evaluate the POP. Let wi = xi. In addition, normalize wi (the sum of all the attribute weights should be equal to one so that the sum of all the variables in the chromosome is one). The weights of each individual are substituted into the weighted naïve Bayes posterior probability formula (which uses the log-sum method) to predict the class labels of the sample data in the training set. The resulting classification accuracies are then considered the fitness values of the individuals. The evaluation of the POP space is thus completed, and the best individual in the POP is stored.
Step 4: Initialize the BLF. NK and SK are obtained by selecting the range of the BLF according to the settings of the accept function and the individuals in this range.
Step 5: Update the POP. Based on the features of NK and SK of the BLF, new individuals are generated in the POP according to the influence rules of the influence function. In the selection function, the exemplar individuals are selected from the parents and children according to the greedy selection rules, thus forming the next generation of the POP.
Step 6: Update the BLF. If the new individuals are superior to the individuals of the BLF, the BLF is updated. Otherwise, Step 5 is repeated until the algorithm attains the maximum number of iterations or until the results have converged. The algorithm is then terminated.

CA optimization is used to obtain the optimal combination of weights based on the training set. The weighted naïve Bayes posterior probability formula is then used to predict the class labels of the unlabeled test set instances. The predictions are then scored: a point is scored if the prediction is equal to the theoretical value; no point is scored otherwise. This ultimately yields the average classification accuracy of the test set’s instances.
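As a rough end-to-end sketch of Steps 1–6: stratified_split below is a simplified stand-in (it stratifies on the first label only), and train_weights, predict, and accuracy are assumed callbacks corresponding to the CA optimizer, the weighted naïve Bayes scorer, and the accuracy measure sketched in the other sections.

```python
import random

def stratified_split(instances, labels, train_frac=0.7, seed=0):
    """Step 1: stratified 70/30 split (stratified here by the first label for brevity)."""
    rng = random.Random(seed)
    by_group = {}
    for x, y in zip(instances, labels):
        by_group.setdefault(y[0], []).append((x, y))
    train, test = [], []
    for group in by_group.values():
        rng.shuffle(group)
        cut = int(train_frac * len(group))
        train += group[:cut]
        test += group[cut:]
    return train, test

def ca_wnb(instances, labels, train_weights, predict, accuracy):
    """Steps 1-6 in outline: split, learn attribute weights with the CA on the
    training set, then score the held-out test set with the weighted classifier."""
    train, test = stratified_split(instances, labels)
    weights = train_weights(train)                    # Steps 2-6: CA optimization
    predictions = [predict(x, weights) for x, _ in test]
    return accuracy(predictions, [y for _, y in test])

# Toy usage with stand-ins for the CA trainer and the weighted naïve Bayes predictor.
xs = [[random.random() for _ in range(4)] for _ in range(20)]
ys = [[random.randint(0, 1), random.randint(0, 1)] for _ in range(20)]
acc = ca_wnb(
    xs, ys,
    train_weights=lambda train: [0.5] * 4,            # stand-in for the CA
    predict=lambda x, w: [0, 0],                      # stand-in for the WNBMLC
    accuracy=lambda preds, truth: sum(p == t for pr, tr in zip(preds, truth)
                                      for p, t in zip(pr, tr)) / (len(truth) * 2),
)
```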

4. Experimental Results and Analysis


4.1. Experimental Datasets
In most cases, the efficacy of multilabel classification is strongly correlated with the
characteristics of the dataset. A few of these datasets may have gaps, noise, or nonuniform
distributions. The dataset’s attributes may also be strongly correlated. Moreover, the data
may be discrete, continuous, or a mix of both. The datasets used in this work have been
normalized. That is, the attribute data have been scaled so that their values fall within a
small, specified value interval, which is generally 0.0–1.0.
Single instance multilabel datasets were selected for this experiment. In datasets of
this type, a task can be represented by an instance. However, this instance belongs to
multiple class labels simultaneously. Given a multilabel dataset D and class label set C
= {C1 , C2 , C3 , C4 }, if there is a tuple, Xi , whose class labels are Yi = {C1 , C3 }, the sample
instance is represented as {Xi |1, 0, 1, 0}. That is, all the sample instances in the dataset must
include all the labels in the label set. If an instance belongs to the class label Ci , the value of
Ci is one; if an instance does not belong to Ci , the value of Ci is zero.
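For instance, this encoding can be produced as follows (a trivial sketch; the label names are illustrative).

```python
def encode_labels(instance_labels, label_set):
    """Represent an instance's label subset as a 0/1 vector over the full label set."""
    return [1 if c in instance_labels else 0 for c in label_set]

print(encode_labels({"C1", "C3"}, ["C1", "C2", "C3", "C4"]))   # [1, 0, 1, 0]
```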
As the proposed multilabel classification algorithm is mainly aimed at text data,
to ensure that the results of the validation experiments are universally comparable, the sim-
ulation experiments were performed using four widely accepted and preprocessed multi-
label datasets that were obtained from the following multilabel datasets website: http:
//mulan.sourceforge.net/datasets.html, accessed on 16 February 2021. The CAL500 and
emotions datasets concern music and emotion, the yeast dataset is from bioinformatics,
and the scene dataset comprises natural scenes; the features of these datasets are described
in Table 2.

Table 2. The datasets used in this experiment.

Data Set    Definition Domain    No. of Samples (Training Set / Test Set)    No. of Attributes (Numerical Type / Noun Type)    No. of Class Labels
yeast       Biology              1500 / 917                                   103 / 0                                           14
scene       Image                1211 / 1196                                  294 / 0                                           6
emotions    Music                391 / 202                                    72 / 0                                            6
CAL500      Music                351 / 151                                    68 / 0                                            174

4.2. Classification Evaluation Criteria


Multilabel data are generally mined to achieve two objectives: to classify multilabel
data and to sort large numbers of labels. In this work, we focus only on multilabel
classification. Therefore, the evaluation criteria used in this work are based on evaluation
criteria for classification methods. According to the characteristics of the experimental
data, if we suppose that there is a multilabel dataset D, a class label set C = {C1 , C2 , C3 , C4 },
and a Xi tuple whose class labels are Yi = {C1 , C3 }, the representation of the sample
instance is then {Xi | 1, 0, 1, 0}. In a multilabel classification problem, the label set, Zi ,
that was predicted by the multilabel classifier for Xi may differ from the actual label
set, Yi . Suppose that the class label set that was obtained by the algorithm is {1, 0, 0, 0}.
As this set partially matches the real set, {1, 0, 1, 0}, the prediction accuracy of this instance
is Accuracyi = (3/4) × 100% = 75%. That is, three of the four class labels were predicted
correctly. Each correct prediction results in 1 point while a wrong prediction results in 0
points; prediction accuracy is given by the number of points divided by the dimensionality
of the class label set. It is thus shown that the prediction accuracy of each test instance
ranges within [0, 1]. If F(Cik ) and Cik represent the theoretical (i.e., algorithm-predicted) and

real values, respectively, of the class label in the k-th dimension of the i-th sample instance
in the test set, then N is the total number of sample instances in the test set and m is the
dimensionality of the class label set. The equation for calculating the classification accuracy
of each to-be-classified sample instance in the test set is Equation (12).

Accuracy_i = \frac{1}{m} \sum_{k=1}^{m} T\left(F(C_i^k), C_i^k\right)    (12)

The T (F(Cik ), Cik ) function returns a value of one if the values of F(Cik ) and Cik are equal,
and zero otherwise, as per Equation (13).

T\left(F(C_i^k), C_i^k\right) = \begin{cases} 1 & F(C_i^k) = C_i^k \\ 0 & F(C_i^k) \neq C_i^k \end{cases}    (13)

The average criterion of the algorithm is the average classification accuracy of all the
sample instances in the test set. It is calculated using Equation (14).

Accuracy = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{m} \sum_{k=1}^{m} T\left(F(C_i^k), C_i^k\right)    (14)
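A direct transcription of Equations (12)–(14) as a sketch; the inputs are assumed to be 0/1 label vectors of length m for each of the N test instances.

```python
def instance_accuracy(predicted, actual):
    """Equations (12)-(13): fraction of label dimensions predicted correctly for one instance."""
    return sum(1 for f, c in zip(predicted, actual) if f == c) / len(actual)

def average_accuracy(predicted_all, actual_all):
    """Equation (14): mean instance accuracy over the N test instances."""
    return sum(instance_accuracy(p, a) for p, a in zip(predicted_all, actual_all)) / len(actual_all)

print(instance_accuracy([1, 0, 0, 0], [1, 0, 1, 0]))                                   # 0.75, as in the worked example above
print(average_accuracy([[1, 0, 0, 0], [0, 1, 1, 0]], [[1, 0, 1, 0], [0, 1, 1, 0]]))    # 0.875
```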

4.3. Classification Prediction Methods


The CA produces a set of feature weights at the end of its iteration. The two classification
methods described in [31] are used to predict the classification accuracy of each individual
for the test sets. As these algorithms use topological rankings to predict an algorithm’s
classification accuracy, the population size (NP) of the CA must be at least 30 to ensure that
these algorithm prediction methods remain effective.

4.4. Analysis of the Results of the NBMLC Experiment


In the NBMLC experiment, stratified sampling was used to randomly import 70%
of the sample instances into the training set. The remaining 30% of the samples were
imported into the test set. The attributes of the experimental datasets are continuous,
and there are many methods by which an NBC can compute their conditional probabilities.
Therefore, three common fitting methods (Gaussian distribution, Cauchy distribution,
and data discretization) were used in the experimental validation. The calculated results
were then compared to analyze the strengths and weaknesses of each fitting method. In the
data discretization experiment, data fitting was performed by specifying the initial number
of discrete intervals, which is 10 in this experiment. Ten independent trials were then
performed by applying the NBMLC algorithm to each of the four experimental datasets.
The maximum (MAX), minimum (MIN), and average (AVE) values of the 10 trials were
then recorded. The experimental results are presented in Table 3.

Table 3. Comparison between the results of the NBMLC experiment.

Data Set    Gaussian (MAX / MIN / AVE)        Cauchy (MAX / MIN / AVE)          Disperse_10 (MAX / MIN / AVE)
CAL500      0.8732 / 0.8574 / 0.8622          0.8662 / 0.8588 / 0.8635          0.8838 / 0.8620 / 0.8740
emotions    0.6976 / 0.6798 / 0.6884          0.7022 / 0.6798 / 0.6892          0.8361 / 0.7968 / 0.8168
scene       0.8239 / 0.8195 / 0.8212          0.8241 / 0.8188 / 0.8151          0.8707 / 0.8552 / 0.8614
yeast       0.7749 / 0.7636 / 0.7688          0.7720 / 0.7603 / 0.7673          0.8023 / 0.7799 / 0.7958

The analysis of the results of the three fitting experiments is presented in Table 4. In this
table, Gau-Cau represents the difference between the average classification accuracies of the
Gaussian and Cauchy distribution experiments. Dis-Gau represents the difference between

the average classification accuracies of the data discretization experiment and the Gaussian
distribution experiment. Dis-Cau is the difference between the average classification
accuracies of the data discretization experiment and the Cauchy distribution experiment.
Table 4. Analysis of the results of the naïve Bayes multilabel classification (NBMLC) experiments.

Data Set Gau-Cau Dis-Gau Dis-Cau


CAL500 −0.0013 0.0118 0.0105
emotions −0.0008 0.1284 0.1276
scene 0.0061 0.0402 0.0463
yeast 0.0015 0.027 0.0285

All three fitting methods exhibit a time complexity of magnitude O(N × n × m). Here,
N is the size of the dataset, n the dimensionality of the attributes, and m the dimensionality
of the class labels. Figure 10 shows the total computation times of each distribution method
in their 10 trial runs, which were performed using identical computational hardware.
The horizontal axis indicates the type of fitting method, whereas the vertical axis indicates
the computation time consumed by each method.

Figure 10. Comparison between the computational times of each of the three fitting methods.

Table44compares
Table comparesthe theclassification
classificationaccuracies
accuraciesofofthe theNBMLCs
NBMLCswith withthree
three distribution methods. It is demonstrated that the classification accuracy of an NBMLC is the
highest when data discretization is used to fit the conditional probabilities. Furthermore, it is
demonstrated that the data discretization approach yields higher classification efficacy in highly
concentrated datasets. The use of Gaussian and Cauchy distributions to fit the conditional
probabilities of the dataset resulted in significantly poorer results than those of the discretization
approach. Furthermore, the classification accuracies obtained with the Gaussian and Cauchy
distributions are similar. Further analysis revealed that the effects of the different distribution
methods on classification accuracy are significantly more pronounced in the “emotions” dataset than
in the CAL500, “scene”, or “yeast” datasets. In the “emotions” dataset, the classification accuracy
of the discretization approach is nearly 13% higher than that of the other approaches. In the “scene”,
“yeast”, and CAL500 datasets, the discretization approach outperformed the other approaches by 4%,
3%, and 1%, respectively. An analysis of the characteristics of the datasets revealed that the class
label dimensionality of the “emotions” dataset is smaller than that of the other datasets. It is
followed by the “scene” dataset and the “yeast” dataset; the CAL500 dataset has the highest number
of class label dimensions. Therefore, it may be concluded that the classification accuracies of these
fitting methods become more similar as the number of class label dimensions increases. Although the
algorithmic time complexities of these fitting methods are on an identical level of magnitude, the
attribute values of the test data must be divided into intervals in the discretization approach.
This requirement resulted in higher computation times than those of the Gaussian and Cauchy
distribution approaches.
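
The difference between the three fitting approaches can be made concrete with a short sketch. The following Python fragment is only an illustration of the general idea, not the implementation used in our experiments; the function names, the equal-width binning, the Laplace smoothing, and the Cauchy scale estimate (half of the interquartile range) are illustrative assumptions.

```python
# Minimal sketch of three ways an NBMLC can estimate P(x | label) for a
# continuous attribute: equal-width discretization, a Gaussian fit, and a
# Cauchy fit.  Not the paper's implementation; all choices below are assumed.
import numpy as np

def discretized_probability(train_values, x, num=10):
    """Bin the training values into `num` equal-width intervals and return
    the Laplace-smoothed relative frequency of the interval containing x."""
    lo, hi = train_values.min(), train_values.max()
    edges = np.linspace(lo, hi, num + 1)
    counts, _ = np.histogram(train_values, bins=edges)
    # clip keeps test values outside the training range in an end interval
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, num - 1)
    return (counts[idx] + 1) / (len(train_values) + num)

def gaussian_probability(train_values, x):
    """Fit a Gaussian to the training values and evaluate its density at x."""
    mu, sigma = train_values.mean(), train_values.std() + 1e-9
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def cauchy_probability(train_values, x):
    """Fit a Cauchy distribution (location = median, scale = half the
    interquartile range) and evaluate its density at x."""
    x0 = np.median(train_values)
    gamma = (np.percentile(train_values, 75) - np.percentile(train_values, 25)) / 2 + 1e-9
    return 1.0 / (np.pi * gamma * (1 + ((x - x0) / gamma) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    values = rng.normal(loc=0.4, scale=0.1, size=500)  # toy attribute values for one label
    for f in (discretized_probability, gaussian_probability, cauchy_probability):
        print(f.__name__, float(f(values, 0.42)))
```

In the discretization branch, every test value must first be mapped to an interval, which is the extra step responsible for the higher computation times noted above.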

4.5. Analysis of Results of the CA-WNB Experiment


In the CA-WNB algorithm, there are three parameters that are relevant for the CA:
the maximum number of iterations, population size, and the initial acceptance ratio of the
accept function. The configuration of these parameters is presented in Table 5.
Table 5. Configuration of CA parameters.

Parameter    Population Size    Maximum Number of Iterations    Initial Acceptance Ratio
Value        100                200                             0.2

In the CA-WNB algorithm, the CA is used to optimize the attribute weights of the
WNBMLC and to validate the weights of the three methods for conditional probability
fitting. The results obtained by the CA-WNB algorithm were compared with those of the
ordinary NBMLC algorithm, based on the abovementioned design rules and experimental
evaluation criteria for classification functions. As the Gaussian and Cauchy approaches
attempt to model the continuous attributes of each dataset by fitting their probability
curves, and the NBMLC results associated with these approaches are similar, we used
Prediction Methods 1 and 2 to compare the experimental results corresponding to the
CA-WNB and NBMLC algorithms with Gaussian and Cauchy fitting. These comparisons
were also conducted between the CA-WNB and NBMLC algorithms with the discretization
approach, with varying numbers of discretization intervals (num = 10 and num = 20).
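
The overall structure of the weight search can be sketched as follows. This is a compressed, illustrative Python fragment rather than the actual CA-WNB implementation: the surrogate fitness function stands in for the classification accuracy of the weighted NBMLC under a candidate weight vector, the belief space is reduced to normative intervals, and the variable names and the attribute dimensionality (72, e.g., as in the “emotions” dataset) are assumptions; only the parameter values of Table 5 are taken from the experiments.

```python
# Compressed sketch of a CA-driven attribute-weight search (not the CA-WNB code).
import numpy as np

rng = np.random.default_rng(1)
DIM = 72                                   # assumed number of attributes
NP, MAXGEN, ACCEPT_RATIO = 100, 200, 0.2   # Table 5 settings

def fitness(weights):
    # Placeholder for: train/evaluate the weighted naive Bayes multilabel
    # classifier with `weights` and return its classification accuracy.
    return -np.mean((weights - 0.6) ** 2)

pop = rng.uniform(0.0, 1.0, size=(NP, DIM))          # real-coded attribute weights
fit = np.array([fitness(ind) for ind in pop])
belief_lo, belief_hi = np.zeros(DIM), np.ones(DIM)   # normative knowledge (search intervals)

for gen in range(MAXGEN):
    # accept(): the best ACCEPT_RATIO fraction of the population updates the belief space
    elite = pop[np.argsort(fit)[::-1][: int(ACCEPT_RATIO * NP)]]
    belief_lo, belief_hi = elite.min(axis=0), elite.max(axis=0)

    # influence(): mutation step sizes are scaled by the belief-space interval widths
    parents = pop[rng.integers(0, NP, size=NP)]
    step = 0.1 * (belief_hi - belief_lo + 1e-9)
    children = np.clip(parents + rng.normal(0.0, step, size=(NP, DIM)), 0.0, 1.0)

    child_fit = np.array([fitness(ind) for ind in children])
    better = child_fit > fit                          # greedy replacement
    pop[better], fit[better] = children[better], child_fit[better]

best_weights = pop[np.argmax(fit)]
print("best surrogate fitness:", fit.max())
```

With the Table 5 settings, the weighted classifier is evaluated roughly NP × MAXGEN = 100 × 200 = 20,000 times per run, which is the source of the computation-time gap discussed later in this section.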

Gaussian and Cauchy Distribution


Table 6 [62] shows the experimental results of the best individuals produced by the
CA-WNB and NBMLC algorithms with Gaussian and Cauchy distribution, according to
Prediction Methods 1 and 2. The CA-WNB and NBMLC algorithms were applied to four
experimental datasets, and 10 trials were performed for each dataset. The MAX, MIN,
and AVE of these trials were then recorded. In Table 6, CA-WNB-P1 and CA-WNB-P2
represent the classification accuracies predicted by Prediction Methods 1 and 2, respectively,
for the individuals produced by the CA-WNB algorithm.
Table 6. Experimental results of NBMLC and CA-WNB for best value.

                           Gaussian                         Cauchy
Data Set     Algorithm     MAX      MIN      AVE            MAX      MIN      AVE
CAL_500      NBMLC         0.8732   0.8574   0.8622         0.8721   0.8554   0.8635
             CA-WNB-P1     0.8893   0.8750   0.8813         0.8871   0.8737   0.8800
             CA-WNB-P2     0.8897   0.8751   0.8825         0.8890   0.8744   0.8811
emotions     NBMLC         0.6976   0.6798   0.6884         0.6976   0.6787   0.6892
             CA-WNB-P1     0.8059   0.7853   0.7938         0.8215   0.7850   0.8040
             CA-WNB-P2     0.8115   0.7900   0.7993         0.8215   0.7869   0.8044
scene        NBMLC         0.8239   0.8195   0.8212         0.8195   0.8098   0.8151
             CA-WNB-P1     0.8693   0.8564   0.8630         0.8841   0.8652   0.8732
             CA-WNB-P2     0.8714   0.8592   0.8654         0.8848   0.8615   0.8744
yeast        NBMLC         0.7749   0.7636   0.7688         0.7739   0.7688   0.7673
             CA-WNB-P1     0.8045   0.7787   0.7901         0.8129   0.7873   0.7948
             CA-WNB-P2     0.8051   0.7851   0.7933         0.8126   0.7831   0.7952

Tables 7–9 [31] compare the experimental results of the top-10, top-20, and top-30
topological rankings of Prediction Methods 1 and 2 between the individuals produced by
the CA-WNB and NBMLC algorithms with Gaussian and Cauchy distribution.

Table 7. Experimental results of NBMLC and CA-WNB for top-10 value.

                           Gaussian                         Cauchy
Data Set     Algorithm     MAX      MIN      AVE            MAX      MIN      AVE
CAL_500      NBMLC         0.8732   0.8574   0.8622         0.8721   0.8554   0.8635
             CA-WNB-P1     0.8885   0.8697   0.8821         0.8884   0.8727   0.8801
             CA-WNB-P2     0.8897   0.8716   0.8829         0.8890   0.8744   0.8811
emotions     NBMLC         0.6976   0.6798   0.6884         0.6976   0.6787   0.6892
             CA-WNB-P1     0.8012   0.7799   0.7939         0.8224   0.7822   0.8039
             CA-WNB-P2     0.8143   0.7900   0.8011         0.8271   0.7869   0.8070
scene        NBMLC         0.8239   0.8195   0.8212         0.8195   0.8098   0.8151
             CA-WNB-P1     0.8698   0.8592   0.8647         0.8836   0.8638   0.8725
             CA-WNB-P2     0.8714   0.8592   0.8656         0.8848   0.8626   0.8746
yeast        NBMLC         0.7749   0.7636   0.7688         0.7739   0.7688   0.7673
             CA-WNB-P1     0.8040   0.7898   0.7953         0.8115   0.7875   0.7943
             CA-WNB-P2     0.8051   0.7851   0.7938         0.8126   0.7831   0.7941

Table 8. Experimental results of NBMLC and CA-WNB for top-20 value.

                           Gaussian                         Cauchy
Data Set     Algorithm     MAX      MIN      AVE            MAX      MIN      AVE
CAL_500      NBMLC         0.8732   0.8574   0.8622         0.8721   0.8554   0.8635
             CA-WNB-P1     0.8884   0.8690   0.8823         0.8884   0.8740   0.8801
             CA-WNB-P2     0.8903   0.8717   0.8832         0.8890   0.8744   0.8812
emotions     NBMLC         0.6976   0.6798   0.6884         0.6976   0.6787   0.6892
             CA-WNB-P1     0.8021   0.7843   0.7939         0.8231   0.7812   0.8071
             CA-WNB-P2     0.8143   0.7900   0.8013         0.8271   0.7869   0.8088
scene        NBMLC         0.8239   0.8195   0.8212         0.8195   0.8098   0.8151
             CA-WNB-P1     0.8714   0.8573   0.8653         0.8839   0.8620   0.8724
             CA-WNB-P2     0.8714   0.8592   0.8658         0.8848   0.8626   0.8750
yeast        NBMLC         0.7749   0.7636   0.7688         0.7739   0.7688   0.7673
             CA-WNB-P1     0.8060   0.7868   0.7955         0.8109   0.7824   0.7927
             CA-WNB-P2     0.8091   0.7913   0.7972         0.8139   0.7831   0.7943

Tables 10 and 11 present the average classification accuracy of the weighting combina-
tions corresponding to the final-generation individuals whose fitness values ranked in the
top-10, top-20, and top-30, as yielded by Prediction Methods 1 and 2, in the classification
of the four experimental datasets; Gaussian and Cauchy distributions were used to model
conditional probability. These tables also show the percentage by which the CA-WNB
algorithm improves upon the classification accuracies of the NBMLC algorithm, according
to Prediction Methods 1 and 2. The bolded entries in these tables indicate the classification
accuracy obtained by the best individual. The bottom rows of these tables present the
average classification accuracy of the three algorithms. CA-WNB-P1 and CA-WNB-P2 are
the average classification accuracies obtained using the CA-WNB algorithm, according to
Prediction Methods 1 and 2.
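
For reference, the short Python fragment below shows one way the per-dataset entries of Tables 10 and 11 can be derived from a single run: sort the final-generation individuals by fitness, average the classification accuracies of the top-k weight combinations, and report the relative gain over plain NBMLC. This is our reconstruction of the bookkeeping, not the original evaluation code, and the toy accuracy values are illustrative.

```python
# Sketch of the top-k averaging and improvement-percentage bookkeeping (assumed).
import numpy as np

def topk_summary(accuracies, nbmlc_accuracy, ks=(10, 20, 30)):
    ranked = np.sort(np.asarray(accuracies))[::-1]        # best individual first
    rows = {}
    for k in ks:
        avg = ranked[:k].mean()
        rows[f"Top {k}"] = (avg, 100.0 * (avg - nbmlc_accuracy) / nbmlc_accuracy)
    return rows

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    final_gen = 0.80 + 0.01 * rng.random(100)             # toy accuracies of 100 individuals
    for name, (avg, gain) in topk_summary(final_gen, nbmlc_accuracy=0.7688).items():
        print(f"{name}: average accuracy {avg:.4f}, improvement {gain:.2f}%")
```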

Table 9. Experimental results of NBMLC and CA-WNB for top-30 value.

                           Gaussian                         Cauchy
Data Set     Algorithm     MAX      MIN      AVE            MAX      MIN      AVE
CAL_500      NBMLC         0.8732   0.8574   0.8622         0.8721   0.8554   0.8635
             CA-WNB-P1     0.8888   0.8695   0.8822         0.8890   0.8722   0.8793
             CA-WNB-P2     0.8893   0.8716   0.8829         0.8890   0.8722   0.8800
emotions     NBMLC         0.6976   0.6798   0.6884         0.6976   0.6787   0.6892
             CA-WNB-P1     0.8031   0.7797   0.7937         0.8187   0.7812   0.7984
             CA-WNB-P2     0.8059   0.7853   0.7938         0.8215   0.7812   0.8040
scene        NBMLC         0.8239   0.8195   0.8212         0.8195   0.8098   0.8151
             CA-WNB-P1     0.8702   0.8580   0.8654         0.8788   0.8594   0.8701
             CA-WNB-P2     0.8693   0.8564   0.8660         0.8848   0.8626   0.8735
yeast        NBMLC         0.7749   0.7636   0.7688         0.7739   0.7688   0.7673
             CA-WNB-P1     0.8104   0.7870   0.7951         0.8087   0.7814   0.7904
             CA-WNB-P2     0.8045   0.7850   0.7921         0.8126   0.7792   0.7932

Table 10. Experiment results of NBMLC and CA-WNB with Gaussian distribution.

                          Average Classification Accuracy          Improved Percentage
Data Set     Algorithm    NBMLC     CA-WNB-P1   CA-WNB-P2          CA-WNB-P1   CA-WNB-P2
CAL500       best         0.8622    0.8813      0.8825             2.22%       2.35%
             Top 10                 0.8821      0.8829             2.31%       2.40%
             Top 20                 0.8823      0.8832             2.33%       2.43%
             Top 30                 0.8822      0.8829             2.32%       2.41%
emotions     best         0.6884    0.7938      0.7993             15.32%      16.11%
             Top 10                 0.7939      0.8011             15.33%      16.38%
             Top 20                 0.7939      0.8013             15.33%      16.41%
             Top 30                 0.7937      0.7938             15.31%      15.32%
scene        best         0.8212    0.8647      0.8656             5.30%       5.41%
             Top 10                 0.8653      0.8658             5.37%       5.43%
             Top 20                 0.8654      0.8660             5.38%       5.46%
             Top 30                 0.8630      0.8654             5.09%       5.39%
yeast        best         0.7688    0.7901      0.7933             2.77%       3.19%
             Top 10                 0.7953      0.7938             3.44%       3.24%
             Top 20                 0.7955      0.7972             3.47%       3.69%
             Top 30                 0.7951      0.7921             3.42%       3.03%
Mean                      0.7851    0.8336      0.8354             6.54%       6.79%

Table 11. Experiment results of NBMLC and CA-WNB with Cauchy distribution.

                          Average Classification Accuracy          Improved Percentage
Data Set     Algorithm    NBMLC     CA-WNB-P1   CA-WNB-P2          CA-WNB-P1   CA-WNB-P2
CAL500       best         0.8635    0.8800      0.8811             1.91%       2.04%
             Top 10                 0.8801      0.8811             1.92%       2.04%
             Top 20                 0.8801      0.8812             1.92%       2.04%
             Top 30                 0.8793      0.8800             1.83%       1.91%
emotions     best         0.6892    0.8040      0.8044             16.65%      16.71%
             Top 10                 0.8039      0.8070             16.63%      17.08%
             Top 20                 0.8071      0.8088             17.09%      17.34%
             Top 30                 0.7984      0.8040             15.84%      16.65%
scene        best         0.8151    0.8732      0.8744             7.13%       7.28%
             Top 10                 0.8725      0.8746             7.04%       7.30%
             Top 20                 0.8724      0.8750             7.04%       7.35%
             Top 30                 0.8701      0.8735             6.75%       7.17%
yeast        best         0.7673    0.7948      0.7952             3.59%       3.64%
             Top 10                 0.7943      0.7941             3.53%       3.49%
             Top 20                 0.7927      0.7943             3.32%       3.52%
             Top 30                 0.7904      0.7932             3.01%       3.38%
Mean                      0.7838    0.8371      0.8389             7.20%       7.43%

It is apparent that the average accuracy obtained by the CA-WNB algorithm is superior
to that of the NBMLC algorithm. However, this is obtained at the expense of computation
time. The CA-WNB algorithm iteratively optimizes attribute weights prior to the prediction
of class labels so as to weaken the effects of the naïve conditional independence assumption.
Even if one overlooks the effects of having multiple prediction methods and different
training and test dataset sizes on the computation time, the time complexity of the CA-WNB
algorithm is still NP × MAXGEN times higher than that of the NBMLC algorithm (NP is the
population size, and MAXGEN is the maximum number of evolutions). The running times
of the CA-WNB and NBMLC algorithms (under the same conditions and environment as in
the abovementioned experiments) are shown in Figure 11. NBMLC-Gau and NBMLC-Cau
are the running times of the NBMLC algorithm with Gaussian and Cauchy distribution,
respectively. Meanwhile, CA-WNB-Gau and CA-WNB-Cau are the running times of the
CA-WNB algorithm with Gaussian and Cauchy distribution, respectively.

Figure 11. Computation times of NBMLC and CA-WNB algorithms.

The experimental results demonstrate the following:


(1) Based on Figure 11, the CA-WNB algorithm consumes a significantly longer compu-
tation time than the NBMLC algorithm.
(2) If one omits computational cost, the CA-WNB algorithm is evidently superior to the
simple NBMLC algorithm in terms of classification performance. The percentage
improvement in classification accuracy with the CA-WNB algorithm was most pro-
nounced in the “emotions” dataset, followed by the “scene”, “yeast”, and CAL500
datasets. An analysis of the characteristics of these datasets revealed that the use of
stratified sampling to disrupt the “emotions” and “scene” datasets did not signifi-
cantly affect their classification accuracies. Meanwhile, the effects of this operation on
the CAL500 and “yeast” datasets are more pronounced. This illustrates that the “emo-
tions” and “scene” datasets have relatively uniform distributions of data, whereas
the CAL500 and “yeast” datasets have highly nonuniform data distributions. If the
training dataset is strongly representative of the dataset, the classification efficacy
of the algorithm will be substantially improved by the training process. Otherwise,
the improvement in classification efficacy will not be very significant. Based on this
observation, weighted naïve Bayes can be used to optimize classification efficacy if
the distribution of the dataset is relatively uniform and if the time requirements of
the experiment are not particularly stringent.
(3) A comparison of Tables 10 and 11 reveals that the improvement in the average
classification accuracy yielded by Prediction Method 2 (6.79% and 7.43%) is always
higher than that yielded by Prediction Method 1 (6.54% and 7.20%) regardless of
whether Gaussian or Cauchy fitting is used. Furthermore, the improvement in average
classification accuracy is always higher with the Cauchy fitting (7.20% and 7.43%)
than with the Gaussian fitting (6.54% and 6.79%). A comparison of the results with
Cauchy fitting and Gaussian fitting reveals that the results of the CA-WNB algorithm
with Cauchy fitting varied substantially. Moreover, the weights obtained by the
CA also exert a higher impact on the Cauchy fitting. Therefore, the results with
Cauchy fitting are unstable to a certain degree. Nonetheless, the average classification
accuracy of the CA-WNB algorithm with Cauchy fitting is superior to that of the
CA-WNB algorithm with Gaussian fitting.
(4) Tables 10 and 11 reveal that the highest average classification accuracies for the four
experimental datasets were obtained by individuals with Ranks 10–20, according to
Prediction Method 2. Conversely, the worst classification accuracies were obtained
by individuals with Ranks 20–30. It is established that the weights that best fit the
training set instances may not be the optimal weights for classifying test set instances.
After a certain number of evolutions in the population, overfitting may have occurred
because the fitting curves were adjusted too finely, resulting in less-than-ideal
classification accuracies for the test set’s instances. Consequently, the fitting curves
produced by the weights of individuals with Ranks 10–20 were better for classifying the
instances of the test set (a selection strategy of this kind is sketched below).
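
The sketch below illustrates one such selection strategy in Python; the held-out validation step and the choice of Ranks 10–20 as the candidate pool are illustrative assumptions rather than part of the original algorithm.

```python
# Sketch: instead of the single training-best weight vector, pick a candidate
# from the 10th-20th ranked individuals using a held-out validation score.
import numpy as np

def pick_robust_weights(individuals, train_fitness, validate):
    order = np.argsort(train_fitness)[::-1]      # rank by training fitness, best first
    candidates = order[9:20]                     # ranks 10-20 (0-based slice)
    val_scores = [validate(individuals[i]) for i in candidates]
    return individuals[candidates[int(np.argmax(val_scores))]]

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    pop = rng.uniform(0, 1, size=(100, 72))      # toy final-generation weight vectors
    train_fit = rng.random(100)                  # toy training accuracies
    validate = lambda w: -abs(w.mean() - 0.5)    # toy validation score
    print(pick_robust_weights(pop, train_fit, validate).shape)
```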

5. Conclusions
In this paper, we study the multilabel classification problem. This paper presents the
algorithm framework of naïve Bayes multilabel classification and analyzes and compares
the effects of three common fitting methods of continuous attributes on the classification
performance of the naïve Bayes multilabel classification algorithm from the perspective
of average classification accuracy and algorithm time cost. On this basis, the framework
of weighted naïve Bayes multilabel classification is given, the determination of weights is
regarded as an optimization problem, the cultural algorithm is introduced to search for
and determine the optimal weight, and the weighted naïve Bayes multilabel classification
algorithm based on the cultural algorithm is proposed. In this algorithm, the classification
accuracy obtained by substituting the current individual’s weights into the weighted naïve Bayes
multilabel classifier is taken as the objective function, and the attribute dimensionality
determines the individual dimensionality. Each one-dimensional variable in the individual
represents the weight of the corresponding attribute dimension; individuals are encoded as
real numbers, and the candidate prediction strategies were tested and verified to obtain the
one that is most suitable for the algorithm. Experimental results show that the algorithm proposed in this
paper is superior to similar algorithms in classification performance when the time cost is
not considered.

Author Contributions: Conceptualization, Q.W. and B.W.; Data curation, Q.W. and C.H.; Investi-
gation, C.H.; Methodology, Q.W.; Software, C.H.; Visualization, X.Y.; Writing—original draft, Q.W.;
Writing—review & editing, X.Y. All authors have read and agreed to the published version of
the manuscript.
Funding: This research was funded by the Natural Science Foundation of China (U1911205 and 62073300),
the Fundamental Research Funds for the Central Universities, China University of Geosciences
(Wuhan) (CUGGC03), and the Fundamental Research Funds for the Central Universities,
JLU (93K172020K18).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Acknowledgments: This paper is supported by the Natural Science Foundation of China (U1911205
and 62073300), China University of Geosciences (Wuhan) (CUGGC03), and the Fundamental Research
Funds for the Central Universities (JLU; 93K172020K18).
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Tsoumakas, G.; Katakis, I.; Vlahavas, I. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook; Springer:
Boston, MA, USA, 2010; pp. 667–685.
2. Streich, A.P.; Buhmann, J.M. Classification of multi-labeled data: A generative approach. In Machine Learning and Knowledge
Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2008; pp. 390–405.
3. Kazawa, H.; Izumitani, T.; Taira, H.; Maeda, E. Maximal margin labeling for multi-topic text categorization. In Advances in Neural
Information Processing Systems; MIT Press: Vancouver, BC, Canada, 2004; pp. 649–656.
4. Snoek, C.G.; Worring, M.; Van Gemert, J.C.; Geusebroek, J.M.; Smeulders, A.W. The challenge problem for automated detection
of 101 semantic concepts in multimedia. In Proceedings of the 14th annual ACM International Conference on Multimedia,
Santa Barbara, CA, USA, 23–27 October 2006; ACM: New York, NY, USA, 2006; pp. 421–430.
5. Vens, C.; Struyf, J.; Schietgat, L.; Džeroski, S.; Blockeel, H. Decision trees for hierarchical multi-label classification. Mach. Learn.
2008, 73, 185–214. [CrossRef]
6. Boutell, M.R.; Luo, J.; Shen, X.; Brown, C.M. Learning multi-label scene classification. Pattern Recognit. 2004, 37, 1757–1771.
[CrossRef]
7. Xia, Y.; Chen, K.; Yang, Y. Multi-Label Classification with Weighted Classifier Selection and Stacked Ensemble. Inf. Sci. 2020.
[CrossRef]
8. Qian, W.; Xiong, C.; Wang, Y. A ranking-based feature selection for multi-label classification with fuzzy relative discernibility.
Appl. Soft Comput. 2021, 102, 106995. [CrossRef]
9. Yao, Y.; Li, Y.; Ye, Y.; Li, X. MLCE: A Multi-Label Crotch Ensemble Method for Multi-Label Classification. Int. J. Pattern Recognit.
Artif. Intell. 2020. [CrossRef]
10. Yang, B.; Tong, K.; Zhao, X.; Pang, S.; Chen, J. Multilabel Classification Using Low-Rank Decomposition. Discret. Dyn. Nat. Soc.
2020, 2020, 1–8. [CrossRef]
11. Kumar, A.; Abhishek, K.; Kumar Singh, A.; Nerurkar, P.; Chandane, M.; Bhirud, S.; Busnel, Y. Multilabel classification of remote
sensed satellite imagery. Trans. Emerg. Telecommun. Technol. 2020, 4, 118–133. [CrossRef]
12. Huang, S.J.; Li, G.X.; Huang, W.Y.; Li, S.Y. Incremental Multi-Label Learning with Active Queries. J. Comput. Sci. Technol. 2020,
35, 234–246. [CrossRef]
13. Zhang, M.L.; Peña, J.M.; Robles, V. Feature selection for multi-label naive Bayes classification. Inf. Sci. 2009, 179, 3218–3229.
[CrossRef]
14. De Carvalho, A.C.; Freitas, A.A. A tutorial on multi-label classification techniques. Found. Comput. Intell. 2009, 5, 177–195.
15. Spyromitros, E.; Tsoumakas, G.; Vlahavas, I. An empirical study of lazy multilabel classification algorithms. In Artificial Intelligence:
Theories, Models and Applications; Springer: Berlin/Heidelberg, Germany, 2008; pp. 401–406.

16. Rousu, J.; Saunders, C.; Szedmak, S.; Shawe-Taylor, J. Kernel-based learning of hierarchical multilabel classification models.
J. Mach. Learn. Res. 2006, 7, 1601–1626.
17. Yang, Y.; Chute, C.G. An example-based mapping method for text categorization and retrieval. ACM Trans. Inf. Syst. (TOIS) 1994,
12, 252–277. [CrossRef]
18. Grodzicki, R.; Mańdziuk, J.; Wang, L. Improved multilabel classification with neural networks. Parallel Probl. Solving Nat. Ppsn X
2008, 5199, 409–416.
19. Gonçalves, E.C.; Freitas, A.A.; Plastino, A. A Survey of Genetic Algorithms for Multi-Label Classification. In Proceedings of the
2018 IEEE Congress on Evolutionary Computation (CEC), Rio de Janeiro, Brazil, 29 January 2018; pp. 1–8.
20. McCallum, A.; Nigam, K. A comparison of event models for naive bayes text classification. AAAI-98 Workshop Learn. Text Categ.
1998, 752, 41–48.
21. Gao, S.; Wu, W.; Lee, C.H.; Chua, T.S. A MFoM learning approach to robust multiclass multi-label text categorization. In Proceedings
of the Twenty-First International Conference on Machine Learning; ACM: New York, NY, USA, 2004; pp. 329–336.
22. Ghamrawi, N.; McCallum, A. Collective multi-label classification. In Proceedings of the 14th ACM International Conference on
Information and Knowledge Management; ACM: New York, NY, USA, 2005; pp. 195–200.
23. Zhang, M.L.; Zhou, Z.H. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognit. 2007, 40, 2038–2048.
[CrossRef]
24. Xu, X.S.; Jiang, Y.; Peng, L.; Xue, X.; Zhou, Z.H. Ensemble approach based on conditional random field for multi-label image and
video annotation. In Proceedings of the 19th ACM International Conference on Multimedia; ACM: New York, NY, USA, 2011;
pp. 1377–1380.
25. Qu, G.; Zhang, H.; Hartrick, C.T. Multi-label classification with Bayes’ theorem. In Proceedings of the 2011 4th International
Conference on Biomedical Engineering and Informatics (BMEI), Shanghai, China, 15–17 October 2011; pp. 2281–2285.
26. Wu, J.; Cai, Z. Attribute weighting via differential evolution algorithm for attribute weighted naive bayes (wnb). J. Comput.
Inf. Syst. 2011, 7, 1672–1679.
27. Wu, J.; Cai, Z. A naive Bayes probability estimation model based on self-adaptive differential evolution. J. Intell. Inf. Syst. 2014,
42, 671–694. [CrossRef]
28. Sucar, L.E.; Bielza, C.; Morales, E.F.; Hernandez-Leal, P.; Zaragoza, J.H.; Larrañaga, P. Multi-label classification with Bayesian
network-based chain classifiers. Pattern Recognit. Lett. 2014, 41, 14–22. [CrossRef]
29. Reyes, O.; Morell, C.; Ventura, S. Evolutionary feature weighting to improve the performance of multi-label lazy algorithms.
Integr. Comput. Aided Eng. 2014, 21, 339–354. [CrossRef]
30. Lee, J.; Kim, D.W. Memetic feature selection algorithm for multi-label classification. Inf. Sci. 2015, 293, 80–96. [CrossRef]
31. Yan, X.; Wu, Q.; Sheng, V.S. A Double Weighted Naive Bayes with Niching Cultural Algorithm for Multi-Label Classification.
Int. J. Pattern Recognit. Artif. Intell. 2016, 30, 1–23. [CrossRef]
32. Wu, Q.; Liu, H.; Yan, X. Multi-label classification algorithm research based on swarm intelligence. Clust. Comput. 2016,
19, 2075–2085. [CrossRef]
33. Zhang, Y.; Gong, D.W.; Sun, X.Y.; Guo, Y.N. A PSO-based multi-objective multi-label feature selection method in classification.
Sci. Rep. 2017, 7, 376. [CrossRef] [PubMed]
34. Wu, Q.; Wang, H.; Yan, X.; Liu, X. MapReduce-based adaptive random forest algorithm for multi-label classification.
Neural Comput. Appl. 2019, 31, 8239–8252. [CrossRef]
35. Moyano, J.M.; Gibaja, E.L.; Cios, K.J.; Ventura, S. An evolutionary approach to build ensembles of multi-label classifiers. Inf. Fusion
2019, 50, 168–180. [CrossRef]
36. Guo, Y.N.; Zhang, P.; Cheng, J.; Wang, C.; Gong, D. Interval Multi-objective Quantum-inspired Cultural Algorithms.
Neural Comput. Appl. 2018, 30, 709–722. [CrossRef]
37. Yan, X.; Zhu, Z.; Hu, C.; Gong, W.; Wu, Q. Spark-based intelligent parameter inversion method for prestack seismic data.
Neural Comput. Appl. 2019, 31, 4577–4593. [CrossRef]
38. Wu, B.; Qian, C.; Ni, W.; Fan, S. The improvement of glowworm swarm optimization for continuous optimization problems.
Expert Syst. Appl. 2012, 39, 6335–6342. [CrossRef]
39. Lu, C.; Gao, L.; Li, X.; Zheng, J.; Gong, W. A multi-objective approach to welding shop scheduling for makespan, noise pollution
and energy consumption. J. Clean. Prod. 2018, 196, 773–787. [CrossRef]
40. Wu, Q.; Zhu, Z.; Yan, X.; Gong, W. An improved particle swarm optimization algorithm for AVO elastic parameter inversion
problem. Concurr. Comput. Pract. Exp. 2019, 31, 1–16. [CrossRef]
41. Yu, P.; Yan, X. Stock price prediction based on deep neural network. Neural Comput. Appl. 2020, 32, 1609–1628. [CrossRef]
42. Gong, W.; Cai, Z. Parameter extraction of solar cell models using repaired adaptive differential evolution. Solar Energy 2013,
94, 209–220. [CrossRef]
43. Wang, F.; Li, X.; Zhou, A.; Tang, K. An estimation of distribution algorithm for mixed-variable Newsvendor problems. IEEE Trans.
Evol. Comput. 2020, 24, 479–493.
44. Wang, G.G. Improving Metaheuristic Algorithms with Information Feedback Models. IEEE Trans. Cybern. 2017, 99, 1–14.
[CrossRef] [PubMed]
45. Yan, X.; Li, P.; Tang, K.; Gao, L.; Wang, L. Clonal Selection Based Intelligent Parameter Inversion Algorithm for Prestack Seismic
Data. Inf. Sci. 2020, 517, 86–99. [CrossRef]

46. Yan, X.; Yang, K.; Hu, C.; Gong, W. Pollution source positioning in a water supply network based on expensive optimization.
Desalination Water Treat. 2018, 110, 308–318. [CrossRef]
47. Wang, R.; Zhou, Z.; Ishibuchi, H.; Liao, T.; Zhang, T. Localized weighted sum method for many-objective optimization. IEEE Trans.
Evol. Comput. 2018, 22, 3–18. [CrossRef]
48. Lu, C.; Gao, L.; Yi, J. Grey wolf optimizer with cellular topological structure. Expert Syst. Appl. 2018, 107, 89–114. [CrossRef]
49. Wang, F.; Zhang, H.; Zhou, A. A particle swarm optimization algorithm for mixed-variable optimization problems.
Swarm Evol. Comput. 2021, 60, 100808. [CrossRef]
50. Yan, X.; Zhao, J. Multimodal optimization problem in contamination source determination of water supply networks.
Swarm Evol. Comput. 2019, 47, 66–71. [CrossRef]
51. Yan, X.; Hu, C.; Sheng, V.S. Data-driven pollution source location algorithm in water quality monitoring sensor networks. Int. J.
Bio-Inspir Compu. 2020, 15, 171–180. [CrossRef]
52. Hu, C.; Dai, L.; Yan, X.; Gong, W.; Liu, X.; Wang, L. Modified NSGA-III for Sensor Placement in Water Distribution System.
Inf. Sci. 2020, 509, 488–500. [CrossRef]
53. Wang, R.; Li, G.; Ming, M.; Wu, G.; Wang, L. An efficient multi-objective model and algorithm for sizing a stand-alone hybrid
renewable energy system. Energy 2017, 141, 2288–2299. [CrossRef]
54. Li, S.; Gong, W.; Yan, X.; Hu, C.; Bai, D.; Wang, L. Parameter estimation of photovoltaic models with memetic adaptive differential
evolution. Solar Energy 2019, 190, 465–474. [CrossRef]
55. Yan, X.; Zhang, M.; Wu, Q. Big-Data-Driven Pre-Stack Seismic Intelligent Inversion. Inf. Sci. 2021, 549, 34–52. [CrossRef]
56. Wang, F.; Li, Y.; Liao, F.; Yan, H. An ensemble learning based prediction strategy for dynamic multi-objective optimization.
Appl. Soft Comput. 2020, 96, 106592. [CrossRef]
57. Yan, X.; Li, T.; Hu, C. Real-time localization of pollution source for urban water supply network in emergencies. Clust. Comput.
2019, 22, 5941–5954. [CrossRef]
58. Reynolds, R.G. Cultural algorithms: Theory and applications. In New Ideas in Optimization; McGraw-Hill Ltd.: Berkshire, UK,
1999; pp. 367–378.
59. Reynolds, R.G.; Zhu, S. Knowledge-based function optimization using fuzzy cultural algorithms with evolutionary programming.
IEEE Trans. Syst. Man Cybern. Part B 2001, 31, 1–18. [CrossRef]
60. Zhang, H.; Sheng, S. Learning weighted naïve Bayes with accurate ranking. In Proceedings of the 4th IEEE International
Conference on Data Mining, Brighton, UK, 1–4 November 2004; pp. 567–570.
61. Xie, T.; Liu, R.; Wei, Z. Improvement of the Fast Clustering Algorithm Improved by K-Means in the Big Data. Appl. Math.
Nonlinear Sci. 2020, 5, 1–10. [CrossRef]
62. Yan, X.; Li, W.; Wu, Q.; Sheng, V.S. A Double Weighted Naive Bayes for Multi-label Classification. In International Symposium on
Computational Intelligence and Intelligent Systems; Springer: Singapore, 2015; pp. 382–389.
