2023 - A Two Stages Prediction Strategy For Evolutionary Dynamic Multi-Objective Optimization
https://doi.org/10.1007/s10489-022-03353-2
Abstract
In many engineering and scientific research processes, dynamic multi-objective optimization problems (DMOPs) are widely involved. They are quite challenging, since they involve multiple conflicting objectives that change over time or with the environment. The main task in solving DMOPs is to track the Pareto front as quickly as possible when the objectives change over time. To accelerate the tracking process, a two stages prediction strategy (SPS) for DMOPs is proposed. To improve the prediction accuracy, population prediction is divided into center point prediction and manifold prediction once a change is detected. Due to the limitations of the support vector machine, in the early stage the new population is predicted by combining the elite solutions of the previous environment with the Kalman filter. Experimental results show that the proposed algorithm performs better in convergence and distribution when dealing with nonlinear problems, especially on problems where environmental changes occur frequently.
Keywords Dynamic multi-objective problems · Evolutionary algorithm · Kalman filter · Support vector machine
1. Change Detection: The function of this operator is to detect whether a change has occurred. If so, a corresponding processing strategy is adopted. Usually, there are two strategies to adopt: re-evaluating solutions [2] and checking population statistical information [27]. The first approach is easy to implement, but it assumes that there is no uncertain factor when evaluating objective functions. The second strategy can overcome the shortcoming caused by uncertain factors, but the algorithm may need additional parameters to evaluate the objective functions. Most existing work still focuses on detecting whether a change has happened or not. A good strategy should further estimate the degree of change, which might help the next two steps. (A minimal sketch of re-evaluation-based detection is given after the problem definition below.)

2. Change Reaction: This part mainly focuses on taking actions when a change is detected. The actions include: 1) Memory maintenance: individual points or information extracted from the current population are added to the former population, and the old information is deleted; 2) Parameter tuning: algorithm parameters, such as the mutation probability, are adapted; 3) Population re-initialization: the population is re-initialized when an environmental change is detected. The following techniques are widely utilized: reusing the elite solutions from previous populations; mutating the previous population by a heuristic method; predicting a new population by a model [17, 18]; and applying different crossover and mutation operators [12].

3. Problem Optimization: This part tackles the current MOP as a traditional optimization problem. An MOEA developed for solving stationary MOPs is usually applied directly or with slight modifications [5]. The task of these modifications is to enhance the diversity of the population using various techniques [20], including random migration, which mainly injects randomly generated points into the population in each generation; recrudescence, which generates solutions with high crossover and mutation probabilities [6]; memory mechanisms, which maintain a portion of dominated solutions; and multiple populations or parallel computing.

Among these three components, change reaction is the core of solving DMOPs. At present, most prediction methods are used to generate new populations that are as close as possible to the true PF [1]. Qingya Li proposed a prediction strategy based on special points for evolutionary dynamic multi-objective optimization [14], Fei Zou et al. proposed a knee-guided prediction approach for dynamic multi-objective optimization [29], and Xuemin Ma et al. proposed a feature information prediction algorithm for solving DMOPs [16]. These algorithms make use of special points in the PF, and their prediction models are linear. Such a strategy adopts a simple linear model to process historical data and predict the PF in the next environment. The prediction accuracy of a linear model can meet the requirements when the problem is simple, but it is not satisfactory for the Pareto front when the changes are large and complex. To improve the accuracy of population prediction, artificial intelligence is introduced into the reaction to environmental changes, and the least squares support vector machine (LSSVM) is adopted to improve the speed and accuracy of tracking the PF. Overall, the main contributions of this paper include:

1. A two stages prediction strategy is presented, which predicts new solutions according to the characteristics of different models and problems at different stages of evolution.
2. In the early stage of evolution, with insufficient samples and poor convergence, the Kalman filter (KF) model is adopted for prediction. It is fast and requires few samples, but its accuracy is low.
3. In the later stage of evolution, where more attention is paid to accuracy, the LSSVM is selected. It requires more samples but fewer iterations, and it fits both linear and nonlinear problems well.
4. Since the two models exert their respective advantages at different stages, the algorithm can quickly track the front and maintain a good distribution.

The rest of this paper is organized as follows. Section 2 presents the definitions of DMOPs. Section 3 describes the prediction strategy in detail. Section 4 introduces the test instances and performance metrics. Section 5 analyzes the experimental results. Section 6 outlines the conclusions and suggestions for future research.

2 Related definitions

According to the essential features of the uncertain factors in optimization problems from nature, DMOPs are classified into different categories [11]. We focus on the following type of DMOPs in this paper:

$$
\min_{x \in \mathbb{R}^{n}} F(x, t) = [f_1(x, t), f_2(x, t), \ldots, f_m(x, t)]
\quad \text{s.t. } a_i \le x_i \le b_i
\qquad (1)
$$

where F(x, t) consists of m real-valued objective functions, each of which is continuous with respect to x over [a, b]; m is the number of objectives; t represents the time index; $\mathbb{R}^{n}$ is the decision space; x is the decision vector; n is the number of decision variables; and $[a_i, b_i]$ is the range of the i-th decision variable.
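To make the re-evaluation-based change detection discussed above concrete, the following is a minimal sketch, assuming a deterministic, time-dependent objective function F(x, t) of the form in (1). The function name, the sentinel set, and the tolerance are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def detect_change(F, sentinels, t_prev, t_now, tol=1e-8):
    """Re-evaluation-based change detection.

    F         : callable (x, t) -> array of m objective values
    sentinels : decision vectors stored from the last generation
    Returns True if any sentinel's objectives differ between the
    previous and current time index, signalling an environmental change.
    """
    for x in sentinels:
        f_old = np.asarray(F(x, t_prev))
        f_new = np.asarray(F(x, t_now))
        if np.any(np.abs(f_new - f_old) > tol):
            return True
    return False
```

Note that this detector inherits the limitation mentioned above: it assumes the objective evaluations themselves are noise-free.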
Here $\Omega + \gamma^{-1}I$ is the correlation matrix; let $B \equiv \Omega + \gamma^{-1}I$, so:

$$
\begin{bmatrix} 0 & \bar{1}^{T} \\ \bar{1} & B \end{bmatrix}
\begin{bmatrix} b \\ a \end{bmatrix}
=
\begin{bmatrix} 0 \\ Y \end{bmatrix}
\qquad (13)
$$

To sum up, we get:

$$
b = \frac{\bar{1}^{T}B^{-1}Y}{\bar{1}^{T}B^{-1}\bar{1}}
\qquad (14)
$$

$$
a = B^{-1}(Y - b\bar{1})
\qquad (15)
$$

Combining (14) and (15), we obtain:

$$
Y = w^{T}\varphi(X) + b = \sum_{j=1}^{l} a_j K(X_j, X) + b
\qquad (16)
$$

Thus, the center point at time t + 1 can be predicted. Taking the center points up to time t as training samples for the LSSVM, the center point at the next time step is predicted by (16), with the parameters b and a in (16) calculated according to (14) and (15).

The prediction accuracy of the LSSVM is closely related to the number of training samples. If the sample size is too small, the prediction may be inaccurate even if all the samples are support vectors. Therefore, the LSSVM prediction will be biased when the learning samples cover less than one complete learning cycle. The KF, however, can estimate the state of a process without any such learning time and corrects itself based on subsequent measurements. Thus we use the KF to compensate for the shortage of learning samples. Due to space limitations, only the main formulas of Kalman prediction are given here; for the specific calculation process, see [18]. The calculation process is briefly as follows.

The equations for the time update step are:

$$
\hat{x}^{-}_{t} = A\hat{x}_{t-1}
\qquad (17)
$$

$$
P^{-}_{t} = AP_{t-1}A^{T} + Q
\qquad (18)
$$

And the equations for the measurement update step are:

$$
K_t = P^{-}_{t}\left(P^{-}_{t} + R\right)^{-1}
\qquad (19)
$$

$$
\hat{x}_t = \hat{x}^{-}_{t} + K_t\left(z_t - \hat{x}^{-}_{t}\right)
\qquad (20)
$$

$$
P_t = (I - K_t)P^{-}_{t}
\qquad (21)
$$

where A is the state-transition matrix relating the state at the previous time step t − 1 to the state at the current step t, x is the state vector to be estimated by the KF, P is the error covariance estimate, and z denotes the measurement of the state vector. The process and measurement noise covariance matrices are Q and R, respectively, and K is the KF gain. A measurement $z \in \mathbb{R}^{m}$ is given by

$$
z_t = x_t + v_t
\qquad (22)
$$

$$
p(v) \sim N(0, R)
\qquad (23)
$$

The current estimates are obtained using the previous predictions and the current observation. The KF designed for prediction has two variants: the 2-D KF (2 by 2 KF) and the 3-D KF (3 by 3 KF). There are no control inputs in this system. Moreover, the process noise is Gaussian with zero mean, and the observation noise is Gaussian with an assumed variance. R and Q can be calculated as in [18].

3.3 PS manifold estimation

$\tilde{C}_{t+1}$ is calculated from $\tilde{C}_{t}$ and $\tilde{C}_{t-1}$ [26], and the process is as follows:

$$
\varepsilon_i^m \sim N(0, \sigma_i^m)
\qquad (25)
$$

$$
\sigma_i^m = \frac{1}{n}D^{2}(\tilde{C}_t, \tilde{C}_{t-1})
\qquad (26)
$$

where $x_t \in \tilde{C}_t$, $i = 1, 2, \ldots, n$, n is the number of decision variables, and D(A, B) represents the distance between manifolds A and B, defined as:

$$
D(A, B) = \frac{1}{|A|}\sum_{x \in A}\min_{y \in B}\|x - y\|
\qquad (27)
$$

where |A| represents the number of individuals in population A and $\|x - y\|$ is the Euclidean distance between x and y.

3.4 SPS process

The SPS procedure can be integrated into the MOEA framework. It aims to initialize a new population $P_t$ when a change happens at the beginning of time t. In the SPS strategy, the LSSVM model requires sufficient samples to complete the prediction calculation, so the process is divided into two stages. When the sample number is less than 2p, the historical center samples are insufficient: some new individuals are directly generated from the previous generation, while the others are generated by the KF prediction model. When the sample number is greater than 2p, there are enough samples to adopt the LSSVM model. The SPS procedure is summarized in Table 2. The input dimension of the LSSVM is p. A compact sketch of this two-stage process is given below.
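As a compact illustration of Sections 3.2-3.4, the sketch below combines the LSSVM center-point predictor of (13)-(16) with the KF cycle of (17)-(21) and the 2p stage switch. It is a minimal sketch under several assumptions that are ours, not the paper's: an RBF kernel, a scalar random-walk state model (A = 1), one LSSVM per decision-space coordinate trained on sliding windows of length p (matching the stated input dimension p), and illustrative hyperparameters gamma, sigma, Q, and R.

```python
import numpy as np

def rbf(u, v, sigma=1.0):
    # RBF kernel K(u, v); the paper does not fix a kernel in this excerpt.
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

def lssvm_fit(X, Y, gamma=10.0, sigma=1.0):
    """Solve the linear system (13); return (a, b) via (14)-(15)."""
    l = len(X)
    K = np.array([[rbf(X[i], X[j], sigma) for j in range(l)] for i in range(l)])
    B = K + np.eye(l) / gamma                      # B = Omega + gamma^{-1} I
    ones = np.ones(l)
    b = ones @ np.linalg.solve(B, Y) / (ones @ np.linalg.solve(B, ones))  # (14)
    a = np.linalg.solve(B, Y - b)                  # (15)
    return a, b

def lssvm_predict(X, a, b, x, sigma=1.0):
    # Decision function (16): sum_j a_j K(X_j, x) + b
    return sum(a[j] * rbf(X[j], x, sigma) for j in range(len(X))) + b

def kf_step(x_prev, P_prev, z, Q=1e-4, R=1e-2):
    """One scalar KF cycle per (17)-(21), with A = 1 (random walk)."""
    x_pred = x_prev                      # (17)
    P_pred = P_prev + Q                  # (18)
    K = P_pred / (P_pred + R)            # (19)
    x_new = x_pred + K * (z - x_pred)    # (20)
    P_new = (1 - K) * P_pred             # (21)
    return x_new, P_new

def predict_center(history, p, sigma=1.0):
    """Predict the next center point from the past centers (two-stage switch)."""
    d = len(history[0])
    if len(history) < 2 * p:
        # Stage 1: too few samples for the LSSVM -> filter each coordinate
        # with the KF and use the latest estimate as the prediction.
        c, P = history[0].astype(float), np.ones(d)
        for z in history[1:]:
            for i in range(d):
                c[i], P[i] = kf_step(c[i], P[i], z[i])
        return c
    # Stage 2: enough samples -> one LSSVM per coordinate, trained on
    # windows of p consecutive center values.
    nxt = np.empty(d)
    for i in range(d):
        series = np.array([h[i] for h in history], dtype=float)
        X = np.array([series[k:k + p] for k in range(len(series) - p)])
        Y = series[p:]
        a, b = lssvm_fit(X, Y, sigma=sigma)
        nxt[i] = lssvm_predict(X, a, b, series[-p:], sigma)
    return nxt
```

In both stages, the predicted center is then combined with the manifold estimate of Section 3.3 to generate the new population.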
The MIGD metric averages the IGD over the time steps in T:

$$
MIGD = \frac{1}{|T|}\sum_{t \in T} IGD(P_t^{*}, P_t)
\qquad (30)
$$
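Stated as code, the IGD and MIGD computations of (30) look as follows. This is a minimal sketch using the conventional IGD definition (mean distance from each point of the sampled true PF to its nearest neighbour in the approximation), with function names of our choosing.

```python
import numpy as np

def igd(pf_true, pf_approx):
    """Inverted generational distance IGD(P*_t, P_t).

    pf_true   : (N, m) array sampling the true Pareto front P*_t
    pf_approx : (M, m) array of objective vectors found by the algorithm
    """
    dists = [np.min(np.linalg.norm(pf_approx - ref, axis=1)) for ref in pf_true]
    return float(np.mean(dists))

def migd(true_fronts, approx_fronts):
    """MIGD per (30): average of IGD over all time steps t in T."""
    return float(np.mean([igd(pt, pa)
                          for pt, pa in zip(true_fronts, approx_fronts)]))
```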
The test instances used in the experiments are defined as follows.

F1 (PF fixed, PS changes; two objectives). Search space: $[0, 1] \times [-1, 1]^{n-1}$. Objectives: $f_1(x, t) = x_1$, $f_2(x, t) = g\,(1 - \sqrt{f_1/g})$, with $g = 1 + \sum_{i=2}^{n}(x_i - G(t))^2$, $G(t) = \sin(0.5\pi t)$, $t = \frac{1}{n_T}\lfloor \tau/\tau_T \rfloor$. $PS(t)$: $0 \le x_1 \le 1$, $x_i = G$ for $i = 2, \ldots, n$. $PF(t)$: $f_2 = 1 - \sqrt{f_1}$, $0 \le f_1 \le 1$.

F2 (PF changes, PS fixed; two objectives). Search space: $[0, 1] \times [-1, 1]^{n-1}$. Objectives: $f_1(x, t) = x_1$, $f_2(x, t) = g\,(1 - (f_1/g)^{H})$, with $g = 1 + 9\sum_{i=2}^{n} x_i^2$, $H = 1.25 + 0.75\sin(0.5\pi t)$, $t = \frac{1}{n_T}\lfloor \tau/\tau_T \rfloor$. $PS(t)$: $0 \le x_1 \le 1$, $x_i = 0$ for $i = 2, \ldots, n$. $PF(t)$: $f_2 = 1 - f_1^{H}$, $0 \le f_1 \le 1$.

F3 (PF and PS both change; two objectives). Search space: $[0, 1] \times [-1, 1]^{n-1}$. Objectives: $f_1(x, t) = x_1$, $f_2(x, t) = g\,(1 - (f_1/g)^{H})$, with $g = 1 + \sum_{i=2}^{n}(x_i - G)^2$, $G = \sin(0.5\pi t)$, $H = 1.25 + 0.75\sin(0.5\pi t)$, $t = \frac{1}{n_T}\lfloor \tau/\tau_T \rfloor$. $PS(t)$: $0 \le x_1 \le 1$, $x_i = G$ for $i = 2, \ldots, n$. $PF(t)$: $f_2 = 1 - f_1^{H}$, $0 \le f_1 \le 1$.

F5 (PF and PS both change; two objectives). Search space: $[0, 5]^{n}$. Objectives: $f_1(x, t) = |x_1 - a|^{H} + \sum_{i \in I_1} y_i^2$, $f_2(x, t) = |x_1 - a - 1|^{H} + \sum_{i \in I_2} y_i^2$, with $y_i = x_i - b - 1 + |x_1 - a|^{H + i/n}$, $H = 1.25 + 0.75\sin(\pi t)$, $a = 2\cos(\pi t) + 2$, $b = 2\sin(2\pi t) + 2$, $t = \frac{1}{n_T}\lfloor \tau/\tau_T \rfloor$, $I_1 = \{i \mid 1 \le i \le n,\ i \text{ odd}\}$, $I_2 = \{i \mid 1 \le i \le n,\ i \text{ even}\}$. $PS(t)$: $a \le x_1 \le a + 1$, $x_i = b + 1 - |x_1 - a|^{H + i/n}$ for $i = 2, \ldots, n$. $PF(t)$: $f_1 = s^{H}$, $f_2 = (1 - s)^{H}$, $0 \le s \le 1$.

F6 (PF and PS both change; two objectives). Search space: $[0, 5]^{n}$. Same objectives, $y_i$, $H$, $t$, $I_1$, and $I_2$ as F5, but with $a = 2\cos(1.5\pi t)\sin(0.5\pi t) + 2$, $b = 2\cos(1.5\pi t)\cos(0.5\pi t) + 2$. $PS(t)$: $a \le x_1 \le a + 1$, $x_i = b + 1 - |x_1 - a|^{H + i/n}$ for $i = 2, \ldots, n$. $PF(t)$: $f_1 = s^{H}$, $f_2 = (1 - s)^{H}$, $0 \le s \le 1$.

F7 (PF and PS both change; two objectives). Search space: $[0, 5]^{n}$. Same objectives, $y_i$, $H$, $t$, $I_1$, and $I_2$ as F5, but with $a = 1.7(1 - \sin(\pi t))\sin(\pi t) + 3.4$, $b = 1.4(1 - \sin(\pi t))\cos(\pi t) + 2.1$. $PS(t)$: $a \le x_1 \le a + 1$, $x_i = b + 1 - |x_1 - a|^{H + i/n}$ for $i = 2, \ldots, n$. $PF(t)$: $f_1 = s^{H}$, $f_2 = (1 - s)^{H}$, $0 \le s \le 1$.
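As an example of how these instances are evaluated, the following is a direct transcription of F1, assuming the usual reading of the time mapping (τ is the generation counter, n_T the severity and τ_T the frequency of change); the default parameter values are illustrative.

```python
import numpy as np

def f1_instance(x, tau, n_T=10, tau_T=30):
    """Benchmark F1 (PF fixed, PS changes).

    x   : decision vector with x[0] in [0, 1] and x[1:] in [-1, 1]
    tau : generation counter, mapped to the time index t.
    """
    t = (1.0 / n_T) * np.floor(tau / tau_T)
    G = np.sin(0.5 * np.pi * t)                 # PS location at time t
    g = 1.0 + np.sum((x[1:] - G) ** 2)
    f1 = x[0]
    f2 = g * (1.0 - np.sqrt(f1 / g))
    return np.array([f1, f2])
```

At the optimum (x_i = G for i >= 2), g = 1 and the returned point lies on the fixed front f2 = 1 - sqrt(f1).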
Fig. 3 Comparison of center point prediction on F5-F7 when nT = 15
The IGD trend comparison on F5-F7 is shown in Fig. 7. MOEA/D-KF uses the KF model, which is just a simple linear filtering model but can provide reasonable performance as a prediction method. MRCDMO adopts multiregional center points to predict solutions when environmental changes occur. However, both show the same shortcoming as PPS when dealing with nonlinear problems, so we only analyze SPS and PPS in detail. The population predicted by SPS obtains a small IGD when nT = 10 and 5. Due to space limitations, only the IGD trend chart of F5-F7 is given in this paper; the IGD trend chart of F1-F3 is shown in the supplementary material. On F1-F3, PPS obtains slightly better performance than SPS, and SPS performs as well as PPS on linear functions. The IGD of SPS is much better than that of PPS on F5-F7 when nT = 10 and 5. It is not difficult to explain why SPS performs worse than PPS on F1-F3: the AR(n) model adopted in PPS is a linear model, and F1-F3 are linear functions while F5-F7 are nonlinear, so linear models are certainly accurate when predicting linear functions.
Fig. 4 Comparison of center point prediction on F5-F7 when nT = 10
Fig. 5 Comparison of center point prediction on F5-F7 when nT = 5
On the contrary, the LSSVM model shows many unique advantages in solving small-sample, nonlinear, and high-dimensional pattern recognition problems.

The comparison between the approximate PF and the true PF obtained by MOEA/D-KF, PPS, and SPS is presented in Figs. 8 and 9. Although there is a small fluctuation on F1 and F3 in the early stage, the result is much better than that of PPS on F2 and F5-F7. The convergence precision on F2 and F5-F7 obtained by SPS is better than that of PPS and MOEA/D-KF. From Fig. 8, it is observed that SPS performs better than PPS on F5-F7, though the difference is not obvious. The worst part is that the convergence accuracy on F1 and F3 is not as good as that of PPS and MOEA/D-KF in the PF comparison diagram in the supplementary material. However, the approximate PF of F5-F7 obtained by SPS in Fig. 9 has better convergence than that of PPS. That is to say, SPS is still able to work well when the environment changes frequently on nonlinear functions. The mean and variance of IGD over 30 runs in the same environment are given in Table 6.

The PF of F1 is fixed, but the PS changes after a change has occurred. Both algorithms have good convergence and distribution in the later stage, as can be seen from Figs. S5 and S6 in the supplementary material. The IGD of PPS converges faster than that of SPS, and there is a big fluctuation at about t = 10 for SPS in Fig. S3 in the supplementary material. It can also be seen in Table 6 that the MIGD of PPS is smaller than that of SPS, and PPS has higher computational stability. The best results are shown in boldface in Tables 5 and 6.

The PF of F2 changes after a change has occurred, but the PS is fixed. These algorithms perform well in convergence and distribution, as shown in Figs. S3, S5, and S6 in the supplementary material. Compared with PPS, the MIGD of SPS is smaller, which means SPS has higher convergence accuracy and wider distribution. The variance of SPS is also smaller, which shows that the computational stability of SPS is high.

For F3, both the PF and PS change, the same as for F1. The IGD shows a big fluctuation after convergence in SPS. This is not difficult to explain: in the early stage of the algorithm, the function has changed only a few times, but the stored approximate PF is different from the true PF. Compared with the AR linear model, the LSSVM, as an intelligent model, is more dependent on samples; although the Kalman filter has been added for prediction, in the case of large samples there will be a big fluctuation in the early stages.

Table 5 Mean absolute error between the true and predicted center points

Test function  Model  x1        x2
F1             PPS    1.15e-16  1.46e-04
               SPS    0         3.85e-04
F2             PPS    7.12e-17  0
               SPS    0         0
F3             PPS    9.68e-17  1.25e-16
               SPS    0         3.85e-04
F5             PPS    1.12e-15  6.97e-03
               SPS    3.94e-04  7.68e-04
F6             PPS    1.00e-02  4.31e-03
               SPS    3.11e-04  1.39e-04
F7             PPS    7.37e-02  2.78e-02
               SPS    3.19e-04  4.24e-04
Fig. 6 Comparison of center point prediction on F1-F3 when nT = 5
With the iterations of the algorithm, the approximate PF gets much closer to the true PF, which provides the LSSVM with more accurate learning samples; the prediction accuracy and prediction stability then improve simultaneously.

As the complexity of the true Pareto solution set increases, the prediction accuracy of the linear prediction model used by PPS drops significantly on F5-F7. From Fig. 7, the IGD obtained by SPS converges at t = 40 on F5, while PPS finally converges at t = 75. The MIGD obtained by SPS is smaller in both mean and variance, which indicates that SPS is superior to the PPS algorithm in convergence, distribution, and computational stability.
Fig. 7 IGD trend comparison of MOEA/D-KF, MRCDMO, PPS and SPS over time of changes on F5-F7
Fig. 8 PF obtained by MOEA/D-KF, MRCDMO, PPS, SPS at t = 149, 150, 151, 152, 153 on F5-F7 when nT = 10
The changes of F6 and F7 are more complicated. The prediction accuracy of the AR linear model used in PPS is low (as described above), which causes the predicted population to deviate seriously from the true PS. Since a good approximate PF is not identified before the environmental change (40 iterations), the distortion of the learning samples increases and the prediction accuracy of the model decreases. Therefore, the IGD of the PPS algorithm fluctuates continuously throughout the calculation cycle. It is obvious that the prediction accuracy of the Pareto solution set by SPS is higher than that of PPS on F5-F7, which indicates that the LSSVM, the artificial intelligence model used in this algorithm, performs better for complex DMOPs.
Fig. 9 PF obtained by MOEA/D-KF, MRCDMO, PPS, SPS at t = 149, 150, 151, 152, 153 on F5-F7 when nT = 5
Table 6 Mean and variance of IGD over 30 runs (columns: Instance, Models, nT = 10, nT = 5)
Fig. 10 Box plots of IGD values for MOEA/D-KF, MRCDMO, PPS, and SPS on the test functions F5-F7
Table 7 Mean values of MIGD obtained by SPS on six instances under different parameter settings (instances F1, F2, F3, F5, F6, F7)

Through data analysis, the advantages of SPS grow with the intensity of environmental changes. For more intuitive observation, the box plots of IGD values for PPS, MOEA/D-KF, and SPS on F5-F7 when nT = 10 and 5 are given in Fig. 10. From the comparison of the box plots, the median and interquartile range obtained by SPS are smaller, which indicates higher prediction accuracy. In addition, SPS produces noticeably fewer outliers than PPS and MOEA/D-KF.

According to a large number of experiments and data, SPS, the algorithm proposed in this paper, is better than PPS and MOEA/D-KF on nonlinear functions in convergence accuracy and speed when the environment changes violently. SPS is also applicable to the prediction of linear functions.

5.3 Comparisons of different parameters in population selection

In the SPS strategy, when 2 ≤ t < 2p, 65% of the individuals are generated from P_{t-1} and 35% use the Kalman filter for prediction. To determine the appropriate parameter for population selection, we consider the six problems F1, F2, F3, F5, F6, and F7. The proportion of individuals selected from P_{t-1} is varied from 0.05 to 0.95 in intervals of 0.05. The other parameters are the same as in Table 4. Table 7 gives the statistical results of the MIGD of the solutions obtained by SPS with different parameters on these problems over 30 runs. By averaging all the data in Table 7, SPS performs well when 65% of the individuals are generated from P_{t-1} and 35% use the Kalman filter for prediction, as sketched below.
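The following is a minimal sketch of this stage-1 re-initialization, assuming individuals are carried over by random selection from P_{t-1} and that kf_predict stands in for the per-individual KF prediction; both of these details are our assumptions for illustration.

```python
import numpy as np

def init_population(prev_pop, kf_predict, ratio=0.65, rng=None):
    """Re-initialize the population after a change when 2 <= t < 2p.

    `ratio` of the individuals are reused from P_{t-1}; the remainder are
    generated by the KF prediction model (ratio = 0.65 performed best
    on average in Table 7).
    """
    rng = np.random.default_rng() if rng is None else rng
    N = len(prev_pop)
    n_keep = int(round(ratio * N))
    keep_idx = rng.choice(N, size=n_keep, replace=False)
    kept = [prev_pop[i] for i in keep_idx]
    predicted = [kf_predict(prev_pop[i])
                 for i in rng.choice(N, size=N - n_keep, replace=True)]
    return kept + predicted
```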
6 Conclusions and future work

In this paper, we propose SPS to enhance the performance of MOEAs in dealing with dynamic environments where the environment changes violently. In SPS, we focus on the prediction of the center points. The main advantages of the proposed SPS are as follows:

The LSSVM artificial intelligence model is used to improve the prediction accuracy of the center point, thereby improving the ability of the population to track the new PF in the new environment. The LSSVM model can achieve high prediction accuracy under both linear and nonlinear changes, but its disadvantage is that it is overly dependent on the distribution of the training samples: the more the training samples cover the entire change period, the higher the prediction accuracy of the trained model. To overcome this shortcoming, the Kalman filter algorithm is introduced; in the case of insufficient samples, it is used to generate the initial population. By combining the three algorithms, the overall algorithm achieves higher convergence accuracy under both linear and nonlinear changes.

In actual production, many problems require dynamic optimization. For example, in the rolling production process, the rolling speed has a great influence on the rolling efficiency; as the rolling speed increases or decreases, the optimal rolling schedule also changes. Combining the theoretical algorithms studied here with actual production is the main direction of future work.

Appendix

To help readers navigate the paper, a glossary of abbreviations is presented in Table 8.

Table 8 Glossary of abbreviations

Acknowledgements This work was supported by the National Natural Science Foundation of China [No. 62003296, 61703361]; the Natural Science Foundation of Hebei [No. E2018203162, F2020203031]; the Science and Technology Research Projects of Hebei [No. QN2020225]; the Post-Doctoral Research Projects of Hebei [No. B2019003021]; and the Hebei Province Graduate Innovation Funding Project [CXZZBS2022134]. The authors would like to thank the editor and anonymous reviewers for their helpful comments and suggestions to improve the quality of this paper.

Declarations

Conflict of Interests The authors declare that they have no conflicts of interest related to this work. We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the submitted work.