Article
Top-Oil Temperature Prediction of Power Transformer Based
on Long Short-Term Memory Neural Network with
Self-Attention Mechanism Optimized by Improved
Whale Optimization Algorithm
Dexu Zou 1,2 , He Xu 3 , Hao Quan 3, * , Jianhua Yin 3 , Qingjun Peng 2 , Shan Wang 2 , Weiju Dai 2 and Zhihu Hong 2
Abstract: The operational stability of the power transformer is essential for maintaining the symmetry, balance, and security of power systems. Once a power transformer fails, it leads to heightened instability within grid operations, so accurate prediction of oil temperature is crucial for efficient transformer operation. To address challenges such as the difficulty of selecting model hyperparameters and the incomplete consideration of temporal information in transformer oil temperature prediction, a novel model is constructed based on the improved whale optimization algorithm (IWOA) and a long short-term memory (LSTM) neural network with a self-attention (SA) mechanism. To incorporate holistic and local information, the SA is integrated with the LSTM model. Furthermore, the IWOA is employed to optimize the hyper-parameters of the LSTM-SA model. The standard WOA is improved by incorporating adaptive parameters, an adaptive threshold, and a Latin hypercube sampling initialization strategy. The proposed method was applied and tested on real operational data from two transformers in a practical power grid. The results of the single-step prediction experiments demonstrate that the proposed method significantly improves the accuracy of oil temperature prediction for power transformers, with enhancements ranging from 1.06% to 18.85% compared to benchmark models. Additionally, the proposed model performs effectively across various prediction steps, consistently outperforming benchmark models.

Keywords: power transformer; top-oil temperature prediction; self-attention mechanism; whale optimization algorithm; long short-term memory networks
unexpected failures but also optimize maintenance schedules, reduce operational risks,
and extend the transformer’s lifespan. Effective oil temperature prediction enhances the
overall reliability and efficiency of the power system, making it an essential component in
maintaining the symmetrical operation of the electrical grid.
Researchers generally study the prediction of transformer oil temperatures through mathematical and data-driven models [8–10]. Zhao et al. used the least squares method to establish a parameter identification algorithm [11]; this mathematical model can effectively predict the top-oil temperature but lacks strong generalization ability. Wang et al. established a thermal circuit model to simulate changes in the transformer temperature over time, but it requires lengthy computation [12].
With the development of intelligent algorithms, artificial intelligence technologies have
been applied to the field of power system forecasting. Interesting studies can be found in the
fields of load forecasting [13], vehicle-to-grid (V2G) scheduling prediction [14], and solar
irradiance forecasting [15]. There have been some research efforts focused on predicting
transformer oil temperature using these algorithms. Qing et al. developed a model based
on artificial neural networks for forecasting the top oil temperature of transformers [16],
and this model significantly reduces the computational time but ignores the selection
of optimal hyperparameters. Tan et al. proposed a forecast model that considers path analysis and similar moments [17], but the validation dataset is small, and the adaptability is difficult to confirm. Li et al. introduced a regression model with enhanced particle swarm optimization (PSO) for transformer top-oil temperature forecasting [18]. However, the large sampling interval of the data led to substandard performance. Tan et al. also introduced a similar-day method to predict top-oil temperature [19], but relying solely on single-day similarity degrades prediction performance. To sum up, these studies do not fully consider the temporal information of different input features, thus failing to combine global and local information within transformer operational data. In addition, the optimal hyper-parameters of the models are difficult to determine.
To tackle the issues mentioned, this paper introduces a novel method: an improved
whale optimization algorithm (IWOA) optimized long short-term memory (LSTM) neural
network with self-attention (SA) mechanism model. The proposed method comprehen-
sively addresses challenges related to the difficulty in selecting hyperparameters for the oil
temperature prediction model and the insufficient consideration of temporal information. It
integrates SA with LSTM and utilizes the IWOA to obtain the optimal hyper-parameters for
the LSTM-SA model, resulting in high prediction accuracy. Finally, the proposed method is
tested with actual operating data in a practical power grid. The results demonstrate that
the proposed method has better forecasting performance.
The remainder of this paper is organized as follows: Section 2 discusses the power transformer and top-oil temperature. Section 3 introduces the LSTM-SA model and the IWOA.
Section 4 presents a case study that shows the superiority of the IWOA for optimization
and the effectiveness of the proposed method for predicting top-oil temperature. Finally,
conclusions and discussions are presented in Section 5.
Figure 1. The basic construction of an oil-immersed transformer.
3. The Proposed IWOA-LSTM-SA Method for Top-Oil Temperature Prediction
3.1. Framework

In this study, IWOA-LSTM-SA has been developed for transformer oil temperature forecasting, in which the IWOA is employed to precisely search for the optimal input hyper-parameters and LSTM-SA serves as the forecasting model to combine global and local information. The flowchart is presented in Figure 2.
Figure 2. Flow chart of IWOA-LSTM-SA.
The main phases of the IWOA-LSTM-SA will be detailed in the following sections.
The output gate regulates the current output and decides the output information. The formula for calculation is given below:

$$g_t = \sigma\left(W_g \cdot [r_{t-1}, x_t] + p_g\right) \tag{3}$$
Figure 3. LSTM structure diagram.
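To make the gating concrete, the sketch below implements one LSTM step in Python using the paper's notation ($r$ for the hidden state, $C$ for the cell state, $g_t$ for the output gate of Equation (3)). Only the output-gate formula survives in this excerpt, so the forget/input gates are assumed to follow the standard LSTM form; the weight and bias names other than $W_g$ and $p_g$ are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, r_prev, C_prev, W, p):
    """One LSTM step. W and p are dicts of weight matrices and bias vectors.
    Only the output gate g_t is given in the text (Equation (3)); the other
    gates follow the standard LSTM formulation (an assumption)."""
    z = np.concatenate([r_prev, x_t])        # [r_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + p["f"])       # forget gate
    s_t = sigmoid(W["s"] @ z + p["s"])       # input gate (s_t in Figure 3)
    C_hat = np.tanh(W["c"] @ z + p["c"])     # candidate cell state
    C_t = f_t * C_prev + s_t * C_hat         # cell state update
    g_t = sigmoid(W["g"] @ z + p["g"])       # output gate, Equation (3)
    r_t = g_t * np.tanh(C_t)                 # hidden state passed to step t+1
    return r_t, C_t
```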
In summary, LSTM is suitable for processing time series data, so this paper uses LSTM to establish a temperature prediction model. However, it is difficult for the LSTM model to process long-sequence data, so we introduce SA to solve this problem. This method considers both local and global information.

The SA layer consists of three components. Firstly, the data that come from the LSTM model are the input of the SA layer. Secondly, the matrices $q$, $k$, and $v$ are calculated using the weight matrices $W_q$, $W_k$, and $W_v$. Thirdly, $a_{1,2}$ is the dot product between $q_1$ and $k_2$, and $a_{2,2}$ is the dot product between $q_2$ and $k_2$. The attention matrix $M$ represents the correlation between different time steps. The structure is shown in Figure 4.
Figure 4. LSTM-SA structure.
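As a concrete illustration of this layer, the following Python sketch computes the attention matrix $M$ and the outputs $b_1, \dots, b_t$ from a sequence of LSTM outputs. The softmax normalization and $\sqrt{d_k}$ scaling are standard self-attention details that the figure does not spell out, so they are assumptions here.

```python
import numpy as np

def self_attention(H, W_q, W_k, W_v):
    """H: LSTM outputs, shape (t, d). Returns the attended outputs b_1..b_t.
    q, k, v come from the weight matrices W_q, W_k, W_v; entry a_{i,j} of the
    attention matrix M starts as the dot product between q_i and k_j."""
    q, k, v = H @ W_q, H @ W_k, H @ W_v
    scores = q @ k.T / np.sqrt(k.shape[-1])       # a_{i,j} = q_i . k_j (scaled)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    M = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
    return M @ v                                  # each b_i mixes all time steps

rng = np.random.default_rng(0)
H = rng.normal(size=(8, 16))                      # 8 time steps, 16 features
W = [rng.normal(size=(16, 16)) * 0.1 for _ in range(3)]
b = self_attention(H, *W)                         # shape (8, 16)
```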
where $t$ denotes the current iteration; $\vec{G}$ is a vector indicating the position; $\vec{G}^*$ is the position vector of the best solution acquired so far; and $\vec{A}$ and $\vec{C}$ are calculated as follows:

$$\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a} \tag{8}$$

$$\vec{C} = 2\vec{r}_2 \tag{9}$$

where $\vec{a}$ is an adjustment vector that decreases linearly from 2 to 0, and $\vec{r}_1$ and $\vec{r}_2$ are random vectors within the range $[0, 1]$.
original position and the position of the currently best-so-far whale. The equation for
calculation is as below:
$$\vec{a} = 2 \times \left(1 - \frac{t}{t_{\max}}\right) \tag{10}$$
(2) Spiral updating location: the WOA uses a spiral updating location to launch attacks on prey, and the spiral hunting equation is as below:

$$\vec{G}(t+1) = e^{bl}\cos(2\pi l)\cdot\left|\vec{G}^*(t) - \vec{G}(t)\right| + \vec{G}^*(t) \tag{11}$$

where $l$ is a random number within the interval $[-1, 1]$ and $b$ is a constant. The whales approach the prey using two mechanisms: a shrinking circle and a spiral-shaped path. The updated equations are as follows.
$$\vec{G}(t+1) = \begin{cases} \vec{G}^*(t) - \vec{A}\cdot\left|\vec{C}\cdot\vec{G}^*(t) - \vec{G}(t)\right|, & p < 0.5 \\ e^{bl}\cos(2\pi l)\cdot\left|\vec{G}^*(t) - \vec{G}(t)\right| + \vec{G}^*(t), & p \geq 0.5 \end{cases} \tag{12}$$
where $\vec{G}_{rand}$ denotes the random location of a whale.
In this paper, an adaptive selection threshold is used to replace the fixed threshold. The method automatically adjusts the threshold according to the problem's characteristics throughout the search process. The calculation is given by the following formula:

$$p_a = 1 - \frac{t}{(L+f)\,t_{\max}} \times \left[L \times \frac{e^t}{e^{t_{\max}}} + f \times \frac{t^f}{t_{\max}^f}\right] \tag{14}$$

where $t$ denotes the current iteration, while $t_{\max}$ denotes the maximum iteration count; $L$ and $f$ are control parameters, with values of 2 and 4, respectively.
In our method, the threshold is larger in the initial stage, so the whale preferentially chooses the encircling movement strategy. As iterations increase, the threshold decreases, and the whale becomes more likely to choose the spiral motion strategy. Equation (12) is updated to Equation (15).
$$\vec{G}(t+1) = \begin{cases} \vec{G}^*(t) - \vec{A}\cdot\left|\vec{C}\cdot\vec{G}^*(t) - \vec{G}(t)\right|, & p < p_a \\ e^{bl}\cos(2\pi l)\cdot\left|\vec{G}^*(t) - \vec{G}(t)\right| + \vec{G}^*(t), & p \geq p_a \end{cases} \tag{15}$$
(3) Adaptive parameter: in the traditional method, $\vec{a}$ decreases linearly from 2 to 0. To enhance the local search ability, this study uses a nonlinear strategy to adjust $b$ in Equation (16), which influences the shape of the logarithmic spiral. This can significantly improve the effectiveness of local search and the speed of global search, thereby enhancing overall accuracy [29]. At the same time, we establish a relationship between $b$ and $t$ to achieve adaptive adjustment. Equation (10) is updated to Equation (16).
where $k$ and $v$ are control parameters, with values of 4 and 10, respectively.

The IWOA flowchart is illustrated in Figure 5.
Figure 5. Flow chart of the IWOA.
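To tie Equations (8)–(15) together, the sketch below implements one IWOA position update for a single whale in Python. The adaptive threshold follows Equation (14) as reconstructed above; the nonlinear rule for $b$ (Equation (16)) is not shown in this excerpt, so a constant $b$ is used as a placeholder, and the search-for-prey branch driven by $\vec{G}_{rand}$ is omitted.

```python
import numpy as np

rng = np.random.default_rng(42)

def adaptive_threshold(t, t_max, L=2.0, f=4.0):
    """p_a of Equation (14): equals 1 at t = 0 and decays to 0 at t = t_max."""
    return 1.0 - t / ((L + f) * t_max) * (L * np.exp(t - t_max)
                                          + f * (t / t_max) ** f)

def iwoa_update(G, G_best, t, t_max, b=1.0):
    """One position update per Equation (15) for a whale at G."""
    a = 2.0 * (1.0 - t / t_max)               # Equation (10); Equation (16)
                                              # replaces this with a nonlinear rule
    A = 2.0 * a * rng.random(G.shape) - a     # Equation (8)
    C = 2.0 * rng.random(G.shape)             # Equation (9)
    if rng.random() < adaptive_threshold(t, t_max):
        return G_best - A * np.abs(C * G_best - G)           # encircling prey
    l = rng.uniform(-1.0, 1.0)
    D = np.abs(G_best - G)                                   # distance to best whale
    return D * np.exp(b * l) * np.cos(2 * np.pi * l) + G_best  # spiral, Eq. (11)
```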
Table 1. Pearson correlation coefficients of the input features (Dataset 1).

| | AI | BI | CI | P | Q | AU | BU | CU | T |
|---|---|---|---|---|---|---|---|---|---|
| AI | 1.000 | 0.999 | 0.999 | 0.999 | 0.925 | −0.862 | −0.866 | −0.835 | 0.371 |
| BI | 0.999 | 1.000 | 0.999 | 0.999 | 0.924 | −0.863 | −0.866 | −0.835 | 0.371 |
| CI | 0.999 | 0.999 | 1.000 | 0.999 | 0.925 | −0.862 | −0.866 | −0.835 | 0.371 |
| P | 0.999 | 0.999 | 0.999 | 1.000 | 0.925 | −0.857 | −0.859 | −0.828 | 0.369 |
| Q | 0.925 | 0.924 | 0.925 | 0.925 | 1.000 | −0.842 | −0.844 | −0.823 | 0.372 |
| AU | −0.862 | −0.863 | −0.862 | −0.857 | −0.842 | 1.000 | 0.979 | 0.964 | −0.346 |
| BU | −0.866 | −0.866 | −0.866 | −0.859 | −0.844 | 0.979 | 1.000 | 0.981 | −0.342 |
| CU | −0.835 | −0.835 | −0.835 | −0.828 | −0.823 | 0.964 | 0.981 | 1.000 | −0.339 |
| T | 0.371 | 0.371 | 0.371 | 0.369 | 0.372 | −0.346 | −0.342 | −0.339 | 1.000 |
As shown in Table 1, the correlation coefficient between the top-oil temperature and
the high-voltage side three-phase current is 0.371, and the correlation coefficients with
active power and reactive power are 0.369 and 0.372, respectively, indicating a positive
correlation. The correlation coefficients between the top-oil temperature and the high-
voltage side three-phase voltage are −0.346, −0.342, and −0.339, respectively, indicating a
negative correlation with the top-oil temperature. This also suggests that the high-voltage
side three-phase voltage, current, and active and reactive power have some influence on
the transformer oil temperature. Similarly, a correlation analysis of the input features of
Dataset 2 based on the Pearson correlation coefficient method is conducted. Ultimately,
this paper selects high-voltage-side current, active and reactive power, voltage, and top-oil
temperature as input features. The dataset is split into training and test sets, in which 80%
is used for training and 20% for testing.
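A minimal sketch of this preprocessing step in Python (pandas is assumed, and the file name is hypothetical; the paper does not publish its pipeline):

```python
import pandas as pd

# Columns mirror Table 1: three-phase currents (AI, BI, CI), active/reactive
# power (P, Q), three-phase voltages (AU, BU, CU), and top-oil temperature (T).
df = pd.read_csv("transformer_dataset1.csv")     # hypothetical file name

corr_with_T = df.corr(method="pearson")["T"]     # Pearson coefficients vs. T
print(corr_with_T.round(3))                      # e.g. AI ~ 0.371, AU ~ -0.346

split = int(len(df) * 0.8)                       # chronological 80/20 split
train, test = df.iloc[:split], df.iloc[split:]
```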
In Table 2, the optimal value reaches 0 for the F5, F6, and F8 functions, and the average values also show significant improvement. As shown in Figure 6, the IWOA exhibits better convergence performance than the traditional algorithms. These findings confirm the effectiveness of the enhancement strategies for the WOA.

Table 2. Comparison of test results for each algorithm.

| Function | Evaluation Index | GA | PSO | WOA | IWOA |
|---|---|---|---|---|---|
| F1 | Mean | 3602.311 | 0.035 | 7.21 × 10⁻¹⁰ | 1.46 × 10⁻¹⁹ |
| F1 | Best | 1454.955 | 0.001 | 3.32 × 10⁻¹³ | 1.17 × 10⁻²⁴ |
| F2 | Mean | 21.197 | 32.013 | 5.16 × 10⁻⁹ | 1.73 × 10⁻¹³ |
| F2 | Best | 13.936 | 0.081 | 5.12 × 10⁻⁹ | 2.24 × 10⁻¹⁵ |
| F3 | Mean | 3477.958 | 0.047 | 8.98 × 10⁻¹⁰ | 4.16 × 10⁻²⁰ |
| F3 | Best | 1771.241 | 0.001 | 1.68 × 10⁻¹² | 1.42 × 10⁻²² |
| F4 | Mean | 1.432 | 5.176 | 0.015 | 0.00075 |
| F4 | Best | 0.413 | 0.065 | 0.003 | 0.00014 |
| F5 | Mean | 28.474 | 51.152 | 0 | 0 |

4.3. One-Step Prediction

Single-step oil temperature prediction involves forecasting the transformer's top-oil temperature for the next time step using historical data. In this experiment, the prediction is for 30 min into the future. To balance the training and testing errors, we introduced L2 regularization and dropout during model training. Specifically, a dropout rate of 0.1 was applied, along with L2 regularization using a factor of 0.01. The prediction results for Dataset 1, demonstrating the effectiveness of the method, are presented in Figure 7. To further illustrate the trade-off between training and testing errors, Figure 8 provides a comparison of the training and testing errors.
Figure 7. The prediction results of IWOA-LSTM-SA.
Figure 8. Training and testing errors over iterations.
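For concreteness, here is one way to wire up an LSTM-SA predictor with the stated regularization settings (dropout 0.1, L2 factor 0.01). This is a sketch in PyTorch, which the paper does not name; the hidden size, learning rate, and single-head attention are assumptions, and `weight_decay` stands in for the L2 penalty.

```python
import torch
import torch.nn as nn

class LSTMSA(nn.Module):
    """LSTM followed by a self-attention layer, mirroring the LSTM-SA model."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.drop = nn.Dropout(0.1)            # dropout rate stated in the text
        self.head = nn.Linear(hidden, 1)       # next-step top-oil temperature

    def forward(self, x):                      # x: (batch, window, n_features)
        h, _ = self.lstm(x)                    # local, step-by-step information
        h, _ = self.attn(h, h, h)              # global information via SA
        return self.head(self.drop(h[:, -1]))

model = LSTMSA(n_features=9)                   # 9 inputs as in Table 1
# weight_decay plays the role of the L2 regularization factor of 0.01
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.01)
loss_fn = nn.MSELoss()
```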
Theoretically, when there is a significant gap between training and test errors, it usually indicates over-fitting, where the model performs well on the training data but struggles to generalize to unseen data. As illustrated in Figure 8, both the training and test losses decrease rapidly during the initial epochs and then converge to similar values as training progresses. This suggests that we have achieved a well-balanced trade-off between training and testing errors. This balance was attained by applying regularization techniques, such as L2 regularization and dropout, which helped control model complexity, mitigate over-fitting, and enhance the model's generalization capabilities.

To assess the performance of this method, this paper compared it with benchmark methods, including the BP, gated recurrent unit (GRU), convolutional neural network (CNN), LSTM, LSTM-SA, and WOA-LSTM-SA models. To reduce accidental error, this paper conducted 10 repeated experiments and averaged the results to show the forecasting performance. Figure 9 displays the prediction results for each model on Dataset 1. It is evident that the proposed model shows the best prediction result compared to all benchmark models. The reason is that the proposed approach not only combines both local and global information but also utilizes the IWOA to determine the optimal hyper-parameters. Table 3 presents the comparative results.
Figure 9. Performance comparison across models.
Table 3. Model prediction evaluation indexes.

| Dataset | Model | RMSE | MAE | MAPE (%) | R² | Time (s) |
|---|---|---|---|---|---|---|
| Dataset 1 | BP | 1.698 | 1.228 | 2.581 | 0.825 | 13.287 |
| Dataset 1 | CNN | 1.646 | 1.170 | 2.462 | 0.836 | 32.317 |
| Dataset 1 | GRU | 1.553 | 1.011 | 2.144 | 0.854 | 96.109 |
| Dataset 1 | LSTM | 1.633 | 1.022 | 2.175 | 0.838 | 129.666 |
| Dataset 1 | LSTM-SA | 1.537 | 1.031 | 2.253 | 0.861 | 174.497 |
| Dataset 1 | WOA-LSTM-SA | 1.462 | 0.998 | 2.103 | 0.870 | 11,058.906 |
| Dataset 1 | IWOA-LSTM-SA | 1.438 | 0.989 | 2.089 | 0.873 | 10,083.375 |
| Dataset 2 | BP | 0.923 | 0.715 | 2.428 | 0.974 | 38.216 |
| Dataset 2 | CNN | 0.824 | 0.596 | 1.929 | 0.979 | 80.746 |
| Dataset 2 | GRU | 0.758 | 0.544 | 1.772 | 0.982 | 165.984 |
| Dataset 2 | LSTM | 0.874 | 0.643 | 2.129 | 0.977 | 234.946 |
| Dataset 2 | LSTM-SA | 0.809 | 0.576 | 1.890 | 0.980 | 383.995 |
| Dataset 2 | WOA-LSTM-SA | 0.757 | 0.535 | 1.739 | 0.982 | 13,016.477 |
| Dataset 2 | IWOA-LSTM-SA | 0.749 | 0.524 | 1.703 | 0.983 | 11,075.689 |
From Table 3, it is evident that our method does not have an advantage in terms of computation time compared to traditional machine learning models. Therefore, in scenarios where prediction accuracy is not a primary concern, traditional machine learning models can still be considered for top-oil temperature prediction of transformers. The prediction model proposed in this paper, however, places a greater emphasis on improving prediction accuracy. To analyze and compare each model more comprehensively, this paper includes a residual plot. Using Dataset 1 as an example, in the residual plot (Figure 10), the true values are shown on the horizontal axis, while the vertical axis represents the residual values (percentage).
Figure 10. Model residuals.
The residual percentage is relatively higher for the data between 30 and 43 °C and 55 to 60 °C. The reason is as follows: there are about 4000 sample points within the temperature range of 43 to 55 °C, whereas the temperature ranges of 30~43 °C and 55~60 °C each contain approximately 200 sample points. This unbalanced distribution leads to low accuracy on sparse samples.
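For reference, the evaluation indexes reported in Tables 3 and 5 can be computed as follows; standard definitions are assumed, since the paper does not restate them.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """RMSE, MAE, MAPE (%), and R-squared (assumed standard definitions)."""
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err / y_true)) * 100.0)
    r2 = 1.0 - float(np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    return rmse, mae, mape, r2
```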
and 1.634, representing reductions of 12.60% and 11.11% compared to the BP model, 7.61%
and 15.89% compared to the CNN model, 6.49% and 17.30% compared to the GRU model,
5.19% and 14.14% compared to the LSTM model, 4.56% and 12.82% compared to the LSTM-
SA model, and 3.06% and 1.80% compared to the WOA-LSTM-SA model. By analyzing the
multi-step prediction metrics, we conclude that the proposed model demonstrates good
performance across different prediction steps compared to traditional models.
Figure 11. Multi-step prediction performance comparison across models (one week). (a) 3-step prediction results for Dataset 1; (b) 5-step prediction results for Dataset 1; (c) 3-step prediction results for Dataset 2; (d) 5-step prediction results for Dataset 2.
Table 5. Multi-step prediction evaluation metrics.

| Step | Model | RMSE | MAE | MAPE (%) | Time (s) |
|---|---|---|---|---|---|
| 1 (30 min) | BP | 1.698 | 1.228 | 2.581 | 13.287 |
| 1 (30 min) | CNN | 1.646 | 1.170 | 2.462 | 32.317 |
| 1 (30 min) | GRU | 1.553 | 1.011 | 2.144 | 96.109 |
| 1 (30 min) | LSTM | 1.633 | 1.022 | 2.175 | 129.666 |
| 1 (30 min) | LSTM-SA | 1.537 | 1.031 | 2.253 | 174.497 |
| 1 (30 min) | WOA-LSTM-SA | 1.462 | 0.998 | 2.103 | 11,058.906 |
| 1 (30 min) | IWOA-LSTM-SA | 1.438 | 0.989 | 2.089 | 10,083.375 |
| 3 (90 min) | BP | 1.763 | 1.382 | 2.873 | 14.082 |
| 3 (90 min) | CNN | 1.652 | 1.221 | 2.557 | 22.572 |
| 3 (90 min) | GRU | 1.597 | 1.133 | 2.409 | 95.775 |
| 3 (90 min) | LSTM | 1.605 | 1.164 | 2.453 | 179.898 |
| 3 (90 min) | LSTM-SA | 1.562 | 1.162 | 2.448 | 229.012 |
| 3 (90 min) | WOA-LSTM-SA | 1.555 | 1.102 | 2.311 | 11,746.135 |
| 3 (90 min) | IWOA-LSTM-SA | 1.537 | 1.088 | 2.308 | 10,149.217 |

5. Conclusions

Oil temperature prediction can effectively prevent symmetrical and asymmetrical faults in transformers. This paper adopts a novel approach to improve the performance of top-oil temperature prediction during transformer operations. The proposed model has been tested using actual data, and some conclusions can be drawn as follows:

(1) To verify the efficacy of the IWOA, this paper conducts tests with eight test functions. The findings demonstrate that the IWOA outperforms GA, PSO, and WOA in terms of convergence speed and accuracy.

(2) To verify the effectiveness of the proposed model, extensive experiments were conducted using actual operating data. The experimental results indicate that the proposed approach outperforms current state-of-the-art methods. On Dataset 1, the model achieved reductions in RMSE of 15.31%, 12.64%, 7.41%, 11.94%, 6.44%, and 1.98% compared to the BP, CNN, GRU, LSTM, LSTM-SA, and WOA-LSTM-SA methods, respectively. Similarly, on Dataset 2, the model demonstrated significant improvements, with RMSE reductions of 18.85%, 9.09%, 1.19%, 14.29%, 7.42%, and 1.06% compared to the same benchmark methods.

(3) The proposed model performs effectively across various prediction steps compared to benchmark models. Specifically, for the 3-step prediction, the RMSE of the proposed model is 1.537 and 1.015 for Dataset 1 and Dataset 2, respectively, reflecting reductions of 12.83% and 38.65% compared to the BP model, 6.98% and 20.89% compared to the CNN model, 3.75% and 13.62% compared to the GRU model, 4.24% and 27.16% compared to the LSTM model, 1.60% and 17.93% compared to the LSTM-SA model, and 1.16% and 4.34% compared to the WOA-LSTM-SA model. For the 5-step prediction, the RMSE of the proposed model is 1.714 and 1.634, representing reductions of 12.60% and 11.11% compared to the BP model, 7.61% and 15.89% compared to the CNN model, 6.49% and 17.30% compared to the GRU model, 5.19% and 14.14% compared to the LSTM model, 4.56% and 12.82% compared to the LSTM-SA model, and 3.06% and 1.80% compared to the WOA-LSTM-SA model.
Author Contributions: D.Z. led the conceptualization, methodology, software development, and
original draft preparation. Validation was carried out by D.Z., H.X. and H.Q., while H.X. and D.Z.
handled formal analysis. H.Q. managed the investigation, and Z.H. and W.D. provided resources.
S.W. was responsible for data curation. Writing—review and editing involved D.Z., H.X., H.Q., Q.P.
and J.Y., with visualization by D.Z., H.X. and J.Y. Supervision was provided by D.Z. and H.Q., project
administration by Q.P. and S.W., and funding acquisition by D.Z. All authors have read and agreed
to the published version of the manuscript.
Funding: This work was supported by the Electric Power Research Institute of Yunnan Power Grid
Co., Ltd., Kunming, Yunnan, China (No. YNKJXM20220009).
Data Availability Statement: Data are contained in the article.
Conflicts of Interest: Authors Dexu Zou, Qingjun Peng, Shan Wang, Weiju Dai, and Zhihu Hong
were employed by the company China Southern Power Grid Yunnan Power Grid Co., Ltd. The
remaining authors declare that the research was conducted in the absence of any commercial or
financial relationships that could be construed as a potential conflict of interest.
Appendix A
Table A1 displays the ten test functions used in this study.
Table A1. Test functions.

| Function | Range |
|---|---|
| $F_1(x) = \sum_{n=1}^{k} x_n^2$ | [−100, 100] |
| $F_2(x) = \sum_{n=1}^{k} \lvert x_n \rvert + \prod_{n=1}^{k} \lvert x_n \rvert$ | [−10, 10] |
| $F_3(x) = \sum_{n=1}^{k} \left( \sum_{i=1}^{n} x_i \right)^2$ | [−100, 100] |
| $F_4(x) = \sum_{n=1}^{k} n x_n^4 + \mathrm{random}[0, 1)$ | [−1.28, 1.28] |
| $F_5(x) = 1 + \frac{1}{4000} \sum_{n=1}^{k} x_n^2 - \prod_{n=1}^{k} \cos\left( \frac{x_n}{\sqrt{n}} \right)$ | [−600, 600] |
| $F_6(x) = \sum_{n=1}^{k} \left[ x_n^2 - 10 \cos(2\pi x_n) + 10 \right]$ | [−5.12, 5.12] |
| $F_7(x) = 20 - 20 \exp\left( -0.2 \sqrt{\frac{1}{k} \sum_{n=1}^{k} x_n^2} \right) - \exp\left( \frac{1}{k} \sum_{n=1}^{k} \cos(2\pi x_n) \right) + e$ | [−32, 32] |
| $F_8(x) = \frac{\pi}{k} \left\{ 10 \sin^2(\pi y_1) + \sum_{n=1}^{k-1} (y_n - 1)^2 \left[ 1 + 10 \sin^2(\pi y_{n+1}) \right] + (y_k - 1)^2 \right\} + \sum_{n=1}^{k} \mu(x_n, 10, 100, 4)$ | [−50, 50] |
| $F_9(x) = \sum_{i=1}^{d} \left( -x_i \sin\sqrt{\lvert x_i \rvert} \right) + 418.98288727243369 \times d$ | [−500, 500] |
| $F_{10}(x) = \sum_{i=1}^{d} \left[ (\ln(x_i - 2))^2 + (\ln(10 - x_i))^2 \right] - \left( \prod_{i=1}^{d} x_i \right)^{0.2}$ | [2, 10] |
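As a usage note, these benchmarks are straightforward to reproduce; the sketch below implements F1 (sphere) and F7 (Ackley) from Table A1 in Python so an optimizer such as the IWOA can be checked against the known optimum of 0 at the origin.

```python
import numpy as np

def f1(x):
    """F1: sphere function, global minimum 0 at x = 0."""
    return float(np.sum(x ** 2))

def f7(x):
    """F7: Ackley function as listed in Table A1, global minimum 0 at x = 0."""
    return float(20.0 - 20.0 * np.exp(-0.2 * np.sqrt(np.mean(x ** 2)))
                 - np.exp(np.mean(np.cos(2.0 * np.pi * x))) + np.e)

x0 = np.zeros(30)
assert abs(f1(x0)) < 1e-12 and abs(f7(x0)) < 1e-12
```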
References
1. Xu, X.; He, Y.; Li, X.; Peng, F.; Xu, Y. Overload Capacity for Distribution Transformers with Natural-Ester Immersed High-
Temperature Resistant Insulating Paper. Power Sys. Technol. 2018, 42, 1001–1006.
2. Wang, S.; Gao, M.; Zhuo, R. Research on high efficient order reduction algorithm for temperature coupling simulation model of
transformer. High Volt. Appar. 2023, 59, 115–126.
3. Liu, X.; Xie, J.; Luo, Y. A novel power transformer fault diagnosis method based on data augmentation for KPCA and deep
residual network. Energy Rep. 2023, 9, 620–627. [CrossRef]
4. Chen, T.; Chen, Y.; Li, X. Prediction for dissolved gas concentration in power transformer oil based on CEEMDAN-SG-BiLSTM.
High Volt. Appar. 2023, 59, 168–175.
5. Zang, C.; Zeng, J.; Li, P. Intelligent diagnosis model of mechanical fault for power transformer based on SVM algorithm. High
Volt. Appar. 2023, 59, 216–222.
6. Ji, H.; Wu, X.; Wang, H. A New Prediction Method of Transformer Oil Temperature Based on C-Prophet. Adv. Power Syst. Hyd.
Eng. 2023, 39, 48–55.
7. Tan, F.; Xu, G.; Zhang, P. Research on Top Oil Temperature Prediction Method of Similar Day Transformer Based on Topsis and
Entropy Method. Elect. Power Sci. Eng. 2021, 37, 62–69.
8. Amoda, O.A.; Tylavsky, D.J.; McCulla, G.A.; Knuth, W.A. Acceptability of three transformer hottest-spot temperature models.
IEEE Trans. Power Deliv. 2011, 27, 13–22. [CrossRef]
9. Zhou, L.; Wang, J.; Wang, L.; Yuan, S.; Huang, L.; Wang, D.; Guo, L. A Method for Hot-Spot Temperature Prediction and Thermal
Capacity Estimation for Traction Transformers in High-Speed Railway Based on Genetic Programming. IEEE Trans. Transp.
Electrif. 2019, 5, 1319–1328. [CrossRef]
10. Deng, Y.; Ruan, J.; Quan, Y.; Gong, R.; Huang, D.; Duan, C.; Xie, Y. A Method for Hot Spot Temperature Prediction of a 10 kV
Oil-Immersed Transformer. IEEE Access 2019, 7, 107380. [CrossRef]
11. Zhao, B.; Zhang, X. Parameter Identification of Transformer Top Oil Temperature Model and Prediction of Top Oil Temperature. High Volt. Eng. 2004, 30, 9–10.
12. Wang, H.; Su, P.; Wang, X. Prediction of Surface Temperatures of Large Oil-Immersed Power Transformers. J. Tsinghua Univ. Sci.
Technol. 2005, 45, 569–572.
13. Tan, M.; Hu, C.; Chen, J.; Wang, L.; Li, Z. Multi-node load forecasting based on multi-task learning with modal feature extraction.
Eng. Appl. Artif. Intell. 2022, 112, 104856. [CrossRef]
14. Shang, Y.; Li, S. FedPT-V2G: Security enhanced federated transformer learning for real-time V2G dispatch with non-IID data.
Appl. Energy 2024, 358, 122626. [CrossRef]
15. Bai, M.; Yao, P.; Dong, H.; Fang, Z.; Jin, W.; Yang, X.; Liu, J.; Yu, D. Spatial-temporal characteristics analysis of solar irradiance
forecast errors in Europe and North America. Energy 2024, 297, 131187. [CrossRef]
16. Qing, H.; Jennie, S.; Daniel, J. Prediction of top-oil temperature for transformers using neural network. IEEE Trans. Power Deliv.
2000, 15, 1205–1211.
17. Tan, F.; Chen, H.; He, J. Top oil temperature forecasting of UHV transformer based on path analysis and similar time. Elect. Power
Autom. Equip. 2021, 41, 217–224.
18. Li, S.; Xue, J.; Wu, M.; Xie, R.; Jin, B.; Zhang, H.; Li, Q. Prediction of Transformer Top-oil Temperature with the Improved Weighted
Support Vector Regression Based on Particle Swarm Optimization. High Volt. Appar. 2021, 57, 103–109.
19. Tan, F.L.; Xu, G.; Li, Y.F.; Chen, H.; He, J.H. A method of transformer top oil temperature forecasting based on similar day and
similar hour. Elect. Power Eng. Tech. 2022, 41, 193–200.
20. Yi, Y. Research on Prediction Method of Transformer Top-Oil Temperature Based on Assisting Dispatchers in Decision-Making.
Master’s Thesis, Southwest Jiaotong University, Chengdu, China, 2017.
21. Gharehchopogh, F.S.; Gholizadeh, H. A comprehensive survey: Whale Optimization Algorithm and its applications. Swarm Evol.
Comput. 2019, 48, 1–24. [CrossRef]
22. Brodzicki, A.; Piekarski, M.; Jaworek-Korjakowska, J. The whale optimization algorithm approach for deep neural networks.
Sensors 2021, 21, 8003. [CrossRef] [PubMed]
23. Mostafa Bozorgi, S.; Yazdani, S. IWOA: An improved whale optimization algorithm for optimization problems. J. Comput. Des.
Eng. 2019, 6, 243–259. [CrossRef]
24. Naderi, E.; Azizivahed, A.; Asrari, A. A step toward cleaner energy production: A water saving-based optimization approach for
economic dispatch in modern power systems. Electr. Power Syst. Res. 2022, 204, 107689. [CrossRef]
25. Gao, W.; Liu, S.; Huang, L. Inspired artificial bee colony algorithm for global optimization problems. Acta Electron. Sin. 2012, 40,
2396.
26. Shi, X.; Li, M.; Wei, Q. Application of Quadratic Interpolation Whale Optimization Algorithm in Cylindricity Error evaluation.
Metrol. Meas. Tech. 2019, 46, 58–60.
27. He, Q.; Wei, K.; Xu, Q. Mixed strategy based improved whale optimization algorithm. Appl. Res. Comput. 2019, 36, 3647–3651.
28. Qiu, X.; Wang, R.; Zhang, W.; Zhang, Z.; Zhang, Q. Improved Whale Optimizer Algorithm Based on Hybrid Strategy. Comput.
Eng. Appl. 2022, 58, 70–78.
29. Chen, Y.; Han, B.; Xu, G.; Kan, Y.; Zhao, Z. Spatial Straightness Error Evaluation with Improved Whale Optimization Algorithm.
Mech. Sci. Technol. Aero. Eng. 2022, 41, 1102–1111.
30. Xu, J.; Yan, F. The Application of Improved Whale Optimization Algorithm in Power Load Dispatching. Oper. Res. Manag. Sci.
2020, 29, 149–159.
31. Naderi, E.; Mirzaei, L.; Pourakbari-Kasmaei, M.; Cerna, F.V.; Lehtonen, M. Optimization of active power dispatch considering
unified power flow controller: Application of evolutionary algorithms in a fuzzy framework. Evol. Intell. 2024, 17, 1357–1387.
[CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.