
Symmetry · Article

Top-Oil Temperature Prediction of Power Transformer Based on Long Short-Term Memory Neural Network with Self-Attention Mechanism Optimized by Improved Whale Optimization Algorithm
Dexu Zou 1,2 , He Xu 3 , Hao Quan 3, * , Jianhua Yin 3 , Qingjun Peng 2 , Shan Wang 2 , Weiju Dai 2 and Zhihu Hong 2

1 School of Electrical Engineering, Chongqing University, Chongqing 400044, China; [email protected]


2 Electric Power Research Institute, China Southern Power Grid Yunnan Power Grid Co., Ltd.,
Kunming 650217, China; [email protected] (Q.P.); [email protected] (S.W.); [email protected] (W.D.);
[email protected] (Z.H.)
3 School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China;
[email protected] (H.X.); [email protected] (J.Y.)
* Correspondence: [email protected]

Abstract: The operational stability of the power transformer is essential for maintaining the symmetry, balance, and security of power systems. Once the power transformer fails, it will lead to heightened instability within grid operations. Accurate prediction of oil temperature is crucial for efficient transformer operation. To address challenges such as the difficulty in selecting model hyperparameters and incomplete consideration of temporal information in transformer oil temperature prediction, a novel model is constructed based on the improved whale optimization algorithm (IWOA) and long short-term memory (LSTM) neural network with self-attention (SA) mechanism. To incorporate holistic and local information, the SA is integrated with the LSTM model. Furthermore, the IWOA is employed in the optimization of the hyper-parameters for the LSTM-SA model. The standard WOA is improved by incorporating adaptive parameters, thresholds, and a Latin hypercube sampling initialization strategy. The proposed method was applied and tested using real operational data from two transformers within a practical power grid. The results of the single-step prediction experiments demonstrate that the proposed method significantly improves the accuracy of oil temperature prediction for power transformers, with enhancements ranging from 1.06% to 18.85% compared to benchmark models. Additionally, the proposed model performs effectively across various prediction steps, consistently outperforming benchmark models.

Keywords: power transformer; top-oil temperature prediction; self-attention mechanism; whale optimization algorithm; long short-term memory networks

Citation: Zou, D.; Xu, H.; Quan, H.; Yin, J.; Peng, Q.; Wang, S.; Dai, W.; Hong, Z. Top-Oil Temperature Prediction of Power Transformer Based on Long Short-Term Memory Neural Network with Self-Attention Mechanism Optimized by Improved Whale Optimization Algorithm. Symmetry 2024, 16, 1382. https://doi.org/10.3390/sym16101382

Academic Editor: Hsien-Chung Wu
Received: 7 August 2024; Revised: 30 September 2024; Accepted: 13 October 2024; Published: 17 October 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Symmetry 2024, 16, 1382. https://doi.org/10.3390/sym16101382. https://www.mdpi.com/journal/symmetry

1. Introduction

Power transformers undertake a vital role in the symmetrical operation of power systems [1]. They serve as critical infrastructure for power transmission and distribution, with extensive applications in various other fields, such as transportation [2]. Once the power transformer fails, it can severely disrupt the normality of power system operation, potentially causing widespread power outages and significant economic losses [3]. As a vital component of the power system, the stable operation of the transformer is fundamental to maintaining the symmetry and balance of the power system [4,5].
Top oil temperature is significant for determining whether the transformer can maintain normal operation. In practice, internal transformer faults are judged from the trend of the oil temperature [6,7]. Therefore, good oil temperature prediction helps professionals find problems promptly in the transformer's daily operation and maintenance. By reliably forecasting oil temperature, we can not only prevent



unexpected failures but also optimize maintenance schedules, reduce operational risks,
and extend the transformer’s lifespan. Effective oil temperature prediction enhances the
overall reliability and efficiency of the power system, making it an essential component in
maintaining the symmetrical operation of the electrical grid.
Researchers generally study the prediction of transformer oil temperatures through
mathematical and data-driven models [8–10]. Zhao et al. used the least squares method
to establish a parameter identification algorithm [11], and this mathematical model can
effectively predict the top oil temperature but lacks strong generalization ability. Wang et al.
established a thermal circuit model to simulate the changes in transformer temperature
over time, but it requires lengthy computation [12].
With the development of intelligent algorithms, artificial intelligence technologies have
been applied to the field of power system forecasting. Interesting studies can be found in the
fields of load forecasting [13], vehicle-to-grid (V2G) scheduling prediction [14], and solar
irradiance forecasting [15]. There have been some research efforts focused on predicting
transformer oil temperature using these algorithms. Qing et al. developed a model based
on artificial neural networks for forecasting the top oil temperature of transformers [16],
and this model significantly reduces the computational time but ignores the selection
of optimal hyperparameters. Tan et al. proposed a forecast model that considers path
analysis and similar moments [17], but the validation dataset is small and the model's
adaptability is difficult to confirm. Li et al. introduced a regression model with enhanced
particle swarm optimization (PSO) for transformer top-oil temperature forecasting [18];
however, the large sampling interval of the data led to substandard performance. Tan et al.
also introduced a similar-day method to predict top oil temperature, but relying solely on
single-day similarity degrades the model's prediction performance [19]. To sum up, these
studies do not fully consider the temporal information of different input features and thus
fail to combine global and local information within transformer operational data. In
addition, the optimal hyper-parameters of such models are difficult to determine.
To tackle the issues mentioned, this paper introduces a novel method: a long short-term
memory (LSTM) neural network with a self-attention (SA) mechanism, optimized by an
improved whale optimization algorithm (IWOA). The proposed method comprehensively
addresses challenges related to the difficulty in selecting hyperparameters for the oil
temperature prediction model and the insufficient consideration of temporal information. It
integrates SA with LSTM and utilizes the IWOA to obtain the optimal hyper-parameters for
the LSTM-SA model, resulting in high prediction accuracy. Finally, the proposed method is
tested with actual operating data in a practical power grid. The results demonstrate that
the proposed method has better forecasting performance.
The remaining sections of this paper are organized as follows: Section 2 discusses the power
transformer and top-oil temperature. Section 3 introduces the LSTM-SA model and the IWOA.
Section 4 presents a case study that shows the superiority of the IWOA for optimization
and the effectiveness of the proposed method for predicting top-oil temperature. Finally,
conclusions and discussions are presented in Section 5.

2. Power Transformer and Top-Oil Temperature


The top oil temperature of a transformer is a crucial indicator for measuring the reliability
of transformer operation and monitoring the internal insulation status. Accurately
predicting the top oil temperature of the power transformer is of great significance for
analyzing potential faults, carrying out transformer operation and maintenance, main-
taining the symmetry and balance of the power system, and achieving early warning of
transformer failures. It is a key factor in limiting the transformer’s load capacity and
assessing its operational lifespan.
There are two merits to considering top oil temperature as the subject of study. First, researchers can easily access real-time monitoring data for the transformer's top oil temperature, thanks to advanced sensor technologies and the widespread implementation of smart grids. This accessibility facilitates continuous monitoring and data collection, which are essential for accurate prediction and timely intervention. Second, the hot spot temperature that is difficult to obtain can be calculated from the transformer top oil temperature. Hot spot temperature is crucial, as it represents the highest temperature within the transformer and is a direct indicator of the condition of the transformer's insulation. Accurate estimation of this temperature is vital for predicting the remaining life of the insulation and planning maintenance activities.
The above advantages have made the top oil temperature highly favored by researchers, and it has now become a hot research topic [20]. The basic construction of an oil-immersed transformer is graphically represented in Figure 1. This paper focuses on improving the accuracy of oil temperature prediction, particularly in addressing the challenges posed by the nonlinearity and time-series characteristics of the data.

Figure 1. The basic construction of an oil-immersed transformer.

3. The Proposed IWOA-LSTM-SA Method for Top-Oil Temperature Prediction

3.1. Framework

In this study, IWOA-LSTM-SA has been developed for transformer oil temperature forecasting, in which the IWOA is employed to precisely search for the optimal input hyper-parameters and the LSTM-SA serves as the forecasting model to combine global and local information. The flowchart is presented in Figure 2.

Figure 2. Flow chart of IWOA-LSTM-SA. (Data preprocessing: transformer historical data, such as the three-phase voltage, three-phase current, and oil temperature, are normalized and divided into a training set and a test set. IWOA optimization framework: the framework parameters are set, the WOA population is initialized, fitness values are calculated for all individuals, and individual positions are updated until the maximum number of iterations is reached, outputting the global optimal location. Optimized LSTM-SA operation: the model is set with the obtained parameters, trained, and used to make predictions on the test set.)

The main phases of the IWOA-LSTM-SA will be detailed in the following sections.

3.2. LSTM Integrated by SA


LSTM is a specialized type of recurrent neural network (RNN), specifically designed to process temporal data sequences. On the basis of the traditional RNN, LSTM introduces the concept of "gating", which not only overcomes the vanishing gradient problem but also selects samples. Therefore, LSTM is more suitable for solving nonlinear temporal structure problems. Each memory block of an LSTM comprises one or more self-connected memory cells and three gating units: the input gate, the output gate, and the forget gate. The specific structure of the gates is shown in Figure 3. The forget gate is responsible for deciding which information should be discarded from the cell state, effectively determining the extent to which the previous cell state is preserved within the current cell state. The calculation equation is as below:

$$ m_t = \sigma(W_m \cdot [r_{t-1}, x_t] + p_m) \tag{1} $$

The input gate controls which parts of the current input are stored in the cell state. The formula for the input gate is as below:

$$ s_t = \sigma(W_s \cdot [r_{t-1}, x_t] + p_s) \tag{2} $$

The output gate regulates the current output and decides the output information. The formulas for calculation are given below:

$$ g_t = \sigma(W_g \cdot [r_{t-1}, x_t] + p_g) \tag{3} $$

$$ r_t = g_t \cdot \tanh(C_t) \tag{4} $$

The formulas for calculating the candidate cell state and the cell state are as below:

$$ \tilde{C}_t = \tanh(W_C \cdot [r_{t-1}, x_t] + p_C) \tag{5} $$

$$ C_t = m_t \cdot C_{t-1} + s_t \cdot \tilde{C}_t \tag{6} $$

Figure 3. LSTM structure diagram.
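To make the gate equations concrete, here is a minimal single-step LSTM cell in NumPy following Equations (1)-(6). This is an illustrative sketch, not the authors' implementation; the function name `lstm_cell_step`, the layer sizes, and the random weights are invented for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, r_prev, C_prev, params):
    """One LSTM time step following Equations (1)-(6): forget gate m_t,
    input gate s_t, output gate g_t, candidate state, cell state C_t,
    and hidden state r_t."""
    z = np.concatenate([r_prev, x_t])                   # [r_{t-1}, x_t]
    m_t = sigmoid(params["Wm"] @ z + params["pm"])      # Eq. (1), forget gate
    s_t = sigmoid(params["Ws"] @ z + params["ps"])      # Eq. (2), input gate
    g_t = sigmoid(params["Wg"] @ z + params["pg"])      # Eq. (3), output gate
    C_tilde = np.tanh(params["WC"] @ z + params["pC"])  # Eq. (5), candidate
    C_t = m_t * C_prev + s_t * C_tilde                  # Eq. (6), cell state
    r_t = g_t * np.tanh(C_t)                            # Eq. (4), hidden state
    return r_t, C_t

# Toy sizes: 3 input features, 4 hidden units (arbitrary for illustration)
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
params = {name: 0.1 * rng.standard_normal((n_hid, n_hid + n_in))
          for name in ("Wm", "Ws", "Wg", "WC")}
params.update({name: np.zeros(n_hid) for name in ("pm", "ps", "pg", "pC")})

r_t, C_t = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.standard_normal((5, n_in)):              # five time steps
    r_t, C_t = lstm_cell_step(x_t, r_t, C_t, params)
print(r_t.shape)
```

Because the hidden state is an output-gated tanh of the cell state, every entry of r_t stays strictly inside (-1, 1), which is one reason LSTM states remain numerically well behaved over long sequences.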
In summary, LSTM is suitable for processing time-series data, so this paper uses LSTM to establish a temperature prediction model. Furthermore, it is difficult for the LSTM model to process long sequence data, so we introduce SA to solve this problem. This method considers both local and global information.
It consists of three components. Firstly, the data that come from the LSTM model are the input of the SA layer. Secondly, the matrices $q$, $k$, and $v$ are calculated using the weight matrices $W_q$, $W_k$, and $W_v$. Thirdly, $a^{1,2}$ is the dot product between $q_1$ and $k_2$, and $a^{2,2}$ is the dot product between $q_2$ and $k_2$. The attention matrix $M$ represents the correlation between different time steps. The structure is shown in Figure 4.
Figure 4. LSTM-SA structure.
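The three SA components above can be sketched as plain dot-product self-attention over the LSTM outputs. This is an illustrative NumPy sketch; the scaling by the square root of the width is a common convention and an assumption here (the text does not spell it out), and `self_attention` plus the toy dimensions are invented for the example.

```python
import numpy as np

def self_attention(H, Wq, Wk, Wv):
    """Self-attention over LSTM outputs H of shape (T, d): build q, k, v
    with the weight matrices, take the dot products between q_i and k_j,
    softmax them row-wise into the attention matrix M, and mix v."""
    q, k, v = H @ Wq, H @ Wk, H @ Wv
    scores = q @ k.T / np.sqrt(k.shape[1])     # a^{i,j} = q_i . k_j (scaled)
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    M = e / e.sum(axis=1, keepdims=True)       # attention matrix M
    return M @ v, M                            # outputs b_t and M

rng = np.random.default_rng(1)
T, d = 6, 4                                    # six time steps, width four
H = rng.standard_normal((T, d))                # stand-in for LSTM outputs
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out, M = self_attention(H, Wq, Wk, Wv)
print(out.shape, M.shape)
```

Each row of M sums to one, so row i of the output is a weighted mixture of all time steps, which is how the layer injects global context into the locally recurrent LSTM features.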

3.3. Hyper-Parameter Optimization by IWOA

The Whale Optimization Algorithm (WOA) was introduced by Mirjalili et al. to deal with intricate optimization problems [21,22]. The WOA can be formulated as the following steps: encircling prey, the bubble-net attacking method, and the search for prey.
3.3.1. Encircling Prey

Humpback whales can identify and encircle their prey. In the population, the remaining whales will try to adjust their positions towards the direction of the best search agent, as defined by the equation:

$$ \vec{G}(t+1) = \vec{G}^*(t) - \vec{A} \cdot \left| \vec{C}\,\vec{G}^*(t) - \vec{G}(t) \right| \tag{7} $$

where $t$ denotes the current iteration; $\vec{G}$ is a vector indicating the position; $\vec{G}^*$ is the position vector of the best solution acquired yet; and $\vec{A}$ and $\vec{C}$ are calculated from the following:

$$ \vec{A} = 2\vec{a}\,\vec{r}_1 - \vec{a} \tag{8} $$

$$ \vec{C} = 2\vec{r}_2 \tag{9} $$

where $\vec{a}$ is an adjustment vector whose magnitude decreases linearly from 2 to 0, and $\vec{r}_1$ and $\vec{r}_2$ are random vectors that fall within the range of [0, 1].

3.3.2. Bubble-Net Attacking Method

Humpback whale predation consists of two main mechanisms: the shrinkage bracketing mechanism and the spiral updating location.
(1) Shrinkage bracketing mechanism: as $\vec{a}$ decreases, $\vec{A}$ takes any value within the range of [−1, 1]. The new position is determined by the distance between the whale's original position and the position of the currently best-so-far whale. The equation for calculation is as below:

$$ \vec{a} = 2 \times \left( 1 - \frac{t}{t_{max}} \right) \tag{10} $$

(2) Spiral updating location: the WOA uses the spiral updating location to launch attacks on prey, and the spiral hunting equation is as below:

$$ \vec{G}(t+1) = e^{bl} \cos(2\pi l) \cdot \left| \vec{G}^*(t) - \vec{G}(t) \right| + \vec{G}^*(t) \tag{11} $$

where $l$ is a random number within the interval [−1, 1] and $b$ is a constant. The whales approach the prey using the two mechanisms, a shrinking circle and a spiral-shaped path, with equal probability. The updated equation is as follows:

$$ \vec{G}(t+1) = \begin{cases} \vec{G}^*(t) - \vec{A} \cdot \left| \vec{C} \cdot \vec{G}^*(t) - \vec{G}(t) \right|, & p < 0.5 \\ e^{bl} \cos(2\pi l) \cdot \left| \vec{G}^*(t) - \vec{G}(t) \right| + \vec{G}^*(t), & p \geq 0.5 \end{cases} \tag{12} $$

where $p$ falls within the range of [0, 1].

3.3.3. Search for Prey

Humpback whales search for their prey randomly, with their locations varying relative to each other. In this stage, the position of a searching whale is modified according to the position of a randomly selected whale, as opposed to being updated based on the current best whale. The calculation formula is as listed below:

$$ \vec{G}(t+1) = \vec{G}_{rand}(t) - \vec{A} \cdot \left| \vec{C}\,\vec{G}_{rand}(t) - \vec{G}(t) \right| \tag{13} $$

where $\vec{G}_{rand}$ denotes the random location of a whale.
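Putting Equations (7)-(13) together, a minimal standard WOA loop might look as follows. This is a sketch under assumptions, not the authors' code: the bounds, population size, switching rule on |A|, b = 1, and the sphere test function are arbitrary illustrative choices (the paper's experiments use b = 10).

```python
import numpy as np

def woa_minimize(f, dim, n_whales=20, t_max=200, b=1.0, seed=0):
    """Minimal standard WOA following Equations (7)-(13): encircling prey,
    spiral updating, and random search, with a decaying linearly 2 -> 0."""
    rng = np.random.default_rng(seed)
    G = rng.uniform(-5.0, 5.0, (n_whales, dim))      # whale positions
    best = min(G, key=f).copy()                      # best-so-far G*
    for t in range(t_max):
        a = 2.0 * (1.0 - t / t_max)                  # Eq. (10)
        for i in range(n_whales):
            A = 2.0 * a * rng.random(dim) - a        # Eq. (8)
            C = 2.0 * rng.random(dim)                # Eq. (9)
            if rng.random() < 0.5:                   # fixed threshold of Eq. (12)
                if np.all(np.abs(A) < 1.0):          # exploit: encircle G*, Eq. (7)
                    G[i] = best - A * np.abs(C * best - G[i])
                else:                                # explore: random whale, Eq. (13)
                    G_rand = G[rng.integers(n_whales)]
                    G[i] = G_rand - A * np.abs(C * G_rand - G[i])
            else:                                    # spiral update, Eq. (11)
                l = rng.uniform(-1.0, 1.0)
                G[i] = np.abs(best - G[i]) * np.exp(b * l) * np.cos(2 * np.pi * l) + best
        cand = min(G, key=f)
        if f(cand) < f(best):                        # keep the best-so-far whale
            best = cand.copy()
    return best

sphere = lambda x: float(np.sum(x ** 2))             # simple convex test function
best = woa_minimize(sphere, dim=5)
print(round(sphere(best), 8))
```

Because the best-so-far position is only replaced when a strictly better candidate appears, the reported fitness is monotonically non-increasing over iterations.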

3.3.4. Improved Whale Optimization Algorithm


The original WOA faces certain limitations, particularly in terms of inadequate local
search capabilities and insufficient population diversity. Therefore, it is necessary to
further improve the strategy and adjust the algorithm [23]. For example, Naderi et al.
proposed a Whale Optimization Algorithm enhanced by wavelet mutation, aimed at
improving the algorithm’s convergence characteristics to address the complex trade-off
between generation costs and water consumption [24]. This study takes a different
direction by introducing three key improvements: Latin Hypercube Sampling
for more diverse and uniform population initialization, an adaptive selection threshold to
dynamically adjust the whale’s movement strategy, and a nonlinear parameter adjustment
to enhance local search capabilities. These modifications are designed to address different
aspects of the original WOA’s limitations. The specific improvements are as follows:
(1) Latin Hypercube Sampling (LHS) initialization of population: as stated in [25], popu-
lation initialization plays a crucial role in swarm intelligence optimization algorithms.
In WOA, population initialization follows a random approach. However, it can lead
to uneven population distribution and individual overlap [26]. Therefore, it is neces-
sary to optimize the population initialization. IWOA incorporates LHS to increase
the diversity of the initial population, and this method can initialize the population
more uniformly and efficiently.
(2) Adaptive selection threshold: in WOA, the whales choose either the encircling activity or
the spiral movement with 50% probability. However, this method prevents the whale pop-
ulation from choosing the appropriate movement for the current population [27,28]. In
this paper, an adaptive selection threshold is used to replace the fixed threshold. The
method automatically adjusts the threshold according to the problem's characteristics
throughout the search process. The calculation is given by the following formula:

$$ p_a = 1 - \frac{t}{(L+f)\,t_{max}} \times \left( L \times \frac{e^t}{e^{t_{max}}} + f \times \frac{tf}{t_{max}f} \right) \tag{14} $$

where $t$ denotes the current iteration, while $t_{max}$ denotes the maximum iteration count; $L$ and $f$ are control parameters, and their values are 2 and 4, respectively.
In our method, the threshold is larger in the initial stage, so the whale will preferentially choose the encircling movement strategy. As the number of iterations increases, the threshold decreases, and thus the whale is more likely to choose the spiral motion strategy.
Equation (12) is updated to Equation (15):

$$ \vec{G}(t+1) = \begin{cases} \vec{G}^*(t) - \vec{A} \cdot \left| \vec{C} \cdot \vec{G}^*(t) - \vec{G}(t) \right|, & p < p_a \\ e^{bl} \cos(2\pi l) \cdot \left| \vec{G}^*(t) - \vec{G}(t) \right| + \vec{G}^*(t), & p \geq p_a \end{cases} \tag{15} $$

(3) Adaptive parameter: in the traditional method, $a$ decreases linearly from 2 to 0. In order to enhance the local searching ability, this study uses a nonlinear strategy to adjust $a$, and also adjusts $b$, which influences the shape of the logarithmic spiral. This can significantly improve the effectiveness of local search and the speed of global search, thereby enhancing overall accuracy [29]. At the same time, we establish a relationship between $b$ and $t$ to achieve adaptive adjustment. Equation (10) is updated to Equation (16):

$$ \begin{cases} a(t) = 2 \times \left( 1 - \tanh\left( k \sqrt{\frac{t}{t_{max}}} \right) \right) \\ b(t) = v - v \times \frac{t}{t_{max}} \end{cases} \tag{16} $$

where $k$ and $v$ are control parameters, and their values are 4 and 10, respectively.
respectively.
The IWOA flowchart is illustrated in Figure
The IWOA flowchart is illustrated in Figure 5. 5.

Figure 5. Flow chart of the IWOA.
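The three improvements can be sketched in isolation as follows. This is a hedged illustration: the helper names are invented, the exact algebraic form of `adaptive_threshold` follows the reading of Equation (14) given above, and e^t / e^{t_max} is computed as exp(t − t_max) purely for numerical safety.

```python
import numpy as np

def lhs_init(n, dim, low, high, rng):
    """Latin hypercube sampling: each dimension is cut into n strata and
    every stratum is sampled exactly once, spreading the initial whales
    more uniformly than plain uniform sampling."""
    strata = np.stack([rng.permutation(n) for _ in range(dim)], axis=1)
    u = (strata + rng.random((n, dim))) / n          # one point per stratum
    return low + u * (high - low)

def adaptive_threshold(t, t_max, L=2, f=4):
    """Adaptive selection threshold p_a of Eq. (14): decays from 1 to 0,
    so whales favour the encircling move early and the spiral move late.
    exp(t - t_max) equals e^t / e^{t_max} without overflow."""
    return 1.0 - t / ((L + f) * t_max) * (L * np.exp(t - t_max) + f * t / t_max)

def adaptive_params(t, t_max, k=4, v=10):
    """Nonlinear a(t) and adaptive spiral-shape parameter b(t) of Eq. (16)."""
    a = 2.0 * (1.0 - np.tanh(k * np.sqrt(t / t_max)))
    b = v - v * t / t_max
    return a, b

rng = np.random.default_rng(2)
pop = lhs_init(20, 3, low=-5.0, high=5.0, rng=rng)   # 20 whales in 3-D
print(pop.shape)                                     # (20, 3)
print(round(adaptive_threshold(0, 200), 3),          # starts at 1.0 ...
      round(adaptive_threshold(200, 200), 3))        # ... and decays to 0.0
```

The threshold replacing the fixed 0.5 of Equation (12) starts at 1 (pure encircling) and reaches 0 at the final iteration (pure spiral), matching the behaviour described in the text.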

4. Case Studies and Results Analysis


4.1. Data Source
This study includes two datasets. Dataset 1 consists of transformer operation data collected from a 500 kV substation from 1 April to 30 June 2022, with a sampling period of half an hour. In total, there are 4368 samples. The characteristic parameters include the high-voltage-side three-phase current (I_A, I_B, I_C), active and reactive power (P, Q), high-voltage-side three-phase voltage (U_A, U_B, U_C), and top-oil temperature (T). This paper used the Pearson correlation coefficient method to select features, and the results are shown in Table 1. Dataset 2 consists of transformer operation data collected from a 220 kV substation from 10 February 2021 to 10 February 2022, with a sampling period of half an hour. In total, there are 17,518 samples.

Table 1. Correlation matrix.

      I_A     I_B     I_C     P       Q       U_A     U_B     U_C     T
I_A   1.000   0.999   0.999   0.999   0.925   −0.862  −0.866  −0.835  0.371
I_B   0.999   1.000   0.999   0.999   0.924   −0.863  −0.866  −0.835  0.371
I_C   0.999   0.999   1.000   0.999   0.925   −0.862  −0.866  −0.835  0.371
P     0.999   0.999   0.999   1.000   0.925   −0.857  −0.859  −0.828  0.369
Q     0.925   0.924   0.925   0.925   1.000   −0.842  −0.844  −0.823  0.372
U_A   −0.862  −0.863  −0.862  −0.857  −0.842  1.000   0.979   0.964   −0.346
U_B   −0.866  −0.866  −0.866  −0.859  −0.844  0.979   1.000   0.981   −0.342
U_C   −0.835  −0.835  −0.835  −0.828  −0.823  0.964   0.981   1.000   −0.339
T     0.371   0.371   0.371   0.369   0.372   −0.346  −0.342  −0.339  1.000

As shown in Table 1, the correlation coefficient between the top-oil temperature and
the high-voltage side three-phase current is 0.371, and the correlation coefficients with
active power and reactive power are 0.369 and 0.372, respectively, indicating a positive
correlation. The correlation coefficients between the top-oil temperature and the high-
voltage side three-phase voltage are −0.346, −0.342, and −0.339, respectively, indicating a
negative correlation with the top-oil temperature. This also suggests that the high-voltage
side three-phase voltage, current, and active and reactive power have some influence on
the transformer oil temperature. Similarly, a correlation analysis of the input features of
Dataset 2 based on the Pearson correlation coefficient method is conducted. Ultimately,
this paper selects high-voltage-side current, active and reactive power, voltage, and top-oil
temperature as input features. The dataset is split into training and test sets, in which 80%
is used for training and 20% for testing.
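The Pearson-based screening behind Table 1 can be illustrated as follows. The data here are synthetic stand-ins invented for the example (the substation records are not included in this excerpt), so the coefficients only mimic the signs reported above, not the exact values.

```python
import numpy as np

# Synthetic stand-ins: correlate each candidate input with the top-oil
# temperature T. A shared "load" driver makes current and power correlate
# positively with T and voltage negatively, as in Table 1.
rng = np.random.default_rng(3)
n = 500
load = rng.random(n)                                  # hidden common driver
features = {
    "I_A": load + 0.05 * rng.standard_normal(n),      # phase-A current proxy
    "P":   load + 0.10 * rng.standard_normal(n),      # active power proxy
    "U_A": -load + 0.20 * rng.standard_normal(n),     # phase-A voltage proxy
}
T = 0.8 * load + 0.2 * rng.standard_normal(n)         # top-oil temperature proxy

for name, x in features.items():
    r = np.corrcoef(x, T)[0, 1]                       # Pearson coefficient
    print(f"{name}: r = {r:+.2f}")
```

Features whose coefficient magnitude is too small would be dropped at this stage; in the paper all of the current, power, and voltage channels are retained as inputs.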

4.2. Comparison of Algorithm Optimization Results


This paper compared the performance of IWOA with traditional methods, which
consist of GA, PSO, and the original WOA. Appendix A, Table A1 presents the ten test
functions employed for evaluation, which are derived from the studies conducted in [30,31].
In Appendix A, Table A1, each function has a dimension of 30 and a minimum value of 0. To ensure the fairness of the comparison, the maximum number of iterations is set to 500. The crossover probability of GA is set to 1, and the mutation probability is 0.1. Meanwhile, the learning factors are c1 = c2 = 2 for PSO, and b is 10 for WOA. Each algorithm runs independently 30 times.
The average and the best results are utilized for comparison, as shown in Table 2. The
average convergence curve of each algorithm is shown in Figure 6.
In Table 2, the optimal value reaches 0 in the F5 , F6 and F8 functions, and the average
values also show significant improvement. As shown in Figure 6, IWOA exhibits better
convergence performance compared to traditional algorithms. These findings confirm the
effectiveness of the enhancement strategies for WOA.
Table 2. Comparison of test results for each algorithm.

Function  Index  GA        PSO      WOA            IWOA
F1        Mean   3602.311  0.035    7.21 × 10^−10  1.46 × 10^−19
F1        Best   1454.955  0.001    3.32 × 10^−13  1.17 × 10^−24
F2        Mean   21.197    32.013   5.16 × 10^−9   1.73 × 10^−13
F2        Best   13.936    0.081    5.12 × 10^−9   2.24 × 10^−15
F3        Mean   3477.958  0.047    8.98 × 10^−10  4.16 × 10^−20
F3        Best   1771.241  0.001    1.68 × 10^−12  1.42 × 10^−22
F4        Mean   1.432     5.176    0.015          0.00075
F4        Best   0.413     0.065    0.003          0.00014
F5        Mean   28.474    51.152   0              0
F5        Best   5.522     0        0              0
F6        Mean   91.831    127.257  0.462          1.78 × 10^−16
F6        Best   64.795    69.170   6.78 × 10^−11  0
F7        Mean   11.337    2.028    3.936          1.49 × 10^−11
F7        Best   9.197     0.023    8.06 × 10^−7   1.35 × 10^−12
F8        Mean   77.000    551.976  0.988          0
F8        Best   35.494    185.625  0              0
F9        Mean   75.910    727.867  −0.898         −0.829
F9        Best   28.593    479.302  −0.967         −0.986
F10       Mean   73.449    596.665  −0.890         −0.796
F10       Best   26.910    332.989  −0.980         −0.899
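The evaluation protocol (30 independent runs per algorithm, reporting the mean and the best result) can be sketched generically. `evaluate` and `fake_opt` are hypothetical stand-ins invented for illustration, not any of the compared algorithms.

```python
import numpy as np

def evaluate(optimizer, n_runs=30):
    """Run an optimizer 30 times with different seeds and report the
    mean and the best (minimum) final fitness, as in Table 2."""
    results = [optimizer(seed=s) for s in range(n_runs)]
    return float(np.mean(results)), float(np.min(results))

# Hypothetical stand-in optimizer: returns a random "final fitness".
fake_opt = lambda seed: float(np.random.default_rng(seed).random())

mean_fit, best_fit = evaluate(fake_opt)
print(f"mean={mean_fit:.3f} best={best_fit:.3f}")
```

Reporting both statistics matters for stochastic optimizers: the best value shows the attainable optimum, while the mean over seeds reflects reliability.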

Figure 6. Average convergence curves for each algorithm (panels (a) to (j) show functions F1 to F10).

4.3. In
One-Step
Table 2,Prediction
the optimal value reaches 0 in the F5, F6 and F8 functions, and the average
valuesSingle-step
also show oil
significant improvement.
temperature predictionAs shownforecasting
involves in Figure 6,theIWOA exhibits better
transformer’s top oil
convergence performance compared to traditional algorithms. These findings
temperature for the next time step using historical data. In this experiment, the confirm the
prediction
effectiveness
is for 30 minofinto
thethe
enhancement strategies
future. To balance thefor WOA.and testing errors, we introduced L2
training
regularization and dropout during the model training. Specifically, a dropout rate of 0.1
Table
was 2. Comparison
applied, alongofwith
test results for each algorithm.
L2 regularization using a factor of 0.01. The prediction results
for Dataset 1, demonstrating
Function Evaluation Index the effectiveness
GA ofPSO
the method, WOA
are presentedIWOA in Figure 7.
To further illustrate the trade-off between training and testing errors,−10 Figure 8 provides a
Mean 3602.311 0.035 7.21 × 10 1.46 × 10−19
F1
comparison of the training and testing errors.
Best 1454.955 0.001 3.32 × 10−13 1.17 × 10−24
Mean 21.197 32.013 5.16 × 10 −9 1.73 × 10−13
F2
Best 13.936 0.081 5.12 × 10−9 2.24 × 10−15
Mean 3477.958 0.047 8.98 × 10 −10 4.16 × 10−20
F3
Best 1771.241 0.001 1.68 × 10 −12 1.42 × 10−22
Mean 1.432 5.176 0.015 0.00075
F4
Best 0.413 0.065 0.003 0.00014
Mean 28.474 51.152 0 0

Figure 7. The prediction results of IWOA-LSTM-SA.

Figure 8. Training and testing errors over iterations.

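The regularization setup described above can be sketched in PyTorch. This is a minimal stand-in that omits the paper's self-attention layer and IWOA-tuned hyper-parameters; the hidden size, window length, and learning rate are illustrative assumptions. `weight_decay` implements the stated L2 factor of 0.01 and `nn.Dropout(0.1)` the stated dropout rate.

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Simplified sketch; the paper's full model also adds a self-attention layer."""
    def __init__(self, n_features=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.dropout = nn.Dropout(p=0.1)            # dropout rate 0.1, as in the paper
        self.head = nn.Linear(hidden, 1)            # one-step (30 min ahead) output

    def forward(self, x):                           # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.head(self.dropout(out[:, -1]))  # last time step -> prediction

model = LSTMForecaster()
# weight_decay applies the L2 regularization factor of 0.01 to all parameters.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.01)

x = torch.randn(8, 48, 4)                           # 8 windows of 48 past samples
loss = nn.MSELoss()(model(x), torch.randn(8, 1))
loss.backward()
opt.step()
```

With this setup, the L2 penalty and dropout jointly limit model complexity, which is what drives the training and test curves in Figure 8 toward similar values.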
Theoretically, when there is a significant gap between training and test errors, it usually indicates over-fitting, where the model performs well on the training data but struggles to generalize to unseen data. As illustrated in Figure 8, both the training and test losses decrease rapidly during the initial epochs and then converge to similar values as training progresses. This suggests that we have achieved a well-balanced trade-off between training and testing errors. This balance was successfully attained by applying regularization techniques, such as L2 regularization and dropout, which helped control model complexity, mitigate over-fitting, and enhance the model's generalization capabilities.

To assess the performance of this method, this paper compared it with benchmark methods, including BP, gated recurrent unit (GRU), convolutional neural network (CNN), LSTM, LSTM-SA, and WOA-LSTM-SA models. In order to reduce accidental error, this paper conducted 10 repeated experiments and averaged the results to show the forecasting performance. Figure 9 displays the prediction results for each model on Dataset 1. It is evident that the proposed model shows the best prediction result compared to all benchmark models. The reason is that the proposed approach not only combines both local and global information but also utilizes IWOA to determine the optimal hyper-parameters. Table 3 presents the comparative results.

Figure 9. Performance comparison across models.

Table 3. Model prediction evaluation indexes.

Dataset    Model         RMSE   MAE    MAPE (%)  R²     Time (s)
Dataset 1  BP            1.698  1.228  2.581     0.825  13.287
           CNN           1.646  1.170  2.462     0.836  32.317
           GRU           1.553  1.011  2.144     0.854  96.109
           LSTM          1.633  1.022  2.175     0.838  129.666
           LSTM-SA       1.537  1.031  2.253     0.861  174.497
           WOA-LSTM-SA   1.462  0.998  2.103     0.870  11,058.906
           IWOA-LSTM-SA  1.438  0.989  2.089     0.873  10,083.375
Dataset 2  BP            0.923  0.715  2.428     0.974  38.216
           CNN           0.824  0.596  1.929     0.979  80.746
           GRU           0.758  0.544  1.772     0.982  165.984
           LSTM          0.874  0.643  2.129     0.977  234.946
           LSTM-SA       0.809  0.576  1.890     0.980  383.995
           WOA-LSTM-SA   0.757  0.535  1.739     0.982  13,016.477
           IWOA-LSTM-SA  0.749  0.524  1.703     0.983  11,075.689
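The evaluation indexes in Table 3 can be computed as follows. This is a minimal sketch with toy values; the R² shown is the standard coefficient of determination, which the paper uses but does not spell out.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """RMSE, MAE, MAPE (%), and R^2, as reported in Table 3."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err / y_true)) * 100)   # valid: temperatures are > 0
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, mape, r2

# Toy example with four oil-temperature readings (degC).
rmse, mae, mape, r2 = evaluate([50, 52, 48, 51], [50.5, 51.0, 48.5, 51.5])
print(rmse, mae, mape, r2)
```

Lower RMSE, MAE, and MAPE and higher R² indicate better fit, which is the reading used throughout Tables 3-5.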
From Table 3, it is evident that our method does not have an advantage in terms of computation time compared to traditional machine learning models. Therefore, in scenarios where prediction accuracy is not a primary concern, traditional machine learning models can still be considered for top-oil temperature prediction of transformers. The prediction model proposed in this paper, however, places a greater emphasis on improving prediction accuracy. To analyze and compare each model more comprehensively, this paper includes a residual plot. Using Dataset 1 as an example, in the residual plot (Figure 10), the true values are shown on the horizontal axis, while the vertical axis represents the residual values (percentage).

Figure 10. Model residuals.

The residual percentage is relatively higher for the data between 30 and 43 °C and 55 to 60 °C. The reason is as follows: there are about 4000 sample points within the temperature range of 43 to 55 °C, whereas the temperature ranges of 30~43 °C and 55~60 °C each contain approximately 200 sample points. This unbalanced distribution leads to low accuracy on sparse samples.

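The effect of this imbalance can be illustrated with synthetic data mimicking the stated distribution (~4000 points in the dense 43-55 °C band, ~200 in each tail); the noise level and the synthetic values themselves are arbitrary assumptions, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in for the Dataset 1 temperature distribution.
y_true = np.concatenate([rng.uniform(43, 55, 4000),   # dense band
                         rng.uniform(30, 43, 200),    # sparse lower tail
                         rng.uniform(55, 60, 200)])   # sparse upper tail
y_pred = y_true + rng.normal(0, 0.8, y_true.size)     # toy predictions

# Residual percentage: the vertical axis of Figure 10.
resid_pct = (y_pred - y_true) / y_true * 100

for lo, hi in [(30, 43), (43, 55), (55, 60)]:
    mask = (y_true >= lo) & (y_true < hi)
    print(f"{lo}-{hi} degC: n={mask.sum()}, "
          f"mean |residual| = {np.abs(resid_pct[mask]).mean():.2f}%")
```

Binning the residuals by temperature range in this way makes the sample-count disparity behind the larger tail residuals explicit.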
4.4. Ablation Experiment


To comprehensively validate the effectiveness of each component of the proposed
method (IWOA-LSTM-SA), ablation experiments were conducted. Specifically, the exper-
iments compared the following models: LSTM, LSTM-SA, WOA-LSTM, IWOA-LSTM,
and WOA-LSTM-SA, with the LSTM model serving as the benchmark for comparison and
analysis. Results are shown in Table 4.

Table 4. Ablation experiment evaluation metrics.

Dataset    Metric  LSTM   LSTM-SA  WOA-LSTM  IWOA-LSTM  WOA-LSTM-SA  IWOA-LSTM-SA
Dataset 1  RMSE    1.633  1.537    1.596     1.517      1.462        1.438
           MAPE    2.175  2.253    2.141     2.106      2.103        2.089
Dataset 2  RMSE    0.874  0.809    0.837     0.782      0.757        0.749
           MAPE    2.129  1.890    2.042     1.814      1.739        1.703

As shown in Table 4, the proposed model demonstrates higher prediction accuracy


compared to the baseline model LSTM and other comparative models. Compared to LSTM,
the RMSE of LSTM-SA decreased by 5.88% on Dataset 1 and by 7.44% on Dataset 2; the
MAPE increased by 3.59% on Dataset 1 but decreased by 11.23% on Dataset 2. This validates
the effectiveness of combining the SA algorithm with LSTM. Compared to LSTM-SA, the
RMSE of WOA-LSTM-SA and IWOA-LSTM-SA decreased by 4.88% and 6.44% on Dataset
1, and by 6.43% and 7.42% on Dataset 2, respectively. The MAPE decreased by 6.66% and
7.28% on Dataset 1, and by 7.99% and 9.89% on Dataset 2, respectively. This validates
the effectiveness of the optimization algorithms proposed in the models. Additionally,
compared to WOA-LSTM and IWOA-LSTM, the RMSE of the proposed model decreased
by 9.89% and 5.21% on Dataset 1, and by 10.51% and 4.22% on Dataset 2, respectively. The
MAPE decreased by 2.43% and 0.81% on Dataset 1, and by 16.60% and 6.12% on Dataset 2,
respectively.
In summary, compared to using optimization algorithms or SA individually, combin-
ing them results in a greater improvement in the performance of the prediction model.

4.5. Multi-Step Forecasting


The multi-step prediction model refers to a model that predicts a series of values rather
than a single value. Multi-step prediction is more important in real-world power system
operations because it provides longer-term temperature trend forecasts, which help to
identify potential issues in advance. Therefore, this section conducts a multi-step prediction
analysis, where the prediction steps are set to 3 steps (90 min) and 5 steps (150 min). The
evaluation metrics are shown in Table 5, and the prediction results (for one week) are
presented in Figure 11.
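The multi-step setup can be sketched as a sliding-window transformation of the temperature series: each sample maps a window of past readings to the next 3 or 5 future steps. The input window length of 48 samples is an illustrative assumption; the paper does not state it.

```python
import numpy as np

def make_windows(series, n_in=48, n_out=3):
    """Slide over the series: n_in past samples -> n_out future steps.
    With 30 min sampling, n_out=3 covers 90 min and n_out=5 covers 150 min."""
    X, Y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])
        Y.append(series[i + n_in:i + n_in + n_out])
    return np.array(X), np.array(Y)

series = np.arange(100, dtype=float)     # toy stand-in for the oil-temperature series
X, Y = make_windows(series, n_in=48, n_out=3)
print(X.shape, Y.shape)                  # (50, 48) (50, 3)
```

Switching `n_out` from 3 to 5 yields the 150 min horizon; the model's output layer then emits `n_out` values per window instead of one.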
From Table 5, it can be seen that the error increases as the prediction step increases
across all models. By comparing the RMSE metric, it can be concluded that the proposed
model exhibits better accuracy across different prediction steps compared to the baseline
model. Specifically, in Dataset 1 and Dataset 2, for the 3-step prediction, the RMSE of the
proposed model is 1.537 and 1.015, respectively. This represents reductions of 12.83% and
38.65% compared to the BP model, 6.98% and 20.89% compared to the CNN model, 3.75%
and 13.62% compared to the GRU model, 4.24% and 27.16% compared to the LSTM model,
1.60% and 17.93% compared to the LSTM-SA model, and 1.16% and 4.34% compared to the
WOA-LSTM-SA model. For the 5-step prediction, the RMSE of the proposed model is 1.714
and 1.634, representing reductions of 12.60% and 11.11% compared to the BP model, 7.61%
and 15.89% compared to the CNN model, 6.49% and 17.30% compared to the GRU model,
5.19% and 14.14% compared to the LSTM model, 4.56% and 12.82% compared to the LSTM-
SA model, and 3.06% and 1.80% compared to the WOA-LSTM-SA model. By analyzing the
multi-step prediction metrics, we conclude that the proposed model demonstrates good
performance across different prediction steps compared to traditional models.
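The reduction percentages quoted above follow directly from the Table 5 RMSE values. Note that recomputing from the rounded three-decimal table entries can shift the last digit (e.g., 12.82% here versus the reported 12.83%, which was presumably computed from unrounded values).

```python
def rmse_reduction(baseline, proposed):
    """Percentage RMSE reduction of the proposed model relative to a baseline."""
    return (baseline - proposed) / baseline * 100

# 3-step prediction, Dataset 1: IWOA-LSTM-SA (1.537) vs. BP (1.763), from Table 5.
print(round(rmse_reduction(1.763, 1.537), 2))   # 12.82 (reported as 12.83)

# 1-step prediction, Dataset 2: IWOA-LSTM-SA (0.749) vs. BP (0.923).
print(round(rmse_reduction(0.923, 0.749), 2))   # 18.85
```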

Table 5. Multi-step prediction evaluation metrics.

Dataset    Step         Model         RMSE   MAE    MAPE (%)  Time (s)
Dataset 1  1 (30 min)   BP            1.698  1.228  2.581     13.287
                        CNN           1.646  1.170  2.462     32.317
                        GRU           1.553  1.011  2.144     96.109
                        LSTM          1.633  1.022  2.175     129.666
                        LSTM-SA       1.537  1.031  2.253     174.497
                        WOA-LSTM-SA   1.462  0.998  2.103     11,058.906
                        IWOA-LSTM-SA  1.438  0.989  2.089     10,083.375
           3 (90 min)   BP            1.763  1.382  2.873     14.082
                        CNN           1.652  1.221  2.557     22.572
                        GRU           1.597  1.133  2.409     95.775
                        LSTM          1.605  1.164  2.453     179.898
                        LSTM-SA       1.562  1.162  2.448     229.012
                        WOA-LSTM-SA   1.555  1.102  2.311     11,746.135
                        IWOA-LSTM-SA  1.537  1.088  2.308     10,149.217
           5 (150 min)  BP            1.961  1.611  3.351     13.617
                        CNN           1.855  1.411  2.973     21.579
                        GRU           1.833  1.387  2.943     98.763
                        LSTM          1.808  1.367  2.878     197.507
                        LSTM-SA       1.796  1.345  2.832     240.519
                        WOA-LSTM-SA   1.768  1.352  2.859     12,212.086
                        IWOA-LSTM-SA  1.714  1.294  2.702     10,778.976
Dataset 2  1 (30 min)   BP            0.923  0.715  2.428     38.216
                        CNN           0.824  0.596  1.929     80.746
                        GRU           0.758  0.544  1.772     165.984
                        LSTM          0.874  0.643  2.129     234.946
                        LSTM-SA       0.809  0.576  1.890     383.995
                        WOA-LSTM-SA   0.757  0.535  1.739     13,016.477
                        IWOA-LSTM-SA  0.749  0.524  1.703     11,075.689
           3 (90 min)   BP            1.654  1.124  4.225     37.313
                        CNN           1.283  1.012  3.166     79.190
                        GRU           1.175  0.831  2.821     229.788
                        LSTM          1.394  1.080  3.674     320.336
                        LSTM-SA       1.237  0.923  3.111     433.645
                        WOA-LSTM-SA   1.061  0.833  2.746     13,623.563
                        IWOA-LSTM-SA  1.015  0.750  2.537     11,284.158
           5 (150 min)  BP            1.838  1.568  4.854     37.081
                        CNN           1.943  1.403  4.933     77.883
                        GRU           1.976  1.387  4.801     264.860
                        LSTM          1.903  1.414  4.765     171.239
                        LSTM-SA       1.874  1.365  4.810     414.213
                        WOA-LSTM-SA   1.664  1.249  4.298     12,823.645
                        IWOA-LSTM-SA  1.634  1.229  4.162     10,984.776
Figure 11. Multi-step prediction performance comparison across models (one week). Panels (a) and (b) show the 3-step and 5-step prediction results for Dataset 1; panels (c) and (d) show the 3-step and 5-step prediction results for Dataset 2.

5. Conclusions

Oil temperature prediction can effectively prevent symmetrical and asymmetrical faults in transformers. This paper adopts a novel approach to improve the performance of top-oil temperature prediction during transformer operations. The proposed model has been tested using actual data, and some conclusions can be obtained as follows:
(1) To verify the efficacy of the IWOA, this paper conducts tests with ten test functions. The findings demonstrate that the IWOA outperforms GA, PSO, and WOA in terms of convergence speed and accuracy.
(2) To verify the effectiveness of the proposed model, extensive experiments were conducted using actual operating data. The experimental results indicate that the proposed approach outperforms current state-of-the-art methods. On Dataset 1, the model achieved reductions in RMSE of 15.31%, 12.64%, 7.41%, 11.94%, 6.44%, and 1.98% compared to the BP, CNN, GRU, LSTM, LSTM-SA, and WOA-LSTM-SA methods, respectively. Similarly, on Dataset 2, the model demonstrated significant improvements, with RMSE reductions of 18.85%, 9.09%, 1.19%, 14.29%, 7.42%, and 1.06% compared to the same benchmark methods.
(3) The proposed model performs effectively across various prediction steps compared to benchmark models. Specifically, for the 3-step prediction, the RMSE of the proposed model is 1.537 and 1.015 for Dataset 1 and Dataset 2, respectively, reflecting reductions of 12.83% and 38.65% compared to the BP model, 6.98% and 20.89% compared to the CNN model, 3.75% and 13.62% compared to the GRU model, 4.24% and 27.16% compared to the LSTM model, 1.60% and 17.93% compared to the LSTM-SA model, and 1.16% and 4.34% compared to the WOA-LSTM-SA model. For the 5-step prediction, the RMSE of the proposed model is 1.714 and 1.634, representing reductions of 12.60% and 11.11% compared to the BP model, 7.61% and 15.89% compared to the CNN model, 6.49% and 17.30% compared to the GRU model, 5.19% and 14.14% compared to the LSTM model, 4.56% and 12.82% compared to the LSTM-SA model, and 3.06% and 1.80% compared to the WOA-LSTM-SA model.

Author Contributions: D.Z. led the conceptualization, methodology, software development, and
original draft preparation. Validation was carried out by D.Z., H.X. and H.Q., while H.X. and D.Z.
handled formal analysis. H.Q. managed the investigation, and Z.H. and W.D. provided resources.
S.W. was responsible for data curation. Writing—review and editing involved D.Z., H.X., H.Q., Q.P.
and J.Y., with visualization by D.Z., H.X. and J.Y. Supervision was provided by D.Z. and H.Q., project
administration by Q.P. and S.W., and funding acquisition by D.Z. All authors have read and agreed
to the published version of the manuscript.
Funding: This work was supported by the Electric Power Research Institute of Yunnan Power Grid
Co., Ltd., Kunming, Yunnan, China (No. YNKJXM20220009).
Data Availability Statement: Data are contained in the article.
Conflicts of Interest: Authors Dexu Zou, Qingjun Peng, Shan Wang, Weiju Dai, and Zhihu Hong
were employed by the company China Southern Power Grid Yunnan Power Grid Co., Ltd. The
remaining authors declare that the research was conducted in the absence of any commercial or
financial relationships that could be construed as a potential conflict of interest.

Appendix A
Table A1 displays the ten test functions used in this study.

Table A1. Test functions.

Function                                                                 Range
$F_1(x)=\sum_{n=1}^{k} x_n^2$                                            [-100, 100]
$F_2(x)=\sum_{n=1}^{k}|x_n|+\prod_{n=1}^{k}|x_n|$                        [-10, 10]
$F_3(x)=\sum_{n=1}^{k}\left(\sum_{i=1}^{n} x_i\right)^2$                 [-100, 100]
$F_4(x)=\sum_{n=1}^{k} n x_n^4+\mathrm{random}[0,1)$                     [-1.28, 1.28]
$F_5(x)=1+\frac{1}{4000}\sum_{n=1}^{k} x_n^2-\prod_{n=1}^{k}\cos\!\left(\frac{x_n}{\sqrt{n}}\right)$   [-600, 600]
$F_6(x)=\sum_{n=1}^{k}\left[x_n^2-10\cos(2\pi x_n)+10\right]$            [-5.12, 5.12]
$F_7(x)=20-20\exp\!\left(-0.2\sqrt{\frac{1}{k}\sum_{n=1}^{k} x_n^2}\right)-\exp\!\left(\frac{1}{k}\sum_{n=1}^{k}\cos(2\pi x_n)\right)+e$   [-32, 32]
$F_8(x)=\frac{\pi}{k}\left\{10\sin(\pi y_1)+\sum_{n=1}^{k-1}(y_n-1)^2\left[1+10\sin^2(\pi y_{n+1})\right]+(y_k-1)^2\right\}+\sum_{n=1}^{k}\mu(x_n,10,100,4)$   [-50, 50]
$F_9(x)=\sum_{i=1}^{d}-x_i\sin\!\left(\sqrt{|x_i|}\right)+418.98288727243369\times d$   [-500, 500]
$F_{10}(x)=\sum_{i=1}^{d}\left[(\ln(x_i-2))^2+(\ln(10-x_i))^2\right]-\left(\prod_{i=1}^{10} x_i\right)^{0.2}$   [2, 10]

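The benchmark functions in Table A1 can be implemented directly. A sketch of three of them follows; each has its minimum value of 0 at the origin for the dimension k = 30 used in Section 4.2.

```python
import numpy as np

def f1_sphere(x):
    """F1: sum of squares; minimum 0 at the origin."""
    return float(np.sum(x ** 2))

def f6_rastrigin(x):
    """F6: x_n^2 - 10 cos(2 pi x_n) + 10, summed; minimum 0 at the origin."""
    return float(np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x) + 10))

def f7_ackley(x):
    """F7 (Ackley); minimum 0 at the origin."""
    k = x.size
    return float(20 - 20 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / k))
                 - np.exp(np.sum(np.cos(2 * np.pi * x)) / k) + np.e)

x0 = np.zeros(30)
print(f1_sphere(x0), f6_rastrigin(x0), f7_ackley(x0))
```

Evaluating a candidate solution against these functions over 500 iterations and 30 independent runs, then recording the mean and best results, reproduces the comparison protocol behind Table 2.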
References
1. Xu, X.; He, Y.; Li, X.; Peng, F.; Xu, Y. Overload Capacity for Distribution Transformers with Natural-Ester Immersed High-
Temperature Resistant Insulating Paper. Power Sys. Technol. 2018, 42, 1001–1006.
2. Wang, S.; Gao, M.; Zhuo, R. Research on high efficient order reduction algorithm for temperature coupling simulation model of
transformer. High Volt. Appar. 2023, 59, 115–126.
3. Liu, X.; Xie, J.; Luo, Y. A novel power transformer fault diagnosis method based on data augmentation for KPCA and deep
residual network. Energy Rep. 2023, 9, 620–627. [CrossRef]
4. Chen, T.; Chen, Y.; Li, X. Prediction for dissolved gas concentration in power transformer oil based on CEEMDAN-SG-BiLSTM.
High Volt. Appar. 2023, 59, 168–175.
5. Zang, C.; Zeng, J.; Li, P. Intelligent diagnosis model of mechanical fault for power transformer based on SVM algorithm. High
Volt. Appar. 2023, 59, 216–222.
6. Ji, H.; Wu, X.; Wang, H. A New Prediction Method of Transformer Oil Temperature Based on C-Prophet. Adv. Power Syst. Hyd.
Eng. 2023, 39, 48–55.
7. Tan, F.; Xu, G.; Zhang, P. Research on Top Oil Temperature Prediction Method of Similar Day Transformer Based on Topsis and
Entropy Method. Elect. Power Sci. Eng. 2021, 37, 62–69.
8. Amoda, O.A.; Tylavsky, D.J.; McCulla, G.A.; Knuth, W.A. Acceptability of three transformer hottest-spot temperature models.
IEEE Trans. Power Deliv. 2011, 27, 13–22. [CrossRef]
9. Zhou, L.; Wang, J.; Wang, L.; Yuan, S.; Huang, L.; Wand, D.; Guo, L. A Method for Hot-Spot Temperature Prediction and Thermal
Capacity Estimation for Traction Transformers in High-Speed Railway Based on Genetic Programming. IEEE Trans. Transp.
Electrif. 2019, 5, 1319–1328. [CrossRef]
10. Deng, Y.; Ruan, J.; Quan, Y.; Gong, R.; Huang, D.; Duan, C.; Xie, Y. A Method for Hot Spot Temperature Prediction of a 10 kV
Oil-Immersed Transformer. IEEE Access 2019, 7, 107380. [CrossRef]
11. Zhao, B.; Zhang, X. Parameter Identification of Transformer Top Oil Temperature Model and Prediction of Top Oil Tempeature.
High. Volt. Eng. 2004, 30, 9–10.
12. Wang, H.; Su, P.; Wang, X. Prediction of Surface Temperatures of Large Oil-Immersed Power Transformers. J. Tsinghua Univ. Sci.
Technol. 2005, 45, 569–572.
13. Tan, M.; Hu, C.; Chen, J.; Wang, L.; Li, Z. Multi-node load forecasting based on multi-task learning with modal feature extraction.
Eng. Appl. Artif. Intell. 2022, 112, 104856. [CrossRef]
14. Shang, Y.; Li, S. FedPT-V2G: Security enhanced federated transformer learning for real-time V2G dispatch with non-IID data.
Appl. Energy 2024, 358, 122626. [CrossRef]
15. Bai, M.; Yao, P.; Dong, H.; Fang, Z.; Jin, W.; Yang, X.; Liu, J.; Yu, D. Spatial-temporal characteristics analysis of solar irradiance
forecast errors in Europe and North America. Energy 2024, 297, 131187. [CrossRef]
16. Qing, H.; Jennie, S.; Daniel, J. Prediction of top-oil temperature for transformers using neural network. IEEE Trans. Power Deliv.
2000, 15, 1205–1211.
17. Tan, F.; Chen, H.; He, J. Top oil temperature forecasting of UHV transformer based on path analysis and similar time. Elect. Power
Autom. Equip. 2021, 41, 217–224.
18. Li, S.; Xue, J.; Wu, M.; Xie, R.; Jin, B.; Zhang, H.; Li, Q. Prediction of Transformer Top-oil Temperature with the Improved Weighted
Support Vector Regression Based on Particle Swarm Optimization. High Volt. Appar. 2021, 57, 103–109.
19. Tan, F.L.; Xu, G.; Li, Y.F.; Chen, H.; He, J.H. A method of transformer top oil temperature forecasting based on similar day and
similar hour. Elect. Power Eng. Tech. 2022, 41, 193–200.
20. Yi, Y. Research on Prediction Method of Transformer Top-Oil Temperature Based on Assisting Dispatchers in Decision-Making.
Master’s Thesis, Southwest Jiaotong University, Chengdu, China, 2017.
21. Gharehchopogh, F.S.; Gholizadeh, H. A comprehensive survey: Whale Optimization Algorithm and its applications. Swarm Evol.
Comput. 2019, 48, 1–24. [CrossRef]
22. Brodzicki, A.; Piekarski, M.; Jaworek-Korjakowska, J. The whale optimization algorithm approach for deep neural networks.
Sensors 2021, 21, 8003. [CrossRef] [PubMed]
23. Mostafa Bozorgi, S.; Yazdani, S. IWOA: An improved whale optimization algorithm for optimization problems. J. Comput. Des.
Eng. 2019, 6, 243–259. [CrossRef]
24. Naderi, E.; Azizivahed, A.; Asrari, A. A step toward cleaner energy production: A water saving-based optimization approach for
economic dispatch in modern power systems. Electr. Power Syst. Res. 2022, 204, 107689. [CrossRef]
25. Gao, W.; Liu, S.; Huang, L. Inspired artificial bee colony algorithm for global optimization problems. Acta Electron. Sin. 2012, 40,
2396.
26. Shi, X.; Li, M.; Wei, Q. Application of Quadratic Interpolation Whale Optimization Algorithm in Cylindricity Error evaluation.
Metrol. Meas. Tech. 2019, 46, 58–60.
27. He, Q.; Wei, K.; Xu, Q. Mixed strategy based improved whale optimization algorithm. Appl. Res. Comput. 2019, 36, 3647–3651.
28. Qiu, X.; Wang, R.; Zhang, W.; Zhang, Z.; Zhang, Q. Improved Whale Optimizer Algorithm Based on Hybrid Strategy. Comput.
Eng. Appl. 2022, 58, 70–78.
29. Chen, Y.; Han, B.; Xu, G.; Kan, Y.; Zhao, Z. Spatial Straightness Error Evaluation with Improved Whale Optimization Algorithm.
Mech. Sci. Technol. Aero. Eng. 2022, 41, 1102–1111.
30. Xu, J.; Yan, F. The Application of Improved Whale Optimization Algorithm in Power Load Dispatching. Oper. Res. Manag. Sci.
2020, 29, 149–159.
31. Naderi, E.; Mirzaei, L.; Pourakbari-Kasmaei, M.; Cerna, F.V.; Lehtonen, M. Optimization of active power dispatch considering
unified power flow controller: Application of evolutionary algorithms in a fuzzy framework. Evol. Intell. 2024, 17, 1357–1387.
[CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
