
2154 IEEE TRANSACTIONS ON POWER DELIVERY, VOL. 36, NO. 4, AUGUST 2021

Analysis of the State of High-Voltage Current Transformers Based on Gradient Boosting on Decision Trees

Alexandra I. Khalyasmaa, Member, IEEE, Mihail D. Senyuk, Member, IEEE, and Stanislav A. Eroshenko, Member, IEEE

Abstract—This paper addresses the problem of assessing the technical state of instrument current transformers using machine learning methods. The introductory parts of the paper provide a detailed analysis of modern methods and approaches for technical state assessment of high-voltage power equipment of power plants and substations, as well as a review of modern software tools and the latest trends in the field. The relevance of research on the technical state assessment of instrument current transformers is justified, along with the motivation for applying machine learning methods to improve the accuracy and quality of high-voltage equipment state classification. Within the framework of the study, a comparative analysis of gradient boosting on decision trees and random forest algorithms was carried out for a given mathematical problem formulation. The main stages of processing the initial dataset are proposed as a step-by-step procedure, including feature extraction, feature transformation, feature interactions, etc. The superior performance of the gradient boosting on decision trees algorithm was validated on a real power equipment fleet. The resulting classification quality metrics for current transformer technical state assessment, Precision and Recall, are estimated to be 87.1% and 83.7%, respectively.

Index Terms—Current transformers, gradient boosting, high-voltage equipment, machine learning, random forest, technical state assessment.

Manuscript received March 5, 2020; revised June 23, 2020; accepted September 1, 2020. Date of publication September 4, 2020; date of current version July 23, 2021. The work was supported by Russian Science Foundation research project No. 18-79-00201. Paper no. TPWRD-00338-2020. (Corresponding author: Alexandra I. Khalyasmaa.)
Alexandra I. Khalyasmaa and Stanislav A. Eroshenko are with the Ural Federal University named after the first President of Russia B.N. Yeltsin, Ekaterinburg, Russia, and also with the Novosibirsk State Technical University, Novosibirsk, Russia (e-mail: [email protected]; [email protected]).
Mihail D. Senyuk is with the Ural Federal University named after the first President of Russia B.N. Yeltsin, Ekaterinburg, Russia (e-mail: mdsenuk@gmail.com).
Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TPWRD.2020.3021702
0885-8977 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

TRENDS in the electric power industry development are primarily aimed at renewable generation and electric energy storage systems (ESS) implementation [1], Smart Grid networks development [2]–[4], and ultra-high voltage equipment testing [5]. These innovations, along with active implementation of distributed generation by consumers, determine the extremely variable nature of consumption in modern systems, which results in high-voltage equipment loading growth and emphasizes the need to monitor the state of generation, consumption and transmission equipment. Frequent failures of system elements lead to low controllability and observability of power system facilities, which are already influenced by the stochastic nature of power network operation conditions [4], [6]. At the same time, proactive diagnostics and monitoring of power equipment state are of particular interest to the consumers, that is, the owners of generation and high-voltage equipment, as they contribute to significant savings in repair and maintenance costs and in the costs of energy undersupply, including damage resulting from the disruption of the consumers' production processes.

On the other hand, the modern development of wireless Internet networks, cloud computing in data processing, and the Internet of Things (IoT) technology provides a promising opportunity for partial or complete automation of high-voltage equipment inspection data analysis and state monitoring [7]. The development of so-called on-line power equipment monitoring began in 1980 and, thanks to electronic indication systems and computing technologies, has made significant progress [8]. It allows reliable identification of the state of high-voltage power equipment through a system of sensors and data transmission facilities, even at remote sites without on-site engineering staff.

Thus, the development of new methods and software tools for on-line technical state monitoring of high-voltage equipment requires deliberate attention, both from the point of view of improving energy system controllability and minimizing operational costs, and from the point of view of improving power supply reliability.

II. HEALTH-INDEX MONITORING: STATE-OF-THE-ART

Modern systems for monitoring and diagnosing the technical state of high-voltage power equipment allow one to analyze the wear rates of the main functional nodes and structural elements by using non-destructive control methods, and can be distinguished by equipment type as follows:
1) Power and instrument transformers, whose technical state is most often analyzed using physicochemical or chromatographic analysis of transformer oil, or using infrared thermography [9] and partial discharge monitoring [10];


2) Cable lines, whose technical state is analyzed by distributed temperature sensors (insulation wear and cable loading analysis) [11], monitoring of electric currents in cable screens and armor [12], and monitoring of partial discharges in insulation using special sensors [13];
3) Overhead power transmission lines, whose technical state is assessed by contamination monitoring [14], [15] and wire icing control [16]–[18];
4) Rotating electric machines, whose diagnostic methods are aimed at detecting stator vibrations [19], [20], analyzing the magnetic field strength in the air gap, and revealing partial discharges and short circuits in the windings [21]–[23], etc.

To implement such systems, electrical equipment manufacturers and energy utilities are developing multiple software products that differ from each other in structure, implementation principles, mathematical approaches, application sphere, etc. [24]–[29]. Some software packages offered by major commercial companies [24]–[27] use permanent monitoring of high-voltage equipment operation mode parameters not only to assess the actual state, but also to forecast the lifecycle. For example, ABB implements this approach through risk analysis, using failure probability distributions.

Some researchers, for example, in [29], instead of installing software for equipment state analysis, offer a browser-based product that can be accessed remotely from any available device, like a regular website. Data from the monitoring system is uploaded by the user to the server and, based on retrospective data on the operation of system elements, the equipment health index is determined.

In any case, no matter how the software is implemented, as a software package or as a browser product, the operation of monitoring and diagnostic systems for high-voltage equipment of power plants and substations largely depends on the mathematical methods of data processing and analysis.

The majority of modern systems still use classic rule-based mathematical approaches, statistical analysis [30], etc., but such approaches are not always effective. On-line monitoring of high-voltage equipment technical state today can rightfully be attributed to the so-called Big Data systems, since the number of analyzed parameters and the volumes of processed data are difficult to interpret correctly using classical methods, not only from the point of view of the algorithms themselves, but also from the point of view of the computational capabilities required to implement classical approaches.

All of the above forced the developers of modern systems for power equipment state monitoring and diagnostics to look for new methods and approaches, which resulted in a surge of machine learning methods applied to power equipment technical state analysis. Artificial neural networks (ANNs) are used quite actively in modern technical state analysis software, for example, in [20], where they are used to recognize various types of partial discharges in power generation equipment insulation, which account for more than 40% of all failures of rotating machines.

Machine learning (ML) today is actively used to analyze the state of geographically extended objects consisting of a large number of elements, such as overhead power transmission lines, where the problem of technical state identification is solved simultaneously with fault location search and defect dynamics assessment. As an example, the authors of [17], [18] analyze the application of robotic systems for detecting icing on power transmission line wires. In fact, properly trained systems would be able to perform visual inspection of any equipment in the future.

Given the ever-growing data volume in the energy sector, there is room for deep neural networks in future monitoring systems, as implemented in [21], where a backpropagation method for a deep ANN is used to diagnose the generator state, with the generator stator and rotor currents as inputs and the emergency lights as outputs.

A similar approach to ANN application is used by the authors of [22]: an adaptive resonance theory network [23], trained on the basis of the Weibull distribution, is applied to analyze the state of the elements of power generation and motor equipment.

Complex approaches implemented on the basis of several ML algorithms also appear, as presented in [24], where the support vector machine method optimized by the particle swarm approach is used to diagnose power equipment failures.

In addition to the indicated methods of power equipment diagnostics, IoT technology [8] is expected to play an important role in state monitoring of power grid facilities in the near future, resulting in improved observability and controllability. In addition, the development of cloud technologies should allow shifting to a new paradigm in observation data processing, distinguished by the processing of big data arrays and the increasing complexity of analytical calculations [8].

III. JUSTIFICATION OF CURRENT TRANSFORMERS TECHNICAL STATE ANALYSIS

A. The Necessity of Technical State Analysis

Observability is one of the key requirements for modern power systems. To ensure observability, state estimation systems have found wide application throughout the industry. Power system state estimation refers to power flow calculations based on limited telemetry information coming from power grid facilities.

One of the main types of power equipment providing power grid observability is the instrument transformer (IT): a current transformer (CT) or a voltage transformer (VT). These devices are also used for commercial and technical electrical energy metering, relay protection and emergency automation.

The role of ITs is associated with the application of secondary circuit signals in emergency automation, for instance, in Automatic Overcurrent Limiting Systems, which are triggered by fault detectors, resulting in load or generation shedding. Therefore, a technical malfunction of ITs can result in emergency system failure or overcontrol.


The energy utilities that own power generation and grid facilities, being part of the integrated electrical energy generation and transmission process, aim to ensure equal reliability and technological process controllability for all power equipment units. The technical state of the ITs, as an integral part of the technological facility, is therefore also carefully addressed in power equipment maintenance and repair plans (MRP). In turn, the MRPs are the basis for corporate asset management systems.

All of the above makes the task of IT technical state analysis no less important than that of any other high-voltage equipment type owned by the utility.

B. Machine Learning Methods Application

The authors of the paper performed numerous studies proving the effectiveness of ML methods for solving the technical state identification problem for high-voltage power equipment, such as power transformers. One of the distinctive features of power transformer technical state analysis is data redundancy, resulting from numerous built-in monitoring systems. The presence of on-line monitoring systems, in comparison with periodic technical diagnostic results obtained as part of scheduled maintenance and repair, not only provides a sufficient amount of data for analysis, but also significantly improves data quality and minimizes the influence of the human factor, which results in outliers and gaps in the dataset.

The problem of technical state analysis of ITs, in contrast to that of power transformers, is more complicated due to the lack of monitoring systems for this type of equipment; in the majority of cases, installing them is not economically viable. Under these conditions the problem is to be treated as a multicriteria task in the presence of data uncertainty, which makes it much more difficult to achieve the required accuracy of technical state identification. The main objective of the study is to provide the required accuracy of power equipment state identification in real operation conditions with a high level of data uncertainty and a relatively small initial dataset.

The developed system is able to precisely classify the CT state, driven by the need of power network owners to provide the annual technical reports required by the Ministries of Power and Energy in many countries. The solution presented in this paper is used specifically for analyzing the CT technical state in a specific time frame.

IV. CASE STUDY

In order to confirm the hypothesis about the effectiveness of ML methods for the CT technical state assessment problem, which is considered as a multicriteria task under uncertainty, a mathematical model of the adaptive system for CT technical state analysis was developed and implemented in Python (Jupyter Notebook). The system was tested on a real CT fleet of the regional power system.

In order to ensure adequate system performance, training and testing sets were designed on the basis of the same type of CTs: TFZM type, rated voltage 110 kV, age from 20 to 40 years.

A. Feature Extraction

The initial dataset for oil-filled CT technical state analysis is composed of technical diagnostic protocol data provided by one of the regional power system operators. The protocols were obtained as a result of various non-destructive testing procedures, implemented in accordance with the established regulatory documentation. The testing procedures were performed as part of both scheduled and unplanned maintenance and repair.

Data collection for CT state analysis was carried out manually based on inspection documents and passport data. As a result, the initial database was formed, which includes the set of features presented in Table I.

TABLE I
FEATURE EXTRACTION FOR CT TECHNICAL STATE ANALYSIS

As provided by the regulatory documentation, power equipment technical state inspection is typically carried out according to the standard schedule, but on average not more than once a year or once every two years. As a result, the volume of data for each individual equipment item is very small: at best there are 20–30 regular measurements for the entire operation period.

Using circuit breaker technical state analysis as an example, the authors demonstrated in [31] the minimum dataset volumes necessary for ensemble algorithms to solve the classification task correctly. The CT technical state assessment problem meets these requirements.


B. Feature Transformation

Monotonic feature transformation is critical for some algorithms and does not affect others, so it is necessary to analyze the statistical distribution of the features.

1) Logarithmic Distribution Adjustment: Initially, the statistical distribution of the features was analyzed using the box-and-whiskers diagram, a graph used in descriptive statistics to depict a one-dimensional probability distribution. This format displays the 25th (first quartile), 50th (median), 75th (third quartile), 98th and 2nd (boundaries of the statistically significant sample) percentiles on one graph. The boundaries of the box are the first and third quartiles (25th and 75th percentiles, respectively), and the line in the middle of the box is the median (50th percentile). The whisker ends are the edges of the statistically significant sample. Based on the obtained graphs, the distribution skewness can be estimated. Many ML algorithms assume a normal data distribution; in the presence of skewness, it is recommended to apply a logarithmic transformation to the dataset, since a distorted distribution may degrade the performance of the algorithm. Fig. 1 shows examples of feature distribution graphs.

Fig. 1. Feature statistical distribution graphs.

It can be seen from Fig. 1 that the CO2, moist and tgRmain20 feature distributions have positive skewness, and the tgOil90, acidNr, Rsecondary20, C2H4, CH4 and C2H6 distributions are strongly positively skewed too (along with H2, CO and C2H2). The Vbreakdown distribution is weakly negatively skewed. The features moist, tgRmain20, tgOil90, Iconn, Rsecondary20 and all chromatography features have a significant number of outliers beyond the statistically significant sample boundaries.

In order to reduce the skewness of the feature distributions and to eliminate the outliers, a logarithmic transformation was carried out using the formula $\log_{10}(x + 0.0001)$ for the chromatography features, moist, acidNr, tgOil90, Rmain20, Rsecondary20 and tgRmain20. The transformation results are presented in Fig. 2. For the chromatography features, it was only possible to reduce the number of outliers, while for moist, acidNr, tgOil90, Rmain20, Rsecondary20 and tgRmain20 the number of outliers was significantly reduced and the distributions took a less skewed form.

Fig. 2. Feature statistical distribution graphs after logarithmic transformation.

2) Feature Interactions: The initial dataset has a significant number of gaps and is not balanced by classes. The distribution of the equipment state classes taken from the laboratories' diagnostic protocols is shown in Table II.

TABLE II
CLASS DISTRIBUTION

Given a small data sample describing a particular class, the trained algorithm will not have sufficient generalizing ability. It is obvious that the "normal" equipment state will prevail over all other classes, while the "faulty" and "unsatisfactory" states will be observed rarely. With an insufficient number of samples in the last two classes, the system may work incorrectly, since, in fact, it will be trained to recognize only the prevailing class.

To expand the sample and simplify the classification problem, it was decided to combine the classes into pairs: "unsatisfactory" and "faulty" into class 0 (64 entries, 4.63% of the total sample), "normal" and "satisfactory" into class 1 (1255 entries, 95.4% of the sample).

From the point of view of high-voltage power equipment technical state diagnostics, the combination of these classes is also justified: the analysis of class identity showed that the classes "faulty and unsatisfactory" and "satisfactory and normal" overlap, and differences in the boundary values of the parameters are observed in only 8% of the features in the first case and in 13% of the features in the second case.

To further reduce the nonuniformity of the feature distribution during data filtering, class 1, as the most complete one, has priority for sample reduction, since this will not lead to a loss of informational value of the dataset.

3) Fill-in-the-Gaps: In order to analyze data gaps and the feasibility of data recovery, the relation between the number of gaps and the corresponding features was analyzed (Fig. 3).

It can be seen from Fig. 3 that the most complete features are rubber_age, porcelain, air_filter_malfunction, low_oil, outer_heating and no_leveling, and that the gaps are observed only in class 1. The gaps were filled in with the most common values, determined using the df.mode() function of the pandas library. At the same time, for the feature "heating of external contacts" (outer_heating), it was decided not to restore the gaps, because

the absence of overheating may also be characteristic of both normal and faulty states. For the Vbreakdown and Iconn features, gaps with rare exceptions also belong to class 1. The analysis of these parameters shows that:
- The feature samples are more than 50% filled in, and Spearman's correlations with the CT state for the two features are equal to 0.2 and −0.3, which confirms that neglecting the features completely is not justified.
- Data gaps for these parameters predominantly belong to operable CTs.
- It is not possible to restore the values, since Vbreakdown has no associated parameters for recovery, and Iconn is characterized by significant uncertainty even when identical CT models are compared.

Fig. 3. Gaps distribution in the initial dataset.

The gapped rows can only be deleted for class 1, because this will not affect the generalizing performance of the algorithm, since the class is complete. For class 0, similar actions would lead to a significant reduction of the sample, removing about a quarter of the rows, which also contain data on chromatography or resistance monitoring. Therefore, the rows with gaps in Vbreakdown and Iconn were deleted for class 1 and preserved for class 0.

The feature outer_heating and the features associated with thermal imaging monitoring are correlated. It was noticed that, for the rows with data from the thermal imager, outer_heating = 2 and outer_heating = 1 are indicated when the maximum difference between the CT temperature and the ambient air temperature is 7 °C and 4 °C, respectively.

The presence of a significant number of gaps in these features is probably due to the fact that thermal imaging control in the Russian Federation has been mandatory only since 2003, while the CT state has been analyzed since 1996. Since the outer_heating feature correlates with thermal imaging data and has fewer gaps, the features associated with thermal imaging control have been completely removed from the dataset.

The features associated with chromatographic analysis that describe the rate of variation of gas concentrations were not available; therefore, they were completely removed.

The transformer oil industrial purity class indicates the number of oil impurities, including those capable of creating conductivity. Due to the lack of data in the sample and the potential correlation of industrial oil purity with breakdown voltage for different observations, the oil purity class feature was also completely removed from the dataset. The "year" feature was converted into the "age" attribute of the equipment according to the formula [age] = 2019 – [year].

After all the transformations (Fig. 4), the final distribution of data between the classes was 8.85% (class 0; 59 lines) and 91.15% (class 1; 601 lines). Fig. 5 illustrates an example of the statistical distribution of the "age" feature depending on the CT state.

Fig. 4. Gaps distribution in the initial dataset after the transformation.

Fig. 5. State-referred feature distributions (blue – class 0, orange – class 1).

Based on the obtained distributions, it can be seen that faulty CTs are more likely to have been operated for more than 40 years, with transformer oil breakdown voltage less than 40 kV, main winding resistance less than 3000 MΩ, secondary winding resistance less than 50 MΩ, and higher insulation tangent and concentrations of C2H4, CO and CO2.

4) Feature Collinearity Analysis: Collinearity analysis is also used to eliminate redundant features. In this study, it was experimentally shown that feature collinearity analysis is implemented more efficiently on the basis of the Spearman correlation coefficient with subsequent composition of the correlation matrix. Spearman's correlation coefficient is calculated as follows:

$$\rho = 1 - \frac{6 \sum d^{2}}{n\,(n^{2} - 1)} \tag{1}$$

where d is the difference between the two ranks of each observation and n is the number of observations.
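As an illustration of Equation (1), the following is a minimal sketch in plain Python, not the authors' implementation. The helper names `ranks` and `spearman_rho` and the toy data are hypothetical, and the sketch assumes there are no tied values, which is the case in which Equation (1) is exact.

```python
def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman coefficient via Eq. (1): 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    # d is the difference between the two ranks of each observation
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical toy data: equipment age and a made-up condition score.
age = [21, 25, 30, 34, 38]
score = [5.0, 4.0, 3.5, 2.0, 1.0]
rho = spearman_rho(age, score)  # perfectly decreasing relation
```

When ties are present (common for categorical features such as those in this dataset), a tie-aware routine from a statistics library would be the safer choice.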


Fig. 6 provides a graphical interpretation of Spearman's correlation coefficients for the features in relation to the CT technical state assessment problem.

Fig. 6. Spearman's correlation coefficients in relation to the state pattern.

Correlation values are interpreted in such a way that positive coefficients mean a higher target value as the feature value increases, and vice versa for negative values. As a result, high values of features such as winding resistance, oil breakdown voltage, and oil resistance tangent more likely correspond to a higher target value and are relevant for class 1.

For features such as the chromatography analysis results and the categorical features that represent the absence of a fault as zero, the correlation is negative; in other words, with negative correlation, small values of quantitative features or a greater number of zeros in the category indicate to a greater extent the normal state of the CT (the target will more often be equal to 1). The correlation amplitude shows how clear-cut the trend is. Spearman correlation coefficients were also calculated for the features with respect to the state pattern.

Fig. 7 provides the Spearman cross-correlation matrix for all the features.

Fig. 7. Spearman's correlation matrix of the features.

The gaps in the correlation values are caused by the absence or small cross-filling of some features. From Fig. 7 it is possible to notice the mutual dependence of different features. For example, from the high correlation between chromatography and outer_heating we can assume that an increase in gas concentrations coincides with porcelain cap damage.

V. MACHINE LEARNING ALGORITHMS ANALYSIS

Within this study, a comparative analysis of the most suitable ML algorithms for the presented problem (XGBoost gradient boosting on decision trees and Random Forest) was performed.

A. XGBoost

Boosting is a technique for combining basic classifiers to create a more accurate system than each of the basic classifiers individually [3]. During boosting an additive composition is created, where each subsequent classifier is trained to minimize the error of the previous model. Classifiers are added iteratively as long as improvement is possible. As a result of boosting, a composition with powerful predictive properties can be obtained despite the fact that each of the "weak" classifiers is only slightly more accurate than a random choice [4].

In supervised learning, for a data set $D = \{(x_i, y_i) : x_i \in \mathbb{R}^n, y_i \in \mathbb{R}\}$, a composition of decision trees uses K additive functions to forecast the answer:

$$\hat{y}_i = F(x_i) = \sum_{j=1}^{K} f_j(x_i), \tag{2}$$

where $f(x) = w_{q(x)}$. In this case $q: \mathbb{R}^n \to T$ describes the structure of each decision tree, which allocates the data point to the corresponding leaf node with a weight coefficient $w \in \mathbb{R}^{T}$.

B. Random Forest

Random Forest is an ML algorithm that uses an ensemble of decision trees and combines the random subspace method with the bagging procedure. Bagging is based on the bootstrap, which is a statistical method for generating m new samples of size n from the N objects of the original sample. To form each of the m samples, an item is drawn n times from the original set and returned to it after every draw. In other words, on every draw each item can be obtained from the initial sample with a probability of 1/N, and it may reoccur both within a sample and across differently generated samples. In bagging, the samples are generated by the described approach. Each classifier $a_i(x)$ is trained on its own sample, and the final classifier provides the average response of all the algorithms:

$$a(x) = \frac{1}{m} \sum_{i=1}^{m} a_i(x) \tag{3}$$

Each tree in the Random Forest algorithm is trained on one of the samples obtained by bootstrap.
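The bootstrap and averaging steps of Equation (3) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the weak classifier `train_stump` (a one-feature threshold rule) stands in for the decision trees of Random Forest, the random subspace step is omitted, and all function names and the toy data are hypothetical.

```python
import random


def bootstrap_sample(data, size, rng):
    """Draw `size` items with replacement; every draw picks each item with probability 1/len(data)."""
    return [rng.choice(data) for _ in range(size)]


def train_stump(sample):
    """Hypothetical weak classifier: threshold halfway between the class means."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:
        # The bootstrap sample contains a single class: predict that class.
        constant = 1.0 if xs1 else 0.0
        return lambda x: constant
    mean0, mean1 = sum(xs0) / len(xs0), sum(xs1) / len(xs1)
    threshold = (mean0 + mean1) / 2
    if mean1 >= mean0:
        return lambda x: 1.0 if x >= threshold else 0.0
    return lambda x: 1.0 if x < threshold else 0.0


def bagging(data, m, n, seed=0):
    """Train m classifiers on m bootstrap samples of size n and average them as in Eq. (3)."""
    rng = random.Random(seed)
    classifiers = [train_stump(bootstrap_sample(data, n, rng)) for _ in range(m)]

    def ensemble(x):
        return sum(a(x) for a in classifiers) / m

    return ensemble


# Hypothetical one-feature dataset: (feature value, class label).
data = [(1.0, 0), (1.5, 0), (2.0, 0), (6.0, 1), (6.5, 1), (7.0, 1)]
model = bagging(data, m=25, n=len(data))
low, high = model(0.0), model(10.0)  # averaged votes for class 1
```

The ensemble output is the share of classifiers voting for class 1, so thresholding it at 0.5 gives a majority-vote decision.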


TABLE III
CLASSIFICATION QUALITY METRICS

Fig. 8. Feature importance for XGBoost model.

The initial data sample for Random Forest was modified by filling in the data gaps. Features with less than 50% of completed entries were excluded (chromatography data, tgOil90); gaps in the features with float values (Iconn, moist, Vbreakdown, acidNr, Tflame, Rmain20, Rsecondary20, tgRmain20) were filled with the mean values obtained from the Imputer transformer (strategy = “mean”); for categorical features and discrete numerical features (rubber_age, porcelain, no_leveling, air_filter_malfunction, low_oil, outer_heating, age), gaps were filled with the most common values, also using the Imputer transformer (strategy = “most_frequent”).

C. Model Tuning

To assess the performance of the XGBoost and RandomForest classifiers, this paper considers a set of relevant error metrics, presented and analyzed in Table III. According to the simplest metric, Accuracy, the performance of the algorithms is almost equivalent (0.926 for XGBoost and 0.909 for RandomForest), but this metric cannot distinguish type I (false positive) from type II (false negative) errors and only identifies the share of correct answers of the algorithm.

The type I and type II errors among the incorrectly identified states of the power equipment should be analyzed because, from the point of view of power equipment operation, whatever the accuracy of the system is, it is crucial that the majority of the errors are false positives rather than false negatives (for faulty states), since the latter lead to dangerous states being skipped.

The full list of considered metrics is provided in Table III with the corresponding justification.

That is why the Precision and Recall metrics were considered, which evaluate the quality of the algorithm for each of the classes separately. From the point of view of power equipment operation, the Precision metric characterizes faulty state skipping, which can have much greater consequences, if such an equipment unit is mistakenly left in operation without additional repair or diagnostic actions, than false classification of an unsatisfactory state.

To increase the algorithm accuracy, the hyperparameters were adjusted using the GridSearch function, which enumerates possible combinations of model parameters and selects those giving the highest value of a model quality metric such as Precision or Recall, determined by the following expressions:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN} \qquad (4)$$

where TP is the number of true positives, FP the number of false positives, and FN the number of false negatives.

The resulting precision and recall metrics are defined as the average of these metrics over the classes (precision_macro and recall_macro); this is done to account for class imbalance.

D. Result Analysis

The importance of features for each of the algorithms is defined as the classification accuracy improvement.

For the XGBoost classifier, feature importance is a quantitative measure of the participation of a specific feature in the decision nodes of the tree. If a feature participates in all of the decision nodes, its importance equals 1; if the feature is absent from the decision nodes of the tree, its importance equals 0. For the Random Forest classifier, the impurity-based feature importances are estimated using the embedded procedure, which computes the total decrease in node impurity averaged over all trees of the ensemble.

The results for XGBoost and Random Forest are presented in Figs. 8–9. In general, the algorithms choose practically the same set of features, but the importance distribution between the features differs significantly; for example, some of the features that are practically unimportant for the XGBoost algorithm have significant weight for Random Forest. We can therefore conclude that these algorithms form an almost identical set of features, but it is the weight (importance) of each feature that determines the accuracy of CT technical state recognition.
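The gap-filling rules described above for the Random Forest sample (drop features with less than 50% of completed entries, fill float features with the mean, fill categorical and discrete features with the most frequent value) can be sketched in plain Python. This is only an illustration under stated assumptions: the feature names come from the text, all values are invented, and the helpers `impute_column` and `completed_share` are hypothetical stand-ins for the Imputer transformer used in the study.

```python
from statistics import mean, mode

def completed_share(values):
    """Fraction of entries that are not missing (None marks a gap)."""
    return sum(v is not None for v in values) / len(values)

def impute_column(values, strategy):
    """Fill gaps with the column mean or with the most frequent value."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else mode(observed)
    return [fill if v is None else v for v in values]

# Feature names are taken from the text; the values are invented.
table = {
    "moist": [10.0, None, 14.0, 12.0],     # float feature
    "low_oil": [0, 0, None, 1],            # discrete feature
    "tgOil90": [None, None, None, 0.5],    # mostly missing feature
}
strategies = {"moist": "mean", "low_oil": "most_frequent", "tgOil90": "mean"}

# Keep only features with at least 50% of completed entries, then impute.
filled = {
    name: impute_column(column, strategies[name])
    for name, column in table.items()
    if completed_share(column) >= 0.5
}
print(filled)
```

In this toy table `tgOil90` is dropped because only one of four entries is present, which mirrors the exclusion rule stated in the text.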

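The per-class Precision and Recall of (4) and their macro averages can be checked on a small invented example. The labels below are assumptions made for illustration (class 1 for normal or satisfactory states, class 0 for the rest), and the function names are hypothetical. The sketch shows how a high Accuracy can coexist with a poor Recall for the rare class.

```python
def precision_recall(y_true, y_pred, positive):
    """Equation (4) for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def macro_precision_recall(y_true, y_pred):
    """Unweighted mean of the per-class Precision and Recall."""
    classes = sorted(set(y_true))
    per_class = [precision_recall(y_true, y_pred, c) for c in classes]
    macro_p = sum(p for p, _ in per_class) / len(classes)
    macro_r = sum(r for _, r in per_class) / len(classes)
    return macro_p, macro_r

# Invented, imbalanced sample: eight units of class 1, two of class 0.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]   # one class-0 unit is missed

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_p, macro_r = macro_precision_recall(y_true, y_pred)
print(accuracy, macro_p, macro_r)
```

Here Accuracy is 0.9, while the Recall for class 0 is only 0.5, which pulls the macro Recall down to 0.75. Macro averaging gives each class equal weight regardless of its size.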

Fig. 9. Feature importance for Random Forest model.

TABLE IV
DATA SAMPLE CHARACTERISTICS FOR CT TECHNICAL STATE ASSESSMENT

Table IV presents the basic characteristics of the samples and features used in the study. Algorithm settings and calculation results of CTs technical state analysis are presented in Table V.

It can be seen from Table V that the algorithms' classification quality scores (Precision and Recall) differ by 18.4% and 19.7% for XGBoost and Random Forest, correspondingly. In this case XGBoost clearly performs significantly better than the Random Forest classifier in identifying the power equipment state. For each of the algorithms, the mean accuracy by classes is also estimated, and, as can be seen from Table V, the difference is also considerable.

VI. MODEL IMPLEMENTATION

On the basis of the presented mathematical model, the authors developed a software tool for CTs technical state analysis, which was piloted at the transmission system operator of the Sverdlovsk region, Russia. The software makes it possible to analyze the technical state of both a single CT and a group of CTs in order to perform maintenance priority assessment. CTs group analysis is in high demand among companies owning one or more substations, since it allows them to develop justified maintenance and repair plans.

Since the software tool has been implemented as a pilot project, it was decided to keep the system training option in order to analyze various sets of samples according to the data available at the enterprise, which allows the individual characteristics and operating conditions of specific CTs to be taken into account. In addition, it was decided to provide the output of both algorithms, XGBoost and Random Forest, for a thorough analysis of their behavior.

Fig. 10. CTs group technical state assessment result.

Fig. 10 shows the results of technical state analysis for a group of CTs owned by the regional transmission system operator. As can be seen from Fig. 10, the Random Forest classifier yields a greater number of normal states, which corresponds to the increased number of false positives for class 1 (normal and satisfactory states). Given the technical state classification goals, the Random Forest classifier is more likely to miss the dangerous state of the false-positive CTs, which may result in a power system emergency.

Fig. 11. Quality metrics of the algorithms for “unsatisfactory” class.

These results are confirmed by Fig. 11, which provides the resulting values of the Precision and Recall metrics for the “unsatisfactory” class for both algorithms. They illustrate, firstly, the adequate partitioning of class 0 into “unsatisfactory” and “faulty” classes and, secondly, the possibility of solving the multi-class problem in the form of binary pairwise classification.
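The maintenance priority assessment for a group of CTs can be sketched as a simple ranking. This sketch assumes that each CT has a predicted probability of belonging to class 0 and that units are ordered by that probability. The paper does not state its ordering rule, so the identifiers, the probabilities and the function `repair_priority` are hypothetical.

```python
def repair_priority(p_class0):
    """Return CT identifiers sorted by descending probability of class 0."""
    return sorted(p_class0, key=lambda ct: p_class0[ct], reverse=True)

# Invented identifiers and class-0 membership probabilities.
p_class0 = {"CT-1": 0.12, "CT-2": 0.81, "CT-3": 0.47}

for rank, ct in enumerate(repair_priority(p_class0), start=1):
    print(rank, ct, p_class0[ct])
```

With these invented values the order is CT-2, CT-3, CT-1, so the unit most likely to be in class 0 is listed first for repair planning.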


TABLE V
ALGORITHM TUNING AND QUALITY ASSESSMENT

TABLE VI
CTS REPAIR PRIORITY LIST

Generally, Fig. 11 presents the output windows for displaying the basic metrics of each of the algorithms for different classes. Table VI shows the tabular form of CTs repair priority, characterized by the membership probability of the object in a certain class for the XGBoost algorithm.

The real software product, which was introduced at the industrial facility and operated for several years, made it possible, through the application of self-modifying state classification rules, to partition class 0 into two independent classes, “faulty” and “unsatisfactory” (Fig. 10). This was possible owing to the accumulated statistics acquired from the technical diagnostics test results of power equipment aged over 30 years.

VII. CONCLUSION

The major contribution of the presented research lies not only in the object of study (instrument current transformers), but also in creating a new approach to the formation and processing of initial data (training and testing sets) based on Feature Extraction, Feature Transformation, Logarithmic distribution adjustment, Feature Interactions, Fill-in-the-gaps in data and Feature collinearity analysis, in order to improve the accuracy of power equipment state classification (from the mathematical point of view) and to provide correct interpretation of machine learning results (from the technical point of view).

In the paper, the authors demonstrated an example of designing the dataset based on real CT diagnostics data, which is characterized by a significant number of gaps, the presence of outliers, and a highly uneven distribution of classes.

The creation of the initial dataset is associated with the need to ensure a sufficient number of samples in the various classes. Therefore, a separate objective of this study was the creation of correct and sufficient datasets for the recognition of various power equipment states; these datasets were assembled over the past 10 years on the basis of data from various power network facilities in Russia.

Within the framework of the presented study, a step-by-step data preprocessing algorithm was developed for the classification task of power equipment technical state assessment, resulting in improved performance of the machine learning algorithms. The proposed approach makes it possible not only to reduce the required number of analyzed features, but also to preserve the generalizing properties of the algorithm for a given accuracy level within the CTs technical state assessment problem under uncertainty.

Moreover, implicit dependencies and patterns in the dataset were identified, which was also demonstrated in the presented work. Despite the active development and application of machine learning methods to the problem of power equipment technical state assessment, the main obstacle to their correct and effective application is still the high-quality processing of raw data, which has no universal solution so far.

This paper presents the practical experience of CTs technical state analysis based on ML methods: XGBoost and Random Forest. The results are validated on technical diagnostics data of the power equipment of the regional power system. As a result of initial data processing and the introduction of additional algorithm quality metrics, it was possible to achieve Precision and Recall scores of CTs technical state assessment of 87.1% and 83.7%, correspondingly, which is the most competitive result compared to other ML algorithms previously described by the authors in other studies [31], [32].

In order to evaluate the model quality and compare the presented algorithms, additional metrics were introduced: Precision and Recall. This choice of metrics makes it possible to disregard the class balance. That means that the proposed metrics are applicable in conditions of unbalanced samples, which is especially typical for high-voltage equipment due to a significantly smaller share of faulty equipment units.

The practical case study clearly demonstrates that although XGBoost and Random Forest have very similar results in terms of accuracy score, the generalizing performance of Random Forest is in fact worse, which can only be seen by introducing additional metrics.

In the authors' view, the obtained accuracy of technical state identification opens the way to the next-level computational challenge: to search for and determine the relationships and implicit dependencies for an integrated facility (substation, power plant or power system) instead of analyzing the technical state of separate equipment units. Given the indisputable mutual influence of power grid equipment not only inside the boundaries of one substation, but also within the entire power system, the proposed approach will allow for fairly accurate long-term forecasting of power equipment


technical state and, subsequently, on its basis to control the high-voltage equipment lifecycle. This will provide a driver not just for condition-based maintenance, but also for the development of new equipment and new concepts for power system operation, maintenance and repair.

REFERENCES

[1] M. Sedghi, A. Ahmadian, and M. Aliakbar-Golkar, “Optimal storage planning in active distribution network considering uncertainty of wind power distributed generation,” IEEE Trans. Power Syst., vol. 31, no. 1, pp. 304–316, Jan. 2016.
[2] H. Farhangi, “The path of the smart grid,” IEEE Power Energy Mag., vol. 8, no. 1, pp. 18–28, Jan./Feb. 2010.
[3] M. Amin and P. F. Schewe, “Preventing blackouts: building a smarter power grid,” Sci. Am., vol. 8, pp. 60–67, 2007.
[4] P. Mcdaniel and S. Mclaughlin, “Security and privacy challenges in the smart grid,” IEEE Secur. Priv. Mag., vol. 7, no. 3, pp. 75–77, May/Jun. 2009.
[5] B. S. Reddy and A. R. Verma, “Novel technique for electric stress reduction across ceramic disc insulators used in UHV AC and DC transmission systems,” Appl. Energy, vol. 185, Part 2, pp. 1724–1731, 2017.
[6] X. Peng, P. Jirutitijaroen, and C. Singh, “A distributionally robust optimization model for unit commitment considering uncertain wind power generation,” IEEE Trans. Power Syst., vol. 32, no. 1, pp. 39–49, Jan. 2017.
[7] S. Li and J. Li, “Condition monitoring and diagnosis of power equipment: Review and prospective,” High Voltage, vol. 2, no. 2, pp. 82–91, 2017.
[8] Z. J. Wang, Condition Monitoring and Fault Diagnosis for Power Equipment. Shanghai: Shanghai Jiao Tong University Press, pp. 2–3, 2012.
[9] W. W. Li et al., “Frequency dependence of breakdown performance of XLPE with different artificial defects,” IEEE Trans. Dielectr. Electr. Insul., vol. 19, no. 4, pp. 1351–1359, Aug. 2012.
[10] I. Fofana and Y. Hadjadj, “Electrical-based diagnostic techniques for assessing insulation condition in aged transformers,” Energies, vol. 9, no. 9, pp. 679–705, 2016.
[11] X. Q. Shen et al., “Temperature measurement of power cable based on distributed optical fiber sensor,” J. Phys., Conf. Series, vol. 679, pp. 1–2, 2016.
[12] C. Zhou, Y. Yang, Li Mingzhen, and Z. Wenjun, “An integrated cable condition diagnosis and fault localization system via sheath current monitoring,” in Proc. Int. Conf. Condition Monitoring Diagnosis, 2016, pp. 1–8.
[13] M. Wu, H. Cao, J. Cao, H.-L. Nguyen, J. B. Gomes, and S. Priyadarsini Krishnaswamy, “An overview of state-of-the-art partial discharge analysis techniques for condition monitoring,” IEEE Elect. Insul. Mag., vol. 31, no. 6, pp. 22–35, Dec. 2015.
[14] Z. Su and Q. Li, “Historical review and summary on measures against pollution flashover occurred in power grids in China,” Power Syst. Technol., vol. 34, no. 12, pp. 125–130, 2010.
[15] S. Gao, L. Wang, C. Zhao, Z. Zhou, and M. Zhu, “Pollution flashover pre-warning system based on prediction of flashover voltage,” High Voltage Eng., vol. 11, pp. 3365–3373, 2014.
[16] X. Huang et al., “On-line transmission-line icing monitoring technology based on three groups of force sensors and angle sensors,” High Volt. Eng., vol. 40, no. 2, pp. 374–380, 2014.
[17] R. S. Gonçalves and J. C. M. Carvalho, “A mobile robot to be applied in high-voltage power lines,” J. Brazilian Soc. Mech. Sci. Eng., vol. 37, pp. 349–359, 2015.
[18] R. S. Goncalves et al., “Review and latest trends in mobile robots used on power transmission lines,” Int. J. Adv. Robot. Syst., vol. 10, pp. 1–14, 2013.
[19] B. Vidyasagar and S. S. T. Ram, “Condition monitoring analysis of synchronous generator based on an adaptive technique,” in Proc. Int. Conf. Inventive Syst. Control, 2017, pp. 1–12.
[20] Y. A. Asiri, A. O. Vouk, L. Renfort, D. Clark, and J. C. NeuralWare, “Neural network based classification of partial discharge in HV motors,” in Proc. Elect. Insul. Conf., 2011, pp. 333–339.
[21] R. Yuan, “Fault diagnosis for engine by support vector machine and improved particle swarm optimization algorithm,” J. Inf. Comput. Sci., vol. 11, no. 13, pp. 4827–4835, 2014.
[22] G. G. Rigatos, N. Zervos, D. Serpanos, V. Siadimas, P. Siano, and M. Abbaszadeh, “Condition monitoring of wind-power units using the derivative-free nonlinear Kalman filter,” in Proc. IEEE 16th Int. Conf. Ind. Informat., 2018, pp. 472–477.
[23] J. Ben Ali, B. Chebel-Morello, L. Saidi, S. Malinowski, and F. Fnaiech, “Accurate bearing remaining useful life prediction based on Weibull distribution and artificial neural network,” Mech. Syst. Signal Process., vol. 56, 2015.
[24] A. Bellini, F. Filippetti, C. Tassoni, and G.-A. Capolino, “Advances in diagnostic techniques for induction machines,” IEEE Trans. Ind. Elect., vol. 55, no. 12, pp. 4109–4126, Dec. 2008.
[25] “Schneider Electric technical solutions for predictive control,” Dec. 2018. [Online]. Available: https://www.se.com/ru/ru/work/insights/improving-mining-asset-performance-with-predictive-analytics.jsp
[26] “Monitoring and diagnostic & predictive maintenance solutions for semiconductor equipment,” Advantech Co, Ltd. Jun. 2020. [Online]. Available: https://ecauk.com/files/2016/02/Monitoring-and-Diagnostic-Predictive-Maintenance-Solutions-for-Semiconductor-Equipment.pdf
[27] “ABB lifestretch. Reliability Management services substation Risk Assessment & lifestretch,” Jun. 2020. [Online]. Available: https://new.abb.com/docs/librariesprovider78/eventos/jjtts-2017/presentaciones-chile/reliability-management-services-peter-lokang.pdf?sfvrsn=2
[28] “Monitoring system Prana,” Prana Co, Ltd. Jun. 2020. [Online]. Available: https://prana-system.com/en
[29] S. A. Naumov et al., “Experience in use of remote access and predictive analytics for power equipment’s condition,” Thermal Eng., vol. 65, pp. 189–199.
[30] G. Tanasescu, B. Gorgan, S. Busoi, A. Badita, and P. V. Notingher, “Monitoring and diagnosis of electrical equipment using a web software. Health index and remaining lifetime estimation,” in Proc. Conf. Diagnostics Elect. Eng. (Diagnostika), 2016, pp. 1–4.
[31] A. I. Khalyasmaa, M. D. Senyuk, and S. A. Eroshenko, “High-voltage circuit breakers technical state patterns recognition based on machine learning methods,” IEEE Trans. Power Del., vol. 34, no. 4, pp. 1747–1756, Aug. 2019.
[32] A. Khalyasmaa et al., “Machine learning algorithms for power transformers technical state assessment,” in Proc. Int. Multi-Conf. Eng. Comput. Inf. Sci. (SIBIRCON), 2019, pp. 0601–0606.

Alexandra I. Khalyasmaa (Member, IEEE) was born in 1986. She received the Ph.D. degree in 2015. She is now an Associate Professor with the Automated Electrical Systems Department, Ural Federal University named after the first President of Russia B.N. Yeltsin. She is the International Representative of the Russian National Committee of CIGRE in the D2 Study Committee “Information Systems and Telecommunications” and Leader of the National Working Group (WG-6) “Information and analytical systems in power equipment lifecycle management” of SC D2 “Information Systems and Telecommunications” of the Russian National Committee of CIGRE. Her specific fields of interest and research include power equipment technical state assessment, power systems transients, and distributed generation.

Mihail D. Senyuk (Member, IEEE) was born in 1994. He received the M.Sc. degree in 2017. He is now an Engineer with the Joint Stock Company Scientific and Technical Center of Unified Power System. His specific fields of interest and research include power systems transients, computer science, and power equipment technical state assessment.

Stanislav A. Eroshenko (Member, IEEE) was born in 1987. He received a graduate degree from Ural Federal University in 2011. He is now a Senior Lecturer with the Automated Electrical Systems Department. He is the International Representative of the Russian National Committee of CIGRE in the C3 Study Committee “Power System Environmental Performance”. His specific fields of research include power equipment technical state assessment, distributed generation and renewable energy sources.
