Machine Learning Algorithms Fusion Based On DGA Da
2478/sbeef-2023-0014
Abdelmoumene Hechifa1, Abdelaziz Lakehal2,*, Arnaud Nanfak3, Lotfi Saidi4, Chouaib Labiod5
1 LGMM Laboratory, Faculty of Technology, University of 20 August 1955-Skikda, Skikda, Algeria.
2 Laboratory of Research on Electromechanical and Dependability, University of Souk Ahras, Souk-Ahras, Algeria.
3 Laboratory of Energy, Materials, Modelling and Methods, National Higher Polytechnic School of Douala, University of Douala, Douala, Cameroon.
4 University of Tunis, ENSIT – Laboratory of Signal Image and Energy Mastery, Tunis, Tunisia.
5 Electrical Engineering Department, Faculty of Technology, University of El Oued, El Oued, Algeria.
* Corresponding author: [email protected]
Abstract: Dissolved Gas Analysis (DGA) is widely recognized as a valuable method for the early identification of faults in oil-filled power transformers. It has been extensively adopted as a primary approach for the early discovery of these faults, relying on the analysis of dissolved gases, and it contributes to the dependability of electrical systems. This paper proposes an efficient fusion method based on DGA data using the two best machine learning algorithms, the multilayer perceptron (MLP) neural network and naïve Bayes (NB), with three input vectors: a ppm input vector, a percentage input vector, and a logarithmic input vector. The fusion method combines the predictions of the two classifiers and obtains accuracy, recall, precision, and F-measure values higher than those of either classifier separately. The proposed fusion method was evaluated on a test database and compared with conventional and intelligent methods. Results showed that the proposed model outperformed both traditional and intelligent methods in terms of diagnostic accuracy when using percentage and logarithmic input vectors. Prediction Based Fusion (PBF) with the percentage vector achieved an accuracy of 97.22%, while PBF with the logarithmic vector achieved an accuracy of 95.83%. These rates were higher than those achieved by traditional methods, such as the Modified RRM/CEGB method (91.67%) and the Modified RRM/IEC method (90.28%). Additionally, the proposed model surpassed the accuracy of intelligent methods, such as CSUS ANN (88.89%) and Conditional Probability (93.06%).

Keywords: Dissolved gas analysis; fusion method; Multilayer Perceptron; Naïve Bayes; Percentages; Logarithmic.

1. INTRODUCTION

The importance of power transformers lies in their crucial role in ensuring the smooth operation of power systems. Regular monitoring is necessary to maintain their availability and prevent potential faults. In the event of a transformer fault, the entire power network can be affected, leading to catastrophic consequences for the transmission of electricity [1]. Power transformers are susceptible to both thermal and electrical stresses, which can induce decomposition of the insulating oil, subsequently resulting in oxidation. This process produces various gases, including methane (CH4), carbon monoxide (CO), hydrogen (H2), acetylene (C2H2), ethylene (C2H4), ethane (C2H6), and carbon dioxide (CO2) [2]. A commonly used method for analyzing the gases produced by transformer faults is Dissolved Gas Analysis (DGA) [3].

DGA is a powerful diagnostic technique, widely used by the majority of power companies for detecting thermal and electrical faults in transformers. International committees acknowledge its efficiency in diagnosing issues originating in the oil or paper, including low-, medium-, and high-temperature thermal faults and electrical faults such as partial discharge and low- and high-energy discharge [4]. To simplify the identification of potential faults in transformers, established rules and methods rely on the analysis of dissolved gas concentrations [5].

Accurate methods for diagnosing faults in power transformers are essential for conducting a thorough DGA analysis. At present, the traditional methods comprise ratio methods, namely the Dornenburg method [6], the modified Rogers four ratios and modified IEC ratios methods [7], the HYOSUN Corporation gas ratio method [8], and the three ratios technique [9], and graphic methods, namely the Duval triangle method [10], the Gouda triangle method [11], and the pentagon methods [12] [13]. The traditional methods are not accurate enough to identify the type of fault, so they are weak and suffer in decision-making,
Scientific Bulletin of the Electrical Engineering Faculty – Year 23 No.2 (49) ISSN 2286-2455
but by turning to artificial intelligence, transformer faults can now be diagnosed with high precision. The methods used in the literature include ANN [14], fuzzy logic [15], support vector machines [16], k-nearest neighbor [17], Bayesian networks [18], ensemble learning [19], and deep learning [20].

In this paper, the neural network algorithm and the naïve Bayes (NB) classifier are applied with different input vectors, and a method for fusing them is proposed to diagnose the six power transformer fault types using the KNIME analytics platform.

procedure represents the pivotal phase for enhancing the model's accuracy. In this study, three different input vectors were employed, namely the original data input vector (ppm), a percentage input vector, and a logarithmic input vector, as shown in Table 1.

Table 1. Input features
Data                         Format
Input vector (ppm)           X = [C2H6, C2H2, CH4, C2H4, H2]
Input vector (Percentage)    X = [%C2H6, %C2H2, %CH4, %C2H4, %H2]
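As a minimal sketch of how the percentage and logarithmic input vectors might be derived from raw ppm readings: the normalization by the sum of the five gases, the base-10 logarithm, and the clamping floor for zero readings are all assumptions of this example, since the text here does not spell them out, and the sample values are hypothetical.

```python
import math

# Gas order follows the input vector X in Table 1.
GASES = ["C2H6", "C2H2", "CH4", "C2H4", "H2"]

def to_percentage(ppm):
    """Express each gas as a percentage of the sum of the five gases
    (assumed definition of the percentage input vector)."""
    total = sum(ppm)
    if total == 0:
        raise ValueError("all gas concentrations are zero")
    return [100.0 * value / total for value in ppm]

def to_logarithmic(ppm, floor=1.0):
    """Base-10 logarithm of each concentration (assumed base).
    Readings below `floor` are clamped so a zero reading maps to
    log10(floor) instead of raising an error."""
    return [math.log10(max(value, floor)) for value in ppm]

# Hypothetical sample, not taken from the paper's dataset.
sample_ppm = [10.0, 0.0, 40.0, 50.0, 100.0]
pct = to_percentage(sample_ppm)
log = to_logarithmic(sample_ppm)
```

Both transforms keep the five-element layout of the ppm vector, so the same classifier configuration can be reused across the three input formats.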
Figure 1. The structure of the MLP-based classifier

3.2 Naive Bayes (NB)

The Naive Bayes (NB) algorithm is a technique based on Bayes' theorem. It operates on the principles of probability theory and statistical methods. Bayes' theorem derives the probability of an outcome from what precedes it, that is, from prior evidence [25]. The NB algorithm is considered "naive" because it assumes independence among all variables given the class value, which may not hold in real-world scenarios. Nonetheless, this approach is favored for its fast learning [26]. The algorithm classifies objects using the posterior probability, which is given by [27]:

$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} \tag{2}$$

where P(A|B) is the posterior probability of A under condition B, P(A) is the prior probability of A, P(B|A) is the probability of B under condition A (the likelihood), and P(B) is the prior probability of B (the evidence).

The posterior probability can therefore be written as:

$$\text{Posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}} \tag{3}$$

The principle of the NB algorithm and the way it classifies objects can be illustrated through a simplified diagram; Figure 2 illustrates the basic structure of an NB classifier.

4. MODEL EVALUATION

In the field of data mining, there exists a dependable method to assess the accuracy of data, which is crucial in supporting documentation systems. The effectiveness of the model is evaluated using statistical measures, including accuracy, recall, precision, and F-measure [28]:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{5}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{6}$$

$$F\text{-measure} = \frac{2\,(\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \tag{7}$$

Here TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

5. RESULTS AND DISCUSSION

To assess the diagnostic efficacy of the proposed model and to predict transformer faults that could endanger the condition of the electrical system, a dataset of 240 samples was used, split into 168 training samples and 72 testing samples. The data are expressed in ppm, percentage, and logarithmic form. This paper uses two of the most powerful and accurate algorithms for diagnosing transformer faults, namely naive Bayes and neural networks, and combines their predictions in four stages. The first stage, the preparation and processing of data, involves collecting data and extracting features for the input vectors. In the second stage, classifiers are trained and tested. The third stage, known as the Prediction Based Fusion (PBF) method, is a machine learning and data science technique
used to combine multiple predictions from the MLP and NB algorithms to improve the overall accuracy and robustness of predictions. This is achieved by averaging the predictions of the different models. Finally, in stage four, the proposed model is evaluated. The classifiers were combined through ensemble learning to increase the accuracy of the model, using the KNIME analytics platform. Figure 3 represents the proposed model structure.

The KNIME analytics platform allows engineers to develop and implement algorithms in a short time through a group of interconnected nodes, each of which performs a specific function and enables the expert to enter, output, and modify data [29]. Figure 4 represents the proposed model and PBF with a simplified explanation of the steps.

Data pre-processing: The input data is processed by the
Reader node, which scans the input file to ascertain the
quantity and categories of columns present. The
Duplicate Rows node then removes all duplicate rows
within the input table, followed by processing any
instances of missing data.
Model training and testing: First, the data is partitioned into training and testing segments via the partition node. Then, the model is trained and evaluated using learner and predictor nodes.
Combined prediction: The predictions are first collected by the joiner node, which keeps the class and prediction columns; the PBF node then takes the mean of the predictions of the NB and MLP algorithms.
Evaluation: The proposed model was evaluated through
accuracy, recall, precision, and F-measure.
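
The combined-prediction step above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the KNIME workflow itself: it assumes each classifier outputs one probability per fault class, that "mean predictions" means an unweighted average of those probabilities, and that the fused label is the class with the highest average. The class labels and probability values are hypothetical.

```python
def fuse_predictions(proba_a, proba_b, classes):
    """Average two class-probability vectors and return (label, fused).

    Assumption: an unweighted mean of the two classifiers' class
    probabilities, followed by picking the most probable class.
    """
    if not (len(proba_a) == len(proba_b) == len(classes)):
        raise ValueError("probability vectors and class list must align")
    fused = [(a + b) / 2.0 for a, b in zip(proba_a, proba_b)]
    best = max(range(len(classes)), key=fused.__getitem__)
    return classes[best], fused

# Hypothetical labels for the six fault types and hypothetical
# classifier outputs for a single test sample.
CLASSES = ["PD", "D1", "D2", "T1", "T2", "T3"]
mlp_proba = [0.05, 0.10, 0.40, 0.30, 0.10, 0.05]  # MLP alone favors D2
nb_proba = [0.02, 0.08, 0.20, 0.60, 0.05, 0.05]   # NB alone favors T1

label, fused = fuse_predictions(mlp_proba, nb_proba, CLASSES)
```

In this example the two classifiers disagree, and the average resolves the disagreement in favor of the class that the more confident classifier supports, which is one way a fusion of this kind can exceed both of its members.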
Figure 3. Flowchart of the proposed model
Figure 4. The proposed model using PBF to combine predictions of different classifiers
5.1 Performance evaluation results

When comparing Figure 5 with Table 2, it can be seen that the PBF of the NB and MLP algorithms outperforms both algorithms taken separately when the input vectors are ppm, percentages, and logarithmic. The proposed PBF method is characterized by its reliability in fault detection in terms of overall accuracy and macro-averaged precision, recall, and F-measure.

The best results of the proposed PBF method were obtained with the percentage input vector: 97.22% accuracy, 98.33% recall, 97.62% precision, and an F-measure of 97.83%. These were superior to the ppm input vector, with 91.67% accuracy, 86.08% recall, 92.73% precision, and 94.75% F-measure, as well as the logarithmic input vector, with 95.83% accuracy, 96.14% recall, 94.72% precision, and 95.32% F-measure.
For the algorithms taken separately, the MLP algorithm performed best when the input vector was in percentages or ppm, while the NB algorithm prevailed when the input vector was logarithmic.
Table 2. Performance results of the proposed model
Proposed classification   ppm                           Percentages                   Logarithmic
techniques                Acc%   Rec%   Pre%   F-m%     Acc%   Rec%   Pre%   F-m%     Acc%   Rec%   Pre%   F-m%
NB                        77.78  70.94  72.30  81.91    91.67  94.56  90.88  92.45    93.06  94.56  91.81  92.61
MLP                       84.72  81.53  86.70  81.6     94.44  96.67  95.83  95.77    88.89  84.97  85.63  84.37
PBF of NB with MLP        91.67  86.08  92.73  94.75    97.22  98.33  97.62  97.84    95.83  96.14  94.72  95.32
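
The paper reports overall accuracy together with macro-averaged precision, recall, and F-measure. As a hedged sketch of how such figures can be obtained from a multi-class confusion matrix: the function below treats each class in turn as the positive class, and it takes the macro F-measure as the mean of the per-class F-measures, which is one common convention and an assumption here. The 3-class matrix is hypothetical and is not data from the paper.

```python
def macro_metrics(confusion):
    """Overall accuracy and macro-averaged precision, recall, F-measure.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    precisions, recalls, f_measures = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(confusion[k]) - tp                       # true k, predicted otherwise
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f_measures.append(f)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_measures) / n,
    }

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 1, 9]]
scores = macro_metrics(cm)
```

Because the macro F-measure is averaged per class, it need not equal the harmonic mean of the macro precision and macro recall shown in the same table row.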
[Figure: chart with a 0 to 100 vertical axis showing Acc%, Rec%, Pre%, and F-m% for the ppm, Percentages, and Logarithmic input vectors]
5.2 Comparisons with previous studies

To assess the effectiveness of the proposed fusion prediction model and to compare it with previous studies, the test data set of 72 samples was used to determine the reliability of the proposed method in predicting transformer faults. Table 3 compares the results obtained with those of other techniques, both traditional and intelligent.

The comparison results demonstrated that the proposed model outperformed both traditional and intelligent methods in terms of diagnostic accuracy when using percentage and logarithmic input vectors. This improvement in accuracy can be attributed to the preprocessing of the data, which involved converting the original ppm data to either percentage or logarithmic format. PBF with the percentage vector achieved an accuracy of 97.22%, while PBF with the logarithmic vector achieved an accuracy of 95.83%. These rates were higher than those achieved by traditional methods, such as the Modified RRM/CEGB method (91.67%) and the Modified RRM/IEC method (90.28%), as well as intelligent methods, such as CSUS ANN (88.89%) and Conditional Probability (93.06%).
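
Since every method in this comparison is scored on the same 72 test samples, each reported accuracy corresponds to a whole number of correctly diagnosed samples. The short sketch below recovers those counts from the percentages quoted in the text; the dictionary keys are shorthand introduced for this example only.

```python
TEST_SAMPLES = 72  # size of the test set stated in the text

# Accuracy values (%) as quoted in the text.
reported_accuracy = {
    "PBF (Percentages)": 97.22,
    "PBF (Logarithmic)": 95.83,
    "Modified RRM/CEGB": 91.67,
    "Modified RRM/IEC": 90.28,
    "CSUS ANN": 88.89,
    "Conditional Probability": 93.06,
}

def correct_count(accuracy_pct, n=TEST_SAMPLES):
    """Nearest whole number of correct diagnoses implied by an accuracy."""
    return round(accuracy_pct / 100.0 * n)

counts = {name: correct_count(acc) for name, acc in reported_accuracy.items()}

for name, k in counts.items():
    # Recompute the percentage from the count as a consistency check.
    print(f"{name}: {k}/{TEST_SAMPLES} = {100 * k / TEST_SAMPLES:.2f}%")
```

On a test set of this size one sample is worth about 1.39 percentage points, so the gap between 97.22% and 93.06% corresponds to three test samples (70 versus 67 correct).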
Table 3. Comparison of the proposed method with traditional and intelligent methods
[1] Nasirul, Haque, Jamshed Aadil, Chatterjee Kingshuk, and Chatterjee Soumya. "Accurate Sensing of Power Transformer Faults From Dissolved Gas Data Using Random Forest Classifier
[9] Gouda, Osama E., Salah H. El Hoshy, and Hassan H. EL Tamaly. "Proposed three ratios technique for the interpretation of mineral oil transformers based dissolved gas analysis." IET Generation, Transmission & Distribution 12 (2018): 2650-2661.
[10] Khiar, Mohd Shahril Ahmad, Sharin Ab Ghani, Imran Sutan Chairul, Yasmin Hanum Md Thayoob, and Young Zaidey Yang Ghazali. "On-site OLTC monitoring using Duval triangle and DWRM." In Proceedings of the 2nd International Conference on Technology, Informatics, Management, Engineering and Environment, TIME E 2014, Bandung, Indonesia 1 (2014): 216–221.
[11] Gouda, Osama E., Salah H. El‐Hoshy, and Hassan H. EL‐Tamaly. "Condition assessment of power transformers based on dissolved gas analysis." IET Generation, Transmission & Distribution 13, no. 12 (2019): 2299-2310.
[12] Duval, Michel, and Laurent Lamarre. "The duval pentagon-a new complementary tool for the interpretation of dissolved gas analysis in transformers." IEEE Electrical Insulation Magazine 30, no. 6 (2014): 9–12.
[13] Mansour, Diaa-Eldin A. "Development of a new graphical technique for dissolved gas analysis in power transformers based on the five combustible gases." IEEE Transactions on Dielectrics and Electrical Insulation 22, no. 5 (2015): 2507–2512.
[14] Li, Anyi, Xiaohui Yang, Zihao Xie, and Chunsheng Yang. "An optimized GRNN-enabled approach for power transformer fault diagnosis." IEEJ Trans. Electr. Electron. Eng 14, no. 8 (2019): 1181–1188.
[15] Taha, Ibrahim B. M., Sherif S. M. Ghoneim, and Hatim G. Zaini. "A Fuzzy Diagnostic System for Incipient Transformer Faults Based on DGA of the Insulating Transformer Oils." International Review of Electrical Engineering (I.R.E.E.) 11, no. 3 (2016): 305-313.
[16] Zhang, Yiyi, et al. "A fault diagnosis model of power transformers based on dissolved gas analysis features selection and improved Krill Herd algorithm optimized support vector machine." IEEE Access 7 (2019): 102803–102811.
[17] Benmahamed, Y, Y Kemari, M Teguar, and A Boubakeur. "Diagnosis of power transformer oil using KNN and naive bayes classifiers." IEEE 2nd International Conference on Dielectrics (ICD), 2018: 1–4.
[18] Lakehal, Abdelaziz, and Fouad Tachi. "Bayesian duval triangle method for fault prediction and assessment of oil immersed transformers." Measurement and control 50, no. 4 (2017): 103-109.
[19] Nanfak, Arnaud, Charles Hubert Kom, and Samuel Eke. "Hybrid Method for Power Transformers Faults Diagnosis Based on Ensemble Bagged Tree Classification and Training Subsets Using Rogers and Gouda Ratios." Int. J. Intell. Eng. Syst 15, no. 5 (2022): 12-24.
[20] Wu, Xiaoxin, Yigang He, and Jiajun Duan. "A deep parallel diagnostic method for transformer dissolved gas analysis." Applied Sciences 10, no. 4 (2020): 1329.
[21] Ibrahim, Saleh I., Sherif S.M. Ghoneim, and Ibrahim B.M Taha. "DGALab: an extensible software implementation for DGA." IET Gener. Transm. Distrib 12, no. 18 (2018): 4117-4124.
[22] Patekar, Kalinda D., and Bhoopesh Chaudhry. "DGA analysis of transformer using artificial neutral network to improve reliability in Power Transformers." IEEE 4th International Conference on Condition Assessment Techniques in Electrical Systems (CATCON), 2019: 1–5.
[23] Ramkumar, M., C. Ganesh Babu, K Vinoth Kumar, D Hepsiba, A. Manjunathan, and R. Sarath Kumar. "ECG Cardiac arrhythmias Classification using DWT, ICA and MLP Neural Networks." Journal of Physics: Conference Series 1831, no. 1 (2021): 012015.
[24] Xiang, Weiming, Hoang-Dung Tran, and Taylor T. Johnson. "Output reachable set estimation and verification for multilayer neural networks." IEEE Transactions on Neural Networks and Learning Systems 29, no. 11 (2018): 5777–5783.
[25] Ketjie, Viny Christanti Mawardi, and Novario Jaya Perdana. "Prediction of credit card using the Naïve Bayes method and C4.5 algorithm." IOP Conference Series: Materials Science and Engineering 1007, no. 1 (2020): 012161.
[26] Mahamdi, Yassine, Ahmed Boubakeur, Abdelouahab Mekhaldi, and Youcef Benmahamed. "Power Transformer Fault Prediction using Naive Bayes and Decision tree based on Dissolved Gas Analysis." ENP Engineering Science Journal 2, no. 1 (2022): 1-5.
[27] Balaji, V.R., S.T. Suganthi, R. Rajadevi, V. Krishna Kumar, B. Saravana Balaji, and Sanjeevi Pandiyan. "Skin disease detection and segmentation using dynamic graph cut algorithm and classification through Naive Bayes classifier." Measurement: Journal of the International Measurement Confederation 163 (2020): 107922.
[28] Rahmad, F, Y Suryanto, and K Ramli. "Performance Comparison of Anti-Spam Technology Using Confusion Matrix Classification." IOP Conference Series: Materials Science and Engineering 879, no. 1 (2020): 012076.
[29] Nicola, George, Michael R. Berthold, Michael P. Hedrick, and Michael K. Gilson. "Connecting proteins with drug-like compounds: Open source drug discovery workflows with BindingDB and KNIME." Database-Oxford, 2015: 1–22.
[30] Ghoneim, Sherif SM, Ibrahim BM Taha, and Nagy I. Elkalashy. "Integrated ANN-based proactive fault diagnostic scheme for power transformers using dissolved gas analysis." IEEE Transactions on Dielectrics and Electrical Insulation 23, no. 3 (2016): 1838-1845.
[31] Taha, Ibrahim B.M, Diaa-Eldin A. Mansour, S. S. Ghoneim, and Nagy I. Elkalashy. "Conditional probability based interpretation of dissolved gas analysis for transformer incipient faults." IET Generation, Transmission & Distribution 11, no. 4 (2017): 943-951.