Machine Learning Methods for Detecting Anomalies in a Power Transformer by Monitoring Its Hot-Spot Temperature
Chiara & Miguel
Abstract—This paper analyzes and compares different machine learning methods such as decision trees, SOMs, MLPs and rough sets for the classification of the operation condition of a power transformer. The purpose is to construct a classification model able to estimate the hot-spot temperature as a function of other external input variables. The classifier would then be used to detect anomalous operation conditions of the transformer by comparing the observed and estimated hot-spot temperatures.

Keywords—classification methods; anomaly detection; power transformer; decision trees; neural networks; rough sets

I. INTRODUCTION

The aim of this paper is to test and compare different types of classifiers that can be used for anomaly detection of a power transformer. Classification techniques are widely used for anomaly detection in industrial systems [1]. This paper focuses on those artificial intelligence techniques that belong to the branch of machine learning [2]. These are algorithms that make it possible to automatically learn from empirical data the relations existing among variables [3]. Different machine learning algorithms that are commonly used for classification problems are considered. A decision tree [4][5], a multi-layer perceptron [6][7], a self-organizing map [8][9] and a rough sets classifier [10] are used to extract knowledge rules for fault detection, and their performances are compared.

Real data collected during normal operation conditions of a power transformer were used. In a power transformer, one of the most critical variables that needs to be observed is the hot-spot temperature [11]. The classifiers will therefore model different normal operation conditions of the transformer, classifying them according to the values of the hot-spot temperature. In other words, the classifiers will identify the relations between some measured variables, describing the operation condition of the component, and the hot-spot temperature.

The paper is organised as follows. Section II presents some preliminary considerations, including a pre-analysis of the data used. Section III describes the different machine learning methods used for constructing classifiers able to detect anomalies during continuous monitoring of the hot-spot temperature in a power transformer. Section IV compares the results obtained in the previous section, and finally Section V presents conclusions.

II. PRELIMINARY DATA ANALYSIS

Whatever the method used for automatic extraction of knowledge from data, a preliminary analysis of the data is required in order to better understand them and to remove noisy or incorrect samples. The filtered data then constitute the training set to be used.

A data set collected during normal operation of a power transformer is used as a reference throughout this paper. Sensors were installed in the transformer in order to monitor its performance. The particular case analyzed was the detection of anomalies by monitoring the hot-spot temperature, which is the maximum temperature reached in the transformer winding. The data analysed were selected to cover the four seasons over a one-year period. The months selected were April, July, August, November, December and January, with a sampling time of 15 minutes. Blocks of missing data were removed, while non-working conditions were kept in the data set. The total number of available samples was 12063.

The variables used in this study were current, hot-spot temperature and ambient temperature. The analysis of the data using scatter plots shows an almost linear relation between hot-spot temperature and current. In the ambient temperature, two main clusters of data can be identified, which are almost linearly related with the hot-spot temperature. These two clusters correspond to the two peaks of the distribution of the ambient temperature, which describe the typical values in winter and summer. These relations have to be taken into account when selecting the input variables of a model for predicting the hot-spot temperature and detecting possible anomalies.

In addition to this, the physical relation between variables has to be taken into account, and those that actually represent external inputs able to explain the hot-spot temperature should be preferred in order to properly detect anomalies. From the physical knowledge of the system, it is known that the two main factors that determine the hot-spot temperature are the current circulating in the winding and the ambient temperature. Therefore, these variables will be used to explain the dynamics of the hot-spot temperature.
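As an illustration, this kind of pre-analysis can be sketched in Python (the paper does not state which tool was used for this step); the file name and the column names I, Tamb and Ths are hypothetical placeholders for the monitored current, ambient temperature and hot-spot temperature.

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file of 15-minute samples with columns I, Tamb, Ths
df = pd.read_csv("transformer_monitoring.csv").dropna()

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].scatter(df["I"], df["Ths"], s=2)      # near-linear relation with current
axes[0].set(xlabel="Current [A]", ylabel="Hot-spot temperature [°C]")
axes[1].scatter(df["Tamb"], df["Ths"], s=2)   # two seasonal clusters
axes[1].set(xlabel="Ambient temperature [°C]", ylabel="Hot-spot temperature [°C]")
axes[2].hist(df["Tamb"], bins=50)             # bimodal winter/summer distribution
axes[2].set(xlabel="Ambient temperature [°C]", ylabel="Count")
plt.tight_layout()
plt.show()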
III. CLASSIFICATION MODELS

In this section, different classifiers based on Decision Trees, Self-Organizing Maps, Multi-Layer Perceptrons and Rough Sets are presented. Classification rules are extracted, and the results are analyzed and compared.

The classifiers to be developed do not require discretization of the input variables; therefore, the original numerical current and ambient temperature data are used. However, some of them require discretization of the predicted variable, the hot-spot temperature in the case analyzed, and for this reason this variable was divided into three categories (low, medium and high) using a clustering technique based on the k-means algorithm. To train the classifiers, 66% of the data were randomly extracted from the original data set for training (and validation when used), and the remaining 34% were used for testing.
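This preparation step can be sketched as follows, using scikit-learn as a stand-in (the paper does not name the tool used for the clustering); the file and column names are the same hypothetical ones as in the Section II sketch.

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

df = pd.read_csv("transformer_monitoring.csv").dropna()  # hypothetical file

# k-means with three clusters on the hot-spot temperature, then map the
# clusters to L / M / H by sorting the cluster centres from low to high
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df[["Ths"]])
order = np.argsort(km.cluster_centers_.ravel())
label_of = {int(c): lab for lab, c in zip(["L", "M", "H"], order)}
y = np.array([label_of[int(c)] for c in km.labels_])

# The inputs remain numerical; 66% training / 34% test, drawn at random
X = df[["I", "Tamb"]].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.66, random_state=0)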
A. Model based on a Decision Tree

A decision tree was trained using the J48 algorithm of the WEKA tool [12]. Using the default confidence factor (equal to 0.25) for the pruning algorithm, the resulting tree was quite expanded, having a size of 55 and 28 leaves. The classification error evaluated on the test data set was equal to 5.901%. By reducing the confidence factor to 0.005, more pruning was done. The new resulting decision tree had 20 leaves and a classification error equal to 6.3887%. The error increased, as expected, but not significantly with respect to the previous tree, while the complexity of the tree was reduced. Training and test errors are shown in TABLE 1. The confusion matrix obtained is presented in TABLE 2, and it confirms that unknown instances of low, medium and high hot-spot temperatures are classified with acceptable error.

TABLE 1: Classification error of decision trees with higher or lower confidence factor (less or more pruned, respectively).

Confidence factor   Classification error (full training data) [%]   Classification error (test) [%]
0.25                5.1065                                          5.901
0.005               5.3469                                          6.3887

TABLE 2: Confusion matrix of the decision tree on test data.

Classified as →   L      M      H
L                 1048   62     0
M                 81     2122   44
H                 0      75     669

The resulting tree is shown in Fig. 1. The leaves that clearly have a large number of samples from the same class are highlighted in yellow.

By looking at the tree, it can be deduced that when the ambient temperature is low (less than 19 °C), the hot-spot temperature is low (L) if the current is lower than 116 A, while it is medium (M) if the current is higher than this value. When the ambient temperature is higher than 19 °C, the hot-spot temperature is typically medium if the current is lower than 105 A, while it is high (H) if the current is higher than this value.

Within the leaves of the tree, the number of incorrectly classified instances is also presented. Notice that normally the classification error is low when ambient temperature and current have low values (corresponding to when the hot-spot temperature is low too) or when ambient temperature and current are very high (and the hot-spot temperature is high too). Besides, there are operation conditions for which the classification error is much higher than for the rest. Typically they correspond to medium-high values of the hot-spot temperature or to values of the ambient temperature and the current that are neither very high nor very low. Moreover, notice that in the confusion matrix calculated with test data (TABLE 2), the percentage classification error is much higher when the hot-spot temperature is high (75 incorrectly classified instances out of 669).

=== Classifier model (full training set) ===
J48 pruned tree
------------------
Tamb <= 19.07
|   I <= 116.35
|   |   I <= 91.89
|   |   |   Tamb <= 17.07: L (2147.0/16.0)
|   |   |   Tamb > 17.07
|   |   |   |   I <= 73.64: L (11.0)
|   |   |   |   I > 73.64
|   |   |   |   |   I <= 90.31: M (13.0/1.0)
|   |   |   |   |   I > 90.31: L (3.0)
|   |   I > 91.89
|   |   |   Tamb <= 12.73: L (776.0/95.0)
|   |   |   Tamb > 12.73
|   |   |   |   I <= 99.41
|   |   |   |   |   Tamb <= 16.07: L (150.0/34.0)
|   |   |   |   |   Tamb > 16.07: M (23.0/5.0)
|   |   |   |   I > 99.41: M (315.0/74.0)
|   I > 116.35
|   |   I <= 136.84
|   |   |   Tamb <= 10.4
|   |   |   |   Tamb <= 6.87: L (83.0/18.0)
|   |   |   |   Tamb > 6.87
|   |   |   |   |   I <= 121.4: L (53.0/15.0)
|   |   |   |   |   I > 121.4: M (179.0/64.0)
|   |   |   Tamb > 10.4: M (726.0/29.0)
|   |   I > 136.84: M (2076.0/17.0)
Tamb > 19.07
|   I <= 105.16
|   |   I <= 95.55: M (2661.0)
|   |   I > 95.55
|   |   |   Tamb <= 31.93: M (571.0/122.0)
|   |   |   Tamb > 31.93: H (19.0/6.0)
|   I > 105.16
|   |   Tamb <= 21: M (74.0)
|   |   Tamb > 21
|   |   |   I <= 113.43
|   |   |   |   Tamb <= 25.53: M (45.0/7.0)
|   |   |   |   Tamb > 25.53: H (466.0/124.0)
|   |   |   I > 113.43: H (1672.0/18.0)

Number of Leaves  : 20
Size of the tree  : 39

Time taken to build model: 0.17 seconds

=== Evaluation on test split ===
=== Summary ===
Correctly Classified Instances      3839     93.6113 %
Incorrectly Classified Instances     262      6.3887 %
Kappa statistic                       0.8921
Mean absolute error                   0.0645
Root mean squared error               0.184
Relative absolute error              16.3947 %
Root relative squared error          41.3521 %
Total Number of Instances           4101

Figure 1: Decision Tree.
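As a rough stand-in for the WEKA experiment, the following Python sketch trains a CART tree with scikit-learn, where cost-complexity pruning (ccp_alpha) plays the role that the confidence factor plays in J48; it reuses the X_train/X_test split from the earlier sketch, the alpha values are hypothetical, and the resulting numbers will not match those reported above.

from sklearn.tree import DecisionTreeClassifier, export_text

for alpha in (0.0, 0.002):                      # less / more pruning
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    tree.fit(X_train, y_train)
    print(f"ccp_alpha={alpha}: "
          f"train error {1 - tree.score(X_train, y_train):.4%}, "
          f"test error {1 - tree.score(X_test, y_test):.4%}, "
          f"{tree.get_n_leaves()} leaves")

# Readable rules, analogous to the J48 printout in Fig. 1
print(export_text(tree, feature_names=["I", "Tamb"]))

Cost-complexity pruning is used here only because scikit-learn's CART has no direct equivalent of J48's confidence factor; the trade-off it exposes (smaller trees versus slightly higher error) is the same one discussed above.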
B. Self-Organizing Map

A self-organizing map was trained using data of normal operation of the power transformer. The Matlab Neural Network Toolbox [13] was used for training this neural network. Weights and biases of the map were obtained using a batch unsupervised training algorithm. This means that the hot-spot temperature labels were not used inside the algorithm to update the network parameters. The distance function used by the training algorithm is called linkdist, which is a function of the Euclidean distance that converts it into a discrete distance. The neighbourhood size was set to 1 during the whole training process. The number of neurons in the map was chosen so as to minimize the quantization error while assuring good generalization capacity of the map. Nevertheless, very large maps should normally be avoided, since they would have quite poor generalization capacity even for new normal operation data. The performance of the SOM classifier was also evaluated to choose the proper map size.

The SOM classifier has been constructed as follows: each neuron has been labelled using the information given by the cases that have been assigned to it. The label of a neuron corresponds to the label of the majority of the samples represented there. When a new sample of current and ambient temperature is collected, the map assigns it to a neuron and classifies it according to the label of that neuron. A variety of maps were trained using different sizes: 5x5, 10x6, 12x12 and 16x16. The smallest map had a considerably higher MSE (26.26) and, looking at the distribution of the training data in the map, it seemed that very different operating conditions may have been grouped together. The largest map (16x16) had a lower MSE (2.1531), but it may be too detailed and split similar data patterns into different neurons. In addition, the difference between the classification errors obtained with training and test data is larger than the difference obtained with the smaller map of size 12x12. This means that the map is losing generalization ability. The map of size 12x12 had an MSE equal to 3.8769 and may achieve a good compromise between classification error and generalization. This was selected as the final result.

TABLE 3: MSE and classification errors for selection of map structure.

SOM sizes   Classification error (training) [%]   Classification error (test) [%]   Total MSE
[5 5]       9.0806                                9.9488                            26.26
[10 6]      6.9329                                7.9249                            9.0844
[12 12]     6.6943                                7.3153                            3.8769
[16 16]     6.4557                                7.1446                            2.1531

Fig. 2 shows the number of training data represented by each pattern in the selected map (12x12). Notice that some operation conditions are more frequent while others are represented less. Notice also that three main clusters can be roughly identified in the map. Fig. 3 represents the SOM classifier, where the different values of the hot-spot temperature are easily distinguished in the map. The SOM groups together values of current and ambient temperature that all give the same qualitative value of the hot-spot temperature. This will help in the detection of possible anomalies when the map is used with new samples.

Figure 2: SOM sample hits (number of samples assigned to each cell during training).

Figure 3: SOM with cells classified according to the values of the hot-spot temperature (1 = Low, 2 = Medium, 3 = High).

TABLE 4 shows the typical values of the patterns of current and ambient temperature in the different areas of the map. It can be said that these values qualitatively reflect the results of the classification made by the decision tree. Notice in fact that low values of the hot-spot temperature are normally associated with low values of current and ambient temperature. Similarly, high hot-spot temperature values correspond to high current and ambient temperature. A medium hot-spot temperature is observed with either high current or high ambient temperature. Fig. 4 shows the result of the classification of the whole data set made by the SOM obtained.

TABLE 4: Mean values of current and ambient temperature patterns in the areas of the map corresponding to low, medium and high hot-spot temperature. Two values are shown for the medium case, corresponding to the mean values of patterns in the bottom left area and in the top right one.

Hot-spot temperature   Current [A]   Ambient temperature [°C]
L                      85.939        10.737
M (bottom)             145.22        11.791
M (top)                88.405        25.093
H                      134.68        27.154

The percentage of incorrectly classified instances with the test data set is 7.3153%. This is a little higher than the value obtained with the decision tree (6.3887%). The confusion matrix evaluated with the test data set can be observed in Table 5.
Notice that the classifier obtained is very similar to the decision tree (no high temperatures are classified as low, and vice-versa) and that the numbers indicated are also very similar. The SOM classifier only shows a higher percentage of incorrectly classified instances in correspondence with low hot-spot temperatures.
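A comparable SOM classifier can be sketched in Python with the MiniSom package; this is only an approximation of the Matlab setup described above (MiniSom uses a Gaussian neighbourhood rather than linkdist), and it reuses the X_train/y_train split from the earlier sketch.

import numpy as np
from collections import Counter, defaultdict
from minisom import MiniSom

# Scale features so current and temperature contribute comparably
mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
Xs_train, Xs_test = (X_train - mu) / sd, (X_test - mu) / sd

som = MiniSom(12, 12, input_len=2, sigma=1.0, random_seed=0)  # 12x12 map
som.train_batch(Xs_train, num_iteration=10_000)

# Label each neuron with the majority class of the training samples it wins
votes = defaultdict(Counter)
for x, lab in zip(Xs_train, y_train):
    votes[som.winner(x)][lab] += 1
neuron_label = {n: c.most_common(1)[0][0] for n, c in votes.items()}

def classify(x, fallback="M"):
    # A new sample takes the label of its best-matching neuron;
    # neurons that won no training sample fall back to a default class
    return neuron_label.get(som.winner(x), fallback)

test_error = np.mean([classify(x) != lab for x, lab in zip(Xs_test, y_test)])
print(f"SOM test classification error: {test_error:.4%}")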
C. Multi-Layer Perceptron

Increasing the number of neurons in the hidden layer from 10 to 15 reduces the classification error with both the training and the test set. By contrast, increasing the number of neurons to 20 makes the classification error with test data increase and exceed the value obtained with 15 neurons: with 20 neurons the generalization capacity of the network gets worse. Consequently, the MLP having 15 neurons in the hidden layer was selected as the best choice among the tested MLPs.

TABLE 6: Classification errors for the selection of the MLP structure.

Neurons in the hidden layer   Classification error (training) [%]   Classification error (test) [%]
10                            9.2439                                10.1195
15                            9.0681                                9.3392
20                            8.7918                                9.4855
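The hidden-layer-size comparison can be sketched with scikit-learn's MLPClassifier as a stand-in for the Matlab MLP (the training algorithm differs, so the errors will not reproduce TABLE 6 exactly); it again reuses the earlier split.

from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

for n_hidden in (10, 15, 20):
    mlp = make_pipeline(
        StandardScaler(),                       # MLPs train better on scaled inputs
        MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=2000,
                      random_state=0))
    mlp.fit(X_train, y_train)
    print(f"{n_hidden} hidden neurons: "
          f"train error {1 - mlp.score(X_train, y_train):.4%}, "
          f"test error {1 - mlp.score(X_test, y_test):.4%}")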
[...] By contrast, it performs worse than the SOM and the decision tree when the hot-spot temperature reaches medium-high values. This means that the partition of the input space made by the SOM (Fig. 3) seems closer to the real hot-spot temperature behaviour.

TABLE 9: Decision table obtained with the labeled attributes.

Case   I   Tamb   Ths
1      M   A      A
2      B   B      B
3      M   M      M
4      A   B      M
5      B   A      M
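Read as a rule base, a decision table like TABLE 9 maps each pair of labelled conditions (I, Tamb) to a hot-spot class Ths. A minimal Python sketch of using it for classification follows; the label abbreviations are kept exactly as printed in the table.

# Decision table from TABLE 9: (I level, Tamb level) -> Ths level
decision_table = {
    ("M", "A"): "A",   # case 1
    ("B", "B"): "B",   # case 2
    ("M", "M"): "M",   # case 3
    ("A", "B"): "M",   # case 4
    ("B", "A"): "M",   # case 5
}

def classify_discrete(i_level, tamb_level):
    # Returns None for condition pairs the table does not cover
    return decision_table.get((i_level, tamb_level))

print(classify_discrete("M", "A"))  # -> "A"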