Rule-Based Classification of Energy Theft and Anom
Research Article
Abstract: The advent of advanced metering infrastructure (AMI) opens the door for a comprehensive analysis of consumers' consumption patterns, including energy theft studies, which were not possible beforehand. This study proposes a fraud detection methodology using data mining techniques, such as hierarchical clustering and decision tree classification, to identify abnormalities in consumer consumption patterns and further classify the abnormality type into anomaly, fraud, or high or low power consumption based on rule-based learning. The proposed algorithm uses a real-time dataset of Nana Kajaliyala village, Gujarat, India. The focus has been on generalizing the algorithm for varied practical cases to make it adaptive towards non-malicious changes in a consumer's profile. Simultaneously, this study proposes a novel validation technique, which utilizes predicted profiles to ensure accurate bifurcation between anomaly and theft targets. The results exhibit a high detection ratio and a low false-positive ratio owing to the application of an appropriate validation block. The proposed methodology is also investigated from the point of view of privacy preservation and is found to be relatively secure owing to low sampling rates and minimal usage of metadata and the communication layer. The proposed algorithm has an edge over state-of-the-art theft detection algorithms in detection accuracy and robustness towards outliers.
1 Introduction

Energy theft rarely makes headlines, but in fact, it is a highly predominant concern for power sectors and the nation as a whole. The world loses ∼90 billion annually to electricity theft, of which India alone contributes up to 17 billion, followed by Brazil and Russia with 10 billion and 5.1 billion, respectively, as of 2015 [1]. The World Bank estimates that electricity theft costs India about 1.5% of its total gross domestic product. The approximate transmission and distribution losses of Indian states hover around 23%, with a few exceptions where they reach about 50%. Regardless of state governments' efforts, the reduction in the theft scenario has been marginal. Despite Gujarat enjoying one of the most efficient power sectors in India, it lost about 1 billion in the year 2014–2015 to energy theft, which corresponds to ∼153 million units of power loss. Paschim Gujarat Vij Company Ltd (PGVCL) contributes more than 40% of Gujarat's energy theft scenario [2–4]. However, with research and development in power sectors focusing on advanced metering infrastructure (AMI) and its application in lessening energy theft, the road ahead for these sectors looks brighter. Yet, along with the possible advantages of AMI in theft detection, it also unfolds new dimensions of electricity theft via intercepting the communication link or modifying the stored data.

There are many ways of electricity theft; the simplest one is to connect power lines directly to an external feeder, thereby bypassing the meter. More complex methods include altering the meter connection or tampering with the meter itself. An outspread view of theft attacks and other factors affecting aggregate technical and commercial (AT&C) losses is conveyed through Fig. 1. The AT&C estimation framework includes consideration of transmission, distribution, metering, collection as well as recovery losses. Usually, such a framework may be hindered by the non-trivial approximation of electrical theft, which may occur at any level and in any form, as shown in Fig. 1. According to McLaughlin et al. [5], the possible energy thefts in AMI can be classified into three major categories. Fig. 2 shows a considerable component of these theft attacks, which are enumerated below:

i. Physical attacks: This type of attack includes modifying the external wiring or tampering with the meter physically to account for lower energy usage, for instance, placing a strong magnet near the meter, which introduces large measurement errors via interfering with its power supply transformer, or by
Fig. 1 AT&C losses: interaction with major theft attacks
Fig. 2 Energy theft categorisation
approaches to theft detection is the availability and knowledge of a labelled dataset that includes theft and anomaly labels. In unsupervised learning, the abnormality is detected without any prior knowledge of target labels: a clustering method is used to group the dataset based on similar patterns, and irregular patterns are labelled for the testing and validation phase. In supervised learning, by contrast, a pre-labelled dataset is required and a classification model is trained on both normal and abnormal data subsets. In semi-supervised learning, the model is trained on only one class, either normal or abnormal, using a semi-supervised learning algorithm followed by single-class classification.
The proposed methodology employs unsupervised learning for the detection of abnormality in consumer profiles owing to the lack of a labelled theft dataset. In other words, the model does not use theft data or risk data, which in all probability is hard to acquire by utilities. The proposed methodology is divided into two phases: the training phase and the implementation phase.
3.1 Training phase

The schematic shown in Fig. 3 depicts the flow diagram of the training phase. The dataset used in this study is the property of Gujarat Urja Vikas Nigam Limited and was collected as a part of a smart meter pilot project in Nana Kajaliyala village, Gujarat, in the area monitored by PGVCL. The power supply in Nana Kajaliyala is provided through a 100 kVA distribution transformer serving 170 single-phase consumers and one three-phase consumer. The major population of the village is farmers, along with a few commercial units. The smart meter dataset consists of one year of data for 171 consumers at 15 min resolution. Hence, each daily load profile is represented by 96 data points. The major components of the training phase are as follows.
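To make the data layout concrete, a minimal sketch (an illustration, not code from the study) of arranging the raw 15 min readings into 96-point daily load profiles per consumer is given below; the column names consumer_id, timestamp, and kwh are assumed purely for illustration.

import pandas as pd

def daily_profiles(readings: pd.DataFrame) -> dict:
    # readings: long-format table with assumed columns consumer_id, timestamp (datetime), kwh
    profiles = {}
    for cid, grp in readings.groupby("consumer_id"):
        series = grp.set_index("timestamp")["kwh"].sort_index()
        days = series.groupby(series.index.date).apply(lambda d: d.to_numpy())
        # keep only complete days: 24 h x 4 readings per hour = 96 points
        profiles[cid] = [day for day in days if len(day) == 96]
    return profiles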
3.1.1 Data pre-processing: The foremost step of any data mining methodology is data pre-processing, which involves recasting typically unformatted or raw data into a meaningful format. Data pre-processing is required to convert the raw data into refined information assets used for decision making and operation. Smart meter datasets are inconsistent and may contain missing or corrupted values. Hence, appropriate data pre-processing becomes an inevitable step, which may consist of tasks such as data cleaning, data filtering, data transformation, data reduction, etc. The Nana Kajaliyala village dataset comprised bad data, missing data, and outlier profiles. The proposed mechanism uses forecasting techniques, such as the seasonal autoregressive integrated moving average (SARIMA) [34] model and exponential smoothing, to settle missing and bad data replacement issues. The data smoothing process is managed using a Savitzky–Golay digital filter [35] to eradicate outlier peaks from the load profiles.
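A minimal sketch of this pre-processing step is shown below, assuming a one-dimensional array of 15 min readings with NaNs marking missing or bad samples; the SARIMA orders and the filter window are illustrative choices, not the tuned values used in the study.

import numpy as np
from scipy.signal import savgol_filter
from statsmodels.tsa.statespace.sarimax import SARIMAX

def preprocess(load: np.ndarray) -> np.ndarray:
    series = load.astype(float).copy()
    missing = np.isnan(series)
    if missing.any():
        # SARIMA with a daily seasonal period of 96 samples; the fitted
        # state-space model's in-sample predictions fill the gaps.
        model = SARIMAX(series, order=(1, 0, 1), seasonal_order=(1, 0, 1, 96),
                        enforce_stationarity=False, enforce_invertibility=False)
        result = model.fit(disp=False)
        series[missing] = result.predict()[missing]
    # Savitzky-Golay smoothing to suppress outlier peaks in the load profile.
    return savgol_filter(series, window_length=9, polyorder=3)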
3.1.2 RP derivation: An RP can be interpreted as the daily or repetitive load curve/curves exhibited by a consumer over a period. In other words, an RP depicts the regular or most common consumption pattern of a consumer. A consumer RP can only be derived articulately if a sufficient sample space of data is available. Sufficiency of data length is subjective and based on factors such as trend, seasonality, noise, outliers, and missing data samples. Consumers exhibiting trend, seasonality, and multiple load profiles may prove relatively tricky to handle. For example, a consumer may follow one or more load patterns spread over weekends, weekdays, or holidays. Noise and outliers can be taken care of by apt data pre-processing, but one of the key challenges that remains even after appropriate data pre-processing is the extraction of the RP of a consumer. The proposed methodology finds the number of load patterns followed by a consumer using the gap evaluation criterion [36], and then the load profiles of each consumer are clustered using k-means to obtain the RPs as the centroids of each significant cluster. However, our methodology constrains the maximum number of clusters to 4, chosen empirically based on data visualisation.
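A sketch of this RP-derivation step is given below, under the assumption that each consumer's data is already arranged as an (n_days × 96) array; it uses a simplified gap criterion (largest gap wins) rather than the exact rule of [36], and caps the number of clusters at 4 as stated above.

import numpy as np
from sklearn.cluster import KMeans

def _within_dispersion(X, k):
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_

def estimate_n_patterns(X, k_max=4, n_refs=10, seed=0):
    # Simplified gap criterion: compare log within-cluster dispersion of the data
    # against uniform reference data and pick the k with the largest gap.
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps = []
    for k in range(1, k_max + 1):
        ref = [_within_dispersion(rng.uniform(lo, hi, size=X.shape), k)
               for _ in range(n_refs)]
        gaps.append(np.mean(np.log(ref)) - np.log(_within_dispersion(X, k)))
    return int(np.argmax(gaps)) + 1

def representative_profiles(daily_load: np.ndarray):
    # daily_load: (n_days, 96) array of one consumer's pre-processed daily profiles
    k = estimate_n_patterns(daily_load)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(daily_load)
    return km.cluster_centers_, km.labels_   # RPs and per-day cluster assignment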
Upper–lower bounds are computed for each consumer cluster to identify outliers or insignificant cluster profiles. The upper bound is calculated using three scaled median absolute deviations and the lower bound is calculated using the minimum of the sample points of all load profiles in a cluster, as given by (1) and (2), respectively. The median absolute deviation is used instead of the standard deviation as it is more resilient to outliers. Furthermore, every load profile within the cluster is checked against the upper–lower bounds; a load profile with more than a threshold of m points violating these bounds is considered an outlier and ignored in the derivation of the RP. This step helps in removing anomalies during the computation of the RP. Additionally, a state transition probability (STP) associated with each RP is computed. The STP may be defined as the ratio of the number of consumer profiles in a cluster to the total number of consumer profiles. It gives the probability denoting how frequently each load pattern is followed by a consumer. The formulation of the upper–lower bounds and the STP for a consumer cluster is given below:

\mathrm{Upper}_j = c \times \mathrm{median}_i\left(\left|L_{ij} - \mathrm{median}_i(L_{ij})\right|\right) \qquad (1)

where c can be defined as

c = \frac{-1}{\sqrt{2}\,\mathrm{erfcinv}(3/2)}

Here, erfcinv(·) is the inverse complementary error function.

\mathrm{Lower}_j = \min\left(L_{1j}, \ldots, L_{Nj}\right) \qquad (2)

for i \in \{1, \ldots, N\}, \; j \in \{1, \ldots, n\}

\mathrm{STP} = N / T_n \qquad (3)

where L is a load profile matrix of size n × N, N is the number of load profiles in a cluster, n is the number of sample data points in each load profile, and T_n is the total number of load profiles of a consumer.
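For illustration, equations (1)–(3) can be sketched as follows, assuming the cluster's profiles are stacked one per row so that the medians and minima in (1) and (2) are taken across profiles at each sample point j; this is a restatement of the formulas, not code from the study.

import numpy as np
from scipy.special import erfcinv

C = -1.0 / (np.sqrt(2.0) * erfcinv(1.5))   # scaling constant c of eq. (1), ~1.4826

def cluster_bounds(profiles: np.ndarray):
    # profiles: (N, n) array, one load profile per row, n sample points per profile
    med = np.median(profiles, axis=0)                       # median over profiles i
    upper = C * np.median(np.abs(profiles - med), axis=0)   # eq. (1): scaled MAD per point j
    lower = profiles.min(axis=0)                            # eq. (2): pointwise minimum
    return upper, lower

def state_transition_probability(n_in_cluster: int, total_profiles: int) -> float:
    return n_in_cluster / total_profiles                    # eq. (3): STP = N / T_n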
3.1.3 RP clustering: The derivation of RPs is followed by a clustering step to ensure expeditious operation of the detection algorithm. Hierarchical agglomerative clustering is carried out on the RPs to club together similar consumers, which helps the detection algorithm navigate swiftly towards suspicious consumers. Agglomerative clustering is a bottom-up approach where initially each data point is treated as an individual cluster and the most similar clusters are then merged successively.
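A minimal sketch of this clustering step is given below; the Ward linkage and the number of consumer groups are illustrative assumptions rather than the settings reported in the study.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_consumers(rp_matrix: np.ndarray, n_groups: int = 10):
    # rp_matrix: (n_consumers, 96) array, one representative profile per row
    Z = linkage(rp_matrix, method="ward")                  # bottom-up merging of clusters
    return fcluster(Z, t=n_groups, criterion="maxclust")   # consumer group labels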
\mathrm{F1\ Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (9)

\mathrm{FPR} = \frac{fp}{fp + tn} \qquad (10)

\mathrm{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \qquad (11)
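Expressed directly in code, equations (9)–(11) follow from the confusion-matrix counts; the sketch below is a straightforward restatement, not code from the study.

def evaluation_metrics(tp: int, tn: int, fp: int, fn: int):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)   # eq. (9)
    fpr = fp / (fp + tn)                                  # eq. (10)
    accuracy = (tp + tn) / (tp + tn + fp + fn)            # eq. (11)
    return f1, fpr, accuracy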
distribution transformer data to identify various abnormalities and theft scenarios, and to obtain a list of theft-probable consumers. This algorithm effectively operates over one year of data of all the consumers to detect their section-wise abnormal behaviour. Note that Fig. 11 represents the sectional classification identified as a result of the algorithm executed on the September 2017 data of a consumer. The highlighted upper–lower bounds are the tolerance belt within which the consumer profile was anticipated on the basis of historical data, as explained in the training phase. Fig. 11 also shows the highlighted sections containing abnormalities, the consumer RP, and the classification with respect to each section. It is evident from Fig. 11 that various classified abnormalities are identified in the consumer profile in real time, as listed:
i. Fig. 11a shows the normal detected profiles, which do not, or only marginally, violate the upper–lower bounds; hence no sectional abnormalities were detected.
ii. Figs. 11b and c represent a high-load anomaly in sections (A) and (E, M), respectively, which may or may not interest the utility for theft detection but could be crucial in demand-response or operation-level analysis.
iii. Figs. 11d and e unveil preliminary theft classification for the consumer in sections (N) and (E, M), respectively. Moreover, the theft magnitudes identified using the attributes were suggested as low and medium, respectively.
iv. Fig. 11f is detected as high theft owing to the sectional variance in the (M, A, N) sections, as well as the input profile showing a high-magnitude deviation from the RP and its identified cluster, as is evident from the figure.
v. Figs. 11g and h highlight consumer profiles that exhibit multiple types of abnormalities in the same profile. For instance, Fig. 11h represents a profile that had medium theft and high abnormality detected in sections (A, N) and (E), respectively.
vi. Fig. 11i shows a high magnitude of section and cluster variance. Additionally, the mean deviations from the upper–lower bounds and the RP are beyond the thresholds in all sections, which are classified as high theft or low load profile. Such a target suggests that future scrutiny is required.
However, evaluation methods for theft detection algorithms are highly reliant on the availability of data, the sampling rate, and privacy concerns.
4.1 Evaluation

There are mainly two evaluation approaches, based on the availability of a theft or risk dataset for training the algorithm. When theft data is known, the evaluation approach uses accuracy and FPR measures on the known theft data, whereas, when the theft or risk data is unknown, it uses accuracy and FPR measures on a synthesised abnormal dataset. The proposed methodology utilises the Nana Kajaliyala village dataset of 171 consumers for theft detection; it shall be noted that the dataset lacks validated theft data. Owing to the absence of knowledge about theft scenarios in the dataset, a mixed approach to evaluation is adopted. It uses accuracy and FPR measures on synthesised data, evaluates whether the algorithm correctly classifies the normal dataset on actual time series as shown in Fig. 11, and stores the priority-risk consumer list to aid analytic results for the utility's usage. A mixture of synthesised theft profiles, synthesised abnormal profiles, and actual normal profiles of several consumers is fed to the proposed model during the testing phase to assess the performance of the theft detection algorithm. The shaping of synthesised data that imitates real-world theft scenarios is very tricky. The construction of theft data and various abnormal data can be devised as the output of the listed functions.
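The specific construction functions listed in the study are not reproduced in this excerpt; purely as an illustration, the sketch below shows the kind of multiplicative and time-window attack functions commonly used in the literature to synthesise theft profiles from a genuine 96-point daily profile.

import numpy as np

rng = np.random.default_rng(0)

def partial_reduction(x: np.ndarray) -> np.ndarray:
    return rng.uniform(0.2, 0.8) * x                       # report only a fraction of usage

def time_window_bypass(x: np.ndarray, start: int = 32, end: int = 72) -> np.ndarray:
    y = x.copy()
    y[start:end] = y[start:end] * rng.uniform(0.0, 0.3)    # suppress usage in one window
    return y

def mean_flattening(x: np.ndarray) -> np.ndarray:
    return np.full_like(x, x.mean() * rng.uniform(0.3, 0.8))  # flat, lowered profile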