Rule-Based Classification of Energy Theft and Anom

This document summarizes a research article about using rule-based classification and data mining techniques to identify energy theft and anomalies in consumer load profiles using smart meter data. The proposed methodology uses hierarchical clustering and decision tree classification to detect abnormalities and further classify them as anomalies, fraud, or high/low power consumption. Testing on real smart meter data from India achieved high detection rates and low false positives by validating predictions against actual consumption patterns. The approach aims to generalize well to different situations while preserving privacy by using low-sampling rates and minimal metadata.


IET Smart Grid

Research Article

Rule-based classification of energy theft and anomalies in consumers load demand profile

eISSN 2515-2947
Received on 16th March 2019
Revised 25th June 2019
Accepted on 18th July 2019
E-First on 13th September 2019
doi: 10.1049/iet-stg.2019.0081
www.ietdl.org

Sonal Jain1, Kushan A. Choksi2, Naran M. Pindoriya1


1Department of Electrical Engineering, Indian Institute of Technology Gandhinagar, Palaj – 382355, Gujarat, India
2Department of Electrical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai – 400076, Maharashtra, India
E-mail: [email protected]

Abstract: The advent of advanced metering infrastructure (AMI) opens the door for a comprehensive analysis of consumers' consumption patterns, including energy theft studies, which were not possible beforehand. This study proposes a fraud detection methodology using data mining techniques such as hierarchical clustering and decision tree classification to identify abnormalities in consumer consumption patterns and further classify the abnormality type into anomaly, fraud, high or low power consumption based on rule-based learning. The proposed algorithm uses a real-time dataset of Nana Kajaliyala village, Gujarat, India. The focus has been on generalizing the algorithm for varied practical cases to make it adaptive towards non-malicious changes in consumer profile. In addition, this study proposes a novel validation technique, which utilizes predicted profiles to ensure accurate bifurcation between anomaly and theft targets. The results exhibit a high detection ratio and a low false-positive ratio due to the application of an appropriate validation block. The proposed methodology is also investigated from the point of view of privacy preservation and is found to be relatively secure owing to low sampling rates and minimal usage of metadata and communication layer. The proposed algorithm has an edge over state-of-the-art theft detection algorithms in detection accuracy and robustness towards outliers.

1 Introduction

Energy theft rarely makes headlines, but it is in fact a major concern for power sectors and the nation as a whole. The world loses about 90 billion annually to electricity theft, of which India alone contributes up to 17 billion, followed by Brazil and Russia with 10 billion and 5.1 billion, respectively, as in 2015 [1]. The World Bank estimates that electricity theft costs India about 1.5% of its total gross domestic product. The approximate transmission and distribution losses of Indian states hover around 23%, with a few exceptions where they reach about 50%. Regardless of state governments’ efforts, the reduction in theft has been marginal. Despite Gujarat enjoying one of the most efficient power sectors in India, it lost about 1 billion in the year 2014–2015 to energy theft, which corresponds to ∼153 million units of power loss. Paschim Gujarat Vij Company Ltd (PGVCL) contributes more than 40% of Gujarat's energy theft [2–4]. However, with power sector research and development focusing on advanced metering infrastructure (AMI) and its application in lessening energy theft, the road ahead for these sectors looks brighter. Alongside the possible advantages of AMI in theft detection, however, it also opens new dimensions of electricity theft via intercepting the communication link or modifying the stored data.

There are many ways of stealing electricity; the simplest is to connect power lines directly to an external feeder, thereby bypassing the meter. More complex methods include altering the meter connection or tampering with the meter itself. The broad view of theft attacks and other factors affecting aggregate technical and commercial (AT&C) losses is conveyed through Fig. 1. The AT&C estimation framework includes consideration of transmission, distribution, metering, collection as well as recovering losses. Usually, such a framework may be hindered by the non-trivial approximation of electrical theft, which may occur at any level and in any form, as shown in Fig. 1. According to McLaughlin et al. [5], the possible energy theft in AMI can be classified into three major categories. Fig. 2 shows a considerable component of these theft attacks, which are enumerated below:

i. Physical attacks: This type of attack includes modifying the external wiring or tampering with the meter physically so that it registers lower energy usage. For instance, placing a strong magnet near the meter, which introduces a large measurement error by interfering with its power supply transformer, or by

Fig. 1 AT&C losses: interaction with major theft attacks
Fig. 2 Energy theft categorisation
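
The AT&C loss that Fig. 1 refers to can be illustrated with a small numerical sketch. It assumes the definition adopted in this section: the loss is the share of input energy that is not realised, where energy realised is energy billed scaled by collection efficiency (amount realised over amount billed). The function names and the feeder figures are hypothetical and serve only as an illustration.

```python
def collection_efficiency(amount_realised: float, amount_billed: float) -> float:
    """Fraction of the billed amount that was actually collected."""
    return amount_realised / amount_billed


def atc_loss(energy_input: float, energy_billed: float,
             amount_realised: float, amount_billed: float) -> float:
    """AT&C loss as a fraction of the energy fed into the network."""
    efficiency = collection_efficiency(amount_realised, amount_billed)
    energy_realised = energy_billed * efficiency
    return (energy_input - energy_realised) / energy_input


# Hypothetical feeder: 1000 units supplied, 850 units billed,
# and 90% of the billed amount collected.
loss = atc_loss(energy_input=1000.0, energy_billed=850.0,
                amount_realised=76_500.0, amount_billed=85_000.0)
print(f"AT&C loss: {loss:.1%}")
```

With these made-up figures, 765 of the 1000 input units are realised, so the AT&C loss is 23.5% even though only 15% of the energy went unbilled; incomplete collection accounts for the remainder.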

IET Smart Grid, 2019, Vol. 2 Iss. 4, pp. 612-624

This is an open access article published by the IET under the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/3.0/)
disconnecting or reversing the meter. Such an attack corrupts the ideal energy consumption or energy metering block.
ii. Cyber-attacks: Any attempt to harm or distort a communication network or any computerised system falls under cyber-attacks. These types of attacks are carried out via a remote computer or network. Energy metering and communication blocks are most vulnerable to such attacks. For instance, modifying data over a communication link or stored on a computer is a typical example of a cyber-attack.
iii. Data attacks: The integrity of data should remain intact throughout its entire life-cycle, which consists of data generation, transmission, and storage. Any attack that breaches data integrity comes under data attacks. A data attack can occur at any instant and may manipulate the metering, communication or energy billing blocks of the AT&C losses estimation framework

$$\text{AT\&C loss} = \frac{\text{energy input} - \text{energy realised}}{\text{energy input}}$$

whereas energy realised and collection efficiency are given as

$$\text{Energy realised} = \text{energy billed} \times \text{collection efficiency}$$

$$\text{Collection efficiency} = \frac{\text{amount realised}}{\text{amount billed}}$$

Theft detection schemes target either part of or the entire attack spectrum, based on data availability or the objective of the algorithm. A detailed review of various techniques to analyse and prevent non-technical losses (NTL) was given in [6]. It provides a review on the usage of a smart meter for prevention of hardware and software based on technical losses (TL) using smart meter dataset. Recent schemes for energy theft detection fall largely into three major strategies [7], which can be functionally classified into (i) state estimation-based approach; (ii) game theory-based approach; and (iii) classification-based approach. State estimation-based techniques use state monitoring information to improve theft detection accuracy. State monitoring can be achieved by employing a wireless sensor network [4, 8, 9], radio frequency identification tagging [10], mutual inspection among utility and consumer [11] etc. Such methods promise high detection accuracy, but incur extra investment costs for implementing a monitoring system. On the other hand, in game theory-based schemes [12–14], energy theft is formulated as a game between the power utility and electricity thieves. The goal of the electricity thieves is defined as stealing a predefined amount of electricity while minimising the probability of being detected, whereas the goal of the utility is to maximise the probability of detection and minimise the operational cost of the fraud detection mechanism [7]. These methods perform reasonably well and can be a potential solution to reduce energy theft. However, codifying all players' utility functions and prospective strategies remains a challenging task. Hence, despite being economical, game theory-based approaches can be arduous and non-trivial. Classification-based schemes are based on high-end data analytic tools used to classify a consumer or group of consumers based on their electricity consumption over a period and to identify irregularities in that consumption. The use of smart meter data provides the additional advantage of a detailed analysis of consumers' consumption patterns in classification-based theft detection methods. Data mining methods such as support vector machine (SVM), decision tree learning, neural network, and fuzzy clustering are used to identify abnormality in the consumer consumption pattern. Although such methods are reasonably accurate as well as economical, they tend to suffer from a high false positive rate (FPR). The challenges associated with a classification-based model that uses the consumption pattern of the consumer for theft detection can be listed as follows:

i. Seasonality: Seasonal variations in load profile by virtue of changing environmental conditions, change in load requirements or variation in time of use can prove a barrier in extracting the representative profile (RP) of a consumer or a group of consumers.
ii. Load pattern: The load pattern associated with a consumer may vary between weekdays and weekends, or a consumer may exhibit multiple profiles on various days of a week. Hence, the consistency of a consumer profile is difficult to determine.
iii. Change of appliance: A consumer may increase or decrease the number of appliances or replace a type of appliance, which can change the load pattern and increase the failure rate of the designed algorithm. Hence, the theft detection algorithm should be adaptive to counter such changes.
iv. Load shifting: Peak load demands of a consumer may shift during a day due to various reasons such as an office or school time change, change of season etc. Hence, the devised algorithm should be robust to such perturbations.
v. Anomaly: A consumer profile may contain anomalies due to various reasons such as a festive occasion, a change in the number of persons at home, or absence from home for some time or a whole day, so the theft detection algorithm needs to accurately identify the thin line between theft and anomaly.
vi. Lack of training theft profile: The major challenge associated with classification-based algorithms is that it is very difficult to obtain training data for pre-identified theft events, so the algorithm is bound to be trained on simulated or manipulated profiles that depict the real-world theft scenario.

All these challenges need to be addressed properly to reduce the FPR and increase the true detection ratio (DR). The proposed methodology uses a decision tree based classification approach to increase the true DR and reduce the chance of false detection by incorporating probabilistic and predictive attributes as discussed in rule-base creation.

2 Related work, research gaps, and objectives

In a classification-based approach, SVM is the most widely used data mining technique for energy theft detection. SVM is preferred over logistic regression or other classifiers because it is a maximum margin classifier and can easily be extended to non-linear boundaries. Depuru et al. [15] use SVM and neural network for load profile evaluation and estimation of the SVM parameter, respectively. They propose a data encoding technique for faster evaluation of the load profile. However, the shortcoming of this model lies in its inability to identify various types of attacks owing to the binary conversion of the load profile during encoding. In [16], a C-SVM based theft model was presented, which uses 24 daily average consumer profiles calculated over 24 months and creditability worth rating (CWR) to train the model. The CWR value depends upon consumer commercial behaviours such as evading bills or late payment. Such an approach was able to detect only sudden changes in the load profile over the long-term trend. The disadvantage of this method lies in its dependence on average consumption and the use of a huge training dataset, which limits the accuracy of the model. An extension of this work was given in [17] for decision making and shortlisting consumers with a higher probability of fraud, with the inclusion of human knowledge and the implementation of a fuzzy inference system on selected parameters.

An SVM-based approach was used in [18], where consumers were classified into different SVM classes based on their load profile deviation. The training dataset was segregated based on consumer approximated load profiles, consumer category, their geographical area, types of load they use and seasonality. However, the use of an approximate profile for training the model can limit the accuracy of the algorithm on real-world datasets. In [19], the performance of various data mining techniques was compared, including extreme learning machine (ELM), online sequential extreme learning machine (OS-ELM) and SVM, for analysing NTL activities. It concluded that ELM and OS-ELM give better classification accuracy than SVM. In [20], the ELM algorithm was used for theft detection, as an extension of [19]. Clustering within consumer was utilised to identify different load patterns of a consumer. The lower

and upper bounds computed for each cluster using its mean and standard deviation to identify irregularities in clustered profiles. Any load profile breaching these upper-lower bounds was considered as suspicious. This method also uses a prediction technique for identifying and forecasting NTL behaviour. However, its shortcoming was an ambiguous approach to categorising between theft and anomaly. A multi-sensor-based theft detection mechanism was proposed in [5] that combines information from various sensors with the consumer's consumption pattern for accurate detection of energy theft. This accuracy came at the price of the extra cost of deploying sensors and of raising privacy concerns.

A technique considering the privacy of consumer data using peer-to-peer (P2P) computing was presented in [21]. It uses a centralised meter in each neighbourhood area to report the total consumption, which is considered to be equal to a linear combination of the meter readings of all consumers in this area. The suspicious consumer is detected by solving the linear equations. However, this method does not consider the TL during calculation and can identify only limited types of theft scenario, such as a constant reduction in electricity consumption. The idea of privacy preservation has been borrowed in the proposed methodology. The disclosure of private user information such as power consumption at a sampling rate of <1 s, the grid topological location of the user or the user meter id raises concerns about consumer privacy, safety etc. Selling and buying of such information for insurance, theft, marketing or disaggregation purposes can threaten user privacy and safety. Such information can even be used to analyse whether a burglar alarm is set on or off [22]. Many researchers have advocated the use of datasets with high sampling time for any sort of data mining for privacy preservation [23]. However, most previous research work in energy theft detection has been lacking from the point of view of privacy preservation [24]. The proposed methodology ensures that no intrusive metadata other than power consumption metering data is used. The data used has a sampling time of 15 min, which is sufficiently large to avoid any type of appliance level segregation [25, 26]. Moreover, the knowledge of grid topology is also avoided. A combination of decision tree and SVM-based approach was employed in [27] to reduce the false positive ratio (FPR) in theft detection. It had an edge over other approaches as it allowed real-time theft detection at transmission, distribution and consumer level. Here, decision tree learning was used to predict the electricity usage of a consumer, and was trained using parameters such as the number of appliances, number of persons, temperature, season, and time slot. The output and input of decision tree learning along with the actual load profile were fed to SVM for classification of the pattern into two classes, normal or malicious. The use of this approach is handicapped by the fact that it requires a huge chunk of metadata, limiting generalisation of the approach. In [28], a consumption pattern-based energy theft detection was presented using SVM classification. It exploits distribution transformer meter data to identify an area with possible theft and to validate the theft by checking transmitted and received aggregated power. It overcame the issue of imbalanced data during SVM training by generating a synthetic attack dataset using different functions on a benign sample. However, it has a very large memory requirement because the size of the dataset will increase tremendously with time, as for every benign sample more attack samples are generated and stored.

However, in most cases, it is difficult to get theft data or labelled data, which explains the popularity of unsupervised learning-based algorithms. Such an approach was presented in [29], which is a combination of state-based and classification-based approach to detect various types of theft attacks with high accuracy. It uses a maximum information coefficient technique to find a correlation between NTL and consumer load profile and a clustering technique by fast search and find of density peak to capture abnormal behaviour. Similarly, an unsupervised algorithm was given in [30] for NTL and anomaly detection in a load profile. It uses the optimum path forest method compared with all other clustering techniques such as k-means, one-class SVM, Birch and Gaussian mixture model etc. However, these techniques do not consider TL while calculating NTL. In [31], a fuzzy c-means based clustering technique is used for fraud detection using several features of the consumer profile such as average consumption, maximum consumption, standard deviation, inspection remarks, and the average consumption of the residential area of the consumer over the last six months. Abnormalities in the consumer pattern are detected by computing the difference of Euclidean distance between the current profile and standard profile with cluster mean of that consumer. However, the use of aggregated data and long detection time limits the performance of this method. A generalising deep learning approach that utilises both labelled as well as unlabelled data was presented in [32], named as multitasking feature extracting fraud detector, to handle high-dimensional data by extracting features.

The high FPR of the existing classification-based algorithms can be costly, as manual inspection is required once a theft is detected. Moreover, an extensive review of the state-of-the-art methodologies led to a few research gaps and to the establishment of desirable objectives that theft detection research should have, which are listed in Table 1. Each research gap poses multiple queries which could be used to formulate various objectives. The objectives of the proposed mechanism, which endeavours to address the research gaps, can be noted as follows:

i. The proposed methodology uses no metadata and ensures one point computation without any communication of raw data to avoid third party computing to meet privacy concerns.
ii. The proposed methodology aims at being adaptive to various non-malicious changes in the consumer profile such as change of appliance, load shifting, and seasonality.
iii. The focus is predominantly on the reduction of FPR by validating the suspicious consumer load profile via reinstating its predicted profile during theft estimation.
iv. The proposed mechanism tries to draw an articulate line addressing the difference between theft and anomaly by matching the variance in the consumer's load with the theft pattern.
v. It aims to classify the type of abnormality of the consumer profile such as anomaly, high load profile, low load profile, high or low load at the time of day.
vi. The proposed mechanism also aims to address the fallacious theft detection issues related to multi-profile consumers by utilising novel attributes.

Table 1 Research gap and objectives
Scheme | Research gap | Objective derived
[15, 16, 30, 31] | unable to detect various types of theft attacks | identify various types of theft attacks
[16, 29–31] | lacks ability to accommodate consumers using multiple power consumption pattern | should identify multi-profile consumers and handle them appropriately
[16, 18] | limited classification accuracy and high FPR | provide high classification accuracy and reduced FPR
[15, 16, 19, 20, 28, 29] | inability to distinguish between theft and anomaly | devise an algorithm to draw a clear bifurcation between theft and anomaly
[15, 16, 20, 28, 29, 31] | unable to classify different types of abnormality | identify various types of abnormality in the consumer profile
[16, 21, 29–31] | ignores TL during computation of NTL and diagnosing theft | technical loss component to be considered for the computation of NTL

3 Proposed methodology

In [33], three fundamental approaches were specifically presented for abnormality detection: unsupervised, supervised, and semi-supervised. The basic difference between these three fundamental

Fig. 3 Schematic flow of training phase of the proposed methodology

approaches to theft detection is the availability and knowledge of a labelled dataset that includes theft and anomaly labels. In unsupervised learning, the abnormality is detected without any prior knowledge about target labels: a clustering method is used to group the dataset based on similar patterns, and irregular patterns are labelled for the testing and validation phase. In supervised learning, by contrast, a pre-labelled dataset is required and a classification model is trained on both normal and abnormal data subsets. In semi-supervised learning, the model is trained for only one class, either normal or abnormal, using a semi-supervised learning algorithm followed by single labelled classification.

The proposed methodology employs unsupervised learning for detection of abnormality in the consumer profile owing to the lack of a labelled theft dataset. In other words, the model does not use theft data or risk data, which are in all probability hard for utilities to acquire. The proposed methodology is divided into two phases: training phase and implementation phase.

3.1 Training phase

The schematic shown in Fig. 3 depicts the flow diagram of the training phase. The dataset used in this study is a property of Gujarat Urja Vikas Nigam Limited and was collected as a part of the smart meter pilot project of Nana Kajaliya village, Gujarat, in the area monitored by PGVCL. The power supply in Nana Kajaliyala is provided through a 100 kVA distribution transformer serving 170 single-phase consumers and one three-phase consumer. The population of the village consists mainly of farmers, with a few commercial units. The smart meter dataset consists of one year of data from 171 consumers at 15 min resolution. Hence, each load profile is represented by 96 data points. The major components of the training phase are as follows.

3.1.1 Data pre-processing: The foremost step of any data mining methodology is data pre-processing, which involves recasting typically unformatted or raw data into a meaningful format. Data pre-processing is required to convert the raw data into refined information assets used for decision making and operation. Smart meter datasets are inconsistent and may contain missing or corrupted values. Hence, appropriate data pre-processing becomes an inevitable step, which may consist of tasks such as data cleaning, data filtering, data transformation, data reduction etc. The Nana Kajaliyala village dataset comprised bad data, missing data, and outlier profiles. The proposed mechanism uses forecasting techniques, such as the seasonal autoregressive integrated moving average [34] model and exponential smoothing, to replace missing and bad data. Data smoothing is performed using a Savitzky–Golay digital filter [35] to remove outlier peaks from the load profiles.

3.1.2 RP derivation: RP can be interpreted as the daily or repetitive load curve/curves exhibited by a consumer over a period. In other words, an RP depicts the regular or most common consumption pattern of a consumer. A consumer RP can only be derived reliably if a sufficient sample space of data is available. Sufficiency of data length is subjective and based on factors such as trend, seasonality, noise, outliers, and missing data samples. Consumers exhibiting trend, seasonality, and multiple load profiles may prove relatively tricky to handle. For example, a consumer may follow one or more load patterns spread over weekend, weekday or holiday. Noise and outliers can be taken care of by apt data pre-processing, but one of the key challenges that remains after appropriate data pre-processing is the extraction of the RP of a consumer. The proposed methodology finds the number of load patterns followed by a consumer using gap evaluation criteria [36], and then the load profiles of each consumer are clustered using k-means to find the RP using the centroids of each significant cluster. However, our methodology constrained the maximum number of clusters to 4, empirically based on data visualisation.

Upper–lower bounds are computed for each consumer cluster to identify outliers or insignificant cluster profiles. The upper bound is calculated using three scaled median absolute deviations and the lower bound is calculated using the minimum of the sample points of all load profiles in a cluster, as given by (1) and (2), respectively. The median absolute deviation is used instead of the standard deviation as it is more resilient to outliers. Furthermore, every load profile within the cluster is checked against the upper–lower bounds; a load profile with more than a threshold of m points violating these bounds is considered as an outlier and ignored in the derivation of RP. This step helps in removing anomalies during computation of RP. Additionally, a state transition probability (STP) associated with each RP is computed. STP may be defined as the ratio of the number of consumer profiles in each cluster to the total number of consumer profiles. It gives the probability denoting how frequently each load pattern is followed by a consumer. The formulation of upper–lower bounds and STP for a consumer cluster is given below

$$\mathrm{Upper}_j = c \times \mathrm{median}\left(\left|L_{ij} - \mathrm{median}(L_j)\right|\right) \tag{1}$$

where c can be defined as

$$c = \frac{-1}{\sqrt{2} \times \mathrm{erfcinv}(3/2)}$$

Here, erfcinv() is the inverse complementary error function

$$\mathrm{Lower}_j = \min\left(L_{1j}, \ldots, L_{Nj}\right) \tag{2}$$

for $i \in (1, \ldots, N)$, $j \in (1, \ldots, n)$

$$\mathrm{STP} = N / T_n \tag{3}$$

where L is a load profile matrix of size n × N, N is the number of load profiles in a cluster, n is the number of sample data points in each load profile and $T_n$ is the total number of load profiles of a consumer.

3.1.3 RP clustering: Derivation of RPs is followed by a clustering step to ensure expeditious operation of the detection algorithm. Hierarchical agglomerative clustering is carried out on RPs to club together similar consumers, which helps the detection algorithm to navigate swiftly towards the suspicious consumer. Agglomerative clustering is a bottom-up approach.
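
The outlier screening and STP of Section 3.1.2 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the upper bound at each sample point is the cluster median plus three scaled MADs (a common convention; (1) as printed gives only the scaled MAD term), takes the lower bound as the point-wise minimum as in (2), and computes the STP as in (3). The function names, the toy profiles, and the threshold m = 1 are hypothetical.

```python
from statistics import NormalDist, median

# Scale factor that makes the MAD comparable to a standard deviation for
# normally distributed data; equals -1 / (sqrt(2) * erfcinv(3/2)), about 1.4826.
C = 1.0 / NormalDist().inv_cdf(0.75)


def cluster_bounds(profiles, k=3.0):
    """Point-wise bounds for one cluster of N load profiles (n samples each)."""
    upper, lower = [], []
    for column in zip(*profiles):            # all profiles at one time slot
        med = median(column)
        mad = median(abs(x - med) for x in column)
        upper.append(med + k * C * mad)      # ASSUMPTION: median + k scaled MADs
        lower.append(min(column))            # eq. (2): point-wise minimum
    return upper, lower


def flag_outliers(profiles, m, k=3.0):
    """Return True for each profile with more than m points outside the bounds."""
    upper, lower = cluster_bounds(profiles, k)
    flags = []
    for profile in profiles:
        violations = sum(
            1 for x, hi, lo in zip(profile, upper, lower) if x > hi or x < lo
        )
        flags.append(violations > m)
    return flags


def state_transition_probability(n_cluster_profiles, n_total_profiles):
    """Eq. (3): share of a consumer's profiles that fall in this cluster."""
    return n_cluster_profiles / n_total_profiles


# Toy cluster: four similar profiles and one with a pronounced spike.
profiles = [
    [1.0, 2.0, 3.0, 2.0],
    [1.2, 2.2, 3.2, 2.2],
    [0.8, 1.8, 2.8, 1.8],
    [1.1, 2.1, 3.1, 2.1],
    [0.9, 6.0, 9.0, 1.9],
]
flags = flag_outliers(profiles, m=1)
stp = state_transition_probability(len(profiles), 20)
```

In this toy cluster only the fifth profile breaches the bounds at more than one point, so it would be left out before the RP is taken from the cluster centroid; the STP of 0.25 corresponds to a cluster holding 5 of a consumer's 20 profiles.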

In this bottom-up scheme, each data point is initially treated as an individual cluster, and pairs of clusters are then grouped recursively based on their similarity or dissimilarity metrics. The linkage criterion determines the distance between clusters as a function of the pairwise distances between data points in the clusters. Some commonly used linkage criteria to compute the distance between two clusters are given in [37]. The proposed mechanism uses the average linkage criterion for hierarchical clustering, which computes the average distance between all pairs of data points of two clusters as given by (4), where a and b represent data points of clusters A and B, $n_a$ and $n_b$ define the total number of data points in clusters A and B, respectively, and dis() is the chosen distance metric

$$d(A, B) = \frac{1}{n_a \cdot n_b} \sum_{i=1}^{n_a} \sum_{j=1}^{n_b} \mathrm{dis}(a_i, b_j) \tag{4}$$

A taxonomic tree of hierarchical clustering is presented using a dendrogram, which suggests the relationship between the linkage distances and the number of clusters. A popular unquoted thumb rule
decide optimal numbers of clusters. In such cases, the hierarchical algorithm allows intrinsic visualisation of variation in cluster grouping owing to variation in the number of clusters.

Fig. 4 Clustering representation using dendrogram
Fig. 5 Leaf level clusters of hierarchical clustering

3.1.4 Rule-base formation: A rule-base system is a stack of attributes and targets, which can be classified to identify probable energy thefts. In other words, it is a set of states and actions that translates the rule-base attributes into theft probability targets, providing knowledge about probable energy thefts. In order to understand the targets of the rule-base, it is important to understand what each attribute brings to the table. The rule-base attributes selected by the proposed mechanism include clusterVar, highMag, lowMag, loadVar, highLoad, lowLoad, secVar, secHighLoad and secLowLoad, computed for each high risk or high theft probable consumer profile.

However, the challenge lies in identifying theft probable consumers and the theft time span during the training of the model. To identify these, initially, the power transmitted to the corresponding distribution transformer meter is equated with the aggregated amount of power received by each consumer smart meter. It gives the total power lost, which includes both TL and NTL. NTL comprises power theft and other losses. Furthermore, to identify NTL, it is necessary to compute TL precisely. There are various methods presented in the literature for computing TL. In [38], a method is given for the precise calculation of TL in branches of the distribution system by employing a specific circuit in each branch, whereas the methods presented in [39–41] depend upon the physical characteristics of the distribution line, yet they are less accurate due to the dependency of such features on environmental conditions. However, due to the lack of information about branch topology and physical characteristics of the line, finding the exact TL remains outside the scope of the proposed work. Hence, an empirically computed percentage error term ε associated with TL is employed as suggested in [38]. In case of no NTL, the received power from the smart meter at time t can be written as

$$E_{tm}(t) = \sum_i E_{sm_i}(t) + E_{tl}(t) + \varepsilon \tag{5}$$

In the case of NTL, the transmitted power will be given as

$$E_{tm}(t) > \eta \times \sum_i E_{sm_i}(t) + E_{tl}(t) + \varepsilon + Th \tag{6}$$

Here, $E_{tm}(t)$ represents the transmitted power from the transformer at time t, $E_{tl}(t)$ depicts TL, $\sum_i E_{sm_i}(t)$ is the total amount of received power from the consumers' smart meters, ε is the error in calculating TL and Th denotes a tolerance defined as a function of peak transmitted power. The term η corresponds to the metering and communication efficiency of the smart metering infrastructure. This study assumes the metered data to be truthful, hence the η value is set to 1. However, the inclusion of η would help the utility to keep a rein on data or cyber-attacks involving metering or communication blocks as shown in Fig. 1.

If the condition given by (10) holds true, then the corresponding magnitude of theft, theft location, and duration are computed. These attributes are then used to identify the priority order for clusters to be examined. For example, if there is a theft event of a certain magnitude in the evening then clusters having an equal or
finds an optimum number of clusters by employing a horizontal correlated magnitude in the evening are examined first as more
segregation line at a height equal to half the length of the largest chances of a possible theft in those clusters with respect to other
linkage value. However, an optimum number of clusters is verified clusters. Once the clusters are known, the consumer under those
with the help of gap evaluation criteria. Hierarchical clustering is clusters are tagged as high theft probable and will be scrutinised
preferred over flat clustering as it provides an informative structure further as its cluster variance, load variance, and sectional variance
and has a property of reproduce-ability. Moreover, it is difficult to parameters are computed to detect variation in current load profiles
decide the optimal number of clusters in k-means and k-medoids, with respect to their respective cluster's history, upper–lower
which can be done easily by looking at the dendrogram in bounds, and sectional load. High theft probable profiles are used to
hierarchical clustering. The dendrogram of hierarchical clustering train the classification model using attributes and targets as
is shown in Fig. 4. The clusters of RPs at the leaf level of discussed earlier. For every consumer profile used in training, the
dendrogram are shown in Fig. 5. It shall be noted that an optimal following attributes are assigned for scrutinising consumer profile:
number of clusters is unknown initially in case of theft detection
problem formulation, hence cluster numbers have to be varied to

616 IET Smart Grid, 2019, Vol. 2 Iss. 4, pp. 612-624


This is an open access article published by the IET under the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/3.0/)
(a) Cluster variance: It is given by a three-state variable clusterVar, which represents the association of the new consumer's input profile with its previously identified cluster or clusters. In other words, the aggregation of clusterVar over new input profiles for the same consumer is a measure of the consistency of a consumer with its RP and pre-defined cluster tag. To identify the cluster of a new input profile, its Euclidean distance to all cluster centroids is computed and the profile is given the temporary tag of the cluster at minimum Euclidean distance. Variable clusterVar is assigned 0 if the new input profile is tagged to its pre-assigned cluster; otherwise, it is assigned 1 or 2 depending on the distance between the tagged cluster and the pre-identified cluster. For instance, if the distance of the tagged cluster from the consumer's pre-identified cluster (calculated using (6)) is less than a predefined threshold th, then clusterVar is set to 1; otherwise, it is set to 2, as shown in Algorithm 1 (see Fig. 6). Here, the value of th is chosen analytically using clusters of RPs. Its value depends upon the acceptable variance of the input profile from its pre-identified cluster. Hence, for the proposed work, th is defined as twice the standard deviation of the cluster. It can also be chosen by looking at the dendrogram in Fig. 4. It shall be noted that this study assumes that if the input profile varies by no more than twice the standard deviation of the cluster, it is identified in the same cluster. Moreover, cluster variance also decides the value of the highMag and lowMag flags based on the magnitude of the new temporary cluster with respect to its previously identified cluster. The highMag flag is set if the magnitude of the newly identified cluster is higher than that of its previously identified cluster. Similarly, the lowMag flag is set if the newly identified cluster has a lower magnitude than its previously identified cluster.

Fig. 6 Algorithm 1: calculate cluster variance
(b) Load variance: The evaluation of the input profile against the consumer's upper–lower bounds is captured by the variables loadVar, lowLoad, and highLoad, where loadVar represents the variation in the new input profile with respect to its upper–lower bounds. highLoad indicates that the load of the new input profile is high with respect to its previous load profile, whereas lowLoad indicates that the load of the new input profile is low with respect to its regular load profile. It can be summarised as an attribute encapsulating the reproducibility of the consumer profile within a confidence band defined by the upper–lower bounds. For instance, if m points go outside these bounds, then the load variance variable loadVar is set to 1. Here, variable m depicts the tolerance of abnormality. In the proposed mechanism, the value of m is chosen, for instance, as 10% of the sampling rate (for computing results) and can be varied according to the utility requirement. The selection of parameter m is empirical, based on the average number of outlier points located during the visualisation of the data set. It shall be noted that m allows the utility to focus on various types of theft, for instance, a lower value of m for the detection of ephemeral thefts and a larger value of m for very high abnormality. The variables highLoad and lowLoad are assigned values based on the mean violation of the upper–lower bounds by the incoming input profile, as shown in Algorithm 2 (see Fig. 7). In summary, the load variance algorithm extracts information regarding the quantitative violation of the confidence band by the input profile, which proves decisive in theft decision making.

Fig. 7 Algorithm 2: calculate load variance

Table 2 Temporal windows defining the sectional consideration

Section name        Section time      Data points
early-morning (E)   12 AM to 4 AM     1:16
morning (M)         4 AM to 10 AM     17:40
afternoon (A)       10 AM to 5 PM     41:70
night (N)           5 PM to 12 AM     71:96
(c) Sectional variance: It is an important attribute that accumulates information regarding the time of probable theft, anomaly or profile variation. It uses a sectional window approach to identify the time of abnormality in the load profile. Four sections are considered throughout a day, depending upon the dataset. The section break-up is shown in Table 2. Sectional variance is represented by variable secVar as a combination of four values [E M A N], with the corresponding bit set to 1 if an abnormality is found in that section. The abnormality is calculated using the upper–lower bounds of the dubious section. The average of the upper–lower bounds of the corresponding section is compared with the average of the dubious section of the input profile, and the secLowLoad and secHighLoad variables, which represent low load and high load in the section, are set accordingly. Algorithm 3 (see Fig. 8) shows the computation of the sectional variance variables for the early-morning section (E). Here, k is an abnormality tolerance similar to m in Algorithm 2; its value is set, for instance, to 20% of the sample points of the dubious section and can be varied based on utility requirement.

Fig. 8 Algorithm 3: calculate sectional variance (early-morning)

(d) Theft pattern attributes: Theft pattern attributes are calculated using (6). Their coincidence or overlap with the sectional attributes of the input profile helps in deciding the type of abnormality in the input profile. The idea behind assigning target values arises from identifying the correlation between input profile attributes and theft pattern attributes. Decision making is facilitated by attributes TV and MV, which denote the time of probable theft and the power loss magnitude, respectively. Variable TV is given four values [E M A N], similar to the sectional variance calculation, computed over the NTL pattern. MV is a three-state variable, which represents the magnitude of NTL with respect to its acceptable range represented by Th in (6). If NTL is detected below the threshold Th, then MV is set to −1, whereas if NTL is detected above Th, it is set to 1; otherwise, it remains 0.

To depict the dependency of the various rule-base attributes on the load profile, consumer load profiles of one month with their upper–lower bounds are shown in Fig. 9a. The RP of the consumer calculated using these load profiles is depicted in Fig. 9b. Fig. 9c presents various types of abnormalities in the consumer profile, and the values of the rule-base attributes for these profiles are given in Table 3.

The interplay between the above-mentioned attributes gives rise to various practical scenarios. The preliminary targets related to these cases were identified based on the simultaneous values of the cluster, load and sectional variance and the theft pattern attributes, as shown in Table 4. It shall be noted that the variable interplay is constrained to the early-morning (1:16 samples) theft scenario in the table, but the target assignment is done for full-day theft detection. From Table 4, it is evident that the proposed mechanism considers theft only when the value of theft attribute TV intersects with the secLowLoad variable; other cases are considered as an anomaly, low load, or high load. Such a case represents a practical scenario where the consumer profile shows a steep decrease simultaneous with increased NTL values, which makes the profile a probable suspect for theft. The severity of theft of the input profile depends upon its clusterVar and loadVar attributes. If clusterVar is 2 and lowMag is 1, then it is considered as high theft; if loadVar is 1 and lowLoad is 1, it is considered as medium theft. Completing the rule-base and reflecting it onto the decision tree classification model concludes the training phase of the proposed mechanism.
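As an illustration, the theft-time vector, the sectional flags and the theft branch of the target assignment can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reproduction of Algorithms 1–3: the function names, the rule that a section is flagged when any of its samples satisfies the NTL condition, and the use of a fraction k_frac of out-of-band samples as the sectional tolerance are illustrative choices. The section slices follow the data-point ranges of Table 2, and only the theft rows of Table 4 are covered.

```python
from statistics import mean

# 0-based, half-open slices built from the data-point ranges in Table 2 (96 samples per day).
SECTIONS = {"E": (0, 16), "M": (16, 40), "A": (40, 70), "N": (70, 96)}


def theft_vector(e_tm, e_sm_total, e_tl, eps, th, eta=1.0):
    """TV bits [E, M, A, N]. Assumption: a section is flagged when any of its
    samples satisfies Etm > eta * sum(Esm) + Etl + eps + Th."""
    flagged = [tm > eta * sm + tl + eps + th
               for tm, sm, tl in zip(e_tm, e_sm_total, e_tl)]
    return [int(any(flagged[a:b])) for a, b in SECTIONS.values()]


def section_flags(profile, lower, upper, k_frac=0.2):
    """Return (secVar, secHighLoad, secLowLoad) as [E, M, A, N] bit lists.
    Assumption: a section is abnormal when at least k_frac of its samples
    leave the band; section means are then compared with the mean bounds."""
    sec_var, sec_high, sec_low = [], [], []
    for a, b in SECTIONS.values():
        p, lo, hi = profile[a:b], lower[a:b], upper[a:b]
        outside = sum(1 for x, l, h in zip(p, lo, hi) if x < l or x > h)
        abnormal = outside >= k_frac * len(p)
        sec_var.append(int(abnormal))
        sec_high.append(int(abnormal and mean(p) > mean(hi)))
        sec_low.append(int(abnormal and mean(p) < mean(lo)))
    return sec_var, sec_high, sec_low


def preliminary_target(cluster_var, low_mag, load_var, low_load, sec_low, tv):
    """Theft is considered only when TV overlaps secLowLoad; severity then
    follows the clusterVar/lowMag and loadVar/lowLoad rules."""
    if not any(s and t for s, t in zip(sec_low, tv)):
        return "no theft (anomaly, low load or high load)"
    if cluster_var == 2 and low_mag == 1:
        return "high theft"
    if load_var == 1 and low_load == 1:
        return "medium theft"
    return "low theft"


# Hypothetical data: one consumer drops steeply in the morning section while
# the transformer keeps supplying the usual amount.
lower, upper = [1.0] * 96, [2.0] * 96
profile = [1.5] * 96
profile[16:40] = [0.2] * 24
others = [50.0] * 96                          # remaining consumers (made up)
e_sm_total = [o + p for o, p in zip(others, profile)]
e_tl = [2.0] * 96
e_tm = [o + 1.5 + 2.0 for o in others]

tv = theft_vector(e_tm, e_sm_total, e_tl, eps=0.1, th=0.5)
sec_var, sec_high, sec_low = section_flags(profile, lower, upper)
target = preliminary_target(cluster_var=1, low_mag=1, load_var=1, low_load=1,
                            sec_low=sec_low, tv=tv)
print(tv, sec_low, target)
```

In this made-up case, the NTL and the low-load flag coincide in the morning section, so the profile falls into the theft branch; clusterVar, lowMag, loadVar and lowLoad are passed in directly rather than computed.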

Fig. 9 Abnormalities in consumer profile (LB, lower bound; UB, upper bound)
(a) Normal load profiles of the consumer over one month, (b) RP of consumer, (c) Various abnormalities in the consumer profile

Table 3 Rule-base attributes for various types of abnormality depiction in Fig. 9c (CV, clusterVar; HC, highMag; LC, lowMag; LV, loadVar; HL, highLoad; LL, lowLoad)

Load profile   CV   HC   LC   LV   HL   LL   secVar [VE VM VA VN]   secHighLoad [HE HM HA HN]   secLowLoad [LE LM LA LN]
P1             1    0    0    1    1    0    [1 1 1 1]              [1 0 1 0]                   [0 0 0 0]
P2             0    0    0    1    0    0    [1 0 0 1]              [1 0 0 0]                   [0 0 0 0]
P3             2    0    1    1    0    1    [1 1 1 1]              [0 0 0 0]                   [1 1 1 1]
P4             1    0    0    1    0    0    [0 0 0 1]              [0 0 0 0]                   [0 0 0 1]
P5             0    0    0    0    0    0    [0 0 0 0]              [0 0 0 0]                   [0 0 0 0]

Table 4 Synergy of distinct rule-base attributes and preliminary targets

Input profile attributes                                                Theft attributes     Preliminary target
[CV HC LC]   [LV HL LL]   [VE VM VA VN]   [HE HM HA HN]   [LE LM LA LN]   TV          MV
[0 0 0]      [0 0 0]      [0 0 0 0]       [0 0 0 0]       [0 0 0 0]       [0 1 0 0]   —    normal profile
[1 0 0]      [0 0 0]      [0 0 0 0]       [0 0 0 0]       [0 0 0 0]       [0 1 0 0]   —    normal profile
[0 0 0]      [1 0 0]      [1 0 1 0]       [1 0 0 0]       [0 0 1 0]       [0 1 0 0]   —    anomaly
[1 0 0]      [1 1 0]      [1 1 0 0]       [1 1 0 0]       [0 0 0 0]       [0 1 0 0]   —    high load anomaly
[2 1 0]      [0 0 0]      [0 0 0 0]       [0 0 0 0]       [0 0 0 0]       [0 1 0 0]   —    data error
[2 1 0]      [1 1 0]      —               —               —               [0 1 0 0]   —    high load profile
[1 0 0]      [1 1 0]      [0 1 0 1]       [0 1 0 1]       [0 0 0 0]       [0 1 0 0]   —    anomaly
[2 0 1]      [1 0 1]      [1 0 0 1]       [0 0 0 0]       [1 0 0 1]       [0 1 0 0]   1    anomaly/low load
[2 0 1]      [1 0 1]      [1 1 1 1]       [0 0 0 0]       [1 1 1 1]       [0 1 0 0]   1    low load/high theft
[2 0 1]      [1 0 1]      [0 1 0 0]       [0 0 0 0]       [0 1 0 0]       [0 1 0 0]   1    high theft
[1 0 1]      [1 0 1]      [0 1 0 0]       [0 0 0 0]       [0 1 0 0]       [0 1 0 0]   1    medium theft
[0 0 0]      [1 0 0]      [0 1 0 0]       [0 0 0 0]       [0 1 0 0]       [0 1 0 0]   1    low theft

3.1.5 Classification: The generated rule-based model is converted into a decision tree for classification of the input profile. The efficiency of classification is evaluated using two metrics: (i) precision and (ii) recall. Precision and recall portray the correctness and completeness of the classification method, respectively, and are formulated as shown in (7) and (8). However, these measures alone are not sufficient to characterise the performance of the algorithm; hence, the F1 score, which shows the balance between precision and recall, is used as given by (9), where tp represents true-positive cases, fp represents false-positive cases and p represents the total number of positive cases of the classification algorithm

$$\mathrm{Precision} = \frac{tp}{tp + fp} \tag{7}$$

$$\mathrm{Recall} = \frac{tp}{p} \tag{8}$$

$$\mathrm{F1Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}$$

The probability of incorrect rejection of the null hypothesis is popularly evaluated using FPR, which is given by (10), and the accuracy of the classification algorithm is given by (11)

$$\mathrm{FPR} = \frac{fp}{fp + tn} \tag{10}$$

$$\mathrm{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \tag{11}$$

Here, tp, tn, fp, and fn represent true-positive, true-negative, false-positive, and false-negative, respectively.
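The metrics above can be computed directly from the four confusion-matrix counts. The short Python sketch below assumes that p, the total number of positive cases, equals tp + fn, and that no denominator is zero; the function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, F1 score, FPR and accuracy from confusion-matrix counts."""
    p = tp + fn                      # assumption: p counts all actual positives
    precision = tp / (tp + fp)
    recall = tp / p
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1,
            "fpr": fpr, "accuracy": accuracy}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```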

3.2 Implementation phase


One year of data from 171 consumers at 15 min resolution has been split into 60 and 40% for the training and testing phases, respectively. Additionally, the training phase uses k-fold validation to avoid miscalculations and overtraining. Subsequently, the decision-tree classifier is trained using the extracted rule-base. The remaining 40% of the data is used for the implementation or testing phase of the proposed model. The major constituents of the implementation phase are theft validation and data post-processing, which includes updating consumer features such as upper–lower bounds and cluster tag, and upgrading the rule-base for any unseen abnormality.
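The data partitioning described above can be sketched as follows. The text does not state whether the split is taken over time or over consumers, nor the value of k, so the sketch assumes a chronological split over 365 daily profiles and uses k = 5 purely as a placeholder; both function names are hypothetical, and the decision-tree training step itself is omitted.

```python
def chronological_split(n_items, train_percent=60):
    """Return (train, test) index lists; the first train_percent% go to training."""
    n_train = n_items * train_percent // 100
    indices = list(range(n_items))
    return indices[:n_train], indices[n_train:]


def k_fold(indices, k):
    """Yield (train, validation) index lists for each of k interleaved folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


train_days, test_days = chronological_split(365)   # assumed: one profile per day
folds = list(k_fold(train_days, k=5))              # k = 5 is a placeholder value
print(len(train_days), len(test_days), [len(val) for _, val in folds])
```

Each training index appears in exactly one validation fold, so every profile in the 60% partition is used once for validation and k − 1 times for fitting.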
Fig. 10 represents the flow diagram of the implementation phase of the proposed mechanism. Initially, the transmitted power and the aggregated power of all consumers are used to identify the theft pattern, theft location, and severity of theft using (6). If any probable NTL is identified, the theft attributes such as TV and MV are computed. This information is used to select appropriate clusters, and the consumers under these clusters are examined first. The load profile analysis block then computes the values of the various rule-base attributes such as clusterVar, highMag, lowMag, secVar, secHighLoad, and secLowLoad. The outputs of this block are cascaded as input to the classification block, which employs decision tree classification to produce the preliminary target associated with the scrutinised profile or input profile. The output of the classification is transformed into a preliminary suspicious consumer list. However, the classification block may end up giving ambiguous and fallacious output, adding to the FPR. This makes cross-validation using the validation block paramount.

Fig. 10 Algorithmic flow of implementation phase of the proposed mechanism

3.2.1 Validation: The salient feature of the proposed methodology is the validation of suspicious consumers using the predicted profile and the aggregated power loss profile or theft profile. The prediction of a suspicious consumer's normal profile is achieved using S-ARIMA forecasting to counter the seasonality and trend exhibited by the profile. Each suspicious consumer profile is then replaced by its predicted profile to find the aggregated power of all the consumers, which is further used to identify the NTL using (6). If the NTL pattern obtained has a lesser magnitude than the previous theft pattern, the sensitivity of the magnitude decrement owing to each consumer's predicted profile is computed to determine the finalised suspicious consumer list, with sensitivity values, from the pre-identified preliminary consumer list. Moreover, the theft priority is decided by the value of CWR and the variation in STP over a period of a week. The value of CWR depends upon previous theft scenarios detected for this consumer. Validating the classification output ensures that the chances of wrong classification are reduced and the FPR is improved.

3.2.2 Data-post processing: Data-post processing may be divided into data filtering, information interpretation, and information integration. The proposed methodology uses data post-processing to ensure apt information interpretation and integration. It is used to update the model for non-malicious changes in the consumer profile, which could happen due to seasonality, a change of appliance or a change in the number of people. In other words, data-post processing ensures that variation in normal profiles due to seasonality and trend is incorporated, whereas abnormalities in the profile are avoided while updating consumer features for the next detection cycle. Information filtering and interpretation are partially taken care of by the validation block, whereas information integration ensures proper recasting of rule-base attributes, upper–lower bounds, cluster tag, STP and CWR for the next theft detection cycles. The updating of historical data can be achieved either on a daily or weekly basis depending upon the type of change in the consumer profile. The proposed mechanism suggests the following update rule in the form of cases as described below:

Case 1. If the consumer load profile is classified as ‘normal load’, then the new profile is used in updating the database on a daily basis.
Case 2. If the consumer load profile is classified as ‘high load’, the profile needs to be updated sub-weekly or weekly depending on utility requirements.
Case 3. If the consumer profile is classified as ‘anomaly’, then the consumer profile is observed over a week before being updated.
Case 4. If the consumer load profile is classified as ‘data error’, the utility needs to be alerted and recasting of the database is avoided.
Case 5. If the consumer is classified under any kind of theft, the utility needs to be alerted with the priority value and updating of the profile is paused until manual intervention or inspection by the utility.

4 Results and inferences

This study executes the theft detection algorithm on the data comprising 171 consumers of Nana Kajaliyala village and the

distribution transformer data to identify various abnormalities and theft scenarios, and to obtain a list of theft probable consumers. The algorithm is run over one year of data of all the consumers to detect their section-wise abnormal behaviour. Note that Fig. 11 represents the sectional classification obtained by executing the algorithm on the September 2017 data of a consumer. The highlighted upper–lower bounds are the tolerance belt within which the consumer profile was anticipated on the basis of historical data, as explained in the training phase. The figure also shows highlighted sections containing abnormalities, the consumer RP, and the classification with respect to each section. It is evident from Fig. 11 that various classified abnormalities are identified in the consumer profile in real-time as listed:

i. Fig. 11a shows the profiles detected as normal, which do not violate, or only marginally violate, the upper–lower bounds; hence no sectional abnormalities were detected.
ii. Figs. 11b and c represent high load anomaly in sections (A) and (E, M), respectively, which may or may not interest the utility in theft detection but could be crucial in demand–response or operation level analysis.
iii. Figs. 11d and e show consumer preliminary theft classification in sections (N) and (E, M), respectively. Moreover, the theft magnitudes identified using the attributes were low and medium, respectively.
iv. Fig. 11f is detected as high theft owing to sectional variance in the (M, A, N) sections; in addition, the input profile shows a high-magnitude deviation from the RP and its identified cluster, as is evident from the figure.
v. Figs. 11g and h highlight consumer profiles that exhibit multiple types of abnormalities in the same profile. For instance, Fig. 11h represents a profile that had medium theft and high abnormality detected in the sections (A, N) and (E), respectively.
vi. Fig. 11i shows a high magnitude of section and cluster variance. Additionally, the mean deviations from the upper–lower bounds and the RP are also beyond the thresholds in all sections, which are classified as high theft or low load profile. Such a target suggests that future scrutiny is required.

However, evaluation methods for theft detection algorithms are highly reliant on the availability of data, sampling rate, and privacy concerns.

Fig. 11 Portrayal of classified preliminary target and associated sectional windows of a consumer profile for the month of September 2017
(a) Normal load profiles, (b) Anomaly, (c) High load profile, (d) Theft possibility, (e) Theft possibility, (f) Theft possibility, (g) Theft and anomaly, (h) Theft and anomaly, (i) Low load profile

4.1 Evaluation

There are mainly two evaluation approaches, depending on the availability of a theft or risk dataset for training the algorithm. When theft data is known, the evaluation uses accuracy and FPR measures on the known theft data, whereas when the theft or risk data is unknown, the evaluation uses accuracy and FPR measures based on a synthesised abnormal dataset. The proposed methodology utilises the Nana Kajaliyala village dataset of 171 consumers for theft detection; it shall be noted that the dataset lacks validated theft data. Owing to the absence of knowledge about theft scenarios in the dataset, a mixed approach for evaluation is adopted. It uses accuracy and FPR measures on synthesised data, evaluates whether the algorithm correctly classifies the normal dataset on the actual time series as shown in Fig. 11, and stores the priority risk consumer list to aid analytic results for utility usage. A mixture of synthesised theft profiles, synthesised abnormal profiles, and actual normal profiles of several consumers is fed to the proposed model during the testing phase to assess the performance of the theft detection algorithm. Shaping synthesised data that imitates the real-world theft scenario is very tricky. The construction of theft data and various abnormal data can be devised as an output of the listed functions.

i. All zero reading: f1(IPt) = 0; t ∈ (1, …, ts).
ii. Low consumption: f2(IPt) = IPt × rt; t ∈ (1, …, ts). Here, rt = random(0, 0.25).
iii. High consumption: f3(IPt) = IPt × rt; t ∈ (1, …, ts). Here, rt = random(1.0, 1.75).
iv. Theft data: f4(IPt) = IPt × rk; t ∈ (t1, …, t2); t1 < t2 ≤ ts. Here, rk = random(0, 0.25); k ∈ (1, …, (t2 − t1)).
v. Anomaly: f5(IPt) = IPt × rt; t ∈ (1, …, ts). Here, rt = random(0.0, max(IPt)).
vi. Noise: f6(IP) = awgn(IP).
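The six functions listed above can be sketched with Python's standard random module as shown below. This is an illustrative sketch, not the generator used in this study: the function names are hypothetical, the theft window is expressed as a 0-based half-open range [t1, t2), and the noise standard deviation sigma stands in for the unspecified noise level of awgn().

```python
import random


def all_zero(ip):
    return [0.0 for _ in ip]


def low_consumption(ip, rng):
    return [x * rng.uniform(0.0, 0.25) for x in ip]


def high_consumption(ip, rng):
    return [x * rng.uniform(1.0, 1.75) for x in ip]


def theft(ip, t1, t2, rng):
    """Scale only samples t1 .. t2-1 by factors drawn from [0, 0.25]."""
    return [x * rng.uniform(0.0, 0.25) if t1 <= t < t2 else x
            for t, x in enumerate(ip)]


def anomaly(ip, rng):
    peak = max(ip)
    return [x * rng.uniform(0.0, peak) for x in ip]


def noise(ip, rng, sigma=0.05):
    """Additive white Gaussian noise; sigma is an assumed placeholder."""
    return [x + rng.gauss(0.0, sigma) for x in ip]


rng = random.Random(42)                       # fixed seed for repeatability
ip = [1.5 if 40 <= t < 70 else 1.0 for t in range(96)]   # made-up daily profile
synthetic = {
    "zero": all_zero(ip),
    "low": low_consumption(ip, rng),
    "high": high_consumption(ip, rng),
    "theft": theft(ip, 16, 40, rng),
    "anomaly": anomaly(ip, rng),
    "noise": noise(ip, rng),
}
print({name: round(sum(values), 2) for name, values in synthetic.items()})
```

Passing an explicit random.Random instance keeps the generated profiles repeatable, which helps when the same synthetic set has to be reused across evaluation runs.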

The theft severity of generated abnormal profiles can be computed as

$$s = IP'_t - IP_t \tag{12}$$

Here, IP represents an input profile, IP′ represents the generated abnormal profile and ts represents the number of samples in the input profile. Function random() is used to generate random numbers to ensure fairness of the abnormality generation and function awgn() is used to add white Gaussian noise to IP. Fig. 12 shows various synthetic profiles generated based on a consumer's RP. It shall be noted that the synthetic data includes a mix of noise, abnormality, theft as well as normal profiles. The issue of data imbalance for theft data was resolved using a generative adversarial network (GAN) [42].

A generative adversarial network is a form of neural network used to generate additional data related to the input dataset. It finds the patterns in the input dataset and thus generates data that is quite similar to the input dataset. It is mainly used to solve the data imbalance problem when one has relatively little data for a scenario or a class and additional data is required to train the classification model effectively. It consists of the simultaneous training of two models, a generative model and a discriminative model. The generative model captures the data distribution and generates data, whereas the discriminative model estimates how dissimilar the generated data looks from the original data. The discriminative model gives a loss function, which is then used to update the generative model. In this study, a variant of GAN known as a conditional generative adversarial network (CGAN) [43] is used to generate synthetic data for various types of abnormalities in the load profile. The CGAN accepts input data with class labels.

Fig. 12 Impression of synthetic profile generation for a consumer

Table 5 Performance of model for different types of abnormality in the load profile

                     Abnormality type
Theft severity, s    Ephemeral (1–3 interval)   Sub-sectional (3–15 interval)   Sectional (E, M, A, N)   Trend
s ≤ 2σ               0.27                       0.35                            0.38                     0.21
2σ < s ≤ 3σ          0.31                       0.36                            0.45                     0.29
3σ < s ≤ 4σ          0.65                       0.72                            0.82                     0.40
s > 4σ               0.72                       0.89                            0.96                     0.59
random               0.54                       0.68                            0.79                     0.48

Table 6 Performance of model on the different ratio of abnormality

                     Abnormality in dataset
Performance metric   10%    20%    30%    40%
precision            0.87   0.85   0.84   0.80
recall               0.98   0.97   0.97   0.96
F1 score             0.92   0.90   0.90   0.87
FPR                  0.10   0.15   0.18   0.20
accuracy             0.92   0.91   0.89   0.85
4.2 Performance comparison

The performance of the algorithm is analysed for different proportions of abnormality in the consumer dataset. However, to avoid the imbalance of theft data and to generate more abnormal data, a GAN [42] is used prior to performance evaluation. Table 5 depicts the algorithm performance when subjected to various types of abnormalities varying with respect to severity level as suggested in (12). Here, σ represents the standard deviation from the regular load profile or RP. The types of abnormalities considered in Table 5 can be categorised into ephemeral, sub-sectional, sectional, and trend. This categorisation can be explained as follows:

• Ephemeral abnormalities: Abnormalities of various magnitudes over very small spans such as (1–3) intervals are considered as ephemeral abnormalities. These are short-lived abnormalities formulated to mimic small changes, covering practical scenarios where a defaulter changes the meter reading for short spans.
• Sub-sectional abnormalities: Abnormalities seen in single sections (E, M, A or N) of a day for (3–15) intervals with various magnitudes. These abnormalities are formulated to mimic a scenario where a defaulter bypasses load for a certain duration in a section. (Assumption: the defaulter has knowledge of the sectional window of the theft detection algorithm).
• Sectional: Abnormalities observed across the sections (E, M, A or N) or across a combination of sections (EM, EA, MA etc.) with various magnitudes. These abnormalities are formulated to mimic situations where thefts are taking place during industry or office start and close hours.
• Trend abnormalities: These are gradual abnormalities where loads are bypassed from metering gradually over a period of a few days. These abnormalities are formulated to mimic a situation where small changes (theft) are applied over the long term to deviate from the normal pattern without being noticed.

Note that these are synthetic abnormalities formulated manually to depict various scenarios and are proliferated using GAN for analysis. Some notable findings from Table 5 are listed below:

• The algorithm performs with considerable accuracy and consistency for sectional and sub-sectional abnormalities.
• Performance of the algorithm generally improves for abnormalities of magnitude higher than thrice the standard deviation from the regular profile (≥ 3σ).
• The algorithm finds it difficult to detect ephemeral abnormalities of lower magnitude (< 3σ).
• Performance of the algorithm dips when it is subjected to trend abnormalities of magnitude less than thrice the standard deviation (< 3σ).
• Overall performance of the algorithm is generally good, barring ephemeral abnormalities and trend abnormalities (< 3σ).

Table 7 Comparison of the various theft detection mechanisms

Scheme            Technique                                                 DR      FPR      Adaptive   Theft/anomaly difference   Data storage and processing cost   Privacy preservation
[31]              fuzzy clustering and classification                       74.5    —        no         no                         medium                             strong
[28]              SVM                                                       94      medium   yes        no                         high                               medium
[27]              decision tree and SVM                                     92.50   low      yes        no                         medium                             weak
[19]              ELM                                                       70.53   —        no         no                         medium                             weak
[21]              P2P computing                                             100     —        yes        —                          —                                  strong
proposed method   clustering, rule-base, and decision-tree classification   97      low      yes        yes                        low                                weak

Table 6 shows the performance matrix of the algorithm, which comprises precision (P), recall (R), accuracy, FPR, and the widely accepted F1 score. The performance evaluation parameters are evaluated for a varying ratio of synthetic data to actual data, as shown in Table 6. From the table, it is evident that the precision of the model reduces moderately with an increase in the ratio of synthetic or abnormal data to actual data, but the recall or DR does not show a significant variation and remains competitively comparable.
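As a quick consistency check, the F1 scores in Table 6 can be recomputed from the tabulated precision and recall using F1 = 2 × Precision × Recall/(Precision + Recall). The Python sketch below copies the values from Table 6; the variable names are illustrative.

```python
# (precision, recall, reported F1) per abnormality ratio, copied from Table 6.
table6 = {
    "10%": (0.87, 0.98, 0.92),
    "20%": (0.85, 0.97, 0.90),
    "30%": (0.84, 0.97, 0.90),
    "40%": (0.80, 0.96, 0.87),
}

recomputed = {ratio: 2 * p * r / (p + r) for ratio, (p, r, _) in table6.items()}
for ratio, f1 in recomputed.items():
    print(ratio, round(f1, 3), "reported:", table6[ratio][2])
```

The recomputed values differ from the reported ones by less than 0.01 in every column, which is consistent with precision and recall being tabulated to two decimal places.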
Fig. 13 Approximate theft scenario on 1 September 2017

Table 7 compares various theft detection mechanisms. The comparison
parameters, such as DR and FPR, are tested on the Nana Kajaliyala
village dataset. It should be noted that the proposed methodology
offers comparable accuracy and a reduced FPR, in addition to other
advantages such as being adaptive towards non-malicious changes in the
consumer profile. Moreover, it uses the theft attributes TV and MV,
which describe theft location and magnitude, respectively; their
intersection with the sectional attributes of the input profile
defines theft and anomaly cases separately. The adaptiveness of the
proposed methodology is achieved by updating the rule base, as
explained in Section 3.1.4. Furthermore, the proposed methodology
validates the preliminary classification targets using the validation
block to ensure the correctness of the detection. Moreover, the
validation block emerges as a novel outcome of the proposed
methodology, with multiple advantages to the utility as listed below:

i. It ensures that the results employed for planning and simulation
are accurate.
ii. It provides an empirical approximation of the TL that the
simulation results may show in a real-time practical scenario, as
shown in Fig. 13.
iii. It provides a forecast of the consumer consumption pattern and
gives the utility a double-checked priority order of probable theft
consumers for field scrutiny.

It should be noted that all the other techniques compared in Table 7
lack any kind of cross-validation block to distinguish between theft
and anomaly. The results in Table 7 clearly show the advantages gained
from the novel validation approach of the proposed methodology, which
is depicted in Fig. 14.
Fig. 14 Correlative analysis of theft detection and validation over a
period of six months

A summary of the approximate theft scenario in the dataset used is
given in Fig. 14, which shows the theft scenarios detected before and
after validation on a real dataset from July 2017 to December 2017.
Fig. 15 shows the sectional abnormality as a Venn diagram of the
percentage of theft that occurred during each section of the day. It
is clear that most of the theft scenarios are identified during the
morning and evening sections. Note that the algorithm is evaluated
using only synthesised data; hence, it would be advisable to implement
such an algorithm in practical systems only once it has been trained
and validated using real or proven theft scenarios, which was outside
the scope of this study.
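
The sectional breakdown behind a chart such as Fig. 15 can be tallied in a few lines of Python. This is a minimal sketch, assuming each detected theft event is tagged with the section codes (E, M, A, N) in which it occurred; the function name `sectional_share` and the sample events are hypothetical and are not taken from the dataset.

```python
from collections import Counter

SECTIONS = ("E", "M", "A", "N")  # section codes used in the text


def sectional_share(events):
    """Return the percentage of events that touch each section.

    `events` is an iterable of tags such as "M" or "EM". An event that
    spans several sections counts once for each of them, so the shares
    may sum to more than 100 (the overlapping regions of a Venn diagram).
    """
    events = list(events)
    counts = Counter()
    for tag in events:
        for section in set(tag):
            if section not in SECTIONS:
                raise ValueError(f"unknown section code: {section!r}")
            counts[section] += 1
    total = len(events)
    return {s: (100.0 * counts[s] / total if total else 0.0) for s in SECTIONS}


# Hypothetical event tags, for illustration only.
sample_events = ["M", "E", "EM", "M", "A", "E", "EM", "N"]
shares = sectional_share(sample_events)
for section, pct in shares.items():
    print(f"{section}: {pct:.1f}%")
```

Counting combined tags such as EM towards both sections is one possible design choice; counting each combination as its own category would be equally valid if the overlaps need to be reported separately.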

Fig. 15 Sectional theft scenario of database

4.3 Discussion, recommendation, and future scope

Theft detection can be highly error-prone for an electrical utility
owing to the numerous uncertainties in power consumption at the
distribution end. Even well-intended circumstances can lead to a large
variation in the consumer load pattern, which may result in an
increased FPR. Such variation may be encroaching, gradual, sudden,
short-lived, periodic or permanent. For instance, the addition or
repair of a major load, house parties, a circuit breakdown, festivals,
weekends, or an unexpected seasonal change may be the cause
of such variations. This study endeavours to limit this increase in
FPR caused by non-malicious variation by using a two-step
classification–validation algorithm and multi-level updating of the
dataset. The validation block is devised to handle sudden,
short-lived, unexpected seasonal, and permanent variations; it is
supported by a timely data updating block, which is carefully
formulated to handle gradual, encroaching and permanent non-malicious
variations. The sectional attributes approach provides a clear
separation between anomaly and theft using knowledge of the sectional
overlap of theft patterns. Moreover, the data pre-processing step that
derives the RP for multi-profile consumers ensures that the FPR is not
affected by the switching of consumer profiles.

It is noticeable from the demographic results that about 23% of
profiles are detected to have an abnormality over a period of six
months in Nana Kajaliyala village, of which 57% were theft or low load
anomalies while the rest were high load abnormalities. Hence, the
study believes that the dataset consists of a mix of malicious and
non-malicious consumers, but would recommend validation and scrutiny
of the results with detailed metadata inputs. Additionally, the
proposed methodology may exhibit improved accuracy and open up further
scope for future work with each additional meta input, such as network
topology, exact TL, appliance or sub-circuit data and, most
importantly, a real or practical risk dataset. This study strongly
advocates the use of practical risk data instead of synthetic data
because it is nearly impossible to imitate real-world theft exactly
using a synthetic dataset, even with an advanced GAN algorithm. This
study recommends the following ideas, which may be effective in the
reduction of AT&C losses:

• Frequent estimation of meter efficiency and collection efficiency.
• Statistical evaluation of metered data, which can help the utility
to better plan operation and stay up to date with the high-risk
consumer list.
• Geographic (sector-wise) monitoring of the electrical network, which
can improve the efficiency of any cognitive or decision-making
algorithm.
• Integration of AMI with the billing block, which can prove decisive
in identifying any type of cyber or data attack.
• Spreading consumer awareness of the punishment for participating in
a theft and of the incentive scheme for reporting one.
• Prioritising the deployment of theft detection.
• Planning regular on-site scrutiny for any malicious activity.

Despite the interesting lessons learnt from the proposed methodology,
we are working on upgrading the feasibility and implementation aspects
of the proposed solution. The major aspects that we are focusing on
are consumer privacy concerns and generalisation of results to
publicly available datasets. Furthermore, we expect the results to
improve with the state-of-the-art hyper-parameter tuning techniques
evolving in deep learning. Moreover, the proposed methodology cannot
distinguish between the various types of theft, such as data theft,
physical theft or cyber theft. We expect that the proposed methodology
can be modified to use available metadata to achieve this further
classification of thefts.

5 Conclusions

This study presents a theft and anomaly detection approach based on a
distribution transformer and real consumers' smart meter dataset. It
tries to mitigate privacy concerns by using low sampling rates and
avoiding any type of metadata other than consumer power patterns.
Moreover, the methodology presented avoids any communication layer and
is a one-point computation, which makes it robust to any kind of
communication-related privacy concern. It suggests a unique data
pre-processing step that consists of outlier identification, RP, and
upper–lower bounds extraction. The hierarchical clustering and
decision-tree-based approach exhibits an accurate classification of
abnormalities into several classes that represent theft severity,
theft pattern, and non-malicious abnormalities. Moreover, the
detection algorithm is scrutinised using GAN-generated synthetic data
to analyse the robustness of the algorithm. This study proposes a
forecasting-based validation block, which ensures leniency towards
non-malicious activity and a strict focus on malicious activity.
Subsequently, data post-processing makes the algorithm adaptive
towards non-malicious changes in the consumer profile. The algorithm
performs comparably with most of the popular theft detection
techniques and provides novel features that may prove vital for the
utility in the implementation and planning of demand response and
demand side management. The accuracy of the detection can be further
improved using real theft data for training and metadata for accurate
computation of aggregate technical and commercial losses.

6 Acknowledgments

This work was supported by the Department of Science and Technology
(DST), Government of India under grant no. DST/INT/UK/P-138/2016. We
gratefully acknowledge Gujarat Urja Vikas Nigam Ltd (GUVNL) for
providing the consumers' energy meter dataset.

7 References

[1] ‘Weo-2017 special report’. International Energy Agency, 2017. Available at https://www.gogla.org/sites/default/files/resource_do /weo2017specialreport_energyaccessoutlook.pdf
[2] Zhang, F.: ‘In the dark: how much do power sector distortions cost south Asia?’ (South Asia Development Forum, Washington DC, 2018)
[3] Golden, M., Min, B.: ‘Theft and loss of electricity in an Indian state’, International Growth Centre, London School of Economics and Political Science, 2012
[4] ‘Power theft of Rs 201 crore caught last year’. Daily News & Analysis, 2013. Available at https://www.dnaindia.com/ahmedabad/report-power-theft-of-rs-201-crore-caught-last-year-1837071
[5] McLaughlin, S., Holbert, B., Fawaz, A., et al.: ‘A multi-sensor energy theft detection framework for advanced metering infrastructures’, IEEE J. Sel. Areas Commun., 2013, 31, (7), pp. 1319–1330
[6] Ahmad, T.: ‘Non-technical loss analysis and prevention using smart meters’, Renew. Sustain. Energy Rev., 2017, 72, pp. 573–589
[7] Jiang, R., Lu, R., Wang, Y., et al.: ‘Energy-theft detection issues for advanced metering infrastructure in smart grid’, Tsinghua Sci. Technol., 2014, 19, (2), pp. 105–120
[8] Lo, C.H., Ansari, N.: ‘CONSUMER: a novel hybrid intrusion detection system for distribution networks in smart grid’, IEEE Trans. Emerging Top. Comput., 2013, 1, (1), pp. 33–44
[9] Yerra, R.V.P., Bharathi, A.K., Rajalakshmi, P., et al.: ‘WSN based power monitoring in smart grids’. 2011 Seventh Int. Conf. on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), Adelaide, Australia, 2011, pp. 401–406
[10] Khoo, B., Cheng, Y.: ‘Using RFID for anti-theft in a Chinese electrical supply company: a cost-benefit analysis’. Wireless Telecommunications Symp. (WTS), New York City, USA, 2011, pp. 1–6
[11] Xiao, Z., Xiao, Y., Du, D.C.: ‘Non-repudiation in neighborhood area networks for smart grid’, IEEE Commun. Mag., 2013, 51, (1), pp. 18–26
[12] Amin, S., Schwartz, G.A., Tembine, H.: ‘Incentives and security in electricity distribution networks’. Int. Conf. on Decision and Game Theory for Security, Berlin, Germany, 2012, pp. 264–280
[13] Cárdenas, A.A., Amin, S., Schwartz, G., et al.: ‘A game theory model for electricity theft detection and privacy-aware control in AMI systems’. 2012 50th Annual Allerton Conf. on Communication, Control, and Computing (Allerton), Monticello, USA, 2012, pp. 1830–1837
[14] Amin, S., Schwartz, G.A., Cardenas, A.A., et al.: ‘Game-theoretic models of electricity theft detection in smart utility networks: providing new capabilities with advanced metering infrastructure’, IEEE Control Syst., 2015, 35, (1), pp. 66–81
[15] Depuru, S.S.S.R., Wang, L., Devabhaktuni, V., et al.: ‘A hybrid neural network model and encoding technique for enhanced classification of energy consumption data’. 2011 IEEE Power and Energy Society General Meeting, San Diego, USA, 2011, pp. 1–8
[16] Nagi, J., Yap, K.S., Tiong, S.K., et al.: ‘Nontechnical loss detection for metered customers in power utility using support vector machines’, IEEE Trans. Power Deliv., 2010, 25, (2), pp. 1162–1171
[17] Nagi, J., Yap, K.S., Tiong, S.K., et al.: ‘Improving SVM-based nontechnical loss detection in power utility using the fuzzy inference system’, IEEE Trans. Power Deliv., 2011, 26, (2), pp. 1284–1285
[18] Depuru, S.S.S.R., Wang, L., Devabhaktuni, V.: ‘Support vector machine based data classification for detection of electricity theft’. 2011 IEEE/PES Power Systems Conf. and Exposition (PSCE), Phoenix, USA, 2011, pp. 1–8
[19] Nizar, A., Dong, Z., Wang, Y.: ‘Power utility nontechnical loss analysis with extreme learning machine method’, IEEE Trans. Power Syst., 2008, 23, (3), pp. 946–955
[20] Nizar, A., Dong, Z.: ‘Identification and detection of electricity customer behaviour irregularities’. IEEE/PES Power Systems Conf. and Exposition, 2009. PSCE'09, Seattle, USA, 2009, pp. 1–10
[21] Salinas, S., Li, M., Li, P.: ‘Privacy-preserving energy theft detection in smart grids: A p2p computing approach’, IEEE J. Sel. Areas Commun., 2013, 31, (9), pp. 257–267
[22] Ruzzelli, A.G., Nicolas, C., Schoofs, A., et al.: ‘Real-time recognition and profiling of appliances through a single electricity sensor’. 2010 7th Annual IEEE Communications Society Conf. on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), Boston, USA, 2010, pp. 1–9
[23] ‘Privacy and the new energy infrastructure’, 2009. Available at http://ssrn.com/paper=1370731
[24] Li, F., Luo, B., Liu, P.: ‘Secure information aggregation for smart grids using homomorphic encryption’. 2010 First IEEE Int. Conf. on Smart Grid Communications, Gaithersburg, USA, 2010, pp. 327–332
[25] Choksi, K.A., Jain, S.K.: ‘Novel computational-index as a representative feature for non-intrusive load monitoring’. 2018 IEEE Int. Conf. on Information Communication and Signal Processing (ICICSP), Singapore, 2018, pp. 44–48
[26] Choksi, K.A., Jain, S.K.: ‘Pattern matrix and decision tree based technique for non-intrusive monitoring of home appliances’. 2017 7th Int. Conf. on Power Systems (ICPS), Pune, India, 2017, pp. 824–829
[27] Jindal, A., Dua, A., Kaur, K., et al.: ‘Decision tree and SVM-based data analytics for theft detection in smart grid’, IEEE Trans. Ind. Inf., 2016, 12, (3), pp. 1005–1016
[28] Jokar, P., Arianpoo, N., Leung, V.C., et al.: ‘Electricity theft detection in AMI using customers’ consumption patterns’, IEEE Trans. Smart Grid, 2016, 7, (1), pp. 216–226
[29] Zheng, K., Chen, Q., Wang, Y., et al.: ‘A novel combined data-driven approach for electricity theft detection’, IEEE Trans. Ind. Inf., 2018, 15, (3), pp. 1809–1819
[30] Júnior, L.A.P., Ramos, C.C.O., Rodrigues, D., et al.: ‘Unsupervised non-technical losses identification through optimum-path forest’, Electr. Power Syst. Res., 2016, 140, pp. 413–423
[31] Angelos, E.W.S., Saavedra, O.R., Cortés, O.A.C., et al.: ‘Detection and identification of abnormalities in customer consumptions in power distribution systems’, IEEE Trans. Power Deliv., 2011, 26, (4), pp. 2436–2442
[32] Hu, T., Guo, Q., Shen, X., et al.: ‘Utilizing unlabeled data to detect electricity fraud in AMI: a semisupervised deep learning approach’, IEEE Trans. Neural Netw. Learn. Syst., 2019, to appear
[33] Hodge, V., Austin, J.: ‘A survey of outlier detection methodologies’, Artif. Intell. Rev., 2004, 22, (2), pp. 85–126
[34] Brockwell, P.J., Davis, R.A., Calder, M.V.: ‘Introduction to time series and forecasting’, vol. 2, (Springer, New York City, 2002)
[35] Savitzky, A., Golay, M.J.: ‘Smoothing and differentiation of data by simplified least squares procedures’, Anal. Chem., 1964, 36, (8), pp. 1627–1639
[36] Tibshirani, R., Walther, G., Hastie, T.: ‘Estimating the number of clusters in a data set via the gap statistic’, J. R. Statist. Soc. B, Statist. Methodol., 2001, 63, (2), pp. 411–423
[37] Szekely, G.J., Rizzo, M.L.: ‘Hierarchical clustering via joint between-within distances: extending ward's minimum variance method’, J. Classif., 2005, 22, (2), pp. 151–183
[38] Nikovski, D.N., Wang, Z.: ‘Method for detecting power theft in a power distribution system’. US Patent 9,945,889, Google Patents, 2018
[39] Méffe, C.O.N.K.A., Cavaretti, S.J.S.C.J.: ‘A new method for the computation of technical losses in electrical power distribution systems’. Electricity Distribution, Amsterdam, The Netherlands, 2001
[40] Meffe, A., de Oliveira, C.C.B.: ‘Technical loss calculation by distribution system segment with corrections from measurements’. Proc. IET Conf. and Exhibition Electrical Distribution, Prague, Czech Republic, 2009, pp. 1–4
[41] Rao, P.N., Deekshit, R.: ‘Energy loss estimation in distribution feeders’, IEEE Trans. Power Deliv., 2006, 21, (3), pp. 1092–1100
[42] Mirza, M., Osindero, S.: ‘Conditional generative adversarial nets’, arXiv preprint arXiv:14111784, 2014
[43] Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al.: ‘Generative adversarial nets’. Advances in Neural Information Processing Systems, Montreal, Canada, 2014, pp. 2672–2680