Fight For Code 2

This paper proposes a new technique for electricity theft detection that uses adaptive synthesis to handle imbalanced data and a deep siamese network combining CNN and LSTM to extract and analyze features from consumer electricity consumption profiles. The method aims to improve on existing approaches that have low detection rates due to class imbalance or use of inappropriate classifiers.


Journal of Parallel and Distributed Computing 153 (2021) 44–52

Contents lists available at ScienceDirect

J. Parallel Distrib. Comput.


journal homepage: www.elsevier.com/locate/jpdc

An adaptive synthesis to handle imbalanced big data with deep siamese network for electricity theft detection in smart grids

Nadeem Javaid ∗, Naeem Jan, Muhammad Umar Javed
Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan

Article info

Article history:
Received 29 March 2020
Received in revised form 24 October 2020
Accepted 7 March 2021
Available online 23 March 2021

Keywords:
Big data analytics
Adaptive synthesis
Electricity theft detection
Deep learning
Deep Siamese network

Abstract

The bi-directional flow of energy and information in the smart grid makes it possible to record and analyze the electricity consumption profiles of consumers. Because of the increasing rate of inflation over the past few years, people started looking for means to use electricity illegally, which is termed electricity theft. Many data analytics techniques are proposed in the literature for electricity theft detection (ETD). These techniques help in the detection of suspected illegal consumers. However, the existing approaches have a low ETD rate, either due to improper handling of the imbalanced class problem in a dataset or due to the selection of an inappropriate classifier. In this paper, a robust big data analytics technique is proposed to resolve the aforementioned concerns. Firstly, adaptive synthesis (ADASYN) is applied to handle the imbalanced class problem of data. Secondly, a deep siamese network (DSN) that integrates a convolutional neural network (CNN) and long short-term memory (LSTM) is proposed to discriminate the features of both honest and fraudulent consumers. Specifically, the task of feature extraction from weekly energy consumption profiles is handed over to the CNN module while the LSTM module performs the sequence learning. Finally, the DSN evaluates the shared features provided by the CNN-LSTM and makes the final judgment. The data analytics is performed on different train–test ratios of the real-time smart meters' data. The simulation results validate the proposed model's effectiveness in terms of high area under the curve, F1-score, precision and recall.
© 2021 Elsevier Inc. All rights reserved.

1. Introduction

In line with the United Nations' 2030 vision, "electricity for all" is a major objective of all countries.¹ Both developed and developing countries are striving to add the maximum amount of electricity to the national grid. While the power authorities struggle to ensure efficient power distribution to every household, energy theft has become a hurdle in this endeavor. According to a report, a loss of approximately 100 million Canadian dollars per year is revealed due to electricity theft, which is equal to the amount of electricity required to power around 77000 homes for a year [1]. The yearly loss in revenue caused by the electricity theft in America is 6 U.S. dollars. Similarly, the percentage of electricity loss caused due to theft is 0.5% to 25% in Brazil, 3.5% in Philippines and up to 1% in United Kingdom. Each year, the revenue loss due to electricity theft reaches approximately 96 billion U.S. dollars [2] worldwide.

With the advancements made in information and communication technology, the traditional power grids are now able to grasp the benefits of bi-directional communication and are known as smart grids (SGs) [3,4]. The roll-out of advanced metering infrastructure (AMI) in the SG makes it possible to provide real time and fine-tuned measurements to the utilities. The addition of a communication layer to traditional metering establishes a bridge between consumers and utility [5]. Although the AMI provides numerous benefits, the power systems became more exposed to cyber attacks due to the addition of this extra layer [6]. In contrast, the traditional meters are only vulnerable to physical tampering. In this paper, the fraud committed by either utilities or feeders is beyond the scope, and the focus is on detecting irregularities in the electricity consumption of consumers.

In a SG, the transmission and distribution of power include both technical losses (TLs) and non-technical losses (NTLs). The former include dissipation of energy due to the Joule effect, which in fact is caused by the emission of electrons due to heat. The assessment of TLs is necessary for accounting NTLs. Electricity theft is an intended act of illegal usage of electricity, which is a major source of NTLs. These losses represent the energy that is consumed by the consumers, but not billed. These are also known as commercial losses or electricity theft. The main issue concerning NTLs is that they cannot be detected precisely. Only the difference between the dispatched amount of energy from utilities and the bill paid for the consumed energy is calculated.

∗ Corresponding author.
E-mail address: [email protected] (N. Javaid).
URL: https://www.njavaid.com (N. Javaid).
¹ http://www.worldenergyoutlook.org

https://doi.org/10.1016/j.jpdc.2021.03.002
0743-7315/© 2021 Elsevier Inc. All rights reserved.
N. Javaid, N. Jan and M.U. Javed Journal of Parallel and Distributed Computing 153 (2021) 44–52

The reason behind this irregularity is either the illegal use of electricity or the occurrence of technical faults [7]. This irregularity falls under one of two groups: internal fraud and external fraud. The former is committed by the employees for achieving financial benefits while the latter is perpetrated by the consumers for reducing electricity bills. Ultimately, the main goal behind this irregularity is to conceal the actual electricity consumption and consequently achieve financial benefits [8].

The vulnerabilities related to NTLs are generally categorized into three classes: physical attacks, cyber attacks and data attacks. The physical attacks include meter tampering, reverse metering, bypassing the meters by direct supply, double-tapping, washing out meter display, using bogus meters, encountering loops in terminal blocks and deploying tilted meters [8]. In developing countries, the most frequently committed electricity frauds are reverse metering and direct supply [7]. The cyber attacks are launched remotely by intercepting the communication line and altering actual readings with malign readings. Data attacks, in turn, are a fusion of both physical attacks and cyber attacks. The motive behind a data attack is to specifically target the recorded measurements of electricity and adulterate them by fake data injection [8].

In the past, the primary means of detecting power theft were on-site inspections and manual analytics of electricity consumption records. However, these approaches are time consuming and result in a low success rate. Recently, the emergence of information technology and advancements in machine learning resulted in more robust solutions. Generally, the solutions to handle NTLs can be grouped into three categories: hardware based, non-hardware based (data-driven) and hybrid of both. Hardware based solutions involve the deployment of devices, i.e., sensors, at different locations and they mainly deal with the design and architecture of the smart meters [9] to achieve a high ETD rate. However, they have high operational and maintenance cost because of the specialized hardware. In contrast, non-hardware based solutions hold high potential due to the low operational and maintenance cost. These solutions detect the fraud through machine learning algorithms and classifiers. They can further be categorized into state based, game theory based and artificial intelligence (AI) based methods.

The state based methods estimate the aggregated NTLs by calculating the TLs of a specific area. These methods calculate the difference between the amount of energy consumed and the corresponding invoiced energy. Moreover, different measurements are estimated, such as deviations in voltage, power, etc., for detecting NTLs, which results in high precision and low cost [10]. However, the state based methods only provide the aggregated NTLs and fail in providing the specific source of the loss. Unlike the state estimation based methods, in the game-theoretic method [11], there is a contest between the utility and the aberrant consumer. The aim of the fraudster consumer is to outmatch the utility. However, the game-theoretic methods highly rely on strong estimation for theft characterization. On the contrary, the AI based methods mainly focus on the patterns of electricity consumption, which are analyzed through machine learning algorithms. Classification and clustering methods require labeled and unlabeled data, respectively, in order to fetch the aberrant consumers from the pool of massive electricity consumption profiles [12].

Detecting anomalous patterns from electricity consumption profiles is a challenging task in the presence of the imbalanced class distribution problem in data. In a real world scenario, the number of fair electricity consumers is significantly larger than that of thieves, which creates an issue of imbalanced distribution in the dataset. Therefore, it may be considered a special type of anomaly detection. In AI based methods, classifiers mostly result in a low ETD rate, mainly due to the underrepresentation of the minority class [13].

The research work in [1,14] shows that analyzing the electricity consumption patterns of consumers is beneficial in detecting the suspicious consumers. However, after going through the existing literature on the topic of ETD [6]-[11], it is concluded that ETD has the following limitations:

• the models which are applied for ETD do not take care of proper class balancing,
• in many cases, the attachment of special devices is required,
• in highly dynamic time series analyses, methods such as support vector machine (SVM), random forest (RF), logistic regression (LR), etc., have low ETD rate and high false positive (F+) rate,
• the deep learning approaches do not discriminate the decisive features appropriately and
• in sequential time series data, the convolution neural network (CNN) and multi-layer perceptron (MLP) do not perform well. Moreover, CNN fails to provide the exact source of NTL.

In this paper, a robust big data analytics method for electricity theft detection (ETD) in the SG is proposed to better discriminate the fair and fraudulent consumers on the basis of electricity consumption data. The main contributions of this study are as under:

• according to the nature of problem, an enhanced strategy for data preprocessing is adopted,
• to avoid overfitting and to handle class imbalance issue, adaptive synthesis (ADASYN) method is used,
• CNN and long short term memory (LSTM) are integrated in a deep siamese network (DSN) in order to learn the key features and to achieve high ETD rate and
• the performance metrics such as mean average precision (mAP) and area under the curve (AUC) are used to better comprehend the results.

The rest of the paper is organized as follows. The review on various existing electricity theft strategies is given in Section 2. The problem analysis and solutions to the problems are described in Section 3 and Section 4, respectively. The simulation results are discussed in Section 5. Finally, the paper is concluded in Section 6.

2. Related work

Review on the state of the art ETD solutions is generally categorized into two groups: hardware based solutions and non-hardware based solutions. A comprehensive review on system level and data level threats of AMI can be studied in [15,16].

2.1. Hardware based solutions

In hardware based solutions, deployment of special purpose hardware and modification to the physical architecture are performed to strengthen the system against vulnerabilities. An identity based key establishment model is proposed in [9] in order to avoid relying on pairing. The proposed model is based on elliptic curve cryptography (ECC), which enhances the performance along with the mitigation of computational overhead. Using Chebyshev polynomial to access the security features of smart meter, a power-authenticated key exchange protocol is proposed in [17]. To address the ephemeral security problem, an authentication scheme based on ECC is proposed in [18], which aims to mitigate the communication and computational complexity. Although the hardware based solutions give acceptable results, concentration is still focused on data-driven approaches for NTL detection due to the following reasons [19]:

• high deployment and maintenance cost due to specialized metering hardware,
• negative benefit–cost ratio (BCR), i.e., the cost outweighs the benefits,
• failure in detecting specific source of NTL and
• vulnerability of specialized meter hardware in extreme weather conditions.

2.2. Non-hardware based solutions

In contrast to the hardware based solutions, the data-driven approach is advancing more rapidly in detecting NTLs. In [2], a two-fold machine learning technique is adopted to minimize the ratio of misclassified instances. In the first step, the maximum information coefficient (MIC) determines the correlation between the suspicions and the consumption profiles. In the second step, clustering is performed to find the density peaks. Similarly in [20], clustering is used to extract a prototype from consumption patterns. The unseen data samples are categorized by a distance-measurer; the instance with significant distance is considered as malign. In contrast, the works in [8,21] use a supervised learning approach to handle ETD through relative entropy and gradient boosting classifiers (GBCs). A hybrid of MLP and LSTM is adopted to detect NTL in AMI [22]. In order to find the suspicions' rank, fuzzy logic is applied in [23]. A framework for feature engineering with combination of both genetic algorithm (GA) and finite mixture model (FMM) is implemented in [24]. For final judgment in NTL detection, gradient boosting machine (GBM) is applied. GA is an efficient heuristic algorithm; however, it fails in providing the global optima. A similar approach is proposed in [25], which uses black hole algorithm (BHA) for feature extraction. Although BHA extracts the optimal features from time series data, the performance of the model is still inefficient in terms of F+ rate.

3. Problem analysis

By analyzing the consumption patterns of electricity consumers, it becomes evident that the fraudsters and the fair consumers can be differentiated by their consumption profiles. Therefore, experiments are performed on the consumed electricity data, as inspired from [1], in order to validate the problem. Fig. 1(a) shows the electricity consumption of benign consumers during October 2016. By visualizing the results, it is difficult to analyze the key characteristics from the sequential or one-dimensional (1-D) load profile. However, by choosing the weekly load profile, it can be seen that the consumption of a fair consumer shows symmetric behavior, as depicted in Fig. 1(b). In our scenario, the weekly consumption profile of consumers is preferred over daily consumption for CNN, because the behaviors of consumers are weekly periodic. As shown in Fig. 1(b), a strong relation exists between the weekly consumed energy, which shows the peak consumption on the 3rd day while the lowest consumption is recorded on the 6th day of each week. The exception is found on the 5th day of the 4th week. The reason behind this deviation is the intermittent nature of a fair consumer. Therefore, it is deduced from Fig. 1(b) that the consumption profiles of the benign consumers follow a periodic pattern. Similarly, the daily and weekly time series of the fraudster consumer are exhibited in Fig. 2(a) and Fig. 2(b), respectively, which show a non-periodic behavior at each time interval. In contrast to Fig. 1(b), an abrupt and highest peak is observed on the 3rd day of the 1st week, as shown in Fig. 2(b), which validates the problem.

After analyzing the time series data of both fair and fraudster consumers, it is observed that the consumption patterns of fair consumers follow a symmetric pattern; in contrast, the suspicions show asymmetric behavior. This assumption leads us to scrutinize and analyze the electricity consumption patterns of consumers, which violate the uniform control limit.

However, it is a challenging and arduous task to capture the dynamic changes in time series due to the following reasons:

(1) due to the imbalanced nature of dataset, the distribution is skewed towards the dominating class and consequently, the classifiers do not discriminate the decision boundary. Hence, the classifier tends to overfit [1],

(2) the energy consumption data mostly consists of missing values and outliers. The smoothing spline can detect the outliers; however, it is difficult to capture the true continuity. The selection of thresholds (knots) and their location are two big challenges. Moreover, by increasing the degree from a certain threshold, the chances of misclassification increase. Hence, the suspicious consumers can be misclassified. As shown in Fig. 2(a), the consumption of a fraudster consumer shows unusual activity, which is normalized by the smoothing spline [13,22],

(3) extracting decisive features from a highly dynamic sequential time series is significant, which traditional CNN lacks [1],

(4) in literature, most of the datasets referred to electricity theft are unlabeled. The synthetic attacks are launched, which do not show the true relation between consumed energy [21],

(5) the selection of suitable performance metrics is of great importance in ETD. The most widely used performance measure, i.e., accuracy, is an inadequate measure in terms of fraud detection, because the cases of theft are rare as compared to the opposite class. The classifier shows higher accuracy, even though the theft cases are misclassified, which negates the true relation between weekly consumed energy [25]. Similarly, low ETD rate, minimum AUC and high F+ rate are observed in [7].

4. Our approach

The proposed ETD technique consists of two steps. In the first step, the preprocessing is done, in which the issues of missing values, data standardization and handling the imbalanced class are resolved. In the second step, a three-fold operation is performed, which involves decisive feature extraction, analysis of sequential time series and the application of a classifier. The details are provided in the following subsections.

4.1. Data preprocessing

The preliminary analysis of data is a mandatory step in highly dynamic time series analysis, which includes imputation, outlier detection, data standardization, handling imbalanced data, etc.

4.1.1. Handling missing values and data standardization

The electricity consumption records of consumers contain either incomplete information or missing values. The reasons behind this issue may be the failure of hardware or corruption of data. In case of high time series data, the missing values cannot be dropped. Instead, imputation is performed synthetically in order to fill these values. In most cases, the filling of missing values is performed through averaging. In this paper, the missing values are recovered through the interpolation method [1], as under:

$$ f(z_i) = \begin{cases} \dfrac{z_{i-1} + z_{i+1}}{2}, & \text{if } z_i \in \mathrm{NaN},\ z_{i-1}, z_{i+1} \notin \mathrm{NaN}, \\ z_i, & \text{otherwise,} \end{cases} \tag{1} $$

where $z_i$ is the recorded or missed (null) observation in the dataset. The null value is represented as NaN. If $z_i$ is null, then it is filled according to Eq. (1).
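Eq. (1) can be illustrated with a short Python sketch. This is only a minimal reading of the formula, not the authors' implementation: the helper name `impute_missing` is hypothetical, neighbours are taken from the original (unfilled) series, and the first and last readings are left untouched because they lack one neighbour, which is an assumption the paper does not spell out.

```python
import math


def impute_missing(z):
    """Fill an isolated NaN with the mean of its two neighbours (Eq. 1).

    Values whose neighbours are also NaN fall under the "otherwise"
    branch of Eq. (1) and are returned unchanged.
    """
    filled = list(z)
    for i in range(1, len(z) - 1):
        prev_val, next_val = z[i - 1], z[i + 1]
        if (math.isnan(z[i])
                and not math.isnan(prev_val)
                and not math.isnan(next_val)):
            filled[i] = (prev_val + next_val) / 2
    return filled


# Hypothetical daily readings with one isolated gap and one two-day gap.
readings = [1.0, float("nan"), 3.0, float("nan"), float("nan"), 6.0]
imputed = impute_missing(readings)
print(imputed)  # the isolated gap becomes 2.0; the two-day gap stays NaN
```

Under this reading, consecutive missing values survive the imputation step, so a later stage would still have to decide how to treat them.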

Fig. 1. Electricity consumption pattern of an honest consumer. (a) Date-wise electricity consumption. (b) Weekly electricity consumption.

Fig. 2. Electricity consumption pattern of a fraudulent consumer. (a) Date-wise electricity consumption. (b) Weekly electricity consumption.
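The weekly views in panels (b) of Figs. 1 and 2 correspond to arranging the daily series as a matrix with one row per week. A minimal Python sketch follows, assuming the series starts on the first day of a week and that a trailing partial week is simply dropped; the paper does not state how either case is handled, and `to_weekly_matrix` is a hypothetical helper name.

```python
def to_weekly_matrix(daily, days_per_week=7):
    """Reshape a 1-D daily series into rows of one week each.

    Assumption: readings that do not fill a complete final week
    are discarded rather than padded.
    """
    n_weeks = len(daily) // days_per_week
    return [
        daily[w * days_per_week:(w + 1) * days_per_week]
        for w in range(n_weeks)
    ]


# 30 hypothetical daily readings give 4 full weeks; 2 days are dropped.
daily = list(range(1, 31))
weeks = to_weekly_matrix(daily)
print(len(weeks), weeks[0])
```

Each row can then be compared column by column, which is what makes a recurring peak on one weekday visible.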

Similarly, the data standardization is performed using min-max normalization [1], using Eq. (2):

$$ f(z_i) = \frac{z_i - \min(z)}{\max(z) - \min(z)}, \tag{2} $$

where $\min(z)$ shows the minimum value of $z$ and $\max(z)$ represents the maximum value of $z$ (see Fig. 3).
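Eq. (2) can be sketched in a few lines of Python. The function name `min_max_normalize` is hypothetical, and mapping a constant series to zeros is an assumption made here only to avoid division by zero; Eq. (2) itself is undefined in that case.

```python
def min_max_normalize(z):
    """Scale a series to the range [0, 1] following Eq. (2)."""
    lo, hi = min(z), max(z)
    if hi == lo:
        # Assumption: a constant series carries no variation, so map it to 0.
        return [0.0 for _ in z]
    return [(value - lo) / (hi - lo) for value in z]


print(min_max_normalize([2.0, 4.0, 6.0, 10.0]))  # [0.0, 0.25, 0.5, 1.0]
```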

Fig. 3. System model of the proposed DSN.

4.1.2. Handling imbalanced class distribution

A dataset is considered imbalanced or biased if the sample points of one class (majority class) highly dominate the instances of the other class (minority class). Due to the underrepresentation of the minority class, the distribution is skewed towards the majority class. Consequently, the classifier cannot discriminate the decision boundary. Hence, it becomes unable to learn the key characteristics of the minority class and tends to overfit. The issues related to imbalanced data are not limited to image recognition and semantic segmentation, but apply equally to time series data [26]. The existing remedies for handling imbalanced class issues fall under one of three solutions: cost-sensitive approach, algorithm-level approach and data-level handling approach [27]. In the cost-sensitive approach, the effects of the highly dominating class are reduced in the training stage. The misclassification costs of both the dominating and suppressing classes are taken into account and the weights are assigned accordingly. Hence, the cost-sensitive approach tweaks the minority class towards the dominating class. In the algorithm-level approach, the model is modified and trained in such a way that the scarce instances are favored and over-weighted, so that the disparity produced by the majority class is reduced during the learning stage. Traditionally, the class balancing was achieved by the data-level approach, which includes both undersampling and oversampling techniques. In undersampling, the majority class is sacrificed a lot by down-sizing the actual data because in most cases the right choices are eliminated. Similarly, copying the instances of minority class mostly leads to overfitting, which is a downfall
of oversampling. The right choice for the selection of technique related to handling the imbalanced class issue depends upon the nature of problem.

In this paper, the responsibility of handling imbalanced data is assigned from algorithm-level to data-level. In particular, the oversampling technique is adopted in order to avoid the problem of decisive sample elimination caused by the undersampling technique. Specifically, for oversampling, the ADASYN sampling approach is applied in order to better comprehend the selected points [28]. In contrast to simply duplicating the instances of the minority class, ADASYN selects samples and injects some noise. The noise addition results in better generalization of the model. The reason behind the selection of ADASYN is not only to avoid overfitting, but also to emphasize outliers' detection in the feature space.

4.2. Proposed deep siamese (CNN-LSTM) network architecture for ETD

In the second step of the proposed methodology, identification of the fraudulent consumers is performed via joint integration of CNN-LSTM with DSN. The details are provided in the following subsections:

4.2.1. Features extraction through convolution neural networks

The preliminary data analytics show the periodicity and non-periodicity in electricity consumption of fair and fraudulent consumers. The identification of a fraudster consumer is difficult when analyzing the daily electricity consumption record, since the electricity consumption of each day shows a relatively independent pattern. Therefore, aligning the electricity consumption of several weeks is beneficial for detecting abnormal patterns. The work done in [1] indicates that CNN performs well in such a situation; hence, the daily electricity consumption data is transformed to weekly consumption accordingly. A deep CNN is trained on the weekly electricity consumption profile through multiple stacked convolutional layers, convolution filters, a max-pooling layer and a fully connected layer. Convolution is the element-wise multiplication of weights with corresponding inputs. After convolution, the feature map is obtained by sliding the convolution filter or kernel over the input vector.

4.2.2. Sequence learning through long short term memory

The association of memory to the NN makes it more powerful to handle time series data, which becomes the inherent behavior of recurrent neural network (RNN) [29]. The problems associated with RNN are vanishing and exploding gradients [30]. These issues arise due to the ignorance of long-term and short-term dependencies. Unlike traditional RNN models, LSTM is introduced to overcome the aforementioned limitations [31]. The structure of LSTM is the same as RNN except for the repeating module. Instead of a single NN layer, LSTM has more layers, which demonstrate the better representation of time series data. In fact, LSTM is capable of handling the vanishing gradient problem and of remembering the information for a long period of time, which is practically its default behavior.

In our work, the daily electricity consumption profile is analyzed by LSTM. Moreover, LSTM is also capable of fetching the time window of anomalous time series.

4.2.3. Supervised learning based on deep siamese network

DSN can be applied to the problem, where the aim is to discriminate features on the basis of a similarity measure [32]. Unlike traditional CNN, which has low generalization ability, DSN performs better because of its feature extraction capabilities [32–34]. DSN is a supervised machine learning technique, which operates in two main steps: shared feature extractor and distance measurer or cost estimator. The shared feature extractor is the encoding of features while the cost function estimates the difference between two embedding streams.

4.2.4. Mathematical formulation for CNN-LSTM

The combination of CNN and LSTM is used in the proposed work to discriminate the features of two different types of consumers, i.e., honest and fraudulent. The mathematical formulation of the CNN-LSTM module used in the underlying work is described below.

The two input sequences, i.e., $\psi_i$ and $\psi_j$, are taken in parallel by the CNN-LSTM module, such that both $\psi_i, \psi_j = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, where $x_i$ shows the input features and $y_i \in \{0, 1\}$ is the corresponding target value ($y_i = 0$ implies that the instance belongs to the fair class). The features of both the classes are learned by the CNN-LSTM module and finally the encoding of features is performed [32], using Eqs. (3) and (4):

$$ E_i = \delta\{\omega_n \cdot \delta\{\dots \delta\{\omega_2 \cdot [\delta(\omega_1 \cdot \psi_i + b_1) + b_2] \dots\} + b_n\}\}, \tag{3} $$

$$ E_j = \delta\{\omega_n \cdot \delta\{\dots \delta\{\omega_2 \cdot [\delta(\omega_1 \cdot \psi_j + b_1) + b_2] \dots\} + b_n\}\}, \tag{4} $$

where $\delta(\cdot)$, $\omega_n$ and $b$ show the sigmoid function, weights and biases, respectively. Thereafter, the shared features are fed to a loss function, which discriminates the features on the basis of a similarity measure. Therefore, a classification loss such as binary cross entropy is not viable. Instead, a contrastive loss function is used, as in [32], to better comprehend the features, given in Eq. (5):

$$ \mathrm{Loss}^{DSN}_{i,j} = d_{i,j} \cdot \max[0, (1 - \hat{d}_{i,j})] + (1 - d_{i,j}) \cdot \hat{d}_{i,j}, \tag{5} $$

where $\hat{d}_{i,j}$ is the Euclidean distance, which is calculated for the features' output accordingly, i.e., $\hat{d}_{i,j} = \lVert E_i - E_j \rVert_2$. Similarly, $d_{i,j}$ shows the actual distance, given in Eq. (6):

$$ d_{i,j} = \begin{cases} 1, & \text{if } y_i \neq \hat{y}_j, \\ 0, & \text{otherwise.} \end{cases} \tag{6} $$

The objective of training DSN is to minimize the variance between $d_{i,j}$ and $\hat{d}_{i,j}$.

5. Simulation results

In this section, the simulations are performed in order to compare the performance of the proposed model with the benchmark schemes.

5.1. Simulation setup

5.1.1. Dataset acquisition

The dataset is acquired from the largest power providing company in China, i.e., SGCC,² and is publicly available. The daily consumption record is available for 1035 days, i.e., from January 1, 2014 to October 31, 2016. The ground truth of the dataset states that 9% of the total consumers are declared as electricity thieves, which demonstrates a high ratio.

² http://www.sgcc.com.cn/

5.1.2. Performance metrics

In order to detect NTL from the pool of electricity consumption profiles, the performance metrics such as true positives (T+) and true negatives (T−) show the correctly classified instances. In contrast, false negatives (F−) and F+ reflect an opposite scenario, where F− shows the number of fraud consumers, which are misclassified as fair and vice versa. The objective behind the accurate detection of NTL is to reduce the F+, which consequently
maximizes T+. Other performance metrics related to classification are recall, precision, specificity, F1-score, accuracy, mAP, and AUC of receiver operating characteristics (ROC) curve, given by Eqs. (7)–(11), taken from [13].

$$ \mathrm{Recall} = \frac{T^{+}}{T^{+} + F^{-}}, \tag{7} $$

$$ \mathrm{Precision} = \frac{T^{+}}{T^{+} + F^{+}}, \tag{8} $$

$$ \mathrm{Specificity} = \frac{T^{-}}{T^{-} + F^{+}}, \tag{9} $$

$$ F_1\text{-score} = 2 * \frac{\mathrm{Precision} * \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \tag{10} $$

$$ \mathrm{Accuracy} = \frac{T^{+} + T^{-}}{T^{+} + T^{-} + F^{+} + F^{-}}. \tag{11} $$

Table 1
Significance of handling imbalanced class distribution.

Performance metrics   Without handling imbalance class   Undersampling (RUS)   Oversampling (ADASYN)
mAP                   0.5952                             0.5997                0.8988
AUC                   0.6270                             0.6520                0.9250
F1-Score              0.6467                             0.5500                0.9249
Accuracy              0.7065                             0.6519                0.9241
Precision             0.7524                             0.6500                0.9153
Recall                0.3771                             0.6300                0.9347

Table 2
SVM hyperparameters' selection.

Hyperparameter   Values range                    Optimal value
C                0.001, 0.01, 0.1, 1, 10, 100    0.001
Kernel           Linear, RBF                     RBF
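Eqs. (7)–(11) all derive from the four confusion-matrix counts, which a short Python sketch makes concrete. The function name `classification_metrics` and the counts in the usage example are hypothetical, chosen only to show how accuracy can look healthy while recall is poor on a skewed dataset; the sketch assumes every denominator is non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Eqs. (7)-(11) from confusion-matrix counts.

    tp, tn, fp, fn stand for T+, T-, F+ and F- in the text.
    Assumes each denominator is non-zero.
    """
    recall = tp / (tp + fn)                                # Eq. (7)
    precision = tp / (tp + fp)                             # Eq. (8)
    specificity = tn / (tn + fp)                           # Eq. (9)
    f1 = 2 * precision * recall / (precision + recall)     # Eq. (10)
    accuracy = (tp + tn) / (tp + tn + fp + fn)             # Eq. (11)
    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical example: 1000 consumers, 100 of them thieves,
# and a classifier that catches only 30 of the thieves.
metrics = classification_metrics(tp=30, tn=880, fp=20, fn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is about 0.91 although recall is only about 0.30, which is the kind of mismatch that motivates reporting several metrics together.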
Although accuracy and recall are widely used in the literature as performance metrics, they are inadequate in case of imbalanced class distribution, as shown in Table 1. Similarly, precision, specificity and F1-score do not show accurate results and are not reliable when used individually.

In order to detect NTL without the loss of information, selection of reliable performance metrics is required [13]. The performance metrics such as mAP and AUC are applied in this work to better comprehend the imbalanced data. As mentioned in [1,2], ROC curve and mAP are the best performance metrics used for detecting suspects in imbalanced class distribution.

The ROC curve is the graphical representation of T+ rate and F+ rate. It is used to evaluate the performance of a classifier. The area under the ROC curve is called AUC, which separates the distribution of fraudulent class from fair class, as given in Eq. (12). The limits of ROC curve range from 0 to 1. The ideal situation arises when no curve overlaps each other. AUC approaching 1 demonstrates the validity of classifier while AUC less than 0.5 shows that the classifier does not have the ability to discriminate the classes [1,2]. AUC is calculated using Eq. (12), taken from [1].

$$ \mathrm{AUC} = \frac{\sum_{i \in S} R_i - \frac{1}{2}|S|(|S| + 1)}{|S| * |H|}, \tag{12} $$

where $R_i$ denotes the rank of suspicion degree of fraudulent consumers in ascending order while $|S|$ and $|H|$ are the cardinality of suspicious and honest consumers, respectively.

the performance is worst without handling imbalanced class issue, especially in terms of recall, where a lot of fraud instances are misclassified as fair. Similarly, the performance of ADASYN is better than that of random undersampling (RUS). The low performance of RUS is due to the elimination of decisive features.

The consequences of using accuracy as a performance metric are that it results in low T+ rate and high F+ rate. Table 1 also shows that even though the accuracy is higher, AUC and mAP are still minimized. Therefore, it is deduced that accuracy does not guarantee accurate classification of the instances in skewed distributions.

5.3. Comparative analysis

The proposed model is compared with the baseline methods for validation purpose. The baseline methods used for comparison are discussed below.

5.3.1. Support vector machine

SVM is an elegant technique used for both classification and regression tasks. It discriminates the boundary of different classes by a hyperplane. The construction of the hyperplane is entirely dependent upon the selection of support vectors. Table 2 shows the optimized hyperparameters obtained through grid search. The regularization parameter C is selected to be 0.001 with radial basis function (RBF) as a kernel.
The second performance metric used in this paper is mAP. It is 5.3.2. Logistic regression
defined as the mean of all average precisions. It is used for useful LR is a simple and an elegant technique used for binary clas-
information retrieval, when the performance metrics discussed in sification. In it, both classes are separated by a hyperplane h =
Eqs. (7)–(11) fail. (w, b), where w and b show the norm and intercept of the hyper-
Let yk show the number of fraudulent consumers and k de- plane, respectively. The finding of optimal w and b implies that
notes the top rank fraudulent consumers, such that the precision the hyperplane can accurately separate the decision boundary of
y
is defined as P@k = kk . The calculations performed by mAP for both the classes. The operations performed by LR are same as NN
information retrieval are given in Eq. (13), taken from [1]. for training input features using trained weight metrics.
wT .x
A distance metric for each observation such as di = ∥w∥i is
∑r
i=1 P@ki
mAP@N = (13) used to find the margin with hyperplane, where w and x denote
r
the weights and corresponding input metrics. ∥w∥ is the norm
where, r shows the number of suspicious consumers of top to the hyperplane and is assumed as a unit vector. The weights
ranked theft labels N. The value of N is 100 in our scenario. accompanied with input feature metrics are passed through a
sigmoid function f (d) = 1+1ed . The library: LIBLINEAR solver is
5.2. Effect of imbalanced distribution on performance metrics used to train the classifier and find the optimal weights while
using the logarithmic loss function [35]. During grid-search, the
In imbalanced class problem, one class significantly dominates hyperparameters of LR are obtained, as given in Table 3, where
the other class, which results in the suppression of the minority C is the hyperparameter used to handle the overfitting and R
class. The effect of least important and significant performance shows the type of regularization. The best hyperparameters are
metrics can be seen in Table 1. achieved when C = 0.01 and L2 norm is selected as the type
Table 1 shows the comparative analysis of DSN (with and of regularization. The careful selection of these parameters is
without) handling imbalanced class distribution. It is seen that essential for the performance of the forecasting model [36]
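The scoring step of LR described in Section 5.3.2 can be sketched as follows. This is a minimal illustration and not the LIBLINEAR implementation: the helper names are hypothetical, the intercept b is included in the distance as an assumption, and the standard logistic form 1/(1 + e^{-d}) is assumed for the sigmoid.

```python
import math


def signed_distance(w, x, b=0.0):
    """Signed distance of observation x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    if norm == 0.0:
        raise ValueError("weight vector must be non-zero")
    return (sum(wj * xj for wj, xj in zip(w, x)) + b) / norm


def sigmoid(d):
    """Standard logistic function mapping a distance to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-d))


def fraud_probability(w, x, b=0.0):
    """Probability-like score: 0.5 on the hyperplane, higher on its positive side."""
    return sigmoid(signed_distance(w, x, b))


# Hypothetical two-feature example (weights are illustrative, not trained).
w = [3.0, 4.0]
print(signed_distance(w, [1.0, 1.0]))     # distance of a point on the positive side
print(fraud_probability(w, [1.0, 1.0]))   # score above 0.5
print(fraud_probability(w, [4.0, -3.0]))  # point lying on the hyperplane
```

Observations lying on the hyperplane receive a score of 0.5, so thresholding the score at 0.5 is equivalent to checking the sign of the distance.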

Fig. 4. Performance of CNN-LSTM model.

Fig. 5. ROC–AUC and PR curve of CNN-LSTM model.

Table 3
LR hyperparameters’ selection.
Hyperparameter   Values range                 Optimal value
C                0.001, 0.01, 0.1, 10, 100    0.01
R                L1 norm, L2 norm             L2 norm

Table 4
RF hyperparameters’ selection.
Hyperparameter             Values range             Optimal value
Number of decision trees   800, 1200, 1600, 2000    1200
Maximum depth              10, 15, 20, 25           20
Minimum sample splits      5, 10, 15, 20            15
Minimum sample leaves      4, 8, 12, 16             16
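For readers who wish to reproduce the RF baseline, the optimal values in Table 4 could be mapped onto a scikit-learn random forest roughly as follows. The use of scikit-learn is an assumption (the paper does not name the RF implementation), as are the names `RF_PARAMS` and `build_forest`; the out-of-bag option mirrors the oob validation described in Section 5.3.3.

```python
# Optimal values from Table 4, expressed with scikit-learn parameter names
# (assumed mapping; the paper does not state which library was used).
RF_PARAMS = {
    "n_estimators": 1200,       # number of decision trees
    "max_depth": 20,            # maximum depth
    "min_samples_split": 15,    # minimum sample splits
    "min_samples_leaf": 16,     # minimum sample leaves
    "bootstrap": True,          # draw in-bag samples with replacement
    "oob_score": True,          # validate on the out-of-bag samples
}


def build_forest(params=None):
    """Create an untrained random forest configured as in Table 4."""
    # Imported lazily so the parameter mapping can be inspected without scikit-learn.
    from sklearn.ensemble import RandomForestClassifier

    return RandomForestClassifier(**(params or RF_PARAMS))
```

After calling `fit` on the training data, the `oob_score_` attribute of the returned classifier gives the accuracy estimated on the out-of-bag samples.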

Table 5
CNN-LSTM hyperparameters’ selection.
Hyperparameter          Optimal value
Number of neurons       64
Number of CNN-layers    6
Number of LSTM-layers   4
Number of filters       10
Stride                  1
Dropout                 0.1
Dense layer             128
Activation function     LeakyReLU

Fig. 6. Performance of DSN.

5.3.3. Random forest

RF is an ensemble model with the decision tree as a baseline classifier. In order to make better predictions, RF combines multiple decision trees (DT) on the basis of bootstrapping and feature sampling. Simultaneous execution of bootstrapping and feature sampling yields a different model each time. The essence of RF is that it can reduce variance efficiently. The samples that are selected by the classifier are called in-bag samples (ibs), while the remaining samples are known as out-of-bag (oob) samples. The ibs are used to train the classifier, while the oob samples are used to validate the model. Table 4 shows the best generalized hyperparameters of RF.

5.3.4. CNN-LSTM

For the comparative analysis, CNN and LSTM are integrated to extract the features and analyze the time series data [37]. The details of the hyperparameters are given in Table 5. The performance of the CNN-LSTM model without DSN is shown in Figs. 4 and 5. Although the features are extracted by CNN and the sequence information is preserved by LSTM, this hybrid model still fails to provide efficient results due to the lack of discrimination between instances.

5.3.5. Wide and deep CNN

To capture both the wide and deep information in time series data for NTL detection, a wide and deep CNN (WD-CNN) is proposed in [1]. The wide component takes the daily consumption (1-D data) as an input, while the deep component analyzes the weekly consumption profile, which is represented as 2-D data. The rectified linear unit (ReLU), which passes only positive values, is used as the activation function. The metrics AUC and mAP are used to measure the performance of the model. The hyperparameters used to train the model are the same as those used in [1].

5.3.6. Results and discussion

Table 6 provides an overview of the performance metrics obtained for each classifier at different training ratios, i.e., 60%, 70% and 80%. Similarly, the detail of each performance metric is given in order to better understand its importance in ETD. The results obtained for the traditional classifiers LR, SVM and RF show a generally increasing trend. By investigating the results, it is observed that the performance of traditional classifiers is enhanced by the increase in training instances. In contrast, the performance of the deep networks depends on the selection of hyperparameters as well as on the model’s training ratio. Moreover, it is clear from Table 6 that the proposed model is successfully applied to both small-sized and very large-sized datasets. Similarly, the proposed model’s performance is visualized in Figs. 6 and 7.

6. Conclusion

In this paper, electricity theft is detected in the SG using a dataset obtained through AMI. A novel theft detection method

Table 6
Comparative analysis of DSN with benchmark schemes.
Method    Training ratio 60%                              Training ratio 70%                              Training ratio 80%
          P       R       F1      Acc     mAP     AUC     P       R       F1      Acc     mAP     AUC     P       R       F1      Acc     mAP     AUC
LR        0.710   0.710   0.680   0.700   0.645   0.702   0.725   0.740   0.715   0.720   0.640   0.716   0.730   0.725   0.725   0.730   0.668   0.720
SVM       0.675   0.670   0.680   0.676   0.6140  0.677   0.685   0.675   0.675   0.680   0.619   0.684   0.680   0.680   0.680   0.680   0.628   0.688
RF        0.700   0.550   0.551   0.710   0.687   0.706   0.740   0.730   0.735   0.740   0.652   0.735   0.750   0.750   0.750   0.750   0.681   0.749
CNN-LSTM  0.664   0.615   0.661   0.836   0.638   0.666   0.629   0.662   0.636   0.839   0.641   0.670   0.670   0.69    0.676   0.832   0.66    0.73
WD-CNN    0.640   0.691   0.651   0.820   0.669   0.689   0.624   0.720   0.770   0.770   0.689   0.718   0.661   0.760   0.685   0.840   0.711   0.756
DSN       0.875   0.839   0.857   0.839   0.814   0.860   0.840   0.850   0.845   0.844   0.819   0.844   0.912   0.923   0.928   0.953   0.900   0.934
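The AUC and mAP columns of Table 6 follow Eqs. (12) and (13). A minimal Python sketch of both rank-based computations is given below; the function names and the six-consumer example are hypothetical, ties between scores are broken arbitrarily rather than averaged, and r is taken as the number of fraudulent consumers among the top N, following [1].

```python
def rank_auc(scores, labels):
    """AUC from Eq. (12): ranks of suspicious consumers in ascending score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank = {idx: r for r, idx in enumerate(order, start=1)}
    suspicious = [i for i, y in enumerate(labels) if y == 1]
    n_s = len(suspicious)              # |S|
    n_h = len(labels) - n_s            # |H|
    if n_s == 0 or n_h == 0:
        raise ValueError("both classes must be present")
    rank_sum = sum(rank[i] for i in suspicious)
    return (rank_sum - n_s * (n_s + 1) / 2) / (n_s * n_h)


def map_at_n(scores, labels, n=100):
    """mAP@N from Eq. (13): mean of P@k_i over the fraudulent consumers in the top N."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    hits = 0
    precisions = []
    for k, idx in enumerate(top, start=1):
        if labels[idx] == 1:
            hits += 1
            precisions.append(hits / k)    # P@k_i = y_k / k at the i-th hit
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hypothetical suspicion scores for six consumers (1 = fraudulent, 0 = fair).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 0, 0, 1]
print(rank_auc(scores, labels))
print(map_at_n(scores, labels, n=6))
```

In this toy ranking, five of the nine (fraudulent, fair) pairs are ordered correctly, so Eq. (12) gives 5/9, while the three fraudulent consumers appear at positions 1, 3 and 6, so Eq. (13) averages the precisions 1, 2/3 and 1/2.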

is introduced via joint integration of CNN-LSTM and DSN. The CNN component handles the weekly 2-D electricity consumption profile and helps the model generalize efficiently, whereas the LSTM module memorizes the daily 1-D sequential electricity consumption data. DSN then performs judgment on the shared feature extractor and discriminates the deviating patterns of fraudulent consumers from those of fair consumers. The analysis is performed on high-resolution time series data provided by SGCC. The simulation results show that DSN achieves a high ETD rate, with an AUC of 0.93 and an mAP of 0.90 at the 80% training ratio (Table 6). Its comparative analysis with benchmark methods, namely LR, SVM, RF, CNN-LSTM and WD-CNN, shows that it achieves the highest values for all performance metrics: precision, recall, mAP, accuracy, AUC and F1-score. It maintains its performance for all three training ratios: 60%, 70% and 80%.

Fig. 7. ROC–AUC and PR curve of DSN.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Z. Zheng, Y. Yang, X. Niu, H.N. Dai, Y. Zhou, Wide and deep convolutional neural networks for electricity-theft detection to secure smart grids, IEEE Trans. Ind. Inf. 14 (4) (2017) 1606–1615.
[2] K. Zheng, Q. Chen, Y. Wang, C. Kang, Q. Xia, A novel combined data-driven approach for electricity theft detection, IEEE Trans. Ind. Inf. 15 (3) (2018) 1809–1819.
[3] Asif Khan, Nadeem Javaid, Jaya learning-based optimization for optimal sizing of stand-alone photovoltaic, wind turbine and battery systems, Engineering (ISSN: 2095-8099) (2020) 1–21.
[4] Ashfaq Ahmad, Nadeem Javaid, Mohsen Guizani, Nabil Ali Alrajeh, Zahoor Ali Khan, An accurate and fast converging short-term load forecasting model for industrial applications in a smart grid, IEEE Trans. Ind. Inf. (ISSN: 1551-3203) 13 (5) (2017) 2587–2596.
[5] Sana Mujeeb, Nadeem Javaid, ESAENARX and DE-RELM: Novel schemes for big data predictive analytics of electricity load and price, Sustainable Cities Soc. (ISSN: 2210-6707) 51 (101642) (2019) 1–16.
[6] P. Jokar, N. Arianpoo, V.C. Leung, Electricity theft detection in AMI using customers’ consumption patterns, IEEE Trans. Smart Grid 7 (1) (2015) 216–226.
[7] M.S. Saeed, M.W. Mustafa, U.U. Sheikh, T.A. Jumani, N.H. Mirjat, Ensemble bagged tree based classification for reducing non-technical losses in Multan electric power company of Pakistan, Electronics 8 (8) (2019) 860–876.
[8] S.K. Singh, R. Bose, A. Joshi, Energy theft detection for AMI using principal component analysis based reconstructed data, IET Cyber-Phys. Syst.: Theory Appl. 4 (2) (2019) 179–185.
[9] A.M. Mohamad, Y.A.R.I. Mohamed, Investigation and assessment of stabilization solutions for DC microgrid with dynamic loads, IEEE Trans. Smart Grid 10 (5) (2019) 5735–5747.
[10] A.V. Martins, R.M. Bacurau, A.D. dos Santos, E.C. Ferreira, Non-intrusive energy meter for non-technical losses identification, IEEE Trans. Instrum. Meas. (2019) 1–8.
[11] S. Amin, G.A. Schwartz, A.A. Cardenas, S.S. Sastry, Game-theoretic models of electricity theft detection in smart utility networks: Providing new capabilities with advanced metering infrastructure, IEEE Control Syst. Mag. 35 (1) (2015) 66–81.
[12] T. Ahmad, H. Chen, J. Wang, Y. Guo, Review of various modeling techniques for the detection of electricity theft in smart grid environment, Renew. Sustain. Energy Rev. 82 (2018) 2916–2933.
[13] N.F. Avila, G. Figueroa, C.C. Chu, NTL detection in electric distribution systems using the maximal overlap discrete wavelet-packet transform and random undersampling boosting, IEEE Trans. Power Syst. 33 (6) (2018) 7171–7180.
[14] W. Li, T. Logenthiran, V.T. Phan, W.L. Woo, A novel smart energy theft system (SETS) for IoT-based smart home, IEEE Internet Things J. 6 (3) (2019) 5531–5539.
[15] P. Kumar, Y. Lin, G. Bai, A. Paverd, J.S. Dong, A. Martin, Smart grid metering networks: A survey on security privacy and open research issues, IEEE Commun. Surv. Tutor. 21 (3) (2019) 2886–2927.
[16] J.L. Viegas, P.R. Esteves, R. Melicio, V.M.F. Mendes, S.M. Vieira, Solutions for detection of non-technical losses in the electricity grid: A review, Renew. Sustain. Energy Rev. 80 (2017) 1256–1268.
[17] D. Abbasinezhad-Mood, M. Nikooghadam, Efficient anonymous password-authenticated key exchange protocol to read isolated smart meters by utilization of extended Chebyshev chaotic maps, IEEE Trans. Ind. Inf. 14 (11) (2018) 4815–4828.
[18] D. Abbasinezhad-Mood, M. Nikooghadam, Design and hardware implementation of a security-enhanced elliptic curve cryptography based lightweight authentication scheme for smart grid communications, Future Gener. Comput. Syst. 84 (2018) 47–57.
[19] Saeed Muhammad Salman, Mohd Wazir Mustafa, Nawaf N. Hamadneh, Nawa A. Alshammari, Usman Ullah Sheikh, Touqeer Ahmed Jumani, Saifulnizam Bin Abd Khalid, Ilyas Khan, Detection of non-technical losses in power utilities-A comprehensive systematic review, Energies 13 (18) (2020) 4727.
[20] J.L. Viegas, P.R. Esteves, S.M. Vieira, Clustering-based novelty detection for identification of non-technical losses, Int. J. Electr. Power Energy Syst. 101 (2018) 301–310.
[21] R. Punmiya, S. Choe, Energy theft detection using gradient boosting theft detector with feature engineering-based preprocessing, IEEE Trans. Smart Grid 10 (2) (2019) 2326–2329.
[22] M.M. Buzau, J. Tejedor-Aguilera, P. Cruz-Romero, A. Gomez-Exposito, Hybrid deep neural networks for detection of non-technical losses in electricity smart meters, IEEE Trans. Power Syst. (2019) 1–10.
[23] J.V. Spiric, S.S. Stankovic, M.B. Docic, Identification of suspicious electricity customers, Int. J. Electr. Power Energy Syst. 95 (2018) 635–643.


[24] R. Razavi, A. Gharipour, M. Fleury, I.J. Akpan, A practical feature-engineering framework for electricity theft detection in smart grids, Appl. Energy 238 (2019) 481–494.
[25] C.C. Ramos, D. Rodrigues, A.N. de Souza, J.P. Papa, On the study of commercial losses in Brazil: a binary black hole algorithm for theft characterization, IEEE Trans. Smart Grid 9 (2) (2016) 676–683.
[26] S.H. Khan, M. Bennamoun, F. Sohel, R. Togneri, I. Naseem, Integrating geometrical context for semantic labeling of indoor scenes using rgb images, Int. J. Comput. Vis. 117 (1) (2016) 1–20.
[27] R. Razavi-Far, M. Farajzadeh-Zanjani, B. Wang, M. Saif, S. Chakrabarti, Imputation-based ensemble techniques for class imbalance learning, IEEE Trans. Knowl. Data Eng. (2019) 1–14.
[28] H. He, Y. Bai, E.A. Garcia, S. Li, ADASYN: Adaptive synthetic sampling approach for imbalanced learning, in: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), June, IEEE, 2008, pp. 1322–1328.
[29] Rabiya Khalid, Nadeem Javaid, Fahad A. Al-zahrani, Khursheed Aurangzeb, Emad-ul-Haq Qazi, Tehreem Ashfaq, Electricity load and price forecasting using Jaya-long short term memory (JLSTM) in smart grids, Entropy (ISSN: 1099-4300) 22 (1) (2020) 1–21, 10.
[30] Muhammad Adil, Nadeem Javaid, Umar Qasim, Ibrar Ullah, Muhammad Shafiq, Jin-Ghoo Choi, LSTM and Bat-based RUSBoost approach for electricity theft detection, Appl. Sci. (ISSN: 1099-4300) 10 (12) (2020) 1–21, 4378.
[31] K. Greff, R.K. Srivastava, J. Koutnik, B.R. Steunebrink, J. Schmidhuber, LSTM: A search space odyssey, IEEE Trans. Neural Netw. Learn. Syst. 28 (10) (2016) 2222–2232.
[32] T. Hu, Q. Guo, X. Shen, H. Sun, R. Wu, H. Xi, Utilizing unlabeled data to detect electricity fraud in AMI: A semisupervised deep learning approach, IEEE Trans. Neural Netw. Learn. Syst. 30 (11) (2019) 3287–3299.
[33] M. Wang, K. Tan, X. Jia, X. Wang, Y. Chen, A deep siamese network with hybrid convolutional feature extraction module for change detection based on multi-sensor remote sensing images, Remote Sens. 12 (2) (2020) 205, http://dx.doi.org/10.3390/rs12020205.
[34] J. Miao, B. Wang, X. Wu, L. Zhang, B. Hu, J.Q. Zhang, Deep feature extraction based on siamese network and auto-encoder for hyperspectral image classification, in: IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, July, IEEE, 2019, pp. 397–400.
[35] R.E. Fan, K.W. Chang, C.J. Hsieh, X.R. Wang, C.J. Lin, LIBLINEAR: A library for large linear classification, J. Mach. Learn. Res. 9 (Aug) (2008) 1871–1874.
[36] Rabiya Khalid, Nadeem Javaid, A survey on hyperparameters optimization algorithms of forecasting models in smart grid, Sustainable Cities Soc. (ISSN: 2210-6707) (2020) 1–35, 102275.
[37] M. Hasan, R.N. Toma, A.A. Nahid, M.M. Islam, J.M. Kim, Electricity theft detection in smart grid systems: A CNN-LSTM based approach, Energies 12 (17) (2019) 3310–3328.

Nadeem Javaid (S’8, M’11, SM’16) received the bachelor degree in computer science from Gomal University, Dera Ismail Khan, Pakistan, in 1995, the master degree in electronics from Quaid-i-Azam University, Islamabad, Pakistan, in 1999, and the Ph.D. degree from the University of Paris-Est, France, in 2010. He is currently an Associate Professor and the Founding Director of the Communications Over Sensors (ComSens) Research Laboratory, Department of Computer Science, COMSATS University Islamabad, Islamabad. He has supervised 126 master and 20 Ph.D. theses. He has authored over 900 articles in technical journals and international conferences. His research interests include energy optimization in smart/micro grids and in wireless sensor networks, data analytics in smart grids, and blockchain in WSNs, etc. He was recipient of the Best University Teacher Award from the Higher Education Commission of Pakistan, in 2016, and the Research Productivity Award from the Pakistan Council for Science and Technology, in 2017. He is also Associate Editor of IEEE Access, Editor of the International Journal of Space-Based and Situated Computing and editor of Sustainable Cities and Society.

Naeem Jan received the B.S. degree in computer science from PMAS, University Institute of Information Technology Rawalpindi, Pakistan, in 2014, and the M.S. degree in computer science, under the supervision of Dr. N. Javaid, from the Department of Computer Science, COMSATS University Islamabad, Islamabad Campus, Pakistan, in 2017. He is currently with ComSens (Communication over Sensors) Research Laboratory, COMSATS University Islamabad. His research interests include wireless sensor networks, optimization techniques, Big data analysis, and Internet of Things.

Muhammad Umar Javed received the bachelor’s and master’s degrees in electrical engineering from Government College University Lahore, Lahore, Pakistan, in 2014 and 2018, respectively. He is currently pursuing the Ph.D. degree in computer science with the Communications Over Sensors (ComSens) Research Laboratory, COMSATS University Islamabad, Islamabad Campus, under the supervision of Dr. Nadeem Javaid. His research interests include smart grids, electric vehicles, big data analysis and blockchain.
