
A Sequential Supervised Machine Learning Approach for Cyber Attack Detection in a Smart Grid System
Yasir Ali Farrukh
Electrical and Computer Engineering
NED University of Engineering and Technology
Karachi, Pakistan

Zeeshan Ahmad
Pritzker School of Molecular Engineering
The University of Chicago
Chicago, IL, USA

Irfan Khan
Clean and Resilient Energy Systems Lab
Texas A&M University
Galveston, TX, USA

Rajvikram Madurai Elavarasan
Clean and Resilient Energy Systems Lab
Texas A&M University
Galveston, TX, USA

2021 North American Power Symposium (NAPS) | 978-1-6654-2081-5/21/$31.00 ©2021 IEEE | DOI: 10.1109/NAPS52732.2021.9654767

Authorized licensed use limited to: Zhejiang University. Downloaded on January 15,2025 at 08:48:54 UTC from IEEE Xplore. Restrictions apply.

Abstract: Modern smart grid systems are heavily dependent on Information and Communication Technology, and this dependency makes them prone to cyber-attacks. The occurrence of cyber-attacks has increased in recent years, resulting in substantial damage to power systems. For reliable and stable operation, cyber protection, control, and detection techniques are becoming essential. Automated detection of cyberattacks with high accuracy is a challenge. To address this, we propose a two-layer hierarchical machine learning model with an accuracy of 95.44% to improve the detection of cyberattacks. The first layer of the model distinguishes between the two modes of operation: normal state or cyberattack. The second layer classifies the state into different types of cyberattacks. The layered approach lets the model focus its training on the targeted task of each layer, resulting in improved model accuracy. To validate the effectiveness of the proposed model, we compared its performance against other recent cyberattack detection models proposed in the literature.

Keywords: Cyberattack, Machine Learning, Supervised Learning, Smart Grid System, Intrusion Detection System, Class Imbalance Problem.

I. INTRODUCTION

With the technological advancement in instrumentation, communication, networking, and control, the conventional power system has transformed into a smart grid system. The flow rate and size of sensed smart grid signals have drastically increased in recent years [1], [2]. This has transformed the traditional power system into an intelligent and autonomous smart grid system. Smart grids are now capable of state estimation, forecasting, controlling/predicting abnormalities, and even providing support to market agents. On the one hand, this advancement has proved quite beneficial. It has improved the overall efficiency and reliability of the system. At the same time, it has introduced new technological challenges, such as bad data injection, false tripping, and cyber-attacks. These arise from the vulnerability of data stored in accessible data centers.

Due to the vulnerability of stored data, a hacker may corrupt the data as it is transmitted to the power system. This event is commonly known as a cyber-attack and may result in complete electrical blackouts [3], failure of government infrastructures, and breaches of national security secrets. Therefore, it is critical to identify and rectify any such cyber-attacks in smart grid systems.

Recently, a lot of attention has turned towards the automated identification of cyberattacks. Several techniques and methods involving supervised and unsupervised machine learning (ML) algorithms have been proposed for the detection of cyber-attacks. Such models are provided with data on electrical parameters during normal operation and during cyberattacks, which they use to identify cyberattacks after training. A supervised ML algorithm based on the Support Vector Machine was proposed in [4]. However, the proposed model was unable to distinguish between different types of transmission and generation anomalies and cyber-attacks. Similarly, in Ref. [5], the authors proposed a method using two Euclidean distance-based techniques for detection. This method had improved accuracy along with low computational complexity but lacked a diverse range of attack scenarios in training and testing. In [6], the authors combined graph topology with a neural network for the first time, using both model-driven and data-driven approaches. Following a similar graph topology, Ramakrishna et al. [7] established a framework using graph signal processing (GSP), but this method is ineffective if the attacker gains knowledge of the system parameters and the graph filter. In addition, an overall comparison between several widely used algorithms is presented in [8], which concludes that the Random Forest Classifier (RFC) surpasses all other algorithms in terms of weighted accuracy.

In terms of unsupervised and deep ML algorithms, a recently proposed method using stacked autoencoders is introduced in [9], with a classification accuracy of 93.7%. This model utilizes a self-adaptive cuckoo search algorithm for optimizing its parameters. However, the complexity of this model is high and it requires a lot of training time. In addition, a detailed overview of recently proposed approaches is presented in Refs. [10], [11], stating their problems and limitations. The major issues highlighted in the algorithms used are overfitting and the high computational complexity of the models.

Even with the development of highly accurate and precise algorithms, research work is still headed towards developing more rigorous and practically applicable methods. Therefore, models that are robust and practically relevant are essential. Here we propose a two-layer hierarchical random forest model to detect cyber-attacks in a smart grid system that is robust and efficient in terms of applicability. The proposed model divides the problem into two sub-problems: the upper-level sub-problem and the lower-level sub-problem. In the upper-level sub-problem, it distinguishes between a natural event and an attack event. If the data is classified as a potential attack event, it is propagated to the lower-level sub-problem, which identifies the specific attack class. In turn, this approach results in better training of the sub-level models with respect to their specific event classes, leading to higher accuracy and lower computational complexity. Before training the model, we perform class balancing, which is known to improve the accuracy of ML models. Our model achieves a classification accuracy of 95.44%, which is higher than or similar to those obtained using other ML approaches.

The rest of the paper is organized as follows: Section II provides a detailed overview of the adopted methodology and proposes the model. Section III presents the results and their analysis, followed by Section IV, which concludes this research work.

II. PROPOSED MODEL

Ref. [8] provides a comprehensive comparison of the performance of twelve commonly used ML algorithms for cyberattack detection. The random forest classifier (RFC) outperforms all other algorithms in classification accuracy. Therefore, RFC was used as the base model. Here, we optimize the RFC by exploring a layered hierarchical classification approach towards cyberattacks. We selected RFC as the base classifier for its design robustness and stability. Moreover, it requires less computational cost than deep learning techniques. Fig. 1 shows a schematic of the proposed model. The selection and training of the proposed model are explained in the following subsections.

Fig. 1. Proposed methodology and model architecture. The dataset utilized is reduced to 96 features after preprocessing and is then divided into train and test sets with a ratio of 8:2. The segregated test dataset is put aside and is not included in any testing or tuning of the model.

The dataset employed for this research work is the multi-class dataset, which is available in 15 subsets. Instead of performing training and testing on each subset, an unabridged approach is adopted, and all subsets are combined to form one single dataset. Before splitting the dataset into train and test sets, it is preprocessed by reducing its features from 128 to 96. Moreover, the class imbalance problem is also addressed in this part. After preprocessing, the dataset is divided with a ratio of 8:2 into train and test sets. The train set is utilized for training purposes while the test set is kept aside untouched for final testing only. For selection and tuning of model parameters, the train set is further divided into train and validation sets with a ratio of 9:1, and all the results shown in Tables I-V are outcomes on the validation set. Further details of the dataset are covered in subsection A.

A. Overview of Data set Utilized

The dataset used for training our supervised model was obtained from Adhikari et al. [12]. The power system dataset contains different parameters of four phasor measurement units (PMUs) related to the electric transmission system. The dataset is divided into three scenarios: binary, three-class, and multi-class. The dataset adopted for this experiment is the multi-class data, which comprises 78377 data points with 37 different classes and 128 discrete features. The classes are distributed over normal operation, fault condition, maintenance scenarios, data injection attack, attack on relay setting, and remote tripping attacks. The ratio of data between these classes is 1:3:1:2:9:2 [6]. Further detail of the dataset, including its parameter attributes, system structure, different classes, and approach, can be found in Refs. [12], [13].

B. Pre-processing of Data-set

Pre-processing is the initial step in solving data science problems. It is a method of converting raw data into a meaningful and understandable format. Raw data is mostly noisy, incomplete, and inconsistent. Therefore, in order to mitigate such problems, data should be preprocessed. Data preprocessing can be divided into the following three subparts:

Part 1: Handling of Missing or NaN Values

Identifying and coping with missing values is essential for effective cyberattack identification research. Mishandled missing values may lead to inaccurate deductions about the data. There are several approaches for dealing with such problems. Among those techniques, the most popular ones are removing instances with NaN/missing values, or replacing them with the average value, the majority value, or zeros. All of these methods were tested on a primary RFC with n_estimators set to 100. Moreover, this test run was carried out on the binary-class dataset, and it is observed that replacing the NaN values with zero tends to produce the most effective results. The detailed outcome of each approach is shown in Table I.

TABLE I
HANDLING OF MISSING VALUE
Approach | Validation Accuracy
Mean | 94.51 %
Median | 94.02 %
Drop | 94.19 %
Replace with Zero | 94.54 %
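As an illustration of the four strategies compared in Table I, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `handle_missing`, the toy table, and the fallback to zero for a column with no observed values are all assumptions made for this example.

```python
import math
from statistics import mean, median

STRATEGIES = {"mean", "median", "zero", "drop"}  # the four rows of Table I


def handle_missing(rows, strategy):
    """Return a new table with NaN entries handled.

    rows: list of equal-length lists of floats (one list per data point).
    strategy: one of STRATEGIES (hypothetical names chosen for this sketch).
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "drop":
        # Discard every data point that contains at least one NaN.
        return [list(r) for r in rows if not any(math.isnan(v) for v in r)]

    # Compute one fill value per feature column from the observed entries.
    fills = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        if strategy == "zero" or not observed:
            fills.append(0.0)  # assumption: all-NaN columns fall back to zero
        elif strategy == "mean":
            fills.append(mean(observed))
        else:
            fills.append(median(observed))
    return [[fills[j] if math.isnan(v) else v for j, v in enumerate(r)]
            for r in rows]


nan = float("nan")
toy = [[1.0, nan], [3.0, 4.0], [nan, 8.0]]  # toy data, not from the dataset
zero_filled = handle_missing(toy, "zero")
mean_filled = handle_missing(toy, "mean")
dropped = handle_missing(toy, "drop")
```

Zero filling leaves the observed values untouched and keeps every data point, which may matter when the dataset is small; the sketch makes no claim about why it scored best in Table I.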

Part 2: Handling Imbalanced Classification

An imbalanced classification issue arises when the distribution of samples among the classes of a dataset is not uniform. This problem can lead to poor performance of the classification model, especially for the minority class. Imbalanced classification raises a challenge for training an effective model, as most ML algorithms are designed for an equal number of samples in each class. Many real-world problems are imbalanced in terms of data. If they are not handled properly, the resulting outcome will be a biased trained model that gives an edge to the majority class and overlooks the minority one [14]. The dataset is processed to overcome this problem, and all the classes are balanced out before training the model. Several methods are used for handling such issues, which can be divided into two main approaches: data-driven and algorithm-driven [15].

For our research, the dataset used for training is highly imbalanced in terms of classes. Therefore, balancing the dataset is needed to attain high model accuracy. A data-driven approach is used for addressing this problem. Data resampling approaches are further divided into oversampling and undersampling [10].

Oversampling algorithms are adopted in order to achieve optimal efficiency while giving priority to the number of data instances. Moreover, a clearer picture of the class imbalance in the dataset can be seen in Fig. 2.

Fig. 2. Original population of classes in the dataset. The population shows that the classes are very imbalanced. Moreover, the classes between 30 and 35 do not exist in the dataset, hence they have no values.

It can be seen from the figure that the number of data instances in each class is very inconsistent. The highest number of data instances is in class 36, with 4685 samples, while class 21 has the lowest number of data instances, with only 1242 samples, which is roughly 3.8 times fewer than class 36.

Oversampling techniques increase the number of data points rather than reducing them to uniformity, which is the case in undersampling. Moreover, undersampling approaches can only be used where the dataset is quite extensive and the loss of a small amount of data would not cause any significant change. However, in our case, we have a small amount of data. Therefore, undersampling was not an option. Regarding oversampling techniques, there are many approaches used for better results. Among them, the most popular are the Adaptive Synthetic (ADASYN) algorithm, the Synthetic Minority Over-sampling Technique (SMOTE), the Random Over Sampler (ROS), and Borderline SMOTE. All of these methods were applied to determine the most effective technique. The detailed results for each technique are presented in Table II.

TABLE II
METHODS FOR OVER SAMPLING
Methods | Validation Accuracy
None | 85.76 %
ADASYN | 93.89 %
Random Over Sampler | 96.23 %
SMOTE | 92.02 %
Borderline SMOTE | 93.72 %

As per the results, ROS has the best accuracy among all approaches, but it tends to cause over-fitting of the model [15], [16]. In this method, minority-class samples are randomly replicated until the desired balancing ratio is achieved. In order to avoid this problem, other modern techniques were considered. The ADASYN and Borderline SMOTE algorithms have almost the same accuracy. The working of both algorithms is also quite similar, with just a little difference [17]. ADASYN concentrates its synthetic data on samples that are harder to learn, whereas SMOTE synthesizes data by interpolating between minority-class samples that are closely located [16]. These modern algorithms provide an adaptive approach for handling imbalanced classification and help in better training of the model. Although ADASYN has greater accuracy than Borderline SMOTE, it was not used. The reason for neglecting ADASYN is its density distribution criterion, which automatically calculates the number of samples to be synthesized for each minority class. While balancing the dataset, ADASYN exceeds the original maximum number of samples in a particular class. Therefore, it was neglected to maintain the sensitivity of the model. Borderline SMOTE is a variation of the SMOTE algorithm: it only synthesizes data along the decision boundary between the classes. In contrast, the SMOTE algorithm generates synthetic data using the k-nearest neighbors of any minority-class sample [15].

A picture of the dataset before and after the application of the Borderline SMOTE algorithm is shown in Fig. 3. In addition, the balanced number of samples in each class after oversampling is shown in Fig. 4. It can be seen that after oversampling of the dataset, all the classes are brought into unity in terms of data instances; that is, all classes now comprise 4685 data samples. Moreover, the difference in class distribution before and after oversampling of the dataset is exhibited in Fig. 3, which utilizes the voltage and current magnitude of PMU 1 for visualization.

Fig. 3. Scatter plot of the dataset, representing the distribution of classes with respect to two features, i.e., voltage and current. Different colors in the plot represent individual classes. (a) portrays the composition of the dataset before oversampling by the Borderline SMOTE algorithm and (b) portrays the composition of the dataset after oversampling by Borderline SMOTE.

Part 3: Handling standardization of dataset

Data standardization is essential before implementing ML algorithms, as it can significantly impact the outcome of the ML training model. Therefore, it is crucial to have all the data on the same scale. There are many approaches available for data standardization. The methods tested for this particular experiment are described below. Among these methods, Standard Scaler (SS) acquired the highest accuracy of 95.2% on binary datasets. Therefore, it was utilized in this experiment.

Standard Scaler: transforms all data features to the same magnitude, with mean 0 and variance 1. It does not involve any minimum or maximum value of the features, as shown in (1), where $\bar{x}$ and $\sigma$ are the mean and standard deviation of the feature.

$x' = \dfrac{x - \bar{x}}{\sigma}$ (1)

Mean Normalization: transforms all data features such that the feature vector has a Euclidean length of one. Scaling is done with a different number for every data point, as given in (2).

$x' = \dfrac{x}{\lVert x \rVert_2}$ (2)

Min-Max Scaling: transforms every feature to a scale between 0 and 1. It involves the computation of the maximum and minimum values in the entire dataset, as given in (3).

$x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$ (3)
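The three rescaling methods of Part 3 can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names and toy values are assumptions, the standard deviation is taken as the population standard deviation, and the second function follows the text's description of "Mean Normalization" (each data point divided by its own Euclidean length).

```python
import math
from statistics import fmean, pstdev


def standard_scale(column):
    """Standard Scaler, (1): subtract the mean, divide by the std. deviation."""
    mu, sigma = fmean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant feature: nothing to scale
    return [(v - mu) / sigma for v in column]


def unit_length(sample):
    """Scaling described for (2): divide one data point by its Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in sample))
    return [v / norm for v in sample] if norm else list(sample)


def min_max_scale(column):
    """Min-Max Scaling, (3): map a feature onto [0, 1] using its min and max."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


voltages = [2.0, 4.0, 6.0]        # one toy feature column (hypothetical values)
z = standard_scale(voltages)      # mean 0, variance 1
u = unit_length([3.0, 4.0])       # one toy data point, rescaled to length 1
m = min_max_scale(voltages)       # 0.0 at the minimum, 1.0 at the maximum
```

Note that `standard_scale` and `min_max_scale` operate on a feature column, while `unit_length` operates on a single data point, which is why the text says its scaling factor differs for every data point.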

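Looking back at the class balancing of Part 2, the random over-sampling baseline (ROS) is simple enough to sketch directly: minority-class samples are duplicated at random until every class matches the largest one. The sketch below is an assumption-laden illustration in plain Python, with hypothetical names and toy data. It is not the Borderline SMOTE procedure adopted in the paper, which synthesizes new points near the class boundary instead of duplicating existing ones.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes have equal counts."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is raised to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


# Toy data: class 36 is the majority, class 21 the minority (counts are made up).
toy_x = [[0.1], [0.2], [0.3], [0.4], [0.5], [9.1], [9.2]]
toy_y = [36, 36, 36, 36, 36, 21, 21]
bal_x, bal_y = random_oversample(toy_x, toy_y)
counts = Counter(bal_y)
```

Because every added sample is an exact copy of an existing one, the classifier sees the same minority points repeatedly, which is consistent with the over-fitting concern raised for ROS in the text.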
Fig. 4. Population of classes after oversampling with Borderline SMOTE. Each class is now balanced, having the same number of data instances. Classes between 30 and 35 do not exist in the dataset, i.e., they are not sampled.

C. Dimensionality Reduction

The number of different features present in a dataset is known as the dimensionality of the dataset. As the number of data features increases, training a model becomes challenging. It requires more computational power and may lead to overfitting, resulting in performance degradation [13]. This issue is often termed the curse of dimensionality. To mitigate this issue, high-dimensionality statistics and different reduction techniques are used for data visualization. These methods are also applied in ML for optimizing the outcome of a model. The method used in this research for identifying the importance of each feature is the Mean Decrease in Impurity (MDI).

The MDI is a measure of feature importance in evaluating a target variable. It averages, over the individual decision trees, the total decrease in node impurity attributable to a feature, weighted by the ratio of samples reaching each node. Thus, a higher MDI indicates a higher importance of that particular feature. The impurity index G is defined in (4)

$G = \sum_{i=1}^{n_c} p_i (1 - p_i) = 1 - \sum_{i=1}^{n_c} p_i^2$ (4)

where $n_c$ is the number of classes in the target variable and $p_i$ is the ratio of class $i$.

The decrease in impurity I is then defined in (5)

$I = G_{parent} - P_{split1} G_{split1} - P_{split2} G_{split2}$ (5)

where P is the proportion of the parent node's data that falls into each split.

The MDI for the top 10 features of the dataset can be seen in Fig. 5. After analyzing the dataset, it is observed that the control panel log, relay log, snort log, and status flag of each PMU have very little or no importance in the detection of cyberattacks. Moreover, when analyzed with respect to domain knowledge, these features have no influence over the power system. Therefore, these features were dropped from the original dataset, and the dimensionality of the dataset was reduced from 128 features to 96 features. The effect of this reduction in features can be seen in Table III.

TABLE III
FEATURE REDUCTION
Removal of Low MDI Features | Validation Accuracy
Before | 94.51%
After | 94.59%

After removing trivial features, the accuracy of the model increased along with the reduction in computational complexity. The number of features determines the computational

complexity and time required for power disaggregation. Therefore, the fewer the features, the lower the computational cost. A comparison is made between recent approaches and the proposed model in terms of computational complexity, but only those approaches were selected that utilize the same dataset as the proposed approach. Table IV gives a comprehensive comparison with regard to the number of features and computational complexity.

TABLE IV
COMPARISON OF COMPLEXITY WITH RECENT APPROACHES
Ref. | Approach | Number of Features | Computational Complexity
[9] | SACS-SAE | 96 | Low
[20] | J-ripper, RF, one-R & NB | 128 | Medium
[13] | AWV | 144 | Highest
- | Proposed | 94 | Lowest

D. RFC Parameter tuning

RFC is a well-known classification algorithm known for its robustness towards outliers. Moreover, it can handle noise comparatively better than other algorithms of its domain [8]. Like every other model, it is essential to tune the model to the problem in order to obtain effective results. Elmrabit et al. [8] tested RFC with the same dataset, and the resulting outcome was 92%. For our particular research, the goal of parameter tuning was to improve the accuracy level in order to develop a sequential model capable of achieving higher accuracy results. There are several hyperparameters of RFC that can be adjusted. In this research work, only the n_estimators, max_features, and criterion parameters of RFC were tested and tweaked for better accuracy. The details of the test can be found in Table V.

TABLE V
RFC PARAMETER SELECTION
Parameter | Value | Validation Accuracy
Max_Features | Sqrt | 94.54 %
Max_Features | Log2 | 94.87 %
Criterion | Gini | 95.08 %
Criterion | Entropy | 95.04 %

Fig. 5. Features having significant importance in distinguishing attack classes. Among 128 features, this figure represents the top 10 features in terms of mean decrease in impurity (y-axis: Mean Decrease in Impurity).

Different numbers of n_estimators were tested to obtain the optimal value. The test range comprises 50 to 1000 n_estimators. It was observed that the best result was obtained at 330 n_estimators. Therefore, the parameters used in this proposed model are Log2 as max_features, Gini as the criterion, and 330 n_estimators.

E. Performance Metrics

For the proposed model and training, the evaluation criteria were accuracy, precision, recall, and F1 score, as shown in (7), (8), (9), and (10), respectively. These metrics are the most common and are widely adopted for the performance evaluation of ML approaches [11].

$\text{Accuracy} = \dfrac{TP + TN}{TP + FP + FN + TN}$ (7)

$\text{Precision} = \dfrac{TP}{TP + FP}$ (8)

$\text{Recall} = \dfrac{TP}{TP + FN}$ (9)

$\text{F1 Score} = \dfrac{2TP}{2TP + FN + FP}$ (10)

where TP and TN refer to true positive and true negative. Similarly, FP and FN refer to false positive and false negative, respectively.

III. RESULTS AND ANALYSIS

The core objective of this research work was to develop a sequential model with better accuracy and precision along with low computational cost. To achieve this goal, a bi-level model is proposed using RFC as a base classifier for the detection of intrusion attacks in smart grid systems. The model is divided into two layers. The first-level sub-problem classifies between natural events and attack events. Through this level, all natural events are classified and filtered. This layer has an accuracy level of 99% in detecting a natural event. The reason for such high accuracy is that the layer learns the class boundary between two major classes rather than learning all individual classes. Further, the classification of natural events into their specific classes is not part of this model. The intention in developing this model was to detect and classify intrusion attacks in smart grid systems. All events related to fault, operation, or maintenance come under the umbrella of natural events. If the upper level classifies the data as an attack event, it is passed on to the lower-level sub-problem, which classifies the data on the basis of 27 classes of attacks. The overall accuracy of the model is 95.44 %.

For training and testing purposes, the multi-class dataset having 37 different classes is utilized. To avoid overfitting of the model, the train and test sets were split with a ratio of 8:2 at the start, and the test set was kept aside for the final model testing. The remaining train set was utilized for training both layers of the model. However, the original class markers present in the data were remapped as per the layer requirement. For the first layer, classes 1-6, 13, 14, and 41 were marked as 1 (Natural Event), and all other classes, 7-12, 15-30, and 35-40, were marked as 0 (Attack Class). For the lower level, class markers 1-6, 13, 14, and 41 were removed, as this layer is trained for classifying attack events only. By doing this, the proposed model is trained robustly for the accurate identification of individual classes.

In order to justify the proposed model, it is essential to compare it with its baseline, which is the single-level model having all the same parameters and training environment. Moreover, a primary RFC with default parameters was also tested in the same training environment. The test-set results for this comparison are shown in Fig. 7.

Other than the baseline, if the proposed model is compared with the other models proposed by researchers, it can be deduced that our proposed model has better accuracy in terms of classifying intrusion attacks on a smart grid system. The

model proposed by Hink et al. [18] has an accuracy of less than 90%, and the model proposed by Keshk et al. [19] has an accuracy of 90.2%. In addition to these models, Wang et al. [13] proposed a novel model using RFC as a base classifier and achieved a weighted accuracy of 93.91%. If we compare our model in terms of intrusion detection in a smart grid system with the model proposed in [13], our model clearly outperforms it with an accuracy of 95.44% on the test dataset.

It can be deduced from the research work that data preprocessing plays a crucial part in model performance. From class balancing of the dataset to feature reduction and standardization of data, it helps improve the model training and enhance the model's predictive efficiency. By addressing the class imbalance problem, we provided a better learning environment to our proposed model, resulting in improved efficiency.

[Bar chart. Groups: Proposed Model, Single Level Model, Primary RFC Model. Series: Accuracy, Precision, Recall, F1 Score. Y-axis: 80% to 98%. Accuracy labels: 95.44%, 92.67%, 85.15%.]

Fig. 7. Comparison between baseline models. All models are trained and tested in the same environment and on the same dataset. The difference between the single-level model and the primary RFC model lies in the parameters. The primary model has default parameters, whereas the single-level model parameters are the same as those of the proposed model.

IV. CONCLUSIONS

This study proposes a two-layered hierarchical approach with a baseline classifier to detect cyberattacks on a smart power system. We find that the two-layered traditional random forest algorithm performs better than deep learning algorithms. The limited attack data available makes it harder for deep learning approaches to learn the attack scenarios efficiently. Another issue in the currently used attack datasets is class imbalance, which results in model training that is heavily biased towards the normal state instead of the attack state. Tackling this issue before training the model through class balancing approaches can lead to improved performance of the current models. The performance results also reveal that feature reduction of the dataset can be quite useful, but it should be done considering the domain knowledge. The accuracy achieved by the proposed model is compared with the baseline models and found to outperform them for the detection of intrusion attacks in smart grid systems. Our study provides techniques to improve the accuracy of attack detection models while retaining traditional ML algorithms with low computational costs.

REFERENCES

[1] L. Gao, B. Chen and L. Yu, "Fusion-Based FDI Attack Detection in Cyber-Physical Systems," in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 8, pp. 1487-1491, Aug. 2020, doi: 10.1109/TCSII.2019.2939276.
[2] Z. Qu et al., "Survivability Evaluation Method for Cascading Failure of Electric Cyber Physical System Considering Load Optimal Allocation," Mathematical Problems in Engineering, vol. 2019, pp. 1-15, 2019, doi: 10.1155/2019/2817586.
[3] A. Mohan, N. Meskin and H. Mehrjerdi, "A Comprehensive Review of the Cyber-Attacks and Cyber-Security on Load Frequency Control of Power Systems," Energies, vol. 13, no. 15, p. 3860, 2020, doi: 10.3390/en13153860.
[4] M. Esmalifalak, L. Liu, N. Nguyen, R. Zheng and Z. Han, "Detecting Stealthy False Data Injection Using Machine Learning in Smart Grid," in IEEE Systems Journal, vol. 11, no. 3, pp. 1644-1652, Sept. 2017, doi: 10.1109/JSYST.2014.2341597.
[5] P. Lau, W. Wei, L. Wang, Z. Liu and C.-W. Ten, "A Cybersecurity Insurance Model for Power System Reliability Considering Optimal Defense Resource Allocation," in IEEE Transactions on Smart Grid, vol. 11, no. 5, pp. 4403-4414, Sept. 2020, doi: 10.1109/TSG.2020.2992782.
[6] O. Boyaci, A. Umunnakwe, A. Sahu, M. R. Narimani, M. Ismail, K. Davis and E. Serpedin, "Graph Neural Networks Based Detection of Stealth False Data Injection Attacks in Smart Grids," arXiv preprint arXiv:2104.02012, 2021.
[7] R. Ramakrishna and A. Scaglione, "Grid-graph signal processing (grid-GSP): A graph signal processing framework for the power grid," IEEE Transactions on Signal Processing, vol. 69, pp. 2725-2739, 2021.
[8] N. Elmrabit, F. Zhou, F. Li and H. Zhou, "Evaluation of Machine Learning Algorithms for Anomaly Detection," 2020 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), 2020, pp. 1-8, doi: 10.1109/CyberSecurity49315.2020.9138871.
[9] Z. Qu, Y. Dong, N. Qu, H. Li, M. Cui, X. Bo, Y. Wu and S. Mugemanyi, "False Data Injection Attack Detection in Power Systems Based on Cyber-Physical Attack Genes," Frontiers in Energy Research, vol. 9, 2021.
[10] A. S. Musleh, G. Chen and Z. Y. Dong, "A Survey on the Detection Algorithms for False Data Injection Attacks in Smart Grids," in IEEE Transactions on Smart Grid, vol. 11, no. 3, pp. 2218-2234, May 2020, doi: 10.1109/TSG.2019.2949998.
[11] A. Sayghe, Y. Hu, I. Zografopoulos, X. Liu, R. G. Dutta, Y. Jin and C. Konstantinou, "Survey of machine learning methods for detecting false data injection attacks in power systems," IET Smart Grid, vol. 3, no. 5, pp. 581-595, 2020.
[12] U. Adhikari, S. Pan, T. Morris, R. Borges and J. Beaver, "Industrial Control System (ICS) Cyber Attack Datasets," Tommy Morris, 2016. [Online]. Available: https://www.sites.google.com/a/uah.edu/tommy-morris-uah/ics-data-sets
[13] D. Wang, X. Wang, Y. Zhang and L. Jin, "Detection of power grid disturbances and cyber-attacks based on machine learning," Journal of Information Security and Applications, vol. 46, pp. 42-52, 2019, doi: 10.1016/j.jisa.2019.02.008.
[14] Y. Zhao and Y. Cen, Data Mining Applications with R. Cambridge, MA, USA: Academic Press, 2013, ISBN 9780124115118.
[15] F. Thabtah, S. Hammoud, F. Kamalov and A. Gonsalves, "Data imbalance in classification: Experimental evaluation," Information Sciences, vol. 513, pp. 429-441, 2020.
[16] N. Chawla, K. Bowyer, L. Hall and W. Kegelmeyer, "SMOTE: Synthetic Minority Over-sampling Technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002, doi: 10.1613/jair.953.
[17] J. Brandt and E. Lanzén, "A Comparative Review of SMOTE and ADASYN in Imbalanced Data Classification," DIVA, 2021. [Online]. Available: http://www.divaportal.org/smash/record.jsf?pid=diva2%3A1519153&dswid=-2594
[18] R. C. Borges Hink, J. M. Beaver, M. A. Buckner, T. Morris, U. Adhikari and S. Pan, "Machine learning for power system disturbance and cyber-attack discrimination," 2014 7th International Symposium on Resilient Control Systems (ISRCS), 2014, pp. 1-8, doi: 10.1109/ISRCS.2014.6900095.
[19] M. Keshk, N. Moustafa, E. Sitnikova and G. Creech, "Privacy preservation intrusion detection technique for SCADA systems," 2017 Military Communications and Information Systems Conference (MilCIS), 2017, pp. 1-6, doi: 10.1109/MilCIS.2017.8190422.
[20] M. Panthi, "Anomaly Detection in Smart Grids using Machine Learning Techniques," 2020 First International Conference on Power, Control and Computing Technologies (ICPC2T), 2020, pp. 220-222, doi: 10.1109/ICPC2T48082.2020.9071434.
