Clustering man in the middle attack on chain and graph-based blockchain in internet of things network using k-means
Corresponding Author:
Deris Stiawan
Department of Computer Engineering, University of Sriwijaya
Indralaya, Ogan Ilir-30662, Palembang, Indonesia
Email: [email protected]
1. INTRODUCTION
The internet of things (IoT) has developed as a class of smart devices across several technologies [1]–[4].
Together with the growth of internet networks [5]–[8], it is changing how data are collected and how tools are
controlled to perform certain tasks through the internet. Self-organization and communication that use the cloud
as a data storage medium are vulnerable to attacks because many devices are connected to the internet [9], [10].
Network security in IoT devices protects data during transmission, because devices connected to an IoT network
can open gaps for hackers and cause other problems [11]. Mallik et al. [12] and Nayak and Samaddar [13] describe
types of man in the middle (MITM) attack that aim to retrieve information in a network protocol, such as the
secure sockets layer and transport layer security (SSL/TLS) MITM attack and the domain name system (DNS)
spoofing attack, which supplies falsified data [14]. Choi et al. [15] describe a blockchain-based MITM security
system that detects MITM attacks by filtering, detecting, and comparing networks, implemented in a network
security system on the blockchain in the IoT.
Singh et al. [16] and Li and Kassem [17] describe distributed ledger technology (DLT), the family of
technologies to which blockchain belongs, which provides a decentralized data management system for storing
and sharing data on every network transaction. Ferraro et al. [18] explain that directed acyclic graphs (DAGs)
in the blockchain architecture for DLT can make transactions easier and more linear because the network is
peer to peer [19]–[21]. It provides a detailed analysis of security attack patterns applied to IoT devices. The
security system of a decentralized blockchain, which stores and shares data, is itself a decentralized data
management system [22], and its peer to peer characteristics can hinder the improvement of blockchain
technology in several aspects of life.
Blockchain technology is categorized by the type of architecture that suits its use case. Chain-based
and graph-based blockchains are two types of data structure used to store transaction data and build
evidence of consensus [23]. In a chain-based blockchain, the blocks form a single chain that continues to
grow. In contrast, a graph-based blockchain uses a random graph-shaped data structure in which each
transaction can be directly connected to several other transactions in the network. Which structure is used
depends on the purpose of the blockchain [24].
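The difference between the two data structures can be sketched as follows. This is a minimal illustration, not the data model of any particular ledger: the class names `Block` and `DagTransaction`, the helper `digest`, and the example payloads are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


def digest(payload: dict) -> str:
    """Return the SHA-256 hex digest of a JSON-serialisable payload."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


@dataclass
class Block:
    """Chain-based record: each block references exactly one predecessor."""
    data: str
    prev_hash: str

    @property
    def hash(self) -> str:
        return digest({"data": self.data, "prev": self.prev_hash})


@dataclass
class DagTransaction:
    """Graph-based record: each transaction may reference several parents."""
    data: str
    parent_hashes: List[str] = field(default_factory=list)

    @property
    def hash(self) -> str:
        return digest({"data": self.data, "parents": sorted(self.parent_hashes)})


# Chain: genesis <- b1 <- b2 (strictly linear growth).
genesis = Block("genesis", prev_hash="")
b1 = Block("tx-a", prev_hash=genesis.hash)
b2 = Block("tx-b", prev_hash=b1.hash)
chain = [genesis, b1, b2]

# Graph: t3 references both t1 and t2, which in turn reference the root.
root = DagTransaction("root")
t1 = DagTransaction("tx-a", [root.hash])
t2 = DagTransaction("tx-b", [root.hash])
t3 = DagTransaction("tx-c", [t1.hash, t2.hash])
```

In the chain, every record has a single `prev_hash`, so the structure can only grow at its end. In the graph, `parent_hashes` is a list, so several transactions can attach to the same parent at once.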
The k-means algorithm has been used in IoT networks to group data according to their characteristics,
for example in [25], [26], with a reported clustering accuracy of 99.94% and confusion matrix rates of
98.62% true negative, 100% true positive, 0.00% false negative and 1.38% false positive. Earlier research
on DLT in IoT networks discussed the benefits for the data transmission transaction process [27], [28]. These
studies explain the stochastic mechanism of the transaction process in the blockchain architecture for DLT,
which makes transactions faster and more stable using the Markov chain Monte Carlo (MCMC) algorithm, as
shown by a numerical balance of 25% on each transaction sent through the protocol.
In the security processes of IoT networks, an immutable transaction record is very important for
analyzing a parasitic chain attack. The MCMC algorithm has been used to assess resilience and security
and to reduce parasitic chain attacks [28]. Other research focused on the number of transactions in the
Tangle [29], [30] showed that, with the tip selection algorithm (TSA) method, the level of confidence and
sustainability improved as the number of transactions increased. On the other hand, research in 2021
[31]–[33] explains that attacks on IoT networks have increased by up to 20% for the security level of the
identification process in IoT networks integrated with blockchain technology. An elliptic curve
cryptography (ECC)-based algorithm was needed because of the privacy requirements of the security
protocol [10]. The development of IoT, which aims to connect data through the internet, raises the issue of
identity security (data privacy) [34] against various attacks such as MITM attacks that steal passwords and
personal identification numbers [35]. Related work generally estimates the theoretical complexity of attacks
that allow multiple combinations of improved MITM attacks [15], [36].
Therefore, it is necessary to improve attack detection so as to lower the rate of misclassification, so
that data transmission is safe and integrated, using the k-means method. This research compares the
performance of chain-based and graph-based blockchain transactions on MITM attack data from IoT networks,
where the traffic features are extracted using principal component analysis (PCA) and clustered using the
k-means method. The results are then displayed as visualizations. The remainder of this paper is organized as
follows: section 2 discusses the proposed method for determining the data to be clustered, section 3 provides
the results of clustering the MITM attack data, and section 4 provides conclusions and directions for future
research.
2. METHOD
The research methodology requires a clear framework for its stages. The research framework is shown in
Figure 1. It consists of a literature review of research from recent years, followed by data preparation
using a dataset of 550,000 data samples. Next is data preprocessing by feature extraction, followed by
testing, analyzing the results and drawing conclusions.
Clustering man in the middle attack on chain and graph-based blockchain in … (Sari Nuzulastri)
178 ISSN: 2722-3221
For each cluster, the value of the center point (centroid) can be found using the formula in (1).
$$ c = \frac{\sum_{i=1}^{n} x_i}{n} \tag{1} $$
where c is the centroid value, x_i is the point value (the i-th object), and n is the number of objects.
The formula in (1) can be rewritten per cluster as (2).
$$ \mu_k = \frac{1}{N_k} \sum_{q=1}^{N_k} x_q \tag{2} $$
where \mu_k is the centroid of the k-th cluster, x_q is the q-th object from the k-th cluster, and N_k is the
number of data (samples) in the k-th cluster.
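As a minimal sketch of (1) and (2), the centroid of each cluster is the coordinate-wise mean of its members. The function names and the toy points below are hypothetical and only illustrate the arithmetic.

```python
from typing import Dict, List, Sequence


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    """Equation (1): coordinate-wise mean of n points."""
    n = len(points)
    if n == 0:
        raise ValueError("cannot compute the centroid of an empty cluster")
    dims = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dims)]


def cluster_centroids(
    points: Sequence[Sequence[float]], labels: Sequence[int]
) -> Dict[int, List[float]]:
    """Equation (2): one centroid per cluster label k, over its N_k members."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {label: centroid(members) for label, members in groups.items()}


# Hypothetical two-dimensional points with known cluster labels.
points = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [12.0, 14.0]]
labels = [0, 0, 1, 1]
centroids = cluster_centroids(points, labels)
```

For the toy data, cluster 0 averages to (2, 3) and cluster 1 to (11, 12).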
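Precision is the proportion of positive predictions that are correct. It is calculated by dividing the
number of true positives (TP) by the total number of positive predictions, as in (4), where FP is the
number of false positives.
$$ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{4} $$
Sensitivity measures how good the model is at identifying the positive class. It is calculated by dividing
the number of true positives by the total number of actual positive cases, as in (5), where FN is the
number of false negatives.
$$ \mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{5} $$
The F1 score is the harmonic mean of precision and sensitivity, which balances the two, and is expressed as (6).
$$ F1\ \mathrm{score} = \frac{2 \times (\mathrm{precision} \times \mathrm{sensitivity})}{\mathrm{precision} + \mathrm{sensitivity}} \tag{6} $$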
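Equations (4) to (6) can be evaluated directly from confusion matrix counts. The sketch below assumes the counts are already available. The values of `tp`, `fp` and `fn` are hypothetical and are not results from this study.

```python
def precision(tp: int, fp: int) -> float:
    """Equation (4): TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Equation (5): TP / (TP + FN); returns 0.0 when there are no positive cases."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(prec: float, sens: float) -> float:
    """Equation (6): harmonic mean of precision and sensitivity."""
    return 2 * prec * sens / (prec + sens) if (prec + sens) else 0.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
tp, fp, fn = 90, 10, 30
p = precision(tp, fp)    # 90 / 100
s = sensitivity(tp, fn)  # 90 / 120
f1 = f1_score(p, s)
```

The zero-denominator guards are a design choice for this sketch. Other conventions, such as raising an error or returning NaN, are equally defensible.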
Clustering with the k-means method produced a total of six clusters, shown in Figure 7, where nodes
belonging to cluster 0, cluster 1, cluster 2, cluster 3, cluster 4 and cluster 5 are marked in blue,
yellow, green, brown, purple and red respectively. As can be seen in the figure, most clusters have notably
clear boundaries and contain nodes that are all close together, with only a few outlier nodes. Cluster 4
(purple), though, has more spread-out nodes. Compared to the other clusters, cluster 4 also has the fewest
nodes. This is consistent with the previously discussed silhouette plot, where it was the only cluster with
low height or score variations.
To help understand the clustering result, Figure 8 shows the parallel coordinates plot of each
cluster against all of the features. Using this plot, the relation between each feature of the data nodes and
the cluster they belong to can be analyzed. Note that the values on the y axis are normalized, hence only
their relative values are meaningful for the analysis. Also note that the six subfigures are scaled
differently: their maximum values on the y axis differ, which must be considered when comparing one cluster
to the others.
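The text does not state which normalization was applied. As one common possibility, the sketch below assumes z-score standardization, which rescales a feature to zero mean and unit standard deviation so that only relative values remain meaningful. The function name `zscore` and the sample `gas_price` values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore(values: Sequence[float]) -> List[float]:
    """Standardize a feature column to zero mean and unit (population) deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no relative information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical raw values for a single feature; the last one is an outlier.
gas_price = [10.0, 12.0, 11.0, 95.0]
scaled = zscore(gas_price)
```

After such a transform an outlier shows up as a value several units away from zero, which would be consistent with y axis ranges that differ between subfigures.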
In Figure 8, cluster 0 (blue) and cluster 2 (green) appear to be very similar, differing only at
sender, where cluster 0 has values around 0 and cluster 2 has values around 1. Furthermore, the values of
feature gas_price in both clusters gather in two groups, one near zero and another near six. These two
clusters are the only ones exhibiting this trait. Compared to the other clusters, their other distinct traits
are the very low values of timestamp, height and gas_consumed.
Cluster 1 (yellow) in particular is more compressed than the other clusters, as signified by its figure's
y axis maximum of 0.5, while the maximum values of other clusters are as high as 12. Generally speaking, all
features of this cluster are near zero. Features nonce, gas_limit, gas_price and gas_consumed are the most
compressed, with all nodes having values near 0. Feature sender has the most spread-out values, ranging
from -1.5 to 0.5, while timestamp and height are somewhere in between.
Cluster 3 (brown) and cluster 5 (red) are actually quite similar even though their general shapes
appear different. Just like most clusters, the nodes in these clusters have nonce, gas_limit, gas_price and
gas_consumed values near zero. The rest of the features, though, are notably different. The sender values of
cluster 3 are more compressed than those of cluster 5. In contrast, the timestamp and height values of
cluster 5 are more compressed and on the higher side in comparison to cluster 3.
Cluster 4 is the most distinctive among the six clusters. Its sender, timestamp and height values are
quite similar to those of other clusters, in that all the values are near zero. But it has multiple groups of
values for gas_price, gas_consumed and gas_limit. Also, its nonce values are the most spread out, ranging
from zero to around 11. Another notable distinction is the multiple appearances of solitary values of the
gas_limit feature, which may indicate outlier nodes within the cluster. Determining the cluster class from
the similarity of features in the clustering process on the blockchain involves several aspects: transaction
time, which is a significant feature because it identifies when a transaction occurred; transaction size,
where data are grouped by the size of the data to be transferred; and transaction security, where groups
are identified by security characteristics (transaction security attributes such as digital signatures).
The results obtained with the k-means method show its advantages in identifying patterns and
finding structure in the tested data. This is in accordance with the known advantages of k-means, namely
simplicity and efficiency. In addition, k-means is easily applied to large data and has better computation
time efficiency than many other methods, while its disadvantage is that the number of clusters (k value)
must be determined in advance. In this study, the value of k was determined using the silhouette score
technique.
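Choosing k by silhouette score can be sketched as follows: run k-means for several candidate values of k and keep the one with the highest mean silhouette. This is a minimal pure-Python illustration under stated assumptions, not the implementation used in this study. The farthest-point seeding, the candidate range 2 to 5, and the toy points are all hypothetical choices.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per point."""
    # Deterministic farthest-point seeding (an illustrative choice).
    centers: List[List[float]] = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette_score(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean of (b - a) / max(a, b) over all points."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        def mean_dist(cluster: int) -> float:
            ds = [math.dist(p, q) for j, q in enumerate(points)
                  if labels[j] == cluster and j != i]
            return sum(ds) / len(ds) if ds else 0.0

        if list(labels).count(labels[i]) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        a = mean_dist(labels[i])                                   # cohesion
        b = min(mean_dist(c) for c in clusters if c != labels[i])  # separation
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical data: three well-separated groups of four points each.
points: List[Point] = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (20.0, 0.0), (20.0, 1.0), (21.0, 0.0), (21.0, 1.0),
]
scores: Dict[int, float] = {k: silhouette_score(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
```

The silhouette computation here is quadratic in the number of points, so for a dataset of the size used in this study a library implementation or a sampled estimate would be more practical.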
4. CONCLUSION
The PCA method used for feature extraction from incoming transaction data on the IoT network reduced
the number of features from 16 to 3 to build a classification model in the clustering process. Clustering
with the k-means method on the IoT network was carried out after an extraction process on the MITM attack
data. The clustering analysis using the k-means method with 6 clusters on the transaction data yielded a
silhouette score of 0.417. The detected Normal data was 97.16%, while the MITM attack data was 2.84%. In
the future, it is hoped that newly available blockchain datasets can be used to obtain different features
and characteristics, and that the Gaussian mixture model (GMM) clustering method and spherical k-means
clustering can be applied to obtain better results and visualization. Other clustering methods can also be
explored, especially methods that are derived from k-means but have characteristics more suitable for the
blockchain dataset.
REFERENCES
[1]. S. Kumar, P. Tiwari, and M. Zymbler, “Internet of things is a revolutionary approach for future technology enhancement: a
review,” Journal of Big Data, vol. 6, no. 1, Dec. 2019, doi: 10.1186/s40537-019-0268-2.
[2]. “Number of internet of things (IoT) connected devices worldwide from 2019 to 2030, by vertical,” Statista. [Online]. Available:
https://www.statista.com/statistics/1194682/iot-connected-devices-vertically/
[3]. “Internet of things (IoT) annual revenue from 2020 to 2030, by region,” Statista. [Online]. Available:
https://www.statista.com/statistics/1194715/iot-annual-revenue-regionally/
[4]. S. Sinha, “State of IoT 2023: Number of connected IoT devices growing 16% to 16.7 billion globally,” IoT Analytics. [Online].
Available: https://iot-analytics.com/number-connected-iot-devices/
[5]. I. Mistry, S. Tanwar, S. Tyagi, and N. Kumar, “Blockchain for 5G-enabled IoT for industrial automation: A systematic review,
solutions, and challenges,” Mechanical Systems and Signal Processing, vol. 135, Jan. 2020, doi: 10.1016/j.ymssp.2019.106382.
[6]. A. Vaghani, K. Sood, and S. Yu, “Security and QoS issues in blockchain enabled next-generation smart logistic networks: A
tutorial,” Blockchain: Research and Applications, vol. 3, no. 3, Sep. 2022, doi: 10.1016/j.bcra.2022.100082.
[7]. T. Rathod et al., “Blockchain for future wireless networks: a decade survey,” Sensors, vol. 22, no. 11, May 2022, doi:
10.3390/s22114182.
[8]. E. Borgia, “The internet of things vision: key features, applications and open issues,” Computer Communications, vol. 54, pp. 1–
31, Dec. 2014, doi: 10.1016/j.comcom.2014.09.008.
[9]. J. Zhou, Z. Cao, X. Dong, and A. V. Vasilakos, “Security and privacy for cloud-based iot: challenges, countermeasures, and
future directions,” IEEE Communications Magazine, vol. 55, no. 1, pp. 26–33, 2017.
[10]. S. Sicari, A. Rizzardi, L. A. Grieco, and A. Coen-Porisini, “Security, privacy and trust in Internet of Things: The road ahead,”
Computer Networks, vol. 76, pp. 146–164, Jan. 2015, doi: 10.1016/j.comnet.2014.11.008.
[11]. O. Eigner, P. Kreimel, and P. Tavolato, “Detection of man-in-the-middle attacks on industrial control networks,” in 2016
International Conference on Software Security and Assurance (ICSSA), IEEE, Aug. 2016, pp. 64–69. doi:
10.1109/ICSSA.2016.19.
[12]. A. Mallik, A. Ahsan, M. M. Z. Shahadat, and J.-C. Tsou, “Man-in-the-middle-attack: Understanding in simple words,”
International Journal of Data and Network Science, pp. 77–92, 2019, doi: 10.5267/j.ijdns.2019.1.001.
[13]. G. Nath Nayak and S. Ghosh Samaddar, “Different flavours of man-in-the-middle attack, consequences and feasible solutions,” in
2010 3rd International Conference on Computer Science and Information Technology, IEEE, Jul. 2010, pp. 491–495. doi:
10.1109/ICCSIT.2010.5563900.
[14]. B. Bhushan, G. Sahoo, and A. K. Rai, “Man-in-the-middle attack in wireless and computer networking-A review,” in 2017 3rd
International Conference on Advances in Computing, Communication and Automation (ICACCA) (Fall), IEEE, Sep. 2017, pp. 1–
6. doi: 10.1109/ICACCAF.2017.8344724.
[15]. J. Choi, B. Ahn, G. Bere, S. Ahmad, H. A. Mantooth, and T. Kim, “Blockchain-based man-in-the-middle (MITM) attack
detection for photovoltaic systems,” in 2021 IEEE Design Methodologies Conference (DMC), IEEE, Jul. 2021, pp. 1–6. doi:
10.1109/DMC51747.2021.9529949.
[16]. S. Singh, A. S. M. S. Hosen, and B. Yoon, “Blockchain security attacks, challenges, and solutions for the future distributed IoT
network,” IEEE Access, vol. 9, pp. 13938–13959, 2021, doi: 10.1109/ACCESS.2021.3051602.
[17]. J. Li and M. Kassem, “Applications of distributed ledger technology (DLT) and Blockchain-enabled smart contracts in
construction,” Automation in Construction, vol. 132, Dec. 2021, doi: 10.1016/j.autcon.2021.103955.
[18]. P. Ferraro, C. King, and R. Shorten, “On the stability of unverified transactions in a DAG-based distributed ledger,” IEEE
Transactions on Automatic Control, vol. 65, no. 9, pp. 3772–3783, Sep. 2020, doi: 10.1109/TAC.2019.2950873.
[19]. J. Sedlmeir, H. U. Buhl, G. Fridgen, and R. Keller, “The energy consumption of blockchain technology: beyond myth,” Business
and Information Systems Engineering, vol. 62, no. 6, pp. 599–608, Dec. 2020, doi: 10.1007/s12599-020-00656-x.
[20]. S. Kably, M. Arioua, and N. Alaoui, “Lightweight direct acyclic graph blockchain for enhancing resource-constrained IoT
environment,” Computers, Materials and Continua, vol. 71, no. 3, pp. 5271–5291, 2022, doi: 10.32604/cmc.2022.020833.
[21]. R. Paulavičius, S. Grigaitis, A. Igumenov, and E. Filatovas, “A decade of blockchain: review of the current status, challenges, and
future directions,” Informatica, vol. 30, no. 4, pp. 729–748, Jan. 2019, doi: 10.15388/Informatica.2019.227.
[22]. A. Abdelmaboud et al., “Blockchain for IoT applications: taxonomy, platforms, recent advances, challenges and future research
directions,” Electronics, vol. 11, no. 4, Feb. 2022, doi: 10.3390/electronics11040630.
[23]. Q. Zhu, J. Pei, X. Liu, and Z. Zhou, “Analyzing commercial aircraft fuel consumption during descent: A case study using an
improved K-means clustering algorithm,” Journal of Cleaner Production, vol. 223, pp. 869–882, Jun. 2019, doi:
10.1016/j.jclepro.2019.02.235.
[24]. H. Y. Wu, X. Yang, C. Yue, H.-Y. Paik, and S. S. Kanhere, “Chain or DAG? Underlying data structures, architectures, topologies
and consensus in distributed ledger technology: A review, taxonomy and research issues,” Journal of Systems Architecture, vol.
131, Oct. 2022, doi: 10.1016/j.sysarc.2022.102720.
[25]. D. Stiawan et al., “Ping flood attack pattern recognition using a k-means algorithm in an internet of things (IoT) network,” IEEE
Access, vol. 9, pp. 116475–116484, 2021, doi: 10.1109/ACCESS.2021.3105517.
[26]. M. J. Brusco, E. Shireman, and D. Steinley, “A comparison of latent class, K-means, and K-median methods for clustering
dichotomous data,” Psychological Methods, vol. 22, no. 3, pp. 563–580, Sep. 2017, doi: 10.1037/met0000095.
[27]. S. Popov, O. Saa, and P. Finardi, “Equilibria in the tangle,” Computers and Industrial Engineering, vol. 136, pp. 160–172, Oct.
2019, doi: 10.1016/j.cie.2019.07.025.
[28]. A. Cullen, P. Ferraro, C. King, and R. Shorten, “On the resilience of DAG-based distributed ledgers in IoT applications,” IEEE
Internet of Things Journal, vol. 7, no. 8, pp. 7112–7122, Aug. 2020, doi: 10.1109/JIOT.2020.2983401.
[29]. F. Guo, X. Xiao, A. Hecker, and S. Dustdar, “Characterizing IOTA tangle with empirical data,” in GLOBECOM 2020 - 2020
IEEE Global Communications Conference, IEEE, Dec. 2020, pp. 1–6. doi: 10.1109/GLOBECOM42002.2020.9322220.
[30]. P. Kumar, R. Kumar, G. P. Gupta, R. Tripathi, A. Jolfaei, and A. K. M. Najmul Islam, “A blockchain-orchestrated deep learning
approach for secure data transmission in IoT-enabled healthcare system,” Journal of Parallel and Distributed Computing, vol.
172, pp. 69–83, Feb. 2023, doi: 10.1016/j.jpdc.2022.10.002.
[31]. B. K. Mohanta, D. Jena, S. Ramasubbareddy, M. Daneshmand, and A. H. Gandomi, “Addressing security and privacy issues of
iot using blockchain technology,” IEEE Internet of Things Journal, vol. 8, no. 2, pp. 881–888, Jan. 2021, doi:
10.1109/JIOT.2020.3008906.
[32]. H. H. A. Emira, “Authenticating IoT devices issues based on blockchain,” Journal of Cybersecurity and Information
Management, pp. 35–40, 2020, doi: 10.54216/JCIM.010202.
[33]. Q. Fan, J. Chen, L. J. Deborah, and M. Luo, “A secure and efficient authentication and data sharing scheme for Internet of Things
based on blockchain,” Journal of Systems Architecture, vol. 117, Aug. 2021, doi: 10.1016/j.sysarc.2021.102112.
[34]. C. Fan, S. Ghaemi, H. Khazaei, and P. Musilek, “Performance evaluation of blockchain systems: a systematic survey,” IEEE
Access, vol. 8, pp. 126927–126950, 2020, doi: 10.1109/ACCESS.2020.3006078.
[35]. N. Sivasankari and S. Kamalakkannan, “Detection and prevention of man-in-the-middle attack in iot network using regression
modeling,” Advances in Engineering Software, vol. 169, Jul. 2022, doi: 10.1016/j.advengsoft.2022.103126.
[36]. A. Canteaut, M. Naya-Plasencia, and B. Vayssière, “Sieve-in-the-middle: improved MITM attacks,” in Annual Cryptology
Conference, 2013, pp. 222–240. doi: 10.1007/978-3-642-40041-4_13.
[37]. L. Smith, “A tutorial on PCSA,” Department of Computer Science, University of Otago, pp. 12–28, 2006.
[38]. H. Choi, H. Lee, and H. Kim, “Fast detection and visualization of network attacks on parallel coordinates,” Computers and
Security, vol. 28, no. 5, pp. 276–288, Jul. 2009, doi: 10.1016/j.cose.2008.12.003.
[39]. M. Ahmed, R. Seraj, and S. M. S. Islam, “The k-means algorithm: a comprehensive survey and performance evaluation,”
Electronics, vol. 9, no. 8, Aug. 2020, doi: 10.3390/electronics9081295.
[40]. H. Qabbaah, G. Sammour, and K. Vanhoof, “Using k-means clustering and data visualization for monetizing logistics data,” in
2019 2nd International Conference on New Trends in Computing Sciences, ICTCS 2019-Proceedings, 2019. doi:
10.1109/ICTCS.2019.8923108.
BIOGRAPHIES OF AUTHORS
Hadipurnawan Satria received his Ph.D. degree in Computer Science from Sun
Moon University, South Korea. He is currently a Lecturer at the Department of Informatics
Engineering, Faculty of Computer Science, Universitas Sriwijaya. His research interests
include platform-based development, embedded systems, and software engineering. He can be
contacted at email: [email protected].