SYNOPSIS
1 Introduction
The Internet of Things (IoT) comprises a vast network of interconnected objects, systems,
and devices designed to collect, exchange, and respond to data. IoT facilitates communi-
cation between devices and humans via sensors, software, and internet connectivity. Its
applications range from smart homes and cities to industrial automation and healthcare
monitoring, offering a wide array of services [1]. By bridging the physical world with
the digital realm, IoT optimizes operations, streamlines decision-making, and fosters un-
precedented levels of automation and connectivity in our daily lives [2]. The emergence
of the IoT has catalyzed a significant wave of digital transformation across numerous
sectors. At its core, IoT is distinguished by the widespread deployment of intelligent and
diverse devices, such as sensors, actuators, and RFIDs, interconnected via the Internet,
enabling seamless communication without human intervention. Presently, the count of
IoT devices exceeds 12 billion, with projections indicating a staggering surge to 125
billion by 2030 [3].
In today’s interconnected world, the proliferation of IoT devices has produced an
unprecedented amount of data, and anomalies have become a common feature of nearly every
system. An anomaly is something that differs from what is normal or usual [4, 5].
Anomalies are important indicators in many applications, such as detecting wasteful use
of resources in manufacturing environments, averting critical situations in avionics
platforms, or identifying anomalous behavior in medical instruments. Consequently, the
ability to detect anomalies holds considerable potential for enhancing the overall
performance of observed systems. However, a key challenge lies in accurately defining the
boundary between normal and abnormal behavior, which highlights the importance of
anomaly recognition in various domains [6].
Anomaly detection is the process of identifying deviations in data patterns from the
expected or normal behavior, which may signify malicious activity, equipment failure,
or other forms of abnormal behavior with significant implications. This task poses a
formidable challenge due to the diverse forms anomalies can take and the difficulty in
ANOMALY DETECTION IN IoT USING MACHINE LEARNING TECHNIQUE
distinguishing them from normal behavior. Anomalies can manifest at various levels of
abstraction, spanning from individual data points to entire systems, and can stem from
a wide range of factors including human error, software bugs, hardware failures, and
malicious actions. Consequently, addressing these challenges has led to the development
of a broad spectrum of anomaly detection techniques.
(3) Collective Anomaly: When a group of related data instances appears to be unusual
in comparison to the entire dataset, it is known as a collective anomaly. Although
the individual data instances within this group may not be anomalies on their own,
their occurrence together as a group makes them anomalous [10]. For example, if multiple
traffic sensors detect heavy congestion across various intersections during off-peak
hours, it could indicate a collective anomaly such as a city-wide event or an
infrastructure malfunction.
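The traffic example above can be illustrated with a minimal sketch. The function names, the off-peak hours, the congestion threshold, and the `min_fraction` parameter are all assumptions made for illustration; they are not taken from the cited work. A single congested intersection is not flagged, but many congested intersections at the same off-peak hour are.

```python
from typing import Dict, List

# Hypothetical off-peak hours (24-hour clock); an assumption for illustration.
OFF_PEAK_HOURS = set(range(0, 6)) | set(range(22, 24))


def congested_sensors(readings: Dict[str, float], threshold: float) -> List[str]:
    """Return the IDs of sensors whose congestion level exceeds the threshold."""
    return sorted(sid for sid, level in readings.items() if level > threshold)


def is_collective_anomaly(
    readings: Dict[str, float],
    hour: int,
    threshold: float = 0.8,
    min_fraction: float = 0.5,
) -> bool:
    """Flag a collective anomaly when a large share of sensors report heavy
    congestion at the same off-peak hour. Each reading is a congestion level
    in [0, 1]; one congested sensor alone does not trigger the flag."""
    if hour not in OFF_PEAK_HOURS or not readings:
        return False
    flagged = congested_sensors(readings, threshold)
    return len(flagged) / len(readings) >= min_fraction


# Made-up readings from four intersections at 03:00.
city = {"A": 0.90, "B": 0.85, "C": 0.95, "D": 0.20}
print(is_collective_anomaly(city, hour=3))
```

The fraction-based rule is only one possible design; a deployed system would more likely learn the expected joint behavior of the sensors from historical data.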
• Noise Reduction: IoT data can be susceptible to noise from various sources such as
electromagnetic interference or environmental factors. Preprocessing techniques
like filtering or smoothing can help in reducing noise and extracting meaningful
signals from the data.
• Handling Missing Values: IoT data streams may have missing or incomplete data
due to sensor failures or communication issues. Data preprocessing techniques
such as imputation or interpolation can handle missing values and ensure continuity
in the data stream.
• Data Quality Assurance: Data in IoT systems frequently contains noise, gaps,
or inconsistencies, often arising from sensor issues, communication glitches, or
environmental influences. Data cleaning plays a crucial role in pinpointing and
correcting these errors, thereby enhancing the overall quality and reliability of the
data.
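As a small illustration of the noise-reduction step listed above, the following sketch applies a sliding median filter to a sensor series. It is one common smoothing technique, chosen here as an assumption; the function name `median_filter` and the sample readings are hypothetical.

```python
from statistics import median
from typing import List


def median_filter(signal: List[float], window: int = 3) -> List[float]:
    """Replace each sample with the median of a centred window.
    Near the edges the window is truncated to the available samples."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(median(signal[lo:hi]))
    return smoothed


# Made-up temperature readings with one impulsive spike (35.0).
raw = [20.0, 20.1, 35.0, 20.2, 20.1]
print(median_filter(raw, window=3))
```

A median filter suppresses isolated spikes while preserving step changes better than a moving average, which is why it is often preferred for impulsive sensor noise.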
al. [11] presented a brief study on outlier detection in IoT and wireless sensor networks
(WSNs) using ML techniques. They identified three distinct types of outliers: errors,
events, and malicious attacks, and applied statistical and classification-based methods
to detect them.
Pathak et al. [12] addressed IoT security threats and risks, specifically sensor
tampering. They proposed a system called AD-ML that detects sensor tampering using both
unsupervised and supervised machine learning algorithms. The dataset used for the
experiments comprises sensor data packets captured over 18 days. Network packets from
sensors connected to the gateway via Z-Wave were captured using data-gathering scripts,
and the network traffic traces were exported in pcap format using the tcpdump command.
Pre-processing filtered out specific network traffic to reduce noise in the dataset. The
occurrence of packet lengths was then evaluated for normal and abnormal days; TLSv1.2
was the protocol used for secure communication. The traffic pattern analysis showed that
certain packet lengths were observed on both normal and abnormal days, while others were
missing on abnormal days, which helps in identifying anomalies in the network traffic.
The Isolation Forest model achieved an accuracy of 84% as assessed with the Silhouette
metric, and the supervised decision tree model achieved a classification accuracy of
91.62%.
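The packet-length occurrence analysis described above can be sketched as follows. This is not the AD-ML implementation; it is a minimal illustration that assumes packet lengths have already been extracted from the pcap traces into plain lists, and the function names and sample values are hypothetical.

```python
from collections import Counter
from typing import List, Set


def packet_length_counts(lengths: List[int]) -> Counter:
    """Count how often each packet length occurs in one day's capture."""
    return Counter(lengths)


def missing_lengths(normal_day: List[int], abnormal_day: List[int]) -> Set[int]:
    """Packet lengths seen on the normal day but absent on the abnormal day."""
    return set(normal_day) - set(abnormal_day)


# Made-up packet lengths (bytes) for one normal and one abnormal day.
normal = [60, 60, 123, 150, 150, 150]
abnormal = [60, 150, 150]

print(packet_length_counts(normal))
print(missing_lengths(normal, abnormal))
```

Lengths that disappear on abnormal days, or whose counts shift sharply, could then serve as features for an unsupervised or supervised model.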
Yang et al. [2] reviewed current trends in data anomaly detection in IoT. They discussed
different approaches, such as statistical, machine learning, deep learning, and research
challenges of data anomaly detection in IoT. Statistical methodologies are utilized in
IoT anomaly detection to identify variances from anticipated patterns. These techniques
involve analyzing data from IoT devices and applying statistical models to detect anoma-
lies, which may signify security breaches, system failures, or other abnormal occurrences.
The literature on anomaly detection in IoT emphasizes the identification of abnormal
behavior in data generated by IoT devices.
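As one simple instance of the statistical approach described above, the sketch below flags readings whose z-score exceeds a threshold. The function name, the threshold value, and the sample data are assumptions for illustration, not a method taken from the reviewed paper.

```python
from statistics import mean, stdev
from typing import List


def zscore_anomalies(values: List[float], threshold: float = 2.5) -> List[int]:
    """Return the indices of values lying more than `threshold` sample
    standard deviations away from the mean."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Made-up temperature readings; the last one is far from the rest.
readings = [21.0, 21.2, 20.9, 21.1, 21.0, 20.8, 21.3, 21.1, 20.9, 45.0]
print(zscore_anomalies(readings, threshold=2.5))
```

In small samples a single extreme value inflates the standard deviation and bounds the achievable z-score, so robust variants based on the median and the median absolute deviation are often used instead.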
Tsou et al. [13] proposed a framework that leverages the strengths of random forests, a
popular ensemble learning technique, for anomaly detection in WSN. By incorporating
optimal weighted one-class random forests, the framework achieves superior detec-
tion accuracy while minimizing resource utilization. Through extensive experiments,
the framework demonstrates its feasibility and effectiveness in detecting anomalies in
WSNs, outperforming existing unsupervised methods in terms of both detection accuracy
and resource utilization. Wang et al. [14] presented a standalone power source-based
anomaly detection framework that detects anomalies such as electrical theft or leakage
by referencing the normal power usage patterns.
to reconstruct a clean signal from the noisy raw signal by combining techniques of
time-frequency synthesis and phase space reconstruction (PSR) synthesis. Qu et al. [21]
proposed a modified Kalman filtering technique for gathering raw data and removing
noise based on the output noise characteristics of the sensors.
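To make the Kalman filtering idea concrete, here is a minimal sketch of a standard scalar Kalman filter for a slowly varying sensor value. It is not the modified filter of Qu et al. [21]; the constant-state model, the variance parameters, and the sample measurements are assumptions chosen for illustration.

```python
from typing import List


def kalman_1d(
    measurements: List[float],
    process_var: float = 1e-4,      # assumed variance of the true signal's drift
    measurement_var: float = 0.25,  # assumed variance of the sensor noise
    initial_error: float = 1.0,     # initial uncertainty of the estimate
) -> List[float]:
    """Filter a noisy series with a scalar Kalman filter (constant-state model)."""
    if not measurements:
        return []
    estimate = measurements[0]
    error = initial_error
    filtered = []
    for z in measurements:
        # Predict: the state is assumed constant, so only the uncertainty grows.
        error += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error / (error + measurement_var)
        estimate += gain * (z - estimate)
        error *= 1.0 - gain
        filtered.append(estimate)
    return filtered


# Made-up noisy measurements of a quantity close to 10.0.
noisy = [10.4, 9.7, 10.2, 9.9, 10.3, 9.6, 10.1, 10.0]
print(kalman_1d(noisy))
```

The ratio of `process_var` to `measurement_var` controls how strongly the filter smooths: a smaller ratio trusts the model more and the individual measurements less.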
Ref | Work Done | Parameter considered | Algorithm used | Achievement
[16] | Methodology for identifying and eliminating noise in IoT data. | Humidity, temperature, soil moisture, wind speed, rain level | DaRoN (Detection and Removal of Noise), SVM | Detects noisy data
[22] | A unified framework integrating data preprocessing, neural networks, and a multi-tracker optimizer for wind speed prediction. | Wind data | CEEMDAN technique | Predicts wind speed
[23] | Enhancing data quality: IoT data aggregation through Device-to-Device Communication | IoT sensor data (temperature and humidity) | Cluster method, dominant subspace estimation, and tracking methods | Handles variety problems (noise, missing values, outliers, and redundancy)
(2) Identifying noisy data amidst other anomalies can be difficult due to their similar
characteristics, diverse manifestations, and overlapping with normal data. Setting
appropriate thresholds for anomaly detection while considering noise adds com-
plexity. Addressing these challenges requires sophisticated approaches to anomaly
detection that consider the unique characteristics and complexities of noisy data
within diverse datasets.
low battery power, communication errors, and device malfunctions [24]. Missing data can
be considered an anomaly as it deviates from expected data patterns, disrupting dataset
continuity and complicating accurate analysis, particularly in IoT-generated datasets.
For example, in an IoT-based environmental monitoring system, missing temperature
readings from a malfunctioning sensor disrupt the expected data pattern, hindering
accurate analysis and anomaly detection. Missing sensor data is prevalent in IoT
systems, arising from factors such as unstable network communication, synchronization
challenges, unreliable sensors, and equipment failures. Failing to address missing data
can lead to information loss and potentially incorrect analytical outcomes [25].
Predicting and evaluating missing values is therefore crucial, and there is an ongoing
demand for prediction models that can estimate missing data effectively.
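A simple baseline for filling such gaps is linear interpolation between the nearest known readings, sketched below. This is a generic technique used here as an assumption, not a model from the cited works; missing readings are represented as `None`, and the function name is hypothetical.

```python
from typing import List, Optional


def interpolate_missing(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill None gaps by linear interpolation between the nearest known
    neighbours. Leading and trailing gaps copy the nearest known value.
    If no value is known, the series is returned unchanged."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return filled
    # Edge gaps: extend the first and last known readings outwards.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps: straight line between consecutive known readings.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Made-up temperature series with three missing readings.
series = [20.0, None, None, 23.0, None]
print(interpolate_missing(series))
```

Interpolation assumes the signal changes smoothly across the gap; for long outages or strongly seasonal data, model-based imputation is usually more appropriate.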
Ref | Work Done | Parameter considered | Algorithm used | Achievement
[18] | Detecting anomalies and imputing missing data in building energy datasets for automated preprocessing | Smart hospital building (outdoor air, temperature monthly and hourly data) | K-Means clustering, Autoencoders | Forecasting and regulating building energy seamlessly.
[26] | Imputing Missing Data in IoT Sensor Networks: Significance for On-Site Sensor Calibration. | Measurements from particles and gas sensors. | Autoencoders, Neural network with random weight, Multiple Imputation by chain equation, Random forest-based imputation, KNN | Developed an efficient data imputation for sensor calibration
[28] | A Comparison of Methods for Missing Data Treatment in Building Sensor Data | Building sensor data (Lighting variable) | Mean, Random, KNN, aregImpute (Hmisc), MCMC, EM, Regression, Stochastic regression | Highlights the importance of missing value treatment in energy and building fields
(1) Most of the work does not thoroughly address the practical challenges or constraints
associated with deploying models in real-world applications.
(2) Some of the research work excludes temporal and spatiotemporal estimation
methods within the scope of the research. This exclusion could potentially restrict
the study’s comprehensiveness, as these methods might offer valuable insights and
comparisons in the field of missing data estimation.
(3) The failure to address the missingness mechanism can lead to incomplete evaluation
of imputation techniques, especially with regard to their ability to handle the unique
(1) Despite the widespread use of statistical methods in existing noise removal
techniques, there is a significant gap in research on algorithms capable of
distinguishing noise from other anomalies in IoT data. This gap is primarily
attributed to several challenges, including similar characteristics, data
complexity, threshold selection, data imbalance, and overlap with normal data.
(2) In the domain of developing models for handling missing data, there is a lack of
understanding regarding the practical challenges of deploying these models in real-
world applications. Additionally, the exclusion of temporal and spatiotemporal
methods limits understanding, hindering valuable insights. Addressing this prob-
lem is crucial for improving the effectiveness of missing data handling techniques
in real-world scenarios.
(1) To explore noise removal approaches to improve IoT sensor data quality.
(2) To investigate the methods for predicting missing values to enhance data quality
for subsequent application of anomaly detection approaches.
(3) To develop an integrated approach for detecting noise anomalies in IoT data and
denoising the data effectively in real time.
4 Methodology
Based on the research objectives, we plan to use the following methods for our research:
(1) Study various issues of noise reduction and missing value prediction in IoT based
on the available literature.
(2) Acquire data from online sources such as the UCI Machine Learning Repository,
IoT Central, and Kaggle, from real-life sources, or by generating synthetic data.
(3) Study various traditional and Machine Learning techniques to design a suitable
smart data approach for noise reduction and anomaly detection in IoT.
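Where synthetic data is needed for step (2), a labelled test series can be generated along the following lines. This is only a sketch under stated assumptions: the daily sinusoidal temperature profile, the noise level, the missing and spike rates, and the function name are all hypothetical choices, not part of the proposed methodology.

```python
import math
import random
from typing import List, Optional, Tuple


def synthetic_sensor_series(
    n: int = 288,                # e.g. one day at 5-minute intervals (assumed)
    noise_std: float = 0.3,      # Gaussian sensor noise (assumed)
    missing_rate: float = 0.05,  # probability that a reading is lost (assumed)
    spike_rate: float = 0.01,    # probability of an injected spike (assumed)
    seed: int = 42,
) -> Tuple[List[Optional[float]], List[str]]:
    """Generate a noisy temperature-like series with labelled missing values
    and spikes. Returns (readings, labels); missing readings are None."""
    rng = random.Random(seed)
    readings: List[Optional[float]] = []
    labels: List[str] = []
    for t in range(n):
        base = 22.0 + 4.0 * math.sin(2 * math.pi * t / n)  # one daily cycle
        value: Optional[float] = base + rng.gauss(0.0, noise_std)
        label = "normal"
        r = rng.random()
        if r < missing_rate:
            value, label = None, "missing"
        elif r < missing_rate + spike_rate:
            value, label = value + 15.0, "spike"
        readings.append(value)
        labels.append(label)
    return readings, labels


readings, labels = synthetic_sensor_series()
print(labels.count("missing"), labels.count("spike"))
```

Because the labels are known, such a series allows noise removal, imputation, and anomaly detection methods to be scored against ground truth before they are applied to real datasets.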
5 Plan of Work
The following stages are identified in the proposed work:
(1) Study various algorithms for detecting and mitigating diverse anomalies in IoT data.
(2) Dataset collection from online sources or data generation using simulation if the
need arises.
(3) Development of a novel smart data approach for anomaly detection in IoT sensor
data using ML/DL techniques. Comparison and modification of the proposed
models based on the result.
References
[1] F. Restuccia, S. D’Oro, and T. Melodia, “Securing the Internet of Things in the
Age of Machine Learning and Software-Defined Networking,” IEEE Internet of
Things Journal, vol. 5, no. 6, pp. 4829–4842, 2018 (cited on page 1).
[2] M. Yang and J. Zhang, “Data Anomaly Detection in the Internet of Things: A
Review of Current Trends and Research Challenges,” IJACSA, International
Journal of Advanced Computer Science and Applications, vol. 14, 2023 (cited on
pages 1 and 4).
[3] L. Sam et al., “IoT Platforms: Enabling the Internet of Things,” IHS Technology:
Landisville, PA, USA, 2016 (cited on page 1).
[5] A. Bhayani. “Isolation Forest algorithm for anomaly detection.” (Jan. 2020), [Online].
Available: https://www.codementor.io/@arpitbhayani/isolation-forest-algorithmfor-%20anomaly-detection-133euqilki
(cited on page 1).
[6] M. Fahim and A. Sillitti, “Anomaly Detection, Analysis and Prediction Tech-
niques in IoT Environment: A Systematic Literature Review,” IEEE Access, vol. 7,
pp. 81 664–81 681, 2019 (cited on page 1).
[7] C. Aggarwal, “Outlier analysis, in Data Mining,” Springer, pp. 237–263, (cited on
page 2).
[8] Chatterjee and B. S. Ahmed, “IoT anomaly detection methods and applications: A
survey,” Internet of Things, vol. 19, p. 100 568, 2022 (cited on page 2).
of Ambient Intelligence and Humanized Computing, vol. 14, pp. 4727–4743, 2023
(cited on page 2).
[11] P. Ghosh Maity and S. Maity, “Outlier Detection in Sensor Data Using Machine
Learning Techniques for IoT Framework and Wireless Sensor Networks: A Brief
Study,” pp. 187–190, 2019 (cited on page 4).
[12] A. Pathak, S. Saguna, K. Mitra, and C. Ahlund, “Anomaly Detection using Ma-
chine Learning to Discover Sensor Tampering in IoT Systems,” pp. 1–6, 2021
(cited on page 4).
[13] Y.-L. Tsou, H.-M. Chu, C. Li, and S.-W. Yang, “Robust distributed anomaly detec-
tion using optimal weighted one-class random forests,” in 2018 IEEE International
Conference on Data Mining (ICDM), IEEE, 2018, pp. 1272–1277 (cited on page
5).
[14] X. Wang and S. H. Ahn, “Real-time prediction and anomaly detection of electrical
load in a residential community,” Applied Energy, vol. 259, p. 114 145, 2020 (cited
on page 5).
[16] A. Jane, “DaRoN: A Technique for Detection and Removal of Noise in IoT Data
by using Central Tendency,” Annals of RSCB, vol. 25, no. 2, p. 3197, 2021 (cited
on pages 6 and 7).
[17] Y. Liu, T. Dillon, W. Yu, W. Rahayu, and F. Mostafa, “Noise Removal in the Pres-
ence of Significant Anomalies for Industrial IoT Sensor Data in Manufacturing,”
IEEE Internet of Things Journal, vol. 7, no. 8, pp. 7084–7096, 2020 (cited on
pages 6 and 7).
[18] K. Takahashi, R. Ooka, and S. Ikeda, “Anomaly detection and missing data
imputation in building energy data for automated data pre-processing,” Journal of
Physics: Conference Series, vol. 2069, p. 012 144, Nov. 2021 (cited on pages 6, 7,
and 9).
[19] X. Wang et al., “Solving Sensor Reading Drifting Using Denoising Data Process-
ing Algorithm (DDPA) for Long-Term Continuous and Accurate Monitoring of
Ammonium in Wastewater,” ACS ES&T Water, vol. 1, no. 3, pp. 530–541, 2021
(cited on page 6).
[20] Q. He, X. Wang, and Q. Zhou, “Vibration Sensor Data Denoising Using a Time-
Frequency Manifold for Machinery Fault Diagnosis,” Sensors, vol. 14, no. 1,
pp. 382–402, 2014 (cited on page 6).
[21] J. Qu, Y. Chai, and S. X. Yang, “A real-time de-noising algorithm for e-noses in
a wireless sensor network,” Sensors, vol. 9, no. 02, pp. 895–908, 2009 (cited on
page 7).
[22] J. Wang, Y. Wang, Z. Li, H. Li, and H. Yang, “A combined framework based on
data preprocessing, neural networks and multi-tracker optimizer for wind speed
prediction,” Sustainable Energy Technologies and Assessments, vol. 40, p. 100 757,
2020 (cited on page 8).
[23] S. Sanyal and P. Zhang, “Improving Quality of Data: IoT Data Aggregation Using
Device to Device Communications,” IEEE Access, vol. 6, pp. 67 830–67 840, 2018
(cited on page 8).
[24] M. Güzel, I. Kok, D. Akay, and S. Ozdemir, “ANFIS and Deep Learning based
missing sensor data prediction in IoT,” Concurrency and Computation: Practice
and Experience, vol. 32, e5400, Jun. 2019 (cited on pages 9 and 10).
[25] A. Rawat, A. Gupta, A. Singh, and S. Bhushan, “Energy conservation and missing
value prediction model in wireless sensor network,” in 2019 4th International
Conference on Internet of Things: Smart Innovation and Usages (IoT-SIU), 2019,
pp. 1–5 (cited on page 9).
[26] N. U. Okafor and D. T. Delaney, “Missing Data Imputation on IoT Sensor Net-
works: Implications for on-Site Sensor Calibration,” IEEE Sensors Journal, vol. 21,
no. 20, pp. 22 833–22 845, 2021 (cited on page 9).
[29] N. Al-Milli and W. Almobaideen, “Hybrid neural network to impute missing data
for iot applications,” in 2019 IEEE Jordan International Joint Conference on
Electrical Engineering and Information Technology (JEEIT), 2019, pp. 121–125
(not cited).
[30] L. N. B. Srinivas and K. Jayavel, “Missing data estimation and imputation algo-
rithm for wireless sensor network applications,” in 2022 International Conference
on Computer Communication and Informatics (ICCCI), 2022, pp. 1–6 (not cited).