Asian Journal of Convergence in Technology Volume VII and Issue I

ISSN NO: 2350-1146 I.F-5.11

Fault Tolerance in IoT: Techniques and Comparative Study
Abhay Agrawal
Department of Computer Science and Information Systems
Birla Institute of Technology & Science, Pilani Campus
[email protected]

Devendra Toshniwal
Department of Computer Science and Information Systems
Birla Institute of Technology & Science, Pilani Campus
[email protected]

Abstract— Fault tolerance increases system availability and reliability by making systems robust to failures and proactive enough to tackle them. Fault tolerance can be introduced at different architectural layers of the Internet of Things (IoT), because a fault can occur at any layer. For example, motion sensors and motors can fail at the root layer, network connectivity can be disrupted at the network layer, and computation and storage nodes can perform erroneously in their layers, so it becomes crucial to introduce fault tolerance in IoT systems at every layer. The study paves the way for classifying current and possible fault-tolerant approaches by presenting different techniques (replication, network control, etc.), architectural patterns (centralized, hybrid, etc.), layers (network, sense, etc.) and styles (microservices, publish-subscribe, etc.) that can help make a system fault tolerant efficiently. The paper also discusses current trends in fault tolerance, areas that have been widely worked upon, and areas that offer future scope in making IoT-based systems fault tolerant and efficient.

Keywords—Fault Tolerance, Internet of Things, Replication, Reliability, Availability.

I. INTRODUCTION

IoT is the internal and external communication of intelligent elements [1] through the internet in order to deliver smart services. A dependable IoT system should offer reliable and fault-free services. A fault is a flaw that impairs the correct functioning of hardware or software systems [2]. Because IoT devices are heterogeneous, highly distributed, battery-powered, reliant on wireless communication and affected by scalability, it is especially difficult to create a pattern for fault tolerance in IoT. The distributed nature of IoT devices [3] may cause the system to suffer from server crashes, server omissions, incorrect responses and arbitrary errors. The reliance on wireless communication and batteries makes IoT devices hard to recover [4]. In addition, exposure to new equipment and facilities influences the performance of the system.

Although the IoT was launched more than a decade ago [5], researchers are still attempting to define well its various aspects and quality-of-service (QoS) attributes, such as fault tolerance. The purpose of this research is therefore to define and classify the state of the art of the domain and to highlight the approaches, techniques and architectures that are potentially relevant for modelling IoT with fault tolerance. A comprehensive mapping analysis has been carried out to achieve this objective. The primary studies were selected on the basis of precise inclusion and exclusion criteria and a detailed review.

The paper is organized as follows. Section II discusses related work in the field of fault tolerance in IoT. Section III explains the taxonomy on the basis of which fault tolerance in different systems is compared. The comparison is done in Section IV. Section V presents the current trends in the field of fault tolerance in IoT. Section VI concludes the paper.

II. RELATED WORKS

Moghaddam et al. [6] discussed different ways of achieving fault tolerance in IoT systems, fault tolerance aspects and subdomains in fault tolerance. The paper also shows changing and emerging trends in the field of fault tolerance in IoT. The study is performed as a systematic mapping and lays a foundation for future studies of fault tolerance in the IoT domain.

Rullo et al. [7] reviewed redundancy-based fault tolerance techniques that target availability and data integrity. The paper discusses fault tolerance implementation techniques and approaches at the sensing and network layers. It reviews recently proposed approaches for achieving fault tolerance and shows how they can be implemented to introduce fault tolerance at the device level, overcoming disadvantages of older algorithms.

III. TAXONOMY

The aim of this study is based on the Goal-Question-Metric insights, which are as follows:

Purpose: to have a thorough understanding of IoT fault-tolerant systems.

Issue: through the detection, classification and analysis of different approaches, techniques and architectures.

Object: approaches based on existing IoT frameworks.

Viewpoint: from the perspectives of both research and industry.

We then considered all the selected studies and filtered them according to a set of well-defined inclusion and exclusion criteria. According to the guidelines, two key drivers shaped the inclusion/exclusion criteria: (i) keeping the focus of the selected papers on the scope of the study; and (ii) avoiding grey or non-scientific work.

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 International License

A. Architecture Layers

1) Actuator:
Actuators transform an electrical signal into a corresponding physical quantity, such as motion, force or sound. In paper [1], fault tolerance is introduced at the actuate layer by using multiple devices to achieve a common task. In that paper, devices such as CCTV, Bluetooth, Wi-Fi and a noise detector are used to detect the presence of a person.

2) Sensor:
A sensor is a device capable of detecting changes in an environment. A sensor is useless on its own, but it plays a key role when used in an electronic device. A sensor can measure a physical phenomenon (such as temperature or pressure) and convert it into an electrical signal. In paper [4], fault tolerance is introduced at the sense layer. The paper proposes a novel way of detecting faults in sensors by observing the sensor's voltage value when the power fluctuates. The paper claims that the proposed technique is 99% efficient in detecting the faulty sensor.

3) Processing and Storage:
The output level depends on how far the processing and storage components are decentralized and pushed to the edge.

Processing is the execution step for a particular system and can be judged on the basis of time. Storage is another aspect that plays an important role in effectively storing large volumes of data. In paper [8], fault tolerance is achieved in edge computing systems by introducing Docker, Apache Kafka and Kubernetes.

4) Network:
In IoT, the reliability of networks is also an important aspect to study; network topologies should be simple and adaptable to change. Paper [9] focuses on introducing fault tolerance at the network layer: it proposes a routing algorithm that searches for disjoint routes for message exchange in the system, making it robust to failure.

B. Architectural Patterns

1) Distributed Collaborative:
The architecture pattern can be distributed, which divides the network and the data across different sites. This has both advantages and disadvantages, as described in paper [10] of our study.

2) Centralized:
A centralized architecture means that a single organization, or a few, has control over the entire network. Note that a centralized approach [11] usually implies one-hop communication for all members of the network, but is typically realized by a multi-hop network in the context of short-range embedded systems.

3) Hybrid:
This type of architecture combines both techniques, i.e. centralized and decentralized (distributed). This can further improve the overall performance of the system, as discussed in paper [5].

C. Architectural Styles

1) Microservices:
In IoT systems, microservices and SOA have the same purpose, which is to create one or several applications from a collection of different services. A microservice is a lightweight, single-responsibility program that can be independently deployed, scaled and evaluated. In paper [4], a system is proposed in which microservices run in containers and the system keeps track of the status of each microservice. In case of a failure, a repair attempt is made first; if the attempts fail, a replica is run.

2) Service Oriented Architecture (SOA):
Service Oriented Architecture (SOA) puts the service at the core of the design of an IoT application. The core application component makes the service accessible over a network to other IoT components. Paper [5] presents a Platform-as-a-Service (PaaS) to developers, with which they can develop IoT-based applications using APIs, modules, frameworks, etc.

3) Publish-Subscribe:
Publish/subscribe is a messaging pattern aimed at decoupling the sending (publisher) and receiving (subscriber) parties. In paper [8], fault tolerance is handled by using Apache Kafka, which follows the publish/subscribe style, to achieve data replication with high performance.

D. Fault-Tolerance Techniques

1) Replication:
Replication is primarily used in the distributed systems research field to provide fault tolerance. In active replication, each client request is processed by all the servers. In passive replication, there is only one server (called the primary) that processes client requests. In paper [12], fault tolerance is achieved by dividing end nodes into different groups; within each group, all other nodes act as backup nodes for each node.

2) Network Control:
In the network control scheme, the IoT network is normally split into separate clusters. A chosen cluster head (CH) regularly makes roll-call requests to the other nodes, and a failure is confirmed if it does not receive a response message. The CH itself, however, constitutes a single point of failure.

3) Distributed Recovery Block:
In this technique, a single program is executed simultaneously on a pair of nodes, one of which is active and the other inactive. In a no-fault situation, the main (active) node performs the task and the other node performs the same task in the shadow. Afterwards, the results are checked, and the results of the main node are transmitted as the output if the test is passed. If the primary node's test fails, the shadow node becomes active and produces the outputs. In paper [3], a system is proposed in which, under no-fault conditions, the main sensors collect and send the data to the central server while the same data is sensed in parallel by a shadow (backup) sensor; in case of a fault, the shadow sensor replaces the main sensor.
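
The distributed recovery block idea can be illustrated with a minimal Python sketch. It is not taken from any of the surveyed papers: the function names, the `in_range` acceptance test and its temperature bounds are hypothetical, and the two nodes are simulated as plain callables that run one after the other rather than concurrently on separate hardware.

```python
from typing import Callable, Optional


def run_safely(task: Callable[[], float]) -> Optional[float]:
    """Run one node's task; a crash is treated as 'no result'."""
    try:
        return task()
    except Exception:
        return None


def in_range(value: float) -> bool:
    """Hypothetical acceptance test: plausible temperature in Celsius."""
    return -40.0 <= value <= 85.0


def distributed_recovery_block(
    primary: Callable[[], float],
    shadow: Callable[[], float],
    acceptance_test: Callable[[float], bool],
) -> Optional[float]:
    """Return the primary result if it passes the test, else the shadow's."""
    # Both nodes perform the same task (simulated sequentially here).
    primary_result = run_safely(primary)
    shadow_result = run_safely(shadow)

    # No-fault case: the primary result passes and is transmitted.
    if primary_result is not None and acceptance_test(primary_result):
        return primary_result

    # Primary failed its test: the shadow node takes over.
    if shadow_result is not None and acceptance_test(shadow_result):
        return shadow_result

    # Both nodes failed; the caller has to handle the missing output.
    return None
```

With a primary sensor returning an implausible value such as 999.0 and a shadow sensor returning 21.7, the function returns the shadow reading. The sketch deliberately leaves open what happens when both nodes fail, since the text above does not specify it.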


4) Time Redundancy:
Time redundancy can be applied at both the instruction and task levels. At the instruction level, the software is duplicated and the results are subsequently compared to detect a possible error. At the task level, a program is run twice (or more) to minimize complex faults. While this technique does not introduce additional hardware costs, it increases the time taken to ensure redundancy. The method reduces the efficiency of computation and thus consumes more resources.

E. Quality of IoT Service

1) Performance:
Performance describes how a system behaves in different scenarios with respect to a set of measures such as time constraints and environment. In paper [12], a distributed edge computational network is discussed in which preprocessing is introduced at the end nodes, so that much of the computation is handled at the end nodes themselves, reducing network traffic between the end nodes and the central server. This also helped in reducing network utilization and decreasing network latency.

2) Availability & Security:
Availability is the ability of the system to be completely or partially working whenever required. Fault tolerance and availability are not equivalent: a fault-tolerant system is expected to keep running without interruption, whereas service interruptions can occur in a highly available system. A fault-tolerant scheme, however, should also preserve a high degree of device availability and performance.

Security is a major concern in IoT systems, which link various components and entities to each other through a network. Paper [1] focused on introducing fault tolerance in home security systems, which detect home intrusions by making use of multiple devices.

3) Scalability:
As IoT systems should be able to work properly with a large number of heterogeneous devices, scalability [9] is also an important attribute. It is difficult to comment on the scalability of an IoT system as a whole; it depends on how new resources are incorporated on demand.

4) Interoperability:
Interoperability allows heterogeneous IoT components to work efficiently together. Paper [4] performs a comprehensive survey of state-of-the-art solutions for facilitating interoperability between different IoT platforms. The key challenges in this topic are also presented.

5) Energy Consumption:
Most IoT devices are battery-powered, so energy efficiency is important and is linked to many other quality attributes, such as performance. Paper [9] introduces an algorithm to search disjoint paths that can minimize energy consumption while dealing with network link failure.

IV. COMPARISONS

The following section compares papers on the basis of the architecture used, the techniques employed and the QoS achieved while carrying out fault tolerance in IoT-based systems. The comparison is based entirely on our study of the following research papers, considering the different attributes. The crux of this section is presented as current trends in Section V.

A. Architecture Layers:

Our study shows that efficiency and availability are related to the fault tolerance (FT) of IoT systems. However, the assessment of the trade-off between FT and other attributes of IoT efficiency, such as scalability, interoperability and energy consumption, will be further investigated. Another outcome to be further examined by an overview of the state of practice is that only a few studies address the relationship between FT techniques and collaborative architectures. All the papers considered fall within the aforementioned four architectural layers.

Papers [1],[10],[11] focus on the actuate layer; papers [1],[3],[9],[10],[11],[13] relate to the sense layer; papers [4],[8],[10],[11] are based on the processing and storage layer; and papers [1],[8],[9] focus on the network layer. Different layers are thus targeted in each paper to make a network fault tolerant.

B. Architectural Patterns

The question here is: for each fault tolerance technique, which architectural pattern is more frequently used? Hybrid patterns [4] were used by studies to promote their passive fault tolerance techniques, while hybrids were used for active FT. Conversely, to deal with passive fault tolerance, unified and collaborative architectural patterns [10] are more fitting. Obviously, it is easier to approach the network control fault tolerance technique via a hybrid architectural pattern. In general, FT-IoT is assured by a hybrid architecture in which, if one fog node fails, the IoT device moves the computation to another fog node to prevent a single point of failure. To achieve a fault-tolerant network, paper [10] follows the distributed collaborative pattern, papers [13],[11] employ the centralized pattern, and papers [4],[8],[9] implement a hybrid architecture.

C. Architectural Styles

Different architectural styles are followed in different papers to achieve a fault-tolerant system. The styles employed in the papers under study are as follows. The microservices style is used in paper [13]; the service-oriented style in papers [4],[10],[11],[5]; the cloud-based architecture style in papers [8],[4]; the layered style in papers [8],[4],[11]; and the publish/subscribe style in paper [8].

D. Fault-Tolerance Techniques

As mentioned in Section III, different fault tolerance techniques can be employed to make a system fault tolerant. The techniques used by the different papers are described below.

1) Replication:
In papers [4],[8] active replication is employed, whereas in papers [10],[11],[12] passive replication is employed.

2) Network Control:
The key studies have suggested many cluster-based routing protocols. The network control scheme is employed in papers [10],[11].


3) Distributed Recovery Block:
In papers [3],[13] the distributed recovery block technique is employed to ensure that node computations are error free.

4) Time Redundancy:
In paper [3] the time redundancy technique is followed.

E. Quality of IoT Service

Quality of Service (QoS) is also an ever-increasing network requirement today. New applications, such as voice and live video transmissions, which are accessible to consumers over the internet, generate higher standards for the quality of the services offered. When the traffic volume is greater than what can be transmitted over the network, devices queue the packets, holding them in memory until resources become available to transmit them. Performance is the focus of papers [8],[9],[11],[12]. Availability is used as an attribute in papers [8],[9],[10]; security is described in papers [1],[4]; scalability is used as an attribute in paper [9]; interoperability in paper [4]; and energy consumption is the focus of paper [9].

V. TRENDS

It was observed that in most of the papers reviewed, the actuate and sense layers were targeted in order to introduce fault tolerance, replication and network control techniques were primarily employed, and performance and availability were the QoS attributes most discussed. Some papers [1],[3],[9] discussed novel approaches to achieving fault tolerance that remove disadvantages of older techniques.

Energy consumption, one of the QoS attributes, receives less attention when making systems fault tolerant, so there is wide scope for work in this area. Also, the time redundancy technique is employed in only a handful of papers, which leaves room for future research. It was also observed that the correspondence between fault tolerance techniques and the associated architecture is less studied. So, although fault tolerance in IoT has been studied for over a decade, there is still much scope for improvement in the field.

VI. CONCLUSION

In this paper, we present a systematic mapping analysis with the objective of classifying and defining the state of the art of the domain and extracting a collection of methods and techniques for fault tolerance in IoT. The fault tolerance capability of some papers shows that cloud data center faults can be addressed in real time by customized design before repair becomes available. By comparison, in some of the contributions discussed in the related works, the transition of data from failed devices to safe ones takes excessive time and causes delay. The findings of this study are both research-oriented and industry-oriented and are intended to establish a context for future research on fault tolerance in IoT. As future work, we will analyse the possible incorporation of existing research at the industrial level of IoT. The study will help readers analyse IoT systems in fine detail.

REFERENCES

[1] D. Terry, "Toward a New Approach to IoT Fault Tolerance," in Computer, vol. 49, no. 8, pp. 80-83, Aug. 2016.
[2] "What Is Fault Tolerance?: Creating a Fault Tolerant System: Imperva." Learning Center, Imperva, 30 Dec. 2019, www.imperva.com/learn/availability/fault-tolerance/.
[3] Tusher Chakraborty, Akshay Uttama Nambi, Ranveer Chandra, Rahul Sharma, Manohar Swaminathan, Zerina Kapetanovic, and Jonathan Appavoo. 2018. Fall-curve: A novel primitive for IoT Fault Detection and Isolation. In Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems (SenSys '18). Association for Computing Machinery, New York, NY, USA, 95–107.
[4] N. Mohamed, J. Al-Jaroodi and I. Jawhar, "Towards Fault Tolerant Fog Computing for IoT-Based Smart City Applications," 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 2019, pp. 0752-0757.
[5] Gubbi, Jayavardhana, Rajkumar Buyya, Slaven Marusic, and Marimuthu Palaniswami. "Internet of Things (IoT): A vision, architectural elements, and future directions." Future generation computer systems 29, no. 7 (2013): 1645-1660.
[6] Moghaddam, Mahyar Tourchi, and Henry Muccini. "Fault-tolerant iot." In International Workshop on Software Engineering for Resilient Systems, pp. 67-84. Springer, Cham, 2019.
[7] Rullo, Antonino, Edoardo Serra, and Jorge Lobo. "Redundancy as a Measure of Fault-Tolerance for the Internet of Things: A Review." In Policy-Based Autonomic Data Governance, pp. 202-226. Springer, Cham, 2019.
[8] A. Javed, K. Heljanko, A. Buda and K. Främling, "CEFIoT: A fault-tolerant IoT architecture for edge and cloud," 2018 IEEE 4th World Forum on Internet of Things (WF-IoT), Singapore, 2018, pp. 813-818.
[9] M. Z. Hasan and F. Al-Turjman, "Optimizing Multipath Routing With Guaranteed Fault Tolerance in Internet of Things," in IEEE Sensors Journal, vol. 17, no. 19, pp. 6463-6473, Oct. 1, 2017.
[10] P. H. Su, C. Shih, J. Y. Hsu, K. Lin and Y. Wang, "Decentralized fault tolerance mechanism for intelligent IoT/M2M middleware," 2014 IEEE World Forum on Internet of Things (WF-IoT), Seoul, 2014, pp. 45-50.
[11] S. Zhou, K. Lin, J. Na, C. Chuang and C. Shih, "Supporting Service Adaptation in Fault Tolerant Internet of Things," 2015 IEEE 8th International Conference on Service-Oriented Computing and Applications (SOCA), Rome, 2015, pp. 65-72.
[12] M. Mudassar, Y. Zhai, L. Liao and J. Shen, "A Decentralized Latency-Aware Task Allocation and Group Formation Approach With Fault Tolerance for IoT Applications," in IEEE Access, vol. 8, pp. 4912-4923, 2020.
[13] A. Celesti, L. Carnevale, A. Galletta, M. Fazio and M. Villari, "A Watchdog Service Making Container-Based Micro-services Reliable in IoT Clouds," 2017 IEEE 5th International Conference on Future Internet of Things and Cloud (FiCloud), Prague, 2017, pp. 372-378.
