Analysis of Distributed Systems
International Journal of Computer Applications (0975 – 8887)
Volume 186 – No.48, November 2024
Providing a unified view is often a difficult task. It involves ensuring that users cannot discern where a process is running, or whether parts of a task have been delegated to other processes elsewhere. Users should not need to know where data is stored or be concerned with data replication for performance improvement. This concept of distribution transparency is a key objective in distributed system design, reminiscent of how Unix-like operating systems offer a unified file-system interface to abstract away the differences between various resources.

2.3 Middleware and Distributed Systems
Middleware plays a crucial role in the development of distributed applications by serving as a separate layer of software placed on top of the operating systems of the computers within the system. This layer acts as a resource manager that allows applications to efficiently share and deploy resources across the network. In addition to resource management, middleware provides services such as inter-application communication, security, accounting, and failure recovery. Unlike an operating system, middleware operates in a networked environment. YARN, a framework for resource management and task scheduling, is a popular example of middleware [3].

2.4 Fault Tolerance
Fault tolerance is a key feature of distributed systems, ensuring that they remain functional even in the event of node failures. This resilience is achieved through redundancy, where critical data or services are duplicated across multiple nodes. Redundant components can seamlessly take over tasks from failed nodes, minimizing downtime and preventing data loss. Fault tolerance is particularly important for applications that require high availability, such as financial transaction systems, healthcare data services, and critical infrastructure monitoring.

3. Architectural Styles in Distributed Systems
Various architectural styles are used in distributed systems, each tailored to specific needs and obstacles. This section examines three notable architectures: MapReduce, Apache Spark, and the Google File System (GFS), highlighting their respective characteristics. All three are geared towards effectively handling and managing extensive data sets spread across computer clusters [4].

3.1 MapReduce
The distributed file system divides large data files, while the MapReduce programming model splits the algorithm into segments that can be processed on data blocks in a distributed manner to achieve optimal computing performance. Initially created by Google, MapReduce was later integrated into the Apache Hadoop project to transform sequential algorithms into a MapReduce format, enabling efficient execution on a cluster [5, 6].

A MapReduce program consists of two fundamental processes: Map and Reduce [7]. The Map process operates on the data blocks of a node to produce a local result, and is carried out concurrently on multiple nodes to generate local results independently, thereby enhancing computing performance [8]. The Reduce process acts on the local results to produce a global result. This stage involves transferring all local results to the nodes responsible for the Reduce process, incurring a high data communication cost due to shuffling and transforming data among nodes. Once all local results have been gathered at the Reduce nodes, a global result is produced through the Reduce process [9].

When a data processing task can be accomplished with a single set of Map and Reduce operations, such as tallying word frequencies across numerous web pages, a MapReduce program can effectively analyze large data files by leveraging a large-scale cluster with many nodes. Conversely, if an iterative algorithm must be converted into a long series of Map and Reduce operations, it may not handle a vast distributed dataset efficiently because of I/O, communication, and computing costs [10].

3.2 Apache Spark
Apache Spark, originally created at the University of California, Berkeley, is an open-source engine for processing large-scale data. It differs from Hadoop MapReduce by storing all interim results in a Resilient Distributed Dataset (RDD) in memory to reduce I/O costs. It also employs a directed acyclic graph (DAG) task-segmentation method, similar to MapReduce, for operating on RDDs. Spark's in-memory computing outperforms Hadoop, making it the leading platform for batch big data analysis [11-13].

3.3 Distributed File Systems
Following the divide-and-conquer approach of distributed computing, a large data file is divided into several small files known as data blocks. These data blocks are then distributed across the disks of cluster nodes to enhance I/O performance. A large data file stored in this way is termed a distributed data file and is managed on the cluster by various distributed file systems [14, 15], such as GFS [16], HDFS [17], TFS [18], and FastDFS [19]. These file systems play a crucial role in facilitating big data analysis. Figure 1 describes distributed computing frameworks for big data analysis.

GFS, a Linux-based distributed file system developed by Google, caters to the specific needs of individual companies [20]. TFS, on the other hand, is a high-availability, high-performance distributed file system created by Taobao to address the storage demands of unstructured small files, usually under 1 MB in size. FastDFS, an open-source distributed file system, is a lightweight option well suited to online services that use files as their primary medium.

HDFS, originating from the Apache Hadoop project, was designed to tackle the complexities of distributed data processing within large clusters. It serves as a fault-tolerant data storage system running on standard hardware, making it well suited for managing large volumes of big data. As a result, HDFS has gained wide acceptance in the industry for processing and analyzing big data.
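The word-frequency task mentioned in Section 3.1 can be sketched in plain Python, with ordinary functions standing in for distributed Map and Reduce tasks; the function names and toy data are illustrative, not Hadoop's API:

```python
from collections import defaultdict
from itertools import chain

# Map stage: each "node" counts words in its own data block independently.
def map_block(block: str) -> list[tuple[str, int]]:
    return [(word, 1) for word in block.lower().split()]

# Shuffle stage: group intermediate pairs by key. In a real cluster this is
# the costly step, since pairs must move across the network to Reduce nodes.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce stage: combine the grouped local results into a global count.
def reduce_counts(groups):
    return {word: sum(counts) for word, counts in groups.items()}

blocks = ["the quick brown fox", "the lazy dog", "the fox"]
local_results = [map_block(b) for b in blocks]            # runs per node
global_result = reduce_counts(shuffle(chain(*local_results)))
print(global_result["the"])  # 3
```

Because each `map_block` call touches only its own block, the Map stage parallelizes trivially; only the shuffle forces communication, which is why single-pass jobs like word counting suit MapReduce so well.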
4. FAULT TOLERANCE AND RESILIENCE
A major benefit of distributed architecture is fault tolerance: continuous service is ensured by the ability of other nodes to take over for a failing one. This architecture is a great option for large, complicated applications because it can be readily scaled up by adding more machines.

4.1 Fault Tolerance Mechanisms
Fault tolerance refers to the capacity of a system to function normally even if some of its components fail. Fault tolerance in distributed systems is frequently accomplished through redundancy, replication, and error recovery techniques.

Replication: Replication means keeping copies of the same data or service on several nodes so that the system can still access the data from another node in the event that one fails. File and database systems both rely on replication to improve availability [22].

Redundancy: Redundancy can be implemented at the hardware, network, and data layers of a distributed system. It involves setting up backup copies of system components, such as servers or hard drives, that can take over when the primary one malfunctions [22].

Error Recovery: Error recovery methods are intended to return the system to a known good state after a failure. This might involve techniques such as checkpointing, in which the system routinely stores a snapshot of its state so that, in the event of a crash, it can resume from that point instead of starting over [22].

4.2 Failure Detection
(a) Heartbeats: Nodes periodically send heartbeat messages to signal that they are alive; if an expected heartbeat does not arrive within a set interval, a failure is assumed to have occurred. This method is simple and effective for monitoring the health of nodes in distributed systems [23].

(b) Timeouts: Timeouts are used to detect failures by setting a time limit on certain operations or responses. If the expected response is not received within the timeout period, the system can assume a failure and initiate appropriate recovery or failover procedures [23].

(c) Failure Detectors: Failure detectors help in identifying component failures within a system. They vary in accuracy and speed, and are crucial for deciding when to trigger a failover or recovery process [23].

4.3 Algorithms for Distributed Consensus
In distributed systems, consensus methods are necessary to guarantee that every node agrees on a single data value or a single series of events, which is critical to preserving consistency among dispersed operations.
4.3.1 Paxos: Paxos allows a cluster of distributed database nodes, or any other dispersed group of computers, to come to an agreement over an asynchronous network. One or more of the computers proposes a value to Paxos in order to reach an agreement. Consensus is reached when a majority of the computers running Paxos concur on a given value. Paxos chooses a single value from among the proposed values and broadcasts it to every cooperating computer. The cluster moves forward once every computer (or database node) has agreed on the chosen value and the Paxos round has completed [24].

4.3.2 Raft: Raft was designed to be more understandable than Paxos and operates on the basis of a strong leader concept. The system is partitioned into three primary constituents: Leader Election, Log Replication, and Safety.

5. RAY FRAMEWORK
Ray is an open-source framework that offers a straightforward, universal API for building distributed applications. It is engineered to deliver exceptional performance and scalability, especially for applications that require advanced computational capabilities, such as machine learning and artificial intelligence [29]. Figures 6 and 7 describe the Ray architecture.

5.1 Performance Characteristics
(a) High Throughput and Low Latency: Ray's task execution framework is optimized to provide high throughput and low latency in task scheduling and execution. This makes it well suited for applications that require high performance.

(b) Scalability: Ray can scale horizontally to hundreds or thousands of nodes, enabling applications to efficiently utilize more computational resources as required without a substantial decline in performance.
5.2 Comparison between Spark and Ray
Spark is primarily built for data processing workflows and batch processing, and it also supports streaming data through micro-batching. It is well suited to tasks such as ETL workloads, batch queries, and data transformation workflows.

Ray is designed for applications that require real-time processing and high speed. It has built-in support for both batch processing and streaming data, and it performs exceptionally well in situations that need immediate decision-making and interactive computing.

Spark uses resilient distributed datasets (RDDs) and a directed acyclic graph (DAG) to execute tasks. This approach, however, may be less effective for iterative algorithms that manage a large amount of mutable state.

Ray supports the execution of dynamic task graphs, which can be more efficient for applications that modify state frequently or that benefit from fine-grained task management.

Spark executes computations in memory, and its efficiency depends heavily on memory management and the ability to keep datasets in memory across the cluster.

Ray uses an object store to manage shared memory among tasks, minimizing the costs associated with data transfer and duplication.

Spark and Ray can be used synergistically in IoT applications. Spark can handle the initial stages of data ingestion, cleaning, and aggregation, while Ray can focus on real-time processing, decision-making, and AI model deployment. By leveraging the strengths of both frameworks, IoT systems can achieve scalable data handling, robust analytics, and dynamic performance optimization, addressing the diverse demands of modern IoT ecosystems [31-36].

Distributed systems can integrate AI to provide predictive analytics, automated diagnosis, and optimized treatment plans. Blockchain offers secure, decentralized data management and ensures the integrity of healthcare records. 5G enhances the connectivity and reliability of IoT devices in distributed systems, enabling real-time data transmission [37-45].

6. CONCLUSION
Distributed systems play an essential role in enabling scalable and fault-tolerant solutions for handling large datasets and complex computations. Through mechanisms like replication, redundancy, and failure detection, these systems ensure high availability and resilience. The architectural styles explored, including MapReduce, Spark, and the Google File System, provide a strong foundation for big data processing. As technology continues to evolve, frameworks such as Ray offer enhanced capabilities for real-time processing and machine learning applications, making distributed computing indispensable in the modern digital landscape. Future advancements will likely focus on improving fault tolerance, reducing latency, and enhancing scalability to meet the growing demands of data-intensive industries.

7. REFERENCES
[1] https://www.splunk.com/en_us/blog/learn/distributed-systems.html#:~:text=Distributed%20systems%20are%20used%20when,to%20news%20about%20your%20organization.
[2] van Steen, M. and Tanenbaum, A. S., "A brief introduction to distributed systems", Computing, vol. 98, pp. 967-1009, 2016. https://doi.org/10.1007/s00607-016-0508-7
[3] P. S. Janardhanan and P. Samuel, "Launch overheads of spark applications on standalone and hadoop YARN clusters", in Advances in Electrical and Computer Technologies, Singapore: Springer, pp. 47-54, 2020.
[4] X. Sun, Y. He, D. Wu and J. Z. Huang, "Survey of Distributed Computing Frameworks for Supporting Big Data Analysis", Big Data Mining and Analytics, vol. 6, no. 2, pp. 154-169, June 2023, doi: 10.26599/BDMA.2022.9020014.
[5] R. Gu, X. Yang, J. Yan, Y. Sun, B. Wang, C. Yuan, et al., "SHadoop: Improving MapReduce performance by optimizing job execution mechanism in hadoop clusters", J. Parallel Distribut. Comput., vol. 74, no. 3, pp. 2166-2179, 2014.
[6] I. Polato, R. Ré, A. Goldman and F. Kon, "A comprehensive view of hadoop research - A systematic literature review", J. Network Comput. Applicat., vol. 46, pp. 1-25, 2014.
[7] Y. Wang, W. Jiang and G. Agrawal, "SciMATE: A novel MapReduce-like framework for multiple scientific data formats", Proc. 2012 12th IEEE/ACM Int. Symp. Cluster, Cloud and Grid Computing (CCGRID 2012), pp. 443-450, 2012.
[8] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters", Commun. ACM, vol. 51, no. 1, pp. 107-113, 2008.
[9] M. R. Ghazi and D. Gangodkar, "Hadoop MapReduce and HDFS: A developers perspective", Proc. Comput. Sci., vol. 48, pp. 45-50, 2015.
[10] Y. Zhang, Q. Gao, L. Gao and C. Wang, "iMapReduce: A distributed computing framework for iterative computation", J. Grid Comput., vol. 10, no. 1, pp. 47-68, 2012.
[11] J. Yu, J. Wu and M. Sarwat, "A demonstration of GeoSpark: A cluster computing framework for processing big spatial data", Proc. 2016 IEEE 32nd Int. Conf. Data Engineering (ICDE), pp. 1410-1413, 2016.
[12] Z. Yang, C. Zhang, M. Hu and F. Lin, "OPC: A distributed computing and memory computing-based effective solution of big data", Proc. 2015 IEEE Int. Conf. Smart City/SocialCom/SustainCom (SmartCity), pp. 50-53, 2015.
[13] V. Taran, O. Alienin, S. Stirenko, Y. Gordienko and A. Rojbi, "Performance evaluation of distributed computing environments with Hadoop and spark frameworks", Proc. 2017 IEEE Int. Young Scientists Forum on Applied Physics and Engineering (YSF), pp. 80-83, 2017.
[14] T. D. Thanh, S. Mohan, E. Choi, S. Kim and P. Kim, "A taxonomy and survey on distributed file systems", Proc. 2008 4th Int. Conf. Networked Computing and Advanced Information Management, pp. 144-149, 2008.
[15] J. Blomer, "A survey on distributed file system technology", J. Phys. Conf. Ser., vol. 608, pp. 012039, 2015.
[16] S. Ghemawat, H. Gobioff and S. T. Leung, "The google file system", ACM SIGOPS Oper. Syst. Rev., vol. 37, no. 5, pp. 29-43, 2003.
[17] L. Jiang, B. Li and M. Song, "The optimization of HDFS based on small files", Proc. 2010 3rd IEEE Int. Conf.
Broadband Network and Multimedia Technology (IC-BNMT), pp. 912-915, 2010.
[18] S. Zhuo, X. Wu, W. Zhang and W. Dou, "Distributed file system and classification for small images", Proc. 2013 IEEE Int. Conf. Green Computing and Communications and IEEE Internet of Things and IEEE Cyber Physical and Social Computing, pp. 2231-2234, 2013.
[19] H. Che and H. Zhang, "Exploiting FastDFS client-based small file merging", Proc. 2016 Int. Conf. Artificial Intelligence and Engineering Applications, pp. 242-246, 2016.
[20] Z. Ullah, S. Jabbar, M. H. Bin Tariq Alvi and A. Ahmad, "Analytical study on performance challenges and future considerations of Google file system", Int. J. Computer Communicat. Eng., vol. 3, no. 4, pp. 279-284, 2014.
[21] https://medium.com/@ayeshwery/architectures-in-distributed-system-b2ace2fca6bb
[22] Tanenbaum, A. S. and Van Steen, M., "Distributed Systems: Principles and Paradigms", 2017.
[23] Chandra, T. D. and Toueg, S., "Unreliable Failure Detectors for Reliable Distributed Systems", 1996.
[24] https://medium.com/@mani.saksham12/raft-and-paxos-consensus-algorithms-for-distributed-systems-138cd7c2d35a
[25] Ongaro, D. and Ousterhout, J., "In Search of an Understandable Consensus Algorithm", 2014.
[26] https://kafka.apache.org/documentation/
[27] https://medium.com/@kajol_singh/unveiling-apache-kafka-a-comprehensive-guide-to-core-concepts-and-functionality-2efd51de2b89
[28] https://bair.berkeley.edu/blog/2018/01/09/ray/
[29] Moritz, P., et al., "Ray: A distributed framework for emerging AI applications", Proc. 13th USENIX Symp. Operating Systems Design and Implementation (OSDI 18), 2018.
[30] https://www.datacamp.com/tutorial/distributed-processing-using-ray-framework-in-python
[31] Hoque, K., Hossain, M. B., Sami, A., Das, D., Kadir, A. and Rahman, M. A., "Technological trends in 5G networks for IoT-enabled smart healthcare: A review", International Journal of Science and Research Archive, vol. 12, no. 2, pp. 1399-1410, 2024.
[32] Md Shihab Uddin, "Addressing IoT Security Challenges through AI Solutions", International Journal of Computer Applications, vol. 186, no. 45, pp. 50-55, Oct 2024, doi: 10.5120/ijca2024924107.
[33] Khandoker Hoque, Md Boktiar Hossain, Denesh Das and Partha Protim Roy, "Integration of IoT in Energy Sector", International Journal of Computer Applications, vol. 186, no. 36, pp. 32-40, Aug 2024, doi: 10.5120/ijca2024923981.
[34] Md Maniruzzaman, Md Shihab Uddin, Md Boktiar Hossain and Khandoker Hoque, "Understanding COVID-19 Through Tweets using Machine Learning: A Visualization of Trends and Conversations", European Journal of Advances in Engineering and Technology, vol. 10, no. 5, pp. 108-114, 2023.
[35] Md Boktiar Hossain, Khandoker Hoque, Mohammad Atikur Rahman, Priya Podder and Deepak Gupta, "Hepatitis C Prediction Applying Different ML Classification Algorithms", International Conference on Computing and Communication Networks 2024 (ICCCNet 2024), 2024.
[36] Javed Mehedi Shamrat, F. M., Tasnim, Z., Chowdhury, T. R., Shema, R., Uddin, M. S. and Sultana, Z., "Multiple cascading algorithms to evaluate performance of face detection", in Pervasive Computing and Social Networking: Proceedings of ICPCSN 2021, pp. 89-102, Springer Singapore, 2022.
[37] Javed Mehedi Shamrat, F. M., Ghosh, P., Tasnim, Z., Khan, A. A., Uddin, M. S. and Chowdhury, T. R., "Human Face recognition using eigenface, SURF method", in Pervasive Computing and Social Networking: Proceedings of ICPCSN 2021, pp. 73-88, Springer Singapore, 2022.
[38] Kowsher, M., Tahabilder, A., Sanjid, M. Z. I., Prottasha, N. J., Uddin, M. S., Hossain, M. A. and Jilani, M. A. K., "LSTM-ANN & BiLSTM-ANN: Hybrid deep learning models for enhanced classification accuracy", Procedia Computer Science, vol. 193, pp. 131-140, 2021.
[39] Mondal, R. and Rahman, M. M., "Dynamic analysis of variable structure based sliding mode intelligent load frequency control of interconnected nonlinear conventional and renewable power system", in 2017 International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), pp. 393-400, IEEE, July 2017.
[40] Bharati, S., Rahman, M. A., Mondal, R., Podder, P., Alvi, A. A. and Mahmood, A., "Prediction of energy consumed by home appliances with the visualization of plot analysis applying different classification algorithm", in Frontiers in Intelligent Computing: Theory and Applications: Proceedings of the 7th International Conference on FICTA (2018), Volume 2, pp. 246-257, Springer Singapore, 2020.
[41] Hoque, R., Maniruzzaman, M., Michael, D. L. and Hoque, M., "Empowering blockchain with SmartNIC: Enhancing performance, security, and scalability", World Journal of Advanced Research and Reviews, vol. 22, no. 1, pp. 151-162, 2024.
[42] Amit Deb Nath, Rahmanul Hoque, Md. Masum Billah, Numair Bin Sharif and Mahmudul Hoque, "Distributed Parallel and Cloud Computing: A Review", International Journal of Computer Applications, vol. 186, no. 16, pp. 25-32, Apr 2024, doi: 10.5120/ijca2024923547.
[43] Maniruzzaman, M., Sami, A., Hoque, R. and Mandal, P., "Pneumonia prediction using deep learning in chest X-ray images", International Journal of Science and Research Archive, vol. 12, no. 1, pp. 767-773, 2024.
[44] M. S. Miah and M. S. Islam, "Big Data Analytics Architectural Data Cut off Tactics for Cyber Security and Its Implication in Digital forensic", 2022 International Conference on Futuristic Technologies (INCOFT), Belgaum, India, pp. 1-6, 2022, doi: 10.1109/INCOFT55651.2022.10094342.
[45] Obaida, M. A., Miah, M. S. and Horaira, M. A., "Random Early Discard (RED-AQM) Performance Analysis in Terms of TCP Variants and Network Parameters: Instability in High-Bandwidth-Delay Network", International Journal of Computer Applications, vol. 27, no. 8, pp. 40-44, 2011.

IJCATM : www.ijcaonline.org