
TECHNICAL AND VOCATIONAL TRAINING INSTITUTE

DEPARTMENT OF INFORMATION AND COMMUNICATION TECHNOLOGY

Distributed System Assignment #1

Name: Salim Mulugeta

ID: ETUMR/263/14

Submitted to: Prof. Ravindra Babu B.

Submission Date: February 2023

Contents

Q1. How will distributed computing systems evolve in the future? Explain briefly, citing proper references.
Q2. Write the limitations of the following technologies: a. XML b. SOA c. SOAP d. RESTful
Q3. Explain the working principles of two-phase locking and three-phase locking
Q4. Describe the essentiality of transaction management and explain how transaction management works in distributed platforms
Q5. What different simulation/emulation frameworks are available for distributed computing platforms? Compare and contrast their capabilities
Q6. Write different approaches to achieve high performance in distributed environments
Q7. Explain major distributed platform areas and their algorithms' strengths and weaknesses
Q8. Write different clock synchronization and leadership algorithms for distributed platforms
Q9. Write a short note on the evolution of distributed programming with proper references
Q10. Describe the current limitations and strengths of leading distributed computing platforms with proper references

Q1. How will distributed computing systems evolve in the future? Explain briefly, citing proper references.
ANS:

A distributed system is a collection of multiple autonomous computing elements that appears to
its users as a single coherent system [1]. The ultimate goal of distributed computing is to
maximize performance by connecting users and IT resources in a cost-effective, transparent and
reliable manner.

The present computing paradigm does not scale well because it depends on shared memory, yet
most physical systems work by message passing, so making progress means convincing people to
give up one model for the other. The barrier can be lowered by making simultaneous
multiprocessing work as a programming paradigm on top of message passing.

Because of rapid advances in computer hardware, software, the web, sensor networks, mobile
device communication, and multimedia technologies, distributed computing systems have
evolved radically to support a growing range of applications with better quality of service
and lower cost, particularly those involving human factors [2]. Besides reliability, performance
and availability, many other attributes, such as security, privacy, trustworthiness, situation
awareness, flexibility and rapid development of various applications, have also become important.
Distributed computing systems will continue to serve these needs and evolve in the long run.

With the rapid development of various emerging distributed computing technologies such as Web
services, Grid computing, and Cloud computing, computer networks are becoming an integral part
of next-generation distributed computing systems. Therefore, the integration of networking and
distributed computing systems becomes an important research problem for building the next-
generation high-performance distributed information infrastructure.

In the near future, distributed application frameworks will support mobile code, multimedia data
streams, user and device mobility, and spontaneous networking [3].

Looking further into the future, essential techniques of distributed systems will be incorporated
into an emerging new area, envisioning billions of communicating smart devices forming a world-
wide distributed computing system several orders of magnitude larger than today's Internet.
Q2. Write the limitations of the following technologies: a. XML b. SOA c. SOAP d. RESTful
ANS:

Here are some limitations of the following technologies:

a. XML:

Parsing and processing XML documents can be computationally expensive and time-consuming,
especially for large documents.

XML is verbose and can lead to larger file sizes, which can affect network transfer times and
storage requirements.

The complexity of XML can make it difficult to read and understand, which can increase the
likelihood of errors and decrease developer productivity.
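To make the verbosity point concrete, a small Python sketch can compare the serialized size of the same record in XML and in a terser format such as JSON (the record and field names here are illustrative, not taken from any real system):

```python
import json
import xml.etree.ElementTree as ET

# The same record serialized both ways (illustrative data).
record = {"id": 42, "name": "Alice", "email": "alice@example.com"}

# Build the XML equivalent: one child element per field.
xml_root = ET.Element("user")
for key, value in record.items():
    child = ET.SubElement(xml_root, key)
    child.text = str(value)
xml_bytes = ET.tostring(xml_root)

json_bytes = json.dumps(record).encode()

# XML carries an opening and closing tag for every field, so it is larger.
print(len(xml_bytes), len(json_bytes))
```

The gap grows with the number of fields, since every field pays for two tags in XML but only one quoted key in JSON.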

b. SOA:

SOA is a complex and highly configurable architecture that can be difficult to design and
implement correctly, leading to higher development and maintenance costs.

The decoupling of services in SOA can lead to increased network traffic, which can affect system
performance.

The service-oriented nature of SOA can lead to a proliferation of services, which can become
difficult to manage and govern.

c. SOAP:

SOAP messages can be verbose and can lead to larger file sizes, which can affect network transfer
times and storage requirements.

SOAP can be slower than other communication protocols, such as REST, due to its use of XML
and additional processing requirements.

SOAP is tightly coupled and can be difficult to modify once deployed, which can affect system
agility and flexibility.

d. RESTful:

RESTful services can be less secure than SOAP services: REST itself provides no built-in,
message-level security features and relies on transport-level mechanisms such as HTTPS.

RESTful services can be less reliable than SOAP services, as they rely on the statelessness of
HTTP, which can be affected by network errors and server failures.

The lack of a standardized approach to RESTful service design and documentation can lead to
inconsistencies and difficulties in service discovery and integration.

It is important to note that these technologies have their own strengths and are widely used in
various applications despite their limitations. Therefore, it is important to carefully consider the
specific requirements of an application before selecting a technology to use.

Q3. Explain the working principles of two-phase locking and three-phase locking

ANS:

Two-phase locking (2PL) and three-phase locking (3PL) are concurrency control protocols used
in database management systems to ensure transaction atomicity, consistency, isolation, and
durability (ACID properties). They are used to prevent conflicts between transactions that may try
to access the same data simultaneously.

Here are the working principles of 2PL and 3PL:

Two-phase locking (2PL): In two-phase locking, a transaction acquires all the locks it needs
before it performs any modifications to the data. The protocol consists of two phases: the growing
phase and the shrinking phase.

Growing phase: During the growing phase, a transaction acquires locks on the data items it needs
to access. Once a lock is acquired, it cannot be released until the end of the transaction.

Shrinking phase: During the shrinking phase, a transaction releases the locks it has acquired after
it has completed its modifications to the data. Once a lock is released, it cannot be reacquired.

The two-phase locking protocol guarantees serializability, meaning that the transactions are
executed in a way that produces the same result as if they were executed serially, one after the
other.
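The growing/shrinking discipline can be sketched in a few lines of Python. The `Transaction2PL` class below is a hypothetical illustration, not any particular DBMS's API: it simply enforces the rule that no lock may be acquired after the first release.

```python
# Minimal sketch of two-phase locking for a single transaction.
class Transaction2PL:
    def __init__(self):
        self.locks = set()
        self.shrinking = False  # becomes True after the first release

    def acquire(self, item):
        # Growing phase: locks may only be acquired before any release.
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after a release")
        self.locks.add(item)

    def release(self, item):
        # Shrinking phase: once we release, no more acquires are allowed.
        self.shrinking = True
        self.locks.discard(item)

t = Transaction2PL()
t.acquire("A")
t.acquire("B")       # growing phase
t.release("A")       # shrinking phase begins
try:
    t.acquire("C")   # illegal under 2PL
except RuntimeError as e:
    print(e)
```

A real lock manager would additionally distinguish shared from exclusive locks and block conflicting transactions, but the phase rule above is the core of the protocol.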

Three-phase locking (3PL): Three-phase locking is an extension of two-phase locking that
adds an additional phase, the validation phase, to prevent deadlock. The protocol consists of three
phases: the growing phase, the validation phase, and the shrinking phase.

Growing phase: During the growing phase, a transaction acquires locks on the data items it needs
to access.

Validation phase: During the validation phase, a transaction checks if it can acquire all the locks
it needs to complete its modifications. If it cannot, it releases all the locks it has acquired and starts
again from the beginning of the growing phase.

Shrinking phase: During the shrinking phase, a transaction releases the locks it has acquired after
it has completed its modifications to the data.

The three-phase locking protocol ensures strict two-phase locking, meaning that a transaction does
not release any locks until the end of the transaction. It also prevents deadlocks by releasing all
locks if a transaction cannot acquire all the locks it needs during the validation phase.

In summary, 2PL and 3PL are concurrency control protocols that ensure the atomicity, consistency,
isolation, and durability of transactions in a database management system. Two-phase locking
acquires all the locks before modifications and releases them all at the end, while three-phase
locking adds a validation phase to prevent deadlock.

Q4. Describe the essentiality of transaction management and explain how transaction management works in distributed platforms.

ANS:

Transaction management is a critical aspect of distributed computing as it ensures data consistency
and integrity when multiple systems interact with each other. In distributed systems, transactions
may span multiple nodes, and failures in any of the participating nodes can cause data
inconsistencies. Transaction management provides a mechanism for coordinating and controlling
transactions, ensuring that they are atomic, consistent, isolated, and durable (ACID).

ACID properties ensure that transactions are executed reliably and that data integrity is maintained
even in the event of failures or conflicts. The following are the essential properties of a transaction:

Atomicity: A transaction should be treated as a single, indivisible unit of work. Either all of the
changes in the transaction are committed, or none of them are committed.

Consistency: A transaction should ensure that the database is in a consistent state both before and
after the transaction is executed.

Isolation: Transactions should be executed independently of each other. Changes made by one
transaction should not affect other transactions.

Durability: Once a transaction is committed, its changes should persist even in the event of a
system failure.

In distributed systems, transaction management works by using a two-phase commit protocol
(2PC). The 2PC protocol coordinates the commit or rollback of a distributed transaction across
multiple nodes. Here's how it works:

The transaction coordinator (TC) sends a prepare message to all the nodes participating in the
transaction, asking them to prepare to commit the transaction.

Each node checks whether it can commit the transaction. If it can, it sends an agreement message
to the TC. If it cannot, it sends an abort message.

The TC collects all the agreement messages from the nodes. If it receives an abort message from
any node, it sends a rollback message to all the nodes, instructing them to abort the transaction.

If the TC receives agreement messages from all the nodes, it sends a commit message to all the
nodes, instructing them to commit the transaction.

This protocol ensures that all the nodes participating in the transaction agree to commit the
transaction before any changes are made, ensuring consistency and durability across the distributed
system.
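The message flow of two-phase commit can be sketched as follows. This is a minimal in-process model with hypothetical names (`Participant`, `two_phase_commit`): real participants would be remote nodes, and the prepare/commit/abort calls would be network messages with timeouts and logging.

```python
# Sketch of the two-phase commit decision logic.
class Participant:
    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        return self.can_commit  # vote: True = agreement, False = abort

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1: the coordinator asks every node to prepare and collects votes.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every node voted yes; otherwise roll back all.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"

nodes = [Participant("n1", True), Participant("n2", True)]
print(two_phase_commit(nodes))                       # committed
print(two_phase_commit([Participant("n3", False)]))  # aborted
```

Note how a single "no" vote forces every participant to roll back: that is exactly the all-or-nothing property the protocol exists to provide.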

In summary, transaction management is essential for ensuring data consistency and integrity in
distributed systems. The two-phase commit protocol provides a mechanism for coordinating and
controlling transactions, ensuring that they are executed reliably and that data integrity is
maintained even in the event of failures or conflicts.

Q5. What different simulation/emulation frameworks are available for distributed computing platforms? Compare and contrast their capabilities.

ANS:

There are several simulation and emulation frameworks available for distributed computing
platforms that allow for the testing, comparison, and optimization of various distributed computing
capabilities. Here are a few examples of such frameworks:

SimGrid: SimGrid is a simulation framework for distributed systems and applications. It provides
a platform for simulating the behavior of distributed computing systems in a controlled
environment. SimGrid can simulate different network topologies, communication protocols, and
application behaviors to analyze and optimize system performance.

CloudSim: CloudSim is a simulation framework for modeling and simulating cloud computing
environments. It provides a platform for simulating various cloud computing scenarios, including
infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS)
models. CloudSim can help to analyze and optimize the performance, energy consumption, and
cost-effectiveness of cloud computing environments.

GridSim: GridSim is a simulation framework for modeling and simulating grid computing
environments. It provides a platform for simulating various grid computing scenarios, including
job scheduling, data management, and resource allocation. GridSim can help to analyze and
optimize the performance and efficiency of grid computing environments.

Distem: Distem is an emulation framework for distributed systems and applications. It allows for
the creation of a virtual testbed for running and testing distributed computing applications. Distem
can simulate different network topologies, communication protocols, and application behaviors to
analyze and optimize system performance.

Shadow: Shadow is an emulation framework for network systems and applications. It provides a
platform for running and testing distributed systems and applications in a realistic environment.
Shadow can simulate various network scenarios, including different network topologies, link
delays, packet losses, and congestion, to analyze and optimize the performance of distributed
systems and applications.

These simulation and emulation frameworks can help developers and researchers to analyze,
compare, and optimize the performance of various distributed computing platforms and
applications. By simulating and emulating various scenarios, these frameworks can help to identify
potential issues and bottlenecks, and optimize the performance and efficiency of distributed
computing systems and applications.

Q6. Write different approaches to achieve high performance in distributed environments.

ANS:

Achieving high performance in distributed environments requires careful consideration of various
factors such as communication, load balancing, fault tolerance, scalability, and latency. Here are
some different approaches that can be used to achieve high performance in distributed
environments:

Distributed Computing Frameworks: Distributed computing frameworks like Apache Hadoop,
Apache Spark, and Apache Flink provide a platform for running large-scale data processing
applications in a distributed environment. These frameworks distribute the data and processing
workload across multiple machines, which can significantly improve performance.

Load Balancing: Load balancing is the process of distributing workloads across multiple
machines in a way that ensures no machine is overloaded. This approach can help to improve
performance by making sure that resources are being used efficiently.

Caching: Caching involves storing frequently accessed data in memory, which can help to reduce
the number of times the data needs to be retrieved from disk. This approach can help to improve
performance by reducing latency.

Parallelism: Parallelism is the use of multiple threads or processes to perform a task
simultaneously. This approach can help to improve performance by reducing the time it takes to
complete a task.

Data Partitioning: Data partitioning involves dividing a dataset into smaller, more manageable
pieces that can be processed in parallel. This approach can help to improve performance by
reducing the amount of data that needs to be processed at any one time.
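Data partitioning and parallelism are often combined: split the dataset into chunks, process each chunk in its own worker, then merge the partial results. A minimal sketch using only the Python standard library (the numbers are illustrative):

```python
# Partition a dataset and sum each partition in a separate worker.
from concurrent.futures import ThreadPoolExecutor

def partition(data, n_parts):
    # Split `data` into n_parts chunks of roughly equal size.
    size = (len(data) + n_parts - 1) // n_parts
    return [data[i:i + size] for i in range(0, len(data), size)]

data = list(range(1, 101))      # 1..100
chunks = partition(data, 4)      # four partitions of 25 items each

# Each worker handles one partition; the results are then aggregated.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(sum, chunks))

total = sum(partial_sums)
print(total)  # 5050
```

In a real distributed system the workers would be separate machines and the merge step a reduce phase, but the partition-process-aggregate shape is the same.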

Replication: Replication involves duplicating data across multiple machines. This approach can
help to improve performance by reducing the time it takes to access data, as data can be retrieved
from the nearest machine.

Message Queuing: Message queuing involves sending messages between machines using a
queuing system. This approach can help to improve performance by reducing the time it takes to
process messages, as messages can be processed asynchronously.

Distributed Database Management Systems: Distributed database management systems like
Apache Cassandra and Apache HBase provide a platform for storing and processing large datasets
across multiple machines. These systems can help to improve performance by distributing data
and processing workload across multiple machines.

Containerization: Containerization involves packaging an application and its dependencies into
a container that can be deployed on any machine. This approach can help to improve performance
by making it easier to deploy and scale applications in a distributed environment.

These approaches can be combined and customized based on the specific requirements of a
distributed system to achieve high performance.

Q7. Explain major distributed platform areas and their algorithms' strengths and weaknesses.

ANS:

Distributed platforms are designed to support the execution of complex distributed applications
across multiple nodes in a network. They typically provide a set of services and APIs to enable the
development of distributed applications that can leverage the underlying infrastructure's
processing and storage capabilities. Here are some major areas of distributed platforms and their
algorithm strengths and weaknesses:

Distributed Storage:

Distributed storage systems are designed to store and manage large amounts of data across multiple
nodes in a network. These systems typically use algorithms such as distributed hash tables (DHTs)
and gossip protocols to manage data distribution, replication, and consistency.

Strengths:

High availability and fault tolerance: Data is distributed across multiple nodes, making it highly
available and resilient to node failures.

Scalability: The storage capacity can be easily increased by adding more nodes to the network.

Low latency: Data can be accessed quickly from the node closest to the user.

Weaknesses:

Consistency: Consistency can be a challenge in distributed storage systems, especially in the
presence of concurrent updates.

Complexity: Managing a distributed storage system can be complex due to the need for data
distribution, replication, and consistency.
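The distributed hash tables mentioned above typically rely on consistent hashing: keys and nodes hash onto a ring, and each key is owned by the first node clockwise from it. A minimal sketch (hypothetical node names, no virtual nodes or replication):

```python
import bisect
import hashlib

def h(value):
    # Hash a string to a large integer position on the ring.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def owner(self, key):
        # Find the first node clockwise from the key's position.
        hashes = [hv for hv, _ in self.ring]
        idx = bisect.bisect(hashes, h(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.owner("user:42"))

# Adding a node only remaps the keys that fall on the new node's arc,
# not the whole key space:
bigger = HashRing(["node-a", "node-b", "node-c", "node-d"])
moved = sum(1 for k in (f"key{i}" for i in range(1000))
            if ring.owner(k) != bigger.owner(k))
print(moved, "of 1000 keys moved")
```

This locality under membership change is what makes DHT-based storage scalable: growing the cluster does not force a full data reshuffle.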

Distributed Computing:

Distributed computing systems are designed to distribute computational tasks across multiple
nodes in a network. These systems typically use programming models and frameworks such as
MapReduce, Apache Spark, and Hadoop to distribute tasks, process data, and aggregate results.

Strengths:

Scalability: Computational resources can be easily scaled up or down by adding or removing nodes
from the network.

Fault tolerance: Distributed computing systems can continue to function even if some nodes fail.

Efficiency: Distributed computing systems can process large amounts of data in a relatively short
amount of time.

Weaknesses:

Overhead: The overhead of data transfer and coordination between nodes can affect performance.

Complexity: Developing and managing distributed computing systems can be complex due to the
need for distributed data processing, task coordination, and error handling.

Distributed Messaging:

Distributed messaging systems are designed to enable messaging and event-driven communication
between nodes in a network. These systems typically use algorithms such as publish/subscribe and
message queuing to route messages and events between nodes.

Strengths:

Scalability: Messaging systems can handle large volumes of messages and events.

Decoupling: Messaging systems enable loose coupling between components in a distributed
system, making it easier to maintain and update the system.

Asynchronous: Messaging systems can process messages and events asynchronously, allowing
for more efficient use of computational resources.

Weaknesses:

Ordering: Ensuring message ordering can be a challenge in distributed messaging systems.

Reliability: Messaging systems can be affected by network latency and failures, which can affect
reliability.

In summary, distributed platforms are designed to provide a set of services and APIs to enable the
development of distributed applications that can leverage the underlying infrastructure's
processing and storage capabilities. Each of the major areas of distributed platforms has its own
algorithm strengths and weaknesses, which must be carefully considered when designing and
implementing distributed applications.

Q8. Write different clock synchronization and leadership algorithms for distributed
platforms.
ANS:

Clock synchronization for distributed platforms

A distributed system is a collection of computers connected via a high-speed
communication network. In a distributed system, the hardware and software
components communicate and coordinate their actions by message passing. Each node in
a distributed system can share its resources with other nodes, so proper allocation of
resources is needed to preserve the state of resources and coordinate the several
processes. To resolve such conflicts, synchronization is used. Synchronization in
distributed systems is achieved via clocks.
Physical clocks are used to adjust the time of nodes. Each node in the system can
share its local time with other nodes. The time is set based on UTC
(Coordinated Universal Time), which is used as a reference clock for the nodes in
the system.

Clock synchronization can be achieved in two ways: external and internal clock
synchronization.

1. External clock synchronization is the one in which an external reference clock is
present. It is used as a reference and the nodes in the system can set and adjust their
time accordingly.
2. Internal clock synchronization is the one in which each node shares its time with other
nodes and all the nodes set and adjust their times accordingly.
There are 2 types of clock synchronization algorithms: Centralized and Distributed.

1. Centralized is the one in which a time server is used as a reference. The single time
server propagates its time to the nodes and all the nodes adjust their time accordingly.
It is dependent on a single time server, so if that node fails, the whole system will lose
synchronization. Examples of centralized algorithms are the Berkeley Algorithm, Passive
Time Server, Active Time Server, etc.

2. Distributed is the one in which there is no centralized time server present. Instead the
nodes adjust their time by using their local time and then taking the average of the
differences of time with other nodes. Distributed algorithms overcome the issues of
centralized algorithms, like scalability and single point of failure. Examples of
distributed algorithms are the Global Averaging Algorithm, Localized Averaging
Algorithm, NTP (Network Time Protocol), etc.
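Internal synchronization in the style of the Berkeley Algorithm can be sketched as below. Plain numbers stand in for clock readings, and `berkeley_sync` is an illustrative name; a real implementation must also compensate for message delay when polling the clocks.

```python
# Berkeley-style internal synchronization: the master polls every node's
# clock, averages the readings, and sends back a per-node correction.
def berkeley_sync(master_time, node_times):
    clocks = [master_time] + node_times
    avg = sum(clocks) / len(clocks)
    # Each clock (master included) is told how much to adjust by.
    return [avg - t for t in clocks]

adjustments = berkeley_sync(10.0, [9.0, 11.0, 12.0])
print(adjustments)  # [0.5, 1.5, -0.5, -1.5]
```

Sending relative adjustments rather than the average itself keeps the scheme robust to the network delay between the poll and the correction.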
Leadership algorithms for distributed platforms

Many distributed election algorithms have been proposed to resolve the problem of leader election.
Among all the existing algorithms, the most prominent are as follows:

 Bully Algorithm, presented by Garcia-Molina in 1982.
 Improved Bully Election Algorithm in Distributed Systems, presented by A. Arghavani in
2011.
 Modified Bully Election Algorithm in Distributed Systems, presented by M. S. Kordafshari
and group.
 Ring Algorithm
 Modified Ring Algorithm

BULLY ALGORITHM

The Bully Algorithm is one of the most prominent election algorithms; it was presented by
Garcia-Molina in 1982.

Disadvantages: The Bully algorithm has the following disadvantages.

 It requires that every process know the identity of every other process in the system, so
it takes a very large amount of space.

 It has a high number of messages passed during communication, which creates heavy
traffic; the message passing has order O(n²).
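The core decision of the Bully algorithm can be sketched as below. Failure detection is abstracted into an `alive_ids` list (an illustrative stand-in); a real implementation exchanges ELECTION, OK, and COORDINATOR messages over the network with timeouts.

```python
# Minimal sketch of a Bully election: the initiator challenges every
# process with a higher ID; if none is alive, it wins, otherwise the
# highest alive process becomes coordinator.
def bully_election(initiator, alive_ids):
    higher = [pid for pid in alive_ids if pid > initiator]
    if not higher:
        return initiator      # nobody outranks us: we are the coordinator
    # Every higher process "bullies" us; the highest alive one wins in turn.
    return max(higher)

# Processes 1..5 exist, but 5 (the old coordinator) has crashed.
alive = [1, 2, 3, 4]
print(bully_election(2, alive))  # 4
```

The O(n²) message cost criticized above comes from the fact that, in the worst case, every process challenges every higher process during the same election.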

IMPROVED BULLY ELECTION ALGORITHM

This algorithm overcomes the disadvantages of the original Bully algorithm. Its main concept is
that the algorithm declares the new coordinator before the actual or current coordinator has crashed.

Disadvantages

 It has a complex structure.

 Every process must update its database each time.

 A large database is required to maintain the information of each process in the database of
every process.

MODIFIED ELECTION ALGORITHM

This algorithm resolves the disadvantages of the Bully algorithm.

Disadvantages

 The modified algorithm is also time bounded.

 It is better than Bully but still has O(n²) complexity in the worst case.

 It is necessary for every process to know the priority of the others.

RING ALGORITHM

This election algorithm is based on the use of a ring. We assume that the processes are physically
or logically ordered, so that each process knows who its successor is [4].

Q9. Write a short note on the evolution of distributed programming with proper references.

ANS:

This note traces the history of distributed computing systems from the mainframe era to the
current day. It is important to understand the history of anything in order to track how far we
have progressed. Distributed computing is all about the evolution from centralization to
decentralization; it depicts how centralized systems evolved over time towards decentralization.
We had centralized systems like the mainframe as early as 1955, but now we use decentralized
systems like edge computing and containers.

1. Mainframe: In the early years of computing, between 1960 and 1967, mainframe-based
computing machines were considered the best solution for processing large-scale data, as they
provided time-sharing to local clients who interacted with teletype terminals. This type of system
conceptualized the client-server architecture: the client connects to and requests the server, the
server processes these requests, and a single time-sharing system can thereby send multiple
resources over a single medium amongst clients. The major drawback was that it was quite
expensive, and that led to the innovation of early disk-based storage and transistor memory.
2. Cluster Networks: In the early 1970s came the development of packet switching, and cluster
computing emerged as an alternative to mainframe systems, although it was still expensive. In
cluster computing, the underlying hardware consists of a collection of similar workstations or
PCs, closely connected by means of a high-speed local-area network, where each node runs the
same operating system.
3. Internet & PCs: During this era, the evolution of the Internet took place. New technology
such as TCP/IP had begun to transform the Internet into several connected networks, linking
local networks to the wider Internet. The number of hosts connected to the network began
to grow rapidly, so centralized naming systems such as HOSTS.TXT could not provide
scalability.
4. World Wide Web: During the 1980s-1990s, the creation of the HyperText Transfer Protocol
(HTTP) and HyperText Markup Language (HTML) resulted in the first web browsers, websites,
and web servers. The web was developed by Tim Berners-Lee at CERN. Standardization of
TCP/IP provided the infrastructure for the interconnected network of networks known as the
World Wide Web (WWW). This led to tremendous growth in the number of hosts connected to
the Internet. As the number of PC-based application programs running on independent machines
grew, communication between such programs became extremely complex, adding a growing
challenge in application-to-application interaction. With the advent of network computing, which
enables remote procedure calls (RPCs) over TCP/IP, RPC turned out to be a widely accepted way
for application software to communicate.
5. P2P, Grids & Web Services: Peer-to-peer (P2P) computing or networking is a distributed
application architecture that partitions tasks or workloads between peers without the requirement
of a central coordinator. Peers share equal privileges; in a P2P network, each client acts as both a
client and a server. P2P file sharing was introduced in 1999 when American college student
Shawn Fanning created the music-sharing service Napster. P2P networking enables a
decentralized internet. With the introduction of grid computing, multiple tasks can be completed
by computers jointly connected over a network. It makes use of a data grid, i.e., a set of
computers that can directly interact with each other to perform similar tasks by using
middleware. During 1994-2000, we also saw the creation of effective x86 virtualization. With the
introduction of web services, platform-independent communication was established using
XML-based information exchange systems that use the Internet for direct
application-to-application interaction. Through web services, Java can talk with Perl, and
Windows applications can talk with Unix applications. Peer-to-peer networks are often created
by collections of 12 or fewer machines.
6. Cloud, Mobile & IoT: Cloud computing came up with the convergence of cluster technology,
virtualization, and middleware. Through cloud computing, you can manage your resources and
applications online over the internet without explicitly building them on your own hard drive or
server. The major advantage is that they can be accessed by anyone from anywhere in the world.
Many cloud providers offer subscription-based services; after paying for a subscription,
customers can access all the computing resources they need. Customers no longer need to update
outdated servers, buy hard drives when they run out of storage, install software updates, or buy
software licenses; the vendor does all that for them. Mobile computing allows us to transmit
data, such as voice and video, over a wireless network; we no longer need to connect our mobile
phones to switches.

The evolution of Application Programming Interface (API) based communication over the REST
model was driven by the need for scalability, flexibility, portability, caching, and security.
Instead of implementing these capabilities separately in each and every API, a common component
was needed to apply them on top of the API. This requirement led to the evolution of API
management platforms, and today such a platform is one of the core features of any distributed
system. In parallel, rather than treating one machine as a single computer, the idea of running
multiple systems within one computer came into existence.
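To make the "common component on top of the API" idea concrete, here is a minimal sketch of one such cross-cutting feature — response caching — applied by a wrapper rather than re-implemented inside each endpoint. The decorator and endpoint names are hypothetical, not any particular product's API.

```python
import functools

def api_gateway(func):
    # Apply a cross-cutting concern (here: response caching) to any
    # endpoint, instead of re-implementing it inside every endpoint.
    cache = {}
    stats = {"misses": 0}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            stats["misses"] += 1      # backend actually invoked
            cache[args] = func(*args)
        return cache[args]

    wrapper.misses = stats
    return wrapper

@api_gateway
def get_user(user_id):
    # Stand-in for a real backend call.
    return {"id": user_id, "name": f"user-{user_id}"}

print(get_user(7))      # backend hit
print(get_user(7))      # served from the gateway cache
print(get_user.misses)  # {'misses': 1}
```

The same wrapping pattern generalizes to the other concerns listed above — rate limiting, authentication, logging — which is why factoring them into one management layer beats duplicating them per API.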

7. Fog and Edge Computing: When the data produced by mobile computing and IoT services started
to grow tremendously, collecting and processing millions of data points in real time was still
an issue. This led to the concept of edge computing, in which client data is processed at the
periphery of the network; it is essentially a matter of location. Instead of moving data across
a WAN such as the internet to a centralized data center, which can cause latency issues, the
data is processed and analyzed closer to the point where it is created, such as on a corporate
LAN. Fog computing greatly reduces the need for bandwidth by not sending every bit of
information over cloud channels, and instead aggregating it at certain access points. This type
of distributed strategy lowers costs and improves efficiency; companies like IBM are a driving
force behind fog computing. The composition of fog and edge computing further extends the cloud
computing model away from centralized stakeholders to decentralized multi-stakeholder systems
capable of providing ultra-low service response times and increased aggregate bandwidth. Today,
distributed systems are programmed by application programmers while the underlying
infrastructure is managed by a cloud provider. This is the current state of distributed
computing, and it keeps on evolving.
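The bandwidth argument can be made concrete with a toy sketch: an edge node aggregates raw sensor readings locally and forwards only a small summary upstream. All names and numbers here are invented for illustration.

```python
def edge_summarize(readings):
    # Aggregate raw readings at the edge; only this summary crosses the WAN.
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / len(readings),
    }

# 1,000 raw samples stay on the local network; four numbers go to the cloud.
raw = [20.0 + (i % 10) * 0.1 for i in range(1000)]
print(edge_summarize(raw))
```

Shipping four aggregates instead of a thousand samples is the "aggregating it at certain access points" strategy in miniature: less WAN traffic, lower cost, and the latency-sensitive processing stays near where the data is created.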

Q10. Describe the current limitations and Strengths of leading distributed computing
platforms with proper references.
ANS:

Distributed computing platforms have become increasingly popular in recent years due to their
ability to handle large-scale and complex computations. Here are the current limitations and
strengths of some leading distributed computing platforms:

• Apache Hadoop: Strengths: Hadoop is widely used for processing large volumes of data in a
fault-tolerant and scalable manner. It supports a variety of data sources and offers a flexible
data processing framework. Hadoop is well-supported by a large community and offers a
variety of tools for data analytics.

Limitations: Hadoop has a high latency due to its reliance on disk I/O and can suffer from
performance issues when processing small files. It is not designed for real-time data processing
and can be complex to set up and manage.
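Hadoop's processing model can be illustrated independently of the framework itself: a tiny in-process map/shuffle/reduce over a few lines of text. This sketch only mimics the dataflow; a real job distributes these phases across many machines.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit (word, 1) pairs, as a mapper task would.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group values by key before they reach the reducers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big cluster", "big cluster"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 3, 'data': 1, 'cluster': 2}
```

The disk I/O mentioned above enters because, in Hadoop proper, the output of each phase is materialized to disk between steps — which is precisely the latency cost that in-memory engines like Spark were built to avoid.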

• Apache Spark: Strengths: Spark is a high-performance data processing engine that can handle
both batch and real-time processing. It supports a variety of data sources and offers a flexible
programming model. Spark is well-supported by a large community and offers a variety of
tools for data analytics.

Limitations: Spark can be memory-intensive and requires a significant amount of resources to run.
Datasets that exceed the cluster's available memory can degrade its performance, and its
real-time processing capabilities are more limited than those of dedicated streaming platforms.

• Apache Flink: Strengths: Flink is a high-performance data processing engine that offers both
batch and stream processing capabilities. It is designed for low-latency data processing and
offers a flexible programming model. Flink is well-suited for complex event processing and
real-time analytics.

Limitations: Flink is relatively new compared to other distributed computing platforms and may
not have as large a community or ecosystem of tools. It may also require more expertise to set
up and manage than other platforms.

• Apache Kafka: Strengths: Kafka is a high-performance messaging system that can handle
large volumes of data streams. It offers low-latency and high-throughput data processing and
is well-suited for real-time data processing and event-driven architectures. Kafka is widely
used for building scalable and reliable data pipelines.

Limitations: Kafka is not a full-featured data processing engine and may require additional tools
for data processing and analytics. It may also require more expertise to set up and manage
compared to other messaging systems.
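Kafka's core abstraction — an append-only partition that consumers read by offset — can be mimicked in a few lines of plain Python. This is a toy model of the idea only, not the Kafka client API.

```python
class PartitionLog:
    # Toy model of one Kafka partition: an append-only record list plus
    # per-consumer offsets, so independent consumers replay at their own pace.

    def __init__(self):
        self.records = []
        self.offsets = {}  # consumer name -> next offset to read

    def produce(self, record):
        self.records.append(record)

    def consume(self, consumer, max_records=10):
        start = self.offsets.get(consumer, 0)
        batch = self.records[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch

log = PartitionLog()
for event in ["click", "view", "click"]:
    log.produce(event)

print(log.consume("analytics"))  # ['click', 'view', 'click']
print(log.consume("analytics"))  # [] -- this consumer is caught up
print(log.consume("billing"))    # ['click', 'view', 'click'] -- independent offset
```

Decoupled offsets are what make Kafka suitable for the event-driven pipelines described above: producers append without waiting, and each downstream system consumes at its own rate without disturbing the others.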

In summary, distributed computing platforms have strengths and limitations that should be
considered when selecting a platform for a particular use case. Apache Hadoop, Spark, Flink, and
Kafka are popular platforms with different strengths and limitations that make them suitable for
different types of data processing and analytics.

Reference
[1] M. van Steen and A. S. Tanenbaum, “A brief introduction to distributed
systems,” Computing, vol. 98, no. 10, pp. 967–1009, 2016, doi:
10.1007/s00607-016-0508-7.
[2] S. S. Yau, “Challenges and Future Trends of Distributed Computing
Systems,” pp. 758–758, 2011, doi: 10.1109/hpcc.2011.151.
[3] S. Balhara and K. Khanna, “Leader Election Algorithms in Distributed
Systems,” Int. J. Comput. Sci. Mob. Comput., vol. 3, no. 6, pp. 374–379,
2014.
[4] M. Zaharia et al., “Spark: Cluster Computing with Working Sets,” in Proc.
2nd USENIX Conf. on Hot Topics in Cloud Computing (HotCloud), 2010.
[5] P. Carbone et al., “Apache Flink: Stream and Batch Processing in a Single
Engine,” IEEE Data Engineering Bulletin, 2015.
[6] Apache Kafka (2022), “Why Use Kafka?” [Online].

Web sites
1. https://www.geeksforgeeks.org/evolution-of-distributed-computing-systems/
2. https://insights.daffodilsw.com/blog/distributed-cloud-computing-benefits-and-limitations
3. https://www.geeksforgeeks.org/limitation-of-distributed-system/
4. https://www.techtarget.com/whatis/definition/distributed-computing
5. https://www.geeksforgeeks.org/synchronization-in-distributed-systems/
