PDC 1.1
Distributed System
Sharing resources such as hardware, software, and data is one of the
principles of cloud computing. Openness of the software and support for
concurrency make it easier to process data simultaneously across
multiple processors. The more fault-tolerant an application is,
the more quickly it can recover from a system failure.
1. Software architecture
i) Layered architecture
ii) Object-based architecture
2. System architecture
System-level architecture focuses on the entire system and the
placement of components of a distributed system across multiple
machines. The client-server architecture and peer-to-peer architecture
are the two major system-level architectures that hold significance today.
An example would be an ecommerce system that contains a service layer,
a database, and a web front end.
i) Client-server architecture
As the name suggests, client-server architecture consists of a client and
a server. The server is where all the work processes run, while the client
is where the user interacts with the service and other resources (on the
remote server). The client sends requests to the server, and the server
responds accordingly. Typically, only one server handles the remote side;
however, using multiple servers improves fault tolerance and availability.
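The request-response cycle described above can be sketched with Python's standard socket module. This is a minimal illustration, not a production server; the port (ephemeral), message text, and "processed:" reply format are illustrative choices, not from the original text.

```python
# A minimal client-server sketch: the server processes a request and
# responds; the client sends the request and reads the reply.
import socket
import threading

def server(sock):
    # The server accepts one client connection and responds to its request.
    conn, _ = sock.accept()
    with conn:
        request = conn.recv(1024).decode()
        conn.sendall(f"processed: {request}".encode())

# Bind the server to an ephemeral port on localhost.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=server, args=(listener,), daemon=True).start()

# The client sends a request to the server and waits for the response.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"balance query")
reply = client.recv(1024).decode()
client.close()
print(reply)  # processed: balance query
```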
Client-server Architecture
Peer-to-Peer Architecture
A pipeline system is like the modern-day assembly line setup in factories.
For example, in a car manufacturing industry, huge assembly lines are
set up, and at each point there are robotic arms to perform a certain task
before the car moves ahead to the next arm.
Types of Pipeline
It is divided into two categories:
1. Arithmetic Pipeline
2. Instruction Pipeline
Arithmetic Pipeline
Arithmetic pipelines are found in most computers. They are used for
floating-point operations, multiplication of fixed-point numbers, etc.
For example, the inputs to the floating-point adder pipeline are:
X = A*2^a
Y = B*2^b
Here A and B are mantissas (the significant digits of the floating-point
numbers), while a and b are exponents.
Floating-point addition and subtraction is done in four stages:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Produce the result.
Registers are used for storing the intermediate results between the above
operations.
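The four stages above can be sketched in Python. This is a simplified model using small numbers for the mantissas and exponents (normalization details are omitted); the variable names A, a, B, b follow the text.

```python
# A sketch of the four floating-point adder stages: compare exponents,
# align mantissas, add mantissas, produce the result.

def fp_add(A, a, B, b):
    # Stage 1: compare the exponents (swap so that a >= b).
    if a < b:
        A, a, B, b = B, b, A, a
    diff = a - b
    # Stage 2: align the mantissas by shifting the smaller operand.
    B_aligned = B / (2 ** diff)
    # Stage 3: add the mantissas.
    mantissa = A + B_aligned
    # Stage 4: produce the result X + Y = mantissa * 2^a.
    return mantissa * 2 ** a

# X = 3 * 2^2 = 12, Y = 5 * 2^1 = 10, so X + Y = 22.
print(fp_add(3, 2, 5, 1))  # 22.0
```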
Instruction Pipeline
In an instruction pipeline, the stages of executing an instruction (fetch,
decode, execute, write back) are overlapped, so several instructions can be
in different stages of execution at the same time.
Data Parallelism
Let’s take an example: summing the contents of an array of size N. On a
single-core system, one thread would simply sum the elements [0] . . . [N − 1].
On a dual-core system, however, thread A, running on core 0, could sum the
elements [0] . . . [N/2 − 1], while thread B, running on core 1, could sum the
elements [N/2] . . . [N − 1]. The two threads would then run in parallel on
separate computing cores.
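The array-sum example above can be sketched with two Python threads: each performs the same task (summing) on a different half of the array, matching the dual-core description.

```python
# Data parallelism sketch: thread A sums the first half of the array,
# thread B sums the second half, and the partial sums are combined.
import threading

N = 10
data = list(range(N))          # [0, 1, ..., 9]
partial = [0, 0]               # one slot per thread

def sum_half(idx, lo, hi):
    # Each thread performs the SAME task on a different subset of the data.
    partial[idx] = sum(data[lo:hi])

t_a = threading.Thread(target=sum_half, args=(0, 0, N // 2))
t_b = threading.Thread(target=sum_half, args=(1, N // 2, N))
t_a.start(); t_b.start()
t_a.join(); t_b.join()

total = partial[0] + partial[1]
print(total)  # 45
```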
Task Parallelism
Consider again the example above. Task parallelism might instead involve
two threads, each performing a unique statistical operation on the array of
elements. The threads still operate in parallel on separate computing
cores, but each performs a different operation.
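The task-parallel version can be sketched the same way: two threads run different operations on the same array. The choice of sum and maximum as the two statistical operations is illustrative, not from the original text.

```python
# Task parallelism sketch: two threads perform DIFFERENT operations
# (sum and maximum) on the same data.
import threading

data = [4, 1, 7, 3, 9, 2]
results = {}

def compute_sum():
    results["sum"] = sum(data)

def compute_max():
    results["max"] = max(data)

threads = [threading.Thread(target=compute_sum),
           threading.Thread(target=compute_max)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # sum: 26, max: 9
```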
The key differences between data parallelism and task parallelism are:
1. In data parallelism, the same task is performed on different subsets of
the same data; in task parallelism, different tasks are performed on the
same or different data.
2. In data parallelism the speedup is greater, since a single kind of
operation runs over all sets of data; in task parallelism the speedup is
less, since each processor executes a different thread or process on the
same or a different set of data.
1. Shared Memory
Shared memory systems provide a single, unified memory space accessible by all
processors in a computer. Imagine a whiteboard where multiple people can write
and read simultaneously.
Physically, the memory resides in a central location, accessible by all processors
through a high-bandwidth connection like a memory bus. Hardware enforces data
consistency, ensuring all processors see the same value when accessing a shared
memory location.
Fig-Shared Memory
Hardware Mechanisms for Shared Memory
Memory Bus: The shared memory resides in a central location (DRAM) and is
connected to all processors via a high-bandwidth memory bus. This bus acts as a
critical communication channel, allowing processors to fetch and store data from
the shared memory. However, with multiple processors vying for access, the bus
can become a bottleneck, limiting scalability.
Cache Coherence: To ensure all processors see the same value when accessing a
shared memory location, cache coherence protocols are implemented. These
protocols maintain consistency between the central memory and the private caches
of each processor. There are various cache coherence protocols with varying
trade-offs between performance and complexity.
Disadvantages
• Scalability: Adding processors becomes complex as the shared
memory bus becomes a bottleneck for communication.
• Limited Memory Size: The total memory capacity is restricted by the
central memory unit.
• Single Point of Failure: A hardware failure in the shared memory can
bring the entire system down.
Applications
• Multiprocessor systems designed for tight collaboration between
processes, like scientific simulations with frequent data sharing.
• Operating systems for efficient task management and resource sharing.
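The shared-memory model above can be sketched with Python threads: all threads read and write one shared counter (the "whiteboard"), and a lock stands in for the hardware consistency mechanisms described earlier. The counter and thread counts are illustrative choices.

```python
# Shared-memory sketch: four threads update one shared counter.
# The lock ensures every thread sees a consistent value, playing the
# role that cache-coherence hardware plays in a real machine.
import threading

counter = 0                    # memory visible to all threads
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:             # enforce a consistent view of the shared value
            counter += 1

workers = [threading.Thread(target=increment, args=(1000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(counter)  # 4000
```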
Distributed Memory
Distributed memory systems consist of independent processors, each with its local
private memory. There’s no single shared memory space. Communication
between processors happens explicitly by sending and receiving messages.
Fig-Distributed Memory
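Explicit message passing can be sketched as follows. Each worker keeps its data in local variables only and communicates purely by sending messages through queues, which stand in here for the network. In a real distributed-memory system the workers would be separate processes on separate machines; a thread is used only to keep the sketch self-contained.

```python
# Distributed-memory sketch: no shared variables, only explicit
# send (put) and receive (get) operations between the two sides.
import threading
from queue import Queue

def worker(inbox, outbox):
    local_chunk = inbox.get()      # receive: the only way data arrives
    local_sum = sum(local_chunk)   # compute on private, local data
    outbox.put(local_sum)          # send: the only way results leave

to_worker, from_worker = Queue(), Queue()
t = threading.Thread(target=worker, args=(to_worker, from_worker))
t.start()
to_worker.put([1, 2, 3, 4])        # explicit send to the worker
result = from_worker.get()         # explicit receive of the result
t.join()

print(result)  # 10
```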
What is a Process?
Processes are basically programs that are dispatched from the ready state and
scheduled on the CPU for execution. The PCB (Process Control Block) holds the
context of a process. A process can create other processes, which are known as
child processes. A process takes more time to terminate, and it is isolated,
meaning it does not share memory with any other process. A process can be in
the following states: new, ready, running, waiting, terminated, and suspended.
What is a Thread?
Threads are often called “lightweight processes” because they share some features
of processes but are smaller and faster. Each thread is always part of one specific
process. A thread has three states: running, ready, and blocked.
A thread takes less time to terminate than a process, but unlike processes,
threads are not isolated from one another.
Examples:
• Imagine a word processor that works with two separate tasks happening
at the same time. One task focuses on interacting with the user, like
responding to typing or scrolling, while the other works in the
background to adjust the formatting of the entire document.
For example, if you delete a sentence on page 1, the user-focused task
immediately tells the background task to reformat the entire document.
While the background task is busy reformatting, the user-focused task
continues to handle simple actions like letting you scroll through page 1
or click on things.
• When you use a web browser, threads are working behind the scenes to
handle different tasks simultaneously.
For example: One thread is loading the webpage content (text, images,
videos). Another thread is responding to your actions like scrolling,
clicking, or typing.
A separate thread might be running JavaScript to make the webpage
interactive.
This multitasking makes the browser smooth and responsive.
For instance, you can scroll through a page or type in a search bar while
the rest of the page is still loading. If threads weren’t used, the browser
would freeze and wait for one task to finish before starting another.
Threads ensure everything feels fast and seamless.
Process vs Thread
Difference Between Process and Thread
The table below represents the difference between process and thread.
• A process takes more time for creation; a thread takes less time for
creation.
• Process switching uses an interface in the operating system; thread
switching may not require operating-system involvement.
Advantages of Process
• Processes work independently in their own memory, ensuring no
interference and better security.
• Resources like CPU and memory are allocated effectively to optimize
performance.
• Processes can be prioritized to ensure important tasks get the resources
they need.
Disadvantages of Process
• Frequent switching between processes can slow down the system and
reduce speed.
• Improper resource management can cause deadlocks where processes
stop working and block progress.
• Having too many processes can make the process table take up a lot of
memory. This can also make searching or updating the table slower,
which can reduce system performance.
Advantages of Thread
• When there is a lot of computing and input/output (I/O) work, threads
help tasks run at the same time, making the app faster.
• Another advantage of threads is that, since they are lighter weight
than processes, they are easier (i.e., faster) to create and destroy.
• Many apps need to handle different tasks at the same time. For
example, a web browser can load a webpage, play a video, and let you
scroll all at once. Threads make this possible by dividing these tasks
into smaller parts that can run together.
Disadvantages of Thread
• Threads in the same process are not completely independent like
separate processes. They share the same memory space including global
variables. This means one thread can accidentally change or even erase
another thread’s data as there is no protection between them.
• Threads also share resources like files. For example – if one thread
closes a file while another is still using it, it can cause errors or
unexpected behavior.
• If too many threads are created they can slow down the system or cause
it to run out of memory.
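The first disadvantage above, that one thread can erase another thread's shared data, can be demonstrated directly. The "config" dictionary and the two thread roles are illustrative, not from the original text.

```python
# Threads share the same memory, so one thread can destroy state that
# another thread still relies on; there is no isolation between them.
import threading

config = {"mode": "editing"}       # shared global state

def careless_thread():
    # This thread erases the shared data.
    config.clear()

def reader_thread(out):
    # This thread expects the data to still be there.
    out.append(config.get("mode"))

t1 = threading.Thread(target=careless_thread)
t1.start(); t1.join()

seen = []
t2 = threading.Thread(target=reader_thread, args=(seen,))
t2.start(); t2.join()

print(seen)  # [None] -- the first thread destroyed the shared data
```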
What is the CAP Theorem?
The CAP theorem is a fundamental concept in distributed systems theory that
was first proposed by Eric Brewer in 2000 and subsequently proved by Seth
Gilbert and Nancy Lynch in 2002. It asserts that no distributed data system
can concurrently guarantee all three of the following qualities:
1. Consistency
Consistency means that all the nodes (databases) inside a network will have the
same copies of a replicated data item visible for various transactions. It
guarantees that every node in a distributed cluster returns the same, most recent,
and successful write. It refers to every client having the same view of the data.
There are various types of consistency models. Consistency in CAP refers to
sequential consistency, a very strong form of consistency.
For example, a user checks his account balance and sees that he has 500
rupees. He spends 200 rupees on some products, so 200 rupees must be
deducted, changing his account balance to 300 rupees. This change must be
committed and communicated to all other databases that hold this user’s
details. Otherwise there will be inconsistency, and another database might
show his account balance as 500 rupees, which is no longer true.
Consistency problem
2. Availability
Availability means that each read or write request for a data item will either be
processed successfully or will receive a message that the operation cannot be
completed. Every non-failing node returns a response for all the read and write
requests in a reasonable amount of time. The key word here is “every”. In
simple terms, every node (on either side of a network partition) must be able to
respond in a reasonable amount of time.
For example, user A is a content creator with 1000 users subscribed to
his channel. Another user B, who is far away from user A, tries to subscribe
to user A’s channel. Since the distance between the two users is large, they
are connected to different database nodes of the social media network. If the
distributed system follows the principle of availability, user B must be able
to subscribe to user A’s channel.
Availability problem
3. Partition Tolerance
Partition tolerance means that the system can continue operating even if the
network connecting the nodes has a fault that results in two or more partitions,
where the nodes in each partition can only communicate among each other. That
means, the system continues to function and upholds its consistency guarantees
in spite of network partitions. Network partitions are a fact of life. Distributed
systems guaranteeing partition tolerance can gracefully recover from partitions
once the partition heals.
For example, take the same social media network, where two users are trying
to find the subscriber count of a particular channel. Due to some technical
fault there is a network outage, and the second database, used by user B,
loses its connection with the first database. The subscriber count is still
shown to user B using a replica of the data that was copied from database 1
before the outage. Hence the distributed system is partition tolerant.
Partition Tolerance
The CAP theorem states that distributed databases can have at most two
of the three properties: consistency, availability, and partition
tolerance. As a result, database systems prioritize only two properties
at a time.
CAP diagram
2. AP (Availability and Partition Tolerance)
These types of systems are distributed in nature, ensuring that requests
sent by users to view or modify data in the database nodes are not
dropped and are processed even in the presence of a network partition.
The system prioritizes availability over consistency and may respond with
possibly stale data that was replicated from other nodes before the partition
was created by some technical failure. Such design choices are common when
building social media websites such as Facebook, Instagram, and Reddit,
and online content sites such as YouTube, blogs, and news sites, where
strict consistency is usually not required and unavailability is the bigger
problem: an unavailable service costs the corporation money, since users may
shift to a new platform. The system can be distributed across multiple nodes
and is designed to operate reliably even in the face of network partitions.
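The AP behavior described above can be sketched with a toy two-replica model: during a partition, node B keeps answering reads from its (possibly stale) local copy instead of refusing the request. The Replica class and its fields are illustrative, not modeled on any real database.

```python
# Toy AP sketch: two replicas; a partition stops replication, but
# the disconnected node stays available and serves stale data.
class Replica:
    def __init__(self):
        self.data = {}
        self.connected = True          # link to the other replica

    def replicate_from(self, other):
        # Replication succeeds only while both nodes are connected.
        if self.connected and other.connected:
            self.data.update(other.data)

node_a, node_b = Replica(), Replica()
node_a.data["subscribers"] = 1000
node_b.replicate_from(node_a)          # normal operation: copies in sync

node_b.connected = False               # network partition occurs
node_a.data["subscribers"] = 1001      # a new subscriber arrives on node A
node_b.replicate_from(node_a)          # replication silently fails

# Node B stays AVAILABLE but returns stale data: 1000, not 1001.
print(node_b.data["subscribers"])  # 1000
```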
Example databases: Amazon DynamoDB, Apache Cassandra.
Q7: What are Types of parallelism? Explain Data parallelism, task parallelism with block
diagram and example.
Q8: Explain Flynn’s taxonomy for parallel computing models. Also compare SIMD and
MIMD in detail.
Q10: Detail the differences between process and thread. Also discuss the concept
of multithreading.
Q11: Explain CAP Theorem. Also, discuss Trade-Offs in the CAP Theorem.