Distributed-Computing-Module-4-Important-Topics-PYQs
For more notes visit
https://rtpnotes.vercel.app
1. Explain no orphans consistency condition.
No-Orphans Consistency in Distributed Systems
2. List any three advantages of using Distributed Shared Memory.
Advantages of DSM
1. Easy Communication Between Computers
2. Single Address Space (No Need to Move Data)
3. Faster Access Using Locality of Reference
4. Cost-Effective (Cheaper Than Multiprocessor Systems)
5. No Single Memory Bottleneck
6. Virtually Unlimited Memory
7. Portable Across Different Systems
3. Differentiate between coordinated checkpointing and uncoordinated checkpointing
- A. Uncoordinated Checkpointing (Independent Checkpoints)
- B. Coordinated Checkpointing (Planned Checkpoints)
4. List the different types of Messages in Rollback Recovery.
What is Rollback Recovery?
Message Handling in Rollback Recovery
5. Discuss about the issues in implementing distributed shared memory software.
1. How should sharing work?
2. How to make sharing happen?
3. Where to keep the data copies?
4. How to find the data?
5. Too much back-and-forth!
6. Differentiate between deterministic and non-deterministic events in log based
rollback recovery.
1. Deterministic Events
2. Non-Deterministic Events
7. Show that Lamport's Bakery algorithm for shared memory mutual exclusion satisfies
the three requirements of the critical section problem.
What is Lamport's Bakery Algorithm?
Steps in the Bakery Algorithm
Step 1: Choose a ticket number
Step 2: Wait for your turn
Step 3: Enter the critical section
Step 4: Leave the critical section
The three requirements of critical section problem
1. Mutual Exclusion
2. Bounded Waiting
3. Progress
8. What are the issues in failure recovery? Illustrate with suitable examples.
Example
Now imagine the tower falls (like a system crash), and you're trying to rebuild it:
Now you’re stuck. You can’t rebuild your part correctly. You're like an orphan—left behind with
no way to move forward.
That way, even if someone disappears, the group can still rebuild the entire tower correctly. No
one is left behind.
If a process depends on a non-deterministic event, then the event's determinant must be either logged on stable storage or available locally to that process.
This ensures that no process is left with missing information after a failure, and the system
can always recover to a consistent state.
Advantages of DSM
✔ All computers see the same memory instead of handling multiple copies.
✔ No need to move data back and forth between systems.
🛠 Example: If you frequently use a book, you keep it on your desk instead of going to the
library every time.
🛠 Example: Instead of one cashier handling all customers, multiple cashiers serve
different people at once.
✔ DSM combines memory from multiple computers into one large memory space.
✔ Programs can run as if they have huge memory available.
🛠 Example: Instead of one water tank, DSM connects multiple tanks to store more water.
7. Portable Across Different Systems
✔ DSM programs work on different operating systems without modification.
✔ The interface remains the same, making it easy to develop applications.
🛠 Example: Just like Google Docs works on Windows, Mac, and Mobile, DSM works
across different computers without changes.
Types of Messages:
1. In-Transit Messages
These messages were sent but not yet received at the time of failure.
Think of them as messages "on the way" when the system crashed.
They don’t cause inconsistency and are expected to be delivered eventually.
Example: Message m1 is sent but not yet received.
2. Lost Messages
These were sent by a process, but the receiving process rolled back to a state
before receiving them.
The sender remembers the message, but the receiver doesn’t — like someone saying
something, but you forgot it after fainting.
Example: Message m1 after rollback.
3. Delayed Messages
These messages were sent, but arrived too late — either the receiver was down or
already rolled back.
The message shows up after the rollback point, so the system can’t use it directly.
Example: Messages m2 and m5.
4. Orphan Messages
These are received messages for which the send event was rolled back.
It looks like a message was received without anyone sending it, which breaks
consistency.
Example: Message H becomes an orphan when sender Pi rolls back, but receiver Pj
doesn't.
5. Duplicate Messages
These happen when messages are logged and then replayed after rollback, causing
the receiver to receive them again.
Re-sending is fine, but re-receiving without checking can cause duplicates.
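Under simple assumptions (events numbered by logical steps, and a hypothetical `classify` helper; duplicates from log replay are not modeled), the first four message types can be distinguished mechanically:

```python
def classify(send_step, recv_step, sender_cp, receiver_cp):
    """Classify a message after both processes roll back to checkpoints.

    Toy model: events are numbered by logical step; an event is undone if
    its step is later than the checkpoint its process rolled back to.
    recv_step is None when the message was never received.
    """
    send_undone = send_step > sender_cp
    recv_undone = recv_step is None or recv_step > receiver_cp
    if not send_undone and recv_step is None:
        return "in-transit"   # sent, still on the way
    if send_undone and not recv_undone:
        return "orphan"       # received, but the send was rolled back
    if not send_undone and recv_undone:
        return "lost"         # sender remembers it, receiver forgot
    if send_undone and recv_undone:
        return "undone"       # both sides rolled back: consistent
    return "normal"           # both send and receive survive

print(classify(5, None, 10, 10))  # in-transit
print(classify(12, 14, 10, 20))   # orphan
print(classify(5, 15, 10, 10))    # lost
```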
When many computers access the same data, we need clear rules.
For example:
If two computers try to change the same data at the same time, what should happen?
Should one wait? Should both be allowed?
If the rules aren’t clear, programs might behave weirdly or incorrectly. So the first
challenge is just deciding how sharing will actually work.
Let’s say you and 3 friends are working on a group project. You have one shared notebook. If
every time someone wants to read or write something, they have to come all the way to you to
look at the notebook — it’s going to be:
Very slow
Annoying if you're busy
Everyone has to wait in line!
A solution would be
Make copies of the notebook and give one to each friend. That way:
Everyone can read from their own copy whenever they want.
Things become faster and more convenient.
That’s exactly what replication (copying) does in DSM.
If we’re not copying everything everywhere (because that’s slow), we need to decide:
For example:
If you always use data that’s stored far away, everything becomes slow. So we want to keep
data closer to where it's needed most.
If there are multiple copies in different places, it gets tricky to know which one is correct or up-to-date.
If computers keep asking each other for data all the time, it leads to:
Slower performance
A lot of messages flying around
So the system needs to be smart and try to reduce how much they talk to each other. It
should make sharing smooth, without everyone shouting over the network all the time.
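The replication-plus-invalidation idea from the notebook analogy can be sketched as a toy write-invalidate protocol (`ToyDSM` is a hypothetical illustration, not any real DSM system):

```python
class ToyDSM:
    """Toy sketch of read replication with write-invalidate."""

    def __init__(self):
        self.home = {}     # the authoritative copy of every key
        self.caches = {}   # per-node local copies (the "notebook copies")

    def read(self, node, key):
        cache = self.caches.setdefault(node, {})
        if key not in cache:            # miss: fetch from the home copy
            cache[key] = self.home[key]
        return cache[key]               # hit: served locally, no messages

    def write(self, node, key, value):
        self.home[key] = value
        # Invalidate every other node's copy so nobody reads stale data.
        for other, cache in self.caches.items():
            if other != node:
                cache.pop(key, None)
        self.caches.setdefault(node, {})[key] = value

dsm = ToyDSM()
dsm.write("A", "x", 1)
print(dsm.read("B", "x"))   # 1  (B now holds a local copy)
dsm.write("A", "x", 2)      # B's stale copy is invalidated
print(dsm.read("B", "x"))   # 2  (B re-fetches the up-to-date value)
```

Reads are cheap because they hit the local copy; the cost is moved to writes, which must notify other nodes. This trade-off is why DSM systems try to keep data close to where it is read most.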
1. Deterministic Events
These are events whose outcome is fully determined by the process state and inputs, such as internal computation and message send events. They can be recreated simply by re-executing the process, so they do not need to be logged.
2. Non-Deterministic Events
These are events whose outcome cannot be predicted from the state alone, such as message receive events and interrupts. Their determinants (what happened, and in what order) must be logged so that recovery can replay them identically.
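The two kinds of events differ in what recovery must do, which can be illustrated with a small sketch (the random message order is a stand-in for genuine non-determinism):

```python
import random

# Deterministic: the result depends only on the input, so during recovery
# it can simply be re-executed; nothing needs to be logged.
def deterministic_step(x):
    return x * 2 + 1

# Non-deterministic: the outcome varies between runs (here, a simulated
# message arrival order), so its determinant must be logged the first time
# and read back from the log during recovery, never re-rolled.
def nondeterministic_step():
    return random.choice(["receive m1 first", "receive m2 first"])

assert deterministic_step(10) == deterministic_step(10)  # re-execution is safe
determinant = nondeterministic_step()   # logged at first execution
replayed = determinant                  # recovery replays the logged value
print(replayed in ("receive m1 first", "receive m2 first"))  # True
```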
Lamport's Bakery Algorithm is a mutual exclusion algorithm designed to solve the critical
section problem in distributed systems, particularly in systems with shared memory. It works
by ensuring that only one process can access the critical section (CS) at a time while respecting
the rules of mutual exclusion, bounded waiting, and progress.
The name "Bakery" comes from the analogy of a bakery where customers take a number
when they enter and are served in the order of the numbers.
In this algorithm, each process follows a set of steps to enter the critical section.
When a process wants to enter the critical section, it picks a ticket number.
This ticket number is assigned based on the maximum ticket number from all other
processes, plus one.
Essentially, the ticket number for a process i is the highest ticket number currently
assigned, plus 1.
Choosing a ticket number ensures an orderly sequence, meaning the process with the
smallest number gets to enter the CS(Critical Section) first.
Once a process has the smallest ticket number, it enters the critical section and performs its
work.
After completing its work in the critical section, the process resets its ticket number to 0.
This step is critical because it signals that the process is no longer requesting access to the
CS.
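The four steps above can be sketched with Python threads. This is a toy illustration only: the thread count N, the spin-waits, and the shared lists are assumptions of the sketch, and a real shared-memory implementation would also depend on memory-ordering guarantees.

```python
import sys, threading

sys.setswitchinterval(1e-4)   # switch threads often so busy-waits make progress

N = 2                  # number of competing threads (an assumption)
choosing = [False] * N # True while a thread is picking its ticket
number = [0] * N       # 0 means "not requesting the critical section"
counter = 0            # the shared resource the CS protects

def lock(i):
    # Step 1: take a ticket one larger than any ticket currently assigned
    choosing[i] = True
    number[i] = max(number) + 1
    choosing[i] = False
    # Step 2: wait until (ticket, id) is the lexicographically smallest pair
    for j in range(N):
        if j == i:
            continue
        while choosing[j]:                                    # j is mid-pick
            pass
        while number[j] != 0 and (number[j], j) < (number[i], i):
            pass

def unlock(i):
    number[i] = 0      # Step 4: reset the ticket, signalling the CS is free

def worker(i):
    global counter
    for _ in range(200):
        lock(i)
        counter += 1   # Step 3: the critical section
        unlock(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 400: every increment happened under mutual exclusion
```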
1. Mutual Exclusion
Mutual exclusion means that only one process can be inside the critical section at a time.
How the Bakery Algorithm Ensures Mutual Exclusion:
A process picks a ticket number greater than the current maximum ticket number, and it enters the critical section only when its (ticket number, process ID) pair is the smallest among all requesting processes.
If two processes pick the same number, the one with the smaller process ID proceeds
first. Since at most one process can hold the smallest pair at any moment, no two processes can be in the critical section together.
2. Bounded Waiting
Bounded waiting means that there is a limit on the number of times other processes can
enter the critical section before the requesting process.
When a process picks a ticket, it waits until its ticket number is the smallest. If multiple
processes have the same ticket, the one with the lower process ID will proceed first.
After a process picks a ticket, any process that requests later receives a larger ticket
number and must wait behind it.
A process can be overtaken at most once, by a process that chose its ticket concurrently
and ended up ahead in the (ticket number, process ID) ordering, so the waiting time is bounded.
Suppose process i picks a ticket, and process j picks the same ticket. The next time j
picks a ticket, its value will definitely be greater than i ’s ticket, thus i can eventually
enter the critical section.
3. Progress
Progress means that if no process is in the critical section, then a process wishing to enter
the critical section should be able to do so, provided it follows the rules.
The algorithm guarantees that, at any point in time, the process with the smallest ticket
number will eventually enter the critical section.
This is because the ticket numbers are assigned in a way that no two processes can have
conflicting orders.
Lexicographic ordering (first by ticket number and then by process ID) ensures that the
process with the smallest ticket or lexicographically smallest combination of ticket number
and process ID will always proceed first.
As soon as a process’s ticket number becomes the smallest, it enters the critical section.
Since the ticket number is always updated based on the max ticket number across all
processes, the system ensures that the next process in line will always be able to proceed
to the CS.
8. What are the issues in failure recovery? Illustrate with suitable examples.
Failure recovery involves restoring the system to a consistent state after a failure. The following
issues arise during this process:
1. Orphan Messages: These occur when a process receives a message, but the sender's
state is rolled back and it no longer has a record of sending that message.
2. Cascading Rollbacks: To remove orphan messages, other processes may also need to roll
back to earlier checkpoints, leading to a chain reaction of rollbacks.
3. Lost Messages: A message is considered lost if the sender remembers sending it (after
recovery), but the receiver has no record of receiving it due to rollback.
4. Delayed Messages: Messages that are still in transit during failure can arrive at
unpredictable times—before, during, or after recovery—causing inconsistency.
5. Delayed Orphan Messages: These messages arrive at the receiver even though their
send event has been undone due to rollback. Such messages must be discarded.
6. Duplicate Messages: If message logging is used, messages may be replayed during
recovery, which can result in duplicates if not handled properly.
7. Overlapping Failures: If multiple processes fail around the same time, it leads to
complications like amnesia and makes consistent recovery more difficult.
Example
Imagine 3 processes: Pi, Pj, and Pk.
They talk to each other by sending messages over a FIFO (First In First Out) network.
Each process takes checkpoints to save their progress:
Pi: {Ci,0, Ci,1}
Pj: {Cj,0, Cj,1, Cj,2}
Pk: {Ck,0, Ck,1}
Let’s say they exchange messages A to J.
Now, Pi fails suddenly.
When Pi fails, it rolls back to its last checkpoint, Ci,1.
Pj also needs to be rolled back, because the rollback of Pi undoes the sending of message H, which Pj has already received; otherwise H would become an orphan message.
1. Consistent State
A process shows that it received a message, and the corresponding send event is also
recorded by the sender.
Since both the send and receive events are recorded, this state is consistent.
2. Inconsistent State
A process shows that it received a message, but the corresponding send event is
missing.
This situation is impossible in a correct failure-free execution.
Key Concepts
2. Checkpoints
To reduce the work needed for recovery, processes take checkpoints at regular
intervals. A checkpoint records the current state of a process. If a failure occurs, the
system can recover from the most recent checkpoint and then use the logs to replay the
events that happened after that checkpoint.
3. Recovery After Failure
When a failure occurs, the process will recover by:
Returning to its last checkpoint.
Using the logged events to re-execute the non-deterministic events (like receiving
messages).
This ensures that the process's execution will be identical to the pre-failure
execution, avoiding any inconsistencies.
4. No-Orphans Consistency
A critical condition in rollback recovery is the no-orphans consistency condition. This
condition ensures that after a failure, no process is left in an inconsistent state, also
known as an "orphan."
Every process must have the correct log information or a stable checkpoint to avoid
such inconsistencies.
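The no-orphans condition can be expressed as a small check over a snapshot of the system (a toy model; the `satisfies_no_orphans` helper and its dictionary layout are assumptions of this sketch):

```python
def satisfies_no_orphans(events):
    """Check the no-orphans condition on a snapshot.

    events maps each non-deterministic event e to:
      stable - True if e's determinant is on stable storage
      depend - set of processes whose state depends on e
      log    - set of processes holding e's determinant in volatile memory

    The condition: if e is not stable, every process that depends on e
    must also hold its determinant, so no process becomes an orphan.
    """
    return all(
        info["stable"] or info["depend"] <= info["log"]
        for info in events.values()
    )

safe = {"e1": {"stable": True,  "depend": {"P1", "P2"}, "log": set()},
        "e2": {"stable": False, "depend": {"P2"},       "log": {"P2", "P3"}}}
risky = {"e3": {"stable": False, "depend": {"P1", "P2"}, "log": {"P1"}}}
print(satisfies_no_orphans(safe))   # True
print(satisfies_no_orphans(risky))  # False: P2 depends on e3 but cannot replay it
```

In the risky snapshot, if the process holding e3's determinant crashes, P2 depends on an event nobody can replay, which is exactly the orphan situation the condition rules out.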
1. Pessimistic Logging
Pessimistic logging assumes that failures can happen at any time. Therefore, it logs
each non-deterministic event before it happens. This ensures that if a failure occurs, the
system can always roll back to a known consistent state.
Synchronous logging is a feature of pessimistic logging, where logs are immediately
written to stable storage before proceeding. While safe, this can introduce overhead
since the system must wait for the log to be saved before continuing.
Checkpointing is also used to minimize recovery time.
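Synchronous logging as described above can be sketched like this (`PessimisticLog` is a hypothetical class, and a single append-only file standing in for stable storage is an assumption of the sketch):

```python
import json, os, tempfile

class PessimisticLog:
    """Sketch of synchronous (pessimistic) logging: each non-deterministic
    event is forced to stable storage BEFORE the process acts on it, so a
    crash can never lose a determinant the execution already depends on."""

    def __init__(self, path):
        self.path = path

    def deliver(self, event, handler):
        with open(self.path, "a") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())   # block until the log entry is durable
        handler(event)             # only now is the event processed

    def replay(self, handler):
        with open(self.path) as f:
            for line in f:
                handler(json.loads(line))

path = os.path.join(tempfile.mkdtemp(), "events.log")
log = PessimisticLog(path)
seen = []
log.deliver({"msg": "m1"}, seen.append)
log.deliver({"msg": "m2"}, seen.append)
recovered = []
log.replay(recovered.append)   # after a crash, replay in the same order
print(recovered == seen)       # True
```

The `fsync` before `handler(event)` is where the overhead mentioned above comes from: the process stalls on stable storage for every event, in exchange for a guaranteed consistent recovery.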
2. Optimistic Logging
Optimistic logging is based on the assumption that failures are rare. It logs events
asynchronously, meaning the system doesn't immediately save the logs but does so
periodically.
This reduces overhead but allows temporary inconsistencies (orphans) to appear, which
will eventually be fixed during recovery.
The system tracks dependencies between processes to ensure proper recovery in case
of failure.
3. Causal Logging
Causal logging combines the strengths of both pessimistic and optimistic logging. It
allows processes to log asynchronously but still ensures no orphan processes are
created. It tracks causal dependencies (the relationship between events) to ensure that
recovery can be done efficiently without losing consistency.
This method provides a balanced approach, reducing overhead while maintaining
consistency.
This is exactly how Checkpointing and Rollback Recovery work in distributed systems to
handle failures!
What is Checkpointing?
Checkpointing is the process of saving the state of a process at a certain point so that if a
failure occurs, it can restart from that point instead of starting over.
🛠 Example:
A bank transaction system saves its progress after every 100 transactions. If the system
crashes after 150 transactions, it restores the last saved state (100 transactions) and
reprocesses the last 50 instead of all 150.
1. Detect Failure 🚨
The system notices that a process has failed.
2. Restore Checkpoint 🔄
The system reloads the last saved checkpoint for that process.
3. Re-execute ⏩
The process continues from where it left off, avoiding major data loss.
🛠 Example:
If an airline booking system crashes after booking 50 tickets, it restores the last saved
checkpoint (e.g., after booking 40 tickets) and reprocesses the last 10 tickets.
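The restore-and-replay cycle can be sketched as follows (a toy model; keeping the operation log in memory as a stand-in for stable storage is an assumption of this sketch):

```python
class CheckpointedProcess:
    """Toy model of checkpointing: save state every `interval` operations
    and log later operations so they can be replayed after a crash."""

    def __init__(self, interval):
        self.interval = interval
        self.state = 0            # e.g. number of processed transactions
        self.ops_done = 0
        self.checkpoint = (0, 0)  # (state, ops_done) on "stable storage"
        self.log = []             # operations since the last checkpoint

    def run(self, op):
        self.log.append(op)       # log the operation, then apply it
        self.state += op
        self.ops_done += 1
        if self.ops_done % self.interval == 0:
            self.checkpoint = (self.state, self.ops_done)  # save progress
            self.log = []                                  # log restarts here

    def crash_and_recover(self):
        # 1. Restore the last checkpoint  2. Replay the logged operations
        self.state, self.ops_done = self.checkpoint
        for op in self.log:
            self.state += op
            self.ops_done += 1

p = CheckpointedProcess(interval=100)
for _ in range(150):
    p.run(1)              # 150 transactions; a checkpoint is taken at 100
p.crash_and_recover()     # restores 100, then replays the 50 logged ops
print(p.state)            # 150
```

Only the 50 operations after the checkpoint are redone, mirroring the bank and airline examples above.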
🛠 Example:
If process P1 rolls back, but it has sent data to P2, then P2 must also roll back, and so on.
This can cause the system to restart from a very old state.
🛠 Example:
An online shopping system saves a checkpoint every hour across all servers. If one server
crashes, it restores all servers to the last saved state to maintain consistency.
🛠 Example:
A cloud storage service automatically saves checkpoints when many files are being
uploaded, ensuring smooth recovery if a failure occurs.
Just like different schools have different rules for writing assignments, DSM has
different consistency rules for how data is shared.
Programmers must understand these rules to avoid errors.
🛠 Example: If two people write on the same notebook at the same time, how do we decide
whose writing is correct? DSM needs rules for such cases.
🛠 Example: Buying a ready-made suit vs. getting a tailor-made suit. DSM is like the ready-made suit: it works, but is not always a perfect fit.