
Concurrency Control in Distributed Networks
• In today’s world of universal dependence on information systems, there is a rising need for secure, reliable, and accessible information. In today’s business environment, this has resulted in an increased need for distributed information systems. With information systems come databases, implying that distributed information systems cannot be separated from distributed databases.
DISTRIBUTED DATABASE DESIGN
• Distributed database systems are needed for applications where data and its accesses are inherently distributed, and to increase availability during failures. Prime examples are the systems for international airline reservations, financial institutions, and automated manufacturing.
• The methodology for designing distributed systems is the same as that used for centralized systems. However, some additional factors must be considered for a distributed system:
A. Data Fragmentation:
• In distributed databases, we need to define the logical unit of database distribution and allocation. The database may be broken up into logical units called fragments, which are stored at different sites. The simplest logical units are the tables themselves.
• Three Types of Data Fragmentation are:
Horizontal fragmentation:
A horizontal fragment of a table is a subset of its rows. Horizontal fragmentation divides a table 'horizontally' by selecting the relevant rows, and these fragments can be assigned to different sites in the distributed system.
Example: Consider a relation PROJ (PNo, Pname, Budget, Location)
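To make the idea concrete, here is a minimal sketch in Python, using plain dictionaries to stand in for the PROJ relation; the sample rows and the allocation predicate on Location are hypothetical, not from the original text.

```python
# A sketch of horizontal fragmentation: plain dictionaries stand in for
# the PROJ relation, and the Location predicate is a hypothetical rule.
PROJ = [
    {"PNo": 1, "Pname": "Instrumentation", "Budget": 150000, "Location": "Montreal"},
    {"PNo": 2, "Pname": "DB Development",  "Budget": 135000, "Location": "New York"},
    {"PNo": 3, "Pname": "CAD/CAM",         "Budget": 250000, "Location": "New York"},
]

# Each horizontal fragment is a subset of rows selected by a predicate.
PROJ1 = [row for row in PROJ if row["Location"] == "Montreal"]  # stored at site 1
PROJ2 = [row for row in PROJ if row["Location"] == "New York"]  # stored at site 2

# The original relation is reconstructed as the union of the fragments.
assert sorted(PROJ1 + PROJ2, key=lambda r: r["PNo"]) == PROJ
```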
Vertical fragmentation:
• A vertical fragment of a table keeps only
certain attributes of it. It divides a table
vertically by columns. It is necessary to include
the primary key of the table in each vertical
fragment so that the full table can be
reconstructed if needed.
Example: Consider a Relation PROJ (PNo, Pname, Budget, Location)
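As a hedged illustration, the following Python sketch projects the same hypothetical PROJ rows into two vertical fragments, each carrying the primary key PNo, and rebuilds the table by joining on it.

```python
# A sketch of vertical fragmentation over the same hypothetical PROJ rows.
# Every fragment keeps the primary key (PNo) so the table can be rebuilt.
PROJ = [
    {"PNo": 1, "Pname": "Instrumentation", "Budget": 150000, "Location": "Montreal"},
    {"PNo": 2, "Pname": "DB Development",  "Budget": 135000, "Location": "New York"},
]

# Project different column subsets, always including the key.
PROJ_A = [{"PNo": r["PNo"], "Pname": r["Pname"]} for r in PROJ]
PROJ_B = [{"PNo": r["PNo"], "Budget": r["Budget"], "Location": r["Location"]}
          for r in PROJ]

# Reconstruct the full table with a natural join on PNo.
b_by_key = {r["PNo"]: r for r in PROJ_B}
rebuilt = [{**a, **b_by_key[a["PNo"]]} for a in PROJ_A]
assert rebuilt == PROJ
```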
Hybrid fragmentation:
Hybrid fragmentation combines the characteristics of both horizontal and vertical fragmentation. Each fragment can be specified by a SELECT-PROJECT combination of operations. In this case the original table can be reconstructed by applying union and natural join operations in the appropriate order.
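The sketch below illustrates the SELECT-PROJECT idea in Python; the select/project helpers, the sample rows, and the fragmentation rule are assumptions made for illustration.

```python
# A sketch of hybrid fragmentation: each fragment is a SELECT (row filter)
# followed by a PROJECT (column filter). Data and rules are hypothetical.
PROJ = [
    {"PNo": 1, "Pname": "Instrumentation", "Budget": 150000, "Location": "Montreal"},
    {"PNo": 2, "Pname": "DB Development",  "Budget": 135000, "Location": "New York"},
]

def select(rows, pred):           # horizontal step
    return [r for r in rows if pred(r)]

def project(rows, cols):          # vertical step; cols must include the key
    return [{c: r[c] for c in cols} for r in rows]

# A SELECT-PROJECT fragment: keys and budgets of the New York projects.
frag = project(select(PROJ, lambda r: r["Location"] == "New York"),
               ["PNo", "Budget"])
print(frag)  # [{'PNo': 2, 'Budget': 135000}]
```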
B. Data Replication:
• A popular option for data distribution, as well as for fault tolerance of a database, is to store a separate copy of the database at each of two or more sites. A copy of each fragment can be maintained at several sites. Data replication is the design process of deciding which fragments will be replicated.
Fundamentals of Transaction Management and Concurrency Control
What is a Transaction?
• A transaction is an atomic unit of database access, which is either completely executed or not executed at all.
• It includes one or more database access operations (read and write).
• Example: Transfer of funds
• Many enterprises use databases to store information about their state, e.g. the balances of all depositors.
• The occurrence of a real-world event that changes the enterprise state requires the execution of a program that changes the database state in a corresponding way, e.g. the balance must be updated when you make a deposit.
• A transaction is a program that accesses the database in response to real-world events.
Transaction Terminology
• Commit
The transaction completely performs all its reads and writes and changes the database state accordingly.
• Abort
The transaction is unable to complete all its reads and writes, and hence undoes the operations performed up to that point. The database remains in the same state as it was prior to the beginning of the transaction.
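The following Python sketch ties the funds-transfer example to the commit/abort terminology above, using the standard sqlite3 module; the accounts table, account names, and balances are hypothetical.

```python
# A minimal sketch of a funds transfer as an atomic transaction.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Atomic unit of work: both writes take effect (commit) or neither (abort)."""
    try:
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, src, amount))
        if cur.rowcount != 1:
            raise ValueError("insufficient funds or unknown account")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.commit()        # COMMIT: all reads/writes become permanent
    except Exception:
        conn.rollback()      # ABORT: undo partial work, state unchanged
        raise

transfer(conn, "A", "B", 30)
print(dict(conn.execute("SELECT id, balance FROM accounts")))  # {'A': 70, 'B': 80}
```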
Transaction Processing Monitor
• A Transaction Processing Monitor (TP Monitor) is a control program that monitors the transfer of data between multiple local and remote terminals, ensuring that transaction processing completes or, if an error occurs, that appropriate actions are taken.
Transaction Processing Monitor
• A Transaction Processing (TP) Monitor is a
program that performs an administrative
function by accessing a shared database on
behalf of an on-line user.
• In other words, TP monitor technology gives the distributed client/server environment the capacity to develop, run, and manage transaction applications efficiently and reliably.
What do TP Monitors do?
• Its mission is to monitor workflow status for transactions that require multiple steps. The TPM generally has the capability to post alerts and to roll back errors or generate compensating transactions when an error occurs.
Key Insights on TPMs:
• A TPM is a standalone program, distinct from the
Web server and app server.
• TP monitor technology controls transaction
applications and performs business logic/rules
computations and database updates.
• TP monitor technology is used in data management,
network access, security systems, delivery order
processing, airline reservations, and customer service
A TP system
• A TP system is an integrated set of products that supports TP applications. These products include both hardware, such as processors, memories, disks, and communications controllers, and software, such as operating systems (OSs), database management systems (DBMSs), computer networks, and TP monitors.
In general, a TPM provides the following
functionality:
• Coordinating resources
• Balancing loads
• Creating new processes as/when needed
• Providing secure access to services
• Routing services
• Wrapping data into messages
• Unwrapping messages into data packets/structures
• Monitoring operations/transactions
• Managing queues
• Handling errors through such actions as process restarting
• Hiding inter process communications details from
programmers
Examples of commonly used TP Monitors include:
• Customer Information Control System (CICS), mostly found in IBM environments
• Tuxedo Monitor for the Unix environment
• Application servers using the Java 2 Enterprise Edition (J2EE) for the Unix environment
• Encina for Unix (is this in the same league as the other products?)
• Microsoft Transaction Server (MTS), now superseded by COM+, which is also used in .NET environments
• IBM IMS
TP-Lite monitor
• The integration of TP monitor functions in a
database engine is called a TP-Lite monitor.
Types of failures in transactions
• Computer failure
A hardware or software error in the computer during transaction execution, e.g. a disk failure: loss of data due to the crashing of the disk head.
• Transaction error
A failure caused by an operation in the transaction, e.g. division by zero.
• Local error
Conditions that cause the cancellation of the transaction, e.g. data needed for the transaction not found (a missing account number).
• Concurrency control enforcement
The transaction may be aborted by the concurrency control method.
• Physical problems and catastrophes
Problems like power failure, fire, flood, etc.
Properties of Transaction:
• A transaction has four properties that lead to the consistency and reliability of a distributed database: Atomicity, Consistency, Isolation, and Durability (ACID).
Atomicity
• This refers to the fact that a transaction is treated as a unit of operation.
• Either all the instructions of a transaction are executed in full, or none are.
• This property ensures that no transaction is executed only up to the midway point.
• It also ensures that if one part of the transaction fails, the entire transaction fails and the database state is left unchanged.
Consistency
• The consistency of a transaction is its correctness.
• When multiple transactions are accessing data simultaneously, the access is protected through concurrency control mechanisms.
• What happens when both credit and debit transactions are executed at the same time?
• If both the debit and credit transactions access the account number and balance data simultaneously, they will be protected through concurrency control mechanisms to ensure the correctness of each transaction.
Isolation
• According to this property, each transaction should see a consistent database at all times. Consequently, no other transaction can read or modify data that is being modified by another transaction.
• The result of one transaction will not be visible to other transactions until the transaction commits.
• What happens if the results of the credit transaction are visible to the debit transaction before they are actually written to the database?
• To avoid this situation, the isolation property ensures that the debit or credit transaction results will not be available until they are committed.
Durability
This property ensures that once a transaction commits, its results are
permanent and cannot be erased from the database. This means
that whatever happens after the COMMIT of a transaction, whether
it is a system crash or aborts of other transactions, the results
already committed are not modified or undone.
Collisions
• A collision occurs when two or more transactions attempt to change the same entities within a system.
• There are three fundamental ways that two transactions can interfere with one another:
1) Dirty read
2) Unrepeatable read
3) Phantom read
Dirty read/ Uncommitted dependency/ temporary update
• Transaction 1 (T1) reads an entity from the database and then updates the database but does not commit the change (for example, the change hasn’t been finalized).
• Transaction 2 (T2) reads the entity, unknowingly making a copy of the uncommitted version.
• T1 rolls back (aborts) the changes, restoring the entity to the original state that T1 found it in.
• T2 now has a version of the entity that was never committed and therefore is not considered to have actually existed.
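A minimal Python simulation of this sequence, with a plain dictionary standing in for a database that provides no isolation at all; the names and values are hypothetical.

```python
# A sketch of a dirty read against a store with no isolation.
db = {"balance": 100}        # committed state
uncommitted = dict(db)       # T1's working copy, visible to everyone (the flaw)

# T1 updates the entity but has not committed yet.
uncommitted["balance"] = 500

# T2 reads the uncommitted version: a dirty read.
t2_copy = uncommitted["balance"]   # 500

# T1 aborts: its change is discarded and the committed state is untouched.
uncommitted = dict(db)

# T2 now holds a value (500) that was never part of any committed state.
print(t2_copy, db["balance"])      # 500 100
```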
Unrepeatable read
T1 reads an entity from the database, making a copy of it.
T2 deletes the entity from the system of record.
T1 now has a copy of an entity that does not officially exist.
Phantom read
T1 retrieves a collection of entities from the database,
making copies of them, based on some sort of search criteria
such as “all customers with first name Bill.”
T2 then creates new entities, which would have met the
search criteria (for example, inserts “Bill Klassen” into the
database), saving them to the database.
If T1 reapplies the search criteria it gets a different result set.
Concurrency Control:
• In distributed database systems, the database is typically used by many users.
• These systems usually allow multiple transactions to run concurrently, i.e. at the same time.
• Concurrency control is the activity of coordinating concurrent accesses
to a database in a multiuser database management system (DBMS).
• Concurrency control permits users to access a database in a multi-
programmed fashion while preserving the illusion that each user is
executing alone on a dedicated system.
• The main technical difficulty in attaining this goal is to prevent database
updates performed by one user from interfering with database retrievals
and updates performed by another.
• When the transactions are updating data concurrently, it may lead to
several problems with the consistency of the data.
Problems with the consistency of the data
• As mentioned earlier, simultaneous execution of transactions over a distributed database can create several data integrity and consistency problems. These include:
• Lost updates
• Dirty reads
• Phantom reads
• Unrepeatable reads
• Uncommitted data
• Inconsistent retrievals
Serializability
• A transaction schedule is serializable, or has the serializability property, if its outcome (e.g., the resulting database state, the values of the database's data) is equal to the outcome of its transactions executed serially, i.e., sequentially without overlapping in time.
In summary, concurrency control is the process
of managing simultaneous execution of
transactions in a distributed database by
ensuring the serializability of transactions.
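One standard way to test a schedule for (conflict) serializability is to build a precedence graph over the transactions and check it for cycles. The sketch below assumes a simple schedule format of (transaction, operation, item) triples, which is not defined in the original text.

```python
# A sketch of the precedence-graph test for conflict serializability.
from itertools import combinations

def serializable(schedule):
    """schedule: ordered list of (txn, 'r'|'w', item) triples."""
    edges = {}
    for (t1, op1, x1), (t2, op2, x2) in combinations(schedule, 2):
        # Conflict: different transactions, same item, at least one write.
        if t1 != t2 and x1 == x2 and "w" in (op1, op2):
            edges.setdefault(t1, set()).add(t2)  # t1's operation comes first
    def has_cycle(node, path):
        if node in path:
            return True
        return any(has_cycle(n, path | {node}) for n in edges.get(node, ()))
    # An acyclic precedence graph means the schedule is serializable.
    return not any(has_cycle(n, frozenset()) for n in edges)

# r1(x) w2(x) w1(x): edges T1->T2 and T2->T1 form a cycle.
print(serializable([("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]))  # False
print(serializable([("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x")]))  # True
```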
Distributed Concurrency Control Algorithms:
• Before discussing the algorithms, we need to get an idea about distributed transactions.
• Distributed Transaction: A distributed transaction is a transaction that runs in multiple processes, usually on several machines.
• Distributed transaction processing systems are designed to facilitate transactions that span heterogeneous, transaction-aware resource managers in a distributed environment.
• The execution of a distributed transaction requires coordination between a global transaction management system and the local resource managers of all the involved systems.
• The resource manager and the transaction processing monitor are the two primary elements of any distributed transactional system.
• Distributed transactions, like local transactions, must observe the ACID properties. However, maintenance of these properties is very complicated for distributed transactions because a failure can occur in any process. If such a failure occurs, each process must undo any work that has already been done on behalf of the transaction.
• A distributed transaction processing system
maintains the ACID properties in distributed
transactions by using two features:
• Recoverable processes. Recoverable processes
log their actions and therefore can restore
earlier states if a failure occurs.
• A commit protocol. A commit protocol allows
multiple processes to coordinate the
committing or aborting of a transaction. The
most common commit protocol is the two-
phase commit protocol.
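A minimal sketch of the two-phase commit idea in Python; the Participant interface (prepare/commit/abort) is a hypothetical stand-in for real resource managers, and a real implementation would also log votes and outcomes durably for recovery.

```python
# A sketch of the two-phase commit (2PC) protocol.
class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
    def prepare(self):       # phase 1: vote yes/no on committing
        return self.can_commit
    def commit(self):        # phase 2, unanimous yes
        print(f"{self.name}: committed")
    def abort(self):         # phase 2, any no-vote
        print(f"{self.name}: aborted")

def two_phase_commit(participants):
    # Phase 1 (voting): every participant must vote yes.
    if all(p.prepare() for p in participants):
        for p in participants:   # Phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:       # Phase 2: abort everywhere
        p.abort()
    return False

two_phase_commit([Participant("site A"), Participant("site B")])         # commits
two_phase_commit([Participant("site A"), Participant("site B", False)])  # aborts
```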
Distributed Two-Phase Locking (2PL):
• In order to ensure serializability of concurrently executed transactions, different methods of concurrency control have been elaborated. One of these methods is the locking method. There are different forms of locking methods.
• The two-phase locking protocol is one of the basic concurrency control protocols in distributed database systems. The main approach of this protocol is “read any, write all”.
• Transactions set read locks on items that they read,
and they convert their read locks to write locks on
items that need to be updated.
• To read an item, it suffices to set a read lock on any
copy of the item, so the local copy is locked; to update
an item, write locks are required on all copies.
• Write locks are obtained as the transaction executes,
with the transaction blocking on a write request until
all of the copies of the item to be updated have been
successfully locked.
• All locks are held until the transaction has successfully committed or aborted.
• The 2PL Protocol oversees locks by determining
when transactions can acquire and release locks.
The 2PL protocol forces each transaction to make a
lock or unlock request in two steps:
I. Growing Phase: A transaction may obtain locks
but may not release any locks.
II. Shrinking Phase: A transaction may release locks
but not obtain any new lock.
• The transaction first enters into the Growing Phase,
makes requests for required locks, then gets into
the Shrinking phase where it releases all locks and
cannot make any more requests.
• Transactions in 2PL Protocol should get all needed locks
before getting into the unlock phase.
• While the 2PL protocol guarantees serializability, it does
not ensure that deadlocks do not happen.
• So a deadlock is a possibility in this algorithm. Local deadlocks are checked for any time a transaction blocks, and are resolved when necessary by restarting the transaction with the most recent initial startup time among those involved in the deadlock cycle.
• Global deadlock detection is handled by a “Snoop”
process, which periodically requests waits-for
information from all sites and then checks for and
resolves any global deadlocks.
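The following Python sketch captures the growing/shrinking discipline described above; it omits lock modes, blocking, and deadlock detection, and the class and method names are hypothetical.

```python
# A sketch of the two-phase locking (2PL) discipline.
class TwoPhaseLockingTxn:
    def __init__(self, locks):
        self.locks = locks          # shared lock table: item -> owning txn
        self.held = set()
        self.shrinking = False      # once True, no new locks may be taken

    def lock(self, item):
        """Growing phase only: acquire a lock, never after any release."""
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after first unlock")
        if self.locks.setdefault(item, self) is not self:
            raise RuntimeError(f"{item} held by another txn")  # would block here
        self.held.add(item)

    def unlock_all(self):
        """Shrinking phase: release everything, typically at commit/abort."""
        self.shrinking = True
        for item in self.held:
            del self.locks[item]
        self.held.clear()

locks = {}
t = TwoPhaseLockingTxn(locks)
t.lock("x"); t.lock("y")   # growing phase: read/write x and y here
t.unlock_all()             # shrinking phase
```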
Wound-Wait (WW):
• The second algorithm is the distributed wound-wait locking algorithm. It follows the same locking approach as the 2PL protocol.
• It differs from 2PL in its handling of the deadlock problem: rather than maintaining waits-for information and then checking for local and global deadlocks, this algorithm prevents deadlocks via the use of timestamps.
• Each transaction is numbered according to its
initial startup time, and younger transactions are
prevented from making older ones wait.
• If an older transaction requests a lock, and if the
request would lead to the older transaction waiting
for a younger transaction, the younger transaction
is “wounded” – it is restarted unless it is already in
the second phase of its commit protocol.
• Younger transactions can wait for older transactions, so the possibility of deadlock is eliminated.
Example
• If T1 is younger than T2: T1 waits.
• If T1 is older than T2: T2 aborts.
• In this scheme, if a transaction requests a lock on a resource which is already held with a conflicting lock by some other transaction, one of two possibilities may occur:
• If t(T1) < t(T2), then T1 forces T2 to be rolled back, that is, T1 wounds T2. T2 is restarted later with a random delay but with the same timestamp.
• If t(T1) > t(T2), then T1 is forced to wait until the resource is available.
• The scheme allows a younger transaction to wait, but when an older transaction requests an item held by a younger one, the older transaction forces the younger one to abort and release the item.
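The wound-wait decision rule can be summarized in a few lines of Python; the function and parameter names are hypothetical, and a smaller timestamp means an older transaction.

```python
# A sketch of the wound-wait rule for a lock conflict.
def wound_wait(requester_ts, holder_ts):
    """Decide what happens when `requester` asks for a lock `holder` owns."""
    if requester_ts < holder_ts:
        return "wound: holder is younger, so it is aborted and restarted"
    return "wait: requester is younger, so it waits for the holder"

print(wound_wait(requester_ts=1, holder_ts=2))  # older T1 wounds younger T2
print(wound_wait(requester_ts=2, holder_ts=1))  # younger T1 waits for older T2
```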
Basic Timestamp Ordering (BTO):
• A timestamp is a unique identifier created by the DBMS to identify
a transaction.
• Typically, timestamp values are assigned in the order in which the
transactions are submitted to the system, so a timestamp can be
thought of as the transaction start time.
• The third algorithm is the basic timestamp ordering algorithm. The
idea for this scheme is to order the transactions based on their
timestamps.
• A schedule in which the transactions participate is then
serializable, and the equivalent serial schedule has the
transactions in order of their timestamp values.
• This is called timestamp ordering (TO).
• Like wound-wait, it employs transaction startup timestamps,
but it uses them differently.
• BTO associates timestamps with all recently accessed data items and requires that conflicting accesses by transactions be performed in timestamp order, instead of using a locking approach.
• Transactions that attempt to perform out-of-order accesses are restarted. When a read request is received for an item, it is permitted if the timestamp of the requester exceeds the item’s write timestamp.
• When a write request is received, it is permitted if the requester’s timestamp exceeds the read timestamp of the item; in the event that the timestamp of the requester is less than the write timestamp of the item, the update is simply ignored.
• For replicated data, the “read any, write all” approach is used, so a read request may be sent to any copy, while a write request must be sent to all copies. Integration of the algorithm with two-phase commit is accomplished as follows: writers keep their updates in a private workspace until commit time.
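A minimal Python sketch of the BTO checks described above, including the rule that an out-of-date write is simply ignored; the Item bookkeeping and the restart-by-exception convention are assumptions made for illustration.

```python
# A sketch of basic timestamp ordering (BTO) on a single item.
class Item:
    def __init__(self):
        self.read_ts = 0    # largest timestamp that has read the item
        self.write_ts = 0   # timestamp of the current version
        self.value = None

def bto_read(item, ts):
    if ts < item.write_ts:
        raise RuntimeError("restart: read arrived out of timestamp order")
    item.read_ts = max(item.read_ts, ts)
    return item.value

def bto_write(item, ts, value):
    if ts < item.read_ts:
        raise RuntimeError("restart: write conflicts with a later read")
    if ts < item.write_ts:
        return              # older than the current version: simply ignored
    item.write_ts, item.value = ts, value

x = Item()
bto_write(x, ts=5, value="v1")
bto_write(x, ts=3, value="old")  # ignored: older than the ts-5 version
print(bto_read(x, ts=7))         # "v1"; x.read_ts becomes 7
# bto_read(x, ts=4) would raise: it arrives after the ts-5 write
```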
Distributed Optimistic(OPT):
• The fourth algorithm is the distributed, timestamp-based,
optimistic concurrency control algorithm which operates by
exchanging certification information during the commit
protocol.
• For each data item, a read timestamp and a write
timestamp are maintained. Transactions may read and
update data items freely, storing any updates into a local
workspace until commit time.
• For each read, the transaction must remember the version
identifier (i.e., write timestamp) associated with the item
when it was read.
• Then, when all of the transaction’s cohorts have completed their work and have reported back to the master, the transaction is assigned a globally unique timestamp. This timestamp is sent to each cohort in the “prepare to commit” message, and it is used to locally certify all of its reads and writes as follows:
• A read request is certified if:
(i) the version that was read is still the current version of the item, and
(ii) no write with a newer timestamp has already been locally certified.
• A write request is certified if:
(i) no later reads have been certified and subsequently committed, and
(ii) no later reads have been locally certified already.
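The certification conditions can be sketched as two small Python predicates; the collections holding certified and committed timestamps are assumptions for illustration, standing in for per-site bookkeeping.

```python
# A sketch of the local certification checks in the optimistic algorithm.
def certify_read(read_version, current_version, txn_ts, certified_write_ts):
    # (i) the version read is still the current version of the item, and
    # (ii) no write with a newer timestamp has already been certified here.
    return (read_version == current_version and
            all(w <= txn_ts for w in certified_write_ts))

def certify_write(txn_ts, committed_read_ts, certified_read_ts):
    # (i) no later reads have been certified and subsequently committed, and
    # (ii) no later reads have been locally certified already.
    return (all(r <= txn_ts for r in committed_read_ts) and
            all(r <= txn_ts for r in certified_read_ts))

# Example: a read of version 3 certifies at ts 10 if no newer write exists.
print(certify_read(3, 3, txn_ts=10, certified_write_ts=[8]))   # True
print(certify_write(10, committed_read_ts=[12], certified_read_ts=[]))  # False
```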
