Topic 5

A Distributed Database Management System (DDBMS) allows users to interact with a distributed database as if it were a single entity, managing data across multiple interconnected locations. Key features include data synchronization, support for various application software, and enhanced reliability and performance compared to centralized systems. However, challenges such as complex software requirements, data integrity issues, and processing overheads must be addressed in DDBMS implementations.

Uploaded by

brianalbert262

Topic 5: Distributed Database Management System

5.1 Distributed DBMS


5.1.1 Distributed Databases
A distributed DBMS (DDBMS) manages the distributed database in a manner so that it appears as
one single database to users. A distributed database (DDB) is a collection of multiple
interconnected databases, which are spread physically across various locations that communicate
via a computer network.

Features of a Distributed Database
i. The databases in the collection are logically interrelated with each other. Often they represent
a single logical database.
ii. Data is physically stored across multiple sites. Data in each site can be managed by a
DBMS independent of the other sites.
iii. The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
iv. A distributed database is not a loosely connected file system.
v. A distributed database incorporates transaction processing, but it is not synonymous with
a transaction processing system.

5.1.2 Distributed Database Management System


A distributed database management system (DDBMS) is a centralized software system that
manages a distributed database as if it were all stored in a single location.

Features of a DDBMS
i. It is used to create, retrieve, update and delete distributed databases.
ii. It synchronizes the database periodically and provides access mechanisms that make the
distribution transparent to the users.
iii. It ensures that the data modified at any site is universally updated.
iv. It is used in application areas where large volumes of data are processed and accessed by
numerous users simultaneously.
v. It is designed for heterogeneous database platforms.
vi. It maintains confidentiality and data integrity of the databases.

5.1.3 Factors Encouraging DDBMS


The following factors encourage moving over to DDBMS −

i. Distributed Nature of Organizational Units − Most organizations today are subdivided into
multiple units that are physically distributed over the globe. Each unit requires its own set of
local data. Thus, the overall database of the organization becomes distributed.
ii. Need for Sharing of Data − The multiple organizational units often need to communicate
with each other and share their data and resources. This demands common databases or
replicated databases that should be used in a synchronized manner.
iii. Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and Online
Analytical Processing (OLAP) work upon diversified systems which may have common
data. Distributed database systems support both kinds of processing by providing
synchronized data.
iv. Database Recovery − One of the common techniques used in DDBMS is replication of
data across different sites. Replication of data automatically helps in data recovery if
the database at any site is damaged. Users can access data from other sites while the damaged
site is being reconstructed. Thus, database failure may become almost inconspicuous to
users.
v. Support for Multiple Application Software − Most organizations use a variety of
application software each with its specific database support. DDBMS provides a uniform
functionality for using the same data among different platforms.

5.1.4 Advantages of Distributed Databases


Following are the advantages of distributed databases over centralized databases.
Modular Development − If the system needs to be expanded to new locations or new units, in
centralized database systems, the action requires substantial efforts and disruption in the existing
functioning. However, in distributed databases, the work simply requires adding new computers
and local data to the new site and finally connecting them to the distributed system, with no
interruption in current functions.
More Reliable − In case of database failures, the total system of centralized databases comes to a
halt. However, in distributed systems, when a component fails, the system continues to function,
possibly at reduced performance. Hence DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be met from
local data itself, thus providing faster response. On the other hand, in centralized systems, all
queries have to pass through the central computer for processing, which increases the response
time.
Lower Communication Cost − In distributed database systems, if data is located locally where it
is mostly used, then the communication costs for data manipulation can be minimized. This is not
feasible in centralized systems.

5.1.5 Adversities of Distributed Databases


Following are some of the adversities associated with distributed databases.
i. Need for complex and expensive software − DDBMS demands complex and often
expensive software to provide data transparency and co-ordination across the several sites.
ii. Processing overhead − Even simple operations may require a large number of
communications and additional calculations to provide uniformity in data across the sites.
iii. Data integrity − The need for updating data in multiple sites poses problems of data
integrity.
iv. Overheads for improper data distribution − Responsiveness of queries is largely
dependent upon proper data distribution. Improper data distribution often leads to very
slow response to user requests.

5.2 Distributed DBMS - Database Environments


5.2.1 Types of Distributed Databases
Distributed databases can be broadly classified into homogeneous and heterogeneous distributed
database environments, each with further sub-divisions, as shown in the following illustration.
Fig 5.1: Distributed Database Environment
Homogeneous Distributed Databases
In a homogeneous distributed database, all the sites use identical DBMS and operating systems.
Its properties are −

i. The sites use very similar software.
ii. The sites use identical DBMS or DBMS from the same vendor.
iii. Each site is aware of all other sites and cooperates with other sites to process user requests.
iv. The database is accessed through a single interface as if it is a single database.

Types of Homogeneous Distributed Database


There are two types of homogeneous distributed database:
i. Autonomous − Each database is independent and functions on its own. The databases are
integrated by a controlling application and use message passing to share data updates.
ii. Non-autonomous − Data is distributed across the homogeneous nodes and a central or
master DBMS co-ordinates data updates across the sites.

Heterogeneous Distributed Databases


In a heterogeneous distributed database, different sites have different operating systems, DBMS
products and data models. Its properties are:
i. Different sites use dissimilar schemas and software.
ii. The system may be composed of a variety of DBMSs like relational, network, hierarchical
or object oriented.
iii. Query processing is complex due to dissimilar schemas.
iv. Transaction processing is complex due to dissimilar software.
v. A site may not be aware of other sites and so there is limited co-operation in processing
user requests.

Types of Heterogeneous Distributed Databases


• Federated − The heterogeneous database systems are independent in nature and integrated
together so that they function as a single database system.
• Un-federated − The database systems employ a central coordinating module through
which the databases are accessed.

5.3 Distributed DBMS Architectures


DDBMS architectures are generally developed depending on three parameters:
i. Distribution − It states the physical distribution of data across the different sites.
ii. Autonomy − It indicates the distribution of control of the database system and the degree
to which each constituent DBMS can operate independently.
iii. Heterogeneity − It refers to the uniformity or dissimilarity of the data models, system
components and databases.

5.3.1 Architectural Models


Some of the common architectural models are:
Client - Server Architecture for DDBMS
This is a two-level architecture where the functionality is divided into servers and clients. The
server functions primarily encompass data management, query processing, optimization and
transaction management. Client functions mainly include the user interface. However, clients
also have some functions like consistency checking and transaction management.

The two client-server architectures are:
i. Single Server Multiple Client
ii. Multiple Server Multiple Client (shown in the following diagram)

Fig 5.2: Client-Server Architecture

Peer-to-Peer Architecture for DDBMS

In these systems, each peer acts both as a client and a server for imparting database services. The
peers share their resources with other peers and co-ordinate their activities.
This architecture generally has four levels of schemas:
i. Global Conceptual Schema − Depicts the global logical view of data.
ii. Local Conceptual Schema − Depicts logical data organization at each site.
iii. Local Internal Schema − Depicts physical data organization at each site.
iv. External Schema − Depicts user view of data.

Fig 5.3: Peer-to-Peer Architecture

Multi - DBMS Architectures


This is an integrated database system formed by a collection of two or more autonomous database
systems.
Multi-DBMS can be expressed through six levels of schemas:
i. Multi-database View Level − Depicts multiple user views comprising subsets of the
integrated distributed database.
ii. Multi-database Conceptual Level − Depicts the integrated multi-database, which comprises
global logical multi-database structure definitions.
iii. Multi-database Internal Level − Depicts the data distribution across different sites and
multi-database to local data mapping.
iv. Local database View Level − Depicts public view of local data.
v. Local database Conceptual Level − Depicts local data organization at each site.
vi. Local database Internal Level − Depicts physical data organization at each site.

There are two design alternatives for multi-DBMS:
i. Model with multi-database conceptual level.
ii. Model without multi-database conceptual level.
Fig 5.4: Multi-database Architecture

Fig 5.5: Multi-DB without conceptual layer

5.3.2 Design Alternatives


The distribution design alternatives for the tables in a DDBMS are as follows:
i. Non-replicated and non-fragmented
ii. Fully replicated
iii. Partially replicated
iv. Fragmented
v. Mixed

Non-replicated & Non-fragmented


In this design alternative, different tables are placed at different sites. Data is placed in
close proximity to the site where it is used most. It is most suitable for database systems where
the percentage of queries that need to join information in tables placed at different sites is low.
If an appropriate distribution strategy is adopted, then this design alternative helps to reduce the
communication cost during data processing.

Fully Replicated
In this design alternative, one copy of all the database tables is stored at each site. Since each
site has its own copy of the entire database, queries are very fast and require negligible
communication cost. On the other hand, the massive redundancy in data makes update operations
very costly. Hence, this is suitable for systems where a large number of queries must be handled
whereas the number of database updates is low.
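The trade-off described above can be made concrete with a rough count of remote messages. The sketch below is a toy model, not a measurement: it assumes (these assumptions are not from the notes) that every request originates at a site chosen uniformly at random and that each remote access costs exactly one message. The function name `remote_messages` is hypothetical.

```python
def remote_messages(sites: int, reads: int, writes: int, fully_replicated: bool) -> float:
    """Expected number of remote messages under a deliberately simple model.

    Assumptions (illustrative only): requests originate at a uniformly random
    site, and every remote access costs exactly one message.
    """
    if sites < 1:
        raise ValueError("sites must be at least 1")
    if fully_replicated:
        # Reads are answered locally; each write must reach the other copies.
        return float(writes * (sites - 1))
    # Single copy: a request is remote unless it starts at the holding site.
    remote_fraction = (sites - 1) / sites
    return (reads + writes) * remote_fraction


# Workload figures below are made up for illustration.
read_heavy = (remote_messages(4, 1000, 10, True), remote_messages(4, 1000, 10, False))
write_heavy = (remote_messages(4, 10, 1000, True), remote_messages(4, 10, 1000, False))
```

Under this model, full replication needs far fewer remote messages for the read-heavy workload and far more for the write-heavy one, which matches the suitability rule stated above.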

Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution of the tables is
done according to the frequency of access. This takes into consideration the fact that the
frequency of accessing the tables varies considerably from site to site. The number of copies of the
tables (or portions) depends on how frequently the access queries execute and on the sites that
generate the access queries.

Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or partitions, and
each fragment can be stored at different sites. This considers the fact that it seldom happens that
all data stored in a table is required at a given site. Moreover, fragmentation increases parallelism
and provides better disaster recovery. Here, there is only one copy of each fragment in the system,
i.e. no redundant data.

The three fragmentation techniques are:


i. Vertical fragmentation
ii. Horizontal fragmentation
iii. Hybrid fragmentation

Mixed Distribution
This is a combination of fragmentation and partial replication. Here, the tables are initially
fragmented in any form (horizontal or vertical), and then these fragments are partially replicated
across the different sites according to the frequency of accessing the fragments.

5.4 Distributed DBMS - Design Strategies


5.4.1 Data Replication
Data replication is the process of storing separate copies of the database at two or more sites. It is
a popular fault tolerance technique of distributed databases.

Advantages of Data Replication


i. Reliability − In case of failure of any site, the database system continues to work since a
copy is available at another site(s).
ii. Reduction in Network Load − Since local copies of data are available, query processing
can be done with reduced network usage, particularly during prime hours. Data updating
can be done at non-prime hours.
iii. Quicker Response − Availability of local copies of data ensures quick query processing
and consequently quick response time.
iv. Simpler Transactions − Transactions require fewer joins of tables located at
different sites and minimal coordination across the network. Thus, they become simpler in
nature.

Disadvantages of Data Replication


i. Increased Storage Requirements − Maintaining multiple copies of data is associated with
increased storage costs. The storage space required is in multiples of the storage required
for a centralized system.
ii. Increased Cost and Complexity of Data Updating − Each time a data item is updated,
the update needs to be reflected in all the copies of the data at the different sites. This
requires complex synchronization techniques and protocols.
iii. Undesirable Application – Database coupling − If complex update mechanisms are not
used, removing data inconsistency requires complex co-ordination at application level.
This results in undesirable application – database coupling.

Some commonly used replication techniques are:


i. Snapshot replication
ii. Near-real-time replication
iii. Pull replication

5.4.2 Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table
are called fragments. Fragmentation can be of three types: horizontal, vertical, and hybrid
(combination of horizontal and vertical). Horizontal fragmentation can further be classified into
two techniques: primary horizontal fragmentation and derived horizontal fragmentation.
Fragmentation should be done in such a way that the original table can be reconstructed from the
fragments whenever required. This requirement is called “reconstructiveness.”

Advantages of Fragmentation
i. Since data is stored close to the site of usage, efficiency of the database system is increased.
ii. Local query optimization techniques are sufficient for most queries since data is locally
available.
iii. Since irrelevant data is not available at the sites, security and privacy of the database system
can be maintained.
Disadvantages of Fragmentation

i. When data from different fragments are required, the access times may be very high.
ii. In case of recursive fragmentations, the job of reconstruction will need expensive
techniques.
iii. Lack of back-up copies of data in different sites may render the database ineffective in case
of failure of a site.

5.4.2.1 Types of fragmentation


Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to
maintain reconstructiveness, each fragment should contain the primary key field(s) of the table.
Vertical fragmentation can be used to enforce privacy of data.

For example, let us consider that a University database keeps records of all registered students in
a Student table having the following schema.

STUDENT

Regd_No | Name | Course | Address | Semester | Fees | Marks


Now, the fees details are maintained in the accounts section. In this case, the designer will fragment
the database as follows −

CREATE TABLE STD_FEES AS
SELECT Regd_No, Fees
FROM STUDENT;
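As a minimal sketch of why each vertical fragment must carry the primary key, the example below builds the STD_FEES fragment in an in-memory SQLite database and then reconstructs the original rows by joining on Regd_No. The second fragment name (STD_INFO), the column types and the sample rows are assumptions made for illustration. A single SQLite database stands in for what would be separate sites in a real DDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (
    Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT,
    Address TEXT, Semester INTEGER, Fees REAL, Marks REAL
);
-- Sample rows are invented for this sketch.
INSERT INTO STUDENT VALUES
    (1, 'Asha', 'Computer Science', 'Nairobi', 3, 1200.0, 78.5),
    (2, 'Brian', 'Mathematics', 'Kisumu', 1, 900.0, 64.0);

-- Fragment for the accounts section: primary key plus the fee column.
CREATE TABLE STD_FEES AS
    SELECT Regd_No, Fees FROM STUDENT;

-- Complementary fragment (hypothetical name): primary key plus the rest.
CREATE TABLE STD_INFO AS
    SELECT Regd_No, Name, Course, Address, Semester, Marks FROM STUDENT;
""")

# Reconstruct the original rows by joining the fragments on the shared key.
reconstructed = conn.execute("""
    SELECT i.Regd_No, i.Name, i.Course, i.Address, i.Semester, f.Fees, i.Marks
    FROM STD_INFO AS i
    JOIN STD_FEES AS f ON i.Regd_No = f.Regd_No
    ORDER BY i.Regd_No
""").fetchall()
original = conn.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()
```

If Regd_No were left out of either fragment, there would be no column to join on and the rows could not be matched up again.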

Horizontal Fragmentation
Horizontal fragmentation groups the tuples of a table according to the values of one or more fields.
Horizontal fragmentation should also conform to the rule of reconstructiveness. Each horizontal
fragment must have all columns of the original base table.

For example, in the student schema, if the details of all students of the Computer Science course
need to be maintained at the School of Computer Science, then the designer will horizontally
fragment the database as follows −

CREATE TABLE COMP_STD AS
SELECT * FROM STUDENT
WHERE Course = 'Computer Science';
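The following sketch illustrates reconstructiveness for horizontal fragments: if the selection predicates cover every row exactly once, the union of the fragments gives back the original table. The complementary fragment OTHER_STD, the reduced three-column schema and the sample rows are assumptions for illustration, and one in-memory SQLite database stands in for the separate sites.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT);
-- Sample rows are invented for this sketch.
INSERT INTO STUDENT VALUES
    (1, 'Asha', 'Computer Science'),
    (2, 'Brian', 'Mathematics'),
    (3, 'Chen', 'Computer Science');

-- Fragment kept at the School of Computer Science.
CREATE TABLE COMP_STD AS
    SELECT * FROM STUDENT WHERE Course = 'Computer Science';

-- Hypothetical complementary fragment holding every other row.
CREATE TABLE OTHER_STD AS
    SELECT * FROM STUDENT WHERE Course <> 'Computer Science';
""")

# The union of disjoint, complete fragments reproduces the original table.
union_rows = conn.execute("""
    SELECT * FROM COMP_STD
    UNION ALL
    SELECT * FROM OTHER_STD
    ORDER BY Regd_No
""").fetchall()
original = conn.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()
comp_count = conn.execute("SELECT COUNT(*) FROM COMP_STD").fetchone()[0]
```

Note that a row with a NULL Course would match neither predicate here, so in practice the fragment predicates need to be checked for completeness.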
Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is
used. This is the most flexible fragmentation technique since it generates fragments with minimal
extraneous information. However, reconstruction of the original table is often an expensive task.

Hybrid fragmentation can be done in two alternative ways −


i. At first, generate a set of horizontal fragments; then generate vertical fragments from one
or more of the horizontal fragments.
ii. At first, generate a set of vertical fragments; then generate horizontal fragments from one
or more of the vertical fragments.

5.5 DDBMS - Distribution Transparency


Distribution transparency is the property of distributed databases by virtue of which the internal
details of the distribution are hidden from the users. The DDBMS designer may choose to fragment
tables, replicate the fragments and store them at different sites. However, since users are unaware
of these details, they find the distributed database as easy to use as any centralized database.
The three dimensions of distribution transparency are:
• Location transparency
• Fragmentation transparency
• Replication transparency

5.5.1 Location Transparency


Location transparency ensures that the user can query any table(s) or fragment(s) of a
table as if they were stored locally at the user’s site. The fact that the table or its fragments are
stored at a remote site in the distributed database system should be completely hidden from the end
user. The address of the remote site(s) and the access mechanisms are completely hidden. In order
to incorporate location transparency, the DDBMS should have access to an updated and accurate data
dictionary and DDBMS directory, which contain the details of the locations of data.
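The role of the directory can be sketched in a few lines. In the example below, the class names `Directory` and `Ddbms`, the site names and the sample rows are all hypothetical, and plain in-memory dictionaries stand in for remote DBMS instances. The point is only that the caller names a table and never a site.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Maps each table or fragment name to the site that stores it."""
    locations: dict[str, str] = field(default_factory=dict)

    def register(self, table: str, site: str) -> None:
        self.locations[table] = site

    def site_of(self, table: str) -> str:
        try:
            return self.locations[table]
        except KeyError:
            raise LookupError(f"no site registered for {table!r}") from None


@dataclass
class Ddbms:
    directory: Directory
    # site name -> {table name -> rows}; a stand-in for remote databases
    sites: dict[str, dict[str, list[tuple]]]

    def select_all(self, table: str) -> list[tuple]:
        # The site is resolved internally, so the caller never sees it.
        site = self.directory.site_of(table)
        return list(self.sites[site][table])


directory = Directory()
directory.register("STD_FEES", "accounts_site")
directory.register("COMP_STD", "cs_site")
ddbms = Ddbms(directory, {
    "accounts_site": {"STD_FEES": [(1, 1200.0)]},
    "cs_site": {"COMP_STD": [(1, "Asha", "Computer Science")]},
})
fees = ddbms.select_all("STD_FEES")
```

If the directory is stale or incomplete, the lookup fails or points at the wrong site, which is why the text requires it to be updated and accurate.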

5.5.2 Fragmentation Transparency


Fragmentation transparency enables users to query upon any table as if it were unfragmented.
Thus, it hides the fact that the table the user is querying on is actually a fragment or union of some
fragments. It also conceals the fact that the fragments are located at diverse sites. This is somewhat
similar to users of SQL views, where the user may not know that they are using a view of a table
instead of the table itself.
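The comparison with SQL views can be shown directly. In this sketch, a view named STUDENT is defined as the union of two horizontal fragments, so a query against STUDENT never mentions the fragments. The fragment names, the reduced schema and the sample rows are assumptions, and a single in-memory SQLite database stands in for fragments that would normally live at different sites.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two horizontal fragments (names and rows are invented for this sketch).
CREATE TABLE COMP_STD (Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT);
CREATE TABLE OTHER_STD (Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT);
INSERT INTO COMP_STD VALUES (1, 'Asha', 'Computer Science');
INSERT INTO OTHER_STD VALUES (2, 'Brian', 'Mathematics');

-- The user-visible "table" is a view over the fragments.
CREATE VIEW STUDENT AS
    SELECT * FROM COMP_STD
    UNION ALL
    SELECT * FROM OTHER_STD;
""")

# The query is written as if STUDENT were one unfragmented table.
names = [row[0] for row in conn.execute("SELECT Name FROM STUDENT ORDER BY Regd_No")]
```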

5.5.3 Replication Transparency


Replication transparency ensures that the replication of databases is hidden from the users. It
enables users to query a table as if only a single copy of the table exists. Replication
transparency is associated with concurrency transparency and failure transparency. Whenever a
user updates a data item, the update is reflected in all the copies of the table. However, this
operation should not be known to the user. This is concurrency transparency. Also, in case of
failure of a site, the user can still proceed with their queries using replicated copies without any
knowledge of the failure. This is failure transparency.
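A minimal sketch of these two behaviours, assuming a simple write-all, read-any scheme (one possible approach, not the only one): every write is applied to all available copies, and a read is answered by any copy that is still up. The class name `ReplicatedTable` and the site names are hypothetical, and bringing a failed site back up to date is deliberately left out.

```python
class ReplicatedTable:
    """Keeps one key-value copy per site and hides the copies from callers."""

    def __init__(self, site_names):
        self.copies = {name: {} for name in site_names}
        self.down = set()  # sites currently marked as failed

    def fail(self, site):
        self.down.add(site)

    def write(self, key, value):
        # Write-all over the sites that are up. Catching up a failed site
        # after recovery is outside the scope of this sketch.
        for site, copy in self.copies.items():
            if site not in self.down:
                copy[key] = value

    def read(self, key):
        # Read-any: the first available copy answers the query.
        for site, copy in self.copies.items():
            if site not in self.down:
                return copy[key]
        raise RuntimeError("no replica is available")


table = ReplicatedTable(["site_a", "site_b", "site_c"])
table.write(1, "Asha")       # one call updates every copy
table.fail("site_a")         # a site fails
value_after_failure = table.read(1)   # the caller still gets an answer
```

The caller uses only `write` and `read`; which copies were updated and which copy answered are both hidden.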

5.5.4 Combination of Transparencies


In any distributed database system, the designer should ensure that all the stated transparencies are
maintained to a considerable extent. The designer may choose to fragment tables, replicate them
and store them at different sites, all without the end user’s knowledge. However, complete
distribution transparency is difficult to achieve and requires considerable design effort.

5.6 Parallel Databases


Companies need to handle huge amounts of data at high data transfer rates. Client-server and
centralized systems are not efficient enough for this. The need to improve efficiency gave rise
to the concept of parallel databases. A parallel database system improves the performance of
data processing by using multiple resources, such as multiple CPUs and disks, in parallel. It also
parallelizes many operations, such as data loading and query processing.

5.6.1 Goals of Parallel Databases
The concept of parallel databases was built with the following goals:

Improve performance:
The performance of the system can be improved by connecting multiple CPUs and disks in
parallel. Many small processors can also be connected in parallel.
Improve availability of data:
Data can be copied to multiple locations to improve the availability of data.
For example, if a module contains a relation (a table in the database) that becomes unavailable,
it is important to make the relation available from another module.
Improve reliability:
The reliability of the system is improved through completeness, accuracy and availability of data.
Provide distributed access to data:
Companies having many branches in multiple cities can access data with the help of a parallel
database system.

5.6.2 Parallel Database Architectures


i. Shared memory system
• A shared memory system uses multiple processors that are attached to a global shared
memory via an intercommunication channel or communication bus.
• Shared memory systems have a large amount of cache memory at each processor, so
referencing of the shared memory is avoided where possible.
• If a processor performs a write operation to a memory location, the copies of that location
cached by other processors should be updated or removed.

Advantages of Shared memory system
• Data is easily accessible to any processor.
• One processor can send messages to another efficiently.
Disadvantages of Shared memory system
• The waiting time of processors increases as more processors are added.
• Bandwidth problem: the shared bus or memory becomes a bottleneck.

ii. Shared Disk System
• A shared disk system uses multiple processors that can access multiple disks via an
intercommunication channel, and every processor has local memory.
• Each processor has its own memory, so data sharing is efficient.
• Systems built around this architecture are called clusters.
Advantages of Shared Disk System
• Fault tolerance is achieved using a shared disk system.
Fault tolerance: If a processor or its memory fails, another processor can complete the
task.
Disadvantages of Shared Disk System
• A shared disk system has limited scalability, as a large amount of data travels through the
interconnection channel.
• If more processors are added, the existing processors are slowed down.

iii. Shared nothing system
• Each processor in the shared nothing system has its own local memory and local disk.
• Processors can communicate with each other through an intercommunication channel.
• Any processor can act as a server to serve the data that is stored on its local disk.
Advantages of Shared nothing system
• Any number of processors and disks can be connected as required in a shared nothing
system.
• A shared nothing system can support many processors, which makes the system
more scalable.
Disadvantages of Shared nothing system
• Data partitioning is required in a shared nothing system.
• The cost of communication for accessing non-local disks is much higher.
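Because each node in a shared nothing system owns its own disk, every row has to be assigned to exactly one node. The sketch below shows hash partitioning, which is one common way to do this and is an assumption here, since the notes do not name a partitioning scheme. The function names, the node count and the sample rows are hypothetical.

```python
import zlib


def node_for(key: str, node_count: int) -> int:
    """Pick the node that owns a key using a stable hash."""
    # zlib.crc32 gives the same result on every run, unlike the built-in
    # hash() for strings, which is randomized per process by default.
    return zlib.crc32(key.encode("utf-8")) % node_count


def partition(rows, key_index: int, node_count: int):
    """Distribute rows across nodes by hashing the chosen key column."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[node_for(str(row[key_index]), node_count)].append(row)
    return nodes


# Sample rows are invented for this sketch.
rows = [(1, "Asha"), (2, "Brian"), (3, "Chen"), (4, "Dede")]
nodes = partition(rows, key_index=0, node_count=3)
```

A lookup by key goes straight to the node returned by `node_for`, while a query that needs rows from several nodes must pay the communication cost listed among the disadvantages.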

iv. Hierarchical System or Non-Uniform Memory Architecture
• The hierarchical system is a hybrid of the shared memory, shared disk and
shared nothing systems.
• The hierarchical model is also known as Non-Uniform Memory Architecture (NUMA).
• In this system, each group of processors has a local memory, but processors from other
groups can access the memory associated with another group in a coherent manner.
• NUMA uses local and remote memory (memory from another group), hence access to
remote memory takes longer than access to local memory.
Advantages of NUMA
• Improves the scalability of the system.
• The memory bottleneck (shortage of memory) problem is minimized in this architecture.
Disadvantages of NUMA
• The cost of the architecture is higher compared to other architectures.
