Distributed Data Systems: Sesapzg554

The document discusses distributed database systems and distributed database management systems (DDBMS). It defines a distributed database as a collection of logically related databases distributed over a computer network. A DDBMS manages this distributed database as if it were in a single location. The document then covers the ANSI/SPARC architecture and its three schema views (external, internal, conceptual). It also discusses architectural models for distributed databases based on autonomy, distribution, and heterogeneity. Autonomy refers to the degree of independence between individual DBMSs.


Distributed Data Systems

SESAPZG554

BITS Pilani
Pilani|Dubai|Goa|Hyderabad
Parthasarathy

SESAPZG554 – CS#3
Distributed DBMS Architectures
Agenda for CS #3

1) Recap of Sessions
2) Distributed Database System
3) Distributed DBMS
4) ANSI/SPARC Architecture
5) Architectural Models for DDBS
  Autonomy
  Distribution
  Heterogeneity
6) Distributed DBMS Architecture
  Client/Server Systems
  Peer-to-Peer Systems
  Multidatabase System Architecture
7) Distributed Data Sources

Watched M2 Videos?
Read Chapter 1 & 4 – Textbook?

BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956


Distributed Database System

  We define a distributed database as a collection of multiple,
logically interrelated databases distributed over a computer
network.
  A distributed database management system (distributed
DBMS, or DDBMS) is the software system that manages
the distributed database.
  A DDBMS presents a logically centralized interface: it manages
the distributed database as if it were all stored in a single
location.



Distributed Database System

  Sometimes “distributed database system” (DDBS) is used to
refer jointly to the distributed database and the distributed
DBMS.
  The two important terms in these definitions are
  logically interrelated and
  distributed over a computer network



Centralized DBMS



Distributed DBMS



ANSI/SPARC Architecture

  In late 1972, the Computer and Information Processing
Committee (X3) of the American National Standards Institute
(ANSI) established a Study Group on Database Management
Systems under the auspices of its Standards Planning and
Requirements Committee (SPARC).
  The mission of the study group was to study the feasibility of
setting up standards in this area and to determine which
aspects should be standardized if it was feasible.
  The study group proposed that the interfaces be standardized and
defined an architectural framework that contained 43 interfaces,
14 of which would deal with the physical storage subsystem
of the computer and therefore not be considered essential parts of
the DBMS architecture.


ANSI/SPARC Architecture

  In a simplified version of the ANSI/SPARC architecture there
are three views of data:
  The external view, which is that of the end user, who
might be a programmer;
  The internal view, that of the system or machine; and
  The conceptual view, that of the enterprise.



ANSI/SPARC Architecture



ANSI/SPARC Architecture

Internal Schema
 At the lowest level of the architecture is the internal view, which deals
with the physical definition and organization of data.
 The location of data on different storage devices and the access
mechanisms used to reach and manipulate data are the issues dealt with at
this level.
External Schema
 At the other extreme is the external view, which is concerned with how
users view the database.
 An individual user’s view represents the portion of the database that will
be accessed by that user as well as the relationships that the user would
like to see among the data.
 A view can be shared among a number of users, with the collection of
user views making up the external schema.


ANSI/SPARC Architecture

Conceptual Schema
 In between these two ends is the conceptual schema, which is
an abstract definition of the database.
 It is the “real world” view of the enterprise being modeled in
the database
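
The three schema levels can be illustrated with a small sketch. It assumes SQLite (through Python's standard sqlite3 module) as the DBMS; the employee table, the idx_employee_dept index and the directory view are hypothetical names chosen only for this example. The table stands in for the conceptual schema, the index for an internal-level access structure, and the view for one user's external schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the enterprise-wide logical definition of the data.
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept TEXT NOT NULL,"
    " salary REAL NOT NULL)"
)

# Internal level: a physical access structure. It changes how the data is
# reached, not what the data means.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

# External schema: one user's view, exposing only part of the database.
conn.execute("CREATE VIEW directory AS SELECT name, dept FROM employee")

conn.executemany(
    "INSERT INTO employee (emp_id, name, dept, salary) VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Payroll", 52000.0), (2, "Ravi", "Inventory", 48000.0)],
)

# A user of the external view never sees the salary column.
directory_rows = conn.execute(
    "SELECT name, dept FROM directory ORDER BY name"
).fetchall()
print(directory_rows)
```

Dropping or rebuilding the index would leave both the table definition and the view untouched, which is the kind of independence between levels that the architecture aims for.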



Experience Sharing

o Have you dealt with DDS in your organization?
o Knowingly or unknowingly?
o What about the three-schema (ANSI/SPARC) architecture? Have you
used it in any applications? Share your experience with the class.



Architectural Models for DDBS

  Let us consider the possible ways in which a distributed DBMS
may be architected.
  We use a classification that organizes the systems as
characterized with respect to
  The autonomy of local systems,
  Their distribution, and
  Their heterogeneity



Architectural Models for DDBS

Exercise:
Read Chapter 4 of the textbook for more information.



Autonomy

  Autonomy, in this context, refers to the distribution (or
decentralization) of control, not of data.
  It indicates the degree to which individual DBMSs can operate
independently.
  Autonomy is a function of a number of factors such as whether
the component systems (i.e., individual DBMSs) exchange
information, whether they can independently execute
transactions, and whether one is allowed to modify them.



Autonomy

The dimensions of autonomy can be specified as follows:

1. Design autonomy: Individual DBMSs are free to use the data models
and transaction management techniques that they prefer.
2. Communication autonomy: Each of the individual DBMSs is free to
make its own decision as to what type of information it wants to provide to
the other DBMSs or to the software that controls their global execution.
3. Execution autonomy: Each DBMS can execute the transactions that
are submitted to it in any way that it wants to.

Given these dimensions, one alternative is tight integration, where a
single image of the entire database is available to any user who wants to
share the information, which may reside in multiple databases.
  From the users’ perspective, the data are logically integrated in one
database.


Autonomy

  In these tightly integrated systems, the data managers are
implemented so that one of them is in control of the processing of
each user request even if that request is serviced by more than one
data manager.
  The data managers do not typically operate as independent DBMSs
even though they usually have the functionality to do so.
  Next are semiautonomous systems, which consist of DBMSs that can
(and usually do) operate independently, but have decided to participate
in a federation to make their local data sharable.
  Each of these DBMSs determines what parts of its own database
it will make accessible to users of other DBMSs.
  They are not fully autonomous systems because they need to be
modified to enable them to exchange information with one another.



Autonomy

  The last alternative is total isolation, where the individual
systems are stand-alone DBMSs that know neither of the existence of
other DBMSs nor how to communicate with them.
  In such systems, the processing of user transactions that access
multiple databases is especially difficult since there is no
global control over the execution of individual DBMSs.
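
As a rough illustration, the three autonomy alternatives (tight integration, semiautonomous, total isolation) can be written as a small classification. This is a deliberate simplification: the two boolean flags and the names ComponentDBMS and classify are assumptions made for this sketch, and real systems sit on a spectrum rather than in three clean bins.

```python
from dataclasses import dataclass
from enum import Enum


class Autonomy(Enum):
    TIGHT_INTEGRATION = "tight integration"
    SEMIAUTONOMOUS = "semiautonomous"
    TOTAL_ISOLATION = "total isolation"


@dataclass(frozen=True)
class ComponentDBMS:
    # Hypothetical flags, each standing for one factor named in the slides.
    exchanges_information: bool   # does it share data with other DBMSs?
    executes_independently: bool  # does it run transactions on its own?


def classify(dbms: ComponentDBMS) -> Autonomy:
    """Map the two flags onto the three alternatives (a simplification)."""
    if not dbms.exchanges_information:
        # Stand-alone system that does not communicate with other DBMSs.
        return Autonomy.TOTAL_ISOLATION
    if dbms.executes_independently:
        # Independent DBMS that joined a federation to share local data.
        return Autonomy.SEMIAUTONOMOUS
    # Data manager that works only as part of one logically integrated system.
    return Autonomy.TIGHT_INTEGRATION


print(classify(ComponentDBMS(exchanges_information=True, executes_independently=True)))
```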



Distribution

  Whereas autonomy refers to the distribution (or decentralization) of
control, the distribution dimension of the taxonomy deals with data:
the physical distribution of data over multiple sites. The user still
sees the data as one logical pool.
  DBMSs have been distributed in a number of ways. We abstract these
alternatives into two classes:
  client/server distribution and
  peer-to-peer distribution
  The client/server distribution concentrates data management duties
at servers while the clients focus on providing the application
environment including the user interface.
  The communication duties are shared between the client machines
and servers.
  Client/server DBMSs represent a practical compromise to
distributing functionality.



Distribution

  In peer-to-peer systems, there is no distinction of client
machines versus servers.
  Each machine has full DBMS functionality and can
communicate with other machines to execute queries and
transactions.



Heterogeneity

  Heterogeneity may occur in various forms in distributed systems,
ranging from hardware heterogeneity and differences in
networking protocols to variations in data managers.
  Heterogeneity in query languages not only involves the use of
completely different data access paradigms in different data
models but also covers differences in languages even when the
individual systems use the same data model.
  Although SQL is now the standard relational query language,
there are many different implementations and every vendor’s
language has a slightly different flavor.
  SQL (Structured Query Language) is a database language
designed for managing data in relational database
management systems (RDBMS).


Heterogeneity

  PostgreSQL is an object-relational database management system
(ORDBMS). It is released under a BSD-style license and is thus free
software. As with many other open-source programs, PostgreSQL is
not controlled by any single company; it is developed by a global
community of developers and companies.
  SQLite is an ACID-compliant embedded relational database
management system contained in a relatively small (~225 KB) C
programming library. The source code for SQLite is in the public
domain.
  MySQL (pronounced /maɪˌɛskjuːˈɛl/ "My S-Q-L", or "My sequel"
/maɪˈsiːkwəl/) is a relational database management system (RDBMS)
which has more than 6 million installations. "SQL" stands for
Structured Query Language; "My" is the name of co-founder Michael
Widenius's daughter. The program runs as a server providing
multi-user access to a number of databases.
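
To make the "slightly different flavor" of each vendor's SQL concrete, the sketch below defines the same logical table in the three systems named above. The customer table and the helper ddl_for are hypothetical; the only difference between the statements is the syntax for an auto-incrementing key. Only the SQLite statement is executed here, since it needs no server.

```python
import sqlite3

# The same logical table in three dialects; only the syntax for an
# auto-incrementing primary key differs.
CREATE_CUSTOMER = {
    "sqlite": (
        "CREATE TABLE customer ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
    ),
    "postgresql": (
        "CREATE TABLE customer ("
        "id SERIAL PRIMARY KEY, name TEXT NOT NULL)"
    ),
    "mysql": (
        "CREATE TABLE customer ("
        "id INT AUTO_INCREMENT PRIMARY KEY, name TEXT NOT NULL)"
    ),
}


def ddl_for(dialect: str) -> str:
    """Return the CREATE TABLE statement written for one dialect."""
    try:
        return CREATE_CUSTOMER[dialect]
    except KeyError:
        raise ValueError(f"unsupported dialect: {dialect}") from None


# Run the SQLite variant to show that it is accepted as written.
conn = sqlite3.connect(":memory:")
conn.execute(ddl_for("sqlite"))
conn.execute("INSERT INTO customer (name) VALUES ('Asha')")
first_id = conn.execute("SELECT id FROM customer").fetchone()[0]
print(first_id)
```

A multidatabase layer that spans such systems has to hide exactly this kind of difference from its users.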



Quick Break !!



Progress Check

1) Recap of Sessions✔
2) Distributed Database System✔
3) Distributed DBMS✔
4) ANSI/SPARC Architecture✔
5) Architectural Models for DDBS✔
 Autonomy
 Distribution
 Heterogeneity
6) Distributed DBMS Architecture
 Client/Server Systems
 Peer-to-Peer Systems
 Multidatabase System Architecture
7) Distributed Data Sources



Client/Server Systems

  The general idea is very simple and elegant: distinguish the
functionality that needs to be provided and divide these functions
into two classes: server functions and client functions.
  This provides a two-level architecture which makes it easier to
manage the complexity of modern DBMSs and the complexity of
distribution.
  In relational systems, the server does most of the data management
work.
  This means that all of query processing and optimization, transaction
management and storage management are done at the server.
  The client, in addition to the application and the user interface, has a
DBMS client module that is responsible for managing the data that
is cached to the client and (sometimes) managing the transaction
locks that may have been cached as well.


Client/Server Systems

  It is also possible to place consistency checking of user queries at
the client side, but this is not common since it requires the
replication of the system catalog at the client machines.
  This architecture is quite common in relational systems, where the
communication between the clients and the server(s) is at the level
of SQL statements.
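
A minimal in-process sketch of this division of labor follows. The classes DatabaseServer and DatabaseClient are hypothetical, and a plain method call stands in for the network round trip; the point is only that the client ships SQL text while the server does the query processing and storage management.

```python
import sqlite3


class DatabaseServer:
    """Server side: executes SQL and owns the stored data."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")

    def handle(self, sql, params=()):
        # The connection context manager commits on success and rolls
        # back if the statement raises an error.
        with self._conn:
            return self._conn.execute(sql, params).fetchall()


class DatabaseClient:
    """Client side: application logic; sends SQL statements to the server."""

    def __init__(self, server):
        self._server = server

    def run(self, sql, params=()):
        # In a real system this call would cross the network.
        return self._server.handle(sql, params)


server = DatabaseServer()
client = DatabaseClient(server)
client.run("CREATE TABLE item (sku TEXT PRIMARY KEY, qty INTEGER)")
client.run("INSERT INTO item VALUES (?, ?)", ("A-1", 5))
rows = client.run("SELECT sku, qty FROM item")
print(rows)
```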


Client/Server Systems

  There are a number of different types of client/server architecture.
  The simplest is the case where there is only one server which is
accessed by multiple clients; we call this multiple client/single server.
  From a data management perspective, this is not much different from
centralized databases since the database is stored on only one machine
(the server) that also hosts the software to manage it.
  A more sophisticated client/server architecture is one where there are
multiple servers in the system: the so-called multiple client/multiple
server approach.
  In this case, two alternative management strategies are possible: either
each client manages its own connection to the appropriate server or
each client knows of only its “home server” which then communicates
with other servers as required.
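
The "home server" strategy can be sketched as follows. The Server class, its peers map and the explicit table argument are assumptions made to keep the example short; a real server would work out the referenced tables by parsing the query.

```python
import sqlite3


class Server:
    """A database server that forwards requests it cannot answer locally."""

    def __init__(self, name):
        self.name = name
        self.conn = sqlite3.connect(":memory:")
        self.peers = {}  # table name -> Server that stores that table

    def local_tables(self):
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
        return {row[0] for row in rows}

    def query(self, table, sql):
        """Answer locally if this server stores the table, else forward."""
        if table in self.local_tables():
            return self.conn.execute(sql).fetchall()
        if table in self.peers:
            return self.peers[table].query(table, sql)
        raise LookupError(f"{self.name}: no server known for table {table!r}")


home = Server("home")
remote = Server("remote")
home.conn.execute("CREATE TABLE customers (cust_id INTEGER, name TEXT)")
home.conn.execute("INSERT INTO customers VALUES (10, 'Asha')")
remote.conn.execute("CREATE TABLE orders (order_id INTEGER, item TEXT)")
remote.conn.execute("INSERT INTO orders VALUES (1, 'widget')")
home.peers["orders"] = remote

# The client talks only to its home server in both cases.
local_rows = home.query("customers", "SELECT cust_id, name FROM customers")
forwarded_rows = home.query("orders", "SELECT order_id, item FROM orders")
print(local_rows, forwarded_rows)
```

Under the other strategy, the peers map would live in each client instead, and the client would open the connection to the appropriate server itself.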


Database Server Approach



Distributed Database Servers



Peer-to-Peer Systems

  The physical data organization on each machine may be, and
probably is, different. This means that there needs to be an
individual internal schema definition at each site, which we call the
local internal schema (LIS).
  The enterprise view of the data is described by the global conceptual
schema (GCS), which is global because it describes the logical
structure of the data at all the sites.
  To handle data fragmentation and replication, the logical
organization of data at each site needs to be described.
  Therefore, there needs to be a third layer in the architecture, the
local conceptual schema (LCS).
  In the architectural model we have chosen, the global conceptual
schema is the union of the local conceptual schemas.
  Finally, user applications and user access to the database are
supported by external schemas (ESs).
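
The statement that the global conceptual schema is the union of the local conceptual schemas can be shown with a few lines of code. The site names, relations and attributes below are hypothetical, and each schema is reduced to a mapping from relation name to attribute set.

```python
from collections import defaultdict

# Hypothetical local conceptual schemas: for each site, the relations it
# stores and the attributes of each relation held there.
LCS = {
    "site_a": {"employee": {"emp_id", "name"}},
    "site_b": {
        "employee": {"emp_id", "salary"},
        "project": {"proj_id", "title"},
    },
}


def build_gcs(local_schemas):
    """Form the global conceptual schema as the union of the local ones."""
    gcs = defaultdict(set)
    for schema in local_schemas.values():
        for relation, attributes in schema.items():
            gcs[relation] |= attributes
    return dict(gcs)


GCS = build_gcs(LCS)
print(sorted(GCS["employee"]))
```

Here employee is fragmented across two sites, yet the GCS describes it as a single relation with all three attributes.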



Peer-to-Peer Systems

Distributed Database Reference Architecture – P2P



Peer-to-Peer Systems

  The user queries data irrespective of its location or of which
local component of the distributed database system will service it.
  The distributed DBMS translates global queries into a group of
local queries, which are executed by distributed DBMS components
at different sites that communicate with one another.
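
A small sketch of this translation is given below, under strong simplifying assumptions: the employee relation is split into fragments at two hypothetical sites, every site runs the same local query, and the results are merged by simple concatenation. The site names and the functions make_site and global_query are invented for the example.

```python
import sqlite3


def make_site(rows):
    """Create one site holding a fragment of the employee relation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    return conn


# Hypothetical sites, each storing different employee rows.
SITES = {
    "pilani": make_site([(1, "Asha", "Pilani")]),
    "goa": make_site([(2, "Ravi", "Goa"), (3, "Meera", "Goa")]),
}


def global_query(min_id):
    """Run one global query as a group of local queries, then merge."""
    local_sql = "SELECT emp_id, name, city FROM employee WHERE emp_id >= ?"
    results = []
    for conn in SITES.values():
        # Each site executes its local query against its own fragment.
        results.extend(conn.execute(local_sql, (min_id,)).fetchall())
    return sorted(results)


print(global_query(2))
```

The caller of global_query never names a site, which is the location independence described above.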


Peer-to-Peer Systems



Peer-to-Peer Systems

The first major component, which we call the user processor, consists of
four elements:
1. The user interface handler is responsible for interpreting user
commands as they come in, and formatting the result data as it is sent to
the user.
2. The semantic data controller uses the integrity constraints and
authorizations that are defined as part of the global conceptual schema to
check if the user query can be processed.
3. The global query optimizer and decomposer determines an execution
strategy to minimize a cost function, and translates the global queries into
local ones using the global and local conceptual schemas as well as the
global directory.
The global query optimizer is responsible, among other things, for
generating the best strategy to execute distributed join operations.


Peer-to-Peer Systems

4. The distributed execution monitor coordinates the
distributed execution of the user request.
  The execution monitor is also called the distributed transaction
manager.
  In executing queries in a distributed fashion, the execution
monitors at various sites may, and usually do, communicate
with one another.



Peer-to-Peer Systems



Peer-to-Peer Systems

The second major component of a distributed DBMS is the data processor,
which consists of three elements:
1. The local query optimizer, which actually acts as the access path
selector, is responsible for choosing the best access path to access any data
item.
2. The local recovery manager is responsible for making sure that the
local database remains consistent even when failures occur.
3. The run-time support processor physically accesses the database
according to the physical commands in the schedule generated by the query
optimizer.
The run-time support processor is the interface to the operating system and
contains the database buffer (or cache) manager, which is responsible for
maintaining the main memory buffers and managing the data accesses.


Multidatabase System

  Multidatabase systems (MDBS) represent the case where
individual DBMSs (whether distributed or not) are fully
autonomous and have no concept of cooperation; they may
not even “know” of each other’s existence or how to talk to
each other.
  The differences in the level of autonomy between the
distributed multi-DBMSs and distributed DBMSs are also
reflected in their architectural models.
  In the case of logically integrated distributed DBMSs, the
global conceptual schema defines the conceptual view of the
entire database, while in the case of distributed multi-DBMSs,
it represents only the collection of some of the local databases
that each local DBMS wants to share.


Multidatabase System

  The individual DBMSs may choose to make some of their data available
for access by others by defining an export schema.
  In an MDBS, the GCS (which is also called a mediated schema) is defined
by integrating either the external schemas of local autonomous databases
or (possibly parts of) their local conceptual schemas.
  A major difference between the design of the GCS in multi-DBMSs and
in logically integrated distributed DBMSs is that in the former the
mapping is from local conceptual schemas to a global schema; in the
latter, the mapping is in the reverse direction.
  If heterogeneity exists in the multidatabase system, a canonical data
model has to be found to define the GCS.
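
The roles of the export schema and the mediated schema can be sketched with two autonomous SQLite databases. All table, view and column names here are hypothetical. The hr database shares only a view that hides salaries (its export schema), and the mediator maps each source onto one canonical relation person(id, name).

```python
import sqlite3

# Local database 1: exports only part of its data through a view.
hr = sqlite3.connect(":memory:")
hr.executescript(
    """
    CREATE TABLE staff (staff_id INTEGER, full_name TEXT, salary REAL);
    INSERT INTO staff VALUES (1, 'Asha', 52000);
    -- Export schema: the only part of this database shared with others.
    CREATE VIEW export_staff AS SELECT staff_id, full_name FROM staff;
    """
)

# Local database 2: a different schema for the same kind of data.
lab = sqlite3.connect(":memory:")
lab.executescript(
    """
    CREATE TABLE people (pid INTEGER, nm TEXT);
    INSERT INTO people VALUES (7, 'Ravi');
    """
)

# Mediated schema: person(id, name). Each mapping rewrites one local
# source into the canonical column names.
MAPPINGS = [
    (hr, "SELECT staff_id AS id, full_name AS name FROM export_staff"),
    (lab, "SELECT pid AS id, nm AS name FROM people"),
]


def query_person():
    """Answer a query on the mediated schema from all mapped sources."""
    rows = []
    for conn, sql in MAPPINGS:
        rows.extend(conn.execute(sql).fetchall())
    return sorted(rows)


print(query_person())
```

The mappings run from the local schemas up to the global one, which matches the direction described for multi-DBMSs.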


Multidatabase System

MDBS Architecture with a GCS



Distributed Data Sources

  A data source is simply the source of the data.
  It can be a file, a particular database on a DBMS, or even a live
data feed.
  The data might be located on the same computer as the program,
or on another computer somewhere on a network.
  The purpose of a data source is to gather all of the technical
information needed to access the data (the driver name, network
address, network software, and so on) into a single place and
hide it from the user.
  The user should be able to look at a list that includes Payroll,
Inventory, and Personnel, choose Payroll from the list, and have
the application connect to the payroll data, all without knowing
where the payroll data resides or how the application got to it.
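
The idea of hiding connection details behind a name can be sketched with a small registry. This is not the ODBC API: the DATA_SOURCES dictionary, its keys and the connect function are assumptions for illustration, and SQLite in-memory databases stand in for the real payroll, inventory and personnel systems.

```python
import sqlite3

# Hypothetical registry: data source name -> technical connection details.
DATA_SOURCES = {
    "Payroll": {"driver": "sqlite3", "database": ":memory:"},
    "Inventory": {"driver": "sqlite3", "database": ":memory:"},
    "Personnel": {"driver": "sqlite3", "database": ":memory:"},
}


def list_sources():
    """Names a user would pick from; no technical details are exposed."""
    return sorted(DATA_SOURCES)


def connect(name):
    """Open a connection using only the data source name."""
    info = DATA_SOURCES[name]
    if info["driver"] != "sqlite3":
        raise NotImplementedError(f"no driver loaded for {info['driver']!r}")
    return sqlite3.connect(info["database"])


print(list_sources())
payroll = connect("Payroll")
print(payroll.execute("SELECT 1").fetchone())
```

The application code refers only to "Payroll"; moving the data to another machine would mean editing the registry entry, not the application.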


Distributed Data Sources
  There are two types of data sources:
  machine data sources and
  file data sources.
  Although both contain similar information about the source of the
data, they differ in the way this information is stored.
  Because of these differences, they are used in somewhat different
manners.
Machine data sources
  Machine data sources are stored on the system with a user-defined
name.
  Associated with the data source name is all of the information the
Driver Manager and driver need to connect to the data source.
  For an Xbase data source, this might be the name of the Xbase
driver, the full path of the directory containing the Xbase files, and
some options that tell the driver how to use those files, such as
single-user mode or read-only.



Distributed Data Sources

File data sources


 File data sources are stored in a file and allow connection
information to be used repeatedly by a single user or shared
among several users.
 When a file data source is used, the Driver Manager makes the
connection to the data source using the information in a .dsn
file.
 This file can be manipulated like any other file. A file data
source does not have a data source name, as does a machine
data source, and is not registered to any one user or machine.
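
As an illustration of how the contents of a .dsn file can be turned into connection information, the sketch below parses a small file data source with Python's standard configparser module. The server and database names are made up, the helper connection_string_from_dsn is hypothetical, and an actual Driver Manager does considerably more than this.

```python
import configparser

# Hypothetical contents of a file data source (.dsn); the keyword values
# are invented for this example.
DSN_TEXT = """\
[ODBC]
DRIVER=SQL Server
SERVER=dbhost.example.com
DATABASE=payroll
"""


def connection_string_from_dsn(text):
    """Build a keyword=value connection string from .dsn file contents."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keyword case as written in the file
    parser.read_string(text)
    pairs = []
    for key, value in parser["ODBC"].items():
        if key.upper() == "DRIVER":
            # Driver names are conventionally wrapped in braces.
            value = "{" + value + "}"
        pairs.append(f"{key}={value}")
    return ";".join(pairs)


print(connection_string_from_dsn(DSN_TEXT))
```

Because the file is ordinary text, it can be copied to another machine or shared among users, which is the property that distinguishes it from a machine data source.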



Distributed Data Sources

  Data sources are usually created by the end user or a technician with a
program called the ODBC Administrator.
  Open Database Connectivity (ODBC) is a system that
connects ODBC-enabled applications to the database management
systems that provide the data. The ODBC Data Source
Administrator is used to configure applications so that they can
get data from a variety of database management systems.
  The ODBC Administrator prompts the user for the driver to use and
then calls that driver.
  The driver displays a dialog box that requests the information it needs
to connect to the data source.
  After the user enters the information, the driver stores it on the system.
Later, the application calls the Driver Manager and passes it the name
of a machine data source or the path of a file containing a file data
source.



Distributed Data Sources

  When passed a machine data source name, the Driver Manager
searches the system to find the driver used by the data source.
  It then loads the driver and passes the data source name to it.
The driver uses the data source name to find the information it
needs to connect to the data source.
  Finally, it connects to the data source, typically prompting the
user for a user ID and password, which generally are not
stored.



Distributed Data Sources

 When passed a file data source, the Driver Manager opens the
file and loads the specified driver.
 If the file also contains a connection string, it passes this to the
driver.
 Using the information in the connection string, the driver
connects to the data source.
 If no connection string was passed, the driver generally
prompts the user for the necessary information.



Exercise

• Watch RL’s
• Explore more about Multidatabase
• Mediator/Wrapper etc





Thank You for your
time & attention !
Contact : [email protected]

