Distributed Data Systems: SESAPZG554
BITS Pilani
Pilani|Dubai|Goa|Hyderabad
Parthasarathy
SESAPZG554 – CS#3
Distributed DBMS Architectures
Agenda for CS #3
1) Recap of Sessions
2) Distributed Database System
3) Distributed DBMS
4) ANSI/SPARC Architecture
5) Architectural Models for DDBS
Autonomy
Distribution
Heterogeneity
6) Distributed DBMS Architecture
Client/Server Systems
Peer-to-Peer Systems
Multidatabase System Architecture
7) Distributed Data Sources
Watched M2 Videos?
Read Chapter 1 & 4 – Textbook?
Internal Schema
At the lowest level of the architecture is the internal view, which deals
with the physical definition and organization of data.
The location of data on different storage devices and the access
mechanisms used to reach and manipulate data are the issues dealt with at
this level.
External Schema
At the other extreme is the external view, which is concerned with how
users view the database.
An individual user’s view represents the portion of the database that will
be accessed by that user as well as the relationships that the user would
like to see among the data.
A view can be shared among a number of users, with the collection of
user views making up the external schema.
Conceptual Schema
In between these two ends is the conceptual schema, which is
an abstract definition of the database.
It is the “real world” view of the enterprise being modeled in
the database.
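The three levels can be approximated in a single-node DBMS. The sketch below uses Python's built-in `sqlite3` module; the table, index, and view names (`employee`, `idx_employee_dept`, `staff_directory`) and the sample rows are hypothetical, and an index is only a rough stand-in for the internal schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the logical definition of the enterprise data.
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)

# Internal schema (approximation): an access mechanism over the stored data.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

# External schema: a user view that exposes only part of the database.
conn.execute("CREATE VIEW staff_directory AS SELECT name, dept FROM employee")

conn.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [("Asha", "Sales", 52000.0), ("Ravi", "R&D", 61000.0)],
)

# A user of the view never sees the salary column.
cursor = conn.execute("SELECT * FROM staff_directory ORDER BY name")
view_columns = [column[0] for column in cursor.description]
directory_rows = cursor.fetchall()
print(view_columns, directory_rows)
```

The view can be redefined, or the index dropped, without touching the `employee` definition, which is the kind of data independence the layering is meant to provide.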
Exercise:
Read Chapter 4 of the textbook for more information.
1) Recap of Sessions✔
2) Distributed Database System✔
3) Distributed DBMS✔
4) ANSI/SPARC Architecture✔
5) Architectural Models for DDBS✔
Autonomy
Distribution
Heterogeneity
6) Distributed DBMS Architecture
Client/Server Systems
Peer-to-Peer Systems
Multidatabase System Architecture
7) Distributed Data Sources
Distributed Database Reference Architecture – P2P
The first major component, which we call the user processor, consists of
four elements:
1. The user interface handler is responsible for interpreting user
commands as they come in, and formatting the result data as it is sent to
the user.
2. The semantic data controller uses the integrity constraints and
authorizations that are defined as part of the global conceptual schema to
check if the user query can be processed.
3. The global query optimizer and decomposer determines an execution
strategy to minimize a cost function, and translates the global queries into
local ones using the global and local conceptual schemas as well as the
global directory.
The global query optimizer is responsible, among other things, for
generating the best strategy to execute distributed join operations.
4. The distributed execution monitor coordinates the distributed execution
of the user request. It is also called the distributed transaction manager.
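The decomposition and join-site steps can be sketched as below. This is a toy model under stated assumptions: the `Fragment` type, the `GLOBAL_DIRECTORY` contents, the site and table names, and the cost function (number of rows shipped) are all hypothetical and far simpler than a real optimizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    site: str         # site that stores the fragment
    local_table: str  # name of the fragment in the local schema
    row_count: int    # statistic used by the toy cost function


# Hypothetical global directory: global relation -> fragments and locations.
GLOBAL_DIRECTORY = {
    "EMP": [
        Fragment("site_A", "emp_asia", 1000),
        Fragment("site_B", "emp_europe", 4000),
    ],
    "PROJ": [Fragment("site_B", "proj", 200)],
}


def decompose(global_relation, predicate):
    """Translate a global selection into one local query per fragment."""
    return [
        (frag.site, f"SELECT * FROM {frag.local_table} WHERE {predicate}")
        for frag in GLOBAL_DIRECTORY[global_relation]
    ]


def choose_join_site(left, right):
    """Pick the site that minimises the rows shipped over the network."""
    rows_at = {}
    for relation in (left, right):
        for frag in GLOBAL_DIRECTORY[relation]:
            rows_at[frag.site] = rows_at.get(frag.site, 0) + frag.row_count
    total_rows = sum(rows_at.values())
    # Cost of a site = rows that are not already stored there.
    return min(sorted(rows_at), key=lambda site: total_rows - rows_at[site])


local_queries = decompose("EMP", "salary > 50000")
join_site = choose_join_site("EMP", "PROJ")
print(local_queries, join_site)
```

With these statistics, joining at `site_B` ships 1000 rows while joining at `site_A` would ship 4200, so the sketch selects `site_B`.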
The second major component, the data processor, consists of three elements:
1. The local query optimizer, which actually acts as the access path
selector, is responsible for choosing the best access path to access any data
item.
2. The local recovery manager is responsible for making sure that the
local database remains consistent even when failures occur.
3. The run-time support processor physically accesses the database
according to the physical commands in the schedule generated by the query
optimizer.
The run-time support processor is the interface to the operating system and
contains the database buffer (or cache) manager, which is responsible for
maintaining the main memory buffers and managing the data accesses.
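To make the buffer manager's role concrete, here is a minimal sketch of a page cache. The least-recently-used replacement policy, the `BufferManager` class, and the `read_page_from_disk` callback are assumptions for illustration; the slides do not name a policy.

```python
from collections import OrderedDict


class BufferManager:
    """Toy main-memory buffer pool with LRU replacement (assumed policy)."""

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity
        self.read_page_from_disk = read_page_from_disk
        self.frames = OrderedDict()  # page_id -> page, oldest first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            # Buffer hit: mark the page as most recently used.
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        # Buffer miss: fetch from disk, evicting the LRU page if full.
        self.disk_reads += 1
        page = self.read_page_from_disk(page_id)
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)
        self.frames[page_id] = page
        return page


buffer = BufferManager(
    capacity=2,
    read_page_from_disk=lambda page_id: f"contents of page {page_id}",
)
for page_id in [1, 2, 1, 3, 2]:
    buffer.get_page(page_id)
print(buffer.disk_reads, list(buffer.frames))
```

Of the five requests, only the second access to page 1 is served from memory; the other four go to disk.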
The individual DBMSs may choose to make some of their data available
for access by others by defining an export schema.
In an MDBS, the GCS, which is also called a mediated schema, is defined
by integrating either the external schemas of local autonomous databases
or (possibly parts of) their local conceptual schemas.
Designing the global conceptual schema in multidatabase systems
involves the integration of either the local conceptual schemas or the
local external schemas.
A major difference between the design of the GCS in multi-DBMSs and
in logically integrated distributed DBMSs is that in the former the
mapping is from local conceptual schemas to a global schema. In the
latter, the mapping is in the reverse direction.
If heterogeneity exists in the multidatabase system, a canonical data
model has to be found to define the GCS.
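A mediated schema built from local schemas can be sketched as a set of mappings, one per local database. In this illustration two in-memory SQLite databases stand in for autonomous DBMSs; the local table names (`staff`, `workers`), the global relation `EMPLOYEE(name, city)`, and the sample rows are hypothetical.

```python
import sqlite3

# Two autonomous local databases with differently named schemas.
hr_db = sqlite3.connect(":memory:")
hr_db.execute("CREATE TABLE staff (full_name TEXT, city TEXT)")
hr_db.execute("INSERT INTO staff VALUES ('Asha', 'Pilani')")

payroll_db = sqlite3.connect(":memory:")
payroll_db.execute("CREATE TABLE workers (wname TEXT, location TEXT)")
payroll_db.execute("INSERT INTO workers VALUES ('Ravi', 'Goa')")

# Mappings from each local schema to the global relation EMPLOYEE(name, city).
EMPLOYEE_MAPPINGS = [
    (hr_db, "SELECT full_name AS name, city FROM staff"),
    (payroll_db, "SELECT wname AS name, location AS city FROM workers"),
]


def query_global_employee():
    """Answer a query on the GCS by running each local mapping and merging."""
    rows = []
    for connection, local_query in EMPLOYEE_MAPPINGS:
        rows.extend(connection.execute(local_query).fetchall())
    return sorted(rows)


global_rows = query_global_employee()
print(global_rows)
```

The direction of the mappings (local to global) matches the multi-DBMS case described above; both databases here already share the relational model, so no canonical data model translation is shown.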
MDBS Architecture with a GCS
Open Database Connectivity (ODBC) is a system that
connects ODBC-enabled applications to the database management
systems that provide the data. The ODBC Data Source
Administrator is used to configure your applications so that they can
get data from a variety of database management systems.
Data sources are usually created by the end user or a technician with a
program called the ODBC Administrator.
The ODBC Administrator prompts the user for the driver to use and
then calls that driver.
The driver displays a dialog box that requests the information it needs
to connect to the data source.
After the user enters the information, the driver stores it on the system.
Later, the application calls the Driver Manager and passes it the name
of a machine data source or the path of a file containing a file data
source.
When passed a file data source, the Driver Manager opens the
file and loads the specified driver.
If the file also contains a connection string, it passes this to the
driver.
Using the information in the connection string, the driver
connects to the data source.
If no connection string was passed, the driver generally
prompts the user for the necessary information.
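The file data source steps above can be simulated as follows. This is not the ODBC API: `resolve_file_data_source` is a hypothetical helper, the INI-like layout with an `[ODBC]` section is an assumption about the file format, and the driver name, server, and database values are made up.

```python
import configparser


def resolve_file_data_source(file_text):
    """Simulate the Driver Manager reading a file data source.

    Returns (driver, connection_string, must_prompt). Every keyword other
    than DRIVER is treated as part of the connection string (assumption).
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keyword case as written in the file
    parser.read_string(file_text)
    section = parser["ODBC"]

    driver = section["DRIVER"]
    attributes = [
        f"{key}={value}" for key, value in section.items() if key != "DRIVER"
    ]
    connection_string = ";".join(attributes) if attributes else None
    # Without connection details the driver would prompt the user.
    return driver, connection_string, connection_string is None


complete_file = (
    "[ODBC]\n"
    "DRIVER=Example Driver\n"
    "SERVER=db.example.test\n"
    "DATABASE=sales\n"
)
driver_only_file = "[ODBC]\nDRIVER=Example Driver\n"

complete_result = resolve_file_data_source(complete_file)
driver_only_result = resolve_file_data_source(driver_only_file)
print(complete_result)
print(driver_only_result)
```

In the first case the driver receives a connection string and can connect directly; in the second the `must_prompt` flag models the driver asking the user for the missing information.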
• Watch RL’s
• Explore more about Multidatabase
• Mediator/Wrapper etc