
Three Schema Architecture:

Defines DBMS schemas at three levels:

• Internal schema at the internal level to describe physical storage structures and access paths. Typically uses a physical data model.

• Conceptual schema at the conceptual level to describe the structure and constraints for the whole database for a community of users. Uses a conceptual or an implementation data model.

• External schemas at the external level to describe the various user views. Usually uses the same data model as the conceptual level.

[Figure: The three-schema architecture. End users work with external views at the external level; external/conceptual mappings connect these views to the conceptual level, and the conceptual/internal mapping connects the conceptual level to the internal level and the stored database.]
Mappings among schema levels are needed to transform
requests and data. Programs refer to an external schema, and
are mapped by the DBMS to the internal schema for execution.

Data Independence:

Logical Data Independence: The capacity to change the conceptual schema without having to change the external schemas and their application programs.

Physical Data Independence: The capacity to change the internal schema without having to change the conceptual schema.

When a schema at a lower level is changed, only the mappings between this schema and higher-level schemas need to be changed in a DBMS that fully supports data independence. The higher-level schemas themselves are unchanged. Hence, the application programs need not be changed since they refer to the external schemas.
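As a small illustration (the EMPLOYEE table and PAYROLL_VIEW below are hypothetical, not taken from the text), an external schema can be realized as an SQL view over the conceptual schema; the view definition plays the role of the external/conceptual mapping:

-- Conceptual schema: a base table (hypothetical example).
CREATE TABLE EMPLOYEE (
    ENO         INT PRIMARY KEY,
    ENAME       VARCHAR(50),
    DESIGNATION VARCHAR(30),
    SALARY      DECIMAL(10,2)
);

-- External schema: a view exposing only what one group of users needs.
CREATE VIEW PAYROLL_VIEW AS
    SELECT ENO, ENAME, SALARY
    FROM EMPLOYEE;

-- If the conceptual schema later changes (say, SALARY is split into two
-- columns), only the view definition has to be rewritten; application
-- programs that read PAYROLL_VIEW remain unchanged, which is exactly
-- logical data independence.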

Overall System Structure:

The DBMS architecture comprises disk storage, the storage manager, the query processor and the database users, connected through query and application tools, application programs and application interfaces. Users are differentiated by the way they expect to interact with the system.

• Application programmers - interact with the system through DML calls.

• Sophisticated users - form requests in a database query language.

• Specialized users - write specialized database applications that do not fit into the traditional data processing framework.

• Naive users - invoke one of the application programs that have been written previously; for example, people accessing a database over the web, bank tellers, clerical staff, etc.

Database Administrator:
Coordinates all the activities of the database system; the
database administrator has a good understanding of the
enterprise’s information resources and needs.
Responsibilities of the Database administrator include:
• Schema definition
• Storage structure and access method definition
• Schema and physical organization modification
• Granting user authority to access the database
• Specifying integrity constraints
• Acting as liaison with users
• Monitoring performance and responding to changes in
requirements
Transaction Management within Storage Manager:
• A transaction is a collection of operations that performs a
single logical function in a database application

• The transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.

• The concurrency-control manager controls the interaction among concurrent transactions, to ensure the consistency of the database.

• These functions are performed by components inside the storage manager.

Storage Manager

• The storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.

• It executes based on requests received from the query execution engine.

• It consists of the buffer manager, file manager, authorization and integrity manager, and transaction manager.

• The storage manager is responsible for the following tasks:

• Interaction among the various managers.

• Efficient storing, retrieving and updating of data.

Disk Storage

Disk storage consists of data in the form of logical tables, indices, the data dictionary and statistical data. The data dictionary stores data about data, i.e., the structure of the database. Indices are used for fast searching in a database. Statistical data comprises the log details about the various transactions which occur on the database.

Query processor

• The user submits a query, which passes to the optimizer where it is optimized; the physical execution plan then goes to the execution engine.

• The query execution engine passes the request to the index/file/record manager, which in turn passes the request to the buffer manager, requesting it to allocate memory to store pages. The buffer manager in turn sends the pages to the storage manager, which takes care of physical storage.

• The resulting data from physical storage comes back in the reverse order. The catalog is the data dictionary, which contains statistics and the schema. Every query execution that takes place in the execution engine is logged and recovered when required.
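Many relational systems let you inspect the plan the optimizer chooses through an EXPLAIN statement (the ACCOUNT table here is hypothetical, and the exact output format varies by product):

-- Supported, with differing output, by MySQL, PostgreSQL and others.
EXPLAIN
SELECT * FROM ACCOUNT
WHERE BRANCH_NAME = 'CHENNAI';
-- The output shows the physical execution plan, e.g. whether an index
-- on BRANCH_NAME or a full table scan will be used.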

When not to use a DBMS?

Because a DBMS brings costs such as a high initial cost, (possibly) the cost of extra hardware, the cost of entering data, the cost of training people to use the DBMS, and the cost of maintaining the DBMS, it is not always worthwhile to use one.

A DBMS becomes unnecessary:

• If access to data by multiple users is not required (and the data set is small)

• If the database and application are simple, well-defined, and not expected to change

Data Model:

A set of concepts to describe the structure of a database, and certain constraints that the database should obey.

Data Model Operations:

Operations for specifying database retrievals and updates by referring to the concepts of the data model. Operations on the data model may include basic operations and user-defined operations.

2.0 Hierarchical Data Model


Organizes the database as a tree structure.

• Organization of the records is as a collection of trees, rather than arbitrary graphs.

• The schema is represented by a hierarchical diagram.

• One record type, called the root, does not participate as a child record type.

• Every record type except the root participates as a child record type in exactly one parent-child relationship type.

• A leaf is a record type that does not participate as a parent in any parent-child relationship type.

• A record can act as a parent for any number of records.

2.1 Network Data Model


Organizes the database as a graph.

• Data are represented by collections of records.

• Relationships among data are represented by links.

• Organization is that of an arbitrary graph, represented by a network diagram.

Constraints in the Network Model:

• Insertion Constraints: Specifies what should happen when a record is inserted.

• Retention Constraints: Specifies whether a record must exist on its own or always be related to an owner as a member of some set instance.

• Set Ordering Constraints: Specifies how a record is ordered inside the database.

• Set Selection Constraints: Specifies how a record can be selected from the database.

2.2 Relational Data Model


Organizes the database as a set of relations (tables) and relationships among the relations (tables).

[Figure: a relation (table) composed of rows.]

Data Dictionary:

The data dictionary stores information about the data in the database that is essential to its management as a business resource. A data dictionary, or data catalog, is a database (in its own right) that provides a list of the definitions of all objects in the main database. For instance, it should include information on all entities in the database, along with their attributes and indexes. This "data about data" is sometimes referred to as metadata. The data dictionary should be accessible to the user of the database, so that she can obtain this metadata. Some examples of the contents of the data dictionary are:

• What data is available?

• Where is the data stored?

• Who owns the data?

• How is the data used?

• Who can access the data?

• Where do relationships exist between data items?
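In SQL systems much of this metadata is exposed through catalog views; a minimal sketch, assuming a standard INFORMATION_SCHEMA catalog and a hypothetical EMPLOYEE table:

-- List the columns and data types of the EMPLOYEE table.
-- INFORMATION_SCHEMA is defined by the SQL standard, though the exact
-- views and columns available vary from product to product.
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'EMPLOYEE';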

Database Design Phases:

[Figure: database design phases - data analysis; logical design (tables, columns, primary keys, foreign keys); physical design.]

• Conceptual Design (the Entity Relationship Model is used at this stage)

• Schema Refinement (Normalization)

• Physical Database Design and Tuning

2.3 Entity Relationship Model


An Entity is an object in the world that can be
distinguished from other objects. An Entity has well defined
properties (attributes).

What Should an Entity be and Should not be?

Should Be:

• An object that will have many instances in the database

• An object that will be composed of multiple attributes

• An object that we are trying to model

Should Not Be:

• A user of the database system

• An output of the database system (e.g. a report)

Attribute:

An Attribute is property or characteristic of an entity type.

Classification of attributes:

• Simple versus Composite Attribute

• Single-Valued versus Multivalued Attribute

• Stored versus Derived Attributes

• Identifier Attributes

Simple versus Composite Attribute:

An attribute that cannot be further subdivided is termed a simple attribute or an atomic attribute. For example, the attribute Employee_ID in the EMPLOYEE entity cannot be further subdivided.

An attribute that can be further subdivided is a composite attribute. As an example, consider the attribute Address, which can be subdivided into components such as street, city, state and pin code.

Single-Valued versus Multivalued Attribute:

If an attribute of an entity has only one value associated with it, it is termed a single-valued attribute. For example, the attribute Employee_ID in the Employee entity is single-valued.

If more than one value can be associated with an attribute of an entity, it is a multivalued attribute. For example, the attribute Skill in the Employee entity is multivalued.

Stored versus Derived Attributes:

If the value of an attribute is stored directly (for example, Date_Employed in the Employee entity), it is a stored attribute.

If the value of an attribute is derived from the value of another, it is a derived attribute. For example, the attribute Years_Employed is a derived attribute, since its value is derived from the value of Date_Employed.
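A derived attribute is typically computed at query time rather than stored; a hedged sketch, assuming an EMPLOYEE table with a DATE_EMPLOYED column:

-- YEARS_EMPLOYED is derived from the stored attribute DATE_EMPLOYED.
-- (A rough calculation that ignores the month and day.)
SELECT ENO,
       ENAME,
       DATE_EMPLOYED,
       EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM DATE_EMPLOYED)
           AS YEARS_EMPLOYED
FROM EMPLOYEE;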

Identifier Attributes:

Identifier (Key) - an attribute (or combination of attributes) that uniquely identifies individual instances of an entity type. The attribute Employee_ID in the Employee entity uniquely identifies an Employee entity.

E-R Model Constructs:

Entity instance: a person, place, object, event or concept (often corresponds to a row in a table).
Entity Type: a collection of entities (often corresponds to a table).
Attribute: a property or characteristic of an entity type (often corresponds to a field in a table).
Relationship instance: a link between entities (corresponds to primary key-foreign key equivalencies in related tables).
Relationship Type: a category of relationship; a link between entity types.
A Sample Entity Relationship Diagram:

Business Logic:

Supplier supplies Items. Supplier sends Shipment. Shipment


includes Items. Items are used in Products. Customers submit
Order. Order requests Product.

[Figure: a sample entity relationship diagram relating the SUPPLIER, SHIPMENT, ITEM, PRODUCT, CUSTOMER and ORDER entities described above.]

Basic E-R Notation:

[Figure: basic E-R notation symbols - strong entity, associative entity, relationship, identifying relationship, multivalued attribute, derived attribute.]
Evaluation of DBMS:
Evaluation is done based on the following features:
• Data Definition

• Physical Definition

• Accessibility
• Transaction handling

• Utilities
• Development

Data definition: primary key enforcement, foreign key specification, data types available, data type extensibility, domain specification, ease of restructuring, integrity controls, view mechanism, data dictionary, data independence, type of data model used, schema evolution.

Physical definition: file structures available, file structure maintenance, ease of reorganization, indexing, variable-length fields/records, data compression, encryption routines, memory requirements, storage requirements.
Accessibility: query language (SQL-92/SQL3 compliance), other system interfacing, interfacing to 3GLs, multiuser support, security (access controls, authorization mechanism).

Transaction handling: backup and recovery routines, checkpointing facility, logging facility, granularity of concurrency, deadlock resolution strategy, advanced transaction models, parallel query processing.

Utilities and development: 4GL tools, CASE tools, load/unload facilities, user usage monitoring, database administration support.

Other features: upgradability, interoperability with other DBMSs and other systems, vendor stability, Internet support, user base, replication utilities, training and user support, distributed capabilities, documentation, portability, operating system required, hardware required, cost, network support, online help, object-oriented capabilities, standards used, architecture (2- or 3-tier client/server), version management, performance, extensible query optimization, transaction throughput, scalability, maximum number of concurrent users.
Evaluation of Data Models:
An optimal data model should satisfy the criteria
tabulated below:

• Structural validity: consistency with the way the enterprise defines and organizes information.

• Simplicity: ease of understanding by information systems professionals and non-technical users.

• Expressability: ability to distinguish between different data, relationships between data, and constraints.

• Non-redundancy: exclusion of extraneous information; in particular, the representation of any one piece of information exactly once.

• Sharability: not specific to any particular application or technology and thereby usable by many.

• Extensibility: ability to evolve to support new requirements with minimal effect on existing users.

• Integrity: consistency with the way the enterprise uses and manages information.

• Diagrammatic representation: ability to represent a model using an easily understood diagrammatic notation.

An Associative Entity is a special entity that is also a relationship.

3.1 Methods of file organization


File Structures:
File Structures is the Organization of Data in Secondary
Storage Device in such a way that minimizes the access time
and the storage space. A File Structure is a combination of
representations for data in files and of operations for accessing
the data. A File Structure allows applications to read, write and
modify data. It might also support finding the data that matches
some search criteria or reading through the data in some
particular order. File organization may be sequential, index
sequential, inverted list or random. Each method has its own
uses and abuses.
A file is organized to ensure that records are available for processing. Before a file is created, the application for which the file will be used must be carefully examined. Clearly, a fundamental consideration in this examination will concern the data to be recorded on the file. But an equally important and less obvious consideration concerns how the data are to be placed on the file.
3.1.1 Sequential file Organization
It is the simplest method to store and retrieve data from a file. Sequential organization simply means storing and sorting records in physical order on tape or disk. In a sequential organization, records can be added only at the end of the file. That is, in a sequential file, records are stored one after the other without concern for the actual value of the data in the records. It is not possible to insert a record in the middle of the file without re-writing the file. During updating, records from the master and transaction files are matched, one record at a time, resulting in an updated master file. It is a characteristic of sequential files that all records are stored by position: the first one is at the first position, the second one occupies the second position and so on. There are no addresses or location assignments in sequential files. To read a sequential file, the system always starts at the beginning of the file. If the record sought is somewhere in the file, the system reads its way up to it, one record at a time. For example, if a particular record happens to be the fifteenth one in a file, the system starts at the first one and reads ahead one record at a time until the fifteenth one is reached. It cannot jump directly to the fifteenth one in a sequential file without starting from the beginning. In a sequential file the records are arranged into ascending or descending order according to a key field. This key field may be numeric, alphabetic, or a combination of both, but it must occupy the same place in each record, as it forms the basis for determining the order in which the records will appear on the file. Sequential files are generally maintained on magnetic tape, disk or a mass storage system. The advantages and disadvantages of the sequential file organization are given below:

Advantages:

• This approach is simple to understand.

• Locating a record requires only the record key.

• Efficient and economical if the activity rate (the proportion of records processed in a run) is high.

• Relatively inexpensive I/O media and devices may be used.

• Files may be relatively easy to reconstruct, since a good measure of built-in backup is usually available.
Disadvantages:

• Entire file must be processed even when the activity rate


is low
• Transactions must be sorted and placed in sequence
prior to processing
• Timeliness of data in file deteriorates while batches are
being accumulated
• Data redundancy is typically high since the same data
may be stored in several files sequenced on different
keys
3.1.2 Random or Direct File Organization
When sequential organization is a disadvantage for a proposed system, another file organization, called direct organization, is used. As with a sequential file, each record in a direct file must contain a key field. However, the records need not appear on the file in key field sequence. In addition, any record stored on a direct file can be accessed directly if its location or address is known (i.e., all preceding records need not be accessed). The problem, however, is to determine how to store the data records so that, given the key field of the desired record, its storage location on the file can be determined. In other words, if the program knows the record key, it can determine the location address of a record and retrieve it independently of any other records in the file. It would be ideal if the key field could also be the location of the record on the file. This method is known as the direct addressing method. It is quite simple, but its requirements often prevent its use; because of many other factors it has not become popular and hence is rarely used.
Therefore, before a direct organized file can be created, a formula or method must be devised to convert the key field value for a record to the address or location of the record on the file. This formula or method is generally called an algorithm, otherwise called hash addressing. Hashing refers to the process of deriving a storage address from a record key; there are many algorithms to determine the storage location using the key field. Some of the algorithms are:
Division by Prime: In this procedure, the actual key is divided by a prime number. Here modular division is used; that is, the quotient is discarded and the storage location is signified by the remainder. If the key field consists of a large number of digits, for instance 10 digits (e.g. 2345632278), then strip off the first or last 4 digits and apply the division-by-prime method to what remains. For example, stripping the first four digits of 2345632278 leaves 632278; dividing 632278 by the prime 997 gives a remainder of 180, so the record is stored at relative location 180. Other common algorithms include folding, extraction and squaring.
The advantages and disadvantages of direct file
organization are as follows:
Advantages:
• Immediate access to records for inquiry and updating
purposes is possible
• Immediate updating of several files as a result of single
transaction is possible
• Time taken for sorting the transaction can be saved
Disadvantages:
• Records in the on-line file may be exposed to the risk of a loss of accuracy, and a procedure for special backup and reconstruction is required
• As compared to sequentially organized files, this may be less efficient in using the storage space
• Adding and deleting of records is more difficult than with sequential files
• Relatively expensive hardware and software resources are required
3.1.3 Index Sequential File Organization
The third way of accessing records stored in the system is through an index. The basic form of an index includes a record key and the storage address for a record. To find a record when the storage address is unknown, it is necessary to scan the records. However, if an index is used, the search will be faster, since it takes less time to search an index than an entire file of data.
Indexed files offer the simplicity of sequential files while at the same time offering a capability for direct access. The records must be initially stored on the file in sequential order according to a key field. In addition, as the records are being recorded on the file, one or more indexes are established by the system to associate the key field value(s) with the storage location of the record on the file. These indexes are then used by the system to allow a record to be directly accessed.
To find a specific record when the file is stored under an indexed organization, the index is searched first to find the key of the record wanted. When it is found, the corresponding storage address is noted and then the program can access the record directly. This method uses a sequential scan of the index, followed by direct access to the appropriate record. The index helps to speed up the search compared with a sequential file, but it is slower than direct addressing.
Indexed files are generally maintained on magnetic disk or on a mass storage system. The primary differences between direct and indexed organized files are as follows: records may be accessed from a direct organized file only randomly, whereas records may be accessed sequentially or randomly from an indexed organized file. Direct organized files utilize an algorithm to determine the location of a record, whereas indexed organized files utilize an index to locate a record to be randomly accessed. The advantages and disadvantages of indexed sequential file organization are as follows:
Advantages:

• Permits the efficient and economical use of sequential processing techniques when the activity rate is high.
• Permits quick access to records in a relatively efficient way when this activity is a small fraction of the total workload.
Disadvantages:

• Less efficient in the use of storage space than some


other alternatives.
• Access to records may be slower using indexes than
when transform algorithms are used.
• Relatively expensive hardware and software resources
are required
Indexing and Hashing:
An index for a file in a database system works in much
the same way as the index in a book. The index is much
smaller than the book, further reducing the effort needed to find
the words we are looking for. Database system indices play the
same role as book indices or card catalogs in libraries.
There are two basic kinds of indices:

• Ordered indices - Based on a sorted ordering of the


values.
• Hash indices - Based on a uniform distribution of the
values across a range of buckets. The bucket to which a
value is assigned is determined by a function, called a
hash function.
to on’e technique is the best. Rather, each technique is
best suited to particular database application. Each technique
must be evaluated on the basis of these factors:

53
• Access types: The types of access that are supported
efficiently. Access types can include finding records with
a specified attribute value and finding records whose
attribute values fall in a specified range.

• Access time: The time it takes to find a particular data item, or set of items, using the technique.

• Insertion time: The time it takes to insert a new data item. This value includes the time it takes to find the correct place to insert the new data item, as well as the time it takes to update the index structure.

• Deletion time: The time it takes to delete a data item. This value includes the time it takes to find the item to be deleted, as well as the time it takes to update the index structure.

• Space overhead: The additional space occupied by an index structure. Provided that the amount of additional space is moderate, it is usually worthwhile to sacrifice the space to achieve improved performance.

An attribute or set of attributes used to look up records in a file is called a "search key." Note that this definition of "key" differs from that used in "primary key, candidate key and superkey." Using our notion of a search key, we see that if there are several indices on a file, there are several search keys.
Ordered Indices:
To gain fast random access to records in a file, we can
use an index structure. Each index structure is associated with
a particular search key. Just like the index of a book or a library
catalog, an ordered index stores the values of the search keys
in sorted order, and associates with each search key the records
that contain it.
The records in the indexed file may themselves be stored in some sorted order. A file may have several indices, on different search keys. If the file containing the records is sequentially ordered, a "primary index" is an index whose search key also defines the sequential order of the file. (The term "primary index" is sometimes used to mean an index on a primary key; however, such usage is nonstandard and should be avoided.) Primary indices are also called "clustering indices." The search key of a primary index is usually the primary key, although that is not necessarily so. Indices whose search key specifies an order different from the sequential order of the file are called "secondary indices," or "nonclustering" indices.
Primary Index:
"Index-sequential files" are files that are ordered sequentially on a search key, with a primary index on that search key. They are designed for applications that require both sequential processing of the entire file and random access to individual records.
Dense and Sparse Indices:
An "index record," or "index entry," consists of a search-key value, and pointers to one or more records with that value as their search-key value. The pointer to a record consists of the identifier of a disk block and an offset within the disk block to identify the record within the block.
There are two types of ordered indices that we can use:
Dense index: An index record appears for every search-key value in the file. In a dense primary index, the index record contains the search-key value and a pointer to the first data record with that search-key value. The rest of the records with the same search-key value would be stored sequentially after the first record, since, because the index is a primary one, records are sorted on the same search key. Dense index implementations may store a list of pointers to all records with the same search-key value; doing so is not essential for primary indices.
Sparse index: An index record appears for only some of the search-key values. As is true for dense indices, each index record contains a search-key value and a pointer to the first data record with that search-key value. To locate a record, we find the index entry with the largest search-key value that is less than or equal to the search-key value for which we are looking. We start at the record pointed to by that index entry, and follow the pointers in the file until we find the desired record.
Multilevel Indices:
Even if we use a sparse index, the index itself may
become too large for efficient processing. It is not unreasonable,
in practice, to have a file with 100,000 records, with 10 records
stored in each block. If we have one index record per block, the
index has 10,000 records. Index records are smaller than data
records, so let us assume that 100 index records fit on a block.
Thus, our index occupies 100 blocks. Such large indices are
stored as sequential files on disk.
If an index is sufficiently small to be kept in main memory,
the search time to find an entry is low. However, if the index is
so large that it must be kept on disk, a search for an entry requires several disk block reads. If the index occupies N blocks, binary search requires as many as ceil(log2 N) blocks to be read.
Indices with two or more levels are called "multilevel"
indices. Searching for records with a multilevel index requires
significantly fewer l/O operations than does searching for
records by binary search. Multilevel indices are closely related
to tree structures, such as the binary trees used for in-memory
indexing.
Index Update:
Regardless of what form of index is used, every index
must be updated whenever a record is either inserted into or
deleted from the file. We first describe algorithms for updating
single-level indices.
Insertion:
First the system performs a lookup using the search-key
value that appears in the record to be inserted. Again, the
actions the system takes next depend on whether the index is
dense or sparse:
Dense Indices:

• If the search-key value does not appear in the index, the system inserts an index record with the search-key value in the index at the appropriate position.

Otherwise the following actions are taken:

• If the index record stores pointers to all records with the same search-key value, the system adds a pointer to the new record to the index record.

• Otherwise, the index record stores a pointer to only the first record with the search-key value. The system then places the record being inserted after the other records with the same search-key values.

Sparse Indices:
We assume that the index stores an entry for each block. If the system creates a new block, it inserts the first search-key value (in search-key order) appearing in the new block into the index. On the other hand, if the new record has the least search-key value in its block, the system updates the index entry pointing to the block; if not, the system makes no change to the index.

Deletion: To delete a record, the system first looks up the record
to be deleted. The actions the system takes next depend on
whether the index is dense or sparse:
Dense Indices:

• If the deleted record was the only record with its particular search-key value, then the system deletes the corresponding index record from the index.

Otherwise the following actions are taken:

• If the index record stores pointers to all records with the same search-key value, the system deletes the pointer to the deleted record from the index record.

• Otherwise, the index record stores a pointer to only the first record with the search-key value. In this case, if the deleted record was the first record with the search-key value, the system updates the index record to point to the next record.

Sparse Indices:

• If the index does not contain an index record with the search-key value of the deleted record, nothing needs to be done to the index.

Otherwise the system takes the following actions:

• If the deleted record was the only record with its search key, the system replaces the corresponding index record with an index record for the next search-key value (in search-key order). If the next search-key value already has an index entry, the entry is deleted instead of being replaced.

• Otherwise, if the index record for the search-key value points to the record being deleted, the system updates the index record to point to the next record with the same search-key value.
Insertion and deletion algorithms for multilevel indices are
a simple extension of the scheme just described.
Secondary Indices:
Secondary indices must be dense, with an index entry for
every search-key value, and a pointer to every record in the file.
A primary index may be sparse, storing only some of the
search-key values, since it is always possible to find records with
intermediate search-key values by a sequential access to a part
of the file. If a secondary index stores only some of the search-
key values, records with intermediate search-key values may be
anywhere in the file and, in general, we cannot find them
without searching the entire file.
Secondary indices improve the performance of queries
that use keys other than the search key of the primary index.
However, they impose a significant overhead on the database.
The designer of a database decides which secondary indices are
desirable on the basis of an estimate of the relative frequency
of queries and modifications.
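In SQL, a secondary index is usually created with CREATE INDEX; a sketch using the ACCOUNT table that appears in the Multiple-Key Access example later in the text (the index name is an assumption):

-- ACCOUNT is assumed to be stored in ACCOUNT_NO order (primary index).
-- A secondary (nonclustering) index on BRANCH_NAME speeds up queries
-- that search by branch rather than by account number.
CREATE INDEX ACCOUNT_BRANCH_IDX ON ACCOUNT (BRANCH_NAME);

SELECT * FROM ACCOUNT WHERE BRANCH_NAME = 'CHENNAI';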
3.1.4 Multiple Key File Organization
B+ tree Index Files:
The main disadvantage of the index-sequential file organization is that performance degrades as the file grows, both for index lookups and for sequential scans through the data. Although this degradation can be remedied by reorganization of the file, frequent reorganizations are undesirable.
The B+ tree index structure is the most widely used of several index structures that maintain their efficiency despite insertion and deletion of data. A B+ tree index takes the form of a balanced tree in which every path from the root of the tree to a leaf of the tree is of the same length. Each non-leaf node in the tree has between ceil(n/2) and n children, where n is fixed for a particular tree.
The B+ tree structure imposes performance overhead on insertion and deletion, and adds space overhead. The overhead is acceptable even for frequently modified files, since the cost of file reorganization is avoided. Furthermore, since nodes may be as much as half empty, there is some wasted space. This overhead, too, is acceptable given the performance benefits of the B+ tree structure.
Static Hashing:

One disadvantage of sequential file organization is that we must access an index structure to locate data, or must use binary search, and that results in more I/O operations. File organizations based on the technique of hashing allow us to avoid accessing an index structure. Hashing also provides a way of constructing indices.
Dynamic Hashing:
Most databases grow larger over time. If we are to use static
hashing for such a database, we have three classes of options:

• Choose a hash function based on the current file size. This option will result in performance degradation as the database grows.

• Choose a hash function based on the anticipated size of the file at some point in the future. Although performance degradation is avoided, a significant amount of space may be wasted initially.

• Periodically reorganize the hash structure in response to file growth. Such a reorganization involves choosing a new hash function, recomputing the hash function on every record in the file, and generating new bucket assignments. This reorganization is a massive, time-consuming operation. Furthermore, it is necessary to forbid access to the file during reorganization.
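A loose analogy of the static case, sketched in PostgreSQL's hash-partitioning syntax (product-specific, and not discussed in the text): the fixed MODULUS plays the role of the static hash function, and changing it later means redistributing every row, which is the reorganization cost described above.

-- PostgreSQL-style hash partitioning (hypothetical ACCOUNT table).
CREATE TABLE ACCOUNT (
    ACCOUNT_NO  INT,
    BRANCH_NAME VARCHAR(30),
    BALANCE     DECIMAL(12,2)
) PARTITION BY HASH (ACCOUNT_NO);

-- Four fixed buckets: the "static" part of static hashing.
CREATE TABLE ACCOUNT_P0 PARTITION OF ACCOUNT FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE ACCOUNT_P1 PARTITION OF ACCOUNT FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE ACCOUNT_P2 PARTITION OF ACCOUNT FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE ACCOUNT_P3 PARTITION OF ACCOUNT FOR VALUES WITH (MODULUS 4, REMAINDER 3);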

Multiple-Key Access:
Use multiple indices for certain types of queries.
Example:
SELECT * FROM ACCOUNT
WHERE BRANCH_NAME = 'CHENNAI' AND BALANCE = 1000
Possible strategies for processing the query using indices on single attributes:
• Use the index on BRANCH_NAME to find accounts for the Chennai branch; test BALANCE = 1000
• Use the index on BALANCE to find accounts with balances of 1000; test BRANCH_NAME = 'CHENNAI'
• Use the BRANCH_NAME index to find pointers to all records pertaining to the CHENNAI branch. Similarly use the index on BALANCE. Take the intersection of both sets of pointers obtained.
Indices on Multiple Keys:
Composite search keys are search keys containing more than one attribute.
Example: (BRANCH_NAME, BALANCE)
Indices on Multiple Attributes:
Suppose we have an index on the combined search key (BRANCH_NAME, BALANCE). With the where clause
WHERE BRANCH_NAME = 'CHENNAI' AND BALANCE = 1000
the index on (BRANCH_NAME, BALANCE) can be used to fetch only records that satisfy both conditions.
Using separate indices is less efficient - we may fetch many records (or pointers) that satisfy only one of the conditions.
The combined search key (BRANCH_NAME, BALANCE) can also efficiently handle
WHERE BRANCH_NAME = 'CHENNAI' AND BALANCE < 1000
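The combined search key can be realized as a composite index; a sketch (the index name is an assumption):

-- Column order matters: the index is sorted on BRANCH_NAME first and
-- BALANCE second, so queries with an equality on BRANCH_NAME benefit.
CREATE INDEX ACCOUNT_BRANCH_BALANCE_IDX
    ON ACCOUNT (BRANCH_NAME, BALANCE);

-- Both conditions served by the composite index:
SELECT * FROM ACCOUNT
WHERE BRANCH_NAME = 'CHENNAI' AND BALANCE = 1000;

-- Also handled efficiently, since the leading column is an equality:
SELECT * FROM ACCOUNT
WHERE BRANCH_NAME = 'CHENNAI' AND BALANCE < 1000;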

4.2.1 Properties of Normalization:
Prime Attribute:
An attribute is said to be prime if it is a candidate key or primary key, or part of a candidate key or primary key.
Non-Prime Attribute:
An attribute is said to be non-prime if it is neither a candidate key nor a primary key, nor part of a candidate key or primary key.
Transitive Functional Dependency:

A functional dependency X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z such that both X → Z and Z → Y hold.
4.2.2 Various Normalization Techniques:
First Normal Form:
First Normal Form disallows multivalued attributes,
composite attributes and their combinations. It states that the
domain of an attribute must include only atomic (simple
indivisible) values and that the value of any attribute in a tuple
must be a single value from the domain of that attribute.

EXAMPLE:

A department in a company could have any number of locations distributed geographically. Consider the following relation DEPARTMENT_LOCATION:

DNO  DNAME            DLOCATION
RE   RESEARCH         {CHENNAI, MADURAI, PUNE}
AD   ADMINISTRATION   {CHENNAI}
DC   DATA COLLECTION  {CHENNAI, NAGERCOIL, MARTANDAM}

Is the above relation in 1NF?

No, the above relation DEPARTMENT_LOCATION is not in 1NF.
Justification:
There are two ways we can look at the DLOCATION attribute:
• The domain of DLOCATION contains atomic values, but some tuples can have a set of these values. In this case, DLOCATION is not functionally dependent on DNO.
• The domain of DLOCATION contains sets of values and hence is non-atomic. In this case DNO → DLOCATION, because each set is considered a single member of the attribute domain.
How to Normalize?
There are three main techniques to achieve first normal form for such a relation.
• Remove the attribute DLOCATION that violates 1NF and place it in a separate relation along with the primary key DNO of DEPARTMENT_LOCATION. The primary key of the new relation is the combination (DNO, DLOCATION). This decomposes the non-1NF relation into two 1NF relations.
• Expand the key so that there will be a separate tuple in the original relation for each location of a department. In this case, the primary key becomes the combination (DNO, DLOCATION). This solution has the disadvantage of introducing redundancy in the relation.
• If a maximum number of values is known for the attribute - for example, if it is known that at most three locations can exist for a department - replace the DLOCATION attribute by three atomic attributes: DLOCATION1, DLOCATION2 and DLOCATION3. This solution has the disadvantage of introducing null values if most departments have fewer than three locations.
Solution 1:
DEPARTMENT:

DNO  DNAME
RE   RESEARCH
AD   ADMINISTRATION
DC   DATA COLLECTION

DEPARTMENT_LOCATION:

DNO  DLOCATION
RE   MADURAI
RE   CHENNAI
RE   PUNE
AD   CHENNAI
DC   CHENNAI
DC   NAGERCOIL
DC   MARTANDAM

The following Integrity Constraints hold on the above two relations:

• In the DEPARTMENT relation the attribute DNO is the Primary Key.

• In the DEPARTMENT_LOCATION relation the combination (DNO, DLOCATION) is the Primary Key.

• The DNO attribute of DEPARTMENT_LOCATION is a Foreign Key referencing the DNO attribute of DEPARTMENT.
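Solution 1 can be expressed directly as SQL table definitions (the data types are assumptions, not given in the text):

CREATE TABLE DEPARTMENT (
    DNO   CHAR(2)     PRIMARY KEY,
    DNAME VARCHAR(30)
);

CREATE TABLE DEPARTMENT_LOCATION (
    DNO       CHAR(2)     NOT NULL,
    DLOCATION VARCHAR(30) NOT NULL,
    PRIMARY KEY (DNO, DLOCATION),
    FOREIGN KEY (DNO) REFERENCES DEPARTMENT (DNO)
);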
Solution 2:
DEPARTMENT:

DNO  DNAME            DLOCATION
RE   RESEARCH         CHENNAI
RE   RESEARCH         MADURAI
RE   RESEARCH         PUNE
AD   ADMINISTRATION   CHENNAI
DC   DATA COLLECTION  CHENNAI
DC   DATA COLLECTION  NAGERCOIL
DC   DATA COLLECTION  MARTANDAM

The Primary Key of the above relation is the combination (DNO, DLOCATION). The major drawback of this approach is redundancy.
Solution 3:
DEPARTMENT:

DNO  DNAME            DLOCATION1  DLOCATION2  DLOCATION3
RE   RESEARCH         CHENNAI     MADURAI     PUNE
AD   ADMINISTRATION   CHENNAI
DC   DATA COLLECTION  CHENNAI     NAGERCOIL   MARTANDAM

Second Normal Form:


Each non prime attribute must be fully functionally dependent on
the Candidate Key of the relation. That is, No Partial Functional
Dependencies on the Candidate Key must exist. The above can
be restated as, no non prime attribute must be dependent on
part of the Candidate Key.
Example:
Consider the following relation EMPLOYEE_PROJECT:
Business Rules:

An employee can work on any number of projects on a day. However, he/she will not be permitted to work more than once on the same project on the same day.

ENO  ENAME  DESIGNATION  PNO  PNAME  DATE  INTIME  OUTTIME  HOURS

In the above relation the following functional dependencies hold:

FD 1: ENO, PNO, DATE → INTIME, OUTTIME, HOURS

FD 2: ENO → ENAME, DESIGNATION

FD 3: PNO → PNAME
Is the above relation in 2NF?
No, the above relation EMPLOYEE_PROJECT is not in 2NF.
Justification:
The Primary Key of the relation is (ENO, PNO, DATE). For a relation to be in 2NF each non-prime attribute must be fully functionally dependent on the key of the relation.
It is clear from the above functional dependencies that there are non-prime attributes that are partially functionally dependent on the key of the relation:
• The non-prime attributes ENAME and DESIGNATION are dependent on part of the key, ENO.
• The non-prime attribute PNAME is dependent on part of the key, PNO.
How to Normalize?
Decomposition!
Create new relations:
• For each partial key, a relation with the partial key and its dependent attribute(s).
• A relation with the original key and any attributes that are fully functionally dependent on it.
EMPLOYEE

ENO  ENAME  DESIGNATION

PROJECT

PNO  PNAME

WORKS

ENO  PNO  DATE  INTIME  OUTTIME  HOURS
The following Integrity Constraints hold on the above relations:

• In the EMPLOYEE relation the attribute ENO is the Primary Key.

• In the PROJECT relation the attribute PNO is the Primary Key.

• In the WORKS relation the combination (ENO, PNO, DATE) is the Primary Key.

• The ENO attribute in WORKS is a Foreign Key referencing ENO of EMPLOYEE.

• The PNO attribute in WORKS is a Foreign Key referencing PNO of PROJECT.
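The 2NF decomposition and its integrity constraints can likewise be written as SQL table definitions (data types are assumptions; DATE is renamed WORK_DATE here because DATE is a reserved word in SQL):

CREATE TABLE EMPLOYEE (
    ENO         INT PRIMARY KEY,
    ENAME       VARCHAR(50),
    DESIGNATION VARCHAR(30)
);

CREATE TABLE PROJECT (
    PNO   INT PRIMARY KEY,
    PNAME VARCHAR(50)
);

CREATE TABLE WORKS (
    ENO       INT  NOT NULL,
    PNO       INT  NOT NULL,
    WORK_DATE DATE NOT NULL,
    INTIME    TIME,
    OUTTIME   TIME,
    HOURS     DECIMAL(4,2),
    PRIMARY KEY (ENO, PNO, WORK_DATE),
    FOREIGN KEY (ENO) REFERENCES EMPLOYEE (ENO),
    FOREIGN KEY (PNO) REFERENCES PROJECT (PNO)
);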
Third Normal Form:
Third Normal Form is based on the concept of Transitive Dependency. A functional dependency X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold.

EXAMPLE:
A company is organized into departments. An employee works in one department. Consider the following relation EMPLOYEE_DEPARTMENT:

ENO  ENAME  DESIGNATION  DNO  DNAME

The Primary Key of the relation is ENO. The following Functional Dependencies hold:

ENO → DNO

DNO → DNAME

The dependency ENO → DNAME is transitive through DNO. DNO is neither a key itself nor a subset of the key.

Third Normal Form states that no non-prime attribute must be transitively determined by the Candidate Key / Primary Key of a relation through a non-prime attribute.
Is the above relation in 3NF?
The above relation is not in Third Normal Form.
How to Normalize?
Decomposition!
Create a relation with the original key and retain the attributes that are not functionally determined by other non-key attribute(s).
Create a relation that includes the non-key attribute(s) that functionally determine other non-key attribute(s) and the non-key attribute(s) they determine.
EMPLOYEE

ENO ENAME DESIGNATION DNO

DEPARTMENT

DNO DNAME

The following Integrity Constraints hold on the above two relations:
• In the EMPLOYEE relation the attribute ENO is the
Primary Key
• In the DEPARTMENT relation the attribute DNO is the
Primary Key
• DNO attribute in EMPLOYEE relation is a Foreign Key
referencing DNO attribute of DEPARTMENT relation.
Note: For a relation to be in Third Normal Form it must be in
Second Normal Form.

Boyce-Codd Normal Form (BCNF):
A relation R is said to be in BCNF if every determinant is a Candidate Key. Consider the following relation Candidate_Interview:

CID  INTID  DATE  TIME  ROOMNO

Business Rules Governing the Relation Candidate_Interview:
A candidate is interviewed only once on a particular day. However, he/she can appear for an interview on a different date. An interviewer is assigned a room on a particular day. He may be assigned a different room on a different date. The same room may be utilized by more than one interviewer, but at different times.
Determinants in the Relation Candidate_Interview:

CID, DATE → INTID, TIME, ROOMNO

INTID, DATE, TIME → CID, ROOMNO

DATE, TIME, ROOMNO → INTID, CID

INTID, DATE → ROOMNO

Are all determinants candidate keys?

The determinant {INTID, DATE} of the dependency INTID, DATE → ROOMNO is not a Candidate Key.

Is the above relation in BCNF?
The above relation is not in BCNF.
How to Normalize?
Decomposition!

• Create a relation with the determinants and all attributes except the attribute(s) that are determined by a determinant that is not a candidate key.

• Create another relation with the determinant that is not a candidate key and the attribute(s) it determines.

R1

CID  INTID  DATE  TIME

You can infer that the relation R1 has the following determinants:

CID, DATE → INTID, TIME

INTID, DATE, TIME → CID

Both determinants are Candidate Keys.

R2

INTID  DATE  ROOMNO

You can infer that the relation R2 has the following determinant:

INTID, DATE → ROOMNO

The above determinant is a Candidate Key.
Note: For a relation to be in BCNF it must be in Third Normal Form.
Fourth Normal Form:
Multi-valued Dependency:
Let R be a relation, and let A, B and C be attributes of R. Then we state that B is multi-dependent on A if for a value of A there are multiple values of B, and C is multi-dependent on A if for a value of A there are multiple values of C. It can be written symbolically as follows:
A →→ B
A →→ C
Fourth Normal Form states that no relation must contain two or more one-to-many or many-to-many relationships that are not directly related to the key. These kinds of relationships are called Multi-valued Dependencies.
Consider the following relation EMP_PROJ_HOBBY:

ENO  PNO  HOBBY

An employee may be involved in two or more projects. An employee may have any number of hobbies. The following multi-valued dependencies hold:
ENO →→ PNO
ENO →→ HOBBY
The above relation is not in Fourth Normal Form.
How to Normalize?
Decomposition!
The relation EMP_PROJ_HOBBY must be decomposed into:
R1

ENO  PNO

R2

ENO  HOBBY

Fifth Normal Form:
Project-Join Normal Form (PJNF):
A relation is said to be in Fifth Normal Form if the relation has no Join Dependencies. In a relation R with subsets of the attributes of R denoted A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Fifth Normal Form is not a practical normal form.

Lossless-Join Dependency:
Lossless-Join Dependency is a property of decomposition which ensures that no spurious tuples (additional tuples) are generated when relations are reunited through a natural join operation.
Domain-Key Normal Form:
This level of normalization is simply a model taken to the point where there are no opportunities for modification anomalies.

• "If every constraint on the relation is a logical consequence of the definition of keys and domains"

• Constraint "a rule governing static values of attributes"

• Key "unique identifier of a tuple"

• Domain "description of an attribute’s allowed values"


A relation in DK/NF has no modification anomalies, and
conversely. DK/NF is the ultimate normal form; there is no higher
normal form related to modification anomalies
A relation is in DK/NF if every constraint on the relation is
a logical consequence of the definition of keys and domains.

• A constraint is any rule governing static values of attributes that is precise enough that we can ascertain whether or not it is true. Examples include edit rules, intra-relation and inter-relation constraints, and functional and multi-valued dependencies.

• Constraints on changes in data values and time-dependent constraints are not included.

• Key: the unique identifier of a tuple.

• Domain: a physical and logical description of an attribute's allowed values.

• The physical description is the format of an attribute.

• The logical description is a further restriction of the values the domain allows.

• Logical consequence: find a constraint on keys and/or domains which, if it is enforced, means that the desired constraint is also enforced.

• Bottom line on DK/NF: if every table has a single theme, then all functional dependencies will be logical consequences of keys. All data value constraints can then be expressed as domain constraints.

• Practical consequence: since keys are enforced by the DBMS and domains are enforced by edit checks on data input, all modification anomalies can be avoided by just these two simple measures.

5.0 SQL commands


The Structured Query Language (SQL) is a language originally known as SEQUEL (Structured English QUEry Language) that was developed by Donald Chamberlin and Raymond Boyce at the IBM research center in 1974. Later shortened to SQL, but still pronounced "sequel", SQL has become the de facto standard database language. The first commercial version of SQL was introduced in 1979 by Oracle. Today there are three standards of SQL - SQL89 (SQL1), SQL92 (SQL2) and SQL99 (SQL3) - and numerous flavors of SQL available. SQL is used in manipulating data stored in Relational Database Management Systems (RDBMS). SQL provides commands through which data can be extracted, sorted, updated, deleted and inserted. SQL is an ANSI (American National Standards Institute) standard computer language for accessing and manipulating database systems. SQL can be used with any RDBMS such as MySQL, PostgreSQL, Oracle, Microsoft SQL Server, Sybase, Ingres, etc. All the important and common SQL statements are supported by these RDBMS; however, each has its own set of proprietary statements and extensions.
In a Nutshell
• SQL stands for Structured Query Language
• SQL is an ANSI standard computer language
• SQL allows you to access a database
• SQL allows you to execute queries against a database
• SQL allows you to retrieve data from a database
• SQL allows you to insert new records in a database
• SQL allows you to delete records from a database
• SQL allows you to update records in a database
SQL Language Elements
The SQL language is sub-divided into several language elements, including:
• Statements, which may have a persistent effect on schemas and data, or which may control transactions, program flow, connections, sessions, or diagnostics.
• Queries, which retrieve data based on specific criteria.
• Expressions, which can produce either scalar values or tables consisting of columns and rows of data.
• Predicates, which specify conditions that can be evaluated to SQL three-valued logic (3VL) Boolean truth values and which are used to limit the effects of statements and queries, or to change program flow.
• Clauses, which are (in some cases optional) constituent components of statements and queries.
Whitespace is generally ignored in SQL statements and queries, making it easier to format SQL code for readability.
SQL statements also include the semicolon (";") statement terminator. Though not required on every platform, it is defined as a standard part of the SQL grammar.

5.1 Data Definition

SQL Data Definition Language (DDL)
The Data Definition Language (DDL) part of SQL permits database tables to be created or deleted. We can also define indexes (keys), specify links between tables, and impose constraints between database tables.
The most important DDL statements in SQL are:
• CREATE TABLE - creates a new database table
• ALTER TABLE - alters (changes) a database table
• DROP TABLE - deletes a database table
• CREATE INDEX - creates an index (search key)
• DROP INDEX - deletes an index
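A short, hypothetical illustration of these statements (the STUDENT table is not from the text, and DROP INDEX syntax varies slightly between products):

CREATE TABLE STUDENT (
    SNO   INT         PRIMARY KEY,
    SNAME VARCHAR(50) NOT NULL,
    DEPT  VARCHAR(20)
);

ALTER TABLE STUDENT ADD EMAIL VARCHAR(80);

CREATE INDEX STUDENT_DEPT_IDX ON STUDENT (DEPT);

DROP INDEX STUDENT_DEPT_IDX;   -- some products require ...ON STUDENT
DROP TABLE STUDENT;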

5.2 Data Manipulation Statements

SQL Data Manipulation Language (DML)
SQL (Structured Query Language) provides syntax for executing queries. The SQL language also includes syntax to update, insert, and delete records. These query and update commands together form the Data Manipulation Language (DML) part of SQL:
• SELECT - extracts data from a database table
• UPDATE - updates data in a database table
• DELETE - deletes data from a database table
• INSERT INTO - inserts new data into a database table
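Continuing the hypothetical STUDENT table from the previous section, the four DML statements look like this:

INSERT INTO STUDENT (SNO, SNAME, DEPT) VALUES (1, 'ANITA', 'CSE');

SELECT SNO, SNAME FROM STUDENT WHERE DEPT = 'CSE';

UPDATE STUDENT SET DEPT = 'IT' WHERE SNO = 1;

DELETE FROM STUDENT WHERE SNO = 1;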
SQL Data Control Language (DCL)
DCL handles the authorization aspects of data and permits the user to control who has access to see or manipulate data within the database. Its two main keywords are:
• GRANT - authorizes one or more users to perform an operation or a set of operations on an object.
• REVOKE - removes or restricts the capability of a user to perform an operation or a set of operations.
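For example (the user name CLERK1 and the STUDENT table are hypothetical):

GRANT SELECT, INSERT ON STUDENT TO CLERK1;

REVOKE INSERT ON STUDENT FROM CLERK1;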
Transaction Controls:
Transactions, if available, can be used to wrap around the DML operations:
• BEGIN WORK (or START TRANSACTION, depending on the SQL dialect) can be used to mark the start of a database transaction, which either completes entirely or not at all.
• COMMIT causes all data changes in a transaction to be made permanent.
• ROLLBACK causes all data changes since the last COMMIT or ROLLBACK to be discarded, so that the state of the data is "rolled back" to the way it was prior to those changes being requested.
COMMIT and ROLLBACK interact with areas such as transaction control and locking. Strictly, both terminate any open transaction and release any locks held on data. In the absence of a BEGIN WORK or similar statement, the semantics of SQL are implementation-dependent.
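A minimal sketch of a transaction (the ACCOUNT table and account numbers are hypothetical); both updates become permanent together, or neither does:

START TRANSACTION;   -- BEGIN WORK in some dialects

UPDATE ACCOUNT SET BALANCE = BALANCE - 500 WHERE ACCOUNT_NO = 101;
UPDATE ACCOUNT SET BALANCE = BALANCE + 500 WHERE ACCOUNT_NO = 202;

COMMIT;              -- or ROLLBACK to discard both updates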
Queries
The most common operation in SQL databases is the
query, which is performed with the declarative SELECT keyword.
SELECT retrieves data from a specified table, or multiple related
tables, in a database. While often grouped with Data Manipulation Language (DML) statements, the standard SELECT query is considered separate from SQL DML, as it has no persistent effects on the data stored in a database. Note that there are some platform-specific variations of SELECT that can persist their effects in a database, such as Microsoft SQL Server's proprietary SELECT INTO syntax.

SQL queries allow the user to specify a description of the


desired result set, but it is ieft to the devices of ihe Database
Management System (DBMS) to plan, optimize, and perform the
physical operations necessary to produce that result set in as
efficient a manner as possible. An SQL query includes a list of
columns to be included in the final result immediately following
the SELECT keyword. An asterisk ( ") can also be used as a
"wildcard" indicator to specify that all available columns of a
table (or multiple tables) are to be returned. SELECT is the most
complex statement in SQL, with several optional keywords and
clauses, including:
• The FROM clause indicates the source table or tables from which the data is to be retrieved. The FROM clause can include optional JOIN clauses to join related tables to one another based on user-specified criteria.
• The WHERE clause includes a comparison predicate, which is used to restrict the number of rows returned by the query. The WHERE clause is applied before the GROUP BY clause, and it eliminates all rows from the result set where the comparison predicate does not evaluate to True.
• The GROUP BY clause is used to combine, or group, rows with related values into elements of a smaller set of rows. GROUP BY is often used in conjunction with SQL aggregate functions or to eliminate duplicate rows from a result set.
• The HAVING clause includes a comparison predicate used to eliminate rows after the GROUP BY clause is applied to the result set. Because it acts on the results of the GROUP BY clause, aggregate functions can be used in the HAVING clause predicate.
• The ORDER BY clause identifies which columns are used to sort the resulting data, and in which order they should be sorted (ascending or descending). The order of rows returned by an SQL query is never guaranteed unless an ORDER BY clause is specified. A worked query combining these clauses is sketched below.
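Putting these clauses together, a query against a hypothetical Orders table (with assumed columns cust_id, amount and order_date) might look like the following; the table and column names are illustrative only:

    SELECT   cust_id, SUM(amount) AS total_amount
    FROM     Orders
    WHERE    order_date >= '2010-01-01'   -- restrict rows before grouping
    GROUP BY cust_id                      -- one result row per customer
    HAVING   SUM(amount) > 10000          -- keep only the larger customers
    ORDER BY total_amount DESC;           -- sort the final result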
5.3 Distributed Database Architecture
Introduction:
The widespread use of computers for data processing in
large distributed organizations means that such organizations
often store their data at different sites of a computer network,
possibly in a variety of forms, ranging from flat files, to
hierarchical or relational databases, or object-relational
databases or object-oriented databases. The rapid growth of the Internet is causing an even greater explosion in the availability
of distributed information sources. Distributed Database
Technology aims to provide uniform access to physically
distributed but logically related information sources.
Centralized vs Distributed Databases:

A centralized database management system consists of a database, which is held on disk. Users access the database by
submitting queries or transactions to the DBMS. Two major

components of any DBMS are the query processor and the
transaction manager.
The query processor translates queries into a sequence
of retrieval requests on the stored data. There may be many
alternative translations for a given query, which are known as
query plans. The task of selecting a good query plan is known
as query optimization. A good query plan is one that has a
relatively low cost of execution compared with the alternative
query plans. A transaction is a sequence of queries and/or
updates. The transaction manager coordinates concurrently
executing transactions so as to guarantee the so-called ACID
properties:

• Atomicity. Either all or none of a transaction is executed.
• Consistency. Transactions must leave the data in a consistent state, that is, satisfying all the stated integrity constraints.
• Isolation. It must appear to users that transactions are being executed one after the other, even though they may be interleaved.
• Durability. If a transaction has committed, its effects must not be lost.
The two mechanisms by which a transaction manager guarantees the ACID properties are concurrency control, which
ensures consistency and isolation, and recovery, which ensures
atomicity and durability.
A Distributed Database system consists of several
databases stored at different sites of a computer network. The
data at each site are managed by a database server running
some DBMS software. The servers can cooperate in executing
global queries and global transactions, that is, queries and
transactions whose processing may require access to
databases stored at different sites. There are a number of

alternative architectures for Distributed Database systems. To
improve the performance of global queries in distributed
databases, data items can be split into fragments that can be
stored at sites requiring frequent access to them. Data items or
fragments of data items can also be replicated across more than
one site. Techniques are therefore needed for deciding the best
way to fragment and replicate the data to optimize the
performance of applications.
A key difference between processing global queries in a
Distributed Database system and processing queries in a
centralized database system is that distributed database queries
may require data to be transmitted over the network. Thus, new
query-processing algorithms are needed that include data
transmission as an explicit part of their processing. Also, the
global query optimizer needs to take data transmission costs into
account when generating and evaluating alternative query plans.
A key difference between global transactions in a
Distributed Database system and transactions in a centralized
database system is that global transactions are divided into a
number of sub-transactions. Each sub-transaction is executed
by a single database server, which guarantees its ACID
properties. However, an extra level of coordination of the sub-
transactions is needed to guarantee that the overall global
transactions also exhibit the ACID properties.
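This extra coordination is commonly realized by a two-phase commit, in which every server first prepares its sub-transaction and only then is told to commit. A hedged sketch using PostgreSQL-style two-phase commit commands (other DBMSs expose this capability differently, often only through the transaction manager):

    -- at each participating site, run the sub-transaction and prepare it
    BEGIN;
    UPDATE Accounts SET balance = balance - 500 WHERE acc_no = 'A-101';
    PREPARE TRANSACTION 'global_tx_42';
    -- once every site has prepared successfully, the coordinator instructs each site:
    COMMIT PREPARED 'global_tx_42';
    -- had any site failed to prepare, the coordinator would instead issue:
    -- ROLLBACK PREPARED 'global_tx_42';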

5.4 Distributed Database Architecture


There are a number of alternative architectures for
Distributed Database systems.
A Distributed Database system consists of several
databases distributed over several sites. Access to the
databases can be controlled by a single DBMS process.
Alternatively, there may be several independent DBMS
processes, each controlling access to its own local database.

[Figure: alternative Distributed Database architectures - unfederated and federated multi-DBMSs, with a single or multiple federated schemas.]

There are two variants of the multi-DBMS architecture, depending on the amount of autonomy of each of the participating DBMS processes. In an unfederated multi-DBMS, a single database administration authority decides what information is stored in each database, how the information is stored and accessed, and who is able to access it. In contrast, a federated multi-DBMS separates the database administration authority between the database administrators (DBAs) of each local database (the local DBAs) and the DBAs for the overall federation (the global DBAs). The local DBAs have complete authority over the information in their databases, and what part of that information is made available to the federation, that is, to global queries and transactions. This information is represented in the form of one or more export schemas for each local database. The global DBAs control global access to the system, but must accept the access restrictions imposed by the local DBAs.
A federated DDB is said to be tightly coupled if the global
DBAs maintain one or more global schemas that provide an

integrated view through which global queries and transactions
can access the information stored in the local databases. A
federated DDB is loosely coupled if there is no global schema
provided by a global DBA, and it is the users' responsibility to define the global schemas they require to support their applications. This chapter concentrates on tightly coupled DDBs, which present the extra difficulty of having to provide an
integrated view of the information stored in the local databases.
The presence of a single database administration authority in an
unfederated multi-DBMS makes it likely that the multi-DBMS will
be a homogeneous one, both physically and semantically.
Physical homogeneity means that the local databases are all
managed by the same type of DBMS, supporting the same data
model, Data Definition Language (DDL)/ Data Manipulation
Language (DML), query processing, transaction management,
and so forth. Semantic homogeneity means that different local
databases store any information they have in common in a
consistent manner, so that integration of the information does not
require it to be transformed in any way. In contrast, the presence
of multiple database administration authorities in a federated
multi-DBMS makes it likely that it will be heterogeneous. The
heterogeneity may be physical, semantic, or both. Physical
heterogeneity means that different local DBs may be managed by different types of DBMSs, for example, different products or different versions of one product. Thus, the local conceptual schemas may be defined in different data models (e.g., network, hierarchical, relational, object-oriented), the DDL/DML supported by local databases may be different (e.g., network or hierarchical, different versions of Structured Query Language (SQL), Object Query Language (OQL)), the query processors may use different algorithms and cost models, the transaction managers may support different concurrency control and recovery mechanisms, and so forth. Semantic heterogeneity means that different local databases may model the same information using different schema constructs
or may use the same schema construct to model different
information. For example, people's names may be stored using different string lengths, or a relation named student in one database may contain only undergraduate students while a relation named student in another database contains both undergraduate and postgraduate students. If there is semantic heterogeneity in a multi-DBMS, it is necessary to perform semantic integration of the export schemas. That requires the export schemas to be transformed so as to eliminate any inconsistencies between them.
A heterogeneous multi-DBMS must integrate the export
schemas of the local databases into one or more global
schemas, which provide an integrated view through which
global queries and transactions can access the federation. This
view must be constructed while preserving the autonomy of the
local databases, that is, leaving control of them in the hands
of the local DBAs. The following types of schemas are
addressed in a heterogeneous DDB system.
• A local schema for each local database. The local schema is the conceptual schema of the local database. Each local database continues to operate as an autonomous entity, and the content of its local schema is under the control of its local DBAs. Each local database will also have a physical schema and possibly a number of external schemas that are views of its local schema. However, those schemas are not considered to be part of the heterogeneous multi-DBMS architecture.
• A component schema corresponding to each local schema. The local databases may support different data models and different DDL/DMLs. Thus, the local schemas have to be translated into some common data model (CDM) before they can be integrated.
• One or more export schemas corresponding to each
component schema. Each export schema is a view over
the component schema that the local DBAs want to make
available to the federation. The export schemas define
what part of the locally held information can be accessed
by global queries and transactions.

• One or more global schemas. Each global schema is obtained by integrating one or more of the export schemas into a single schema. A global schema can be regarded as a conceptual schema for the heterogeneous DDB. However, in contrast to a centralized database, it may not be possible or desirable to create a single global schema that encompasses all the export schemas. For example, it may not be possible to resolve some of the semantic heterogeneities between some of the export schemas, or different application domains may require access to different parts of the federation.

• A number of external schemas. Each external schema is a view over one global schema and contains information that a user needs for a specific application.
Distributed Data Independence:
Distributed data independence means that changing the
physical location of data items in a DDB should not require
application code to be altered. One way of achieving distributed
data independence is to maintain a global catalog that describes
all the data items stored at every site, associating both a logical
name and a physical name with each one. The disadvantage of
this approach is that the site where the catalog resides becomes
a bottleneck for network traffic as well as a single point of failure
for the DDB. The problem can be overcome by replicating the global catalog at multiple sites. But that makes updating the catalog complex because all its distributed replicas have to be updated to reflect the change before any of them can be used
again. The commonly adopted solution to those problems is for
the distributed DBMS to maintain a distributed catalog. With that
approach, each site maintains its own local catalog of all the
data items stored at that site. Each data item recorded in the
local catalog has both a local name and a global name. The
global name identifies the site or sites whose catalogs contain
full information regarding that data item, for example, its
definition, its physical location(s), the integrity constraints it must
satisfy, and user access rights to it. Applications running at each
site use their own local names for data items. The DBMS
handles the translation of local names to global names. A change in the physical location of some data item does not require applications to be altered, and they can continue to use
just their local name for the data item.
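A minimal sketch of how a local catalog might record the local-to-global name mapping, using a hypothetical table layout purely for illustration (real distributed DBMSs keep this information in their own system catalogs):

    CREATE TABLE local_catalog (
        local_name  VARCHAR(64)  PRIMARY KEY,  -- name used by applications at this site
        global_name VARCHAR(128) NOT NULL,     -- system-wide identifier of the data item
        home_site   VARCHAR(64)  NOT NULL      -- site holding full information about the item
    );
    INSERT INTO local_catalog VALUES ('emp', 'hq_site.personnel.employee', 'hq_site');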

5.5 Distributed Database Design


The two main approaches to designing DDBs are bottom
up and top down. With bottom-up DDB design, the local DBs
already exist and their export schemas need to be integrated
into one or more global schemas. With top-down design, there
are no preexisting local DBs. The global schema is first
designed, taking as input the requirements that all potential
users will have of the new DDB system. The design of the local
DBs then follows. The main challenge with top-down design is
how to derive the local conceptual schemas from the global
schema, that is, how to allocate the information in the global
schema across the local DBs. Top-down design is most likely to
occur in homogeneous multi-DBMS or single-DBMS
architectures. The top-down design of a relational DDB is
concerned with how relations should be fragmented and
replicated across the local DBs.
Bottom-Up Design of Heterogeneous DDBs:
With bottom-up design of heterogeneous DDBs, the local
DBs already exist and the main challenge is one of schema
integration, that is, how to integrate a set of export schemas into a
single global schema. Because the local DBs typically will have been
designed by different people at different times, conflicts are likely to
exist between the export schemas. For example, different export
schemas may model the same real-world concept using different
constructs, or they may model different real-world concepts
using the same construct. Such conflicts must be removed by
transforming the export schemas to produce equivalent schemas,
which can then be integrated. The schema integration process thus
consists of three main tasks:

• Schema conformance, during which conflict detection and conflict resolution are performed, resulting in new versions of the export schemas that represent the same concepts in the same manner.
• Schema merging, during which related concepts in the export schemas are identified and a single global schema is produced.
• Schema improvement, during which the quality of the
global schema is improved, in particular, removal of
redundant information so that the resulting schema is the
minimum union of the export schemas.
Top-down design is a suitable approach when a database
system is being designed from scratch. If a number of databases
already exist, and the design task involves integrating them into
one database - the bottom-up approach is suitable for this type
of environment. The starting point of bottom-up design is the
individual local conceptual schemas. The process consists of
integrating local schemas into the global conceptual schema.

Data Distribution:
The unit of distribution can be entire tables or subsets of records. Fragmentation and replication are the two techniques through which data is stored in a distributed environment. Fragmentation permits parallel execution of a single query by dividing it into a set of subqueries that operate on different fragments. It typically increases the level of concurrency and therefore the system throughput, although there may be some performance degradation when several fragments have to be recombined (through joins and unions). A small SQL sketch of horizontal and vertical fragments follows the list below.
Four alternatives for Fragmentation:
• Horizontal fragmentation
• Vertical fragmentation
• Hybrid (e.g., vertical-horizontal) fragmentation
• Derived horizontal fragmentation
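As an illustration, assuming a hypothetical Employee table with columns emp_id, name, salary and branch, horizontal and vertical fragments could be defined roughly as follows in a dialect that supports CREATE TABLE ... AS (in practice the distributed DBMS defines and places the fragments itself):

    -- horizontal fragmentation: rows split by branch
    CREATE TABLE Employee_North AS SELECT * FROM Employee WHERE branch = 'North';
    CREATE TABLE Employee_South AS SELECT * FROM Employee WHERE branch = 'South';

    -- vertical fragmentation: columns split, with the key repeated in each fragment
    CREATE TABLE Employee_Pay     AS SELECT emp_id, salary FROM Employee;
    CREATE TABLE Employee_Profile AS SELECT emp_id, name, branch FROM Employee;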

6.0 OBJECT ORIENTED SYSTEM


Introduction:
An object database management system (ODBMS, also
referred to as object-oriented database management system or
OODBMS), is a database management system (DBMS) that supports the modeling and creation of data as objects. This
includes some kind of support for classes of objects and the
inheritance of class properties and methods by subclasses and
their objects.
Object Oriented databases may be characterized quite
simply; they typically represent database systems that integrate
tightly with a language with object-oriented features such as
C++, Java or Smalltalk, that allow programs to link data
structures to databases in such a way that the data structures
trivially become "persistent".
Thus, once you tie a data structure to the database, you
no longer need to be concerned about whether a piece of data
is in memory or not; as soon as your program refers to it, the
data will be pulled into memory if it is not already there. As soon as your program updates the data, it will be updated into the
database.
An object-oriented database system must satisfy two
criteria: it should be a DBMS, and it should be an object-oriented system, i.e., to the extent possible, it should be consistent with the current crop of object-oriented programming languages. The first criterion translates into five features: persistence, secondary storage management, concurrency, recovery and an ad hoc query facility. The second one translates into eight
features: complex objects, object identity, encapsulation, types
or classes, inheritance, overriding combined with late binding,
extensibility and computational completeness.
There is currently no widely agreed-upon standard for
what constitutes an ODBMS and no standard query language to
ODBMS equivalent to what SQL is to RDBMS (relational DBMS). Initiatives by an industry group, the Object Data Management Group (ODMG), to create a standardized Object Query Language (OQL) were abandoned in 2001.
ODBMSs were originally thought of as a replacement for RDBMSs because of their better fit with object-oriented programming languages. However, high switching costs, the inclusion of object-oriented features in RDBMSs to make them ORDBMSs, and the emergence of object-relational mappers (ORMs) have allowed RDBMSs to successfully defend their dominance in the data
center for server-side persistence.
Object databases are now established as a complement,
not a replacement for relational databases. They found their
place as embeddable persistence solutions in devices, on
clients, in packaged software, in real-time control systems, and
to power websites. The open source community has created a
new wave of enthusiasm that's now fueling the rapid growth of
ODBMS installations.
6.1 Object-Oriented DBMS
Object-Oriented Concepts:
Object:
An object is an abstract representation of a real-world entity that has a unique identity, embedded properties, and the ability to interact with other objects and itself. It can also be defined as an entity that has a well-defined role in the application domain, as
well as state, behavior, and identity. Objects exhibit behavior.
Behavior represents how an object acts and reacts. Behavior is
expressed through operations that can be performed on it.
Object ldentifier:
• An object ID (OID) represents the object's identity, which is unique to that object.
• The OID is assigned by the system at the moment of the object's creation and cannot be changed under any circumstance.
• The OID can be deleted only if the object is deleted, and that OID can never be reused.
Attributes:
• Objects are described by their attributes, known as
instance variables.

• Attributes have a domain. The domain logically groups and describes the set of all possible values that an attribute can have.
• An attribute can be single-valued or multivalued.
• Attributes may reference one or more other objects.

Object State:
• The object state is the set of values that the object's attributes have at a given time.
• If we change the object's state, we must change the values of the object attributes.
• To change the object's attribute values, we must send a message to the object. This message invokes a method.
Messages and Methods:
• Every operation performed on an object must be
implemented by a method.
• Methods represent real-world actions and are equivalent
to procedures in traditional programming languages.
• Every method is identified by a name and has a body.
• The body is composed of computer instructions written in
some programming language to represent a real-world
action.
• To invoke a method you send a message to the object.
• A message is sent by specifying a receiver object, the
name of the method, and any required parameters.
• The internal structure of the object cannot be accessed
directly by the message sender. The ability to hide the
object’s internal details (attributes and methods) is known as
encapsulation.
• An object may send messages to change or interrogate
another object's state.

Classes:
• Objects that share common characteristics are grouped
into classes. A class is a collection of similar objects with
shared structure (attributes) and behavior (methods).
• Each object in a class is known as a class instance or
object instance.
Protocol:
• The class’s collection of messages, each identified by a
message name, constitutes the object or class protocol.
• The protocol represents an object’s public aspect; i.e., it
is known by other objects as well as end users.
• The implementation of the object’s structure and methods
constitutes the object’s private aspect.
• A message can be sent to an object instance or the
class. When the receiver object is a class, the message
will invoke a class method.
Superclasses, Subclasses, and Inheritance:
Classes are organized into a class hierarchy.
Example: Musical instrument class hierarchy
Piano, Violin, and Guitar are subclasses of Stringed instruments,
which is, in turn, a subclass of Musical instruments. Musical
instruments defines the superclass of Stringed instruments,
which is, in turn, the superclass of the Piano, Violin, and Guitar
classes. Inheritance is the ability of an object within the
hierarchy to inherit the data structure and behavior (methods) of
the classes above it.
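In an OO language this hierarchy would be declared through subclassing; as a hedged sketch, PostgreSQL-style table inheritance can mirror the same idea in SQL (the column names are invented for illustration):

    CREATE TABLE musical_instruments (name TEXT, family TEXT);
    -- Stringed instruments inherit the columns of musical_instruments
    CREATE TABLE stringed_instruments (num_strings INT) INHERITS (musical_instruments);
    -- Piano, Violin and Guitar inherit from stringed_instruments in turn
    CREATE TABLE pianos  (num_keys INT)        INHERITS (stringed_instruments);
    CREATE TABLE violins (has_bow BOOLEAN)     INHERITS (stringed_instruments);
    CREATE TABLE guitars (is_electric BOOLEAN) INHERITS (stringed_instruments);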
Characteristics of an OO Data Model:
• Support the representation of complex objects.
• Be extensible; i.e., it must be capable of defining new
data types as well as the operations to be performed on
them.

• Support encapsulation; i.e., the data representation and the method's implementation must be hidden from external entities.
• Exhibit inheritance; an object must be able to inherit the properties (data and methods) of other objects.
• Support the notion of object identity (OID).
• The OODM models real-world entities as objects.
• Each object is composed of attributes and a set of methods.
• Each attribute can reference another object or a set of objects.
• The attributes and the method implementations are hidden, or encapsulated, from other objects.
• Each object is identified by a unique object ID (OID), which is independent of the value of its attributes.
• Similar objects are described and grouped in a class that contains the description of the data and the method's implementation.
• The class describes a type of object.
• Classes are organized in a class hierarchy.
• Each object of a class inherits all properties of its superclasses in the class hierarchy.
Object-Oriented Data Modeling:
• Centered around objects and classes
• Involves inheritance
• Encapsulates both data and behavior
Benefits of Object-Oriented Modeling:
• Ability to tackle challenging problems
• Improved communication between users, analysts,
designers, and programmers
• Increased consistency in analysis and design
• Explicit representation of commonality among system
components

• System robustness
• Reusability of analysis, design, and programming results
• Object-oriented modeling is frequently accomplished
using the Unified Modeling Language (UML)

6.2 Comparison of RDBMS and OODBMS


OOP Features:
• Complex Objects
• Object Identity
• Methods & Messages
• Inheritance
• Polymorphism
• Extensibility
• Computational Completeness
DBMS Features:
• Persistence
• Disk Management
• Data Sharing
• Concurrency
• Reliability
• Security
• Ad Hoc Querying

WHAT AN OODBMS SHOULD SUPPORT:
• Atomic and Complex Objects
• Methods and Messages
• Object Identity
• Single Inheritance
• Polymorphism - Overloading and Late-binding
• Persistence
• Shared Objects
In addition an OODBMS can optionally support the following:
• Multiple Inheritance
• Exception Messages
• Distribution
• Long Transactions
• Versions
Characteristics That ‘Must Be’ Supported by an OODBMS
As Specified By The OO Database Manifesto:
• Complex Objects
• Object Identity
• Encapsulation
• Classes
• Inheritance
• Overriding and Late-binding
• Extensibility
• Computational Completeness
• Persistence
• Concurrency
• Recovery
• Ad-hoc querying
Advantages of OODBMS:
• Enriched modelling capabilities
• Extensibility
• Removal of Impedance Mismatch
• Support for schema evolution.

• Support for long duration transactions.
• Applicable for advanced database applications
• Improved performance.
Applications of OODBMS:
• Computer-Aided Design (CAD).
• Computer-Aided Manufacturing (CAM).
• Computer-Aided Software Engineering (CASE).
• Office Information Systems (OIS).
• Multimedia Systems.
• Digital Publishing.
• Geographic Information Systems (GIS).
• Scientific and Medical Systems.
Disadvantages of OODBMS:
• Lack of a universal data model
• Lack of experience
• Lack of standards.
• Ad-hoc querying compromises encapsulation.
• Locking at object-level impacts performance
• Complexity
• Lack of support for views
• Lack of support for security

Object Relational DBMS:
Object-Relational databases extend the Relational Data
Model to address those weaknesses identified previously. An
Object-Relational database adds features associated with
object-oriented systems to the Relational Data Model. In
essence ORDBMSs are an attempt to add OO to Tables.
Major Difference Between an OODBMS and an ORDBMS:
OODBMSs try to add DBMS functionality to one or more
OO programming languages. [Revolutionary in that they
abandon SQL]

ORDBMSs try to add richer data types and OO features to a relational DBMS
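As a sketch of "adding OO to tables", assuming PostgreSQL-style object-relational syntax, a user-defined composite type can be used as the type of a column, bringing richer data types into an ordinary relational table:

    -- a user-defined (composite) type, something the plain relational model lacks
    CREATE TYPE address AS (
        street  TEXT,
        city    TEXT,
        zipcode TEXT
    );
    -- a table whose column is of that richer type
    CREATE TABLE customers (
        cust_id   INT PRIMARY KEY,
        name      TEXT,
        home_addr address
    );
    -- querying a component of the structured column
    SELECT name, (home_addr).city FROM customers;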

7.0 Client Server Computing


The introduction of large numbers of PCs and networks
into the workplace led to the development and widespread
adoption of client-server systems in the 1990s. This
development has been significant to the growth of distributed
databases. Distributed databases require processing power at
each site where data is physically located. Processing power is
also usually required at each individual workstation to resolve
the complex issues of where and how data should be stored and
retrieved in a distributed database environment. Client- server
architectures provide processing power at all locations. In

a traditional mainframe architecture, the combination of
processing power and data storage is located at only one site, so implementing a distributed database is not possible.
Although client-server systems are usually identified with
distributed data storage, there is no requirement for data
storage to be distributed in client-server environments - data
may be centralized on one mainframe, distributed widely
throughout the organization, or anything in between. Many
organizations have seen the ability to move to client-server as
an opportunity to replace their expensive mainframe data
centers with less expensive minicomputers and
microcomputers. Such a strategy has come to be known as
downsizing or rightsizing.
“Client/server systems operate in a networked
environment, splitting the processing of an application between
a front-end client and a back-end processor.“
• Client and server may reside on same computer
• Both are intelligent and programmable
Application Logic Components:
• Presentation logic
Input
Output
• Processing logic
I/O processing
Business rules
Data management
• Storage logic
Data storage and retrieval
DBMS functions
File Server Architecture:
• “A file server is a device that manages file operations and
is shared by each of the client PCs."
• Fat client: does most processing

Limitations:
• Whole file or table transferred to client
• Client must have full version of DBMS
• Each client DBMS must manage database integrity
Database Server Architecture:
• Client workstation:
user interface, presentation logic, data processing
logic, business rules logic
• Database server:
database storage, access, and processing
Advantages: less traffic, more control over data
• Stored procedures: first use of business logic at the database server (a sketch follows this list)
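A hedged sketch of such a stored procedure, written in generic SQL/PSM-style syntax (the procedural dialect differs between products, e.g. PL/SQL or T-SQL), with the table and rule invented for illustration:

    CREATE PROCEDURE apply_raise (IN p_emp_id INT, IN p_percent DECIMAL(5,2))
    BEGIN
        -- business rule kept at the server: cap any raise at 10 percent
        IF p_percent > 10 THEN
            SET p_percent = 10;
        END IF;
        UPDATE Employees
        SET    salary = salary * (1 + p_percent / 100)
        WHERE  emp_id = p_emp_id;
    END;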
Three-Tier Architectures:
• Application server in addition to client and database
server
• Thin clients: do less processing
• Application server contains “standard” programs
• Benefits:
scalability
technological flexibility
lower long-term costs
better match business needs
improved customer service
competitive advantage
reduced risk
Characteristics of a Client
• Request sender is known as client
• Initiates requests
• Waits for and receives replies.
• Usually connects to a small number of servers at one
time

• Typically interacts directly with end-users using a
graphical user interface
Characteristics of a Server
• Receiver of request which is sent by client is known as
server
• Passive (slave)
• Waits for requests from clients
• Upon receipt of requests, processes them and then
serves replies
• Usually accepts connections from a large number of
clients
• Typically does not interact directly with end-users
Examples of Client Server Databases: Oracle, Sybase, SQL Server, Informix, etc.
7.1 Knowledge based Management Systems
Commercial relational DBMSs are tailored to efficiently
support fixed format data models in what is known as data
management. Nevertheless, the upcoming demands in data analysis are pushing the technological frontier so that two further dimensions are supported by such systems: object management and knowledge management.
7.2 Definition and importance of Knowledge
Knowledge:
Knowledge is defined variously as:
i. Expertise and skills acquired by a person through experience or education; the theoretical or practical understanding of a subject
ii. What is known in a particular field or in total; facts and
information
iii. Awareness or familiarity gained by experience of a fact or
situation.
There is however no single agreed definition of
knowledge presently, nor any prospect of one, and there remain
numerous competing theories.

Knowledge acquisition involves complex cognitive
processes like perception, learning, communication, association
and reasoning. The term knowledge is also used to mean the
confident understanding of a subject with the ability to use it for a specific purpose.
7.3 Difference of KBMS and DBMS
Knowledge Based Systems:
Knowledge-based expert systems, or simply expert
systems, use human knowledge to solve problems that normally
would require human intelligence. These expert systems
represent the expertise knowledge as data or rules within the
computer. These rules and data can be called upon when
needed to solve problems. Books and manuals have a
tremendous amount of knowledge but a human has to read and
interpret the knowledge for it to be used. Conventional computer
programs perform tasks using conventional decision-making
logic containing little knowledge other than the basic algorithm for
solving that specific problem and the necessary boundary
conditions. This program knowledge is often embedded as part
of the programming code, so that as the knowledge changes, the
program has to be changed and then rebuilt. Knowledge- based
systems collect the small fragments of human know-how into a
knowledge-base which is used to reason through a problem,
using the knowledge that is appropriate. A different problem,
within the domain of the knowledge-base, can be solved using
the same program without reprogramming. The ability of these
systems to explain the reasoning process through back-traces
and to handle levels of confidence and uncertainty provides an
additional feature that conventional programming doesn’t
handle.
A knowledge base is a special kind of database for
knowledge management. It provides the means for the
computerized collection, organization, and retrieval of
knowledge. An active area of research in artificial intelligence is

knowledge representation. Early work in Artificial Intelligence (AI) focused on techniques such as representation and problem-solving, and scant attention was paid to the issues on which database (DB) research has focused (e.g., data sharing, query optimization, transaction processing). Knowledge base systems, also known as expert systems, are a facet of Artificial Intelligence (AI). AI is a sub-field of computer science that focuses on the development of intelligent software and hardware systems that emulate human reasoning techniques and capabilities. Knowledge base systems emulate the decision-making processes of humans and are one of the most commercially successful AI technologies. These systems are used in a variety of applications for business, science and engineering. Business applications capture a company's critical business knowledge and utilize it for decision support.
Knowledge management entails the ability to store "rules" (as defined in First Order Logic) that are part of the semantics of an application. These rules allow the derivation of data that is not directly stored in the database. A number of application domains would benefit from knowledge management capabilities, and, therefore, a simple, powerful, and efficient mechanism to add the knowledge dimension to an off-the-shelf DBMS can be rather useful.
Formally, a knowledge-base management system (KBMS) is a system that:
• Provides support for efficient access, transaction management, and all other functionalities associated with DBMSs.
• Provides a single, declarative language to serve the roles played by both the data manipulation language and the host language in a DBMS.

