
Database

Management Systems

Dr. S. Muthukumar, M.E., Ph.D.,


Associate Professor & Head
Department of Computer Science & Engineering
Sree Sowdambika College of Engineering
Aruppukottai, Virudhunagar - 626134.

P. Kavitha Pandian, M.E.,


Associate Professor
Department of Computer Science & Engineering
Sree Sowdambika College of Engineering
Aruppukottai, Virudhunagar - 626134.

For online purchase


www.charulathapublications.com
March 2024

© Charulatha Publications

Price : Rs.375/-

ISBN No. : 978-93-6260-206-0

CHARULATHA PUBLICATIONS
38/7, Rukmani Street,
West Mambalam, Chennai - 600 033.
Mobile : 98404 28577
Email : [email protected]
web : www.charulathapublication.com
CS3492 - DATABASE MANAGEMENT SYSTEMS
SYLLABUS

UNIT I RELATIONAL DATABASES 10

Purpose of Database System – Views of data – Data Models – Database System Architecture –
Introduction to relational databases – Relational Model – Keys – Relational Algebra – SQL
fundamentals – Advanced SQL features – Embedded SQL– Dynamic SQL

UNIT II DATABASE DESIGN 8

Entity-Relationship model – E-R Diagrams – Enhanced-ER Model – ER-to-Relational Mapping


– Functional Dependencies – Non-loss Decomposition – First, Second, Third Normal Forms,
Dependency Preservation – Boyce/Codd Normal Form – Multi-valued Dependencies and
Fourth Normal Form – Join Dependencies and Fifth Normal Form

UNIT III TRANSACTIONS 9

Transaction Concepts – ACID Properties – Schedules – Serializability – Transaction support in


SQL – Need for Concurrency – Concurrency control –Two Phase Locking- Timestamp –
Multiversion – Validation and Snapshot isolation– Multiple Granularity locking – Deadlock
Handling – Recovery Concepts – Recovery based on deferred and immediate update – Shadow
paging – ARIES Algorithm

UNIT IV IMPLEMENTATION TECHNIQUES 9

RAID – File Organization – Organization of Records in Files – Data dictionary Storage –


Column Oriented Storage– Indexing and Hashing –Ordered Indices – B+ tree Index Files – B
tree Index Files – Static Hashing – Dynamic Hashing – Query Processing Overview –
Algorithms for Selection, Sorting and join operations – Query optimization using
Heuristics - Cost Estimation.

UNIT V ADVANCED TOPICS 9

Distributed Databases: Architecture, Data Storage, Transaction Processing, Query processing


and optimization – NOSQL Databases: Introduction – CAP Theorem – Document Based
systems – Key value Stores – Column Based Systems – Graph Databases. Database Security:
Security issues – Access control based on privileges – Role Based access control – SQL
Injection – Statistical Database security – Flow control – Encryption and Public Key
infrastructures – Challenges.
CHAPTER – I
INTRODUCTION TO DBMS

1.1. INTRODUCTION

Data

These are simply raw facts which do not carry any specific meaning. When data are
processed, interpreted, organized, structured or presented in a proper way to make them
meaningful or useful, they are called information.

The number of visitors to a website by country is an example of data. Finding out that
traffic from the U.S. is increasing while that from Australia is decreasing is meaningful
information.

Database

It is a logically coherent collection of data with some inherent meaning, representing some
aspect of the real world. It is designed, built and populated with data for a specific purpose.

E.g.: register number, name, address, marks

Database management system (DBMS)

It is a collection of interrelated data and a set of programs to access those data. The goal
of DBMS is to provide an environment that is both convenient and efficient to use.

MySQL, SQL Server, MongoDB, Oracle Database, PostgreSQL, Informix, Sybase, etc.
are all examples of different databases. These modern databases are managed by DBMS.

1.2. PURPOSE OF DATABASE SYSTEMS

Previously, the applications were built directly on top of the file system. Keeping
organizational information in a file-processing system has a number of disadvantages as listed
below:

• Data redundancy and inconsistency

• Difficulty in accessing data

• Data isolation

• Integrity problems

• Atomicity problems

• Concurrent-access anomalies

• Security problems

i. Data redundancy and inconsistency

The same information may be duplicated in several places (files), which is called
redundancy. This leads to higher storage and access cost. In addition, it may lead to data
inconsistency, i.e., various copies of the same data may no longer agree.

Example: change of student address in one file may not get reflected in another file.

ii. Difficulty in accessing data

The conventional file-processing environments do not allow needed data (e.g., students
who have taken 60 credit hours) to be retrieved in a convenient and efficient manner.
More responsive data-retrieval systems are required for general use.

Example: Suppose a bank officer wants to find out the names of all customers who live
within a particular postal-code area. If there is no application program for this, the
officer has two alternatives: 1. Prepare the list manually from the list of all customers.
2. Ask a system programmer to write the necessary application program.

iii. Data isolation

As data are scattered in various files and files may be in different formats and these files
may be stored in different folders of different departments, writing new application
programs to retrieve the appropriate data is difficult.

iv. Integrity problems

The data values stored in the database must satisfy certain types of consistency
constraints. Developers enforce these constraints in the system by adding appropriate
code in various application programs. However, when new constraints are added, it is
difficult to change the programs to enforce them. The problem is compounded when
constraints involve several data items from different files. Example: The pass marks of
the student are 50.

v. Atomicity problems

A transaction must happen in its entirety or not at all. It is difficult to ensure atomicity in a
conventional file-processing system. Example: Consider a program to transfer Rs.5000
from the account balance of department A to the account balance of department B. If a
system failure occurs after Rs.5000 was removed from department A but before it was
credited to the balance of department B, this will result in an inconsistent state.

vi. Concurrent-access anomalies

For the sake of overall performance of the system and faster response, many systems
allow multiple users to update the data simultaneously. Example: Let department A
have an account balance of Rs.10,000. If two department clerks debit amounts of Rs.500
and Rs.100 from the account of department A at the same time, the balance may be written
back as Rs.9,500 or Rs.9,900 rather than the correct value of Rs.9,400.

vii. Security problems

Not every user of the system should be able to access all the data. Enforcing security
constraints is difficult in file system. Example: University payroll personnel need to see
only the financial information but not the information about academic records.

The above-mentioned difficulties prompted the development of database systems.

Differences between DBMS and file systems are listed below:

Meaning
DBMS: DBMS is a collection of data. In DBMS, the user is not required to write the procedures.
File system: The file system is a collection of data. In this system, the user has to write the procedures for managing the database.

Sharing of data
DBMS: Due to the centralized approach, data sharing is easy.
File system: Data is distributed in many files, and it may be of different formats, so it isn't easy to share data.

Data Abstraction
DBMS: DBMS gives an abstract view of data that hides the details.
File system: The file system provides the details of the data representation and storage of data.

Security and Protection
DBMS: DBMS provides a good protection mechanism.
File system: It isn't easy to protect a file under the file system.

Recovery Mechanism
DBMS: DBMS provides a crash recovery mechanism, i.e., DBMS protects the user from system failure.
File system: The file system doesn't have a crash recovery mechanism, i.e., if the system crashes while entering some data, then the content of the file will be lost.

Manipulation Techniques
DBMS: DBMS contains a wide variety of sophisticated techniques to store and retrieve the data.
File system: The file system can't efficiently store and retrieve the data.

Concurrency Problems
DBMS: DBMS takes care of concurrent access of data using some form of locking.
File system: In the file system, concurrent access has many problems, like redirecting the file while deleting some information or updating some information.

Where to use
DBMS: The database approach is used in large systems which interrelate many files.
File system: The file system approach is used in small systems that deal with only a few files.

Cost
DBMS: The database system is expensive to design.
File system: The file system approach is cheaper to design.

Data Redundancy and Inconsistency
DBMS: Due to the centralization of the database, the problems of data redundancy and inconsistency are controlled.
File system: In this, the files and application programs are created by different programmers, so there exists a lot of duplication of data, which may lead to inconsistency.

Structure
DBMS: The database structure is complex to design.
File system: The file system approach has a simple structure.

Data Independence
DBMS: In this system, data independence exists, and it can be of two types: logical data independence and physical data independence.
File system: In the file system approach, there exists no data independence.

Integrity Constraints
DBMS: Integrity constraints are easy to apply.
File system: Integrity constraints are difficult to implement in a file system.

Data Models
DBMS: In the database approach, 3 types of data models exist: hierarchical, network and relational data models.
File system: In the file system approach, there is no concept of data models.

Flexibility
DBMS: Changes are often a necessity to the content of the data stored in any system, and these changes are more easily made with a database approach.
File system: The flexibility of the system is less as compared to the DBMS approach.

Examples
DBMS: Oracle, SQL Server, Sybase, etc.
File system: COBOL, C++, etc.

1.3. APPLICATIONS

Databases are useful in the following fields:

• Banking: For maintaining customer information, accounts, loans and banking transactions.

• Universities: For maintaining student records, course registration and grades.

• Railway Reservation: For checking the availability of reservations in different trains, tickets, etc.

• Airlines: For reservation and schedule information.

• Telecommunication: For keeping records of calls made, generating monthly bills, etc.

• Finance: For storing information about holdings, sales and purchase of financial instruments.

• Sales: For customer, product and purchase information.
1.4. VIEWS OF DATA
A database system is a collection of interrelated data and a set of programs that allow
users to access and modify these data. A major purpose of a database system is to provide users
with an abstract view of the data. That is, the system hides certain details of how the data are
stored and maintained.
1.4.1. Levels of Abstraction
Database systems are made-up of complex data structures. To ease the user interaction
with database, the developers hide internal irrelevant details from users. This process of hiding
irrelevant details from user is called data abstraction. There are three levels of abstraction as
shown in Fig.1.1.
o Physical level: describes how a record (e.g., customer) is stored.
o Logical level: describes data stored in database, and the relationships among the
data.
o View level: application programs hide details of data types. Views can also hide
information (e.g., salary) for security purposes.

Fig. 1.1 Three levels of data abstraction



1.4.2. Instances and Schema

Databases change over time as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the database.
The overall design of the database is called the database schema. Database systems have several
schemas, partitioned according to the levels of abstraction. The physical schema describes the
database design at the physical level, while the logical schema describes the database design at
the logical level. A database may also have several schemas at the view level, sometimes called
subschemas that describe different views of the database.

In general, the interfaces between the various levels and components should be well
defined so that changes in some parts do not seriously influence others.

1.4.3. Data Independence

The ability to change the schema at one level of a database system without having to
change the schema at the next higher level is called "Data Independence". There are two types
of data independence:

(1) Physical Data Independence: Physical data Independence is the ability to change the
internal schema without having to change the conceptual schema.

e.g., creating additional access structure to improve the performance of the retrieval or
update.

(2) Logical Data Independence: The logical Data Independence is the ability to change the
conceptual schema without having to change application programs (external schema).

e.g., We may change the conceptual schema to expand the database by adding a record
type or data items or reduce the database by removing data items.
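As a small illustration of physical data independence, consider a student table with name and city columns (an assumption made only for this sketch). Creating an index changes the internal schema, but the conceptual schema and the queries written against it stay the same:

create index idx_student_city on student(city);   -- new access structure (internal schema change)
select name from student where city = 'Chennai';  -- this query is unchanged before and after the index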

1.5. DATABASE ARCHITECTURE

Fig. 1.2 depicts the architecture of the database. It shows how different types of users
interact with a database, and how the different components of a database engine are connected
to each other. A database system is partitioned into modules that deal with each of the
responsibilities of the overall system. The functional components of a database system can be
broadly divided into

• Storage manager

• Query processor

• Users

1.5.1. Storage Manager

It is important because database typically requires a large amount of storage space. It is


a program module that provides the interface between the low-level data stored in the database
and the application programs and queries submitted to the system. The storage manager
translates the various DML statements into low level file system commands. Thus, the storage
manager is responsible for the following tasks:

• Interaction with the OS file manager

• Efficient storing, retrieving and updating of data

The storage manager components include:

• Authorization and integrity manager – This tests for the satisfaction of integrity constraints and checks the authority of users to access data.

• Transaction manager – This ensures that the database remains in a consistent state despite system failures and that concurrent transaction executions proceed without conflicting. The transaction manager consists of the concurrency-control manager and the recovery manager.

o Recovery manager – detects system failures and restores the database to the state that existed prior to the occurrence of the failure.

o Concurrency-control manager – controls the interaction among the concurrent transactions, to ensure the consistency of the database.

• File manager – This manages the allocation of storage space on disk and the data structures used to store that information.

• Buffer manager – This is responsible for fetching data from disk into main memory.

The storage manager implements several data structures as part of the physical system
implementation:

• Data files – store the database itself.

• Data dictionary – stores metadata about the structure of the database, in particular the schema of the database.

• Indices – can provide fast access to data items. A database index provides pointers to those data items that hold a particular value.

1.5.2. Query Processor

The query processor components include:

• DDL interpreter – interprets Data Definition Language (DDL) statements and records the definitions in the data dictionary.

• DML compiler – translates Data Manipulation Language (DML) statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands. The DML compiler performs query optimization; that is, it picks the lowest cost evaluation plan from among the various alternatives.

• Query evaluation engine – executes low-level instructions generated by the DML compiler.

Fig. 1.2 Database Architecture



1.5.3. Database Users

Users are differentiated by the way they are expected to interact with the system.

• Application programmers – interact with the system through DML calls

• Sophisticated users –interact with the system through query language

• Specialized users – write specialized database applications that do not fit into the
traditional data processing framework E.g: CAD system, knowledge based and expert
systems

• Naive users – Interact with the system by invoking one of the permanent application
programs that have been written previously. E.g: people accessing database over the
web, bank tellers, clerical staff

• Database Administrator - A person who has central control over the system is called a
database administrator (DBA). He coordinates all the activities of the database system;
the database administrator has a good understanding of the enterprise’s information
resources and needs.
Functions of a DBA include:
o Schema definition - The DBA creates the original database schema by executing a
set of data definition statements in the DDL.
o Storage structure and access-method definition
o Schema and physical-organization modification - The DBA carries out changes to
the schema and physical organization to reflect the changing needs of the
organization to improve performance.
o Granting of authorization for data access - By granting different types of
authorization, the database administrator can regulate which parts of the database
various users can access.
o Routine maintenance - Examples of the database administrator’s routine
maintenance activities are:
▪ Periodically backing up the database
▪ Ensuring that enough free disk space is available for normal operations, and
upgrading disk space as required
▪ Monitoring jobs running on the database

The DBMS interacts with the operating system when disk accesses—to the database or
to the catalog—are needed. If the computer system is shared by many users, the OS will
schedule DBMS disk access requests and DBMS processing along with other processes. On the
other hand, if the computer system is mainly dedicated to running the database server, the
DBMS will control main memory buffering of disk pages. The DBMS also interfaces with
compilers for general purpose host programming languages, and with application servers and
client programs running on separate machines through the system network interface.

The database and the DBMS catalog are usually stored on disk. Access to the disk is
controlled primarily by the operating system (OS), which schedules disk read/write. Many
DBMSs have their own buffer management module to schedule disk read/write, because this
has a considerable effect on performance. Reducing disk read/write improves performance
considerably. A higher-level stored data manager module of the DBMS controls access to
DBMS information that is stored on disk, whether it is part of the database or the catalog.

1.6. DATA MODELS

It provides a collection of tools for describing data, data relationships, data semantics
and consistency constraints. There are different types of data models.

1.6.1 Relational model

The relational model uses a collection of tables to represent both data and the
relationships among those data as shown in Fig. 1.3. Each table has multiple columns, and each
column has a unique name. Tables are also known as relations. The relational model is an
example of a record-based model. Record-based models are so named because the database is
structured in fixed-format records of several types. Each record type defines a fixed number of
fields, or attributes. Relational model is the most widely used data model.

Fig. 1.3 Relational Model



Advantages of Relational Model

o Simplicity

o Structural independence

o Ease of design, implementation, maintenance and use.

o Flexible and powerful query capability.

Disadvantages

o Hardware overheads – needs more powerful computing hardware.

o Ease of design may lead to bad database design.

1.6.2 Entity-Relationship (ER) Model

The entity-relationship (E-R) data model uses a collection of basic objects, called
entities, and relationships among these objects as shown in Fig. 1.4. An entity is a “thing” or
“object” in the real world that is distinguishable from other objects. The entity-relationship
model is widely used in database design. The E-R diagram is built up from the following
components.

o Rectangle: which represent entity sets

o Ellipses: which represent attributes

o Diamonds: which represent relationship among entity sets.

o Lines: which link attributes to entity sets and entity sets to relationships.

o Double ellipses: which represent multi valued attributes.

o Dashed Ellipses: which denote derived attributes.

o Double Lines: which represent total participation of an entity in a relationship set.

o Double Rectangle: which represent weak entity sets.

Fig. 1.4 ER Model



1.6.3. Semistructured Data Model

The semistructured data model permits the specification of data where individual data
items of the same type may have different sets of attributes. This is in contrast to the data models
mentioned earlier, where every data item of a particular type must have the same set of
attributes. JSON and Extensible Markup Language (XML) are widely used semi-structured data
representations.

1.6.4. Object-Based Data Model

Object-oriented programming (especially in Java, C++, or C#) has become the dominant
software-development methodology. This led initially to the development of a distinct object-
oriented data model, but today the concept of objects is well integrated into relational databases.
Standards exist to store objects in relational tables. This can be seen as extending the relational
model with notions of encapsulation, methods, and object identity.
CHAPTER – II
RELATIONAL DATABASES

2.1 RELATIONAL MODEL

A relational database management system (RDBMS) is a database management system
based on the relational model introduced by E. F. Codd. In the relational model, data is
represented in terms of tuples (rows).

2.1.1 Introduction

Relational database consists of a collection of tables, each of which is assigned a unique


name. For example, consider the instructor table of Fig 2.1, which stores information about
instructors. The table has four column headers: ID, name, dept name and salary. Each row of
this table records information about an instructor, consisting of the instructor’s ID, name, dept
name, and salary. Note that each instructor is identified by the value of the column ID.

ID Name Dept name Salary

1 Ram CSE 30000

2 Sita ECE 23000

Fig 2.1 Instructor Table

In general, a row in a table represents a relationship among a set of values. In the


relational model the term relation is used to refer to a table, while the term tuple is used to refer
to a row. Similarly, the term attribute refers to a column of a table.

The term relation instance refers to a specific instance of a relation containing a specific
set of rows. The instance of instructor shown in Fig 2.1 has 2 tuples, corresponding to 2
instructors.

For each attribute of a relation, there is a set of permitted values, called the domain of
that attribute. Thus, the domain of the salary attribute of the instructor relation is the set of all
possible salary values, while the domain of the name attribute is the set of all possible instructor
names.

For all relations r, we require that the domains of all attributes of r be atomic. A domain is atomic if
elements of the domain are considered to be indivisible units. For example, suppose the table
instructor had an attribute phone number, which can store a set of phone numbers corresponding
to the instructor. Then the domain of phone number would not be atomic, since an element of
the domain is a set of phone numbers, and it has subparts, namely, the individual phone numbers
in the set.

The null value is a special value that signifies that the value is unknown or does not
exist. For example, suppose as before that we include the attribute phone number in the
instructor relation. If an instructor does not have a phone number at all, use the null value to
signify that the value does not exist.

Database schema is the logical design of the database, and the database instance is a
snapshot of the data in the database at a given instant in time. In general, a relation schema
consists of a list of attributes and their corresponding domains.
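For instance, the instructor relation of Fig 2.1 could be declared in SQL roughly as follows (a sketch; the column types and sizes are assumptions):

create table instructor (
    ID        int,
    name      varchar(20),
    dept_name varchar(20),
    salary    numeric(8,2)
);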

2.1.2 Codd's 12 Rules

Dr. Edgar F. Codd, after his extensive research on the relational model of database
systems, came up with twelve rules of his own which, according to him, a database must obey
in order to be regarded as a true relational database. These rules can be applied to any database
system that manages stored data using only its relational capabilities. This requirement is the
foundation rule (Rule 0), which acts as a base for all the other rules.

Rule 1: Information Rule

The data stored in a database, may it be user data or metadata, must be a value of some
table cell. Everything in a database must be stored in a table format.

Rule 2: Guaranteed Access Rule

Every single data element (value) is guaranteed to be accessible logically with a


combination of table-name, primary-key (row value), and attribute-name (column value). No
other means, such as pointers, can be used to access data.

Rule 3: Systematic Treatment of NULL Values

The NULL values in a database must be given a systematic and uniform treatment. This
is a very important rule because a NULL can be interpreted as one of the following − data is
missing, data is not known, or data is not applicable.

Rule 4: Active Online Catalog

The structure description of the entire database must be stored in an online catalog,
known as data dictionary, which can be accessed by authorized users. Users can use the same
query language to access the catalog which they use to access the database itself.

Rule 5: Comprehensive Data Sub-Language Rule

A database can only be accessed using a language having linear syntax that supports
data definition, data manipulation, and transaction management operations. This language can
be used directly or by means of some application. If the database allows access to data without
any help of this language, then it is considered as a violation.

Rule 6: View Updating Rule

All the views of a database, which can theoretically be updated, must also be updatable
by the system.

Rule 7: High-Level Insert, Update, and Delete Rule

A database must support high-level insert, update, and delete operations. This must not be
limited to a single row; that is, it must also support union, intersection and minus operations to
yield sets of data records.

Rule 8: Physical Data Independence

The data stored in a database must be independent of the applications that access the
database. Any change in the physical structure of a database must not have any impact on how
the data is being accessed by external applications.

Rule 9: Logical Data Independence

The logical data in a database must be independent of its user’s view (application). Any
change in logical data must not affect the applications using it. For example, if two tables are
merged or one is split into two different tables, there should be no impact or change on the user
application. This is one of the most difficult rules to apply.

Rule 10: Integrity Independence

A database must be independent of the application that uses it. All its integrity
constraints can be independently modified without the need of any change in the application.
This rule makes a database independent of the front-end application and its interface.

Rule 11: Distribution Independence

The end-user must not be able to see that the data is distributed over various locations.
Users should always get the impression that the data is located at one site only. This rule has
been regarded as the foundation of distributed database systems.

Rule 12: Non-Subversion Rule

If a system has an interface that provides access to low-level records, then the interface
must not be able to subvert the system and bypass security and integrity constraints.

2.2. KEYS

A key allows us to identify a set of attributes and thus distinguishes entities from each
other. Keys also help to uniquely identify relationships, and thus distinguish relationships from
each other. Different types of keys are:

• Super key

• Primary key

• Candidate key

• Foreign key

2.2.1 Super Key

A super key is a set of one or more attributes that, taken collectively, allows us to
uniquely identify a tuple in the relation.

e.g., {Roll-No}, {Roll-No, Name}, {Roll-No, Address}, {Roll-No, Name, Address} all
these sets are super keys.

2.2.2 Primary Key

A primary key is one or more column(s) in a table used to uniquely identify each row in
the table. Primary key cannot contain Null value.

2.2.3 Candidate Key

If a relational schema has more than one key, each is called a candidate key. All the
keys which satisfy the conditions of a primary key can be candidate keys. In a student relation,
{Roll-No} and {Phone-No} are two candidate keys and we can consider any one of these as the
primary key.

2.2.4 Foreign Key

Foreign keys are used to represent relationships between tables. An attribute in one
relation whose value matches the primary key in some other relation is called a foreign key.

For example, Consider the two relations dept and employee,

dept (dno, dname, dloc)

employee (eno, ename, gender, doj, dob, sal, job, dno)

In the above relations, dno is the primary key of the dept relation and eno is the primary
key of the employee relation. The dno attribute of the employee relation matches the primary
key dno of the dept relation, so dno in the employee relation is a foreign key.

Referential integrity ensures that a value in one table references an existing value in
another table. The rule of referential integrity states that the value of a foreign key must be
within the domain of its related primary key, or it must be null. This relationship ensures that:

• Records cannot be inserted into a detail table if corresponding records in the master table
do not exist.

• Records of the master table cannot be deleted if corresponding records in the detail table
exist.

2.3. RELATIONAL ALGEBRA

Relational database systems are expected to be equipped with a query language that can
assist its users to query the database instances. There are two kinds of query languages −
relational algebra and relational calculus. Relational algebra is a procedural query language.
Relational calculus is a non-procedural or declarative query language. Relational Algebra
targets how to obtain the result. Relational Calculus targets what result to obtain. Here we will
discuss in detail about relational algebra.

Relational algebra is a procedural query language, which takes instances of relations as


input and yields instances of relations as output. It uses operators to perform queries. An
operator can be either unary or binary. Relational algebra is performed recursively on a relation
and intermediate results are also considered relations.

2.3.1. Fundamental operations


The fundamental operations of relational algebra are as follows:
1. Select
2. Project
3. Union
4. Set difference
5. Cartesian product
6. Rename
Select Operation (σ)
It selects tuples that satisfy the given predicate (condition) from a relation.
Notation − σp(r), where σ stands for selection, r stands for relation and p is a propositional
logic formula (condition) which may use connectors like and, or, and not. These terms may use
relational operators like =, ≠, ≥, <, >, ≤.
Example 1: σsubject = "database"(Books)
Output − Selects the tuples from books relation where subject name is 'database'.
Example 2: σsubject = "database" and price = "450"(Books)
Output − Selects the tuples from books relation where subject name is 'database' and price of
the book is 450.
Example 3: σsubject = "database" and price = "450" or year > "2010"(Books)
Output − Selects the tuples from books relation where subject name is 'database' and price of
the book is 450 or the books that are published after 2010.
Project Operation ( ∏ )
It projects (selects) the specified column(s) from a relation.
Notation − ∏A1, A2, …, An (r) where ∏ stands for projection, A1, A2 , …, An are attribute names
of relation r. In the result, duplicate rows are automatically eliminated, as relation is a set.
Example: ∏subject, author (Books)
Output − Selects and projects the columns that are named as subject and author from the relation
Books.
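For comparison, the selection and projection examples above can be written in SQL roughly as follows (a sketch, assuming a Books table with subject, author and price columns):

select * from Books where subject = 'database';    -- σ subject = "database" (Books)
select distinct subject, author from Books;        -- ∏ subject, author (Books); distinct removes duplicate rows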

Union Operation (∪)

It performs binary union between two given relations and is defined as

r ∪ s = {t | t ∈ r or t ∈ s}

Notation − r U s where r and s are either database relations or relation result set (temporary
relation). In the result, duplicate tuples are automatically eliminated.

For a union operation to be valid, the following conditions must hold −

• r and s must have the same number of attributes.

• Attribute domains must be compatible.

Example: ∏ author (Books) ∪ ∏ author (Articles)

Output − Projects the names of the authors who have either written a book or an article or both.

Set Difference (−)

The result of this operation is set of tuples that are present in one relation but not in the second
relation.

Notation: r – s where r and s are relations. It finds all the tuples that are present in r but not in
s.

Example: ∏ author (Books) − ∏ author (Articles)

Output − Provides the name of authors who have written books but not articles.

Cartesian Product (Χ)

This operation is used to combine information in two different relations into one.

Notation: r Χ s where r and s are relations and their output will be defined as –

r Χ s = {q t | q ∈ r and t ∈ s}

The cardinality of the Cartesian product is the product of the cardinalities of its factors, that is,
|R × S| = |R| × |S|.


Example: σauthor = 'Elmasri'(Books Χ Articles)

Output - Yields a relation that shows all the books and articles written by Elmasri.

Rename Operation (ρ)

The results of relational algebra are also relations but without any name. The rename
operation allows us to name the output relation. 'rename' operation is denoted with small Greek
letter ρ.

Notation − ρx (E) where the result of expression E is saved with name of x.

2.3.2. Additional operations

The additional operations of relational algebra are as follows:

1. Set intersection

2. Division

3. Natural join

4. Assignment

Additional operations are defined in terms of the fundamental operations. They do not
add power to the algebra, but are useful to simplify common queries.

Set Intersection

Intersection on two relations R1 and R2 can only be computed if R1 and R2 are union
compatible (These two relations should have same number of attributes and corresponding

attributes in two relations should have same domain). Intersection operator, when applied to
two relations as R1∩R2, will give a relation with tuples which are in R1 as well as R2.

Syntax: Relation1 ∩ Relation2

Example: Find a person who is student as well as employee

Student ∩ Employee

In terms of basic operators (union and minus), intersection can be expressed as follows:

Student ∩ Employee = (Student U Employee) – (Student - Employee) - (Employee -


Student)

(Or) Student ∩ Employee = Student - (Student – Employee)

Division Operator (÷)

Division operator A÷B can be applied if and only if:

• Attributes of B is proper subset of Attributes of A.

• Attributes in resultant relation = All attributes of A – All Attributes of B

The relation returned by division operator will return those tuples from relation A which
are associated to every B’s tuple.

Example: Consider the relation Student_Sports and All_Sports as mentioned below:

Student_Sports Table

ROLL_NO SPORTS

1 Badminton

2 Cricket

2 Badminton

4 Badminton

All_Sports Table

SPORTS

Badminton

Cricket

Apply division operator as Student_Sports÷ All_Sports, this operation is valid as

• attributes in All_Sports is a proper subset of attributes in Student_Sports.

• attributes in resulting relation will have attributes {ROLL_NO, SPORTS}-{SPORTS}


= ROLL_NO

The tuples in resulting relation will have those ROLL_NO which are associated with all
B’s tuple {Badminton, Cricket}. ROLL_NO 1 and 4 are associated to Badminton only.
ROLL_NO 2 is associated to all tuples of B. So, the resulting relation will be:

ROLL_NO

2

Division in terms of basic operations is given by A ÷ B = ∏x(A) − ∏x((∏x(A) × B) − A), where x denotes the attributes of A that do not appear in B.

Join Operation

Cartesian product of two relations gives us all the possible tuples that are paired
together. But taking a Cartesian product is often not feasible when we encounter huge relations
with thousands of tuples and a considerably large number of attributes, since most of the
resulting pairings carry no meaningful inference.

Join is a combination of a Cartesian product followed by a selection process. A Join


operation pairs two tuples from different relations, if and only if a given join condition is
satisfied.

Types of joins

• Theta (θ) Join

Theta join combines tuples from different relations that satisfy theta condition. The join
condition is denoted by the symbol θ.

Notation: R1 ⋈θ R2 where R1 and R2 are relations having attributes (A1, A2, ..., An)
and (B1, B2, ... ,Bn) such that the attributes don’t have anything in common, that is R1 ∩ R2 =
Φ. Theta join can use all kinds of comparison operators.

Example: Consider the following tables Student and Subject.

Student table

SID Name Std

101 Alex 10

102 Maria 11

Subject table

Class Subject

10 Math

10 English

11 Music

11 Sports

Applying the operation, Student ⋈Student.Std = Subject.Class Subject, we get the output as

SID Name Std Class Subject

101 Alex 10 10 Math

101 Alex 10 10 English



102 Maria 11 11 Music

102 Maria 11 11 Sports

• Equijoin

When Theta join uses only equality comparison operator, it is said to be equijoin. The
above example corresponds to equijoin.

• Natural Join (⋈)

Natural join does not use any comparison operator. It does not concatenate the way a
Cartesian product does. We can perform a Natural Join only if there is at least one
common attribute that exists between two relations. In addition, the attributes must have
the same name and domain. Natural join acts on those matching attributes where the
values of attributes in both the relations are same.

Example: Consider the following tables Courses and HoD.

Courses Table

CID Course Dept

CS01 Database CS

ME01 Mechanics ME

EE01 Electronics EE

HoD Table

Dept Head

CS Alex

ME Maya

EE Mira

Applying the operation Courses ⋈ HoD, we get the result as



Dept CID Course Head

CS CS01 Database Alex

ME ME01 Mechanics Maya

EE EE01 Electronics Mira
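In SQL, this natural join could be written roughly as follows (a sketch using SQL's natural join construct):

select * from Courses natural join HoD;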

• Outer Joins

Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only
those tuples with matching attributes and the rest are discarded in the resulting relation.
Therefore, we need to use outer joins to include all the tuples from the participating
relations in the resulting relation. There are three kinds of outer joins − left outer join,
right outer join, and full outer join.

Consider the tables Left and Right for explaining the concept:

Left

A B

100 Database

101 Mechanics

102 Electronics

Right

A B

100 Alex

102 Maya

104 Mira

o Left Outer Join (R ⟕ S)

All the tuples from the Left relation, R, are included in the resulting relation. If there are
tuples in R without any matching tuple in the Right relation S, then the S-attributes of
the resulting relation are made NULL.

Applying the operation Left ⟕ Right, we get the result as

A B C D

100 Database 100 Alex

101 Mechanics --- ---

102 Electronics 102 Maya

o Right Outer Join (R ⟖ S)


All the tuples from the Right relation, S, are included in the resulting relation. If there
are tuples in S without any matching tuple in R, then the R-attributes of resulting relation
are made NULL.
Applying the operation Left ⟖ Right, we get the result as

A B C D

100 Database 100 Alex

102 Electronics 102 Maya

--- --- 104 Mira

o Full Outer Join (R ⟗ S)


All the tuples from both participating relations are included in the resulting relation. If
there are no matching tuples for both relations, their respective unmatched attributes are
made NULL.

Applying the operation Left ⟗ Right, we get the result as

A B C D

100 Database 100 Alex

101 Mechanics --- ---

102 Electronics 102 Maya

--- --- 104 Mira

Assignment Operation
Sometimes it is useful to be able to write a relational algebra expression in parts using
a temporary relation variable. The assignment operation, denoted by ←, works like assignment
in a programming language. No extra relation is added to the database, but the relation variable
created can be used in subsequent expressions. Assignment to a permanent relation would
constitute a modification to the database.
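For example, the set-intersection identity from Section 2.3.2 can be written in steps using assignment (a sketch using the Student and Employee relations mentioned there):

temp1 ← Student − Employee
result ← Student − temp1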
CHAPTER – III
SQL FUNDAMENTALS

3.1 INTRODUCTION

The IBM Sequel language was developed as part of the System R project at the IBM San Jose
Research Laboratory. It was later renamed Structured Query Language (SQL).
SQL is the standard command set used to communicate with the relational database
management systems. All tasks related to relational data management - creating tables, querying
the database can be done using SQL.

The advantages of SQL are

• Standardized and Interactive Language.

• Portable.

• Simple and easy to learn.

• No Need of Coding Skills.

• Provide greater degree of abstraction.

3.2 DOMAIN TYPES IN SQL

SQL supports the following data types.

• char(n) (or character(n)): fixed-length character string, with user-specified length.

• varchar(n) (or character varying): variable-length character string, with user-specified


maximum length.

• int or integer: an integer (length is machine-dependent).

• smallint: a small integer (length is machine-dependent).

• numeric (p, d): a fixed-point number with user-specified precision, consisting of p digits
in total, with d of the p digits after the decimal point.

• real or double precision: floating-point or double-precision floating-point numbers, with


machine-dependent precision.

• float(n): floating-point, with user-specified precision of at least n digits.

• date: a calendar date, containing four-digit year, month, and day of the month.

• time: the time of the day in hours, minutes, and seconds.
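The following declaration sketch shows several of these domain types used together (the table and column names are illustrative only):

create table salesman (
    sid    int,              -- integer
    sname  varchar(20),      -- variable-length character string
    sscore numeric(5,1),     -- fixed point: 5 digits in total, 1 after the decimal point
    rate   float(7),         -- floating point with at least 7 digits of precision
    doj    date,             -- calendar date
    login  time              -- time of day
);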

3.3 SQL LANGUAGES


3.3.1. Data Definition Language (DDL)
DDL consists of the SQL commands that can be used to define the database schema.
It is used to create a table, alter the structure of a table and also drop the table. The various
DDL Commands are
• CREATE
• DESCRIBE
• ALTER
• RENAME
• DROP
• TRUNCATE
i. CREATE
This command is to create a new table.
Syntax:
create table table_name(fieldname1 datatype(size), fieldname2 datatype(size),..);
Example:
create table student(rollno int, name varchar (20), city varchar (20), age int);
ii. DESCRIBE
This command is to view the structure of the table.
Syntax:
desc table_name;
Example:
desc student;
iii. ALTER
This command is used to add a new column, drop a column or modify existing column
definitions.

ADD A COLUMN
Syntax:
alter table table_name add column_name datatype(size);
Example:
alter table student add column mark1 int;
DROP A COLUMN
This command is used to delete a column.
Syntax:
alter table table_name drop column_name ;
Example:
SQL>alter table student drop column mark1;
MODIFY COLUMN
This command is used to change the data type of a column or change the size of the
column
Syntax:
ALTER TABLE table_name MODIFY COLUMN column_name datatype;
Example:
alter table student modify name varchar2(25);
ADD PRIMARY KEY
This command is used to make a column as primary key.
Syntax:
alter table table_name add primary key(Field_name);
Example:
alter table student add primary key(Rollno);
DROP PRIMARY KEY
This command is used to remove the primary key.
Syntax:
alter table table_name drop primary key;
Example:

alter table itstudent drop primary key;


ADD CHECK CONSTRAINT
Syntax:
Alter table tablename add constraint const_name check (condition);
Example:
alter table student add constraint C1 check(mark1<=100);
iv. RENAME TABLE
This command is used to rename an existing table.
Syntax:
alter table table_name rename to new_table_name;
Example:
alter table student rename to itstudent;
v. TRUNCATE
This command is used to delete the rows in a table. It deletes the contents, but the
structure of the table is retained. The deleted rows cannot be retrieved using rollback
command.
Syntax:
truncate table tablename;
Example:
truncate table itstudent;
vi. DROP
This command is used to delete a table. It deletes the contents as well as the structure
of table.
Syntax:
drop table table_name;
Example:
drop table itstudent;
3.3.2. Data Manipulation Language (DML)
DML commands deal with the manipulation of data present in the database. The
various

DML commands are


• INSERT
• SELECT
• UPDATE
• DELETE
i. INSERT
This command is used to insert a row to a table.
Insert values to all fields:
Syntax:
INSERT INTO table_name VALUES (value1, value2, value3...);
Example:
INSERT INTO student VALUES (5, 'Meena', 'Chennai', 23);
Insert values to particular fields:
Syntax:
INSERT INTO table_name(column1, column2, column3...) VALUES (value1, value2,
value3...);
Example:
INSERT INTO Student (ROLL_NO, NAME, AGE) VALUES (6, 'Tina', 19);
ii. SELECT
This command is used to retrieve data from a SQL database.
a) SELECT without Condition:
Selecting all columns:
Syntax:
SELECT * FROM Table_name;
Example:
SELECT * from Student;
Selecting particular columns:
Syntax:
SELECT Column1, Column2, Column3… FROM Table_name;

Example:
SELECT ROLL_NO, NAME from Student;
b) SELECT with Condition
Selecting all columns satisfying some condition:
Syntax:
SELECT * FROM Table_name where condition;
Example:
SELECT * from Student where NAME='Tina';
Selecting particular columns satisfying some condition:
Syntax:
SELECT Column1, Column2, Column3… FROM Table_name where condition;
Example:
SELECT ROLL_NO from Student where NAME='Tina';

iii. UPDATE

This command is used to modify the data in existing database row(s). Usually, a
conditional clause is added to specify which row(s) are to be updated. If no conditional
clause is included, the update is applied to all the rows.

Syntax:

UPDATE Table_name SET Column1=Value1, Column2=Value2 WHERE


Some_Column = Some_Value / Expression;

Example:

UPDATE Student SET AGE = 24 WHERE NAME='MEENA';

iv. DELETE

This command deletes a single record or multiple records from a table. If commit
command is not executed, then the deleted rows can be retrieved using rollback
command.

a) Delete particular row:

Syntax:

DELETE FROM table_name WHERE some_column=some_value;

Example:

DELETE FROM Student WHERE NAME='MEENA';

b) Delete all rows:

Syntax:

DELETE FROM table_name;

Example:

DELETE FROM Student;

3.3.3. Data Control Language (DCL)

DCL commands are used to control the access to data stored in a database. It ensures
security.

DCL commands are

• GRANT

• REVOKE

i) GRANT

This command is used to give access rights to other users.

Syntax:

Grant <privileges> on <table name> to <username>;

Example:

SQL> Grant select on emp to user1;

SQL> Grant select, insert on emp to user2;

ii) REVOKE

This command is used to take back permissions from users.

Syntax:

SQL>Revoke <privileges> on <table name> from <username>;

Example:

SQL>Revoke select on emp from user1;

SQL>Revoke select, insert on emp from user2;

3.3.4. Transaction Control Language (TCL)

Transaction is a set of tasks grouped into a single execution unit. TCL commands are
used to maintain consistency of the database and management of transactions made by the DML
commands. The following TCL commands are used to control the execution of a transaction:

• COMMIT
• ROLLBACK
• SAVEPOINT
i. COMMIT:
This command is used to save the data permanently.
Syntax:
commit;
ii. ROLLBACK
This command is used to restore the data to the last savepoint or last committed state.
Syntax:
rollback;

iii. SAVEPOINT

This command marks a point in the current transaction. It is used to save the data at a
particular point temporarily, so that the transaction can be rolled back to that particular
point whenever needed.

Syntax:

SQL> savepoint savepointname;

Example:

SQL> savepoint x;
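A short sketch of how these commands work together, assuming the student table created in Section 3.3.1:

insert into student values (7, 'Ravi', 'Madurai', 20);
savepoint s1;                   -- mark the current point in the transaction
insert into student values (8, 'Anu', 'Salem', 21);
rollback to savepoint s1;       -- undoes only the second insert
commit;                         -- makes the first insert permanent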



3.4. ADDITIONAL OPERATIONS

1. String Operations

SQL includes a string-matching operator for comparisons on character strings. Patterns


are case sensitive. The operator like uses patterns that are described using two special
characters:

• percent (%). The % character matches any substring.

• underscore (_). The _ character matches any single character.

Pattern matching examples:

• select name from customer where name like 'S%'; displays the names of the customers whose names start with the letter S.

• 'Intro%' matches any string beginning with “Intro”.

• '%Comp%' matches any string containing “Comp” as a substring.

• '_ _ _' matches any string of exactly three characters.

• '_ _ _ %' matches any string of at least three characters.

SQL supports a variety of string operations such as

• concatenation using “||” operator

• converting from upper to lower case and vice versa

• finding string length, extracting substrings, etc.
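A few of these operations in use (a sketch, assuming the customer table used above with name and city columns; function names such as upper and length are common but vary slightly between systems):

select upper(name) from customer;             -- convert to upper case
select length(name) from customer;            -- string length
select name || ', ' || city from customer;    -- concatenation using the || operator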

2. Order by Clause

The ORDER BY keyword is used to sort the result of the query in ascending or
descending order. By default, the ORDER BY keyword sorts the records in ascending
order and DESC keyword should be used to sort the records in descending order.

Syntax:

SELECT column1, column2, ... FROM table_name ORDER BY column1, column2,


... ASC|DESC;

Examples:

• List the Customer names in alphabetic order.

SELECT DISTINCT CustName from Customer order by CustName;

• List all customer details in the Customer table, sorted ascending by city name and
then descending by the CustomerName

SELECT * FROM Customer ORDER BY City ASC, CustName DESC;

3. Set Operations

SQL Set operation is used to combine the data from the result of two or more SELECT
commands. The number of columns retrieved by each SELECT command must be the
same. The columns in the same position in each SELECT statement should have
similar data types.

There are four types of set Operation.

i. Union:

The UNION operator returns the rows either from the result set of first query or
result set of second query or both. The union operation eliminates the duplicate rows
from its result set.

Syntax:

SELECT * FROM Table1 UNION SELECT * FROM Table2;

Example:

SELECT acctno from Account UNION SELECT acctno from Loan;

ii. Union All:

Union all is same as Union but retains duplicates.

Syntax:

SELECT * FROM Table1 UNION ALL SELECT * FROM Table2;

Example:

SELECT acctno from Account UNION ALL SELECT acctno from Loan;

iii. Intersect:

The Intersect operation returns the common rows from the result set of both
SELECT statements. It has no duplicates and it arranges the data in ascending order
by default.

Syntax:

SELECT column_name FROM Table1 INTERSECT SELECT column_name FROM Table2;

Example:

SELECT acctno from Account INTERSECT SELECT acctno from Loan;

iv. Minus:

Minus operator is used to display the rows which are present in the result set of first
query but not present in the result set of second query. It has no duplicates and data
arranged in ascending order by default.

Syntax:

SELECT column_name FROM table1 MINUS SELECT column_name FROM table2;

Example:

SELECT acctno from Account MINUS SELECT acctno from Loan;

4. NULL values

A NULL signifies an unknown value or that a value does not exist. For example, the
value for the field phone number is NULL, if the person does not have phone or the
person has the phone but we do not know the phone number. Hence it is possible for
tuples to have a null value for some of their attributes denoted by NULL.

The result of any arithmetic expression involving null is null [5 + null returns null]. The
predicate is null can be used to check for null values. For example, the following query
displays the name of the employees whose salary is null.

select name from employee where salary is null;



3.5. AGGREGATE FUNCTIONS


An aggregate function performs a calculation on a set of values, and returns a single
value.
i. Basic Aggregate Functions
The Various types of basic aggregate functions are:
• avg( )
• min( )
• max( )
• sum( )
• count( )
Let us consider that the Salesman table contains the following data.

SNAME SID SAGE SAREA SDEPT SSCORE

Ashwin 101 19 anna nagar aeronautical 750

Bhavesh 102 18 nungambakkam marine 500

pruthvik 103 20 anna nagar aerospace 250

charith 104 20 kilpauk mechanical 100

The following table shows the aggregate functions and the result of the queries.

S.No Examples of Aggregate Function Result

1 select avg(SSCORE) from Salesman; 400

2 select min(SSCORE) from Salesman; 100

3 select max(SSCORE) from Salesman; 750

4 select sum(SSCORE) from Salesman; 1600

5 select count(sid) from Salesman; or select count(*) from Salesman; 4

ii. Aggregate Function with Group By Clause:

GROUP BY is used to group all the records in a relation together for each value of a
specific key (or keys), and then display a selected set of fields for each group.

Syntax:

SELECT <set of fields> FROM <relation_name> GROUP BY <field_name>;

Example:

SELECT empno, COUNT(projno) FROM empproj GROUP BY empno;

iii. Aggregate Function with Group By Having Clause:

The HAVING clause is used to apply a condition on the groups formed by the GROUP
BY clause. The HAVING clause must follow the GROUP BY clause in a query and must
also precede the ORDER BY clause if it is used.

Syntax:

SELECT column_name, aggregate_function(column_name) FROM table_name

GROUP BY column_name

HAVING aggregate_function(column_name) operator value;

Example:

SELECT empno, count(projno) from empproj group by empno having


count(projno) > 5;

3.6. NESTED SUBQUERIES

A Sub query or Inner query or a Nested query is a query within another SQL query and
embedded within the WHERE clause. A sub query is used to return data that will be used in the
main query as a condition. Sub queries must be enclosed within parentheses. Sub queries cannot
have an ORDER BY command but the main query can use an ORDER BY.

Syntax:

SELECT column_list (s) FROM table_name WHERE column_name OPERATOR

(SELECT column_list (s) FROM table_name [WHERE] condition)



i) Set Membership

There are two set membership operators - in and not in

Example:

▪ List the projectid of projects done by John

select projectid from emp_proj where empid in (select empid from employee where
empname = 'John');

▪ List the projectid of projects done by all employees except John

select projectid from emp_proj where empid not in (select empid from employee
where empname = 'John');

ii) Set Comparison

SQL supports various comparison operators such as <=, >=, <>, any, all, some, etc. to compare sets.

Example:

▪ Display the name of employees whose salary is greater than that of some (at least
one) employees in the manufacturing department.

select distinct E.name from Employee as E, Employee as S where E.salary > S.salary and S.deptname = 'manufacturing';

The above query can also be written using > some clause as shown below.

select name from Employee where salary > some (select salary from Employee where deptname = 'manufacturing');

▪ Display the name of employees whose salary is greater than the salary of all
employees in the manufacturing department.

select name from Employee where salary > all (select salary from Employee where deptname = 'manufacturing');

iii) Test for Empty Relations

SQL includes a facility for testing whether the result set of a sub query has any tuples. The exists construct in SQL returns the value true if the result of the sub query is nonempty.

Example
▪ List the courseid of the courses registered by one or more students.
SELECT DISTINCT courseid FROM course WHERE EXISTS (SELECT * FROM
stud_course WHERE course.courseid = stud_course.courseid)
Similarly, the NOT EXISTS construct checks the sub query for the existence of rows; if there are no rows it returns TRUE, otherwise FALSE.
▪ List the courseid of the courses not chosen by any student.
SELECT DISTINCT courseid FROM course WHERE NOT EXISTS (SELECT *
FROM stud_course WHERE course.courseid = stud_course.courseid)
iv) Test for absence of duplicate tuples
The unique construct tests whether a sub query has any duplicate tuples in its result. The
unique construct evaluates to “true”, if a given sub query contains no duplicates.
Example:
▪ Find all courses that were offered at most once in 2017
select T.course_id from course as T where unique (select R.course_id from section
as R where T.course_id= R.course_id and R.year = 2017);
3.7. VIEWS
In many applications, it is not desirable for all users to see the entire relation.
Security considerations require that a part of the relation can be made available to the users
and certain data need to be hidden from them. For example, a clerk may be given the rights
to know an instructor’s ID, name and department name, but not the instructor’s salary. This person
should see a relation described in SQL, by:
select ID, name, dept name from instructor;
It is possible that the result set of above queries can be computed and stored in another
relation and that stored relation can be given to the clerks. However, if the underlying data in
the relation instructor changes, the stored query results would then no longer match the result
of re-executing the query on the relations.
In order to overcome the above issue, SQL allows a “virtual relation” to be defined
by a query, and the relation conceptually contains the result of the query. The virtual
relation is not pre-computed and stored, but instead computed by executing the query
whenever the virtual relation is used. Any such relation that is not part of the logical model,
but is made visible to a user as a virtual relation, is called a view.

i. View Definition

A view in SQL can be created using the create view command. To define a view, we
must give the view a name and must state the query that computes the view.

Syntax:

CREATE OR REPLACE VIEW view_name AS SELECT column_name(s) FROM table_name WHERE condition;

Example:

Let us consider the EMP relation contains the fields empno, ename and job. The below
command creates a view named v1 with the details of clerks only.

CREATE VIEW v1 AS SELECT empno, ename FROM EMP WHERE job = 'clerk';

The view relation conceptually contains the tuples in the query result but is not pre-computed or saved. Instead, the database system stores the query expression associated with the view relation. Whenever the view relation is accessed, its tuples are created by computing the query result.

Once we have defined a view, we can use the view name to refer to the virtual relation
that the view generates. Using the view v1, we can find the empno of all clerks.

SELECT empno from v1;

View names may be used in a way similar to using a relation name. Certain database
systems allow view relations to be stored, but they make sure that, if the actual relations
used in the view definition change, the view is kept up-to-date. Such views are called
materialized views.
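
For example, a materialized view over the EMP table could be defined as shown below (a minimal sketch, assuming Oracle-style syntax; the exact refresh options and keywords vary across database systems):

CREATE MATERIALIZED VIEW mv_clerks AS SELECT empno, ename FROM EMP WHERE job = 'clerk';

Unlike the ordinary view v1, the result of mv_clerks is physically stored, and the system refreshes it (on commit or on demand, depending on the options chosen) when EMP changes.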

ii. Update of a View

Although views are a useful tool for queries, they present serious problems if we
express updates, insertions, or deletions with them. The difficulty is that a
modification to the database expressed in terms of a view must be translated to a
modification to the actual relations in the logical model of the database.

Suppose the view v1 is made available to a clerk. Since we allow a view name to appear
wherever a relation name is allowed, the clerk can write:

insert into v1 values (900, ’Ajay’);

This insertion must be represented by an insertion into the relation EMP, since

EMP is the actual relation from which the view v1 is created. However, to insert a tuple
into EMP, we must have some value for job. There are two reasonable approaches
to deal with this insertion:

• Reject the insertion, and return an error message to the user.

• Insert a tuple (900, ’Ajay’, null) into the EMP relation.
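
One way to guarantee the first behaviour is to declare the view as read only, so that any modification attempted through it is rejected. A minimal sketch, assuming Oracle-style syntax (WITH CHECK OPTION is the closest standard-SQL alternative for updatable views):

CREATE OR REPLACE VIEW v1 AS SELECT empno, ename FROM EMP WHERE job = 'clerk' WITH READ ONLY;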

iii. Delete View

Views can be deleted using the syntax

DROP VIEW view-name;

DROP VIEW v1; - deletes the view v1.

3.8. JOINS

The Join clause is used to combine records from two or more tables in a database.
Cartesian product of two relations results in all possible combination whereas Join combines
the tables by using common values. Join is equivalent to Cartesian product followed by a
selection process. A Join operation pairs two tuples from different relations, if and only if a
given join condition is satisfied. We have discussed the types of joins already. (Refer Section
2.3.2)

Examples of Joins in SQL:

Consider student and stud_per are the relations created as follows.

create table student(rollno int primary key, sname varchar(25), class varchar(10));

create table stud_per(rollno int, doorno varchar(6), city varchar(15));

The joins performed on the two relations are as follows

SELECT student.rollno, student.sname, stud_per.city FROM student INNER JOIN stud_per ON student.rollno = stud_per.rollno;

SELECT student.rollno, student.sname, stud_per.city FROM student LEFT JOIN stud_per ON student.rollno = stud_per.rollno;

SELECT student.rollno, student.sname, stud_per.city FROM student RIGHT JOIN stud_per ON student.rollno = stud_per.rollno;
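
In addition to the inner, left and right joins shown above, a full join returns all rows from both tables, with NULLs for the side that has no match. A minimal sketch; FULL JOIN is supported by systems such as Oracle and PostgreSQL but not by every database:

SELECT student.rollno, student.sname, stud_per.city FROM student FULL JOIN stud_per ON student.rollno = stud_per.rollno;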

3.9. ADVANCED SQL FEATURES

3.9.1 Functions

A PL/SQL function is a reusable program unit stored as a schema object.

Syntax

Create [or replace] function fn_Name(parameter list)
return return_type
is
[declarative section]
BEGIN
[executable section]
[EXCEPTION]
[exception-handling section]
END;

A function consists of a header and body. The function header has the function name
and a RETURN clause that specifies the data type of the value to be returned. The parameter of
the function can be either in the IN, OUT, or INOUT mode.

The function body has three sections:

Declarative section - declares variables, constants, cursors, and user-defined types.

Executable section - contains the executable statements and must have at least one RETURN statement.

Exception-handling section - contains the exception handler code.

Among the three sections, only the executable section is required; the others are optional.

Example:

The following example illustrates how to create and call a function. Let us consider that the Sportsman relation contains the following data.

Sid Sname Sage Scountry

100 Rohan 28 India

101 Dhoni 31 India

102 David 22 South Africa

103 Rohit 33 India

104 Michael 25 England

The statement below creates a function that returns the id of a sportsman, given the name.

create function f1(t_name varchar2) return number as
id number;
begin
select Sid into id from Sportsman where Sname=t_name;
return id;
end;
The below code invokes the function f1 and the id of Dhoni, 101 is printed as the
result.
begin
dbms_output.put_line(f1('Dhoni'));
end;
Advantages of functions:
• Can be used with Clauses easily (like where, having and so on)
• Function can be embedded in a Select statement (see the sketch below)
• Easy to Use
• Functionality Oriented
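
As noted above, a function can be embedded directly in a SELECT statement. A minimal sketch using the f1 function and the Sportsman relation defined earlier:

select Sname, f1(Sname) as id from Sportsman where Scountry = 'India';

Each Sname in the result is passed to f1, and the returned id appears as an extra column.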
The syntax for deleting a function is
drop function fn_name;
The command drop function f1; - deletes the function f1.

3.9.2 Procedures

The PL/SQL stored procedure or simply a procedure is a PL/SQL block which performs
one or more specific tasks. It is just like procedures in other programming languages.

The procedure contains a header and a body.

Header: The header contains the name of the procedure and the parameters or variables passed
to the procedure.

Body: The body contains a declaration section, execution section and exception section
similar to a general PL/SQL block.

Passing parameters in procedure:

There are three ways to pass parameters in procedure:

1. IN parameters: The IN parameter can be referenced by the procedure but the value of
the parameter cannot be overwritten.

2. OUT parameters: The OUT parameter cannot be referenced but the value of the
parameter can be overwritten.

3. INOUT parameters: The INOUT parameter can be referenced and the value of the parameter can be overwritten.

The main difference between a procedure and a function is that a function must always return a value, but a procedure may or may not return a value.

Syntax
CREATE [OR REPLACE] PROCEDURE procedure_name
[ (parameter [,parameter]) ]
IS
[declaration_section]
BEGIN
executable_section
[EXCEPTION
exception_section]
END [procedure_name];

Example:

In this example, a procedure is created to insert record in the product table. So, we need
to create product table first.
create table product(id number(10) primary key,name varchar2(100));
The below procedure is to insert record in product table.
create or replace procedure proc1(id IN NUMBER, name IN VARCHAR2)
is
begin
insert into product values(id,name);
end;
The code below calls the procedure proc1 and inserts the row into the product table.
BEGIN
proc1(101,'Harddisk');
END;
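
The proc1 example uses only IN parameters. A minimal sketch of a procedure with an OUT parameter, returning a value to the caller; the names proc_get_name, p_id and p_name are illustrative and not part of the example above:

create or replace procedure proc_get_name(p_id IN NUMBER, p_name OUT VARCHAR2)
is
begin
select name into p_name from product where id = p_id;
end;

A caller supplies a variable for the OUT parameter and reads it after the call:

declare
v_name varchar2(100);
begin
proc_get_name(101, v_name);
dbms_output.put_line(v_name);
end;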
The syntax for deleting a procedure is
DROP PROCEDURE procedure_name;

3.9.3 Triggers

A trigger is procedural code that is automatically executed in response to certain events


on a particular table or view in a database. The trigger is mostly used for maintaining the
integrity of the information on the database. For example, when a new employee joins an
organization, his details need to be added to the employees table. New records should also be
inserted in the pay roll table.

Triggers are for:

▪ Customization of database management

▪ Centralization of some business or validation rules

▪ Logging and audit

▪ Modify table data when DML statements are issued against views

▪ Enforce security authorizations (prevent DML operations on a table after regular business hours)

▪ Prevent invalid transactions

The structure of the trigger is:


Create TRIGGER trigger_name
triggering_event
[ trigger_restriction ]
BEGIN
triggered_action;
END;

The trigger_name must be unique for triggers in the schema. A trigger cannot be invoked
directly like function or procedure. It will be invoked automatically by its triggering event. A
trigger can be made to execute before/after the operations such as insert/update/delete.

Example:

Let us consider that the Account table contains the following information.

ACC_NO ACC_NAME BALANCE

101 Reena 4000

102 Meena 20000

103 Tina 3000000

104 Smith 500

105 John 500000

The following trigger displays the total balance in the bank whenever a new account is
created.

create trigger t3 before insert on Account
for each row
declare a int;
begin
select sum(Balance) into a from Account;
dbms_output.put_line(a);
end;
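
Since logging and audit is one of the common uses listed earlier, the sketch below records every newly inserted account in a separate log table. It assumes a hypothetical table acc_log(acc_no number, logged_on date) that is not part of the example above:

create trigger t4 after insert on Account
for each row
begin
insert into acc_log values (:new.ACC_NO, sysdate);
end;

Here :new refers to the column values of the row being inserted; an update or delete trigger can similarly use :old to read the previous values.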

Triggers serve many useful purposes in applications such as banking and railway reservation systems.
Triggers should be written with much care since the action of one trigger can invoke another
and may even lead to an infinite chain of triggering. Hence triggers can be avoided
whenever alternatives exist. Many trigger applications can be substituted by appropriate
use of stored procedures.

3.10. EMBEDDED SQL

SQL can be embedded in almost all high-level languages due to the vast support it has
from almost all developers. Languages like C, C++, Java etc, support SQL integration. Some
languages like python have inbuilt libraries to integrate the database queries in the code. For
python, we have the SQLite library which makes it easy to connect to the database using the
embedding process.

Embedded SQL provides a means by which a program can interact with a database
server. However, under embedded SQL, the SQL statements are identified at compile time
using a preprocessor, which translates requests expressed in embedded SQL into function calls.
At runtime, these function calls connect to the database using an API that provides dynamic
SQL facilities but may be specific to the database that is being used.

Embedded SQL gives us the freedom to use databases as and when required. Once the application we develop goes into production, several things need to be taken care of; one major aspect is authorization and the fetching and feeding of data into/from the database. With the help of embedded queries, we can use the database without writing bulky code. With embedded SQL, we can create APIs which can easily fetch and feed data as and when required.

How to Embed SQL in High-Level Languages?

For using embedded SQL, we need some tools in each high-level language. In some
cases, we have inbuilt libraries which provide us with the basic building block. While in some
cases we need to import or use some packages to perform the desired tasks.

For example, in Java, we need a connection class. We first create a connection by using
the connection class and further we open the connection by passing the required parameters to
connect with the database.

The distinction between an SQL statement and host language statement is made by using
the key word EXEC SQL; thus, this key word helps in identifying the Embedded SQL
statements by the pre-compiler.

Example: How to connect to a database (using JAVA).

Class.forName("com.mysql.jdbc.Driver");

Connection connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/DataFlair", "user", "root");

Statement statement = connection.createStatement();

After creating the connection and the statement, we can build an SQL query and execute it using the statement object.

Host variables

• Database manager cannot work directly with high level programming language
variables.

• Instead, it uses special variables known as host variables to move data between an application and a database.

Two types of host variables.

• Input host variables: Transfer data to database.

• Output host variable: Receives data from database

Advantages of embedded SQL

Small footprint database: As embedded SQL uses an Ultra Lite database engine
compiled specifically for each application, the footprint is generally smaller than when using
an Ultra Lite component, especially for a small number of tables. For a large number of tables,
this benefit is lost.

High performance: Combining the high performance of C and C++ applications with
the optimization of the generated code, including data access plans, makes embedded SQL a
good choice for high-performance application development.

Extensive SQL support: With embedded SQL you can use a wide range of SQL in your
applications.

Disadvantages of embedded SQL

Knowledge of C or C++ required: If you are not familiar with C or C++ programming,
you may wish to use one of the other Ultra Lite interfaces. Ultra Lite components provide
interfaces from several popular programming languages and tools.

Complex development model: The use of a reference database to hold the Ultra Lite
database schema, together with the need to pre-process your source code files, makes the
embedded SQL development process complex. The Ultra Lite components provide a much
simpler development process.

SQL must be specified at design time: Only SQL statements defined at compile time
can be included in your application. The Ultra Lite components allow dynamic use of SQL
statements.

3.11. DYNAMIC SQL

Dynamic SQL statements, unlike embedded SQL statements, are built at run time and placed in a string in a host variable. The created SQL statements are then sent to the DBMS for processing. Dynamic SQL is generally slower than statically embedded SQL, as it requires complete processing, including access plan generation, at run time.

Dynamic SQL is a programming technique that allows you to construct SQL statements
dynamically at runtime. It allows you to create more general purpose and flexible SQL
statement because the full text of the SQL statements may be unknown at compilation. For
example, you can use the dynamic SQL to create a stored procedure that queries data against a
table whose name is not known until runtime.

• The query can be entered completely as a string by the user or s/he can be suitably
prompted.

• The query can be fabricated using a concatenation of strings. String concatenation is language dependent and is not a portable feature.

• Any modification of the query should be done keeping security in mind.

• The query is prepared and executed using a suitable SQL EXEC command.
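
A minimal sketch of dynamic SQL in PL/SQL, assuming Oracle's EXECUTE IMMEDIATE and the product table created earlier; the table name is concatenated into the statement text at run time:

declare
v_table varchar2(30) := 'product';
v_count number;
begin
execute immediate 'select count(*) from ' || v_table into v_count;
dbms_output.put_line(v_count);
end;

Values supplied by users should be passed as bind variables (the USING clause of EXECUTE IMMEDIATE) rather than concatenated into the string, keeping security in mind.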
CHAPTER – IV
ENTITY RELATIONSHIP MODEL

4.1. DATA MODELS

The data model describes the structure of a database. It is a collection of conceptual tools for describing data, data relationships and consistency constraints. The various types of data models are:

1. Object based logical model


2. Record based logical model
3. Physical model
Types of data model:
1. Object based logical model
a. ER-model
b. Functional model
c. Object oriented model
d. Semantic model
2. Record based logical model
a. Hierarchical database model
b. Network model
c. Relational model

3. Physical model

4.2 ENTITY RELATIONSHIP MODEL (ER MODEL)

The ER model is a classical, popular conceptual data model. It was first introduced in the mid-1970s as a (relatively minor) improvement to the relational model, since pictorial diagrams are easier to read than relational database schemas. It has since evolved into a popular model for the first conceptual representation of data structures in the process of database design.


The entity-relationship data model perceives the real world as consisting of basic objects,
called entities and relationships among these objects. It was developed to facilitate database
design by allowing specification of an enterprise schema which represents the overall logical
structure of a database.

4.2.1 Features of ER-MODEL

• Entity relationship model is a high level conceptual model

• It allows us to describe the data involved in a real world enterprise in terms of objects
and their relationships.

• It is widely used to develop an initial design of a database

• It provides a set of useful concepts that make it convenient for a developer to move from a basic set of information to a detailed and precise description of information that can be easily implemented in a database system

• It describes data as a collection of entities, relationships and attributes.

The E-R data model employs three basic notions

1. Entity sets

2. Relationship sets

3. Attributes.

4.2.2. Entity Sets

An entity is a “thing” or “object” in the real world that is distinguishable from all other
objects. For example, each person in an enterprise is an entity. An entity has a set of properties
and the values for some set of properties may uniquely identify an entity. BOOK is an entity and its properties (called attributes) are bookcode, booktitle, price, etc.

An entity set is a set of entities of the same type that share the same properties, or
attributes. Example: The set of all persons who are customers at a given bank.

4.2.3. Attributes

An entity is represented by a set of attributes. Attributes are descriptive properties


possessed by each member of an entity set.

Customer is an entity and its attributes are customerid, customername, custaddress, etc. An attribute, as used in the E-R model, can be characterized by the following attribute types.

a) Simple and Composite Attribute

Simple attributes are the attributes which can't be divided into sub parts, e.g. customerid, empno. Composite attributes are the attributes which can be divided into subparts, e.g. name consisting of first name, middle name, last name, and address consisting of city, pincode, state.

b) Single-Valued and Multi-Valued Attribute

The attribute having a single value is a single-valued attribute, e.g. empno, customerid, regdno. The attribute having more than one value is a multi-valued attribute, e.g. phone-no, dependent name, vehicle.

c) Derived Attribute

The values for this type of attribute can be derived from the values of existing attributes,
e.g. age which can be derived from currentdate – birthdate and experience_in_year can
be calculated as currentdate - joindate.

d) NULL Valued Attribute

The attribute value which is not known to user is called NULL valued attribute.

4.2.4. Relationship Sets

A relationship is an association among several entities. A relationship set is a set of


relationships of the same type. Formally, it is a mathematical relation on n >= 2 entity sets. If E1, E2, …, En are entity sets, then a relationship set R is a subset of

{(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En} where (e1, e2, …, en) is a relationship.

Customer Borrow Loan

Consider the two entity sets customer and loan. We define the relationship set borrow
to denote the association between customers and the bank loans that the customers have.

4.2.5. Mapping Cardinalities

Mapping cardinalities or cardinality ratios, express the number of entities to which


another entity can be associated via a relationship set. Mapping cardinalities are most useful in
describing binary relationship sets, although they can contribute to the description of
relationship sets that involve more than two entity sets. For a binary relationship set R between
entity sets A and B, the mapping cardinalities must be one of the following:

1. One to One

An entity in A is associated with at most one entity in B, and an entity in B is


associated with at most one entity in A.

Eg: relationship between college and principal

College (1) —— has —— (1) Principal

2. One to Many

An entity in A is associated with any number of entities in B. An entity in B is


associated with at the most one entity in A.

Eg: Relationship between department and faculty

Department (1) —— has —— (M) Faculty

3. Many to One

An entity in A is associated with at most one entity in B. An entity in B is associated


with any number in A.

Employee (M) —— Works in —— (1) Department



4. Many to Many

Entities in A and B are associated with any number of entities from each other.

Customer (M) —— Deposits —— (M) Account

4.2.6. More about Entities and Relationship

Recursive Relationships

When the same entity type participates more than once in a relationship type in different
roles, the relationship types are called recursive relationships.

Participation Constraints

The participation constraints specify whether the existence of any entity depends on its
being related to another entity via the relationship. There are two types of participation
constraints.

a) Total: When all the entities from an entity set participate in a relationship type, it is called total participation. For example, the participation of the entity set student in the relationship set 'opts' is said to be total, because every student enrolled must opt for a course.

b) Partial: When it is not necessary for all the entities from an entity set to participate in a
relationship type, it is called partial participation. For example, the participation of the
entity set student in ‘represents’ is partial, since not every student in a class is a class
representative.

Weak Entity

Entity types that do not contain any key attribute, and hence cannot be identified independently, are called weak entity types. A weak entity can be identified uniquely only by considering some of its attributes in conjunction with the primary key attribute of another entity, which is called the identifying owner entity.

Generally a partial key is attached to a weak entity type that is used for unique
identification of weak entities related to a particular owner type. The following restrictions must
hold:

• The owner entity set and the weak entity set must participate in one to many relationship
set. This relationship set is called the identifying relationship set of the weak entity set.

• The weak entity set must have total participation in the identifying relationship.

Example

Consider the entity type Dependent related to the Employee entity, which is used to keep track of the dependents of each employee. The attributes of Dependent are: name, birthdate, sex and relationship. Each employee entity is said to own the dependent entities that are related to it. However, note that the Dependent entity does not exist on its own; it is dependent on the Employee entity.

4.3. ER-DIAGRAM

The overall logical structure of a database is represented graphically with the help of an
ER- diagram.

4.3.1. Symbols used in ER- diagram

Fig. 4.1 Symbols in ER Diagram (entity, weak entity, attribute, key attribute, composite attribute, multi-valued attribute, derived attribute, relationship, identifying relationship, total and partial participation, and cardinality notations such as one-to-one and many-to-one)

4.3.2. Examples

Example 1:

Fig. 4.2 ER Diagram for Bank Database



Example 2:

Fig. 4.3 ER Diagram for Car Insurance Company

Example 3:

Fig. 4.4 ER Diagram for a Hospital



Example 4:

A University registrar's office maintains data about the following entities:

(a) Course, including number, title, credits, syllabus and prerequisites

(b) Course offering, including course number, year, semester, section number, instructor
timings, and class room

(c) Students including student-id, name and program

(d) Instructors, including identification number, name, department and title

Further, the enrollment of students in courses and the grades awarded to students in each course they are enrolled for must be appropriately modeled. The E-R diagram for this registrar's office is given below:

Fig. 4.5 ER Diagram for University Registrar Office



Example 5:

Fig. 4.6 ER- Diagram for a Student Mark Maintenance System

4.4. THE ENHANCED ER MODEL

As the complexity of data increased in the late 1980s, it became more and more difficult
to use the traditional ER Model for database modelling. Hence some improvements or
enhancements were made to the existing ER Model to make it able to handle the complex
applications better.

Hence, as part of the Enhanced ER Model, along with other improvements, three new
concepts were added to the existing ER Model, they were:

1. Generalization

2. Specialization

3. Aggregation

4.4.1. Generalization
Generalization is a bottom-up approach in which two lower level entities combine to
form a higher level entity. In generalization, the higher level entity can also combine with other
lower level entities to make further higher level entity.
It is much like a superclass and subclass system, but the difference is the approach, which is bottom-up. Hence, entities are combined to form a more generalized entity; in other words, sub-classes are combined to form a super-class.

For example, Saving and Current account types entities can be generalized and an entity
with name Account can be created, which covers both.
4.4.2. Specialization
Specialization is the opposite of Generalization. It is a top-down approach in which one higher level entity can be broken down into two or more lower level entities. In specialization, it is also possible that a higher level entity has no lower-level entity sets at all.

4.4.3. Aggregation
Aggregation is a process in which the relationship between two entities is treated as a single entity.

In the corresponding diagram, the relationship between Center and Course together acts as an entity, which is in a relationship with another entity, Visitor. In the real world, if a visitor or a student visits a coaching center, he/she will never enquire about the center only or just about the course; rather, he/she will enquire about both.

4.5. ER MODEL TO RELATIONAL MODEL

ER Model can be represented using ER Diagrams which is a great way of designing and
representing the database design in more of a flow chart form. It is very convenient to design
the database using the ER Model by creating an ER diagram and later on converting it into
relational model to design your tables. Not all the ER Model constraints and components can
be directly transformed into relational model, but an approximate schema can be derived. ER
diagrams are converted into relational model schema, hence creating tables in RDBMS.

Entity becomes Table

Every entity in the ER model is changed into a table; that is, for every entity in the ER model, a table is created in the relational model. The attributes of the entity get converted to columns of the table, and the primary key specified for the entity in the ER model becomes the primary key for the table in the relational model.

For example, for the below ER Diagram in ER Model,

A table with name Student will be created in relational model, which will have 4
columns, id, name, age, address and id will be the primary key for this table.

Table: Student

id name age address



Relationship becomes a Relationship Table

In ER diagram, we use diamond/rhombus to represent a relationship between two


entities. In Relational model we create a relationship table for ER Model relationships too.

In the ER diagram below, we have two entities Teacher and Student with a relationship
between them.

As discussed above, entity gets mapped to table, hence we will create table for Teacher
and a table for Student with all the attributes converted into columns.

Now, an additional table will be created for the relationship, for example Student
Teacher or give it any name you like. This table will hold the primary key for both Student and
Teacher, in a tuple to describe the relationship, which teacher teaches which student.

If there are additional attributes related to this relationship, then they become the
columns for this table, like subject name. Also proper foreign key constraints must be set for all
the tables.
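
A minimal SQL sketch of this mapping; the column names and data types below are illustrative (the Teacher attributes in particular are assumptions, not taken from the text above):

create table Student(id int primary key, name varchar(25), age int, address varchar(50));

create table Teacher(teacher_id int primary key, name varchar(25));

create table Student_Teacher(id int references Student(id), teacher_id int references Teacher(teacher_id), subject_name varchar(25), primary key (id, teacher_id));

The relationship table holds the primary keys of both Student and Teacher as foreign keys, together with any attribute of the relationship itself, such as subject_name.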
CHAPTER – V
NORMALIZATION

5.1. FUNCTIONAL DEPENDENCIES

Functional dependencies are constraints on the set of legal relations. A functional


dependency is a generalization of the notion of a key. If R is a relation with attributes X and Y,
a functional dependency between the attributes is represented as X→Y, which specifies Y is
functionally dependent on X. If the same X value appears in several tuples, the corresponding Y value must be the same in all of them.

The formal definition of functional dependency is

A functional dependency α → β holds on R if and only if for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is,

t1[α] = t2[α]  ⟹  t1[β] = t2[β]

Consider r(A,B,C ) with the following instance of r.

A B C

1 4 2

1 5 1

3 7 2

1 5 5

On this instance, B → A holds. The reason is that B's value 5 occurs in row 2 and row 4, and the corresponding A value is 1 in both rows. But A → B does not hold, since A has the value 1 in three rows but the corresponding B value is not the same (4 in row 1 and 5 in rows 2 and 4).

Let us consider that the student table contains the following data.

Rollno Name Subcode Mark

12345 Raghuveer CS3491 85

12346 Reshma CS3491 90

12345 Raghuveer CS3492 78

12346 Reshma CS3492 92

The Rollno 12345 is repeated two times (1st row & 3rd row) and the corresponding name
is Raghuveer in both rows. The same is true for Rollno 12346 also, which specifies Name is
functionally dependent on rollno i.e Rollno→Name.

Key:

For any relation R, the key is defined as an attribute or set of attributes that functionally
determines all the attributes of the relation.

Let’s consider the relation:

Movie(title, year, length, filmType, studioName)

There exist several functional dependencies in the Movie relation

Title, year → length

Title, year → filmType

Title, year → studioName

Combining the above functional dependencies gives Title, year → length filmType studioName. These assertions make sense, as the combination of the attributes title and year forms a key for the Movie relation.

Let us recall the types of keys - Super Key, Candidate Key and Primary Key what we
have discussed in Section 2.2.

Super Key: A super key is a set of one or more attributes to uniquely identify rows in a table.
It may have extraneous attributes. Extraneous attributes are additional attributes to a key.

Candidate key: The minimal set of attributes which uniquely identify a tuple is known as
candidate key. i.e without extraneous attributes.

Primary Key: Primary key uniquely identifies a tuple/row. Candidate key also identifies a
record uniquely, but a relation can have many candidate keys. Any one candidate key is chosen
as primary key and a relation can have only one primary key.

Let us see an example for the above keys. The employee details contain the following
information.

EmpId EmpName Address Designation

1001 Vignesh Gandhi Nagar Junior Programmer

1002 Eshwaran Rajaji Street Junior Programmer

1003 Vignesh Nehru Street Team Leader

1004 Sowmya Anna Nagar Manager

The EmpId determines a unique row and the composite attribute (EmpName, Address)
also identifies a unique row. Hence these are the two candidate keys in this relation. The user
can choose any one of the candidate keys as the primary key, for example EmpId. A super key may have extraneous attributes in addition to a candidate key. Some of the super keys are EmpId, (EmpId, EmpName), (EmpId, Designation) and (EmpId, EmpName, Address).

Trivial FD: If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is


called a trivial Functional Dependency.

Example: ID, name → ID

Partial dependencies: Consider a relation has more than one field as primary key. A subset of
non-key fields may depend on only one of the key fields and not on the entire primary key.
Such dependencies are called partial dependencies.

Transitive dependencies: In a relation, there may be dependencies among non-key field


transitively (A → B and B→ C). Such dependencies are called transitive dependencies.
Multi-valued dependencies:
If X → Y and X → Z and the set of values of Y is independent of the set of values of Z, then a multi-valued dependency exists. It is denoted by X →→ Y/Z.
5.1.1. Closure of Attribute Sets
• An attribute β is functionally determined by α if α → β.
• The set of all attributes functionally determined by a given attribute set α is called the closure of the attribute set, denoted α+.
• The closure of an attribute set is used to identify keys.
The algorithm to find the closure of the attribute set α is
result := α;
while (changes to result) do
    for each β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end
Example 1: Let’s consider a relation R with attributes A, B, C, D, E and F. Suppose that this
relation satisfies the FD’s:
AB → C, BC→ AD, D → E, CF→ B.

1. What is (AB)+?

Iterations:

(AB)+= { } Initially make (AB)+ is empty

(AB)+ = {A,B} Include A,B since we are finding (AB)+

(AB)+ = {A,B,C} Include C since AB → C

(AB)+ = {A,B,C,D} Include D since BC → AD (A is already present)

(AB)+ = {A,B,C,D,E} Include E since D → E. No more changes to (AB)+

Note: The FD CF → B cannot be used since F does not exist in (AB)+.



Example 2:

Find the key for the previous problem.

Given the functional dependencies

F={AB→C, BC→ AD, D→ E,CF→ B}

(AB)+ = {ABCDE} // AB is not a key since F is missing in (AB)+

(BC)+ = {BCADE} // BC is not a key since F is missing in (BC)+

(D)+ = {DE} // D is not a key since A,B,C,F are missing in (D)+

(CF)+ = {CFB} // CF is not a key since A,D,E are missing in (CF)+

If ABF is considered, then (ABF)+ = {ABFCDE} (it includes all the attributes of the relation). So, ABF is a key.

Similarly BCF is also a key since (BCF)+ = {BCFADE}

Example 3:

Consider a relation R(A,B,C,D,E) with the following functional dependencies.

FD1 : A→ BC

FD2 : C →B

FD3 : D → E

FD4 : E → D

Identify the candidate keys for the relation R.

Solution:

{A}+ = {A, B, C}

{B}+ = {B}

{C}+ = {C, B }

{D}+ = {D, E}

{E}+ = {E,D}

None of the closures of the single attributes A, B, C, D, E includes all the attributes, so no single attribute is a key. Hence, we need to combine two or more attributes to determine the candidate keys. Let us check with (AD) and (AE).

{A, D}+ = {A, B, C, D, E}

{A, E}+ = {A, B, C, D, E}

(AD) + and (AE) + includes all the attributes in the relation. Hence AD and AE are
candidate keys of Relation R. (Note: We can choose any one as Primary Key)

5.1.2. Closure of a Set of Functional Dependencies

Let F be a set of functional dependencies. The closure of F, denoted by F+ is the set of


all functional dependencies logically implied by F. If F were large, this process would be
lengthy and difficult. Axioms or rules of inference provide a simpler technique for reasoning
about functional dependencies.

Three rules to find logically implied functional dependencies:

• By applying these rules repeatedly, we can find all of F+, given F.

• This collection of rules is called Armstrong’s axioms.

o Reflexivity rule: if β ⊆ α, then α → β

o Augmentation rule: if α → β, then γα → γβ

o Transitivity rule: if α → β and β → γ, then α → γ

• Armstrong’s axioms are sound, because they do not generate any incorrect functional
dependencies.

• They are complete, because, for a given set F of functional dependencies, they allow us
to generate all F+.

Additional Rules:

• Union rule: If α → β holds and α → γ holds, then α → βγ holds.

• Decomposition rule: If α → βγ holds, then α → β holds and α → γ holds.

• Pseudotransitivity rule: If α → β holds and γβ → δ holds, then αγ → δ holds.



The algorithm to find the closure of functional dependency is as follows.


F+=F
repeat
for each functional dependency f in F+
apply reflexivity and augmentation rules on f
add the resulting functional dependencies to F +
for each pair of functional dependencies f1and f2 in F +
if f1 and f2 can be combined using transitivity
then add the resulting functional dependency to F +
until F + does not change any further

Example 1:
Let R = (A, B, C, G, H, I)
F={ A→B A→C CG → H CG → I B → H}
Some members of F+
A→H by transitivity from A → B and B → H
AG → I by augmenting A → C with G, to get AG → CG and
then transitivity with CG → I
CG → HI by augmenting CG → I with CG to infer CG → CGI, and augmenting of CG →
H with I to infer CGI → HI, and then transitivity.
Example 2:
Suppose, R is a relation with attributes (A, B, C, D, E, F) and with the identified set F of
functional dependencies as follows;
F = { A → B, A → C, CD → E, B → E, CD → F }
Find the closure of Functional Dependency F+.
The closure of functional dependency (F+) includes F and new functional dependencies inferred
using the algorithm mentioned above.
1. A → E is logically implied. From our F we have the two FDs A → B and B → E. By applying the Transitivity rule, we can infer A → E.
2. A → BC is logically implied. It can be inferred from the FDs A → B and A → C
using Union rule.

3. CD → EF is logically implied by FDs CD → E and CD → F using Union rule.


4. AD → F is logically implied by FDs A → C and CD → F using Pseudotransitivity
rule.
Hence F+ = { A → B, A → C, CD → E, B → E, CD → F, A → E, A → BC, CD → EF,
AD → F }
5.2. NON LOSS DECOMPOSITION
Let R be a relation schema and let F be a set of functional dependencies on R. Let R is
decomposed into two relations R1 and R2. The decomposition is a non loss decomposition if
there is no loss of information by replacing R with two relation schemas R1 and R2. If natural
join is performed on R1 and R2, then we should get back exactly R. A decomposition that is not
a lossless decomposition is called a lossy decomposition.
For example, consider the relation shown in the Figure 5.1- employee(ID, name, street,
city, salary) is decomposed into two relations, first relation employee1 (ID, name) and the
second relation employee2 (name, street, city, salary). When a natural join is performed on the two decomposed relations, two additional incorrect records with the name 'Kim' appear. The flaw in this decomposition arises because the enterprise has two employees with the same name Kim. These extra tuples are called spurious tuples.

Fig. 5.1 Loss of information via a bad decomposition



The relations R1 and R2 form a lossless decomposition of R if at least one of the


following functional dependencies is in F+:

R1 ∩ R2 → R1

R1 ∩ R2 → R2

If R1 ∩ R2 forms a superkey of either R1 or R2, the decomposition of R is a lossless


decomposition.

Example:

Let us consider the relation inst_dept (ID, name, salary, dept name, building, budget) is
decomposed into instructor (ID, name, dept name, salary) and department (deptname, building,
budget)

• The intersection of these two schemas is deptname.

• deptname → deptname, building, budget holds, so deptname is a superkey of the department schema.

• The lossless-decomposition rule is satisfied and hence the decomposition is lossless.

5.3. NORMALIZATION

Normalization is a step by step process in which a complex relation is decomposed into


simpler relations. The desirable properties of decomposition are Dependency preservation and
Lossless Join. Relational database design requires that we find a “good” collection of relation
schemas. A bad design may lead to

• Repetition of Information.

• Inability to represent certain information.

5.3.1 Problems of redundant data in a relation

Redundancies in relations lead to a variety of data anomalies. Data anomalies are


divided into three general categories: insertion, deletion and update anomalies. They are named
respectively after the relational operations of Insert, Delete, and Update because it is during the
application of these operations that a relation may experience anomalies.

Example: Consider the following table which contains the exam results of students. Assume
that each student can enrol in many courses. Mathews has enrolled in two courses C1 & C2 and
Kiran has enrolled in only one course.

StudId StudName CourseId CourseName Mark

S101 Mathews C1 JAVA 95

S102 Kiran C2 Python 87

S101 Mathews C2 Python 89

The functional dependencies are


StudId → StudName
CourseId → CourseName
StudId, CourseId → Mark
The primary key for the above relation is (StudId, CourseId) .
i. Insert Anomalies
▪ Student information cannot be inserted until he/she join any course.
▪ Similarly the information about the course cannot be inserted until a student enrolls
in that course
▪ These anomalies occur because StudId, CourseId is the composite primary key and
we cannot insert null in any of these two attributes for a record.
ii. Update Anomalies
▪ This relation is also susceptible to update anomalies because the details about the
course C2 is repeated two times since two students have registered.
▪ For example, if the course name has to be modified, it has to be done in both places.
But, we may modify in one place and may not modify in another place and end up
with an inconsistent database.
iii. Delete Anomalies
▪ This relation experiences deletion anomalies.
▪ If a student has enrolled in only one course and if he withdraws her/his registration,
then that record should be deleted. In that case details about the student will also be
deleted. For example, if Kiran withdraws the C2 course registration, his information
will also be removed.

5.3.2. FIRST NORMAL FORM

A relation is said to be in 1 NF, if it has no repeating groups and data must be atomic.
• A set of names is an example of a non-atomic value. For example, if the schema of a
relation employee included attribute children whose domain elements are sets of names,
the schema would not be in first normal form.
• Composite attributes, such as an attribute address with component attributes street, city,
state, and zip also have non atomic domains.
• Let us consider a scenario in which a student can register in more than one course.
Sample data is shown below.
ID Name Courses
-------------------------------------------
1 Banu c1, c2
2 Elamaran c3
3 Meena c2, c3
The above relation is not in 1NF, since the Courses attribute holds multiple values in a single tuple. The tuples have to be stored as follows, and now the relation is in 1NF.
ID Name Course
--------------------------
1 Banu c1
1 Banu c2
2 Elamaran c3
3 Meena c2
3 Meena c3

5.3.3. SECOND NORMAL FORM

A relation is said to be in 2 NF, if it is already in 1 NF and it has no partial dependencies.

• It focuses on partial functional dependency, prime and non-prime attributes.

• Prime attribute - An attribute that is a part of some candidate key.

• Non-prime attribute - An attribute which is not a part of any candidate key.



Eg. Let R={A,B,C,D} with candidate key AB. Then the prime attributes are A and B
and nonprime attributes are C and D.

• Partial Functional dependency - A non-prime attribute is functionally dependent only


on a part of a candidate key and not on the entire key.

Let us consider a relation R(A,B,C,D) with functional dependencies {AB→CD,A→C}.


AB is a candidate key because (AB)+={ABCD}={R}. {A,B} are prime attributes and {C,D}
are non-prime attribute. In A→C, the non-prime attribute C is dependent upon A which is a part
of candidate key AB. Due to A→C, we get a partial functional dependency.

Example:
Given details in the ExamResult table are:
• sid
• sname
• cid
• cname
• mark
The Functional dependencies are
FD1: sid→ sname ---partial dependency
FD2: cid → cname ---partial dependency
FD3: sid cid → mark --- Full dependency

The key for the relation is (sid, cid). The functional dependency FD3 is the only full dependency, since mark depends fully on the key (sid, cid).

The functional dependency FD1 is partial, since sname depends only on sid and not on cid. Similarly, the functional dependency FD2 is also partial, since cname depends only on cid and not on sid.

So, the table is decomposed as

R1(sid, sname)

R2(cid,cname)

R3(sid, cid, mark)

Now the relation is in 2 NF.
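
A minimal SQL sketch of the decomposed schema (the table names and data types are illustrative), showing how the keys of R1, R2 and R3 are declared:

create table student(sid varchar(10) primary key, sname varchar(25));

create table course(cid varchar(10) primary key, cname varchar(25));

create table exam_result(sid varchar(10) references student(sid), cid varchar(10) references course(cid), mark int, primary key (sid, cid));

Each non-prime attribute now depends on the whole key of its own table, so no partial dependency remains.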



5.3.4. THIRD NORMAL FORM

A relation is said to be in 3 NF, if it is already in 2 NF and it has no transitive


dependencies.

• It focuses on transitive dependency.

Transitive dependency

1. A functional dependency is said to be transitive if it is indirectly formed by two


functional dependencies. Example: X→Z is a transitive dependency if the following
functional dependencies hold true: X→Y and Y→Z.

2. A table is in 3NF if it is in 2NF and for each functional dependency, X→Y at least
one of the following conditions hold.

• X is a super key of the table.

• Y is a prime attribute of table.

Example:

Consider the invoice of a bookshop:

INVOICE

Customer no:

Customer name:

Address:

Isbn Title Author City Zip Qty Price Amt

The fields in the relations are given by

Invoice ( cus_no,name,addr,(isbn,title,author,city,zip,qty,price))

1 NF:

customer (cus_no,name,addr)

customer book(cus_no,isbn, title,author,city,zip,qty,price)

2 NF:

Here, the key is (cus_no, isbn). The Qty attribute depends on both key attributes, but the other attributes such as title, author, city, zip, price depend only on isbn. This indicates a partial dependency. So the table is decomposed as follows:

customer (cus_no,name,addr) - R1

sales (cus_no,isbn,qty) - R2

book (isbn,title,author,city,zip,price) - R3

3 NF:

Here, there is no transitive dependency in relations R1 and R2, and hence R1 and R2 are in 3NF. But there is a transitive dependency in R3:

isbn → zipcode and zipcode → city

So the relation R3 is not in 3NF and decomposed as follows:

zip (zipcode,city) - R31

book (isbn,title,author,zipcode,price) - R32

Now there are 4 relations after decomposition and all the relations satisfy 3NF.

customer (cus_no,name,addr)

sales (cus_no,isbn,qty)

zip (zipcode,city)

book (isbn,title,author,zipcode,price)

5.4. DEPENDENCY PRESERVATION

Let F be a set of functional dependencies on a schema R, and let R1, R2, ..., Rn be the
decomposition of R. The restriction of F to Ri is the set Fi of all functional dependencies in F+
that include only attributes of Ri .

Since all functional dependencies in a restriction involve attributes of only one relation
schema, it is possible to test such a dependency for satisfaction by checking only one relation.

Let F' = F1 ∪ F2 ∪ … ∪ Fn. A decomposition having the property F'+ = F+ is a dependency-preserving decomposition. The input is a set D = {R1, R2, . . . , Rn} of decomposed
relation schemas, and a set F of functional dependencies. This algorithm is expensive since it
requires computation of F+. If each member of F can be tested on one of the relations of the
decomposition, then the decomposition is dependency preserving. This is an easy way to show
dependency preservation. The algorithm to test for dependency preservation is as follows:

Compute F+;

For each schema Ri in D do

Begin

Fi = the restriction of F+ to Ri;

End

F’= {}

For each restriction Fi do

Begin

F’=F’UFi

End

Compute F’+;

If(F’+ = F+) then return (true)

else return(false);

Example1:

Consider R(A, B, C) with functional dependency F = {A → B, B → C} is decomposed into


R1(A,B) and R2(B, C). Find whether the decomposition is dependency preserving.

Step 1:

Find F+.

F+ = {A → B, B → C, A → C, A → BC}

Step 2:

Find the restriction of F for the decomposed relations R1 and R2. (Functional
Dependencies applicable to R1 and R2 separately).

F1= {A → B} holds for R1.

F2 = {B → C}holds for R2.

Step 3:

Combine the restrictions F1 and F2

F’ = {A → B, B → C}

Step 4:

Find the closure for F’

F’+ = {A → B, B → C, A → C, A → BC}

Step 5:

Check whether F+ and F'+ are the same. If both are the same, then the decomposition is dependency preserving.

F'+ = F+ (the results of Step 1 and Step 4 are the same).

Therefore, the decomposition is dependency preserving.

5.5. BOYCE – CODD NORMAL FORM

A relation is said to be in BCNF, if it is already in 3 NF and every determinant is a


candidate key. It is a stronger version of 3 NF.

A relation schema R is in BCNF with respect to a set F of functional dependencies if,


for all functional dependencies in F+ of the form α→β , where α ⊆ R and β⊆ R at least one of
the following holds:

 α →β is a trivial functional dependency (that is, β ⊆ α ).

 α is a super key for schema R.

Let R be a schema that is not in BCNF. Then there is at least one nontrivial functional dependency α→β such that α is not a super key for R.

• Replace R in the design with the two schemas (α ∪ β) and (R − (β − α)).
Let us consider the relation inst_dept (ID, name, salary, dept name, building, budget)
once again. The functional dependency dept name → budget holds on inst_dept, but dept name
is not a superkey, because a department may have a number of different instructors. The
decomposition of inst_dept into instructor (ID, name, dept name, salary) and department
(deptname, building, budget) is a better design.
All of the nontrivial functional dependencies that hold, such as: ID → name, dept name,
salary include ID on the left side of the arrow, and ID is a superkey (actually, in this case, the
primary key) for instructor. Thus, instructor is in BCNF. Similarly, the department schema is
in BCNF because all of the nontrivial functional dependencies that hold, such as dept name → building, budget, include dept name on the left side of the arrow, and dept name is a superkey for department.

Difference between 3NF and BCNF:

1. 3NF concentrates on the primary key, whereas BCNF concentrates on candidate keys.

2. Redundancy in 3NF is high as compared to BCNF, which has 0% redundancy.

3. 3NF may preserve all the dependencies, whereas BCNF may not preserve the dependencies.

4. A dependency X → Y is allowed in 3NF if X is a super key or Y is a part of some key; in BCNF, X → Y is allowed only if X is a super key.

Example:

Consider a Relation R(A,B,C,D) with the following dependencies

FD1: A→BCD

FD2: BC→AD

FD3: D→B.

Check whether R satisfies BCNF.

The keys are A and BC since (A)+ = {A,B,C,D} and (BC)+={BCAD}(Includes all the
attributes). The attribute D is not a key since (D)+ = {D,B}, where A and C are missing. There is
no partial dependency or transitive dependency. Hence the given relation satisfies 1 NF, 2NF
and 3NF.

In FD1 and FD2, A and BC are keys respectively. But in FD3, D is not a key. Hence
the relation R has to be decomposed. The FD3 D→B violates BCNF. Take the left side attribute D as α and the right side attribute B as β.

The decomposition should be R1 with (α ∪ β) and R2 with (R − (β − α)).

R(A,B,C,D)

R1(DB) R2(ADC)

5.6. MULTI-VALUED DEPENDENCIES AND FOURTH NORMAL FORM

5.6.1 Multi-Valued Dependencies

Functional dependencies rule out certain tuples from being in a relation. If A → B, then
we cannot have two tuples with the same A value but different B values. Multivalued
dependencies, on the other hand, do not rule out the existence of certain tuples. Instead, they
require that other tuples of a certain form be present in the relation

Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency

α →→ β

holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:

t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R − β] = t2[R − β]
t4[β] = t2[β]
t4[R − β] = t1[R − β]

To illustrate the difference between functional and multivalued dependencies, we consider the
schema shown in Figure 5.2.

Fig. 5.2 An example of redundancy in a relation on a BCNF Schema

Department name is repeated for each address of instructor (for example. he has two
addresses) and we must repeat the address of the instructor for each department in which he is
associated (for example. he works for two departments). This repetition is unnecessary, since
the relationship between an instructor and his address is independent of the relationship between
that instructor and a department.

In the above example, the instructor with ID 22222 is associated with the Physics
department and he has two houses. His department is associated with all his addresses.

Comparing the preceding example with our definition of multivalued dependency, we


see that we want the multivalued dependency: ID →→ street, city to hold.

As with functional dependencies, we shall use multivalued dependencies in two ways:

1. To test relations to determine whether they are legal under a given set of functional and
multivalued dependencies.

2. To specify constraints on the set of legal relations
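Based on the definition of α →→ β given above, the first use can also be expressed as a short program. The sketch below is illustrative only: the relation instance and attribute values are assumed for this example, and the function simply looks for the required tuples t3 and t4.

# Minimal sketch: checking whether the MVD  alpha ->-> beta  holds in a relation
# instance, directly from the definition (for every t1, t2 that agree on alpha,
# the tuples obtained by swapping their beta-parts must also be present).

def mvd_holds(rows, attrs, alpha, beta):
    rest = [a for a in attrs if a not in alpha and a not in beta]   # R - alpha - beta
    table = {tuple(r[a] for a in attrs) for r in rows}
    def build(a_src, b_src, r_src):
        vals = {a: a_src[a] for a in alpha}
        vals.update({b: b_src[b] for b in beta})
        vals.update({c: r_src[c] for c in rest})
        return tuple(vals[a] for a in attrs)
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in alpha):
                # t3 takes beta from t1 and the rest from t2; t4 the other way round
                if build(t1, t1, t2) not in table or build(t1, t2, t1) not in table:
                    return False
    return True

# Hypothetical instance of R(ID, dept_name, street, city): instructor 22222 works
# for one department and has two (made-up) addresses, both stored with it.
rows = [
    {"ID": 22222, "dept_name": "Physics", "street": "North", "city": "Rye"},
    {"ID": 22222, "dept_name": "Physics", "street": "Main",  "city": "Manchester"},
]
print(mvd_holds(rows, ["ID", "dept_name", "street", "city"],
                alpha=["ID"], beta=["street", "city"]))    # True: ID ->-> street, city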

5.6.2. Fourth Normal Form

A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if, for all multivalued dependencies in D+ of the form α →→ β, where α ⊆ R and β ⊆ R, at least one of the following holds:

• α →→ β is trivial (i.e., β ⊆ α or α ∪ β = R)

• α is a superkey for schema R

Note that the definition of 4NF differs from the definition of BCNF in only the use of
multivalued dependencies. Every 4NF schema is in BCNF.

Let r(R) be a relation schema, and let r1(R1), r2(R2), ..., rn(Rn) be a decomposition of r(R). To check if each relation schema ri in the decomposition is in 4NF, we need to find what multivalued dependencies hold on each ri. Recall that, for a set F of functional dependencies, the restriction Fi of F to Ri is all functional dependencies in F+ that include only attributes of Ri.

The decomposition algorithm is as follows.

result := {R};
done := false;
compute D+;
let Di denote the restriction of D+ to Ri;

while (not done)
    if (there is a schema Ri in result that is not in 4NF) then
        begin
            let α →→ β be a nontrivial multivalued dependency that holds
            on Ri such that α → Ri is not in Di, and α ∩ β = ∅;
            result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
        end
    else done := true;

The analogy between 4NF and BCNF applies to the algorithm for decomposing a
schema into 4NF. It is identical to the BCNF decomposition algorithm, except that it uses
multivalued dependencies and uses the restriction of D+ to Ri .

Consider again the BCNF schema: R(ID, dept name, street, city) in which the
multivalued dependency “ID →→ street, city” holds. Even though this schema is in BCNF, the
design is not ideal, since we must repeat an instructor’s address information for each
department. We can use the given multivalued dependency to improve the database design, by
decomposing this schema into a fourth normal form decomposition.

If we apply the algorithm to Relation R(ID, dept name, street, city), then we find that
ID→→ dept name is a nontrivial multivalued dependency, and ID is not a superkey for the
schema. Following the algorithm, we replace it by the two schemas R1(ID, dept name) and R2(ID, street, city). This pair of schemas is in 4NF and eliminates the redundancy we encountered earlier.

5.7. JOIN DEPENDENCIES AND FIFTH NORMAL FORM

5.7.1 Join Dependency

A join dependency on a relation schema R specifies the constraint that every legal state r of R should have a lossless-join decomposition into R1, R2, ..., Rn. A join dependency is a generalization of the idea of a multivalued dependency.

Let R be a relation schema and R1, R2, ..., Rn be a decomposition of R. R is said to satisfy the join dependency *(R1, R2, ..., Rn) if and only if every legal instance r(R) is equal to the join of its projections on R1, R2, ..., Rn.

The notation used for a join dependency on a table T is *(X, Y, ..., Z), where X, Y, ..., Z are projections of T. Table T is said to satisfy the join dependency if it is equal to the join of the projections X, Y, ..., Z.

5.7.2. Fifth Normal Form

Fifth Normal Form is

• otherwise called Project-Join Normal Form (PJ/NF).

• designed to remove redundancy in relational databases.

A relation R is in 5NF if and only if every non-trivial join dependency in R is implied by the candidate keys of R. When a relation is broken up into smaller relations, the decomposition must satisfy the lossless-join property, which makes certain that no invalid or spurious tuples are created when the relations are joined back together through a natural join.

Let us consider a Relation R(E_Name, Company, Product)

E_Name Company Product

Rohit TVR Computer

Shiva TMT Furniture

Anu APT Water Heater

Rani TVR Scanner



Relation R is decomposed to R1, R2 and R3.

R1 ( E_Name, Company)

E_Name Company

Rohit TVR

Shiva TMT

Anu APT

Rani TVR

R2( E_Name, Product)

E_Name Product

Rohit Computer

Shiva Furniture

Anu Water Heater

Rani Scanner

R3(Company, Product)

Company Product

TVR Computer

TMT Furniture

APT Water Heater

TVR Scanner

If the natural join of all three tables yields the relation table R, the relation will be said
to have join dependency. Let us check whether R satisfies join dependency or not.

Step 1

Perform the natural join on R1 and R2 . The common field is E_Name.

E_Name Company Product

Rohit TVR Computer

Shiva TMT Furniture

Anu APT Water Heater

Rani TVR Scanner

Step 2

Perform the natural join of the above resultant table with R3. The common fields are
Company and Product.

E_Name Company Product

Rohit TVR Computer

Shiva TMT Furniture

Anu APT Water Heater

Rani TVR Scanner

In the above example, we get the same table R after performing the natural joins. Therefore, the decomposition is a lossless decomposition, and the three decomposed relations R1, R2 and R3 satisfy fifth normal form.
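The lossless-join check performed in Steps 1 and 2 can also be carried out mechanically. The following minimal sketch (plain Python rather than SQL; the helper names are assumptions made for this illustration) projects R onto R1, R2 and R3, natural-joins the projections back, and compares the result with R.

# Minimal sketch: verifying the join dependency *(R1, R2, R3) on the example
# relation R(E_Name, Company, Product) by projecting and natural-joining.

R_attrs = ("E_Name", "Company", "Product")
R = {("Rohit", "TVR", "Computer"),
     ("Shiva", "TMT", "Furniture"),
     ("Anu",   "APT", "Water Heater"),
     ("Rani",  "TVR", "Scanner")}

def project(rows, attrs, onto):
    idx = [attrs.index(a) for a in onto]
    return {tuple(r[i] for i in idx) for r in rows}

def natural_join(rows1, attrs1, rows2, attrs2):
    common = [a for a in attrs1 if a in attrs2]
    out_attrs = list(attrs1) + [a for a in attrs2 if a not in attrs1]
    out = set()
    for r1 in rows1:
        for r2 in rows2:
            if all(r1[attrs1.index(a)] == r2[attrs2.index(a)] for a in common):
                row = dict(zip(attrs1, r1))
                row.update(zip(attrs2, r2))
                out.add(tuple(row[a] for a in out_attrs))
    return out, tuple(out_attrs)

R1 = project(R, R_attrs, ("E_Name", "Company"))
R2 = project(R, R_attrs, ("E_Name", "Product"))
R3 = project(R, R_attrs, ("Company", "Product"))

step1, attrs1 = natural_join(R1, ("E_Name", "Company"), R2, ("E_Name", "Product"))
step2, attrs2 = natural_join(step1, attrs1, R3, ("Company", "Product"))

# Reorder the columns of the result to match R_attrs before comparing with R
reordered = {tuple(dict(zip(attrs2, r))[a] for a in R_attrs) for r in step2}
print(reordered == R)    # True: the decomposition is lossless for this instance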
CHAPTER – VI
TRANSACTIONS

6.1. TRANSACTION CONCEPTS

6.1.1 Introduction

Transaction is collection of operations that form a single logical unit of work. For
example, transfer of money from one account to another is a transaction which consists of two
operations - withdrawal from one account and deposit in another account. The transaction
consists of all operations executed between the begin transaction and end transaction.

Transactions access data using two operations:

read(X) - transfers the data item X from the database to a variable, also called X, in a buffer in
main memory belonging to the transaction that executed the read operation.

write(X) - transfers the value in the variable X in the main-memory buffer of the transaction
that executed the write to the data item X in the database.

Transactions have to deal with two main issues:

• Hardware failures, Software failures and system crashes

• Concurrent execution of multiple transactions

6.1.2 ACID Properties

There are four important properties that a transaction should satisfy – Atomicity,
Consistency, Isolation and Durability.

• Atomicity

▪ If the transaction in the example below fails after step 3 and before step 6 due to a power failure, then Account A will have Rs.950 and Account B will have Rs.2000 (since B is not updated because of the failure, Rs.50 is "lost", which leads to an inconsistent state). At times, the failure may be because of software or hardware.

▪ The system should ensure that updates of a partially executed transaction are not reflected in the database. The updates should be all-or-nothing: do everything or do nothing.

• Consistency

A transaction must preserve database consistency: if a transaction starts working with a consistent database, the database should be consistent at the end of the transaction. During transaction execution, however, the database may temporarily be in an inconsistent state. In the example below, the sum of A and B, Rs.3000, is unchanged by the execution of the transaction, so consistency is preserved.

• Isolation

When more than one transaction is executed concurrently, each transaction is unaware
that other transactions are being executed concurrently in the system. For every pair of
transactions Ti and Tj , it appears to Ti that either Tj finished execution before Ti started
or Tj started execution after Ti finished. The transactions must behave as if they are
executed in isolation. It means that the results of concurrent execution of transactions
should be the same as if the transactions are executed serially in some order.

• Durability

After the successful completion of the transaction (i.e., the transfer of Rs.50 from Account A to Account B), the updates made to the database by the transaction must persist, i.e., remain permanent, even if there are software or hardware failures.

Example:

Let us consider the amount in Account A is Rs.1000 and B is Rs.2000. The following
transaction transfers Rs.50 from account A to account B. After successful completion
of the transaction, Account A will have Rs.950 and B will have Rs.2050.

1. read(A) // Reads Account A

2. A := A – 50 // Subtracts 50 from A

3. write(A) //Updates A’s Account with new A value

4. read(B) // Reads Account B

5. B := B + 50 // Adds 50 to B

6. write(B) // Updates B's account with the new B value



6.1.3. States of a Transaction

A transaction may not always complete its execution successfully and it may fail due to
failure in hardware or software. Such a transaction is called as aborted transaction and an
aborted transaction should not have any effect on the state of the database. In order to maintain
atomicity property, any changes made by the aborted transaction must be undone i.e., rolled
back. A transaction that completes its execution successfully is said to be committed.

A transaction must be in any one of the following states:

• Active - the transaction stays in this initial state while it is executing.

• Partially committed - after the final statement has been executed but before the transaction is committed.

• Failed - after the discovery that normal execution can no longer proceed.

• Aborted - after the transaction has been rolled back and the database has been restored
to its state prior to the start of the transaction.

• Committed - after successful completion.

The state diagram is represented as in the following Figure 6.1.

Fig. 6.1 State Diagram of a Transaction

A transaction is said to have terminated if it is in either the committed or the aborted state. A transaction starts in the active state, and when it completes its last statement, it enters the partially committed state. At this point, the transaction has completed its execution, but the actual output may still reside in main memory. There is a possibility that it may have to be aborted, since a hardware failure may prevent its successful completion. When the database system writes out the information to disk, the transaction enters the committed state.

A transaction enters the failed state when the system determines that the transaction
cannot proceed with normal execution because of hardware or logical errors. Such a transaction
must be rolled back and it enters the aborted state.

At this point, the system can either restart the transaction (if the failure was due to a hardware or software error) or kill it (if the failure was due to an internal logical error).

6.2. SCHEDULES

A schedule is a sequence of instructions that specifies the chronological order in which the instructions of concurrent transactions are executed. Schedules are of two types: serial schedules and concurrent schedules. In a serial schedule, a transaction is executed fully, and only then does the next transaction start its execution. In a concurrent schedule, the instructions are interleaved, i.e., a part of one transaction is executed, followed by some part of another transaction, and so on. The concurrent execution of transactions improves throughput and resource utilization and reduces waiting time. A schedule for a set of transactions must contain all the instructions of those transactions and must preserve the order in which the instructions appear in each individual transaction. A transaction that successfully completes its execution will have commit as the last statement, whereas a transaction that fails will have an abort instruction.

Example:

Let T1 and T2 are two transactions in a schedule. Transaction T1 transfers Rs.50 from Account
A to Account B, and T2 transfers 10% of the balance from Account A to Account B.

The Schedule 1 is a serial schedule in which T1 is followed by T2 is given in Figure 6.2.

Fig. 6.2 Schedule 1 – a serial schedule in which T1 is followed by T2.



If the transactions are executed one at a time, T2 followed by T1, then the execution
sequence in the Schedule 2 is as shown below in the Figure 6.3.

Fig. 6.3 Schedule 2 – a serial schedule in which T2 is followed by T1.

The following Schedule 3 is concurrent schedule (Figure 6.4), in which the execution
of statements in T1 and T2 are interleaved. The first 3 statements in T1 are executed first, then
four statements in T2 followed by four statements in T1 and remaining three from T2.

Fig. 6.4 Schedule 3 – A concurrent Schedule



The sum A + B is preserved in the serial Schedules 1 and 2 as well as in the concurrent Schedule 3. The concurrent Schedule 3 is equivalent to Schedule 1.

Not all concurrent schedules are equivalent to some serial schedule. For example, the concurrent Schedule 4 in Figure 6.5 does not preserve the value of (A + B). After the execution of this schedule, the final values of accounts A and B are Rs.950 and Rs.2100, respectively. The sum of A and B is Rs.3050, which is inconsistent. Hence Schedule 4 is not serializable.

Fig. 6.5 Schedule 4 – A concurrent Schedule resulting in inconsistent state.

6.3. SERIALIZABILITY

Serial execution of a set of transactions preserves database consistency. A concurrent schedule is serializable if it is equivalent to a serial schedule. There are two types of schedule equivalence:

1. Conflict serializability

2. View serializability

Let us consider only read and write instructions and ignore other instructions for
explaining the concept of serializability.

6.3.1. Conflict Serializability

Instructions li and lj of transactions Ti and Tj, respectively, conflict if and only if there exists some data item Q that is accessed by both li and lj, and at least one of these instructions is a write of Q. Below are the possible combinations of read and write operations by instructions li and lj:

1. li = read(Q), lj = read(Q). li and lj don’t conflict.


2. li = read(Q), lj = write(Q). li and lj conflict.
3. li = write(Q), lj = read(Q). li and lj conflict
4. li = write(Q), lj = write(Q). li and lj conflict

If a schedule S can be transformed into a schedule S’ by a series of swaps of non-conflicting instructions, then S and S’ are conflict equivalent. A schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
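The four cases above reduce to a single condition, sketched below; the (transaction, action, data item) encoding of an operation is an assumption made for this illustration.

# Minimal sketch: two operations conflict iff they belong to different
# transactions, access the same data item, and at least one of them is a write.

def conflicts(op1, op2):
    t1, action1, item1 = op1
    t2, action2, item2 = op2
    return (t1 != t2 and item1 == item2 and
            "write" in (action1, action2))

print(conflicts(("T1", "write", "A"), ("T2", "read", "A")))   # True  (case 3)
print(conflicts(("T1", "read",  "A"), ("T2", "read", "A")))   # False (case 1)
print(conflicts(("T2", "write", "A"), ("T1", "read", "B")))   # False (different items)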

Example:

Schedule 3 in Figure 6.6 shows only read and write operations of concurrent schedule
in Figure 6.4. The write(A) instruction of T1 conflicts with the read(A) instruction of T2.
However, the write(A) instruction of T2 does not conflict with the read(B) instruction of T1,
because the two instructions access different data items.

Let I and J be consecutive instructions of a schedule S. If I and J are instructions of different transactions and I and J do not conflict, then we can swap the order of I and J to produce a new schedule S’. S is equivalent to S’, since all instructions appear in the same order in both schedules except for I and J, whose order does not matter. Since the write(A) instruction of T2 in Schedule 3 of Figure 6.6 does not conflict with the read(B) instruction of T1, we can swap these instructions. We continue to swap non-conflicting instructions:

▪ Swap the read(B) instruction of T1 with the read(A) instruction of T2.

▪ Swap the write(B) instruction of T1 with the write(A) instruction of T2.

▪ Swap the write(B) instruction of T1 with the read(A) instruction of T2.

The final result of these swaps, schedule 5 of Figure 6.7, is a serial schedule. Note that
schedule 5 is exactly the same as schedule 1, but it shows only the read and write instructions.
Thus, we have shown that Schedule 3 is equivalent to a serial schedule. If a schedule S can be
transformed into a schedule S’ by a series of swaps of non conflicting instructions, we say that
S and S’ are conflict equivalent.

Fig 6.6 Schedule 3 – showing only read and write instructions

Fig. 6.7 Schedule 5 – Serial Schedule equivalent to Schedule 3

The below schedule 6 in Figure 6.8 is not conflict serializable, because we cannot swap the
instructions Write(Q) in transaction T3 with Write(Q) in transaction T4 to obtain either the serial
schedule < T3, T4 >, or the serial schedule < T4, T3 >.

Fig. 6.8 Schedule 6

6.3.2. View serializability

Let S and S’ are the two schedules with the same set of transactions and Q be the
common data item present in the transactions. The Schedules S and S’ are said to be view
equivalent if the following three conditions are met, for each data item Q,

1. If transaction Ti reads the initial value of Q in schedule S, then in schedule S’ also transaction Ti must read the initial value of Q.

2. If transaction Ti performs the final write(Q) operation in schedule S, then in schedule S’ also transaction Ti must perform the final write(Q) operation.

3. If transaction Ti reads a value of Q in schedule S that was produced by transaction Tj, then in schedule S’ also transaction Ti must read the value of Q that was produced by the same write(Q) operation of transaction Tj.

A schedule S is view serializable if it is view equivalent to a serial schedule. Every conflict serializable schedule is also view serializable. Schedule 7 in Figure 6.9 below is view serializable but not conflict serializable. The reason is that the write of data item Q by transaction T28 is overwritten by T27 without T27 reading Q, which is a blind write; similarly, transaction T29 blindly overwrites the value of Q written by T27. In general, every view serializable schedule that is not conflict serializable contains blind writes.

Fig. 6.9 Schedule 7

The Schedule 8 shown in Figure 6.10 produces the same result as that of the serial
schedule < T1, T5 >, even though it is not conflict equivalent or view equivalent. Determining
such type of equivalence requires analysis of all other operations in addition to read and write.

Fig. 6.10 Schedule 8



6.3.3. Testing for Conflict Serializability

A precedence graph determines whether a schedule is conflict serializable or not. A precedence graph is a directed graph in which the vertices are the transaction names, and an arc is drawn from Ti to Tj if the two transactions conflict and Ti accessed the data item on which the conflict arose earlier. Consider a schedule with a set of transactions T1, T2, ..., Tn. A schedule is conflict serializable if and only if its precedence graph is acyclic. Cycle-detection algorithms take on the order of n² time, where n is the number of vertices in the graph, i.e., the number of transactions in the schedule. The graph in Figure 6.11 is cyclic, which shows that Schedule 4 is not conflict serializable.

Fig. 6.11 Precedence Graph for Schedule 4 in Fig. 6.5

If a precedence graph is acyclic, then the serializability order can be obtained using a topological sort of the graph, i.e., a linear order consistent with the partial order of the graph. In Figure 6.12 below, (b) and (c) are two possible topological orderings of the graph in (a).

Fig. 6.12 Illustration of Topological Sorting
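The test can be sketched in code as follows. The schedule encoding and function names are assumptions made for this illustration: a schedule is a list of (transaction, action, item) triples, the precedence graph is built from the conflict rule of Section 6.3.1, and Kahn's topological-sort algorithm reports a serializability order or detects a cycle.

# Minimal sketch: build the precedence graph of a schedule and test for cycles
# via topological sorting (Kahn's algorithm).

from collections import defaultdict, deque

def precedence_graph(schedule):
    edges = defaultdict(set)
    txns = {t for t, _, _ in schedule}
    for i, (ti, ai, qi) in enumerate(schedule):
        for tj, aj, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and "write" in (ai, aj):
                edges[ti].add(tj)          # Ti accessed the item first -> edge Ti -> Tj
    return txns, edges

def topological_order(txns, edges):
    indeg = {t: 0 for t in txns}
    for t in edges:
        for u in edges[t]:
            indeg[u] += 1
    queue = deque(t for t in txns if indeg[t] == 0)
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for u in edges.get(t, ()):
            indeg[u] -= 1
            if indeg[u] == 0:
                queue.append(u)
    # None means a cycle remains, so the schedule is not conflict serializable
    return order if len(order) == len(txns) else None

# A hypothetical interleaving of T1 and T2 (similar in spirit to Schedule 4)
schedule = [("T1", "read", "A"), ("T2", "read", "B"), ("T2", "write", "B"),
            ("T1", "write", "A"), ("T2", "read", "A"), ("T1", "read", "B"),
            ("T1", "write", "B"), ("T2", "write", "A")]
txns, edges = precedence_graph(schedule)
print(topological_order(txns, edges))   # None: the precedence graph has a cycle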



6.3.4. Test for View Serializability

The precedence graph used to determine conflict serializability cannot be used directly to test for view serializability, because the extension needed to test for view serializability has a cost exponential in the size of the precedence graph. In fact, the problem of determining whether a schedule is view serializable falls in the class of NP-complete problems. However, there exist algorithms that check some sufficient conditions for view serializability.

6.4. TRANSACTION SUPPORT IN SQL

• In SQL, a transaction begins implicitly.

• A transaction in SQL ends by Commit or Rollback

▪ Commit - commits current transaction and begins a new one.

▪ Rollback - causes current transaction to abort.

• In almost all database systems, by default, every SQL statement commits implicitly if it executes successfully. Implicit commit can be turned off by a database directive; a minimal sketch of explicit commit and rollback is given at the end of this section.

▪ E.g., in JDBC: connection.setAutoCommit(false);

• The isolation level can be set at the database level, and it can also be set at the start of a transaction.

▪ E.g., in SQL: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
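As a concrete illustration of commit and rollback, the minimal sketch below uses Python's built-in sqlite3 module; the table name and amounts are made up for this example, and other APIs (such as JDBC, mentioned above) expose the same idea through setAutoCommit(false), commit() and rollback().

# Minimal sketch: explicit commit/rollback using Python's sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")       # DML statements start an implicit transaction
cur = conn.cursor()
cur.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
cur.execute("INSERT INTO account VALUES ('A', 1000), ('B', 2000)")
conn.commit()

try:
    # One logical unit of work: transfer Rs.50 from A to B
    cur.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    cur.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
    conn.commit()                        # make both updates durable together
except sqlite3.Error:
    conn.rollback()                      # undo the partial transfer on any failure

print(dict(cur.execute("SELECT name, balance FROM account")))  # {'A': 950, 'B': 2050}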


CHAPTER – VII
CONCURRENCY CONTROL

7.1. NEED FOR CONCURRENCY


Transactions can be executed serially, one after the other, or in a concurrent manner. The time taken to complete a set of transactions serially is more than when the transactions are executed concurrently. Concurrency control schemes are the mechanisms used to achieve isolation and maintain consistency. A database must ensure that all possible schedules are either conflict or view serializable, and are recoverable and cascadeless. The goal of a concurrency control mechanism is to develop concurrency control protocols that will assure serializability.
The advantages of concurrent execution are:
• Increased processor and disk utilization - leads to better transaction throughput
E.g., one transaction may use the CPU while another reads from or writes to the disk.
• Reduced average response time - short transactions need not wait behind long ones.
7.2. LOCK-BASED PROTOCOLS
A lock is a mechanism to control concurrent access to a data item (read or write
operation). Lock requests are made to concurrency-control manager and concurrency-control
manager decides whether the requests can be granted or denied. The transactions can proceed
only if the request is granted.
There are two modes in which data items can be locked:
1. exclusive (X) mode. Data item can be both read as well as written. X-lock is requested
using lock-X instruction.
2. shared (S) mode. Data item can only be read. S-lock is requested using lock-S
instruction.

Fig. 7.1 Lock Compatibility Matrix



• The table in Figure 7.1 is the lock-compatibility matrix. The concurrency-control manager grants a lock on an item to a transaction iff the requested lock is compatible with locks already held on the item by other transactions (a minimal sketch of this check follows the list below).

• Shared locks can be granted to any number of transactions on an item,

• But if any transaction holds an exclusive lock on an item, no other transactions are
permitted to hold any lock (Both shared and Exclusive) on the item.
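A minimal sketch of this compatibility check is given below; the data structures and function name are illustrative assumptions, not the code of an actual lock manager.

# Minimal sketch: deciding whether a requested lock can be granted, based on the
# S/X compatibility matrix (S is compatible only with S; X is compatible with nothing).

COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(requested_mode, held_locks, requesting_txn):
    """held_locks: list of (txn, mode) pairs currently granted on the data item."""
    return all(COMPATIBLE[(held_mode, requested_mode)]
               for txn, held_mode in held_locks
               if txn != requesting_txn)      # locks held by the requester itself do not block it

held = [("T1", "S"), ("T2", "S")]
print(can_grant("S", held, "T3"))   # True  - any number of shared locks may coexist
print(can_grant("X", held, "T3"))   # False - exclusive conflicts with the shared locks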

Example:

The Transaction T1 transfers Rs.50 from Account B to Account A and the Transaction T2
displays the sum of Account A and Account B.

T1:lock-X(B);
read(B);
B := B − 50;
write(B);
unlock(B);
lock-X(A);
read(A);
A := A + 50;
write(A);
unlock(A)
T2:lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)

A locking protocol is a set of rules followed by all transactions while requesting and
releasing locks. The Locking protocols enforce serializability. The following schedule shows
the requests made by the Transactions T1 and T2 to lock/unlock the data items A and B and the
permission granted by the concurrency-control manager.

Suppose that the amounts in Accounts A and B are Rs.100 and Rs.200, respectively. If these two transactions are executed serially, either in the order T1, T2 or the order T2, T1, then transaction T2 will display the value Rs.300. However, if these transactions are executed concurrently, then schedule 1 in Figure 7.2 is possible. In this case, transaction T2 displays Rs.250, which is incorrect. The reason for this mistake is that the transaction T1 unlocked data item B too early, as a result of which T2 saw an inconsistent state.

Fig. 7.2 Schedule

The schedule shows the actions executed by the transactions, as well as the points at
which the concurrency-control manager grants the locks. The transaction making a lock request
cannot execute its next action until the concurrency control manager grants the lock.

Suppose now that unlocking is delayed to the end of the transaction. Transaction T3 corresponds to T1 with unlocking delayed, and transaction T4 corresponds to T2 with unlocking delayed. The sequence of reads and writes in schedule 1, which led to an incorrect total of Rs.250, is no longer possible with T3 and T4.

T3:lock-X(B);
read(B);
B := B − 50;
write(B);
lock-X(A);
read(A);
A := A + 50;
write(A);
unlock(B);
unlock(A).
T4:lock-S(A);
read(A);
lock-S(B);
read(B);
display(A + B);
unlock(A);
unlock(B).

Unfortunately, locking can lead to an undesirable situation. Consider the partial schedule of Figure 7.3 for T3 and T4. Since T3 is holding an exclusive-mode lock on B and T4 is requesting a shared-mode lock on B, T4 is waiting for T3 to unlock B. Similarly, since T4 is holding a shared-mode lock on A and T3 is requesting an exclusive-mode lock on A, T3 is waiting for T4 to unlock A. Thus, we have arrived at a state where neither of these transactions can ever proceed with its normal execution. This situation is called deadlock.

Fig. 7.3 Schedule

When deadlock occurs, the system must roll back one of the two transactions. Once a
transaction has been rolled back, the data items that were locked by that transaction are
unlocked. These data items are then available to the other transaction, which can continue with
its execution.

When a transaction Ti requests a lock on a data item Q in a particular mode M, the concurrency-control manager grants the lock provided that:

1. There is no other transaction holding a lock on Q in a mode that conflicts with M.

2. There is no other transaction that is waiting for a lock on Q and that made its lock request
before Ti

7.3. TWO PHASE LOCKING PROTOCOL

Two Phase locking protocol is a protocol which ensures that the schedules are conflict
serializable. There are two phases in two phase locking protocol – Growing Phase and Shrinking
Phase.

Phase 1: Growing Phase

• Transaction may obtain locks

• Transaction may not release locks

Phase 2: Shrinking Phase

• Transaction may release locks

• Transaction may not obtain locks

The lock point is the point where a transaction acquires its final lock in the growing phase. Two-phase locking does not ensure that a schedule is free from deadlock. In order to ensure recoverability and avoid cascading rollbacks, an extension of basic two-phase locking, namely strict two-phase locking, is required.

In strict two-phase locking, a transaction must hold all its exclusive locks till it commits or aborts. A variation of strict two-phase locking is rigorous two-phase locking, in which a transaction must hold all locks till it commits or aborts.
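The two-phase rule itself can be enforced with very little bookkeeping per transaction. The sketch below is an illustrative assumption: it only tracks the growing and shrinking phases, ignoring lock compatibility, deadlocks and strictness.

# Minimal sketch: enforcing the two-phase rule - once a transaction releases any
# lock (shrinking phase), it is no longer allowed to acquire new locks.

class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.shrinking = False        # False while in the growing phase
        self.locks = set()

    def lock(self, item, mode):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot acquire locks in the shrinking phase")
        self.locks.add((item, mode))

    def unlock(self, item, mode):
        self.shrinking = True         # the first unlock marks the end of the growing phase
        self.locks.discard((item, mode))

t = TwoPhaseTxn("T1")
t.lock("B", "X")
t.lock("A", "X")                      # growing phase: both locks acquired
t.unlock("B", "X")                    # shrinking phase begins
try:
    t.lock("C", "S")                  # violates two-phase locking
except RuntimeError as e:
    print(e)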

7.3.1. Lock Conversions

The locks can be converted from shared to exclusive and vice versa. In Growing Phase
of a Two-phase locking protocol, a transaction can acquire a lock-S or lock-X. Lock conversion
can also be made to convert a lock-S to a lock-X which is termed as upgrade. Similarly, in a
Shrinking Phase, a transaction can release a lock-S or lock-X or convert a lock-X to a lock-S
which is downgrade.

Consider the following two transactions T8 and T9, for which only some of the significant read and write operations are shown:

T8:
read(a1);
read(a2);
...
read(an);
write(a1).
T9:
read(a1);
read(a2);
display(a1 + a2).

If we employ the two-phase locking protocol, then T8 must lock a1 in exclusive mode. Therefore, any concurrent execution of both transactions amounts to a serial execution. However, T8 needs an exclusive lock on a1 only at the end of its execution, when it writes a1. Thus, if T8 could initially lock a1 in shared mode and later upgrade the lock to exclusive mode, we could get more concurrency, since T8 and T9 could access a1 and a2 simultaneously. Figure 7.4 shows such a schedule with a lock conversion.

Fig. 7.4 Incomplete schedule with a lock conversion

Appropriate lock and unlock instructions will be automatically generated on the basis
of read and write requests from the transaction:

• When a transaction Ti issues a read(Q) operation, the system issues a lock-S(Q) instruction followed by the read(Q) instruction.

• When Ti issues a write(Q) operation, the system checks to see whether Ti already holds
a shared lock on Q. If it does, then the system issues an upgrade(Q) instruction, followed
by the write(Q) instruction. Otherwise, the system issues a lock-X(Q) instruction,
followed by the write(Q) instruction.

• All locks obtained by a transaction are unlocked after that transaction commits or aborts.

7.3.2 Implementation of Locking

Transactions can send lock or unlock requests as messages to lock manager. The lock
manager decides whether to grant the lock or not based on the lock – compatibility. In case of
a deadlock, the lock manager may ask the transaction to roll back. The requesting transaction
should wait until it gets reply from the lock manager. The lock manager maintains an in-memory
data-structure called a lock table to record the details about the granted locks and pending
requests. A sample lock table is shown in the following Figure 7.5.

Fig. 7.5 Lock Table

Dark blue squares indicate granted locks whereas light blue colored ones indicate
waiting requests. New request is added to the end of the queue of requests for the data item, and
granted only if it is compatible with all earlier granted locks on the data items. When a
transactions sends request to unlock a data item, then the request will be deleted, and later
requests are checked to see if they can now be granted. If a transaction aborts, all waiting or
granted requests of the transaction are deleted. To implement this efficiently, the lock manager
keeps a list of locks held by each transaction.

7.4 TIMESTAMP-BASED PROTOCOLS

Each transaction Ti is issued a unique timestamp TS(Ti) when it enters the system.
Newer transactions are assigned timestamps greater than earlier ones. The timestamp could be based on a logical counter or on the wall-clock time. In timestamp-based protocols, the timestamp order is the same as the serializability order.

Timestamp-based protocols maintain two timestamps for each data item Q:

• W-timestamp(Q) – the largest timestamp of any transaction that executed write(Q) successfully.

• R-timestamp(Q) – the largest timestamp of any transaction that executed read(Q) successfully.

Timestamp-based protocols impose a set of rules on read and write operations to ensure that any conflicting operations are executed in timestamp order, since out-of-order operations cause transaction rollback. These rules are sketched in code after the rules below.

Suppose a transaction Ti issues a read(Q)

1. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already
overwritten by some other transaction and hence, the read operation is rejected, and Ti
is rolled back.

2. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti)).

Suppose that transaction Ti issues write(Q).

1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed
previously and hence, the write operation is rejected, and Ti is rolled back.

2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q, i.e., it tries to overwrite a value of Q that has already been written by a transaction with a larger timestamp. Hence, this write operation is rejected, and Ti is rolled back.

3. Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
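These rules are easy to state in code. The sketch below is illustrative only: it keeps the R- and W-timestamps in dictionaries and signals a rollback with an exception; all names are assumptions made for this example.

# Minimal sketch: the timestamp-ordering rules for read(Q) and write(Q).

class Rollback(Exception):
    pass

R_ts, W_ts = {}, {}      # R-timestamp(Q) and W-timestamp(Q), defaulting to 0

def read(ts, q):
    if ts < W_ts.get(q, 0):
        raise Rollback(f"read({q}) by TS {ts} rejected: value already overwritten")
    R_ts[q] = max(R_ts.get(q, 0), ts)

def write(ts, q):
    if ts < R_ts.get(q, 0) or ts < W_ts.get(q, 0):
        raise Rollback(f"write({q}) by TS {ts} rejected: out of timestamp order")
    W_ts[q] = ts

read(10, "Q")            # R-timestamp(Q) = 10
write(12, "Q")           # allowed: 12 is at least as large as both timestamps
try:
    write(11, "Q")       # rejected: 11 < W-timestamp(Q) = 12
except Rollback as e:
    print(e)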



Example

To illustrate this protocol, we consider transactions T25 and T26.


Transaction T25 displays the contents of accounts A and B:
T25:
read(B);
read(A);
display(A + B).
Transaction T26 transfers Rs.50 from account B to account A, and then displays the contents of both:
T26:
read(B);
B := B − 50;
write(B);
read(A);
A := A + 50;
write(A);
display(A + B).

In presenting schedules under the timestamp protocol, we shall assume that a transaction is
assigned a timestamp immediately before its first instruction. Thus, in schedule 3 of Figure
15.17, TS(T25) < TS(T26), and the schedule is possible under the timestamp protocol.

Fig. 15.17 Schedule 3



Thomas’ Write Rule

Thomas' write rule is essentially the same as the timestamp-ordering protocol. There is no modification of the read rule, but there is a small change in step 2 of the write operation. In step 2, when Ti attempts to write an obsolete value of Q, the write operation can simply be ignored instead of rolling back Ti. Thomas' write rule allows greater potential concurrency and allows some view-serializable schedules that are not conflict-serializable. The read rule is the same as in step 1 of the timestamp-ordering protocol.

Suppose that transaction Ti issues write(Q).

1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was previously
needed, and it had been assumed that the value would never be produced. Hence, the
system rejects the write operation and rolls Ti back.

2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation can be ignored.

3. Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).

7.5 MULTIVERSION

• Multiversion schemes keep old versions of data item to increase concurrency. Several
variants of multiversion schemes are:

o Multiversion Timestamp Ordering

o Multiversion Two-Phase Locking

o Snapshot isolation

• The Key ideas behind multiversion schemes are:

o Each successful write results in the creation of a new version of the data item
written.

o Uses timestamps to label versions.

o When a read(Q) operation is issued, it returns an appropriate version of Q based on the timestamp of the transaction issuing the read request.

7.5.1 Multiversion Timestamp Ordering

Each data item Q has a sequence of versions <Q1, Q2,...., Qm>. Each version Qk contains
three data fields:

• Content - the value of version Qk.

• W-timestamp(Qk) - timestamp of the transaction that wrote version Qk

• R-timestamp(Qk) - largest timestamp of a transaction that successfully read version Qk

Suppose that transaction Ti issues a read(Q) or write(Q) operation. Let Qk denote the version of Q whose write timestamp is the largest write timestamp ≤ TS(Ti).

1. If transaction Ti issues a read(Q), then

▪ the value returned is the content of version Qk

▪ If R-timestamp(Qk) < TS(Ti), set R-timestamp(Qk) = TS(Ti).

2. If transaction Ti issues a write(Q)

i. if TS(Ti) < R-timestamp(Qk), then transaction Ti is rolled back.

ii. if TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten

iii. Otherwise, a new version Qi of Q is created

• W-timestamp(Qi) and R-timestamp(Qi) are initialized to TS(Ti).

The multiversion timestamp-ordering protocol guarantees serializability. In multiversion timestamp ordering, reads always succeed, and a write by Ti is rejected if some other transaction Tj that should have read Ti's write has already read a version created by a transaction older than Ti.

7.5.2 Multiversion Two-Phase Locking

The multiversion two-phase locking protocol attempts to combine the advantages of multiversion concurrency control with the advantages of two-phase locking. This protocol differentiates between read-only transactions and update transactions. Multiversion two-phase locking also ensures that schedules are recoverable and cascadeless.

➢ Update transactions:

▪ Perform rigorous two-phase locking - hold all locks up to the end of the transaction.

▪ Serialized according to their commit order.

▪ Each version of a data item has a single timestamp - ts-counter .

▪ When an item is to be read, it gets a shared lock on the item, and reads the latest
version of that item.

▪ When an item is to be written, it gets an exclusive lock on the item, and then creates
a new version of the data item.

▪ When the transaction completes its actions, it carries out commit processing

➢ Read-only transactions:

▪ The database system assigns a read-only transaction a timestamp by reading the current value of ts-counter before the transaction starts execution.

▪ Follow the multiversion timestamp-ordering protocol for performing reads.

▪ When a transaction Ti issues a read(Q), the value returned is the contents of the
version whose timestamp is the largest timestamp less than or equal to TS(Ti).

▪ The read-only transactions that start after Ti increments ts-counter will see the
values updated by Ti , whereas those that start before Ti increments ts-counter will
see the value before the updates by Ti . In either case, read-only transactions never
need to wait for locks.

Versions are deleted in a manner like that of multiversion timestamp ordering. Suppose
there are two versions, Qk and Qj , of a data item, and that both versions have a
timestamp less than or equal to the timestamp of the oldest read-only transaction in the
system. Then, the older of the two versions Qk and Qj will not be used again and can be
deleted.

7.6. SNAPSHOT ISOLATION

In Snapshot isolation, a transaction is given a snapshot of the database at the time when
it begins its execution. It then operates on that snapshot in complete isolation from other
concurrent transactions. The data values in the snapshot consist only of values written by
committed transactions. This isolation is ideal for read-only transactions since they never wait
and are never aborted by the concurrency manager.

Deciding whether or not to allow an update transaction to commit requires some care.
Two transactions running concurrently might both update the same data item. Since these two
transactions operate in isolation using their own private snapshots, neither transaction sees the
update made by the other. If both transactions are allowed to write to the database, the first
update written will be overwritten by the second. The result is a lost update which can be
prevented by two variants of snapshot isolation - first committer wins and first updater wins.

Under first committer wins, when a transaction T enters the partially committed state,
the following actions are taken:

• A test is made to see if any transaction that was concurrent with T has already written
an update to the database for some data item that T intends to write.

• If some such transaction is found, then T aborts.

• If no such transaction is found, then T commits and its updates are written to the
database.

This approach is called first committer wins because if transactions conflict, the first
one to be tested using the above rule succeeds in writing its updates, while the subsequent ones
are forced to abort.
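A minimal sketch of the first-committer-wins test is given below; the bookkeeping (a global commit log and per-transaction write sets) is an assumption made purely for illustration.

# Minimal sketch: first committer wins under snapshot isolation. A transaction
# may commit only if no transaction that committed during its lifetime wrote a
# data item that it also intends to write.

commit_log = []          # list of (commit_time, set_of_items_written)
clock = 0

def try_commit(start_time, write_set):
    global clock
    for commit_time, items in commit_log:
        if commit_time > start_time and items & write_set:
            return False                  # a concurrent transaction already wrote an item: abort
    clock += 1
    commit_log.append((clock, set(write_set)))
    return True

# T and T' start concurrently (both at time 0) and both update item 'A'
print(try_commit(start_time=0, write_set={"A"}))   # True : the first one commits
print(try_commit(start_time=0, write_set={"A"}))   # False: the second must abort (lost update prevented)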

Under first updater wins the system uses a locking mechanism that applies only to
updates. When a transaction Ti attempts to update a data item, it requests a write lock on that
data item. If the lock is not held by a concurrent transaction, the following steps are taken after
the lock is acquired:

• If the item has been updated by any concurrent transaction, then Ti aborts.

• Otherwise Ti may proceed with its execution including possibly committing.

If, however, some other concurrent transaction Tj already holds a write lock on that data
item, then Ti cannot proceed and the following rules are followed:

• Ti waits until Tj aborts or commits.

• If Tj aborts, then the lock is released and Ti can obtain the lock. After the lock is
acquired, the check for an update by a concurrent transaction is performed as described
earlier: Ti aborts if a concurrent transaction had updated the data item, and proceeds
with its execution otherwise.

• If Tj commits, then Ti must abort. Locks are released when the transaction commits or
aborts.

This approach is called first updater wins because if transactions conflict, the first one
to obtain the lock is the one that is permitted to commit and perform its update while the
subsequent ones are to be aborted.

7.7 VALIDATION BASED PROTOCOL

The validation-based protocol, also called optimistic concurrency control, commits transactions in serialization order. In order to do so, the validation protocol:

• keeps track of the data items read and written by each transaction,

• detects any out-of-serialization-order reads/writes by performing validation at commit time, and

• performs writes only at the end of the transaction.

The execution of transaction Ti is done in three phases.

1. Read and execution phase: Transaction Ti writes only to temporary local variables and
not to database.

2. Validation phase: Transaction Ti performs validation to determine if the local variables can be written to the database without violating serializability.

3. Write phase: If there is no violation found in validation phase, updates are done in the
database; otherwise, Ti is rolled back.

In a concurrent schedule, the three phases of different transactions can be interleaved, but each transaction must go through the three phases in the above order. For simplicity, we can assume that the validation and write phases occur together.

Each transaction Ti has 3 timestamps

• StartTS(Ti) : the time when Ti started its execution

• ValidationTS(Ti): the time when Ti entered its validation phase

• FinishTS(Ti) : the time when Ti finished its write phase

Validation tests use the above timestamps and the read/write sets to ensure that the serializability order is determined by the validation time, i.e., TS(Ti) = ValidationTS(Ti). When a transaction Tj is validated, then for every transaction Ti with TS(Ti) < TS(Tj), one of the following conditions must hold:

• finishTS(Ti) < startTS(Tj), or

• startTS(Tj) < finishTS(Ti) < validationTS(Tj), and Tj does not read any data item written by Ti.

If one of these conditions holds for every such Ti, the validation succeeds and Tj can be committed; otherwise, validation fails and Tj is aborted. If the probability of conflicts is low, the validation-based protocol provides a greater degree of concurrency when compared with locking or timestamp-ordering protocols. An example of a schedule produced using validation is given in Figure 7.6.

Fig. 7.6 Schedule Using Validation
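The validation test described above can be sketched as a simple function; the representation of a transaction as timestamps plus read/write sets, and the concrete timestamp values used, are assumptions made for this illustration.

# Minimal sketch: validation test for Tj against every Ti with TS(Ti) < TS(Tj).

def validate(tj, older_transactions):
    """Each transaction is a dict with startTS, validationTS, finishTS, read_set, write_set.
    finishTS may be None if the older transaction has not finished its write phase yet."""
    for ti in older_transactions:
        if ti["finishTS"] is not None and ti["finishTS"] < tj["startTS"]:
            continue                                   # Ti finished before Tj started
        if (ti["finishTS"] is not None
                and tj["startTS"] < ti["finishTS"] < tj["validationTS"]
                and not (ti["write_set"] & tj["read_set"])):
            continue                                   # overlap, but Tj read nothing Ti wrote
        return False                                   # neither condition holds: abort Tj
    return True

t_old = {"startTS": 1, "validationTS": 3, "finishTS": 4,
         "read_set": {"A", "B"}, "write_set": set()}
t_new = {"startTS": 2, "validationTS": 5, "finishTS": None,
         "read_set": {"A", "B"}, "write_set": {"A", "B"}}
print(validate(t_new, [t_old]))   # True: the older transaction wrote nothing the newer one read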

7.8 MULTIPLE GRANULARITY LOCKING

Multiple granularity locking allows data items to be of various sizes and defines a hierarchy of data granularities, where the smaller granularities are nested within larger ones. The hierarchy can be represented as a tree; when a transaction locks a node in the tree explicitly, it implicitly locks all the node's descendants in the same mode. Locks have to be acquired from the root towards the leaves, whereas they have to be released from the leaves towards the root. If there are too many locks at a particular level, the system can switch to a higher-granularity S or X lock, which is termed lock granularity escalation.
There are two types of granularity of locking which specifies the level in the tree where
locking is done:
• Fine granularity (lower in tree)
▪ high concurrency
▪ high locking overhead
• Coarse granularity (higher in tree)
▪ low concurrency
▪ low locking overhead
Example of Granularity Hierarchy is given in Figure 7.7. The levels, starting from the
coarsest (top) level are
• database
• area
• file
• record

Fig. 7.7 Granularity Hierarchy

In addition to S and X lock modes, there are three additional lock modes with multiple
granularity:

• intention-shared (IS): indicates explicit locking at a lower level of the tree but only
with shared locks.

• intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks.

• shared and intention-exclusive (SIX): the subtree rooted by that node is locked
explicitly in shared mode and explicit locking is being done at a lower level with
exclusive-mode locks.

Intention locks allow a higher level node to be locked in S or X mode without having to
check all descendent nodes.

The compatibility matrix for all lock modes is shown in Figure 7.8:

Fig. 7.8 Lock Compatibility Matrix including Intention mode



Transaction Ti can lock a node Q using the following rules (a minimal sketch follows the list):


1. Locking can be done based on the lock compatibility matrix.
2. The root of the tree must be locked first in any mode.
3. A node Q can be locked by Ti in S or IS mode if and only if the parent of Q is currently
locked by Ti in either IX or IS mode.
4. A node Q can be locked by Ti in X, SIX, or IX mode if and only if the parent of Q is currently locked by Ti in either IX or SIX mode.
5. Ti can lock a node only if it has not previously unlocked any node
6. Ti can unlock a node Q only if none of the children of Q are currently locked by Ti.
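A minimal sketch of rules 3 and 4 (the parent-mode requirements) is given below; the hierarchy encoding and function names are assumptions for this illustration, and compatibility with locks held by other transactions (rule 1) is not checked.

# Minimal sketch: checking the parent-mode rules of multiple granularity locking.
# A node may be locked in S/IS only under an IS or IX parent lock, and in
# X/SIX/IX only under an IX or SIX parent lock; the root needs no parent lock.

parent = {"area1": "database", "file1": "area1", "rec1": "file1"}   # hypothetical hierarchy

def can_lock(held, node, mode):
    """held: dict mapping node -> mode already held by this transaction."""
    p = parent.get(node)                       # None for the root of the hierarchy
    if p is None:
        return True
    required = {"S": {"IS", "IX"}, "IS": {"IS", "IX"},
                "X": {"IX", "SIX"}, "SIX": {"IX", "SIX"}, "IX": {"IX", "SIX"}}[mode]
    return held.get(p) in required

held = {}
for node, mode in [("database", "IX"), ("area1", "IX"), ("file1", "IX")]:
    held[node] = mode                          # intention locks taken from the root downward
print(can_lock(held, "rec1", "X"))             # True : parent file1 is locked in IX
print(can_lock({"database": "IS"}, "area1", "X"))   # False: X needs an IX or SIX parent lock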
7.9 DEADLOCK HANDLING
A system is in a deadlock state if there exists a set of transactions such that every
transaction in the set is waiting for another transaction in the set. Imagine that there are n number
of transactions {T0, T1,..., Tn} in a schedule with T0 is waiting for a data item that T1 holds, T1
is waiting for a data item that T2 holds ... and Tn−1 is waiting for a data item that Tn holds, and
Tn is waiting for a data item that T0 holds. No transaction can make progress in such a situation
and some of the transactions involved in the deadlock should be rolled back to the point where
it obtained a lock whose release resolves the deadlock.
Deadlock problem can be overcome by two different ways – either a deadlock
prevention protocol can be used to ensure that the system will never enter a deadlock state or
allowing the system to enter a deadlock state, and then try to recover by using a deadlock
detection and deadlock recovery scheme. Both methods may result in transaction rollback.
Deadlock prevention is better if the probability that the system would enter a deadlock state is relatively high; otherwise, the detection and recovery method is more efficient. However, detection and recovery requires overhead that includes not only the run-time cost of maintaining the needed information and running the detection algorithm, but also the potential losses inherent in recovery from a deadlock.
Let us consider the partial schedule in Figure 7.9.

Figure 7.9 Partial Schedule



In this schedule, neither T3 nor T4 can make progress. The reason is that T3 has locked data item B in exclusive mode, and T4 requests a lock on B in shared mode, which causes T4 to wait for T3 to release its lock on B. Similarly, T3 waits for T4 to release its lock on A. Such a situation is called a deadlock, and to handle the deadlock either T3 or T4 must be rolled back and its locks released.

Imagine a situation in which a transaction T1 has locked a data item X in shared mode, and another transaction T2 needs to acquire an exclusive lock on X and is waiting for T1 to release its lock. Meanwhile, other transactions may request shared locks on the same item, and these requests would be granted, since T1 holds only a shared lock. As a result, T2 may never acquire the exclusive lock and never make progress. This situation is called starvation, and the concurrency-control manager should be designed to prevent starvation.

7.9.1 Deadlock Prevention

Deadlock prevention protocols ensure that the system will never enter into a deadlock
state. Some prevention strategies:

• Require that each transaction locks all its data items before it begins execution.

• Impose partial ordering of all data items and require that a transaction can lock data
items only in the order specified by the partial order.

There are two approaches in deadlock prevention. One approach ensures that all the
required locks are acquired together so that no cyclic waits can occur. The other approach is
closer to deadlock recovery, and performs transaction rollback instead of waiting for a lock.

The simplest scheme under the first approach requires that each transaction locks all the
required data items before it begins execution. Either all are locked in one step or none are
locked. There are two main disadvantages to this protocol:

(1) It is often hard to predict what data items need to be locked, before the transaction
begins.

(2) data-item utilization may be very low because many data items may be locked but
unused for a long time.

The second approach for preventing deadlocks is to impose an ordering of all data items
and the transaction can lock data items only in a sequence mentioned in the ordering.

Once a transaction has locked a particular item, it cannot request locks on items that
precede that item in the ordering. This scheme is easy to implement only when the data items
to be accessed by a transaction is known at the beginning of execution itself.

Based on a counter or on the system clock, a unique time stamp is assigned to each
transaction when it begins. The system uses these timestamps to decide whether a transaction
should wait or roll back. If a transaction is rolled back, it retains its old timestamp when
restarted. Two different deadlock-prevention schemes using timestamps have been proposed:

a. wait-die

Wait-die is a non-preemptive scheme. If the requesting transaction is older, it may wait for the younger one to release the lock on the data item. But if the requesting transaction is younger, it never waits for the older transaction to release the lock; it is rolled back instead. The drawback is that a transaction may die several times before acquiring a lock.

b. wound-wait

Wound-wait is a preemptive scheme in which an older transaction forces a younger transaction to roll back ("wounds" it) instead of waiting for the younger transaction to release the lock on the data item. Younger transactions, however, may wait for older transactions. There will be fewer rollbacks when compared to the wait-die scheme.

In both schemes, a rolled-back transaction is restarted with its original timestamp, which ensures that older transactions have precedence over newer ones, and starvation is thus avoided. A comparison of the two rules is sketched below.
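The sketch below is illustrative only, with smaller timestamps meaning older transactions.

# Minimal sketch: what happens when transaction `req` requests a lock held by
# `holder` under wait-die versus wound-wait (smaller timestamp = older transaction).

def wait_die(req_ts, holder_ts):
    # Older requesters wait; younger requesters die (are rolled back).
    return "wait" if req_ts < holder_ts else "rollback requester"

def wound_wait(req_ts, holder_ts):
    # Older requesters wound (roll back) the younger holder; younger requesters wait.
    return "rollback holder" if req_ts < holder_ts else "wait"

print(wait_die(5, 10), "|", wound_wait(5, 10))    # older transaction requests from a younger holder
print(wait_die(10, 5), "|", wound_wait(10, 5))    # younger transaction requests from an older holder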

In timeout-based schemes, a transaction waits for a lock only for a specified amount of time, after which the transaction is rolled back. The timeout scheme is simple to implement and ensures that deadlocks get resolved by timeout. However, it may roll back a transaction unnecessarily even in the absence of deadlock, and it is difficult to determine a good value for the timeout interval. Starvation is also possible.

7.9.2 Deadlock Detection

A wait-for graph is used to detect deadlocks. Transactions are represented as vertices. If Ti is waiting for a lock held in a conflicting mode by Tj, then an edge is drawn from Ti to Tj (Ti → Tj). The system is in a deadlock state if and only if the wait-for graph has a cycle. A deadlock-detection algorithm has to be executed periodically to look for cycles. Figure 7.10(a) is a wait-for graph without a cycle, and hence there is no deadlock. Figure 7.10(b) is a wait-for graph with a cycle, and hence a deadlock exists.

Fig. 7.10 (a) Wait-for graph without a cycle (b) Wait-for graph with a cycle

7.9.3 Deadlock Recovery

After the detection of deadlock, some transaction will have to be rolled back (made a
victim) in order to break deadlock cycle.

The steps involved in recovery are

1. Select a transaction as victim that will incur minimum cost.

2. Rollback (determine how far to roll back the selected transaction)

There are two types of rollback:

▪ Total rollback: Abort the transaction and then restart it.

▪ Partial rollback: Roll back victim transaction only as far as necessary to release
locks that another transaction in cycle is waiting for

Starvation may happen during recovery; a solution is to ensure that the oldest transaction in the deadlock set is never chosen as the victim.
CHAPTER – VIII
RECOVERY

8.1 RECOVERY CONCEPTS

A computer system may be subject to failure due to a disk crash, power outage, software error, or even a fire in the machine room, and information may be lost. The database system must take actions in advance to ensure the atomicity and durability properties of transactions. A recovery scheme can restore the database to the consistent state that existed before the failure. The recovery scheme must also provide high availability, i.e., it must minimize the time for which the database is not usable after a failure.

8.1.1 Failure Classification


There are various types of failure that may occur in a system. We shall consider only
the transaction failure, system crash and disk failure.
• Transaction failure.
There are two types of errors that may cause a transaction to fail:
➢ Logical error - The transaction can no longer continue because of bad input, data
not found, overflow, or resource limit exceeded.
➢ System error - The system has entered an undesirable state (for example, deadlock), so the transaction cannot continue with its normal execution, but it can be re-executed at a later time.
• System crash - There is a hardware malfunction, or a bug in the database software or the
operating system, that causes the loss of the content of volatile storage, and brings
transaction processing to a halt. The content of nonvolatile storage remains intact, and
is not corrupted. This is known as the fail-stop assumption.
• Disk failure. A disk block loses its content due to head crash or failure during a data
transfer operation. Copies of the data on other disks, or archival backups on tertiary
media are used to recover from the failure.
The recovery algorithms ensure database consistency and transaction atomicity despite
failures and have two parts:
i. Actions taken during normal transaction processing with enough information to
allow recovery from failures.

ii. Actions taken after a failure to recover the database contents to a state that ensures
consistency, atomicity and durability.
8.1.2 Storage
The various data items in the database may be stored and accessed in a number of
different storage media. There are three categories of storage:
▪ Stable storage
▪ Volatile storage
▪ Nonvolatile storage
Stable storage plays a critical role in recovery algorithms. To implement stable storage,
we need to replicate the needed information in several nonvolatile storage media (usually disk)
with independent failure modes, and to update the information in a controlled manner to ensure
that failure during data transfer does not damage the needed information. Block transfer
between memory and disk storage can result in:
• Successful completion: The transferred information arrived safely at its destination.
• Partial failure: A failure occurred during transfer, and the destination block has incorrect
information.
• Total failure: The failure occurred sufficiently early during the transfer that the
destination block remains intact.
We require that, if a data-transfer failure occurs, the system detects it and invokes a
recovery procedure to restore the block to a consistent state. To do so, the system must maintain
two physical blocks for each logical database block. In the case of mirrored disks, both blocks
are at the same location and in the case of remote backup, one of the blocks is local, whereas
the other is at a remote site. An output operation is executed as follows:
1. Write the information onto the first physical block.
2. When the first write completes successfully, write the same information onto the
second physical block.
3. The output is completed only after the second write completes successfully.
If the system fails while blocks are being written, it is possible that the two copies of a
block are inconsistent with each other. During recovery, for each block, the system would need
to examine two copies of the blocks. If both are the same and no detectable error exists, then no
further actions are necessary. But if the system detects an error in one block, then it replaces
its content with the content of the other block. If both blocks contain no detectable error, but
they differ in content, then the system replaces the content of the first block with the value of
the second. This recovery procedure ensures that a write to stable storage either succeeds
completely or results in no change.


The requirement of comparing every corresponding pair of blocks during recovery is
expensive to meet. We can reduce the cost greatly by keeping track of block writes that are in
progress and during recovery, only blocks for which writes are in progress need to be compared.
8.1.3 Data Access
The database system resides permanently on nonvolatile storage (usually disks) and only
parts of the database are in memory at any time. The database is partitioned into fixed-length
storage units called blocks. Blocks are the units of data transfer to and from disk, and may
contain several data items.
Transactions input information from the disk to main memory, and then output the
information back onto the disk. The blocks residing on the disk are referred to as physical
blocks. The blocks residing temporarily in main memory are referred to as buffer blocks. The
area of memory where blocks reside temporarily is called the disk buffer. Block movements
between disk and main memory are initiated through the following two operations:
input(B) - transfers the physical block B to main memory.
output(B) - transfers the buffer block B to the disk, and replaces the appropriate physical
block there.
The Figure 8.1 illustrates this scheme of block storage operations.

Fig. 8.1 Block Storage Operations



Each transaction Ti has a private work area in which copies of data items accessed and
updated by Ti are kept. The system creates this work area when the transaction is initiated and
removes it when the transaction either commits or aborts. Each data item X kept in the work
area of transaction Ti is denoted by xi. Transaction Ti interacts with the database system by
transferring data to and from its work area to the system buffer using these two operations:

1. read(X) assigns the value of data item X to the local variable xi . It executes as follows:

a. If block BX on which X resides is not in main memory, it issues input(BX).

b. It assigns to xi the value of X from the buffer block.

2. write(X) assigns the value of local variable xi to data item X in the buffer block. It
executes this operation as follows:

a. If block BX on which X resides is not in main memory, it issues input(BX).

b. It assigns the value of xi to X in buffer BX.

A buffer block is eventually written out to the disk either because the buffer manager
needs the memory space for other purposes or because the database system wishes to reflect the
change to B on the disk. The database system performs a force-output of buffer B if it issues an
output(B).

When a transaction needs to access a data item X for the first time, it must execute
read(X). The system then performs all updates to X on xi . At any point during its execution a
transaction may execute write(X) to reflect the change to X in the database itself.

The output(BX) operation for the buffer block BX on which X resides does not need to
take effect immediately after write(X) is executed, since the block BX may contain other data
items that are still being accessed. The actual output may take place later. If the system crashes
after the write(X) operation was executed but before output (BX) was executed, the new value
of X is never written to disk and, thus, is lost.
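A rough sketch of the input, output, read and write operations described above is given below. The BufferPool and Transaction classes, the block_of() mapping from a data item to its block, and the dictionary-based blocks are illustrative assumptions, not an actual DBMS interface.

class BufferPool:
    def __init__(self, disk):
        self.disk = disk
        self.buffer = {}                       # buffer blocks currently in main memory

    def input(self, block_id):                 # input(B): physical block -> main memory
        if block_id not in self.buffer:
            self.buffer[block_id] = self.disk.read_block(block_id)

    def output(self, block_id):                # output(B): buffer block -> disk
        self.disk.write_block(block_id, self.buffer[block_id])

class Transaction:
    def __init__(self, pool):
        self.pool = pool
        self.work_area = {}                    # private copies x_i of data items

    def read(self, x, block_of):               # read(X)
        bx = block_of(x)
        self.pool.input(bx)                    # bring B_X into memory if needed
        self.work_area[x] = self.pool.buffer[bx][x]

    def write(self, x, block_of):              # write(X)
        bx = block_of(x)
        self.pool.input(bx)                    # bring B_X into memory if needed
        self.pool.buffer[bx][x] = self.work_area[x]   # only the buffer block is updated
        # output(B_X) may happen much later, at the buffer manager's discretion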

8.2 RECOVERY AND ATOMICITY

Consider a transaction Ti that transfers Rs.50 from account A to account B, with initial
values of A and B being Rs.1000 and Rs.2000, respectively. Suppose that a system crash has
occurred during the execution of Ti , after output(BA) has taken place, but before output(BB)
was executed, where BA and BB denote the buffer blocks on which A and B reside.

When the system restarts, the value of A would be Rs.950, while that of B would be Rs.2000,
which is clearly inconsistent with the atomicity requirement for transaction Ti . Unfortunately,
there is no way to find out by examining the database state what blocks had been output, and
what had not, before the crash.

Our goal is to perform either all or no database modifications made by Ti . However, if


Ti performed multiple database modifications, several output operations may be required, and
a failure may occur after some of these modifications have been made, but before all of them
are made.

To achieve our goal of atomicity, we must first output to stable storage information
describing the modifications, without modifying the database itself. This information can help
us ensure that all modifications performed by committed transactions are reflected in the
database during recovery.

8.2.1. Log Records

The most widely used structure for recording the modifications done in a database is the
log. The log is a sequence of log records, recording all the update activities in the database.
There are several types of log records. An update log record describes a single database write.
It has these fields:

• Transaction identifier - the unique identifier of the transaction that performed the write
operation.

• Data-item identifier - the unique identifier of the data item written. Typically, it is the
location on disk of the data item with the block identifier of the block and an offset
within the block.

• Old value - the value of the data item prior to the write.

• New value - the value that the data item will have after the write.

The update log record is represented as <Ti, Xj , V1, V2>, indicating that transaction Ti
has performed a write on data item Xj . Xj had value V1 before the write, and has value V2 after
the write. There are special log records to record significant events during transaction
processing.

• <Ti start> - Transaction Ti has started.

• <Ti commit> - Transaction Ti has committed.

• <Ti abort> - Transaction Ti has aborted.

Whenever a transaction performs a write, the log record for that write will be created
and added to the log, before the database is modified. Once a log record exists, we can output
the modification to the database. We also have the ability to undo a modification that has
already been output, by using the old-value field in the log record.

For log records to be useful for recovery from system and disk failures, the log must
reside in stable storage. For now, we assume that every log record is written to the end of the
log on stable storage as soon as it is created.
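As an illustration, the log records described above can be modelled as simple tuples, as in the sketch below; the format follows the text, but the Python representation itself is only an assumption used in the examples that follow.

def start_record(ti):        return ("start", ti)               # <Ti start>
def commit_record(ti):       return ("commit", ti)              # <Ti commit>
def abort_record(ti):        return ("abort", ti)               # <Ti abort>
def update_record(ti, xj, v1, v2):
    return ("update", ti, xj, v1, v2)                           # <Ti, Xj, V1, V2>

# Write-ahead logging: the log record is appended before the buffer block is modified.
log = []
log.append(start_record("T0"))
log.append(update_record("T0", "A", 1000, 950))   # only now may A be updated in the buffer
log.append(update_record("T0", "B", 2000, 2050))
log.append(commit_record("T0"))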

8.2.2 Database Modification

A transaction creates a log record before modifying the database. The log records allow
the system to undo changes, if the transaction must be aborted. Similarly, they allow the system
to redo the changes, if the transaction has committed but the system crashed before those
changes are stored in the database on disk. The steps in modifying a data item are:

1. The transaction performs some computations in its own private part of main memory.

2. The transaction modifies the data block in the disk buffer in main memory holding the
data item.

3. The database system executes the output operation that writes the data block to disk.

There are two types of database modification techniques – deferred and immediate.

▪ The immediate-modification scheme allows updates to buffer/disk before the


transaction commits

o Update log record must be written before database item is written

o Output of updated blocks to disk can take place at any time before or after
transaction commit

o Order in which blocks are output can be different from the order in which they
are written.

▪ The deferred-modification scheme performs updates to buffer/disk only at the


time of transaction commit

o Simplifies some aspects of recovery

o But has overhead, since the transactions need to make local copies of all
updated data items

The recovery algorithms we describe in this chapter support immediate modification. A


recovery algorithm must take into account a variety of factors, including:

• The possibility that a transaction may have committed although some of its database
modifications exist only in the disk buffer in main memory and not in the database
on disk.

• The possibility that a transaction may have modified the database while in the active
state and, as a result of a subsequent failure, may need to abort.

Because all database modifications must be preceded by the creation of a log record, the
system has available both the old value prior to the modification of the data item and the
new value that is to be written for the data item. This allows the system to perform undo
and redo operations as appropriate.

• Undo using a log record sets the data item specified in the log record to the old
value.

• Redo using a log record sets the data item specified in the log record to the new
value.

8.2.3 Concurrency Control and Recovery

Imagine a situation in which transaction T1 has modified a data item X and the
concurrency control scheme permits another transaction T2 to modify X before T1 commits. If
T1 is then undone (which restores the old value of X), the update made by T2 is wiped out, so
T2 would have to be undone as well. To avoid this situation, recovery algorithms require that if a data
item has been modified by a transaction, no other transaction can modify the data item until the
first transaction commits or aborts.

This requirement can be satisfied by using strict two-phase locking, in which the
exclusive lock acquired on any updated data item is released only after the transaction commits.
Snapshot-isolation and validation-based concurrency-control techniques also hold the acquired
exclusive locks until the transaction is committed.

Snapshot-isolation or validation concurrency control protocols make use of the


deferred-modification technique. However, some implementations of snapshot isolation use
immediate modification, but provide a logical snapshot on demand. Similarly, immediate
modification of the database is a natural fit with two-phase locking, but deferred modification
can also be used with two-phase locking.

8.2.4 Transaction Commit

A transaction is said to be committed, when its commit log record, (the last log record)
has been written to stable storage. At that point all the previous log records have already been
output to stable storage. If there is a system crash, then the updates of the transaction can be
redone. If the system crash occurs before a log record < Ti commit> is output to stable storage,
then the transaction Ti will be rolled back.

8.2.5 Using the Log to Redo and Undo Transactions

Let us see how the log can be used to recover from a system crash, and to roll back
transactions during normal operation. Consider a transaction T0 that transfers
Rs.50 from account A to account B. The initial balance in A is Rs.1000 and in B is Rs.2000.

T0: read(A);

A := A-50;

write(A);

read(B);

B := B + 50;

write(B).

Let T1 be a transaction that withdraws Rs.100 from account C. The initial balance in account C is Rs.700:

T1: read(C);

C := C-100;

write(C).

The portion of the log which contains the relevant information with respect to the
transactions T0 and T1 is shown below.

<T0 start>

<T0 , A, 1000, 950>

<T0 , B, 2000, 2050>

<T0 commit>

<T1 start>
<T1 , C, 700, 600>

<T1 commit>

A possible order in which the actual outputs took place in both the database system and
the log as a result of the execution of T0 and T1 is shown below.

Log                         Database

<T0 start>

<T0 , A, 1000, 950>

<T0 , B, 2000, 2050>

                            A = 950

                            B = 2050

<T0 commit>

<T1 start>

<T1 , C, 700, 600>

                            C = 600

<T1 commit>

Using the log, the system can handle any failure other than the loss of information in
nonvolatile storage. The recovery scheme uses two recovery procedures redo(Ti) and undo(Ti).

• redo(Ti) sets the value of all data items updated by transaction Ti to the new values. The
order in which updates are carried out by redo is important. When recovering from a
system crash, if updates to a particular data item are applied in an order different from
the order in which they were applied originally, the final state of that data item will have
a wrong value.

• undo(Ti ) restores the value of all data items updated by transaction Ti to the old values.
The undo operation not only restores the data items to their old value, but also writes
log records to record the updates performed as part of the undo process. These log
records are special redo-only log records, since they do not need to contain the old-value
of the updated data item.

Similar to redo procedure, the order in which undo operations are performed is
important. When the undo operation for transaction Ti completes, it writes a <Ti abort> log
record, indicating that the undo has completed.

After a system crash has occurred, the system consults the log to determine which
transactions need to be redone, and which need to be undone so as to ensure atomicity.

• Transaction Ti needs to be undone if the log contains the record <Ti start>, but does not
contain either the record <Ti commit> or the record <Ti abort>.

• Transaction Ti needs to be redone if the log contains the record <Ti start> and either the
record <Ti commit> or the record <Ti abort>.
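A hedged sketch of this classification rule is shown below, using the tuple-style log records introduced earlier; classify_transactions is an illustrative helper, not a standard routine.

def classify_transactions(log):
    started, finished = set(), set()
    for rec in log:
        kind, ti = rec[0], rec[1]
        if kind == "start":
            started.add(ti)                  # saw <Ti start>
        elif kind in ("commit", "abort"):
            finished.add(ti)                 # saw <Ti commit> or <Ti abort>
    redo_set = started & finished            # start plus commit/abort  -> redo
    undo_set = started - finished            # start but no commit/abort -> undo
    return redo_set, undo_set

For the log of Figure 8.2(b), T0 has both a start and a commit record, so it falls in the redo set, while T1 has only a start record and falls in the undo set.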

Let us return to our banking example, with transaction T0 and T1 executed in serial order,
T0 followed by T1. Suppose that the system crashes before the successful completion of the
transactions. We shall consider three cases. The logs for each of these cases are as shown below
in Figure 8.2.

Fig. 8.2 Same log at three different times.

Case 1:

First, let us assume that the crash occurs just after the log record for the step

write(B)

of transaction T0 has been written to stable storage (Figure 8.2a). When the system
resumes, it finds the <T0 start> in the log, but there is no corresponding <T0 commit>
or <T0 abort> record. Hence undo(T0) is performed and the amount in the accounts A
and B (on the disk) are restored to Rs.1000 and Rs.2000, respectively.

Case 2:

Let us assume that the crash comes just after the log record for the step:

write(C)
of transaction T1 has been written to stable storage (Figure 8.2b). When the system
resumes back, two recovery actions need to be taken. The log contains both the record
<T0 start> and the record <T0 commit> and hence redo(T0) must be performed . The log
contains the record <T1 start> but there is no record <T1 commit> or <T1 abort> and
hence undo(T1) must be performed. At the end of the entire recovery procedure, the
values of accounts A, B, and C are Rs.950, Rs.2050, and Rs.700, respectively.

Case 3:

Let us assume that the crash occurs just after the log record:

<T1 commit>

has been written to stable storage (Figure 8.2c). When the system resumes back, both
T0 and T1 need to be redone, since the records <T0 start> and <T0 commit> appear in
the log, as do the records <T1 start> and <T1 commit>. After the system performs the
recovery procedures redo(T0) and redo(T1), the values in accounts A, B, and C are Rs.950,
Rs.2050, and Rs.600, respectively.

8.2.6 Checkpoints

When a system crash occurs, we must refer the log to determine those transactions that
need to be redone and those that need to be undone. In principle, we need to search the entire
log to determine this information. There are two major difficulties with this approach:

1. The search process is time-consuming.

2. Most of the transactions that, according to our algorithm, need to be redone have
already written their updates into the database. Although redoing them will cause
no harm, it will nevertheless cause recovery to take longer.

Checkpoints reduce these types of overhead. We describe below a simple checkpoint scheme that:

(a) does not permit any updates to be performed while the checkpoint operation is in
progress, and

(b) outputs all modified buffer blocks to disk when the checkpoint is performed.

A checkpoint is performed as follows:

1. Output onto stable storage all log records currently residing in main memory.

2. Output to the disk all modified buffer blocks.

3. Output onto stable storage a log record of the form <checkpoint L>, where
L is a list of transactions active at the time of the checkpoint. Transactions are not
allowed to perform any update actions, such as writing to a buffer block or writing a log record,
while a checkpoint is in progress.

The presence of a <checkpoint L> record in the log allows the system to streamline its
recovery procedure. Consider a transaction Ti that completed prior to the checkpoint. For such
a transaction, the <Ti commit> record or < Ti abort> record appears in the log before the
<checkpoint> record. Any database modifications made by Ti must have been written to the
database either prior to the checkpoint or as part of the checkpoint itself. Thus, at recovery time,
there is no need to perform a redo operation on Ti.

After a system crash has occurred, the system examines the log to find the last
<checkpoint L> record by searching the log backward, from the end of the log, until the first
<checkpoint L>.

The redo or undo operations need to be applied only to transactions in L, and to all
transactions that started execution after the <checkpoint L> record was written to the log. Let
us denote this set of transactions as T.

• For all transactions Tk in T that have no <Tk commit> record or <Tk abort> record in
the log, execute undo(Tk).

• For all transactions Tk in T such that either the record <Tk commit> or the record <Tk
abort> appears in the log, execute redo(Tk).

For example, consider the set of transactions T0, T1,...,T100. Suppose that the most recent
checkpoint took place during the execution of transactions T67 and T69, while T68 and all
transactions with subscripts lower than 67 completed before the checkpoint. Only transactions
T67, T69,...,T100 need to be considered during the recovery scheme. Each of them needs to
be redone if it has completed, or undone if it is incomplete. A fuzzy checkpoint is a checkpoint where
transactions are allowed to perform updates even while buffer blocks are being written out.

8.3 RECOVERY ALGORITHM

The recovery algorithm requires that a data item that has been updated by an
uncommitted transaction cannot be modified by any other transaction, until the first transaction
has either committed or aborted.

i. Transaction Rollback

First consider transaction rollback during normal operation i.e., not during recovery
from a system crash. Rollback of a transaction Ti is performed as follows:
1. The log is scanned backward, and for each log record of Ti of the form <Ti , Xj ,
V1, V2> that is found:

a. The value V1 is written to data item Xj , and

b. A special redo-only log record <Ti, Xj , V1> is written to the log, where V1 is
the value being restored to data item Xj during the rollback. These log records
are sometimes called compensation log records.

2. Once the log record <Ti start> is found the backward scan is stopped, and a log
record <Ti abort> is written to the log.

ii. Recovery After a System Crash

Recovery after a crash takes place in two phases:

1. Redo phase

In the redo phase, the system replays updates of all transactions by scanning the log
forward from the last checkpoint. The log records that are replayed include log
records for transactions that were rolled back before system crash, and those that
had not committed when the system crash occurred. This phase also determines all
transactions that were incomplete at the time of the crash, and must therefore be
rolled back. Such incomplete transactions would either have been active at the time
of the checkpoint, and thus would appear in the transaction list in the checkpoint
record, or would have started later. Further, such incomplete transactions would
have neither a <Ti abort> nor a <Ti commit> record in the log.

The steps taken while scanning the log are as follows:

a. The list of transactions to be rolled back, undo-list, is initially set to the list L
in the <checkpoint L> log record.

b. Whenever a normal log record of the form <Ti , Xj , V1, V2>, or a redo-only log
record of the form <Ti , Xj , V2> is encountered, the operation is redone; i.e.,
the value V2 is written to data item Xj .

c. Whenever a log record of the form <Ti start> is found, Ti is added to undo-list.

d. Whenever a log record of the form <Ti abort> or <Ti commit> is found, Ti is
removed from undo-list.

At the end of the redo phase, undo-list contains the list of all transactions that are
incomplete, that is, they neither committed nor completed rollback before the crash.

2. Undo Phase
In the undo phase, the system rolls back all transactions in the undo-list. It performs
rollback by scanning the log backward from the end.
a. Whenever it finds a log record belonging to a transaction in the undo- list, it
performs undo actions just as if the log record had been found during the
rollback of a failed transaction.
b. When the system finds a <Ti start> log record for a transaction Ti in undo-list,
it writes a <Ti abort> log record to the log, and removes Ti from undo-list.
c. The undo phase terminates once undo-list becomes empty, that is, the system
has found <Ti start> log records for all transactions that were initially in undo-
list.
After the undo phase of recovery terminates, normal transaction processing can resume.
Figure 8.3 shows an example of actions logged during normal operation, and actions
performed during failure recovery. In the log shown in the figure, transaction T1 had committed,
and transaction T0 had been completely rolled back, before the system crashed.

Fig. 8.3 Example of logged actions, and actions during recovery

When recovering from a crash, in the redo phase, the system performs a redo of all
operations after the last checkpoint record. In this phase, the list undo-list initially contains T0
and T1. T1 is removed first when its commit log record is found, while T2 is added when its start
log record is found. Transaction T0 is removed from undo-list when its abort log record is found,
leaving only T2 in undo-list. The undo phase scans the log backwards from the end, and when
it finds a log record of T2 updating A, the old value of A is restored, and a redo-only log record
is written to the log. When the start record for T2 is found, an abort record is added for T2. Since
undo-list contains no more transactions, the undo phase terminates, completing recovery.
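The two recovery phases just described can be sketched compactly as follows. Here db is a dictionary standing in for the database, log is a list of the tuple-style records used earlier (with ("redo_only", Ti, Xj, V) for compensation records), and checkpoint_list plays the role of L in the <checkpoint L> record; all of these names are assumptions made for illustration, and the log is taken to begin at the last checkpoint.

def recover(log, db, checkpoint_list):
    # Redo phase: replay history forward from the last checkpoint.
    undo_list = set(checkpoint_list)
    for rec in log:
        if rec[0] == "update":
            _, ti, xj, v1, v2 = rec
            db[xj] = v2                          # redo the write
        elif rec[0] == "redo_only":
            _, ti, xj, v = rec
            db[xj] = v                           # replay compensation records too
        elif rec[0] == "start":
            undo_list.add(rec[1])
        elif rec[0] in ("commit", "abort"):
            undo_list.discard(rec[1])

    # Undo phase: roll back incomplete transactions, scanning the log backward.
    new_records = []
    for rec in reversed(log):
        if not undo_list:
            break
        if rec[0] == "update" and rec[1] in undo_list:
            _, ti, xj, v1, v2 = rec
            db[xj] = v1                          # restore the old value
            new_records.append(("redo_only", ti, xj, v1))   # compensation log record
        elif rec[0] == "start" and rec[1] in undo_list:
            new_records.append(("abort", rec[1]))           # <Ti abort>
            undo_list.discard(rec[1])
    log.extend(new_records)
    return db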

8.4 SHADOW COPIES AND SHADOW PAGING

In the shadow-copy scheme, a transaction that wants to update the database first creates
a complete copy of the database. All updates are done on the new database copy, keeping the
original shadow copy, untouched. If at any point the transaction has to be aborted, the system
merely deletes the new copy. The old copy of the database has not been affected. The current
copy of the database is identified by a pointer, called db-pointer, which is stored on disk. The
Figure 8.4 shows the shadow paging scheme.

Fig. 8.4 Shadow Paging

If the transaction partially commits i.e., executes its final statement, it is committed as
follows:

1. The operating system is asked to make sure that all pages of the new copy of the
database have been written out to disk.

2. After the operating system has written all the pages to disk, the database system
updates the pointer db-pointer to point to the new copy of the database.

3. The new copy then becomes the current copy of the database.

4. The old copy of the database is then deleted.

5. The transaction is said to have been committed at the point where the updated db-
pointer is written to disk.

The implementation actually depends on the write to db-pointer being atomic - either
all its bytes are written or none of its bytes are written. Disk systems provide atomic updates to
entire blocks, or at least to a disk sector. In other words, the disk system guarantees that it will
update db-pointer atomically.

Shadow copy schemes are commonly used by text editors. Shadow copying can be used
for small databases, but it would be extremely expensive for a large database. A variant of
shadow-copying, called shadow-paging, reduces copying as follows.

▪ The scheme uses a page table containing pointers to all pages.

▪ The page table itself and all updated pages are copied to a new location.

▪ Any page which is not updated by a transaction is not copied; instead, the new page
table just stores a pointer to the original page.

▪ When a transaction commits, it atomically updates the pointer to the page table, which
acts as db-pointer, to point to the new copy.

Shadow paging does not work well with concurrent transactions and is not widely
used in databases.
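An illustrative sketch of the shadow-paging commit just summarized is given below; the pages dictionary, the page table and the final assignment that stands in for the atomic db-pointer update are all simplifications chosen only for this example.

pages = {0: "A=1000", 1: "B=2000"}          # physical pages
page_table = {0: 0, 1: 1}                   # the table db-pointer currently points to

def commit_with_shadow(updates):
    # updates maps a logical page number to its new contents.
    global page_table
    shadow = page_table                      # the original (shadow) table stays untouched
    new_table = dict(shadow)                 # copy the page table itself
    for logical, contents in updates.items():
        new_phys = max(pages) + 1            # write the updated page to a new location
        pages[new_phys] = contents
        new_table[logical] = new_phys
    # Pages that were not updated are not copied: new_table still points to the originals.
    page_table = new_table                   # stands in for the atomic db-pointer update

If the transaction aborts instead of committing, new_table is simply discarded and the shadow copy remains the current database.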

8.5 ARIES

ARIES is a state-of-the-art recovery method which

• Incorporates numerous optimizations

• Reduces overheads during normal processing and speeds up recovery

When compared with the recovery algorithms discussed earlier, ARIES

1. Uses log sequence number (LSN) to identify log records and stores LSNs in pages to
identify what updates have already been applied to a database page

2. Supports Physiological redo

3. Uses a dirty page table to avoid unnecessary redos during recovery

4. Uses fuzzy checkpointing that only records information about dirty pages and does not
require dirty pages to be written out at checkpoint time.

ARIES performs optimizations using physiological redo, in which the affected page is
physically identified but the operation within the page can be logged logically. The physiological redo operation:

• Reduces logging overheads

▪ Physiological redo can log just the record deletion

▪ Physical redo would require logging of old and new values for much of the page

• Requires page to be output to disk atomically


▪ Easy to achieve with hardware RAID, also supported by some disk systems
▪ Incomplete page output can be detected by checksum techniques,
• But extra actions are required for recovery
• Treated as a media failure
8.5.1 ARIES Data Structures
Each log record in ARIES has a Log Sequence Number (LSN) that uniquely identifies
the record. The number is conceptually just a logical identifier whose value is greater for log
records that occur later in the log. In practice, the LSN is generated in such a way that it can
also be used to locate the log record on disk.
ARIES splits a log into multiple log files, each of which has a file number. When a log
file grows beyond some limit, ARIES creates a new log file. The LSN consists of a file number
and an offset within the file. Each page maintains an identifier called the PageLSN. Whenever an update operation occurs on a page,
the LSN of its log record is stored in the PageLSN field of the page. During the redo phase, any
log records with LSN less than or equal to the PageLSN of a page should not be executed on
the page, since their actions are already reflected on the page.
There are special redo-only log records generated during transaction rollback, called
compensation log records (CLRs). These serve the same purpose as the redo-only log records,
but the CLRs have an extra field, called the UndoNextLSN, that records the LSN of the log record that
needs to be undone next when the transaction is being rolled back. The Figure 8.5 shows the
data structures used in ARIES.

Fig. 8.5 Data Structures used in Aries
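A hedged sketch of these bookkeeping structures, using plain Python dataclasses, is shown below. The field names follow the text, but the classes and the example RecLSN values are illustrative rather than actual ARIES code.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LogRecord:
    lsn: int                                  # log sequence number, larger for later records
    txn_id: str
    prev_lsn: Optional[int] = None            # previous log record of the same transaction
    page_id: Optional[int] = None             # page affected by the update, if any
    undo_next_lsn: Optional[int] = None       # set only for compensation log records (CLRs)

@dataclass
class Page:
    page_id: int
    page_lsn: int = 0                         # LSN of the last log record applied to the page
    data: dict = field(default_factory=dict)

# DirtyPageTable: page_id -> RecLSN, the earliest log record that may have dirtied
# the page without the page having been written back to disk (example values only).
dirty_page_table = {4894: 7561, 7200: 7562}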



8.5.2 Phases in ARIES Algorithm

a. Analysis pass: Determines which transactions to undo, which pages were dirty at the
time of the crash, and the LSN from which the redo pass should start.

b. Redo pass: Repeats history, redoing all actions from RedoLSN to bring the database to
the state it was in before the crash.

c. Undo pass: Rolls back all incomplete transactions.

a. Analysis pass

The steps in the analysis pass are


1. Find the last complete checkpoint log record
▪ Reads DirtyPageTable from log record
▪ Sets RedoLSN = min of RecLSNs of all pages in DirtyPageTable
o In case no pages are dirty, RedoLSN = checkpoint record’s LSN
▪ Sets undo-list = list of transactions in checkpoint log record
▪ Reads LSN of last log record for each transaction in undo-list from checkpoint log
record
2. Scans forward from checkpoint
▪ If any log record found for transaction not in undo-list, adds transaction to undo-
list
▪ Whenever an update log record is found
o If page is not in DirtyPageTable, it is added with RecLSN set to LSN of the
update log record
▪ If transaction end log record found, delete transaction from undo-list
▪ Keeps track of last log record for each transaction in undo-list
3. At the end of analysis pass:
▪ RedoLSN determines where to start redo pass
▪ RecLSN for each page in DirtyPageTable used to minimize redo work
▪ All transactions in undo-list need to be rolled back

b. Redo Pass
Redo Pass repeats history by replaying every action not already reflected in the page on
disk. It scans the log forward from RedoLSN. Whenever an update log record is found,
it takes the following action:
1. If the page is not in DirtyPageTable or the LSN of the log record is less than the RecLSN
of the page in DirtyPageTable, then skip the log record
2. Otherwise fetch the page from disk. If the PageLSN of the page fetched from disk is
less than the LSN of the log record, redo the log record
c. Undo Pass
Undo pass performs backward scan on log undoing all transaction in undo-list.
Backward scan optimized by skipping unneeded log records as follows:
• Next LSN to be undone for each transaction set to LSN of last log record for
transaction found by analysis pass.
• At each step pick largest of these LSNs to undo, skip back to it and undo it
• After undoing a log record
▪ For ordinary log records, set next LSN to be undone for transaction to PrevLSN
noted in the log record
▪ For compensation log records (CLRs) set next LSN to be undo to UndoNextLSN
noted in the log record
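As an illustration of the redo-pass test described above, the sketch below decides whether a given update log record must be redone; LogRecord, Page and dirty_page_table are the assumed structures from the earlier sketch, and disk_pages is a hypothetical mapping from page id to the page as stored on disk.

def redo_needed(rec, dirty_page_table, disk_pages):
    if rec.page_id is None:                       # not an update record
        return False
    if rec.page_id not in dirty_page_table:       # page was not dirty at the crash
        return False
    if rec.lsn < dirty_page_table[rec.page_id]:   # LSN < RecLSN: change already on disk
        return False
    page = disk_pages[rec.page_id]                # fetch the page from disk
    return page.page_lsn < rec.lsn                # skip if PageLSN >= LSN (already applied)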
8.5.3 Recovery Actions in ARIES
Figure 8.6 illustrates the recovery actions performed by ARIES on an example log. We
assume that the last completed checkpoint pointer on disk points to the checkpoint log record
with LSN 7568. The PrevLSN values in the log records are shown using arrows in the figure,
while the UndoNextLSN value is shown using a dashed arrow for the one compensation log
record, with LSN 7565.

The analysis pass would start from LSN 7568, and when it is complete,
RedoLSN would be 7564. Thus, the redo pass must start at the log record with LSN 7564. Note
that this LSN is less than the LSN of the checkpoint log record, since the ARIES checkpointing
algorithm does not flush modified pages to stable storage. The DirtyPageTable at the end of
analysis would include pages 4894 and 7200 from the checkpoint log record, and 2390, which is
updated by the log record with LSN 7570. At the end of the analysis pass, the list of transactions
to be undone consists of only T145 in this example.

The redo pass starts from LSN 7564 and performs redo of log records whose pages appear in DirtyPageTable. The
undo pass needs to undo only transaction T145, and hence starts from its LastLSN value 7567
and continues backwards until the record <T145 start> is found at LSN 7563.

Fig. 8.6 Recovery Actions in Aries

8.5.4 Other ARIES Features

1. Recovery Independence - Pages can be recovered independently of others. E.g. if some


disk pages fail they can be recovered from a backup while other pages are being used

2. Savepoints - Transactions can record savepoints and roll back to a savepoint

▪ Useful for complex transactions

▪ Also used to rollback just enough to release locks on deadlock

3. Fine-grained locking - Index concurrency algorithms that permit tuple level locking on
indices can be used

4. Recovery optimizations:

▪ Dirty page table can be used to prefetch pages during redo.

▪ Out of order redo is possible and redo can be postponed and other log records can
continue to be processed.
CHAPTER – IX
DATA STORAGE

9.1 RAID
RAID (redundant array of independent disks, originally redundant array of inexpensive
disks) is a way of storing the same data in different places on multiple hard disks to protect data
in the case of a drive failure.
9.1.1. Introduction
Disk organization techniques manage a large number of disks, providing a view of a
single disk of high capacity and high speed by using multiple disks in parallel, and high
reliability by storing data redundantly, so that data can be recovered even if a disk fails.
9.1.2. Motivation for RAID
• Just as additional memory in the form of cache can improve system performance, in the
same way additional disks can also improve system performance.
• In RAID we use an array of disks which operate independently. Since there are many
disks, multiple I/O requests can be handled in parallel if the data required is on separate
disks.
• A single I/O operation can be handled in parallel if the data required is distributed across
multiple disks.
9.1.3. Benefits of RAID
• Data loss can be very dangerous for an organization
• RAID technology prevents data loss due to disk failure
• RAID technology can be implemented in hardware or software
• Servers make use of RAID Technology
9.1.4. RAID Levels
RAID Level 0: Stripping and non-redundant

• RAID level 0 divides data into block units and writes them across a number of disks. As
data is placed across multiple disks, it is also called "data striping".

• The advantage of distributing data over disks is that if different I/O requests are pending
for two different blocks of data, then there is a possibility that the requested blocks are
on different disks

There is no parity checking of data, so if the data in one drive gets corrupted, it is lost;
RAID 0 does not support data recovery. Spanning is another term used with RAID level 0,
because the logical disk spans all the physical drives. A RAID 0 implementation requires a
minimum of 2 disks.

Advantages

• I/O performance is greatly improved by spreading the I/O load across many channels &
drives.

• Best performance is achieved when data is striped across multiple controllers with only
one drive per controller.

Disadvantages

• It is not fault-tolerant; failure of one drive will result in all data in an array being lost.

RAID Level 1: Mirroring (or shadowing)

• Also known as disk mirroring; this configuration consists of at least two drives that
duplicate the storage of data. There is no striping.

• Read performance is improved since either disk can be read at the same time. Write
performance is the same as for single disk storage.

• Every write is carried out on both disks. If one disk in a pair fails, data still available in
the other.

• Data loss would occur only if a disk fails and its mirror disk also fails before the system
is repaired. The probability of this combined event is very small.

RAID Level 2:

This configuration uses striping across disks. This level stripes data at the bit level, and
each bit is stored on a separate drive. It requires dedicated disks for storing the error-correcting
code (ECC) of the data. The level uses the Hamming code for error correction. It is rarely used today.

Advantages

• It uses a selected drive for uniformity in storing data.

• It detects error through hamming code.

• It can be a good answer to data-security problems.

Disadvantages

• It uses an extra drive for error detection.

• The need for hamming code makes it inconvenient for commercial use.

RAID Level 3: Bit-Interleaved Parity

The RAID 3 level stripes data at the byte level. It requires a separate parity disk which
stores the parity information for each byte. When a disk fails, data can be recovered with the
help of the corresponding parity bytes. For example, to recover the data on a damaged disk,
compute the XOR of the bits from the other disks (including the parity disk).

When writing data, the corresponding parity bits must also be computed and written to the
parity disk. Because every I/O operation addresses all the drives at the same time, RAID 3 cannot overlap
I/O. For this reason, RAID 3 is best for single-user systems with long-record applications.
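The XOR computation mentioned above can be illustrated with a short sketch; the byte strings standing in for disk contents are arbitrary example values.

from functools import reduce

def parity(stripes):
    # stripes: equal-length byte strings, one per disk; the result goes on the parity disk
    return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*stripes))

data_disks = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]
p = parity(data_disks)                              # written to the parity disk

# Suppose disk 1 fails: XOR the surviving data disks with the parity disk to rebuild it.
recovered = parity([data_disks[0], data_disks[2], p])
assert recovered == data_disks[1]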

Advantages

• It enables high-speed transmission of data.

• In case of a disk failure, data can be reconstructed using the corresponding parity byte.

• Data can be used parallelly.

• It might be used where few users are referring to large files.

Disadvantages

• It needs an extra file to store parity bytes.

• Its performance is slow in case of files of small size.

• It can be said that it is not a reliable or cheap solution to storage problems.



RAID Level 4: Block-Interleaved Parity

RAID 4 is quite a popular level. It is similar to RAID 0 and RAID 3 in a few ways. It
uses block-level data striping, which is similar to RAID 0. Just like RAID 3, it uses a dedicated parity
disk. When you combine both these features together you will clearly understand
what RAID 4 does: it stripes data at the block level and stores the corresponding parity blocks on
the parity disk. In case of a single disk failure, data can be recovered from this parity disk.

Advantages

• In case of a single disk failure, the lost data is recovered from the parity disk.

• It can be useful for large files.

Disadvantages

• It does not solve the problem of more than one disk failure.

• The level needs at least 3 disks as well as hardware backing for doing parity calculations.

• It might seem to be slow in case of small files.

RAID Level 5:

• RAID 5 uses striping as well as parity for redundancy. It is well suited for heavy read
and low write operations.

• Block-Interleaved Distributed Parity; partitions data and parity among all N + 1 disks,
rather than storing data in N disks and parity in 1 disk. So this level has some similarity
with RAID 4.

Advantages

• This level is known for distributed parity among the various disks in the group.

• It shows good performance without being expensive.

• With an array of four disks, it uses only one-fourth of the storage capacity for parity and
leaves three-fourths of the capacity for storing data.

Disadvantages

• The recovery of data takes longer due to parity distributed among all disks.

• It is not able to help in the case where more than one disk fails.

RAID Level 6:

• This technique is similar to RAID 5, but includes a second parity scheme that is
distributed across the drives in the array. The use of additional parity allows the array to
continue to function even if two disks fail simultaneously. However, this extra
protection comes at a cost.

• P+Q Redundancy scheme; similar to Level 5, but stores extra redundant information to
guard against multiple disk failures. Better reliability than Level 5 at a higher cost; not
used as widely.

Advantages
• It can keep the array functioning even in the case of 2 simultaneous disk failures.
Disadvantages
• It requires a minimum of 4 drives.
• Two disks' worth of capacity is used for parity rather than for data.
• It needs to write two parity blocks for every write and hence is slower than RAID 5.
• It has inadequate adaptability.
Each RAID level has its own set of advantages and disadvantages, so you have to decide
whether you are looking for safety, speed, or storage space.
9.2. FILE ORGANIZATION
We know that data is stored in a database; in RDBMS terms, we refer to this data as a
collection of inter-related tables. In layman's terms, however, the data is ultimately stored on
physical storage in the form of files.

File organization is a way of organizing the data in such a way that it is easier to insert,
delete, modify and retrieve data from the files.

9.2.1. Purpose of File Organization

• File organization makes it easier & faster to perform operations (such as read, write,
update, delete etc.) on data stored in files.

• Removes data redundancy. File organization makes sure that redundant and duplicate
data gets removed. This alone saves the database from the insert, update and delete
errors which usually happen when duplicate data is present in the database.

• It saves storage cost. By organizing the data, the redundant data gets removed, which
lowers the storage space required to store the data.

• Improves accuracy. When redundant data is removed and the data is stored in an efficient
manner, the chances of the data becoming wrong or corrupted go down.

9.2.2. Types of File Organization

There are various ways to organize the data. Every file organization method is different
from each other; therefore, each file organization method has its own advantages and
disadvantages. It is up to the developer which method they choose in order to organize the data.
Usually, this decision is made based on what kind of data is present in database.

The different Types of file organization are

• Sequential File Organization


• Heap File Organization
• Hash File Organization
• B+ Tree File Organization
• Clustered File Organization
• Indexed sequential access method (ISAM)
9.2.2.1. Sequential File Organization
This is one of the easiest methods of file organization. In this method, files (records) are
stored in a sequential manner, one after another. There are two ways to do sequential file
organization.
• Pile File Method
• Sorted File Method

1. Pile File Method

In Pile File method one record is inserted after another record and the new record is
always inserted at the end of the file. If any record needs to be deleted, it gets searched
in the memory block and once it is deleted a new record can be written on the freed
memory block.

The Figure 9.1 shows a file organized using the Pile File method; as you can
see, the records are not sorted but are inserted on a first-come, first-served basis. If you want to
organize the data in such a way that it gets sorted after insertion, then use the sorted file
method, which is discussed in the next section.

Figure 9.1 Pile Method

Inserting a new record in file using Pile File method

Here we are demonstrating the insertion of a new record R3 in an already present file
using Pile File method. Since this method of sequential organization just adds the new
record at the end of file, the new record R3 gets added at the end of the file, as shown
in the Figure 9.2.

Figure 9.2 Insertion using Pile Method

2. Sorted file Method

In sorted file method, a new record is inserted at the end of the file and then all the
records are sorted to adjust the position of the newly added record. In Figure 9.3, records
appear in sorted order when the file is organized using sorted file method. In case of a
record updation, once the update is complete, the whole file gets sorted again to change
the position of updated record in the file.

The sorting can be either ascending or descending; in Figure 9.3, the records are sorted
in ascending order.

Figure 9.3 Sorted File Method

Inserting a new record in file using Sorted File Method

In Figure 9.4, a new record R3 is added to an existing file. Although the record is added
at the end, its position gets changed after insertion. The whole file gets sorted after
addition of the new record and the new record R3, is placed just after record R1 as the
file is sorted in ascending order using sorted file method of sequential file organization.

Figure 9.4 Insertion using Sorted File Method

Advantages

• It is a simple method to adopt. The implementation is simple compared to other file
organization methods.

• It is fast and efficient when we are dealing with huge amounts of data.

• This method of file organization is mostly used for generating various reports and
performing statistical operations on data.

• Data can be stored on cheap storage devices.

Disadvantages

• Sorting the file takes extra time and it requires additional storage for sorting
operation.

• Searching a record is time consuming process in sequential file organization as the


records are searched in a sequential order.

9.2.2.2. Heap File Organization

The Heap File Organization method is a simple yet powerful file organization method. In this
method, the records are added to memory data blocks in no particular order. Figure 9.5
demonstrates heap file organization. As you can see, records have been assigned to data
blocks in memory in no particular order.

Figure 9.5 Heap File Organization

Since the records are not sorted and are not stored in consecutive data blocks in memory,
searching a record is a time-consuming process in this method. Update and delete operations also
give poor performance, as the records need to be searched first for updation and deletion, which
is itself a time-consuming operation. However, if the file size is small, these operations give
one of the best performances compared to other methods, so this method is widely used for small
files. This method requires memory optimization and cleanup, as it doesn't free
up the allocated data block after a record is deleted.

Data Insertion

Figure 9.6 demonstrates the addition of a new record in the file using heap file
organization method. As you can see a free data block which has not been assigned to any
record previously, has been assigned to the newly added record R2. The insertion of new record
is pretty simple in this method as there is no need to perform any sorting; any free data block is
assigned to the new record.

Figure 9.6 Insertion in Heap file organization

Advantages

• This is a popular method when a huge number of records need to be added to the
database. Since the records are assigned to free data blocks in memory, there is no need
to perform any special check on existing records when a new record needs to be added.
This makes it easy to insert multiple records all at once without worrying about
disturbing the file organization.

• When the records are less and file size is small, it is faster to search and retrieve the data
from database using heap file organization compared to sequential file organization.

Disadvantages

• This method is inefficient if the file size is big, as the search, retrieve and update
operations consume more time compared to sequential file organization.

• This method doesn’t use the memory space efficiently, thus it requires memory cleanup
and optimization to free the unused data blocks in memory.

9.2.2.3. Hash File Organization

In this method, a hash function is used to compute the address of the data block in memory
where the record is to be stored. The hash function is applied on certain columns of the record, known as
hash columns, to compute the block address. These columns/fields can be either key or non-key
attributes.

Figure 9.7 demonstrates the hash file organization. As shown here, the records are stored
in database in no particular order and the data blocks are not consecutive. These memory
addresses are computed by applying hash function on certain attributes of these records.
Fetching a record is faster in this method as the record can be accessed using hash key column.
No need to search through the entire file to fetch a record.

Figure 9.7 Hash File Organization

Inserting a record

In Figure 9.8, you can see that a new record R5 needs to be added to the file. The same
hash function that generated the addresses for the existing records in the file will be used again to
compute the address (find the data block in memory) for this new record, by applying the hash
function on certain columns of this record.
Figure 9.8 Insertion in Hash File Organization
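A hedged sketch of this hashing scheme is given below; the number of blocks, the record layout and the use of Python's built-in hash function are all assumptions made only for illustration.

NUM_BLOCKS = 8
blocks = {i: [] for i in range(NUM_BLOCKS)}         # data blocks (buckets)

def block_address(key_value):
    return hash(key_value) % NUM_BLOCKS             # hash function -> block address

def insert(record, hash_column):
    blocks[block_address(record[hash_column])].append(record)

def fetch(key_value, hash_column):
    # Only the computed block is examined; the rest of the file is never scanned.
    bucket = blocks[block_address(key_value)]
    return [r for r in bucket if r[hash_column] == key_value]

insert({"emp_id": 101, "emp_name": "Steve"}, "emp_id")
print(fetch(101, "emp_id"))                          # -> [{'emp_id': 101, 'emp_name': 'Steve'}]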

Advantages

• This method doesn’t require sorting explicitly as the records are automatically sorted in
the memory based on hash keys.

• Reading and fetching a record is faster compared to other methods as the hash key is
used to quickly read and retrieve the data from database.

• Records are not dependent on each other and are not stored in consecutive memory
locations, which helps prevent read, write, update and delete anomalies.

Disadvantages

• Can cause accidental deletion of data, if columns are not selected properly for hash
function. For example, while deleting an Employee "Steve" using Employee_Name as
hash column can cause accidental deletion of other employee records if the other
employee name is also "Steve". This can be avoided by selecting the attributes properly,
for example in this case combining age, department or SSN with the employee_name
for hash key can be more accurate in finding the distinct record.

• Memory is not efficiently used in hash file organization as records are not stored in
consecutive memory locations.

• If there are more than one hash columns, searching a record using a single attribute will
not give accurate results.

9.2.2.4. Indexed sequential access method (ISAM)

Indexed sequential access method, also known as the ISAM method, is an upgrade to the
conventional sequential file organization method; you can say that it is an advanced version of
it. In this method, the primary key of the record is stored with an
address, as shown in Figure 9.9; this address is mapped to the address of a data block in memory.
This address field works as an index of the file.

In this method, reading and fetching a record is done using the index of the file. Index
field contains the address of a data record in memory, which can be quickly used to read and
fetch the record from memory.

Figure 9.9 ISAM File Organization

Advantages

• Searching a record is faster in ISAM file organization compared to other file
organization methods, as the primary key can be used to identify the record, and since the
index also holds the address of the record, the data can be read and fetched from memory quickly.

• This method is more flexible compared to other methods, as it allows an index
field (address field) to be generated for any column of the record. This makes searching easier and more efficient, as
searches can be done using multiple column fields.

• It allows range retrieval of records. Since the address field is stored with the primary
key of the record, we can retrieve records based on a certain range of primary key values.

• This method allows partial searches as well. For example, an employee name starting with
"St" can be used to search all the employees whose names start with the letters "St"; this will
return all the records where the employee name begins with "St".

Disadvantages

• Requires additional space in the memory to store the index field.

• After adding a record to the file, the file needs to be re-organized to maintain the
sequence based on primary key column.

• Requires memory cleanup because when a record is deleted, the space used by the record
needs to be released in order to be used by the other record.

• Performance issues are there if there is frequent deletion of records, as every deletion
needs a memory cleanup and optimization.

9.2.2.5. Cluster File Organization

Cluster file organization is different from the other file organization methods. Other file
organization methods mainly focus on organizing the records of a single file (table). Cluster file
organization is used when we frequently need combined data from multiple tables.

While other file organization methods organize tables separately and combine the results
based on the query, cluster file organization stores the combined data of two or more frequently
joined tables in the same file, known as a cluster, as shown in Figure 9.10. This helps in accessing
the data faster.

Figure 9.10 Cluster File Organization



There are two types of cluster file organizations:

1. Index based cluster file organization

2. Hash based cluster file organization

Index based cluster file organization

The example that we have shown in the above diagram is an index based cluster file
organization. In this type, the cluster is formed based on the cluster key and this cluster key
works as an index of the cluster.

Since the EMP_DEP field is common to both tables, it becomes the cluster key when
these two tables are joined to form the cluster. Whenever we need to find the combined record of
employees and departments based on EMP_DEP, this cluster can be used to quickly retrieve
the data.

Hash based cluster file organization

This is the same as index based cluster file organization, except that in this type the hash
function is applied on the cluster key to generate a hash value, and that value is used in the
cluster instead of the index.

Note: The main difference between these two types is that in an index based cluster, records are
stored with the cluster key, while in a hash based cluster, records are stored with the hash
value of the cluster key.

Advantages

• This method is popularly used when multiple tables need to be joined frequently based
on the same condition.

• When a table in the database is joined with multiple tables of the same database, the cluster
file organization method is more efficient compared to other file organization
methods.

Disadvantages

• Not suitable for large databases: This method is not suitable if the size of the database
is huge as the performance of various operations on the data will be poor.

• Not flexible with joining condition: This method is not suitable if the join condition of
the tables keep changing, as it may take additional time to traverse the joined tables
again for the new condition.

• Isolated tables: If tables are not that related and there is rarely any join query on tables
then using this file organization is not recommended. This is because maintaining the
cluster for such tables will be useless when it is not used frequently.

9.2.2.6. B+ Tree File Organization in DBMS

Similar to ISAM file organization, B+ tree file organization also works with the key and index
values of the records. It stores the records in a tree-like structure, which is why it is known as
B+ Tree file organization. In B+ tree file organization, the leaf nodes store the records, and
intermediate nodes only contain pointers to the leaf nodes; these intermediate nodes do not
store any records.

The root node and intermediate nodes contain a key field and an index field. The key field is the
primary key of a record, which can be used to distinctly identify the record; the index field contains
the pointer (address) to the leaf node where the actual record is stored.

B+ Tree Representation

Let’s say we are storing the records of employees of an organization. These employee
records contain fields such as Employee_id, Employee_name, Employee_address etc. If we
consider Employee_id as primary key and the values of Employee_id ranges from 1001 to 1009
then the B+ tree representation can be as shown in Figure 9.11.

Figure 9.11 B+ Tree File Organization

The important point to note here is that the records are stored only at the leaf nodes;
the other nodes contain only the key and index value (pointer to a leaf node). Leaf node 1001
stores the complete record of the employee whose employee id is "1001". Similarly, node
1002 stores the record of the employee with employee id "1002", and so on. The main advantage of
B+ tree file organization is that searching a record is faster. This is because all the leaf nodes (where
the actual records are stored) are at the same distance from the root node and can be accessed
quickly. Since intermediate nodes do not contain records and only contain pointers to the
leaf nodes, the height of the B+ tree is shorter, which makes traversing easier and faster.
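The search path just described can be sketched as follows; the two node classes and the tiny employee example are illustrative only, not a full B+ tree implementation.

class LeafNode:
    def __init__(self, records):                 # records: {key: full record}
        self.records = records

class InternalNode:
    def __init__(self, keys, children):          # len(children) == len(keys) + 1
        self.keys = keys
        self.children = children

def search(node, key):
    while isinstance(node, InternalNode):        # every leaf is at the same depth
        i = 0
        while i < len(node.keys) and key >= node.keys[i]:
            i += 1
        node = node.children[i]
    return node.records.get(key)                 # the record lives only in the leaf

leaf1 = LeafNode({1001: "employee 1001", 1002: "employee 1002"})
leaf2 = LeafNode({1003: "employee 1003", 1004: "employee 1004"})
root = InternalNode([1003], [leaf1, leaf2])
print(search(root, 1002))                        # -> "employee 1002"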

Advantages

• Searching is faster: As we discussed earlier, since all the leaf nodes are at minimal
distance from the root node, searching a record is faster in B+ tree file organization.

• Flexible: Adding new records and removing old records can be easily done in a B+ tree
as the B+ tree is flexible in terms of size; it can grow and shrink based on the records
that needs to be stored. It has no restriction on the amount of the records that can be
stored.

• Allows range retrieval: It allows range retrieval. For example, if there is a requirement
to fetch all the records from a certain range, then it can be easily done using this file
organization method.

• Allows partial searches: Similar to ISAM, this also allows partial searches. For
example, we can search all the employees where id starts with “10“.

• Better performance: This file organization method gives better performance than other
file organization methods for insert, update, and delete operations.

• Re-organization of records is not necessary to maintain performance.

Disadvantages

• Extra insertion and deletion cause space overhead.

• This method is not suitable for static tables as it is not efficient for static tables compared
to other file organization methods.

9.3. DATA DICTIONARY STORAGE

A data dictionary is defined as a DBMS component that stores the definitions of data
characteristics and relationships. It is simply "data about data", i.e., metadata. The DBMS data
dictionary provides the DBMS with its self-describing characteristic. In effect, the data
dictionary resembles an X-ray of the company's entire data set, and is a crucial element in the
data administration function.

Two main types of data dictionary exist: integrated and stand-alone.

1) An integrated data dictionary is included with the DBMS. For example, all relational
DBMSs include a built-in data dictionary or system catalog that is frequently accessed
and updated by the RDBMS.

2) Other DBMSs, especially older types, do not have a built-in data dictionary; instead, the
DBA may use third-party stand-alone data dictionary systems.

Data dictionaries can also be classified as active or passive. An active data dictionary is
automatically updated by the DBMS with every database access, thereby keeping its access
information up-to-date. A passive data dictionary is not updated automatically and usually
requires a batch process to be run. Data dictionary access information is normally used by the
DBMS for query optimization purposes.

The data dictionary's main function is to store the description of all objects that interact
with the database. Integrated data dictionaries tend to limit their metadata to the data managed
by the DBMS. Stand-alone data dictionary systems are usually more flexible and allow
the DBA to describe and manage all of the organization's data, whether or not it is
computerized. Whatever the data dictionary's format, its existence provides database designers
and end users with a much improved ability to communicate. In addition, the data dictionary is
the tool that helps the DBA to resolve data conflicts. Although there is no standard format for
the information stored in the data dictionary, several features are common. For example, the data
dictionary typically stores descriptions of all:

• Data elements that are defined in all tables of all databases. Specifically, the data
dictionary stores the name, data types, display formats, internal storage formats, and
validation rules. The data dictionary tells where an element is used, by whom it is used
and so on.

• Tables defined in all databases. For example, the data dictionary is likely to store the
name of the table creator, the date of creation, access authorizations, the number of
columns, and so on.

• Indexes defined for each database table. For each index, the DBMS stores at least the
index name, the attributes used, the location, specific index characteristics, and the
creation date.

• Defined databases: who created each database, the date of creation, where the database is
located, who the DBA is, and so on.

• End users and the administrators of the database.

• Programs that access the database, including screen formats, report formats, application
formats, SQL queries, and so on.

• Access authorization for all users of all databases.



• Relationships among data elements: which elements are involved, whether the
relationship is mandatory or optional, the connectivity and cardinality and so on.

If the data dictionary can be organized to include data external to the DBMS itself, it
becomes an especially flexible tool for more general corporate resource management. The
management of such an extensive data dictionary thus makes it possible to manage the use and
allocation of all of the organization’s information, regardless of whether it has its roots in the
database data.
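As a concrete illustration of an integrated data dictionary, the short sketch below (a minimal example, assuming Python's built-in sqlite3 module and a throwaway in-memory table; the table and index names are invented) queries SQLite's system catalog, sqlite_master, which stores metadata about every table, index, and view in the database.

import sqlite3

# Create a throwaway database with one table and one index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("CREATE INDEX idx_salary ON employee (salary)")

# The integrated data dictionary (system catalog) is itself queried with SQL:
# "data about data".
for obj_type, name, sql in conn.execute("SELECT type, name, sql FROM sqlite_master"):
    print(obj_type, name, sql)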

9.4. COLUMN ORIENTED STORAGE

A column-oriented (columnar) database is a type of database that stores data using a column-
oriented model. It speeds up the time required to return a particular query and greatly improves
disk I/O performance, which makes it helpful in data analytics and data warehousing. The major
motive of a columnar database is to read and write data effectively. Examples of columnar
databases include MonetDB, Apache Cassandra, SAP HANA, and Amazon Redshift.

9.4.1. The Structure of a Column Store Database

Column store databases use a concept called a keyspace. A keyspace is somewhat like a
schema in the relational model. The keyspace contains all the column families (roughly analogous to
tables in the relational model), which contain rows and columns, as shown in Figure 9.12.

Figure 9.12 Keyspace in Column Store Database



Let us take a closer look at a column family. Consider the Figure 9.13.

Figure 9.13 Column Family

The above diagram shows:

• A column family consists of multiple rows.

• Each row can contain a different number of columns from the other rows, and the columns
do not have to match the columns in the other rows (i.e., they can have different column
names, data types, etc.).

• Each column is confined to its row; it does not span all rows as a column does in a relational database.
Each column contains a name/value pair, along with a timestamp. Note that this example
uses Unix/Epoch time for the timestamp.

Row Construction

The figure 9.14 shows the breakdown of each element in the row.

Figure 9.14 Row in Column Family

The elements present in row are as follows:

• Row Key. Each row has a unique key, which is a unique identifier for that row.

• Column. Each column contains a name, a value, and timestamp.

• Name. This is the name of the name/value pair.

• Value. This is the value of the name/value pair.

• Timestamp. This provides the date and time that the data was inserted. This can be
used to determine the most recent version of data.
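As an illustration only (plain Python dictionaries with invented field names, not the storage format of any particular columnar DBMS), one row of a column family can be modelled as a row key mapping to its own set of columns, each carrying a name/value pair and a timestamp:

import time

# One row of a column family: a unique row key maps to its own set of columns.
# Each column holds a name/value pair plus a timestamp; rows in the same
# family need not share the same columns.
row_1001 = {
    "row_key": "emp:1001",
    "columns": {
        "name":   {"value": "Hayes",      "timestamp": int(time.time())},
        "branch": {"value": "Perryridge", "timestamp": int(time.time())},
    },
}

row_1002 = {
    "row_key": "emp:1002",
    "columns": {                          # a different set of columns is allowed
        "name":  {"value": "Lindsay",  "timestamp": int(time.time())},
        "phone": {"value": "555-0142", "timestamp": int(time.time())},
    },
}

# The timestamp can be used to determine the most recent version of a value.
def column_value(row, column_name):
    col = row["columns"].get(column_name)
    return None if col is None else col["value"]

print(column_value(row_1001, "branch"))   # Perryridge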

9.4.2. Advantages of Columnar Database

• Columnar databases can be used for many different tasks; in particular, they receive great
attention in applications related to big data.

• The data in a columnar database is highly compressible, and aggregate operations such as
AVG, MIN, and MAX can still be carried out efficiently on the compressed data.

• Efficiency and Speed: The speed of Analytical queries that are performed is faster in
columnar databases.

• Self-indexing: Another benefit of a column-based DBMS is self-indexing, which uses
less disk space than a relational database management system containing the same data.

9.4.3. Limitation of Columnar Database

• For loading incremental data, traditional databases are more relevant as compared to
column-oriented databases.

• For Online transaction processing (OLTP) applications, Row oriented databases are
more appropriate than columnar databases.
CHAPTER – X
INDEXING AND HASHING

10.1. INTRODUCTION

An index for a file works like a catalogue in a library. Cards in alphabetic order tell us
where to find books by a particular author.

In real-world databases, indices like this might be too large to be efficient. We'll look at
more sophisticated indexing techniques. There are two kinds of indices.

• Ordered indices: indices are based on a sorted ordering of the values.

• Hash indices: indices are based on the values being distributed uniformly across a range
of buckets. The bucket to which a value is assigned is determined by a function, called
a hash function.

We will consider several indexing techniques. No one technique is the best. Each
technique is best suited for a particular database application. Methods will be evaluated
based on:

• Access Types - types of access that are supported efficiently, e.g., value-based search or
range search.

• Access Time - time to find a particular data item or set of items.

• Insertion Time - time taken to insert a new data item (includes time to find the right
place to insert).

• Deletion Time - time to delete an item (includes time taken to find item, as well as to
update the index structure).

• Space overhead - additional space occupied by an index structure.

We may have more than one index or hash function for a file. (The library may have
card catalogues by author, subject or title.)

The attribute or set of attributes used to look up records in a file is called the search key
(not to be confused with primary key, etc.).

10.2 ORDERED INDICES


• In order to allow fast random access, an index structure may be used.
• A file may have several indices on different search keys.
• If the file containing the records is sequentially ordered, the index whose search key
specifies the sequential order of the file is the primary index, or clustering index.
Note: The search key of a primary index is usually the primary key, but it is not necessarily so.
Indices whose search key specifies an order different from the sequential order of the
file are called the secondary indices, or non clustering indices.
There are two types of ordered indices.
Dense Index
• An index record appears for every search key value in file.
• This record contains search key value and a pointer to the actual record.
Sparse Index
• Index records are created only for some of the records.
• To locate a record, we find the index record with the largest search key value less than
or equal to the search key value we are looking for.
• We start at that record pointed to by the index record, and proceed along the pointers in
the file (that is, sequentially) until we find the desired record.
Fig 10.2 and 10.3 show dense and sparse indices for the deposit file shown in fig 10.1.
Notice how we would find records for Perryridge branch using both methods.
Dense indices are faster in general, but sparse indices require less space and impose less
maintenance for insertions and deletions.
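The sparse-index lookup rule just described can be sketched as follows (an illustration only, assuming the file is an in-memory Python list of records sorted on the search key, with one index entry per block of three records):

from bisect import bisect_right

# Sequential file of (search_key, record) pairs, sorted on the search key.
records = [(101, "Johnson"), (102, "Hayes"), (110, "Peterson"),
           (201, "Williams"), (215, "Smith"), (217, "Green"),
           (218, "Lyle"), (222, "Lindsay"), (305, "Turner")]

# Sparse index: one (first key, position) entry per "block" of 3 records.
BLOCK = 3
sparse_index = [(records[i][0], i) for i in range(0, len(records), BLOCK)]
index_keys = [k for k, _ in sparse_index]

def lookup(key):
    # Find the index entry with the largest key <= the search key ...
    i = bisect_right(index_keys, key) - 1
    if i < 0:
        return None
    # ... then scan the file sequentially from the position it points to.
    for k, rec in records[sparse_index[i][1]:]:
        if k == key:
            return rec
        if k > key:
            return None
    return None

print(lookup(215))   # Smith
print(lookup(999))   # None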

Fig. 10.1 Sequential file for deposit records.



Fig. 10.2 Dense index

Fig. 10.3 Sparse index

A good compromise is

• To have a sparse index with one entry per block.

• Biggest cost is in bringing a block into main memory.

• We are guaranteed to have the correct block with this method, unless record is on an
overflow block (actually could be several blocks).

• Index size is small.



Multi-Level Indices
1. Even with a sparse index, index size may still grow too large. For 100,000 records, 10
per block, at one index record per block, that's 10,000 index records. Even if we can fit
100 index records per block, this is 100 blocks.
2. If index is too large to be kept in main memory, a search results in several disk reads.
If there are no overflow blocks in the index, we can use binary search.
This will read as many as ⌈log2(b)⌉ blocks (as many as 7 for our 100 blocks).
If index has overflow blocks, then sequential search typically used, reading all b index
blocks.
The solution is to construct a sparse index on the index (Figure 10.4).
• Use binary search on outer index. Scan index block found until correct index record
found. Use index record as before - scan block pointed to for desired record.
• For very large files, additional levels of indexing may be required.
• Indices must be updated at all levels when insertions or deletions require it.
• Frequently, each level of index corresponds to a unit of physical storage (e.g.
indices at the level of track, cylinder and disk).

Fig. 10.4 Two-level sparse index



Updation

Regardless of what form of index is used, every index must be updated whenever a
record is either inserted into or deleted from the file.

Deletion

• Find (look up) the record

• If the last record with a particular search key value, delete that search key value from
index.

• For dense indices, this is like deleting a record in a file.

• For sparse indices, delete a key value by replacing key value's entry in index by next
search key value. If that value already has an index entry, delete the entry.

Insertion

• Find place to insert.

• Dense index: insert search key value if not present.

• Sparse index: no change unless new block is created. (In this case, the first search key
value appearing in the new block is inserted into the index).

Secondary Indices

• If the search key of a secondary index is not a candidate key, it is not enough to point to
just the first record with each search-key value because the remaining records with the
same search-key value could be anywhere in the file. Therefore, a secondary index must
contain pointers to all the records.

• We can use an extra level of indirection to implement secondary indices on search keys
that are not candidate keys. A pointer does not point directly to the file but to a bucket
that contains pointers to the file.

• To perform a lookup on Peterson, we must read all three records pointed to by entries in
bucket2.

• Only one entry points to a Peterson record, but three records need to be read.

• As file is not ordered physically by cname, this may take 3 block accesses.

• Secondary indices must be dense, with an index entry for every search-key value, and a
pointer to every record in the file.

• Secondary indices improve the performance of queries on non-primary keys.

• They also impose serious overhead on database modification: whenever a file is updated,
every index must be updated.

• Designer must decide whether to use secondary indices or not.

Examples of secondary sparse indices are shown in Fig. 10.5 a) and Fig. 10.5 b).

Index entries (cname): Green, Lindsay, Smith

File records (branch, account, cname, amount):
Brighton    217   Green     750
Downtown    101   Johnson   500
Downtown    110   Peterson  600
Mianus      215   Smith     700
Perriridge  102   Hayes     400
Perriridge  201   Williams  900
Perriridge  218   Lyle      700
Redwood     222   Lindsay   700
Round Hill  305   Turner    350

Fig. 10.5 a) Sparse secondary index on cname

Fig. 10.5 b Sparse secondary index on amount



10.3 B+ TREE INDEX FILES

The primary disadvantage of index-sequential file organization is that performance
degrades as the file grows, and periodic reorganization of the entire file is required. This is not the
case in B+ trees, which automatically reorganize themselves with small, local changes in the face of
insertions and deletions; reorganization of the entire file is not required. The features
of B+ trees are:

1. B+ tree file structure maintains its efficiency despite frequent insertions and deletions.
It imposes some acceptable update and space overheads.

2. A B+ tree index is a balanced tree in which every path from the root to a leaf is of the
same length.

3. Each non-leaf node in the tree must have between ⌈n/2⌉ and n children and holds up to
n−1 search keys, where n is fixed for a particular tree.

4. Special cases: if the root is not a leaf, it has at least 2 children. If the root is a leaf (that
is, there are no other nodes in the tree), it can have between 0 and (n − 1) values

10.3.1 Structure of a B+Tree

1. A B+ tree index is a multilevel index but is structured differently from that of multi-level
index sequential files.

2. A typical node (Figure 10.6) contains up to n−1 search key values K1, K2, …, Kn−1, and n
pointers P1, P2, …, Pn. Search key values in a node are kept in sorted order.

Fig. 10.6 Typical node of a B+ tree

• Ki are the search-key values

• Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of
records (for leaf nodes)

3. For leaf nodes, Pi (i = 1,…,n-1) points to either a file record with search key value Ki,
or a bucket of pointers to records with that search key value. Bucket structure is used if
search key is not a primary key, and file is not sorted in search key order. Pointer Pn (nth
pointer in the leaf node) is used to chain leaf nodes together in linear order (search key
order). This allows efficient sequential processing of the file.

4. Non-leaf nodes form a multilevel index on leaf nodes.

A non-leaf node may hold up to n pointers and must hold at least ⌈n/2⌉ pointers. The number of
pointers in a node is called the fan-out of the node.

Consider a node containing m pointers. Pointer Pi (for i = 2, …, m−1) points to a subtree
containing search key values greater than or equal to Ki−1 and less than Ki. Pointer Pm points to a subtree containing
search key values greater than or equal to Km−1. Pointer P1 points to a subtree containing search key values less than K1.

The following figures represent B+ trees with n=3 and n=5.

Fig. 10.7 B+ tree with n=3

Fig. 10.8 B+ tree with n=5

10.3.2 Queries on B+ Trees

The following function returns the pointer to the record (or bucket) with search-key value v, if one exists.

function find(v)

1. Set C = root

2. while (C is not a leaf node)

   a. Let i be the least number such that v ≤ Ki

   b. if there is no such number i then set C = last non-null pointer in C

   c. else if (v = C.Ki) then set C = C.Pi+1

   d. else set C = C.Pi

3. if for some i, Ki = v then return C.Pi

4. else return null /* no record with search-key value v exists */
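The same lookup can be written as a short runnable sketch (an illustration only, using a tiny hand-built tree of Python dictionaries rather than any real DBMS structure; internal nodes hold keys and children, leaf nodes hold keys and records):

from bisect import bisect_right

def find(node, v):
    # Descend from the root until a leaf is reached.
    while "children" in node:
        # Choose the child whose key range contains v; a key equal to
        # keys[i] is found in the subtree to the right of that key.
        i = bisect_right(node["keys"], v)
        node = node["children"][i]
    # In the leaf, return the record for v if it is present.
    if v in node["keys"]:
        return node["records"][node["keys"].index(v)]
    return None   # no record with search-key value v

leaf1 = {"keys": [1001, 1002], "records": ["rec-1001", "rec-1002"]}
leaf2 = {"keys": [1003, 1004], "records": ["rec-1003", "rec-1004"]}
root  = {"keys": [1003], "children": [leaf1, leaf2]}

print(find(root, 1004))   # rec-1004
print(find(root, 1005))   # None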

10.3.3 Updates on B+ Trees

1. Insertions and Deletions

Insertion and deletion are more complicated, as they may require splitting or combining
nodes to keep the tree balanced.

Perform the following steps to insert the key value.

Find the leaf node in which the search-key value would appear

1. If there is room in the leaf node, insert the value in the leaf node

2. Otherwise, split the node, then insert and propagate updates to parent nodes.

If splitting or combining are not required, deletion works as follows:

1. Find record to be deleted, and remove it from the bucket.

2. If bucket is now empty, remove search key value from leaf node.

2. Insertions Causing Splitting the Nodes

When insertion causes a leaf node to be too large, we split that node. Assume we wish
to insert a record with a value of “Clearview". There is no room for it in the leaf node
where it should appear. We now have n values (the n-1 search key values plus the new
one we wish to insert). We put the first ⌈n/2⌉ values in the existing node, and the remainder
into a new node. The Figure 10.9 shows the B+ Tree before and after insertion of
“Clearview”.

Fig. 10.9 B+ Tree before and after insertion of “Clearview”

The new node must be inserted into the B+ tree. We also need to update search key
values for the parent (or higher) nodes of the split leaf node (Except if the new node is
the leftmost one). Order must be preserved among the search key values in each node.
If the parent was already full, it will have to be split. When a non-leaf node is split, the
children are divided among the two new nodes. In the worst case, splits may be required
all the way up to the root. (If the root is split, the tree becomes one level deeper.)

Note: when we start a B+ tree, we begin with a single node that is both the root and a
single leaf. When it gets full and another insertion occurs, we split it into two leaf nodes,
requiring a new root.

3. Deletions Causing Combining Nodes

Deleting records may cause tree nodes to contain too few pointers. Then we must
combine nodes. The result of deleting “Downtown" from the B+ tree of Fig 10.9 is
shown in Fig. 10.10.

Fig. 10.10 After deleting Downtown



In this case, the leaf node is empty and must be deleted. If we wish to delete “Perryridge"
from the B+ tree of Figure 10.9 the parent is left with only one pointer, and must be
coalesced with a sibling node. Sometimes higher-level nodes must also be coalesced. If
the root becomes empty as a result, the tree is one level less deep (Figure 10.11).
Sometimes the pointers must be redistributed to keep the tree balanced. Deleting
“Perryridge" from Figure 10.9 produces Figure 10.11.

Fig. 10.11 After deleting Perryridge

10.3.4 B+ Tree File Organization

1. The B+ tree structure is used not only as an index but also as an organizer for records
into a file.

2. In a B+ tree file organization, the leaf nodes of the tree store record instead of storing
pointers to records.

3. Since records are usually larger than pointers, the maximum number of records that can
be stored in a leaf node is less than the maximum number of pointers in a nonleaf node.

4. However, the leaf node is still required to be at least half full.

5. Insertion and deletion from a B+ tree file organization are handled in the same way as
that in a B+ tree index.

6. When a B+ tree is used for file organization, space utilization is particularly important.
We can improve the space utilization by involving more sibling nodes in redistribution
during splits and merges.

7. In general, if m nodes are involved in redistribution, each node can be guaranteed to
contain at least ⌊(m−1)n/m⌋ entries. However, the cost of update becomes higher as
more siblings are involved in redistribution.

10.4 B TREE INDEX FILES

B tree indices are similar to B+ tree indices. The general structure of B tree is given in
Fig.10.12.

Fig. 10.12 Leaf and non leaf node of a B tree.

The difference is that the B tree eliminates the redundant storage of search key values. In the
B+ tree of Fig. 10.7, some search key values appear twice, whereas the B tree allows search key
values to appear only once, as shown in Fig. 10.13. Thus, we can store the index in less space.

Fig. 10.13 - B tree representation of Fig. 10.7


Advantages
• May use less tree nodes than a corresponding B+-Tree.
• Sometimes possible to find search-key value before reaching leaf node.
Disadvantages
• Only small fraction of all search-key values are found early
• Non-leaf nodes are larger, so fan-out is reduced. Thus, B-Trees typically have greater
depth than corresponding B+-Tree
• Insertion and deletion more complicated than in B+-Trees
• Implementation is harder than B+-Trees.

The advantages of B-trees do not outweigh their disadvantages. Generally, the structural
simplicity of the B+ tree is preferred.

10.5 HASHING

Hashing is a technique used for performing insertions, deletions and search operations in
constant average time by implementing a hash table data structure. Instead of comparisons, it
uses a mathematical function.

The idea behind hashing is to provide a function h called a hash function or randomizing
function that is applied to the hash field value of a record and yields the address of the disk block
in which the record is stored. Hashing is typically implemented as a hash table through the use
of an array of records. Suppose that the array index range is from 0 to M−1 (Fig. 10.14); then we
have M slots whose addresses correspond to the array indexes. We choose a hash function that
transforms the hash field value into an integer between 0 and M−1. One common hash function
is

h(k) = k mod M

which returns the remainder of the integer hash field value k after division by M; this
value is then used for the record address. Non-integer hash field values can be transformed into
integers before the mod function is applied. For character strings, the numeric (ASCII) codes
associated with character can be used in the transformation.

Fig 10.14 Hashing
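A minimal sketch of such a hash function (illustrative only; M = 7 slots is an arbitrary choice, and non-integer keys are folded to an integer by summing the ASCII codes of their characters):

M = 7   # number of slots whose addresses are 0 .. M-1

def h(key):
    # Map a key to a bucket address in the range 0 .. M-1.
    if isinstance(key, int):
        return key % M
    # Non-integer keys: transform the characters into an integer first,
    # then apply the mod function.
    return sum(ord(ch) for ch in str(key)) % M

print(h(1002))      # 1002 mod 7 = 1
print(h("Hayes"))   # address derived from the ASCII codes of "Hayes"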

There are two types of hashing

1. Static hashing – the hash function maps search key value to a fixed set of locations

2. Dynamic hashing – the hash table can grow to handle more items at run time.

A collision occurs when the hash field value of a record that is being inserted hashes to
an address that already contains a different record. In this situation, we must insert the new
record in some other position, since its hash address is occupied. The process of finding another
position is called collision resolution. The methods for collision resolution are as follows:

1. Open addressing: Proceeding from the occupied position specified by the hash address,
the program checks the subsequent positions in order until an unused position is found.

Algorithm: Collision resolution by open addressing.


i ← hash_address(K);
a ← i;
if location i is occupied
then begin i ← (i + 1) mod M;
    while (i ≠ a) and location i is occupied
        do i ← (i + 1) mod M;
    if (i = a) then all positions are full
    else new_hash_address ← i;
end;

2. Chaining: Various overflow locations are kept, usually by extending the array with a
number of overflow positions. A pointer field is added to each record location, and the
pointer of the occupied hash address location is set to the address of that overflow
location.

3. Multiple hashing: The program applies a second hash function if the first results in a
collision. If another collision results, the program uses open addressing or applies a third
hash function and then uses open addressing if necessary.
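The open-addressing scheme described above can be made concrete with a short sketch (illustrative only; a fixed array of M = 7 slots, integer keys, h(K) = K mod M, and linear probing):

M = 7
table = [None] * M          # one slot per hash address

def insert(key):
    # Insert key using open addressing (linear probing).
    i = key % M             # hash address
    a = i
    while table[i] is not None:      # position occupied: check the next one
        i = (i + 1) % M
        if i == a:                   # wrapped around: all positions are full
            raise OverflowError("hash table is full")
    table[i] = key
    return i

for k in (16, 4, 23, 9):
    print(k, "-> slot", insert(k))
print(table)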

10.5.1 Static Hashing

Index schemes force us to traverse an index structure. Hashing avoids this kind of
unnecessary traversal.

Hash File Organization

Hashing involves computing the address of a data item by computing a function on the
search key value. A bucket is a unit of storage containing one or more entries (a bucket is
typically a disk block). A hash function h is a function from the set of all search key values K
to the set of all bucket addresses B as shown in Fig 10.15.

We choose a number of buckets to correspond to the number of search key values we
will have stored in the database. To perform a lookup on a search key value Ki, we compute
h(Ki), and search the bucket with that address.

If two search keys i and j map to the same address, because h(Ki) = h(Kj), then the bucket
at the address obtained will contain records with both search key values. In this case we will
have to check the search key value of every record in the bucket to get the ones we want.
Insertion and deletion are simple.

Fig 10.15 Static Hashing

Hash Functions

A good hash function gives an average-case lookup that is a small constant, independent
of the number of search keys. We hope records are distributed uniformly among the buckets.
The worst hash function maps all keys to the same bucket. The best hash function maps all keys
to distinct addresses. Ideally, distribution of keys to addresses is uniform and random.

Suppose we have 26 buckets, and map names beginning with ith letter of the alphabet
to the ith bucket.

Problem: This does not give uniform distribution. Many more names will be mapped to
“A" than to “X".

Typical hash functions perform some operation on the internal binary machine
representations of characters in a key. For example, compute the sum, modulo # of buckets, of
the binary representations of characters of the search key.

Handling of bucket overflows

1. Open hashing occurs where records are stored in different buckets. Compute the hash
function and search the corresponding bucket to find a record.

2. Closed hashing occurs where all records are stored in one bucket. Hash function
computes addresses within that bucket. (Deletions are difficult.) Not used much in
database applications.

Drawbacks of static hashing:

• Hash function must be chosen at implementation time.

• Number of buckets is fixed, but the database may grow.

• If number is too large, we waste space. If number is too small, we get too many
“collisions", resulting in records of many search key values being in the same bucket.
Choosing the number to be twice the number of search key values in the file gives a
good space/performance tradeoff.

Hash Indices

1. A hash index organizes the search keys with their associated pointers into a hash file
structure.

2. We apply a hash function on a search key to identify a bucket, and store the key and its
associated pointers in the bucket (or in overflow buckets).

3. Strictly speaking, hash indices are only secondary index structures, since if a file itself
is organized using hashing, there is no need for a separate hash index structure on it.

10.5.2 Dynamic Hashing

1. As the database grows over time, we have three options:

• Choose hash function based on current file size. Get performance degradation as
file grows.

• Choose hash function based on anticipated file size. Space is wasted initially.

• Periodically re-organize hash structure as file grows. Requires selecting new hash
function, recomputing all addresses and generating new bucket assignments.
Costly, and shuts down database.

2. Some hashing techniques allow the hash function to be modified dynamically to
accommodate the growth or shrinking of the database. These are called dynamic hash
functions.

Extendable hashing is one form of dynamic hashing. Extendable hashing splits and
coalesces buckets as database size changes. This imposes some performance overhead,
but space efficiency is maintained. As reorganization is on one bucket at a time,
overhead is acceptably low.

Figure 10.16 shows an extendable hash structure. Note that the i appearing over the
bucket address table tells how many bits are required to determine the correct bucket. It
may be the case that several entries point to the same bucket. All such entries will have
a common hash prefix, but the length of this prefix may be less than i.

So, we give each bucket an integer giving the length of the common hash prefix. This is
shown in Figure 10.16 as ij. The number of bucket address table entries pointing to bucket j is then
2^(i − ij).

Fig. 10.16 General extendable hash structure

4. To find the bucket containing search key value Kl:

Compute h(Kl).

Take the first i high order bits of h(Kl).

Look at the corresponding table entry for this i-bit string.

Follow the bucket pointer in the table entry.



5. We now look at insertions in an extendable hashing scheme.

Follow the same procedure for lookup, ending up in some bucket j.

If there is room in the bucket, insert information and insert record in the file.

If the bucket is full, we must split the bucket, and redistribute the records.

If bucket is split we may need to increase the number of bits we use in the hash.

6. Two cases exist:

1. If i = ij, then only one entry in the bucket address table points to bucket j.

Then we need to increase the size of the bucket address table so that we can include
pointers to the two buckets that result from splitting bucket j.

We increment i by one, thus considering more of the hash, and doubling the size of
the bucket address table.

Each entry is replaced by two entries, each containing original value.

Now two entries in bucket address table point to bucket j.

We allocate a new bucket z, and set the second pointer to point to z.

Set ij and iz to i.

Rehash all records in bucket j which are put in either j or z.

Now insert new record.

It is remotely possible, but unlikely, that the new hash will still put all of the records
in one bucket. If so, split again and increment i again.

2. If i > ij, then more than one entry in the bucket address table points to bucket j.

Then we can split bucket j without increasing the size of the bucket address table.

Note that all entries that point to bucket j correspond to hash prefixes that have the
same value on the leftmost ij bits.

We allocate a new bucket z, and set ij and iz to the original ij value plus 1.

Now adjust entries in the bucket address table that previously pointed to bucket j.

Leave the first half pointing to bucket j, and make the rest point to bucket z.

Rehash each record in bucket j as before.

Reattempt new insert.

Note that in both cases we only need to rehash records in bucket j.

The Deletion of records is similar. Buckets may have to be coalesced, and bucket
address table may have to be halved.

Example – Extendible Hashing: Now, let us consider an example of hashing the
following elements: 16, 4, 6, 22, 24, 10, 31, 7, 9, 20, 26.

Bucket Size: 3 (Assume)

Hash Function: Suppose the global depth is X. Then the Hash Function returns X LSBs.
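Keeping the X least significant bits of a key can be written as a one-line bit mask (a sketch only, where global_depth plays the role of X):

def hash_lsb(key, global_depth):
    # Keep the global_depth least significant bits of the key's binary form.
    return key & ((1 << global_depth) - 1)

# With global depth 1, 16 (10000) maps to directory 0; with depth 2,
# 22 (10110) maps to directory 10; with depth 3, 26 (11010) maps to 010.
print(hash_lsb(16, 1), hash_lsb(22, 2), hash_lsb(26, 3))   # 0 2 2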

Solution: First, calculate the binary forms of each of the given numbers.

16- 10000
4- 00100
6- 00110
22- 10110
24- 11000
10- 01010
31- 11111
7- 00111
9- 01001
20- 10100
26- 11010

Initially, the global depth and local depth are always 1. Thus, the hashing frame looks like this:

Inserting 16:

The binary format of 16 is 10000 and global-depth is 1. The hash function returns 1 LSB of
10000 which is 0. Hence, 16 is mapped to the directory with id=0.

Inserting 4 and 6:

Both 4(100) and 6(110) have 0 in their LSB. Hence, they are hashed as follows:

Inserting 22: The binary form of 22 is 10110. Its LSB is 0. The bucket pointed to by directory 0 is
already full. Hence, overflow occurs.

Since local depth = global depth, the bucket splits and directory expansion takes
place. Also, rehashing of the numbers present in the overflowing bucket takes place after the split.
Since the global depth is incremented by 1, the global depth is now 2. Hence, 16, 4, 6, 22
are rehashed w.r.t. 2 LSBs [16 (10000), 4 (00100), 6 (00110), 22 (10110)].

Note: The bucket which did not overflow has remained untouched. But, since the number
of directories has doubled, we now have two directories, 01 and 11, pointing to the same bucket.
This is because the local depth of that bucket has remained 1, and any bucket having a local
depth less than the global depth is pointed to by more than one directory.

Inserting 24 and 10: 24 (11000) and 10 (01010) can be hashed based on directories with
id 00 and 10. Here, we encounter no overflow condition.

Inserting 31,7,9: All of these elements[ 31(11111), 7(111), 9(1001) ] have either 01 or
11 in their LSBs. Hence, they are mapped on the bucket pointed out by 01 and 11. We do not
encounter any overflow condition here.

Inserting 20: Insertion of data element 20 (10100) will again cause the overflow
problem. Since the local depth of the bucket = global depth, directory expansion (doubling)
takes place along with bucket splitting. Elements present in the overflowing bucket are rehashed
with the new global depth. Now, the new hash table looks like this:

Inserting 26: Global depth is 3. Hence, 3 LSBs of 26(11010) are considered. Therefore
26 best fits in the bucket pointed out by directory 010.

Since the local depth of the bucket < global depth (2 < 3), the directories are not doubled;
only the bucket is split and its elements are rehashed.

Finally, the output of hashing the given list of numbers is obtained.



Advantages

• Extendable hashing provides performance that does not degrade as the file grows.

• Minimal space overhead – no buckets need be reserved for future use. The bucket address
table contains only one pointer for each hash value of the current prefix length.

Disadvantages:

• Extra level of indirection in the bucket address table

• Added complexity

Comparison of Indexing and Hashing

1. To make a wise choice between the methods seen, database designer must consider the
following issues:

• Is the cost of periodic re-organization of index or hash structure acceptable?

• What is the relative frequency of insertion and deletion?

• Is it desirable to optimize average access time at the expense of increasing worst-case
access time?

• What types of queries are users likely to pose?

2. The last issue is critical to the choice between indexing and hashing. If most queries are
of the form select A1, A2, …, An from r where Ai = c, then to process this query the
system will perform a lookup on an index or hash structure for attribute Ai with value c.

3. For these sorts of queries a hashing scheme is preferable.

Index lookup takes time proportional to log of number of values in R for Ai.

Hash structure provides lookup average time that is a small constant (independent of
database size).

4. However, the worst-case favours indexing:

Hash worst-case gives time proportional to the number of values in R for Ai.

Index worst case still log of number of values in R.

5. Index methods are preferable where a range of values is specified in the query, e.g. select
A1, A2, …, An from r where Ai ≤ c2 and Ai ≥ c1. This query finds records with Ai values
in the range from c1 to c2.

6. Using an index structure, we can find the bucket for value c1, and then follow the pointer
chain to read the next buckets in alphabetic (or numeric) order until we find c2.

If we have a hash structure instead of an index, we can find a bucket for c1 easily, but
it is not easy to find the “next bucket".

A good hash function assigns values randomly to buckets.

Also, each bucket may be assigned many search key values, so we cannot chain them
together.

To support range queries using a hash structure, we need a hash function that preserves
order.

For example, if K1 and K2 are search key values and K1 < K2 then h(K1) < h(K2).

Such a function would ensure that buckets are in key order.

Order-preserving hash functions that also provide randomness and uniformity are
extremely difficult to find.

Thus, most systems use indexing in preference to hashing unless it is known in advance
that range queries will be infrequent.
CHAPTER – XI
QUERY PROCESSING AND
OPTIMIZATION

11.1. QUERY PROCESSING

Query Processing is the activity performed in extracting data from the database. The
steps involved are:

1. Parsing and translation

2. Optimization

3. Evaluation

Parsing and Translation

Before query processing can begin, the system must translate the query into a usable
form. A language such as SQL is suitable for human use, but is ill suited to be the system’s
internal representation of a query. A more useful internal representation is one based on the
extended relational algebra.

Thus, the first action the system must take in query processing is to translate a given
query into its internal form. This translation process is similar to the work performed by the
parser of a compiler. In generating the internal form of the query, the parser checks the syntax
of the user’s query, verifies that the relation names appearing in the query are names of the
relations in the database, and so on. The system constructs a parse-tree representation of the
query, which it then translates into a relational-algebra expression. If the query was expressed
in terms of a view, the translation phase also replaces all uses of the view by the relational-
algebra expression that defines the view.

The steps involved in processing a query appear in Figure 11.1.



Figure 11.1 – Steps in Query Processing

Suppose a user executes a query to fetch the salaries of the employees whose salary is
less than 75000. The SQL query is:

select salary from Employee where salary < 75000;

To make the system understand the user query, it needs to be translated into relational
algebra. This query can be written in either of the following equivalent relational algebra forms:

• σsalary<75000 (πsalary (Employee))

• πsalary (σsalary<75000 (Employee))

After translating the given query, we can execute each relational algebra operation by
using different algorithms.

Optimization

The different evaluation plans for a given query can have different costs. We do not
expect users to write their queries in a way that suggests the most efficient evaluation plan.
Rather, it is the responsibility of the system to construct a query evaluation plan that minimizes
the cost of query evaluation; this task is called query optimization

In order to optimize a query, a query optimizer must know the cost of each operation.
Although the exact cost is hard to compute, since it depends on many parameters such as actual
memory available to the operation, it is possible to get a rough estimate of execution cost for
each operation. Usually, a database system generates an efficient query evaluation plan, which

minimizes its cost. This type of task is performed by the database system and is known as query
optimization.

The cost of the query evaluation can vary for different types of queries. For optimizing
a query, the query optimizer should have an estimated cost analysis of each operation. It is
because the overall operation cost depends on the memory allocations to several operations,
execution costs, and so on.

Evaluation

For this, in addition to the relational algebra translation, it is required to annotate the
translated relational algebra expression with the instructions used for specifying and evaluating
each operation.

• In order to fully evaluate a query, the system needs to construct a query evaluation plan.

• The annotations in the evaluation plan may refer to the algorithms to be used for the
particular index or the specific operations.

• Such relational algebra with annotations is referred to as evaluation primitives. The
evaluation primitives carry the instructions needed for the evaluation of the operation.

• Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query execution plan.

• A query execution engine is responsible for generating the output of the given query. It
takes the query execution plan, executes it, and finally makes the output for the user
query.

Finally, after selecting an evaluation plan, the system evaluates the query and produces
the output of the query.

11.2 SELECTION OPERATION IN QUERY PROCESSING

Estimating the cost of a query plan should be done by measuring the total resource
consumption. Generally, the selection operation is performed by the file scan. File scans are the
search algorithms that are used for locating and accessing the data. It is the lowest-level operator
used in query processing.

11.2.1 Selection using File scans and Indices

In RDBMS or relational database systems, the file scan reads a relation only if the whole
relation is stored in one file only. When the selection operation is performed on a relation whose
tuples are stored in one file, it uses the following algorithms:

• Linear Search: In a linear search, the system scans each record to test whether it satisfies
the given selection condition. An initial seek is needed to access the first block of the file,
and if the blocks of the file are not stored contiguously, extra seeks are needed. Although
linear search is the slowest search algorithm, it is applicable in all cases: it does not depend
on the nature of the selection, the availability of indices, or the ordering of the file,
whereas the other algorithms are not applicable in all cases.

11.2.2 Selection Operation with Indexes

The index-based search algorithms are known as Index scans. Such index structures are
known as access paths. These paths allow locating and accessing the data in the file. There are
following algorithms that use the index in query processing:

Primary index, equality on a key:

We use the index to retrieve a single record that satisfies the equality condition for
making the selection. The equality comparison is performed on the key attribute carrying a
primary key.

Primary index, equality on nonkey:

The difference between equality on a key and on a nonkey is that with a nonkey we may fetch
multiple records. We fetch multiple records through the primary index when the selection
criterion specifies an equality comparison on a nonkey attribute.

Secondary index, equality on key or nonkey:

The selection that specifies an equality condition can use the secondary index. Using
secondary index strategy, we can either retrieve a single record when equality is on key or
multiple records when the equality condition is on nonkey. When retrieving a single record, the
time cost is equal to the primary index. In the case of multiple records, they may reside on
different blocks. This results in one I/O operation per fetched record, and each I/O operation
requires a seek and a block transfer.
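As a simple illustration of index-based equality selection (a sketch only, using Python's bisect over a sorted in-memory list of (search-key, record-pointer) pairs; the pointer strings are invented):

from bisect import bisect_left

# Ordered index: (search-key value, record pointer) pairs sorted on the key.
index = [(101, "blk1.rec0"), (102, "blk1.rec1"), (110, "blk2.rec0"),
         (201, "blk2.rec1"), (215, "blk3.rec0")]
keys = [k for k, _ in index]

def equality_select(key):
    # Primary index, equality on a key: retrieve the single matching pointer.
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return index[i][1]
    return None

print(equality_select(110))   # blk2.rec0
print(equality_select(111))   # None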

11.2.3 Selection Operations with Comparisons

For making any selection on the basis of a comparison in a relation, we can proceed it
either by using the linear search or via indices in the following ways:

Primary index, comparison:

When the selection condition given by the user is a comparison, we use a primary ordered
index, such as a primary B+-tree index. For example, when an attribute A of a relation
R is compared with a given value v as A > v, we use the primary index on A to locate the first
tuple satisfying the condition and then scan the file sequentially from that tuple to the end,
outputting all tuples that satisfy the given selection condition.

Secondary index, comparison:

The secondary ordered index is used for satisfying a selection operation that involves
<, >, ≤, or ≥. In this case, the index scan searches the blocks of the lowest-level index.

(<, ≤): In this case, it scans from the smallest value up to the given value v.

(>, ≥): In this case, it scans from the given value v up to the maximum value.

However, the use of the secondary index should be limited for selecting a few records.
It is because such an index provides pointers to point each record, so users can easily fetch the
record through the allocated pointers. Such retrieved records may require an I/O operation as
records may be stored on different blocks of the file. So, if the number of fetched records is
large, it becomes expensive with the secondary index.

11.2.4 Implementing Complex Selection Operations

Working on more complex selections involves three selection predicates, known as
conjunction, disjunction, and negation.

Conjunction:

A conjunctive selection is a selection having the form: σθ1 ∧ θ2 ∧ … ∧ θn (r)

A conjunction is the intersection of all records that satisfy the above selection conditions.

Disjunction:

A disjunctive selection is a selection having the form: σθ1 ∨ θ2 ∨ … ∨ θn (r)

A disjunction is the union of all records that satisfy at least one of the given selection conditions θi.

Negation:

The result of a selection σ¬θ(r) is the set of tuples of the given relation r for which the selection
condition θ evaluates to false. In the absence of nulls, this set is simply the set of tuples of
relation r that are not in σθ(r).

Complex selection operations are implemented by using the following algorithms

Conjunctive selection using one index:

In this implementation, we initially determine whether an access path is available for an
attribute in one of the simple conditions. If one is found, an index-based algorithm works
better for that condition. The selection operation is then completed by testing that each
retrieved record satisfies the remaining simple conditions. The cost of the chosen index-based
algorithm gives the cost of this method.

Conjunctive selection via Composite index:

A composite index is one that is built on multiple attributes. Such an index may be usable
for some conjunctive selections. If the selection specifies an equality condition on two or more
attributes and a composite index is present on these combined attribute fields, then the index
can be searched directly. The type of index determines which of the index-based algorithms
is suitable.

Conjunctive selection via the intersection of identifiers:

This implementation involves record pointers or record identifiers. It uses indices with
the record pointers on those fields which are involved in the individual selection condition. It
scans each index for pointers to tuples satisfying the individual condition. Therefore, the
intersection of all the retrieved pointers is the set of pointers to the tuples that satisfies the
conjunctive condition. The algorithm uses these pointers to fetch the actual records. However,
in absence of indices on each individual condition, it tests the retrieved records for the other
remaining conditions.

Disjunctive selection by the union of identifiers:

This algorithm scans each index for pointers to tuples that satisfy the individual condition,
but it is applicable only if access paths are available on all of the disjunctive selection conditions.
The union of all fetched pointers gives the set of pointers to all tuples that satisfy the disjunctive
condition, and these pointers are then used to fetch the actual records. If an access path is not
present for any one condition, we need to use a linear search to find the tuples that satisfy that
condition, and in that case it is better to use a single linear search for the whole test.

11.3. JOIN OPERATION IN QUERY PROCESSING

We use the term equi-join to refer to a join of the form r ⋈ r.A=s.B s,
where A and B are attributes or sets of attributes of relations r and s, respectively. We use the
following in our examples:

• Number of records of student: nstudent =5000

• Number of blocks of student: bstudent =100

• Number of records of takes: ntakes =10000

• Number of blocks of takes: btakes =400

11.3.1. Nested-Loop Join

It is used to compute the theta join of two relations r and s. This algorithm is called the
nested-loop join algorithm, since it basically consists of a pair of nested for loops. Relation r is
called the outer relation and relation s the inner relation of the join. tr · ts denotes the tuple
constructed by concatenating the attribute values of tuples tr and ts. In the worst case, the buffer
can hold only one block of each relation, and a total of nr ∗ bs + br block transfers, plus nr + br seeks, are required.

• In the worst case, the number of block transfers is 5000 ∗ 400+100 = 2,000,100, plus
5000+100 = 5100 seeks.

• If takes had been used as the outer relation instead, the worst-case cost would have been 10,000 ∗ 100 + 400 =
1,000,400 block transfers, plus 10,400 disk seeks.

11.3.2. Block Nested-Loop Join

Within each pair of blocks, every tuple in one block is paired with every tuple in the
other block, to generate all pairs of tuples. All pairs of tuples that satisfy the join condition are
added to the result. The primary difference in cost between the block nested-loop join and the
basic nested-loop join is that, in the worst case, each block in the inner relation s is read only
once for each block in the outer relation, instead of once for each tuple of the outer relation. In

the worst case, there will be a total of br ∗ bs + br block transfers, where br and bs denote the
number of blocks containing records of r and s, respectively.

• In the worst case, a total of 100 ∗ 400+100 =40,100 block transfers plus 2∗100 = 200
seeks are required.

• This cost is a significant improvement over the 5000 ∗ 400 + 100 = 2,000,100 block transfers of the basic nested-loop join.

• The performance of the nested-loop and block nested-loop procedures can be further
improved as follows:

1. If the join attributes in a natural join or an equi-join form a key on the inner relation,
then for each outer relation tuple the inner loop can terminate as soon as the first match
is found.

2. Use the biggest size that can fit in memory, while leaving enough space for the buffers
of the inner relation and the output.

3. Scan the inner loop alternately forward and backward and thus reducing the number of
disk accesses needed.

4. Replace file scans with more efficient index lookups.
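The worst-case block-transfer and seek formulas quoted above can be checked with a short sketch (using the student/takes statistics given earlier in this section):

# Statistics from the running example.
n_student, b_student = 5000, 100     # records and blocks of student
n_takes,   b_takes   = 10000, 400    # records and blocks of takes

# Nested-loop join with student as the outer relation (worst case):
#   n_r * b_s + b_r block transfers and n_r + b_r seeks.
nl_transfers = n_student * b_takes + b_student        # 2,000,100
nl_seeks     = n_student + b_student                  # 5,100

# Block nested-loop join with student as the outer relation (worst case):
#   b_r * b_s + b_r block transfers and 2 * b_r seeks.
bnl_transfers = b_student * b_takes + b_student       # 40,100
bnl_seeks     = 2 * b_student                         # 200

print(nl_transfers, nl_seeks, bnl_transfers, bnl_seeks)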

11.3.3. Indexed Nested-Loop Join

The index is used to look up tuples in s that will satisfy the join condition with tuple tr.
This join method is called an indexed nested-loop join and it can be used with existing indices,
as well as with temporary indices created for the sole purpose of evaluating the join. The time
cost of the join can be computed as br (tT + tS) + nr ∗ c, where nr is the number of records in
relation r, and c is the cost of a single selection on s using the join condition.

11.3.4. Merge Join

The merge-join algorithm (also called the sort-merge-join algorithm) can be used to
compute natural joins and equi-joins. Let r(R) and s(S) be the relations whose natural join is to

be computed, and let R ∩ S denote their common attributes. The join is computed in a manner
similar to the merge stage of the merge–sort algorithm, with one pointer associated with each
relation. These pointers initially point to the first tuple of the respective relations, and as the
algorithm proceeds, the pointers move through the relations.

11.3.5. Hybrid Merge Join

The hybrid merge-join technique combines indices with merge join. The hybrid merge-
join algorithm merges the sorted relation with the leaf entries of the secondary B+ tree index.
The result file contains tuples from the sorted relation and addresses for tuples of the unsorted
relation. The result file is then sorted on the addresses of tuples of the unsorted relation, allowing
efficient retrieval of the corresponding tuples, in physical storage order, to complete the join.

11.3.6. Hash Join

In the hash-join algorithm, a hash function h is used to partition tuples of both relations.
The basic idea is to partition the tuples of each of the relations into sets that have the same hash
value on the join attributes.

Hash partitioning of relations is shown in Figure 11.2.

Figure 11.2 Hash partitioning

After the partitioning of the relations, the rest of the hash-join code performs a separate
indexed nested-loop join on each of the partition pairs i, for i = 0, . . . , nh. To do so, it first
builds a hash index on each si, and then probes with tuples from ri. The relation s is the build
input, and r is the probe input. The system repeats this splitting of the input until each partition
of the build input fits in memory. Such partitioning is called recursive partitioning.
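The build/probe idea can be sketched in a few lines (an in-memory illustration only, ignoring partitioning to disk; the toy relations and attribute positions are invented):

from collections import defaultdict

# Toy relations: student(id, name) and takes(id, course).
student = [(1, "Hayes"), (2, "Lindsay"), (3, "Turner")]
takes   = [(1, "CS101"), (3, "CS201"), (3, "CS301"), (4, "CS101")]

def hash_join(build, probe, build_key, probe_key):
    # Build phase: hash the (smaller) build input on its join attribute.
    buckets = defaultdict(list)
    for t in build:
        buckets[t[build_key]].append(t)
    # Probe phase: for each probe tuple, look up matching build tuples.
    result = []
    for t in probe:
        for s in buckets.get(t[probe_key], []):
            result.append(s + t)
    return result

print(hash_join(student, takes, 0, 0))
# [(1, 'Hayes', 1, 'CS101'), (3, 'Turner', 3, 'CS201'), (3, 'Turner', 3, 'CS301')]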

Handling of Overflows

• The number of partitions is therefore increased slightly by a value, called the fudge
factor, that is usually about 20 percent of the computed number of hash partitions.

• Hash-table overflows can be handled by either overflow resolution or overflow avoidance.

• Overflow resolution is performed during the build phase, if a hash-index overflow is detected.

11.4. SORTING OPERATION IN QUERY PROCESSING

• External sorting refers to sorting algorithms that are suitable for large files of records
stored on disk that do not fit entirely in main memory.

• Use a sort-merge strategy, which starts by sorting small subfiles – called runs – of
the main file and merges the sorted runs, creating larger sorted subfiles that are
merged in turn.

• The algorithm consists of two phases: sorting phase and merging phase.

➢ Sorting phase

Runs of the file that can fit in the available buffer space are read into main memory,
sorted using an internal sorting algorithm, and written back to disk as temporary sorted
subfiles (or runs).

– number of initial runs (nR), number of file blocks (b), and available buffer space (nB)

– nR = ⌈b/nB⌉

If the available buffer size is 5 blocks and the file contains 1024 blocks, then there are 205
initial runs each of size 5 blocks. After the sort phase, 205 sorted runs are stored as
temporary subfiles on disk.

➢ Merging phase

▪ The sorted runs are merged during one or more passes.



▪ The degree of merging (dM ) is the number of runs that can be merged together in
each pass.
▪ In each pass, one buffer block is needed to hold one block from each of the runs
being merged, and one block is needed for containing one block of the merge result.
▪ dM = MIN {nB − 1, nR}, and the number of passes is ⌈logdM(nR)⌉.
▪ In previous example, dM = 4, 205 runs → 52 runs → 13 runs → 4 runs → 1 run.
This means 4 passes.
▪ The complexity of external sorting (number of block accesses): (2 × b) + (2 × (b ×
⌈logdM(nR)⌉))
For example:
– 5 initial runs [2, 8, 11], [4, 6, 7], [1, 9, 13], [3, 12, 15], [5, 10, 14].
– The available buffer nB = 3 blocks → dM = 2 (two-way merge)
– After first pass: 3 runs
[2, 4, 6, 7, 8, 11], [1, 3, 9, 12, 13, 15], [5, 10, 14]
– After second pass: 2 runs
[1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 15], [5, 10, 14]
– After third pass:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
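These run and pass counts can be reproduced with a short sketch (b = 1024 blocks with nB = 5 buffer blocks for the first example; b = 15 and nB = 3 for the five-run example just shown):

import math

def sort_merge_stats(b, nB):
    nR = math.ceil(b / nB)               # number of initial runs
    dM = min(nB - 1, nR)                 # degree of merging
    passes = math.ceil(math.log(nR, dM)) if nR > 1 else 0
    return nR, dM, passes

print(sort_merge_stats(1024, 5))   # (205, 4, 4): 205 -> 52 -> 13 -> 4 -> 1
print(sort_merge_stats(15, 3))     # (5, 2, 3): two-way merge, three passes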

11.5. QUERY OPTIMIZATION USING HEURISTICS AND COST ESTIMATION

We apply heuristic rules to modify the internal representation of a query which is usually
in the form of a query tree or a query graph data structure to improve its expected performance.
The SELECT and PROJECT operations reduce the size of a file and hence should be applied
before a join or other binary operation. A query tree is used to represent a relational algebra or
extended relational algebra expression. A query graph is used to represent a relational calculus
expression.

11.5.1. Notation for Query Trees and Query Graphs

Query Tree

A query tree is a tree data structure that corresponds to a relational algebra expression.
It represents the input relations of the query as leaf nodes of the tree, and represents the
relational algebra operations as internal nodes. An execution of the query tree consists of
executing an internal node operation whenever its operands are available and then replacing

that internal node by the relation that results from executing the operation. The order of
execution of operations starts at the leaf nodes which represents the input database relations for
the query, and ends at the root node, which represents the final operation of the query. The
execution terminates when the root node operation is executed and produces the result relation
for the query.

Consider the relational algebra expression:

This corresponds to the following SQL Query.

The corresponding query graph is given as follows:

Query Graph

Relations in the query are represented by relation nodes, which are displayed as single
circles. Constant values, typically from the query selection conditions, are represented by
constant nodes, which are displayed as double circles or ovals. Selection and join conditions
are represented by the graph edges. The attributes to be retrieved from each relation are
displayed in square brackets above each relation. The query graph does not indicate an order on
which operations to perform first. There is only a single graph corresponding to each query.
Hence, a query graph corresponds to a relational calculus expression. Example:

11.5.2. General Transformation Rules for Relational Algebra Operations

1. Cascade of σ:

σ c1 and c2 and ... and cn (R) ≡ σc1 (σc2 (. . . (σcn (R)) . . .))

2. Commutativity of σ:

σc1 (σc2 (R)) ≡ σc2 (σc1 (R))

3. Cascade of π:

πList1(πList2(. . . (πListn(R)) . . .)) ≡ πList1(R)

4. Commuting σ with π: If the selection condition c involves only those attributes


A1, A2, . . . , An in the projection list.

πA1,A2,...,An (σc(R)) ≡ σc(πA1,A2,...,An (R))

5. Commutativity of ⋈ (and ×):

R ⋈c S ≡ S ⋈c R
R × S ≡ S × R

6. Commuting σ with ⋈ (or ×): If the selection condition c can be written as (c1 and c2), where c1 involves only the attributes of R and c2 involves only the attributes of S, then:

σc (R ⋈ S) ≡ (σc1 (R)) ⋈ (σc2 (S))

7. Commuting π with ⋈ (or ×): Suppose the projection list is L = {A1, ..., An, B1, ..., Bm}, where A1, ..., An are attributes of R and B1, ..., Bm are attributes of S.

∗ If the join condition c involves only attributes in L:
πL (R ⋈c S) ≡ (πA1,...,An (R)) ⋈c (πB1,...,Bm (S))

∗ If the join condition c contains additional attributes not in L, say An+1, ..., An+k of R and Bm+1, ..., Bm+p of S:
πL (R ⋈c S) ≡ πL ((πA1,...,An,An+1,...,An+k (R)) ⋈c (πB1,...,Bm,Bm+1,...,Bm+p (S)))

8. Commutativity of set operations: ∩ and ∪ are commutative, but not −.

9. Associativity of ⋈, ×, ∩, and ∪: Let θ stand for any one of these four operations:

(R θ S) θ T ≡ R θ (S θ T)

10. Commuting σ with set operations: Let θ be one of the three set operations ∩, ∪, and −:

σc (R θ S) ≡ (σc (R)) θ (σc (S))

11. The π operation commutes with ∪:

πL(R ∪ S) ≡ (πL(R)) ∪ (πL(S))

12. Converting a (σ, ×) sequence into ⋈:

(σc (R × S)) ≡ (R ⋈c S)

Another possible transformation is DeMorgan’s law:

not (c1 and c2) ≡ (not c1) or (not c2)
not (c1 or c2) ≡ (not c1) and (not c2)

11.5.3. Outline of a heuristic algebraic optimization algorithm

1. Break up the SELECT operations: Using rule 1, break up any SELECT operations
with conjunctive conditions into a cascade of SELECT operations.

2. Push down the SELECT operations: Using rules 2, 4, 6, and 10 concerning the
commutativity of SELECT with other operations, move each SELECT operation
as far down the tree as is permitted by the attributes involved in the select
condition.

3. Rearrange the leaf nodes: Using rules 5 and 9 concerning commutativity and
associativity of binary operations, rearrange the leaf nodes of the tree using the
following criteria.

▪ Position the leaf node relations with most restrictive SELECT operations so
they are executed first in the query tree.

▪ Make sure that the ordering of leaf nodes does not cause CARTESIAN
PRODUCT operations.

4. Change CARTESIAN PRODUCT to JOIN operations: Using rule 12, combine a


CARTESIAN PRODUCT operation with a subsequent SELECT operation in the
tree into a JOIN operation.

5. Break up and push down PROJECT operations: Using rules 3, 4, 7, and 11 concerning the cascading of PROJECT and the commuting of PROJECT with other operations, break down and move lists of projection attributes down the tree as far as possible by creating new PROJECT operations as needed.

6. Identify subtrees for pipelining: Identify subtrees that represent groups of


operations that can be executed by a single algorithm.
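To make steps 1 and 2 of this outline concrete, here is a small Python sketch (our own illustration, using the COMPANY schema's EMPLOYEE and DEPARTMENT relations) that represents a query tree as nested tuples, breaks a conjunctive SELECT into simple conditions, and pushes each condition down to the subtree whose relation supplies its attribute.

def attrs_of(node):
    # Set of attributes produced by a subtree.
    if node[0] == 'REL':
        return set(node[2])
    if node[0] == 'SELECT':
        return attrs_of(node[2])
    return attrs_of(node[2]) | attrs_of(node[3])   # 'JOIN'

def push_selects(node, conds=()):
    # Rule 1 (cascade of sigma) plus rules 2, 4, and 6: collect simple conditions
    # and re-attach each one at the lowest subtree that covers its attribute.
    if node[0] == 'SELECT':
        return push_selects(node[2], tuple(conds) + tuple(node[1]))
    if node[0] == 'JOIN':
        left, right = node[2], node[3]
        lconds = [c for c in conds if c[0] in attrs_of(left)]
        rconds = [c for c in conds if c[0] in attrs_of(right)]
        rest = [c for c in conds if c not in lconds and c not in rconds]
        pushed = ('JOIN', node[1], push_selects(left, lconds), push_selects(right, rconds))
        return ('SELECT', tuple(rest), pushed) if rest else pushed
    return ('SELECT', tuple(conds), node) if conds else node   # 'REL'

EMP = ('REL', 'EMPLOYEE', ['Ssn', 'Dno', 'Salary'])
DEPT = ('REL', 'DEPARTMENT', ['Dnumber', 'Dname'])
tree = ('SELECT', [('Salary', '>', 30000), ('Dname', '=', 'Research')],
        ('JOIN', 'Dno = Dnumber', EMP, DEPT))
print(push_selects(tree))
# The two selections end up directly above EMPLOYEE and DEPARTMENT, below the join.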

11.5.4. Cost Estimates in Query Optimization

A query optimizer does not depend solely on heuristic rules; it also estimates and compares the costs of executing a query using different execution strategies and algorithms, and then chooses the strategy with the lowest cost estimate. For compiled queries, the optimization is done at compile time, and the resulting execution strategy code is stored and executed directly at run time. For interpreted queries, the entire process of optimization and execution occurs at run time.

11.5.5. Cost Components for Query Execution

The cost of executing a query includes the following components:

1. Access cost to secondary storage - This is the cost of transferring (reading and writing) data blocks between secondary disk storage and main memory buffers. This is also known as disk I/O (input/output) cost.

2. Disk storage cost - This is the cost of storing on disk any intermediate files that are generated by an execution strategy for the query.

3. Computation cost - This is the cost of performing in-memory operations on the records within the data buffers during query execution. Such operations include searching for and sorting records, merging records for a join or a sort operation, and performing computations on field values. This is also known as CPU (central processing unit) cost.

4. Memory usage cost - This is the cost pertaining to the number of main memory buffers needed during query execution.

5. Communication cost - This is the cost of shipping the query and its results from the database site to the site or terminal where the query originated.

11.5.6. Catalog Information Used in Cost Functions

For a file whose records are all of the same type, the number of records (tuples) (r),
the (average) record size (R), and the number of file blocks (b) (or close estimates of them)
are needed.

• The blocking factor (bfr) for the file may also be needed.

• The primary file organization is also needed: the records may be unordered, ordered by an attribute (with or without a primary or clustering index), or hashed (using static hashing or one of the dynamic hashing methods) on a key attribute.

• Information is also kept on all primary, secondary, or clustering indexes and their
indexing attributes.

• The number of levels (x) of each multilevel index (primary, secondary, or clustering) is
needed for cost functions that estimate the number of block accesses that occur during
query execution.

• In some cost functions the number of first-level index blocks (bI1) is needed.

• Another important parameter is the number of distinct values (d) of an attribute and the
attribute selectivity (sl), which is the fraction of records satisfying an equality condition
on the attribute.

• This allows estimation of the selection cardinality (s = sl*r) of an attribute, which is the
average number of records that will satisfy an equality selection condition on that
attribute.
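For instance (our own numbers, not from the text), with r = 10,000 records and d = 50 distinct values of an attribute, the selectivity and selection cardinality follow directly:

def selection_cardinality(r, d):
    # Assumes a uniform distribution of records over the d distinct values.
    sl = 1 / d          # attribute selectivity
    s = sl * r          # selection cardinality
    return sl, s

print(selection_cardinality(10_000, 50))   # (0.02, 200.0): about 200 matching records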

11.5.7. Example to Illustrate Cost-Based Query Optimization

The potential join orders (excluding those that require a CARTESIAN PRODUCT) are enumerated and compared by the optimizer. A new temporary relation is created after each join operation, and both the join method and the access methods for the input relations must be determined. For a base relation, one access method is a table scan (that is, a linear search). The PROJECT relation has a selection operation performed before the join, so two access options exist, a table scan (linear search) or use of an index on the selection attribute, and the optimizer must compare their estimated costs. When index scans are used, the overall cost of the strategy is composed by adding the cost of the individual index scans and the cost of fetching the records in the intersection of the retrieved lists of pointers. We can minimize this cost by sorting the list of pointers and fetching the records in sorted order, which gives the following two points for cost estimation:

• We can fetch all selected records of a block using a single I/O operation, because the pointers to records of the same block appear together in the sorted list.

• Disk-arm movement is minimized because the blocks are read in sorted order.

11.5.8. Cost Estimation Chart for Various Selection Algorithms

Here, br is the number of blocks in the file, hi denotes the height of the B+-tree, b is the number of blocks holding records with the specified search key, n is the number of fetched records, tT is the average time taken by the disk subsystem to transfer a block of data, and tS is the average block-access time (disk seek time plus rotational latency).

For each selection algorithm, the estimated cost and the reason behind it are as follows:

• Linear Search: cost = tS + br * tT. It needs one initial seek followed by br block transfers.

• Linear Search, Equality on Key: cost = tS + (br/2) * tT. This is the average case, where only one record satisfies the condition; the scan terminates as soon as the record is found.

• Primary B+-tree index, Equality on Key: cost = (hi + 1) * (tT + tS). Each level of the tree, plus the final fetch of the record, needs one seek and one block transfer.

• Primary B+-tree index, Equality on Nonkey: cost = hi * (tT + tS) + b * tT. One seek and one block transfer are needed for each level of the tree, after which the b consecutive blocks holding the matching records are transferred.

• Secondary B+-tree index, Equality on Key: cost = (hi + 1) * (tT + tS). Each level of the tree, plus the final fetch of the record, needs one seek and one block transfer.

• Secondary B+-tree index, Equality on Nonkey: cost = (hi + n) * (tT + tS). It requires one seek and one transfer per record, because each of the n fetched records may be on a different block.

• Primary B+-tree index, Comparison: cost = hi * (tT + tS) + b * tT. One seek and one block transfer are needed for each level of the tree, after which the b consecutive blocks satisfying the comparison are transferred.

• Secondary B+-tree index, Comparison: cost = (hi + n) * (tT + tS). It requires one seek and one transfer per record, because each record may be on a different block.
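The chart's formulas can be evaluated directly once tS, tT, br, b, hi, and n are known. The following Python sketch (ours, with illustrative timing values) compares a linear search against B+-tree lookups on a 10,000-block file:

def linear_search(br, tT, tS):
    return tS + br * tT                     # one initial seek + br block transfers

def linear_search_eq_key(br, tT, tS):
    return tS + (br / 2) * tT               # average case: stop at the matching record

def btree_eq_key(hi, tT, tS):
    return (hi + 1) * (tT + tS)             # one seek + transfer per level, plus the record

def btree_eq_nonkey(hi, b, tT, tS):
    return hi * (tT + tS) + b * tT          # traverse the tree, then read b consecutive blocks

tS, tT = 4.0, 0.1                           # milliseconds (illustrative values)
print(linear_search(10_000, tT, tS))        # 1004.0 ms
print(linear_search_eq_key(10_000, tT, tS)) # 504.0 ms
print(btree_eq_key(3, tT, tS))              # 16.4 ms for a tree of height 3
print(btree_eq_nonkey(3, 20, tT, tS))       # 14.3 ms when 20 blocks hold matching records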
CHAPTER – XII
DISTRIBUTED DATABASES

12.1. INTRODUCTION

Distributed databases bring the advantages of distributed computing to the database domain. A distributed computing system consists of a number of processing sites or nodes that are interconnected by a computer network and that cooperate in performing certain assigned tasks. As a general goal, distributed computing systems partition a big, unmanageable problem into smaller pieces and solve it efficiently in a coordinated manner.

A distributed database (DDB) is defined as a collection of multiple logically interrelated databases distributed over a computer network, and a distributed database management system (DDBMS) as a software system that manages a distributed database while making the distribution transparent to the user.

For a database to be called distributed, the following minimum conditions should be satisfied:

• Connection of database nodes over a computer network: There are multiple


computers, called sites or nodes. These sites must be connected by an underlying
network to transmit data and commands among sites.

• Logical interrelation of the connected databases: It is essential that the information


in the various database nodes must be logically related.

• Possible absence of homogeneity among connected nodes: It is not necessary that all
nodes be identical in terms of data, hardware, and software.

12.1.1. Transparency

In a DDB scenario, the data and software are distributed over multiple nodes connected
by a computer network. So, additional types of transparencies are needed as listed below:

• Data organization transparency (also known as distribution or network


transparency): This refers to freedom for the user from the operational details of the
network and the placement of the data in the distributed system. It may be divided into
location transparency and naming transparency.

o Location transparency refers to the fact that the command used to perform a task
is independent of the location of the data and the location of the node where the
command was issued.
o Naming transparency implies that once a name is associated with an object, the
named objects can be accessed unambiguously without additional specification as
to where the data is located.
• Replication transparency: Copies of the same data objects may be stored at multiple
sites for better availability, performance, and reliability. Replication transparency makes
the user unaware of the existence of these copies.
• Fragmentation transparency: Two types of fragmentation are possible.
o Horizontal fragmentation distributes a relation (table) into subrelations that are
subsets of the tuples (rows) in the original relation; this is also known as sharding
in the newer big data and cloud computing systems.
o Vertical fragmentation distributes a relation into subrelations where each
subrelation is defined by a subset of the columns of the original relation.
Fragmentation transparency makes the user unaware of the existence of fragments.
Other transparencies include design transparency and execution transparency—which
refer, respectively, to freedom from knowing how the distributed database is designed and
where a transaction executes.
12.1.2. Availability and Reliability
Reliability and availability are two of the most common potential advantages cited for
distributed databases. Reliability is broadly defined as the probability that a system is running
(not down) at a certain time point, whereas availability is the probability that the system is
continuously available during a time interval. We can directly relate reliability and availability
of the database to the faults, errors, and failures associated with it. A failure can be described
as a deviation of a system’s behavior from that which is specified in order to ensure correct
execution of operations.

Errors constitute that subset of system states that causes the failure. Fault is the cause of
an error.

To construct a system that is reliable, we can adopt several approaches. One common
approach stresses fault tolerance; it recognizes that faults will occur, and it designs mechanisms
that can detect and remove faults before they can result in a system failure. Another more
stringent approach attempts to ensure that the final system does not contain any faults. This is
done through an exhaustive design process followed by extensive quality control and testing.

12.1.3. Scalability and Partition Tolerance

Scalability determines the extent to which the system can expand its capacity while
continuing to operate without interruption. There are two types of scalability:

• Horizontal scalability: This refers to expanding the number of nodes in the distributed
system. As nodes are added to the system, it should be possible to distribute some of the
data and processing loads from existing nodes to the new nodes.

• Vertical scalability: This refers to expanding the capacity of the individual nodes in the
system, such as expanding the storage capacity or the processing power of a node.

The concept of partition tolerance states that the system should have the capacity to
continue operating while the network is partitioned.

12.1.4. Autonomy

Autonomy determines the extent to which individual nodes or DBs in a connected DDB
can operate independently. A high degree of autonomy is desirable for increased flexibility and
customized maintenance of an individual node. Autonomy can be applied to design,
communication, and execution.

• Design autonomy refers to independence of data model usage and transaction


management techniques among nodes.

• Communication autonomy determines the extent to which each node can decide on
sharing of information with other nodes.

• Execution autonomy refers to independence of users to act as they please.

12.1.5. Advantages of Distributed Databases

Some important advantages of DDB are listed below.

1. Improved ease and flexibility of application development: Developing and


maintaining applications at geographically distributed sites of an organization is
facilitated due to transparency of data distribution and control.

2. Increased availability: This is achieved by the isolation of faults to their site of origin
without affecting the other database nodes connected to the network so that the data and
software that exist at the failed site cannot be accessed. Further improvement is achieved
by judiciously replicating data and software at more than one site.

3. Improved performance: A distributed DBMS fragments the database by keeping the


data closer to where it is needed most. Data localization reduces the contention for CPU
and I/O services and simultaneously reduces access delays involved in wide area
networks. Moreover, interquery and intraquery parallelism can be achieved by
executing multiple queries at different sites, or by breaking up a query into a number of
subqueries that execute in parallel. This contributes to improved performance.

4. Easier expansion via scalability: In a distributed environment, expansion of the system


in terms of adding more data, increasing database sizes, or adding more nodes is much
easier than in centralized (non-distributed) systems.

12.1.6. Types of Distributed Databases

There are number of types of DDBMSs based on various criteria and factors. The factors
that make some of these systems different are

• Degree of homogeneity of the DDBMS software: If all servers (or individual local
DBMSs) use identical software and all users (clients) use identical software, the
DDBMS is called homogeneous; otherwise, it is called heterogeneous.

• Degree of local autonomy: If there is no provision for the local site to function as a
standalone DBMS, then the system has no local autonomy. On the other hand, if direct
access by local transactions to a server is permitted, the system has some degree of local
autonomy.

Figure 12.1 shows classification of DDBMS alternatives along orthogonal axes of


distribution, autonomy, and heterogeneity.

• Point A: For a centralized database, there is complete autonomy but a total lack of
distribution and heterogeneity.

• Point B: At one extreme of the autonomy spectrum, we have a DDBMS that looks like
a centralized DBMS to the user, with zero autonomy.

The degree of local autonomy provides further ground for classification into federated
and multidatabase systems.

• Point C: The term federated database system (FDBS) is used when there is some global
view or schema of the federation of databases that is shared by the applications.

• Point D: On the other hand, a multidatabase system has full local autonomy in that it
does not have a global schema but interactively constructs one as needed by the
application.

Figure 12.1: Classification of DDBMS

12.2. DISTRIBUTED DATABASE ARCHITECTURES

12.2.1. Parallel versus Distributed Architectures

The parallel architecture is more common in high-performance computing, where there


is a need for multiprocessor architectures to cope with the volume of data undergoing
transaction processing and warehousing applications. The distinction between parallel and
distributed database architecture is listed below:

There are two main types of multiprocessor system architectures that are commonplace:

• Shared memory (tightly coupled) architecture. Multiple processors share secondary


(disk) storage and also share primary memory.

• Shared disk (loosely coupled) architecture. Multiple processors share secondary (disk)
storage but each has their own primary memory.

These architectures enable processors to communicate without the overhead of


exchanging messages over a network. Database management systems developed using the
above types of architectures are termed parallel database management systems rather than
DDBMSs, since they utilize parallel processor technology.

Another type of multiprocessor architecture is called shared-nothing architecture. In this


architecture, every processor has its own primary and secondary (disk) memory, no common
memory exists, and the processors communicate over a highspeed interconnection network (bus

or switch). Shared-nothing architecture is also considered as an environment for parallel


databases.

Although the shared-nothing architecture resembles a distributed database computing environment, major differences exist in the mode of operation. In shared-nothing multiprocessor systems, there is symmetry and homogeneity of nodes; this is not true of the distributed database environment, where heterogeneity of hardware and operating system at each node is very common. Figure 12.2 (a) illustrates a parallel database (shared nothing), whereas Figure 12.2 (b) illustrates a centralized database with distributed access and Figure 12.2 (c) shows a pure distributed database.

Figure 12.2: Some different database system architectures. (a) Shared-nothing architecture

(b) A networked architecture with a centralized database at one of the sites

(c) A truly distributed database architecture



12.2.2. General Architecture of Pure Distributed Databases

In this section, we discuss both the logical and component architectural models of a
DDB. In Figure 12.3, which describes the generic schema architecture of a DDB, the enterprise
is presented with a consistent, unified view showing the logical structure of underlying data
across all nodes. This view is represented by the global conceptual schema (GCS), which
provides network transparency.

Figure 12.3: Schema architecture of distributed databases

The logical organization of data at each site is specified by the local conceptual schema
(LCS). The GCS, LCS, and their underlying mappings provide the fragmentation and
replication transparency.

Figure 12.3 shows the component architecture of a DDB. It is an extension of its


centralized counterpart. The global query compiler references the global conceptual schema
from the global system catalog to verify and impose defined constraints. The global query
optimizer references both global and local conceptual schemas and generates optimized local
queries from global queries. It evaluates all candidate strategies using a cost function that
estimates cost based on response time (CPU, I/O, and network latencies) and estimated sizes of
intermediate results. The latter is particularly important in queries involving joins.

Each local DBMS would have its local query optimizer, transaction manager, and
execution engines as well as the local system catalog, which houses the local schemas. The
global transaction manager is responsible for coordinating the execution across multiple sites
in conjunction with the local transaction manager at those sites.

12.2.3. Federated Database Schema Architecture

Typical five-level schema architecture to support global applications in the FDBS


environment is shown in Figure 12.4. In this architecture,

• Local schema is the conceptual schema (full database definition) of a component


database

• Component schema is derived by translating the local schema into a canonical data
model or common data model (CDM) for the FDBS.

• Export schema represents the subset of a component schema that is available to the
FDBS.

• Federated schema is the global schema or view, which is the result of integrating all the
shareable export schemas.

• External schemas define the schema for a user group or an application, as in the three-
level schema architecture.

Figure 12.4: The five-level schema architecture in a federated database system (FDBS)

12.2.4. An Overview of Three-Tier Client/Server Architecture


Full-scale DDBMSs have not been developed to support all the types of functionalities.
Instead, distributed database applications are being developed in the context of the client/server
architectures. It is now more common to use three-tier architecture rather than two-tier
architecture, particularly in Web applications. As shown in Figure 12.5, in the three-tier
client/server architecture, the following three layers exist:
Presentation layer (client):
This provides the user interface and interacts with the user. Web browsers are often
utilized, and the languages and specifications used include HTML, XHTML, CSS, Flash,
MathML, Scalable Vector Graphics (SVG), Java, JavaScript, Adobe Flex, and others. This layer
handles user input, output, and navigation by accepting user commands and displaying the
needed information, usually in the form of static or dynamic Web pages.
Application layer (business logic):
This layer programs the application logic. For example, queries can be formulated based
on user input from the client, or query results can be formatted and sent to the client for
presentation. Additional application functionality can be handled at this layer, such as security
checks, identity verification, and other functions.
Database server:
This layer handles query and update requests from the application layer, processes the
requests, and sends the results. Usually SQL is used to access the database if it is relational or
object-relational, and stored database procedures may also be invoked. Query results (and
queries) may be formatted into XML when transmitted between the application server and the
database server.

Figure 12.5: The three-tier client/server architecture



12.3. DATA STORAGE

In a DDB, decisions must be made regarding which site should be used to store which
portions of the database. For now, we will assume that there is no replication; that is, each
relation—or portion of a relation—is stored at one site only.

Before we decide on how to distribute the data, we must determine the logical units of
the database that are to be distributed. The simplest logical units are the relations themselves;
that is, each whole relation is to be stored at a particular site. In our example, we must decide
on a site to store each of the relations EMPLOYEE, DEPARTMENT, PROJECT,
WORKS_ON, and DEPENDENT. In many cases, however, a relation can be divided into
smaller logical units for distribution. For example, consider the company database and assume
there are three computer sites—one for each department in the company. We may want to store
the database information relating to each department at the computer site for that department.
A technique called horizontal fragmentation or sharding can be used to partition each relation
by department.

12.3.1. Horizontal Fragmentation (Sharding)

A horizontal fragment or shard of a relation is a subset of the tuples in that relation. The
tuples that belong to the horizontal fragment can be specified by a condition on one or more
attributes of the relation, or by some other mechanism. For example, we may define three
horizontal fragments on the EMPLOYEE relation with the following conditions: (Dno = 5),
(Dno = 4), and (Dno = 1)—each fragment contains the EMPLOYEE tuples working for a
particular department. Similarly, we may define three horizontal fragments for the PROJECT
relation, with the conditions (Dnum = 5), (Dnum = 4), and (Dnum = 1) - each fragment contains
the PROJECT tuples controlled by a particular department. Horizontal fragmentation divides a
relation horizontally by grouping rows to create subsets of tuples, where each subset has a
certain logical meaning. These fragments can then be assigned to different sites (nodes) in the
distributed system.

Each horizontal fragment on a relation R can be specified in the relational algebra by a


σCi(R) (select) operation. A set of horizontal fragments whose conditions C1, C2, … , Cn include
all the tuples in R—that is, every tuple in R satisfies (C1 OR C2 OR … OR Cn)—is called a
complete horizontal fragmentation of R. In many cases a complete horizontal fragmentation is
also disjoint; that is, no tuple in R satisfies (Ci AND Cj) for any i ≠ j. To reconstruct the relation
R from a complete horizontal fragmentation, we need to apply the UNION operation to the
fragments.
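As a toy illustration (ours, with a few in-memory rows standing in for the EMPLOYEE relation), a complete and disjoint horizontal fragmentation on Dno and its reconstruction by UNION look like this in Python:

EMPLOYEE = [
    {'Ssn': '111', 'Name': 'Smith',   'Dno': 5},
    {'Ssn': '222', 'Name': 'Wong',    'Dno': 4},
    {'Ssn': '333', 'Name': 'Zelaya',  'Dno': 4},
    {'Ssn': '444', 'Name': 'Wallace', 'Dno': 1},
]

# One horizontal fragment (shard) per department: sigma_{Dno = d}(EMPLOYEE).
fragments = {d: [t for t in EMPLOYEE if t['Dno'] == d] for d in (5, 4, 1)}

# Reconstruction: the UNION of the fragments recovers the original relation.
reconstructed = [t for frag in fragments.values() for t in frag]
assert sorted(t['Ssn'] for t in reconstructed) == sorted(t['Ssn'] for t in EMPLOYEE)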

12.3.2. Vertical Fragmentation

Each site may not need all the attributes of a relation, which would indicate the need for
a different type of fragmentation. Vertical fragmentation divides a relation “vertically” by
columns. A vertical fragment of a relation keeps only certain attributes of the relation. For
example, we may want to fragment the EMPLOYEE relation into two vertical fragments. The
first fragment includes personal information—Name, Bdate, Address, and Sex—and the second
includes work-related information—Ssn, Salary, Super_ssn, and Dno. A vertical fragment on a
relation R can be specified by a πLi (R) operation in the relational algebra. A set of vertical
fragments whose projection lists L1, L2, … , Ln include all the attributes in R but share only the
primary key attribute of R is called a complete vertical fragmentation of R. In this case the
projection lists satisfy the following two conditions:

• L1 ∪ L2 ∪ … ∪ Ln = ATTRS(R)

• Li ∩ Lj = PK(R) for any i ≠ j, where ATTRS(R) is the set of attributes of R and PK(R)
is the primary key of R

To reconstruct the relation R from a complete vertical fragmentation, we apply the


OUTER UNION operation to the vertical fragments.

12.3.3. Mixed (Hybrid) Fragmentation

We can intermix the two types of fragmentation, yielding a mixed fragmentation. For
example, we may combine the horizontal and vertical fragmentations of the EMPLOYEE
relation given earlier into a mixed fragmentation that includes six fragments. In this case, the
original relation can be reconstructed by applying UNION and OUTER UNION (or OUTER
JOIN) operations in the appropriate order.

In general, a fragment of a relation R can be specified by a SELECT-PROJECT


combination of operations πL(σC(R)). If C = TRUE (that is, all tuples are selected) and L ≠
ATTRS(R), we get a vertical fragment, and if C ≠ TRUE and L = ATTRS(R), we get a
horizontal fragment. Finally, if C ≠ TRUE and L ≠ ATTRS(R), we get a mixed fragment. Notice
that a relation can itself be considered a fragment with C = TRUE and L = ATTRS(R). Thus,
the term fragment is used to refer to a relation or to any of the preceding types of fragments.
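In the same toy style (again our illustration), a complete vertical fragmentation that shares the primary key Ssn, its reconstruction, and a mixed fragment of the form πL(σC(R)) can be sketched as:

EMPLOYEE = [
    {'Ssn': '111', 'Name': 'Smith', 'Bdate': '1965-01-09', 'Salary': 30000, 'Dno': 5},
    {'Ssn': '222', 'Name': 'Wong',  'Bdate': '1955-12-08', 'Salary': 40000, 'Dno': 4},
]

# Complete vertical fragmentation: both projection lists keep the primary key Ssn.
personal = [{k: t[k] for k in ('Ssn', 'Name', 'Bdate')} for t in EMPLOYEE]
work     = [{k: t[k] for k in ('Ssn', 'Salary', 'Dno')} for t in EMPLOYEE]

# Reconstruction (the OUTER UNION of the text): combine the fragments on Ssn.
merged = {t['Ssn']: dict(t) for t in personal}
for t in work:
    merged[t['Ssn']].update(t)
assert list(merged.values()) == EMPLOYEE

# A mixed fragment pi_{Ssn, Salary}(sigma_{Dno = 5}(EMPLOYEE)).
mixed = [{k: t[k] for k in ('Ssn', 'Salary')} for t in EMPLOYEE if t['Dno'] == 5]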

12.4. TRANSACTION PROCESSING

The global and local transaction management software modules, along with the
concurrency control and recovery manager of a DDBMS, collectively guarantee the ACID
properties of transactions. An additional component called the global transaction manager is
introduced for supporting distributed transactions. The site where the transaction originated can

temporarily assume the role of global transaction manager and coordinate the execution of
database operations with transaction managers across multiple sites. The operations exported
by this interface are BEGIN_TRANSACTION, READ or WRITE, END_TRANSACTION,
COMMIT_TRANSACTION, and ROLLBACK (or ABORT).

• For READ operations, it returns a local copy if valid and available.

• For WRITE operations, it ensures that updates are visible across all sites containing
copies (replicas) of the data item.

• For ABORT operations, the manager ensures that no effects of the transaction are
reflected in any site of the distributed database.

• For COMMIT operations, it ensures that the effects of a write are persistently recorded
on all databases containing copies of the data item.

The transaction manager passes to the concurrency controller module the database
operations and associated information. The controller is responsible for acquisition and release
of associated locks. If the transaction requires access to a locked resource, it is blocked until the
lock is acquired. Once the lock is acquired, the operation is sent to the runtime processor, which
handles the actual execution of the database operation. Once the operation is completed, locks
are released and the transaction manager is updated with the result of the operation.

12.4.1. Two-Phase Commit Protocol

We described the two-phase commit protocol (2PC), which requires a global recovery
manager, or coordinator, to maintain information needed for recovery, in addition to the local
recovery managers and the information they maintain (log, tables). The two-phase commit
protocol has certain drawbacks that led to the development of the three-phase commit protocol.
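Purely as an illustration of the coordinator's decision logic (our sketch, using a hypothetical participant interface rather than any real DDBMS code), 2PC can be outlined as follows:

def two_phase_commit(participants):
    # Phase 1 (voting): every participant prepares and force-writes its vote to its log.
    votes = [p.prepare() for p in participants]      # True means "ready to commit"
    decision = all(votes)
    # Phase 2 (decision): the coordinator broadcasts the global commit or abort.
    for p in participants:
        p.commit() if decision else p.abort()
    return 'COMMIT' if decision else 'ABORT'

class Participant:                     # hypothetical stand-in for a local site
    def __init__(self, ready):
        self.ready = ready
    def prepare(self):
        return self.ready
    def commit(self):
        print('local commit')
    def abort(self):
        print('local abort')

print(two_phase_commit([Participant(True), Participant(True)]))    # COMMIT
print(two_phase_commit([Participant(True), Participant(False)]))   # ABORT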

12.4.2. Three-Phase Commit Protocol

The biggest drawback of 2PC is that it is a blocking protocol. Failure of the coordinator
blocks all participating sites, causing them to wait until the coordinator recovers. This can cause
performance degradation, especially if participants are holding locks to shared resources. Other
types of problems may also occur that make the outcome of the transaction nondeterministic.
These problems are solved by the three-phase commit (3PC) protocol, which essentially divides
the second commit phase into two subphases called prepare-to-commit and commit. The main
idea is to limit the wait time for participants who have prepared to commit and are waiting for
a global commit or abort from the coordinator. When a participant receives a precommit
message, it knows that the rest of the participants have voted to commit. If a precommit message
has not been received, then the participant will abort and release all locks.

12.5. QUERY PROCESSING AND OPTIMIZATION

Now we give an overview of how a DDBMS processes and optimizes a query. First we
discuss the steps involved in query processing and then elaborate on the communication costs
of processing a distributed query. Then we discuss a special operation, called a semijoin, which
is used to optimize some types of queries in a DDBMS.

12.5.1. Distributed Query Processing

A distributed database query is processed in stages as follows:

1. Query Mapping: The input query on distributed data is specified formally using a query
language. It is then translated into an algebraic query on global relations. This translation
is done by referring to the global conceptual schema and does not take into account the
actual distribution and replication of data. Hence, this translation is largely identical to
the one performed in a centralized DBMS. It is first normalized, analyzed for semantic
errors, simplified, and finally restructured into an algebraic query.

2. Localization: In a distributed database, fragmentation results in relations being stored


in separate sites, with some fragments possibly being replicated. This stage maps the
distributed query on the global schema to separate queries on individual fragments using
data distribution and replication information.

3. Global Query Optimization: Optimization consists of selecting a strategy from a list


of candidates that is closest to optimal. A list of candidate queries can be obtained by
permuting the ordering of operations within a fragment query generated by the previous
stage. Time is the preferred unit for measuring cost. The total cost is a weighted
combination of costs such as CPU cost, I/O costs, and communication costs. Since
DDBs are connected by a network, often the communication costs over the network are
the most significant. This is especially true when the sites are connected through a wide
area network (WAN).

4. Local Query Optimization: This stage is common to all sites in the DDB. The
techniques are similar to those used in centralized systems.

The first three stages discussed above are performed at a central control site, whereas
the last stage is performed locally.

12.5.2. Data Transfer Costs of Distributed Query Processing

In a distributed system, several additional factors further complicate query processing.


The first is the cost of transferring data over the network. This data includes intermediate files
that are transferred to other sites for further processing, as well as the final result files that may
have to be transferred to the site where the query result is needed. Although these costs may not
be very high if the sites are connected via a high-performance local area network, they become
significant in other types of networks. Hence, DDBMS query optimization algorithms consider
the goal of reducing the amount of data transfer as an optimization criterion in choosing a
distributed query execution strategy

We illustrate this with two simple sample queries. Suppose that the EMPLOYEE and
DEPARTMENT relations are distributed at two sites as shown in Figure 12.4. We will assume
in this example that neither relation is fragmented. The size of the EMPLOYEE relation is 100 * 10,000 = 1,000,000 bytes, and the size of the DEPARTMENT relation is 35 * 100 = 3,500 bytes.

Figure 12.4: Example to illustrate volume of data transferred

Consider the query Q: For each employee, retrieve the employee name and the name of
the department for which the employee works. This can be stated as follows in the relational
algebra: Q: πFname, Lname, Dname(EMPLOYEE ⋈Dno=Dnumber DEPARTMENT)

The result of this query will include 10,000 records, assuming that every employee is
related to a department. Suppose that each record in the query result is 40 bytes long. The query
is submitted at a distinct site 3, which is called the result site because the query result is needed
there. Neither the EMPLOYEE nor the DEPARTMENT relations reside at site 3. There are
three simple strategies for executing this distributed query:

1. Transfer both the EMPLOYEE and the DEPARTMENT relations to the result site, and
perform the join at site 3. In this case, a total of 1,000,000 + 3,500 = 1,003,500 bytes
must be transferred.

2. Transfer the EMPLOYEE relation to site 2, execute the join at site 2, and send the result
to site 3. The size of the query result is 40 * 10,000 = 400,000 bytes, so 400,000 +
1,000,000 = 1,400,000 bytes must be transferred.

3. Transfer the DEPARTMENT relation to site 1, execute the join at site 1, and send the
result to site 3. In this case, 400,000 + 3,500 = 403,500 bytes must be transferred.

If minimizing the amount of data transfer is our optimization criterion, we should choose
strategy 3.

A more complex strategy, which sometimes works better than these simple strategies,
uses an operation called semijoin.

12.5.3. Distributed Query Processing Using Semijoin

The idea behind distributed query processing using the semijoin operation is to reduce
the number of tuples in a relation before transferring it to another site. Intuitively, the idea is to
send the joining column of one relation R to the site where the other relation S is located; this
column is then joined with S. Following that, the join attributes, along with the attributes
required in the result, are projected out and shipped back to the original site and joined with R.
Hence, only the joining column of R is transferred in one direction, and a subset of S with no
extraneous tuples or attributes is transferred in the other direction. If only a small fraction of the
tuples in S participate in the join, this can be an efficient solution to minimizing data transfer.

To illustrate this, consider the following strategy for executing Q:

1. Project the join attributes of DEPARTMENT at site 2, and transfer them to site 1. For
Q, we transfer F = πDnumber(DEPARTMENT), whose size is 4 * 100 = 400 Bytes.

2. Join the transferred file with the EMPLOYEE relation at site 1, and transfer the required
attributes from the resulting file to site 2. For Q, we transfer R = πDno, Fname, Lname(F
⋈Dnumber=Dno EMPLOYEE), whose size is 34 * 10,000 = 340,000 bytes.

3. Execute the query by joining the transferred file R with DEPARTMENT, and present
the result to the user at site 3.

Using this strategy, we transfer 340,400 bytes for Q. We limited the EMPLOYEE
attributes and tuples transmitted to site 2 in step 2 to only those that will actually be joined with
a DEPARTMENT tuple in step 3.
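The byte counts used in this section reduce to a few lines of arithmetic. The sketch below (ours) reproduces the three simple strategies and the semijoin strategy, using the sizes assumed in the text (100-byte EMPLOYEE records, 35-byte DEPARTMENT records, 40-byte result records, a 4-byte Dnumber column, and 34-byte projected EMPLOYEE tuples):

EMP = 100 * 10_000        # EMPLOYEE relation: 1,000,000 bytes
DEPT = 35 * 100           # DEPARTMENT relation: 3,500 bytes
RESULT = 40 * 10_000      # join result shipped to the result site: 400,000 bytes

strategies = {
    'ship both relations to site 3': EMP + DEPT,            # 1,003,500
    'ship EMPLOYEE to site 2':       EMP + RESULT,          # 1,400,000
    'ship DEPARTMENT to site 1':     DEPT + RESULT,         #   403,500
    'semijoin':                      4 * 100 + 34 * 10_000, #   340,400
}
best = min(strategies, key=strategies.get)
print(best, strategies[best])    # semijoin 340400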
CHAPTER – XIII
NOSQL DATABASES

13.1. INTRODUCTION

The term NOSQL is generally interpreted as Not Only SQL—rather than NO to SQL—
and is meant to convey that many applications need systems other than traditional relational
SQL systems to augment their data management needs. Most NOSQL systems are distributed
databases or distributed storage systems, with a focus on semistructured data storage, high
performance, availability, data replication, and scalability as opposed to an emphasis on
immediate data consistency, powerful query languages, and structured data storage.

13.1.1. Emergence of NOSQL Systems

Consider a free e-mail application, such as Google Mail. There is a need for a storage
system that can manage all these e-mails. A structured relational SQL system may not be
appropriate because

• SQL systems offer too many services (powerful query language, concurrency control,
etc.), which this application may not need

• A structured data model such as the traditional relational model may be too restrictive.

Although newer relational systems do have more complex object relational modelling
options, they still require schemas, which are not required by many of the NOSQL systems.

As another example, consider an application such as Facebook. User profiles, user


relationships, and posts must all be stored in a huge collection of data stores, and the appropriate
posts must be made available to the sets of users that have signed up to see these posts. Some
of the data for this type of application is not suitable for a traditional relational system and
typically needs multiple types of databases and data storage systems.

Some of the organizations that were faced with these data management and storage
applications decided to develop their own systems:

• Google developed a proprietary NOSQL system known as BigTable, which is used in


many of Google’s applications that require vast amounts of data storage, such as Gmail,
Google Maps, and Web site indexing. Apache Hbase is an open source NOSQL system
based on similar concepts. Google’s innovation led to the category of NOSQL systems

known as column-based or wide column stores; they are also sometimes referred to as
column family stores.

• Amazon developed a NOSQL system called DynamoDB that is available through


Amazon’s cloud services. This innovation led to the category known as key-value data
stores or sometimes key-tuple or key-object data stores.

• Facebook developed a NOSQL system called Cassandra, which is now open source and
known as Apache Cassandra. This NOSQL system uses concepts from both key-value
stores and column-based systems.

• Other software companies started developing their own solutions and making them
available to users who need these capabilities—for example, MongoDB and CouchDB,
which are classified as document-based NOSQL systems or document stores.

• Another category of NOSQL systems is the graph-based NOSQL systems, or graph


databases; these include Neo4J and GraphBase, among others.

• Some NOSQL systems, such as OrientDB, combine concepts from many of the
categories discussed above.

13.1.2. Characteristics of NOSQL Systems

We divide the characteristics into two categories—those related to distributed databases


and distributed systems, and those related to data models and query languages.

NOSQL characteristics related to distributed databases and distributed systems:

• NOSQL systems emphasize high availability, so replicating the data is inherent in many
of these systems.

• Scalability is another important characteristic, because many of the applications that use
NOSQL systems tend to have data that keeps growing in volume.

• High performance is another required characteristic, whereas serializable consistency


may not be as important for some of the NOSQL applications.

NOSQL characteristics related to data models and query languages:

NOSQL systems emphasize performance and flexibility over modeling power and
complex querying.

Not Requiring a Schema: The flexibility of not requiring a schema is achieved in many
NOSQL systems by allowing semi-structured, self-describing data.

Less Powerful Query Languages: Many applications that use NOSQL systems may not
require a powerful query language such as SQL, because search (read) queries in these systems
often locate single objects in a single file based on their object keys. NOSQL systems typically
provide a set of functions and operations as a programming API (application programming
interface), so reading and writing the data objects is accomplished by calling the appropriate
operations by the programmer. In many cases, the operations are called CRUD operations, for
Create, Read, Update, and Delete. In other cases, they are known as SCRUD because of an
added Search (or Find) operation.

Versioning: Some NOSQL systems provide storage of multiple versions of the data
items, with the timestamps of when the data version was created.

13.1.3 Categories of NOSQL Systems

NOSQL systems have been characterized into four major categories, with some
additional categories that encompass other types of systems. They are

1. Document-based NOSQL systems: These systems store data in the form of documents
using well-known formats, such as JSON (JavaScript Object Notation). Documents are
accessible via their document id, but can also be accessed rapidly using other indexes.

2. NOSQL key-value stores: These systems have a simple data model based on fast access
by the key to the value associated with the key; the value can be a record or an object or
a document or even have a more complex data structure.

3. Column-based or wide column NOSQL systems: These systems partition a table by


column into column families (a form of vertical partitioning), where each column family
is stored in its own files. They also allow versioning of data values.

4. Graph-based NOSQL systems: Data is represented as graphs, and related nodes can be
found by traversing the edges using path expressions.

13.2. CAP THEOREM

The three letters in CAP refer to three desirable properties of distributed systems with
replicated data: consistency (among replicated copies), availability (of the system for read and
write operations) and partition tolerance (in the face of the nodes in the system being partitioned
by a network fault). Availability means that each read or write request for a data item will either
be processed successfully or will receive a message that the operation cannot be completed.
Partition tolerance means that the system can continue operating if the network connecting the
nodes has a fault that results in two or more partitions, where the nodes in each partition can

only communicate among each other. Consistency means that the nodes will have the same
copies of a replicated data item visible for various transactions.

The CAP theorem states that it is not possible to guarantee all three of the desirable
properties consistency, availability, and partition tolerance at the same time in a distributed
system with data replication. If this is the case, then the distributed system designer would have
to choose two properties out of the three to guarantee. It is generally assumed that in many
traditional (SQL) applications, guaranteeing consistency through the ACID properties is
important.

On the other hand, in a NOSQL distributed data store, a weaker consistency level is
often acceptable, and guaranteeing the other two properties (availability, partition tolerance) is
important. Hence, weaker consistency levels are often used in NOSQL system instead of
guaranteeing serializability. In particular, a form of consistency known as eventual consistency
is often adopted in NOSQL systems.

13.3. DOCUMENT BASED SYSTEMS

Document-based or document-oriented NOSQL systems typically store data as


collections of similar documents. These types of systems are also sometimes known as
document stores. The individual documents somewhat resemble complex objects or XML
documents, but a major difference between document-based systems and object, object-relational, and XML systems is that there is no requirement to specify a schema; rather, the documents are specified as self-describing data. Although the documents in a collection should
be similar, they can have different data elements (attributes), and new documents can have new
data elements that do not exist in any of the current documents in the collection.

Documents can be specified in various formats, such as XML. A popular language to


specify documents in NOSQL systems is JSON (JavaScript Object Notation). There are many
document-based NOSQL systems, including MongoDB and CouchDB, among many others.

13.3.1 MongoDB Data Model

MongoDB documents are stored in BSON (Binary JSON) format, which is a variation
of JSON with some additional data types and is more efficient for storage than JSON. Individual
documents are stored in a collection. We will use a simple example based on our COMPANY database.
The operation createCollection is used to create each collection. For example, the following
command can be used to create a collection called project to hold PROJECT objects from the
COMPANY database:

db.createCollection(“project”, { capped : true, size : 1310720, max : 500 } )



The command to create another document collection called worker to hold information
about the EMPLOYEEs who work on each project is
db.createCollection(“worker”, { capped : true, size : 5242880, max : 2000 } )
The first parameter “project” is the name of the collection, which is followed by an
optional document that specifies collection options. In our example, the collection is capped;
this means it has upper limits on its storage space (size) and number of documents (max). The
capping parameters help the system choose the storage options for each collection.
Each document in a collection has a unique ObjectId field, called _id, which is
automatically indexed in the collection unless the user explicitly requests no index for the _id
field. The value of ObjectId can be specified by the user, or it can be system-generated if the
user does not specify an _id field for a particular document. System-generated ObjectIds have a specific format, which combines the timestamp when the object is created (4 bytes, in an internal MongoDB format), the node id (3 bytes), the process id (2 bytes), and a counter (3 bytes) into a 12-byte Id value. User-generated ObjectIds can have any value specified by the user as long as it uniquely identifies the document, and so these Ids are similar to primary keys in relational systems.
A collection does not have a schema. The structure of the data fields in documents is
chosen based on how documents will be accessed and used, and the user can choose a
normalized design (similar to normalized relational tuples) or a denormalized design (similar
to XML documents or complex objects). Interdocument references can be specified by storing
in one document the ObjectId or ObjectIds of other related documents.
13.3.2 MongoDB CRUD Operations
MongoDb has several CRUD operations, where CRUD stands for (create, read, update,
delete). Documents can be created and inserted into their collections using the insert operation,
whose format is:
db.<collection_name>.insert(<document(s)>)
E.g. db.project.insert( { _id: “P1”, Pname: “ProductX”, Plocation: “Bellaire” })
db.worker.insert( [ { _id: “W1”, Ename: “John Smith”, ProjectId: “P1”, Hours: 32.5 },
{_id: “W2”, Ename: “Joyce English”, ProjectId: “P1”, Hours: 20.0} ] )

Example of simple documents in MongoDB:

1. One option is to use a denormalized document design with embedded subdocuments, as listed below. Here the worker information is embedded in the project document, so there is no need for a separate “worker” collection. This is known as the denormalized pattern.

E.g. Project document with an array of embedded workers (denormalized document design with embedded subdocuments):
{
_id: “P1”,
Pname: “ProductX”,
Plocation: “Bellaire”,
Workers: [
{ Ename: “John Smith”,
Hours: 32.5
},
{ Ename: “Joyce English”,
Hours: 20.0
}
]
}
2. Another option is to use the design where worker references are embedded in the project
document, but the worker documents themselves are stored in a separate “worker”
collection.
E.g. Project document with an embedded array of worker ids: (Embedded array of
document references)
{
_id: “P1”,
Pname: “ProductX”,
Plocation: “Bellaire”,
WorkerIds: [ “W1”, “W2” ]
}
{ _id: “W1”,
Ename: “John Smith”,
Hours: 32.5

}
{ _id: “W2”,
Ename: “Joyce English”,
Hours: 20.0
}
3. A third option would use a normalized design, similar to First Normal Form relations.
The choice of which design option to use depends on how the data will be accessed.
E.g. Normalized project and worker documents (not a fully normalized design for M:N
relationships):
{
_id: “P1”,
Pname: “ProductX”,
Plocation: “Bellaire”
}
{ _id: “W1”,
Ename: “John Smith”,
ProjectId: “P1”,
Hours: 32.5
}
The parameters of the insert operation can include either a single document or an array of
documents. The delete operation is called remove, and the format is:
db.<collection_name>.remove(<condition>)
E.g. db.project.remove({Plocation: “Chennai”})
There is also an update operation, which has a condition to select certain documents, and
a $set clause to specify the update. It is also possible to use the update operation to replace
an existing document with another one but keep the same ObjectId.
E.g. To update the project location, the command used is
db.project.update( { _id: “P1” }, { $set: { Plocation: “Chennai” } } )
For read queries, the main command is called find, and the format is:
db.<collection_name>.find(<condition>)
E.g. db.project.find({Plocation: “Chennai”})
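The same CRUD operations can also be issued from an application program. As a hedged sketch (assuming a locally running MongoDB server and the pymongo driver; this is not part of the text), the shell commands above translate roughly to:

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')    # assumed local server
db = client['company']

# Create: insert_one / insert_many play the role of the shell's insert.
db.project.insert_one({'_id': 'P1', 'Pname': 'ProductX', 'Plocation': 'Bellaire'})
db.worker.insert_many([
    {'_id': 'W1', 'Ename': 'John Smith',    'ProjectId': 'P1', 'Hours': 32.5},
    {'_id': 'W2', 'Ename': 'Joyce English', 'ProjectId': 'P1', 'Hours': 20.0},
])

# Update with $set, read with find, delete with delete_many.
db.project.update_one({'_id': 'P1'}, {'$set': {'Plocation': 'Chennai'}})
for doc in db.project.find({'Plocation': 'Chennai'}):
    print(doc)
db.project.delete_many({'Plocation': 'Chennai'})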

13.4. KEY VALUE STORES

Key-value stores focus on high performance, availability, and scalability by storing data
in a distributed storage system. The data model used in key-value stores is relatively simple,
and in many of these systems, there is no query language but rather a set of operations that can
be used by the application programmers. The key is a unique identifier associated with a data
item and is used to locate this data item rapidly. The value is the data item itself, and it can have
very different formats for different key-value storage systems. In some cases, the value is just
a string of bytes or an array of bytes, and the application using the key-value store has to
interpret the structure of the data value. In other cases, some standard formatted data is allowed;
for example, structured data rows (tuples) similar to relational data, or semi structured data
using JSON or some other self-describing data format.

Different key-value stores can thus store unstructured, semi structured, or structured
data items. The main characteristic of key-value stores is the fact that every value (data item)
must be associated with a unique key, and that retrieving the value by supplying the key must
be very fast. There are many systems that fall under the key-value store label. Let us have a
brief introductory overview for some of these systems and their characteristics.

13.4.1 DynamoDB Overview

The DynamoDB system is an Amazon product and is available as part of Amazon’s


AWS/SDK platforms (Amazon Web Services/Software Development Kit). It can be used as
part of Amazon’s cloud computing services, for the data storage component.

The basic data model in DynamoDB uses the concepts of tables, items, and attributes.
A table in DynamoDB does not have a schema; it holds a collection of self-describing items.
Each item will consist of a number of (attribute, value) pairs, and attribute values can be single-
valued or multivalued. So basically, a table will hold a collection of items, and each item is a
self-describing record (or object). DynamoDB also allows the user to specify the items in JSON
format, and the system will convert them to the internal storage format of DynamoDB.

When a table is created, it is required to specify a table name and a primary key; the
primary key will be used to rapidly locate the items in the table. Thus, the primary key is the
key and the item is the value for the DynamoDB key-value store.

The primary key attribute must exist in every item in the table. The primary key can be
one of the following two types:

• A single attribute. The DynamoDB system will use this attribute to build a hash index
on the items in the table. This is called a hash type primary key. The items are not
ordered in storage on the value of the hash attribute.

• A pair of attributes. This is called a hash and range type primary key. The primary key
will be a pair of attributes (A, B): attribute A will be used for hashing, and because there
will be multiple items with the same value of A, the B values will be used for ordering
the records with the same A value. A table with this type of key can have additional
secondary indexes defined on its attributes. For example, if we want to store multiple
versions of some type of items in a table, we could use ItemID as hash and Date or
Timestamp (when the version was created) as range in a hash and range type primary
key.
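For illustration only (a hedged sketch assuming the boto3 AWS SDK for Python and configured AWS credentials; the table and attribute names are made up), a table with a hash and range type primary key could be declared as:

import boto3

dynamodb = boto3.resource('dynamodb')        # assumes credentials and region are configured

table = dynamodb.create_table(
    TableName='ItemVersions',                                # hypothetical table
    KeySchema=[
        {'AttributeName': 'ItemID',    'KeyType': 'HASH'},   # hash (partition) attribute
        {'AttributeName': 'Timestamp', 'KeyType': 'RANGE'},  # range (sort) attribute
    ],
    AttributeDefinitions=[
        {'AttributeName': 'ItemID',    'AttributeType': 'S'},
        {'AttributeName': 'Timestamp', 'AttributeType': 'S'},
    ],
    BillingMode='PAY_PER_REQUEST',
)
table.wait_until_exists()
table.put_item(Item={'ItemID': 'I1', 'Timestamp': '2024-03-01T10:00', 'Qty': 5})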

Because DynamoDB is proprietary, an open source key-value system called Voldemort is also discussed below; Voldemort is based on many of the techniques proposed for DynamoDB.

13.4.2 Voldemort Key-Value Distributed Data Store

Voldemort is an open source system available under the Apache 2.0 open source license. It is based on Amazon’s DynamoDB. The focus is on high performance and horizontal scalability, as well as on providing replication for high availability and sharding for improving the latency (response time) of read and write requests. All three of these features (replication, sharding, and horizontal scalability) are realized through a technique that distributes the key-value pairs among the nodes of a distributed cluster; this distribution is known as consistent hashing. Voldemort has been used by LinkedIn for data storage. Some of the features of Voldemort are as follows:

• Simple basic operations: A collection of (key, value) pairs is kept in a Voldemort store.

• High-level formatted data values: The values v in the (k, v) items can be specified in
JSON (JavaScript Object Notation), and the system will convert between JSON and the
internal storage format. Other data object formats can also be specified if the application
provides the conversion (also known as serialization) between the user format and the
storage format as a Serializer class.

• Consistent hashing for distributing (key, value) pairs: A variation of the data distribution algorithm known as consistent hashing is used in Voldemort for data distribution among the nodes in the distributed cluster of nodes; a small sketch of this technique is given after this list.

• Consistency and versioning: Voldemort uses a method similar to the one developed for
DynamoDB for consistency in the presence of replicas. Basically, concurrent write
operations are allowed by different processes so there could exist two or more different
values associated with the same key at different nodes when items are replicated.
Consistency is achieved when the item is read by using a technique known as versioning
and read repair. Concurrent writes are allowed, but each write is associated with a vector
clock value. When a read occurs, it is possible that different versions of the same value
(associated with the same key) are read from different nodes. If the system can reconcile
to a single final value, it will pass that value to the read; otherwise, more than one version
can be passed back to the application, which will reconcile the various versions into one
version based on the application semantics and give this reconciled value back to the
nodes.
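The following is a minimal sketch of the consistent hashing technique mentioned above; it is not Voldemort code. Both node names and keys are hashed onto the same ring (represented here as a sorted map), and a key is assigned to the first node whose position follows the key's position on the ring, so adding or removing a node relocates only the keys in its neighborhood. Real systems add virtual nodes and replication on top of this basic scheme.

import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hashing ring: node names and keys are mapped onto the same hash space.
class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    private int hash(String s) {
        // A stand-in hash function; real systems use a better-distributed hash such as MD5 or MurmurHash.
        return s.hashCode() & 0x7fffffff;
    }

    void addNode(String node)    { ring.put(hash(node), node); }
    void removeNode(String node) { ring.remove(hash(node)); }

    // The key is stored on the first node clockwise from the key's position on the ring.
    String nodeFor(String key) {
        if (ring.isEmpty()) return null;
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}
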
13.4.3. Examples of Other Key-Value Stores
Oracle key-value store
Oracle has one of the well-known SQL relational database systems, and Oracle also
offers a system based on the key-value store concept; this system is called the Oracle NoSQL
Database.
Redis key-value cache and store
Redis differs from the other systems discussed here because it caches its data in main
memory to further improve performance. It offers master-slave replication and high availability,
and it also offers persistence by backing up the cache to disk.
Apache Cassandra
Cassandra is a NOSQL system that is not easily categorized into one category; it is
sometimes listed in the column-based NOSQL category or in the key-value category. It offers
features from several NOSQL categories and is used by Facebook as well as many other
customers.
13.5. COLUMN BASED SYSTEMS
Another category of NOSQL systems is known as column-based or wide column
systems. The Google distributed storage system for big data, known as BigTable, is a well-known example of this class of NOSQL systems, and it is used in many Google applications that require large amounts of data storage, such as Gmail. BigTable uses the Google File System (GFS) for data storage and distribution. An open source system known as Apache Hbase is somewhat similar to Google BigTable.
BigTable (and Hbase) is sometimes described as a sparse multidimensional distributed
persistent sorted map, where the word map means a collection of (key, value) pairs (the key is
mapped to the value). One of the main differences that distinguish column-based systems from
key-value stores is the nature of the key. In column-based systems such as Hbase, the key is
multidimensional and so has several components: typically, a combination of table name, row
key, column, and timestamp. As we shall see, the column is typically composed of two
components: column family and column qualifier.

13.5.1. CRUD Operations

Hbase provides only low-level CRUD operations. It is the responsibility of the application programs to implement more complex operations, such as joins between rows in different tables. The create operation creates a new table and specifies one or more column families associated with that table, but it does not specify the column qualifiers, which are discussed in the next subsection. The put operation is used for inserting new data or new versions of existing data items. The get operation retrieves the data associated with a single row in a table, and the scan operation retrieves all the rows.

Some Hbase basic CRUD operations:


Creating a table: create <tablename>, <column family>, <column family>, …
Inserting Data: put <tablename>, <rowid>, <column family>:<column qualifier>, <value>
Reading Data (all data in a table): scan <tablename>
Retrieve Data (one item): get <tablename>,<rowid>
Examples:
create 'EMPLOYEE', 'Name', 'Address', 'Details'
put 'EMPLOYEE', 'row1', 'Name:Fname', 'John'
put 'EMPLOYEE', 'row1', 'Name:Lname', 'Smith'
put 'EMPLOYEE', 'row1', 'Name:Nickname', 'Johnny'
put 'EMPLOYEE', 'row1', 'Details:Job', 'Engineer'
put 'EMPLOYEE', 'row1', 'Details:Review', 'Good'
put 'EMPLOYEE', 'row2', 'Name:Fname', 'Alicia'
put 'EMPLOYEE', 'row2', 'Name:Lname', 'Zelaya'
put 'EMPLOYEE', 'row2', 'Name:MName', 'Jennifer'
put 'EMPLOYEE', 'row2', 'Details:Job', 'DBA'
put 'EMPLOYEE', 'row2', 'Details:Supervisor', 'James Borg'
put 'EMPLOYEE', 'row3', 'Name:Fname', 'James'
put 'EMPLOYEE', 'row3', 'Name:Minit', 'E'
put 'EMPLOYEE', 'row3', 'Name:Lname', 'Borg'
put 'EMPLOYEE', 'row3', 'Name:Suffix', 'Jr.'
put 'EMPLOYEE', 'row3', 'Details:Job', 'CEO'
put 'EMPLOYEE', 'row3', 'Details:Salary', '1,000,000'
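The same put and get operations can also be issued from an application program. The sketch below assumes the standard Hbase Java client API (Connection, Table, Put, Get); it mirrors two of the shell commands above, and exact class or method names may differ slightly between Hbase client versions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HbasePutGetExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("EMPLOYEE"))) {

            // Equivalent of: put 'EMPLOYEE', 'row1', 'Name:Fname', 'John'
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("Name"), Bytes.toBytes("Fname"), Bytes.toBytes("John"));
            table.put(put);

            // Equivalent of: get 'EMPLOYEE', 'row1'  (reading back the Name:Fname cell)
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] value = result.getValue(Bytes.toBytes("Name"), Bytes.toBytes("Fname"));
            System.out.println(Bytes.toString(value));
        }
    }
}
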

13.5.2. Hbase Data Model and Versioning

Hbase data model. The data model in Hbase organizes data using the concepts of
namespaces, tables, column families, column qualifiers, columns, rows, and data cells. A
column is identified by a combination of (column family:column qualifier). Data is stored in a
self-describing form by associating columns with data values, where data values are strings.
Hbase also stores multiple versions of a data item, with a timestamp associated with each
version, so versions and timestamps are also part of the Hbase data model. As with other
NOSQL systems, unique keys are associated with stored data items for fast access, but the keys
identify cells in the storage system. Because the focus is on high performance when storing
huge amounts of data, the data model includes some storage-related concepts. We discuss the
Hbase data modeling concepts and define the terminology next. It is important to note that the
use of the words table, row, and column is not identical to their use in relational databases, but
the uses are related.

• Tables and Rows

Data in Hbase is stored in tables, and each table has a table name. Data in a table is
stored as self-describing rows. Each row has a unique row key, and row keys are strings
that must have the property that they can be lexicographically ordered, so characters that
do not have a lexicographic order in the character set cannot be used as part of a row
key.

• Column Families, Column Qualifiers, and Columns

A table is associated with one or more column families. Each column family will have
a name, and the column families associated with a table must be specified when the table
is created and cannot be changed later.

When the data is loaded into a table, each column family can be associated with many
column qualifiers, but the column qualifiers are not specified as part of creating a table.
So the column qualifiers make the model a self-describing data model because the
qualifiers can be dynamically specified as new rows are created and inserted into the
table. A column is specified by a combination of ColumnFamily: ColumnQualifier.

• Versions and Timestamps

Hbase can keep several versions of a data item, along with the timestamp associated
with each version. The timestamp is a long integer number that represents the system
time when the version was created, so newer versions have larger timestamp values.
Hbase uses midnight ‘January 1, 1970 UTC’ as timestamp value zero, and uses a long
integer that measures the number of milliseconds since that time as the system
timestamp value.

• Cells

A cell holds a basic data item in Hbase. The key (address) of a cell is specified by a
combination of (table, rowid, columnfamily, columnqualifier, timestamp). If timestamp
is left out, the latest version of the item is retrieved unless a default number of versions
is specified, say the latest three versions. The default number of versions to be retrieved,
as well as the default number of versions that the system needs to keep, are parameters
that can be specified during table creation.

• Namespaces

A namespace is a collection of tables. A namespace basically specifies a collection of one or more tables that are typically used together by user applications, and it corresponds to a database that contains a collection of tables in relational terminology.

13.5.3 Hbase Storage and Distributed System Concepts

Each Hbase table is divided into a number of regions, where each region will hold a
range of the row keys in the table; this is why the row keys must be lexicographically ordered.
Each region will have a number of stores, where each column family is assigned to one store
within the region. Regions are assigned to region servers (storage nodes) for storage. A master
server (master node) is responsible for monitoring the region servers and for splitting a table
into regions and assigning regions to region servers.

Hbase uses the Apache Zookeeper open source system for services related to managing
the naming, distribution, and synchronization of the Hbase data on the distributed Hbase server
nodes, as well as for coordination and replication services. Hbase also uses Apache HDFS
(Hadoop Distributed File System) for distributed file services. So Hbase is built on top of both
HDFS and Zookeeper.

13.6. GRAPH DATABASES

Another category of NOSQL systems is known as graph databases or graph-oriented NOSQL systems. The data is represented as a graph, which is a collection of vertices (nodes) and edges. Both nodes and edges can be labeled to indicate the types of entities and relationships they represent, and it is generally possible to store data associated with both individual nodes and individual edges. Many systems can be categorized as graph databases. One example is Neo4j, an open source system implemented in Java.

13.6.1 Neo4j Data Model

The data model in Neo4j organizes data using the concepts of nodes and relationships.
Both nodes and relationships can have properties, which store the data items associated with
nodes and relationships. Nodes can have labels; the nodes that have the same label are grouped
into a collection that identifies a subset of the nodes in the database graph for querying purposes.
A node can have zero, one, or several labels. Relationships are directed; each relationship has
a start node and end node as well as a relationship type, which serves a similar role to a node
label by identifying similar relationships that have the same relationship type. Properties can be
specified via a map pattern, which is made of one or more “name : value” pairs enclosed in
curly brackets; for example {Lname : ‘Smith’, Fname : ‘John’, Minit : ‘B’}.

There are various ways in which nodes and relationships can be created; for example,
by calling appropriate Neo4j operations from various Neo4j APIs. We will just show the high-
level syntax for creating nodes and relationships; to do so, we will use the Neo4j CREATE
command, which is part of the high-level declarative query language Cypher. Neo4j has many
options and variations for creating nodes and relationships using various scripting interfaces.

• Labels and properties

When a node is created, the node label can be specified. It is also possible to create
nodes without any labels.

CREATE (e1:EMPLOYEE {Empid: '1', Lname: 'Smith', Fname: 'John', Minit: 'B'})
CREATE (e2:EMPLOYEE {Empid: '2', Lname: 'Wong', Fname: 'Franklin'})
CREATE (e3:EMPLOYEE {Empid: '3', Lname: 'Zelaya', Fname: 'Alicia'})
CREATE (e4:EMPLOYEE {Empid: '4', Lname: 'Wallace', Fname: 'Jennifer', Minit: 'S'})

CREATE (d1:DEPARTMENT {Dno: '5', Dname: 'Research'})
CREATE (d2:DEPARTMENT {Dno: '4', Dname: 'Administration'})

CREATE (p1:PROJECT {Pno: '1', Pname: 'ProductX'})
CREATE (p2:PROJECT {Pno: '2', Pname: 'ProductY'})
CREATE (p3:PROJECT {Pno: '10', Pname: 'Computerization'})
CREATE (p4:PROJECT {Pno: '20', Pname: 'Reorganization'})

CREATE (loc1:LOCATION {Lname: 'Houston'})
CREATE (loc2:LOCATION {Lname: 'Stafford'})
CREATE (loc3:LOCATION {Lname: 'Bellaire'})
CREATE (loc4:LOCATION {Lname: 'Sugarland'})
Here, the node labels are EMPLOYEE, DEPARTMENT, PROJECT, and LOCATION,
and the created nodes correspond to some of the data from the COMPANY database. Properties
are enclosed in curly brackets { … }. It is possible that some nodes have multiple labels; for
example the same node can be labeled as PERSON and EMPLOYEE and MANAGER by
listing all the labels separated by the colon symbol as follows:
PERSON:EMPLOYEE:MANAGER. Having multiple labels is similar to an entity
belonging to an entity type (PERSON) plus some subclasses of PERSON (namely EMPLOYEE
and MANAGER) in the EER model but can also be used for other purposes.
• Relationships and relationship types
A few example relationships in Neo4j based on the COMPANY database are listed below:
CREATE (e1)-[:WorksFor]->(d1)
CREATE (e3)-[:WorksFor]->(d2)

CREATE (d1)-[:Manager]->(e2)
CREATE (d2)-[:Manager]->(e4)

CREATE (d1)-[:LocatedIn]->(loc1)
CREATE (d1)-[:LocatedIn]->(loc3)
CREATE (d1)-[:LocatedIn]->(loc4)
CREATE (d2)-[:LocatedIn]->(loc2)

CREATE (e1)-[:WorksOn {Hours: '32.5'}]->(p1)
CREATE (e1)-[:WorksOn {Hours: '7.5'}]->(p2)
CREATE (e2)-[:WorksOn {Hours: '10.0'}]->(p1)
CREATE (e2)-[:WorksOn {Hours: '10.0'}]->(p2)
CREATE (e2)-[:WorksOn {Hours: '10.0'}]->(p3)
CREATE (e2)-[:WorksOn {Hours: '10.0'}]->(p4)

The -> specifies the direction of the relationship, but the relationship can be traversed in either direction. The relationship types (labels) are WorksFor, Manager, LocatedIn, and WorksOn; only relationships with the relationship type WorksOn have properties (Hours).

• Paths

A path specifies a traversal of part of the graph. It is typically used as part of a query to
specify a pattern, where the query will retrieve from the graph data that matches the
pattern. A path is typically specified by a start node, followed by one or more
relationships, leading to one or more end nodes that satisfy the pattern.

• Optional Schema

A schema is optional in Neo4j. Graphs can be created and used without a schema, but
in Neo4j version 2.0, a few schema-related functions were added. The main features
related to schema creation involve creating indexes and constraints based on the labels
and properties. For example, it is possible to create the equivalent of a key constraint on
a property of a label, so all nodes in the collection of nodes associated with the label
must have unique values for that property.

• Indexing and node identifiers

When a node is created, the Neo4j system creates an internal unique system-defined
identifier for each node. To retrieve individual nodes using other properties of the nodes
efficiently, the user can create indexes for the collection of nodes that have a particular
label. Typically, one or more of the properties of the nodes in that collection can be
indexed. For example, Empid can be used to index nodes with the EMPLOYEE label,
Dno to index the nodes with the DEPARTMENT label, and Pno to index the nodes with
the PROJECT label.

Neo4j Interfaces and Distributed System Characteristics
Neo4j has other interfaces that can be used to create, retrieve, and update nodes and relationships in a graph database. It also has two main versions: the enterprise edition, which comes with additional capabilities, and the community edition. We discuss some of the additional features of Neo4j in this subsection.

• Enterprise edition vs. community edition

Both editions support the Neo4j graph data model and storage system, as well as the
Cypher graph query language, and several other interfaces, including a high-
performance native API, language drivers for several popular programming languages,
such as Java, Python, PHP, and the REST (Representational State Transfer) API. In
addition, both editions support ACID properties. The enterprise edition supports
additional features for enhancing performance, such as caching and clustering of data
and locking.
• Graph visualization interface
Neo4j has a graph visualization interface, so that a subset of the nodes and edges in a
database graph can be displayed as a graph. This tool can be used to visualize query
results in a graph representation.
• Master-slave replication
Neo4j can be configured on a cluster of distributed system nodes (computers), where
one node is designated the master node. The data and indexes are fully replicated on
each node in the cluster. Various ways of synchronizing the data between master and
slave nodes can be configured in the distributed cluster.
• Caching
A main memory cache can be configured to store the graph data for improved
performance.
• Logical logs
Logs can be maintained to recover from failures.

A full discussion of all the features and interfaces of Neo4j is outside the scope of our presentation. Full documentation of Neo4j is available online.

13.6.2 The Cypher Query Language of Neo4j

Neo4j has a high-level query language, Cypher. A Cypher query is made up of clauses.
When a query has several clauses, the result from one clause can be the input to the next clause
in the query. Basic simplified syntax of some common Cypher clauses is listed below:

Finding nodes and relationships that match a pattern: MATCH <pattern>


Specifying aggregates and other query variables: WITH <specifications>
Specifying conditions on the data to be retrieved: WHERE <condition>
Specifying the data to be returned: RETURN <data>
Ordering the data to be returned: ORDER BY <data>
Limiting the number of returned data items: LIMIT <max number>
Creating nodes: CREATE <node, optional labels and properties>
Creating relationships: CREATE <relationship, relationship type and optional properties>

Deletion: DELETE <nodes or relationships>


Specifying property values and labels: SET <property values and labels>
Removing property values and labels: REMOVE <property values and labels>
Examples of simple Cypher queries are given below:
1. MATCH (d:DEPARTMENT {Dno: '5'})-[:LocatedIn]->(loc)
RETURN d.Dname, loc.Lname
2. MATCH (e:EMPLOYEE {Empid: '2'})-[w:WorksOn]->(p)
RETURN e.Ename, w.Hours, p.Pname
3. MATCH (e)-[w:WorksOn]->(p:PROJECT {Pno: '2'})
RETURN p.Pname, e.Ename, w.Hours
4. MATCH (e)-[w:WorksOn]->(p)
RETURN e.Ename, w.Hours, p.Pname
ORDER BY e.Ename
5. MATCH (e)-[w:WorksOn]->(p)
RETURN e.Ename, w.Hours, p.Pname
ORDER BY e.Ename
LIMIT 10
6. MATCH (e)-[w:WorksOn]->(p)
WITH e, COUNT(p) AS numOfprojs
WHERE numOfprojs > 2
RETURN e.Ename, numOfprojs
ORDER BY numOfprojs
7. MATCH (e)-[w:WorksOn]->(p)
RETURN e, w, p
ORDER BY e.Ename
LIMIT 10
8. MATCH (e:EMPLOYEE {Empid: '2'})
SET e.Job = 'Engineer'
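To illustrate how such queries can be issued from an application, the sketch below assumes the official Neo4j Java driver; the bolt URI, user name, and password are placeholders, and class names can vary slightly between driver versions. The query is a variation of query 2 above that returns Lname (a property actually created earlier) instead of Ename.

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Record;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;

public class CypherFromJava {
    public static void main(String[] args) {
        // Placeholder connection details for a local Neo4j server.
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                                                  AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {
            // Projects and hours worked by the employee with Empid '2'.
            Result result = session.run(
                "MATCH (e:EMPLOYEE {Empid: '2'})-[w:WorksOn]->(p) " +
                "RETURN e.Lname AS lname, w.Hours AS hours, p.Pname AS pname");
            while (result.hasNext()) {
                Record row = result.next();
                System.out.println(row.get("lname").asString() + " works "
                        + row.get("hours").asString() + " hours on "
                        + row.get("pname").asString());
            }
        }
    }
}
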
CHAPTER – XIV
DATABASE SECURITY

14.1. INTRODUCTION TO DATABASE SECURITY ISSUES

14.1.1 Types of Security Issues

Database security is a broad area that addresses many issues, including the following:

• Various legal and ethical issues regarding the right to access certain information; for example, some information may be deemed to be private and cannot be accessed legally by unauthorized organizations or persons.

• Policy issues at the governmental, institutional, or corporate level regarding what kinds
of information should not be made publicly available—for example, credit ratings and
personal medical records.

• System-related issues such as the system levels at which various security functions
should be enforced—for example, whether a security function should be handled at the
physical hardware level, the operating system level, or the DBMS level.

• The need in some organizations to identify multiple security levels and to categorize the
data and users based on these classifications—for example, top secret, secret,
confidential, and unclassified.

14.1.2. Threats to Databases

Threats to databases can result in the loss or degradation of some or all of the following
commonly accepted security goals: integrity, availability, and confidentiality.

• Loss of integrity: Database integrity refers to the requirement that information be protected from improper modification. Modification of data includes creating, inserting, and updating data; changing the status of data; and deleting data. If the loss of system or data integrity is not corrected, continued use of the contaminated system or corrupted data could result in inaccuracy, fraud, or erroneous decisions.

• Loss of availability: Database availability refers to making objects available to a human user or a program that has a legitimate right to those data objects. Loss of availability occurs when the user or program cannot access these objects.

• Loss of confidentiality: Database confidentiality refers to the protection of data from unauthorized disclosure. The impact of unauthorized disclosure of confidential information can range from violation of the Data Privacy Act to the jeopardization of national security. Unauthorized, unanticipated, or unintentional disclosure could result in loss of public confidence, embarrassment, or legal action against the organization.
14.1.3. Database Security
A DBMS typically includes a database security and authorization subsystem that is
responsible for ensuring the security of portions of a database against unauthorized access. It is
now customary to refer to two types of database security mechanisms:
• Discretionary security mechanisms. These are used to grant privileges to users,
including the capability to access specific data files, records, or fields in a specified
mode (such as read, insert, delete, or update).
• Mandatory security mechanisms. These are used to enforce multilevel security by
classifying the data and users into various security classes (or levels) and then
implementing the appropriate security policy of the organization. For example, a typical
security policy is to permit users at a certain classification (or clearance) level to see
only the data items classified at the user’s own (or lower) classification level.
14.1.4. Control Measures
Four main control measures are used to provide security of data in databases:
• Access control - The security mechanism of a DBMS must include provisions for
restricting access to the database system as a whole. This function, called access control,
is handled by creating user accounts and passwords to control the login process by the
DBMS.
• Inference control - Statistical databases are used to provide statistical information or
summaries of values based on various criteria. Security for statistical databases must
ensure that information about individuals cannot be accessed. It is sometimes possible
to deduce or infer certain facts concerning individuals from queries that involve only
summary statistics on groups; consequently, this must not be permitted either. This problem is called statistical database security. The corresponding control measures are called inference control measures.
• Flow control – It prevents information from flowing in such a way that it reaches
unauthorized users. Covert channels are pathways on which information flows
implicitly in ways that violate the security policy of an organization.
• Data encryption - Encryption can be used to provide additional protection for sensitive
portions of a database as well. The data is encoded using some coding algorithm.

14.1.5. Database Security and the DBA

DBA-privileged commands include commands for granting and revoking privileges to individual accounts, users, or user groups and for performing the following types of actions:

1. Account creation: This action creates a new account and password for a user or a group
of users to enable access to the DBMS.

2. Privilege granting: This action permits the DBA to grant certain privileges to certain
accounts.

3. Privilege revocation: This action permits the DBA to revoke (cancel) certain privileges
that were previously given to certain accounts.

4. Security level assignment: This action consists of assigning user accounts to the
appropriate security clearance level.

The DBA is responsible for the overall security of the database system. Action 1 in the
preceding list is used to control access to the DBMS as a whole, whereas actions 2 and 3 are
used to control discretionary database authorization, and action 4 is used to control mandatory
authorization.

To keep a record of all updates applied to the database and of the particular users who applied each update, the system log is used. It includes an entry for each operation applied to the database that may be required for recovery from a transaction failure or system crash. We can
expand the log entries so that they also include the account number of the user and the online
computer or device ID that applied each operation recorded in the log. If any tampering with
the database is suspected, a database audit is performed, which consists of reviewing the log to
examine all accesses and operations applied to the database during a certain time period. When
an illegal or unauthorized operation is found, the DBA can determine the account number used
to perform the operation. Database audits are particularly important for sensitive databases that
are updated by many transactions and users, such as a banking database that can be updated by
thousands of bank tellers. A database log that is used mainly for security purposes serves as an
audit trail.

14.1.6. Sensitive Data and Types of Disclosures

Sensitivity of data is a measure of the importance assigned to the data by its owner for
the purpose of denoting its need for protection. Some databases contain only sensitive data
whereas other databases may contain no sensitive data at all. Handling databases that fall at
these two extremes is relatively easy because such databases can be covered by access control.
The situation becomes tricky when some of the data is sensitive whereas other data is not.

Several factors must be considered before deciding whether it is safe to reveal the data.
The three most important factors are data availability, access acceptability, and authenticity
assurance.

1. Data availability: If a user is updating a field, then this field becomes inaccessible and
other users should not be able to view this data. This blocking is only temporary and
only to ensure that no user sees any inaccurate data. This is typically handled by the
concurrency control mechanism.

2. Access acceptability: Data should only be revealed to authorized users. A database administrator may also deny access to a user request even if the request does not directly access a sensitive data item, on the grounds that the requested data may reveal information about the sensitive data that the user is not authorized to have.

3. Authenticity assurance: Before granting access, certain external characteristics about the user may also be considered. For example, a user may only be permitted access during working hours. The system may track previous queries to ensure that a combination of queries does not reveal sensitive data. The latter is particularly relevant to statistical database queries.

The term precision, when used in the security area, refers to allowing as much as
possible of the data to be available, subject to protecting exactly the subset of data that is
sensitive. The definitions of security versus precision are as follows:

• Security: Means of ensuring that data is kept safe from corruption and that access to it
is suitably controlled. To provide security means to disclose only nonsensitive data and
to reject any query that references a sensitive field.

• Precision: To protect all sensitive data while disclosing or making available as much
nonsensitive data as possible.

14.2. ACCESS CONTROL BASED ON PRIVILEGES

The typical method of enforcing discretionary access control in a database system is based on the granting and revoking of privileges.

14.2.1. Types of Discretionary Privileges

Informally, there are two levels for assigning privileges to use the database system:

• The account level: At this level, the DBA specifies the particular privileges that each
account holds independently of the relations in the database.

• The relation (or table) level: At this level, the DBA can control the privilege to access
each individual relation or view in the database.

The granting and revoking of privileges generally follow an authorization model for
discretionary privileges known as the access matrix model, where the rows of a matrix M
represent subjects (users, accounts, programs) and the columns represent objects (relations,
records, columns, views, operations). Each position M(i, j) in the matrix represents the types of
privileges (read, write, update) that subject i holds on object j.
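A minimal sketch of how an access matrix M could be represented in a program is given below; the subject, object, and privilege names are just illustrative strings and are not tied to any particular DBMS.

import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Access matrix M: M(subject, object) is the set of privileges the subject holds on the object.
class AccessMatrix {
    private final Map<String, Map<String, Set<String>>> m = new HashMap<>();

    void grant(String subject, String object, String privilege) {
        m.computeIfAbsent(subject, s -> new HashMap<>())
         .computeIfAbsent(object, o -> new HashSet<>())
         .add(privilege);
    }

    boolean allowed(String subject, String object, String privilege) {
        return m.getOrDefault(subject, Collections.emptyMap())
                .getOrDefault(object, Collections.emptySet())
                .contains(privilege);
    }
}

For example, grant("A3", "EMPLOYEE", "SELECT") records that subject A3 holds the SELECT privilege on the EMPLOYEE relation, and allowed("A3", "EMPLOYEE", "SELECT") then returns true.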

In SQL, the following types of privileges can be granted on each individual relation R:

• SELECT (retrieval or read) privilege on R: Gives the account retrieval privilege using
the SELECT statement to retrieve tuples from R.

• Modification privileges on R: This gives the account the capability to modify the tuples
of R. In SQL, this includes three privileges: UPDATE, DELETE, and INSERT.
Additionally, both the INSERT and UPDATE privileges can specify that only certain
attributes of R can be modified by the account.

• References privilege on R. This gives the account the capability to reference (or refer
to) a relation R when specifying integrity constraints. This privilege can also be
restricted to specific attributes of R.

14.2.2. Revoking of Privileges

In some cases, it is desirable to grant a privilege to a user temporarily. For example, the
owner of a relation may want to grant the SELECT privilege to a user for a specific task and
then revoke that privilege once the task is completed. Hence, a mechanism for revoking
privileges is needed. In SQL, a REVOKE command is included for the purpose of canceling
privileges.

14.2.3. Propagation of Privileges Using the GRANT OPTION

Whenever the owner A of a relation R grants a privilege on R to another account B, the privilege can be given to B with or without the GRANT OPTION. If the GRANT OPTION is
given, this means that B can also grant that privilege on R to other accounts. Suppose that B is
given the GRANT OPTION by A; B then can also grant the privilege on R to a third account C
with the GRANT OPTION. In this way, privileges on R can propagate to other accounts
without the knowledge of the owner of R. If the owner account A now revokes the privilege
granted to B, all the privileges that B propagated based on that privilege should automatically
be revoked by the system.

It is possible for a user to receive a certain privilege from two or more sources. For
example, A4 may receive a certain UPDATE R privilege from both A2 and A3. In such a case,
if A2 revokes this privilege from A4, A4 will still continue to have the privilege by virtue of
having been granted it from A3. If A3 later revokes the privilege from A4, A4 totally loses the
privilege. Hence, a DBMS that allows propagation of privileges must keep track of how all the
privileges were granted in the form of some internal log so that revoking of privileges can be
done correctly and completely.

14.2.4. An Example to Illustrate Granting and Revoking of Privileges

Suppose that the DBA creates four accounts—A1, A2, A3, and A4—and wants only A1
to be able to create base relations. To do this, the DBA must issue the following GRANT
command in SQL:

GRANT CREATETAB TO A1;

The CREATETAB (create table) privilege gives account A1 the capability to create new
database tables (base relations) and is hence an account privilege. Note that A1, A2, and so
forth may be individuals, like John in IT department or Mary in marketing; but they may also
be applications or programs that want to access a database.

In SQL2, the same effect can be accomplished by having the DBA issue a CREATE
SCHEMA command, as follows:

CREATE SCHEMA EXAMPLE AUTHORIZATION A1;

User account A1 can now create tables under the schema called EXAMPLE. To
continue our example, suppose that A1 creates the two base relations EMPLOYEE and
DEPARTMENT; A1 is then the owner of these two relations and hence has all the relation
privileges on each of them.

Next, suppose that account A1 wants to grant to account A2 the privilege to insert and
delete tuples in both of these relations. However, A1 does not want A2 to be able to propagate
these privileges to additional accounts. A1 can issue the following command:

GRANT INSERT, DELETE ON EMPLOYEE, DEPARTMENT TO A2;

The owner account A1 of a relation automatically has the GRANT OPTION, allowing
it to grant privileges on the relation to other accounts. However, account A2 cannot grant
INSERT and DELETE privileges on the EMPLOYEE and DEPARTMENT tables because A2
was not given the GRANT OPTION in the preceding command.

Next, suppose that A1 wants to allow account A3 to retrieve information from either of
the two tables and also to be able to propagate the SELECT privilege to other accounts. A1 can
issue the following command:

GRANT SELECT ON EMPLOYEE, DEPARTMENT TO A3 WITH GRANT OPTION;

The clause WITH GRANT OPTION means that A3 can now propagate the privilege to
other accounts by using GRANT. For example, A3 can grant the SELECT privilege on the
EMPLOYEE relation to A4 by issuing the following command:

GRANT SELECT ON EMPLOYEE TO A4;

Now suppose that A1 decides to revoke the SELECT privilege on the EMPLOYEE
relation from A3; A1 then can issue this command:

REVOKE SELECT ON EMPLOYEE FROM A3;

The DBMS must now revoke the SELECT privilege on EMPLOYEE from A3, and it
must also automatically revoke the SELECT privilege on EMPLOYEE from A4. This is
because A3 granted that privilege to A4, but A3 does not have the privilege any more.

Next, suppose that A1 wants to give back to A3 a limited capability to SELECT from
the EMPLOYEE relation and wants to allow A3 to be able to propagate the privilege. The
limitation is to retrieve only the Name, Bdate, and Address attributes and only for the tuples
with Dno = 5. A1 then can create the following view:

CREATE VIEW A3EMPLOYEE AS

SELECT Name, Bdate, Address

FROM EMPLOYEE

WHERE Dno = 5;

After the view is created, A1 can grant SELECT on the view A3EMPLOYEE to A3 as follows:

GRANT SELECT ON A3EMPLOYEE TO A3 WITH GRANT OPTION;

The UPDATE and INSERT privileges can specify particular attributes that may be
updated or inserted in a relation. Finally, suppose that A1 wants to allow A4 to update only the
Salary attribute of EMPLOYEE; A1 can then issue the following command:

GRANT UPDATE ON EMPLOYEE (Salary) TO A4;



14.2.5. Specifying Limits on Propagation of Privileges

Limiting horizontal propagation to an integer number i means that an account B given the GRANT OPTION can grant the privilege to at most i other accounts.

If account A grants a privilege to account B with the vertical propagation set to an integer number j > 0, this means that the account B has the GRANT OPTION on that privilege,
but B can grant the privilege to other accounts only with a vertical propagation less than j. In
effect, vertical propagation limits the sequence of GRANT OPTIONS that can be given from
one account to the next based on a single original grant of the privilege.

14.3. ROLE-BASED ACCESS CONTROL

Role-based access control (RBAC) has emerged as a proven technology for managing
and enforcing security in large-scale enterprise-wide systems. Its basic notion is that privileges
and other permissions are associated with organizational roles rather than with individual users.
Individual users are then assigned to appropriate roles. Roles can be created using the CREATE
ROLE and DESTROY ROLE commands. The GRANT and REVOKE can then be used to
assign and revoke privileges from roles, as well as for individual users when needed. For
example, a company may have roles such as sales account manager, purchasing agent, mailroom
clerk, customer service manager, and so on. Multiple individuals can be assigned to each role.
Security privileges that are common to a role are granted to the role name, and any individual
assigned to this role would automatically have those privileges granted.

RBAC can be used with traditional access controls; it ensures that only authorized users
in their specified roles are given access to certain data or resources.

Each session can be assigned to several roles, but it maps to one user or a single subject
only. Many DBMSs have allowed the concept of roles, where privileges can be assigned to
roles.

Two roles are said to be mutually exclusive if both the roles cannot be used
simultaneously by the user. Mutual exclusion of roles can be categorized into two types, namely
authorization time exclusion (static) and runtime exclusion (dynamic). In authorization time
exclusion, two roles that have been specified as mutually exclusive cannot be part of a user’s
authorization at the same time. In runtime exclusion, both these roles can be authorized to one
user but cannot be activated by the user at the same time. Another variation in mutual exclusion
of roles is that of complete and partial exclusion.

The role hierarchy in RBAC is a natural way to organize roles to reflect the
organization’s lines of authority and responsibility. By convention, junior roles at the bottom
are connected to progressively senior roles as one moves up the hierarchy. The hierarchic
diagrams are partial orders, so they are reflexive, transitive, and antisymmetric. In other words,
if a user has one role, the user automatically has roles lower in the hierarchy. Defining a role
hierarchy involves choosing the type of hierarchy and the roles, and then implementing the
hierarchy by granting roles to other roles. Role hierarchy can be implemented in the following
manner:

GRANT ROLE full_time TO employee_type1


GRANT ROLE intern TO employee_type2
The above are examples of granting the roles full_time and intern to two types of
employees.
Another important consideration in RBAC systems is the possible temporal constraints
that may exist on roles, such as the time and duration of role activations and the timed triggering
of a role by an activation of another role.
RBAC models have several desirable features, such as flexibility, policy neutrality,
better support for security management and administration, and a natural enforcement of the
hierarchical organization structure within organizations. Furthermore, an RBAC model
provides mechanisms for addressing the security issues related to the execution of tasks and
workflows, and for specifying user-defined and organization-specific policies. Easier
deployment over the Internet has been another reason for the success of RBAC models.
14.4. SQL INJECTION
SQL injection is one of the most common threats to a database system. Some of the
other frequent attacks on databases are:
• Unauthorized privilege escalation: This attack is characterized by an individual
attempting to elevate his or her privilege by attacking vulnerable points in the database
systems.
• Privilege abuse: Whereas unauthorized privilege escalation is done by an unauthorized
user, this attack is performed by a privileged user. For example, an administrator who is
allowed to change student information can use this privilege to update student grades
without the instructor’s permission.
• Denial of service: A denial of service (DOS) attack is an attempt to make resources
unavailable to its intended users. It is a general attack category in which access to
network applications or data is denied to intended users by overflowing the buffer or
consuming resources.
• Weak authentication: If the user authentication scheme is weak, an attacker can
impersonate the identity of a legitimate user by obtaining her login credentials.

14.4.1. SQL injection methods

In an SQL injection attack, the attacker injects a string input through the application,
which changes or manipulates the SQL statement to the attacker’s advantage. An SQL injection
attack can harm the database in various ways, such as unauthorized manipulation of the database
or retrieval of sensitive data. It can also be used to execute system-level commands that may
cause the system to deny service to the application.

SQL Manipulation:

A manipulation attack, which is the most common type of injection attack, changes an
SQL command in the application—for example, by adding conditions to the WHERE-clause of
a query, or by expanding a query with additional query components using set operations such
as UNION, INTERSECT, or MINUS. Other types of manipulation attacks are also possible.
For example, suppose that a simplistic authentication procedure issues the following query and
checks to see if any rows were returned:

SELECT * FROM users WHERE username = ‘jake’ and PASSWORD =‘jakespasswd’ ;

The attacker can try to change (or manipulate) the SQL statement by changing it as follows:

SELECT * FROM users WHERE username = ‘jake’ and (PASSWORD =‘jakespasswd’ or ‘x’
= ‘x’);

As a result, the attacker who knows that ‘jake’ is a valid login of some user is able to
log into the database system as ‘jake’ without knowing his password and is able to do everything
that ‘jake’ may be authorized to do to the database system.
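The vulnerability typically arises when the application builds the SQL text by concatenating user input into the statement string. The JDBC fragment below is a hedged sketch of that unsafe pattern (userName and password stand for values read from a login form); the bind-variable technique shown in Section 14.4.3 is the safe alternative.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class UnsafeLogin {
    // Unsafe: attacker-controlled input becomes part of the SQL text itself.
    static boolean login(Connection conn, String userName, String password) throws Exception {
        String sql = "SELECT * FROM users WHERE username = '" + userName
                   + "' AND PASSWORD = '" + password + "'";
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            // Supplying password input such as  x' OR 'x' = 'x  makes the WHERE clause always true.
            return rs.next();
        }
    }
}
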

Code Injection:

This type of attack attempts to add additional SQL statements or commands to the
existing SQL statement by exploiting a computer bug, which is caused by processing invalid
data. The attacker can inject or introduce code into a computer program to change the course of
execution. Code injection is a popular technique for system hacking or cracking to gain
information.

Function Call Injection:

In this kind of attack, a database function or operating system function call is inserted
into a vulnerable SQL statement to manipulate the data or make a privileged system call. For
example, it is possible to exploit a function that performs some aspect related to network
communication. In addition, functions that are contained in a customized database package, or
any custom database function, can be executed as part of an SQL query. For example, the dual
table is used in the FROM clause of SQL in Oracle when a user needs to run SQL that does not
logically have a table name. To get today’s date, we can use:

SELECT SYSDATE FROM dual;

The following example demonstrates that even the simplest SQL statements can be
vulnerable.

SELECT TRANSLATE (‘user input’, ‘from_string’, ‘to_string’) FROM dual;

Here, TRANSLATE is used to replace a string of characters with another string of characters. The TRANSLATE function above will replace the characters of the ‘from_string’
with the characters in the ‘to_string’ one by one. This means that the f will be replaced with the
t, the r with the o, the o with the _, and so on.

This type of SQL statement can be subjected to a function injection attack. Consider the
following example:
SELECT TRANSLATE ('' || UTL_HTTP.REQUEST('http://129.107.2.1/') || '', '98765432', '9876') FROM dual;
The user can input the string ('' || UTL_HTTP.REQUEST('http://129.107.2.1/') || ''), where || is the concatenate operator, thus requesting a page from a Web server. UTL_HTTP makes Hypertext Transfer Protocol (HTTP) callouts from SQL. The REQUEST object takes a URL ('http://129.107.2.1/' in this example) as a parameter, contacts that site, and returns the
data (typically HTML) obtained from that site. The attacker could manipulate the string he
inputs, as well as the URL, to include other functions and do other illegal operations. We just
used a dummy example to show conversion of ‘98765432’ to ‘9876’, but the user’s intent would
be to access the URL and get sensitive information. The attacker can then retrieve useful
information from the database server—located at the URL that is passed as a parameter—and
send it to the Web server.
14.4.2. Risks Associated with SQL Injection
SQL injection is harmful and the risks associated with it provide motivation for
attackers. Some of the risks associated with SQL injection attacks are:
• Database fingerprinting: The attacker can determine the type of database being used in
the backend so that he can use database-specific attacks that correspond to weaknesses
in a particular DBMS.
• Denial of service: The attacker can flood the server with requests, thus denying service
to valid users, or the attacker can delete some data.

• Bypassing authentication: This is one of the most common risks, in which the attacker
can gain access to the database as an authorized user and perform all the desired tasks.
• Identifying injectable parameters: In this type of attack, the attacker gathers important
information about the type and structure of the back-end database of a Web application.
This attack is made possible by the fact that the default error page returned by
application servers is often overly descriptive.
• Executing remote commands: This provides attackers with a tool to execute arbitrary
commands on the database. For example, a remote user can execute stored database
procedures and functions from a remote SQL interactive interface.
• Performing privilege escalation: This type of attack takes advantage of logical flaws
within the database to upgrade the access level.
14.4.3. Protection Techniques against SQL Injection
Protection against SQL injection attacks can be achieved by applying certain
programming rules to all Web-accessible procedures and functions. This section describes some
of these techniques.
Bind Variables (Using Parameterized Statements): The use of bind variables protects against
injection attacks and also improves performance.
Consider the following example using Java and JDBC:
PreparedStatement stmt = conn.prepareStatement(
    "SELECT * FROM EMPLOYEE WHERE EMPLOYEE_ID = ? AND PASSWORD = ?");
stmt.setString(1, employee_id);
stmt.setString(2, password);
Instead of embedding the user input into the statement, the input should be bound to a parameter. In this example, the first parameter marker (?) is bound to the value of the variable employee_id and the second to the value of password, instead of concatenating the user-supplied strings directly into the SQL text.
Filtering Input (Input Validation):
This technique can be used to remove escape characters from input strings by using the
SQL Replace function. For example, the delimiter single quote (‘) can be replaced by two single
quotes (‘’).
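A small illustrative sketch of this idea in application code is given below; escaping quotes this way reduces, but does not eliminate, the risk, and bind variables remain the preferred defense.

public class InputFilter {
    // Double every single quote so user input cannot terminate the SQL string literal early.
    static String escapeQuotes(String input) {
        return (input == null) ? null : input.replace("'", "''");
    }

    public static void main(String[] args) {
        System.out.println(escapeQuotes("jakespasswd' or 'x'='x"));
        // Prints: jakespasswd'' or ''x''=''x  -- the injected quotes no longer break out of the literal
    }
}
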
Function Security:
Database functions, both standard and custom, should be restricted, as they can be
exploited in the SQL function injection attacks.

14.5. STATISTICAL DATABASE SECURITY

Statistical databases are used mainly to produce statistics about various populations. The
database may contain confidential data about individuals; this information should be protected
from user access. However, users are permitted to retrieve statistical information about the
populations, such as averages, sums, counts, maximums, minimums, and standard deviations.

Example: Let us consider PERSON relation with the attributes Name, Ssn, Income, Address,
City, State, Zip, Sex, and Last_degree for illustrating statistical database security.

A population is a set of tuples of a relation (table) that satisfy some selection condition.
Hence, each selection condition on the PERSON relation will specify a particular population of
PERSON tuples. For example, the condition Sex = ‘M’ specifies the male population; the
condition ((Sex = ‘F’) AND (Last_degree = ‘M.S.’ OR Last_degree = ‘Ph.D.’)) specifies the
female population that has an M.S. or Ph.D. degree as their highest degree; and the condition
City = ‘Houston’ specifies the population that lives in Houston.

However, statistical users are not allowed to retrieve individual data, such as the income
of a specific person. Statistical database security techniques must prohibit the retrieval of
individual data. This can be achieved by prohibiting queries that retrieve attribute values and
by allowing only queries that involve statistical aggregate functions such as COUNT, SUM,
MIN, MAX, AVERAGE, and STANDARD DEVIATION. Such queries are sometimes called
statistical queries.

In some cases, it is possible to infer the values of individual tuples from a sequence of
statistical queries. This is particularly true when the conditions result in a population consisting
of a small number of tuples. As an illustration, consider the following statistical queries:

Q1: SELECT COUNT (*) FROM PERSON WHERE <condition>;

Q2: SELECT AVG (Income) FROM PERSON WHERE <condition>;

Now suppose that we are interested in finding the Income of Jane Smith, and we know
that she has a Ph.D. degree and that she lives in the city of Bellaire, Texas. We issue the
statistical query Q1 with the following condition:

(Last_degree=‘Ph.D.’ AND Sex=‘F’ AND City=‘Bellaire’ AND State=‘Texas’)



If we get a result of 1 for this query, we can issue Q2 with the same condition and find the Income of Jane Smith. Even if the result of Q1 on the preceding condition is not 1 but is a small number—say 2 or 3—we can issue statistical queries using the functions MAX, MIN, and AVERAGE to identify the possible range of values for the Income of Jane Smith.

• The possibility of inferring individual information from statistical queries is reduced if no statistical queries are permitted whenever the number of tuples in the population specified by the selection condition falls below some threshold; a small sketch of this check is given after this list.

• Another technique for prohibiting retrieval of individual information is to prohibit sequences of queries that refer repeatedly to the same population of tuples. It is also possible to introduce slight inaccuracies or noise into the results of statistical queries deliberately, to make it difficult to deduce individual information from the results.

• Another technique is partitioning of the database. Partitioning implies that records are stored in groups of some minimum size; queries can refer to any complete group or set of groups, but never to subsets of records within a group.
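As indicated in the first bullet, a minimal sketch of such a threshold check is given below, using JDBC; the table and attribute names follow the PERSON example, and the threshold value is an arbitrary illustrative choice.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class StatisticalQueryGuard {
    private static final int THRESHOLD = 10;   // illustrative minimum population size

    // Returns AVG(Income) for the population selected by the given condition, or null if the
    // population is too small for the statistic to be released. The condition is assumed to be
    // produced by a trusted query processor, not taken directly from end-user text.
    static Double averageIncome(Connection conn, String condition) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT COUNT(*) FROM PERSON WHERE " + condition)) {
                rs.next();
                if (rs.getInt(1) < THRESHOLD) {
                    return null;   // refuse the statistical query: population below threshold
                }
            }
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT AVG(Income) FROM PERSON WHERE " + condition)) {
                rs.next();
                return rs.getDouble(1);
            }
        }
    }
}
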

14.6. FLOW CONTROL

Flow control regulates the distribution or flow of information among accessible objects.
A flow between object X and object Y occurs when a program reads values from X and writes
values into Y. Flow controls check that information contained in some objects does not flow
explicitly or implicitly into less protected objects. Thus, a user cannot get indirectly in Y what
he or she cannot get directly in X. Most flow controls employ some concept of security class;
the transfer of information from a sender to a receiver is allowed only if the receiver’s security
class is at least as privileged as the sender’s security class. Example: Preventing a service
program from leaking a customer’s confidential data, and blocking the transmission of secret
military data to an unknown classified user.

A flow policy specifies the channels along which information is allowed to move. The
simplest flow policy specifies just two classes of information—confidential (C) and
nonconfidential (N)—and allows all flows except those from class C to class N. This policy can
solve the confinement problem that arises when a service program handles data such as
customer information, some of which may be confidential. For example, an income-tax-
computing service might be allowed to retain a customer’s address and the bill for services
rendered, but not a customer’s income or deductions.

Two types of flow can be distinguished: explicit flows, which occur as a consequence of assignment instructions, such as Y := f(X1, …, Xn); and implicit flows, which are generated by conditional instructions, such as if f(Xm+1, …, Xn) then Y := f(X1, …, Xm).
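The two kinds of flow can be illustrated with an ordinary program fragment; the sketch below is a generic illustration and is not tied to any particular DBMS.

public class FlowExample {
    public static void main(String[] args) {
        int secret = 7;          // value read from a protected object X
        int y;

        // Explicit flow: the value of secret moves into y through an assignment, Y := f(X).
        y = secret * 2;

        // Implicit flow: z is never assigned from secret directly, yet observing z
        // reveals whether secret is positive, because the assignment is conditioned on secret.
        int z;
        if (secret > 0) {
            z = 1;
        } else {
            z = 0;
        }
        System.out.println(y + " " + z);
    }
}
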

14.7. ENCRYPTION AND PUBLIC KEY INFRASTRUCTURES

Encryption is the conversion of data into a form, called ciphertext, that cannot be easily understood by unauthorized persons. It enhances security and privacy when access controls are
bypassed, because in cases of data loss or theft, encrypted data cannot be easily understood by
unauthorized persons. Some of the standard definitions are listed below:

• Ciphertext: Encrypted (enciphered) data

• Plaintext (or cleartext): Intelligible data that has meaning and can be read or acted
upon without the application of decryption

• Encryption: The process of transforming plaintext into ciphertext

• Decryption: The process of transforming ciphertext back into plaintext

Encryption consists of applying an encryption algorithm to data using some prespecified encryption key. The resulting data must be decrypted using a decryption key to recover the original data.

14.7.1. Symmetric Key Algorithms

A symmetric key is one key that is used for both encryption and decryption. By using a
symmetric key, fast encryption and decryption is possible for routine use with sensitive data in
the database. A message encrypted with a secret key can be decrypted only with the same secret
key. Algorithms used for symmetric key encryption are called secret key algorithms. Since
secret-key algorithms are mostly used for encrypting the content of a message, they are also
called content-encryption algorithms.
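A minimal sketch using the standard Java cryptography API is given below; a single secret key is generated and used for both encryption and decryption. The default AES transformation is used here only for illustration and is not a recommendation for production settings.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.nio.charset.StandardCharsets;

public class SymmetricKeyExample {
    public static void main(String[] args) throws Exception {
        // One secret key is used for both encryption and decryption.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey secretKey = keyGen.generateKey();

        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, secretKey);
        byte[] ciphertext = cipher.doFinal("Sensitive salary data".getBytes(StandardCharsets.UTF_8));

        cipher.init(Cipher.DECRYPT_MODE, secretKey);
        byte[] plaintext = cipher.doFinal(ciphertext);
        System.out.println(new String(plaintext, StandardCharsets.UTF_8));
    }
}
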

14.7.2. Public (Asymmetric) Key Encryption

Public key algorithms are based on mathematical functions rather than operations on bit
patterns. They address one drawback of symmetric key encryption, namely that both sender and
recipient must exchange the common key in a secure manner. In public key systems, two keys
are used for encryption/decryption. The public key can be transmitted in a nonsecure way,
whereas the private key is not transmitted at all. These algorithms—which use two related keys,
a public key and a private key, to perform complementary operations (encryption and
decryption)—are known as asymmetric key encryption algorithms. The two keys used for
public key encryption are referred to as the public key and the private key. The private key is
kept secret, but it is referred to as a private key rather than a secret key (the key used in
conventional encryption) to avoid confusion with conventional encryption.

A public key encryption scheme, or infrastructure, has the following ingredients:

1. Plaintext. This is the data or readable message that is fed into the algorithm as input.

2. Encryption algorithm. This algorithm performs various transformations on the plaintext.

3. Public and private keys. These are a pair of keys that have been selected so that if one
is used for encryption, the other is used for decryption. The exact transformations
performed by the encryption algorithm depend on the public or private key that is
provided as input. For example, if a message is encrypted using the public key, it can
only be decrypted using the private key.

4. Ciphertext. This is the scrambled message produced as output. It depends on the plaintext and the key. For a given message, two different keys will produce two different ciphertexts.

5. Decryption algorithm. This algorithm accepts the ciphertext and the matching key and
produces the original plaintext.

The essential steps are as follows:

1. Each user generates a pair of keys to be used for the encryption and decryption of
messages.

2. Each user places one of the two keys in a public register or other accessible file. This is
the public key. The companion key is kept private.

3. If a sender wishes to send a private message to a receiver, the sender encrypts the
message using the receiver’s public key.

4. When the receiver receives the message, he or she decrypts it using the receiver’s private
key. No other recipient can decrypt the message because only the receiver knows his or
her private key.
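These steps can be sketched with the standard Java cryptography API as follows; the short message and the raw RSA transformation are for illustration only (in practice, public key encryption is usually combined with symmetric encryption of the bulk data).

import javax.crypto.Cipher;
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;

public class PublicKeyExample {
    public static void main(String[] args) throws Exception {
        // Step 1: the receiver generates a (public, private) key pair.
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair receiverKeys = generator.generateKeyPair();

        // Step 3: the sender encrypts the message with the receiver's PUBLIC key.
        Cipher cipher = Cipher.getInstance("RSA");
        cipher.init(Cipher.ENCRYPT_MODE, receiverKeys.getPublic());
        byte[] ciphertext = cipher.doFinal("Meet at noon".getBytes(StandardCharsets.UTF_8));

        // Step 4: only the receiver's PRIVATE key can decrypt the ciphertext.
        cipher.init(Cipher.DECRYPT_MODE, receiverKeys.getPrivate());
        byte[] plaintext = cipher.doFinal(ciphertext);
        System.out.println(new String(plaintext, StandardCharsets.UTF_8));
    }
}
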

14.8. CHALLENGES TO MAINTAINING DATABASE SECURITY

Considering the vast growth in volume and speed of threats to databases and information
assets, research efforts need to be devoted to a number of issues: data quality, intellectual
property rights, and database survivability.

Data Quality:
The database community needs techniques and organizational solutions to assess and
attest to the quality of data. These techniques may include simple mechanisms such as quality
stamps that are posted on Web sites. We also need techniques that provide more effective
integrity semantics verification and tools for the assessment of data quality, based on techniques
such as record linkage. Application-level recovery techniques are also needed for automatically
repairing incorrect data.
Intellectual Property Rights:
With the widespread use of the Internet and intranets, legal and informational aspects of
data are becoming major concerns for organizations. To address these concerns, watermarking
techniques for relational data have been proposed. Digital watermarking has traditionally relied
upon the availability of a large noise domain within which the object can be altered while
retaining its essential properties. However, research is needed to assess the robustness of such
techniques and to investigate different approaches aimed at preventing intellectual property
rights violations.
Database Survivability:
Database systems need to operate and continue their functions, even with reduced
capabilities, despite disruptive events such as information warfare attacks. A DBMS, in addition
to making every effort to prevent an attack and detecting one in the event of occurrence, should
be able to do the following:
• Confinement: Take immediate action to eliminate the attacker’s access to the system
and to isolate or contain the problem to prevent further spread.
• Damage assessment: Determine the extent of the problem, including failed functions
and corrupted data.
• Reconfiguration: Reconfigure to allow operation to continue in a degraded mode while
recovery proceeds.
• Repair: Recover corrupted or lost data and repair or reinstall failed system functions to
reestablish a normal level of operation.
• Fault treatment: To the extent possible, identify the weaknesses exploited in the attack
and take steps to prevent a recurrence.
The specific target of an attack may be the system itself or its data. Although attacks
that bring the system down outright are severe and dramatic, they must also be well timed to
achieve the attacker’s goal, since attacks will receive immediate and concentrated attention in
order to bring the system back to operational condition, diagnose how the attack took place, and
install preventive measures.
