23ACS12 DBMS Complete Notes: Unit I to V

Unit 1

1.1 Introduction to DBMS:

Definition of Data: By data, we mean known facts that can be recorded and that have implicit
meaning. For example, consider the names, telephone numbers, and addresses of the people you
know.

Database Management System (DBMS) is a combination of two words: database and
management system. Combining the meanings of both gives the definition of a DBMS.

A database-management system (DBMS) is a collection of interrelated data and a set of programs to access those data.

A database management system (DBMS) is a collection of programs that enables users to create
and maintain a database. The DBMS is hence a general-purpose software system that facilitates
the processes of defining, constructing, manipulating, and sharing databases among various
users and applications. Defining a database involves specifying the data types, structures, and
constraints for the data to be stored in the database. Constructing the database is the process of
storing the data itself on some storage medium that is controlled by the DBMS. Manipulating a
database includes such functions as querying the database to retrieve specific data, updating the
database to reflect changes in the miniworld, and generating reports from the data. Sharing a
database allows multiple users and programs to access the database concurrently.

General properties of database:


A database has the following implicit properties:
• A database represents some aspect of the real world, sometimes called the miniworld or the
universe of discourse (UoD). Changes to the miniworld are reflected in the database.
• A database is a logically coherent collection of data with some inherent meaning. A random
assortment of data cannot correctly be referred to as a database.
• A database is designed, built, and populated with data for a specific purpose. It has an intended
group of users and some preconceived applications in which these users are interested.

DATABASE SYSTEM APPLICATION:


• Banking: For customer information, accounts, loans, and banking transactions.
• Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner—terminals situated around the world accessed
the central database system through phone lines and other data networks.
• Universities: For student information, course registrations, and grades.
• Credit card transactions: For purchases on credit cards and generation of monthly statements.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.
• Finance: For storing information about holdings, sales, and purchases of financial instruments
such as stocks and bonds.
• Sales: For customer, product, and purchase information.

• Manufacturing: For management of supply chain and for tracking production of items in
factories, inventories of items in warehouses/stores, and orders for items.
• Human resources: For information about employees, salaries, payroll taxes and
benefits, and for generation of paychecks.

WHAT IS FILE PROCESSING SYSTEM?


A typical file-processing system is supported by a conventional operating system. The system
stores permanent records in various files, and it needs different application programs to extract
records from, and add records to, the appropriate files. Before database management systems
(DBMSs) came along, organizations usually
stored information in such systems.

DRAWBACKS OF FILE PROCESSING SYSTEM:

1)Data redundancy and inconsistency: Since different programmers create the files and
application programs over a long period, the various files are likely to have different formats and
the programs may be written in several programming languages. Moreover, the same information
may be duplicated in several places (files). For example, the address and telephone number of a
particular customer may appear in a file that consists of savings-account records and in a file that
consists of checking-account records. This redundancy leads
to higher storage and access cost. In addition, it may lead to data inconsistency; that is, the
various copies of the same data may no longer agree. For example, a changed customer address
may be reflected in savings-account records but not elsewhere in the system.
2) Difficulty in accessing data: Conventional file-processing environments do not allow
needed data to be retrieved in a convenient and efficient manner. Suppose that one of the bank
officers needs to find out the names of all customers who live within a particular postal-code
area. The officer asks the data-processing department to generate such a list. Because the
designers of the original system did not anticipate this request, there is no application program on
hand to meet it. There is, however, an application program to generate the list of all customers.
The bank officer has now two choices: either obtain the list of all customers manually and extract
the needed information manually or ask a system programmer to write the necessary application
program. Both alternatives are obviously unsatisfactory.
3) Data isolation. Because data are scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.
4) Integrity problems. The data values stored in the database must satisfy certain
types of consistency constraints. For example, the balance of a bank account
may never fall below a prescribed amount (say, $25). In a file-processing system, developers enforce such
constraints by adding code to the various application programs, so when a constraint changes or a new one is
added, it is difficult to change all the programs to enforce it.
5) Atomicity problems. A computer system, like any other mechanical or electrical device, is
subject to failure. In many applications, it is crucial that, if a failure occurs, the data be restored
to the consistent state that existed prior to the failure. Consider a program to transfer $50 from
account A to account B. If a system failure occurs during the execution of the program, it is
possible that the $50 was removed from account A but was not credited to account B, resulting in
an inconsistent database state. That is, the funds transfer must be atomic—it must happen in its
entirety or not at all. It is difficult to ensure atomicity in a conventional file-processing system.
6)Concurrent-access anomalies. For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. In such an

environment, interaction of concurrent updates may result in inconsistent data. To guard against
this possibility, the system must maintain some form of supervision. But supervision is difficult
to provide because data may be accessed by many different application programs.
7) Security problems. Not every user of the database system should be able to access all the
data. For example, in a banking system, payroll personnel need to see only that part of the
database that has information about the various bank employees. They do not need access to
information about customer accounts. But since application programs are added to the system in
an ad hoc manner, enforcing such security constraints is difficult.

Advantages of DBMS:
A DBMS provides the following advantages.
(1) Reduced redundancy (duplication): Centralized control of data by the DBA avoids unnecessary
duplication of data and effectively reduces the total amount of data storage required.
(2) Shared data: The database allows the sharing of data under its control by any number of
application programs or users.
(3) Integrity: Centralized control also ensures that adequate checks are incorporated in the DBMS to
provide data integrity, i.e. the data held in the single system remain accurate and consistent even though
they are accessed by many people.
(4) Security: Data are very important to an organization and may be confidential. Such
confidential data must not be accessible to unauthorized users. The DBA, who has ultimate
responsibility for the data in the DBMS, can ensure that proper access procedures are followed, including
authentication. The DBMS checks permissions before granting any access to other users.
(5) Conflict resolution: Since the database is under the control of the DBA, users
cannot access any data without permission (login name, password).
(6) Data independence: Data independence can be physical or logical; in either case, a change
in the hardware or in the storage organization does not affect how applications access the data.
Disadvantages of DBMS:
1) The cost of purchasing and developing a DBMS is high, because it is more expensive than other
applications.
2) Backup and recovery operations are complex.
3) More workspace is required for its execution and storage.
4) Excessive or faulty data entries may corrupt the total data.

Functions of DBMS:
1) Addition of new data.
2) Sorting of data.
3) Searching for particular data.
4) Printing particular data.
5) Editing or changing stored data.
6) Deleting data.

DIFFERENCE BETWEEN FILE SYSTEM & DBMS

1) File system: A high rate of data redundancy exists in a typical file-processing system.
   DBMS: Redundancy is reduced.
2) File system: There is inconsistency of data.
   DBMS: Inconsistency of data is reduced as redundancy is reduced.
3) File system: There is no provision for data security.
   DBMS: Provision for data security is made.
4) File system: There is no standard representation of data.
   DBMS: A standard representation of data is achieved using the relational data model.
5) File system: Data integrity is not maintained.
   DBMS: Data integrity is maintained.
6) File system: Data cannot be accessed easily.
   DBMS: Data can be accessed easily through rows and columns.
7) File system: Data cannot be shared.
   DBMS: Data can be shared.
8) File system: Retrieval of data is time consuming.
   DBMS: Retrieval of data is easy.
9) File system: Program-data independence is not there.
   DBMS: Program-data independence is there.

Data Abstraction:

Figure 1.1 The three levels of data abstraction

For the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Since many database-system
users are not computer trained, developers hide the complexity from users through
several levels of abstraction, to simplify users’ interactions with the system:
 Physical level- The lowest level of abstraction describes how the data are actually stored.
The physical level describes complex low-level data structures in detail.
 Logical level- The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. Database administrators, who must
decide what information to keep in the database, use the logical level of abstraction.
 View level- The highest level of abstraction describes only part of the entire database.
Even though the logical level uses simpler structures, complexity remains because of the variety
of information stored in a large database. Many users of the database system do not need all this
information; instead, they need to access only a part of the database. The view level of

abstraction exists to simplify their interaction with the system. The system may provide many
views for the same database.
Figure 1.1 shows the relationship among the three levels of abstraction.
E.g., an analogy to the concept of data types in programming languages may clarify the
distinction among levels of abstraction. Most high-level programming languages support the
notion of a record type. For example, in a Pascal-like language, we may declare a record as
follows:
type customer = record
customer-id : string;
customer-name : string;
customer-street : string;
customer-city : string;
end;
This code defines a new record type called customer with four fields. Each field has a name and
a type associated with it. A banking enterprise may have several such record types, including
• account, with fields account-number and balance
• employee, with fields employee-name and salary

*At the physical level, a customer, account, or employee record can be described as a block of
consecutive storage locations (for example, words or bytes). The language compiler hides this
level of detail from programmers.
*At the logical level, each such record is described by a type definition, as in the previous code
segment, and the interrelationship of these record types is defined as well. Programmers using a
programming language work at this level of abstraction. Similarly, database administrators
usually work at this level of abstraction.
*Finally, at the view level, computer users see a set of application programs that hide details of
the data types. Similarly, at the view level, several views of the database are defined, and
database users see these views. In addition to hiding details of the logical level of the database,
the views also provide a security mechanism to prevent users from accessing certain parts of the
database. For example, tellers in a bank see only that part of the database that has information on
customer accounts; they cannot access information about salaries of employees.

DATA MODELS:
Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints.
A data model provides a way to describe the design of database at physical, logical and view
level.
1) Object based logical model
2) Record based logical model
3) Physical data model

Object based logical model:

Object based logical models are used in describing data at the conceptual and view levels. Several
different models are used as object based logical models:
a) The entity relationship model
b) The object oriented model

c) Semantic data model
d) Functional data model.
a) The entity relationship model:
The entity-relationship (E-R) data model is based on a perception of a real world that consists of a
collection of basic objects, called entities, and of relationships among these objects. A relationship is
an association among several entities. For example, a depositor relationship associates a customer
with each account that she has. The set of all entities of the same type and the set of all relationships
of the same type are termed an entity set and a relationship set, respectively. The overall logical
structure (schema) of a database can be expressed graphically by an E-R diagram, which is built up
from the following components:
• Rectangles - which represent entity sets
• Ellipses - which represent attributes
• Diamonds - which represent relationships among entity sets
• Lines - which link attributes to entity sets and entity sets to relationships.

b) Object oriented model: This model is based on a collection of objects. An object contains
values stored in instance variables within the object, together with bodies of code (methods or
functions) that operate on those values.
2) Record based logical model:
Record based logical models are used at the conceptual and view levels. They are classified into
three groups:
i) Relational model
ii) Network model
iii) Hierarchical model
i) Relational model: The relational model represents data and the relationships among data by a
collection of tables. Each table contains a number of rows and columns, and together they are
logically grouped into a single unit (a relation).
E.g.:
Customer Table Account Table

Name SSN Street Account No. Account No. Balance


Raj 123 Pimpri A101 A101 1000
Ravi 456 Pune A102 A102 2000
Smith 789 Bombay A103 A103 3000
Raj 123 North A104 A104 4000

ii) Network model: Data in the network model are represented by a collection of records, and
relationships among data are represented by links (parent-child relationships). The records are
organized as an arbitrary graph, so each record can be logically connected to any other record.
(Figure: the customer records Raj, Ravi and Smith linked to their account records A101-A104
with balances 1000-4000, as in the tables above.)
iii) Hierarchical model: Data in the hierarchical model are represented as a tree, and all data are
managed through parent-child relationships. All data are internally connected to each other, but the
relationships form a tree structure.
(Refer fig. from lab manual)
3) Physical data model
Physical data models are used to describe data at the lowest level. This data model is concerned
with the physical data stored on disk and records the storage structure with its column names and
data types. The physical data models consist of the following models:
1) unifying model
2) frame memory model

Some Important Definitions:


Database Schema: The description of a database is called the database schema, which is
specified during database design and is not expected to change frequently.

Schemas Diagrams : Most data models have certain conventions for displaying schemas as
diagrams. A displayed schema is called a schema diagram.

Instances / Database State :The data in the database at a particular moment in time is called a
database state or snapshot. It is also called the current set of occurrences or instances in the
database.

The Three-Schema Architecture:

FIGURE 2.2 The three-schema architecture

The goal of the three-schema architecture, illustrated in Figure 2.2, is to separate the user
applications and the physical database. In this architecture, schemas can be defined at the
following three levels:
1. The internal level has an internal schema, which describes the physical storage structure of the
database. The internal schema uses a physical data model and describes the complete details of
data storage and access paths for the database.
2. The conceptual level has a conceptual schema, which describes the structure of the whole
database for a community of users. The conceptual schema hides the details of physical storage
structures and concentrates on describing entities, data types, relationships, user operations, and
constraints. Usually, a representational data model is used to describe the conceptual schema
when a database system is implemented.
3. The external or view level includes a number of external schemas or user views. Each external
schema describes the part of the database that a particular user group is interested in and hides
the rest of the database from that user group.

*MAPPING : The processes of transforming requests and results between levels are called
mappings.

Data Independence:
The three-schema architecture can be used to further explain the concept of data independence.
Data Independence:Is the capacity to change the schema at one level of a database system
without having to change the schema at the next higher level. We can define two types of data
independence
1. Logical data independence is the capacity to change the conceptual schema without having
to change external schemas or application programs. We may change the conceptual schema to
expand the database (by adding a record type or data item), to change constraints, or to reduce
the database (by removing a record type or data item).
2. Physical data independence is the capacity to change the internal schema without having to
change the conceptual schema. Hence, the external schemas need not be changed as well.
Changes to the internal schema may be needed because some physical files had to be
reorganized (for example, by creating additional access structures) to improve the performance of
retrieval or update. If the same data as before remains in the database, we should not have to
change the conceptual schema.

Database Languages:
A database system provides a data definition language to specify the database schema and a data
manipulation language to express database queries and updates.
Data-Definition Language
-We specify a database schema by a set of definitions expressed by a special language called a
data-definition language (DDL).
For instance, the following statement in the SQL language defines the account table:
create table account
(account-number char(10),
balance integer)
Execution of the above DDL statement creates the account table. In addition, it updates a special
set of tables called the data dictionary or data directory.
Data-Manipulation Language
Data manipulation is
• The retrieval of information stored in the database
• The insertion of new information into the database
• The deletion of information from the database
• The modification of information stored in the database
-A data-manipulation language (DML) is a language that enables users to access or manipulate
data as organized by the appropriate data model.
There are basically two types:
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data
are needed without specifying how to get those data.

*A query is a statement requesting the retrieval of information. The portion of a DML that involves information retrieval is called a query language.
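For illustration, here is a minimal sketch of the four kinds of manipulation expressed in SQL (a declarative DML) against the account table defined above; the column name is written with an underscore because SQL identifiers cannot contain hyphens:

select account_number, balance
from account
where balance > 500;                                   -- retrieval
insert into account values ('A-101', 500);             -- insertion of new information
delete from account where account_number = 'A-101';    -- deletion
update account
set balance = balance + 100
where account_number = 'A-215';                        -- modification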

Database Users:
There are four different types of database-system users, differentiated by the way they expect to
interact with the system.
1) Naive users: are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously. For example, a bank teller who needs to
transfer $50 from account A to account B invokes a program called transfer. This program asks
the teller for the amount of money to be transferred, the account from which the money is to be
transferred, and the account to which the money is to be transferred.
2) Application programmers :are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid
application development (RAD) tools are tools that enable an application programmer to
construct forms and reports without writing a program.
3) Sophisticated users: interact with the system without writing programs. Instead, they form
their requests in a database query language. They submit each such query to a query processor,
whose function is to break down DML statements into instructions that the storage manager
understands. Analysts who submit queries to explore data in the database fall in this category.
4) Specialized users: are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework. Among these applications are computer-
aided design systems, knowledge base and expert systems, systems that store data with complex
data types (for example, graphics data and audio data), and environment-modeling systems.

Database Administrator & Responsibilities of DBA:


One of the main reasons for using DBMSs is to have central control of both the data and
the programs that access those data. A person who has such central control over the system is
called a database administrator (DBA). The functions of a DBA include:
* Schema definition- The DBA creates the original database schema by executing a set of data
definition statements in the DDL.
* Storage structure and access-method definition.
* Schema and physical-organization modification. The DBA carries out changes to the schema
and physical organization to reflect the changing needs of the organization, or to alter the
physical organization to improve performance.
* Granting of authorization for data access. By granting different types of authorization, the
database administrator can regulate which parts of the database various users can access. The
authorization information is kept in a special system structure that the database system consults
whenever someone attempts to access the data in the system.
* Routine maintenance. Examples of the database administrator’s routine maintenance activities
are:
- Periodically backing up the database, either onto tapes or onto remote servers, to prevent loss of
data in case of disasters such as flooding.
-Ensuring that enough free disk space is available for normal operations, and upgrading disk
space as required.
- Monitoring jobs running on the database and ensuring that performance is not degraded by very
expensive tasks submitted by some users.

Overall architecture of DBMS:

Fig: System Structure


A database system is partitioned into modules that deal with each of the responsibilities
of the overall system. The functional components of a database system can be broadly divided
into :
i)storage manager
ii)query processor components.

i) Storage Manager :
A storage manager is a program module that provides the interface between the low level
data stored in the database and the application programs and queries submitted to the system.
The storage manager is responsible for the interaction with the file manager. The raw data are
stored on the disk using the file system, which is usually provided by a conventional operating
system. The storage manager translates the various DML statements into low-level file-system
commands. Thus, the storage manager is responsible for storing, retrieving, and updating data in
the database.
The storage manager components include:
• Authorization and integrity manager- which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
• Transaction manager- which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without conflicting.
• File manager- which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
• Buffer manager- which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database

to handle data sizes that are much larger than the size of main memory.
The storage manager implements several data structures as part of the physical
system implementation:
• Data files- which store the database itself.
• Data dictionary- which stores metadata about the structure of the database, in
Particular the schema of the database.
• Indices- which provide fast access to data items that hold particular values.

ii)The Query Processor :


The query processor components include,
• DDL interpreter - which interprets DDL statements and records the definitions
in the data dictionary.
• DML compiler - which translates DML statements in a query language into an
evaluation plan consisting of low-level instructions that the query evaluation
engine understands.
A query can usually be translated into any of a number of alternative evaluation
plans that all give the same result. The DML compiler also performs
query optimization, that is, it picks the lowest cost evaluation plan from among
the alternatives.
• Query evaluation engine - which executes low-level instructions generated by
the DML compiler.

LECTURE-3: 3 level Architecture of DBMS
Database Basics:

Data Item:
The data item is also called a field in data processing and is the smallest unit of data that has
meaning to its users.
Eg: “e101”, ”sumit”

Entities and attributes:


An entity is a thing or object in the real world that is distinguishable from all other objects
Eg: Bank, employee, student

Attributes are properties of an entity.


Eg: Empcode, ename, rolno, name

Logical data and physical data :


Logical data are the data for the table created by user in primary memory.
Physical data refers to the data stored in the secondary memory.

Schema and sub-schema :


A schema is a logical data base description and is drawn as a chart of the types of data that are used.
It gives the names of the entities and attributes and specifies the relationships between them.

A database schema includes such information as :

 Characteristics of data items such as entities and attributes .


 Logical structures and relationships among these data items .
 Format for storage representation.
 Integrity parameters such as physical authorization and back up policies.

A subschema is a schema derived from an existing schema as per the user requirements. There
may be more than one subschema created for a single conceptual schema.

Three Level Architecture of DBMS:

(Figure: the three-level architecture. At the external level, each user (user 1, user 2, ..., user n) works
with a view; mappings supplied by the DBMS connect these views to the single conceptual view at the
conceptual level, and a further mapping supplied by the DBMS/OS connects the conceptual view to the
internal level.)

A database management system that provides these three levels of data is said to follow a three-level
architecture:
 External level
 Conceptual level
 Internal level

External Level:
The external level is the highest level of database abstraction. At this level, there will be many
views defined for different users' requirements. A view describes only a subset of the database. Any
number of user views may exist for a given global (conceptual) schema.

For example, each student has a different view of the time table: the view of a student of BTech
(CSE) is different from the view of a student of BTech (ECE). Thus this level of abstraction is
concerned with different categories of users.
Each external view is described by means of a schema called a subschema.

Conceptual Level:
At this level of database abstraction, all the database entities and the relationships among them are
included. One conceptual view represents the entire database. This conceptual view is defined by
the conceptual schema.

The conceptual schema hides the details of physical storage structures and concentrates on
describing entities, data types, relationships, user operations and constraints.

It describes all the records and relationships included in the conceptual view. There is only one
conceptual schema per database. It includes features that specify checks for data consistency and
integrity.

Internal level :
It is the lowest level of abstraction closest to the physical storage method used. It indicates how the
data will be stored and describes the data structures and access methods to be used by the database.
The internal view is expressed by internal schema.

The following aspects are considered at this level:


1. Storage allocation, e.g. B-tree, hashing.
2. Access paths, e.g. specification of primary and secondary keys, indexes, etc.
3. Miscellaneous, e.g. data compression and encryption techniques, optimization of the internal
structures.

Database Users :

Naive Users :
Users who need not be aware of the presence of the database system or any other system supporting
their usage are considered naïve users. A user of an automatic teller machine falls into this category.
Online Users :
These are users who may communicate with the database directly via an online terminal or
indirectly via a user interface and application program. These users are aware of the database
system and also know the data manipulation language system.

Application Programmers :
Professional programmers who are responsible for developing application programs or user
interfaces utilized by the naïve and online users fall into this category.

Database Administrator:
A person who has central control over the system is called the database administrator (DBA).
The functions of the DBA are:
1. Creation and modification of conceptual Schema definition
2. Implementation of storage structure and access method.
3. Schema and physical organization modifications .
4. Granting of authorization for data access.
5. Integrity constraints specification.
6. Execute immediate recovery procedure in case of failures
7. Ensure physical security to database

Database language :

1) Data definition language (DDL) :


DDL is used to define database objects. The conceptual schema is specified by a set of
definitions expressed by this language. It also gives some details about how to implement
this schema in the physical devices used to store the data. This definition includes all the
entity sets and their associated attributes and their relationships. The result of DDL
statements will be a set of tables that are stored in special file called data dictionary.

2) Data Manipulation Language (DML) :


A DML is a language that enables users to access or manipulate data stored in the database.
Data manipulation involves retrieval of data from the database, insertion of new data into the
database and deletion of data or modification of existing data.

There are basically two types of DML:


 Procedural: Which requires a user to specify what data is needed and how to get it.
 Non-Procedural: which requires a user to specify what data is needed with out
specifying how to get it.

3) Data Control Language (DCL):


This language enables users to grant authorization on database objects and to cancel (revoke)
such authorization.
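For example, a brief sketch in standard SQL (the user name clerk1 and the table emp are assumed for the example):

grant select, insert on emp to clerk1;   -- clerk1 may now read and insert rows of emp
revoke insert on emp from clerk1;        -- the insert privilege is cancelled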
LECTURE-5: ER-MODEL

Data Model:
The data model describes the structure of a database. It is a collection of conceptual tools for
describing data, data relationships and consistency constraints and various types of data models
such as
1. Object based logical model
2. Record based logical model
3. Physical model

Types of data model:


1. Object based logical model
a. ER-model
b. Functional model
c. Object oriented model
d. Semantic model
2. Record based logical model
a. Hierarchical database model
b. Network model
c. Relational model
3. Physical model

Entity Relationship Model (ER Model)


The entity-relationship data model perceives the real world as consisting of basic objects, called
entities and relationships among these objects. It was developed to facilitate database design by
allowing specification of an enterprise schema which represents the overall logical structure of a
data base.

Main Features of ER-MODEL:


 Entity relationship model is a high level conceptual model
 It allows us to describe the data involved in a real world enterprise in terms of objects and
their relationships.
 It is widely used to develop an initial design of a database
 It provides a set of useful concepts that make it convenient for a developer to move from a
basic set of information to a detailed and precise description of information that can be easily
implemented in a database system
 It describes data as a collection of entities, relationships and attributes.

Basic Concepts:
The E-R data model employs three basic notions : entity sets, relationship sets and attributes.

Entity Sets:
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects. For
example, each person in an enterprise is an entity. An entity has a set of properties, and the values of
some set of properties may uniquely identify an entity. BOOK is an entity and its properties (called
attributes) are bookcode, booktitle, price, etc.

An entity set is a set of entities of the same type that share the same properties, or attributes. For
example, the set of all persons who are customers at a given bank is an entity set.

Attributes:
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by
each member of an entity set.

Customer is an entity and its attributes are customerid, customername, custaddress, etc.

An attribute as used in the E-R model, can be characterized by the following attribute types.

a) Simple and Composite Attribute:


Simple attributes are the attributes which can’t be divided into sub parts, e.g. customerid, empno
Composite attributes are the attributes which can be divided into subparts, e.g. name consisting of
first name, middle name, last name and address consisting of city, pincode, state.

b) Single-Valued and Multi-Valued Attribute:


An attribute that has a single value for a particular entity is a single-valued attribute, e.g. empno, customerid, regdno.
An attribute that may have more than one value for a particular entity is a multi-valued attribute, e.g. phone-no,
dependent name, vehicle.

c) Derived Attribute:
The values for this type of attribute can be derived from the values of existing attributes, e.g. age
which can be derived from currentdate – birthdate and experience_in_year can be calculated as
currentdate-joindate.

d) NULL Valued Attribute:


An attribute whose value is not known, or does not exist for a particular entity, is given a NULL value.

Relationship Sets:
A relationship is an association among several entities. A relationship set is a set of relationships of
the same type. Formally, it is a mathematical relation on n >= 2 entity sets. If E1, E2, ..., En are entity
sets, then a relationship set R is a subset of
{(e1, e2, ..., en) | e1 ∈ E1, e2 ∈ E2, ..., en ∈ En}
where (e1, e2, ..., en) is a relationship.

customer ----- borrow ----- loan

Consider the two entity sets customer and loan. We define the relationship set borrow to denote the
association between customers and the bank loans that the customers have.

Mapping Cardinalities:
Mapping cardinalities or cardinality ratios, express the number of entities to which another entity
can be associated via a relationship set. Mapping cardinalities are most useful in describing binary
relationship sets, although they can contribute to the description of relationship sets that involve
more than two entity sets. For a binary relationship set R between entity sets A and B, the mapping
cardinalities must be one of the following:

1. One to One:
An entity in A is associated with at most one entity in B, and an entity in B is associated with at
most one entity in A.
Eg: relationship between college and principal
College (1) ----- has ----- (1) Principal

2. One to Many:
An entity in A is associated with any number of entities in B. An entity in B is associated with at the
most one entity in A.
Eg: Relationship between department and faculty
Department (1) ----- works in ----- (M) Faculty

3. Many to One:
An entity in A is associated with at most one entity in B. An entity in B is associated with any
number in A.

Emp (M) ----- works ----- (1) Department

4. Many to Many:
Entities in A and B are associated with any number of entities from each other.

Customer (M) ----- deposits ----- (N) Account

More about Entities and Relationship:

Recursive Relationships:
When the same entity type participates more than once in a relationship type in different roles, the
relationship types are called recursive relationships.

Participation Constraints:
The participation constraints specify whether the existence of any entity depends on its being
related to another entity via the relationship. There are two types of participation constraints
a) Total: When all the entities from an entity set participate in a relationship type, it is called total
participation. For example, the participation of the entity set student in the relationship set
'opts' is said to be total because every student enrolled must opt for a course.

b) Partial: When it is not necessary for all the entities from an entity set to participate in a
relationship type, it is called partial participation. For example, the participation of the entity set
student in 'represents' is partial, since not every student in a class is a class representative.

Weak Entity:
Entity types that do not contain any key attribute, and hence cannot be identified independently, are
called weak entity types. A weak entity can be identified uniquely only by considering some of
its attributes in conjunction with the primary key attribute of another entity, which is called the
identifying owner entity.

Generally a partial key is attached to a weak entity type that is used for unique identification of
weak entities related to a particular owner type. The following restrictions must hold:
 The owner entity set and the weak entity set must participate in a one to many relationship set.
This relationship set is called the identifying relationship set of the weak entity set.
 The weak entity set must have total participation in the identifying relationship.

Example:
Consider the entity type Dependent related to the Employee entity, which is used to keep track of the
dependents of each employee. The attributes of Dependent are: name, birthdate, sex and
relationship. Each employee entity is said to own the dependent entities that are related to it.
However, note that the Dependent entity does not exist on its own; it is dependent on the Employee
entity.

Keys:

Super Key:
A super key is a set of one or more attributes that taken collectively, allow us to identify uniquely an
entity in the entity set. For example , customer-id, (cname, customer-id), (cname, telno)

Candidate Key:
In a relation R, a candidate key for R is a subset of the set of attributes of R, which have the
following properties:
1. Uniqueness: No two distinct tuples in R have the same values for the candidate key.
2. Irreducibility: No proper subset of the candidate key has the uniqueness property.
Eg: (cname, telno)

Primary Key:
The primary key is the candidate key that is chosen by the database designer as the principal means
of identifying entities within an entity set. The remaining candidate keys if any, are called Alternate
Key.
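As a sketch in SQL (table and column names assumed), the chosen primary key and the remaining (alternate) candidate key of the customer example might be declared as:

create table customer
( customer_id char(10),
  cname       varchar(30) not null,
  telno       char(10)    not null,
  primary key (customer_id),   -- the candidate key chosen as the primary key
  unique (cname, telno)        -- the remaining candidate key becomes an alternate key
);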
LECTURE-6: ER-DIAGRAM:

The overall logical structure of a database can be expressed graphically using the ER model with the
help of an ER diagram.

Symbols used in an ER diagram (see figure):
• Rectangle: entity; double rectangle: weak entity
• Ellipse: attribute; ellipse linked to sub-ellipses: composite attribute; double ellipse: multi-valued
attribute; dashed ellipse: derived attribute; underlined attribute name: key attribute
• Diamond: relationship; double diamond: identifying relationship
• Lines labelled 1, m, n: cardinality of the relationship (one-to-one, one-to-many, many-to-one,
many-to-many)
• Double line: total participation; single line: partial participation


A university registrar's office maintains data about the following entities:
(a) Course, including number, title, credits, syllabus and prerequisites
(b) Course offering, including course number, year, semester, section number, instructor, timings,
and class room
(c) Students, including student-id, name and program
(d) Instructors, including identification number, name, department and title
Further, the enrollment of students in courses and the grades awarded to students in each course they are
enrolled for must be appropriately modeled.
Construct an E-R diagram for the registrar's office. Document all assumptions that you make
about the mapping constraints.
Consider a university database for the scheduling of class rooms for final exams. This database could
be modeled as the single entity set exam, with attributes course-name, section-number, room-number
and time. Alternatively, one or more additional entity sets could be defined, along with relationship
sets to replace some of the attributes of the exam entity set, as:
• course with attributes name, department and c-number
• section with attributes s-number and enrollment, and dependent as a weak entity set on
course
• room with attributes r-number, capacity and building
LECTURE-7: Advanced ER-Diagram:

Abstraction is the simplification mechanism used to hide superfluous details of a set of objects. It
allows one to concentrate on the properties that are of interest to the application. There are two main
abstraction mechanism used to model information:

Generalization and specialization:


Generalization is the abstracting process of viewing set of objects as a single general class by
concentrating on the general characteristics of the constituent sets while suppressing or ignoring
their differences. It is the union of a number of lower-level entity types for the purpose of
producing a higher-level entity type. For instance, student is a generalization of graduate or
undergraduate, full-time or part-time students. Similarly, employee is generalization of the classes
of objects cook, waiter, and cashier. Generalization is an IS_A relationship; therefore, manager
IS_AN employee, cook IS_AN employee, waiter IS_AN employee, and so forth.

Specialization is the abstracting process of introducing new characteristics to an existing class of
objects to create one or more new classes of objects. This involves taking a higher-level entity and,
using additional characteristics, generating lower-level entities. The lower-level entities also inherit the
characteristics of the higher-level entity. In applying the characteristic size to car we can create a
full-size, mid-size, compact or subcompact car. Specialization may be seen as the reverse process of
generalization: additional specific properties are introduced at a lower level in a hierarchy of objects.

(Figure: generalization/specialization hierarchy. The entity employee (empno, name, dob) is
specialized, through is-a relationships, into full-time employee and part-time employee; full-time
employees are further specialized into faculty and staff, and part-time employees into teaching and
casual employees.)

The corresponding relations are:
EMPLOYEE (empno, name, dob)
FULL_TIME_EMPLOYEE (empno, salary)
PART_TIME_EMPLOYEE (empno, type)
FACULTY (empno, degree, interest)
STAFF (empno, hour-rate)
TEACHING (empno, stipend)
Aggregation:
Aggregation is the process of compiling information on an object, thereby abstracting a higher-level
object. The entity person is derived by aggregating the characteristics name, address and ssn.
Another form of aggregation is abstracting a relationship between objects and viewing the relationship
as an object.

(Figure: aggregation example. The works-on relationship among Employee, Branch and Job is treated
as a higher-level object, which in turn participates in a manages relationship with Manager.)
ER- Diagram For College Database
LECTURE-8: Conversion of ER-Diagram to Relational Database

Conversion of Entity Sets:


1. For each strong entity type E in the ER diagram, we create a relation R containing all the
simple attributes of E. The primary key of the relation R will be one of the key attributes of E.

STUDENT(rollno (primary key),name, address)


FACULTY(id(primary key),name ,address, salary)
COURSE(course-id,(primary key),course_name,duration)
DEPARTMENT(dno(primary key),dname)

2. For each weak entity type W in the ER diagram, we create another relation R that contains
all simple attributes of W. If E is the owner entity of W, then the key attribute of E is also included
in R; this key attribute is set as a foreign key attribute of R. The combination of the
primary key attribute of the owner entity type and the partial key of the weak entity type forms
the key of the weak entity type.

GUARDIAN((rollno,name) (primary key),address,relationship)
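A possible SQL rendering of rules 1 and 2 for the STUDENT/GUARDIAN example (a sketch; data types are assumed):

create table student
( rollno  varchar(10),
  name    varchar(30),
  address varchar(50),
  primary key (rollno)
);

create table guardian
( rollno       varchar(10),   -- key of the owner entity STUDENT, also a foreign key
  name         varchar(30),   -- partial key of the weak entity
  address      varchar(50),
  relationship varchar(20),
  primary key (rollno, name),
  foreign key (rollno) references student (rollno)
);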

Conversion of Relationship Sets:


Binary Relationships:
 One-to-One Relationship:
For each 1:1 relationship type R in the ER diagram involving two entities E1 and E2, we
choose one of the entities (say E1), preferably the one with total participation, and add the primary key
attribute of the other entity (E2) as a foreign key attribute in the table of E1. We also include
all the simple attributes of relationship type R in E1, if any. For example, the DEPARTMENT
relation has been extended to include head-id and an attribute of the relationship:
DEPARTMENT(D_NO,D_NAME,HEAD_ID,DATE_FROM)

 One-to-Many Relationship:
For each 1:N relationship type R involving two entities E1 and E2, we identify the entity type
(say E1) at the N-side of the relationship type R and include the primary key of the entity on the
other side of the relationship (say E2) as a foreign key attribute in the table of E1. We include all
simple attributes (or simple components of a composite attribute) of R, if any, in the table of E1.

For example:
Consider the works-in relationship between DEPARTMENT and FACULTY. For this relationship we
choose the entity at the N side, i.e. FACULTY, and add the primary key attribute of the other entity
DEPARTMENT, i.e. DNO, as a foreign key attribute in FACULTY.

FACULTY (contains the works_in relationship):
(ID, NAME, ADDRESS, BASIC_SAL, DNO)

 Many-to-Many Relationship:
For each M:N relationship type R, we create a new table (say S) to represent R, we also
include the primary key attributes of both the participating entity types as a foreign key
attribute in S. Any simple attributes of the M:N relationship type (or simple components as a
composite attribute) is also included as attributes of S.
For example:
The M:N relationship taught-by between entities COURSE and FACULTY should be
represented as a new table. The structure of the table will include primary key of COURSE
and primary key of FACULTY entities.

TAUGHT-BY (ID (primary key of FACULTY table), course-id (primary key of COURSE table))
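A sketch of the same M:N conversion in SQL (column names use underscores; the FACULTY and COURSE tables are assumed to already exist with these keys):

create table taught_by
( id        varchar(10),   -- primary key of FACULTY
  course_id varchar(10),   -- primary key of COURSE
  primary key (id, course_id),
  foreign key (id)        references faculty (id),
  foreign key (course_id) references course (course_id)
);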

 N-ary Relationship:
For each N-ary relationship type R where n>2, we create a new table S to represent R, We
include as foreign key attributes in S the primary keys of the relations that represent the
participating entity types. We also include any simple attributes of the N-ary relationship type
(or simple components of complete attribute) as attributes of S. The primary key of S is
usually a combination of all the foreign keys that reference the relations representing the
participating entity types.

(Figure: a ternary relationship loan-sanction among the entity sets Customer, Employee and Loan.)

LOAN-SANCTION (customer-id, loanno, empno, sancdate, loan_amount)

 Multi-Valued Attributes:
For each multivalued attribute A, we create a new relation R that includes an attribute
corresponding to A, plus the primary key attribute K of the relation that represents the entity
type or relationship type that has A as an attribute. The primary key of R is then the combination of A
and K.
For example, if a STUDENT entity has rollno, name and phone number where phone number
is a multivalued attribute then we will create table PHONE (rollno, phoneno) where primary
key is the combination of the two. In the STUDENT table we need not keep the phone number;
instead, it can be simply (rollno, name) only.
PHONE(rollno, phoneno)
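In SQL, the PHONE table for the multivalued attribute could be sketched as (data types assumed):

create table phone
( rollno  varchar(10),
  phoneno varchar(12),
  primary key (rollno, phoneno),   -- combination of the owner key and the multivalued attribute
  foreign key (rollno) references student (rollno)
);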
(Figure: generalization/specialization of the entity Account, with attributes account_no, name and
branch, into Saving account (with attribute interest) and Current account (with attribute charges)
through an is-a relationship.)

 Converting a Generalisation/Specialisation Hierarchy to Tables:

A simple rule for conversion may be to decompose all the specialized entities into tables in
case they are disjoint. For example, for the figure above we can create the three tables as:
Account (account_no, name, branch, balance)
Saving_Account (account-no, interest)
Current_Account (account-no, charges)
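A sketch of these three tables in SQL (data types assumed; account_no in each subclass table refers back to the superclass table):

create table account
( account_no varchar(10) primary key,
  name       varchar(30),
  branch     varchar(30),
  balance    numeric(10,2)
);

create table saving_account
( account_no varchar(10) primary key,
  interest   numeric(5,2),
  foreign key (account_no) references account (account_no)
);

create table current_account
( account_no varchar(10) primary key,
  charges    numeric(8,2),
  foreign key (account_no) references account (account_no)
);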
UNIT II
LECTURE-10: RELATIONAL MODEL

RELATIONAL MODEL
Relational model is simple model in which database is represented as a collection of “relations”
where each relation is represented by two-dimensional table.

The relational model was proposed by E. F. Codd of IBM in 1970. The basic concept in the
relational model is that of a relation.

Properties:
o It is column homogeneous. In other words, in any given column of a table, all items are of
the same kind.
o Each item is a simple number or a character string. That is, a table must be in first normal
form.
o All rows of a table are distinct.
o The ordering of rows with in a table is immaterial.
o The columns of a table are assigned distinct names and the ordering of these columns is
immaterial.

Domain, attributes, tuples and relations:

Tuple:
Each row in a table represents a record and is called a tuple. A table containing 'n' attributes in a
record is called an n-tuple.

Attributes:
The name of each column in a table is used to interpret its meaning and is called an attribute. Each
table is called a relation. In the above table, account_number, branch name and balance are the
attributes.

Domain:
A domain is a set of values that can be given to an attribute. So every attribute in a table has a
specific domain. Values of these attributes cannot be assigned outside their domains.

Relation:
A relation consist of
o Relational schema
o Relation instance

Relational Schema:
A relational schema specifies the relation’s name, its attributes and the domain of each attribute. If
R is the name of a relation and A1, A2,…An is a list of attributes representing R then
R(A1,A2,…,An) is called a Relational Schema. Each attribute in this relational schema takes a
value from some specific domain called domain(Ai).
Example:
PERSON (PERSON_ID:INTEGER, NAME:STRING, AGE:INTEGER, ADDRESS:STRING)

The total number of attributes in a relation denotes the degree of the relation. Since the PERSON relation
schema contains four attributes, this relation is of degree 4.

Relation Instance:
A relation instance, denoted as r, is a collection of tuples for a given relational schema at a specific
point of time.
A relation state r of the relation schema R(A1, A2, ..., An), also denoted by r(R), is a set of n-tuples
r = {t1, t2, ..., tm}
where each n-tuple t is an ordered list of n values
t = <v1, v2, ..., vn>
and each value vi belongs to domain(Ai) or is a null value.
The relation schema is also called the 'intension' and the relation state is also called the 'extension'.
Eg: Relation schema for Student
STUDENT(rollno:string, name:string, city:string, age:integer)

Relation instance:
Student:
Rollno Name City Age
101 Sujit Bam 23
102 kunal bbsr 22
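The same STUDENT example can be sketched in SQL (data types assumed) to show the distinction: the CREATE TABLE statement gives the intension (schema), while the rows inserted at any moment form the extension (state):

create table student
( rollno varchar(10) primary key,
  name   varchar(30),
  city   varchar(30),
  age    integer
);

insert into student values ('101', 'Sujit', 'Bam', 23);   -- each insert adds one tuple to the state
insert into student values ('102', 'Kunal', 'bbsr', 22);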

Keys:

Super key:
A super key is an attribute or a set of attributes used to identify the records uniquely in a relation.
For example, customer-id, (cname, customer-id), (cname,telno)

Candidate key:
Super keys of a relation can contain extra attributes. Candidate keys are minimal super keys, i.e.
such a key contains no extraneous attribute. An attribute is called extraneous if, even after removing
it from the key, the remaining attributes still have the properties of a key (they still identify every
tuple of the table uniquely).

In a relation R, a candidate key for R is a subset of the set of attributes of R, which have the
following properties:
 Uniqueness: no two distinct tuples in R have the same values for
the candidate key
 Irreducible: No proper subset of the candidate key has the
uniqueness property.
 A candidate key’s values must exist. It can’t be null.
 The values of a candidate key must be stable. Its value can not change outside the
control of the system.
Eg: (cname,telno)
Primary key:
The primary key is the candidate key that is chosen by the database designer as the principal
means of identifying entities within an entity set. The remaining candidate keys, if any, are
called alternate keys.
LECTURE-11: CONSTRAINTS

RELATIONAL CONSTRAINTS:
There are three types of constraints on relational database that include
o DOMAIN CONSTRAINTS
o KEY CONSTRAINTS
o INTEGRITY CONSTRAINTS

DOMAIN CONSTRAINTS:
It specifies that each attribute in a relation takes an atomic value from the corresponding domain. The
data types associated with commercial RDBMS domains include:
o Standard numeric data types for integer
o Real numbers
o Characters
o Fixed length strings and variable length strings
Thus, domain constraints specify the conditions that we put on each instance of the relation.
So the values that appear in each column must be drawn from the domain associated with that
column.
Rollno Name City Age
101 Sujit Bam 23
102 kunal bbsr 22

Key Constraints:
This constraint states that the key attribute value in each tuple must be unique, i.e. no two tuples
contain the same value for the key attribute. (Null values can be allowed.)
Emp (empcode, name, address): here empcode must be unique.

Integrity CONSTRAINTS:
There are two types of integrity constraints:
o Entity Integrity Constraints
o Referential Integrity constraints

Entity Integrity Constraints:


It states that no primary key value can be null, and the primary key must be unique. This is because the
primary key is used to identify individual tuples in the relation, so we would not be able to identify
records uniquely if they contained null values for the primary key attributes. This constraint is
specified on one individual relation.

Referential Integrity Constraints:


It states that a tuple in one relation that refers to another relation must refer to an existing tuple in
that relation. This constraint is specified on two relations. If a column is declared as a foreign key,
it must reference the primary key of another table.

Department (deptcode, dname)


Here the deptcode is the primary key.

Emp (empcode, name, city, deptcode).


Here the deptcode is foreign key.
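Both constraints can be sketched in SQL as follows (data types assumed):

create table department
( deptcode varchar(5) primary key,   -- entity integrity: deptcode cannot be null and must be unique
  dname    varchar(30)
);

create table emp
( empcode  varchar(5) primary key,
  name     varchar(30),
  city     varchar(30),
  deptcode varchar(5),
  foreign key (deptcode) references department (deptcode)   -- referential integrity
);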
Module-2:
LECTURE-19

Relational Algebra:
Basic operations:
1. Selection (σ): Selects a subset of rows from a relation.
2. Projection (π): Selects a subset of columns from a relation.
3. Cross-product (×): Allows us to combine two relations.
4. Set-difference (−): Tuples in relation 1, but not in relation 2.
5. Union (∪): Tuples in relation 1 or in relation 2.
6. Rename (ρ): Uses a new name for the tables or fields.
Additional operations:
7. Intersection (∩), Join (⋈), Division (÷): Not essential, but (very!) useful.
Since each operation returns a relation, operations can be composed! (Algebra is
“closed”.)
Projection
 Deletes attributes that are not in projection list.
 Schema of result contains exactly the fields in the projection list, with the same names that
they had in the (only) input relation. ( Unary Operation)
 Projection operator has to eliminate duplicates! (as it returns a relation which is a set)
o Note: real systems typically don’t do duplicate elimination unless the user explicitly
asks for it. (Duplicate values may be representing different real world entity or
relationship).
Example: Consider the BOOK table:
Acc-No Title Author
100 “DBMS” “Silbershatz”
200 “DBMS” “Ramanuj”
300 “COMPILER” “Silbershatz”
400 “COMPILER” “Ullman”
500 “OS” “Sudarshan”
600 “DBMS” “Silbershatz”

πTitle(BOOK) =
Title
“DBMS”
“COMPILER”
“OS”

Selection
 Selects rows that satisfy selection condition.
 No duplicates in result
 Schema of result identical to schema of (only) input relation.
 Result relation can be the input for another relational algebra operation! (Operator
composition.)
Example: For the example given above:
σAcc-no>300(BOOK) =
Acc-No Title Author
400 “COMPILER” “Ullman”
500 “OS” “Sudarshan”
600 “DBMS” “Silbershatz”
σTitle=”DBMS”(BOOK) =
Acc-No Title Author
100 “DBMS” “Silbershatz”
200 “DBMS” “Ramanuj”
600 “DBMS” “Silbershatz”

πAcc-no (σTitle=”DBMS” (BOOK)) =
Acc-No
100
200
600

Union, Intersection, Set-Difference

 All of these operations take two input relations, which must be union-compatible:
o Same number of fields.
o Corresponding’ fields have the same type.
 What is the schema of result?
Consider:
Borrower:
Cust-name Loan-no
Ram L-13
Shyam L-30
Suleman L-42

Depositor:
Cust-name Acc-no
Suleman A-100
Radheshyam A-300
Ram A-401

List of customers who are either borrowers or depositors at the bank = πCust-name (Borrower) U πCust-name (Depositor) =
Cust-name
Ram
Shyam
Suleman
Radheshyam

Customers who are both borrowers and depositors = πCust-name (Borrower) ∩ πCust-name (Depositor) =
Cust-name
Ram
Suleman

Customers who are borrowers but not depositors = πCust-name (Borrower) − πCust-name (Depositor) =
Cust-name
Shyam
Cartesian-Product or Cross-Product (S1 × R1)
 Each row of S1 is paired with each row of R1.
 Result schema has one field per field of S1 and R1, with field names `inherited’ if possible.
 Consider the borrower and loan tables as follows:

Borrower: Loan:
Cust-name Loan-no Loan-no Amount
Ram L-13 L-13 1000
Shyam L-30 L-30 20000
Suleman L-42 L-42 40000

Cross product of Borrower and Loan, Borrower × Loan =

Borrower.Cust-name Borrower.Loan-no Loan.Loan-no Loan.Amount
Ram L-13 L-13 1000
Ram L-13 L-30 20000
Ram L-13 L-42 40000
Shyam L-30 L-13 1000
Shyam L-30 L-30 20000
Shyam L-30 L-42 40000
Suleman L-42 L-13 1000
Suleman L-42 L-30 20000
Suleman L-42 L-42 40000

The rename operation can be used to rename the fields to avoid confusion when two field names are
same in two participating tables:

For example, the statement ρLoan-borrower(Cust-name, Loan-No-1, Loan-No-2, Amount)(Borrower × Loan) results
in a new table named Loan-borrower, which has four fields renamed as Cust-name, Loan-No-1,
Loan-No-2 and Amount, and whose rows contain the same data as the cross product of Borrower
and Loan.

Loan-borrower:
Cust-name Loan-No-1 Loan-No-2 Amount
Ram L-13 L-13 1000
Ram L-13 L-30 20000
Ram L-13 L-42 40000
Shyam L-30 L-13 1000
Shyam L-30 L-30 20000
Shyam L-30 L-42 40000
Suleman L-42 L-13 1000
Suleman L-42 L-30 20000
Suleman L-42 L-42 40000
Rename Operation:
It can be used in two ways:
 ρx(E): return the result of expression E in the table named x.
 ρx(A1, A2,…, An)(E): return the result of expression E in the table named x with the attributes
renamed to A1, A2,…, An.
 It’s benefit can be understood by the solution of the query “ Find the largest account balance
in the bank”
It can be solved by following steps:
 Find out the relation of those balances which are not largest.
 Consider Cartesion product of Account with itself i.e. Account × Account
 Compare the balances of first Account table with balances of second Account table in the
product.
 For that we should rename one of the account table by some other name to avoid the
confusion
It can be done by following operation
ΠAccount.balance (σAccount.balance < d.balance (Account × ρd(Account)))
 So the above relation contains the balances which are not largest.
 Subtract this relation from the relation containing all the balances i.e . Πbalance (Account).
So the final statement for solving above query is
Πbalance (Account) − ΠAccount.balance (σAccount.balance < d.balance (Account × ρd(Account)))
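
For comparison, the same query can be sketched in SQL (covered in the next unit); this is only an illustration, assuming an Account relation with a balance attribute as used in later examples:

select balance
from Account
where balance not in (select A.balance
                      from Account as A, Account as B
                      where A.balance < B.balance)

The subquery plays the role of the renamed copy ρd(Account): it collects the balances that are smaller than some other balance, and the outer query removes them, leaving the largest balance. In practice the aggregate form select max(balance) from Account would normally be used.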
LECTURE-20

Additional Operations
Natural Join (⋈)
 Forms Cartesian product of its two arguments, performs selection forcing equality on
those attributes that appear in both relations
 For example consider Borrower and Loan relations, the natural join between them
will automatically perform the selection on the table returned by
Borrower × Loan which force equality on the attribute that appear in both Borrower
and Loan i.e. Loan-no and also will have only one of the column named Loan-No.
 That means Borrower ⋈ Loan = σBorrower.Loan-no = Loan.Loan-no (Borrower × Loan).
 The table returned from this will be as follows:

Eliminate rows that does not satisfy the selection criteria “σBorrower.Loan-no = Loan.Loan-no” from Borrower
× Loan =
Borrower.Cust- Borrower.Loan- Loan.Loan- Loan.Amount
name no no
Ram L-13 L-13 1000
Ram L-13 L-30 20000
Ram L-13 L-42 40000
Shyam L-30 L-13 1000
Shyam L-30 L-30 20000
Shyam L-30 L-42 40000
Suleman L-42 L-13 1000
Suleman L-42 L-30 20000
Suleman L-42 L-42 40000

And will remove one of the columns named Loan-no.

 i.e. Borrower ⋈ Loan =
Cust-name Loan-no Amount
Ram L-13 1000
Shyam L-30 20000
Suleman L-42 40000
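
In SQL (introduced in the next unit) the same result can be written with a natural join; this is a hedged sketch assuming the Borrower(Cust-name, Loan-no) and Loan(Loan-no, Amount) relations shown above:

select *
from Borrower natural join Loan

The join condition on Loan-no and the removal of the duplicate column are performed automatically.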

Division Operation:
 denoted by ÷ is used for queries that include the phrase “for all”.
 For example “Find customers who have an account in all branches in branch city
Agra”. This query can be solved by the following statement (assuming Depositor links
customers to accounts and Account carries the branch-name):
ΠCustomer-name, branch-name (Depositor ⋈ Account) ÷ Πbranch-name (σBranch-city=”Agra”(Branch))
 The division operations can be specified by using only basic operations as follows:
Let r(R) and s(S) be given relations for schema R and S with
r ÷ s = ΠR-S(r) - ΠR-S ((ΠR-S (r) × s) - ΠR-S,S (r))
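
SQL has no division operator; “for all” queries are usually written with a double negation. A hedged sketch of the query above, assuming the Depositor(customer-name, account-number), Account(account-number, branch-name, balance) and Branch(branch-name, branch-city, assets) relations used elsewhere in these notes:

select distinct D.customer-name
from Depositor as D
where not exists
      (select branch-name
       from Branch
       where branch-city = “Agra”
       and branch-name not in
           (select A.branch-name
            from Depositor as D2, Account as A
            where D2.customer-name = D.customer-name
            and D2.account-number = A.account-number))

The outer not exists asks for customers for whom no Agra branch is missing from the set of branches where they hold an account.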
LECTURE-21

Tuple Relational Calculus

Relational algebra is an example of procedural language while tuple relational calculus is a


nonprocedural query language.
A query is specified as:
{t | P(t)}, i.e it is the set of all tuples t such that predicate P is true for t.

The formula P(t) is formed using atoms, which use relations, tuples of relations and fields of
tuples together with the comparison operators (<, <=, =, <>, >, >=) and the membership symbol ∈.

These atoms can then be combined into formulas using the connectives ∧ (and), ∨ (or) and ¬ (not),
the implication ⇒, and the quantifiers ∃ (there exists) and ∀ (for all).

For example, here are some queries and the way to express them using tuple calculus (illustrative
expressions for the first few are sketched after this list):
o Find the branch-name, loan-number and amount for loans over Rs 1200.

o Find the loan number for each loan of an amount greater that Rs1200.

o Find the names of all the customers who have a loan from the Sadar branch.

o Find all customers who have a loan , an account, or both at the bank

o Find only those customers who have both an account and a loan.

o Find all customers who have an account but do not have loan.

o Find all customers who have an account at all branches located in Agra
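The formal expressions for these queries are not reproduced above. As a hedged illustration, assuming the bank schema used elsewhere in these notes, i.e. Loan(branch-name, loan-number, amount), Borrower(customer-name, loan-number) and Depositor(customer-name, account-number), the first three queries could be written as:

{t | t ∈ Loan ∧ t[amount] > 1200}
{t | ∃ s ∈ Loan (t[loan-number] = s[loan-number] ∧ s[amount] > 1200)}
{t | ∃ s ∈ Borrower (t[customer-name] = s[customer-name] ∧ ∃ u ∈ Loan (u[loan-number] = s[loan-number] ∧ u[branch-name] = “Sadar”))}

The remaining queries follow the same pattern, combining such atoms with ∨, ¬ and ∀.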
Domain Relational Calculus
1. Domain relational calculus is another non procedural language for expressing database
queries.
2. A query is specified as:
{<x1,x2,…,xn> | P(x1,x2,…,xn)} where x1,x2,…,xn represents domain variables. P represent a
predicate formula as in tuple calculus
 Since the domain variables are referred in place of tuples the formula doesn’t refer the fields
of tuples rather they refer the domain variables.
 For example, the queries in domain calculus are expressed as follows (illustrative
expressions for the first two are sketched after this list):
o Find the branch-name, loan-number and amount for loans over Rs 1200.
o Find the loan number for each loan of an amount greater that Rs1200.

o Find the names of all the customers who have a loan from the Sadar branch and find
the loan amount

o Find names of all customers who have a loan , an account, or both at the Sadar
Branch

o Find only those customers who have both an account and a loan.

o Find all customers who have an account but do not have loan.

o Find all customers who have an account at all branches located in Agra
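
Again, the formal expressions are not reproduced above. As a hedged illustration, assuming the attribute order Loan(branch-name, loan-number, amount), the first two queries could be written as:

{<b, l, a> | <b, l, a> ∈ Loan ∧ a > 1200}
{<l> | ∃ b, a (<b, l, a> ∈ Loan ∧ a > 1200)}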

Outer Join.
Outer join operation is an extension of join operation to deal with missing information
 Suppose that we have following relational schemas:
Employee( employee-name, street, city)
Fulltime-works(employee-name, branch-name, salary)
A snapshot of these relations is as follows:
Employee:
employee-name street city
Ram M G Road Agra
Shyam New Mandi Road Mathura
Suleman Bhagat Singh Road Aligarh
Fulltime-works:
employee-name branch-name salary
Ram Sadar 30000
Shyam Sanjay Place 20000
Rehman Dayalbagh 40000
Suppose we want complete information of the full time employees.
 The natural join (Employee ⋈ Fulltime-works) will result in the loss of information for
Suleman and Rehman because they do not have a record in both tables (the left and right
relations). The outer join will solve the problem.
 Three forms of outer join:
o Left outer join (⟕): the tuples which do not match while doing the natural join from the
left relation are also added to the result, putting null values in the missing fields of the
right relation.
o Right outer join (⟖): the tuples which do not match while doing the natural join from the
right relation are also added to the result, putting null values in the missing fields of the
left relation.
o Full outer join (⟗): includes both the left and right outer joins, i.e. adds the tuples which
did not match in either the left or the right relation and puts null in place of the missing
values.
 The result for three forms of outer join are as follows:
Left outer join: Employee ⟕ Fulltime-works =
employee-name street city branch-name salary
Ram M G Road Agra Sadar 30000
Shyam New Mandi Road Mathura Sanjay Place 20000
Suleman Bhagat Singh Road Aligarh null null

Right outer join: Employee ⟖ Fulltime-works =
employee-name street city branch-name salary
Ram M G Road Agra Sadar 30000
Shyam New Mandi Road Mathura Sanjay Place 20000
Rehman null null Dayalbagh 40000

Full outer join: Employee ⟗ Fulltime-works =
employee-name street city branch-name salary
Ram M G Road Agra Sadar 30000
Shyam New Mandi Road Mathura Sanjay Place 20000
Suleman Bhagat Singh Road Aligarh null null
Rehman null null Dayalbagh 40000
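
In SQL these three forms are written with the left outer join, right outer join and full outer join keywords; a hedged sketch for the tables above (schemas as shown) is:

select *
from Employee natural left outer join Fulltime-works

select *
from Employee natural right outer join Fulltime-works

select *
from Employee natural full outer join Fulltime-works

Each statement returns the corresponding table shown above, with null filled in for the missing values.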
UNIT III

LECTURE-22

Structured Query Language (SQL)


Introduction
Commercial database systems use a more user-friendly language to specify queries.
SQL is the most influential commercially marketed query language.
Other commercially used languages are QBE, Quel, and Datalog.
Basic Structure
 The basic structure of an SQL consists of three clauses: select, from and where.
 select: it corresponds to the projection operation of relational algebra. Used to list the
attributes desired in the result.
 from: corresponds to the Cartesian product operation of relational algebra. Used to list the
relations to be scanned in the evaluation of the expression
 where: corresponds to the selection predicate of the relational algebra. It consists of a
predicate involving attributes of the relations that appear in the from clause.
 A typical SQL query has the form:
select A1, A2,…, An
from r1, r2,…, rm
where P
o Ai represents an attribute
o rj represents a relation
o P is a predicate
o It is equivalent to the following relational algebra expression:
ΠA1, A2,…, An (σP (r1 × r2 × … × rm))
[Note: The words marked in dark in this text work as keywords in SQL language. For example
“select”, “from” and “where” in the above paragraph are shown in bold font to indicate that they
are keywords]
Select Clause
Let us see some simple queries and use of select clause to express them in SQL.
 Find the names of all branches in the Loan relation
select branch-name
from Loan
 By default the select clause includes duplicate values. If we want to force the elimination of
duplicates the distinct keyword is used as follows:
select distinct branch-name
from Loan
 The all keyword can be used to specify explicitly that duplicates are not removed. Even if
we do not use all, it means the same, so we are not required to write all in the select clause.
select all branch-name
from Loan
 The asterisk “*” can be used to denote “all attributes”. The following SQL statement will
select all the attributes of Loan.
select *
from Loan
 The arithmetic expressions involving the operators +, -, *, and / are also allowed in the select
clause. The following statement will return the amount multiplied by 10 for each row in the
Loan table.
select branch-name, loan-number, amount * 10 from Loan
Where Clause
 Find all loan numbers for loans made at “Sadar” branch with loan amounts greater than Rs
1200.
select loan-number
from Loan
where branch-name= “Sadar” and amount > 1200
 The where clause uses the logical connectives and, or, and not.
 operands of the logical connectives can be expressions involving the comparison operators
<, <=, >, >=, =, and < >.
 between can be used to simplify the comparisons
select loan-number
from Loan
where amount between 90000 and 100000
From Clause
 The from clause by itself defines a Cartesian product of the relations in the clause.
 When an attribute is present in more than one relation they can be referred as relation-
name.attribute-name to avoid the ambiguity.
 For all customers who have loan from the bank, find their names and loan numbers
select distinct customer-name, Borrower.loan-number
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number
The Rename Operation
 Used for renaming both relations and attributes in SQL.
 Use as clause: old-name as new-name
 Find the names and loan numbers of the customers who have a loan at the “Sadar” branch.
select distinct customer-name, borrower.loan-number as loan-id
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number and
branch-name = “Sadar”
we can now refer the loan-number instead by the name loan-id.
 For all customers who have a loan from the bank, find their names and loan-numbers.
select distinct customer-name, T.loan-number
from Borrower as T, Loan as S
where T.loan-number = S.loan-number
 Find the names of all branches that have assets greater than at least one branch located in
“Mathura”.
select distinct T.branch-name
from branch as T, branch as S
where T.assets > S.assets and S.branch-city = “Mathura”
String Operation
 Two special characters are used for pattern matching in strings:
o Percent ( % ) : The % character matches any substring
o Underscore( _ ): The _ character matches any character
 “%Mandi”: will match with the strings ending with “Mandi” viz. “Raja Ki mandi”, “Peepal
Mandi”
 “_ _ _” matches any string of three characters.
 Find the names of all customers whose street address includes the substring “Main”
select customer-name
from Customer
where customer-street like “%Main%”
Set Operations
 union, intersect and except operations are set operations available in SQL.
 Relations participating in any of the set operations must be union-compatible; i.e. they must
have the same set of attributes.
 Union Operation:
o Find all customers having a loan, an account, or both at the bank
(select customer-name from Depositor )
union
(select customer-name from Borrower )
It will automatically eliminate duplicates.
o If we want to retain duplicates union all can be used
(select customer-name from Depositor )
union all
(select customer-name from Borrower )
 Intersect Operation
o Find all customers who have both an account and a loan at the bank
(select customer-name from Depositor )
intersect
(select customer-name from Borrower )
o If we want to retain all the duplicates
(select customer-name from Depositor )
intersect all
(select customer-name from Borrower )
 Except Operation
o Find all customers who have an account but no loan at the bank
(select customer-name from Depositor )
except
(select customer-name from Borrower )
o If we want to retain the duplicates:
(select customer-name from Depositor )
except all
(select customer-name from Borrower )
Aggregate Functions
 Aggregate functions are those functions which take a collection of values as input and return
a single value.
 SQL offers 5 built in aggregate functions-
o Average: avg
o Minimum:min
o Maximum:max
o Total: sum
o Count:count
 The input to sum and avg must be a collection of numbers but others may have collections
of non-numeric data types as input as well
 Find the average account balance at the Sadar branch
select avg(balance)
from Account
where branch-name= “Sadar”
The result will be a table which contains single cell (one row and one column) having
numerical value corresponding to average balance of all account at sadar branch.
 group by clause is used to form groups, tuples with the same value on all attributes in the
group by clause are placed in one group.
 Find the average account balance at each branch
select branch-name, avg(balance)
from Account
group by branch-name
 By default the aggregate functions include the duplicates.
 distinct keyword is used to eliminate duplicates in an aggregate functions:
 Find the number of depositors for each branch
select branch-name, count(distinct customer-name)
from Depositor, Account
where Depositor.account-number = Account.account-number
group by branch-name
 having clause is used to state condition that applies to groups rather than tuples.
 Find the average account balance at each branch where average account balance is more
than Rs. 1200
select branch-name, avg(balance)
from Account
group by branch-name
having avg(balance) > 1200
 Count the number of tuples in Customer table
select count(*)
from Customer
 SQL doesn’t allow distinct with count(*)
 When where and having are both present in a statement where is applied before having.
LECTURE-23

Nested Sub queries


A subquery is a select-from-where expression that is nested within another query.
Set Membership
The in and not in connectives are used for this type of subquery.
“Find all customers who have both a loan and an account at the bank”, this query can be written
using nested subquery form as follows
select distinct customer-name
from Borrower
where customer-name in(select customer-name
from Depositor )
 Select the names of customers who have a loan at the bank, and whose names are neither
“Smith” nor “Jones”
select distinct customer-name
from Borrower
where customer-name not in(“Smith”, “Jones”)
Set Comparison
Find the names of all branches that have assets greater than those of at least one branch located in
Mathura
select branch-name
from Branch
where assets > some (select assets
from Branch
where branch-city = “Mathura” )
1. Apart from > some, other comparisons could be < some, <= some, >= some, = some, <> some.
2. Find the names of all branches that have assets greater than that of each branch located in
Mathura
select branch-name
from Branch
where assets > all (select assets
from Branch
where branch-city = “Mathura” )
 Apart from > all, other comparisons could be < all, <= all, >= all, = all, <> all.

Views
In SQL create view command is used to define a view as follows:
create view v as <query expression>
where <query expression> is any legal query expression and v is the view name.
 The view consisting of branch names and the names of customers who have either an
account or a loan at the branch. This can be defined as follows:

create view All-customer as


(select branch-name, customer-name
from Depositor, Account
where Depositor.account-number=account.account-number)
union
(select branch-name, customer-name
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number)

 The attribute names may be specified explicitly within round brackets after the name of the view.
 A view name may be used as a relation in subsequent queries. Using the view All-customer:
Find all customers of Sadar branch
select customer-name
from All-customer
where branch-name= “Sadar”
 A create-view clause creates a view definition in the database which stays until a command
- drop view view-name - is executed.
Modification of Database
Deletion
 In SQL we can delete only whole tuple and not the values on any particular
attributes. The command is as follows:

delete from r where P.


where P is a predicate and r is a relation.
 delete command operates on only one relation at a time. Examples are as follows:
 Delete all tuples from the Loan relation
delete from Loan
o Delete all of the Smith’s account record
delete from Depositor
where customer-name = “Smith”
o Delete all loans with loan amounts between Rs 1300 and Rs 1500.
delete from Loan
where amount between 1300 and 1500
o Delete the records of all accounts with balances below the average at the bank
delete from Account
where balance < ( select avg(balance)
from Account)

Insertion
In SQL we either specify a tuple to be inserted or write a query whose result is a
set of tuples to be inserted. Examples are as follows:
Insert an account of account number A-9732 at the Sadar branch having balance
of Rs 1200
insert into Account
values(“Sadar”, “A-9732”, 1200)
the values are specified in the order in which the corresponding attributes are
listed in the relation schema.
SQL allows the attributes to be specified as part of the insert statement
insert into Account(account-number, branch-name, balance)
values(“A-9732”, “Sadar”, 1200)
insert into Account(branch-name, account-number, balance)
values(“Sadar”, “A-9732”, 1200)

Provide for all loan customers of the Sadar branch a new Rs 200 saving account
for each loan account they have. Where loan-number serve as the account number
for these accounts.
insert into Account
select branch-name, loan-number, 200
from Loan
where branch-name = “Sadar”

Updates
Used to change a value in a tuple without changing all values in the tuple.
Suppose that annual interest payments are being made, and all balances are to be
increased by 5 percent.
update Account
set balance = balance * 1.05
Suppose that accounts with balances over Rs10000 receive 6 percent interest,
whereas all others receive 5 percent.
update Account
set balance = balance * 1.06
where balance > 10000
update Account
set balance = balance * 1.05
where balance <= 10000
Data Definition Language
Data Types in SQL
char(n): fixed length character string, length n.
varchar(n): variable length character string, maximum length n.
int: an integer.
smallint: a small integer.
numeric(p,d): fixed point number, p digits (plus a sign), and d of the p digits are
to the right of the decimal point.
real, double precision: floating point and double precision numbers.
float(n): a floating point number, precision at least n digits.
date: calendar date; four digits for year, two for month and two for day of month.
time: time of day, in hours, minutes and seconds.
Domains can be defined as
create domain person-name char(20).
the domain name person-name can be used to define the type of an attribute just like
built-in domain.
Schema Definition in SQL
create table command is used to define relations.
create table r (A1 D1, A2 D2, …, An Dn,
<integrity constraint1>,
…,
<integrity constraintk>)

where r is relation name, each Ai is the name of attribute, Di is the domain type of
values of Ai. Several types of integrity constraints are available to define in SQL.

Integrity Constraints which are allowed in SQL are

primary key(Aj1, Aj2,… , Ajm)


and
check(P) where P is the predicate.

drop table command is used to remove relations from database.


alter table command is used to add attributes to an existing relation
alter table r add A D
it will add attribute A of domain type D in relation r.
alter table r drop A
it will remove the attribute A of relation r.
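
As a hedged, concrete illustration of these commands (the column types, sizes and the opening-date attribute are assumptions made only for this example), the Account relation used earlier might be defined and later altered as follows:

create table Account
  (account-number  char(10),
   branch-name     char(15),
   balance         numeric(12,2),
   primary key (account-number),
   check (balance >= 0))

alter table Account add opening-date date
alter table Account drop opening-date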
LECTURE-24

Integrity Constraints
 Integrity Constraints guard against accidental damage to the database.
 Integrity constraints are predicates pertaining to the database.
 Domain Constraints:
 Predicates defined on the domains are Domain constraints.
 Simplest Domain constraints are defined by defining standard data types of the attributes
like Integer, Double, Float, etc.
 We can define domains by create domain clause also we can define the constraints on such
domains as follows:
create domain hourly-wage numeric(5,2)
constraint wage-value-test check(value >= 4.00)
 So we can use hourly-wage as data type for any attribute where DBMS will automatically
allow only values greater than or equal to 4.00.
 Other examples for defining Domain constraints are as follows:
create domain account-number char(10)
constraint account-number-null-test check(value not null)
create domain account-type char(10)
constraint account-type-test
check (value in ( “Checking”, “Saving”))
By using the latter of the two domains above, the DBMS will allow only the values “Checking” and
“Saving” for any attribute of type account-type.
 Referential Integrity:
 Foreign Key: If two tables R and S are related to each other, K1 and K2 are the primary keys of
the two relations, and K1 is also one of the attributes in S. Suppose we want every row in S
to have a corresponding row in R; then we define K1 in S as a foreign key. Example: in
our original database of library we had a table for relation BORROWEDBY, containing two
fields Card No. and Acc. No. . Every row of BORROWEDBY relation must have
corresponding row in USER Table having same Card No. and a row in BOOK table having
same Acc. No.. Then we will define the Card No. and Acc. No. in BORROWEDBY relation
as foreign keys.
 In other way we can say that every row of BORROWEDBY relation must refer to some row
in BOOK and also in USER tables.
 Such referential requirement in one table to another table is called Referential Integrity.
UNIT IV
LECTURE-26

RELATIONAL DATABASE DESIGN

Database design is a process in which you create a logical data model for a database, which stores
the data of a company. It is performed after the initial database study phase in the database life cycle.
You use the normalization technique to create the logical data model for a database and eliminate
data redundancy.

Normalization also allows you to organize data efficiently in a database and reduce anomalies
during data operation. Various normal forms, such as first, second and third can be applied to create
a logical data model for a database. The second and third normal forms are based on partial
dependency and transitive dependency. A partial dependency occurs when a non-key column depends
on only a part of a composite primary key. A transitive dependency occurs when a non-key column is
uniquely identified by values in another non-key column of a table.

Database Design Process:


We can identify six main phases of the database design process:
1. Requirement collection and analysis
2. Conceptual database design
3. Choice of a DBMS
4. Data model mapping(logical database design)
5. Physical database design
6. Database system implementation and tuning

1. Requirement Collection and Analysis


Before we can effectively design a database, we must know and analyze the expectations of
the users and the intended uses of the database in as much detail as possible.

2. Conceptual Database Design


The goal for this phase is to produce a conceptual schema for the database that is
independent of a specific DBMS.
 We often use a high-level data model such as the ER model during this phase.
 We specify as many of the known database applications and transactions as possible, using a
notation that is independent of any specific DBMS.
 Even when the DBMS choice has already been made for the organization, the intent of conceptual
design is still to keep it as free as possible from implementation considerations.

3. Choice of a DBMS
The choice of DBMS is governed by a number of factors: some technical, others economic, and still
others concerned with the politics of the organization.
The economic and organizational factors that affect the choice of the DBMS are:
software cost, maintenance cost, hardware cost, database creation and conversion cost,
personnel cost, training cost, and operating cost.

4. Data model mapping (logical database design)


During this phase, we map the conceptual schema from the high-level data model used in
phase 2 into the data model of the chosen DBMS.

5. Physical database design


During this phase we design the specification for the database in terms of physical storage
structures, record placement and indexes.
6. Database system implementation and tuning
During this phase, the database and application programs are implemented, tested and
eventually deployed for service.

Informal Guidelines for Relation Design

Want to keep the semantics of the relation attributes clear. The information in a tuple should
represent exactly one fact or an entity. The hidden or buried entities are what we want to discover
and eliminate.

 Design a relation schema so that it is easy to explain its meaning.


 Do not combine attributes from multiple entity types and relationship types into a single
relation. Use a view if you want to present a simpler layout to the end user.
 A relation schema should correspond to one entity type or relationship type.
 Minimize redundant information in tuples, thus reducing update anomalies
 If anomalies are present, try to decompose the relation into two or more to represent the
separate facts, or document the anomalies well for management in the applications
programs.

Minimize the use of null values. Nulls have multiple interpretations:

 The attribute does not apply to this tuple


 The attribute value is unknown
 The attribute value is absent
 The attribute value might represent an actual value

If nulls are likely (non-applicable) then consider decomposition of the relation into two or more
relations that hold only the non-null valued tuples.

 Do not permit the creation of spurious tuples

Too much decomposition of relations into smaller ones may also lose information or generate
erroneous information

 Be sure that relations can be logically joined using natural join and the result doesn't
generate relationships that don't exist

Functional Dependencies

FDs are constraints on well-formed relations and represent a formalism on the structure of a
relation.

Definition: A functional dependency (FD) on a relation schema R is a constraint X → Y, where X


and Y are subsets of attributes of R.
Definition: an FD is a relationship between an attribute "Y" and a determinant (1 or more other
attributes) "X" such that for a given value of a determinant the value of the attribute is uniquely
defined.

 X is a determinant
 X determines Y
 Y is functionally dependent on X
 X→Y
 X →Y is trivial if Y ⊆ X

Definition: An FD X → Y is satisfied in an instance r of R if for every pair of tuples, t and s: if t


and s agree on all attributes in X then they must agree on all attributes in Y

A key constraint is a special kind of functional dependency: all attributes of relation occur on the
right-hand side of the FD:

 SSN → SSN, Name, Address

Example Functional Dependencies

Let R be
NewStudent(stuId, lastName, major, credits, status, socSecNo)

FDs in R include

 {stuId}→{lastName}, but not the reverse


 {stuId} →{lastName, major, credits, status, socSecNo, stuId}
 {socSecNo} →{stuId, lastName, major, credits, status, socSecNo}
 {credits}→{status}, but not {status}→{credits}

ZipCode→AddressCity

 16652 is Huntingdon’s ZIP

ArtistName→BirthYear

 Picasso was born in 1881

Autobrand→Manufacturer, Engine type

 Pontiac is built by General Motors with gasoline engine

Author, Title→PublDate

 Shakespeare’s Hamlet was published in 1600

Trivial Functional Dependency

The FD X→Y is trivial if set {Y} is a subset of set {X}

Examples: If A and B are attributes of R,


 {A}→{A}
 {A,B} →{A}
 {A,B} →{B}
 {A,B} →{A,B}

are all trivial FDs and will not contribute to the evaluation of normalization.

FD Axioms

Understanding: Functional Dependencies are recognized by analysis of the real world; no


automation or algorithm. Finding or recognizing them is the database designer's task.

FD manipulations:

 Soundness -- no incorrect FD's are generated


 Completeness -- all FD's can be generated

Reflexivity: if a is a set of attributes and b ⊆ a, then a → b.
Example: SSN, Name → SSN
Augmentation: if a → b holds and c is a set of attributes, then ca → cb.
Example: SSN → Name, then SSN, Phone → Name, Phone
Transitivity: if a → b holds and b → c holds, then a → c holds.
Example: SSN → Zip and Zip → City, then SSN → City
Union or Additivity*: if a → b and a → c hold, then a → bc holds.
Example: SSN → Name and SSN → Zip, then SSN → Name, Zip
Decomposition or Projectivity*: if a → bc holds, then a → b and a → c hold.
Example: SSN → Name, Zip, then SSN → Name and SSN → Zip
Pseudotransitivity*: if a → b and cb → d hold, then ac → d holds.
Example: Address → Project and Project, Date → Amount, then Address, Date → Amount
Note: ab → c does NOT imply a → c and b → c.

* Derived rules; reflexivity, augmentation and transitivity are Armstrong's basic axioms.


LECTURE-27

CLOSURE OF A SET OF FUNCTIONAL DEPEDENCIES

Given a relational schema R, a functional dependency f on R is logically implied by a set of


functional dependencies F on R if every relation instance r(R) that satisfies F also satisfies f.

The closure of F, denoted by F+, is the set of all functional dependencies logically implied by F.
The closure of F can be found by using a collection of rules called Armstrong axioms.
Reflexivity rule: If A is a set of attributes and B is subset or equal to A, then A→B holds.
Augmentation rule: If A→B holds and C is a set of attributes, then CA→CB holds
Transitivity rule: If A→B holds and B→C holds, then A→C holds.
Union rule: If A→B holds and A→C then A→BC holds
Decomposition rule: If A→BC holds, then A→B holds and A→C holds.
Pseudo transitivity rule: If A→B holds and BC→D holds, then AC→D holds.

Suppose we are given a relation schema R=(A,B,C,G,H,I) and the set of functional dependencies
{A→B,A→C,CG→H,CG→I,B→H}
We list several members of F+ here:
1. A→H, since A→B and B→H hold, we apply the transitivity rule.
2. CG→HI. Since CG→H and CG→I , the union rule implies that CG→HI
3. AG→I, since A→C and CG→I, the pseudo transitivity rule implies that AG→I holds

Algorithm of compute F+ :
To compute the closure of a set of functional dependencies F:
F+ = F
repeat
for each functional dependency f in F+
apply reflexivity and augmentation rules on f
add the resulting functional dependencies to F+
for each pair of functional dependencies f1and f2 in F+
if f1 and f2 can be combined using transitivity
then add the resulting functional dependency to F+
until F+ does not change any further

Note that this procedure can be expensive, since F+ itself can be very large.
LECTURE-28

LOSS LESS DECOMPOSITION

A decomposition of a relation scheme R<S,F> into the relation schemes Ri (1<=i<=n) is said to be a
lossless join decomposition, or simply lossless, if for every relation R that satisfies the FDs in F, the
natural join of the projections of R gives back the original relation R, i.e.,
R = πR1(R) ⋈ πR2(R) ⋈ …… ⋈ πRn(R)
If R is a proper subset of πR1(R) ⋈ πR2(R) ⋈ …… ⋈ πRn(R),
then the decomposition is called lossy.

DEPEDENCY PRSERVATION:

Given a relation scheme R<S,F>, where F is the associated set of functional dependencies on the
attributes in S, if R is decomposed into the relation schemes R1, R2, … Rn with the FDs F1, F2, … Fn, then
this decomposition of R is dependency preserving if the closure of F’ (where F’ = F1 U F2 U … U Fn)
is identical to the closure of F, i.e. F’+ = F+.
Example:
Let R(A,B,C) AND F={A→B}. Then the decomposition of R into R1(A,B) and R2(A,C) is lossless
because the FD { A→B} is contained in R1 and the common attribute A is a key of R1.

Example:
Let R(A,B,C) AND F={A→B}. Then the decomposition of R into R1(A,B) and R2(B,C) is not
lossless because the common attribute B does not functionally determine either A or C. i.e, it is not
a key of R1 or R 2.

Example:
Let R(A,B,C,D) and F={A→B, A→C, C→D,}. Then the decomposition of R into R1(A,B,C) with
the FD F1={ A→B, A→C } and R2(C,D) with FD F2={ C→D }. In this decomposition all the
original FDs can be logically derived from F1 and F2, hence the decomposition is dependency
preserving. Also, the common attribute C forms a key of R2, so the decomposition is lossless.

Example:
Let R(A,B,C,D) and F={A→B, A→C, A→D,}. Then the decomposition of R into R1(A,B,D) with
the FD F1={ A→B, A→D } and R2(B,C) with FD F2={ } is lossy because the common attribute B
is not a candidate key of either R1 or R2.
In addition, the FD A→C is not implied by any FD in R1 or R2. Thus the decomposition is not
dependency preserving.

Full functional dependency:


Given a relational scheme R and an FD X→Y, Y is fully functionally dependent on X if there is no Z,
where Z is a proper subset of X, such that Z→Y. The dependency X→Y is then left-reduced, there being
no extraneous attributes on the left-hand side of the dependency.

Partial dependency:
Given a relation scheme R with a set of functional dependencies F defined on its attributes and K as a
candidate key, if X is a proper subset of K and if F |= X→A, then A is said to be partially dependent on K.

Prime attribute and non prime attribute:


An attribute A in a relation scheme R is a prime attribute, or simply prime, if A is part of some
candidate key of the relation. If A is not part of any candidate key of R, A is called a nonprime
attribute, or simply non-prime.

Trivial functional dependency:


A FD X→Y is said to be a trivial functional dependency if Y is subset of X.

LECTURE-29

Normalization

While designing a database out of an entity–relationship model, the main problem existing in that
“raw” database is redundancy. Redundancy is storing the same data item in more than one place.
Redundancy creates several problems like the following:

1. Extra storage space: storing the same data in many places takes large amount of disk space.
2. Entering same data more than once during data insertion.
3. Deleting data from more than one place during deletion.
4. Modifying data in more than one place.
5. Anomalies may occur in the database if insertion, deletion, modification etc. are not done
properly. This creates inconsistency and unreliability in the database.

To solve this problem, the “raw” database needs to be normalized. This is a step by step process of
removing different kinds of redundancy and anomaly at each step. At each step a specific rule is
followed to remove specific kind of impurity in order to give the database a slim and clean look.

Un-Normalized Form (UNF)

If a table contains non-atomic values at each row, it is said to be in UNF. An atomic value is
something that can not be further decomposed. A non-atomic value, as the name suggests, can be
further decomposed and simplified. Consider the following table:

Emp-Id Emp-Name Month Sales Bank-Id Bank-Name


E01 AA Jan 1000 B01 SBI
Feb 1200
Mar 850
E02 BB Jan 2200 B02 UTI
Feb 2500
E03 CC Jan 1700 B01 SBI
Feb 1800
Mar 1850
Apr 1725

In the sample table above, there are multiple occurrences of rows under each key Emp-Id. Although
considered to be the primary key, Emp-Id cannot give us the unique identification facility for any
single row. Further, each primary key points to a variable length record (3 for E01, 2 for E02 and 4
for E03).

First Normal Form (1NF)


A relation is said to be in 1NF if it contains no non-atomic values and each row can provide a
unique combination of values. The above table in UNF can be processed to create the following
table in 1NF.

Emp-Id Emp-Name Month Sales Bank-Id Bank-Name
E01 AA Jan 1000 B01 SBI
E01 AA Feb 1200 B01 SBI
E01 AA Mar 850 B01 SBI
E02 BB Jan 2200 B02 UTI
E02 BB Feb 2500 B02 UTI
E03 CC Jan 1700 B01 SBI
E03 CC Feb 1800 B01 SBI
E03 CC Mar 1850 B01 SBI
E03 CC Apr 1725 B01 SBI

As you can see now, each row contains unique combination of values. Unlike in UNF, this relation
contains only atomic values, i.e. the rows can not be further decomposed, so the relation is now in
1NF.

Second Normal Form (2NF)

A relation is said to be in 2NF if it is already in 1NF and each and every attribute fully depends on
the primary key of the relation. Speaking inversely, if a table has some attributes which are not fully
dependent on the primary key of that table, then it is not in 2NF.

Let us explain. Emp-Id is the primary key of the above relation. Emp-Name, Month, Sales and
Bank-Name all depend upon Emp-Id. But the attribute Bank-Name depends on Bank-Id, which is
not the primary key of the table. So the table is in 1NF, but not in 2NF. If this portion can be
moved into another related relation, it would come to 2NF.

Emp-Id Emp-Name Month Sales Bank-Id


E01 AA JAN 1000 B01
E01 AA FEB 1200 B01
E01 AA MAR 850 B01
E02 BB JAN 2200 B02
E02 BB FEB 2500 B02
E03 CC JAN 1700 B01
E03 CC FEB 1800 B01
E03 CC MAR 1850 B01
E03 CC APR 1725 B01

Bank-Id Bank-Name
B01 SBI
B02 UTI
After moving that portion into another relation, we store a smaller amount of data in two relations
without any loss of information. There is also a significant reduction in redundancy.
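
The decomposition is lossless: joining the two relations back on Bank-Id reproduces the original 1NF data. A hedged SQL sketch, assuming the first relation is named Emp-Sales and the second Bank (these table names are not fixed by the example above):

select Emp-Id, Emp-Name, Month, Sales, Emp-Sales.Bank-Id, Bank-Name
from Emp-Sales, Bank
where Emp-Sales.Bank-Id = Bank.Bank-Id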

Third Normal Form (3NF)

A relation is said to be in 3NF, if it is already in 2NF and there exists no transitive dependency in
that relation. Speaking inversely, if a table contains transitive dependency, then it is not in 3NF, and
the table must be split to bring it into 3NF.

What is a transitive dependency? Within a relation if we see


A → B [B depends on A]
And
B → C [C depends on B]
Then we may derive
A → C[C depends on A]

Such derived dependencies hold well in most of the situations. For example if we have
Roll → Marks
And
Marks → Grade
Then we may safely derive
Roll → Grade.

This third dependency was not originally specified but we have derived it.

The derived dependency is called a transitive dependency when such dependency becomes
improbable. For example we have been given
Roll → City
And
City → STDCode

If we try to derive Roll → STDCode it becomes a transitive dependency, because obviously the
STDCode of a city cannot depend on the roll number issued by a school or college. In such a case
the relation should be broken into two, each containing one of these two dependencies:
Roll → City
And
City → STD code
LECTURE-30

Boyce-Codd Normal Form (BCNF)

A relation is said to be in BCNF if it is already in 3NF and the left hand side of every
dependency is a candidate key. A relation which is in 3NF is almost always in BCNF. There could,
however, be situations when a 3NF relation is not in BCNF, if the following conditions are found true.

1. The candidate keys are composite.


2. There is more than one candidate key in the relation.
3. The candidate keys overlap, i.e. they have some attributes in common.

Professor Code Department Head of Dept. Percent Time


P1 Physics Ghosh 50
P1 Mathematics Krishnan 50
P2 Chemistry Rao 25
P2 Physics Ghosh 75
P3 Mathematics Krishnan 100

Consider, as an example, the above relation. It is assumed that:

1. A professor can work in more than one department


2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.

The relation diagram for the above relation is given as the following:

The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are
duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose the information that
Rao is the Head of Department of Chemistry.

The normalization of the relation is done by creating a new relation for Dept. and Head of Dept. and
deleting Head of Dept. form the given relation. The normalized relations are shown in the
following.
Professor Code Department Percent Time
P1 Physics 50
P1 Mathematics 50
P2 Chemistry 25
P2 Physics 75
P3 Mathematics 100

Department Head of Dept.


Physics Ghosh
Mathematics Krishnan
Chemistry Rao

See the dependency diagrams for these new relations.

Fourth Normal Form (4NF)

When attributes in a relation have multi-valued dependency, further Normalization to 4NF and 5NF
are required. Let us first find out what multi-valued dependency is.

A multi-valued dependency is a typical kind of dependency in which each and every attribute
within a relation depends upon the other, yet none of them is a unique primary key.

We will illustrate this with an example. Consider a vendor supplying many items to many projects
in an organization. The following are the assumptions:

1. A vendor is capable of supplying many items.


2. A project uses many items.
3. A vendor supplies to many projects.
4. An item may be supplied by many vendors.

A multi valued dependency exists here because all the attributes depend upon the other and yet none
of them is a primary key having unique value.

Vendor Code Item Code Project No.


V1 I1 P1
V1 I2 P1
V1 I1 P3
V1 I2 P3
V2 I2 P1
V2 I3 P1
V3 I1 P2
V3 I1 P3

The given relation has a number of problems. For example:

1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row with a
blank for item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.

Observe that the relation given is in 3NF and also in BCNF. It still has the problem mentioned
above. The problem is reduced by expressing this relation as two relations in the Fourth Normal
Form (4NF). A relation is in 4NF if it has no more than one independent multi valued dependency
or one independent multi valued dependency with a functional dependency.

The table can be expressed as the two 4NF relations given as follows. The fact that vendors are
capable of supplying certain items and that they are assigned to supply for some projects is
independently specified in the 4NF relations.

Vendor-Supply
Vendor Code Item Code
V1 I1
V1 I2
V2 I2
V2 I3
V3 I1
Vendor-Project
Vendor Code Project No.
V1 P1
V1 P3
V2 P1
V3 P2

Fifth Normal Form (5NF)

These relations still have a problem. While defining the 4NF we mentioned that all the attributes
depend upon each other. While creating the two tables in the 4NF, although we have preserved the
dependencies between Vendor Code and Item Code in the first table and Vendor Code and Project No.
in the second table, we have lost the relationship between Item Code and Project No. If there were a
primary key then this loss of dependency would not have occurred. In order to revive this
relationship we must add a new table like the following. Please note that during the entire process
of normalization, this is the only step where a new table is created by joining two attributes, rather
than splitting them into separate tables.

Project No. Item Code


P1 I1
P1 I2
P2 I1
P3 I1
P3 I3
Let us finally summarize the normalization steps we have discussed so far.

Input Relation | Transformation | Output Relation
All relations | Eliminate variable length records; remove multi-attribute lines in the table. | 1NF
1NF relation | Remove dependency of non-key attributes on part of a multi-attribute key. | 2NF
2NF | Remove dependency of non-key attributes on other non-key attributes. | 3NF
3NF | Remove dependency of an attribute of a multi-attribute key on an attribute of another (overlapping) multi-attribute key. | BCNF
BCNF | Remove more than one independent multi-valued dependency from the relation by splitting the relation. | 4NF
4NF | Add one relation relating the attributes with multi-valued dependency. | 5NF
LECTURE-31
QUERY PROCESSING

Query processing includes translation of high-level queries into low-level expressions that can be used at the
physical level of the file system, query optimization and actual execution of the query to get the result. It is a
three-step process that consists of parsing and translation, optimization and execution of the query submitted
by the user.

A query is processed in four general steps:


1. Scanning and Parsing
2. Query Optimization or planning the execution strategy
3. Query Code Generator (interpreted or compiled)
4. Execution in the runtime database processor

1. Scanning and Parsing

When a query is first submitted (via an applications program), it must be scanned and parsed to
determine if the query consists of appropriate syntax.
Scanning is the process of converting the query text into a tokenized representation.
The tokenized representation is more compact and is suitable for processing by the parser.
This representation may be in a tree form.
The Parser checks the tokenized representation for correct syntax.
In this stage, checks are made to determine if columns and tables identified in the query exist in the
database and if the query has been formed correctly with the appropriate keywords and structure.
If the query passes the parsing checks, then it is passed on to the Query Optimizer.

2. Query Optimization or Planning the Execution Strategy

For any given query, there may be a number of different ways to execute it.
Each operation in the query (SELECT, JOIN, etc.) can be implemented using one or more different
Access Routines.
For example, an access routine that employs an index to retrieve some rows would be more efficient
than an access routine that performs a full table scan.
The goal of the query optimizer is to find a reasonably efficient strategy for executing the query (not
quite what the name implies) using the access routines.
Optimization typically takes one of two forms: Heuristic Optimization or Cost Based Optimization
In Heuristic Optimization, the query execution is refined based on heuristic rules for reordering the
individual operations.
With Cost Based Optimization, the overall cost of executing the query is systematically reduced by
estimating the costs of executing several different execution plans.

3. Query Code Generator (interpreted or compiled)

Once the query optimizer has determined the execution plan (the specific ordering of access routines),
the code generator writes out the actual access routines to be executed.
With an interactive session, the query code is interpreted and passed directly to the runtime database
processor for execution.
It is also possible to compile the access routines and store them for later execution.

4. Execution in the runtime database processor

At this point, the query has been scanned, parsed, planned and (possibly) compiled.
The runtime database processor then executes the access routines against the database.
The results are returned to the application that made the query in the first place.
Any runtime errors are also returned.
Lecture-32
Query Optimization
Query optimization enables the system to achieve (or improve) acceptable performance by choosing a
better (if not the best) strategy during the processing of a query. It is one of the great strengths of the
relational database.

Automatic Optimization vs. Human Programmer

1. A good automatic optimizer will have a wealth of information available to it that human
programmers typically do not have.
2. An automatic optimizer can easily reprocess the original relational request when the
organization of the database is changed. For a human programmer, reorganization would
involve rewriting the program.
3. The optimizer is a program, and therefore is capable of considering literally hundreds of
different implementation strategies for a given request, which is much more than a human
programmer can.
4. The optimizer is available to a wide range of users, in an efficient and cost-effective manner.

The Optimization Process


1. Cast the query into some internal representation, such as a query tree structure.
2. Convert the internal representation to canonical form.

*A subset (say C) of a set of queries (say Q) is said to be a set of canonical forms for Q if and only if
every query in Q is equivalent to just one query in C.

During this step, some optimization is already achieved by transforming the internal representation
to a better canonical form.
Possible improvements
a. Doing the restrictions (selects) before the join.
b. Reduce the number of comparisons by converting a restriction condition to an equivalent
condition in conjunctive normal form, that is, a condition consisting of a set of restrictions
that are ANDed together, where each restriction in turn consists of a set of simple comparisons
connected only by OR's.
c. A sequence of restrictions (selects) can be combined into a single restriction before the join.
d. In a sequence of projections, all but the last can be ignored.
e. A restriction of a projection is equivalent to a projection of a restriction.
f. Others
3. Choose candidate low-level procedures by evaluate the transformed query.
*Access path selection: Consider the query expression as a series of basic operations (join,
restriction, etc.), then the optimizer choose from a set of pre-defined, low-level
implementation procedures. These procedures may involve the user of primary key, foreign
key or indexes and other information about the database.

4. Generate query plans and choose the cheapest, by constructing a set of candidate query plans first
and then choosing the best plan. Picking the best plan is achieved by assigning a cost to each
candidate plan; the cost is computed according to the number of disk I/O's involved.
UNIT V

Transaction Concept: Transaction State, Implementation of Atomicity and Durability,


Concurrent Executions, Serializability, Recoverability, Implementation of Isolation, Testing
for Serializability, Failure Classification, Storage, Recovery and Atomicity, Recovery
algorithm. Indexing Techniques: B+ Trees: Search, Insert, Delete algorithms, File
Organization and Indexing, Cluster Indexes, Primary and Secondary Indexes , Index data
Structures, Hash Based Indexing: Tree base Indexing ,Comparison of File Organizations,
Indexes and Performance Tuning

Transaction

o The transaction is a set of logically related operation. It contains a group of tasks.


o A transaction is an action or series of actions. It is performed by a single user to
perform operations for accessing the contents of the database.

Example: Suppose an employee of a bank transfers Rs 800 from X's account to Y's account.
This small transaction contains several low-level tasks:

X's Account

1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)

Y's Account

1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)

Operations of Transaction:

Following are the main operations of transaction:

Read(X): Read operation is used to read the value of X from the database and stores it in a
buffer in main memory.
Write(X): Write operation is used to write the value back to the database from the buffer.

Let's take an example to debit transaction from an account which consists of following
operations:

1. R(X);
2. X = X - 500;
3. W(X);

Let's assume the value of X before starting of the transaction is 4000.

o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will contain
3500.
o The third operation will write the buffer's value to the database. So X's final value will
be 3500.

But it may be possible that, because of a hardware, software or power failure, the transaction
fails before finishing all the operations in the set.

For example: If in the above transaction, the debit transaction fails after executing operation
2 then X's value will remain 4000 in the database which is not acceptable by the bank.

To solve this problem, we have two important operations:

Commit: It is used to save the work done permanently.

Rollback: It is used to undo the work done.
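
A hedged SQL sketch of such a debit/credit transaction (the Account table, its columns and the exact transaction-control keywords are assumptions for illustration; the syntax varies slightly across systems):

begin transaction
update Account set balance = balance - 800 where account-number = “X”
update Account set balance = balance + 800 where account-number = “Y”
commit

If either update fails, issuing rollback instead of commit undoes the partial work, so the transfer happens entirely or not at all.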

Transaction property

The transaction has the four properties. These are used to maintain consistency in a database,
before and after the transaction.

Property of Transaction

1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity

o It states that all operations of the transaction take place at once if not, the transaction
is aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each transaction is
treated as one unit and either run to completion or is not executed at all.

Atomicity involves the following two operations:


Abort: If a transaction aborts then all the changes made are not visible.

Commit: If a transaction commits then all the changes made are visible.

Example: Let's assume that following transaction T consisting of T1 and T2. A consists of
Rs 600 and B consists of Rs 300. Transfer Rs 100 from account A to account B.

T1 T2

Read(A) Read(B)
A := A - 100 B := B + 100
Write(A) Write(B)

After completion of the transaction, A consists of Rs 500 and B consists of Rs 400.

If the transaction T fails after the completion of transaction T1 but before the completion of
transaction T2, then the amount will be deducted from A but not added to B. This results in an
inconsistent database state. In order to ensure the correctness of the database state, the
transaction must be executed in its entirety.

Consistency

o The integrity constraints are maintained so that the database is consistent before and
after the transaction.
o The execution of a transaction will leave a database in either its prior stable state or a
new stable state.
o The consistent property of database states that every transaction sees a consistent
database instance.
o The transaction is used to transform the database from one consistent state to another
consistent state.

For example: The total amount must be the same before and after the transaction.

1. Total before T occurs = 600 + 300 = 900
2. Total after T occurs = 500 + 400 = 900

Therefore, the database is consistent. In the case when T1 is completed but T2 fails, then
inconsistency will occur.

Isolation

o It states that the data being used during the execution of one transaction cannot be
used by a second transaction until the first one is completed.
o In isolation, if transaction T1 is being executed and is using the data item X, then
that data item cannot be accessed by any other transaction T2 until transaction T1
ends.
o The concurrency control subsystem of the DBMS enforces the isolation property.

Durability

o The durability property guarantees the permanence of the database's consistent state.
It states that once a transaction completes, its changes are permanent.
o These changes cannot be lost by the erroneous operation of a faulty transaction or by a
system failure. When a transaction is completed, the database reaches a state
known as the consistent state. That consistent state cannot be lost, even in the event of
a system failure.
o The recovery subsystem of the DBMS is responsible for the durability property.

States of Transaction

In a database, the transaction can be in one of the following states -

Active state
o The active state is the first state of every transaction. In this state, the transaction is
being executed.
o For example: Insertion or deletion or updating a record is done here. But all the
records are still not saved to the database.
Partially committed
o In the partially committed state, a transaction executes its final operation, but the data
is still not saved to the database.
o In the total mark calculation example, a final display of the total marks step is
executed in this state.

Committed

A transaction is said to be in a committed state if it executes all its operations successfully. In


this state, all the effects are now permanently saved on the database system.

Failed state
o If any of the checks made by the database recovery system fails, then the transaction
is said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to
fetch the marks, then the transaction will fail to execute.

Aborted
o If any of the checks fail and the transaction has reached a failed state then the
database recovery system will make sure that the database is in its previous consistent
state. If not then it will abort or roll back the transaction to bring the database into a
consistent state.
o If the transaction fails in the middle of execution, then all the operations executed so
far are rolled back to restore the database to its previous consistent state.
o After aborting the transaction, the database recovery module will select one of the two
operations:
1. Re-start the transaction
2. Kill the transaction

Schedule

A series of operations from one transaction to another transaction is known as a schedule. It is
used to preserve the order of the operations in each of the individual transactions.
1. Serial Schedule

The serial schedule is a type of schedule where one transaction is executed completely before
starting another transaction. In the serial schedule, when the first transaction completes its
cycle, then the next transaction is executed.

For example: Suppose there are two transactions T1 and T2, each with some operations. If
there is no interleaving of operations, then there are the following two possible outcomes:

1. Execute all the operations of T1, followed by all the operations of T2.
2. Execute all the operations of T2, followed by all the operations of T1.

o In the given figure (a), Schedule A shows the serial schedule where T1 is followed by
T2.
o In the given figure (b), Schedule B shows the serial schedule where T2 is followed by
T1.

2. Non-serial Schedule

o If interleaving of operations is allowed, then the schedule is a non-serial schedule.


o It contains many possible orders in which the system can execute the individual
operations of the transactions.
o In the given figure (c) and (d), Schedule C and Schedule D are the non-serial
schedules. It has interleaving of operations.

3. Serializable schedule
o The serializability of schedules is used to find non-serial schedules that allow the
transaction to execute concurrently without interfering with one another.
o It identifies which schedules are correct when executions of the transaction have
interleaving of their operations.
o A non-serial schedule will be serializable if its result is equal to the result of its
transactions executed serially.
Here,

Schedule A and Schedule B are serial schedules.

Schedule C and Schedule D are non-serial schedules.


Testing of Serializability

A serialization graph (precedence graph) is used to test the serializability of a schedule.

Assume a schedule S. For S, we construct a graph known as the precedence graph. This graph
is a pair G = (V, E), where V is a set of vertices and E is a set of edges. The set
of vertices contains all the transactions participating in the schedule. The set of
edges contains all edges Ti → Tj for which one of the following three conditions holds:

1. Create an edge Ti → Tj if Ti executes write(Q) before Tj executes read(Q).
2. Create an edge Ti → Tj if Ti executes read(Q) before Tj executes write(Q).
3. Create an edge Ti → Tj if Ti executes write(Q) before Tj executes write(Q).

o If a precedence graph contains a single edge Ti → Tj, then all the instructions of Ti
are executed before the first instruction of Tj is executed.
o If a precedence graph for schedule S contains a cycle, then S is non-serializable. If the
precedence graph has no cycle, then S is known as serializable.
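
This test can be sketched in a few lines of Python, shown below before the worked example; the schedule format (a list of (transaction, action, data item) tuples) and the function names are assumptions for illustration only.

# Sketch: build a precedence graph from a schedule and test it for a cycle.
# A schedule is a list of (transaction, action, item) tuples, e.g. [("T1", "R", "A"), ...].

def precedence_graph(schedule):
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            # conflicting pair: different transactions, same item, at least one write
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.add((ti, tj))      # Ti must precede Tj
    return edges

def has_cycle(edges):
    # simple DFS-based cycle detection over the transaction vertices
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in graph[node]:
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False

    return any(dfs(v) for v in graph if v not in visited)

s = [("T1", "R", "A"), ("T2", "W", "A"), ("T2", "R", "B"), ("T1", "W", "B")]
print(has_cycle(precedence_graph(s)))   # True => this schedule is not serializable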

For example:
Explanation:

Read(A): In T1, no subsequent writes to A, so no new edges


Read(B): In T2, no subsequent writes to B, so no new edges
Read(C): In T3, no subsequent writes to C, so no new edges
Write(B): B is subsequently read by T3, so add edge T2 → T3
Write(C): C is subsequently read by T1, so add edge T3 → T1
Write(A): A is subsequently read by T2, so add edge T1 → T2
Write(A): In T2, no subsequent reads to A, so no new edges
Write(C): In T1, no subsequent reads to C, so no new edges
Write(B): In T3, no subsequent reads to B, so no new edges
Precedence graph for schedule S1:

The precedence graph for schedule S1 contains a cycle; that is why schedule S1 is
non-serializable.
Explanation:

Read(A): In T4, no subsequent writes to A, so no new edges

Read(C): In T4, no subsequent writes to C, so no new edges
Write(A): A is subsequently read by T5, so add edge T4 → T5
Read(B): In T5, no subsequent writes to B, so no new edges
Write(C): C is subsequently read by T6, so add edge T4 → T6
Write(B): B is subsequently read by T6, so add edge T5 → T6
Write(C): In T6, no subsequent reads to C, so no new edges
Write(A): In T5, no subsequent reads to A, so no new edges
Write(B): In T6, no subsequent reads to B, so no new edges

Precedence graph for schedule S2:

The precedence graph for schedule S2 contains no cycle; that is why schedule S2 is
serializable.

Conflict Serializable Schedule

o A schedule is called conflict serializable if, after swapping its non-conflicting
operations, it can be transformed into a serial schedule.
o A schedule will be conflict serializable if it is conflict equivalent to a serial
schedule.

Conflicting Operations

Two operations are conflicting if all of the following conditions are satisfied:

1. They belong to different transactions.
2. They operate on the same data item.
3. At least one of them is a write operation.
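
These three conditions translate directly into a small helper function; below is a sketch using the same (transaction, action, item) tuple representation assumed in the earlier precedence-graph example.

# Sketch: decide whether two operations conflict.
def conflicting(op1, op2):
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return (
        t1 != t2                 # 1. they belong to different transactions
        and x1 == x2             # 2. they operate on the same data item
        and "W" in (a1, a2)      # 3. at least one of them is a write
    )

print(conflicting(("T1", "W", "A"), ("T2", "R", "A")))   # True  (conflict)
print(conflicting(("T1", "R", "A"), ("T2", "R", "A")))   # False (both reads)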
Example:

Swapping is possible only if S1 and S2 are logically equal.

Here, S1 = S2. That means the operations are non-conflicting.

Here, S1 ≠ S2. That means the operations are conflicting.

Conflict Equivalent
Two schedules are conflict equivalent if one can be transformed into the other by swapping
non-conflicting operations. In the given example, S2 is conflict equivalent to S1 (S1 can be
converted to S2 by swapping non-conflicting operations).

Two schedules are said to be conflict equivalent if and only if:

1. They contain the same set of transactions.
2. Each pair of conflicting operations is ordered in the same way.

Example:

Schedule S2 is a serial schedule because, in this, all operations of T1 are performed before
starting any operation of T2. Schedule S1 can be transformed into a serial schedule by
swapping non-conflicting operations of S1.

After swapping of non-conflict operations, the schedule S1 becomes:

T1              T2

Read(A)
Write(A)
Read(B)
Write(B)
                Read(A)
                Write(A)
                Read(B)
                Write(B)

Hence, S1 is conflict serializable.

View Serializability

o A schedule will be view serializable if it is view equivalent to a serial schedule.
o If a schedule is conflict serializable, then it will also be view serializable.
o A schedule that is view serializable but not conflict serializable contains blind writes.

View Equivalent

Two schedules S1 and S2 are said to be view equivalent if they satisfy the following
conditions:

1. Initial Read

An initial read of both schedules must be the same. Suppose there are two schedules S1 and
S2. In schedule S1, if a transaction T1 is reading the data item A, then in S2 transaction T1
should also read A.

The above two schedules are view equivalent because the initial read operation in S1 is done
by T1 and in S2 it is also done by T1.
2. Updated Read

In schedule S1, if Ti is reading A, which was updated by Tj, then in S2 also Ti should read A

as updated by Tj.

The above two schedules are not view equivalent because, in S1, T3 is reading A updated by
T2, while in S2, T3 is reading A updated by T1.

3. Final Write

The final write must be the same in both the schedules. In schedule S1, if a transaction T1
performs the final write on A, then in S2 the final write on A should also be done by T1.


The above two schedules are view equivalent because the final write operation in S1 is done
by T3 and in S2 the final write operation is also done by T3.

Example:
Schedule S

With 3 transactions, the total number of possible serial schedules = 3! = 6:

1. S1 = <T1 T2 T3>
2. S2 = <T1 T3 T2>
3. S3 = <T2 T3 T1>
4. S4 = <T2 T1 T3>
5. S5 = <T3 T1 T2>
6. S6 = <T3 T2 T1>

Taking first schedule S1:

Schedule S1

Step 1: Updated Read

In both schedules S and S1, there is no read other than the initial read, so we do not need to
check this condition.

Step 2: Initial Read

The initial read operation in S is done by T1 and in S1, it is also done by T1.

Step 3: Final Write


The final write operation in S is done by T3 and in S1 it is also done by T3. So, S and S1 are
view equivalent.

The first schedule S1 satisfies all three conditions, so we don't need to check another
schedule.

Hence, the view equivalent serial schedule is:

T1 → T2 → T3
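
The three conditions (initial read, updated read, final write) can also be checked mechanically. Below is a rough sketch under the same (transaction, action, item) schedule representation used earlier; it is an illustrative approximation, not a full view-serializability test.

# Sketch: compare two schedules for view equivalence by collecting
# initial reads, updated reads (reads-from pairs) and final writes per data item.
def view_info(schedule):
    last_writer = {}
    initial_reads, updated_reads, final_writes = set(), set(), {}
    for txn, action, item in schedule:
        if action == "R":
            if item in last_writer:
                updated_reads.add((txn, item, last_writer[item]))   # updated read
            else:
                initial_reads.add((txn, item))                      # read before any write
        elif action == "W":
            last_writer[item] = txn
            final_writes[item] = txn        # overwritten until the actual final write
    return initial_reads, updated_reads, final_writes

def view_equivalent(s1, s2):
    return view_info(s1) == view_info(s2)

# A classic blind-write schedule that is view equivalent to the serial order T1, T2, T3.
s      = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]
serial = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A"), ("T3", "W", "A")]
print(view_equivalent(s, serial))   # True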

Recoverability of Schedule

Sometimes a transaction may not execute completely due to a software issue, system crash or
hardware failure. In that case, the failed transaction has to be rolled back. But some other
transaction may also have used a value produced by the failed transaction, so we also have to
roll back those transactions.

The above table 1 shows a schedule which has two transactions. T1 reads and writes the
value of A, and that value is read and written by T2. T2 commits, but later on T1 fails. Due to
the failure, we have to roll back T1. T2 should also be rolled back because it read the value
written by T1, but T2 cannot be rolled back because it has already committed. This type of
schedule is known as an irrecoverable schedule.

Irrecoverable schedule: The schedule will be irrecoverable if Tj reads the updated value of
Ti and Tj commits before Ti commits.
The above table 2 shows a schedule with two transactions. Transaction T1 reads and writes
A, and that value is read and written by transaction T2. But later on, T1 fails. Due to this, we
have to roll back T1. T2 should also be rolled back because it has read the value written by T1.
As T2 has not committed before T1 commits, we can roll back transaction T2 as well. So this
schedule is recoverable with cascading rollback.

Recoverable with cascading rollback: The schedule will be recoverable with cascading
rollback if Tj reads the updated value of Ti and the commit of Tj is delayed till the commit of Ti.

The above table 3 shows a schedule with two transactions. Transaction T1 reads and writes A
and commits, and that value is then read and written by T2. So this is a cascadeless recoverable
schedule.

Failure Classification

To find that where the problem has occurred, we generalize a failure into the following
categories:

1. Transaction failure
2. System crash
3. Disk failure
1. Transaction failure

A transaction failure occurs when a transaction fails to execute or when it reaches a point
from where it cannot go any further. If a transaction or process is interrupted in this way,
it is called a transaction failure.

Reasons for a transaction failure could be -

1. Logical errors: If a transaction cannot complete due to some code error or an

internal error condition, then a logical error occurs.
2. System errors: These occur when the DBMS itself terminates an active
transaction because the database system is not able to execute it. For
example, the system aborts an active transaction in case of deadlock or
resource unavailability.

2. System Crash
o System failure can occur due to power failure or other hardware or software
failure. Example: Operating system error.

Fail-stop assumption: In the system crash, non-volatile storage is assumed


not to be corrupted.

3. Disk Failure
o It occurs when hard-disk drives or storage drives fail. This was a common
problem in the early days of technology evolution.
o Disk failure occurs due to the formation of bad sectors, a disk head crash,
unreachability of the disk, or any other failure which destroys all or part of the disk
storage.

Log-Based Recovery

o The log is a sequence of records. The log of each transaction is maintained in some stable
storage so that if any failure occurs, the database can be recovered from it.
o If any operation is performed on the database, then it will be recorded in the log.
o The process of storing the logs should be completed before the actual transaction is
applied to the database.

Let's assume there is a transaction to modify the City of a student. The following logs are
written for this transaction.
o When the transaction is initiated, then it writes 'start' log.
1. <Tn, Start>
o When the transaction modifies the City from 'Noida' to 'Bangalore', then another log is
written to the file.

1. <Tn, City, 'Noida', 'Bangalore' >


o When the transaction is finished, then it writes another log to indicate the end of the
transaction.

1. <Tn, Commit>

There are two approaches to modify the database:

1. Deferred database modification:


o The deferred modification technique occurs if the transaction does not modify the
database until it has committed.
o In this method, all the logs are created and stored in the stable storage, and the
database is updated when a transaction commits.

2. Immediate database modification:


o The immediate modification technique occurs if the database is modified while
the transaction is still active.
o In this technique, the database is modified immediately after every operation,
following the actual sequence of database modifications.

Recovery using Log records

When the system is crashed, then the system consults the log to find which transactions need
to be undone and which need to be redone.

1. If the log contains both the records <Ti, Start> and <Ti, Commit>, then
transaction Ti needs to be redone.
2. If the log contains the record <Ti, Start> but contains neither <Ti, Commit>
nor <Ti, Abort>, then transaction Ti needs to be undone.
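
As a rough illustration, this rule can be expressed as a single scan over the log records; the record notation <Ti, Start> / <Ti, Commit> / <Ti, Abort> follows the text above, while the Python tuple representation is an assumption for the sketch.

# Sketch: scan a log and decide which transactions to redo and which to undo.
# A log record is represented as a tuple such as ("T1", "Start") or ("T1", "Commit").
def classify(log):
    started, committed, aborted = set(), set(), set()
    for txn, record in log:
        if record == "Start":
            started.add(txn)
        elif record == "Commit":
            committed.add(txn)
        elif record == "Abort":
            aborted.add(txn)
    redo = started & committed                # <Ti, Start> and <Ti, Commit> both present
    undo = started - committed - aborted      # <Ti, Start> but no Commit or Abort
    return redo, undo

log = [("T1", "Start"), ("T1", "Commit"), ("T2", "Start")]
print(classify(log))    # ({'T1'}, {'T2'})  ->  redo T1, undo T2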

Checkpoint

o The checkpoint is a type of mechanism where all the previous logs are removed from
the system and permanently stored in the storage disk.
o The checkpoint is like a bookmark. During the execution of transactions, such
checkpoints are marked, and as the transactions execute, log records are created for
their steps.
o When a checkpoint is reached, the transactions logged so far are applied to the
database, and the log file up to that point is removed. The log file is then updated with
the steps of new transactions until the next checkpoint, and so on.
o The checkpoint is used to declare a point before which the DBMS was in the
consistent state, and all transactions were committed.

Recovery using Checkpoint

In the following manner, a recovery system recovers the database from this failure:

o The recovery system reads log files from the end to start. It reads log files from T4 to
T1.
o Recovery system maintains two lists, a redo-list, and an undo-list.
o A transaction is put into the redo-list if the recovery system sees a log with <Tn,
Start> and <Tn, Commit>, or just <Tn, Commit>. All the transactions in the redo-list
are redone using their log records.
o For example: In the log file, transactions T2 and T3 will have <Tn, Start> and <Tn,
Commit>. The T1 transaction will have only <Tn, Commit> in the log file, because it
committed after the checkpoint was crossed. Hence, transactions T1, T2 and
T3 are put into the redo-list.
o A transaction is put into the undo-list if the recovery system sees a log with <Tn,
Start> but no commit or abort record. All the transactions in the undo-list are
undone, and their logs are removed.
o For example: Transaction T4 will have only <Tn, Start>. So T4 will be put into the undo-list,
since this transaction is not yet complete and failed in the middle.

INDEXING TECHNIQUES:

B+ Tree

o The B+ tree is a balanced search tree in which a node can have many children. It follows a
multi-level index format.
o In the B+ tree, leaf nodes hold the actual data pointers. The B+ tree ensures that all leaf
nodes remain at the same level.
o In the B+ tree, the leaf nodes are linked using a linked list. Therefore, a B+ tree can
support random access as well as sequential access.

Structure of B+ Tree

o In the B+ tree, every leaf node is at an equal distance from the root node. The B+ tree is
of order n, where n is fixed for every B+ tree.
o It contains internal nodes and leaf nodes.

Internal node

o An internal node of the B+ tree can contain at least n/2 child pointers, except the root
node.
o At most, an internal node of the tree contains n pointers.

Leaf node
o A leaf node of the B+ tree can contain at least n/2 record pointers and n/2 key
values.
o At most, a leaf node contains n record pointers and n key values.
o Every leaf node of the B+ tree contains one block pointer P to point to the next leaf node.

Searching a record in B+ Tree

Suppose we have to search for 55 in the B+ tree structure below. First, we look in the
intermediary (internal) node, which directs us to the leaf node that may contain the record for 55.

So, in the intermediary node, we find the branch between 50 and 75. Then, at the
end, we are redirected to the third leaf node. Here the DBMS performs a sequential search
to find 55.
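
The search just described can be sketched as follows; the node representation (sorted keys plus child pointers for internal nodes, sorted keys in leaves) is an assumption for illustration, not the on-disk layout of any particular DBMS.

# Sketch: searching a key in a B+ tree.
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys              # sorted key values
        self.children = children      # None for a leaf node

def search(node, key):
    # walk down the internal nodes until a leaf is reached
    while node.children is not None:
        node = node.children[bisect_right(node.keys, key)]
    # sequential (or binary) search inside the leaf node
    return key in node.keys

# Leaves: [5, 15], [25, 35, 45], [55, 65, 75, 80]; internal keys: [25, 50]
leaves = [Node([5, 15]), Node([25, 35, 45]), Node([55, 65, 75, 80])]
root = Node([25, 50], children=leaves)
print(search(root, 55))   # True: 55 > 50, so the rightmost branch leads to the third leaf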

B+ Tree Insertion

Suppose we want to insert a record 60 in the structure below. It will go to the 3rd leaf node,
after 55. Since this is a balanced tree and that leaf node is already full, we cannot insert
60 there.

In this case, we have to split the leaf node so that the record can be inserted into the tree
without affecting the fill factor, balance and order.

The 3rd leaf node would then have the values (50, 55, 60, 65, 70), and its current parent key is 50.
We split the leaf node of the tree in the middle so that its balance is not altered. So we can group
(50, 55) and (60, 65, 70) into two leaf nodes.
If these two have to be leaf nodes, the intermediate node cannot branch from 50 alone. It should
have 60 added to it, and then we can have pointers to the new leaf node.


This is how we can insert an entry when there is overflow. In a normal scenario, it is very
easy to find the node where it fits and then place it in that leaf node.

B+ Tree Deletion

Suppose we want to delete 60 from the above example. In this case, we have to remove 60
from the intermediate node as well as from the 4th leaf node. If we simply remove it from the
intermediate node, the tree will no longer satisfy the rules of the B+ tree, so we need to modify
it to keep the tree balanced.

After deleting node 60 from the above B+ tree and re-arranging the nodes, it will appear as
follows:

File Organization

o A file is a collection of records. Using the primary key, we can access the records.
The type and frequency of access is determined by the type of file organization
used for a given set of records.
o File organization is a logical relationship among various records. It defines
how file records are mapped onto disk blocks.
o File organization describes the way in which records are stored in terms
of blocks, and how the blocks are placed on the storage medium.
o The first approach to mapping the database to files is to use several files and store
only fixed-length records in any given file. An alternative approach is to structure
our files so that they can contain records of multiple lengths.
o Files of fixed-length records are easier to implement than files of variable-length
records.

Objective of file organization

o It should allow optimal selection of records, i.e., records can be selected as fast as

possible.
o Insert, delete or update transactions on the records should be quick and
easy.
o Duplicate records should not be induced as a result of insert, update or delete operations.
o Records should be stored efficiently, for minimal storage cost.

Types of file organization:

File organization covers various methods. These methods have pros and cons on the basis
of access or selection. The programmer decides the best-suited file organization method
according to the requirements.

Types of file organization are as follows:

o Sequential file organization


o Heap file organization
o Hash file organization
o B+ file organization
o Indexed sequential access method (ISAM)
o Cluster file organization

Sequential File Organization

This method is the easiest method for file organization. In this method, files are stored
sequentially. This method can be implemented in two ways:

1. Pile File Method:

o It is quite a simple method. In this method, we store the records in a sequence, i.e., one
after another. Here, the records are inserted in the order in which they arrive in the
table.
o In case of updating or deleting any record, the record is first searched in the
memory blocks. When it is found, it is marked for deletion and the new
record is inserted.

Insertion of the new record:

Suppose we have records R1, R3 and so on up to R9 and R8 in a sequence (here, a record is
simply a row in a table). If we want to insert a new record R2 into the sequence, it will be
placed at the end of the file.

2. Sorted File Method:


o In this method, the new record is always inserted at the end of the file, and then the
sequence is sorted in ascending or descending order. Sorting of records is based on the
primary key or on any other key.
o In the case of modification of any record, the record is updated, the file is sorted
again, and the updated record is placed in the right position.

Insertion of the new record:

Suppose there is a preexisting sorted sequence of records R1, R3 and so on up to R6 and
R7. If a new record R2 has to be inserted into the sequence, it is first inserted at the
end of the file, and then the sequence is sorted.

Pros of sequential file organization

o It is a fast and efficient method for handling huge amounts of data.
o In this method, files can easily be stored on cheaper storage mechanisms like magnetic
tapes.
o It is simple in design and requires little effort to store the data.
o This method is used when most of the records have to be accessed, such as grade
calculation of students or generating salary slips.
o This method is also used for report generation and statistical calculations.

Cons of sequential file organization

o It wastes time, as we cannot jump directly to the particular record that is required; we
have to move through the records sequentially, which takes time.
o The sorted file method takes more time and space for sorting the records.

Heap file organization

o It is the simplest and most basic type of organization. It works with data blocks. In
heap file organization, the records are inserted at the file's end. When the records are
inserted, it doesn't require the sorting and ordering of records.
o When a data block is full, the new record is stored in some other block. This new
data block need not be the very next data block; the DBMS can select any data block in
the memory to store the new record. The heap file is also known as an unordered file.
o In the file, every record has a unique id, and every page in the file is of the same size. It
is the DBMS's responsibility to store and manage the new records.

Insertion of a new record


Suppose we have five records R1, R3, R6, R4 and R5 in a heap, and we want to insert
a new record R2. If data block 3 is full, then R2 will be inserted into any data block
selected by the DBMS, let's say data block 1.

If we want to search, update or delete data in a heap file organization, then we need to
traverse the data from the start of the file until we get the requested record.

If the database is very large, then searching, updating or deleting a record will be time-
consuming because there is no sorting or ordering of records. In the heap file organization,
we need to check all the data until we get the requested record.

Pros of Heap file organization

o It is a very good method of file organization for bulk insertion. If a large amount of
data needs to be loaded into the database at one time, then this method is best suited.
o In the case of a small database, fetching and retrieving records is faster than in
sequential file organization.

Cons of Heap file organization

o This method is inefficient for large databases because it takes time to search for or
modify a record.

Hash File Organization

Hash File Organization uses the computation of hash function on some fields of the records.
The hash function's output determines the location of disk block where the records are to be
placed.

When a record has to be retrieved using the hash key columns, the address is generated,
and the whole record is retrieved using that address. In the same way, when a new record has
to be inserted, the address is generated using the hash key and the record is inserted directly
at that address. The same process applies in the case of delete and update.

In this method, there is no need to search or sort the entire file. In this method, each
record is stored at a computed, effectively random location in memory.
Indexed sequential access method (ISAM)

ISAM method is an advanced sequential file organization. In this method, records are stored
in the file using the primary key. An index value is generated for each primary key and
mapped with the record. This index contains the address of the record in the file.
If any record has to be retrieved based on its index value, then the address of the data block is
fetched and the record is retrieved from the memory.

Pros of ISAM:

o In this method, since each record has the address of its data block, searching for a record
in a huge database is quick and easy.
o This method supports range retrieval and partial retrieval of records. Since the index
is based on the primary key values, we can retrieve the data for a given range of
values. In the same way, partial values can also be easily searched, e.g., student
names starting with 'JA' can be easily searched.

Cons of ISAM

o This method requires extra space on the disk to store the index values.
o When new records are inserted, these files have to be reconstructed to
maintain the sequence.
o When a record is deleted, the space used by it needs to be released; otherwise,
the performance of the database will slow down.

Cluster file organization

o When records of two or more tables are stored in the same file, it is known as a cluster.
These files will have two or more tables in the same data block, and the key attributes
which are used to map these tables together are stored only once.
o This method reduces the cost of searching for related records in different files.
o The cluster file organization is used when there is a frequent need to join the
tables on the same condition. These joins give only a few records from both
tables. In the given example, we are retrieving the records for only particular
departments. This method can't be used to retrieve the records for all
departments at once.
In this method, we can directly insert, update or delete any record. Data is sorted based on the
key with which searching is done. The cluster key is the key on which the joining of the tables
is performed.

Types of Cluster file organization:

Cluster file organization is of two types:

1. Indexed Clusters:

In an indexed cluster, records are grouped based on the cluster key and stored together. The
above EMPLOYEE and DEPARTMENT relationship is an example of an indexed cluster:
all the records are grouped based on the cluster key DEP_ID.

2. Hash Clusters:
It is similar to the indexed cluster. In a hash cluster, instead of storing the records based on the
cluster key, we generate a hash value for the cluster key and store together the records
with the same hash key value.

Pros of Cluster file organization

o The cluster file organization is used when there are frequent requests to join the
tables on the same joining condition.
o It provides efficient results when there is a 1:M mapping between the tables.

Cons of Cluster file organization

o This method has low performance for very large databases.
o If the joining condition changes, this method cannot be used; traversing the file
with a different joining condition takes a lot of time.
o This method is not suitable for tables with a 1:1 relationship.

Indexing in DBMS

o Indexing is used to optimize the performance of a database by minimizing the number


of disk accesses required when a query is processed.
o The index is a type of data structure. It is used to locate and access the data in a
database table quickly.

Index structure:

Indexes can be created using some database columns.

o The first column of the index is the search key, which contains a copy of the primary
key or candidate key of the table. These values are stored in sorted
order so that the corresponding data can be accessed easily.
o The second column of the index is the data reference. It contains a set of pointers
holding the address of the disk block where the value of that particular key can be
found.
Indexing Methods

Ordered indices

The indices are usually sorted to make searching faster. The indices which are sorted are
known as ordered indices.

Example: Suppose we have an employee table with thousands of records, each of which is
10 bytes long. If their IDs start from 1, 2, 3, ... and so on, and we have to search for the
employee with ID 543:

o In the case of a database with no index, we have to search the disk blocks from the start
until we reach 543. The DBMS will reach the record after reading 543 × 10 = 5430 bytes.
o In the case of an index (assuming each index entry takes 2 bytes), the DBMS will reach the
record after reading 542 × 2 = 1084 bytes, which is far less than in the previous
case.

Primary Index

o If the index is created on the basis of the primary key of the table, then it is known as
a primary index. Primary keys are unique to each record and have a 1:1
relation with the records.
o As primary keys are stored in sorted order, the performance of the searching operation
is quite efficient.
o The primary index can be classified into two types: dense index and sparse index.

Dense index
o The dense index contains an index record for every search key value in the data file. This
makes searching faster.
o In this, the number of records in the index table is the same as the number of records in
the main table.
o It needs more space to store the index records themselves. Each index record contains the
search key value and a pointer to the actual record on the disk.

Sparse index

o In a sparse index, index records appear for only some of the items in the data file. Each
index record points to a block.
o Here, instead of pointing to each record in the main table, the index points to the
records in the main table at intervals.

Clustering Index

o A clustering index is defined on an ordered data file. Sometimes the index is
created on non-primary key columns, which may not be unique for each record.
o In this case, to identify the records faster, we group two or more columns to get
a unique value and create an index out of them. This method is called a clustering
index.
o Records which have similar characteristics are grouped, and indexes are created
for these groups.

Example: Suppose a company has several employees in each department. If we

use a clustering index, then all employees which belong to the same Dept_ID are
considered to be within a single cluster, and the index pointers point to the cluster as a whole.
Here Dept_ID is a non-unique key.

The previous scheme is a little confusing because one disk block is shared by records which
belong to different clusters. If we use a separate disk block for each cluster, it is
considered a better technique.
Secondary Index

In sparse indexing, as the size of the table grows, the size of the mapping also grows. These
mappings are usually kept in primary memory so that address fetches are faster. The
secondary memory is then searched for the actual data based on the address obtained from the
mapping. If the mapping size grows, fetching the address itself becomes slower. In this case, the
sparse index will not be efficient. To overcome this problem, secondary indexing is introduced.

In secondary indexing, to reduce the size of the mapping, another level of indexing is introduced.
In this method, a large range for the columns is selected initially so that the mapping size of
the first level remains small. Then each range is further divided into smaller ranges. The
mapping of the first level is stored in primary memory, so that address fetches are faster. The
mapping of the second level, and the actual data, are stored in secondary memory (hard disk).
For example:

o If you want to find the record with roll number 111 in the diagram, the search first finds
the highest entry which is smaller than or equal to 111 in the first-level index. It gets 100
at this level.
o Then, in the second-level index, it again finds the highest entry <= 111 and gets 110. Now,
using the address 110, it goes to the data block and starts searching each record until it
finds 111.
o This is how a search is performed in this method. Inserting, updating or deleting is
also done in the same manner.
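
A rough sketch of this two-level lookup is shown below; the index contents (entries 100, 110, and roll 111) loosely follow the example above, and the in-memory data structures are illustrative assumptions only.

# Sketch: two-level (secondary) index lookup for roll number 111.
from bisect import bisect_right

first_level  = [100, 200, 300]                  # first-level mapping, kept in primary memory
second_level = {100: [100, 110, 120],           # second-level mappings, kept on disk
                200: [200, 210, 220],
                300: [300, 310, 320]}
data_blocks  = {110: [110, 111, 112, 113]}      # records stored per data block

def lookup(roll):
    # highest first-level entry <= roll, then highest second-level entry <= roll
    first = first_level[bisect_right(first_level, roll) - 1]
    seconds = second_level[first]
    block = seconds[bisect_right(seconds, roll) - 1]
    # finally, scan the data block record by record
    return roll in data_blocks.get(block, [])

print(lookup(111))   # True: 100 -> 110 -> scan the block until 111 is found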

Hashing

In a huge database structure, it is very inefficient to search all the index values and reach the
desired data. Hashing technique is used to calculate the direct location of a data record on the
disk without using index structure.

In this technique, data is stored at the data blocks whose address is generated by using the
hashing function. The memory location where these records are stored is known as data
bucket or data blocks.
In this, a hash function can use any of the column values to generate the address. Most of
the time, the hash function uses the primary key to generate the address of the data block. A
hash function can be a simple mathematical function or any complex mathematical function. We
can even consider the primary key itself to be the address of the data block; that means each
row is stored in the data block whose address is the same as its primary key.

The above diagram shows data block addresses that are the same as the primary key values. The
hash function can also be a simple mathematical function like mod, exponential, cos, sin, etc.
Suppose we have a mod(5) hash function to determine the address of the data block. In this
case, applying the mod(5) hash function to the primary keys generates 3, 3, 1, 4 and 2
respectively, and the records are stored at those data block addresses.
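
As a small illustration of the mod(5) idea, here is a sketch that places records into buckets by hashing the primary key; the example keys are made up for the illustration and do not come from the diagram.

# Sketch: placing records into data buckets with a mod(5) hash function.
def bucket_address(primary_key, buckets=5):
    return primary_key % buckets          # hash function: mod(5)

records = {103: "A", 104: "B", 106: "C", 109: "D", 107: "E"}   # primary key -> record
data_buckets = {i: [] for i in range(5)}

for key, record in records.items():
    data_buckets[bucket_address(key)].append((key, record))

print({k: v for k, v in data_buckets.items() if v})
# 103 -> bucket 3, 104 -> bucket 4, 106 -> bucket 1, 109 -> bucket 4, 107 -> bucket 2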
Types of Hashing:

o Static Hashing
o Dynamic Hashing

Static Hashing

In static hashing, the resultant data bucket address will always be the same. That means, if we
generate an address for EMP_ID = 103 using the hash function mod(5), it will always
result in the same bucket address, 3. Here, there will be no change in the bucket address.

Hence in this static hashing, the number of data buckets in memory remains constant
throughout. In this example, we will have five data buckets in the memory used to store the
data.
Operations of Static Hashing

o Searching a record

When a record needs to be searched, then the same hash function retrieves the address of the
bucket where the data is stored.

o Insert a Record

When a new record is inserted into the table, then we will generate an address for a new
record based on the hash key and record is stored in that location.

o Delete a Record

To delete a record, we will first fetch the record which is supposed to be deleted. Then we
will delete the records for that address in memory.

o Update a Record

To update a record, we will first search it using a hash function, and then the data record is
updated.


If we want to insert a new record into the file but the address of the data bucket generated
by the hash function is not empty, i.e., data already exists at that address, this situation in
static hashing is known as bucket overflow. It is a critical situation for this method.
To overcome this situation, there are various methods. Some commonly used methods are as
follows:

1. Open Hashing

When a hash function generates an address at which data is already stored, the next
bucket is allocated to the new record. This mechanism is called linear probing.

For example: Suppose R3 is a new record which needs to be inserted, and the hash function
generates the address 112 for R3. But the generated address is already full. So the system
searches the next available data bucket, 113, and assigns R3 to it.
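
A minimal sketch of linear probing is shown below; the bucket addresses (112, 113) mirror the example, and a fixed-size Python list stands in for the disk buckets.

# Sketch: open hashing (linear probing) - if the target bucket is occupied,
# use the next available bucket.
buckets = [None] * 200          # one record per bucket, for simplicity

def insert_linear_probing(address, record):
    while buckets[address] is not None:
        address += 1            # probe the next bucket
    buckets[address] = record
    return address

buckets[112] = "R1"             # bucket 112 is already occupied
print(insert_linear_probing(112, "R3"))   # 113: R3 goes into the next free bucket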

2. Close Hashing

When a bucket is full, a new data bucket is allocated for the same hash result and is
linked after the previous one. This mechanism is known as overflow chaining.

For example: Suppose R3 is a new record which needs to be inserted into the table, and the
hash function generates the address 110 for it. But this bucket is already full and cannot store
the new data. In this case, a new bucket is inserted at the end of bucket 110 and is linked to it.
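
Overflow chaining can be sketched in the same spirit: each bucket keeps a chain of overflow records linked after the original bucket (here a Python list per address stands in for the linked overflow buckets).

# Sketch: close hashing (overflow chaining) - records beyond the bucket's
# capacity go into overflow buckets linked after it.
from collections import defaultdict

chains = defaultdict(list)      # bucket address -> bucket plus its overflow chain

def insert_chaining(address, record):
    chains[address].append(record)    # first record fills the bucket, the rest form the chain

insert_chaining(110, "R1")
insert_chaining(110, "R3")            # bucket 110 is full, so R3 goes to the linked overflow bucket
print(chains[110])                    # ['R1', 'R3']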
Dynamic Hashing

o The dynamic hashing method is used to overcome the problems of static hashing, like
bucket overflow.
o In this method, data buckets grow or shrink as the number of records increases or decreases.
This method is also known as the extendable (extendible) hashing method.
o This method makes hashing dynamic, i.e., it allows insertion or deletion without
resulting in poor performance.

How to search a key

o First, calculate the hash address of the key.

o Check how many bits are used in the directory; this number of bits is called i.
o Take the least significant i bits of the hash address. This gives an index into the
directory.
o Now, using the index, go to the directory and find the bucket address where the record
might be.

How to insert a new record

o Firstly, you have to follow the same procedure for retrieval, ending up in some
bucket.
o If there is still space in that bucket, then place the record in it.
o If the bucket is full, then we will split the bucket and redistribute the records.

For example:
Consider the following grouping of keys into buckets, depending on the least significant bits of
their hash addresses:

The last two bits of 2 and 4 are 00, so they will go into bucket B0. The last two bits of 5 and 6
are 01, so they will go into bucket B1. The last two bits of 1 and 3 are 10, so they will go into
bucket B2. The last two bits of 7 are 11, so it will go into bucket B3.

Insert key 9 with hash address 10001 into the above structure:

o Since key 9 has hash address 10001, it must go into bucket B1. But bucket B1 is
full, so it will be split.
o The splitting separates 5 and 9 from 6: the last three bits of 5 and 9 are 001, so they go
into bucket B1, and the last three bits of 6 are 101, so it goes into bucket B5.
o Keys 2 and 4 are still in B0. The records in B0 are pointed to by the 000 and 100 directory
entries, because the last two bits of both entries are 00.
o Keys 1 and 3 are still in B2. The records in B2 are pointed to by the 010 and 110 directory
entries, because the last two bits of both entries are 10.
o Key 7 is still in B3. The record in B3 is pointed to by the 111 and 011 directory entries,
because the last two bits of both entries are 11.
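
The directory lookup described above (take the least significant i bits of the hash address and follow the directory entry to a bucket) can be sketched as follows; the bucket contents follow the example only loosely, the keys are treated as their own hash addresses, and splitting is omitted.

# Sketch: extendible (dynamic) hashing lookup using the last i bits
# of a key's hash address as the directory index.
def directory_index(hash_address, i):
    return hash_address & ((1 << i) - 1)      # least significant i bits

# Directory with i = 2 bits; entries point to buckets.
buckets = {"B0": [4, 8], "B1": [5, 9], "B2": [6, 10], "B3": [7, 11]}
directory = {0b00: "B0", 0b01: "B1", 0b10: "B2", 0b11: "B3"}

def lookup(key, i=2):
    index = directory_index(key, i)           # here the key itself acts as the hash address
    return key in buckets[directory[index]]

print(lookup(9))    # True: 9 = 0b1001, last two bits 01 -> directory entry 01 -> bucket B1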

Advantages of dynamic hashing

o In this method, performance does not decrease as the data in the system grows; the
structure simply increases the amount of memory used to accommodate the data.
o In this method, memory is well utilized, as it grows and shrinks with the data. There
will not be any unused memory lying around.
o This method is good for dynamic databases where the data grows and shrinks
frequently.

Disadvantages of dynamic hashing

o In this method, if the data size increases, then the number of buckets also increases. The
addresses of the data are maintained in the bucket address table, and these
addresses keep changing as buckets grow and shrink. If there is a huge
increase in data, maintaining the bucket address table becomes tedious.
o In this case, the bucket overflow situation can also occur, although it might take longer
to reach this situation than in static hashing.
