
***UNIT-I***

INTRODUCTION TO DBMS:--
Data:--

Data is a collection of known raw facts that can be processed and
stored as information.

Database:--
Database is a collection of interrelated and organized data. In
general, it is a collection of files.

➢ Database is a collection of related data, and data is a collection
of facts and figures that can be processed to produce
information.
➢ A Database Management System stores data in such a way
that it becomes easier to retrieve, manipulate, and produce
information.

A DBMS provides the following:---

✓ A collection of interrelated data.


✓ A set of programs to access the data.
✓ An environment, which is efficient and convenient to use.

Applications of DBMS:--

Databases touch every area of our life. A few areas where they are
extensively used are:--

• Banks • Universities • Airlines • E-Commerce • Stock
Exchanges • Weather Forecasting • Manufacturing Assemblies
• Human Resources.

CHARACTERISTICS OF DBMS:--

Real-world entity:-- A modern DBMS is more realistic and uses real-
world entities to design its architecture. It uses their behaviour and
attributes too.
Ex:- A school database models real-world entities such as students.

Relation-based tables:-- A DBMS allows entities and the relations among
them to form tables. A user can understand the architecture of a
database just by looking at the table names.

Data Independence:-- Application programs should not, ideally, be
exposed to details of data representation and storage. The DBMS
provides an abstract view of the data that hides such details.

Efficient Data Access:-- A DBMS utilizes a variety of sophisticated
techniques to store and retrieve data efficiently. This feature is
especially important if the data is stored on external storage devices.

Data Integrity and Security:-- If data is always accessed through the
DBMS, the DBMS can enforce integrity constraints.
Ex:- It checks data before it enters the database and can enforce
access controls for different classes of users.

Consistency:-- Consistency is a state where every relation in a
database remains consistent. A DBMS can provide greater
consistency than earlier forms of data-storing applications such as
file-processing systems.

Query Language:-- A DBMS is equipped with a query language, which
makes it more efficient to retrieve and manipulate data. A user can
apply as many different filtering options as required to retrieve a set
of data. Traditionally this was not possible where a file-processing
system was used.

Data Administration:-- Experienced professionals who understand
the nature of the data being managed, and how different groups of
users use it, can be responsible for organizing the data
representation to minimize redundancy and for fine-tuning the
storage of the data to make retrieval efficient.

Concurrent Access and Crash Recovery:-- A DBMS schedules
concurrent accesses to the data in such a manner that users can
think of the data as being accessed by only one user at a time.
Further, the DBMS protects users from the effects of system failures.

DATABASE vs FILE SYSTEMS:--

Drawbacks of using file systems to store data:--

➢ Data Redundancy & Inconsistency
Multiple file formats, duplication of information in different files.
➢ Difficulty in Accessing Data
Need to write a new program to carry out each new task.
➢ Integrity Problems
Integrity constraints {e.g., Account Balance > 0}.
Hard to add new constraints or change existing ones.
➢ Atomicity of Updates
A failure may leave the database in an inconsistent state with
partial updates carried out.
Ex:- A transfer of funds from one account to another should either
complete or not happen at all.
➢ Concurrent Access by Multiple Users
Concurrent access is needed for performance, but it must be controlled.
➢ Security Problems
Hard to provide a user access to some, but not all, of the data.

Database systems offer solutions to all the above problems, and
to other problems as well.
Reasons for not choosing a DBMS:- Though there are several
advantages with a DBMS, some applications, with tight real-time
constraints or just a few well-defined critical operations for which
efficient custom code must be written, do not use a DBMS.

✓ A DBMS is a complex piece of software, optimized for certain
kinds of workloads, and its performance may not be adequate
for certain specialized applications.
✓ An application may need to manipulate the data in ways not
supported by the query language. In such a situation, the
abstract view of the data presented by the DBMS does not
match the application's needs and actually gets in the way.

DATABASE USERS:--

A typical DBMS has users with different rights and permissions who
use it for different purposes. Some users retrieve data and some
back it up. The users of a DBMS can be broadly categorized into 2 types.
They are:--

1. Actors on the scene.
2. Workers behind the scene.

➢ Actors on the scene:-

Those who actually use and control the database content, and those
who design, develop and maintain database applications.

Database Administrator (DBA):- This is the chief administrator, who
oversees and manages the database system (including the data and
software).

Designers:- Designers are the group of people who actually work on
the designing part of the database.

End Users:- These are persons who access the database for querying,
updating, & report generation.

➢ Workers behind the scene:-

Those who design and develop the DBMS software and related tools,
and the computer systems operators.

1. DBMS system designers/implementers: provide the DBMS
software that is at the foundation of it all.

2. Tool developers: design and implement software tools facilitating
database system design, performance monitoring, creation of
graphical user interfaces, prototyping, etc.

3. Operators and maintenance personnel: responsible for the day-to-
day operation of the system.

INTRODUCTION OF DIFFERENT DATA MODELS:--

A data model is a collection of concepts that can be used to describe
the structure of a database and provides the necessary means to
achieve this abstraction, where the structure of a database means the
data types, relationships and constraints that should hold on the
data.

The various data models that have been proposed fall into three
different groups:

➢ Object-based logical models.
➢ Record-based logical models.
➢ Physical models.
1. Object-based logical models:-

They are used in describing data at the logical and view levels. They
are characterized by the fact that they provide fairly flexible
structuring capabilities and allow data constraints to be specified
explicitly.

Some of the different models in this group are:--

1. The E-R model (Entity Relationship model ‘or’ Diagram).
2. The object-oriented model.
3. The semantic data model.
4. The functional data model.

✓ The E-R model:--

The (E-R) data model is based on a perception of the real world that
consists of a collection of basic objects, called entities, and of
relationships among these objects. The overall logical structure of a
database can be expressed graphically by an E-R diagram.

Components in the E-R Model are:--

Rectangles, which represent entity sets.
Ellipses, which represent attributes.
Diamonds, which represent relationships among entity sets.
Lines, which link attributes to entity sets and entity sets to
relationships.

The ER Model is based on –

• Entities and their attributes.
• Relationships among entities.

• Entity:− An entity in an ER Model is a real-world entity having
properties called “Attributes”. Every attribute is defined by its set of
values, called its “Domain”.

• Relationship:− The logical association among entities is called a
“Relationship”. Relationships are mapped with entities in various
ways. Mapping cardinalities define the number of associations
between two entities.

Mapping cardinalities:− • One to One • One to Many • Many to
One • Many to Many.


✓ The Object-Oriented Model:-

Like the E-R model, the object-oriented model is based on a
collection of objects. An object contains values stored in instance
variables within the object. An object also contains bodies of code
that operate on the object. These bodies of code are called
“Methods”.

✓ The Semantic Data Model:-

The semantic data model is a method of structuring data in
order to represent it in a specific logical way. It is a
conceptual data model that includes semantic information that
adds a basic meaning to the data and the relationships that lie
between them.

✓ The Functional Data Model:-

Functional data models are a form of semantic data model
which appeared early in database history. They use the
mathematical formalism of function application to represent and
follow associations between data items.

2. RECORD-BASED LOGICAL MODELS:--

These are also used in describing data at the logical and view
levels. In contrast to object-based data models, they are used
both to specify the overall logical structure of the database and
to provide a higher-level description of the implementation.

The three most widely accepted record-based data models are:

o The Relational Model.
o The Network Model.
o The Hierarchical Model.
➢ Relational Model:-

The most popular data model in DBMS is the Relational Model. It is
a more scientific model than the others. This model is based on first-
order predicate logic and defines a table as an n-ary relation.

The main highlights of this model are:−

• Data is stored in tables called relations.
• Relations can be normalized.
• In normalized relations, the values saved are atomic values.
• Each row in a relation contains a unique value.
• Each column in a relation contains values from the same domain.

“This figure is common to all these 3:-- Network, Relational &
Hierarchical Data Models.”

I. The Relational Model:-
➢ It is the most commonly used database model.
➢ It is more flexible than the network & hierarchical models.
➢ It consists of simple tables.
➢ A relation represents a particular entity. It is used to store
information about the entity.

Example table for a Relational Model database:--

II. Network Model:-
➢ It is similar to the Hierarchical Model database.
➢ The difference is that a child node can have more than one
parent node.
➢ Child nodes are represented by arrows in the network model.
➢ It also provides more flexibility than the hierarchical model.

Dis-advantages of the Network Model database:-
➢ System Complexity:-
1. Data can be accessed only one record at a time.
2. A user-friendly DBMS cannot easily be created using this model.
➢ Lack of Structural Independence:-
1. Making structural modifications to the database is very difficult
in this model because data is accessed by a navigational method.
2. Although it achieves data independence, it still fails to
achieve structural independence.

Example tree figure for the network model database:--

III. Hierarchical Model:-
➢ It arranges records in a hierarchy, like an organizational chart.
➢ Each record type in this model is called a Node or Segment.
➢ A node represents a particular entity.
➢ The top-most node is called the “Root”. Each node is a sub-
ordinate of the node at the next higher level.
➢ A higher level node is called the “Parent node” & a lower level
node is called a “Child”. A parent node can have one or more
child nodes.

Drawbacks of the hierarchical model database:-
✓ It is difficult to modify.
✓ It cannot represent all the relations between data.
✓ It is a restricted model.

Advantages of the hierarchical model database:-
✓ Simplicity:-
Since the database is based on a hierarchical structure, the
relationship between the various layers is logically simple.
✓ Data Security:–
The hierarchical model was the first database model that offered data
security, provided and enforced by the DBMS.
✓ Efficiency:–
The hierarchical database model is very efficient when the
database contains a large number of one-to-many relationships
and when the users require a large number of transactions, using
data whose relationships are fixed.

Example tree figure for the hierarchical model database:--

3. Physical Data Models:-

These are used to describe data at the lowest level. In contrast to
logical data models, there are few physical data models in use.
Two of the widely known ones are the unifying model and the
frame-memory model.

DATABASE INSTANCES & SCHEMAS:--

These are similar to types & variables in programming languages.

▪ Logical Schema:-- The overall logical structure of the database.
Analogous to the type information of a variable in a program.
▪ Physical Schema:-- The overall physical structure of the
database.
▪ Instance:-- The actual content of the database at a particular
point in time.
Analogous to the value of a variable.
▪ Physical Data Independence:-- The ability to modify the
physical schema without changing the logical schema.

Levels of Abstraction:-

➢ Physical Level:-- Describes how a record is stored.
➢ Logical Level:-- Describes the data stored in the database & the
relationships among the data.
➢ View Level:-- Application programs hide details of data
types. Views can also hide information for security purposes.
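As a small sketch of the view level, the statements below assume a hypothetical employees table (the table and column names are illustrative, not from these notes) and expose only a view that hides the salary column from application programs:

-- Assumed base table (illustrative)
CREATE TABLE employees (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(30),
    dept     VARCHAR(20),
    salary   DECIMAL(10,2)          -- sensitive column to be hidden
);

-- View level: applications query the view and never see salary
CREATE VIEW employee_public AS
SELECT emp_id, emp_name, dept
FROM employees;

SELECT * FROM employee_public;     -- salary is not visible here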
SQL→ STRUCTURED QUERY LANGUAGE COMMANDS IN DBMS:--

1. SQL-- STRUCTURED QUERY LANGUAGE :-
➢ It is the most widely used commercial query language.
➢ SQL is NOT a Turing-machine-equivalent language.
➢ To be able to compute complex functions, SQL is usually
embedded in some higher-level language.
➢ Language extensions allow embedded SQL.
2. DDL-- DATA DEFINITION LANGUAGE :-
➢ The DDL compiler generates a set of table templates stored in a
“Data Dictionary”.
➢ The data dictionary contains metadata (i.e., data about data):
➢ Database schemas.
➢ Integrity constraints.
➢ Primary keys (e.g., an ID uniquely identifies instructors).
➢ Authorization (who can access what).
3. DML-- DATA MANIPULATION LANGUAGE :-
➢ Language for accessing & manipulating the data organized
by the appropriate data model.
➢ It is also known as a query language.
4. XML-- EXTENSIBLE MARKUP LANGUAGE :-
➢ Defined by the WWW Consortium (W3C).
➢ Originally intended as a document markup language, not a
database language.
➢ XML is a great way to exchange data, not just documents.
➢ XML has become the basis for all new-generation data
interchange formats.
➢ A wide variety of tools are available for parsing, browsing & querying
XML documents/data.
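To make the distinction between the DDL and DML commands listed above concrete, here is a minimal sketch using an assumed instructors table (the names and values are illustrative, not from these notes):

-- DDL: define the schema (recorded in the data dictionary)
CREATE TABLE instructors (
    id   CHAR(5) PRIMARY KEY,      -- ID uniquely identifies instructors
    name VARCHAR(30) NOT NULL,
    dept VARCHAR(20)
);

-- DML: access & manipulate the data organized by that schema
INSERT INTO instructors (id, name, dept) VALUES ('10101', 'Sri', 'CSE');
SELECT name, dept FROM instructors WHERE dept = 'CSE';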

ADVANTAGES OF DBMS:--

✓ CONTROLLING DATA REDUNDANCY :-
In a DBMS, all data of an organization is integrated into a single
database file. The data is recorded in only one place in the
database and it is not duplicated.

✓ SHARING OF DATA :-
In a DBMS, data can be shared by authorized users of the
organization. The database administrator manages the data
and gives rights to users to access the data. Many users can be
authorized to access the same piece of information
simultaneously.

✓ DATA CONSISTENCY :-
By controlling data redundancy, data consistency is
obtained. If the DBMS has controlled redundancy, the database
system enforces consistency.

✓ INTEGRATION OF DATA :-
Integrity constraints or consistency rules can be applied to the
database so that only correct data can be entered into the database.
The constraints may be applied to data items within a single
record or they may be applied to relationships between records.

✓ DATA SECURITY :-
Forms are very important objects of a DBMS. You can create forms
very easily and quickly in a DBMS. Once a form is created, it can
be used many times and it can be modified very easily.

✓ BACKUP & RECOVERY PROCEDURES :-
Most DBMSs provide 'backup and recovery' sub-
systems that automatically create backups of data and
restore data if required.

✓ DATA INDEPENDENCE :-
In a DBMS, you can easily change the structure of the database
without modifying the application programs.

DIS-ADVANTAGES OF DBMS:--

✓ COST OF SOFTWARE/HARDWARE & MIGRATION :-

A significant disadvantage of a DBMS is cost. In
addition to the cost of purchasing or developing the software,
the hardware has to be upgraded to allow for the extensive
programs and work spaces required for their execution and
storage.

✓ PROBLEMS ASSOCIATED WITH CENTRALIZATION :-

While centralization reduces duplication, the lack of duplication
requires that the database be adequately backed up so that in
the case of failure the data can be recovered. Centralization
also means that the data is accessible from a single source.

✓ COMPLEXITY OF BACKUP & RECOVERY :-

Backup and recovery operations are fairly complex in a DBMS
environment, and this is exacerbated in a concurrent multi-user
database system.

ARCHITECTURE OF DBMS :--

There are 3 types of architectures. They are:---

1. 1-TIER.
2. 2-TIER.
3. 3-TIER.

TIER-1 :-

It involves putting all the required components for a software
application/technology on a single server or platform. Basically,
a 1-tier architecture keeps all the elements of an application,
including the interface, middleware & back-end data, in one
place.

TIER-2 :-

It is like a client-server application. Direct communication takes
place between client & server. There is no intermediary between
client & server.

TIER-3 :-

It is a client-server architecture in which the functional process
logic, data access, data storage & user interface are
developed & maintained as independent modules on separate
platforms.

ADVANTAGES & DIS-ADVANTAGES OF “TIRE-2 ARCHITECTURE”:-

➢ Advantages :--
➢ Fast Communication.
➢ Easy to Manage.
➢ Dis-advantages :--
➢ Scalability.
➢ Security.

Examples:--

In Colleges, Universities, Companies & Houses.


ADVANTAGES OF “TIER-3 ARCHITECTURE”:-
➢ Scalability.
➢ Security.
➢ Data Independence.

Examples:--
➢ Web applications.
➢ Facebook, etc.

***HAPPY LEARNING***

***THANK YOU***

*UNIT-02*

***RELATIONAL MODELS***

Introduction To Relational Models:--

The Relational Model was proposed by E. F. CODD to
model data in the form of relations & tables. After designing
the conceptual model of the database using an ER diagram, we
need to convert the conceptual model into the relational model,
which can be implemented using any RDBMS (Relational
Database Management System).

Example:- SQL Server; MySQL; Oracle 10g; etc.

Relational Model:-
• It represents how data is stored in relational
databases.
• A relational database stores data in the form of relations
(tables).
• Consider a relation student with the attributes like:- Roll
number; Name; Address; Phone & Age.
• Table name:-- Student_Data

Roll-No Name Address Phone Age


33337 Sri Paris Xxxxxxxxx 20
33338 Sai Los-angles Xxxxxxxxx 25
33339 Kim Sweden Xxxxxxxxx 30
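A minimal sketch of how this Student_Data relation could be declared in SQL is shown below; the column data types are assumptions, since the notes only list the attribute names:

CREATE TABLE Student_Data (
    Roll_No INT PRIMARY KEY,     -- assumed numeric roll number
    Name    VARCHAR(30),
    Address VARCHAR(50),
    Phone   VARCHAR(15),
    Age     INT
);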
Attributes:-

Attributes are the properties that define a relation.
E.g:- Roll-No; Name.

Tuple:-

Each row in the relation is known as a “Tuple”.
E.g:- 33337 Sri Paris xxxxxxxx 20.

Degree:-

The number of attributes in the relation is known as the “Degree”.
• The Student_Data relation has degree 5.

Column:-

It represents the set of values for a particular attribute.
E.g:- Roll-No (or) Age, etc.

Null Value:-

The value which is not known or unavailable is called a
“Null Value”. It is represented by a blank space.

Concept Of Domain:-

The domain of an attribute is the set of all allowable values for
that attribute.
E.g:- Gender - (Male; Female; Other).

Relation:-

It is defined as a set of tuples that have the same attributes.
A relation consists of a ‘Relation Schema’ & a ‘Relation
Instance’.

Relation Schema:-

It represents the name of the relation with its attributes.
E.g:- Student_Data (Roll-No; Name; Address; Phone & Age).
This is the relation schema for Student_Data.

Relation Instance:-

The set of tuples of a relation at a particular instant of time
is called a “Relation Instance”.
• The Student_Data table shows the relation
instance of Student at a particular time. It changes
whenever there is an insertion, deletion or updation
in the database.
• Example of an instance relation:

Roll-No Name Address Phone Age
33337 Sri Paris Xxxxxxxxx 20
33338 Sai Los-angles Xxxxxxxxx 25
33339 Kim Sweden Xxxxxxxxx 30

(The columns are the fields/attributes; the rows are the tuples.)

Importance Of Null Values:-

• SQL supports a special value known as NULL, which is
used to represent the value of an attribute that may be
unknown or not applicable to a tuple.
• It is important to understand that a null value is
different from a zero value.
• A null value is used to represent a missing value, but
it usually has 1 of 3 different interpretations:
➢ Value unknown (the value exists but it is unknown).
➢ Value not available (exists but is withheld on purpose).
➢ Attribute not applicable (undefined for this tuple).

Constraints:-

• These are the rules enforced on the data columns of a
table. They are used to limit the type of data that
can go into a database.
• This ensures the accuracy & reliability of the data in the
database. Constraints can be either at the column level
or at the table level.

Domain Constraints In DBMS:-

Domain constraints specify that within each tuple, the value
of each attribute ‘A’ must be an atomic value from the
domain dom(A). Each attribute value must be either null or
drawn from the domain of that attribute.

Domain Constraint = data type check for the column + constraint.

In every table, the set of values allowed for a column is called its
“Domain”.
E.g:- 1. CREATE DOMAIN id_value AS int CONSTRAINT id_test CHECK (VALUE < 100);
2. CREATE TABLE student (stu_id id_value PRIMARY KEY,
stu_name varchar(30), stu_age int);

Key Constraints In DBMS:-

• Constraints are nothing but rules that have to be
followed while entering data into the columns of a database.
• Constraints ensure that the data entered by the user into
columns is within the criteria specified by the
condition.
• We have 6 types of key constraints in DBMS. They are:-

1.Not Null 2.Unique 3.Default
4.Check 5.Primary Key 6.Foreign Key

1.Not Null:-

The not null specification prohibits the insertion of a null
value for the attribute. Any database modification that would
cause a null to be inserted into an attribute declared to be not
null generates an error diagnostic.

Example:- CREATE TABLE Persons ( ID int NOT NULL UNIQUE,
LastName varchar(255) NOT NULL, FirstName varchar(255),
Age int );
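For illustration, against the Persons table defined above, an insert such as the first one below would be rejected with the error diagnostic mentioned, because ID is declared NOT NULL:

-- Fails: ID is NOT NULL, so inserting NULL generates an error
INSERT INTO Persons (ID, LastName) VALUES (NULL, 'Sri');

-- Succeeds: every NOT NULL column receives a value
INSERT INTO Persons (ID, LastName) VALUES (1, 'Sri');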
2.Unique:-

The unique specification says that attributes Aj1, Aj2, . . . ,
Ajm form a candidate key; that is, no two tuples in the
relation can be equal on all the listed attributes.

Example:- CREATE TABLE Persons ( ID int UNIQUE, LastName
varchar(255) NOT NULL, FirstName varchar(255), Age int );

3.Default:-

It is used in SQL to add default data to
columns. A default column value can be customised,
i.e., it can be overridden.

E.g:- 1.

Roll-No Name Address
33337 Sri Paris
33338 Sai Paris
33339 Kim Paris

Rows with default values

E.g:- 2.

CREATE TABLE Persons ( ID int NOT NULL, LastName
varchar(255) NOT NULL, FirstName varchar(255) DEFAULT ‘sri’ );

4.Check:-

It ensures that the data entered by the user for a
column is within the range of values or possible values
specified.

E.g:- CREATE TABLE Persons ( ID int, LastName varchar(255),
FirstName varchar(255), Age int CHECK (Age>=18) );

5.Primary Key:-

It is a constraint in a table which uniquely identifies each
row record in a database table, using 1 or more
columns in the table.

E.g:- ‘Roll-No’ of a student is a primary key.

6.Foreign Key:-

Sometimes the information stored in a relation is linked to
the information stored in another relation. If one of the
relations is modified, the other must be checked, and
perhaps modified, to keep the data consistent.

E.g:- CREATE TABLE Enrolled ( studid CHAR(20), cid CHAR(20),
grade CHAR(10), PRIMARY KEY (studid, cid), FOREIGN KEY
(studid) REFERENCES Students );

Integrity Constraints In DBMS:-

1.Entity Integrity Constraints.
2.Referential Integrity Constraints.

1.Entity Integrity Constraints:-
Integrity constraints ensure that changes made to the
database by authorized users do not result in a loss of data
consistency. Thus, integrity constraints guard against
accidental damage to the database. Examples of integrity
constraints are:

• An instructor name cannot be null.
• No two instructors can have the same instructor ID.
• Every department name in the course relation must have a
matching department name in the department relation.

2.Referential Integrity Constraints:-

Ensuring that a value that appears in one relation for a given
set of attributes also appears for a certain set of attributes in
another relation. This condition is called “Referential
Integrity”. Let r1 and r2 be relations whose sets of attributes
are R1 and R2, respectively, with primary keys K1 and K2. We
say that a subset α of R2 is a foreign key referencing K1 in
relation r1 if it is required that, for every tuple t2 in r2, there
must be a tuple t1 in r1 such that t1.K1 = t2.α. Requirements
of this form are called ‘Referential-Integrity Constraints’ or
‘Subset Dependencies’.

Basic SQL:- “Structured Query Language--- SQL”

SQL is widely popular because it offers the following
advantages:-

• Allows users to access data in relational database
management systems.
• Allows users to describe the data.
• Allows users to define the data in a database and
manipulate that data.
• Allows embedding within other languages using SQL
modules, libraries & pre-compilers.
• Allows users to create and drop databases and tables.
• Allows users to create views, stored procedures and functions in
a database.
• Allows users to set permissions on tables, procedures and
views.

Rules:-

• SQL is not case sensitive. Generally, keywords of SQL are
written in upper case.
• Using SQL statements, you can perform most of the
actions in a database.
• SQL statements are laid out on text lines. We can
write a single SQL statement on 1 or multiple text lines.

SQL Process:-

When you are executing an SQL command for any RDBMS,
the system determines the best way to carry out your
request and the SQL engine figures out how to interpret the task.
There are various components involved in this process. These
components are:-

• Query Dispatcher
• Optimization Engines
• Classic Query Engine
• SQL Query Engine, etc.

Characteristics Of SQL:-
• Easy to learn.
• Easy to create, insert, edit, delete, etc.
• Users can access the data from the DBMS.
• Users can describe the data easily.

Simple Database Schema:-

A database schema is a structure that represents the logical
storage of the data in a database. It is the logical
representation of a database which shows how the data is
stored logically in the entire database.
It contains schema objects like:- tables, fields, packages,
views, relations, primary and foreign keys.
It includes the following:-
1.Consistent formatting for all data entries.
2.Database objects & unique keys for all data entries.
3.Tables with multiple columns, where each column has its own
name & data type.
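As a small sketch of such a schema, the two tables below (the names are illustrative, not taken from these notes) show columns with consistent names and data types, plus a primary key and a foreign key linking them:

CREATE TABLE departments (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(30)
);

CREATE TABLE students (
    stu_id   INT PRIMARY KEY,      -- unique key for each entry
    stu_name VARCHAR(30),
    dept_id  INT,
    FOREIGN KEY (dept_id) REFERENCES departments(dept_id)   -- relation between the tables
);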
SQL Data Types:-

An SQL data type is used to define the values that a column can
contain. Every column is required to have a name & a data
type in the database table.

Data Types Of SQL:-

1.Binary Data Types:-

In this there are 3 types:-

• Binary:- It has a maximum length of 8000 bytes. It
contains fixed-length binary data.
• Var-Binary:- It has a maximum length of 8000 bytes. It
contains variable-length binary data.
• Image:- It has a maximum length of 2,147,483,647 bytes.
It contains variable-length binary data.

2.Numeric Data Type:-

DATA TYPE   FROM         TO          DESCRIPTION
Float       -1.79E+308   1.79E+308   It is used to specify a floating point value. E.g:- 3.7; 3.3.
Real        -3.40E+38    3.40E+38    It specifies a single-precision floating point number.

3.Exact Numeric Data Type:-

DATA TYPE   DESCRIPTION
Int         It is used to specify an integer value.
Small Int   It is used to specify a small integer value.
Bit         It has the number of bits to store.
Decimal     It specifies a numeric value that can have a decimal number.
Numeric     It is used to specify a numeric value.

4.Date & Time Data Type:-

DATA TYPE   DESCRIPTION
Date        It is used to store the year, month & day values.
Time        It is used to store the hour, minute & second values.
Time Stamp  It stores the year, month, day, hour, minute & second values.

5.String Data Type:-

DATA TYPE   DESCRIPTION
Char        It has a max. length of 8000 characters. It contains fixed-length & Non-Unicode characters.
Varchar     It has a max. length of 8000 characters. It contains variable-length & Non-Unicode characters.
Text        It has a max. length of 2,147,483,647 characters. It contains variable-length & Non-Unicode characters.
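To show several of the above data types used together, here is an assumed example table; the table and column names are made up purely for illustration:

CREATE TABLE product_log (
    product_id   INT,              -- exact numeric value
    price        DECIMAL(8,2),     -- numeric value with a decimal part
    weight       FLOAT,            -- floating point value
    product_name VARCHAR(40),      -- variable-length string
    added_on     DATE,             -- year, month & day
    added_at     TIME,             -- hour, minute & second
    photo        VARBINARY(8000)   -- variable-length binary data
);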
SQL Table:-

An SQL table is a collection of data which is organised in
terms of rows & columns.

• In DBMS a table is known as a Relation & a row as a
‘Tuple’.
• E.g:- Let’s see the STUDENT table.

STU_ID Name Class E-mail Group
3377 Sri AID-B [email protected] CSE
1234 Jhon ECE-A [email protected] ECE
3737 Sunny EEE-2 [email protected] EEE

1. In the above table “STUDENT” is the table name, and
‘STU_ID’, ‘NAME’, ‘CLASS’, ‘E-MAIL’, ‘GROUP’ are
the column names.
2. The combination of data in multiple columns forms a
row. E.g:- 3377, ‘Sri’, ‘AID-B’, ‘[email protected]’, ‘CSE’ are the
data of 1 row.

SQL Commands:-

DDL - Data Definition Language:-

1. Create Table.
2. Alter Table.
3. Drop Table.

Create Table:-
Creates a new table, a view of a table, or another object in
the database.
➢ SYNTAX:-
CREATE TABLE table_name ( column_1 datatype,
column_2 datatype, column_3 datatype );

Alter Table:-
Modifies an existing database object, such as a table.
➢ SYNTAX:-
ALTER TABLE table_name ADD column_name datatype;

Drop Table:-
Deletes an entire table, a view of a table or other objects in
the database.
➢ SYNTAX:-
DROP TABLE table_name;

DML - Data Manipulation Language:-

These commands are used to manipulate data in the
database.

1.Insert.
2.Update.
3.Delete.

INSERT:-
Creates a record.
➢ SYNTAX:-
INSERT INTO table_name (column_1, column_2,
column_3) VALUES (value_1, 'value_2', value_3);

UPDATE:-
Modifies records.
➢ SYNTAX:-
UPDATE table_name SET some_column = some_value
WHERE some_column = some_value;

DELETE:-
Deletes records.
➢ SYNTAX:-
DELETE FROM table_name WHERE some_column =
some_value;

DCL - Data Control Language:-

These commands are used to control the data in the
database.

1.Grant.
2.Revoke.

GRANT:-
Gives a privilege or permission to a user.
➢ SYNTAX:-
GRANT privilege_name
ON object_name
TO {user_name | PUBLIC | role_name}
[WITH GRANT OPTION];

REVOKE:-
Takes back privileges or permissions granted to a user.
➢ SYNTAX:-
REVOKE privilege_name
ON object_name
FROM {user_name | PUBLIC | role_name};
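A short worked example of the DML commands above against the STUDENT table shown earlier (the inserted values are illustrative, and only a few of its columns are used):

INSERT INTO STUDENT (STU_ID, Name, Class) VALUES (4455, 'Ravi', 'CSE-A');   -- create a record
UPDATE STUDENT SET Class = 'CSE-B' WHERE STU_ID = 4455;                     -- modify the record
DELETE FROM STUDENT WHERE STU_ID = 4455;                                    -- delete the record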
USING THE WHERE CLAUSE:-

In this there are 3 types of clauses. They are:--

1.Group By Clause.
2.Having Clause.
3.Order By Clause.

➢ GROUP BY CLAUSE:-
i. It is used to arrange identical data into groups
in a SELECT statement.
ii. It follows the WHERE CLAUSE in a SELECT
statement & precedes the ORDER BY CLAUSE.
✓ SYNTAX:-
SELECT column_name, COUNT(*) FROM table_name
GROUP BY column_name;

➢ HAVING CLAUSE:-
i. It is used to specify a search condition for a group
or an aggregate.
ii. It is used with a Group By Clause. If you are not using a
group by clause then you can use HAVING
like a WHERE Clause.
✓ SYNTAX:-
SELECT column_name, COUNT(*) FROM table_name
GROUP BY column_name HAVING COUNT(*) > value;

➢ ORDER BY CLAUSE:-
It is used to sort the data in the database in
Ascending or Descending order.
✓ SYNTAX:-
SELECT column_name FROM table_name ORDER BY
column_name ASC | DESC;

➢ SQL WHERE:-
It is a clause used to filter the result set to include only the
rows where the condition is TRUE.
✓ SYNTAX:-
SELECT column_name(s) FROM table_name WHERE
column_name operator value;

• It is used with some conditional operators:-
• = -------- Equal to.
• > -------- Greater Than.
• < -------- Less Than.
• >= ------ Greater Than or Equal To.
• <= ------ Less Than or Equal To.
• <> ------ Not Equal To.
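As a combined sketch of how these clauses fit together in one statement, the query below uses the Student_Data table from earlier (the condition values are only illustrative):

SELECT Address, COUNT(*) AS num_students
FROM Student_Data
WHERE Age >= 18                 -- filter individual rows first
GROUP BY Address                -- arrange identical Address values into groups
HAVING COUNT(*) > 1             -- keep only groups meeting the condition
ORDER BY num_students DESC;     -- sort the final result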
SQL OPERATORS:-

SQL statements generally contain some reserved
words or characters that are used to perform
operations such as arithmetic & logical operations,
etc. These reserved words are known as ‘Operators’.

➢ SQL ARITHMETIC OPERATORS:-

These include + (addition), - (subtraction), * (multiplication),
/ (division) and % (modulo).

➢ SQL LOGICAL OPERATORS:-

These allow you to test for the truth of a condition.

ALL: The ALL operator is used to compare a value to all
values in another value set.
AND: The AND operator allows the existence of multiple
conditions in an SQL statement's WHERE clause.
ANY: The ANY operator is used to compare a value to
any applicable value in the list as per the condition.
BETWEEN: The BETWEEN operator is used to search for
values that are within a set of values, given the
minimum value and the maximum value.
EXISTS: The EXISTS operator is used to search for the
presence of a row in a specified table that meets a
certain criterion.
IN: The IN operator is used to compare a value to a list
of literal values that have been specified.
LIKE: The LIKE operator is used to compare a value to
similar values using wildcard operators.
NOT: The NOT operator reverses the meaning of the
logical operator with which it is used.
Eg: NOT EXISTS, NOT BETWEEN, NOT IN, etc. This is a
negation operator.
OR: The OR operator is used to combine multiple
conditions in an SQL statement's WHERE clause.
IS NULL: The NULL operator is used to compare a value
with a NULL value.
UNIQUE: The UNIQUE operator searches every row of a
specified table for uniqueness (no duplicates).
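A few of these operators in use, again against the Student_Data table from earlier; the patterns and ranges are only illustrative:

SELECT Name FROM Student_Data WHERE Age BETWEEN 20 AND 30;            -- range test
SELECT Name FROM Student_Data WHERE Address IN ('Paris', 'Sweden');   -- list membership
SELECT Name FROM Student_Data WHERE Name LIKE 'S%';                   -- wildcard match
SELECT Name FROM Student_Data WHERE Phone IS NULL;                    -- NULL comparison
SELECT Name FROM Student_Data WHERE NOT (Age >= 25);                  -- negation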
SQL DATE & TIME FUNCTIONS:-
ANY: The ANY operator is used to compare a value to
any applicable value in the list as per the condition. The Date & Time Functions in DBMS are Quite useful to
BETWEEN: The BETWEEN operator is used to search for Manipulate & Store Values related to Date & Time.

• The different Date & Time Functions are:-


• Sysdate:- SQL> SELECT SYSDATE FROM DUAL; 18-MAY-17.
• Next_day:- SQL> SELECT NEXT_DAY(SYSDATE,'WED') FROM DUAL; 24-MAY-17.
• Add_months:- SQL> SELECT ADD_MONTHS(SYSDATE,2) FROM DUAL; 18-JUL-17.
• Last_day:- SQL> SELECT LAST_DAY(SYSDATE) FROM DUAL; 31-MAY-17.
• Months_between:- SQL> SELECT MONTHS_BETWEEN(SYSDATE,'18-AUG-17') FROM DUAL; -3.
• Least:- SQL> SELECT LEAST('10-JAN-07','12-OCT-07') FROM DUAL; 10-JAN-07.
• Greatest:- SQL> SELECT GREATEST('10-JAN-07','12-OCT-07') FROM DUAL; 12-OCT-07.
• Trunc:- returns the starting day of the week if the format specified is 'DAY'.
SQL> SELECT TRUNC(SYSDATE,'DAY') FROM DUAL; 14-MAY-17.
• Round:- returns the starting day of the next week if the format specified is 'DAY'.
SQL> SELECT ROUND(SYSDATE,'DAY') FROM DUAL; 21-MAY-17.
• To_char:- SQL> SELECT TO_CHAR(SYSDATE,'dd/mm/yy') FROM DUAL; 18/05/17.
• To_date:- SQL> SELECT TO_DATE('18/05/17','dd/mm/yy') FROM DUAL; 18-MAY-17.

SQL STRING FUNCTIONS:-

1) Concat:- CONCAT returns char1 concatenated with
char2. Both char1 and char2 can be of any character datatype.
SQL> SELECT CONCAT('ORACLE','CORPORATION') FROM DUAL; ORACLECORPORATION.

2) Lpad:- LPAD returns expr1, left-padded to length n
characters with the sequence of characters in expr2.
SQL> SELECT LPAD('ORACLE',15,'*') FROM DUAL; *********ORACLE.

3) Rpad:- RPAD returns expr1, right-padded to length n
characters with expr2, replicated as many times as
necessary. SQL> SELECT RPAD('ORACLE',15,'*') FROM DUAL; ORACLE*********.

4) Ltrim:- Returns a character expression after removing
the specified leading characters. SQL> SELECT LTRIM('SSMITHSS','S') FROM DUAL; MITHSS.

5) Rtrim:- Returns a character string after truncating the
specified trailing characters. SQL> SELECT RTRIM('SSMITHSS','S') FROM DUAL; SSMITH.

6) Lower:- Returns a character expression after
converting uppercase character data to lowercase.
SQL> SELECT LOWER('DBMS') FROM DUAL; dbms.

7) Upper:- Returns a character expression with
lowercase character data converted to uppercase.
SQL> SELECT UPPER('dbms') FROM DUAL; DBMS.

8) Length:- Returns the number of characters, rather
than the number of bytes, of the given string expression,
excluding trailing blanks. SQL> SELECT LENGTH('DATABASE') FROM DUAL; 8.

9) Substr:- Returns part of a character, binary, text, or
image expression. SQL> SELECT SUBSTR('ABCDEFGHIJ',3,4) FROM DUAL; CDEF.

10) Instr:- The INSTR function searches a string for a
substring. The function returns an integer indicating the
position of the character in the string that is the first
character of this occurrence. SQL> SELECT
INSTR('CORPORATE FLOOR','OR',3,2) FROM DUAL; 14.

***HAPPY LEARNING***

***THE END***

Unit – III

Entity Relationship Model: Introduction, Representation of entities, attributes, entity set, relationship,
relationship set, constraints, sub classes, super class, inheritance, specialization, generalization using ER
Diagrams.
SQL: Creating tables with relationship, implementation of key and integrity constraints, nested queries, sub
queries, grouping, aggregation, ordering, implementation of different types of joins, view (updatable and non-
updatable), relational set operations.

INTRODUCTION:

Entity: An entity is something which is described in the database by storing its data; it may be a concrete entity
or a conceptual entity.
Entity set: An entity set is a collection of similar entities. The Employees entity set with attributes ssn, name,
and lot is shown in the following figure.

Attribute: An attribute describes a property associated with entities. An attribute will have a name and a value for
each entity.
Domain: A domain defines a set of permitted values for an attribute.
Entity Relationship Model: An ERM is a theoretical and conceptual way of showing data relationships in
software development. It is a database modeling technique that generates an abstract diagram or visual
representation of a system's data that can be helpful in designing a relational database.
The ER model allows us to describe the data involved in a real-world enterprise in terms of objects and their
relationships and is widely used to develop an initial database design.

REPRESENTATION:
1. ENTITIES:
Entities are represented by using rectangular boxes. These are named with the entity name that they
represent.

Fig: Student and Employee entities

2. ATTRIBUTES:
Attributes are the properties of entities. Attributes are represented by means of ellipses. Every ellipse
represents one attribute and is directly connected to its entity.


Types of attributes:
 Simple attribute − Simple attributes are atomic values, which cannot be divided further. For example, a
student's phone number is an atomic value of 10 digits.

 Composite attribute − Composite attributes are made of more than one simple attribute. For example, a
student's complete name may have first_name and last_name.

 Derived attribute − Derived attributes are the attributes that do not exist in the physical database, but
their values are derived from other attributes present in the database. For example, average_salary in a
department should not be saved directly in the database; instead it can be derived. For another example,
age can be derived from date_of_birth.

 Single-value attribute − Single-value attributes contain a single value. For example −
Social_Security_Number.

 Multi-value attribute − Multi-value attributes may contain more than one value. For example, a person
can have more than one phone number, email_address, etc.

ER Representation for Attributes:

Attributes are the properties of entities. Attributes are represented by means of ellipses. Every ellipse represents
one attribute and is directly connected to its entity (rectangle).

If the attributes are composite, they are further divided in a tree-like structure. Every node is then connected to
its attribute. That is, composite attributes are represented by ellipses that are connected with an ellipse.

Multi-valued attributes are depicted by a double ellipse.

Derived attributes are depicted by a dashed ellipse.

3. RELATIONSHIP:
Relationships are represented by a diamond-shaped box. The name of the relationship is written inside
the diamond box. All the entities (rectangles) participating in a relationship are connected to it by a line.

Types of relationships:
Degree of Relationship: the number of participating entities in a relationship defines the degree of the
relationship. Based on degree, relationships are categorized as

 Unary = degree 1
 Binary = degree 2
 Ternary = degree 3
 n-ary = degree n

Unary Relationship: A relationship with one entity set. It is like a relationship among 2 entities of the same entity
set. Example: A professor (in-charge) reports to another professor (Head of the Dept).


Binary Relationship: A relationship among 2 entity sets. Example: A professor teaches a course and a course
is taught by a professor.

Each professor teaches one course and each course is taught by one professor.

Ternary Relationship: A relationship among 3 entity sets. Example: A professor teaches a course in a given
semester.

n-ary Relationship: A relationship among n entity sets.

Fig: A relationship R among entity sets E1, E2, E3, ...

Cardinality defines the number of entities in one entity set which can be associated with the number of entities
of another set via a relationship set. Cardinality ratios are categorized into 4. They are:

1. One-to-One relationship: When only one instance of an entity is associated with the relationship, then
the relationship is a one-to-one relationship. Each entity in A is associated with at most one entity in B and
each entity in B is associated with at most one entity in A.

2. One-to-many relationship: When more than one instance of an entity is associated with a relationship,
then the relationship is a one-to-many relationship. Each entity in A is associated with zero or more entities
in B and each entity in B is associated with at most one entity in A.

Each professor teaches 0 (or) more courses and each course is taught by at most one professor.

3. Many-to-one relationship: When more than one instance of an entity is associated with the relationship, then
the relationship is a many-to-one relationship. Each entity in A is associated with at most one entity in B and
each entity in B is associated with 0 (or) more entities in A.

Each professor teaches at most one course and each course is taught by 0 (or) more professors.

4. Many-to-Many relationship: If more than one instance of an entity on the left and more than one instance
of an entity on the right can be associated with the relationship, then it depicts a many-to-many relationship.
Each entity in A is associated with 0 (or) more entities in B and each entity in B is associated with 0 (or)
more entities in A.


Each professor teaches 0 (or) more courses and each course is taught by 0 (or) more professors.

4. RELATIONSHIP SET:

A set of relationships of similar type is called a relationship set. Like entities, a relationship too can
have attributes. These attributes are called descriptive attributes.

PARTICIPATION CONSTRAINTS:
 Total Participation − If each entity in the entity set is involved in the relationship then the participation
of the entity set is said to be total. Total participation is represented by double lines.

 Partial Participation − If not all entities of the entity set are involved in the relationship then such a
participation is said to be partial. Partial participation is represented by single lines.

Example: Participation constraints can be explained easily with some examples. They are as follows.

1. Each Professor teaches at least one course.
min=1 (Total Participation)
max=many (No key)

2. Each Professor teaches at most one course.
min=0 (Partial Participation)
max=1 (Key)

3. Each Professor teaches exactly one course.
min=1 (Total Participation)
max=1 (Key)

4. Each Professor teaches courses.
min=0 (Partial Participation)
max=many (No key)

Note: Partial Participation is the default participation.

STRONG AND WEAK ENTITY SETS:
Strong Entity set: If each entity in the entity set is distinguishable, or it has a key, then such an entity set is
known as a strong entity set.

Weak Entity set: If each entity in the entity set is not distinguishable, or it doesn't have a key, then such an entity
set is known as a weak entity set.

eno is a key, so it is represented by a solid underline. dname is a partial key; it can't distinguish the tuples in the
Dependent entity set, so dname is represented by a dashed underline.

A weak entity set is always in total participation with the identifying relationship. If an entity set is weak then the
relationship is also known as a weak relationship, since the dependent relation is no longer needed when the
owner leaves.
Ex: policy dependent details are not needed when the owner (employee) of that policy leaves, is fired from the
company, or expires. The detailed ER Diagram is as follows.


The cardinality of the owner entity set with the weak relationship is 1 : m. A weak entity set is uniquely
identified by its partial key together with the key of the owner entity set.
The dependent entity set is key to the relationship because all the tuples of the weak entity set are associated
with the owner entity set tuples.

GENERALIZATION AND SPECIALIZATION

One entity type might be a subtype of another, very similar to subclasses in OO programming. The
relationship that exists between these entities is known as an IsA relationship. The two entities related by IsA
are always descriptions of the same real-world object. These are typically used in databases that are to be implemented
as Object Oriented Models. The upper entity type is the more abstract entity type (super type) from which the
lower entities inherit its attributes.

Properties of IsA:
Inheritance:
 All attributes of the super type apply to the subtype.
 The subtype inherits all attributes of its super type.
 The key of the super type is also the key of the subtype.
Transitivity:
 This property creates a hierarchy of IsA relationships.
Advantages:
 Used to create a more concise and readable E-R diagram.
 It best maps to object oriented approaches either to databases or related applications.
 Attributes common to different entity sets need not be repeated.
 They can be grouped in one place as attributes of the supertype.
 Attributes of (sibling) subtypes are likely to be different.

The process of sub-grouping within an entity set is known as specialization or generalization.
Specialization follows a top-down approach and generalization follows a bottom-up approach. Both specialization
and generalization are depicted using a triangle component labeled IsA.

Generalization: is a bottom-up approach in which two lower level entities combine to form a higher level
entity. In generalization, the higher level entity can also combine with other lower level entities to make a further
higher level entity. In generalization, a number of entities are brought together into one generalized entity based
on their similar characteristics. For example, pigeon, house sparrow, crow and dove can all be generalized as
Birds.

Specialization: is the opposite of Generalization. It is a top-down approach in which one higher level entity can be
broken down into two or more lower level entities. In specialization, some higher level entities may not have lower-level
entity sets at all. In specialization, a group of entities is divided into sub-groups based on their characteristics.
Take a group ‘Person’ for example. A person has name, date of birth, gender, etc. These properties are common
to all persons, human beings. But in a company, persons can be identified as employee, employer, customer, or
vendor, based on what role they play in the company.

Inheritance: We use all the above features of the ER-Model in order to create classes of objects in object-oriented
programming. The details of entities are generally hidden from the user; this process is known as abstraction. Inheritance
is an important feature of Generalization and Specialization. It allows lower-level entities to inherit the attributes of
higher-level entities.


Attribute inheritance is a crucial property whereby a subclass entity set inherits all the attributes of its
super class entity set. Attributes can additionally be specified, which is used to give a clear representation,
even though that same attribute is found nowhere else in the hierarchy.
Employee and Customer can inherit the attributes of the Person entity and they have their own attributes like
salary for employee and credit_rating for customer. Similarly, the entities officer, teller and secretary inherit all
the attributes of employee and they can have their own attributes like office_member for officer,
station_number & hours_worked for teller and hours_worked for secretary.
If an entity set has one single higher level entity set then it is termed single inheritance. If it has
multiple higher level entity sets then we term it multiple inheritance.

Constraints in Class Hierarchies:

Constraints that can be applied to class hierarchies are:
1. Condition Constraints
2. User Defined Constraints

A Condition Defined Constraint is imposed while classifying the entities of a higher level entity set to
be part of (or a member of) lower level entity sets, based on a specified defining condition.
Example: Every higher level entity in the entity set "Account" is checked using the attribute "acc_type" to be
assigned either to the "SavingsAccount" or to the "CurrentAccount". SavingsAccount and CurrentAccount are
lower level entity sets.
If no condition is specified during the process of designing the lower level entity sets, then it is called a
user defined constraint.
Disjoint Constraint: This constraint checks whether an entity belongs to only one lower level entity set
or not.
Overlapping Constraint: This constraint ensures, by testing, that an entity in the higher level entity set can
belong to more than one lower level entity set.
Completeness Constraint: This is also called the total constraint, which specifies whether or not an entity in
the higher level entity set must belong to at least one lower level entity set in generalization or
specialization.

When we consider the completeness constraint, we come across total and partial constraints, i.e., the Total
Participation constraint and the Partial Participation Constraint.

Total Participation forces that every entity of a higher level entity set must belong to at least
one lower level entity set mandatorily.
Ex: An account entity set's entity must belong to either the savings account entity set or the
current account entity set.
Partial Participation is rarely found with an entity set because sometimes an entity in the higher
level entity set, besides being a member of that higher level entity set, doesn't belong to any of the
lower level entity sets immediately, until the stipulated period.
Ex: A new employee listed in the higher level entity set but not designated to any one of the
available teams that belong to the lower level entity sets.

AGGREGATION:
An aggregation is not a ternary relationship but is an attempt to establish a relationship with another
relationship set. It is also termed a relationship within a relationship. Aggregation can be used over a binary,
ternary or a quaternary relationship set. Aggregation is denoted using a dashed rectangle.

Aggregation over a ternary relationship:

Aggregation over binary relationships:

In the examples shown above, we treated the already existing relationship sets "WorksFor" and "Sponsors" as an
entity set for defining the new relationship sets "Manages" and "Monitors". A relationship set is participating in
another relationship, so it can be termed an aggregation.

TERNARY RELATIONSHIP DECOMPOSED INTO BINARY:
Consider the following ER diagram, representing insurance policies owned by employees at a company.
It depicts 3 entity sets: Employee, Policy and Dependents. The 3 entity sets are associated with a ternary
relationship set called Covers. Each employee can own several policies, each policy can be owned by several
employees, and each dependent can be covered by several policies.


If we wish to model additional requirements, such as that each dependent entity is identified by taking pname in
conjunction with the policyid of a policy entity (which, intuitively, covers the given dependent),
the best way to model this is to switch away from the ternary relationship set, and instead use two
distinct binary relationship sets.

TERNARY VS BINARY: Generally the degree of a relationship set is assessed by counting the no. of nodes
or edges that emanate from that relationship set.

Supply is the ternary relationship set from the first figure, which has a set of relationship instances (s,j,p), which
means 's' is a supplier who is supplying part 'p' to a project 'j'.

A ternary relationship represents different information than 3 binary relationship sets do. Here the
relationship sets canSupply, uses and supplies substitute the ternary relationship set "supply".

"CANSUPPLY", "USES" and "SUPPLIES" are the three binary relationship sets established, where

 Supplier and part have the "CANSUPPLY" binary relationship, which includes an instance (s,p) which says
supplier 's' can supply part 'p' to any project.
 The "USES" relationship between project and part includes an instance (j,p) which says project 'j' uses part 'p'.
 The "SUPPLIES" binary relationship between supplier and project includes an instance (s,j) which says supplier
's' supplies some part to project 'j'.

No combination of binary relationships is an adequate substitute, because there is the question "where to add the
quantity attribute?". Is it to canSupply, or to uses, or to supplies?

The solution for this is to maintain the same ternary relationship with a weak entity set Supply which has
attribute Qty.
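One way this could be written in SQL, as a sketch only (the Suppliers, Projects and Parts table names and key columns are assumed here, since the notes describe them only in the ER diagram):

SQL> CREATE TABLE Supply ( sid CHAR(10), jid CHAR(10), pid CHAR(10),
                           qty INTEGER,
                           PRIMARY KEY (sid, jid, pid),
                           FOREIGN KEY (sid) REFERENCES Suppliers,
                           FOREIGN KEY (jid) REFERENCES Projects,
                           FOREIGN KEY (pid) REFERENCES Parts );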


SQL

CREATING TABLES WITH RELATIONSHIPS:

Entity Set: An entity set is mapped to a relation in a straightforward way: each attribute of the entity set
becomes an attribute of the table. Note that we know both the domain of each attribute and the (primary) key of
an entity set.

SQL> CREATE TABLE Employees ( ssn CHAR(11), name CHAR(30), lot INTEGER,
PRIMARY KEY (ssn) );

Relationship sets without constraints: To represent a relationship, we must be able to identify each
participating entity and give values to the descriptive attributes of the relationship. Thus, the attributes of the
relation include:
• The primary key attributes of each participating entity set, as foreign key fields.
• The descriptive attributes of the relationship set. The set of non-descriptive attributes is a super key for the
relation. If there are no key constraints, this set of attributes is a candidate key.

SQL> CREATE TABLE Works_In ( ssn CHAR(11), did INTEGER,
address CHAR(20), since DATE, PRIMARY KEY (ssn, did, address),
FOREIGN KEY (ssn) REFERENCES Employees,
FOREIGN KEY (address) REFERENCES Locations,
FOREIGN KEY (did) REFERENCES Departments);

Another Example:

SQL> CREATE TABLE Reports_To ( in_charge_ssn CHAR (11),
hod_ssn CHAR (11),
PRIMARY KEY (in_charge_ssn, hod_ssn),
FOREIGN KEY (in_charge_ssn) REFERENCES Professor(ssn),
FOREIGN KEY (hod_ssn) REFERENCES Professor(ssn) );

Relationship sets with key constraints: If a relationship set involves n entity sets and some m of them are linked
via arrows in the ER diagram, the key for any one of these m entity sets constitutes a key for the relation to
which the relationship set is mapped. Hence we have m candidate keys, and one of these should be designated
as the primary key.

The table corresponding to Manages has the attributes ssn, did, since. However, because each
department has at most one manager, no two tuples can have the same did value but differ on the ssn value. A
consequence of this observation is that did is itself a key for Manages; indeed, the set {did, ssn} is not a key
(because it is not minimal). The Manages relation can be defined using the following SQL statement:

SQL> CREATE TABLE Manages ( ssn CHAR (11), did INTEGER, since DATE,
PRIMARY KEY (did),
FOREIGN KEY (ssn) REFERENCES Employees,
FOREIGN KEY (did) REFERENCES Departments);

A second approach to translating a relationship set with key constraints is often superior because it
avoids creating a distinct table for the relationship set. The idea is to include the information about the
relationship set in the table corresponding to the entity set with the key, taking advantage of the key constraint.
In the Manages example, because a department has at most one manager, we can add the key fields of the
Employees tuple denoting the manager and the since attribute to the Departments tuple.

This approach eliminates the need for a separate Manages relation, and queries asking for a department's
manager can be answered without combining information from two relations. The only drawback to this
approach is that space could be wasted if several departments have no managers. In this case the added fields
would have to be filled with null values. The first translation (using a separate table for Manages) avoids this
inefficiency, but some important queries require us to combine information from two relations, which can be a
slow operation.

The following SQL statement, defining a Dept_Mgr relation that captures the information in both
Departments and Manages, illustrates the second approach to translating relationship sets with key constraints:

SQL> CREATE TABLE Dept_Mgr ( did INTEGER, dname CHAR(20),
budget REAL, ssn CHAR (11), since DATE,
PRIMARY KEY (did),
FOREIGN KEY (ssn) REFERENCES Employees);


Relationship Sets with Participation Constraints
Every department is required to have a manager, due to the participation constraint, and at most one manager, due to the key constraint.

SQL> CREATE TABLE Dept_Mgr ( did INTEGER, dname CHAR(20), budget REAL,
     ssn CHAR(11) NOT NULL, since DATE,
     PRIMARY KEY (did),
     FOREIGN KEY (ssn) REFERENCES Employees ON DELETE NO ACTION );

It also captures the participation constraint that every department must have a manager: because ssn cannot take on null values, each tuple of Dept_Mgr identifies a tuple in Employees (who is the manager). The NO ACTION specification, which is the default and need not be explicitly specified, ensures that an Employees tuple cannot be deleted while it is pointed to by a Dept_Mgr tuple. If we wish to delete such an Employees tuple, we must first change the Dept_Mgr tuple to have a new employee as manager.

To ensure total participation of Departments in Works_In using SQL, we need an assertion. We have to guarantee that every did value in Departments appears in a tuple of Works_In; further, this tuple of Works_In must also have non-null values in the fields that are foreign keys referencing other entity sets involved in the relationship (in this example, the ssn field). We can ensure the second part of this constraint by imposing the stronger requirement that ssn in Works_In cannot contain null values.

Weak Entity Sets: A weak entity set always participates in a one-to-many binary relationship and has a key constraint and total participation. The second translation approach is ideal in this case, but we must take into account that the weak entity has only a partial key. Also, when an owner entity is deleted, we want all owned weak entities to be deleted.

Consider the Dependents weak entity set shown in the figure, with partial key pname. A Dependents entity can be identified uniquely only if we take the key of the owning Employees entity together with the pname of the Dependents entity, and the Dependents entity must be deleted if the owning Employees entity is deleted.

We can capture the desired semantics with the following definition of the Dep_Policy relation:
SQL> CREATE TABLE Dep_Policy ( pname CHAR(20), age INTEGER,
     cost REAL, eno CHAR(11),
     PRIMARY KEY (pname, eno),
     FOREIGN KEY (eno) REFERENCES Employees ON DELETE CASCADE );

Observe that the primary key is (pname, eno), since Dependents is a weak entity. We have to ensure that every Dependents entity is associated with an Employees entity (the owner), as per the total participation constraint on Dependents. That is, eno cannot be null. This is ensured because eno is part of the primary key. The CASCADE option ensures that information about an employee's policy and dependents is deleted if the corresponding Employees tuple is deleted.

IMPLEMENTATION OF KEY AND INTEGRITY CONSTRAINTS

1. NOT NULL: When a column is defined as NOT NULL, that column becomes a mandatory column. It implies that a value must be entered into the column if the record is to be accepted for storage in the table.
Syntax: CREATE TABLE Table_Name(column_name data_type(size) NOT NULL, ….);
Example:
SQL> CREATE TABLE emp2(eno number(5) not null, ename varchar2(10));
Table created.
SQL> desc emp2;
Name        Null?      Type
----------- ---------- -------------
ENO         NOT NULL   NUMBER(5)
ENAME                  VARCHAR2(10)

2. UNIQUE: The purpose of a unique key is to ensure that the information in the column(s) is unique, i.e. a value entered in the column(s) defined in the unique constraint must not be repeated across the column(s). A table may have many unique keys.
Syntax: CREATE TABLE Table_Name(column_name data_type(size) UNIQUE, ….);
Example:
SQL> CREATE TABLE emp3(eno number(5) unique, ename varchar2(10));
Table created.
SQL> desc emp3;
Name        Null?      Type
----------- ---------- -------------
ENO                    NUMBER(5)
ENAME                  VARCHAR2(10)
SQL> insert into emp3 values(&eno,'&ename');
Enter value for eno: 1
Enter value for ename: sss
old 1: insert into emp3 values(&eno,'&ename')
new 1: insert into emp3 values(1,'sss')
1 row created.


SQL> /
Enter value for eno: 1
Enter value for ename: sas
old 1: insert into emp3 values(&eno,'&ename')
new 1: insert into emp3 values(1,'sas')
insert into emp3 values(1,'sas')
ERROR at line 1:
ORA-00001: unique constraint (SCOTT.SYS_C003006) violated

3. CHECK: Specifies a condition that each row in the table must satisfy. To satisfy the constraint, each row in the table must make the condition either TRUE or unknown (due to a null).
Syntax: CREATE TABLE Table_Name(column_name data_type(size) CHECK(logical expression), ….);
Example:
SQL> CREATE TABLE student (sno NUMBER(3), name CHAR(10), class CHAR(5),
     CHECK(class IN('CSE','CAD','VLSI')));

4. PRIMARY KEY: A field which is used to identify a record uniquely. A column or a combination of columns can be created as a primary key, which can be used as a reference from other tables. A table containing a primary key is known as a Master Table.
• It must uniquely identify each record in a table.
• It must contain unique values.
• It cannot be a null field.
• It cannot be a multi-part field.
• It should contain the minimum number of fields necessary to be called unique.
Syntax: CREATE TABLE Table_Name(column_name data_type(size) PRIMARY KEY, ….);
Example:
SQL> CREATE TABLE faculty (fcode NUMBER(3) PRIMARY KEY,
     fname CHAR(10));

5. FOREIGN KEY: It is a table-level constraint; we cannot add it at column level. To reference any primary key column from another table, this constraint can be used. The table in which the foreign key is defined is called the detail table. The table that defines the primary key and is referenced by the foreign key is called the master table.
Syntax: CREATE TABLE Table_Name ( col_name type(size),
        FOREIGN KEY(col_name) REFERENCES table_name );
Example:
SQL> CREATE TABLE subject (
     scode NUMBER(3) PRIMARY KEY,
     subname CHAR(10), fcode NUMBER(3),
     FOREIGN KEY(fcode) REFERENCES faculty );

SUB QUERIES

A Subquery or Inner query or Nested query is a query within another SQL query, embedded within the WHERE clause. A subquery is used to return data that will be used in the main query as a condition to further restrict the data to be retrieved. Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along with operators like =, <, >, >=, <=, IN, BETWEEN, etc.

There are a few rules that subqueries must follow −
• Subqueries must be enclosed within parentheses.
• A subquery can have only one column in the SELECT clause, unless multiple columns are in the main query for the subquery to compare its selected columns.
• An ORDER BY command cannot be used in a subquery, although the main query can use an ORDER BY. The GROUP BY command can be used to perform the same function as the ORDER BY in a subquery.
• Subqueries that return more than one row can only be used with multiple-value operators such as the IN operator.
• The SELECT list cannot include any references to values that evaluate to a BLOB, ARRAY, CLOB, or NCLOB.
• A subquery cannot be immediately enclosed in a set function.
• The BETWEEN operator cannot be used with a subquery. However, the BETWEEN operator can be used within the subquery.

Subqueries with the SELECT Statement: Subqueries are most frequently used with the SELECT statement.
Syntax:
SELECT column_name [, column_name ] FROM table1 [, table2 ]
WHERE column_name OPERATOR
(SELECT column_name [, column_name ] FROM table1 [, table2 ] [WHERE]);
Example:
Consider the CUSTOMERS table having the following records −
ID  NAME      AGE  ADDRESS    SALARY
1   Ramesh    35   Ahmadabad  2000.00
2   Khilan    25   Delhi      1500.00
3   Kaushik   23   Kota       2000.00
4   Chaitali  25   Mumbai     6500.00
5   Hardik    27   Bhopal     8500.00
6   Komal     22   MP         4500.00
7   Muffy     24   Indore     10000.00

Now, let us check the following subquery with a SELECT statement.
SQL> SELECT * FROM CUSTOMERS
     WHERE ID IN (SELECT ID FROM CUSTOMERS WHERE SALARY > 4500);
This would produce the following result.
ID  NAME      AGE  ADDRESS  SALARY
4   Chaitali  25   Mumbai   6500.00
5   Hardik    27   Bhopal   8500.00
7   Muffy     24   Indore   10000.00


Subqueries with the INSERT Statement: Subqueries can also be used with INSERT statements. The INSERT statement uses the data returned from the subquery to insert into another table. The selected data in the subquery can be modified with any of the character, date or number functions.
Syntax:
INSERT INTO table_name [ (column1 [, column2 ]) ]
SELECT [ * | column1 [, column2 ] ] FROM table1 [, table2 ] [ WHERE VALUE OPERATOR ];
Example: Consider a table CUSTOMERS_BKP with a similar structure as the CUSTOMERS table. To copy the complete CUSTOMERS table into the CUSTOMERS_BKP table, you can use the following syntax.
SQL> INSERT INTO CUSTOMERS_BKP
     SELECT * FROM CUSTOMERS WHERE ID IN
     (SELECT ID FROM CUSTOMERS);

Subqueries with the UPDATE Statement: The subquery can be used in conjunction with the UPDATE statement. Either single or multiple columns in a table can be updated when using a subquery with the UPDATE statement.
Syntax:
UPDATE table SET column_name = new_value [ WHERE OPERATOR [ VALUE ]
(SELECT COLUMN_NAME FROM TABLE_NAME [ WHERE ]) ];
Example: Assuming we have a CUSTOMERS_BKP table available which is a backup of the CUSTOMERS table, the following example multiplies SALARY by 0.25 in the CUSTOMERS table for all the customers whose AGE is greater than or equal to 27.
SQL> UPDATE CUSTOMERS SET SALARY = SALARY * 0.25
     WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27);
This would impact two rows and finally the CUSTOMERS table would have the following records.
ID  NAME      AGE  ADDRESS    SALARY
1   Ramesh    35   Ahmadabad  125.00
2   Khilan    25   Delhi      1500.00
3   Kaushik   23   Kota       2000.00
4   Chaitali  25   Mumbai     6500.00
5   Hardik    27   Bhopal     2125.00
6   Komal     22   MP         4500.00
7   Muffy     24   Indore     10000.00

Subqueries with the DELETE Statement: The subquery can be used in conjunction with the DELETE statement, like with the other statements mentioned above.
Syntax:
DELETE FROM TABLE_NAME [ WHERE OPERATOR [ VALUE ]
(SELECT COLUMN_NAME FROM TABLE_NAME [ WHERE ]) ];
Example: Assuming we have a CUSTOMERS_BKP table available which is a backup of the CUSTOMERS table, the following example deletes the records from the CUSTOMERS table for all the customers whose AGE is greater than or equal to 27.
SQL> DELETE FROM CUSTOMERS WHERE AGE IN
     (SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27);
This would impact two rows and finally the CUSTOMERS table would have the following records.
ID  NAME      AGE  ADDRESS  SALARY
2   Khilan    25   Delhi    1500.00
3   Kaushik   23   Kota     2000.00
4   Chaitali  25   Mumbai   6500.00
6   Komal     22   MP       4500.00
7   Muffy     24   Indore   10000.00

GROUPING
The SQL GROUP BY clause is used in collaboration with the SELECT statement to arrange identical data into groups. The GROUP BY clause follows the WHERE clause in a SELECT statement and precedes the ORDER BY clause.
Syntax: The GROUP BY clause must follow the conditions in the WHERE clause and must precede the ORDER BY clause if one is used.
SELECT column1, column2 FROM table_name WHERE [ conditions ]
GROUP BY column1, column2 ORDER BY column1, column2;

Guidelines to use the GROUP BY Clause
• If a group function is included in the SELECT clause, individual (non-grouped) result columns should not be used.
• The extra non-group-function columns must be declared in the GROUP BY clause.
• Using the WHERE clause, rows can be filtered before they are divided into groups.
• Column aliases can't be used in the GROUP BY clause.
• By default, rows are sorted in ascending order of the columns included in the GROUP BY list.

Examples:
Display the average salary of the departments from the Emp table.
SQL> select deptno, AVG(sal) from emp group by deptno;
Display the minimum and maximum salaries of employees working as clerks in each department.
SQL> select deptno, min(sal), max(sal) from emp where job='CLERK' group by deptno;

Excluding Groups of Results: While using the GROUP BY clause, there is a provision to exclude some group results using the HAVING clause. The HAVING clause is used to specify which groups are to be displayed; it filters the data that is associated with the group functions.
Syntax:
SELECT column1, column2 FROM table1, table2 WHERE [ conditions ]
GROUP BY column1, column2 HAVING [ conditions ];
Sequence of steps:
• First, rows are grouped.
• Group functions are applied to the identified groups.
• Groups that match the criteria in the HAVING clause are displayed.


The HAVING clause can precede the GROUP BY clause, but it is more logical to declare it after the GROUP BY clause. The GROUP BY clause can be used without a group function in the SELECT list. If rows need to be restricted based on the result of a group function, we must have a GROUP BY clause as well as a HAVING clause. The existence of a GROUP BY clause does not guarantee the existence of a HAVING clause, but the existence of a HAVING clause demands the existence of a GROUP BY clause.

Example:
Display the departments where the minimum salary of clerks is greater than 1000.
SQL> select deptno, min(sal) from emp where job='CLERK'
     group by deptno HAVING min(sal) > 1000;
Display the sum of the salaries of the departments.
SQL> select deptno, sum(sal) from emp group by deptno;

AGGREGATION: Aggregation Functions or Group Functions
These functions return a single row based on a group of rows. They can appear in the SELECT list and HAVING clause only. They operate on sets of rows to give one result per group. The set may be the whole table or the table split into groups.

Guidelines to use Aggregate Functions:
DISTINCT makes the function consider only non-duplicate values.
ALL makes the function consider every value, including duplicates.
Syntax:
GroupFunctionName (DISTINCT / ALL columns)

The data types of the arguments may be CHAR, VARCHAR, NUMBER or DATE. All group functions except COUNT(*) ignore NULL values. To substitute a value for a NULL value, use the NVL() function. When a group function is declared in a SELECT list, no single-row columns should be declared; other columns can be declared, but they must also appear in the GROUP BY clause.

The list of Aggregate Functions:
MIN       returns the smallest value in a given column
MAX       returns the largest value in a given column
SUM       returns the sum of the numeric values in a given column
AVG       returns the average value of a given column
COUNT     returns the total number of values in a given column
COUNT(*)  returns the number of rows in a table

Consider the following Employees table (with columns including IdNum and Salary):

MIN Function:
SQL> select min(Salary) from Employees;
OUTPUT:
MIN(SALARY)
29860

MAX Function:
SQL> select max(Salary) from Employees;
OUTPUT:
MAX(SALARY)
65800

SUM Function:
SQL> select sum(Salary) from Employees;
OUTPUT:
SUM(SALARY)
212574

AVG Function:
SQL> select avg(Salary) from Employees;
OUTPUT:
AVG(SALARY)
42514.8

COUNT Function:
SQL> select count(IdNum) from Employees;
OUTPUT:
COUNT(IDNUM)
5

COUNT(*) Function:
SQL> select count(*) from Employees;
OUTPUT:
COUNT(*)
5
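The notes mention NVL() for substituting NULL values before aggregation but give no example. The following sketch is illustrative only and assumes a comm (commission) column, as in the common Oracle emp sample table, which is not shown in these notes.

SQL> select deptno, avg(nvl(comm, 0)) from emp group by deptno;
-- NVL(comm, 0) replaces NULL commissions with 0, so AVG takes every row into account.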


ORDERING: (ORDER BY CLAUSE)

The SQL ORDER BY clause is used to sort the data in ascending or descending order, based on one or more columns. Most databases sort query results in ascending order by default.
Syntax:
select column-list from table_name [where condition]
[order by column1, column2, ..columnN] [asc | desc];
Example:
Consider the following table:
ID  NAME      AGE  ADDRESS    SALARY
1   Ramesh    35   Ahmadabad  2000
2   Khilan    25   Delhi      1500
3   Kowshik   23   Kota       2000
4   Chaitali  25   Mumbai     6500
5   Hardhik   27   Bhopal     8500
6   Komal     22   MP         4500
7   Muffy     24   Indore     10000

SQL> select * from customers order by name, salary;
OUTPUT:
ID  NAME      AGE  ADDRESS    SALARY
4   Chaitali  25   Mumbai     6500
5   Hardhik   27   Bhopal     8500
2   Khilan    25   Delhi      1500
6   Komal     22   MP         4500
3   Kowshik   23   Kota       2000
7   Muffy     24   Indore     10000
1   Ramesh    35   Ahmadabad  2000

IMPLEMENTATION OF JOINS

A JOIN clause is used to combine rows from two or more tables, based on a related column between them.

Types of SQL JOINs:
There are 4 main types of SQL joins.
• (INNER) JOIN: Returns records that have matching values in both tables.
• LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table.
• RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table.
• FULL (OUTER) JOIN: Returns all records when there is a match in either the left or the right table.

1. INNER JOIN: The INNER JOIN keyword selects records that have matching values in both tables.
Syntax:
select column_name(s) from table1 inner join table2
on table1.column_name = table2.column_name;

2. LEFT JOIN: The LEFT JOIN keyword returns all records from the left table (table1), and the matched records from the right table (table2). The result is NULL from the right side if there is no match.
Syntax:
select column_name(s) from table1 left join table2
on table1.column_name = table2.column_name;

3. RIGHT JOIN: The RIGHT JOIN keyword returns all records from the right table (table2), and the matched records from the left table (table1). The result is NULL from the left side when there is no match.
Syntax:
select column_name(s) from table1 right join table2
on table1.column_name = table2.column_name;

4. FULL JOIN: The FULL OUTER JOIN keyword returns all records when there is a match in either the left (table1) or right (table2) table records.
Syntax:
select column_name(s) from table1 full outer join table2
on table1.column_name = table2.column_name;

5. SQL SELF JOIN: A self join is a regular join, but the table is joined with itself.
Syntax:
select column_name(s) from table1 t1, table1 t2 where condition;

EXAMPLES:
Consider the following two tables.
Table 1 − CUSTOMERS Table is as follows.
ID  NAME      AGE  ADDRESS    SALARY
1   Ramesh    31   Ahmadabad  2000.00
2   Khilan    25   Delhi      1500.00
3   Kaushik   23   Kota       2000.00
4   Chaitali  25   Mumbai     6500.00
5   Hardik    27   Bhopal     8500.00
6   Komal     22   MP         4500.00
7   Muffy     24   Indore     10000.00
7 Muffy 24 Indore 10000.00


Table 2 − ORDERS Table is as follows.
OID  DATE                 CUSTOMER_ID  AMOUNT
102  2009-10-08 00:00:00  3            3000
100  2009-10-08 00:00:00  3            1500
101  2009-11-20 00:00:00  2            1560
103  2008-05-20 00:00:00  4            2060

Inner Join:
SQL> SELECT id, name, amount, date FROM customers
     INNER JOIN orders
     ON customers.id = orders.customer_id;
This would produce the following result −
ID  NAME      AMOUNT  DATE
3   Kaushik   3000    2009-10-08 00:00:00
3   Kaushik   1500    2009-10-08 00:00:00
2   Khilan    1560    2009-11-20 00:00:00
4   Chaitali  2060    2008-05-20 00:00:00

Left Join:
SQL> SELECT id, name, amount, date FROM customers
     LEFT JOIN orders
     ON customers.id = orders.customer_id;
This would produce the following result −
ID  NAME      AMOUNT  DATE
1   Ramesh    NULL    NULL
2   Khilan    1560    2009-11-20 00:00:00
3   Kaushik   3000    2009-10-08 00:00:00
3   Kaushik   1500    2009-10-08 00:00:00
4   Chaitali  2060    2008-05-20 00:00:00
5   Hardik    NULL    NULL
6   Komal     NULL    NULL
7   Muffy     NULL    NULL

Right Join:
SQL> SELECT id, name, amount, date FROM customers
     RIGHT JOIN orders
     ON customers.id = orders.customer_id;
This would produce the following result −
ID  NAME      AMOUNT  DATE
3   Kaushik   3000    2009-10-08 00:00:00
3   Kaushik   1500    2009-10-08 00:00:00
2   Khilan    1560    2009-11-20 00:00:00
4   Chaitali  2060    2008-05-20 00:00:00

Full Join:
SQL> SELECT id, name, amount, date FROM customers
     FULL JOIN orders
     ON customers.id = orders.customer_id;
This would produce the following result −
ID  NAME      AMOUNT  DATE
1   Ramesh    NULL    NULL
2   Khilan    1560    2009-11-20 00:00:00
3   Kaushik   3000    2009-10-08 00:00:00
3   Kaushik   1500    2009-10-08 00:00:00
4   Chaitali  2060    2008-05-20 00:00:00
5   Hardik    NULL    NULL
6   Komal     NULL    NULL
7   Muffy     NULL    NULL
3   Kaushik   3000    2009-10-08 00:00:00
3   Kaushik   1500    2009-10-08 00:00:00
2   Khilan    1560    2009-11-20 00:00:00
4   Chaitali  2060    2008-05-20 00:00:00

Self Join:
SQL> SELECT a.id, b.name, a.salary FROM customers a, customers b
     WHERE a.salary < b.salary;
This would produce the following result −
ID  NAME      SALARY
2   Ramesh    1500.00
2   Kaushik   1500.00
1   Chaitali  2000.00
2   Chaitali  1500.00
3   Chaitali  2000.00
6   Chaitali  4500.00


1   Hardik    2000.00
2   Hardik    1500.00
3   Hardik    2000.00
4   Hardik    6500.00
6   Hardik    4500.00
1   Komal     2000.00
2   Komal     1500.00
3   Komal     2000.00
1   Muffy     2000.00
2   Muffy     1500.00
3   Muffy     2000.00
4   Muffy     6500.00
5   Muffy     8500.00
6   Muffy     4500.00

VIEWS:
A view is nothing more than a SQL statement that is stored in the database with an associated name. A view is actually a composition of a table in the form of a predefined SQL query. A view can contain all rows of a table or selected rows from a table. A view can be created from one or many tables, depending on the SQL query written to create the view.
Views, which are a type of virtual table, allow users to do the following −
• Structure data in a way that users or classes of users find natural or intuitive.
• Restrict access to the data in such a way that a user can see and (sometimes) modify exactly what they need and no more.
• Summarize data from various tables which can be used to generate reports.

2 Types of Views: Updatable and Read-only Views
Unlike base tables, VIEWs are either updatable or read-only, but not both. INSERT, UPDATE, and DELETE operations are allowed on updatable VIEWs and base tables, subject to any other constraints. INSERT, UPDATE, and DELETE are not allowed on read-only VIEWs, but you can change their base tables, as you would expect. An updatable VIEW is one that can have each of its rows associated with exactly one row in an underlying base table.
When the VIEW is changed, the changes pass unambiguously through the VIEW to that underlying base table. Updatable VIEWs in Standard SQL are defined only for queries that meet these criteria:
1. Built on only one table
2. No GROUP BY clause
3. No HAVING clause
4. No aggregate functions
5. No calculated columns
6. No UNION, INTERSECT, or EXCEPT
7. No SELECT DISTINCT clause
8. Any columns excluded from the VIEW must be NULL-able or have a DEFAULT in the base table, so that a whole row can be constructed for insertion. By implication, the VIEW must also contain a key of the table.

In short, we are absolutely sure that each row in the VIEW maps back to one and only one row in the base table. Some updating is handled by the CASCADE option in the referential integrity constraints on the base tables, not by the VIEW declaration.
The definition of updatability in Standard SQL is actually fairly limited, but very safe. The database system could look at the information it has in the referential integrity constraints to widen the set of allowed updatable VIEWs. You will find that some implementations are now doing just that, but it is not common yet.
The SQL Standard definition of an updatable VIEW is actually a subset of the possible updatable VIEWs, and a very small subset at that. The major advantage of this definition is that it is based on syntax and not semantics.

Examples of Updatable and Non-updatable Views:
CREATE VIEW view_1 AS SELECT * FROM Table1 WHERE x IN (1,2);
-- updatable, has a key!
CREATE VIEW view_2 AS SELECT * FROM Table1 WHERE x = 1
UNION ALL
SELECT * FROM Table1 WHERE x = 2;
-- not updatable!

More about Views:
A view takes up no storage space other than for the definition of the view in the data dictionary.
A view contains no data. All the data it shows comes from the base tables.
A view can provide an additional level of table security by restricting access to a set of rows or columns of a table.
A view hides implementation complexity. The user can select from the view with simple SQL, unaware that the view is based internally on a join between multiple tables.
A view lets you change the data you can access, applying operators, aggregation functions, filters etc. on the base table.
A view isolates applications from changes in the definitions of base tables. Suppose a view uses two columns of a base table; it makes no difference to the view if other columns are added, modified or removed from the base table.
To know about the views in your own schema, look up user_views.
The underlying SQL definition of the view can be read via "select text from user_views" for the view.
Oracle does not enforce constraints on views. Instead, views are subject to the constraints of their base tables.
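As a small illustration of the data-dictionary lookup mentioned above (a sketch only; USER_VIEWS is Oracle's dictionary view, and CUSTOMERS_VIEW is the view created in the example that follows):

SQL> select view_name, text from user_views where view_name = 'CUSTOMERS_VIEW';
-- shows the stored SELECT statement that defines the view.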


Creating Views
Database views are created using the CREATE VIEW statement. Views can be created from a single table, multiple tables or another view. To create a view, a user must have the appropriate system privilege according to the specific implementation.
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2..... FROM table_name WHERE [condition];
We can include multiple tables in the SELECT statement in a similar way as we use them in a normal SQL SELECT query.
Example:
Consider the CUSTOMERS table having the following records −
ID  NAME      AGE  ADDRESS    SALARY
1   Ramesh    32   Ahmadabad  2000.00
2   Khilan    25   Delhi      1500.00
3   Kaushik   23   Kota       2000.00
4   Chaitali  25   Mumbai     6500.00
5   Hardik    27   Bhopal     8500.00
6   Komal     22   MP         4500.00
7   Muffy     24   Indore     10000.00
Following is an example to create a view from the CUSTOMERS table. This view would be used to show the customer name and age from the CUSTOMERS table.
SQL> CREATE VIEW CUSTOMERS_VIEW AS SELECT name, age FROM CUSTOMERS;
Now, you can query CUSTOMERS_VIEW in a similar way as you query an actual table. Following is an example of the same.
SQL> SELECT * FROM CUSTOMERS_VIEW;
This would produce the following result.
NAME      AGE
Ramesh    32
Khilan    25
Kaushik   23
Chaitali  25
Hardik    27
Komal     22
Muffy     24

The With Check Option:
The WITH CHECK OPTION is a CREATE VIEW statement option. The purpose of the WITH CHECK OPTION is to ensure that all UPDATEs and INSERTs satisfy the condition(s) in the view definition. If they do not satisfy the condition(s), the UPDATE or INSERT returns an error.
Example:
SQL> CREATE VIEW CUSTOMERS_VIEW AS
     SELECT name, age FROM CUSTOMERS
     WHERE age IS NOT NULL WITH CHECK OPTION;
The WITH CHECK OPTION in this case should deny the entry of any NULL values in the view's AGE column, because the view is defined by data that does not have a NULL value in the AGE column.

Updating a View
A view can be updated under certain conditions, which are given below −
• The SELECT clause may not contain the keyword DISTINCT.
• The SELECT clause may not contain summary functions.
• The SELECT clause may not contain set functions.
• The SELECT clause may not contain set operators.
• The SELECT clause may not contain an ORDER BY clause.
• The FROM clause may not contain multiple tables.
• The WHERE clause may not contain subqueries.
• The query may not contain GROUP BY or HAVING.
• Calculated columns may not be updated.
• All NOT NULL columns from the base table must be included in the view in order for the INSERT query to function.
So, if a view satisfies all the above-mentioned rules then you can update that view. The following code block has an example to update the age of Ramesh.
SQL> UPDATE CUSTOMERS_VIEW SET AGE = 35 WHERE name = 'Ramesh';
This would ultimately update the base table CUSTOMERS and the same would be reflected in the view itself. Now, try to query the base table and the SELECT statement would produce the following result.
ID  NAME      AGE  ADDRESS    SALARY
1   Ramesh    35   Ahmadabad  2000.00
2   Khilan    25   Delhi      1500.00
3   Kaushik   23   Kota       2000.00
4   Chaitali  25   Mumbai     6500.00
5   Hardik    27   Bhopal     8500.00
6   Komal     22   MP         4500.00
7   Muffy     24   Indore     10000.00

Inserting Rows into a View
Rows of data can be inserted into a view. The same rules that apply to the UPDATE command also apply to the INSERT command. Here, we cannot insert rows into CUSTOMERS_VIEW because we have not included all the NOT NULL columns in this view; otherwise, you can insert rows into a view in a similar way as you insert them into a table.
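The following sketch illustrates the last point. It assumes, purely for illustration, that ID and NAME are the only NOT NULL columns of CUSTOMERS; the view name CUSTOMERS_MIN is hypothetical.

SQL> CREATE VIEW CUSTOMERS_MIN AS SELECT id, name FROM CUSTOMERS;
SQL> INSERT INTO CUSTOMERS_MIN VALUES (8, 'Rajesh');
-- the row is stored in the base table CUSTOMERS; the columns not in the view become NULL.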


Deleting Rows from a View
Rows of data can be deleted from a view. The same rules that apply to the UPDATE and INSERT commands apply to the DELETE command.
Following is an example to delete a record having AGE = 22.
SQL> DELETE FROM CUSTOMERS_VIEW WHERE age = 22;
This would ultimately delete a row from the base table CUSTOMERS and the same would be reflected in the view itself. Now, try to query the base table and the SELECT statement would produce the following result.
ID  NAME      AGE  ADDRESS    SALARY
1   Ramesh    35   Ahmadabad  2000.00
2   Khilan    25   Delhi      1500.00
3   Kaushik   23   Kota       2000.00
4   Chaitali  25   Mumbai     6500.00
5   Hardik    27   Bhopal     8500.00
7   Muffy     24   Indore     10000.00

Dropping Views
Obviously, where you have a view, you need a way to drop the view if it is no longer needed.
Syntax:
DROP VIEW view_name;
Example:
SQL> DROP VIEW CUSTOMERS_VIEW;

SET OPERATIONS
• These operators are used to combine information of similar datatype from one or more than one table.
• The datatype of the corresponding columns in all the SELECT statements should be the same.
• The different types of set commands are UNION, UNION ALL, INTERSECT and MINUS.
• Set operators combine 2 or more queries into one result.
• The result of each SELECT statement can be treated as a set, and SQL set operators can be applied on those sets to arrive at a final result.
• SQL statements containing set operators are referred to as compound queries, and each SELECT statement in a compound query is referred to as a component query.
• Set operations are often called vertical joins, as the result combines data from 2 or more SELECTs based on columns instead of rows.
Syntax:
<component query>
{ UNION | UNION ALL | MINUS | INTERSECT }
<component query>

UNION: Combines the results of 2 SELECT statements into one result set, and then eliminates any duplicate rows from that result set.
UNION ALL: Combines the results of 2 SELECT statements into one result set, including the duplicates.
INTERSECT: Returns only the rows that are returned by each of the two SELECT statements.
MINUS: Takes the result set of the first SELECT statement and removes those rows that are also returned by the second SELECT statement.

Points of Concentration:
• The queries are all executed independently, but their output is merged.
• Only the final query ends with a semicolon (;).
Rules and Restrictions:
• The result sets of both queries must have the same number of columns.
• The datatype of each column in the second result set must match the datatype of the corresponding column in the first result set.
• The two SELECT statements may not contain an ORDER BY clause; only the final result of the entire set operation can be ordered.
• The columns used for ordering must be specified by column number.

Examples:
Display the employees who work in departments 10 and 30, without duplicates.
SQL> SELECT empno, ename from emp where deptno=10
     UNION
     SELECT empno, ename from emp where deptno=30;
Display the employees who work in departments 10 and 30.
SQL> SELECT empno, ename from emp where deptno=10
     UNION ALL
     SELECT empno, ename from emp where deptno=30;
Display the employees who work in both the departments with deptno 10 and 30.
SQL> SELECT empno, ename from emp where deptno=10
     INTERSECT
     SELECT empno, ename from emp where deptno=30;
Display the employees whose row number is less than 7 but not less than 6.
SQL> SELECT rownum, ename from emp where rownum<7
     MINUS
     SELECT rownum, ename from emp where rownum<6;
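Since the component queries themselves cannot carry an ORDER BY, the combined result is ordered by column position. A small illustrative sketch reusing the emp queries above:

SQL> SELECT empno, ename from emp where deptno=10
     UNION
     SELECT empno, ename from emp where deptno=30
     ORDER BY 2;
-- ORDER BY 2 sorts the merged result by its second column (ename).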


RELATIONAL OPERATIONS

Given this simple and restricted data structure, it is possible to define some very powerful relational operators which, from the users' point of view, act in parallel on all entries in a table simultaneously, although their implementation may require conventional processing.
Codd originally defined eight relational operators:
1. SELECT (originally called RESTRICT)
2. PROJECT
3. JOIN
4. PRODUCT
5. UNION
6. INTERSECT
7. DIFFERENCE
8. DIVIDE
The most important of these are (1), (2), (3) and (8), which, together with some other aggregate functions, are powerful enough to answer a wide range of queries. The eight operators are described here as general procedures, i.e. not in the syntax of SQL or any other relational language. The important point is that they define the result required rather than the detailed process of obtaining it: what, but not how.

SELECT: RESTRICTS the rows chosen from a table to those entries with specified attribute values.
SELECT item FROM stock_level WHERE quantity > 100
constructs a new, logical table (an unnamed relation) with one column (item), containing all rows from stock_level that satisfy the WHERE clause.

PROJECT: Selects rows made up of a sub-set of columns from a table.
PROJECT stock_item OVER item AND description
produces a new logical table where each row contains only two columns, item and description. The new table will only contain distinct rows from stock_item; i.e. any duplicate rows so formed will be eliminated.

JOIN: Associates entries from two tables on the basis of matching column values.
JOIN stock_item WITH stock_level OVER item
It is not necessary for there to be a one-to-one relationship between entries in the two tables to be joined: entries which do not match anything will be eliminated from the result, and entries from one table which match several entries in the other will be duplicated the required number of times.

PRODUCT: Builds a relation from two specified relations consisting of all possible combinations of rows, one from each of the two relations. For example, consider two relations A and B:
A = {a, b, c}, B = {d, e}  =>  A PRODUCT B = {(a,d), (a,e), (b,d), (b,e), (c,d), (c,e)}

UNION: Builds a relation consisting of all rows appearing in either or both of the two relations. For example:
A = {a, b, c}, B = {a, e}  =>  A UNION B = {a, b, c, e}

INTERSECT: Builds a relation consisting of all rows appearing in both of the two relations. For example:
A = {a, b, c}, B = {a, e}  =>  A INTERSECT B = {a}

DIFFERENCE: Builds a relation consisting of all rows appearing in the first and not in the second of the two relations. For example:
A = {a, b, c}, B = {a, e}  =>  A - B = {b, c} and B - A = {e}

DIVIDE: Takes two relations, one binary and one unary, and builds a relation consisting of all values of one column of the binary relation that match, in the other column, all values in the unary relation. For example:
A = {(a,x), (a,y), (a,z), (b,x), (c,y)}, B = {x, y}  =>  A DIVIDE B = {a}

Of the relational operators (4) to (8) defined by Codd, the most important is DIVISION. For example, suppose table A contains a list of suppliers and commodities, and table B a list of all commodities bought by a company. Dividing A by B produces a table listing the suppliers who sell all commodities.
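SQL has no DIVIDE operator, so division is usually emulated with a double NOT EXISTS. The sketch below is illustrative only and assumes hypothetical tables A(supplier, commodity) and B(commodity) corresponding to the example above.

SQL> SELECT DISTINCT a1.supplier
     FROM A a1
     WHERE NOT EXISTS
           (SELECT 1 FROM B b
            WHERE NOT EXISTS
                  (SELECT 1 FROM A a2
                   WHERE a2.supplier = a1.supplier
                     AND a2.commodity = b.commodity));
-- a supplier is returned only if no commodity in B is missing from that supplier's rows in A.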


Unit – IV
SCHEMA REFINEMENT (NORMALIZATION): Purpose of normalization or schema refinement, concept of functional dependency, normal forms based on functional dependency (1NF, 2NF and 3NF), concept of surrogate key, Boyce-Codd normal form (BCNF), lossless join and dependency preserving decomposition, Fourth normal form (4NF).

PURPOSE OF NORMALIZATION OR SCHEMA REFINEMENT

Database normalization is a technique of organizing the data in the database. Normalization is a systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics like Insertion, Update and Deletion anomalies. It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables. If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator. Managing a database with anomalies is next to impossible.
Normalization is used mainly for two purposes:
• Eliminating redundant (useless) data.
• Ensuring data dependencies make sense, i.e., data is logically stored.

Problem Without Normalization

Without normalization, it becomes difficult to handle and update the database without facing data loss. Insertion, Updation and Deletion anomalies are very frequent if the database is not normalized. To understand these anomalies let us take the example of a Student table.
S_id  S_name   S_Address  Subject_opted
401   Kesava   Noida      JAVA
402   Rama     Panipat    DBMS
403   Krishna  Jammu      DBMS
404   Kesava   Noida      Data Mining

• Updation Anomaly: If data items are scattered and are not linked to each other properly, it can lead to strange situations. For example, when we try to update one data item having its copies scattered over several places, a few instances get updated properly while a few others are left with old values. Such instances leave the database in an inconsistent state.
Example: To update the address of a student who occurs twice or more in the table, we have to update the S_Address column in all those rows, else the data becomes inconsistent.
• Insertion Anomaly: We try to insert data in a record that does not exist at all.
Example: Suppose for a new admission we have the Student id (S_id), name and address of a student, but if the student has not opted for any subjects yet, then we have to insert NULL there, leading to an insertion anomaly.
• Deletion Anomaly: We try to delete a record, but parts of it are lost because, unnoticed, that data is not saved anywhere else.
Example: If student (S_id) 402 has only one subject and temporarily drops it, when we delete that row, the entire student record is deleted along with it.

CONCEPT OF FUNCTIONAL DEPENDENCY

A functional dependency (FD) is a constraint between two sets of attributes in a relation. A functional dependency says that if two tuples have the same values for attributes A1, A2, ..., An, then those two tuples must have the same values for attributes B1, B2, ..., Bn.
A functional dependency is represented by an arrow sign (→), that is, X → Y, where X functionally determines Y. The left-hand side attributes determine the values of the attributes on the right-hand side.

ARMSTRONG'S AXIOMS

If F is a set of functional dependencies then the closure of F, denoted as F+, is the set of all functional dependencies logically implied by F. Armstrong's Axioms are a set of rules that, when applied repeatedly, generate the closure of a set of functional dependencies.
• Reflexive rule − If alpha is a set of attributes and beta is a subset of alpha, then alpha → beta holds.
• Augmentation rule − If a → b holds and y is an attribute set, then ay → by also holds. That is, adding attributes to a dependency does not change the basic dependency.
• Transitivity rule − Same as the transitive rule in algebra: if a → b holds and b → c holds, then a → c also holds. a → b means a functionally determines b.

TRIVIAL FUNCTIONAL DEPENDENCY
• Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial FDs always hold.
• Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial FD.
• Completely non-trivial − If an FD X → Y holds, where X ∩ Y = Φ, it is said to be a completely non-trivial FD.

PROPERTIES OF FUNCTIONAL DEPENDENCIES:
1. Reflexive: If Y ⊆ X then X → Y is a reflexive functional dependency.
   Ex: AB → A; since A ⊆ AB holds, AB → A is a reflexive functional dependency.
2. Augmentation: If X → Y is a functional dependency then, by augmentation, XZ → YZ is also a functional dependency.
3. Transitivity: If X → Y and Y → Z are two functional dependencies then, by transitivity, X → Z is also a functional dependency.
4. Union: If X → Y and X → Z are two functional dependencies then X → YZ is also a functional dependency.
5. Decomposition: If X → YZ is a functional dependency then X → Y and X → Z are also functional dependencies.


CLOSURE SET OF A FUNCTIONAL DEPENDENCY (F+)

It is the set of all functional dependencies that can be determined using the given set of dependencies. It is denoted by F+.
Attribute Closure (X+): It is the set of all the attributes that can be determined using X. It is denoted by X+, where X is any set of attributes.
Example:
R(A, B, C)   F: {A → B, B → C}
A+ = {A, B, C}    B+ = {B, C}    C+ = {C}
AB+ = {A, B, C}   AC+ = {A, B, C}   BC+ = {B, C}   ABC+ = {A, B, C}

Identifying keys in a relation based on the functional dependencies associated with it
X+ is the set of attributes that can be determined using the given set X of attributes.
• If X+ contains all the attributes of a relation, then X is called a "Super key" of that relation.
• If X+ is a minimal such set, then X is called a "Candidate Key" of that relation.
If no closure contains all the attributes, we can find the independent attributes of the relation, i.e., the attributes that do not appear on the R.H.S. of any dependency. If the closure of the independent attributes contains all the attributes, then that set can be treated as a candidate key.
If the closure of the independent attributes also doesn't contain all the attributes, then we try to find a key by adding dependent attributes one by one. If we still can't find a key, we can add groups of dependent attributes until we find a key for the relation.

FIRST NORMAL FORM (1NF)
1NF is designed to disallow multi-valued attributes, composite attributes and their combinations. This means 1NF allows only atomic values, i.e., each attribute of any tuple must hold a single atomic value or NULL.
A relation having multi-valued and composite attributes is known as an Un-Normalized Relation. Removal of these multi-valued and composite attributes turns the Un-Normalized Relation into a 1NF relation.
Example:
Professor (Un-Normalized Relation)
ID  Name  Salary
1   Rama  {40000, 10000, 15000, 10000}
Since Salary is a multi-valued attribute, we can eliminate it by splitting the Salary column into more specific columns like Basic, TA, DA, HRA. The above relation in 1NF is as follows:
Professor (1NF)
ID  Name  Basic  TA     DA     HRA
1   Rama  40000  10000  15000  10000
Every relation in the relational database must be in 1NF.

SECOND NORMAL FORM (2NF):

2NF depends on the concept of Full Functional Dependencies (FFD) and disallows Partial Functional Dependencies (PFD); i.e., for a relation that has a concatenated primary key, each attribute that is not part of the primary key must depend on the entire concatenated key for its existence. If any attribute depends on only one part of the concatenated key, the relation fails Second Normal Form.
A relation with partial functional dependencies can be brought into 2NF by removing them.
PFD: part of key → non-key
FFD: key → non-key
Example:
Medication
Patient No.  Drug  No. of units  Pname
P1           D1    10            Kiran
P1           D2    20            Kiran
P1           D3    15            Kiran
P2           D4    15            Raj
Dependencies:
Patient No. → Pname
Patient No., Drug → No. of units
The above relation is in 1NF but not in 2NF. The key for the relation is (Patient No., Drug). After eliminating the partial functional dependencies, the decomposed relations are:

Patient (Parent Relation)
Patient No.  Pname
P1           Kiran
P2           Raj
Key: Patient No.   Dependencies: Patient No. → Pname

Medication (Child Relation)
Patient No.  Drug  No. of units
P1           D1    10
P1           D2    20
P1           D3    15
P2           D4    15
Key: Patient No. and Drug   Dependencies: Patient No., Drug → No. of units
Patient No. is a foreign key referring to Patient No. in the parent relation (Patient).

The above 2 relations satisfy 2NF; they don't have partial functional dependencies.
Note: If the key is only one attribute then the relation is always in 2NF.
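The 2NF decomposition above can also be written in the SQL style used throughout these notes. The column types below are assumptions made only for the sketch.

SQL> CREATE TABLE Patient ( patient_no CHAR(5) PRIMARY KEY,
     pname VARCHAR2(20) );
SQL> CREATE TABLE Medication ( patient_no CHAR(5), drug CHAR(5),
     no_of_units NUMBER(4),
     PRIMARY KEY (patient_no, drug),
     FOREIGN KEY (patient_no) REFERENCES Patient );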


THIRD NORMAL FORM (3NF):

Third Normal Form requires that every non-prime attribute of a relation depends on the primary key; in other words, no non-prime attribute may be determined by another non-prime attribute. Such a transitive functional dependency must be removed from the relation, and the relation must also be in Second Normal Form. Simply put, 3NF is based on the concept of transitive dependencies and disallows them.
Dependency: key → non-key
Transitive Dependency: non-key → non-key
A relation satisfying 2NF but containing transitive functional dependencies can be brought into 3NF by removing the transitive functional dependencies.

Example:
Contains
Patient No.  Pname    Ward No.  Ward Name
P1           Kumar    W1        ICU
P2           Kiran    W1        ICU
P3           Kamal    W1        ICU
P4           Sharath  W2        General
Key: Patient No. and Pname
Dependencies:
Patient No., Pname → Ward No.
Ward No. → Ward Name (Transitive Dependency)

Ward No. → Ward Name is transitive because Ward No. is not a key. To make this relation satisfy 3NF the transitive dependency must be removed by decomposing the relation. The decomposed relations are as follows:

Ward
Ward No.  Ward Name
W1        ICU
W2        General
Key: Ward No.   Dependencies: Ward No. → Ward Name

Contains
Patient No.  Pname    Ward No.
P1           Kumar    W1
P2           Kiran    W1
P3           Kamal    W1
P4           Sharath  W2
Primary Key: Patient No. and Pname   Dependencies: Patient No., Pname → Ward No.
Foreign Key: Ward No. refers to Ward No. in relation Ward

The above two relations satisfy 3NF; they don't have transitive dependencies.
Note 1: A relation R is said to be in 3NF if, whenever a non-trivial functional dependency of the form X → A holds, either X is a super key or A is a prime attribute.
Note 2: If all attributes are prime attributes then the relation is in 3NF, because with such attributes no partial functional dependencies or transitive dependencies exist.

BOYCE-CODD NORMAL FORM (BCNF):

BCNF is a stricter version of the Third Normal Form. It deals with a certain type of anomaly that is not handled by 3NF. A 3NF relation which does not have multiple overlapping candidate keys is said to be in BCNF. For a relation to be in BCNF, the following conditions must be satisfied:
• R must be in 3rd Normal Form, and
• for each functional dependency (X → Y), X should be a super key.
(OR)
A relation is said to be in BCNF if it is already in 3NF and the left-hand side of every dependency is a candidate key. A relation which is in 3NF is almost always in BCNF. A 3NF relation may not be in BCNF when the following conditions are found true:
1. The candidate keys are composite.
2. There is more than one candidate key in the relation.
3. The candidate keys have some attributes in common (they overlap).

Professor Code  Department   Head of Dept.  Percent Time
P1              Physics      Ghosh          50
P1              Mathematics  Krishnan       50
P2              Chemistry    Rao            25
P2              Physics      Ghosh          75
P3              Mathematics  Krishnan       100

Consider, as an example, the above relation. It is assumed that:
1. A professor can work in more than one department.
2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.
Dependencies of the above relation are:
Department, Professor Code → Head of the Department
Department, Professor Code → Percent Time
Department → Head of the Department
Head of the Department, Professor Code → Department
Head of the Department, Professor Code → Percent Time

The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted, and we lose the information that Rao is the Head of Department of Chemistry.
The normalization of the relation is done by creating a new relation for Dept. and Head of Dept. and deleting Head of Dept. from the given relation. The normalized relations are shown in the following.


Professor_work
Professor Code  Department   Percent Time
P1              Physics      50
P1              Mathematics  50
P2              Chemistry    25
P2              Physics      75
P3              Mathematics  100
Dependencies: Department, Professor Code → Percent Time
Department is a foreign key referring to Department in the Department_Details relation.

Department_Details
Department   Head of Dept.
Physics      Ghosh
Mathematics  Krishnan
Chemistry    Rao
Dependencies: Department → Head of the Department

FOURTH NORMAL FORM (4NF)
When attributes in a relation have multi-valued dependencies, further normalization to 4NF and 5NF is required. A multi-valued dependency is a typical kind of dependency in which each and every attribute within a relation depends upon the other, yet none of them is a unique primary key. Consider a vendor supplying many items to many projects in an organization. The following are the assumptions:
1. A vendor is capable of supplying many items.
2. A project uses many items.
3. A vendor supplies to many projects.
4. An item may be supplied by many vendors.
A multi-valued dependency exists here because all the attributes depend upon the others, and yet none of them is a primary key having a unique value.

Vendor Code  Item Code  Project No.
V1           I1         P1
V1           I2         P1
V1           I1         P3
V1           I2         P3
V2           I2         P1
V2           I3         P1
V3           I1         P2
V3           I1         P3

The given relation has a number of problems. For example:
1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row with a blank item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.
Observe that the relation given is in 3NF and also in BCNF. It still has the problems mentioned above. The problem is reduced by expressing this relation as two relations in Fourth Normal Form (4NF). A relation is in 4NF if it has no more than one independent multi-valued dependency, or one independent multi-valued dependency with a functional dependency.
The table can be expressed as the two 4NF relations given below. The fact that vendors are capable of supplying certain items and the fact that they are assigned to supply to some projects are independently specified in the 4NF relations.

Vendor_Supply              Vendor_Project
Vendor Code  Item Code     Vendor Code  Project No.
V1           I1            V1           P1
V1           I2            V1           P3
V2           I2            V2           P1
V2           I3            V3           P2
V3           I1

SUMMARY
Input Relation  Transformation                                                                                             Output Relation
All relations   Eliminate variable-length records. Remove multi-attribute lines in the table.                             1NF
1NF             Remove dependency of non-key attributes on part of a multi-attribute key.                                 2NF
2NF             Remove dependency of non-key attributes on other non-key attributes.                                      3NF
3NF             Remove dependency of an attribute of a multi-attribute key on an attribute of another (overlapping)
                multi-attribute key.                                                                                       BCNF
BCNF            Remove more than one independent multi-valued dependency from the relation by splitting the relation.     4NF
4NF             Add one relation relating the attributes with multi-valued dependency.                                    5NF

PROPERTIES OF DECOMPOSITION:
Every decomposition must satisfy 2 properties.
1. Lossless join
2. Dependency preserving

1. Lossless join:
If we decompose a relation R into relations R1 and R2,
• the decomposition is lossy if R1 ⋈ R2 ⊃ R
• the decomposition is lossless if R1 ⋈ R2 = R
To check for a lossless join decomposition using the FD set, the following conditions must hold:
1. The union of the attributes of R1 and R2 must be equal to the attributes of R. Each attribute of R must be either in R1 or in R2: Att(R1) U Att(R2) = Att(R)
2. The intersection of the attributes of R1 and R2 must not be empty: Att(R1) ∩ Att(R2) ≠ Φ
3. The common attributes must be a key for at least one relation (R1 or R2): Att(R1) ∩ Att(R2) → Att(R1) or Att(R1) ∩ Att(R2) → Att(R2)
For example, a relation R(A, B, C, D) with FD set {A → BC} decomposed into R1(ABC) and R2(AD) is a lossless join decomposition as:
1. The first condition holds, as Att(R1) U Att(R2) = (ABC) U (AD) = (ABCD) = Att(R).
2. The second condition holds, as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) = A ≠ Φ.
3. The third condition holds, as Att(R1) ∩ Att(R2) = A is a key of R1(ABC) because A → BC is given.

Dependency Preserving Decomposition
If we decompose a relation R into relations R1 and R2, all dependencies of R must either be a part of R1 or R2 or must be derivable from the combination of the FDs of R1 and R2.
For example, a relation R(A, B, C, D) with FD set {A → BC} decomposed into R1(ABC) and R2(AD) is dependency preserving because the FD A → BC is a part of R1(ABC).
Decomposition of a relation is done when a relation in the relational model is not in an appropriate normal form. A relation R is decomposed into two or more relations only if the decomposition is both lossless-join and dependency-preserving.

CONCEPT OF SURROGATE KEY:
In database design, it is good practice to have a primary key for each table. There are two ways to specify a primary key. The first is to use part of the data as the primary key. For example, a table that includes information on employees may use the Social Security Number as the primary key. This type of key is called a natural key. The second is to use a new field with artificially generated values whose sole purpose is to be used as a primary key. This is called a surrogate key.
A surrogate key has the following characteristics:
1) It is typically an integer.
2) It has no meaning. You will not be able to know the meaning of that row of data based on the surrogate key value.
3) It is not visible to end users. End users should not see a surrogate key in a report.
Surrogate keys can be generated in a variety of ways, and most databases offer mechanisms to generate them. For example, Oracle uses SEQUENCE, MySQL uses AUTO_INCREMENT, and SQL Server uses IDENTITY.
Surrogate keys are often used in data warehousing systems, as the high data volume in a data warehouse means that optimizing query speed becomes important. Using a surrogate key is advantageous because it is quicker to join on a numeric field than on a non-numeric field. This does come at a price: when you insert data into a table, whether via an ETL process or via an "INSERT INTO" statement, the system needs additional resources to generate the surrogate key.
There are no hard rules on when to employ a surrogate key as opposed to using the natural key. Often the data architect will need to look at the nature of the data being modeled and stored and consider any possible performance implications.

UNIT-V
TRANSACTION MANAGEMENT

What is a Transaction?
A transaction is an event which occurs on the database. Generally a transaction reads a value from the database or writes a value to the database. If you have any concept of Operating Systems, then we can say that a transaction is analogous to a process.

Although a transaction can both read and write on the database, there are some fundamental differences between these two classes of operations. A read operation does not change the image of the database in any way. But a write operation, whether performed with the intention of inserting, updating or deleting data from the database, changes the image of the database. That is, we may say that these transactions bring the database from an image which existed before the transaction occurred (called the Before Image or BFIM) to an image which exists after the transaction occurred (called the After Image or AFIM).

The Four Properties of Transactions
Every transaction, for whatever purpose it is being used, has the following four properties. Taking the initial letters of these four properties we collectively call them the ACID Properties. Here we try to describe and explain them.

Atomicity: This means that either all of the instructions within the transaction will be reflected in the database, or none of them will be reflected.
Say, for example, we have two accounts A and B, each containing Rs 1000/-. We now start a transaction to transfer Rs 100/- from account A to account B.
Read A;
A = A – 100;
Write A;
Read B;
B = B + 100;
Write B;
possible performance implications.

9
www.Jntufastupdates.com 9
DATABASE MANAGEMENT SYSTEM Page 63
www.Jntufastupdates.com 1
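To make the atomicity idea concrete, the following is a minimal sketch (not part of the original notes) of the same Rs 100/- transfer written against SQLite from Python. The table name accounts and the starting balances are assumptions made only for the illustration; the point is that both updates either commit together or are rolled back together.

import sqlite3

conn = sqlite3.connect(":memory:")          # throwaway database for the demo
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("A", 1000), ("B", 1000)])
conn.commit()                               # BFIM: A = 1000, B = 1000

try:
    # the six instructions of the transfer, expressed as two updates
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'A'")
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'B'")
    conn.commit()                           # COMMIT: the AFIM becomes permanent
except Exception:
    conn.rollback()                         # any failure undoes both updates

print(dict(conn.execute("SELECT name, balance FROM accounts")))
# expected AFIM: {'A': 900, 'B': 1100}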
Now, suppose there is a power failure just after instruction 3 (Write A) has been completed. What happens now? After the system recovers, the AFIM will show Rs 900/- in A, but the same Rs 1000/- in B. It would be said that Rs 100/- evaporated into thin air because of the power failure. Clearly such a situation is not acceptable.
The solution is to keep every value calculated by the instructions of the transaction not in any stable storage (hard disk) but in volatile storage (RAM), until the transaction completes its last instruction. When we see that there has not been any error, we do something known as a COMMIT operation. Its job is to write every temporarily calculated value from the volatile storage on to the stable storage. In this way, even if power fails at instruction 3, the post-recovery image of the database will show accounts A and B both containing Rs 1000/-, as if the failed transaction had never occurred.

Consistency: If we execute a particular transaction in isolation or together with other transactions (i.e. presumably in a multi-programming environment), the transaction will yield the same expected result.
To give better performance, every database management system supports the execution of multiple transactions at the same time, using CPU time sharing. Concurrently executing transactions may have to deal with the problem of sharable resources, i.e. resources that multiple transactions are trying to read/write at the same time. For example, we may have a table or a record on which two transactions are trying to read or write at the same time. Careful mechanisms are created in order to prevent mismanagement of these sharable resources, so that there should not be any change in the way a transaction performs. A transaction which deposits Rs 100/- to account A must deposit the same amount whether it is acting alone or in conjunction with another transaction that may be trying to deposit or withdraw some amount at the same time.

Isolation: In case multiple transactions are executing concurrently and trying to access a sharable resource at the same time, the system should create an ordering in their execution so that they do not create any anomaly in the value stored at the sharable resource.
There are several ways to achieve this, and the most popular one is using some kind of locking mechanism. Again, if you have the concept of Operating Systems, then you should remember semaphores: how one is used by a process to mark a resource as busy before starting to use it, and how it is used to release the resource after the usage is over. Other processes intending to access that same resource must wait during this time. Locking is almost similar. It states that a transaction must first lock the data item that it wishes to access, and release the lock when the accessing is no longer required. Once a transaction locks the data item, other transactions wishing to access the same data item must wait until the lock is released.

Durability: It states that once a transaction has been completed, the changes it has made should be permanent.
As we have seen in the explanation of the Atomicity property, the transaction, if it completes successfully, is committed. Once the COMMIT is done, the changes which the transaction has made to the database are immediately written into permanent storage. So, after the transaction has been committed successfully, there is no question of any loss of information even if the power fails. Committing a transaction guarantees that the AFIM has been reached.
There are several ways Atomicity and Durability can be implemented. One of them is called Shadow Copy. In this scheme a database pointer is used to point to the BFIM of the database. During the transaction, all the temporary changes are recorded into a Shadow Copy, which is an exact copy of the original database plus the changes made by the transaction, i.e. the AFIM. Now, if the transaction is required to COMMIT, then the database pointer is updated to point to the AFIM copy, and the BFIM copy is discarded. On the other hand, if the transaction is not committed, then the database pointer is not updated. It keeps pointing to the BFIM, and the AFIM is discarded. This is a simple scheme, but it takes a lot of memory space and time to implement.
If you study carefully, you can understand that Atomicity and Durability are essentially the same thing, just as Consistency and Isolation are essentially the same thing.
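The following is a minimal sketch (an assumption-laden toy, not the notes' own code) of the Shadow Copy idea described above: all updates go into a full copy of the database, and COMMIT is simply a switch of the "database pointer" to that copy.

import copy

class ShadowCopyDB:
    """Toy illustration of the shadow-copy scheme: the 'pointer' is self.current."""

    def __init__(self, data):
        self.current = data          # BFIM: the image visible to everyone
        self.shadow = None           # AFIM under construction

    def begin(self):
        self.shadow = copy.deepcopy(self.current)   # work on a full copy

    def write(self, key, value):
        self.shadow[key] = value     # temporary change, invisible until commit

    def commit(self):
        self.current = self.shadow   # flip the database pointer to the AFIM
        self.shadow = None

    def abort(self):
        self.shadow = None           # discard the AFIM; pointer still at BFIM

db = ShadowCopyDB({"A": 1000, "B": 1000})
db.begin()
db.write("A", 900)
db.write("B", 1100)
db.commit()                          # without this line, the BFIM would survive
print(db.current)                    # {'A': 900, 'B': 1100}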
Transaction States
There are the following six states in which a transaction may exist:

Active: The initial state when the transaction has just started execution.

Partially Committed: At any given point of time, if the transaction is executing properly, then it is going towards its COMMIT POINT. The values generated during the execution are all stored in volatile storage.

Failed: If the transaction fails for some reason, the temporary values are no longer required, and the transaction is set to ROLLBACK. It means that any change made to the database by this transaction up to the point of the failure must be undone. If the failed transaction has withdrawn Rs. 100/- from account A, then the ROLLBACK operation should add Rs 100/- to account A.

Aborted: When the ROLLBACK operation is over, the database reaches the BFIM. The transaction is now said to have been aborted.

Committed: If no failure occurs then the transaction reaches the COMMIT POINT. All the temporary values are written to the stable storage and the transaction is said to have been committed.

Terminated: Either committed or aborted, the transaction finally reaches this state.

The whole process can be described using the following diagram:
[Figure: transaction state diagram. Entry point -> ACTIVE -> PARTIALLY COMMITTED -> COMMITTED -> TERMINATED; ACTIVE / PARTIALLY COMMITTED -> FAILED -> ABORTED -> TERMINATED.]

Concurrent Execution
A schedule is a collection of many transactions which is implemented as a unit. Depending upon how these transactions are arranged within a schedule, a schedule can be of two types:
• Serial: The transactions are executed one after another, in a non-preemptive manner.
• Concurrent: The transactions are executed in a preemptive, time-shared method.
In a serial schedule, there is no question of sharing a single data item among many transactions, because not more than a single transaction is executing at any point of time. However, a serial schedule is inefficient in the sense that the transactions suffer from longer waiting time and response time, as well as a low amount of resource utilization.
In a concurrent schedule, CPU time is shared among two or more transactions in order to run them concurrently. However, this creates the possibility that more than one transaction may need to access a single data item for read/write purposes, and the database could contain inconsistent values if such accesses are not handled properly. Let us explain with the help of an example.
Let us consider two transactions T1 and T2, whose instruction sets are given as follows. T1 is the same as we have seen earlier, while T2 is a new transaction.

T1
Read A;
A = A – 100;
Write A;
Read B;
B = B + 100;
Write B;

T2
Read A;
Temp = A * 0.1;
Read C;
C = C + Temp;
Write C;
T2 is a new transaction which deposits to account C 10% of the amount in account A.
If we prepare a serial schedule, then either T1 will completely finish before T2 can begin, or T2 will completely finish before T1 can begin. However, if we want to create a concurrent schedule, then some context switching needs to be made, so that some portion of T1 will be executed, then some portion of T2 will be executed, and so on. For example, say we have prepared the following concurrent schedule.

T1                    T2
Read A;
A = A – 100;
Write A;
                      Read A;
                      Temp = A * 0.1;
                      Read C;
                      C = C + Temp;
                      Write C;
Read B;
B = B + 100;
Write B;

No problem here. We have made some context switching in this schedule: the first one after executing the third instruction of T1, and the second one after executing the last statement of T2. T1 first deducts Rs 100/- from A and writes the new value of Rs 900/- into A. T2 reads the value of A, calculates the value of Temp to be Rs 90/- and adds the value to C. The remaining part of T1 is then executed and Rs 100/- is added to B.
It is clear that a proper context switching is very important in order to maintain the Consistency and Isolation properties of the transactions. But let us take another example where a wrong context switching can bring about disaster. Consider the following schedule involving the same T1 and T2.

T1                    T2
Read A;
A = A – 100;
                      Read A;
                      Temp = A * 0.1;
                      Read C;
                      C = C + Temp;
                      Write C;
Write A;
Read B;
B = B + 100;
Write B;

This schedule is wrong, because we have made the switching at the second instruction of T1. The result is very confusing. If we consider accounts A and B both containing Rs 1000/- each, then the result of this schedule should have left Rs 900/- in A, Rs 1100/- in B and added Rs 90/- to C (as C should be increased by 10% of the amount in A). But in this wrong schedule, the context switching is performed before the new value of Rs 900/- has been updated in A. T2 reads the old value of A, which is still Rs 1000/-, and deposits Rs 100/- in C. C makes an unjust gain of Rs 10/- out of nowhere.
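As a quick check of the arithmetic above, here is a small sketch (not from the notes) that replays both interleavings over plain Python variables; the correct order credits C with Rs 90/-, the wrong order with Rs 100/-.

def run(schedule):
    """Replay a schedule given as a list of (transaction, operation) steps."""
    db = {"A": 1000, "B": 1000, "C": 0}       # assumed starting balances
    local = {"T1": {}, "T2": {}}              # each transaction's private copies
    for txn, op in schedule:
        op(db, local[txn])
    return db

# T1's operations
t1 = [("T1", lambda db, v: v.update(A=db["A"])),            # Read A
      ("T1", lambda db, v: v.update(A=v["A"] - 100)),        # A = A - 100
      ("T1", lambda db, v: db.update(A=v["A"])),             # Write A
      ("T1", lambda db, v: v.update(B=db["B"])),             # Read B
      ("T1", lambda db, v: v.update(B=v["B"] + 100)),        # B = B + 100
      ("T1", lambda db, v: db.update(B=v["B"]))]             # Write B

# T2's operations
t2 = [("T2", lambda db, v: v.update(A=db["A"])),             # Read A
      ("T2", lambda db, v: v.update(Temp=v["A"] * 0.1)),     # Temp = A * 0.1
      ("T2", lambda db, v: v.update(C=db["C"])),             # Read C
      ("T2", lambda db, v: v.update(C=v["C"] + v["Temp"])),  # C = C + Temp
      ("T2", lambda db, v: db.update(C=v["C"]))]             # Write C

good = t1[:3] + t2 + t1[3:]     # switch after Write A     -> C gains 90
bad  = t1[:2] + t2 + t1[2:]     # switch after A = A - 100 -> C gains 100
print(run(good))                # {'A': 900, 'B': 1100, 'C': 90.0}
print(run(bad))                 # {'A': 900, 'B': 1100, 'C': 100.0}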
Serializability
When several concurrent transactions are trying to access the same data item, the instructions within these concurrent transactions must be ordered in some way so that there is no problem in accessing and releasing the shared data item. There are two aspects of serializability, which are described here.

Conflict Serializability
Two instructions of two different transactions may want to access the same data item in order to perform a read/write operation. Conflict serializability deals with detecting whether the instructions are conflicting in any way, and specifying the order in which these two instructions will be executed in case there is any conflict. A conflict arises if at least one (or both) of the instructions is a write operation. The following rules are important in conflict serializability:
1. If two instructions of the two concurrent transactions are both read operations, then they are not in conflict and can be allowed to take place in any order.
2. If one of the instructions wants to perform a read operation and the other instruction wants to perform a write operation, then they are in conflict, hence their ordering is important. If the read instruction is performed first, then it reads the old value of the data item and, after the reading is over, the new value of the data item is written. If the write instruction is performed first, then it updates the data item with the new value and the read instruction reads the newly updated value.
3. If both the instructions are write operations, then they are in conflict, but can be allowed to take place in any order, because the transactions do not read the value updated by each other. However, the value that persists in the data item after the schedule is over is the one written by the instruction that performed the last write.

View Serializability
This is another type of serializability that can be derived by creating another schedule out of an existing schedule, involving the same set of transactions. These two schedules would be called view serializable if the following rules are followed while creating the second schedule out of the first. Let us consider that the transactions T1 and T2 are being serialized to create two different schedules S1 and S2, which we want to be view equivalent, and both T1 and T2 want to access the same data item.
1. If in S1, T1 reads the initial value of the data item, then in S2 also, T1 should read the initial value of that same data item.
2. If in S1, T1 writes a value in the data item which is read by T2, then in S2 also, T1 should write the value in the data item before T2 reads it.
3. If in S1, T1 performs the final write operation on that data item, then in S2 also, T1 should perform the final write operation on that data item.

Let us consider a schedule S in which there are two consecutive instructions, I and J, of transactions Ti and Tj, respectively (i ≠ j). If I and J refer to different data items, then we can swap I and J without affecting the results of any instruction in the schedule. However, if I and J refer to the same data item Q, then the order of the two steps may matter. Since we are dealing with only read and write instructions, there are four cases that we need to consider:
1. I = read(Q), J = read(Q). The order of I and J does not matter, since the same value of Q is read by Ti and Tj, regardless of the order.
2. I = read(Q), J = write(Q). If I comes before J, then Ti does not read the value of Q that is written by Tj in instruction J. If J comes before I, then Ti reads the value of Q that is written by Tj. Thus, the order of I and J matters.
3. I = write(Q), J = read(Q). The order of I and J matters for reasons similar to those of the previous case.
4. I = write(Q), J = write(Q). Since both instructions are write operations, the order of these instructions does not affect either Ti or Tj. However, the value obtained by the next read(Q) instruction of S is affected, since the result of only the latter of the two write instructions is preserved in the database. If there is no other write(Q) instruction after I and J in S, then the order of I and J directly affects the final value of Q in the database state that results from schedule S.

Fig: Schedule 3, showing only the read and write instructions.
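The four cases can be collapsed into one small predicate. The sketch below (an illustration, not the notes' own code) represents each instruction as a (transaction, action, item) triple and reports whether two instructions conflict.

def conflicts(i, j):
    """i and j are (transaction, action, item) triples; action is 'read' or 'write'."""
    ti, act_i, item_i = i
    tj, act_j, item_j = j
    return (ti != tj                          # different transactions
            and item_i == item_j              # same data item
            and "write" in (act_i, act_j))    # at least one write

# the four cases above, for the same item Q of transactions T1 and T2
print(conflicts(("T1", "read",  "Q"), ("T2", "read",  "Q")))   # False
print(conflicts(("T1", "read",  "Q"), ("T2", "write", "Q")))   # True
print(conflicts(("T1", "write", "Q"), ("T2", "read",  "Q")))   # True
print(conflicts(("T1", "write", "Q"), ("T2", "write", "Q")))   # True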
We say that I and J conflict if they are operations by different transactions on the same data item, and at least one of these instructions is a write operation. To illustrate the concept of conflicting instructions, we consider schedule 3 in the figure above. The write(A) instruction of T1 conflicts with the read(A) instruction of T2. However, the write(A) instruction of T2 does not conflict with the read(B) instruction of T1, because the two instructions access different data items.

Transaction Characteristics
Every transaction has three characteristics: access mode, diagnostics size, and isolation level. The diagnostics size determines the number of error conditions that can be recorded.
If the access mode is READ ONLY, the transaction is not allowed to modify the database. Thus, INSERT, DELETE, UPDATE, and CREATE commands cannot be executed. If we have to execute one of these commands, the access mode should be set to READ WRITE. For transactions with READ ONLY access mode, only shared locks need to be obtained, thereby increasing concurrency.
The isolation level controls the extent to which a given transaction is exposed to the actions of other transactions executing concurrently. By choosing one of four possible isolation level settings, a user can obtain greater concurrency at the cost of increasing the transaction's exposure to other transactions' uncommitted changes.
Isolation level choices are READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. The effect of these levels is summarized in the figure given below. In this context, dirty read and unrepeatable read are defined as usual. Phantom is defined to be the possibility that a transaction retrieves a collection of objects (in SQL terms, a collection of tuples) twice and sees different results, even though it does not modify any of these tuples itself.
In terms of a lock-based implementation, a SERIALIZABLE transaction obtains locks before reading or writing objects, including locks on sets of objects that it requires to be unchanged (see Section 19.3.1), and holds them until the end, according to Strict 2PL.
REPEATABLE READ ensures that T reads only the changes made by committed transactions, and that no value read or written by T is changed by any other transaction until T is complete. However, T could experience the phantom phenomenon; for example, while T examines all Sailors records with rating=1, another transaction might add a new such Sailors record, which is missed by T.
A REPEATABLE READ transaction uses the same locking protocol as a SERIALIZABLE transaction, except that it does not do index locking; that is, it locks only individual objects, not sets of objects.
READ COMMITTED ensures that T reads only the changes made by committed transactions, and that no value written by T is changed by any other transaction until T is complete. However, a value read by T may well be modified by another transaction while T is still in progress, and T is, of course, exposed to the phantom problem.
A READ COMMITTED transaction obtains exclusive locks before writing objects and holds these locks until the end. It also obtains shared locks before reading objects, but these locks are released immediately; their only effect is to guarantee that the transaction that last modified the object is complete. (This guarantee relies on the fact that every SQL transaction obtains exclusive locks before writing objects and holds exclusive locks until the end.)
A READ UNCOMMITTED transaction does not obtain shared locks before reading objects. This mode represents the greatest exposure to uncommitted changes of other transactions; so much so that SQL prohibits such a transaction from making any changes itself: a READ UNCOMMITTED transaction is required to have an access mode of READ ONLY. Since such a transaction obtains no locks for reading objects, and it is not allowed to write objects (and therefore never requests exclusive locks), it never makes any lock requests.
The SERIALIZABLE isolation level is generally the safest and is recommended for most transactions. Some transactions, however, can run with a lower isolation level, and the smaller number of locks requested can contribute to improved system performance. For example, a statistical query that finds the average sailor age can be run at the READ COMMITTED level, or even the READ UNCOMMITTED level, because a few incorrect or missing values will not significantly affect the result if the number of sailors is large. The isolation level and access mode can be set using the SET TRANSACTION command. For example, the following command declares the current transaction to be SERIALIZABLE and READ ONLY:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY
When a transaction is started, the default is SERIALIZABLE and READ WRITE.

PRECEDENCE GRAPH
A precedence graph, also named conflict graph or serializability graph, is used in the context of concurrency control in databases.
The precedence graph for a schedule S contains:
• A node for each committed transaction in S.
• An arc from Ti to Tj if an action of Ti precedes and conflicts with one of Tj's actions.

Precedence graph example
A precedence graph of the schedule D, with 3 transactions. As there is a cycle (of length 2, with two edges) through the committed transactions T1 and T2, this schedule (history) is not conflict serializable.

The drawing sequence for the precedence graph:
1. For each transaction Ti participating in schedule S, create a node labelled Ti in the precedence graph. So the precedence graph contains T1, T2, T3.
2. For each case in S where Ti executes a write_item(X) and then Tj executes a read_item(X), create an edge (Ti --> Tj) in the precedence graph. This occurs nowhere in the above example, as there is no read after write.
3. For each case in S where Ti executes a read_item(X) and then Tj executes a write_item(X), create an edge (Ti --> Tj) in the precedence graph. This results in a directed edge from T1 to T2.
4. For each case in S where Ti executes a write_item(X) and then Tj executes a write_item(X), create an edge (Ti --> Tj) in the precedence graph. This results in directed edges from T2 to T1, T1 to T3, and T2 to T3.
5. The schedule S is conflict serializable if the precedence graph has no cycles. As T1 and T2 constitute a cycle, S cannot be declared conflict serializable, and its serializability has to be checked using other methods.

TESTING FOR CONFLICT SERIALIZABILITY
1. A schedule is conflict serializable if and only if its precedence graph is acyclic.
2. To test for conflict serializability, we need to construct the precedence graph and to invoke a cycle-detection algorithm. Cycle-detection algorithms exist which take order n^2 time, where n is the number of vertices in the graph. (Better algorithms take order n + e, where e is the number of edges.)
3. If the precedence graph is acyclic, the serializability order can be obtained by a topological sorting of the graph, that is, a linear order consistent with the partial order of the graph. For example, a serializability order for the schedule (a) would be one of either (b) or (c).
4. A serializability order of the transactions can be obtained by finding a linear order consistent with the partial order of the precedence graph.
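Putting the two ideas together, the sketch below (illustrative code, not from the notes) builds a precedence graph from a schedule given as (transaction, action, item) triples, then either reports a cycle or returns a serializability order obtained by topological sorting.

from collections import defaultdict

def precedence_graph(schedule):
    """schedule: list of (txn, action, item); action is 'read' or 'write'."""
    edges = defaultdict(set)
    txns = {t for t, _, _ in schedule}
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "write" in (ai, aj):
                edges[ti].add(tj)            # Ti's conflicting action comes first
    return txns, edges

def topological_order(txns, edges):
    """Return a serializability order, or None if the graph has a cycle."""
    indeg = {t: 0 for t in txns}
    for t in edges:
        for u in edges[t]:
            indeg[u] += 1
    ready = [t for t in txns if indeg[t] == 0]
    order = []
    while ready:
        t = ready.pop()
        order.append(t)
        for u in edges[t]:
            indeg[u] -= 1
            if indeg[u] == 0:
                ready.append(u)
    return order if len(order) == len(txns) else None   # None means a cycle

# a schedule in the spirit of schedule 3 above (assumed read/write steps)
s = [("T1", "read", "A"), ("T1", "write", "A"),
     ("T2", "read", "A"), ("T2", "write", "A"),
     ("T1", "read", "B"), ("T1", "write", "B"),
     ("T2", "read", "B"), ("T2", "write", "B")]
txns, edges = precedence_graph(s)
print(dict(edges))                        # {'T1': {'T2'}}  -> acyclic
print(topological_order(txns, edges))     # ['T1', 'T2'], a valid serial order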
RECOVERABLE SCHEDULES
Recoverable schedule: if a transaction Tj reads a data item previously written by a transaction Ti, then the commit operation of Ti must appear before the commit operation of Tj. The following schedule is not recoverable if T9 commits immediately after the read(A) operation.
[Schedule figure not shown: T9 reads A after T8 has written A.]
If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence, the database must ensure that schedules are recoverable.

CASCADING ROLLBACKS
Cascading rollback: a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule, where none of the transactions has yet committed (so the schedule is recoverable).
[Schedule figure not shown: T11 reads a value written by T10, and T12 reads a value written by T11.]
If T10 fails, T11 and T12 must also be rolled back. This can lead to the undoing of a significant amount of work.

CASCADELESS SCHEDULES
Cascadeless schedules: for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.
Every cascadeless schedule is also recoverable. It is desirable to restrict the schedules to those that are cascadeless.
Example of a schedule that is NOT cascadeless: [figure not shown].

CONCURRENCY SCHEDULE
A database must provide a mechanism that will ensure that all possible schedules are both:
• Conflict serializable, and
• Recoverable, and preferably cascadeless.
A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency.
Concurrency-control schemes trade off the amount of concurrency they allow against the amount of overhead that they incur.
Testing a schedule for serializability after it has executed is a little too late! Tests for serializability help us understand why a concurrency control protocol is correct.
Goal: to develop concurrency control protocols that will assure serializability.

WEAK LEVELS OF CONSISTENCY
Some applications are willing to live with weak levels of consistency, allowing schedules that are not serializable. For example, a read-only transaction that wants to get an approximate total balance of all accounts, or database statistics computed for query optimization, can be approximate (why?).
Such transactions need not be serializable with respect to other transactions; accuracy is traded off for performance.

LEVELS OF CONSISTENCY IN SQL
• Serializable: the default.
• Repeatable read: only committed records are read, and repeated reads of the same record must return the same value. However, a transaction may not be serializable; it may find some records inserted by a transaction but not find others.
• Read committed: only committed records can be read, but successive reads of a record may return different (but committed) values.
• Read uncommitted: even uncommitted records may be read.
Lower degrees of consistency are useful for gathering approximate information about the database.
Warning: some database systems do not ensure serializable schedules by default. E.g., Oracle and PostgreSQL by default support a level of consistency called snapshot isolation (not part of the SQL standard).

TRANSACTION DEFINITION IN SQL
The data manipulation language must include a construct for specifying the set of actions that comprise a transaction.
In SQL, a transaction begins implicitly. A transaction in SQL ends by:
• Commit work, which commits the current transaction and begins a new one.
• Rollback work, which causes the current transaction to abort.
In almost all database systems, by default, every SQL statement also commits implicitly if it executes successfully. Implicit commit can be turned off by a database directive, e.g. in JDBC, connection.setAutoCommit(false);

RECOVERY SYSTEM
Failure Classification:
Transaction failure:
• Logical errors: the transaction cannot complete due to some internal error condition.
• System errors: the database system must terminate an active transaction due to an error condition (e.g., deadlock).
System crash: a power failure or other hardware or software failure causes the system to crash.
• Fail-stop assumption: non-volatile storage contents are assumed not to be corrupted as a result of a system crash. Database systems have numerous integrity checks to prevent corruption of disk data.
Disk failure: a head crash or similar disk failure destroys all or part of disk storage.
• Destruction is assumed to be detectable: disk drives use checksums to detect failures.

RECOVERY ALGORITHMS
Consider a transaction Ti that transfers $50 from account A to account B. There are two updates: subtract 50 from A and add 50 to B.
Transaction Ti requires the updates to A and B to be output to the database. A failure may occur after one of these modifications has been made but before both of them are made.
• Modifying the database without ensuring that the transaction will commit may leave the database in an inconsistent state.
• Not modifying the database may result in lost updates if the failure occurs just after the transaction commits.
Recovery algorithms have two parts:
1. Actions taken during normal transaction processing to ensure enough information exists to recover from failures.
2. Actions taken after a failure to recover the database contents to a state that ensures atomicity, consistency and durability.

STORAGE STRUCTURE
Volatile storage:
• does not survive system crashes
• examples: main memory, cache memory
Nonvolatile storage:
• survives system crashes
• examples: disk, tape, flash memory, non-volatile (battery backed up) RAM
• but may still fail, losing data
Stable storage:
• a mythical form of storage that survives all failures
• approximated by maintaining multiple copies on distinct nonvolatile media

Stable-Storage Implementation
Maintain multiple copies of each block on separate disks; copies can be at remote sites to protect against disasters such as fire or flooding.
Failure during data transfer can still result in inconsistent copies. A block transfer can result in:
• Successful completion
• Partial failure: the destination block has incorrect information
• Total failure: the destination block was never updated
Protecting storage media from failure during data transfer (one solution): execute the output operation as follows (assuming two copies of each block):
1. Write the information onto the first physical block.
2. When the first write successfully completes, write the same information onto the second physical block.
3. The output is completed only after the second write successfully completes.
Copies of a block may differ due to a failure during an output operation. To recover from the failure:
1. First find inconsistent blocks:
   • Expensive solution: compare the two copies of every disk block.
   • Better solution: record in-progress disk writes on non-volatile storage (non-volatile RAM or a special area of disk), and use this information during recovery to find blocks that may be inconsistent, comparing only copies of these. This is used in hardware RAID systems.
2. If either copy of an inconsistent block is detected to have an error (bad checksum), overwrite it by the other copy. If both have no error, but are different, overwrite the second block by the first block.
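As an illustration (a deliberately simplified in-memory "block" model, not the notes' own code), the sketch below writes each block to two copies in order and, on recovery, repairs an inconsistent pair using a checksum.

import hashlib

def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def output(block: bytes, copies: list) -> None:
    """Two-copy output: the second write starts only after the first completes."""
    copies[0] = (block, checksum(block))
    copies[1] = (block, checksum(block))

def recover(copies: list) -> None:
    """Repair a pair of copies that may have diverged during a crash."""
    (d0, c0), (d1, c1) = copies
    if checksum(d0) != c0:                      # first copy is damaged
        copies[0] = copies[1]
    elif checksum(d1) != c1 or d0 != d1:        # second damaged, or merely stale
        copies[1] = copies[0]

# simulate a crash after copy 0 was updated but before copy 1 was
copies = [(b"new value", checksum(b"new value")),
          (b"old value", checksum(b"old value"))]
recover(copies)
print(copies[0][0], copies[1][0])               # b'new value' b'new value'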
DATA ACCESS
Physical blocks are those blocks residing on the disk. System buffer blocks are the blocks residing temporarily in main memory.
Block movements between disk and main memory are initiated through the following two operations:
• input(B) transfers the physical block B to main memory.
• output(B) transfers the buffer block B to the disk, and replaces the appropriate physical block there.
We assume, for simplicity, that each data item fits in, and is stored inside, a single block.
Each transaction Ti has its private work-area in which local copies of all data items accessed and updated by it are kept. Ti's local copy of a data item X is denoted by xi, and BX denotes the block containing X.
Transferring data items between the system buffer blocks and a transaction's private work-area is done by:
• read(X), which assigns the value of data item X to the local variable xi.
• write(X), which assigns the value of the local variable xi to data item X in the buffer block.
Transactions:
• must perform read(X) before accessing X for the first time (subsequent reads can be from the local copy);
• may execute write(X) at any time before the transaction commits.
Note that output(BX) need not immediately follow write(X). The system can perform the output operation when it sees fit.

Lock-Based Protocols
A lock is a mechanism to control concurrent access to a data item. Data items can be locked in two modes:
1. exclusive (X) mode: the data item can be both read and written. An X-lock is requested using the lock-X instruction.
2. shared (S) mode: the data item can only be read. An S-lock is requested using the lock-S instruction.
Lock requests are made to the concurrency-control manager. A transaction can proceed only after the request is granted.
Lock-compatibility matrix
[Figure: lock-compatibility matrix. S is compatible with S but not with X; X is compatible with neither S nor X.]
1) A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions.
2) Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on the item.
3) If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other transactions have been released. The lock is then granted.
Example of a transaction performing locking:
T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)
Locking as above is not sufficient to guarantee serializability: if A and B get updated in between the read of A and the read of B, the displayed sum would be wrong.
A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules.
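A minimal sketch of these granting rules (illustrative only; a real lock manager queues waiters instead of raising an error) might look like this:

class LockManager:
    """Grants S/X locks per item according to the compatibility rules above."""

    def __init__(self):
        self.locks = {}            # item -> {"mode": "S" or "X", "holders": set()}

    def lock(self, txn, item, mode):
        entry = self.locks.get(item)
        if entry is None:
            self.locks[item] = {"mode": mode, "holders": {txn}}
        elif mode == "S" and entry["mode"] == "S":
            entry["holders"].add(txn)          # any number of shared holders
        else:
            raise RuntimeError(f"{txn} must wait for {item}")   # incompatible

    def unlock(self, txn, item):
        entry = self.locks[item]
        entry["holders"].discard(txn)
        if not entry["holders"]:
            del self.locks[item]

lm = LockManager()
lm.lock("T1", "A", "S")
lm.lock("T2", "A", "S")       # allowed: S is compatible with S
try:
    lm.lock("T3", "A", "X")   # refused: X conflicts with the existing S locks
except RuntimeError as e:
    print(e)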
Consider the following partial schedule:
[Schedule figure not shown: T3 holds a lock on B and requests lock-X(A), while T4 holds a lock on A and requests lock-S(B).]
Neither T3 nor T4 can make progress: executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A. Such a situation is called a deadlock. To handle a deadlock, one of T3 or T4 must be rolled back and its locks released.
The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil.
Starvation is also possible if the concurrency control manager is badly designed. For example:
a. A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same item.
b. The same transaction is repeatedly rolled back due to deadlocks.
The concurrency control manager can be designed to prevent starvation.

THE TWO-PHASE LOCKING PROTOCOL
1. This is a protocol which ensures conflict-serializable schedules.
2. Phase 1: Growing Phase
   a. the transaction may obtain locks
   b. the transaction may not release locks
3. Phase 2: Shrinking Phase
   a. the transaction may release locks
   b. the transaction may not obtain locks
4. The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e. the point where a transaction acquired its final lock).
5. Two-phase locking does not ensure freedom from deadlocks.
6. Cascading roll-back is possible under two-phase locking. To avoid this, follow a modified protocol called strict two-phase locking: here a transaction must hold all its exclusive locks till it commits/aborts.
7. Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit.
8. There can be conflict serializable schedules that cannot be obtained if two-phase locking is used.
9. However, in the absence of extra information (e.g., ordering of access to data), two-phase locking is needed for conflict serializability in the following sense: given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable.

TIMESTAMP-BASED PROTOCOLS
1. Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has time-stamp TS(Ti), a new transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti) < TS(Tj).
2. The protocol manages concurrent execution such that the time-stamps determine the serializability order.
3. In order to assure such behavior, the protocol maintains for each data item Q two timestamp values:
   a. W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q) successfully.
   b. R-timestamp(Q) is the largest time-stamp of any transaction that executed read(Q) successfully.
4. The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in timestamp order.
5. Suppose a transaction Ti issues read(Q):
   a. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.
   b. If TS(Ti) >= W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti)).
6. Suppose that transaction Ti issues write(Q):
   a. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the write operation is rejected, and Ti is rolled back.
   b. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation is rejected, and Ti is rolled back.
   c. Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
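The read/write tests above translate almost directly into code. The sketch below (illustrative, not the notes' own) keeps R- and W-timestamps per item and raises an exception to signal a rollback.

class Rollback(Exception):
    pass

class TimestampOrdering:
    def __init__(self):
        self.r_ts = {}    # item -> largest timestamp that read it
        self.w_ts = {}    # item -> largest timestamp that wrote it
        self.value = {}

    def read(self, ts, item):
        if ts < self.w_ts.get(item, 0):
            raise Rollback(f"T{ts} reads {item} too late")       # rule 5a
        self.r_ts[item] = max(self.r_ts.get(item, 0), ts)        # rule 5b
        return self.value.get(item)

    def write(self, ts, item, val):
        if ts < self.r_ts.get(item, 0) or ts < self.w_ts.get(item, 0):
            raise Rollback(f"T{ts} writes {item} too late")      # rules 6a, 6b
        self.w_ts[item] = ts                                     # rule 6c
        self.value[item] = val

db = TimestampOrdering()
db.write(2, "Q", "from T2")       # W-timestamp(Q) = 2
try:
    db.write(1, "Q", "from T1")   # older transaction: rejected, T1 rolled back
except Rollback as e:
    print(e)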
Thomas' Write Rule
1. We now present a modification to the timestamp-ordering protocol that allows greater potential concurrency than the timestamp-ordering protocol itself. Let us consider schedule 4 of the figure below and apply the timestamp-ordering protocol. Since T27 starts before T28, we shall assume that TS(T27) < TS(T28). The read(Q) operation of T27 succeeds, as does the write(Q) operation of T28. When T27 attempts its write(Q) operation, we find that TS(T27) < W-timestamp(Q), since W-timestamp(Q) = TS(T28). Thus, the write(Q) by T27 is rejected and transaction T27 must be rolled back.
2. Although the rollback of T27 is required by the timestamp-ordering protocol, it is unnecessary. Since T28 has already written Q, the value that T27 is attempting to write is one that will never need to be read. Any transaction Ti with TS(Ti) < TS(T28) that attempts a read(Q) will be rolled back, since TS(Ti) < W-timestamp(Q).
3. Any transaction Tj with TS(Tj) > TS(T28) must read the value of Q written by T28, rather than the value that T27 is attempting to write. This observation leads to a modified version of the timestamp-ordering protocol in which obsolete write operations can be ignored under certain circumstances. The protocol rules for read operations remain unchanged. The protocol rules for write operations, however, are slightly different from the timestamp-ordering protocol.
[Figure not shown: schedule 4, in which T27 reads Q, T28 writes Q, and T27 then attempts to write Q.]
The modification to the timestamp-ordering protocol, called Thomas' write rule, is this: suppose that transaction Ti issues write(Q).
1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was previously needed, and it had been assumed that the value would never be produced. Hence, the system rejects the write operation and rolls Ti back.
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation can be ignored.
3. Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).

VALIDATION-BASED PROTOCOLS
Phases in Validation-Based Protocols
1) Read phase. During this phase, the system executes transaction Ti. It reads the values of the various data items and stores them in variables local to Ti. It performs all write operations on temporary local variables, without updating the actual database.
2) Validation phase. The validation test is applied to transaction Ti. This determines whether Ti is allowed to proceed to the write phase without causing a violation of serializability. If a transaction fails the validation test, the system aborts the transaction.
3) Write phase. If the validation test succeeds for transaction Ti, the temporary local variables that hold the results of any write operations performed by Ti are copied to the database. Read-only transactions omit this phase.

MODES IN VALIDATION-BASED PROTOCOLS
1. Start(Ti)
2. Validation(Ti)
3. Finish(Ti)

MULTIPLE GRANULARITY
Multiple granularity locking (MGL) is a locking method used in database management systems (DBMS) and relational databases.
In MGL, locks are set on objects that contain other objects. MGL exploits the hierarchical nature of the "contains" relationship. For example, a database may have files, which contain pages, which further contain records. This can be thought of as a tree of objects, where each node contains its children. A lock on a node, such as a shared or exclusive lock, locks the targeted node as well as all of its descendants.
Multiple granularity locking is usually used with non-strict two-phase locking to guarantee serializability. The multiple-granularity locking protocol uses these lock modes to ensure serializability. It requires that a transaction Ti that attempts to lock a node Q must follow these rules:
• Transaction Ti must observe the lock-compatibility function of the figure above.
• Transaction Ti must lock the root of the tree first, and can lock it in any mode.
• Transaction Ti can lock a node Q in S or IS mode only if Ti currently has the parent of Q locked in either IX or IS mode.
• Transaction Ti can lock a node Q in X, SIX, or IX mode only if Ti currently has the parent of Q locked in either IX or SIX mode.
• Transaction Ti can lock a node only if Ti has not previously unlocked any node (that is, Ti is two-phase).
• Transaction Ti can unlock a node Q only if Ti currently has none of the children of Q locked.
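A compact way to express the parent-mode rules is a table lookup, as in the sketch below (an illustration with assumed data structures, not the notes' own code).

# which modes the parent must hold for a lock on the child to be requested
REQUIRED_PARENT_MODES = {
    "S":   {"IS", "IX"},
    "IS":  {"IS", "IX"},
    "X":   {"IX", "SIX"},
    "SIX": {"IX", "SIX"},
    "IX":  {"IX", "SIX"},
}

def may_lock(node_mode, parent_mode, is_root=False):
    """Check the parent-mode rule of multiple granularity locking."""
    if is_root:
        return True                       # the root can be locked in any mode
    return parent_mode in REQUIRED_PARENT_MODES[node_mode]

print(may_lock("S", "IS"))     # True: a parent held in IS allows S on the child
print(may_lock("X", "IS"))     # False: X below requires IX or SIX on the parent
print(may_lock("IX", "SIX"))   # True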
STORAGE AND INDEXING: Database file organization, file organization on disk, heap files and sorted files, hashing, single and multi-level indexes, dynamic multilevel indexing using B-Tree and B+ tree, index on multiple keys.

DATABASE FILE ORGANIZATION:
Records and Record Types:
Data is usually stored in the form of records. Each record consists of a collection of related data values or items, where each value is formed of one or more bytes and corresponds to a particular field of the record. Records usually describe entities and their attributes. A collection of field names and their corresponding data types constitutes a record type or record format definition. A data type, associated with each field, specifies the types of values a field can take.
In recent database applications, the need may arise for storing data items that consist of large unstructured objects, which represent images, digitized video or audio streams, or free text. These are referred to as BLOBs (Binary Large Objects). A BLOB data item is typically stored separately from its record in a pool of disk blocks, and a pointer to the BLOB is included in the record.

Files, Fixed-length Records, and Variable-length Records:
A file is a sequence of records. In many cases, all records in a file are of the same record type. If every record in the file has exactly the same size (in bytes), the file is said to be made up of fixed-length records. If different records in the file have different sizes, the file is said to be made up of variable-length records. A file may have variable-length records for several reasons:
1. The file records are of the same record type, but
   • one or more of the fields are of varying size (variable-length fields), or
   • one or more of the fields are optional, or
   • one or more of the fields may have multiple values for individual records; such a field is called a repeating field and a group of values for the field is often called a repeating group.
2. The file contains records of different record types and hence of varying size (mixed file).

File Headers
A file header or file descriptor contains information about a file that is needed by the system programs that access the file records. The header includes information to determine the disk addresses of the file blocks as well as record format descriptions.
To search for a record on disk, one or more blocks are copied into main memory buffers. Programs then search for the desired record or records within the buffers, using the information in the file header. If the address of the block that contains the desired record is not known, the search programs must do a linear search through the file blocks.

Record Blocking and Spanned Versus Unspanned Records:
The records of a file must be allocated to disk blocks because a block is the unit of data transfer between disk and memory. When the block size is larger than the record size, each block will contain numerous records, although some files may have unusually large records that cannot fit in one block.
Suppose that the block size is B bytes. For a file of fixed-length records of size R bytes, with B > R, we can fit bfr = floor(B/R) records per block, where floor(x) rounds down the number x to an integer. The value bfr is called the blocking factor for the file. In general, R may not divide B exactly, so we have some unused space in each block equal to B - (bfr * R) bytes.
To utilize this unused space, we can store part of a record on one block and the rest on another. A pointer at the end of the first block points to the block containing the remainder of the record in case it is not the next consecutive block on disk. This organization is called spanned, because records can span more than one block. Whenever a record is larger than a block, we must use a spanned organization. If records are not allowed to cross block boundaries, the organization is called unspanned. This is used with fixed-length records having B > R because it makes each record start at a known location in the block, simplifying record processing. For variable-length records, either a spanned or an unspanned organization can be used. If the average record is large, it is advantageous to use spanning to reduce the lost space in each block.
For variable-length records using a spanned organization, each block may store a different number of records. In this case, the blocking factor bfr represents the average number of records per block for the file. We can use bfr to calculate the number of blocks b needed for a file of r records: b = ceil(r/bfr) blocks, where ceil(x) rounds the value x up to the next integer.
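A small worked example of these formulas (the block size, record size, and record count below are made up for illustration):

import math

B = 512        # block size in bytes (assumed)
R = 75         # fixed record size in bytes (assumed)
r = 30000      # number of records in the file (assumed)

bfr = B // R                   # blocking factor: floor(B/R) = 6 records per block
unused = B - bfr * R           # 512 - 6*75 = 62 bytes wasted per block (unspanned)
b = math.ceil(r / bfr)         # blocks needed: ceil(30000/6) = 5000

print(bfr, unused, b)          # 6 62 5000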
File Operations:
Operations on database files can be broadly classified into two categories:
• Update Operations
• Retrieval Operations
Update operations change the data values by insertion, deletion, or update. Retrieval operations, on the other hand, do not alter the data but retrieve them after optional conditional filtering. In both types of operations, selection plays a significant role. Other than creation and deletion of a file, several operations can be done on files:
• Open: A file can be opened in one of two modes, read mode or write mode. In read mode, the operating system does not allow anyone to alter data; in other words, data is read only. Files opened in read mode can be shared among several entities. Write mode allows data modification. Files opened in write mode can be read but cannot be shared.
• Locate: Every file has a file pointer, which tells the current position where the data is to be read or written. This pointer can be adjusted accordingly. Using the find (seek) operation, it can be moved forward or backward.
• Read: By default, when files are opened in read mode, the file pointer points to the beginning of the file. There are options where the user can tell the operating system where to locate the file pointer at the time of opening a file. The data immediately following the file pointer is read.
• Reset: Sets the file pointer of an open file to the beginning of the file.
• FindNext: Searches for the next record in the file that satisfies the search condition. Transfers the block containing that record into a main memory buffer (if it is not already there). The record is located in the buffer and becomes the current record.
• Delete: Deletes the current record and (eventually) updates the file on disk to reflect the deletion.
• Modify: Modifies some field values for the current record and (eventually) updates the file on disk to reflect the modification.
• Insert: Inserts a new record in the file by locating the block where the record is to be inserted, transferring that block into a main memory buffer (if it is not already there), writing the record into the buffer, and (eventually) writing the buffer to disk to reflect the insertion.
• Write: The user can choose to open a file in write mode, which enables them to edit its contents. The edit can be a deletion, insertion, or modification. The file pointer can be located at the time of opening or can be dynamically changed if the operating system allows it.
• Close: This is the most important operation from the operating system's point of view. When a request to close a file is generated, the operating system
   o removes all the locks (if in shared mode),
   o saves the data (if altered) to the secondary storage media, and
   o releases all the buffers and file handlers associated with the file.
The organization of data inside a file plays a major role here. The process of locating the file pointer at a desired record inside a file varies based on whether the records are arranged sequentially or clustered.

FILES OF UNORDERED RECORDS (HEAP FILES):
In this type of organization, records are placed in the file in the order in which they are inserted, so new records are inserted at the end of the file. This organization is often used with additional access paths, such as secondary indexes.
Inserting: Inserting a new record is very efficient: the last disk block of the file is copied into a buffer, the new record is added, and the block is then rewritten back to disk. The address of the last file block is kept in the file header.
Searching: Searching for a record using any search condition involves a linear search through the file, block by block, which is an expensive procedure. If only one record satisfies the search condition, then, on the average, a program will read into memory and search half the file blocks before it finds the record. For a file of b blocks, this requires searching (b/2) blocks on average, and b blocks in the worst case.
Deleting: To delete a record, a program must first find its block, copy the block into a buffer, then delete the record from the buffer, and finally rewrite the block back to the disk. This leaves unused space in the disk block. Deleting a large number of records in this way results in wasted storage space.
Another technique used for record deletion is to have an extra byte or bit, called a deletion marker, stored with each record. A record is deleted by setting the deletion marker to a certain value. A different value of the marker indicates a valid (not deleted) record. Search programs consider only valid records in a block when conducting their search. Both of these deletion techniques require periodic reorganization of the file to reclaim the unused space of deleted records. During reorganization, the file blocks are accessed consecutively, and records are packed by removing deleted records.
Another possibility is to use the space of deleted records when inserting new records, although this requires extra bookkeeping to keep track of empty locations.
Reading: To read all records in order of the values of some field, we create a sorted copy of the file. Sorting is an expensive operation for a large disk file, and special techniques for external sorting are used.

FILES OF ORDERED RECORDS (SORTED FILES):
We can physically order the records of a file on disk based on the values of one of their fields, called the ordering field. This leads to an ordered or sequential file. If the ordering field is also a key field of the file (a field guaranteed to have a unique value in each record), then the field is called the ordering key for the file. The figure shows an ordered file with NAME as the ordering key field.
Ordered records have some advantages over unordered files:
1. Reading the records in order of the ordering key values becomes extremely efficient, because no sorting is required.
2. Finding the next record from the current one in order of the ordering key usually requires no additional block accesses, because the next record is in the same block as the current one.
3. Using a search condition based on the value of an ordering key field results in faster access when the binary search technique is used, which constitutes an improvement over linear searches.
A binary search for disk files can be done on the blocks rather than on the records. Suppose that the file has b blocks numbered 1, 2, ..., b; the records are ordered by ascending value of their ordering key field; and we are searching for a record whose ordering key field value is K. Assuming that the disk addresses of the file blocks are available in the file header, a binary search usually accesses log2(b) blocks, whether the record is found or not. This is an improvement over linear searches, where, on the average, (b/2) blocks are accessed when the record is found and b blocks are accessed when the record is not found.
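For illustration only (the notes do not give code for this), a binary search over blocks might look like the sketch below, where each "block" is modelled as a list of records already sorted on the ordering key.

def binary_search_blocks(blocks, key):
    """blocks: list of blocks; each block is a list of records sorted by 'key'.
    Returns (block_no, record) or None, accessing about log2(b) blocks."""
    lo, hi = 0, len(blocks) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]                      # one block access
        if key < block[0]["key"]:
            hi = mid - 1                         # K precedes this block
        elif key > block[-1]["key"]:
            lo = mid + 1                         # K follows this block
        else:                                    # K, if present, is in this block
            for rec in block:
                if rec["key"] == key:
                    return mid, rec
            return None
    return None

# three blocks of an ordered file, with NAME used as the ordering key
blocks = [[{"key": "Adams"}, {"key": "Baker"}],
          [{"key": "Chin"},  {"key": "Davis"}],
          [{"key": "Evans"}, {"key": "Ford"}]]
print(binary_search_blocks(blocks, "Davis"))     # (1, {'key': 'Davis'})
print(binary_search_blocks(blocks, "Zed"))       # None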
Database Management Systems Prof. B. Satyanarayana Reddy Database Management Systems Prof. B. Satyanarayana Reddy

among them. For record deletion, the problem is less severe if deletion markers and periodic reorganization are
used.
One option for making insertion more efficient is to keep some unused space in each block for new records.
However, once this space is used up, the original problem resurfaces.
Modifying: Modifying a field value of a record depends on two factors: (1) the search condition to locate the
record and (2) the field to be modified. If the search condition involves the ordering key field, we can locate the
record using a binary search; otherwise we must do a linear search. A non ordering field can be modified by
changing the record and rewriting it in the same physical location on disk-assuming fixed-length records.
Modifying the ordering field means that the record can change its position in the file, which requires deletion of
the old record followed by insertion of the modified record.
Reading: Reading the file records in order of the ordering field is quite efficient if we ignore the records in
overflow, since the blocks can be read consecutively using double buffering. To include the records in
overflow, we must merge them in their correct positions; in this case, we can first reorganize the file, and then
read its blocks sequentially.
Ordered files are rarely used in database applications unless an additional access path, called a primary index, is
used; this results in an indexed sequential file. This further improves the random access time on the ordering
key field. If Ordering attribute is not key then the file is Clustered file.

HASHING:
Another type of primary file organization is based on hashing, which provides very fast access to
records on certain search conditions. This organization is usually called a hash file. The search condition must
be an equality condition on a single field, called the hash field of the file. In most cases, the hash field is also a
key field of the file, in which case it is called the hash key. The idea behind hashing is to provide a function h,
called a hash function or randomizing function, that is applied to the hash field value of a record and yields the
address of the disk block in which the record is stored.
Internal Hashing:
For internal files, hashing is typically implemented as a hash table through the use of an array of records.
For array index range is from 0 to M - 1 have M slots whose addresses correspond to the array indexes. We
choose a hash function that transforms the hash field value into an integer between 0 and M - 1.One common
hash function is the h(K) = K mod M function.

Non-integer hash field values can be transformed into integers before the mod function is applied. For character strings, the numeric (ASCII) codes associated with the characters can be used in the transformation, for example by multiplying those code values.
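As an illustration only (not part of the original notes), a minimal Python sketch of the internal hashing idea above; the table size M, the sample keys, and the exact string-to-integer transformation are assumptions:

# Minimal sketch of internal hashing: h(K) = K mod M, with non-integer
# (string) hash field values first transformed into integers via their
# character (ASCII) codes.  M and the sample keys below are assumptions.

M = 11  # number of slots in the internal hash table (assumed)

def to_integer(key):
    """Transform a non-integer key into an integer using character codes."""
    if isinstance(key, int):
        return key
    # weighted sum of ASCII codes (one possible transformation, assumed)
    return sum((i + 1) * ord(ch) for i, ch in enumerate(str(key)))

def h(key, m=M):
    """The hash function h(K) = K mod M applied to the transformed key."""
    return to_integer(key) % m

if __name__ == "__main__":
    for k in (1234, 5678, "Smith", "Wong"):
        print(k, "->", h(k))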


Other hashing functions can be used. One technique, called folding, involves applying an arithmetic function such as addition or a logical function such as exclusive-or to different portions of the hash field value to calculate the hash address. Another technique involves picking some digits of the hash field value (for example, the third, fifth, and eighth digits) to form the hash address. The problem with most hashing functions is that they do not guarantee that distinct values will hash to distinct addresses, because the hash field space (the number of possible values a hash field can take) is usually much larger than the address space (the number of available addresses for records). The hashing function maps the hash field space to the address space.
A collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record. In this situation, we must insert the new record in some other position, since its hash address is occupied. The process of finding another position is called collision resolution. There are numerous methods for collision resolution, including the following:
• Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
• Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location. A linked list of overflow records for each hash address is thus maintained.
• Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.
Each collision resolution method requires its own algorithms for insertion, retrieval, and deletion of records. The goal of a good hashing function is to distribute the records uniformly over the address space so as to minimize collisions while not leaving many unused locations.

External Hashing for Disk Files:
Hashing for disk files is called external hashing. To suit the characteristics of disk storage, the target address space is made of buckets, each of which holds multiple records. A bucket is either one disk block or a cluster of contiguous blocks. The hashing function maps a key into a relative bucket number, rather than assigning an absolute block address to the bucket. A table maintained in the file header converts the bucket number into the corresponding disk block address.
The collision problem is less severe with buckets, because as many records as will fit in a bucket can hash to the same bucket without causing problems. However, we must make provisions for the case where a bucket is filled to capacity and a new record being inserted hashes to that bucket. We can use a variation of chaining in which a pointer is maintained in each bucket to a linked list of overflow records for the bucket. The pointers in the linked list should be record pointers, which include both a block address and a relative record position within the block.
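For illustration (not from the notes), a rough Python sketch of external hashing with a fixed number of buckets and a per-bucket overflow chain; the bucket count, the bucket capacity, and the sample keys are assumptions:

# Sketch of external hashing: records hash to one of M buckets (each standing
# in for a disk block); when a bucket is full, new records go to its overflow
# chain, mimicking the chaining variation described above.

M = 4            # number of buckets allocated (assumed)
CAPACITY = 2     # records that fit in one bucket, i.e. the blocking factor (assumed)

buckets = [[] for _ in range(M)]     # main buckets
overflow = [[] for _ in range(M)]    # overflow chain per bucket

def insert(key):
    b = key % M                      # relative bucket number
    target = buckets[b] if len(buckets[b]) < CAPACITY else overflow[b]
    target.append(key)

def search(key):
    b = key % M
    return key in buckets[b] or key in overflow[b]

if __name__ == "__main__":
    for k in (3, 7, 11, 15, 19, 23):   # all hash to bucket 3, forcing overflow
        insert(k)
    print(buckets, overflow, search(15))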

The hashing scheme described is called static hashing because a fixed number of buckets M is allocated. This
can be a serious drawback for dynamic files. Suppose that we allocate M buckets for the address space and let m be the maximum number of records that can fit in one bucket; then at most (m * M) records will fit in the allocated space. If the number of records turns out to be substantially fewer than (m * M), we are left with a lot of
unused space. On the other hand, if the number of records increases to substantially more than (m * M),
numerous collisions will result and retrieval will be slowed down because of the long lists of overflow records.
In either case, we may have to change the number of blocks M allocated and then use a new hashing
function (based on the new value of M) to redistribute the records. These reorganizations can be quite time
consuming for large files.

When using external hashing, searching for a record given a value of some field other than the hash field
is as expensive as in the case of an unordered file. Record deletion can be implemented by removing the record
from its bucket. If the bucket has an overflow chain, we can move one of the overflow records into the bucket to
replace the deleted record. If the record to be deleted is already in overflow, we simply remove it from the
linked list.


Hashing Techniques That Allow Dynamic File Expansion:
A major drawback of the static hashing scheme just discussed is that the hash address space is fixed. Hence, it is difficult to expand or shrink the file dynamically. The techniques used in dynamic hashing are:
1. Extendible Hashing
2. Dynamic Hashing
3. Linear Hashing
These hashing schemes take advantage of the fact that the result of applying a hashing function is a nonnegative integer and hence can be represented as a binary number. The access structure is built on the binary representation of the hashing function result, which is a string of bits. We call this the hash value of a record. Records are distributed among buckets based on the values of the leading bits in their hash values.

Extendible Hashing: In extendible hashing, a type of directory (an array of 2^d bucket addresses) is maintained, where d is called the global depth of the directory. The integer value corresponding to the first (high-order) d bits of a hash value is used as an index to the array to determine a directory entry, and the address in that entry determines the bucket in which the corresponding records are stored. However, there does not have to be a distinct bucket for each of the 2^d directory locations. Several directory locations with the same first d' bits for their hash values may contain the same bucket address if all the records that hash to these locations fit in a single bucket. A local depth d', stored with each bucket, specifies the number of bits on which the bucket contents are based.
Bucket splitting: Suppose that a new inserted record causes overflow in the bucket whose hash values start with 01 (the third bucket); the records will be distributed between two buckets: the first contains all records whose hash values start with 010, and the second all those whose hash values start with 011. Now the two directory locations for 010 and 011 point to the two new distinct buckets. Before the split, they pointed to the same bucket. The local depth d' of the two new buckets is 3, which is one more than the local depth of the old bucket.
Overflow: If a bucket that overflows and is split used to have a local depth d' equal to the global depth d of the directory, then the size of the directory must now be doubled so that we can use an extra bit to distinguish the two new buckets. For example, if the bucket for records whose hash values start with 111 overflows, the two new buckets need a directory with global depth d = 4, because the two buckets are now labeled 1110 and 1111, and hence their local depths are both 4. The directory size is hence doubled, and each of the other original locations in the directory is also split into two locations, both of which have the same pointer value as did the original location.
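A small illustrative Python sketch (not from the notes) of the directory lookup described above; the hash-value width, the global depth, and the directory contents are assumptions:

# Sketch of an extendible-hashing directory lookup: the first (high-order)
# d bits of the hash value index an array of 2**d bucket addresses.
# The bit width, global depth d, and bucket names are assumptions.

HASH_BITS = 8     # pretend hash values are 8-bit strings (assumed)
d = 3             # global depth of the directory (assumed)

def hash_value(key):
    """Binary representation of the hashing result, as described above."""
    return key % (1 << HASH_BITS)

def directory_index(key, global_depth=d):
    """Integer value of the first (high-order) global_depth bits."""
    return hash_value(key) >> (HASH_BITS - global_depth)

# Directory of 2**d entries; several entries can share a bucket address
# when the bucket's local depth d' is smaller than d (buckets are assumed).
directory = ["bucket_A"] * 2 + ["bucket_B"] * 2 + ["bucket_C"] * 4

if __name__ == "__main__":
    for k in (5, 77, 200, 255):
        i = directory_index(k)
        print(k, "-> directory entry", i, "->", directory[i])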


The main advantage of extendible hashing that makes it attractive is that the performance of the file does not degrade as the file grows, as opposed to static external hashing, where collisions increase and the corresponding chaining causes additional accesses. In addition, no space is allocated in extendible hashing for future growth, but additional buckets can be allocated dynamically as needed. Another advantage is that splitting causes minor reorganization in most cases, since only the records in one bucket are redistributed to the two new buckets.
A disadvantage is that the directory must be searched before accessing the buckets themselves, resulting in two block accesses instead of one in static hashing. This performance penalty is considered minor.

Dynamic Hashing:
A precursor to extendible hashing was dynamic hashing, in which the addresses of the buckets are either the n higher-order bits or the (n - 1) higher-order bits, depending on the total number of keys belonging to the respective bucket. The eventual storage of records in buckets for dynamic hashing is somewhat similar to extendible hashing. The major difference is in the organization of the directory. Whereas extendible hashing uses the notion of global depth (the higher-order d bits) for the flat directory and then combines adjacent collapsible buckets into a bucket of local depth d - 1, dynamic hashing maintains a tree-structured directory with two types of nodes:
• Internal nodes that have two pointers - the left pointer corresponds to the 0 bit (in the hashed address) and the right pointer corresponds to the 1 bit.
• Leaf nodes - these hold a pointer to the actual bucket with records.
An example of dynamic hashing appears as shown in the figure. Four buckets are shown ("000", "001", "110" and "111") with higher-order 3-bit addresses (corresponding to the global depth of 3), and two buckets ("01" and "10") are shown with higher-order 2-bit addresses (corresponding to the local depth of 2). The latter two are the result of collapsing "010" and "011" into "01" and collapsing "100" and "101" into "10". Note that the directory nodes are used implicitly to determine the local and global depths of buckets in dynamic hashing.
The search for a record, given the hashed address, involves traversing the directory tree, which leads to the bucket holding that record.

Linear Hashing:
The idea behind linear hashing is to allow a hash file to expand and shrink its number of buckets dynamically without needing a directory. Suppose that the file starts with M buckets numbered 0, 1, ..., M - 1 and uses the mod hash function h(K) = K mod M; this hash function is called the initial hash function hi. Overflow because of collisions is still needed and can be handled by maintaining individual overflow chains for each bucket. However, when a collision leads to an overflow record in any file bucket, the first bucket in the file (bucket 0) is split into two buckets: the original bucket 0 and a new bucket M at the end of the file. The records originally in bucket 0 are distributed between the two buckets based on a different hashing function hi+1(K) = K mod 2M. A key property of the two hash functions hi and hi+1 is that any records that hashed to bucket 0 based on hi will hash to either bucket 0 or bucket M based on hi+1; this is necessary for linear hashing to work.
As further collisions lead to overflow records, additional buckets are split in the linear order 1, 2, 3, .... If enough overflows occur, all the original file buckets 0, 1, ..., M - 1 will have been split, so the file now has 2M instead of M buckets, and all buckets use the hash function hi+1. Hence, the records in overflow are eventually redistributed into regular buckets, using the function hi+1 via a delayed split of their buckets. There is no directory; only a value n, which is initially set to 0 and is incremented by 1 whenever a split occurs, is needed to determine which buckets have been split. To retrieve a record with hash key value K, first apply the function hi to K; if hi(K) < n, then apply the function hi+1 on K because the bucket is already split. Initially, n = 0, indicating that the function hi applies to all buckets; n grows linearly as buckets are split.
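To make the retrieval rule concrete, here is a small Python sketch (illustrative only, not from the notes); the values of M, n, and the sample keys are assumptions:

# Sketch of the linear-hashing address computation: apply h_i(K) = K mod M,
# and if that bucket has already been split (h_i(K) < n), re-hash with
# h_{i+1}(K) = K mod 2M.  M, n, and the sample keys are assumptions.

def bucket_for_key(key, M, n):
    b = key % M            # h_i
    if b < n:              # bucket b was already split, so use h_{i+1}
        b = key % (2 * M)
    return b

if __name__ == "__main__":
    M, n = 4, 2            # 4 original buckets; buckets 0 and 1 already split
    for k in (8, 9, 10, 12, 13):
        print(k, "-> bucket", bucket_for_key(k, M, n))
    # key 12 hashes to bucket 0 under h_i, but to the new bucket 4 under h_{i+1}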

Splitting can be controlled by monitoring the file load factor instead of by splitting whenever an
overflow occurs. In general, the file load factor l can be defined as l = r/(bfr * N), where r is the current number
of file records, bfr is the maximum number of records that can fit in a bucket, and N is the current number of
file buckets. Buckets that have been split can also be recombined if the load of the file falls below a certain
threshold. Blocks are combined linearly, and N is decremented appropriately. The file load can be used to
trigger both splits and combinations; in this manner the file load can be kept within a desired range.
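A tiny illustrative Python sketch of this load-factor test (not from the notes); the thresholds and sample numbers are assumptions:

# Compute the file load factor l = r / (bfr * N) and use it to decide
# whether to split or combine buckets; thresholds and values are assumed.

def load_factor(r, bfr, N):
    """r = current records, bfr = records per bucket, N = current buckets."""
    return r / (bfr * N)

if __name__ == "__main__":
    r, bfr, N = 190, 10, 20
    l = load_factor(r, bfr, N)
    print("load factor:", l)
    if l > 0.9:
        print("split the next bucket in linear order")
    elif l < 0.6:
        print("combine buckets and decrement N")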

Single-Level Ordered Indexes:


An ordered index access structure is similar to that behind the index used in a textbook, which lists
important terms at the end of the book in alphabetical order along with a list of page numbers where the term
appears in the book. We can search an index to find a list of addresses page numbers in this case-and use these
addresses to locate a term in the textbook by searching the specified pages. Without index, searching for a
phrase in a text book by shifting through whole textbook is like linear search.
For a file with a given record structure consisting of several fields (or attributes), an index access
structure is usually defined on a single field of a file, called an indexing field (or indexing attribute). The index
typically stores each value of the index field along with a list of pointers to all disk blocks that contain records
with that field value. The values in the index are ordered so that we can do a binary search on the index. The
index file is much smaller than the data file, so searching the index using a binary search is reasonably efficient.


There are several types of ordered indexes.


A primary index is specified on the ordering key field of an ordered file of records. An ordering key field is
used to physically order the file records on disk, and every record has a unique value for that field. Primary
index works for the records having a key field.
A clustering index is used if numerous records in the file can have the same value for the ordering field. A file
can have at most one physical ordering field, so it can have at most one primary index or one clustering index,
but not both.
A secondary index can be specified on any non-ordering field of a file. A file can have several secondary
indexes in addition to its primary access method.

PRIMARY INDEX:
A primary index is an ordered file whose records are of fixed length with two fields. The first field is of
the same data type as the ordering key field-called the primary key-of the data file, and the second field is a
pointer to a disk block (a block address). There is one index entry (or index record) in the index file for each
block in the data file. Each index entry has the value of the primary key field for the first record in a block and a
pointer to that block as its two field values. We will refer to the two field values of index entry i as < K(i),P(i) >.
To create a primary index on the ordered file introduced earlier, we use the NAME field as the primary key, because that is the ordering key field of the file (assuming that each value of NAME is unique). Each entry in the index has a NAME value and a pointer. The first three index entries are as follows:
<K(1) = (Aaron, Ed), P(1) = address of block 1>
<K(2) = (Adams, John), P(2) = address of block 2>
<K(3) = (Alexander, Ed), P(3) = address of block 3>
The figure below illustrates this primary index.
The total number of entries in the index is the same as the number of disk blocks in the ordered data file.
The first record in each block of the data file is called the anchor record of the block, or simply the block
anchor.
Indexes can also be characterized as dense or sparse.
A dense index has an index entry for every search key value (and hence every record) in the data file.
A sparse (or non-dense) index on the other hand, has index entries for only some of the search values.
A primary index is hence a non-dense (sparse) index, since it includes an entry for each disk block of the
data file and the keys of its anchor record rather than for every search value. The index file for a primary index
needs substantially fewer blocks than does the data file, for two reasons.
1. There are fewer index entries than there are records in the data file.
2. Each index entry is typically smaller in size than a data record because it has only two fields;
consequently, more index entries than data records can fit in one block. Hence, searching the index requires fewer block accesses than searching the data file.
The binary search for an ordered data file requires log2(b) block accesses, where b is the number of blocks in the data file. But if the primary index file occupies bi blocks, then locating a record with a given search key value requires a binary search of that index plus an access to the block containing that record: a total of log2(bi) + 1 block accesses.
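Purely as an illustration (not from the notes), a minimal Python sketch of the binary search over such a primary index, where each entry <K(i), P(i)> holds a block-anchor key and a block pointer; the index entries below are assumptions:

# Binary search on a primary (sparse) index: find the rightmost anchor key
# K(i) that is <= the search key; the pointed-to block may hold the record.
# The anchor keys and block names below are assumptions.
import bisect

index_keys = ["Aaron,Ed", "Adams,John", "Alexander,Ed", "Allen,Troy"]
index_blocks = ["block 1", "block 2", "block 3", "block 4"]

def find_block(search_key):
    i = bisect.bisect_right(index_keys, search_key) - 1
    return index_blocks[i] if i >= 0 else None

if __name__ == "__main__":
    print(find_block("Adams,Robert"))   # falls in block 2, whose anchor is Adams,John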
A major problem with a primary index, as with any ordered file, is insertion and deletion of records. With a primary index, the problem is compounded because, if we attempt to insert a record in its correct position in the data file, we have to not only move records to make space for the new record but also change some index entries, since moving records will change the anchor records of some blocks. A solution for this is to use an unordered overflow file or a linked list of overflow records for each block in the data file.

CLUSTERING INDEX:
If records of a file are physically ordered on a non-key field, which does not have a distinct value for each record, that field is called the clustering field. We can create a different type of index, called a clustering index, to speed up retrieval of records that have the same value for the clustering field. This differs from a primary index, which requires that the ordering field of the data file have a distinct value for each record.
A clustering index is also an ordered file with two fields; the first field is of the same type as the clustering field of the data file, and the second field is a block pointer. There is one entry in the clustering index for each distinct value of the clustering field, containing the value and a pointer to the first block in the data file that has a record with that value for its clustering field.
the data file.


Record insertion and deletion still cause problems, because the data records are physically ordered. To
alleviate the problem of insertion, it is common to reserve a whole block (or a cluster of contiguous blocks) for
each value of the clustering field; all records with that value are placed in the block (or block cluster). This
makes insertion and deletion relatively straightforward.
A clustering index is another example of a non-dense index, because it has an entry for every distinct value of the indexing field (which is a non-key by definition and hence has duplicate values) rather than for every record in the file.

SECONDARY INDEXES:
A secondary index provides a secondary means of accessing a file for which some primary access
already exists. The secondary index may be on a field which is a candidate key and has a unique value in every
record, or a non-key field with duplicate values. The index is an ordered file with two fields. The first field is of the same data type as some non-ordering field of the data file that is an indexing field. The second field is either a
block pointer or a record pointer. There can be many secondary indexes (and hence, indexing fields) for the
same file.
We first consider a secondary index access structure on a key field that has a distinct value for every
record. Such a field is sometimes called a secondary key. In this case there is one index entry for each record in


the data file, which contains the value of the secondary key for the record and a pointer either to the block in which the record is stored or to the record itself. Hence, such an index is dense.
We again refer to the two field values of index entry i as <K(i), P(i)>. The entries are ordered by value of K(i), so we can perform a binary search. Because the records of the data file are not physically ordered by values of the secondary key field, we cannot use block anchors; P(i) in the index entries are therefore block pointers, not record pointers. Once the appropriate block is transferred to main memory, a search for the desired record within the block can be carried out.
A secondary index usually needs more storage space and longer search time than does a primary index, because of its larger number of entries. However, the improvement in search time for an arbitrary record is much greater for a secondary index than for a primary index.
We can also create a secondary index on a non-key field of a file. In this case, numerous records in the data file can have the same value for the indexing field. There are several options for implementing such an index:
1. Option 1 is to include several index entries with the same K(i) value, one for each record. This would be a dense index.
2. Option 2 is to have variable-length records for the index entries, with a repeating field for the pointer: <K(i), <P(i,1), P(i,2), ..., P(i,k)>>.
3. Option 3, which is more commonly used, is to keep the index entries themselves at a fixed length and have a single entry for each index field value, but to create an extra level of indirection to handle the multiple pointers. In this non-dense scheme, the pointer P(i) in index entry <K(i), P(i)> points to a block of record pointers; each record pointer in that block points to one of the data file records with value K(i) for the indexing field. If some value K(i) occurs in too many records, so that their record pointers cannot fit in a single disk block, a cluster or linked list of blocks is used.

MULTI LEVEL INDEXES:
The idea behind a multilevel index is to reduce the part of the index that we continue to search by bfri, the blocking factor for the index, which is larger than 2. Hence, the search space is reduced much faster. The value bfri is called the fan-out of the multilevel index, and we will refer to it by the symbol fo. Searching a multilevel index requires approximately log fo (bi) block accesses, which is a smaller number than for binary search if the fan-out is larger than 2.
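As a rough illustration of this formula (not from the notes), the following Python sketch compares the two access counts; the index block count and fan-out values are assumptions:

# Compare block accesses for a binary search on a single-level index
# (about log2 bi) with a multilevel index (about log_fo bi).
# The numbers bi and fo below are assumptions.
import math

bi = 1024        # blocks in the first-level index (assumed)
fo = 34          # fan-out: index entries per block (assumed)

binary_search_cost = math.ceil(math.log2(bi))
multilevel_cost = math.ceil(math.log(bi, fo))

print("binary search accesses:", binary_search_cost)   # about 10
print("multilevel index accesses:", multilevel_cost)   # about 2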


A multilevel index considers the index file, which we will now refer to as the first (or base) level of a multilevel index, as an ordered file with a distinct value for each K(i). Hence we can create a primary index for the first level; this index to the first level is called the second level of the multilevel index. Because the second level is a primary index, we can use block anchors so that the second level has one entry for each block of the first level. The blocking factor bfri for the second level, and for all subsequent levels, is the same as that for the first-level index, because all index entries are the same size; each has one field value and one block address. If the first level has r1 entries, and the blocking factor (which is also the fan-out) for the index is bfri = fo, then the first level needs ceil(r1/fo) blocks, which is therefore the number of entries r2 needed at the second level of the index.
We can repeat this process for the second level. The third level, which is a primary index for the second level, has an entry for each second-level block, so the number of third-level entries is r3 = ceil(r2/fo). Notice that we require a second level only if the first level needs more than one block of disk storage, and, similarly, we require a third level only if the second level needs more than one block. We can repeat the preceding process until all the entries of some index level t fit in a single block. This block at the t-th level is called the top index level. Each level reduces the number of entries at the previous level by a factor of fo (the index fan-out), so we can use the formula 1 <= r1/((fo)^t) to calculate t. Hence, a multilevel index with r1 first-level entries will have approximately t levels, where t = ceil(log fo (r1)).

DYNAMIC MULTILEVEL INDEXES USING B-TREES AND B+-TREES

Search Trees and B-Trees:
A search tree is a special type of tree that is used to guide the search for a record, given the value of one of the record's fields. A search tree is slightly different from a multilevel index. A search tree of order p is a tree such that each node contains at most p - 1 search values and p pointers in the order
<P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>, where q <= p; each Pi is a pointer to a child node (or a null pointer); and each Ki is a search value from some ordered set of values. All search values are assumed to be unique. Two constraints must hold at all times on the search tree:
1. Within each node, K1 < K2 < ... < Kq-1.
2. For all values X in the sub tree pointed at by Pi, we have Ki-1 < X < Ki for 1 < i < q; X < Ki for i = 1; and Ki-1 < X for i = q.

We can use a search tree as a mechanism to search for records stored in a disk file. The values in the tree
can be the values of one of the fields of the file, called the search field (which is the same as the index field if a
multilevel index guides the search). Each key value in the tree is associated with a pointer to the record in the
data file having that value. Alternatively, the pointer could be to the disk block containing that record. The
search tree itself can be stored on disk by assigning each tree node to a disk block. When a new record is
inserted, we must update the search tree by inserting an entry in the tree containing the search field value of the
new record and a pointer to the new record.
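For illustration only (not from the notes), a minimal Python sketch of descending such a search tree; the node representation and the tiny example tree are assumptions:

# Sketch of searching an order-p search tree: at each node, either the value
# is found (and its record pointer returned) or we descend into the sub tree
# whose key range can contain it, per the two constraints above.
# node = {"keys": [...], "children": [...], "records": {key: record_pointer}}

def search_tree_lookup(node, value):
    if node is None:
        return None
    if value in node["records"]:          # value stored in this node
        return node["records"][value]
    keys = node["keys"]
    i = 0
    while i < len(keys) and value > keys[i]:
        i += 1
    return search_tree_lookup(node["children"][i], value)

if __name__ == "__main__":
    leaf = {"keys": [5, 9], "children": [None, None, None],
            "records": {5: "rec5", 9: "rec9"}}
    root = {"keys": [12], "children": [leaf, None], "records": {12: "rec12"}}
    print(search_tree_lookup(root, 9))    # descends into the left sub tree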

Algorithms are necessary for inserting and deleting search values into and from the search tree while
maintaining the preceding two constraints. In general, these algorithms do not guarantee that a search tree is
balanced, meaning that all of its leaf nodes are at the same level.
Keeping a search tree balanced is important because it guarantees that no nodes will be at very high
levels and hence require many block accesses during a tree search. Keeping the tree balanced yields a uniform
search speed regardless of the value of the search key. Another problem with search trees is that record deletion
may leave some nodes in the tree nearly empty, thus wasting storage space and increasing the number of levels.
The B-tree addresses both of these problems by specifying additional constraints on the search tree.


B-Trees:
The B-tree has additional constraints that ensure that the tree is always balanced and that the space wasted by deletion, if any, never becomes excessive.
More formally, a B-tree of order p, when used as an access structure on a key field to search for records in a data file, can be defined as follows:
1. Each internal node in the B-tree is of the form
<P1, <K1, Pr1>, P2, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pq>
where q <= p. Each Pi is a tree pointer - a pointer to another node in the B-tree. Each Prj is a data pointer - a pointer to the record whose search key field value is equal to Kj (or to the data file block containing that record).
2. Within each node, K1 < K2 < ... < Kq-1.
3. For all search key field values X in the sub tree pointed at by Pi (the i-th sub tree, see the figure), we have: Ki-1 < X < Ki for 1 < i < q; X < Ki for i = 1; and Ki-1 < X for i = q.
4. Each node has at most p tree pointers.
5. Each node, except the root and leaf nodes, has at least ceil(p/2) tree pointers. The root node has at least two tree pointers unless it is the only node in the tree.
6. A node with q tree pointers, q <= p, has q - 1 search key field values (and hence has q - 1 data pointers).
7. All leaf nodes are at the same level. Leaf nodes have the same structure as internal nodes except that all of their tree pointers Pi are null.

B+-Trees:
Most implementations of a dynamic multilevel index use a variation of the B-tree data structure called a B+-tree. In a B-tree, every value of the search field appears once at some level in the tree, along with a data pointer. In a B+-tree, data pointers are stored only at the leaf nodes of the tree; hence, the structure of leaf nodes differs from the structure of internal nodes. The leaf nodes have an entry for every value of the search field, along with a data pointer to the record (or to the block that contains this record) if the search field is a key field. For a non-key search field, the pointer points to a block containing pointers to the data file records, creating an extra level of indirection.
The leaf nodes of the B+-tree are usually linked together to provide ordered access on the search field to the records. These leaf nodes are similar to the first (base) level of an index. Internal nodes of the B+-tree correspond to the other levels of a multilevel index. Some search field values from the leaf nodes are repeated in the internal nodes of the B+-tree to guide the search.
The structure of the internal nodes of a B+-tree of order p is as follows:
1. Each internal node is of the form <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>, where q <= p and each Pi is a tree pointer.
2. Within each internal node, K1 < K2 < ... < Kq-1.
3. For all search field values X in the sub tree pointed at by Pi, we have Ki-1 < X <= Ki for 1 < i < q; X <= Ki for i = 1; and Ki-1 < X for i = q.
4. Each internal node has at most p tree pointers.
5. Each internal node, except the root, has at least ceil(p/2) tree pointers. The root node has at least two tree pointers if it is an internal node.
6. An internal node with q pointers, q <= p, has q - 1 search field values.
The structure of the leaf nodes of a B+-tree of order p is as follows:
1. Each leaf node is of the form <<K1, Pr1>, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pnext>, where q <= p, each Pri is a data pointer, and Pnext points to the next leaf node of the B+-tree.
2. Within each leaf node, K1 < K2 < ... < Kq-1, q <= p.
3. Each Pri is a data pointer that points to the record whose search field value is Ki, or to a file block containing the record (or to a block of record pointers that point to records whose search field value is Ki, if the search field is not a key).
4. Each leaf node has at least ceil(p/2) values.
5. All leaf nodes are at the same level.
The pointers in internal nodes are tree pointers to blocks that are tree nodes, whereas the pointers in leaf nodes are data pointers to the data file records or blocks, except for the Pnext pointer, which is a tree pointer to the next leaf node. By starting at the leftmost leaf node, it is possible to traverse leaf nodes as a linked list, using the Pnext pointers. This provides ordered access to the data records on the indexing field.
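As an illustration (not from the notes), a minimal Python sketch of a B+-tree lookup that descends the internal nodes and then scans a leaf; the node classes and the tiny example tree are assumptions:

# Sketch of a B+-tree lookup: internal nodes hold only keys and tree pointers;
# data pointers live in the leaves, which are chained by Pnext.

class Internal:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children

class Leaf:
    def __init__(self, entries, next_leaf=None):
        self.entries = entries          # list of (key, data_pointer) pairs
        self.next = next_leaf           # the Pnext pointer

def bplus_search(node, key):
    while isinstance(node, Internal):
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1                      # values X <= K(i) lie left of K(i)
        node = node.children[i]
    for k, ptr in node.entries:         # scan the leaf node
        if k == key:
            return ptr
    return None

if __name__ == "__main__":
    leaf2 = Leaf([(12, "rec12"), (15, "rec15")])
    leaf1 = Leaf([(5, "rec5"), (9, "rec9")], next_leaf=leaf2)
    root = Internal([9], [leaf1, leaf2])
    print(bplus_search(root, 12))       # -> rec12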


Comparison between B-trees and B+-trees:
Because entries in the internal nodes of a B+-tree include search values and tree pointers without any data pointers, more entries can be packed into an internal node of a B+-tree than for a similar B-tree. Thus, for the same block (node) size, the order p will be larger for the B+-tree than for the B-tree. This can lead to fewer B+-tree levels, improving search time. Because the structures for internal and for leaf nodes of a B+-tree are different, the order p can be different. We will use p to denote the order for internal nodes and pleaf to denote the order for leaf nodes, which we define as being the maximum number of data pointers in a leaf node.

Indexes on Multiple Keys:
If a certain combination of attributes is used very frequently, it is advantageous to set up an access structure to provide efficient access by a key value that is a combination of those attributes. For example: list the employees whose deptno is 4 and age is 59. To answer this request there are alternative search strategies:
1. If deptno has an index but age does not, select the records having deptno = 4 and then filter them based on age.
2. If age has an index but deptno does not, select the records having age = 59 and then filter them based on deptno.
3. If both fields have indexes, an intersection of the two sets of records (or of their record pointers) yields the records that satisfy both conditions, or the blocks in which such records are located.
All of these alternatives eventually give the correct result. However, if the set of records that meet each condition (deptno = 4 or age = 59) individually is large, yet only a few records satisfy the combined condition, then none of the above is a very efficient technique for the given search request.
Ordered Index on Multiple Attributes:
In general, if an index is created on attributes <A1, A2, ... , An>, the search key values are tuples with n values:
<v1, v2,...... , vn>.
A lexicographic ordering of these tuple values establishes an order on this composite search key. For our example, all composite keys for department number 3 precede those for department number 4. Thus <3, n> precedes <4, m> for any values of m and n. The ascending key order for keys with DNO = 4 would be <4, 18>, <4, 19>, <4, 20>, and so on. Lexicographic ordering works similarly to ordering of character strings.
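Since Python tuples compare lexicographically, a one-line illustration of this ordering (not from the notes; the sample keys are assumptions):

# Lexicographic ordering of composite keys <DNO, AGE>: the first component
# decides the order, and ties fall back to the second component.
keys = [(4, 19), (3, 60), (4, 18), (3, 25), (4, 20)]
print(sorted(keys))          # [(3, 25), (3, 60), (4, 18), (4, 19), (4, 20)]
print((3, 99) < (4, 18))     # True: <3, n> precedes <4, m> for any n and m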
Partitioned Hashing:
Partitioned hashing is an extension of static external hashing that allows access on multiple keys. It is
suitable only for equality comparisons; range queries are not supported. In partitioned hashing, for a key
consisting of n components, the hash function is designed to produce a result with n separate hash addresses.
The bucket address is a concatenation of these n addresses. It is then possible to search for the required
composite search key by looking up the appropriate buckets that match the parts of the address in which we are
interested.
An advantage of partitioned hashing is that it can be easily extended to any number of attributes. The
bucket addresses can be designed so that high order bits in the addresses correspond to more frequently
accessed attributes. Additionally, no separate access structure needs to be maintained for the individual
attributes. The main drawback of partitioned hashing is that it cannot handle range queries on any of the
component attributes.
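A small illustrative Python sketch (not from the notes) of this address concatenation for a composite key <DNO, AGE>; the bit widths are assumptions:

# Sketch of partitioned hashing: each key component is hashed separately and
# the bucket address is the concatenation of the partial addresses.

DNO_BITS, AGE_BITS = 3, 5        # bits allotted to each component (assumed)

def partitioned_hash(dno, age):
    h1 = dno % (1 << DNO_BITS)       # hash address for the DNO component
    h2 = age % (1 << AGE_BITS)       # hash address for the AGE component
    return (h1 << AGE_BITS) | h2     # concatenated bucket address

if __name__ == "__main__":
    print(bin(partitioned_hash(4, 59)))
    # An equality query on DNO alone needs only the buckets whose high-order
    # 3 bits match; a range query on AGE cannot be answered this way.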

Grid Files:
Another alternative is to organize the EMPLOYEE file as a grid file. We can construct a grid array with one linear scale (or dimension) for each of the search attributes. The scales are made in such a way as to achieve a uniform distribution of that attribute. Thus, in our example, the linear scale for DNO has DNO = 1, 2 combined as one value, 0, on the scale, while DNO = 5 corresponds to the value 2 on that scale. Similarly, AGE is divided into its scale of 0 to 5 by grouping ages so as to distribute the employees uniformly by age. The grid array shown for this file has a total of 36 cells. Each cell points to some bucket address where the records corresponding to that cell are stored.
Thus our request for DNO = 4 and AGE = 59 maps into cell (1, 5) of the grid array. The records for this combination will be found in the corresponding bucket. This method is particularly useful for range queries that would map into a set of cells corresponding to a group of values along the linear scales. Grid files perform well in terms of reduction in time for multiple key access. However, they represent a space overhead in terms of the grid array structure. Moreover, with dynamic files, a frequent reorganization of the file adds to the maintenance cost.
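For illustration only (not from the notes), a minimal Python sketch of this grid-file lookup; the linear scales, the age group boundaries, and the bucket name are assumptions chosen so that DNO = 4 and AGE = 59 fall in cell (1, 5):

# Sketch of a grid-file lookup: each attribute has a linear scale that maps
# values to cell indexes, and the grid array cell points to a bucket.

def dno_scale(dno):
    # groups department numbers onto scale values 0..5 (assumed grouping)
    groups = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 3, 7: 4, 8: 5}
    return groups[dno]

def age_scale(age):
    # groups ages onto scale values 0..5 (assumed boundaries)
    bounds = [25, 35, 45, 52, 58]
    for i, b in enumerate(bounds):
        if age < b:
            return i
    return 5

grid = {(1, 5): "bucket_17"}     # only the cell we need, for brevity (assumed)

def lookup(dno, age):
    cell = (dno_scale(dno), age_scale(age))
    return cell, grid.get(cell)

if __name__ == "__main__":
    print(lookup(4, 59))          # -> ((1, 5), 'bucket_17')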
