DBMS unit 1-5 notes (1)
INTRODUCTION TO DBMS:--

Data:-- Data is a collection of facts and figures that can be processed to produce information.

Database:-- A database is a collection of interrelated and organized data. In general, it is a collection of files.
➢ A database is a collection of related data, and data is a collection of facts and figures that can be processed to produce information.
➢ A Database Management System stores data in such a way that it becomes easier to retrieve, manipulate, and produce information.

Databases touch every area of our life. A few areas where they are extensively used are:--
• Banks • Universities • Airlines • E-Commerce • Stock Exchanges • Weather Forecast • Manufacturing Assemblies • Human Resource.

Real-world entity:-- A modern DBMS is more realistic and uses real-world entities to design its architecture. It uses their behavior and attributes too.
Ex:- A school database which is related to the students.

Relation-based tables:-- A DBMS allows entities and relations among them to form tables. A user can understand the architecture of a database just by looking at the table names.

Data Independence:-- Application programs should not, ideally, be exposed to details of data representation and storage. The DBMS provides an abstract view of the data that hides such details.

Efficient Data Access:-- A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently. This feature is especially important if the data is stored on external storage devices.

Data Integrity and Security:-- If data is always accessed through the DBMS, the DBMS can enforce integrity constraints.
Ex:- It checks data (such as employee data) before it enters the database, and it can restrict what different classes of users are allowed to see.

Consistency:-- Consistency is a state where every relation in a database remains consistent. A DBMS can provide greater consistency as compared to earlier forms of data storing applications like file-processing systems.

Query Language:-- A DBMS is equipped with a query language, which makes it more efficient to retrieve and manipulate data. A user can apply as many different filtering options as required to retrieve a set of data. Traditionally this was not possible where a file-processing system was used.

Data Administration:-- Experienced professionals who understand the nature of the data being managed, and how different groups of users use it, can be responsible for organizing the data representation to minimize redundancy and for fine-tuning the storage of the data to make retrieval efficient.

Concurrent Access and Crash Recovery:-- A DBMS schedules concurrent accesses to the data in such a manner that users can think of the data as being accessed by only one user at a time. Further, the DBMS protects users from the effects of system failures.

DATABASE vs FILE SYSTEMS:--

Drawbacks of using File systems to store data:--
➢ Data Redundancy & Inconsistency
Multiple file formats, duplication of information in different files.
➢ Difficulty in Accessing Data
Need to write a new program to carry out each new task.
➢ Integrity Problems
Integrity constraints {e.g., Account Bal. > 0}.
Hard to add new constraints or change existing ones.
➢ Atomicity of Updates
A failure may leave the database in an inconsistent state with partial updates carried out.
Ex:- A transfer of funds from one account to another should either complete or not happen at all.
➢ Concurrent Access by Multiple Users
Concurrent access is needed for performance.
➢ Security Problems
Hard to provide a user access to some, but not all, of the data.
Database systems offer solutions for all the above problems & other problems also.
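The "Integrity Problems" point above (Account Bal. > 0) can be expressed directly as a constraint that the DBMS enforces. A minimal hedged sketch — the table and column names are assumptions for illustration, not from the notes:

CREATE TABLE account (
    acc_no  INT PRIMARY KEY,            -- assumed key column
    balance DECIMAL(10,2) NOT NULL,
    CHECK (balance > 0)                 -- the DBMS rejects any row that violates this rule
);
-- An insert such as (1, -500.00) would be rejected by the DBMS itself, so the rule
-- does not have to be re-implemented in every application program, unlike in a file system.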
Reasons for not choosing a DBMS:- Though there are several advantages with a DBMS, some applications with tight real-time constraints, or with just a few well-defined critical operations for which efficient custom code must be written, do not choose a DBMS.
✓ A DBMS is a complex piece of software, optimized for certain kinds of workloads, and its performance may not be adequate for certain specialized applications.
✓ An application may need to manipulate the data in ways not supported by the query language. In such a situation, the abstract view of the data presented by the DBMS does not match the application's needs and actually gets in the way.

DATABASE USERS:--
A typical DBMS has users with different rights and permissions who use it for different purposes. Some users retrieve data and some back it up. The users of a DBMS can be broadly categorized as follows:---
1. Actors on the scene.
2. Workers behind the scene.

➢ Actors on the scene:-
Those who actually use and control the database content, and those who design, develop and maintain database applications.
Database Administrator (DBA):- This is the chief administrator, who oversees and manages the database system (including the data and software).
Designers:- Designers are the group of people who actually work on the designing part of the database.
End Users:- These are persons who access the database for querying, updating, & report generation.

➢ WORKERS BEHIND THE SCENE:-
Those who design and develop the DBMS software and related tools, and the computer systems operators.
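The different rights and permissions mentioned above are usually handed out with DCL statements. A hedged sketch — the table name student and the user names are assumed purely for illustration:

-- End users who only run reports get read access:
GRANT SELECT ON student TO report_user;
-- Application users who also modify data:
GRANT SELECT, INSERT, UPDATE ON student TO app_user;
-- A privilege can later be withdrawn:
REVOKE UPDATE ON student FROM app_user;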
1. OBJECT-BASED LOGICAL MODELS:--
They are used in describing data at the logical and view levels. They are characterized by the fact that they provide fairly flexible structuring capabilities and allow data constraints to be specified explicitly.
In this, some different models are:--
1. The E-R model (Entity Relationship model 'or' Diagram).
2. The object-oriented model.
3. The semantic data model.
4. The functional data model.

✓ The E-R model:--
The (E-R) data model is based on a perception of a real world that consists of a collection of basic objects, called entities, and of relationships among these objects. The overall logical structure of a database can be expressed graphically by an E-R diagram.
• Entity:− An entity in an ER Model is a real-world entity having properties called "Attributes". Every attribute is defined by its set of values called a "Domain".
• Relationship:− The logical association among entities is called a "Relationship". Relationships are mapped with entities in various ways. Mapping cardinalities define the number of associations between two entities.
Mapping cardinalities:− • One to One • One to Many • Many to One • Many to Many.

✓ The Object-Oriented Model:-
Like the E-R model, the object-oriented model is based on a collection of objects. An object contains values stored in instance variables within the object. An object also contains bodies of code that operate on the object. These bodies of code are called "Methods".

✓ The Semantic Data Model:-
The semantic data model is a method of structuring data in order to represent it in a specific logical way. It is a conceptual data model that includes semantic information that adds a basic meaning to the data and the relationships that lie between them.

✓ The Functional Data Model:-
Functional Data Models are a form of Semantic Data Model which appeared early in database history. They use the mathematical formalism of function application to represent and follow associations between data items.

2. RECORD-BASED LOGICAL MODELS:--
These are also used in describing data at the logical and view levels. In contrast to object-based data models, they are used both to specify the overall logical structure of the database, and to provide a higher-level description of the implementation. The three most widely accepted record-based data models are:
o The Relational Model.
o Network Model.
o Hierarchical Model.

➢ Relational Model:-
• Data is stored in tables called relations.
• Relations can be normalized.
• In normalized relations, values saved are atomic values.
• Each row in a relation contains a unique value.
• Each column in a relation contains values from the same domain.
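A One-to-Many mapping cardinality from the E-R model above is normally realised in the relational model with a foreign key on the "many" side. A hedged sketch — the department/employee tables and their columns are assumed for illustration:

CREATE TABLE department (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(30)
);
CREATE TABLE employee (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(30),
    dept_id  INT REFERENCES department(dept_id)  -- many employees map to one department
);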
➢ System Complexity:-
1. In this, data can be accessed only one record at a time.
2. A user-friendly DBMS cannot be created using this.
➢ Lack of Structural Independence:-
1. Making structural modifications to the database is very difficult in this model, since the data is accessed through a navigational method.
2. Even though it achieves data independence, it still fails to achieve structural independence.
Example tree figure for network model database:-- (figure)

✓ Simplicity:–
Since the database is based on the hierarchical structure, the relationship between the various layers is logically simple.
✓ Data Security:–
The hierarchical model was the first database model that offered data security that is provided and enforced by the DBMS.
✓ Efficiency:–
The hierarchical database model is a very efficient one when the database contains a large number of one-to-many relationships and when the users require a large number of transactions, using data whose relationships are fixed.
DIS-ADVANTAGES OF DBMS:--
✓ COST OF SOFTWARE/HARDWARE & MIGRATION :-

DBMS ARCHITECTURE:--
1. 1-TIER.
2. 2-TIER.
3. 3-TIER.
TIER-1 :-
➢ Advantages :--
➢ Fast Communication.
➢ Easy to Manage.
➢ Dis-advantages :--
➢ Scalability.
➢ Security.
Examples:--
Relational Model:-
• It represents how data is stored in relational databases.
• A relational database stores data in the form of relations (tables).
• Consider a relation Student with attributes like:- Roll number; Name; Address; Phone & Age.
• Table name:-- Student_Data

Roll-No | Name | Address    | Phone     | Age
33337   | Sri  | Paris      | Xxxxxxxxx | 20
33338   | Sai  | Los-angles | Xxxxxxxxx | 25
33339   | Kim  | Sweden     | Xxxxxxxxx | 30

In the table above, the table as a whole is the Relation, the rows are Tuples, and the columns are Fields (attributes).

The value which is not known or unavailable is called a "Null Value". It is represented by a blank space.

Concept Of Domain:-
The domain of an attribute is the set of all allowable values for that attribute.
E.g:- Gender - (Male; Female; Other).
In every table, each column draws its values from a "Domain".
E.g:-
1. CREATE DOMAIN id_value AS int CONSTRAINT id_test CHECK (VALUE < 100);
2. CREATE TABLE student (stu_id id_value PRIMARY KEY, stu_name varchar(30), stu_age int);

Importance Of Null Values:-
• SQL supports a special value known as NULL, which is used to represent the values of attributes that may be unknown or not applicable to a tuple.
• It is important to understand that a null value is different from a zero value.
• A null value is used to represent a missing value, but it usually has 1 of 3 different interpretations:
➢ Value unknown. (Value exists but it is unknown.)
➢ Value not available. (Exists but is withheld on purpose.)
➢ Attribute not applicable. (Undefined for this tuple.)

Constraints:-
• These are the rules enforced on the data columns of a table. These are used to limit the type of data that can go into a database.
• This ensures the accuracy & reliability of the data in the database. Constraints could be either at the column level or at the table level.

Domain Constraints In DBMS:-
Domain constraints specify that within each tuple, the value of each attribute 'A' must be an atomic value from the domain dom(A). Each attribute value must be either null or drawn from the domain of that attribute.
Domain Constraint = Data type check for column + Constraint.

Key Constraints In DBMS:-
• Constraints are nothing but rules that are to be followed while entering data into the columns of a database.
• Constraints ensure that the data entered by the user into columns must be within the criteria specified by the condition.
• We have 6 types of key constraints in DBMS. They are:-
1.Not Null 2.Unique 3.Default
4.Check 5.Primary Key 6.Foreign Key

1.Not Null:-
The not null specification prohibits the insertion of a null value for the attribute. Any database modification that would cause a null to be inserted into an attribute declared to be not null generates an error diagnostic.
Example:- CREATE TABLE Persons ( ID int NOT NULL UNIQUE, LastName varchar(255) NOT NULL, FirstName varchar(255), Age int );

2.Unique:-
The unique specification says that attributes Aj1, Aj2, . . . , Ajm form a candidate key; that is, no two tuples in the relation can be equal on all the listed attributes.
Example:- CREATE TABLE Persons ( ID int UNIQUE, LastName varchar(255) NOT NULL, FirstName varchar(255), Age int );

3.Default:-
It is used in SQL to add default data to columns. But a default column value can be customised, i.e., it can be overridden.
E.g:- 1.
Roll-No | Name | Address
33337   | Sri  | Paris
33338   | Sai  | Paris
33339   | Kim  | Paris
Rows with the default value in the Address column.
E.g:- 2. CREATE TABLE Persons ( ID int NOT NULL, LastName varchar(255) NOT NULL, FirstName varchar(255) DEFAULT 'sri' );

4.Check:-
It ensures that the data entered by the user for that column is within the range or set of possible values specified.
E.g:- CREATE TABLE Persons ( ID int, LastName varchar(255), FirstName varchar(255), Age int CHECK (Age>=18) );

5.Primary Key:-
It is a constraint in a table which uniquely identifies each row record in a database table, using 1 or more columns of the table.
E.g:- 'Roll-No' of a student is a primary key.

6.Foreign Key:-
Sometimes the information stored in a relation is linked to the information stored in another relation. If one of the relations is modified, the other must be checked, and perhaps modified, to keep the data consistent.
E.g:- CREATE TABLE Enrolled ( studid CHAR(20), cid CHAR(20), grade CHAR(10), PRIMARY KEY (studid, cid), FOREIGN KEY (studid) REFERENCES Students );

Integrity Constraints In DBMS:-
1.Entity Integrity Constraints.
2.Referential Integrity Constraints.

1.Entity Integrity Constraints:-
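A single table often combines several of the six key constraints listed above. A hedged sketch pulling them together — the Student table, its columns and the referenced Department table are assumed for illustration:

CREATE TABLE Student (
    roll_no  INT PRIMARY KEY,                      -- 5. Primary Key
    name     VARCHAR(30) NOT NULL,                 -- 1. Not Null
    email    VARCHAR(50) UNIQUE,                   -- 2. Unique
    city     VARCHAR(20) DEFAULT 'Paris',          -- 3. Default
    age      INT CHECK (age >= 18),                -- 4. Check
    dept_id  INT REFERENCES Department(dept_id)    -- 6. Foreign Key (Department table assumed)
);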
Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data consistency. Thus, integrity constraints guard against accidental damage to the database. Examples of integrity constraints are:
• An instructor name cannot be null.
• No two instructors can have the same instructor ID.
• Every department name in the course relation must have a matching department name in the department relation.

2.Referential Integrity Constraints:-
Ensuring that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation. This condition is called "Referential Integrity". Let r1 and r2 be relations whose sets of attributes are R1 and R2, respectively, with primary keys K1 and K2. We say that a subset α of R2 is a foreign key referencing K1 in relation r1 if it is required that, for every tuple t2 in r2, there must be a tuple t1 in r1 such that t1.K1 = t2.α. Requirements of this form are called 'Referential-Integrity Constraints' or 'Subset Dependencies'.

Basic SQL:- "Structured Query Language --- SQL"
SQL is widely popular because it offers the following advantages:-
• Allows users to access data in relational database management systems.
• Allows users to describe the data.
• Allows users to define the data in a database and manipulate that data.
• Allows embedding within other languages using SQL modules, libraries & pre-compilers.
• Allows users to create and drop databases and tables.
• Allows users to create views, stored procedures and functions in a database.
• Allows users to set permissions on tables, procedures and views.

Rules:-
• SQL is not case sensitive. Generally, keywords of SQL are written in upper case.
• Using SQL statements, you can perform most of the actions in a database.
• SQL statements are not tied to a single text line. We can write a single SQL statement on one or multiple text lines.

SQL Process:-
When you are executing an SQL command for any RDBMS, the system determines the best way to carry out your request, and the SQL engine figures out how to interpret the task.
There are various components included in this process. These components are:-
• Query Dispatcher
• Optimization Engines

Simple Database Schema:-
A database schema is a structure that represents the logical storage of the data in a database. It is the logical representation of a database which shows how the data is stored logically in the entire database.
It contains schema objects like:- tables, fields, packages, views, relations, primary and foreign keys.
It includes the following:-
1. Consistent formatting for all data entries.
2. Database objects & unique keys for all data entries.
3. Tables with multiple columns, where each column has its name & data type.

Characteristics Of SQL:-
• Easy to learn.
• Easy to create, insert, edit, delete e.t.c.
• Users can access the data from the DBMS.
• Users can describe the data easily.
SQL Data Types:-
An SQL data type is used to define the values that a column can contain. Every column is required to have a name & data type in the database table.

Data Types Of SQL:-

1.Binary Data Types:-
In this there are 3 types:-
• Binary:- It has a maximum length of 8000 Bytes. It contains fixed-length binary data.
• Var-Binary:- It has a maximum length of 8000 Bytes. It contains variable-length binary data.
• Image:- It has a maximum length of 2,147,483,647 Bytes. It contains variable-length binary data.

2.Numeric (Approximate) Data Type:-
DATA TYPE | FROM        | TO         | DESCRIPTION
Float     | -1.79E+308  | 1.79E+308  | It is used to specify a floating point value. E.g:- 3.7; 3.3
Real      | -3.40E+38   | 3.40E+38   | It specifies a single precision floating point number.

3.Exact Numeric Data Type:-
DATA TYPE | DESCRIPTION
Int       | It is used to specify an integer value.
Small Int | It is used to specify a small integer value.
Bit       | It stores the specified number of bits.
Decimal   | It specifies a numeric value that can have a decimal part.
Numeric   | It is used to specify a numeric value.

4.Date & Time Data Type:-
DATA TYPE  | DESCRIPTION
Date       | It is used to store the year, month and day values.
Time       | It is used to store the hour, minute and second values.
Time Stamp | It stores the year, month, day, hour, minute & second values.

5.String Data Type:-
DATA TYPE | DESCRIPTION
Char      | It has a max. length of 8000 characters. It contains fixed-length Non-Unicode characters.
Varchar   | It has a max. length of 8000 characters. It contains variable-length Non-Unicode characters.
Text      | It has a max. length of 2,147,483,647 characters. It contains variable-length Non-Unicode characters.

SQL Table:-
An SQL table is a collection of data which is organised in terms of rows & columns.
• In DBMS the table is known as a Relation & a row as a 'Tuple'.
• E.g:- Let's see the STUDENT table.

STU_ID | Name  | Class | E-mail            | Group
3377   | Sri   | AID-B | [email protected] | CSE
1234   | Jhon  | ECE-A | [email protected] | ECE
3737   | Sunny | EEE-2 | [email protected] | EEE

1. In the above table "STUDENT" is the table name; 'STU_ID', 'STU_NAME', 'CLASS', 'E-MAIL', 'GROUP' are the column names.
• The combination of data in multiple columns forms a row. E.g:- 3377, 'Sri', 'AID-B', '[email protected]', 'CSE' are the data of one row.

SQL Commands:-

DDL - Data Definition Language:-
1. Create Table.
2. Alter Table.
3. Drop Table.

Create Table:-
Creates a new table, a view of a table, or other object in the database.
➢ SYNTAX:-
CREATE TABLE table_name ( column_1 datatype, column_2 datatype, column_3 datatype );

Alter Table:-
Modifies an existing database object, such as a table.
➢ SYNTAX:-
ALTER TABLE table_name ADD column_name datatype;

Drop Table:-
Deletes an entire table, a view of a table or other objects in the database.
➢ SYNTAX:-
DROP TABLE table_name;

DML - Data Manipulation Language:-
These commands are used to manipulate data in the database.
1.Insert.
2.Update.
3.Delete.

INSERT:-
Creates a record.
➢ SYNTAX:-
INSERT INTO table_name VALUES (value_1, value_2, ...);

DELETE:-
Deletes records.
➢ SYNTAX:-
DELETE FROM table_name WHERE some_column = some_value;

DCL - Data Control Language:-
These commands are used to control the data in the database.
1.Grant.
2.Revoke.

GRANT:-
Gives a privilege or permission to a user.
➢ SYNTAX:-
GRANT privilege_name ON object_name TO user_name;

Clauses:-
In this there are 3 types of clauses. They are:--
1.Group By Clause.
2.Having Clause.
3.Order By Clause.

Order By Clause:-
It is used to sort the data in the database in Ascending or Descending order.
✓ SYNTAX:-
SELECT column_name FROM table_name ORDER BY column_name ASC | DESC;
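Putting the DDL, DML and ORDER BY pieces above together, a short hedged walk-through on the Student_Data table used earlier — the column data types are assumptions:

CREATE TABLE Student_Data (roll_no INT, name VARCHAR(20), address VARCHAR(20), phone VARCHAR(15), age INT);
ALTER TABLE Student_Data ADD grade CHAR(2);                              -- add a new column
INSERT INTO Student_Data VALUES (33337, 'Sri', 'Paris', 'Xxxxxxxxx', 20, 'A');
DELETE FROM Student_Data WHERE roll_no = 33337;                          -- remove that record again
SELECT name, age FROM Student_Data ORDER BY age DESC;                    -- sort the remaining rows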
***THE END***

REPRESENTATION:
1. ENTITIES:
Entities are represented by using rectangular boxes. These are named with the entity name that they
represent.
2. ATTRIBUTES:
Attributes are the properties of entities. Attributes are represented by means of ellipses. Every ellipse
represents one attribute and is directly connected to its entity.
Types of attributes:
Simple attribute − Simple attributes are atomic values, which cannot be divided further. For example, a
student's phone number is an atomic value of 10 digits.
Composite attribute − Composite attributes are made of more than one simple attribute. For example, a
student's complete name may have first_name and last_name. If the attributes are composite, they are further divided in a tree-like structure. Every node is then connected to its attribute. That is, composite attributes are represented by ellipses that are connected with an ellipse.
Derived attribute − Derived attributes are the attributes that do not exist in the physical database, but
their values are derived from other attributes present in the database. For example, average_salary in a
department should not be saved directly in the database, instead it can be derived. For another example,
age can be derived from date_of_birth.
Derived attributes are depicted by a dashed ellipse.
Single-value attribute − Single-value attributes contain a single value. For example − Social_Security_Number.
Multi-value attribute − Multi-value attributes may contain more than one value. For example, a person can have more than one phone number, email_address, etc. Multi-valued attributes are depicted by a double ellipse.
3. RELATIONSHIP:
Relationships are represented by a diamond-shaped box. The name of the relationship is written inside the diamond box. All the entities (rectangles) participating in a relationship are connected to it by a line.
Types of relationships:
Degree of Relationship: The number of participating entities in a relationship defines the degree of the relationship. Based on the degree, relationships are categorized as:
Unary = degree 1
Binary = degree 2
Ternary = degree 3
n-ary = degree n
Unary Relationship: A relationship with one entity set. It is like a relationship among 2 entities of the same entity set. Example: A professor (in-charge) reports to another professor (Head of the Dept).
Binary Relationship: A relationship among 2 entity sets. Example: A professor teaches a course and a course is taught by a professor.

Ternary Relationship: A relationship among 3 entity sets. Example: A professor teaches a course in so and so semester.

n-ary Relationship: A relationship among n entity sets.
(Figure: entity sets E1, E2, E3 connected to a relationship R.)

Cardinality defines the number of entities in one entity set which can be associated with the number of entities of the other set via the relationship set. Cardinality ratios are categorized into 4. They are:

1. One-to-One relationship: When only one instance of an entity is associated with the relationship, then the relationship is a one-to-one relationship. Each entity in A is associated with at most one entity in B and each entity in B is associated with at most one entity in A.
Each professor teaches one course and each course is taught by one professor.

2. One-to-many relationship: When more than one instance of an entity is associated with a relationship, then the relationship is a one-to-many relationship. Each entity in A is associated with zero or more entities in B and each entity in B is associated with at most one entity in A.
Each professor teaches 0 (or) more courses and each course is taught by at most one professor.

3. Many-to-one relationship: When more than one instance of an entity is associated with the relationship, then the relationship is a many-to-one relationship. Each entity in A is associated with at most one entity in B and each entity in B is associated with 0 (or) more entities in A.
Each professor teaches at most one course and each course is taught by 0 (or) more professors.

4. Many-to-Many relationship: If more than one instance of an entity on the left and more than one instance of an entity on the right can be associated with the relationship, then it depicts a many-to-many relationship. Each entity in A is associated with 0 (or) more entities in B and each entity in B is associated with 0 (or) more entities in A.
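A many-to-many cardinality such as the professor/course example above is usually stored with a separate linking table. A hedged relational sketch — the table and column names are assumed for illustration:

CREATE TABLE professor (prof_id INT PRIMARY KEY, prof_name VARCHAR(30));
CREATE TABLE course    (course_id INT PRIMARY KEY, title VARCHAR(30));
-- One row per (professor, course) pair; the composite key permits many-to-many:
CREATE TABLE teaches (
    prof_id   INT REFERENCES professor(prof_id),
    course_id INT REFERENCES course(course_id),
    PRIMARY KEY (prof_id, course_id)
);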
Partial participation − If not all entities of the entity set are involved in the relationship, then such a participation is said to be partial. Partial participation is represented by single lines.
Example: Participation constraints can be explained easily with some examples. They are as follows.
1. Each Professor teaches at least one course.
min = 1 (Total Participation)
max = many (No key)
Weak Entity set: If each entity in the entity set is not distinguishable on its own, or it doesn't have a key, then such an entity set is known as a weak entity set.
The cardinality of the owner entity set with the weak relationship is 1 : m. A weak entity set is uniquely identifiable by its partial key together with the key of the owner entity set. The dependent entity set depends on the owner entity set because all the tuples of the weak entity set are associated with the owner entity set's tuples.

The process of sub-grouping within an entity set is known as specialization or generalization. Specialization follows a top-down approach and generalization follows a bottom-up approach. Both specialization and generalization are depicted using a triangle component labeled as IS A.

Specialization: is opposite to Generalization. It is a top-down approach in which one higher level entity can be broken down into two lower level entities. In specialization, some higher level entities may not have lower-level entity sets at all. In specialization, a group of entities is divided into sub-groups based on their characteristics. Take a group 'Person' for example. A person has name, date of birth, gender, etc. These properties are common to all persons, human beings. But in a company, persons can be identified as employee, employer, customer, or vendor, based on what role they play in the company.

Generalization: is a bottom-up approach in which two lower level entities combine to form a higher level entity. In generalization, the higher level entity can also combine with other lower level entities to make a further higher level entity. In generalization, a number of entities are brought together into one generalized entity based on their similar characteristics. For example, pigeon, house sparrow, crow and dove can all be generalized as Birds.
Attribute inheritance is a crucial property where a subclass entity set inherits all the attributes of its super class entity set. Attributes can be additionally specified, which is used to give a clear representation, even though that same attribute is found nowhere else in the hierarchy.

Employee and customer can inherit the attributes of the Person entity and they have their own attributes like salary for employee and credit_rating for customer. Similarly, the entities officer, teller and secretary inherit all the attributes of employee and they can have their own attributes like office_member for officer, station_number & hours_worked for teller and hours_worked for secretary.

If an entity set has one single higher level entity set then it is termed as single inheritance. If it has multiple higher level entity sets then we can term it as multiple inheritance.

A Condition Defined Constraint is imposed while classifying the entities of a higher level entity set to be part of (or) a member of lower level entity sets, based on specified defined constraints.
Example: Every higher level entity in the entity set "Account" is checked using the attribute "acc_type" to be assigned either to the "SavingsAccount" or to the "CurrentAccount". SavingsAccount and CurrentAccount are lower level entity sets.

If no condition is specified during the process of designing the lower level entity sets, then it is called a user defined constraint.

Disjoint Constraint: This constraint checks whether an entity belongs to only one lower level entity set or not.

Overlapping Constraint: This constraint ensures, by testing, that an entity in the higher level entity set can belong to more than one lower level entity set.

Completeness Constraint: This is also called the total constraint, which specifies whether or not an entity in the higher level entity set must belong to at least one lower level entity set in generalization or specialization.

When we consider the completeness constraint, we come across total and partial constraints, i.e., Total Participation constraint and Partial Participation constraint.

Total Participation forces that every entity of a higher level entity set must belong to at least one lower level entity set mandatorily.
Ex: An account entity set's entity must belong to either the savings account entity set or the current account entity set.

Partial Participation is rarely found with an entity set because sometimes an entity in the higher level entity set, besides being a member of that higher level entity set, doesn't belong to any of the lower level entity sets immediately until the stipulated period.
Ex: A new employee listed in the higher level entity set but not designated to any one of the available teams that belong to the lower level entity sets.

AGGREGATION:
An aggregation is not a ternary relationship but is an attempt to establish a relationship with another relationship set. It is also termed as a relationship within a relationship. Aggregation can be used over a binary, ternary or a quaternary relationship set. Aggregation is denoted using a dashed rectangle.

Aggregation over ternary relationship: (figure)
Aggregation over binary relationships: (figure)

In the examples shown above, we treated the already existing relationship sets "WorksFor" and "Sponsors" as an entity set for defining the new relationship sets "Manages" and "Monitors". A relationship set is participating in another relationship, so it can be termed as aggregation.

TERNARY RELATIONSHIP DECOMPOSED INTO BINARY:
Consider the following ER diagram, representing insurance policies owned by employees at a company. It depicts 3 entity sets Employee, Policy and Dependents. The 3 entity sets are associated with a ternary relationship set called Covers. Each employee can own several policies, each policy can be owned by several employees, and each dependent can be covered by several policies.
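One common way to store a specialization hierarchy such as Person / Employee / Customer described above is a table per entity set, with the lower level tables referencing the higher level key. This is a hedged sketch of one possible mapping, not the only one; table and column names are assumed:

CREATE TABLE person   (person_id INT PRIMARY KEY, name VARCHAR(30), date_of_birth DATE, gender CHAR(1));
CREATE TABLE employee (person_id INT PRIMARY KEY REFERENCES person(person_id), salary DECIMAL(10,2));
CREATE TABLE customer (person_id INT PRIMARY KEY REFERENCES person(person_id), credit_rating INT);
-- Attribute inheritance: an employee row "inherits" name, date_of_birth and gender by joining to person.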
Supply in the ternary relationship set from the first figure, which has a set of relationship instances (s,j,p) which
means 's' is a supplier who is supplying part 'p' to a project 'j'.
A ternary relationship represents different information than 3 binary relationship sets do. Here the relationship sets canSupply, uses and supplies substitute the ternary relationship set "supply".
No combination of binary relationships is an adequate substitute, because there is the question "where do we add the quantity attribute?" Is it added to canSupply, to uses, or to supplies?
The solution for this is to maintain the same ternary relationship with a weak entity set Supply which has the attribute Qty.
TERNARY VS BINARY: Generally the degree of a relationship set is assessed by counting the no. of nodes
or edges that emanate from that relationship set.
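Following the suggestion above of keeping the ternary Supply relationship with a Qty attribute, a hedged relational sketch — the key column names of the three entity sets are assumptions:

CREATE TABLE supply (
    sid INT REFERENCES suppliers(sid),
    jid INT REFERENCES projects(jid),
    pid INT REFERENCES parts(pid),
    qty INT,                            -- the quantity attribute that has no home in any binary split
    PRIMARY KEY (sid, jid, pid)
);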
SQL> CREATE TABLE Employees ( ssn CHAR(11), name CHAR(30) , lot INTEGER,
PRIMARY KEY (ssn) );
Relationship sets without constraints: To represent a relationship, we must be able to identify each participating entity and give values to the descriptive attributes of the relationship. Thus, the attributes of the relation include:
• The primary key attributes of each participating entity set, as foreign key fields.
• The descriptive attributes of the relationship set.
The set of non-descriptive attributes is a superkey for the relation. If there are no key constraints, this set of attributes is a candidate key.

SQL> CREATE TABLE Works_In ( ssn CHAR(11), did INTEGER,
address CHAR(20), since DATE, PRIMARY KEY (ssn, did, address),
FOREIGN KEY (ssn) REFERENCES Employees,
FOREIGN KEY (address) REFERENCES Locations,
FOREIGN KEY (did) REFERENCES Departments);

Another Example:

The table corresponding to Manages has the attributes ssn, did, since. However, because each department has at most one manager, no two tuples can have the same did value but differ on the ssn value. A consequence of this observation is that did is itself a key for Manages; indeed, the set {did, ssn} is not a key (because it is not minimal). The Manages relation can be defined using the following SQL statement:

SQL> CREATE TABLE Manages (ssn CHAR(11), did INTEGER, since DATE,
PRIMARY KEY (did),
FOREIGN KEY (ssn) REFERENCES Employees,
FOREIGN KEY (did) REFERENCES Departments);

A second approach to translating a relationship set with key constraints is often superior because it avoids creating a distinct table for the relationship set. The idea is to include the information about the relationship set in the table corresponding to the entity set with the key, taking advantage of the key constraint. In the Manages example, because a department has at most one manager, we can add the key fields of the Employees tuple denoting the manager and the since attribute to the Departments tuple.

This approach eliminates the need for a separate Manages relation, and queries asking for a department's manager can be answered without combining information from two relations. The only drawback to this approach is that space could be wasted if several departments have no managers. In this case the added fields would have to be filled with null values. The first translation (using a separate table for Manages) avoids this inefficiency, but some important queries require us to combine information from two relations, which can be a slow operation.

The following SQL statement, defining a Dept_Mgr relation that captures the information in both Departments and Manages, illustrates the second approach to translating relationship sets with key constraints:
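The Dept_Mgr statement itself is not included in this extract; a minimal hedged sketch of what the second approach produces, assuming Departments has attributes did, dname and budget:

SQL> CREATE TABLE Dept_Mgr ( did INTEGER, dname CHAR(20), budget REAL,
     ssn CHAR(11), since DATE,
     PRIMARY KEY (did),
     FOREIGN KEY (ssn) REFERENCES Employees );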
Relationship Sets with Participation Constraints
Every department is required to have a manager, due to the participation constraint, and at most one manager, due to the key constraint.

Consider the Dependents weak entity set shown in the figure, with partial key pname. A Dependents entity can be identified uniquely only if we take the key of the owning Employees entity and the pname of the Dependents entity, and the Dependents entity must be deleted if the owning Employees entity is deleted.

We can capture the desired semantics with the following definition of the Dep_Policy relation:

SQL> CREATE TABLE Dep_Policy (pname CHAR(20), age INTEGER,
cost REAL, eno CHAR(11),
PRIMARY KEY (pname, eno),
FOREIGN KEY (eno) REFERENCES Employees ON DELETE CASCADE );

Observe that the primary key is (pname, eno), since Dependents is a weak entity. We have to ensure that every Dependents entity is associated with an Employees entity (the owner), as per the total participation constraint on Dependents. That is, eno cannot be null. This is ensured because eno is part of the primary key. The CASCADE option ensures that information about an employee's policy and dependents is deleted if the corresponding Employees tuple is deleted.

Example:
SQL> CREATE TABLE emp3(eno number(5) unique, ename varchar2(10));
Table created.
SQL> desc emp3;
Name Null? Type
-------------------------- ------------------ --------------------
ENO NUMBER(5)
ENAME VARCHAR2(10)
SQL> insert into emp3 values(&eno,'&ename');
Enter value for eno: 1
Enter value for ename: sss
old 1: insert into emp3 values(&eno,'&ename')
new 1: insert into emp3 values(1,'sss')
1 row created.
Syntax: CREATE TABLE Table_Name ( column_name data_type(size) PRIMARY KEY, …. );
Example:
SQL> CREATE TABLE faculty (fcode NUMBER(3) PRIMARY KEY,
fname CHAR(10));

5. FOREIGN KEY: It is a table level constraint. We cannot add this at the column level. To reference any primary key column from another table, this constraint can be used. The table in which the foreign key is defined is called the detail table. The table that defines the primary key and is referenced by the foreign key is called the master table.
Syntax: CREATE TABLE Table_Name ( col_name type(size),
FOREIGN KEY(col_name) REFERENCES table_name );
Example:
SQL> CREATE TABLE subject (
scode NUMBER(3) PRIMARY KEY,
subname CHAR(10), fcode NUMBER(3),
FOREIGN KEY(fcode) REFERENCES faculty );

Consider the CUSTOMERS table having the following records.
ID | NAME     | AGE | ADDRESS   | SALARY
1  | Ramesh   | 35  | Ahmadabad | 2000.00
2  | Khilan   | 25  | Delhi     | 1500.00
3  | Kaushik  | 23  | Kota      | 2000.00
4  | Chaitali | 25  | Mumbai    | 6500.00
5  | Hardik   | 27  | Bhopal    | 8500.00
6  | Komal    | 22  | MP        | 4500.00
7  | Muffy    | 24  | Indore    | 10000.00

Now, let us check the following subquery with a SELECT statement.
SQL> SELECT * FROM CUSTOMERS
WHERE ID IN (SELECT ID FROM CUSTOMERS WHERE SALARY > 4500) ;
This would produce the following result.
ID | NAME     | AGE | ADDRESS | SALARY
4  | Chaitali | 25  | Mumbai  | 6500.00
5  | Hardik   | 27  | Bhopal  | 8500.00
7  | Muffy    | 24  | Indore  | 10000.00
Subqueries with the INSERT Statement: Subqueries can also be used with INSERT statements. The INSERT statement uses the data returned from the subquery to insert into another table. The selected data in the subquery can be modified with any of the character, date or number functions.
Syntax:
INSERT INTO table_name [ (column1 [, column2 ]) ]
SELECT [ *|column1 [, column2 ] ] FROM table1 [, table2 ] [ WHERE VALUE OPERATOR ]

Example: Consider a table CUSTOMERS_BKP with a similar structure as the CUSTOMERS table. Now, to copy the complete CUSTOMERS table into the CUSTOMERS_BKP table, you can use the following syntax.
SQL> INSERT INTO CUSTOMERS_BKP
SELECT * FROM CUSTOMERS WHERE ID IN
(SELECT ID FROM CUSTOMERS) ;

Subqueries with the UPDATE Statement: The subquery can be used in conjunction with the UPDATE statement. Either single or multiple columns in a table can be updated when using a subquery with the UPDATE statement.
Syntax:
UPDATE table SET column_name = new_value [ WHERE OPERATOR [ VALUE ]
(SELECT COLUMN_NAME FROM TABLE_NAME [ WHERE ]) ]

Example: Assuming we have a CUSTOMERS_BKP table available which is a backup of the CUSTOMERS table, the following example updates SALARY by 0.25 times in the CUSTOMERS table for all the customers whose AGE is greater than or equal to 27.
SQL> UPDATE CUSTOMERS SET SALARY = SALARY * 0.25
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27 );

This would impact two rows and finally the CUSTOMERS table would have the following records.
ID | NAME     | AGE | ADDRESS   | SALARY
1  | Ramesh   | 35  | Ahmadabad | 500.00
2  | Khilan   | 25  | Delhi     | 1500.00
3  | Kaushik  | 23  | Kota      | 2000.00
4  | Chaitali | 25  | Mumbai    | 6500.00
5  | Hardik   | 27  | Bhopal    | 2125.00
6  | Komal    | 22  | MP        | 4500.00
7  | Muffy    | 24  | Indore    | 10000.00

Subqueries with the DELETE Statement: The subquery can be used in conjunction with the DELETE statement like with any other statements mentioned above.
Syntax:
DELETE FROM TABLE_NAME [ WHERE OPERATOR [ VALUE ]
(SELECT COLUMN_NAME FROM TABLE_NAME [ WHERE ]) ]

Example: Assuming we have a CUSTOMERS_BKP table available which is a backup of the CUSTOMERS table, the following example deletes the records from the CUSTOMERS table for all the customers whose AGE is greater than or equal to 27.
SQL> DELETE FROM CUSTOMERS WHERE AGE IN
(SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27 );

This would impact two rows and finally the CUSTOMERS table would have the following records.
ID | NAME     | AGE | ADDRESS | SALARY
2  | Khilan   | 25  | Delhi   | 1500.00
3  | Kaushik  | 23  | Kota    | 2000.00
4  | Chaitali | 25  | Mumbai  | 6500.00
6  | Komal    | 22  | MP      | 4500.00
7  | Muffy    | 24  | Indore  | 10000.00

GROUPING
The SQL GROUP BY clause is used in collaboration with the SELECT statement to arrange identical data into groups. This GROUP BY clause follows the WHERE clause in a SELECT statement and precedes the ORDER BY clause.
The basic syntax of a GROUP BY clause is shown in the following code block. The GROUP BY clause must follow the conditions in the WHERE clause and must precede the ORDER BY clause if one is used.
SELECT column1, column2 FROM table_name WHERE [ conditions ]
GROUP BY column1, column2 ORDER BY column1, column2

Guidelines to use the Group By Clause:
If a group function is included in the SELECT clause, we should not use individual result columns.
The extra non-group-function columns should be declared in the Group By clause.
Using the WHERE clause, rows can be pre-filtered before dividing them into groups.
Column aliases can't be used in the Group By clause.
By default, rows are sorted in ascending order of the columns included in the Group By list.

Examples:
Display the average salary of the departments from the Emp table.
SQL> select deptno,AVG(sal) from emp group by deptno;
Display the minimum and maximum salaries of employees working as clerks in each department.
SQL> select deptno,min(sal),max(sal) from emp where job='CLERK' Group by deptno;

Excluding Groups of Results: While using the Group By clause, there is a provision to exclude some group results using the HAVING clause. The HAVING clause is used to specify which groups are displayed. It is used to filter the data which is associated with the group functions.
Syntax:
SELECT column1, column2 FROM table1, table2 WHERE [ conditions ]
GROUP BY column1, column2 HAVING [ conditions ]

Sequence of steps:
First, rows are grouped.
Group functions are applied to the identified groups.
Groups that match the criteria in the HAVING clause are displayed.
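To show how the clause ordering rule above (WHERE before GROUP BY, GROUP BY before HAVING and ORDER BY) looks in a single statement, a hedged example on the emp table used in this section:

SQL> SELECT deptno, AVG(sal)
     FROM emp
     WHERE job = 'CLERK'          -- rows filtered before grouping
     GROUP BY deptno              -- then grouped
     HAVING AVG(sal) > 1000       -- groups filtered
     ORDER BY deptno;             -- final result ordered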
The HAVING clause can precede the Group By clause, but it is more logical to declare it after the Group By clause. The Group By clause can be used without a group function in the SELECT list. If rows need to be restricted based on the result of a group function, we must have a Group By clause as well as a Having clause. The existence of a Group By clause does not guarantee the existence of a HAVING clause, but the existence of a HAVING clause demands the existence of a Group By clause.

Example:
Display the departments where the minimum salary of clerks is > 1000.
SQL> select deptno, min(sal) from emp where job='CLERK'
group by deptno HAVING min(sal)> 1000;
Display the sum of the salaries of the departments.
SQL> select deptno, sum(sal) from emp group by deptno;

AGGREGATION: Aggregation Functions or Group Functions
These functions return a single row based on a group of rows. They can appear in the SELECT list and HAVING clauses only. They operate on sets of rows to give one result per group. The set may be the whole table or the table split into groups.

Guidelines to use Aggregate Functions:
Distinct makes the function consider only non-duplicate values.
All makes the function consider every value, including duplicates.
Syntax:
GroupFunctionName (Distinct/All columns)

The data types for arguments may be char, varchar, number or date. All group functions except count(*) ignore NULL values. To substitute a value for a NULL value, use the NVL() function. When a group function is declared in a select list, no single-row columns should be declared; other columns can be declared but they should be declared in the group by clause.

The list of Aggregate Functions is:
MIN      returns the smallest value in a given column
MAX      returns the largest value in a given column
SUM      returns the sum of the numeric values in a given column
AVG      returns the average value of a given column
COUNT    returns the total number of values in a given column
COUNT(*) returns the number of rows in a table

Consider the following Employees table:

MIN Function:
SQL> select min(Salary) from Employees;
OUTPUT:
MIN(SALARY)
29860

MAX Function:
SQL> select max(Salary) from Employees;
OUTPUT:
MAX(SALARY)
65800

SUM Function:
SQL> select sum(Salary) from Employees;
OUTPUT:
SUM(SALARY)
212574

AVG Function:
SQL> select avg(Salary) from Employees;
OUTPUT:
AVG(SALARY)
42514.8

COUNT Function:
SQL> select count(IdNum) from Employees;
OUTPUT:
COUNT(IDNUM)
5

COUNT(*) Function:
SQL> select count(*) from Employees;
OUTPUT:
COUNT(*)
5
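Since all group functions except COUNT(*) ignore NULL values, the NVL() substitution mentioned above changes the result. A hedged Oracle-style sketch — the comm column is assumed to contain NULLs:

SQL> select AVG(comm) from emp;            -- rows where comm is NULL are ignored in the average
SQL> select AVG(NVL(comm, 0)) from emp;    -- NULLs are counted as 0, so every row contributes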
Table 2 − The ORDERS Table is as follows.
OID | DATE                | CUSTOMER_ID | AMOUNT
102 | 2009-10-08 00:00:00 | 3           | 3000
100 | 2009-10-08 00:00:00 | 3           | 1500
101 | 2009-11-20 00:00:00 | 2           | 1560
103 | 2008-05-20 00:00:00 | 4           | 2060

Inner Join:
SQL> SELECT id, name, amount, date FROM customers
INNER JOIN orders
ON customers.id = orders.customer_id;
This would produce the following result −
ID | NAME     | AMOUNT | DATE
3  | Kaushik  | 3000   | 2009-10-08 00:00:00
3  | Kaushik  | 1500   | 2009-10-08 00:00:00
2  | Khilan   | 1560   | 2009-11-20 00:00:00
4  | Chaitali | 2060   | 2008-05-20 00:00:00

Left Join:
SQL> SELECT id, name, amount, date FROM customers
LEFT JOIN orders
ON customers.id = orders.customer_id;
This would produce the following result −
ID | NAME     | AMOUNT | DATE
1  | Ramesh   | NULL   | NULL
2  | Khilan   | 1560   | 2009-11-20 00:00:00
3  | Kaushik  | 3000   | 2009-10-08 00:00:00
3  | Kaushik  | 1500   | 2009-10-08 00:00:00
4  | Chaitali | 2060   | 2008-05-20 00:00:00
5  | Hardik   | NULL   | NULL
6  | Komal    | NULL   | NULL
7  | Muffy    | NULL   | NULL

Right Join:
SQL> SELECT id, name, amount, date FROM customers
RIGHT JOIN orders
ON customers.id = orders.customer_id;
This would produce the following result −
ID | NAME     | AMOUNT | DATE
3  | Kaushik  | 3000   | 2009-10-08 00:00:00
3  | Kaushik  | 1500   | 2009-10-08 00:00:00
2  | Khilan   | 1560   | 2009-11-20 00:00:00
4  | Chaitali | 2060   | 2008-05-20 00:00:00

Full Join:
SQL> SELECT id, name, amount, date FROM customers
FULL JOIN orders
ON customers.id = orders.customer_id;
This would produce the following result −
ID | NAME     | AMOUNT | DATE
1  | Ramesh   | NULL   | NULL
2  | Khilan   | 1560   | 2009-11-20 00:00:00
3  | Kaushik  | 3000   | 2009-10-08 00:00:00
3  | Kaushik  | 1500   | 2009-10-08 00:00:00
4  | Chaitali | 2060   | 2008-05-20 00:00:00
5  | Hardik   | NULL   | NULL
6  | Komal    | NULL   | NULL
7  | Muffy    | NULL   | NULL
3  | Kaushik  | 3000   | 2009-10-08 00:00:00
3  | Kaushik  | 1500   | 2009-10-08 00:00:00
2  | Khilan   | 1560   | 2009-11-20 00:00:00
4  | Chaitali | 2060   | 2008-05-20 00:00:00

Self Join:
SQL> SELECT a.id, b.name, a.salary FROM customers a, customers b
WHERE a.salary < b.salary;
This would produce the following result −
ID | NAME     | SALARY
2  | Ramesh   | 1500.00
2  | Kaushik  | 1500.00
1  | Chaitail | 2000.00
2  | Chaitail | 1500.00
3  | Chaitail | 2000.00
6  | Chaitail | 4500.00
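The join examples above only show the contents of the two tables; a hedged sketch of the table definitions they imply — the data types are assumptions, and note that some systems require quoting a column named date:

SQL> CREATE TABLE customers ( id INT PRIMARY KEY, name VARCHAR(20), age INT,
     address VARCHAR(25), salary DECIMAL(10,2) );
SQL> CREATE TABLE orders ( oid INT PRIMARY KEY, date TIMESTAMP, customer_id INT,
     amount DECIMAL(10,2),
     FOREIGN KEY (customer_id) REFERENCES customers(id) );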
VIEWS:
A view is nothing more than a SQL statement that is stored in the database with an associated name. A view is actually a composition of a table in the form of a predefined SQL query. A view can contain all rows of a table or selected rows from a table. A view can be created from one or many tables, which depends on the written SQL query used to create the view.
Views, which are a type of virtual table, allow users to do the following −
Structure data in a way that users or classes of users find natural or intuitive.
Restrict access to the data in such a way that a user can see and (sometimes) modify exactly what they need and no more.
Summarize data from various tables which can be used to generate reports.

2 Types of Views: Updatable and Read-only views
Unlike base tables, VIEWs are either updatable or read-only, but not both. INSERT, UPDATE, and DELETE operations are allowed on updatable VIEWs and base tables, subject to any other constraints. INSERT, UPDATE, and DELETE are not allowed on read-only VIEWs, but you can change their base tables, as you would expect. An updatable VIEW is one that can have each of its rows associated with exactly one row in an underlying base table.
When the VIEW is changed, the changes pass unambiguously through the VIEW to that underlying base table. Updatable VIEWs in Standard SQL are defined only for queries that meet these criteria:
1. Built on only one table
2. No GROUP BY clause
3. No HAVING clause
4. No aggregate functions
5. No calculated columns
6. No UNION, INTERSECT, or EXCEPT

CREATE VIEW view_2 AS SELECT * FROM Table1 WHERE x = 1 UNION ALL SELECT * FROM Table1 WHERE x = 2;
-- not updatable!

More about Views:
A view takes up no storage space other than for the definition of the view in the data dictionary.
A view contains no data. All the data it shows comes from the base tables.
A view can provide an additional level of table security by restricting access to a set of rows or columns of a table.
A view hides implementation complexity. The user can select from the view with a simple SQL, unaware that the view is based internally on a join between multiple tables.
A view lets you change the data you can access, applying operators, aggregation functions, filters etc. on the base table.
A view isolates applications from changes in the definitions of base tables. Suppose a view uses two columns of a base table; it makes no difference to the view if other columns are added, modified or removed from the base table.
To know about the views in your own schema, look up user_views.
The underlying SQL definition of a view can be read via select text from user_views for the view.
Oracle does not enforce constraints on views. Instead, views are subject to the constraints of their base tables.
Creating Views
Database views are created using the CREATE VIEW statement. Views can be created from a single table, multiple tables or another view. To create a view, a user must have the appropriate system privilege according to the specific implementation.
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2..... FROM table_name WHERE [condition];
We can include multiple tables in the SELECT statement in a similar way as we use them in a normal SQL SELECT query.

Example:
Consider the CUSTOMERS table having the following records −
ID | NAME     | AGE | ADDRESS   | SALARY
1  | Ramesh   | 32  | Ahmadabad | 2000.00
2  | Khilan   | 25  | Delhi     | 1500.00
3  | Kaushik  | 23  | Kota      | 2000.00
4  | Chaitali | 25  | Mumbai    | 6500.00
5  | Hardik   | 27  | Bhopal    | 8500.00
6  | Komal    | 22  | MP        | 4500.00
7  | Muffy    | 24  | Indore    | 10000.00

Following is an example to create a view from the CUSTOMERS table. This view would be used to have the customer name and age from the CUSTOMERS table.
SQL > CREATE VIEW CUSTOMERS_VIEW AS SELECT name, age FROM CUSTOMERS;
Now, you can query CUSTOMERS_VIEW in a similar way as you query an actual table. Following is an example for the same.
SQL > SELECT * FROM CUSTOMERS_VIEW;
This would produce the following result.
NAME     | AGE
Ramesh   | 32
Khilan   | 25
Kaushik  | 23
Chaitali | 25
Hardik   | 27
Komal    | 22
Muffy    | 24

The With Check Option:
The WITH CHECK OPTION is a CREATE VIEW statement option. The purpose of the WITH CHECK OPTION is to ensure that all UPDATEs and INSERTs satisfy the condition(s) in the view definition. If they do not satisfy the condition(s), the UPDATE or INSERT returns an error.
Example:
SQL> CREATE VIEW CUSTOMERS_VIEW AS
SELECT name, age FROM CUSTOMERS
WHERE age IS NOT NULL WITH CHECK OPTION;
The WITH CHECK OPTION in this case should deny the entry of any NULL values in the view's AGE column, because the view is defined by data that does not have a NULL value in the AGE column.

Updating a View
A view can be updated under certain conditions, which are given below −
The SELECT clause may not contain the keyword DISTINCT.
The SELECT clause may not contain summary functions.
The SELECT clause may not contain set functions.
The SELECT clause may not contain set operators.
The SELECT clause may not contain an ORDER BY clause.
The FROM clause may not contain multiple tables.
The WHERE clause may not contain subqueries.
The query may not contain GROUP BY or HAVING.
Calculated columns may not be updated.
All NOT NULL columns from the base table must be included in the view in order for the INSERT query to function.
So, if a view satisfies all the above-mentioned rules then you can update that view. The following code block has an example to update the age of Ramesh.
SQL > UPDATE CUSTOMERS_VIEW SET AGE = 35 WHERE name = 'Ramesh';
This would ultimately update the base table CUSTOMERS and the same would reflect in the view itself. Now, try to query the base table and the SELECT statement would produce the following result.
ID | NAME     | AGE | ADDRESS   | SALARY
1  | Ramesh   | 35  | Ahmadabad | 2000.00
2  | Khilan   | 25  | Delhi     | 1500.00
3  | Kaushik  | 23  | Kota      | 2000.00
4  | Chaitali | 25  | Mumbai    | 6500.00
5  | Hardik   | 27  | Bhopal    | 8500.00
6  | Komal    | 22  | MP        | 4500.00
7  | Muffy    | 24  | Indore    | 10000.00

Inserting Rows into a View
Rows of data can be inserted into a view. The same rules that apply to the UPDATE command also apply to the INSERT command. Here, we cannot insert rows in the CUSTOMERS_VIEW because we have not included all the NOT NULL columns in this view; otherwise, you can insert rows in a view in a similar way as you insert them in a table.
Deleting Rows from a View
Rows of data can be deleted from a view. The same rules that apply to the UPDATE and INSERT commands apply to the DELETE command.
Following is an example to delete a record having AGE = 22.
SQL > DELETE FROM CUSTOMERS_VIEW WHERE age = 22;
This would ultimately delete a row from the base table CUSTOMERS, and the same would reflect in the view itself. Now, try to query the base table and the SELECT statement would produce the following result.

ID  NAME      AGE  ADDRESS    SALARY
1   Ramesh    35   Ahmadabad  2000.00
2   Khilan    25   Delhi      1500.00
3   Kaushik   23   Kota       2000.00
4   Chaitali  25   Mumbai     6500.00
5   Hardik    27   Bhopal     8500.00
7   Muffy     24   Indore     10000.00

Dropping Views
Obviously, where you have a view, you need a way to drop the view if it is no longer needed.
Syntax:
DROP VIEW view_name;
Example:
SQL> DROP VIEW CUSTOMERS_VIEW;

SET OPERATIONS
These operators are used to combine information of similar datatype from one or more than one table. The datatype of the corresponding columns in all the SELECT statements should be the same.
Different types of set commands are
UNION
UNION ALL
INTERSECT
MINUS
Set operators combine 2 or more queries into one result. The result of each SELECT statement can be treated as a set, and SQL set operators can be applied on those sets to arrive at a final result. SQL statements containing set operators are referred to as compound queries, and each SELECT statement in a compound query is referred to as a component query. Set operations are often called vertical joins, because the result combines data from 2 or more SELECT statements based on columns instead of rows.
Syntax:
<component query>
{ UNION | UNION ALL | MINUS | INTERSECT }
<component query>

UNION: Combines the results of 2 SELECT statements into one result set, and then eliminates any duplicate rows from that result set.
UNION ALL: Combines the results of 2 SELECT statements into one result set, including the duplicates.
INTERSECT: Returns only the rows that are returned by each of the two SELECT statements.
MINUS: Takes the result set of the first SELECT statement, and removes those rows that are also returned by the second SELECT statement.

Point Of Concentration:
The queries are all executed independently, but their output is merged.
Only the final query ends with a semicolon (;).

Rules And Restrictions:
The result sets of both queries must have the same number of columns.
The datatype of each column in the second result set must match the datatype of the corresponding column in the first result set.
The 2 SELECT statements may not contain an ORDER BY clause; only the final result of the entire set operation can be ordered.
The columns used for ordering must be defined through the column number.

Examples:
Display the employees who work in departments 10 and 30 without duplicates.
SQL> SELECT empno, ename from emp where deptno=10
UNION
SELECT empno, ename from emp where deptno=30;

Display the employees who work in departments 10 and 30.
SQL> SELECT empno, ename from emp where deptno=10
UNION ALL
SELECT empno, ename from emp where deptno=30;

Display the employees who work in both the departments with deptno 10 and 30.
SQL> SELECT empno, ename from emp where deptno=10
INTERSECT
SELECT empno, ename from emp where deptno=30;

Display the employees whose row number is less than 7 but not less than 6.
SQL> SELECT rownum, ename from emp where rownum<7
MINUS
SELECT rownum, ename from emp where rownum<6;
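The rules above say that ordering applies only to the final merged result and must use column numbers. A minimal sketch of this (reusing the emp table from the examples above):

SQL> SELECT empno, ename from emp where deptno=10
UNION
SELECT empno, ename from emp where deptno=30
ORDER BY 2;

Here ORDER BY 2 sorts the combined result set by its second column (ename); the individual component queries themselves carry no ORDER BY clause.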
RELATIONAL OPERATIONS
Given this simple and restricted data structure, it is possible to define some very powerful relational operators which, from the users' point of view, act 'in parallel' on all entries in a table simultaneously, although their implementation may require conventional processing.
Codd originally defined eight relational operators:
1. SELECT, originally called RESTRICT
2. PROJECT
3. JOIN
4. PRODUCT
5. UNION
6. INTERSECT
7. DIFFERENCE
8. DIVIDE
The most important of these are (1), (2), (3) and (8), which, together with some other aggregate functions, are powerful enough to answer a wide range of queries. The eight operators will be described as general procedures - i.e. not in the syntax of SQL or any other relational language. The important point is that they define the result required rather than the detailed process of obtaining it - what, but not how.

SELECT: RESTRICTS the rows chosen from a table to those entries with specified attribute values.
SELECT item FROM stock_level WHERE quantity > 100
constructs a new, logical table - an unnamed relation - with one column per row (i.e. item) containing all rows from stock_level that satisfy the WHERE clause.

PROJECT: Selects rows made up of a sub-set of columns from a table.
PROJECT stock_item OVER item AND description
produces a new logical table where each row contains only two columns - item and description. The new table will only contain distinct rows from stock_item; i.e. any duplicate rows so formed will be eliminated.

JOIN: Associates entries from two tables on the basis of matching column values.
JOIN stock_item WITH stock_level OVER item
It is not necessary for there to be a one-to-one relationship between entries in the two tables to be joined - entries which do not match anything will be eliminated from the result, and entries from one table which match several entries in the other will be duplicated the required number of times.

PRODUCT: Builds a relation from two specified relations consisting of all possible combinations of rows, one from each of the two relations. For example, consider two relations, A and B, consisting of rows:
A: a   B: d   =>   A product B: a d
   b      e                     a e
   c                            b d
                                b e
                                c d
                                c e

UNION: Builds a relation consisting of all rows appearing in either or both of the two relations. For example, consider two relations, A and B, consisting of rows:
A: a   B: a   =>   A union B: a
   b      e                    b
   c                           c
                               e

INTERSECT: Builds a relation consisting of all rows appearing in both of the two relations. For example, consider two relations, A and B, consisting of rows:
A: a   B: a   =>   A intersect B: a
   b      e
   c

DIFFERENCE: Builds a relation consisting of all rows appearing in the first and not in the second of the two relations. For example, consider two relations, A and B, consisting of rows:
A: a   B: a   =>   A - B: b, c   and   B - A: e
   b      e
   c

DIVIDE: Takes two relations, one binary and one unary, and builds a relation consisting of all values of one column of the binary relation that match, in the other column, all values in the unary relation.
A: a x   B: x   =>   A divide B: a
   a y      y
   a z
   b x
   c y

Of the relational operators (4) to (8) defined by Codd, the most important is DIVISION. For example, suppose table A contains a list of suppliers and commodities, and table B a list of all commodities bought by a company. Dividing A by B produces a table listing the suppliers who sell all of those commodities.
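SQL has no DIVIDE keyword, so the "suppliers who sell all commodities" query above is usually written as a double NOT EXISTS. The sketch below is illustrative only; the table and column names (supplies, commodities, supplier, commodity) are assumptions, not tables defined in these notes.

SQL> SELECT DISTINCT s1.supplier
     FROM supplies s1
     WHERE NOT EXISTS (
           SELECT c.commodity
           FROM commodities c
           WHERE NOT EXISTS (
                 SELECT 1
                 FROM supplies s2
                 WHERE s2.supplier = s1.supplier
                   AND s2.commodity = c.commodity));

Read inside out: a supplier is returned only if there is no commodity in the commodities table that this supplier does not supply - exactly the "matches all values in the unary relation" condition of DIVIDE.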
Updation Anomaly: If data items are scattered and are not linked to each other properly, then it could lead to strange situations. For example, when we try to update one data item having its copies scattered over several places, a few instances get updated properly while a few others are left with old values. Such instances leave the database in an inconsistent state.
Example: To update the address of a student who occurs twice or more in a table, we will have to update the S_Address column in all the rows, else the data will become inconsistent.

Insertion Anomaly: We try to insert data into a record that does not exist at all.
Example: Suppose for a new admission we have the Student id (S_id), name and address of a student, but if the student has not opted for any subjects yet, then we have to insert NULL there, leading to an Insertion Anomaly.

Deletion Anomaly: We try to delete a record, but parts of it are left undeleted because, unknown to us, the data is also saved somewhere else.
Example: If (S_id) 402 has only one subject and temporarily he drops it, when we delete that row, the entire student record will be deleted along with it.

2. Augmentation: If X→Y is a functional dependency, then by augmentation XZ→YZ is also a functional dependency.
3. Transitivity: If X→Y and Y→Z are two functional dependencies, then by transitivity X→Z is also a functional dependency.
4. Union: If X→Y and X→Z are two functional dependencies, then X→YZ is also a functional dependency.
5. Decomposition: If X→YZ is a functional dependency, then X→Y and X→Z are also functional dependencies.
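A short worked example of applying these inference rules (the relation and dependencies here are assumed for illustration, not taken from these notes): suppose R(A, B, C) holds A→B and B→C. By transitivity, A→C. By union on A→B and A→C, A→BC. By augmentation of A→B with A, A→AB. By decomposition of A→BC, we recover A→B and A→C.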
P3   Kamal    W1
P4   Sharath  W2
(Ward No. in this relation refers to Ward No. in the relation Ward.)

The above two relations satisfy 3NF. They don't have transitive dependencies.
Note 1: A relation R is said to be in 3NF if, whenever a non-trivial functional dependency of the form X → A holds, either X is a super key or A is a prime attribute.
Note 2: If all attributes are prime attributes, then the relation is in 3NF, because with such attributes no partial functional dependencies and no transitive dependencies exist.

Head of the Department, Professor Code → Percent time
The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose the information that Rao is the Head of Department of Chemistry.
The normalization of the relation is done by creating a new relation for Dept. and Head of Dept. and deleting Head of Dept. from the given relation. The normalized relations are shown in the following.
Professor_work Dependencies 2. The information about item I1 is stored twice for vendor V3.
Professor Code Department Percent Time Department,Professor Code → Percent time Observe that the relation given is in 3NF and also in BCNF. It still has the problem mentioned above. The
problem is reduced by expressing this relation as two relations in the Fourth Normal Form (4NF). A
P1 Physics 50
relation is in 4NF if it has no more than one independent multi valued dependency or one independent
P1 Mathematics 50 Department is foreign key referring multi valued dependency with a functional dependency.
Department in the Department_Details relation
P2 Chemistry 25 The table can be expressed as the two 4NF relations given as following. The fact that vendors are capable
of supplying certain items and that they are assigned to supply for some projects is independently specified
P2 Physics 75
in the 4NF relation.
P3 Mathematics 100
Vendor_Supply Vendor_Project
Department_Details Dependencies
Department Head of Dept. Department → Head of the Department V1 I1 V1 P1
Physics Ghosh V1 I2 V1 P3
Mathematics Krishnan V2 I2 V2 P1
Chemistry Rao V2 I3 V3 P2
V3 I1 .
FOURTH NORMAL FORM (4NF) SUMMARY
When attributes in a relation have multi-valued dependency, further Normalization to 4NF and 5NF are
Input Transformation Output
required. A multi-valued dependency is a typical kind of dependency in which each and every attribute within Relation Relation
a relation depends upon the other, yet none of them is a unique primary key. Consider a vendor supplying many
items to many projects in an organization. The following are the assumptions: All Relations Eliminate variable length record. Remove multi-attribute lines in table. 1NF
1. A vendor is capable of supplying many items. 1NF Remove dependency of non-key attributes on part of a multi-attribute key. 2NF
2. A project uses many items. 2NF Remove dependency of non-key attributes on other non-key attributes. 3NF
3. A vendor supplies to many projects. 3NF Remove dependency of an attribute of a multi attribute key on an attribute of BCNF
another (overlapping) multi-attribute key.
4. An item may be supplied by many vendors.
A multi valued dependency exists here because all the attributes depend upon the other and yet none of them is BCNF Remove more than one independent multi-valued dependency from relation 4NF
a primary key having unique value. by splitting relation.
4NF Add one relation relating attributes with multi-valued dependency. 5NF
Vendor Code Item Code Project No.
V1 I1 P1
PROPERTIES OF DECOMPOSITION:
V1 I2 P1
Every Decomposition must satisfy 2 properties.
V1 I1 P3
1. Lossless join
V1 I2 P3 2. Dependency Preserving
V2 I2 P1 1. Lossless join:
V2 I3 P1 If we decompose a relation R into relations R1 and R2,
V3 I1 P2 Decomposition is lossy if R1 ⋈ R2 ⊃ R
V3 I1 P3 Decomposition is lossless if R1 ⋈ R2 = R
The given relation has a number of problems. For example: To check for lossless join decomposition using FD set, following conditions must hold:
1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row with a blank for 1. Union of Attributes of R1 and R2 must be equal to attribute of R. Each attribute of R must be either in R1
item code has to be introduced. or in R2. Att(R1) U Att(R2)=Att(R)
2. The intersection of the attributes of R1 and R2 must not be NULL: Att(R1) ∩ Att(R2) ≠ Φ
3. The common attribute must be a key for at least one relation (R1 or R2): Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
For example, a relation R(A, B, C, D) with FD set {A->BC} decomposed into R1(ABC) and R2(AD) is a lossless join decomposition because:
1. The first condition holds, as Att(R1) U Att(R2) = (ABC) U (AD) = (ABCD) = Att(R).
2. The second condition holds, as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) = A ≠ Φ.
3. The third condition holds, as Att(R1) ∩ Att(R2) = A is a key of R1(ABC), because A->BC is given.
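A minimal SQL sketch of why this decomposition is lossless (the tables here are the hypothetical R(A, B, C, D), R1 and R2 from the example above, not tables defined elsewhere in these notes):

SQL> CREATE TABLE R1 AS SELECT DISTINCT A, B, C FROM R;
SQL> CREATE TABLE R2 AS SELECT DISTINCT A, D FROM R;
SQL> SELECT * FROM R1 NATURAL JOIN R2;

Because the common attribute A is a key of R1, every row of R2 matches exactly one row of R1, so the natural join returns exactly the rows of R and no spurious tuples appear. If the common attributes were not a key of either relation, the join could return extra rows and the decomposition would be lossy.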
Dependency Preserving Decomposition
If we decompose a relation R into relations R1 and R2, all dependencies of R must either be a part of R1 or R2, or must be derivable from the combination of the FDs of R1 and R2.
For example, a relation R(A, B, C, D) with FD set {A->BC} decomposed into R1(ABC) and R2(AD) is dependency preserving, because the FD A->BC is a part of R1(ABC).
Decomposition of a relation is done when a relation in the relational model is not in an appropriate normal form. Relation R is decomposed into two or more relations if the decomposition is lossless join as well as dependency preserving.

UNIT-V
TRANSACTION MANAGEMENT

What is a Transaction?
A transaction is an event which occurs on the database. Generally a transaction reads a value from the database or writes a value to the database. If you have any concept of Operating Systems, then we can say that a transaction is analogous to a process.
Although a transaction can both read and write on the database, there are some fundamental differences between these two classes of operations. A read operation does not change the image of the database in any way. But a write operation, whether performed with the intention of inserting, updating or deleting data from the database, changes the image of the database. That is, we may say that these transactions bring the database from an image which existed before the transaction occurred (called the Before Image or BFIM) to an image which exists after the transaction occurred (called the After Image or AFIM).
Fine, isn't it? The transaction has 6 instructions to extract the amount from A and submit it to B. The AFIM will show Rs 900/- in A and Rs 1100/- in B.

Now, suppose there is a power failure just after instruction 3 (Write A) has been completed. What happens now? After the system recovers, the AFIM will show Rs 900/- in A, but the same Rs 1000/- in B. It would be said that Rs 100/- evaporated into thin air because of the power failure. Clearly such a situation is not acceptable.

The solution is to keep every value calculated by the instructions of the transaction not in any stable storage (hard disk) but in volatile storage (RAM), until the transaction completes its last instruction. When we see that there has not been any error, we do something known as a COMMIT operation. Its job is to write every temporarily calculated value from the volatile storage on to the stable storage. In this way, even if power fails at instruction 3, the post-recovery image of the database will show accounts A and B both containing Rs 1000/-, as if the failed transaction had never occurred.

Consistency: If we execute a particular transaction in isolation or together with other transactions (i.e. presumably in a multi-programming environment), the transaction will yield the same expected result.

To give better performance, every database management system supports the execution of multiple transactions at the same time, using CPU Time Sharing. Concurrently executing transactions may have to deal with the problem of sharable resources, i.e. resources that multiple transactions are trying to read/write at the same time. For example, we may have a table or a record on which two transactions are trying to read or write at the same time. Careful mechanisms are created in order to prevent mismanagement of these sharable resources, so that there should not be any change in the way a transaction performs. A transaction which deposits Rs 100/- to account A must deposit the same amount whether it is acting alone or in conjunction with another transaction that may be trying to deposit or withdraw some amount at the same time.

Isolation: In case multiple transactions are executing concurrently and trying to access a sharable resource at the same time, the system should create an ordering in their execution so that they should not create any anomaly in the value stored at the sharable resource.

There are several ways to achieve this, and the most popular one is using some kind of locking mechanism. Again, if you have the concept of Operating Systems, then you should remember semaphores, how they are used by a process to make a resource busy before starting to use it, and how they are used to release the resource after the usage is over. Other processes intending to access that same resource must wait during this time. Locking is almost similar. It states that a transaction must first lock the data item that it wishes to access, and release the lock when the accessing is no longer required. Once a transaction locks the data item, other transactions wishing to access the same data item must wait until the lock is released.

Durability: It states that once a transaction has been completed, the changes it has made should be permanent.

As we have seen in the explanation of the Atomicity property, the transaction, if it completes successfully, is committed. Once the COMMIT is done, the changes which the transaction has made to the database are immediately written into permanent storage. So, after the transaction has been committed successfully, there is no question of any loss of information even if the power fails. Committing a transaction guarantees that the AFIM has been reached.

There are several ways Atomicity and Durability can be implemented. One of them is called Shadow Copy. In this scheme a database pointer is used to point to the BFIM of the database. During the transaction, all the temporary changes are recorded into a Shadow Copy, which is an exact copy of the original database plus the changes made by the transaction, which is the AFIM. Now, if the transaction is required to COMMIT, then the database pointer is updated to point to the AFIM copy, and the BFIM copy is discarded. On the other hand, if the transaction is not committed, then the database pointer is not updated. It keeps pointing to the BFIM, and the AFIM is discarded. This is a simple scheme, but it takes a lot of memory space and time to implement.

If you study carefully, you can understand that Atomicity and Durability are essentially the same thing, just as Consistency and Isolation are essentially the same thing.
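In SQL terms, the A-to-B transfer discussed above can be written as a single transaction. This is only a sketch: it assumes a hypothetical accounts(acc_no, balance) table, which is not defined in these notes.

SQL> UPDATE accounts SET balance = balance - 100 WHERE acc_no = 'A';
SQL> UPDATE accounts SET balance = balance + 100 WHERE acc_no = 'B';
SQL> COMMIT;

Until the COMMIT, the changes live only in the transaction's temporary (volatile) context. COMMIT makes both updates permanent together, and a ROLLBACK (or a crash before the COMMIT) leaves the database at the BFIM, so the Rs 100/- can never "evaporate" between the two updates.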
Committed: If no failure occurs, then the transaction reaches the COMMIT POINT. All the temporary values are written to the stable storage and the transaction is said to have been committed.

Terminated: Either committed or aborted, the transaction finally reaches this state.

The whole process can be described using the following diagram:
(Figure: transaction state diagram showing the Entry Point and the ACTIVE, PARTIALLY COMMITTED and COMMITTED states.)

Let us consider there are two transactions T1 and T2, whose instruction sets are given as following. T1 is the same as we have seen earlier, while T2 is a new transaction.

T1
Read A;
A = A – 100;
Write A;
Read B;
B = B + 100;
Write B;

T2
Read A;
Temp = A * 0.1;
Read C;
C = C + Temp;
Write C;

If we prepare a serial schedule, then either T1 will completely finish before T2 can begin, or T2 will completely finish before T1 can begin. However, if we want to create a concurrent schedule, then some Context Switching needs to be made, so that some portion of T1 will be executed, then some portion of T2 will be executed and so on. For example, say we have prepared the following concurrent schedule.

T1                 T2
Read A;
A = A – 100;
Write A;
                   Read A;
                   Temp = A * 0.1;
                   Read C;
                   C = C + Temp;
                   Write C;
Read B;
B = B + 100;
Write B;

No problem here. We have made some Context Switching in this Schedule, the first one after executing the third instruction of T1, and the second after executing the last statement of T2. T1 first deducts Rs 100/- from A and writes the new value of Rs 900/- into A. T2 reads the value of A, calculates the value of Temp to be Rs 90/- and adds the value to C. The remaining part of T1 is executed and Rs 100/- is added to B.

It is clear that a proper Context Switching is very important in order to maintain the Consistency and Isolation properties of the transactions. But let us take another example where a wrong Context Switching can bring about disaster. Consider the following schedule involving the same T1 and T2.

T1                 T2
Read A;
A = A – 100;
                   Read A;
                   Temp = A * 0.1;
                   Read C;
                   C = C + Temp;
                   Write C;
Write A;
Read B;
B = B + 100;
Write B;

This schedule is wrong, because we have made the switching at the second instruction of T1. The result is very confusing. If we consider accounts A and B both containing Rs 1000/- each, then the result of this schedule should have left Rs 900/- in A, Rs 1100/- in B and added Rs 90/- to C (as C should be increased by 10% of the amount in A). But in this wrong schedule, the Context Switching is being performed before the new value of Rs 900/- has been updated in A. T2 reads the old value of A, which is still Rs 1000/-, and deposits Rs 100/- in C. C makes an unjust gain of Rs 10/- out of nowhere.

Serializability
When several concurrent transactions are trying to access the same data item, the instructions within these concurrent transactions must be ordered in some way so that there is no problem in accessing and releasing the shared data item. There are two aspects of serializability, which are described here. Consider two schedules S1 and S2 which we want to be View Equivalent, where both T1 and T2 want to access the same data item.
1. If in S1, T1 reads the initial value of the data item, then in S2 also, T1 should read the initial value of that same data item.
2. If in S1, T1 writes a value in the data item which is read by T2, then in S2 also, T1 should write the value in the data item before T2 reads it.

Fig: Schedule 3 - showing only the read and write instructions.
We say that I and J conflict if they are operations by different transactions on the same data item, and at least one of these instructions is a write operation. To illustrate the concept of conflicting instructions, we consider schedule 3 in the Figure above. The write(A) instruction of T1 conflicts with the read(A) instruction of T2. However, the write(A) instruction of T2 does not conflict with the read(B) instruction of T1, because the two instructions access different data items.

Transaction Characteristics
Every transaction has three characteristics: access mode, diagnostics size, and isolation level. The diagnostics size determines the number of error conditions that can be recorded.
If the access mode is READ ONLY, the transaction is not allowed to modify the database. Thus, INSERT, DELETE, UPDATE, and CREATE commands cannot be executed. If we have to execute one of these commands, the access mode should be set to READ WRITE. For transactions with READ ONLY access mode, only shared locks need to be obtained, thereby increasing concurrency.

The isolation level controls the extent to which a given transaction is exposed to the actions of other transactions executing concurrently. By choosing one of four possible isolation level settings, a user can obtain greater concurrency at the cost of increasing the transaction's exposure to other transactions' uncommitted changes.

Isolation level choices are READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. The effect of these levels is summarized in the Figure given below. In this context, dirty read and unrepeatable read are defined as usual. Phantom is defined to be the possibility that a transaction retrieves a collection of objects (in SQL terms, a collection of tuples) twice and sees different results, even though it does not modify any of these tuples itself.

In terms of a lock-based implementation, a SERIALIZABLE transaction obtains locks before reading or writing objects, including locks on sets of objects that it requires to be unchanged (see Section 19.3.1), and holds them until the end, according to Strict 2PL.

REPEATABLE READ ensures that T reads only the changes made by committed transactions, and that no value read or written by T is changed by any other transaction until T is complete. However, T could experience the phantom phenomenon; for example, while T examines all Sailors records with rating=1, another transaction might add a new such Sailors record, which is missed by T.

A REPEATABLE READ transaction uses the same locking protocol as a SERIALIZABLE transaction, except that it does not do index locking; that is, it locks only individual objects, not sets of objects.

READ COMMITTED ensures that T reads only the changes made by committed transactions, and that no value written by T is changed by any other transaction until T is complete. However, a value read by T may well be modified by another transaction while T is still in progress, and T is, of course, exposed to the phantom problem.

A READ COMMITTED transaction obtains exclusive locks before writing objects and holds these locks until the end. It also obtains shared locks before reading objects, but these locks are released immediately; their only effect is to guarantee that the transaction that last modified the object is complete. (This guarantee relies on the fact that every SQL transaction obtains exclusive locks before writing objects and holds exclusive locks until the end.)

A READ UNCOMMITTED transaction does not obtain shared locks before reading objects. This mode represents the greatest exposure to uncommitted changes of other transactions; so much so that SQL prohibits such a transaction from making any changes itself - a READ UNCOMMITTED transaction is required to have an access mode of READ ONLY. Since such a transaction obtains no locks for reading objects, and it is not allowed to write objects (and therefore never requests exclusive locks), it never makes any lock requests.

The SERIALIZABLE isolation level is generally the safest and is recommended for most transactions. Some transactions, however, can run with a lower isolation level, and the smaller number of locks requested can contribute to improved system performance. For example, a statistical query that finds the average sailor age can be run at the READ COMMITTED level, or even the READ UNCOMMITTED level, because a few incorrect or missing values will not significantly affect the result if the number of sailors is large. The isolation level and access mode can be set using the SET TRANSACTION command. For example, the following command declares the current transaction to be SERIALIZABLE and READ ONLY:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY
When a transaction is started, the default is SERIALIZABLE and READ WRITE.
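The phantom problem described above can be sketched as two concurrent sessions. This is illustrative only: the Sailors(sid, sname, rating, age) columns are assumed here, as is REPEATABLE READ support in the particular DBMS.

-- Session 1: transaction T running at REPEATABLE READ
SQL> SELECT COUNT(*) FROM Sailors WHERE rating = 1;

-- Session 2: another transaction inserts a new qualifying row and commits
SQL> INSERT INTO Sailors (sid, sname, rating, age) VALUES (999, 'New sailor', 1, 30);
SQL> COMMIT;

-- Session 1: the same query inside the same transaction T
SQL> SELECT COUNT(*) FROM Sailors WHERE rating = 1;   -- may now return a larger count: a phantom row

Because REPEATABLE READ locks only the individual rows T has already read (no index locking), the newly inserted row is not blocked; only SERIALIZABLE prevents this.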
PRECEDENCE GRAPH
A precedence graph, also named conflict graph or serializability graph, is used in the context of concurrency control in databases.
The precedence graph for a schedule S contains:
A node for each committed transaction in S.
An arc from Ti to Tj if an action of Ti precedes and conflicts with one of Tj's actions.
To test for conflict serializability, we need to construct the precedence graph and to invoke a cycle-detection algorithm. Cycle-detection algorithms exist which take order n² time, where n is the number of vertices in the graph.

Precedence graph example
(Figure: an example schedule S over transactions T1, T2 and T3, and its precedence graph.)
1. For each transaction Ti participating in schedule S, create a node labelled Ti in the precedence graph. So the precedence graph contains T1, T2, T3.
2. For each case in S where Ti executes a write_item(X) and then Tj executes a read_item(X), create an edge (Ti --> Tj) in the precedence graph. This occurs nowhere in the above example, as there is no read after write.
3. For each case in S where Ti executes a read_item(X) and then Tj executes a write_item(X), create an edge (Ti --> Tj) in the precedence graph. This results in a directed edge from T1 to T2.
4. For each case in S where Ti executes a write_item(X) and then Tj executes a write_item(X), create an edge (Ti --> Tj) in the precedence graph. This results in directed edges from T2 to T1, T1 to T3, and T2 to T3.
5. The schedule S is conflict serializable if the precedence graph has no cycles. As T1 and T2 constitute a cycle, S is not conflict serializable, and its serializability has to be checked using other methods.
A serializability order of the transactions can be obtained by finding a linear order consistent with the partial order of the precedence graph.

RECOVERABLE SCHEDULES
Recoverable schedule — if a transaction Tj reads a data item previously written by a transaction Ti, then the commit operation of Ti must appear before the commit operation of Tj.
The following schedule is not recoverable if T9 commits immediately after the read(A) operation.
If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence, the database must ensure that schedules are recoverable.

CASCADING ROLLBACKS
Cascading rollback – a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable). A cascading rollback can lead to the undoing of a significant amount of work.

CASCADELESS SCHEDULES
Cascadeless schedules — for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.

CONCURRENCY SCHEDULE
A database must provide a mechanism that will ensure that all possible schedules are both:
Conflict serializable.
Recoverable, and preferably cascadeless.
A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency.
Concurrency-control schemes involve a tradeoff between the amount of concurrency they allow and the amount of overhead that they incur.
Testing a schedule for serializability after it has executed is a little too late!
Tests for serializability help us understand why a concurrency control protocol is correct.
Goal – to develop concurrency control protocols that will assure serializability.

WEAK LEVELS OF CONSISTENCY
Some applications are willing to live with weak levels of consistency, allowing schedules that are not serializable.
E.g., a read-only transaction that wants to get an approximate total balance of all accounts.
E.g., database statistics computed for query optimization can be approximate (why?).
Such transactions need not be serializable with respect to other transactions.
Tradeoff: accuracy for performance.

LEVELS OF CONSISTENCY IN SQL
Serializable — default.
Repeatable read — only committed records to be read; repeated reads of the same record must return the same value. However, a transaction may not be serializable – it may find some records inserted by a transaction but not find others.
Read committed — only committed records can be read, but successive reads of a record may return different (but committed) values.
Read uncommitted — even uncommitted records may be read.
Lower degrees of consistency are useful for gathering approximate information about the database.
Warning: some database systems do not ensure serializable schedules by default.
E.g., Oracle and PostgreSQL by default support a level of consistency called snapshot isolation (not part of the SQL standard).

TRANSACTION DEFINITION IN SQL
The data manipulation language must include a construct for specifying the set of actions that comprise a transaction.
In SQL, a transaction begins implicitly.
A transaction in SQL ends by:
Commit work — commits the current transaction and begins a new one.
Rollback work — causes the current transaction to abort.
In almost all database systems, by default, every SQL statement also commits implicitly if it executes successfully.
Implicit commit can be turned off by a database directive.
E.g. in JDBC, connection.setAutoCommit(false);

RECOVERY SYSTEM
Failure Classification:
Transaction failure:
Logical errors: the transaction cannot complete due to some internal error condition.
System errors: the database system must terminate an active transaction due to an error condition (e.g., deadlock).
System crash: a power failure or other hardware or software failure causes the system to crash.
Fail-stop assumption: non-volatile storage contents are assumed to not be corrupted as a result of a system crash.
Database systems have numerous integrity checks to prevent corruption of disk data.
Disk failure: a head crash or similar disk failure destroys all or part of disk storage.
Destruction is assumed to be detectable: disk drives use checksums to detect failures.

RECOVERY ALGORITHMS
Consider a transaction Ti that transfers $50 from account A to account B.
Two updates: subtract 50 from A and add 50 to B.
Transaction Ti requires updates to A and B to be output to the database.
A failure may occur after one of these modifications has been made but before both of them are made.
Modifying the database without ensuring that the transaction will commit may leave the database in an inconsistent state.
Not modifying the database may result in lost updates if a failure occurs just after the transaction commits.
Recovery algorithms have two parts:
1. Actions taken during normal transaction processing to ensure enough information exists to recover from failures.
2. Actions taken after a failure to recover the database contents to a state that ensures atomicity, consistency and durability.

STORAGE STRUCTURE
Volatile storage: does not survive system crashes.
examples: main memory, cache memory
Nonvolatile storage:
survives system crashes
examples: disk, tape, flash memory, non-volatile (battery backed up) RAM
but may still fail, losing data
Stable storage:
a mythical form of storage that survives all failures
approximated by maintaining multiple copies on distinct nonvolatile media

Stable-Storage Implementation
Maintain multiple copies of each block on separate disks; copies can be at remote sites to protect against disasters such as fire or flooding.
Failure during data transfer can still result in inconsistent copies. Block transfer can result in:
Successful completion
Partial failure: destination block has incorrect information
Total failure: destination block was never updated
Protecting storage media from failure during data transfer (one solution): execute the output operation as follows (assuming two copies of each block):
1. Write the information onto the first physical block.
2. When the first write successfully completes, write the same information onto the second physical block.
3. The output is completed only after the second write successfully completes.
Copies of a block may differ due to a failure during an output operation. To recover from failure:
1. First find inconsistent blocks:
   1. Expensive solution: compare the two copies of every disk block.
   2. Better solution: record in-progress disk writes on non-volatile storage (non-volatile RAM or a special area of disk). Use this information during recovery to find blocks that may be inconsistent, and only compare copies of these. This approach is used in hardware RAID systems.
2. If either copy of an inconsistent block is detected to have an error (bad checksum), overwrite it by the other copy. If both have no error, but are different, overwrite the second block by the first block.

DATA ACCESS
Physical blocks are those blocks residing on the disk.
System buffer blocks are the blocks residing temporarily in main memory.
Block movements between disk and main memory are initiated through the following two operations:
input(B) transfers the physical block B to main memory.
output(B) transfers the buffer block B to the disk, and replaces the appropriate physical block there.
We assume, for simplicity, that each data item fits in, and is stored inside, a single block.
Each transaction Ti has its private work-area in which local copies of all data items accessed and updated by it are kept.
Ti's local copy of a data item X is denoted by xi.
BX denotes the block containing X.
Transferring data items between system buffer blocks and a transaction's private work-area is done by:
read(X), which assigns the value of data item X to the local variable xi.
write(X), which assigns the value of local variable xi to data item X in the buffer block.
Transactions:
Must perform read(X) before accessing X for the first time (subsequent reads can be from the local copy).
The write(X) can be executed at any time before the transaction commits.
Note that output(BX) need not immediately follow write(X). The system can perform the output operation when it seems fit.

Lock-Based Protocols
A lock is a mechanism to control concurrent access to a data item.
Data items can be locked in two modes:
1. exclusive (X) mode. The data item can be both read as well as written. An X-lock is requested using the lock-X instruction.
2. shared (S) mode. The data item can only be read. An S-lock is requested using the lock-S instruction.
Lock requests are made to the concurrency-control manager. A transaction can proceed only after the request is granted.
Lock-compatibility matrix
(Figure: lock-compatibility matrix - S is compatible with S; X is compatible with neither S nor X.)
1) A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions.
2) Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on the item.
3) If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other transactions have been released. The lock is then granted.
Example of a transaction performing locking:
T2: lock-S(A);
    read(A);
    unlock(A);
    lock-S(B);
    read(B);
    unlock(B);
    display(A+B)
Locking as above is not sufficient to guarantee serializability — if A and B get updated in-between the read of A and B, the displayed sum would be wrong.
A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules.
Consider the partial schedule:
(Figure: a partial schedule of T3 and T4, in which T3 holds an X-lock on B and T4 holds an S-lock on A.)
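In SQL, this kind of explicit locking is usually visible as SELECT ... FOR UPDATE. The sketch below is Oracle-style and reuses the emp table from the earlier examples; the particular empno value is made up for illustration.

SQL> SELECT empno, sal FROM emp WHERE empno = 7369 FOR UPDATE;
SQL> UPDATE emp SET sal = sal + 100 WHERE empno = 7369;
SQL> COMMIT;

The FOR UPDATE clause takes exclusive row locks on the selected rows, so another transaction that tries to lock or update the same rows must wait; the locks are released when the transaction commits or rolls back.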
Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A.
Such a situation is called a deadlock. To handle a deadlock, one of T3 or T4 must be rolled back and its locks released.
2. The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil.
3. Starvation is also possible if the concurrency control manager is badly designed. For example:
a. A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same item.
b. The same transaction is repeatedly rolled back due to deadlocks.
4. The concurrency control manager can be designed to prevent starvation.

THE TWO-PHASE LOCKING PROTOCOL
1. This is a protocol which ensures conflict-serializable schedules.
2. Phase 1: Growing Phase
a. a transaction may obtain locks
b. a transaction may not release locks
3. Phase 2: Shrinking Phase
a. a transaction may release locks
b. a transaction may not obtain locks
4. The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e. the point where a transaction acquired its final lock).
5. Two-phase locking does not ensure freedom from deadlocks.
6. Cascading roll-back is possible under two-phase locking. To avoid this, follow a modified protocol called strict two-phase locking. Here a transaction must hold all its exclusive locks till it commits/aborts.
7. Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit.
8. There can be conflict serializable schedules that cannot be obtained if two-phase locking is used.
9. However, in the absence of extra information (e.g., ordering of access to data), two-phase locking is needed for conflict serializability in the following sense:
Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable.

TIMESTAMP-BASED PROTOCOLS
1. Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has time-stamp TS(Ti), a new transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti) < TS(Tj).
2. The protocol manages concurrent execution such that the time-stamps determine the serializability order.
3. In order to assure such behavior, the protocol maintains for each data item Q two timestamp values:
a. W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q) successfully.
b. R-timestamp(Q) is the largest time-stamp of any transaction that executed read(Q) successfully.
4. The timestamp ordering protocol ensures that any conflicting read and write operations are executed in timestamp order.
5. Suppose a transaction Ti issues a read(Q):
1. If TS(Ti) ≤ W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.
2. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti)).
6. Suppose that transaction Ti issues write(Q):
1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the write operation is rejected, and Ti is rolled back.
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation is rejected, and Ti is rolled back.
3. Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).

Thomas' Write Rule
1. We now present a modification to the timestamp-ordering protocol that allows greater potential concurrency than the timestamp-ordering protocol does. Let us consider schedule 4 of the Figure below, and apply the timestamp-ordering protocol. Since T27 starts before T28, we shall assume that TS(T27) < TS(T28). The read(Q) operation of T27 succeeds, as does the write(Q) operation of T28. When T27 attempts its write(Q) operation, we find that TS(T27) < W-timestamp(Q), since W-timestamp(Q) = TS(T28). Thus, the write(Q) by T27 is rejected and transaction T27 must be rolled back.
2. Although the rollback of T27 is required by the timestamp-ordering protocol, it is unnecessary. Since T28 has already written Q, the value that T27 is attempting to write is one that will never need to be read. Any transaction Ti with TS(Ti) < TS(T28) that attempts a read(Q) will be rolled back, since TS(Ti) < W-timestamp(Q).
3. Any transaction Tj with TS(Tj) > TS(T28) must read the value of Q written by T28, rather than the value that T27 is attempting to write. This observation leads to a modified version of the timestamp-ordering protocol in which obsolete write operations can be ignored under certain circumstances. The protocol rules for read operations remain unchanged. The protocol rules for write operations, however, are slightly different from the timestamp-ordering protocol.

VALIDATION-BASED PROTOCOLS
Phases in Validation-Based Protocols
1) Read phase. During this phase, the system executes transaction Ti. It reads the values of the various data items and stores them in variables local to Ti. It performs all write operations on temporary local variables, without updates of the actual database.
2) Validation phase. The validation test is applied to transaction Ti. This determines whether Ti is allowed to proceed to the write phase without causing a violation of serializability. If a transaction fails the validation test, the system aborts the transaction.
3) Write phase. If the validation test succeeds for transaction Ti, the temporary local variables that hold the results of any write operations performed by Ti are copied to the database. Read-only transactions omit this phase.

MODES IN VALIDATION-BASED PROTOCOLS
1. Start(Ti)
2. Validation(Ti)
3. Finish

MULTIPLE GRANULARITY
Multiple granularity locking (MGL) is a locking method used in database management systems (DBMS) and relational databases.
In MGL, locks are set on objects that contain other objects. MGL exploits the hierarchical nature of the contains relationship. For example, a database may have files, which contain pages, which further contain records. This can be thought of as a tree of objects, where each node contains its children. A lock on a node, such as a shared or exclusive lock, locks the targeted node as well as all of its descendants.
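In SQL, the coarser granules of this hierarchy are what explicit table-level lock statements operate on. A minimal Oracle-style sketch (reusing the emp table from the earlier examples; other systems use different syntax):

SQL> LOCK TABLE emp IN SHARE MODE;
SQL> -- or, to block all concurrent writers to the table and its rows:
SQL> LOCK TABLE emp IN EXCLUSIVE MODE;

Locking the whole table is equivalent to locking a node high in the tree: the lock implicitly covers every row (descendant) in the table, so the system does not have to take and check a separate lock for each individual record.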
File Headers
A file header or file descriptor contains information about a file that is needed by the system programs
that access the file records. The header includes information to determine the disk addresses of the file blocks as
well as to record format descriptions.
To search for a record on disk, one or more blocks are copied into main memory buffers. Programs then
search for the desired record or records within the buffers, using the information in the file header. If the
address of the block that contains the desired record is not known, the search programs must do a linear search
through the file blocks.
Modify - Modifies some field values for the current record and (eventually) updates the file on disk to reflect the modification.

Insert - Inserts a new record in the file by locating the block where the record is to be inserted, transferring that block into a main memory buffer (if it is not already there), writing the record into the buffer, and (eventually) writing the buffer to disk to reflect the insertion.

Write − Users can select to open a file in write mode, which enables them to edit its contents. This can be deletion, insertion, or modification. The file pointer can be located at the time of opening, or can be dynamically changed if the operating system allows it.

Close − This is the most important operation from the operating system's point of view. When a request to close a file is generated, the operating system
o removes all the locks (if in shared mode),
o saves the data (if altered) to the secondary storage media, and
o releases all the buffers and file handlers associated with the file.
The organization of data inside a file plays a major role here. The process to locate the file pointer at a desired record inside a file varies based on whether the records are arranged sequentially or clustered.

FILES OF ORDERED RECORDS (SORTED FILES):
We can physically order the records of a file on disk based on the values of one of their fields - called the ordering field. This leads to an ordered or sequential file. If the ordering field is also a key field of the file - a field guaranteed to have a unique value in each record - then the field is called the ordering key for the file. The Figure shows an ordered file with NAME as the ordering key field.

Ordered records have some advantages over unordered files. They are:
1. Reading the records in order of the ordering key values becomes extremely efficient, because no sorting is required.
2. Finding the next record from the current one in order of the ordering key usually requires no additional block accesses, because the next record is in the same block as the current one.
3. Using a search condition based on the value of an ordering key field results in faster access when the binary search technique is used, which constitutes an improvement over linear searches.

A binary search for disk files can be done on the blocks rather than on the records. Suppose that the file has b blocks numbered 1, 2, ..., b; the records are ordered by ascending value of their ordering key field; and we are searching for a record whose ordering key field value is K. Assuming that the disk addresses of the file blocks are available in the file header, a binary search usually accesses log2(b) blocks, whether the record is found or not - an improvement over linear searches, where, on average, (b/2) blocks are accessed when the record is found and b blocks are accessed when the record is not found.
Insertion and deletion: Inserting and deleting records are expensive operations for an ordered file because the records must remain physically ordered. To insert a record, we must find its correct position in the file, based on its ordering field value, and then make space in the file to insert the record in that position. For a large file this can be very time consuming because, on the average, half the records of the file must be moved to make space for the new record. This means that half the file blocks must be read and rewritten after records are moved among them. For record deletion, the problem is less severe if deletion markers and periodic reorganization are used.
One option for making insertion more efficient is to keep some unused space in each block for new records. However, once this space is used up, the original problem resurfaces.

Modifying: Modifying a field value of a record depends on two factors: (1) the search condition to locate the record and (2) the field to be modified. If the search condition involves the ordering key field, we can locate the record using a binary search; otherwise we must do a linear search. A non-ordering field can be modified by changing the record and rewriting it in the same physical location on disk - assuming fixed-length records. Modifying the ordering field means that the record can change its position in the file, which requires deletion of the old record followed by insertion of the modified record.

Reading: Reading the file records in order of the ordering field is quite efficient if we ignore the records in overflow, since the blocks can be read consecutively using double buffering. To include the records in overflow, we must merge them in their correct positions; in this case, we can first reorganize the file, and then read its blocks sequentially.

Ordered files are rarely used in database applications unless an additional access path, called a primary index, is used; this results in an indexed sequential file. This further improves the random access time on the ordering key field. If the ordering attribute is not a key, then the file is a clustered file.

HASHING:
Another type of primary file organization is based on hashing, which provides very fast access to records on certain search conditions. This organization is usually called a hash file. The search condition must be an equality condition on a single field, called the hash field of the file. In most cases, the hash field is also a key field of the file, in which case it is called the hash key. The idea behind hashing is to provide a function h, called a hash function or randomizing function, that is applied to the hash field value of a record and yields the address of the disk block in which the record is stored.

Internal Hashing:
For internal files, hashing is typically implemented as a hash table through the use of an array of records. If the array index range is from 0 to M - 1, we have M slots whose addresses correspond to the array indexes. We choose a hash function that transforms the hash field value into an integer between 0 and M - 1. One common hash function is the h(K) = K mod M function.
Non-integer hash field values can be transformed into integers before the mod function is applied. For character strings, the numeric (ASCII) codes associated with the characters can be used in the transformation - for example, by multiplying those code values.
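A minimal sketch of h(K) = K mod M in SQL (Oracle-style, using the DUAL dummy table; the key value 3781 and M = 7 buckets are made-up numbers for illustration):

SQL> SELECT MOD(3781, 7) AS bucket FROM dual;

This returns 1, so a record whose hash key is 3781 would be placed in slot (bucket) 1 of the M = 7 available slots.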
Other hashing functions can be used. One technique, called folding, involves applying an arithmetic function such as addition, or a logical function such as exclusive or, to different portions of the hash field value to calculate the hash address. Another technique involves picking some digits of the hash field value - for example, the third, fifth, and eighth digits - to form the hash address. The problem with most hashing functions is that they do not guarantee that distinct values will hash to distinct addresses, because the hash field space - the number of possible values a hash field can take - is usually much larger than the address space - the number of available addresses for records. The hashing function maps the hash field space to the address space.
A collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record. In this situation, we must insert the new record in some other position, since its hash address is occupied. The process of finding another position is called collision resolution. There are numerous methods for collision resolution, including the following:
• Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
• Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location. A linked list of overflow records for each hash address is thus maintained.
• Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing, or applies a third hash function and then uses open addressing if necessary.

Each collision resolution method requires its own algorithms for insertion, retrieval, and deletion of records. The goal of a good hashing function is to distribute the records uniformly over the address space so as to minimize collisions while not leaving many unused locations.

External Hashing for Disk Files:
Hashing for disk files is called external hashing. To suit the characteristics of disk storage, the target address space is made of buckets, each of which holds multiple records. A bucket is either one disk block or a cluster of contiguous blocks. The hashing function maps a key into a relative bucket number, rather than assigning an absolute block address to the bucket. A table maintained in the file header converts the bucket number into the corresponding disk block address.
The collision problem is less severe with buckets, because as many records as will fit in a bucket can hash to the same bucket without causing problems. However, we must make provisions for the case where a bucket is filled to capacity and a new record being inserted hashes to that bucket. We can use a variation of chaining in which a pointer is maintained in each bucket to a linked list of overflow records for the bucket. The pointers in the linked list should be record pointers, which include both a block address and a relative record position within the block.
The hashing scheme described is called static hashing because a fixed number of buckets M is allocated. This can be a serious drawback for dynamic files. Suppose that we allocate M buckets for the address space and let m be the maximum number of records that can fit in one bucket; then, at most (m * M) records will fit in the allocated space. If the number of records turns out to be substantially fewer than (m * M), we are left with a lot of unused space. On the other hand, if the number of records increases to substantially more than (m * M), numerous collisions will result and retrieval will be slowed down because of the long lists of overflow records.
In either case, we may have to change the number of buckets M allocated and then use a new hashing function (based on the new value of M) to redistribute the records. These reorganizations can be quite time consuming for large files.
When using external hashing, searching for a record given a value of some field other than the hash field is as expensive as in the case of an unordered file. Record deletion can be implemented by removing the record from its bucket. If the bucket has an overflow chain, we can move one of the overflow records into the bucket to replace the deleted record. If the record to be deleted is already in overflow, we simply remove it from the linked list.
distinct bucket for each of the 2^d directory locations. Several directory locations with the same first d' bits for
their hash values may contain the same bucket address if all the records that hash to these locations fit in a
single bucket. A local depth d'-stored with each bucket-specifies the number of bits on which the bucket
contents are based.
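The directory lookup described above can be pictured with a short Python sketch; the bucket names, the 32-bit hash width, and the helpers top_bits and find_bucket are our own illustrative assumptions. It only shows how the d higher-order bits of a hash value select a directory entry, and how several entries may share one bucket whose local depth is smaller than the global depth.

    # Illustrative extendible-hashing lookup: the directory has 2**global_depth entries,
    # and several entries may share one bucket (whose local depth d' is <= the global depth d).
    global_depth = 3                                   # d: use the 3 higher-order bits

    bucket_000 = {"local_depth": 3, "records": []}
    bucket_001 = {"local_depth": 3, "records": []}
    bucket_01x = {"local_depth": 2, "records": []}     # shared by entries 010 and 011
    bucket_1xx = {"local_depth": 1, "records": []}     # shared by entries 100..111

    directory = [bucket_000, bucket_001,
                 bucket_01x, bucket_01x,
                 bucket_1xx, bucket_1xx, bucket_1xx, bucket_1xx]   # 2**3 = 8 entries

    def top_bits(hash_value: int, d: int, width: int = 32) -> int:
        """Use the d higher-order bits of the hash value as the directory index."""
        return hash_value >> (width - d)

    def find_bucket(key) -> dict:
        h = hash(key) & 0xFFFFFFFF                     # confine Python's hash to 32 bits
        return directory[top_bits(h, global_depth)]

    print(find_bucket(42)["local_depth"])              # e.g. the bucket's local depth d'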
Bucket splitting: Suppose that a newly inserted record causes overflow in the bucket whose hash values start with 01. The records in that bucket will be distributed between two buckets: the first contains all records whose hash values start with 010, and the second all those whose hash values start with 011. Now the two directory locations for 010 and 011 point to the two new distinct buckets; before the split, they pointed to the same bucket. The local depth d' of the two new buckets is 3, which is one more than the local depth of the old bucket.
If a bucket that overflows and is split used to have a local depth d' equal to the global depth d of the directory, the directory itself must be doubled, and each of the original locations in the directory is also split into two locations, both of which have the same pointer value as did the original location.

The main advantage of extendible hashing that makes it attractive is that the performance of the file does not degrade as the file grows, as opposed to static external hashing, where collisions increase and the corresponding chaining causes additional accesses. In addition, no space is allocated in extendible hashing for future growth, but additional buckets can be allocated dynamically as needed. Another advantage is that splitting causes minor reorganization in most cases, since only the records in one bucket are redistributed to the two new buckets.

A disadvantage is that the directory must be searched before accessing the buckets themselves, resulting in two block accesses instead of one in static hashing. This performance penalty is considered minor.

Dynamic Hashing:

A precursor to extendible hashing was dynamic hashing, in which the addresses of the buckets were either the n higher-order bits or the n-1 higher-order bits, depending on the total number of keys belonging to the respective bucket. The eventual storage of records in buckets for dynamic hashing is somewhat similar to extendible hashing. The major difference is in the organization of the directory: whereas extendible hashing uses the notion of global depth (higher-order d bits) for the flat directory and then combines adjacent collapsible buckets into a bucket of local depth d-1, dynamic hashing maintains a tree-structured directory with two types of nodes:
Internal nodes that have two pointers - the left pointer corresponds to the 0 bit (in the hashed address) and the right pointer corresponds to the 1 bit.
Leaf nodes - these hold a pointer to the actual bucket with records.

An example of dynamic hashing appears as shown in the figure. Four buckets are shown ("000", "001", "110" and "111") with higher-order 3-bit addresses (corresponding to the global depth of 3), and two buckets ("01" and "10") are shown with higher-order 2-bit addresses (corresponding to the local depth of 2). The latter two are the result of collapsing "010" and "011" into "01" and collapsing "100" and "101" into "10". Note that the directory nodes are used implicitly to determine the local and global depths of buckets in dynamic hashing.

The search for a record given the hashed address involves traversing the directory tree, which leads to the bucket holding that record.

Linear Hashing:

The idea behind linear hashing is to allow a hash file to expand and shrink its number of buckets dynamically without needing a directory. Suppose that the file starts with M buckets numbered 0, 1, ..., M - 1 and uses the mod hash function h(K) = K mod M; this hash function is called the initial hash function hi. Overflow because of collisions is still needed and can be handled by maintaining individual overflow chains for each bucket. However, when a collision leads to an overflow record in any file bucket, the first bucket in the file - bucket 0 - is split into two buckets: the original bucket 0 and a new bucket M at the end of the file. The records originally in bucket 0 are distributed between the two buckets based on a different hashing function hi+1(K) = K mod 2M. A key property of the two hash functions hi and hi+1 is that any records that hashed to bucket 0 based on hi will hash to either bucket 0 or bucket M based on hi+1; this is necessary for linear hashing to work.

As further collisions lead to overflow records, additional buckets are split in the linear order 1, 2, 3, .... If enough overflows occur, all the original file buckets 0, 1, ..., M - 1 will have been split, so the file now has 2M instead of M buckets, and all buckets use the hash function hi+1. Hence, the records in overflow are eventually redistributed into regular buckets, using the function hi+1 via a delayed split of their buckets. There is
no directory; only a value n-which is initially set to 0 and is incremented by 1 whenever a split occurs-is needed
to determine which buckets have been split. To retrieve a record with hash key value K, first apply the function
hi to K; if hi(K) < n, then apply the function hi+1 on K because the bucket is already split. Initially, n = 0,
indicating that the function hi applies to all buckets; n grows linearly as buckets are split.
Splitting can be controlled by monitoring the file load factor instead of by splitting whenever an
overflow occurs. In general, the file load factor l can be defined as l = r / (bfr * N), where r is the current number
of file records, bfr is the maximum number of records that can fit in a bucket, and N is the current number of
file buckets. Buckets that have been split can also be recombined if the load of the file falls below a certain
threshold. Blocks are combined linearly, and N is decremented appropriately. The file load can be used to
trigger both splits and combinations; in this manner the file load can be kept within a desired range.
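The bucket-address rule and the load factor just described can be summarised in a small Python sketch; the names bucket_address, load_factor and n_split are ours, while hi and hi+1 are the mod functions from the text.

    # Illustrative linear hashing: resolve a key to a bucket number without a directory.
    M = 4            # initial number of buckets
    n_split = 0      # n: how many of the original buckets 0..M-1 have been split so far

    def h_i(K: int) -> int:        # initial hash function hi
        return K % M

    def h_i_plus_1(K: int) -> int: # hash function hi+1 used after a bucket is split
        return K % (2 * M)

    def bucket_address(K: int) -> int:
        """Apply hi first; if that bucket has already been split, re-hash with hi+1."""
        b = h_i(K)
        if b < n_split:            # buckets 0 .. n_split-1 have been split
            b = h_i_plus_1(K)
        return b

    def load_factor(r: int, bfr: int, N: int) -> float:
        """l = r / (bfr * N): records per available record slot; used to trigger splits."""
        return r / (bfr * N)

    # Example: with r = 1000 records, bfr = 10 records per bucket and N = 80 buckets,
    # the load factor is 1000 / (10 * 80) = 1.25, so splitting further buckets is overdue.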
PRIMARY INDEX:
A primary index is an ordered file whose records are of fixed length with two fields. The first field is of
the same data type as the ordering key field-called the primary key-of the data file, and the second field is a
pointer to a disk block (a block address). There is one index entry (or index record) in the index file for each
block in the data file. Each index entry has the value of the primary key field for the first record in a block and a
pointer to that block as its two field values. We will refer to the two field values of index entry i as < K(i),P(i) >.
To create a primary index on the ordered file introduced earlier, we use the NAME field as the primary key, because that is the ordering key field of the file (assuming that each value of NAME is unique). Each entry in the index has a NAME value and a pointer. The first three index entries are as follows:
<K(1) = (Aaron, Ed), P(1) = address of block 1>
<K(2) = (Adams, John), P(2) = address of block 2>
<K(3) = (Alexander, Ed), P(3) = address of block 3>
The figure below illustrates this primary index.
The total number of entries in the index is the same as the number of disk blocks in the ordered data file.
The first record in each block of the data file is called the anchor record of the block, or simply the block
anchor.
Indexes can also be characterized as dense or sparse.
A dense index has an index entry for every search key value (and hence every record) in the data file.
A sparse (or non-dense) index on the other hand, has index entries for only some of the search values.
A primary index is hence a non-dense (sparse) index, since it includes an entry for each disk block of the data file with the key of its anchor record, rather than an entry for every search value. The index file for a primary index
needs substantially fewer blocks than does the data file, for two reasons.
1. There are fewer index entries than there are records in the data file.
2. Each index entry is typically smaller in size than a data record because it has only two fields; consequently, more index entries than data records can fit in one block. Hence, a binary search on the index requires fewer block accesses than a binary search on the data file.
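A primary index search can be sketched in a few lines of Python, assuming the index is held as a sorted in-memory list of (anchor key, block address) pairs; the names primary_index and find_block are illustrative only. A binary search locates the block whose anchor key is the largest one not exceeding the search key.

    import bisect

    # Primary index: one (anchor key, block address) entry per data-file block,
    # ordered by the anchor key K(i) of each block.
    primary_index = [
        (("Aaron", "Ed"), "block 1"),
        (("Adams", "John"), "block 2"),
        (("Alexander", "Ed"), "block 3"),
    ]

    def find_block(search_key):
        """Return the address of the data block that could contain search_key."""
        keys = [k for k, _ in primary_index]
        # Rightmost index entry whose anchor key is <= search_key.
        i = bisect.bisect_right(keys, search_key) - 1
        if i < 0:
            return None                       # key sorts before the first anchor record
        return primary_index[i][1]

    # Example: a record with NAME = ("Adams", "Robert") must lie in block 2,
    # because ("Adams", "John") <= ("Adams", "Robert") < ("Alexander", "Ed").
    print(find_block(("Adams", "Robert")))    # -> "block 2"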
The binary search for an ordered data file requires log2 b block accesses, where b is the number of blocks in the data file. But if the primary index file contains bi blocks, then locating a record with a given search key value requires a binary search of that index plus an access to the block containing that record: a total of log2 bi + 1 block accesses. (For example, if the data file occupies b = 1024 blocks and the index occupies bi = 8 blocks, a binary search on the data file needs about 10 block accesses, while a search via the index needs only log2 8 + 1 = 4.)

A major problem with a primary index - as with any ordered file - is insertion and deletion of records. With a primary index, the problem is compounded because, if we attempt to insert a record in its correct position in the data file, we have to not only move records to make space for the new record but also change some index entries, since moving records will change the anchor records of some blocks.

A solution to this is to use an unordered overflow file or a linked list of overflow records for each block in the data file.

CLUSTERING INDEX:

If records of a file are physically ordered on a non-key field - which does not have a distinct value for each record - that field is called the clustering field. We can create a different type of index, called a clustering index, to speed up retrieval of records that have the same value for the clustering field. This differs from a primary index, which requires that the ordering field of the data file have a distinct value for each record.

A clustering index is also an ordered file with two fields; the first field is of the same type as the clustering field of the data file, and the second field is a block pointer. There is one entry in the clustering index for each distinct value of the clustering field, containing the value and a pointer to the first block in the data file that has a record with that value for its clustering field.
Record insertion and deletion still cause problems, because the data records are physically ordered. To
alleviate the problem of insertion, it is common to reserve a whole block (or a cluster of contiguous blocks) for
each value of the clustering field; all records with that value are placed in the block (or block cluster). This
makes insertion and deletion relatively straightforward.
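To make the one-entry-per-distinct-value idea concrete, here is a small Python sketch; the clustering-field values, block names and the helper first_block_for are invented for illustration.

    import bisect

    # Clustering index on a non-key field (say DEPTNUMBER): one entry per distinct value,
    # pointing to the first block that contains records with that value.
    clustering_index = [
        (1, "block 1"),   # records with DEPTNUMBER = 1 start in block 1
        (2, "block 2"),
        (3, "block 4"),   # DEPTNUMBER = 2 spilled over blocks 2-3, so 3 starts at block 4
        (4, "block 5"),
    ]

    def first_block_for(value):
        """Return the first data block holding records whose clustering field equals value."""
        keys = [v for v, _ in clustering_index]
        i = bisect.bisect_left(keys, value)
        if i < len(keys) and keys[i] == value:
            return clustering_index[i][1]
        return None                              # no record has this clustering-field value

    print(first_block_for(3))                    # -> "block 4"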
A clustering index is another example of a non-dense index, because it has an entry for every distinct value of the indexing field (which is a non-key by definition and hence has duplicate values) rather than for every record in the file.

SECONDARY INDEXES:
A secondary index provides a secondary means of accessing a file for which some primary access
already exists. The secondary index may be on a field which is a candidate key and has a unique value in every
record, or on a non-key field with duplicate values. The index is an ordered file with two fields. The first field is of the same data type as some nonordering field of the data file that is an indexing field. The second field is either a
block pointer or a record pointer. There can be many secondary indexes (and hence, indexing fields) for the
same file.
We first consider a secondary index access structure on a key field that has a distinct value for every
record. Such a field is sometimes called a secondary key. In this case there is one index entry for each record in
the data file, which contains the value of the secondary key for the record and a pointer either to the block in which the record is stored or to the record itself. Hence, such an index is dense.

We again refer to the two field values of index entry i as <K(i), P(i)>. The entries are ordered by value of K(i), so we can perform a binary search. Because the records of the data file are not physically ordered by values of the secondary key field, we cannot use block anchors; P(i) in the index entries are block pointers, not record pointers. Once the appropriate block is transferred to main memory, a search for the desired record within the block can be carried out.

A secondary index usually needs more storage space and longer search time than does a primary index, because of its larger number of entries. However, the improvement in search time for an arbitrary record is much greater for a secondary index than for a primary index.

A secondary index can also be created on a non-key field, in which case numerous records in the data file can have the same value K(i) for the indexing field. There are several options for implementing such an index:
1. Option 1 is to include several index entries with the same K(i) value - one for each record. This would be a dense index.
2. Option 2 is to have variable-length records for the index entries, with a repeating field for the pointer: <K(i), <P(i,1), P(i,2), ..., P(i,k)>>.
3. Option 3, which is more commonly used, is to keep the index entries themselves at a fixed length and have a single entry for each index field value, but to create an extra level of indirection to handle the multiple pointers (as sketched below). In this non-dense scheme, the pointer P(i) in index entry <K(i), P(i)> points to a block of record pointers; each record pointer in that block points to one of the data file records with value K(i) for the indexing field. If some value K(i) occurs in too many records, so that their record pointers cannot fit in a single disk block, a cluster or linked list of blocks is used.
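The extra level of indirection in option 3 can be pictured with the following Python sketch; secondary_index, pointer_blocks and records_with_value are hypothetical names, and ordinary dictionaries and lists stand in for disk blocks.

    # Option 3: the index entry <K(i), P(i)> points to a block of record pointers,
    # and each record pointer identifies one data-file record with indexing value K(i).
    pointer_blocks = {
        "pb1": ["rec 7", "rec 12", "rec 31"],    # all records whose indexing field = "Research"
        "pb2": ["rec 3"],                        # all records whose indexing field = "Sales"
    }

    secondary_index = {                          # fixed-length entries: one per field value
        "Research": "pb1",
        "Sales": "pb2",
    }

    def records_with_value(value):
        """Follow K(i) -> block of record pointers -> data records (two levels of access)."""
        block_id = secondary_index.get(value)
        if block_id is None:
            return []
        return pointer_blocks[block_id]

    print(records_with_value("Research"))        # -> ['rec 7', 'rec 12', 'rec 31']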
MULTILEVEL INDEXES:

A multilevel index considers the index file, which we will now refer to as the first (or base) level of a multilevel index, as an ordered file with a distinct value for each K(i). Hence we can create a primary index for the first level; this index to the first level is called the second level of the multilevel index. Because the second level is a primary index, we can use block anchors so that the second level has one entry for each block of the first level. The blocking factor bfri for the second level - and for all subsequent levels - is the same as that for the first-level index, because all index entries are the same size; each has one field value and one block address. If the first level has r1 entries, and the blocking factor - which is also the fan-out - for the index is bfri = fo, then the first level needs ceil(r1/fo) blocks, which is therefore the number of entries r2 needed at the second level of the index.

We can repeat this process for the second level. The third level, which is a primary index for the second level, has an entry for each second-level block, so the number of third-level entries is r3 = ceil(r2/fo). Notice that we require a second level only if the first level needs more than one block of disk storage, and, similarly, we require a third level only if the second level needs more than one block. We can repeat the preceding process until all the entries of some index level t fit in a single block. This block at the t-th level is called the top index level. Each level reduces the number of entries at the previous level by a factor of fo - the index fan-out - so we can use the formula 1 <= r1/((fo)^t) to calculate t. Hence, a multilevel index with r1 first-level entries will have approximately t levels, where t = ceil(log fo (r1)). (For example, with r1 = 100,000 first-level entries and a fan-out of fo = 100, t = ceil(log100(100000)) = 3 levels.)

DYNAMIC MULTILEVEL INDEXES USING B-TREES AND B+-TREES

Search Trees and B-Trees:

A search tree is a special type of tree that is used to guide the search for a record, given the value of one of the record's fields. A search tree is slightly different from a multilevel index. A search tree of order p is a tree such that each node contains at most p - 1 search values and p pointers in the order <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>, where q <= p; each Pi is a pointer to a child node (or a null pointer); and each Ki is a search value from some ordered set of values. All search values are assumed to be unique. Two constraints must hold at all times on the search tree:
1. Within each node, K1 < K2 < ... < Kq-1.
2. For all values X in the sub tree pointed at by Pi, we have Ki-1 < X < Ki for 1 < i < q; X < Ki for i = 1; and Ki-1 < X for i = q.
We can use a search tree as a mechanism to search for records stored in a disk file. The values in the tree
can be the values of one of the fields of the file, called the search field (which is the same as the index field if a
multilevel index guides the search). Each key value in the tree is associated with a pointer to the record in the
data file having that value. Alternatively, the pointer could be to the disk block containing that record. The
search tree itself can be stored on disk by assigning each tree node to a disk block. When a new record is
inserted, we must update the search tree by inserting an entry in the tree containing the search field value of the
new record and a pointer to the new record.
Algorithms are necessary for inserting and deleting search values into and from the search tree while
maintaining the preceding two constraints. In general, these algorithms do not guarantee that a search tree is
balanced, meaning that all of its leaf nodes are at the same level.
Keeping a search tree balanced is important because it guarantees that no nodes will be at very high levels and hence require many block accesses during a tree search. Keeping the tree balanced yields a uniform
search speed regardless of the value of the search key. Another problem with search trees is that record deletion
may leave some nodes in the tree nearly empty, thus wasting storage space and increasing the number of levels.
The B-tree addresses both of these problems by specifying additional constraints on the search tree.
B-Trees:

The B-tree has additional constraints that ensure that the tree is always balanced and that the space wasted by deletion, if any, never becomes excessive.

More formally, a B-tree of order p, when used as an access structure on a key field to search for records in a data file, can be defined as follows:
1. Each internal node in the B-tree is of the form <P1, <K1,Pr1>, P2, <K2,Pr2>, ..., <Kq-1,Prq-1>, Pq> where q <= p. Each Pi is a tree pointer - a pointer to another node in the B-tree. Each Prj is a data pointer - a pointer to the record whose search key field value is equal to Kj (or to the data file block containing that record).
2. Within each node, K1 < K2 < ... < Kq-1.
3. For all search key field values X in the sub tree pointed at by Pi (the ith sub tree, see figure), we have: Ki-1 < X < Ki for 1 < i < q; X < Ki for i = 1; and Ki-1 < X for i = q.
4. Each node has at most p tree pointers.
5. Each node, except the root and leaf nodes, has at least Ceil(p/2) tree pointers. The root node has at least two tree pointers unless it is the only node in the tree.
6. A node with q tree pointers, q <= p, has q - 1 search key field values (and hence has q - 1 data pointers).
7. All leaf nodes are at the same level. Leaf nodes have the same structure as internal nodes except that all of their tree pointers Pi are null.

B+Trees:

Most implementations of a dynamic multilevel index use a variation of the B-tree data structure called a B+-tree. In a B-tree, every value of the search field appears once at some level in the tree, along with a data pointer. In a B+-tree, data pointers are stored only at the leaf nodes of the tree; hence, the structure of leaf nodes differs from the structure of internal nodes. The leaf nodes have an entry for every value of the search field, along with a data pointer to the record (or to the block that contains this record) if the search field is a key field. For a non-key search field, the pointer points to a block containing pointers to the data file records, creating an extra level of indirection.

The leaf nodes of the B+-tree are usually linked together to provide ordered access on the search field to the records. These leaf nodes are similar to the first (base) level of an index. Internal nodes of the B+-tree correspond to the other levels of a multilevel index. Some search field values from the leaf nodes are repeated in the internal nodes of the B+-tree to guide the search.

The structure of the internal nodes of a B+-tree of order p is as follows:
1. Each internal node is of the form <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq> where q <= p and each Pi is a tree pointer.
2. Within each internal node, K1 < K2 < ... < Kq-1.
3. For all search field values X in the sub tree pointed at by Pi, we have Ki-1 < X <= Ki for 1 < i < q; X <= Ki for i = 1; and Ki-1 < X for i = q.
4. Each internal node has at most p tree pointers.
5. Each internal node, except the root, has at least Ceil(p/2) tree pointers. The root node has at least two tree pointers if it is an internal node.
6. An internal node with q pointers, q <= p, has q - 1 search field values.

The structure of the leaf nodes of a B+-tree of order p is as follows:
1. Each leaf node is of the form <<K1,Pr1>, <K2,Pr2>, ..., <Kq-1,Prq-1>, Pnext> where q <= p, each Pri is a data pointer, and Pnext points to the next leaf node of the B+-tree.
2. Within each leaf node, K1 < K2 < ... < Kq-1, q <= p.
3. Each Pri is a data pointer that points to the record whose search field value is Ki, or to a file block containing the record (or to a block of record pointers that point to records whose search field value is Ki, if the search field is not a key).
4. Each leaf node has at least Ceil(p/2) values.
5. All leaf nodes are at the same level.

The pointers in internal nodes are tree pointers to blocks that are tree nodes, whereas the pointers in leaf nodes are data pointers to the data file records or blocks - except for the Pnext pointer, which is a tree pointer to the next leaf node. By starting at the leftmost leaf node, it is possible to traverse leaf nodes as a linked list, using the Pnext pointers. This provides ordered access to the data records on the indexing field.
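A rough Python sketch of the node layout and search path just described is given below; the class and function names are ours, and in-memory objects stand in for disk blocks. Internal nodes hold only keys and tree pointers, while leaves hold keys, data pointers and a next-leaf pointer.

    import bisect

    class InternalNode:
        """Internal B+-tree node: keys K1..Kq-1 and tree pointers P1..Pq (no data pointers)."""
        def __init__(self, keys, children):
            self.keys = keys              # sorted search field values
            self.children = children      # len(children) == len(keys) + 1

    class LeafNode:
        """Leaf B+-tree node: (key, data pointer) pairs plus a pointer to the next leaf."""
        def __init__(self, keys, data_pointers, next_leaf=None):
            self.keys = keys
            self.data_pointers = data_pointers
            self.next_leaf = next_leaf

    def search(node, key):
        """Descend from the root to a leaf, then look the key up inside that leaf."""
        while isinstance(node, InternalNode):
            # Follow Pi such that the key falls in the subtree where X <= Ki (per the text).
            i = bisect.bisect_left(node.keys, key)
            node = node.children[i]
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.data_pointers[i]
        return None

    # Tiny example tree: the root separates keys <= 5 from keys > 5.
    leaf1 = LeafNode([3, 5], ["rec A", "rec B"])
    leaf2 = LeafNode([8, 9], ["rec C", "rec D"])
    leaf1.next_leaf = leaf2               # leaves linked for ordered access
    root = InternalNode([5], [leaf1, leaf2])

    print(search(root, 8))                # -> "rec C"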
Comparison between B-trees and B+ trees:
Because entries in the internal nodes of a B+-tree include search values and tree pointers without any
data pointers, more entries can be packed into an internal node of a B+-tree than for a similar B-tree. Thus, for
the same block (node) size, the order p will be larger for the B+-tree than for the B-tree. This can lead to fewer
B+-tree levels, improving search time. Because the structures for internal and for leaf nodes of a B+-tree are
different, the order p can be different. We will use p to denote the order for internal nodes and Pleaf to denote
the order for leaf nodes, which we define as being the maximum number of data pointers in a leaf node.
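To see why the order is larger for the B+-tree, a back-of-the-envelope calculation can be run; the block, key and pointer sizes below are assumed values chosen for illustration, not figures from the notes.

    # Assumed sizes (bytes): disk block, search key, tree pointer, data pointer.
    B, key_size, tree_ptr, data_ptr = 512, 9, 6, 7

    # B-tree internal node of order p holds p tree pointers plus (p - 1) keys and
    # (p - 1) data pointers:  p*tree_ptr + (p - 1)*(key_size + data_ptr) <= B.
    p_btree = (B + key_size + data_ptr) // (tree_ptr + key_size + data_ptr)

    # B+-tree internal node of order p holds p tree pointers plus (p - 1) keys only:
    # p*tree_ptr + (p - 1)*key_size <= B.
    p_bplus = (B + key_size) // (tree_ptr + key_size)

    print(p_btree, p_bplus)   # -> 24 34: more entries fit per B+-tree internal node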
Grid Files:
Another alternative is to organize the EMPLOYEE file as a grid file. We can construct a grid array with
one linear scale (or dimension) for each of the search attributes.
The scales are made in such a way as to achieve a uniform distribution of that attribute. Thus, in our example, the linear scale for DNO has DNO = 1, 2 combined as one value 0 on the scale, while DNO = 5 corresponds to the value 2 on that scale. Similarly, AGE is divided into its scale of 0 to 5 by grouping ages so as to distribute the employees uniformly by age. The grid array shown for this file has a total of 36 cells. Each cell points to some bucket address where the records corresponding to that cell are stored.
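A grid-file lookup can be sketched as two scale translations followed by a cell-to-bucket mapping; the scale groupings, bucket names and the grid_cell_bucket structure below are invented for illustration.

    # Linear scales translate attribute values into cell coordinates (assumed groupings).
    def dno_scale(dno: int) -> int:
        groups = [(1, 2), (3, 4), (5,), (6, 7), (8,), (9, 10)]      # 6 scale values: 0..5
        return next(i for i, g in enumerate(groups) if dno in g)

    def age_scale(age: int) -> int:
        bounds = [25, 30, 35, 40, 50, 200]                           # 6 age ranges: 0..5
        return next(i for i, upper in enumerate(bounds) if age < upper)

    # 6 x 6 grid array = 36 cells; each cell holds the address of a bucket.
    grid_cell_bucket = {(d, a): f"bucket_{d}_{a}" for d in range(6) for a in range(6)}

    def lookup(dno: int, age: int) -> str:
        """Map (DNO, AGE) through the linear scales to a grid cell, then to its bucket."""
        cell = (dno_scale(dno), age_scale(age))
        return grid_cell_bucket[cell]

    print(lookup(5, 32))      # DNO=5 -> scale 2, AGE=32 -> scale 2 -> "bucket_2_2"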