20it007-Database Management Systems.pptx
SYLLABUS
Introduction to DBMS
Overview of DBMS- Data Models- Database Languages- Database
Administrator- Database Users- Three Schema architecture of DBMS: Basic
concepts- Mapping Constraints- Keys. Relational Algebra – Relational
Calculus: Domain relational Calculus –Tuple Relational Calculus.
Database Design and SQL
Entity-Relationship Diagram-Design Issues- Weak Entity Sets- and
Extended E-R features - Structure of relational Databases- Views-
Modifications of the Database- Concept of DDL- DML- TCL - DCL: Basic
Structure- Set Operations- Aggregate Functions- Null Values- Domain
Constraints- Referential Integrity Constraints- Assertions- Views- Nested Sub
Queries- Stored Procedures. Functional Dependency- Different Anomalies in
designing a Database.- Normalization using Functional Dependencies-
Decomposition- Boyce-Codd Normal Form- 3NF- Normalization using
Multi-Valued Dependencies- 4NF- 5NF.
• Query Processing and Transactions
Database Query Processing - Transactions- Concurrency Control – Recovery System-
State Serializability- Lock Based Protocols- Two Phase Locking.
• Storage Management and Indexing
Physical Storage Systems: Storage Interfaces – Magnetic Disks – Flash Memory
-RAID – Disk block access. Data Storage Structures: Database Storage Architecture - File
Organization- Organization of Records in Files – Data Dictionary Storage - Indexing.
• Advances in Database
Database System Architectures – Parallel and Distributed Transaction Processing –
Complex Data types: Semi-structured Data – Spatial Data – Textual Data – Big Data – Data
Analytics – Blockchain Databases.
References
• Abraham Silberschatz, Henry F. Korth, S. Sudarshan, “Database System
Concepts”, Tata McGraw Hill, 7th Edition, 2019.
• Ramez Elmasri, Shamkant B. Navathe, “Fundamentals of Database Systems”,
Pearson Education, 7th Edition, 2015.
• C.J. Date, A. Kannan, S. Swamynathan, “An Introduction to Database Systems”,
Pearson Education, 8th Edition, 2006.
• Raghu Ramakrishnan, “Database Management Systems”, McGraw-Hill College
Publications, 4th Edition, 2015.
• G.K. Gupta, “Database Management Systems”, Tata McGraw Hill, 1st Edition,
2018.
• Atul Kahate, “Introduction to Database Management Systems”, Pearson
Education, 1st Edition, 2004.
• Ivan Bayross, “SQL, PL/SQL the Programming Language of Oracle”, BPB
Publications, 2010.
20IT009-DATABASE MANAGEMENT SYSTEMS LABORATORY
• Creation of a database and write SQL queries to retrieve information from the database.
• Perform Insertion, Deletion, Modifying, Altering, Updating and Viewing records based on
conditions.
• Creation of a database using views, synonyms, sequences and indexes
• Creation of a database using Commit, Rollback and Save point.
• Creation of a database to set various constraints.
• Creating relationship between the databases.
• Write a PL/SQL block that accepts input from the user and handles exceptions.
• Creation of Procedures.
• Creation of functions.
• Mini project (Application Development using Oracle/ MySQL)
a) Inventory Control System.
b) Material Requirement Processing.
c) Hospital Management System.
d) Railway Reservation System.
e) Personal Information System.
f) Web Based User Identification System.
g) Timetable Management System.
h) Hotel Management System.
• Oracle is a company that produces database management systems (DBMS) that help
organizations manage and organize large amounts of data.
• Oracle Database is one of the most widely used relational database management systems (RDBMS) today.
• An RDBMS is used by businesses to store and retrieve information.
• A relational database stores information in tabular form, with rows and columns representing
different data attributes and the various relationships between the data values.
• SQL (Structured Query Language) is the standard database language used to access and
manipulate data in databases. It was developed by IBM computer scientists in
the 1970s. By executing queries, SQL can create, update, delete, and retrieve data in
databases such as MySQL, Oracle, and PostgreSQL. Overall, SQL is the language through
which users and applications communicate with databases.
• Data on its own is raw, unorganized information; to organize it, we build a database. A database
is an organized collection of structured data, usually controlled by a database management
system (DBMS). Databases help us easily store, access, and manipulate data held on a
computer.
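The four kinds of SQL statement mentioned above (create, update, delete, retrieve) can be tried out with a small sketch. It uses SQLite through Python's built-in `sqlite3` module purely for convenience, which is an assumption of this example; the course itself targets Oracle/MySQL. The `student` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; table name, columns and rows are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: define a table and insert a few rows.
cur.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
cur.executemany(
    "INSERT INTO student (roll_no, name, age) VALUES (?, ?, ?)",
    [(1, "Asha", 19), (2, "Ravi", 17), (3, "Meena", 21)],
)

# Update: change one row.
cur.execute("UPDATE student SET age = 18 WHERE roll_no = 2")

# Delete: remove one row.
cur.execute("DELETE FROM student WHERE roll_no = 3")

# Retrieve: query the remaining rows.
rows = cur.execute("SELECT roll_no, name, age FROM student ORDER BY roll_no").fetchall()
print(rows)
conn.close()
```

The same statements should work with little or no change in most SQL databases, although data type names differ between products.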
Creation of a database and write SQL queries to retrieve information from the
database.
ALTER Command :
• Users can change the structure of an existing table with the SQL ALTER
TABLE command.
• ALTER TABLE can also be used to rename a table.
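As a minimal sketch of ALTER TABLE, the example below adds a column and then renames the table. It again assumes SQLite via Python's `sqlite3` module; Oracle and MySQL use similar but not identical syntax (for example, Oracle writes `ADD (age NUMBER)`). The names `student` and `learner` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

# Change the structure of the existing table: add a column.
conn.execute("ALTER TABLE student ADD COLUMN age INTEGER")

# Rename the table.
conn.execute("ALTER TABLE student RENAME TO learner")

# Inspect the result: column names of the renamed table and the list of tables.
columns = [row[1] for row in conn.execute("PRAGMA table_info(learner)")]
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(columns, tables)
```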
• All attributes have values. For example, a student entity may have name, class, and age as
attributes.
• There exists a domain or range of values that can be assigned to attributes. For example, a
student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be
negative, etc.
Types of Attributes
• Simple attribute − Simple attributes are atomic values, which cannot be divided further.
For example, a student's phone number is an atomic value of 10 digits.
• Multi-value attribute − Multi-value attributes may contain more than one value.
For example, a person can have more than one phone number, email_address, hobby, etc.
• Composite attribute − Composite attributes are made of more than one simple attribute.
For example, a student's complete name may have first_name, middle_name and
last_name.
• Derived attribute − Derived attributes are attributes that do not exist in the physical
database; their values are derived from other attributes present in the database. For
example, age can be derived from date_of_birth.
• Binary Relationship –
When two entity sets participate in a relationship, it is called a
binary relationship. For example, Student is enrolled in Course.
n-ary Relationship
• When n entity sets participate in a relationship, it
is called an n-ary relationship.
Degree of Relationship
• The number of participating entities in a relationship defines the degree
of the relationship.
• Binary = degree 2
• Ternary = degree 3
• n-ary = degree n
Ternary and Quaternary relationship
Mapping Constraints/Cardinalities
• An E-R enterprise schema may define certain constraints to which the contents of
database system must conform.
• Two types of constraints are
1.Mapping cardinalities
❖ One-one
❖ One-many
❖ Many-one
❖ Many-many
2.Participation constraints
❖ Total participation
❖ Partial participation
Mapping Cardinalities
• Cardinality defines the number of entities in one entity set that can be associated with
entities of the other entity set via a relationship set.
1. One to one – When each entity in each entity set can take part only once in the
relationship, the cardinality is one to one.
Eg: A male can marry one female and a female can marry one male. So the relationship
will be one to one.
2. One-to-many − One entity from entity set A can be associated with more than
one entity of entity set B; however, an entity from entity set B can be associated
with at most one entity of entity set A.
3. Many to one – When entities in one entity set can take part only once in the relationship set
and entities in the other entity set can take part more than once in the relationship set, the cardinality is
many to one.
Eg: A student can take only one course but one course can be taken by many students. So the
cardinality will be n to 1. It means that for one course there can be n students but for one student,
there will be only one course.
4. Many to many – When entities in all entity sets can take part more than once in the relationship, the cardinality is
many to many.
Eg: A student can take more than one course and one course can be taken by many students. So the relationship
will be many to many.
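The four mapping cardinalities can be illustrated by inspecting the pairs in a relationship set. The function below is a hypothetical helper written for this note, not a standard routine. It only classifies the pairs that are actually present; the cardinality in an E-R schema is a design constraint on all future data, so this is a sketch of the idea rather than a way to derive the constraint.

```python
from collections import Counter

def mapping_cardinality(pairs):
    """Classify a relationship set given as (a, b) pairs between entity sets A and B."""
    pairs = set(pairs)
    a_counts = Counter(a for a, _ in pairs)  # number of B entities linked to each A entity
    b_counts = Counter(b for _, b in pairs)  # number of A entities linked to each B entity
    a_many = any(n > 1 for n in a_counts.values())
    b_many = any(n > 1 for n in b_counts.values())
    if a_many and b_many:
        return "many-to-many"
    if a_many:
        return "one-to-many"   # one A entity relates to several B entities
    if b_many:
        return "many-to-one"   # several A entities relate to one B entity
    return "one-to-one"

# (student, course) pairs: each student takes one course, a course has many students.
print(mapping_cardinality([("s1", "c1"), ("s2", "c1")]))
```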
Participation Constraint
Participation Constraint is applied on the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship.
• If each student must enroll in a course, the participation of student will be total. Total participation is shown
by a double line in the ER diagram.
2. Partial Participation – The entity in the entity set may or may NOT participate in the relationship.
• If some courses have no students enrolled in them, the participation of course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total participation and
Course Entity set having partial participation.
keys
• A DBMS key is an attribute or set of attributes that helps you identify a row (tuple)
in a relation (table).
• Keys also let you establish the relationship between two tables.
• Keys help you uniquely identify a row in a table by a combination of one or more
columns in that table.
• Eg: employee id is a primary key because it uniquely identifies an employee record.
Various Keys in Database Management System
• Super Key
• Primary Key
• Candidate Key
• Foreign Key
What is the Super key?
• A super key is a set of one or more attributes that uniquely identifies rows in a table.
• A super key may have additional attributes that are not needed for unique
identification.
What is a Primary Key?
• A column or group of columns in a table that uniquely identifies every row in that table is
called a primary key. The same value can't appear more than once in the table.
Rules for defining Primary key:
• Two rows can't have the same primary key value.
• Every row must have a primary key value.
• The primary key field cannot be null.
• The value in a primary key column should not be modified or updated while any foreign key refers to that
primary key.
What is a Candidate Key?
• A super key with no redundant attribute (a minimal super key) is called a candidate key.
• The primary key should be selected from the candidate keys. Every table must have at least one
candidate key.
Properties of Candidate key:
• It must contain unique values
• Candidate key may have multiple attributes
• Must not contain null values
• It should contain minimum fields to ensure uniqueness
• Uniquely identify each record in a table
• Example: In the given table Stud ID, Roll No, and email are candidate keys which help us to uniquely
identify the student record in the table.
Foreign key /Reference key
• A foreign key is a column (or set of columns) added to a table to create a relationship with another table.
• Foreign keys help us maintain data integrity and allow navigation between two different instances
of an entity.
Difference Between Primary key & Foreign key
Primary Key
• Helps you to uniquely identify a record in the table.
• A primary key never accepts null values.
• A table can have only one primary key.
Foreign Key
• It is a field in the table that is the primary key of another table.
• A foreign key may accept multiple null values.
• A table can have multiple foreign keys.
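The differences listed above can be observed directly in a database. The sketch below assumes SQLite via Python's `sqlite3` module (SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`); the `department` and `employee` tables are hypothetical. It shows that a repeated primary key value and a foreign key value with no matching row are rejected, while a NULL foreign key is accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " dept_id INTEGER REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")    # valid reference
conn.execute("INSERT INTO employee VALUES (2, 'Ravi', NULL)")  # a foreign key may be NULL

def rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_pk = rejected("INSERT INTO employee VALUES (1, 'Meena', 10)")  # repeated primary key
dangling_fk = rejected("INSERT INTO employee VALUES (3, 'Kiran', 99)")   # no department 99
print(duplicate_pk, dangling_fk)
```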
Strong and Weak Entity sets
Strong Entity:
• A strong entity is not dependent on any other entity in the schema.
• A strong entity always has a primary key.
• A strong entity is represented by a single rectangle.
• The relationship between two strong entities is represented by a single diamond.
• A collection of strong entities makes up a strong entity set.
Weak Entity:
• A weak entity is an entity that cannot be uniquely identified by its attributes alone.
• A weak entity depends on a strong entity for its existence.
• A weak entity does not have a primary key; it has a partial key (discriminator).
• A weak entity is represented by a double rectangle.
• The relationship between one strong and one weak entity is represented by a double diamond.
Cardinality constraints-one to one relationship
One to many relationship
Many to one relationship
Many to many relationship
Cardinality limits on relationship sets
E-R diagram for banking system
Extended E-R model(EER model)
1. Generalization
❖ Generalization is a bottom-up approach in which two or more lower level entities combine to form a
higher level entity.
❖ In generalization, the higher level entity can also combine with other lower level entities to
form a still higher level entity.
❖ For example, the Saving and Current account entities can be generalised into an entity
named Account, which covers both.
2. Specialization
❖ Specialization is the opposite of Generalization.
❖ It is a top-down approach in which one higher level entity is broken down into two or more
lower level entities.
❖ In specialization, it is possible for a higher level entity to have no lower-level entity
sets.
3. Aggregation
• Aggregation is a process in which a relationship between two entities is treated as a single entity.
• In the diagram above, the relationship between Center and Course together acts as an
Entity, which is in a relationship with another entity, Visitor.
Relational Model
• A Relational Database Management System (RDBMS) is a database management
system based on the relational model introduced by E.F. Codd.
• In the relational model, data is stored in relations (tables) and is represented in the form of
tuples (rows).
• An RDBMS is used to manage a relational database.
• A relational database is an organized collection of tables related to each other,
from which data can be accessed easily.
RDBMS Concepts
What is Table ?
• In the relational database model, a table is a collection of data elements organised in terms of rows
and columns.
• A table is also considered a convenient representation of a relation.
• An SQL table can hold duplicate rows unless a key constraint forbids them; a relation in the formal model cannot.
• A table is the simplest form of data storage.
Example: Employee table
What is a Tuple?
• A single entry in a table is called a Tuple or Record or Row.
• A tuple in a table represents a set of related data.
• For example, the above Employee table has 4 tuples/records/rows.
What is an Attribute?
• A table consists of several records(row), each record can be broken down into several
smaller parts of data known as Attributes or columns or fields.
• The above Employee table consist of four attributes,
✔ ID
✔ Name
✔ Age
✔ Salary.
Attribute Domain
• When an attribute is defined in a relation (table), it is defined to hold only a certain type
of values, which is known as the Attribute Domain.
• The attribute Name will hold the name of the employee for every tuple.
• If we save the employee's address there, it will be a violation of the relational database
model.
What is a Relation Schema?
A relation schema describes the structure of the relation: the name of the relation (name of the table), its
attributes, and their names and types.
What is a Relation Instance?
A finite set of tuples in the relational database system represents a relation instance.
Relation instances do not have duplicate tuples.
For example,
∏rollno,name (Student)
It will show only the rollno and name columns for all the rows in the Student table.
3. Union Operation (∪)
• This operation is used to fetch data from two relations(tables).
• The relations(tables) specified should have same number of attributes(columns) and same
attribute domain. Also the duplicate tuples are automatically eliminated from the result.
Syntax: r ∪ s
• where r and s are relations.
4. Set Difference (-)
• This operation is used to find data present in one(first) relation and not present in the
second relation. This operation is also applicable on two relations, just like Union
operation.
Syntax: r - s
• where r and s are relations.
5.Cartesian Product (X)
• This is used to combine data from two different relations(tables) into one and fetch data
from the combined relation.
Syntax: A X B
6.Rename Operation (ρ)-(rho)
• This operation is used to rename either the relation or the attributes.
Syntax: ρs(R)
ρ(RelationNew, RelationOld)
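The basic operations above map naturally onto set operations. The following is a minimal sketch that models a relation as a Python set of tuples; the relations `r` and `s` and the helper `project` are illustrative assumptions, not part of any library.

```python
# Relations are modelled as sets of tuples, so duplicate tuples disappear automatically.
r = {(1, "Asha"), (2, "Ravi")}
s = {(2, "Ravi"), (3, "Meena")}

union = r | s                              # r ∪ s
difference = r - s                         # r - s: tuples in r but not in s
product = {a + b for a in r for b in s}    # r X s: every pairing of an r tuple with an s tuple

def project(relation, *positions):
    """Projection (∏): keep only the columns at the given positions."""
    return {tuple(row[i] for i in positions) for row in relation}

names = project(union, 1)
print(union, difference, len(product), names)
```

Union and set difference only make sense here because `r` and `s` have the same number of columns with the same domains, which is the compatibility condition stated for the Union operation.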
Additional Operations in Relational Algebra
• Set Intersection
• Natural join
• Division operation
• Assignment operation
Set Intersection
• The intersection operator gives the common data values between the two
data sets/tables/relations that are intersected.
Natural join operation
• Natural join is a binary operation that is used to combine certain selections and a Cartesian
product into one operation.
• It is denoted by the join symbol ⋈ .
Division operation
• The division is a binary operation that is written as R ÷ S.
• Suited to queries that include the phrase ‘for all’.
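A "for all" query of the kind division answers can be sketched as follows. The example assumes R has two attributes (x, y) and S has one attribute (y); the `enrolled` and `required` data are hypothetical.

```python
def divide(r, s):
    """R ÷ S for R(x, y) and S(y): the x values paired in R with every y in S."""
    candidates = {x for x, _ in r}
    return {x for x in candidates if all((x, y) in r for y in s)}

# Which students are enrolled in ALL required courses?
enrolled = {("s1", "DBMS"), ("s1", "OS"), ("s2", "DBMS"), ("s3", "OS"), ("s3", "DBMS")}
required = {"DBMS", "OS"}
print(divide(enrolled, required))
```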
Assignment Operation
Extended Relational Algebra Operations
• In Relational Algebra, Extended Operators are those operators that are
derived from the basic operators.
■ Generalized Projection
■ Outer Join
■ Aggregate Functions
■ An aggregate function takes a collection of values and returns a
single value as a result.
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
■ Aggregate operation in relational algebra:
G1, G2, …, Gn g F1(A1), F2(A2), …, Fn(An) (E)
■ Example: relation r
A B C
α α 7
α β 7
β β 3
β β 10
g sum(C) (r) gives
sum-C
27
■ Relation account grouped by branch-name:
branch-name g sum(balance) (account)
gives
branch-name balance
Perryridge 1300
Brighton 1500
Redwood 700
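The two aggregate examples can be reproduced with a short sketch. The relation r is taken from the example values above. The individual `account` rows are not shown in these notes, so the rows below are hypothetical values chosen only so that the per-branch totals match the result table; `group_sum` is an illustrative helper.

```python
from collections import defaultdict

def group_sum(rows, group_index, value_index):
    """Group rows on one column and sum another column within each group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_index]] += row[value_index]
    return dict(totals)

# Relation r with columns (A, B, C); no grouping attributes, so one total is returned.
r = [("α", "α", 7), ("α", "β", 7), ("β", "β", 3), ("β", "β", 10)]
total_c = sum(row[2] for row in r)

# Hypothetical account rows: (branch-name, account-number, balance).
account = [
    ("Perryridge", "A-102", 400),
    ("Perryridge", "A-201", 900),
    ("Brighton", "A-217", 750),
    ("Brighton", "A-215", 750),
    ("Redwood", "A-222", 700),
]
per_branch = group_sum(account, 0, 2)
print(total_c, per_branch)
```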
Outer Join
■ An extension of the join operation that avoids loss of information.
■ Computes the join and then adds to the result the tuples from one relation that do
not match tuples in the other relation.
■ Uses null values:
☟ null signifies that the value is unknown or does not exist
☟ All comparisons involving null are false by definition.
■ The content of the database may be modified using the following
operations:
☟ Deletion
☟ Insertion
☟ Updating
■ All these operations are expressed using the assignment
operator.
Deletion
■ A delete request is expressed similarly to a query, except instead of
displaying tuples to the user, the selected tuples are removed from
the database.
■ Can delete only whole tuples; cannot delete values on only
particular attributes
■ A deletion is expressed in relational algebra by:
r←r–E
where r is a relation and E is a relational algebra query.
■ Delete all account records in the Perryridge branch.
❑ Provide as a gift for all loan customers in the Perryridge branch, a $200 savings
account.
■ Let the loan number serve as the account number for the new savings account.
r1 ← (σ branch-name = “Perryridge” (borrower ⋈ loan))
account ← account ∪ ∏ branch-name, loan-number, 200 (r1)
depositor ← depositor ∪ ∏ customer-name, loan-number (r1)
Relational Calculus
• Relational Algebra - a procedural query language: it specifies what data to fetch and also
how to fetch it.
• Relational Calculus - a non-procedural query language: it does not describe how
the query will work or how the data will be fetched. It focuses only on what to do, not
on how to do it.
For example,
{< name, age > | < name, age > ∈ Student ∧ age > 17}
• The above query will return the names and ages of the students in the table
Student who are older than 17.
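A set comprehension reads almost exactly like the calculus expression: it states which tuples are wanted, not how to find them. The `student` data below is hypothetical.

```python
# Student relation as a set of (name, age) tuples; values are illustrative.
student = {("Asha", 19), ("Ravi", 17), ("Meena", 21)}

# {<name, age> | <name, age> ∈ Student ∧ age > 17}
result = {(name, age) for (name, age) in student if age > 17}
print(result)
```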
Integrity Constraints
• Integrity constraints are a set of rules.
• They are used to maintain the quality of information.
• Integrity constraints ensure that data insertion, updating, and other
processes are performed in such a way that data integrity is not
affected.
Types of Integrity Constraint
1. Domain constraints
• Domain constraints can be defined as the definition of a valid set of
values for an attribute.
• The data type of domain includes string, character, integer, time, date,
currency, etc. The value of the attribute must be available in the
corresponding domain.
• Example:
2. Entity integrity constraints
• The entity integrity constraint states that primary key value can't be null.
• This is because the primary key value is used to identify individual rows in
relation and if the primary key has a null value, then we can't identify those
rows.
• A table can contain a null value other than the primary key field.
Example:
3. Referential Integrity Constraints
• A referential integrity constraint is specified between two tables.
• In the Referential integrity constraints, if a foreign key in Table 1 refers to the
Primary Key of Table 2, then every value of the Foreign Key in Table 1 must be
available in Table 2.
Example:
4. Key constraints
• A key is an attribute or set of attributes used to uniquely identify an entity within its entity
set.
• An entity set can have multiple keys, out of which one is chosen as the primary
key. A primary key must contain unique, non-null values in the relational table.
• Example:
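Domain, entity integrity and key constraints can be declared in SQL and are then checked by the database on every insert. The sketch below assumes SQLite through Python's `sqlite3` module and a hypothetical `student` table. The NOT NULL on the key column is written out explicitly because SQLite, unlike most databases, does not imply it for a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " roll_no TEXT NOT NULL PRIMARY KEY,"   # entity integrity and key constraint
    " name TEXT NOT NULL,"
    " age INTEGER CHECK (age >= 0))"        # domain constraint: age cannot be negative
)
conn.execute("INSERT INTO student VALUES ('R1', 'Asha', 19)")

def violates(sql):
    """Return True if the database rejects the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

negative_age = violates("INSERT INTO student VALUES ('R2', 'Ravi', -5)")    # domain
null_key = violates("INSERT INTO student VALUES (NULL, 'Meena', 20)")       # entity integrity
duplicate_key = violates("INSERT INTO student VALUES ('R1', 'Kiran', 22)")  # key
print(negative_age, null_key, duplicate_key)
```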
ATTRIBUTE CLOSURE
• USING ATTRIBUTE CLOSURE, WE CAN FIND WHETHER A GIVEN SET OF ATTRIBUTES IS A CANDIDATE
KEY OR NOT.
• X = set of attributes
• X+ = the set of attributes determined by X.
• (here the ‘+’ symbol indicates the attribute closure of X)
Eg.1: R(A,B,C,D,E) and FD { A->B,B->C,C->D,D->E}
A+={A,B,C,D,E} 🡪 SUPER KEY
AB+={A,B,C,D,E} 🡪 SUPER KEY
AC+={A,C,B,D,E} 🡪 SUPER KEY
AD+={A,D,B,C,E} 🡪 SUPER KEY
AE+={A,E,B,C,D} 🡪 SUPER KEY
ABC+={A,B,C,D,E} 🡪 SUPER KEY
.etc….,
So every subset that contains ‘A’ is a SUPER KEY.
CHECK FOR ‘B’:
• BC+={B,C,D,E}
• BD+={B,D,C,E}
• BE+={B,E,C,D}
• BDC+={B,D,C,E}
……..etc.,
But we cannot determine ‘A’ here. None of the subsets gave SUPER KEY.
CHECK FOR ‘C’:
CD+={C,D,E}
CE+={C,E,D}
But we cannot determine ‘A’ and ‘B’ here. None of the subsets gave SUPER KEY.
Similarly check for ‘D’ and ‘E’.
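The closures worked out by hand above can be checked with a short routine. This is a minimal sketch of the usual fixed-point computation, assuming single-letter attribute names and functional dependencies written as (left side, right side) strings; the function name `closure` is chosen for this note.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:                      # repeat until no FD adds a new attribute
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # lhs is already determined, so rhs is too
                changed = True
    return result

fds = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
all_attrs = set("ABCDE")

for x in ["A", "AD", "BC", "CD"]:
    status = "SUPER KEY" if closure(x, fds) == all_attrs else "not a super key"
    print(x, sorted(closure(x, fds)), status)
```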
• In this relation, the SUPER KEYS are A, AB, AC, AD, AE, ABC, … etc.
• Now check whether any PROPER SUBSET of each SUPER KEY is itself a super key. If none is, the super key is a CANDIDATE KEY.
1) A+ :
PROPER SUBSET={ empty} 🡪 CANDIDATE KEY
2) AB+ :
PROPER SUBSET={A},{B}
A+={A,B,C,D,E} 🡪 SUPER KEY
B+={B,C,D,E}
The proper subset {A} is a SUPER KEY, so AB is not a CANDIDATE KEY.
3) AC+ :
PROPER SUBSET={A},{C}
A+={A,B,C,D,E} 🡪 SUPER KEY
C+={C,D,E}
The proper subset {A} is a SUPER KEY, so AC is not a CANDIDATE KEY.
4) AD+ :
PROPER SUBSET={A},{D}
A+={A,B,C,D,E} 🡪 SUPER KEY
D+={D,E}
The proper subset {A} is a SUPER KEY, so AD is not a CANDIDATE KEY.
5) AE+ :
PROPER SUBSET={A},{E}
A+={A,B,C,D,E} 🡪 SUPER KEY
E+={E}
The proper subset {A} is a SUPER KEY, so AE is not a CANDIDATE KEY.
6) ABC+ :
PROPER SUBSET={A},{B},{C},{AC},{AB},{BC}
A+={A,B,C,D,E} 🡪 SUPER KEY
B+={B,C,D,E}
C+={C,D,E}
AB+={A,B,C,D,E} 🡪 SUPER KEY
AC+={A,B,C,D,E} 🡪 SUPER KEY
BC+={B,C,D,E}
The proper subsets {A}, {AB} and {AC} are SUPER KEYS, so ABC is not a CANDIDATE KEY.
AC+:
• PROPER SUBSET={A},{C}
Check whether ‘A’ or ‘C’ is a Super Key.
Here ‘A’ is a Super Key, so ‘AC’ is not a
Candidate Key.
ABC+:
PROPER SUBSET= {A},{B},{C},{AB},{AC},{BC}
Here A, AB, AC are Super Keys.
So ABC is not a Candidate Key.
BC:
PROPER SUBSET= {B},{C}
Here B and C are not SK.
So BC is a CK.
In this relation,
5 Super keys: A, AB, AC, ABC, BC
2 Candidate Keys: A, BC
1 Primary Key: A
1 Alternate Key: BC
A B C D
1 1 5 1
2 1 7 1
3 1 7 1
4 2 7 1
5 2 5 1
6 2 5 2
SUPER KEYS:
A,AB,AC,AD,ABC,ABD,ACD,ABCD
Check the proper subset:
• A+={empty}, so A is a CK.
• AB={A},{B}
Already A is a SK. So AB is not a CK.
• AC={A},{C}
Already A is a SK. So AC is not a CK.
• AD={A},{D}
Already A is a SK. So AD is not a CK.
• ABC={A},{B},{C},{AB},{AC},{BC}
Already A,AB,AC are super keys.
So ABC is not a CK.
• ABD={A},{B},{D},{AB},{AD},{BD}
Already A,AB,AD are super keys.
So ABD is not a CK.
• ACD={A},{C},{D},{AC},{AD},{CD}
Already A,AC,AD are super keys.
So ACD is not a CK.
• ABCD={A},{B},{C},{D},{AB},{AC},{AD},{BC},{BD},{CD},{ABC},{ABD},{BCD},{ACD}
Already A,AB,AC,AD,ABC,ABD,ACD->SK.
So ABCD is not a CK.
So in this relation,
8 Super keys are,
A,AB,AC,AD,ABC,ABD,ACD,ABCD
1 Candidate key= A
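The same conclusion can be reached mechanically from the table data: a set of columns behaves as a super key for this instance if no two rows agree on all of those columns. The sketch below uses the six rows from the table above; the helper name `is_superkey` is an assumption. Note that a single table instance can only rule keys out, since a key is a constraint on every possible instance, so the result is valid only if the six rows are taken as representative.

```python
from itertools import combinations

columns = ["A", "B", "C", "D"]
rows = [
    (1, 1, 5, 1),
    (2, 1, 7, 1),
    (3, 1, 7, 1),
    (4, 2, 7, 1),
    (5, 2, 5, 1),
    (6, 2, 5, 2),
]

def is_superkey(attrs):
    """True if no two rows share the same values on the given columns."""
    idx = [columns.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(rows)

# Every non-empty combination of columns, smallest first.
subsets = [c for n in range(1, len(columns) + 1) for c in combinations(columns, n)]
superkeys = [s for s in subsets if is_superkey(s)]

# A candidate key is a super key with no proper subset that is also a super key.
candidate_keys = [s for s in superkeys if not any(set(t) < set(s) for t in superkeys)]
print(len(superkeys), candidate_keys)
```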
FUNCTIONAL DEPENDENCY
• Functional Dependency (FD) is a constraint that describes the relationship of one attribute to
another attribute in a Database Management System, i.e., one attribute is dependent on
another attribute.
• Functional Dependency helps to maintain the quality of data in the database.
• A functional dependency is denoted by an arrow “→”. The functional dependency of Y on X
is represented by X → Y (X is the determinant and Y is the dependent).
• Eg: If we know the value of Employee number, we can obtain Employee Name, city, salary,
etc. Here city, Employee Name, and salary are functionally dependent on Employee
number.
TYPES OF FUNCTIONAL DEPENDENCY
• TRIVIAL (This function is always valid)
• NON-TRIVIAL
• MULTI-VALUED
• TRANSITIVE
TRIVIAL: (X) 🡪 (Y)
1) FD,X🡪Y, If Y is a subset of X (eg. R.NO,NAME🡪NAME)
2) X🡪X (eg.R.NO🡪R.NO) (always valid)
NON TRIVIAL: (this function may or may not be valid depends on the data in the table)
X🡪Y , (X intersection Y = empty) nothing is common in X and Y
(eg: R.NO🡪NAME)
SEMI TRIVIAL:
eg. R.NO, NAME🡪 NAME,MARKS
ARMSTRONGS AXIOMS/INFERENCE RULE
• Using these rules, we can find all the functional dependencies that exist on a given
relation/table.
• 7 rules (3 primary rules and 4 secondary rules)
R.NO. NAME MARKS DEPT COURSE
1 a 78 CS C1
2 b 60 EE C1
3 a 78 CS C2
4 b 60 EE C3
5 c 80 IT C3
6 d 80 EC C2
1)REFLEXIVITY:
X🡪X
X🡪Y, Y is the subset of X.
2)TRANSITIVITY:
If (X🡪Y & Y🡪Z) then X🡪Z
Eg. (NAME🡪MARKS & MARKS🡪DEPT) then NAME🡪DEPT.
3)AUGMENTATION:
If X🡪Y then XA🡪YA (add any attribute on the left and right side)
eg. R.NO🡪NAME
(R.NO,MARKS)🡪(NAME,MARKS)
ARMSTRONGS AXIOMS/INFERENCE RULE
• (4 secondary rules)
4)UNION:
If X🡪Y & X🡪Z then X🡪YZ
Eg. RNO🡪NAME & RNO🡪MARKS then RNO🡪(NAME,MARKS)
5)DECOMPOSITION/SPLITTING:
(cannot split on the left side (determinant);
split only on the right side)
If X🡪YZ then X🡪Y & X🡪Z
Eg. (NAME,MARKS)🡪(DEPT,COURSE) then
(NAME,MARKS)🡪DEPT & (NAME,MARKS)🡪COURSE
6)PSEUDO TRANSITIVITY:
If (X🡪Y & YZ🡪A) then XZ🡪A
Eg.(R.NO🡪NAME) & (NAME,MARKS)🡪DEPT then (R.NO,MARKS)🡪DEPT
7)COMPOSITION:
If X🡪Y & A🡪B then XA🡪YB
ATTRIBUTE CLOSURE/CLOSURE SET
• USING ATTRIBUTE CLOSURE, WE CAN FIND WHETHER A GIVEN SET OF ATTRIBUTES IS A CANDIDATE
KEY OR NOT.
• X = set of attributes
• X+ = the set of attributes determined by X.
• (here the ‘+’ symbol indicates the attribute closure of X)
Eg.R(A,B,C,D,E)
• FD {A->B,B->C,C->D,D->E}
• A->B, B->C then A->C (transitivity)
• A->A (write REFLEXIVITY)
• A->C ,C->D then A->D (transitivity)
• A->D,D->E then A->E (transitivity)
• A->ABCDE (union)
We can also write,
• B->C,C->D then B->D (transitivity)
• B->D,D->E then B->E (transitivity)
• B->B (reflexivity)
• B->BCDE(union),but we cannot determine A here.
We can also write
C->D,D->E then C->E (transitivity)
C->C (reflexivity)
C->CDE (union), but we cannot determine A and B here.
We can also write
E->E (reflexivity)
• A->B, we can write it as AD->BD (augmentation)
• AD->BD then AD->B and AD->D (splitting/decomposition)
Closure of A: A+ = {A,B,C,D,E} 🡪 super key
Closure of AD: AD+ = {A,D,B,C,E} 🡪 super key
Closure of B: B+ = {B,C,D,E}
Closure of CD: CD+ = {C,D,E}
Closure of AB: AB+ = {A,B,C,D,E} 🡪 super key
Super key: a set of attributes whose closure contains all attributes of the given relation.
Number of super keys in the relation = 16: every super key of R(A,B,C,D,E) must contain A, and each of
B, C, D, E may be included or left out, giving 2 power 4 = 16 super keys.
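The count of 16 can be confirmed by brute force: compute the closure of every non-empty subset of {A,B,C,D,E} and keep those whose closure is the whole relation. The sketch assumes single-letter attributes and FDs given as (left side, right side) strings; `closure` is an illustrative helper name.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a collection of attributes under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

attrs = "ABCDE"
fds = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

# A subset is a super key when its closure contains every attribute of R.
superkeys = [
    "".join(c)
    for n in range(1, len(attrs) + 1)
    for c in combinations(attrs, n)
    if closure(c, fds) == set(attrs)
]
# A candidate key is a super key none of whose proper subsets is a super key.
candidate_keys = [k for k in superkeys if not any(set(o) < set(k) for o in superkeys)]
print(len(superkeys), candidate_keys)
```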
Normalization
• Normalization is the process of organizing the data in the database.
• Normalization is used to minimize/reduce the redundancy from the database table.
• It is also used to eliminate the unwanted characteristics like Insertion, Update and
Deletion Anomalies.
• Normalization divides a larger table into smaller tables and links them using
relationships.
• Delete anomaly: Suppose the company closes the department D890 at some point.
Deleting the rows that have emp_dept as D890 would also delete
the information of employee Maggie, since she is assigned only to this department.
• Update anomaly: There are two rows for employee Rick, as he belongs to two departments
of the company. If we want to update the address of Rick, we have to update
it in both rows or the data will become inconsistent.
OUTPUT TABLE:
Second Normal Form (2NF)
• To be in second normal form, a relation must be in first normal form(1NF) and
relation must not contain any partial dependency.
• A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime
attribute (An attribute that is not part of any candidate key is known as
non-prime attribute) is dependent on any proper subset of any candidate key of
the table.
• Partial Dependency – If the proper subset of candidate key determines
non-prime attribute, it is called partial dependency.
Eg:1
• There are many courses having the same course fee.
• COURSE_FEE cannot alone decide the value of COURSE_NO or STUD_NO
• COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO
• COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO
• Hence, COURSE_FEE is a non-prime attribute, as it does not belong to the only candidate key
{STUD_NO, COURSE_NO}.
• But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper
subset of the candidate key. Non-prime attribute COURSE_FEE is dependent on a proper subset of the
candidate key, which is a partial dependency and so this relation is not in 2NF.
• To convert the above relation to 2NF, we need to split the table into two tables such as :
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
• After this split, all non-key attributes are fully functionally dependent on the primary key.
Third Normal Form (3NF)
• A relation is in 3NF if it is in 2NF and does not contain any transitive dependency of a
non-prime attribute on a candidate key.
• 3NF is used to reduce data duplication.
Indexing
Basic Concepts
⚫ Indexing mechanisms are used to speed up access to
desired data.
⚫ E.g., author catalog in library
⚫ Search Key - an attribute or set of attributes used to look
up records in a file.
⚫ An index file consists of records (called index entries) of
the form
search-key pointer
⚫ Index files are typically much smaller than the original file
⚫ Two basic kinds of indices:
⚫ Ordered indices: search keys are stored in sorted order
⚫ Hash indices: search keys are distributed uniformly across
“buckets” using a “hash function”.
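The difference between the two kinds of index can be sketched in memory. In the example below, a sorted list of (search-key, pointer) entries stands in for an ordered index and a Python dict stands in for a hash index; the record data and function names are hypothetical, and a real DBMS keeps both structures on disk blocks rather than in Python objects.

```python
import bisect

# The "file": records stored in arrival order as (search key, value).
records = [("Gupta", 40), ("Date", 10), ("Korth", 20), ("Navathe", 30)]

# Ordered index: (search-key, pointer) entries kept sorted on the search key.
ordered_index = sorted((key, pos) for pos, (key, _) in enumerate(records))
keys = [key for key, _ in ordered_index]

def lookup_ordered(key):
    """Binary search over the sorted index entries."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[ordered_index[i][1]]
    return None

def range_ordered(low, high):
    """Range queries are cheap on an ordered index: take one contiguous slice."""
    lo = bisect.bisect_left(keys, low)
    hi = bisect.bisect_right(keys, high)
    return [records[pos] for _, pos in ordered_index[lo:hi]]

# Hash index: a dict distributes keys across buckets using a hash function.
hash_index = {key: pos for pos, (key, _) in enumerate(records)}

def lookup_hash(key):
    pos = hash_index.get(key)
    return records[pos] if pos is not None else None

print(lookup_ordered("Korth"), range_ordered("E", "L"), lookup_hash("Date"))
```

The hash index answers equality lookups directly but gives no help with a range such as "E" to "L", which is why the type of access supported matters when choosing an index.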
Index Evaluation Metrics
⚫ Access types supported efficiently. E.g.,
⚫ records with a specified value in the attribute
⚫ or records with an attribute value falling in a specified
range of values.
⚫ Access time
⚫ Insertion time
⚫ Deletion time
⚫ Space overhead
Ordered Indices
⚫ In an ordered index, index entries are stored sorted on the search key
value. E.g., author catalog in library.
⚫ Primary index: in a sequentially ordered file, the index whose search key
specifies the sequential order of the file.
⚫ Also called clustering index
⚫ The search key of a primary index is not necessarily the primary key.
⚫ If even the outer index is too large to fit in main memory, yet another level of
index can be created, and so on.
⚫ Indices at all levels must be updated on insertion or deletion from the file.
Multilevel Index (Cont.)
Data Dictionary Storage
Data dictionary (also called system catalog) stores metadata; that is, data
about data, such as
• User_metadata = (user_name, encrypted_password, group)
• Index_metadata = (index_name, relation_name, index_type, index_attributes)
Schedule 5
Schedule 6
Conflict Equivalent
View Serializability
• A schedule is view serializable if it is view equivalent to a serial schedule.
• If a schedule is conflict serializable, then it is also view serializable.
• A schedule that is view serializable but not conflict serializable contains blind writes. A blind write is simply when
a transaction writes without reading.
• That is, a transaction has a WRITE(Q) but no READ(Q) before it. So, the transaction is writing to the database
"blindly" without reading the previous value.
View Equivalent
• Two schedules S1 and S2 are said to be view equivalent if they satisfy the following conditions:
1. Initial Read
• The initial read in both schedules must be the same. Suppose there are two schedules S1 and S2. If in schedule S1 a
transaction T1 reads the initial value of data item A, then in S2 transaction T1 should also read the initial value of A.
Recoverability of Schedule
What is recoverability?
• Sometimes a transaction may not execute completely due to a software
issue, system crash or hardware failure. In that case, the failed
transaction has to be rolled back. But some other transaction may also
have used a value produced by the failed transaction.
What is a non-recoverable schedule?
• In a non-recoverable schedule, a transaction commits after reading a value
written by another transaction that later fails. When such a failure occurs,
we may not be able to recover to a consistent database state.
A cascading rollback occurs in database systems when a transaction (T1) fails
and must be rolled back. Other transactions that depend on T1's actions must also be
rolled back because of T1's failure, causing a cascading effect. That is, one transaction's
failure causes many to fail.
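The cascading effect can be sketched in a few lines. This is an illustration only: the function name `cascading_rollback` and the `reads_from` mapping (each transaction mapped to the transactions whose uncommitted writes it has read) are assumptions made for the example.

```python
def cascading_rollback(failed, reads_from):
    """Return every transaction that must be rolled back when `failed` aborts.

    reads_from maps a transaction to the set of transactions whose
    uncommitted writes it has read (hypothetical bookkeeping structure).
    """
    to_abort = {failed}
    changed = True
    while changed:
        changed = False
        for txn, sources in reads_from.items():
            # A transaction that read from an aborted one must abort too.
            if txn not in to_abort and sources & to_abort:
                to_abort.add(txn)
                changed = True
    return to_abort

# T2 read from T1, T3 read from T2, T4 is independent (assumed scenario).
deps = {"T2": {"T1"}, "T3": {"T2"}, "T4": set()}
print(sorted(cascading_rollback("T1", deps)))
```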
Concurrency Control
• Under concurrency control, multiple transactions can be executed
simultaneously.
• Concurrent execution may affect the result of a transaction, so it is highly
important to maintain the order of execution of those transactions.
Problems of concurrency control
Several problems can occur when concurrent transactions are executed in an
uncontrolled manner. Following are the three problems in concurrency control.
• Lost updates
• Dirty read
• Unrepeatable / non-repeatable read
Lost update problem
• In the lost update problem, an update made to a data item by one transaction is lost because it is
overwritten by an update made by another transaction.
• This is incorrect; the correct result is 12 - 3 - 2 = 7 (Figure 1).
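The arithmetic above can be reproduced with a small simulation. The figure itself is not available here, so the interleaving below is an assumption: both transactions read the initial value 12 before either writes, T1 subtracts 3 and T2 subtracts 2. The function names are hypothetical.

```python
def interleaved_result():
    """Both transactions read before either writes, so T1's update is lost."""
    db = {"A": 12}
    t1_local = db["A"]        # T1 reads 12
    t2_local = db["A"]        # T2 reads 12 (T1 has not written yet)
    db["A"] = t1_local - 3    # T1 writes 9
    db["A"] = t2_local - 2    # T2 writes 10, overwriting T1's update
    return db["A"]

def serial_result():
    """T1 runs to completion, then T2: both updates are applied."""
    db = {"A": 12}
    db["A"] = db["A"] - 3     # T1
    db["A"] = db["A"] - 2     # T2
    return db["A"]

print(interleaved_result(), serial_result())
```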
Dirty read
• A dirty read occurs when a transaction reads data that has not yet
been committed.
• For example, suppose transaction 1 updates a row and leaves it
uncommitted; meanwhile, transaction 2 reads the updated row. If transaction
1 rolls back the change, transaction 2 will have read data that is considered
never to have existed.
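A minimal sketch of the scenario above, assuming a database modelled as two dictionaries (committed state and a working copy that includes uncommitted changes). The row name and the values 100 and 150 are made up for illustration.

```python
committed = {"row": 100}          # last committed state
working = dict(committed)         # state including uncommitted changes

working["row"] = 150              # transaction 1 updates the row, no commit yet
seen_by_t2 = working["row"]       # transaction 2 reads the uncommitted value

working = dict(committed)         # transaction 1 rolls back its change

# Transaction 2 now holds a value that never became part of the database.
print(seen_by_t2, working["row"])
```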
Unrepeatable / non-repeatable read
• A non-repeatable read occurs when a transaction reads the same row twice and gets a different value
each time.
• For example, suppose transaction T1 reads a data item. Due to concurrency, another transaction T2
updates the same data item and commits. If transaction T1 now rereads the data item, it will retrieve a
different value.
Concurrency Control Protocol
• Concurrency control protocols ensure atomicity, isolation, and serializability of
concurrent transactions.
• The concurrency control protocol can be divided into three categories:
• Lock based protocol
• Time-stamp protocol
• Validation based protocol
Lock-Based Protocol
• In this type of protocol, a transaction cannot read or write a data item until it acquires an appropriate
lock on it.
There are two types of lock:
1. Shared lock:
• It is also known as a read-only lock. Under a shared lock, the data item can only be read by the
transaction.
• It can be shared between transactions because a transaction holding a shared lock
cannot update the data item.
2. Exclusive lock:
• Under an exclusive lock, the data item can be both read and written by the transaction.
• This lock is exclusive: while it is held, no other transaction can lock the same data item, so multiple
transactions cannot modify the same data simultaneously.
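The two lock modes can be summarised by a compatibility table: shared with shared is allowed, every combination involving an exclusive lock is not. The sketch below is a toy lock manager written for illustration; the class `LockManager`, its methods, and the "S"/"X" mode strings are assumptions, not the API of any real DBMS.

```python
# (mode already held, mode requested) -> can both be held by different transactions?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

class LockManager:
    def __init__(self):
        self.held = {}  # data item -> list of (transaction, mode)

    def acquire(self, txn, item, mode):
        """Grant the lock only if it is compatible with locks held by other transactions."""
        holders = self.held.setdefault(item, [])
        for other, other_mode in holders:
            if other != txn and not COMPATIBLE[(other_mode, mode)]:
                return False  # caller must wait
        holders.append((txn, mode))
        return True

    def release_all(self, txn):
        """Drop every lock held by `txn`."""
        for item in self.held:
            self.held[item] = [(t, m) for t, m in self.held[item] if t != txn]

lm = LockManager()
print(lm.acquire("T1", "A", "S"))   # shared lock granted
print(lm.acquire("T2", "A", "S"))   # second shared lock granted
print(lm.acquire("T3", "A", "X"))   # exclusive lock refused while readers hold A
```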
Types of lock protocols available:
1. Simplistic lock protocol
• It is the simplest way of locking data during a transaction.
• Simplistic lock-based protocols require every transaction to obtain a lock
on the data before inserting, deleting, or updating it.
• The data item is unlocked after the transaction completes.
2. Pre-claiming Lock Protocol
• Pre-claiming lock protocols evaluate the transaction to list all the data items on which it needs
locks.
• Before the transaction begins executing, it requests locks on all of those
data items from the DBMS.
• If all the locks are granted, the protocol allows the transaction to begin. When the
transaction completes, it releases all the locks.
• If any of the locks is not granted, the transaction rolls back and waits
until all the locks are granted.
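The all-or-nothing behaviour of pre-claiming can be sketched as follows. For brevity the sketch assumes exclusive locks only and a lock table that maps each data item to the transaction holding it; the names `preclaim` and `release_all` are hypothetical.

```python
def preclaim(txn, needed_items, lock_table):
    """Grant every requested lock, or none of them.

    lock_table maps a data item to the transaction that currently holds it.
    """
    # If any item is held by another transaction, grant nothing.
    if any(lock_table.get(item, txn) != txn for item in needed_items):
        return False  # transaction must wait and retry later
    for item in needed_items:
        lock_table[item] = txn
    return True

def release_all(txn, lock_table):
    """Release every lock held by `txn` once it has completed."""
    for item in [i for i, owner in lock_table.items() if owner == txn]:
        del lock_table[item]

table = {}
print(preclaim("T1", {"A", "B"}, table))   # T1 gets both locks and may begin
print(preclaim("T2", {"B", "C"}, table))   # B is taken, so T2 gets nothing
release_all("T1", table)
print(preclaim("T2", {"B", "C"}, table))   # now T2 can begin
```

Because a waiting transaction holds no locks at all, it cannot be part of a wait cycle, which is the usual argument for why this scheme avoids deadlock.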
3. Two-phase locking (2PL)
• The two-phase locking protocol divides the execution of the transaction into three parts.
• In the first part, when the transaction starts executing, it seeks permission for the locks it
requires.
• In the second part, the transaction acquires all the locks.
• The third part starts as soon as the transaction releases its first lock.
• In the third part, the transaction cannot request any new locks; it only releases the locks it has
acquired.
There are two phases of 2PL:
• Growing phase: In the growing phase, the transaction may acquire new locks on data items, but
none can be released.
• Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released, but no new
locks can be acquired.
• Lock point: The point at which the growing phase ends, i.e., when the transaction takes the final lock it
needs to carry on its work.
• 2-PL ensures conflict serializability.
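The two-phase rule can be enforced with a single flag that flips at the first unlock. The class below is a minimal sketch under that assumption; `TwoPhaseTransaction` and its method names are hypothetical, and lock conflicts between transactions are deliberately left out.

```python
class TwoPhaseTransaction:
    """Tracks one transaction's locks and rejects any lock request after the first unlock."""

    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False   # False = growing phase, True = shrinking phase

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot acquire '{item}' in the shrinking phase")
        self.locks.add(item)

    def unlock(self, item):
        self.locks.remove(item)
        self.shrinking = True    # the first release ends the growing phase

t = TwoPhaseTransaction("T1")
t.lock("A")
t.lock("B")          # lock point: last lock acquired
t.unlock("A")        # shrinking phase begins
try:
    t.lock("C")
except RuntimeError as err:
    print(err)
```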
Drawbacks of 2-PL:
• Cascading Rollback is possible under 2-PL.
• Deadlocks and Starvation are possible.
Strict 2-PL:
• All Exclusive (X) locks held by the transaction must not be released until after the transaction
commits.
Following Strict 2-PL ensures that our schedule is:
• Recoverable
• Cascadeless
Hence, it gives us freedom from Cascading Abort but still, Deadlocks are possible!
Rigorous 2-PL
• All Exclusive (X) and Shared (S) locks held by the transaction must not be released
until after the transaction commits.
Following Rigorous 2-PL ensures that our schedule is:
• Recoverable
• Cascadeless
• Hence, it gives us freedom from Cascading Abort but still, Deadlocks are possible!
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation