
20IT007-DATABASE MANAGEMENT SYSTEMS

SYLLABUS
Introduction to DBMS
Overview of DBMS- Data Models- Database Languages- Database
Administrator- Database Users- Three Schema architecture of DBMS: Basic
concepts- Mapping Constraints- Keys. Relational Algebra – Relational
Calculus: Domain relational Calculus –Tuple Relational Calculus.
Database Design and SQL
Entity-Relationship Diagram-Design Issues- Weak Entity Sets- and
Extended E-R features - Structure of relational Databases- Views-
Modifications of the Database- Concept of DDL- DML- TCL - DCL: Basic
Structure- Set Operations- Aggregate Functions- Null Values- Domain
Constraints- Referential Integrity Constraints- Assertions- Views- Nested Sub
Queries- Stored Procedures. Functional Dependency- Different Anomalies in
designing a Database.- Normalization using Functional Dependencies-
Decomposition- Boyce-Codd Normal Form- 3NF- Normalization using
Multi-Valued Dependencies- 4NF- 5NF.
• Query Processing and Transactions
Database Query Processing - Transactions- Concurrency Control – Recovery System-
State Serializability- Lock Based Protocols- Two Phase Locking.
• Storage Management and Indexing
Physical Storage Systems: Storage Interfaces – Magnetic Disks – Flash Memory
-RAID – Disk block access. Data Storage Structures: Database Storage Architecture - File
Organization- Organization of Records in Files – Data Dictionary Storage - Indexing.
• Advances in Database
Database System Architectures – Parallel and Distributed Transaction Processing –
Complex Data types: Semi structured Data – Spatial Data – Textual Data – Big Data – Data
Analytics – Blockchain Databases.
References
• Abraham Silberschatz, Henry F. Korth, S. Sudarshan, "Database System Concepts", Tata McGraw Hill, 7th Edition, 2019.
• Ramez Elmasri, Shamkant B. Navathe, "Fundamentals of Database Systems", Pearson Education, 7th Edition, 2015.
• C.J. Date, A. Kannan, S. Swamynathan, "An Introduction to Database Systems", Pearson Education, 8th Edition, 2006.
• Raghu Ramakrishnan, "Database Management Systems", McGraw-Hill College Publications, 4th Edition, 2015.
• G.K. Gupta, "Database Management Systems", Tata McGraw Hill, 1st Edition, 2018.
• Atul Kahate, "Introduction to Database Management Systems", Pearson Education, 1st Edition, 2004.
• Ivan Bayross, "SQL, PL/SQL: The Programming Language of Oracle", BPB Publications, 2010.
20IT009-DATABASE MANAGEMENT SYSTEMS LABORATORY
• Creation of a database and write SQL queries to retrieve information from the database.
• Perform Insertion, Deletion, Modifying, Altering, Updating and Viewing records based on
conditions.
• Creation of a database using views, synonyms, sequences and indexes
• Creation of a database using Commit, Rollback and Save point.
• Creation of a database to set various constraints.
• Creating relationship between the databases.
• Write a PL/SQL block that accepts input from the user and handles exceptions.
• Creation of Procedures.
• Creation of functions.
• Mini project (Application Development using Oracle/ MySQL)
a) Inventory Control System.
b) Material Requirement Processing.
c) Hospital Management System.
d) Railway Reservation System.
e) Personal Information System.
f) Web Based User Identification System.
g) Timetable Management System.
h) Hotel Management System.
• Oracle is a company that produces database management systems (DBMS) that help
organizations manage and organize large amounts of data.
• Oracle is the most used relational database management system (RDBMS) today.
• An RDBMS is used by businesses to store and retrieve information.
• A relational database stores information in tabular form, with rows and columns representing
different data attributes and the various relationships between the data values.
• SQL (Structured Query Language) is a standard database language used to access and manipulate
data in databases. It was developed by IBM computer scientists in the 1970s. By executing
queries, SQL can create, update, delete, and retrieve data in databases such as MySQL, Oracle,
PostgreSQL, etc. Overall, SQL is a query language that communicates with databases.
• Data on its own is unorganized information; to organize it, we build a database. A database
is an organized collection of structured data, usually controlled by a database management
system (DBMS). Databases help us easily store, access, and manipulate data held on a
computer.
Creation of a database and write SQL queries to retrieve information from the
database.
ALTER Command :
• The structure of an existing table can be changed by users using the SQL ALTER
TABLE command.
• Renaming a table.

• Changing a column name.

• Adding or deleting columns.

• Modifying the data type of a column.


Syntax
1. Renaming a Table
ALTER TABLE table_name RENAME TO new_table_name;
2. Renaming a Column
ALTER TABLE table_name RENAME COLUMN old_column_name TO
new_column_name;
3. Adding a New Column
ALTER TABLE table_name ADD column_name datatype;
4. Dropping a Column
ALTER TABLE table_name DROP COLUMN column_name;
5. Modifying a Column Data Type
ALTER TABLE table_name MODIFY COLUMN column_name new_datatype;
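A short worked illustration of these forms (the employee table, its columns, and the data types are assumptions made for the example; MODIFY COLUMN is MySQL-style syntax, Oracle uses MODIFY without the COLUMN keyword):

CREATE TABLE employee (emp_id INT, emp_name VARCHAR(50));
ALTER TABLE employee ADD salary INT;                        -- add a new column
ALTER TABLE employee MODIFY COLUMN salary DECIMAL(10,2);    -- change the column's data type
ALTER TABLE employee RENAME COLUMN emp_name TO full_name;   -- rename a column
ALTER TABLE employee DROP COLUMN salary;                    -- remove a column
ALTER TABLE employee RENAME TO staff;                       -- rename the table itself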
Introduction to DBMS
• Data: Data is the raw material that can be processed for any computing
machine.
• For example − Employee name, Product name, Name of the student, Marks
of the student, Mobile number, Image etc.
• Information: Information is the data that has been converted into more
useful or intelligent form.
• For example: Report card sheet.
• Example of data, information and knowledge
• A student secures 450 marks. Here 450 is data, marks of the student is
the information.
Difference: Data vs Information
• Data is the raw fact; information is a processed form of data.
• Data is not significant to a business; information is significant to a business.
• Data is an atomic-level piece of information; information is a collection of data.
• Example of data: product name, name of a student. Example of information: report card of a student.
• Data is a phenomenal fact; information is organized data.
• Data is the primary level of intelligence; information is a secondary level of intelligence.
• Data may or may not be meaningful; information is always meaningful.
• Data is difficult to understand on its own; information is easy to understand.
What is a Database?
• Database is a systematic collection of data.
• Databases support storage and manipulation of data.
• Databases make data management easy.
Examples
• An online telephone directory would use a database to store data
pertaining to people, phone numbers, other contact details, etc.
• Your electricity service provider uses a database to manage billing,
client-related issues, fault data, etc.
• Facebook needs to store, manipulate and present data related to
members, their friends, member activities, messages, advertisements,
etc.
What is DBMS?
• DBMS contains information about one particular enterprise
• Collection of interrelated data
• Set of programs to access the data
• An environment that is both convenient and efficient to use
• Database Management Systems (DBMS) are software systems used to
store, retrieve, and run queries on data.
• A DBMS serves as an interface between an end-user and a database,
allowing users to create, read, update, and delete data in the database.
Database Applications
• Banking: transactions
• Airlines: reservations, schedules
• Universities: registration, grades
• Sales: customers, products, purchases
• Online retailers: order tracking, customized recommendations
• Manufacturing: production, inventory, orders, supply chain
• Human resources: employee records, salaries, tax deductions
• Telecommunication: monthly bills, keeping records of calls made,
maintaining balances on prepaid calling cards and storing information
about the communication networks.
• Databases can be very large.
• Databases touch all aspects of our lives.
Drawbacks of using file systems to store data
• Before DBMSs were introduced, organizations usually stored information
in file processing systems.
• A file management/processing system (file system) is the traditional and
popular way to keep data files organized on drives.
• Keeping organizational information in a file processing system has a
number of major disadvantages.
Drawbacks of using file systems / Purpose of database systems
• Data redundancy and inconsistency
• Data redundancy occurs when the same piece of data exists in multiple places.
• This leads to higher storage and access cost.
• Data inconsistency arises when the various copies of the same data no longer agree.
• Eg: a changed customer address may be reflected in the personal information file, but not in the
savings account records file.
• Difficulty in accessing data
• A new application program must be written to carry out each new task.
• Data isolation
• Because data are scattered in various files, and the files may be in different formats, it is
difficult to write new application programs to retrieve the appropriate data.
• Data integrity problems
• The data present in the database should be consistent and correct. To achieve this, the data
must satisfy certain constraints.
• Eg: the balance of a bank account may never fall below a prescribed amount (say $100).
Drawbacks of using file systems to store data (Cont.)
• Atomicity of updates
• Failures may leave database in an inconsistent state with partial updates carried out
• Example: Transfer of funds from one account to another account should either complete
or not happen at all.
• Concurrent access by multiple users
• Multiple users may need to update the data simultaneously.
• Example: Two people read a balance (say $500) and update it at the same time by withdrawing
money (say $50 and $100). The result of the concurrent executions may leave the account in an
incorrect state: it may contain either $450 or $400, rather than the correct $350.
• Security problems
• Not every user of the database system should be able to access all the data.
• Database systems offer solutions to all the above problems
Three Schema architecture of DBMS/
Views of Data
• A database system is a collection of interrelated data and a set of programs
that allow users to access and modify these data.
• A major purpose of a database system is to provide users with an abstract view
of the data.
• That is, the system hides certain details of how the data are stored and
maintained.
• Data Abstraction:
• This process of hiding irrelevant details from user is called data abstraction.
Three Schema architecture of DBMS/Views of Data
Three Schema architecture of DBMS
1. Internal level/ physical level
• This level describes how the data is actually stored in the storage devices.
• This level is also responsible for allocating space to the data.

2. Conceptual level/ logical level

• This level describes the whole design of the database, such as the relationships among data, the schema of the data, etc.
• Database constraints and security are also implemented at this level. This level is maintained by the DBA.

3. External level/ view level


• This level is called the "view" level because several users can view their desired data from this level;
the data is internally fetched from the database with the help of the conceptual and internal levels.
DATA MODELS
• The structure of a database is the data model: a collection of conceptual tools
for describing data, data relationships, data semantics, and consistency
constraints.
• A Database model defines the logical design and structure of a database and
defines how data will be stored, accessed and updated in a database
management system.
• The data models can be classified into six different categories:
❖ Relational Model
❖ Entity-relationship Model
❖ Object based data model
❖ Semi structured data model
❖ Hierarchical data model
❖ Network data model
TYPES OF DATA MODELS
1. Relational Model
• This model was introduced by E.F Codd in 1970, and since then it has been the most
widely used database model.
• The basic structure of data in the relational model is tables.
• The relational model uses a collection of tables to represent both data and the
relationships among those data.
• Each table has multiple columns, and each column has a unique name.
• Each table contains records of a particular type.
TYPES OF DATA MODELS
2. Entity-Relationship Model.
• The entity-relationship ( E-R ) data model uses a collection of basic objects,
called entities, and relationships among these objects.
• An entity is a “thing” or “object” in the real world that is distinguishable
from other objects.
• The entity-relationship model is commonly used in database design.
Entity Relationship Diagram (ER Diagram)
• An ER diagram shows the relationship among entity sets.
• An entity set is a group of similar entities and these entities can have
attributes.
• An entity corresponds to a table (and its attributes to the table's columns) in a database, so by
showing the relationships among tables and their attributes,
• an ER diagram shows the complete logical structure of a database.
ER Diagram
• We have two entities, Student and College, and their relationship.
• The relationship between Student and College is many to one, as a college can have many students
but a student cannot study in multiple colleges at the same time.
• Student entity has attributes such as Stu_Id, Stu_Name & Stu_Addr.
• College entity has attributes such as Col_ID & Col_Name.

Rectangle: Represents Entity sets.


Ellipses: Attributes
Diamonds: Relationship Set
Lines: link attributes to Entity Sets and Entity sets to Relationship Set
Double Ellipses: Multivalued Attributes
Dashed Ellipses: Derived Attributes
Double Rectangles: Weak Entity Sets
Double Lines: Total participation of an entity in a relationship set
TYPES OF DATA MODELS
3. Object-Based Data Model.
• Information is represented in the form of objects as used in object-oriented programming.
• In the object oriented data model (OODM), both data and their relationships are contained in a single
structure known as an object.
• Object-oriented programming (especially in Java, C++) has become the main software-development
methodology.
• The object-relational data model combines features of the object-oriented data model and relational data
model.
TYPES OF DATA MODELS
4. Semistructured Data Model.
• The semistructured data model permits the specification of data where individual data
items of the same type may have different sets of attributes.
• The Extensible Markup Language( XML ) is widely used to represent semistructured
data.
TYPES OF DATA MODELS
5. Hierarchical Model
• This database model organizes data into a tree-like-structure, with a single root, to which all the other data is
linked.
• The hierarchy starts from the Root data, and expands like a tree, adding child nodes to the parent nodes.
• In this model, a child node will only have a single parent node.
• In hierarchical model, data is organized into tree-like structure with one-to-many relationship between two
different types of data, for example, one department can have many courses, many professors and of-course
many students.
TYPES OF DATA MODELS
6. Network Model
• This is an extension of the Hierarchical model.
• In this model, data is organized more like a graph, and records are allowed to have more than one
parent node.
• In this database model, data is more interrelated, hence accessing the data is also easier and faster.
This database model was used to map many-to-many data relationships.
• This was the most widely used database model before the Relational Model was introduced.
Database Language
• A DBMS has appropriate languages and interfaces to express database queries and updates.
• Database languages can be used to read, store and update the data in the database.
Types of Database Language
1. Data Definition Language
• DDL stands for Data Definition Language.
• It is used to define database structure.
• It is used to create schema, tables, indexes, constraints, etc. in the database.
• Using the DDL statements, you can create the skeleton of the database.
• Data definition language is used to store the information of metadata like the number of
tables and schemas, their names, indexes, columns in each table, constraints, etc.
Here are some tasks that come under DDL:
❖ Create: It is used to create objects in the database.
❖ Alter: It is used to alter the structure of the database.
❖ Drop: It is used to delete objects from the database.
❖ Truncate: It is used to remove all records from a table.
❖ Rename: It is used to rename an object.
2. Data Manipulation Language
• DML stands for Data Manipulation Language.
• It is used for accessing and manipulating data in a database. It handles user
requests.
Here are some tasks that come under DML:
❖ Select: It is used to retrieve data from a database.
❖ Insert: It is used to insert data into a table.
❖ Update: It is used to update existing data within a table.
❖ Delete: It is used to delete records from a table (all rows, or only those that satisfy a condition).
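As a small sketch of these four statements (the student table and its columns are assumptions made for the example):

INSERT INTO student (student_id, name, age) VALUES (1, 'Ravi', 18);   -- add a row
SELECT name, age FROM student WHERE age > 17;                         -- retrieve rows
UPDATE student SET age = 19 WHERE student_id = 1;                     -- change existing rows
DELETE FROM student WHERE student_id = 1;                             -- remove rows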
Two types of DMLs
1. Procedural Language/DMLs:
• The program code is written as a sequence of instructions.
• User has to specify “what to do” and also “how to do” (step by step procedure).
• These instructions are executed in the sequential order.
• These instructions are written to solve specific problems.
Examples of Procedural languages: FORTRAN, BASIC, C and JAVA.
2. Non-Procedural /Declarative Language/DMLs :
• The user has to specify only “what to do” and not “how to do”.
• The programs are small in size.
• It is a declarative language: users concentrate on defining the input and output rather than the
program steps required in a procedural language such as C++ or Java.
Examples of Non-Procedural languages: SQL, PROLOG, LISP.
Query: A Query is a statement requesting the retrieval of information.
Query Language: The portion of a DML that involves information
retrieval is called a query language.
3. Data Control Language
• DCL stands for Data Control Language.
• It is used to control access to the data stored in the database by granting and revoking privileges.
Here are some tasks that come under DCL:
❖ Grant: It is used to give/grant user access privileges to a database/ other users or roles.
❖ Revoke: It is used to take back privileges from the user.
Privileges are of two types :
❖ System Privileges
❖ Object privileges
System Privileges are normally granted by a DBA to users. Examples of system privileges
are CREATE SESSION, CREATE TABLE, CREATE USER etc.
Object privileges means privileges on objects such as tables, views, synonyms, procedure.
These are granted by owner of the object.
CREATE, GRANT AND REVOKE
Syntax: CREATE ROLE rolename IDENTIFIED BY password;
CREATE ROLE mdm;
Syntax: GRANT privileges ON object TO user;
GRANT SELECT, INSERT, UPDATE, DELETE ON customers TO mdm;
Syntax: REVOKE privileges ON object FROM user;
Revoke SELECT, INSERT, UPDATE, DELETE ON customers FROM mdm;
REVOKE DELETE ON customers FROM mdm;
REVOKE ALL ON suppliers FROM mdm;
REVOKE ALL ON suppliers FROM public;
4. Transaction Control Language
• TCL is used to manage the changes made by DML statements.
• DML statements can be grouped together into a logical transaction.
Here are some tasks that come under TCL:
❖ Commit: It is used to permanently save any transaction into the database.
❖ Rollback: It is used to restore the database to the state of the last Commit.
❖ Savepoint: It is used to temporarily save a transaction so that you can
roll back to that point whenever required.
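A minimal sketch of how the three statements work together, assuming a hypothetical account table with acc_no and balance columns:

UPDATE account SET balance = balance - 100 WHERE acc_no = 'A-101';   -- debit one account
SAVEPOINT after_debit;                                                -- mark a point we can roll back to
UPDATE account SET balance = balance + 100 WHERE acc_no = 'A-102';   -- credit the other account
-- if the credit step goes wrong: ROLLBACK TO after_debit;  undoes only the work after the savepoint
COMMIT;                                                               -- makes the whole transaction permanent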
Database Architecture/System Structure/ Structure of a DBMS
Database Architecture/System Structure/ Structure of a DBMS
• A database system is partitioned into modules that deal with each of the
responsibilities of the overall system.
• The functional components of a database system can be broadly divided into the
storage manager and the query processor components.
• The storage manager is needed because databases typically require a large amount of storage space.
• The query processor helps the database system simplify and facilitate access to data.
• It is the job of the database system to translate updates and queries written in a
nonprocedural language at the logical level into an efficient sequence of operations at the physical level.
Query Processor
The query processor components include
• DDL interpreter,- interprets DDL statements and records the definitions in the
data dictionary.
• DML compiler- translates DML statements in a query language into an evaluation
plan consisting of low-level instructions that the query evaluation engine
understands.
• A query can usually be translated into any of a number of alternative evaluation
plans that all give the same result.
• The DML compiler also performs query optimization-it picks the lowest cost
evaluation plan from among the alternatives.
• Query evaluation engine- executes low-level instructions generated by the DML
compiler.
Data Dictionary
Storage Manager
• provides the interface between the low level data stored in the database and the application
programs and queries submitted to the system.
• The storage manager is responsible for the interaction with the file manager.
• The storage manager translates the various DML statements into low-level file-system
commands.
• storage manager is responsible for storing, retrieving, and updating data in the database.
The storage manager components include:
• Authorization and integrity manager- tests for the integrity constraints and
checks the authority of users to access data.
• Transaction manager - ensures that the database remains in a consistent (correct)
state despite system failures, and that concurrent transaction executions proceed
without conflicting.
• File manager - manages the allocation of space on disk storage.
• Buffer manager - responsible for fetching data from disk storage into main
memory, and deciding what data to cache in main memory.
• The buffer manager is a critical part of the database system, since it enables the
database to handle data sizes that are much larger than the size of main memory.
Database users and Administrators
Entity Relationship Model(ER-Model)
• An entity relationship model, also called an entity-relationship (ER) diagram, is a
graphical representation of entities and their relationships to each other.
• Entity-relationship model is a model used for design and representation of relationships
between data.
To understand about the ER Model:
• Entity and Entity Set
• What are Attributes?
• Types of Attributes.
• Keys
• Relationships
Entity and Entity sets
• An entity is a real-world object that is represented in the database.
• It can be any object, place, or person. Data are stored about such entities.
For example:
• In a school database, students, teachers, classes, and courses offered can be
considered as entities.
• All these entities have some attributes or properties that give them their identity.
• An entity set is a collection of similar types of entities.
• An entity set may contain entities whose attributes share similar values.
For example:
• A Students set may contain all the students of a school;
• A Teachers set may contain all the teachers of a school from all faculties.
e.g.;
• E1 is an entity having Entity Type Student.
• Set of all students is called Entity Set.
• In ER diagram, Entity Type is represented as:
Attributes and its Types
• Entities are represented by means of their properties, called attributes.

• All attributes have values. For example, a student entity may have name, class, and age as
attributes.
• There exists a domain or range of values that can be assigned to attributes. For example, a
student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be
negative, etc.
Types of Attributes
• Simple attribute − Simple attributes are atomic values, which cannot be divided further.
For example, a student's phone number is an atomic value of 10 digits.
• Multi-value attribute − Multi-value attributes may contain more than one value.
For example, a person can have more than one phone number, email address, hobby, etc.
• Composite attribute − Composite attributes are made of more than one simple attribute.
For example, a student's complete name may have first_name ,middle_name and
last_name.

• Derived attribute − Derived attributes are attributes that do not exist in the physical
database, but their values are derived from other attributes present in the database. For
example, age can be derived from date_of_birth.

• Single-value attribute/atomic attribute − Single-value attributes contain a single value.

For example − Social_Security_Number, roomno, customerid
Types of Relationships
• Degree of a relationship set:
The number of different entity sets participating in a relationship set is called as degree of
a relationship set.
• Unary Relationship –
When there is only ONE entity set participating in a relation, the relationship is called a
unary relationship. For example, one person is married to only one person.

• Binary Relationship –
When there are TWO entity sets participating in a relation, the relationship is called a
binary relationship. For example, Student is enrolled in Course.
n-ary Relationship
• When there are n entity sets participating in a relation, the relationship
is called an n-ary relationship.
Degree of Relationship
• The number of participating entities in a relationship defines the degree
of the relationship.
• Binary = degree 2
• Ternary = degree 3
• n-ary = degree n
Ternary and Quaternary relationship
Mapping Constraints/Cardinalities
• An E-R enterprise schema may define certain constraints to which the contents of
database system must conform.
• Two types of constraints are
1.Mapping cardinalities
❖ One-one
❖ One-many
❖ Many-one
❖ Many-many
2.Participation constraints
❖ Total participation
❖ Partial participation
Mapping Cardinalities
• Cardinality defines the number of entities in one entity set, which can be associated with the
number of entities of other set via relationship set.
1. One to one – When each entity in each entity set can take part only once in the
relationship, the cardinality is one to one.
Eg: A male can marry to one female and a female can marry to one male. So the relationship
will be one to one.
2. One-to-many − One entity from entity set A can be associated with more than
one entity of entity set B; however, an entity from entity set B can be associated
with at most one entity of set A.
3. Many to one – When entities in one entity set can take part only once in the relationship set
and entities in other entity set can take part more than once in the relationship set, cardinality is
many to one.
Eg: A student can take only one course but one course can be taken by many students. So the
cardinality will be n to 1. It means that for one course there can be n students but for one student,
there will be only one course.
4. Many to many – When entities in all entity sets can take part more than once in the relationship cardinality is
many to many.
Eg: A student can take more than one course and one course can be taken by many students. So the relationship
will be many to many.
Participation Constraint
Participation Constraint is applied on the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship.
• If each student must enroll in a course, the participation of student will be total. Total participation is shown
by double line in ER diagram.
2. Partial Participation – The entity in the entity set may or may NOT participate in the relationship.
• If some courses are not enrolled by any of the student, the participation of course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total participation and
Course Entity set having partial participation.
keys
• A DBMS key is an attribute or set of an attribute which helps you to identify a row(tuple)
in a relation(table).
• It allows you to find the relation between two tables.
• Keys help you uniquely identify a row in a table by a combination of one or more
columns in that table.
• Eg:employee id is a primary key because it uniquely identifies an employee record.
Various Keys in Database Management System
• Super Key
• Primary Key
• Candidate Key
• Foreign Key
What is the Super key?
• A superkey is a group of single or multiple keys which identifies rows in a table.
• A Super key may have additional attributes that are not needed for unique
identification.
What is a Primary Key?
• A column or group of columns in a table which helps us to uniquely identify every row in that table is
called a primary key. The same value can't appear more than once in the table.
Rules for defining a primary key:
• Two rows can't have the same primary key value.
• Every row must have a primary key value.
• The primary key field cannot be null.
• The value in a primary key column can never be modified or updated if any foreign key refers to that
primary key.
What is a Candidate Key?
• A candidate key is a minimal super key: no attribute can be removed from it without losing the
ability to identify a row uniquely.
• The primary key should be selected from the candidate keys. Every table must have at least one
candidate key.
Properties of Candidate key:
• It must contain unique values
• Candidate key may have multiple attributes
• Must not contain null values
• It should contain minimum fields to ensure uniqueness
• Uniquely identify each record in a table
• Example: In the given table Stud ID, Roll No, and email are candidate keys which help us to uniquely
identify the student record in the table.
Foreign key /Reference key
• A foreign key is a column which is added to create a relationship with another table.
• Foreign keys help us to maintain data integrity and also allows navigation between two different instances
of an entity.
Difference Between Primary key & Foreign key
Primary Key
• Helps you to uniquely identify a record in the table.
• Primary Key never accept null values.
• You can have the single Primary key in a table.
Foreign Key
• It is a field in the table that is the primary key of another table.
• A foreign key may accept multiple null values.
• You can have multiple foreign keys in a table.
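A minimal sketch of both keys in SQL (the department and employee tables and their columns are assumptions made for the example):

CREATE TABLE department (
  dept_id   INT PRIMARY KEY,        -- primary key: uniquely identifies each department row
  dept_name VARCHAR(50)
);

CREATE TABLE employee (
  emp_id   INT PRIMARY KEY,         -- primary key of the employee table
  emp_name VARCHAR(50),
  dept_id  INT,                     -- may be NULL if the employee has no department yet
  FOREIGN KEY (dept_id) REFERENCES department(dept_id)   -- foreign key: must match an existing department
);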
Strong and Weak Entity sets
Strong Entity:
• A strong entity does not depend on any other entity in the schema.
• A strong entity always has a primary key.
• A strong entity is represented by a single rectangle.
• A relationship between two strong entities is represented by a single diamond.
• Various strong entities together make up a strong entity set.
Weak Entity:
• A weak entity is an entity that cannot be uniquely identified by its attributes alone.
• A weak entity depends on a strong entity for its existence.
• A weak entity does not have a primary key; it has a partial discriminator key.
• A weak entity is represented by a double rectangle.
• The relationship between a strong entity and a weak entity is represented by a double diamond.
Cardinality constraints-one to one relationship
One to many relationship
Many to one relationship
Many to many relationship
Cardinality limits on relationship sets
E-R diagram for banking system
Extended E-R model(EER model)
1. Generalization
❖ Generalization is a bottom-up approach in which two lower level entities combine to form a
higher level entity.
❖ In generalization, the higher level entity can also combine with other lower level entities to
make further higher level entity.
❖ For example, Saving and Current account types entities can be generalised and an entity
with name Account can be created, which covers both.
2. Specialization
❖ Specialization is opposite to Generalization.
❖ It is a top-down approach in which one higher level entity can be broken down into two
lower level entity.
❖ In specialization, a higher-level entity may or may not have lower-level entity sets.
3. Aggregation
• Aggregation is a process in which a relationship between two entities is treated as a single entity.
• In the diagram above, the relationship between Center and Course together, is acting as an
Entity, which is in relationship with another entity Visitor.
Relational Model
• A Relational Database management System(RDBMS) is a database management
system based on the relational model introduced by E.F Codd.
• In relational model, data is stored in relations(tables) and is represented in form of
tuples(rows).
• RDBMS is used to manage Relational database.
• Relational database is a collection of organized set of tables related to each other, and
from which data can be accessed easily.
RDBMS Concepts
What is a Table?
• In the relational database model, a table is a collection of data elements organised in terms of rows
and columns.
• A table is also considered a convenient representation of a relation.
• A table can have duplicate rows of data.
• A table is the simplest form of data storage.
Example: Employee table
What is a Tuple?
• A single entry in a table is called a Tuple or Record or Row.
• A tuple in a table represents a set of related data.
• For example, the above Employee table has 4 tuples/records/rows.
What is an Attribute?
• A table consists of several records(row), each record can be broken down into several
smaller parts of data known as Attributes or columns or fields.
• The above Employee table consists of four attributes:

✔ ID
✔ Name
✔ Age
✔ Salary.
Attribute Domain
• When an attribute is defined in a relation(table), it is defined to hold only a certain type
of values, which is known as Attribute Domain.
• The attribute Name will hold the name of the employee for every tuple.
• If we save an employee's address there, it will be a violation of the relational database
model.
What is a Relation Schema?
A relation schema describes the structure of the relation, with the name of the relation(name of table), its
attributes and their names and type.

What is a Relation instance − A finite set of tuples in the relational database system represents relation instance.
Relation instances do not have duplicate tuples.

What is a Relation Key?


A relation key is an attribute which can uniquely identify a particular tuple(row) in a relation(table).

Relational Integrity Constraints


Every relation in a relational database model should follow a few constraints to be a valid relation, these
constraints are called as Relational Integrity Constraints.

The three main Integrity Constraints are:


❖ Key Constraints
❖ Domain Constraints
❖ Referential integrity Constraints
Key Constraints
• There must be at least one minimal subset of attributes in the relation, which can identify a tuple
uniquely. This minimal subset of attributes is called key for that relation.
Key constraints force that −
• in a relation with a key attribute, no two tuples can have identical values for key attributes.
• a key attribute cannot have NULL values.
• Key constraints are also referred to as Entity Constraints.
Domain Constraints
• Attributes have specific valid values in a real-world scenario. For example, age can only be a positive
integer and telephone numbers cannot contain a digit outside 0-9.
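As a rough sketch of how such domain rules are stated in SQL (the person table and its column names are assumptions; CHECK is standard SQL, though some older engines ignore it):

CREATE TABLE person (
  person_id INT PRIMARY KEY,
  age       INT CHECK (age > 0),    -- age must be a positive integer
  phone     CHAR(10)                -- fixed-length field for a 10-digit phone number
);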
Referential integrity Constraints
• Referential integrity constraints work on the concept of Foreign Keys.
• A foreign key is a key attribute of a relation that can be referred in other relation.
• Referential integrity constraint states that if a relation refers to a key attribute of a different or
same relation, then that key element must exist.
What is Relational Algebra?
• Every database management system must define a query language to allow users to access the data
stored in the database.
• Relational Algebra is a procedural query language used to query the database tables to access data
in different ways.
In relational algebra,
• input is a relation (table from which data has to be accessed) and
• output is also a relation (a temporary table holding the data asked for by the user).
The fundamental operations of relational algebra are as follows −
• Select
• Project
• Union
• Set Difference
• Cartesian product
• Rename
1. Select Operation (σ)
• This is used to fetch rows(tuples) from table(relation) which satisfies a given condition.
Syntax: σp(r)
✔ Where, σ represents the Select Predicate,
✔ r is the name of the relation,
✔ p is the propositional logic formula in which we specify the conditions that must be satisfied by the
data.
• In propositional logic, one can use unary and binary operators like =, <, > etc., to specify
the conditions.
Eg: σsal>500(employee)
• You can also use and, or, etc. operators to specify two conditions,
Eg: σsal>500 and desig='TUTOR'(employee)
2. Project Operation (∏)
• Project operation is used to project only a certain set of attributes of a relation.
✔ It will project the columns or attributes
✔ It will also remove duplicate data from the columns.
Syntax: ∏A1, A2...(r)
where A1, A2 etc are attribute names(column names).

For example,
∏rollno,name (Student)
It will show only the rollno and name columns for all the rows in the Student table.
3. Union Operation (∪)
• This operation is used to fetch data from two relations(tables).
• The relations(tables) specified should have same number of attributes(columns) and same
attribute domain. Also the duplicate tuples are automatically eliminated from the result.
Syntax: r ∪ s
• where r and s are relations.
4. Set Difference (-)
• This operation is used to find data present in one(first) relation and not present in the
second relation. This operation is also applicable on two relations, just like Union
operation.
Syntax: r - s
• where r and s are relations.
5.Cartesian Product (X)
• This is used to combine data from two different relations(tables) into one and fetch data
from the combined relation.
Syntax: A X B
6.Rename Operation (ρ)-(rho)
• This operation is used to rename either the relation or the attributes.
Syntax: ρs(R)
ρ(RelationNew, RelationOld)
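The SQL statements below are rough equivalents of the fundamental operations, assuming hypothetical relations employee, Student, r, s, A and B with compatible columns (MINUS is Oracle's keyword; standard SQL uses EXCEPT):

SELECT * FROM employee WHERE sal > 500;        -- selection:  σ sal>500 (employee)
SELECT DISTINCT rollno, name FROM Student;     -- projection: ∏ rollno,name (Student); DISTINCT removes duplicates
SELECT * FROM r UNION SELECT * FROM s;         -- union:      r ∪ s
SELECT * FROM r MINUS SELECT * FROM s;         -- difference: r - s
SELECT * FROM A CROSS JOIN B;                  -- Cartesian product: A X B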
Additional Operations in Relational Algebra
• Set Intersection
• Natural join
• Division operation
• Assignment operation
Set Intersection
• The intersection operator gives the common data values between the two
data sets/tables/relations that are intersected.
Natural join operation
• Natural join is a binary operation that is used to combine certain selections and a Cartesian
product into one operation.
• It is denoted by the join symbol ⋈ .
Division operation
• The division is a binary operation that is written as R ÷ S.
• Suited to queries that include the phrase ‘for all’.
Assignment Operation
Extended Relational Algebra Operations
• In Relational Algebra, Extended Operators are those operators that are
derived from the basic operators.
■ Generalized Projection
■ Outer Join
■ Aggregate Functions
■ Aggregation function takes a collection of values and returns a
single value as a result.
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
■ Aggregate operation in relational algebra:

G1, G2, ..., Gn g F1(A1), F2(A2), ..., Fn(An) (E)

☟ E is any relational-algebra expression
☟ G1, G2, ..., Gn is a list of attributes on which to group (can be empty)
☟ Each Fi is an aggregate function applied to attribute Ai
■ Relation r:
A  B  C
α  α  7
α  β  7
β  β  3
β  β  10

g sum(C) (r):
sum-C
27
■ Relation account grouped by branch-name:

branch-name  account-number  balance
Perryridge   A-102           400
Perryridge   A-201           900
Brighton     A-217           750
Brighton     A-215           750
Redwood      A-222           700

branch-name g sum(balance) (account):
branch-name  sum(balance)
Perryridge   1300
Brighton     1500
Redwood      700
■ Outer Join
■ An extension of the join operation that avoids loss of information.
■ Computes the join and then adds tuples from one relation that do not
match tuples in the other relation to the result of the join.
■ Uses null values:
☟ null signifies that the value is unknown or does not exist
☟ All comparisons involving null are (roughly speaking) false by definition.
Modification of the Database
■ The content of the database may be modified using the following
operations:
☟ Deletion
☟ Insertion
☟ Updating
■ All these operations are expressed using the assignment
operator.

Deletion
■ A delete request is expressed similarly to a query, except instead of
displaying tuples to the user, the selected tuples are removed from
the database.
■ Can delete only whole tuples; cannot delete values on only
particular attributes
■ A deletion is expressed in relational algebra by:
r←r–E
where r is a relation and E is a relational algebra query.
■ Delete all account records in the Perryridge branch:

  account ← account – σ branch-name = "Perryridge" (account)

■ Delete all loan records with amount in the range of 0 to 50:

  loan ← loan – σ amount ≥ 0 and amount ≤ 50 (loan)
■ To insert data into a relation, we either:
☟ specify a tuple to be inserted
☟ write a query whose result is a set of tuples to be inserted
■ in relational algebra, an insertion is expressed by:
r← r ∪ E
where r is a relation and E is a relational algebra expression.
■ The insertion of a single tuple is expressed by letting E be a constant relation containing
one tuple.
■ Insert information in the database specifying that Smith has $1200 in account A-973 at the
Perryridge branch.
account ← account∪ {(“Perryridge”, A-973, 1200)}

depositor ← depositor ∪ {(“Smith”, A-973)}

❑ Provide as a gift for all loan customers in the Perryridge branch, a $200 savings
account.
■ Let the loan number serve as the account number for the new savings account.

  r1 ← σ branch-name = "Perryridge" (borrower ⋈ loan)
  account ← account ∪ ∏ branch-name, account-number, 200 (r1)
  depositor ← depositor ∪ ∏ customer-name, loan-number (r1)
Relational Calculus
• Relational Algebra - procedural query language to fetch data and which also explains
how it is done.
• Relational Calculus - non-procedural query language and has no description about how
the query will work or the data will be fetched. It only focusses on what to do, and not
on how to do it.

Relational Calculus exists in two forms:


1. Tuple Relational Calculus (TRC)
2. Domain Relational Calculus (DRC)
Tuple Relational Calculus (TRC)
• Fetching tuples based on the given condition.
Syntax: { T | Condition }
• Here, we define a tuple variable, specify the table(relation) name in which the tuple is
to be searched for, along with a condition.
• We can also specify column name using a . dot operator, with the tuple variable to
only get a certain attribute(column) in result.
• A tuple variable is nothing but a name; it can be anything. Generally we use a single
letter, so here T is the tuple variable.

For example, for the table Student, we would write it as Student(T).


• In the table Student, if we want to get data for students with age greater than 17, then we
can write it as T.age > 17, where T is our tuple variable.
• If we want to use Tuple Relational Calculus to fetch the names of students from the table
Student with age greater than 17, then, with T as our tuple variable:
{ T | Student(T) AND T.age > 17 }
Domain Relational Calculus (DRC)
• In domain relational calculus, filtering is done based on the domain of the
attributes and not based on the tuple values.
Syntax: { c1, c2, c3, ..., cn | F(c1, c2, c3, ... ,cn)}
• where, c1, c2... etc represents domain of attributes(columns) and
• F defines the formula including the condition for fetching the data.

For example,
{ <name, age> | <name, age> ∈ Student ∧ age > 17 }
• The above query will return the names and ages of the students in the table
Student who are older than 17.
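Both calculus expressions describe the same result that the following SQL query would return (assuming the Student table has name and age columns); this is only an illustration of the equivalence, not part of the calculus notation itself:

SELECT name, age FROM Student WHERE age > 17;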
Integrity Constraints
• Integrity constraints are a set of rules.
• It is used to maintain the quality of information.
• Integrity constraints ensure that the data insertion, updating, and other
processes have to be performed in such a way that data integrity is not
affected.
Types of Integrity Constraint
1. Domain constraints
• Domain constraints can be defined as the definition of a valid set of
values for an attribute.
• The data type of domain includes string, character, integer, time, date,
currency, etc. The value of the attribute must be available in the
corresponding domain.
• Example:
2. Entity integrity constraints
• The entity integrity constraint states that primary key value can't be null.
• This is because the primary key value is used to identify individual rows in
relation and if the primary key has a null value, then we can't identify those
rows.
• A table can contain a null value other than the primary key field.
Example:
3. Referential Integrity Constraints
• A referential integrity constraint is specified between two tables.
• In the Referential integrity constraints, if a foreign key in Table 1 refers to the
Primary Key of Table 2, then every value of the Foreign Key in Table 1 must be
available in Table 2.
Example:
4. Key constraints
• Keys are the entity set that is used to identify an entity within its entity set
uniquely.
• An entity set can have multiple keys, but out of which one key will be the primary
key. A primary key can contain a unique value in the relational table.
• Example:
ATTRIBUTE CLOSURE
• USING ATTRIBUTE CLOSURE, WE CAN FIND WHETHER A GIVEN KEY IS A CANDIDATE
KEY OR NOT.
• X=set of attributes
• X(superscript +) = contains set of attributes determined by X.
• (here ‘+’ symbol indicates attribute closure of X)
Eg.1: R(A,B,C,D,E) and FD { A->B,B->C,C->D,D->E}
A+={A,B,C,D,E} ----------------🡪SUPER KEY
AB+={A,B,C,D,E} ----------------🡪SUPER KEY
AC+={A,C,B,D,E} ----------------🡪SUPER KEY
AD+={A,D,B,C,E} ----------------🡪SUPER KEY
AE+={A,E,B,C,D} --------------🡪SUPER KEY
ABC+={A,B,C,D,E} ----------------🡪SUPER KEY
.etc….,
So all the subsets that combines with ‘A’ gives the SUPER KEY.
ATTRIBUTE CLOSURE
CHECK FOR ‘B’:
• BC+={B,C,D,E}
• BD+={B,D,C,E}
• BE+={B,E,C,D}
• BDC+={B,D,C,E}
……..etc.,
But we cannot determine ‘A’ here. None of the subsets gave SUPER KEY.
CHECK FOR ‘C’:
CD+={C,D,E}
CE+={C,E,D}
But we cannot determine ‘A’ and ‘B’ here. None of the subsets gave SUPER KEY.
Similarly check for ‘D’ and ‘E’.
ATTRIBUTE CLOSURE
• In this relation, SUPER KEYS are A+,AB+,AC+,AD+,AE+,ABC+…etc….
• Now check whether any PROPER SUBSET of each SUPER KEY is itself a super key or not.
1) A+ :
PROPER SUBSET={ empty}------------🡪CANDIDATE KEY
2) AB+ :
PROPER SUBSET={A},{B}
A+={A,B,C,D,E} -------🡪SUPER KEY
B+={B,C,D,E}
This subset determines SUPER KEY.so this is not a CANDIDATE KEY.
3)AC+ :
PROPER SUBSET={A},{C}
A+={A,B,C,D,E} -------🡪SUPER KEY
C+={C,D,E}
This subset determines SUPER KEY.so this is not a CANDIDATE KEY.
4)AD+ :
PROPER SUBSET={A},{D}
A+={A,B,C,D,E} -------🡪SUPER KEY
D+={D,E}
This subset determines SUPER KEY.so this is not a CANDIDATE KEY.
4)AE+ :
PROPER SUBSET={A},{E}
A+={A,B,C,D,E} -------🡪SUPER KEY
E+={E}
This subset determines SUPER KEY.so this is not a CANDIDATE KEY.
5)ABC+ :
PROPER SUBSET={A},{B},{C},{AC},{AB},{BC}
A+={A,B,C,D,E} -------🡪SUPER KEY
B+={B,C,D,E}
C+={C,D,E}
AB+={A,B,C,D,E} -------🡪SUPER KEY
AC+={A,B,C,D,E} -------🡪SUPER KEY
BC+={B,C,D,E}
This subset determines SUPER KEY.so this is not a CANDIDATE KEY.

So in this relation SUBSET 'A' alone is a CANDIDATE KEY.

Every other super key has a proper subset that is itself a super key,
so those super keys are not CANDIDATE KEYS.
2) R(A,B,C,D,E) and FD = {A->B, D->E}
• A+={A,B}
• B+={B}
• C+={C}
• D+={D,E}
• E+={E}
• AB+={A,B}
• AC+={A,C,B}
• AD+={A,D,B,E}
• AE+={A,E,B}
• ABC+={A,B,C}
• ABD+={A,B,D,E}
• ABE+={A,B,E}
• ACD+={A,C,D,B,E} ----🡪SUPER KEY
• ADE+={A,D,E,B}
• ABCD+={A,B,C,D,E} ----🡪SUPER KEY
• ABCE+={A,B,C,E}
• ABDE+={A,B,D,E}
• ACDE+={A,C,D,E,B} ----🡪SUPER KEY
NOW check B:
BC+={B,C}
BD+={B,D} ….etc..
Similarly we can determine all the subsets of B, C, D and E; none of them determines A.

• So find the CANDIDATE KEY from the SUPER KEYS.
• SUPER KEYS are ACD+, ABCD+, ACDE+
• Now check the proper subsets of the super keys.
CHECK ACD+:
PROPER SUBSET:
A+={A,B}
C+={C}
D+={D,E}
AC+={A,C,B}
AD+={A,D,B,E}
CD+={C,D,E}
No super key is determined here, so ACD is a CANDIDATE KEY.
CHECK ABCD+:
• A+={A,B}
• B+={B}
• C+={C}
• D+={D,E}
• AB+={A,B}
• AC+={A,C,B}
• AD+={A,D,B,E}
• ABC+={A,B,C}
• ABD+={A,B,D,E}
• ACD+={A,C,D,B,E} -🡪SUPER KEY
• So ABCD is not a CANDIDATE KEY.
CHECK ACDE is a CK:
• A+={A,B}
• C+={C}
• D+={D,E}
• E+={E}
• AC+={A,C,B}
• AD+={A,D,B,E}
• AE+={A,E,B}
• ACD+={A,C,D,B,E} 🡪SUPER KEY
• ACE+={A,C,E,B}
• ADE+={A,D,E,B}
• So ACDE is not a CANDIDATE KEY.
So in this relation,
• 3 SUPER KEYS: ACD, ABCD, ACDE
• 1 CANDIDATE KEY: ACD
FIND the SUPER KEYS from the relation R(A,B,C):
Super keys are A, AB, AC, ABC, BC.

Now check the PROPER SUBSETS of each super key:

A+:
• PROPER SUBSET = {empty}
So 'A' is a Candidate Key.

AC+:
• PROPER SUBSET = {A},{C}
Check whether 'A' or 'C' is a Super Key. Here 'A' is a Super Key, so 'AC' is not a Candidate Key.

ABC+:
• PROPER SUBSET = {A},{B},{C},{AB},{AC},{BC}
Here A, AB, AC are Super Keys, so ABC is not a Candidate Key.

BC+:
• PROPER SUBSET = {B},{C}
Here B and C are not Super Keys, so BC is a Candidate Key.

In this relation,
5 Super Keys: A, AB, AC, ABC, BC
2 Candidate Keys: A, BC
1 Primary Key: A
1 Alternate Key: BC
Relation:
A  B  C  D
1  1  5  1
2  1  7  1
3  1  7  1
4  2  7  1
5  2  5  1
6  2  5  2

SUPER KEYS: A, AB, AC, AD, ABC, ABD, ACD, ABCD
Check the proper subsets:
• A+ = {empty}, so A is a CK.
• AB: proper subsets {A},{B}. Already A is a SK, so AB is not a CK.
• AC: proper subsets {A},{C}. Already A is a SK, so AC is not a CK.
• AD: proper subsets {A},{D}. Already A is a SK, so AD is not a CK.
• ABC: proper subsets {A},{B},{C},{AB},{AC},{BC}. Already A, AB, AC are super keys, so ABC is not a CK.
• ABD: proper subsets {A},{B},{D},{AB},{AD},{BD}. Already A, AB, AD are super keys, so ABD is not a CK.
• ACD: proper subsets {A},{C},{D},{AC},{AD},{CD}. Already A, AC, AD are super keys, so ACD is not a CK.
• ABCD: proper subsets {A},{B},{C},{D},{AB},{AC},{AD},{BC},{BD},{CD},{ABC},{ABD},{BCD},{ACD}.
Already A, AB, AC, AD, ABC, ABD, ACD are super keys, so ABCD is not a CK.

So in this relation,
8 Super keys: A, AB, AC, AD, ABC, ABD, ACD, ABCD
1 Candidate key: A
FUNCTIONAL DEPENDENCY
• Functional Dependency (FD) is a constraint that determines the relation of one attribute to
another attribute in a Database Management System. (OR) (one attribute is dependent on
another attribute)
• Functional Dependency helps to maintain the quality of data in the database.
• A functional dependency is denoted by an arrow "→". The functional dependency X → Y
means that X determines Y (X is the determinant and Y is the dependent).
• Eg: If we know the value of Employee number, we can obtain Employee Name, city, salary,
etc. Here city, Employee Name, and salary are functionally dependent on Employee
number.
TYPES OF FUNCTIONAL DEPENDENCY
• TRIVIAL (This function is always valid)
• NON-TRIVIAL
• MULTI-VALUED
• TRANSITIVE
TRIVIAL: (X) 🡪 (Y)
1) FD,X🡪Y, If Y is a subset of X (eg. R.NO,NAME🡪NAME)
2) X🡪X (eg.R.NO🡪R.NO) (always valid)
NON TRIVIAL: (this function may or may not be valid depends on the data in the table)
X🡪Y , (X intersection Y = empty) nothing is common in X and Y
(eg: R.NO🡪NAME)
SEMI TRIVIAL:
eg. R.NO, NAME🡪 NAME,MARKS
ARMSTRONG'S AXIOMS/INFERENCE RULES
• Using these rules, we can find out all the functional dependencies that exist on a given
relation/table.
• 7 rules (3 primary rules and 4 secondary rules)

Example relation:
R.NO.  NAME  MARKS  DEPT  COURSE
1      a     78     CS    C1
2      b     60     EE    C1
3      a     78     CS    C2
4      b     60     EE    C3
5      c     80     IT    C3
6      d     80     EC    C2

1) REFLEXIVITY:
X🡪X
X🡪Y, if Y is a subset of X.
2) TRANSITIVITY:
If (X🡪Y & Y🡪Z) then X🡪Z
Eg. (NAME🡪MARKS & MARKS🡪DEPT) then NAME🡪DEPT.
3) AUGMENTATION:
If X🡪Y then XA🡪YA (add the same attribute on both the left and right side)
eg. R.NO🡪NAME
(R.NO,MARKS)🡪(NAME,MARKS)
ARMSTRONG'S AXIOMS/INFERENCE RULES
• (4 secondary rules)
4) UNION:
If X🡪Y & X🡪Z then X🡪YZ
Eg. RNO🡪NAME & RNO🡪MARKS then RNO🡪(NAME,MARKS)

5) DECOMPOSITION/SPLITTING:
(cannot split the left side (determinant); split only the right side)
If X🡪YZ then X🡪Y & X🡪Z
Eg. (NAME,MARKS)🡪(DEPT,COURSE) then
(NAME,MARKS)🡪DEPT & (NAME,MARKS)🡪COURSE
ARMSTRONG'S AXIOMS/INFERENCE RULES
• (4 secondary rules)
6) PSEUDO TRANSITIVITY:
If (X🡪Y & YZ🡪A) then XZ🡪A
Eg. (R.NO🡪NAME) & (NAME,MARKS)🡪DEPT then (R.NO,MARKS)🡪DEPT

7)COMPOSITION:
If X🡪Y & A🡪B then XA🡪YB
ATTRIBUTE CLOSURE/CLOSURE SET
• USING ATTRIBUTE CLOSURE, WE CAN FIND WHETHER A GIVEN KEY IS A CANDIDATE
KEY OR NOT.
• X=set of attributes
• X(superscript +) = contains set of attributes determined by X.
• (here ‘+’ symbol indicates attribute closure of X)
Eg.R(A,B,C,D,E)
• FD {A->B,B->C,C->D,D->E}
• A->B, B->C then A->C (transitivity)
• A->A (write REFLEXIVITY)
• A->C ,C->D then A->D (transitivity)
• A->D,D->E then A->E (transitivity)
• A->ABCDE (union)
ATTRIBUTE CLOSURE/CLOSURE SET
Eg.R(A,B,C,D,E)
• FD { A->B,B->C,C->D,D->E}
We can also write,
• B->C,C->D then B->D (transitivity)
• B->D,D->E then B->E (transitivity)
• B->B (reflexivity)
• B->BCDE(union),but we cannot determine A here.
We can also write
C->D,D->E then C->E (transitivity)
C->C (reflexivity)
C->CDE (union), but we cannot determine A and B here.
We can also write
E->E (reflexivity)
ATTRIBUTE CLOSURE/CLOSURE SET
Eg.R(A,B,C,D,E)
• FD { A->B,B->C,C->D,D->E}
• A->B, we can write it as AD->BD (augmentation)
• AD->BD then AD->B and AD->D (splitting/Decomposition)
FIND the closure of A(superscript +)={A,B,C,D, E}-🡪super key
FIND the closure of AD(+)={A,D,B,C,E}🡪super key
FIND the closure of B(+)={B,C,D,E}
FIND the closure of CD(+)={C,D,E}
FIND the closure of AB(+)={A,B,C,D,E}->SUPER KEY
Super key: it is a set of attributes whose closure contains all attributes of a given relation.
Number of super keys present in the relation = 16 (since in R(A,B,C,D,E), A must be present and each of
B, C, D, E may or may not be included; possibilities = 2 power 4 = 16 super keys).
Normalization
• Normalization is the process of organizing the data in the database.
• Normalization is used to minimize/reduce the redundancy from the database table.
• It is also used to eliminate the unwanted characteristics like Insertion, Update and
Deletion Anomalies.
• Normalization divides larger tables into smaller tables and links them using
relationships.

Facts About Database Normalization:


• The words normalization and normal form refer to the structure of a database.
• Normalization was developed by IBM researcher E.F. Codd In the 1970s.
• Normalization increases clarity in organizing data in Databases.
Anomalies in DBMS
• Anomalies are problems that can occur in un-normalised databases where
all the data is stored in one table.
• There are three types of anomalies that occur when the database is not
normalized.
• Insertion anomaly
• update anomaly
• deletion anomaly
Example:
• Suppose a manufacturing company stores the employee details in a table
named employee that has four attributes: emp_id ,emp_name, emp_address
, emp_dept in which the employee works.
emp_id emp_name emp_address emp_dept
101 Rick Delhi D001
101 Rick Delhi D002
123 Maggie Agra D890
166 Glenn Chennai D900
166 Glenn Chennai D004
• Insert anomaly: Suppose a new employee joins the company, who is under
training and currently not assigned to any department then we would not be able to
insert the data into the table if emp_dept field doesn’t allow nulls.

• Delete anomaly: Suppose, if at a point of time the company closes the department
D890 then deleting the rows that are having emp_dept as D890 would also delete
the information of employee Maggie since she is assigned only to this department.

• Update anomaly: Two rows for employee Rick as he belongs to two departments
of the company. If we want to update the address of Rick then we have to update
the same in two rows or the data will become inconsistent.

To overcome these anomalies we need to normalize the data.
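One hedged sketch of such a decomposition in SQL (the column data types are assumptions): employee details are stored exactly once, and department assignments are kept in a separate table, so inserting, updating or deleting one kind of fact no longer disturbs the other.

CREATE TABLE employee (
  emp_id      INT PRIMARY KEY,
  emp_name    VARCHAR(50),
  emp_address VARCHAR(100)
);

CREATE TABLE emp_dept (
  emp_id  INT,
  dept_id VARCHAR(10),
  PRIMARY KEY (emp_id, dept_id),                        -- an employee may belong to several departments
  FOREIGN KEY (emp_id) REFERENCES employee(emp_id)
);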


Database Normalization Rules
The database normalization process is divided into following the normal form:
• First Normal Form (1NF)

• Second Normal Form (2NF)

• Third Normal Form (3NF)

• Boyce-Codd Normal Form (BCNF)

• Fourth Normal Form (4NF)

• Fifth Normal Form (5NF)


• 1NF - A relation is said to be in first normal form if it is already in unnormalized
form and it has no repeating groups.
• 2NF - A relation is said to be in second normal form if it is already in 1NF and it
has no partial dependency.
• 3NF - A relation is said to be in third normal form if it is already in 2NF and it
has no transitive dependency.
• BOYCE-CODD NORMAL FORM (BCNF) - A relation is said to be in Boyce-Codd
normal form if it is already in 3NF and every determinant is a candidate key.
• 4NF - A relation is said to be in fourth normal form if it is already in BCNF and it
has no multivalued dependency.
• 5NF - A relation is said to be in fifth normal form if it is already in 4NF and it has
no join dependency.
1NF (First Normal Form)
• A relation will be 1NF if it contains an atomic value.
• It states that an attribute of a table cannot hold multiple values. It must hold only
single-valued attributes. (Each table cell should contain a single/atomic value, and each
record needs to be unique.)
• First normal form disallows the multi-valued attribute, composite attribute, and their
combinations.
OUTPUT TABLE:
1NF (First Normal Form)
Example 2: Relation EMPLOYEE is not in 1NF because of multi-valued attribute
EMP_PHONE.

OUTPUT TABLE:
Second Normal Form (2NF)
• To be in second normal form, a relation must be in first normal form(1NF) and
relation must not contain any partial dependency.
• A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime
attribute (An attribute that is not part of any candidate key is known as
non-prime attribute) is dependent on any proper subset of any candidate key of
the table.
• Partial Dependency – If the proper subset of candidate key determines
non-prime attribute, it is called partial dependency.
Eg:1
• There are many courses having the same course fee.
• COURSE_FEE cannot alone decide the value of COURSE_NO or STUD_NO
• COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO
• COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO
• Hence, COURSE_FEE would be a non-prime attribute, as it does not belong to the only candidate key
{STUD_NO, COURSE_NO}.
• But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper
subset of the candidate key. Non-prime attribute COURSE_FEE is dependent on a proper subset of the
candidate key, which is a partial dependency and so this relation is not in 2NF.
• To convert the above relation to 2NF, we need to split the table into two tables such as :
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
• In 2NF, all non-key attributes are fully functionally dependent on the primary key.
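A small Python sketch of how the partial dependency in this example can be detected mechanically from the candidate key and the FD set; the encoding of the relation and its FDs is an illustrative assumption:

from itertools import combinations

def closure(attrs, fds):
    # Same closure routine as sketched earlier, repeated to keep this self-contained.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

attributes    = {"STUD_NO", "COURSE_NO", "COURSE_FEE"}
candidate_key = ("STUD_NO", "COURSE_NO")
fds = [(("COURSE_NO",), ("COURSE_FEE",))]     # COURSE_NO -> COURSE_FEE

non_prime = attributes - set(candidate_key)
for size in range(1, len(candidate_key)):     # proper subsets of the candidate key
    for subset in combinations(candidate_key, size):
        determined = closure(subset, fds) & non_prime
        if determined:
            print("Partial dependency:", subset, "->", determined, "(not in 2NF)")
# Prints: Partial dependency: ('COURSE_NO',) -> {'COURSE_FEE'} (not in 2NF)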
Third Normal Form (3NF)
• A relation will be in 3NF if it is in 2NF and does not contain any transitive
dependency.
• 3NF is used to reduce the data duplication.
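As a hypothetical illustration of a transitive dependency (not taken from the slides): if emp_id -> dept_id and dept_id -> dept_name, then dept_name depends on the key only transitively. The sqlite3 sketch below shows the usual 3NF decomposition under those assumed attributes:

import sqlite3

con = sqlite3.connect(":memory:")
# Assumed attributes: emp_id -> dept_id and dept_id -> dept_name, so dept_name
# is transitively dependent on emp_id. The usual 3NF fix splits the relation.
con.executescript("""
CREATE TABLE department (              -- dept_id -> dept_name stored once
    dept_id   TEXT PRIMARY KEY,
    dept_name TEXT
);
CREATE TABLE employee (                -- the key emp_id now determines only dept_id
    emp_id  INTEGER PRIMARY KEY,
    dept_id TEXT REFERENCES department(dept_id)
);
""")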
Indexing
Basic Concepts
⚫ Indexing mechanisms used to speed up access to
desired data.
⚫ E.g., author catalog in library
⚫ Search Key - attribute or set of attributes used to look
up records in a file.
⚫ An index file consists of records (called index entries) of
the form
⟨search-key, pointer⟩

⚫ Index files are typically much smaller than the original file
⚫ Two basic kinds of indices:
⚫ Ordered indices: search keys are stored in sorted order
⚫ Hash indices: search keys are distributed uniformly across
“buckets” using a “hash function”.
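As a rough illustration of the second kind, the following Python sketch distributes search keys across buckets with a hash function; the bucket count and the sample keys and record pointers are illustrative assumptions, not part of any real index implementation:

# Illustrative hash index: search keys are spread over buckets by a hash
# function; each bucket holds (search_key, record_pointer) entries.
NUM_BUCKETS = 8                      # assumed bucket count for the example

buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_of(search_key):
    return hash(search_key) % NUM_BUCKETS

def insert(search_key, record_pointer):
    buckets[bucket_of(search_key)].append((search_key, record_pointer))

def lookup(search_key):
    # Only one bucket has to be scanned, not the whole index.
    return [ptr for key, ptr in buckets[bucket_of(search_key)] if key == search_key]

insert("Einstein", 101)              # assumed sample keys and record pointers
insert("Brandt", 202)
print(lookup("Einstein"))            # [101]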
Index Evaluation Metrics
⚫ Access types supported efficiently. E.g.,
⚫ records with a specified value in the attribute
⚫ or records with an attribute value falling in a specified
range of values.
⚫ Access time
⚫ Insertion time
⚫ Deletion time
⚫ Space overhead
Ordered Indices
⚫ In an ordered index, index entries are stored sorted on the search key
value. E.g., author catalog in library.

⚫ Primary index: in a sequentially ordered file, the index whose search key
specifies the sequential order of the file.
⚫ Also called clustering index

⚫ The search key of a primary index is not necessarily the primary key.

⚫ Secondary index: an index whose search key specifies an order different
from the sequential order of the file. Also called non-clustering index.

⚫ Index-sequential file: ordered sequential file with a primary index.


Primary index- Dense Index Files
⚫ Dense index — Index record appears for every search-key value in the
file.
⚫ E.g. index on ID attribute of instructor relation
Dense Index Files (Cont.)
⚫ Dense index on dept_name, with instructor file sorted on dept_name
Sparse Index Files
⚫ Sparse Index: contains index records for only
some search-key values.
⚫ Applicable when records are sequentially ordered on search-key
⚫ To locate a record with search-key value K we:
⚫ Find index record with largest search-key value ≤ K
⚫ Search file sequentially starting at the record to which the index
record points
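A minimal Python sketch of this lookup procedure, assuming a file of blocks sorted on the search key and a sparse index holding the least search-key value of each block (the data values are illustrative):

import bisect

# Assumed file: blocks of records sorted on the search key.
blocks = [
    [(10, "a"), (20, "b")],
    [(30, "c"), (40, "d")],
    [(50, "e"), (60, "f")],
]
# Sparse index: one entry per block, holding the block's least search-key value.
index_keys   = [10, 30, 50]
index_blocks = [0, 1, 2]

def lookup(k):
    # Find the index record with the largest search-key value <= k ...
    i = bisect.bisect_right(index_keys, k) - 1
    if i < 0:
        return None
    # ... then search the file sequentially starting at the block it points to.
    for b in range(index_blocks[i], len(blocks)):
        for key, data in blocks[b]:
            if key == k:
                return data
            if key > k:
                return None
    return None

print(lookup(40))   # 'd'
print(lookup(35))   # None (no such record)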
Sparse Index Files (Cont.)
⚫ Compared to dense indices:
⚫ Less space and less maintenance overhead for
insertions and deletions.
⚫ Generally slower than dense index for locating records.
⚫ Good tradeoff: sparse index with an index entry for
every block in file, corresponding to least
search-key value in the block.
Secondary Indices Example

Secondary index on salary field of instructor

⚫ Index record points to a bucket that contains pointers to all the
actual records with that particular search-key value.
⚫ Secondary indices have to be dense
Primary and Secondary Indices
⚫ Indices offer substantial benefits when searching for records.
⚫ BUT: Updating indices imposes overhead on database modification
--when a file is modified, every index on the file must be updated,
⚫ Sequential scan using primary index is efficient, but a sequential scan
using a secondary index is expensive
⚫ Each record access may fetch a new block from disk
⚫ Block fetch requires about 5 to 10 milliseconds, versus about
100 nanoseconds for memory access
Multilevel Index

⚫ If primary index does not fit in memory, access becomes expensive.


⚫ Solution: treat primary index kept on disk as a sequential file and
construct a sparse index on it.
⚫ outer index – a sparse index of primary index
⚫ inner index – the primary index file

⚫ If even outer index is too large to fit in main memory, yet another level of
index can be created, and so on.
⚫ Indices at all levels must be updated on insertion or deletion from the file.
Multilevel Index (Cont.)
Data Dictionary Storage
Data dictionary (also called system catalog) stores metadata; that is, data
about data, such as

■ Information about relations


● names of relations
● names and types of attributes of each relation
● names and definitions of views
● integrity constraints
■ User and accounting information, including passwords
■ Statistical and descriptive data
● number of tuples in each relation
■ Physical file organization information
● How relation is stored (sequential/hash/…)
● Physical location of relation
■ Information about indices
Data Dictionary Storage (Cont.)
■ Catalog structure
● Relational representation on disk
● specialized data structures designed for efficient
access, in memory

■ A possible catalog representation:
● Relation_metadata = (relation_name, number_of_attributes, storage_organization, location)
● Attribute_metadata = (attribute_name, relation_name, domain_type, position, length)
● User_metadata = (user_name, encrypted_password, group)
● Index_metadata = (index_name, relation_name, index_type, index_attributes)
● View_metadata = (view_name, definition)
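Real systems expose such a catalog as ordinary relations. For example, SQLite records every table, index and view definition in its sqlite_master catalog table, which can be queried like any other relation (a rough analogue of the metadata relations listed above):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
con.execute("CREATE INDEX idx_salary ON instructor(salary)")

# sqlite_master is SQLite's built-in catalog: one row per table, index or view.
for type_, name, sql in con.execute("SELECT type, name, sql FROM sqlite_master"):
    print(type_, name)     # e.g. 'table instructor', 'index idx_salary'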


TRANSACTION
Transaction
• A transaction can be defined as a group of tasks. A single task is the minimum
processing unit which cannot be divided further.
• One example is a transfer from one bank account to another:
the complete transaction requires subtracting the amount to be transferred from one
account and adding that same amount to the other.
E.g. transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
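A hedged sketch of the same transfer executed as one atomic unit using Python's sqlite3 module; the account table, column names and starting balances are assumptions made for the example:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 400)])
con.commit()

try:
    with con:   # one transaction: committed on success, rolled back on error
        con.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
        con.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
except sqlite3.Error:
    pass        # on failure neither update becomes visible (atomicity)

print(list(con.execute("SELECT * FROM account")))   # [('A', 450), ('B', 450)]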
Transaction
ACID PROPERTIES
• To preserve the integrity of data, the database system must ensure the following properties.
Atomicity
• Either all operations of the transaction are properly reflected in the database or
none. Atomicity is also known as the ‘All or nothing rule’.
It involves the following two operations.
• Abort: If a transaction aborts, changes made to database are not visible.
• Commit: If a transaction commits, changes made are visible.
Eg: Transaction to transfer $50 from account A to account B:
1.read(A)
2.A:= A –50
3.write(A)
4.read(B)
5.B:= B + 50
6.write(B)
Consistency
• Execution of a transaction in isolation preserves the consistency of the database.
• This means that integrity constraints must be maintained so that the database is
consistent before and after the transaction. It refers to the correctness of a
database.
Eg: Transaction to transfer $50 from account A (A=500) to account B (B=400):
1.read(A)
2.A:= A –50
3.write(A) -----------------🡪 A=450
4.read(B)
5.B:= B + 50
6.write(B) -----------------🡪B=450
• Consistency requirement –the sum of A and B is unchanged by the execution of
the transaction.
Isolation
• Every transaction is individual, and one transaction can’t access the result of other
transactions until those transactions have completed.
• Although multiple transactions may execute concurrently, each transaction must be
unaware of other concurrently executing transactions.
• If several transactions are executed concurrently, their operations may interleave in
some undesirable way, resulting in an inconsistent state.
• To avoid the problem of concurrent execution, transactions should be executed in
isolation(serially).
Isolation
• Example: If two operations are running concurrently on two different accounts,
the value of one should not affect the other. For instance, if account A makes transaction T1
to account B and transaction T2 to account C, both execute independently without
affecting each other. This is known as isolation.
Durability
• Once the transaction has completed, the changes it has made to the database
are permanent.
• Even if there is a system failure or any other abnormal event, the committed data
is safeguarded.
Transaction states
In a database, the transaction can be in one of the following states -
Transaction states
1.Active state
The active state is the first state of every transaction. In this state, the transaction is
being executed.
For example: inserting, deleting or updating a record happens here, but the
changes are not yet saved to the database.
2. Partially committed
In the partially committed state, a transaction executes its final operation, but the
data is still not saved to the database.
3. Committed
A transaction is said to be in a committed state if it executes all its operations
successfully. In this state, all the effects are now permanently saved on the database
system.
Transaction states
4. Failed state
If any of the checks made by the database recovery system fails, then the
transaction is said to be in the failed state.
For example, in a total-mark calculation, if the database is not able to run the
query that fetches the marks, then the transaction will fail to execute.
5.Aborted
If the transaction fails in the middle of its execution, then all the operations it has
already executed are rolled back, restoring the database to its consistent
state before the transaction began.
After aborting the transaction, the database recovery module will select one of
the two operations:
• Re-start the transaction- only if no internal logical error
• Kill the transaction
Concurrent Executions
• Multiple transactions are allowed to run concurrently in the system.
Advantages are:
• increased processor and disk utilization, leading to better transaction throughput
• E.g. one transaction can be using the CPU while another is reading from or
writing to the disk.
• reduced average response time for transactions: short transactions need not
wait behind long ones.

⚫ Concurrency control schemes – mechanisms to achieve isolation.


⚫ that is, to control the interaction among the concurrent transactions in order to prevent
them from destroying the consistency of the database
Schedule
• A sequence of instructions that specify the chronological order in which
instructions of concurrent transactions are executed.
• A schedule for a set of transactions must consist of all instructions of those
transactions.
• Must preserve the order in which the instructions appear in each individual
transaction.
Schedule 1
⚫ Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
⚫ A serial schedule in which T1 is followed by T2 :
Schedule 2
• A serial schedule where T2 is followed by T1
Schedule 3
• Let T1 and T2 be the transactions defined previously. The following schedule is not a serial
schedule, but it is equivalent to Schedule 1.
[Figure: Schedule 3 shown side by side with Schedule 1, the serial schedule in which T1 is followed by T2.]
In Schedules 1, 2 and 3, the sum A + B is preserved.
Schedule 4
• The following concurrent schedule does not preserve the value of (A + B ).
Serializability
• Basic Assumption – Each transaction preserves database consistency.
• Thus serial execution of a set of transactions preserves database
consistency.
• A schedule is serializable if it is equivalent to a serial schedule.
Different forms of schedule equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
Conflict serializability
• Conflict Serializable: A schedule is called conflict serializable if it can be
transformed into a serial schedule by swapping non-conflicting operations.
• Conflict Equivalent: Two schedules are said to be conflict equivalent when one
can be transformed to another by swapping non-conflicting operations.
• Conflicting operations: Two operations are said to be conflicting if all of the
following conditions hold:
• They belong to different transactions
• They operate on the same data item
• At Least one of them is a write operation

1. li = read(Q), lj = read(Q). li and lj do not conflict.


2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
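The conflict rules above lead directly to the standard precedence-graph test: draw an edge Ti -> Tj for every pair of conflicting operations in which Ti's operation comes first; the schedule is conflict serializable iff the graph is acyclic. A minimal Python sketch follows (the schedule encoding is an illustrative assumption):

# A schedule is modelled as a list of (transaction, operation, data_item) triples.
def conflict_serializable(schedule):
    txns, edges = {t for t, _, _ in schedule}, set()
    for i, (ti, op_i, q_i) in enumerate(schedule):
        for tj, op_j, q_j in schedule[i + 1:]:
            # Conflicting: different transactions, same item, at least one write.
            if ti != tj and q_i == q_j and "write" in (op_i, op_j):
                edges.add((ti, tj))              # ti must precede tj
    visited, on_stack = set(), set()
    def has_cycle(node):
        visited.add(node); on_stack.add(node)
        for a, b in edges:
            if a == node and (b in on_stack or (b not in visited and has_cycle(b))):
                return True
        on_stack.discard(node)
        return False
    # Conflict serializable iff the precedence graph has no cycle.
    return not any(has_cycle(t) for t in txns if t not in visited)

# Interleaving in the spirit of Schedule 3: T1 then T2 on A, then T1 then T2 on B.
s3 = [("T1", "read", "A"), ("T1", "write", "A"), ("T2", "read", "A"), ("T2", "write", "A"),
      ("T1", "read", "B"), ("T1", "write", "B"), ("T2", "read", "B"), ("T2", "write", "B")]
print(conflict_serializable(s3))   # True: equivalent to the serial order T1, T2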
Conflict serializability
• Schedule 5 can be transformed into Schedule 6, a serial schedule
where T2 follows T1, by series of swaps of non-conflicting
instructions. Therefore Schedule 5 is conflict serializable. Then S5
and S6 are conflict equivalent.

Schedule 5
Schedule 6
Conflict Equivalent
View Serializability
• A schedule is view serializable if it is view equivalent to a serial schedule.
• If a schedule is conflict serializable, then it is also view serializable.
• A schedule that is view serializable but not conflict serializable contains blind writes. A blind write is simply
when a transaction writes without reading.
• Such a transaction has a write(Q) with no read(Q) before it, so the transaction writes to the database
"blindly" without reading the previous value.
View Equivalent
• Two schedules S1 and S2 are said to be view equivalent if they satisfy the following conditions:
1. Initial Read
• The initial read of each data item must be the same in both schedules. In schedule S1, if a
transaction T1 reads the initial value of data item A, then in S2, transaction T1 should also read the initial value of A.
2. Updated Read
• If in S1 a transaction T2 reads a value of A written by T1, then in S2, T2 should also read the value of A written by T1.
3. Final Write
• The final write on each data item must be performed by the same transaction in both schedules.
View Serializability
View Serializability
Recoverability of Schedule
What is recoverability?
• Sometimes a transaction may not execute completely due to a software
issue, system crash or hardware failure. In that case, the failed
transaction has to be rolled back. But some other transaction may also
have used the value produced by the failed transaction.
What is non recoverable schedule?
• A non-recoverable schedule means that when there is a system failure,
we may not be able to recover to a consistent database state.
A cascading rollback occurs in database systems when a transaction (T1) causes a failure
and a rollback must be performed. Other transactions dependent on T1's actions must also be
rolled back due to T1's failure, thus causing a cascading effect. That is, one transaction's
failure causes many to fail.
Concurrency Control
• With concurrency control, multiple transactions can be executed
simultaneously.
• Uncontrolled interleaving may affect the transaction results, so it is highly important
to maintain the order of execution of those transactions.
Problems of concurrency control
Several problems can occur when concurrent transactions are executed in an
uncontrolled manner. Following are the three problems in concurrency control.
• Lost updates
• Dirty read
• Unrepeatable /Nonrepeatable read
lost update problem
• In the lost update problem, an update done to a data item by one transaction is lost because it is
overwritten by an update done by another transaction.
• For example, if two transactions both read a balance of 12 and then subtract 3 and 2 respectively,
the later write overwrites the earlier one; the stored result is incorrect, since the correct result is 12-3-2 = 7.
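A small illustration of the interleaving behind this example, written as plain Python rather than real transactions: both transactions read the same old balance, so the second write overwrites the first:

balance = 12                # shared data item

t1_read = balance           # T1 reads 12
t2_read = balance           # T2 reads 12, before T1 has written

balance = t1_read - 3       # T1 writes 9
balance = t2_read - 2       # T2 writes 10, overwriting T1's update (the lost update)

print(balance)              # 10, although the correct result is 12 - 3 - 2 = 7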
Dirty read
• A Dirty read is the situation when a transaction reads a data that has not yet
been committed.
• For example, Let’s say transaction 1 updates a row and leaves it
uncommitted, meanwhile, Transaction 2 reads the updated row. If transaction
1 rolls back the change, transaction 2 will have read data that is considered
never to have existed.
Unrepeatable /Non repeatable read
• Non-repeatable read occurs when a transaction reads the same row twice and gets a different value
each time.
• For example, suppose transaction T1 reads data. Due to concurrency, another transaction T2
updates the same data and commits. Now if transaction T1 rereads the same data, it will retrieve a
different value.
Concurrency Control Protocol
• Concurrency control protocols ensure atomicity, isolation, and serializability of
concurrent transactions.
• The concurrency control protocol can be divided into three categories:
• Lock based protocol
• Time-stamp protocol
• Validation based protocol
Lock-Based Protocol
• In this type of protocol, a transaction cannot read or write a data item until it acquires an appropriate
lock on it.
There are two types of lock:
1. Shared lock:
• It is also known as a Read-only lock. In a shared lock, the data item can only be read by the
transaction.
• It can be shared among transactions because a transaction holding a shared lock cannot
update the data item.
2. Exclusive lock:
• In an exclusive lock, the data item can be both read and written by the transaction.
• The lock is exclusive: only one transaction can hold it at a time, so multiple transactions cannot
modify the same data item simultaneously.
Lock-Based Protocol
Types of lock protocols available:
1. Simplistic lock protocol
• It is the simplest way of locking data during a transaction.
• Simplistic lock-based protocols require every transaction to obtain a lock
on the data before an insert, delete or update on it.
• The data item is unlocked after the transaction completes.
Lock-Based Protocol
2. Pre-claiming Lock Protocol
• Pre-claiming lock protocols evaluate the transaction to list all the data items on which it needs
locks.
• Before initiating execution of the transaction, it requests the DBMS for locks on all those
data items.
• If all the locks are granted, then this protocol allows the transaction to begin. When the
transaction is completed, it releases all the locks.
• If all the locks are not granted, then the transaction rolls back and waits
until all the locks are granted.
Lock-Based Protocol
3. Two-phase locking (2PL)
• The two-phase locking protocol divides the execution phase of the transaction into three parts.
• In the first part, when the execution of the transaction starts, it seeks permission for the lock it
requires.
• In the second part, the transaction acquires all the locks.
• The third phase is started as soon as the transaction releases its first lock.
• In the third phase, the transaction cannot demand any new locks. It only releases the acquired
locks.
Lock-Based Protocol
There are two phases of 2PL:
• Growing phase: In the growing phase, a new lock on the data item may be acquired by the transaction, but
none can be released.
• Shrinking phase: In the shrinking phase, existing lock held by the transaction may be released, but no new
locks can be acquired.
• LOCK POINT: The Point at which the growing phase ends, i.e., when a transaction takes the final lock it
needs to carry on its work.
• 2-PL ensures serializability
Drawbacks of 2-PL:
• Cascading Rollback is possible under 2-PL.
• Deadlocks and Starvation are possible.
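A rough Python sketch of the 2-PL discipline for a single transaction: every lock acquisition (growing phase) must precede the first release, after which no new lock may be requested (shrinking phase). The lock representation is an illustrative assumption, not a full lock manager:

class TwoPhaseTxn:
    """Tracks the locks of one transaction and enforces the 2-PL rule."""
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False          # becomes True after the first unlock

    def lock(self, item, mode):         # growing phase only
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        self.held.add((item, mode))

    def unlock(self, item, mode):       # entering / within the shrinking phase
        self.shrinking = True
        self.held.discard((item, mode))

t1 = TwoPhaseTxn("T1")
t1.lock("A", "X"); t1.lock("B", "X")    # growing phase; lock point reached here
t1.unlock("A", "X")                     # shrinking phase begins
# t1.lock("C", "S")                     # would raise: not allowed under 2-PL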
Strict 2-PL:
• In Strict 2-PL, all exclusive (X) locks held by the transaction are not released until after the transaction
commits.
Following Strict 2-PL ensures that our schedule is:
• Recoverable
• Cascadeless
Hence, it gives us freedom from Cascading Abort but still, Deadlocks are possible!
Rigorous 2-PL
• In Rigorous 2-PL, all exclusive (X) and shared (S) locks held by the transaction are not released
until after the transaction commits.
Following Rigorous 2-PL ensures that our schedule is:
• Recoverable
• Cascadeless
• Hence, it gives us freedom from Cascading Abort but still, Deadlocks are possible!
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
