IT_CS-4002
Banking
We make thousands of transactions through banks daily, and we can do this without visiting a
bank. Banking has become this easy only because a DBMS manages all the bank transactions,
letting us send and receive money while sitting at home.
Universities and colleges
Examinations are conducted online today, and universities and colleges maintain all these
records through a DBMS. Students' registration details, results, courses, and grades are all
stored in the database.
Credit card transactions
Credit card purchases and all other card transactions are made possible only by a DBMS. Every
credit cardholder values their information, and all of it is secured through a DBMS.
Social Media Sites
We all use social media websites to share our views and connect with our friends. Millions of
users sign up daily for social media accounts on platforms like Facebook, Twitter, Pinterest,
and WhatsApp. All of this user information is stored, and we are able to connect with other
people, because of a DBMS.
Telecommunications
No telecommunication company can even think about its business without a DBMS. A DBMS is a
must for these companies to store call details and monthly post-paid bills.
Finance
The days when information related to money was stored in registers and files are long gone.
Times have changed completely, because finance now involves many tasks such as storing sales
data, holding information, and managing financial statements.
Military
The military keeps records of millions of soldiers, in millions of files that must be kept
secure and safe. Because a DBMS provides strong security assurances for military information,
it is widely used by the armed forces. With a DBMS, all the information about anyone can be
found within seconds.
Online Shopping
Online shopping has become a big trend these days. No one wants to go to the shops and waste
time; everyone wants to shop from home. All these products are listed and sold only with the
help of a DBMS, and purchase information, invoices, and payments are likewise handled through
it.
Human Resource Management
Big firms have many employees working under them. The human resource management department
keeps records of each employee's salary, tax, and work through a DBMS.
Manufacturing
Manufacturing companies make and sell products on a daily basis. A DBMS is used to keep
records of product details such as quantity, bills, purchases, and the supply chain.
Airline Reservation system
Like a railway reservation system, an airline also needs a DBMS to keep records of flight
arrivals, departures, and delay status.
Agriculture
A DBMS can be used to maintain records of crops, machinery, fertilizers, and livestock.
Every item used in agriculture can be tracked with a DBMS.
Characteristics of DBMS
It uses a digital repository established on a server to store and manage information.
It can provide a clear and logical view of the processes that manipulate data.
A DBMS contains automatic backup and recovery procedures.
It supports the ACID properties, which keep data in a consistent state in case of failure.
It can reduce complex relationships between data.
It is used to support the manipulation and processing of data.
It is used to provide security of data.
It can present the database from different viewpoints according to the requirements of the user.
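The ACID guarantee mentioned above can be demonstrated concretely. The sketch below uses Python's built-in sqlite3 module (an embedded DBMS); the account table and the simulated failure are illustrative assumptions, not part of any real banking system.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 100)")
conn.commit()

# Atomicity: a transfer either applies completely or not at all.
try:
    with conn:  # opens a transaction; rolls back automatically on error
        conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
        # Simulated crash between the two halves of the transfer.
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass

# The debit does not survive: the database is back in its pre-transfer state.
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 100, 'B': 100}
```

Because the failure occurred inside an open transaction, the DBMS undid the partial update automatically, which is exactly the "healthy state in case of failure" the characteristic describes.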
Advantages of DBMS
Controls database redundancy: It controls data redundancy by storing all the data in a
single database file, with each piece of data recorded only once.
Data sharing: In a DBMS, the authorized users of an organization can share data among
themselves.
Easy maintenance: The system is easy to maintain because of the centralized nature of the
database.
Reduced time: It reduces development time and maintenance effort.
Backup: It provides backup and recovery subsystems that automatically back up data against
hardware and software failures and restore the data when required.
Multiple user interfaces: It provides different types of user interfaces, such as graphical
user interfaces and application program interfaces.
Disadvantages of DBMS
Cost of Hardware and Software: Running DBMS software requires a high-speed processor and a
large amount of memory.
Size: It occupies a large amount of disk space and memory to run efficiently.
Complexity: A database system introduces additional complexity and requirements.
Higher impact of failure: A failure has a high impact because, in most organizations, all
data is stored in a single database; if that database is damaged by a power failure or
corruption, the data may be lost forever.
Database System v/s File System
Redundancy of data: Data is said to be redundant if the same data is copied in many places.
If a student wants to change a phone number, it has to be updated in every section.
Similarly, old records must be deleted from every section holding that student's data.
Inconsistency of Data: Data is said to be inconsistent if multiple copies of the same data
do not match. If the phone number differs between the Accounts section and the Academics
section, the data is inconsistent. Inconsistency may be caused by typing errors or by
failing to update every copy of the same data.
Difficult Data Access: A user must know the exact location of a file to access its data,
which makes the process cumbersome and tedious. Imagine how difficult it would be to find
one student's hostel allotment number among 10,000 unsorted student records.
Unauthorized Access: A file system may allow unauthorized access to data. If a student gains
access to the file holding his marks, he can change them in an unauthorized way.
No Concurrent Access: Access to the same data by multiple users at the same time is known as
concurrency. A file system does not allow concurrency, as the data can be accessed by only
one user at a time.
The differences between a DBMS and a file system are summarized in the comparison table that
follows the components overview.
Components of DBMS
Hardware, Software, Data, Database Access Language, Procedures and Users all together form the
components of a DBMS.
Let us discuss the components one by one clearly.
Hardware
The hardware is the actual computer system used for keeping and accessing the database. The
conventional DBMS hardware consists of secondary storage devices such as hard disks.
Databases run on machines ranging from microcomputers to mainframes.
Software
Software is the actual DBMS between the physical database and the users of the system. All the
requests from the user for accessing the database are handled by DBMS.
Data
It is an important component of the database management system. The main task of a DBMS is
to process data: databases are used to store data, and data is retrieved from and updated in
them as needed.
Users
There are a number of users who can access or retrieve the data on demand using the application
and the interfaces provided by the DBMS.
The users of the database can be classified into different groups −
Naive Users
Online Users
Sophisticated Users
Specialized Users
Application Users
DBA- Database Administrator
[Diagram: the components of a DBMS in pictorial form]
Basis: DBMS Approach vs File System Approach

Meaning:
  DBMS: a collection of data in which the user is not required to write the procedures for
  managing it.
  File system: a collection of data in which the user has to write the procedures for
  managing the data.
Sharing of data:
  DBMS: data sharing is easy due to the centralized approach.
  File system: data is distributed across many files, possibly in different formats, so it
  is not easy to share data.
Data Abstraction:
  DBMS: gives an abstract view of data that hides the details.
  File system: exposes the details of data representation and storage.
Security and Protection:
  DBMS: provides a good protection mechanism.
  File system: it is not easy to protect a file.
Recovery Mechanism:
  DBMS: provides a crash-recovery mechanism, i.e., it protects the user from system failure.
  File system: has no crash-recovery mechanism; if the system crashes while data is being
  entered, the file's contents may be lost.
Manipulation Techniques:
  DBMS: contains a wide variety of sophisticated techniques to store and retrieve data.
  File system: cannot store and retrieve data efficiently.
Concurrency Problems:
  DBMS: handles concurrent access to data using some form of locking.
  File system: concurrent access causes many problems, such as one user modifying a file
  while another is deleting or updating information in it.
Where to use:
  DBMS: used in large systems that interrelate many files.
  File system: used in small systems with few interrelated files.
Cost:
  DBMS: expensive to design.
  File system: cheaper to design.
Data Redundancy and Inconsistency:
  DBMS: centralization of the database controls the problems of redundancy and inconsistency.
  File system: files and application programs are created by different programmers, so much
  duplication of data exists, which may lead to inconsistency.
Structure:
  DBMS: the database structure is complex to design.
  File system: has a simple structure.
Data Independence:
  DBMS: data independence exists, and it can be of two types (logical and physical).
  File system: no data independence exists.
Integrity Constraints:
  DBMS: integrity constraints are easy to apply.
  File system: integrity constraints are difficult to implement.
Data Models:
  DBMS: data models exist (e.g., hierarchical, network, and relational).
  File system: there is no concept of data models.
Flexibility:
  DBMS: changes to the content of stored data are often necessary, and such changes are made
  more easily with a database approach.
  File system: flexibility is lower compared with the DBMS approach.
Data Abstraction
Data abstraction is the process of hiding unwanted or irrelevant details from the end user.
It provides different views of the data and helps achieve data independence, which in turn
enhances the security of the data.
The database systems consist of complicated data structures and relations. For users to access the
data easily, these complications are kept hidden, and only the relevant part of the database is made
accessible to the users through data abstraction.
Levels of abstraction for DBMS
Database systems include complex data structures. To simplify data retrieval, make the
system easier to use, and keep it efficient, developers use levels of abstraction that hide
irrelevant details from users. Levels of abstraction also simplify database design.
Mainly there are three levels of abstraction for DBMS, which are as follows −
Physical or Internal Level
Logical or Conceptual Level
View or External Level
[Diagram: the three levels of abstraction]
Physical or Internal Level
It is the lowest level of abstraction in a DBMS. It defines how the data is actually stored:
the data structures used to hold the data and the access methods used by the database.
Developers and database application programmers decide at this level how to store the data
in the database. Overall, the entire database is described at this physical, or internal,
level, which is the most complex level to understand. For example, a customer's information
is stored in tables whose data physically resides in blocks of storage measured in bytes,
gigabytes, and so on.
Logical or Conceptual Level
The logical level is the intermediate, next-higher level. It describes what data is stored
in the database and what relationships exist among those data. It aims to describe the
entire database: what tables are to be created and what links exist among those tables.
It is less complex than the physical level. The logical level is used by developers and
database administrators (DBAs). Overall, it contains the tables (fields and attributes)
and the relationships among table attributes.
View or External Level
It is the highest level. At the view level there are different views, and each view defines
only a part of the entire data. It simplifies interaction with the user by providing many
views of the same database.
The view level can be used by all users and is the least complex and easiest to understand.
For example, a user interacting with the system through a GUI enters details on the screen
without knowing how the data is stored or what data is stored; those details are hidden
from the user.
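A view is the usual way a DBMS exposes the external level. The sketch below, using Python's built-in sqlite3 module with an illustrative student table, shows a view that hides the phone column from its users while leaving the underlying storage untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE student (
    roll_no INTEGER PRIMARY KEY, name TEXT, phone TEXT, grade TEXT)""")
conn.execute(
    "INSERT INTO student VALUES (1, 'Asha', '99999', 'A'), (2, 'Ravi', '88888', 'B')")

# External level: a view exposes only the columns this class of user needs,
# hiding phone numbers and the physical representation entirely.
conn.execute("CREATE VIEW result_view AS SELECT roll_no, name, grade FROM student")

rows = conn.execute("SELECT * FROM result_view").fetchall()
print(rows)  # [(1, 'Asha', 'A'), (2, 'Ravi', 'B')]
```

Different views over the same base table give different user groups their own external schemas, which is exactly the "many views of the same database" idea above.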
Data Independence
• Logical Data Independence
The capacity to change the conceptual schema without having to change the external schemas and
their associated application programs.
• Physical Data Independence
➢ The capacity to change the internal schema without having to change the conceptual schema.
➢ For example, the internal schema may be changed when certain file structures are reorganized
or new indexes are created to improve database performance.
➢ When a schema at a lower level is changed, only the mappings between this schema and higher
level schemas need to be changed in a DBMS that fully supports data independence.
➢ The higher-level schemas themselves are unchanged.
▪ Hence, the application programs need not be changed since they
refer to the external schemas.
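Physical data independence can be illustrated with an index: the internal schema changes, but the application's query and its result do not. A minimal sketch using Python's built-in sqlite3 module (the table and data are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(i, "D" + str(i % 3), 1000 + i) for i in range(300)])

query = "SELECT count(*) FROM employee WHERE dept = 'D1'"
before = conn.execute(query).fetchone()[0]

# Change the internal schema: add an index to speed up lookups on dept.
conn.execute("CREATE INDEX idx_dept ON employee(dept)")

# The conceptual schema and the application's query are unchanged, and the
# answer is identical; only the access path inside the DBMS differs.
after = conn.execute(query).fetchone()[0]
print(before == after)  # True
```

This mirrors the bullet above: reorganizing file structures or creating new indexes improves performance without touching the conceptual schema or the application programs.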
DBMS Architecture
The DBMS design depends upon its architecture. The basic client/server architecture is
used to deal with a large number of PCs, web servers, database servers and other
components that are connected with networks.
The client/server architecture consists of many PCs and workstations connected via a
network.
The DBMS architecture depends upon how users are connected to the database to get their
requests served.
Types of DBMS Architecture
Database architecture can be seen as single-tier or multi-tier. Logically, however, it is
usually described as one of two types: 2-tier architecture and 3-tier architecture.
1-Tier Architecture
In this architecture, the database is directly available to the user: the user sits
directly on the DBMS and uses it.
Any change made here is applied directly to the database itself. This architecture does not
provide a handy tool for end users.
The 1-tier architecture is used for developing local applications, where programmers can
communicate directly with the database for a quick response.
2-Tier Architecture
The 2-tier architecture is the same as a basic client-server setup. In the two-tier
architecture, applications on the client end communicate directly with the database on the
server side. APIs such as ODBC and JDBC are used for this interaction.
The user interfaces and application programs run on the client side.
The server side is responsible for providing functionality such as query processing and
transaction management.
To communicate with the DBMS, the client-side application establishes a connection with the
server side.
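The client side of a 2-tier setup talks to the server only through a database API. In the hedged sketch below, Python's DB-API (via the built-in sqlite3 module) stands in for ODBC/JDBC, and an in-memory database stands in for a remote database server, so the example stays self-contained.

```python
import sqlite3

# Client side: the application talks to the DBMS only through a standard
# database API (Python's DB-API here, standing in for ODBC/JDBC).
def open_connection():
    # In a real 2-tier deployment this would name a remote database server;
    # an in-memory SQLite database is used so the sketch is self-contained.
    return sqlite3.connect(":memory:")

conn = open_connection()
cur = conn.cursor()

# Server-side responsibilities (query processing, transaction management)
# stay hidden behind the connection: the client just submits SQL.
cur.execute("CREATE TABLE item (name TEXT, price INTEGER)")
cur.execute("INSERT INTO item VALUES ('pen', 10)")
conn.commit()
print(cur.execute("SELECT price FROM item WHERE name = 'pen'").fetchone()[0])  # 10
```

Swapping the connection call for a driver pointing at a real server would leave the rest of the client code unchanged, which is the point of the standard API layer.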
Entity − An entity in an ER Model is a real-world entity having properties called attributes.
Every attribute is defined by its set of values called domain. For example, in a school
database, a student is considered as an entity. Student has various attributes like name, age,
class, etc.
Relationship − The logical association among entities is called a relationship.
Relationships are mapped with entities in various ways. Mapping cardinalities define the
number of associations between two entities.
Mapping cardinalities −
o one to one
o one to many
o many to one
o many to many
Relational Model
The most popular data model in DBMS is the relational model; it is more scientific than the
others. This model is based on first-order predicate logic and defines a table as an n-ary
relation.
In normalized relations, the stored values are atomic.
Each row in a relation is unique.
Each column in a relation contains values from the same domain.
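The properties above can be checked directly. A minimal sketch with Python's built-in sqlite3 module (the student table is an illustrative assumption) shows that a relation rejects duplicate tuples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A relation is an n-ary table: every row (tuple) is unique and every
# column draws its values from one domain.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Asha')")

# Rows are unique: inserting a second tuple with the same key fails.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Ravi')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```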
Database language
Database languages are the sets of statements used to define and manipulate a database. A
database language has a Data Definition Language (DDL), used to construct the database, and
a Data Manipulation Language (DML), used to access it.
Database languages provide the tools to implement and manipulate a database. A database
language is comprised of two languages:
1. Data Definition Language (DDL)
2. Data Manipulation Language (DML)
DDL and DML are not two distinct languages but they together form a database language.
2. Data Manipulation Language (DML)
Data Manipulation Language has a set of statements that allows users to access and manipulate
the data in the database. Using DML statements user can retrieve, insert, delete or modify the
information in the database.
The Data Manipulation Languages are further of two types, procedural and non-procedural
languages:
(i) Procedural DMLs:
Procedural DMLs are considered low-level languages; they specify both what data is needed
and how to obtain it. Procedural DMLs are also called one-at-a-time DMLs, as they retrieve
and process each record separately.
(ii) Non-Procedural DMLs:
Non-procedural DMLs are high-level languages; they specify precisely what data is required
without specifying how to access it. Non-procedural DMLs are also called set-at-a-time
DMLs, because a single non-procedural DML command can retrieve several records.
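The two DML styles can be contrasted side by side. In the sketch below (Python's built-in sqlite3 module, with an illustrative marks table), one declarative SELECT plays the non-procedural, set-at-a-time role, while an explicit record-by-record loop plays the procedural, one-at-a-time role:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (student TEXT, score INTEGER)")
conn.executemany("INSERT INTO marks VALUES (?, ?)",
                 [("A", 70), ("B", 40), ("C", 90)])

# Non-procedural (set-at-a-time): state WHAT is wanted; one statement
# returns every qualifying record.
passed_sql = [r[0] for r in
              conn.execute("SELECT student FROM marks WHERE score >= 50")]

# Procedural (one-at-a-time) style: fetch each record and test it
# ourselves, spelling out HOW the data is obtained.
passed_loop = []
for student, score in conn.execute("SELECT student, score FROM marks"):
    if score >= 50:
        passed_loop.append(student)

print(sorted(passed_sql) == sorted(passed_loop))  # True
```

Both approaches produce the same answer; the difference is only in who decides the access strategy, the DBMS or the programmer.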
Overall Database Structure
Database Users:
Users are differentiated by the way they expect to interact with the system:
Application programmers:
o Application programmers are computer professionals who write application
programs. Application programmers can choose from many tools to develop user
interfaces.
o Rapid application development (RAD) tools are tools that enable an application
programmer to construct forms and reports without writing a program.
Sophisticated users:
o Sophisticated users interact with the system without writing programs. Instead,
they form their requests in a database query language.
o They submit each such query to a query processor, whose function is to break
down DML statements into instructions that the storage manager understands.
Specialized users :
o Specialized users are sophisticated users who write specialized database
applications that do not fit into the traditional data-processing framework.
o Among these applications are computer-aided design systems, knowledge base
and expert systems, systems that store data with complex data types (for example,
graphics data and audio data), and environment-modeling systems.
Naïve users :
o Naive users are unsophisticated users who interact with the system by invoking
one of the application programs that have been written previously.
o For example, a bank teller who needs to transfer $50 from account A to account B
invokes a program called transfer. This program asks the teller for the amount of
money to be transferred, the account from which the money is to be transferred,
and the account to which the money is to be transferred.
Database Administrator:
Coordinates all the activities of the database system. The database administrator has a
good understanding of the enterprise’s information resources and needs.
Database administrator's duties include:
o Schema definition: The DBA creates the original database schema by executing a
set of data definition statements in the DDL.
o Storage structure and access method definition.
o Schema and physical organization modification: The DBA carries out changes to
the schema and physical organization to reflect the changing needs of the
organization, or to alter the physical organization to improve performance.
o Granting user authority to access the database: By granting different types of
authorization, the database administrator can regulate which parts of the database
various users can access.
o Specifying integrity constraints.
o Monitoring performance and responding to changes in requirements.
Query Processor:
The query processor accepts queries from users and answers them by accessing the database.
Parts of Query processor:
DDL interpreter
This interprets DDL statements and records the definitions in the data dictionary.
DML compiler
a. This translates DML statements in a query language into low-level instructions that the
query evaluation engine understands.
b. A query can usually be translated into any of a number of alternative evaluation plans
that give the same result; the DML compiler selects the best plan, a step known as query
optimization.
Query evaluation engine
This engine executes the low-level instructions generated by the DML compiler on the DBMS.
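SQLite makes the DML compiler's plan selection visible through its EXPLAIN QUERY PLAN statement. A small sketch (the table and index names are illustrative; the exact plan text varies between SQLite versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT)")
conn.execute("CREATE INDEX idx_emp_dept ON emp(dept)")

# Ask the query processor which evaluation plan it chose for this query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM emp WHERE dept = 'Sales'").fetchall()
for row in plan:
    print(row[-1])  # e.g. SEARCH emp USING INDEX idx_emp_dept (dept=?)
```

With the index available, the planner picks an index search over a full table scan, which is exactly the kind of choice among alternative evaluation plans described above.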
Storage Manager/Storage Management:
A storage manager is a program module that acts as an interface between the data stored
in the database and the application programs and queries submitted to the system.
Thus, the storage manager is responsible for storing, retrieving, and updating data in the
database.
The storage manager components include:
o Authorization and integrity manager: Checks for integrity constraints and
authority of users to access data.
o Transaction manager: Ensures that the database remains in a consistent state
although there are system failures.
o File manager: Manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
o Buffer manager: It is responsible for retrieving data from disk storage into main
memory. It enables the database to handle data sizes that are much larger than the
size of main memory.
The storage manager also implements the following data structures:
o Data files: store the database itself.
o Data dictionary: stores metadata about the structure of the database.
o Indices: provide fast access to data items.
Unit -II
Data Modeling using the Entity-Relationship Model
The entity-relationship model is a model used for designing and representing relationships
between data. An entity-relationship model (ER model) describes the structure of a database
with the help of a diagram known as an entity-relationship diagram (ER diagram). An ER model
is a design or blueprint of a database that can later be implemented as a database. The main
components of the E-R model are:
1. Entity
2. Attribute
3. Relationships
Entity
An entity may be any object, class, person, or place. In an ER diagram, an entity is
represented as a rectangle.
Example: the two entities Student and College have a many-to-one relationship, as many
students study in a single college.
Attributes
Simple attribute: simple attributes are those that cannot be divided further. For
example, a student's age.
Composite attribute: a composite attribute is made up of more than one simple attribute. For
example, a student's address contains a house number, street name, pincode, etc.
Here, the attributes “Name” and “Address” are composite attributes as they are composed of many
other simple attributes.
Derived attribute: these attributes are not stored in the database but are derived from
other attributes. For example, the average age of students in a class.
Single-valued attribute: as the name suggests, these have a single value. In the example
below, all the attributes are single-valued, as each can take only one value per entity.
Multi-valued attribute: these can have multiple values. In the example below, the attributes
"Mob_no" and "Email_id" are multi-valued, as they can take more than one value for a
given entity.
Strong entity set V/S Weak entity set
Relationships :
When one entity is related to another entity, they are said to have a relationship. For
example, a Class entity is related to a Student entity, because students study in classes.
Depending upon the number of entities involved, a degree is assigned to the relationship:
if 2 entities are involved it is a binary relationship, if 3 entities are involved it is a
ternary relationship, and so on.
Mapping Constraints
One to One: An entity of entity-set A can be associated with at most one entity of entity-set B and
an entity in entity-set B can be associated with at most one entity of entity-set A.
One to Many: An entity of entity-set A can be associated with any number of entities of entity-
set B and an entity in entity-set B can be associated with at most one entity of entity-set A.
Many to One: An entity of entity-set A can be associated with at most one entity of entity-set B
and an entity in entity-set B can be associated with any number of entities of entity-set A.
Many to Many: An entity of entity-set A can be associated with any number of entities of entity-
set B and an entity in entity-set B can be associated with any number of entities of entity-set A.
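A one-to-many mapping is typically implemented with a foreign key on the "many" side. A minimal sketch with Python's built-in sqlite3 module (the college/student tables are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE college (id INTEGER PRIMARY KEY, name TEXT)")
# One-to-many: each student row references exactly one college, while a
# college id may appear in any number of student rows.
conn.execute("""CREATE TABLE student (
    roll INTEGER PRIMARY KEY,
    name TEXT,
    college_id INTEGER REFERENCES college(id))""")

conn.execute("INSERT INTO college VALUES (1, 'City College')")
conn.execute("INSERT INTO student VALUES (10, 'Asha', 1), (11, 'Ravi', 1)")

count = conn.execute(
    "SELECT count(*) FROM student WHERE college_id = 1").fetchone()[0]
print(count)  # 2
```

Here two students share one college, but each student names only one college, which is exactly the many-to-one view from the student side.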
Mapping Cardinality
Mapping cardinality describes the maximum number of entities that a given entity can be
associated with via a relationship. In this section, we consider only the cardinality
constraints for binary relationships. The possible cardinalities for binary relationship
types are one-to-one (1:1), one-to-many (1:N), and many-to-many (M:N).
Participation
The participation constraint specifies whether the existence of an entity depends on its
being related to another entity via the relationship type.
Participation in a relationship set R by an entity set A may be
Total: every entity a in A participates in at least one relationship in R.
Partial: only some entities a in A participate in relationships in R.
Example: every project has at least one employee joined to it, but not every employee in
the company joins a project.
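Total participation on one side can be enforced with a NOT NULL foreign key. The sketch below (Python's built-in sqlite3 module, illustrative tables) makes EMPLOYEE's participation in the project relationship total; note that the converse constraint from the example (every project has at least one employee) is harder to enforce declaratively and is not shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY)")
# Total participation of employee in the relationship: the NOT NULL
# foreign key means an employee row cannot exist without a project.
conn.execute("""CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id))""")

conn.execute("INSERT INTO project VALUES (1)")
conn.execute("INSERT INTO employee VALUES (100, 1)")  # has a project: accepted

try:
    conn.execute("INSERT INTO employee VALUES (101, NULL)")  # no project
    allowed = True
except sqlite3.IntegrityError:
    allowed = False
print(allowed)  # False
```

Dropping the NOT NULL would turn this into partial participation: an employee could then exist without joining any project.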
Participation in relationship notation
Degree of Relationship Type
The degree of a relationship type is the number of participating entity types.
A unary (recursive) relationship type involves only one entity type; the same entity type
participates in the relationship in different roles. For example, the figure below shows
the Supervise relationship type, which relates an Employee and a Supervisor who is also an
employee: one employee plays the role of supervisor, another the role of supervisee.
Binary relationship type: this relationship type links two entity types together and is the
most common kind of relationship.
For example, the "Joins in" relationship between EMPLOYEE and PROJECT.
Ternary relationship type: if three entity types are linked together, the relationship is
called a ternary relationship.
For example: the Supply relationship associates a SUPPLIER, a PART, and a PROJECT.
E-R diagram notations
Weak Entity: an entity that cannot be identified solely by its attributes (due to the
absence of a primary key). It inherits the identifier of its parent entity, often
combined with a partial key.
Key Attribute: a special attribute used to uniquely identify an entity. It is represented
by an oval with its name underlined.
Optional Participation: the entities do not have mandatory participation in the
relationship; it is represented by a dotted line.
Multi-valued Attribute: an attribute that can have multiple values (for example, more than
one mobile number for the same entity); it is represented by a double oval.
Partial Participation: not all the entities in the set are part of the relationship; it is
depicted by a single line.
Total Participation: all the entities in the set are in the relationship; it is depicted
by a double line.
Keys
A key in DBMS is an attribute or a set of attributes that help to uniquely identify a tuple (or row)
in a relation (or table). Keys are also used to establish relationships between the different tables
and columns of a relational database. Individual values in a key are called key values.
Keys play an important role in a relational database. They are used to uniquely identify
any record or row of data in a table, and to establish and identify relationships between
tables.
For example, ID is used as a key in the Student table because it is unique for each student. In the
PERSON table, passport_number, license_number, SSN are keys since they are unique for each
person.
Types of keys:
1. Primary key
It is the key chosen to identify one and only one instance of an entity uniquely. An entity
can contain multiple keys, as we saw in the PERSON table; the most suitable key from that
list becomes the primary key.
In the EMPLOYEE table, ID can be the primary key since it is unique for each employee.
We could even select License_Number or Passport_Number as the primary key, since they
are also unique.
For each entity, the primary key is selected based on requirements and by the developers.
2. Candidate key
A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
Apart from the one chosen as the primary key, the remaining candidate keys are just as
strong as the primary key.
For example: in the EMPLOYEE table, id is best suited for the primary key. The other
unique attributes, like SSN, Passport_Number, and License_Number, are candidate keys.
3. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a
candidate key.
5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely identify each
tuple in a relation. These attributes or combinations of the attributes are called the candidate keys.
One key is chosen as the primary key from these candidate keys, and the remaining candidate key,
if it exists, is termed the alternate key. In other words, the total number of the alternate keys is the
total number of candidate keys minus the primary key. The alternate key may or may not exist. If
there is only one candidate key in a relation, it does not have an alternate key.
For example, employee relation has two attributes, Employee_Id and PAN_No, that act as
candidate keys. In this relation, Employee_Id is chosen as the primary key, so the other candidate
key, PAN_No, acts as the Alternate key.
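The primary/alternate split maps naturally onto PRIMARY KEY and UNIQUE constraints. A sketch with Python's built-in sqlite3 module, reusing the Employee_Id/PAN_No example above (column values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# employee_id is the chosen primary key; pan_no is the other candidate
# key, so it becomes the alternate key and is declared UNIQUE.
conn.execute("""CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    pan_no TEXT UNIQUE,
    name TEXT)""")
conn.execute("INSERT INTO employee VALUES (1, 'PAN001', 'Asha')")

# The alternate key still identifies tuples uniquely: duplicates fail.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'PAN001', 'Ravi')")
    ok = True
except sqlite3.IntegrityError:
    ok = False
print(ok)  # False
```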
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This
key is also known as Concatenated Key.
For example, in an employee relation, assume that an employee may be assigned multiple roles
and may work on multiple projects simultaneously. The primary key is then composed of all
three attributes, namely Emp_ID, Emp_role, and Proj_ID, in combination. These attributes
act as a composite key, since the primary key comprises more than one attribute.
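The three-attribute key above can be declared directly as a composite PRIMARY KEY. A sketch with Python's built-in sqlite3 module (the assignment table name is an assumption for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No single attribute identifies a row: an employee may hold several
# roles on several projects, so the key is the combination of all three.
conn.execute("""CREATE TABLE assignment (
    emp_id   INTEGER,
    emp_role TEXT,
    proj_id  INTEGER,
    PRIMARY KEY (emp_id, emp_role, proj_id))""")

conn.execute("INSERT INTO assignment VALUES (1, 'tester', 10)")
conn.execute("INSERT INTO assignment VALUES (1, 'tester', 11)")  # ok: new project
conn.execute("INSERT INTO assignment VALUES (1, 'coder', 10)")   # ok: new role

try:
    conn.execute("INSERT INTO assignment VALUES (1, 'tester', 10)")  # exact repeat
    ok = True
except sqlite3.IntegrityError:
    ok = False
print(ok)  # False
```

Only the full combination must be unique; repeating any one or two of the attributes is allowed, as the successful inserts show.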
7. Artificial key
Keys created from arbitrarily assigned data are known as artificial keys. They are created
when a natural primary key is large and complex and has no relationship with many other
relations. The data values of artificial keys are usually assigned in serial order.
For example, the primary key, which is composed of Emp_ID, Emp_role, and Proj_ID, is large in
employee relations. So it would be better to add a new virtual attribute to identify each tuple in the
relation uniquely.
These concepts are used in the enhanced entity-relationship (EER) schema, and the resulting
schema diagrams are called EER diagrams.
Features of EER Model
EER creates a design more accurate to database schemas.
It reflects the data properties and constraints more precisely.
It includes all modeling concepts of the ER model.
Diagrammatic technique helps for displaying the EER schema.
It includes the concept of specialization and generalization.
It is used to represent a collection of objects that is the union of objects of different
entity types.
A. Sub Class and Super Class
The sub class and super class relationship leads to the concept of inheritance.
The relationship between a sub class and its super class is denoted with the subset symbol (⊂).
1. Super Class
A super class is an entity type that has a relationship with one or more subtypes.
An entity cannot exist in the database merely by being a member of a super class.
For example: Shape super class is having sub groups as Square, Circle, Triangle.
2. Sub Class
A sub class is a group of entities with their own unique attributes.
A sub class inherits the properties and attributes of its super class.
For example: Square, Circle, Triangle are the sub class of Shape super class.
Page | 31
B. Specialization and Generalization
1. Generalization
Generalization is the process of forming a generalized entity that contains the common
properties of all the specialized entities.
It is a bottom-up approach, in which two or more lower-level entities combine to form a
higher-level entity.
Generalization is the reverse process of Specialization.
It defines a general entity type from a set of specialized entity type.
It minimizes the difference between the entities by identifying the common features.
For example:
In the above example, Tiger, Lion, and Elephant can all be generalized as Animal.
2. Specialization
Specialization is the process of dividing a group of entities into sub groups
based on their characteristics.
It is a top-down approach, in which one higher-level entity can be broken down into two
or more lower-level entities.
It maximizes the difference between the members of an entity by identifying the unique
characteristic or attributes of each member.
Page | 32
It defines one or more sub classes for the super class and thereby forms the
superclass/subclass relationship.
C. Category or Union
A category represents a single subclass/superclass relationship with more than one super class.
It can be a total or partial participation.
For example, in car booking, a car owner can be a person, a bank (which holds possession of a
car), or a company. The category (sub class) Owner is a subset of the union of the three super
classes Company, Bank, and Person. A category member must exist in at least one of its super
classes.
Page | 33
D. Aggregation
Aggregation is a process that represents a relationship between a whole object and its
component parts.
It abstracts a relationship between objects and views the relationship as an object.
It is a process in which a relationship between two entities is treated as a single entity.
In the above example, the relationship between College and Course acts as an entity in a
relationship with Student.
Page | 34
Unit-III
Relational model
The relational model for database management is an approach to logically
represent and manage the data stored in a database. In this model, the data is organized into a
collection of two-dimensional inter-related tables, also known as relations. Each relation is a
collection of columns and rows, where the columns represent the attributes of an entity and the
rows (or tuples) represent the records.
The relational model was proposed by E. F. Codd to model data in the form of
relations or tables. After designing the conceptual model of a database using an ER diagram, we
need to convert the conceptual model into the relational model, which can be implemented using
any RDBMS language such as Oracle SQL or MySQL.
Relational Model Concepts
The set of allowed values for each attribute is called the domain of the attribute (denoted as D)
Attribute values are (normally) required to be atomic; that is, indivisible
The special value null is a member of every domain
The null value causes complications in the definition of many operations
We shall represent a relation as a table with columns and rows. Each column of the table has a
name, or attribute. Each row is called a tuple.
A relational database is based on the relational model. This database consists of various
components based on the relational model. These include:
Relation : Two-dimensional table used to store a collection of data elements.
Tuple : Row of the relation, depicting a real-world entity.
Attribute/Field : Column of the relation, depicting properties that define the relation.
Attribute Domain : Set of pre-defined atomic values that an attribute can take i.e., it describes
the legal values that an attribute can take.
Degree : It is the total number of attributes present in the relation.
Cardinality : It specifies the number of entities involved in the relation i.e., it is the total number
of rows present in the relation.
Relation key : An attribute, or a set of attributes, whose values uniquely identify each row; this is called the relation key.
Relational Schema : It is the logical blueprint of the relation i.e., it describes the design and the
structure of the relation. It contains the table name, its attributes, and their types:
Page | 35
Relational Instance : It is the collection of records present in the relation at a given time.
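Degree and cardinality from the list above can be made concrete with a small sketch, modelling a relation instance as a set of tuples over a fixed attribute list (the student data here is invented for illustration):

```python
# A relation instance modelled as a set of tuples over a fixed list of attributes.
attributes = ("RollNo", "Name", "Marks")
student = {
    (1, "Abhishek", 78),
    (2, "Sonam", 85),
    (3, "Ravi", 91),
}

degree = len(attributes)    # degree: total number of attributes (columns)
cardinality = len(student)  # cardinality: total number of tuples (rows)
```

Here the relation student has degree 3 and cardinality 3; adding a row changes only the cardinality, while adding a column changes the degree.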
Page | 36
• Each attribute has a distinct name
• Values of an attribute are all from the same domain
• Order of attributes has no significance
• Each tuple is distinct; there are no duplicate tuples
• Order of tuples has no significance, theoretically.
Constraints in Relational Model
Relational models make use of some rules to ensure the accuracy and accessibility of the data.
These rules or constraints are known as Relational Integrity Constraints. These constraints are
checked before performing any operation like insertion, deletion, or updation on the data present
in a relational database. These constraints include:
Domain Constraint : It specifies that every attribute is bound to have a value that lies
inside a specific range of values. It is implemented with the help of the Attribute Domain
concept.
Key Constraint : It states that every relation must contain an attribute or a set of attributes
(Primary Key) that can uniquely identify a tuple in that relation. This key can never be
NULL or contain the same value for two different tuples.
Referential Integrity Constraint : It is defined between two inter-related tables. It states
that if a given relation refers to a key attribute of a different or same table, then that key
must exist in the given relation.
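All three constraint types can be demonstrated with sqlite3. The schema below is illustrative: budget carries a CHECK (domain constraint), dept_id is a primary key (key constraint), and employee.dept_id is a foreign key (referential integrity). Each violating statement is rejected before it can corrupt the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,           -- key constraint
        budget  INTEGER CHECK (budget >= 0)    -- domain constraint
    )
""")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES department(dept_id)  -- referential integrity
    )
""")
conn.execute("INSERT INTO department VALUES (10, 50000)")

violations = []
for stmt in [
    "INSERT INTO department VALUES (20, -1)",   # violates the domain (CHECK) constraint
    "INSERT INTO department VALUES (10, 900)",  # violates the key constraint
    "INSERT INTO employee VALUES (1, 99)",      # violates referential integrity
]:
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError:
        violations.append(stmt)
```

All three inserts fail with an IntegrityError, showing the constraints being checked before the operation is performed, as described above.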
Highlights:
To ensure data accuracy and accessibility, Relational Integrity Constraints are
implemented.
It includes domain, key, and referential integrity constraints.
Anomalies in Relational Model
When we notice any unexpected behavior while working with the relational databases, there may
be a presence of too much redundancy in the data stored in the database. This can cause anomalies
in the DBMS and it can be of various types such as:
Insertion Anomalies: It is the inability to insert data in the database due to the absence of
other data. For example: Suppose we are dividing the whole class into groups for a project
and the GroupNumber attribute is defined so that null values are not allowed. If a new
Page | 37
student is admitted to the class but not immediately assigned to a group then this student
can't be inserted into the database.
Deletion Anomalies - It is the accidental loss of data in the database upon deletion of any
other data element. For example: Suppose, we have an employee relation that contains the
details of the employee along with the department they are working in. Now, if a
department has one employee working in it and we remove the information of this
employee from the table, there will be the loss of data related to the department also. This
can lead to data inconsistency.
Modification/Update Anomalies - It is the data inconsistency that arises from data
redundancy and partial updation of data in the database. For example, suppose the same
data is stored redundantly in multiple rows; if an update changes only some of those rows,
the database is left in an inconsistent state.
Advantages of using the relational model
The advantages and reasons due to which the relational model in DBMS is widely accepted as a
standard are:
Simple and Easy To Use - Storing data in tables is much easier to understand and
implement as compared to other storage techniques.
Manageability - Because of the independent nature of each relation in a relational
database, it is easy to manipulate and manage. This improves the performance of the
database.
Query capability - With the introduction of relational algebra, relational databases provide
easy access to data via high-level query language like SQL.
Data integrity - With the introduction and implementation of relational constraints, the
relational model can maintain data integrity in the database.
Disadvantages of using the relational model
The main disadvantages of relational model in DBMS occur while dealing with a huge amount of
data as:
The performance of the relational model depends upon the number of relations present in
the database.
Hence, as the number of tables increases, the requirement of physical memory increases.
Page | 38
The structure becomes complex and there is a decrease in the response time for the queries.
Because of all these factors, the cost of implementing a relational database increases.
Codd Rules in DBMS
Edgar F. Codd, the creator of the relational model, proposed 13 rules, known as Codd's rules.
For a database to be considered a true relational database, it must follow these rules:
1. Foundation Rule - The database must be able to manage data in relational form.
2. Information Rule - All data stored in the database must exist as a value of some table cell.
3. Guaranteed Access Rule - Every unique data element should be accessible by only a
combination of the table name, primary key value, and the column name.
4. Systematic Treatment of NULL values - Database must support NULL values.
5. Active Online Catalog - The organization of the database must exist in an online catalog
that can be queried by authorized users.
6. Comprehensive Data Sub-Language Rule - Database must support at least one language
that supports: data definition, view definition, data manipulation, integrity constraints,
authorization, and transaction boundaries.
7. View Updating Rule - All views should be theoretically and practically updatable by the
system.
8. Relational Level Operation Rule - The database must support high-level insertion,
updation, and deletion operations.
9. Physical Data Independence Rule - Data stored in the database must be independent of
the applications that can access it i.e., the data stored in the database must not depend on
any other data or an application.
10. Logical Data Independence Rule - Any change in the logical representation of the data
(structure of the tables) must not affect the user's view.
11. Integrity independence - Changing the integrity constraints at the database level should
not reflect any change at the application level.
12. Distribution independence - The database must work properly even if the data is stored
in multiple locations or is being used by multiple end-users.
Page | 39
13. Non-subversion Rule - Accessing the data by low-level relational language should not be
able to bypass the integrity rules and constraints expressed in the high-level relational
language.
ER/EER to Relational Model Mapping
Step 1: For each regular entity type E
• Create a relation R that includes all the simple attributes of E.
• Include all the simple component attributes of composite attributes.
• Choose one of the key attributes of E as primary key for R.
• If the chosen key of E is composite, the set of simple attributes that form it will together
form the primary key of R.
Step 2: For each weak entity type
W with owner entity type E
• Create a relation R, and include all simple attributes and simple components of composite
attributes of W as attributes of R.
• In addition, include as foreign key attributes of R the primary key attribute(s) of the
relation(s) that correspond to the owner entity type(s).
Step 3: For each binary 1:1 relationship type R
• Identify the relations S and T that correspond to the entity types participating in R. Choose
one of the relations, say S, and include as foreign key in S the primary key of T.
• It is better to choose an entity type with total participation in R in the role of S.
• Include the simple attributes of the 1:1 relationship type R as attributes of S.
• If both participations are total, we may merge the two entity types and the relationship into
a single relation
Step 4: For each regular binary 1:N
Relationship type R
• Identify the relation S that represents the participating entity type at the N-side of the
relationship type.
• Include as foreign key in S the primary key of the relations T that represents the other entity
type participating in R.
• Include any simple attributes of the 1:N relationship type as attributes of S.
Page | 40
Step 5: For each binary M:N relationship type R
• Create a new relation S to represent R.
• Include as foreign key attributes in S the primary keys of the relations that represent the
participating entity types; their combination will form the primary key of S.
• Also, include any simple attributes of the M:N relationship type as attributes of S.
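Step 5 is the classic mapping of an M:N relationship to a join table. A sketch with sqlite3, using WORKS_ON as the new relation S (the table and column names follow the common COMPANY-style example and are assumptions here): the two foreign keys together form the composite primary key, and the relationship attribute Hours becomes a column of S.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE EMPLOYEE (Emp_ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE PROJECT  (Proj_ID INTEGER PRIMARY KEY, Title TEXT)")

# Step 5: the M:N relationship WORKS_ON becomes its own relation whose
# primary key is the combination of the two foreign keys.
conn.execute("""
    CREATE TABLE WORKS_ON (
        Emp_ID  INTEGER REFERENCES EMPLOYEE(Emp_ID),
        Proj_ID INTEGER REFERENCES PROJECT(Proj_ID),
        Hours   REAL,                       -- simple attribute of the relationship
        PRIMARY KEY (Emp_ID, Proj_ID)
    )
""")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Asha')")
conn.execute("INSERT INTO PROJECT VALUES (10, 'Billing'), (20, 'Payroll')")
conn.execute("INSERT INTO WORKS_ON VALUES (1, 10, 12.5), (1, 20, 7.0)")
works_on_rows = conn.execute("SELECT COUNT(*) FROM WORKS_ON").fetchone()[0]
```

One employee on two projects yields two WORKS_ON tuples; the composite key prevents the same (employee, project) pair from being recorded twice.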
Step 6: For each multi-valued attribute A
• Create a new relation R that includes an attribute corresponding to A plus the primary key
attribute K (as a foreign key in R) of the relation that represents the entity type or
relationship type that has A as an attribute.
• The primary key of R is the combination of A and K. If a multi-valued attribute is
composite, we include its components.
Step 7: For each n-ary relationship type R, n>2
• Create a new relation S to represent R.
• Include as foreign key attributes in the S the primary keys of the relations that represent
the participating entity types.
• Also include any simple attributes of the n-ary relationship type as attributes of S.
• The primary key for S is usually a combination of all the foreign keys that reference the
relations representing the participating entity types.
• However, if the participation constraint (min,max) of one of the entity types E participating
in the R has max =1, then the primary key of S can be the single foreign key attribute that
references the relation E’ corresponding to E
• This is because, in this case, each entity e in E will participate in at most one relationship
instance of R and hence can uniquely identify that relationship instance.
Step 8: To convert each super-class /subclass relationship into a relational schema you must use
one of the four options available. Let C be the super-class, K its primary key and A1, A2, …, An
its remaining attributes and let S1, S2, …, Sm be the sub-classes.
Option 8A (multiple relation option):
• Create a relation L for C with attributes Attrs(L) = {K, A1, A2, …, An} and PK(L) = K.
• Create a relation Li for each subclass Si, 1 ≤ i ≤ m, with the attributes ATTRS(Li) = {K}
U {attributes of Si} and PK(Li) = K.
• This option works for any constraints: disjoint or overlapping; total or partial.
Page | 41
Option 8B (multiple relation option):
• Create a relation Li for each subclass Si, 1 ≤ i ≤ m, with ATTRS(Li) = {attributes of Si} U
{K, A1, A2, …, An} and PK(Li) = K.
• This option works well only for disjoint and total constraints.
• If not disjoint, redundant values for inherited attributes.
• If not total, entity not belonging to any sub-class is lost.
Option 8c (Single Relation Option)
• Create a single relation L with attributes Attrs(L) = {K, A1, …, An} U {attributes of S1}
U … U {attributes of Sm} U {T} and PK(L) = K.
• This option is for specialization whose subclasses are DISJOINT, and T is a type attribute
that indicates the subclass to which each tuple belongs, if any. This option may generate a
large number of null values.
• Not recommended if many specific attributes are defined in subclasses (will result in many
null values!)
Option 8d (Single Relation Option)
• Create a single relation schema L with attributes Attrs(L) = {K, A1, …, An} U {attributes
of S1} U … U {attributes of Sm} U {T1, …, Tm} and PK(L) = K.
• This option is for specializations whose subclasses are overlapping, and each Ti, 1 ≤ i ≤ m,
is a Boolean attribute indicating whether a tuple belongs to subclass Si.
• This option could be used for disjoint subclasses too.
Relational Algebra
Relational algebra is a procedural query language, which takes instances of relations as input and
yields instances of relations as output. It uses operators to perform queries. An operator can be
either unary or binary. They accept relations as their input and yield relations as their output.
Relational algebra is performed recursively on a relation and intermediate results are also
considered relations.
Relational algebra in DBMS is a procedural query language. Queries in relational algebra are
performed using operators. Relational algebra is the fundamental building block for the modern
query language SQL and for modern database management systems such as Oracle Database,
Microsoft SQL Server, IBM Db2, etc.
Page | 42
SELECT OPERATOR
Unary operator (one relation as operand)
Returns the subset of the tuples from a relation that satisfy a selection condition:
σ<selection condition>(R)
Notation: σp(r)
• p is called the selection predicate
• Defined as:
σp(r) = {t | t ∈ r and p(t)}
where p is a formula in propositional calculus consisting of terms connected by: ∧ (and),
∨ (or), ¬ (not)
Each term is one of:
<attribute> op <attribute> or <attribute> op <constant>
where op is one of: =, ≠, >, ≥, <, ≤
• Example of selection:
σdept_name = "Physics"(instructor)
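The select operator can be sketched directly in Python: a relation is a set of tuples, and selection filters it by a predicate, returning a relation again (the instructor data here is illustrative):

```python
# sigma(predicate, relation): keep the tuples that satisfy the selection
# condition; the result is itself a relation (a set of tuples).
def sigma(predicate, relation):
    return {t for t in relation if predicate(t)}

# instructor(ID, name, dept_name) -- a hypothetical instance
instructor = {
    (101, "Einstein", "Physics"),
    (102, "Wu",       "Finance"),
    (103, "Gold",     "Physics"),
}

# sigma dept_name = "Physics" (instructor)
physics = sigma(lambda t: t[2] == "Physics", instructor)
```

Because selection only filters rows, the result has the same attributes (degree) as the input, and its cardinality is at most that of the input.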
Page | 49
Additional Operations
We define additional operations that do not add any power to the relational algebra,
but that simplify common queries.
• Set intersection
• Natural join
• Assignment
• Outer join
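Of these, natural join can be sketched in plain Python: tuples of the two relations are combined whenever they agree on their common attributes. Relations are represented as lists of dicts keyed by attribute name; the instructor/department data is illustrative:

```python
# Natural join: combine tuples from r and s that agree on all attributes
# the two relations have in common. Each relation is a list of dicts.
def natural_join(r, s):
    common = set(r[0]) & set(s[0])  # shared attribute names
    return [
        {**tr, **ts}                # merged tuple over the union of attributes
        for tr in r
        for ts in s
        if all(tr[a] == ts[a] for a in common)
    ]

instructor = [{"ID": 1, "name": "Wu",   "dept": "Finance"},
              {"ID": 2, "name": "Gold", "dept": "Physics"}]
department = [{"dept": "Physics", "building": "Watson"},
              {"dept": "Finance", "building": "Painter"}]

joined = natural_join(instructor, department)
```

Each instructor tuple pairs with exactly the department tuple sharing its dept value, so the result carries the attributes of both relations with the common attribute appearing once.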
Page | 50
Division operation
As the name of this operation implies, it involves dividing one relation by another. The division
operator is used for queries that involve the 'all' qualifier, such as "Find the names of sailors who
have reserved all boats".
Division operator A÷B can be applied if and only if:
• The attribute set of B is a proper subset of the attribute set of A.
• The relation returned by the division operator has attributes = (all attributes of A − all
attributes of B).
• The relation returned by the division operator contains those tuples of A that are
associated with every tuple of B.
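The three conditions above can be turned into a direct (if naive) Python sketch of A ÷ B, using the sailors/boats example from the text (the sample data is invented):

```python
# A / B: tuples over (attributes of A minus attributes of B) that are
# paired in A with EVERY tuple of B. Relations are lists of dicts.
def divide(a, b, b_attrs):
    result_attrs = [k for k in next(iter(a)) if k not in b_attrs]
    candidates = {tuple(t[k] for k in result_attrs) for t in a}
    return {
        c for c in candidates
        if all(                              # c must pair with every tuple of B
            any(all(t[k] == v for k, v in zip(result_attrs, c))
                and all(t[k] == bt[k] for k in b_attrs)
                for t in a)
            for bt in b
        )
    }

reserves = [
    {"sailor": "Dustin", "boat": 101}, {"sailor": "Dustin", "boat": 102},
    {"sailor": "Lubber", "boat": 101},
]
boats = [{"boat": 101}, {"boat": 102}]

# Sailors who reserved ALL boats
all_boats = divide(reserves, boats, ["boat"])
```

Dustin is associated with both boat tuples and so survives the division; Lubber, who reserved only boat 101, does not.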
Page | 54
Relational Calculus
There is an alternate way of formulating queries known as relational calculus. Relational calculus
is a non-procedural query language: the user is not concerned with the details of how to obtain
the end results. Relational calculus tells what to do but never explains how to do it. Most
commercial relational languages are based on aspects of relational calculus, including SQL, QBE,
and QUEL.
Why it is called Relational Calculus?
It is based on predicate calculus, a name derived from a branch of symbolic logic. A predicate
is a truth-valued function with arguments. On substituting values for the arguments, the function
Page | 60
results in an expression called a proposition, which can be either true or false. Relational calculus
is a tailored version of a subset of predicate calculus used to communicate with the relational
database.
Many of the calculus expressions involves the use of Quantifiers. There are two types of
quantifiers:
• Universal Quantifier: The universal quantifier, denoted by ∀, is read as "for all", which
means that all tuples in a given set satisfy a given condition.
• Existential Quantifier: The existential quantifier, denoted by ∃, is read as "there exists",
which means that in a given set of tuples there is at least one occurrence whose values
satisfy a given condition.
Before using the concept of quantifiers in formulas, we need to know the concept of Free and
Bound Variables.
A tuple variable t is bound if it is quantified, i.e., it appears within the scope of a ∃ or ∀
quantifier; a variable that is not bound is said to be free.
Free and bound variables may be compared with global and local variable of programming
languages.
Types of Relational Calculus
1. Tuple Relational Calculus (TRC)
Page | 61
The tuple relational calculus specifies which tuples to select from a relation
without giving a specific procedure for obtaining that information. In TRC, the filtering
variable ranges over the tuples of a relation. The result of the query can contain one or
more tuples.
Notation:
A Query in the tuple relational calculus is expressed as following notation
{T | P (T)} or {T | Condition (T)}
Where
T is the resulting tuples
P(T) is the condition used to fetch T.
For example:
{T.name | Author(T) AND T.article = 'database' }
Output: This query selects the tuples from the AUTHOR relation. It returns a tuple with 'name'
from Author who has written an article on 'database'.
TRC (tuple relation calculus) can be quantified. In TRC, we can use Existential (∃) and Universal
Quantifiers (∀).
For example:
{ R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}
Output: This query will yield the same result as the previous one.
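The TRC notation maps almost one-to-one onto a set comprehension, which makes the "what, not how" character of the calculus easy to see. A sketch with an invented Author relation:

```python
# Each TRC query reads as a set comprehension: the tuple variable T
# ranges over the Author relation and the condition filters it.
Author = [
    {"name": "Smith", "article": "database"},
    {"name": "Jones", "article": "networks"},
    {"name": "Patel", "article": "database"},
]

# {T.name | Author(T) AND T.article = 'database'}
names = {T["name"] for T in Author if T["article"] == "database"}
```

The comprehension states the membership condition and the predicate, but nothing about the order of evaluation, mirroring the non-procedural nature of the calculus.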
2. Domain Relational Calculus (DRC)
The second form of relation is known as Domain relational calculus. In domain relational calculus,
filtering variable uses the domain of attributes. Domain relational calculus uses the same operators
as tuple calculus. It uses logical connectives ∧ (and), ∨ (or) and ┓ (not). It uses Existential (∃)
and Universal Quantifiers (∀) to bind the variable. The QBE or Query by example is a query
language related to domain relational calculus.
Notation:
{a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Where
a1, a2 are attributes
P stands for formula built by inner attributes
For example:
{< article, page, subject > | < article, page, subject > ∈ javatpoint ∧ subject = 'database'}
Page | 62
Output: This query yields the article, page, and subject from the relation javatpoint where
the subject is 'database'.
Page | 63
Unit-IV
Page | 64
Integrity and consistency
While security is mostly based on authentication and authorization procedures, data integrity plays
a certain role in protecting data from unintentional or malicious manipulation. For example, even
if a user gains access to the database (by stealing a password, for example), s/he still has to follow
the relational rules for data manipulation, which, among others, do not allow orphaned records; s/he
won't be able to delete records from a parent table without understanding the database relationships
(though some vendors implement a CASCADE feature that instructs the RDBMS to remove
child records upon deletion of the parent one), won't be able to insert a duplicate record into a
column protected by a UNIQUE constraint, and won't be able to insert invalid data that would
violate CHECK constraints.
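The CASCADE behaviour mentioned above can be demonstrated with sqlite3 (the parent/child schema is illustrative): deleting the parent row removes its children instead of leaving orphaned records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE child (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO parent VALUES (1)")
conn.execute("INSERT INTO child VALUES (10, 1)")

# Deleting the parent cascades: the child rows go with it, so no
# orphaned records can arise.
conn.execute("DELETE FROM parent WHERE id = 1")
orphans = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
```

Without ON DELETE CASCADE, the same DELETE would instead be rejected with a foreign-key violation, which is the default protection the paragraph describes.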
Auditing
Auditing provides means to monitor database activity, both legitimate and unauthorized. It
preserves the trail of database access attempts — either successful or failed, data deletions and
inserts (in case one has to find out what had happened), and so on. It is a necessary component in
order to be considered for security certification, discussed later in the chapter.
We discuss some of the methods that have been developed for accessing databases from programs.
Most database access in practical applications is accomplished through software programs that
implement database applications. This software is usually developed in a general-purpose
programming language such as Java, C/C++/C#, COBOL (historically), or some other
programming language.
Overview of Database Programming Techniques Issues
Most database systems have an interactive interface where these SQL commands can be typed
directly into a monitor for execution by the database system. For example, in a computer system
where the Oracle RDBMS is installed, the command SQLPLUS starts the interactive interface.
The user can type SQL commands or queries directly over several lines, ended by a semicolon and
the Enter key (that is, ";<cr>"). Alternatively, a file of commands can be created and executed
through the interactive interface by typing @<filename>. The system will execute the commands
written in the file and display the results, if any.
The interactive interface is quite convenient for schema and constraint creation or for occasional
ad hoc queries. However, in practice, the majority of database interactions are executed through
Page | 65
programs that have been carefully designed and tested. These programs are generally known
as application programs or database applications, and are used as canned transactions by the end
users,
Approaches to Database Programming
Several techniques exist for including database interactions in application programs. The main
approaches for database programming are the following:
1. Embedding database commands in a general-purpose programming language.
In this approach, database statements are embedded into the host programming language, but they
are identified by a special prefix. For example, the prefix for embedded SQL is the string EXEC
SQL, which precedes all SQL commands in a host language program. A precompiler or
preprocessor scans the source program code to identify database statements and extract them for
processing by the DBMS. They are replaced in the program by function calls to the DBMS-
generated code. This technique is generally referred to as embedded SQL.
2. Using a library of database functions or classes.
A library of functions is made available to the host programming language for database calls. For
example, there could be functions to connect to a database, prepare a query, execute a query,
execute an update, loop over a query result one record at a time, and so on. The actual database
query and update commands and any other necessary information are included as parameters in
the function calls. This approach provides what is known as an application programming interface
(API) for accessing a database from application programs. For object-oriented programming
languages (OOPLs), a class library is used. For example, Java has the JDBC class library, which
can generate various types of objects such as: connection objects to a particular database, query
objects, and query result objects. Each type of object has a set of operations associated with the
class corresponding to the object.
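Python's sqlite3 module follows exactly this library-of-classes approach (the DB-API): a connection object, cursor/query-result objects, and functions for executing and fetching. A minimal sketch with an invented employee table:

```python
import sqlite3

# connect -> execute a query -> loop over the result one record at a time
conn = sqlite3.connect(":memory:")          # connection object
conn.execute("CREATE TABLE employee (ssn TEXT, fname TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES ('123', 'John', 30000)")
conn.execute("INSERT INTO employee VALUES ('456', 'Mary', 40000)")

# The query text and its parameters are passed as arguments to a library call.
cur = conn.execute("SELECT fname, salary FROM employee WHERE salary > ?", (25000,))
rows = []
while True:
    record = cur.fetchone()                 # one record at a time
    if record is None:                      # end of the query result
        break
    rows.append(record)
```

Note that no preprocessor is involved: the SQL text is just a string parameter, so syntax errors in it surface only at run time, which is the trade-off discussed later for function-call APIs.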
3. Designing a brand-new language.
A database programming language is designed from scratch to be compatible with the database
model and query language. Additional programming structures such as loops and conditional
statements are added to the database language to convert it into a full-fledged programming
language. An example of this approach is Oracle’s PL/SQL. The SQL standard has the SQL/PSM
language for specifying stored procedures.
Page | 66
Steps in Database Programming
1. Client program opens a connection to the database server
2. Client program submits queries and/or updates to the database server
3. When database access is no longer needed, client program closes/terminates the connection to
the database
Embedded SQL
• Most SQL statements can be embedded in a general-purpose host programming language
such as COBOL, C, Java
• An embedded SQL statement is distinguished from the host language statements by
prefixing with EXEC SQL and terminated with a matching END-EXEC (or semicolon)
• shared variables (variables used in both language statements) are usually prefixed with a
colon (:) in SQL statements
Example: Variable Declaration in Language C
• Variables inside DECLARE are shared and can appear (while prefixed by a colon) in SQL
statements
• SQLSTATE and/or SQLCODE is used to communicate errors/exceptions between the
database and the program
int loop;
EXEC SQL BEGIN DECLARE SECTION;
varchar dname[16], fname[16], lname[16], address[31];
char ssn[10], bdate[11],sex [2], minit[2];
float salary, raise;
int dno, dnumber;
int SQLCODE; char SQLSTATE[6];
EXEC SQL END DECLARE SECTION;
Connection to a Database
• SQL Commands to connect to a database server
• CONNECT TO server-name AS connection-name
• AUTHORIZATION user-account-info ;
• Multiple connections in one program are possible, but only one will be active
• Changing from an active connection to another
Page | 67
• SET CONNECTION connection-name;
• Disconnecting
• DISCONNECT connection-name;
Embedded SQL in a C Program Example:
loop = 1;
while (loop) {
prompt("Enter a Social Security Number: ", ssn);
EXEC SQL
select FNAME, LNAME, ADDRESS, SALARY
into :fname, :lname, :address, :salary
from EMPLOYEE where SSN = :ssn;
if (SQLCODE == 0) printf(fname, …);
else printf("SSN does not exist: ", ssn);
prompt("More SSN? (1 for yes, 0 for no): ", loop);
}
Embedded SQL in C Programming Examples
• A cursor (iterator) is needed to process multiple tuples
• FETCH commands move the cursor to the next tuple
• CLOSE CURSOR indicates that the processing of query results has been completed
Dynamic SQL
• Objective: executing new (not previously compiled) SQL statements at run-time
• A program accepts SQL statements from the keyboard at run-time
• A point-and-click operation translates to certain SQL query
• Dynamic update is relatively simple; dynamic query can be complex
• Because the type and number of retrieved attributes are unknown at compile time
Dynamic SQL: An Example
Program segment E3, a C program segment that uses dynamic SQL for updating a table.
EXEC SQL BEGIN DECLARE SECTION;
varchar sqlupdatestring[256];
EXEC SQL END DECLARE SECTION;
Page | 68
…
prompt (“Enter update command:“, sqlupdatestring);
EXEC SQL PREPARE sqlcommand FROM :sqlupdatestring;
EXEC SQL EXECUTE sqlcommand;
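The PREPARE/EXECUTE pattern of dynamic SQL has a close analogue in Python's DB-API: the statement text is ordinary data built (or read in) at run time, and parameter values are bound at execution. A sketch with an illustrative employee table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, salary REAL)")
conn.execute("INSERT INTO employee VALUES ('123', 30000)")

# The statement is a run-time string (it could come from the keyboard, as
# in segment E3); the placeholders play the role of dynamic parameters.
sqlupdatestring = "UPDATE employee SET salary = salary * ? WHERE ssn = ?"
conn.execute(sqlupdatestring, (1.10, '123'))   # prepared and executed at run time

new_salary = conn.execute(
    "SELECT salary FROM employee WHERE ssn = '123'").fetchone()[0]
```

Binding values as parameters rather than pasting them into the string also avoids SQL-injection problems, one practical reason the prepare-then-execute style is preferred for run-time statements.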
Page | 69
Multiple Tuples in SQLJ
• SQLJ supports two types of iterators:
• named iterator : associated with a query result
• positional iterator : lists only attribute types in a query result
• A FETCH operation retrieves the next tuple in a query result:
• fetch iterator-variable into program-variable
Database Programming with Function Calls
• Embedded SQL provides static database programming
• API: dynamic database programming with a library of functions
Advantage:
• No preprocessor needed (thus more flexible)
Disadvantage
• SQL syntax checks to be done at run-time
• Sometimes requires more complex programming to access query results because
the number of attributes and their types in a query result may not be known in
advance
SQL/Call Level Interface
• A part of the SQL standard
• Provides easy access to several databases within the same program
• Certain libraries (e.g., sqlcli.h for C) have to be installed and available
• SQL statements are dynamically created and passed as string parameters in the calls
Components of SQL/CLI
Four kinds of records to keep track of needed information
1. Environment record
• To keep track of database connections
2. Connection record
• To keep track of info needed for a particular connection
3. Statement record
• To keep track of info needed for one SQL statement
4. Description record
Page | 70
• To keep track of tuples
• Each record is accessible to a C program through a pointer variable --- called a
handle to the record
• Persistent procedures/functions (modules) are stored locally and executed on the database
server (as opposed to execution on clients)
• Useful if
• The procedure is needed by many applications, it can be invoked by any of them
(thus reduce duplications)
• The execution on the server reduces communication costs
• It enhances the modeling power of views
• A stored procedure:
Page | 71
CREATE PROCEDURE procedure-name (params)
local-declarations
procedure-body;
OR
• A stored function:
CREATE FUNCTION function-name (params) RETURNS return-type
local-declarations
function-body;
• SQL/PSM: part of the SQL standard for writing persistent stored modules
• SQL + stored procedures/functions + additional programming constructs, e.g.,
branching and looping statements
• To enhance the programming power of SQL
Example (an SQL/PSM function that returns the size category of a department):
CREATE FUNCTION Dept_size (IN deptno INTEGER)
RETURNS VARCHAR[7]
DECLARE No_of_emps INTEGER;
SELECT COUNT(*) INTO No_of_emps
FROM EMPLOYEE WHERE DNO = deptno;
IF No_of_emps > 100 THEN RETURN 'HUGE'
ELSEIF No_of_emps > 25 THEN RETURN 'LARGE'
ELSEIF No_of_emps > 10 THEN RETURN 'MEDIUM'
ELSE RETURN 'SMALL'
ENDIF;
Constraints as Assertions
• General constraints: constraints that do not fit in the basic SQL categories
• Mechanism: CREATE ASSERTION
• Components include: a constraint name, followed by CHECK, followed by a condition
Assertions: An Example
• "The salary of an employee must not be greater than the salary of the manager of the department that the employee works for"
CREATE ASSERTION SALARY_CONSTRAINT
CHECK (NOT EXISTS
(SELECT *
FROM EMPLOYEE E, EMPLOYEE M, DEPARTMENT D
WHERE E.SALARY > M.SALARY AND
E.DNO=D.NUMBER AND
D.MGRSSN=M.SSN));
Using General Assertions
• Specify a query that violates the condition; include inside a NOT EXISTS clause
• if the query result is not empty, the assertion has been violated
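Since CREATE ASSERTION is rarely implemented by mainstream engines, the same check can be emulated by running the violation query directly. A sketch using sqlite3 (schema trimmed to the columns the assertion mentions; all data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE DEPARTMENT (NUMBER INTEGER, MGRSSN TEXT);
CREATE TABLE EMPLOYEE (SSN TEXT, SALARY REAL, DNO INTEGER);
INSERT INTO DEPARTMENT VALUES (5, '999');
INSERT INTO EMPLOYEE VALUES ('999', 55000, 5);  -- the manager
INSERT INTO EMPLOYEE VALUES ('123', 30000, 5);  -- earns less: fine
""")

# The query inside NOT EXISTS: a non-empty result means the
# assertion is violated.
VIOLATIONS = """
SELECT * FROM EMPLOYEE E, EMPLOYEE M, DEPARTMENT D
WHERE E.SALARY > M.SALARY AND E.DNO = D.NUMBER AND D.MGRSSN = M.SSN
"""
ok = cur.execute(VIOLATIONS).fetchall()      # empty: assertion holds

# Insert a row that breaks the rule and re-check.
cur.execute("INSERT INTO EMPLOYEE VALUES ('777', 60000, 5)")
bad = cur.execute(VIOLATIONS).fetchall()     # one violating combination
print(len(ok), len(bad))
```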
SQL Triggers
• Triggers are expressed in a syntax similar to assertions and include the following:
• Event(s): for example, insert, delete, or update operations
• Condition: determines whether the action should be carried out
• Action: the operation to be performed when the condition is satisfied
Views in SQL
• A view is a single (virtual) table derived from other tables
• A convenience for expressing certain operations
Specification of Views
• SQL command: CREATE VIEW
• A view (virtual table) name
• A possible list of attribute names (for example, when arithmetic operations are
specified or when we want the names to be different from the attributes in the
base relations)
• A query to specify the table contents
For example (the view and column names here are illustrative; only the final GROUP BY PNAME line survives from the original example):
CREATE VIEW PROJ_HOURS (PNAME, TOTAL_HOURS) AS
SELECT PNAME, SUM(HOURS)
FROM PROJECT, WORKS_ON
WHERE PNUMBER = PNO
GROUP BY PNAME;
• Query modification: present the view query in terms of a query on the underlying base tables
• Disadvantage: inefficient for views defined via complex queries (especially if
additional queries are to be applied to the view within a short time period)
View Update
Un-updatable Views
• Views defined using groups and aggregate functions are not updateable
• Views defined on multiple tables using joins are generally not updateable
• WITH CHECK OPTION: must be added to the definition of a view if the view is
to be updated
• To allow check for updatability and to plan for an execution strategy
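Views behaving as virtual tables can be observed directly in sqlite3; the table, view, and column names below are invented for the sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE WORKS_ON (PNO INTEGER, ESSN TEXT, HOURS REAL);
INSERT INTO WORKS_ON VALUES (1, '123', 10.0), (1, '456', 5.0), (2, '123', 8.0);
-- A virtual table: it stores no data of its own.
CREATE VIEW PROJ_HOURS (PNO, TOTAL_HOURS) AS
    SELECT PNO, SUM(HOURS) FROM WORKS_ON GROUP BY PNO;
""")

# Each query on the view is rewritten (query modification) to run
# against the base table WORKS_ON at query time.
result = cur.execute("SELECT * FROM PROJ_HOURS ORDER BY PNO").fetchall()
print(result)
```

Because this view is defined with an aggregate and GROUP BY, it is an example of an un-updatable view, as noted above.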
Unit-V
Functional Dependency
Functional Dependency (FD) is a constraint that determines the relation of one attribute to another
attribute in a Database Management System (DBMS). Functional Dependency helps to maintain
the quality of data in the database. It plays a vital role in distinguishing good database design from bad.
A functional dependency is denoted by an arrow "→". A dependency of Y on X is written X → Y (read "X determines Y").
The functional dependency is a relationship that exists between two attributes. It typically exists
between the primary key and non-key attribute within a table.
X → Y
The left side of FD is known as a determinant, the right side of the production is known as a
dependent.
For example:
Assume we have an employee table with attributes Emp_Id, Emp_Name, and Emp_Address. Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table, because if we know the Emp_Id, we can tell the employee name associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
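On a concrete table, an FD can be tested by checking that rows agreeing on the determinant also agree on the dependent. A minimal sketch (the sample rows are invented; note that sample data can only refute an FD, never prove it holds in general):

```python
def fd_holds(rows, determinant, dependent):
    """True iff determinant -> dependent holds in this concrete table.

    rows: list of dicts; determinant/dependent: tuples of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in dependent)
        if x in seen and seen[x] != y:
            return False          # same determinant value, different dependent
        seen[x] = y
    return True

employee = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Ravi", "Emp_Address": "Goa"},
]
print(fd_holds(employee, ("Emp_Id",), ("Emp_Name",)))   # Emp_Id -> Emp_Name
print(fd_holds(employee, ("Emp_Name",), ("Emp_Id",)))   # two Ravis: fails
```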
Key terms
Here, are some key terms for Functional Dependency in Database:
• Axiom: a set of inference rules used to infer all the functional dependencies on a relational database.
• Decomposition: a rule that suggests that if a table appears to contain two entities determined by the same primary key, you should consider breaking it up into two different tables.
• Dependent: displayed on the right side of the functional dependency diagram.
• Determinant: displayed on the left side of the functional dependency diagram.
• Union: suggests that if two tables are separate and the primary key is the same, you should consider putting them together.
Types of Functional Dependency
• Trivial functional dependency: X → Y is trivial if Y is a subset of X (e.g., {Emp_Id, Emp_Name} → Emp_Id).
• Non-trivial functional dependency: X → Y where Y is not a subset of X.
• Transitive dependency: an indirect dependency X → Z that holds because X → Y and Y → Z.
Closure of a Set of Functional Dependencies
The closure of a set F of functional dependencies, written F+, is the set of all functional dependencies logically implied by F. Armstrong's axioms are a set of rules that, when applied repeatedly, generate the closure of a set of functional dependencies.
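In practice, F+ is explored via attribute closures: the set of attributes reachable from X by repeatedly applying the FDs (equivalent to exhaustive use of Armstrong's axioms). A minimal sketch (the relation and FDs are invented):

```python
def attribute_closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) pairs.

    Attribute sets are written as strings of single-letter attributes,
    e.g. ("A", "BC") means A -> BC.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already in the closure, pull in the right side.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

# F = {A -> BC, B -> D} on R(A, B, C, D)
fds = [("A", "BC"), ("B", "D")]
print(sorted(attribute_closure("A", fds)))   # A reaches every attribute
print(sorted(attribute_closure("B", fds)))   # B reaches only B and D
```

If the closure of X equals all attributes of R, then X is a superkey; this is the workhorse test used for BCNF below.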
Primary Rules
• Reflexivity: if Y is a subset of X, then X → Y.
• Augmentation: if X → Y, then XZ → YZ for any set of attributes Z.
• Transitivity: if X → Y and Y → Z, then X → Z.
Relational Decomposition
When a relation in the relational model is not in an appropriate normal form, decomposition of the relation is required: the table is broken into multiple tables. If a relation has no proper decomposition, problems such as loss of information may arise. Decomposition is used to eliminate problems of bad design such as anomalies, inconsistencies, and redundancy.
Types of Decomposition
Lossless Decomposition
• If the information is not lost from the relation that is decomposed, then the decomposition will
be lossless.
• The lossless decomposition guarantees that the join of relations will result in the same relation
as it was decomposed.
• The relation is said to be lossless decomposition if natural joins of all the decomposition give
the original relation.
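The "join the projections and compare" test can be run directly on sample data. A sketch with toy project/natural-join helpers (relation names and data invented; as with FDs, a sample can demonstrate lossiness but not prove losslessness in general):

```python
def project(rows, attrs):
    # Projection with duplicate elimination (relations are sets of tuples).
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            common = set(d1) & set(d2)
            if all(d1[a] == d2[a] for a in common):
                out.add(tuple(sorted({**d1, **d2}.items())))
    return out

def is_lossless(rows, attrs1, attrs2):
    original = {tuple(sorted(row.items())) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

r = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 2, "B": "x", "C": 20},
]
# Decomposing on B is lossy here: the join produces spurious tuples.
print(is_lossless(r, ("A", "B"), ("B", "C")))

r2 = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 2, "B": "y", "C": 20},
]
# Joining on A, which identifies each row, reproduces the original.
print(is_lossless(r2, ("A", "B"), ("A", "C")))
```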
Dependency Preserving
• It is an important constraint of the database.
• In dependency preservation, each functional dependency must be preserved by at least one of the decomposed tables.
• If a relation R is decomposed into relation R1 and R2, then the dependencies of R either
must be a part of R1 or R2 or must be derivable from the combination of functional
dependencies of R1 and R2.
• For example, suppose there is a relation R(A, B, C, D) with functional dependency set {A → BC}. The relation R is decomposed into R1(A, B, C) and R2(A, D); this decomposition is dependency preserving because the FD A → BC is contained in R1(A, B, C).
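The containment check used in this example can be sketched in a few lines. This is only the sufficient "FD sits entirely inside one component" test from the text; the full dependency-preservation test compares closures and is more involved:

```python
def directly_preserved(fd, components):
    """True if some component contains every attribute of the FD.

    fd: (lhs, rhs) as attribute strings; components: attribute strings.
    """
    lhs, rhs = fd
    needed = set(lhs) | set(rhs)
    return any(needed <= set(comp) for comp in components)

# R(A, B, C, D) with A -> BC, decomposed into R1(A, B, C) and R2(A, D)
print(directly_preserved(("A", "BC"), ["ABC", "AD"]))   # A -> BC sits inside R1
print(directly_preserved(("C", "D"), ["ABC", "AD"]))    # would straddle both
```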
Advantages of Functional Dependency
• Functional dependency helps avoid data redundancy, so the same data does not repeat at multiple locations in the database
• It helps you to maintain the quality of data in the database
• It helps you to define the meanings and constraints of databases
• It helps you to identify bad designs
• It helps you to find facts regarding the database design
Normalization
Normalization is the process of organizing the data in a database so as to minimize redundancy and avoid undesirable characteristics such as insertion, update, and deletion anomalies. The main reason for normalizing the relations is removing these anomalies. Failure to eliminate
anomalies leads to data redundancy and can cause data integrity and other problems as the
database grows. Normalization consists of a series of guidelines that helps to guide you in
creating a good database structure.
• Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a new tuple
into a relationship due to lack of data.
• Deletion Anomaly: The delete anomaly refers to the situation where the deletion of
data results in the unintended loss of some other important data.
• Update Anomaly: the update anomaly occurs when an update of a single data value requires multiple rows of data to be updated.
Advantages of Normalization
• Helps to minimize data redundancy
• Greater overall database organization
• Data consistency within the database
• A more flexible database design
• Enforces the concept of relational integrity
First Normal Form (1NF)
A relation is in first normal form if every attribute contains only atomic (indivisible) values; multi-valued attributes and repeating groups are not allowed, and each record is unique.
Second Normal Form (2NF)
The normalization of 1NF relations to 2NF involves the elimination of partial dependencies. A partial dependency exists when a non-prime attribute, i.e., an attribute that is not part of any candidate key, is not fully functionally dependent on a candidate key (it depends on only part of one).
For a relational table to be in second normal form, it must satisfy the following rules:
The table must be in first normal form.
It must not contain any partial dependency, i.e., all non-prime attributes are fully functionally
dependent on the primary key.
If a partial dependency exists, we can divide the table to remove the partially dependent attributes
and move them to some other table where they fit in well.
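The split described above can be sketched on a hypothetical EMP_PROJ-style table whose candidate key is (Emp_Id, Proj_Id); Emp_Name depends on Emp_Id alone, which is the partial dependency being removed (all names and data are invented):

```python
# 1NF table violating 2NF: Emp_Name depends only on part of the key.
emp_proj = [
    {"Emp_Id": 1, "Proj_Id": "P1", "Hours": 10, "Emp_Name": "Asha"},
    {"Emp_Id": 1, "Proj_Id": "P2", "Hours": 5,  "Emp_Name": "Asha"},
    {"Emp_Id": 2, "Proj_Id": "P1", "Hours": 8,  "Emp_Name": "Ravi"},
]

# Split: keep the fully dependent attributes with the whole key, and
# move the partially dependent attribute into its own table.
works_on = [{k: r[k] for k in ("Emp_Id", "Proj_Id", "Hours")} for r in emp_proj]
employee = {r["Emp_Id"]: r["Emp_Name"] for r in emp_proj}  # each name stored once
print(employee)
```

After the split, each employee name is stored exactly once, which removes the redundancy that the partial dependency caused.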
Third Normal form (3NF)
A table design is said to be in 3NF if both the following conditions hold:
• Table must be in 2NF
• Transitive functional dependency of non-prime attribute on any super key should be
removed.
An attribute that is not part of any candidate key is known as non-prime attribute.
In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for each
functional dependency X-> Y at least one of the following conditions hold:
• X is a super key of table
• Y is a prime attribute of table
An attribute that is a part of one of the candidate keys is known as prime attribute.
Boyce–Codd Normal Form (BCNF)
Boyce-Codd Normal Form is an advanced version of 3NF as it contains additional constraints
compared to 3NF.
For a relational table to be in Boyce-Codd normal form, it must satisfy the following rules:
• The table must be in the third normal form.
• For every non-trivial functional dependency X → Y, X must be a superkey of the table. In other words, unlike 3NF, BCNF does not allow X → Y where X is not a superkey, even when Y is a prime attribute.
A superkey is a set of one or more attributes that can uniquely identify a row in a database table.
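The BCNF condition reduces to a closure test: every non-trivial FD's left side must determine all attributes. A minimal sketch (schema and FDs invented; the S/T/P example is the classic student–teacher–subject style violation):

```python
def closure(attrs, fds):
    # Attribute closure under FDs given as (lhs, rhs) attribute strings.
    c = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= c and not set(rhs) <= c:
                c |= set(rhs)
                changed = True
    return c

def is_bcnf(schema, fds):
    """True iff every non-trivial FD's left side is a superkey of schema."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue                          # trivial FD: always allowed
        if closure(lhs, fds) != set(schema):
            return False                      # lhs is not a superkey
    return True

# R(S, T, P) with {ST -> P, P -> T}: P is not a superkey, so not BCNF.
print(is_bcnf("STP", [("ST", "P"), ("P", "T")]))
# R(A, B) with {A -> B}: A is a key, so BCNF holds.
print(is_bcnf("AB", [("A", "B")]))
```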
Fourth normal form (4NF)
A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
For a dependency A → B, if a single value of A is associated with multiple values of B, then the dependency is multi-valued (written A →→ B).
Multi-valued Dependencies (MVD) and Fourth Normal Form (4NF)
To deal with the problem of BCNF, R. Fagin introduced the idea of multi-valued dependency
(MVD) and the fourth normal form (4NF). A multi-valued dependency (MVD) is a functional
dependency where the dependency may be to a set and not just a single value. It is defined as X
→→ Y in relation R (X, Y, Z), if each X value is associated with a set of Y values in a way that
does not depend on the Z values. Here X and Y are both subsets of R. The notation X →→ Y is
used to indicate that a set of attributes of Y shows a multi-valued dependency (MVD) on a set of
attributes of X.
Thus, informally, MVDs occur when two or more independent multi-valued facts about the same attribute occur within the same relation.
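An MVD X →→ Y can be checked on sample data: within each X-group, the (Y, Z) combinations must form the full cross product of the Y values and Z values seen in that group. A sketch (table and attribute names invented, in the spirit of the usual employee–projects–dependents example):

```python
def mvd_holds(rows, X, Y):
    """True iff X ->-> Y holds; Z is all remaining attributes."""
    if not rows:
        return True
    Z = [a for a in rows[0] if a not in X + Y]
    groups = {}
    for r in rows:
        groups.setdefault(tuple(r[a] for a in X), []).append(r)
    for rs in groups.values():
        ys = {tuple(r[a] for a in Y) for r in rs}
        zs = {tuple(r[a] for a in Z) for r in rs}
        actual = {tuple(r[a] for a in Y + Z) for r in rs}
        if actual != {y + z for y in ys for z in zs}:
            return False          # (Y, Z) pairs are not independent
    return True

# Smith's projects and dependents are independent facts about Smith.
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
]
print(mvd_holds(emp, ["ENAME"], ["PNAME"]))
```

Dropping any one of the four rows breaks the cross product, and the check then reports that the MVD no longer holds.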
Fifth normal form (5NF)
• A relation is in 5NF if it is in 4NF, contains no join dependency, and all joining is lossless.
• 5NF is satisfied when all the tables are broken into as many tables as possible in order to
avoid redundancy.
• 5NF is also known as Project-join normal form (PJ/NF).
Join Dependencies and Fifth Normal Form (5NF)
The anomalies of MVDs are eliminated by join dependency (JD) and 5NF.
Join Dependencies (JD)
A join dependency (JD) exists if the join of R1 and R2 over C is equal to relation R, where R1(A, B, C) and R2(C, D) are decompositions of a given relation R(A, B, C, D). Equivalently, (R1, R2) is a lossless decomposition of R. In other words, *((A, B, C), (C, D)) is a join dependency of R if the join of the projections on these attribute sets is equal to relation R. Here, *(R1, R2, R3, ...) indicates that relations R1, R2, R3, and so on constitute a join dependency of R. Therefore, a necessary condition for a relation R to satisfy a JD *(R1, R2, ..., Rn) is that R is equal to the natural join of its projections on R1, R2, ..., Rn.