
CB3401 Database Management and Security Unit-2

Functional dependency in DBMS describes the relationship between attributes in a database table, where certain attributes (dependent) are uniquely determined by others (determinant). It plays a crucial role in reducing data redundancy, ensuring consistency, and maintaining integrity. The document also discusses types of functional dependencies, normalization, and the advantages and disadvantages of these concepts in database management.

Uploaded by

firoz52797
Copyright
© © All Rights Reserved

CB3401 Database Management System and Security

Unit-2 Database Design

FUNCTIONAL DEPENDENCY

What is Functional Dependency in DBMS?


Functional dependency in a DBMS (database management system) refers to a relationship
between attributes or sets of attributes within a database table.
It indicates that the values of certain attributes (known as dependent attributes) are determined
uniquely by the values of other attributes (known as determinant attributes).
In other words, knowing the values of the determinant attributes allows us to infer the values of
the dependent attributes.
For example, let’s consider a database table "Employee" with the following attributes:
Employee (Employee_ID, Employee_Name, Address, Salary)
In this case, the Employee_ID attribute uniquely identifies each employee. Therefore, we can say
that Employee_Name, Address, and Salary are functionally dependent on Employee_ID. We can denote
this functional dependency as follows:
Employee_ID → Employee_Name, Address, Salary
As another example, suppose we have a database table "Order" with the attributes Order_ID,
Product_Name, Quantity, and Price. In this case, the product name depends on the order ID:
if we know the Order_ID, we can determine the corresponding Product_Name. We can denote
this functional dependency as:
Order_ID → Product_Name
Identifying such functional dependencies helps us organize the data in the database: we can reduce
data redundancy, ensure data consistency, and maintain data integrity.

How to Denote a Functional Dependency in DBMS?


A functional dependency can be denoted using the following notation:
A→B
Here, A is the determinant attribute, and B is the dependent attribute. It means that the value of
attribute B is uniquely determined by the value of attribute A.
Let’s consider an example to understand this notation better. Suppose we have a database table
"Student" with the following attributes:
Student (Roll_No, Name, Age, Address)
If we know the value of Roll_No, we can determine the corresponding values of Name, Age, and
Address. Therefore, we can say that Name, Age, and Address are functionally dependent on Roll_No. We
can denote this functional dependency as:
Roll_No → Name, Age, Address
Here, Roll_No is the determinant attribute, and Name, Age, and Address are dependent attributes.
A functional dependency can also be depicted diagrammatically as an arrow: the arrow originates at the
determinant attribute and points to the dependent attributes.

Types of Functional Dependency in DBMS:
The 4 types of functional dependency in DBMS are as follows:
1. Trivial Functional Dependency in DBMS
2. Non-Trivial Functional Dependency in DBMS
3. Multivalued Functional Dependency in DBMS
4. Transitive Functional Dependency in DBMS

1. Trivial Functional Dependency in DBMS


Trivial functional dependency is a special case of a functional dependency in DBMS, where the
dependent attribute is a subset of the determinant attribute. In other words, a functional dependency is said
to be trivial if the attributes on its right side are a subset of the attributes on its left side.
Consider the following example to better understand this:
Suppose we have a database table "Employee" with the following attributes:
Employee (Employee_ID, Employee_Name, Age, Department)
Here, {Employee_ID, Employee_Name} → {Employee_Name} is a trivial functional dependency because
the dependent {Employee_Name} is a subset of the determinant {Employee_ID, Employee_Name}.
{Employee_ID} → {Employee_ID}, {Employee_Name} → {Employee_Name}, and {Age} → {Age} are also
trivial functional dependencies.

2. Non-Trivial Functional Dependency in DBMS


A non-trivial functional dependency is a functional dependency in DBMS where the dependent
attribute is not a subset of the determinant attribute. In other words, a functional dependency X → Y,
where X and Y are sets of attributes, is called non-trivial if Y is not a subset of X.
Consider the following example to better understand this:
Suppose we have a database table "Customer" with the following attributes:
Customer (Customer_ID, Customer_Name, Address, Phone_Number)
Here, {Customer_ID} → {Customer_Name} is a non-trivial functional dependency because
Customer_Name (dependent) is not a subset of Customer_ID (determinant). Similarly,
{Customer_ID, Customer_Name} → {Phone_Number} is also a non-trivial functional dependency.
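
The subset test that separates trivial from non-trivial dependencies can be sketched in a few lines of Python. This is only an illustrative sketch: the function name is_trivial is hypothetical, and attribute sets are modelled as Python sets.

```python
def is_trivial(lhs, rhs):
    """A dependency lhs -> rhs is trivial when rhs is a subset of lhs."""
    return set(rhs) <= set(lhs)


# Examples taken from the Employee and Customer tables above.
trivial_fd = is_trivial({"Employee_ID", "Employee_Name"}, {"Employee_Name"})
non_trivial_fd = is_trivial({"Customer_ID"}, {"Customer_Name"})

print(trivial_fd)      # True: the right side is contained in the left side
print(non_trivial_fd)  # False: Customer_Name is not part of the determinant
```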

3. Multivalued Functional Dependency in DBMS


Multivalued functional dependency (MVD) is a type of dependency in DBMS where a single determinant
attribute determines a set of values of another attribute, independently of the remaining attributes.
Suppose each value of X is associated with a set of Y values and a set of Z values. If there exists no
functional dependency between Y and Z, then X ↠ Y and X ↠ Z are called multivalued functional
dependencies.
Consider the following example to better understand this:
Suppose we have a database table "Course" with the following attributes:
Course (Course_ID, Course_Name, Instructor_Name, Textbook_Name)
Here, a course can have several instructors and several textbooks, so Course_ID identifies a course but
does not determine a single Instructor_Name or Textbook_Name.
The attributes Instructor_Name and Textbook_Name are not functionally dependent on each other (i.e.,
neither Instructor_Name → Textbook_Name nor Textbook_Name → Instructor_Name holds), so each of
them depends on Course_ID independently.
We can denote the multivalued functional dependencies as follows:
Course_ID ↠ Instructor_Name
Course_ID ↠ Textbook_Name

4. Transitive Functional Dependency in DBMS


Transitive functional dependency is a type of functional dependency in DBMS where one non-key
attribute is functionally dependent on another non-key attribute through a chain of functional dependencies.
Consider the following example to better understand this:
Suppose we have a database table "Student" with the following attributes:
Student (Student_ID, Student_Name, Course_Name, Instructor_Name)
Here, Student_ID is the primary key. We can say that there is a transitive functional dependency between
Student_Name and Instructor_Name, as both of them are non-key attributes and the dependency between
them is through Course_Name, which is also a non-key attribute.
Here, {Student_Name → Course_Name} and {Course_Name → Instructor_Name} hold true. Hence,
according to the axiom of transitivity, {Student_Name → Instructor_Name} is a valid functional
dependency.

Properties of Functional Dependency in DBMS


Functional dependencies in DBMS have several important properties that help to ensure data
consistency and maintain data integrity in the database. The key properties of functional dependency in
DBMS are:
1. Reflexivity: If A is a set of attributes and B is a subset of A, then the functional dependency A → B
holds true.
2. Augmentation: If a functional dependency A → B holds true, then we can add the same attributes
to both sides and the resulting functional dependency also holds. For example, if A → B, then
we can add attribute C to both sides to get AC → BC.
3. Transitivity: If A → B and B → C, then we can infer that A → C also holds true by the rule of
transitivity. This property allows us to detect transitive functional dependencies.
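
The three properties above are commonly applied through the attribute closure: the set of all attributes that a given set determines. The following Python sketch shows one way to compute it; the names closure and implies are illustrative choices, and each dependency is assumed to be given as a (left side, right side) pair of sets.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds.

    fds is a list of (lhs, rhs) pairs, each side being a set of attribute names.
    """
    result = set(attrs)  # reflexivity: attrs determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(closure({"A"}, fds)))           # ['A', 'B', 'C']
print(implies(fds, {"A"}, {"C"}))            # True, by transitivity
print(implies(fds, {"A", "D"}, {"B", "D"}))  # True, by augmentation
print(implies(fds, {"C"}, {"A"}))            # False
```

A dependency X → Y follows from a set of dependencies exactly when Y is contained in the closure of X, which is why this single routine covers all three properties.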

Advantages of Functional Dependency in DBMS


Functional dependency in DBMS has several advantages, including:
1. Data consistency: Functional dependency ensures that data is consistent in DBMS. By identifying
and removing redundant data, we can prevent data inconsistencies that can result in incorrect
query results.
2. Data integrity: Functional dependency helps to maintain data integrity by ensuring that data is
stored correctly in the database. By enforcing rules that govern how data is stored and updated, we
can prevent data corruption and ensure that the data is accurate.
3. Database efficiency: By identifying and removing redundant data, functional dependency can
improve the efficiency of the database. With less data to process and search through, query times
can be reduced, and the database can perform more quickly.
4. Easier maintenance: By simplifying the database design, functional dependency makes it easier
to maintain the database over time. With a simpler design, it is easier to make changes, and less
time is spent on maintenance and troubleshooting.

Lossless Join And Dependency Preserving Decomposition:


Decomposition of a relation is done when a relation in a relational model is not in an appropriate normal
form. When relation R is decomposed into two or more relations, the decomposition should be lossless
join as well as dependency preserving.

Lossless Join Decomposition


If we decompose a relation R into relations R1 and R2,
Decomposition is lossy if R1 ⋈ R2 ⊃ R
Decomposition is lossless if R1 ⋈ R2 = R

To check for lossless join decomposition using the FD set, the following conditions must hold:
1. The union of the attributes of R1 and R2 must be equal to the attributes of R. Each attribute of R must
be either in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
2. The intersection of the attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
3. The common attributes must be a key for at least one relation (R1 or R2).
Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
For Example, A relation R (A, B, C, D) with FD set{A->BC} is decomposed into R1(ABC) and R2(AD)
which is a lossless join decomposition as:
1. First condition holds true as Att(R1) U Att(R2) = (ABC) U (AD) = (ABCD) = Att(R).
2. Second condition holds true as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) ≠ Φ
3. The third condition holds as Att(R1) ∩ Att(R2) = A is a key of R1(ABC) because A->BC is given.
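
The three conditions can be checked mechanically. The sketch below is a minimal Python illustration for a decomposition into exactly two relations; the function names closure and is_lossless are hypothetical, and dependencies are assumed to be (left side, right side) pairs of attribute sets.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r, r1, r2, fds):
    """Apply the three lossless-join conditions to a two-way decomposition."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:        # condition 1: no attribute is lost
        return False
    common = r1 & r2
    if not common:          # condition 2: the relations share attributes
        return False
    determined = closure(common, fds)
    # Condition 3: the common attributes are a key of R1 or of R2.
    return r1 <= determined or r2 <= determined


# R(A, B, C, D) with A -> BC, decomposed into R1(ABC) and R2(AD).
print(is_lossless("ABCD", "ABC", "AD", [({"A"}, {"B", "C"})]))  # True
# R(A, B, C, D) with A -> B and C -> D, decomposed into R1(AB) and R2(CD).
print(is_lossless("ABCD", "AB", "CD", [({"A"}, {"B"}), ({"C"}, {"D"})]))  # False
```

Attribute sets are written as strings of single-letter names here only for brevity; set("ABC") yields {"A", "B", "C"}.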

Dependency Preserving Decomposition:


If we decompose a relation R into relations R1 and R2, all dependencies of R must either be a part of
R1 or R2 or be derivable from a combination of the functional dependencies of R1 and R2. For example,
a relation R (A, B, C, D) with FD set {A->BC} is decomposed into R1(ABC) and R2(AD), which is
dependency preserving because the FD A->BC is a part of R1(ABC).
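
Dependency preservation can also be tested without listing every projected dependency. The Python sketch below uses a standard iterative test: for each dependency X -> Y, it grows X using only what each decomposed relation can enforce on its own attributes. The function names are illustrative, and dependencies are assumed to be (left side, right side) pairs of sets.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """True if every dependency in fds is enforceable on the decomposed relations."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for relation in decomposition:
                relation = set(relation)
                # What this relation alone lets us derive from the attributes so far.
                gained = closure(result & relation, fds) & relation
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


# R(A, B, C, D) with A -> BC, decomposed into R1(ABC) and R2(AD).
print(preserves_dependencies(["ABC", "AD"], [({"A"}, {"B", "C"})]))  # True
# R(A, B, C) with A -> B and B -> C, decomposed into R1(AB) and R2(AC):
# B -> C cannot be checked inside either relation.
print(preserves_dependencies(["AB", "AC"], [({"A"}, {"B"}), ({"B"}, {"C"})]))  # False
```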

Advantages of Lossless Join and Dependency Preserving Decomposition


 Improved Data Integrity: Lossless join and dependency preserving decomposition help to maintain
the data integrity of the original relation by ensuring that all dependencies are preserved.
 Reduced Data Redundancy: These techniques help to reduce data redundancy by breaking down a
relation into smaller, more manageable relations.
 Improved Query Performance: By breaking down a relation into smaller, more focused relations,
query performance can be improved.
 Easier Maintenance and Updates: The smaller, more focused relations are easier to maintain and
update than the original relation, making it easier to modify the database schema and update the data.
 Better Flexibility: Lossless join and dependency preserving decomposition can improve the flexibility
of the database system by allowing for easier modification of the schema.

Disadvantages of Lossless Join and Dependency Preserving Decomposition


 Increased Complexity: Lossless join and dependency-preserving decomposition can increase the
complexity of the database system, making it harder to understand and manage.
 Costly: Decomposing relations can be costly, especially if the database is large and complex. This can
require additional resources, such as hardware and personnel.
 Reduced Performance: Although query performance can be improved in some cases, in others,
lossless join and dependency-preserving decomposition can result in reduced query performance due to
the need for additional join operations.
 Limited Scalability: These techniques may not scale well in larger databases, as the number of
smaller, focused relations can become unwieldy.

Question
1. Consider a schema R(A, B, C, D) and functional dependencies A->B and C->D. Then the
decomposition of R into R1(AB) and R2(CD) is
(A) dependency preserving and lossless join
(B) lossless join but not dependency preserving
(C) dependency preserving but not lossless join
(D) not dependency preserving and not lossless join
Answer:
For lossless join decomposition, the three conditions given above must hold:
Att(R1) U Att(R2) = ABCD = Att(R)
Att(R1) ∩ Att(R2) = Φ, which violates the second condition of lossless join decomposition.
Hence the decomposition is not lossless.
For dependency preserving decomposition, A->B can be ensured in R1(AB) and C->D can be ensured in
R2(CD). Hence it is dependency preserving decomposition. So, the correct option is C.
NORMALIZATION
A large database defined as a single relation may result in data duplication. This repetition of data may
result in:
o Relations become very large.
o Data is hard to maintain and update, as doing so involves searching many records in the relation.
o Disk space and other resources are wasted or poorly utilized.
o The likelihood of errors and inconsistencies increases.
So to handle these problems, we should analyze and decompose the relations with redundant data into
smaller, simpler, and well-structured relations that satisfy desirable properties. Normalization is a
process of decomposing the relations into relations with fewer attributes.

What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to
eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
o Normalization divides a larger table into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table.

Why do we need Normalization?


The main reason for normalizing the relations is to remove insertion, update, and deletion anomalies.
Failure to eliminate anomalies leads to data redundancy and can cause data integrity and other problems
as the database grows. Normalization consists of a series of guidelines that help you create a good
database structure.

Data modification anomalies can be categorized into three types:


o Insertion Anomaly: The insertion anomaly refers to the situation where one cannot insert a new tuple
into a relation due to lack of data.
o Deletion Anomaly: The deletion anomaly refers to the situation where the deletion of data results in the
unintended loss of some other important data.
o Update Anomaly: The update anomaly is when an update of a single data value requires multiple
rows of data to be updated.

Types of Normal Forms:


Normalization works through a series of stages called normal forms. The normal forms apply to
individual relations. A relation is said to be in a particular normal form if it satisfies the constraints of that form.

Following are the various types of Normal forms:


Normal Form    Description

1NF            A relation is in 1NF if every attribute contains only atomic values.
2NF            A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF            A relation is in 3NF if it is in 2NF and no transitive dependency exists.
BCNF           A stronger definition of 3NF is known as Boyce-Codd normal form.
4NF            A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
5NF            A relation is in 5NF if it is in 4NF and does not contain any join dependency; joining should be lossless.

Advantages of Normalization:
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.

Disadvantages of Normalization:
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.

FIRST NORMAL FORM:


First Normal Form with Example
A relation is in first normal form if every attribute in that relation is a single-valued attribute. If a
relation contains a composite or multi-valued attribute, it violates the first normal form.

A table is in 1NF if:


 There are only Single Valued Attributes.
 Attribute Domain does not change.
 There is a unique name for every Attribute/Column.
 The order in which data is stored does not matter.
Consider the examples given below.
Example 1:
Relation STUDENT in table 1 is not in 1NF because of multi-valued attribute STUD_PHONE. Its
decomposition into 1NF has been shown in table 2.
Example 2:
ID Name Courses
1 A C1,C2
2 E C3
3 M C2,C3
In the above table, Courses is a multi-valued attribute, so the table is not in 1NF.
The table below is in 1NF as there is no multi-valued attribute:
ID Name Course
1 A C1
1 A C2
2 E C3
3 M C2
3 M C3
Note: A database design is considered as bad if it is not even in the First Normal Form (1NF).

FIRST NORMALIZATION FORM


What is FIRST NORMALIZATION FORM?
A database is said to be in 1NF if it falls under these conditions:
 It only consists of atomic values
 No repeating groups are present
Here, an atomic value refers to a value that cannot be divided further in a relational database.

Rules Followed in 1st Normal Form in DBMS


There are a few rules that the first normal form must follow in DBMS. These are:
#1. The Attributes must be Single Valued
Every column in your table must be single-valued. It means that no columns should have multiple
values in a single cell. In case we don’t have single values in a cell, we won’t be able to call it 1NF.
For instance, consider a table that stores data about novels and their writers with the following
columns: [Book ID], [Writer 1], [Writer 2], and [Writer 3]. In this case, [Writer 1], [Writer 2], and
[Writer 3] repeat the same attribute for a single Book ID; they do not describe different properties of the
book. Thus, this table would not be in 1NF.
#2. The Domain of attributes must not change
Every value stored in a given table column must be of the same type/kind. Random values should not
make up the table.
For instance, if a table consists of a column named DOB that saves the date of birth of various people, we
cannot use this column to save the names of these people. We need a separate column for that. Every
column in a DBMS table must hold values from a single domain.
#3. Every Column/ Attribute must have a Unique Name
A 1NF table expects that every column present in a table has a unique name of its own. This
way, it becomes feasible to avoid any confusion while the system is retrieving, editing, or adding data, or
performing any other operations on the table. If multiple columns had the same name, the system
could not tell them apart.
#4. The order of Data does not matter
The order in which we store the data in a table does not matter in 1NF. It is a simple way of storing
info- no shenanigans involved.

Example #1
Let us take a look at an example in the form of a table. Here, we can divide the values available in the first
row of the [Hues] column into pink and black. Thus, [TABLE_ITEMS] is not in 1NF.

TABLE_ITEMS:
Item No. Hues Cost
1 pink, black 15.99
2 red 23.99
3 black 17.50
4 red, grey 9.99
5 brown 29.99

The table here isn’t in the first normal form, since the column [Hues] can consist of multiple values in it.
For instance, the first row consists of pink and black, and the fourth row consists of red and grey.
How do we bring this table, which is in an unnormalized form, into the normalized form? We will split this
table into two separate ones. As a result, we will have to generate the following tables:

TABLE_ITEMS_HUES:
Item No. Hues
1 pink
1 black
2 red
3 black
4 red
4 grey
5 brown

TABLE_ITEMS_COSTS:
Item No. Cost
1 15.99
2 23.99
3 17.50
4 9.99
5 29.99
Here, both of these tables finally satisfy the first normal form, because every column in each of them
holds just a single value per row, and that is what we want from 1NF.
Remember that a repeating group refers to a set of two or more closely related columns in a table that
store multiple values of the same attribute.
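
The split performed above can be sketched in Python. This is only an illustration under simple assumptions: each row is a dictionary, the multi-valued cell is a comma-separated string, and the helper names split_multivalued and drop_column are hypothetical.

```python
# TABLE_ITEMS as given above, with [Hues] holding several values per cell.
items = [
    {"item_no": 1, "hues": "pink, black", "cost": 15.99},
    {"item_no": 2, "hues": "red", "cost": 23.99},
    {"item_no": 3, "hues": "black", "cost": 17.50},
    {"item_no": 4, "hues": "red, grey", "cost": 9.99},
    {"item_no": 5, "hues": "brown", "cost": 29.99},
]


def split_multivalued(rows, key, column, sep=","):
    """Emit one (key, value) row for each individual value in a multi-valued column."""
    result = []
    for row in rows:
        for value in row[column].split(sep):
            result.append({key: row[key], column: value.strip()})
    return result


def drop_column(rows, column):
    """Copy the rows without the given column."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


items_hues = split_multivalued(items, "item_no", "hues")   # TABLE_ITEMS_HUES
items_costs = drop_column(items, "hues")                   # TABLE_ITEMS_COSTS

for row in items_hues:
    print(row["item_no"], row["hues"])
```

Every cell in items_hues and items_costs now holds a single value, which is the 1NF requirement.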

Example #2
Look at this sample data in a table:

Serial_No. Titles Courses


11 Xkon CN, OS
12 Ykon Java
13 Zkon C++, C

Here, you can see there are multiple values in single cells of the Courses column. We can resolve it as follows:

Serial_No. Titles Courses


11 Xkon CN
11 Xkon OS
12 Ykon Java
13 Zkon C++
13 Zkon C

This way, although a few values are getting repeated, we can still see that there is just one value in every
cell.

Practice Problems On First Normal Form in DBMS


1. Decompose the following table into 1NF:

STUDENT_TABLE

Student_ID Student_Name Student_Hobbies Student_State


IX1 Reema Dancing, Painting Maharashtra
IX2 Rekha Cooking, Calligraphy Rajasthan
IX3 Jaya Skating Kerala
IX4 Sushma Painting, Skating Chhattisgarh
IX5 Tithi Poetry, Fencing Haryana
Answer:
STUDENT_TABLE

Student_ID Student_Name Student_Hobbies Student_State


IX1 Reema Dancing Maharashtra
IX1 Reema Painting Maharashtra
IX2 Rekha Cooking Rajasthan
IX2 Rekha Calligraphy Rajasthan
IX3 Jaya Skating Kerala
IX4 Sushma Painting Chhattisgarh
IX4 Sushma Skating Chhattisgarh
IX5 Tithi Poetry Haryana
IX5 Tithi Fencing Haryana

SECOND NORMAL FORM

What is the Second Normal Form in DBMS?

A relation is said to be in the 2nd Normal Form in DBMS (or 2NF) when it is in the First Normal
Form and has no non-prime attribute functionally dependent on a proper subset of any candidate key.
A non-prime attribute is an attribute that isn’t a part of any candidate key of the relation.


In simpler words, a relation is said to be in 2NF when it is in 1NF and every non-prime attribute
of the relation depends on every candidate key as a whole.
However, you must note that 2NF puts no restriction on the dependency of non-prime attributes on other
non-prime attributes. The Third Normal Form addresses that instead.

Uses of Second Normal Form in DBMS:


The concept of 2nd Normal Form in DBMS depends on full functional dependency. We apply 2NF
to relations that have composite keys, that is, a primary key consisting of two or more attributes.
Thus, a relation in 1NF whose primary key is a single attribute is automatically in 2NF.
Any relation that is not in 2NF may suffer from update anomalies.

Rules Followed in 2nd Normal Form in DBMS:


For a relation to be in the 2NF,
 It must be in 1NF;
 It should not consist of partial dependency.
In simpler words,
If a relation is in 1NF and all the non-key attributes are fully dependent on the primary
key, then this relation is known to be in the 2NF or the Second Normal Form.

Note: When a proper subset of a candidate key determines a non-prime attribute, we call it a partial
dependency.

How to Normalise 1NF to 2NF?


We remove the partial dependencies to normalize the given 1NF relations to 2NF relations. In
case there is a partial dependency, we remove the partially dependent attribute from the relation
and place it in a new relation with a copy of its determinant. Let us take a
look at a few examples to understand how.
Example #1
Look at the table given below:
CAND_ID SUBJECT_NO SUBJECT_FEE
111 S1 1000
222 S2 1500
111 S4 2000
444 S3 1000
444 S1 1000
222 S5 2000
In this table, you can note that many subjects come with the same subject fee. Three things are
happening here:
1. The SUBJECT_FEE alone won’t be able to determine the values of CAND_ID or SUBJECT_NO;
2. The SUBJECT_FEE along with CAND_ID won’t be able to determine the values of
SUBJECT_NO;
3. The SUBJECT_FEE along with SUBJECT_NO won’t be able to determine the values of
CAND_ID;
Thus,
We can conclude that the attribute SUBJECT_FEE is a non-prime one since it doesn’t belong to the
candidate key here, {SUBJECT_NO, CAND_ID};
But, on the other hand, SUBJECT_NO -> SUBJECT_FEE, meaning the SUBJECT_FEE
depends directly on the SUBJECT_NO, which is a proper subset of the candidate key.
Here, the SUBJECT_FEE is a non-prime attribute, and it depends directly on a proper subset of the
candidate key. Thus, it forms a partial dependency.
Conclusion: The relation mentioned here is not in 2NF.
Let us now convert it into 2NF. To do this, we will split this table into two, where:
Table 1: CAND_ID, SUBJECT_NO and Table 2: SUBJECT_NO, SUBJECT_FEE
Table 1
CAND_ID SUBJECT_NO
111 S1
222 S2
111 S4
444 S3
444 S1
222 S5
Table 2
SUBJECT_NO SUBJECT_FEE
S1 1000
S2 1500
S3 1000
S4 2000
S5 2000
Now, the tables are in their Second Normal Form.
Note: The Second Normal Form tries to keep redundant data from getting stored in the system’s
memory. For instance, if about 100 candidates take the S1 subject, then we don’t
have to store the fee of 1000 in a record for each of the 100 candidates. Rather, we store it once
in the second table, as the subject fee for S1 is 1000.
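
The reasoning in Example #1 can be replayed on the sample rows with a short Python sketch. The helper names fd_holds and project are hypothetical, and columns are addressed by position. Note the limits of such a check: sample data can only disprove a dependency; whether a dependency truly holds is a statement about the schema.

```python
# (candidate, SUBJECT_NO, SUBJECT_FEE) rows from the table in Example #1.
rows = [
    ("111", "S1", 1000),
    ("222", "S2", 1500),
    ("111", "S4", 2000),
    ("444", "S3", 1000),
    ("444", "S1", 1000),
    ("222", "S5", 2000),
]


def fd_holds(rows, lhs_idx, rhs_idx):
    """True if rows that agree on the lhs columns always agree on the rhs columns."""
    seen = {}
    for row in rows:
        lhs = tuple(row[i] for i in lhs_idx)
        rhs = tuple(row[i] for i in rhs_idx)
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True


def project(rows, idx):
    """Keep only the given columns and drop duplicate rows, preserving order."""
    return list(dict.fromkeys(tuple(row[i] for i in idx) for row in rows))


print(fd_holds(rows, [1], [2]))  # True: SUBJECT_NO -> SUBJECT_FEE (partial dependency)
print(fd_holds(rows, [0], [2]))  # False: the candidate alone does not fix the fee
print(fd_holds(rows, [2], [1]))  # False: the fee does not fix the subject

table_1 = project(rows, [0, 1])  # candidate, SUBJECT_NO
table_2 = project(rows, [1, 2])  # SUBJECT_NO, SUBJECT_FEE
print(len(table_1), len(table_2))  # 6 5
```

The projection onto (SUBJECT_NO, SUBJECT_FEE) has five rows instead of six because the fee for S1 is now stored only once.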
Let us take a look at another example here.
Example #2
Take a look at these functional dependencies in the relation R (M, N, O, P)
Here,
MN -> O [M and N determine O together]
NO -> P [N and O determine P together]
In the relation mentioned above, MN serves as the only candidate key. Also, no partial dependency exists
here. It means that the proper subsets of MN (M alone or N alone) do not determine any non-prime
attribute, so the relation is in 2NF.

Practice Problems on Second Normal Form in DBMS:


1. Decompose the following table into 2NF:

TUTOR table
TUTOR_ID COURSE TUTOR_AGE
2115 Java 30
2115 C 30
4997 Python 35
8663 C++ 38
8663 Go 38

Answer:
TUTOR_DETAIL table:
TUTOR_ID TUTOR_AGE
2115 30
4997 35
8663 38

TUTOR_COURSE table:
TUTOR_ID COURSE
2115 Java
2115 C
4997 Python
8663 C++
8663 Go

2. Decompose the following table into 2NF:


<Candidate_Courses>
Candidate_ID Course_ID Candidate_Name Course_Name
C829 A09 Beverly CSS
C736 A07 Sheldon PHP
C546 A03 Leonard HTML
C952 A05 Zach Ruby
Answer:
<Candidate_Info>
Candidate_ID Course_ID Candidate_Name
C829 A09 Beverly
C736 A07 Sheldon
C546 A03 Leonard
C952 A05 Zach

<Course_Info>
Course_ID Course_Name
A09 CSS
A07 PHP
A03 HTML
A05 Ruby

THIRD NORMAL FORM

What is the Third Normal Form in DBMS?


A given relation is said to be in its third normal form when it’s in 2NF and has no transitive
dependency. Meaning, when no transitive dependency exists for the attributes that are non-prime, then the
relation can be said to be in 3NF.
In simpler words,
In a relation that is in 2NF, when none of the non-primary key attributes transitively depend on
the primary key, then we can say that the relation is in the third normal form or 3NF.

Rules Followed in 3rd Normal Form in DBMS:


We can say that a relation is in the third normal form when, for every non-trivial functional
dependency P -> Q, at least one of these conditions holds:
1. P acts as a super key.
2. Q acts as a prime attribute. Meaning, every element of Q forms a part of some candidate key.

Uses of Third Normal Form in DBMS


We use the 3NF to reduce any duplication of data and achieve data integrity in a database. The third
normal form is fit for the designing of normal relational databases. It is because a majority of the 3NF
tables are free from the anomalies of deletion, updates, and insertion. Added to this, a decomposition
into 3NF that is both lossless and preserves the functional dependencies can always be found.

How to Normalize 1NF and 2NF to 3NF?


To normalize a 2NF to 3NF, we have to determine if we have a transitive dependency in the table. In
case a transitive dependency exists, then we remove those attributes that are transitively dependent from
the relations. We do this by placing these attributes in a separate, new relation. We also place the
determinant’s copy along with it.
Note: If P -> Q and Q -> R are two functional dependencies, then P -> R is known as a transitive
dependency. When normalizing a 2NF relation to 3NF, we remove these transitive dependencies.
Example #1
Look at the table given below for the relation CANDIDATE:
CAND_NO CAND_NAME CAND_STATE CAND_COUNTRY CAND_AGE
1 TINA MAHARASHTRA INDIA 18
2 ANJALI RAJASTHAN INDIA 17
3 RAHUL RAJASTHAN INDIA 19

In the relation CANDIDATE given above:


Functional dependency Set:
{CAND_NO -> CAND_NAME, CAND_NO -> CAND_STATE, CAND_STATE -> CAND_COUNTRY,
CAND_NO -> CAND_AGE}
So, Candidate key here would be: {CAND_NO}
For the relation given here in the table, CAND_NO -> CAND_STATE and CAND_STATE ->
CAND_COUNTRY are actually true.
Thus, CAND_COUNTRY depends transitively on CAND_NO.
This transitive dependency violates the rules of being in the 3NF. So, if we want to convert it into the
third normal form, then we have to decompose the relation CANDIDATE (CAND_NO, CAND_NAME,
CAND_STATE, CAND_COUNTRY, CAND_AGE) as:
CANDIDATE (CAND_NO, CAND_NAME, CAND_STATE, CAND_AGE)
STATE_COUNTRY (CAND_STATE, CAND_COUNTRY)

Example #2
Take a look at these functional dependencies in the relation A (P, Q, R, S, T)
Here,
P -> QR,
RS -> T,
Q -> S,
T -> P
In the relation given above, the candidate keys are P, T, RS, and QR. Every attribute belongs to
at least one candidate key, so the attributes that exist on the right sides of all the functional
dependencies are prime. Hence, the relation is in 3NF.
Practice Problems on Third Normal Form in DBMS


1. Decompose the following table into 3NF:

CANDIDATE_DETAIL Table:
CAND_ID CAND_NAME CAND_ZIP CAND_CITY CAND_STATE
262 Jake 201010 Noida UP
353 Rosa 02228 Boston US
434 Charles 60007 Chicago US
545 Gina 06389 Norwich UK
626 Terry 462007 Bhopal MP

Answer: The super keys in the table mentioned above would be:
{CAND_ID}, {CAND_ID, CAND_NAME}, {CAND_ID, CAND_NAME, CAND_ZIP} …. and so on
The candidate key here is: {CAND_ID}
Non-prime attributes: All the attributes in the table mentioned above are non-prime except CAND_ID.
Notice that CAND_CITY & CAND_STATE are dependent on CAND_ZIP, and CAND_ZIP is dependent
on the CAND_ID. Here, the non-prime attributes CAND_CITY and CAND_STATE are dependent
transitively on the candidate key (CAND_ID). The transitive dependency here would violate the rules of the
third normal form.
Thus, we must move the CAND_CITY and the CAND_STATE to the new table <CANDIDATE_ZIP>,
whose primary key is CAND_ZIP.
Thus,
CANDIDATE Table:
CAND_ID CAND_NAME CAND_ZIP
262 Jake 201010
353 Rosa 02228
434 Charles 60007
545 Gina 06389
626 Terry 462007

CANDIDATE_ZIP Table:
CAND_ZIP CAND_CITY CAND_STATE
201010 Noida UP
02228 Boston US
60007 Chicago US
06389 Norwich UK
462007 Bhopal MP
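
The decomposition above can be verified by projecting the original rows onto the two new schemas and joining them back on CAND_ZIP. The sketch below is an illustration in Python, assuming the rows of CANDIDATE_DETAIL as listed in the problem (ZIP codes are kept as strings so that leading zeros survive); the helper project is a hypothetical name.

```python
COLUMNS = ["CAND_ID", "CAND_NAME", "CAND_ZIP", "CAND_CITY", "CAND_STATE"]
DATA = [
    (262, "Jake", "201010", "Noida", "UP"),
    (353, "Rosa", "02228", "Boston", "US"),
    (434, "Charles", "60007", "Chicago", "US"),
    (545, "Gina", "06389", "Norwich", "UK"),
    (626, "Terry", "462007", "Bhopal", "MP"),
]
ROWS = [dict(zip(COLUMNS, values)) for values in DATA]


def project(rows, attrs):
    """Project rows onto `attrs`, dropping duplicate tuples."""
    projected = []
    for row in rows:
        item = {a: row[a] for a in attrs}
        if item not in projected:
            projected.append(item)
    return projected


# The two 3NF tables from the answer.
CANDIDATE = project(ROWS, ["CAND_ID", "CAND_NAME", "CAND_ZIP"])
CANDIDATE_ZIP = project(ROWS, ["CAND_ZIP", "CAND_CITY", "CAND_STATE"])

# Join back on the shared attribute CAND_ZIP (the key of CANDIDATE_ZIP).
ZIP_LOOKUP = {z["CAND_ZIP"]: z for z in CANDIDATE_ZIP}
REJOINED = [{**c, **ZIP_LOOKUP[c["CAND_ZIP"]]} for c in CANDIDATE]

print(REJOINED == ROWS)  # True: nothing was lost or added
```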

2. Decompose the following table into 3NF:


TABLE_BOOK_DETAIL
Book ID Genre ID Genre Type Price
111 564 Sports 23.99
222 842 Travel 18.99
333 564 Sports 13.99
444 179 Fashion 15.99
555 842 Travel 27.99

Answer:
TABLE_BOOK
Book ID Genre ID Price
111 564 23.99
222 842 18.99
333 564 13.99
444 179 15.99
555 842 27.99

TABLE_GENRE
Genre ID Genre Type
564 Sports
842 Travel
179 Fashion
BOYCE-CODD NORMAL FORM (BCNF):
Boyce-Codd Normal Form or BCNF is an extension to the third normal form, and is also known as
3.5 Normal Form.

Rules for BCNF:


For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two conditions:
1. It should be in the Third Normal Form.
2. And, for any dependency A → B, A should be a super key.
It means that for a dependency A → B, A cannot be a non-prime attribute if B is a prime attribute.

Time for an Example:


Below we have a college enrolment table with columns student_id, subject and professor.

student_id subject professor


101 Java P.Java
101 C++ P.Cpp
102 Java P.Java2
103 C# P.Chash
104 Java P.Java

As you can see, we have also added some sample data to the table.
In the table above:
 One student can enrol for multiple subjects. For example, student with student_id 101, has opted for
subjects - Java & C++
 For each subject, a professor is assigned to the student.
 And, there can be multiple professors teaching one subject like we have for Java.

What do you think should be the Primary Key?


Well, in the table above student_id, subject together form the primary key, because
using student_id and subject, we can find all the columns of the table.
One more important point to note here is, one professor teaches only one subject, but one subject may
have two different professors.
Hence, there is a dependency between subject and professor here, where subject depends on the
professor name.

Note:
 This table satisfies the 1st Normal form because all the values are atomic, column names are unique
and all the values stored in a particular column are of same domain.
 This table also satisfies the 2nd Normal Form as there is no Partial Dependency.
 And, there is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.
 But this table is not in Boyce-Codd Normal Form.

Why this table is not in BCNF?


In the table above, student_id, subject form primary key, which means subject column is a prime
attribute.
But, there is one more dependency, professor → subject.
And while subject is a prime attribute, professor is a non-prime attribute, which is not allowed by BCNF.
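
The BCNF rule can be applied mechanically: for every non-trivial dependency, check whether its determinant is a super key. The sketch below is an illustration in Python; the function names closure and bcnf_violations are assumptions, and the two dependencies are the ones described above for the enrolment table.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial FDs whose determinant is not a super key."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependencies never violate BCNF
        if closure(lhs, fds) != set(all_attrs):
            violations.append((set(lhs), set(rhs)))
    return violations


ATTRS = {"student_id", "subject", "professor"}
FDS = [
    ({"student_id", "subject"}, {"professor"}),  # the primary key determines professor
    ({"professor"}, {"subject"}),                # one professor teaches only one subject
]
VIOLATIONS = bcnf_violations(ATTRS, FDS)
print(VIOLATIONS)  # only professor -> subject is reported
```

Only professor → subject is reported, because the closure of {professor} is {professor, subject}, which does not include student_id.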

How to satisfy BCNF?


To make this relation(table) satisfy BCNF, we will decompose this table into two
tables, student table and professor table.
Below we have the structure for both the tables.
Student Table:
student_id p_id
101 1
101 2
and so on...

Professor Table:
p_id professor subject
1 P.Java Java
2 P.Cpp C++
and so on...

And now, these relations satisfy Boyce-Codd Normal Form.



MULTIVALUED DEPENDENCY AND FOURTH NORMAL FORM (4NF)

Multivalued Dependency:
Multivalued dependency occurs when two attributes in a table are independent of each other but,
both depend on a third attribute. A multivalued dependency consists of at least two attributes that are
dependent on a third attribute that's why it always requires at least three attributes.

Example: Suppose there is a bike manufacturer company which produces two colors(white and black) of
each model every year.
BIKE_MODEL MANUF_YEAR COLOR
M2001 2008 White
M2001 2008 Black
M3001 2013 White
M3001 2013 Black
M4006 2017 White
M4006 2017 Black

Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each
other.

In this case, these two columns are said to be multivalued dependent on BIKE_MODEL. The
representation of these dependencies is shown below:
1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR
This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL multidetermines
COLOR".
FOURTH NORMAL FORM (4NF)
What is Fourth Normal Form (4NF)?
A relation will be in 4NF if it is in Boyce-Codd normal form and has no non-trivial multi-valued
dependency.

What is Multi-valued Dependency?


A table is said to have multi-valued dependency, if the following conditions are true,
1. For a dependency A → B, if for a single value of A, multiple values of B exist, then the table may have
a multi-valued dependency.
2. Also, a table should have at least 3 columns for it to have a multi-valued dependency.
3. And, for a relation R(A,B,C), if there is a multi-valued dependency between A and B, then B and C
should be independent of each other.
If all these conditions are true for any relation (table), it is said to have a multi-valued dependency.

Example :
STUDENT:
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey

In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math,
and two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to
unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE:
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics

STUDENT_HOBBY:
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey

EXAMPLE 2:
Below we have a college enrolment table with columns s_id, course and hobby.
s_id course hobby
1 Science Cricket
1 Maths Hockey
2 C# Cricket
2 Php Hockey

As you can see in the table above, student with s_id 1 has opted for two courses, Science and Maths,
and has two hobbies, Cricket and Hockey.

You must be wondering what problem this can lead to.
The two records for the student with s_id 1 give rise to two more records, as shown below,
because the student has two hobbies, and each hobby has to be recorded along with each of
the courses.
s_id Course Hobby
1 Science Cricket
1 Maths Hockey
1 Science Hockey
1 Maths Cricket

And, in the table above, there is no relationship between the columns course and hobby. They are
independent of each other.
So there is a multi-valued dependency, which leads to unnecessary repetition of data and other anomalies as
well.
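
Whether a relation instance satisfies a multi-valued dependency can be tested directly: for each value of the determinant, the (course, hobby) pairs present must be exactly every combination of that student's courses and hobbies. The sketch below is an illustration in Python; the function name holds_mvd is an assumption, and only the rows for s_id 1 from the example are used.

```python
from itertools import product


def holds_mvd(rows, x, y, z):
    """Check X ->> Y on a relation instance with attributes X, Y and Z."""
    groups = {}
    for row in rows:
        groups.setdefault(row[x], set()).add((row[y], row[z]))
    for pairs in groups.values():
        y_values = {pair[0] for pair in pairs}
        z_values = {pair[1] for pair in pairs}
        # Y and Z are independent only if every combination is present.
        if pairs != set(product(y_values, z_values)):
            return False
    return True


TWO_ROWS = [
    {"s_id": 1, "course": "Science", "hobby": "Cricket"},
    {"s_id": 1, "course": "Maths", "hobby": "Hockey"},
]
FOUR_ROWS = TWO_ROWS + [
    {"s_id": 1, "course": "Science", "hobby": "Hockey"},
    {"s_id": 1, "course": "Maths", "hobby": "Cricket"},
]

print(holds_mvd(TWO_ROWS, "s_id", "course", "hobby"))   # False
print(holds_mvd(FOUR_ROWS, "s_id", "course", "hobby"))  # True
```

The two-row version fails the check, which is why the two extra records have to be stored for the table to stay consistent.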

How to satisfy 4th Normal Form?


To make the above relation satisfy the 4th normal form, we can decompose the table into 2 tables.

Course Opted Table:


s_id course
1 Science
1 Maths
2 C#
2 Php

And, Hobbies Table,


s_id Hobby
1 Cricket
1 Hockey
2 Cricket
2 Hockey

Now this relation satisfies the fourth normal form.


A table can also have functional dependency along with multi-valued dependency. In that case, the
functionally dependent columns are moved into a separate table and the multi-valued dependent columns are
moved into separate tables of their own.
JOIN DEPENDENCY
The concept of Join Dependency is closely tied to the concept of 5NF, or Fifth Normal Form.
Similar to functional or multivalued dependency, join dependency is a constraint. It is satisfied if and
only if the relation concerned is the join of a set of its projections.

What are Join Dependencies in DBMS?


A Join Dependency on a relation schema R, specifies a constraint on states, r of R that every legal
state r of R should have a lossless join decomposition into R1, R2,..., Rn. In a database management
system, join dependency is a generalization of the idea of multivalued dependency.
Let R be a relation schema and R1, R2,..., Rn be a decomposition of R. R is said to satisfy the join
dependency (R1, R2,..., Rn) if and only if every legal instance r(R) is equal to the join of its projections
on R1, R2,..., Rn.

Example of Join Dependency


Suppose we have the following table R:

E_Name Company Product


Rohan Comp1 Jeans
Harpreet Comp2 Jacket
Anant Comp3 TShirt

 We can break, or decompose the above table into three tables, this would mean that the table is not
in 5NF!
 The three decomposed tables would be:

1. R1: The table with columns E_Name and Company.

E_Name Company
Rohan Comp1
Harpreet Comp2
Anant Comp3

2. R2: The table with columns E_Name and Product.

E_Name Product
Rohan Jeans
Harpreet Jacket
Anant TShirt

3. R3: The table with columns Company and Product.

Company Product
Comp1 Jeans
Comp2 Jacket
Comp3 TShirt

Note: If the natural join of all three tables yields the relation table R, the relation will be said to have join
dependency.
Let's try to figure out whether or not R has join dependency.
Step 1- First, the natural join of R1 and R2:

E_Name Company Product


Rohan Comp1 Jeans
Harpreet Comp2 Jacket
Anant Comp3 TShirt

Step 2- Next, let's perform the natural join of the above table with R3:

E_Name Company Product


Rohan Comp1 Jeans
Harpreet Comp2 Jacket
Anant Comp3 TShirt

In the above example, we get the same table R back after performing the natural joins at both
steps.
Therefore, our join dependency comes out to be: {(E_Name, Company), (E_Name, Product),
(Company, Product)}
Because R satisfies this non-trivial join dependency, R is not in 5NF. That is, the natural join
of the three relations above is equal to our initial relation R.
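
The two join steps can be reproduced in code. The sketch below is an illustration in Python, assuming relations are held as lists of dictionaries; project, natural_join and satisfies_jd are hypothetical helper names. It projects R onto the three components, joins the projections, and compares the result with R.

```python
from functools import reduce


def project(rows, attrs):
    """Project rows onto `attrs`; each tuple is stored as sorted (name, value) pairs."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}


def natural_join(left, right):
    """Natural join of two relations produced by `project`."""
    joined = set()
    for l_tuple in left:
        for r_tuple in right:
            l_row, r_row = dict(l_tuple), dict(r_tuple)
            # Rows combine only if they agree on every shared attribute.
            if all(l_row[a] == r_row[a] for a in l_row.keys() & r_row.keys()):
                joined.add(tuple(sorted({**l_row, **r_row}.items())))
    return joined


def satisfies_jd(rows, components):
    """True if joining the projections onto `components` gives back `rows`."""
    pieces = [project(rows, attrs) for attrs in components]
    return reduce(natural_join, pieces) == project(rows, sorted(rows[0]))


R = [
    {"E_Name": "Rohan", "Company": "Comp1", "Product": "Jeans"},
    {"E_Name": "Harpreet", "Company": "Comp2", "Product": "Jacket"},
    {"E_Name": "Anant", "Company": "Comp3", "Product": "TShirt"},
]
COMPONENTS = [("E_Name", "Company"), ("E_Name", "Product"), ("Company", "Product")]

print(satisfies_jd(R, COMPONENTS))  # True
```

A check on one instance only shows that this particular state of R satisfies the join dependency; the constraint itself is a statement about every legal state.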

Join Dependencies and Fifth Normal Form (5NF)


 If a relation is in 4NF and does not contain any join dependencies, it is in 5NF.
 To avoid redundancy, 5NF is satisfied when all tables are divided into as many tables as possible.
Conclusion: if a relation has join dependency, it won't be in 5NF.

When is a Join Dependency trivial?


A Join Dependency is trivial, if one of the relation schemas Ri in a join dependency
(i.e. R1, R2,..., or Rn) is equal to the original relation R.
 A table has a join dependency if it can be reproduced by joining several tables, each of which
contains a subset of the table's attributes.
5NF is one of the highest levels of normalization in database design.

Fifth Normal Form (5NF)


A Relation is said to be in 5NF if both conditions are satisfied.
Conditions:
1) Relation should be already in 4NF
2) It cannot be further non-loss decomposed (Join Dependency should not be present)

What is Fifth Normal Form?


 The Fifth Normal Form (5NF) is also known as the Project-Join Normal Form (PJNF).
 5NF gets satisfied when the table is broken down into as many parts as possible to avoid data
redundancy.
Let’s have a look at the above 2 conditions for 5NF.
Relation Should be Already in 4NF
It should satisfy all the conditions of 4NF i.e
1. It should be in BCNF.
2. No multi-valued dependency should exist.

Non-Loss Decomposition
 A decomposition is called lossless (non-loss) when joining the decomposed tables gives back exactly
the original table.
 In other words, a table is in 5NF when it cannot be decomposed any further without loss, i.e. when
no join dependency is present in the table.
 When we decompose the given table to remove redundancy in the data and then compose it again to
create the original table, we should not lose any data, and the original table should be obtained exactly;
no loss should happen after the decomposition of the table.
 Join dependency for relation R can be stated as
R = (R1 ⨝ R2 ⨝ R3 ⨝ … ⨝ Rn)
where R1, R2, R3, …, Rn are sub-relations of R and ⨝ is the natural join operator.

Example
 Let's take a table R which has 3 columns, i.e. Subject, Class, and Teacher, where each subject can be
taught by many teachers in many classes, and a teacher can teach more than 1 subject.

Subject Class Teacher


Math Class 10 Kartik
Math Class 9 Yash
Math Class 10 Yash
Science Class 10 Yash

 Here the subject Math is taught by both teachers, Kartik and Yash. Also, Yash teaches both Math and
Science, and he teaches Math to both Class 9 and Class 10.
 As there is redundancy in the data, we will decompose it into two tables R1 and R2 such that R1 has
the attributes Subject and Class, and R2 has the attributes Class and Teacher.

Table R1
Subject Class
Math Class 9
Math Class 10
Science Class 10

 Here we removed the redundancy by dropping the extra tuple with the same values, i.e.
the subject Math taught in Class 10. This combination appears 2 times in the main table, but in table R1
it appears only once.

Table R2
Class Teacher
Class 10 Kartik
Class 9 Yash
Class 10 Yash

 Here we removed the redundancy by dropping the extra tuple with the same values, i.e.
Yash teaching Class 10. This combination appears 2 times in the main table, but in table R2
it appears only once.
After combining both tables R1 and R2 we will get as mentioned below:

Table (R1 ⨝ R2)
Subject Class Teacher
Math Class 9 Yash
Math Class 10 Kartik
Math Class 10 Yash
Science Class 10 Kartik
Science Class 10 Yash

 Comparing the newly composed table (R1 ⨝ R2) with the original table, we notice an extra tuple,
(Science, Class 10, Kartik), that did not exist in the original data. This breaks the second rule of 5NF,
i.e. non-loss decomposition.
 This type of unwanted tuple is known as a spurious tuple.
 To fix this, we project the given table onto another relation R3 that has 2 columns, i.e.
Subject and Teacher.

Table R3
Subject Teacher
Math Yash
Math Kartik
Science Yash

 The newly decomposed table R3 has only 3 tuples, as the repeated tuple (redundancy) is
not added to the table. Yash teaching the subject Math appears 2 times in the main table R, but here it
is added only once, which removes the redundancy.
 Now if we compose or rejoin the tables R1, R2, and R3 we will get

Table (R1 ⨝ R2⨝ R3)


Subject Class Teacher
Math Class 9 Yash
Math Class 10 Yash
Math Class 10 Kartik
Science Class 10 Yash
 Now if we compare the re-composed table with the original table, there is no loss of data and no
spurious tuple.
 The natural join of the tables R1, R2 and R3 results in the table R. After the natural
join, the original table is retained as it is.
 So it is a lossless (non-loss) decomposition.
 The given tables R1, R2 and R3 are in the Fifth Normal Form (5NF).
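
The spurious tuple and its removal by R3 can be reproduced with a few set comprehensions. This is a minimal sketch in Python, assuming each row of R is a (Subject, Class, Teacher) tuple; the variable names are chosen for illustration.

```python
# Original table R as (Subject, Class, Teacher) tuples.
R = {
    ("Math", "Class 10", "Kartik"),
    ("Math", "Class 9", "Yash"),
    ("Math", "Class 10", "Yash"),
    ("Science", "Class 10", "Yash"),
}

# Projections: duplicates disappear automatically because these are sets.
R1 = {(subject, cls) for subject, cls, teacher in R}
R2 = {(cls, teacher) for subject, cls, teacher in R}
R3 = {(subject, teacher) for subject, cls, teacher in R}

# Two-way natural join on Class.
R1_JOIN_R2 = {(s, c, t) for s, c in R1 for c2, t in R2 if c == c2}
SPURIOUS = R1_JOIN_R2 - R

# Joining with R3 keeps only the (Subject, Teacher) pairs that really exist.
THREE_WAY = {row for row in R1_JOIN_R2 if (row[0], row[2]) in R3}

print(SPURIOUS)        # {('Science', 'Class 10', 'Kartik')}
print(THREE_WAY == R)  # True
```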

Uses of Fifth Normal Form(5NF)


 5NF ensures that there will be no redundancy present in the database. Removing the redundancy in
the database helps the data to remain more optimized and easy to perform database actions.
 It also ensures that there will be non-lossy decomposition only which will result in data consistency
and data integrity.
 As data redundancy and anomalies are removed, the database performance gets enhanced.

Limitation of Fifth Normal Form(5NF)


 One of the biggest limitations of 5NF is the complexity of the database. Due to 5NF, a large number
of tables and relations gets created, which eventually increases the complexity of the database.
 Slow performance due to the large number of tables that have to be joined.
 The cost of implementation of 5NF is also high, as it increases the complexity of the database.

Which dependency is related to 5NF?


Join dependency is checked in the 5NF. A relation must be free of non-trivial join dependencies
to be in the fifth normal form.

What is 5NF also called?


5NF is also known as Project Join Normal Form (PJNF).

Is 5NF used?
As 5NF increases the complexity of the database, it is less frequently used in the industry.
Complexity is the reason for less use of 5NF.

MULTIVALUED DEPENDENCY:
A multivalued dependency (MVD) exists when the presence of one or more rows in a table implies the
presence of one or more other rows in that same table. A non-trivial multivalued dependency prevents
fourth normal form. A multivalued dependency involves at least three attributes of a table.
It is represented with the symbol "->->" in DBMS.
X->Y relates one value of X to one value of Y.
X->->Y (read as X multidetermines Y) relates one value of X to many values of Y.
A non-trivial MVD occurs when X->->Y and X->->Z, where Y and Z are independent of
each other. A non-trivial MVD produces redundancy.

FUNCTIONAL DEPENDENCY:
A Functional Dependency in DBMS is a fundamental concept that describes the relationship between
attributes (columns) in a table. It shows how the values in one or more attributes determine the value in
another. The functional dependency is a relationship that exists between two attributes. It typically exists
between the primary key and non-key attribute within a table.
X → Y
The left side of an FD is known as the determinant, and the right side is known as the dependent.
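
Whether a given set of rows is consistent with X → Y can be checked by making sure that no determinant value is paired with two different dependent values. The sketch below is an illustration in Python; the function name holds_fd and the lower-case column names are assumptions, and the rows are those of TABLE_BOOK_DETAIL from the practice problems earlier in this unit.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this determinant.
        if seen.setdefault(key, value) != value:
            return False
    return True


BOOKS = [
    {"book_id": 111, "genre_id": 564, "genre_type": "Sports", "price": 23.99},
    {"book_id": 222, "genre_id": 842, "genre_type": "Travel", "price": 18.99},
    {"book_id": 333, "genre_id": 564, "genre_type": "Sports", "price": 13.99},
    {"book_id": 444, "genre_id": 179, "genre_type": "Fashion", "price": 15.99},
    {"book_id": 555, "genre_id": 842, "genre_type": "Travel", "price": 27.99},
]

GENRE_DETERMINES_TYPE = holds_fd(BOOKS, ["genre_id"], ["genre_type"])
GENRE_DETERMINES_PRICE = holds_fd(BOOKS, ["genre_id"], ["price"])
print(GENRE_DETERMINES_TYPE)   # True
print(GENRE_DETERMINES_PRICE)  # False: genre 564 has two different prices
```

Such a check can only refute a dependency. A dependency that happens to hold on sample rows is still a design decision about all legal data, not something the sample proves.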

Types of Functional dependency:


1. Trivial functional dependency:
A → B has trivial functional dependency if B is a subset of A.
The following dependencies are also trivial like: A → A, B → B

Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.

2. Non-trivial functional dependency:


A → B has a non-trivial functional dependency if B is not a subset of A.
When the intersection of A and B is empty, A → B is called completely non-trivial.

Example:
ID → Name,
Name → DOB
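
The two definitions translate into simple set tests. The following is a small sketch in Python; the function name classify_fd and the label strings are assumptions made for illustration.

```python
def classify_fd(lhs, rhs):
    """Classify the dependency lhs -> rhs by comparing the two attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # B is a subset of A
    if lhs & rhs:
        return "non-trivial"             # B is not a subset of A, but they overlap
    return "completely non-trivial"      # A and B share no attribute


print(classify_fd({"Employee_Id", "Employee_Name"}, {"Employee_Id"}))  # trivial
print(classify_fd({"ID"}, {"Name"}))                                   # completely non-trivial
print(classify_fd({"ID", "Name"}, {"Name", "DOB"}))                    # non-trivial
```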

Differentiate lossless join decomposition and lossy join decomposition.

Lossless join decomposition:
1. The decomposition R1, R2, …, Rn of a relation schema R is said to be lossless if the
natural join of R1, R2, …, Rn results in the original relation R.
2. Formally, let R be a relation and R1, R2, R3, …, Rn be its decomposition. The decomposition is
lossless if
R1 ⨝ R2 ⨝ R3 .... ⨝ Rn = R
3. There is no loss of information, as the relation obtained after the natural join of the decompositions is
equivalent to the original relation. Thus, it is also referred to as non-additive join decomposition.
4. The common attribute of the sub-relations is a superkey of at least one of the sub-relations.

Lossy join decomposition:
1. The decomposition R1, R2, …, Rn of a relation schema R is said to be lossy if the
natural join of R1, R2, …, Rn adds extraneous tuples to the original relation R.
2. Formally, let R be a relation and R1, R2, R3, …, Rn be its decomposition. The decomposition is
lossy if
R ⊂ R1 ⨝ R2 ⨝ R3 .... ⨝ Rn
3. There is loss of information, as extraneous tuples are added to the relation after the natural join of the
decompositions. Thus, it is also referred to as careless decomposition.
4. The common attribute of the sub-relations is not a superkey of any of the sub-relations.
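
For a decomposition into exactly two sub-relations, the superkey condition in point 4 can be tested with an attribute closure: the closure of the common attributes must cover at least one of the two sub-relations. The sketch below is an illustration in Python; is_lossless_binary is a hypothetical helper name, and the attribute names in the first example are simplified from the CANDIDATE relation used earlier in this unit.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary test: the common attributes must be a superkey of r1 or of r2."""
    reachable = closure(set(r1) & set(r2), fds)
    return set(r1) <= reachable or set(r2) <= reachable


# CANDIDATE split into (candidate data) and (state, country).
CANDIDATE_FDS = [
    ({"CAND_NO"}, {"CAND_NAME", "CAND_STATE", "CAND_AGE"}),
    ({"CAND_STATE"}, {"CAND_COUNTRY"}),
]
LOSSLESS = is_lossless_binary(
    {"CAND_NO", "CAND_NAME", "CAND_STATE", "CAND_AGE"},
    {"CAND_STATE", "CAND_COUNTRY"},
    CANDIDATE_FDS,
)

# Subject/Class/Teacher split on Class, with no functional dependencies at all.
LOSSY = is_lossless_binary({"Subject", "Class"}, {"Class", "Teacher"}, [])

print(LOSSLESS)  # True
print(LOSSY)     # False
```

The test covers two-way splits only; decompositions into three or more sub-relations need a more general check.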
DEPENDENCY PRESERVATION
Dependency preservation in database management systems (DBMS) refers to the property of
ensuring that functional dependencies present in the original relation (table) are preserved when the relation
undergoes certain operations, such as decomposition or normalization.
Functional dependencies (FDs) describe the relationships between attributes within a relation. For
example, if in a relation R, the value of one attribute uniquely determines the value of another attribute, it is
represented as A → B, where A determines B.
When decomposing a relation into multiple smaller relations to achieve higher normal forms (like
2NF, 3NF, BCNF), it's important to ensure that the original functional dependencies are preserved. This
means that after decomposition, the functional dependencies that held in the original relation should still
hold in the decomposed relations.
Dependency preservation is important because it ensures that the information represented by the
original relation is not lost during the decomposition process. If dependency preservation is not maintained,
it can lead to anomalies and inconsistencies in the database.

There are techniques and algorithms to decompose relations while preserving dependencies, such as:
Lossless Decomposition: Decompose the relation into smaller relations in a way that allows us to
reconstruct the original relation using join operations without losing any information.
Dependency-Preserving Decomposition: Decompose the relation in such a way that all original
functional dependencies are preserved in the resulting relations.
Normalization: Normalize the relation into higher normal forms (such as 2NF, 3NF, BCNF) while
ensuring dependency preservation.
By ensuring dependency preservation, we maintain data integrity and consistency within the
database schema, which is crucial for the reliability and correctness of the data stored in the database.
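
A dependency X → Y is preserved by a decomposition if Y can be derived from X using only dependencies that can be checked inside the individual sub-relations. A common textbook procedure grows X by repeatedly taking closures restricted to each sub-relation. The sketch below is an illustration in Python; is_preserved is a hypothetical helper name, and the example uses a simplified version of the enrolment table from the BCNF section, split into (student_id, professor) and (professor, subject) without the p_id column.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(fd, fds, decomposition):
    """True if `fd` can be enforced using the sub-relations alone."""
    lhs, rhs = fd
    reachable = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            # Only attributes inside one sub-relation can be used and gained.
            gained = closure(reachable & schema, fds) & schema
            if not gained <= reachable:
                reachable |= gained
                changed = True
    return set(rhs) <= reachable


FDS = [
    ({"student_id", "subject"}, {"professor"}),
    ({"professor"}, {"subject"}),
]
DECOMPOSITION = [{"student_id", "professor"}, {"professor", "subject"}]

RESULTS = [is_preserved(fd, FDS, DECOMPOSITION) for fd in FDS]
print(RESULTS)  # [False, True]
```

The result illustrates a well-known trade-off: this BCNF decomposition is lossless, but the dependency {student_id, subject} → professor can no longer be checked without joining the two tables.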

Krithick Raj
