2023_IT_22IT405_U3-LM10
IV. No Redundancy
V. No modification anomaly
Relations should be designed such that their tuples will have as few NULL values
as possible.
Attributes that are frequently NULL can be placed in separate relations (with the
primary key).
Bad designs for a relational database may produce erroneous results for certain
JOIN operations.
The “lossless join” property is used to guarantee meaningful results for join
operations.
The relations should be designed to satisfy the lossless join condition.
No spurious tuples should be generated by doing a natural-join of any relations.
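The effect of a bad decomposition on a natural join can be sketched in a few lines of Python. The relation R(emp, project, location) and its rows are hypothetical and only serve as an illustration; relations are modelled as sets of tuples.

```python
# Hypothetical relation R(emp, project, location); the rows are illustrative only.
R = {
    ("Asha", "P1", "Chennai"),
    ("Ravi", "P2", "Chennai"),
}

def project(rows, idxs):
    """Relational projection: keep the columns at the given positions."""
    return {tuple(r[i] for i in idxs) for r in rows}

# Bad decomposition: the parts share only 'location', which is not a key.
bad_1 = project(R, (0, 2))   # (emp, location)
bad_2 = project(R, (1, 2))   # (project, location)
bad_join = {(e, p, loc) for (e, loc) in bad_1 for (p, loc2) in bad_2 if loc == loc2}
spurious = bad_join - R      # tuples that were never in R

# Better decomposition: the parts share 'emp', which identifies a row of R.
good_1 = project(R, (0, 1))  # (emp, project)
good_2 = project(R, (0, 2))  # (emp, location)
good_join = {(e, p, loc) for (e, p) in good_1 for (e2, loc) in good_2 if e == e2}

print(sorted(spurious))
print(good_join == R)
```

Joining on location pairs every employee with every project in the same city, so tuples appear that were never stored. Joining on emp returns exactly the original relation.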
Redundancy is storing the same data item in more than one place.
1. Extra storage space: storing the same data in many places takes a large amount of
disk space.
2. Entering the same data more than once during data insertion.
3. Deleting data from more than one place during deletion.
4. Modifying data in more than one place.
5. Anomalies may occur in the database if insertion, deletion, modification, etc. are not
done properly. This creates inconsistency and unreliability in the database.
Every schema must guarantee that the data stays consistent when the data is
modified.
Types of anomaly:
1. Update Anomalies
2. Deletion Anomalies
3. Insert Anomalies
2. NORMALIZATION
DBMS Normalization is a systematic approach to decomposing (breaking down)
tables to eliminate data redundancy (repetition) and undesirable characteristics such as
Insertion, Update, and Deletion anomalies.
If a table is not properly normalized and has data redundancy (repetition), it will
not only eat up extra memory space but will also make it difficult to handle and
update the data in the database without losing data.
Insertion, Updation, and Deletion Anomalies are very frequent if the database is not
normalized.
To understand these anomalies let us take an example of a Student table.
In the table above, we have data for four Computer Science students.
As we can see, data for the fields branch, hod (Head of Department),
and office_tel are repeated for the students who are in the same branch in the college.
This is Data Redundancy.
Suppose for a new admission, until and unless a student opts for a branch, data of
the student cannot be inserted, or else we will have to set the branch information
as NULL.
Also, if we have to insert data for 100 students of the same branch, then the branch
information will be repeated for all those 100 students.
These scenarios are nothing but Insertion anomalies.
If you have to repeat the same data in every row of data, it's better to keep the data
separately and reference that data in each row.
So in the above table, we can keep the branch information separately, and just use
the branch_id in the student table, where branch_id can be used to get the branch
information.
What if Mr. X leaves the college, or Mr. X is no longer the HOD of the computer
science department? In that case, all the student records will have to be updated,
and if by mistake we miss any record, it will lead to data inconsistency.
This is an Updation anomaly because you need to update all the records in your
table just because one piece of information got changed.
In our Student table, two different pieces of information are kept together,
the Student information and the Branch information.
So if only a single student is enrolled in a branch, and that student leaves the
college, or for some reason the entry for the student is deleted, we will lose the
branch information too. This is a Deletion anomaly.
So in a DBMS we should never keep two different entities together in one table, which in
the above example are Student and Branch.
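The separation of Student and Branch described above can be sketched with Python's built-in sqlite3 module. The column names branch, hod, office_tel and branch_id come from the text; rollno, name and all row values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch (
    branch_id  INTEGER PRIMARY KEY,
    branch     TEXT NOT NULL,
    hod        TEXT,
    office_tel TEXT
);
CREATE TABLE student (
    rollno    INTEGER PRIMARY KEY,                  -- hypothetical column
    name      TEXT NOT NULL,                        -- hypothetical column
    branch_id INTEGER REFERENCES branch(branch_id)  -- NULL until a branch is chosen
);
INSERT INTO branch VALUES (1, 'CSE', 'Mr. X', '00000');
INSERT INTO student VALUES (401, 'Student A', 1), (402, 'Student B', 1);
""")

# Insertion: a new student can be stored before opting for a branch.
conn.execute("INSERT INTO student VALUES (403, 'Student C', NULL)")

# Update: a change of HOD touches exactly one row, however many students exist.
rows_updated = conn.execute(
    "UPDATE branch SET hod = 'Mr. Y' WHERE branch_id = 1"
).rowcount

# Deletion: removing every student does not lose the branch information.
conn.execute("DELETE FROM student")
branches_left = conn.execute("SELECT COUNT(*) FROM branch").fetchone()[0]

print(rows_updated, branches_left)
```

Because the branch details live in one row of their own table, each of the three anomalies discussed above is avoided for this data.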
1NF requires that each column in a table contains atomic values and that each row
is uniquely identified. This means that a table cannot have repeating groups or arrays as
columns, and each row must have a unique primary key.
Example
A table is in 1NF if each column contains atomic values and each row is uniquely
identified. For example, a table that lists customers and their phone numbers –
This violates 1NF because the Phone Numbers column contains repeating groups.
To normalize this table to 1NF, we can split the Phone Numbers column into
separate rows and add a separate primary key column –
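The split into one phone number per row can be sketched as follows. The customer rows and the surrogate key name phone_id are hypothetical; only the idea of a multi-valued Phone Numbers column comes from the text.

```python
# Hypothetical un-normalized rows: (customer_id, name, phone_numbers)
unnormalized = [
    (1, "Customer A", "555-0101, 555-0102"),
    (2, "Customer B", "555-0103"),
]

def to_1nf(rows):
    """Emit one row per phone number, with a new key column in front."""
    result = []
    phone_id = 1  # hypothetical surrogate primary key
    for customer_id, name, phones in rows:
        for phone in phones.split(","):
            result.append((phone_id, customer_id, name, phone.strip()))
            phone_id += 1
    return result

normalized = to_1nf(unnormalized)
for row in normalized:
    print(row)
```

Every column now holds a single atomic value, and each row is identified by phone_id.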
3NF builds on 2NF by requiring that each non-primary key column in a table is not
transitively dependent on the primary key. This means that a table should not have
transitive dependencies, where a non-primary key column depends on another non-primary
key column.
Example
To explain 3NF further, let's consider an example of a table that lists customer
orders –
Now, the "Customer City" column is no longer transitively dependent on the
primary key and is instead in a separate table that has a direct relationship with the primary
key. This makes the table 3NF-compliant.
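A transitive dependency of this kind can be detected and removed with a small sketch like the one below. The column names order_id and customer_id and all row values are assumptions; only "Customer City" is named in the text. Note that checking sample rows can refute a dependency but cannot prove that it holds in general.

```python
# Hypothetical customer-orders rows; the values are illustrative only.
orders = [
    {"order_id": 1, "customer_id": "C1", "customer_city": "Chennai"},
    {"order_id": 2, "customer_id": "C1", "customer_city": "Chennai"},
    {"order_id": 3, "customer_id": "C2", "customer_city": "Madurai"},
]

def fd_holds(rows, lhs, rhs):
    """True if equal lhs values always come with equal rhs values in these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

# order_id -> customer_id -> customer_city: the city depends on the key
# only through customer_id, a non-primary key column.
transitive = fd_holds(orders, ["customer_id"], ["customer_city"])

# 3NF decomposition: move the city into a table keyed by customer_id.
orders_3nf = [{"order_id": r["order_id"], "customer_id": r["customer_id"]} for r in orders]
customers = {r["customer_id"]: r["customer_city"] for r in orders}

print(transitive, customers)
```

After the split, each city is stored once per customer instead of once per order.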
BCNF is a stricter form of 3NF; the two can differ only for tables with more than one
candidate key. BCNF requires that the determinant (left-hand side) of every non-trivial
functional dependency in a table is a candidate key (more precisely, a superkey). This
means that a table should not have any non-trivial dependency whose determinant is not a
candidate key, for example a non-primary key column that depends on another non-primary
key column. BCNF ensures that each table in a database describes a separate entity and
eliminates the redundancy caused by functional dependencies.
Example
A table is in BCNF if each determinant is a candidate key. In other words, every
non-trivial functional dependency in the table must be on a candidate key. For example,
consider a table that lists information about books and their authors −
In this example, the functional dependency of "Author Name" on "Author ID"
violates BCNF because its determinant, "Author ID", is not a candidate key of the table.
To bring this table to BCNF, we can split it into two tables –
Now, the "Author Name" and "Author Nationality" columns depend only on "Author ID",
which is the key of the table that holds them, so every determinant is a candidate key
and both tables are in BCNF.
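The rule "every determinant is a candidate key" can be checked mechanically by computing attribute closures. The sketch below is a minimal illustration, not a full normalization tool. The columns book_id and title, and the exact set of dependencies, are assumptions; the author columns follow the text.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Non-trivial FDs whose left-hand side is not a superkey of the schema."""
    schema = set(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)            # non-trivial dependency
        and not schema <= closure(lhs, fds)    # determinant is not a superkey
    ]

# Hypothetical books table and its assumed dependencies.
schema = ["book_id", "title", "author_id", "author_name", "author_nationality"]
fds = [
    (("book_id",), ("title", "author_id")),
    (("author_id",), ("author_name", "author_nationality")),
]
violations = bcnf_violations(schema, fds)

# After the split, each table is checked with the dependencies that apply to it.
books_ok = bcnf_violations(["book_id", "title", "author_id"], fds[:1])
authors_ok = bcnf_violations(["author_id", "author_name", "author_nationality"], fds[1:])

print(violations, books_ok, authors_ok)
```

Only the dependency with determinant author_id is reported for the combined table, and neither of the two split tables reports a violation.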
4NF builds on BCNF by requiring that a table should not have non-trivial multi-valued
dependencies, except those whose determinant is a candidate key. A multi-valued dependency
occurs when one column (or set of columns) determines a set of values of another column,
and that set is independent of the remaining columns of the table. For example, a table
that lists customer orders with columns for order ID, customer ID and order items
violates 4NF because the set of items in an order is determined by the order ID alone and
is independent of the customer ID.
Similarly, a table that lists orders and their products, with columns for order ID,
product ID, and product details, violates 4NF (it is not even in 2NF) because the product
details depend on the product ID alone and are repeated for every order that contains the
product.
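Whether a multi-valued dependency holds in a given set of rows can be tested directly: for each determinant value, every dependent value must appear with every value of the remaining columns. The sketch below applies this to the customer-orders table with columns for order ID, customer ID and order items; the row values are hypothetical, and the check only describes these sample rows, not the schema in general.

```python
from itertools import product

def mvd_holds(rows, x, y, z):
    """Check X ->> Y on rows whose columns are exactly X, Y and Z."""
    groups = {}
    for row in rows:
        xv = tuple(row[a] for a in x)
        yv = tuple(row[a] for a in y)
        zv = tuple(row[a] for a in z)
        groups.setdefault(xv, set()).add((yv, zv))
    for pairs in groups.values():
        ys = {yv for yv, _ in pairs}
        zs = {zv for _, zv in pairs}
        # Every Y value must be combined with every Z value.
        if pairs != set(product(ys, zs)):
            return False
    return True

# Hypothetical rows of the customer-orders table.
orders = [
    {"order_id": 1, "customer_id": "C1", "item": "pen"},
    {"order_id": 1, "customer_id": "C1", "item": "book"},
    {"order_id": 2, "customer_id": "C2", "item": "pen"},
]
items_independent = mvd_holds(orders, ["order_id"], ["item"], ["customer_id"])

# Decomposition that separates the two independent facts about an order.
order_customer = {(r["order_id"], r["customer_id"]) for r in orders}
order_item = {(r["order_id"], r["item"]) for r in orders}
rejoined = {(o, c, i) for o, c in order_customer for o2, i in order_item if o == o2}
original = {(r["order_id"], r["customer_id"], r["item"]) for r in orders}

print(items_independent, rejoined == original)
```

Since the items of an order do not depend on the customer, the two smaller tables join back to the original rows without loss.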
Example
Consider the following table of orders and products
In this table, the product name and description are determined by the product ID alone,
so they are repeated for every order that contains the product. To bring the table into
4NF, we can split it into three tables −
2.3.6 Fifth Normal Form (5NF)
o A relation is in 5NF if it is in 4NF, does not contain any join dependency, and
joining its decomposed tables is lossless.
o 5NF is satisfied when all the tables are broken into as many tables as possible in
order to avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).
Example
In the above table, John takes both the Computer and Math classes for Semester 1, but he
does not take the Math class for Semester 2. In this case, the combination of all these
fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject
or who will be taking that subject, so we leave Lecturer and Subject as NULL. But all
three columns together act as the primary key, so we cannot leave the other two columns
blank. So, to bring the above table into 5NF, we can decompose it into three relations
P1, P2 & P3:
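A decomposition into three pairwise projections, and the check that joining them is lossless, can be sketched as below. The rows are hypothetical (only John, Computer, Math and the semesters are named in the text), and the choice of column pairs for P1, P2 and P3 is an assumption.

```python
# Hypothetical rows of (subject, lecturer, semester); illustrative only.
R = {
    ("Computer", "Lecturer A", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Lecturer B", "Semester 2"),
}

# Assumed projections onto each pair of columns.
P1 = {(sem, subj) for subj, lect, sem in R}   # (semester, subject)
P2 = {(subj, lect) for subj, lect, sem in R}  # (subject, lecturer)
P3 = {(sem, lect) for subj, lect, sem in R}   # (semester, lecturer)

# Joining only P1 and P2 on subject produces spurious tuples.
two_way = {(subj, lect, sem) for sem, subj in P1 for subj2, lect in P2 if subj == subj2}

# Joining with P3 as well filters them out again.
three_way = {(subj, lect, sem) for subj, lect, sem in two_way if (sem, lect) in P3}

print(sorted(two_way - R))
print(three_way == R)
```

For these rows, no two of the projections are enough to rebuild the table, but all three together are. A new semester can also be recorded in P1 or P3 as soon as one other fact about it is known, without a NULL in a three-column key.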