What Is Normalization

Normalization is a systematic process in database design aimed at eliminating anomalies by restructuring relations into normal forms, which include 1NF, 2NF, 3NF, 4NF, and 5NF. It helps maintain data consistency by addressing update, deletion, and insertion anomalies while offering advantages such as reduced database size and improved performance, but also presents challenges like increased complexity and the need for careful implementation. Each normal form builds upon the previous one, with specific criteria that must be met to ensure proper database structure.

Uploaded by

Shitiz Saini

What is Normalization?

Normalization is a process in which we systematically examine relations for anomalies and, when any are detected, remove them by splitting the relation into two or more new, related relations. In other words, normalization converts an un-normalized relation into a set of normalized relations. It proceeds through a sequence of stages called normal forms. A table or relation is in a particular normal form if it satisfies that form's set of constraints. The five normal forms covered here are 1NF, 2NF, 3NF, 4NF and 5NF, where NF stands for Normal Form. For the relational data model, the first normal form (1NF) is essential when creating relations; the remaining forms are optional. The normal forms serve as guidelines that help to create a good database.

Normalization in DBMS: Anomalies, Advantages, Disadvantages

If a database design is not done properly, it may cause several anomalies to occur in it.
Normalization is essential for removing various anomalies like:

Anomalies in Database

1) Update Anomalies: When several copies of the same data are scattered across the database without a proper relationship or link, an update may change some copies and miss others. This leaves the database in an inconsistent state.

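2) Deletion Anomalies: Deleting one piece of data unintentionally removes other facts that are stored in the same row. For example, if professor details are kept only in student rows, deleting the last student of a professor also deletes the information about that professor.

3) Insertion Anomalies: A fact cannot be inserted because it depends on a record that does not yet exist. For example, a new professor cannot be recorded until at least one student is assigned to that professor.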

Paying attention to these anomalies can help to maintain a consistent database.
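
An update anomaly can be reproduced in a few lines. The following is a minimal Python sketch using the standard-library sqlite3 module. The table name enrollment, its columns, and the sample rows are hypothetical and chosen only for illustration: the professor's name is repeated in every row, so an update that reaches only one row leaves two conflicting names for the same professor.

```python
import sqlite3

def distinct_names_after_partial_update():
    """Return the professor names stored for P1 after updating only one row."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical un-normalized table: prof_name is repeated per enrollment.
    conn.execute(
        "CREATE TABLE enrollment (student_id TEXT, prof_id TEXT, prof_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO enrollment VALUES (?, ?, ?)",
        [("S1", "P1", "Smith"), ("S2", "P1", "Smith")],
    )
    # The update reaches only one of the two copies of the name.
    conn.execute(
        "UPDATE enrollment SET prof_name = 'Smyth' "
        "WHERE prof_id = 'P1' AND student_id = 'S1'"
    )
    rows = conn.execute(
        "SELECT DISTINCT prof_name FROM enrollment "
        "WHERE prof_id = 'P1' ORDER BY prof_name"
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]

names = distinct_names_after_partial_update()
print(names)  # more than one name for the same professor means inconsistent data
```

If the professor's name were stored once in its own table, the same update would touch a single row and no inconsistency could arise.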

ADVANTAGES OF NORMALIZATION

The following points show why normalization is attractive in an RDBMS.

1) A smaller database can be maintained, because normalization eliminates duplicate data and so reduces the overall size of the database.

2) Performance can improve as a result of the previous point. As the database becomes smaller, passes through the data become shorter and faster, which improves response time.

3) Narrower tables are possible, because normalized tables have fewer columns, which allows more records per data page.

4) Fewer indexes per table make maintenance tasks such as index rebuilds faster.

5) Queries can join only the tables that are actually needed.

DISADVANTAGES OF NORMALIZATION

1) More tables to join: because data is spread across more tables, the need to join tables increases and queries become more tedious to write. The database also becomes harder to understand.

2) Tables contain codes rather than real data: repeated data is stored as codes that refer to a lookup table, so the lookup table must always be consulted to obtain the actual value.

3) The data model becomes difficult to query against, because it is optimized for applications rather than for ad hoc querying. (An ad hoc query is a query that cannot be determined before it is issued. It consists of SQL that is constructed dynamically, usually by desktop query tools.) Hence it is hard to model the database without knowing what the customer wants.

4) As the normal form becomes higher, more joins are needed and query performance tends to become slower.

5) Proper knowledge of the various normal forms is required to carry out normalization efficiently. Careless use may lead to a poor design filled with anomalies and data inconsistency.

1st Normal Form Definition

A relation is in first normal form if it satisfies the following conditions:

• It contains only atomic values.
• It has no repeating groups.

An atomic value is a value that cannot be divided further.

A repeating group means that a table contains two or more columns that are closely related because they store the same kind of value.

1st Normal Form Example


A university uses the following relation:
Student(Surname, Name, Skills)

The attribute Skills can contain multiple values, so the relation is not in first normal form.

The attributes Name and Surname, however, are atomic: each can contain only one value.
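
One way to bring such a relation into 1NF is to store each skill in its own row. The sketch below is an illustration only: it assumes that Skills is held as a comma-separated string, and the function name to_first_normal_form and the sample students are hypothetical.

```python
def to_first_normal_form(students):
    """Split a multi-valued Skills attribute into one row per skill.

    `students` is a list of (surname, name, skills) tuples where `skills`
    is assumed to be a comma-separated string.
    """
    rows = []
    for surname, name, skills in students:
        for skill in skills.split(","):
            skill = skill.strip()
            if skill:  # ignore empty entries such as a trailing comma
                rows.append((surname, name, skill))
    return rows

# Hypothetical sample data.
unnormalized = [
    ("Meier", "Anna", "SQL, Python"),
    ("Keller", "Ben", "Java"),
]
normalized = to_first_normal_form(unnormalized)
for row in normalized:
    print(row)
```

Every value in the result is atomic. In a real schema a student identifier would normally be used as the key instead of repeating the name in each row.
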
Second normal form (2NF)

Second normal form:

A relation is in second normal form if it is in 1NF and every non key attribute is fully functionally
dependent on the primary key.

A university uses the following relation:

Student(IDSt, StudentName, IDProf, ProfessorName, Grade)

The attributes IDSt and IDProf together form the composite primary key.

All attributes are single valued (1NF).

The following functional dependencies exist:

1. The attribute ProfessorName is functionally dependent on the attribute IDProf (IDProf --> ProfessorName).

2. The attribute StudentName is functionally dependent on IDSt (IDSt --> StudentName).

3. The attribute Grade is fully functionally dependent on IDSt and IDProf together (IDSt, IDProf --> Grade).

The table in this example is in first normal form (1NF), since all attributes are single valued, but it is not yet in 2NF, because two non-key attributes depend on only a subset of the primary key:

1. ProfessorName depends only on IDProf (IDProf --> ProfessorName).

2. StudentName depends only on IDSt (IDSt --> StudentName).

To solve this problem, we keep IDSt and StudentName in the table Student and create a new table Professor with the key IDProf and the attribute ProfessorName. A third table, Grade, is necessary to combine the two relations Student and Professor and to manage the grades. Besides the grade, it contains only the two IDs of the student and the professor. If a student is now deleted, we do not lose the information about the professor.
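
The three-table design can be sketched with Python's sqlite3 module. Table and column names follow the example; the sample rows, the column types, and the ON DELETE CASCADE rule are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Student   (IDSt   INTEGER PRIMARY KEY, StudentName   TEXT);
    CREATE TABLE Professor (IDProf INTEGER PRIMARY KEY, ProfessorName TEXT);
    -- Grade depends on the whole key (IDSt, IDProf).
    CREATE TABLE Grade (
        IDSt   INTEGER REFERENCES Student(IDSt) ON DELETE CASCADE,
        IDProf INTEGER REFERENCES Professor(IDProf),
        Grade  INTEGER,
        PRIMARY KEY (IDSt, IDProf)
    );
    -- Hypothetical sample rows.
    INSERT INTO Student   VALUES (1, 'Anna');
    INSERT INTO Professor VALUES (10, 'Schmid');
    INSERT INTO Grade     VALUES (1, 10, 5);
""")

# Deleting the student removes the student's grades but not the professor.
conn.execute("DELETE FROM Student WHERE IDSt = 1")
conn.commit()
professors = conn.execute("SELECT IDProf, ProfessorName FROM Professor").fetchall()
grade_count = conn.execute("SELECT COUNT(*) FROM Grade").fetchone()[0]
print(professors, grade_count)
conn.close()
```

In the original single-table design the same deletion would have removed the only row that held the professor's name.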

Third normal form (3NF)


Third normal form:

A relation is in third normal form if it is in 2NF and no non key attribute is transitively dependent
on the primary key.

A bank uses the following relation:

Vendor(ID, Name, Account_No, Bank_Code_No, Bank)

The attribute ID is the identification key. All attributes are single valued (1NF). The table is also
in 2NF.

The following dependencies exist:

1. Name, Account_No, Bank_Code_No are functionally dependent on ID (ID --> Name, Account_No, Bank_Code_No)

2. Bank is functionally dependent on Bank_Code_No (Bank_Code_No --> Bank)


Example Third normal form

The table in this example is in 1NF and in 2NF. But Bank depends on Bank_Code_No, which is not the primary key of this relation, so Bank is transitively dependent on ID (ID --> Bank_Code_No --> Bank). To get to the third normal form (3NF), we have to put the bank name in a separate table together with the bank code number (Bank_Code_No) that identifies it.
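
The transitive dependency can be checked on sample rows and then removed by projection. This is a minimal Python sketch: the helper functions holds and project and the sample vendor rows are hypothetical, and a dependency that holds on sample rows is only evidence, not proof, that it holds in general.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs --> rhs holds in `rows`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same left-hand side must always map to the same right-hand side.
        if seen.setdefault(key, value) != value:
            return False
    return True

def project(rows, attrs):
    """Project `rows` onto `attrs`, dropping duplicate tuples (order kept)."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result

# Hypothetical sample rows for Vendor(ID, Name, Account_No, Bank_Code_No, Bank).
vendor = [
    {"ID": 1, "Name": "Acme", "Account_No": "A-100", "Bank_Code_No": 700, "Bank": "First Bank"},
    {"ID": 2, "Name": "Bolt", "Account_No": "B-200", "Bank_Code_No": 700, "Bank": "First Bank"},
    {"ID": 3, "Name": "Core", "Account_No": "C-300", "Bank_Code_No": 800, "Bank": "City Bank"},
]

# ID --> Bank_Code_No and Bank_Code_No --> Bank, so Bank depends on ID transitively.
transitive = holds(vendor, ["ID"], ["Bank_Code_No"]) and holds(vendor, ["Bank_Code_No"], ["Bank"])

# 3NF decomposition: move Bank into its own relation keyed by Bank_Code_No.
vendor_3nf = project(vendor, ["ID", "Name", "Account_No", "Bank_Code_No"])
bank = project(vendor, ["Bank_Code_No", "Bank"])
print(transitive, len(vendor_3nf), len(bank))
```

After the split, each bank name is stored once per bank code rather than once per vendor.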

Fourth Normal Form (4NF)


When attributes in a relation have a multi-valued dependency, further normalization to 4NF and 5NF is required. Let us first find out what a multi-valued dependency is.
A multi-valued dependency is a kind of dependency in which one attribute determines a set of values of another attribute independently of the remaining attributes, and no single attribute of the relation is a unique primary key.
We will illustrate this with an example. Consider a vendor supplying many items to many projects in an organization. The following are the assumptions:
1. A vendor is capable of supplying many items.
2. A project uses many items.
3. A vendor supplies to many projects.
4. An item may be supplied by many vendors.
A multi-valued dependency exists here because a vendor determines a set of items and, independently, a set of projects, and yet no single attribute is a primary key with unique values.
Vendor Code   Item Code   Project No.
V1            I1          P1
V1            I2          P1
V1            I1          P3
V1            I2          P3
V2            I2          P1
V2            I3          P1
V3            I1          P2
V3            I1          P3
The given relation has a number of problems. For example:
1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row with
a blank for item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.
Observe that the given relation is in 3NF and also in BCNF, yet it still has the problems mentioned above. The problems are reduced by expressing this relation as two relations in the Fourth Normal Form (4NF). A relation is in 4NF if it has no more than one independent multi-valued dependency, or one independent multi-valued dependency together with a functional dependency.
The table can be expressed as the two 4NF relations given below. The fact that vendors are capable of supplying certain items and the fact that they are assigned to supply some projects are specified independently in the 4NF relations.
Vendor-Supply
Vendor Code   Item Code
V1            I1
V1            I2
V2            I2
V2            I3
V3            I1
Vendor-Project
Vendor Code   Project No.
V1            P1
V1            P3
V2            P1
V3            P2
V3            P3
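
Whether such a split loses information can be checked by projecting the original relation and joining the projections again on Vendor Code. The Python sketch below uses the eight rows of the original Vendor, Item, Project relation; the function names project and natural_join_on_vendor are hypothetical.

```python
# Rows of the original relation: (vendor_code, item_code, project_no).
relation = {
    ("V1", "I1", "P1"), ("V1", "I2", "P1"), ("V1", "I1", "P3"), ("V1", "I2", "P3"),
    ("V2", "I2", "P1"), ("V2", "I3", "P1"),
    ("V3", "I1", "P2"), ("V3", "I1", "P3"),
}

def project(rows, positions):
    """Project a set of tuples onto the given column positions."""
    return {tuple(row[i] for i in positions) for row in rows}

def natural_join_on_vendor(vendor_item, vendor_project):
    """Join two binary relations on their shared first column (vendor code)."""
    return {
        (vendor, item, proj)
        for vendor, item in vendor_item
        for other_vendor, proj in vendor_project
        if vendor == other_vendor
    }

vendor_supply = project(relation, (0, 1))   # Vendor Code, Item Code
vendor_project = project(relation, (0, 2))  # Vendor Code, Project No.
rejoined = natural_join_on_vendor(vendor_supply, vendor_project)

# True means the decomposition is lossless for these rows.
print(rejoined == relation)
```

Because the projections are sets, item I1 is stored only once for vendor V3 in Vendor-Supply, which removes the duplication noted earlier.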

Fifth Normal Form (5NF)


These relations still have a problem. While defining 4NF we mentioned that all the attributes depend upon each other. In creating the two 4NF tables we preserved the dependency between Vendor Code and Item Code in the first table and between Vendor Code and Project No. in the second table, but we lost the relationship between Item Code and Project No. If there were a primary key, this loss of dependency would not have occurred. To restore this relationship we must add a new table like the following. Note that in the entire process of normalization, this is the only step where a new table is created by joining two attributes rather than by splitting a relation into separate tables.
Project No.   Item Code
P1            I1
P1            I2
P2            I1
P3            I1
P3            I3
