What Is Normalization
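Normalization is the process of organizing the tables and columns of a relational database so that each fact is stored in only one place. If a database is not designed properly, several kinds of anomalies can occur in it, and normalization is essential for removing them.
Anomalies in a Database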
1) Update Anomalies: When several copies of the same data are scattered across the database without a proper relationship between them, an update may change some of the copies but not others. This leaves the database in an inconsistent state.
2) Deletion Anomalies: Deleting data can remove more or less than intended. The deletion may be incomplete, leaving residual copies of the data elsewhere that the database creator is unaware of; or deleting one fact may unintentionally destroy another fact stored in the same row.
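3) Insertion Anomalies: These occur when data cannot be inserted because it depends on another record that does not exist yet; for example, if professors are stored only in rows that also hold a student's grade, a new professor cannot be recorded until some student has a grade from them.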
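The following Python sketch illustrates the three anomalies on a small, hypothetical unnormalized table. The column names follow the student/professor example used later in these notes; the rows, the names Ann, Ben, Cara, Smith, Jones and Lee, and the helpers `professor_names` and `insert` are invented for illustration. The deletion anomaly is shown in its loss-of-information form.

```python
# Hypothetical unnormalized table (all values invented for illustration).
# The primary key is (IDSt, IDProf); the professor's name is repeated per row.
KEY = ("IDSt", "IDProf")
rows = [
    {"IDSt": 1, "StudentName": "Ann", "IDProf": 1, "Professor": "Smith", "Grade": 5},
    {"IDSt": 2, "StudentName": "Ben", "IDProf": 1, "Professor": "Smith", "Grade": 4},
    {"IDSt": 3, "StudentName": "Cara", "IDProf": 2, "Professor": "Jones", "Grade": 6},
]


def professor_names(table, id_prof):
    """Distinct names stored for one professor, sorted for stable output."""
    return sorted({r["Professor"] for r in table if r["IDProf"] == id_prof})


def insert(table, row):
    """Append a row, rejecting it if any primary-key column is missing."""
    if any(row.get(col) is None for col in KEY):
        raise ValueError("primary key column missing")
    table.append(row)


# 1) Update anomaly: only one of the two copies of professor 1 is renamed.
updated = [dict(r) for r in rows]
updated[0]["Professor"] = "Smith-Brown"
print(professor_names(updated, 1))  # ['Smith', 'Smith-Brown']

# 2) Deletion anomaly: deleting student 3 also deletes the only record of professor 2.
after_delete = [r for r in rows if r["IDSt"] != 3]
print(professor_names(after_delete, 2))  # []

# 3) Insertion anomaly: a new professor without students has no IDSt value.
try:
    insert(rows, {"IDSt": None, "StudentName": None, "IDProf": 3,
                  "Professor": "Lee", "Grade": None})
except ValueError as err:
    print("insert rejected:", err)  # insert rejected: primary key column missing
```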
ADVANTAGES OF NORMALIZATION
1) A smaller database can be maintained: normalization eliminates duplicate data, so the overall size of the database is reduced.
2) Better performance can follow from the point above. As a database becomes smaller, passes through the data become faster and shorter, which can improve response time and speed.
3) Narrower tables are possible, because normalized tables are fine-tuned and have fewer columns, which allows more data records per page.
4) Fewer indexes per table make maintenance tasks (such as index rebuilds) faster.
5) Queries can join only the tables that are actually needed.
DISADVANTAGES OF NORMALIZATION
1) More tables to join: spreading data across more tables increases the need for joins, which makes queries more tedious to write. The database also becomes harder to understand.
2) Tables contain codes rather than real data: repeated data is stored as codes (keys) that refer to the real values, so queries always need to go to the lookup table to retrieve them.
3) The data model becomes difficult to query against, because it is optimized for applications, not for ad hoc querying. (An ad hoc query is one that cannot be determined before it is issued: it consists of SQL that is constructed dynamically, usually by desktop-friendly query tools.) Hence it is hard to model the database without knowing what the customer wants.
4) As the data is taken to higher normal forms, performance tends to become slower, mainly because more joins are needed.
5) Proper knowledge of the various normal forms is required to carry out the normalization process efficiently. Careless use may lead to a poor design filled with major anomalies and data inconsistency.
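A minimal sketch of the first two disadvantages, using Python's built-in `sqlite3` module with an in-memory database. The schema is an assumption that mirrors the Student/Professor/Grade decomposition described under 2NF, and the data is hypothetical. The Grade table on its own holds only codes, and two joins are needed to get readable names back.

```python
import sqlite3

# Assumed normalized schema with hypothetical data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (IDSt INTEGER PRIMARY KEY, StudentName TEXT NOT NULL);
CREATE TABLE Professor (IDProf INTEGER PRIMARY KEY, Professor TEXT NOT NULL);
CREATE TABLE Grade (
    IDSt INTEGER NOT NULL REFERENCES Student(IDSt),
    IDProf INTEGER NOT NULL REFERENCES Professor(IDProf),
    Grade INTEGER,
    PRIMARY KEY (IDSt, IDProf)
);
INSERT INTO Student VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO Professor VALUES (1, 'Smith'), (2, 'Jones');
INSERT INTO Grade VALUES (1, 1, 5), (2, 1, 4), (2, 2, 6);
""")

# The Grade table alone holds only codes (IDs), not readable names.
codes = conn.execute(
    "SELECT IDSt, IDProf, Grade FROM Grade ORDER BY IDSt, IDProf"
).fetchall()
print(codes)  # [(1, 1, 5), (2, 1, 4), (2, 2, 6)]

# Getting the names back requires joining to both lookup tables.
readable = conn.execute("""
    SELECT s.StudentName, p.Professor, g.Grade
    FROM Grade AS g
    JOIN Student AS s ON s.IDSt = g.IDSt
    JOIN Professor AS p ON p.IDProf = g.IDProf
    ORDER BY g.IDSt, g.IDProf
""").fetchall()
print(readable)  # [('Ann', 'Smith', 5), ('Ben', 'Smith', 4), ('Ben', 'Jones', 6)]
```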
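First normal form (1NF)
A relation is in first normal form if every attribute contains only atomic (single) values, with no multi-valued attributes or repeating groups. A repeating group means that a table contains two or more columns that are closely related.
In the example, the attribute Skills can contain multiple values, so the relation is not in first normal form. The attributes Name and Surname, by contrast, are atomic: each can contain only one value.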
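The sketch below assumes that the multiple skills are stored as one comma-separated string (an assumption, since the original table is not shown) and converts the relation to 1NF by emitting one row per skill. The names and the `to_1nf` helper are hypothetical.

```python
# Hypothetical relation that violates 1NF: Skills holds several values.
unnormalized = [
    {"Name": "Anna", "Surname": "Meier", "Skills": "Python, SQL"},
    {"Name": "Tom", "Surname": "Keller", "Skills": "Java"},
]


def to_1nf(rows):
    """Return one row per (Name, Surname, Skill) so every attribute is atomic."""
    result = []
    for row in rows:
        # Assumption: the multiple values are separated by commas.
        for skill in row["Skills"].split(","):
            result.append(
                {"Name": row["Name"], "Surname": row["Surname"], "Skill": skill.strip()}
            )
    return result


for r in to_1nf(unnormalized):
    print(r)
# {'Name': 'Anna', 'Surname': 'Meier', 'Skill': 'Python'}
# {'Name': 'Anna', 'Surname': 'Meier', 'Skill': 'SQL'}
# {'Name': 'Tom', 'Surname': 'Keller', 'Skill': 'Java'}
```

An alternative design moves the skills into a separate table keyed by the person; either way, every attribute ends up holding a single value.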
Second normal form (2NF)
A relation is in second normal form if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key, that is, on the whole key and not on just a part of it.
The table in this example is in first normal form (1NF), since all attributes are single-valued. But it is not yet in 2NF, because the attribute StudentName is functionally dependent on IDSt alone (IDSt --> StudentName), and IDSt is only a subset of the primary key (IDSt, IDProf). The attribute Grade, by contrast, is fully functionally dependent on IDSt and IDProf together (IDSt, IDProf --> Grade).
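To make these dependencies concrete, here is a hypothetical helper `fd_holds` that checks whether a functional dependency X --> Y holds in a set of rows: any two rows that agree on X must also agree on Y. The sample rows are invented. Keep in mind that sample data can only refute a dependency, not prove it; functional dependencies are constraints of the design.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs --> rhs holds in rows: rows that agree on lhs agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows; the primary key is (IDSt, IDProf).
rows = [
    {"IDSt": 1, "StudentName": "Ann", "IDProf": 1, "Grade": 5},
    {"IDSt": 2, "StudentName": "Ben", "IDProf": 1, "Grade": 4},
    {"IDSt": 2, "StudentName": "Ben", "IDProf": 2, "Grade": 6},
]

print(fd_holds(rows, ["IDSt"], ["StudentName"]))      # True: partial dependency
print(fd_holds(rows, ["IDSt", "IDProf"], ["Grade"]))  # True: depends on the whole key
print(fd_holds(rows, ["IDSt"], ["Grade"]))            # False: part of the key is not enough
```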
To solve this problem, we must create a new table Professor with the attribute Professor (the name) and the key IDProf. A third table, Grade, is necessary to combine the two relations Student and Professor and to manage the grades. Besides the grade, it contains only the two IDs of the student and the professor. Now, if a student is deleted, we do not lose the information about the professor.
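A sketch of the decomposition described above, assuming the same column names and invented rows. The hypothetical helper `project` keeps the listed attributes and drops duplicate rows, and the hypothetical `rebuild` joins the three tables back together to check that no information was lost. After student 1 (the only student of professor 1) is deleted, professor 1 is still in the Professor table.

```python
# Hypothetical rows of the unnormalized table; the key is (IDSt, IDProf).
rows = [
    {"IDSt": 1, "StudentName": "Ann", "IDProf": 1, "Professor": "Smith", "Grade": 5},
    {"IDSt": 2, "StudentName": "Ben", "IDProf": 2, "Professor": "Jones", "Grade": 4},
    {"IDSt": 3, "StudentName": "Cara", "IDProf": 2, "Professor": "Jones", "Grade": 6},
]


def project(table, attrs):
    """Keep only attrs, dropping duplicate rows while preserving order."""
    result = []
    for row in table:
        reduced = {a: row[a] for a in attrs}
        if reduced not in result:
            result.append(reduced)
    return result


def rebuild(student, professor, grade):
    """Join Grade with Student and Professor to recreate the original rows."""
    names = {s["IDSt"]: s["StudentName"] for s in student}
    profs = {p["IDProf"]: p["Professor"] for p in professor}
    return [
        {"IDSt": g["IDSt"], "StudentName": names[g["IDSt"]],
         "IDProf": g["IDProf"], "Professor": profs[g["IDProf"]], "Grade": g["Grade"]}
        for g in grade
    ]


student = project(rows, ["IDSt", "StudentName"])
professor = project(rows, ["IDProf", "Professor"])
grade = project(rows, ["IDSt", "IDProf", "Grade"])
print(rebuild(student, professor, grade) == rows)  # True: nothing was lost

# Delete student 1, the only student of professor 1.
student = [s for s in student if s["IDSt"] != 1]
grade = [g for g in grade if g["IDSt"] != 1]
print(professor)  # [{'IDProf': 1, 'Professor': 'Smith'}, {'IDProf': 2, 'Professor': 'Jones'}]
```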
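Third normal form (3NF)
A relation is in third normal form if it is in 2NF and no non-key attribute is transitively dependent on the primary key, that is, no non-key attribute depends on the key only by way of another non-key attribute (key --> A --> B).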
In this example, the attribute ID is the identification key. All attributes are single-valued, so the table is in 1NF, and it is also in 2NF. But there is a transitive dependency (ID --> Bank_Code_No --> Bank): Bank depends on Bank_Code_No, which is not the primary key of this relation. To get to third normal form (3NF), we have to put the bank name in a separate table together with the clearing number (Bank_Code_No) that identifies it.
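The 3NF step can be sketched the same way. Only ID, Bank_Code_No and Bank appear in the notes; the Name column, the code values and the bank names below are hypothetical. The loop refuses to store two different bank names for the same clearing number, which is exactly the dependency Bank_Code_No --> Bank being split off into its own table.

```python
# Hypothetical account table: ID is the key; Name, codes and bank names are invented.
accounts = [
    {"ID": 1, "Name": "Meier", "Bank_Code_No": "100", "Bank": "Bank A"},
    {"ID": 2, "Name": "Keller", "Bank_Code_No": "200", "Bank": "Bank B"},
    {"ID": 3, "Name": "Huber", "Bank_Code_No": "100", "Bank": "Bank A"},
]

# Keep Bank_Code_No in the main table as a reference, drop the bank name.
customers = [{k: r[k] for k in ("ID", "Name", "Bank_Code_No")} for r in accounts]

# Store each bank name once, keyed by its clearing number.
banks = {}
for r in accounts:
    stored = banks.setdefault(r["Bank_Code_No"], r["Bank"])
    if stored != r["Bank"]:
        raise ValueError(f"Bank_Code_No {r['Bank_Code_No']} maps to two bank names")

print(banks)  # {'100': 'Bank A', '200': 'Bank B'}

# Renaming a bank is now one update instead of one per customer row.
banks["100"] = "Bank A (renamed)"
print([banks[c["Bank_Code_No"]] for c in customers])
# ['Bank A (renamed)', 'Bank B', 'Bank A (renamed)']
```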