
Normalization and Denormalization

Normalization is the process of organizing data in a database to minimize redundancy and dependency. It divides tables to eliminate anomalies like insertion, update, and deletion anomalies. There are several normal forms like 1NF, 2NF, 3NF, BCNF, 4NF and 5NF that tables must satisfy to be normalized properly. Denormalization is the opposite process where redundant data is added to optimize performance by avoiding costly joins.

Uploaded by

umurita37
Copyright © All Rights Reserved

What is Normalization?

Normalization is the process of organizing the data in the database.


Normalization is used to minimize redundancy in a relation or set
of relations. It is also used to eliminate undesirable characteristics such as
insertion, update, and deletion anomalies.
Normalization divides larger tables into smaller ones and links them using
relationships.
Normal forms are used to reduce redundancy in database tables.
Why do we need Normalization?
• The main reason for normalizing relations is to remove insertion, update, and deletion anomalies.
Failure to eliminate anomalies leads to data redundancy and can cause data
integrity and other problems as the database grows. Normalization consists of a
series of guidelines that help guide you in creating a good database structure.
Data modification anomalies can be categorized into three types:
• Insertion Anomaly: An insertion anomaly occurs when one cannot insert a new
tuple into a relation due to a lack of data. This occurs when we are not able to
insert data into a database because some attributes may be missing at the time
of insertion.
• Deletion Anomaly: The deletion anomaly refers to the situation where the deletion
of data results in the unintended loss of some other important data. This occurs
when deleting one part of the data deletes the other necessary information from
the database.
• Update Anomaly: The update anomaly is when an update of a single data
value requires multiple rows of data to be updated. This occurs when the same
data items are repeated with the same values and are not linked to each other.
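The update and deletion anomalies can be reproduced on a tiny in-memory table. This is a minimal sketch, assuming a hypothetical unnormalized employee table (the column names and rows are invented for illustration) in which every employee row repeats its department's location.

```python
# Hypothetical unnormalized table: every employee row repeats its department's location.
def make_table():
    return [
        {"emp_id": 1, "emp_name": "Asha", "dept": "Sales", "dept_location": "Pune"},
        {"emp_id": 2, "emp_name": "Ravi", "dept": "Sales", "dept_location": "Pune"},
        {"emp_id": 3, "emp_name": "Meena", "dept": "HR", "dept_location": "Delhi"},
    ]


def locations_by_dept(table):
    """Map each department to the sorted list of locations recorded for it."""
    found = {}
    for row in table:
        found.setdefault(row["dept"], set()).add(row["dept_location"])
    return {dept: sorted(locs) for dept, locs in found.items()}


# Update anomaly: changing the location in only one row makes the data contradict itself.
table = make_table()
table[0]["dept_location"] = "Mumbai"
print(locations_by_dept(table)["Sales"])  # ['Mumbai', 'Pune']

# Deletion anomaly: deleting the only HR employee also deletes where HR is located.
table = [row for row in make_table() if row["emp_id"] != 3]
print("HR" in locations_by_dept(table))  # False
```

The insertion anomaly follows from the same layout: a new department with no employees yet cannot be recorded without inventing an employee row for it.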
Advantages of Normalization
• Normalization helps to minimize data redundancy.
• Greater overall database organization.
• Data consistency within the database.
• Much more flexible database design.
• Enforces the concept of relational integrity.
Disadvantages of Normalization
• You cannot start building the database before knowing what the user
needs.
• Performance can degrade when normalizing relations to higher
normal forms, e.g., 4NF and 5NF.
• It is very time-consuming and difficult to normalize relations of a
higher degree.
• Careless decomposition may lead to a bad database design, leading to
serious problems.
Functional dependency
A functional dependency is a relationship that exists between two sets of attributes of a relational
table where one set of attributes can determine the value of the other set of
attributes. It is denoted by X → Y, where X is called the determinant and Y is
called the dependent. It typically exists between the primary key and the non-key
attributes within a table.
For example:
• Assume we have an employee table with attributes: Emp_Id, Emp_Name,
Emp_Address.
• Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the
employee table, because if we know the Emp_Id, we can tell the employee
name associated with it.
• Functional dependency can be written as:
Emp_Id → Emp_Name
• We can say that Emp_Name is functionally dependent on Emp_Id.
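Whether X → Y holds in a given set of rows can be checked mechanically: no two rows may agree on X but differ on Y. The sketch below assumes a hypothetical helper `fd_holds` and made-up employee rows. Note that sample data can only refute a functional dependency; the dependency itself is a rule about every valid state of the table.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this particular set of rows.

    rows is a list of dicts; lhs and rhs are lists of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False  # same determinant, different dependent values
    return True


# Hypothetical rows for the employee table described above.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Pune"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Delhi"},
]

print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))       # True
print(fd_holds(employees, ["Emp_Name"], ["Emp_Address"]))  # False: two employees named Asha
```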
Types of Functional dependency
1. Trivial functional dependency
A → B has trivial functional dependency if B is a subset of A.
The following dependencies are also trivial: A → A, B → B.
Example:

Consider a table with two columns Employee_Id and Employee_Name.


{Employee_Id, Employee_Name} → Employee_Id is a trivial
functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name →
Employee_Name are trivial dependencies.
2. Non-trivial functional dependency
A → B has a non-trivial functional dependency if B is not a subset of A.
When A ∩ B is empty (A and B share no attributes), A → B is called completely
non-trivial.
• Example:
1. ID → Name
2. Name → DOB
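The three cases above reduce to two set tests on the attribute sets A and B. This is a small sketch using a hypothetical helper name, `classify_fd`, with attribute names taken from the examples above.

```python
def classify_fd(lhs, rhs):
    """Classify A -> B as trivial, non-trivial, or completely non-trivial."""
    a, b = set(lhs), set(rhs)
    if b <= a:
        return "trivial"                 # B is a subset of A
    if not (a & b):
        return "completely non-trivial"  # A and B share no attributes
    return "non-trivial"                 # B is not a subset of A, but they overlap


print(classify_fd({"Employee_Id", "Employee_Name"}, {"Employee_Id"}))  # trivial
print(classify_fd({"ID"}, {"Name"}))                                   # completely non-trivial
print(classify_fd({"ID"}, {"ID", "Name"}))                             # non-trivial
```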

Types of Normal Forms:
• Normalization works through a series of stages called Normal forms.
The normal forms apply to individual relations. A relation is said to
be in a particular normal form if it satisfies that form's constraints.
First Normal Form (1NF)
• A relation is in 1NF if every attribute contains only atomic values.
• It states that an attribute of a table cannot hold multiple values. It must
hold only a single value.
• First normal form disallows the multi-valued attribute, composite attribute,
and their combinations.
• A relation is in 1NF if every attribute is a single-valued attribute or it does
not contain any multi-valued or composite attribute, i.e., every attribute is
an atomic attribute. If there is a composite or multi-valued attribute, it
violates 1NF. To solve this, we can create a new row for each of the
values of the multi-valued attribute to convert the table into 1NF.

• Let’s take an example of a relational table <EmployeeDetail> that contains
the details of the employees of the company.
Table: EmployeeDetail

Here, the Employee Phone Number is a multi-valued attribute. So, this relation is not in 1NF.
To convert this table into 1NF, we make new rows with each Employee Phone Number as a new row as shown
below:
EmployeeDetail
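The "one new row per value" fix can be sketched as a small function that flattens a list-valued column. The column names other than Employee Phone Number, and all of the values, are hypothetical, since the slide's table contents are not available here.

```python
def to_1nf(rows, multi_valued):
    """Give each value of a multi-valued column its own row.

    rows is a list of dicts; the multi_valued column holds a list of values.
    """
    flat = []
    for row in rows:
        for value in row[multi_valued]:
            new_row = dict(row)          # copy the single-valued columns
            new_row[multi_valued] = value
            flat.append(new_row)
    return flat


# Hypothetical EmployeeDetail rows; codes, names and phone numbers are made up.
employee_detail = [
    {"Employee Code": 101, "Employee Name": "Asha",
     "Employee Phone Number": ["98000 11111", "98000 22222"]},
    {"Employee Code": 102, "Employee Name": "Ravi",
     "Employee Phone Number": ["98000 33333"]},
]

for row in to_1nf(employee_detail, "Employee Phone Number"):
    print(row)
```

Employee 101 now appears in two rows, one per phone number, and every column holds a single atomic value.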
Second Normal Form (2NF)

The normalization of 1NF relations to 2NF involves the elimination of partial
dependencies. A partial dependency exists when a non-prime attribute, i.e., an
attribute that is not part of any candidate key, depends on only part of a candidate
key instead of the whole key. In other words, each field in a table must depend on
the entire key; fields that depend on only part of a composite key are moved to
another table whose key they fully depend on. A 1NF relation whose candidate keys
are all single attributes (no composite keys) is automatically in Second Normal Form.
For a relational table to be in second normal form, it must satisfy the following
rules:
1. The table must be in first normal form.
2. It must not contain any partial dependency, i.e., all non-prime attributes are
fully functionally dependent on the primary key.
If a partial dependency exists, we can divide the table to remove the partially
dependent attributes and move them to some other table where they fit in well.
Example: Let’s say a school wants to store the data of teachers and the
subjects they teach. They create a table Teacher that looks like this:
Since a teacher can teach more than one subject, the table can have
multiple rows for the same teacher.
To make the table comply with 2NF, we can
decompose it into two tables like this:
Teacher_Details table:
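The 2NF split is a pair of projections of the original table. This sketch assumes hypothetical columns `teacher_id`, `subject` and `teacher_age` (the slide's actual table is not shown), with key (teacher_id, subject) and `teacher_age` depending only on `teacher_id`; all values are made up.

```python
def project(rows, columns):
    """Return the distinct rows restricted to the given columns (a relational projection)."""
    seen = []
    for row in rows:
        picked = {col: row[col] for col in columns}
        if picked not in seen:
            seen.append(picked)
    return seen


# Hypothetical Teacher table: key is (teacher_id, subject), but teacher_age
# depends only on teacher_id, which is a partial dependency.
teacher = [
    {"teacher_id": 111, "subject": "Maths", "teacher_age": 38},
    {"teacher_id": 111, "subject": "Physics", "teacher_age": 38},
    {"teacher_id": 222, "subject": "Biology", "teacher_age": 38},
    {"teacher_id": 333, "subject": "Physics", "teacher_age": 40},
    {"teacher_id": 333, "subject": "Chemistry", "teacher_age": 40},
]

teacher_details = project(teacher, ["teacher_id", "teacher_age"])  # age stored once per teacher
teacher_subject = project(teacher, ["teacher_id", "subject"])      # who teaches what

print(len(teacher), len(teacher_details), len(teacher_subject))  # 5 3 5
```

Each teacher's age is now stored once, so changing it touches a single row.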
Third Normal form (3NF)
A table design is said to be in 3NF if both the following conditions hold:
1. Table must be in 2NF
2. Transitive functional dependency of non-prime attribute on any super
key should be removed.
An attribute that is not part of any candidate key is known as a non-prime
attribute. A partial dependency occurs when a non-prime attribute is
functionally dependent on part of a candidate key (for example, on one column
of a composite primary key), while a transitive dependency occurs when a non-key
attribute determines some other non-key attribute.
Full dependency: a non-key attribute is fully functionally dependent on the primary
key when it depends on the whole key and not on any proper subset of it. A table
that meets all the requirements of First Normal Form and in which every non-key
attribute is fully functionally dependent on the primary key is in Second Normal Form.
An attribute that is a part of one of the candidate keys is known
as prime attribute.
Boyce Codd normal form (BCNF)

• It is an advanced version of 3NF, which is why it is also referred to as
3.5NF. BCNF is stricter than 3NF.
• A table complies with BCNF if it is in 3NF and, for
every non-trivial functional dependency X->Y, X is a super key of
the table.
• Example: Suppose there is a company wherein employees work
in more than one department. They store the data like this:
Functional dependencies in the table above:
Emp_Id -> Emp_Nationality
Emp_Dept -> {Dept_Type, Dept_No_Of_Emp}

Candidate key: {Emp_Id, Emp_Dept}

The table is not in BCNF because neither Emp_Id nor Emp_Dept alone is a
key.

To make the table comply with BCNF, we can break it into three
tables like this:
Emp_Nationality table:
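The BCNF test can be run directly on the functional dependencies listed above: every non-trivial dependency whose left-hand side is not a super key is a violation. This is a sketch with hypothetical helper names, reusing the same attribute-closure idea as the 3NF check.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return each non-trivial FD whose left-hand side is not a super key."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= all_attrs]


attrs = {"Emp_Id", "Emp_Nationality", "Emp_Dept", "Dept_Type", "Dept_No_Of_Emp"}
fds = [
    (frozenset({"Emp_Id"}), frozenset({"Emp_Nationality"})),
    (frozenset({"Emp_Dept"}), frozenset({"Dept_Type", "Dept_No_Of_Emp"})),
]

print(closure({"Emp_Id", "Emp_Dept"}, fds) == attrs)  # True: the candidate key is a super key
for lhs, rhs in bcnf_violations(attrs, fds):
    print(sorted(lhs), "->", sorted(rhs))
# ['Emp_Id'] -> ['Emp_Nationality']
# ['Emp_Dept'] -> ['Dept_No_Of_Emp', 'Dept_Type']
```

Both dependencies are reported, which is why the table is split so that Emp_Id and Emp_Dept each become the key of their own table.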
Fourth normal form (4NF)

• A relation will be in 4NF if it is in Boyce Codd normal form and has no
non-trivial multi-valued dependency.
• For a dependency A →→ B, if a single value of A is associated with multiple
values of B, independently of the remaining attributes, then the relation has a
multi-valued dependency.
• The given STUDENT table is in 3NF, but COURSE and HOBBY are
two independent entities. Hence, there is no relationship between
COURSE and HOBBY.
• In the STUDENT relation, the student with STU_ID 21 has two
courses, Computer and Math, and two hobbies, Dancing and Singing.
So there is a multi-valued dependency on STU_ID, which leads to
unnecessary repetition of data.
• So to make the above table into 4NF, we can decompose it into two
tables:
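The 4NF decomposition can be checked by joining the two tables back on STU_ID. This sketch assumes a hypothetical STUDENT instance holding only STU_ID 21, stored with every course/hobby combination (the repetition a multi-valued dependency forces); the table and helper names are illustrative.

```python
from itertools import product

# Hypothetical STUDENT instance for STU_ID 21. Because COURSE and HOBBY are
# independent, every course must be paired with every hobby.
student = {(21, course, hobby)
           for course, hobby in product(["Computer", "Math"], ["Dancing", "Singing"])}

# 4NF decomposition: one table per independent multi-valued fact.
student_course = {(stu_id, course) for stu_id, course, _ in student}
student_hobby = {(stu_id, hobby) for stu_id, _, hobby in student}


def natural_join(courses, hobbies):
    """Rebuild (STU_ID, COURSE, HOBBY) rows by matching on STU_ID."""
    return {(s1, course, hobby)
            for s1, course in courses
            for s2, hobby in hobbies
            if s1 == s2}


print(sorted(student_course))  # [(21, 'Computer'), (21, 'Math')]
print(sorted(student_hobby))   # [(21, 'Dancing'), (21, 'Singing')]
print(natural_join(student_course, student_hobby) == student)  # True: lossless
```

Adding a third hobby would need two new rows in the combined table but only one in STUDENT_HOBBY.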
Fifth normal form (5NF)
• A relation is in 5NF if it is in 4NF, does not contain any join
dependency (other than those implied by its candidate keys), and joining its
decomposed tables is lossless.
• 5NF is satisfied when all the tables are broken into as many tables as
possible in order to avoid redundancy.
• 5NF is also known as Project-join normal form (PJ/NF).
• In the above table, John takes both Computer and Math class for
Semester 1 but he doesn't take Math class for Semester 2. In this
case, the combination of all these fields is required to identify valid data.
• Suppose we add a new semester, Semester 3, but do not know
the subject or who will be taking that subject, so we leave
Lecturer and Subject as NULL. But all three columns together act as the
primary key, so we can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three
relations P1, P2 & P3:
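Whether a three-way split is safe can be tested by joining the projections back and comparing with the original rows. This sketch assumes P1, P2 and P3 are the (semester, subject), (subject, lecturer) and (semester, lecturer) projections; the rows and the name Anshika are hypothetical.

```python
def decompose(rows):
    """Project (semester, subject, lecturer) rows onto the three pairs of columns."""
    p1 = {(sem, subj) for sem, subj, _ in rows}
    p2 = {(subj, lect) for _, subj, lect in rows}
    p3 = {(sem, lect) for sem, _, lect in rows}
    return p1, p2, p3


def join_back(p1, p2, p3):
    """Natural join of the three projections."""
    return {(sem, subj, lect)
            for sem, subj in p1
            for subj2, lect in p2 if subj2 == subj
            if (sem, lect) in p3}


def is_lossless(rows):
    """True if joining the three projections reproduces exactly the original rows."""
    return join_back(*decompose(rows)) == set(rows)


# Hypothetical instance: John lectures Computer and Math in Semester 1
# but only Computer in Semester 2.
rows = {("Semester 1", "Computer", "John"),
        ("Semester 1", "Math", "John"),
        ("Semester 2", "Computer", "John")}
print(is_lossless(rows))  # True

# With a second lecturer the join invents ("Semester 1", "Computer", "John"),
# a row that was never stored, so this split would lose information.
lossy = {("Semester 1", "Computer", "Anshika"),
         ("Semester 1", "Math", "John"),
         ("Semester 2", "Computer", "John")}
print(is_lossless(lossy))  # False
```

The split is only valid when the data satisfies the three-way join dependency, which a single instance can confirm or refute but not guarantee for future rows.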
Denormalization
Denormalization is a database optimization technique in which
we add redundant data to one or more tables. This can help us
avoid costly joins in a relational database. Note
that denormalization does not mean ‘reversing normalization’ or
‘not to normalize’. It is an optimization technique that is applied
after normalization.
Denormalization is a database design technique that involves
intentionally introducing redundancy into a relational database
by incorporating data from related tables into a single table. The
primary goal of denormalization is to optimize query
performance and simplify data retrieval at the cost of increased
storage requirements and a potential increase in data update
complexity.
Some key points and considerations related to
denormalization:
• Performance Optimization: Denormalization can significantly improve the performance of read-intensive database
operations because it reduces the need for complex JOIN operations and enables faster data retrieval. It can be
especially beneficial for systems where read queries are frequent and need to be executed quickly.
• Reduced JOINs: In normalized database designs, data is often distributed across multiple tables, requiring JOIN
operations to retrieve related information. Denormalization combines related data into a single table, eliminating the
need for many JOINs.
• Simplified Queries: Denormalized databases often lead to simpler and more straightforward queries, as they eliminate
the need to traverse multiple tables to retrieve data.
• Data Duplication: Denormalization involves duplicating data, which can lead to data redundancy and increased storage
requirements. This can lead to data integrity issues if not properly managed.
• Data Update Complexity: Because data is duplicated in denormalized tables, updates can become more complex. When
a piece of information needs to change, it must be updated in multiple places to maintain consistency.
• Maintenance Challenges: Denormalized databases may require more effort to maintain, as they can become inconsistent
if updates are not carefully managed.
• Use Cases: Denormalization is often used in data warehousing, reporting, and analytical systems where the focus is on
fast data retrieval and reporting. It is less suitable for transactional systems where data integrity and consistency are
paramount.
• Balancing Act: Deciding whether to denormalize or not is a trade-off. Database designers need to strike a balance
between the performance benefits of denormalization and the increased complexity and potential risks it introduces.
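The read/update trade-off can be seen with Python's built-in `sqlite3` module and an in-memory database. This is a sketch; the `customers`, `orders` and `orders_denorm` tables and their rows are hypothetical. The denormalized table answers the same query without a JOIN, but a single logical change (renaming a customer) has to update every duplicated copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized design: each customer's name is stored once; orders refer to it by id.
cur.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 90.0), (12, 2, 40.0);
""")

# Reading orders together with customer names needs a JOIN.
normalized = cur.execute("""
    SELECT o.order_id, c.name, o.amount
    FROM orders AS o JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

# Denormalized copy: the customer name is duplicated into every order row.
cur.execute("""
    CREATE TABLE orders_denorm AS
    SELECT o.order_id, o.customer_id, c.name AS customer_name, o.amount
    FROM orders AS o JOIN customers AS c ON c.customer_id = o.customer_id
""")
denormalized = cur.execute(
    "SELECT order_id, customer_name, amount FROM orders_denorm ORDER BY order_id"
).fetchall()
print(normalized == denormalized)  # True: same answer, but no JOIN at read time

# The cost: renaming one customer must now update every duplicated copy.
cur.execute("UPDATE orders_denorm SET customer_name = 'Asha K.' WHERE customer_id = 1")
rows_touched = cur.rowcount
print(rows_touched)  # 2 rows changed for one logical update
conn.commit()
```

In the normalized tables the same rename would be a single-row UPDATE on `customers`.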
Pros of Denormalization:
• Retrieving data is faster since we do fewer joins
• Queries to retrieve data can be simpler (and therefore less likely to have
bugs), since we need to look at fewer tables.
Cons of Denormalization:
• Updates and inserts are more expensive.
• Denormalization can make update and insert code harder to write.
• Data may be inconsistent.
• Data redundancy necessitates more storage.
Advantages of Denormalization:
• Improved Query Performance: Denormalization can improve query
performance by reducing the number of joins required to retrieve data.
• Reduced Complexity: By combining related data into fewer tables,
denormalization can simplify the database schema and make it easier to
manage.
• Easier Maintenance and Updates: Denormalization can make the schema easier to
maintain by reducing the number of tables, although duplicated values must still be
kept in sync on every update.
• Improved Read Performance: Denormalization can improve read
performance by making it easier to access data.
• Better Scalability: Denormalization can improve the scalability of a
database system by reducing the number of tables and improving the
overall performance.
Disadvantages of Denormalization:
• Reduced Data Integrity: By adding redundant data, denormalization can
reduce data integrity and increase the risk of inconsistencies.
• Increased Complexity: While denormalization can simplify the database
schema in some cases, it can also increase complexity by introducing
redundant data.
• Increased Storage Requirements: By adding redundant data,
denormalization can increase storage requirements and increase the cost
of maintaining the database.
• Increased Update and Maintenance Complexity: Denormalization can
increase the complexity of updating and maintaining the database by
introducing redundant data.
• Limited Flexibility: Denormalization can reduce the flexibility of a database
system by introducing redundant data and making it harder to modify the
schema.
