Seminar Documentation
On
NORMALIZATION IN DBMS
In partial fulfilment of the requirements for the award of
BACHELOR OF TECHNOLOGY
In
Computer Science and Engineering (Data Science)
Submitted by
R.Sakethreddy (20E51A6737)
2023-2024
HYDERABAD INSTITUTE OF TECHNOLOGY AND
MANAGEMENT
CERTIFICATE
This is to certify that the Technical Seminar entitled “Normalization in DBMS” is being
submitted by R.SakethReddy, bearing hall ticket number 20E51A6737, in partial
fulfilment of the requirements for the degree of BACHELOR OF TECHNOLOGY in
COMPUTER SCIENCE AND ENGINEERING (DATA SCIENCE) awarded by the
Jawaharlal Nehru Technological University, Hyderabad, during the academic year 2023-
2024. The matter contained in this document has not been submitted to any other
University or institute for the award of any degree or diploma.
DECLARATION
ABSTRACT……………………………………………………………………………....i
LIST OF FIGURES……………………………………………………………………...ii
1. CHAPTER - 01…………………………………………………………………….1
2. CHAPTER - 02…………………………………………………………………….1
4. CHAPTER - 04…………………………………………………………………….8
5. CHAPTER - 05…………………………………………………………………….9
6. CHAPTER - 06…………………………………………………………………….11
7. CHAPTER - 07…………………………………………………………………….14
   NORMALIZATION VS. DENORMALIZATION
   7.1 Comparison of the two approaches
   7.2 When to denormalize a database
8. CONCLUSION……………………………………………………………………15
9. REFERENCES…………………………………………………………………….17
LIST OF FIGURES
Sl. No. CAPTION
1. Database Normalization
2. First Normal Form (1NF)
3. Second Normal Form (2NF)
4. Third Normal Form (3NF)
5. Boyce-Codd Normal Form (BCNF)
6. Fourth Normal Form (4NF)
7. Normalization vs. Denormalization
ABSTRACT
Normalization in DBMS:
1.1 Data Modification:
In the context of a relational database, data is organized into tables, with each table
consisting of rows (records) and columns (attributes). When designing a database, it's
essential to ensure that data is stored in a manner that reflects real-world relationships
accurately and efficiently. Normalization achieves this by breaking down larger tables
into smaller, related tables and establishing rules that govern how data can be added,
updated, and deleted.
1.2 Normal Forms:
The normalization process involves dividing a large table into multiple smaller tables,
each serving a specific purpose. To accomplish this, normalization relies on a set of rules,
known as normal forms, which define the requirements that must be met for a table to be
considered normalized. These normal forms, such as First Normal Form (1NF), Second
Normal Form (2NF), Third Normal Form (3NF), and others, progressively eliminate data
redundancy and various types of anomalies.
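As a hedged illustration of the anomalies mentioned above, the following Python sketch (all data hypothetical) shows how redundancy in an unnormalized table invites an update anomaly:

```python
# Unnormalized table: the department head is repeated on every employee
# row, so the same fact is stored in multiple places.
employees = [
    {"emp_id": 1, "dept": "HR", "dept_head": "Mary Johnson"},
    {"emp_id": 2, "dept": "HR", "dept_head": "Mary Johnson"},
]

# An update that touches only one row leaves the other copy stale:
employees[0]["dept_head"] = "New Head"

# The same department now reports two different heads - an update anomaly.
heads = {row["dept_head"] for row in employees if row["dept"] == "HR"}
print(len(heads))  # 2
```

Normalization avoids this by storing the department-to-head fact exactly once, in its own table.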
2. First Normal Form (1NF)
The First Normal Form (1NF) is the foundational step in the normalization process within
Database Management Systems (DBMS). It defines the basic requirements that a relation
(table) in a relational database must meet to be considered in 1NF. The primary objective
of 1NF is to eliminate duplicate rows and ensure that each attribute (column) contains
atomic (indivisible) values.
Here are the key characteristics and requirements of the First Normal Form:
Atomic Values: Each attribute (column) in a 1NF table must contain atomic (indivisible)
values. This means that the values in a column should not be further decomposed into
smaller parts. For example, if you have a "Phone Numbers" column, it should contain
complete phone numbers, not a combination of area codes, prefixes, and line numbers.
No Repeating Groups: There should be no repeating groups or arrays within a single
cell or attribute. This requirement implies that you should avoid storing multiple values
in a single attribute. For instance, if you have a "Skills" column, you shouldn't store
multiple skills as a comma-separated list within the same cell.
Unique Column Names: Each column in the table should have a unique name to ensure
that attributes are easily distinguishable and to prevent ambiguity in data retrieval.
Unique Rows: Each row in the table should be unique. This implies that there should be
a way to uniquely identify each row, often achieved by having a primary key. No two
rows should be identical in all their attributes.
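A minimal Python sketch of the no-repeating-groups rule, using hypothetical employee/skills data:

```python
# Rows violating 1NF: the "Skills" value is a comma-separated list,
# i.e. a repeating group stored inside a single attribute.
raw_rows = [
    (1, "Alice", "SQL, Python"),
    (2, "Bob", "Java"),
]

# Decompose the repeating group: one (employee_id, skill) pair per row,
# so every attribute value is atomic.
skills_1nf = [
    (emp_id, skill.strip())
    for emp_id, _name, skills in raw_rows
    for skill in skills.split(",")
]

print(skills_1nf)  # [(1, 'SQL'), (1, 'Python'), (2, 'Java')]
```

Each skill now lives in its own row, so individual skills can be queried, updated, or deleted without parsing strings.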
3. Second Normal Form (2NF)
The Second Normal Form (2NF) builds on 1NF by removing partial dependencies, in
which a non-prime attribute depends on only part of a composite primary key.
Here are the key characteristics and requirements of the Second Normal Form (2NF):
Satisfying 1NF: Before a table can be in 2NF, it must already satisfy the requirements of
1NF, which means that it should have atomic values in each attribute and no repeating
groups.
Primary Key: There should be a primary key defined for the table that uniquely
identifies each row.
Functional Dependency: Each non-prime attribute (an attribute not part of the primary
key) should be fully functionally dependent on the entire primary key. This means that
every non-prime attribute should depend on the entire primary key, not just a part of it.
Student ID | Course ID
101        | 1
101        | 2
102        | 1
103        | 3
Primary Key: In 2NF, there must be a defined primary key that uniquely identifies each
row in the table.
Non-Prime Attributes: These are attributes that are not part of the primary key.
Decomposition: The table with partial dependencies is broken down into multiple
smaller tables, each with an appropriate key.
Result: Decomposition leads to tables where non-prime attributes are fully
functionally dependent on the entire primary key.
Example: Comparing a table before and after decomposition, as in the enrollment table
above, shows how 2NF is achieved.
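As a hedged sketch of such a decomposition (the "student_name" attribute and the student names are assumptions for illustration), a partial dependency can be removed like so:

```python
# Hypothetical unnormalized enrollment rows: student_name depends only on
# student_id, not on the full composite key (student_id, course_id),
# which is a partial dependency.
enrollments_unnormalized = [
    (101, 1, "Asha"),
    (101, 2, "Asha"),
    (102, 1, "Ravi"),
]

# Students table: keyed by student_id alone, so the name is stored once.
students = {sid: name for sid, _cid, name in enrollments_unnormalized}

# Enrollments table: only the composite key remains; every remaining
# attribute depends on the whole key.
enrollments = [(sid, cid) for sid, cid, _name in enrollments_unnormalized]
```

After the split, renaming a student touches exactly one row in the students table instead of one row per enrollment.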
4. Third Normal Form (3NF)
The Third Normal Form (3NF) is a crucial step in the normalization process of relational
databases. It builds upon the principles of the First Normal Form (1NF) and the Second
Normal Form (2NF) while addressing transitive dependencies within the data. 3NF aims
to ensure that data remains accurate and that there are no non-prime attributes (attributes
not part of the primary key) that depend on other non-prime attributes.
Here are the key characteristics and requirements of the Third Normal Form (3NF):
Satisfying 1NF and 2NF: Before a table can be in 3NF, it must already satisfy the
requirements of 1NF and 2NF.
Primary Key and Non-Prime Attributes: There should be a defined primary key that
uniquely identifies each row in the table. Non-prime attributes are those that are not part
of the primary key.
Functional Dependency: 3NF addresses functional dependencies between attributes.
Transitive Dependencies: The primary focus of 3NF is to eliminate transitive
dependencies. A transitive dependency occurs when a non-prime attribute depends on
another non-prime attribute, which in turn depends on the primary key.
In this table, "Employee ID" is the primary key, and "Department Head" is a non-prime
attribute. The issue here is that "Department Head" depends on "Department," which in
turn depends on the primary key "Employee ID."
To bring this table into 3NF, we would split it into two tables:
Employees Table:

Employee ID | Employee Name | Department

Departments Table:

Department | Department Head
HR         | Mary Johnson
3NF is essential in database design because it helps prevent data anomalies and
inconsistencies by removing transitive dependencies. It results in a well-structured and
efficient database schema, which is easier to manage and query effectively.
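The split above can be sketched with Python's built-in sqlite3 module; the employee row here is hypothetical (only the "HR" / "Mary Johnson" pairing comes from the example):

```python
import sqlite3

# In-memory database holding the two 3NF tables from the example.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER PRIMARY KEY, employee_name TEXT, department TEXT)"
)
con.execute(
    "CREATE TABLE departments ("
    "department TEXT PRIMARY KEY, department_head TEXT)"
)
con.execute("INSERT INTO employees VALUES (1, 'John Doe', 'HR')")
con.execute("INSERT INTO departments VALUES ('HR', 'Mary Johnson')")

# The department head is recovered with a join instead of being duplicated
# on every employee row - the transitive dependency is gone.
row = con.execute(
    "SELECT e.employee_name, d.department_head "
    "FROM employees e JOIN departments d ON e.department = d.department"
).fetchone()
print(row)  # ('John Doe', 'Mary Johnson')
```

Changing a department head now means updating a single row in the departments table.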
5. Boyce-Codd Normal Form (BCNF)
The Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF: it requires that every
determinant (the left-hand side of a functional dependency) be a candidate key.
Suppose we have a table named "Teachers Courses" that records information about
teachers and the courses they teach:
BCNF is important in database design because it helps eliminate redundancy and
anomalies in data, resulting in a well-structured and efficient database schema. It ensures
that data is organized accurately and efficiently while preserving data integrity, which is
essential for database reliability and query performance.
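As a hedged sketch of the "Teachers Courses" idea (the rows and the rule "each teacher teaches exactly one course" are assumptions for illustration), a BCNF split stores each dependency in a table whose key is its determinant:

```python
# Hypothetical rows of (student, teacher, course). Assume teacher -> course
# holds, but "teacher" is not a candidate key of the whole relation,
# which violates BCNF.
enrollments = [
    ("Asha", "Dr. Rao", "Databases"),
    ("Ravi", "Dr. Rao", "Databases"),
    ("Asha", "Dr. Iyer", "Networks"),
]

# BCNF split: the dependency teacher -> course gets its own table,
# keyed by its determinant (teacher).
teacher_course = {teacher: course for _s, teacher, course in enrollments}

# The remaining table records only who studies under which teacher.
student_teacher = [(s, t) for s, t, _c in enrollments]
```

The course taught by a teacher is now stated once, instead of once per student.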
6. Fourth Normal Form (4NF)
The Fourth Normal Form (4NF) is a level of database normalization that builds upon
Boyce-Codd Normal Form (BCNF). 4NF aims to address multi-valued
dependencies within a relational database, ensuring that data is organized in a way that
eliminates redundancy and preserves data integrity. Multi-valued dependencies occur
when an attribute depends on multiple, independent values of another attribute within the
same table.
Here are the key characteristics and requirements of the Fourth Normal Form (4NF):
Satisfying 1NF, 2NF, 3NF, and BCNF: Before a table can be in 4NF, it must already
satisfy the requirements of the lower normal forms. This means the table should have
atomic values, no partial dependencies, no transitive dependencies, and no functional
dependency whose determinant is not a candidate key.
Primary Key and Non-Prime Attributes: There should be a defined primary key that
uniquely identifies each row in the table. Non-prime attributes are those that are not part
of the primary key.
Functional Dependency: 4NF addresses functional dependencies between attributes.
Multi-Valued Dependencies (MVDs): The primary focus of 4NF is to eliminate multi-
valued dependencies. A multi-valued dependency occurs when one attribute determines
an independent set of values of another attribute, regardless of the remaining attributes
in the table.
To better understand 4NF, let's consider an example:
Suppose we have a table named "Student Courses" that records information about
students and the courses they have taken, along with the textbooks they have used:
Student ID | Course ID | Textbook
101        | 1         | Book A
101        | 2         | Book B
102        | 1         | Book C
102        | 3         | Book D
103        | 3         | Book E
In this table, "Student ID" and "Course ID" together form the composite primary key, and
"Textbook" is a non-prime attribute. The issue here is that the textbooks used for a course
are independent of which students take it: "Course ID" multi-determines "Textbook",
which is a multi-valued dependency.
To bring this table into 4NF, we would split it into two tables:

1. Student Courses Table:

Student ID | Course ID
101        | 1
101        | 2
102        | 1
102        | 3
103        | 3

2. Textbooks Table:

Course ID | Textbook
1         | Book A
2         | Book B
1         | Book C
3         | Book D
3         | Book E
Now, "Textbook" is directly related to "Course ID," and the database is in 4NF. This
separation eliminates multi-valued dependencies and ensures that data is organized
efficiently while preserving data integrity.
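The two tables above are simply projections of the original relation; the following Python sketch derives them from the section's own rows:

```python
# Original rows of (student_id, course_id, textbook), as in the example.
student_course_textbook = [
    (101, 1, "Book A"), (101, 2, "Book B"),
    (102, 1, "Book C"), (102, 3, "Book D"),
    (103, 3, "Book E"),
]

# Projection 1: which courses each student takes.
student_courses = sorted({(s, c) for s, c, _t in student_course_textbook})

# Projection 2: which textbooks each course uses, independent of student.
course_textbooks = sorted({(c, t) for _s, c, t in student_course_textbook})
```

Joining the two projections on "Course ID" reconstructs the original information, which is exactly the lossless-decomposition property 4NF relies on.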
7. Normalization vs. Denormalization
Normalization and denormalization are two opposing database design techniques used to
organize and structure relational databases. Each approach has its advantages and
disadvantages, and the choice between them depends on specific requirements and trade-
offs in a given application. Here's a comparison of normalization and denormalization:
Normalization:
Definition: Normalization is a process of organizing data in a relational database to
reduce redundancy and improve data integrity. It involves breaking down large tables
into smaller, related tables while ensuring that each table adheres to certain normalization
forms (e.g., 1NF, 2NF, 3NF, BCNF).
Objective: The primary goal of normalization is to prevent data anomalies, such as
insertion, update, and deletion anomalies, by enforcing rules that ensure data is stored
efficiently and without redundancy.
Advantages:
Reduces data redundancy, leading to storage optimization.
Minimizes data anomalies, ensuring data consistency and accuracy.
Makes it easier to maintain and update the database.
Generally leads to more efficient queries through well-structured tables.
Use Cases: Normalization is typically preferred in scenarios where data integrity is
critical, such as financial systems, healthcare databases, and mission-critical applications.
It's also suitable when data is frequently updated.
Complex Queries: Normalized databases may require more complex queries involving
joins to retrieve data from multiple tables, which can impact query performance in some
cases.
Denormalization:
Definition: Denormalization is a process of intentionally introducing redundancy into a
relational database by combining tables or duplicating data. It's done to improve query
performance or simplify database design.
Objective: The primary goal of denormalization is to optimize query performance by
reducing the need for complex joins and allowing for faster data retrieval.
Advantages:
Speeds up query performance by minimizing the number of joins.
Simplifies database design, making it easier to understand and implement.
May be suitable for read-heavy or analytical workloads where data consistency can be
compromised to some extent.
Use Cases: Denormalization is often applied in scenarios where query performance is
critical, such as reporting systems, data warehouses, and applications with a heavy read
workload.
Data Integrity: Denormalization can compromise data integrity because it may
introduce redundancy, making it necessary to carefully manage data updates to ensure
consistency.
Storage Overhead: Denormalized databases can consume more storage space due to
redundant data.
The choice between normalization and denormalization depends on factors like the
specific requirements of your application, the nature of your data, and the trade-offs
you're willing to make. In practice, many databases strike a balance by normalizing the
core data for data integrity and then selectively denormalizing certain parts for query
optimization. This approach is called "controlled denormalization" and seeks to harness
the benefits of both techniques while minimizing their drawbacks.
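A minimal sketch of controlled denormalization, using Python's sqlite3 module with illustrative table and column names: the core tables stay normalized, and a read-optimized copy is materialized for reporting.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Normalized core: each fact stored once.
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Asha');
INSERT INTO orders VALUES (10, 1, 99.5);

-- Denormalized reporting table: the customer name is deliberately
-- duplicated so reports need no join.
CREATE TABLE order_report AS
SELECT o.order_id, c.name AS customer_name, o.total
FROM orders o JOIN customers c ON o.customer_id = c.customer_id;
""")

# Reads hit the flat table directly; writes still go to the core tables,
# and the report table must be refreshed to stay consistent.
row = con.execute("SELECT customer_name, total FROM order_report").fetchone()
print(row)  # ('Asha', 99.5)
```

The cost, as noted above, is that every update to the core tables obliges you to rebuild or patch the denormalized copy.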
8. Conclusion
Normalization is a fundamental concept in Database Management Systems (DBMS) that
plays a pivotal role in designing efficient, organized, and reliable databases. Throughout
this discussion, we've explored the significance, principles, and various normal forms
involved in the normalization process. In conclusion, here are the key takeaways
regarding normalization in DBMS:
Data Integrity and Accuracy: Normalization is primarily about ensuring data integrity
and accuracy in relational databases. By adhering to the principles of normalization, we
can significantly reduce the chances of data anomalies, such as insertion, update, and
deletion anomalies, ensuring that the data remains consistent and trustworthy.
Structured and Efficient Data Storage: Normalization leads to well-structured
databases, where data is organized in a way that reflects real-world relationships. This
organization not only promotes data consistency but also optimizes data storage by
reducing redundancy. It ensures that data is stored in a compact and efficient manner.
Progressive Normal Forms: Normalization encompasses a series of normal forms, from
First Normal Form (1NF) to higher levels like Boyce-Codd Normal Form (BCNF) and
Fourth Normal Form (4NF). Each level builds on the previous one, refining the database
structure and eliminating specific types of data dependencies and anomalies.
Trade-Offs Between Normalization and Performance: While normalization enhances
data integrity, it can sometimes result in more complex queries involving multiple joins.
Therefore, designers must strike a balance between normalization and performance,
especially in scenarios where read-heavy workloads and query efficiency are critical.
Selective Denormalization: In practice, controlled denormalization is often employed to
optimize query performance while maintaining data integrity. This approach selectively
denormalizes specific parts of the database, striking a balance between normalized and
denormalized data structures.
Application-Specific Considerations: The decision to normalize or denormalize should
be guided by the specific requirements of the application. High-integrity applications,
such as financial systems, may prioritize normalization, while read-heavy analytics
systems may lean toward denormalization.
In conclusion, normalization is a vital concept in DBMS that helps ensure the reliability,
efficiency, and accuracy of relational databases. It offers a structured framework for
organizing data and mitigating the risk of data anomalies. The choice of the appropriate
normalization level or the introduction of controlled denormalization should be driven by
the specific needs and performance considerations of the application at hand. Ultimately,
effective database design strikes a balance between data integrity and query performance
to meet the goals of the organization.
9. References