Database Normalization

Database normalization is a systematic process for minimizing redundancy in database tables in order to avoid insertion, deletion, and update anomalies. It organizes data into normal forms, with First Normal Form (1NF) being the weakest and Boyce-Codd Normal Form (BCNF) the strongest of the forms covered here. The process ensures that data dependencies are logical and that data integrity is maintained by eliminating partial and transitive dependencies.

Uploaded by

pranshu07d

Objective
• Normalization is the process of minimizing redundancy in a relation or set of relations.
• It is a systematic approach of decomposing tables to eliminate data redundancy (duplication).
• Redundancy in a relation may cause
– insertion anomalies,
– deletion anomalies,
– update anomalies.
• Normal forms are used to eliminate or reduce redundancy in database tables.
• It is a multi-step process that puts data into tabular form and removes duplicated data from the relation tables.
• Normalization is used for mainly two purposes:
1. Eliminating redundant (useless) data.
2. Ensuring data dependencies make sense, i.e. data is logically stored.
Why Use Normalization?
• If a table is not properly normalized:
– It takes up extra storage space.
– It becomes difficult to handle and update the database without risking data loss.
– Insertion, update and deletion anomalies become very frequent.
• To understand these anomalies, let us take the example of a Student table.
Roll_No  Student_Name        Branch  AcademicDean  College
71       Jethani Vanshika    IT      Nidhi Vaidya  TSEC
72       Pranav Dani         IT      Nidhi Vaidya  TSEC
73       Pranav Shrivastava  IT      Nidhi Vaidya  TSEC
74       Pratham Bharat      IT      Nidhi Vaidya  TSEC
• The Student table holds data for 4 IT students. The values of Branch, AcademicDean and College are repeated for every student in the same department of the college. This is data redundancy.
Insertion Anomaly
• For a new admission, the student's data cannot be inserted until the student opts for a branch; otherwise the branch information has to be set to NULL.
• Also, if we insert data for 100 students of the same branch, the branch information is repeated for all 100 students.
• These scenarios are known as insertion anomalies.
Update Anomaly
• What if the Academic Dean leaves the college or is no longer the Dean? In that case every student record has to be updated, and if we miss even one record by mistake, the data becomes inconsistent. This is the update anomaly.
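The update anomaly can be reproduced with a small sketch using Python's built-in sqlite3 module. The table and column names follow the Student example; the replacement dean name 'New Dean' and the incomplete WHERE clause are assumptions made purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "roll_no INTEGER PRIMARY KEY, student_name TEXT, "
    "branch TEXT, academic_dean TEXT, college TEXT)"
)
rows = [
    (71, "Jethani Vanshika", "IT", "Nidhi Vaidya", "TSEC"),
    (72, "Pranav Dani", "IT", "Nidhi Vaidya", "TSEC"),
    (73, "Pranav Shrivastava", "IT", "Nidhi Vaidya", "TSEC"),
    (74, "Pratham Bharat", "IT", "Nidhi Vaidya", "TSEC"),
]
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", rows)

# The dean changes, but this (deliberately incomplete) update misses roll number 74.
# 'New Dean' is a placeholder name, not taken from the slides.
conn.execute("UPDATE student SET academic_dean = 'New Dean' WHERE roll_no < 74")

# The same branch now reports two different deans: the data is inconsistent.
deans = [
    r[0]
    for r in conn.execute(
        "SELECT DISTINCT academic_dean FROM student "
        "WHERE branch = 'IT' ORDER BY academic_dean"
    )
]
print(deans)
```

Because the dean is stored once per student rather than once per branch, a single missed row is enough to leave two conflicting values behind.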
Deletion Anomaly
• Our Student table contains both student information and branch information.
• Hence, if student records are deleted at the end of the academic year, we also lose the branch information.
• This is the deletion anomaly.
Normalization
• There is a sequence to normal forms:
– 1NF is considered the weakest,
– 2NF is stronger than 1NF,
– 3NF is stronger than 2NF, and
– BCNF is considered the strongest

• Also,
– any relation that is in BCNF, is in 3NF;
– any relation in 3NF is in 2NF; and
– any relation in 2NF is in 1NF.
First Normal Form (1NF)
• First Normal Form is built into the definition of a relation (table) itself.
• The rule states that all attributes in a relation must have atomic domains.
• The values in an atomic domain are indivisible units.
• In the table below, the subject column holds two values for roll number 101, so the table is not in First Normal Form.
Roll_no  name      subject
101      Annaya    OS, CN
103      Abhishek  Java
• We rearrange the relation (table) to convert it to First Normal Form.
Roll_no  name      subject
101      Annaya    OS
101      Annaya    CN
103      Abhishek  Java
• After this change a few values are repeated, but the values in the subject column are now atomic for each record/row.
• Under First Normal Form data redundancy increases, since the same column values appear in multiple rows, but each row as a whole is unique.
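The rearrangement above can be sketched in a few lines of Python. The function name to_first_normal_form and the assumption that multiple subjects are stored as a comma-separated string are illustrative choices, not part of the slides.

```python
# Rows as (roll_no, name, subject); subject may hold a comma-separated list.
unnormalized = [
    (101, "Annaya", "OS, CN"),
    (103, "Abhishek", "Java"),
]


def to_first_normal_form(rows):
    """Emit one row per (roll_no, name, single subject)."""
    result = []
    for roll_no, name, subjects in rows:
        for subject in subjects.split(","):
            result.append((roll_no, name, subject.strip()))
    return result


normalized = to_first_normal_form(unnormalized)
for row in normalized:
    print(row)
```

Each output row now carries exactly one subject, at the cost of repeating the roll number and name.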
What is Second Normal Form?
• For a table to be in the Second Normal Form, it must satisfy two conditions:
– The table should be in the First Normal Form.
– There should be no partial dependency.
Functional Dependencies
• Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs.
• FDs and keys are used to define normal forms for relations.
• FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
Functional Dependency definition
• A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
• X → Y holds if, whenever two tuples have the same value for X, they must have the same value for Y:
if t1[X] = t2[X], then t1[Y] = t2[Y] in any relation instance r(R).
• X → Y in R specifies a constraint on all relation instances r(R).
• FDs are derived from the real-world constraints on the attributes.
Examples of FD constraints
• Social Security Number determines employee name:
SSN → ENAME
• Project number determines project name and location:
PNUMBER → {PNAME, PLOCATION}
• Employee SSN and project number together determine the hours per week that the employee works on the project:
{SSN, PNUMBER} → HOURS
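The tuple-based definition translates directly into a check over a relation instance. The sketch below is a minimal illustration: the function name fd_holds and the sample tuples are hypothetical. Note that a single instance can only refute an FD; it cannot prove that the FD holds for every possible instance r(R).

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated in this relation instance.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample tuples for the employee/project example.
works_on = [
    {"ssn": "S1", "pnumber": 1, "hours": 10},
    {"ssn": "S1", "pnumber": 2, "hours": 20},
    {"ssn": "S2", "pnumber": 1, "hours": 5},
]

both = fd_holds(works_on, ["ssn", "pnumber"], ["hours"])
ssn_only = fd_holds(works_on, ["ssn"], ["hours"])
print(both, ssn_only)
```

In this sample, {SSN, PNUMBER} → HOURS is consistent with the data, while SSN → HOURS is refuted because employee S1 has two different HOURS values.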
Functional Dependencies and Keys
• An FD is a property of the attributes in the schema R.
• The constraint must hold on every relation instance r(R).
• If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
What is Dependency?
• Let's take an example of a Student table with columns student_id, name, reg_no (registration number), branch and address (student's home address).
Student_id Name Reg_no Branch Address
• In this table, student_id is the primary key and is unique for every row.
• Hence we can use student_id to fetch any row of data from this table.
• Even when two students have the same name, knowing the student_id lets us fetch the correct record.
Student_id  Name      Reg_no  Branch  Address
10          Avantika  112345  IT      Borivali
11          Vinayak   134567  IT      Vashi
• Hence we can say that the primary key of a table is the column or group of columns (composite key) that uniquely identifies each record in the table.
• If I ask for the branch name of the student with student_id 10, I can get it.
• Similarly, if I ask for the name of the student with student_id 10 or 11, I will get it. So all I need is student_id, and every other column depends on it, or can be fetched using it.
• This is dependency, and we also call it functional dependency.
What is Partial Dependency?
• For a simple table like Student, a single column like student_id can uniquely identify all the records in the table.
• But this is not always the case. So let's extend our example to see how more than one column together can act as a primary key.
• Let's create another table, Subject, with the fields subject_id and subject_name, where subject_id is the primary key.
subject_id  subject_name
1           Java
2           C++
3           Php
• Now we have a Student table with student information and a Subject table for storing subject information.
• Let's create another table, Score, to store the marks obtained by students in the respective subjects. We will also save the name of the teacher who teaches that subject along with the marks.
score_id  student_id  subject_id  marks  teacher
1         10          1           70     Java Teacher
2         10          2           75     C++ Teacher
3         11          1           80     Java Teacher
• In the Score table we save the student_id to know whose marks these are, and the subject_id to know which subject the marks are for.
• Together, student_id + subject_id form a candidate key for this table, which can be the primary key.
• Now if you look at the Score table, we have a column named teacher which depends only on the subject: for Java it's Java Teacher, for C++ it's C++ Teacher, and so on.
• As discussed, the primary key for this table is a composite of two columns, student_id and subject_id, but the teacher's name depends only on the subject, hence on subject_id, and has nothing to do with student_id.
• This is partial dependency, where an attribute in a table depends on only a part of the primary key and not on the whole key.
How to remove Partial Dependency?
• There can be many different solutions for this, but our objective is to remove the teacher's name from the Score table.
• The simplest solution is to remove the teacher column from the Score table and add it to the Subject table.
Subject table:
subject_id  subject_name  teacher
1           Java          Java Teacher
2           C++           C++ Teacher
3           Php           Php Teacher
Score table:
score_id  student_id  subject_id  marks
1         10          1           70
2         10          2           75
3         11          1           80
• For a table to be in the Second Normal Form, it should be in the First Normal Form and it should not have any partial dependency.
• Partial dependency exists when, for a composite primary key, a non-key attribute in the table depends on only a part of the primary key and not on the complete primary key.
• To remove partial dependency, we can divide the table, remove the attribute which is causing the partial dependency, and move it to some other table where it fits in well.
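The decomposition can be tried out with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the column types and the foreign-key declaration are illustrative choices, and only the table contents come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- teacher depends on subject_id alone, so it is stored once per subject
CREATE TABLE subject (
    subject_id   INTEGER PRIMARY KEY,
    subject_name TEXT,
    teacher      TEXT
);
INSERT INTO subject VALUES
    (1, 'Java', 'Java Teacher'),
    (2, 'C++',  'C++ Teacher'),
    (3, 'Php',  'Php Teacher');

-- score keeps only attributes that describe a (student, subject) pair
CREATE TABLE score (
    score_id   INTEGER PRIMARY KEY,
    student_id INTEGER,
    subject_id INTEGER REFERENCES subject(subject_id),
    marks      INTEGER
);
INSERT INTO score VALUES (1, 10, 1, 70), (2, 10, 2, 75), (3, 11, 1, 80);
""")

# The teacher for each score is still available through a join.
rows = conn.execute("""
    SELECT sc.score_id, sc.marks, su.teacher
    FROM score AS sc
    JOIN subject AS su ON su.subject_id = sc.subject_id
    ORDER BY sc.score_id
""").fetchall()
for row in rows:
    print(row)
```

No information is lost by the split: the join reproduces the teacher column of the original Score table, while a change of teacher now touches a single row in subject.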
Third Normal Form (3NF)
• Third Normal Form is an upgrade to Second Normal Form.
• When a table is in the Second Normal Form and has no transitive dependency, then it is in the Third Normal Form.
• Let's use the 3 tables Student, Subject and Score.
Requirements for Third Normal Form
• The table should be in the Second Normal Form.
• It should not have any transitive dependency.
What is Transitive Dependency?
• Suppose exam_name and total_marks are added to the Score table.
• The primary key for the Score table is a composite key, made up of two attributes or columns: student_id + subject_id.
• The new column exam_name depends on both student and subject.
• For example, a mechanical engineering student will have a Workshop exam but a computer science student won't. And some subjects have Practical exams while others don't.
• So we can say that exam_name depends on both student_id and subject_id.
• What about the second new column, total_marks? Does it depend on the Score table's primary key?
• Well, total_marks depends on exam_name, since the total score changes with the exam type.
• For example, practicals carry fewer marks while theory exams carry more marks.
• But exam_name is just another column in the Score table. It is not the primary key or even a part of the primary key, and total_marks depends on it.
• This is transitive dependency: a non-prime attribute depends on another non-prime attribute rather than directly on the prime attributes or the primary key.
How to remove Transitive Dependency?
• Again the solution is very simple. Take the columns exam_name and total_marks out of the Score table, put them in an Exam table, and use the exam_id wherever required.
Exam table:
Exam_id  Exam_name   Total_marks
1        Workshop    200
2        Mains       70
3        Practicals  30
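A possible layout after the split, again sketched with Python's sqlite3 module. The exam rows come from the table above; the exam_id assigned to each score row is made up for illustration, since the slides do not say which exam each score belongs to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exam (
    exam_id     INTEGER PRIMARY KEY,
    exam_name   TEXT,
    total_marks INTEGER
);
INSERT INTO exam VALUES
    (1, 'Workshop', 200), (2, 'Mains', 70), (3, 'Practicals', 30);

CREATE TABLE score (
    score_id   INTEGER PRIMARY KEY,
    student_id INTEGER,
    subject_id INTEGER,
    marks      INTEGER,
    exam_id    INTEGER REFERENCES exam(exam_id)
);
-- the exam_id values below are hypothetical
INSERT INTO score VALUES
    (1, 10, 1, 70, 2), (2, 10, 2, 75, 1), (3, 11, 1, 80, 1);
""")

# total_marks is looked up through exam_id instead of being repeated per score.
report = conn.execute("""
    SELECT sc.score_id, sc.marks, ex.exam_name, ex.total_marks
    FROM score AS sc
    JOIN exam AS ex ON ex.exam_id = sc.exam_id
    ORDER BY sc.score_id
""").fetchall()
for row in report:
    print(row)
```

With this layout total_marks is stored once per exam, so changing the maximum marks of an exam is a single-row update.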
Advantages of removing Transitive Dependency
• The amount of data duplication is reduced.
• Data integrity is achieved.
Boyce-Codd Normal Form (BCNF)
• Boyce-Codd Normal Form, or BCNF, is an extension of the Third Normal Form, and is also known as 3.5 Normal Form.
Rules for BCNF
• For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two conditions:
– It should be in the Third Normal Form.
– For any non-trivial dependency A → B, A should be a super key.
• The second point sounds a bit tricky, right? Since the table is already in 3NF, the only dependencies it can still rule out are those where B is a prime attribute and A is not a super key, for example a non-prime attribute A determining a prime attribute B.
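The super key condition can be checked mechanically by computing attribute closures. The sketch below is a minimal illustration; the function names closure and bcnf_violations are hypothetical, and the two dependencies on an enrolment-style relation with columns student_id, subject and professor are assumptions chosen for the example rather than facts stated in the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is known, the right side follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(all_attrs)
    ]


# Assumed dependencies, for illustration only.
attrs = {"student_id", "subject", "professor"}
fds = [
    ({"student_id", "subject"}, {"professor"}),
    ({"professor"}, {"subject"}),
]

violations = bcnf_violations(attrs, fds)
print(violations)
```

Under these assumed dependencies, professor → subject is reported: professor is not a super key, yet it determines subject, which is part of the key {student_id, subject}.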
• Below we have a college enrolment table with columns student_id, subject and professor