Normalization
Introduction
Functional Dependencies
Functional Dependency Theory
Normalization
Dependency Preservation
Boyce-Codd Normal Form
Multivalued Dependencies and Fourth Normal Form
Join Dependencies and Fifth Normal Form
Introduction
Design Goal:
o Decide whether a particular relation R is in good form.
o If a relation R is not in good form, decompose it into a set of relations {R1, R2, ..., Rn} such that
o each relation is in good form
o the decomposition is a lossless-join decomposition: when a relation is decomposed into smaller relations with fewer attributes, joining the resulting relations must give back exactly the original relation, without any extra (spurious) rows. The joins can be performed in any order.
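The lossless-join condition can be tested on a concrete relation instance by projecting it onto each attribute subset and natural-joining the projections back together. The sketch below is only an illustration under stated assumptions: relations are modelled as Python lists of dicts, and the helper names (`project`, `natural_join`, `is_lossless`) and the sample rows (Student, Course, Instructor) are hypothetical.

```python
# Minimal sketch: relations are lists of dicts (attribute -> value).
# All helper names and sample data here are hypothetical illustrations.

def project(rows, attrs):
    """Projection onto attrs, with duplicate tuples removed."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

def natural_join(left, right):
    """Natural join on all attributes the two relations share."""
    common = set(left[0]) & set(right[0]) if left and right else set()
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]

def is_lossless(rows, parts):
    """True if joining the projections onto `parts` gives back exactly `rows`."""
    joined = project(rows, parts[0])
    for attrs in parts[1:]:
        joined = natural_join(joined, project(rows, attrs))
    as_set = lambda rs: {tuple(sorted(r.items())) for r in rs}
    return as_set(joined) == as_set(rows)

# Hypothetical instance in which Course -> Instructor holds and
# instructor I1 teaches two different courses.
rows = [
    {"Student": "S1", "Course": "C1", "Instructor": "I1"},
    {"Student": "S2", "Course": "C2", "Instructor": "I1"},
]

print(is_lossless(rows, [["Student", "Course"], ["Course", "Instructor"]]))      # True
print(is_lossless(rows, [["Student", "Instructor"], ["Course", "Instructor"]]))  # False
```

The second decomposition joins on Instructor, which is not a key of either part, so the join produces spurious rows such as (S1, C2, I1). Note that a check on one instance can only reveal a lossy decomposition; losslessness for all legal instances follows from the functional dependencies.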
Introduction
A bad design may lead to:
Repetition of information, which leads to insert, delete and update anomalies.
Inability to represent some information.
Anomalies: unexpected results from an operation.
delete: when deleting a value for an attribute, you inadvertently lose the value of some other attribute.
insert: you need to store a value for a particular attribute but cannot, because some other value (for example, the key value) is needed to include that occurrence.
update: to change a value, you need to find and change every instance of it, and these instances may be hard to find.
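The update and delete anomalies can be reproduced on a small redundant table in which each instructor's room is repeated in every enrolment row. This is a hypothetical sketch: the rows, the attribute names (borrowed from the REPORT relation used later) and the helper `rooms_per_instructor` are illustrative assumptions, not part of the original material.

```python
# Hypothetical redundant table: the instructor's room is repeated per enrolment.
enrolments = [
    {"Student#": "S1", "Course#": "C1", "IName": "I1", "Room#": "101"},
    {"Student#": "S2", "Course#": "C1", "IName": "I1", "Room#": "101"},
]

# Update anomaly: the room is changed in only one of the repeated copies.
enrolments[0]["Room#"] = "205"

def rooms_per_instructor(rows):
    """Collect every Room# recorded for each instructor."""
    rooms = {}
    for r in rows:
        rooms.setdefault(r["IName"], set()).add(r["Room#"])
    return rooms

# Instructors that now appear with more than one room (inconsistent data).
conflicts = {i: rs for i, rs in rooms_per_instructor(enrolments).items() if len(rs) > 1}
print(conflicts)  # I1 is now recorded with two rooms, '101' and '205'

# Delete anomaly: deleting all enrolments for C1 also loses I1's room.
remaining = [r for r in enrolments if r["Course#"] != "C1"]
print(rooms_per_instructor(remaining))  # {}
```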
Functional Dependency
Constraints on the set of legal relations.
Require that the value for a certain set of attributes uniquely determines the value for another set of attributes.
Definition of FD: Given a relation R, a set of attributes X in R is said to functionally determine another attribute (or set of attributes) Y, also in R (written X -> Y), if and only if each X value is associated with at most one Y value.
Functional Dependency
Determinant - Attribute X is called a determinant if it uniquely determines the value of Y in a given relationship or entity.
A determinant attribute need NOT be a key attribute.
Represented as X -> Y, which means attribute X determines attribute Y.
Example of FD
Employee
SSN          Name         JobType     DeptName
557-78-6587  Lance Smith  Accountant  Salary
214-45-2398  Lance Smith  Engineer    Product
SSN -> Name
Note: Name is functionally dependent on SSN because an employee's name can be uniquely determined from their SSN. Name does not determine SSN, because more than one employee can have the same name.
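The definition "each X value is associated with at most one Y value" translates directly into a check over the rows of a relation instance. A minimal sketch, assuming the Employee table above is held as a list of dicts; the function name `fd_holds` is hypothetical. An FD is a constraint on all legal instances, so such a check can only show that an FD is violated; passing it on one instance is consistent with the FD but does not prove it.

```python
def fd_holds(rows, X, Y):
    """Return True if X -> Y holds in this instance, i.e. every
    X value is associated with at most one Y value."""
    seen = {}
    for r in rows:
        x = tuple(r[a] for a in X)
        y = tuple(r[a] for a in Y)
        # setdefault stores the first Y seen for this X and returns it
        if seen.setdefault(x, y) != y:
            return False
    return True

employee = [
    {"SSN": "557-78-6587", "Name": "Lance Smith", "JobType": "Accountant", "DeptName": "Salary"},
    {"SSN": "214-45-2398", "Name": "Lance Smith", "JobType": "Engineer", "DeptName": "Product"},
]

print(fd_holds(employee, ["SSN"], ["Name"]))  # True
print(fd_holds(employee, ["Name"], ["SSN"]))  # False: one name, two SSNs
```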
Keys
key: a unique attribute (or field) which can be used to identify the entire tuple (or record) as unique
key attributes are determinants, but not all determinants are key attributes. E.g. in Marks -> Grade, Marks is a determinant but not a key.
candidate key: any attribute (or minimal combination of attributes) which might serve as a key
primary key: the candidate key selected by the database administrator as the key we will use for that relation
composed (or composite) key: a key of two or more fields
FD Contd..
Consider the following relation:
REPORT (Student#, Course#, CourseName, IName, Room#, Marks, Grade)
Where:
Student# - Student number
Course# - Course number
CourseName - Course name
IName - Name of the instructor who delivered the course
Room# - Room number assigned to the respective instructor
Marks - Marks scored by student Student# in course Course#
Grade - Grade obtained by student Student# in course Course#
FD Contd..
Student# and Course# together (called a composite attribute) define EXACTLY ONE value of Marks. This can be represented symbolically as Student# Course# -> Marks.
Other functional dependencies in the above example are:
Course# -> CourseName
Course# -> IName (assuming each course is taught by one and only one instructor)
IName -> Room# (assuming each instructor has his/her own, non-shared room)
Marks -> Grade
Formal definition of FD: In a given relation R, X and Y are attributes. Attribute Y is functionally dependent on attribute X if each value of X determines exactly one value of Y. This is represented as X -> Y.
Note that X may be composite.
FD Contd..
A functional dependency is trivial if it is satisfied by all instances of a relation.
Example:
customer_name, loan_number -> customer_name
customer_name -> customer_name
In general, X -> Y is trivial if Y is a subset of X.
Full functional dependency: In a given relation R, X and Y are attributes. Y is fully functionally dependent on X if X -> Y holds and Y is not functionally dependent on any proper subset of X.
Note that X may be composite.
FD Contd..
Full functional dependency:
E.g. Marks is fully functionally dependent on Student# Course#, and not on any proper subset of Student# Course#.
CourseName is not fully functionally dependent on Student# Course#, because one of its subsets, Course#, alone determines CourseName.
FD Contd..
Partial dependency:
In a given relation R, X and Y are attributes. Attribute Y is partially dependent on attribute X if it is functionally dependent on a proper subset of X. Note that X may be composite.
E.g. CourseName, IName and Room# are partially dependent on the composite attribute Student# Course#, because Course# alone determines CourseName, IName and Room#.
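Whether a dependency on Student# Course# is full or partial can be decided mechanically from the REPORT FDs listed earlier: Y depends on the key if Y is in the closure of the key, and the dependency is partial if Y is already in the closure of some proper subset. The sketch below assumes FDs are stored as (left side, right side) pairs of Python sets; `closure` computes the attribute closure described in the Closure slide, and `dependency_kind` is a hypothetical helper name.

```python
from itertools import combinations

# REPORT dependencies as listed on the slides: (left side, right side).
FDS = [
    ({"Student#", "Course#"}, {"Marks"}),
    ({"Course#"}, {"CourseName"}),
    ({"Course#"}, {"IName"}),
    ({"IName"}, {"Room#"}),
    ({"Marks"}, {"Grade"}),
]

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def dependency_kind(key, attr, fds):
    """'full' if attr depends on key but on no proper subset of it,
    'partial' if some proper subset already determines attr."""
    if attr not in closure(key, fds):
        return "none"
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            if attr in closure(subset, fds):
                return "partial"
    return "full"

key = {"Student#", "Course#"}
for attr in ["Marks", "Grade", "CourseName", "IName", "Room#"]:
    print(attr, dependency_kind(key, attr, FDS))
# Marks full
# Grade full
# CourseName partial
# IName partial
# Room# partial
```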
FD-Partial Dependency
FD - Transitive Dependency
Transitive Dependency:
Room# depends on IName, which in turn depends on Course#. Here Room# transitively depends on Course#.
Similarly, Grade depends on Marks, which in turn depends on Student# Course#; hence Grade is fully but transitively dependent on Student# Course#.
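The two chains above can be found by pairing each listed FD X -> Y with another listed FD Y -> Z. This is a deliberately simplified sketch: it only chains FDs that are written out explicitly and does not verify the further condition that Y must not determine X; the helper name `transitive_chains` is hypothetical.

```python
# REPORT dependencies as listed on the slides: (left side, right side).
FDS = [
    ({"Student#", "Course#"}, {"Marks"}),
    ({"Course#"}, {"CourseName"}),
    ({"Course#"}, {"IName"}),
    ({"IName"}, {"Room#"}),
    ({"Marks"}, {"Grade"}),
]

def transitive_chains(fds):
    """Return (X, Y, Z) for listed FDs X -> Y and Y -> Z where Z is new."""
    chains = []
    for lhs1, rhs1 in fds:
        for lhs2, rhs2 in fds:
            # second FD starts from what the first one determines,
            # and adds attributes not already in X or Y
            if lhs2 <= rhs1 and not rhs2 <= lhs1 | rhs1:
                chains.append((sorted(lhs1), sorted(rhs1), sorted(rhs2)))
    return chains

chains = transitive_chains(FDS)
for x, y, z in chains:
    print(" ".join(x), "->", " ".join(y), "->", " ".join(z))
# Course# Student# -> Marks -> Grade
# Course# -> IName -> Room#
```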
Closure
Given a set F set of functional dependencies,
there are certain other functional dependencies
that are logically implied by F.
For example: If A B and B C, then we
can infer that A C
The set of all functional dependencies logically
implied by F is the closure of F.
We denote the closure of F by F
+
.
F
+
is a superset of F.
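Listing F+ explicitly can take exponential time, so a common practical tool is the attribute closure X+ (the set of all attributes determined by X under F): X -> Y is in F+ exactly when Y is a subset of X+. A minimal sketch, assuming FDs are (left side, right side) pairs of Python sets; `attribute_closure` and `implies` are hypothetical names. It reproduces the A -> B, B -> C example above.

```python
def attribute_closure(attrs, fds):
    """Compute X+: repeatedly apply any FD whose left side is already
    contained in the result, until nothing new is added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """X -> Y is in F+ exactly when Y is a subset of X+."""
    return set(rhs) <= attribute_closure(lhs, fds)

F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(attribute_closure({"A"}, F)))  # ['A', 'B', 'C']
print(implies(F, {"A"}, {"C"}))             # True
print(implies(F, {"C"}, {"A"}))             # False
```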
Axioms
Armstrong (1974) gave three rules (axioms) from which all functional dependencies implied by a given set can be derived; three further rules can be derived from them.
1. Reflexivity Rule --- If X is a set of attributes and Y is a subset of X, then X -> Y holds, i.e. each subset of X is functionally dependent on X.
2. Augmentation Rule --- If X -> Y holds and W is a set of attributes, then WX -> WY holds.
3. Transitivity Rule --- If X -> Y and Y -> Z hold, then X -> Z holds.
These rules are
sound (they generate only functional dependencies that actually hold) and
complete (they generate all functional dependencies that hold).
Derived Theorems from Axioms
4. Union Rule --- If X -> Y and X -> Z hold, then X -> YZ holds.
5. Decomposition Rule --- If X -> YZ holds, then so do X -> Y and X -> Z.
6. Pseudotransitivity Rule --- If X -> Y and WY -> Z hold, then so does WX -> Z.
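The derived rules follow from the three axioms in a few steps each. One possible derivation, written one step per row (XX = X since attribute sets are sets):

```latex
% Union: from X -> Y and X -> Z derive X -> YZ
\begin{array}{lll}
1. & X \rightarrow Y   & \text{given} \\
2. & X \rightarrow XY  & \text{augment 1 with } X \ (XX = X) \\
3. & X \rightarrow Z   & \text{given} \\
4. & XY \rightarrow YZ & \text{augment 3 with } Y \\
5. & X \rightarrow YZ  & \text{transitivity on 2 and 4}
\end{array}

% Decomposition: from X -> YZ derive X -> Y (X -> Z is symmetric)
\begin{array}{lll}
1. & X \rightarrow YZ & \text{given} \\
2. & YZ \rightarrow Y & \text{reflexivity } (Y \subseteq YZ) \\
3. & X \rightarrow Y  & \text{transitivity on 1 and 2}
\end{array}

% Pseudotransitivity: from X -> Y and WY -> Z derive WX -> Z
\begin{array}{lll}
1. & X \rightarrow Y   & \text{given} \\
2. & WX \rightarrow WY & \text{augment 1 with } W \\
3. & WY \rightarrow Z  & \text{given} \\
4. & WX \rightarrow Z  & \text{transitivity on 2 and 3}
\end{array}
```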
Example
R = (A, B, C, G, H, I)
F = { A -> B
      A -> C
      CG -> H
      CG -> I
      B -> H }
some members of F+:
A -> H
by transitivity from A -> B and B -> H
AG -> I
by augmenting A -> C with G, to get AG -> CG, and then transitivity with CG -> I
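The two derived members of F+ can be confirmed with attribute closures: A -> H holds because H is in A+, and AG -> I holds because I is in AG+. A sketch, assuming the same set-based FD representation as before (`attribute_closure` is a hypothetical helper name). It also shows that AG+ is all of R, and since neither A+ nor G+ alone covers R, AG is a candidate key.

```python
def attribute_closure(attrs, fds):
    """Compute X+ under fds, given as (left side, right side) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

R = set("ABCGHI")
F = [
    ({"A"}, {"B"}),
    ({"A"}, {"C"}),
    ({"C", "G"}, {"H"}),
    ({"C", "G"}, {"I"}),
    ({"B"}, {"H"}),
]

print(sorted(attribute_closure({"A"}, F)))       # ['A', 'B', 'C', 'H'], so A -> H
print(sorted(attribute_closure({"A", "G"}, F)))  # ['A', 'B', 'C', 'G', 'H', 'I'], so AG -> I
print(attribute_closure({"A", "G"}, F) == R)     # True: AG determines all of R
```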
Introduction to Normalization
Normalization: Process of decomposing unsatisfactory
"bad" relations by breaking up their attributes into
smaller relations
Goals:
Eliminating redundant data
Ensuring data dependencies make sense (only storing
related data in a table).
These goals reduce the amount of space a database
consumes and ensure that data is logically stored.
Need for Normalization
Minimize data redundancy, i.e. no unnecessary duplication of data.
To make the database structure flexible, i.e. it should be possible to add new data values and rows without reorganizing the database structure.
Data should be consistent throughout the database, i.e. it should not suffer from the following anomalies:
Insert anomaly
Update anomaly
Delete anomaly
ADVANTAGES OF NORMALIZATION
More efficient data structure.
Avoid redundant fields or columns.
More flexible data structure i.e. we should be able to
add new rows and data values easily
Better understanding of data.
Ensures that distinct tables exist when necessary.
o Easier to maintain data structure i.e. it is easy to
perform operations and complex queries can be easily
handled.
o Minimizes data duplication.
o Close modeling of real world entities, processes and
their relationships.
DISADVANTAGES OF NORMALIZATION
You cannot start building the database before you know what the user needs.
Normalizing relations to higher normal forms, i.e. 4NF and 5NF, can degrade performance, because more tables must be joined to answer queries.
Normalizing relations of higher degree is a very time-consuming and difficult process.
Careless decomposition may lead to a bad database design, which may lead to serious problems.
Normal Form
Initially Codd (1972) presented three normal
forms (1NF, 2NF and 3NF) all based on
functional dependencies among the attributes
of a relation.
Later Boyce and Codd proposed another
normal form called the Boyce-Codd normal
form (BCNF).
The fourth and fifth normal forms are based on multivalued and join dependencies and were proposed later.
The primary objective of normalization is to
avoid anomalies.
Normal Forms: Review
Unnormalized  There are multivalued attributes or repeating groups
1NF           No multivalued attributes or repeating groups
2NF           1NF plus no partial dependencies
3NF           2NF plus no transitive dependencies
Example Relation Record
First Normal Form(1NF)
A relation R is said to be in first normal form (1NF) if and only if all the attributes of R are atomic in nature.
That means only one piece of data can be stored within a field (attribute) of a particular record (tuple).
Non-atomic values complicate storage and encourage redundant (repeated) storage of data.
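Bringing a relation with a multivalued attribute into 1NF amounts to emitting one row per value, so every field holds a single atomic value. A minimal sketch with hypothetical data (student numbers 1001 and 1002, names Anu and Ben, courses M1 and M2) and a hypothetical helper `to_1nf`. Note that the result repeats the student details once per course, which is exactly the redundancy the next step removes.

```python
# Hypothetical unnormalized rows: "Courses" holds several values per record.
unnormalized = [
    {"Student#": "1001", "StudentName": "Anu", "Courses": ["M1", "M2"]},
    {"Student#": "1002", "StudentName": "Ben", "Courses": ["M2"]},
]

def to_1nf(rows, multivalued, atomic_name):
    """Replace a list-valued attribute by one row per value."""
    flat = []
    for r in rows:
        for v in r[multivalued]:
            t = {k: val for k, val in r.items() if k != multivalued}
            t[atomic_name] = v
            flat.append(t)
    return flat

for row in to_1nf(unnormalized, "Courses", "Course#"):
    print(row)
# {'Student#': '1001', 'StudentName': 'Anu', 'Course#': 'M1'}
# {'Student#': '1001', 'StudentName': 'Anu', 'Course#': 'M2'}
# {'Student#': '1002', 'StudentName': 'Ben', 'Course#': 'M2'}
```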
Example Relation Record
First Normal Form(1NF)
E.g. student details are repeated for each course, and course details are repeated for each student.
To avoid this, Student Details, Course Details and Result Details can be further divided.
The Student Details attribute is divided into Student# (Student Number), Student Name and Date of Birth.
Course Details is divided into Course#, Course Name, Prerequisites and Duration.
The Results attribute is divided into Student#, Course#, DateOfExam, Marks and Grade.
Student Table
Course Table
Result Table
Second Normal Form (2NF)
A relation is said to be in Second Normal Form if and only if:
It is in first normal form, and
No partial dependency exists between non-key attributes and key attributes.
Let us revisit the 1NF table structure.
Student# is the key attribute for Student,
Course# is the key attribute for Course,
Student# Course# together form the composite key attributes for Result.
Other attributes are non-key attributes.
Second Normal Form (2NF)
To make this table 2NF compliant, we have to remove all the partial dependencies.
StudentName and DateOfBirth depend only on Student#.
CourseName, PreRequisite and DurationInDays depend only on Course#.
DateOfExam depends only on Course#.
To remove this partial dependency we need to split the table Result into two tables:
1. Result(Student#, Course#, Marks, Grade)
2. Exam(Course#, DateOfExam)
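The split of Result into Result and Exam is a pair of projections, and re-joining them on Course# should give back the original rows. The sketch below uses hypothetical rows (student numbers, course codes, exam dates and marks are invented for illustration, with grades following the policy described later) and a hypothetical `project` helper. After the split, each exam date is stored once per course instead of once per student.

```python
# Hypothetical 1NF Result rows: DateOfExam depends only on Course#.
result_1nf = [
    {"Student#": "1001", "Course#": "M1", "DateOfExam": "2024-05-10", "Marks": 82, "Grade": "A"},
    {"Student#": "1002", "Course#": "M1", "DateOfExam": "2024-05-10", "Marks": 65, "Grade": "C"},
    {"Student#": "1002", "Course#": "M2", "DateOfExam": "2024-05-12", "Marks": 74, "Grade": "B"},
]

def project(rows, attrs):
    """Projection onto attrs, with duplicate tuples removed."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

result = project(result_1nf, ["Student#", "Course#", "Marks", "Grade"])
exam = project(result_1nf, ["Course#", "DateOfExam"])
print(len(result), len(exam))  # 3 2

# Re-joining on Course# reproduces the original rows (lossless split).
rejoined = [{**r, **e} for r in result for e in exam if r["Course#"] == e["Course#"]]
same = {frozenset(d.items()) for d in rejoined} == {frozenset(d.items()) for d in result_1nf}
print(same)  # True
```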
Result and Exam Table
Second Normal Form (2NF)
In the first table (STUDENT), the key attribute is Student#, and all other non-key attributes, StudentName and DateOfBirth, are fully functionally dependent on the key attribute.
In the second table (COURSE), Course# is the key attribute, and all the non-key attributes, CourseName and DurationInDays, are fully functionally dependent on the key attribute.
In the third table (RESULT), Student# Course# together are the key attributes, and all other non-key attributes, Marks and Grade, are fully functionally dependent on the key attributes.
In the fourth table (EXAM DATE), Course# is the key attribute, and the non-key attribute DateOfExam is fully functionally dependent on the key attribute.
Second Normal Form (2NF)
What about anomalies?
At first glance it appears that all our anomalies have been taken away!
Now we are storing the Student 1003 and M4 records only once.
We can insert prospective students and courses at will.
We need to update only once if we want to change any data in the STUDENT or COURSE tables.
We can get rid of any course or student details by deleting just one row.
Second Normal Form (2NF)
Let us analyse the RESULT Table
We already concluded that:
All attributes are atomic in nature
No partial dependency exists between the key
attributes and non-key attributes
RESULT table is in 2NF
Second Normal Form (2NF)
Assume that, at present, as per the university evaluation policy:
Students who score 80 marks or more are awarded an A grade
Students who score from 70 up to 79 marks are awarded a B grade
Students who score from 60 up to 69 marks are awarded a C grade
Students who score from 50 up to 59 marks are awarded a D grade
The university management, which is committed to improving the quality of education, wants to change the existing grading system to a new grading system. In the present RESULT table structure:
We don't have an option to introduce new grades like A+, B- and E.
We need to do multiple updates on the existing records to bring them to the new grading definition.
We will not be able to take away the D grade if we want to.
2NF does not take care of all the anomalies and inconsistencies.
Third Normal Form 3NF
A relation R is said to be in 3NF if and only if:
It is in 2NF, and
No transitive dependency exists between non-key attributes and key attributes.
In the RESULT table, Student# and Course# are the key attributes.
All other attributes except Grade are fully and non-transitively dependent on the key attributes.
The Grade attribute is dependent on Marks, and in turn Marks is dependent on Student# Course#.
To bring the table into 3NF we need to remove this transitive dependency.
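Removing Marks -> Grade from RESULT means grades are looked up from a separate grade table instead of being stored per result. The exact layout of the Grade table is not given in the text, so the sketch below assumes one possible design: a list of (lowest mark, grade) ranges taken from the evaluation policy above. Table contents beyond that policy, the sample results and the new grading scheme are hypothetical.

```python
# Assumed GRADE table layout: (lowest mark for the grade, grade), taken from
# the evaluation policy above. Marks below 50 are not covered by that policy.
GRADE = [(80, "A"), (70, "B"), (60, "C"), (50, "D")]

# RESULT in 3NF keeps Marks but no longer stores Grade (hypothetical rows).
RESULT = [
    {"Student#": "1001", "Course#": "M1", "Marks": 82},
    {"Student#": "1002", "Course#": "M2", "Marks": 74},
]

def grade_for(marks, grade_table):
    """Return the grade of the highest range the mark reaches, else None."""
    for lower, grade in sorted(grade_table, reverse=True):
        if marks >= lower:
            return grade
    return None

for r in RESULT:
    print(r["Student#"], r["Course#"], grade_for(r["Marks"], GRADE))
# 1001 M1 A
# 1002 M2 B

# A hypothetical new grading system changes only the GRADE table:
# A+, B- and E are introduced, and C and D are dropped.
NEW_GRADE = [(90, "A+"), (80, "A"), (70, "B"), (60, "B-"), (40, "E")]
print(grade_for(95, NEW_GRADE), grade_for(55, NEW_GRADE))  # A+ E
```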
Third Normal Form 3NF - Result & Grade Table
Third Normal Form 3NF
After normalizing the tables to 3NF, we have got rid of the anomalies and inconsistencies discussed above.
Now we can add new grade systems, update the existing ones and delete unwanted ones.
Hence Third Normal Form is often treated as the practical target of normalization, and most databases that require efficient
INSERT
UPDATE
DELETE
operations are designed in this normal form.
BCNF
A relation is in Boyce-Codd normal form if and only if every determinant is a candidate key.
It should be noted that most relations that are in 3NF
are also in BCNF. Infrequently, a 3NF relation is not in
BCNF and this happens only if
(a) the candidate keys in the relation are composite keys
(that is, they are not single attributes),
(b) there is more than one candidate key in the relation,
and
(c) the keys are not disjoint, that is, some attributes in
the keys are common
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if and only if every determinant is a candidate key.
The difference between 3NF and BCNF is that, for a functional dependency A -> B, 3NF allows this dependency in a relation if B is a key attribute (part of a candidate key), even when A is not a candidate key.
BCNF, however, insists that for this dependency to remain in a relation, A must be a candidate key.
Every relation in BCNF is also in 3NF.
However, a relation in 3NF is not necessarily
in BCNF.
Violation of BCNF is quite rare.
The potential to violate BCNF may occur in a
relation that:
contains two (or more) composite candidate
keys;
the candidate keys overlap, that is have at least
one attribute in common.
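The BCNF condition can be tested by checking, for every non-trivial FD X -> Y in the given set, whether X is a superkey (its closure contains every attribute of the relation). For a single relation schema it suffices to check the given FDs rather than all of F+. A sketch applied to the REPORT relation and its FDs from earlier, assuming the set-based FD representation used before; `closure` and `bcnf_violations` are hypothetical helper names.

```python
REPORT = {"Student#", "Course#", "CourseName", "IName", "Room#", "Marks", "Grade"}
FDS = [
    ({"Student#", "Course#"}, {"Marks"}),
    ({"Course#"}, {"CourseName"}),
    ({"Course#"}, {"IName"}),
    ({"IName"}, {"Room#"}),
    ({"Marks"}, {"Grade"}),
]

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Non-trivial FDs whose determinant is not a superkey of attrs."""
    violations = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = attrs <= closure(lhs, fds)
        if not trivial and not superkey:
            violations.append((lhs, rhs))
    return violations

for lhs, rhs in bcnf_violations(REPORT, FDS):
    print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
```

Only Student# Course# -> Marks passes, since Student# Course# determines every attribute of REPORT; the other four determinants are not candidate keys.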
Multi-valued Dependency (MVD)
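Dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
The formal definition is given as follows.
Let $R$ be a relation schema and let $\alpha \subseteq R$ and $\beta \subseteq R$. The multivalued dependency $\alpha \twoheadrightarrow \beta$ (which can be read as "$\alpha$ multidetermines $\beta$") holds on $R$ if, in any legal relation $r(R)$, for all pairs of tuples $t_1$ and $t_2$ in $r$ such that $t_1[\alpha] = t_2[\alpha]$, there exist tuples $t_3$ and $t_4$ in $r$ such that
$$
\begin{aligned}
t_1[\alpha] &= t_2[\alpha] = t_3[\alpha] = t_4[\alpha] \\
t_3[\beta] &= t_1[\beta], \qquad t_3[R - \beta] = t_2[R - \beta] \\
t_4[\beta] &= t_2[\beta], \qquad t_4[R - \beta] = t_1[R - \beta]
\end{aligned}
$$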
There exist anomalies/redundancies in relational schemas that cannot
be captured by FDs.
Example: consider the following table:
There are no (non-trivial) FDs that hold on this scheme; therefore the
scheme (Course, Set-of-teachers, Set-of-books) is in BCNF.
The CTB table contains redundant information because:
whenever $(c, t_1, b_1) \in CTB$ and $(c, t_2, b_2) \in CTB$,
then also $(c, t_1, b_2) \in CTB$
and, by symmetry, $(c, t_2, b_1) \in CTB$.
We say that a multivalued dependency (MVD)
$C \twoheadrightarrow T$ (and $C \twoheadrightarrow B$ as well)
holds on CTB.
Given a course, the set of teachers and the set of books are uniquely determined and independent of each other.
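The formal MVD condition can be checked on an instance: for every pair of tuples t1, t2 that agree on X, the tuple t3 that takes X and Y from t1 and the remaining attributes from t2 must also be present (t4 is covered when the pair is visited in the other order). A sketch, assuming rows are dicts over the attributes Course, Teacher and Book; the course "DB", teachers T1 and T2, books B1 and B2, and the helper name `mvd_holds` are hypothetical.

```python
from itertools import product

def mvd_holds(rows, X, Y, attrs):
    """Check X ->> Y on a relation instance given as a list of dicts."""
    Z = [a for a in attrs if a not in X and a not in Y]
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in X):
            # t3 agrees with t1 on X and Y, and with t2 on the rest
            t3 = {**{a: t1[a] for a in X + Y}, **{a: t2[a] for a in Z}}
            if tuple(t3[a] for a in attrs) not in present:
                return False
    return True

ATTRS = ["Course", "Teacher", "Book"]
# Every teacher of the course is paired with every book of the course.
ctb = [{"Course": "DB", "Teacher": t, "Book": b}
       for t in ["T1", "T2"] for b in ["B1", "B2"]]
print(mvd_holds(ctb, ["Course"], ["Teacher"], ATTRS))  # True

# Dropping one combination breaks the independence of teachers and books.
broken = [r for r in ctb if not (r["Teacher"] == "T2" and r["Book"] == "B1")]
print(mvd_holds(broken, ["Course"], ["Teacher"], ATTRS))  # False
```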