0% found this document useful (0 votes)
36 views

Functional Dependency (Normalization) Asad Khailany, DSC.: First Normal Form

This document discusses normalization and functional dependencies in database design. It begins by defining the first three normal forms: first normal form requires that attributes are simple and not composed of other relations; second normal form requires that non-key attributes are fully dependent on the primary key. An example student-courses database is used to illustrate how it violates these forms and suffers from insertion, update, and deletion anomalies. The database is normalized into third normal form by breaking it into separate Student, Course, and Student-Course tables to eliminate anomalies.

Uploaded by

Madhukar Reddy
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
36 views

Functional Dependency (Normalization) Asad Khailany, DSC.: First Normal Form

This document discusses normalization and functional dependencies in database design. It begins by defining the first three normal forms: first normal form requires that attributes are simple and not composed of other relations; second normal form requires that non-key attributes are fully dependent on the primary key. An example student-courses database is used to illustrate how it violates these forms and suffers from insertion, update, and deletion anomalies. The database is normalized into third normal form by breaking it into separate Student, Course, and Student-Course tables to eliminate anomalies.

Uploaded by

Madhukar Reddy
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 13

Functional Dependency (Normalization)

Asad Khailany, Dsc.


The concept of functional dependency (also known as normalization was
introduced by professor Codd in 1970 when he defined the first three normal
forms (first, second and third normal forms). Normalization is used to avoid or
eliminate the three types of anomalies (insertion, deletion and update
anomalies) which a database may suffer from. These concepts will be clarified
soon, but first let us define the first three normal forms.
First Normal Form: A relation is in first normal form if all its attributes are
simple. In other words, none of the attributes of the relation is a relation. Notice
that relation means 2-diemenatioanl table.
Example -1. Assume the following relation
Student-courses (Sid:pk, Sname, Phone, Courses-taken)
Where attribute Sid is the primary key, Sname is student name, Phone is
student's phone number and Courses-taken is a table contains course-id, course-
description, credit hours and grade for each course taken by the student. More
precise definition of table Course-taken is :
Course-taken (Course-id:pk, Course-description, Credit-hours, Grade)
According to the definition of first normal form relation Student-courses is
not in first normal form because one of its attributeCourses-taken is itself a
table and is not a simple attribute. To clarify it more assume the above tables
contain the data as shown below:
Student-courses
Sid Sname Phone Courses-taken
100 John 487 2454 St-100-courses-taken
200 Smith 671 8120 St-200-courses-taken
300 Russell 871 2356 St-300-courses-taken
St-100-Course-taken
Course-id Course-description Credit-hours Grade
IS380 Database Concepts 3 A
IS416 Unix Operating System 3 B
St-200-Course-taken
Course-id Course-description Credit-hours Grade
IS380 Database Concepts 3 B
IS416 Unix Operating System 3 B
IS420 Data Net Work 3 C
St-300-Course-taken
Course-id Course-description Credit-hours Grade
IS417 System Analysis 3 A
 
 
 
Definition of the three types of anomalies:
Insertion anomaly means that that some data can not be inserted in the
database. For example we can not add a new course to the database of example-
1,unless we insert a student who has taken that course.
Update anomaly means we have data redundancy in the database and to make
any modification we have to change all copies of the redundant data or else the
database will contain incorrect data. For example in our database we have the
Course description "Database Concepts" for IS380 appears in both St-100-
Course-taken and St-200-Course-taken tables. To change its description to
"New Database Concepts" we have to change it in all places. Indeed one of the
purposes of normalization is to eliminate data redundancy in the database.
Deletion anomaly means deleting some data cause other information to be lost.
For example if student Russell is deleted from St-100-Course-taken table we
also lose the information that we had a course call IS417 with description
System Analysis.
Thus Student-courses table suffers from all the three anomalies.
To convert the above structure to first normal form relations, all non-
simple attributes must be removed or converted to simple attribute. To do
that a new relation is created by combining each row of Student-courses with
all rows of its corresponding course table that was taken by that specific student.
Following is Student-courses table in first normal form.
Student-courses ( Sid:pk1, Sname, Phone, Course-id:pk2,
Course-description, Credit-hours, Grade)
Notice that the primary key of this table is a composite key made up of two
parts; Sid and Course-id. Note that pk1 following an attribute indicates that the
attribute is the first part of the primary key and pk2 indicates that the attribute is
the second part of the primary key.
 
Student-courses
Sid Sname Phone Course- Course-description Credit- Grade
id hours
10 John 487 IS380 Database Concepts 3 A
0 2454
10 John 487 IS416 Unix Operating 3 B
0 2454 System
20 Smith 671 IS380 Database Concepts 3 B
0 8120
20 Smith 671 IS416 Unix Operating 3 B
0 8120 System
20 Smith 671 IS420 Data Net Work 3 C
0 8120
30 Russel 871 IS417 System Analysis 3 A
0 l 2356
 
Examination of the above Student-courses relation reveals that Sid does not
uniquely identify a row (tuple) in the relation hence cannot be the primary key.
For the same reason Course-id cannot be the primary key. However the
combination of Sid andCourse-id uniquely identifies a row in Student-courses,
Therefore (Sid, Course-id) is the primary key of the above relation.
The primary key determines every attribute. For example if you know
both Sid and Course-id for any student you will be able to retrieve Sname,
Phone, Course-description, Credit-hours and Grade, because these attributes
are dependent on the primary key. Figure 1below is the graphical representation of
the functional dependency between the primary key and attributes of the above
relation.
 
Note that the attribute to the right of the arrow is functionally dependent on the
attribute in the left of the arrow. Thus the combination (Sid, Course-id) is the
determinant (that determines other attributes) and attributes Sname, Phone,
Course-description, Credit-hours and Grade are dependent attributes.
Formally speaking a determinant is an attribute or a group of attributes
determine the value of other attributes. In addition to the (Sid, Course-id) there
are two other determinants in the above Student-courses relation. These
are; Sid and Course-id attributes. Note that Sid alone determines both Sname
and Phone, and attribute Course-id alone determines both Credit-hours and
Course_description attributes.
 
Attribute Grade is fully functionally dependent on the primary key (Sid,
Course-id) because both parts of the primary keys are needed to
determine Grade. On the other hand both Sname, and Phone attributes are not
fully functionally dependent on the primary key, because only a part of the
primary key namely Sid is needed to determine both Sname and Phone. Also
attributes Credit-hours and Course-Description are not fully functionally
dependent on the primary key because only Course-idis needed to determine
their values.
The new relation Student-courses still suffers from all three anomalies for the
following reasons:
1.     The relation contains redundant data (Note Database_Concepts as
the course
description for IS380 appears in more than one place).
2.     The relation contains information about two entities Student and
course.
Following is the detail description of the anomalies that relation Student-
courses suffers from.
1.     Insertion anomaly: We cannot add a new course such as IS247 with
course description programming techniques to the database unless we add
a student who to take the course.
2.     Update anomaly: If we change the course description for IS380 from
Database Concepts to New_Database_Concepts we have to make changes
in more than one place or else the database will be inconsistent. In other
words in some places the course description will be
New_Database_Concepts and in any place were we forgot to make the
changes the description still will be Database_Concepts.
3.     Deletion anomaly: If student Russell is deleted from the database we
also loose information that we had on course IS417 with description
System_Analysis.
The above discussion indicates that having a single table Student-courses for
our database causing problems (anomalies). Therefore we break the table to
smaller table to get a higher normal form relation. Before doing that let us
define the second normal form.
Second normal relation: A first normal form relation is in second normal form
if all its non-primary attributes are fully functionally dependent on the primary
key.
Note that primary attributes are those attributes, which are parts of the primary
key, and non-primary attributes do not participate in the primary key.
In Student-courses relation both Sid and Course-id are primary attributes
because they are components of the primary key. However attributes Sname,
Phone, Course-description, Credit-hours and Grade all are non primary
attributes because non of them is a component of the primary key.
To convert Student-courses to second normal relations we have to make all
non-primary attributes to be fully functionally dependent on the primary key. To
do that we need to project (that is we break it down to two or more
relations) Student-coursestable into two or more tables. However projections
may cause problems. To avoid such problems it is important to keep attributes,
which are dependent on each other in the same table, when a relation is
projected to smaller relations. Following this principle and examination of
Figure-1 indicate that we should divide Student-courses relation into following
three relations:
PROJECT Student-courses ON (Sid, Sname, Phone) creates a table call
it Student. The relation Student will be Student (Sid:pk, Sname, Phone) and
PROJECT Student-courses ON (Sid, Course-id, Grade) creates a table call
it Student-grade. The relation Student-grade will be
Student-grade (Sid:pk1:fk:Student, Course-id::pk2:fk:Courses, Grade) and
Projects Student-courses ON (Course-id, Course-Description, Credit-hours)
create a table call it Courses. Following are these three relations and their
contents:
 
Student (Sid:pk, Sname, Phone)
Sid Sname Phone
100 John 487 2454
200 Smith 671 8120
300 Russell 871 2356
 
Courses (Course-id::pk, Course-Description)
Course-id Course-description Credit-hours
IS380 Database Concepts 3
IS416 Unix Operating System 3
IS420 Data Net Work 3
IS417 System Analysis 3
 
 
 
Student-grade (Sid:pk1:fk:Student, Course-id::pk2:fk:Courses, Grade)
 
Sid Course-id Grade
100 IS380 A
100 IS416 B
200 IS380 B
200 IS416 B
200 IS420 C
300 IS417 A
 
All these three relations are in second normal form. Examination of these
relations shows that we have eliminated the redundancy in the database. Now
relation Student contains information only related to the entity student,
relation Courses contains information related to entity Courses only, and the
relation Student-grade contains information related to the relationship between
these two entity.
Further these three sets are free from all anomalies. Let us clarify this in more
detail.
Insertion anomaly: Now a new Course with course-id IS247 and Course-
description can be inserted to the table Course. Equally we can add any new
students to the database by adding their id, name and phone to Student table.
Therefore our database, which made up of these three tables does not suffer
from insertion anomaly.
Update anomaly: Since redundancy of the data was eliminated no update
anomaly can occur. To change the course-description for IS380 only one change
is needed in table Courses.
Deletion anomaly: the deletion of student Russell from the database is
achieved by deleting Russell's records from both Studentand Student-
grade relations and this does not have any side effect because the course IS417
untouched in the table Courses.
Third Normal Form: A second normal form relation is in third normal form if
all non-primary attributes (that is attributes that are not parts of the primary key
or of any candidate key) have non-transitivity dependency on the primary key.
Assume the relation:
 
STUDENT (Sid: pk, Activity, fee)
Further Activity ------------> fee that is the Activity determine the fee
 
Sid Activity Fee
100 Swimming 100
200 Tennis 100
300 Golf 300
400 Swimming 100
 
Table STUDENT is in first normal form because all its attributes are simple.
Also STUDENT is in second normal form because all its non-primary attributes
are fully functionally dependent on the primary key (Sid). Notice that a first
normal relation with non-composite (that is simple) primary key automatically
will be in second normal form because all its non-primary attributes will be
fully functionally dependent on the primary key.
Table STUDENT suffers from all 3 anomalies; a new student can not be added
to the database unless he/she takes an activity and no activity can be inserted
into the database unless we get a student to take that activity. There is
redundancy in the table (see Swimming), therefore to change the fee for
Swimming we must make changes in more than one place and that will cause
update anomaly problem. If student 300 is deleted from the table we also loose
the fact that we had Golf activity with its fee to be 300. To overcome these
anomalies STUDENT table should be converted to smaller tables. Consider the
following three projection of the STUDENT relation:
PROJECT STUDENT on [Sid, Activity] and we get a relation name it
STUD-AVT (Sid:pk, Activity) with the following data :
 
STUD_ACT
 
Sid Activity
100 Swimming
200 Tennis
300 Golf
400 Swimming
 
PROJECT STUDENT on [Activity, Fee] and we get a relation name AVT-Fee
(Activity:pk, Fee) with the following data :
 
AVT-Fee
 
Activity Fee
Swimming 100
Tennis 100
Golf 300
Swimming 100
 
PROJECT STUDENT on [Sid, Fee] and we get a relation name
Sid-Fee (Sid:pk, Fee) with the following data :
 
Sid-Fee
 
Sid Fee
100 100
200 100
300 300
400 100
 
The question is which pairs of these projections should we choose? The answer
to that is to choose the pair STUD-AVT and AVT-Fee because the join of these
two projections produces the original STUDENT table. Such projections are
called non-loss projections. Therefore the join of STUD-AVT and AVT-Fee on
the common attribute Activity recreate the original STUDENT table.On the
other hand as shown below the join of projections Sid-Fee and AVT-Fee on
their common attribute Sid generates erroneous data that were not in the original
STUDENT table and such projections are called loss projections. Following is
the join of projections Sid-Fee and AVT-Fee on their common attribute Sid
 
Sid Activity Fee
100 Swimming 100
100 Tennis 100
200 Tennis 100
200 Swimming 100
300 Golf 300
400 Swimming 100
400 Tennis 100
The three rows marked in red color were not in the original STUDENT table.
Thus we have an erroneous data in the database.
Both projections STUD-AVT and AVT-Fee are in third normal form and they
do not suffer from any anomalies.
 
Boyce Codd normal (BOC): A relation is in BOC form if every determinant is a
candidate key. This is an improved form of third normal form.
 
Fourth Normal Form: A Boyce Codd normal form relation is in fourth normal
form if there is no multi value dependency in the relation or there are multi
value dependency but the attributes, which are multi value dependent on a
specific attribute, are dependent between themselves. This is best discussed
through mathematical notation. Assume the following relation
 
 R(a:pk1, b:pk2, c:pk3)
 
Recall that a relation is in BOC normal form if all its determinant are
candidate keys, in other words each determinant can be used as a
primary key. Because relation R has only one determinant (a, b, c),,
which is the composite primary, key and since the primary is a
candidate key therefore R is in BOC normal form.
 
Now R may or may not be in fourth normal form.
 
1. If R contains no multi value dependency then R will be in Fourth
normal form.
 
2. Assume R has the following two-multi value dependencies:
 
a ------->> b        and              a ---------->> c
 
In this case R will be in the fourth normal form if b and c
dependent on each other. However if b andc are
independent of each other then R is not in fourth normal
form and the relation has to be projected to following two
non-loss projections. These non-loss projections will be in
fourth normal form.
 
 
Example:
 
Case 1:
Assume the following relation:
Employee (Eid:pk1, Language:pk2, Skill:pk3)
 
 No multi value dependency, therefore R is in fourth normal form.
 
case 2:
Assume the following relation with multi-value dependency:
 
Employee (Eid:pk1, Languages:pk2, Skills:pk3)
Eid ---->> Languages               Eid ------>> Skills
Languages and Skills are dependent.
This says an employee speak several languages and has several skills.
However for each skill  a specific language is used when that skill is
practiced.
 
Eid Language Skill
100 English Teaching
100 Kurdish Politic
100 French cooking
200 English cooking
200 Arabic Singing
 
Thus employee 100 when he/she teaches speaks English but when he cooks
speaks French. This relation is in fourth normal form and does not
suffer from any anomalies.
case 3:
Assume the following relation with multi-value dependency:
 
Employee (Eid:pk1, Languages:pk2, Skills:pk3)
Eid ---->> Languages               Eid ------>> Skills
Languages and Skills are independent.
 
Eid Language Skill
100 English Teaching
100 Kurdish Politic
100 English politic
100 Kurdish Teaching
200 Arabic Singing
     
 
 
This relation is not in fourth normal form and suffers from all three types
of anomalies.
Insertion anomaly: To insert row (200 English Cooking) we have to insert
two extra rows (200 Arabic cooking), and (200 English Singing) otherwise
the database will be inconsistent. Note the table will be as follow:
 
Eid Language Skill
100 English Teaching
100 Kurdish Politic
100 English politic
100 Kurdish Teaching
200 Arabic Singing
200 English Cooking
200 Arabic Cooking
200 English Singing
 
Deletion anomaly: If employee 100 discontinue politic skill we have to
delete two rows
(100 Kurdish Politic),  and (100 English Politic) otherwise the database will
be inconsistent.
 
Update anomaly: If employee 200 changes his skill from singing to dancing
we have to make changes in more than one place
 
The relation is projected to the following two non-loss projections which
are in forth normal form
 
Emplyee_Language(Eid:pk1, Languages:pk2)
 
Eid Language
100 English
100 Kurdish
200 Arabic
 
Emplyee_Language(Eid:pk1, Skills:pk2)
 
Eid Skill
100 Teaching
100 Politic
200 Singing
 
 

You might also like