Functional Dependency
Student-courses
Sid | Sname   | Phone    | Courses-taken
100 | John    | 487 2454 | St-100-courses-taken
200 | Smith   | 671 8120 | St-200-courses-taken
300 | Russell | 871 2356 | St-300-courses-taken

St-100-Course-taken
Course-id | Course-description    | Credit-hours | Grade
IS380     | Database Concepts     | 3            | A
IS416     | Unix Operating System | 3            | B

St-200-Course-taken
Course-id | Course-description    | Credit-hours | Grade
IS380     | Database Concepts     | 3            | B
IS416     | Unix Operating System | 3            | B
IS420     | Data Net Work         | 3            | C

St-300-Course-taken
Course-id | Course-description | Credit-hours | Grade
IS417     | System Analysis    | 3            | A
To convert the above structure to first normal form relations, all non-simple attributes must be removed or converted to simple attributes. To do that, a new relation is created by combining each row of Student-courses with all rows of its corresponding course table, that is, the courses taken by that specific student. Following is the Student-courses table in first normal form.
Student-courses (Sid:pk1, Sname, Phone, Course-id:pk2, Course-description, Credit-hours, Grade)
Notice that the primary key of this table is a composite key made up of two parts: Sid and Course-id. Here pk1 following an attribute indicates that the attribute is the first part of the primary key, and pk2 indicates that the attribute is the second part of the primary key.
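The flattening step described above can be sketched in Python. This is a minimal illustration, assuming the unnormalized structure is held in memory as a list of dictionaries whose Courses-taken entry is a nested list of course tuples; the names `students` and `to_first_normal_form` are hypothetical and not part of the original text.

```python
# Hypothetical in-memory form of the unnormalized Student-courses structure:
# each student row carries a nested list of (Course-id, Course-description,
# Credit-hours, Grade) tuples, i.e. a non-simple attribute.
students = [
    {"Sid": 100, "Sname": "John", "Phone": "487 2454",
     "Courses-taken": [("IS380", "Database Concepts", 3, "A"),
                       ("IS416", "Unix Operating System", 3, "B")]},
    {"Sid": 200, "Sname": "Smith", "Phone": "671 8120",
     "Courses-taken": [("IS380", "Database Concepts", 3, "B"),
                       ("IS416", "Unix Operating System", 3, "B"),
                       ("IS420", "Data Net Work", 3, "C")]},
    {"Sid": 300, "Sname": "Russell", "Phone": "871 2356",
     "Courses-taken": [("IS417", "System Analysis", 3, "A")]},
]


def to_first_normal_form(nested):
    """Combine each student row with every row of its course table so that
    every attribute of the result holds a single, simple value."""
    flat = []
    for student in nested:
        for course_id, description, hours, grade in student["Courses-taken"]:
            flat.append({
                "Sid": student["Sid"], "Sname": student["Sname"],
                "Phone": student["Phone"], "Course-id": course_id,
                "Course-description": description,
                "Credit-hours": hours, "Grade": grade,
            })
    return flat


student_courses = to_first_normal_form(students)
for row in student_courses:
    print(row["Sid"], row["Sname"], row["Course-id"], row["Grade"])
```

Each student's Sid, Sname and Phone are repeated once per course taken, which is exactly the redundancy discussed next.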
Student-courses
Sid | Sname   | Phone    | Course-id | Course-description    | Credit-hours | Grade
100 | John    | 487 2454 | IS380     | Database Concepts     | 3            | A
100 | John    | 487 2454 | IS416     | Unix Operating System | 3            | B
200 | Smith   | 671 8120 | IS380     | Database Concepts     | 3            | B
200 | Smith   | 671 8120 | IS416     | Unix Operating System | 3            | B
200 | Smith   | 671 8120 | IS420     | Data Net Work         | 3            | C
300 | Russell | 871 2356 | IS417     | System Analysis       | 3            | A
Examination of the above Student-courses relation reveals that Sid does not uniquely identify a row (tuple) in the relation and hence cannot be the primary key. For the same reason, Course-id cannot be the primary key. However, the combination of Sid and Course-id uniquely identifies a row in Student-courses. Therefore (Sid, Course-id) is the primary key of the above relation.
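The argument above can be checked against the sample data by testing whether a column, or a combination of columns, ever repeats. The sketch below assumes the rows are plain Python dictionaries and uses a hypothetical helper `is_unique`. Note that a table instance can only rule a candidate out; a primary key is a design constraint, not something one set of rows can prove.

```python
# (Sid, Course-id) pairs from the 1NF Student-courses relation above.
rows = [
    {"Sid": 100, "Course-id": "IS380"},
    {"Sid": 100, "Course-id": "IS416"},
    {"Sid": 200, "Course-id": "IS380"},
    {"Sid": 200, "Course-id": "IS416"},
    {"Sid": 200, "Course-id": "IS420"},
    {"Sid": 300, "Course-id": "IS417"},
]


def is_unique(rows, attrs):
    """Return True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


print(is_unique(rows, ["Sid"]))               # False: Sid 100 repeats
print(is_unique(rows, ["Course-id"]))         # False: IS380 repeats
print(is_unique(rows, ["Sid", "Course-id"]))  # True
```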
The primary key determines every attribute. For example, if you know both Sid and Course-id for any student, you will be able to retrieve Sname, Phone, Course-description, Credit-hours and Grade, because these attributes are dependent on the primary key. Figure 1 below is the graphical representation of the functional dependency between the primary key and the attributes of the above relation.
Note that the attribute to the right of the arrow is functionally dependent on the attribute to the left of the arrow. Thus the combination (Sid, Course-id) is the determinant (it determines the other attributes), and attributes Sname, Phone, Course-description, Credit-hours and Grade are dependent attributes.
Formally speaking, a determinant is an attribute or a group of attributes that determines the value of other attributes. In addition to (Sid, Course-id), there are two other determinants in the above Student-courses relation: the Sid and Course-id attributes. Note that Sid alone determines both Sname and Phone, and attribute Course-id alone determines both Credit-hours and Course-description.
Attribute Grade is fully functionally dependent on the primary key (Sid, Course-id) because both parts of the primary key are needed to determine Grade. On the other hand, Sname and Phone are not fully functionally dependent on the primary key, because only a part of the primary key, namely Sid, is needed to determine both Sname and Phone. Likewise, attributes Credit-hours and Course-description are not fully functionally dependent on the primary key, because only Course-id is needed to determine their values.
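The full versus partial distinction can be checked mechanically once the functional dependencies are written down. The sketch below works under stated assumptions: the FDs are declared by hand exactly as the text gives them, and the helpers `fd_holds` and `classify` are hypothetical names. The FDs are declared rather than inferred because this small sample can mislead: every course has 3 credit hours, so Sid -> Credit-hours happens to hold in the data even though it is not a real dependency.

```python
COLUMNS = ("Sid", "Sname", "Phone", "Course-id",
           "Course-description", "Credit-hours", "Grade")
DATA = [
    (100, "John", "487 2454", "IS380", "Database Concepts", 3, "A"),
    (100, "John", "487 2454", "IS416", "Unix Operating System", 3, "B"),
    (200, "Smith", "671 8120", "IS380", "Database Concepts", 3, "B"),
    (200, "Smith", "671 8120", "IS416", "Unix Operating System", 3, "B"),
    (200, "Smith", "671 8120", "IS420", "Data Net Work", 3, "C"),
    (300, "Russell", "871 2356", "IS417", "System Analysis", 3, "A"),
]
ROWS = [dict(zip(COLUMNS, t)) for t in DATA]
PK = ("Sid", "Course-id")

# FDs as stated in the text: determinant -> dependent attributes.
FDS = [
    (("Sid",), ("Sname", "Phone")),
    (("Course-id",), ("Course-description", "Credit-hours")),
    (("Sid", "Course-id"), ("Grade",)),
]


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def classify(lhs, pk):
    """Full if the whole PK is needed, partial if a proper part suffices."""
    if set(lhs) == set(pk):
        return "full"
    if set(lhs) < set(pk):
        return "partial"
    return "not on the PK"


for lhs, rhs in FDS:
    assert fd_holds(ROWS, lhs, rhs)  # the sample must not contradict an FD
    for attr in rhs:
        print(f"{attr}: {classify(lhs, PK)} dependency on {', '.join(lhs)}")

# Grade needs both parts of the key: neither part alone determines it.
print(fd_holds(ROWS, ("Sid",), ("Grade",)),
      fd_holds(ROWS, ("Course-id",), ("Grade",)))  # False False
```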
The new relation Student-courses still suffers from all three anomalies for the following reasons:
1. The relation contains redundant data (note that Database Concepts, the course description for IS380, appears in more than one place).
2. The relation contains information about two entities, Student and Course.
Following is a detailed description of the anomalies that relation Student-courses suffers from.
1. Insertion anomaly: We cannot add a new course, such as IS247 with course description Programming Techniques, to the database unless we also add a student who takes the course.
2. Update anomaly: If we change the course description for IS380 from Database Concepts to New_Database_Concepts, we have to make changes in more than one place, or else the database will be inconsistent. In other words, in some places the course description will be New_Database_Concepts, and in any place where we forgot to make the change, the description will still be Database Concepts.
3. Deletion anomaly: If student Russell is deleted from the database, we also lose the information we had on course IS417 with description System Analysis.
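The update and deletion anomalies can be reproduced on a slice of the 1NF relation. This is a hypothetical sketch: the rows are kept as mutable lists of (Sid, Course-id, Course-description), the helper `descriptions_of` is an illustrative name, and the careless update deliberately changes only one of the rows that mention IS380.

```python
# A slice of the 1NF Student-courses relation: [Sid, Course-id, Course-description].
rows = [
    [100, "IS380", "Database Concepts"],
    [100, "IS416", "Unix Operating System"],
    [200, "IS380", "Database Concepts"],
    [200, "IS416", "Unix Operating System"],
    [200, "IS420", "Data Net Work"],
    [300, "IS417", "System Analysis"],
]


def descriptions_of(rows, course_id):
    """All distinct descriptions stored for one course."""
    return {desc for _, cid, desc in rows if cid == course_id}


# Update anomaly: renaming IS380 touches every row that repeats it.
to_change = [r for r in rows if r[1] == "IS380"]
print(len(to_change))  # 2
to_change[0][2] = "New_Database_Concepts"  # forget the second row
print(sorted(descriptions_of(rows, "IS380")))
# ['Database Concepts', 'New_Database_Concepts'] -> inconsistent

# Deletion anomaly: removing Russell (Sid 300) removes the only IS417 row.
remaining = [r for r in rows if r[0] != 300]
print(any(r[1] == "IS417" for r in remaining))  # False
```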
The above discussion indicates that having a single table Student-courses for our database causes problems (anomalies). Therefore we break the table into smaller tables to get a higher normal form relation. Before doing that, let us define the second normal form.
Second normal form: A first normal form relation is in second normal form if all its non-primary attributes are fully functionally dependent on the primary key. Note that primary attributes are those attributes that are part of the primary key, and non-primary attributes do not participate in the primary key. In the Student-courses relation, both Sid and Course-id are primary attributes because they are components of the primary key. However, attributes Sname, Phone, Course-description, Credit-hours and Grade are all non-primary attributes because none of them is a component of the primary key.
To convert Student-courses to second normal form relations, we have to make all non-primary attributes fully functionally dependent on the primary key. To do that, we need to project the Student-courses table into two or more tables (that is, break it down into smaller relations). However, projections may cause problems. To avoid such problems, it is important to keep attributes that are dependent on each other in the same table when a relation is projected into smaller relations. Following this principle, examination of Figure 1 indicates that we should divide the Student-courses relation into the following three relations:
PROJECT Student-courses ON (Sid, Sname, Phone) creates a table; call it Student. The relation Student will be Student (Sid:pk, Sname, Phone).
PROJECT Student-courses ON (Sid, Course-id, Grade) creates a table; call it Student-grade. The relation Student-grade will be Student-grade (Sid:pk1:fk:Student, Course-id:pk2:fk:Courses, Grade).
PROJECT Student-courses ON (Course-id, Course-description, Credit-hours) creates a table; call it Courses. The relation Courses will be Courses (Course-id:pk, Course-description, Credit-hours).
Following are these three relations and their contents:
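The three PROJECT operations can be sketched as follows. This is an illustration, assuming rows are Python dictionaries and using a hypothetical `project` helper; as in relational algebra, projection drops duplicate tuples, which is why Student ends up with three rows and Courses with four.

```python
COLUMNS = ("Sid", "Sname", "Phone", "Course-id",
           "Course-description", "Credit-hours", "Grade")
DATA = [
    (100, "John", "487 2454", "IS380", "Database Concepts", 3, "A"),
    (100, "John", "487 2454", "IS416", "Unix Operating System", 3, "B"),
    (200, "Smith", "671 8120", "IS380", "Database Concepts", 3, "B"),
    (200, "Smith", "671 8120", "IS416", "Unix Operating System", 3, "B"),
    (200, "Smith", "671 8120", "IS420", "Data Net Work", 3, "C"),
    (300, "Russell", "871 2356", "IS417", "System Analysis", 3, "A"),
]
ROWS = [dict(zip(COLUMNS, t)) for t in DATA]


def project(rows, attrs):
    """Relational PROJECT: keep only attrs and drop duplicate tuples,
    preserving the order in which tuples are first seen."""
    result, seen = [], set()
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


student = project(ROWS, ("Sid", "Sname", "Phone"))
student_grade = project(ROWS, ("Sid", "Course-id", "Grade"))
courses = project(ROWS, ("Course-id", "Course-description", "Credit-hours"))
print(len(student), len(student_grade), len(courses))  # 3 6 4
```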
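Student
Sid | Sname   | Phone
100 | John    | 487 2454
200 | Smith   | 671 8120
300 | Russell | 871 2356

Courses
Course-id | Course-description    | Credit-hours
IS380     | Database Concepts     | 3
IS416     | Unix Operating System | 3
IS420     | Data Net Work         | 3
IS417     | System Analysis       | 3

Student-grade
Sid | Course-id | Grade
100 | IS380     | A
100 | IS416     | B
200 | IS380     | B
200 | IS416     | B
200 | IS420     | C
300 | IS417     | A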
All three of these relations are in second normal form. Examination of these relations shows that we have eliminated the redundancy in the database. Now relation Student contains information related only to the entity Student, relation Courses contains information related only to the entity Courses, and relation Student-grade contains information related to the relationship between these two entities.
Furthermore, these three relations are free from the three anomalies described above. Let us clarify this in more detail.
Insertion anomaly: Now a new course with Course-id IS247 and its Course-description can be inserted into the table Courses. Equally, we can add any new student to the database by adding their id, name and phone to the Student table. Therefore our database, which is made up of these three tables, does not suffer from the insertion anomaly.
Update anomaly: Since the redundancy of the data was eliminated, no update anomaly can occur. To change the Course-description for IS380, only one change is needed in table Courses.
Deletion anomaly: The deletion of student Russell from the database is achieved by deleting Russell's records from both the Student and Student-grade relations, and this does not have any side effect because course IS417 remains untouched in the table Courses.
Adrienne Watt
A functional dependency is a relationship between two attributes, typically between the PK and other non-key attributes within the table. For any relation R, attribute Y is functionally dependent on attribute X (usually the PK) if, for every valid instance of X, that value of X uniquely determines the value of Y.
X -> Y
The left-hand side of the FD is called the determinant, and the right-hand side is the dependent.
Examples:
SIN -> Name, Address, Birthdate
SIN determines Name, Address and Birthdate. Given SIN, we can determine any of the other attributes within the table.
SIN, Course -> DateCompleted
SIN and Course determine DateCompleted. This must also work for a composite PK.
ISBN -> Title
ISBN determines Title.
Since the values of A are unique, it follows from the FD definition that:
A -> B, A -> C, A -> D, A -> E
B -> E, C -> E, D -> E
A -> E, B -> E
AB -> E
Other observations:
Looking at the data makes it much clearer which attributes are dependents and which are determinants.
Inference Rules
Armstrong's axioms are a set of axioms (or, more precisely, inference rules) used to infer all the functional dependencies on a relational database. They were developed by William W. Armstrong.
Let R(U) be a relation scheme over the set of attributes U. We will use the letters X, Y, Z to represent any subset of U and, for short, denote the union of two sets of attributes X and Y by XY instead of the usual X ∪ Y.
Source: http://en.wikipedia.org/wiki/Armstrong%27s_axioms
Axiom of reflexivity
If Y is a subset of X, then X -> Y.
Axiom of augmentation
If X -> Y, then XZ -> YZ for any Z.
StudentNo, Course -> studentName, address, city, prov, pc, grade, dateCompleted
This situation is not desirable, because every non-key attribute has to be fully dependent on the PK. In this situation the student information is only partially dependent on the PK: it depends on StudentNo alone.
To fix this problem, we need to break down the table into two as follows:
StudentNo, Course -> grade, dateCompleted
StudentNo -> studentName, address, city, prov, pc
Axiom of transitivity
If X -> Y and Y -> Z, then X -> Z.
StudentNo -> studentName, address, city, prov, pc, ProgramID, ProgramName
This situation is not desirable, because a non-key attribute (ProgramName) depends on another non-key attribute (ProgramID).
To fix this problem, we need to break this table into two: one to hold information about the student and the other to hold information about the program. However, we still need to leave ProgramID as an FK in the student table, so that we can determine which program the student is enrolled in.
StudentNo -> studentName, address, city, prov, pc, ProgramID
ProgramID -> ProgramName
Additional rules
Union
If X -> Y and X -> Z, then X -> YZ.
Decomposition
If X -> YZ, then X -> Y and X -> Z.
Dependency Diagram
A dependency diagram illustrates the various dependencies that may exist in a non-normalized table. The following dependencies are identified:
ProjectNo and EmpNo combined form the PK.
Partial Dependencies:
ProjectNo -> ProjName
EmpNo -> EmpName, DeptNo, HrsWork
Transitive Dependency:
DeptNo -> DeptName
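The classification in the dependency diagram can be expressed as a small rule over the declared FDs. In this sketch (the helper `classify` is hypothetical), a determinant that is a proper subset of the PK gives a partial dependency, and a determinant that shares no attribute with the PK gives a transitive dependency through a non-key attribute. This simplification assumes the table has no candidate key other than the PK.

```python
# Dependency diagram from the text: the PK is (ProjectNo, EmpNo).
PK = frozenset({"ProjectNo", "EmpNo"})
FDS = [
    (frozenset({"ProjectNo"}), {"ProjName"}),
    (frozenset({"EmpNo"}), {"EmpName", "DeptNo", "HrsWork"}),
    (frozenset({"DeptNo"}), {"DeptName"}),
]


def classify(lhs, pk):
    """Label an FD by how its determinant relates to the primary key."""
    if lhs == pk:
        return "full"
    if lhs < pk:        # a proper part of the PK determines the attributes
        return "partial"
    if not lhs & pk:    # a non-key attribute determines other attributes
        return "transitive"
    return "other"


for lhs, rhs in FDS:
    print(", ".join(sorted(lhs)), "->", ", ".join(sorted(rhs)),
          ":", classify(lhs, PK))
```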
Objectives of Normalization
Develop a good description of the data, its relationships and constraints
Produce a stable set of relations that
Normal Forms
Each normal form is contained within the previous form, and each has stricter rules than the previous form.
Redundancy
Dependencies between attributes within a relation cause redundancy
Ex. All addresses in the same town have the same zip code
SSN  | Name  | Town       | Zip
1234 | Joe   | Huntingdon | 16652
2345 | Mary  | Huntingdon | 16652
3456 | Tom   | Huntingdon | 16652
5948 | Harry | Alexandria | 16603
A person entity with multiple hobbies yields multiple rows in table Person
Hence, the association between Name and Address for the same person is stored redundantly
SSN is the key of the entity set, but (SSN, Hobbies) is the key of the corresponding relation below
The relation Person can't describe people without hobbies
but more important is the replication of what would be the key value
SSN  | Name | Address  | Hobbies
1111 | Joe  | 123 Main | hiking
1111 | Joe  | 123 Main | biking
2222 | Mary | 321 Elm  | lacrosse
Anomalies
An anomaly is an inconsistent, incomplete, or contradictory state of the database
Insertion anomaly: the user is unable to insert a new record when it should be possible to do so, because not all other information is available.
Deletion anomaly: when a record is deleted, other information that is tied to it is also deleted.
Update anomaly: a record is updated, but other appearances of the same items are not updated.
Decomposition
Solution: use two relations to store Person information
Person1 (SSN, Name, Address)
Hobbies (SSN, Hobby)
The decomposition is more general: people without hobbies can now be described
No update anomalies:
Name and address stored once
A hobby can be separately supplied or deleted
Decomposition is the process of breaking a relation into two or more relations to eliminate the
redundancies and corresponding anomalies.
Normalization Theory
If nulls are likely (non-applicable) then consider decomposition of the relation into two or more
relations that hold only the non-null valued tuples.
Too much decomposition of relations into smaller ones may also lose information or generate
erroneous information
Be sure that relations can be logically joined using natural join and the result doesn't
generate relationships that don't exist
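The natural-join check can be tried on the Person example. The sketch below uses a hypothetical `natural_join` helper that joins on the one shared attribute, SSN. It projects Person onto Person1 (SSN, Name, Address) and Hobbies (SSN, Hobby), then confirms that joining them back yields exactly the original rows, so no spurious relationships appear for this data.

```python
# Original Person relation as a set of (SSN, Name, Address, Hobby) tuples.
person = {
    (1111, "Joe", "123 Main", "hiking"),
    (1111, "Joe", "123 Main", "biking"),
    (2222, "Mary", "321 Elm", "lacrosse"),
}

# Decomposition: projections onto Person1 and Hobbies (sets drop duplicates).
person1 = {(ssn, name, address) for ssn, name, address, _ in person}
hobbies = {(ssn, hobby) for ssn, _, _, hobby in person}


def natural_join(p1, h):
    """Join Person1 and Hobbies on their only common attribute, SSN."""
    return {(ssn, name, address, hobby)
            for ssn, name, address in p1
            for ssn2, hobby in h
            if ssn == ssn2}


print(len(person1), len(hobbies))                # 2 3
print(natural_join(person1, hobbies) == person)  # True
```

Name and address are now stored once per person, and the join recovers every original row without adding any.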
Functional Dependencies
FDs are constraints on well-formed relations and represent a formalism on the infrastructure of a relation.
Definition: A functional dependency (FD) on a relation schema R is a constraint X -> Y, where X and Y are subsets of attributes of R.
Definition: an FD is a relationship between an attribute "Y" and a determinant (1 or more other
attributes) "X" such that for a given value of a determinant the value of the attribute is uniquely
defined.
X is a determinant
X determines Y
Y is functionally dependent on X
X -> Y
X -> Y is trivial if Y ⊆ X
Examples of FDs:
ZipCode -> AddressCity
ArtistName -> BirthYear
Author, Title -> PublDate
The FDs
{A} -> {A}
{A,B} -> {A}
{A,B} -> {B}
{A,B} -> {A,B}
are all trivial FDs and will not contribute to the evaluation of normalization.
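The triviality test is a plain subset check. A minimal sketch, with `is_trivial` as a hypothetical helper name and attribute sets modelled as Python sets:

```python
def is_trivial(lhs, rhs):
    """X -> Y is trivial exactly when Y is a subset of X."""
    return set(rhs) <= set(lhs)


trivial_examples = [
    ({"A"}, {"A"}),
    ({"A", "B"}, {"A"}),
    ({"A", "B"}, {"B"}),
    ({"A", "B"}, {"A", "B"}),
]
print(all(is_trivial(x, y) for x, y in trivial_examples))  # True
print(is_trivial({"ArtistName"}, {"BirthYear"}))           # False
```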
FD Axioms
Understanding: Functional dependencies are recognized by analysis of the real world; there is no automation or algorithm for this. Finding or recognizing them is the database designer's task.
FD manipulations:
Axiom
Example
SSN, Name -> SSN
Decomposition or
Projectivity*
Pseudotransitivity*
(NOTE)
Closure
Find all FDs for attributes a in a relation R
a+ denotes the set of attributes that are functionally determined by a
If attribute(s) a is/are a superkey of R, then a+ should be the whole relation R. This is our goal. Any attribute in a relation that is not part of the closure indicates a problem with the design.
Algorithm for Closure
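The notes name an algorithm for closure without spelling it out. A common textbook version is the iterative fixpoint below; treat it as a sketch of that standard algorithm, not necessarily the exact one the original notes presented. It reuses the Student-courses FDs from earlier in the document, and the function name `closure` is hypothetical.

```python
def closure(attrs, fds):
    """Return attrs+, the set of attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs of attribute sets. Any FD whose
    left-hand side is already inside the result adds its right-hand side;
    this repeats until a full pass adds nothing new.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Student-courses and its FDs, as given earlier in the text.
R = {"Sid", "Sname", "Phone", "Course-id",
     "Course-description", "Credit-hours", "Grade"}
FDS = [
    ({"Sid"}, {"Sname", "Phone"}),
    ({"Course-id"}, {"Course-description", "Credit-hours"}),
    ({"Sid", "Course-id"}, {"Grade"}),
]

print(sorted(closure({"Sid"}, FDS)))            # ['Phone', 'Sid', 'Sname']
print(closure({"Sid", "Course-id"}, FDS) == R)  # True: a superkey
```

Because {Sid}+ falls short of R while {Sid, Course-id}+ covers it, (Sid, Course-id) is a superkey and Sid alone is not, which matches the earlier primary-key argument.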
Does anyone have any idea where I should look? I am having difficulty differentiating whether an FD is in 1NF, 2NF or 3NF.
I've been reading Wikipedia and used Google search to find good research, but
can't find any that explains it in simple terms.
Maybe you all can share on how you learned FD's and normalization during
your life as well.
edited Feb 3 '15 at 10:37 asked Nov 16 '10 at 21:57
5 Answers
Functional Dependencies
Functional Dependencies and Normalization for Relational Databases
The Relational Data Model: Functional-Dependency Theory
The following are academic papers. Heavier reading but well worth the effort.
If you are seriously interested in this subject I suggest you put out the cash for a
good book on the subject of Relational Database Design. For example: An
Introduction to Database Systems by C.J. Date
edited Feb 4 '11 at 9:31 answered Nov 18 '10 at 17:19
NealB
Thank you for all the explanations, I am just trying to get a full understanding of this
for my exam aherlambang Nov 18 '10 at 18:03
4NF involves multi-valued dependencies (MVDs) & 5NF involves join dependencies
(JDs). But a binary JD corresponds to a pair of MVDs (which come in pairs anyway).
So "4th & 5th NFs involve" JDs (not MVDs). philipxy Jul 27 '15 at 2:42
sqlvogel
do you have any free resources that you can point to me? aherlambang Nov 17 '10 at 1:22
We can understand functional dependency this way: assume we have two attributes, and one attribute is totally dependent on the other. This is called functional dependency.
Take a real-life example. We know that everybody has a social security number associated with their name. Say Frank is a person and we want to know his social security number: the database will be unable to help with this, because there may be many people named Frank. But we can determine the name of a person from their social security number, so the name of the person is fully functionally dependent on the social security number.
answered Jun 21 '12 at 6:29
SBTec
SSN         | Name         | Date of birth | Address     | Phone number
-----------------------------------------------------------------------
123-98-1234 | Cindy Cry    | 15-05-1983    | Los Angeles | 123-4567891
121-45-6145 | John O'Neill | 30-01-1980    | Paris       | 568-9742562
658-78-2369 | John Lannoy  | 30-01-1980    | Dallas      | 963-2587413
Here, the value in the column SSN (Social Security Number) determines the values in
columns name, date of birth, address and phone number. This means that if we had two
rows with the same value in the SSN column, then values in columns name, date of
birth, address and phone number would be equal. A person with SSN 123-98-1234 is
always called Cindy Cry, is born on 15-05-1983, and so on. A situation like this is
called functional dependency.
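The "same SSN implies same values" reading translates directly into a check over the rows. In this sketch the helper `fd_holds` is a hypothetical name and the rows are the three from the table above. A table instance can only refute an FD, never prove it, which is why FDs ultimately come from analysing the real world.

```python
ROWS = [
    {"SSN": "123-98-1234", "Name": "Cindy Cry", "Date of birth": "15-05-1983",
     "Address": "Los Angeles", "Phone number": "123-4567891"},
    {"SSN": "121-45-6145", "Name": "John O'Neill", "Date of birth": "30-01-1980",
     "Address": "Paris", "Phone number": "568-9742562"},
    {"SSN": "658-78-2369", "Name": "John Lannoy", "Date of birth": "30-01-1980",
     "Address": "Dallas", "Phone number": "963-2587413"},
]


def fd_holds(rows, lhs, rhs):
    """Check X -> Y on one table instance: rows that agree on lhs
    must also agree on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True


print(fd_holds(ROWS, ["SSN"],
               ["Name", "Date of birth", "Address", "Phone number"]))  # True
print(fd_holds(ROWS, ["Date of birth"], ["Name"]))  # False: two people born 30-01-1980
```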
The notion of functional dependencies is used to define second and third normal form, and the Boyce-Codd normal form (BCNF).
To read more about functional dependencies and normalization, you can go to well-known academic books like An Introduction to Database Systems by C.J. Date, or any of the books by the H. Garcia-Molina, J. Ullman, J. Widom trio.
If you want a less formal approach, we're starting a series of posts on data
normalization on our company blog.
answered Mar 7 '14 at 11:06
Agnieszka