4 DBMS Module-IV
4 DBMS Module-IV
Module -4
Functional Dependencies & Normalization for Relational
Databases
Functional Dependence:
- The attribute B is functionally dependent on A if A determines B. “A determines B”
indicates that knowing the value of attribute A means that we can look up (determine)
the value of attribute B. In this case, attribute A is the called the determinant.
- For example, knowing the PROD_CODE in the PRODUCT table in Figure 1 means
we can look up the product‟s description, product price and so on. The notation we use
for “A determines B” is:
AB if A determines B
A B, C, D if A determines B, C & D
PROD_CODE PROD_DESCRIPT
Eg:
Guideline 1. Design a relation schema so that it is easy to explain its meaning. Do not combine
attributes from multiple entity types and relationship types into a single relation. Intuitively, if a
relation schema corresponds to one entity type or one relationship type, it is straightforward to
explain its meaning. Otherwise, if the relation corresponds to a mixture of multiple entities and
relationships, semantic ambiguities will result and the relation cannot be easily explained.
2. Reducing the Redundant Value in Tuples
Mixing attributes of multiple entities may cause problems Information is stored
redundantly wasting storage Problems with update anomalies
Insertion anomalies
Deletion anomalies
Modification anomalies
Eg:
If we integrate these two and is used as a single table i.e Student Table
Here whenever if we insert the tuples there may be „N‟ students in one department, so
Dept No,Dept Name values are repeated „N‟ times which leads to data redundancy.
Another problem is updata anamolies ie if we insert new dept that has no students.
If we delet the last student of a dept,then whole information about that department will be
deleted
If we change the value of one of the attributes of aparticaular table the we must update the
tuples of all the students belonging to thet depy else Database will become inconsistent.
Guideline 2. Design the base relation schemas so that no insertion, deletion, or odification
anomalies are present in the relations. If any anomalies are present, note them clearly and make
sure that the programs that update the database will
Attributes that are NULL frequently could be placed in separate relations (with the
primary key)
Bad designs for a relational database may result in erroneous results for certain JOIN operations
The "lossless join" property is used to guarantee meaningful results for join
operations
Note: The relations should be designed to satisfy the lossless join condition. No spurious
tuples should be generated by doing a natural-join of any relations.
Guideline 4. Design relation schemas so that they can be joined with equality conditions on
attributes that are appropriately related (primary key, foreign key) pairs in a way that guarantees
that no spurious tuples are generated. Avoid relations that contain matching attributes that are
not (foreign key, primary key) combinations because joining on such attributes may produce
spurious tuples.
- Normalization
- Normalization is a technique used to design table structures in which data redundancies
are minimized/eliminated.
- Normalization works through a series of stages called normal forms. The first three stages
are described as first normal form (1NF), second normal form (2NF), and third normal
form (3NF).
- From a structural point of view, 2NF is better than 1NF, and 3NF is better than 2NF.
- For most business database design purposes, 3NF is as high as we need to go in the
normalization process.
- You should not assume that the highest level of normalization is always the most
desirable. Generally, the higher the normal form, the more joins are required to produce a
specified output and the more slowly the database responds to end user demands.
Therefore, you will occasionally be expected to denormalize some table structures to
avoid performance degradation.
- Denormalization is the reverse process of normalization and it produces a lower normal
form; that is, 3NF can be denormalized to 2NF and 2NF can be denormalized to 1NF.
- The table in Figure 2 contains information about projects and employees working on
these projects. This table does not conform to the Relational Database Model we learned
in Chapter 5 for the following reasons:
3. All attributes are dependent on the primary key. Note that the primary key
determines any other attribute in the table.
- Note that all tables conforming to the Relational Model we learned in Chapter 5 are in
1NF by default.
- Figure 4 below displays the conversion of the table in Figure 2 to 1NF by satisfying the above
three conditions.
- Note that the table in Figure 4 satisfies the three conditions. The first condition is satisfied
because the table has a composite primary key consisting of the key attributes PROJ_NUM
and EMP_NUM. The second condition is satisfied because the table no longer contains
repeating groups. The third condition is satisfied because any attribute is dependent on the
primary key. For example, EMP_NAME is dependent on the primary key PROJ_NUM,
EMP_NUM because if we know PROJ_NUM and EMP_NUM (15 and 103 for example), we
can determine EMP_NAME (J. Arbough).
- Dependencies between attributes for the table in Figure 4 can be identified with the help of
the dependency diagram as shown in Figure 5 below:
3. The arrows below the dependency diagram indicate less desirable dependencies:
- A table is said to be in second normal form (2NF) if it satisfies the following conditions:
1. It is in 1NF.
2. It includes no partial dependencies; that is, no attribute is dependent on a portion of the
primary key.
- Let us convert the table in Figure 4 from 1NF to 2NF. To do that, follow these steps:
1. Write each key attribute (recall from Chapter 2 that a key attribute is an attribute
that is part of the key) on a separate line, then write the primary key on the last
line:
PROJ_NUM
EMP_NUM
PROJ_NUM EMP_NUM
The attributes on each line will become keys in a new table. In other words, the
original table is now split into three tables. We‟ll call these tables PROJECT,
EMPLOYEE and ASSIGN respectively.
2. The attribute(s) on each line will form the primary key for the new tables. The
rest of the attributes for each table can be determined from the dependency
diagram:
- Note that the new tables are in 2NF because each table is in 1NF and none of the tables
contain partial dependencies. The dependency diagram for the new tables is shown in Figure
6:
- Note that the conversion to 2NF did not eliminate the transitive dependency (the conversion
to 3NF will eliminate it). This dependency causes data redundancy if multiple employees
have the same JOB_CLASS and it will therefore create the data anomalies we learned in
Chapter 1.
- Because a partial dependency applies only to tables with composite primary keys, a table
whose primary key consists of only a single attribute must automatically be in 2NF if it is in
1NF.
- To eliminate the transitive dependency, the table must be converted to 3NF. A table is said to
be in 3NF if it satisfies the following two conditions:
1. It it is in 2NF.
2. It contains no transitive dependencies.
- Note that in Figure 6, the EMPLOYEE table is the one that contains transitive dependency:
- To eliminate the transitive dependency, simply move the attributes causing data redundancy
(namely JOB_CLASS and CHG_HOUR) into a new table called JOB. Then in the
EMPLOYEE table, define a foreign key that will be used to link it to the JOB table. In this
case, the foreign key will be JOB_CLASS. The result of the conversion to 3NF will be the
following tables:
- Note that the conversion to 3NF has eliminated the original EMPLOYEE table‟s transitive
dependency; the tables are now said to be in third normal form (3NF).
A B C D
A, B C, D
CB
- Note that the table structure in Figure 7 has no partial dependencies nor does it contain
transitive dependencies (Note that C B indicates that a nonkey attribute determines a
key attribute and this dependency is neither partial nor transitive).
- Therefore, the table structure in Figure 7 is in 3NF but not in BCNF because of the
dependency C B.
- To convert the table structure in Figure 7 to BCNF, follow the following procedure:
A C D
C B
- The table structure above is in 3NF but not BCNF because CLASS_CODE is a
determinant but not a superkey key. Why? To convert this table structure, we apply the
above procedure by creating two tables. The first table will contain STU_ID,
CLASS_CODE and ENROLL_GRADE (with STU_ID and CLASS_CODE as the primary
key). The second table will contain the attributes CLASS_CODE and STAFF_ID with
CLASS_CODE as the primary key. The result of the conversion to BCNF is shown in
Figure 10 below.
CLASS_CODE STAFF_ID