unit-4-DBMS

The document discusses schema refinement and normalization in databases, emphasizing the importance of eliminating data redundancy and anomalies such as update, insertion, and deletion anomalies. It explains various types of functional dependencies, decomposition methods, and the significance of lossless decomposition in maintaining data integrity. Additionally, it outlines the criteria for achieving different normal forms (1NF, 2NF, 3NF) to ensure efficient database design.

Uploaded by

studiossteam10
Copyright
© © All Rights Reserved

UNIT- IV

1) INTRODUCTION TO SCHEMA REFINEMENT OR NORMALIZATION

• Normalization, or schema refinement, is a technique for organizing the data in
the database. It is a systematic approach of decomposing tables to eliminate data redundancy
and undesirable characteristics such as insertion, update, and deletion anomalies.
• Schema refinement means improving the schema by applying some technique. The most
common technique of schema refinement is decomposition.
• Decomposition is the process of breaking a relation down into smaller relations.
• Identifying and resolving potential problems in the database design is called schema
refinement. The main problem addressed by this refinement is data redundancy, which is
avoided by the normalization technique.
• The basic goal of normalization is to eliminate redundancy.
• Redundancy refers to repetition of the same data, i.e., duplicate copies of the same data
stored in different locations.

Problems Caused by redundancy


Storing the same information redundantly can lead to several problems:
• Update anomalies: If one copy of such repeated data is updated, an inconsistency is created
unless all copies are similarly updated.
• Insertion anomalies: It may not be possible to store certain information unless some other,
unrelated information is stored as well.
• Deletion anomalies: It may not be possible to delete certain information without losing
some other, unrelated information as well.

Update Anomaly:
If the fee is updated from 5000 to 7000, then the FEE column must be updated in every row
that stores it; otherwise, the data will become inconsistent.

Insertion Anomaly:
When a new course, say C4, is introduced but no student has enrolled in it yet, the course
cannot be stored without also inserting dummy student data. Inserting some data forces the
insertion of other, unrelated data.

Deletion Anomaly:
Deleting the row of student S3 also deletes the information about that student's course.
Deleting some data forces the deletion of other useful data.
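The update and deletion anomalies can be reproduced on a tiny in-memory table. The following is only a sketch: the rows, student IDs, course IDs, and fees are hypothetical values chosen for illustration, and `fees_for` is a helper name introduced here.

```python
# Hypothetical enrollment rows; the values are illustrative only.
rows = [
    {"student": "S1", "course": "C1", "fee": 5000},
    {"student": "S2", "course": "C1", "fee": 5000},
    {"student": "S3", "course": "C2", "fee": 3000},
]

def fees_for(course, table):
    """Return the set of distinct fees stored for one course."""
    return {r["fee"] for r in table if r["course"] == course}

# Update anomaly: changing the fee in only one of the duplicated rows
# leaves two different fees recorded for the same course.
rows[0]["fee"] = 7000
inconsistent = fees_for("C1", rows)

# Deletion anomaly: removing student S3 also removes every trace of C2.
remaining = [r for r in rows if r["student"] != "S3"]
courses_left = {r["course"] for r in remaining}

print(inconsistent, courses_left)
```

Storing the fee once per course in a separate table would make the first problem impossible, which is the motivation for decomposition.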
Solution to Anomalies: decomposition of tables.

Purpose of Normalization:

⇒ Minimize redundancy in data.

⇒ Remove insert, update, and delete anomalies during database activities.

Advantages of Normalization:

1. The amount of unnecessary redundant data is reduced.

2. Data integrity is easily maintained within the database.

3. Security is easier to maintain or manage.

4. The need to reorganize the data when it is modified or enhanced is reduced.

Disadvantages of Normalization:

1. Normalization produces many tables, each with a relatively small number of columns. These
tables then have to be joined using their primary/foreign key relationships.

2. This has two disadvantages:

⇒ Performance: All the joins required to merge data slow down processing and place
additional stress on the hardware.

⇒ Complex Queries: Developers have to code complex queries to merge data from different
tables.

2) Functional Dependency

Functional dependency in DBMS is an important concept that describes the relationship
between attributes (columns) in a table. It shows that the value of one attribute determines the
value of another. It is represented by an arrow sign (→).
In other words, a dependency FD: X→Y means that the values of Y are determined by the
values of X. Two tuples sharing the same values of X will necessarily have the same values
of Y. The attribute set on the left-hand side is known as the "determinant". Here, X is the
determinant.

The closure of an attribute X, written X+, is the set of all attributes that X determines.

Example: Given R (A, B, C, D) and the functional dependencies A→B, B→D, C→B, what
are the closures of A, B, C, and D?

Solution:

A+ = {A, B, D}, i.e., A→B and B→D hold; C is not functionally dependent on A, so it is excluded.

B+ = {B, D}, i.e., B→D holds; A and C are not functionally dependent on B, so they are excluded.

C+ = {C, B, D}, i.e., C→B and B→D hold; A is not functionally dependent on C, so it is excluded.

D+ = {D}, i.e., D does not determine any other attribute.
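The closure computation in this example can be sketched as a small fixed-point loop. The function name `closure` and the representation of each dependency as a pair of Python sets are choices made for this illustration.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already known, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# FDs from the example: A→B, B→D, C→B
fds = [({"A"}, {"B"}), ({"B"}, {"D"}), ({"C"}, {"B"})]

for attr in "ABCD":
    print(attr, sorted(closure({attr}, fds)))
```

The loop keeps applying dependencies until no new attribute is added, so dependencies that become applicable late (such as B→D after A→B) are still picked up.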

Types of Functional Dependencies

1. Fully Functional Dependency:

A functional dependency X→Y is said to be a full dependency if and only if Y depends on the
whole of X and not on any proper subset of X. The dependent can be either a prime or a
non-prime attribute.

Explanation: Let's take the functional dependency X→Y (i.e., X determines Y). Here, Y is
said to be fully dependent on X if no proper subset of X determines Y.

Example: Consider the dependency ABC→D. Here, ABC determines D, but D is not determined
by any proper subset of ABC (A, B, C, AB, BC, or AC). So, D is fully functionally
dependent on ABC.

2. Partial Functional Dependency:

If a non-prime attribute of the relation is determined by only a part of the candidate key,
then such a dependency is known as a partial dependency.

Explanation: In a relation having more than one key field, a subset of non-key fields may
depend on all key fields, but another subset or a particular non-key field may depend on only
one of the key fields. Such a dependency is defined as a partial dependency.

Example: Consider the dependencies AC→P, A→D, D→P. Here, P is not fully functionally
dependent on AC. If we find A+ (i.e., A's closure), A→D and D→P give A→P, so A alone
determines P and C is not required at all. So, P is partially dependent on AC.
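One way to test whether a dependency is full or partial is to check every proper subset of the determinant with an attribute-closure routine. This is a minimal sketch; `closure` and `is_full_dependency` are names introduced here, and dependencies are modelled as pairs of Python sets.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_dependency(lhs, rhs, fds):
    """True if rhs depends on lhs but on no proper, non-empty subset of lhs."""
    if not rhs <= closure(lhs, fds):
        return False  # lhs does not determine rhs at all
    for size in range(1, len(lhs)):
        for subset in combinations(sorted(lhs), size):
            if rhs <= closure(set(subset), fds):
                return False  # a proper subset already determines rhs
    return True

# Partial dependency example: AC→P, A→D, D→P
partial_fds = [({"A", "C"}, {"P"}), ({"A"}, {"D"}), ({"D"}, {"P"})]
print(is_full_dependency({"A", "C"}, {"P"}, partial_fds))

# Full dependency example: only ABC→D is given
full_fds = [({"A", "B", "C"}, {"D"})]
print(is_full_dependency({"A", "B", "C"}, {"D"}, full_fds))
```

For the first call the subset {A} already determines P, so the dependency is partial; for the second no proper subset of ABC determines D.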

3. Transitive Functional Dependency:

If a non-prime attribute of a relation is determined by either another non-prime attribute or a
combination of part of the candidate key along with a non-prime attribute, then such a
dependency is defined as a transitive dependency. In other words, a dependency among the
non-key fields of a relation is called a transitive functional dependency.

Example: If X→Y and Y→Z hold, then X→Z also holds, and Z is transitively dependent on X
through Y.

4. Trivial Functional Dependency:

It is basically related to the reflexive rule. If X is a set of attributes, and Y is a subset of X,
then X→Y holds.

Example: ABC→BC is a trivial dependency.

5. Multi-Valued Dependency:

Consider three fields X, Y, and Z in a relation. If for each value of X, there is a well-defined
set of values for Y and a well-defined set of values for Z, and the set of values for Y is
independent of the set of values for Z, this dependency is a multi-valued dependency, written
X→→Y and X→→Z.

These types of dependencies help in identifying and organizing relationships
between attributes in a database, ensuring a more structured and efficient database design.
3) Lossless Join and Dependency Preserving Decomposition

What is Decomposition in DBMS?
Decomposition is the process of dividing a relation into multiple relations to remove
redundancy while maintaining the original data.

Types of Decomposition
There are two types of decomposition:
• Lossless Decomposition
• Lossy Decomposition

Lossless Decomposition

A lossless decomposition of a relation ensures that:

a) No information is lost during decomposition, which is why it is called lossless.
b) If a relation R is divided into two relations R1 and R2 using lossless
decomposition, then the natural join of R1 and R2 returns the original
relation R.

For example, suppose a relation R(A, B, C) is decomposed into R1(A, B) and R2(C, A).
Let's check whether this decomposition is lossless:
Rule 1:
R1 U R2 = (A, B) U (C, A) = (A, B, C)
The union of the attributes of R1 and R2 gives the attributes of the original relation, so
the first rule of lossless decomposition applies here.
Rule 2:
R1 ∩ R2 = (A, B) ∩ (C, A) = (A)
The result is not empty, so the second rule also applies here.
Rule 3:
R1 ∩ R2 = (A, B) ∩ (C, A) = (A)
The common attributes must be a super key of at least one of R1 and R2. Taking A to be a
key, as this example does, the third rule also applies here.
Rule 4: Dependency preserving
The dependencies that exist in the original relation still exist after decomposition. This
property is checked in addition to the three lossless-join rules.
Example of Lossless Decomposition
StudentCourse Table:

Student_Id Student_Name Course_Id Course_Name


S101 Chaitanya C01 Maths
S102 Ajeet C01 Maths
S103 Rahul C02 Science
S104 Steve C02 Science
S105 John C03 English
S101 Chaitanya C03 English
S102 Ajeet C02 Science

The primary key of the given relation is {Student_Id, Course_Id}.

This table has redundant data, as the Course_Id and Course_Name values are repeated
for several students. Let's decompose this relation into two relations.

Student Table:

The primary key of this table is {Student_Id, Course_Id}

Student_Id Student_Name Course_Id


S101 Chaitanya C01
S102 Ajeet C01
S103 Rahul C02
S104 Steve C02
S105 John C03
S101 Chaitanya C03
S102 Ajeet C02

Course Table:

The primary key of this table is {Course_Id}


Course_Id Course_Name
C01 Maths
C02 Science
C03 English

Let's check the rules of lossless decomposition to see whether this
decomposition is lossless or not.
Rule 1:

Attributes of Student U attributes of Course
= {Student_Id, Student_Name, Course_Id} U {Course_Id, Course_Name}
= {Student_Id, Student_Name, Course_Id, Course_Name}

Joining the two tables on Course_Id gives:

Student_Id Student_Name Course_Id Course_Name

S101 Chaitanya C01 Maths
S102 Ajeet C01 Maths
S103 Rahul C02 Science
S104 Steve C02 Science
S105 John C03 English
S101 Chaitanya C03 English
S102 Ajeet C02 Science

The result is the original relation StudentCourse, so we can say that
the first rule holds true.

Rule 2 & 3:

R1 ∩ R2 = {Course_Id}

Result:

Course_Id
C01
C02
C03

The result is not empty, so rule 2 holds true.

The result is a super key of the second relation R2 (Course), so the third rule also
applies here.

Rule 4: Dependencies in the original relation:

Student_Id -> {Student_Name}

Course_Id -> {Course_Name}

These dependencies are still present in the decomposed relations. Thus we can
say that this decomposition is dependency preserving.

Since all the rules apply here, the decomposition of relation StudentCourse
into Student and Course is a lossless decomposition.
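The lossless property of this example can also be checked directly: project StudentCourse onto the two decomposed schemas, take the natural join on Course_Id, and compare with the original. The sketch below models each relation as a Python set of tuples, which is a representation chosen for this illustration.

```python
# StudentCourse rows: (Student_Id, Student_Name, Course_Id, Course_Name)
student_course = {
    ("S101", "Chaitanya", "C01", "Maths"),
    ("S102", "Ajeet", "C01", "Maths"),
    ("S103", "Rahul", "C02", "Science"),
    ("S104", "Steve", "C02", "Science"),
    ("S105", "John", "C03", "English"),
    ("S101", "Chaitanya", "C03", "English"),
    ("S102", "Ajeet", "C02", "Science"),
}

# Project onto the attributes of each decomposed table.
student = {(sid, name, cid) for sid, name, cid, _ in student_course}
course = {(cid, cname) for _, _, cid, cname in student_course}

# Natural join on the shared attribute Course_Id.
joined = {
    (sid, name, cid, cname)
    for sid, name, cid in student
    for cid2, cname in course
    if cid == cid2
}

print(len(student), len(course), joined == student_course)
```

The join returns exactly the original seven rows because Course_Id, the common attribute, is a key of the Course table.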

Lossy Decomposition

As the name suggests, in lossy decomposition, information is lost during
decomposition. At least one of the three rules discussed above fails in a lossy
decomposition.

Let’s take the same example that we discussed above.

StudentCourse Table:

Student_Id Student_Name Course_Id Course_Name


S101 Chaitanya C01 Maths
S102 Ajeet C01 Maths
S103 Rahul C02 Science
S104 Steve C02 Science
S105 John C03 English
S101 Chaitanya C03 English
S102 Ajeet C02 Science

Now if we divide this relation like this:

Student Table:

The primary key of this table is {Student_Id}


Student_Id Student_Name
S101 Chaitanya
S102 Ajeet
S103 Rahul
S104 Steve
S105 John

Course Table:

The primary key of this table is {Course_Id}

Course_Id Course_Name
C01 Maths
C02 Science
C03 English

This is a lossy decomposition: the Student and Course relations have no attribute in
common, so their intersection is empty and the second and third rules of lossless
decomposition fail here.

In this decomposition, the relationship between Student and Course is lost. There is no way
to rebuild the original relation from these two relations, because the information about
which student attends which course was lost during decomposition.
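The loss can be made visible by joining the two tables back together. With no common attribute, the natural join degenerates into a cross product, which contains rows that were never in StudentCourse. The sketch below uses Python sets of tuples as an illustrative representation.

```python
# StudentCourse rows: (Student_Id, Student_Name, Course_Id, Course_Name)
student_course = {
    ("S101", "Chaitanya", "C01", "Maths"),
    ("S102", "Ajeet", "C01", "Maths"),
    ("S103", "Rahul", "C02", "Science"),
    ("S104", "Steve", "C02", "Science"),
    ("S105", "John", "C03", "English"),
    ("S101", "Chaitanya", "C03", "English"),
    ("S102", "Ajeet", "C02", "Science"),
}

# Lossy decomposition: the two projections share no attribute.
student = {(sid, name) for sid, name, _, _ in student_course}
course = {(cid, cname) for _, _, cid, cname in student_course}

# With nothing to join on, every student is paired with every course.
joined = {
    (sid, name, cid, cname)
    for sid, name in student
    for cid, cname in course
}

# Rows produced by the join that were never in the original relation.
spurious = joined - student_course

print(len(student_course), len(joined), len(spurious))
```

The join produces 15 rows from 5 students and 3 courses, of which 8 are spurious, so the original 7 enrollments can no longer be told apart from the invented ones.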

4) Normal forms

1) First Normal Form (1NF)


There are some criteria for a database table to be in first normal form.
These are as follows -

• A table is in 1NF if every attribute contains only atomic values.

• An attribute of a table cannot hold multiple values; it must hold
a single value.

• Multi-valued attributes, composite attributes, and their
combinations are disallowed by the first normal form.

Example: Table STUDENT is not in 1NF because of the multi-valued attribute
ST_PHONE.

STUDENT table:

ST_ID ST_NAME ST_PHONE ST_STATE

101 Ram 8877665544, 8989265471 MP

102 Rex 8574783832 Maharashtra

103 Kapil 7390372389, 8589830302 Delhi

The STUDENT table converted into 1NF is shown below:

ST_ID ST_NAME ST_PHONE ST_STATE

101 Ram 8877665544 MP

101 Ram 8989265471 MP

102 Rex 8574783832 Maharashtra


103 Kapil 7390372389 Delhi

103 Kapil 8589830302 Delhi
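The conversion to 1NF amounts to emitting one row per phone number. The following is a minimal sketch; `to_1nf` is a name introduced here, and storing the multi-valued ST_PHONE as a comma-separated string is an assumption made to mirror the table above.

```python
# STUDENT rows: (ST_ID, ST_NAME, ST_PHONE, ST_STATE)
students = [
    (101, "Ram", "8877665544, 8989265471", "MP"),
    (102, "Rex", "8574783832", "Maharashtra"),
    (103, "Kapil", "7390372389, 8589830302", "Delhi"),
]

def to_1nf(rows):
    """Emit one row per phone number so that ST_PHONE is atomic."""
    flat = []
    for st_id, name, phones, state in rows:
        for phone in phones.split(","):
            flat.append((st_id, name, phone.strip(), state))
    return flat

flat_students = to_1nf(students)
for row in flat_students:
    print(row)
```

Each of the five output rows holds exactly one phone number, at the cost of repeating ST_NAME and ST_STATE for students with several numbers.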

2) Second Normal Form (2NF)

There are several conditions for a database table to be in second normal
form. These are the following -

• To be in 2NF, the table must be in 1NF.

• In second normal form, all non-key attributes must be fully functionally
dependent on the primary key.

Example: Let's assume that a school stores data about teachers and the subjects
they teach. A teacher can teach more than one subject in the
school.

TEACHER table:

TEACHER_ID SUBJECT TEACHER_AGE

T1 Hindi 45

T1 Science 45

T2 Maths 35

T3 English 28

T3 SST 28
The non-prime attribute TEACHER_AGE in the given table depends on
TEACHER_ID, which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}.
That is why the table violates the 2NF rule.

To convert the given table into 2NF, we split it into two tables:

TEACHER_ID TEACHER_AGE

T1 45

T2 35

T3 28

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

T1 Hindi

T1 Science

T2 Maths

T3 English

T3 SST
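The 2NF split is a pair of projections, and joining them on TEACHER_ID gives back the original table. The sketch below represents the tables as Python lists of tuples, which is a choice made for this illustration.

```python
# TEACHER rows: (TEACHER_ID, SUBJECT, TEACHER_AGE)
teacher = [
    ("T1", "Hindi", 45),
    ("T1", "Science", 45),
    ("T2", "Maths", 35),
    ("T3", "English", 28),
    ("T3", "SST", 28),
]

# Projection onto (TEACHER_ID, TEACHER_AGE); the set drops repeated ages.
teacher_age = sorted({(tid, age) for tid, _, age in teacher})

# Projection onto (TEACHER_ID, SUBJECT).
teacher_subject = sorted({(tid, subject) for tid, subject, _ in teacher})

# Joining the two projections on TEACHER_ID rebuilds the original rows.
rebuilt = sorted(
    (tid, subject, age)
    for tid, subject in teacher_subject
    for tid2, age in teacher_age
    if tid == tid2
)

print(teacher_age)
print(rebuilt == sorted(teacher))
```

After the split each teacher's age is stored once, so changing it can no longer leave inconsistent copies.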

3) Third Normal Form (3NF)

• To be in 3NF, the table must be in 2NF.

• There should be no transitive dependency for non-prime attributes.

A relation is in 3NF if at least one of the following conditions holds for every
non-trivial functional dependency X -> Y:

• X is a super key.

• Y is a prime attribute, i.e., each element of Y is part of some candidate key.

If A->B and B->C are two FDs, then A->C is called a transitive dependency.

Normalizing a relation from 2NF to 3NF requires the
elimination of transitive dependencies. If there is a transitive dependency,
we remove the transitively dependent attribute(s) from the relation
by placing them in a new relation along with a copy of the
determinant.

Consider the examples given below.

Example-1:
Consider the relation TEACHER given in the following table:

TEACHER_ID TEACHER_NAME TEACHER_ZIP_CODE TEACHER_CITY

101 Rex Z1 Noida

102 Rohan Z2 Boston

103 Puneet Z3 Chicago

104 Jack Z4 Norwich

105 Amit Z5 Bhopal

Super keys in the table above:

{TEACHER_ID}, {TEACHER_ID, TEACHER_NAME}, {TEACHER_ID,
TEACHER_NAME, TEACHER_ZIP_CODE}, and so on.

Candidate key: {TEACHER_ID}

Non-prime attributes: In the given table, all attributes except
TEACHER_ID are non-prime.

Here, TEACHER_CITY depends on TEACHER_ZIP_CODE, and
TEACHER_ZIP_CODE depends on TEACHER_ID. The non-prime attribute
TEACHER_CITY is therefore transitively dependent on the super key (TEACHER_ID).
This violates the rule of third normal form.

That is why we need to move TEACHER_CITY to the new TEACHER_ZIP table, with
TEACHER_ZIP_CODE as its primary key.

TEACHER table:

TEACHER_ID TEACHER_NAME TEACHER_ZIP_CODE

101 Rex Z1

102 Rohan Z2

103 Puneet Z3

104 Jack Z4

105 Amit Z5

TEACHER_ZIP table:
TEACHER_ZIP_CODE TEACHER_CITY

Z1 Noida

Z2 Boston

Z3 Chicago

Z4 Norwich

Z5 Bhopal
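The 3NF condition (X is a super key, or Y is prime) can be checked mechanically once the dependencies are written down. This is a sketch under stated assumptions: `closure` and `violates_3nf` are names introduced here, and the dependency TEACHER_ID → TEACHER_NAME is assumed from TEACHER_ID being the candidate key rather than stated explicitly in the text.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violates_3nf(attributes, fds, candidate_keys):
    """Return the FDs X→Y where X is not a super key and Y has a non-prime attribute."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attributes
        non_prime_rhs = (rhs - lhs) - prime
        if not is_superkey and non_prime_rhs:
            bad.append((lhs, rhs))
    return bad

attrs = {"TEACHER_ID", "TEACHER_NAME", "TEACHER_ZIP_CODE", "TEACHER_CITY"}
fds = [
    # Assumed: the candidate key determines name and zip code.
    ({"TEACHER_ID"}, {"TEACHER_NAME", "TEACHER_ZIP_CODE"}),
    ({"TEACHER_ZIP_CODE"}, {"TEACHER_CITY"}),
]
keys = [{"TEACHER_ID"}]

violations = violates_3nf(attrs, fds, keys)
print(violations)
```

Only TEACHER_ZIP_CODE → TEACHER_CITY is reported, which is the dependency that the decomposition moves into the TEACHER_ZIP table.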

4) Boyce Codd normal form (BCNF)

A database table must satisfy the following conditions to be in Boyce Codd
normal form -

• BCNF is an advanced version of 3NF; it is stricter than 3NF.

• A table is in BCNF if, for every non-trivial functional dependency X -> Y,
X is a super key of the table.

• For BCNF, the table should be in 3NF.

Example: Let's assume that there is an institution in which a student can have
more than one specialisation.

STUDENT table:

ST_ID ST_COUNTRY ST_SPECIALISATION DEPT_TYPE ST_SPECIALISATION_NO

101 India Engineering D101 S1

101 India IT D101 S2


102 UK Architecture D201 S3

102 UK Maths D201 S4

In the above table Functional dependencies are as follows:

1. ST_ID → ST_COUNTRY

2. ST_SPECIALISATION → {DEPT_TYPE,
ST_SPECIALISATION_NO}

Candidate key: {ST_ID, ST_SPECIALISATION}

The table is not in BCNF because neither ST_SPECIALISATION nor
ST_ID alone is a key.

To convert the given table into BCNF, we decompose it into three tables:

ST_COUNTRY table:

ST_ID ST_COUNTRY

101 India

102 UK

ST_SPECIALISATION table:

ST_SPECIALISATION DEPT_TYPE ST_SPECIALISATION_NO

Engineering D101 S1

IT D101 S2
Architecture D201 S3

Maths D201 S4

ST_SPECIALISATION_MAPPING table:

ST_ID ST_SPECIALISATION

101 Engineering

101 IT

102 Architecture

102 Maths

Functional dependencies:

1. ST_ID → ST_COUNTRY

2. ST_SPECIALISATION→{DEPT_TYPE,
ST_SPECIALISATION_NO}

Candidate keys:

For the first table: ST_ID


For the second table: ST_SPECIALISATION
For the third table: {ST_ID, ST_SPECIALISATION}

Now, the tables are in BCNF because the left side of both functional
dependencies is a key.
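The BCNF test, that every non-trivial dependency has a super key on its left side, can be sketched with an attribute-closure helper. The names `closure` and `bcnf_violations` are introduced for this illustration; the dependencies are the two listed above.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attributes
    ]

fds = [
    ({"ST_ID"}, {"ST_COUNTRY"}),
    ({"ST_SPECIALISATION"}, {"DEPT_TYPE", "ST_SPECIALISATION_NO"}),
]

# Original STUDENT table: both dependencies violate BCNF.
student = {"ST_ID", "ST_COUNTRY", "ST_SPECIALISATION",
           "DEPT_TYPE", "ST_SPECIALISATION_NO"}
before = bcnf_violations(student, fds)

# Decomposed tables, each checked against the dependency that applies to it.
st_country = bcnf_violations({"ST_ID", "ST_COUNTRY"}, fds[:1])
st_specialisation = bcnf_violations(
    {"ST_SPECIALISATION", "DEPT_TYPE", "ST_SPECIALISATION_NO"}, fds[1:]
)

print(len(before), st_country, st_specialisation)
```

The mapping table {ST_ID, ST_SPECIALISATION} carries no non-trivial dependency, so it satisfies BCNF as well.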

5) Fourth normal form (4NF)


A database table must satisfy the following conditions to be in fourth
normal form -

• A table is in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.

• For a dependency A ->> B, if multiple values of B exist for a single value
of A, then the relation has a multi-valued dependency.

Example

STUDENT

STU_ID COURSE HOBBY

101 Maths Dancing

101 Physics Singing

102 Chemistry Dancing

103 Science Cricket

104 English Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are
two independent entities. Hence, there is no relationship between COURSE and
HOBBY.

In the STUDENT table, the student with STU_ID 101 has two
courses, Maths and Physics, and two hobbies, Dancing and Singing. So
there is a multi-valued dependency on STU_ID, which leads to unnecessary
repetition of data.

So, to bring the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE

101 Maths

101 Physics

102 Chemistry

103 Science

104 English

STUDENT_HOBBY:

STU_ID HOBBY

101 Dancing

101 Singing

102 Dancing

103 Cricket

104 Hockey
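The 4NF split is again a pair of projections. Because courses and hobbies are independent, joining the two tables back on STU_ID pairs every course of a student with every hobby of that student. The sketch below uses Python lists of tuples as an illustrative representation.

```python
# STUDENT rows: (STU_ID, COURSE, HOBBY)
student = [
    (101, "Maths", "Dancing"),
    (101, "Physics", "Singing"),
    (102, "Chemistry", "Dancing"),
    (103, "Science", "Cricket"),
    (104, "English", "Hockey"),
]

# Projections onto (STU_ID, COURSE) and (STU_ID, HOBBY).
student_course = sorted({(sid, course) for sid, course, _ in student})
student_hobby = sorted({(sid, hobby) for sid, _, hobby in student})

# Joining on STU_ID combines each course with each hobby of the same student.
joined = sorted(
    (sid, course, hobby)
    for sid, course in student_course
    for sid2, hobby in student_hobby
    if sid == sid2
)

print(len(student_course), len(student_hobby), len(joined))
```

The join has seven rows: student 101 appears four times (two courses times two hobbies), which is the set of combinations that the multi-valued dependencies STU_ID ->> COURSE and STU_ID ->> HOBBY imply. Keeping the two small tables avoids storing those combinations explicitly.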

6) Fifth normal form (5NF)


A database table must satisfy the following conditions to be in fifth
normal form -

• A table is in 5NF if it is in 4NF, does not contain any join dependency, and
joining is lossless.

• 5NF is satisfied when all tables are
broken into as many tables as possible in order to avoid redundancy.

• 5NF is also known as project-join normal form (PJ/NF).

Example:

SUBJECT LECTURER SEMESTER

Physics Mr. Kabir I

Physics Mrs. Hemlata II

Chemistry Mr. Ramkumar I

Chemistry Mr. Amar Singh I

Math Mr. Lal K. II

In the table above, Physics is taught in both semester I and semester II, but by a
different lecturer in each. In this case, a combination of all
three fields is required to identify valid data.

Suppose we add a new semester, Semester III, but do not yet know
the subject or who will teach it, so we leave LECTURER and SUBJECT
as NULL. But all three columns together act as the primary key, so we
cannot leave the other two columns blank.

So, to bring the table above into 5NF, we can decompose it into three tables,
Table 1, Table 2, and Table 3:

Table 1:

SEMESTER SUBJECT

I Physics

II Physics

I Chemistry

II Math

Table 2:

SUBJECT LECTURER

Physics Mr. Kabir

Physics Mrs. Hemlata

Chemistry Mr. Ramkumar

Chemistry Mr. Amar Singh

Math Mr. Lal K.

Table 3:

SEMESTER LECTURER

I Mr. Kabir
II Mrs. Hemlata

I Mr. Ramkumar

I Mr. Amar Singh

II Mr. Lal K.
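Whether the three tables together preserve the original data can be checked by joining all three and comparing with the original relation. The sketch below models the tables as Python sets of tuples; the variable names are introduced for this illustration.

```python
# Original rows: (SUBJECT, LECTURER, SEMESTER)
original = {
    ("Physics", "Mr. Kabir", "I"),
    ("Physics", "Mrs. Hemlata", "II"),
    ("Chemistry", "Mr. Ramkumar", "I"),
    ("Chemistry", "Mr. Amar Singh", "I"),
    ("Math", "Mr. Lal K.", "II"),
}

table1 = {(sem, subj) for subj, _, sem in original}   # SEMESTER, SUBJECT
table2 = {(subj, lect) for subj, lect, _ in original}  # SUBJECT, LECTURER
table3 = {(sem, lect) for _, lect, sem in original}    # SEMESTER, LECTURER

# Join Table 1 and Table 2 on SUBJECT.
two_way = {
    (subj, lect, sem)
    for sem, subj in table1
    for subj2, lect in table2
    if subj == subj2
}

# Keep only rows whose (SEMESTER, LECTURER) pair appears in Table 3.
three_way = {row for row in two_way if (row[2], row[1]) in table3}

print(len(two_way), three_way == original)
```

Joining only Table 1 and Table 2 yields seven rows, two of them spurious (for example Mr. Kabir teaching Physics in semester II); the third table removes them, so all three tables are needed to recover the original five rows.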
