SQL modules

The document discusses relational database design, focusing on generating relation schemas to minimize redundancy through functional dependencies and normalization. It outlines Armstrong's Axioms for functional dependencies, various anomalies in database design, and the normalization process across different normal forms. The document emphasizes the importance of structuring databases to enhance data integrity and reduce redundancy.

RELATIONAL DATABASE DESIGN

PREPARED BY:
Sumati Baral
Introduction:

➢ The goal of relational database design is to generate a set of relation schemas that allow us to store information without unnecessary redundancy.

➢ The approach is to decompose relations into normal forms using functional dependencies.

⚫ A functional dependency (FD) occurs when one attribute (or set of attributes) in a relation uniquely determines another attribute.
⚫ A functional dependency in a DBMS is denoted by an arrow between two or more attributes, for example FD: A → B.
⚫ Here, A and B are attributes of the same relation, and every value of A is associated with exactly one value of B.
➢ Diagrammatic representation of FDs:

⚫ Roll → Name
⚫ Roll → Mark
⚫ Roll → Address

Mathematical representation of an FD: LHS → RHS

Examples:
X → Y
AB → C
X → YZ
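The definition can be tested against the rows of a table. The Python sketch below is a minimal illustration, assuming a table is a list of dictionaries; the function name `fd_holds` and the sample Student rows are hypothetical. Note that a table instance can only show that an FD is violated; it cannot prove that the FD will hold for all future data, which remains a design decision.

```python
from typing import Dict, Iterable, Mapping, Sequence, Tuple

def fd_holds(rows: Iterable[Mapping[str, object]],
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if, in this table instance, each LHS value maps to exactly one RHS value."""
    seen: Dict[Tuple[object, ...], Tuple[object, ...]] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first time an LHS value appears, remember its RHS; afterwards it must match.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical Student rows, using the attribute names from the diagram above.
student = [
    {"Roll": 1, "Name": "Asha", "Mark": 78, "Address": "BBSR"},
    {"Roll": 2, "Name": "Ravi", "Mark": 78, "Address": "CTC"},
    {"Roll": 3, "Name": "Asha", "Mark": 91, "Address": "RKL"},
]
print(fd_holds(student, ["Roll"], ["Name", "Address"]))  # True
print(fd_holds(student, ["Name"], ["Roll"]))             # False: two students named Asha
print(fd_holds(student, ["Mark"], ["Name"]))             # False: mark 78 belongs to two names
```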
Functional Dependency in DBMS: Armstrong’s Axioms
Armstrong’s axioms were introduced by William W. Armstrong in 1974, and they play a vital role in applying the concept of functional dependency to database normalization. Six inference rules are commonly listed under the name “Armstrong’s Axioms”; strictly, only the first three are axioms, and the other three can be derived from them. They are discussed below.
1. Reflexive: if set B is a subset of A, then A → B.
2. Augmentation: if A → B, then AC → BC.
3. Transitive: if A → B and B → C, then A → C.
4. Decomposition: if A → BC, then A → B and A → C.
5. Union: if A → B and A → C, then A → BC.
6. Pseudo-Transitivity: if A → B and DB → C, then DA → C.

NOTE: These rules are helpful when computing the closure of a set of functional dependencies (or of a set of attributes) and the canonical cover. Both of these topics are discussed in the next chapter.
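The closure computation mentioned in the note can be sketched as follows. This is a minimal Python illustration, assuming single-letter attribute names; the helpers `fd`, `closure` and `implies` are hypothetical names. It repeatedly applies the rules above: whenever the whole LHS of an FD is already inside the closure, its RHS is added. An FD X → Y follows from a set of FDs exactly when Y is contained in the closure of X.

```python
from typing import FrozenSet, Iterable, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (LHS, RHS)

def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from strings of single-letter attributes, e.g. fd("AB", "C")."""
    return frozenset(lhs), frozenset(rhs)

def closure(attrs: Iterable[str], fds: Iterable[FD]) -> FrozenSet[str]:
    """Attribute closure X+: every attribute that X determines under the given FDs."""
    result = set(attrs)
    fds = list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole LHS is already derived, the RHS is derived too (transitivity).
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def implies(fds: Iterable[FD], goal: FD) -> bool:
    """The FDs imply X -> Y exactly when Y is contained in X+."""
    lhs, rhs = goal
    return rhs <= closure(lhs, fds)

F = [fd("A", "B"), fd("B", "C")]
print(sorted(closure("A", F)))     # ['A', 'B', 'C']
print(implies(F, fd("A", "C")))    # True  (transitive rule)
print(implies(F, fd("AD", "CD")))  # True  (augmentation, then transitivity)
print(implies(F, fd("C", "A")))    # False
```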
Inference Rules for Functional Dependencies (ARMSTRONG AXIOMS)

A. BASIC RULES:

1. Reflexive Rule:
The LHS determines itself or any part of itself.

Ex: FirstName, LastName → FirstName


2. Augmentation Rule:
It states that the same column or attribute can be added to both the LHS and the RHS.

If A → B and C is a set of attributes, then AC → BC.

Ex: if Roll → Name,

then Roll, Address → Name, Address

3. Transitive Rule
It states that if an LHS determines an RHS, and that RHS in turn determines another attribute, then the first LHS determines that last attribute.
If A → B and B → C, then A → C.

Ex: Rollno → Address and Address → Pincode,
then Rollno → Pincode
B. ADDITIONAL RULES FOR FUNCTIONAL DEPENDENCIES:
These rules can be derived from the basic axioms.
4. Decomposition Rule (Projective Rule):
It states that if the RHS of a functional dependency contains more than one attribute, then it can be broken down into one FD per attribute.

If A → BC, then A → B and A → C.

Ex: If Rollno → FirstName, LastName
then Rollno → FirstName
and Rollno → LastName
Proof:
Given A → BC
BC → B (by Reflexive Rule)
BC → C (by Reflexive Rule)
Thus A → B (by Transitive Rule)
and A → C (by Transitive Rule)
5. Union Rule (Additive Rule):
It states that if two FDs have the same LHS, then their RHSs can be merged together.

If A → B and A → C, then A → BC.

Ex: If Rollno → Name and Rollno → Address,
then Rollno → Name, Address

Proof:
Given A → B and A → C
Augmenting A → B with A gives AA → AB (by Augmentation Rule),
i.e. A → AB ------(a)

Augmenting A → C with B gives AB → BC (by Augmentation Rule) ------(b)

From (a) and (b), A → BC (by Transitive Rule).


6. Pseudotransitive Rule
If an LHS determines an RHS, and that RHS together with another column determines a further attribute, then the first LHS together with the extra column determines that attribute.

If A → B and BC → D, then AC → D.

Ex: If Rollno → Name
and Name, GuardianName → GuardianAddress
then Rollno, GuardianName → GuardianAddress

Proof:
Given A → B and BC → D
Then AC → BC (by Augmentation Rule, adding C to both sides)
Thus AC → D (by Transitive Rule)
ANOMALIES IN DATABASE DESIGN

[Figure: Student table]

➢ Data redundancy means repetition of information in a relation (or table).

➢ The aim of a database system is to reduce redundancy, because redundancy leads to:
➢ wastage of storage space
➢ an increase in the size of stored data.
➢ Redundancy also gives rise to inconsistency problems.
An insertion anomaly is the inability to add data to the database due to the absence of other data.

Insertion Anomaly: Suppose we have a table with 4 columns: Student ID, Student Name, Student Address and Student Grades. When a new student enrolls in the school, the first three attributes can be filled in, but the 4th attribute will have a NULL value because the student doesn't have any marks yet.
Update Anomaly: It refers to the inconsistencies that arise when the same fact is stored in multiple copies and an update changes only some of those copies, not all of them.

Example: change the address of student J.

Deletion Anomaly:
➢ This anomaly is the unintended deletion of important information from the table. Suppose we store students’ information together with the courses they have taken, as (Student ID, Student Name, Course, Address). If a student leaves the school, the entry related to that student is deleted. However, that deletion also deletes the course information, even though the course depends on the school and not on the student.
➢ For example, if the student with ID 224 discontinues the course, deleting that student’s row also deletes the course details.
➢ In short, it refers to loss of information: useful information that we do not want to lose may be lost when a tuple is deleted.
• These anomalies can be removed by decomposing the relations (or tables).
• They can be avoided or minimized by designing databases that adhere to the principles of normalization.
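The update and deletion anomalies can be reproduced with Python’s built-in `sqlite3` module. This is a minimal sketch; the `StudentCourse` table, the student names and the course names are hypothetical, chosen to mirror the (Student ID, Student Name, Course, Address) example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Unnormalized table: a student's address is repeated once per course taken.
cur.execute(
    "CREATE TABLE StudentCourse (StudentID INTEGER, StudentName TEXT, Course TEXT, Address TEXT)"
)
cur.executemany(
    "INSERT INTO StudentCourse VALUES (?, ?, ?, ?)",
    [(224, "J", "DBMS", "BBSR"),
     (224, "J", "Java", "BBSR"),
     (225, "K", "Java", "CTC")],
)

# Update anomaly: the new address reaches only one of student 224's two rows.
cur.execute("UPDATE StudentCourse SET Address = 'CTC' WHERE StudentID = 224 AND Course = 'DBMS'")
inconsistent = cur.execute(
    "SELECT StudentID FROM StudentCourse GROUP BY StudentID HAVING COUNT(DISTINCT Address) > 1"
).fetchall()
print(inconsistent)  # [(224,)]

# Deletion anomaly: removing student 224 also removes the only record of course 'DBMS'.
cur.execute("DELETE FROM StudentCourse WHERE StudentID = 224")
courses = [row[0] for row in cur.execute("SELECT DISTINCT Course FROM StudentCourse ORDER BY Course")]
print(courses)  # ['Java']
conn.close()
```

Storing each student’s address once, in a table keyed by Student ID, would make both problems disappear, which is what the decomposition described next achieves.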
Some keywords
➢ Functional dependency: An attribute Y is said to have a functional dependency on a set of attributes X (written X → Y) if and only if each X value is associated with precisely one Y value.

For example:
{Employee ID} → {Employee Date of Birth} would hold.

➢ Trivial functional dependency: A trivial functional dependency is a functional dependency of an attribute on a superset of itself (X → Y where Y ⊆ X).

For example:
{Employee ID, Employee Address} → {Employee Address}
➢ Full functional dependency: An attribute is fully functionally dependent on a set of attributes X if it is functionally dependent on X and not functionally dependent on any proper subset of X.

Example:
{Employee ID, Skill} → {Employee Address} is not a full functional dependency, because {Employee ID} → {Employee Address} already holds.

➢ Transitive dependency: A transitive dependency is an indirect functional dependency, one in which X → Z holds only by virtue of X → Y and Y → Z.

➢ Join dependency: A table T is subject to a join dependency if T can always be recreated by joining multiple tables, each having a subset of the attributes of T.
NORMALIZATION USING FDS:
⚫ Normalization is the process of minimizing redundancy in a relation or set of relations.
⚫ It is the process of decomposing a relation (or table) into smaller relations, based on functional dependencies, to overcome undesirable anomalies.
⚫ It spreads the data over a number of tables so that each fact is stored only once, rather than duplicated.
⚫ Normalization divides a larger table into smaller tables and links them through relationships (shared key attributes).

Definition:
⚫ Database normalization is the process of structuring a relational database in accordance with a series of so-called normal forms, in order to reduce data redundancy and improve data integrity.

⚫ It was first proposed by Edgar F. Codd as part of his relational model.


Types of normal forms:
⚫ 1NF (First Normal Form)

⚫ 2NF (Second Normal Form)

⚫ 3NF (Third Normal Form)

⚫ BCNF (Boyce-Codd Normal Form)

⚫ 4NF (Fourth Normal Form)

⚫ 5NF (Fifth Normal Form)


First Normal Form:
⚫ A relation is in 1NF if it contains only atomic values.
⚫ It states that an attribute of a table cannot hold multiple values; it must hold only a single value.

For a table to be in First Normal Form, it should follow these 4 rules:
⚫ It should have only single (atomic) valued attributes/columns.
⚫ Values stored in a column should be of the same domain.
⚫ All the columns in a table should have unique names.
⚫ The order in which data is stored does not matter.
⚫ The Student table can now be decomposed, as it contains redundant data.

⚫ A relation R is said to be in 1NF if, for each tuple in R, each attribute holds a single, non-composite (i.e. atomic) value.
⚫ Thus each field can store at most one value.
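Converting a multi-valued column into atomic values can be sketched in Python as below. It assumes, hypothetically, that the non-atomic column stores its values as a comma-separated string; `to_1nf`, the Phone column and the sample rows are illustrative, not part of the original example.

```python
from typing import Dict, List

Row = Dict[str, object]

def to_1nf(rows: List[Row], multi: str, sep: str = ",") -> List[Row]:
    """Emit one row per value of the multi-valued column `multi`, so every field is atomic."""
    flat: List[Row] = []
    for row in rows:
        for value in str(row[multi]).split(sep):
            new_row = dict(row)            # copy the single-valued columns unchanged
            new_row[multi] = value.strip()
            flat.append(new_row)
    return flat

# Hypothetical non-1NF Student rows: Phone holds a comma-separated list.
student = [
    {"Roll": 1, "Name": "Asha", "Phone": "555-0101, 555-0102"},
    {"Roll": 2, "Name": "Ravi", "Phone": "555-0103"},
]
for r in to_1nf(student, "Phone"):
    print(r)
# {'Roll': 1, 'Name': 'Asha', 'Phone': '555-0101'}
# {'Roll': 1, 'Name': 'Asha', 'Phone': '555-0102'}
# {'Roll': 2, 'Name': 'Ravi', 'Phone': '555-0103'}
```

The result is in 1NF, but Roll and Name now repeat for every phone number; that redundancy is what decomposition into the higher normal forms removes.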
➢ Second Normal Form (2NF):
⚫ A relation is said to be in 2NF if
➢ it is in 1NF, and
➢ every non-prime attribute is fully functionally dependent on the key (i.e. there is no partial dependency on any candidate key).

➢ A non-prime attribute of a relation is an attribute that is not part of any candidate key of the relation.


EmpNo  ProjectNo  Total_hours  Empname  ProjectName    ProjectLoc
10     101        5            Smith    USsteel        Gurgaon
10     102        3            Smith    GE healthcare  New Delhi
20     101        7            Bob      USsteel        Gurgaon
20     103        1            Bob      Odisha Mining  Odisha
30     103        8            Smith    Odisha Mining  Odisha
40     102        6            Jack     GE healthcare  New Delhi
40     103        2            Jack     Odisha Mining  Odisha

(Table not in 2NF)

The key is {EmpNo, ProjectNo}, but Empname depends only on EmpNo, and ProjectName and ProjectLoc depend only on ProjectNo (partial dependencies).

➢ The table needs to be decomposed:

[Tables in 2NF]
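The partial dependencies that break 2NF can be found mechanically with attribute closure. The Python sketch below assumes the FDs EmpNo → Empname, ProjectNo → ProjectName, ProjectLoc and {EmpNo, ProjectNo} → Total_hours, which are consistent with the sample rows above but are stated here as assumptions; the helper names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (LHS, RHS)

def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def partial_dependencies(key: Set[str], non_prime: Set[str],
                         fds: List[FD]) -> List[Tuple[Tuple[str, ...], List[str]]]:
    """List each proper subset of `key` together with the non-prime attributes it determines."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_prime
            if determined:
                found.append((subset, sorted(determined)))
    return found

# Assumed FDs for the EmpNo/ProjectNo table above.
fds = [
    (frozenset({"EmpNo"}), frozenset({"Empname"})),
    (frozenset({"ProjectNo"}), frozenset({"ProjectName", "ProjectLoc"})),
    (frozenset({"EmpNo", "ProjectNo"}), frozenset({"Total_hours"})),
]
key = {"EmpNo", "ProjectNo"}
non_prime = {"Total_hours", "Empname", "ProjectName", "ProjectLoc"}
for subset, attrs in partial_dependencies(key, non_prime, fds):
    print(subset, "->", attrs)
# ('EmpNo',) -> ['Empname']
# ('ProjectNo',) -> ['ProjectLoc', 'ProjectName']
```

One common way to reach 2NF is to give each reported subset its own table, e.g. (EmpNo, Empname) and (ProjectNo, ProjectName, ProjectLoc), and keep (EmpNo, ProjectNo, Total_hours) for the attribute that depends on the whole key.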
Third Normal Form (3NF)
➢ A relation is said to be in 3NF if it is in 2NF and no non-key attribute is transitively dependent on a key through some other non-key attribute.
➢ In other words, remove all transitive dependencies.

EmpID  EmpName  Address  DeptNo  DeptName
10     Smith    BBSR     10      Finance
20     Jack     CTC      20      Admin
30     Bob      RKL      20      Admin
40     Jack     BBSR     10      Finance
50     Henry    RKL      10      Finance

Table not in 3NF (EmpID → DeptNo and DeptNo → DeptName, so DeptName depends on EmpID only transitively)


Tables after decomposition are as follows:
Employee
EmpID EmpName Address DeptNo
10 Smith BBSR 10
20 Jack CTC 20
30 Bob RKL 20
40 Jack BBSR 10
50 Henry RKL 10

Department
DeptNo DeptName
10 Finance
20 Admin

A table is said to be in Third Normal Form when:
⚫ it is in Second Normal Form, and
⚫ it doesn't have any transitive dependency (i.e. a non-prime attribute should not determine another non-prime attribute).
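The 3NF decomposition of the employee table can be sketched in SQL through Python’s `sqlite3` module. The combined table name `EmpDept` is hypothetical; the rows are the ones from the example above. The sketch projects the table on both sides of DeptNo → DeptName and checks that joining the projections back on DeptNo returns the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE EmpDept (EmpID INTEGER PRIMARY KEY, EmpName TEXT, Address TEXT, "
    "DeptNo INTEGER, DeptName TEXT)"
)
cur.executemany(
    "INSERT INTO EmpDept VALUES (?, ?, ?, ?, ?)",
    [(10, "Smith", "BBSR", 10, "Finance"), (20, "Jack", "CTC", 20, "Admin"),
     (30, "Bob", "RKL", 20, "Admin"), (40, "Jack", "BBSR", 10, "Finance"),
     (50, "Henry", "RKL", 10, "Finance")],
)

# Split on the transitive dependency DeptNo -> DeptName.
cur.execute("CREATE TABLE Employee AS SELECT EmpID, EmpName, Address, DeptNo FROM EmpDept")
cur.execute("CREATE TABLE Department AS SELECT DISTINCT DeptNo, DeptName FROM EmpDept")

dept = cur.execute("SELECT DeptNo, DeptName FROM Department ORDER BY DeptNo").fetchall()
print(dept)  # [(10, 'Finance'), (20, 'Admin')]

# Joining the two projections on DeptNo should give back exactly the original rows.
rejoined = cur.execute(
    "SELECT e.EmpID, e.EmpName, e.Address, e.DeptNo, d.DeptName "
    "FROM Employee AS e JOIN Department AS d ON e.DeptNo = d.DeptNo ORDER BY e.EmpID"
).fetchall()
original = cur.execute("SELECT * FROM EmpDept ORDER BY EmpID").fetchall()
print(rejoined == original)  # True
conn.close()
```

In SQLite, `CREATE TABLE ... AS SELECT` does not copy primary keys or other constraints, so in a real schema DeptNo would be declared as the primary key of Department and as a foreign key in Employee.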
Boyce-Codd Normal Form (BCNF)
⚫ An advanced form of 3NF, also called 3.5NF.
⚫ A relation is said to be in BCNF if it is already in 3NF and the left-hand side of every non-trivial functional dependency is a superkey (i.e. it contains a candidate key).
⚫ In particular, if A → B, then A cannot be a non-prime attribute while B is a prime attribute.
⚫ The constituent attributes of the candidate keys are called prime attributes.
⚫ An attribute that does not occur in ANY candidate key is called a non-prime attribute.

⚫ Till now we have seen:

➢ Prime attribute → Non-prime attribute (Functional Dependency)

➢ Non-prime attribute → Non-prime attribute (Transitive Dependency)

➢ Part of primary key → Non-prime attribute (Partial Dependency)

➢ If Non-prime attribute → Prime attribute (BCNF doesn’t hold good)


➢ Example:
StudentEnrollment
Roll  Subject  ProfessorId  Professor
10    C        101          Mr. Smith
10    Java     102          Mr. Jack
20    C++      103          Mr. Henry
20    Java     102          Mr. Jack
30    C        104          Mr. Bob
40    Java     105          Mr. Kenny

The primary key is {Roll, Subject}, as (Roll, Subject) → ProfessorId, Professor.

Now, as one professor teaches only one subject, ProfessorId → Subject.

The table above does not satisfy BCNF.

This is because “ProfessorId” is a non-prime attribute, whereas “Subject” is part of the key and hence a prime attribute. So BCNF doesn’t hold good, as the LHS (ProfessorId) is not a superkey.
➢ Let’s decompose the table:
Student
Roll ProfessorId
10 101
10 102
20 103
20 102
30 104
40 105

Professor
ProfessorId Professor Subject
101 Mr. Smith C
102 Mr. Jack Java
103 Mr. Henry C++
104 Mr. Bob C
105 Mr. Kenny Java
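A BCNF test follows directly from the definition: for every non-trivial FD, compute the closure of its LHS and check whether it covers the whole relation (i.e. whether the LHS is a superkey). The Python sketch below applies this to StudentEnrollment. It assumes ProfessorId also determines Professor, which the sample rows suggest but the text does not state, and the helper names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (LHS, RHS)

def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def bcnf_violations(schema: Set[str], fds: List[FD]) -> List[FD]:
    """Return the non-trivial FDs whose LHS is not a superkey of `schema`."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not rhs <= lhs                           # skip trivial FDs
        and not set(schema) <= closure(lhs, fds)    # LHS does not determine every attribute
    ]

enrollment = {"Roll", "Subject", "ProfessorId", "Professor"}
fds = [
    (frozenset({"Roll", "Subject"}), frozenset({"ProfessorId", "Professor"})),
    (frozenset({"ProfessorId"}), frozenset({"Subject", "Professor"})),  # assumed
]
for lhs, rhs in bcnf_violations(enrollment, fds):
    print(sorted(lhs), "->", sorted(rhs))
# ['ProfessorId'] -> ['Professor', 'Subject']
```

Testing only the FDs given for a schema is enough to decide whether that schema is in BCNF; for the decomposed tables, the FDs projected onto each table would have to be checked instead.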
➢ Example 2
ProfessorCode  Department   HeadOfDepartment  PercentTime
P1             Physics      Ghosh             50
P1             Mathematics  Krishnan          50
P2             Chemistry    Rao               25
P2             Physics      Ghosh             75
P3             Mathematics  Krishnan          100

➢ The relation diagram for the above relation is given as follows:
➢ Here Department → HeadOfDepartment, but Department alone is not a superkey (ProfessorCode is also needed to identify a row), so the relation is not in BCNF and needs to be decomposed.
ProfessorCode  Department   PercentTime
P1             Physics      50
P1             Mathematics  50
P2             Chemistry    25
P2             Physics      75
P3             Mathematics  100

Department   HeadOfDepartment
Physics      Ghosh
Mathematics  Krishnan
Chemistry    Rao
Fourth Normal Form (4NF)
➢ It should satisfy BCNF.

➢ It should not have a (non-trivial) multi-valued dependency.

A table is said to have a multi-valued dependency if the following conditions are true:
➢ For a dependency A ↠ B (read “A multi-determines B”), if for a single value of A multiple values of B exist, then the table may have a multi-valued dependency.

➢ Also, a table should have at least 3 columns for it to have a multi-valued dependency.

➢ And, for a relation R(A,B,C), if there is a multi-valued dependency between A and B, then B and C should be independent of each other.
Example of multivalued dependency:

ROLL  SUBJECTS   HOBBY
1     Physics    Cricket
1     Math       Badminton
2     Chemistry  Hockey
2     Biology    Cricket

Because SUBJECTS and HOBBY are independent of each other, every subject of a student has to be paired with every hobby of that student. So the two records for the student with Roll 1 give rise to two more records, as shown below:
ROLL  SUBJECTS  HOBBY
1     Physics   Cricket
1     Physics   Badminton
1     Math      Cricket
1     Math      Badminton

So there is a multi-valued dependency, which leads to unnecessary repetition of data and other anomalies too.
➢ To make the above relation satisfy Fourth Normal Form, we can decompose the table into 2 tables.
CourseOpted
ROLL SUBJECTS
1 Physics
1 Math
2 Chemistry
2 Biology

Hobbies
ROLL HOBBY
1 Cricket
1 Badminton
2 Hockey
2 Cricket
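Whether a multi-valued dependency such as ROLL ↠ SUBJECTS holds in a table instance can be checked by grouping rows on ROLL and confirming that each group contains every combination of that roll’s subjects and hobbies. The Python sketch below uses the rows for Roll 1 from the ROLL/SUBJECTS/HOBBY example above; `mvd_holds` and its column-position parameters are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Row = Tuple[str, ...]

def mvd_holds(rows: List[Row], a: int, b: int, c: int) -> bool:
    """Instance check of A ->> B for R(A, B, C), with A, B, C given as column positions.

    For each A value, the (B, C) pairs present must be exactly all combinations
    of that value's B values with its C values.
    """
    groups: Dict[str, Set[Tuple[str, str]]] = {}
    for row in rows:
        groups.setdefault(row[a], set()).add((row[b], row[c]))
    for pairs in groups.values():
        b_values = {p[0] for p in pairs}
        c_values = {p[1] for p in pairs}
        if pairs != set(product(b_values, c_values)):
            return False
    return True

# Rows for Roll 1 (ROLL, SUBJECTS, HOBBY).
original = [("1", "Physics", "Cricket"), ("1", "Math", "Badminton")]
expanded = [
    ("1", "Physics", "Cricket"), ("1", "Physics", "Badminton"),
    ("1", "Math", "Cricket"), ("1", "Math", "Badminton"),
]
print(mvd_holds(original, 0, 1, 2))  # False
print(mvd_holds(expanded, 0, 1, 2))  # True  (ROLL ->> SUBJECTS)
print(mvd_holds(expanded, 0, 2, 1))  # True  (ROLL ->> HOBBY)
```

Once the dependency holds, the CourseOpted and Hobbies tables store each subject and each hobby once per roll instead of once per combination.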
➢ In the case of a table with a functional dependency along with multi-valued dependencies, the functionally dependent columns are moved to a separate table, and each set of multi-valued dependent columns is moved to its own separate table.

Example:
ROLL  ADDRESS  SUBJECTS  HOBBY

In the above table:

ROLL → ADDRESS (Functional Dependency)

ROLL ↠ SUBJECTS (Multivalued Dependency)

ROLL ↠ HOBBY (Multivalued Dependency)

The new decomposed tables are as follows:

(ROLL, ADDRESS)    (ROLL, SUBJECTS)    (ROLL, HOBBY)
Fifth Normal Form (5NF)
⚫ It should be in 4NF.

⚫ It should not have a non-trivial join dependency (one not implied by its candidate keys), and joining should be lossless.

⚫ A join dependency is said to exist if the join of R1 and R2 over C is equal to R, where R1(A,B,C) and R2(C,D) are decompositions of a relation R(A,B,C,D).

⚫ 5NF is also known as Project-Join Normal Form (PJNF).


➢ Let’s take an example:

SPC table

SUPPLIER  PRODUCT  CUSTOMER
Smith     Gear     Hundai
Smith     Gear     Ford
Smith     Switch   Maruti
Jack      Gear     Maruti
Bob       Clause   Hundai
Bob       Gear     Hundai
Smith     Switch   Hundai


➢ How to check if the relation is in 5NF?
⚫ The table should satisfy 4NF.
⚫ If a join dependency exists, then decompose the table.

ER diagram:

Conclusions from the above ER diagram:

SUPPLIER  PRODUCT  CUSTOMER
Smith     Gear     Hundai
Smith     Gear     Ford
Smith     Switch   Maruti
Jack      Gear     Maruti
Bob       Clause   Hundai
Bob       Gear     Hundai
Smith     Switch   Hundai

SUPPLIER  CUSTOMER
Smith     Hundai

SUPPLIER  PRODUCT
Smith     Switch

CUSTOMER  PRODUCT
Hundai    Switch
➢ If a relation has a join dependency, then it can be divided into smaller relations such that combining (joining) the smaller relations gives back the original table.
➢ If a join dependency doesn’t exist, then either data is lost or new (spurious) entries are created.

Two properties of decomposition:


⚫ The lossless join or non-additive join property:
It guarantees that spurious tuples are not generated when the relation schemas created by the decomposition are joined back together.

⚫ The dependency preservation property:
It ensures that each functional dependency is represented in some individual relation resulting from the decomposition.
➢ Example 2:
AGENT  COMPANY  PRODUCT
Vivek  ABC      Laptop
Vivek  ABC      Modem
Vivek  XYZ      Desktop
Swati  ABC      Desktop

AGENT  PRODUCT
Vivek  Laptop
Vivek  Modem
Vivek  Desktop
Swati  Desktop

AGENT  COMPANY
Vivek  ABC
Vivek  XYZ
Swati  ABC

Joining the two projections above on AGENT:
AGENT  COMPANY  PRODUCT
Vivek  ABC      Laptop
Vivek  ABC      Modem
Vivek  ABC      Desktop   **(spurious row)
Vivek  XYZ      Laptop    **(spurious row)
Vivek  XYZ      Modem     **(spurious row)
Vivek  XYZ      Desktop
Swati  ABC      Desktop

COMPANY  PRODUCT
ABC      Laptop
ABC      Modem
ABC      Desktop
XYZ      Desktop

P1 * P2 * P3

AGENT  COMPANY  PRODUCT
Vivek  ABC      Laptop
Vivek  ABC      Modem
Vivek  ABC      Desktop   **(spurious row)
Vivek  XYZ      Desktop
Swati  ABC      Desktop
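The joins in this example can be recomputed with Python sets as a check on the tables above. This sketch represents each relation as a set of tuples, so duplicate rows disappear as they do in the relational model; the variable names are illustrative.

```python
from typing import Set, Tuple

Row = Tuple[str, str, str]  # (AGENT, COMPANY, PRODUCT)

acp: Set[Row] = {
    ("Vivek", "ABC", "Laptop"), ("Vivek", "ABC", "Modem"),
    ("Vivek", "XYZ", "Desktop"), ("Swati", "ABC", "Desktop"),
}

# The three two-column projections.
agent_company = {(a, c) for a, c, _ in acp}
agent_product = {(a, p) for a, _, p in acp}
company_product = {(c, p) for _, c, p in acp}

# Natural join of (AGENT, COMPANY) and (AGENT, PRODUCT) on AGENT.
join2 = {(a, c, p) for a, c in agent_company for a2, p in agent_product if a == a2}
# Joining with (COMPANY, PRODUCT) keeps only rows whose pair appears in that projection.
join3 = {(a, c, p) for a, c, p in join2 if (c, p) in company_product}

print(len(join2), sorted(join2 - acp))
# 7 [('Vivek', 'ABC', 'Desktop'), ('Vivek', 'XYZ', 'Laptop'), ('Vivek', 'XYZ', 'Modem')]
print(len(join3), sorted(join3 - acp))
# 5 [('Vivek', 'ABC', 'Desktop')]
```

Both joins contain rows that were not in the original table, so this relation cannot be decomposed losslessly into these projections.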
Now consider a different case:
➢ P1 * P2

➢ P1 * P2 * P3
Join dependency: A join dependency is a further generalization of multivalued dependencies.

If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency (JD) exists, where R1(A, B, C) and R2(C, D) are a decomposition of a given relation R(A, B, C, D).

Alternatively, R1 and R2 are a lossless decomposition of R.

Conclusion regarding 5NF:


➢ If decomposing a table into smaller tables leads to some loss of information, or to some additional (spurious) information being created, then we should not decompose it.

➢ But if breaking down the table doesn’t lead to information loss, and all the facts about the data can still be verified from the decomposed tables, then we must decompose the relation.
➢ For a relation R(A,B,C), if there are multi-valued dependencies between A and B and between A and C, where B and C are independent of each other, then Fourth Normal Form is applied.

➢ For a relation R(A,B,C), if there are multi-valued dependencies between A and B and between A and C, where B and C are interlinked by some restriction, then one extra table holding that restriction will be created.
Types of decomposition:

Lossless Decomposition
⚫ If no information is lost from the relation that is decomposed, then the decomposition is lossless.
⚫ A lossless decomposition guarantees that the join of the decomposed relations results in the same relation that was decomposed.
⚫ A decomposition is said to be lossless if the natural join of all the decomposed relations gives back the original relation.
Example of lossless join:

After decomposition:
Example of lossy join:

Find the value of C if the value of A is 1.

Spurious Tuples
Rule for decomposing the relation:
➢ The common attribute(s) should be a candidate key or superkey of either R1 or R2 or both.
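For a decomposition of R into two relations R1 and R2, the rule above is usually checked with attribute closure: when only functional dependencies are considered, the decomposition is lossless exactly when the closure of the common attributes R1 ∩ R2 contains all of R1 or all of R2. The Python sketch below assumes the FDs EmpID → EmpName, Address, DeptNo and DeptNo → DeptName for the 3NF employee example earlier in this section; the helper names and the second, lossy split are hypothetical illustrations.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (LHS, RHS)

def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def lossless_binary(r1: Set[str], r2: Set[str], fds: List[FD]) -> bool:
    """Binary lossless-join test: the closure of R1 ∩ R2 must contain R1 or R2."""
    common_plus = closure(r1 & r2, fds)
    return r1 <= common_plus or r2 <= common_plus

fds = [
    (frozenset({"EmpID"}), frozenset({"EmpName", "Address", "DeptNo"})),
    (frozenset({"DeptNo"}), frozenset({"DeptName"})),
]
employee = {"EmpID", "EmpName", "Address", "DeptNo"}
department = {"DeptNo", "DeptName"}
print(lossless_binary(employee, department, fds))  # True: DeptNo is a key of Department

# A hypothetical split on the non-key attribute Address instead:
r1 = {"EmpID", "EmpName", "Address"}
r2 = {"Address", "DeptNo", "DeptName"}
print(lossless_binary(r1, r2, fds))  # False: Address determines neither side
```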
