Unit 3
RELATIONAL DATABASE DESIGN
1 Functional dependency
Consider a relation schema R, and let α ⊆ R and β ⊆ R. The functional dependency
α → β holds on relation schema R if, in any legal relation r(R), for all pairs of tuples t1
and t2 in r such that t1[α] = t2[α], it is also the case that t1[β] = t2[β].
Example: Consider the following instance r of the relation schema R = (A, B, C, D):
A   B   C   D
a1  b1  c1  d1
a1  b2  c1  d2
a2  b2  c2  d2
a2  b2  c2  d3
a3  b3  c2  d4
Solution: Observe that A → C is satisfied. There are two tuples that have an A value
of a1, and these tuples have the same C value, namely c1. Similarly, the two tuples with an
A value of a2 have the same C value, c2. There are no other pairs of distinct tuples that
have the same A value. The functional dependency C → A is not satisfied, however. To
see that it is not, consider the tuples t1 = (a2, b2, c2, d3) and t2 = (a3, b3, c2, d4). These
two tuples have the same C value, c2, but they have different A values, a2 and a3,
respectively. Thus, we have found a pair of tuples t1 and t2 such that t1[C] = t2[C], but
t1[A] ≠ t2[A].
Some other functional dependencies satisfied by r are the following:-
AB → C, D → B, BC → A, CD → A, CD → B, AD → B, AD → C.
Some trivial functional dependencies (α → β is trivial if β ⊆ α; such dependencies are satisfied by every relation) are the following:- ABC → C, CD → C, A → A.
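The definition can be checked mechanically on a given instance. The following is a minimal Python sketch, assuming the instance is a list of dicts keyed by single-letter attribute names, so a string such as "AB" stands for {A, B}; the name `fd_holds` is hypothetical. Keep in mind that an instance can only show that an FD is violated: the fact that r satisfies an FD does not prove that the FD holds on every legal relation of R.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of tuples that agrees on lhs also agrees on rhs."""
    seen = {}  # lhs values -> rhs values of the first tuple seen with them
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# The instance r(A, B, C, D) from the example; single-letter attribute
# names, so a string such as "AB" stands for the attribute set {A, B}.
r = [
    dict(zip("ABCD", values))
    for values in [
        ("a1", "b1", "c1", "d1"),
        ("a1", "b2", "c1", "d2"),
        ("a2", "b2", "c2", "d2"),
        ("a2", "b2", "c2", "d3"),
        ("a3", "b3", "c2", "d4"),
    ]
]

print(fd_holds(r, "A", "C"))  # True
print(fd_holds(r, "C", "A"))  # False
```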
1.4 Closure of attribute sets
Consider relation schema R and a set of functional dependencies F. Let α ⊆ R.
The closure of α under F, denoted α+, is the set of all attributes of R that are logically
determined by α under F.
The closure of α is computed by the following algorithm:-
Input: α and F. Output: α+ = result.
    result ← α
    repeat
        for each functional dependency β → γ in F do
            if β ⊆ result then result ← result ∪ γ
    until (result does not change)
Figure: An algorithm to compute α+, the closure of α under F
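The same fixpoint loop can be written directly in Python. This is a minimal sketch, assuming attributes are single-letter names (so "CG" stands for {C, G}) and F is a list of (left side, right side) pairs; `attribute_closure` is a hypothetical name. It uses the F of the example that follows.

```python
def attribute_closure(alpha, fds):
    """Compute alpha+ under fds with the fixpoint loop of the algorithm above.

    alpha is an iterable of attribute names; fds is a list of (beta, gamma)
    pairs whose sides are iterables of attribute names.
    """
    result = set(alpha)                      # result <- alpha
    changed = True
    while changed:                           # repeat ... until result does not change
        changed = False
        for beta, gamma in fds:
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)         # result <- result ∪ gamma
                changed = True
    return result


# F from the example that follows, written with single-letter attributes.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(attribute_closure("AG", F))))  # ABCGHI
print("".join(sorted(attribute_closure("CG", F))))  # CGHI
print("".join(sorted(attribute_closure("A", F))))   # ABCH
```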
Example: Consider relation schema R = (A, B, C, G, H, I) and the set F of functional
dependencies A → B, A → C, CG → H, CG → I, B → H. Compute the closure of {A,G},
{C,G} and {A}.
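Solution:
{A, G}+ = {A, G}
= {A, B, C, G} (by A → B and A → C)
= {A, B, C, G, H, I} (by CG → H and CG → I)
Therefore, {A, G}+ = {A, B, C, G, H, I}
{C, G}+ = {C, G}
= {C, G, H, I} (by CG → H and CG → I)
Therefore, {C, G}+ = {C, G, H, I}
{A}+ = {A}
= {A, B, C} (by A → B and A → C)
= {A, B, C, H} (by B → H)
Therefore, {A}+ = {A, B, C, H}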
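Uses or applications of attribute closure
There are several uses of the attribute closure:
• To test whether α is a superkey: compute α+ and check whether it contains all attributes of R.
• To test whether a functional dependency α → β holds (that is, is in F+): check whether β ⊆ α+.
• To compute F+: for each γ ⊆ R, compute γ+, and for each S ⊆ γ+, output the functional dependency γ → S.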
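Testing whether α is a superkey and testing whether F implies α → β both reduce to one closure computation. A minimal Python sketch, assuming single-letter attribute names and FDs as pairs of strings; `is_superkey` and `implies` are hypothetical helper names, and the data is the R and F of the closure example above:

```python
def closure(alpha, fds):
    """alpha+ under fds (same fixpoint loop as the closure algorithm)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)
                changed = True
    return result


def is_superkey(alpha, schema, fds):
    """alpha is a superkey of schema iff alpha+ contains every attribute of schema."""
    return set(schema) <= closure(alpha, fds)


def implies(fds, lhs, rhs):
    """F implies lhs -> rhs (the FD is in F+) iff rhs ⊆ lhs+ under F."""
    return set(rhs) <= closure(lhs, fds)


# R and F from the closure example above, with single-letter attributes.
R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(is_superkey("AG", R, F))  # True
print(is_superkey("A", R, F))   # False
print(implies(F, "CG", "HI"))   # True
```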
Eliminate A from the FD AB → C. We get B → C. Clearly, B → C cannot be derived from
F, therefore A is not an extraneous attribute.
Now, we check whether B is an extraneous attribute or not.
Eliminate B from the FD AB → C. We get A → C. Clearly, A → C can be derived from F,
therefore B is an extraneous attribute.
Consider the functional dependency A → C. In this dependency, the left-hand and right-hand
sides each contain a single attribute, therefore this FD has no extraneous attribute.
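The two eliminations above are instances of the closure-based tests: an attribute A in α is extraneous in α → β if β ⊆ (α − A)+ under F, and an attribute A in β is extraneous if A ∈ α+ under F' = (F − {α → β}) ∪ {α → (β − A)}. A minimal Python sketch, assuming F = {AB → C, A → C}, which is consistent with the steps shown; the function names are hypothetical:

```python
def closure(alpha, fds):
    """alpha+ under fds."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)
                changed = True
    return result


def extraneous_in_lhs(fds, fd, attr):
    """attr is extraneous in the left side of fd if rhs ⊆ (lhs - {attr})+ under fds."""
    lhs, rhs = fd
    return set(rhs) <= closure(set(lhs) - {attr}, fds)


def extraneous_in_rhs(fds, fd, attr):
    """attr is extraneous in the right side of fd if attr ∈ lhs+ under
    F' = (F - {fd}) ∪ {lhs -> (rhs - {attr})}."""
    lhs, rhs = fd
    reduced = [f for f in fds if f != fd] + [(lhs, set(rhs) - {attr})]
    return attr in closure(lhs, reduced)


# Assumed F = {AB -> C, A -> C}, consistent with the steps worked above.
F = [("AB", "C"), ("A", "C")]
print(extraneous_in_lhs(F, ("AB", "C"), "A"))  # False: B+ = {B}
print(extraneous_in_lhs(F, ("AB", "C"), "B"))  # True:  A+ = {A, C}
print(extraneous_in_rhs(F, ("A", "C"), "C"))   # False
```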
Canonical Cover:
Canonical cover is defined for a set F of functional dependencies.
A canonical cover of F, denoted Fc, is a minimal set of functional dependencies equivalent to F:
it contains no extraneous attribute and no redundant FD, and each left side of a functional
dependency in Fc is unique.
A canonical cover for a set of functional dependencies F can be computed by the following
algorithm.
Input: F. Output: Fc.
    Fc ← F
    repeat
        Use the union rule to replace any dependencies in Fc of the form
            α1 → β1 and α1 → β2 with α1 → β1β2.
        Find a functional dependency α → β in Fc with an extraneous attribute
            either in α or in β.
            /* Note: the test for extraneous attributes is done using Fc, not F */
        If an extraneous attribute is found, delete it from α → β.
    until (Fc does not change any further)
Figure: Computing canonical cover
Example: Consider the following set F of functional dependencies on relation schema
R = (A,B,C):
A → BC
B→C
A→B
AB → C
Compute the canonical cover for F.
Solution:
• There are two functional dependencies with the same set of attributes on the left
side of the arrow:
A → BC, A → B
We combine these functional dependencies using the union rule into A → BC.
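The union step is the first pass of the canonical-cover loop. The following minimal Python sketch runs the whole loop on this F. It assumes single-letter attribute names, uses the hypothetical helper `fd` to build dependencies, and removes one extraneous attribute per pass, so it may visit the dependencies in a different order than a hand computation. Canonical covers are not unique in general; for this F the sketch ends with Fc = {A → B, B → C}.

```python
def closure(alpha, fds):
    """alpha+ under fds; alpha and both sides of each FD are frozensets."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def union_rule(fds):
    """Replace a1 -> b1 and a1 -> b2 by a1 -> b1 b2; drop empty right sides."""
    merged = {}
    for lhs, rhs in fds:
        merged[lhs] = merged.get(lhs, frozenset()) | rhs
    return [(lhs, rhs) for lhs, rhs in merged.items() if rhs]


def remove_one_extraneous(fc):
    """Return fc with one extraneous attribute deleted, or None if there is none.
    As in the algorithm, the tests use fc, not the original F."""
    for i, (lhs, rhs) in enumerate(fc):
        rest = fc[:i] + fc[i + 1:]
        if len(lhs) > 1:
            for a in sorted(lhs):
                if rhs <= closure(lhs - {a}, fc):       # a is extraneous in lhs
                    return rest + [(lhs - {a}, rhs)]
        for a in sorted(rhs):
            reduced = rest + [(lhs, rhs - {a})]
            if a in closure(lhs, reduced):              # a is extraneous in rhs
                return reduced
    return None


def canonical_cover(fds):
    fc = union_rule(fds)
    while True:
        smaller = remove_one_extraneous(fc)
        if smaller is None:            # Fc does not change any further
            return fc
        fc = union_rule(smaller)


def fd(lhs, rhs):
    """Hypothetical helper: build an FD from strings of single-letter attributes."""
    return frozenset(lhs), frozenset(rhs)


F = [fd("A", "BC"), fd("B", "C"), fd("A", "B"), fd("AB", "C")]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

Deleting the last attribute from a right side leaves an empty dependency, which `union_rule` drops; this is how a redundant FD disappears from Fc in this sketch.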
1.6 Decomposition
Let R be a relation schema. A set of relation schemas {R1 , R2 , ..., Rn } is a decomposition
of R if
R = R1 ∪ R2 ∪ ⋯ ∪ Rn
That is, {R1 , R2 , ..., Rn } is a decomposition of R if, for i = 1, 2, . . . , n, each Ri is a
subset of R, and every attribute in R appears in at least one Ri .
Let r be a relation on schema R, and let ri = ΠRi (r) for i = 1, 2, . . . , n. That is,
{r1 , r2 , ..., rn } is the database that results from decomposing R into {R1 , R2 , ..., Rn }. It
is always the case that
r ⊆ r1 ⋈ r2 ⋈ ⋯ ⋈ rn.
A decomposition {R1 , R2 , ..., Rn } of R is a lossless-join decomposition if, for all relations
r on schema R,
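r = ΠR1(r) ⋈ ΠR2(r) ⋈ ⋯ ⋈ ΠRn(r)    (1)
If equation (1) is not satisfied, then the decomposition is said to be a lossy decomposition.
Let F be the set of functional dependencies on R. A decomposition of R into two schemas R1
and R2 is lossless if at least one of the following functional dependencies is in F+:
• R1 ∩ R2 → R1
• R1 ∩ R2 → R2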
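For a decomposition into two schemas, the test above needs one closure: compute (R1 ∩ R2)+ and check whether it contains all of R1 or all of R2. A minimal Python sketch, with attribute names as plain strings; `lossless_binary` is a hypothetical name, and the input is the Banker-info schema and F from the 3NF example later in this unit. Decompositions into more than two schemas can be checked with the matrix test described later.

```python
def closure(alpha, fds):
    """alpha+ under fds; attributes are plain strings."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """{R1, R2} is lossless if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


# Banker-info and its F from the 3NF example later in this unit.
F = [
    ({"banker-name"}, {"branch-name", "office-number"}),
    ({"customer-name", "branch-name"}, {"banker-name"}),
]
R1 = {"banker-name", "branch-name", "office-number"}
R2 = {"customer-name", "branch-name", "banker-name"}
print(lossless_binary(R1, R2, F))  # True: R1 ∩ R2 -> R1

# A hypothetical decomposition of the same schema that fails the test.
print(lossless_binary({"customer-name", "office-number"}, R2, F))  # False
```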
Dependency Preservation
Let F be a set of functional dependencies on a schema R, and let R1, R2, ..., Rn be a
decomposition of R. Let F1, F2, ..., Fn be the sets of dependencies corresponding to
R1, R2, ..., Rn, where Fi contains the dependencies in F+ that involve only attributes of Ri.
Let F' = F1 ∪ F2 ∪ ... ∪ Fn.
If F' = F, then the decomposition is dependency preserving.
If F' ≠ F, then the decomposition may or may not be dependency preserving.
In this case, compute F+ and F'+. If F+ = F'+, then the decomposition is dependency
preserving; otherwise it is not.
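Computing F+ and F'+ explicitly can take time exponential in the number of attributes. A commonly used alternative, swapped in here rather than taken from the text above, checks each α → β in F separately: start with result = α, and repeatedly add (result ∩ Ri)+ ∩ Ri for every Ri until nothing changes; α → β is preserved iff β ⊆ result. The decomposition is dependency preserving iff every FD in F passes. A minimal Python sketch on a hypothetical R = (A, B, C) with F = {A → B, B → C}; the function names are hypothetical:

```python
def closure(alpha, fds):
    """alpha+ under fds (attribute sets given as strings of single letters)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)
                changed = True
    return result


def preserves(fd, decomposition, fds):
    """Check one FD alpha -> beta without computing F+."""
    alpha, beta = fd
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            t = closure(result & set(ri), fds) & set(ri)   # ((result ∩ Ri)+) ∩ Ri
            if not t <= result:
                result |= t
                changed = True
    return set(beta) <= result


def dependency_preserving(decomposition, fds):
    return all(preserves(f, decomposition, fds) for f in fds)


F = [("A", "B"), ("B", "C")]                    # hypothetical R = (A, B, C)
print(dependency_preserving(["AB", "BC"], F))  # True
print(dependency_preserving(["AB", "AC"], F))  # False: B -> C is lost
```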
2 Normalization
Normalization is a database design technique that reduces data redundancy and eliminates
undesirable characteristics like insertion, update and deletion anomalies. Normalization
rules divide larger tables into smaller tables and link them using relationships.
The purpose of normalization is to eliminate redundant (repetitive) data and to ensure that
data is stored logically.
Insertion anomaly: Suppose the organization has started a new department but initially
no employee has been appointed to it. Then the tuple for this department cannot be
inserted into this table (Table 1), because its E# would be NULL, which is not allowed
since E# is the primary key.
This kind of problem, where some tuple cannot be inserted into the relation, is known as
an insertion anomaly.
Deletion anomaly:
Now suppose there is only one employee in some department and that employee leaves
the organization. Then the tuple of that employee has to be deleted from the table, but
in addition, the information about the department will also be deleted.
E#         Ename       Address    D#  Dname           Dmgr#
123456789  Akhilesh    Ghaziabad  5   Research        333445555
333445555  Ajay        Kanpur     5   Research        333445555
999887777  Shreya      Lucknow    4   Administration  987654321
987654321  Sanjay      Mirjapur   4   Administration  987654321
666884444  Om Prakash  Lucknow    5   Research        333445555
453453453  Manish      Delhi      5   Research        333445555
987987987  Ishani      Prayagraj  4   Administration  987654321
888665555  Garvita     Prayagraj  1   Headquarters    888665555
Table 1: Emp-Dept
This kind of problem, where deleting some tuples leads to the loss of other data that was
not intended to be removed, is known as a deletion anomaly.
Modification/update anomaly:
Suppose the manager of a department changes. Then the Dmgr# in all the tuples
corresponding to that department must be changed to reflect the new status.
If we fail to update all the tuples of the department, two employees working in the
same department might show different Dmgr# values, leading to inconsistency in the
database.
This is known as a modification/update anomaly.
2.2 Normalization
To overcome these anomalies we need to normalize the data. In the next section we will
discuss different types of normal forms. These normal forms are:-
emp id  emp name  emp address  emp mobile
101     Herschel  New Delhi    8912312390
102     Jon       Kanpur       8812121212, 9900012222
103     Ron       Chennai      7778881212
104     Lester    Bangalore    9990000123, 8123450987
Table 2: Employee
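In Table 2 the emp mobile cell of employees 102 and 104 holds two numbers, so the column is not atomic, which 1NF rules out. One common fix is to store one row per (employee, mobile number) pair; the key of the resulting table is then {emp id, emp mobile}. A minimal Python sketch, assuming the multi-valued cell is stored as a comma-separated string; `to_1nf` is a hypothetical name:

```python
# Table 2, with the multi-valued emp mobile cell stored as a comma-separated string.
employee = [
    (101, "Herschel", "New Delhi", "8912312390"),
    (102, "Jon", "Kanpur", "8812121212, 9900012222"),
    (103, "Ron", "Chennai", "7778881212"),
    (104, "Lester", "Bangalore", "9990000123, 8123450987"),
]


def to_1nf(rows):
    """Return one row per (employee, mobile number) so every value is atomic."""
    flat = []
    for emp_id, name, address, mobiles in rows:
        for mobile in mobiles.split(","):
            flat.append((emp_id, name, address, mobile.strip()))
    return flat


for row in to_1nf(employee):
    print(row)
```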
Prime attribute: An attribute that is part of some candidate key is said to be a prime
attribute. An attribute that is not part of any candidate key is said to be a non-prime attribute.
Example: Consider a relation schema R = (A, B, C, D) with only one candidate key,
{A, B}. In this case, A and B are prime attributes, and C and D are non-prime attributes.
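Classifying attributes as prime or non-prime first requires the candidate keys. For small schemas these can be found by brute force with attribute closure: a candidate key is a minimal set whose closure is all of R. A minimal Python sketch, using the Student Performance schema from the 3NF example later in this unit; the function names are hypothetical and the search is exponential in the number of attributes:

```python
from itertools import combinations


def closure(alpha, fds):
    """alpha+ under fds; attributes are plain strings."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute-force search for minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(key <= cand for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if set(schema) <= closure(cand, fds):
                keys.append(cand)
    return keys


def prime_attributes(schema, fds):
    """Attributes that appear in at least one candidate key."""
    return set().union(*candidate_keys(schema, fds))


# Student Performance schema and FDs from the 3NF example later in this unit.
schema = {"name", "courseNo", "rollNo", "grade"}
F = [
    ({"name", "courseNo"}, {"grade"}),
    ({"rollNo", "courseNo"}, {"grade"}),
    ({"name"}, {"rollNo"}),
    ({"rollNo"}, {"name"}),
]
print(candidate_keys(schema, F))
print(sorted(prime_attributes(schema, F)))           # ['courseNo', 'name', 'rollNo']
print(sorted(schema - prime_attributes(schema, F)))  # ['grade']
```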
Y→V
WX → YZ
Is this relation in 2NF?
Solution:
Here the candidate keys are VW, WX and WY. Therefore, the only non-prime attribute is Z.
Clearly, Z is fully functionally dependent on each candidate key. Therefore, R is in 2NF.
Solution:
Here the only candidate key is A. Therefore, the non-prime attributes are B, C, D and E.
Since the candidate key contains a single attribute, no non-prime attribute can depend on a
proper part of it, therefore R is in 2NF.
Now, attributes D and E are transitively dependent on the candidate key A, therefore
R is not in 3NF.
Example: The relation schema Student Performance (name, courseNo, rollNo, grade)
has the following FDs:
name,courseNo → grade
rollNo,courseNo → grade
name → rollNo
rollNo → name
Is this relation in 3NF?
Solution:
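Here the candidate keys are {name, courseNo} and {rollNo, courseNo}. Therefore, the only
non-prime attribute is grade. Clearly, grade is fully functionally dependent on both candidate
keys, therefore this relation schema is in 2NF.
Now, grade is not transitively dependent on the candidate keys. The dependencies
name → rollNo and rollNo → name have left sides that are not superkeys, but their right
sides (rollNo and name) are prime attributes, which 3NF allows. Therefore, this relation
schema is in 3NF.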
Example: The relation schema Student Performance (name, courseNo, rollNo, grade)
has the following FDs:
name,courseNo → grade
rollNo,courseNo → grade
name → rollNo
rollNo → name
Is this relation in BCNF?
Solution:
Consider the FD name,courseNo → grade. Clearly, {name, courseNo} is a superkey,
therefore this FD satisfies the second criterion of BCNF.
Consider the FD rollNo,courseNo → grade. Clearly, {rollNo, courseNo} is a superkey,
therefore this FD satisfies the second criterion of BCNF.
Consider the FD name → rollNo. This functional dependency is not trivial, and name is not
a superkey (name+ = {name, rollNo}), so it satisfies neither condition of BCNF. Therefore,
this relation schema is not in BCNF.
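The 3NF and BCNF checks above can be combined in one sketch. It assumes the usual textbook shortcut that, when testing R itself, it is enough to examine the dependencies in F rather than all of F+. A non-trivial α → β violates BCNF unless α is a superkey; 3NF additionally accepts it when every attribute in β − α is prime. Function names are hypothetical; the data is the Student Performance example:

```python
from itertools import combinations


def closure(alpha, fds):
    """alpha+ under fds; attributes are plain strings."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute-force search for minimal sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and set(schema) <= closure(cand, fds):
                keys.append(cand)
    return keys


def violations(schema, fds, allow_prime):
    """FDs in fds that break BCNF (allow_prime=False) or 3NF (allow_prime=True)."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)
        if not extra:
            continue                      # trivial FD: allowed by both forms
        if set(schema) <= closure(lhs, fds):
            continue                      # left side is a superkey
        if allow_prime and extra <= prime:
            continue                      # 3NF: every attribute in beta - alpha is prime
        bad.append((lhs, rhs))
    return bad


def is_bcnf(schema, fds):
    return not violations(schema, fds, allow_prime=False)


def is_3nf(schema, fds):
    return not violations(schema, fds, allow_prime=True)


schema = {"name", "courseNo", "rollNo", "grade"}
F = [
    ({"name", "courseNo"}, {"grade"}),
    ({"rollNo", "courseNo"}, {"grade"}),
    ({"name"}, {"rollNo"}),
    ({"rollNo"}, {"name"}),
]
print(is_3nf(schema, F))   # True
print(is_bcnf(schema, F))  # False
print(violations(schema, F, allow_prime=False))
```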
Example: Consider the following relation schemas and their respective functional dependencies:
Customer = (customer-name, customer-street, customer-city)
customer-name → customer-street, customer-city
Branch = (branch-name, assets, branch-city)
branch-name → assets, branch-city
Loan = (branch-name, customer-name, loan-number, amount)
loan-number → amount, branch-name
Clearly, the Customer and Branch schemas are in BCNF, because the left sides of the functional
dependencies customer-name → customer-street, customer-city and branch-name
→ assets, branch-city are superkeys.
But the Loan schema is not in BCNF, because the left side of the functional dependency
loan-number → amount, branch-name is not a superkey (loan-number+ does not contain
customer-name), and this functional dependency is also not trivial.
Example:
Student-course-info(Name, Course, Grade, Phone-no, Major, Course-dept)
F = { Name → Phone-no Major, Course → Course-dept, Name Course → Grade }
Is this relation schema in 2NF? If not then decompose it into 2NF.
Solution:
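Here, Primary key = {Name, Course}.
Clearly, this is not in 2NF because partial dependencies hold: Name → Phone-no Major and
Course → Course-dept make non-prime attributes depend on only part of the key.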
Now, we decompose R into R1 , R2 and R3 as the following:-
R1 = (Name, Phone-no, Major), F1 = {Name → Phone-no Major}
R2 = (Course, Course-dept), F2 = {Course → Course-dept}
R3 = (Name, Course, Grade), F3 = {Name Course → Grade}
Now, R1 , R2 and R3 are in 3NF.
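A partial dependency exists exactly when some proper subset of a candidate key determines a non-prime attribute, so the 2NF check can be written with closures. A minimal Python sketch on the Student-course-info schema; the function names are hypothetical, and the candidate-key search is brute force:

```python
from itertools import combinations


def closure(alpha, fds):
    """alpha+ under fds; attributes are plain strings."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute-force search for minimal sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and set(schema) <= closure(cand, fds):
                keys.append(cand)
    return keys


def partial_dependencies(schema, fds):
    """Pairs (S, attrs): S is a proper subset of a candidate key and attrs are the
    non-prime attributes that S determines. Any such pair means R is not in 2NF."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for sub in combinations(sorted(key), size):
                determined = closure(sub, fds) - prime
                if determined:
                    found.append((set(sub), determined))
    return found


schema = {"Name", "Course", "Grade", "Phone-no", "Major", "Course-dept"}
F = [
    ({"Name"}, {"Phone-no", "Major"}),
    ({"Course"}, {"Course-dept"}),
    ({"Name", "Course"}, {"Grade"}),
]
for sub, attrs in partial_dependencies(schema, F):
    print(sorted(sub), "->", sorted(attrs))
```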
Example:
Let Banker-info = {branch-name, customer-name, banker-name, office-number} and
F = {banker-name → branch-name office-number, customer-name branch-name → banker-name}.
Is this relation schema in 3NF? If not then decompose it into 3NF.
Solution:
Here, Primary key = {customer-name, branch-name}.
Clearly, this is not in 3NF because a transitive dependency holds:
customer-name branch-name → banker-name → office-number, where office-number is a
non-prime attribute and banker-name is not a superkey.
Now, we decompose relation Banker-info into R1 and R2 as the following:-
R1 = (banker-name, branch-name, office-number), F1 = {banker-name → branch-name
office-number}
R2 = (customer-name, branch-name, banker-name), F2 = {customer-name branch-name
→ banker-name}
Now, R1 and R2 are in 3NF.
F4 = { loan-number → amount branch-name}
Now, the final relation schemas are R2, R3 and R4. All of these are in BCNF.
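Algorithm to test whether a decomposition is lossless:
Input: a relation schema R, a set F of functional dependencies on R, and a decomposition
D = {R1, R2, ..., Rn} of R.
1. Create a matrix S with one row i for each relation Ri in D, and one column j for
each attribute aj in R.
2. Set S(i, j) = aj if Ri contains attribute aj; otherwise set S(i, j) = bij.
3. change ← true
   while (change)
       change ← false
       for each functional dependency α → β in F
           for each pair of rows p and q that have the same symbol in every column
                   corresponding to an attribute in α
               for each column k corresponding to an attribute in β
                   if one of the two symbols in column k is ak then
                       make the other symbol ak
                   else if the symbols are bpk and bqk then
                       make both of them bpk
                   if a symbol was changed then change ← true
4. If there exists a row in which all the symbols are a's, then the decomposition is
lossless. Otherwise, the decomposition is lossy.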
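The matrix test can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: symbols are tuples ("a", j) and ("b", i, j), and when two symbols are equated the dropped symbol is renamed everywhere in its column, as in the classical chase, so earlier equalities are never undone. The inputs are the two examples that follow; `lossless_chase` is a hypothetical name.

```python
def lossless_chase(schema, decomposition, fds):
    """Matrix (chase) test: True if the decomposition of schema is lossless.

    schema: list of attribute names; decomposition: list of attribute sets Ri;
    fds: list of (alpha, beta) pairs of attribute-name iterables.
    """
    col = {a: j for j, a in enumerate(schema)}
    # Steps 1-2: row i, column j holds ("a", j) if Ri contains attribute j, else ("b", i, j).
    S = [[("a", j) if a in ri else ("b", i, j) for j, a in enumerate(schema)]
         for i, ri in enumerate(decomposition)]
    changed = True
    while changed:                                      # Step 3
        changed = False
        for alpha, beta in fds:
            alpha_cols = [col[a] for a in alpha]
            for p in range(len(S)):
                for q in range(p + 1, len(S)):
                    if any(S[p][j] != S[q][j] for j in alpha_cols):
                        continue                        # rows disagree on alpha
                    for a in beta:
                        k = col[a]
                        x, y = S[p][k], S[q][k]
                        if x == y:
                            continue
                        # Keep the a-symbol if there is one; otherwise keep row p's b-symbol.
                        keep, drop = (y, x) if y[0] == "a" else (x, y)
                        for row in S:                   # rename drop -> keep in column k
                            if row[k] == drop:
                                row[k] = keep
                        changed = True
    # Step 4: lossless iff some row consists only of a-symbols.
    return any(all(sym[0] == "a" for sym in row) for row in S)


ssn_schema = ["SSN", "Ename", "Pnumber", "Pname", "Plocation", "Hours"]
ssn_decomp = [{"SSN", "Ename"}, {"Pnumber", "Pname", "Plocation"}, {"SSN", "Pnumber", "Hours"}]
ssn_fds = [(["SSN"], ["Ename"]), (["Pnumber"], ["Pname", "Plocation"]), (["SSN", "Pnumber"], ["Hours"])]
print(lossless_chase(ssn_schema, ssn_decomp, ssn_fds))  # True

abcde = list("ABCDE")
abcde_decomp = [set("ABC"), set("BCD"), set("CDE")]
abcde_fds = [("AB", "CD"), ("A", "E"), ("C", "D")]
print(lossless_chase(abcde, abcde_decomp, abcde_fds))   # False
```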
Example:
Consider the following schema:-
R=(SSN, Ename, Pnumber, Pname, Plocation, Hours)
F = {SSN → Ename, Pnumber → Pname Plocation, SSN Pnumber → Hours}
R1 = (SSN, Ename)
R2 = (Pnumber, Pname, Plocation)
R3 = (SSN, Pnumber, Hours)
Determine whether the decomposition of R into R1, R2, R3 is lossless or lossy.
Solution:
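First construct the matrix. It will be of order 3 × 6 (three relations, six attributes).
The initialization table will be the following:-
In the first iteration, SSN → Ename sets the Ename entry of row 3 to a2, and
Pnumber → Pname Plocation sets the Pname and Plocation entries of row 3 to a4 and a5.
After the first iteration, the table will be the following:- After the second iteration,
the table does not change. Therefore, the above table is the final table.
Since in this table row 3 contains only a symbols, this decomposition is lossless.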
Example:
Consider the following schema:-
R=(A, B, C, D, E)
F= { AB → CD, A → E, C → D }
R1 = (A, B, C)
R2 = (B, C, D)
R3 = (C, D, E)
Is the decomposition of R into R1, R2, R3 lossless or lossy?
Solution:
First construct the matrix. It will be of order 3 × 5.
The initialization table will be the following:- In the first iteration, only C → D applies
(all three rows agree on C), so the D entry of row 1 becomes a4. After the first iteration,
the table will be the following:- After the second iteration, the table does not change.
Therefore, the above table is the final table.
Since no row in this table contains only a symbols, this decomposition is lossy.
3 Multivalued dependency
Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency
α →→ β holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that
t1[α] = t2[α], there exist tuples t3 and t4 in r such that
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t4[β] = t2[β]
t3[R − β] = t2[R − β]
t4[R − β] = t1[R − β]
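The definition can be checked on a concrete instance. Because t4 is t3 with the roles of t1 and t2 exchanged, it suffices to look at every ordered pair (t1, t2) and require the tuple that takes β from t1 and R − β from t2. A minimal Python sketch; `mvd_holds` is a hypothetical name, and the instance below is hypothetical, only shaped like the customer example of the 4NF discussion:

```python
def mvd_holds(rows, schema, alpha, beta):
    """Check alpha ->-> beta on an instance whose tuples are ordered as schema."""
    pos = {a: i for i, a in enumerate(schema)}
    present = set(rows)
    for t1 in rows:
        for t2 in rows:
            if any(t1[pos[a]] != t2[pos[a]] for a in alpha):
                continue
            # t3 takes beta from t1 and R - beta from t2. t4 is the same
            # construction with t1 and t2 swapped, which the loop also visits.
            t3 = tuple(t1[i] if a in beta else t2[i] for i, a in enumerate(schema))
            if t3 not in present:
                return False
    return True


schema = ("customer-name", "loan-number", "customer-street", "customer-city")
rows = [                                    # hypothetical data
    ("Smith", "L-16", "North", "Rye"),
    ("Smith", "L-23", "North", "Rye"),
    ("Smith", "L-16", "Main", "Manchester"),
    ("Smith", "L-23", "Main", "Manchester"),
]
alpha = ["customer-name"]
beta = ["customer-street", "customer-city"]
print(mvd_holds(rows, schema, alpha, beta))      # True
print(mvd_holds(rows[:3], schema, alpha, beta))  # False
```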
X →→ Y and XY →→ Z ⇒ X →→ (Z − Y )
(10) Coalescence rule:
Given that W ⊆ Y and Y ∩ Z = ∅, if X →→ Y and Z → W, then X → W.
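Lossless-join test for R = (SSN, Ename, Pnumber, Pname, Plocation, Hours), initialization table:
     SSN  Ename  Pnumber  Pname  Plocation  Hours
R1   a1   a2     b13      b14    b15        b16
R2   b21  b22    a3       a4     a5         b26
R3   a1   b32    a3       b34    b35        a6
Lossless-join test for R = (A, B, C, D, E), initialization table:
     A    B    C    D    E
R1   a1   a2   a3   b14  b15
R2   b21  a2   a3   a4   b25
R3   b31  b32  a3   a4   a5
Lossless-join test for R = (A, B, C, D, E), table after the first iteration (final table):
     A    B    C    D    E
R1   a1   a2   a3   a4   b15
R2   b21  a2   a3   a4   b25
R3   b31  b32  a3   a4   a5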
schema. Determine whether this table is in 4NF. If not, decompose it into 4NF.
Solution: In this table, multivalued dependency
customer-name →→ customer-street customer-city
holds.
Clearly, this multivalued dependency is not trivial, and customer-name is not a superkey.
Therefore, this table is not in 4NF.
By using the above algorithm, this table is decomposed as
R1 = (customer-name, loan-number)
R2 = (customer-name, customer-street, customer-city)
Now, R1 and R2 are in 4NF.
5 Join dependency
Given a relation schema R, let R1, R2, ..., Rn be projections of R, that is, subsets of R whose
union is R. A relation r(R) satisfies the join dependency *(R1, R2, ..., Rn) iff the join of the
projections of r on Ri, 1 ≤ i ≤ n, is equal to r:
r = ΠR1(r) ⋈ ΠR2(r) ⋈ ΠR3(r) ⋈ ⋯ ⋈ ΠRn(r)
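The join dependency can be tested on an instance by projecting r onto each Ri and joining the projections back; since r is always contained in the join, the test only has to detect spurious tuples. A minimal Python sketch using the 4NF decomposition R1 = (customer-name, loan-number), R2 = (customer-name, customer-street, customer-city) on a hypothetical instance; the function names are hypothetical:

```python
def project(rows, attrs):
    """Π_attrs(r): keep only attrs and drop duplicate tuples.
    Each tuple is stored as a sorted tuple of (attribute, value) pairs."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}


def natural_join(left, right):
    """Natural join of two relations in the (attribute, value)-pair form above."""
    out = set()
    for lt in left:
        ld = dict(lt)
        for rt in right:
            rd = dict(rt)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(tuple(sorted({**ld, **rd}.items())))
    return out


def satisfies_jd(rows, components):
    """True iff ΠR1(r) ⋈ ΠR2(r) ⋈ ... ⋈ ΠRn(r) = r."""
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))
    return joined == {tuple(sorted(row.items())) for row in rows}


names = ("customer-name", "loan-number", "customer-street", "customer-city")
r = [dict(zip(names, t)) for t in [          # hypothetical data
    ("Smith", "L-16", "North", "Rye"),
    ("Smith", "L-23", "North", "Rye"),
    ("Smith", "L-16", "Main", "Manchester"),
    ("Smith", "L-23", "Main", "Manchester"),
]]
jd = [("customer-name", "loan-number"),
      ("customer-name", "customer-street", "customer-city")]
print(satisfies_jd(r, jd))      # True
print(satisfies_jd(r[:3], jd))  # False: the join adds a spurious tuple
```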
5.3 Exercise
1. Consider R = (A, B, C, D, E, F, G) and
F = {A → B, BC → F, BD → EG, AD → C, D → F, BEG → FA}
Calculate the following:-
(a) (A)+
(b) (ACEG)+
(c) (BD)+
The decomposition of R is
D={R1 (B, C), R2 (A, C), R3 (A, B, D, E), R4 (A, B, D, F )}
Check whether the decomposition is lossless or lossy.
9. Define partial functional dependency. Consider the following two sets of functional
dependencies F = {A → C, AC → D, E → AD, E → H} and G = {A → CD, E → AH}.
Check whether or not they are equivalent.
11. Write the difference between 3NF and BCNF. Find normal form of relation R(A,B,C,D,E)
having FD set F = {A → B, BC → E, ED → A}.
6. Define minimal cover. Suppose a relation R(A, B, C) has the FD set F = {A → B, B → C,
A → C, AB → B, AB → C, AC → B}. Convert this FD set into a minimal cover.
8. Define 2NF.
11. Explain 1NF, 2NF, 3NF and BCNF with suitable example.
7 GATE questions
1. From the following instance of a relation schema R(A,B,C), we can conclude that
A B C
1 1 1
1 1 0
2 3 2
2 3 2
The relation (Roll number, Name, Date of birth, Age) is in
(a) Second normal form but not in third normal form
(b) Third normal form but not in BCNF
(c) BCNF
(d) None of the above
3. The relation schema student performance (name, courseNo, rollNo, grade) has the
following functional dependencies:-
(name, courseNo) → grade
(rollNo, courseNo) → grade
name → rollNo
rollNo → name