Functional Dependency (DBMS)
A->B
B IS FUNCTIONALLY DETERMINED BY A
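Mathematically,
Let R be a relation schema with α ⊆ R and β ⊆ R.
Then, for a functional dependency to exist from α to β, any two tuples t1 and t2 that agree on α must also agree on β-
If t1[α] = t2[α], then t1[β] = t2[β]
Consider the following table-
A B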
A1 B1
A2 B1
A1 B1
A->B EXISTS (every value of A appears with exactly one value of B)
DOES B->A EXIST? NO (B1 appears with both A1 and A2)
Examples of trivial functional dependencies (the right side is a subset of the left side)-
AB → A
AB → B
AB → AB
Examples of non-trivial functional dependencies (the right side is not a subset of the left side)-
AB → BC
AB → CD
Rule-01:
A functional dependency X → Y will always hold if all the values of X are unique (different) irrespective of the values
of Y.
Example-
Consider the following table-
A B C D E
5 4 3 2 2
8 5 3 2 1
1 9 3 3 5
4 7 3 3 8
The following functional dependencies will always hold since all the values of attribute ‘A’ are unique-
A→B
A → BC
A → CD
A → BCD
A → DE
A → BCDE
In general, we can say following functional dependency will always hold-
Rule-02:
A functional dependency X → Y will always hold if all the values of Y are same irrespective of the values of X.
Example-
Consider the following table-
A B C D E
5 4 3 2 2
8 5 3 2 1
1 9 3 3 5
4 7 3 3 8
The following functional dependencies will always hold since all the values of attribute ‘C’ are same-
A→C
AB → C
ABDE → C
DE → C
AE → C
In general, we can say following functional dependency will always hold true-
Rule-03:
For a functional dependency X → Y to hold, if two tuples in the table agree on the value of attribute X, then they must
also agree on the value of attribute Y.
Rule-04:
For a functional dependency X → Y, violation will occur only when for two or more same values of X, the
corresponding Y values are different.
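Rule-03 and Rule-04 can be checked mechanically on a table. The following is a minimal sketch, assuming the table is stored as a list of dictionaries; the function name fd_holds is only illustrative and is not part of these notes.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts (attribute name -> value);
    lhs and rhs are iterables of attribute names.
    """
    seen = {}  # maps each lhs value to the rhs value first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# The example table with attributes A, B, C, D, E used in Rule-01 and Rule-02.
header = "ABCDE"
data = ["54322", "85321", "19335", "47338"]
table = [dict(zip(header, map(int, line))) for line in data]

print(fd_holds(table, "A", "BCDE"))  # True: all values of A are unique
print(fd_holds(table, "DE", "C"))    # True: all values of C are same
print(fd_holds(table, "D", "E"))     # False: D = 2 appears with E = 2 and E = 1
```

A check like this only says whether the dependency holds in the given rows; it cannot prove that the dependency holds for every possible state of the relation.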
PROVE UNION RULE: (if A->B and A->C, then A->BC)
A->B
Augment by A
A->AB …….1
A->C
Augment by B
AB->BC……..2
By applying transitivity between 1 and 2
A->BC
UNION RULE IS PROVED
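PROVE DECOMPOSITION RULE: (if A->BC, then A->B and A->C)
By reflexivity we get,
B IS A SUBSET OF BC, SO BC->B
C IS A SUBSET OF BC, SO BC->C
By applying transitivity with A->BC
A->B
A->C
DECOMPOSITION RULE IS PROVED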
PROVE PSEUDO TRANSITIVITY RULE: (if A->B and BC->D, then AC->D)
A->B
Augment by C
AC->BC……1
BC->D……2
Applying transitivity between 1 and 2 we get
AC->D
CLOSURE OF ATTRIBUTE:
A→C
AC → D
E → AD
E→H
Find A+ (CLOSURE OF A)
result=A
= AC
=ACD
Therefore, A+ = {A, C, D}
Find E+
Result=E
=EADH
= EADHC
E is a candidate key.
Find AC+
result=AC
=ACD
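The closure computation above keeps adding the right side of every dependency whose left side is already in the result, until nothing new is added. A minimal sketch of that loop follows, assuming single-letter attribute names and dependencies written as (left side, right side) pairs of strings; the names closure and FDS are illustrative.

```python
def closure(attrs, fds):
    """Return the closure of attrs under the functional dependencies fds.

    fds is a list of (lhs, rhs) pairs; each side is a string of
    single-letter attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # apply lhs -> rhs when lhs is covered and rhs adds something new
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The dependencies from the example above.
FDS = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]

print(sorted(closure("A", FDS)))   # ['A', 'C', 'D']
print(sorted(closure("E", FDS)))   # ['A', 'C', 'D', 'E', 'H']
print(sorted(closure("AC", FDS)))  # ['A', 'C', 'D']
```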
Question: Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and the set of functional
dependencies {{E, F} -> {G}, {F} -> {I, J}, {E, H} -> {K, L}, K -> {M}, L -> {N}} on R. What is the candidate key for
R?
A. {E, F}
B. {E, F, H}
C. {E, F, H, K, L}
D. {E}
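EF+
Result=EF=EFG=EFGIJ
F+
Result=F=FIJ
EH+
Result=EH=EHKL=EHKLMN
K+
Result=K=KM
L+
Result=L=LN
EFH+
Result=EFH=EFHGIJ=EFHGIJKL=EFHGIJKLMN
E, F and H do not appear on the right side of any functional dependency, so every candidate key must contain them.
SO, EFH is a candidate key. (Option B)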
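The search for a candidate key in the question above can be sketched as a brute-force loop: try attribute sets in increasing size and keep those whose closure is the whole relation and which contain no smaller key. This is only an illustration under the assumption of single-letter attributes; the function names are hypothetical and the search is exponential in the number of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return every minimal attribute set whose closure is all attributes."""
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            # a superset of a key already found is a super key, not a candidate key
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


R = "EFGHIJKLMN"
FDS = [("EF", "G"), ("F", "IJ"), ("EH", "KL"), ("K", "M"), ("L", "N")]

for key in candidate_keys(R, FDS):
    print("".join(sorted(key)))  # EFH
```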
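If F and G are the two sets of functional dependencies, then following 3 cases are possible-
Case-01: F covers G (F ⊇ G)
Case-02: G covers F (G ⊇ F)
Case-03: Both F and G cover each other (F = G)
To determine whether F covers G, take each functional dependency X → Y of set G, compute (X)+ using the functional dependencies of set F, and check that (X)+ contains Y.
To determine whether G covers F, repeat the same check with the roles of F and G exchanged.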
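Problem-
Consider the following two sets of functional dependencies and decide which of the given options is true.
Set F-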
A→C
AC → D
E → AD
E→H
Set G-
A → CD
E → AH
(A) G ⊇ F
(B) F ⊇ G
(C) F = G
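Solution-
Determining whether F covers G-
Using the functional dependencies of set F, (A)+ = { A , C , D } and (E)+ = { A , C , D , E , H }.
So, A → CD and E → AH of set G both follow from set F.
Functional dependencies of set F can determine all the attributes which have been determined by the functional
dependencies of set G.
Thus, we conclude F covers G i.e. F ⊇ G.
Determining whether G covers F-
Using the functional dependencies of set G, (A)+ = { A , C , D }, (AC)+ = { A , C , D } and (E)+ = { A , C , D , E , H }.
So, A → C, AC → D, E → AD and E → H of set F all follow from set G.
Functional dependencies of set G can determine all the attributes which have been determined by the functional
dependencies of set F.
Thus, we conclude G covers F i.e. G ⊇ F.
Since F covers G and G covers F, both sets are equivalent i.e. F = G. Thus, option (C) is correct.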
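The cover test in this problem can be sketched in a few lines: F covers G when the right side of every dependency of G lies inside the closure of its left side computed under F. The helper names covers and closure are illustrative, and attributes are assumed to be single letters.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """Return True if every dependency of g can be derived from f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


F = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
G = [("A", "CD"), ("E", "AH")]

print(covers(F, G))                   # True: F covers G
print(covers(G, F))                   # True: G covers F
print(covers(F, G) and covers(G, F))  # True: F = G
```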
In DBMS,
A canonical cover is a simplified and reduced version of the given set of functional dependencies.
Since it is a reduced version, it is also called an irreducible set.
Characteristics-
Need-
Working with the set containing extraneous functional dependencies increases the computation time.
Therefore, the given set is reduced by eliminating the useless functional dependencies.
This reduces the computation time and working with the irreducible set becomes easier.
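Example (testing for extraneous attributes)-
A->BC
B->C
A->C
AB->C
Consider A->BC (of the form X->Y) and test each attribute t of its right side.
Let t=B
Replace A->BC by A->(BC-B)
{B->C,AB->C}U {A->(BC-B)}
{B->C,AB->C}U {A->C}
{B->C,AB->C,A->C}
Compute X+ using this set
A+=A=AC
As t=B is not in A+, B is not extraneous.
Let t=C
Replace A->BC by A->(BC-C) i.e. A->B and compute X+
A+=A=AB=ABC
As t=C is in A+, C is extraneous.
Consider AB->C (of the form X->Y) and test t=A of its left side.
Compute (X-t)+ using the given set
(X-t)+=(AB-A)+=B+=B=BC
As Y=C is in (X-t)+, A is extraneous and AB->C can be reduced to B->C.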
Problem-
The following functional dependencies hold true for the relational scheme R ( W , X , Y , Z ). Find the canonical cover of this set –
X→W
WZ → XY
Y → WXZ
Solution-
Step-01:
Write all the functional dependencies such that each contains exactly one attribute on its right side-
X→W
WZ → X
WZ → Y
Y→W
Y→X
Y→Z
Step-02:
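For X → W:
Considering X → W, (X)+ = { X , W }
Ignoring X → W, (X)+ = { X }
Now,
Clearly, the two results are different.
Thus, we conclude that X → W is essential and can not be eliminated.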
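For WZ → X:
Considering WZ → X, (WZ)+ = { W , X , Y , Z }
Ignoring WZ → X, (WZ)+ = { W , X , Y , Z }
Now,
Clearly, the two results are same.
Thus, we conclude that WZ → X is non-essential and can be eliminated.
Eliminating WZ → X, our set of functional dependencies reduces to-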
X→W
WZ → Y
Y→W
Y→X
Y→Z
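For WZ → Y:
Considering WZ → Y, (WZ)+ = { W , X , Y , Z }
Ignoring WZ → Y, (WZ)+ = { W , Z }
Now,
Clearly, the two results are different.
Thus, we conclude that WZ → Y is essential and can not be eliminated.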
For Y → W:
Considering Y → W, (Y)+ = { W , X , Y , Z }
Ignoring Y → W, (Y)+ = { W , X , Y , Z }
Now,
Clearly, the two results are same.
Thus, we conclude that Y → W is non-essential and can be eliminated.
Eliminating Y → W, our set of functional dependencies reduces to-
X→W
WZ → Y
Y→X
Y→Z
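For Y → X:
Considering Y → X, (Y)+ = { W , X , Y , Z }
Ignoring Y → X, (Y)+ = { Y , Z }
Now,
Clearly, the two results are different.
Thus, we conclude that Y → X is essential and can not be eliminated.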
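For Y → Z:
Considering Y → Z, (Y)+ = { W , X , Y , Z }
Ignoring Y → Z, (Y)+ = { W , X , Y }
Now,
Clearly, the two results are different.
Thus, we conclude that Y → Z is essential and can not be eliminated.
From here, our essential functional dependencies are-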
X→W
WZ → Y
Y→X
Y→Z
Step-03:
Consider the functional dependencies having more than one attribute on their left side.
Check if their left side can be reduced.
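In our set,
Only WZ → Y contains more than one attribute on its left side.
Considering WZ → Y, (WZ)+ = { W , X , Y , Z }
Now,
Consider all the possible subsets of WZ.
Check if the closure result of any subset matches the closure result of WZ.
(W)+ = { W }
(Z)+ = { Z }
Clearly,
None of the subsets has the same closure result as the entire left side.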
Thus, we conclude that we can not write WZ → Y as W → Y or Z → Y.
Thus, set of functional dependencies obtained in step-02 is the canonical cover.
X→W
WZ → Y
Y→X
Y→Z
Canonical Cover
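The three steps used in this solution can be sketched as follows. This is a minimal illustration that follows the step order of these notes (many textbooks reduce the left sides before eliminating whole dependencies); the names canonical_cover and show are hypothetical, and attributes are assumed to be single letters.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    # Step-01: exactly one attribute on every right side (duplicates dropped)
    work = list(dict.fromkeys(
        (frozenset(lhs), a) for lhs, rhs in fds for a in rhs))
    # Step-02: eliminate a dependency when the remaining ones still imply it
    for fd in list(work):
        rest = [other for other in work if other != fd]
        if fd[1] in closure(fd[0], rest):
            work = rest
    # Step-03: drop left-side attributes that are not needed
    reduced = []
    for lhs, a in work:
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and a in closure(smaller, work):
                lhs = smaller
        reduced.append((lhs, a))
    return reduced


def show(fds):
    return ["".join(sorted(lhs)) + "->" + rhs for lhs, rhs in fds]


GIVEN = [("X", "W"), ("WZ", "XY"), ("Y", "WXZ")]
print(show(canonical_cover(GIVEN)))  # ['X->W', 'WZ->Y', 'Y->X', 'Y->Z']
```

The result depends on the order in which the dependencies are examined, which is one reason a set of functional dependencies can have more than one canonical cover.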
Decomposition of a Relation-
The process of breaking up or dividing a single relation into two or more sub relations is called as decomposition of a
relation.
Properties of Decomposition-
The following two properties must be followed when decomposing a given relation-
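1. Lossless decomposition-
No information is lost from the original relation during decomposition.
When the sub relations are joined back, the same relation is obtained that was decomposed.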
2. Dependency Preservation-
None of the functional dependencies that holds on the original relation are lost.
The sub relations still hold or satisfy the functional dependencies of the original relation.
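Types of Decomposition-
1. Lossless Join Decomposition-
Consider a relation R decomposed into sub relations R1 , R2 , …. , Rn. The decomposition is lossless when the join of the sub relations gives back exactly the original relation-
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R
Example-
Consider the following relation R( A , B , C )-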
A B C
1 2 1
2 5 3
3 3 3
R( A , B , C )
Consider this relation is decomposed into two sub relations R1( A , B ) and R2( B , C )-
A B
1 2
2 5
3 3
R1 ( A , B )
B C
2 1
5 3
3 3
R2 ( B , C )
Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 , we get-
A B C
1 2 1
2 5 3
3 3 3
This relation is same as the original relation R.
Thus, R1 ⋈ R2 = R and the decomposition is lossless.
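2. Lossy Join Decomposition-
The decomposition is lossy when the join of the sub relations does not give back the original relation but contains extraneous tuples-
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn ⊃ R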
Example-
Consider the following relation R( A , B , C )-
A B C
1 2 1
2 5 3
3 3 3
R( A , B , C )
Consider this relation is decomposed into two sub relations as R1( A , C ) and R2( B , C )-
A C
1 1
2 3
3 3
R1 ( A , C )
B C
2 1
5 3
3 3
R2 ( B , C )
Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 we get-
A B C
1 2 1
2 5 3
2 3 3
3 5 3
3 3 3
This relation is not same as the original relation R and contains some extraneous tuples.
Clearly, R1 ⋈ R2 ⊃ R.
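The two examples above can be reproduced by projecting R onto the attributes of each sub relation and joining the projections back. The following is a small sketch, assuming relations are stored as lists of dictionaries; the helper names are illustrative. It tests only this particular table, not every possible state of the relation.

```python
def project(rel, attrs):
    """Projection of rel on attrs, with duplicate tuples removed."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out


def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all common attributes."""
    joined = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                merged = {**t1, **t2}
                if merged not in joined:
                    joined.append(merged)
    return joined


def as_set(rel):
    return {tuple(sorted(t.items())) for t in rel}


def is_lossless(rel, attrs1, attrs2):
    """True if joining the two projections gives back exactly rel."""
    joined = natural_join(project(rel, attrs1), project(rel, attrs2))
    return as_set(joined) == as_set(rel)


R = [dict(zip("ABC", row)) for row in [(1, 2, 1), (2, 5, 3), (3, 3, 3)]]

print(is_lossless(R, "AB", "BC"))  # True: R1(A, B) and R2(B, C)
print(is_lossless(R, "AC", "BC"))  # False: R1(A, C) and R2(B, C) add extra tuples
```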
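Consider a relation R decomposed into two sub relations R1 and R2.
Then, the decomposition is lossless if all the following conditions are satisfied-
Condition-01:
Union of both the sub relations must contain all the attributes that are present in the original relation R.
Thus,
R1 ∪ R2 = R
Condition-02:
Intersection of both the sub relations must not be null.
Thus,
R1 ∩ R2 ≠ ∅
Condition-03:
Intersection of both the sub relations must be a super key of either R1 or R2 or both.
Thus,
R1 ∩ R2 = Super key of R1 or R2
If any of these conditions fails, the decomposition is lossy.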
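Problem-01:
Consider a relation schema R ( A , B , C , D ) decomposed into the sub relations R1 ( A , B ) and R2 ( C , D ). Determine whether the decomposition is lossless or lossy.
Solution-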
Condition-01:
According to condition-01, union of both the sub relations must contain all the attributes of relation R.
So, we have-
R1 ( A , B ) ∪ R2 ( C , D )
=R(A,B,C,D)
Clearly, union of the sub relations contain all the attributes of relation R.
Condition-02:
According to condition-02, intersection of both the sub relations must not be null.
So, we have-
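R1 ( A , B ) ∩ R2 ( C , D )
= Φ
Clearly, intersection of the sub relations is null.
So, condition-02 fails.
Thus, we conclude that the decomposition is lossy.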
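Problem-02:
Consider a relation schema R ( A , B , C , D ) decomposed into the sub relations R1 ( A , B ) , R2 ( B , C ) and R3 ( B , D ), with the following functional dependencies. Determine whether the decomposition is lossless or lossy-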
A→B
B→C
C→D
D→B
Solution-
Strategy to Solve
When a given relation is decomposed into more than two sub relations, then-
Consider any one of the possible ways in which the relation might have been decomposed into those sub relations.
First, divide the given relation into two sub relations.
Then, divide the sub relations according to the sub relations given in the question.
As a rule of thumb, remember-
Any relation can be decomposed only into two sub relations at a time.
Consider the original relation R was first decomposed into R’ ( A , B , C ) and R3 ( B , D ), and R’ was then decomposed into R1 ( A , B ) and R2 ( B , C ).
Decomposition of R into R’ and R3-
Condition-01:
According to condition-01, union of both the sub relations must contain all the attributes of relation R.
So, we have-
R’ ( A , B , C ) ∪ R3 ( B , D )
=R(A,B,C,D)
Clearly, union of the sub relations contain all the attributes of relation R.
Condition-02:
According to condition-02, intersection of both the sub relations must not be null.
So, we have-
R’ ( A , B , C ) ∩ R3 ( B , D )
=B
Condition-03:
According to condition-03, intersection of both the sub relations must be the super key of one of the two sub relations
or both.
So, we have-
R’ ( A , B , C ) ∩ R3 ( B , D )
=B
B+ = { B , C , D }
Now, we see-
Attribute ‘B’ can not determine attribute ‘A’ of sub relation R’.
Thus, it is not a super key of the sub relation R’.
Attribute ‘B’ can determine all the attributes of sub relation R3.
Thus, it is a super key of the sub relation R3.
Clearly, intersection of the sub relations is a super key of one of the sub relations.
Decomposition of R’ into R1 and R2-
Condition-01:
According to condition-01, union of both the sub relations must contain all the attributes of relation R’.
So, we have-
R1 ( A , B ) ∪ R2 ( B , C )
= R’ ( A , B , C )
Clearly, union of the sub relations contain all the attributes of relation R’.
Condition-02:
According to condition-02, intersection of both the sub relations must not be null.
So, we have-
R1 ( A , B ) ∩ R2 ( B , C )
=B
Condition-03:
According to condition-03, intersection of both the sub relations must be the super key of one of the two sub relations
or both.
So, we have-
R1 ( A , B ) ∩ R2 ( B , C )
=B
B+ = { B , C , D }
Now, we see-
Attribute ‘B’ can not determine attribute ‘A’ of sub relation R1.
Thus, it is not a super key of the sub relation R1.
Attribute ‘B’ can determine all the attributes of sub relation R2.
Thus, it is a super key of the sub relation R2.
Clearly, intersection of the sub relations is a super key of one of the sub relations.
Conclusion-
Overall decomposition of relation R into sub relations R1, R2 and R3 is lossless.
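The three conditions can be checked with an attribute closure. The sketch below assumes single-letter attributes; the function name is_lossless_binary is illustrative. Condition-03 is tested by checking that the closure of the common attributes contains all the attributes of one of the two sub relations.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r, r1, r2, fds):
    """Apply Condition-01, 02 and 03 to a decomposition of r into r1 and r2."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:          # Condition-01: union must give all attributes
        return False
    common = r1 & r2
    if not common:            # Condition-02: intersection must not be null
        return False
    reach = closure(common, fds)
    # Condition-03: intersection is a super key of r1 or of r2
    return r1 <= reach or r2 <= reach


FDS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "B")]

print(is_lossless_binary("ABCD", "ABC", "BD", FDS))  # True: R into R' and R3
print(is_lossless_binary("ABC", "AB", "BC", FDS))    # True: R' into R1 and R2
print(is_lossless_binary("ABCD", "AB", "CD", FDS))   # False: no common attribute
```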
Normalization in DBMS-
Normal Forms-
A given relation is said to be in First Normal Form (1NF) if each cell of the table contains only an atomic value.
OR
A given relation is said to be in First Normal Form (1NF) if every attribute of every tuple is either single valued or a null
value.
A given relation is called in Second Normal Form (2NF) if and only if-
Relation already exists in 1NF.
No partial dependency exists in the relation.
Partial Dependency
A partial dependency is a dependency where only some attributes of a candidate key determine non-prime attribute(s).
OR
A partial dependency is a dependency where a portion of the candidate key or incomplete candidate key determines non-prime
attribute(s).
In other words,
X → a is a partial dependency if and only if X is a proper subset of some candidate key and a is a non-prime attribute.
NOTE-
To avoid partial dependency, incomplete candidate key must not determine any non-prime attribute.
However, incomplete candidate key can determine prime attributes.
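Example-
R(A,B,C,D)
AB->C (2NF)
C->D (2NF)
Candidate key = AB, so C and D are the non-prime attributes.
No proper subset of AB (A alone or B alone) determines C or D, so there is no partial dependency and the relation is in 2NF.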
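A given relation is called in Third Normal Form (3NF) if and only if-
Relation already exists in 2NF.
No transitive dependency exists for non-prime attributes.
OR
A relation is in 3NF if and only if, for each non-trivial functional dependency A → B, at least one of the following holds-
1. A is a super key or
2. B is a prime attribute.
(Prime attributes are components of key)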
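Example-
Consider a relation R ( A , B , C , D , E ) with the functional dependencies-
A → BC (3NF)
CD → E (3NF)
B → D (3NF)
E → A (3NF)
Candidate keys = A , E , CD , BC
From here,
Prime attributes = { A , B , C , D , E }
There are no non-prime attributes
Now,
The right side of every functional dependency contains only prime attributes.
Thus, we conclude that the relation is in 3NF.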
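A given relation is called in Boyce-Codd Normal Form (BCNF) if and only if-
Relation already exists in 3NF.
For each non-trivial functional dependency A → B, A is a super key of the relation.
Example-
Consider a relation R ( A , B , C ) with the functional dependencies-
A → B (BCNF)
B → C (BCNF)
C → A (BCNF)
Candidate keys = A , B , C
Now, we can observe that LHS of each given functional dependency is a candidate key.
Thus, we conclude that the relation is in BCNF.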
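The 3NF and BCNF conditions can be tested on the given functional dependencies. The sketch below is only an illustration, assuming single-letter attributes and a brute-force candidate key search; the name normal_form_violations is hypothetical. It returns the dependencies that break 3NF and those that break BCNF.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    all_attrs, keys = set(attributes), []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


def normal_form_violations(attributes, fds):
    """Return (dependencies violating 3NF, dependencies violating BCNF)."""
    all_attrs = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    third, bcnf = [], []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)  # non-trivial part of the right side
        if not extra or closure(lhs, fds) == all_attrs:
            continue  # trivial dependency, or left side is a super key
        bcnf.append((lhs, rhs))
        if not extra <= prime:
            third.append((lhs, rhs))
    return third, bcnf


print(normal_form_violations("ABCDE", [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]))
print(normal_form_violations("ABC", [("A", "B"), ("B", "C"), ("C", "A")]))
print(normal_form_violations("ABCD", [("AB", "C"), ("C", "D")]))
```

For the first relation the 3NF list is empty, which agrees with the 3NF example, while B → D is reported under BCNF because B is not a super key. For the last relation, C → D is reported under both, so that relation is in 2NF but not in 3NF.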
Some important points about normal forms are-
Point-05:
Singleton keys are those that consist of only a single attribute.
If all the candidate keys of a relation are singleton candidate keys, then it will always be in 2NF at least.
This is because there is no possibility of a partial dependency.
A single-attribute candidate key has no smaller part, so it either appears whole in a dependency or not at all.
Thus, an incomplete candidate key will never determine a non-prime attribute.
Point-06:
If all the attributes of a relation are prime attributes, then it will always be in 2NF at least.
This is because there is no possibility of a partial dependency.
Since there are no non-prime attributes, there will be no functional dependency which determines a non-prime
attribute.
Point-07:
If all the attributes of a relation are prime attributes, then it will always be in 3NF at least.
This is because there is no possibility of a transitive dependency for non-prime attributes.
Point-08:
Third Normal Form (3NF) is considered adequate for normal relational database design.
Point-09:
Every binary relation (a relation with only two attributes) is always in BCNF.
Point-10:
BCNF is free from redundancies arising out of functional dependencies (zero redundancy).
Point-12:
BCNF decomposition is always lossless but not always dependency preserving.
Point-14:
There exist many more normal forms even after BCNF like 4NF and more.
But in the real world database systems, it is generally not required to go beyond BCNF.
Point-16:
Unlike BCNF, a lossless and dependency preserving decomposition into 3NF and 2NF is always possible.
Point-18:
If a relation consists of only singleton candidate keys and it is in 3NF, then it must also be in BCNF.
Point-19:
If a relation consists of only one candidate key and it is in 3NF, then the relation must also be in BCNF.
Decomposition Algorithms
In the previous section, we discussed decomposition and its types with the help of small examples. In
the real world, a database schema is too large to decompose by hand. Thus, it requires algorithms that
generate appropriate schemas.
Here, we will get to know the decomposition algorithms using functional dependencies for two
different normal forms, which are:
o Decomposition to BCNF
o Decomposition to 3NF
Decomposition to BCNF
Before applying the BCNF decomposition algorithm to the given relation, it is necessary to test whether the
relation is in Boyce-Codd Normal Form. If the test shows that the given relation is not in
BCNF, we can decompose it further to create relations in BCNF.
The following cases apply when testing whether the given relation schema R satisfies the BCNF
rule:
Case 1: To check whether a nontrivial dependency α -> β violates the BCNF rule, compute α+ , i.e., the
attribute closure of α, and verify that α+ includes all the attributes of the given relation R. If it does,
α is a superkey of relation R and the dependency does not violate BCNF.
Case 2: To check whether the given relation R is in BCNF, it is not required to test all the dependencies in F+. It is
enough to check the dependencies in the provided dependency set F. This is because if no dependency in F
causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.
Note: Case 2 does not work once the relation gets decomposed. When testing a sub relation obtained by decomposing R,
checking only the dependencies of F is not sufficient to find a violation of BCNF; the dependencies of F+ that apply to
the sub relation must be considered.
This algorithm decomposes the given relation R into several sub relations. It uses the dependencies
that violate BCNF to perform the decomposition of the relation R. Thus, the algorithm not only
generates sub relations of R that are in BCNF but also produces a lossless decomposition. It means
no data is lost while decomposing the given relation R into R1, R2, and so on…
The BCNF decomposition algorithm takes time exponential in the size of the initial relation schema R.
A further drawback of this algorithm is that it may unnecessarily decompose the given relation R,
i.e., over-normalize the relation. The decomposition algorithms for BCNF and 4NF are similar,
except for one difference: the fourth normal form works on multivalued dependencies, whereas BCNF
focuses on functional dependencies. Multivalued dependencies help to remove some forms of
repetition of the data that cannot be expressed in terms of functional dependencies.
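The idea of the BCNF decomposition can be sketched as below. This is a simplified illustration, not a reference implementation: it assumes single-letter attributes, searches every subset of a sub relation for a violation (which is exponential), and on a violation α -> β splits the relation into one part holding α and β and another part holding everything except β. The names find_violation and bcnf_decompose are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (lhs, rhs) for a BCNF violation inside rel, or None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            lhs = set(combo)
            determined = closure(lhs, fds) & rel
            # lhs determines something new but not the whole of rel
            if determined != lhs and determined != rel:
                return lhs, determined - lhs
    return None


def bcnf_decompose(attributes, fds):
    pending, done = [set(attributes)], []
    while pending:
        rel = pending.pop()
        violation = find_violation(rel, fds)
        if violation is None:
            done.append(rel)  # rel is already in BCNF
        else:
            lhs, rhs = violation
            pending.append(lhs | rhs)  # holds the violating dependency
            pending.append(rel - rhs)  # the rest, still containing lhs
    return done


result = bcnf_decompose("ABCD", [("AB", "C"), ("C", "D")])
print(sorted("".join(sorted(r)) for r in result))  # ['ABC', 'CD']
```

Each split keeps the left side of the violating dependency in both parts, and that left side is a super key of the part holding the dependency, so every step is lossless.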
Decomposition to 3NF
The decomposition algorithm for 3NF ensures the preservation of dependencies by explicitly building a
schema for each dependency in the canonical cover. It guarantees a lossless decomposition by making
sure that at least one schema contains a candidate key of the relation being decomposed.
The algorithm has the following properties:
o The result of the decomposition algorithm is not uniquely defined because a set of functional
dependencies can have more than one canonical cover.
o In some cases, the result of the algorithm depends on the order in which it considers the
dependencies in Fc.
o The algorithm may decompose a relation even if it is already in the third normal form.
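A minimal sketch of this 3NF decomposition follows, assuming the canonical cover Fc has already been computed and is passed in as (left side, right side) pairs of single-letter attributes. The names synthesize_3nf and find_key are illustrative only.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_key(attributes, fds):
    """Shrink the full attribute set to one candidate key."""
    key = set(attributes)
    for a in sorted(attributes):
        if closure(key - {a}, fds) == set(attributes):
            key.discard(a)  # a is not needed to determine everything
    return key


def synthesize_3nf(attributes, cover):
    # one schema for each dependency of the canonical cover
    schemas = []
    for lhs, rhs in cover:
        schema = set(lhs) | set(rhs)
        if schema not in schemas:
            schemas.append(schema)
    # make sure at least one schema contains a candidate key
    all_attrs = set(attributes)
    if not any(closure(s, cover) == all_attrs for s in schemas):
        schemas.append(find_key(attributes, cover))
    # drop every schema that is contained in another schema
    return [s for s in schemas if not any(s < t for t in schemas)]


FC = [("X", "W"), ("WZ", "Y"), ("Y", "X"), ("Y", "Z")]
result = synthesize_3nf("WXYZ", FC)
print(["".join(sorted(s)) for s in result])  # ['WX', 'WYZ', 'XY']
```

Here the schema WYZ already contains the candidate key WZ, so no extra key schema is added, and the schema YZ is dropped because it is contained in WYZ.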
Multivalued dependency occurs when there is more than one independent multivalued attribute in a table.
For example: Consider a bike manufacturing company, which produces two colors (black and white) of each model
every year.
...
Multivalued dependency in DBMS.
4NF:
Properties – A relation R is in 4NF if and only if the following conditions are satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any multivalued dependency.
A table with a multivalued dependency violates the Fourth Normal Form (4NF) because
it creates unnecessary redundancies and can contribute to inconsistent data.
Fifth Normal Form / Project Join Normal Form (PJNF)(5NF):
A relation R is in 5NF if and only if every join dependency in R is implied by the candidate keys of R. A relation
decomposed into two relations must have the lossless join property, which ensures that no spurious or extra tuples
are generated when the relations are reunited through a natural join.
Properties – A relation R is in 5NF if and only if it satisfies the following conditions:
1. R should already be in 4NF.
2. It cannot be decomposed any further without loss (join dependency).
Example – Consider a schema with the attributes Agent, Company and Product, with a case as “if a company makes a product and an agent is an agent
for that company, then he always sells that product for the company”. Under these circumstances, the ACP table is
shown as:
Table – ACP
Agent Company Product
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut
The relation ACP is decomposed into 3 relations R1, R2 and R3, which are shown
as:
Table – R1
Agent Company
A1 PQR
A1 XYZ
A2 PQR
Table – R2
Agent Product
A1 Nut
A1 Bolt
A2 Nut
Table – R3
Company Product
PQR Nut
PQR Bolt
XYZ Nut
XYZ Bolt
The result of the natural join of R1 and R3 over ‘Company’, followed by the natural join of R13 and R2 over ‘Agent’ and
‘Product’, is the table ACP.
Hence, in this example, all the redundancies are eliminated, and the decomposition of ACP is a lossless join
decomposition. Therefore, the relation is in 5NF as it does not violate the property of lossless join.
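The join described in this example can be replayed on the three tables. The sketch below assumes relations stored as lists of dictionaries; the helper names natural_join and as_set are illustrative.

```python
def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all common attributes."""
    joined = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                merged = {**t1, **t2}
                if merged not in joined:
                    joined.append(merged)
    return joined


def as_set(rel):
    return {tuple(sorted(t.items())) for t in rel}


R1 = [{"Agent": a, "Company": c}
      for a, c in [("A1", "PQR"), ("A1", "XYZ"), ("A2", "PQR")]]
R2 = [{"Agent": a, "Product": p}
      for a, p in [("A1", "Nut"), ("A1", "Bolt"), ("A2", "Nut")]]
R3 = [{"Company": c, "Product": p}
      for c, p in [("PQR", "Nut"), ("PQR", "Bolt"), ("XYZ", "Nut"), ("XYZ", "Bolt")]]
ACP = [{"Agent": a, "Company": c, "Product": p}
       for a, c, p in [("A1", "PQR", "Nut"), ("A1", "PQR", "Bolt"),
                       ("A1", "XYZ", "Nut"), ("A1", "XYZ", "Bolt"),
                       ("A2", "PQR", "Nut")]]

R13 = natural_join(R1, R3)   # joined over 'Company'
back = natural_join(R13, R2)  # joined over 'Agent' and 'Product'

print(len(R13))                     # 6: one tuple more than ACP
print(as_set(back) == as_set(ACP))  # True: the extra tuple is removed by R2
```

Joining only R1 and R3 produces the extra tuple (A2, PQR, Bolt); it disappears once R2 is joined, which is why all three relations are needed to get ACP back.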