Unit3[1]
Functional Dependency
A functional dependency (FD) is a relationship that exists between two attributes or
sets of attributes. It typically exists between the primary key and a non-key attribute
within a table.
X → Y
The left side of an FD is known as the determinant, and the right side is known as the
dependent.
Example:
roll_no name dept_name dept_building
42 abc CO A4
43 pqr IT A3
44 xyz CO A4
45 xyz IT A3
46 mno EC B2
47 jkl ME B2
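As a rough illustration, the table above can be checked for a given FD with a few lines of Python. This is only a sketch: the helper name fd_holds is hypothetical, and a check on one table instance can refute an FD but cannot prove that it holds for every possible instance of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # a different value for the same key violates the FD
        if seen.setdefault(key, value) != value:
            return False
    return True


columns = ("roll_no", "name", "dept_name", "dept_building")
data = [
    (42, "abc", "CO", "A4"),
    (43, "pqr", "IT", "A3"),
    (44, "xyz", "CO", "A4"),
    (45, "xyz", "IT", "A3"),
    (46, "mno", "EC", "B2"),
    (47, "jkl", "ME", "B2"),
]
students = [dict(zip(columns, r)) for r in data]

print(fd_holds(students, ["roll_no"], ["name"]))             # True
print(fd_holds(students, ["dept_name"], ["dept_building"]))  # True
print(fd_holds(students, ["name"], ["dept_name"]))           # False
```

The last check fails because the name xyz appears with both CO and IT, so name does not determine dept_name in this table.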
42 abc 17
43 pqr 18
44 xyz 18
roll_no name age
45 abc 19
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2
IR1 (Reflexive rule): If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
Since Y is a subset of X, X → Y holds.
IR2 (Augmentation rule): If X → Y then XZ → YZ
Example: For R(A, B, C, D), if A → B then AC → BC.
IR3 (Transitive rule): If X → Y and Y → Z then X → Z
IR4 (Union rule): If X → Y and X → Z then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
IR5 (Decomposition rule): If X → YZ then X → Y and X → Z
Proof:
1. X → YZ (given)
2. YZ → Y (using IR1)
3. X → Y (using IR3 on 1 and 2)
4. YZ → Z (using IR1)
5. X → Z (using IR3 on 1 and 4)
IR6 (Pseudo-transitive rule): If X → Y and YZ → W then XZ → W
Proof:
1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
Normalization
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate undesirable characteristics like Insertion,
Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller tables and links them using
relationships.
o The normal form is used to reduce redundancy from the database table.
Why do we need Normalization?
The main reason for normalizing relations is to remove these anomalies. Failure
to eliminate anomalies leads to data redundancy and can cause data integrity and
other problems as the database grows. Normalization consists of a series of
guidelines that help you create a good database structure.
Data modification anomalies can be categorized into three types: insertion, update,
and deletion anomalies.
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal
forms, i.e., 4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher
degree.
o Careless decomposition may lead to a bad database design, leading to
serious problems.
First Normal Form (1NF)
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
12 Sam 7390372389, 8589830302 Punjab
The decomposition of the EMPLOYEE table into 1NF is shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab
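The 1NF conversion can be sketched in Python as splitting the multi-valued phone field into one row per value. The helper name to_1nf is hypothetical, and the sketch assumes the phone numbers are stored as one comma-separated string.

```python
employees = [
    {"EMP_ID": 14, "EMP_NAME": "John",
     "EMP_PHONE": "7272826385, 9064738238", "EMP_STATE": "UP"},
    {"EMP_ID": 12, "EMP_NAME": "Sam",
     "EMP_PHONE": "7390372389, 8589830302", "EMP_STATE": "Punjab"},
]


def to_1nf(rows, multi_valued):
    """Emit one row per value of the comma-separated column multi_valued."""
    out = []
    for row in rows:
        for value in row[multi_valued].split(","):
            # copy the row, replacing the multi-valued cell with one atomic value
            out.append({**row, multi_valued: value.strip()})
    return out


flat = to_1nf(employees, "EMP_PHONE")
for row in flat:
    print(row["EMP_ID"], row["EMP_NAME"], row["EMP_PHONE"], row["EMP_STATE"])
```

Each employee with two phone numbers yields two rows, so every EMP_PHONE cell holds a single atomic value.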
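Second Normal Form (2NF)
Example: Assume a school stores the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.
TEACHER table:
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
Here the candidate key is {TEACHER_ID, SUBJECT}, but the non-prime attribute TEACHER_AGE
depends on TEACHER_ID alone, which is a proper subset of the candidate key. This partial
dependency violates 2NF.
To convert the given table into 2NF, we decompose it into two tables: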
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
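The decomposition above amounts to two relational projections of the TEACHER table. A minimal Python sketch, assuming the column names used in the decomposed tables (the helper name project is hypothetical):

```python
def project(rows, attrs):
    """Relational projection: keep attrs and drop duplicate tuples, preserving order."""
    seen = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.append(t)
    return seen


teacher = [
    {"TEACHER_ID": 25, "SUBJECT": "Chemistry", "TEACHER_AGE": 30},
    {"TEACHER_ID": 25, "SUBJECT": "Biology", "TEACHER_AGE": 30},
    {"TEACHER_ID": 47, "SUBJECT": "English", "TEACHER_AGE": 35},
    {"TEACHER_ID": 83, "SUBJECT": "Math", "TEACHER_AGE": 38},
    {"TEACHER_ID": 83, "SUBJECT": "Computer", "TEACHER_AGE": 38},
]

teacher_detail = project(teacher, ["TEACHER_ID", "TEACHER_AGE"])
teacher_subject = project(teacher, ["TEACHER_ID", "SUBJECT"])

print(teacher_detail)        # [(25, 30), (47, 35), (83, 38)]
print(len(teacher_subject))  # 5
```

The age of each teacher is now stored once in TEACHER_DETAIL instead of once per subject.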
Third Normal Form (3NF)
A relation is in third normal form if it satisfies at least one of the following
conditions for every non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
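Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
EMP_CITY and EMP_STATE depend on EMP_ZIP, which in turn depends on EMP_ID, so they are
transitively dependent on the key.
That's why we need to move EMP_CITY and EMP_STATE to the new EMPLOYEE_ZIP table,
with EMP_ZIP as the primary key.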
EMPLOYEE table:
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
Boyce-Codd Normal Form (BCNF)
EMPLOYEE table:
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_ID nor EMP_DEPT alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left-hand side of both functional dependencies is a key.
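The BCNF argument above can be sketched in Python: an FD violates BCNF in a relation when its left-hand side does not determine every attribute of that relation. The function names are hypothetical, and the check is simplified: it only tests the two listed FDs, not every FD that could be inferred from them.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the listed non-trivial FDs inside relation whose lhs is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        applies = lhs <= relation and rhs <= relation
        trivial = rhs <= lhs
        if applies and not trivial and not relation <= closure(lhs, fds):
            bad.append((lhs, rhs))
    return bad


fds = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]
employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}

print(len(bcnf_violations(employee, fds)))                    # 2
print(bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, fds))        # []
print(bcnf_violations({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, fds))  # []
print(bcnf_violations({"EMP_ID", "EMP_DEPT"}, fds))           # []
```

Both FDs violate BCNF in the original table, and none of them does in the three decomposed tables.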
Fourth Normal Form (4NF)
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities. Hence, there is no relationship between COURSE and HOBBY, and the table
contains the multivalued dependencies STU_ID → → COURSE and STU_ID → → HOBBY.
To bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Fifth Normal Form (5NF)
• A Relation is said to be in 5NF if both conditions are satisfied.
1) Relation should be already in 4NF
2) It cannot be further non-loss decomposed (Join Dependency should
not be present)
• The Fifth Normal Form (5NF) is also known as the Project-Join Normal
Form (PJNF).
• 5NF is reached when the table has been decomposed, without loss, until no
further lossless decomposition into smaller tables is possible.
• Here the subject of math is taught by both teachers kartik and yash.
Also yash can teach math and science. Yash teaches math to both
class 9 and class 10.
• As there is redundancy in data we will decompose it into two tables R1
and R2 such that R1 will have attribute Subject and Class and R2 will
have attribute class and teacher.
Table R1
Subject Class
math class 9
math class 10
science class 10
Table R2
Class Teacher
class 10 kartik
class 9 yash
class 10 yash
• If we compare the table obtained by joining R1 and R2 with the
original table, an extra tuple (science, class 10, kartik) appears that did
not exist in the original data. This breaks the second rule of 5NF, i.e.
non-loss decomposition.
• This type of unwanted tuple is known as a spurious tuple.
• Here we will decompose the given table into another relation R3, which
will have 2 columns, i.e. subject and teacher.
Table R3
Subject Teacher
math yash
math kartik
science yash
• Here the newly decomposed table R3 will have 3 tuples only, as the
repeated tuple (redundancy) is not added to the table. yash teaching
the subject math is repeated 2 times in the main table R, but here it is
added only once, which removes the redundancy in the table.
• Now if we rejoin the tables R1, R2, and R3 we will get
Table (R1 ⨝ R2 ⨝ R3)
Subject Class Teacher
math class 9 yash
math class 10 kartik
math class 10 yash
science class 10 yash
• Comparing the re-composed table with the original table, there is no
loss of data.
• The natural join of R1, R2 and R3 results in the table R. After the
natural join, the original table is retained as it is.
• So the decomposition is lossless, and the tables R1, R2 and R3 are in
the Fifth Normal Form (5NF).
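The join behaviour described above can be reproduced with a small Python sketch of the natural join. The helper name natural_join is hypothetical. The original table R is not reproduced in the text, so the sketch treats the three-way join as R, as the bullets above state.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            # rows combine only if they agree on every shared column
            if all(a[k] == b[k] for k in common):
                merged = {**a, **b}
                if merged not in out:
                    out.append(merged)
    return out


R1 = [{"Subject": "math", "Class": "class 9"},
      {"Subject": "math", "Class": "class 10"},
      {"Subject": "science", "Class": "class 10"}]
R2 = [{"Class": "class 10", "Teacher": "kartik"},
      {"Class": "class 9", "Teacher": "yash"},
      {"Class": "class 10", "Teacher": "yash"}]
R3 = [{"Subject": "math", "Teacher": "yash"},
      {"Subject": "math", "Teacher": "kartik"},
      {"Subject": "science", "Teacher": "yash"}]

two_way = natural_join(R1, R2)
three_way = natural_join(two_way, R3)
spurious = [t for t in two_way if t not in three_way]

print(len(two_way))    # 5
print(spurious)        # [{'Subject': 'science', 'Class': 'class 10', 'Teacher': 'kartik'}]
print(len(three_way))  # 4
```

Joining only R1 and R2 yields five tuples, one of them spurious. Adding R3 to the join filters that tuple out and leaves four.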
Uses of Fifth Normal Form (5NF)
• 5NF removes the redundancy caused by join dependencies. With less
redundancy, the data is easier to keep consistent and easier to
update.
• It also ensures that every decomposition is non-lossy, which
supports data consistency and data integrity.
• As data redundancy and anomalies are removed, insert, update, and
delete operations become simpler.
Limitations of Fifth Normal Form (5NF)
• One of the biggest limitations of 5NF is the complexity of the
database. 5NF creates a large number of tables and relationships,
which increases the complexity of the database.
• Slow performance due to the large number of tables.
• The cost of implementing 5NF is also high, as it increases the
complexity of the database.
Relational Decomposition
o When a relation in the relational model is not in appropriate normal form then
the decomposition of a relation is required.
o In a database, it breaks the table into multiple tables.
o If the relation has no proper decomposition, then it may lead to problems like
loss of information.
o Decomposition is used to eliminate some of the problems of bad design like
anomalies, inconsistencies, and redundancy.
Types of Decomposition
Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
o The lossless decomposition guarantees that the join of relations will result in
the same relation as it was decomposed.
o A decomposition is said to be lossless if the natural join of all the
decomposed relations gives the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then the
resultant relation will look like:
Employee ⋈ Department
Dependency Preserving
Multivalued Dependency
o Multivalued dependency occurs when two attributes in a table are
independent of each other but, both depend on a third attribute.
o A multivalued dependency consists of at least two attributes that are
dependent on a third attribute that's why it always requires at least three
attributes.
Example: Suppose there is a bike manufacturing company which produces two colors (white
and black) of each model every year.
In this case, the columns COLOR and MANUF_YEAR depend on BIKE_MODEL but are independent
of each other, so both can be called multivalued dependent on BIKE_MODEL.
The representation of these dependencies is shown below:
1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR
This can be read as "BIKE_MODEL multidetermined MANUF_YEAR" and "BIKE_MODEL
multidetermined COLOR".
Closure of an Attribute
Closure of an Attribute: The closure of an attribute (or set of attributes) is the set of
attributes that can be functionally determined from it.
The closure of a set F of FDs is the set F+ of all FDs that can be inferred from F.
The closure of a set of attributes X with respect to F is the set X+ of all attributes that
are functionally determined by X.
X+ := X;
repeat
    oldX+ := X+;
    for each functional dependency Y → Z in F do
        if X+ ⊇ Y then X+ := X+ ∪ Z;
until (X+ = oldX+);
Example: Consider the set of FDs F = {P → Q, QR → ST, PTV → V}.
Determine Closure of (QR)+
QR+ = QR (a set of attributes always determines itself).
Now as per the algorithm, look for an FD whose complete left side is contained in QR.
QR → ST qualifies, hence QR+ = QRST.
Again, check the remaining two FDs {P → Q, PTV → V}. Neither has its complete left side
inside QRST, so nothing more is added. Therefore QR+ = QRST.
Determine Closure of (PR)+
PR+ = PR.
Now as per the algorithm, look for an FD whose complete left side is contained in PR.
In P → Q, P is a subset of PR, hence PR+ = PRQ.
Again, check the remaining two FDs. QR → ST has its complete left side QR in PRQ,
hence PR+ = PRQST.
Again, check the remaining FD {PTV → V}. Since PTV is not completely contained in PRQST,
we ignore it. Therefore PR+ = PRQST.
Determine Closure of (T)+
Here the set of FDs is F = {P → QR, RS → T, Q → S, T → P}.
T+ = T (a set of attributes always determines itself).
Now as per the algorithm, look for an FD whose complete left side is contained in T.
T → P qualifies, hence T+ = TP.
Again, check the remaining three FDs. P → QR has its complete left side P in TP,
hence T+ = TPQR.
Again, check the remaining two FDs {RS → T, Q → S}. Q → S has its complete left side Q
in TPQR, hence T+ = TPQRS.
Again, check the remaining FD {RS → T}. Its complete left side RS is in TPQRS, but T is
already in TPQRS, so nothing changes.
Therefore T+ = TPQRS (Answer).
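The closure algorithm and the worked closures in this section can be checked with a short Python sketch. It assumes single-letter attribute names, so that a string such as "QR" stands for the set {Q, R}. The function name closure is hypothetical, and the FD sets are the ones used in the worked examples.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, following the repeat-until loop."""
    result = set(attrs)
    while True:
        old = set(result)
        for lhs, rhs in fds:
            # apply an FD only when its complete left side is already in X+
            if set(lhs) <= result:
                result |= set(rhs)
        if result == old:
            return result


F1 = [("P", "Q"), ("QR", "ST"), ("PTV", "V")]
F2 = [("P", "QR"), ("RS", "T"), ("Q", "S"), ("T", "P")]

print(sorted(closure("QR", F1)))  # ['Q', 'R', 'S', 'T']
print(sorted(closure("PR", F1)))  # ['P', 'Q', 'R', 'S', 'T']
print(sorted(closure("T", F2)))   # ['P', 'Q', 'R', 'S', 'T']
```

The loop stops after the first pass in which no FD adds a new attribute, which matches the until (X+ = oldX+) condition.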
Canonical Cover
In database management systems (DBMS), a canonical cover is a set of
functional dependencies that is equivalent to a given set of functional
dependencies but is minimal: it contains no redundant dependencies and no
extraneous attributes. The process of finding the canonical cover of a set
of functional dependencies involves three main steps:
• Reduction: The first step is to reduce the original set of functional
dependencies to an equivalent set that has the same closure as the
original set, but with fewer dependencies. This is done by removing
redundant dependencies and combining dependencies that have
common attributes on the left-hand side.
• Elimination: The second step is to eliminate any extraneous attributes
from the left-hand side of the dependencies. An attribute is considered
extraneous if it can be removed from the left-hand side without
changing the closure of the dependencies.
• Minimization: The final step is to minimize the number of
dependencies by removing any dependencies that are implied by other
dependencies in the set.
Illustrative Example
Consider a set of Functional dependencies: 𝐹={𝐴→𝐵𝐶,𝐵→𝐶,𝐴𝐵→𝐶}. Here
are the steps to find the canonical cover –
Step 1: Decompose FDs to have a single attribute on the right-hand
side
• 𝐴→𝐵𝐶 becomes 𝐴→𝐵 and 𝐴→𝐶.
• Therefore, we have {𝐴→𝐵, 𝐴→𝐶, 𝐵→𝐶, 𝐴𝐵→𝐶}.
Step 2: Remove extraneous attributes from the left-hand side of
FDs
• Checking 𝐴𝐵→𝐶: First, check if 𝐴 or 𝐵 is extraneous.
• 𝐴 is extraneous because 𝐵→𝐶 already holds, so 𝐴𝐵→𝐶 reduces to
𝐵→𝐶, which is already in the set; therefore, we remove 𝐴𝐵→𝐶.
• Finally, we have {𝐴→𝐵, 𝐴→𝐶, 𝐵→𝐶}.
Step 3: Remove redundant FDs
• Check each functional dependency to see if it can be derived from the
others. For example, 𝐴→𝐶 can be derived from 𝐴→𝐵 and 𝐵→𝐶.
Therefore, 𝐴→𝐶 is redundant and can be removed.
• Hence, Canonical Cover = {𝐴→𝐵, 𝐵→𝐶}.
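The three steps of the example can be sketched in Python as follows. This is a minimal sketch, assuming single-letter attribute names; the function names are hypothetical. It does not recombine FDs that share a left-hand side, which some textbook definitions of the canonical cover add as a final step.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: split right-hand sides into single attributes
    work = []
    for lhs, rhs in fds:
        for a in rhs:
            fd = (frozenset(lhs), frozenset(a))
            if fd not in work:
                work.append(fd)
    # Step 2: drop extraneous attributes from left-hand sides
    for i, (lhs, rhs) in enumerate(work):
        for a in sorted(lhs):
            reduced = lhs - {a}
            if reduced and rhs <= closure(reduced, work):
                lhs = reduced
        work[i] = (lhs, rhs)
    work = list(dict.fromkeys(work))  # remove duplicates created by step 2
    # Step 3: drop FDs that are implied by the remaining ones
    for fd in list(work):
        rest = [g for g in work if g != fd]
        if fd[1] <= closure(fd[0], rest):
            work = rest
    return work


cover = minimal_cover([("A", "BC"), ("B", "C"), ("AB", "C")])
print([("".join(sorted(l)), "".join(sorted(r))) for l, r in cover])
# [('A', 'B'), ('B', 'C')]
```

For the example set F the sketch returns A → B and B → C, matching the result derived by hand.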