CB3401 Database Management and Security Unit-2
FUNCTIONAL DEPENDENCY
In the notation X → Y, the arrow points to the dependent attribute (Y) and originates from the
determinant attribute (X).
Types of Functional Dependency in DBMS:
The four types of functional dependency in DBMS are as follows:
1. Trivial Functional Dependency in DBMS
2. Non-Trivial Functional Dependency in DBMS
3. Multivalued Functional Dependency in DBMS
4. Transitive Functional Dependency in DBMS
To check for lossless join decomposition using the FD set, the following conditions must hold:
1. The Union of Attributes of R1 and R2 must be equal to the attribute of R. Each attribute of R must
be either in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
2. The intersection of Attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
3. The common attributes must be a superkey for at least one relation (R1 or R2)
Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
For Example, A relation R (A, B, C, D) with FD set{A->BC} is decomposed into R1(ABC) and R2(AD)
which is a lossless join decomposition as:
1. First condition holds true as Att(R1) U Att(R2) = (ABC) U (AD) = (ABCD) = Att(R).
2. Second condition holds true as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) ≠ Φ
3. The third condition holds as Att(R1) ∩ Att(R2) = A is a key of R1(ABC) because A->BC is given.
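The three conditions above can be checked mechanically. The following is a minimal sketch, assuming FDs are given as pairs of attribute sets and attributes are single letters; the helper names closure and is_lossless_binary are illustrative and not part of the notes.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its left side is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r, r1, r2, fds):
    """Apply the three conditions to a two-way decomposition of r."""
    r, r1, r2 = set(r), set(r1), set(r2)
    common = r1 & r2
    # Conditions 1 and 2: attributes are covered and the overlap is non-empty.
    if r1 | r2 != r or not common:
        return False
    # Condition 3: the common attributes determine all of R1 or all of R2.
    common_closure = closure(common, fds)
    return r1 <= common_closure or r2 <= common_closure


fds = [({"A"}, {"B", "C"})]  # A -> BC
print(is_lossless_binary("ABCD", "ABC", "AD", fds))  # True
```

The closure of the common attribute A is {A, B, C}, which covers R1(ABC), so the third condition is met.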
Question
1. Consider a schema R(A, B, C, D) and functional dependencies A->B and C->D. Then the
decomposition of R into R1(AB) and R2(CD) is
(A) dependency preserving and lossless join
(B) lossless join but not dependency preserving
(C) dependency preserving but not lossless join
(D) not dependency preserving and not lossless join
Answer:
For lossless join decomposition, these three conditions must hold:
Att(R1) U Att(R2) = ABCD = Att(R)
Att(R1) ∩ Att(R2) = Φ, which violates the
condition of lossless join decomposition. Hence the decomposition is not lossless.
For dependency preserving decomposition, A->B can be ensured in R1(AB) and C->D can be ensured in
R2(CD). Hence it is dependency preserving decomposition. So, the correct option is C.
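The dependency-preservation part of this answer can also be tested without computing projected FD sets explicitly. The sketch below uses the standard closure-based test (grow the left side of each FD using only what each sub-relation can enforce); the function names are illustrative assumptions, not part of the notes.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """True if every FD can be enforced using the sub-relations alone."""
    parts = [set(p) for p in decomposition]
    for lhs, rhs in fds:
        reachable = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # What this sub-relation lets us derive from what we have so far.
                gained = closure(reachable & part, fds) & part
                if not gained <= reachable:
                    reachable |= gained
                    changed = True
        if not rhs <= reachable:
            return False
    return True


fds = [({"A"}, {"B"}), ({"C"}, {"D"})]  # A -> B, C -> D
print(preserves_dependencies(["AB", "CD"], fds))  # True
```

For contrast, R(A, B, C) with AB -> C and C -> B, decomposed into (AC) and (BC), fails this test because AB -> C cannot be checked inside either sub-relation.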
NORMALIZATION
A large database defined as a single relation may result in data duplication. This repetition of data may
result in:
o Making relations very large.
o It isn't easy to maintain and update data as it would involve searching many records in relation.
o Wastage and poor utilization of disk space and resources.
o The likelihood of errors and inconsistencies increases.
So to handle these problems, we should analyze and decompose the relations with redundant data into
smaller, simpler, and well-structured relations that satisfy desirable properties. Normalization is a
process of decomposing the relations into relations with fewer attributes.
What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to
eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table.
5NF: A relation is in 5NF if it is in 4NF and does not contain any join dependency; joining
should be lossless.
Advantages of Normalization:
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.
Disadvantages of Normalization:
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.
Example 2:
ID Name Courses
1 A C1,C2
2 E C3
3 M C2,C3
In the above table, Courses is a multi-valued attribute, so the table is not in 1NF.
The table below is in 1NF as there is no multi-valued attribute:
ID Name Course
1 A C1
1 A C2
2 E C3
3 M C2
3 M C3
Note: A database design is considered as bad if it is not even in the First Normal Form (1NF).
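The conversion to 1NF shown above amounts to emitting one row per course value. A minimal sketch, assuming the multi-valued Courses column is stored as a comma-separated string (the function name to_first_normal_form is illustrative):

```python
# Unnormalized rows: (ID, Name, Courses) with Courses holding several values.
unnormalized = [
    (1, "A", "C1,C2"),
    (2, "E", "C3"),
    (3, "M", "C2,C3"),
]


def to_first_normal_form(rows):
    """Emit one (ID, Name, Course) row per individual course value."""
    flat = []
    for student_id, name, courses in rows:
        for course in courses.split(","):
            flat.append((student_id, name, course.strip()))
    return flat


for row in to_first_normal_form(unnormalized):
    print(row)
```

Each output row keeps the ID and Name of the student it came from, so every column holds a single atomic value.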
Example #1
Let us take a look at an example in the form of a table. Here, we can divide the values available in the first
row of the [Hues] column into pink and black. Thus, [TABLE_ITEMS] is not in 1NF.
TABLE_ITEMS:
Item No. Hues Cost
1 pink, black 15.99
2 red 23.99
3 black 17.50
4 red, grey 9.99
5 brown 29.99
The table here isn’t in the first normal form, since the column [Hues] can consist of multiple values in it.
For instance, the first row consists of pink and black, and the fourth row consists of red and grey.
How do we bring this table, which is in an unnormalized form, into the normalized form? We will split this
table into two separate ones. As a result, we will have to generate the following tables:
TABLE_ITEMS_HUES:
Item No. Hues
1 pink
1 black
2 red
3 black
4 red
4 grey
5 brown
TABLE_ITEMS_COSTS:
Item No. Cost
1 15.99
2 23.99
3 17.50
4 9.99
5 29.99
Here, the first normal form is finally satisfied with both of these tables. It is because all the columns
of each of these hold just single values, and that’s what we want from 1NF.
Remember that a repeating group refers to a set of two or more columns in a table that are
closely related to each other.
Example #2
Look at this sample data in a table:
Here, you can see there are multiple values in similar columns. We can resolve it as follows:
This way, although a few values are getting repeated, we can still see that there is just one value in every
column.
STUDENT_TABLE
A relation is said to be in the 2nd Normal Form in DBMS (or 2NF) when it is in the First Normal
Form and has no non-prime attribute functionally dependent on a proper subset of any candidate key.
A non-prime attribute of a relation is an attribute that is not part of any candidate key of that
relation.
Note: When a proper subset of a candidate key determines a non-prime attribute, we call it a partial
dependency.
TUTOR table
TUTOR_ID COURSE TUTOR_AGE
2115 Java 30
2115 C 30
4997 Python 35
8663 C++ 38
8663 Go 38
Answer:
TUTOR_DETAIL table:
TUTOR_ID TUTOR_AGE
2115 30
4997 35
8663 38
TUTOR_COURSE table:
TUTOR_ID COURSE
2115 Java
2115 C
4997 Python
8663 C++
8663 Go
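The decomposition above can be reproduced as two projections of the TUTOR table. This is a minimal sketch, assuming the candidate key of TUTOR is {TUTOR_ID, COURSE} and that TUTOR_AGE depends on TUTOR_ID alone (the partial dependency); the variable names are illustrative.

```python
# TUTOR rows: (TUTOR_ID, COURSE, TUTOR_AGE)
tutor = [
    (2115, "Java", 30),
    (2115, "C", 30),
    (4997, "Python", 35),
    (8663, "C++", 38),
    (8663, "Go", 38),
]

# Projection onto (TUTOR_ID, TUTOR_AGE); the set removes repeated pairs.
tutor_detail = sorted({(tid, age) for tid, _course, age in tutor})

# Projection onto (TUTOR_ID, COURSE).
tutor_course = [(tid, course) for tid, course, _age in tutor]

# Rejoin on TUTOR_ID to confirm that nothing was lost.
age_of = dict(tutor_detail)
rejoined = [(tid, course, age_of[tid]) for tid, course in tutor_course]

print(tutor_detail)       # [(2115, 30), (4997, 35), (8663, 38)]
print(rejoined == tutor)  # True
```

The age of each tutor is now stored once, and joining the two tables on TUTOR_ID gives back the original rows.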
<Course_Info>
Course_ID Course_Name
A09 CSS
A07 PHP
A03 HTML
A05 Ruby
Example #2
Take a look at these functional dependencies in the relation A (P, Q, R, S, T)
Here,
P -> QR,
RS -> T,
Q -> S,
T -> P
In the relation given above, all the possible candidate keys would be {P, T, RS, QR}. In this case, the
attributes that exist on the right sides of all the functional dependencies are prime.
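The candidate keys listed above can be verified by brute force over attribute subsets. A minimal sketch, assuming single-letter attributes and FDs given as pairs of sets; closure and candidate_keys are illustrative names, and the search is exponential, so it only suits small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Try subsets in increasing size and keep the minimal superkeys."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


fds = [
    ({"P"}, {"Q", "R"}),  # P -> QR
    ({"R", "S"}, {"T"}),  # RS -> T
    ({"Q"}, {"S"}),       # Q -> S
    ({"T"}, {"P"}),       # T -> P
]
keys = candidate_keys("PQRST", fds)
print(sorted("".join(sorted(k)) for k in keys))  # ['P', 'QR', 'RS', 'T']
print(sorted(set().union(*keys)))                # ['P', 'Q', 'R', 'S', 'T']
```

The second line prints the prime attributes (the union of all candidate keys); here every attribute of the relation is prime.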
CANDIDATE_DETAIL Table:
CAND_ID CAND_NAME CAND_ZIP CAND_CITY CAND_STATE
262 Jake 201010 Noida UP
353 Rosa 02228 Boston US
434 Charles 60007 Chicago US
545 Gina 06389 Norwich UK
626 Terry 462007 Bhopal MP
Answer: The super keys in the table mentioned above would be:
{CAND_ID}, {CAND_ID, CAND_NAME}, {CAND_ID, CAND_NAME, CAND_ZIP} …. and so on
The candidate key here is: {CAND_ID}
Non-prime attributes: All the attributes in the table mentioned above are non-prime except CAND_ID.
Notice that CAND_CITY & CAND_STATE are dependent on CAND_ZIP, and CAND_ZIP is dependent
on the CAND_ID. Here, the non-prime attributes CAND_CITY and CAND_STATE are dependent
transitively on the super key (CAND_ID). The transitive dependency here would violate the rules of the
third normal form.
Thus, we must move the CAND_CITY and the CAND_STATE to the new table of <CANDIDATE_ZIP>,
and the primary key here is CAND_ZIP.
Thus,
CANDIDATE Table:
CAND_ID CAND_NAME CAND_ZIP
262 Jake 201010
353 Rosa 02228
434 Charles 60007
545 Gina 06389
626 Terry 462007
CANDIDATE_ZIP Table:
CAND_ZIP CAND_CITY CAND_STATE
201010 Noida UP
02228 Boston US
60007 Chicago US
06389 Norwich UK
462007 Bhopal MP
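The split into CANDIDATE and CANDIDATE_ZIP is a pair of projections of CANDIDATE_DETAIL. The sketch below uses the sample rows from the CANDIDATE_DETAIL table and stores ZIP codes as strings so that leading zeros survive; the helpers project and fd_holds are illustrative names, not part of the notes.

```python
COLUMNS = ("CAND_ID", "CAND_NAME", "CAND_ZIP", "CAND_CITY", "CAND_STATE")
candidate_detail = [
    (262, "Jake", "201010", "Noida", "UP"),
    (353, "Rosa", "02228", "Boston", "US"),
    (434, "Charles", "60007", "Chicago", "US"),
    (545, "Gina", "06389", "Norwich", "UK"),
    (626, "Terry", "462007", "Bhopal", "MP"),
]


def project(rows, columns, wanted):
    """Project rows onto the wanted columns, dropping duplicate tuples."""
    idx = [columns.index(c) for c in wanted]
    seen = []
    for row in rows:
        projected = tuple(row[i] for i in idx)
        if projected not in seen:
            seen.append(projected)
    return seen


def fd_holds(rows, columns, lhs, rhs):
    """Check on the sample data that equal lhs values imply equal rhs values."""
    mapping = {}
    for pair in project(rows, columns, (lhs, rhs)):
        if mapping.setdefault(pair[0], pair[1]) != pair[1]:
            return False
    return True


candidate = project(candidate_detail, COLUMNS, ("CAND_ID", "CAND_NAME", "CAND_ZIP"))
candidate_zip = project(candidate_detail, COLUMNS, ("CAND_ZIP", "CAND_CITY", "CAND_STATE"))

print(fd_holds(candidate_detail, COLUMNS, "CAND_ZIP", "CAND_CITY"))  # True
print(candidate_zip[0])  # ('201010', 'Noida', 'UP')
```

Note that fd_holds only confirms a dependency on the given sample rows; whether CAND_ZIP really determines the city and state is a design decision about the data, not something sample data can prove.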
Answer:
TABLE_BOOK
Book ID Genre ID Price
111 564 23.99
222 842 18.99
333 564 13.99
444 179 15.99
555 842 27.99
TABLE_GENRE
Genre ID Genre Type
564 Sports
842 Travel
179 Fashion
BOYCE-CODD NORMAL FORM (BCNF):
Boyce-Codd Normal Form or BCNF is an extension to the third normal form, and is also known as
3.5 Normal Form.
As you can see, we have also added some sample data to the table.
In the table above:
One student can enrol for multiple subjects. For example, student with student_id 101, has opted for
subjects - Java & C++
For each subject, a professor is assigned to the student.
And, there can be multiple professors teaching one subject like we have for Java.
Note:
This table satisfies the 1st Normal form because all the values are atomic, column names are unique
and all the values stored in a particular column are of same domain.
This table also satisfies the 2nd Normal Form as there is no Partial Dependency.
And, there is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.
But this table is not in Boyce-Codd Normal Form.
Multivalued Dependency:
Multivalued dependency occurs when two attributes in a table are independent of each other but
both depend on a third attribute. A multivalued dependency consists of at least two attributes that are
dependent on a third attribute, which is why it always requires at least three attributes.
Example: Suppose there is a bike manufacturer company which produces two colors(white and black) of
each model every year.
BIKE_MODEL MANUF_YEAR COLOR
M2001 2008 White
M2001 2008 Black
M3001 2013 White
M3001 2013 Black
M4006 2017 White
M4006 2017 Black
Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each
other.
In this case, these two columns are said to be multivalued dependent on BIKE_MODEL. The
representation of these dependencies is shown below:
1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR
This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL multidetermines
COLOR".
FOURTH NORMAL FORM (4NF)
What is Fourth Normal Form (4NF)?
A relation will be in 4NF if it is in Boyce-Codd normal form and has no non-trivial multivalued
dependency whose determinant is not a superkey.
Example :
STUDENT:
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math,
and two hobbies, Dancing and Singing. So there is a multivalued dependency on STU_ID, which leads to
unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE:
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY:
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
EXAMPLE 2:
Below we have a college enrolment table with columns s_id, course and hobby.
s_id course hobby
1 Science Cricket
1 Maths Hockey
2 C# Cricket
2 Php Hockey
As you can see in the table above, student with s_id 1 has opted for two courses, Science and Maths,
and has two hobbies, Cricket and Hockey.
You must be thinking what problem this can lead to, right?
Well, the two records for the student with s_id 1 will give rise to two more records, as shown below,
because the student has two hobbies, and each hobby must be listed along with each of the
courses.
s_id Course Hobby
1 Science Cricket
1 Maths Hockey
1 Science Hockey
1 Maths Cricket
And, in the table above, there is no relationship between the columns course and hobby. They are
independent of each other.
So there is a multivalued dependency, which leads to unnecessary repetition of data and other anomalies as
well.
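The extra records follow a simple rule: for each s_id, every course must appear with every hobby. A minimal sketch of that rule, using the enrolment rows from the first table of this example (the function names rows_required_by_mvd and mvd_holds are illustrative):

```python
from itertools import product

# Enrolment rows: (s_id, course, hobby)
enrolment = [
    (1, "Science", "Cricket"),
    (1, "Maths", "Hockey"),
    (2, "C#", "Cricket"),
    (2, "Php", "Hockey"),
]


def rows_required_by_mvd(rows):
    """All (s_id, course, hobby) rows forced by s_id ->-> course."""
    courses, hobbies = {}, {}
    for s_id, course, hobby in rows:
        courses.setdefault(s_id, set()).add(course)
        hobbies.setdefault(s_id, set()).add(hobby)
    required = set()
    for s_id in courses:
        # Pair every course of the student with every hobby of the student.
        required |= {(s_id, c, h) for c, h in product(courses[s_id], hobbies[s_id])}
    return required


def mvd_holds(rows):
    """The MVD holds on the data when no required row is missing."""
    return set(rows) == rows_required_by_mvd(rows)


required = rows_required_by_mvd(enrolment)
print(mvd_holds(enrolment))       # False: rows are missing
print(mvd_holds(list(required)))  # True once the extra rows are added
print(sorted(row for row in required if row[0] == 1))
```

The last line lists the four rows for s_id 1, which match the expanded table above.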
We can break, or decompose, the above table into three tables, which would mean that the table is not
in 5NF.
The three decomposed tables would be:
Table R1
E_Name Company
Rohan Comp1
Harpreet Comp2
Anant Comp3
Table R2
E_Name Product
Rohan Jeans
Harpreet Jacket
Anant TShirt
Table R3
Company Product
Comp1 Jeans
Comp2 Jacket
Comp3 TShirt
Note: If the natural join of all three tables yields the relation table R, the relation will be said to have join
dependency.
Let's try to figure out whether or not R has join dependency.
Step 1- First, the natural join of R1 and R2:
Step 2- Next, let's perform the natural join of the above table with R3:
In the above example, we do get the same table R after performing the natural joins at both steps,
luckily.
Therefore, our join dependency comes out to be: {(E_Name, Company ), (E_Name, Product),
(Company, Product)}
Because R satisfies this join dependency, R is not in 5NF. That is, the natural join
of the three relations above is equal to our initial relation table R.
Non-Loss Decomposition
A decomposition is called lossless (non-loss) when the natural join of the decomposed tables
gives back exactly the original table.
In other words, we can say that
A database is in 5NF when there is no join dependency present in the table / database.
When we decompose the given table to remove redundancy in the data and then compose it again to
create the original table, we should not lose any data, and the original table should be obtained; no
loss should happen after the decomposition of the table.
Join dependency for relation R can be stated as
R=(R1 ⨝ R2 ⨝ R3 ⨝ ………Rn)
where R1,R2,R3…..Rn are sub-relation of R and ⨝ is Natural Join Operator.
Example
Let's take a table R which has 3 columns, i.e. subject, class, and teacher, where each subject can be
taught by many teachers in many classes, and a teacher can teach more than 1 subject.
Here the subject of math is taught by both teachers kartik and yash. Also yash can teach math and
science. Yash teaches math to both class 9 and class 10.
As there is redundancy in data we will decompose it into two tables R1 and R2 such that R1 will
have attribute Subject and Class and R2 will have attribute class and teacher.
Here we removed the redundancy in the table by removing the extra tuple with the same values i.e.
subject math taught in class 10. This tuple is repeated 2 times in the main table but in table R1 this
redundancy is removed.
Table R2
Class Teacher
class 10 kartik
class 9 yash
class 10 yash
Here we removed the redundancy in the table by removing the extra tuple with the same values i.e.
yash is teaching for class 10. This tuple is repeated 2 times in the main table but in table R2 this
redundancy is removed.
After combining both tables R1 and R2 we will get as mentioned below:
Here, if we compare the newly composed table from R1 and R2 with the original table, an extra tuple is
added that did not exist in the original data. This breaks the second rule of 5NF, i.e. non-loss
decomposition.
This type of unwanted tuple is known as a spurious tuple.
Here we will decompose the given table in another relation R3 where it will have 2 columns i.e.
subject and teacher.
Table R3
Subject Teacher
math yash
math kartik
science yash
Here the newly decomposed table R3 will have 3 tuples only as the repeated tuple (redundancy ) is
not added to the table. yash teaching the subject math is repeated 2 times in main table R but here it
will be added only one time resulting in removing the redundancy in the table.
Now if we compose or rejoin the tables R1, R2, and R3, the spurious tuple disappears and we get the
original table R back.
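The joins in this example can be replayed with a small natural-join helper. This is a minimal sketch: R2 and R3 are copied from the tables above, while R1 (subject, class) is an assumption reconstructed from the description of the example, and natural_join is an illustrative helper rather than a library function.

```python
def natural_join(left, right):
    """Natural join of two relations given as lists of dicts."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            # Rows combine only if they agree on every shared column.
            if all(l_row[k] == r_row[k] for k in shared):
                row = {**l_row, **r_row}
                if row not in joined:
                    joined.append(row)
    return joined


r1 = [  # assumed contents of R1, not shown in the notes
    {"subject": "math", "class": "class 10"},
    {"subject": "math", "class": "class 9"},
    {"subject": "science", "class": "class 10"},
]
r2 = [
    {"class": "class 10", "teacher": "kartik"},
    {"class": "class 9", "teacher": "yash"},
    {"class": "class 10", "teacher": "yash"},
]
r3 = [
    {"subject": "math", "teacher": "yash"},
    {"subject": "math", "teacher": "kartik"},
    {"subject": "science", "teacher": "yash"},
]

r1_r2 = natural_join(r1, r2)
all_three = natural_join(r1_r2, r3)
spurious = [row for row in r1_r2 if row not in all_three]

print(len(r1_r2), len(all_three))  # 5 4
print(spurious)
```

Under these assumptions, joining R1 and R2 produces one tuple (science, class 10, kartik) that R3 does not support, and the further join with R3 filters it out.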
Is 5NF used?
As 5NF increases the complexity of the database, it is less frequently used in the industry.
MULTIVALUED DEPENDENCY:
A multivalued dependency (MVD) exists when the presence of one or more rows in a table implies the
presence of one or more other rows in that same table. A multivalued dependency prevents fourth normal
form. A multivalued dependency involves at least three attributes of a table.
It is represented with a symbol "->->" in DBMS.
X->Y relates one value of X to one value of Y.
X->->Y (read as X multidetermines Y) relates one value of X to many values of Y.
A nontrivial MVD occurs when X->->Y and X->->Z, where Y and Z are independent of each
other. A non-trivial MVD produces redundancy.
FUNCTIONAL DEPENDENCY:
A Functional Dependency in DBMS is a fundamental concept that describes the relationship between
attributes (columns) in a table. It shows how the values in one or more attributes determine the value in
another. The functional dependency is a relationship that exists between two attributes. It typically exists
between the primary key and non-key attribute within a table.
X → Y
The left side of an FD is known as the determinant; the right side is known as the dependent.
Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.
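The rule used in this example (X → Y is trivial when Y is a subset of X) is a one-line check. A minimal sketch, with is_trivial as an illustrative name:

```python
def is_trivial(lhs, rhs):
    """X -> Y is trivial exactly when Y is a subset of X."""
    return set(rhs) <= set(lhs)


print(is_trivial({"Employee_Id", "Employee_Name"}, {"Employee_Id"}))  # True
print(is_trivial({"Employee_Id"}, {"Employee_Id"}))                   # True
print(is_trivial({"Employee_Id"}, {"Employee_Name"}))                 # False
```

The last call shows a non-trivial dependency, since Employee_Name is not part of the left side.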
Example:
ID → Name,
Name → DOB
Lossless:
o Formally, let R be a relation and R1, R2, R3 … Rn be its decomposition; the decomposition is
lossless if R1 ⨝ R2 ⨝ R3 .... ⨝ Rn = R
o The common attribute of the sub relations is a superkey of at least one of the sub relations.
Lossy:
o Formally, let R be a relation and R1, R2, R3 … Rn be its decomposition; the decomposition is
lossy if R ⊂ R1 ⨝ R2 ⨝ R3 .... ⨝ Rn
o The common attribute of the sub relations is not a superkey of any of the sub relations.
DEPENDENCY PRESERVATION
Dependency preservation in database management systems (DBMS) refers to the property of
ensuring that functional dependencies present in the original relation (table) are preserved when the relation
undergoes certain operations, such as decomposition or normalization.
Functional dependencies (FDs) describe the relationships between attributes within a relation. For
example, if in a relation R, the value of one attribute uniquely determines the value of another attribute, it is
represented as A → B, where A determines B.
When decomposing a relation into multiple smaller relations to achieve higher normal forms (like
2NF, 3NF, BCNF), it's important to ensure that the original functional dependencies are preserved. This
means that after decomposition, the functional dependencies that held in the original relation should still
hold in the decomposed relations.
Dependency preservation is important because it ensures that the information represented by the
original relation is not lost during the decomposition process. If dependency preservation is not maintained,
it can lead to anomalies and inconsistencies in the database.
There are techniques and algorithms to decompose relations while preserving dependencies, such as:
o Lossless Decomposition: Decompose the relation into smaller relations in a way that allows us to
reconstruct the original relation using join operations without losing any information.
o Dependency-Preserving Decomposition: Decompose the relation in such a way that all original
functional dependencies are preserved in the resulting relations.
o Normalization: Normalize the relation into higher normal forms (such as 2NF, 3NF, BCNF) while
ensuring dependency preservation.
By ensuring dependency preservation, we maintain data integrity and consistency within the
database schema, which is crucial for the reliability and correctness of the data stored in the database.
Krithick Raj