UNIT III
RELATIONAL DATABASE DESIGN
Basic Concepts
2 Al Jamia Arts & Science College
A database is a collection of logically related records. A relational database organizes its data
in two-dimensional tables (also called relations). A table is made up of rows (also called records
or tuples) and columns (also called fields or attributes). A database table is similar to a
spreadsheet. However, the relationships that can be created among the tables enable a relational
database to store large amounts of data efficiently and to retrieve selected data effectively. A
language called SQL (Structured Query Language) was developed to work with relational databases.
The design of a relational database aims to:
• Eliminate Data Redundancy: the same piece of data should not be stored in more than one
place, because duplicate data not only wastes storage space but also easily leads to
inconsistencies.
• Ensure Data Integrity and Accuracy
Concepts:
Domains and attributes:
Keys:
Tuples:
Integrity rules:
Semantics
Anomalies
Functional dependency, Decomposition, Normalization
Semantics
Semantics specifies how to interpret the attribute values stored in a tuple of the relation; in
other words, how the attribute values in a tuple relate to one another.
Anomalies
Types:
1. Insertion Anomaly
2. Update Anomaly
3. Deletion Anomaly
Insertion Anomaly
For a new admission, the data of the student cannot be inserted until the student opts for a
branch, unless the branch information is set to NULL.
Also, if we have to insert the data of 100 students of the same branch, the branch information
will be repeated for all 100 students.
Update Anomaly
What if Mr. X leaves the college, or is no longer the HOD of the computer science department? In
that case all the student records will have to be updated, and if we miss any record by mistake,
it will lead to data inconsistency. This is the Update anomaly.
Deletion Anomaly
In our Student table, two different kinds of information are kept together: Student information
and Branch information. Hence, at the end of the academic year, if the student records are
deleted, we will also lose the branch information. This is the Deletion anomaly.
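The three anomalies can be reproduced with a small sketch. The table layout (student_id, name, branch, hod) and all row values below are assumptions made for illustration, since the Student table itself is not shown in these notes. Python's built-in sqlite3 module is used only as a convenient in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical single-table design that mixes student and branch information.
cur.execute(
    "CREATE TABLE student ("
    "student_id INTEGER PRIMARY KEY, name TEXT, branch TEXT, hod TEXT)"
)
cur.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Asha", "CSE", "Mr. X"), (2, "Ravi", "CSE", "Mr. X")],
)

# Insertion anomaly: a student without a branch can only be stored with NULLs.
cur.execute("INSERT INTO student VALUES (3, 'Meera', NULL, NULL)")

# Update anomaly: the new HOD must be written to every CSE row.
# One row is missed here on purpose, which leaves two different HODs for CSE.
cur.execute("UPDATE student SET hod = 'Mr. Y' WHERE branch = 'CSE' AND student_id = 1")
hods = cur.execute("SELECT DISTINCT hod FROM student WHERE branch = 'CSE'").fetchall()
inconsistent = len(hods) > 1

# Deletion anomaly: deleting the student rows also removes every trace of the branch.
cur.execute("DELETE FROM student")
branches_left = cur.execute("SELECT COUNT(DISTINCT branch) FROM student").fetchone()[0]

print(inconsistent)   # True
print(branches_left)  # 0
```

Keeping branch information in a table of its own would avoid all three problems, which is what decomposition and normalization aim for.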
Cause of Anomalies
Fixing Anomalies
Anomalies can be corrected by
Decomposition
Normalization
FUNCTIONAL DEPENDENCIES
A functional dependency exists when one attribute of a table uniquely determines another attribute
of the same table. If column A of a table uniquely identifies column B of the same table, this is
written A->B (attribute B is functionally dependent on attribute A).
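As a minimal sketch, the definition can be tested against a set of rows: a dependency is violated as soon as two rows agree on the left-hand side but differ on the right-hand side. The function name fd_holds and the sample rows are hypothetical and are not taken from these notes.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows.
students = [
    {"s_id": 1, "name": "Asha", "branch": "CSE"},
    {"s_id": 2, "name": "Ravi", "branch": "CSE"},
    {"s_id": 3, "name": "Meera", "branch": "ECE"},
]

print(fd_holds(students, ["s_id"], ["name"]))    # True
print(fd_holds(students, ["branch"], ["name"]))  # False
```

A check like this can only refute a dependency for the data at hand. Whether a dependency really holds is a property of the meaning of the attributes, not of one sample.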
Multi-Valued Dependency
A table is said to have a multi-valued dependency if all of the following conditions are true:
1. For a dependency A → B, multiple values of B exist for a single value of A.
2. The table has at least 3 columns.
3. For a relation R(A,B,C), if there is a multi-valued dependency between A and B, then B and C
are independent of each other.
Example
Consider a college enrolment table with columns s_id, course and hobby.
s_id   course    hobby
1      Science   Cricket
1      Maths     Hockey
2      C#        Cricket
2      Php       Hockey
In this table, the student with s_id 1 has opted for two courses, Science and Maths, and has two
hobbies, Cricket and Hockey.
What problem can this lead to? Because one student has two hobbies, each hobby must be specified
along with each course, so the two records for the student with s_id 1 give rise to two more
records:
s_id   course    hobby
1      Science   Cricket
1      Maths     Hockey
1      Science   Hockey
1      Maths     Cricket
So there is a multi-valued dependency, which leads to unnecessary repetition of data and to other
anomalies as well.
ProjectID   ProjectCost
001         1000
002         5000
<EmployeeProject>
<StudentProject>
DECOMPOSITION
Decomposition is the process of breaking a relation down into parts.
It replaces a relation with a collection of smaller relations, that is, it splits one table into
multiple tables in the database.
A decomposition should always be lossless, which means that the information in the original
relation can be accurately reconstructed from the decomposed relations.
An improper decomposition may lead to problems such as loss of information.
Properties of Decomposition
Lossless Decomposition
o If no information is lost from the relation that is decomposed, then the decomposition is
lossless.
o A lossless decomposition guarantees that joining the decomposed relations yields the same
relation that was decomposed.
o A decomposition is lossless if the natural join of all the decomposed relations gives back the
original relation.
Example:
EMPLOYEE_DEPARTMENT table:
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
EMP_ID   EMP_NAME    EMP_AGE   EMP_CITY
22       Denim       28        Mumbai
33       Alina       25        Delhi
46       Stephan     30        Bangalore
52       Katherine   36        Mumbai
60       Jack        40        Noida
DEPARTMENT table:
DEPT_ID   EMP_ID   DEPT_NAME
827       22       Sales
438       33       Marketing
869       46       Finance
575       52       Production
678       60       Testing
Now, when these two relations are joined on the common column "EMP_ID", then the resultant
relation will look like:
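Employee ⋈ Department
EMP_ID   EMP_NAME    EMP_AGE   EMP_CITY    DEPT_ID   DEPT_NAME
22       Denim       28        Mumbai      827       Sales
33       Alina       25        Delhi       438       Marketing
46       Stephan     30        Bangalore   869       Finance
52       Katherine   36        Mumbai      575       Production
60       Jack        40        Noida       678       Testing
The join pairs every employee row with its department row, so no information is lost and the
decomposition is lossless.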
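The lossless-join property can be checked mechanically. The sketch below uses the EMPLOYEE and DEPARTMENT rows from the example. The column names other than EMP_ID are assumed from the meaning of the values, and the helpers natural_join and project are hypothetical names written for this illustration.

```python
employee = [
    {"EMP_ID": 22, "EMP_NAME": "Denim", "EMP_AGE": 28, "EMP_CITY": "Mumbai"},
    {"EMP_ID": 33, "EMP_NAME": "Alina", "EMP_AGE": 25, "EMP_CITY": "Delhi"},
    {"EMP_ID": 46, "EMP_NAME": "Stephan", "EMP_AGE": 30, "EMP_CITY": "Bangalore"},
    {"EMP_ID": 52, "EMP_NAME": "Katherine", "EMP_AGE": 36, "EMP_CITY": "Mumbai"},
    {"EMP_ID": 60, "EMP_NAME": "Jack", "EMP_AGE": 40, "EMP_CITY": "Noida"},
]
department = [
    {"DEPT_ID": 827, "EMP_ID": 22, "DEPT_NAME": "Sales"},
    {"DEPT_ID": 438, "EMP_ID": 33, "DEPT_NAME": "Marketing"},
    {"DEPT_ID": 869, "EMP_ID": 46, "DEPT_NAME": "Finance"},
    {"DEPT_ID": 575, "EMP_ID": 52, "DEPT_NAME": "Production"},
    {"DEPT_ID": 678, "EMP_ID": 60, "DEPT_NAME": "Testing"},
]


def natural_join(r, s):
    """Join two non-empty lists of dict rows on all column names they share."""
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]


def project(rows, attrs):
    """Keep only the given columns and drop duplicate rows."""
    out = []
    for row in rows:
        p = {a: row[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


joined = natural_join(employee, department)

# Decompose the joined relation again and re-join the parts.
part_1 = project(joined, list(employee[0]))
part_2 = project(joined, list(department[0]))
rejoined = natural_join(part_1, part_2)

print(len(joined))         # 5
print(rejoined == joined)  # True
```

Because re-joining the two projections returns exactly the rows that were decomposed, this decomposition on EMP_ID is lossless for the data shown.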
Lossy Decomposition
As the name suggests, when a relation is decomposed into two or more relational schemas in a
lossy way, some information is lost when the original relation is retrieved. If information is
lost from the relation that is decomposed, then the decomposition is a lossy decomposition.
Let us see an example:
<EmpInfo>
<EmpDetails>
<DeptDetails>
Dept_ID   Dept_Name
Dpt1      Operations
Dpt2      HR
Dpt3      Finance
Now, the above tables cannot be joined back correctly, since Emp_ID is not part of
the DeptDetails relation and the two tables share no column. Therefore, this decomposition is lossy.
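A short sketch shows what goes wrong. When two relations share no column, their natural join degenerates into a Cartesian product, which pairs every employee with every department. The DeptDetails values come from the example above. The EmpInfo rows and employee names are assumptions for illustration only.

```python
from itertools import product

# Hypothetical original relation EmpInfo(Emp_ID, Emp_Name, Dept_ID, Dept_Name).
emp_info = {
    ("E001", "Jacob", "Dpt1", "Operations"),
    ("E002", "Henry", "Dpt2", "HR"),
    ("E003", "Tom", "Dpt3", "Finance"),
}

# Decomposition into two relations that share no column.
emp_details = {(e, n) for (e, n, _, _) in emp_info}
dept_details = {(d, dn) for (_, _, d, dn) in emp_info}

# With no common column, the natural join is the Cartesian product.
rejoined = {e + d for e, d in product(emp_details, dept_details)}

# Tuples that appear after the join but were never in the original relation.
spurious = rejoined - emp_info

print(len(rejoined), len(spurious))  # 9 6
```

The six spurious tuples mean the original assignment of employees to departments can no longer be recovered, which is exactly the information a lossy decomposition loses.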
Dependency Preserving
Dependency preservation is an important constraint on a decomposition.
In a dependency-preserving decomposition, every dependency of the original relation must be
satisfied by at least one decomposed table.
If a relation R is decomposed into relations R1 and R2, then each dependency of R must either be
a part of R1 or R2, or be derivable from the combination of the functional dependencies of R1
and R2.
For example, suppose there is a relation R(A, B, C, D) with the functional dependency set
{A->BC}. The relation R is decomposed into R1(ABC) and R2(AD), which is dependency
preserving because the FD A->BC is a part of relation R1(ABC).
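The R(A, B, C, D) example can be checked with a short sketch. It uses the textbook attribute-closure test for dependency preservation, which is one common way to do this check and is not prescribed by these notes. The function names and the second, non-preserving example are assumptions added for contrast.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fds, parts):
    """Return True if every FD can be enforced using the decomposed relations alone."""
    for lhs, rhs in fds:
        reach = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # Attributes derivable while staying inside this one relation.
                gained = closure(reach & part, fds) & part
                if not gained <= reach:
                    reach |= gained
                    changed = True
        if not rhs <= reach:
            return False
    return True


# Example from the notes: R(A, B, C, D), A -> BC, decomposed into R1(ABC), R2(AD).
fds = [(frozenset("A"), frozenset("BC"))]
print(preserves(fds, [set("ABC"), set("AD")]))  # True

# Hypothetical contrast: A -> B, B -> C with the split (AB), (AC) loses B -> C.
fds_2 = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(preserves(fds_2, [set("AB"), set("AC")]))  # False
```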
NORMALIZATION
Normalization is the process of organizing the data in the database. It is used to minimize
redundancy in a relation or set of relations, and to eliminate undesirable characteristics such
as Insertion, Update and Deletion Anomalies. Normalization divides a larger table into smaller
tables and links them using relationships. The normal forms are used to reduce redundancy in the
database tables.
Second Normal Form (2NF)
To be in second normal form, a relation must be in first normal form and must not contain any
partial dependency. That is, no non-prime attribute (an attribute which is not part of any
candidate key) may depend on a proper subset of any candidate key of the table.
In second normal form, every non-prime attribute is fully functionally dependent on the candidate
key. That is, if X → A holds, then there must not be any proper subset Y of X for which Y → A
also holds.
In the Student_Project relation, the prime key attributes are Stu_ID and Proj_ID. According to
the rule, the non-key attributes Stu_Name and Proj_Name must depend on both together and not on
either prime key attribute individually. But Stu_Name can be identified by Stu_ID alone and
Proj_Name can be identified by Proj_ID alone. This is called partial dependency, which is not
allowed in Second Normal Form.
To remove it, the relation is broken in two, so that Stu_Name is stored with Stu_ID and Proj_Name
is stored with Proj_ID. No partial dependency then remains.
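As a rough sketch, partial dependencies can be listed by looking for functional dependencies whose left-hand side is a proper subset of the key and whose right-hand side contains a non-key attribute. The function name is hypothetical, and the sketch assumes a single candidate key, as in the Student_Project example.

```python
def partial_dependencies(key, fds):
    """List FDs whose left side is a proper subset of key and that determine non-key attributes."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        non_key = set(rhs) - key
        # "<" on sets tests for a proper subset.
        if frozenset(lhs) < key and non_key:
            found.append((sorted(lhs), sorted(non_key)))
    return found


# Student_Project(Stu_ID, Proj_ID, Stu_Name, Proj_Name) with key {Stu_ID, Proj_ID}.
fds = [({"Stu_ID"}, {"Stu_Name"}), ({"Proj_ID"}, {"Proj_Name"})]
print(partial_dependencies({"Stu_ID", "Proj_ID"}, fds))
# [(['Stu_ID'], ['Stu_Name']), (['Proj_ID'], ['Proj_Name'])]

# After the split, Stu_ID is the whole key of its relation, so nothing is partial.
print(partial_dependencies({"Stu_ID"}, [({"Stu_ID"}, {"Stu_Name"})]))  # []
```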
Third Normal Form (3NF)
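A table is said to be in the Third Normal Form when it is in Second Normal Form and no non-prime
attribute is transitively dependent on the key. Equivalently, for every non-trivial functional
dependency X → A, either X is a superkey or A is a prime attribute.
In the Student_detail relation, Stu_ID is the key and the only prime attribute. City can be
identified by Stu_ID as well as by Zip itself. Zip is not a superkey and City is not a prime
attribute. Additionally, Stu_ID → Zip → City, so there exists a transitive dependency.
To bring this relation into third normal form, we break it into two relations: one keyed by
Stu_ID that keeps Zip, and one keyed by Zip that holds City.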
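The 3NF condition (X is a superkey or A is a prime attribute) can be tested with a small sketch. Only the attributes named in the text (Stu_ID, Zip, City) are used. The function names are hypothetical, and the candidate keys are supplied by hand rather than computed.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples of names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations_3nf(all_attrs, candidate_keys, fds):
    """Return (lhs, attribute) pairs where lhs is no superkey and attribute is not prime."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(all_attrs)
        for a in rhs:
            if a not in lhs and not is_superkey and a not in prime:
                bad.append((lhs, a))
    return bad


attrs = ("Stu_ID", "Zip", "City")
keys = [("Stu_ID",)]
fds = [(("Stu_ID",), ("Zip",)), (("Zip",), ("City",))]
print(violations_3nf(attrs, keys, fds))  # [(('Zip',), 'City')]

# After the split, Zip is the key of its own relation and the violation disappears.
print(violations_3nf(("Zip", "City"), [("Zip",)], [(("Zip",), ("City",))]))  # []
```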
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form is a stricter version of the Third Normal Form. It deals with a certain
type of anomaly that is not handled by 3NF. A 3NF table which does not have multiple overlapping
candidate keys is guaranteed to be in BCNF. For a table to be in BCNF, the following conditions
must be satisfied:
1. It must be in the Third Normal Form.
2. For every non-trivial functional dependency A → B, A must be a super key.
Below we have a college enrolment table with columns student_id, subject and professor.
student_id   subject   professor
103          C#        P.Chash
In the table above:
• One student can enrol for multiple subjects. For example, the student with student_id 101 has
opted for the subjects Java and C++.
• For each subject, a professor is assigned to the student.
• There can be multiple professors teaching one subject, as is the case for Java.
Here student_id and subject together form the primary key, because using student_id and subject
we can find all the columns of the table.
One more important point to note is that one professor teaches only one subject, but one subject
may have two different professors. Hence, there is a dependency between subject and professor,
where subject depends on the professor name.
This table satisfies the 1st Normal Form because all the values are atomic, column names are
unique and all the values stored in a particular column are of the same domain.
This table also satisfies the 2nd Normal Form as there is no Partial Dependency. And there is
no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.
But this table is not in Boyce-Codd Normal Form, because in the dependency professor → subject
the determinant professor is not a super key. To make this relation (table) satisfy BCNF, we
decompose it into two tables, a student table and a professor table.
Student Table
student_id   p_id
101          1
101          2
and so on...
Professor Table
p_id   professor   subject
1      P.Java      Java
2      P.Cpp       C++
and so on...
Now both relations satisfy Boyce-Codd Normal Form.
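The BCNF condition on the original enrolment table can be checked with a minimal sketch: every non-trivial functional dependency must have a super key on its left-hand side. The two dependencies below are the ones described in the text. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples of names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial FDs whose left side does not determine every attribute."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]


attrs = ("student_id", "subject", "professor")
fds = [
    (("student_id", "subject"), ("professor",)),  # the primary key determines the professor
    (("professor",), ("subject",)),               # one professor teaches only one subject
]
print(bcnf_violations(attrs, fds))  # [(('professor',), ('subject',))]
```

The only violation reported is professor → subject, which is the dependency that the decomposition into a student table and a professor table is meant to isolate.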
Fourth Normal Form (4NF)
Below we have a college enrolment table with columns s_id, course and hobby.
s_id   course    hobby
1      Science   Cricket
1      Maths     Hockey
2      C#        Cricket
2      Php       Hockey
In the table above, the student with s_id 1 has opted for two courses, Science and Maths, and
has two hobbies, Cricket and Hockey.
What problem can this lead to? Because one student has two hobbies, each hobby must be specified
along with each course, so the two records for the student with s_id 1 give rise to two more
records, as shown below.
s_id   course    hobby
1      Science   Cricket
1      Maths     Hockey
1      Science   Hockey
1      Maths     Cricket
In the table above, there is no relationship between the columns course and hobby. They are
independent of each other.
So there is a multi-valued dependency, which leads to unnecessary repetition of data and to other
anomalies as well.
To make this relation satisfy the Fourth Normal Form, it can be decomposed into two tables.
CourseOpted Table
s_id   course
1      Science
1      Maths
2      C#
2      Php
Hobbies Table
s_id   hobby
1      Cricket
1      Hockey
2      Cricket
2      Hockey
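The two decomposed tables can be joined back on s_id to see that nothing is lost. The sketch below uses the rows of the CourseOpted table and of the hobby table shown above. The variable and function names are chosen for illustration.

```python
# Rows of the two decomposed tables, as listed in the notes.
course_opted = {(1, "Science"), (1, "Maths"), (2, "C#"), (2, "Php")}
hobbies = {(1, "Cricket"), (1, "Hockey"), (2, "Cricket"), (2, "Hockey")}


def join_on_s_id(course_rows, hobby_rows):
    """Natural join of (s_id, course) and (s_id, hobby) rows on s_id."""
    return {(s, c, h) for (s, c) in course_rows for (s2, h) in hobby_rows if s == s2}


enrolment = join_on_s_id(course_opted, hobbies)
student_1 = sorted(row for row in enrolment if row[0] == 1)

print(len(enrolment))  # 8
print(student_1)
# [(1, 'Maths', 'Cricket'), (1, 'Maths', 'Hockey'), (1, 'Science', 'Cricket'), (1, 'Science', 'Hockey')]
```

Each course and each hobby is stored once per student in the decomposed tables, while the join regenerates every course and hobby combination, including the four rows for the student with s_id 1.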
Fifth Normal Form (5NF)
A relation R is in 5NF if and only if every join dependency in R is implied by the candidate keys
of R. A relation decomposed into two relations must have the lossless-join property, which
ensures that no spurious or extra tuples are generated when the relations are reunited through a
natural join. In other words, a relation is in Fifth Normal Form (5NF) if it is in 4NF and cannot
be decomposed any further into smaller tables without loss of information.