Unit 3 - Chapter6 - Jicm
Database design can be generally defined as a collection of tasks or processes that support
the design, development, implementation, and maintenance of an enterprise data management
system.
A properly designed database reduces maintenance cost, improves data consistency, and makes
cost-effective use of disk storage space. The design of a database therefore needs to be
thought through carefully.
Life Cycle
Various strategies can be followed while designing a schema.
In the top-down approach, we start with a general idea of what is needed for the system and
then work our way down to the more specific details of how the system will interact.
This process involves the identification of different entity types and the definition of each
entity’s attributes.
[Figure: top-down design, from the conceptual model down to the entities]
The bottom-up approach begins with the specific details and moves up to the general.
This is done by first identifying the data elements and then grouping them together in data
sets.
In other words, this method first identifies the attributes and then groups them to form
entities.
[Figure: bottom-up design, from the entities up to the conceptual model]
Anomalies in the relational model refer to inconsistencies or errors that can arise when
working with relational databases, specifically in the context of data insertion, deletion,
and modification. There are different types of anomalies that can occur in referencing and
referenced relations which can be discussed as:
1. Insertion Anomalies
2. Deletion Anomalies
Database anomalies are faults in the database caused by poor design, typically by storing
everything in one flat table.
They can be removed through the process of Normalization, which splits the database into
smaller relations and thereby reduces the anomalies.
1 C1 DBMS
2 C2 Computer Networks
1 C2 Computer Networks
Deletion and Updation anomaly: If a tuple is to be deleted or updated in the referenced
relation (TABLE 1) and the referenced attribute value is still used by the referencing attribute
in the referencing relation, the DBMS will not allow the tuple to be deleted or updated.
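The rule above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not the schema used in these notes: the table names `student` and `student_course`, the column names, and the student names are assumptions made for illustration. Only the three course rows come from the sample data above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Referenced relation (hypothetical schema).
conn.execute("CREATE TABLE student (stud_no INTEGER PRIMARY KEY, stud_name TEXT)")
# Referencing relation: stud_no refers to student.stud_no.
conn.execute(
    "CREATE TABLE student_course ("
    " stud_no INTEGER REFERENCES student(stud_no),"
    " course_no TEXT,"
    " course_name TEXT)"
)

conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(1, "Student A"), (2, "Student B"), (3, "Student C")],  # names are made up
)
conn.executemany(
    "INSERT INTO student_course VALUES (?, ?, ?)",
    [(1, "C1", "DBMS"), (2, "C2", "Computer Networks"), (1, "C2", "Computer Networks")],
)


def try_delete(stud_no):
    """Try to delete a student and report whether the DBMS allowed it."""
    try:
        conn.execute("DELETE FROM student WHERE stud_no = ?", (stud_no,))
        return "deleted"
    except sqlite3.IntegrityError:
        return "blocked"


result_referenced = try_delete(1)    # student 1 is still used by student_course
result_unreferenced = try_delete(3)  # student 3 is not referenced anywhere
print(result_referenced, result_unreferenced)
```

Student 1 cannot be deleted because two rows of the referencing relation still point to it, while student 3 can be removed freely.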
What is Decomposition?
Decomposition is the process of breaking down a bigger relation into smaller relations.
There should not be any loss of information while breaking it into smaller parts.
Lossless Decomposition
If you perform a natural JOIN operation on the decomposed relations, the resultant relation
will be the original relation that was decomposed.
For example,
Let us take ‘A’ as the relational schema, having an instance ‘a’. Consider that it is
decomposed into A1, A2, A3, …, An, with instances a1, a2, a3, …, an. If
a1 ⋈ a2 ⋈ a3 ⋈ … ⋈ an = a, then it is known as a ‘Lossless Join Decomposition’.
For example,
Now, we will decompose this into two relations – EMPLOYEE and DEPARTMENT.
EMPLOYEE relation –
EMP_ID  EMP_NAME   EMP_AGE  EMP_CITY
22      Rahul      29       Delhi
33      Shyam      35       Mumbai
46      Aryan      35       Bangalore
52      Katherine  26       Noida
60      Ali        45       Patna
DEPARTMENT Relation
120 22 Sales
138 33 Marketing
169 46 Finance
175 52 Production
178 60 Sales
Now, if you perform natural JOIN operation on EMPLOYEE and DEPARTMENT, resultant
will be –
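The natural join of the two relations can be sketched in Python as follows. The EMPLOYEE column names come from the table above. The DEPARTMENT column names (`DEPT_ID`, `EMP_ID`, `DEPT_NAME`) are assumptions, since the notes list the DEPARTMENT rows without a header, and `natural_join` is a helper written only for this illustration.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts.

    Rows are combined when they agree on every attribute the relations share.
    """
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [
        {**a, **b}
        for a in r
        for b in s
        if all(a[c] == b[c] for c in common)
    ]


EMPLOYEE = [
    {"EMP_ID": 22, "EMP_NAME": "Rahul", "EMP_AGE": 29, "EMP_CITY": "Delhi"},
    {"EMP_ID": 33, "EMP_NAME": "Shyam", "EMP_AGE": 35, "EMP_CITY": "Mumbai"},
    {"EMP_ID": 46, "EMP_NAME": "Aryan", "EMP_AGE": 35, "EMP_CITY": "Bangalore"},
    {"EMP_ID": 52, "EMP_NAME": "Katherine", "EMP_AGE": 26, "EMP_CITY": "Noida"},
    {"EMP_ID": 60, "EMP_NAME": "Ali", "EMP_AGE": 45, "EMP_CITY": "Patna"},
]

# Column names below are assumed; the source gives only the row values.
DEPARTMENT = [
    {"DEPT_ID": 120, "EMP_ID": 22, "DEPT_NAME": "Sales"},
    {"DEPT_ID": 138, "EMP_ID": 33, "DEPT_NAME": "Marketing"},
    {"DEPT_ID": 169, "EMP_ID": 46, "DEPT_NAME": "Finance"},
    {"DEPT_ID": 175, "EMP_ID": 52, "DEPT_NAME": "Production"},
    {"DEPT_ID": 178, "EMP_ID": 60, "DEPT_NAME": "Sales"},
]

joined = natural_join(EMPLOYEE, DEPARTMENT)
for row in joined:
    print(row)
```

Because EMP_ID is unique in both relations, every employee matches exactly one department row, so the join has five tuples and no employee information is lost.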
LOSSY DECOMPOSITION:
Let us consider a relation X that is decomposed into n sub relations, X1, X2, X3, …, Xn.
If we naturally join these sub relations, then we will either obtain the exact previous
relation X or we will lose information in this process. In case we do not get the same
relation X (that was decomposed) after joining X1, X2, …, Xn, it is known as a lossy
decomposition in DBMS.
In that case the natural join of the sub relations contains some extraneous tuples. In the
case of a lossy decomposition, we can see that:
X1 ⋈ X2 ⋈ X3 ⋈ … ⋈ Xn ⊃ X
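A lossy decomposition can be demonstrated on the EMPLOYEE rows used earlier. In this sketch the relation is split on EMP_AGE, which is not a key. The choice of split is an assumption made purely to provoke the problem and is not taken from the notes.

```python
# EMPLOYEE tuples: (EMP_ID, EMP_NAME, EMP_AGE, EMP_CITY)
EMPLOYEE = {
    (22, "Rahul", 29, "Delhi"),
    (33, "Shyam", 35, "Mumbai"),
    (46, "Aryan", 35, "Bangalore"),
    (52, "Katherine", 26, "Noida"),
    (60, "Ali", 45, "Patna"),
}

# Decompose on the non-key attribute EMP_AGE (a deliberately poor choice).
X1 = {(emp_id, name, age) for (emp_id, name, age, city) in EMPLOYEE}
X2 = {(age, city) for (emp_id, name, age, city) in EMPLOYEE}

# Natural join of X1 and X2 on EMP_AGE.
rejoined = {
    (emp_id, name, age, city)
    for (emp_id, name, age) in X1
    for (age2, city) in X2
    if age == age2
}

spurious = rejoined - EMPLOYEE
print(len(EMPLOYEE), len(rejoined))
print(sorted(spurious))
```

Shyam and Aryan share the age 35, so the join pairs each of them with both cities. The rejoined relation is a proper superset of EMPLOYEE, and the extra tuples make it impossible to tell which city belongs to whom.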
Properties of Decomposition
1. Lossless Decomposition
Decomposition must always be lossless, which means the information must never get lost
from a decomposed relation. This way, we get a guarantee that when joining the relations, the
join would eventually lead to the same relation in the result as it was actually decomposed.
2. Dependency Preservation
Dependency is a crucial constraint on a database, and every dependency must be satisfied by
at least one decomposed table. If {P → Q} holds, then the two attribute sets are functionally
dependent. Checking the dependency is therefore easier when both sets lie in the very same
relation. A decomposition has this property only when the functional dependencies are
maintained. In addition, this property allows us to check updates without having to compute
the natural join of the decomposed relations.
If X is a relation that has attributes P and Q, then their functional dependency is
written P -> Q (arrow sign).
The left side of this arrow is the Determinant (a determinant in a database table is any
attribute that you can use to determine the values assigned to other attribute(s) in the same
row). The right side of this arrow is the Dependent. P will be the primary key attribute, while
Q will be a dependent non-key attribute from the same table as the primary key.
Functional Dependencies in a relation are dependent on the domain of the relation. Consider
the STUDENT relation given in Table 1.
Functional Dependency Set: The functional dependency set, or FD set, of a relation is the set
of all FDs present in the relation. For example, the FD set of relation STUDENT shown in
Table 1 is:
STUD_STATE->STUD_COUNTRY}
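Whether a functional dependency holds in a particular instance of a relation can be checked mechanically: no two rows may agree on the determinant and differ on the dependent. The sketch below assumes a small, made-up STUDENT instance (Table 1 itself is not reproduced in these notes) and a hypothetical helper `fd_holds`.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in the given rows (a list of dicts)."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # The first value seen for a determinant must match all later ones.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Illustrative rows only; not the actual contents of Table 1.
STUDENT = [
    {"STUD_NO": 1, "STUD_STATE": "HARYANA", "STUD_COUNTRY": "INDIA"},
    {"STUD_NO": 2, "STUD_STATE": "PUNJAB", "STUD_COUNTRY": "INDIA"},
    {"STUD_NO": 3, "STUD_STATE": "PUNJAB", "STUD_COUNTRY": "INDIA"},
]

print(fd_holds(STUDENT, ["STUD_STATE"], ["STUD_COUNTRY"]))  # state determines country
print(fd_holds(STUDENT, ["STUD_COUNTRY"], ["STUD_STATE"]))  # country does not determine state
```

A check like this can only refute a dependency. An FD is a statement about every legal instance of the relation, so passing on one instance does not prove that it holds in general.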
Example:
roll_no  name  age
43       pqr   18
44       xyz   18
45       abc   19
Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not
a subset of the determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial
functional dependency, since age is not a subset of {roll_no, name}.
3. Multivalued Functional Dependency
Normalization
Normalization is a process of decomposing the relations into relations with fewer attributes.
What is Normalization?
The main reason for normalizing the relations is removing the anomalies. Failure to eliminate
anomalies leads to data redundancy and can cause data integrity and other problems as the
database grows. Normalization consists of a series of guidelines that helps to guide you in
creating a good database structure.
o Insertion Anomaly: An insertion anomaly occurs when one cannot insert a new tuple
into a relation due to lack of data.
o Deletion Anomaly: The delete anomaly refers to the situation where the deletion of
data results in the unintended loss of some other important data.
o Updation Anomaly: The update anomaly is when an update of a single data value
requires multiple rows of data to be updated.
Normalization works through a series of stages called normal forms. The normal forms apply
to individual relations. A relation is said to be in a particular normal form if it satisfies
the constraints of that form.
2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are
fully functionally dependent on the primary key.
4NF A relation will be in 4NF if it is in Boyce-Codd normal form and has
no multi-valued dependency.
5NF A relation is in 5NF if it is in 4NF, does not contain any join
dependency, and joining is lossless.
Advantages of Normalization
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms,
i.e., 4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious
problems.
1. First Normal Form (1NF): This is the most basic level of normalization. In 1NF, each
table cell should contain only a single value, and each column should have a unique
name. The first normal form helps to eliminate duplicate data and simplify queries.
2. Second Normal Form (2NF): 2NF eliminates redundant data by requiring that each
non-key attribute be fully dependent on the primary key. This means that each column
should depend on the whole primary key, and not on just a part of it.
3. Third Normal Form (3NF): 3NF builds on 2NF by requiring that all non-key attributes
are independent of each other. This means that each column should be directly related
to the primary key, and not to any other columns in the same table.
4. Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that ensures that
each determinant in a table is a candidate key. In other words, BCNF ensures that
each non-key attribute is dependent only on a candidate key. A candidate key is a unique
identifier for a record within a table that can be chosen as the primary key; it has the
essential characteristics required of a primary key: uniqueness and minimality.
5. Fourth Normal Form (4NF): 4NF is a further refinement of BCNF that ensures that a
table does not contain any multi-valued dependencies.
6. Fifth Normal Form (5NF): 5NF is the highest level of normalization and involves
decomposing a table into smaller tables to remove data redundancy and improve data
integrity.
A relation is in the Second Normal Form (2NF) if it:
is in 1NF;
does not contain any partial dependency (a partial dependency arises when a proper subset
of a candidate key determines a non-prime attribute).
In simpler words,
If a relation is in 1NF and all the non-key attributes are fully dependent on the
primary key, then this relation is said to be in 2NF, the Second Normal Form.
We remove the partial dependencies to normalize the given 1NF relations to 2NF
relations. In case there is a partial dependency, we remove the partially dependent attribute
from the relation and place it in a new relation together with a copy of its determinant.
Example #1
CAND_NO  SUBJECT_NO  SUBJECT_FEE
111      S1          1000
222      S2          1500
111      S4          2000
444      S3          1000
444      S1          1000
222      S5          2000
In this table, you can note that many subjects come with the same subject fee. Three things
are happening here:
The SUBJECT_FEE won’t be able to determine the values of CAND_NO or SUBJECT_NO
alone;
The SUBJECT_FEE along with CAND_NO won’t be able to determine the values of
SUBJECT_NO;
The SUBJECT_FEE along with SUBJECT_NO won’t be able to determine the values of
CAND_NO;
Thus,
We can conclude that the attribute SUBJECT_FEE is a non-prime one since it doesn’t belong
to the candidate key here, {SUBJECT_NO, CAND_NO};
But, on the other hand, SUBJECT_NO -> SUBJECT_FEE holds, meaning the SUBJECT_FEE
depends directly on the SUBJECT_NO, which is a proper subset of the candidate key. Here,
the SUBJECT_FEE is a non-prime attribute, and it depends directly on a proper subset of the
candidate key. Thus, it forms a partial dependency.
The relation mentioned here is not in 2NF.
Let us now convert it into 2NF. To do this, we will split this very table into two, where:
Table 1: CAND_NO, SUBJECT_NO and Table 2: SUBJECT_NO, SUBJECT_FEE
Table 1
CAND_NO  SUBJECT_NO
111      S1
222      S2
111      S4
444      S3
444      S1
222      S5
Table 2
SUBJECT_NO  SUBJECT_FEE
S1          1000
S2          1500
S3          1000
S4          2000
S5          2000
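The split into two tables can be reproduced with a few lines of Python, which also confirms that joining them on SUBJECT_NO gives back the original rows. The variable names are chosen for this illustration only.

```python
# Original relation: (CAND_NO, SUBJECT_NO, SUBJECT_FEE)
R = [
    (111, "S1", 1000),
    (222, "S2", 1500),
    (111, "S4", 2000),
    (444, "S3", 1000),
    (444, "S1", 1000),
    (222, "S5", 2000),
]

# Table 1 keeps the candidate key; Table 2 holds the partial dependency
# SUBJECT_NO -> SUBJECT_FEE. Sets remove the repeated (S1, 1000) pair.
table1 = sorted({(cand, subj) for cand, subj, fee in R})
table2 = sorted({(subj, fee) for cand, subj, fee in R})

# Rejoin on SUBJECT_NO: each subject has exactly one fee, so a dict lookup works.
fee_of = dict(table2)
rejoined = {(cand, subj, fee_of[subj]) for cand, subj in table1}

print(table2)
print(rejoined == set(R))
```

Each subject fee is now stored once in Table 2, so changing the fee of S1 touches one row instead of two, and the join still recovers every original tuple.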
A relation is in the third normal form if it is in the second normal form and there is no
transitive dependency for non-prime attributes.
In other words,
A relation is in Third Normal Form (3NF) if it is in First and Second Normal Form and no
non-primary-key attribute is transitively dependent on the primary key.
Example 1:
Candidate Key: {STUD_NO}
For this relation in table 4, STUD_NO -> STUD_STATE and
STUD_STATE -> STUD_COUNTRY are true.
So STUD_COUNTRY is transitively dependent on STUD_NO.
It violates the third normal form. To convert it into third normal form, we will decompose the
relation STUDENT (STUD_NO, STUD_NAME, STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
STATE    COUNTRY
HARYANA  INDIA
PUNJAB   INDIA
BCNF:
BCNF is the advanced version of 3NF. A table is in BCNF if, for every functional dependency
X -> Y, X is a super key of the table. For BCNF, the table should be in 3NF, and for every
FD the LHS must be a super key (a set of one or more attributes (columns) that allows a
tuple (row) in a relation (table) to be distinctly identified).
Example
jhansi K.Das C
subbu R.Prasad C
F: { (student, teacher) -> subject, teacher -> subject }
The above relation is not in BCNF, because in the FD (teacher -> subject), teacher is not a key.
This relation suffers from anomalies −
For example, if we try to delete the student Subbu, we will lose the information that R. Prasad
teaches C. These difficulties are caused by the fact that teacher is a determinant but not a
candidate key.
R1
Teacher   Subject
P.Naresh  database
K.Das     C
R.Prasad  C
R2
Student  Teacher
Jhansi   P.Naresh
Jhansi   K.Das
Subbu    P.Naresh
Subbu    R.Prasad
All the anomalies that were present in R are now removed in the above two relations.
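The BCNF condition, that the left-hand side of every non-trivial FD is a super key, can be tested with an attribute-closure computation. The following is a sketch under the FDs stated in this example. The function names `closure` and `bcnf_violations` are hypothetical helpers, and the check assumes the listed FDs are the complete set for each relation.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                      # skip trivial FDs
        and not set(attributes) <= closure(lhs, fds)     # lhs is not a super key
    ]


R = ("student", "teacher", "subject")
F = [(("student", "teacher"), ("subject",)), (("teacher",), ("subject",))]

R1 = ("teacher", "subject")
F1 = [(("teacher",), ("subject",))]

R2 = ("student", "teacher")
F2 = []  # no non-trivial FDs remain in R2

print(bcnf_violations(R, F))
print(bcnf_violations(R1, F1), bcnf_violations(R2, F2))
```

For R the closure of {teacher} is only {teacher, subject}, so teacher -> subject is reported as a violation. In R1 the same FD has teacher as a key, and R2 has no non-trivial FDs, so neither decomposed relation is reported.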
o A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining
is lossless.
o 5NF is satisfied when all the tables are broken into as many tables as possible in
order to avoid redundancy.
When we decompose the given table to remove redundancy in the data and then
compose it again to create the original table, we should not lose any data, and the
original table should be obtained, as no loss should happen after the decomposition
of the table.
Example
Let us take a table R which has 3 columns, i.e. subject, class, and teacher, where each
subject can be taught by many teachers in many classes, and a teacher can teach more
than 1 subject.
Subject Class Teacher
Here the subject of math is taught by both teachers kartik and yash. Also yash can
teach math and science. Yash teaches math to both class 9 and class 10.
As there is redundancy in data we will decompose it into two tables R1 and R2 such
that R1 will have attribute Subject and Class and R2 will have attribute class and
teacher.
Table R1
Subject Class
math class 9
math class 10
science class 10
Here we removed the redundancy in the table by removing the extra tuple with the
same values i.e. subject math taught in class 10. This tuple is repeated 2 times in the
main table but in table R1 this redundancy is removed.
Table R2
Class Teacher
class 10 kartik
class 9 yash
class 10 yash
Here we removed the redundancy in the table by removing the extra tuple with
the same values i.e. yash is teaching for class 10. This tuple is repeated 2 times in
the main table but in table R2 this redundancy is removed.
After combining both tables R1 and R2 we will get as mentioned below:
Here, if we compare the newly composed table from R1 and R2 with the original
table, an extra tuple has been added that did not exist in the original data. This breaks the
second rule of 5NF, i.e. non-loss decomposition.
This type of unwanted tuple is known as a spurious tuple.
Here we will decompose the given table in another relation R3 where it will have
2 columns i.e. subject and teacher.
Table R3
Subject Teacher
math yash
math kartik
science yash
Here the newly decomposed table R3 will have 3 tuples only as the repeated tuple
(redundancy) is not added to the table. yash teaching the subject math is repeated 2
times in main table R but here it will be added only one time resulting in removing the
redundancy in the table.
Now if we compose or rejoin the tables R1, R2, and R3 we will get
Table (R1 ⨝ R2⨝ R3)
Now if we compare the re-composed table with the original table, there is no loss of
data.
Here the natural join of all three tables R1, R2 and R3 resulted in the table R.
After the natural join, the original table is retained as it is. There is no loss of
data.
The given tables R1, R2 and R3 are in the Fifth Normal Form (5NF).
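The whole 5NF example can be replayed in Python. The rows of R below are an assumption: they are reconstructed from the description and from the projections R1, R2 and R3 shown above, since the original table is not reproduced in these notes.

```python
# Reconstructed rows of R: (subject, class, teacher)
R = {
    ("math", "class 9", "yash"),
    ("math", "class 10", "kartik"),
    ("math", "class 10", "yash"),
    ("science", "class 10", "yash"),
}

# The three projections used in the text.
R1 = {(s, c) for s, c, t in R}   # subject, class
R2 = {(c, t) for s, c, t in R}   # class, teacher
R3 = {(s, t) for s, c, t in R}   # subject, teacher

# Joining only R1 and R2 (on class) produces a spurious tuple.
join12 = {(s, c, t) for s, c in R1 for c2, t in R2 if c == c2}
spurious = join12 - R

# Joining the result with R3 (on subject and teacher) removes it again.
join123 = {(s, c, t) for s, c, t in join12 if (s, t) in R3}

print(sorted(spurious))
print(join123 == R)
```

The join of R1 and R2 invents the tuple (science, class 10, kartik), because class 10 links science to every class 10 teacher. R3 records that kartik does not teach science, so the three-way join discards that tuple and returns exactly R.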