Chapter Four
The whole purpose of database design is to create an accurate representation of the data, the relationships between the data, and the business constraints pertinent to the organization.
Therefore, one can use one or more techniques to design a database.
One such technique is the E-R model.
In this chapter we use another technique, known as "Normalization", which brings a different emphasis to database design: it defines the structure of a database with a specific data model.
Logical design is the process of constructing a model of the information used in an enterprise based on a specific data model (e.g. relational, hierarchical, network or object), but independent of a particular DBMS and other physical considerations.
The focus in logical database design is the normalization process.
Normalization process:
• A collection of rules (tests) applied to relations to obtain a minimal, non-redundant set of attributes.
• Discovers new entities in the process.
• Revises attributes based on the rules and the discovered entities.
• Works by examining the relationships between attributes, known as functional dependencies.
The purpose of normalization is to find a suitable set of relations that supports the data requirements of an enterprise.
A suitable set of relations has the following characteristics:
• A minimal number of attributes to support the data requirements of the enterprise.
• Attributes with a close logical relationship (functional dependency) are placed in the same relation.
• Minimal redundancy, with each attribute represented only once, with the exception of attributes that form all or part of a foreign key, which are used for joining related tables.
The first step before applying the rules of the relational data model is converting the conceptual design to a form suitable for the relational logical model, which is a set of tables.
Converting ER Diagram to Relational Tables
Example to illustrate the major rules in mapping ER to relational schema
The following ER diagram has been designed to represent the requirement of an organization to capture Employee, Department and Project information. An employee works for a department, and an employee might be assigned to manage a department. Employees might participate in different projects within the organization. An employee might also be assigned to lead a project, in which case the starting and ending dates of his/her project leadership and the bonus will be registered.
After we have drawn the ER diagram, the next step is to map it into a relational schema, so that the rules of the relational data model can be tested against each relation.
The mapping is done first for the entities and then for the relationships, based on the mapping rules. The mapping has been done as follows.
Mapping EMPLOYEE Entity:
There will be an Employee table with EID, Salary, FName and LName as its columns.
The composite attribute Name is dropped, since its component attributes (FName and LName) are columns in the Employee table. The Tel attribute becomes a new table, as it is multi-valued.
[Tables: Employee, Telephone]
Mapping the MANAGES Relationship:
As the relationship has one-to-one cardinality, the PK or CK of one of the tables can be posted into the other.
The recommendation, however, is to post the PK or CK of the partial participant (Employee) to the total participant (Department).
This requires adding the PK of Employee (EID) to the Department table as a foreign key.
We can give the foreign key another name, MEID, meaning "manager's employee id". This increases the degree of the Department table by one.
[Table: Department]
Mapping the WORKSFOR Relationship:
As the relationship has one-to-many cardinality, the PK or CK of the "one" side (the Department table) should be posted to the "many" side (the Employee table).
This requires adding the PK of Department (DID) to the Employee table as a foreign key.
We can give the foreign key another name, EDID, meaning "employee's department id". This increases the degree of the Employee table by one.
[Table: Employee]
Mapping the PARTICIPATES Relationship:
• As the relationship has many-to-many cardinality, we need to create a new table and post the PK or CK of the Employee and Project tables into the new table.
• We can give the new table a descriptive name such as Emp_Partc_Project, meaning "Employee participates in a project".
[Table: Emp_Partc_Project]
Mapping the LEADS Relationship:
As the relationship is an associative entity, we create a table for the associative entity, into which the PKs of the Employee and Project tables are posted as foreign keys.
The new table will also have the attributes of the associative entity as columns.
We can give the new table a descriptive name such as Emp_Lead_Project, meaning "Employee leads a project".
[Table: Emp_Lead_Project]
At the end of the mapping we will have the following relational schema (tables) for the logical database design phase.
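The mapping rules above can be sketched as table definitions. The following is a minimal sketch using Python's built-in sqlite3 module, not the schema from the original diagram: the text only names EID, Salary, FName, LName, Tel, DID, MEID, EDID and the two new table names, so the column types and the names PID, DName, PName, StartDate, EndDate and Bonus are assumptions made for illustration.

```python
import sqlite3

# Assumed for illustration: column types and the names PID, DName, PName,
# StartDate, EndDate, Bonus. The rest of the names come from the mapping text.
SCHEMA = """
CREATE TABLE Department (
    DID   INTEGER PRIMARY KEY,
    DName TEXT,
    MEID  INTEGER REFERENCES Employee(EID)    -- MANAGES (1:1): manager's employee id
);
CREATE TABLE Employee (
    EID    INTEGER PRIMARY KEY,
    FName  TEXT,
    LName  TEXT,
    Salary REAL,
    EDID   INTEGER REFERENCES Department(DID) -- WORKSFOR (1:M): employee's department id
);
CREATE TABLE Telephone (                      -- multi-valued attribute Tel
    EID INTEGER REFERENCES Employee(EID),
    Tel TEXT,
    PRIMARY KEY (EID, Tel)
);
CREATE TABLE Project (
    PID   INTEGER PRIMARY KEY,
    PName TEXT
);
CREATE TABLE Emp_Partc_Project (              -- PARTICIPATES (M:N)
    EID INTEGER REFERENCES Employee(EID),
    PID INTEGER REFERENCES Project(PID),
    PRIMARY KEY (EID, PID)
);
CREATE TABLE Emp_Lead_Project (               -- LEADS (associative entity)
    EID       INTEGER REFERENCES Employee(EID),
    PID       INTEGER REFERENCES Project(PID),
    StartDate TEXT,
    EndDate   TEXT,
    Bonus     REAL,
    PRIMARY KEY (EID, PID)
);
"""

def build_schema():
    """Create the mapped tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def table_names(conn):
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
    return [r[0] for r in rows]

def degree(conn, table):
    """Degree of a table = its number of columns."""
    return len(conn.execute(f"PRAGMA table_info({table})").fetchall())
```

Calling `degree(build_schema(), "Employee")` counts the EDID foreign key along with the four original columns, which is the change in degree the WORKSFOR mapping describes.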
After converting the ER diagram into table form, the next phase is implementing the process of normalization, which is a collection of rules each table should satisfy.
Normalization
A relational database is merely a collection of data, organized in a particular manner.
Codd, the father of the relational database approach, created a series of rules (tests) called normal forms that help define that organization.
One of the best ways to determine what information should be stored in a database is to clarify what questions will be asked of it and what data would be included in the answers.
Database normalization is a series of steps followed to obtain a database design that allows for consistent storage and efficient access of data in a relational database.
These steps reduce data redundancy and the risk of data becoming inconsistent.
NORMALIZATION is the process of identifying the logical associations between data items and designing a database that represents such associations without suffering from the update anomalies, which are:
1. Insertion Anomalies
2. Deletion Anomalies
3. Modification Anomalies
Normalization may reduce system performance, since data will be cross-referenced from many tables.
Thus denormalization is sometimes used to improve performance, at the cost of reduced consistency guarantees.
A normalization is normally considered "good" if it is a lossless decomposition.
All the normalization rules work toward removing the update anomalies that may otherwise arise during data manipulation after implementation.
The problems that can occur in an insufficiently normalized table are called update anomalies, and include the following.
1. Insertion anomalies
An "insertion anomaly" is a failure to place information about a new entry into all the places in the database where information about that new entry needs to be stored.
2. Deletion anomalies
A "deletion anomaly" is a failure to remove information about an existing database entry when it is time to remove that entry.
Additionally, deleting one piece of data may result in the loss of other information.
In a properly normalized database, information about an old, to-be-gotten-rid-of entry needs to be deleted from only one place in the database; in an inadequately normalized database, information about that old entry may need to be deleted from more than one place, and, human fallibility being what it is, some of the needed additional deletions may be missed.
3. Modification anomalies
A modification of a database involves changing the value of some attribute of a table.
In a properly normalized table, whatever information the user modifies needs to be changed in only one place, and the change then takes effect wherever that information is used.
To avoid update anomalies in a given table, the solution is to decompose it into smaller tables based on the rules of normalization.
However, the decomposition should have two important properties.
a. The lossless-join property ensures that any instance of the original relation can be reconstructed from the corresponding instances of the smaller relations.
b. The dependency-preservation property implies that each constraint on the original relation can be maintained by enforcing some constraints on the smaller relations, i.e. we don't have to perform a join operation to check whether a constraint on the original relation is violated.
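The lossless-join property can be checked on a concrete instance by projecting a relation onto the two attribute sets and joining the projections back. The sketch below is only an illustration: the relation, its attribute names and its rows are hypothetical, and a check on one instance can show that a decomposition is lossy but cannot prove it is lossless for every legal instance.

```python
def project(rows, attrs):
    """Projection: keep only attrs from each row, dropping duplicates."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}

def natural_join(left, right):
    """Join two projected relations on the attribute names they share."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined

def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {tuple(sorted(r.items())) for r in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

# Hypothetical instance: which employee works on which project.
works_on = [
    {"EmpID": 1, "EmpName": "Abebe", "ProjNo": 10},
    {"EmpID": 2, "EmpName": "Sara",  "ProjNo": 10},
    {"EmpID": 1, "EmpName": "Abebe", "ProjNo": 20},
]

# Splitting on EmpID (which determines EmpName) loses nothing.
good = is_lossless(works_on, ["EmpID", "EmpName"], ["EmpID", "ProjNo"])
# Splitting on ProjNo produces spurious tuples when joined back.
bad = is_lossless(works_on, ["EmpID", "ProjNo"], ["ProjNo", "EmpName"])
```

In this instance the first split is lossless because the shared attribute EmpID determines EmpName; the second split shares only ProjNo, which determines neither side, so the join invents rows such as employee 2 being named Abebe.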
The purpose of normalization is to reduce the chances for
anomalies to occur in a database.
Example of problems related to anomalies
Deletion Anomalies:
If the employee with ID 16 is deleted, then every piece of information about the skill C++ and its skill type is deleted from the database. We will then have no information about C++ and its skill type.
Insertion Anomalies:
What if we have a new employee with a skill called Pascal?
We cannot decide whether Pascal is allowed as a value for skill, and we have no clue about the type of skill that Pascal should be categorized as.
Modification Anomalies:
What if the address for Helico is changed from Piazza to Mexico? We need to look for every occurrence of Helico and change the value of School_Add from Piazza to Mexico, which is prone to error.
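The three anomalies can be reproduced on a small single-table instance. This is a minimal sketch with hypothetical rows and column names (EmpID, Skill, SkillType, School, SchoolAdd); only employee ID 16, the skill C++, the school Helico and the addresses Piazza and Mexico come from the example above.

```python
def sample_rows():
    """Hypothetical unnormalized table: one row per employee."""
    return [
        {"EmpID": 12, "Skill": "SQL",  "SkillType": "Database",
         "School": "Helico", "SchoolAdd": "Piazza"},
        {"EmpID": 16, "Skill": "C++",  "SkillType": "Programming",
         "School": "Unity",  "SchoolAdd": "Gerji"},
        {"EmpID": 28, "Skill": "Java", "SkillType": "Programming",
         "School": "Helico", "SchoolAdd": "Piazza"},
    ]

def delete_employee(rows, emp_id):
    """Deleting an employee also deletes whatever else that row recorded."""
    return [r for r in rows if r["EmpID"] != emp_id]

def known_skills(rows):
    """Skills the database knows about: only those some employee has."""
    return {r["Skill"] for r in rows}

def skill_type_of(rows, skill):
    """Look up a skill's type; None if no employee row records it."""
    return next((r["SkillType"] for r in rows if r["Skill"] == skill), None)

def change_school_address(rows, school, new_address):
    """Update every row mentioning the school; return how many rows changed."""
    changed = 0
    for r in rows:
        if r["School"] == school:
            r["SchoolAdd"] = new_address
            changed += 1
    return changed
```

With these rows, deleting employee 16 removes C++ from `known_skills`, `skill_type_of(rows, "Pascal")` finds nothing to go on, and changing Helico's address has to touch two rows instead of one.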
A database management system can work only with the information that we put explicitly into its tables for a given database and into its rules for working with those tables, where such rules are appropriate and possible.
Functional Dependency (FD)
Before moving to the definition and application of normalization, it is important to have an understanding of "functional dependency."
Data Dependency
The logical associations between data items that point the database designer in the direction of a good database design are referred to as determinant or dependent relationships.
Two data items A and B are said to be in a determinant or dependent relationship if certain values of data item B always appear with certain values of data item A.
If data item A is the determinant and B the dependent data item, then the direction of the association is from A to B and not vice versa.
However, for the purpose of normalization, we are interested in finding 1..1 (one-to-one) dependencies that hold for all time (the intension rather than the extension of the database), with the determinant having the minimal number of attributes.
X → Y holds if, whenever two tuples have the same value for X, they must have the same value for Y.
The notation is A → B, which is read as: B is functionally dependent on A.
In general, a functional dependency is a relationship among attributes. In relational databases, we can have a determinant that governs one or several other attributes.
FDs are derived from the real-world constraints on the attributes, and they are properties of the database intension, not its extension.
Example
Since both Wine type and Fork type are determined by the Dinner type, we say Wine is functionally dependent on Dinner and Fork is functionally dependent on Dinner.
Dinner → Wine
Dinner → Fork
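The definition of X → Y translates directly into a check over the rows of a table. The following is a minimal sketch; the function name `fd_holds` and the dinner rows are hypothetical. Because FDs are properties of the intension, a check like this can only show that an FD is violated by the current data; it cannot prove that the FD holds for all time.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by any pair of rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first time we see x we remember its y; afterwards y must match.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical instance of the dinner example.
dinners = [
    {"Dinner": "Meat",   "Wine": "Red",   "Fork": "Meat fork"},
    {"Dinner": "Fish",   "Wine": "White", "Fork": "Fish fork"},
    {"Dinner": "Cheese", "Wine": "Red",   "Fork": "Cheese fork"},
    {"Dinner": "Meat",   "Wine": "Red",   "Fork": "Meat fork"},
]

dinner_to_wine = fd_holds(dinners, ["Dinner"], ["Wine"])
wine_to_dinner = fd_holds(dinners, ["Wine"], ["Dinner"])
```

Here Dinner → Wine is consistent with the data, while Wine → Dinner is violated because red wine appears with both the meat and the cheese dinner, which illustrates that the association runs from determinant to dependent and not vice versa.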
Partial Dependency
If an attribute which is not a member of the primary key is dependent on some part of the primary key (when we have a composite primary key), then that attribute is partially functionally dependent on the primary key.
• Let {A,B} be the primary key and C a non-key attribute.
• If {A,B} → C and B → C,
• then C is partially functionally dependent on {A,B}.
Normalization towards a logical design consists of the following steps:
Unnormalized Form (UNF):
Identify all data elements.
First Normal Form (1NF):
Find the key with which you can find all data, i.e. remove any repeating groups.
Second Normal Form (2NF):
Remove part-key dependencies (partial dependencies). Make all data dependent on the whole key.
Third Normal Form (3NF):
Remove non-key dependencies (transitive dependencies). Make all data dependent on nothing but the key.
For most practical purposes, databases are considered normalized if they adhere to the third normal form (there is no transitive dependency).
First Normal Form (1NF)
Requires that all column values in a table are atomic (e.g., a number is an atomic value, while a list or a set is not).
We have two ways of achieving this:
1. Putting each repeating group into a separate table and connecting them with a primary key-foreign key relationship.
2. Moving the repeating groups to new rows by repeating the non-repeating attributes, known as "flattening" the table. If this is done, then find the key with which you can find all data.
Definition: a table (relation) is in 1NF if
• There are no duplicated rows in the table, i.e. there is a unique identifier.
• Each cell is single-valued (i.e., there are no repeating groups).
• Entries in a column (attribute, field) are of the same kind.
Example for First Normal form (1NF)
UNNORMALIZED
FIRST NORMAL FORM (1NF)
Remove all repeating groups. Distribute the multi-valued attributes into different rows and identify a unique identifier for the relation, so that it can be said to be a relation in a relational database. Flatten the table.
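Flattening can be sketched as a small function that repeats the non-repeating attributes once per value of the repeating group. The table, its attribute names and its rows below are hypothetical and stand in for the unnormalized table of the example.

```python
def flatten(rows, repeating):
    """Return one row per value of the repeating attribute.

    The non-repeating attributes are copied into every new row.
    A row whose repeating attribute is an empty list yields no rows.
    """
    flat = []
    for row in rows:
        for value in row[repeating]:
            new_row = dict(row)          # copy the non-repeating attributes
            new_row[repeating] = value   # replace the list by a single value
            flat.append(new_row)
    return flat

# Hypothetical unnormalized table: Skill holds a list of values.
unnormalized = [
    {"EmpID": 12, "FName": "Abebe", "Skill": ["SQL", "Java"]},
    {"EmpID": 16, "FName": "Lemma", "Skill": ["C++"]},
]

flat = flatten(unnormalized, "Skill")
# After flattening, EmpID alone no longer identifies a row;
# the pair (EmpID, Skill) is the unique identifier.
keys = {(r["EmpID"], r["Skill"]) for r in flat}
```

The flattened table has atomic cells, but FName is now repeated for employee 12, which is the kind of redundancy the later normal forms remove.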
Second Normal Form (2NF)
No partial dependency of a non-key attribute on part of the primary key. This results in a set of relations in Second Normal Form.
Any table that is in 1NF and has a single-attribute (i.e., non-composite) key is automatically also in 2NF.
Definition: a table (relation) is in 2NF if
• It is in 1NF, and
• All non-key attributes are dependent on the entire primary key, i.e. there is no partial dependency.
Example for 2NF:
EMP_PROJ
EMP_PROJ rearranged
Business rule: Whenever an employee participates in a project, he/she will be entitled to an incentive.
This schema is in 1NF since we don't have any repeating groups or multi-valued attributes.
To convert it to 2NF we need to remove all partial dependencies of non-key attributes on part of the primary key.
{EmpID, ProjNo} → EmpName, ProjName, ProjLoc, ProjFund, ProjMangID, Incentive
But in addition to this we have the following dependencies:
FD1: {EmpID} → EmpName
FD2: {ProjNo} → ProjName, ProjLoc, ProjFund, ProjMangID
FD3: {EmpID, ProjNo} → Incentive
As we can see, some non-key attributes are partially dependent on part of the primary key.
This can be seen by analyzing the first two functional dependencies (FD1 and FD2).
Thus, each functional dependency, together with its dependent attributes, should be moved to a new relation in which the determinant is the primary key.
EMPLOYEE(EmpID, EmpName)
PROJECT(ProjNo, ProjName, ProjLoc, ProjFund, ProjMangID)
EMP_PROJ(EmpID, ProjNo, Incentive)
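The decomposition driven by FD1, FD2 and FD3 can be sketched as three projections of the original table. The attribute names below come from the example; the rows and the helper name `project` are hypothetical.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (order preserved)."""
    seen, out = set(), []
    for r in rows:
        t = tuple(r[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out

# Hypothetical rows of the 1NF table; the key is {EmpID, ProjNo}.
EMP_PROJ_1NF = [
    {"EmpID": 1, "ProjNo": "P1", "EmpName": "Abebe", "ProjName": "Payroll",
     "ProjLoc": "Addis Ababa", "ProjFund": 50000, "ProjMangID": 7, "Incentive": 500},
    {"EmpID": 2, "ProjNo": "P1", "EmpName": "Sara", "ProjName": "Payroll",
     "ProjLoc": "Addis Ababa", "ProjFund": 50000, "ProjMangID": 7, "Incentive": 300},
    {"EmpID": 1, "ProjNo": "P2", "EmpName": "Abebe", "ProjName": "Inventory",
     "ProjLoc": "Adama", "ProjFund": 30000, "ProjMangID": 9, "Incentive": 400},
]

# One relation per functional dependency, keyed by its determinant.
EMPLOYEE = project(EMP_PROJ_1NF, ["EmpID", "EmpName"])                    # FD1
PROJECT = project(EMP_PROJ_1NF,
                  ["ProjNo", "ProjName", "ProjLoc", "ProjFund", "ProjMangID"])  # FD2
EMP_PROJ = project(EMP_PROJ_1NF, ["EmpID", "ProjNo", "Incentive"])        # FD3
```

After the split each employee name and each project description is stored once, while EMP_PROJ keeps one row per participation together with the incentive that depends on the whole key.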
Third Normal Form (3NF)
Eliminate columns dependent on another non-primary-key attribute: if attributes do not contribute to a description of the key, remove them to a separate table. This level avoids update and delete anomalies.
Definition: a table (relation) is in 3NF if:
• It is in 2NF, and
• There are no transitive dependencies between the primary key and non-primary-key attributes.
Example for 3NF
Assumption: Students of the same batch (same year) live in one building or dormitory.
STUDENT
This schema is in 2NF since the primary key is a single attribute and there are no repeating groups (multi-valued attributes).
Let's take StudID, Year and Dormitary and see the dependencies.
StudID → Year and Year → Dormitary, while Year cannot determine StudID and Dormitary cannot determine StudID.
Then, transitively, StudID → Dormitary.
To convert it to 3NF we need to remove all transitive dependencies of non-key attributes on another non-key attribute.
The non-primary-key attributes that depend on each other are moved to another table and linked with the main table using a candidate key-foreign key relationship.
STUDENT    DORM
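Removing the transitive dependency StudID → Year → Dormitary can be sketched as follows. StudID, Year and Dormitary come from the example; the StudName column, the rows and the function names are hypothetical.

```python
# Hypothetical rows of the 2NF STUDENT table (StudName is an assumed column).
students_2nf = [
    {"StudID": 125, "StudName": "Abebe", "Year": 1, "Dormitary": "401"},
    {"StudID": 654, "StudName": "Lemma", "Year": 1, "Dormitary": "401"},
    {"StudID": 842, "StudName": "Chuchu", "Year": 2, "Dormitary": "403"},
]

def decompose_3nf(rows):
    """Split into STUDENT(StudID, StudName, Year) and DORM(Year, Dormitary)."""
    student = [{k: r[k] for k in ("StudID", "StudName", "Year")} for r in rows]
    # DORM is keyed by Year, since Year -> Dormitary.
    dorm = {r["Year"]: r["Dormitary"] for r in rows}
    return student, dorm

def dormitory_of(stud_id, student, dorm):
    """Follow StudID -> Year in STUDENT, then Year -> Dormitary in DORM."""
    year = next(s["Year"] for s in student if s["StudID"] == stud_id)
    return dorm[year]

student, dorm = decompose_3nf(students_2nf)
```

Because the dormitory of a batch is now stored once in DORM, assigning first-year students to a different building is a single change (`dorm[1] = "501"`) rather than an update of every first-year row.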
Generally, even though there are four additional levels of normalization, a table is said to be normalized if it reaches 3NF. A database with all tables in 3NF is said to be a normalized database.
A mnemonic for remembering the rationale for normalization up to 3NF is the following:
1. No Repeating or Redundancy: no repeating fields in the table.
2. The Fields Depend Upon the Key: the fields of the table should depend solely on the key.
3. The Whole Key: no partial key dependency.
4. And Nothing But the Key: no inter-data dependency.
5. So Help Me Codd: since Codd came up with these rules.