Chapter6 Normalization

Uploaded by

Usama Mushtaq
Copyright
© © All Rights Reserved

Chapter Six

Good Database Design

• The major objective of database design is to group attributes into
relations to minimize data redundancy and thereby reduce the file
storage space required by the implemented base relations.
• Relations that contain redundant information may potentially suffer
from update anomalies.
• Types of update anomalies include:
  • Insertion Anomaly – adding new rows forces user to create
    duplicate data
  • Deletion Anomaly – deleting rows may cause a loss of data that
    would be needed for other future rows
  • Modification Anomaly – changing data in a row forces changes to
    other rows because of duplication
Update Anomalies
Proj-num  Proj-name  Emp-num  Emp-name  Job-class          Charge/hr  Hours-worked
15        P1         103      X1        Elec. Engineer
                     101      X2        Database Designer
                     105      X3        Database Designer
                     106      X4        Programmer
                     102      X5        Sys Analyst
18        P2         114      X6        App. Designer
                     118      X7        Gen. Support
                     104      X8        Sys Analyst
                     112      X9        D.B.A
22        P3         105      X3        Database Designer
                     104      X8        S. Analyst
                     113      X10       App. Designer
                     111      X11       Clerical Sup
                     106      X4        Programmer
25        P4         107      X12       Programmer
                     115      X13       Sys Analyst
                     101      X2        Database Designer
                     114      X6        App. Designer
                     108      X14       System Analyst
                     118      X7        Gen. Support
                     112      X9        D.B.A
Informal Guidelines for Database
Design
GUIDELINE 1: Informally, each tuple in a relation should
represent one entity or relationship instance. (Applies to
individual relations and their attributes).
• Attributes of different entities (EMPLOYEEs, DEPARTMENTs,
PROJECTs) should not be mixed in the same relation
• Only foreign keys should be used to refer to other entities
GUIDELINE 2: Design a schema that does not suffer from
the insertion, deletion and update anomalies. If there are
any present, then note them so that applications can be
made to take them into account.
Informal Guidelines for Database
Design
GUIDELINE 3: Relations should be designed such
that their tuples will have as few NULL values as
possible
• Attributes that are NULL frequently could be placed in
separate relations (with the primary key)
GUIDELINE 4: The relations should be designed to
satisfy the lossless join condition.
What is Normalization?
Normalization is a systematic way of ensuring that a
database structure is suitable for general-purpose querying
and free of certain undesirable characteristics — insertion,
update, and deletion anomalies.
Normalization is a technique for producing a set of
suitable relations that support the data requirements of an
enterprise
There are two goals of the normalization process:
• eliminating redundant data
• ensuring data dependencies make sense (only storing related data
in a table)
Purpose of Normalization
Characteristics of a suitable set of relations include:
the minimal number of attributes necessary to support
the data requirements of the enterprise;
attributes with a close logical relationship are found in
the same relation;
minimal redundancy with each attribute represented
only once with the important exception of attributes
that form all or part of foreign keys.
Purpose of Normalization
The benefits of using a database that has a suitable
set of relations are that the database will:
be easier for the user to access and maintain;
take up minimal storage space on the computer.
How Normalization Supports
Database Design
Process of Normalization
Normalization involves a series of rules (known as normal forms)
that can be used to test individual relations so that a database
can be normalized to any degree.
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce-Codd Normal Form (BCNF)
When a requirement or a test is not met, the relation
violating the requirement must be decomposed into
relations that individually meet the requirements of
normalization.
Process of Normalization
Normalization is often performed as a series of tests
on a relation to determine whether it satisfies or
violates the requirements of a given normal form.
As normalization proceeds, relations become
progressively more restricted (stronger) in format and
also less vulnerable to update anomalies.
Unnormalized Relations
Cust_no  CName  Pr_no  Paddr                      ReSt      ReFi     Rent  O_No  Oname
CR76     J. K   PG4    R no.602, Bl. 1006 Manama  1-7-94    31-8-96  350   CO40  T. M
                PG16   R. no. 12 Bl. 203 Muharaq  1-9-96    1-9-98   400   CO93  R. S
CR56     A. S   PG4    R no.602, Bl. 1006 Manama  1-9-92    10-6-94  350   CO40  T. M
                PG36   R. no. 93 Bl. 333 Manama   10-10-94  1-12-95  375   CO93  R. S
                PG16   R. no. 12 Bl. 203 Muharaq  1-1-96    10-8-96  400   CO93  R. S
Repeating group
A repeating group is a group of multiple entries of the same type
that exists for a single occurrence of the key attribute.
A relational table must not contain repeating groups.
First Normal Form (1NF)
Remove horizontal redundancies
No two columns hold the same information
• No multivalued attributes
No single column holds more than a single item
• Every attribute value is atomic
Each row must be unique
Use a primary key
Conversion to 1NF
Nominate an attribute or group of attributes to
act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table
that repeat for the key attribute(s).
Remove the repeating group by:
• entering appropriate data into the empty columns of
rows containing repeating data ('flattening' the table);
or by
• placing the repeating data, along with a copy of the original
key attribute(s), into a separate relation.
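The 'flattening' route to 1NF can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides: the nested-list layout and the function name `flatten_to_1nf` are assumptions made for the example, while the attribute names and values follow the customer rental table (Paddr, Rent and the owner columns are left out for brevity).

```python
# Hypothetical in-memory form of the unnormalized table: one entry per
# customer, with the repeating group stored as a nested list.
unnormalized = [
    {"Cust_no": "CR76", "CName": "J. K", "rentals": [
        {"Pr_no": "PG4", "ReSt": "1-7-94", "ReFi": "31-8-96"},
        {"Pr_no": "PG16", "ReSt": "1-9-96", "ReFi": "1-9-98"},
    ]},
    {"Cust_no": "CR56", "CName": "A. S", "rentals": [
        {"Pr_no": "PG4", "ReSt": "1-9-92", "ReFi": "10-6-94"},
        {"Pr_no": "PG36", "ReSt": "10-10-94", "ReFi": "1-12-95"},
        {"Pr_no": "PG16", "ReSt": "1-1-96", "ReFi": "10-8-96"},
    ]},
]


def flatten_to_1nf(rows, group_field):
    """Copy the non-repeating attributes into every row of the repeating group."""
    flat = []
    for row in rows:
        fixed = {k: v for k, v in row.items() if k != group_field}
        for member in row[group_field]:
            flat.append({**fixed, **member})
    return flat


flat_rows = flatten_to_1nf(unnormalized, "rentals")
for r in flat_rows:
    print(r)
```

After flattening, every row carries atomic values only, and the pair (Cust_no, Pr_no) identifies each row in this sample.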
Example 1
Unnormalized table:
Cust_no  CName  Pr_no  Paddr                      ReSt      ReFi     Rent  O_No  Oname
CR76     J. K   PG4    R no.602, Bl. 1006 Manama  1-7-94    31-8-96  350   CO40  T. M
                PG16   R. no. 12 Bl. 203 Muharaq  1-9-96    1-9-98   400   CO93  R. S
CR56     A. S   PG4    R no.602, Bl. 1006 Manama  1-9-92    10-6-94  350   CO40  T. M
                PG36   R. no. 93 Bl. 333 Manama   10-10-94  1-12-95  375   CO93  R. S
                PG16   R. no. 12 Bl. 203 Muharaq  1-1-96    10-8-96  400   CO93  R. S

Flattened table (1NF):
Cust_no  CName  Pr_no  Paddr                      ReSt      ReFi     Rent  O_No  Oname
CR76     J. K   PG4    R no.602, Bl. 1006 Manama  1-7-94    31-8-96  350   CO40  T. M
CR76     J. K   PG16   R. no. 12 Bl. 203 Muharaq  1-9-96    1-9-98   400   CO93  R. S
CR56     A. S   PG4    R no.602, Bl. 1006 Manama  1-9-92    10-6-94  350   CO40  T. M
CR56     A. S   PG36   R. no. 93 Bl. 333 Manama   10-10-94  1-12-95  375   CO93  R. S
CR56     A. S   PG16   R. no. 12 Bl. 203 Muharaq  1-1-96    10-8-96  400   CO93  R. S
Customer_relation:
Cust_no  CName
CR76     J. K
CR56     A. S

Prop_rental_Owner_relation:
Cust_no  Pr_no  Paddr                      ReSt      ReFi     Rent  O_No  Oname
CR76     PG4    R no.602, Bl. 1006 Manama  1-7-94    31-8-96  350   CO40  T. M
CR76     PG16   R. no. 12 Bl. 203 Muharaq  1-9-96    1-9-98   400   CO93  R. S
CR56     PG4    R no.602, Bl. 1006 Manama  1-9-92    10-6-94  350   CO40  T. M
CR56     PG36   R. no. 93 Bl. 333 Manama   10-10-94  1-12-95  375   CO93  R. S
CR56     PG16   R. no. 12 Bl. 203 Muharaq  1-1-96    10-8-96  400   CO93  R. S
Functional Dependency
Functional dependency is the main concept associated with
normalization.
Functional dependencies (FDs) describe the
relationships between attributes in a relation.
FDs are constraints that are derived from the
meaning and interrelationships of the data
attributes.
Main Characteristics of Functional
Dependencies
Have a 1:1 relationship between attribute(s) on left
and right-hand side of a dependency;
Hold for all time;
Are nontrivial.
(A dependency is trivial if it is impossible for it not to be
satisfied. Furthermore, a dependency is trivial if and only if the
right-hand side is a subset of the left-hand side.)
Functional Dependency
If A and B are attributes of relation R, B is
functionally dependent on A (denoted A → B), if
each value of A in R is associated with exactly one
value of B in R.
Diagrammatically, an FD can be shown as
[Figure: functional dependency diagram]
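The definition of A → B can be tested against sample rows with a short Python sketch. The function name `fd_holds` is an assumption made for illustration, and the three rows are taken from the flattened rental table with most columns omitted. Note that a check on sample data can only refute a dependency: since an FD must hold for all time, a passing check does not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in rows, each lhs value is paired with exactly one rhs value."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(left, right) != right:
            return False
    return True


# Three rows from the flattened rental table (only some columns shown).
rows = [
    {"Cust_no": "CR76", "Pr_no": "PG4", "Rent": 350},
    {"Cust_no": "CR76", "Pr_no": "PG16", "Rent": 400},
    {"Cust_no": "CR56", "Pr_no": "PG4", "Rent": 350},
]

print(fd_holds(rows, ["Pr_no"], ["Rent"]))     # Pr_no -> Rent is not contradicted
print(fd_holds(rows, ["Cust_no"], ["Pr_no"]))  # CR76 rents two properties, so this fails
```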
Example - Functional Dependency

Determinant of a functional dependency refers to the attribute or
group of attributes on the left-hand side of the arrow.

• Prime attribute - an attribute that is a member of some candidate key K
• Nonprime attribute - an attribute that is not prime, that is, it is not a
member of any candidate key

Order_ID → Order_Date, Customer_ID, Customer_Name, Customer_Address
Customer_ID → Customer_Name, Customer_Address
Product_ID → Product_Description, Product_Finish, Unit_Price
Order_ID, Product_ID → Order_Quantity
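One way to see what these four dependencies imply is to compute an attribute closure, that is, the set of all attributes determined by a given starting set. Attribute closure is a standard textbook technique that these slides do not cover; the sketch below is a minimal version, and the names `closure`, `fds` and `all_attrs` are assumptions made for the example.

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left-hand side is already determined
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [
    ({"Order_ID"}, {"Order_Date", "Customer_ID", "Customer_Name", "Customer_Address"}),
    ({"Customer_ID"}, {"Customer_Name", "Customer_Address"}),
    ({"Product_ID"}, {"Product_Description", "Product_Finish", "Unit_Price"}),
    ({"Order_ID", "Product_ID"}, {"Order_Quantity"}),
]
all_attrs = set().union(*(lhs | rhs for lhs, rhs in fds))

key_closure = closure({"Order_ID", "Product_ID"}, fds)
print(key_closure == all_attrs)                    # the pair determines every attribute
print(sorted(all_attrs - closure({"Order_ID"}, fds)))  # what Order_ID alone cannot reach
```

Under these FDs the pair (Order_ID, Product_ID) determines every attribute, while Order_ID alone does not reach Order_Quantity or the product attributes.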
Full Dependencies and Partial
Dependencies
Full functional dependency - an FD Y → Z where
removal of any attribute from Y means the FD
does not hold any more.
If A and B are attributes of a relation, B is fully
functionally dependent on A if B is functionally
dependent on A but not on any proper subset of A.
Partial functional dependency – a functional
dependency A → B is a partial dependency if there
are some attributes that can be removed from A
and the dependency still holds.
Full Dependencies and Partial
Dependencies
Consider the following functional dependency:

staff_no, sname → branch_no

It is correct to say that each value of (staff_no, sname)
is associated with a single value of branch_no.

However, it is not a full functional dependency
because branch_no is also functionally dependent on
a subset of (staff_no, sname), namely staff_no.
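The test for a partial dependency can be sketched as a search over the proper subsets of the left-hand side. The staff rows below are hypothetical sample values invented for this illustration, and the helper names `fd_holds` and `partial_determinants` are likewise assumptions; as with any data-based check, the result reflects only the sample rows.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if each lhs value is paired with exactly one rhs value in rows."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def partial_determinants(rows, lhs, rhs):
    """Return the proper, non-empty subsets of lhs that still determine rhs."""
    found = []
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                found.append(subset)
    return found


# Hypothetical sample data: two staff members share a name but not a branch.
staff = [
    {"staff_no": "SG5", "sname": "Ann", "branch_no": "B003"},
    {"staff_no": "SG37", "sname": "Ann", "branch_no": "B005"},
    {"staff_no": "SG14", "sname": "David", "branch_no": "B003"},
]

subsets = partial_determinants(staff, ("staff_no", "sname"), ("branch_no",))
print(subsets)  # a non-empty result means the dependency is only partial
```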
Second Normal Form (2NF)
A relation schema R is in second normal form
(2NF) if every non-prime attribute A in R is fully
functionally dependent on the primary key, i.e.
a table is in second normal form (2NF) when:
• It is in 1NF and
• It includes no partial dependencies:
  • No attribute is dependent on only a portion of the
    primary key
Conversion to Second Normal Form
Step 1: Write Each Key Component on a Separate Line
Write each key component on separate line, then write
original (composite) key on last line
Each component will become key in new table
Step 2: Assign Corresponding Dependent Attributes
Determine those attributes that are dependent on other
attributes
Conversion to Second Normal Form
[Figure: dependency diagram for Cust_no, Pr_no, Paddr, ReSt, ReFi, Rent, O_No, Oname, marking the attributes that are fully dependent on the key]

Rental_relation
Cust_no  Pr_no  ReSt    ReFi
CR76     PG4    1-7-94  31-8-96
CR76     PG16   1-9-96  1-9-98

Property_Owner_Relation
Pr_no  Paddr                      Rent  O_No  Oname
PG4    R no.602, Bl. 1006 Manama  350   CO40  T. M
PG16   R. no. 12 Bl. 203 Muharaq  400   CO93  R. S
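The split into Rental_relation and Property_Owner_Relation amounts to two projections of the 1NF table, which can be sketched as follows. The helper `project` and the variable names are assumptions for illustration. The data are the five 1NF rows with Paddr omitted; the owner number printed as "C040" in one row is taken to be CO40, on the assumption that it is a typing slip. The slide shows only the CR76 rows of the result, whereas the sketch projects all five.

```python
COLUMNS = ["Cust_no", "Pr_no", "ReSt", "ReFi", "Rent", "O_No", "Oname"]
DATA = [
    ("CR76", "PG4", "1-7-94", "31-8-96", 350, "CO40", "T. M"),
    ("CR76", "PG16", "1-9-96", "1-9-98", 400, "CO93", "R. S"),
    ("CR56", "PG4", "1-9-92", "10-6-94", 350, "CO40", "T. M"),
    ("CR56", "PG36", "10-10-94", "1-12-95", 375, "CO93", "R. S"),
    ("CR56", "PG16", "1-1-96", "10-8-96", 400, "CO93", "R. S"),
]
rows = [dict(zip(COLUMNS, t)) for t in DATA]


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (first-seen order kept)."""
    seen = set()
    out = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


# Attributes that depend on the whole key (Cust_no, Pr_no)
rental = project(rows, ["Cust_no", "Pr_no", "ReSt", "ReFi"])
# Attributes that depend on Pr_no alone (the partial dependency)
property_owner = project(rows, ["Pr_no", "Rent", "O_No", "Oname"])

print(len(rental), len(property_owner))
```

Each property now appears once in `property_owner`, so its rent and owner are no longer repeated for every customer who rents it.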
Third Normal Form (3NF)
Based on the concept of transitive dependency.
Transitive dependency is a condition where
A, B and C are attributes of a relation such that if A → B
and B → C,
then C is transitively dependent on A through B
(provided that A is not functionally dependent on B or
C).

For example, if Staff_no → Branch_no and
Branch_no → Baddress,
then Staff_no → Baddress transitively, via Branch_no.
Third Normal Form (3NF)
A relation schema R is in third normal form (3NF) if
it is in 2NF and
no non-prime attribute A in R is transitively dependent
on the primary key.
2NF to 3NF
Identify the primary key in the 2NF relation.
Identify functional dependencies in the relation.
If transitive dependencies exist on the primary key
remove them by placing them in a new relation along
with copy of their determinant.
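These steps can be sketched on the Property_Owner_Relation data. The sketch assumes that O_No → Oname is the transitive dependency on the key Pr_no (Pr_no → O_No → Oname); the relation names `property_rel` and `owner_rel` and the helper `project` are illustrative assumptions. The PG36 row comes from the 1NF table, and Paddr is omitted for brevity.

```python
property_owner = [
    {"Pr_no": "PG4", "Rent": 350, "O_No": "CO40", "Oname": "T. M"},
    {"Pr_no": "PG16", "Rent": 400, "O_No": "CO93", "Oname": "R. S"},
    {"Pr_no": "PG36", "Rent": 375, "O_No": "CO93", "Oname": "R. S"},
]


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (first-seen order kept)."""
    seen = set()
    out = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


# The transitively dependent attribute moves out with a copy of its determinant.
owner_rel = project(property_owner, ["O_No", "Oname"])
# The original relation keeps the determinant O_No as a foreign key.
property_rel = project(property_owner, ["Pr_no", "Rent", "O_No"])

# Joining on O_No recovers the original rows, so nothing was lost.
rejoined = [{**p, **o} for p in property_rel for o in owner_rel
            if p["O_No"] == o["O_No"]]
print(rejoined == property_owner)
```

After the split, the owner name R. S is stored once instead of once per property.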
A Complete Example
For a shipping company.
Vessel number identifies the ship.
Voyage = year – voyage number – direction
Remaining fields list ports and dates for voyage.
Boyce Codd Normal Form (BCNF)
A relation is in BCNF, if and only if, every
determinant is a candidate key.
A table can be in 3NF but not in BCNF. This occurs
when a non key attribute is a determinant of a key
attribute.

ABCD
CB
BCNF Example
ClientInterview
roomNo  staffNo  interviewTime  interviewDate  ClientNo
G101    SG5      10.30          13-May-02      CR76
G101    SG5      12.00          13-May-02      CR76
G102    SG37     12.00          13-May-02      CR74
G102    SG5      10.30          1-Jul-02       CR56

• FD1 clientNo, interviewDate → interviewTime, staffNo, roomNo (Primary Key)
• FD2 staffNo, interviewDate, interviewTime → clientNo (Candidate key)
• FD3 roomNo, interviewDate, interviewTime → clientNo, staffNo (Candidate key)
• FD4 staffNo, interviewDate → roomNo (not a candidate key)
• As a consequence, the ClientInterview relation may suffer from update anomalies.
• For example, two tuples have to be updated if the roomNo needs to be changed for
staffNo SG5 on 13-May-02.
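The BCNF test "every determinant is a candidate key" can be sketched by checking whether each determinant's attribute closure covers the whole relation. Attribute closure is a standard technique brought in for this illustration, not something the slides define, and the names `closure` and `bcnf_violations` are assumptions. The FDs are FD1 to FD4 as listed for ClientInterview.

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


ATTRS = {"clientNo", "interviewDate", "interviewTime", "staffNo", "roomNo"}
FDS = {
    "FD1": ({"clientNo", "interviewDate"}, {"interviewTime", "staffNo", "roomNo"}),
    "FD2": ({"staffNo", "interviewDate", "interviewTime"}, {"clientNo"}),
    "FD3": ({"roomNo", "interviewDate", "interviewTime"}, {"clientNo", "staffNo"}),
    "FD4": ({"staffNo", "interviewDate"}, {"roomNo"}),
}


def bcnf_violations(attrs, fds):
    """Return the names of FDs whose determinant does not determine every attribute."""
    fd_list = list(fds.values())
    return [name for name, (lhs, _) in fds.items()
            if closure(lhs, fd_list) != attrs]


violations = bcnf_violations(ATTRS, FDS)
print(violations)
```

Only the determinant of FD4 fails the test: (staffNo, interviewDate) reaches roomNo and nothing else, which is why the relation is decomposed around FD4.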
Example of BCNF
To transform the ClientInterview relation to BCNF, we must remove
the violating functional dependency by creating two new relations
called Interview and StaffRoom as shown below,

Interview (clientNo, interviewDate, interviewTime, staffNo)


StaffRoom(staffNo, interviewDate, roomNo)
Interview
staffNo  interviewTime  interviewDate  ClientNo
SG5      10.30          13-May-02      CR76
SG5      12.00          13-May-02      CR76
SG37     12.00          13-May-02      CR74
SG5      10.30          1-Jul-02       CR56

StaffRoom
roomNo  interviewDate  staffNo
G101    13-May-02      SG5
G102    13-May-02      SG37
G102    1-Jul-02       SG5

BCNF Interview and StaffRoom relations
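That this decomposition satisfies the lossless join condition can be checked on the sample rows: projecting ClientInterview onto the two new schemas and joining the results should give back exactly the original tuples. The helper names `project`, `natural_join` and `as_set` are assumptions made for this sketch, and the rows are used as printed in the slides.

```python
COLUMNS = ["roomNo", "staffNo", "interviewTime", "interviewDate", "clientNo"]
client_interview = [dict(zip(COLUMNS, t)) for t in [
    ("G101", "SG5", "10.30", "13-May-02", "CR76"),
    ("G101", "SG5", "12.00", "13-May-02", "CR76"),
    ("G102", "SG37", "12.00", "13-May-02", "CR74"),
    ("G102", "SG5", "10.30", "1-Jul-02", "CR56"),
]]


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (first-seen order kept)."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


def natural_join(left, right):
    """Join two non-empty relations on the attributes they have in common."""
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(r.items())) for r in rows}


interview = project(client_interview,
                    ["clientNo", "interviewDate", "interviewTime", "staffNo"])
staff_room = project(client_interview, ["staffNo", "interviewDate", "roomNo"])

rejoined = natural_join(interview, staff_room)
print(as_set(rejoined) == as_set(client_interview))
```

The join attributes (staffNo, interviewDate) form the key of StaffRoom, so each Interview row matches exactly one room and no spurious tuples appear.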


Review of Normalization (UNF to BCNF)
Fourth Normal Form (4NF)
Although BCNF removes anomalies due to
functional dependencies, another type of
dependency called a multi-valued dependency
(MVD) can also cause data redundancy.

Possible existence of multi-valued dependencies


in a relation is due to 1NF and can result in data
redundancy.
Fourth Normal Form (4NF)
Multi-valued Dependency (MVD)
Dependency between attributes (for example, A, B,
and C) in a relation, such that for each value of A
there is a set of values for B and a set of values for
C. However, the set of values for B and C are
independent of each other.
Fourth Normal Form (4NF)
An MVD between attributes A, B, and C in a relation is
represented using the following notation:
A −>> B
A −>> C
Fourth Normal Form (4NF)
A multi-valued dependency can be further
defined as being trivial or nontrivial.
An MVD A −>> B in relation R is defined as being
trivial if (a) B is a subset of A or (b) A ∪ B = R.
An MVD is defined as being nontrivial if neither (a)
nor (b) is satisfied.
A trivial MVD does not specify a constraint on a
relation, while a nontrivial MVD does specify a
constraint.
Fourth Normal Form (4NF)
Defined as a relation that is in Boyce-Codd
Normal Form and contains no nontrivial multi-
valued dependencies.
4th Normal Form Cont…
This is best discussed through mathematical notation.

Assume the following relation

R(a:pk1, b:pk2, c:pk3)

Recall that a relation is in BCNF if all its determinants
are candidate keys; in other words, each determinant
can be used as a primary key.
Because relation R has only one determinant (a, b, c),
which is the composite primary key, and since the
primary key is a candidate key, R is in BCNF.
4th Normal Form Cont…
Now R may or may not be in fourth normal form.

1. If R contains no multi-valued dependency then R will be in fourth
normal form.

2. Assume R has the following two multi-valued dependencies:

a --->> b and a --->> c

In this case R will be in fourth normal form if b and c are dependent on
each other.
However, if b and c are independent of each other then R is not in
fourth normal form and the relation has to be projected into
two non-loss projections, (a, b) and (a, c). These non-loss projections
will be in fourth normal form.
4th Normal Form Cont…
Many-to-many relationships

Fourth Normal Form applies to situations


involving many-to-many relationships.
In relational databases, many-to-many
relationships are expressed through cross-
reference tables.
4NF - Example
Example 2
Consider a case of class enrollment. Each
student can be enrolled in one or more
classes and each class can contain one or
more students.
Clearly, there is a many-to-many relationship
between classes and students. This
relationship can be represented by a
Student/Class cross-reference table:

{StudentID, ClassID}
Example 2 Cont…
The key for this table is the combination of
StudentID and ClassID. To avoid violation of
2NF, all other information about each student
and each class is stored in separate Student and
Class tables, respectively.

Note that each StudentID determines not a


unique ClassID, but a well-defined, finite set of
values. This kind of behavior is referred to as
multi-valued dependency of ClassID on
StudentID.
Example 3
Consider another example with two many-to-many relationships, between
students and classes and between classes and teachers.

Students *----* Classes

Classes *----* Teachers

Also, a many-to-many relationship between
students and teachers is implied.
Example 3 Cont…
However, the business rules do not constrain this
relationship in any way: the combination of StudentID
and TeacherID does not contain any additional
information beyond the information implied by the
student/class and class/teacher relationships.

Consequently, the student/class and class/teacher
relationships are independent of each other; these
relationships have no additional constraints. The
following table is, then, in violation of 4NF:

{StudentID, ClassID, TeacherID}


4th NF and Anomalies
As an example of the anomalies that can occur,
realize that it is not possible to add a new class
taught by some teacher without adding at least
one student who is enrolled in this class.

To achieve 4NF, represent each independent


many-to-many relationship through its own cross-
reference table.
4th Normal Form and anomalies Cont…
Case 1:
Assume the following relation:
Employee (Eid:pk1, Language:pk2, Skill:pk3)

No multi value dependency, therefore R is in


fourth normal form.
4th Normal Form and
anomalies Cont…
Case 2:
Assume the following relation with multi-value
dependency:

Employee (Eid:pk1, Languages:pk2, Skills:pk3)

Eid --->> Skills    Eid --->> Languages

Languages and Skills are dependent.

This says an employee speaks several languages and
has several skills. However, for each skill a specific
language is used when that skill is practiced.
Thus employee 100 speaks English when he/she teaches
but speaks French when he/she cooks. This
relation is in fourth normal form and does not
suffer from any anomalies.

Skill     Language  Eid
Teaching  English   100
Politics  Kurdish   100
Cooking   French    100
Cooking   English   200
Singing   Arabic    200
4th Normal Form and
anomalies Cont…

case 3:
Assume the following relation with multi-value
dependency:

Employee (Eid:pk1, Languages:pk2, Skills:pk3)

Eid --->> Languages Eid --->> Skills

Languages and Skills are independent.


4th Normal Form and
anomalies Cont…
This relation is not in fourth normal form and suffers
from all three types of anomalies.

Skill     Language  Eid
Teaching  English   100
Politics  Kurdish   100
Politics  English   100
Teaching  Kurdish   100
Singing   Arabic    200
Insertion anomaly: To insert the row (200 English Cooking) we
have to insert two extra rows, (200 Arabic Cooking) and
(200 English Singing), otherwise the database will be
inconsistent. Note the table will be as follows:

Skill     Language  Eid
Teaching  English   100
Politics  Kurdish   100
Politics  English   100
Teaching  Kurdish   100
Singing   Arabic    200
Cooking   English   200
Cooking   Arabic    200
Singing   English   200

Deletion anomaly: If employee 100 discontinues the Politics skill,
we have to delete two rows,
(100 Kurdish Politics) and (100 English Politics), otherwise the
database will be inconsistent.

Skill     Language  Eid
Teaching  English   100
Politics  Kurdish   100
Politics  English   100
Teaching  Kurdish   100
Singing   Arabic    200
Cooking   English   200
Cooking   Arabic    200
Singing   English   200
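The "inconsistent" states mentioned in the insertion and deletion anomalies can be made concrete with a check of the multi-valued dependency Eid --->> Languages. The sketch uses the usual formal reading of an MVD: for each Eid, the stored (Language, Skill) pairs must form the full cross product of that employee's languages and skills. The function name `mvd_holds` and the helper `make` are assumptions for illustration; the rows are those of the case 3 table.

```python
from collections import defaultdict


def mvd_holds(rows, x, y):
    """Check X ->> Y: per X value, (Y, rest) pairs must form a full cross product."""
    attrs = list(rows[0])
    z = [a for a in attrs if a not in x and a not in y]
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in x)
        groups[key].add((tuple(row[a] for a in y), tuple(row[a] for a in z)))
    for pairs in groups.values():
        y_values = {p[0] for p in pairs}
        z_values = {p[1] for p in pairs}
        # pairs is always a subset of the cross product, so sizes decide equality
        if len(pairs) != len(y_values) * len(z_values):
            return False
    return True


def make(tuples):
    """Build rows from (Eid, Language, Skill) tuples."""
    return [dict(zip(("Eid", "Language", "Skill"), t)) for t in tuples]


case3 = make([
    (100, "English", "Teaching"), (100, "Kurdish", "Politics"),
    (100, "English", "Politics"), (100, "Kurdish", "Teaching"),
    (200, "Arabic", "Singing"),
])
one_row_added = case3 + make([(200, "English", "Cooking")])
three_rows_added = one_row_added + make([(200, "Arabic", "Cooking"),
                                         (200, "English", "Singing")])
one_row_deleted = [r for r in case3
                   if (r["Language"], r["Skill"]) != ("Kurdish", "Politics")]

print(mvd_holds(case3, ["Eid"], ["Language"]))
print(mvd_holds(one_row_added, ["Eid"], ["Language"]))
print(mvd_holds(three_rows_added, ["Eid"], ["Language"]))
print(mvd_holds(one_row_deleted, ["Eid"], ["Language"]))
```

Adding or deleting a single row breaks the dependency, while adding all three rows preserves it, which matches the anomalies described in the slides.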
More anomalies
Update anomaly: If employee 200 changes
his skill from singing to dancing we have to
make changes in more than one place.
The relation is projected to the following two non-loss
projections, which are in fourth normal form:

Employee_Language(Eid:pk1, Languages:pk2)

Language  Eid
English   100
Kurdish   100
Arabic    200

Employee_Skill(Eid:pk1, Skills:pk2)

Skill     Eid
Teaching  100
Politics  100
Singing   200
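A short sketch can confirm that the two projections are non-loss for the case 3 data: joining them on Eid gives back exactly the original five rows. The variable names below are assumptions for illustration, and the rows are written as (Eid, Language, Skill) tuples.

```python
employee = [
    (100, "English", "Teaching"),
    (100, "Kurdish", "Politics"),
    (100, "English", "Politics"),
    (100, "Kurdish", "Teaching"),
    (200, "Arabic", "Singing"),
]

# Project onto (Eid, Language) and (Eid, Skill); sets remove duplicates.
employee_language = sorted({(eid, lang) for eid, lang, _ in employee})
employee_skill = sorted({(eid, skill) for eid, _, skill in employee})

# Natural join of the two projections on Eid.
rejoined = {(eid, lang, skill)
            for eid, lang in employee_language
            for eid2, skill in employee_skill
            if eid == eid2}

print(employee_language)
print(employee_skill)
print(rejoined == set(employee))
```

In the projected design, a new skill for employee 200 is a single new row in the skill projection, with no extra rows needed to keep the data consistent.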
