Normalization PDF
Databases
Introduction:
Relational database design ultimately produces a set of relations.
The implicit goals of the design activity are: information
preservation and minimum redundancy.
Guideline 1
1. Design a relation schema so that it is easy to explain.
2. Do not combine attributes from multiple entity types and
relationship types into a single relation.
Deletion Anomalies:
The problem of deletion anomalies is related to the second
insertion anomaly situation just discussed.
Example: If we delete from EMP_DEPT an employee tuple that
happens to represent the last employee working for a particular
department, the information concerning that department is lost
from the database.
Modification Anomalies:
A modification anomaly occurs when changing a single fact requires
updating many tuples and we fail to update all of them.
Example: If the manager of a department changes, the tuples of
all employees who work for that department must be updated;
otherwise the database becomes inconsistent.
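The deletion and modification anomalies can be reproduced with a
small sketch. The attribute names (Ename, Ssn, Dnumber, Dname,
Dmgr_ssn) and the three rows below are assumptions made for
illustration; the notes only name the relation EMP_DEPT.

```python
# Hypothetical EMP_DEPT rows: employee facts and department facts share a tuple.
emp_dept = [
    {"Ename": "Smith", "Ssn": "111", "Dnumber": 5, "Dname": "Research", "Dmgr_ssn": "333"},
    {"Ename": "Wong", "Ssn": "333", "Dnumber": 5, "Dname": "Research", "Dmgr_ssn": "333"},
    {"Ename": "Borg", "Ssn": "888", "Dnumber": 1, "Dname": "Headquarters", "Dmgr_ssn": "888"},
]


def departments(rows):
    """Return the department numbers still recorded in the relation."""
    return {r["Dnumber"] for r in rows}


def managers_of(rows, dnumber):
    """Return every manager Ssn recorded for one department."""
    return {r["Dmgr_ssn"] for r in rows if r["Dnumber"] == dnumber}


# Deletion anomaly: Borg is the last employee of department 1, so deleting
# the employee tuple also deletes everything known about the department.
after_delete = [r for r in emp_dept if r["Ssn"] != "888"]

# Modification anomaly: the manager of department 5 is changed in one tuple
# only, leaving two conflicting managers on record.
after_partial_update = [dict(r) for r in emp_dept]
after_partial_update[0]["Dmgr_ssn"] = "111"

print(sorted(departments(emp_dept)))
print(sorted(departments(after_delete)))
print(sorted(managers_of(after_partial_update, 5)))
```

Department 1 disappears after the delete, and department 5 ends up
with two recorded managers after the partial update.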
These three anomalies are undesirable: they make it difficult to
maintain the consistency of the data and they require unnecessary
updates that could be avoided. Hence:
Guideline 2
Design the base relation schemas so that no insertion, deletion,
or modification anomalies are present in the relations.
If any anomalies are present, note them clearly and make sure
that the programs that update the database will operate correctly.
The second guideline is consistent with and, in a way, a
restatement of the first guideline.
NULL Values in Tuples
Fat Relations: A relation in which too many attributes are
grouped. If many of the attributes do not apply to all tuples in the
relation, we end up with many NULLs in those tuples. This can
waste space at the storage level and may also lead to problems
with understanding the meaning of the attributes and with
specifying JOIN operations at the logical level.
Another problem with NULLs is how to account for them when
aggregate operations such as COUNT or SUM are applied.
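How aggregates treat NULLs can be checked with Python's built-in
sqlite3 module. This is a minimal sketch: the Salary column and the
three rows are hypothetical, and other database systems may differ
in details such as the result type of AVG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Salary INTEGER, Office_number INTEGER)"
)
# The second employee has no recorded salary; only the first has an office.
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("111", 30000, 12), ("222", None, None), ("333", 40000, None)],
)

# COUNT(*) counts tuples; COUNT(col), SUM(col) and AVG(col) skip NULLs.
count_all, count_salary, sum_salary, avg_salary = conn.execute(
    "SELECT COUNT(*), COUNT(Salary), SUM(Salary), AVG(Salary) FROM EMPLOYEE"
).fetchone()

print(count_all, count_salary, sum_salary, avg_salary)
```

Here COUNT(*) sees three tuples, while COUNT(Salary) and AVG(Salary)
are computed over the two non-NULL salaries only, so the average is
not the sum divided by the number of employees.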
Guideline 3
As much as possible, avoid placing attributes in a base relation
whose values may frequently be NULL.
If NULLs are unavoidable, make sure that they apply in
exceptional cases only.
For example, if only 15 percent of employees have individual
offices, there is little justification for including an attribute
Office_number in the EMPLOYEE relation; rather, a relation
EMP_OFFICES(Essn, Office_number) can be created.
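A sketch of this design, again using sqlite3. The Ename column, the
data types, the key choices, and the sample rows are assumptions;
only the relation names and EMP_OFFICES(Essn, Office_number) come
from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Ename TEXT NOT NULL);
    -- Only employees who actually have an office get a row here,
    -- so Office_number never needs to be NULL.
    CREATE TABLE EMP_OFFICES (
        Essn TEXT PRIMARY KEY REFERENCES EMPLOYEE(Ssn),
        Office_number INTEGER NOT NULL
    );
    INSERT INTO EMPLOYEE VALUES ('111', 'Smith'), ('222', 'Wong'), ('333', 'Borg');
    INSERT INTO EMP_OFFICES VALUES ('333', 101);
""")

stored_office_rows = conn.execute("SELECT COUNT(*) FROM EMP_OFFICES").fetchone()[0]

# A LEFT JOIN rebuilds the wide view on demand; NULLs appear only in the
# query result, not in the stored base relations.
offices = conn.execute("""
    SELECT e.Ssn, o.Office_number
    FROM EMPLOYEE AS e LEFT JOIN EMP_OFFICES AS o ON o.Essn = e.Ssn
    ORDER BY e.Ssn
""").fetchall()

print(stored_office_rows, offices)
```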
Functional Dependencies
The single most important concept in relational schema design
theory is that of a functional dependency.
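Definition: A functional dependency, denoted by X → Y, between
two sets of attributes X and Y that are subsets of R specifies a
constraint on the possible tuples that can form a relation state r
of R. The constraint is that, for any two tuples t1 and t2 in r
that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].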
a. Ssn → Ename
b. Pnumber → {Pname, Plocation}
c. {Ssn, Pnumber} → Hours
A functional dependency is a property of the relation schema R,
not of a particular legal relation state r of R. Therefore, an FD
cannot be inferred automatically from a given relation extension r
but must be defined explicitly by someone who knows the
semantics of the attributes of R.
Example:
A B C D
a1 b1 c1 d1
a1 b2 c2 d2
a2 b2 c2 d3
a3 b3 c4 d3
The following FDs may hold because the four tuples in the
current extension have no violation of these constraints:
B → C; C → B; {A, B} → C; {A, B} → D; and {C, D} → B
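The check "no violation in the current extension" can be written
directly from the definition of a functional dependency. The helper
name fd_holds and the extra tuple added at the end are illustrative
assumptions; the four tuples are the ones in the table above.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(x, y) != y:
            return False
    return True


attrs = ("A", "B", "C", "D")
extension = [
    dict(zip(attrs, t))
    for t in [
        ("a1", "b1", "c1", "d1"),
        ("a1", "b2", "c2", "d2"),
        ("a2", "b2", "c2", "d3"),
        ("a3", "b3", "c4", "d3"),
    ]
]

listed = [
    (("B",), ("C",)),
    (("C",), ("B",)),
    (("A", "B"), ("C",)),
    (("A", "B"), ("D",)),
    (("C", "D"), ("B",)),
]
all_listed_hold = all(fd_holds(extension, lhs, rhs) for lhs, rhs in listed)

# A -> B is violated: the first two tuples agree on A but differ on B.
a_determines_b = fd_holds(extension, ("A",), ("B",))

# One more (hypothetical) tuple is enough to break B -> C, which is why an
# FD cannot be inferred from an extension alone.
extended = extension + [dict(zip(attrs, ("a4", "b1", "c9", "d9")))]
b_determines_c_after = fd_holds(extended, ("B",), ("C",))

print(all_listed_hold, a_determines_b, b_determines_c_after)
```

A single extension can only refute an FD, as it does for A → B. It
can never prove one, since a later legal tuple may violate it.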
Normalization of Relations
The normalization process, as first proposed by Codd (1972a),
takes a relation schema through a series of tests to certify
whether it satisfies a certain normal form.
The process, which proceeds in a top-down fashion by evaluating
each relation against the criteria for normal forms and
decomposing relations as necessary, can thus be considered as
relational design by analysis.
Initially, Codd proposed three normal forms, which he called first,
second, and third normal form.
A stronger definition of 3NF—called Boyce-Codd normal form
(BCNF)—was proposed later by Boyce and Codd. All these
normal forms are based on a single analytical tool: the
functional dependencies among the attributes of a relation.
Definition.
The normal form of a relation refers to the highest normal form
condition that it meets, and hence indicates the degree to which it
has been normalized.
Normal forms, when considered in isolation from other factors, do
not guarantee a good database design. It is generally not
sufficient to check separately that each relation schema in the
database is in a given normal form.
Rather, the process of normalization through decomposition must
also confirm the existence of additional properties that the
relational schemas, taken together, should possess. These would
include two properties:
1. The nonadditive join or lossless join property, which
guarantees that the spurious tuple generation problem does not
occur with respect to the relation schemas created after
decomposition.
2. The dependency preservation property, which ensures that
each functional dependency is represented in some individual
relation resulting after decomposition.
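The spurious tuple problem can be made concrete with a small sketch
of projection and natural join. The relation EMP_PROJ, its two
rows, and the helper names are hypothetical; the FD Pnumber →
Plocation is assumed to hold.

```python
def as_relation(rows):
    """Represent a relation as a set of tuples of (attribute, value) pairs."""
    return {tuple(sorted(r.items())) for r in rows}


def project(relation, attrs):
    """Projection onto attrs, with duplicate elimination."""
    return {tuple((a, v) for a, v in t if a in attrs) for t in relation}


def natural_join(left, right):
    """Join tuples that agree on every attribute the two relations share."""
    joined = set()
    for lt in left:
        for rt in right:
            l, r = dict(lt), dict(rt)
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                joined.add(tuple(sorted({**l, **r}.items())))
    return joined


emp_proj = as_relation([
    {"Ssn": "111", "Pnumber": 1, "Plocation": "Houston"},
    {"Ssn": "222", "Pnumber": 2, "Plocation": "Houston"},
])

# Bad decomposition: the shared attribute Plocation is not a key of either part.
lossy = natural_join(
    project(emp_proj, {"Ssn", "Plocation"}),
    project(emp_proj, {"Pnumber", "Plocation"}),
)
spurious = lossy - emp_proj

# Better decomposition: the shared attribute Pnumber determines Plocation.
lossless = natural_join(
    project(emp_proj, {"Ssn", "Pnumber"}),
    project(emp_proj, {"Pnumber", "Plocation"}),
)

print(len(emp_proj), len(lossy), len(spurious), lossless == emp_proj)
```

Joining on Plocation pairs every employee with every project in
Houston, producing two tuples that were never in the original
relation. Joining on Pnumber reproduces the original exactly.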
Attributes Participating in Keys
Definition:
An attribute of relation schema R is called a prime attribute of R
if it is a member of some candidate key of R.
An attribute is called nonprime if it is not a prime attribute,
that is, if it is not a member of any candidate key.
For example, both Ssn and Pnumber are prime attributes of
WORKS_ON, whereas the other attributes of WORKS_ON are nonprime.
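Prime and nonprime attributes can be derived mechanically from a
set of FDs by computing attribute closures. The sketch below
assumes WORKS_ON has exactly the attributes Ssn, Pnumber, and Hours
and the single FD {Ssn, Pnumber} → Hours; the brute-force key search
is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Find minimal attribute sets whose closure is the whole schema."""
    keys = []
    # Trying smaller sets first guarantees that found keys are minimal.
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys


schema = {"Ssn", "Pnumber", "Hours"}
fds = [({"Ssn", "Pnumber"}, {"Hours"})]

keys = candidate_keys(schema, fds)
prime = set().union(*keys)
nonprime = schema - prime

print(keys, sorted(prime), sorted(nonprime))
```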
We now present the first three normal forms: 1NF, 2NF, and 3NF.
As we shall see, 2NF and 3NF attack different problems.
First Normal Form
First normal form (1NF) is now considered to be part of the
formal definition of a relation in the basic (flat) relational model.
It states that:
1. the domain of an attribute must include only atomic (simple,
indivisible) values and
2. that the value of any attribute in a tuple must be a single
value from the domain of that attribute.
Hence, 1NF disallows having a set of values, a tuple of values, or
a combination of both as an attribute value for a single tuple. In
other words, 1NF disallows relations within relations or relations
as attribute values within tuples.
The only attribute values permitted by 1NF are single atomic (or
indivisible) values.
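One common way to reach 1NF is to move a multivalued attribute
into its own relation, with one tuple per value. The sketch below
uses a hypothetical DEPARTMENT relation whose Dlocations attribute
holds a list of values; the helper names and the new relation's
attribute Dlocation are illustrative assumptions.

```python
# Not in 1NF: Dlocations holds a collection of values in a single tuple.
department = [
    {"Dnumber": 5, "Dname": "Research", "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dnumber": 1, "Dname": "Headquarters", "Dlocations": ["Houston"]},
]


def is_atomic(value):
    """Treat Python collections as non-atomic attribute values."""
    return not isinstance(value, (list, tuple, set, dict))


def in_1nf(rows):
    """Return True if every attribute value in every tuple is atomic."""
    return all(is_atomic(v) for r in rows for v in r.values())


def split_multivalued(rows, key, attr, new_attr):
    """Move a multivalued attribute into a separate (key, value) relation."""
    base = [{k: v for k, v in r.items() if k != attr} for r in rows]
    detail = [{key: r[key], new_attr: v} for r in rows for v in r[attr]]
    return base, detail


dept, dept_locations = split_multivalued(
    department, key="Dnumber", attr="Dlocations", new_attr="Dlocation"
)

print(in_1nf(department), in_1nf(dept), in_1nf(dept_locations))
print(len(dept_locations))
```

The new relation has one tuple per (department, location) pair, so
its key is the combination of both attributes.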
Step 2:
Identify the attributes that are dependent on each determinant
and place them in the new tables with their determinant and
remove them from their original table.
In our example, remove CHG_HOUR from EMPLOYEE
EMP_NUM → EMP_NAME, JOB_CLASS
So now our design becomes:
PROJECT(PROJ_NUM, PROJ_NAME)
EMPLOYEE(EMP_NUM, EMP_NAME, JOB_ID)
JOB(JOB_ID, JOB_CLASS, CHG_HOUR)
ASSIGNMENT(PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
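The resulting design can be sketched as a sqlite3 schema. The
primary keys, foreign keys, data types, and sample rows below are
assumptions, since the notes list only relation and attribute
names. The point of the sketch is that CHG_HOUR is now stored once
per job, so a rate change touches a single tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PROJECT (PROJ_NUM INTEGER PRIMARY KEY, PROJ_NAME TEXT);
    CREATE TABLE JOB (JOB_ID INTEGER PRIMARY KEY, JOB_CLASS TEXT, CHG_HOUR REAL);
    CREATE TABLE EMPLOYEE (
        EMP_NUM INTEGER PRIMARY KEY,
        EMP_NAME TEXT,
        JOB_ID INTEGER REFERENCES JOB(JOB_ID)
    );
    CREATE TABLE ASSIGNMENT (
        PROJ_NUM INTEGER REFERENCES PROJECT(PROJ_NUM),
        EMP_NUM INTEGER REFERENCES EMPLOYEE(EMP_NUM),
        ASSIGN_HOURS REAL,
        PRIMARY KEY (PROJ_NUM, EMP_NUM)
    );
    -- Hypothetical sample data: two employees share one job class.
    INSERT INTO PROJECT VALUES (10, 'Alpha');
    INSERT INTO JOB VALUES (1, 'Engineer', 80.0);
    INSERT INTO EMPLOYEE VALUES (101, 'Ann', 1), (102, 'Bob', 1);
    INSERT INTO ASSIGNMENT VALUES (10, 101, 5.0), (10, 102, 3.0);
""")

# Changing the hourly charge updates exactly one JOB tuple.
rows_changed = conn.execute(
    "UPDATE JOB SET CHG_HOUR = 90.0 WHERE JOB_ID = 1"
).rowcount

# Every assignment sees the new rate through the joins.
total_charge = conn.execute("""
    SELECT SUM(a.ASSIGN_HOURS * j.CHG_HOUR)
    FROM ASSIGNMENT AS a
    JOIN EMPLOYEE AS e ON e.EMP_NUM = a.EMP_NUM
    JOIN JOB AS j ON j.JOB_ID = e.JOB_ID
    WHERE a.PROJ_NUM = 10
""").fetchone()[0]

print(rows_changed, total_charge)
```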