Normalization: Dr. M. Brindha Assistant Professor Department of CSE NIT, Trichy-15

The document discusses various normal forms for relational database design including first normal form, second normal form, third normal form, and Boyce-Codd normal form, with the goals of eliminating redundancy, anomalies, and representing relationships properly through decomposition into relational schemas in good normal forms. It provides examples of relations that violate various normal forms and how to decompose them into relations in higher normal forms through normalization.

Uploaded by

Nimish Agrawal
Copyright
© All Rights Reserved

Normalization

Dr. M. Brindha
Assistant Professor
Department of CSE
NIT, Trichy-15
Relational Database Design
• First Normal Form
• Pitfalls in Relational Database Design
• Functional Dependencies
• Boyce-Codd Normal Form
• Third Normal Form
• Multivalued Dependencies and Fourth Normal Form
First Normal Form
• Domain is atomic if its elements are considered to be
indivisible units
• Examples of non-atomic domains:
• Set of names, composite attributes
• Identification numbers like CS101 that can be broken
up into parts
• A relational schema R is in first normal form if the
domains of all attributes of R are atomic
• Non-atomic values complicate storage and encourage
redundant (repeated) storage of data
• E.g. Set of accounts stored with each customer, and set
of owners stored with each account
First Normal Form (Contd.)
• Atomicity is actually a property of how the elements of
the domain are used.
• E.g. Strings would normally be considered indivisible
• Suppose that students are given roll numbers which are
strings of the form CS0012 or EE1127
• If the first two characters are extracted to find the
department, the domain of roll numbers is not atomic.
• Doing so is a bad idea: leads to encoding of information
in application program rather than in the database.
First Normal Form
• Disallows
• composite attributes
• multivalued attributes
• nested relations; attributes whose values for
an individual tuple are non-atomic

• Considered to be part of the definition of a
relation
Normalization into 1NF
Normalization of nested relations into 1NF
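The flattening step behind this slide can be sketched in Python. This is a minimal illustration, not the figure from the original slides: the sample data (customers holding a set of accounts) and the names `nested_customers` and `to_1nf` are assumptions made for the example.

```python
# Hypothetical nested relation: each customer carries a set of account
# numbers, so the "accounts" attribute is non-atomic.
nested_customers = [
    {"customer": "Hayes", "accounts": ["A-102", "A-305"]},
    {"customer": "Jones", "accounts": ["A-217"]},
]

def to_1nf(rows, nested_attr, flat_attr):
    """Emit one tuple per element of the nested attribute."""
    flat = []
    for row in rows:
        for value in row[nested_attr]:
            # Copy every other attribute unchanged, then add the atomic value.
            new_row = {k: v for k, v in row.items() if k != nested_attr}
            new_row[flat_attr] = value
            flat.append(new_row)
    return flat

flat_customers = to_1nf(nested_customers, "accounts", "account")
for row in flat_customers:
    print(row)
```

Every value in `flat_customers` is atomic, at the price of repeating the customer name once per account.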
Pitfalls in Relational Database Design

• Relational database design requires that we find a
“good” collection of relation schemas. A bad design
may lead to
• Repetition of Information.
• Inability to represent certain information.

• Design Goals:
• Avoid redundant data
• Ensure that relationships among attributes are
represented
• Facilitate the checking of updates for violation of
database integrity constraints.
Goals of Normalization
• Decide whether a particular relation R is in “good” form.
• In the case that a relation R is not in “good” form,
decompose it into a set of relations {R1, R2, ..., Rn} such
that
• each relation is in good form
• the decomposition is a lossless-join decomposition

• Our theory is based on:
• functional dependencies
• multivalued dependencies
Example
• Consider the relation schema:
Lending-schema = (branch-name, branch-city, assets,
customer-name, loan-number, amount)

• Redundancy:
• Data for branch-name, branch-city, assets are repeated for each loan
that a branch makes
• Wastes space
• Complicates updating, introducing possibility of inconsistency of
assets value
• Null values
• Cannot store information about a branch if no loans exist
• Can use null values, but they are difficult to handle.
Redundant Information in Tuples and
Update Anomalies
• Information is stored redundantly
• Wastes storage
• Causes problems with update anomalies
• Insertion anomalies
• Deletion anomalies
• Modification anomalies
EXAMPLE OF AN UPDATE ANOMALY

• Consider the relation:
• Lending-schema = (branch-name, branch-city, assets,
customer-name, loan-number, amount)
• Update Anomaly:
• Changing the assets of a branch from “1900000” to “1700000”
requires this update to be made in the tuple of every
customer loan at the corresponding branch; missing one
leaves the assets value inconsistent
EXAMPLE OF AN INSERT ANOMALY

• Consider the relation:
• Lending-schema = (branch-name, branch-city, assets,
customer-name, loan-number, amount)
• Insert Anomaly:
• Cannot insert information about a branch if the branch
does not have any loan.
EXAMPLE OF A DELETE ANOMALY

• Consider the relation:
• Lending-schema = (branch-name, branch-city, assets,
customer-name, loan-number, amount)
• Delete Anomaly:
• When a branch is deleted, it will result in deleting all
the loans on that branch.
• Alternately, if a branch has a single loan, deleting
that loan would result in deleting the corresponding
branch.
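The update and delete anomalies can be reproduced on a toy instance of Lending-schema. The sketch below is illustrative only: the rows are made up, and `inconsistent_branches` is a hypothetical helper name.

```python
# Toy instance of Lending-schema; all values are invented for illustration.
lending = [
    {"branch_name": "Downtown", "branch_city": "Brooklyn", "assets": 1900000,
     "customer_name": "Jones", "loan_number": "L-17", "amount": 1000},
    {"branch_name": "Downtown", "branch_city": "Brooklyn", "assets": 1900000,
     "customer_name": "Smith", "loan_number": "L-23", "amount": 2000},
    {"branch_name": "Mianus", "branch_city": "Horseneck", "assets": 400000,
     "customer_name": "Curry", "loan_number": "L-93", "amount": 500},
]

def inconsistent_branches(rows):
    """Return branches that appear with more than one assets value."""
    seen = {}
    for row in rows:
        seen.setdefault(row["branch_name"], set()).add(row["assets"])
    return sorted(b for b, values in seen.items() if len(values) > 1)

# Update anomaly: changing assets in only one of the Downtown tuples
# leaves the relation inconsistent.
lending[0]["assets"] = 1700000
print(inconsistent_branches(lending))

# Delete anomaly: removing the only Mianus loan also removes every
# trace of the Mianus branch.
after_delete = [r for r in lending if r["loan_number"] != "L-93"]
print(any(r["branch_name"] == "Mianus" for r in after_delete))
```

Storing branch data in its own relation, keyed by branch-name, would make both problems disappear because each assets value would be stored exactly once.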
Null Values in Tuples
• GUIDELINE 3:
• Relations should be designed such that their tuples
will have as few NULL values as possible
• Attributes that are NULL frequently could be placed
in separate relations (with the primary key)
• Reasons for nulls:
• Attribute not applicable or invalid
• Attribute value unknown (may exist)
• Value known to exist, but unavailable
Relation schema suffering from update
anomaly
EXAMPLE OF AN UPDATE ANOMALY

• Consider the relation:
• EMP_PROJ(Emp#, Proj#, Ename, Pname,
No_hours)
• Update Anomaly:
• Changing the name of project number P1 from
“Billing” to “Customer-Accounting” requires this
update to be made in the tuples of all 100 employees
working on project P1.
Lossless join decomposition
Lending-schema = (branch-name, branch-city, assets,
customer-name, loan-number, amount)

The relation Lending (11 tuples)

The relation branch-customer (11 tuples)

The relation customer-loan (11 tuples)

The relation branch-customer ⋈ customer-loan (15 tuples)

Inference:
1. The join produces 4 additional (spurious) tuples.
2. Find all branches with loan amount < 1000: on Lending the answer is
Mianus and Roundhill, but on the joined relation it is Mianus,
Roundhill and Downtown.

Spurious Tuples or Dangling Tuples
• Bad designs for a relational database may result in
erroneous results for certain JOIN operations
• The "lossless join" property is used to guarantee
meaningful results for join operations

• GUIDELINE 4:
• The relations should be designed to satisfy the
lossless join condition.
• No spurious tuples should be generated by doing a
natural-join of any relations.
The branch Relation
The account Relation

Functional Dependencies
• Constraints on the set of legal relations.
• Require that the value for a certain set of attributes
determines uniquely the value for another set of
attributes.
• A functional dependency is a generalization of the
notion of a key.
Functional Dependencies (Cont.)
• Let R be a relation schema
α ⊆ R and β ⊆ R
• The functional dependency
α → β
holds on R if and only if for any legal relation r(R),
whenever any two tuples t1 and t2 of r agree on the
attributes α, they also agree on the attributes β. That
is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
Functional Dependencies (Cont.)

• Example: Consider r(A,B) with the following instance of r.

A B
1 4
1 5
3 7

• On this instance, A → B does NOT hold, but B → A does hold.
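The claim about this instance can be checked mechanically. A minimal Python sketch follows; the function name `fd_holds` and the list-of-dicts representation are assumptions made for the example.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds on this instance (a list of dicts)."""
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True

r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
print(fd_holds(r, ["A"], ["B"]))  # False: A = 1 appears with B = 4 and B = 5
print(fd_holds(r, ["B"], ["A"]))  # True: every B value has a single A value
```

Note that an instance can only refute a functional dependency. B → A holding on these three tuples does not prove that it holds on every legal relation of the schema.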


Second Normal Form (1)
• Uses the concepts of FDs, primary key
• Definitions
• Prime attribute: An attribute that is member of the
primary key K
• Full functional dependency: a FD Y -> Z where removal of
any attribute from Y means the FD does not hold any more
• Examples:
• {SSN, PNUMBER} -> HOURS is a full FD since neither
SSN -> HOURS nor PNUMBER -> HOURS holds
• {SSN, PNUMBER} -> ENAME is not a full FD (it is called a
partial dependency) since SSN -> ENAME also holds
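The two examples can be verified with attribute closures. The sketch below is one possible way to do it; `closure` and `is_full_fd` are hypothetical helper names, and the FD set contains only the two dependencies mentioned on the slide.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_fd(lhs, rhs, fds):
    """lhs -> rhs is full if it holds and no attribute can be dropped from lhs."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False
    # Dropping single attributes is enough: if a smaller subset determined
    # rhs, so would every larger subset containing it.
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)

fds = [
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
    (frozenset({"SSN"}), frozenset({"ENAME"})),
]
print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, fds))  # True
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, fds))  # False (partial)
```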
Second Normal Form (2)

• A relation schema R is in second normal form (2NF) if
• It is in 1NF
• every non-prime attribute A in R is fully functionally
dependent on the primary key

• R can be decomposed into 2NF relations via the
process of 2NF normalization
Second Normal Form (3)
Second Normal Form (4)
Third Normal Form
• A relation schema R is in third normal form (3NF) if it
is in 2NF and no non-prime attribute A in R is
transitively dependent on the primary key
• R can be decomposed into 3NF relations via the
process of 3NF normalization
• NOTE:
• In X -> Y and Y -> Z, with X as the primary key, we
consider this a problem only if Y is not a candidate key.
• When Y is a candidate key, there is no problem with the
transitive dependency.
• E.g., Consider EMP (SSN, Emp#, Salary).
• Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
Third Normal Form (2)
• Definition:
• Transitive functional dependency: a FD X -> Z that can be
derived from two FDs X -> Y and Y -> Z
• Examples:
• SSN -> DMGRSSN is a transitive FD
• Since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
• SSN -> ENAME is non-transitive
• Since there is no set of attributes X where SSN -> X and X -> ENAME
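A 3NF test can be sketched in Python. Note the assumption: the code uses the general textbook formulation (for every non-trivial FD X -> A, either X is a superkey or A is a prime attribute) rather than the primary-key wording of the slides, and the candidate keys are supplied by hand. The schema names `emp_dept` and `emp`, and the attribute name `EMPNO` standing in for Emp#, are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def third_nf_violations(schema, fds, candidate_keys):
    """Return (lhs, attribute) pairs that violate the general 3NF condition."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(schema)
        for a in sorted(rhs - lhs):
            if not is_superkey and a not in prime:
                violations.append((set(lhs), a))
    return violations

# SSN -> DNUMBER and DNUMBER -> DMGRSSN: a transitive dependency.
emp_dept = {"SSN", "ENAME", "DNUMBER", "DMGRSSN"}
emp_dept_fds = [
    (frozenset({"SSN"}), frozenset({"ENAME", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DMGRSSN"})),
]
print(third_nf_violations(emp_dept, emp_dept_fds, [{"SSN"}]))

# EMP(SSN, Emp#, Salary): Emp# is a candidate key, so there is no violation.
emp = {"SSN", "EMPNO", "SALARY"}
emp_fds = [
    (frozenset({"SSN"}), frozenset({"EMPNO"})),
    (frozenset({"EMPNO"}), frozenset({"SSN", "SALARY"})),
]
print(third_nf_violations(emp, emp_fds, [{"SSN"}, {"EMPNO"}]))
```

The first call reports DNUMBER -> DMGRSSN, which is exactly the second link of the transitive chain from the example; the second call reports nothing.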
Third Normal Form (3)
Third Normal Form (4)
Normal Forms Defined Informally
• 1st normal form
• All attributes depend on the key
• 2nd normal form
• All attributes depend on the whole key
• 3rd normal form
• All attributes depend on nothing but the key
Need for BCNF
• Example of problems due to redundancy in 3NF
• R = (J, K, L)
F = {JK → L, L → K} (Two candidate keys: JK and JL)

J     L     K
j1    l1    k1
j2    l1    k1
j3    l1    k1
null  l2    k2

A schema that is in 3NF but not in BCNF has the problems of
• repetition of information (e.g., the relationship l1, k1)
• need to use null values (e.g., to represent the relationship l2,
k2 where there is no corresponding value for J).
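The two candidate keys quoted above can be derived by brute force. The following Python sketch is one way to do so; `closure` and `candidate_keys` are hypothetical helper names, and the exhaustive search is only sensible for small schemas like this one.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Enumerate minimal attribute sets whose closure is the whole schema."""
    attrs = sorted(schema)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys

fds = [(frozenset("JK"), frozenset("L")), (frozenset("L"), frozenset("K"))]
keys = candidate_keys("JKL", fds)
print(keys)

# Every attribute belongs to some candidate key, so no non-prime attribute
# exists and the 3NF condition cannot be violated.
prime = set().union(*keys)
print(prime == set("JKL"))
```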
Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional
dependencies if for all functional dependencies in F+ of the form
α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:

• α → β is trivial (i.e., β ⊆ α)
• α is a superkey for R
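The definition translates almost directly into a check. The sketch below applies it to the schema R = (J, K, L) with F = {JK → L, L → K} from the "Need for BCNF" slide; the helper names are assumptions. For R itself it is enough to inspect the dependencies in F rather than all of F+, although that shortcut does not carry over to the sub-relations of a decomposition.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """FDs that are non-trivial and whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= set(schema)
    ]

fds = [(frozenset("JK"), frozenset("L")), (frozenset("L"), frozenset("K"))]
violations = bcnf_violations("JKL", fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
```

Only L → K is reported: L alone does not determine J, so it is not a superkey, which is why this schema is not in BCNF.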
Example
• R = (A, B, C)
F = {A → B,
B → C}
Key = {A}
• R is not in BCNF (B → C is non-trivial and B is not a superkey)
• Decomposition R1 = (A, B), R2 = (B, C)
• R1 and R2 in BCNF
• Lossless-join decomposition
• Dependency preserving
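Both properties of this decomposition can be checked with closures. The Python sketch below assumes the standard test for a binary decomposition (it is lossless if R1 ∩ R2 functionally determines all of R1 or all of R2) and a simple sufficient test for dependency preservation (each FD fits inside one component). The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """Lossless if the shared attributes determine all of r1 or all of r2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return common_closure >= set(r1) or common_closure >= set(r2)

def preserved_directly(decomposition, fds):
    """Sufficient (not necessary) test: every FD fits inside one component."""
    return all(
        any(lhs | rhs <= set(ri) for ri in decomposition)
        for lhs, rhs in fds
    )

fds = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(is_lossless_binary("AB", "BC", fds))    # True: B -> BC
print(preserved_directly(["AB", "BC"], fds))  # True
print(is_lossless_binary("AB", "AC", fds))    # True: A -> AB
print(preserved_directly(["AB", "AC"], fds))  # False: B -> C spans both
```

The alternative split into (A, B) and (A, C) is also lossless, but B → C no longer fits inside a single relation, which is why the slide's choice of R1 and R2 is the better one.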
Boyce-Codd normal form
Design Goals
• Goal for a relational database design is:
• BCNF.
• Lossless join.
• Dependency preservation.

• If we cannot achieve this, we accept one of
• Lack of dependency preservation
• Redundancy due to use of 3NF
Multivalued Dependencies
• There are database schemas in BCNF that do not
seem to be sufficiently normalized
• Consider a database
classes(course, teacher, book)
such that (c,t,b) ∈ classes means that t is qualified to
teach c, and b is a required textbook for c
• The database is supposed to list for each course the
set of teachers any one of which can be the course’s
instructor, and the set of books, all of which are
required for the course (no matter who teaches it).
Multivalued Dependencies (Cont.)
     course             teacher     book
t1   database           Avi         DB Concepts
t2   database           Hank        Ullman
t3   database           Hank        DB Concepts
t4   database           Avi         Ullman
     database           Sudarshan   DB Concepts
     database           Sudarshan   Ullman
     operating systems  Avi         OS Concepts
     operating systems  Avi         Shaw
     operating systems  Jim         OS Concepts
     operating systems  Jim         Shaw
classes
• There are no non-trivial functional dependencies and
therefore the relation is in BCNF
• Insertion anomalies – i.e., if Sara is a new teacher that
can teach database, two tuples need to be inserted
(database, Sara, DB Concepts)
(database, Sara, Ullman)
Multivalued Dependencies (MVDs)
• Let R be a relation schema and let α ⊆ R and β ⊆ R.
The multivalued dependency
α →→ β
holds on R if in any legal relation r(R), for all pairs of
tuples t1 and t2 in r such that t1[α] = t2[α], there exist
tuples t3 and t4 in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R – β] = t2[R – β]
t4[β] = t2[β]
t4[R – β] = t1[R – β]
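The definition can be tested on a concrete instance. The Python sketch below checks a multivalued dependency on the classes relation from the earlier slide; `mvd_holds` is a hypothetical helper. It only builds t3 explicitly, because t4 is the t3 obtained when the same pair of tuples is visited in the opposite order.

```python
def mvd_holds(rows, schema, alpha, beta):
    """Check alpha ->> beta on an instance given as a set of tuples."""
    rows = set(rows)
    pos = {a: i for i, a in enumerate(schema)}
    for t1 in rows:
        for t2 in rows:
            if any(t1[pos[a]] != t2[pos[a]] for a in alpha):
                continue  # the definition only constrains pairs agreeing on alpha
            # t3 takes beta from t1 and every other attribute from t2.
            t3 = tuple(t1[pos[a]] if a in beta else t2[pos[a]] for a in schema)
            if t3 not in rows:
                return False
    return True

schema = ("course", "teacher", "book")
classes = {
    ("database", "Avi", "DB Concepts"),
    ("database", "Avi", "Ullman"),
    ("database", "Hank", "DB Concepts"),
    ("database", "Hank", "Ullman"),
    ("database", "Sudarshan", "DB Concepts"),
    ("database", "Sudarshan", "Ullman"),
    ("operating systems", "Avi", "OS Concepts"),
    ("operating systems", "Avi", "Shaw"),
    ("operating systems", "Jim", "OS Concepts"),
    ("operating systems", "Jim", "Shaw"),
}
print(mvd_holds(classes, schema, {"course"}, {"teacher"}))  # True

# Dropping a single tuple breaks the dependency.
broken = classes - {("database", "Sudarshan", "Ullman")}
print(mvd_holds(broken, schema, {"course"}, {"teacher"}))   # False
```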
Multivalued Dependencies (Cont.)
• Therefore, it is better to decompose classes
into:

course             teacher
database           Avi
database           Hank
database           Sudarshan
operating systems  Avi
operating systems  Jim
teaches

course             book
database           DB Concepts
database           Ullman
operating systems  OS Concepts
operating systems  Shaw
text

We shall see that these two relations are in Fourth Normal
Form (4NF)
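That this decomposition loses nothing can be confirmed by joining the two relations back together. A minimal Python sketch follows, with the relations written as sets of tuples; the variable names are chosen for the example.

```python
teaches = {
    ("database", "Avi"),
    ("database", "Hank"),
    ("database", "Sudarshan"),
    ("operating systems", "Avi"),
    ("operating systems", "Jim"),
}
text = {
    ("database", "DB Concepts"),
    ("database", "Ullman"),
    ("operating systems", "OS Concepts"),
    ("operating systems", "Shaw"),
}

# Natural join on course rebuilds the original classes relation.
classes = {(c, t, b) for (c, t) in teaches for (c2, b) in text if c == c2}
print(len(classes))  # 10, the same tuples as the original classes relation

# Projecting the join gives back exactly the two smaller relations.
print({(c, t) for (c, t, b) in classes} == teaches)
print({(c, b) for (c, t, b) in classes} == text)

# Adding a new teacher is now a single insertion into teaches.
teaches.add(("database", "Sara"))
```

Nine tuples in the two small relations carry the same information as ten tuples in classes, and the gap widens as more teachers or books are added per course.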
Example (Cont.)
• In our example:
course →→ teacher
course →→ book
• The above formal definition is supposed to formalize
the notion that given a particular value of Y (course) it
has associated with it a set of values of Z (teacher) and a
set of values of W (book), and these two sets are in
some sense independent of each other.
Thank You!!!
