Normalization

The document discusses normalization in database design, outlining functional dependencies and various normal forms including 1NF, 2NF, 3NF, and BCNF. It highlights the importance of avoiding redundancy and anomalies such as insertion, deletion, and update anomalies through effective schema design. The document also provides examples and explanations of how to achieve normalization and the implications of bad schema designs.

Uploaded by

Rajath MS
Copyright: © All Rights Reserved

Unit 03

Normalization

Ashwini Kumar Mathur


Outlines

Functional dependencies
Normal forms: first, second, and third normal forms, BCNF
Inclusion dependencies, lossless join decompositions, normalization using FDs
Database Design

Coming up with a ‘good’ schema is very important.
Scope of Normalization

1. How do we characterize the “goodness” of a schema?
2. If two or more alternative schemas are available, how do we compare them?
3. What are the problems with “bad” schema designs?
4. Normal forms: each normal form specifies certain conditions. If the schema satisfies those conditions, certain kinds of problems are avoided.
Scenario : Student Dept Schema

In a university database, we need to store details of students and their respective departments. Considering the concepts of database normalization and redundancy, how should we design the schema to ensure efficient storage and data integrity?

1. What attributes should be included in the Student and Department tables?
2. How should we establish a relationship between these tables?
3. What issues can arise if we store department details directly within the student table?
4. How does normalization help in this scenario?
5. Based on the given schemas (correct and incorrect), which design would you choose and why?
Example

What problems arise in the incorrect schema?


Brief explanation of previous example

❌ Problems:

1. Redundancy: CS department details are repeated for Alice and Charlie.
2. Update Anomalies: If the HOD or officePhone for CS changes, it needs to be updated in multiple rows, increasing the risk of errors.
3. Insertion Anomalies: If a new department is created but has no students yet, we cannot insert it into the table.
4. Deletion Anomalies: If the last student of the CS department is deleted, department information is also lost.
Solution
To avoid these issues, we should normalize the schema into two separate tables.
Brief explanation of previous example

✅ Advantages:

1. No redundancy: Department details are stored only once.

2. Consistency: If the office phone or HOD changes, it updates in one place.

3. Efficient storage and querying.
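The two-table design can be tried out with a minimal sketch using Python's built-in sqlite3 module. The table and column names, the HOD names, and the phone numbers below are illustrative assumptions; only Alice, Charlie, and the CS department come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (
        deptName    TEXT PRIMARY KEY,
        hod         TEXT,
        officePhone TEXT
    );
    CREATE TABLE Student (
        rollNo   INTEGER PRIMARY KEY,
        studName TEXT,
        deptName TEXT REFERENCES Department(deptName)
    );
""")

# Department facts are stored once; 'EE' can exist before it has any students.
conn.execute("INSERT INTO Department VALUES ('CS', 'Dr. Rao', '555-0100')")
conn.execute("INSERT INTO Department VALUES ('EE', 'Dr. Iyer', '555-0200')")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                 [(1, "Alice", "CS"), (2, "Charlie", "CS")])

# Changing the CS office phone touches exactly one row.
cur = conn.execute(
    "UPDATE Department SET officePhone = '555-0199' WHERE deptName = 'CS'")
rows_changed = cur.rowcount

# Every CS student sees the new phone through the join.
phones = conn.execute("""
    SELECT s.studName, d.officePhone
    FROM Student s JOIN Department d ON s.deptName = d.deptName
    ORDER BY s.rollNo
""").fetchall()

# Deleting every student does not delete any department.
conn.execute("DELETE FROM Student")
depts_left = conn.execute("SELECT COUNT(*) FROM Department").fetchone()[0]
```

Here the update changes one row, both students report the new phone, and both departments survive the deletion of all students.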


Explanation

Problems with Bad Database Schema


1. Redundant Data Storage

● Example: Office Phone & HOD information
○ Stored multiple times, once for each student in a department.
● Implications:
○ Wasted Disk Space: Unnecessarily increases storage costs.
○ Increased Maintenance: Updates must be made in several locations.
○ Higher Error Risk: More copies mean more chances of errors during updates.

2. Challenges of Updating Data

● When a program updates the office phone:
○ It must change it in multiple records.
● Consequences:
○ Longer Execution Time: More places to update leads to inefficiency.
○ Higher Error Risk: Increased chances of inconsistency and mistakes.

3. Importance of Transaction Efficiency

● Transactions should complete as quickly as possible to ensure:
○ System Performance: Reduces wait times for users.
○ Data Integrity: Minimizes the chance of inconsistencies during updates.
Problem with bad Schema

1. Insertion Anomaly
2. Deletion Anomaly
3. Update Anomaly
Anomalies in Bad Database Schema

1. Insertion Anomaly

● Issue: Cannot add a new department without entering a dummy student record.
● Implication: Forces irrelevant data into the database, leading to unnecessary complexity.

2. Deletion Anomaly

● Issue: Deleting all students from a department results in loss of the department information.
● Implication: Critical data about the department may be unintentionally removed, complicating data retrieval.

3. Update Anomaly

● Issue: Updating the office phone number for a department requires changes in multiple records.
○ Consequence: If any record is overlooked, the data becomes inconsistent, compromising data integrity.
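The update and deletion anomalies can be reproduced on a small in-memory version of the single-table design. This is only a sketch: the rows, the student Bob, and the helper name phones_by_dept are made up for illustration.

```python
# Flat (unnormalized) table: department facts are repeated in every student row.
students = [
    {"name": "Alice",   "dept": "CS", "hod": "Dr. Rao",  "officePhone": "555-0100"},
    {"name": "Bob",     "dept": "EE", "hod": "Dr. Iyer", "officePhone": "555-0200"},
    {"name": "Charlie", "dept": "CS", "hod": "Dr. Rao",  "officePhone": "555-0100"},
]

def phones_by_dept(rows):
    """Collect the distinct office phones recorded for each department."""
    result = {}
    for row in rows:
        result.setdefault(row["dept"], set()).add(row["officePhone"])
    return result

# Update anomaly: a program changes the CS phone in one row and misses the other.
students[0]["officePhone"] = "555-0199"
inconsistent = {dept for dept, phones in phones_by_dept(students).items()
                if len(phones) > 1}

# Deletion anomaly: removing the only EE student also removes all EE facts.
students = [row for row in students if row["name"] != "Bob"]
remaining_depts = {row["dept"] for row in students}
```

After the partial update, CS is recorded with two different phones, and after Bob is deleted no row mentions EE or its HOD at all.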
Normal Forms

First Normal Form (1NF) - included in the definition of a relation
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF) - defined using multivalued dependencies
Functional Dependencies
Functional Dependency - Examples
Consider the schema:
Student (studName, rollNo, gender, dept, hostelName, roomNo)

Since rollNo is a key, rollNo → {studName, gender, dept, hostelName, roomNo}

Suppose that each student is given a hostel room exclusively; then {hostelName, roomNo} → rollNo

Suppose boys and girls are accommodated in separate hostels; then hostelName → gender
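A quick way to get a feel for these FDs is to test them against sample rows. The sketch below uses made-up student data and a hypothetical helper fd_holds. Note that a sample can only refute an FD: an FD is a constraint on every legal instance, so passing on one instance does not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Illustrative instance of Student (dept omitted for brevity).
rows = [
    {"rollNo": 1, "studName": "Asha",  "gender": "F", "hostelName": "Ganga",   "roomNo": 101},
    {"rollNo": 2, "studName": "Ravi",  "gender": "M", "hostelName": "Kaveri",  "roomNo": 101},
    {"rollNo": 3, "studName": "Meera", "gender": "F", "hostelName": "Ganga",   "roomNo": 102},
    {"rollNo": 4, "studName": "Arjun", "gender": "M", "hostelName": "Narmada", "roomNo": 7},
]

hostel_gives_gender = fd_holds(rows, ["hostelName"], ["gender"])
room_gives_student = fd_holds(rows, ["hostelName", "roomNo"], ["rollNo"])
gender_gives_hostel = fd_holds(rows, ["gender"], ["hostelName"])
```

On this data the first two checks succeed, while gender → hostelName fails because two different hostels house male students.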
Trivial and Non Trivial FD’s

An FD X → Y where Y ⊆ X is called a trivial FD; it always holds.
An FD X → Y where Y ⊈ X is a non-trivial FD.
An FD X → Y where X ∩ Y = ∅ is a completely non-trivial FD.
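The three cases translate directly into set operations. A small sketch (the function name classify_fd is an assumption, and the attribute names are borrowed from the Student schema):

```python
def classify_fd(lhs, rhs):
    """Classify the FD lhs -> rhs by how its two sides overlap."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # Y is a subset of X
    if lhs & rhs:
        return "non-trivial"             # Y is not within X, but they overlap
    return "completely non-trivial"      # X and Y share no attribute

kind_1 = classify_fd({"rollNo", "studName"}, {"studName"})
kind_2 = classify_fd({"rollNo", "studName"}, {"studName", "dept"})
kind_3 = classify_fd({"rollNo"}, {"dept"})
```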
Deriving new FDs

Given that a set of FDs F holds on R, we can infer that certain new FDs must also hold on R.
For instance, given that X → Y and Y → Z hold on R, we can infer that X → Z must also hold.

How do we systematically obtain all such new FDs?

Unless all FDs are known, a relation schema is not fully specified.
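One standard way to answer the question systematically is the attribute-closure algorithm: repeatedly apply the known FDs to grow the set of attributes determined by X, then check whether the target attributes are inside it. The sketch below is one common formulation, not necessarily the method the slides go on to use; the function names are assumptions.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """Return True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)

F = [({"X"}, {"Y"}), ({"Y"}, {"Z"})]
x_closure = closure({"X"}, F)
x_gives_z = implies(F, {"X"}, {"Z"})
z_gives_x = implies(F, {"Z"}, {"X"})
```

With F = {X → Y, Y → Z}, the closure of X is {X, Y, Z}, so X → Z is implied while Z → X is not.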
Entailment Relation
Normalization

Normalization:
The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.

Normal form:
A condition, stated using the keys and FDs of a relation, that certifies whether a relation schema is in a particular normal form.
First Normal Form

Disallows
1. composite attributes
2. multivalued attributes
3. nested relations; attributes whose values for an
individual tuple are non-atomic
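For the multivalued-attribute case, one common fix is to emit one tuple per value. The sketch below assumes a department relation whose DLocation attribute holds a list; the rows and the helper name to_1nf are illustrative.

```python
def to_1nf(rows, multi_attr):
    """Expand a list-valued attribute into one row per value."""
    flat = []
    for row in rows:
        # Note: a row with an empty list produces no output rows.
        for value in row[multi_attr]:
            new_row = dict(row)
            new_row[multi_attr] = value
            flat.append(new_row)
    return flat

departments = [
    {"Dnumber": 5, "Dname": "Research", "DLocation": ["Bengaluru", "Mysuru"]},
    {"Dnumber": 4, "Dname": "Administration", "DLocation": ["Hubballi"]},
]
flat = to_1nf(departments, "DLocation")
```

Every value in the result is atomic, but Dname is now repeated once per location, which is the kind of redundancy the later normal forms address.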
Normalization into 1NF

Not in 2NF, because a non-prime attribute is not fully functionally dependent on the key.

In this case Dname can be determined by part of the composite key (Dnumber alone), but the key here is (Dnumber, DLocation).
Another Example as exercise
Composite Key
VERY VERY important Slide

Condition for 2NF

There are NO partial dependencies (i.e., no non-prime attribute should depend on only part of a composite primary key).
However, if a table has a single-column primary key, then partial dependency is not possible :)

[ SO IT SATISFIES 2NF DIRECTLY ]
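The 2NF condition can be checked mechanically: take every proper subset of the key and see whether it already determines some non-prime attribute. The sketch below uses a hypothetical Enrollment relation (all attribute names and FDs are assumptions) and treats the given key as the only candidate key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, fds):
    """List (part_of_key, attribute) pairs that violate 2NF."""
    prime = set(key)  # assumption: key is the only candidate key
    found = []
    for size in range(1, len(key)):          # proper, non-empty subsets only
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, fds) - prime):
                found.append((part, attr))
    return found

fds = [
    ({"Student_ID", "Course_ID"}, {"Grade"}),
    ({"Student_ID"}, {"Student_Name"}),
    ({"Course_ID"}, {"Course_Title"}),
]
violations = partial_dependencies(["Student_ID", "Course_ID"], fds)
single_key = partial_dependencies(["Student_ID"], [({"Student_ID"}, {"Student_Name"})])
```

The composite key yields two partial dependencies, while the single-column key yields none because it has no proper non-empty subset.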


IMPORTANT CASE

Can a Non-Prime Attribute Determine Another Non-Prime Attribute?
Yes, it is possible for a non-prime attribute to determine another non-prime attribute, but this violates 3NF, not 2NF.
Example:

Primary Key: StudentID

StudentName, Department, and HOD are non-prime attributes.

Dependency: Department → HOD (A department has a fixed Head of Department).

Here, HOD is determined by Department, which is another non-prime attribute.

This violates 3NF, not 2NF.


More Examples of 2NF
Normalization 3NF and BCNF
Overview of 3NF
A relation is in third normal form if it is in second normal form and no non-prime attribute is transitively dependent on a candidate key.
Equivalently, a relation is in 3NF if at least one of the following conditions holds for every non-trivial functional dependency X -> Y:
- X is a super key.
- Y is a prime attribute (each element of Y is part of some candidate key).
In other words, a relation that is in First and Second Normal Form, and in which no non-primary-key attribute is transitively dependent on the primary key, is in Third Normal Form (3NF).
FD set: { STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE }

Candidate Key: {STUD_NO}

For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY hold, so STUD_COUNTRY is transitively dependent on STUD_NO.

This violates the third normal form.


Decomposition of table

STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)

STATE_COUNTRY (STATE, COUNTRY)
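To see that this decomposition loses no information, one can project sample rows onto the two schemas and join them back. The sketch below uses made-up student rows, keeps the original attribute names STUD_STATE and STUD_COUNTRY in both projections so that a natural join applies, and leaves out STUD_PHONE because it does not appear in the FD set.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    result = []
    for row in rows:
        piece = {a: row[a] for a in attrs}
        if piece not in result:
            result.append(piece)
    return result

def natural_join(left, right):
    """Join rows that agree on every attribute they share."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in set(l) & set(r)):
                out.append({**l, **r})
    return out

def as_set(rows):
    """Order-independent view of a list of rows."""
    return {tuple(sorted(row.items())) for row in rows}

original = [
    {"STUD_NO": 1, "STUD_NAME": "Ram",    "STUD_STATE": "Haryana", "STUD_COUNTRY": "India", "STUD_AGE": 20},
    {"STUD_NO": 2, "STUD_NAME": "Ram",    "STUD_STATE": "Punjab",  "STUD_COUNTRY": "India", "STUD_AGE": 19},
    {"STUD_NO": 3, "STUD_NAME": "Suresh", "STUD_STATE": "Punjab",  "STUD_COUNTRY": "India", "STUD_AGE": 21},
]

student = project(original, ["STUD_NO", "STUD_NAME", "STUD_STATE", "STUD_AGE"])
state_country = project(original, ["STUD_STATE", "STUD_COUNTRY"])
rejoined = natural_join(student, state_country)
lossless = as_set(rejoined) == as_set(original)
```

The join reproduces exactly the original rows, and the state-to-country fact for Punjab is now stored once instead of twice.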


Example 02
Identify whether the above table is in 3NF or not.
Consider Relation R(A, B, C, D, E)
A -> BC,
CD -> E,
B -> D,
E -> A

Identify whether the above relation is in 3NF or not.
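One way to work through this exercise is to compute the candidate keys by brute force and then test each listed FD against the two 3NF conditions. The sketch below does that for R(A, B, C, D, E); the helper names are assumptions, and the exhaustive key search is only practical for small relations.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Find all minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys

def violations_3nf(attrs, fds):
    """Return the listed FDs that break both 3NF conditions."""
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == set(attrs)
        rhs_is_prime = (set(rhs) - set(lhs)) <= prime
        if not is_superkey and not rhs_is_prime:
            bad.append((lhs, rhs))
    return bad

R = set("ABCDE")
F = [({"A"}, {"B", "C"}), ({"C", "D"}, {"E"}), ({"B"}, {"D"}), ({"E"}, {"A"})]
keys = sorted("".join(sorted(k)) for k in candidate_keys(R, F))
bad = violations_3nf(R, F)
```

The candidate keys come out as A, E, BC, and CD, so every attribute is prime and no FD violates 3NF. B -> D is allowed only because D is prime, since B is not a super key.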


Example 03
R(Student_ID,Course_ID,Instructor,Instructor_Phone)
Functional Dependencies (FDs):

1. Student_ID, Course_ID → Instructor (A student taking a course is taught by a specific instructor.)
2. Instructor → Instructor_Phone (Each instructor has one unique phone number.)
3NF and Decomposition
Lossless-join
- Always dependency preserving
- Possible to have extra data (there may be redundancy)

Questions:
- Is the relation in 3NF?
- Is any refinement needed?
To calculate 3NF

Identify the primary key of the original table, then:

1. Take the canonical cover (Fc)
2. Turn the (minimal set of) FDs into tables
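These steps can be sketched in a few lines, assuming the FD set passed in is already a canonical cover and that a candidate key is known. The sketch follows the usual textbook synthesis idea (one table per left-hand side, plus a key table if no table contains a key); the function name is an assumption, and the input is the FD set of Example 03.

```python
def synthesize_3nf(cover, key):
    """Build 3NF schemas from a canonical cover and one candidate key."""
    # Group FDs by left-hand side so each determinant gets one table.
    grouped = {}
    for lhs, rhs in cover:
        grouped.setdefault(frozenset(lhs), set()).update(rhs)

    relations = []
    for lhs, rhs in grouped.items():
        schema = set(lhs) | rhs
        if schema not in relations:
            relations.append(schema)

    # Drop any schema that is strictly contained in another one.
    relations = [r for r in relations if not any(r < other for other in relations)]

    # Make sure some table contains a candidate key of the original relation.
    if not any(set(key) <= r for r in relations):
        relations.append(set(key))
    return relations

cover = [
    ({"Student_ID", "Course_ID"}, {"Instructor"}),
    ({"Instructor"}, {"Instructor_Phone"}),
]
tables = synthesize_3nf(cover, {"Student_ID", "Course_ID"})
```

For Example 03 this yields (Student_ID, Course_ID, Instructor) and (Instructor, Instructor_Phone); the first table already contains the key, so no extra key table is added.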
Canonical Cover
Introduction to Canonical Cover
A minimal set of functional dependencies that has the same closure as the original set F.

Extraneous attribute = an attribute of an FD that we can remove without changing the closure of the set of FDs.
- F logically implies all dependencies in Fc
- Fc logically implies all dependencies in F
- No FD in Fc contains an extraneous attribute
To compute the canonical cover of a set of functional dependencies F, always start with F and use the rules to minimize it.
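A minimal sketch of one common way to carry out this minimization: split right-hand sides, drop extraneous left-hand attributes, drop redundant FDs, then merge FDs that share a left-hand side. The input FD set is a made-up textbook-style example, and the function names are assumptions. A canonical cover is not unique in general, so other valid answers may exist for other inputs.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def canonical_cover(fds):
    """Compute one canonical cover of fds."""
    # Step 1: one attribute per right-hand side.
    work = [(set(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]

    def as_fds(items):
        return [(lhs, {a}) for lhs, a in items]

    # Step 2: remove extraneous attributes from left-hand sides.
    for lhs, a in work:
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, as_fds(work)):
                lhs.discard(b)

    # Step 3: remove FDs that follow from the remaining ones.
    kept = list(work)
    for fd in work:
        rest = [f for f in kept if f is not fd]
        if fd[1] in closure(fd[0], as_fds(rest)):
            kept = rest

    # Step 4: merge FDs with the same left-hand side.
    merged = {}
    for lhs, a in kept:
        merged.setdefault(frozenset(lhs), set()).add(a)
    return [(set(lhs), rhs) for lhs, rhs in merged.items()]

F = [({"A"}, {"B", "C"}), ({"B"}, {"C"}), ({"A"}, {"B"}), ({"A", "B"}, {"C"})]
Fc = canonical_cover(F)
```

For F = {A → BC, B → C, A → B, AB → C} the result is {A → B, B → C}: A is extraneous in AB → C, and A → C follows from the other two FDs.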
Example 01 : 3NF and Fc
Example 02 : 3NF and Fc
BCNF and Decomposition
BCNF (Boyce-Codd Normal Form)
Overview of BCNF

A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a non-trivial FD X -> A holds in R, then X is a superkey of R.
OR
For any non-trivial functional dependency (A->B), A should be a super key (a candidate key is itself a minimal super key). In simple words, unlike 3NF, there is no exception when B is a prime attribute: A must still be a super key.
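The BCNF test is easy to automate with attribute closures: an FD passes only if it is trivial or its left side determines every attribute. The sketch below (function names are assumptions) applies it to the relation R(A, B, C, D, E) from the 3NF examples, with FDs A -> BC, CD -> E, B -> D, E -> A.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    everything = set(attrs)
    return [
        (set(lhs), set(rhs))
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != everything
    ]

R = set("ABCDE")
F = [({"A"}, {"B", "C"}), ({"C", "D"}, {"E"}), ({"B"}, {"D"}), ({"E"}, {"A"})]
violations = bcnf_violations(R, F)
```

Only B -> D is reported: B is not a super key, so R is not in BCNF, even though D being a prime attribute is enough to satisfy the 3NF condition.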
Each normal form is strictly stronger than the previous one

Every 2NF relation is in 1NF


Every 3NF relation is in 2NF
Every BCNF relation is in 3NF

Note: There exist relations that are in 3NF but not in BCNF
Simple Example
Stu_ID −> Stu_Branch
Stu_Course −> {Branch_Number, Stu_Course_No}

Identify whether the above schema is in BCNF or not.

It is not in BCNF because neither Stu_ID nor Stu_Course is a super key. For a table to be in BCNF, every functional dependency X −> Y must have a super key as X; that property fails here, so this table is not in BCNF.
Task: Justify that the table in the next slide is in 3NF but not in BCNF
Lossless-join
- Guarantees freedom from FD-based redundancy
- May involve dependency across relations (dependency preservation is not guaranteed)

Given a relation R, for every nontrivial FD X → Y in R, X is a super key.
For all FDs, “key → everything”.
Questions:

1. Is the relation in BCNF?


2. Is any refinement needed?
To calculate BCNF
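A common textbook procedure is to look for an FD X → Y that holds on the relation where X is not a super key, split the relation into (X ∪ Y) and (R − Y), and repeat on both parts. The sketch below follows that idea, using closures over subsets to find violations inside each sub-relation. The function names are assumptions, the subset search is exponential and only meant for small schemas, and the input is the relation from Example 03.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(rel, fds):
    """Return (X, Y) where X -> Y holds on rel and X is not a super key of rel."""
    rel = set(rel)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            determined = (closure(x, fds) & rel) - x
            if determined and x | determined != rel:
                return x, determined
    return None

def decompose_bcnf(rel, fds):
    """Split rel until no sub-relation has a BCNF violation."""
    violation = find_violation(rel, fds)
    if violation is None:
        return [set(rel)]
    x, y = violation
    # Both parts keep X, which makes the split lossless.
    return decompose_bcnf(x | y, fds) + decompose_bcnf(set(rel) - y, fds)

R = {"Student_ID", "Course_ID", "Instructor", "Instructor_Phone"}
F = [
    ({"Student_ID", "Course_ID"}, {"Instructor"}),
    ({"Instructor"}, {"Instructor_Phone"}),
]
tables = decompose_bcnf(R, F)
```

Instructor → Instructor_Phone is the violating FD here, so the result is (Instructor, Instructor_Phone) and (Student_ID, Course_ID, Instructor). In this case both FDs are still enforceable within a single table, which BCNF decomposition does not guarantee in general.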
