
Lecture Notes 4 - AD - Normalisation Process - Part 3

Normalization is a database design technique aimed at reducing data redundancy and eliminating anomalies through a structured process involving multiple normal forms (1NF, 2NF, 3NF, BCNF). Each normal form has specific rules regarding the organization of data, ensuring that tables are logically structured and dependencies are properly managed. The document outlines the principles of normalization, the definitions of functional dependencies, and the significance of primary and candidate keys in maintaining data integrity.

Uploaded by mushabati6jr
Copyright © All Rights Reserved

NORMALIZATION PROCESS

Data Normalisation Processes

Part 3
What is Normalisation?
Normalization is a database design technique that reduces
data redundancy and eliminates undesirable characteristics
like Insertion, Update and Deletion Anomalies.
Normalization rules divide larger tables into smaller tables and
link them using relationships.

The purpose of Normalisation in SQL is to eliminate redundant (repetitive) data and ensure data is stored logically.

The inventor of the relational model, Edgar Codd, proposed the theory of normalization of data with the introduction of the First Normal Form.

Codd continued to extend the theory with the Second and Third Normal Forms.

Later, he worked with Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.
Each normal form involves a set of dependency properties that
a table must satisfy.

Each normal form gives guarantees about the presence and/or absence of update (insert, update and delete) anomalies.

This means that higher normal forms have less redundancy and, as a result, fewer update problems.

The goals of normalization are to:
• Be able to characterize the level of redundancy in a relational table
• Provide mechanisms for transforming tables in order to remove redundancy
The complete normalization theory, which draws heavily on the
theory of functional dependencies (4NF and 5NF also rely on
multivalued and join dependencies), defines seven normal forms (NF).

The seven normal forms are indicated below:


1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd Normal Form)
4NF (Fourth Normal Form)
5NF (Fifth Normal Form)
6NF (Sixth Normal Form)

BCNF is sometimes referred to as 3.5 Normal Form.


Preliminary Facts about Normalization
What Is A Relation?

Students
Name        Age  Sex  Student Number  Major
Singh       18   M    9901            English Literature
Jones       18   F    9902            Geography
Lee         18   F    9922            Computing
O’Toole     18   M    9923            Geography
Choudhury   19   F    9811            Languages
Basic Properties of a Relation
• Relation Named
• Atomic Values in Cells
• Attribute Named
• Attribute Value Drawn From a Domain
• No Duplicate Tuples (Rows)
• No Significance to Order of Tuples
• No Significance to Order of Attributes
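Several of these properties can be illustrated with a small Python sketch (an illustration only, not part of the original notes). It assumes a relation can be modelled as a set of tuples and each tuple as a set of (attribute, value) pairs; the helper name `make_tuple` is hypothetical. Because both levels are sets, duplicate rows collapse and neither row order nor attribute order matters.

```python
def make_tuple(**attributes):
    # Hypothetical helper: a tuple is modelled as a frozenset of (attribute, value)
    # pairs, so the order in which attributes are written is irrelevant.
    return frozenset(attributes.items())

singh = make_tuple(Name="Singh", Age=18, Sex="M", StudentNumber=9901, Major="English Literature")
jones = make_tuple(Name="Jones", Age=18, Sex="F", StudentNumber=9902, Major="Geography")
# The same row as singh, with its attributes listed in a different order.
singh_again = make_tuple(Major="English Literature", StudentNumber=9901, Sex="M", Age=18, Name="Singh")

# The relation is a set of tuples: the duplicate row is stored only once.
students = frozenset([singh, jones, singh_again])

print(singh == singh_again)                                    # True: attribute order ignored
print(len(students))                                           # 2: no duplicate tuples
print(frozenset([singh, jones]) == frozenset([jones, singh]))  # True: tuple order ignored
```

A real DBMS table can hold duplicate rows unless a key prevents it, which is one reason every table in these notes is given a primary key.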
Functional Dependence
• A → B
• This is the notation.
• If we know the value of A, then we will know the value of B.
• B is functionally dependent on A.
• A functionally determines B.
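FUNCTIONAL DEPENDENCE

StudentID → StudentName

StudentID  StudentName
23         Tasila
34         Mutinta
56         Mutinta
76         Tasila

Each StudentID has exactly one StudentName, so StudentID → StudentName. The reverse does not hold: the name Mutinta appears with two different IDs (34 and 56), so StudentName does not determine StudentID.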
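The test for a functional dependency can be sketched in Python as below. The function name `fd_holds` is hypothetical. Note that sample data can only show that a dependency is violated; whether it truly holds is decided by the business rules, not by a particular set of rows.

```python
def fd_holds(rows, lhs, rhs):
    """Return False as soon as two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # setdefault stores the first dependent value seen for this determinant
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True

students = [
    {"StudentID": 23, "StudentName": "Tasila"},
    {"StudentID": 34, "StudentName": "Mutinta"},
    {"StudentID": 56, "StudentName": "Mutinta"},
    {"StudentID": 76, "StudentName": "Tasila"},
]

print(fd_holds(students, ["StudentID"], ["StudentName"]))  # True
print(fd_holds(students, ["StudentName"], ["StudentID"]))  # False
```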
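CANDIDATE AND PRIMARY KEYS
StudentID  Activity  Cost
21         Dancing   K230
21         Swimming  K500
34         Dancing   K230
55         Netball   K200

Activity → Cost

No other dependencies.

Because a student can take several activities and an activity can be taken by several students, neither StudentID nor Activity alone identifies a row; the candidate (and primary) key is the composite {StudentID, Activity}.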
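One standard way to confirm a key is the attribute closure: repeatedly apply the known dependencies and check whether a set of attributes ends up determining every column. The Python sketch below is an illustration under that approach; the names `closure`, `ALL` and `FDS` are hypothetical.

```python
def closure(attributes, fds):
    """Attribute closure: every attribute determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever all of lhs is already known.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

ALL = {"StudentID", "Activity", "Cost"}
FDS = [({"Activity"}, {"Cost"})]  # the only dependency given on the slide

for attrs in [{"StudentID"}, {"Activity"}, {"StudentID", "Activity"}]:
    print(sorted(attrs), closure(attrs, FDS) == ALL)
# ['StudentID'] False
# ['Activity'] False
# ['Activity', 'StudentID'] True
```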
DELETE, INSERT AND UPDATE ANOMALIES

What happens if we change the price of Dancing?

StudentID  Activity  Cost
21         Dancing   K230
21         Swimming  K500
34         Dancing   K230
55         Fencing   K200

What happens if we delete this row?

What happens if we insert a new activity?
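The first two questions can be explored with a short Python sketch. The new Dancing price (K250) is hypothetical, and since the slide's highlighted row is not recoverable, the sketch assumes the deleted row is student 55's.

```python
rows = [
    {"StudentID": 21, "Activity": "Dancing", "Cost": "K230"},
    {"StudentID": 21, "Activity": "Swimming", "Cost": "K500"},
    {"StudentID": 34, "Activity": "Dancing", "Cost": "K230"},
    {"StudentID": 55, "Activity": "Fencing", "Cost": "K200"},
]

# Update anomaly: the new Dancing price (hypothetical K250) reaches only one row.
rows[0]["Cost"] = "K250"
dancing_prices = sorted({r["Cost"] for r in rows if r["Activity"] == "Dancing"})
print(dancing_prices)  # ['K230', 'K250']: Activity -> Cost no longer holds

# Deletion anomaly (assumed row): deleting student 55's only row also loses
# the price of Fencing.
rows = [r for r in rows if r["StudentID"] != 55]
print(any(r["Activity"] == "Fencing" for r in rows))  # False

# Insertion anomaly: a new activity cannot be recorded without a StudentID,
# because StudentID is part of the key of this relation.
```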


SPLITTING THE RELATION
StudentID  Activity  Cost
21         Dancing   K230
21         Swimming  K500
34         Dancing   K230
55         Fencing   K200

StudentID  Activity
21         Dancing
21         Swimming
34         Dancing
55         Fencing

Activity  Cost
Dancing   K230
Swimming  K500
Fencing   K200
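A quick way to check that this split is safe is to project the original rows onto the two new tables and join them back on Activity; if the join returns exactly the original rows, the decomposition is lossless. A minimal Python sketch using sets of tuples (variable names are illustrative):

```python
# Rows of the original relation as (StudentID, Activity, Cost) tuples.
original = {
    (21, "Dancing", "K230"),
    (21, "Swimming", "K500"),
    (34, "Dancing", "K230"),
    (55, "Fencing", "K200"),
}

# Projection onto each new table; using sets removes the repeated (Dancing, K230).
student_activity = {(sid, act) for sid, act, _ in original}
activity_cost = {(act, cost) for _, act, cost in original}

# Natural join of the two tables on their shared column, Activity.
rejoined = {
    (sid, act, cost)
    for sid, act in student_activity
    for act2, cost in activity_cost
    if act == act2
}

print(len(activity_cost))    # 3: each activity's cost is now stored once
print(rejoined == original)  # True: the split loses no rows and adds none
```

The join is lossless here because the shared column, Activity, determines Cost, so it is a key of the Activity/Cost table.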
First Normal Form (1NF)
In the first normal form, only single values are permitted at
the intersection of each row and column (a field).

We also need to eliminate the repeating groups.

To normalize a relation that contains a repeating group, remove the repeating group and form two new relations.

The Primary Key of the new relation is a combination of the Primary Key of the original relation plus an attribute from the newly created relation for unique identification.

We will use the Student_Grade_Report table below, from a university database, as our example to explain the process for 1NF.
Student_Grade_Report (StudentNo, StudentName, Major,
CourseNo, CourseName, InstructorNo, InstructorName,
InstructorLocation, Grade)

• In the Student_Grade_Report table, the repeating group is the course information. A student can take many courses.

• Remove the repeating group. In this case, it’s the course information for each student.

• Identify the Primary Key for your new table.

• The Primary Key must uniquely identify each row of the new table (StudentNo and CourseNo: a Composite Primary Key).

• After removing all the attributes related to the new StudentCourse table, you are left with StudentNo, StudentName and Major as the attributes for the Student table.
• The Student table (Student) is now in first normal form with the repeating group removed.

 The two new tables are shown below.

Student (StudentNo, StudentName, Major)

StudentCourse (StudentNo, CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation, Grade)

The Student relation has StudentNo as a primary key and the StudentCourse relation has a Composite Primary Key made of two columns (StudentNo and CourseNo).
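The 1NF step can be sketched in Python: one student record carrying a repeating group of courses is split into a single Student row plus one StudentCourse row per course. The record values and the function name `to_1nf` are hypothetical; only the column names come from the notes.

```python
# Hypothetical unnormalised record: one student with a repeating group of courses.
# All values are invented for illustration.
report = {
    "StudentNo": 1001, "StudentName": "A. Student", "Major": "Computing",
    "Courses": [
        {"CourseNo": "C1", "CourseName": "Databases", "InstructorNo": 7,
         "InstructorName": "Dr X", "InstructorLocation": "B12", "Grade": "A"},
        {"CourseNo": "C2", "CourseName": "Networks", "InstructorNo": 9,
         "InstructorName": "Dr Y", "InstructorLocation": "B14", "Grade": "B"},
    ],
}

def to_1nf(record):
    """Split a record with a repeating group into Student and StudentCourse rows."""
    student = {k: record[k] for k in ("StudentNo", "StudentName", "Major")}
    # Each course becomes its own row; StudentNo is copied in so that
    # (StudentNo, CourseNo) can serve as the composite primary key.
    student_course = [{"StudentNo": record["StudentNo"], **course}
                      for course in record["Courses"]]
    return student, student_course

student, student_course = to_1nf(report)
print(student)  # {'StudentNo': 1001, 'StudentName': 'A. Student', 'Major': 'Computing'}
for row in student_course:
    print(row["StudentNo"], row["CourseNo"], row["Grade"])
# 1001 C1 A
# 1001 C2 B
```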
To be in First Normal Form (1NF), a relation should satisfy the
following two conditions:

• All fields in a relation should only have atomic (scalar or single) values.

• There should be no repeating groups in the relations.


Update Anomalies observed at 1NF:

StudentCourse (StudentNo, CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation, Grade)

• To add a new course, we need a student.

• When course information needs to be updated, we may have inconsistencies. This is because we have more than one set of information (student, course and instructor) in one record.

• To delete a student, we might also delete critical information about a course and / or instructor.
Second Normal Form (2NF)
• For the second normal form, the relation must first be in 1NF.

• The relation is automatically in 2NF if its associated Primary Key is made of a single attribute.

• If the relation has a Composite Primary Key, then each non-key attribute must be fully dependent on the ENTIRE Primary Key and not on a subset / component of the Primary Key (i.e., there must be no partial dependencies).

• From our example, the Student table is already in 2NF because it has a single-column Primary Key.

• When examining the StudentCourse table, we see that not all the attributes are fully dependent on the Primary Key; specifically, the course information (CourseName, InstructorNo, InstructorName, InstructorLocation) depends on CourseNo alone.
• The only attribute that is fully dependent on the ENTIRE Primary Key is the attribute Grade.

• Identify and create a new table that will contain the course information.

• Identify the Primary Key for the new table.

• The resulting three tables are shown below.

Student (StudentNo, StudentName, Major)

CourseGrade (StudentNo, CourseNo, Grade)

CourseInstructor (CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation)
To be in Second Normal Form (2NF), a relation should satisfy
the following two conditions:

• It should already be in First Normal Form (1NF).

• It should have ALL its non-key attributes fully dependent on the ENTIRE Primary Key and not just part of the Primary Key.
This only applies to relations with Composite Primary Keys.
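Partial dependencies can be found mechanically: for every proper subset of the composite key, compute which non-key attributes it determines. In the Python sketch below, the dependencies are read off the tables the notes eventually arrive at (CourseGrade, Course, Instructor); the function and variable names are hypothetical.

```python
from itertools import combinations

def closure(attributes, fds):
    """Every attribute determined by `attributes` under the list of (lhs, rhs) FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, non_key, fds):
    """Map each proper subset of the composite key to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_key
            if determined:
                found[subset] = sorted(determined)
    return found

KEY = {"StudentNo", "CourseNo"}
NON_KEY = {"CourseName", "InstructorNo", "InstructorName", "InstructorLocation", "Grade"}
FDS = [
    ({"StudentNo", "CourseNo"}, {"Grade"}),
    ({"CourseNo"}, {"CourseName", "InstructorNo"}),
    ({"InstructorNo"}, {"InstructorName", "InstructorLocation"}),
]

print(partial_dependencies(KEY, NON_KEY, FDS))
# {('CourseNo',): ['CourseName', 'InstructorLocation', 'InstructorName', 'InstructorNo']}
```

Grade does not appear in the result, matching the notes: it is the only attribute fully dependent on the whole key.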
Update Anomalies observed at 2NF:

CourseInstructor (CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation)

• When adding a new instructor, we need a course.

• Updating course information could lead to inconsistencies in instructor information. This is because we have two sets of information (course and instructor) in one record.

• Deleting a course may also delete instructor information.


Third Normal Form (3NF)
To be in third normal form, the relation must be in second
normal form.

Also, all transitive dependencies must be removed; a non-key attribute should not be functionally dependent on another non-key attribute.

To reach Third Normal Form (3NF), we need to:

• Eliminate all dependent attributes in transitive relationship(s) from each of the relations / tables that have a transitive relationship.

• Create new table(s) with the removed dependency.

• Check the new table(s), as well as the modified table(s), to make sure that each table has a primary key and that no table contains inappropriate dependencies.

In our example, CourseInstructor contains the transitive dependency CourseNo → InstructorNo → InstructorName, InstructorLocation, so the instructor attributes move to a new Instructor table.
Below are the four resulting tables:

Student (StudentNo, StudentName, Major)

CourseGrade (StudentNo, CourseNo, Grade)

Course (CourseNo, CourseName, InstructorNo)

Instructor (InstructorNo, InstructorName, InstructorLocation)

At this stage, in Third Normal Form (3NF), none of the associated relations / tables should exhibit the anomalies described above.
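Since the notes frame normalisation in SQL terms, the four 3NF tables can be sketched as SQL DDL, run here through Python's built-in sqlite3 module. The column types, NOT NULL choices and sample rows are assumptions for illustration; the notes specify only column names and keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Column types are assumptions; the notes give only names and keys.
conn.executescript("""
CREATE TABLE Student (
    StudentNo   INTEGER PRIMARY KEY,
    StudentName TEXT NOT NULL,
    Major       TEXT
);
CREATE TABLE Instructor (
    InstructorNo       INTEGER PRIMARY KEY,
    InstructorName     TEXT NOT NULL,
    InstructorLocation TEXT
);
CREATE TABLE Course (
    CourseNo     TEXT PRIMARY KEY,
    CourseName   TEXT NOT NULL,
    InstructorNo INTEGER REFERENCES Instructor (InstructorNo)
);
CREATE TABLE CourseGrade (
    StudentNo INTEGER REFERENCES Student (StudentNo),
    CourseNo  TEXT REFERENCES Course (CourseNo),
    Grade     TEXT,
    PRIMARY KEY (StudentNo, CourseNo)
);
""")

# An instructor can now be recorded before any course exists (no insertion anomaly).
conn.execute("INSERT INTO Instructor VALUES (7, 'Dr X', 'B12')")
conn.execute("INSERT INTO Course VALUES ('C1', 'Databases', 7)")
conn.execute("INSERT INTO Student VALUES (1001, 'A. Student', 'Computing')")
conn.execute("INSERT INTO CourseGrade VALUES (1001, 'C1', 'A')")

# The original grade report is rebuilt with joins instead of being stored redundantly.
report = conn.execute("""
    SELECT s.StudentNo, s.StudentName, c.CourseNo, c.CourseName, i.InstructorName, g.Grade
    FROM CourseGrade AS g
    JOIN Student AS s ON s.StudentNo = g.StudentNo
    JOIN Course AS c ON c.CourseNo = g.CourseNo
    JOIN Instructor AS i ON i.InstructorNo = c.InstructorNo
""").fetchall()
print(report)  # [(1001, 'A. Student', 'C1', 'Databases', 'Dr X', 'A')]
```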
To be in Third Normal Form (3NF), a relation should satisfy the
following two conditions:

• It should already be in Second Normal Form (2NF).

• There should be no transitive dependencies in any of the associated relations / tables.

In other words:
we should not have any non-key column / attribute
depending on another non-key column / attribute.

OR

We should make sure that the non-key columns / attributes depend upon the primary key and not on any other non-key column / attribute.
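For a relation already in 2NF, a transitive dependency shows up as a dependency X → A where X is not a super key and A is a non-key attribute. The Python sketch below applies that test to CourseInstructor and to the two tables it is split into. It simplifies by assuming the given key is the only candidate key; the function names are hypothetical.

```python
def closure(attributes, fds):
    """Every attribute determined by `attributes` under the list of (lhs, rhs) FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def transitive_violations(attributes, key, fds):
    """(determinant, attribute) pairs where a non-key attribute depends on a
    non-super-key. Simplification: `key` is assumed to be the only candidate key."""
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue  # lhs is a super key, so lhs -> rhs is allowed
        for attr in sorted(rhs - lhs - key):
            violations.append((sorted(lhs), attr))
    return violations

COURSE_INSTRUCTOR = {"CourseNo", "CourseName", "InstructorNo", "InstructorName", "InstructorLocation"}
FDS = [
    ({"CourseNo"}, {"CourseName", "InstructorNo"}),
    ({"InstructorNo"}, {"InstructorName", "InstructorLocation"}),
]
print(transitive_violations(COURSE_INSTRUCTOR, {"CourseNo"}, FDS))
# [(['InstructorNo'], 'InstructorLocation'), (['InstructorNo'], 'InstructorName')]

# After the split into Course and Instructor, neither table has a violation.
print(transitive_violations({"CourseNo", "CourseName", "InstructorNo"}, {"CourseNo"}, FDS[:1]))  # []
print(transitive_violations({"InstructorNo", "InstructorName", "InstructorLocation"},
                            {"InstructorNo"}, FDS[1:]))  # []
```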
Exercises

1. To keep track of students and courses, a new college uses the table structure provided below:

You are required to normalize the data into relevant third normal form relations.
2. An agency called Instant Cover supplies part-time/temporary
staff to hotels in Scotland. The table below lists the time
spent by agency staff working at various hotels. The national
insurance number (NIN) is unique for every member of staff.
Use the table to answer questions below:

i) This table is susceptible to update anomalies. Provide examples of insertion, deletion and update anomalies.

ii) Normalize this table to third normal form. State any assumptions.
Boyce-Codd Normal Form (BCNF)
Determinant Vs Candidate Key

BCNF is rarely used. When a table has more than one candidate key,
anomalies may result even though the relation is in 3NF.

Boyce-Codd normal form is a special case of 3NF. A relation is in BCNF if, and only if, every determinant is a candidate key.

A "determinant" refers to a set of attributes (or columns) in a relation (or table) that uniquely determines the values of other attributes in that relation.

When the value of one attribute allows us to identify the value of another attribute in the same relation, this first attribute is called a determinant.

The value of a determinant might not be a primary key value. In addition, it is not necessarily unique but is responsible for functionally determining another column's value.
Employee Table

Identifying Determinants
emp_id → emp_name, department, manager, salary ✅
Why? emp_id uniquely determines all the attributes of an
employee in the table.
department → manager ✅
Why? If each department has one manager, then department
determines manager.
Note: Not all determinants are candidate keys. For example,
department determines manager, but it does not uniquely identify
the other columns (i.e., not the entire row).
Employee Table

Identifying Candidate Keys

emp_id ✅
Why? It uniquely identifies the entire row (every other column) and is
minimal, i.e., it has no proper subset that can do the same.
department ❌
Why Not? It can only uniquely identify the manager but not the other
columns.
Note: ALL candidate keys are determinants, but NOT ALL
determinants are candidate keys.
Each candidate key must satisfy two properties:

uniqueness (no two distinct tuples have the same values for the
candidate key)
and
minimality (removing any attribute from the candidate key would
cause it to lose its uniqueness property).

Determinants help in understanding the functional dependencies within a table, while candidate keys help in uniquely identifying each tuple in the table.
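The distinction can be checked for the Employee example with a brute-force Python sketch: a candidate key is a minimal set of attributes whose closure is the whole relation. The function names are hypothetical, and trying every subset is only practical for small tables like this one.

```python
from itertools import combinations

def closure(attributes, fds):
    """Every attribute determined by `attributes` under the list of (lhs, rhs) FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

EMPLOYEE = {"emp_id", "emp_name", "department", "manager", "salary"}
FDS = [
    ({"emp_id"}, {"emp_name", "department", "manager", "salary"}),
    ({"department"}, {"manager"}),
]

keys = candidate_keys(EMPLOYEE, FDS)
print(keys)  # [{'emp_id'}]
for determinant, _ in FDS:
    print(sorted(determinant), "is a candidate key:", determinant in keys)
# ['emp_id'] is a candidate key: True
# ['department'] is a candidate key: False
```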
Primary Key

A determinant is a set of attributes whose values uniquely determine the values of other attributes in the relation.

Functional dependencies, which are defined based on determinants, represent relationships between attributes in the relation.

These relationships exist independently of whether or not a primary key is defined for the relation.

Consider the following scenario:

In this example, StudentID uniquely determines Name, and
therefore {StudentID} is a determinant.

However, StudentID is not explicitly defined as a primary key in this table.

While it is common practice to define primary keys to uniquely identify tuples in a relation, a primary key is not strictly necessary for determinants to exist.

Functional dependencies and determinants can still be identified and utilized for data integrity and normalization purposes even in the absence of a primary key.

However, having a primary key can simplify data management and enforce uniqueness constraints in the table.
Non-Trivial Functional Dependency
A non-trivial functional dependency is not necessarily related to
whether the determinant is a candidate key or not.
A functional dependency occurs when a value of X uniquely
determines Y (i.e., X → Y).
In English, the word “trivial” means “of very little importance or
value; or insignificant”.
A trivial functional dependency occurs when Y is a subset of X,
meaning the dependency does not add new constraints (e.g.,
EmployeeID, Name → EmployeeID).
A non-trivial functional dependency occurs when Y is not a subset
of X (e.g., EmployeeID → Salary).
Let us go step by step, using tables to explain trivial functional
dependency and non-trivial functional dependency.

1. Trivial Functional Dependency

A trivial functional dependency exists when the dependent attribute (Y) is a subset of the determinant (X).

This means that the dependency does not add any new
constraint to the relationship.

Example Table: Employee Information


Trivial Functional Dependency in This Table

A functional dependency X → Y is trivial if Y is a subset of X.

For example:
• {EmployeeID, Name} → EmployeeID
• {Department, EmployeeID} → EmployeeID

Why is this trivial?

• The right-hand side (EmployeeID) is already part of the left-hand side.
• Knowing {EmployeeID, Name} already includes EmployeeID, so saying {EmployeeID, Name} → EmployeeID does not add any new information.

General Rule: X → Y is trivial if Y ⊆ X


2. Non-Trivial Functional Dependency

A non-trivial functional dependency exists when Y is not a subset of X, meaning Y contains attributes that are not already in X.

Example Table: Employee Salary

Non-Trivial Functional Dependency in This Table

• EmployeeID → Name
• EmployeeID → Salary
Why is this non-trivial?

• Name and Salary are not subsets of EmployeeID.

• The left-hand side (EmployeeID) uniquely determines Name and Salary, but Name and Salary are not already part of EmployeeID.

General Rule: X → Y is non-trivial if Y ⊈ X (i.e., Y is not a subset of X)
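Both general rules reduce to a single subset test, sketched here in Python with the examples from the two tables above (the function name `is_trivial` is hypothetical):

```python
def is_trivial(lhs, rhs):
    """X -> Y is trivial exactly when Y is a subset of X."""
    return set(rhs) <= set(lhs)

examples = [
    ({"EmployeeID", "Name"}, {"EmployeeID"}),
    ({"Department", "EmployeeID"}, {"EmployeeID"}),
    ({"EmployeeID"}, {"Name"}),
    ({"EmployeeID"}, {"Salary"}),
]
for lhs, rhs in examples:
    kind = "trivial" if is_trivial(lhs, rhs) else "non-trivial"
    print(sorted(lhs), "->", sorted(rhs), kind)
# ['EmployeeID', 'Name'] -> ['EmployeeID'] trivial
# ['Department', 'EmployeeID'] -> ['EmployeeID'] trivial
# ['EmployeeID'] -> ['Name'] non-trivial
# ['EmployeeID'] -> ['Salary'] non-trivial
```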

Key Differences
Understanding the Rule

Trivial dependency:
A functional dependency X → Y is trivial if Y ⊆ X. These are
always valid, even in unnormalized relations, and are not restricted
by BCNF.

Non-trivial dependency:
If Y ⊈ X, then X → Y is non-trivial and must have X as a super key to be valid in BCNF (a super key is less restrictive and includes determinants that are not in their minimal state).

Therefore, in BCNF, all non-trivial Functional Dependencies must have a super key / candidate key as the determinant.

In other words, BCNF demands that every determinant in a table must be a super key / candidate key.
Boyce-Codd Normal Form (BCNF)

In the Boyce-Codd Normal Form (BCNF), one of the criteria for a relation to be in BCNF is that every determinant must be a candidate key or, more generally, a super key.

This means that each set of attributes that determines other attributes in the relation must be a candidate key (super key).

For example, consider a relation R(A, B, C) whose candidate key is {A}; any superset of it, such as {A, B}, is then a super key.

If the determinant for a functional dependency is {A}, then {A} must be a candidate key / super key for the relation to satisfy BCNF.
Similarly, if the determinant is {B, C}, then {B, C} must be a
candidate key / super key.
Ensuring that every determinant is a candidate key / super key
helps eliminate anomalies (update, insert and delete) and
redundancy in the database.
This provides for data integrity and a reduction in the risk of
anomalies during database operations.

Example:
The semantic rules (business rules applied to the database) for
the table below are:
1. Each Student may major in several subjects.
2. For each Major, a given Student has only one Advisor.
3. Each Major has several Advisors.
4. Each Advisor advises only one Major.
5. Each Advisor advises several Students in one Major.
The functional dependencies for this table are listed below. The
determinant of the first one is a candidate key; the determinant of the second is not. Why not?

1. Student_id, Major → Advisor
2. Advisor → Major

Anomalies for this table include:

1. Delete: deleting a student may also delete the advisor information.
2. Insert: a new advisor cannot be added until he or she advises a student.
3. Update: inconsistencies (two sets of information in one record).

Note: No single attribute is a candidate key (super key).

To reduce the St_Maj_Adv relation to BCNF, you create two new tables:
1. St_Adv (Student_id, Advisor): the composite {Student_id, Advisor} is the candidate key (there is no other determinant)
2. Adv_Maj (Advisor, Major): Advisor is both the determinant and the candidate key
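The BCNF test can be sketched in Python: a relation violates BCNF if some non-trivial dependency has a determinant whose closure is not the whole relation. The function names are hypothetical, and for the decomposed tables the dependencies holding inside each table are supplied by hand rather than derived automatically.

```python
def closure(attributes, fds):
    """Every attribute determined by `attributes` under the list of (lhs, rhs) FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose determinant is not a super key of the relation."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency: allowed in BCNF
        if not closure(lhs, fds) >= attributes:
            violations.append((sorted(lhs), sorted(rhs)))
    return violations

ST_MAJ_ADV = {"Student_id", "Major", "Advisor"}
FDS = [
    ({"Student_id", "Major"}, {"Advisor"}),
    ({"Advisor"}, {"Major"}),
]
print(bcnf_violations(ST_MAJ_ADV, FDS))  # [(['Advisor'], ['Major'])]

# After decomposition, each new table is checked against the FDs that hold within it.
print(bcnf_violations({"Advisor", "Major"}, [({"Advisor"}, {"Major"})]))  # []
print(bcnf_violations({"Student_id", "Advisor"}, []))  # []
```

St_Maj_Adv still passes 3NF, because Major is part of the candidate key {Student_id, Major}; only the stricter BCNF test flags Advisor → Major.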
To be in Boyce-Codd Normal Form (BCNF), a relation should
satisfy the following conditions:

1. A relation should already be in Third Normal Form (3NF).

2. Every determinant in a relation should be a candidate key / super key:
A determinant is a set of attributes whose values uniquely
determine the values of other attributes in the relation.

3. All non-trivial functional dependencies must have a candidate key / super key as the determinant:
X → Y is non-trivial if Y ⊈ X (i.e., Y is not a subset of X).

By ensuring that all determinants are candidate keys, BCNF ensures that there are no non-trivial functional dependencies whose determinant is not a super key, thus eliminating redundancy caused by functional dependencies and ensuring data integrity.
By scrutinizing the 2nd and 3rd conditions, we notice some
similarities between them. We analyze them as follows:

Condition 2:
Every determinant in a BCNF Relation should be a candidate
key / super key.
• A determinant is the left-hand side of a functional dependency (FD), like X → Y.
• Saying that every determinant is a candidate key or super key means: for all X → Y, X must be a super key (i.e., it uniquely identifies a row).

Condition 3:
All non-trivial functional dependencies must have a candidate key / super key as the determinant. (X → Y is non-trivial if Y is not a subset of X.)
• This just limits the scope to non-trivial dependencies (where Y is not part of X).

• And again, it says X must be a candidate key / super key.

What is the difference between the two conditions?

Semantically:
No real difference. Both are saying:
For every functional dependency X → Y (especially non-trivial
ones), X must be a super key.

Syntactically:
Condition 2 is more general, applying to all determinants.

Condition 3 is more formal, focusing specifically on non-trivial dependencies.
The two conditions use different phrasings of the same
BCNF requirement.

In BCNF, for every non-trivial functional dependency X → Y, X must be a candidate key or super key.

Condition 3 may be modified to the phrase below:

All functional dependencies (trivial or non-trivial) must have a
candidate key / super key as the determinant.

New Comparison:
Original Condition 2:
"Every determinant must be a candidate key or super key."
Modified Condition 3:
"All functional dependencies (trivial or non-trivial) must have a candidate key / super key as the determinant."
Both conditions can now be summarized as saying:

For any functional dependency X → Y, X must be a super key.

This is still correct because trivial functional dependencies like X → X or X → (a subset of X) carry no risk of redundancy or update (insert, update or delete) anomalies.

As a result, they are often ignored in normal form checks.

Including trivial dependencies does not violate the logic of BCNF because they do not invalidate the normal form, as long as the determinant is a super key.

The modification of Condition 3 makes it exactly equivalent to Condition 2 in meaning. This is because Condition 3 now applies to all functional dependencies, just like Condition 2.
In Summary

• 3NF removes transitive dependencies but does not always enforce that every determinant is a candidate key / super key.

• BCNF ensures that every determinant is a candidate key / super key, making it a stricter form of 3NF.

• A table in BCNF is always in 3NF, but a table in 3NF may NOT be in BCNF.

• The BCNF normalization level ensures that the relation is free from update anomalies and redundancy caused by functional dependencies, facilitating efficient data management and maintenance.

• Functional Dependencies (FD) are used in 1NF, 2NF, 3NF and BCNF.
QUESTIONS
