Lecture Notes 4 - AD - Normalisation Process - Part 3
What is Normalisation?
Normalisation is a database design technique that reduces
data redundancy and eliminates undesirable characteristics
such as insertion, update and deletion anomalies.
Normalisation rules divide larger tables into smaller tables and
link them using relationships.
Students
Name       Age  Sex  Student Number  Major
Singh      18   M    9901            English Literature
Jones      18   F    9902            Geography
Lee        18   F    9922            Computing
O’Toole    18   M    9923            Geography
Choudhury  19   F    9811            Languages
Basic Properties of a Relation
Relation Named
Atomic Values in Cells
Attribute Named
Attribute Value Drawn From a Domain
No Duplicate Tuples (Rows)
No Significance to Order of Tuples
No Significance to Order of Attributes
Functional Dependence
A → B
B IS FUNCTIONALLY DEPENDENT ON A
A FUNCTIONALLY DETERMINES B
FUNCTIONAL DEPENDENCE
• STUDENTID → STUDENTNAME
StudentID  StudentName
23         Tasila
34         Mutinta
56         Mutinta
76         Tasila
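The dependency above can be tested against the sample rows. The sketch below is illustrative only: the function name `fd_holds` and the row layout are assumptions, not part of the lecture. Note that a check on sample data can only refute a dependency; a functional dependency is a rule about every valid state of the table, not just the rows shown.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Sample rows copied from the slide
students = [
    {"StudentID": 23, "StudentName": "Tasila"},
    {"StudentID": 34, "StudentName": "Mutinta"},
    {"StudentID": 56, "StudentName": "Mutinta"},
    {"StudentID": 76, "StudentName": "Tasila"},
]

print(fd_holds(students, ["StudentID"], ["StudentName"]))  # True
print(fd_holds(students, ["StudentName"], ["StudentID"]))  # False
```

StudentID determines StudentName, but not the other way round: the name Mutinta appears with two different StudentIDs.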
CANDIDATE AND PRIMARY KEYS
StudentID Activity Cost
21 Dancing K230
21 Swimming K500
34 Dancing K230
55 Netball K200
Activity → Cost
No other dependencies
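Given that the only dependency is that Activity determines Cost, the candidate key of this table can be found by computing attribute closures. The following is a minimal brute-force sketch; the helper names `closure` and `candidate_keys` are assumptions chosen for illustration, and the approach is only practical for small tables.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Enumerate minimal attribute sets whose closure is the whole table."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # A superset of a key already found is not minimal, so skip it
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


attributes = {"StudentID", "Activity", "Cost"}
fds = [({"Activity"}, {"Cost"})]  # the only dependency stated on the slide

keys = candidate_keys(attributes, fds)
print(keys)
```

Under these assumptions the only candidate key is the pair (StudentID, Activity), which therefore also serves as the primary key.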
DELETE, INSERT AND UPDATE ANOMALIES
StudentID  Activity
21         Dancing
21         Swimming
34         Dancing
55         Fencing

Activity   Cost
Dancing    K230
Swimming   K500
Fencing    K200
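The anomalies can be demonstrated with the two smaller tables on this slide, read here as (StudentID, Activity) and (Activity, Cost). The sketch below is an illustration under that reading; the variable names are assumptions. It rebuilds the equivalent single table by joining on Activity and compares what happens when student 55 is deleted from each design.

```python
# Decomposed tables, as read from the slide
enrolments = [(21, "Dancing"), (21, "Swimming"), (34, "Dancing"), (55, "Fencing")]
costs = {"Dancing": "K230", "Swimming": "K500", "Fencing": "K200"}

# The equivalent single table, rebuilt by joining on Activity
single = [(sid, act, costs[act]) for sid, act in enrolments]

# Deletion anomaly: student 55 is the only student who takes Fencing,
# so removing that row from the single table also removes the Fencing cost.
single_after = [row for row in single if row[0] != 55]
known_costs_single = {act: cost for _, act, cost in single_after}
print("Fencing" in known_costs_single)  # False

# In the decomposed design the same deletion touches only the enrolments.
enrolments_after = [row for row in enrolments if row[0] != 55]
print("Fencing" in costs)  # True

# Update anomaly: the single table stores the Dancing cost once per enrolment,
# so a price change must be applied to every one of these rows.
dancing_rows = [row for row in single if row[1] == "Dancing"]
print(len(dancing_rows))  # 2
```

The insertion anomaly is the mirror image: in the single table a new activity and its cost cannot be recorded until at least one student enrols in it, whereas the decomposed design simply adds a row to the cost table.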
First Normal Form (1NF)
In the first normal form, only single (atomic) values are permitted at
the intersection of each row and column (a field).
Identify and create a new table that will contain the course
information.
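As a rough illustration of the step above, the sketch below moves a multi-valued course cell into a new table with one course per row. The sample rows, the column names and the function `to_first_normal_form` are all hypothetical; the lecture's own course table is not reproduced here.

```python
# Hypothetical unnormalised rows: the Courses cell holds several values
unnormalised = [
    {"StudentID": 21, "Courses": "Databases, Networks"},
    {"StudentID": 34, "Courses": "Databases"},
]


def to_first_normal_form(rows):
    """Return one (StudentID, Course) row per course so every cell is atomic."""
    flat = []
    for row in rows:
        for course in row["Courses"].split(","):
            flat.append({"StudentID": row["StudentID"], "Course": course.strip()})
    return flat


student_courses = to_first_normal_form(unnormalised)
for row in student_courses:
    print(row)
```

Each row of the new table now holds exactly one value per field, and the pair (StudentID, Course) identifies a row.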
In other words:
we should not have any non-key column / attribute
depending on another non-key column / attribute.
OR
BCNF is rarely used. When a table has more than one candidate key,
anomalies may result even though the relation is in 3NF.
Identifying Determinants
emp_id → emp_name, department, manager, salary ✅
Why? emp_id uniquely determines all the attributes of an
employee in the table.
department → manager ✅
Why? If each department has one manager, then department
determines manager.
Note: Not all determinants are candidate keys. For example,
department determines manager, but it does not determine the
other columns (i.e. not the entire row).
Employee Table
emp_id ✅
Why? It uniquely determines every column in the row and is
minimal, i.e. no proper subset of it determines every column.
department ❌
Why Not? It uniquely determines only the manager, not the other
columns.
Note: ALL candidate keys are determinants, but NOT ALL
determinants are candidate keys.
Each candidate key must satisfy two properties:
uniqueness (no two distinct tuples have the same values for the
candidate key)
and
minimality (removing any attribute from the candidate key would
cause it to lose its uniqueness property).
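The two properties can be checked mechanically from the dependencies of the Employee table given earlier. This is a minimal sketch under the assumption that those two dependencies are the only ones; the helper names are illustrative. Uniqueness is tested by asking whether the attribute closure covers the whole table, and minimality by removing one attribute at a time.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, attributes, fds):
    """Uniqueness: attrs determines every attribute of the table."""
    return closure(attrs, fds) == attributes


def is_candidate_key(attrs, attributes, fds):
    """Uniqueness plus minimality: dropping any attribute breaks uniqueness."""
    if not is_superkey(attrs, attributes, fds):
        return False
    return not any(is_superkey(attrs - {a}, attributes, fds) for a in attrs)


attributes = {"emp_id", "emp_name", "department", "manager", "salary"}
fds = [
    ({"emp_id"}, {"emp_name", "department", "manager", "salary"}),
    ({"department"}, {"manager"}),
]

print(is_candidate_key({"emp_id"}, attributes, fds))                # True
print(is_candidate_key({"department"}, attributes, fds))            # False
print(is_superkey({"emp_id", "department"}, attributes, fds))       # True
print(is_candidate_key({"emp_id", "department"}, attributes, fds))  # False
```

The last two lines show the difference between a super key and a candidate key: (emp_id, department) is unique but not minimal, because emp_id alone is already unique.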
This means that the dependency does not add any new
constraint to the relationship.
Key Differences
Understanding the Rule
Trivial dependency:
A functional dependency X → Y is trivial if Y ⊆ X. These are
always valid, even in unnormalized relations, and not restricted
by BCNF.
Example:
The semantic rules (business rules applied to the database) for
the table below are:
1. Each Student may major in several subjects.
2. For each Major, a given Student has only one Advisor.
3. Each Major has several Advisors.
4. Each Advisor advises only one Major.
5. Each Advisor advises several Students in one Major.
The functional dependencies for this table are listed below. The
determinant of the first is a candidate key; the determinant of the
second is not. Why not?
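One way to explore the question is to write the rules as dependencies and test each determinant. The sketch below assumes that rule 2 reads as (Student, Major) → Advisor and rule 4 as Advisor → Major; that reading is an interpretation of the semantic rules, and the helper names are illustrative. It checks only the dependencies listed and skips trivial ones.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a super key."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:  # trivial dependency: always holds, never a violation
            continue
        if closure(lhs, fds) != attributes:
            violations.append((lhs, rhs))
    return violations


attributes = {"Student", "Major", "Advisor"}
fds = [
    ({"Student", "Major"}, {"Advisor"}),  # assumed reading of rule 2
    ({"Advisor"}, {"Major"}),             # assumed reading of rule 4
]

print(closure({"Student", "Major"}, fds) == attributes)  # True
print(closure({"Advisor"}, fds) == attributes)           # False
violations = bcnf_violations(attributes, fds)
print(violations)
```

Under these assumptions, Advisor determines Major but not Student, so Advisor alone cannot identify a row. It is a determinant that is not a candidate key, which is the situation BCNF rules out.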
Condition 2:
Every determinant in a BCNF Relation should be a candidate
key / super key.
A determinant is the left-hand side of a functional
dependency (FD), like X → Y.
Saying that every determinant is a candidate key or super
key means: For all X → Y, X must be a super key (i.e.,
uniquely identifies a row).
Condition 3:
For every functional dependency X → Y (where Y is
not a subset of X), X must be a super key.
This just limits the scope to non-trivial dependencies (where
Y is not part of X).
Semantically:
No real difference. Both are saying:
For every functional dependency X → Y (especially non-trivial
ones), X must be a super key.
Syntactically:
Condition 2 is more general, applying to all determinants.
New Comparison:
Original Condition 2:
"Every determinant must be a candidate key or super key."
Modified Condition 3:
"All functional dependencies (trivial or non-trivial) must have a
candidate key / super key as the determinant."
Both conditions can now be summarized as saying: