4-Database Design Theory-Without Inclass Exercises (1)

Uploaded by

epicfacegotyou
Copyright
© © All Rights Reserved
Module 4: Database Design Theory and Normalization

In This Module
• How can we measure the quality of a database design?
• What is a functional dependency (FD) constraint?
• How do we reason with FDs?
• What is a normal form (NF)?
• How do you achieve a (higher) NF?

Learning Outcomes
• After successfully completing this module you should be able to reason with
the logical foundation of the relational data model and understand the
fundamental principles of correct relational database design.
• Provide examples of modification, insertion, and deletion anomalies.
• Given a set of functional dependencies that hold over a table,
determine associated keys and superkeys.
• Given a set F of functional dependencies and a set X of attributes of a
relation, compute the closure of X, which is X+.
• Justify why lossless-join decompositions are preferred decompositions.
• Given a relation schema R and a set of functional dependencies on it,
show that R is/isn’t in 1NF, 2NF, 3NF, BCNF.
• Given a universal relation schema R and a set of functional dependencies,
decompose it into a set of lossless 3NF or BCNF relations.

Design Guidelines
Functional Dependencies

Normalization

Relational Database Schema Design

Informal Design Guidelines
Informal measures of relational database schema quality
and design guidelines

1. Making sure that the semantics of the attributes are clear in the schema.
2. Reducing the redundant values in tuples.
3. Reducing the null values in tuples.
4. Disallowing the possibility of generating spurious tuples.

Guideline 1
Design each relation so that it is easy to explain its meaning

• Use meaningful names
• Do not combine attributes from multiple entity types and relationship
types into a single relation
- This can make the meaning of an entity type confusing
- This can also lead to redundancy

[Figure: relation schemas EMPLOYEE (Ename, Ssn, Bdate, Address, Dnumber),
DEPARTMENT (Dname, Dnumber, Dmgr_ssn), and the combined EMP_DEPT (Ename,
Ssn, Bdate, Address, Dnumber, Dname, Dmgr_ssn), with primary keys (P.K.)
and foreign keys (F.K.) marked.]

Redundant Values in Tuples
• One design goal is to minimize the storage space that base relations
occupy.
• The way in which attributes are grouped into relation schemas has
a significant impact on storage space.
• In addition, an incorrect grouping may cause update anomalies,
which may result in inconsistent data or even loss of data.

A Motivating Example
Let’s consider an example of a company where an employee’s salary
directly corresponds to the level or position they hold. For example, a
manager has a fixed salary of $70,000 and a developer has a fixed salary of
$60,000.

Employee (ID, name, level, salary)

Employee
ID Name Level Salary
1 Paris Developer 60,000
2 Anna Manager 70,000
3 Ben Manager 70,000
4 Rose Driver 50,000
5 Jack Developer 60,000
6 Charlie Administration 50,000

Update Anomalies
Employee
ID Name Level Salary
1 Paris Developer 60,000
2 Anna Manager 70,000
3 Ben Manager 70,000
4 Rose Driver 50,000
5 Jack Developer 60,000
6 Charlie Administration 50,000

Modification Anomalies
• Updating the salary of one developer makes the “Developer” salary
inconsistent.

Deletion Anomalies
• By deleting “Charlie” we no longer store the salary of “Administration” staff.

Insertion Anomalies
• We cannot store the salary of a “Cook” if no employee has that position.
• Inserting a new row with a different salary for a developer makes the
“Developer” salary inconsistent.

Guideline 2
Design the base relation schema so that no insertion, deletion, or
modification anomalies occur in the relations

• If any do occur, ensure that all applications that access the database
update the relations in such a way as to not compromise the integrity
of the database

Guideline 3
As far as possible, avoid placing attributes in a base relation
whose values may be null

• Null values waste storage space, introduce ambiguity, and cannot be
used for comparison
• If nulls are unavoidable, make sure that they apply in exceptional
cases only in the relation

Decomposing a Relation
• A decomposition of R replaces R by two or more relations
such that:
• Each new relation contains a subset of the attributes of R
(and no attributes not appearing in R)
• Every attribute of R appears in at least one new relation.

R
A B C
1 2 3
4 5 6
7 8 9

Decomposition of R into R1 and R2:

R1
A B
1 2
4 5
7 8

R2
B C
2 3
5 6
8 9

Decomposing a Relation Example
Employee
ID Name Level Salary
1 Paris Developer 60,000
2 Anna Manager 70,000
3 Ben Manager 70,000
4 Rose Driver 50,000
5 Jack Developer 60,000
6 Charlie Administration 50,000

Decomposition into (ID, Name, Salary) and (Salary, Level):

ID Name Salary
1 Paris 60,000
2 Anna 70,000
3 Ben 70,000
4 Rose 50,000
5 Jack 60,000
6 Charlie 50,000

Salary Level
60,000 Developer
70,000 Manager
50,000 Driver
50,000 Administration

Decomposition into (ID, Name, Level) and (Level, Salary):

ID Name Level
1 Paris Developer
2 Anna Manager
3 Ben Manager
4 Rose Driver
5 Jack Developer
6 Charlie Administration

Level Salary
Developer 60,000
Manager 70,000
Driver 50,000
Administration 50,000

How should we decompose relations to remove anomalies?

The Join Operation
Definition: R1⋈ R2 is the natural join of the two relations
• Each tuple of R1 is concatenated with every tuple in R2
having the same values on the common attributes

R1
A B
1 2
4 5
7 8

R2
B C
2 3
5 6
8 9

R1 ⋈ R2
A B C
1 2 3
4 5 6
7 8 9

Lossless Join Decomposition
A decomposition of R into R1 and R2 is a lossless join decomposition if,
for every instance r that satisfies F:

R = R1 ⋈ R2

(We will discuss F, and what it means for r to satisfy F, later.)

Informally: If we break a relation R into bits, when we put the bits back
together, we should get exactly R back again.

Example Lossy Join Decomposition
R
A B C
1 2 3
4 5 6
7 2 9

Decompose R into R1 and R2:

R1
A B
1 2
4 5
7 2

R2
B C
2 3
5 6
2 9

Join R1 and R2 to obtain R’ = R1 ⋈ R2:

R’
A B C
1 2 3
1 2 9
4 5 6
7 2 3
7 2 9

The word loss in lossless refers to loss of information, not loss of tuples
• Maybe a better term here is “addition of spurious information”

...there are two extra rows. Why? What if B determined C?
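
The lossy example can be reproduced with a few lines of Python. This is only a sketch: the helper names project and natural_join are hypothetical, and relations are modelled as sets of (attribute, value) tuples rather than as real database tables.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, dropping duplicate tuples."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(r1, r2):
    """Natural join of two relations given as sets of (attribute, value) tuples."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            common = d1.keys() & d2.keys()
            # Concatenate only tuples that agree on every common attribute.
            if all(d1[a] == d2[a] for a in common):
                merged = {**d1, **d2}
                joined.add(tuple(sorted(merged.items())))
    return joined


# The instance of R from the lossy example.
R = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 5, "C": 6},
    {"A": 7, "B": 2, "C": 9},
]

r1 = project(R, ["A", "B"])
r2 = project(R, ["B", "C"])
rejoined = natural_join(r1, r2)

original = {tuple(sorted(row.items())) for row in R}
spurious = rejoined - original
print(len(rejoined))     # 5 rows after the join
print(sorted(spurious))  # the two spurious tuples (1, 2, 9) and (7, 2, 3)
```

The two spurious tuples appear because B = 2 is paired with two different C values, so the join on B cannot tell which C belongs to which A.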


Guideline 4
Design the relation schemas so that they can be (relationally) joined
with equality conditions on attributes that are either primary keys or
foreign keys in a way that guarantees that no spurious tuples are
generated

Functional Dependency
How can we know for sure that all employees appointed at the
same level have the same salary?
Databases allow you to say that one attribute determines another
through a functional dependency
Assume level determines salary but not ID. We say that there is a
functional dependency from level to salary (i.e., if we know an
employee's level, we can find their salary)

Employee (ID, name, level, salary)

Functional Dependency, Formally
A functional dependency (FD) X → Y holds on relation R if, for every
legal instance r of R and for all tuples t1, t2 in r:
if t1[X] = t2[X] then t1[Y] = t2[Y]

• That is, given two tuples in r, if their X values agree, then their Y
values must also agree
• Example: level → salary (i.e., if two employees have the same level,
then they must have the same salary)

An FD X → Y is a constraint between two sets of attributes X and Y in a
relational schema R
• It specifies a restriction on the possible tuples that can form a relation
instance of R

Identifying Functional Dependencies
An FD is a statement about all allowable instances
• It must be identified from application semantics
• Given some instance of R, we can check whether it violates some FD f,
but we cannot tell whether f holds over R!
ID Name Level Salary
1 Paris Developer 60,000
2 Anna Manager 70,000
3 Ben Manager 70,000
4 Rose Driver 50,000
5 Jack Developer 60,000
6 Charlie Administration 50,000

Based on this instance alone, we cannot conclude that level → salary holds

How can we find FDs, then? By using our knowledge of the system or the
universe of discourse (UoD).

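Checking whether a given instance violates an FD is mechanical. The sketch below assumes rows are stored as Python dicts; the function name violates is hypothetical. Note that a result of False only means "not violated by this instance", not "holds over R".

```python
def violates(rows, lhs, rhs):
    """Return True if two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen, otherwise
        # returns the value recorded earlier for the same key.
        if seen.setdefault(key, val) != val:
            return True
    return False


employee = [
    {"ID": 1, "name": "Paris", "level": "Developer", "salary": 60000},
    {"ID": 2, "name": "Anna", "level": "Manager", "salary": 70000},
    {"ID": 3, "name": "Ben", "level": "Manager", "salary": 70000},
    {"ID": 4, "name": "Rose", "level": "Driver", "salary": 50000},
    {"ID": 5, "name": "Jack", "level": "Developer", "salary": 60000},
    {"ID": 6, "name": "Charlie", "level": "Administration", "salary": 50000},
]

print(violates(employee, ["level"], ["salary"]))  # False: not violated here
print(violates(employee, ["salary"], ["level"]))  # True: 50,000 maps to two levels
print(violates(employee, ["name"], ["salary"]))   # False, yet nobody would claim name → salary
```

The last call shows why instances alone are not enough: name → salary is not violated by these six rows, but application knowledge tells us it is not a real constraint.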
Question: Functional Dependencies
Consider the relation R with the following instance:

A B C D
1 2 3 4
2 3 4 6
6 7 8 9
1 3 4 5

Which FDs cannot be true given the instance above?

A. B → C
B. B → D
C. D → B
D. All of the above can be true
E. None of the above can be true

Fixing Anomalies
level → salary

Employee
ID Name Level Salary
1 Paris Developer 60,000
2 Anna Manager 70,000
3 Ben Manager 70,000
4 Rose Driver 50,000
5 Jack Developer 60,000
6 Charlie Administration 50,000

Decomposed into:

ID Name Level
1 Paris Developer
2 Anna Manager
3 Ben Manager
4 Rose Driver
5 Jack Developer
6 Charlie Administration

Level Salary
Developer 60,000
Manager 70,000
Driver 50,000
Administration 50,000

Anomalies Fixed?
Level → Salary

ID Name Level
1 Paris Developer
2 Anna Manager
3 Ben Manager
4 Rose Driver
5 Jack Developer
6 Charlie Administration

Level Salary
Developer 60,000
Manager 70,000
Driver 50,000
Administration 50,000
Cook 50,000

Modification Anomalies Fixed
• Updating the salary of one developer no longer makes the “Developer” salary
inconsistent.

Deletion Anomalies Fixed
• Deleting “Charlie” no longer removes the salary of “Administration” staff.

Insertion Anomalies Fixed
• We can now store the salary of a “Cook” without any employees holding that position.
• Inserting a new row with a different salary for a developer is no longer possible

General Anomaly Fixing
We were able to fix the anomalies by splitting the Employee table into two
tables
In the rest of this module, we will discuss how anomalies can be
addressed formally

Question: Anomalies
Consider the following relation R with the given functional dependencies.
R[A, B, C, D] with the FD D → AC

A B C D
1 4 2 5
2 3 4 3
1 1 2 5

Which of the following is not an example of an update anomaly?


A. Deleting <2, 3, 4, 3> from R
B. Inserting values <3, 5, 3, 3> into R
C. Modifying <1, 1, 2, 5> to <1, 2, 2, 5> in R
D. Inserting values <1, null, 2, 4> into R
E. Modifying <1, 1, 2, 5> to <1, 2, 3, 5> in R

Keys
A key is a minimal set of attributes that uniquely identifies each tuple in
a relation
• i.e., a key is a minimal set of attributes that functionally determines all
the attributes in the relation
A superkey is any set of attributes that uniquely identifies each tuple in
a relation (it need not be minimal)

Employee (ID, name, level, salary)

• Example key: {ID}
• Example superkey: {ID, level}

Question: Possible Keys
Assume that the following FDs hold for a relation R(A,B,C,D):
B → C
C → B
D → ABC

Which of the following is a key for the above relation?

A. B
B. C
C. BD
D. All of the above
E. None of the above

Question: Possible Superkeys
Assume the following FDs hold for relation R(A,B,C,D):
B → C
C → B
D → ABC

Which of the following is a superkey for the above relation?

A. D
B. BD
C. BCD
D. All are superkeys
E. None are superkeys

Explicit and Implicit FDs
Given a set of (explicit) functional dependencies, we can determine
implicit ones
• Explicit FDs: ID → level, level → salary
• Implicit FD: ID → salary

Implicit FDs are also called inferred FDs

The notation F ⊨ X → Y denotes that the FD X → Y can be inferred from
the set of functional dependencies F
• X is called the left-hand side (LHS)
• Y is called the right-hand side (RHS)
• Y can be derived from X under F
• X implies Y under F

Closure of F
Given a set F of FDs, many implicit FDs can be derived
Trivial FDs
• Everything implies itself, such as A → A, ABC → AB
• These are called trivial FDs (i.e., an FD is trivial if the LHS contains
the RHS)
• The inference of trivial FDs does not depend on any F

Non-trivial FDs, such as A → B, A → AB
• The inference of non-trivial FDs depends on a given F

Closure of F (denoted as F+): the set of all FDs that can be implied by F
• F+ includes both trivial and non-trivial FDs

Inference Rules
It is practically impossible to specify all possible FDs that may hold

To infer additional FDs from a set of valid FDs we need a system of
inference rules
• There are 6 inference rules
• The first three rules are referred to as Armstrong’s Axioms

Armstrong’s Axioms
Armstrong’s Axioms (X, Y, Z are sets of attributes):
• Reflexivity: If Y ⊆ X, then X → Y
- e.g., {ID, level} → level
• Augmentation: If X → Y, then XZ → YZ for any Z
- e.g., if ID → level then {ID, salary} → {level, salary}
• Transitivity: If X → Y and Y → Z, then X → Z
- e.g., if ID → level and level → salary then ID → salary

These three rules are sound and complete
• Sound: Given a set of FDs F holding on a relation schema R, any FD
that we can infer from F by using these three rules holds in every legal
relation instance
• Complete: Repeatedly applying these three rules generates all
possible FDs that can be inferred from F

Additional Rules
The following rules are frequently used for convenience, but can be
derived using Armstrong’s Axioms

• Decomposition: if X → YZ then X → Y and X → Z
- e.g., if ID → {level, salary} then ID → level and ID → salary
• Union: if X → Y and X → Z then X → YZ
- e.g., if ID → level and ID → salary then ID → {level, salary}
• Pseudotransitivity: if X → Y and WY → Z then WX → Z
- e.g., if level → salary and {awards, salary} → raise then
{awards, level} → raise

Proof of Decomposition Rule
To prove IR4: X → YZ ⊨ X → Y

IR1 Reflexivity: Y ⊆ X ⊨ X → Y
IR2 Augmentation: X → Y ⊨ XZ → YZ
IR3 Transitivity: {X → Y, Y → Z} ⊨ X → Z

Proof:
1: X → YZ: Given
2: YZ → Y: Using IR1 (Y is a subset of YZ)
3: X → Y: Using IR3 on results of 1 & 2

Proof of Union Rule
To prove IR5: {X → Y, X → Z} ⊨ X → YZ

IR1 Reflexivity: Y ⊆ X ⊨ X → Y
IR2 Augmentation: X → Y ⊨ XZ → YZ
IR3 Transitivity: {X → Y, Y → Z} ⊨ X → Z

Proof:
1: X → Y: Given
2: X → Z: Given
3: XX → XY: Using IR2 on 1 (adding X)
4: X → XY: Since XX = X (a set identity; none of IR1-IR3 is needed)
5: XY → YZ: Using IR2 on 2 (adding Y)
6: X → YZ: Using IR3 on 4 & 5

Proof of Pseudotransitive Rule
To prove IR6: {X → Y, WY → Z} ⊨ WX → Z

IR1 Reflexivity: Y ⊆ X ⊨ X → Y
IR2 Augmentation: X → Y ⊨ XZ → YZ
IR3 Transitivity: {X → Y, Y → Z} ⊨ X → Z

Proof:
1: X → Y: Given
2: WY → Z: Given
3: WX → WY: Using IR2 on 1 (adding W)
4: WX → Z: Using IR3 on 3 & 2

The Six Inference Rules

IR1 Reflexivity: Y ⊆ X ⊨ X → Y
IR2 Augmentation: X → Y ⊨ XZ → YZ
IR3 Transitivity: {X → Y, Y → Z} ⊨ X → Z
IR4 Decomposition: X → YZ ⊨ X → Y
IR5 Union: {X → Y, X → Z} ⊨ X → YZ
IR6 Pseudotransitivity: {X → Y, WY → Z} ⊨ WX → Z

F+ and X+
F+ is the set of all FDs that can be derived from F
• The six inference rules can be used to compute F+
• F+ is usually too large and too time-consuming to compute in full
• Computing it in full is also not necessary

X+ is the set of attributes determined by X under F, which is called the
closure of X
• e.g., ID+ = {ID, name, level, salary}
• Computing X+ is easy

Computing X+ (Informally)
1. X+ initially contains all attributes in X

2. For each FD in the set F:
If the LHS of the FD is a subset of X+ then add the RHS to X+

3. If step 2 resulted in changes in X+ then repeat step 2, otherwise finish

Example
Employee (ID, name, level, salary), F = {ID → level, level → salary, ID → name}
1. ID+ = {ID}
2. ID+ = {ID, level} using ID → level
3. ID+ = {ID, level, salary} using level → salary
4. ID+ = {ID, name, level, salary} using ID → name

Computing X+

X+ := X;
repeat
    old X+ := X+;
    for each FD Y → Z in F do
        if Y ⊆ X+ then X+ := X+ ∪ Z;
until (old X+ = X+);
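
The same loop can be written in Python. This is a sketch that assumes each FD is represented as a pair of Python sets (lhs, rhs); the function name closure is our own choice.

```python
def closure(attrs, fds):
    """Compute X+ for the attribute set attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the LHS is already inside X+, the RHS can be added to X+.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# FDs of the Employee example: ID → level, level → salary, ID → name
F = [({"ID"}, {"level"}), ({"level"}, {"salary"}), ({"ID"}, {"name"})]

print(sorted(closure({"ID"}, F)))     # ['ID', 'level', 'name', 'salary']
print(sorted(closure({"level"}, F)))  # ['level', 'salary']
```

The loop stops as soon as one full pass over F adds nothing new, which mirrors the "until (old X+ = X+)" condition.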

Example X+ Computation
R (pNumber, pName, pLocation, dNum, dName, mgrSSN, mgrStartDate)
F = {
pNumber → {pName, pLocation, dNum},
dNum → {dName, mgrSSN, mgrStartDate}
}
X = {pNumber}

1. X+ := {pNumber}
2. X+ := {pNumber, pName, pLocation, dNum}
3. X+ := {pNumber, pName, pLocation, dNum, dName, mgrSSN,
mgrStartDate}

Example: Supplier-Part DB
Suppliers supply parts to projects.
• SupplierPart (sName, city, status, pNum, pName, qty)
- supplier attributes: sName, city, status
- part attributes: pNum, pName
- supplier-part attributes: qty

Functional dependencies:
• fd1: sName → city
• fd2: city → status
• fd3: pNum → pName
• fd4: {sName, pNum} → qty

Example Finding Superkey
Exercise: Show that (sName, pNum) is a superkey for SupplierPart

SupplierPart (sName, city, status, pNum, pName, qty)

F = {
sName → city,
city → status,
pNum → pName,
{sName, pNum} → qty
}

• {sName, pNum}+ = {sName, pNum}
• {sName, pNum}+ = {sName, pNum, city} using sName → city
• {sName, pNum}+ = {sName, pNum, city, status} using city → status
• {sName, pNum}+ = {sName, pNum, city, status, pName} using pNum → pName
• {sName, pNum}+ = {sName, pNum, city, status, pName, qty} using {sName, pNum} → qty

The closure contains every attribute of SupplierPart, so {sName, pNum} is a
superkey.

A Note About Finding Keys
Given: A complete set of FDs F on relation R

Approach: for any subset S of attributes in R, S is a key iff (1) S+ = R,
and (2) there is no S’ ⊂ S such that S’+ = R. If R has n attributes, there
exist 2^n subsets to consider

Tips for finding keys:
• If an attribute does not appear on the RHS of any FD in F, every key
must contain that attribute
• If a subset S is a key, there is no need to test any superset of S (each
one is a superkey but cannot be a key)
• One relation can have multiple keys of different lengths
- For example, if ABC is a key, ABCD cannot be a key, but ABDE can
also be a key

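The approach and the second tip can be combined into a brute-force search. The sketch below is an illustration only (exponential in the number of attributes); the names closure and candidate_keys are hypothetical, and FDs are again assumed to be pairs of sets.

```python
from itertools import combinations


def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all keys, testing subsets from smallest to largest."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # A superset of a key already found is a superkey, never a key.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# SupplierPart (sName, city, status, pNum, pName, qty)
supplier_part = {"sName", "city", "status", "pNum", "pName", "qty"}
F = [
    ({"sName"}, {"city"}),
    ({"city"}, {"status"}),
    ({"pNum"}, {"pName"}),
    ({"sName", "pNum"}, {"qty"}),
]

print(candidate_keys(supplier_part, F))  # the single key {sName, pNum}
```

Because subsets are visited in order of increasing size, any set that reaches the closure test has no proper subset that is a key, so condition (2) of the approach is satisfied automatically.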
Example Finding Key
Exercise: Show that (sName, pNum) is a key for SupplierPart

SupplierPart (sName, city, status, pNum, pName, qty)

F = {
sName → city,
city → status,
pNum → pName,
{sName, pNum} → qty
}

• Show that {sName, pNum}+ = {sName, pNum, city, status, pName, qty}
• Show that sName is not a key: {sName}+ = {sName, city, status}
• Show that pNum is not a key: {pNum}+ = {pNum, pName}

Question: Finding Keys
Assume that the following FDs hold for a relation R(ABCDEF):
B → CF
C → E
EF → D

Which of the following is a key for the above relation?

A. B
B. BE
C. EF
D. AB
E. None of the above

Question: Finding Key
Which of the following is a key of relation R(ABCDE) with
F = {D → C,
CE → A,
D → A,
AE → D}

A. ABDE
B. BCE
C. CDE
D. All of these are keys
E. None of these are keys

Question: Finding Keys
Assume that the following FDs hold for a relation R(ABCDEF):
AB → E
C → BE
ED → F

Which of the following is a key for the above relation?

A. AB
B. ABC
C. ACD
D. AD
E. None of the above

Approaching Normality
Role of FDs in detecting redundancy:
• If no FDs hold other than those implied by keys, there are no anomalies.
• Consider a relation R with 3 attributes, A, B, C
- No FDs hold: There is no redundancy here
- Given A → B: Several tuples could have the same A value, and if so,
they’ll all have the same B value!

Normalization: the process of removing redundancy from data

We were able to fix the anomalies by splitting (decomposing) the
Employee table discussed before, but how should we do this more
formally, and how is it related to functional dependencies?

Normalization
Normalization is a process that aims at achieving better designed
relational database schemas using
• Functional Dependencies
• Primary Keys

The normalization process takes a relational schema through a
series of tests to certify whether it satisfies certain conditions
• The schemas that satisfy certain conditions are said to be in a given
Normal Form
• Unsatisfactory schemas are decomposed by breaking up their
attributes into smaller relations that possess desirable properties
(e.g., no anomalies)

“Key” Concepts
Superkey: a set of attributes such that no two tuples have the same values
for these attributes
• That is, its closure contains all attributes in the relation
(Candidate) key: a minimal superkey
• Minimal ≠ shortest
• There can be many candidate keys for one relation
• Any superkey must contain at least one candidate key
Primary key: the candidate key selected by the designer
• There can be only one primary key for a relation
Prime attribute: an attribute that is a member of any candidate key
• Not necessarily a member of the primary key!
Non-prime attribute: an attribute that is not a member of any candidate key

Normalization – Overview
1NF
• Outcome: Removal of non-atomic values from relations.
• Test: Relation should have no multivalued attributes or nested relations.
2NF
• Outcome: Removal of partial dependencies, which removes some anomalies.
• Test: LHS of any non-trivial FD in F+ is not a proper subset of a
candidate key or RHS is a prime attribute.
3NF
• Outcome: Removal of partial and transitive dependencies, which removes
most anomalies.
• Test: LHS of any non-trivial FD in F+ is a superkey or RHS is a
prime attribute.
BCNF
• Outcome: Removal of all FD-based anomalies, at the cost of possibly not
preserving all FDs.
• Test: LHS of any non-trivial FD in F+ is a superkey.

Normalization – Overview
Employee
ID Name Level Salary
1 Paris Developer 60,000
2 Anna Manager 70,000
3 Ben Manager 70,000
4 Rose Driver 50,000
5 Jack Developer 60,000
6 Charlie Administration 50,000

Level → Salary

1NF 2NF 3NF BCNF

First Normal Form (1NF)
A relation schema is in 1NF if the domains of the attributes include
only atomic (simple, indivisible) values
• The value of an attribute is a single value from the domain of that
attribute
• 1NF disallows having a set of values, a tuple of values
(nested attributes), or a combination of both

Examples of Non-1NF Relations

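A set of values in one attribute (items):

customerName orderNum items
Tom Jones 123 Hat
Sri Gupta 876 Glass, Pencil

A nested attribute (name, made up of firstName and familyName):

name (firstName, familyName) orderNum item
Tom Jones 123 Hat
Sri Gupta 876 Glass
Sri Gupta 876 Pencil
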
Normalized to 1NF

customerName orderNum item
Tom Jones 123 Hat
Sri Gupta 876 Glass
Sri Gupta 876 Pencil

customerName orderNum item1 item2
Tom Jones 123 Hat null
Sri Gupta 876 Glass Pencil

Problems:
• First table: redundancy
• Second table: lack of flexibility

Full and Partial Functional Dependencies
A functional dependency X → Y is a full functional dependency if removal of
any attribute A from X means that the dependency does not hold anymore.
• For every A ∈ X, (X - {A}) does not functionally determine Y

A functional dependency X → Y is a partial dependency if some attribute A
can be removed from X and the dependency still holds.
• For some A ∈ X, (X - {A}) → Y

Example
Address [houseNum, street, postcode, state, value]
F = {{houseNum, street, postcode} → {state, value}, postcode → state}
• {houseNum, street, postcode} → value is a full functional dependency
- Since no proper subset of {houseNum, street, postcode} determines value
• {houseNum, street, postcode} → state is a partial dependency
- Since postcode → state

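The full/partial distinction can be tested with attribute closures. The following sketch assumes FDs are pairs of Python sets and that the dependency being classified is already known to hold; is_partial is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_partial(lhs, rhs, fds):
    """X → Y is partial if Y is still determined after dropping one attribute of X."""
    lhs, rhs = set(lhs), set(rhs)
    return any(rhs <= closure(lhs - {a}, fds) for a in lhs)


# Address [houseNum, street, postcode, state, value]
F = [
    ({"houseNum", "street", "postcode"}, {"state", "value"}),
    ({"postcode"}, {"state"}),
]
key = {"houseNum", "street", "postcode"}

print(is_partial(key, {"state"}, F))  # True: postcode alone determines state
print(is_partial(key, {"value"}, F))  # False: value needs all three attributes
```

Dropping a single attribute is enough to test: if a smaller subset of X determined Y, then so would any larger subset that contains it.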
Second Normal Form (2NF)
A relation schema R is in 2NF if every non-prime attribute A in R is
fully functionally dependent on the primary key of R.

Example
Address [houseNum, street, postcode, state, value]
F = {{houseNum, street, postcode} → {state, value}, postcode → state}
• Key = {houseNum, street, postcode}
• state is not a prime attribute
• state is partially dependent on the primary key (because of
postcode → state)
FD postcode → state violates 2NF; therefore, Address is not in 2NF

Informally, a relation in 2NF has no partial dependencies

Second Normal Form – More Formally
A relation schema R with the set F of FDs is in 2NF iff
• for every subset of attributes X ⊂ R
• and every attribute A ∈ R
• such that X → A is a non-trivial FD in F+
at least one of the following holds:
1. X is NOT a proper subset of a candidate key for R, or
2. A is a prime attribute

Example
• Address [houseNum, street, postcode, state, value]
• F = {{houseNum, street, postcode} → {state, value}, postcode → state}

Consider X → A where X = postcode and A = state
• 1) state is not a prime attribute, and
• 2) postcode → state is a non-trivial FD, and
• 3) postcode is a proper subset of {houseNum, street, postcode}, which is a
candidate key
Therefore, Address is not in 2NF

Example Normalized to 2NF
Example
• Address [houseNum, street, postcode, state, value]
• F = {{houseNum, street, postcode} → {state, value},
postcode → state}

Normalizing Address
• Address [houseNum, street, postcode, value]
• Postcodes [postcode, state]
Foreign key: Address.postcode references Postcodes.postcode

2NF eliminates anomalies due to partial dependencies. However, 2NF
does not eliminate anomalies that are due to transitive dependencies

Transitive Dependency
A functional dependency X → Y in a relation R is a transitive
dependency if there exists a set of attributes Z in R such that:
• Z is neither a candidate key nor a subset of any key of R.
• Both X → Z and Z → Y hold

Example
• Employee [ID, name, level, salary]
• level → salary

Consider the FD ID → salary
• There exists an attribute “level” which is neither a candidate key
nor a subset of any key
• ID → level and level → salary hold
• ID → salary is a transitive dependency

Problems with 2NF Relations
Employee [ID, name, level, salary]
• level → salary

ID Name Level Salary
1 Paris Developer 60,000
2 Anna Manager 70,000
3 Ben Manager 70,000
4 Rose Driver 50,000
5 Jack Developer 60,000
6 Charlie Administration 50,000

level → salary does not violate 2NF
• Employee is in 2NF

Modification Anomalies
• Updating the salary of one developer makes the “Developer” salary inconsistent.

Deletion Anomalies
• By deleting “Charlie” we no longer store the salary of “Administration” staff.

Insertion Anomalies
• We cannot store the salary of a “Cook” if no employee has that position.
• Inserting a new row with a different salary for a developer makes the “Developer”
salary inconsistent.

Third Normal Form (3NF)
A relation schema R with the set F of FDs is in 3NF iff
• for every subset of attributes X ⊂ R
• and every attribute A ∈ R
• such that X → A is a non-trivial FD in F+
at least one of the following holds:
1. X is a superkey for R, or
2. A is a prime attribute

In other words, a relation is in 3NF iff for any non-trivial FD X → A
where A is a non-prime attribute, X must be a superkey

Example
• Employee [ID, name, level, salary]
- level → salary (transitive dependency)
• level is not a superkey and salary is not a prime attribute
• level → salary violates 3NF; therefore, Employee is not in 3NF

Example Normalized to 3NF
Example
• Employee [ID, name, level, salary]
- level → salary (transitive dependency)

Normalizing Employee
• StaffAppointment [ID, name, level]
• StaffIncome [level, salary]
Foreign key: StaffAppointment.level references StaffIncome.level

Most 3NF tables are free of anomalies; however, some 3NF tables,
rarely encountered in practice, are still affected by anomalies.

Question: partial dependency vs. transitive dependency
What is the difference between partial dependency and transitive dependency?
A. Partial dependency is where an attribute only depends on a subpart of the primary key to be
identified. Normalizing to third normal form solves this. Transitive dependency is where a
non-prime attribute depends on other non-prime attributes to be identified. Normalizing to
second normal form solves this.
B. Partial dependency is where an attribute only depends on a subpart of the primary key to be
identified. Normalizing to second normal form solves this. Transitive dependency is where a
non-prime attribute depends on other non-prime attributes to be identified. Normalizing to
third normal form solves this.
C. Partial dependency is where an attribute only depends on a subpart of the primary key to be
identified. Normalizing to second normal form solves this. Transitive dependency is where a
non-prime attribute depends on other non-prime attributes to be identified. Normalizing to
third normal form solves this.
D. Partial dependency is where a non-prime attribute depends on other non-prime attributes to
be identified. Normalizing to third normal form solves this. Transitive dependency is where an
attribute only depends on a subpart of the primary key to be identified. Normalizing to second
normal form solves this.

Problems with 3NF Relations
Consider the following example:
• Teach [studentID, courseID, lecturer]
• {studentID, courseID} → lecturer
• lecturer → courseID
- Keys: {studentID, courseID}, {studentID, lecturer}
- Teach is in 3NF

Teach
studentID courseID lecturer
1234 INFS1200 Jane

Existing anomalies
• Deletion anomaly
- E.g., deleting studentID 1234 would lead to the unwanted deletion of the
lecturer of INFS1200
• Insertion anomaly
- E.g., we cannot store the lecturer of a course that doesn’t have students.

Boyce-Codd Normal Form (BCNF)
A relation schema R with the set F of FDs is in BCNF iff
• for every subset of attributes X ⊂ R
• and every attribute A ∈ R
• such that X → A is a non-trivial FD in F+
the following holds:
1. X is a superkey for R

Informally: Whenever a set of attributes of R determines another attribute, it
should determine all the attributes of R.

Example
• Teach [studentID, courseID, lecturer]
• {studentID, courseID} → lecturer
• lecturer → courseID
- Keys: {studentID, courseID}, {studentID, lecturer}
- lecturer is not a superkey
- lecturer → courseID violates BCNF; therefore, Teach is not in BCNF

BCNF Example
For the given relation schema R with the set of FDs F, determine
whether or not R is in BCNF

R (A, B, C, D), F = {AD → BC, B → A}

Find candidate keys: AD and BD

B → A is a non-trivial FD and B is not a superkey.
B → A violates BCNF; therefore, R is not in BCNF.
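The check above can be reproduced mechanically. The following is a minimal sketch, not part of the original slides: the helper names `closure`, `candidate_keys` and `bcnf_violations` are hypothetical, keys are found by brute force, and only the listed FDs are tested (the usual shortcut when checking the original relation).

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys

def bcnf_violations(schema, fds):
    """Listed FDs that are non-trivial and whose LHS is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(schema)]

R = set("ABCD")
F = [(set("AD"), set("BC")), (set("B"), set("A"))]
keys = candidate_keys(R, F)
violations = bcnf_violations(R, F)
print(sorted("".join(sorted(k)) for k in keys))  # ['AD', 'BD']
print(violations)
```

The only violation reported is B → A, which agrees with the conclusion that R is not in BCNF.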

Page 74
BCNF is Great, But…
• Guaranteed that there will be no redundancy of data
• Easy to understand (just look for superkeys)
• Easy to do

So what is the main problem with BCNF?


• It may not preserve all functional dependencies

Page 75
Dependency Preservation

Given a relation R and a set F of FDs, when R is decomposed into R1, R2, …, Rn, F is projected onto F1, F2, …, Fn
• Fi contains all FDs in F+ whose attributes lie completely in Ri
The decomposition is dependency preserving if (F1 ∪ F2 ∪ … ∪ Fn)+ = F+

Example: R(ABCD) and F = {A → B, A → D}

• For R1=(ABC) and R2=(BCD), F1 = {A → B}, F2 = ∅
- A → D is not preserved

• For R1=(ABC) and R2=(ABD), F1 = {A → B}, F2 = {A → D}
- All FDs in F are preserved
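The two cases in this example can be checked with a short script. This is a sketch under assumptions: `preserved` is a hypothetical helper that uses the common textbook test (grow the LHS by repeatedly taking closures restricted to each Ri) instead of writing out F1, …, Fn explicitly.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserved(fd, fds, parts):
    """True if `fd` follows from the FDs of `fds` projected onto `parts`."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            # What the attributes we already have determine inside this part.
            gained = closure(result & part, fds) & part
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

F = [(set("A"), set("B")), (set("A"), set("D"))]
first = [set("ABC"), set("BCD")]
second = [set("ABC"), set("ABD")]
lost_first = [fd for fd in F if not preserved(fd, F, first)]
lost_second = [fd for fd in F if not preserved(fd, F, second)]
print(lost_first)   # only A -> D is lost
print(lost_second)  # nothing is lost
```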

Page 76
Problems with BCNF Relations
Teach(studentID, courseID, lecturer)
• {studentID, courseID} → lecturer
• lecturer → courseID
• This relation is not in BCNF

Is lecturer a key? It is not, so Teach is decomposed on lecturer → courseID into:
• (studentID, lecturer)
• (lecturer, courseID), where lecturer → courseID holds

• The new relations no longer violate BCNF
• {studentID, courseID} → lecturer is no longer preserved
Page 77
So What’s the Problem?
lecturer courseID
John INFS1200
Jane INFS1200

studentID lecturer
1234 John
1234 Jane

lecturer → courseID
No problem so far. All local FDs are satisfied.
Let’s join the relations back into a single table again:

studentID courseID lecturer
1234 INFS1200 John
1234 INFS1200 Jane

Violates the FD: {studentID, courseID} → lecturer
Decomposition into BCNF may lead to dependencies not being preserved.
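The same effect can be reproduced on the two small tables. This is an illustrative sketch only: the variable names `teaches` and `attends` and the helpers `natural_join` and `satisfies` are hypothetical, and relations are modelled as lists of dictionaries.

```python
def natural_join(left, right):
    """Join two relations (lists of dicts) on their shared column names."""
    joined = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[c] == r[c] for c in shared):
                joined.append({**l, **r})
    return joined

def satisfies(rows, lhs, rhs):
    """True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

teaches = [{"lecturer": "John", "courseID": "INFS1200"},
           {"lecturer": "Jane", "courseID": "INFS1200"}]
attends = [{"studentID": 1234, "lecturer": "John"},
           {"studentID": 1234, "lecturer": "Jane"}]

joined = natural_join(attends, teaches)
local_ok = satisfies(teaches, ["lecturer"], ["courseID"])
global_ok = satisfies(joined, ["studentID", "courseID"], ["lecturer"])
print(local_ok, global_ok)  # True False
```

Each decomposed table passes its own check, yet the joined result breaks {studentID, courseID} → lecturer, which is exactly the constraint the decomposition can no longer enforce locally.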
Page 78
BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF
For a relation R with FD set F, for any non-trivial X → A in F+
• 1NF: Removal of non-atomic values from relations
- Relation should have no multivalued attributes or nested relations

• 2NF: Removal of partial dependencies
- X is not a proper subset of a candidate key for R, or A is a prime attribute

• 3NF: Removal of partial and transitive dependencies
- X is a superkey for R, or A is a prime attribute

• BCNF: Removal of all anomalies at the cost of not preserving all FDs
- X is a superkey for R

[Diagram: nested regions, with BCNF inside 3NF inside 2NF inside 1NF]
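As a rough illustration of the 3NF and BCNF tests, the sketch below applies them to the Teach relation from the earlier slides. The helper names are hypothetical, candidate keys are found by brute force, and only the listed FDs are tested, which is assumed to be sufficient when checking the original relation.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if not any(k <= candidate for k in keys) and \
                    closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys

def is_bcnf(schema, fds):
    # Every non-trivial FD must have a superkey on its left-hand side.
    return all(rhs <= lhs or closure(lhs, fds) >= schema for lhs, rhs in fds)

def is_3nf(schema, fds):
    # As BCNF, but an FD is also allowed if it only determines prime attributes.
    prime = set().union(*candidate_keys(schema, fds))
    return all(rhs <= lhs or closure(lhs, fds) >= schema or (rhs - lhs) <= prime
               for lhs, rhs in fds)

TEACH = {"studentID", "courseID", "lecturer"}
FDS = [({"studentID", "courseID"}, {"lecturer"}), ({"lecturer"}, {"courseID"})]
teach_in_3nf = is_3nf(TEACH, FDS)
teach_in_bcnf = is_bcnf(TEACH, FDS)
print(teach_in_3nf, teach_in_bcnf)  # True False
```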
Page 79
Question: Validating Normal Form
Assume that the following FDs hold for a relation R(ABCDEF):
• AB → CDE
• C → F
• E → AB

Tests for any non-trivial X → A:
• 2NF: X is not a proper subset of a candidate key for R, or A is a prime attribute
• 3NF: X is a superkey for R, or A is a prime attribute
• BCNF: X is a superkey for R

What is the highest normal form for the above relation?


A. 1NF
B. 2NF
C. 3NF
D. BCNF
Page 80
Question: Validating Normal Form
Assume that the following FDs hold for a relation R(ABCD):
• AB → CD
• CD → A
• D → B

Tests for any non-trivial X → A:
• 2NF: X is not a proper subset of a candidate key for R, or A is a prime attribute
• 3NF: X is a superkey for R, or A is a prime attribute
• BCNF: X is a superkey for R

What is the highest normal form for the above relation?


A. 1NF
B. 2NF
C. 3NF
D. BCNF
Page 81
Question: Validating Normal Form
Assume that the following FDs hold for a relation R(ABCDE):
• B → CD
• A → E

Tests for any non-trivial X → A:
• 2NF: X is not a proper subset of a candidate key for R, or A is a prime attribute
• 3NF: X is a superkey for R, or A is a prime attribute
• BCNF: X is a superkey for R

What is the highest normal form for the above relation?


A. 1NF
B. 2NF
C. 3NF
D. BCNF
Page 82
Question: Validating Normal Form
Assume that the following FDs hold for a relation R(ABCDE):
• A → BCDE
• E → A

Tests for any non-trivial X → A:
• 2NF: X is not a proper subset of a candidate key for R, or A is a prime attribute
• 3NF: X is a superkey for R, or A is a prime attribute
• BCNF: X is a superkey for R

What is the highest normal form for the above relation?


A. 1NF
B. 2NF
C. 3NF
D. BCNF
Page 83
Question: BCNF and 3NF
Consider relation R(ABCD) and the following FDs:

ACD → B ; AC → D ; D → C ; AC → B

Which of the following is true:


A. R is in neither BCNF nor 3NF
B. R is in BCNF but not 3NF
C. R is in 3NF but not in BCNF
D. R is in both BCNF and 3NF

Page 84
Normalization
The normalization process takes a relational schema through a series of tests to certify whether it satisfies certain conditions
• The schemas that satisfy certain conditions are said to be in a given Normal Form
• Unsatisfactory schemas are decomposed by breaking up their attributes into smaller relations that possess desirable properties

• Most organizations aim to design relational databases that are in BCNF or 3NF

BCNF
• Removes all anomalies
• Does not preserve all FDs

3NF
• Does not remove all anomalies
• Preserves all FDs

How can we design relational databases that are in a given Normal Form (3NF or BCNF)?
Page 85
Design Guidelines

Functional Dependencies

Normalization

Relational Database Schema Design

Page 86
Algorithms for Relational Database
Schema Design
Two algorithms for creating a relational decomposition from a universal relation:
• Lossless-join and anomaly-free decomposition into BCNF schemas
• Lossless-join and dependency-preserving synthesis into 3NF schemas

Page 87
BCNF Decomposition
Input: a universal relation R and a set F of FDs

Let D := {R};
While (a relation Q in D is not in BCNF) {
  Find one FD X → Y in Q that violates BCNF;
  Replace Q in D by Q1(Q - Y) and Q2(X ∪ Y);
};

Q1 holds X together with the other attributes of Q; Q2 holds X and Y.

Note: the answer may vary depending on the order in which violating FDs are chosen. That’s okay.
Page 88
BCNF Decomposition Example
Given relation R(ABCD) and F = {B → C, D → A}, decompose R into a set of relation schemas which are in BCNF.

Find closures and keys
• {B}+ = BC, {D}+ = DA, {BD}+ = BDCA, so BD is the only key

Considering B → C, is B a superkey in R?
• No. Decompose R to R1(ABD), R2(BC)

Considering D → A, it does not exist in R2. Is D a superkey for R1?
• No. Decompose R1 to R3(DB), R4(DA)

Decomposition tree:
R(ABCD)
- split on B → C: R1(ABD), R2(BC)
- split R1 on D → A: R3(DB), R4(DA)

Final answer: R2(BC), R3(DB), R4(DA)
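The decomposition loop can be sketched in a few lines of Python. This is one possible reading, not the course's reference implementation: `find_violation` and `bcnf_decompose` are hypothetical names, violating FDs are found by scanning subsets of each relation in a fixed order, and Y is taken to be everything X determines inside the relation. Other choices of violating FD may give a different but equally valid answer.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(rel, fds):
    """Return (X, Y) for an FD X -> Y that holds in `rel` and violates BCNF."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds)
            y = (x_plus & rel) - x      # what X determines inside rel
            if y and not rel <= x_plus:  # non-trivial and X is not a superkey
                return x, y
    return None

def bcnf_decompose(schema, fds):
    todo, done = [frozenset(schema)], []
    while todo:
        rel = todo.pop()
        violation = find_violation(rel, fds)
        if violation is None:
            done.append(rel)
        else:
            x, y = violation
            todo.append(rel - y)             # Q1: Q without Y
            todo.append(frozenset(x | y))    # Q2: X together with Y
    return done

F = [(set("B"), set("C")), (set("D"), set("A"))]
result = bcnf_decompose(set("ABCD"), F)
names = {"".join(sorted(r)) for r in result}
print(sorted(names))  # ['AD', 'BC', 'BD']
```

For this input the sketch arrives at the same three relations as the worked example.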


Correctness of the Algorithm
For an offending FD X → Y, Q is replaced by Q1(Q - Y) and Q2(X ∪ Y)
• X → Y no longer violates BCNF in Q1 or Q2
- For Q1, Y does not exist anymore
- For Q2: X is a key
• So it fixes the non-BCNF problem caused by X → Y in Q

It fixes the problems caused by all offending FDs in the end.

The algorithm terminates because every split produces strictly smaller relations, and all two-attribute relations are in BCNF. For R(a, b):
• No FD, so no redundancy
• a → b, so a is a key, so in BCNF
• b → a, so b is a key, so in BCNF
• a → b and b → a, so both a and b are keys, so in BCNF
Page 90
BCNF Decomposition
Input: a universal relation R and a set F of FDs

Let D := {R};
While (a relation Q in D is not in BCNF) {
  Find one FD X → Y in Q that violates BCNF;
  Replace Q in D by Q1(Q - Y) and Q2(X ∪ Y);
};

Q1 holds X together with the other attributes of Q; Q2 holds X and Y.

How do we know if FD X → Y holds in Q?

Page 91
Determining Which FDs Apply
For an FD X → Y, if the decomposed relation S contains all attributes of X ∪ Y, and Y ⊆ X+ (computed under F), then the FD holds for S.

For example
• Consider R(ABCDE) and
• F = {AB → C, BC → D, CD → E, DE → A, AE → B},
• project these FDs onto S(ABCD)

Does AB → D hold?
• First check if A, B and D are all in S. If not, it does not hold.
• Find {AB}+ under F: ABCDE
• So yes, AB → D does hold in S

Does CD → E hold?
• First check if C, D and E are all in S. If not, it does not hold.
• E is not in S, so no, CD → E does not hold in S
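The two-part test (attributes present in S, then RHS inside the closure of the LHS) is small enough to write down directly. A minimal sketch, with `holds_in` as a hypothetical helper name:

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def holds_in(sub_schema, lhs, rhs, fds):
    """True if lhs -> rhs mentions only attributes of sub_schema and follows from fds."""
    lhs, rhs = set(lhs), set(rhs)
    return (lhs | rhs) <= set(sub_schema) and rhs <= closure(lhs, fds)

F = [(set("AB"), set("C")), (set("BC"), set("D")), (set("CD"), set("E")),
     (set("DE"), set("A")), (set("AE"), set("B"))]

ab_d = holds_in("ABCD", "AB", "D", F)
cd_e = holds_in("ABCD", "CD", "E", F)
print(ab_d, cd_e)  # True False
```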

Page 92
Question:
Determining Which FDs Apply
Consider relation R(ABCDE) with functional dependencies
AB → C, BC → D, CD → E, DE → A, and AE → B.
Project these FD's onto the relation S(ABCD).

Which of the following hold in S?


A. A → B
B. AB → E
C. AE → B
D. BCD → A
E. None of the above

Page 93
BCNF Decomposition Again
R(ABCDE) and F = {AB → C, D → E}

Find closures and keys
• {AB}+ = ABC, {D}+ = DE, the only key is ABD

AB → C violates BCNF in R
• R1(ABDE), R2(ABC)

D → E violates BCNF in R1
• R3(ABD), R4(DE)

Decomposition tree:
R(ABCDE)
- split on AB → C: R1(ABDE), R2(ABC)
- split R1 on D → E: R3(ABD), R4(DE)

Final answer: R2(ABC), R3(ABD), R4(DE)


Page 94
Question: BCNF Decomposition
Which of the following is a correct BCNF decomposition for R(ABCDE) with FDs AD → B, C → DE and A → E?

A. R(ABCDE)
- split on AD → B: R1(ADB), R2(ACDE)
- split on C → DE: R3(CE), R4(ACD)

B. R(ABCDE)
- split on AD → B: R1(ABD), R2(ABCE)
- split on A → E: R3(AE), R4(ABC)

C. R(ABCDE)
- split on AD → B: R1(ABD), R2(ACDE)
- split on C → DE: R3(CDE), R4(AC)

D. R(ABCDE)
- split on C → DE: R1(CDE), R2(ABC)
- split on A → B: R3(AB), R4(AC)
Page 95
BCNF Decomposition
Input: a universal relation R and a set F of FDs

Let D := {R};
While (a relation Q in D is not in BCNF) {
  Find one FD X → Y in Q that violates BCNF;
  Replace Q in D by Q1(Q - Y) and Q2(X ∪ Y);
};

Q1 holds X together with the other attributes of Q; Q2 holds X and Y.

Note that implicit FDs should also be considered.

Page 96
Example: Implicit FDs matter
R(ABCDEF), F = {A → B, DE → F, B → C}

Find closures and keys
• {A}+ = ABC, {B}+ = BC, {DE}+ = DEF, the only key is ADE
• F ⊨ A → C, so we also need to consider the implied FD A → C

A → B violates BCNF in R
• R1(ACDEF), R2(AB)

DE → F violates BCNF in R1
• R3(ACDE), R4(DEF)

A → C violates BCNF in R3
• R5(ADE), R6(AC)

Decomposition tree:
R(ABCDEF)
- split on A → B: R1(ACDEF), R2(AB)
- split R1 on DE → F: R3(ACDE), R4(DEF)
- split R3 on A → C: R5(ADE), R6(AC)

Final answer: R2(AB), R4(DEF), R5(ADE), R6(AC)
Page 97
Question: BCNF Decomposition
For R(ABCD) with F = {A → B, C → D, AD → C, BC → A},
decompose it into BCNF.

Which of the following is a lossless-join decomposition of R into BCNF?

A. {AB, AC, BD}


B. {AB, AC, CD}
C. {AB, AC, BCD}
D. All of the above
E. None of the above

Page 98
Question: BCNF Decomposition
For R(ABCD) with F = {A → B, C → D, AD → C, BC → A}, decompose it into BCNF.

Find the keys
• There are 3 keys: {AD}+ = ADBC, {AC}+ = ACBD, {BC}+ = BCAD

A → B violates BCNF
• R1(ACD), R2(AB)

C → D violates BCNF for R1
• R3(AC), R4(CD)

Decomposition tree:
R(ABCD)
- split on A → B: R1(ACD), R2(AB)
- split R1 on C → D: R3(AC), R4(CD)

B: So the final answer is {AB, AC, CD}
Page 99
Exercise: Option ‘A’ Exposed
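R(ABCD), F = {A → B, C → D, AD → C, BC → A}
Is {AB, AC, BD} a lossless-join decomposition?

Original instance of R:
A B C D
1 2 5 6
1 2 3 7
8 2 9 4

Decompose into projections:
A B
1 2
8 2

A C
1 5
1 3
8 9

B D
2 6
2 7
2 4

Join the projections back:
A B C D
1 2 5 6
1 2 3 7
8 2 9 4
1 2 3 4
… … … …

The join contains tuples such as (1, 2, 3, 4) that were not in R, so the decomposition is not lossless.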
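The spurious tuples can be counted by actually projecting and re-joining the three-row instance. A small sketch, assuming the instance shown on this slide; `project` and `join_all` are hypothetical helpers, and the join is done naively as a filtered Cartesian product, which is only sensible for tiny examples.

```python
from itertools import product

HEADER = "ABCD"
rows = {(1, 2, 5, 6), (1, 2, 3, 7), (8, 2, 9, 4)}

def project(relation, cols):
    """Project a set of ABCD tuples onto the named columns (duplicates collapse)."""
    return {tuple(row[HEADER.index(c)] for c in cols) for row in relation}

def join_all(parts):
    """Natural join of projections given as (cols, tuples); returns ABCD tuples."""
    result = set()
    for combo in product(*(tuples for _, tuples in parts)):
        merged, consistent = {}, True
        for (cols, _), values in zip(parts, combo):
            for col, value in zip(cols, values):
                # A column shared by two projections must carry the same value.
                if merged.setdefault(col, value) != value:
                    consistent = False
        if consistent:
            result.add(tuple(merged[c] for c in HEADER))
    return result

parts = [(cols, project(rows, cols)) for cols in ("AB", "AC", "BD")]
joined = join_all(parts)
spurious = joined - rows
print(len(rows), len(joined), len(spurious))  # 3 9 6
```

Every original tuple survives the round trip, but six extra tuples appear, so information about which combinations were really present has been lost.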

Page 101
3NF Synthesis
Input: a universal relation R and a set F of FDs

S = ∅;
Compute the minimal cover G of F;
Combine all FDs in G with the same LHS into one;

For each X → Y in G {
  if (no relation in S contains X ∪ Y)
    Add a relation with schema X ∪ Y to S;
}
if (any candidate key is missing from the relations)
add a relation with all prime attributes (i.e. all candidate keys);

Eliminate redundant relations from the resulting set of relations

To decompose into 3NF we rely on the minimal cover


Page 102
Minimal Cover for a Set of FDs
Goal: Transform FDs to be as small as possible

G is a minimal cover for F, iff


• F+ = G+ (i.e., they are equivalent in terms of the FDs implied)
• RHS of each FD in G is a single attribute
• If we delete any FD in G or delete any attribute from any FD in G to
get G’, then G+ ≠ G’+

Informally: every FD in G is needed, and is “as small as possible” in order to get the same closure as F

Example:
• F = {A → B, B → C, A → BC}, G = {A → B, B → C}

Page 103
Finding Minimal Covers of FDs

Step 1: RHS simplification
• Every FD has only one attribute on RHS
Step 2: LHS simplification
• Remove any redundant attributes from the LHS of each FD
Step 3: FD set simplification
• Delete any redundant FDs

Page 104
Minimal Cover Example: Step 1
Step 1: RHS simplification
• Every FD has only one attribute on RHS

Example
• R (ABCDEFGH)
• F = {A → B, ABCD → E, EF → G, EF → H, ACDF → EG}

Replace the last FD with
• ACDF → E
• ACDF → G

Page 105
Minimal Cover Example: Step 2
Step 2: LHS simplification
• Remove any redundant attributes from the LHS of each FD

For each FD in F of the form X → A, where X has multiple attributes including B, if X+ = (X - B)+ under F, then B can be dropped and X → A is replaced by (X - B) → A.

Example
• R (ABCDEFGH)
• F1 = {A → B, ABCD → E, EF → G, EF → H, ACDF → E, ACDF → G}

Can we take any attributes out from the LHS of ABCD → E?
• {ABCD}+ = ABCDE
• {ACD}+ = ABCDE, so remove B from the FD
Page 106
Minimal Cover Example: Step 3
Step 3: FD set simplification
• Delete any redundant FDs

Example
• R (ABCDEFGH)
• F2 = {A → B, ACD → E, EF → G, EF → H, ACDF → E, ACDF → G}

Can we drop any FD completely?
• Let’s find {ACDF}+ without considering the last two FDs, ACDF → E and ACDF → G (highlighted on the original slide)
• {ACDF}+ = ACDFEBGH, so these two FDs can be removed

Final answer: A → B, ACD → E, EF → G, EF → H

Final answer after using union: A → B, ACD → E, EF → GH
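The three steps can be strung together as a short routine. This is a sketch under assumptions rather than the slides' exact procedure: `minimal_cover` is a hypothetical name, and Step 2 uses the usual membership test (the RHS attribute is already in the closure of the reduced LHS), which is satisfied whenever the closure-equality condition on the Step 2 slide is. The order in which attributes and FDs are tried can change which minimal cover is returned.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: one attribute on every right-hand side.
    g = [(set(lhs), {a}) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: drop left-hand-side attributes that are not needed.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            smaller = lhs - {b}
            if smaller and rhs <= closure(smaller, g):
                lhs = smaller
                g[i] = (lhs, rhs)
    # Step 3: drop FDs that follow from the remaining ones.
    i = 0
    while i < len(g):
        lhs, rhs = g[i]
        rest = g[:i] + g[i + 1:]
        if rhs <= closure(lhs, rest):
            g = rest
        else:
            i += 1
    return g

F = [(set("A"), set("B")), (set("ABCD"), set("E")), (set("EF"), set("G")),
     (set("EF"), set("H")), (set("ACDF"), set("EG"))]
cover = {("".join(sorted(l)), "".join(sorted(r))) for l, r in minimal_cover(F)}
print(sorted(cover))  # [('A', 'B'), ('ACD', 'E'), ('EF', 'G'), ('EF', 'H')]
```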
Page 107
Minimal Cover Example Visualised
Original: A → B, ABCD → E, EF → G, EF → H, ACDF → EG

Step 1: RHS simplification
• ACDF → EG becomes ACDF → E and ACDF → G

Step 2: LHS simplification
• ABCD → E becomes ACD → E

Step 3: FD set simplification
• ACDF → E and ACDF → G are removed

Final Answer: A → B, ACD → E, EF → G, EF → H

Final with Union: A → B, ACD → E, EF → GH

Page 108
Minimal Cover: Another Example
Consider the relation R(CSJDPQV) with FDs
F = {C → SJDPQV, JP → C, SD → P, J → S}, find a minimal cover of F

Step 1:
F = {C → S, C → J, C → D, C → P, C → Q, C → V, JP → C, SD → P, J → S}

Step 2: consider JP → C, SD → P
• Let’s consider shortening JP → C
- Not possible: JP+ = CSJDPQV, J+ = JS, P+ = P
• Let’s consider shortening SD → P
- Not possible: SD+ = SDP, S+ = S and D+ = D

Page 109
Minimal Cover: Another Example
Consider the relation R(CSJDPQV) with FDs
F = {C → SJDPQV, JP → C, SD → P, J → S}, find a minimal cover of F

Step 3:
F1 = {C → S, C → J, C → D, C → P, C → Q, C → V, JP → C, SD → P, J → S},
can we remove any FDs?
• Let’s consider C → S and find C+ without considering this rule
- C+ = CSJDPQV (S is still reached through C → J and J → S), so we can delete this FD
• Let’s consider C → P and find C+ without considering this rule
- C+ = CSJDQVP (P is still reached through SD → P), so we can delete this FD

So the final answer:
F2 = {C → J, C → D, C → Q, C → V, JP → C, SD → P, J → S}
F2 with union = {C → JDQV, JP → C, SD → P, J → S}
Page 110
Minimal Cover Example Visualised
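Original: C → SJDPQV, JP → C, SD → P, J → S

Step 1: RHS simplification
• C → SJDPQV becomes C → S, C → J, C → D, C → P, C → Q, C → V

Step 2: LHS simplification
• No change

Step 3: FD set simplification
• C → S and C → P are removed

Final Answer: C → J, C → D, C → Q, C → V, JP → C, SD → P, J → S

Final with Union: C → JDQV, JP → C, SD → P, J → S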

Page 111
Question: Minimal Cover
Assume that the following FDs hold for a relation R(ABCDEF):
• DEF → C
• AB → DC
• D → F

Which of the following is a minimal cover for the relation above?

A. {DEF → C, AB → DC, D → F}
B. {DEF → C, AB → D, D → F}
C. {DE → C, AB → D, AB → C, D → F}
D. {DE → C, AB → CD}
Page 112
3NF Decomposition Revisited
Input: a universal relation R and a set F of FDs

S = ∅;
Compute the minimal cover G of F;
Combine all FDs in G with the same LHS into one;

For each X → Y in G {
  if (no relation in S contains X ∪ Y)
    Add a relation with schema X ∪ Y to S;
}
if (any candidate key is missing from the relations)
  add a relation with all prime attributes;

Eliminate redundant relations from the resulting set of relations

A relation R is considered redundant if R is a projection of another relation S in the schema.
Page 113
3NF Example
R(ABCDE), F = {AB → C, C → D}
• Cover already minimal
• Key: ABE

Create tables based on F = {AB → C, C → D}
• R1(ABC), R2(CD)

if (any candidate key is missing from the relations) add a relation with all prime attributes
• R3(ABE)

Remove redundant relations
• Nothing is redundant. Final answer is R1(ABC), R2(CD), R3(ABE)
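The synthesis steps in this example can be sketched as follows. Several things here are assumptions rather than the slides' wording: the input is taken to be a minimal cover already, `synthesize_3nf` is a hypothetical name, candidate keys are found by brute force, and the key step follows the common textbook variant (add one candidate key when no relation already contains a key), which coincides with the slide's rule when there is a single candidate key, as here.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if not any(k <= candidate for k in keys) and \
                    closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys

def synthesize_3nf(schema, cover):
    # Merge FDs with the same LHS, then build one relation per group.
    grouped = {}
    for lhs, rhs in cover:
        grouped.setdefault(frozenset(lhs), set()).update(rhs)
    relations = []
    for lhs, rhs in grouped.items():
        rel = set(lhs) | rhs
        if not any(rel <= other for other in relations):
            relations.append(rel)
    # Assumption: add one candidate key if no relation already contains a key.
    keys = candidate_keys(schema, cover)
    if not any(key <= rel for key in keys for rel in relations):
        relations.append(keys[0])
    # Drop relations that are projections of (contained in) another relation.
    return [r for r in relations
            if not any(r != o and r <= o for o in relations)]

G = [(set("AB"), set("C")), (set("C"), set("D"))]
result = synthesize_3nf(set("ABCDE"), G)
names = {"".join(sorted(r)) for r in result}
print(sorted(names))  # ['ABC', 'ABE', 'CD']
```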

Page 114
3NF Example
R(CSJDPQV), F = {SD → P, JP → C, J → S}
• Key: JDQV
• F is already minimal

Create tables based on F = {SD → P, JP → C, J → S}
• R1(SDP), R2(JPC), R3(JS)

if (any candidate key is missing from the relations) add a relation with all prime attributes
• R4(JDQV)

Remove redundant relations
• Nothing is redundant.
• Final answer is R1(SDP), R2(JPC), R3(JS), R4(JDQV)
Page 115
Question: 3NF Synthesis
The following is a minimal cover for a relation R(ABCDEF):
• AC → E
• BD → A
• A → B
• E → CF

Which of the following is a 3NF synthesis for R?


A. R1(ACE), R2(DBA), R3(AB), R4(ECF)
B. R1(ACE), R2(ABD), R3(AB), R4(ECF), R5(ACD)
C. R1(ACE), R2(BDA), R3(ECF), R4(ACD)
D. R1(ACE), R2(BDA), R3(ECF), R4(ACD), R5(ADE)

Page 116
Top-Down vs. Bottom-Up Design
Top-Down Approach
• Design by Analysis
• Start from a universal relation and a set of FDs
• Analyze and decompose into a set of BCNF relations
• Removes all anomalies
• Does not preserve all FDs

Bottom-Up Approach
• Design by Synthesis
• Start from individual attributes and a set of FDs
• Synthesize into a set of 3NF relations
• Does not remove all anomalies
• Preserves all FDs

For both approaches, the results are non-deterministic. Missing FDs can lead to sub-optimal or incorrect design.
From UoD to Relational Schema

[Diagram: the UoD is modeled as an EER diagram and mapped to an RDB (Table 1, Table 2, …, Table n). A universal relation plus a set of FDs leads to the RDB through the decomposition approach; all attributes plus a set of FDs lead to it through the synthesis approach.]

Page 118
Denormalization
Given that good decomposition addresses anomalies, it is tempting to decompose the relation as much as possible
• However, when a table is decomposed, several relations may need to be joined to answer a query

The process of intentionally violating a normal form to gain performance improvements is called denormalization
• Fewer joins (better query processing time)
• Reduces number of foreign keys (less storage and maintenance)

Useful in data analysis or if certain frequent queries require joined results
• Must be a controlled process

Page 119
Summary
FDs represent data semantics
• Some FDs can be specified by database designers, others can be
inferred
• FDs can be used to enforce constraints on data that should hold on all
instances of a given schema

FD concepts and normalization techniques are formal theories to measure the “goodness” or quality of a relational schema design
• They can restructure (decompose) schemas, such that the resulting relations possess desirable properties and are said to be in a given normal form
• A higher NF is generally better; however, other factors such as performance and decomposition need to be considered

Page 120
Review
Do you know …
• How can we measure the quality of database design?
• What is a functional dependency (FD) constraint?
• What is a normal form (NF)?
• How do you achieve a (higher) NF?

Reading
• Chapters 14 (up to 14.6) and 15 (up to 15.5) in Elmasri & Navathe

Next Module
• Module 5: Database Security

Page 121
