4-Database Design Theory-Without Inclass Exercises (1)
Database Design Theory and Normalization
Learning Outcomes
• After successfully completing this module you should be able to reason with
the logical foundation of the relational data model and understand the
fundamental principles of correct relational database design.
Normalization
Informal Design Guidelines
Informal measures of relational database schema quality
and design guidelines
Guideline 1
Design each relation so that it is easy to explain its meaning
EMPLOYEE (Ename, Ssn, Bdate, Address, Dnumber); P.K. Ssn, F.K. Dnumber
DEPARTMENT (Dname, Dnumber, Dmgr_ssn); P.K. Dnumber, F.K. Dmgr_ssn
EMP_DEPT (Ename, Ssn, Bdate, Address, Dnumber, Dname, Dmgr_ssn)
Redundant Values in Tuples
• One design goal is to minimize the storage space that base relations
occupy.
• The way in which attributes are grouped into relation schemas has
a significant impact on storage space.
A Motivating Example
Let’s consider an example of a company where an employee’s salary
corresponds directly to the level, or position, they hold. For example, a
manager has a fixed salary of $70,000 and a developer has a fixed salary of
$60,000.
Employee
Modification Anomalies
• Updating the salary of only one developer makes the “Developer” salary
inconsistent.
Deletion Anomalies
• By deleting “Charlie”, we no longer store the salary of “Administration” staff.
Insertion Anomalies
• We cannot store the salary of a “Cook” if no employee has that position.
• Inserting a new row with a different salary for a developer makes the
“Developer” salary inconsistent.
Guideline 2
Design the base relation schemas so that no insertion, deletion, or
modification anomalies occur in the relations
• If any do occur, ensure that all applications that access the database
update the relations in such a way as to not compromise the integrity
of the database
Guideline 3
As far as possible, avoid placing attributes in a base relation
whose values may be null
Decomposing a Relation
• A decomposition of R replaces R by two or more relations
such that:
• Each new relation contains a subset of the attributes of R
(and no attributes not appearing in R)
• Every attribute of R appears in at least one new relation.
R(A, B, C) = {(1, 2, 3), (4, 5, 6), (7, 8, 9)}
Decomposition:
R1(A, B) = {(1, 2), (4, 5), (7, 8)}
R2(B, C) = {(2, 3), (5, 6), (8, 9)}
Decomposing a Relation Example
Employee
ID Name Level Salary
1 Paris Developer 60,000
2 Anna Manager 70,000
3 Ben Manager 70,000
4 Rose Driver 50,000
5 Jack Developer 60,000
6 Charlie Administration 50,000
R1(A, B) = {(1, 2), (4, 5), (7, 8)}, R2(B, C) = {(2, 3), (5, 6), (8, 9)}
R1 ⋈ R2 = {(1, 2, 3), (4, 5, 6), (7, 8, 9)} over (A, B, C)
Lossless Join Decomposition
Decomposition of R into R1 and R2 is a lossless join if, for every
instance r of R that satisfies F (we will discuss F, and what it means for r to satisfy F, later):
r = πR1(r) ⋈ πR2(r)
Informally: If we break a relation R into bits, when we put the bits back
together, we should get exactly R back again.
Example Lossy Join Decomposition
R(A, B, C) = {(1, 2, 3), (4, 5, 6), (7, 2, 9)}
Decompose: R1(A, B) = {(1, 2), (4, 5), (7, 2)}, R2(B, C) = {(2, 3), (5, 6), (2, 9)}
Join: R’ = R1 ⋈ R2 = {(1, 2, 3), (1, 2, 9), (4, 5, 6), (7, 2, 3), (7, 2, 9)}
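The word “loss” in lossless refers to loss of information, not loss of tuples: R’ has more tuples than R, and because of the spurious tuples (1, 2, 9) and (7, 2, 3) we can no longer tell which tuples were really in R
• Maybe a better term here is “addition of spurious information”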
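The lossy example can be replayed in code. Below is a minimal Python sketch, not a library API. The helper names `project` and `natural_join` are hypothetical, and a relation is assumed to be a list of attribute names plus a Python set of tuples. The sketch projects R onto R1(A, B) and R2(B, C), joins the two projections back on B, and reports the tuples that were not in R.

```python
def project(attrs, rows, keep):
    """Project a set of tuples (ordered as attrs) onto the attributes in keep."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(attrs1, rows1, attrs2, rows2):
    """Natural join: pair up tuples that agree on every shared attribute."""
    shared = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    result = set()
    for t1 in rows1:
        for t2 in rows2:
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in shared):
                result.add(t1 + tuple(t2[attrs2.index(a)] for a in extra))
    return attrs1 + extra, result


R_ATTRS = ["A", "B", "C"]
r = {(1, 2, 3), (4, 5, 6), (7, 2, 9)}

r1 = project(R_ATTRS, r, ["A", "B"])   # R1(A, B)
r2 = project(R_ATTRS, r, ["B", "C"])   # R2(B, C)
joined_attrs, joined = natural_join(["A", "B"], r1, ["B", "C"], r2)

print(joined_attrs, sorted(joined))
print("spurious:", sorted(joined - r))  # spurious: [(1, 2, 9), (7, 2, 3)]
```

The first example has (7, 8, 9) instead of (7, 2, 9). Running the same join on it returns exactly R, because no two tuples there share a B value.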
Design Guidelines
Functional Dependencies
Normalization
Functional Dependency
How can we know for sure that all staff members appointed at the
same level have the same salary?
Databases allow you to state that one attribute determines another
through a functional dependency
Assume that level determines salary but not ID. We say that there is a
functional dependency from level to salary (i.e., if we know an
employee’s level, we can find their salary)
Employee
Functional Dependency, Formally
A functional dependency (FD) X → Y holds on relation R if, for every
legal instance r of R and for all tuples t1, t2 in r:
if t1[X] = t2[X], then t1[Y] = t2[Y]
• That is, if two tuples in r agree on their X values, then they must also agree on their Y
values
• Example: level → salary (i.e., if two employees have the same level,
then they must have the same salary)
A B C D
1 2 3 4
2 3 4 6
6 7 8 9
1 3 4 5
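The formal definition translates directly into a check on one instance. This Python sketch uses a hypothetical function name, `fd_holds`. It remembers the Y values seen for each X value and runs on the A B C D table above. An instance can only show that an FD is violated. Whether an FD holds on every legal instance depends on the meaning of the data, and no single instance can prove it.

```python
def fd_holds(attrs, rows, lhs, rhs):
    """Return True if no two tuples in this instance agree on lhs but differ on rhs."""
    seen = {}  # LHS value combination -> RHS values first seen with it
    for row in rows:
        x = tuple(row[attrs.index(a)] for a in lhs)
        y = tuple(row[attrs.index(a)] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # t1[X] = t2[X] but t1[Y] != t2[Y]
    return True


ATTRS = ["A", "B", "C", "D"]
r = [(1, 2, 3, 4), (2, 3, 4, 6), (6, 7, 8, 9), (1, 3, 4, 5)]

print(fd_holds(ATTRS, r, ["A"], ["B"]))  # False: the two tuples with A = 1 differ on B
print(fd_holds(ATTRS, r, ["B"], ["C"]))  # True for this instance
```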
Question: Anomalies
Consider the following relation R with the given functional dependencies.
R[A, B, C, D], with D → AC
A B C D
11 44 22 55
22 33 44 33
11 11 22 55
Keys
A key is a minimal set of attributes that uniquely identifies each tuple of a relation
• i.e., a key is a minimal set of attributes that functionally determines all
the attributes in the relation
A superkey is a set of attributes that uniquely identifies each tuple of a relation (it need not be minimal)
Employee
Question: Possible Superkeys
Assume the following FDs hold for relation R(A,B,C,D):
B → C
C → B
D → ABC
Explicit and Implicit FDs
Given a set of (explicit) functional dependencies, we can determine
implicit ones
• Explicit FDs: ID → level, level → salary
• Implicit FD: ID → salary
Closure of F
Given a set F of FDs, many implicit FDs can be derived
Trivial FDs
• Everything implies itself, such as A → A, ABC → AB
• These are called trivial FDs (i.e., an FD is trivial if its LHS contains
its RHS)
• The inference of trivial FDs does not depend on any F
Closure of F (denoted as F+): the set of all FDs that can be implied by F
• F+ includes both trivial and non-trivial FDs
Inference Rules
It is practically impossible to specify all possible FDs that may hold
Armstrong’s Axioms
Armstrong’s Axioms (X, Y, Z are sets of attributes):
• Reflexivity: If Y ⊆ X, then X → Y
- e.g., {ID, level} → level
• Augmentation: If X → Y, then XZ → YZ for any Z
- e.g., if ID → level then {ID, salary} → {level, salary}
• Transitivity: If X → Y and Y → Z, then X → Z
- e.g., if ID → level and level → salary then ID → salary
Proof of Decomposition Rule
To prove IR4: X → YZ ⊨ X → Y
IR1 Reflexivity: Y ⊆ X ⊨ X → Y
IR2 Augmentation: X → Y ⊨ XZ → YZ
IR3 Transitivity: {X → Y, Y → Z} ⊨ X → Z
Proof:
1: X → YZ: Given
2: YZ → Y: Using IR1 (Y is a subset of YZ)
3: X → Y: Using IR3 on results of 1 & 2
Proof of Union Rule
To prove IR5: {X → Y, X → Z} ⊨ X → YZ
IR1 Reflexivity: Y ⊆ X ⊨ X → Y
IR2 Augmentation: X → Y ⊨ XZ → YZ
IR3 Transitivity: {X → Y, Y → Z} ⊨ X → Z
Proof:
1: X → Y: Given
2: X → Z: Given
3: XX → XY: Using IR2 on 1 (adding X)
4: X → XY: Since XX = X (not using IR1 to IR3)
5: XY → YZ: Using IR2 on 2 (adding Y)
6: X → YZ: Using IR3 on 4 & 5
Proof of Pseudotransitive Rule
To prove IR6: {X → Y, WY → Z} ⊨ WX → Z
IR1 Reflexivity: Y ⊆ X ⊨ X → Y
IR2 Augmentation: X → Y ⊨ XZ → YZ
IR3 Transitivity: {X → Y, Y → Z} ⊨ X → Z
Proof:
1: X → Y: Given
2: WY → Z: Given
3: WX → WY: Using IR2 on 1 (adding W)
4: WX → Z: Using IR3 on 3 & 2
The Six Inference Rules
F+ and X+
F+ is the set of all FDs that can be derived from F
• The six inference rules can be used to compute F+
• However, F+ is usually far too large and time-consuming to compute
• And computing it is not necessary
Computing X+ (Informally)
1. X+ initially contains all attributes in X
Example
Employee (ID, name, level, salary), F = {ID → level, level → salary, ID → name}
1. ID+ = {ID}
2. ID+ = {ID, level} using ID → level
3. ID+ = {ID, level, salary} using level → salary
4. ID+ = {ID, name, level, salary} using ID → name
Computing X+
X+ := X;
repeat
    old X+ := X+;
    for each FD Y → Z in F do
        if Y ⊆ X+ then X+ := X+ ∪ Z;
until (old X+ = X+);
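The pseudocode above can be sketched in Python as follows. The sketch assumes FDs are given as (lhs, rhs) pairs of attribute-name collections. This representation and the name `closure` are illustrative choices, not part of the original. Instead of comparing old X+ with X+, the loop keeps a `changed` flag, which stops at the same point.

```python
def closure(x, fds):
    """Return X+: every attribute functionally determined by x under fds.

    fds is a list of (lhs, rhs) pairs; each side is a collection of attribute names.
    """
    result = set(x)                      # X+ := X
    changed = True
    while changed:                       # repeat ... until X+ stops changing
        changed = False
        for lhs, rhs in fds:             # for each FD Y -> Z in F
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)       # Y is contained in X+, so add Z
                changed = True
    return result


F = [
    (["pNumber"], ["pName", "pLocation", "dNum"]),
    (["dNum"], ["dName", "mgrSSN", "mgrStartDate"]),
]
print(sorted(closure(["pNumber"], F)))
```

The call reproduces the pNumber example on the next slide and returns all seven attributes of R.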
Example X+ Computation
R (pNumber, pName, pLocation, dNum, dName, mgrSSN, mgrStartDate)
F={
pNumber → {pName, pLocation, dNum},
dNum → {dName, mgrSSN, mgrStartDate}
}
X = {pNumber}
1. X+ := {pNumber}
2. X+ := {pNumber, pName, pLocation, dNum}
3. X+ := {pNumber, pName, pLocation, dNum, dName, mgrSSN, mgrStartDate}
Example: Supplier-Part DB
Suppliers supply parts to projects.
• SupplierPart (sName, city, status, pNum, pName, qty)
- supplier attributes: sName, city, status
- part attributes: pNum, pName
- supplier-part attributes: qty
Functional dependencies:
• fd1: sName → city
• fd2: city → status
• fd3: pNum → pName
• fd4: {sName, pNum} → qty
Example Finding Superkey
Exercise: Show that (sName, pNum) is a superkey for SupplierPart
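F = {
sName → city,
city → status,
pNum → pName,
{sName, pNum} → qty
}
• {sName, pNum}+ = {sName, pNum, city} using sName → city
• {sName, pNum}+ = {sName, pNum, city, status} using city → status
• {sName, pNum}+ = {sName, pNum, city, status, pName} using pNum → pName
• {sName, pNum}+ = {sName, pNum, city, status, pName, qty} using {sName, pNum} → qty
• The closure contains every attribute of SupplierPart, so (sName, pNum) is a superkey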
A Note About Finding Keys
Given: A complete set of FDs F on relation R
Example Finding Key
Exercise: Show that (sName, pNum) is a key for SupplierPart
F = {
sName → city,
city → status,
pNum → pName,
{sName, pNum} → qty
}
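Both exercises reduce to closures. X is a superkey if X+ contains every attribute, and a key if, in addition, no attribute can be dropped. Here is a minimal Python sketch with hypothetical helpers `is_superkey` and `is_key`, built on the same `closure` function as the X+ algorithm.

```python
def closure(x, fds):
    """X+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(x, relation, fds):
    """X is a superkey if X+ contains every attribute of the relation."""
    return closure(x, fds) >= set(relation)


def is_key(x, relation, fds):
    """X is a key if it is a superkey and dropping any single attribute breaks that.

    Checking single-attribute removals is enough: every proper subset of X lies
    inside some X - {a}, and a subset of a non-superkey is never a superkey.
    """
    return is_superkey(x, relation, fds) and all(
        not is_superkey(set(x) - {a}, relation, fds) for a in x
    )


SUPPLIER_PART = ["sName", "city", "status", "pNum", "pName", "qty"]
F = [
    (["sName"], ["city"]),
    (["city"], ["status"]),
    (["pNum"], ["pName"]),
    (["sName", "pNum"], ["qty"]),
]
print(is_superkey(["sName", "pNum"], SUPPLIER_PART, F))     # True
print(is_key(["sName", "pNum"], SUPPLIER_PART, F))          # True
print(is_key(["sName", "pNum", "city"], SUPPLIER_PART, F))  # False: not minimal
```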
Question: Finding Keys
Assume that the following FDs hold for a relation R(ABCDEF):
B → CF
C → E
EF → D
Question: Finding Key
Which of the following is a key of relation R(ABCDE) with
F = {D → C,
CE → A,
D → A,
AE → D}
A. ABDE
B. BCE
C. CDE
D. All of these are keys
E. None of these are keys
Question: Finding Keys
Assume that the following FDs hold for a relation R(ABCDEF):
AB → E
C → BE
ED → F
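For small relations like the ones in these questions, all candidate keys can be found by brute force. The sketch below uses a hypothetical function name, `candidate_keys`. It assumes single-letter attributes, so a string such as "AD" stands for {A, D}. It tries subsets in increasing size and keeps each superkey that contains no smaller key. The example uses the FDs from the later BCNF decomposition question, whose three keys are listed on that slide.

```python
from itertools import combinations


def closure(x, fds):
    """X+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Return all candidate keys, testing attribute subsets smallest first.

    A superkey is kept only if it contains no key found earlier, which makes it
    minimal. The search is exponential, so it only suits small textbook relations.
    """
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(relation, size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue  # contains a smaller key, so it cannot be minimal
            if closure(x, fds) >= set(relation):
                keys.append(x)
    return keys


# Single-letter attributes, so a string such as "AD" stands for {A, D}.
F = [("A", "B"), ("C", "D"), ("AD", "C"), ("BC", "A")]
print(["".join(sorted(k)) for k in candidate_keys("ABCD", F)])  # ['AC', 'AD', 'BC']
```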
Approaching Normality
Role of FDs in detecting redundancy:
• If there is no FD (beyond keys), there would be no anomalies.
• Consider a relation R with 3 attributes, A, B, C
- No FDs hold: There is no redundancy here
- Given A → B: several tuples could have the same A value, and if so,
they’ll all have the same B value, which is then stored redundantly!
Design Guidelines
Functional Dependencies
Normalization
Relational Database Schema Design
Normalization
Normalization is a process that aims at achieving better designed
relational database schemas using
• Functional Dependencies
• Primary Keys
“Key” Concepts
Superkey: a set of attributes such that no two tuples have the same values
for these attributes
• That is, its closure contains all attributes in the relation
(Candidate) key: a minimal superkey
• Minimal ≠ shortest
• There can be many candidate keys for one relation
• Any superkey must contain at least one candidate key
Primary key: the candidate key selected to identify the tuples of a relation
• There can be only one primary key for a relation
Prime attribute: an attribute in any candidate key
• Not necessarily a member of the primary key!
Non-prime attribute: An attribute that is not a member of any candidate key
Normalization – Overview
1NF
• Outcome: Removal of non-atomic values from relations.
• Test: Relation should have no multivalued attributes or nested relations.
2NF
• Outcome: Removal of partial dependencies, which removes some anomalies.
• Test: The LHS of every non-trivial FD in F+ is not a proper subset of a candidate key, or the RHS is a prime attribute.
3NF
• Outcome: Removal of partial and transitive dependencies, which removes most anomalies.
• Test: The LHS of every non-trivial FD in F+ is a superkey, or the RHS is a prime attribute.
BCNF
• Outcome: Removal of all anomalies caused by FDs, possibly at the cost of not preserving all FDs.
• Test: The LHS of every non-trivial FD in F+ is a superkey.
Normalization – Overview
Employee
ID Name Level Salary
1 Paris Developer 60,000
2 Anna Manager 70,000
3 Ben Manager 70,000
4 Rose Driver 50,000
5 Jack Developer 60,000
6 Charlie Administration 50,000
Level → Salary
First Normal Form (1NF)
A relation schema is in 1NF if the domains of the attributes include
only atomic (simple, indivisible) values
Examples of Non-1NF Relations
name is a composite attribute (firstName, familyName)
firstName familyName orderNum item
Tom Jones 123 Hat
Sri Gupta 876 Glass
Sri Gupta 876 Pencil
Normalized to 1NF
Problems:
• Above: redundancy
• Below: lack of flexibility
Full and Partial Functional Dependencies
A functional dependency X → Y is a full functional dependency if removal of
any attribute A from X means that the dependency no longer holds.
• For every A ∈ X, (X − {A}) does not functionally determine Y
Example
Address [houseNum, street, postcode, state, value]
F = {{houseNum, street, postcode} → {state, value}, postcode → state}
• {houseNum, street, postcode} → value is a full functional dependency
- Since no proper subset of {houseNum, street, postcode} determines value
• Key = {houseNum, street, postcode}
• state is not a prime attribute
• state is partially dependent on the primary key (because of postcode
→ state, where postcode is a proper subset of the key)
FD postcode → state violates 2NF; therefore, Address is not in 2NF
Normalizing Address
• Address [houseNum, street, postcode, value]
• Postcodes [postcode, state]
Address.postcode references Postcodes.postcode (foreign key)
Transitive Dependency
A functional dependency X → Y in a relation R is a transitive
dependency if there exists a set of attributes Z in R such that:
• Z is neither a candidate key nor a subset of any key of R
• Both X → Z and Z → Y hold
Example
• Employee [ID, name, level, salary]
• ID → level and level → salary, so ID → salary is a transitive dependency (through level)
Third Normal Form (3NF)
A relation schema R with the set F of FDs is in 3NF iff,
• for every subset of attributes X ⊆ R
• and every attribute A ∈ R
• such that X → A is a non-trivial FD in F+,
either
1. X is a superkey for R, or
2. A is a prime attribute
Example
• Employee [ID, name, level, salary]
- level → salary (transitive dependency)
• level is not a superkey and salary is not a prime attribute
• level → salary violates 3NF; therefore, Employee is not in 3NF
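The 3NF test can be automated for a relation R with its own F. This sketch relies on a standard result: when testing R itself, it is enough to check the FDs in F, with each RHS split into single attributes, instead of all of F+. For a decomposed relation you would first need its projected FDs. The names `third_nf_violations` and `candidate_keys` are hypothetical.

```python
from itertools import combinations


def closure(x, fds):
    """X+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """All minimal superkeys, found by testing subsets smallest first."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(relation, size):
            x = set(combo)
            if not any(k <= x for k in keys) and closure(x, fds) >= set(relation):
                keys.append(x)
    return keys


def third_nf_violations(relation, fds):
    """Return (X, A) for each FD X -> A in F (RHS split into single attributes)
    that is non-trivial, has a non-superkey X, and has a non-prime A."""
    prime = set().union(*candidate_keys(relation, fds))
    violations = []
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) >= set(relation)
        for a in rhs:
            if a not in lhs and not lhs_is_superkey and a not in prime:
                violations.append((tuple(lhs), a))
    return violations


EMPLOYEE = ["ID", "name", "level", "salary"]
F_EMP = [(["ID"], ["name", "level"]), (["level"], ["salary"])]
print(third_nf_violations(EMPLOYEE, F_EMP))  # [(('level',), 'salary')]

TEACH = ["studentID", "courseID", "lecturer"]
F_TEACH = [(["studentID", "courseID"], ["lecturer"]), (["lecturer"], ["courseID"])]
print(third_nf_violations(TEACH, F_TEACH))  # []
```

The second call uses the Teach relation from a later slide. It reports no violation because courseID is a prime attribute.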
Example Normalized to 3NF
Normalizing Employee
• StaffAppointment [ID, name, level]
• StaffIncome [level, salary]
StaffAppointment.level references StaffIncome.level (foreign key)
Most 3NF relations are free of anomalies; however, some 3NF relations,
rarely met in practice, are still affected by anomalies.
Question: partial dependency vs. transitive dependency
What is the difference between partial dependency and transitive dependency?
A. Partial dependency is where an attribute only depends on a subpart of the primary key to be
identified. Normalizing to third normal form solves this. Transitive dependency is where a
non-prime attribute depends on other non-prime attributes to be identified. Normalizing to
second normal form solves this.
B. Partial dependency is where an attribute only depends on a subpart of the primary key to be
identified. Normalizing to second normal form solves this. Transitive dependency is where a
non-prime attribute depends on other non-prime attributes to be identified. Normalizing to
third normal form solves this.
C. Partial dependency is where an attribute only depends on a subpart of the primary key to be
identified. Normalizing to second normal form solves this. Transitive dependency is where a
non-prime attribute depends on other non-prime attributes to be identified. Normalizing to
third normal form solves this.
D. Partial dependency is where a non-prime attribute depends on other non-prime attributes to
be identified. Normalizing to third normal form solves this. Transitive dependency is where an
attribute only depends on a subpart of the primary key to be identified. Normalizing to second
normal form solves this.
Problems with 3NF Relations
Consider the following example:
• Teach [studentID, courseID, lecturer]
• {studentID, courseID} → lecturer
• lecturer → courseID
- Keys: {studentID, courseID}, {studentID, lecturer}
- Teach is in 3NF (in lecturer → courseID, lecturer is not a superkey, but courseID is a prime attribute)
Teach
studentID courseID lecturer
1234 INFS1200 Jane
Existing anomalies
• Deletion anomaly
- E.g., deleting studentID 1234 would lead to the unwanted deletion of the
lecturer of INFS1200
• Insertion anomaly
- E.g., we cannot store the lecturer of a course that doesn’t have students.
Boyce-Codd Normal Form (BCNF)
A relation schema R with the set F of FDs is in BCNF iff,
• for every subset of attributes X ⊆ R
• and every attribute A ∈ R
• such that X → A is a non-trivial FD in F+,
X is a superkey for R
Example
• Teach [studentID, courseID, lecturer]
• {studentID, courseID} → lecturer
• lecturer → courseID
- Keys: {studentID, courseID}, {studentID, lecturer}
- lecturer is not a superkey
- lecturer → courseID violates BCNF; therefore, Teach is not in BCNF
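A matching BCNF check is sketched below. It again applies only to R with its own F, where checking the FDs of F is sufficient. For a decomposed relation, implicit FDs must be projected first. The name `bcnf_violations` is hypothetical.

```python
def closure(x, fds):
    """X+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the FDs X -> Y of F that are non-trivial and whose X is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                    # non-trivial
        and not closure(lhs, fds) >= set(relation)     # X is not a superkey
    ]


TEACH = ["studentID", "courseID", "lecturer"]
F_TEACH = [(["studentID", "courseID"], ["lecturer"]), (["lecturer"], ["courseID"])]
print(bcnf_violations(TEACH, F_TEACH))  # [(['lecturer'], ['courseID'])]
```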
BCNF Example
For the given relation schema R with the set of FDs F, determine
whether or not R is in BCNF
BCNF is Great, But…
• Guaranteed that there will be no redundancy of data caused by FDs
• Easy to understand (just look for superkeys)
• Easy to do
Dependency Preservation
Given a relation R and a set F of FDs, when R is decomposed into R1, R2,
…, Rn, F is decomposed into F1, F2, …, Fn
• Fi contains all FDs in F+ whose attributes all appear in Ri (the projection of F onto Ri)
The decomposition is dependency preserving if (F1 ∪ F2 ∪ … ∪ Fn)+ = F+
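Testing this definition by computing F+ is expensive. A common alternative checks each X → Y in F by growing an attribute set using only closures restricted to each Ri. The sketch below assumes FDs are (lhs, rhs) pairs of attribute lists, and the function name `preserves_dependencies` is hypothetical. The example uses the Teach relation from the previous slides, split into (studentID, lecturer) and (lecturer, courseID).

```python
def closure(x, fds):
    """X+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(fds, decomposition):
    """Check every X -> Y in F against the FDs that hold on the Ri, without building F+.

    For each FD the attribute set z starts as X and repeatedly absorbs
    (z ∩ Ri)+ ∩ Ri for every Ri; the FD is preserved iff Y ends up inside z.
    Returns (True, None) or (False, first_lost_fd).
    """
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                gained = closure(z & set(ri), fds) & set(ri)
                if not gained <= z:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False, (lhs, rhs)
    return True, None


F_TEACH = [(["studentID", "courseID"], ["lecturer"]), (["lecturer"], ["courseID"])]
decomposition = [["studentID", "lecturer"], ["lecturer", "courseID"]]
print(preserves_dependencies(F_TEACH, decomposition))
# (False, (['studentID', 'courseID'], ['lecturer']))
```

This is the kind of FD loss that BCNF decomposition can cause: no single fragment contains both studentID and courseID, so {studentID, courseID} → lecturer can no longer be enforced locally.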
Problems with BCNF Relations
studentID courseID lecturer
• {studentID, courseID} → lecturer
• lecturer → courseID
• This relation is not in BCNF
Is lecturer a key?
• lecturer → courseID
ACD → B; AC → D; D → C; AC → B
Normalization
The normalization process takes a relational schema through a
series of tests to certify whether it satisfies certain conditions
• Schemas that satisfy the conditions of a given normal form are said to be in
that normal form
• Unsatisfactory schemas are decomposed by breaking up their attributes into
smaller relations that possess desirable properties
Functional Dependencies
Normalization
Algorithms for Relational Database
Schema Design
Two algorithms for creating a relational decomposition from a universal
relation:
• BCNF decomposition
• 3NF synthesis
BCNF Decomposition
Input: a universal relation R and a set F of FDs
Let D := {R};
While (a relation Q in D is not in BCNF) {
    Find one FD X → Y in Q that violates BCNF;
    Replace Q in D by Q1(Q − Y) and Q2(X ∪ Y);
};
(Viewing the attributes of Q as Others | X | Y: Q1 keeps Others and X, and Q2 keeps X and Y.)
Note: the answer may vary depending on the order in which you choose violating FDs. That’s okay.
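Here is a Python sketch of this loop. It assumes single-letter attributes and FDs given as (lhs, rhs) strings, and all helper names are hypothetical. To catch implicit FDs, it computes X+ for every subset X of Q. When X is not a superkey of Q, it takes Y = (X+ ∩ Q) − X. Taking the whole closure rather than one given FD is only one valid choice, so results can differ from the slides, as the note above allows.

```python
from itertools import combinations


def closure(x, fds):
    """X+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(q, fds):
    """Find X -> Y holding on Q (via X+, so implicit FDs count) with X not a superkey of Q."""
    for size in range(1, len(q)):
        for combo in combinations(sorted(q), size):
            x = set(combo)
            x_plus = closure(x, fds) & q       # attributes of Q determined by X
            if x_plus != x and x_plus != q:    # non-trivial, and X is not a superkey of Q
                return x, x_plus - x
    return None


def bcnf_decompose(relation, fds):
    done, todo = [], [set(relation)]
    while todo:
        q = todo.pop()
        violation = find_violation(q, fds)
        if violation is None:
            done.append(frozenset(q))
        else:
            x, y = violation
            todo.append(q - y)     # Q1 = Q - Y (Others plus X)
            todo.append(x | y)     # Q2 = X ∪ Y
    return done


print(sorted("".join(sorted(q)) for q in bcnf_decompose("ABCD", [("B", "C"), ("D", "A")])))
# ['AD', 'BC', 'BD']
print(sorted("".join(sorted(q)) for q in bcnf_decompose("ABCDEF", [("A", "B"), ("DE", "F"), ("B", "C")])))
# ['AB', 'ADE', 'BC', 'DEF']
```

The first call matches the R(ABCD) example that follows. The second call uses the implicit-FD example later in this section. There the sketch splits on A → BC first and returns BC where the slide has AC, but every resulting relation is still in BCNF.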
BCNF Decomposition Example
Given relation R(ABCD) and F = {B → C, D → A}, decompose R into a set
of relation schemas that are in BCNF.
Find closures and keys
• {B}+ = BC, {D}+ = DA, {BD}+ = BDCA; BD is the only key
Considering B → C, is B a superkey in R?
• No. Decompose R into R1(ABD), R2(BC)
Considering D → A: it does not apply to R2. Is D a superkey for R1?
• No. Decompose R1 into R3(DB), R4(DA)
The algorithm terminates, since every two-attribute relation is in BCNF. For R(a, b):
• No FD, so no redundancy
• a → b, so a is a key, so R is in BCNF
• b → a, so b is a key, so R is in BCNF
• a → b and b → a: both a and b are keys, so R is in BCNF
Determining Which FDs Apply
For an FD X → Y, if the decomposed relation S contains
X ∪ Y, and Y ⊆ X+ (computed using F), then the FD holds for S.
For example
• Consider R(ABCDE) and
• F = {AB → C, BC → D, CD → E, DE → A, AE → B};
• project these FDs onto S(ABCD)
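The rule above suggests a mechanical projection: for each subset X of S, the FD X → (X+ ∩ S) − X holds on S. This Python sketch uses a hypothetical name, `project_fds`, and assumes single-letter attributes. It skips supersets of any X that already determines all of S, because their FDs follow by augmentation.

```python
from itertools import combinations


def closure(x, fds):
    """X+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(s, fds):
    """Return FDs X -> (X+ ∩ S) - X for subsets X of S, as strings of single letters."""
    s = set(s)
    projected, superkeys = [], []
    for size in range(1, len(s)):
        for combo in combinations(sorted(s), size):
            x = set(combo)
            if any(k <= x for k in superkeys):
                continue  # implied by a smaller X that already determines all of S
            gained = (closure(x, fds) & s) - x
            if gained:
                projected.append(("".join(sorted(x)), "".join(sorted(gained))))
            if x | gained == s:
                superkeys.append(x)
    return projected


F = [("AB", "C"), ("BC", "D"), ("CD", "E"), ("DE", "A"), ("AE", "B")]
print(project_fds("ABCD", F))
print(project_fds("AC", [("A", "B"), ("B", "C")]))  # [('A', 'C')]
```

The second call shows an implicit FD surviving projection: projecting {A → B, B → C} onto AC gives A → C. The first call computes the projection asked for in the question that follows.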
Question:
Determining Which FDs Apply
Consider relation R(ABCDE) with functional dependencies
AB → C, BC → D, CD → E, DE → A, and AE → B.
Project these FDs onto the relation S(ABCD).
BCNF Decomposition Again
R(ABCDE) and F = {AB → C, D → E}
Find closures and keys
• {AB}+ = ABC, {D}+ = DE; the only key is ABD
AB → C violates BCNF in R
• R1(ABDE), R2(ABC)
D → E violates BCNF in R1
• R3(ABD), R4(DE)
R(ABCDE) R(ABCDE)
AD → B C → DE
R1(ABD) R2(ABCE) R1(CDE) R2(ABC)
A → E A → B
R3(AE) R4(ABC) R3(AB) R4(AC)
Example: Implicit FDs matter
R(ABCDEF), F = {A → B, DE → F, B → C}
Find closures and keys
• {A}+ = ABC, {B}+ = BC, {DE}+ = DEF; the only key is ADE
• F ⊨ A → C, so we need to add A → C to F
A → B violates BCNF in R
• R1(ACDEF), R2(AB)
DE → F violates BCNF in R1
• R3(ACDE), R4(DEF)
A → C violates BCNF in R3
• R5(ADE), R6(AC)
Final answer: R2(AB), R4(DEF), R5(ADE), R6(AC)
Question: BCNF Decomposition
For R(ABCD) with F = {A → B, C → D, AD → C, BC → A},
decompose it into BCNF.
Question: BCNF Decomposition
For R(ABCD) with F = {A → B, C → D, AD → C, BC → A},
decompose it into BCNF.
Find the keys
• There are 3 keys: {AD}+ = ADBC, {AC}+ = ACBD, {BC}+ = BCAD
A → B violates BCNF in R
• R1(ACD), R2(AB)
Exercise: Option ‘A’ Exposed
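R(ABCD), F = {A → B, C → D, AD → C, BC → A}
Is {AB, AC, BD} a lossless-join decomposition?
r(A, B, C, D) = {(1, 2, 5, 6), (1, 2, 3, 7), (8, 2, 9, 4)}
Decompose:
• AB = {(1, 2), (8, 2)}
• AC = {(1, 5), (1, 3), (8, 9)}
• BD = {(2, 6), (2, 7), (2, 4)}
Join: AB ⋈ AC ⋈ BD = {(1, 2, 5, 6), (1, 2, 3, 7), (8, 2, 9, 4), (1, 2, 3, 4), …}
The tuple (1, 2, 3, 4) is not in r, so the decomposition is not lossless.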
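Instead of hunting for a spurious tuple, losslessness can be tested directly with the standard chase (matrix) test. This minimal sketch assumes single-letter attributes and FDs given as (lhs, rhs) strings, and the name `is_lossless` is hypothetical. Each Ri gets a row with the distinguished symbol "a" in its own columns and a row-specific "b" symbol in the other columns. The FDs then equate symbols until nothing changes. The decomposition is lossless iff some row becomes all "a".

```python
def is_lossless(relation, decomposition, fds):
    """Chase test for a decomposition of relation into the schemas in decomposition."""
    attrs = sorted(relation)
    # One row per Ri: "a" where the attribute is in Ri, "b<i>" otherwise.
    table = [{x: "a" if x in ri else f"b{i}" for x in attrs}
             for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for row1 in table:
                for row2 in table:
                    if row1 is row2 or any(row1[x] != row2[x] for x in lhs):
                        continue  # rows must agree on every LHS attribute
                    for y in rhs:
                        s1, s2 = row1[y], row2[y]
                        if s1 != s2:
                            keep = min(s1, s2)  # "a" sorts before any "b<i>"
                            for row in table:   # equate the two symbols in column y
                                if row[y] in (s1, s2):
                                    row[y] = keep
                            changed = True
    return any(all(v == "a" for v in row.values()) for row in table)


F = [("A", "B"), ("C", "D"), ("AD", "C"), ("BC", "A")]
print(is_lossless("ABCD", ["AB", "AC", "BD"], F))                      # False
print(is_lossless("ABCD", ["BC", "BD", "AD"], [("B", "C"), ("D", "A")]))  # True
```

The first call agrees with the spurious tuple above. The second call confirms that the earlier BCNF decomposition R2(BC), R3(DB), R4(DA) is lossless.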
3NF Synthesis
Input: a universal relation R and a set F of FDs
S := ∅;
Compute the minimal cover G of F;
Combine all FDs in G with the same LHS into one;
For each X → Y in G {
    if (no relation in S contains X ∪ Y)
        Add a relation with schema X ∪ Y to S;
}
if (any candidate key is missing from the relations)
    add a relation with all prime attributes (i.e. all candidate keys);
Example:
• F = {A → B, B → C, A → BC}, G = {A → B, B → C}
Finding Minimal Covers of FDs
• Step 1, RHS simplification: every FD has only one attribute on its RHS
• Step 2, LHS simplification: remove any redundant attributes from the LHS of each FD
• Step 3, FD set simplification: delete any redundant FDs
Minimal Cover Example: Step 1
Example
• R (ABCDEFGH)
• F = {A → B, ABCD → E, EF → G, EF → H, ACDF → EG}
Minimal Cover Example: Step 2
Example
• R (ABCDEFGH)
• After Step 1: F1 = {A → B, ABCD → E, EF → G, EF → H, ACDF → E, ACDF → G}
• After Step 2: F2 = {A → B, ACD → E, EF → G, EF → H, ACDF → E, ACDF → G}, since B is redundant in ABCD → E (because A → B)
• After Step 3: ACDF → E and ACDF → G are removed, since ACDF → E follows from ACD → E, and ACDF → G follows from ACD → E and EF → G
• Minimal cover: {A → B, ACD → E, EF → G, EF → H}
Minimal Cover: Another Example
Consider the relation R(CSJDPQV) with FDs
F = {C → SJDPQV, JP → C, SD → P, J → S}; find a minimal cover of F
Step 1:
F = {C → S, C → J, C → D, C → P, C → Q, C → V, JP → C, SD → P, J → S}
Step 2:
• No LHS attribute is redundant (J+ = JS, P+ = P, S+ = S, D+ = D), so F1 is the result of Step 1 unchanged
Step 3:
F1 = {C → S, C → J, C → D, C → P, C → Q, C → V, JP → C, SD → P, J → S};
can we remove any FDs?
• Let’s consider C → S and find C+ without this FD
- C+ = CSJDPQV (S via C → J and J → S), so we can delete this FD
• Let’s consider C → P and find C+ without this FD
- C+ = CSJDQVP (P via SD → P), so we can delete this FD
• No other FD can be removed, so a minimal cover is
G = {C → J, C → D, C → Q, C → V, JP → C, SD → P, J → S}
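The three steps can be sketched in Python as below. The name `minimal_cover` is hypothetical, and FDs are given as (lhs, rhs) strings of single-letter attributes. Step 2 drops an LHS attribute B from X → A when A is still in (X − B)+. Step 3 drops an FD when the remaining FDs still derive it. When several minimal covers exist, the processing order decides which one is found.

```python
def closure(x, fds):
    """X+ for FDs given as (lhs_set, single_rhs_attribute) pairs."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: RHS simplification - split X -> YZ into X -> Y and X -> Z
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    # Step 2: LHS simplification - drop B from X if A is still in (X - B)+
    for i, (lhs, a) in enumerate(g):
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, a)
    g = list(dict.fromkeys(g))  # drop duplicate FDs, keeping their order
    # Step 3: FD set simplification - drop X -> A if the other FDs imply it
    i = 0
    while i < len(g):
        lhs, a = g[i]
        rest = g[:i] + g[i + 1:]
        if a in closure(lhs, rest):
            g = rest
        else:
            i += 1
    return g


F = [("C", "SJDPQV"), ("JP", "C"), ("SD", "P"), ("J", "S")]
for lhs, a in minimal_cover(F):
    print("".join(sorted(lhs)), "->", a)
```

On this example it keeps exactly the seven FDs of G above. The LHS letters are printed sorted, so SD appears as DS. It also reproduces {A → B, ACD → E, EF → G, EF → H} for the earlier R(ABCDEFGH) example.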
Question: Minimal Cover
Assume that the following FDs hold for a relation R(ABCDEF):
DEF → C
AB → DC
D → F
3NF Example
R(CSJDPQV), F = {SD → P, JP → C, J → S}
• Key: JDQV
• F is already minimal
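The synthesis steps can be sketched in Python for this example, under three assumptions. First, G is already a minimal cover, which is true here as noted above. Second, attributes are single letters. Third, the final step reads the slide's condition as "no synthesized relation contains a candidate key" and, following the slide, then adds one relation with all prime attributes. Helper names are hypothetical.

```python
from itertools import combinations


def closure(x, fds):
    """X+ under fds, a list of (lhs, rhs) pairs of attribute collections."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """All minimal superkeys, found by testing subsets smallest first."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(relation, size):
            x = set(combo)
            if not any(k <= x for k in keys) and closure(x, fds) >= set(relation):
                keys.append(x)
    return keys


def synthesize_3nf(relation, g):
    """3NF synthesis following the slide's steps; g must already be a minimal cover."""
    combined = {}                                   # FDs with the same LHS become one
    for lhs, rhs in g:
        combined.setdefault(frozenset(lhs), set()).update(rhs)
    schemas = []
    for lhs, rhs in combined.items():               # one relation per X -> Y
        schema = lhs | rhs
        if not any(schema <= s for s in schemas):   # skip if already contained
            schemas.append(schema)
    keys = candidate_keys(relation, g)
    if not any(k <= s for k in keys for s in schemas):
        schemas.append(frozenset().union(*keys))    # all prime attributes
    return schemas


G = [("SD", "P"), ("JP", "C"), ("J", "S")]
print(sorted("".join(sorted(s)) for s in synthesize_3nf("CSJDPQV", G)))
# ['CJP', 'DJQV', 'DPS', 'JS']
```

The result corresponds to SDP, JPC and JS, plus the key relation JDQV, which is added because no FD-based relation contains the key.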
Top-Down vs. Bottom-Up Design
Top-Down Approach and Bottom-Up Approach
(Diagram: from the UoD to a relational schema; the bottom-up RDB synthesis approach starts from all attributes + a set of FDs)
For both approaches, the results are non-deterministic. Missing FDs can lead
to a sub-optimal or incorrect design.
Denormalization
Given that good decomposition addresses anomalies, it is tempting to
decompose the relation as much as possible
• When a table is decomposed, several relations may need to be
joined to find the answer to a query, which can make queries slower
Summary
FDs represent data semantics
• Some FDs can be specified by database designers, others can be
inferred
• FDs can be used to enforce constraints on data that should hold on all
instances of a given schema
Review
Do you know …
• How can we measure the quality of database design?
• What is a functional dependency (FD) constraint?
• What is a normal form (NF)?
• How do you achieve a (higher) NF?
Reading
• Chapters 14 (up to 14.6) and 15 (up to 15.5) in Elmasri & Navathe
Next Module
• Module 5: Database Security