Chapter 5 - B_DBDesign_II
Database
Normalization
COMP3278 Introduction to
Database Management Systems
Normal form
Boyce-Codd Normal Form (BCNF)
Motivating example
Let’s consider the following specifications:
Employees have eid (key), name, parkingLot.
Departments have did (key), dname, budget.
An employee works in exactly one department, since some date.
Employees who work in the same department must park at the same parkingLot.
[Figure: ER diagram of Employees (eid, name, parkingLot) and Departments (did, dname, budget), linked by a relationship with attribute since]
Lossless-join
Decomposition
Slides prepared by - Dr. Chui Chun Kit, http://www.cs.hku.hk/~ckchui/ for students in COMP3278
For other uses, please email : [email protected]
Illustration 1
Relation R(A, B, C) with functional dependencies F = {B → C}

R
A B C
1 1 3
1 2 2
2 1 3
3 2 2
3 1 3
4 2 2
4 1 3

The functional dependency B → C tells us that for all tuples with the same value in B, there should be at most one corresponding value in C (e.g., if B = 1, C = 3; if B = 2, C = 2).

Question: Will decomposing R(A, B, C) into R1(A, B) and R2(A, C) cause information loss?

R1 = π_{A,B}(R)     R2 = π_{A,C}(R)
A B                 A C
1 1                 1 3
1 2                 1 2
2 1                 2 3
3 2                 3 2
3 1                 3 3
4 2                 4 2
4 1                 4 3

Think of it this way: is this decomposition a “lossless-join decomposition”?
I.e., is any information lost if we decompose R in this way?
Illustration 1
To check whether the decomposition causes information loss, let’s join R1 and R2 and see if we can recover R.

R1 ⋈ R2 = π_{A,B}(R) ⋈ π_{A,C}(R)
A B C
1 1 3
1 1 2
1 2 3
1 2 2
2 1 3
3 2 2
3 2 3
3 1 2
3 1 3
4 2 2
4 2 3
4 1 2
4 1 3

Since R1 ⋈ R2 ≠ R (the join contains tuples such as (1, 1, 2) that are not in R), the decomposition loses information.
This is NOT a lossless-join decomposition. It is a bad decomposition.
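The join check above can be reproduced mechanically. The following is a minimal Python sketch, not part of the original slides: the helper names project and join_on_a are illustrative assumptions, and R is stored as a set of (A, B, C) tuples.

```python
# Relation R(A, B, C) from Illustration 1, stored as a set of tuples.
R = {(1, 1, 3), (1, 2, 2), (2, 1, 3), (3, 2, 2), (3, 1, 3), (4, 2, 2), (4, 1, 3)}


def project(rows, idx):
    """Projection: keep the columns at positions idx and drop duplicates."""
    return {tuple(row[i] for i in idx) for row in rows}


def join_on_a(r1, r2):
    """Natural join of R1(A, B) and R2(A, C) on the shared attribute A."""
    return {(a, b, c) for (a, b) in r1 for (a2, c) in r2 if a == a2}


R1 = project(R, (0, 1))  # pi_{A,B}(R)
R2 = project(R, (0, 2))  # pi_{A,C}(R)
joined = join_on_a(R1, R2)

spurious = joined - R  # tuples produced by the join that were never in R
print(len(R), len(joined), len(spurious))
print(sorted(spurious))
```

Running it prints 7 13 6: the join has 13 tuples against 7 in R, and the 6 extra tuples (such as (1, 1, 2)) are the spurious ones that make the decomposition lossy.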
Illustration 2
How about decomposing the relation R(A, B, C) into R1(A, B) and R2(B, C)?
R(A, B, C), F = {B → C}

R1 = π_{A,B}(R)     R2 = π_{B,C}(R)
A B                 B C
1 1                 1 3
1 2                 2 2
2 1
3 2
3 1
4 2
4 1

R1 ⋈ R2 = π_{A,B}(R) ⋈ π_{B,C}(R)
A B C
1 1 3
1 2 2
2 1 3
3 2 2
3 1 3
4 2 2
4 1 3

R1 ⋈ R2 = R, so no information is lost.
What is/are the condition(s) for a decomposition to be lossless-join?
R(A, B, C), F = {B → C}
Decomposing R into R1(A, B) and R2(A, C): NOT a lossless-join decomposition.
Decomposing R into R1(A, B) and R2(B, C): a lossless-join decomposition.
Lossless-join decomposition
R(A, B, C), F = {B → C}, decomposed into R1 = π_{A,B}(R) and R2 = π_{A,C}(R) (NOT a lossless-join decomposition).

Let’s consider the first tuple (1, 1, 3) in R.
Note that there is only ONE tuple in R1 with A = 1, B = 1.
Since A → AC is NOT a functional dependency in F+, there can be more than one tuple with A = 1 in R2 (e.g., (1, 3), (1, 2)).
Therefore, when we join R1 and R2, more than one tuple is generated (i.e., (1, 1) in R1 combines with (1, 3) and (1, 2) in R2):

R1        R2        R1 ⋈ R2
A B       A C       A B C
1 1   ⋈   1 3   =   1 1 3
          1 2       1 1 2

Observation:
The decomposition of R(A, B, C) into R1(A, B) and R2(A, C) is NOT lossless-join because A → AC is NOT in F+, and … (to be explained in the next slide)
Lossless-join decomposition
R(A, B, C), F = {B → C}, decomposed into R1 = π_{A,B}(R) and R2 = π_{A,C}(R) (NOT a lossless-join decomposition).

Let’s consider the first tuple (1, 1, 3) in R.
Note that there is only ONE tuple in R2 with A = 1, C = 3.
Since A → AB is NOT a functional dependency in F+, there can be more than one tuple with A = 1 in R1 (i.e., (1, 1), (1, 2)).
Therefore, when we join R1 and R2, more than one tuple is generated (i.e., (1, 3) in R2 combines with (1, 1) and (1, 2) in R1):

R2        R1        R1 ⋈ R2
A C       A B       A B C
1 3   ⋈   1 1   =   1 1 3
          1 2       1 2 3

Observation:
The decomposition of R(A, B, C) into R1(A, B) and R2(A, C) is NOT lossless-join because A → AC (explained in the previous slide) and A → AB are NOT in F+.
Lossless-join decomposition
R(A, B, C), F = {B → C}, decomposed into R1 = π_{A,B}(R) and R2 = π_{B,C}(R) (a lossless-join decomposition).

R1 = π_{A,B}(R)     R2 = π_{B,C}(R)
A B                 B C
1 1                 1 3
1 2                 2 2
2 1
3 2
3 1
4 2
4 1

Let’s consider the first tuple (1, 1, 3) in R.
Note that there is only ONE tuple in R1 with A = 1, B = 1.
Since B → BC is a functional dependency in F+, there is only one tuple with B = 1 in R2.
Therefore, when we join R1 and R2, ONLY ONE tuple is generated, and it must be the corresponding tuple (1, 1, 3) in R:

R1        R2        R1 ⋈ R2
A B       B C       A B C
1 1   ⋈   1 3   =   1 1 3
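The observations above (A → AC and A → AB are not in F+, while B → BC is) match the standard textbook test for a two-way decomposition: it is lossless-join if the common attributes R1 ∩ R2 functionally determine all of R1 or all of R2. A minimal Python sketch of that test follows. The function names closure and is_lossless are illustrative assumptions, and FDs are modelled as (left, right) pairs of attribute sets.

```python
def closure(attrs, fds):
    """Attribute-set closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered and rhs adds something.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary decomposition test: (R1 ∩ R2)+ must cover R1 or R2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


F = [({"B"}, {"C"})]
print(is_lossless({"A", "B"}, {"A", "C"}, F))  # False: {A}+ = {A}
print(is_lossless({"A", "B"}, {"B", "C"}, F))  # True:  {B}+ = {B, C} covers R2
```

The sketch works on attribute sets only, so it answers the question for every instance of R that satisfies F, not just the sample table.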
Dependency preserving
Decomposition
Dependency preserving
When decomposing a relation, we also want to keep the functional dependencies.
An FD X → Y is preserved in a relation R if R contains all the attributes of X and Y.
If a dependency is lost when R is decomposed into R1 and R2:
When we insert a new record into R1 and R2, we have to compute R1 ⋈ R2 and check whether the new record violates the lost dependency before insertion.
This can be very inefficient because a join is required for every insertion!
Dependency preserving
Consider R(A, B, C, D), F = {A → B, B → CD}
F+ = {A → B, B → CD, A → CD, trivial FDs}
Note that A → CD is in F+ because of the Transitivity axiom.

R
A B C D
1 1 3 4
2 1 3 4
3 2 2 3
4 1 3 4

[Figure: new tuples (5, 1) and (B, C, D) = (1, 4, 4) being inserted into the decomposed relations]
Does inserting (1, 4, 4) violate any FD of F2 in R2?
This involves checking F2 = {B → CD}.
We can check F1 on R1 and F2 on R2 only, because (F1 ∪ F2)+ = F+.
Although neither of the two validations checks A → CD directly, A → B is checked in F1 and B → CD is checked in F2, so passing both F1 and F2 implies A → CD.
Dependency preserving
What about decomposing R into R1(A, B) and R2(A, C, D)?

R1 = π_{A,B}(R)     R2 = π_{A,C,D}(R)
A B                 A C D
1 1                 1 3 4
2 1                 2 3 4
3 2                 3 2 3
4 1                 4 3 4

F+ = {A → B, B → CD, A → CD, trivial FDs}
F1 = {A → B, trivial FDs}, the projection of F+ on R1
F2 = {A → CD, trivial FDs}, the projection of F+ on R2

This is NOT a dependency preserving decomposition, as (F1 ∪ F2)+ ≠ F+ (B → CD cannot be derived from F1 ∪ F2).
Let us illustrate the implication of NOT being dependency preserving in the next slide.
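The comparison of (F1 ∪ F2)+ with F+ can be sketched in Python without enumerating F+. The sketch below uses the usual textbook procedure: for each FD X → Y in F, repeatedly grow X using only what each Ri can check locally. The names closure, preserves and is_dependency_preserving are illustrative assumptions, and the first decomposition tested, R1(A, B) and R2(B, C, D), is assumed from the B C D header on the earlier slide.

```python
def closure(attrs, fds):
    """Attribute-set closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fd, schemas, fds):
    """True if lhs -> rhs follows from the projections of F+ onto the schemas."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in schemas:
            # Attributes of Ri determined by the part of result inside Ri.
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


def is_dependency_preserving(schemas, fds):
    return all(preserves(fd, schemas, fds) for fd in fds)


F = [({"A"}, {"B"}), ({"B"}, {"C", "D"})]
print(is_dependency_preserving([{"A", "B"}, {"B", "C", "D"}], F))  # True
print(is_dependency_preserving([{"A", "B"}, {"A", "C", "D"}], F))  # False
```

In the second call the FD B → CD is the one that fails: B appears only in R1, and nothing about C or D can be checked there.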
Boyce-Codd
Normal Form
FD and redundancy
Consider the following relation:
Customer( id, name, dptID )
F = { {id} → {name, dptID} }

Customer
id  name   dptID
1   Kit    1
2   David  1
3   Betty  2
4   Helen  2
How to check?
Check if the attribute set closure of {dptID} covers all
attributes in Customer. ({dptID}+ = {dptID, building} ≠ Customer)
R(A, B, C, D), F = {A → B, B → C}
To test whether R is in BCNF, it suffices to check only the dependencies in F (not F+).

An example R that satisfies F:
A B C D
1 1 1 1
1 1 1 2
1 1 1 3
1 1 1 4
1 1 1 5
Illustration
R(A, B, C, D), F = {A → B, B → C}, decomposed into R1(A, B) and R2(A, C, D).

R1(A, B)     R2(A, C, D)
A B          A C D
1 1          1 1 1
             1 1 2
             1 1 3
             1 1 4
             1 1 5

Is R2(A, C, D) in BCNF?
No FD in F falls entirely within R2, yet A → C is in F+ and A is not a superkey of R2, so R2 is not in BCNF.
Conclusion: When we test whether a decomposed relation is in BCNF, we must project F+ onto the relation (e.g., R2), not F!
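The conclusion can be sketched in Python: instead of materialising F+, compute the closure of every proper, non-empty attribute subset X of the decomposed relation under the original F and restrict it to that relation. If X determines something new inside the relation without determining all of it, X → (those attributes) is a BCNF violation. The names closure and bcnf_violations are illustrative assumptions, not from the slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute-set closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return (X, extra) pairs where X -> extra holds in schema but X is not a superkey."""
    violations = []
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            reach = closure(x, fds) & schema  # what X determines inside schema
            if reach != x and reach != schema:
                violations.append((x, reach - x))
    return violations


F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(bcnf_violations({"A", "B"}, F))       # no violations: A is a key of R1
print(bcnf_violations({"A", "C", "D"}, F))  # A -> C violates BCNF in R2
```

The subset enumeration is exponential in the number of attributes, which is acceptable for classroom-sized schemas like this one.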
Section 4
Normalization
Normalization goal
When we decompose a relation R with a set of functional dependencies F into R1, R2, …, Rn, we try to meet the following goals:
1. Lossless-join: avoid a decomposition that results in information loss.
Example 1
R1(B, C), R2(A, B)
Fx: B → C, A → B
Example 2
END