
Chapter 5B. Database Normalization
COMP3278 Introduction to Database Management Systems

Department of Computer Science, The University of Hong Kong


Slides prepared by - Dr. Chui Chun Kit, http://www.cs.hku.hk/~ckchui/ for students in COMP3278
For other uses, please email : [email protected]
In this chapter…
Outcome 1. Information Modeling
Able to understand the modeling of real-life information in a database system.

Outcome 2. Query Languages
Able to understand and use the languages designed for data access.

Outcome 3. System Design
Able to understand the design of an efficient and reliable database system.

Outcome 4. Application Development
Able to implement a practical application on a real database.
Content
Decomposition
  Lossless-join decomposition
  Dependency preserving decomposition
Normal form
  Boyce-Codd Normal Form (BCNF)
Motivating example
Let’s consider the following specifications:
Employees have eid (key), name, parkingLot.
Departments have did (key), dname, budget.
An employee works in exactly one department, since some date.
Employees who work in the same department must park at the same parkingLot.

[Figure: ER diagram with entity set Employees (eid, name, parkingLot), relationship set Works_in (since), and entity set Departments (did, dname, budget).]
Motivating example
Reduce to relational tables:
Employees(eid, name, parkingLot, did, since)
  Foreign key: did references Departments(did)
Departments(did, dname, budget)

Employees
eid  name    parkingLot  did  since
1    Kit     A           1    1/9/2014
2    Ben     B           2    2/4/2010
3    Ernest  B           2    30/5/2011
4    Betty   A           1    22/3/2013
5    David   A           1    4/11/2004
6    Joe     B           2    12/3/2008
7    Mary    B           2    14/7/2009
8    Wandy   A           1    9/8/2008

Departments
did  dname           budget
1    Human Resource  4M
2    Accounting      3.5M

Observation: In the Employees table, whenever did is 1, parkingLot must be “A”!
Implication: The constraint “Employees who work in the same department must park at the same parkingLot” is NOT utilized in the design!!!
There is some redundancy in the Employees table.
Yes! As parkingLot is “functionally dependent” on did, we should not put parkingLot in the Employees table.
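A functional dependency can be spot-checked against a table instance. The sketch below is a minimal illustration, assuming each relation is a Python list of dicts; the helper name `fd_holds` is hypothetical. Note that an instance can only refute an FD: did → parkingLot comes from the specification, and the data merely agrees with it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if all rows that agree on the `lhs` columns also agree on `rhs`."""
    first_seen = {}  # lhs value-tuple -> rhs value-tuple first observed with it
    for row in rows:
        key = tuple(row[col] for col in lhs)
        val = tuple(row[col] for col in rhs)
        if first_seen.setdefault(key, val) != val:
            return False  # two rows share lhs values but differ on rhs
    return True


# The Employees instance from the slide (since kept as a plain string).
employees = [
    {"eid": 1, "name": "Kit", "parkingLot": "A", "did": 1, "since": "1/9/2014"},
    {"eid": 2, "name": "Ben", "parkingLot": "B", "did": 2, "since": "2/4/2010"},
    {"eid": 3, "name": "Ernest", "parkingLot": "B", "did": 2, "since": "30/5/2011"},
    {"eid": 4, "name": "Betty", "parkingLot": "A", "did": 1, "since": "22/3/2013"},
    {"eid": 5, "name": "David", "parkingLot": "A", "did": 1, "since": "4/11/2004"},
    {"eid": 6, "name": "Joe", "parkingLot": "B", "did": 2, "since": "12/3/2008"},
    {"eid": 7, "name": "Mary", "parkingLot": "B", "did": 2, "since": "14/7/2009"},
    {"eid": 8, "name": "Wandy", "parkingLot": "A", "did": 1, "since": "9/8/2008"},
]

print(fd_holds(employees, ["did"], ["parkingLot"]))  # True
print(fd_holds(employees, ["parkingLot"], ["name"]))  # False
```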
We are going to learn
Database normalization
The process of organizing the columns and tables of a relational database to minimize redundancy and dependency.
To make sure that every relation R is in a “good” form.
If R is not “good”, decompose it into a set of relations {R1, R2, …, Rn}.
Question: How can we do the decomposition? Are there any guidelines / theories developed to decompose a relation?
Yes! The theories can be explained through functional dependencies ☺.
Normalization goal
We would like to meet the following goals when we decompose a relation schema R with a set of functional dependencies F into R1, R2, …, Rn:
1. Lossless-join – Avoid information loss as a result of the decomposition.
2. Reduce redundancy – The decomposed relations Ri should be in Boyce-Codd Normal Form (BCNF). (There are also other normal forms, such as 3NF.)
3. Dependency preserving – Avoid the need to join the decomposed relations to check the functional dependencies when new tuples are inserted into the database.
Section 1

Lossless-join Decomposition

Illustration 1
R(A, B, C)    Functional dependencies: F = {B → C}

R
A  B  C
1  1  3
1  2  2
2  1  3
3  2  2
3  1  3
4  2  2
4  1  3

The functional dependency B → C tells us that all tuples with the same value in B have at most one corresponding value in C (e.g., if B = 1, C = 3; if B = 2, C = 2).

Question: Will decomposing R(A,B,C) into R1(A,B) and R2(A,C) cause information loss?

Decompose:
R1 = π_{A,B}(R)    R2 = π_{A,C}(R)
A  B               A  C
1  1               1  3
1  2               1  2
2  1               2  3
3  2               3  2
3  1               3  3
4  2               4  2
4  1               4  3

Think of it this way: Is this decomposition a “lossless-join decomposition”? That is, is any information lost if we decompose R in this way?
Illustration 1
R1 ⋈ R2 = π_{A,B}(R) ⋈ π_{A,C}(R), where R, F = {B → C}, R1 and R2 are as on the previous slide:

A  B  C
1  1  3
1  1  2
1  2  3
1  2  2
2  1  3
3  2  2
3  2  3
3  1  2
3  1  3
4  2  2
4  2  3
4  1  2
4  1  3

To check whether the decomposition causes information loss, let’s join R1 and R2 and see if we can recover R.
The join produces 13 tuples instead of the 7 tuples of R. Spurious tuples such as (1,1,2) appear, so we can no longer tell which tuples were originally in R.
As we see that R1 ⋈ R2 ≠ R, the decomposition loses information.
This is NOT a lossless-join decomposition. This is a bad decomposition.
Illustration 2
How about decomposing the relation R(A,B,C) (with F = {B → C}, as before) into R1(A,B) and R2(B,C)?

Decompose:
R1 = π_{A,B}(R)    R2 = π_{B,C}(R)
A  B               B  C
1  1               1  3
1  2               2  2
2  1
3  2
3  1
4  2
4  1

R1 ⋈ R2 = π_{A,B}(R) ⋈ π_{B,C}(R)
A  B  C
1  1  3
1  2  2
2  1  3
3  2  2
3  1  3
4  2  2
4  1  3

Well done! Since R1 ⋈ R2 = R, breaking down R into R1 and R2 in this way loses no information.
This decomposition is a lossless-join decomposition.
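The join experiments in Illustrations 1 and 2 can be reproduced mechanically. A minimal sketch, assuming a relation is a pair (attribute tuple, set of value tuples); `project` and `natural_join` are hypothetical helper names, written naively with nested loops for clarity rather than speed.

```python
def project(attrs, rows, cols):
    """pi_cols(R): keep only the columns `cols`; duplicates vanish because we build a set."""
    idx = [attrs.index(c) for c in cols]
    return tuple(cols), {tuple(t[i] for i in idx) for t in rows}


def natural_join(left, right):
    """Combine every pair of tuples that agree on all common attributes."""
    (a1, t1), (a2, t2) = left, right
    common = [c for c in a1 if c in a2]
    extra = [c for c in a2 if c not in a1]  # right-hand columns to append
    out = set()
    for x in t1:
        for y in t2:
            if all(x[a1.index(c)] == y[a2.index(c)] for c in common):
                out.add(x + tuple(y[a2.index(c)] for c in extra))
    return a1 + tuple(extra), out


attrs = ("A", "B", "C")
R = {(1, 1, 3), (1, 2, 2), (2, 1, 3), (3, 2, 2), (3, 1, 3), (4, 2, 2), (4, 1, 3)}

# Illustration 1: R1(A,B), R2(A,C); Illustration 2: R1(A,B), R2(B,C).
_, bad_rows = natural_join(project(attrs, R, "AB"), project(attrs, R, "AC"))
_, good_rows = natural_join(project(attrs, R, "AB"), project(attrs, R, "BC"))

print(len(R), len(bad_rows))  # 7 13
print(sorted(bad_rows - R))   # [(1, 1, 2), (1, 2, 3), (3, 1, 2), (3, 2, 3), (4, 1, 2), (4, 2, 3)]
print(good_rows == R)         # True
```

The join of projections always contains R; a lossy decomposition shows up as extra (spurious) tuples, not missing ones.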
Lossless-join decomposition
R(A, B, C), F = {B → C}, as in the illustrations above.

NOT lossless-join decomposition: R1 = π_{A,B}(R), R2 = π_{A,C}(R)
Lossless-join decomposition:     R1 = π_{A,B}(R), R2 = π_{B,C}(R)

What is/are the condition(s) for a decomposition to be lossless-join?
Lossless-join decomposition
R(A, B, C), F = {B → C}; NOT lossless-join decomposition: R1 = π_{A,B}(R), R2 = π_{A,C}(R) (R, R1 and R2 as in Illustration 1).

Let’s consider the first tuple (1,1,3) in R.
Note that there is only ONE tuple in R1 with A=1, B=1.
Since A → AC is NOT a functional dependency in F+, there can be more than one tuple with A=1 in R2 (e.g., (1,3), (1,2)).
Therefore, when we join R1 and R2, more than one tuple will be generated (i.e., (1,1) in R1 combines with (1,3) and (1,2) in R2):
  (1,1) ⋈ {(1,3), (1,2)} = {(1,1,3), (1,1,2)}

Observation: The decomposition of R(A,B,C) into R1(A,B) and R2(A,C) is NOT lossless-join because A → AC is NOT in F+, and … (to be explained in the next slide)
Lossless-join decomposition
R(A, B, C), F = {B → C}; NOT lossless-join decomposition: R1 = π_{A,B}(R), R2 = π_{A,C}(R) (R, R1 and R2 as in Illustration 1).

Let’s consider the first tuple (1,1,3) in R.
Note that there is only ONE tuple in R2 with A=1, C=3.
Since A → AB is NOT a functional dependency in F+, there can be more than one tuple with A=1 in R1 (i.e., (1,1), (1,2)).
Therefore, when we join R1 and R2, more than one tuple will be generated (i.e., (1,3) in R2 combines with (1,1) and (1,2) in R1):
  {(1,1), (1,2)} ⋈ (1,3) = {(1,1,3), (1,2,3)}

Observation: The decomposition of R(A,B,C) into R1(A,B) and R2(A,C) is NOT lossless-join because A → AC (explained in the previous slide) and A → AB are NOT in F+.
Lossless-join decomposition
R(A, B, C), F = {B → C}; lossless-join decomposition: R1 = π_{A,B}(R), R2 = π_{B,C}(R) (R, R1 and R2 as in Illustration 2).

Let’s consider the first tuple (1,1,3) in R.
Note that there is only ONE tuple in R1 with A=1, B=1.
Since B → BC is a functional dependency in F+, there is only one tuple with B=1 in R2.
Therefore, when we join R1 and R2, ONLY ONE tuple is generated, and it must be the corresponding tuple (1,1,3) in R:
  (1,1) ⋈ (1,3) = (1,1,3)

Observation: The decomposition of R(A,B,C) into R1(A,B) and R2(B,C) is lossless-join because B → BC is in F+.
Testing for lossless-join decomposition
Consider a decomposition of R into R1 and R2.
Schema of R = schema of R1 ∪ schema of R2.
Let schema of R1 ∩ schema of R2 be R1 and R2’s common attributes.
A decomposition of R into R1 and R2 is lossless-join if and only if at least one of the following dependencies is in F+:
  Schema of R1 ∩ schema of R2 → schema of R1
  OR
  Schema of R1 ∩ schema of R2 → schema of R2
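The test only needs attribute-set closures, because X → Y is in F+ exactly when Y ⊆ X+. A minimal sketch, assuming FDs are given as (lhs, rhs) pairs of attribute sets; `closure` and `is_lossless_binary` are hypothetical helper names.

```python
def closure(attrs, fds):
    """Attribute-set closure attrs+ under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the lhs is already derivable, everything in the rhs is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


F = [("B", "C")]  # single-letter attributes, so a string doubles as an attribute set
print(is_lossless_binary("AB", "BC", F))  # True  (B+ = {B, C} covers R2)
print(is_lossless_binary("AB", "AC", F))  # False (A+ = {A} covers neither)
```

Checking membership in F+ through X+ avoids enumerating F+, which can be exponentially large.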
Example
Question: Given R(A,B,C), F = {B→C}, is the following a lossless-join decomposition of R?
R1(A, B), R2(B, C)
Answer: To see if (R1, R2) is a lossless-join decomposition of R, we do the following:
Find the common attributes of R1 and R2: B
Verify whether any of the FDs below holds in F+; if one of them holds, the decomposition is lossless-join.
  B → R1 (i.e., B → AB?)
  B → R2 (i.e., B → BC?)
Since B → BC holds (by the Augmentation rule on B → C, augmenting with B), R1 and R2 form a lossless-join decomposition of R.
Section 2

Dependency preserving Decomposition

Dependency preserving
When decomposing a relation, we also want to keep the functional dependencies.
An FD X → Y is preserved in a relation R if R contains all the attributes of X and Y.
If a dependency is lost when R is decomposed into R1 and R2:
When we insert a new record into R1 and R2, we have to compute R1 ⋈ R2 and check whether the new record violates the lost dependency before insertion.
This could be very inefficient because a join is required on every insertion!
Dependency preserving
Consider R(A,B,C,D), F = {A → B, B → CD}
F+ = {A → B, B → CD, A → CD, trivial FDs}
Note that A → CD is in F+ because of the Transitivity axiom.

R
A  B  C  D
1  1  3  4
2  1  3  4
3  2  2  3
4  1  3  4

If R is decomposed into R1(A,B), R2(B,C,D):
R1 = π_{A,B}(R)    R2 = π_{B,C,D}(R)
A  B               B  C  D
1  1               1  3  4
2  1               2  2  3
3  2
4  1

F1 = {A → B, trivials}, the projection of F+ on R1
F2 = {B → CD, trivials}, the projection of F+ on R2
This is a dependency-preserving decomposition as:
(F1 ∪ F2)+ = F+
Let us illustrate the implication of dependency preservation in the next slide.
Dependency preserving
Consider R(A,B,C,D), F = {A → B, B → CD}, F+ = {A → B, B → CD, A → CD, trivial FDs}, decomposed into R1 = π_{A,B}(R) and R2 = π_{B,C,D}(R) (R, R1 and R2 as on the previous slide).

Is this a lossless-join decomposition?
Yes! As B → R2 (i.e., B → BCD) holds in F+. That means we can recover R by R1 ⋈ R2.
Why is it dependency preserving? Think about it…
If we insert a new record (A,B,C,D) = (5,1,4,4), it is stored as (5,1) in R1 and (1,4,4) in R2:
  R1(A,B): 5 1      R2(B,C,D): 1 4 4
We need to check whether the new record will make the database violate any FDs in F+.
Does such a decomposition allow us to do the validation on R1 and R2 ONLY (with no need to join R1 and R2 to validate it)?
Dependency preserving
F+ = {A → B, B → CD, A → CD, trivials}
R
A  B  C  D
1  1  3  4
2  1  3  4
3  2  2  3
4  1  3  4
5  1  4  4
Inserting tuple (5,1,4,4) violates B → CD (the existing tuples with B = 1 have C = 3, D = 4).

Decomposed into R1 = π_{A,B}(R), R2 = π_{B,C,D}(R), after the insertion:
A  B          B  C  D
1  1          1  3  4
2  1          2  2  3
3  2          1  4  4
4  1
5  1

The decomposition is dependency preserving as we only need to check:
Does inserting (5,1) violate any FD of F1 in R1? This involves checking F1 = {A → B}.
Does inserting (1,4,4) violate any FD of F2 in R2? This involves checking F2 = {B → CD}. It does: R2 already contains (1,3,4), so the violation is detected within R2 alone.
Although among the two validations we haven’t checked A → CD, since A → B is checked in F1 and B → CD is checked in F2, passing both F1 and F2 implies A → CD.
We can check F1 on R1 and F2 on R2 only because (F1 ∪ F2)+ = F+.
Dependency preserving
What about decomposing R into R1(A,B), R2(A,C,D)?
R
A  B  C  D
1  1  3  4
2  1  3  4
3  2  2  3
4  1  3  4

F+ = {A → B, B → CD, A → CD, trivial FDs}
R is decomposed into R1(A,B), R2(A,C,D):
R1 = π_{A,B}(R)    R2 = π_{A,C,D}(R)
A  B               A  C  D
1  1               1  3  4
2  1               2  3  4
3  2               3  2  3
4  1               4  3  4

F1 = {A → B, trivials}, the projection of F+ on R1
F2 = {A → CD, trivials}, the projection of F+ on R2
This is NOT a dependency-preserving decomposition as:
(F1 ∪ F2)+ ≠ F+
Let us illustrate the implication of NOT preserving dependencies in the next slide.
Dependency preserving
What about decomposing R into R1(A,B), R2(A,C,D)? (R, F, F+, R1 and R2 as on the previous slide.)

Is this a lossless-join decomposition?
Yes! As A → R1 (i.e., A → AB) holds in F+. That means we can recover R by R1 ⋈ R2.
Is it dependency preserving? Think about it…
If we insert a new record (A,B,C,D) = (5,1,4,4), it is stored as (5,1) in R1 and (5,4,4) in R2:
  R1(A,B): 5 1      R2(A,C,D): 5 4 4
We need to check whether the new record will make the database violate any FDs in F+. Does such a decomposition allow us to do the validation on R1 and R2 only (with no need to join R1 and R2)?
Dependency preserving
F+ = {A → B, B → CD, A → CD}
R
A  B  C  D
1  1  3  4
2  1  3  4
3  2  2  3
4  1  3  4
5  1  4  4
Inserting tuple (5,1,4,4) violates B → CD.

Decomposed into R1 = π_{A,B}(R), R2 = π_{A,C,D}(R), after the insertion:
A  B          A  C  D
1  1          1  3  4
2  1          2  3  4
3  2          3  2  3
4  1          4  3  4
5  1          5  4  4

The decomposition is NOT dependency preserving, because suppose we only check:
Does inserting (5,1) violate any FD of F1 in R1? This involves checking F1 = {A → B}.
Does inserting (5,4,4) violate any FD of F2 in R2? This involves checking F2 = {A → CD}.
Although we passed F1 and F2, it doesn’t mean that we passed all FDs in F! This is because we lost the FD B → CD in the decomposition.
We CANNOT check F1 on R1 and F2 on R2 only because (F1 ∪ F2)+ ≠ F+.
Decomposing in this way requires joining tables to validate B → CD on EVERY INSERTION!
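The contrast between the two decompositions can be simulated on the slide instances. A minimal sketch, assuming relations are Python sets of tuples and attributes are single letters; `fd_holds` is a hypothetical helper. Each relation is checked only against its own projected FDs; the join is what the non-preserving decomposition would force on every insertion.

```python
def fd_holds(attrs, rows, lhs, rhs):
    """True if tuples that agree on lhs also agree on rhs (attrs gives column order)."""
    li = [attrs.index(a) for a in lhs]
    ri = [attrs.index(a) for a in rhs]
    first_seen = {}
    for t in rows:
        key = tuple(t[i] for i in li)
        val = tuple(t[i] for i in ri)
        if first_seen.setdefault(key, val) != val:
            return False
    return True


# Non-preserving decomposition R1(A,B), R2(A,C,D), after inserting (5,1,4,4).
R1 = {(1, 1), (2, 1), (3, 2), (4, 1), (5, 1)}
R2 = {(1, 3, 4), (2, 3, 4), (3, 2, 3), (4, 3, 4), (5, 4, 4)}

local_ok = fd_holds("AB", R1, "A", "B") and fd_holds("ACD", R2, "A", "CD")
joined = {(a, b, c, d) for (a, b) in R1 for (a2, c, d) in R2 if a == a2}
print(local_ok, fd_holds("ABCD", joined, "B", "CD"))  # True False

# Preserving decomposition R1(A,B), R2(B,C,D): the same insertion puts
# (1,4,4) into R2, and B -> CD already fails locally, with no join needed.
R2_bcd = {(1, 3, 4), (2, 2, 3), (1, 4, 4)}
print(fd_holds("BCD", R2_bcd, "B", "CD"))  # False
```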
Dependency preserving
What is the condition for a decomposition to be dependency preserving?
Let F be a set of functional dependencies on R,
R1, R2, …, Rn be a decomposition of R, and
Fi be the set of FDs in F+ that include only attributes in Ri.
A decomposition is dependency preserving if and only if
(F1 ∪ F2 ∪ … ∪ Fn)+ = F+
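Computing every Fi from F+ is exponential in general. A common alternative, the polynomial-time test found in standard database textbooks (swapped in here rather than taken from these slides), checks each FD of F directly: starting from its left-hand side, repeatedly add whatever can be derived inside each Ri, and see whether the right-hand side is reached. A minimal sketch, assuming single-letter attributes; `closure` and `preserves` are hypothetical helper names.

```python
def closure(attrs, fds):
    """Attribute-set closure attrs+ under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fds, schemas):
    """True if (F1 ∪ ... ∪ Fn)+ = F+, tested one FD of F at a time without building any Fi."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in map(set, schemas):
                # Attributes of Ri derivable (via F+) from what we already hold inside Ri.
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False  # lhs -> rhs cannot be enforced through the Fi alone
    return True


F = [("A", "B"), ("B", "C")]
print(preserves(F, ["AB", "BC"]))  # True  (Example 1)
print(preserves(F, ["AB", "AC"]))  # False (Example 2: B -> C is lost)
```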
Example 1
Given R(A, B, C), F = {A → B, B → C}
Is R1(A, B), R2(B, C) a dependency-preserving decomposition?
First we need to find F+, F1 and F2.
F+ = {A→B, B→C, A→C, some trivial FDs}
(Note that A→C is in F+ because of the Transitivity axiom.)
F1 = {A→B and trivial FDs}
F2 = {B→C and trivial FDs}
Then we check whether (F1 ∪ F2)+ = F+ is true.
Since F1 ∪ F2 contains every FD in F, (F1 ∪ F2)+ = F+.
This decomposition is dependency preserving.
Example 2
Given R(A, B, C), F = {A → B, B → C}
Is R1(A, B), R2(A, C) a dependency-preserving decomposition?
First we need to find F+, F1 and F2.
F+ = {A→B, B→C, A→C, some trivial FDs}
(Note that A→C is in F+ because of the Transitivity axiom.)
F1 = {A→B and trivial FDs}
F2 = {A→C and trivial FDs}
Then we check whether (F1 ∪ F2)+ = F+ is true.
Since B→C disappears in R1 and R2 (no relation contains both B and C, and B→C cannot be derived from A→B and A→C), (F1 ∪ F2)+ ≠ F+.
This decomposition is NOT dependency preserving.
Section 3

Boyce-Codd Normal Form

FD and redundancy
Consider the following relation:
Customer(id, name, dptID)
F = {{id} → {name, dptID}}

Customer
id  name   dptID
1   Kit    1
2   David  1
3   Betty  2
4   Helen  2

{id} is a key in Customer, because the attribute closure of {id} (i.e., {id}+ = {id, name, dptID}) covers all attributes of Customer.
Observation: The left-hand side of every non-trivial FD in F forms a key in the relation Customer.
This implies that there is no other FD that involves just a subset of the columns in the relation.
This implies that Customer has no redundancy.
FD and redundancy
As another example:
Customer(id, name, dptID, building)
F = {{id} → {name, dptID, building},
     {dptID} → {building}}

Customer
id  name   dptID  building
1   Kit    1      CYC
2   David  1      CYC
3   Betty  2      HW
4   Helen  2      HW

{dptID} → {building} brings redundancy. Why?
Tuples with the same dptID must have the same building (e.g., dptID=1, building=“CYC”).
But those tuples can have different values in id and name.
For each different id value with the same dptID, building will be repeated (redundancy). For example, for the tuples with (id=1, dptID=1) and (id=2, dptID=1), building must equal “CYC” (redundancy).
FD and redundancy
For the same Customer(id, name, dptID, building) and F = {{id} → {name, dptID, building}, {dptID} → {building}}:

How to check?
Check whether the attribute set closure of {dptID} covers all attributes in Customer ({dptID}+ = {dptID, building} ≠ Customer).

Redundancy is related to FDs. If there is a non-trivial FD α → β where {α}+ does not cover all attributes in R, then we will have redundancy in R!
Boyce-Codd Normal Form
Summarizing the observations, a relation R has no redundancy, i.e., is in Boyce-Codd Normal Form (BCNF), if the following is satisfied:
For all FDs in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
  α → β is trivial (i.e., β ⊆ α). We won’t bother with trivial FDs such as A→A, AB→A, etc.
  α is a key (superkey) for R, i.e., the attribute set closure of α, represented as {α}+, covers all attributes in R.
In other words, in BCNF, the left-hand side of every non-trivial FD forms a key.
How to test for BCNF?
Formally, to verify whether R is in BCNF:
For each non-trivial dependency α → β in F+ (the functional dependency closure), check whether α+ covers the whole relation (i.e., whether α is a superkey).
If any α+ does not cover the whole relation, R is not in BCNF.
Simplified test:
It suffices to check only the dependencies in the given F for violation of BCNF, rather than checking all dependencies in F+.
For example, given R(A,B,C); F = {A→B, B→C}, we only need to check whether both {A}+ and {B}+ cover {A,B,C}.
We do not need to derive F+ = {A→B, B→C, A→C, …} and check each FD, because A→C is already considered when computing {A}+.
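The simplified test reduces to one closure computation per FD in F. A minimal sketch, assuming single-letter attributes and FDs as (lhs, rhs) string pairs; `closure` and `bcnf_violations` are hypothetical helper names. R is in BCNF exactly when the returned list is empty.

```python
def closure(attrs, fds):
    """Attribute-set closure attrs+ under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Simplified BCNF test: non-trivial FDs of the given F whose lhs is not a superkey.
    Only valid when F is stated on `schema` itself, not on a decomposed relation."""
    schema = set(schema)
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD: never a violation
        if not schema <= closure(lhs, fds):
            bad.append((lhs, rhs))
    return bad


F = [("A", "B"), ("B", "C")]
print(bcnf_violations("ABC", F))   # [('B', 'C')]
print(bcnf_violations("ABCD", F))  # [('A', 'B'), ('B', 'C')]
```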
How to test for BCNF?
However, if we decompose R into R1 and R2, we cannot use only F to check whether the decomposed relations (i.e., R1 and R2) are in BCNF; we have to use F+ instead.
Illustration
R(A, B, C, D), F = {A → B, B → C}
An example R that satisfies F:
A  B  C  D
1  1  1  1
1  1  1  2
1  1  1  3
1  1  1  4
1  1  1  5
To test whether R is in BCNF, it suffices to check only the dependencies in F (but not F+).
Does {A}+ cover all of {A,B,C,D}?
Since {A}+ = {A,B,C} ≠ {A,B,C,D}, R is not in BCNF.
As illustrated by this instance, since {A}+ = {A,B,C} ≠ {A,B,C,D}, there will be redundancy when we have tuples with the same values across {A,B,C} but different values in D.
How to test for BCNF?
To illustrate why we cannot use only F to test decomposed relations for BCNF, let’s try to decompose R into R1(A, B) and R2(A, C, D).
Illustration
R(A, B, C, D), F = {A → B, B → C}, with the instance of R from the previous slide.
R1(A, B)    R2(A, C, D)
A  B        A  C  D
1  1        1  1  1
            1  1  2
            1  1  3
            1  1  4
            1  1  5
Is R2(A, C, D) in BCNF?
When we check R2, none of the FDs in F is contained in R2. Does this mean no non-trivial FDs hold on R2, and R2 is in BCNF?
No! We need to use F+ to verify whether R2 is in BCNF.
How to test for BCNF?
In R2(A, C, D), A→C is in F+, because A→C can be obtained by the transitivity rule on A→B and B→C.
There is a non-trivial FD A→C in R2 that we have missed!
Therefore in R2 we check {A}+ ∩ R2 = {A,C} ≠ {A,C,D}.
Thus, A is not a key in R2, and R2 is NOT in BCNF.
(R, R1(A, B) and R2(A, C, D) are the instances from the previous slide.)
Conclusion: When we test whether a decomposed relation is in BCNF, we must project F+ onto the relation (e.g., R2), not F!
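Projecting F+ onto a decomposed relation can be done by computing X+ for every subset X of the relation’s attributes and keeping only the attributes inside the relation. A minimal sketch, assuming single-letter attributes; `project_fds` and `is_bcnf` are hypothetical names, and the subset enumeration is exponential, so it only suits small textbook examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute-set closure attrs+ under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(schema, fds):
    """Non-trivial FDs of F+ restricted to `schema`: X -> (X+ ∩ schema) - X for each X."""
    attrs = sorted(set(schema))
    projected = []
    for k in range(1, len(attrs) + 1):
        for xs in combinations(attrs, k):
            x = set(xs)
            rhs = (closure(x, fds) & set(attrs)) - x
            if rhs:
                projected.append((frozenset(x), frozenset(rhs)))
    return projected


def is_bcnf(schema, fds):
    """BCNF test for a (possibly decomposed) relation, using the projection of F+."""
    s = set(schema)
    return all(s <= closure(lhs, fds) for lhs, _ in project_fds(schema, fds))


F = [("A", "B"), ("B", "C")]
naive = [(x, y) for x, y in F if set(x) | set(y) <= set("ACD")]
print(naive)  # []: no FD of F lies inside R2(A, C, D)
for lhs, rhs in project_fds("ACD", F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> C
# AD -> C
print(is_bcnf("ACD", F), is_bcnf("AB", F))  # False True
```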
Section 4

Normalization

Normalization goal
When we decompose a relation R with a set of functional dependencies F into R1, R2, …, Rn, we try to meet the following goals:
1. Lossless-join – Avoid information loss as a result of the decomposition.
2. No redundancy – The decomposed relations Ri should be in Boyce-Codd Normal Form (BCNF). (There are also other normal forms.)
3. Dependency preserving – Avoid the need to join the decomposed relations to check the functional dependencies.
Illustration
Consider R(A, B, C), F = {A→B, B→C}. Is R in BCNF? If not, decompose R into relations that are in BCNF.
R
A  B  C
1  1  2
2  1  2
3  1  2
4  1  2
Is R in BCNF? No, because {B}+ = {B,C} ≠ {A,B,C}.
Since {B}+ does not cover all attributes in R, R is NOT in BCNF.
Think of it this way: How should we decompose R such that the decomposition is always lossless-join?
Note: A decomposition is lossless-join if at least one of the following dependencies is in F+:
  Schema of R1 ∩ schema of R2 → schema of R1
  OR
  Schema of R1 ∩ schema of R2 → schema of R2
Illustration
Idea: To make the decomposition always lossless-join, we can pick the FD A→B and make the decomposed relations as follows:
R1(A,B) – the attributes in the L.H.S. and R.H.S. of the FD.
R2(A,C) – the attribute(s) in the L.H.S. of the FD, and the remaining attributes that do not appear in R1.
If we decompose the relation R in this way, the following must be true:
Schema of R1 ∩ schema of R2 → schema of R1
Schema of R1 ∩ schema of R2 is A.
A → R1 (i.e., A → AB) must be true because R1 consists of exactly the L.H.S. and R.H.S. of the FD A→B in F.
Illustration
Decomposition: R1(A, B) with FD A→B; R2(A, C) with FD A→C
F = {A→B, B→C}    F+ = {A→B, B→C, A→C, trivial FDs}

R              R1          R2
A  B  C        A  B        A  C
1  1  2        1  1        1  2
2  1  2        2  1        2  2
3  1  2        3  1        3  2
4  1  2        4  1        4  2

Is R1(A, B) in BCNF?
F1 = {A→B, trivial FDs}, the projection of F+ on R1.
Since {A}+ ∩ R1 = {A,B} = R1, {A} is a key in R1.
Since every non-trivial FD in F1 forms a key, R1 is in BCNF.
Is R2(A, C) in BCNF?
F2 = {A→C, trivial FDs}, the projection of F+ on R2.
Since {A}+ ∩ R2 = {A,C} = R2, {A} is a key in R2.
Since every non-trivial FD in F2 forms a key, R2 is in BCNF.
Therefore, decomposing R(A, B, C) with F = {A→B, B→C} into R1(A, B) and R2(A, C) results in a lossless-join decomposition (no information lost) and BCNF relations (no redundancy).
Illustration
Is the decomposition dependency preserving?
F = {A → B, B → C}
F1 ∪ F2 = {A → B, A → C} (ignoring trivial FDs)
Since B → C disappears in R1 and R2, (F1 ∪ F2)+ ≠ F+.
The decomposition is NOT dependency preserving.
Note: Although the decomposition is not dependency preserving, it is lossless-join, so we can join R1 and R2 to test B → C.
BCNF decomposition algorithm
result = {R};
done = false;
compute F+;
while (done == false) {
    if (there is a schema Ri in result and Ri is not in BCNF)
        let α → β be a non-trivial FD that holds on Ri s.t. {α}+ ⊉ Ri and α ∩ β = ∅;
        result = (result – Ri) ∪ (α, β) ∪ (Ri – β);
    else
        done = true;
}
{α}+ ⊉ Ri means α is not a key, so α → β causes Ri to violate BCNF.
The update of result: 1. delete Ri; 2. create a relation with only α and β; 3. create a relation containing Ri but with β removed.
When the loop ends, each Ri is in BCNF, and the decomposition is guaranteed to be lossless-join.
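A minimal Python sketch of the algorithm above, assuming FDs are (lhs, rhs) pairs of attribute sets. Instead of materializing F+, the hypothetical helper `find_violation` tries each proper subset α of Ri and takes β = (α+ ∩ Ri) − α, which finds a violating FD with α ∩ β = ∅ whenever one exists; this is exponential in |Ri| and only meant for small examples. Which violating FD is picked first (here: the smallest α in sorted order) is an arbitrary choice, and different choices can give different, equally valid BCNF decompositions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute-set closure attrs+ under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(schema, fds):
    """Return (alpha, beta) with alpha -> beta in F+, non-trivial on `schema` and
    alpha not a superkey; or None if `schema` is in BCNF."""
    s = set(schema)
    for k in range(1, len(s)):  # alpha = schema itself is always a superkey
        for xs in combinations(sorted(s), k):
            alpha = set(xs)
            alpha_plus = closure(alpha, fds)
            beta = (alpha_plus & s) - alpha  # disjoint from alpha by construction
            if beta and not s <= alpha_plus:
                return alpha, beta
    return None


def bcnf_decompose(schema, fds):
    result = [set(schema)]
    done = False
    while not done:
        for i, ri in enumerate(result):
            violation = find_violation(ri, fds)
            if violation is not None:
                alpha, beta = violation
                # 1. delete Ri  2. add (alpha ∪ beta)  3. add Ri - beta
                result = result[:i] + result[i + 1:] + [alpha | beta, ri - beta]
                break
        else:
            done = True  # no schema in result violates BCNF
    return result


print(["".join(sorted(r)) for r in bcnf_decompose("ABC", [("A", "B"), ("B", "C")])])  # ['BC', 'AB']

bank = ["b_name", "b_city", "assets", "c_name", "l_num", "amount"]
bank_F = [
    ({"b_name"}, {"assets", "b_city"}),
    ({"l_num"}, {"amount", "b_name"}),
    ({"l_num", "c_name"}, set(bank)),
]
for r in bcnf_decompose(bank, bank_F):
    print(sorted(r))
# ['assets', 'b_city', 'b_name']
# ['amount', 'b_name', 'l_num']
# ['c_name', 'l_num']
```

On R(A, B, C) this sketch happens to pick B → C first and so yields the alternative decomposition R1(B, C), R2(A, B) discussed in Example 1; on the bank relation it reproduces the R1, R3, R4 of Example 2.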
Example 1
Consider R(A, B, C), F = {A→B, B→C}; decompose R into relations that are in BCNF.
Alternative decomposition: To make the decomposition always lossless-join, we can pick the FD B→C and make the decomposed relations as follows:
R1(B,C) – the attributes in the L.H.S. and R.H.S. of the FD (FD B→C).
R2(A,B) – the attribute(s) in the L.H.S. of the FD, and the remaining attributes that do not appear in R1 (FD A→B).

R              R1          R2
A  B  C        B  C        A  B
1  1  2        1  2        1  1
2  1  2                    2  1
3  1  2                    3  1
4  1  2                    4  1
Example 1
F = {A→B, B→C}    F+ = {A→B, B→C, A→C, trivial FDs}
Decomposition: R1(B, C) with B→C, R2(A, B) with A→B (instances as on the previous slide)
Is R1(B, C) in BCNF?
F1 = {B→C, trivial FDs}, the projection of F+ on R1.
Since {B}+ = {B,C} = R1, {B} is a key in R1.
Since every non-trivial FD in F1 forms a key, R1 is in BCNF.
Is R2(A, B) in BCNF?
F2 = {A→B, trivial FDs}, the projection of F+ on R2.
Since {A}+ ∩ R2 = {A,B} = R2, {A} is a key in R2.
Since every non-trivial FD in F2 forms a key, R2 is in BCNF.
Example 1
Decomposition: R1(B, C) with B→C, R2(A, B) with A→B
Is the decomposition lossless-join?
From the earlier illustration, the decomposition must be lossless-join: R1 ∩ R2 = {B}, and B → BC (i.e., B → R1) is in F+.
Is the decomposition dependency preserving?
F = {A→B, B→C}
F1 ∪ F2 = {B → C, A → B} (ignoring trivial FDs)
Since F = F1 ∪ F2, this implies (F1 ∪ F2)+ = F+.
The decomposition is dependency preserving.
That means if we insert a new tuple that does not violate F1 in R1 and F2 in R2, it won’t violate F+ in R.
Example 2
Consider a relation R in a bank:
R(b_name, b_city, assets, c_name, l_num, amount)
F = {{b_name} → {assets, b_city},
     {l_num} → {amount, b_name},
     {l_num, c_name} → everything}
{b_name} → {assets, b_city}: each specific value of b_name corresponds to at most one {assets, b_city} value.
{l_num} → {amount, b_name}: each l_num corresponds to at most one {amount, b_name} value.
{l_num, c_name} → everything: each {l_num, c_name} corresponds to at most one {b_name, b_city, assets, amount} value.
Decomposition
With {b_name} → {assets, b_city}, {b_name}+ ≠ R, so R is not in BCNF.
Decompose R into R1(b_name, assets, b_city) and R2(b_name, c_name, l_num, amount).
Example 2
Is R1(b_name, assets, b_city) in BCNF?
F1 = {{b_name} → {assets, b_city}, trivial FDs} (projection of F+ on R1).
{b_name}+ = {b_name, assets, b_city} = R1, so {b_name} is a key in R1.
Since every non-trivial FD in F1 forms a key in R1, R1 is in BCNF.
Is R2(b_name, c_name, l_num, amount) in BCNF?
F2 = {{l_num} → {amount, b_name}, {l_num, c_name} → {all attributes}} (projection of F+ on R2).
{l_num}+ ∩ R2 = {l_num, amount, b_name} ≠ R2, so {l_num} is NOT a key in R2.
Since NOT every FD in F2 forms a key in R2, R2 is NOT in BCNF.
Example 2
Picking {l_num} → {amount, b_name}, R2 is further decomposed into:
R3(l_num, amount, b_name)
R4(c_name, l_num)
Is R3(l_num, amount, b_name) in BCNF?
F3 = {{l_num} → {amount, b_name}, trivial FDs}
{l_num}+ ∩ R3 = {l_num, amount, b_name} = R3, so {l_num} is a key in R3.
Since every non-trivial FD in F3 forms a key in R3, R3 is in BCNF.
Example 2
Is R4(c_name, l_num) in BCNF?
F4 = {trivial FDs}
Since F4 contains only trivial FDs, R4 is in BCNF.
Now R1, R3 and R4 are all in BCNF.
The decomposition is also lossless-join.
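Because each step of the decomposition splits one relation into two, the binary lossless-join test can be applied to each split in turn: R into (R1, R2), then R2 into (R3, R4). If every split is lossless, joining R3 and R4 recovers R2, and joining that with R1 recovers R. A minimal sketch, assuming FDs as (lhs, rhs) pairs of Python sets; `closure` and `is_lossless_binary` are hypothetical helper names.

```python
def closure(attrs, fds):
    """Attribute-set closure attrs+ under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


everything = {"b_name", "b_city", "assets", "c_name", "l_num", "amount"}
F = [
    ({"b_name"}, {"assets", "b_city"}),
    ({"l_num"}, {"amount", "b_name"}),
    ({"l_num", "c_name"}, everything),
]
R1 = {"b_name", "assets", "b_city"}
R2 = {"b_name", "c_name", "l_num", "amount"}
R3 = {"l_num", "amount", "b_name"}
R4 = {"c_name", "l_num"}

print(is_lossless_binary(R1, R2, F))  # True: b_name -> R1
print(is_lossless_binary(R3, R4, F))  # True: l_num -> R3
```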
Example 2

The decomposition is also dependency preserving.
F1 = {{b_name} → {assets, b_city}, trivial FDs}
F3 = {{l_num} → {amount, b_name}, trivial FDs}
{l_num} → {b_name} … (i)
  by Decomposition of {l_num} → {amount, b_name}
{l_num} → {assets, b_city} … (ii)
  by Transitivity of (i) and {b_name} → {assets, b_city}
{l_num} → {b_name, assets, b_city, amount} by Union of F3 and (ii)
{l_num, c_name} → {l_num, c_name, b_name, assets, b_city, amount} by Augmentation
Therefore every FD in F is in (F1 ∪ F3 ∪ F4)+, which implies (F1 ∪ F3 ∪ F4)+ = F+.
The decomposition is dependency preserving.
BCNF doesn’t imply dependency preserving
It is not always possible to get a BCNF decomposition that is dependency preserving.
Consider R(A, B, C); F = {AB→C, C→B}
R
A  B  C
1  1  2
2  1  2
1  2  3
There are two candidate keys: {AB} and {AC}.
{AB}+ = {A,B,C} = R
{AC}+ = {A,B,C} = R
R is not in BCNF, since C is not a key (C→B is non-trivial, but {C}+ = {B,C} ≠ R).
Any decomposition of R into smaller relations must fail to preserve AB→C, since no smaller relation contains all of A, B and C.

R1(A, B), R2(B, C): not a lossless decomposition
A  B      B  C
1  1      1  2
2  1      2  3
1  2

R1(A, B), R2(A, C): not a lossless decomposition
A  B      A  C
1  1      1  2
2  1      2  2
1  2      1  3

R1(A, C), R2(B, C): lossless, with F1 = {Ø} and F2 = {C→B}, but not dependency preserving
A  C      B  C
1  2      1  2
2  2      2  3
1  3
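The three candidate decompositions can be checked with the lossless-join and dependency-preservation tests. A minimal sketch, assuming single-letter attributes; `closure`, `is_lossless_binary` and `preserves` are hypothetical helpers, and the preservation test is the standard textbook algorithm rather than one taken from these slides. It confirms that only R1(A, C), R2(B, C) is lossless, and that none of the three preserves AB → C.

```python
def closure(attrs, fds):
    """Attribute-set closure attrs+ under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


def preserves(fds, schemas):
    """True if every FD of F can be derived from the FDs projected onto the schemas."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in map(set, schemas):
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


F = [("AB", "C"), ("C", "B")]
for r1, r2 in [("AB", "BC"), ("AB", "AC"), ("AC", "BC")]:
    print(r1, r2, is_lossless_binary(r1, r2, F), preserves(F, [r1, r2]))
# AB BC False False
# AB AC False False
# AC BC True False
```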
Motivating example
Back to our motivating example, we have:
Employees(eid, name, parkingLot, did, since)
Departments(did, dname, budget)
“Employees who work in the same department must park at the same parkingLot” implies the following FD:
FD: did → parkingLot
Is Employees in BCNF?
{did}+ = {did, parkingLot} ≠ {eid, name, parkingLot, did, since}
Since did is not a key, Employees is NOT in BCNF.
Normalization
Employees(eid, name, parkingLot, did, since) is decomposed into
Employees2(eid, name, did, since)
Dept_Lots(did, parkingLot)
With Departments(did, dname, budget), the above two decomposed relations are further refined into
Employees2(eid, name, did, since)
Departments(did, dname, parkingLot, budget)
Good design: the parking lot of all employees in a department can be updated by changing that department’s parkingLot in a single tuple.
Summary
Relational database design goals:
Lossless-join
No redundancy (BCNF)
Dependency preservation
It is not always possible to satisfy all three goals.
A lossless-join, dependency-preserving decomposition into BCNF may not always be possible.
SQL does not provide a direct way of specifying FDs other than superkeys.
We can use assertions to check FDs, but this is quite expensive.
Chapter 5B.

END