Relational Model
CS4221: The Relational Model
Topics:
Basic concepts in Relational Model
o FD, transitive dependency, key, primary key, updating anomalies, properties of FDs
Normal Forms
o 1NF, 2NF, 3NF, BCNF; redundancy in NF relations
Decomposition Approach
o Universal Relation Assumption, problems of decomposition approach
Synthesizing Approach
o FD inference rules, closure of FDs, closure of attributes, FD membership test, criteria for
normalization, local/global redundancy, Bernstein’s Algorithm and its weak points
4NF
o MVDs, MVD inference rules, properties of FDs and MVDs, decomposition approach,
MVDs and hierarchical model
5NF and DKNF
o Will not be covered/examined due to time limit
We will show many important concepts that are commonly misunderstood,
and common errors.
First Normal Form (1NF) Relation
Defn: Given sets of atomic (i.e. non-decomposable) elements D1,
D2, …, Dn (not necessarily distinct), R is a first normal
form (1NF) relation on these n sets if it is a set of ordered
n-tuples < d1, d2, …, dn > such that
di ∈ Di ∀ i = 1, 2, ..., n. (Note: ∀ means “for all”)
Thus R ⊆ D1 x D2 x … x Dn
where x is the Cartesian product operator.
Note: A set has no duplicates. An n-tuple is ordered, which means that the order of
the n components of the tuple is important.
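The definition can be illustrated with a minimal Python sketch. The domains D1 and D2, their values, and the function name is_relation_on are hypothetical and chosen only for illustration; a relation is modelled as a set of ordered tuples and checked against the Cartesian product of its domains.

```python
from itertools import product

def is_relation_on(relation, domains):
    """True if every tuple of `relation` lies in D1 x D2 x ... x Dn."""
    cartesian = set(product(*domains))
    return set(relation) <= cartesian

# Hypothetical domains and relation, for illustration only.
D1 = {"s1", "s2"}
D2 = {10, 20, 30}
R = {("s1", 10), ("s2", 30)}

print(is_relation_on(R, [D1, D2]))             # components in domain order
print(is_relation_on({(10, "s1")}, [D1, D2]))  # same values, wrong order
```

Because tuples are ordered, swapping the components of a tuple produces a different tuple, which is why the second check fails.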
…
Defn: A set of attributes Y of R is said to be functionally
dependent (FD) on a set of attributes X of R if each X-value
in R has associated with it exactly one Y-value in R at any
time.
This is denoted by
X → Y
and is called a functional dependency of R.
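As a rough illustration, the following Python sketch checks whether an FD is satisfied by one relation instance. The function name fd_holds, the sample rows, and the Qty attribute are assumptions made for the example. Note that an FD is a constraint that must hold "at any time", so a single instance can refute an FD but cannot prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Check that each X-value is associated with exactly one Y-value
    in the given instance (a list of dict rows)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical instance of a supplier-part relation (values are made up).
SP = [
    {"S#": "s1", "Sname": "Tan", "P#": "p1", "Qty": 100},
    {"S#": "s1", "Sname": "Tan", "P#": "p2", "Qty": 200},
    {"S#": "s2", "Sname": "Lee", "P#": "p1", "Qty": 100},
]

print(fd_holds(SP, ["S#"], ["Sname"]))      # S# -> Sname
print(fd_holds(SP, ["S#"], ["Qty"]))        # S# -> Qty
print(fd_holds(SP, ["S#", "P#"], ["Qty"]))  # {S#, P#} -> Qty
```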
Defn: If there is more than one key for a relation, one of the
keys is designated as the primary key of the relation.
{S#, P#} is the only key of the relation SP. Prove it!
Relation SP is not in 2NF as Sname is not fully dependent on the key. Q: Why?
There is redundant information on Sname and Pname in SP.
Third Normal Form (3NF) Relation
Defn: Let A and B be two distinct sets of attributes (i.e. not
identical) of a relation R, and d be an attribute of R
which does not belong to A or B such that
A → B, B → d, and B ↛ A.
Then we say that d is transitively dependent on A under R,
and A → d is a transitive dependency.
Intuitive meaning: A transitive dependency can be derived from other FDs, so it
is redundant and can be removed.
E.g. R (A, B, C, D) with AB → CD and D → B
Decomposition & Synthesizing Method
- for Relational Database Design
Three common methods for relational database schema design are the
decomposition method, the synthesizing method, and the Entity-Relationship
Approach.
The decomposition method is based on the assumption that a database can
be represented by a universal relation which contains all the attributes of the
database (this is called the universal relation assumption) and this relation
is then decomposed into smaller relations in order to remove redundant data.
The synthesizing method is based on the assumption that a database can be
described by a given set of attributes and a given set of functional
dependencies, and 3NF or BCNF relations are then synthesized based on the
given set of dependencies.
Note: The synthesizing method also makes the universal relation assumption.
(1) F ⊆ F+
(2) Reflexivity: X → X ∈ F+ ∀ X ⊆ A
(3) Augmentation: ∀ X, Y, Z ⊆ A
If X → Z ∈ F+ then X ∪ Y → Z ∈ F+
(4) Pseudo-transitivity: ∀ X, Y, Z, W ⊆ A
if X → Y ∈ F+ , Y ∪ Z → W ∈ F+
then X ∪ Z → W ∈ F+
(5) No other FDs are in F+
Show AB → F ∈ G+.
Note: The numbers are used to identify the FDs.
Solution:
AB →[1] ABC →[2] ABCD →[4] ABCDE →[3] ABCDEF, which contains F
∴ AB → F ∈ G+
Detailed steps for proving AB → F ∈ G+
(1) Prove AB → ABC (using FD 1)
Since AB → AB (by projectivity)
AB → C (given)
so AB → ABC (by additivity)
(2) Prove ABC → ABCD (using FD 2)
Since C → D (given)
ABC → C (by projectivity)
so ABC → D (by transitivity)
Also ABC → ABC (by projectivity)
so ABC → ABCD (by additivity)
(3) Prove AB → ABCD (using FDs 1, 2)
From (1) we have AB → ABC
From (2) we have ABC → ABCD
so AB → ABCD (by transitivity)
(4) …
Note: The proof is too long. Any better way?
Alternative Solution to prove X → Y in G+
Defn: Given a set of attributes X, the closure of X relative to
G is defined as:
X+ = { y ∈ A | X → y ∈ G+ }
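The closure of an attribute set gives a short FD membership test: X → Y is in G+ exactly when Y ⊆ X+. Below is a minimal Python sketch of the usual fixed-point computation. The function names and the FD set G are assumptions made for illustration; only AB → C and C → D are taken from the proof steps shown earlier, and the other two FDs are hypothetical.

```python
def closure(attrs, fds):
    """Return X+ for the attribute set `attrs` under `fds`,
    where `fds` is a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already derived, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """FD membership test: is lhs -> rhs in the closure of fds?"""
    return set(rhs) <= closure(lhs, fds)

# Hypothetical FD set; D -> E and CE -> F are invented for this example.
G = [("AB", "C"), ("C", "D"), ("D", "E"), ("CE", "F")]

print(sorted(closure("AB", G)))
print(implies(G, "AB", "F"))
print(implies(G, "A", "F"))
```

Each pass over the FDs either adds an attribute or stops, so the loop runs at most as many times as there are attributes.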
Three Criteria for Normalization
(1) Reconstructibility (or losslessness).
If an original relation R is split into n relations R1, R2, …, Rn,
then Ri = R[Ai] (where [ ] is the projection operator)
and R1 ⋈ R2 ⋈ … ⋈ Rn = R
where Ai is the attribute set of Ri ∀ i = 1, 2, …, n
and ⋈ is the join operator
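The reconstructibility condition can be tried out on a concrete instance with the following Python sketch. The helper names (project, natural_join, is_lossless) and the sample data are assumptions for illustration. The check only tests one instance; the criterion itself must hold for every legal instance of R.

```python
def project(rows, attrs):
    """R[attrs]: projection of a list of dict rows, duplicates removed."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Natural join of two sets of rows (each row a frozenset of (attr, value))."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            merged = dict(t1)
            # Rows join only if they agree on every shared attribute.
            if all(merged.get(a, v) == v for a, v in t2):
                merged.update(t2)
                out.add(frozenset(merged.items()))
    return out

def is_lossless(rows, schemas):
    """True if joining the projections on `schemas` gives back exactly `rows`."""
    joined = project(rows, schemas[0])
    for attrs in schemas[1:]:
        joined = natural_join(joined, project(rows, attrs))
    return joined == {frozenset(row.items()) for row in rows}

# Hypothetical instances.
good = [{"C": "c1", "P": "p1", "N": "DB"}, {"C": "c1", "P": "p2", "N": "DB"}]
bad = [{"A": 1, "B": 1, "C": 1}, {"A": 2, "B": 1, "C": 2}]

print(is_lossless(good, [["C", "N"], ["C", "P"]]))
print(is_lossless(bad, [["A", "B"], ["B", "C"]]))
```

In the second case the join produces spurious tuples such as (A=1, B=1, C=2), so the original relation cannot be reconstructed.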
Algorithm
1. (Eliminate extraneous attributes). Let F be the given set of FDs
where the right side of each FD is a single attribute.
Eliminate extraneous attributes from the left side of each FD in F,
producing the set G.
2. (Finding covering). Find a non-redundant covering H of G.
3. (Partition). Partition H into groups such that all of the FDs in each
group have identical left sides.
4. (Merge equivalent keys). Let J = ∅.
For each pair of groups, say Hi and Hj with left sides X and Y resp.
If X and Y are properly equivalent, then
(a) merge Hi and Hj together
(b) add X → Y and Y → X to J
(c) if X → Z ∈ H and Z ∈ Y, then delete X → Z from H.
Similarly, if Y → Z ∈ H and Z ∈ X, then delete Y → Z from H.
5. (Eliminate transitive dependencies).
Find a minimal H' ⊆ H such that
(H' ∪ J)+ = (H ∪ J)+
Then add each FD of J into its corresponding group of H'.
6. (Construct relations)
Each group in H' forms a relation.
Each set of attributes that appears on the left side of any FD in the
group is a key of the relation formed by the group. They are called
explicit keys.
Note: Some of the constructed relations may have more than one key.
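Steps 1 to 3 of the algorithm can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not Bernstein's published pseudocode: the helper names and the input FD set F are hypothetical, steps 4 to 6 are omitted, and the result can depend on the order in which attributes and FDs are examined.

```python
def fd(lhs, rhs):
    """Build an FD with a set-valued left side and a single right attribute."""
    return (frozenset(lhs), rhs)

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

def remove_extraneous(fds):
    """Step 1: drop left-side attributes that are not needed."""
    out = []
    for lhs, rhs in fds:
        lhs = set(lhs)
        for a in sorted(lhs):
            # a is extraneous if the rest of the left side still determines rhs
            if len(lhs) > 1 and rhs in closure(lhs - {a}, fds):
                lhs.discard(a)
        out.append((frozenset(lhs), rhs))
    return out

def nonredundant_cover(fds):
    """Step 2: drop FDs that are implied by the remaining ones."""
    cover = list(dict.fromkeys(fds))  # remove duplicates, keep order
    for f in list(cover):
        rest = [g for g in cover if g != f]
        if f[1] in closure(f[0], rest):
            cover = rest
    return cover

def partition(fds):
    """Step 3: group FDs with identical left sides."""
    groups = {}
    for lhs, rhs in fds:
        groups.setdefault(lhs, set()).add(rhs)
    return groups

# Hypothetical input: B is extraneous in AB -> D, and A -> C is redundant.
F = [fd("A", "B"), fd("B", "C"), fd("A", "C"), fd("AB", "D")]
H = nonredundant_cover(remove_extraneous(F))
groups = partition(H)
print(groups)
```

For this input, the two groups have left sides A and B, which would lead to relations on (A, B, D) and (B, C) if no keys were equivalent.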
Step 3 (Partition)
H1 = { A → B }
H2 = { B → C, B → D }
H3 = { D → B }
H4 = { AE → F }
Step 4 (Merge groups)
B and D are properly equivalent
J = { B → D, D → B }
H1 = {A → B}
H2 = H2 ∪ H3 – {B → D, D → B}
= {B → C}
H4 = {AE → F}
Example 2 (need step 5)
Given F = {X1 X2 → AD, CD → X1 X2,
A X1 → B, B X2 → C, C → A}
Step 1. G = F
Step 2. H = G
Step 3. H1 = {X1 X2 → AD}
H2 = {CD → X1 X2}
H3 = {A X1 → B}
H4 = {B X2 → C}
H5 = {C → A}
Step 4. J = {X1 X2 → CD, CD → X1 X2}
H1 = H1 ∪ H2 – J
= {X1 X2 → A}
H3 = {A X1 → B}
H4 = {B X2 → C}
H5 = {C → A}
Step 5 (Eliminate TD)
We can eliminate X1 X2 → A
since X1 X2 → CD, C → A
and C ↛ X1 X2
(X1 X2 → C → A is a transitive dependency),
so we get
J = {X1 X2 → CD, CD → X1 X2}
H1 = ∅
H3 = {A X1 → B}
H4 = {B X2 → C}
H5 = {C → A}
Step 6 R1 (X1, X2, C,D) Note: 2 keys: {X1, X2} and {C, D}
R2 (A, X1, B)
R3 (B, X2, C)
R4 (C, A)
Note: If we omit step 5, then R1 will be
R1 (X1, X2, C, D, A)
which is not in 3NF. Why?
Some shortcomings of Bernstein’s algorithm
Shortcoming 1. Bernstein’s algorithm does not guarantee
reconstructibility (or losslessness).
Example 3. Given R (Course#, Preq#, Cname, Cdesc) with
F = {Course#, Preq# → Cname
Course# → Cname, Cdesc}
Step 1 G = {Course# → Cname, Cdesc}
Step 2 H=G
:
Step 6 R1 (Course#, Cname, Cdesc)
Note: We lose information about Preq#.
Q: How to resolve this problem?
In fact we have Course# ↠ Preq#
(Note. It is a multi-valued dependency, to be discussed later. Bernstein’s
algorithm does not handle MVDs).
We need another relation:
R2 (Course#, Preq#)
Shortcoming 2. Bernstein’s algorithm does not find all the keys.
Shortcoming 3. Bernstein’s algorithm does not remove all the
superfluous attributes (i.e. redundant attributes).
Example 5. Given F = { AD → B, B → C, C → D, AB → E,
AC → F }
Step 1 G=F
Step 2 H=G=F
:
Step 6 R1 (A, B, C, D, E, F)
R2 (B, C)
R3 (C, D)
Note: C is superfluous in R1, but R1 is in 3NF. However, D
is not superfluous. Remove C from R1 and get
R1 (A, B, D, E, F)
Note: The Ling, Tompa and Kameda method removes all superfluous attributes.
Shortcoming 4. The set of relations produced by the algorithm depends
on the non-redundant covering found.
Example 6. Given F = {AD → B, B → C, C → D, AB → E,
AC → F, AD → F, AC → E}
Case 1 If H = {AD → B, B → C, C → D, AB → E, AC → F}
Then the set of relations is
R1 (A, B, C, D, E, F)
R2 (B, C)
R3 (C, D)
Case 2 If H = {AD → B, B → C, C → D, AB → E, AD → F }
Then the set of relations is
R1 (A, B, D, E, F)
R2 (B, C)
R3 (C, D)
Case 3 If H = {AD → B, B → C, C → D, AC → F, AC → E}
Then we have
R1 (A, C, D, B, E, F)
R2 (B, C)
R3 (C, D)
Note that AB is a key but it is not found by the algorithm.
Case 4 If H = {AD → B, B → C, C → D, AC → E, AD → F }
Then we have
R1 (A, C, D, B, E, F)
R2 (B, C)
R3 (C, D)
Note that AB is a key but it is not found by the algorithm.
Note that Case 2 gives the best solution. What does this mean?
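The remark that AB is a key in Cases 3 and 4 can be checked by enumerating all keys of Example 6 by brute force. The Python sketch below is an illustration only: the function names are assumptions, the FDs are those of Example 6 as read from the slide, and the search is exponential in the number of attributes, so it is suitable only for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) attribute-string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    attributes = sorted(attributes)
    keys = []
    # Smaller sets are tried first, so any key found later that contains
    # an earlier key is not minimal and is skipped.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue
            if closure(candidate, fds) >= set(attributes):
                keys.append(candidate)
    return keys

F = [("AD", "B"), ("B", "C"), ("C", "D"), ("AB", "E"),
     ("AC", "F"), ("AD", "F"), ("AC", "E")]
keys = candidate_keys("ABCDEF", F)
print(keys)
```

The search reports three keys for R1 (A, B, C, D, E, F), namely AB, AC and AD, whereas each case above yields only the keys that appear as left sides in its own covering H.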
Shortcoming 5. A BCNF relation set may contain superfluous attributes,
i.e. redundant attributes which can be removed.
Note. Some relations generated by Step 6 may have more than
one key. We need to choose a primary key for each of them.
Why and how to choose?
Another way to view MVD:
Defn: Let R (A, B, C) be a relation and A, B, C be sets of
attributes of R, not necessarily disjoint.
Let B_ac = { b | (a, b, c) ∈ R } /* a and c are some A and C values */
The MVD A ↠ B is said to hold for R (A, B, C)
if and only if B_ac depends on a only,
i.e. B_ac = B_ac' for all a, c, c' values of attributes A and C,
whenever B_ac and B_ac' are both non-empty.
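This definition translates quite directly into a check on one relation instance. The Python sketch below is illustrative: the function name mvd_holds and the sample course data are assumptions, and for simplicity it takes C to be all attributes outside A and B (so the three sets are disjoint, which the definition does not require).

```python
from collections import defaultdict

def mvd_holds(rows, a_attrs, b_attrs):
    """Check A ->> B on one instance (a list of dict rows)."""
    if not rows:
        return True
    c_attrs = [x for x in rows[0] if x not in a_attrs and x not in b_attrs]
    b_sets = defaultdict(set)  # (a, c) -> the set B_ac
    for row in rows:
        a = tuple(row[x] for x in a_attrs)
        b = tuple(row[x] for x in b_attrs)
        c = tuple(row[x] for x in c_attrs)
        b_sets[(a, c)].add(b)
    # Only non-empty B_ac sets are stored, so compare them per a-value.
    first_for_a = {}
    for (a, c), bs in b_sets.items():
        if first_for_a.setdefault(a, bs) != bs:
            return False
    return True

# Hypothetical instance: every teacher of c1 uses every text of c1.
CTX = [
    {"Course": "c1", "Teacher": "t1", "Text": "b1"},
    {"Course": "c1", "Teacher": "t1", "Text": "b2"},
    {"Course": "c1", "Teacher": "t2", "Text": "b1"},
    {"Course": "c1", "Teacher": "t2", "Text": "b2"},
]

print(mvd_holds(CTX, ["Course"], ["Teacher"]))
print(mvd_holds(CTX[:3], ["Course"], ["Teacher"]))
```

Dropping the last row leaves text b2 with only one teacher, so the set of teachers no longer depends on the course alone.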
W ∩ Y = ∅
Note: These 5 rules plus the 3 rules of Armstrong’s Axioms for
FDs are sound and complete for FDs and MVDs.
Result: A 4NF relation is also in BCNF.
Theorem. X ↠ Y holds for relation R (X, Y, Z)
if and only if R is the join of its projections
R1 (X, Y) and R2 (X, Z).
Note: We call {R1, R2} a non-loss decomposition of R. R can be
reconstructed by joining R1 and R2.
Corollary. If a relation R is not in 4NF, then there is a non-loss
decomposition of R into a set of 4NF relations.
Note: However, it may not cover all the given FDs.
Below is a correct design:
[Figure: two correct designs; the recoverable labels are Employee, Child, Salary, and Year/Salary.]
More Properties of MVDs
Result: ∅ ↠ Y holds in R(Y, Z) iff R is the Cartesian product
of its projections R1(Y) and R2(Z). Prove it!
Q: What is the intuitive meaning of this MVD?
Result: If X ↠ Y and X ↠ Z
then, X ↠ YZ (multivalued union rule)
X ↠ Y ∩ Z (multivalued intersection rule)
X ↠ Y – Z (multivalued difference rule)
X ↠ Z – Y
Prove them!
Example. Let R(A, B, C, G, H, I) with the following set of
dependencies D = { A ↠ B, B ↠ HI, CG → H}
Q: Is A ↠ CGH ∈ D+ ?
(2) Prove A ↠ HI ∈ D+
Since A ↠ B and B ↠ HI
By the multivalued transitivity rule, we have
A ↠ HI – B
i.e. A ↠ HI ∈ D+
(3) Prove B → H ∈ D+
Since B ↠ HI
H ⊆ HI
CG → H
CG ∩ HI = ∅
By the coalescence rule, we have
B → H ∈ D+
(4) Prove A ↠ CG ∈ D+
By (1) we have A ↠ CGHI ∈ D+
By (2) we have A ↠ HI ∈ D+
By the difference rule, we have
A ↠ CGHI – HI ∈ D+
i.e. A ↠ CG ∈ D+
4NF Decomposition Algorithm (Korth’s book page 206)
Another method to find 4NF relations
1. Normalize the relation R into a set of 3NF and/or BCNF
relations based on the given set of FDs.
2. For each relation, if all attributes belong to the same key
and there exist non-trivial MVDs in the relation, then
decompose the relation into 2 smaller relations.
Example: Prove that if A ↠ BC and D → C, then A → C.
In order to prove A → C, we create 2 tuples in the relation with
the same A-value. Our objective is to prove that c1 = c2.
A B C D
a b1 c1 d1
a b2 c2 d2
Since A ↠ BC, we add 2 tuples:
a b2 c2 d1
a b1 c1 d2
Since D → C, and the 1st and 3rd tuples have the same D-value,
their C-values must be equal, i.e. c1 = c2.
So, we have proved that A → C.
Example: Prove that if A ↠ B and B ↠ C, then A ↠ C
in relation R(A, B, C, D).
In order to prove A ↠ C, we create 2 tuples with the same A-value
in a relation and then show the 2 tuples (a, b1, c2, d1)
and (a, b2, c1, d2) are in the relation.
A B C D
a b1 c1 d1
a b2 c2 d2
Since A ↠ B, we add 2 tuples.
A B C D
a b1 c1 d1
a b2 c2 d2
a b2 c1 d1
a b1 c2 d2
Since B ↠ C, we add 2 + 2 tuples.
A B C D
a b1 c1 d1
a b2 c2 d2
a b2 c1 d1
a b1 c2 d2
a b1 c2 d1
a b1 c1 d2
a b2 c1 d2
a b2 c2 d1
The 2 tuples (a, b1, c2, d1) and (a, b2, c1, d2)
are now in the relation. So we have proved that
A ↠ C.
Example (Counter example by chase).
Prove or disprove the statement:
If A ↠ BC and CD → B then A → B.
In order to prove or disprove A → B, we create 2 tuples with the same
A-value in a relation and find out whether we can conclude b1 = b2.
A B C D
a b1 c1 d1
a b2 c2 d2
Since A ↠ BC, add 2 tuples:
A B C D
a b1 c1 d1
a b2 c2 d2
a b2 c2 d1
a b1 c1 d2
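The chase used in the three examples above can be sketched in Python. This is a minimal illustration under stated assumptions, not an optimized or authoritative implementation: attribute names are single letters, FDs and MVDs are given as (lhs, rhs) pairs of attribute strings, and the function names chase, implies_fd and implies_mvd are hypothetical.

```python
def chase(attrs, fds, mvds, x):
    """Chase two rows that agree exactly on x; return (rows, t1, t2)."""
    pos = {a: i for i, a in enumerate(attrs)}
    t1 = tuple(a.lower() if a in x else a.lower() + "1" for a in attrs)
    t2 = tuple(a.lower() if a in x else a.lower() + "2" for a in attrs)
    rows = {t1, t2}

    def agree(r, s, cols):
        return all(r[pos[a]] == s[pos[a]] for a in cols)

    def step():
        """Apply one applicable rule; return False when nothing changes."""
        nonlocal rows, t1, t2
        for lhs, rhs in fds:   # FD rule: equate two symbols in one column
            for r in rows:
                for s in rows:
                    if not agree(r, s, lhs):
                        continue
                    for a in rhs:
                        i = pos[a]
                        if r[i] != s[i]:
                            keep, drop = min(r[i], s[i]), max(r[i], s[i])
                            def rename(row):
                                return tuple(keep if j == i and c == drop else c
                                             for j, c in enumerate(row))
                            rows = {rename(row) for row in rows}
                            t1, t2 = rename(t1), rename(t2)
                            return True
        for lhs, rhs in mvds:  # MVD rule: add the tuple with swapped values
            for r in rows:
                for s in rows:
                    if agree(r, s, lhs):
                        new = tuple(r[pos[a]] if a in lhs or a in rhs else s[pos[a]]
                                    for a in attrs)
                        if new not in rows:
                            rows = rows | {new}
                            return True
        return False

    while step():
        pass
    return rows, t1, t2

def implies_fd(attrs, fds, mvds, x, y):
    """Does x -> y follow? True if the two start rows end up equal on y."""
    _, t1, t2 = chase(attrs, fds, mvds, x)
    return all(t1[attrs.index(a)] == t2[attrs.index(a)] for a in y)

def implies_mvd(attrs, fds, mvds, x, y):
    """Does x ->> y follow? True if the swapped tuple is generated."""
    rows, t1, t2 = chase(attrs, fds, mvds, x)
    target = tuple(t1[i] if a in x or a in y else t2[i]
                   for i, a in enumerate(attrs))
    return target in rows

print(implies_fd("ABCD", [("D", "C")], [("A", "BC")], "A", "C"))
print(implies_mvd("ABCD", [], [("A", "B"), ("B", "C")], "A", "C"))
print(implies_fd("ABCD", [("CD", "B")], [("A", "BC")], "A", "B"))
```

The three calls correspond to the three examples: the first two statements are proved, and for the third the chase stops with b1 and b2 still distinct, so the final set of rows is a counterexample.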
Some other normal forms
• Fifth Normal Form (5NF), also called
Project-Join Normal Form (PJNF).
• Domain-Key Normal Form (DKNF)
• For your reading pleasure. They will not be
covered/examined.
Fifth Normal Form (Project-Join Normal Form)
(5NF, PJNF) (will not be covered/examined)
Defn: Let R be a relation and R1, …, Rn be a decomposition of
R. We say that R satisfies the join dependency *{ R1,
R2, …, Rn} iff
⋈ (i = 1 to n) Ri = R
( or R1 ⋈ R2 ⋈ … ⋈ Rn = R
or R1 * R2 * … * Rn = R )
Defn: A join dependency (JD) is trivial if one of the Ri is
R itself.
Note: When n = 2, the join dependency of the form
*{R1, R2} is equivalent to a multivalued dependency.
Defn: Let D, K, G be the set of domain constraints, the set
of key constraints, and the set of general constraints of a
relation R.
R is said to be in domain-key normal form (DKNF) if
D ∪ K logically implies G.
i.e. all constraints can be expressed by only domain
constraints and key constraints.
Example. Let Acct(acct#, balance) with acct# → balance
and a general constraint:
“ if the first digit of an account is 9,
then the balance of the account is 2500.”