Advance Database Systems - Lec 4

The document discusses the process of relational database design, focusing on normalization and its various forms, including 1NF, 2NF, 3NF, and BCNF. It explains the importance of functional dependencies and how they influence the design and decomposition of relations to eliminate anomalies. Examples are provided to illustrate the concepts of partial and transitive dependencies, as well as the implications of combining schemas.

Uploaded by

adil642799
Copyright
© © All Rights Reserved

Lecture 4

Relational Database Design

Normalization
Overall Database Design Process

 Given a schema R (Relational DB Schema)


 R can be generated when converting E-R diagram to
a set of tables
 R can be a single relation containing all attributes that
are of interest (called universal relation)
 Normalization breaks R into smaller relations to
remove anomalies
 R can be a result of some ad hoc design of relations,
which we then test/convert to normal form
ER Model and Normalization
 When an E-R diagram is carefully designed, identifying
all entities correctly, the tables generated from the E-R
diagram may not need further normalization
 However, in a real (imperfect) design, there can be
functional dependencies from non-key attributes of an
entity to other attributes of the entity
 Example: an employee entity with attributes
department_number and department_address, and
a functional dependency department_number → department_address
 Good design would have made department an entity
 Functional dependencies from non-key attributes of a
relationship set are possible, but rare; most relationships
are binary
The Banking Example
The Banking Schema
 branch = (branch_name, branch_city, assets)
 customer = (customer_id, customer_name, customer_street, customer_city)
 loan = (loan_number, amount)
 account = (account_number, balance)
 employee = (employee_id, employee_name, telephone_number, start_date)
 dependent_name = (employee_id, dname)
 account_branch = (account_number, branch_name)
 loan_branch = (loan_number, branch_name)
 borrower = (customer_id, loan_number)
 depositor = (customer_id, account_number)
 cust_banker = (customer_id, employee_id, type)
 works_for = (worker_employee_id, manager_employee_id)
 payment = (loan_number, payment_number, payment_date,
payment_amount)
 savings_account = (account_number, interest_rate)
 checking_account = (account_number, overdraft_amount)
Combine Schemas?
 Suppose we combine borrower and loan to get
bor_loan = (customer_id, loan_number, amount )
 Result is possible repetition of information (L-100 in example
below)
A Combined Schema Without Repetition
 Consider combining loan_branch and loan
loan_amt_br = (loan_number, amount, branch_name)
 No repetition (as suggested by example below)
Goal — Devise a Theory for the Following
 Decide whether a particular relation R is in “good” form.
 In the case that a relation R is not in “good” form,
decompose it into a set of relations {R1, R2, ..., Rn} such
that
 each relation is in good form
 the decomposition is a lossless-join decomposition
 Our theory is based on:
 functional dependencies
 multivalued dependencies
Functional Dependencies
 Constraints on the set of legal relations
 Require that the value for a certain set of attributes
determines uniquely the value for another set of
attributes
 A functional dependency is a generalization of the notion
of a key
Functional Dependencies (Cont.)
 Let R be a relation schema, with
α ⊆ R and β ⊆ R
 The functional dependency
α → β
holds on R if and only if for any legal relation r(R), whenever
any two tuples t1 and t2 of r agree on the attributes α, they
also agree on the attributes β. That is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
 Example: Consider r(A, B) with the following instance of r.
A B
1 4
1 5
3 7
 On this instance, A → B does NOT hold, but B → A does hold.
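The definition can be checked mechanically on a given instance. The following is a minimal sketch, assuming a relation instance is represented as a list of Python dicts; the function name fd_holds is a hypothetical helper, not something from the lecture.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds on `rows`.

    rows: list of dicts (one dict per tuple)
    lhs, rhs: lists of attribute names
    """
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# The instance of r(A, B) from the example.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
a_to_b = fd_holds(r, ["A"], ["B"])  # False: A = 1 appears with B = 4 and B = 5
b_to_a = fd_holds(r, ["B"], ["A"])  # True: every B value appears with one A value
```

Note that such a check only shows that a dependency holds on one instance; whether it holds on the schema is a statement about all legal instances.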
Functional Dependencies (Cont.)
 K is a superkey for relation schema R if and only if K → R
 K is a candidate key for R if and only if
 K → R, and
 for no α ⊂ K, α → R
 Functional dependencies allow us to express constraints
that cannot be expressed using superkeys. Consider the
schema:
bor_loan = (customer_id, loan_number, amount ).
We expect this functional dependency to hold:
loan_number → amount
(and hence also [customer_id, loan_number] → amount),
even though loan_number is not a superkey, but would not expect the following to hold:
customer_id → amount
Use of Functional Dependencies
 We use functional dependencies to:
 test relations to see if they are legal under a given set of
functional dependencies.
 If a relation r is legal under a set F of functional
dependencies, we say that r satisfies F.
 specify constraints on the set of legal relations
We say that F holds on R if all legal relations on R
satisfy the set of functional dependencies F.
Functional Dependencies (Cont.)
 A functional dependency is trivial if it is satisfied by all
instances of a relation
 Example:
 customer_name, loan_number → customer_name
 customer_name → customer_name
 In general, α → β is trivial if β ⊆ α
Closure of a Set of Functional Dependencies
 Given a set F of functional dependencies, there are
certain other functional dependencies that are
logically implied by F.
 For example: If A → B and B → C, then we can
infer that A → C (this inference rule is called transitivity)
 The set of all functional dependencies logically
implied by F is the closure of F
 We denote the closure of F by F+.
 F+ is a superset of F
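Enumerating F+ directly is expensive, so a common way to test whether one dependency is implied by F is to compute the closure of its left-hand side attributes. This attribute-closure technique is not on the slides; the sketch below is an illustration under that assumption, and the names attribute_closure and implies are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Return all attributes functionally determined by `attrs` under `fds`.

    fds: list of (lhs, rhs) pairs, each side a list of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is logically implied by fds (i.e. it is in F+)."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [(["A"], ["B"]), (["B"], ["C"])]
a_to_c = implies(F, ["A"], ["C"])  # True, by transitivity
c_to_a = implies(F, ["C"], ["A"])  # False
```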
Dependencies
 Multivalued Attributes (or repeating groups): non-key
attributes, or groups of non-key attributes, whose values
are not uniquely identified, directly or indirectly, by
the value of the Primary Key (or a part of it); that is,
they are not functionally dependent on it.

STUDENT

Stud_ID Name Course_ID Units


101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Dependencies
 Partial Dependency – when a non-key attribute is
determined by a part, but not the whole, of a COMPOSITE
primary key.

Partial
Dependency
CUSTOMER

Cust_ID Name Order_ID


101 AT&T 1234
101 AT&T 156
125 Cisco 1250
Dependencies

 Transitive Dependency – when a non-key attribute
determines another non-key attribute.

Transitive
Dependency

EMPLOYEE

Emp_ID F_Name L_Name Dept_ID Dept_Name


111 Mary Jones 1 Acct
122 Sarah Smith 2 Mktg
Normal Forms
 Unnormalized – There are multivalued attributes or
repeating groups
 1 NF – No multivalued attributes or repeating groups.
 2 NF – 1 NF plus no partial dependencies
 3 NF – 2 NF plus no transitive dependencies
Normalization
 First normal form (1NF) sets the very basic rules for an
organized database:
 Eliminate duplicative columns from the same table: a row of data
cannot contain repeating groups of similar data (atomicity); and
 Each row of data must have a unique identifier (or Primary Key):
create separate tables for each group of related data and identify each
row with a unique column or set of columns (the primary key).

 A domain is atomic if its elements are considered to be
indivisible units
 A relational schema R is in first normal form if the
domains of all attributes of R are atomic
 Non-atomic values complicate storage and encourage
redundant (repeated) storage of data
 Example: Set of accounts stored with each customer,
and set of owners stored with each account
Change to First Normal Form
Unnormalized table:
SSN        UserName  Product1  Product2  MoreProducts  EmployerName  EmployerAddress
332345432  Amy       M                                 Google        1 California drive
666666666  Kevin     A         B         C,D           Facebook      22nd Street Sanfrancisco
919919919  Raj       D                                 Google        1 California drive

1. You can only have one value in a column.
2. You should not create multiple columns for a one-to-many relationship.

Table in first normal form:
SSN        UserName  EmployerName  EmployerAddress           Product
332345432  Amy       Google        1 California drive        M
666666666  Kevin     Facebook      22nd Street Sanfrancisco  A
666666666  Kevin     Facebook      22nd Street Sanfrancisco  B
666666666  Kevin     Facebook      22nd Street Sanfrancisco  C
666666666  Kevin     Facebook      22nd Street Sanfrancisco  D
919919919  Raj       Google        1 California drive        D
The column MoreProducts in the unnormalized table conflicts with the first rule, so it is
eliminated in the First Normal Form.

With SSN alone as the primary key, rows are no longer unique, which conflicts with the
requirement that each record be unique. Hence
we combine SSN and Product to form the primary key that uniquely identifies each row.
•Each table cell should contain a single value.
•Each record needs to be unique.
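The conversion to first normal form can be sketched in a few lines. This is a minimal illustration, assuming the unnormalized table is held as a list of dicts with empty cells stored as None; the function name to_first_normal_form is hypothetical.

```python
unnormalized = [
    {"SSN": "332345432", "UserName": "Amy", "Product1": "M", "Product2": None,
     "MoreProducts": None, "EmployerName": "Google",
     "EmployerAddress": "1 California drive"},
    {"SSN": "666666666", "UserName": "Kevin", "Product1": "A", "Product2": "B",
     "MoreProducts": "C,D", "EmployerName": "Facebook",
     "EmployerAddress": "22nd Street Sanfrancisco"},
    {"SSN": "919919919", "UserName": "Raj", "Product1": "D", "Product2": None,
     "MoreProducts": None, "EmployerName": "Google",
     "EmployerAddress": "1 California drive"},
]


def to_first_normal_form(rows):
    """Emit one row per (SSN, Product) so every cell holds a single value."""
    flat = []
    for row in rows:
        # Collect the repeating group: two product columns plus a comma list.
        cells = [row["Product1"], row["Product2"]]
        cells += (row["MoreProducts"] or "").split(",")
        for cell in cells:
            product = (cell or "").strip()
            if not product:
                continue  # skip empty cells
            flat.append({
                "SSN": row["SSN"],
                "UserName": row["UserName"],
                "EmployerName": row["EmployerName"],
                "EmployerAddress": row["EmployerAddress"],
                "Product": product,
            })
    return flat


first_nf = to_first_normal_form(unnormalized)  # 6 rows, keyed by (SSN, Product)
```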
Second Normal Form
 Second normal form (2NF) further addresses the concept
of removing duplicative data:
 Meet all the requirements of the first normal form.
 Remove partial dependencies
 Remove subsets of data that apply to multiple rows of
a table and place them in separate tables.
 Create relationships between these new tables and
their predecessors through the use of foreign keys.
2nd NF

Movie Category

Pirates of Caribbean Action

Clash of Titans Action

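Forgetting Sarah Marshall Romantic

• Rule 1 - Be in 1NF
• Rule 2 - No partial dependencies: every non-key column depends on the whole primary key (a single-column primary key satisfies this automatically)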
SSN UserName
332345432 Amy
666666666 Kevin
919919919 Raj

SSN EmployerName EmployerAddress


332345432 Google 1 California drive
666666666 Facebook 22nd Street Sanfrancisco

919919919 Google 1 California drive

SSN Product
332345432 M
666666666 A
666666666 B
666666666 C
666666666 D
919919919 D
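The three tables above are projections of the 1NF table onto smaller sets of columns, with duplicate rows dropped. A minimal sketch of that step, assuming rows are stored as dicts; the helper name project is hypothetical.

```python
def project(rows, attrs):
    """Relational projection: keep only `attrs` and drop duplicate rows."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


google = {"EmployerName": "Google", "EmployerAddress": "1 California drive"}
facebook = {"EmployerName": "Facebook", "EmployerAddress": "22nd Street Sanfrancisco"}

# The 1NF table, with primary key (SSN, Product).
first_nf = [
    {"SSN": "332345432", "UserName": "Amy", **google, "Product": "M"},
    {"SSN": "666666666", "UserName": "Kevin", **facebook, "Product": "A"},
    {"SSN": "666666666", "UserName": "Kevin", **facebook, "Product": "B"},
    {"SSN": "666666666", "UserName": "Kevin", **facebook, "Product": "C"},
    {"SSN": "666666666", "UserName": "Kevin", **facebook, "Product": "D"},
    {"SSN": "919919919", "UserName": "Raj", **google, "Product": "D"},
]

# UserName and the employer columns depend on SSN alone (a partial dependency),
# so they move into tables keyed by SSN.
users = project(first_nf, ["SSN", "UserName"])                             # 3 rows
user_employers = project(first_nf, ["SSN", "EmployerName", "EmployerAddress"])  # 3 rows
user_products = project(first_nf, ["SSN", "Product"])                      # 6 rows
```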
Third Normal Form
 Third normal form (3NF) goes one large step further:
 Meet all the requirements of the second normal form.
 Remove columns that are not directly dependent upon the
primary key.
 i.e. remove transitive dependencies
Third Normal Form

• In the 2nd NF, the EmployerAddress column depends on
EmployerName, which in turn depends on SSN.
• It is a transitive (indirect) dependency, which needs
to be removed in the third normal form.

SSN UserName
332345432 Amy
666666666 Kevin
919919919 Raj

SSN EmployerName
332345432 Google
666666666 Facebook
919919919 Google

EmployerName EmployerAddress
Google 1 California drive
Facebook 22nd Street Sanfrancisco

SSN Product
332345432 M
666666666 A
666666666 B
666666666 C
666666666 D
919919919 D
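A decomposition is only useful if it is lossless, that is, if joining the pieces gives back exactly the original rows. The sketch below checks this for the employer split; it is an illustration on this one instance, and the helper names project, natural_join and as_set are hypothetical.

```python
def project(rows, attrs):
    """Relational projection: keep only `attrs` and drop duplicate rows."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Combine every pair of rows that agree on all shared attributes."""
    joined = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                joined.append({**l, **r})
    return joined


def as_set(rows):
    """Order-independent view of a relation, for comparing two relations."""
    return {frozenset(row.items()) for row in rows}


# The 2NF table that still contains the transitive dependency.
before = [
    {"SSN": "332345432", "EmployerName": "Google",
     "EmployerAddress": "1 California drive"},
    {"SSN": "666666666", "EmployerName": "Facebook",
     "EmployerAddress": "22nd Street Sanfrancisco"},
    {"SSN": "919919919", "EmployerName": "Google",
     "EmployerAddress": "1 California drive"},
]

ssn_employer = project(before, ["SSN", "EmployerName"])          # 3 rows
employer = project(before, ["EmployerName", "EmployerAddress"])  # 2 rows
lossless = as_set(natural_join(ssn_employer, employer)) == as_set(before)  # True
```

The Google address is now stored once instead of twice, and the join still reproduces all three original rows.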
Real world Practice
• For most practical applications, once this is done, a
surrogate key (ID) is introduced to replace the
natural key as the primary key. The
database would then look like this.

UserId SSN UserName


1 332345432 Amy
2 666666666 Kevin
3 919919919 Raj
UserEmployerId UserId EmployerId
1 1 1
2 2 2
3 3 1
EmployerId EmployerName EmployerAddress
1 Google 1 California drive
2 Facebook 22nd Street Sanfrancisco
UserProductId UserId Product
1 1 M
2 2 A
3 2 B
4 2 C
5 2 D
6 3 D
Boyce-Codd Normal Form
 The Boyce-Codd Normal Form, also referred to as the "three and a
half (3.5) normal form", adds one more requirement:
 Meet all the requirements of the third normal form.
 Every determinant must be a candidate key.
 A relation schema R is in BCNF with respect to a set F of
functional dependencies if, for all functional dependencies in F+
of the form
α → β
where α ⊆ R and β ⊆ R, at least one of the following holds:
 α → β is trivial (i.e., β ⊆ α)
 α is a superkey for R
 Example schema not in BCNF:
bor_loan = ( customer_id, loan_number, amount )
because loan_number → amount holds on bor_loan but
loan_number is not a superkey
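The BCNF condition can be tested with an attribute-closure computation. The sketch below is an illustration only: it checks the dependencies listed in F rather than all of F+, which is sufficient for deciding whether R itself is in BCNF, and the function names are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Return all attributes functionally determined by `attrs` under `fds`."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def bcnf_violations(schema, fds):
    """Return the dependencies that are nontrivial and whose left side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        superkey = attribute_closure(lhs, fds) >= set(schema)
        if not trivial and not superkey:
            violations.append((lhs, rhs))
    return violations


F = [(["loan_number"], ["amount"])]

# bor_loan: loan_number does not determine customer_id, so it is not a superkey.
bor_loan_bad = bcnf_violations(["customer_id", "loan_number", "amount"], F)
# → [(["loan_number"], ["amount"])]

# loan: loan_number determines every attribute, so there is no violation.
loan_bad = bcnf_violations(["loan_number", "amount"], F)
# → []
```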
Boyce-Codd Normal Form
 When a relation has more than one candidate key,
anomalies may result even though the relation is in 3NF.
 3NF does not deal satisfactorily with the case of a relation
with overlapping candidate keys
 i.e. composite candidate keys with at least one attribute
in common.
 BCNF is based on the concept of a determinant.
 A determinant is any attribute (simple or composite) on
which some other attribute is fully functionally
dependent.
 A relation is in BCNF if, and only if, every determinant is a
candidate key.
Boyce-Codd Normal Form
Patient No Patient Name Appointment No Time Doctor

1 John 0 09:00 Zorro


2 Kerr 0 09:00 Killer
3 Adam 1 10:00 Zorro
4 Robert 0 13:00 Killer
5 Zane 1 14:00 Zorro

 DB(Patno,PatName,appNo,time,doctor)
 Determinants:
 Patno -> PatName
 Patno,appNo -> Time,doctor
 Time -> appNo
 Two options for 1NF primary key selection:
 DB(Patno,PatName,appNo,time,doctor) with primary key (Patno, appNo) (example 1a)
 DB(Patno,PatName,appNo,time,doctor) with primary key (Patno, time) (example 1b)
Boyce-Codd Normal Form
Patient No Patient Name Appointment Id Time Doctor

1 John 0 09:00 Zorro


2 Kerr 0 09:00 Killer
3 Adam 1 10:00 Zorro
4 Robert 0 13:00 Killer
5 Zane 1 14:00 Zorro

 DB(Patno,PatName,appNo,time,doctor)
 No repeating groups, so in 1NF
 2NF – eliminate partial key dependencies:
 DB(Patno,appNo,time,doctor)
 R1(Patno,PatName)
 3NF – no transitive dependencies, so in 3NF
 Now try BCNF.
Boyce-Codd Normal Form
Patient No Patient Name Appointment Id Time Doctor

1 John 0 09:00 Zorro


2 Kerr 0 09:00 Killer
3 Adam 1 10:00 Zorro
4 Robert 0 13:00 Killer
5 Zane 1 14:00 Zorro

 DB(Patno,appNo,time,doctor)
R1(Patno,PatName)
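 Is each determinant a candidate key?
 Patno -> PatName
Patno is present in DB, but PatName is not, so this dependency is irrelevant to DB.
 Patno,appNo -> Time,doctor
(Patno, appNo) is a candidate key of DB, so this dependency is fine.
 Time -> appNo
Time alone is not a candidate key of DB, so DB is not in BCNF.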
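The "is every determinant a candidate key?" test can be run mechanically on DB(Patno, appNo, time, doctor). The sketch below is a brute-force illustration that is only practical for small schemas; the function names are hypothetical, and the attribute is written as "time" throughout although the slides mix "Time" and "time".

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Return all attributes functionally determined by `attrs` under `fds`."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(schema, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if attribute_closure(combo, fds) >= set(schema):
                keys.append(combo)
    return keys


schema = ["Patno", "appNo", "time", "doctor"]
fds = [
    (["Patno", "appNo"], ["time", "doctor"]),
    (["time"], ["appNo"]),
]

keys = candidate_keys(schema, fds)
# → [("Patno", "appNo"), ("Patno", "time")]

# Determinants that are not candidate keys are the BCNF violations.
key_sets = [set(k) for k in keys]
non_key_determinants = [lhs for lhs, _ in fds if set(lhs) not in key_sets]
# → [["time"]]
```

The two candidate keys overlap on Patno, which is exactly the situation in which 3NF and BCNF can differ.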
Goals of Normalization
 Normalization guidelines are cumulative. For a database to be
in 2NF, it must first fulfill all the criteria of a 1NF database.
 Should I Normalize?
While database normalization is often a good idea, it's not an absolute
requirement. In fact, there are some cases where deliberately violating
the rules of normalization is a good practice.
 Let R be a relation scheme with a set F of functional
dependencies.
 Decide whether a relation scheme R is in “good” form.
 In the case that a relation scheme R is not in “good” form,
decompose it into a set of relation schemes {R1, R2, ..., Rn}
such that
 each relation scheme is in good form
 the decomposition is a lossless-join decomposition
 Preferably, the decomposition should be dependency preserving.
Part II
Some Normalization Examples
Example 1: Determine NF

 ISBN → Title
 ISBN → Publisher
 Publisher → Address

All attributes are directly or indirectly determined
by the primary key; therefore, the relation is
at least in 1 NF

BOOK

ISBN Title Publisher Address


Example 1: Determine NF

 ISBN → Title
 ISBN → Publisher
 Publisher → Address

The relation is at least in 1NF. There is no COMPOSITE
primary key, therefore there can’t be partial dependencies.
Therefore, the relation is at least in 2NF

BOOK

ISBN Title Publisher Address


Example 1: Determine NF

 ISBN → Title
 ISBN → Publisher
 Publisher → Address

Publisher is a non-key attribute, and it determines Address,
another non-key attribute. Therefore, there is a transitive
dependency, which means that the relation is NOT in 3 NF.

BOOK

ISBN Title Publisher Address


Example 1: Determine NF

 ISBN → Title
 ISBN → Publisher
 Publisher → Address

We know that the relation is at least in 2NF, and it is not in 3 NF.
Therefore, we conclude that the relation is in 2NF.

BOOK

ISBN Title Publisher Address


Example 1: Determine NF

 ISBN → Title
 ISBN → Publisher
 Publisher → Address

In your solution you will write the following justification:
1) No M/V attributes, therefore at least 1NF
2) No partial dependencies, therefore at least 2NF
3) There is a transitive dependency (Publisher → Address), therefore,
not 3NF
Conclusion: The relation is in 2NF

BOOK

ISBN Title Publisher Address
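The 3NF step of this justification can be sketched in code using the lecture's working definition of a transitive dependency (a non-key attribute determining another non-key attribute). The sketch inspects only the dependencies that are listed, and the function name transitive_dependencies is hypothetical.

```python
def transitive_dependencies(primary_key, fds):
    """Return the listed FDs in which non-key attributes determine non-key attributes."""
    pk = set(primary_key)
    found = []
    for lhs, rhs in fds:
        lhs_is_non_key = not (set(lhs) & pk)        # left side has no key attribute
        non_key_targets = set(rhs) - pk - set(lhs)  # non-key attributes it determines
        if lhs_is_non_key and non_key_targets:
            found.append((lhs, rhs))
    return found


book_fds = [
    (["ISBN"], ["Title"]),
    (["ISBN"], ["Publisher"]),
    (["Publisher"], ["Address"]),
]

book_transitive = transitive_dependencies(["ISBN"], book_fds)
# → [(["Publisher"], ["Address"])], so BOOK is not in 3NF
```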


Example 2: Determine NF

 Product_ID → Description

All attributes are directly or


indirectly determined by the
primary key; therefore, the relation
is at least in 1 NF

ORDER

Order_No Product_ID Description


Example 2: Determine NF

 Product_ID → Description
The relation is at least in 1NF.
There is a COMPOSITE Primary Key (PK) (Order_No,
Product_ID), therefore there can be partial
dependencies. Product_ID, which is a part of PK,
determines Description; hence, there is a partial
dependency. Therefore, the relation is not 2NF. No
sense to check for transitive dependencies!

ORDER

Order_No Product_ID Description


Example 2: Determine NF

 Product_ID → Description

We know that the relation is at least


in 1NF, and it is not in 2 NF.
Therefore, we conclude that the
relation is in 1 NF.

ORDER

Order_No Product_ID Description


Example 2: Determine NF

 Product_ID → Description
In your solution you will write the
following justification:
1) No M/V attributes, therefore at least 1NF
2) There is a partial dependency
(Product_ID → Description), therefore
not in 2NF
Conclusion: The relation is in 1NF

ORDER

Order_No Product_ID Description
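The partial-dependency test used in this example can be written as a short check. This is a sketch under the lecture's primary-key-based definition; it looks only at the dependencies that are listed, and the function name partial_dependencies is hypothetical.

```python
def partial_dependencies(primary_key, fds):
    """Return the listed FDs where part of the key determines a non-key attribute."""
    pk = set(primary_key)
    found = []
    for lhs, rhs in fds:
        part_of_key = set(lhs) < pk   # proper subset of the composite key
        non_key_targets = set(rhs) - pk
        if part_of_key and non_key_targets:
            found.append((lhs, rhs))
    return found


order_fds = [(["Product_ID"], ["Description"])]
order_partial = partial_dependencies(["Order_No", "Product_ID"], order_fds)
# → [(["Product_ID"], ["Description"])], so ORDER is not in 2NF

# With a single-attribute key, no proper non-empty part of the key exists.
book_fds = [(["ISBN"], ["Title"]), (["Publisher"], ["Address"])]
book_partial = partial_dependencies(["ISBN"], book_fds)
# → []
```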


Example 3: Determine NF

 Part_ID → Description
 Part_ID → Price
 Part_ID, Comp_ID → No

Comp_ID and No are not determined by the primary
key; therefore, the relation is NOT in 1 NF. No sense
in looking at partial or transitive dependencies.

PART

Part_ID Descr Price Comp_ID No


Example 3: Determine NF

 Part_ID → Description
 Part_ID → Price
 Part_ID, Comp_ID → No

In your solution you will write the following justification:
1) There are M/V attributes; therefore, not 1NF
Conclusion: The relation is not normalized.

PART

Part_ID Descr Price Comp_ID No


Bringing a Relation to 1NF

STUDENT

Stud_ID Name Course_ID Units


101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Bringing a Relation to 1NF
 Option 1: Make a determinant of the repeating group
(or the multivalued attribute) a part of the primary key.

Composite
Primary Key

STUDENT

Stud_ID Name Course_ID Units


101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Bringing a Relation to 1NF
 Option 2: Remove the entire repeating group from
the relation. Create another relation which would
contain all the attributes of the repeating group, plus
the primary key from the first relation. In this new
relation, the primary key from the original relation
and the determinant of the repeating group will
comprise a primary key.
STUDENT

Stud_ID Name Course_ID Units


101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Bringing a Relation to 1NF
STUDENT

Stud_ID Name
101 Lennon
125 Johnson

STUDENT_COURSE

Stud_ID Course Units


101 MSI 250 3
101 MSI 415 3
125 MSI 331 3
Bringing a Relation to 2NF

Composite
Primary Key

STUDENT

Stud_ID Name Course_ID Units


101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Bringing a Relation to 2NF
 Goal: Remove Partial Dependencies
Partial
Composite Dependencies
Primary Key

STUDENT

Stud_ID Name Course_ID Units


101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Bringing a Relation to 2NF
 Remove attributes that are dependent from the part
but not the whole of the primary key from the original
relation. For each partial dependency, create a new
relation, with the corresponding part of the primary
key from the original as the primary key.
STUDENT

Stud_ID Name Course_ID Units


101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Bringing a Relation to 2NF
STUDENT_COURSE

Stud_ID Course_ID
101 MSI 250
101 MSI 415
125 MSI 331

STUDENT

Stud_ID Name
101 Lennon
125 Johnson

COURSE

Course_ID Units
MSI 250 3.00
MSI 415 3.00
MSI 331 3.00
Bringing a Relation to 3NF

 Goal: Get rid of transitive dependencies.

Transitive
Dependency
EMPLOYEE

Emp_ID F_Name L_Name Dept_ID Dept_Name


111 Mary Jones 1 Acct
122 Sarah Smith 2 Mktg
Bringing a Relation to 3NF
 Remove the attributes, which are dependent on a
non-key attribute, from the original relation. For each
transitive dependency, create a new relation with the
non-key attribute which is a determinant in the
transitive dependency as a primary key, and the
dependent non-key attribute as a dependent.

EMPLOYEE

Emp_ID F_Name L_Name Dept_ID Dept_Name


111 Mary Jones 1 Acct
122 Sarah Smith 2 Mktg
Bringing a Relation to 3NF
EMPLOYEE

Emp_ID F_Name L_Name Dept_ID Dept_Name


111 Mary Jones 1 Acct
122 Sarah Smith 2 Mktg

EMPLOYEE

Emp_ID F_Name L_Name Dept_ID


111 Mary Jones 1
122 Sarah Smith 2

DEPARTMENT

Dept_ID Dept_Name
1 Acct
2 Mktg
A Complete Process Example


 Consider a typical invoice (Figure below).
 Every piece of information you see here is
important
 How can we capture this information in a database?
A Complete Process Example


 Those of us who have an ordered mind but aren't quite
aware of relational databases might try to use a
spreadsheet, such as Microsoft Excel
A Complete Process Example
 Take a look at rows 2, 3 and 4 on the spreadsheet in Figure
A-1. These represent all the data we have for a single
invoice (Invoice #125)
 First Normal Form (1NF) wants us to get rid of repeating
elements.
A Complete Process Example
 1NF addresses two issues:
 A row of data cannot contain repeating groups of similar data
(atomicity); and
 Each row of data must have a unique identifier (or Primary Key)
 To achieve the second condition, we move to an RDBMS
A Complete Process Example
 Second Normal Form - Test for partial dependencies on a
concatenated key
 If any column depends upon only one part of the concatenated key, then
we say that the entire table has failed Second Normal Form and we must
create another table to rectify the failure
A Complete Process Example
 Further refinement
A Complete Process Example
 Third Normal Form: No Dependencies on Non-Key Attributes
 If a customer places more than one order, we have to input all of that customer's contact
information again. This is because there are columns in the orders table that rely on "non-key
attributes".
 Consider the order_date column. Can it exist independent of the order_id column? No: an "order
date" is meaningless without an order. order_date is said to depend on a key attribute (order_id).
What about customer_name: can it exist on its own, outside of the orders table?
 Yes. It is meaningful to talk about a customer name without referring to an order or invoice. The
same goes for customer_address, customer_city, and customer_state. These four columns
actually rely on customer_id, which is not a key in this table (it is a non-key attribute).
A Complete Process Example
 Further refinement
Denormalization for Performance
 May want to use non-normalized schema for performance
 For example, displaying customer_name along with
account_number and balance requires join of account with
depositor
 Alternative 1: Use denormalized relation containing attributes of
account as well as depositor with all above attributes
 faster lookup
 extra space and extra execution time for updates
 extra coding work for programmer and possibility of error in
extra code
 Alternative 2: use a materialized view defined as
account ⋈ depositor
 Benefits and drawbacks same as above, except no extra
coding work for programmer and avoids possible errors
Other Design Issues
 Some aspects of database design are not caught by
normalization
 Examples of bad database design, to be avoided:
Instead of earnings (company_id, year, amount ), use
 earnings_2004, earnings_2005, earnings_2006, etc., all on
the schema (company_id, earnings).
 Above are in BCNF, but make querying across years
difficult and needs new table each year
 company_year(company_id, earnings_2004,
earnings_2005,
earnings_2006)
 Also in BCNF, but also makes querying across years
difficult and requires new attribute each year.
 Is an example of a crosstab, where values for one
attribute become column names
 Used in spreadsheets, and in data analysis tools
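Converting a crosstab back into the preferred earnings(company_id, year, amount) layout is a simple unpivot. The sketch below assumes the column-name pattern earnings_YYYY from the example; the function name unpivot and the sample values are hypothetical.

```python
def unpivot(rows, id_column, prefix):
    """Turn columns named <prefix><year> into one (id, year, amount) row each."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column.startswith(prefix):
                long_rows.append({
                    id_column: row[id_column],
                    "year": int(column[len(prefix):]),
                    "amount": value,
                })
    return long_rows


# Hypothetical crosstab row on the schema company_year(company_id, earnings_2004, ...).
company_year = [
    {"company_id": "C1", "earnings_2004": 10, "earnings_2005": 12, "earnings_2006": 15},
]

earnings = unpivot(company_year, "company_id", "earnings_")
# → three rows, one per year, on the schema (company_id, year, amount)

# A query across years is now a plain filter instead of a per-column lookup.
total = sum(r["amount"] for r in earnings if 2005 <= r["year"] <= 2006)  # 27
```

With this layout a new year adds rows rather than a new table or a new attribute.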
