Normalization in Databases

Uploaded by

Ramandah

Normalization
We discuss four normal forms: first, second, third, and
Boyce-Codd normal forms (1NF, 2NF, 3NF, and BCNF).

Normalization is a process that “improves” a database
design by generating relations that are of higher normal
forms.

The objective of normalization:
“to create relations where every dependency is on the key,
the whole key, and nothing but the key”.

Normalization
There is a sequence to normal forms:
1NF is considered the weakest,
2NF is stronger than 1NF,
3NF is stronger than 2NF, and
BCNF is considered the strongest

Also,
any relation that is in BCNF, is in 3NF;
any relation in 3NF is in 2NF; and
any relation in 2NF is in 1NF.

91.2914
Normalization

[Figure: diagram relating 1NF, 2NF, 3NF, and BCNF. A relation in
BCNF is also in 3NF; a relation in 3NF is also in 2NF; a relation
in 2NF is also in 1NF.]

Normalization
In this discussion, we consider a relation in BCNF to be fully
normalized.

The benefit of higher normal forms is that update semantics for
the affected data are simplified.

This means that the applications required to maintain the
database are simpler.

A design in a lower normal form than another design generally
carries more redundancy. Uncontrolled redundancy can lead to data
integrity problems.

May 2005
Consequences of redundancy
• Wasted space
• Potential performance cost
• Potential inconsistency
• Inability to represent data
Why Normalize Tables?
• Eliminates redundant data and saves retyping of repetitive data.
  Redundant data yields the following anomalies:
  • Update anomalies
  • Insertion (addition) anomalies
  • Deletion anomalies
• Increases flexibility to query, sort, summarize, and group data
(simpler to manipulate data!)
• Avoids frequent restructuring of tables and other objects to
accommodate new data
• Reduces disk space
• Ensures data dependencies make sense, i.e., data is logically stored.
The Normalization Process
• Each table represents a single subject
• No data item will be unnecessarily stored in more than
one table
• All attributes in a table are dependent on the primary key
The Normalization Process…
A Typical Spreadsheet File (unnormalized data)
Emp No  Employee Name    Time Card No  Time Card Date  Dept No  Dept Name
10      Thomas Arquette  106           11/02/2002      20       Marketing
10      Thomas Arquette  106           11/02/2002      20       Marketing
10      Thomas Arquette  106           11/02/2002      20       Marketing
10      Thomas Arquette  115           11/09/2002      20       Marketing
99      Janice Smitty                                  10       Accounting
500     Alan Cook        107           11/02/2002      50       Shipping
500     Alan Cook        107           11/02/2002      50       Shipping
700     Ernest Gold      108           11/02/2002      50       Shipping
700     Ernest Gold      116           11/09/2002      50       Shipping
700     Ernest Gold      116           11/09/2002      50       Shipping


Employee, Department, and Time Card Data
in Three (normalized) Tables

Table: Employees
EmpNo  EmpFirstName  EmpLastName  DeptNo
10     Thomas        Arquette     20
500    Alan          Cook         50
700    Ernest        Gold         50
99     Janice        Smitty       10

Table: Departments
DeptNo  DeptName
10      Accounting
20      Marketing
50      Shipping

Table: Time Card Data
TimeCardNo  EmpNo  TimeCardDate
106         10     11/02/2002
107         500    11/02/2002
108         700    11/02/2002
115         10     11/09/2002
116         700    11/09/2002

Primary Key: the first column of each table (EmpNo, DeptNo, TimeCardNo).
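The three-table design can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name TimeCards, the column types, and the REFERENCES clauses are not in the slides. The join rebuilds the spreadsheet view, once per time card and without the repeated rows.

```python
import sqlite3

# In-memory database; schema names beyond those on the slide are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Departments (DeptNo INTEGER PRIMARY KEY, DeptName TEXT NOT NULL);
CREATE TABLE Employees (
    EmpNo INTEGER PRIMARY KEY,
    EmpFirstName TEXT,
    EmpLastName TEXT,
    DeptNo INTEGER REFERENCES Departments(DeptNo));
CREATE TABLE TimeCards (
    TimeCardNo INTEGER PRIMARY KEY,
    EmpNo INTEGER REFERENCES Employees(EmpNo),
    TimeCardDate TEXT);
""")
conn.executemany("INSERT INTO Departments VALUES (?, ?)",
                 [(10, "Accounting"), (20, "Marketing"), (50, "Shipping")])
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)",
                 [(10, "Thomas", "Arquette", 20), (500, "Alan", "Cook", 50),
                  (700, "Ernest", "Gold", 50), (99, "Janice", "Smitty", 10)])
conn.executemany("INSERT INTO TimeCards VALUES (?, ?, ?)",
                 [(106, 10, "11/02/2002"), (107, 500, "11/02/2002"),
                  (108, 700, "11/02/2002"), (115, 10, "11/09/2002"),
                  (116, 700, "11/09/2002")])

# LEFT JOIN keeps employees (such as Janice Smitty) who have no time card yet.
flat_rows = conn.execute("""
SELECT e.EmpNo, e.EmpFirstName || ' ' || e.EmpLastName,
       t.TimeCardNo, t.TimeCardDate, d.DeptNo, d.DeptName
FROM Employees e
JOIN Departments d ON d.DeptNo = e.DeptNo
LEFT JOIN TimeCards t ON t.EmpNo = e.EmpNo
ORDER BY e.EmpNo, t.TimeCardNo
""").fetchall()

for row in flat_rows:
    print(row)
```

Each department name is now stored once, so renaming a department touches a single row in Departments.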


Types of Normalization
• First Normal Form (1NF)
– each field contains the smallest
meaningful value
– the table does not contain
repeating groups of fields or
repeating data within the same field
• Create a separate field/table for each set of related data.
• Identify each set of related data with a primary key
First Normal Form (1NF)
Student Table: Un-normalized table

Student  Age  Subject
Adam     15   Biology, Maths
Alex     14   Maths

Student Table: normalized table (1NF)

Student  Age  Subject
Adam     15   Biology
Adam     15   Maths
Alex     14   Maths
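The step from the un-normalized Student table to 1NF can be sketched in a few lines of Python. The function name to_first_normal_form and the assumption that values are comma-separated are illustrative choices, not part of the slides.

```python
def to_first_normal_form(rows, multi_valued_column):
    """Emit one row per atomic value of a comma-separated column."""
    result = []
    for row in rows:
        for value in row[multi_valued_column].split(","):
            atomic_row = dict(row)  # copy so the input rows stay unchanged
            atomic_row[multi_valued_column] = value.strip()
            result.append(atomic_row)
    return result


unnormalized = [
    {"Student": "Adam", "Age": 15, "Subject": "Biology, Maths"},
    {"Student": "Alex", "Age": 14, "Subject": "Maths"},
]
students_1nf = to_first_normal_form(unnormalized, "Subject")

for row in students_1nf:
    print(row)
```

Every Subject field now holds a single value, at the cost of repeating Adam's age, which is the redundancy 2NF addresses next.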
Tables Violating First Normal Form

PART (Primary Key)  WAREHOUSE
P0010               Warehouse A, Warehouse B, Warehouse C
P0020               Warehouse B, Warehouse D

Really Bad Set-up!

Better, but still flawed!

PART (Primary Key)  WAREHOUSE A  WAREHOUSE B  WAREHOUSE C
P0010               Yes          No           Yes
P0020               No           Yes          Yes


Table Conforming to First Normal Form

PART (Primary Key)  WAREHOUSE (Primary Key)  QUANTITY
P0010               Warehouse A              400
P0010               Warehouse B              543
P0010               Warehouse C              329
P0020               Warehouse B              200
P0020               Warehouse D              278


• Second Normal Form (2NF)
– usually relevant to tables with a multiple-field
primary key (composite key)
– each non-key field relates to the entire
primary key, not just part of it
– any field that does not relate to the entire
primary key is placed in a separate table
– MAIN POINT –
• eliminate redundant data in a table
• create separate tables for sets of values
that apply to multiple records
Student Table: Normalized table (in 1NF but not in 2NF)

Student  Age  Subject
Adam     15   Biology
Adam     15   Maths
Alex     14   Maths

The key is (Student, Subject), but Age depends on Student alone.

Student Table 1: Normalized table (2NF)

Student  Age
Adam     15
Alex     14

Student Table 2: Normalized table (2NF)

Student  Subject
Adam     Biology
Adam     Maths
Alex     Maths
Table Violating Second Normal Form

PART (Primary Key)  WAREHOUSE (Primary Key)  QUANTITY  WAREHOUSE ADDRESS
P0010               Warehouse A              400       1608 New Field Road
P0010               Warehouse B              543       4141 Greenway Drive
P0010               Warehouse C              329       171 Pine Lane
P0020               Warehouse B              200       4141 Greenway Drive
P0020               Warehouse D              278       800 Massey Street


Tables Conforming to Second
Normal Form
PART_STOCK TABLE
PART (Primary Key)  WAREHOUSE (Primary Key)  QUANTITY
P0010               Warehouse A              400
P0010               Warehouse B              543
P0010               Warehouse C              329
P0020               Warehouse B              200
P0020               Warehouse D              278

WAREHOUSE TABLE
WAREHOUSE (Primary Key)  WAREHOUSE_ADDRESS
Warehouse A              1608 New Field Road
Warehouse B              4141 Greenway Drive
Warehouse C              171 Pine Lane
Warehouse D              800 Massey Street
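The reason the address had to move can be checked mechanically: it is determined by WAREHOUSE alone, which is only part of the composite key. The helper below, with the hypothetical name determines, is a rough sketch. Sample rows can only refute a dependency, never prove it for all future data.

```python
def determines(rows, lhs, rhs):
    """True if rows that agree on the lhs columns always agree on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for a key and returns it later.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"PART": "P0010", "WAREHOUSE": "Warehouse A", "QUANTITY": 400, "ADDRESS": "1608 New Field Road"},
    {"PART": "P0010", "WAREHOUSE": "Warehouse B", "QUANTITY": 543, "ADDRESS": "4141 Greenway Drive"},
    {"PART": "P0010", "WAREHOUSE": "Warehouse C", "QUANTITY": 329, "ADDRESS": "171 Pine Lane"},
    {"PART": "P0020", "WAREHOUSE": "Warehouse B", "QUANTITY": 200, "ADDRESS": "4141 Greenway Drive"},
    {"PART": "P0020", "WAREHOUSE": "Warehouse D", "QUANTITY": 278, "ADDRESS": "800 Massey Street"},
]

# ADDRESS hangs off part of the key: a partial dependency, so 2NF is violated.
address_partial = determines(rows, ["WAREHOUSE"], ["ADDRESS"])
# QUANTITY needs the whole key (PART, WAREHOUSE).
quantity_partial = determines(rows, ["WAREHOUSE"], ["QUANTITY"])
quantity_full = determines(rows, ["PART", "WAREHOUSE"], ["QUANTITY"])

print(address_partial, quantity_partial, quantity_full)
```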
• Third Normal Form (3NF)
– usually discussed for tables with a single-
field primary key
– non-key fields do not depend on anything
other than the table's primary key
– each non-key field is a fact about the
key
– Values in a record that do not depend on that record's
key do not belong in the table. In general, any time
the contents of a group of fields may apply to more
than a single record in the table, consider placing
those fields in a separate table.
3NF - Example
Student_Detail Table (not in 3NF):
Student_id  Student_name  DOB  Street  City  State  Zip

The above table should be split into two tables:
• Student Details
• Address table
Table Violating Third Normal Form
EMPLOYEE_DEPARTMENT TABLE

EMPNO (Primary Key)  FIRSTNAME  LASTNAME  WORKDEPT  DEPTNAME
000290               John       Parker    E11       Operations
000320               Ramlal     Mehta     E21       Software Support
000310               Maude      Setright  E11       Operations


Tables Conforming to Third
Normal Form
EMPLOYEE TABLE

EMPNO (Primary Key)  FIRSTNAME  LASTNAME  WORKDEPT
000290               John       Parker    E11
000320               Ramlal     Mehta     E21
000310               Maude      Setright  E11

DEPARTMENT TABLE

DEPTNO (Primary Key)  DEPTNAME
E11                   Operations
E21                   Software Support
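The split into EMPLOYEE and DEPARTMENT is a pair of projections, and it loses nothing: joining the two tables back on the department code returns the original rows. A small sketch, where project is a hypothetical helper and rows are plain tuples:

```python
COLUMNS = ("EMPNO", "FIRSTNAME", "LASTNAME", "WORKDEPT", "DEPTNAME")

employee_department = [
    ("000290", "John", "Parker", "E11", "Operations"),
    ("000320", "Ramlal", "Mehta", "E21", "Software Support"),
    ("000310", "Maude", "Setright", "E11", "Operations"),
]


def project(rows, columns):
    """Keep only the named columns and drop duplicate tuples, preserving order."""
    indexes = [COLUMNS.index(c) for c in columns]
    return list(dict.fromkeys(tuple(row[i] for i in indexes) for row in rows))


employee = project(employee_department, ("EMPNO", "FIRSTNAME", "LASTNAME", "WORKDEPT"))
department = project(employee_department, ("WORKDEPT", "DEPTNAME"))

# Rejoin on the department code to confirm the decomposition is lossless here.
dept_name = dict(department)
rejoined = [emp + (dept_name[emp[3]],) for emp in employee]

print(department)
print(rejoined == employee_department)
```

DEPTNAME depended on WORKDEPT rather than on EMPNO, which is the transitive dependency that 3NF removes.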


Third Normal Form (3NF)…

The advantages of removing transitive dependency:
• The amount of data duplication is reduced.
• Data integrity is easier to maintain.
Typical example of how data is normalized

http://www.studytonight.com/dbms/database-normalization.php
Example 1
• Un-normalized Table:

Student#  Advisor#  Advisor      Adv-Room  Class1  Class2  Class3
1022      10        Susan Jones  412       101-07  143-01  159-02
4123      12        Anne Smith   216       101-07  159-02  214-01

• Table in First Normal Form
– No Repeating Fields
– Data in Smallest Parts

Student#  Advisor#  AdvisorFName  AdvisorLName  Adv-Room  Class#
1022      10        Susan         Jones         412       101-07
1022      10        Susan         Jones         412       143-01
1022      10        Susan         Jones         412       159-02
4123      12        Anne          Smith         216       101-07
4123      12        Anne          Smith         216       159-02
4123      12        Anne          Smith         216       214-01


• Tables in Second Normal Form
– Redundant Data Eliminated

Table: Students
Student#  Advisor#  AdvFirstName  AdvLastName  Adv-Room
1022      10        Susan         Jones        412
4123      12        Anne          Smith        216

Table: Registration
Student#  Class#
1022      101-07
1022      143-01
1022      159-02
4123      101-07
4123      159-02
4123      214-01
• Tables in Third Normal Form
– Data Not Dependent On Key is Eliminated

Table: Advisors
Advisor#  AdvFirstName  AdvLastName  Adv-Room
10        Susan         Jones        412
12        Anne          Smith        216

Table: Students
Student#  Advisor#  StudentFName  StudentLName
1022      10        Jane          Mayo
4123      12        Mark          Baker

Table: Registration
Student#  Class#
1022      101-07
1022      143-01
1022      159-02
4123      101-07
4123      159-02
4123      214-01
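Relationships for Example 1

Registration: Student#, Class#
Students:     Student#, Advisor#
Advisors:     Advisor#, AdvFirstName, AdvLastName, Adv-Room

Registration.Student# refers to Students.Student#;
Students.Advisor# refers to Advisors.Advisor#.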
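These relationships can be declared as foreign keys so the database itself rejects orphaned rows. The sketch below uses sqlite3 and makes several assumptions: "#" is spelled "No" in column names, room numbers are stored as text, and only a few registration rows are loaded. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Advisors (
    AdvisorNo INTEGER PRIMARY KEY,
    AdvFirstName TEXT, AdvLastName TEXT, AdvRoom TEXT);
CREATE TABLE Students (
    StudentNo INTEGER PRIMARY KEY,
    AdvisorNo INTEGER NOT NULL REFERENCES Advisors(AdvisorNo),
    StudentFName TEXT, StudentLName TEXT);
CREATE TABLE Registration (
    StudentNo INTEGER NOT NULL REFERENCES Students(StudentNo),
    ClassNo TEXT NOT NULL,
    PRIMARY KEY (StudentNo, ClassNo));
""")
conn.executemany("INSERT INTO Advisors VALUES (?, ?, ?, ?)",
                 [(10, "Susan", "Jones", "412"), (12, "Anne", "Smith", "216")])
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?)",
                 [(1022, 10, "Jane", "Mayo"), (4123, 12, "Mark", "Baker")])
conn.executemany("INSERT INTO Registration VALUES (?, ?)",
                 [(1022, "101-07"), (1022, "143-01"), (1022, "159-02"), (4123, "214-01")])

# A registration for a student who does not exist should be refused.
try:
    conn.execute("INSERT INTO Registration VALUES (9999, '101-07')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The advisor's room is reached through the relationships, not stored per student.
room = conn.execute("""
SELECT a.AdvRoom
FROM Students s JOIN Advisors a ON a.AdvisorNo = s.AdvisorNo
WHERE s.StudentNo = 4123
""").fetchone()[0]

print(orphan_rejected, room)
```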
Example 2
• Un-normalized Table:

EmpID   Name        Dept Code  Dept Name          Proj 1  Time Proj 1  Proj 2  Time Proj 2  Proj 3  Time Proj 3
EN1-26  Sean Breen  TW         Technical Writing  30-T3   25%          30-TC   40%          31-T3   30%
EN1-33  Amy Guya    TW         Technical Writing  30-T3   50%          30-TC   35%          31-T3   60%
EN1-36  Liz Roslyn  AC         Accounting         35-TC   90%

Table in First Normal Form
EmpID   Project Number  Time on Project  Last Name  First Name  Dept Code  Dept Name
EN1-26  30-T3           25%              Breen      Sean        TW         Technical Writing
EN1-26  30-TC           40%              Breen      Sean        TW         Technical Writing
EN1-26  31-T3           30%              Breen      Sean        TW         Technical Writing
EN1-33  30-T3           50%              Guya       Amy         TW         Technical Writing
EN1-33  30-TC           35%              Guya       Amy         TW         Technical Writing
EN1-33  31-T3           60%              Guya       Amy         TW         Technical Writing
EN1-36  35-TC           90%              Roslyn     Liz         AC         Accounting


Tables in Second Normal Form

Table: Employees and Projects
EmpID   Project Number  Time on Project
EN1-26  30-T3           25%
EN1-26  30-TC           40%
EN1-26  31-T3           30%
EN1-33  30-T3           50%
EN1-33  30-TC           35%
EN1-33  31-T3           60%
EN1-36  35-TC           90%

Table: Employees
EmpID   Last Name  First Name  Dept Code  Dept Name
EN1-26  Breen      Sean        TW         Technical Writing
EN1-33  Guya       Amy         TW         Technical Writing
EN1-36  Roslyn     Liz         AC         Accounting


Tables in Third Normal Form

Table: Employees_and_Projects
EmpID   Project Number  Time on Project
EN1-26  30-T3           25%
EN1-26  30-TC           40%
EN1-26  31-T3           30%
EN1-33  30-T3           50%
EN1-33  30-TC           35%
EN1-33  31-T3           60%
EN1-36  35-TC           90%

Table: Employees
EmpID   Last Name  First Name  Dept Code
EN1-26  Breen      Sean        TW
EN1-33  Guya       Amy         TW
EN1-36  Roslyn     Liz         AC

Table: Departments
Dept Code  Dept Name
TW         Technical Writing
AC         Accounting
Relationships for Example 2

Employees_and_Projects: EmpID, ProjectNumber, TimeonProject
Employees:              EmpID, FirstName, LastName, DeptCode
Departments:            DeptCode, DeptName
Example 3
• Un-normalized Table:

EmpID  Name           Manager    Dept         Sector  Spouse/Children
285    Carl Carlson   Smithers   Engineering  6G
365    Lenny          Smithers   Marketing    8G
458    Homer Simpson  Mr. Burns  Safety       7G      Marge, Bart, Lisa, Maggie
Table in First Normal Form
Fields contain smallest meaningful values

EmpID  FName  LName    Manager    Dept       Sector  Spouse  Child1  Child2  Child3
285    Carl   Carlson  Smithers   Eng.       6G
365    Lenny           Smithers   Marketing  8G
458    Homer  Simpson  Mr. Burns  Safety     7G      Marge   Bart    Lisa    Maggie

Table in First Normal Form
No more repeated fields

EmpID  FName  LName    Manager    Department   Sector  Dependent
285    Carl   Carlson  Smithers   Engineering  6G
365    Lenny           Smithers   Marketing    8G
458    Homer  Simpson  Mr. Burns  Safety       7G      Marge
458    Homer  Simpson  Mr. Burns  Safety       7G      Bart
458    Homer  Simpson  Mr. Burns  Safety       7G      Lisa
458    Homer  Simpson  Mr. Burns  Safety       7G      Maggie


Second/Third Normal Form
Remove Repeated Data From Table
Step 1

EmpID  FName  LName    Manager    Department   Sector
285    Carl   Carlson  Smithers   Engineering  6G
365    Lenny           Smithers   Marketing    8G
458    Homer  Simpson  Mr. Burns  Safety       7G

EmpID  Dependent
458    Marge
458    Bart
458    Lisa
458    Maggie
Tables in Second Normal Form
Removed Repeated Data From Table
Step 2

EmpID  FName  LName    ManagerID  Dept         Sector
285    Carl   Carlson  2          Engineering  6G
365    Lenny           2          Marketing    8G
458    Homer  Simpson  1          Safety       7G

EmpID  Dependent
458    Marge
458    Bart
458    Lisa
458    Maggie

ManagerID  Manager
1          Mr. Burns
2          Smithers
Tables in Third Normal Form

Employees Table
EmpID  FName  LName    DeptCode
285    Carl   Carlson  EN
365    Lenny           MK
458    Homer  Simpson  SF

Manager Table
ManagerID  Manager
1          Mr. Burns
2          Smithers

Dependents Table
EmpID  Dependent
458    Marge
458    Bart
458    Lisa
458    Maggie

Department Table
DeptCode  Department   Sector  ManagerID
EN        Engineering  6G      2
MK        Marketing    8G      2
SF        Safety       7G      1
Relationships for Example 3
Example 4
Table Violating 1st Normal Form
Rep ID  Representative    Client 1  Time 1  Client 2  Time 2  Client 3     Time 3
TS-89   Gilroy Gladstone  US Corp.  14 hrs  Taggarts  26 hrs  Kilroy Inc.  9 hrs
RK-56   Mary Mayhem       Italiana  67 hrs  Linkers   2 hrs

Table in 1st Normal Form

Rep ID  Rep First Name  Rep Last Name  Client ID*  Client       Time With Client
TS-89   Gilroy          Gladstone      978         US Corp      14 hrs
TS-89   Gilroy          Gladstone      665         Taggarts     26 hrs
TS-89   Gilroy          Gladstone      782         Kilroy Inc.  9 hrs
RK-56   Mary            Mayhem         221         Italiana     67 hrs
RK-56   Mary            Mayhem         982         Linkers      2 hrs
Tables in 2nd and 3rd Normal Form

Rep ID*  First Name  Last Name
TS-89    Gilroy      Gladstone
RK-56    Mary        Mayhem

Rep ID*  Client ID*  Time With Client
TS-89    978         14 hrs
TS-89    665         26 hrs
TS-89    782         9 hrs
RK-56    221         67 hrs
RK-56    982         2 hrs
RK-56    665         4 hrs

Client ID*  Client Name
978         US Corp
665         Taggarts
782         Kilroy Inc.
221         Italiana
982         Linkers
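The middle table is a junction table: its key is the pair (Rep ID, Client ID), which lets one client be served by several representatives. A rough sqlite3 sketch follows. The table names Reps, Clients, and RepClients are assumptions, and "14 hrs" is stored as the integer 14 so it can be summed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reps (RepID TEXT PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Clients (ClientID INTEGER PRIMARY KEY, ClientName TEXT);
CREATE TABLE RepClients (
    RepID TEXT REFERENCES Reps(RepID),
    ClientID INTEGER REFERENCES Clients(ClientID),
    Hours INTEGER,
    PRIMARY KEY (RepID, ClientID));
""")
conn.executemany("INSERT INTO Reps VALUES (?, ?, ?)",
                 [("TS-89", "Gilroy", "Gladstone"), ("RK-56", "Mary", "Mayhem")])
conn.executemany("INSERT INTO Clients VALUES (?, ?)",
                 [(978, "US Corp"), (665, "Taggarts"), (782, "Kilroy Inc."),
                  (221, "Italiana"), (982, "Linkers")])
conn.executemany("INSERT INTO RepClients VALUES (?, ?, ?)",
                 [("TS-89", 978, 14), ("TS-89", 665, 26), ("TS-89", 782, 9),
                  ("RK-56", 221, 67), ("RK-56", 982, 2), ("RK-56", 665, 4)])

# Total hours per representative.
hours_per_rep = conn.execute(
    "SELECT RepID, SUM(Hours) FROM RepClients GROUP BY RepID ORDER BY RepID"
).fetchall()

# Clients that appear under more than one representative.
shared_clients = conn.execute("""
SELECT c.ClientName, COUNT(*)
FROM RepClients rc JOIN Clients c ON c.ClientID = rc.ClientID
GROUP BY c.ClientID, c.ClientName
HAVING COUNT(*) > 1
""").fetchall()

print(hours_per_rep)
print(shared_clients)
```

With Client 1, Client 2, Client 3 columns, the same questions would need one expression per repeated column.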
This example comes from a tutorial from
http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=95
and
http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=104
Please check them out, as they are very well done.
Example 5
Table in 1st Normal Form

SupplierID  Status  City    PartID  Quantity
S1          20      London  P1      300
S1          20      London  P2      200
S2          10      Paris   P1      300
S2          10      Paris   P2      400
S3          10      Paris   P2      200
S4          20      London  P2      200
S4          20      London  P4      300

Although this table is in 1NF, it contains redundant data. For example, information about the supplier's location and the
location's status has to be repeated for every part supplied. Redundancy causes what are called update anomalies. Update
anomalies are problems that arise when information is inserted, deleted, or updated. For example, the following anomalies
could occur in this table:

INSERT. The fact that a certain supplier (S5) is located in a particular city (Athens) cannot be added until that supplier supplies a part.
DELETE. If a row is deleted, then not only is the information about quantity and part lost but also information about the
supplier.
UPDATE. If supplier S1 moved from London to New York, then two rows would have to be updated with this new information.
Tables in 2NF
Suppliers
SupplierID  Status  City
S1          20      London
S2          10      Paris
S3          10      Paris
S4          20      London
S5          30      Athens

Parts
SupplierID  PartID  Quantity
S1          P1      300
S1          P2      200
S2          P1      300
S2          P2      400
S3          P2      200
S4          P4      300
S4          P5      400

Tables in 2NF but not in 3NF still contain modification anomalies. In the Suppliers table, they are:

INSERT. The fact that a particular city has a certain status (Rome has a status of 50) cannot be inserted until
there is a supplier in the city.
DELETE. Deleting any row in Suppliers destroys the status information about the city as well as the
association between supplier and city.
Tables in 3NF

Advantages of Third Normal Form


The advantage of having relational tables in 3NF is that it eliminates redundant data, which in turn saves space and reduces
manipulation anomalies. For example, the improvements to our sample database are:

INSERT. Facts about the status of a city (Rome has a status of 50) can be added even though there is no supplier in that city.
Likewise, facts about new suppliers can be added even though they have not yet supplied parts.
DELETE. Information about parts supplied can be deleted without destroying information about a supplier or a city.
UPDATE. Changing the location of a supplier or the status of a city requires modifying only one row.
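The INSERT and UPDATE claims can be tried out with sqlite3. The 3NF layout used here is an assumption, since the slides do not list the tables: CityStatus(City, Status), Suppliers(SupplierID, City), and Parts(SupplierID, PartID, Quantity), populated by projecting the 1NF table from Example 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FirstNF (
    SupplierID TEXT, Status INTEGER, City TEXT, PartID TEXT, Quantity INTEGER,
    PRIMARY KEY (SupplierID, PartID));
-- Assumed 3NF decomposition; not taken from the slides.
CREATE TABLE CityStatus (City TEXT PRIMARY KEY, Status INTEGER);
CREATE TABLE Suppliers (SupplierID TEXT PRIMARY KEY, City TEXT);
CREATE TABLE Parts (
    SupplierID TEXT, PartID TEXT, Quantity INTEGER,
    PRIMARY KEY (SupplierID, PartID));
""")
conn.executemany("INSERT INTO FirstNF VALUES (?, ?, ?, ?, ?)", [
    ("S1", 20, "London", "P1", 300), ("S1", 20, "London", "P2", 200),
    ("S2", 10, "Paris", "P1", 300), ("S2", 10, "Paris", "P2", 400),
    ("S3", 10, "Paris", "P2", 200), ("S4", 20, "London", "P2", 200),
    ("S4", 20, "London", "P4", 300),
])

# Populate the 3NF tables by projecting the 1NF table.
conn.execute("INSERT INTO CityStatus SELECT DISTINCT City, Status FROM FirstNF")
conn.execute("INSERT INTO Suppliers SELECT DISTINCT SupplierID, City FROM FirstNF")
conn.execute("INSERT INTO Parts SELECT SupplierID, PartID, Quantity FROM FirstNF")

# INSERT: a city's status can be recorded before any supplier exists there.
conn.execute("INSERT INTO CityStatus VALUES ('Rome', 50)")

# UPDATE: moving S1 touches two rows in 1NF but one row in 3NF.
rows_touched_1nf = conn.execute(
    "UPDATE FirstNF SET City = 'New York' WHERE SupplierID = 'S1'").rowcount
rows_touched_3nf = conn.execute(
    "UPDATE Suppliers SET City = 'New York' WHERE SupplierID = 'S1'").rowcount

print(rows_touched_1nf, rows_touched_3nf)
```

The 1NF update also leaves Status 20 attached to New York, which is exactly the kind of inconsistency the decomposition avoids.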
Additional Notes About Example 3
• Going to extremes can create too many tables which in
turn can make it difficult to manage your data. The key
to developing an efficient database is to determine
your needs.
• A postal carrier may need an Address field broken down
into smaller fields for sorting and grouping purposes, but
do you?
• Another good example is Example 3 - leaving the Dept
Code field in our completed table design. If you also
wanted to track information such as pay rate, health
insurance, etc., then a new table that contains company
related data for the employee would be necessary. If all
you need is to track the department an employee
belongs to then leaving it in the Employees table is fine.
In Summary
• If you type a data value more than once then
consider placing the field in another table.
• Consider your sorting and grouping needs. If
you need to sort or group on a portion of a field,
then the field is not broken down into its smallest
meaningful value.
• If you have multiple groups of fields, such as
several telephone numbers, then consider
eliminating those fields and turning them into
records in another table. Think vertically—not
horizontally!
Additional Notes
Denormalization
• Denormalization is a step that can follow normalization
in the design process.
• It comes about when a design ends up with hundreds or
thousands of small tables: the performance cost of all
those joins can be prohibitive.
• Deciding how far to normalize depends on factors such as
performance speed, cost, nature and size of the business,
and criticality of the data.
Deciding How Far to Normalize
1NF
• Even a spreadsheet should comply with 1NF, which avoids duplicate columns and
non-unique entries. If the hardware can’t handle that and your data is
understandable without it, do you need a real database at all? Searching a
directory of text files might serve you better.
2NF
• Stopping short of 2NF may be acceptable for the smallest and most casual of
databases (e.g., SMEs). If you coordinate a group of half a dozen fighters who
only battle zombies and operate entirely in a town, then duplicate data may not
be a problem and you may be able to ignore 2NF.
3NF
• Necessary for long-term data management: for business operations where the
data is critical, for organizations with more data than they could reasonably
process by hand, and wherever the database is a vital part of the
organization’s mission.
Denormalization…
• Creation of normalized relations is an
important database design goal
• Processing requirements should also be a
goal
• If tables are decomposed to conform to
normalization requirements:
– Number of database tables expands

Denormalization…
• Joining the larger number of tables takes
additional input/output (I/O) operations and
processing logic, thereby reducing system
speed
• Conflicts between design efficiency,
information requirements, and processing
speed are often resolved through
compromises that may include
denormalization
Denormalization…
• Unnormalized tables in a production
database tend to suffer from these
defects:
– Data updates are less efficient because
programs that read and update tables must
deal with larger tables
– Indexing is more cumbersome
– Unnormalized tables yield no simple
strategies for creating virtual tables known
as views
Denormalization: Critical Questions

• Is the system’s performance unacceptable with
fully normalized data? Mock up a client and do
some testing.
• If the performance is unacceptable, will
denormalizing make it acceptable? Figure out
where your bottlenecks are.
• If you denormalize to clear those bottlenecks, will
the system and its data still be reliable?

Unless the answers to all three are “yes,” denormalization should
be avoided.
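A mock-up for the first question might look like the sketch below: build the same data in normalized and denormalized form, check that both answer a query identically, and time each. Everything here is hypothetical (table names, row counts, the query), and an in-memory SQLite timing says little about a production system; it only shows the shape of such a test.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Departments (DeptNo INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employees (EmpNo INTEGER PRIMARY KEY, DeptNo INTEGER);
-- Denormalized copy: the department name is repeated on every employee row.
CREATE TABLE EmployeesFlat (EmpNo INTEGER PRIMARY KEY, DeptNo INTEGER, DeptName TEXT);
""")
conn.executemany("INSERT INTO Departments VALUES (?, ?)",
                 [(d, f"Dept {d}") for d in range(100)])
conn.executemany("INSERT INTO Employees VALUES (?, ?)",
                 [(e, e % 100) for e in range(50_000)])
conn.execute("""
INSERT INTO EmployeesFlat
SELECT e.EmpNo, e.DeptNo, d.DeptName
FROM Employees e JOIN Departments d ON d.DeptNo = e.DeptNo
""")


def timed(sql):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start


normalized_rows, normalized_secs = timed("""
SELECT d.DeptName, COUNT(*)
FROM Employees e JOIN Departments d ON d.DeptNo = e.DeptNo
GROUP BY d.DeptName ORDER BY d.DeptName
""")
flat_rows, flat_secs = timed("""
SELECT DeptName, COUNT(*) FROM EmployeesFlat
GROUP BY DeptName ORDER BY DeptName
""")

print(f"normalized: {normalized_secs:.4f}s, denormalized: {flat_secs:.4f}s")
print("same answer:", normalized_rows == flat_rows)
```

Only if the normalized timing is unacceptable, and the denormalized one is acceptable, do the second and third questions come into play.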
Hints for Denormalizing
• Always create a conceptual data model that is
completely normalized.
• Consider denormalization as the last option to boost
performance.
• Never presume denormalization will be required.
• To meet performance objectives, denormalization should
be done during the database design.
• Once performance objectives have been met, do not
implement any further denormalization.
• Fully document all denormalization, stating what was
done to the tables, what application code was added to
compensate for the denormalization, and the reasons for
and against doing it
Denormalization…
• Use denormalization cautiously
• Understand why, under some
circumstances, unnormalized tables are
a better choice
