Normalization in Databases
Normalization
We discuss four normal forms: first, second, third, and
Boyce-Codd normal form (1NF, 2NF, 3NF, and BCNF).
There is a sequence to normal forms:
1NF is considered the weakest,
2NF is stronger than 1NF,
3NF is stronger than 2NF, and
BCNF is considered the strongest
Also,
any relation that is in BCNF is in 3NF;
any relation in 3NF is in 2NF; and
any relation in 2NF is in 1NF.
91.2914
We consider a relation in BCNF to be fully normalized.
A design that has a lower normal form than another design has
more redundancy. Uncontrolled redundancy can lead to data
integrity problems.
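One way to make "is this relation in BCNF?" concrete is to compute attribute closures under a set of functional dependencies. The sketch below is only an illustration and is not taken from the slides: the helper names `closure` and `is_bcnf` are hypothetical, and the dependencies (EMPNO determines the name and department columns, WORKDEPT determines DEPTNAME) are an assumption about the employee table that appears later in this deck.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`.

    `fds` is a list of (lhs, rhs) pairs of frozensets of column names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the left side, we also get the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(relation, fds):
    """True if every non-trivial dependency has a superkey on its left side."""
    for lhs, rhs in fds:
        if rhs <= lhs:  # trivial dependency, always allowed
            continue
        if not set(relation) <= closure(lhs, fds):
            return False
    return True


# Assumed dependencies for the wide employee table (illustrative only).
fds = [
    (frozenset({"EMPNO"}), frozenset({"FIRSTNAME", "LASTNAME", "WORKDEPT"})),
    (frozenset({"WORKDEPT"}), frozenset({"DEPTNAME"})),
]
employee = {"EMPNO", "FIRSTNAME", "LASTNAME", "WORKDEPT", "DEPTNAME"}

# WORKDEPT -> DEPTNAME has a non-key left side, so the wide table fails.
wide_is_bcnf = is_bcnf(employee, fds)

# After splitting, each table only carries the dependency on its own key.
emp = {"EMPNO", "FIRSTNAME", "LASTNAME", "WORKDEPT"}
dept = {"WORKDEPT", "DEPTNAME"}
split_is_bcnf = is_bcnf(emp, fds[:1]) and is_bcnf(dept, fds[1:])
```

The check only inspects the dependencies it is given, so for a decomposed table the caller has to pass the dependencies that actually hold on that table, as done for `emp` and `dept` here.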
May 2005
Consequences of redundancy
• Wasted space
• Potential performance cost
• Potential inconsistency
• Inability to represent data
Why Normalize Tables?
• Eliminates redundant (useless) data and saves retyping of repetitive data.
Data redundancies yield the following anomalies:
• Update anomalies
• Addition anomalies
• Deletion anomalies
• Increases flexibility to query, sort, summarize, and group data
(Simpler to manipulate data!)
• Avoids frequent restructuring of tables and other objects to
accommodate new data
• Reduces disk space
• Ensures data dependencies make sense, i.e., data is stored logically.
The Normalization Process
• Each table represents a single subject
• No data item will be unnecessarily stored in more than
one table
• All attributes in a table are dependent on the primary key
The Normalization Process…
A Typical Spreadsheet File (unnormalized data)
Emp No Employee Name Time Card No Time Card Date Dept No Dept Name
Student Table: normalized table (1NF)
Student  Age  Subject
Adam     15   Biology
Adam     15   Maths
Alex     14   Maths
Tables Violating First Normal Form
PART (Primary Key)   WAREHOUSE A   WAREHOUSE B   WAREHOUSE C

The same data in first normal form:
PART (Primary Key)   WAREHOUSE (Primary Key)   QUANTITY
P0010                Warehouse A               400
Student  Age  Subject
Adam     15   Biology
Adam     15   Maths
Alex     14   Maths
Student Table 1: Normalized table (2NF)
Student  Age
Adam     15
Alex     14
Student Table 2: Normalized table (2NF)
Student  Subject
Adam     Biology
Adam     Maths
Alex     Maths
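The split into the two student tables can be checked with a few lines of Python. This is a minimal sketch using the rows shown above; the variable names are hypothetical. It projects the 1NF table onto (Student, Age) and (Student, Subject), then joins the two projections back together to confirm that no information was lost.

```python
# Rows of the 1NF student table: (student, age, subject).
student_1nf = [
    ("Adam", 15, "Biology"),
    ("Adam", 15, "Maths"),
    ("Alex", 14, "Maths"),
]

# Project onto (student, age); the set collapses the repeated ("Adam", 15).
student_age = sorted({(s, age) for s, age, _ in student_1nf})

# Project onto (student, subject); this keeps the full composite key.
student_subject = sorted({(s, subj) for s, _, subj in student_1nf})

# Join the projections on student. Because each student has exactly one age,
# the join reproduces the original rows (a lossless decomposition).
ages = dict(student_age)
rejoined = sorted((s, ages[s], subj) for s, subj in student_subject)
```

Age is stored once per student after the split, so a change to Adam's age touches one row instead of one row per subject.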
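Table Violating Third Normal Form
EMPNO (Primary Key)   FIRSTNAME   LASTNAME   WORKDEPT   DEPTNAME
(The key is a single column, so there can be no partial dependency;
the problem is that DEPTNAME depends on WORKDEPT rather than on EMPNO.)
DEPARTMENT TABLE (one department to many employees)
E11   Operations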
http://www.studytonight.com/dbms/database-normalization.php
Example 1
• Un-normalized Table:
1022 143-01
4123 12 Anne Smith 216
1022 159-02
4123 201-01
4123 211-02
4123 214-01
• Tables in Third Normal Form
– Data Not Dependent On Key is Eliminated
Table: Students
Student#  Advisor#  StudentFName  StudentLName
1022      10        Jane          Mayo
4123      12        Mark          Baker

1022  159-02
4123  201-01
4123  211-02
4123  214-01
Relationships for Example 1
Example 2
• Un-normalized Table:
EmpID   Name        Dept Code  Dept Name          Proj 1  Time Proj 1  Proj 2  Time Proj 2  Proj 3  Time Proj 3
EN1-26  Sean Breen  TW         Technical Writing  30-T3   25%          30-TC   40%          31-T3   30%
EN1-33  Amy Guya    TW         Technical Writing  30-T3   50%          30-TC   35%          31-T3   60%
Employees: EmpID, FirstName, LastName, DeptCode
Departments: DeptCode, DeptName
Employees_and_Projects: EmpID, ProjectNumber, TimeonProject
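The move from the wide spreadsheet rows to the three tables above can be sketched in Python. This is an illustration under stated assumptions: the variable names are hypothetical, and splitting Name on the first space to get FirstName and LastName is a simplification that would not hold for every real name.

```python
# Wide rows as shown above: four employee/department columns followed by
# three (project, time) pairs.
wide_rows = [
    ("EN1-26", "Sean Breen", "TW", "Technical Writing",
     "30-T3", "25%", "30-TC", "40%", "31-T3", "30%"),
    ("EN1-33", "Amy Guya", "TW", "Technical Writing",
     "30-T3", "50%", "30-TC", "35%", "31-T3", "60%"),
]

employees = []               # (EmpID, FirstName, LastName, DeptCode)
departments = {}             # DeptCode -> DeptName, stored once per department
employees_and_projects = []  # (EmpID, ProjectNumber, TimeonProject)

for emp_id, name, dept_code, dept_name, *proj_cols in wide_rows:
    first, last = name.split(" ", 1)
    employees.append((emp_id, first, last, dept_code))
    departments[dept_code] = dept_name
    # proj_cols alternates project number and time; pair them up so each
    # repeating group becomes one row.
    for project, time_on_project in zip(proj_cols[0::2], proj_cols[1::2]):
        if project:  # skip unused project slots
            employees_and_projects.append((emp_id, project, time_on_project))
```

A fourth project for an employee is then one more row in `employees_and_projects`, not a new pair of columns.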
Example 3
• Un-normalized Table:
EmpID FName LName Manager Dept Sector Spouse Child1 Child2 Child3
458 Homer Simpson Mr. Burns Safety 7G Marge Bart Lisa Maggie
Table in First Normal Form
No more repeated fields
EmpID Dependent
458 Marge
458 Bart
458 Lisa
458 Maggie
Tables in Second Normal Form
Removed Repeated Data From Table
Step 2
EmpID  FName  LName    ManagerID  Dept         Sector
285    Carl   Carlson  2          Engineering  6G
365    Lenny           2          Marketing    8G
458    Homer  Simpson  1          Safety       7G

EmpID  Dependent
458    Marge
458    Bart
458    Lisa
458    Maggie

ManagerID  Manager
1          Mr. Burns
2          Smithers
Tables in Third Normal Form
Employees Table
EmpID  FName  LName    DeptCode
285    Carl   Carlson  EN
365    Lenny           MK
458    Homer  Simpson  SF

Manager Table
ManagerID  Manager
1          Mr. Burns
2          Smithers

Dependents Table
EmpID  Dependent
458    Marge
458    Bart
458    Lisa
458    Maggie

Department Table
DeptCode  Department   Sector  ManagerID
EN        Engineering  6G      2
MK        Marketing    8G      2
SF        Safety       7G      1
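The four tables above could be declared in SQL roughly as follows. This is a sketch using Python's built-in `sqlite3` module; the snake_case column names and the choice of keys are assumptions, and Lenny's missing last name is stored as NULL. It shows two things: the original wide row for Homer can be rebuilt with joins, and the foreign keys reject a dependent that points at a nonexistent employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE manager (manager_id INTEGER PRIMARY KEY, manager TEXT);
CREATE TABLE department (
    dept_code  TEXT PRIMARY KEY,
    department TEXT,
    sector     TEXT,
    manager_id INTEGER REFERENCES manager(manager_id)
);
CREATE TABLE employees (
    emp_id    INTEGER PRIMARY KEY,
    fname     TEXT,
    lname     TEXT,
    dept_code TEXT REFERENCES department(dept_code)
);
CREATE TABLE dependents (
    emp_id    INTEGER NOT NULL REFERENCES employees(emp_id),
    dependent TEXT NOT NULL,
    PRIMARY KEY (emp_id, dependent)
);
INSERT INTO manager VALUES (1, 'Mr. Burns'), (2, 'Smithers');
INSERT INTO department VALUES
    ('EN', 'Engineering', '6G', 2),
    ('MK', 'Marketing', '8G', 2),
    ('SF', 'Safety', '7G', 1);
INSERT INTO employees VALUES
    (285, 'Carl', 'Carlson', 'EN'),
    (365, 'Lenny', NULL, 'MK'),
    (458, 'Homer', 'Simpson', 'SF');
INSERT INTO dependents VALUES
    (458, 'Marge'), (458, 'Bart'), (458, 'Lisa'), (458, 'Maggie');
""")

# Rebuild the wide view of one employee by following the relationships.
homer = conn.execute("""
    SELECT e.fname, d.department, d.sector, m.manager
    FROM employees AS e
    JOIN department AS d ON d.dept_code = e.dept_code
    JOIN manager AS m ON m.manager_id = d.manager_id
    WHERE e.emp_id = 458
""").fetchone()

# A dependent for an unknown employee violates the foreign key.
try:
    conn.execute("INSERT INTO dependents VALUES (999, 'Nobody')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```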
Relationships for Example 3
Example 4
Table Violating 1st Normal Form
Rep ID Representative Client 1 Time 1 Client 2 Time 2 Client 3 Time 3
TS-89 Gilroy Gladstone US Corp. 14 hrs Taggarts 26 hrs Kilroy Inc. 9 hrs
RK-56 Mary Mayhem Italiana 67 hrs Linkers 2 hrs
Although this table is in 1NF, it contains redundant data. For example, information about the supplier's location and the
location's status has to be repeated for every part supplied. Redundancy causes what are called update anomalies: problems
that arise when information is inserted, deleted, or updated. For example, the following anomalies could occur in this table:
INSERT. The fact that a certain supplier (s5) is located in a particular city (Athens) cannot be added until that supplier has supplied a part.
DELETE. If a row is deleted, then not only is the information about quantity and part lost but also, if it was that supplier's
only row, the information about the supplier.
UPDATE. If supplier s1 moved from London to New York, then two rows would have to be updated with this new information.
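The three anomalies can be reproduced on a small single-table design. The sketch below uses Python's built-in `sqlite3` module; the table name `supplier_parts`, the column names, and all row values (statuses, part numbers, quantities, supplier s2) are hypothetical, chosen only so that s1 has two rows as the UPDATE case describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE supplier_parts (
        s_no   TEXT NOT NULL,
        status INTEGER,
        city   TEXT,
        p_no   TEXT NOT NULL,
        qty    INTEGER,
        PRIMARY KEY (s_no, p_no)
    )
""")
conn.executemany("INSERT INTO supplier_parts VALUES (?, ?, ?, ?, ?)", [
    ("s1", 20, "London", "p1", 300),
    ("s1", 20, "London", "p2", 200),
    ("s2", 10, "Paris", "p1", 100),
])

# INSERT anomaly: s5 has supplied no part, so there is no p_no for the key.
try:
    conn.execute("INSERT INTO supplier_parts (s_no, city) VALUES ('s5', 'Athens')")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# UPDATE anomaly: one real-world change (s1 moves) touches several rows.
rows_touched = conn.execute(
    "UPDATE supplier_parts SET city = 'New York' WHERE s_no = 's1'"
).rowcount

# DELETE anomaly: removing s2's only shipment also removes all trace of s2.
conn.execute("DELETE FROM supplier_parts WHERE s_no = 's2' AND p_no = 'p1'")
s2_rows_left = conn.execute(
    "SELECT COUNT(*) FROM supplier_parts WHERE s_no = 's2'"
).fetchone()[0]
```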
Tables in 2NF
Suppliers Parts
Tables in 2NF but not in 3NF still contain modification anomalies. In the example of Suppliers, they are:
INSERT. The fact that a particular city has a certain status (Rome has a status of 50) cannot be inserted until
there is a supplier in the city.
DELETE. Deleting any row in SUPPLIER destroys the status information about the city as well as the
association between supplier and city.
Tables in 3NF
INSERT. Facts about the status of a city (for example, that Rome has a status of 50) can be added even though there is no supplier in that city.
Likewise, facts about new suppliers can be added even though they have not yet supplied parts.
DELETE. Information about parts supplied can be deleted without destroying information about a supplier or a city.
UPDATE. Changing the location of a supplier or the status of a city requires modifying only one row.
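A 3NF layout for the same data makes each of these operations a single-row change. The sketch below uses Python's built-in `sqlite3` module and is an assumption about how the tables might be declared: the table and column names are hypothetical, and every status value other than Rome's 50 is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE city_status (city TEXT PRIMARY KEY, status INTEGER);
CREATE TABLE supplier (
    s_no TEXT PRIMARY KEY,
    city TEXT REFERENCES city_status(city)
);
CREATE TABLE supplier_parts (
    s_no TEXT NOT NULL REFERENCES supplier(s_no),
    p_no TEXT NOT NULL,
    qty  INTEGER,
    PRIMARY KEY (s_no, p_no)
);
INSERT INTO city_status VALUES ('London', 20), ('New York', 30), ('Athens', 30);
INSERT INTO supplier VALUES ('s1', 'London');
INSERT INTO supplier_parts VALUES ('s1', 'p1', 300), ('s1', 'p2', 200);
""")

# INSERT: a city's status needs no supplier, and a supplier needs no parts.
conn.execute("INSERT INTO city_status VALUES ('Rome', 50)")
conn.execute("INSERT INTO supplier VALUES ('s5', 'Athens')")

# UPDATE: moving s1 changes exactly one row, however many parts s1 supplies.
rows_touched = conn.execute(
    "UPDATE supplier SET city = 'New York' WHERE s_no = 's1'"
).rowcount

# DELETE: dropping every shipment for s1 leaves the supplier and city intact.
conn.execute("DELETE FROM supplier_parts WHERE s_no = 's1'")
s1_city = conn.execute("SELECT city FROM supplier WHERE s_no = 's1'").fetchone()[0]
rome_status = conn.execute(
    "SELECT status FROM city_status WHERE city = 'Rome'"
).fetchone()[0]
```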
Additional Notes About Example 3
• Going to extremes can create too many tables which in
turn can make it difficult to manage your data. The key
to developing an efficient database is to determine
your needs.
• A postal carrier may need an Address field broken down
into smaller fields for sorting and grouping purposes, but
do you?
• Another good example is leaving the Dept Code field in the
completed Employees table in Example 3. If you also
wanted to track information such as pay rate or health
insurance, then a new table that contains company-related
data for the employee would be necessary. If all
you need is to track the department an employee
belongs to, then leaving it in the Employees table is fine.
In Summary
• If you type a data value more than once then
consider placing the field in another table.
• Consider your sorting and grouping needs. If
you need to sort or group on a portion of a field,
then the field is not broken down into its smallest
meaningful value.
• If you have multiple groups of fields, such as
several telephone numbers, then consider
eliminating those fields and turning them into
records in another table. Think vertically—not
horizontally!
Additional Notes
Denormalization
• Denormalization deliberately reverses some normalization
steps, reintroducing controlled redundancy into the design.
• It comes about when a design has hundreds or thousands of
small tables and the performance cost of all those joins
would be prohibitive
Denormalization…
• Joining a larger number of tables takes
additional input/output (I/O) operations and
processing logic, thereby reducing system
speed
• Conflicts between design efficiency,
information requirements, and processing
speed are often resolved through
compromises that may include
denormalization
Denormalization…
• Unnormalized tables in a production
database tend to suffer from these
defects:
– Data updates are less efficient because
programs that read and update tables must
deal with larger tables
– Indexing is more cumbersome
– Unnormalized tables yield no simple
strategies for creating virtual tables known
as views
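By contrast, when the base tables are normalized, a view is a simple way to present a wide, spreadsheet-like result without storing the redundancy. The sketch below uses Python's built-in `sqlite3` module with a cut-down version of the Example 3 tables; the view name `employee_wide`, the column names, and the new sector value '7H' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_code  TEXT PRIMARY KEY,
    department TEXT,
    sector     TEXT
);
CREATE TABLE employees (
    emp_id    INTEGER PRIMARY KEY,
    fname     TEXT,
    lname     TEXT,
    dept_code TEXT REFERENCES department(dept_code)
);
INSERT INTO department VALUES ('EN', 'Engineering', '6G'), ('SF', 'Safety', '7G');
INSERT INTO employees VALUES
    (285, 'Carl', 'Carlson', 'EN'),
    (458, 'Homer', 'Simpson', 'SF');

-- The view reads like the wide table but stores no data of its own.
CREATE VIEW employee_wide AS
SELECT e.emp_id, e.fname, e.lname, d.department, d.sector
FROM employees AS e
JOIN department AS d ON d.dept_code = e.dept_code;
""")

wide = conn.execute("SELECT * FROM employee_wide ORDER BY emp_id").fetchall()

# A change made once in the base table is visible through the view.
conn.execute("UPDATE department SET sector = '7H' WHERE dept_code = 'SF'")
homer_sector = conn.execute(
    "SELECT sector FROM employee_wide WHERE emp_id = 458"
).fetchone()[0]
```

The join still runs every time the view is queried, so this keeps the design clean but does not remove the join cost that motivates denormalization.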
Denormalization: Critical Questions