Lec10 Normalization PDF

The document discusses normalization of database tables. Normalization is the process of evaluating and correcting table structures to minimize data redundancies and reduce data anomalies. It involves transforming tables into higher normal forms such as 1NF, 2NF, 3NF and BCNF to eliminate issues like data inconsistencies, data redundancies, and various data anomalies. The document provides examples to illustrate the normalization process and how it improves database design by producing a set of tables with no unnecessary data duplication, better data integrity and reduced data modification issues.

Uploaded by

Yuvaraj Elsus

Normalization of Database Tables

Lecture 10

Learning Outcomes
• In this chapter, you will learn:
  ◦ What normalization is and what role it plays in the database design process
  ◦ About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF
  ◦ How tables can be transformed from lower normal forms to higher normal forms
  ◦ That normalization and ER modeling are used concurrently to produce a good database design
  ◦ That some situations require denormalization to generate information efficiently

Database Tables and Normalization
• Normalization
  ◦ Process for evaluating and correcting table structures to minimize data redundancies
  ◦ Reduces data anomalies
  ◦ Series of stages called normal forms:
    ▪ First normal form (1NF)
    ▪ Second normal form (2NF)
    ▪ Third normal form (3NF)

Database Tables and Normalization
• Normalization (continued)
  ◦ 2NF is better than 1NF; 3NF is better than 2NF
  ◦ For most business database design purposes, 3NF is as high as needed in normalization

How Normalization Supports DB Design
If ER modeling is performed well, use normalization as a validation technique.
This lecture follows Approach 1.

The Need for Normalization
• Example:
  ◦ A company that manages building projects
  ◦ Charges its clients by billing hours spent on each contract
  ◦ Hourly billing rate is dependent on employee’s position
  ◦ Periodically, a report is generated that contains information such as that displayed in Table 6.1

A few employees work on one project, e.g.:
Project Num: 15
Project Name: Evergreen
Employee Num: 101, 102, 103, 105

Tabular Representation of the Report Format

A Sample Report Layout for the Company

The Need for Normalization (cont’d.)
• Structure of data set in Figure 6.1 does not handle data very well
  ◦ Proj_Num is intended to be the primary key, but it contains nulls

The Need for Normalization (cont’d.)
• Structure of data set in Figure 6.1 does not handle data very well
  ◦ Invites data inconsistencies, e.g., the Job_Class value “Elect. Engineer” might be entered as “Elect. Eng” or “El. Eng.” or “EE”
  ◦ Displays data redundancies
    ▪ Delete anomalies: Suppose that only one employee is associated with a given project. If that employee leaves the company and the employee data are deleted, the project information will also be deleted.
    ▪ Insert anomalies: A new employee must be assigned to a project before the employee’s data can be entered.
    ▪ Update anomalies: Modifying the Job_Class for employee no. 105 requires many alterations, one for each row in which that employee appears.
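
The update and delete anomalies can be made concrete with a small sketch. The rows below are hypothetical: only project 15 (Evergreen) and the employee numbers come from the slides, while the other project numbers, project names, and the job classes assigned to each employee are invented for illustration.

```python
# Hypothetical flat (unnormalized) rows: one row per employee per project.
flat_rows = [
    {"proj_num": 15, "proj_name": "Evergreen", "emp_num": 101, "job_class": "Elect. Engineer"},
    {"proj_num": 15, "proj_name": "Evergreen", "emp_num": 105, "job_class": "Database Designer"},
    {"proj_num": 18, "proj_name": "Project B", "emp_num": 105, "job_class": "Database Designer"},
    {"proj_num": 22, "proj_name": "Project C", "emp_num": 105, "job_class": "Database Designer"},
]


def update_job_class_flat(rows, emp_num, new_class):
    """Update anomaly: every row repeating the employee must be changed."""
    touched = 0
    for row in rows:
        if row["emp_num"] == emp_num:
            row["job_class"] = new_class
            touched += 1
    return touched


def delete_employee_flat(rows, emp_num):
    """Delete anomaly: projects staffed only by this employee disappear too."""
    before = {row["proj_num"] for row in rows}
    remaining = [row for row in rows if row["emp_num"] != emp_num]
    after = {row["proj_num"] for row in remaining}
    return remaining, before - after


touched = update_job_class_flat(flat_rows, 105, "Systems Analyst")
remaining, lost_projects = delete_employee_flat(flat_rows, 105)
print(touched, sorted(lost_projects))
```

In this made-up data, one job-class change touches three rows, and deleting employee 105 also removes every trace of the two projects that had no other employee.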
The Normalization Process
• Each table represents a single subject
• No data item will be unnecessarily stored in more than one table
• All nonprime attributes in a table are dependent on the primary key
• Normalization is a three-step procedure

The Normalization Process (cont’d.)

The Normalization Process (cont’d.)
• Objective of normalization is to ensure that all tables are in at least 3NF
• Higher forms are not likely to be encountered in a business environment
• Normalization works one relation at a time
• Progressively breaks a table into a new set of relations based on identified dependencies
• Normalizing table structure will reduce data redundancies

The Concepts of Functional Dependencies

Conversion to First Normal Form
• Relational table must not contain repeating groups

Multiple entries for the same project no. (proj_num = 15)
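
A repeating group of this kind, where one project number is followed by rows whose project columns are null, can be flattened by copying the project values down into each row. The sketch below is one way to do that, assuming the report rows arrive in order and the first row of each group carries the project values; the function name and the dictionary layout are illustrative and not part of the lecture.

```python
def fill_repeating_groups(rows, group_attrs):
    """Copy the last seen value of each group attribute into rows where it is None.

    Assumes the first row of every group has non-null group attributes.
    """
    filled = []
    last_seen = {}
    for row in rows:
        new_row = dict(row)
        for attr in group_attrs:
            if new_row[attr] is None:
                new_row[attr] = last_seen[attr]
            else:
                last_seen[attr] = new_row[attr]
        filled.append(new_row)
    return filled


# Project 15 (Evergreen) with employees 101, 102, 103, 105, as in the report example.
report_rows = [
    {"proj_num": 15, "proj_name": "Evergreen", "emp_num": 101},
    {"proj_num": None, "proj_name": None, "emp_num": 102},
    {"proj_num": None, "proj_name": None, "emp_num": 103},
    {"proj_num": None, "proj_name": None, "emp_num": 105},
]

rows_1nf = fill_repeating_groups(report_rows, ["proj_num", "proj_name"])

# After filling, (proj_num, emp_num) can serve as a composite key.
keys = [(row["proj_num"], row["emp_num"]) for row in rows_1nf]
print(keys)
```

Once the nulls are gone, no single column identifies a row, which is why a key composed of the project number and the employee number is needed.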

Conversion to First Normal Form (cont’d.)
• Step 1: Eliminate the Repeating Groups
  ◦ Eliminate nulls: each repeating group attribute contains an appropriate data value
• Step 2: Identify the Primary Key
  ◦ Must uniquely identify each attribute value
  ◦ A new (composite) key must be composed
• Step 3: Identify All Dependencies
  ◦ Dependencies are depicted with a diagram

First Normal Form (1NF)

Conversion to First Normal Form (cont’d.)
• Dependency diagram:
  ◦ Depicts all dependencies found within a given table structure
  ◦ Helpful in getting a bird’s-eye view of all relationships among a table’s attributes
  ◦ Makes it less likely that you will overlook an important dependency
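
Alongside a dependency diagram, a candidate dependency can be tested against sample rows. The sketch below is a minimal check under stated assumptions: the helper name `fd_holds` and all sample values except project 15 (Evergreen) are hypothetical. Sample data can only refute a dependency; a passing check does not prove that the dependency holds in general.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in determinant)
        value = tuple(row[attr] for attr in dependent)
        # setdefault stores the first value seen and returns the stored one afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical 1NF rows; job classes and hourly charges are made up.
rows = [
    {"proj_num": 15, "proj_name": "Evergreen", "emp_num": 101,
     "job_class": "Elect. Engineer", "chg_hour": 80.0},
    {"proj_num": 15, "proj_name": "Evergreen", "emp_num": 105,
     "job_class": "Database Designer", "chg_hour": 100.0},
    {"proj_num": 18, "proj_name": "Project B", "emp_num": 105,
     "job_class": "Database Designer", "chg_hour": 100.0},
    {"proj_num": 18, "proj_name": "Project B", "emp_num": 114,
     "job_class": "Elect. Engineer", "chg_hour": 80.0},
]

partial_1 = fd_holds(rows, ["proj_num"], ["proj_name"])    # part of the key determines proj_name
partial_2 = fd_holds(rows, ["emp_num"], ["job_class"])     # part of the key determines job_class
transitive = fd_holds(rows, ["job_class"], ["chg_hour"])   # nonprime determines nonprime
not_an_fd = fd_holds(rows, ["proj_num"], ["emp_num"])      # a project has several employees
print(partial_1, partial_2, transitive, not_an_fd)
```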

Dependency Diagram for 1NF

Transitive dependency: a dependency that exists between two nonprime attributes

Conversion to First Normal Form (cont’d.)
• First normal form describes tabular format:
  ◦ All key attributes are defined
  ◦ No repeating groups in the table
  ◦ All attributes are dependent on primary key
• All relational tables satisfy 1NF requirements
• Some tables contain partial dependencies
  ◦ Dependencies are based on part of the primary key

Conversion to Second Normal Form
• Step 1: Make New Tables to Eliminate Partial Dependencies
  ◦ Write each key component on a separate line, then write the original (composite) key on the last line
  ◦ Each component will become the key in a new table
• Step 2: Assign Corresponding Dependent Attributes
  ◦ Determine attributes that are dependent on other attributes
  ◦ At this point, most anomalies have been eliminated
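
The two steps can be sketched as a projection of the 1NF rows onto each key component and onto the composite key. This is a minimal illustration, not the lecture's own procedure: the helper `project_onto`, the table names, and every value other than project 15 (Evergreen) are assumptions.

```python
def project_onto(rows, key, attrs):
    """Keep one row per distinct key value, carrying only key + attrs."""
    table = {}
    for row in rows:
        key_value = tuple(row[attr] for attr in key)
        table[key_value] = {attr: row[attr] for attr in key + attrs}
    return list(table.values())


# Hypothetical 1NF rows keyed by (proj_num, emp_num).
rows_1nf = [
    {"proj_num": 15, "proj_name": "Evergreen", "emp_num": 101,
     "job_class": "Elect. Engineer", "hours": 23.8},
    {"proj_num": 15, "proj_name": "Evergreen", "emp_num": 105,
     "job_class": "Database Designer", "hours": 19.4},
    {"proj_num": 18, "proj_name": "Project B", "emp_num": 105,
     "job_class": "Database Designer", "hours": 12.6},
    {"proj_num": 18, "proj_name": "Project B", "emp_num": 114,
     "job_class": "Elect. Engineer", "hours": 24.6},
]

# Step 1: each key component, and the composite key, becomes the key of a new table.
# Step 2: each table receives the attributes that depend on its key.
project_table = project_onto(rows_1nf, ["proj_num"], ["proj_name"])
employee_table = project_onto(rows_1nf, ["emp_num"], ["job_class"])
assignment_table = project_onto(rows_1nf, ["proj_num", "emp_num"], ["hours"])

print(len(project_table), len(employee_table), len(assignment_table))
```

Each project name and each employee's job class is now stored once, while the hours stay with the composite key they depend on.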

Second Normal Form

Conversion to Second Normal Form (cont’d.)
• Table is in second normal form (2NF) when:
  ◦ It is in 1NF and
  ◦ It includes no partial dependencies:
    ▪ No attribute is dependent on only a portion of the primary key
• Employee table contains transitive dependencies
  ◦ Dependency of one nonprime attribute on another nonprime attribute
  ◦ Job_Class → Chg_Hour

Conversion to Third Normal Form
• Step 1: Make New Tables to Eliminate Transitive Dependencies
  ◦ For every transitive dependency, write its determinant as PK for new table
  ◦ Determinant: any attribute whose value determines other values within a row
• Step 2: Reassign Corresponding Dependent Attributes
  ◦ Identify attributes dependent on each determinant identified in Step 1
  ◦ Name table to reflect its contents and function

Third Normal Form

Conversion to Third Normal Form
• A table is in third normal form (3NF) when both of the following are true:
  ◦ It is in 2NF
  ◦ It contains no transitive dependencies
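
Removing the transitive dependency Job_Class → Chg_Hour can be sketched with Python's standard `sqlite3` module. The table names, column types, employee names, and hourly charges below are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A 2NF employee table that still carries job_class -> chg_hour.
conn.execute(
    "CREATE TABLE employee_2nf ("
    "emp_num INTEGER PRIMARY KEY, emp_name TEXT, job_class TEXT, chg_hour REAL)"
)
conn.executemany(
    "INSERT INTO employee_2nf VALUES (?, ?, ?, ?)",
    [
        (101, "Employee A", "Elect. Engineer", 80.0),
        (103, "Employee B", "Elect. Engineer", 80.0),
        (105, "Employee C", "Database Designer", 100.0),
    ],
)

# Step 1: the determinant (job_class) becomes the primary key of a new table.
conn.execute("CREATE TABLE job (job_class TEXT PRIMARY KEY, chg_hour REAL)")
conn.execute("INSERT INTO job SELECT DISTINCT job_class, chg_hour FROM employee_2nf")

# Step 2: the employee table keeps the determinant as a foreign key only.
conn.execute(
    "CREATE TABLE employee ("
    "emp_num INTEGER PRIMARY KEY, emp_name TEXT, "
    "job_class TEXT REFERENCES job(job_class))"
)
conn.execute("INSERT INTO employee SELECT emp_num, emp_name, job_class FROM employee_2nf")

job_rows = conn.execute("SELECT job_class, chg_hour FROM job ORDER BY job_class").fetchall()

# A rate change now touches one row in job, however many employees share the class.
cursor = conn.execute("UPDATE job SET chg_hour = 85.0 WHERE job_class = 'Elect. Engineer'")
rows_updated = cursor.rowcount
print(job_rows, rows_updated)
```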

Improving the Design
• Table structures should be cleaned up to eliminate initial partial and transitive dependencies
• Normalization helps eliminate data redundancies

Improving the Design
• Issues to address, in order, to produce a good normalized set of tables:
  ◦ Evaluate PK Assignments
  ◦ Evaluate Naming Conventions
  ◦ Refine Attribute Atomicity
    ▪ Atomic attribute: an attribute that cannot be subdivided (e.g., emp_name is not atomic)
    ▪ emp_name should be broken into LastName, FirstName, MidName to improve querying flexibility
  ◦ Identify New Attributes

Improving the Design
• Issues to address, in order, to produce a good normalized set of tables (cont’d.):
  ◦ Identify New Relationships (e.g., Employee and Project)
  ◦ Refine Primary Keys as Required for Data Granularity
    ▪ Data granularity: level of detail, e.g., assign_hours (Daily total? Weekly total? Monthly total?)
  ◦ Maintain Historical Accuracy
    ▪ Is job_charge_hr the same for all projects?
  ◦ Evaluate Using Derived Attributes
    ▪ Derived attribute: an attribute calculated from other attributes (recap!)
    ▪ Assign_charge = Assign_hrs * Assign_charge_hr
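
The derived attribute can be computed on demand rather than stored. The sketch below assumes decimal currency values; the function name and the sample figures are hypothetical.

```python
from decimal import Decimal


def assign_charge(assign_hrs, assign_charge_hr):
    """Derived attribute: Assign_charge = Assign_hrs * Assign_charge_hr."""
    return assign_hrs * assign_charge_hr


# Hypothetical assignment: 3.5 hours billed at 84.50 per hour.
charge = assign_charge(Decimal("3.5"), Decimal("84.50"))
print(charge)
```

Storing the derived value avoids recomputation in reports but must be kept consistent with its inputs; computing it on demand avoids that risk at the cost of extra work per query.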
The Completed Database

Higher-Level Normal Forms
• Tables in 3NF perform suitably in business transactional databases
• Higher-order normal forms are useful on occasion
• Two special cases of 3NF:
  ◦ Boyce-Codd normal form (BCNF)
  ◦ Fourth normal form (4NF)

The Boyce-Codd Normal Form (BCNF)
• Most designers consider BCNF a special case of 3NF
• When a table contains only one candidate key, 3NF and BCNF are equivalent
• A table can be in 3NF and fail to meet BCNF
  ◦ A table in 3NF contains no partial dependencies and no transitive dependencies
  ◦ In BCNF, every determinant in the table must be a candidate key
• What happens when a nonkey attribute is the determinant of a key attribute?

A Table in 3NF but not BCNF

A + B → C, D
C → B

B is a prime (key) attribute
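
One way to see why these dependencies fail BCNF is to compute attribute closures and flag every determinant whose closure does not cover the whole table. The sketch below uses the standard closure algorithm; the function names are illustrative, and the check treats "determinant is a candidate key" as "determinant is a superkey", which is the usual formal statement of BCNF.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """List nontrivial dependencies whose determinant is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        is_trivial = set(rhs) <= set(lhs)
        if not is_trivial and closure(lhs, fds) != set(all_attrs):
            violations.append((lhs, rhs))
    return violations


attrs = {"A", "B", "C", "D"}
fds = [
    (("A", "B"), ("C", "D")),  # A + B -> C, D
    (("C",), ("B",)),          # C -> B
]

violations = bcnf_violations(attrs, fds)
ac_closure = closure({"A", "C"}, fds)
print(violations, sorted(ac_closure))
```

The determinant C reaches only B and C, so it is not a key and C → B violates BCNF, while A + C determines every attribute and can therefore serve as a key.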

Sample Data for BCNF Conversion

Decomposition to BCNF

A + B → C, D
A + C → B, D
C → B

Use A + C as the primary key, so that B becomes a nonprime attribute.

Resulting tables:
(Class_code, Staff_ID)
(Stu_ID, Class_code, Enroll_grade)

Fourth Normal Form (4NF)
• Table is in fourth normal form (4NF) when both of the following are true:
  ◦ It is in 3NF
  ◦ It has no multiple sets of multivalued dependencies (one key determines multiple values of two other attributes, and these attributes are independent of each other)
• E.g., one employee can have many service entries and many assignment entries
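
The cost of keeping two independent multivalued facts in one table can be sketched by counting rows. The employee number, organization names, and assignment numbers below are hypothetical, and pairing every service entry with every assignment entry is only one way such a combined table might be populated.

```python
from itertools import product

# Hypothetical data: one employee with two service entries and three assignments.
services = {10123: ["Org A", "Org B"]}
assignments = {10123: [1, 3, 4]}


def combined_rows(services, assignments):
    """One table holding both facts: every service paired with every assignment."""
    rows = []
    for emp_num, orgs in services.items():
        for org, assign_num in product(orgs, assignments.get(emp_num, [])):
            rows.append((emp_num, org, assign_num))
    return rows


def split_rows(services, assignments):
    """4NF-style decomposition: one table per independent multivalued fact."""
    service_table = [(emp, org) for emp, orgs in services.items() for org in orgs]
    assignment_table = [(emp, num) for emp, nums in assignments.items() for num in nums]
    return service_table, assignment_table


one_table = combined_rows(services, assignments)
service_table, assignment_table = split_rows(services, assignments)
print(len(one_table), len(service_table) + len(assignment_table))
```

With the facts separated, adding a service entry adds one row instead of one row per existing assignment.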

Tables with Multivalued Dependencies

Tables in 4NF

Normalization and Database Design
• Normalization should be part of the design process
• Make sure that proposed entities meet the required normal form before table structures are created
• Many real-world databases have been improperly designed or burdened with anomalies
• You may be asked to redesign and modify existing databases

Normalization and Database Design
• Difficult to separate normalization process from ER modeling process
• ER diagram
  ◦ Identify relevant entities, their attributes and relationships
  ◦ Identify additional entities and attributes
• Normalization procedures
  ◦ Focus on characteristics of specific entities
  ◦ Micro view of entities within ER diagram

Initial ERD

• EMPLOYEE(EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_DESCRIPTION, JOB_CHG_HOUR)
• PROJECT(PROJ_NUM, PROJ_NAME)
• Transitive dependency
  ◦ JOB_DESCRIPTION → JOB_CHG_HOUR

Modified ERD After Removing Transitive Dependency

• EMPLOYEE(EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_CODE)
• PROJECT(PROJ_NUM, PROJ_NAME)
• JOB(JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)

Incorrect M:N Relationship

Final Contracting COMPANY ERD

• EMPLOYEE(EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_CODE)
• PROJECT(PROJ_NUM, PROJ_NAME, EMP_NUM)
• JOB(JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)
• ASSIGNMENT(ASSIGN_NUM, ASSIGN_DATE, PROJ_NUM, EMP_NUM, ASSIGN_HOURS, ASSIGN_CHG_HOUR, ASSIGN_CHARGE)
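
The four relations above could be declared as follows. This is a sketch using Python's standard `sqlite3` module: the relation and attribute names come from the slides, while the column types, NOT NULL choices, and the choice of primary and foreign keys are assumptions inferred from the attribute names.

```python
import sqlite3

SCHEMA = """
CREATE TABLE JOB (
    JOB_CODE        TEXT PRIMARY KEY,
    JOB_DESCRIPTION TEXT NOT NULL,
    JOB_CHG_HOUR    REAL NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM     INTEGER PRIMARY KEY,
    EMP_LNAME   TEXT NOT NULL,
    EMP_FNAME   TEXT NOT NULL,
    EMP_INITIAL TEXT,
    JOB_CODE    TEXT NOT NULL REFERENCES JOB(JOB_CODE)
);
CREATE TABLE PROJECT (
    PROJ_NUM  INTEGER PRIMARY KEY,
    PROJ_NAME TEXT NOT NULL,
    EMP_NUM   INTEGER REFERENCES EMPLOYEE(EMP_NUM)
);
CREATE TABLE ASSIGNMENT (
    ASSIGN_NUM      INTEGER PRIMARY KEY,
    ASSIGN_DATE     TEXT NOT NULL,
    PROJ_NUM        INTEGER NOT NULL REFERENCES PROJECT(PROJ_NUM),
    EMP_NUM         INTEGER NOT NULL REFERENCES EMPLOYEE(EMP_NUM),
    ASSIGN_HOURS    REAL NOT NULL,
    ASSIGN_CHG_HOUR REAL NOT NULL,
    ASSIGN_CHARGE   REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)

table_names = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)

# An employee row pointing at a job code that does not exist should be rejected.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (101, 'Lname', 'Fname', NULL, 'NO_SUCH_JOB')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(table_names, fk_enforced)
```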
The Implemented Database

Denormalization
• Creation of normalized relations is an important database design goal
• Processing requirements should also be a goal
• If tables are decomposed to conform to normalization requirements:
  ◦ Number of database tables expands

Denormalization (cont’d.)
• Joining the larger number of tables reduces system speed
• Conflicts are often resolved through compromises that may include denormalization
• Defects of unnormalized tables:
  ◦ Data updates are less efficient because tables are larger
  ◦ Indexing is more cumbersome
  ◦ No simple strategies for creating virtual tables known as views
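
With normalized tables, a view can present report-style rows without storing the hourly charge redundantly, at the price of a join each time the view is read. The sketch below uses Python's standard `sqlite3` module; the view name, job codes, employee names, and rates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE job (job_code TEXT PRIMARY KEY, job_chg_hour REAL);
    CREATE TABLE employee (
        emp_num   INTEGER PRIMARY KEY,
        emp_lname TEXT,
        job_code  TEXT REFERENCES job(job_code)
    );
    INSERT INTO job VALUES ('J1', 80.0), ('J2', 100.0);
    INSERT INTO employee VALUES (101, 'Lname A', 'J1'), (105, 'Lname B', 'J2');

    -- The view stores no data; the join runs whenever the view is queried.
    CREATE VIEW employee_rates AS
        SELECT e.emp_num, e.emp_lname, j.job_chg_hour
        FROM employee AS e
        JOIN job AS j ON j.job_code = e.job_code;
    """
)

report = conn.execute(
    "SELECT emp_num, job_chg_hour FROM employee_rates ORDER BY emp_num"
).fetchall()
print(report)
```

Whether the join cost justifies denormalizing depends on the workload, which is the compromise the slides describe.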

Data-Modeling Checklist
• Data modeling translates a specific real-world environment into a data model
  ◦ Represents real-world data, users, processes, interactions
• Data-modeling checklist helps ensure that data-modeling tasks are successfully performed
  ◦ Based on concepts and tools learned in Part II (Lec 2 - 4)
