Data Modelling 2: Normalisation
By Haik Richards
E-R modelling is a top-down methodology: identify the entities, then their attributes, analyse the associations between them, and then construct tables. Normalisation is a bottom-up methodology: the analyst looks at the structure of tables already being used in an enterprise and applies normalisation (a methodology) to those tables in order to improve their structure.
Normalised Tables
Normalisation
Normalisation often results in splitting a table into smaller tables, which is why it is sometimes referred to as the "recognise and split" method. Question: what sort of unwanted/undesirable consequences might there be if the Employee table were not split?
Anomalies
Anomalies are unexpected or unwanted effects that poorly constructed tables can produce. There are three common types: the delete anomaly, the update anomaly and the insert anomaly.
Delete anomaly
If Tom leaves the company and we delete his row from the table, what undesirable effect will happen? We will lose the fact that we have a Sales department.
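The delete anomaly can be seen directly in a small script. This is a minimal sketch using Python's built-in sqlite3 module; the table and column names are hypothetical stand-ins for the Employee table discussed above.

```python
import sqlite3

# Hypothetical unnormalised Employee table: department info is stored
# on every employee row, so a department only "exists" while it has staff.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (emp_name TEXT, dept_name TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Tom", "Sales"), ("Ann", "Human Resources")])

# Tom (the only Sales employee) leaves the company: delete his row.
con.execute("DELETE FROM employee WHERE emp_name = 'Tom'")

# Delete anomaly: the fact that a Sales department exists is gone too.
depts = [row[0] for row in con.execute("SELECT DISTINCT dept_name FROM employee")]
print(depts)  # 'Sales' no longer appears anywhere
```

With the data split into separate Employee and Department tables, the Sales row would survive Tom's departure.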
Update anomaly
If the name of the Human Resources department changes to Human Division, how many rows will we have to change? And what if we don't change all the rows? Our database will be left in an inconsistent state.
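The same unnormalised table demonstrates the update anomaly. Again a minimal sqlite3 sketch with hypothetical names; the "mistake" of updating only one row is deliberate, to show the inconsistency.

```python
import sqlite3

# Department name repeated on every employee row of an unnormalised table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (emp_name TEXT, dept_name TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Ann", "Human Resources"),
                 ("Bob", "Human Resources"),
                 ("Cat", "Human Resources")])

# Rename the department, but (by mistake) only on one of the three rows.
con.execute("UPDATE employee SET dept_name = 'Human Division' "
            "WHERE emp_name = 'Ann'")

# Update anomaly: one department now appears under two different names.
names = sorted({row[0] for row in con.execute("SELECT dept_name FROM employee")})
print(names)  # ['Human Division', 'Human Resources']
```

If the department name were stored once, in its own table, the rename would be a single-row update and could never be half-done.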
Insert anomaly
What if we now have a new department called Marketing, but have not yet assigned any employees to it? We will be forced to introduce nulls for many fields. Nulls are undesirable because they could mean anything.
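The insert anomaly can be sketched the same way: in the unnormalised table there is no way to record the new department except as a row of nulls. Table and column names are again hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (emp_name TEXT, salary REAL, dept_name TEXT)")

# Insert anomaly: Marketing has no staff yet, so the only way to record
# it in this table is a row whose employee fields are all NULL.
con.execute("INSERT INTO employee (emp_name, salary, dept_name) "
            "VALUES (NULL, NULL, 'Marketing')")

row = con.execute("SELECT * FROM employee WHERE dept_name = 'Marketing'").fetchone()
print(row)  # (None, None, 'Marketing')
```

A separate Department table would let us insert Marketing cleanly, with no nulls at all.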
Normalisation
So do organisations really have such poorly constructed tables? Yes, they are rife; they are everywhere. We used simple common sense to normalise the table above. Is there a more precise method of doing normalisation? Yes.
Normal Forms
The process of normalisation requires the analyst to work progressively through a series of normal forms:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Most analysts will work through to 3NF. Up to 3NF most anomalies will be eliminated.
Note: The lower relation has a composite key (two attributes acting as primary key).
Remove partial dependencies: every non-key attribute must depend on the key, the whole key, and nothing but the key (so help me Codd). In our 1NF column, only one group has a composite key (the second group), so it is the only group we need to check for partial dependencies. That is, for each attribute in this group we check whether it depends on the whole composite key or just part of it. If it depends on just part of the composite key, we must split it off into another table.
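The "recognise and split" step for a partial dependency can be sketched in code. This is an illustrative sqlite3 example, not the lecture's actual tables: an order-line table with composite key (order_no, product_no), where product_desc depends only on product_no and so is split into its own table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- 2NF result: every remaining non-key attribute (qty) depends on the
-- whole composite key (order_no, product_no).
CREATE TABLE order_line (
    order_no   INTEGER,
    product_no INTEGER,
    qty        INTEGER,
    PRIMARY KEY (order_no, product_no)
);
-- The split-off table holds the partial dependency: product_desc
-- depends only on product_no, so it is stored once per product.
CREATE TABLE product (
    product_no   INTEGER PRIMARY KEY,
    product_desc TEXT
);
INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO order_line VALUES (100, 1, 5), (100, 2, 3), (101, 1, 7);
""")

# The original flat view is still recoverable with a join.
rows = con.execute("""
    SELECT o.order_no, p.product_desc, o.qty
    FROM order_line o JOIN product p ON o.product_no = p.product_no
    ORDER BY o.order_no, p.product_no
""").fetchall()
print(rows)
```

Note that 'Widget' is now stored once in product, however many order lines refer to product 1.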
Most of the time, analysts will construct a data model using E-R modelling. This provides a good first stab at the design. It is also a good starting point for data modelling, as E-R models (being diagrammatic) are relatively easy for end-users to understand. Normalisation is then used to validate the correctness of the E-R model: each entity in the E-R model is checked by putting it through the normalisation process. In other words, E-R modelling and normalisation are seen as complementary methods.
De-normalisation:
Normalisation splits database information across multiple tables. Retrieving complete information from multiple tables requires the JOIN operation in SQL. Joins impose an overhead on processing power, and very large joins can make retrieval times deteriorate. Therefore it is sometimes decided to de-normalise relations in order to improve access times for queries. De-normalisation is the process of combining data from two or more normalised tables into a single table.
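The trade-off can be sketched as follows: a minimal sqlite3 example (hypothetical tables) where a normalised query needs a JOIN, and a de-normalised copy of the same data answers it with a plain single-table scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee (emp_name TEXT, dept_no INTEGER REFERENCES department);
INSERT INTO department VALUES (1, 'Sales'), (2, 'Human Resources');
INSERT INTO employee VALUES ('Tom', 1), ('Ann', 2);
""")

# Normalised: listing employees with their department names needs a JOIN.
joined = con.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee e JOIN department d ON e.dept_no = d.dept_no
    ORDER BY e.emp_name
""").fetchall()

# De-normalised: pre-combine the two tables into one, so the same
# question is answered by a single-table query (no JOIN at query time).
con.execute("""
    CREATE TABLE emp_denorm AS
    SELECT e.emp_name, d.dept_name
    FROM employee e JOIN department d ON e.dept_no = d.dept_no
""")
flat = con.execute(
    "SELECT emp_name, dept_name FROM emp_denorm ORDER BY emp_name").fetchall()

print(joined == flat)  # same answer either way
```

The price of the de-normalised table is redundancy: the department name is repeated per employee, reintroducing the update anomaly discussed earlier.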
Derived Data
Data that can be computed (calculated) should not be included in a normalised table. For example, the total number of employees in a department should not be an attribute of a Department table. Why? It can create inconsistencies, it requires extra storage, and it can be computed using count(*), so why store it?
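Computing the headcount on demand, rather than storing it, looks like this in a minimal sqlite3 sketch (hypothetical table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (emp_name TEXT, dept_name TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Tom", "Sales"), ("Ann", "Sales"), ("Bob", "HR")])

# Derived data: compute the number of employees per department with
# count(*) at query time, so it can never drift out of step with the rows.
counts = dict(con.execute(
    "SELECT dept_name, COUNT(*) FROM employee GROUP BY dept_name"))
print(counts)  # {'HR': 1, 'Sales': 2}
```

If a headcount column were stored on the Department table instead, every hire and departure would have to update it, and any missed update would create exactly the kind of inconsistency the slide warns about.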
Computations can take a long time on large volumes of data, reducing query response times. Some computations are very complex, beyond the ability of most managers who need access to summary statistics, aggregated values, etc. (e.g. total sales for the 1st quarter of 2007 in the North West region for all washing machine components). Do managers/end-users know SQL? To meet the needs of managers, data warehouses are used. Data warehouses contain the summary statistics managers need for decision making, and data warehouse software provides graphical user interfaces that do not require managers to know SQL. Data in a data warehouse is not normalised.
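The kind of aggregate query mentioned above can be sketched with sqlite3. The sales figures and region/quarter labels are invented purely for illustration; a warehouse would typically pre-compute and store such summaries.

```python
import sqlite3

# Hypothetical detail-level sales table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [("North West", "2007-Q1", 1200.0),
                 ("North West", "2007-Q1", 800.0),
                 ("North West", "2007-Q2", 500.0),
                 ("South East", "2007-Q1", 950.0)])

# Total sales for the 1st quarter of 2007 in the North West region:
# the SQL a manager would otherwise need to write by hand.
total = con.execute("""
    SELECT SUM(amount) FROM sales
    WHERE region = 'North West' AND quarter = '2007-Q1'
""").fetchone()[0]
print(total)  # 2000.0
```

On a real sales table with millions of rows, scanning the detail data like this for every question is what makes pre-aggregated warehouse tables attractive.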
Further Reading
If you are interested in reading a short article on higher normal forms (above 3NF) visit:-
http://www.databasejournal.com/sqletc/article.php/1442971
http://www.objectarchitects.de/ObjectArchitects/orpatterns/Performance/Denormalization/CraigMullinsGuidelines/i001fe02.htm