Data Modelling 2 Normalisation: by Haik Richards

This document discusses data normalization and compares it to entity-relationship (E-R) modeling. It explains that normalization is a "bottom up" methodology that analyzes existing database tables to improve their structure by reducing anomalies, while E-R modeling is a "top down" approach. The document then covers the various normal forms (1NF, 2NF, 3NF, etc.) and provides examples of normalizing tables to remove duplication and anomalies. It also discusses de-normalisation, and when derived or computed data may be stored rather than calculated at query time to improve performance.

Uploaded by

venix
Copyright
© Attribution Non-Commercial (BY-NC)

Data Modelling 2 Normalisation

By Haik Richards

Normalisation versus E-R modelling

E-R modelling is a top-down methodology, i.e. look at entities, then attributes, analyse associations and then construct tables.
Normalisation is a bottom-up methodology, i.e. the analyst looks at the structure of tables already being used in an enterprise, and applies normalisation to those tables in order to improve their structure.

Normalisation simple example

Question: What should the analyst do with the above table?

Normalised Tables

i.e. data stored in a table should be about a single entity.

Normalisation

Often results in splitting a table into smaller tables, and is therefore referred to as the recognise and split method.
Question: What sort of unwanted/undesirable consequences might there be if the employee table was not split?
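As a minimal sketch of the recognise and split idea, the Python below uses the standard sqlite3 module on an invented employee table. The table names, the column names (eno, ename, dno, dname) and all rows are assumptions made for illustration; they are not the table from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical un-normalised table: employee facts and department facts share a row.
conn.execute(
    "CREATE TABLE emp_dept (eno INTEGER PRIMARY KEY, ename TEXT, dno INTEGER, dname TEXT)"
)
conn.executemany(
    "INSERT INTO emp_dept VALUES (?, ?, ?, ?)",
    [(1, "Tom", 10, "Sales"),
     (2, "Ann", 20, "Human Resources"),
     (3, "Raj", 20, "Human Resources")],
)

# Recognise the two entities, then split: one table per entity.
conn.execute("CREATE TABLE department (dno INTEGER PRIMARY KEY, dname TEXT)")
conn.execute(
    "CREATE TABLE employee (eno INTEGER PRIMARY KEY, ename TEXT, "
    "dno INTEGER REFERENCES department(dno))"
)
conn.execute("INSERT INTO department SELECT DISTINCT dno, dname FROM emp_dept")
conn.execute("INSERT INTO employee SELECT eno, ename, dno FROM emp_dept")

departments = conn.execute("SELECT dno, dname FROM department ORDER BY dno").fetchall()
employees = conn.execute("SELECT eno, ename, dno FROM employee ORDER BY eno").fetchall()
print(departments)
print(employees)
```

Each department name is now stored once, and each employee row refers to it through dno.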

Anomalies

Anomalies are unexpected or unwanted effects that poorly constructed tables can produce. There are 3 common types of anomaly:
- Delete anomaly
- Update anomaly
- Insert anomaly

Delete anomaly

If Tom leaves the company and we delete his row from the table, what undesirable effect will happen? We will lose the fact that we have a Sales Department.
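The delete anomaly can be reproduced with a small sketch. The combined table emp_dept and its rows are invented for illustration, on the assumption that Tom is the only employee in Sales.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical combined table in which Tom is the only Sales employee.
conn.execute("CREATE TABLE emp_dept (ename TEXT, dname TEXT)")
conn.executemany(
    "INSERT INTO emp_dept VALUES (?, ?)",
    [("Tom", "Sales"), ("Ann", "Human Resources"), ("Raj", "Human Resources")],
)

# Tom leaves, so his row is deleted.
conn.execute("DELETE FROM emp_dept WHERE ename = 'Tom'")

# The Sales department has disappeared together with Tom's row.
remaining_departments = [
    row[0] for row in conn.execute("SELECT DISTINCT dname FROM emp_dept")
]
print(remaining_departments)
```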

Update anomaly

If the name of the department Human Resources changes to Human Division, how many rows will we have to change? And what if we don't change all the rows? Our database will be in an inconsistent state.
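A short sketch of the update anomaly, again on invented rows: the department number dno is an assumed column, added only so that the inconsistency is easy to see.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical combined table: the department name is repeated on every employee row.
conn.execute("CREATE TABLE emp_dept (ename TEXT, dno INTEGER, dname TEXT)")
conn.executemany(
    "INSERT INTO emp_dept VALUES (?, ?, ?)",
    [("Tom", 10, "Sales"),
     ("Ann", 20, "Human Resources"),
     ("Raj", 20, "Human Resources")],
)

# Every row carrying the old name has to be changed.
rows_to_change = conn.execute(
    "SELECT COUNT(*) FROM emp_dept WHERE dname = 'Human Resources'"
).fetchone()[0]

# A careless update that touches only one of those rows.
conn.execute("UPDATE emp_dept SET dname = 'Human Division' WHERE ename = 'Ann'")

# Department 20 now has two different names: an inconsistent state.
names_for_dept_20 = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT dname FROM emp_dept WHERE dno = 20")
)
print(rows_to_change, names_for_dept_20)
```

With a separate department table the name would be stored in exactly one row, so a partial update of this kind could not happen.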

Insert anomaly

What if we now have a new department called Marketing, but we have not yet assigned any employees to it? We will be forced to introduce nulls for many fields. Nulls are undesirable because their meaning is ambiguous: they could mean anything.
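The insert anomaly, sketched on an invented combined table: recording Marketing before it has any staff forces nulls into the employee columns, whereas a separate (hypothetical) department table needs none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical combined table.
conn.execute("CREATE TABLE emp_dept (eno INTEGER, ename TEXT, dname TEXT)")
conn.execute("INSERT INTO emp_dept VALUES (1, 'Tom', 'Sales')")

# Marketing has no employees yet, so the employee columns must be NULL.
conn.execute("INSERT INTO emp_dept VALUES (NULL, NULL, 'Marketing')")
marketing_row = conn.execute(
    "SELECT eno, ename, dname FROM emp_dept WHERE dname = 'Marketing'"
).fetchone()

# With a separate department table the new department is recorded without nulls.
conn.execute("CREATE TABLE department (dname TEXT PRIMARY KEY)")
conn.execute("INSERT INTO department VALUES ('Marketing')")
department_row = conn.execute("SELECT dname FROM department").fetchone()
print(marketing_row, department_row)
```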

Normalisation

So do organisations have such poorly constructed tables? Yes. They are rife, everywhere. We used simple common sense to normalise the above table. Is there a more precise method of doing normalisation? Yes.

Normal Forms
The process of normalisation requires the analyst to work progressively through a series of normal forms:
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)
Most analysts will work through to 3NF. By 3NF, most anomalies will have been eliminated.

A worked example up to 3NF

A typical order form

Might be stored in the following unnormalised table:

Normalising the table


To begin the process of normalisation we list the column headings in a vertical format as follows:

First Normal Form (1NF): remove repeating groups


Notice that for each order in the system there may be a number of products referred to by the order. We therefore split this information off into its own group, making sure that we maintain the relationship between the information in the group through the common attribute 'ono'. This is illustrated in the table below:

Note: The lower relation has a composite key (two attributes acting as primary key).
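The 1NF step can be sketched in plain Python. The attribute names ono, pno, qty, cno and cname come from the slides; pdesc and all the data values are assumptions made for illustration.

```python
# Hypothetical un-normalised orders: each order carries a repeating group of products.
unnormalised = [
    {"ono": 1, "cno": 7, "cname": "Smith",
     "products": [{"pno": 100, "pdesc": "widget", "qty": 2},
                  {"pno": 200, "pdesc": "gadget", "qty": 1}]},
    {"ono": 2, "cno": 8, "cname": "Jones",
     "products": [{"pno": 100, "pdesc": "widget", "qty": 5}]},
]

# Upper relation: one row per order, key = ono.
orders = [(o["ono"], o["cno"], o["cname"]) for o in unnormalised]

# Lower relation: one row per product on an order, composite key = (ono, pno).
# Carrying ono into this relation maintains the link back to the order.
order_lines = [
    (o["ono"], p["pno"], p["pdesc"], p["qty"])
    for o in unnormalised
    for p in o["products"]
]

# The composite key must identify each row of the lower relation uniquely.
keys = [(ono, pno) for ono, pno, _, _ in order_lines]
assert len(keys) == len(set(keys))

print(orders)
print(order_lines)
```

Note that pdesc is still repeated for product 100, which is the kind of duplication the later normal forms deal with.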

We now have 2 tables

...but duplication and anomalies still exist!

Second Normal Form (2NF)

Remove partial dependencies: every non-key attribute must depend on the key, the whole key, and nothing but the key (so help me Codd).
In our 1NF column, we have only one group with a composite key: the second group. It is therefore only this group that we need to check for partial dependencies, i.e. for each attribute in this group we check whether it depends on the whole composite key or on just part of it. If it depends on just part of the composite key, we must split it off into another table.

Second Normal Form (2NF)


Only qty depends on the whole key. The other attributes depend on pno alone, so they are split off into their own table.
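The partial-dependency check can be sketched as a small helper that tests whether one set of attributes determines another in some sample rows. The helper determines, the attribute pdesc and the data are hypothetical. Sample data can only refute a dependency, not prove it, so in practice the analyst confirms dependencies from the meaning of the data.

```python
def determines(rows, lhs, rhs):
    """Return True if equal lhs values always come with equal rhs values in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical 1NF order-line group with composite key (ono, pno).
order_lines_1nf = [
    {"ono": 1, "pno": 100, "pdesc": "widget", "qty": 2},
    {"ono": 1, "pno": 200, "pdesc": "gadget", "qty": 1},
    {"ono": 2, "pno": 100, "pdesc": "widget", "qty": 5},
]

# pdesc depends on pno alone: a partial dependency.
pdesc_is_partial = determines(order_lines_1nf, ["pno"], ["pdesc"])

# qty is not fixed by either part of the key on its own.
qty_needs_whole_key = (
    not determines(order_lines_1nf, ["pno"], ["qty"])
    and not determines(order_lines_1nf, ["ono"], ["qty"])
)

# Split: qty stays with the composite key, product details move to their own table.
order_lines = [(r["ono"], r["pno"], r["qty"]) for r in order_lines_1nf]
products = sorted({(r["pno"], r["pdesc"]) for r in order_lines_1nf})
print(order_lines, products)
```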

We now have 3 tables:-

...but duplication and anomalies still exist!

Third Normal Form (3NF)


- no transitive dependencies
Note that in the 2NF column, there is a transitive dependency between cno and cname, i.e. cname depends on cno rather than directly on the key, so cno can be used as a key for the attribute cname. We can therefore split off the customer information into its own group. This can be seen below.
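A minimal sketch of the 3NF split, assuming the order group in 2NF holds (ono, cno, cname); the rows are invented.

```python
# Hypothetical 2NF order group: (ono, cno, cname). cname repeats for customer 7.
orders_2nf = [(1, 7, "Smith"), (2, 8, "Jones"), (3, 7, "Smith")]

# Split off the customer information: cno becomes the key of its own group.
orders_3nf = [(ono, cno) for ono, cno, _ in orders_2nf]
customers = sorted({(cno, cname) for _, cno, cname in orders_2nf})

# Nothing is lost: joining on cno rebuilds the original rows.
name_of = dict(customers)
rebuilt = [(ono, cno, name_of[cno]) for ono, cno in orders_3nf]
print(orders_3nf, customers)
```

Each customer name is now stored once, so renaming a customer means changing a single row.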

Therefore, we now have 4 tables

We have now removed most (maybe 99%) of the anomalies.

Example 2: Normalise the following drug card to 3NF:

The data is stored in the following un-normalised table:-

And here is the above normalised to 3NF

Entity Relationship Modelling & Normalisation

Most of the time, analysts will construct a data model using E-R modelling. This provides a good first stab at design. It is also a good starting point for data modelling, as E-R models (being diagrammatic) are relatively easy for end-users to understand. Normalisation will then be used to validate the correctness of the E-R model. To do this, each entity in the E-R model will be checked by going through the normalisation process, i.e. E-R modelling and normalisation are seen as complementary methods.

De-normalisation:

Normalisation splits database information into multiple tables. Retrieving complete information from multiple tables requires the use of the JOIN operation in SQL. Joins add processing overhead, and very large joins can make retrieval times deteriorate. Therefore it is sometimes decided to denormalise relations in order to improve access time for queries.
De-normalisation is the process of combining data from 2 or more normalised tables into a single table.
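As a sketch, assuming hypothetical employee and department tables, the code below first retrieves complete information with a JOIN and then stores the joined result as a single denormalised table (emp_dept_flat, an invented name), so that later queries need no join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical normalised tables.
conn.executescript("""
CREATE TABLE department (dno INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE employee (eno INTEGER PRIMARY KEY, ename TEXT,
                       dno INTEGER REFERENCES department(dno));
INSERT INTO department VALUES (10, 'Sales'), (20, 'Human Resources');
INSERT INTO employee VALUES (1, 'Tom', 10), (2, 'Ann', 20), (3, 'Raj', 20);
""")

# Normalised access: complete information requires a JOIN.
join_sql = (
    "SELECT e.eno, e.ename, d.dname "
    "FROM employee e JOIN department d ON d.dno = e.dno "
    "ORDER BY e.eno"
)
joined = conn.execute(join_sql).fetchall()

# Denormalised copy: the joined data is stored as one table.
conn.execute("CREATE TABLE emp_dept_flat AS " + join_sql)
flat = conn.execute(
    "SELECT eno, ename, dname FROM emp_dept_flat ORDER BY eno"
).fetchall()
print(flat)
```

The price is that the department name is duplicated again, so the anomalies described earlier come back and have to be managed.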

Derived Data

Data that can be computed (calculated) should not be included in a normalised table, e.g. the total number of employees in a department should not be an attribute of a department table.
Why?
- Can create inconsistencies
- Requires extra storage
- Can be computed using count(*), so why store it?
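A small sketch of computing the head count on demand rather than storing it. The employee table, its rows and the helper headcount are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical employee table; no head count column is stored anywhere.
conn.executescript("""
CREATE TABLE employee (eno INTEGER PRIMARY KEY, ename TEXT, dno INTEGER);
INSERT INTO employee VALUES (1, 'Tom', 10), (2, 'Ann', 20), (3, 'Raj', 20);
""")

def headcount(dno):
    """Compute the number of employees in a department at query time."""
    return conn.execute(
        "SELECT COUNT(*) FROM employee WHERE dno = ?", (dno,)
    ).fetchone()[0]

before = headcount(20)
conn.execute("INSERT INTO employee VALUES (4, 'Mia', 20)")
after = headcount(20)
print(before, after)
```

Because the count is derived from the rows each time, it cannot drift out of step with the data.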

A case for storing derived data


Computations can take a long time on large volumes of data, lengthening query response times.
Some computations are very complex, beyond the ability of most managers who need access to summary statistics, aggregated values etc., e.g. total sales for 1st quarter 2007 in North West region for all washing machine components. Do managers/end-users know SQL?
To meet the needs of managers, data warehouses are being used.
Data warehouses contain the summary statistics managers need for decision making.
Data warehouse software provides graphical user interfaces that do not require managers to know SQL.
Data in a data warehouse is not normalised.
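The idea of storing a precomputed summary can be sketched as follows. The sale and sales_summary tables, their columns and the figures are invented; a real data warehouse would be loaded by its own tooling rather than by a single statement like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical detail table plus a stored summary built from it.
conn.executescript("""
CREATE TABLE sale (region TEXT, quarter TEXT, amount INTEGER);
INSERT INTO sale VALUES
  ('North West', '2007-Q1', 120),
  ('North West', '2007-Q1', 80),
  ('North West', '2007-Q2', 50),
  ('South East', '2007-Q1', 200);

CREATE TABLE sales_summary AS
  SELECT region, quarter, SUM(amount) AS total_sales
  FROM sale
  GROUP BY region, quarter;
""")

# A manager's question becomes a simple lookup with no aggregation at query time.
total = conn.execute(
    "SELECT total_sales FROM sales_summary WHERE region = ? AND quarter = ?",
    ("North West", "2007-Q1"),
).fetchone()[0]
print(total)
```

The stored total goes stale as soon as new sales arrive, so the summary has to be refreshed; that is the inconsistency risk traded for faster answers.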

Self Assessment Exercise


Represent the following Staff Allocation Sheet as an un-normalised table, and then normalise it to 3NF

The solution will be provided on Blackboard next week.

Further Reading

If you are interested in reading a short article on higher normal forms (above 3NF), visit:

http://www.databasejournal.com/sqletc/article.php/1442971

...and here is an article on de-normalisation:

http://www.objectarchitects.de/ObjectArchitects/orpatterns/Performance/Denormalization/CraigMullinsGuidelines/i001fe02.htm
