Data Modelling 2: Normalisation
By Haik Richards
E-R modelling is a top-down methodology: identify the entities, then their attributes, analyse the associations between them, and then construct tables. Normalisation is a bottom-up methodology: the analyst looks at the structure of tables already being used in an enterprise and applies normalisation (a methodology) to those tables in order to improve their structure.
Normalised Tables
Normalisation
Normalisation often results in splitting a table into smaller tables, which is why it is sometimes referred to as the "recognise and split" method. Question: what sort of unwanted/undesirable consequences might there be if the Employee table were not split?
Anomalies
Anomalies are unexpected or unwanted effects that poorly constructed tables can produce. There are three common types: the delete anomaly, the update anomaly and the insert anomaly.
Delete anomaly
If Tom leaves the company and we delete his row from the table, what undesirable effect will happen? We will lose the fact that we have a Sales department.
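The delete anomaly can be seen directly in a small script. This is a minimal sketch using Python's built-in sqlite3 module; the table and column names are hypothetical stand-ins for the Employee table discussed above.

```python
import sqlite3

# Hypothetical unnormalised Employee table: department info is stored
# on every employee row, so a department only "exists" while it has staff.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (emp_name TEXT, dept_name TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Tom", "Sales"), ("Ann", "Human Resources")])

# Tom (the only Sales employee) leaves the company: delete his row.
con.execute("DELETE FROM employee WHERE emp_name = 'Tom'")

# Delete anomaly: the fact that a Sales department exists is gone too.
depts = [row[0] for row in con.execute("SELECT DISTINCT dept_name FROM employee")]
print(depts)  # 'Sales' no longer appears anywhere
```

With the data split into separate Employee and Department tables, the Sales row would survive Tom's departure.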
Update anomaly
If the name of the Human Resources department changes to Human Division, how many rows will we have to change? And what if we don't change all the rows? Our database will be left in an inconsistent state.
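The same unnormalised table demonstrates the update anomaly. Again a minimal sqlite3 sketch with hypothetical names; the "mistake" of updating only one row is deliberate, to show the inconsistency.

```python
import sqlite3

# Department name repeated on every employee row of an unnormalised table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (emp_name TEXT, dept_name TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Ann", "Human Resources"),
                 ("Bob", "Human Resources"),
                 ("Cat", "Human Resources")])

# Rename the department, but (by mistake) only on one of the three rows.
con.execute("UPDATE employee SET dept_name = 'Human Division' "
            "WHERE emp_name = 'Ann'")

# Update anomaly: one department now appears under two different names.
names = sorted({row[0] for row in con.execute("SELECT dept_name FROM employee")})
print(names)  # ['Human Division', 'Human Resources']
```

If the department name were stored once, in its own table, the rename would be a single-row update and could never be half-done.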
Insert anomaly
What if we now have a new department called Marketing, but have not yet assigned any employees to it? We will be forced to introduce nulls for many fields. Nulls are undesirable because they could mean anything.
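The insert anomaly can be sketched the same way: in the unnormalised table there is no way to record the new department except as a row of nulls. Table and column names are again hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (emp_name TEXT, salary REAL, dept_name TEXT)")

# Insert anomaly: Marketing has no staff yet, so the only way to record
# it in this table is a row whose employee fields are all NULL.
con.execute("INSERT INTO employee (emp_name, salary, dept_name) "
            "VALUES (NULL, NULL, 'Marketing')")

row = con.execute("SELECT * FROM employee WHERE dept_name = 'Marketing'").fetchone()
print(row)  # (None, None, 'Marketing')
```

A separate Department table would let us insert Marketing cleanly, with no nulls at all.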
Normalisation
So do organisations really have such poorly constructed tables? Yes, they are rife; they are everywhere. We used simple common sense to normalise the table above. Is there a more precise method of doing normalisation? Yes.
Normal Forms
The process of normalisation requires the analyst to work progressively through a series of normal forms:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Most analysts will work through to 3NF. Up to 3NF most anomalies will be eliminated.
Note: The lower relation has a composite key (two attributes acting as primary key).
Remove partial dependencies: every non-key attribute must depend on the key, the whole key, and nothing but the key (so help me Codd). In our 1NF column, only one group has a composite key (the second group), so it is the only group we need to check for partial dependencies. That is, for each attribute in this group we check whether it depends on the whole composite key or just part of it. If it depends on just part of the composite key, we must split it off into another table.
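The "recognise and split" step for a partial dependency can be sketched in code. This is an illustrative sqlite3 example, not the lecture's actual tables: an order-line table with composite key (order_no, product_no), where product_desc depends only on product_no and so is split into its own table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- 2NF result: every remaining non-key attribute (qty) depends on the
-- whole composite key (order_no, product_no).
CREATE TABLE order_line (
    order_no   INTEGER,
    product_no INTEGER,
    qty        INTEGER,
    PRIMARY KEY (order_no, product_no)
);
-- The split-off table holds the partial dependency: product_desc
-- depends only on product_no, so it is stored once per product.
CREATE TABLE product (
    product_no   INTEGER PRIMARY KEY,
    product_desc TEXT
);
INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO order_line VALUES (100, 1, 5), (100, 2, 3), (101, 1, 7);
""")

# The original flat view is still recoverable with a join.
rows = con.execute("""
    SELECT o.order_no, p.product_desc, o.qty
    FROM order_line o JOIN product p ON o.product_no = p.product_no
    ORDER BY o.order_no, p.product_no
""").fetchall()
print(rows)
```

Note that 'Widget' is now stored once in product, however many order lines refer to product 1.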
Most of the time, analysts will construct a data model using E-R modelling. This provides a good first stab at the design. It is also a good starting point for data modelling, as E-R models (being diagrammatic) are relatively easy for end-users to understand. Normalisation is then used to validate the correctness of the E-R model: each entity in the E-R model is checked by putting it through the normalisation process. In other words, E-R modelling and normalisation are seen as complementary methods.
De-normalisation:
Normalisation splits database information across multiple tables. Retrieving complete information from multiple tables requires the JOIN operation in SQL. Joins impose an overhead on processing power, and very large joins can make retrieval times deteriorate. Therefore it is sometimes decided to de-normalise relations in order to improve access times for queries. De-normalisation is the process of combining data from two or more normalised tables into a single table.
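The trade-off can be sketched as follows: a minimal sqlite3 example (hypothetical tables) where a normalised query needs a JOIN, and a de-normalised copy of the same data answers it with a plain single-table scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee (emp_name TEXT, dept_no INTEGER REFERENCES department);
INSERT INTO department VALUES (1, 'Sales'), (2, 'Human Resources');
INSERT INTO employee VALUES ('Tom', 1), ('Ann', 2);
""")

# Normalised: listing employees with their department names needs a JOIN.
joined = con.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee e JOIN department d ON e.dept_no = d.dept_no
    ORDER BY e.emp_name
""").fetchall()

# De-normalised: pre-combine the two tables into one, so the same
# question is answered by a single-table query (no JOIN at query time).
con.execute("""
    CREATE TABLE emp_denorm AS
    SELECT e.emp_name, d.dept_name
    FROM employee e JOIN department d ON e.dept_no = d.dept_no
""")
flat = con.execute(
    "SELECT emp_name, dept_name FROM emp_denorm ORDER BY emp_name").fetchall()

print(joined == flat)  # same answer either way
```

The price of the de-normalised table is redundancy: the department name is repeated per employee, reintroducing the update anomaly discussed earlier.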
Derived Data
Data that can be computed (calculated) should not be included in a normalised table. For example, the total number of employees in a department should not be an attribute of a Department table. Why? It can create inconsistencies, it requires extra storage, and it can be computed using count(*), so why store it?
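Computing the headcount on demand, rather than storing it, looks like this in a minimal sqlite3 sketch (hypothetical table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (emp_name TEXT, dept_name TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Tom", "Sales"), ("Ann", "Sales"), ("Bob", "HR")])

# Derived data: compute the number of employees per department with
# count(*) at query time, so it can never drift out of step with the rows.
counts = dict(con.execute(
    "SELECT dept_name, COUNT(*) FROM employee GROUP BY dept_name"))
print(counts)  # {'HR': 1, 'Sales': 2}
```

If a headcount column were stored on the Department table instead, every hire and departure would have to update it, and any missed update would create exactly the kind of inconsistency the slide warns about.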
Computations can take a long time on large volumes of data, reducing query response times. Some computations are very complex, beyond the ability of most managers who need access to summary statistics, aggregated values, etc. (e.g. total sales for the 1st quarter of 2007 in the North West region for all washing machine components). Do managers/end-users know SQL? To meet the needs of managers, data warehouses are used. Data warehouses contain the summary statistics managers need for decision making, and data warehouse software provides graphical user interfaces that do not require managers to know SQL. Data in a data warehouse is not normalised.
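The kind of aggregate query mentioned above can be sketched with sqlite3. The sales figures and region/quarter labels are invented purely for illustration; a warehouse would typically pre-compute and store such summaries.

```python
import sqlite3

# Hypothetical detail-level sales table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [("North West", "2007-Q1", 1200.0),
                 ("North West", "2007-Q1", 800.0),
                 ("North West", "2007-Q2", 500.0),
                 ("South East", "2007-Q1", 950.0)])

# Total sales for the 1st quarter of 2007 in the North West region:
# the SQL a manager would otherwise need to write by hand.
total = con.execute("""
    SELECT SUM(amount) FROM sales
    WHERE region = 'North West' AND quarter = '2007-Q1'
""").fetchone()[0]
print(total)  # 2000.0
```

On a real sales table with millions of rows, scanning the detail data like this for every question is what makes pre-aggregated warehouse tables attractive.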
Further Reading
If you are interested in reading a short article on higher normal forms (above 3NF) visit:-
http://www.databasejournal.com/sqletc/article.php/1442971
http://www.objectarchitects.de/ObjectArchitects/orpatterns/Performance/Denormalization/CraigMullinsGuidelines/i001fe02.htm