This document discusses database normalization through three normal forms (1NF, 2NF, 3NF). 1NF requires eliminating duplicate columns and creating a table for each set of related data, with each row having a unique identifier. 2NF builds on 1NF by moving subsets of data that apply to multiple rows into separate tables. 3NF requires data to be in 2NF and all non-key columns to depend on the primary key, with no transitive dependencies. The document provides examples of normalizing employee data into 1NF and of normalizing customer data further into 2NF by separating repeating zip code data.
Database Normalization

What is Normalization?

Normalization allows us to organize data so that it:
- Allows faster access (dependencies make sense)
- Takes up less space (less redundancy)

Normal Forms

Normalization is done by transforming data into various normal forms.
- There are 5 normal forms, but we almost never use 4NF or 5NF.
- We will only be concerned with 1NF, 2NF, and 3NF.

For a database to be in a normal form, it must meet all requirements of the previous forms. E.g., for a database to be in 2NF, it must already be in 1NF. For a database to be in 3NF, it must already be in 1NF and 2NF.

Sample Data

This data has some problems:
- The Employees column is not atomic.
- A column must be atomic, meaning that it can only hold a single item of data.
- This column holds more than one employee name.

Data that is not atomic means:
- We can't easily sort the data
- We can't easily search or index the data
- We can't easily change the data
- We can't easily reference the data in other tables

Breaking the Employee column into more than one column doesn't solve our problems:
- The data may look atomic, but only because we have many identical columns each storing a single piece of data, instead of a single column storing many pieces of data.
- We still can't easily sort, search, or index our employees.
- What if a manager has more than 2 employees: 10 employees, 100 employees? We'd need to add columns to our database just for these cases.
- It is still hard to reference our employees in other tables.

By the way, what would be a good choice of a primary key for this table? Students would be expected to answer Manager, since each manager is only listed once and the employees are scattered across multiple columns. Also, an employee may change managers fairly frequently (but once a person is a manager, they are likely to remain a manager).

First Normal Form

1NF means that we must:
- Eliminate duplicate columns from the same table, and
- Create separate tables for each group of related data, each with a unique row identifier (primary key)

Let's get started by making our columns atomic.

Atomic Data

By breaking each tuple of our table into an entry for each employee, we have made our data atomic.

What would be the primary key? Students should now say that Employee is the primary key, since there are now multiple manager values in the table. Only Employee is unique.

Primary Key

The best primary key would be the Employee column. Every employee has only one manager, therefore an employee is unique.

First Normal Form

Congratulations! The fact that all our data and columns are atomic and we have a primary key means that we are in 1NF!

First Normal Form Revised

Of course, there may come a day when we hire a second employee or manager with the same name. To avoid this, let's use an employee ID instead of their name.

Moving to Second Normal Form

A database in 2NF must also be in 1NF:
- Data must be atomic
- Every row (or tuple) must have a unique primary key

Plus:
- Subsets of data that apply to multiple rows (repeating data) are moved to separate tables

This data is in 1NF: all fields are atomic and the CustID serves as the primary key.
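As a rough sketch of the first normal form steps above, the Python below splits a non-atomic employees column into one row per employee and gives each row an employee ID. The manager and employee names are hypothetical placeholders rather than the handout's sample data, and `to_first_normal_form` is an illustrative helper name.

```python
# Hypothetical rows: one manager per row, several employees packed
# into a single comma-separated (non-atomic) column.
unnormalized = [
    {"manager": "Bob", "employees": "Jim, Judy"},
    {"manager": "Mary", "employees": "Alice, Kim, Ted"},
]


def to_first_normal_form(rows):
    """Return one row per employee, keyed by a generated employee ID."""
    atomic_rows = []
    next_id = 1
    for row in rows:
        # Split the packed column so each value holds a single employee.
        for name in row["employees"].split(","):
            atomic_rows.append(
                {
                    "employee_id": next_id,  # primary key, unique per row
                    "employee": name.strip(),
                    "manager": row["manager"],
                }
            )
            next_id += 1
    return atomic_rows


employees_1nf = to_first_normal_form(unnormalized)
for row in employees_1nf:
    print(row)
```

Each row now holds a single employee, and `employee_id` can serve as the primary key even if two employees share a name.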
But let's pay attention to the City, State, and Zip fields:
- There are 2 rows of repeating data: one for Chicago, and one for St. Paul. Both have the same city, state, and zip code.
- The CustID determines all the data in the row, but a U.S. Zip code determines the City and State (e.g., a given Zip code can only belong to one city and state, so storing Zip codes with a City and State is redundant).
- This means that City and State are functionally dependent on the value in Zip code and not only on the primary key.

To be in 2NF, this repeating data must be in its own table. So:
- Let's create a Zip code table that maps Zip codes to their City and State.
- Note that Canadian Postal Codes are different: the same city and state can have many different postal codes.

We see that we can actually save 2 rows in the Zip code table by removing these redundancies: 9 customer records only need 7 Zip code records.

Zip code becomes a foreign key in the customer table, linked to the primary key in the Zip code table.

Advantages of 2NF

- Saves space in the database by reducing redundancies
- If a customer calls, you can just ask them for their Zip code and you'll know their city and state! (No more spelling mistakes)
- If a City name changes, we only need to make one change to the database.

Summary So Far

1NF:
- All data is atomic
- All rows have a unique primary key

2NF:
- Data is in 1NF
- Subsets of data that apply to multiple rows are moved to a new table
- These new tables are related using foreign keys

Moving to 3NF

To be in 3NF, a database must be:
- In 2NF
- All columns must be fully functionally dependent on the primary key (there are no transitive dependencies)

In this table:
- CustomerID and ProdID depend on the OrderID and no other column (good).
- Stated another way, if you know the OrderID, you know the CustID and the ProdID.
- So: OrderID determines CustID and ProdID.

But there are some fields that are not dependent on OrderID:
- Total is the simple product of Price * Quantity. As such, it has a transitive dependency on Price and Quantity.
- Because it is a calculated value, it doesn't need to be included at all.

Also, we can see that Price isn't determined by ProdID alone, or by OrderID. Customer 1001 bought AB-111 for $50 (in order 1) and for $75 (in order 7), while 1002 spent $60 for each item in order 2.

Maybe price is dependent on the ProdID and Quantity: the more you buy of a given product, the cheaper that product becomes! So we ask the business manager and she tells us that this is the case.

We say that Price has a transitive dependency on ProdID and Quantity. This means that Price isn't just determined by the OrderID. It is also determined by the size (or quantity) of the order (and of course by what is ordered).

Let's diagram the dependencies.
- We can see that all fields are dependent on OrderID, the primary key (white lines).
- But Total is also determined by Price and Quantity (yellow lines). This is a derived field (Price x Quantity = Total). We can save a lot of space by getting rid of it altogether and just calculating the total when we need it.
- Price is also determined by both ProdID and Quantity rather than the primary key (red lines). This is called a transitive dependency. We must get rid of transitive dependencies to have 3NF.
- We do this by moving the transitive dependency into a second table.

By splitting out the table, we can quickly adjust our price table to meet our competitors, or if the prices change from our suppliers.

The second table is our pricing list. Think of Quantity as a range:
- AB-111: 1-100, 101-500, 501 and more
- ZA-245: 1-10, 11-50, 51 and more

The primary key for this second table is a composite of ProdID and Quantity.

Congratulations! We're now in 3NF! We can also quickly figure out what price to offer our customers for any quantity they want.

To summarize (again)

A database is in 3NF if:
- It is in 2NF
- It has no transitive dependencies
- A transitive dependency exists when one attribute (or field) is determined by another non-key attribute (or field)
- We remove fields with a transitive dependency to a new table and link them by a foreign key.

Summarizing

A database is in 2NF if:
- It is in 1NF
- There is no repeating data in its tables.
- Put another way, if we use a composite primary key, then all attributes are dependent on all parts of the key.

And Finally

A database is in 1NF if:
- All its attributes are atomic (meaning they contain only a single unit or type of data), and
- All rows have a unique primary key.
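The 2NF and 3NF steps summarized above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table and column names (`zip_codes`, `customers`, `price_list`, `orders`, `min_qty`, `unit_price`) are invented here, the Zip code values and order quantities are made up, and the tier prices are placeholders chosen so that order 1 prices AB-111 at $50 and order 7 at $75. Each quantity range is stored by its lower bound, which is one possible way to key the pricing list on ProdID and Quantity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- 2NF: city and state are stored once per Zip code.
    CREATE TABLE zip_codes (
        zip   TEXT PRIMARY KEY,
        city  TEXT NOT NULL,
        state TEXT NOT NULL
    );
    CREATE TABLE customers (
        cust_id INTEGER PRIMARY KEY,
        zip     TEXT NOT NULL REFERENCES zip_codes(zip)
    );
    -- 3NF: price depends on product and quantity range, not on the order.
    CREATE TABLE price_list (
        prod_id    TEXT NOT NULL,
        min_qty    INTEGER NOT NULL,  -- lower bound of the quantity range
        unit_price REAL NOT NULL,
        PRIMARY KEY (prod_id, min_qty)
    );
    -- Orders keep no Price or Total column; both are looked up or derived.
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER NOT NULL REFERENCES customers(cust_id),
        prod_id  TEXT NOT NULL,
        quantity INTEGER NOT NULL
    );
""")

# Illustrative data only (assumed values, see the note above).
conn.executemany("INSERT INTO zip_codes VALUES (?, ?, ?)",
                 [("60601", "Chicago", "IL"), ("55101", "St. Paul", "MN")])
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1001, "60601"), (1002, "55101")])
conn.executemany("INSERT INTO price_list VALUES (?, ?, ?)",
                 [("AB-111", 1, 75.0), ("AB-111", 101, 60.0), ("AB-111", 501, 50.0),
                  ("ZA-245", 1, 35.0), ("ZA-245", 11, 30.0), ("ZA-245", 51, 25.0)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                 [(1, 1001, "AB-111", 600), (7, 1001, "AB-111", 20)])


def customer_city(cust_id):
    """Look up city and state through the Zip code foreign key."""
    return conn.execute(
        "SELECT z.city, z.state FROM customers c "
        "JOIN zip_codes z ON z.zip = c.zip WHERE c.cust_id = ?",
        (cust_id,),
    ).fetchone()


def order_total(order_id):
    """Compute the total on demand from the matching price tier."""
    row = conn.execute(
        "SELECT o.quantity * p.unit_price FROM orders o "
        "JOIN price_list p ON p.prod_id = o.prod_id AND p.min_qty <= o.quantity "
        "WHERE o.order_id = ? ORDER BY p.min_qty DESC LIMIT 1",
        (order_id,),
    ).fetchone()
    return None if row is None else row[0]


print(customer_city(1001))
print(order_total(1))
print(order_total(7))
```

Because the total is calculated when needed, changing a price means updating one row in the pricing list rather than touching every order.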