0% found this document useful (0 votes)
118 views

A Step-By-Step Guide To Normalization in DBMS With Examples

The document provides a step-by-step guide to database normalization including examples. It explains what normalization is, why it's important, and discusses different types of data anomalies. The guide also covers first, second, and third normal forms and provides an example student database to demonstrate normalization concepts.

Uploaded by

silas chiponda
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
118 views

A Step-By-Step Guide To Normalization in DBMS With Examples

The document provides a step-by-step guide to database normalization including examples. It explains what normalization is, why it's important, and discusses different types of data anomalies. The guide also covers first, second, and third normal forms and provides an example student database to demonstrate normalization concepts.

Uploaded by

silas chiponda
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 28

3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

Database Normalization: A Step-By-Step-Guide


With Examples
66 Comments / Last updated: September 26, 2022

Database normalisation is a concept that can be hard to understand.

But it doesn’t have to be.

In this article, I’ll explain what normalisation in a DBMS is and how to do it, in simple terms.

By the end of the article, you’ll know all about it and how to do it.

So let’s get started.

Table of Contents
1. What Is Database Normalization?
2. Why Normalize a Database?
3. What Are The Normal Forms?
4. What Is First Normal Form?
5. What Is Second Normal Form?
6. What Is Third Normal Form?
7. Fourth Normal Form and Beyond

What Is Database Normalization?


Database normalisation, or just normalisation as it’s commonly called, is a process used for data modelling or database creation, where you organise your
data and tables so it can be added and updated efficiently.

It’s something a person does manually, as opposed to a system or a tool doing it. It’s commonly done by database developers and database
administrators.

It can be done on any relational database, where data is stored in tables that are linked to each other. This means that normalization in a DBMS (Database
Management System) can be done in Oracle, Microsoft SQL Server, MySQL, PostgreSQL and any other type of database.

To perform the normalization process, you start with a rough idea of the data you want to store, and apply certain rules to it in order to get it to a more
efficient form.

I’ll show you how to normalise a database later in this article.

Why Normalize a Database?


So why would anyone want to normalize their database?

Why do we want to go through this manual process of rearranging the data?

There are a few reasons we would want to go through this process:

Make the database more efficient


Prevent the same data from being stored in more than one place (called an “insert anomaly”)
Prevent updates being made to some data but not others (called an “update anomaly”)
Prevent data not being deleted when it is supposed to be, or from data being lost when it is not supposed to be (called a “delete anomaly”)

https://www.databasestar.com/database-normalization/ 1/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

Ensure the data is accurate


Reduce the storage space that a database takes up
Ensure the queries on a database run as fast as possible

Normalization in a DBMS is done to achieve these points. Without normalization on a database, the data can be slow, incorrect, and messy.

Data Anomalies

Some of these points above relate to “anomalies”.

An anomaly is where there is an issue in the data that is not meant to be there. This can happen if a database is not normalised.

Let’s take a look at the different kinds of data anomalies that can occur and that can be prevented with a normalised database.

Our Example

We’ll be using a student database as an example in this article, which records student, class, and teacher information.

Let’s say our student database looks like this:

Student ID Student Name Fees Paid Course Name Class 1 Class 2 Class 3

1 John Smith 200 Economics Economics 1 Biology 1

Business Programming
2 Maria Griffin 500 Computer Science Biology 1
Intro 2

3 Susan Johnson 400 Medicine Biology 2

4 Matt Long 850 Dentistry

This table keeps track of a few pieces of information:

The student names


The fees a student has paid
The classes a student is taking, if any

This is not a normalised table, and there are a few issues with this.

Insert Anomaly

An insert anomaly happens when we try to insert a record into this table without knowing all the data we need to know.

For example, if we wanted to add a new student but did not know their course name.

The new record would look like this:

Student ID Student Name Fees Paid Course Name Class 1 Class 2 Class 3

1 John Smith 200 Economics Economics 1 Biology 1

Business
2 Maria Griffin 500 Computer Science Biology 1 Programming 2
Intro

3 Susan Johnson 400 Medicine Biology 2

4 Matt Long 850 Dentistry

5 Jared Oldham 0 ?

We would be adding incomplete data to our table, which can cause issues when trying to analyse this data.

Update Anomaly

An update anomaly happens when we want to update data, and we update some of the data but not other data.

https://www.databasestar.com/database-normalization/ 2/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

For example, let’s say the class Biology 1 was changed to “Intro to Biology”. We would have to query all of the columns that could have this Class field and
rename each one that was found.

Student ID Student Name Fees Paid Course Name Class 1 Class 2 Class 3

Intro to
1 John Smith 200 Economics Economics 1
Biology

Intro to Business
2 Maria Griffin 500 Computer Science Programming 2
Biology Intro

3 Susan Johnson 400 Medicine Biology 2

4 Matt Long 850 Dentistry

There’s a risk that we miss out on a value, which would cause issues.

Ideally, we would only update the value once, in one location.

Delete Anomaly

A delete anomaly occurs when we want to delete data from the table, but we end up deleting more than what we intended.

For example, let’s say Susan Johnson quits and her record needs to be deleted from the system. We could delete her row:

Student ID Student Name Fees Paid Course Name Class 1 Class 2 Class 3

1 John Smith 200 Economics Economics 1 Biology 1

Business
2 Maria Griffin 500 Computer Science Biology 1 Programming 2
Intro

3 Susan Johnson 400 Medicine Biology 2

4 Matt Long 850 Dentistry

But, if we delete this row, we lose the record of the Biology 2 class, because it’s not stored anywhere else. The same can be said for the Medicine course.

We should be able to delete one type of data or one record without having impacts on other records we don’t want to delete.

What Are The Normal Forms?


The process of normalization involves applying rules to a set of data. Each of these rules transforms the data to a certain structure, called a normal form.

There are three main normal forms that you should consider (Actually, there are six normal forms in total, but the first three are the most common).

Whenever the first rule is applied, the data is in “first normal form“. Then, the second rule is applied and the data is in “second normal form“. The third rule
is then applied and the data is in “third normal form“.

Fourth and fifth normal forms are then achieved from their specific rules.

Alright, so there are three main normal forms that we’re going to look at. I’ve written a post on designing a database, but let’s see what is involved in
getting to each of the normal forms in more detail.

What Is First Normal Form?


First normal form is the way that your data is represented after it has the first rule of normalization applied to it. Normalization in DBMS starts with the
first rule being applied – you need to apply the first rule before applying any other rules.

Let’s start with a sample database. In this case, we’re going to use a student and teacher database at a school. We mentioned this earlier in the article
when we spoke about anomalies, but here it is again.

https://www.databasestar.com/database-normalization/ 3/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

Our Example Database

We have a set of data we want to capture in our database, and this is how it currently looks. It’s a single table called “student” with a lot of columns.

Date
Student Fees Subject Teacher Teacher Course
of Address Subject 1 Subject 2 Subject 3
Name Paid 4 Name Address Name
Birth

04- 3 Main Street,


John 18-Jul- Economics 1 Biology 1 James 44 March Way,
Aug- North Boston Economics
Smith 00 (Business) (Science) Peterson Glebe 56100
91 56125

14- 16 Leeds Road,


Maria 10- Biology 1 Business Intro Programming James 44 March Way, Computer
May- South Boston
Griffin Sep-92 (Science) (Business) 2 (IT) Peterson Glebe 56100 Science
01 56128

21 Arrow Street,
Susan 03- 13- Biology 2 Sarah
South Boston Medicine
Johnson Feb-01 Jan-91 (Science) Francis
56128

14 Milk Lane, 105 Mist Road,


Matt 29- 25- Shane
South Boston Faulkner Dentistry
Long Apr-02 Apr-92 Cobson
56128 56410

Everything is in one table.

How can we normalise this?

We start with getting the data to First Normal Form.

To apply first normal form to a database, we look at each table, one by one, and ask ourselves the following questions of it:

1. Does the combination of all columns make a unique row every single time?
2. What field can be used to uniquely identify the row?

Let’s look at the first question.

Does the combination of all columns make a unique row every single time?

No. There could be the same combination of data, and it would represent a different row. There could be the same values for this row and it would be a
separate row (even though it is rare).

The second question says:

What field can be used to uniquely identify the row?

Is this the student name? No, as there could be two students with the same name.

Address? No, this isn’t unique either.

Any other field?

We don’t have a field that can uniquely identify the row.

If there is no unique field, we need to create a new field. This is called a primary key, and is a database term for a field that is unique to a single row.
(Related: The Complete Guide to Database Keys)

When we create a new primary key, we can call it whatever we like, but it should be obvious and consistently named between tables. I prefer using the ID
suffix, so I would call it student ID.

This is our new table:

Student (student ID, student name, fees paid, date of birth, address, subject 1, subject 2, subject 3, subject 4, teacher name, teacher
address, course name)

This can also be represented in an Entity Relationship Diagram (ERD):

https://www.databasestar.com/database-normalization/ 4/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

The way I have written this is a common way of representing tables in text format. The table name is written, and all of the columns are shown in brackets,
with the primary key underlined.

This data is now in first normal form.

This example is still in one table, but it’s been made a little better by adding a unique value to it.

Want to find a tool that creates these kinds of diagrams? There are many tools for creating these kinds of diagrams. I’ve listed 76 of them in this guide to
Data Modeling Tools, along with reviews, price, and other features. So if you’re looking for one to use, take a look at that list.

What Is Second Normal Form?


The rule of second normal form on a database can be described as:

1. Fulfil the requirements of first normal form


2. Each non-key attribute must be functionally dependent on the primary key

What does this even mean?

It means that the first normal form rules have been applied. It also means that each field that is not the primary key is determined by that primary key,
so it is specific to that record. This is what “functional dependency” means.

Let’s take a look at our table.

Student (student ID, student name, fees paid, date of birth, address, subject 1, subject 2, subject 3, subject 4, teacher name, teacher
address, course name)

Are all of these columns dependent on and specific to the primary key?

The primary key is student ID, which represents the student. Let’s look at each column:

student name: Yes, this is dependent on the primary key. A different student ID means a different student name.
fees paid: Yes, this is dependent on the primary key. Each fees paid value is for a single student.
date of birth: Yes, it’s specific to that student.
address: Yes, it’s specific to that student.
subject 1: No, this column is not dependent on the student. More than one student can be enrolled in one subject.
subject 2: As above, more than one subject is allowed.
subject 3: No, same rule as subject 2.
subject 4: No, same rule as subject 2
teacher name: No, the teacher name is not dependent on the student.
teacher address: No, the teacher address is not dependent on the student.
https://www.databasestar.com/database-normalization/ 5/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

course name: No, the course name is not dependent on the student.

We have a mix of Yes and No here. Some fields are dependent on the student ID, and others are not.

How can we resolve those we marked as No?

Let’s take a look.

Subject

First, the subject 1 column. It is not dependent on the student, as more than one student can have a subject, and the subject isn’t a part of the definition of
a student.

So, we can move it to a new table:

Subject (subject name)

I’ve called it subject name because that’s what the value represents. When we are writing queries on this table or looking at diagrams, it’s clearer what
subject name is instead of using subject.

Now, is this field unique? Not necessarily. Two subjects could have the same name and this would cause problems in our data.

So, what do we do? We add a primary key column, just like we did for student. I’ll call this subject ID, to be consistent with the student ID.

Subject (subject ID, subject name)

This means we have a student table and a subject table. We can do this for all four of our subject columns in the student table, removing them from the
student table so it looks like this:

Student (student ID, student name, fees paid, date of birth, address, teacher name, teacher address, course name)

But they are in separate tables. How do we link them together?

We’ll cover that shortly. For now, let’s keep going with our student table.

Teacher

The next column we marked as No was the Teacher Name column. The teacher is separate to the student so should be captured separately. This means
we should move it to its own table.

Teacher (teacher name)

We should also move the teacher address to this table, as it’s a property of the teacher. I’ll also rename teacher address to be just address.

Teacher (teacher name, address)

Just like with the subject table, the teacher name and address is not unique. Sure, in most cases it would be, but to avoid duplication we should add a
primary key. Let’s call it teacher ID,

Teacher (teacher ID, teacher name, address)

https://www.databasestar.com/database-normalization/ 6/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

Course

The last column we have to look at was the Course Name column. This indicates the course that the student is currently enrolled in.

While the course is related to the student (a student is enrolled in a course), the name of the course itself is not dependent on the student.

So, we should move it to a separate table. This is so any changes to courses can be made independently of students.

The course table would look like this:

Course (course name)

Let’s also add a primary key called course ID.

Course (course ID, course name)

We now have our tables created from columns that were in the student table. Our database so far looks like this:

Student (student ID, student name, fees paid, date of birth, address)

Subject (subject ID, subject name)

Teacher (teacher ID, teacher name, address)

Course (course ID, course name)

Using the data from the original table, our data could look like this:

Student

student student date of


fees paid address
ID name birth

3 Main Street, North Boston


1 John Smith 18-Jul-00 04-Aug-91
56125

16 Leeds Road, South Boston


2 Maria Griffin 14-May-01 10-Sep-92
56128

Susan 21 Arrow Street, South Boston


3 03-Feb-01 13-Jan-91
Johnson 56128

14 Milk Lane, South Boston


4 Matt Long 29-Apr-02 25-Apr-92
56128

Subject

subject ID subject name

1 Economics 1 (Business)

2 Biology 1 (Science)

3 Business Intro (Business)

4 Programming 2 (IT)

5 Biology 2 (Science)

Teacher

teacher
teacher name address
ID

https://www.databasestar.com/database-normalization/ 7/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

1 James Peterson 44 March Way, Glebe 56100

2 Sarah Francis

3 Shane Cobson 105 Mist Road, Faulkner 56410

Course

course ID course name

1 Computer Science

2 Dentistry

3 Economics

4 Medicine

How do we link these tables together? We still need to know which subjects a student is taking, which course they are in, and who their teachers are.

Foreign Keys in Tables

We have four separate tables, capturing different pieces of information. We need to capture that students are taking certain courses, have teachers, and
subjects. But the data is in different tables.

How can we keep track of this?

We use a concept called a foreign key.

A foreign key is a column in one table that refers to the primary key in another table. Related: The Complete Guide to Database Keys.

It’s used to link one record to another based on its unique identifier, without having to store the additional information about the linked record.

Here are our two tables so far:

Student (student ID, student name, fees paid, date of birth, address)

Subject (subject ID, subject name)

Teacher (teacher ID, teacher name, teacher address)

Course (course ID, course name)

To link the two tables using a foreign key, we need to put the primary key (the underlined column) from one table into the other table.

Let’s start with a simple one: students taking courses. For our example scenario, a student can only be enrolled in one course at a time, and a course can
have many students.

We need to either:

Add the course ID from the course table into the student table
Add the student ID from the student table into the course table

But which one is it?

In this situation, I ask myself a question to work out which way it goes:

Does a table1 have many table2s, or does a table2 have many table1s?

If it’s the first, then table1 ID goes into table 2, and if it’s the second then table2 ID goes into table1.

So, if we substitute table1 and table2 for course and student:

Does a course have many students, or does a student have many courses?

Based on our rules, the first statement is true: a course has many students.

This means that the course ID goes into the student table.

https://www.databasestar.com/database-normalization/ 8/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

Student (student ID, course ID, student name, fees paid, date of birth, address)

Subject (subject ID, subject name)

Teacher (teacher ID, teacher name, teacher address)

Course (course ID, course name)

I’ve italicised it to indicate it is a foreign key – a value that links to a primary key in another table.

When we actually populate our tables, instead of having the course name in the student table, the course ID goes in the student table. The course name
can then be linked using this ID.

student course student fees date of


address
ID ID name paid birth

4 Aug 3 Main Street, North Boston


1 3 John Smith 200
1991 56125

10 Sep 16 Leeds Road, South Boston


2 1 Maria Griffin 500
1992 56128

Susan 13 Jan 21 Arrow Street, South Boston


3 4 400
Johnson 1991 56128

25 Apr
4 2 Matt Long 850 14 Milk Lane, South Boston 56128
1992

This also means that the course name is stored in one place only, and can be added/removed/updated without impacting other tables.

I’ve created a YouTube video to explain how to identify and diagram one-to-many relationships like this:

You're offline

Teacher

We’ve linked the student to the course. Now let’s look at the teacher.

How are teachers related? Depending on the scenario, they could be related in one of a few ways:

A student can have one teacher that teaches them all subjects
A subject could have a teacher than teaches it
A course could have a teacher that teaches all subjects in a course

https://www.databasestar.com/database-normalization/ 9/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

In our scenario, a teacher is related to a course. We need to relate these two tables using a foreign key.

Does a teacher have many courses, or does a course have many teachers?

In our scenario, the first statement is true. So the teacher ID goes into the course table:

Student (student ID, course ID, student name, fees paid, date of birth, address)

Subject (subject ID, subject name)

Teacher (teacher ID, teacher name, teacher address)

Course (course ID, teacher ID, course name)

The table data would look like this:

Course

course
teacher ID course name
ID

1 1 Computer Science

2 3 Dentistry

3 1 Economics

4 2 Medicine

Teacher

teacher
teacher name address
ID

James
1 44 March Way, Glebe 56100
Peterson

2 Sarah Francis

3 Shane Cobson 105 Mist Road, Faulkner 56410

This allows us to change the teacher’s information without impacting the courses or students.

Student and Subject

So we’ve linked the course, teacher, and student tables together so far.

What about the subject table?

Does a subject have many students, or does a student have many subjects?

The answer is both.

How is that possible?

A student can be enrolled in many subjects at a time, and a subject can have many students in it.

How can we represent that? We could try to put one table’s ID in the other table:

student course subject student fees date of


address
ID ID ID name paid birth

4 Aug 3 Main Street, North Boston


1 3 1, 2 John Smith 200
1991 56125

2, 3, 10 Sep 16 Leeds Road, South Boston


2 1 Maria Griffin 500
2004 1992 56128

https://www.databasestar.com/database-normalization/ 10/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

Susan 13 Jan 21 Arrow Street, South Boston


3 4 5 400
Johnson 1991 56128

25 Apr
4 2 Matt Long 850 14 Milk Lane, South Boston 56128
1992

But if we do this, we’re storing many pieces of information in one column, possibly separated by commas.

This makes it hard to maintain and is very prone to errors.

If we have this kind of relationship, one that goes both ways, it’s called a many to many relationship. It means that many of one record is related to many
of the other record.

Many to Many Relationships

A many to many relationship is common in databases. Some examples where it can happen are:

Students and subjects


Employees and companies (an employee can have many jobs at different companies, and a company has many employees)
Actors and movies (an actor is in multiple movies, and a movie has multiple actors)

If we can’t represent this relationship by putting a foreign key in each table, how can we represent it?

We use a joining table.

This is a table that is created purely for storing the relationships between the two tables.

It works like this. Here are our two tables:

Student (student ID, course ID, student name, fees paid, date of birth, address)

Subject (subject ID, subject name)

And here is our joining table:

Subject_Student (student ID, subject ID)

It has two columns. Student ID is a foreign key to the student table, and subject ID is a foreign key to the subject table.

Each record in the row would look like this:

student ID subject ID

1 1

1 2

2 2

2 3

2 4

3 5

Each row represents a relationship between a student and a subject.

Student 1 is linked to subject 1.

Student 1 is linked to subject 2.

Student 2 is linked to subject 2.

And so on.

This has several advantages:

https://www.databasestar.com/database-normalization/ 11/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

It allows us to store many subjects for each student, and many students for each subject.
It separates the data that describes the records (subject name, student name, address, etc.) from the relationship of the records (linking ID to ID).
It allows us to add and remove relationships easily.
It allows us to add more information about the relationship. We could add an enrolment date, for example, to this table, to capture when a student
enrolled in a subject.

You might be wondering, how do we see the data if it’s in multiple tables? How can we see the student name and the name of the subjects they are
enrolled in?

Well, that’s where the magic of SQL comes in. We use a SELECT query with JOINs to show the data we need. But that’s outside the scope of this article –
you can read the articles on my Oracle Database page to find out more about writing SQL.

One final thing I have seen added to these joining tables is a primary key of its own. An ID field that represents the record. This is an optional step – a
primary key on a single new column works in a similar way to defining the primary key on the two ID columns. I’ll leave it out in this example.

So, our final table structure looks like this:

Student (student ID, course ID, student name, fees paid, date of birth, address)

Subject (subject ID, subject name)

Subject Enrolment (student ID, subject ID)

Teacher (teacher ID, teacher name, teacher address)

Course (course ID, teacher ID, course name)

I’ve called the table Subject Enrolment. I could have left it as the concatenation of both of the related tables (student subject), but I feel it’s better to
rename the table to what it actually captures – the fact a student has enrolled in a subject. This is something I recommend in my SQL Best Practices post.

I’ve also underlined both columns in this table, as they represent the primary key. They can also represent a foreign key, which is why they are also
italicised.

An ERD of these tables looks like this:

This database structure is in second normal form. We almost have a normalised database.

Now, let’s take a look at third normal form.

https://www.databasestar.com/database-normalization/ 12/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

What Is Third Normal Form?


Third normal form is the final stage of the most common normalization process. The rule for this is:

Fulfils the requirements of second normal form


Has no transitive functional dependency

What does this even mean? What is a transitive functional dependency?

It means that every attribute that is not the primary key must depend on the primary key and the primary key only.

For example:

Column A determines column B


Column B determines column C
Therefore, column A determines C

This means that column A determines column B which determines column C. This is a transitive functional dependency, and it should be removed. Column
C should be in a separate table.

We need to check if this is the case for any of our tables.

Student (student ID, course ID, student name, fees paid, date of birth, address)

Do any of the non-primary-key fields depend on something other than the primary key?

No, none of them do. However, if we look at the address, we can see something interesting:

address

3 Main Street, North Boston 56125

16 Leeds Road, South Boston 56128

21 Arrow Street, South Boston 56128

14 Milk Lane, South Boston 56128

We can see that there is a relationship between the ZIP code and the city or suburb. This is common with addresses, and you may have noticed this if you
have filled out any forms for addresses online recently.

How are they related? The ZIP code, or postal code, determines the city, state, and suburb.

In this case, 56128 is South Boston, and 56125 is North Boston. (I just made this up so this is probably inaccurate data).

This falls into the pattern we mentioned earlier: A determines B which determines C.

Student determines the address ZIP code which determines the suburb.

So, how can we improve this?

We can move the ZIP code to another table, along with everything it identifies, and link to it from the student table.

Our table could look like this:

Student (student ID, course ID, student name, fees paid, date of birth, street address, address code ID)

Address Code (address code ID, ZIP code, suburb, city, state)

I’ve created a new table called Address Code, and linked it to the student table. I created a new column for the address code ID, because the ZIP code may
refer to more than one suburb. This way we can capture that fact, and it’s in a separate table to make sure it’s only stored in one place.

Let’s take a look at the other tables:

https://www.databasestar.com/database-normalization/ 13/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

Subject (subject ID, subject name)

Subject Enrolment (student ID, subject ID)

Both of these tables have no columns that aren’t dependent on the primary key.

Teacher (teacher ID, teacher name, teacher address)

The teacher table also has the same issue as the student table when we look at the address. We can, and should use the same approach for storing
address.

So our table would look like this:

Teacher (teacher ID, teacher name, street address, address code ID)

Address Code (address code ID, ZIP code, suburb, city, state)

It uses the same Address Code table as mentioned above. We aren’t creating a new address code table.

Finally, the course table:

Course (course ID, teacher ID, course name)

This table is OK. The course name is dependent on the course ID.

So, what does our database look like now?

Student (student ID, course ID, student name, fees paid, date of birth, street address, address code ID)

Address Code (address code ID, ZIP code, suburb, city, state)

Subject (subject ID, subject name)

Subject Enrolment (student ID, subject ID)

Teacher (teacher ID, teacher name, street address, address code ID)

Course (course ID, teacher ID, course name)

So, that’s how third normal form could look if we had this example.

An ERD of third normal form would look like this:

https://www.databasestar.com/database-normalization/ 14/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

Stopping at Third Normal Form

For most database normalisation exercises, stopping after achieving Third Normal Form is enough.

It satisfies a good relationship rules and will greatly improve your data structure from having no normalisation at all.

There are a couple of steps after third normal form that are optional. I’ll explain them here so you can learn what they are.

Fourth Normal Form and Beyond


Fourth normal form is the next step after third normal form.

What does it mean?

It needs to satisfy two conditions:

Meet the criteria of third normal form.


There are no non-trivial multivalued dependencies other than a candidate key.

So, what does this mean?

A multivalued dependency is probably better explained with an example, which I’ll show you shortly. It means that there are other attributes in the table
that are not dependent on the primary key, and can be moved to another table.

Our database looks like this:

https://www.databasestar.com/database-normalization/ 15/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

Student (student ID, course ID, student name, fees paid, date of birth, street address, address code ID)

Address Code (address code ID, ZIP code, suburb, city, state)

Subject (subject ID, subject name)

Subject Enrolment (student ID, subject ID)

Teacher (teacher ID, teacher name, street address, address code ID)

Course (course ID, teacher ID, course name)

This meets the third normal form rules.

However, let’s take a look at the address fields: street address and address code.

Both the student and teacher table have these


What if a student moves addresses? Do we update the address in this field? If we do, then we lose the old address.
If an address is updated, is it because they moved? Or is it because there was an error in the old address?
What if two students have the same street address. Are they actually at the same address? What if we update one and not the other?
What if a teacher and a student are at the same address?
What if we want to capture a student or a teacher having multiple addresses (for example, postal and residential)?

There are a lot of “what if” questions here. There is a way we can resolve them and improve the quality of the data.

This is a multivalued dependency.

We can solve this by moving the address to a separate table.

The address can then be linked to the teacher and student tables.

Let’s start with the address table.

Address (address ID, street address, address code ID)

In this table, we have a primary key of address ID, and we have stored the street address here. The address code table stays the same.

We need to link this to the student and teacher tables. How do we do this?

Do we also want to capture the fact that a student or teacher can have multiple addresses? It may be a good idea to future proof the design. It’s
something you would want to confirm in your organisation.

For this example, we will design it so there can be multiple addresses for a single student.

Our tables could look like this:

https://www.databasestar.com/database-normalization/ 16/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

Student (student ID, course ID, student name, fees paid, date of birth)

Address (address ID, street address, address code ID)

Address Code (address code ID, ZIP code, suburb, city, state)

Student Address (address ID, student ID)

Subject (subject ID, subject name)

Subject Enrolment (student ID, subject ID)

Teacher (teacher ID, teacher name)

Teacher Address (teacher ID, address ID)

Course (course ID, teacher ID, course name)

An ERD would look like this:

A few changes have been made here:

The address code ID has been removed from the Student table, because the relationships between student and address is now captured in the
joining table called Student Address.
The teacher’s address is also captured in the joining table Teacher Address, and the address code ID has been removed from the Teacher table. I
couldn’t think of a better name for each of these tables.
Address still links to address code ID

This design will let you do a few things:

Store multiple addresses for each student or teacher


Store additional information about an address relationship to a teach or student, such as an effective date (to track movements over time) or an
address type (postal, residential)
Determine which students or teachers live at the same address with certainty (it’s linked to the same record).

So, that’s how you can achieve fourth normal form on this database.

There are a few enhancements you can make to this design, but it depends on your business rules:

Combine the student and teacher tables into a person table, as they are both effectively people, but teachers teach a class and students take a
class. This table could then link to subjects and define that relationship as “teaches” or “is enrolled in”, to cater for this.
Relate a course to a subject, so you can see which subjects are in a course
Split the address into separate fields for unit number, street number, address line 1, address line 2, and so on.
Split the student name and teacher name into first and last names to help with displaying data and sorting.

https://www.databasestar.com/database-normalization/ 17/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

These changes could improve the design, but I haven’t detailed them in any of these steps as they aren’t required for fourth normal form.

I hope this explanation has helped you understand what the normal forms are and what normalization in DBMS is. Do you have any questions on this
process? Share them in the section below.

Lastly, if you enjoy the information and career advice I’ve been providing, sign up to my newsletter below to stay up-to-date on my articles. You’ll also
receive a fantastic bonus. Thanks!

← Previous Post Next Post →

66 thoughts on “Database Normalization: A Step-By-Step-Guide With Examples”

JAMES SHALLOW
APRIL 9, 2018 AT 6:19 AM

Thanks so much for explaining this concept Ben. To me as a learner, this is the best way to grab this concept.

You broke this down to the last atom.


Keep up the good work!

Reply

BEN
APRIL 10, 2018 AT 4:40 AM

Thanks James, glad you found it helpful!

Reply

NICKY
SEPTEMBER 21, 2023 AT 8:25 PM

Absolutely, the best and easiest explanation I have seen. Very helpful.

Reply

RONALD LÓPEZ
AUGUST 22, 2018 AT 8:01 AM

Saludos Ben, buen post. Podrías por favor revisar la simbología que utilizaste en la relación de las tablas Student y Course, dado que comentaste en las
líneas de arriba “significa que la identificación del curso entra en la tabla de estudiantes.” Comenta si la relación sería: Course -< Student

Reply

BEN
AUGUST 25, 2018 AT 8:48 AM

Hi Ronald,
Sure, I’ll check this and update the post.
(Google Translate: Greetings Ben, good post. Could you please check the symbology you used in the Student and Course table relationship,
since you commented on the lines above “it means that the course identification enters the student table.” Comment if the relationship would
be: Course – < Student)

Reply

RONALD LÓPEZ
AUGUST 24, 2018 AT 3:32 AM

Saludos Ben, la simbología de Courses y Student según planteaste es de “1 a n” verifica si sería ” -< "
Buen post.

https://www.databasestar.com/database-normalization/ 18/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

Reply

BEN
AUGUST 25, 2018 AT 8:48 AM

Thanks Ronald!
(Google Translate: Greetings Ben, the symbology of Courses and Student as you raised is “1 to n” verifies if it would be “- <" Good post.)

Reply

KOK
OCTOBER 23, 2018 AT 2:58 PM

thank you for sharing these things to us , damn i really love it. You guys are really awesome

Reply

BOLARINWA
NOVEMBER 7, 2018 AT 7:48 AM

i dont understand the second normal form


linking student id to course id isnt clear

Reply

BLESSEDJENY
FEBRUARY 4, 2019 AT 6:18 PM

Excellent working example

Reply

BEN
FEBRUARY 14, 2019 AT 3:53 AM

Glad you found it useful!

Reply

DJ
APRIL 26, 2019 AT 2:55 PM

This is a nice compilation and a good illustration of how to carry out table normalization. I wish you can provide script samples that represent the ERD you
presented here. It will be so much helpful.

Reply

BEN
MAY 18, 2019 AT 3:40 PM

Hi DJ,
Glad you like the article. Good idea – I’ll create some sample scripts and add them to the post.
Thanks,
Ben

Reply

https://www.databasestar.com/database-normalization/ 19/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

NATI
APRIL 28, 2019 AT 7:11 AM

Good job! This is a great way explaining this topic. You made it look easy to understand. But, one question I have for you is where is a best scenario in real
life used the fourth normal form?

Reply

BEN
MAY 18, 2019 AT 3:43 PM

Hi Nati,
Thanks, I’m glad you like the article.
I’m not sure what would be a realistic example of using fourth normal form. I think many 3NF databases can be turned into 4NF, depending on
how to want to record values (either a value in a table or an ID to another table). I haven’t used it that often.

Reply

SUBZAR AHMED
MAY 15, 2019 AT 5:25 PM

Dear Sir’
Can we call the Fourth Normal Form as a BCNF (Boyce Codd Normal Form).
Or not?

Reply

BEN
MAY 18, 2019 AT 3:23 PM

Hi Subzar, I think Fourth Normal Form is slightly different to BCNF. I haven’t used either method but I know they are a little different. I think 4NF
comes after BCNF.

Reply

AJAY
JUNE 5, 2019 AT 11:59 AM

Hey Ben,
Your notes are too helpful. Will recommend my other friends for sure.
Thanks a lot :)

Reply

SCOTTPLETCHER
JULY 24, 2019 AT 5:16 AM

You’ve done a truly superb job on the majority of this. You’ve used far more details than most people who provide examples do.

Particularly good is the splitting of addresses into a separate table. That is just not done nearly enough.

However, two tables have critical problems:


(1) The Student table should not contain Course ID (nor fees paid); there should be a separate Student_Course intersection table.
(2) Similarly, the Course table should not have a Teacher ID, likewise a separate Course_Teacher intersection table should be created.

Reasoning (condensed):
(1) A student must be able to enroll first, without yet specificying a course. The student’s name and other enrollment data are not related to any specific
course.
(2) A course has data of its own not related to the teacher: # of credit/lab hours, cost(s), first term offered, last term offered, etc.. Should the only teacher
currently teaching a course withdraw from it, the course data should not be lost. Courses have prerequisites, sometimes complex ones, that have nothing
to do with who is teaching the course.

Reply

https://www.databasestar.com/database-normalization/ 20/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

BEN
JULY 25, 2019 AT 5:28 AM

Thanks for the feedback Scott and glad you like the post!
I understand your reasoning, and yes ideally there would be those two intersection tables. These scenarios are things that we, as developers,
would clarify during the design.
The example I went with was a simple one where a student must have a course and a course must have a teacher, but in the real world it would
cater to your scenarios.
Thanks again,
Ben

Reply

TATYI CHEUNG
NOVEMBER 4, 2019 AT 9:36 PM

Good

Reply

PATRICK STAR
NOVEMBER 30, 2019 AT 12:20 AM

bery good

Reply

KRITHIESH PRABHU KUMAR


JANUARY 7, 2020 AT 4:22 AM

Simple & powerfully conveyed. Thank you.

Reply

BEN
JANUARY 7, 2020 AT 12:31 PM

Glad you found it helpful!

Reply

ANJUM
MARCH 22, 2020 AT 5:44 AM

As a grade 11 teacher, I am well aware of the complexities students face when teaching/explaining the concepts within this topic. This article is brilliant
and breaks down such a confusing topic quite nicely with examples along the way. My students will definitely benefit from this, thank you so much.

Reply

MASUNDA BUNDU
APRIL 30, 2020 AT 7:03 AM

I am just starting out in SQL and this was quite helpful. Thanks.

Reply

https://www.databasestar.com/database-normalization/ 21/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

MANI CHANDU
MAY 31, 2020 AT 2:52 AM

what a tremendous insights about the normalisation you have explained and its gave me a lot of confedence Thank u some much ben
my deepest gratitude for sharing knowledge . you truly proved sharing is caring
with regards
chandu, india

Reply

TIM REEKS
JUNE 20, 2020 AT 9:13 PM

Hi

Came across your material while searching for Normalisation material, as wanting to use my time to improve my Club Membership records, having gained
a ‘Database qualification’ some 20 to 30 years years ago, I think, I needed to refresh my memory! Anyway – some queries:

1. Shouldn’t the Student Name be broken down or decomposed into StudentForename and StudentSurname, since doesn’t 1NF require this?
2. Shouldn’t Teacher Name be converted as per 121 above as well?

This would enable records to be retrieved using Surname or Forename

Reply

BEN
JUNE 21, 2020 AT 2:15 PM

Hi Tim, yes that’s a good point and it would be better to break it into two names for that purpose.

Reply

TOBI
JUNE 22, 2020 AT 4:26 AM

Wow, this is the very first website i finally thoroughly understood normalization. Thanks a lot.

Reply

TIM REEKS
JUNE 25, 2020 AT 6:56 PM

Hello again

I have thought thru the data I need to record, I think, time will tell I suspect. Anyway we run upto 18 competitions, played by teams of 1, 2 3 or 4 members,
thus I think that there may be Many to Many relationship between Member and Competition tables, as in Many Member records may be related to Many
Competition records [potential for a Pairs or Triples or Fours teams to win all the Competitions they enter], am I correct?

Also should I design the Competition table as CompID, Year, Comp1, Comp 2, Comp3, each Header having the name of the Competition, then I presume a
table that links the two, along the lines of:

CompID, Year, MemberID OR MemberID, Year, CompID

Regards, Tim

Reply

TIM
JUNE 28, 2020 AT 6:33 PM

Hello again, thinking further, I presume that I could create 18 tables, one per Competition to capture the annual results.

https://www.databasestar.com/database-normalization/ 22/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

Again though, presume my single Comp table (see above) shouldn’t have a column per comp, as this is a repeating group

So do I create a ‘joining table’, that records Year and Comp, another that records Member and Comp and one that records Member and Year.

I may be over thinking it, but as No Comp this year, I think I need to be able to record this, I think

Reply

BEN
JUNE 29, 2020 AT 5:46 AM

Hi Tim, good question. You could create one table per competition, but you would have 18 tables which have the same structure and different
data, which is quite repetitive. I would suggest having a single table for comp, and a column that indicates which competition it is for. It’s OK for
a table to have a value that is repeated like this, as it identifies which competition something refers to.
Yes you could create joining tables for Year and Comp (if there is a many to many relationship between them) and Member and Comp as well.
What’s the relationship between Member and Year?

Reply

TIM REEKS
JULY 1, 2020 AT 12:45 AM

Hello Ben

Thanks for reply, however, would it be easier to say create a Comp table of 18 records, a Comp Type table which has 2 records, that is Annual and One Day,
another table for Comp Year, which will record the annual competition results based on:

CompYear – CompTypeID – CompID – MemID

thus each year would create 32 records [10 x 1 + 4 x2 + 2 x3 + 2 x 4], e.g

2019 – 1 – 4 – 1
2019 – 1 – 4 – 2
2019 – 1 – 4 – 3
2019 – 1 – 4 – 4
2019 – 1 – 3 – 1
2019 – 1 – 3 – 2
2019 – 1 – 3 – 3
2019 – 2 – 1 – 1

so members 1 to 4 won the 2019 one day comp Bickford Fours; and
members 1 to 3 won the 2019 one day comp Arthur Johnson Triples; and
member 1 won the 2019 annual singles championship

Would this work and have I had a light bulb moment?

Reply

BEN
JULY 7, 2020 AT 7:40 AM

Hi Tim, yes I think that would work! Storing the comp data separately from the comp type (and so on) will ensure the data is not repeated and is
only stored in one place. Good luck!

Reply

ANDREW J. CHRUSCICKI
JULY 22, 2020 AT 1:31 AM

Ben,
Thank you so much! I was only able to grasp the concept of normalization in one hour or so because how you simplified the concepts through a simple
running example.

Reply

PRIDE

https://www.databasestar.com/database-normalization/ 23/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples
AUGUST 6, 2020 AT 3:10 AM

Thanks a lot sir Daniel i have really understood this you are a great teacher

Reply

WOJTEK
NOVEMBER 6, 2020 AT 8:05 AM

Firstly :Thank you for your website. Secondly: I have still problem with understanding the Second normal form. For example you have written : “student
name: Yes, this is dependent on the primary key. A different student ID means a different student name.”. So in your design I am not allowed to have 2
students with same names? What will happen when this, not so uncommon situation occurs?

Reply

BEN
NOVEMBER 6, 2020 AT 2:26 PM

Glad you like it Wojtek!


Yes, in this design you can have two students with the same name. They would have different IDs and different rows in the table, as two people
with the same name is common like you mentioned.

Reply

OLIVER OVIEDO
NOVEMBER 9, 2020 AT 6:04 AM

Fantastic

This is the best explanation on why and how to normalize Tables… excellent work, maybe the best explanation out there….

Reply

SIDIEU DELPHIN
NOVEMBER 16, 2020 AT 9:46 AM

Hi, Thanks for the post. That is exactly what I was looking for. But I have a question, how would I insert into student address and teacher address. Best
regards

Reply

KAT
JULY 1, 2021 AT 4:22 PM

This is amazing, very well explained. Simple example made it easy to understand. Thank you so much!

Reply

ELBI
AUGUST 4, 2021 AT 6:38 PM

Thanks sir, this very helpful

Reply

https://www.databasestar.com/database-normalization/ 24/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

ROB
JULY 16, 2022 AT 6:52 AM

Sorry but your example is not in 1NF. 1NF dictates that you cannot have a repeating group. The subjects are a repeating group. Not saying you got the
design wrong at the end just that you failed to remove the repeating group in 1NF. When you went to 2NF you made rules for the repeating group multiple
times and took care of it but it should have been taken care of in 1NF.

Reply

JOEL
AUGUST 4, 2022 AT 8:04 PM

Its so helpful easy to understand

Reply

DELINDA ROSS
AUGUST 27, 2022 AT 2:05 AM

This is great!
You make it easy and simple to learn where I can understand.

Reply

BRU
SEPTEMBER 23, 2022 AT 12:21 AM

If you”re going to break address into atoms, (unit, street, city, zip, etc.), would you also break down telephone numbers, (country code, area code,
exchange, unit number) into separate fields in another table? How elemental do you go?

Reply

MOHAMMAD TAUFEEQ
NOVEMBER 5, 2022 AT 10:30 PM

Clearly explained :)

Reply

MADRIEN
NOVEMBER 17, 2022 AT 11:06 PM

Wow c’est vraiment très utile, ça vient d’apporter un plus ma connaissance. Vraiment merci beaucoup pour cet article.

Reply

DC
DECEMBER 5, 2022 AT 3:17 PM

Hello,

Great post!
Just wanted to check – for the ERD diagram in the 2NF example, wouldn’t the relationship between Course and Student be the other way around? A
course can have many students, but a student can only be enrolled in one course. So the crows feet symbol should be on the Student table.
https://www.databasestar.com/database-normalization/ 25/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

Reply

PC
FEBRUARY 15, 2024 AT 1:03 PM

Hi Ben,
I am hoping you can respond to the question above. I noticed the same and I was hoping to find a correction or explanation in the notes.
I second everyone else’s compliments on a great post, too.

Reply

INNOCENT ISAACK
JANUARY 8, 2023 AT 11:17 AM

From table Student (student ID, course ID, student name, fees paid, date of birth, street address, address code ID)
why you put “street address” column remember that “address code ID” is available idenitify address of student i dont understand here

Reply

ALINA
JANUARY 10, 2023 AT 10:20 PM

Hi Ben ! ive got a question for you.


i didnt understand this part:

Does the combination of all columns make a unique row every single time?

No. There could be the same combination of data, and it would represent a different row. There could be the same values for this row and it would be a
separate row (even though it is rare).

So that means in a first normal form duplicate roes are allowed?

Reply

YOUVRAJ POTHEEGADOO
FEBRUARY 9, 2023 AT 10:28 PM

I stumbled on this article explaining so succinctly about normalization of database. I must admit that it helped me understand the concept using an
example far much better than just theory.

Kudos to you @Ben – You are really a DB star.

Will recommend it to my friends. I am in the process of putting this knowledge into an Employee Management System.

Reply

KELVIN
FEBRUARY 20, 2023 AT 1:10 AM

Mmnh ! Thank you sir, it’s very awesome tutor. 👍


Reply

DAN
MARCH 7, 2023 AT 7:35 AM

The model breaks as soon as you add a 2nd course for a student. CourseID should not be in the student table. You need a separate table that ties
studentID and courseID together

https://www.databasestar.com/database-normalization/ 26/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

Reply

BEN
MARCH 8, 2023 AT 7:45 AM

That’s a good point. Yes you would need a separate table for that scenario.

Reply

DINO
JUNE 2, 2023 AT 9:54 AM

The ERDs for the 2nd / 3rd / 4th normal form show the crows foot at the COURSE table, but the STUDENT table receives the FK course_id. The crows
foot represents the many end of the relationship, but there is no representation in the course table of any student data point.

Imho, this is where it is broken in addition to what Dan has pointed to.

Reply

BLACK HERETIC
MAY 27, 2023 AT 11:59 PM

Loved it – easy explanation. Thank you – love from India

Reply

MARVIL
JUNE 27, 2023 AT 11:30 PM

Well done Ben, this is the best normalization tutorial I have ever seen. Thanks so much. keep up the good work.

Reply

TRINASW
AUGUST 2, 2023 AT 2:18 AM

Thanks for taking the time to create this. This is one of the best breakdowns that I’ve read regarding DB Normalization. However, I can’t really wrap my
mind around 4NF as I would like to know how to use it in a real life scenario. Thanks again as this was extremely helpful to me!

Reply

VIJAY
AUGUST 6, 2023 AT 2:20 PM

Thank you . Very clear explanation

Reply

ELODIE
OCTOBER 17, 2023 AT 3:09 AM

If the statement is true that each student can only have 1 course, then the relationship is not shown correctly in the ERD.

Reply

https://www.databasestar.com/database-normalization/ 27/28
3/9/24, 12:02 PM A Step-By-Step Guide to Normalization in DBMS With Examples

KELAN
DECEMBER 5, 2023 AT 7:12 PM

Hi, I don’t quite understand how your example satisfies third normal form. It seems that street address could depend on address ID, which depends upon
the primary key. In other words, if address ID were to change, I would assume street address would also have to change. So doesn’t that create transitive
dependency?

ii) I also don’t understand why you made two address tables, couldn’t they be combined into one?

Thanks.

Reply

TEJAS
JANUARY 7, 2024 AT 3:33 PM

Hi ben thanks for superb work.

What I always think that how to determine functional dependency? In this example , After 1NF, we know that course is not functionally dependent on
student id because it is day to day example & it make sense.

however While designing DB for client whose domain is unknown to us, how to know effectively the functional dependency from client? What question we
should ask as layman?

Reply

TASLIM
JANUARY 14, 2024 AT 4:12 PM

Thank you very much

Reply

Leave a Comment
Your email address will not be published. Required fields are marked *

Type here..

Name* Email* Post Comment »

Website This site uses Akismet to reduce spam. Learn how your comment data is processed.

Copyright © 2024 Database Star | Powered by Astra WordPress Theme

https://www.databasestar.com/database-normalization/ 28/28

You might also like