Cse & It Interview Questions: 1. What Is Normalization? Why Is It Needed? Ans
DBMS:
1. What is normalization? Why is it needed?
The goal of database normalization is to decompose relations with anomalies in order to produce
smaller, well-structured relations. Normalization usually involves dividing large, badly-formed
tables into smaller, well-formed tables and defining relationships between them. The objective is
to isolate data so that additions, deletions, and modifications of a field can be made in just one
table and then propagated through the rest of the database via the defined relationships.
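As a rough illustration of the "change it in one table" idea, here is a minimal sketch using Python's built-in sqlite3 module. The customer and purchase tables and their columns are hypothetical, invented only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: the email is stored once, in customer.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item        TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'old@example.com');
    INSERT INTO purchase VALUES (100, 1, 'keyboard'), (101, 1, 'monitor');
""")

# The modification touches exactly one row in one table.
cur = conn.execute(
    "UPDATE customer SET email = 'new@example.com' WHERE customer_id = 1"
)
rows_changed = cur.rowcount

# Every purchase sees the new value through the defined relationship.
emails = conn.execute("""
    SELECT p.item, c.email
    FROM purchase AS p
    JOIN customer AS c ON c.customer_id = p.customer_id
    ORDER BY p.purchase_id
""").fetchall()
```

Had the email been copied into every purchase row instead, the same change would need one update per purchase, and a missed row would leave the data inconsistent.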
Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and
what we now know as the First Normal Form (1NF) in 1970.[1] Codd went on to define the
Second Normal Form (2NF) and Third Normal Form (3NF) in 1971,[2] and Codd and Raymond
F. Boyce defined the Boyce-Codd Normal Form (BCNF) in 1974.[3] Higher normal forms were
defined by other theorists in subsequent years, the most recent being the Sixth Normal Form
(6NF) introduced by Chris Date, Hugh Darwen, and Nikos Lorentzos in 2002.[4]
A standard piece of database design guidance is that the designer should create a fully
normalized design; selective denormalization can subsequently be performed for performance
reasons.[6] However, some modeling disciplines, such as the dimensional modeling approach to
data warehouse design, explicitly recommend non-normalized designs, i.e. designs that in large
part do not adhere to 3NF.[7]
Not in 1NF (the stories column holds multiple values):
author_id    stories
000024       novelist, playwright
000034       magazine columnist
002345       novella, newspaper columnist

In 1NF (one value per row):
author_id    stories
000024       novelist
000024       playwright
000034       magazine columnist
002345       novella
002345       newspaper columnist
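The conversion shown in the two author tables can be sketched in a few lines of Python. This assumes the multi-valued column is stored as a comma-separated string; the function name is made up for the example.

```python
def to_first_normal_form(rows):
    """Split a comma-separated 'stories' value into one row per value."""
    normalized = []
    for author_id, stories in rows:
        for value in stories.split(","):
            normalized.append((author_id, value.strip()))
    return normalized


unnormalized = [
    ("000024", "novelist, playwright"),
    ("000034", "magazine columnist"),
    ("002345", "novella, newspaper columnist"),
]

first_nf = to_first_normal_form(unnormalized)
for author_id, story in first_nf:
    print(author_id, story)
```

After the split, author_id alone no longer identifies a row; the pair (author_id, stories) does.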
Name Description
First Normal Form (1NF): An entity is in First Normal Form when all tables are two-dimensional
with no repeating groups. A relation is in 1NF if all underlying domains contain atomic values
only. 1NF eliminates repeating groups by putting each into a separate table and connecting them
with a one-to-many relationship. Make a separate table for each set of related attributes and
uniquely identify each record with a primary key.
Second Normal Form (2NF): An entity is in Second Normal Form when it meets the requirement of
being in First Normal Form (1NF) and additionally all the non-key columns are functionally
dependent on the entire primary key. A composite primary key is allowed; what 2NF rules out is a
non-key column that depends on only part of it, so a 1NF table whose candidate keys are all
single columns is automatically in 2NF. Put another way, a relation is in 2NF if, and only if,
it is in 1NF and every non-key attribute is fully dependent on the key. 2NF eliminates
functional dependencies on a partial key by putting those fields in a separate table from the
fields that depend on the whole key. An example is resolving many-to-many relationships using
an intersecting entity.
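Third Normal Form (3NF): An entity is in Third Normal Form when it meets the requirement of
being in Second Normal Form (2NF) and additionally no non-key column depends on another non-key
column, i.e. there are no transitive dependencies on the key.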
Boyce-Codd Normal Form (BCNF): BCNF is a further refinement of 3NF. In his later writings Codd
refers to BCNF as 3NF. A relation is in Boyce-Codd Normal Form if, and only if, every
determinant is a candidate key. Most entities in 3NF are already in BCNF.
BCNF covers very specific situations where 3NF misses dependencies among attributes that belong
to candidate keys. Typically, any relation that is in 3NF is also in BCNF. A 3NF relation can
fail to be in BCNF only if (a) there are multiple candidate keys, (b) the keys are composed of
multiple attributes, and (c) there are common attributes between the keys.
Fourth Normal Form (4NF): An entity is in Fourth Normal Form when it meets the requirement of
being in Third Normal Form (3NF) and additionally has no non-trivial multivalued dependency
whose determinant is not a superkey.
Fifth Normal Form (5NF): An entity is in Fifth Normal Form if, and only if, it is in 4NF and
every join dependency for the entity is a consequence of its candidate keys.
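The BCNF rule above ("every determinant is a candidate key") can be checked mechanically once the functional dependencies are written down. The following is a minimal sketch, not a full normalization tool: it computes the closure of an attribute set and tests whether the left side of each non-trivial dependency determines every attribute. The enrolment relation (student, course, instructor) and its dependencies are hypothetical, chosen because it has two overlapping composite candidate keys.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(all_attrs, fds):
    """True if every non-trivial dependency has a superkey on its left side."""
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency, always allowed
        if closure(lhs, fds) != set(all_attrs):
            return False
    return True


# Hypothetical relation: each instructor teaches exactly one course,
# and a student has one instructor per course.
attrs = {"student", "course", "instructor"}
fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]

print(is_bcnf(attrs, fds))                            # whole relation
print(is_bcnf({"instructor", "course"}, [fds[1]]))    # one decomposed part
```

Here the whole relation is in 3NF but fails the BCNF test, because instructor determines course without being a key; the smaller (instructor, course) relation passes.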
The various types of keys, with examples in SQL terms, are described below. For the examples, suppose we have an
Employee table with attributes ‘ID’, ‘Name’, ‘Address’, ‘Department_ID’ and ‘Salary’.
(I) Super Key – An attribute or a combination of attributes that identifies the records uniquely is known as a
Super Key. A table can have many Super Keys.
E.g. of Super Key
1 ID
2 ID, Name
3 ID, Address
4 ID, Department_ID
5 ID, Salary
6 Name, Address
7 Name, Address, Department_ID
………… and so on: any combination of attributes that identifies the records uniquely is a Super Key.
(II) Candidate Key – It can be defined as a minimal Super Key or irreducible Super Key. In other words, it is an
attribute or a combination of attributes that identifies the record uniquely, while none of its proper subsets
does.
E.g. of Candidate Key
1 ID
2 Name, Address
For the above table we have only two Candidate Keys (i.e. irreducible Super Keys) that identify the records
uniquely. ID identifies a record uniquely, and so does the combination of Name and Address, but neither Name nor
Address alone can be used, because we might have two employees with the same name or two employees living in the
same house.
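The difference between a Super Key and a Candidate Key can be tried out on sample data. The sketch below uses made-up Employee rows and treats an attribute set as a Super Key when no two rows share the same values for it. Keep in mind that a key is really a rule about all possible rows, so sample data can only rule a key out, never prove it.

```python
from itertools import combinations

ATTRS = ("ID", "Name", "Address", "Department_ID", "Salary")

# Hypothetical sample rows for the Employee table.
ROWS = [
    (1, "Asha", "12 Lake Rd", 10, 50000),
    (2, "Asha", "7 Hill St", 10, 50000),
    (3, "Ravi", "12 Lake Rd", 10, 50000),
    (4, "Ravi", "7 Hill St", 20, 50000),
]


def is_superkey(attrs, rows=ROWS):
    """True if the chosen attributes take distinct values in every row."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows=ROWS):
    """Return the minimal superkeys, smallest attribute sets first."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            # A superset of a key already found is not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_superkey(combo, rows):
                keys.append(combo)
    return keys


print(is_superkey(("ID", "Salary")))  # a superkey, but not minimal
print(is_superkey(("Name",)))         # names repeat, so not a superkey
print(candidate_keys())
```

For these rows the search finds ID and the pair Name, Address, matching the two Candidate Keys listed above.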
(III) Primary Key – A Candidate Key that the database designer chooses for unique identification of each row in
a table is known as the Primary Key. A Primary Key can consist of one or more attributes of a table.
E.g. of Primary Key – The database designer can use any one of the Candidate Keys as the Primary Key. In this case
we have “ID” and “Name, Address” as Candidate Keys; we will choose “ID” as the Primary Key because the other key
is a combination of more than one attribute.
(IV) Foreign Key – A foreign key is an attribute or combination of attributes in one base table that points to a
candidate key (generally the primary key) of another table. The purpose of the foreign key is to ensure
referential integrity of the data, i.e. only values that are supposed to appear in the database are permitted.
E.g. of Foreign Key – Suppose we have another table, the Department table, with attributes “Department_ID”,
“Department_Name”, “Manager_ID” and “Location_ID”, with Department_ID as its Primary Key. The Department_ID
attribute of the Employee table (the dependent or child table) can then be defined as a Foreign Key that
references the Department_ID attribute of the Department table (the referenced or parent table). A Foreign Key
value must match an existing value in the parent table or be NULL.
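The "match an existing value or be NULL" rule can be seen in action with Python's built-in sqlite3 module. This is a small sketch with a cut-down version of the two tables; the employee names are made up. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE department (
        department_id   INTEGER PRIMARY KEY,
        department_name TEXT
    );
    CREATE TABLE employee (
        id            INTEGER PRIMARY KEY,
        name          TEXT,
        department_id INTEGER REFERENCES department(department_id)
    );
    INSERT INTO department VALUES (10, 'Research');
""")

# Allowed: the value exists in the parent table.
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")
# Allowed: a NULL foreign key references nothing.
conn.execute("INSERT INTO employee VALUES (2, 'Ravi', NULL)")

# Rejected: there is no department 99 in the parent table.
try:
    conn.execute("INSERT INTO employee VALUES (3, 'Mira', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, employee_count)
```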
(V) Composite Key – If a Primary Key is made up of multiple attributes, that Primary Key is called a
Composite Key (also called a Compound Key or Concatenated Key).
E.g. of Composite Key – If we had used “Name, Address” as the Primary Key, it would be a Composite Key.
(VI) Alternate Key – An Alternate Key is any Candidate Key other than the one chosen as the Primary Key.
E.g. of Alternate Key – “Name, Address”, as it is the only Candidate Key that is not the Primary Key.
(VII) Secondary Key – Attributes that are not even a Super Key but can still be used to look up records (not
necessarily uniquely) are known as a Secondary Key.
E.g. of Secondary Key – Name, Address, Salary, Department_ID etc., as they can be used to find records but their
values might not be unique.
6. What is entity integrity, referential integrity etc?