ERD Model and Normalization: Database

The document provides a comprehensive guide on the Entity-Relationship (ER) model and its application in database design, detailing the purpose, advantages, and components such as entities, attributes, and relationships. It also covers advanced concepts like the Enhanced Entity-Relationship (EER) model, transformation of EER into relations, and mapping rules for implementing a relational schema. Key topics include cardinalities, weak entities, relationship strengths, and the process of normalization to 3NF.

Uploaded by

abrarhussain1390
Copyright
© All Rights Reserved

Database Study Guide: ERD to 3NF Normalization

Phase 2: ERD and Database Design Concepts

The ER Model and ER Diagrams: Concept, Purpose, and Advantages

The Entity-Relationship (ER) model is a high-level conceptual data model. It is a design tool that allows us
to view the real world as entities and relationships. It is widely used in database design because it is easy
to understand and can be easily converted into a relational schema. The ER model was proposed by Peter
Chen in 1976.

Purpose of ER Diagrams:

Conceptual Design: ER diagrams help in the conceptual design of a database by providing a
graphical representation of the data and its relationships. This allows designers to understand the
data requirements of an organization before implementing the database.

Communication Tool: ER diagrams serve as an effective communication tool between database
designers, developers, and end-users. They provide a clear and unambiguous representation of the
data structure, facilitating discussions and ensuring that all stakeholders have a shared
understanding of the database.

Blueprint for Database Implementation: An ER diagram acts as a blueprint for creating the
relational schema (tables, columns, and relationships) in a relational database management system
(RDBMS). It helps in translating the conceptual design into a logical design.

Documentation: ER diagrams document the database structure, which is
crucial for maintenance, future enhancements, and understanding the system over time.

Advantages of ER Diagrams:

Simplicity: ER diagrams are relatively simple to understand and use, even for non-technical users,
due to their graphical nature.

Effectiveness in Data Modeling: They are highly effective in modeling complex real-world scenarios
into a database structure.

Easy Conversion to Relational Model: The concepts in an ER model (entities, attributes,
relationships) directly map to concepts in the relational model (tables, columns, foreign keys),
making the conversion process straightforward.

Better Understanding of Data: By visualizing the relationships between different data elements, ER
diagrams provide a better understanding of the data and its constraints.

Entities, Attributes, Relationships, Cardinalities, Connectivity, Existence Dependency, Composite Entities

Entities:

An entity is a real-world object that is distinguishable from other objects. It can be a person, place, event,
or concept about which data is stored. In an ER diagram, an entity is represented by a rectangle. Examples
include Student , Course , Professor , Department .

Entity Set: A collection of similar entities. For example, all students in a university form an entity
set.

Attributes:

An attribute is a property or characteristic of an entity. It describes the entity. In an ER diagram, attributes
are represented by ovals and are connected to their respective entities. Examples for a Student entity
could be StudentID, Name, Address, DateOfBirth.

Simple Attribute: An attribute that cannot be further divided into smaller components. E.g., Age .

Composite Attribute: An attribute that can be divided into smaller sub-parts, each with its own
meaning. E.g., Address can be composed of Street, City, State, ZipCode. In an ER diagram, a
composite attribute is represented by an oval connected to the entity, with each component attribute
drawn as its own oval connected to the composite attribute's oval.

Single-valued Attribute: An attribute that holds a single value for a particular entity. E.g.,
StudentID .

Multi-valued Attribute: An attribute that can have multiple values for a single entity. E.g., Phone
Number (a student might have multiple phone numbers). In an ER diagram, a multi-valued attribute
is represented by a double oval.

Derived Attribute: An attribute whose value can be derived from other attributes. It is not stored in
the database but calculated when needed. E.g., Age can be derived from DateOfBirth . In an ER
diagram, a derived attribute is represented by a dashed oval.

Key Attribute: An attribute or a set of attributes that uniquely identifies an entity in an entity set. In
an ER diagram, a key attribute is underlined. E.g., StudentID for a Student entity.
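
As a small illustration of a derived attribute, the sketch below computes Age from a stored DateOfBirth instead of storing it. The function name age_on and the sample student record are hypothetical and are used only for this example.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive Age from the stored DateOfBirth (hypothetical helper)."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Hypothetical record: only DateOfBirth is stored, Age is computed on demand.
student = {"StudentID": 1, "Name": "Ana", "DateOfBirth": date(2000, 6, 15)}

print(age_on(student["DateOfBirth"], date(2024, 6, 14)))  # 23 (day before birthday)
print(age_on(student["DateOfBirth"], date(2024, 6, 15)))  # 24 (on the birthday)
```

Because Age is recomputed from DateOfBirth, it never goes stale, which is the usual reason for not storing a derived attribute.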

Relationships:

A relationship describes an association among two or more entities. It shows how entities are connected
or related to each other. In an ER diagram, a relationship is represented by a diamond shape and is
connected to the participating entities by lines. Examples include EnrollsIn (between Student and
Course), Teaches (between Professor and Course).

Relationship Set: A collection of similar relationships. For example, all instances of students
enrolling in courses form a relationship set.

Cardinalities (Mapping Cardinalities):

Cardinality defines how many instances of one entity can be associated with an instance of another
entity through a relationship. It specifies the minimum and maximum number of
relationship occurrences in which an entity can participate. The common types are:

One-to-One (1:1): An instance of entity A is associated with at most one instance of entity B, and an
instance of entity B is associated with at most one instance of entity A. E.g., A Person Manages a
Department (where a department has only one manager, and a person manages only one
department).

One-to-Many (1:N): An instance of entity A can be associated with any number of instances of entity B,
but an instance of entity B is associated with at most one instance of entity A. E.g., A Department Has
many Employees, but an Employee belongs to only one Department.

Many-to-One (N:1): An instance of entity A is associated with at most one instance of entity B, but
an instance of entity B can be associated with any number of instances of entity A. E.g., Many Employees
WorksFor one Department.

Many-to-Many (M:N): An instance of entity A can be associated with any number of instances of entity B,
and an instance of entity B can be associated with any number of instances of entity A. E.g., Students
EnrollsIn many Courses, and Courses have many Students.

In each case, whether an instance may have no partner at all is decided by the participation
constraint (see Existence Dependency), not by the mapping cardinality.

Connectivity:

Connectivity describes the relationship classification (e.g., one-to-one, one-to-many, many-to-many). It is
often used interchangeably with cardinality, but more precisely, cardinality specifies the exact number of
instances, while connectivity describes the type of relationship.

Existence Dependency (Participation Constraints):

Existence dependency specifies whether the existence of an entity depends on the existence of another
entity in a relationship. It defines whether an entity's participation in a relationship is mandatory or
optional.

Total Participation (Mandatory): Every entity in the entity set must participate in the relationship.
Represented by a double line connecting the entity to the relationship. E.g., Every Employee
WorksFor a Department .

Partial Participation (Optional): Not every entity in the entity set needs to participate in the
relationship. Represented by a single line connecting the entity to the relationship. E.g., A
Professor MayTeach a Course (a professor might not be teaching any course in a given semester).

Composite Entities (Associative Entities / Junction Tables):

A composite entity, also known as an associative entity (and implemented as a junction table), is an
entity that represents a many-to-many relationship between two or more entities. When a many-to-many
relationship is converted to a relational model, it typically requires an intermediate table (the
composite entity) to resolve the relationship into two one-to-many relationships. This composite entity
often has its own attributes in addition to the primary keys of the entities it connects. E.g., in a
Student EnrollsIn Course (M:N) relationship, an Enrollment composite entity might be created with
attributes like Grade and DateEnrolled, along with StudentID and CourseID as foreign keys that together
form a composite primary key. Alternatively, a surrogate key such as EnrollmentID can serve as the
primary key, in which case (StudentID, CourseID) should still be declared unique.
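
The composite entity described above can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, not a definitive schema: the Title column, the sample rows, and the choice of a composite primary key (rather than a surrogate EnrollmentID) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
CREATE TABLE Course  (CourseID  INTEGER PRIMARY KEY, Title TEXT NOT NULL);

-- Composite (associative) entity: resolves the M:N relationship
-- into two 1:N relationships and carries its own attributes.
CREATE TABLE Enrollment (
    StudentID    INTEGER NOT NULL REFERENCES Student(StudentID),
    CourseID     INTEGER NOT NULL REFERENCES Course(CourseID),
    Grade        TEXT,
    DateEnrolled TEXT,
    PRIMARY KEY (StudentID, CourseID)
);

INSERT INTO Student VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO Course  VALUES (10, 'Databases'), (20, 'Networks');
INSERT INTO Enrollment VALUES
    (1, 10, 'A',  '2024-01-15'),
    (1, 20, NULL, '2024-01-16'),
    (2, 10, 'B',  '2024-01-15');
""")

# One student, many courses: follow the junction table to Course.
courses_of_ana = [row[0] for row in conn.execute("""
    SELECT c.Title
    FROM Enrollment e JOIN Course c ON c.CourseID = e.CourseID
    WHERE e.StudentID = 1
    ORDER BY c.Title
""")]
print(courses_of_ana)  # ['Databases', 'Networks']

# The composite primary key rejects a second enrollment of the same pair.
try:
    conn.execute("INSERT INTO Enrollment VALUES (1, 10, 'C', '2024-02-01')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```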

Relationship Strengths, Weak Entities, Relationship Degree, Recursive Relationships

Relationship Strengths:

Relationship strength refers to how dependent one entity is on another in a relationship. It is primarily
categorized into identifying (strong) and non-identifying (weak) relationships.

Strong (Identifying) Relationship: A relationship where the primary key of the child entity includes
the primary key of the parent entity as part of its own primary key. This implies that the child entity
cannot be uniquely identified without the parent entity. This is typically seen with weak entities.

Weak (Non-Identifying) Relationship: A relationship where the primary key of the child entity does
not include the primary key of the parent entity. The child entity can be uniquely identified on its
own, or its primary key is independent of the parent's primary key.

Weak Entities:

A weak entity is an entity that cannot be uniquely identified by its own attributes alone. It depends on
another entity (called the identifying or owner entity) for its existence and primary key. The primary key
of a weak entity is formed by a combination of its partial key (discriminator) and the primary key of its
owner entity.

Representation: In an ER diagram, a weak entity is represented by a double rectangle, and the
identifying relationship connecting it to its owner entity is represented by a double diamond. The
partial key of the weak entity is usually underlined with a dashed line.

Example: Consider a Dependent entity that belongs to an Employee . A dependent cannot exist
without an employee, and their identity might be (EmployeeID, DependentName) . Here,
DependentName is the partial key, and EmployeeID comes from the Employee entity.
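
The Dependent example can be sketched in SQLite as follows. The Relationship column, the sample rows, and the ON DELETE CASCADE rule are assumptions added for illustration; the cascade is one common way to express that a dependent cannot exist without its employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, Name TEXT NOT NULL);

-- Weak entity: its primary key combines the owner's key with the partial key.
CREATE TABLE Dependent (
    EmployeeID    INTEGER NOT NULL
                  REFERENCES Employee(EmployeeID) ON DELETE CASCADE,
    DependentName TEXT NOT NULL,          -- partial key (discriminator)
    Relationship  TEXT,                   -- hypothetical extra attribute
    PRIMARY KEY (EmployeeID, DependentName)
);

INSERT INTO Employee VALUES (1, 'Ana'), (2, 'Ben');
-- The same DependentName is fine under different owners.
INSERT INTO Dependent VALUES (1, 'Sam', 'child'), (2, 'Sam', 'spouse');
""")

# Removing the owner removes its dependents (existence dependency).
conn.execute("DELETE FROM Employee WHERE EmployeeID = 1")
remaining = conn.execute(
    "SELECT EmployeeID, DependentName FROM Dependent"
).fetchall()
print(remaining)  # [(2, 'Sam')]
```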

Relationship Degree:

Relationship degree refers to the number of entity types participating in a relationship. The most
common degrees are unary, binary, and ternary.

Unary (Recursive) Relationship: A relationship where both participants are the same entity type.
This occurs when an entity relates to itself. E.g., an Employee Manages other Employees .

Binary Relationship: A relationship between two different entity types. This is the most common
type of relationship. E.g., a Student EnrollsIn a Course .

Ternary Relationship: A relationship among three different entity types. These are less common
but are used when an association inherently involves three entities. E.g., a Student RegistersFor
a Course taught by a specific Professor, an association that links the three entity types Student,
Course, and Professor.

Recursive Relationships:

A recursive relationship is another name for a unary relationship: an entity type relates to itself. It
represents different roles that instances of the same entity type can play in the relationship.

Example: In an Employee entity, one employee can be a Manager and manage other Employees
who are Subordinates . Both Manager and Subordinate are roles played by instances of the
Employee entity. In an ER diagram, a recursive relationship is shown by a relationship diamond
connected to the same entity type twice, with labels indicating the roles.
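
In a relational schema, a recursive relationship is commonly implemented as a foreign key that points back to the same table. The sketch below assumes a ManagerID column and sample rows that are not part of the guide; the two roles appear as two aliases of the same table in the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Employee (
    EmployeeID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    -- Self-referencing foreign key; NULL means "has no manager".
    ManagerID  INTEGER REFERENCES Employee(EmployeeID)
);

INSERT INTO Employee VALUES
    (1, 'Ana', NULL),
    (2, 'Ben', 1),
    (3, 'Cy',  1),
    (4, 'Di',  2);
""")

# The same table plays two roles: m = Manager, s = Subordinate.
manager_pairs = conn.execute("""
    SELECT m.Name AS Manager, s.Name AS Subordinate
    FROM Employee s JOIN Employee m ON s.ManagerID = m.EmployeeID
    ORDER BY m.Name, s.Name
""").fetchall()
print(manager_pairs)  # [('Ana', 'Ben'), ('Ana', 'Cy'), ('Ben', 'Di')]
```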

Enhanced ER Model (EERD)

The Enhanced Entity-Relationship (EER) model is an extension of the original ER model. It includes
additional concepts to address more complex modeling requirements, particularly those related to
generalization, specialization, and aggregation. These concepts allow for a more precise and detailed
representation of real-world entities and their relationships.

Key Concepts in EER Model:

Generalization: This is a bottom-up approach where common properties of several entity types are
combined to form a higher-level entity type. It represents an "is-a" relationship. For example, Car
and Truck can be generalized into a Vehicle entity. The Vehicle entity would have common
attributes like VehicleID , Make , Model , while Car and Truck would have their specific
attributes.

Specialization: This is a top-down approach where a higher-level entity type is divided into one or
more lower-level entity subtypes based on their distinguishing characteristics. It also represents an
"is-a" relationship. For example, an Employee entity can be specialized into SalariedEmployee and
HourlyEmployee , each with specific attributes relevant to their type of employment.

Attribute-Defined Specialization: The specialization is based on the value of an attribute of
the supertype. For example, the EmployeeType attribute in the Employee entity can determine if an
employee is Salaried or Hourly.

User-Defined Specialization: The specialization is not based on a single attribute but is
determined by the user or business rules.

Categorization (Union Type): This concept is used to represent a single subtype (a category) whose
members are drawn from the union of several supertypes that are different entity types. Each
instance of the category belongs to exactly one of those supertypes. For example, a PropertyOwner
could be either a Person or a Corporation.

Aggregation: This concept allows us to treat a relationship between entities as a higher-level entity.
It is used to model a relationship that involves an entire relationship set and other entities. For
example, a Project AssignedTo a Team (a relationship) can be aggregated into a new entity
ProjectAssignment which then IsMonitoredBy a Manager .

Constraints on Specialization/Generalization:

Disjointness Constraint: Specifies whether an entity can be a member of more than one subtype of
a specialization.

Disjoint (d): An entity can belong to at most one subtype. E.g., an Employee can be either
Salaried or Hourly , but not both.

Overlapping (o): An entity can belong to more than one subtype. E.g., a Person can be both a
Student and an Instructor.

Completeness Constraint: Specifies whether every entity in the supertype must belong to at least
one subtype in the specialization.

Total (double line): Every entity in the supertype must belong to at least one of the subtypes. E.g.,
every Employee must be either Salaried or Hourly.

Partial (single line): An entity in the supertype may or may not belong to any of the subtypes.
E.g., a Vehicle might be a Car or a Truck , but there might be other types of vehicles not
explicitly modeled.

EER diagrams provide a more powerful and expressive way to model complex database schemas,
especially when dealing with hierarchical relationships and shared characteristics among entities.

Transforming EER into Relations

Once the EER diagram is created, the next step is to convert it into a set of relations (tables) that can be
implemented in a relational database. This process involves a set of rules for mapping the various
components of the EER model (entities, attributes, relationships, etc.) to relational tables.

Mapping Rules:

1. Map Regular Entities: For each regular (strong) entity type, create a new relation (table). The
attributes of the entity become the columns of the table. The key attribute of the entity becomes the
primary key of the table.

2. Map Weak Entities: For each weak entity type, create a new relation. The primary key of this
relation is a composite key formed by the primary key of the owner entity (as a foreign key) and the
partial key of the weak entity. The attributes of the weak entity become the columns of the table.

3. Map Binary Relationships:

One-to-One (1:1): Choose one of the participating entities (usually the one with total
participation) and include the primary key of the other entity as a foreign key in its relation. If
both have total participation, you can merge them into a single table.

One-to-Many (1:N): In the relation corresponding to the entity on the 'many' side of the
relationship, include the primary key of the entity on the 'one' side as a foreign key.

Many-to-Many (M:N): Create a new relation (a junction table or composite entity) to represent
the relationship. The primary key of this new relation is a composite key formed by the primary
keys of the two participating entities (as foreign keys). Any attributes of the relationship itself
become columns in this new relation.

4. Map Multi-valued Attributes: For each multi-valued attribute, create a new relation. This relation
will have two columns: one for the primary key of the entity to which the attribute belongs (as a
foreign key) and another for the multi-valued attribute itself. The primary key of this new relation is
the combination of both columns.

5. Map Ternary and Higher-Degree Relationships: For each ternary or higher-degree relationship,
create a new relation. The primary key of this relation is a composite key formed by the primary keys
of all participating entities (as foreign keys). Any attributes of the relationship become columns in
this new relation.

6. Map Specialization/Generalization: There are several ways to map specialization/generalization
hierarchies:

Separate Tables for Supertype and Subtypes: Create a table for the supertype and a separate
table for each subtype. The supertype table contains the common attributes, and each subtype
table contains the specific attributes and a foreign key referencing the supertype's primary key.
This is the most common and flexible approach.

Single Table with All Attributes: Create a single table that includes all the attributes of the
supertype and all the subtypes. A special attribute (a 'type' attribute) is used to indicate the
subtype of each row. This approach can lead to many null values if the subtypes have many
specific attributes.

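Separate Tables for Each Subtype: Create a separate table for each subtype, and include all
the attributes of the supertype in each subtype table. This approach avoids nulls, but it suits only
total specializations (a supertype instance with no subtype has no table to live in) and leads
to data redundancy when subtypes overlap.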

By following these rules, you can systematically convert an EER diagram into a relational schema that is
ready for implementation in a database.
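
Two of the rules above, rule 4 (multi-valued attributes) and the first option of rule 6 (separate tables for supertype and subtypes), can be sketched in SQLite. The column names AnnualSalary, HourlyRate, and PhoneNumber, the EmployeePhone table, and the sample data are hypothetical and serve only as an illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- Rule 6, option 1: supertype table holds the common attributes.
CREATE TABLE Employee (
    EmployeeID   INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    EmployeeType TEXT NOT NULL CHECK (EmployeeType IN ('Salaried', 'Hourly'))
);

-- Each subtype table shares the supertype's key and adds specific attributes.
CREATE TABLE SalariedEmployee (
    EmployeeID   INTEGER PRIMARY KEY REFERENCES Employee(EmployeeID),
    AnnualSalary REAL NOT NULL
);
CREATE TABLE HourlyEmployee (
    EmployeeID INTEGER PRIMARY KEY REFERENCES Employee(EmployeeID),
    HourlyRate REAL NOT NULL
);

-- Rule 4: a multi-valued attribute becomes its own relation,
-- keyed by (owner key, value).
CREATE TABLE EmployeePhone (
    EmployeeID  INTEGER NOT NULL REFERENCES Employee(EmployeeID),
    PhoneNumber TEXT NOT NULL,
    PRIMARY KEY (EmployeeID, PhoneNumber)
);

INSERT INTO Employee VALUES (1, 'Ana', 'Salaried'), (2, 'Ben', 'Hourly');
INSERT INTO SalariedEmployee VALUES (1, 60000);
INSERT INTO HourlyEmployee   VALUES (2, 25.5);
INSERT INTO EmployeePhone    VALUES (1, '555-0100'), (1, '555-0101');
""")

# Reassemble a subtype instance by joining on the shared key.
salaried = conn.execute("""
    SELECT e.Name, s.AnnualSalary
    FROM Employee e JOIN SalariedEmployee s ON s.EmployeeID = e.EmployeeID
""").fetchall()

# All values of the multi-valued attribute for one employee.
phones_of_ana = [row[0] for row in conn.execute(
    "SELECT PhoneNumber FROM EmployeePhone WHERE EmployeeID = 1 ORDER BY PhoneNumber"
)]

print(salaried)       # [('Ana', 60000.0)]
print(phones_of_ana)  # ['555-0100', '555-0101']
```

The cost of this mapping is the join needed to read a complete subtype row; the single-table option trades that join for null columns.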

Phase 3: Relational Model and Database Concepts

Relational Database Model

The relational database model is the most widely used database model today. It was proposed by Edgar F.
Codd at IBM in 1970. In this model, data is organized into one or more tables (or relations), where each
table consists of rows and columns. Each row in a table represents a record, and each column represents
an attribute of that record.

Key Characteristics of the Relational Model:

Data Representation: Data is represented as a collection of two-dimensional tables. Each table has
a unique name.

Rows and Columns: Each table is composed of rows (also called tuples or records) and columns
(also called attributes or fields).

Rows: Each row represents a single, distinct entity or a relationship instance. The order of rows
in a table is not significant.

Columns: Each column represents a specific attribute of the entity or relationship. Each
column has a unique name within the table and a defined data type (e.g., integer, string, date).

Domains: Each column has a domain, which is the set of all possible values that an attribute can
take. For example, the domain for a DateOfBirth column would be all valid dates.

Schema: The schema of a relation (table) describes its structure, including the table name, column
names, and their data types. For example, Student (StudentID, Name, DateOfBirth, Major) is a
schema for a Student table.

Relational Algebra and Calculus: The relational model is based on a strong mathematical
foundation, including relational algebra and relational calculus, which provide a formal way to
manipulate and query data.

Data Integrity: The model supports various integrity constraints to ensure the accuracy and
consistency of data.

Advantages of the Relational Model:

Simplicity: The tabular representation of data is intuitive and easy to understand for users.

Flexibility: The relational model is highly flexible and can easily adapt to changes in data
requirements.

Data Independence: It provides logical and physical data independence, meaning that many changes to
the physical storage or logical structure of the database can be made without modifying the
application programs.

Strong Theoretical Foundation: The mathematical foundation of relational algebra and calculus
provides a solid basis for database design and query optimization.

Data Integrity: The model supports various integrity constraints (e.g., primary key, foreign key, not
null) to maintain data accuracy and consistency.

SQL (Structured Query Language): The relational model is strongly associated with SQL, a
powerful and widely adopted language for managing and querying relational databases.

Disadvantages of the Relational Model:

Complexity for Complex Data: While simple for structured data, modeling very complex,
unstructured, or semi-structured data (like multimedia or hierarchical data) can sometimes be
challenging.

Performance for Large Datasets: For extremely large and complex datasets, especially those
requiring very high transaction rates or real-time analytics, performance can sometimes be a
concern, though modern RDBMS are highly optimized.

Object-Relational Impedance Mismatch: When integrating with object-oriented programming
languages, there can be a mismatch between the object-oriented paradigm and the relational
paradigm, requiring object-relational mapping (ORM) tools.

Despite some limitations, the relational model remains the dominant paradigm for database
management due to its robustness, flexibility, and the widespread support of SQL and relational database
management systems (RDBMS).

Keys, Integrity Rules, Data Dictionary, System Catalog, Indexes

Keys:

Keys are fundamental to the relational model, serving to uniquely identify rows within a table and
establish relationships between tables. They are crucial for maintaining data integrity and enabling
efficient data retrieval.

Super Key: A set of one or more attributes that, taken collectively, allow us to uniquely identify a
tuple (row) in a relation. If a set of attributes is a super key, then any superset of these attributes is
also a super key.

Example: In a Student table with attributes (StudentID, Name, Age, Address),
(StudentID, Name) is a super key. (StudentID, Name, Age) is also a super key.

Candidate Key: A super key for which no proper subset is a super key. It is a minimal super key. A
relation can have multiple candidate keys.

Example: For the Student table, if StudentID alone can uniquely identify a student, then
StudentID is a candidate key. If (Name, Address) can also uniquely identify a student
(assuming no two students have the same name and address), then (Name, Address) is also a
candidate key.

Primary Key: One of the candidate keys chosen by the database designer to uniquely identify each
tuple in a relation. It cannot contain NULL values and must be unique for each row. There can be
only one primary key per table.

Example: From the candidate keys StudentID and (Name, Address) , StudentID would
typically be chosen as the primary key for the Student table.

Alternate Key: A candidate key that is not chosen as the primary key.

Example: If StudentID is the primary key, then (Name, Address) would be an alternate key.

Foreign Key: A set of attributes in one table (the referencing table) that refers to the primary key
(or another candidate key) of a referenced table, which is usually a different table but may be the
same one. Foreign keys establish relationships between tables and enforce referential integrity.

Example: In a CourseEnrollment table with (StudentID, CourseID, Grade), StudentID
would be a foreign key referencing the StudentID in the Student table, and CourseID would
be a foreign key referencing the CourseID in the Course table.
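
The difference between a super key and a candidate key can be made concrete with a small Python sketch. The helper names is_super_key and candidate_keys and the sample rows are hypothetical. Note the limitation: a key is a property of the schema (it must hold for every legal state of the table), so checking one sample of rows can only show that an attribute set is not a key; it cannot prove that it is one.

```python
from itertools import combinations


def is_super_key(rows, attrs):
    """True if no two rows in this sample agree on all of attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal super keys for this sample, smallest attribute sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(key) <= set(combo) for key in keys):
                continue
            if is_super_key(rows, combo):
                keys.append(combo)
    return keys


# Hypothetical sample of the Student table from the examples above.
students = [
    {"StudentID": 1, "Name": "Ana", "Age": 20, "Address": "12 Oak St"},
    {"StudentID": 2, "Name": "Ana", "Age": 20, "Address": "9 Elm St"},
    {"StudentID": 3, "Name": "Ben", "Age": 20, "Address": "12 Oak St"},
]
attributes = ["StudentID", "Name", "Age", "Address"]

print(is_super_key(students, ("StudentID", "Name")))  # True, but not minimal
print(is_super_key(students, ("Name",)))              # False, 'Ana' repeats
print(candidate_keys(students, attributes))
# [('StudentID',), ('Name', 'Address')]
```

Choosing StudentID from these two candidate keys as the primary key leaves (Name, Address) as the alternate key, matching the examples above.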

Integrity Rules (Constraints):

Integrity rules are a set of rules that ensure the accuracy and consistency of data in a relational database.
They prevent invalid data from being entered into the database.

Entity Integrity Rule: States that the primary key of a relation cannot contain NULL values. This
ensures that each row in a table can be uniquely identified.

Referential Integrity Rule: States that if a foreign key exists in a relation, either the foreign key
value must match a primary key value in the referenced relation, or the foreign key value must be
entirely NULL. This ensures that relationships between tables are valid and consistent.

Domain Integrity Rule: Specifies that all values in a column must be from the same predefined
domain. This ensures that data types and value ranges are respected.
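
The three rules can be observed directly in SQLite. This is a sketch under stated assumptions: the Department table, the text-valued keys, and the Age range are invented for the example, and domain integrity is approximated with a CHECK constraint. Two SQLite-specific details matter here: foreign keys are enforced only after PRAGMA foreign_keys = ON, and a primary key column needs an explicit NOT NULL to reject NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Department (
    DeptID TEXT NOT NULL PRIMARY KEY,
    Name   TEXT NOT NULL
);
CREATE TABLE Student (
    StudentID TEXT NOT NULL PRIMARY KEY,                -- entity integrity
    Name      TEXT NOT NULL,
    Age       INTEGER CHECK (Age BETWEEN 0 AND 150),    -- domain integrity
    DeptID    TEXT REFERENCES Department(DeptID)        -- referential integrity
);
INSERT INTO Department VALUES ('CS', 'Computer Science');
INSERT INTO Student VALUES ('S1', 'Ana', 20, 'CS');
""")


def rejected(sql):
    """Return True if the DBMS refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


null_key    = rejected("INSERT INTO Student VALUES (NULL, 'Ben', 21, 'CS')")
dangling_fk = rejected("INSERT INTO Student VALUES ('S2', 'Ben', 21, 'MATH')")
null_fk     = rejected("INSERT INTO Student VALUES ('S3', 'Cy', 22, NULL)")
bad_domain  = rejected("INSERT INTO Student VALUES ('S4', 'Di', -5, 'CS')")

print(null_key)     # True:  a primary key may not be NULL
print(dangling_fk)  # True:  'MATH' matches no Department row
print(null_fk)      # False: an entirely NULL foreign key is allowed
print(bad_domain)   # True:  -5 is outside the declared Age range
```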

Data Dictionary:

A data dictionary is a centralized repository of information about data, such as meaning, relationships to
other data, origin, usage, and format. It contains metadata (data about data) rather than the actual data
itself. It is a crucial component of a DBMS.

Contents: It typically stores information about tables, columns, data types, constraints (primary
keys, foreign keys), views, indexes, users, and their privileges.

Purpose: It helps in managing and documenting the database schema, enforcing data standards,
and supporting database administration tasks.

System Catalog:

The system catalog (also known as the data dictionary in many commercial DBMS) is a collection of tables
and views that store metadata about the database. It is maintained by the DBMS itself and is accessible to
authorized users and database administrators. The system catalog is essentially the data dictionary
implemented within the database system.

Self-Describing: A key characteristic of a relational database is that it is self-describing, meaning the
database schema is itself stored in the database, within the system catalog.

Accessibility: Users can query the system catalog (e.g., using SQL SELECT statements) to retrieve
information about the database structure, just like querying any other table.
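
As a concrete case, SQLite exposes its catalog through the sqlite_master table and the table_info pragma; other systems use different names (many provide INFORMATION_SCHEMA views). The Student table and the index name below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        StudentID   INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,
        DateOfBirth TEXT
    )
""")
conn.execute("CREATE INDEX idx_student_name ON Student(Name)")

# The catalog is queried with ordinary SELECT statements.
catalog_objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
print(catalog_objects)  # [('index', 'idx_student_name'), ('table', 'Student')]

# Column-level metadata: each row is (cid, name, type, notnull, dflt_value, pk).
student_columns = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(Student)")
]
print(student_columns)
# [('StudentID', 'INTEGER'), ('Name', 'TEXT'), ('DateOfBirth', 'TEXT')]
```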

Indexes:

An index is a database object that provides fast lookup of data in a table. It is a data structure (like a B-tree
or hash table) that improves the speed of data retrieval operations on a database table at the cost of
additional writes and storage space to maintain the index data structure.

Purpose: Indexes are used to quickly locate data without having to scan every row in a database
table. They are particularly useful for columns frequently used in WHERE clauses, JOIN conditions,
or ORDER BY clauses.

Types of Indexes:

Clustered Index: Determines the physical order of data in a table. A table can have only one
clustered index. The data rows are stored in the same order as the index.

Non-Clustered Index: Does not alter the physical order of the table rows. It creates a separate
structure that contains the indexed column values and pointers to the actual data rows. A table
can have multiple non-clustered indexes.

Benefits: Faster query execution, improved performance for data retrieval.

Drawbacks: Increased storage space, slower data modification (insert, update, delete) operations
because the index also needs to be updated.
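
The effect of an index on a lookup can be seen by asking SQLite for its query plan before and after the index is created. The table contents and the index name are hypothetical, and the exact wording of the plan differs between SQLite versions, so the sketch only checks for the keyword SCAN and for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT, Major TEXT)"
)
conn.executemany(
    "INSERT INTO Student (Name, Major) VALUES (?, ?)",
    [(f"Student{i}", "CS") for i in range(1000)],
)


def query_plan(sql):
    """Return the human-readable plan text (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT * FROM Student WHERE Name = 'Student500'"

plan_before = query_plan(lookup)   # full table scan: every row is examined
conn.execute("CREATE INDEX idx_student_name ON Student(Name)")
plan_after = query_plan(lookup)    # search through idx_student_name

print(plan_before)
print(plan_after)
```

The first plan reports a scan of the whole table, while the second reports a search that uses idx_student_name, at the price of maintaining that index on every insert, update, and delete.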

Three-Tier Schema Architecture

The three-tier schema architecture, also known as the three-schema (ANSI/SPARC) architecture, is a
fundamental concept in database management systems (DBMS) that separates the user applications from the
physical database. It should not be confused with the three-tier (presentation, application, data)
architecture of client-server applications. This separation provides data independence, so that changes
at one level need not affect the other levels, which simplifies database design, development, and
maintenance.

The three levels are:

1. External Level (View Level):

This is the highest level of data abstraction. It describes the part of the database that is
relevant to a particular user or application program.

Users at this level are only concerned with the data they need to see and how they perceive it,
not with how the data is actually stored or structured in the entire database.

It presents different views of the same data to different users, tailored to their specific needs
and access rights. For example, a student might see their grades and course schedule, while an
administrator might see all student records.

The external schema is defined using a Data Definition Language (DDL) and is mapped to the
conceptual schema.

2. Conceptual Level (Logical Level):

This level describes the entire database structure for a community of users. It hides the details
of physical storage structures and focuses on describing entities, their attributes, and
relationships.

It represents the logical structure of the database, including all entities, relationships, and
constraints, without regard to how the data is physically stored.

The conceptual schema is defined by the database administrator (DBA) and serves as a stable,
global view of the data. It acts as a bridge between the external views and the internal storage
details.

It includes data types, relationships, user operations, and integrity constraints.

3. Internal Level (Physical Level):

This is the lowest level of data abstraction. It describes the physical storage structure of the
database on the storage media.

It deals with how the data is actually stored, including file organization, indexing, data
compression, and physical record formats.

The internal schema is defined by the DBA and is concerned with efficient storage and retrieval
of data.

It specifies the physical devices where data is stored, the access methods used, and the details
of data representation.

Mappings between Levels:

External/Conceptual Mapping: Transforms requests from the external view to the conceptual
schema and results from the conceptual schema back to the external view. This allows users to
interact with their specific views without knowing the entire database structure.

Conceptual/Internal Mapping: Transforms requests from the conceptual schema to the internal
schema and results from the internal schema back to the conceptual schema. This allows the
conceptual schema to remain stable even if the physical storage structure changes.

Data Independence:

The three-tier architecture provides two levels of data independence:

Logical Data Independence: The ability to change the conceptual schema without affecting the
external schemas or application programs. For example, adding a new attribute to a table at the
conceptual level should not require changes to existing user views that do not use that attribute.

Physical Data Independence: The ability to change the internal schema (physical storage) without
affecting the conceptual schema or external schemas. For example, changing the file organization or
adding a new index should not require changes to application programs.

This architecture is crucial for managing complex databases, providing flexibility, and ensuring the
longevity of applications built on top of the database.
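
Both kinds of data independence can be sketched with Python's built-in sqlite3 module. The employee table, the employee_names view, and the index name below are hypothetical and chosen only for illustration. The view plays the role of an external schema, the base table the conceptual schema, and the index an internal-level detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Conceptual level: the base table (hypothetical schema).
cur.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
)
cur.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Alice", "Sales"), (2, "Bob", "IT")],
)

# External level: a user view that exposes only part of the table.
cur.execute("CREATE VIEW employee_names AS SELECT emp_id, name FROM employee")


def read_view():
    """Run the same query an application would run against its view."""
    return cur.execute(
        "SELECT emp_id, name FROM employee_names ORDER BY emp_id"
    ).fetchall()


before = read_view()

# Logical data independence: add an attribute at the conceptual level.
cur.execute("ALTER TABLE employee ADD COLUMN salary REAL")
after_logical_change = read_view()

# Physical data independence: add an index at the internal level.
cur.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
after_physical_change = read_view()

# The view query returns the same rows after both changes.
print(before == after_logical_change == after_physical_change)
```

The application query against the view is never edited; only the schema underneath it changes.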

Phase 4: Functional Dependencies and Normalization Theory

Normalization: What is Normalization and How it Works

Normalization is a systematic process of organizing the columns and tables of a relational database to
minimize data redundancy and improve data integrity. It involves decomposing a table into smaller, well-
structured tables and defining relationships between them. The primary goal of normalization is to
eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.

What are Anomalies?

Anomalies are problems that can arise in poorly designed databases, leading to inconsistencies and
difficulties in data management.

Insertion Anomaly: Occurs when certain facts cannot be recorded in the database without
including other unrelated facts. For example, in a table that stores both employee and department
information, you cannot add a new department until you have an employee to assign to it.

Update Anomaly: Occurs when the same piece of information is stored in multiple places, and
updating it requires updating all occurrences. If one occurrence is missed, it leads to data
inconsistency. For example, if a department name changes, and it's stored in multiple employee
records, all those records must be updated.

Deletion Anomaly: Occurs when deleting a record results in the loss of other, unrelated
information. For example, if the last employee in a department is deleted from a combined
employee-department table, the information about that department might also be lost.
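
The three anomalies can be reproduced on a small in-memory version of the combined employee-department table. This is a minimal sketch: the column names, the sample rows, and the helper functions are assumptions made for illustration, with emp_id assumed to be the primary key.

```python
# Hypothetical combined table: each row mixes employee and department facts.
emp_dept = [
    {"emp_id": 1, "emp_name": "Alice", "dept_id": "D1", "dept_name": "Sales"},
    {"emp_id": 2, "emp_name": "Bob", "dept_id": "D1", "dept_name": "Sales"},
    {"emp_id": 3, "emp_name": "Carol", "dept_id": "D2", "dept_name": "IT"},
]


def can_insert(row):
    """A row is valid only if its primary key (emp_id) is present."""
    return row.get("emp_id") is not None


def dept_names(rows):
    """Map each dept_id to the set of names recorded for it."""
    names = {}
    for row in rows:
        names.setdefault(row["dept_id"], set()).add(row["dept_name"])
    return names


# Insertion anomaly: a department with no employee has no key value to use.
new_dept_only = {"emp_id": None, "emp_name": None, "dept_id": "D3", "dept_name": "HR"}
insert_allowed = can_insert(new_dept_only)

# Update anomaly: renaming D1 in only one of its two rows leaves two names.
updated = [dict(row) for row in emp_dept]
updated[0]["dept_name"] = "Global Sales"
inconsistent = {d for d, names in dept_names(updated).items() if len(names) > 1}

# Deletion anomaly: removing the last employee of D2 removes D2 itself.
after_delete = [row for row in emp_dept if row["emp_id"] != 3]
lost_departments = set(dept_names(emp_dept)) - set(dept_names(after_delete))

print(insert_allowed, inconsistent, lost_departments)
```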

Purpose of Normalization:

Eliminate Redundancy: Reduces duplicate data, saving storage space and preventing
inconsistencies.

Improve Data Integrity: Ensures that data is accurate and consistent by enforcing rules and
relationships.

Better Database Design: Leads to a more organized, flexible, and maintainable database structure.

Faster Query Execution: Normalization can require more joins for read queries, but because each
fact is stored only once, inserts, updates, and deletes touch fewer rows, and some complex queries
benefit from the smaller, less redundant tables.

How Normalization Works:

Normalization is achieved by applying a series of rules called normal forms. Each normal form builds
upon the previous one, progressively reducing redundancy and improving data integrity. The most
commonly used normal forms are 1NF, 2NF, and 3NF, with BCNF (Boyce-Codd Normal Form) being a
stricter version of 3NF.

The process typically involves:

1. Identifying Functional Dependencies: Understanding how attributes in a table depend on each
other. A functional dependency (FD) A -> B means that the value of attribute A uniquely
determines the value of attribute B.

2. Decomposition: Breaking down a large table into smaller, related tables based on these functional
dependencies.

3. Applying Normal Form Rules: Ensuring that each new table adheres to the rules of a specific
normal form.

Example of a Non-Normalized Table (Illustrative):

Consider a table Student_Course_Details :

StudentID  StudentName  StudentMajor      CourseID  CourseName   Instructor  InstructorOffice  Grade
101        Alice        Computer Science  CS101     Intro to CS  Dr. Smith   A101              A
101        Alice        Computer Science  MA101     Calculus     Dr. Jones   B202              B
102        Bob          Electrical Eng.   CS101     Intro to CS  Dr. Smith   A101              C

In this table:

StudentName and StudentMajor are repeated for each course a student takes (redundancy).

CourseName, Instructor, and InstructorOffice are repeated for each student in a course (redundancy).

If Dr. Smith changes office, multiple rows need updating (Update Anomaly).

If student 101 drops MA101, which no other student is taking, the information about Calculus,
Dr. Jones, and office B202 is lost (Deletion Anomaly).

Normalization will break this table down into smaller, more manageable tables, such as Students ,
Courses , Instructors , and Enrollments , to eliminate these anomalies.
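
As a rough illustration of the first step (identifying functional dependencies), the sketch below tests candidate FDs against the three sample rows of Student_Course_Details. The helper fd_holds is a hypothetical name. Sample data can only refute a dependency; a dependency that holds in three rows is still an assumption about the real world until the business rules confirm it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # setdefault stores the first dependent value seen for a determinant.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


COLUMNS = [
    "StudentID", "StudentName", "StudentMajor", "CourseID",
    "CourseName", "Instructor", "InstructorOffice", "Grade",
]
DATA = [
    (101, "Alice", "Computer Science", "CS101", "Intro to CS", "Dr. Smith", "A101", "A"),
    (101, "Alice", "Computer Science", "MA101", "Calculus", "Dr. Jones", "B202", "B"),
    (102, "Bob", "Electrical Eng.", "CS101", "Intro to CS", "Dr. Smith", "A101", "C"),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]

candidates = [
    (["StudentID"], ["StudentName", "StudentMajor"]),
    (["CourseID"], ["CourseName", "Instructor"]),
    (["Instructor"], ["InstructorOffice"]),
    (["StudentID"], ["Grade"]),
    (["StudentID", "CourseID"], ["Grade"]),
]
for lhs, rhs in candidates:
    print(lhs, "->", rhs, ":", fd_holds(rows, lhs, rhs))
```

Only StudentID -> Grade is refuted by the sample, because student 101 has two different grades. The other candidates suggest the Students, Courses, Instructors, and Enrollments split.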

Functional dependencies are the cornerstone of normalization, as they help identify the relationships
between attributes and guide the decomposition process. We will delve deeper into functional
dependencies and each normal form in the subsequent sections.

Phase 5: Normal Forms (1NF, 2NF, 3NF)

First Normal Form (1NF)

First Normal Form (1NF) is the most basic level of database normalization. A relation (table) is said to be in
1NF if and only if it satisfies the following conditions:

1. Atomic Values: Each column must contain only atomic (indivisible) values. This means that each cell
in the table should contain a single value, not a list or a set of values.

2. No Repeating Groups: There should be no repeating groups of columns. This means that for each
row, there should not be multiple columns representing the same type of data (e.g., Phone1 ,
Phone2 , Phone3 ). Instead, these repeating groups should be moved to a separate table.

3. Unique Rows: Each row in the table must be unique. This is typically achieved by having a primary
key.

In essence, 1NF ensures that each intersection of a row and a column contains exactly one value, and that
there are no duplicate rows.

Example of a Non-1NF Table:

Consider a table Student_Courses that stores student information and the courses they are enrolled in:

StudentID StudentName Courses

101 Alice CS101, MA101

102 Bob PH201

103 Charlie CH301, BI401, CS101

This table is not in 1NF because:

The Courses column contains multiple values (e.g., "CS101, MA101") for a single StudentID . This
violates the atomic values rule.

Converting to 1NF:

To convert the Student_Courses table to 1NF, we need to eliminate the multi-valued Courses attribute.
This is done by creating separate rows for each course a student takes, or by creating a new table for
courses and linking it with the student table.

Option 1: Expanding Rows (one row per student-course pair, so that CourseID becomes part of a
composite primary key):

StudentID StudentName CourseID

101 Alice CS101

101 Alice MA101

102 Bob PH201

103 Charlie CH301

103 Charlie BI401

103 Charlie CS101

Now, each cell contains an atomic value, and there are no repeating groups. The primary key for this table
would be (StudentID, CourseID). Note that StudentName is now repeated for every course a student takes.

Option 2: Creating a Separate Table (more common and generally preferred for multi-valued
attributes that are not part of the primary key of the original entity):

Student Table (1NF):

StudentID StudentName

101 Alice

102 Bob

103 Charlie

Student_Course_Enrollment Table (1NF):

StudentID CourseID

101 CS101

101 MA101

102 PH201

103 CH301

103 BI401

103 CS101

In this option, Student_Course_Enrollment acts as a linking table (or junction table) to handle the many-
to-many relationship between students and courses. Both tables are now in 1NF. The
Student_Course_Enrollment table has a composite primary key (StudentID, CourseID) .
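
The conversion in Option 2 can be sketched in a few lines of Python. The function name to_1nf is hypothetical, and the sketch assumes that the Courses cell is a comma-separated string, as in the example table.

```python
# Rows of the non-1NF Student_Courses table: (StudentID, StudentName, Courses).
non_1nf = [
    (101, "Alice", "CS101, MA101"),
    (102, "Bob", "PH201"),
    (103, "Charlie", "CH301, BI401, CS101"),
]


def to_1nf(rows):
    """Split the multi-valued Courses cell into one enrollment row per course."""
    students = []
    enrollments = []
    for student_id, name, courses in rows:
        students.append((student_id, name))
        for course in courses.split(","):
            # strip() removes the space that follows each comma.
            enrollments.append((student_id, course.strip()))
    return students, enrollments


students, enrollments = to_1nf(non_1nf)

for row in students:
    print(row)
for row in enrollments:
    print(row)
```

Every (StudentID, CourseID) pair appears once in the enrollment list, which is why the pair can serve as the composite primary key.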

Achieving 1NF is the first crucial step in database normalization, laying the groundwork for further
normalization to reduce redundancy and improve data integrity.

Second Normal Form (2NF)

Second Normal Form (2NF) builds upon First Normal Form (1NF). A relation is in 2NF if it is in 1NF and all
non-key attributes are fully functionally dependent on the primary key. This means that no non-key
attribute should be dependent on only a part of a composite primary key.

Conditions for 2NF:

1. Must be in 1NF: The table must already satisfy all the conditions of 1NF.

2. No Partial Dependencies: All non-key attributes must be fully functionally dependent on the entire
primary key. If the primary key is composite (made of two or more attributes), then no non-key
attribute should depend on only a subset of the primary key.

What is a Partial Dependency?

A partial dependency occurs when a non-key attribute is functionally dependent on only a part of a
composite primary key. For example, if (A, B) is the composite primary key and C is a non-key
attribute, then C has a partial dependency if A -> C (meaning C depends only on A , which is part of
the primary key, not on the whole primary key (A, B) ).

Example of a Non-2NF Table:

Let's consider the Student_Course_Enrollment table from the 1NF example, but now with additional
attributes:

StudentID CourseID StudentName CourseName Grade

101 CS101 Alice Intro to CS A

101 MA101 Alice Calculus B

102 CS101 Bob Intro to CS C

In this table, the composite primary key is (StudentID, CourseID) .

StudentName is dependent only on StudentID (part of the primary key): StudentID ->
StudentName .

CourseName is dependent only on CourseID (part of the primary key): CourseID -> CourseName .

These are partial dependencies because StudentName and CourseName do not depend on the entire
primary key (StudentID, CourseID) . This leads to redundancy (e.g., Alice and Intro to CS are
repeated) and potential update anomalies.
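
A partial dependency can be searched for mechanically: test every proper subset of the composite key against every non-key attribute. The sketch below does this on the three sample rows. The function names are hypothetical, and a dependency found in sample data is only a candidate until the business rules confirm it.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (key_subset, attribute) pairs where a proper subset of the
    composite key already determines a non-key attribute."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in non_key:
                if fd_holds(rows, subset, [attr]):
                    found.append((subset, attr))
    return found


COLUMNS = ["StudentID", "CourseID", "StudentName", "CourseName", "Grade"]
DATA = [
    (101, "CS101", "Alice", "Intro to CS", "A"),
    (101, "MA101", "Alice", "Calculus", "B"),
    (102, "CS101", "Bob", "Intro to CS", "C"),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]

partials = partial_dependencies(
    rows,
    key=["StudentID", "CourseID"],
    non_key=["StudentName", "CourseName", "Grade"],
)
for subset, attr in partials:
    print(subset, "->", attr)
```

Grade does not appear in the result because neither StudentID nor CourseID alone determines it.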

Converting to 2NF:

To convert the table to 2NF, we need to remove the partial dependencies by decomposing the table into
smaller tables, where each non-key attribute is fully dependent on the primary key of its respective table.

Decomposition:

1. Student Table: Create a table for Student information, where StudentName is fully dependent on
StudentID.

StudentID StudentName

101 Alice

102 Bob

2. Course Table: Create a table for Course information, where CourseName is fully dependent on
CourseID .

CourseID CourseName

CS101 Intro to CS

MA101 Calculus

3. Enrollment Table: Keep a table for the Enrollment details, where Grade is fully dependent on the
composite primary key (StudentID, CourseID) .

StudentID CourseID Grade

101 CS101 A

101 MA101 B

102 CS101 C

Now, all three tables are in 2NF:

In the Student table, StudentName is fully dependent on StudentID (the primary key).

In the Course table, CourseName is fully dependent on CourseID (the primary key).

In the Enrollment table, Grade is fully dependent on (StudentID, CourseID) (the composite primary
key).

By achieving 2NF, we eliminate partial dependencies, further reducing data redundancy and improving
data integrity.
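
One way to check that this decomposition loses no information is to project the original rows onto the three new tables and then join them back. The following is a minimal sketch; project, natural_join, and as_set are hypothetical helper names, not functions from any library.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate rows but keeping order."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Join two lists of dict rows on all attribute names they share."""
    joined = []
    for left_row in left:
        for right_row in right:
            shared = set(left_row) & set(right_row)
            if all(left_row[a] == right_row[a] for a in shared):
                joined.append({**left_row, **right_row})
    return joined


def as_set(rows):
    """Order-independent representation of a table, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


COLUMNS = ["StudentID", "CourseID", "StudentName", "CourseName", "Grade"]
DATA = [
    (101, "CS101", "Alice", "Intro to CS", "A"),
    (101, "MA101", "Alice", "Calculus", "B"),
    (102, "CS101", "Bob", "Intro to CS", "C"),
]
original = [dict(zip(COLUMNS, values)) for values in DATA]

student = project(original, ["StudentID", "StudentName"])
course = project(original, ["CourseID", "CourseName"])
enrollment = project(original, ["StudentID", "CourseID", "Grade"])

rejoined = natural_join(natural_join(enrollment, student), course)
print(as_set(rejoined) == as_set(original))
```

Each student name and course name is now stored once, yet joining the three tables reproduces the original rows.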

Third Normal Form (3NF)

Third Normal Form (3NF) builds upon Second Normal Form (2NF). A relation is in 3NF if it is in 2NF and
there are no transitive dependencies of non-key attributes on the primary key.

Conditions for 3NF:

1. Must be in 2NF: The table must already satisfy all the conditions of 2NF.

2. No Transitive Dependencies: No non-key attribute should be functionally dependent on another
non-key attribute. In other words, a non-key attribute should not be indirectly dependent on the
primary key through another non-key attribute.

What is a Transitive Dependency?

A transitive dependency occurs when a non-key attribute is dependent on another non-key attribute,
which in turn is dependent on the primary key. If A is the primary key, and B and C are non-key
attributes, then A -> B and B -> C implies a transitive dependency A -> C via B.

Example of a Non-3NF Table:

Let's consider a Student_Details table that is in 2NF:

StudentID StudentName Major MajorAdvisor

101 Alice CS Dr. Smith

102 Bob EE Dr. Jones

103 Charlie CS Dr. Smith

In this table:

StudentID is the primary key.

StudentID -> StudentName (full dependency)

StudentID -> Major (full dependency)

Major -> MajorAdvisor (transitive dependency: MajorAdvisor depends on Major, and Major depends
on StudentID)

The MajorAdvisor attribute is transitively dependent on StudentID through Major . This means that if a
Major changes its MajorAdvisor , multiple rows might need to be updated, leading to update
anomalies. Also, if all students from a particular major are deleted, the information about that major's
advisor might be lost (deletion anomaly).
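
Given a list of functional dependencies, the 3NF condition used in this guide can be checked with a simple rule: flag any dependency whose left-hand side contains no key attribute and whose right-hand side is a non-key attribute. The sketch below is a simplification that assumes a single candidate key and takes the three dependencies listed above as given. The function name is hypothetical.

```python
def transitive_dependencies(key, fds):
    """Return (determinant, attribute) pairs in which a non-key attribute
    is determined purely by other non-key attributes."""
    key = set(key)
    found = []
    for lhs, rhs in fds:
        lhs = set(lhs)
        # The determinant must consist only of non-key attributes.
        if lhs.isdisjoint(key):
            for attr in sorted(set(rhs) - key):
                found.append((tuple(sorted(lhs)), attr))
    return found


# Functional dependencies of the Student_Details table.
fds = [
    ({"StudentID"}, {"StudentName"}),
    ({"StudentID"}, {"Major"}),
    ({"Major"}, {"MajorAdvisor"}),
]

violations = transitive_dependencies({"StudentID"}, fds)
for determinant, attr in violations:
    print(determinant, "->", attr)
```

The single pair reported, Major -> MajorAdvisor, is the dependency that has to move into its own table.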

Converting to 3NF:

To convert the table to 3NF, we need to remove the transitive dependency by decomposing the table into
smaller tables, ensuring that all non-key attributes are directly dependent on the primary key.

Decomposition:

1. Student Table: Create a table for Student information, where StudentName and Major are
directly dependent on StudentID .

StudentID StudentName Major

101 Alice CS

102 Bob EE

103 Charlie CS

2. Major_Advisor Table: Create a new table for Major and MajorAdvisor information, where
MajorAdvisor is directly dependent on Major .

Major MajorAdvisor

CS Dr. Smith

EE Dr. Jones

Now, both tables are in 3NF:

In the Student table, StudentName and Major are directly dependent on StudentID.

In the Major_Advisor table, MajorAdvisor is directly dependent on Major.

By achieving 3NF, we eliminate transitive dependencies, further reducing data redundancy and improving
data integrity. This form is generally considered sufficient for most business database applications.
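
The two 3NF tables can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the snake_case table and column names, the foreign key, and the replacement advisor "Dr. Lee" are illustrative choices, not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE major_advisor (
        major         TEXT PRIMARY KEY,
        major_advisor TEXT NOT NULL
    );
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT NOT NULL,
        major        TEXT NOT NULL REFERENCES major_advisor (major)
    );
    INSERT INTO major_advisor VALUES ('CS', 'Dr. Smith'), ('EE', 'Dr. Jones');
    INSERT INTO student VALUES
        (101, 'Alice', 'CS'), (102, 'Bob', 'EE'), (103, 'Charlie', 'CS');
    """
)

# No update anomaly: changing the CS advisor touches exactly one row.
rows_changed = conn.execute(
    "UPDATE major_advisor SET major_advisor = 'Dr. Lee' WHERE major = 'CS'"
).rowcount

advisors = conn.execute(
    """
    SELECT s.student_name, m.major_advisor
    FROM student AS s
    JOIN major_advisor AS m ON m.major = s.major
    ORDER BY s.student_id
    """
).fetchall()

# No deletion anomaly: removing every EE student keeps the EE advisor.
conn.execute("DELETE FROM student WHERE major = 'EE'")
ee_advisor = conn.execute(
    "SELECT major_advisor FROM major_advisor WHERE major = 'EE'"
).fetchone()

print(rows_changed, advisors, ee_advisor)
```

Both CS students see the new advisor through the join even though only one row was updated.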
