UNIT-3

This document provides an overview of SQL queries, constraints, and triggers in database management systems (DBMS). It covers the structure of basic SQL queries, including clauses like SELECT, FROM, WHERE, and various operators such as UNION, INTERSECT, and EXCEPT. Additionally, it discusses integrity constraints, aggregation operators, and the use of triggers to maintain data integrity and automate tasks in the database.

Uploaded by

M. Madhusudhan M
Copyright
© © All Rights Reserved

UNIT–III

SQL: QUERIES, CONSTRAINTS, TRIGGERS

Form of basic SQL query in DBMS


The basic form of an SQL query for retrieving data is a combination of
clauses. Its most elementary form can be represented as:

Syntax
SELECT [DISTINCT] column1, column2, ...
FROM tablename
WHERE condition;

Let's break it down:

1. SELECT Clause: This is where you specify the columns you want
to retrieve. Use an asterisk (*) to retrieve all columns.

2. FROM Clause: This specifies from which table or tables you want
to retrieve the data.

3. WHERE Clause (optional): This allows you to filter the results
based on a condition.

4. DISTINCT keyword (optional): This indicates that the answer should
not contain duplicates. If a query is written without DISTINCT, SQL
does not eliminate duplicate rows.

Here are the primary components of SQL queries:

• SELECT: Retrieves data from one or more tables.

• FROM: Specifies the table from which you're retrieving the data.

• WHERE: Filters the results based on a condition.

• GROUP BY: Groups rows that have the same values in specified
columns.

• HAVING: Filters the result of a GROUP BY.

• ORDER BY: Sorts the results in ascending or descending order.

• JOIN: Combines rows from two or more tables based on related
columns.

To provide a more holistic view, here are a few more SQL examples,
keeping them as basic as possible:

1. Retrieve all columns from a table:

Syntax

SELECT * FROM tablename;

2. Retrieve specific columns from a table:

Syntax

SELECT column1, column2 FROM tablename;

3. Retrieve data with a condition:

Syntax

SELECT column1, column2 FROM tablename WHERE column1 = 'value';

4. Sort retrieved data:

Syntax

SELECT column1, column2 FROM tablename ORDER BY column1 ASC;
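The clauses above can be tried out end to end with a small script. This is a minimal sketch that uses Python's built-in sqlite3 module and an in-memory database; the Students table and its rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (name TEXT, city TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("Asha", "Delhi", 82), ("Ravi", "Mumbai", 67), ("Kiran", "Delhi", 91)],
)

# SELECT ... FROM ... WHERE ... ORDER BY: filter rows, then sort them.
top_students = conn.execute(
    "SELECT name, marks FROM Students WHERE marks > 70 ORDER BY marks ASC"
).fetchall()

# Without DISTINCT, 'Delhi' appears once per matching row.
all_cities = conn.execute("SELECT city FROM Students ORDER BY city").fetchall()

# With DISTINCT, duplicate rows are removed from the result.
distinct_cities = conn.execute(
    "SELECT DISTINCT city FROM Students ORDER BY city"
).fetchall()

print(top_students)
print(all_cities)
print(distinct_cities)
```

Comparing all_cities with distinct_cities shows the effect of the DISTINCT keyword described earlier.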

Regular expressions in the SELECT Command


SQL provides support for pattern matching through the LIKE operator,
along with the use of wildcard symbols. LIKE patterns are a simpler
mechanism than full regular expressions.

Regular expression: a sequence of characters that defines a search
pattern, mainly for use in pattern matching with strings.

LIKE: The LIKE operator is used in a WHERE clause to search for a
specified pattern in a column.

Wildcards: There are two primary wildcards used in conjunction with
the `LIKE` operator:
percent sign (%): represents zero, one, or multiple characters
underscore sign (_): represents exactly one character

Examples:
LIKE 'a%' finds names that start with "a"; LIKE '%a' finds names that end with "a".
LIKE 'a__%' finds names that start with "a" and are at least 3 characters in length.

Here's a breakdown of how you can use these wildcards with the `LIKE`
operator:

Using `%` Wildcard


1. Find values that start with a specific pattern:
Syntax

SELECT column_name
FROM table_name
WHERE column_name LIKE 'pattern%';

For example, to find all customers whose names start with "Ma":

Example

SELECT FirstName
FROM Customers
WHERE FirstName LIKE 'Ma%';
2. Find values that end with a specific pattern:
Syntax
SELECT column_name
FROM table_name
WHERE column_name LIKE '%pattern';

For instance, to find all products that end with "ing":

Example

SELECT ProductName
FROM Products
WHERE ProductName LIKE '%ing';

3. Find values that have a specific pattern anywhere:


Syntax

SELECT column_name
FROM table_name
WHERE column_name LIKE '%pattern%';

For example, to find all books that have the word "life" anywhere in the
title:

Example

SELECT BookTitle
FROM Books
WHERE BookTitle LIKE '%life%';

Using `_` Wildcard

1. Find values of a specific length where you only know some
characters:
Syntax

SELECT column_name
FROM table_name
WHERE column_name LIKE 'p_ttern';

For instance, if you're looking for a five-letter word where you know the
first letter is "h" and the third letter is "l", you could use:

Example

SELECT Word
FROM Words
WHERE Word LIKE 'h_l__';

Combining `%` and `_`
You can use both wildcards in the same pattern. For example, to find
any value that starts with "A", followed by any two characters, then "o",
and then any remaining characters:

Example

SELECT column_name
FROM table_name
WHERE column_name LIKE 'A__o%';
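The wildcard patterns above can be checked against a few sample strings. This is a small sketch using Python's sqlite3 module; the Words table and its contents are hypothetical. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, and other database systems may behave differently.

```python
import sqlite3

# Hypothetical sample data, used only to exercise the patterns above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Words (word TEXT)")
conn.executemany(
    "INSERT INTO Words VALUES (?)",
    [("Maria",), ("Mark",), ("Anna",), ("hello",), ("Also",)],
)

def matches(pattern):
    """Return the words matching a LIKE pattern, sorted for stable output."""
    rows = conn.execute(
        "SELECT word FROM Words WHERE word LIKE ? ORDER BY word", (pattern,)
    )
    return [word for (word,) in rows]

starts_with_ma = matches("Ma%")   # starts with "Ma"
ends_with_a = matches("%a")       # ends with "a"
five_letters = matches("h_l__")   # five letters: h, any, l, any, any
a_two_then_o = matches("A__o%")   # "A", two characters, "o", then anything

print(starts_with_ma, ends_with_a, five_letters, a_two_then_o)
```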

UNION in DBMS
The UNION operator in DBMS is used to combine the result sets of two
or more SELECT statements. By default, UNION returns only distinct
rows. If you want to keep duplicate rows, you can use UNION ALL.

Here's the basic syntax:

Syntax

SELECT column_name(s) FROM table1


UNION
SELECT column_name(s) FROM table2;
For instance, let's assume we have two tables, Customers and
Suppliers, and we want to find all cities where we have either a
customer or a supplier. If the Customers table has a column City and
the Suppliers table also has a column City, we can use a UNION to get
a list of all cities:

Example:

SELECT City FROM Customers


UNION
SELECT City FROM Suppliers;

This would return a list of cities, with each city listed only once, even if
it appears in both the Customers and Suppliers tables.

Remember, all the SELECT statements that you combine with UNION
must have the same number of columns in the same order, and the data
types of the corresponding columns must be compatible.

If you wanted to include duplicates, you would use UNION ALL:

Example:

SELECT City FROM Customers


UNION ALL
SELECT City FROM Suppliers;

In this case, a city would be listed once for every time it appears in
either the Customers or Suppliers table.
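The difference between UNION and UNION ALL is easy to see on a tiny data set. The sketch below uses Python's sqlite3 module; the city values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (City TEXT);
CREATE TABLE Suppliers (City TEXT);
INSERT INTO Customers VALUES ('Delhi'), ('Mumbai'), ('Delhi');
INSERT INTO Suppliers VALUES ('Mumbai'), ('Pune');
""")

# UNION: every city is listed once, even if it occurs in both tables.
union_rows = conn.execute(
    "SELECT City FROM Customers UNION SELECT City FROM Suppliers ORDER BY City"
).fetchall()

# UNION ALL: every occurrence from both tables is kept.
union_all_rows = conn.execute(
    "SELECT City FROM Customers UNION ALL SELECT City FROM Suppliers ORDER BY City"
).fetchall()

print(union_rows)
print(union_all_rows)
```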

INTERSECT in DBMS
The INTERSECT operator in a DBMS is used to combine two
SELECT statements and return only the records that are common to
both.
The basic syntax of the INTERSECT clause in SQL is:

Syntax

SELECT column_name(s) FROM table1

INTERSECT

SELECT column_name(s) FROM table2;

For example, if you have two tables, Orders and Deliveries, and you
want to find all orders that have been delivered (assuming order_id is a
common column), you could write:

Example:

SELECT order_id FROM Orders


INTERSECT
SELECT order_id FROM Deliveries;

This would return a list of order_ids that appear in both the Orders and
Deliveries tables.

Here are some key points about the INTERSECT operator:

• The number and order of columns, and the data types in both the
SELECT statements should be the same.

• It removes duplicate rows from the result set.

• It returns records that are common to both the SELECT statement
queries.

However, not all DBMSs support the INTERSECT operator. For
example, MySQL versions before 8.0.31 do not have a built-in
INTERSECT operator. In such systems you can achieve the same result
using INNER JOIN, IN, or EXISTS.

Here is an example of how to emulate INTERSECT using INNER
JOIN. DISTINCT is needed because, unlike INTERSECT, a join does
not remove duplicate rows:

Syntax

SELECT DISTINCT table1.id FROM table1
INNER JOIN table2 ON table1.id = table2.id;

This would also return ids that exist in both table1 and table2.

For example, to find all orders that have been delivered using the
Orders and Deliveries tables (assuming order_id is a common column),
you could write:

Example:

SELECT DISTINCT o.order_id FROM Orders o
INNER JOIN Deliveries d ON o.order_id = d.order_id;

This would return a list of order_ids that appear in both the Orders and
Deliveries tables.
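To see why the join-based emulation has to deal with duplicates, the following sketch runs INTERSECT and two INNER JOIN variants side by side using Python's sqlite3 module (SQLite supports INTERSECT natively). The order numbers are hypothetical; order 3 is assumed to have two delivery rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (order_id INTEGER);
CREATE TABLE Deliveries (order_id INTEGER);
INSERT INTO Orders VALUES (1), (2), (3);
INSERT INTO Deliveries VALUES (2), (3), (3);  -- order 3 delivered in two parts
""")

intersect_rows = conn.execute(
    "SELECT order_id FROM Orders INTERSECT "
    "SELECT order_id FROM Deliveries ORDER BY order_id"
).fetchall()

# A plain join repeats order 3 once per matching delivery row.
plain_join_rows = conn.execute(
    "SELECT o.order_id FROM Orders o "
    "INNER JOIN Deliveries d ON o.order_id = d.order_id ORDER BY o.order_id"
).fetchall()

# Adding DISTINCT restores the duplicate-free result of INTERSECT.
distinct_join_rows = conn.execute(
    "SELECT DISTINCT o.order_id FROM Orders o "
    "INNER JOIN Deliveries d ON o.order_id = d.order_id ORDER BY o.order_id"
).fetchall()

print(intersect_rows, plain_join_rows, distinct_join_rows)
```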

EXCEPT in DBMS
The EXCEPT operator in a DBMS is used to return the difference
between two SELECT statements. It returns the records from the first
SELECT statement that are not present in the second SELECT
statement.

Here is the basic syntax of the EXCEPT clause in SQL:

Syntax

SELECT column_name(s) FROM table1


EXCEPT
SELECT column_name(s) FROM table2;
For example, if you have two tables, Orders and Deliveries, and you
want to find all orders that have not been delivered yet (assuming
order_id is a common column), you could write:

Example:

SELECT order_id FROM Orders


EXCEPT
SELECT order_id FROM Deliveries;

This would return a list of order_ids that appear in the Orders table but
not in the Deliveries table.

Here are some key points about the EXCEPT operator:

• The number and order of columns, and the data types in both
SELECT statements should match.

• It removes duplicate rows from the result set.

• It only returns records from the first SELECT statement that are
not in the second SELECT statement.

Just like the INTERSECT operator, not all DBMSs support the EXCEPT
operator. MySQL versions before 8.0.31 do not support EXCEPT
directly, but you can simulate it using LEFT JOIN or NOT EXISTS.

Here's how you might do it with LEFT JOIN:

Syntax

SELECT DISTINCT table1.column_name FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name
WHERE table2.column_name IS NULL;

This query will return the rows from table1 where there is no matching
row in table2 for the specified column.

For example, if you have two tables, Orders and Deliveries, and you
want to find all orders that have not been delivered yet (assuming
order_id is a common column), you could write:

Example:

SELECT DISTINCT o.order_id FROM Orders o
LEFT JOIN Deliveries d
ON o.order_id = d.order_id
WHERE d.order_id IS NULL;

This would return a list of order_ids that appear in the Orders table but
not in the Deliveries table.
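The three formulations of "orders that have not been delivered" can be compared directly. This is a sketch using Python's sqlite3 module (SQLite supports EXCEPT); the order numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (order_id INTEGER);
CREATE TABLE Deliveries (order_id INTEGER);
INSERT INTO Orders VALUES (1), (2), (3), (4);
INSERT INTO Deliveries VALUES (2), (3);
""")

except_rows = conn.execute(
    "SELECT order_id FROM Orders EXCEPT "
    "SELECT order_id FROM Deliveries ORDER BY order_id"
).fetchall()

# Emulation 1: keep the Orders rows that found no partner in Deliveries.
left_join_rows = conn.execute(
    "SELECT DISTINCT o.order_id FROM Orders o "
    "LEFT JOIN Deliveries d ON o.order_id = d.order_id "
    "WHERE d.order_id IS NULL ORDER BY o.order_id"
).fetchall()

# Emulation 2: keep the Orders rows for which no matching delivery exists.
not_exists_rows = conn.execute(
    "SELECT DISTINCT o.order_id FROM Orders o "
    "WHERE NOT EXISTS (SELECT 1 FROM Deliveries d WHERE d.order_id = o.order_id) "
    "ORDER BY o.order_id"
).fetchall()

print(except_rows, left_join_rows, not_exists_rows)
```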

Aggregation Operators in DBMS
In a DBMS, aggregation operators are used to perform operations on a
group of values to return a single summarizing value. The most common
aggregation operators include COUNT, SUM, AVG, MIN, and MAX.

Here are some examples of how you might use these operators:

COUNT:

Returns the number of rows that match a specified criterion.

Syntax

COUNT(expression)

Example:

SELECT COUNT(*) FROM Employees;

This query would return the total number of rows in the Employees
table.

SUM
Returns the total sum of a numeric column.

Syntax

SUM(expression)

Example:

SELECT SUM(salary) FROM Employees;

This query would return the total sum of the salary column values in the
Employees table.

AVG
Returns the average value of a numeric column.

Syntax

AVG(expression)

Example:

SELECT AVG(salary) FROM Employees;

This query would return the average salary from the Employees table.

MIN
Returns the smallest value of the selected column.

Syntax

MIN(expression)

Example:

SELECT MIN(salary) FROM Employees;

This query would return the lowest salary from the Employees table.

MAX
Returns the largest value of the selected column.

Syntax

MAX(expression)

Example:

SELECT MAX(salary) FROM Employees;

This query would return the highest salary from the Employees table.

These aggregation operators are often used with the GROUP BY clause
to group the result-set by one or more columns. For example, to find the
highest salary in each department, you could write:

Example:

SELECT department_id, MAX(salary)


FROM Employees
GROUP BY department_id;

This query would return the highest salary for each department in the
Employees table.
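The aggregation operators can be exercised together on one small table. The sketch below uses Python's sqlite3 module; the employee rows are hypothetical, and the HAVING query is an extra illustration of filtering groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (name TEXT, department_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("Asha", 10, 3000), ("Ravi", 10, 5000), ("Kiran", 20, 4000)],
)

# All five aggregates computed over the whole table in one query.
summary = conn.execute(
    "SELECT COUNT(*), SUM(salary), AVG(salary), MIN(salary), MAX(salary) "
    "FROM Employees"
).fetchone()

# One result row per department.
per_department = conn.execute(
    "SELECT department_id, MAX(salary) FROM Employees "
    "GROUP BY department_id ORDER BY department_id"
).fetchall()

# HAVING filters whole groups: keep departments with more than one employee.
large_departments = conn.execute(
    "SELECT department_id, COUNT(*) FROM Employees "
    "GROUP BY department_id HAVING COUNT(*) > 1"
).fetchall()

print(summary, per_department, large_departments)
```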

Complex Integrity Constraints in SQL


Integrity constraints in SQL are rules that help ensure the accuracy and
reliability of data in the database. They ensure that certain conditions are
met when data is inserted, updated, or deleted. While primary key,
unique, and foreign key constraints are commonly discussed and used,
SQL allows for more complex constraints through the use of CHECK
and custom triggers. Here are some examples of complex integrity
constraints:

1. Using CHECK Constraints
Ensuring a range: You might want a column to only have values within
a certain range.

Example:

CREATE TABLE Employees (


ID INT PRIMARY KEY,
Age INT CHECK (Age >= 18 AND Age <= 30)
);

Pattern matching: Ensure data in a column matches a particular format.

Example:

CREATE TABLE Students (


ID INT PRIMARY KEY,
Email VARCHAR(255) CHECK (Email LIKE '%@%.%')
);

2. Composite Primary and Foreign Keys


These are cases where the uniqueness or referential integrity constraint
is applied over more than one column.

Example:

CREATE TABLE OrderDetails (


OrderID INT,
ProductID INT,
Quantity INT,
PRIMARY KEY (OrderID, ProductID),
FOREIGN KEY (OrderID) REFERENCES Orders(OrderID),
FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
);
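To observe these constraints rejecting bad data, here is a minimal sketch using Python's sqlite3 module. The parent tables Orders and Products are reduced to a single key column, the inserted values are hypothetical, and the foreign-key PRAGMA is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Employees (
    ID INTEGER PRIMARY KEY,
    Age INTEGER CHECK (Age >= 18 AND Age <= 30)
);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY);
CREATE TABLE OrderDetails (
    OrderID INTEGER,
    ProductID INTEGER,
    Quantity INTEGER,
    PRIMARY KEY (OrderID, ProductID),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID),
    FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
);
INSERT INTO Orders VALUES (1);
INSERT INTO Products VALUES (100);
""")

def try_insert(sql):
    """Run one INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert("INSERT INTO Employees VALUES (1, 25)"),         # within the CHECK range
    try_insert("INSERT INTO Employees VALUES (2, 45)"),         # violates the CHECK range
    try_insert("INSERT INTO OrderDetails VALUES (1, 100, 2)"),  # valid keys
    try_insert("INSERT INTO OrderDetails VALUES (1, 100, 5)"),  # duplicate composite key
    try_insert("INSERT INTO OrderDetails VALUES (9, 100, 1)"),  # no such order
]
print(results)
```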

3. Using Stored Procedures


Sometimes, instead of direct data manipulation on tables, using stored
procedures can help maintain more complex integrity constraints by
wrapping logic inside the procedure. For instance, you could have a
procedure that checks several conditions before inserting a record.

4. Using TRIGGERS
A trigger is procedural code in a database that automatically executes
in response to certain events on a particular table or view. Essentially,
triggers are a special kind of stored procedure that runs automatically
when an INSERT, UPDATE, or DELETE operation occurs. They are
typically used to maintain the integrity of the data, automate data-related
tasks, and extend the database functionalities.

When implementing complex constraints, it's crucial to strike a balance.
While they can ensure data integrity, they can also add overhead to the
database system and increase the complexity of the schema and the
operations performed on it. Proper documentation and understanding of
each constraint's purpose are essential.

Triggers and Active Databases in DBMS


Triggers and active databases are closely related concepts in the domain
of DBMS. Let's delve into what each of them means and how they are
interconnected.

Triggers
A trigger is a predefined action that the database automatically executes
in response to certain events on a particular table or view. Triggers are
typically used to maintain the integrity of the data, automate data-related
tasks, and extend the database functionalities.
There are various types of triggers based on when they are executed:

BEFORE: Trigger is executed before the triggering event.
AFTER: Trigger is executed after the triggering event.
INSTEAD OF: Trigger is used to override the triggering event,
primarily for views.

They can also be categorized by the triggering event:

INSERT: Trigger is executed when a new row is inserted.
UPDATE: Trigger is executed when a row is updated.
DELETE: Trigger is executed when a row is deleted.

Here's the basic syntax for creating a trigger in SQL, using MySQL as an
example:

Syntax

CREATE TRIGGER trigger_name
trigger_time trigger_event
ON table_name FOR EACH ROW
trigger_body;

trigger_name: Name of the trigger.
trigger_time: BEFORE or AFTER. MySQL does not support INSTEAD OF
triggers; they are available in some other systems, mainly for views.
trigger_event: INSERT, UPDATE, or DELETE.
table_name: The name of the table associated with the trigger.
trigger_body: The set of SQL statements to be executed.

Key Features of Triggers

1. Automatic Execution: Triggers run automatically in response to
data modification events. You don't have to explicitly call them.

2. Event-Driven: They are defined to execute before or after
INSERT, UPDATE, and DELETE events.

3. Transitional Access: Triggers can access the "old"
(pre-modification) and "new" (post-modification) values of the rows
affected.

Example of a Trigger
Suppose we have an `Employees` table and we want to maintain
an `AuditLog` table that keeps a record of salary changes for
employees.

Employees Table

CREATE TABLE Employees (


EmployeeID INT PRIMARY KEY,
Name VARCHAR(255),
Salary DECIMAL(10, 2)
);

AuditLog Table

CREATE TABLE AuditLog (


LogID INT AUTO_INCREMENT PRIMARY KEY,
EmployeeID INT,
OldSalary DECIMAL(10, 2),
NewSalary DECIMAL(10, 2),
ChangeDate DATETIME
);

Now, let's create a trigger that automatically inserts a record into
the `AuditLog` table whenever there's an update to the `Salary` column
in the `Employees` table.

Trigger

DELIMITER //
CREATE TRIGGER AfterSalaryUpdate
AFTER UPDATE ON Employees
FOR EACH ROW
BEGIN
    -- Log the change only when the salary value actually differs.
    IF OLD.Salary != NEW.Salary THEN
        INSERT INTO AuditLog (EmployeeID, OldSalary, NewSalary, ChangeDate)
        VALUES (OLD.EmployeeID, OLD.Salary, NEW.Salary, NOW());
    END IF;
END //
DELIMITER ;

How the Trigger Works
- The trigger is named `AfterSalaryUpdate`.
- It activates `AFTER` an `UPDATE` on the `Employees` table.
- It compares the old and new salary values. If there's a change
(`OLD.Salary != NEW.Salary`), it inserts a new record into the
`AuditLog` table with the details of the change and the current date and
time (`NOW()`).

With this trigger in place, every time an employee's salary is updated in
the `Employees` table, an entry is automatically added to
the `AuditLog` table recording the change.
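The audit trigger can be tried without a MySQL server. The following sketch ports it to SQLite through Python's sqlite3 module. The port is an assumption made for illustration, not the MySQL code above: SQLite has no DELIMITER command or IF statement, so the comparison moves into a WHEN clause, AUTO_INCREMENT becomes AUTOINCREMENT, and NOW() becomes datetime('now').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY,
    Name TEXT,
    Salary NUMERIC
);
CREATE TABLE AuditLog (
    LogID INTEGER PRIMARY KEY AUTOINCREMENT,
    EmployeeID INTEGER,
    OldSalary NUMERIC,
    NewSalary NUMERIC,
    ChangeDate TEXT
);
-- SQLite version of the trigger: the IF test becomes a WHEN clause.
CREATE TRIGGER AfterSalaryUpdate
AFTER UPDATE OF Salary ON Employees
FOR EACH ROW
WHEN OLD.Salary <> NEW.Salary
BEGIN
    INSERT INTO AuditLog (EmployeeID, OldSalary, NewSalary, ChangeDate)
    VALUES (OLD.EmployeeID, OLD.Salary, NEW.Salary, datetime('now'));
END;
INSERT INTO Employees VALUES (1, 'Asha', 50000);
""")

# A real salary change is logged.
conn.execute("UPDATE Employees SET Salary = 55000 WHERE EmployeeID = 1")
# A name change does not touch Salary, so nothing is logged.
conn.execute("UPDATE Employees SET Name = 'Asha K' WHERE EmployeeID = 1")
# Writing the same salary again fails the WHEN test, so nothing is logged.
conn.execute("UPDATE Employees SET Salary = 55000 WHERE EmployeeID = 1")

audit_rows = conn.execute(
    "SELECT EmployeeID, OldSalary, NewSalary FROM AuditLog"
).fetchall()
print(audit_rows)
```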

Active Databases

An active database is a database that uses triggers and other event-driven
functionalities. The term "active" signifies that the DBMS reacts
automatically to changes in data and predefined events. Triggers are a
primary mechanism that makes a database "active."

Key Features of Active Databases

1. Event-Condition-Action (ECA) Rule: This is the foundational
concept of active databases. When a specific event occurs, the
database checks a particular condition, and if that condition is met,
an action is executed.

2. Reactive Behavior: The database can react to changes without
external applications or users having to intervene, thanks to the
ECA rules.

3. Flexibility: Active databases provide more flexibility in data
management and ensure better data integrity and security.

Why are Active Databases Important?

• Integrity Maintenance: Active databases can enforce more
complex business rules that can't be enforced using standard
integrity constraints.

• Automation: They can automate certain tasks, reducing manual
interventions.

• Alerts: They can notify users or applications when specific
conditions are met.
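An Event-Condition-Action rule maps naturally onto a trigger. The sketch below, written for SQLite through Python's sqlite3 module, enforces a hypothetical business rule that a single update may not raise a salary by more than 20 percent. A CHECK constraint cannot express this rule because it needs both the old and the new value. The rule, the table, and the numbers are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Salary NUMERIC);
INSERT INTO Employees VALUES (1, 50000);

-- Event: UPDATE of Salary. Condition: the WHEN clause. Action: abort.
CREATE TRIGGER LimitRaise
BEFORE UPDATE OF Salary ON Employees
FOR EACH ROW
WHEN NEW.Salary > OLD.Salary * 1.2
BEGIN
    SELECT RAISE(ABORT, 'raise exceeds 20 percent');
END;
""")

def set_salary(employee_id, salary):
    """Try to update a salary and report whether the rule allowed it."""
    try:
        conn.execute(
            "UPDATE Employees SET Salary = ? WHERE EmployeeID = ?",
            (salary, employee_id),
        )
        return "accepted"
    except sqlite3.DatabaseError as exc:
        return f"rejected: {exc}"

small_raise = set_salary(1, 55000)  # +10 percent, allowed
large_raise = set_salary(1, 90000)  # far above +20 percent, blocked by the trigger
final_salary = conn.execute(
    "SELECT Salary FROM Employees WHERE EmployeeID = 1"
).fetchone()[0]
print(small_raise, large_raise, final_salary)
```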

Relation between Triggers and Active Databases


Triggers are what give an active database its "active" nature. The ability
of the database to react to events automatically is primarily because of
triggers that execute in response to these events.

In essence, while "trigger" refers to the specific procedural code blocks
that run in response to events, "active database" refers to the broader
capability of a DBMS to support and use such event-driven
functionalities.

Schema Refinement
Problems caused by redundancy:

Redundancy means having multiple copies of the same data in the
database. This problem arises when a database is not normalized.
Suppose a table of student details has the attributes student ID, student
name, contact, college name, course opted, and college rank.

|Student_ID |Name     |Contact    |College |Course |Rank |
|-----------|---------|-----------|--------|-------|-----|
|100        |Himanshu |7300934851 |GEU     |B.Tech |1    |
|101        |Ankit    |7900734858 |GEU     |B.Tech |1    |
|102        |Ayush    |7300936759 |GEU     |B.Tech |1    |
|103        |Ravi     |7300901556 |GEU     |B.Tech |1    |

It can be observed that the values of the attributes college name, college
rank, and course are repeated, which can lead to problems. Problems
caused due to redundancy are:

• Insertion anomaly
• Deletion anomaly
• Updation anomaly

Insertion Anomaly

• If the details of a student whose course has not been decided yet
have to be inserted, the insertion will not be possible until the
course is decided for the student.

|Student_ID |Name     |Contact    |College |Course |Rank |
|-----------|---------|-----------|--------|-------|-----|
|100        |Himanshu |7300934851 |GEU     |       |1    |

• This problem happens when the insertion of a data record is not
possible without adding some additional unrelated data to the
record.

Deletion Anomaly
If the details of the students in this table are deleted, then the details
of the college are deleted as well, which should not happen. This
anomaly occurs when the deletion of a data record results in losing
some unrelated information that was stored as part of the deleted
record. In other words, it is not possible to delete some information
without losing some other information in the table as well.

Updation Anomaly
Suppose the rank of the college changes. The change then has to be
made in every row that stores the rank, which is time-consuming and
computationally costly.

|Student_ID |Name     |Contact    |College |Course |Rank |
|-----------|---------|-----------|--------|-------|-----|
|100        |Himanshu |7300934851 |GEU     |B.Tech |1    |
|101        |Ankit    |7900734858 |GEU     |B.Tech |1    |
|102        |Ayush    |7300936759 |GEU     |B.Tech |1    |
|103        |Ravi     |7300901556 |GEU     |B.Tech |1    |

All of these places must be updated. If the update does not occur at
every place, then the database will be in an inconsistent state.
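The updation anomaly can be reproduced in a few lines. This sketch uses Python's sqlite3 module; the table names StudentsFlat, Students, and Colleges and the column name CollegeRank are hypothetical, and the data is a shortened version of the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Unnormalized: the college rank is repeated in every student row.
CREATE TABLE StudentsFlat (StudentID INTEGER PRIMARY KEY, Name TEXT,
                           College TEXT, CollegeRank INTEGER);
INSERT INTO StudentsFlat VALUES (100, 'Himanshu', 'GEU', 1),
                                (101, 'Ankit', 'GEU', 1),
                                (102, 'Ayush', 'GEU', 1);

-- Normalized: the rank is stored once, in a separate Colleges table.
CREATE TABLE Colleges (College TEXT PRIMARY KEY, CollegeRank INTEGER);
CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT,
                       College TEXT REFERENCES Colleges(College));
INSERT INTO Colleges VALUES ('GEU', 1);
INSERT INTO Students VALUES (100, 'Himanshu', 'GEU'),
                            (101, 'Ankit', 'GEU'),
                            (102, 'Ayush', 'GEU');
""")

# An update that misses some rows leaves the flat table inconsistent.
conn.execute("UPDATE StudentsFlat SET CollegeRank = 2 WHERE StudentID = 100")
flat_ranks = conn.execute(
    "SELECT DISTINCT CollegeRank FROM StudentsFlat WHERE College = 'GEU' "
    "ORDER BY CollegeRank"
).fetchall()

# In the normalized design a single-row update changes the rank everywhere.
conn.execute("UPDATE Colleges SET CollegeRank = 2 WHERE College = 'GEU'")
normalized_ranks = conn.execute(
    "SELECT DISTINCT c.CollegeRank FROM Students s "
    "JOIN Colleges c ON c.College = s.College"
).fetchall()

print(flat_ranks, normalized_ranks)
```

The flat table ends up reporting two different ranks for the same college, while the normalized tables report exactly one.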
Redundancy in a database occurs when the same data is stored in
multiple places. Redundancy can cause various problems such as data
inconsistencies, higher storage requirements, and slower data retrieval.

Problems Caused Due to Redundancy:

• Data Inconsistency: Redundancy can lead to data inconsistencies,
where the same data is stored in multiple locations, and changes to
one copy of the data are not reflected in the other copies. This can
result in incorrect data being used in decision-making processes and
can lead to errors and inconsistencies in the data.
• Storage Requirements: Redundancy increases the storage
requirements of a database. If the same data is stored in multiple
places, more storage space is required to store the data. This can lead
to higher costs and slower data retrieval.
• Update Anomalies: Redundancy can lead to update anomalies,
where changes made to one copy of the data are not reflected in the
other copies. This can result in incorrect data being used in
decision-making processes and can lead to errors and inconsistencies
in the data.
• Performance Issues: Redundancy can also lead to performance
issues, as the database must spend more time updating multiple
copies of the same data. This can lead to slower data retrieval and
slower overall performance of the database.
• Security Issues: Redundancy can also create security issues, as
multiple copies of the same data can be accessed and manipulated by
unauthorized users. This can lead to data breaches and compromise
the confidentiality, integrity, and availability of the data.
• Maintenance Complexity: Redundancy can increase the complexity
of database maintenance, as multiple copies of the same data must be
updated and synchronized. This can make it more difficult to
troubleshoot and resolve issues and can require more time and
resources to maintain the database.
• Data Duplication: Redundancy can lead to data duplication, where
the same data is stored in multiple locations, resulting in wasted
storage space and increased maintenance complexity. This can also
lead to confusion and errors, as different copies of the data may have
different values or be out of sync.
• Data Integrity: Redundancy can also compromise data integrity, as
changes made to one copy of the data may not be reflected in the
other copies. This can result in inconsistencies and errors and can
make it difficult to ensure that the data is accurate and up-to-date.
• Usability Issues: Redundancy can also create usability issues, as
users may have difficulty accessing the correct version of the data or
may be confused by inconsistencies and errors. This can lead to
frustration and decreased productivity, as users spend more time
searching for the correct data or correcting errors.
To prevent redundancy in a database, normalization techniques can be
used. Normalization is the process of organizing data in a database to
eliminate redundancy and improve data integrity. Normalization
involves breaking down a larger table into smaller tables and
establishing relationships between them. This reduces redundancy and
makes the database more efficient and reliable.

Advantages of data redundancy in DBMS

o Provides Data Security: Data redundancy can enhance data
security, as it is harder for attackers to compromise data that is
stored in several different locations.
o Provides Data Reliability: Redundant copies let organizations
cross-check data and confirm whether it is correct, which improves
accuracy.
o Create Data Backup: Data redundancy helps in backing up the
data.

Disadvantages of data redundancy in DBMS

o Data corruption: Redundant data increases the chances of data
corruption.
o Wastage of storage: Redundant data requires more storage space.
o High cost: Large storage is required to store and maintain
redundant data, which is costly.

How to reduce data redundancy in DBMS

We can reduce data redundancy using the following methods:

o Database Normalization: In normalization, a large table is
divided into two or more smaller tables to remove redundancy.
Normalization removes the insert, update, and delete anomalies.
o Deleting Unused Data: Data that is no longer needed adds to the
redundancy in the DBMS. It is a good practice to remove unwanted
data to reduce redundancy.
o Master Data: The data administrator shares master data across
multiple systems. Although this does not remove data redundancy,
it ensures that the redundant data is updated whenever the data is
changed.

Decomposition

Decomposition refers to the division of a table into multiple tables to
produce consistency in the data. This section explains the definition of
decomposition, the types of decomposition in DBMS, and its properties.

What is Decomposition in DBMS?


When we divide a table into multiple tables or divide a relation into
multiple relations, then this process is termed Decomposition in DBMS.
We perform decomposition in DBMS when we want to process a
particular data set. It is performed in a database management system
when we need to ensure consistency and remove anomalies and
duplicate data present in the database. When we perform decomposition
in DBMS, we must try to ensure that no information or data is lost.

[Figure: Decomposition in DBMS]

Types of Decomposition
There are two types of Decomposition:
• Lossless Decomposition
• Lossy Decomposition

[Figure: Types of Decomposition]

Lossless Decomposition
If no information is lost from the relation that is decomposed, then
the decomposition is lossless.
A lossless decomposition guarantees that joining the decomposed
relations results in the same relation that was decomposed.
A decomposition is said to be lossless if the natural join of all the
decomposed relations gives back the original relation.
Example:
EMPLOYEE_DEPARTMENT table:

|EMP_ID |EMP_NAME  |EMP_AGE |EMP_CITY  |DEPT_ID |DEPT_NAME  |
|-------|----------|--------|----------|--------|-----------|
|22     |Denim     |28      |Mumbai    |827     |Sales      |
|33     |Alina     |25      |Delhi     |438     |Marketing  |
|46     |Stephan   |30      |Bangalore |869     |Finance    |
|52     |Katherine |36      |Mumbai    |575     |Production |
|60     |Jack      |40      |Noida     |678     |Testing    |

The above relation is decomposed into two relations, EMPLOYEE and
DEPARTMENT.

EMPLOYEE table:

|EMP_ID |EMP_NAME  |EMP_AGE |EMP_CITY  |
|-------|----------|--------|----------|
|22     |Denim     |28      |Mumbai    |
|33     |Alina     |25      |Delhi     |
|46     |Stephan   |30      |Bangalore |
|52     |Katherine |36      |Mumbai    |
|60     |Jack      |40      |Noida     |

DEPARTMENT table:

|DEPT_ID |EMP_ID |DEPT_NAME  |
|--------|-------|-----------|
|827     |22     |Sales      |
|438     |33     |Marketing  |
|869     |46     |Finance    |
|575     |52     |Production |
|678     |60     |Testing    |

Now, when these two relations are joined on the common column
"EMP_ID", then the resultant relation will look like:

Employee ⋈ Department

|EMP_ID |EMP_NAME  |EMP_AGE |EMP_CITY  |DEPT_ID |DEPT_NAME  |
|-------|----------|--------|----------|--------|-----------|
|22     |Denim     |28      |Mumbai    |827     |Sales      |
|33     |Alina     |25      |Delhi     |438     |Marketing  |
|46     |Stephan   |30      |Bangalore |869     |Finance    |
|52     |Katherine |36      |Mumbai    |575     |Production |
|60     |Jack      |40      |Noida     |678     |Testing    |

Hence, the decomposition is a lossless join decomposition.
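The join check above can be automated for a concrete table instance. The sketch below represents relations as lists of Python dictionaries; the helper names project, natural_join, and same_rows are hypothetical. It also tries an alternative split that shares only EMP_CITY, which is not part of the example above and is included only as a contrast. Note that this checks one particular set of rows, not every possible instance of the schema.

```python
def project(relation, attrs):
    """Project a relation (list of dicts) onto attrs, dropping duplicates."""
    unique = {tuple(row[a] for a in attrs) for row in relation}
    return [dict(zip(attrs, values)) for values in unique]

def natural_join(r1, r2):
    """Join rows that agree on every attribute the two relations share."""
    common = set(r1[0]) & set(r2[0])
    return [
        {**t1, **t2}
        for t1 in r1
        for t2 in r2
        if all(t1[a] == t2[a] for a in common)
    ]

def same_rows(r1, r2):
    """Compare two relations as sets of rows, ignoring row order."""
    rows1 = {frozenset(row.items()) for row in r1}
    rows2 = {frozenset(row.items()) for row in r2}
    return rows1 == rows2

columns = ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY", "DEPT_ID", "DEPT_NAME"]
data = [
    (22, "Denim", 28, "Mumbai", 827, "Sales"),
    (33, "Alina", 25, "Delhi", 438, "Marketing"),
    (46, "Stephan", 30, "Bangalore", 869, "Finance"),
    (52, "Katherine", 36, "Mumbai", 575, "Production"),
    (60, "Jack", 40, "Noida", 678, "Testing"),
]
employee_department = [dict(zip(columns, row)) for row in data]

# The decomposition from the text: both parts share EMP_ID.
employee = project(employee_department, ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"])
department = project(employee_department, ["DEPT_ID", "EMP_ID", "DEPT_NAME"])
lossless = same_rows(natural_join(employee, department), employee_department)

# Hypothetical alternative: the parts share only EMP_CITY, which is not unique.
by_city = project(employee_department, ["EMP_CITY", "DEPT_ID", "DEPT_NAME"])
rejoined = natural_join(employee, by_city)
lossy = not same_rows(rejoined, employee_department)

print(lossless, lossy, len(rejoined))
```

In the alternative split, the two Mumbai employees are each paired with both Mumbai departments, so the rejoined table contains spurious rows.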

Example:

There is a relation called R(A, B, C)

|A  |B  |C  |
|---|---|---|
|55 |16 |27 |
|48 |52 |89 |

Now we decompose this relation into two sub-relations R1 and R2.

R1(A, B)

|A  |B  |
|---|---|
|55 |16 |
|48 |52 |

R2(B, C)

|B  |C  |
|---|---|
|16 |27 |
|52 |89 |

If we take the natural join of R1 and R2 on their common attribute B,
we get back the original relation R:

|A  |B  |C  |
|---|---|---|
|55 |16 |27 |
|48 |52 |89 |

Therefore, for this data, the decomposition is lossless.

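Example: Let's consider a table `R(A, B, C)` with a dependency `A →
B`. If you decompose it into `R1(A, B)` and `R2(B, C)`, the
decomposition is lossy in general: the common attribute `B` determines
neither `A` nor `C`, so the natural join can produce spurious tuples and
you can't reliably recreate the original table.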
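For a decomposition into two relations, a standard textbook test (not stated in these notes, so treat it as an added assumption) says the split is lossless when the common attributes functionally determine all of R1 or all of R2. The sketch below implements that test with an attribute-closure helper. The function names are hypothetical, and attribute sets are written as strings of single-letter attribute names.

```python
def closure(attrs, fds):
    """Attribute closure: everything derivable from attrs using the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2) must determine R1 or R2."""
    common = set(r1) & set(r2)
    determined = closure(common, fds)
    return set(r1) <= determined or set(r2) <= determined

fds = [("A", "B")]  # the single dependency A → B from the example

# R1(A, B), R2(B, C): the common attribute B determines nothing else.
split_on_b = is_lossless_binary("AB", "BC", fds)

# Hypothetical alternative R1(A, B), R2(A, C): A determines all of R1.
split_on_a = is_lossless_binary("AB", "AC", fds)

print(split_on_b, split_on_a)
```

Under this test the split on B is reported as lossy, while the alternative split on A is lossless because A → B makes A a key of R1.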

Example: Consider a relation R(A,B,C) with the following data:
|A |B |C |
|----|----|----|
|1 |X |P |
|1 |Y |P |
|2 |Z |Q |
Suppose we decompose R into R1(A,B) and R2(A,C).

R1(A, B):
|A |B |
|----|----|
|1 |X |
|1 |Y |
|2 |Z |
R2(A, C) (the row (1, P) is listed only once, because a relation is a set):
|A |C |
|----|----|
|1 |P |
|2 |Q |
Now, if we take the natural join of R1 and R2 on attribute A, we get
back the original relation R. Therefore, this is a lossless decomposition.

Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by
at least one decomposed table.
o If a relation R is decomposed into relations R1 and R2, then the
dependencies of R either must be a part of R1 or R2 or must be
derivable from the combination of functional dependencies of R1
and R2.
o For example, suppose there is a relation R(A, B, C, D) with
functional dependency set {A → BC}. The relation R is
decomposed into R1(ABC) and R2(AD), which is dependency
preserving because FD A → BC is a part of relation R1(ABC).
o Dependency Preservation: A decomposition D = {R1, R2,
R3, …, Rn} of R is dependency preserving with respect to a set F of
functional dependencies if

(F1 ∪ F2 ∪ … ∪ Fn)+ = F+

where Fi is the set of dependencies in F+ that involve only the
attributes of Ri.

o Consider a relation R with a set of functional dependencies F.
o If R is decomposed into R1 with FDs f1 and R2 with FDs f2, then
there can be three cases:
o (f1 ∪ f2)+ = F+: the decomposition is dependency preserving.
o (f1 ∪ f2)+ is a proper subset of F+: not dependency preserving.
o (f1 ∪ f2)+ is a proper superset of F+: this case is not possible.

Problem:
Consider a relation R(A, B, C, D) with the functional dependencies
{AB → C, C → D, D → A}. Relation R is decomposed into R1(A, B, C)
and R2(C, D). Check whether the decomposition is dependency
preserving or not.

Solution:

R1(A, B, C) and R2(C, D)

Let us find closure of F1 and F2


To find closure of F1, consider all combination of
ABC. i.e., find closure of A, B, C, AB, BC and AC
Note ABC is not considered as it is always ABC

closure(A) = { A } // Trivial
closure(B) = { B } // Trivial
closure(C) = {C, A, D} but D can't be in closure as D is not present R1.
= {C, A}
C--> A // Removing C from right side as it is trivial attribute

closure(AB) = {A, B, C, D}
= {A, B, C}
AB --> C // Removing AB from right side as these are trivial attributes

closure(BC) = {B, C, D, A}
= {A, B, C}
BC --> A // Removing BC from right side as these are trivial attributes

closure(AC) = {A, C, D}
= {A, C}
No dependency is obtained // A and C are both trivial attributes

F1 {C--> A, AB --> C, BC --> A}.


Similarly F2 { C--> D }

The original relation has the dependencies { AB --> C , C --> D , D --> A }.

AB --> C is present in F1.
C --> D is present in F2.
D --> A is not preserved, because the closure of D under F1 U F2 is only { D }.

F1 U F2 implies only a subset of F. So the given decomposition is not
dependency preserving.
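
The projection step of this solution can be reproduced with a short Python sketch. It assumes single-letter attribute names, and the helper names closure and project are illustrative. It enumerates every proper subset of a sub-schema, as the solution does by hand.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project(fds, schema):
    """Non-trivial FDs of the form X -> (closure(X) within schema) minus X."""
    projected = {}
    attrs = sorted(schema)
    for size in range(1, len(attrs)):          # proper subsets only
        for lhs in combinations(attrs, size):
            rhs = (closure(lhs, fds) & set(schema)) - set(lhs)
            if rhs:
                projected["".join(lhs)] = "".join(sorted(rhs))
    return projected


F = [("AB", "C"), ("C", "D"), ("D", "A")]
F1 = project(F, "ABC")
F2 = project(F, "CD")
print(F1)  # {'C': 'A', 'AB': 'C', 'BC': 'A'}
print(F2)  # {'C': 'D'}

# D -> A is lost: the closure of D under F1 U F2 does not contain A.
combined = list(F1.items()) + list(F2.items())
print(closure("D", combined))  # {'D'}
```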

Problems Related to Decomposition

1. Loss of Information(lossless decomposition or lossy decomposition)

2. Loss of Functional Dependency

o Once tables are decomposed, certain functional dependencies
might not be preserved, which can lead to the inability to enforce
specific integrity constraints.

o Example: If you have the functional dependency `A → B` in the
original table, but in the decomposed tables there is no table with
both `A` and `B`, this functional dependency is not preserved unless
it can be derived from the dependencies that still hold in the
decomposed tables.

Example: Let's consider a relation R with attributes A,B, and C and the
following functional dependencies:

A→B
B→C

Now, suppose we decompose R into two relations:

R1(A, B) with FD A → B
R2(B,C) with FD B → C

In this case, the decomposition is dependency-preserving because all the
functional dependencies of the original relation R can be found in the
decomposed relations R1 and R2. We do not need to join R1 and R2 to
enforce or check any of the functional dependencies.

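However, if R had a functional dependency that can neither be found in
R1 or R2 nor be derived from their dependencies, it could only be checked
by joining the two tables, and the decomposition would not be
dependency-preserving for that specific FD. (A → C is not such a case:
it follows from A → B and B → C by transitivity, so it is still
preserved.)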
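
As a rough illustration of enforcing preserved dependencies without a join, the sketch below uses Python's built-in sqlite3 module. The table names r1 and r2 and the sample values are hypothetical. Declaring the determinant as the primary key of each decomposed table lets the database reject rows that would violate A → B or B → C.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r1 (a TEXT PRIMARY KEY, b TEXT NOT NULL);  -- enforces A -> B
    CREATE TABLE r2 (b TEXT PRIMARY KEY, c TEXT NOT NULL);  -- enforces B -> C
""")
conn.execute("INSERT INTO r1 VALUES ('a1', 'b1')")
conn.execute("INSERT INTO r2 VALUES ('b1', 'c1')")


def try_insert(sql):
    """Run an INSERT and report whether the key constraint accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# The same A value with a different B value would break A -> B.
violation = try_insert("INSERT INTO r1 VALUES ('a1', 'b2')")
print(violation)  # rejected
```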

3. Increased Complexity

o Decomposition leads to an increase in the number of tables, which
can complicate queries and maintenance tasks. While tools and
ORM (Object-Relational Mapping) libraries can mitigate this to
some extent, it still adds complexity.

4. Redundancy

o Incorrect decomposition might not eliminate redundancy, and in
some cases, can even introduce new redundancies.

5. Performance Overhead

o An increased number of tables, while aiding normalization, can
also lead to more complex SQL queries involving multiple joins,
which can introduce performance overheads.

Functional Dependency
A functional dependency is a relationship that exists between two
attributes (or sets of attributes). It typically exists between the
primary key and a non-key attribute within a table.

X → Y

The left side of an FD is known as the determinant; the right side is
known as the dependent.

Example:

Assume we have an employee table with attributes: Emp_Id,
Emp_Name, Emp_Address.

Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of
the employee table, because if we know the Emp_Id, we can tell the
employee name associated with it.

Functional dependency can be written as:

Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.
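
A functional dependency can be tested against the rows of a table. The sketch below is a minimal illustration: the function name fd_holds and the sample rows are hypothetical, since the text gives no data for the employee table. Note that such a check can only refute a dependency for the given rows; it cannot prove that it holds for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on the lhs attributes but differ on the rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for the employee table.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Agra"},
]

print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))  # True
print(fd_holds(employees, ["Emp_Name"], ["Emp_Id"]))  # False: two employees named Asha
```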

Types of Functional dependency

1.Trivial functional dependency

o A → B is a trivial functional dependency if B is a subset of A.
o Dependencies such as A → A and B → B are also trivial.

Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional
dependency, as Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name
are trivial dependencies.

2. Non-trivial functional dependency


A → B is a non-trivial functional dependency if B is not a subset of A.

When the intersection of A and B is empty, A → B is called a completely
non-trivial dependency.

Example:

ID → Name,
Name → DOB
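
The three cases above depend only on how the two sides overlap, which a small Python sketch can show. The function name classify_fd is illustrative.

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y by comparing the attribute sets X and Y."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # Y is a subset of X
    if lhs & rhs:
        return "non-trivial"             # Y overlaps X but is not contained in it
    return "completely non-trivial"      # X and Y share no attribute


print(classify_fd(["Employee_Id", "Employee_Name"], ["Employee_Id"]))  # trivial
print(classify_fd(["ID"], ["Name"]))                                   # completely non-trivial
print(classify_fd(["A", "B"], ["B", "C"]))                             # non-trivial
```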

Reasoning about Functional Dependency:
Inference Rule (IR):

1. Armstrong's axioms are the basic inference rules.

2. Armstrong's axioms are used to infer functional dependencies on a
relational database.

3. An inference rule is a type of assertion. It can be applied to a set of
FDs (functional dependencies) to derive other FDs.

4. Using the inference rules, we can derive additional functional
dependencies from the initial set.

There are six inference rules for functional dependencies:

1. Reflexive Rule (IR1):

In the reflexive rule, if Y is a subset of X, then X determines Y.

If X ⊇ Y then X → Y

Example:

X = {a, b, c, d, e}
Y = {a, b, c}

Since Y is a subset of X, X → Y holds.

2. Augmentation Rule (IR2):

In augmentation, if X determines Y, then XZ determines YZ for any Z.

If X → Y then XZ → YZ
Example:

For R(ABCD), if A → B then AC → BC

3. Transitive Rule (IR3):

In the transitive rule, if X determines Y and Y determines Z, then X must
also determine Z.

If X → Y and Y → Z then X → Z

4. Union Rule (IR4):

Union rule says, if X determines Y and X determines Z, then X must
also determine Y and Z together.

If X → Y and X → Z then X → YZ

Proof:

1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X, where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)

5. Decomposition Rule (IR5):

Decomposition rule is also known as project rule. It is the reverse of
the union rule.

This rule says, if X determines Y and Z, then X determines Y and X
determines Z separately.

If X → YZ then X → Y and X → Z

Proof:

1. X → YZ (given)
2. YZ → Y (using IR1)
3. X → Y (using IR3 on 1 and 2)
4. X → Z follows in the same way, starting from YZ → Z.

6. Pseudo transitive Rule (IR6):

In the pseudo transitive rule, if X determines Y and YZ determines W, then
XZ determines W.

If X → Y and YZ → W then XZ → W

Proof:

1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
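
In practice, whether an FD follows from a given set by these rules is usually checked with the attribute closure rather than by a step-by-step proof. The sketch below assumes single-letter attribute names, and the helper names closure and implies are illustrative. It confirms the derived rules IR4 to IR6 on the symbols used above.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)


print(implies([("X", "Y"), ("Y", "Z")], "X", "Z"))     # transitive: True
print(implies([("X", "Y"), ("X", "Z")], "X", "YZ"))    # union: True
print(implies([("X", "YZ")], "X", "Y"))                # decomposition: True
print(implies([("X", "Y"), ("YZ", "W")], "XZ", "W"))   # pseudo transitive: True
print(implies([("X", "Y")], "Y", "X"))                 # not derivable: False
```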

Normalization:
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation
or set of relations. It is also used to eliminate the undesirable
characteristics like Insertion, Update and Deletion Anomalies.
o Normalization divides a larger table into smaller tables and
links them using relationships.
o The normal form is used to reduce redundancy from the database
table.

Types of Normal Forms:


The following normal forms are commonly used:

Normal Form   Description
1NF           A relation is in 1NF if it contains only atomic values.
2NF           A relation is in 2NF if it is in 1NF and all non-key
              attributes are fully functionally dependent on the
              primary key.
3NF           A relation is in 3NF if it is in 2NF and no transitive
              dependency exists.
4NF           A relation is in 4NF if it is in Boyce Codd normal
              form and has no multi-valued dependency.
5NF           A relation is in 5NF if it is in 4NF, does not contain any
              join dependency, and joining is lossless.

First Normal Form (1NF):


o A relation is in 1NF if it contains only atomic values.
o It states that an attribute of a table cannot hold multiple values.
Each attribute must hold only a single value.
o First normal form disallows multi-valued attributes, composite
attributes, and their combinations.

Example: Relation EMPLOYEE is not in 1NF because of the multi-valued
attribute EMP_PHONE.

EMPLOYEE table:
EMP_ID   EMP_NAME   EMP_PHONE                EMP_STATE
14       John       7272826385, 9064738238   UP
20       Harry      8574783832               Bihar
12       Sam        7390372389, 8589830302   Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID   EMP_NAME   EMP_PHONE    EMP_STATE
14       John       7272826385   UP
14       John       9064738238   UP
20       Harry      8574783832   Bihar
12       Sam        7390372389   Punjab
12       Sam        8589830302   Punjab
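
The conversion to 1NF shown above amounts to emitting one row per phone number. A minimal Python sketch, in which the multi-valued EMP_PHONE attribute is modelled as a list and the function name to_1nf is illustrative:

```python
# EMPLOYEE rows from the text, with the multi-valued EMP_PHONE held as a list.
employee = [
    (14, "John", ["7272826385", "9064738238"], "UP"),
    (20, "Harry", ["8574783832"], "Bihar"),
    (12, "Sam", ["7390372389", "8589830302"], "Punjab"),
]


def to_1nf(rows):
    """Repeat each employee once per phone so every value is atomic."""
    return [
        (emp_id, name, phone, state)
        for emp_id, name, phones, state in rows
        for phone in phones
    ]


flat = to_1nf(employee)
for row in flat:
    print(row)
```

The result has five rows, matching the 1NF table above.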

Second Normal Form (2NF):


o In 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are fully
functionally dependent on the primary key.

Example: Let's assume a school stores the data of teachers and the
subjects they teach. In a school, a teacher can teach more than one
subject.

TEACHER table

TEACHER_ID   SUBJECT     TEACHER_AGE
25           Chemistry   30
25           Biology     30
47           English     35
83           Math        38
83           Computer    38

In the given table, the non-prime attribute TEACHER_AGE is dependent on
TEACHER_ID, which is a proper subset of the candidate key
{TEACHER_ID, SUBJECT}. That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL Table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38

TEACHER_SUBJECT Table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
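
The decomposition above can be checked with a short Python sketch: projecting the TEACHER rows onto the two new schemas and joining them again on TEACHER_ID should give back exactly the original rows. The variable names are illustrative.

```python
# TEACHER rows from the text: (TEACHER_ID, SUBJECT, TEACHER_AGE).
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# Projections; the set comprehension removes the repeated (id, age) pairs.
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})
teacher_subject = sorted({(tid, subject) for tid, subject, _ in teacher})

# Natural join of the two projections on TEACHER_ID.
rejoined = sorted(
    (tid, subject, age)
    for tid, subject in teacher_subject
    for detail_tid, age in teacher_detail
    if tid == detail_tid
)

print(teacher_detail)               # [(25, 30), (47, 35), (83, 38)]
print(rejoined == sorted(teacher))  # True
```

Each teacher's age is now stored once instead of once per subject.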
Third Normal Form (3NF):

o A relation is in 3NF if it is in 2NF and does not contain any
transitive dependency.
o 3NF is used to reduce the data duplication. It is also used to
achieve the data integrity.
o If there is no transitive dependency for non-prime attributes, then
the relation must be in third normal form.

A relation is in third normal form if it satisfies at least one of the
following conditions for every non-trivial functional dependency X → Y.

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some
candidate key.

Example:
EMPLOYEE_DETAIL table:

EMP_ID   EMP_NAME    EMP_ZIP   EMP_STATE   EMP_CITY
222      Harry       201010    UP          Noida
333      Stephan     02228     US          Boston
444      Lan         60007     US          Chicago
555      Katharine   06389     UK          Norwich
666      John        462007    MP          Bhopal

Super keys in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.

Candidate key: {EMP_ID}

Non-Prime Attributes: In the given table, all attributes except
EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends
on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore
transitively dependent on the super key (EMP_ID). This violates the rule
of third normal form.

That's why we need to move EMP_CITY and EMP_STATE to a new
EMPLOYEE_ZIP table, with EMP_ZIP as the primary key.

EMPLOYEE Table:

EMP_ID   EMP_NAME    EMP_ZIP
222      Harry       201010
333      Stephan     02228
444      Lan         60007
555      Katharine   06389
666      John        462007

EMPLOYEE_ZIP Table:

EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
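
The two 3NF conditions can be tested mechanically. The sketch below is illustrative only: the function names are hypothetical, and the dependency EMP_ID → {EMP_NAME, EMP_ZIP} is an assumption that follows from EMP_ID being the candidate key. It reports every FD whose left side is not a super key and whose right side contains a non-prime attribute.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations_3nf(attributes, fds, candidate_keys):
    """FDs X -> Y where X is not a super key and Y is not entirely prime."""
    prime = set().union(*candidate_keys)
    found = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)                      # ignore the trivial part
        is_super_key = closure(lhs, fds) >= set(attributes)
        if extra and not is_super_key and not extra <= prime:
            found.append((lhs, rhs))
    return found


id_fd = (("EMP_ID",), ("EMP_NAME", "EMP_ZIP"))           # assumed from the key
zip_fd = (("EMP_ZIP",), ("EMP_STATE", "EMP_CITY"))

before = violations_3nf(
    {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"},
    [id_fd, zip_fd],
    [{"EMP_ID"}],
)
print(before)  # [(('EMP_ZIP',), ('EMP_STATE', 'EMP_CITY'))]

# After the decomposition, neither table has a violating dependency.
employee_ok = violations_3nf({"EMP_ID", "EMP_NAME", "EMP_ZIP"}, [id_fd], [{"EMP_ID"}])
zip_ok = violations_3nf({"EMP_ZIP", "EMP_STATE", "EMP_CITY"}, [zip_fd], [{"EMP_ZIP"}])
print(employee_ok, zip_ok)  # [] []
```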

Boyce Codd Normal Form (BCNF):


o BCNF is an advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every non-trivial functional dependency
X → Y, X is a super key of the table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS must
be a super key.

Example: Let's assume there is a company where employees work in
more than one department.

EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549

In the above table, the functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone
is a key.

To convert the given table into BCNF, we decompose it into three
tables:

EMP_COUNTRY table:

EMP_ID   EMP_COUNTRY
264      India
364      UK

EMP_DEPT table:

EMP_DEPT     DEPT_TYPE   EMP_DEPT_NO
Designing    D394        283
Testing      D394        300
Stores       D283        232
Developing   D283        549

EMP_DEPT_MAPPING table:

EMP_ID   EMP_DEPT
264      Designing
264      Testing
364      Stores
364      Developing

Functional dependencies:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because the left side of both functional
dependencies is a key.
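
The BCNF condition can be checked the same way for the original table and for each decomposed table. This is a sketch under the assumption that the two FDs listed above are the only non-trivial dependencies. The function names are illustrative.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose left side is not a super key of the table."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= set(attributes)
    ]


country_fd = (("EMP_ID",), ("EMP_COUNTRY",))
dept_fd = (("EMP_DEPT",), ("DEPT_TYPE", "EMP_DEPT_NO"))

employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
print(len(bcnf_violations(employee, [country_fd, dept_fd])))  # 2: both FDs violate BCNF

# Each decomposed table, checked with the FDs that apply to it.
print(bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, [country_fd]))             # []
print(bcnf_violations({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, [dept_fd])) # []
print(bcnf_violations({"EMP_ID", "EMP_DEPT"}, []))                          # []
```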

Multivalued Dependency:

o A multivalued dependency occurs when two attributes in a table are
independent of each other but both depend on a third attribute.
o A multivalued dependency consists of at least two attributes that
are dependent on a third attribute; that is why it always requires at
least three attributes.

Example: Suppose there is a bike manufacturing company which
produces two colors (white and black) of each model every year.

BIKE_MODEL   MANUF_YEAR   COLOR
M2011        2008         White
M2001        2008         Black
M3001        2013         White
M3001        2013         Black
M4006        2017         White
M4006        2017         Black

Here the columns COLOR and MANUF_YEAR are dependent on
BIKE_MODEL and independent of each other.

In this case, these two columns are said to be multivalued dependent
on BIKE_MODEL. The representation of these dependencies is shown
below:

BIKE_MODEL →→ MANUF_YEAR
BIKE_MODEL →→ COLOR

This can be read as "BIKE_MODEL multidetermines MANUF_YEAR"
and "BIKE_MODEL multidetermines COLOR".
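
One way to test a multivalued dependency X →→ Y on actual rows is to check that, for each X value, every Y value appears together with every combination of the remaining attributes. The sketch below applies this to the bike table. The function name mvd_holds and the rows in the counterexample are hypothetical.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Check X ->> Y on a list of dict rows; Z is every other attribute."""
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        z_val = tuple(row[a] for a in z)
        ys, zs, pairs = groups.setdefault(key, (set(), set(), set()))
        ys.add(y_val)
        zs.add(z_val)
        pairs.add((y_val, z_val))
    # Within each X group, the (Y, Z) pairs must be the full cross product.
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())


columns = ("BIKE_MODEL", "MANUF_YEAR", "COLOR")
bikes = [dict(zip(columns, r)) for r in [
    ("M2011", 2008, "White"), ("M2001", 2008, "Black"),
    ("M3001", 2013, "White"), ("M3001", 2013, "Black"),
    ("M4006", 2017, "White"), ("M4006", 2017, "Black"),
]]
print(mvd_holds(bikes, ["BIKE_MODEL"], ["COLOR"]))       # True
print(mvd_holds(bikes, ["BIKE_MODEL"], ["MANUF_YEAR"]))  # True

# Hypothetical counterexample: color and year are tied together for model M1.
mixed = [dict(zip(columns, r)) for r in [("M1", 2008, "White"), ("M1", 2009, "Black")]]
print(mvd_holds(mixed, ["BIKE_MODEL"], ["COLOR"]))       # False
```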

Fourth Normal Form (4NF):


o A relation is in 4NF if it is in Boyce Codd normal form and
has no multi-valued dependency.
o For a dependency A →→ B, if multiple values of B exist for a single
value of A, independently of the other attributes, then the dependency
is a multi-valued dependency.

Example:

STUDENT

STU_ID   COURSE      HOBBY
21       Computer    Dancing
21       Math        Singing
34       Chemistry   Dancing
74       Biology     Cricket
59       Physics     Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY
are two independent entities. Hence, there is no relationship between
COURSE and HOBBY.

In the STUDENT relation, the student with STU_ID 21 has two
courses, Computer and Math, and two hobbies, Dancing and Singing.
So there is a multi-valued dependency on STU_ID, which leads to
unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two
tables:

STUDENT_COURSE

STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics

STUDENT_HOBBY

STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey

Join Dependency:

1. Join dependency is a further generalization of multivalued
dependencies.

2. If the join of R1 and R2 over C is equal to relation R, then we can say
that a join dependency (JD) exists.

3. Here R1 and R2 are the decompositions R1(A, B, C) and R2(C, D)
of a given relation R(A, B, C, D).

4. Alternatively, R1 and R2 are a lossless decomposition of R.

5. A JD ⋈ {R1, R2, ..., Rn} is said to hold over a relation R if R1, R2, ...,
Rn is a lossless-join decomposition.

6. *((A, B, C), (C, D)) will be a JD of R if the join of these projections
over their common attribute is equal to the relation R.

7. Here, *(R1, R2, R3) is used to indicate that relations R1, R2, R3 and so
on form a JD of R.

Fifth Normal Form (5NF):


o A relation is in 5NF if it is in 4NF, does not contain any join
dependency, and joining is lossless.
o 5NF is satisfied when all the tables are broken into as many tables
as possible in order to avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).

Example
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1

In the above table, John takes both Computer and Math classes for
Semester 1, but he doesn't take the Math class for Semester 2. In this case,
the combination of all these fields is required to identify valid data.

Suppose we add a new semester, Semester 3, but do not know the
subject and who will be taking that subject, so we leave Lecturer and
Subject as NULL. But all three columns together act as the primary key,
so we can't leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into three
relations P1, P2 & P3:

P1

SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math

P2

SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen

P3

SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
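
The role of the third projection can be seen by joining the tables in Python. In this sketch (variable names are illustrative), joining only P1 and P2 on SUBJECT produces rows that were never in the original table, and filtering the result through P3 removes them again.

```python
# Original rows from the text: (SUBJECT, LECTURER, SEMESTER).
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

p1 = {(sem, sub) for sub, _, sem in original}   # SEMESTER, SUBJECT
p2 = {(sub, lec) for sub, lec, _ in original}   # SUBJECT, LECTURER
p3 = {(sem, lec) for _, lec, sem in original}   # SEMESTER, LECTURER

# Join P1 and P2 on SUBJECT only.
two_way = {
    (sub, lec, sem)
    for sem, sub in p1
    for p2_sub, lec in p2
    if sub == p2_sub
}
spurious = two_way - original

# Keep only the rows whose (SEMESTER, LECTURER) pair also appears in P3.
three_way = {row for row in two_way if (row[2], row[1]) in p3}

print(sorted(spurious))       # [('Math', 'Akash', 'Semester 1'), ('Math', 'John', 'Semester 2')]
print(three_way == original)  # True
```

For this data, the join of all three projections is lossless, while the join of only two of them is not.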
