Dbms 3 Notes

Uploaded by

22b61a6603
Copyright
© © All Rights Reserved

UNIT-3

The SQL UNION Operator


The UNION operator is used to combine the result-set of two or
more SELECT statements.
• Every SELECT statement within UNION must have the same number of
columns.
• The columns must also have similar data types.
• The columns in every SELECT statement must also be in the same order.
UNION Syntax
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

UNION ALL Syntax

The UNION operator selects only distinct values by default. To allow
duplicate values, use UNION ALL:
SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;
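
The difference between UNION and UNION ALL is easiest to see on a small data set. The following is only a sketch: it assumes SQLite accessed through Python's sqlite3 module, and the Customers and Suppliers rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; table contents are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (City TEXT);
    CREATE TABLE Suppliers (City TEXT);
    INSERT INTO Customers VALUES ('Berlin'), ('London'), ('Paris');
    INSERT INTO Suppliers VALUES ('Berlin'), ('Tokyo');
""")

# UNION removes duplicate rows from the combined result.
union_rows = conn.execute(
    "SELECT City FROM Customers UNION SELECT City FROM Suppliers ORDER BY City"
).fetchall()

# UNION ALL keeps every row, including the repeated 'Berlin'.
union_all_rows = conn.execute(
    "SELECT City FROM Customers UNION ALL SELECT City FROM Suppliers ORDER BY City"
).fetchall()

print(union_rows)
print(union_all_rows)
```

'Berlin' appears once in the UNION result and twice in the UNION ALL result.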

The SQL SELECT DISTINCT Statement

The SELECT DISTINCT statement is used to return only distinct
(different) values.
Inside a table, a column often contains many duplicate values, and
sometimes you only want to list the different (distinct) values.

SELECT DISTINCT Syntax

SELECT DISTINCT column1, column2, ...
FROM table_name;

The SQL ORDER BY Keyword
The ORDER BY keyword is used to sort the result-set in ascending or
descending order.
The ORDER BY keyword sorts the records in ascending order by default.
To sort the records in descending order, use the DESC keyword.
ORDER BY Syntax
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;

IS NULL Syntax
SELECT column_names
FROM table_name
WHERE column_name IS NULL;

IS NOT NULL Syntax


SELECT column_names
FROM table_name
WHERE column_name IS NOT NULL;
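
A small sketch of how IS NULL and IS NOT NULL behave, assuming SQLite through Python's sqlite3 module. The Customers rows are hypothetical. The last query shows why a plain '=' comparison cannot be used to find NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerName TEXT, Address TEXT);
    INSERT INTO Customers VALUES
        ('Alfreds', 'Obere Str. 57'),
        ('Ana', NULL),
        ('Antonio', '');
""")

# IS NULL matches only missing values; the empty string '' is not NULL.
missing = conn.execute(
    "SELECT CustomerName FROM Customers WHERE Address IS NULL"
).fetchall()

present = conn.execute(
    "SELECT CustomerName FROM Customers WHERE Address IS NOT NULL "
    "ORDER BY CustomerName"
).fetchall()

# A comparison with '=' never evaluates to true for NULL, so no row is returned.
equals_null = conn.execute(
    "SELECT CustomerName FROM Customers WHERE Address = NULL"
).fetchall()

print(missing, present, equals_null)
```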

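The SQL UPDATE Statement

The UPDATE statement is used to modify the existing records in a table.
The WHERE clause selects which records are updated; if it is omitted,
every record in the table is updated.
UPDATE Syntax
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;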

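The SELECT TOP clause limits the number of records returned. SELECT TOP
is SQL Server / MS Access syntax; MySQL uses LIMIT for the same purpose.
SELECT TOP 3 * FROM Customers;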


The SQL MIN() and MAX() Functions
The MIN() function returns the smallest value of the selected column.
The MAX() function returns the largest value of the selected column.
MIN() Syntax
SELECT MIN(column_name)
FROM table_name
WHERE condition;

MAX() Syntax
SELECT MAX(column_name)
FROM table_name
WHERE condition;

SELECT MIN(Price) AS SmallestPrice
FROM Products;
SELECT MAX(Price) AS LargestPrice
FROM Products;

The SQL COUNT(), AVG() and SUM() Functions

The COUNT() function returns the number of rows that match a
specified criterion.
COUNT() Syntax
SELECT COUNT(column_name)
FROM table_name
WHERE condition;
The AVG() function returns the average value of a numeric column.
AVG() Syntax
SELECT AVG(column_name)
FROM table_name
WHERE condition;

The SUM() function returns the total sum of a numeric column.


SUM() Syntax
SELECT SUM(column_name)
FROM table_name
WHERE condition;
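
The aggregate functions above can be tried together on a tiny table. This is a sketch only, assuming SQLite through Python's sqlite3 module; the Products rows are made up. It also shows that COUNT(column_name), AVG() and SUM() skip NULL values, while COUNT(*) counts every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductName TEXT, Price REAL);
    INSERT INTO Products VALUES
        ('Chai', 18.0), ('Chang', 19.0), ('Syrup', 10.0), ('Sample', NULL);
""")

# One row comes back, holding one value per aggregate function.
row = conn.execute("""
    SELECT COUNT(*), COUNT(Price), AVG(Price), SUM(Price), MIN(Price), MAX(Price)
    FROM Products
""").fetchone()

count_all, count_price, avg_price, sum_price, min_price, max_price = row
print(count_all, count_price, avg_price, sum_price, min_price, max_price)
```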

The SQL LIKE Operator


The LIKE operator is used in a WHERE clause to search for a specified
pattern in a column.
There are two wildcards often used in conjunction with
the LIKE operator:
• The percent sign (%) represents zero, one, or multiple characters
• The underscore sign (_) represents one, single character
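
The two wildcards can be compared with a few patterns. The sketch below assumes SQLite through Python's sqlite3 module; the customer names and the helper names_like are hypothetical, introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerName TEXT);
    INSERT INTO Customers VALUES ('Ana'), ('Anna'), ('Antonio'), ('Berglund');
""")


def names_like(pattern):
    """Return the customer names matching a LIKE pattern, sorted by name."""
    rows = conn.execute(
        "SELECT CustomerName FROM Customers "
        "WHERE CustomerName LIKE ? ORDER BY CustomerName",
        (pattern,),
    ).fetchall()
    return [name for (name,) in rows]


starts_with_an = names_like("An%")   # % matches any run of characters, including none
three_letters = names_like("An_")    # _ matches exactly one character
contains_n = names_like("%n%")       # 'n' anywhere in the name

print(starts_with_an, three_letters, contains_n)
```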
SQL Aliases
SQL aliases are used to give a table, or a column in a table, a temporary
name.
Aliases are often used to make column names more readable.
An alias only exists for the duration of that query.
An alias is created with the AS keyword.
Alias Column Syntax
SELECT column_name AS alias_name
FROM table_name;

The SQL IN Operator


The IN operator allows you to specify multiple values in
a WHERE clause.
The IN operator is a shorthand for multiple OR conditions.
IN Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1, value2, ...);
SELECT * FROM Customers
WHERE Country IN ('Germany', 'France', 'UK');
SELECT * FROM Customers
WHERE Country NOT IN ('Germany', 'France', 'UK');
SELECT * FROM Customers
WHERE Country IN (SELECT Country FROM Suppliers);

The SQL BETWEEN Operator


The BETWEEN operator selects values within a given range. The values
can be numbers, text, or dates.
The BETWEEN operator is inclusive: begin and end values are
included.
BETWEEN Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value1 AND value2;

SELECT * FROM Products
WHERE Price BETWEEN 10 AND 20;

SQL JOIN
A JOIN clause is used to combine rows from two or more tables, based
on a related column between them.

Different Types of SQL JOINs

Here are the different types of JOINs in SQL:
• (INNER) JOIN: Returns records that have matching values in both tables
• LEFT (OUTER) JOIN: Returns all records from the left table, and the
matched records from the right table
• RIGHT (OUTER) JOIN: Returns all records from the right table, and the
matched records from the left table
• FULL (OUTER) JOIN: Returns all records from both tables, whether or
not there is a match in the other table
SQL INNER JOIN Keyword
The INNER JOIN keyword selects records that have matching values in
both tables.
INNER JOIN Syntax
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;

SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

SQL LEFT JOIN Keyword


The LEFT JOIN keyword returns all records from the left table (table1),
and the matching records from the right table (table2).
If there is no match, the columns from the right side are NULL in the result.
LEFT JOIN Syntax
SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;

SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
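
To compare INNER JOIN and LEFT JOIN side by side, the sketch below runs both on the same two tables. It assumes SQLite through Python's sqlite3 module, and the Customers and Orders rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Alfreds'), (2, 'Ana'), (3, 'Antonio');
    INSERT INTO Orders VALUES (10308, 2), (10309, 3), (10310, 3);
""")

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT Customers.CustomerName, Orders.OrderID
    FROM Customers
    INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Customers.CustomerName, Orders.OrderID
""").fetchall()

# LEFT JOIN: every customer; OrderID is NULL (None in Python) without a match.
left_rows = conn.execute("""
    SELECT Customers.CustomerName, Orders.OrderID
    FROM Customers
    LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Customers.CustomerName, Orders.OrderID
""").fetchall()

print(inner_rows)
print(left_rows)
```

Alfreds has no orders, so he is missing from the INNER JOIN result and appears with a NULL OrderID in the LEFT JOIN result.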

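SQL Self Join
A self join is a regular join in which a table is joined with itself.
The following statement matches customers that are from the same city:
SELECT A.CustomerName AS CustomerName1,
B.CustomerName AS CustomerName2, A.City
FROM Customers A, Customers B
WHERE A.CustomerID <> B.CustomerID
AND A.City = B.City
ORDER BY A.City;

The following statements list each employee together with the name of
that employee's manager:
CREATE TABLE s22 (eno INT, ename VARCHAR(20), mngrno INT);
SELECT A.ename AS "EMPLOYEE", B.ename AS "WORKS FOR"
FROM s22 A, s22 B
WHERE A.mngrno = B.eno;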

The SQL GROUP BY Statement

The GROUP BY statement groups rows that have the same values into
summary rows, like "find the number of customers in each country".
The GROUP BY statement is often used with aggregate functions
(COUNT(), MAX(), MIN(), SUM(), AVG()) to group the result-set by one or
more columns.
GROUP BY Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
ORDER BY COUNT(CustomerID) DESC;
The SQL HAVING Clause
The HAVING clause was added to SQL because the WHERE keyword
cannot be used with aggregate functions. WHERE filters individual rows
before they are grouped; HAVING filters the groups after aggregation.

HAVING Syntax

SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5;
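
A runnable sketch of GROUP BY with and without HAVING, assuming SQLite through Python's sqlite3 module. The Customers rows are made up, and the HAVING threshold is lowered from 5 to 1 so that the tiny sample still returns rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Country TEXT);
    INSERT INTO Customers (Country) VALUES
        ('Germany'), ('Germany'), ('Germany'),
        ('France'), ('France'),
        ('UK');
""")

# One summary row per country, largest group first.
per_country = conn.execute("""
    SELECT Country, COUNT(CustomerID)
    FROM Customers
    GROUP BY Country
    ORDER BY COUNT(CustomerID) DESC
""").fetchall()

# HAVING keeps only the groups whose aggregate satisfies the condition.
busy_countries = conn.execute("""
    SELECT Country, COUNT(CustomerID)
    FROM Customers
    GROUP BY Country
    HAVING COUNT(CustomerID) > 1
    ORDER BY COUNT(CustomerID) DESC
""").fetchall()

print(per_country)
print(busy_countries)
```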

The SQL EXISTS Operator

The EXISTS operator is used to test for the existence of any record in a
subquery.
The EXISTS operator returns TRUE if the subquery returns one or more
records.
EXISTS Syntax
SELECT column_name(s)
FROM table_name
WHERE EXISTS
(SELECT column_name FROM table_name WHERE condition);

SELECT SupplierName
FROM Suppliers
WHERE EXISTS (SELECT ProductName FROM Products
WHERE Products.SupplierID = Suppliers.SupplierID AND Price < 20);
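
The EXISTS query above can be run against sample data as follows. This is a sketch assuming SQLite through Python's sqlite3 module; the supplier and product rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, SupplierName TEXT);
    CREATE TABLE Products (ProductName TEXT, SupplierID INTEGER, Price REAL);
    INSERT INTO Suppliers VALUES
        (1, 'Exotic Liquid'), (2, 'Tokyo Traders'), (3, 'No Products Ltd');
    INSERT INTO Products VALUES
        ('Chai', 1, 18.0), ('Chang', 1, 19.0), ('Ikura', 2, 31.0);
""")

# A supplier is returned if the subquery finds at least one product under 20.
cheap_suppliers = conn.execute("""
    SELECT SupplierName
    FROM Suppliers
    WHERE EXISTS (SELECT ProductName FROM Products
                  WHERE Products.SupplierID = Suppliers.SupplierID
                    AND Price < 20)
""").fetchall()

print(cheap_suppliers)
```

Only the supplier with a product priced below 20 is returned; a supplier with no products at all never satisfies EXISTS.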
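The SQL ANY and ALL Operators

The ANY and ALL operators allow you to perform a comparison between
a single column value and a range of other values.
The SQL ANY Operator
The ANY operator:
• returns a boolean value as a result
• returns TRUE if ANY of the subquery values meet the condition
ANY means that the condition will be true if the operation is true for any
of the values in the range.

ANY Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name operator ANY
(SELECT column_name
FROM table_name
WHERE condition);

The SQL ALL Operator
ALL means that the condition will be true only if the operation is true
for all of the values in the range.

ALL Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name operator ALL
(SELECT column_name
FROM table_name
WHERE condition);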
SQL SELECT INTO Examples
The following SQL statement creates a backup copy of Customers:
SELECT * INTO CustomersBackup2017
FROM Customers;

Normalization
Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy in a relation or set
of relations. It is also used to eliminate undesirable characteristics
such as insertion, update and deletion anomalies.
Normalization divides a larger table into smaller tables and links
them using relationships.
Normal forms are used to reduce redundancy in database tables.

Purpose of Normalization
Normalization is the process of structuring the relationships between
data so as to minimize redundancy in relational tables and to avoid
insertion, update and deletion anomalies in the database.
It helps to divide large database tables into smaller tables and to
define relationships between them. It removes redundant data and makes
it easier to add, change or delete table fields.

Normalization defines rules that determine whether a relational table
satisfies a given normal form.
A normal form is a set of criteria against which each relation is
evaluated; meeting the criteria removes unwanted partial, transitive,
multivalued and join dependencies from the relation.
In a normalized table, updating, deleting or inserting data does not
cause problems for the other data in the table, which improves the
relational table's integrity and efficiency.

Objective of Normalization
• It is used to remove duplicate data and database anomalies from the
relational table.
• It helps to reduce redundancy and complexity by examining the
data types used in the table.
• It is helpful to divide a large database table into smaller tables and
link them using relationships.
• It avoids duplicate data and repeating groups in a table.
• It reduces the chances for anomalies to occur in a database.
Types of Anomalies

2. Insert Anomaly:
An insert anomaly occurs when some attributes or data items cannot be
inserted into the database without the existence of other attributes.
For example, in the Student table, if we want to insert a new courseID,
we need to wait until a student enrolls in that course. This makes it
difficult to insert a new record into the table. Hence, it is called an
insertion anomaly.

3. Update Anomaly:
An update anomaly occurs when duplicated data is updated in one place
but not in all of its instances. This leaves the data in an inconsistent
state.
For example, suppose there is a student 'James' in the Student
table. If we update his course in the Student table, we must make the
same update in the course table; otherwise, some rows show the updated
value while others still show the old one, and the data becomes
inconsistent.

4. Delete Anomaly:
A delete anomaly occurs when some data is lost unintentionally because
other records are deleted. For example, if we remove Trent Bolt from
the Student table, his address, course and other details are also
removed from the Student table.
Therefore, we can say that deleting some attributes can remove other
attributes of the database table.
So, we need to avoid these types of anomalies and maintain the
integrity and accuracy of the database table. Therefore, we use
normalization in the database management system.

Types of Normal Forms


There are the four types of normal forms:

Closure Set of an Attribute

The closure of a set of attributes is the set of all attributes that
can be functionally determined from it.
(AB)+ --- closure set of AB
It answers the question: given AB, which other attributes can we determine?
The closure is used for finding candidate keys, which is very important
for understanding normalization.
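
The closure can be computed by repeatedly applying the functional dependencies until nothing new is added. The following is a minimal sketch of that idea in Python. The function name closure and the representation of each dependency as a pair of attribute strings are assumptions made for this illustration; the dependencies are those of the relation R(A B C D) used in the 2NF example of these notes.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side written as a string of
    single-letter attribute names, e.g. ("AB", "D") for AB -> D.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If every attribute on the left is known, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# R(A B C D) with AB -> D and B -> C.
fds = [("AB", "D"), ("B", "C")]

ab_closure = "".join(sorted(closure("AB", fds)))  # all of R, so AB is a key
b_closure = "".join(sorted(closure("B", fds)))    # B alone determines only B and C
print(ab_closure, b_closure)
```

Because (AB)+ contains every attribute of R, AB is a key; B on its own is not.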

First Normal Form

Rules for First Normal Form
The first normal form expects you to follow a few simple rules while
designing your database, and they are:
Rule 1: Single Valued Attributes
Each column of your table should be single valued, which means it
should not contain multiple values.
Rule 2: Attribute Domain should not change
This is more of a "Common Sense" rule. In each column the values
stored must be of the same kind or type.
For example: If you have a column dob to save the dates of birth of a set of
people, then you must not save the names of some of them
in that column along with the dates of birth of others. It
should hold only a date of birth for all the records/rows.
Rule 3: Unique name for Attributes/Columns
This rule expects that each column in a table should have a unique
name. This is to avoid confusion at the time of retrieving data or
performing any other operation on the stored data.
If two or more columns have the same name, the DBMS cannot tell
them apart.
Rule 4: Order doesn't matter
This rule says that the order in which you store the data in your table
doesn't matter.
Second Normal Form (2NF)
For a relation to be in 2NF, it must first be in 1NF.
All non-prime attributes should depend on the whole of a candidate key and
not on part of the key.
If an attribute depends on only part of a candidate key, then it is a partial
dependency.
In 2NF no partial dependency should exist.

Example:
R(A B C D)
AB->D
B->C
Find the essential attributes and the candidate key:
(AB)+ = ABCD
Prime attributes --- A B
Non-prime attributes --- C D

AB->D   D depends on AB --- no issue
B->C    C depends on only B and not AB, so it is a partial
dependency.
So the table is not in 2NF.
To get it into 2NF (decomposition):
R1(A B D) R2(B C)
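
The reasoning in this example (find the candidate key, then look for dependencies on only part of it) can be sketched in Python. All function names below are hypothetical, the key search is brute force and only suitable for small relations, and partial_dependencies inspects only the dependencies that are listed, not every implied one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of the attribute set `attrs` under FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # a superset of a known key is not minimal
            if closure(combo, fds) == set(relation):
                keys.append("".join(combo))
    return keys


def partial_dependencies(relation, fds):
    """Listed FDs whose left side is a proper part of a candidate key and
    whose right side contains a non-prime attribute (a 2NF violation)."""
    keys = candidate_keys(relation, fds)
    prime = set("".join(keys))
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if any(set(lhs) < set(key) for key in keys) and set(rhs) - prime
    ]


fds = [("AB", "D"), ("B", "C")]
keys = candidate_keys("ABCD", fds)
violations = partial_dependencies("ABCD", fds)
print(keys, violations)
```

For R(A B C D) this reports AB as the only candidate key and B->C as the partial dependency, which matches the decomposition into R1(A B D) and R2(B C).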

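Example:
R(ABCDE)
AB->C
D->E
The candidate key is ABD, since (ABD)+ = ABCDE.
AB->C is a partial dependency
D->E is a partial dependency
So the relation is not in 2NF.
So decompose:
R1(A B C) R2(D E) R3(A B D)

Try This
R(ABCDE)
A->B
B->E
C->D
No, this relation is not in 2NF: the candidate key is AC, and A->B and
C->D are partial dependencies.

R(ABCDEFGHIJ)
AB->C
AD->GH
BD->EF
A->I
H->J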
Third Normal Form (3NF)
A relation is in 3NF if it is in 2NF and does not contain any transitive
dependency.
3NF is used to reduce data duplication. It is also used to achieve
data integrity.
If a relation is in 2NF and no non-prime attribute is transitively
dependent on a key, then the relation is in third normal form.

TD (transitive dependency) means a non-prime attribute depending on another
non-prime attribute (like an irregular student depending on another
irregular student).
Example:
R(ABC)
A->B
B->C
Candidate key: A. Prime attribute: A. Non-prime attributes: B, C.
B->C is a transitive dependency (non-prime B determines non-prime C),
so R is not in 3NF. Decompose into R1(A B) and R2(B C).

R(ABCDE)
A->B
B->E
C->D
AC --- essential attributes
(AC)+ = ABCDE --- candidate key
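BCNF (BOYCE CODD NORMAL FORM)

R(ABC)
AB->C
C->B
(A)+ = A
(AB)+ = ABC
(AC)+ = ABC
Candidate keys: AB and AC, so A, B and C are all prime attributes.
AB->C --- no partial dependency
C->B --- no partial dependency, because C is prime and B is prime
So R is in 2NF.

AB->C --- no transitive dependency
C->B --- no transitive dependency
So R is in 3NF.
BCNF rule: if there exists an FD a->b, then a should be a super key; b can
be anything.
Here C->B violates BCNF, because C is not a super key (a prime attribute
should not depend on another prime attribute that is not a key).
Decompose --- R1(AB) R2(AC) R3(BC)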
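
The BCNF test used here (the left side of every non-trivial dependency must be a super key) can be sketched in Python. The function names are hypothetical, and the check looks only at the dependencies that are listed explicitly.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Listed non-trivial FDs whose left side is not a super key of `relation`."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)               # ignore trivial FDs
        and closure(lhs, fds) != set(relation)    # lhs does not determine everything
    ]


# R(ABC) with AB -> C and C -> B, as in the example above.
fds = [("AB", "C"), ("C", "B")]
violations = bcnf_violations("ABC", fds)
print(violations)
```

AB is a super key, so AB->C passes; (C)+ is only BC, so C->B is reported as the BCNF violation.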

R(ABCDE) --- CHECK THIS
AB->CD
D->A
BC->DE

R(ABCDE) --- CHECK THIS
BC->ADE
D->B
Fourth normal form (4NF):
Fourth normal form (4NF) is a level of database normalization where
the only non-trivial multivalued dependencies are those determined by a
candidate key.
It builds on the first three normal forms (1NF, 2NF and 3NF) and the
Boyce-Codd Normal Form (BCNF).
It states that, in addition to meeting the requirements of BCNF, a
relation must not contain more than one independent multivalued dependency.
Properties: A relation R is in 4NF if and only if the following conditions
are satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any non-trivial multivalued dependency.

A table with a multivalued dependency violates
Fourth Normal Form (4NF) because it creates unnecessary
redundancies and can contribute to inconsistent data. To bring it up
to 4NF, it is necessary to break this information into two tables.
For example, in a table of courses, instructors and textbooks, each
course has a certain set of instructors and a certain set of textbooks,
but there is no relation between instructor and textbook. This is called
a multivalued dependency.
Course ->-> instructor
Course ->-> textbook
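
The effect of these two multivalued dependencies can be illustrated with plain Python sets. The course, instructor and textbook values below are hypothetical sample data. The sketch builds the single combined table, splits it into the two 4NF tables, and checks that joining them on the course gives back the original rows.

```python
from itertools import product

# Hypothetical data: each course has independent sets of instructors and textbooks.
instructors = {"Databases": ["Rao", "Mehta"], "Networks": ["Iyer"]}
textbooks = {"Databases": ["Korth", "Navathe"], "Networks": ["Tanenbaum"]}

# Single table (course, instructor, textbook): every instructor has to be paired
# with every textbook of the course, which is the redundancy 4NF removes.
combined = {
    (course, instructor, book)
    for course in instructors
    for instructor, book in product(instructors[course], textbooks[course])
}

# 4NF decomposition: project onto (course, instructor) and (course, textbook).
course_instructor = {(c, i) for c, i, _ in combined}
course_textbook = {(c, b) for c, _, b in combined}

# Joining the two projections on the course rebuilds the combined table.
rejoined = {
    (c1, i, b)
    for (c1, i) in course_instructor
    for (c2, b) in course_textbook
    if c1 == c2
}

print(len(combined), len(course_instructor), len(course_textbook))
print(rejoined == combined)
```

The combined table grows with the product of instructors and textbooks per course, while the two smaller tables grow only with their sum, and no information is lost by the split.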
Functional Dependency
A functional dependency (FD) is a constraint between two sets of
attributes in a relation. A functional dependency says that if two
tuples have the same values for attributes A1, A2,..., An, then those
two tuples must also have the same values for attributes B1, B2,
..., Bn.

A functional dependency is represented by an arrow sign (→), that
is, X→Y, where X functionally determines Y. The left-hand side
attributes determine the values of the attributes on the right-hand
side.

Trivial Functional Dependency

• Trivial − If a functional dependency (FD) X → Y holds,
where Y is a subset of X, then it is called a trivial FD.
Trivial FDs always hold.

• Non-trivial − If an FD X → Y holds, where Y is not a subset
of X, then it is called a non-trivial FD.
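
The definition can be checked mechanically on a set of sample rows. The sketch below is an illustration only: the function names and the student rows are hypothetical, loosely modelled on the Student_detail relation discussed later in these notes.

```python
def fd_holds(rows, lhs, rhs):
    """True if the FD lhs -> rhs holds in `rows` (a list of dicts)."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_trivial(lhs, rhs):
    """An FD X -> Y is trivial when Y is a subset of X."""
    return set(rhs) <= set(lhs)


# Hypothetical sample rows.
students = [
    {"Stu_ID": 1, "Stu_Name": "Asha", "Zip": "500001", "City": "Hyderabad"},
    {"Stu_ID": 2, "Stu_Name": "Ravi", "Zip": "500001", "City": "Hyderabad"},
    {"Stu_ID": 3, "Stu_Name": "Asha", "Zip": "110001", "City": "Delhi"},
]

zip_determines_city = fd_holds(students, ["Zip"], ["City"])
name_determines_zip = fd_holds(students, ["Stu_Name"], ["Zip"])
trivial = is_trivial(["Stu_ID", "Zip"], ["Zip"])
print(zip_determines_city, name_determines_zip, trivial)
```

A check like this can only show that a dependency is violated by the given rows. A functional dependency is a rule about every valid state of the relation, so passing the check on one sample does not prove that the dependency holds in general.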

Normalization
If a database design is not perfect, it may contain anomalies,
which are like a bad dream for any database administrator.
Managing a database with anomalies is next to impossible.

• Update anomalies − If data items are scattered and are not
linked to each other properly, then it could lead to strange
situations. For example, when we try to update one data
item having its copies scattered over several places, a few
instances get updated properly while a few others are left
with old values. Such instances leave the database in an
inconsistent state.

• Deletion anomalies − We try to delete a record, but parts
of it are left undeleted because, without our being aware of it,
the data is also saved somewhere else.

• Insert anomalies − We try to insert data into a record that
does not exist at all.

Normalization is a method to remove all these anomalies and bring
the database to a consistent state.

First Normal Form


First Normal Form is defined in the definition of relations (tables)
itself. This rule defines that all the attributes in a relation must
have atomic domains. The values in an atomic domain are
indivisible units.
We re-arrange the relation (table) as below, to convert it to First
Normal Form.

Second Normal Form


Before we learn about the second normal form, we need to
understand the following −

• Prime attribute − An attribute, which is a part of a
candidate key, is known as a prime attribute.

• Non-prime attribute − An attribute, which is not a part of any
candidate key, is said to be a non-prime attribute.

If we follow second normal form, then every non-prime attribute
should be fully functionally dependent on the candidate key.
That is, if X → A holds, then there should not be any proper
subset Y of X, for which Y → A also holds true.
We see here in the Student_Project relation that the prime key
attributes are Stu_ID and Proj_ID. According to the rule, non-key
attributes, i.e. Stu_Name and Proj_Name, must be dependent
upon both and not on any of the prime key attributes individually.
But we find that Stu_Name can be identified by Stu_ID and
Proj_Name can be identified by Proj_ID independently. This is
called partial dependency, which is not allowed in Second Normal
Form.

We broke the relation in two as depicted in the above picture. So
there exists no partial dependency.

Third Normal Form


For a relation to be in Third Normal Form, it must be in Second
Normal Form and the following must be satisfied −

• No non-prime attribute is transitively dependent on a
prime key attribute.
• For any non-trivial functional dependency, X → A, either
  o X is a superkey or,
  o A is a prime attribute.
We find that in the above Student_detail relation, Stu_ID is the
key and the only prime key attribute. We find that City can be
identified by Stu_ID as well as by Zip itself. Zip is not a superkey,
and City is not a prime attribute. Additionally, Stu_ID → Zip → City,
so there exists a transitive dependency.

To bring this relation into third normal form, we break the relation
into two relations as follows −

Boyce-Codd Normal Form


Boyce-Codd Normal Form (BCNF) is an extension of Third Normal
Form on strict terms. BCNF states that −

• For any non-trivial functional dependency, X → A, X must
be a super-key.
In the above image, Stu_ID is the super-key in the relation
Student_Detail and Zip is the super-key in the relation ZipCodes.
So,

Stu_ID → Stu_Name, Zip and

Zip → City

which confirms that both the relations are in BCNF.

Fourth Normal Form (4NF)

In the fourth normal form,

• It should meet all the requirements of BCNF (and therefore of 3NF)
• An attribute of one or more rows in the table should not result
in more than one row of the same table, leading to multivalued
dependencies

To understand it clearly, consider a table with Subject, Lecturer
who teaches each subject and recommended Books for each
subject.

If we observe the data in the table above it satisfies 3NF. But
LECTURER and BOOKS are two independent entities here. There
is no relationship between Lecturer and Books. In the above
example, either Alex or Bosco can teach Mathematics. For
the Mathematics subject, a student can refer to either 'Maths Book1' or
'Maths Book2', i.e.

SUBJECT →→ LECTURER

SUBJECT →→ BOOKS

This is a multivalued dependency on SUBJECT. If we need to
select both lecturer and books recommended for any
subject, the result will show (lecturer, books) combinations, which
imply that a particular lecturer recommends a particular book. This is
not correct.

To eliminate this dependency, we divide the table into two as below:
