UNIT-3
Syntax
SELECT [DISTINCT] column1, column2, ...
FROM tablename
WHERE condition;
1. SELECT Clause: This is where you specify the columns you want
to retrieve. Use an asterisk (*) to retrieve all columns.
2. FROM Clause: This specifies from which table or tables you want
to retrieve the data.
If the SQL statement is written without the DISTINCT operator, duplicate
rows are not eliminated from the result.
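The effect of DISTINCT can be sketched with Python's built-in sqlite3 module. The Students table and its rows are made up for illustration and are not part of these notes.

```python
import sqlite3

# Hypothetical Students table in which one city appears twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Name TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?)",
    [("Asha", "Delhi"), ("Ravi", "Mumbai"), ("Meena", "Delhi")],
)

# Without DISTINCT every row is returned, duplicates included.
all_cities = [row[0] for row in conn.execute(
    "SELECT City FROM Students ORDER BY City")]

# With DISTINCT each value appears only once.
distinct_cities = [row[0] for row in conn.execute(
    "SELECT DISTINCT City FROM Students ORDER BY City")]

print(all_cities)       # ['Delhi', 'Delhi', 'Mumbai']
print(distinct_cities)  # ['Delhi', 'Mumbai']
```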
FROM: Specifies the table from which you're retrieving the data.
GROUP BY: Groups rows that have the same values in specified
columns.
To provide a more holistic view, here are a few more SQL examples,
keeping them as basic as possible:
Syntax
2. Retrieve specific columns from a table:
Syntax
Examples:
Finds names that start or end with "a".
Finds names that start with "a" and are at least 3 characters in length.
Here's a breakdown of how you can use these wildcards with the `LIKE`
operator:
SELECT column_name
FROM table_name
WHERE column_name LIKE 'pattern%';
For example, to find all customers whose names start with "Ma":
Example
SELECT FirstName
FROM Customers
WHERE FirstName LIKE 'Ma%';
2. Find values that end with a specific pattern:
Syntax
SELECT column_name
FROM table_name
WHERE column_name LIKE '%pattern';
Example
SELECT ProductName
FROM Products
WHERE ProductName LIKE '%ing';
SELECT column_name
FROM table_name
WHERE column_name LIKE '%pattern%';
For example, to find all books that have the word "life" anywhere in the
title:
Example
SELECT BookTitle
FROM Books
WHERE BookTitle LIKE '%life%';
SELECT column_name
FROM table_name
WHERE column_name LIKE 'p_ttern';
For instance, if you're looking for a five-letter word where you know the
first letter is "h" and the third letter is "l", you could use:
Example
SELECT Word
FROM Words
WHERE Word LIKE 'h_l__';
Combining `%` and `_`
You can use both wildcards in the same pattern. For example, to find
any value that starts with "A", followed by two characters, and then "o":
Example
SELECT column_name
FROM table_name
WHERE column_name LIKE 'A__o%';
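The patterns above can be tried out with a small sketch in Python's sqlite3 module. The Words table and its contents are hypothetical. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, which may differ from other DBMSs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Words (Word TEXT)")
conn.executemany(
    "INSERT INTO Words VALUES (?)",
    [("Mark",), ("Maria",), ("hello",), ("helps",),
     ("Running",), ("Alto",), ("Aeros",)],
)

def matching(pattern):
    """Return all words that match the given LIKE pattern, sorted."""
    rows = conn.execute(
        "SELECT Word FROM Words WHERE Word LIKE ? ORDER BY Word", (pattern,))
    return [row[0] for row in rows]

starts_with_ma = matching("Ma%")    # starts with "Ma"
ends_with_ing = matching("%ing")    # ends with "ing"
h_l_words = matching("h_l__")       # five letters: h, any, l, any, any
a_two_o = matching("A__o%")         # "A", two characters, "o", then anything

print(starts_with_ma)  # ['Maria', 'Mark']
print(ends_with_ing)   # ['Running']
print(h_l_words)       # ['hello', 'helps']
print(a_two_o)         # ['Aeros', 'Alto']
```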
UNION in DBMS
The UNION operator in DBMS is used to combine the result sets of two
or more SELECT statements. By default it returns only distinct rows. If
you want to keep duplicate values, use UNION ALL.
Syntax
For example, if the Customers table has a column City and the Suppliers
table also has a column City, we can use a UNION to get a list of all
cities:
Example:
This would return a list of cities, with each city listed only once, even if
it appears in both the Customers and Suppliers tables.
Remember, the number and order of the columns, as well as the data
types of the corresponding columns, must be the same in all the
SELECT statements that you're combining with UNION.
Example:
In this case, a city would be listed once for every time it appears in
either the Customers or Suppliers table.
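The difference between UNION and UNION ALL can be sketched as follows with Python's sqlite3 module. The Customers and Suppliers rows are invented for illustration; only the table and column names come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Name TEXT, City TEXT);
    CREATE TABLE Suppliers (Name TEXT, City TEXT);
    INSERT INTO Customers VALUES ('Anil', 'Delhi'), ('Bina', 'Pune');
    INSERT INTO Suppliers VALUES ('Acme', 'Delhi'), ('Zenith', 'Chennai');
""")

# UNION removes duplicates: Delhi is listed once.
union_cities = [r[0] for r in conn.execute(
    "SELECT City FROM Customers UNION "
    "SELECT City FROM Suppliers ORDER BY City")]

# UNION ALL keeps duplicates: Delhi is listed twice.
union_all_cities = [r[0] for r in conn.execute(
    "SELECT City FROM Customers UNION ALL "
    "SELECT City FROM Suppliers ORDER BY City")]

print(union_cities)      # ['Chennai', 'Delhi', 'Pune']
print(union_all_cities)  # ['Chennai', 'Delhi', 'Delhi', 'Pune']
```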
INTERSECT in DBMS
The INTERSECT operator in a DBMS is used to combine two
SELECT statements and return only the records that are common to
both.
The basic syntax of the INTERSECT clause in SQL is:
Syntax
INTERSECT
For example, if you have two tables, Orders and Deliveries, and you
want to find all orders that have been delivered (assuming order_id is a
common column), you could write:
Example:
This would return a list of order_ids that appear in both the Orders and
Deliveries tables.
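A minimal sketch of the Orders and Deliveries query, run through Python's sqlite3 module (SQLite supports INTERSECT directly). The order_id values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (order_id INTEGER);
    CREATE TABLE Deliveries (order_id INTEGER);
    INSERT INTO Orders VALUES (1), (2), (3), (4);
    INSERT INTO Deliveries VALUES (2), (4);
""")

# Only order_ids present in both tables are returned.
delivered = [r[0] for r in conn.execute(
    "SELECT order_id FROM Orders INTERSECT "
    "SELECT order_id FROM Deliveries ORDER BY order_id")]

print(delivered)  # [2, 4]
```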
Here are some key points about the INTERSECT operator:
The number and order of columns, and the data types in both the
SELECT statements should be the same.
EXCEPT in DBMS
The EXCEPT operator in a DBMS is used to return the difference
between two SELECT statements. It returns the records from the first
SELECT statement that are not present in the second SELECT
statement.
Syntax
Example:
This would return a list of order_ids that appear in the Orders table but
not in the Deliveries table.
The number and order of columns, and the data types in both
SELECT statements should match.
It only returns records from the first SELECT statement that are
not in the second SELECT statement.
As with the INTERSECT operator, not all DBMSs support the EXCEPT
operator. MySQL versions before 8.0.31 do not support EXCEPT; there you
can simulate it using a LEFT JOIN or NOT EXISTS.
Syntax
For example, if you have two tables, Orders and Deliveries, and you
want to find all orders that have not been delivered yet (assuming
order_id is a common column), you could write:
Example:
This would return a list of OrderIDs that appear in the Orders table but
not in the Deliveries table.
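The EXCEPT query and its two simulations can be compared side by side. This is a sketch using Python's sqlite3 module with invented order_id values; the exact queries in the original slides are not available, so the SQL below is one plausible formulation. EXCEPT also removes duplicates, while the two simulations do not, so the results agree here only because each order_id is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (order_id INTEGER);
    CREATE TABLE Deliveries (order_id INTEGER);
    INSERT INTO Orders VALUES (1), (2), (3), (4);
    INSERT INTO Deliveries VALUES (2), (4);
""")

def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Direct EXCEPT.
except_ids = ids(
    "SELECT order_id FROM Orders EXCEPT "
    "SELECT order_id FROM Deliveries ORDER BY order_id")

# Simulation with NOT EXISTS.
not_exists_ids = ids(
    "SELECT o.order_id FROM Orders o "
    "WHERE NOT EXISTS (SELECT 1 FROM Deliveries d "
    "                  WHERE d.order_id = o.order_id) "
    "ORDER BY o.order_id")

# Simulation with LEFT JOIN: undelivered orders have no matching row.
left_join_ids = ids(
    "SELECT o.order_id FROM Orders o "
    "LEFT JOIN Deliveries d ON d.order_id = o.order_id "
    "WHERE d.order_id IS NULL ORDER BY o.order_id")

print(except_ids, not_exists_ids, left_join_ids)  # [1, 3] [1, 3] [1, 3]
```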
Aggregation Operators in DBMS
In a DBMS, aggregation operators are used to perform operations on a
group of values to return a single summarizing value. The most common
aggregation operators include COUNT, SUM, AVG, MIN, and MAX.
Here are some examples of how you might use these operators:
COUNT
Returns the number of rows, or the number of non-NULL values of an
expression.
Syntax
COUNT(expression)
Example:
This query would return the total number of rows in the Employees
table.
SUM
Returns the total sum of a numeric column.
Syntax
SUM(expression)
Example:
This query would return the total sum of the salary column values in the
Employees table.
AVG
Returns the average value of a numeric column.
Syntax
AVG(expression)
Example:
This query would return the average salary from the Employees table.
MIN
Returns the smallest value of the selected column.
Syntax
MIN(expression)
Example:
This query would return the lowest salary from the Employees table.
MAX
Returns the largest value of the selected column.
Syntax
MAX(expression)
Example:
This query would return the highest salary from the Employees table.
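The five aggregation operators can be exercised in one query. The sketch below uses Python's sqlite3 module; the Employees rows and the column names Name, Department and Salary are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (Name TEXT, Department TEXT, Salary INTEGER);
    INSERT INTO Employees VALUES
        ('Asha', 'Sales', 40000),
        ('Ravi', 'Sales', 50000),
        ('Meena', 'Finance', 60000);
""")

# Each aggregate collapses the whole table into a single value.
row_count, total, average, lowest, highest = conn.execute(
    "SELECT COUNT(*), SUM(Salary), AVG(Salary), MIN(Salary), MAX(Salary) "
    "FROM Employees").fetchone()

print(row_count, total, average, lowest, highest)
# 3 150000 50000.0 40000 60000
```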
These aggregation operators are often used with the GROUP BY clause
to group the result-set by one or more columns. For example, to find the
highest salary in each department, you could write:
Example:
This query would return the highest salary for each department in the
Employees table.
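A sketch of the grouped query, again with Python's sqlite3 module. The Employees rows and the Department column name are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (Name TEXT, Department TEXT, Salary INTEGER);
    INSERT INTO Employees VALUES
        ('Asha', 'Sales', 40000),
        ('Ravi', 'Sales', 50000),
        ('Meena', 'Finance', 60000);
""")

# GROUP BY forms one group per department; MAX is evaluated per group.
highest_per_department = dict(conn.execute(
    "SELECT Department, MAX(Salary) FROM Employees GROUP BY Department"))

print(highest_per_department)
```

With these rows the result maps Sales to 50000 and Finance to 60000.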
1. Using CHECK Constraints
Ensuring a range: You might want a column to only have values within
a certain range.
Example:
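A range check of this kind can be sketched with Python's sqlite3 module. The Students table and the age range 18 to 25 are hypothetical; the original example is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Students (
        Name TEXT NOT NULL,
        Age  INTEGER CHECK (Age BETWEEN 18 AND 25)
    )
""")

# A value inside the range is accepted.
conn.execute("INSERT INTO Students VALUES ('Asha', 20)")

# A value outside the range violates the CHECK constraint.
try:
    conn.execute("INSERT INTO Students VALUES ('Ravi', 30)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(rejected, row_count)  # True 1
```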
Example:
Example:
4. Using TRIGGERS
A trigger is a procedural code in a database that automatically executes
in response to certain events on a particular table or view. Essentially,
triggers are special types of stored procedures that run automatically
when an INSERT, UPDATE, or DELETE operation occurs.
Triggers
A trigger is a predefined action that the database automatically executes
in response to certain events on a particular table or view. Triggers are
typically used to maintain the integrity of the data, automate data-related
tasks, and extend the database functionalities.
There are various types of triggers based on when they are executed:
Here's the basic syntax for creating a trigger in SQL, using MySQL as an
example:
Syntax
Key Features of Triggers
Example of a Trigger
Suppose we have an `Employees` table and we want to maintain
an `AuditLog` table that keeps a record of salary changes for
employees.
Employees Table
AuditLog Table
How the Trigger Works
- The trigger is named `AfterSalaryUpdate`.
- It activates `AFTER` an `UPDATE` on the `Employees` table.
- It compares the old and new salary values. If there's a change
(`OLD.Salary != NEW.Salary`), it inserts a new record into the
`AuditLog` table with the details of the change and the current date and
time (`NOW()`).
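The behaviour described above can be reproduced with a sketch in Python's sqlite3 module. This is SQLite syntax, not the MySQL syntax the notes refer to: the condition is written as a WHEN clause and `datetime('now')` stands in for `NOW()`. The column names (EmployeeID, OldSalary, NewSalary, ChangedAt) are assumptions, since the original table definitions are not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY, Name TEXT, Salary REAL
    );
    CREATE TABLE AuditLog (
        EmployeeID INTEGER, OldSalary REAL, NewSalary REAL, ChangedAt TEXT
    );

    -- Fires after each updated row, but only when the salary changed.
    CREATE TRIGGER AfterSalaryUpdate
    AFTER UPDATE ON Employees
    FOR EACH ROW
    WHEN OLD.Salary != NEW.Salary
    BEGIN
        INSERT INTO AuditLog (EmployeeID, OldSalary, NewSalary, ChangedAt)
        VALUES (OLD.EmployeeID, OLD.Salary, NEW.Salary, datetime('now'));
    END;

    INSERT INTO Employees VALUES (1, 'Asha', 40000), (2, 'Ravi', 50000);
""")

# A salary change is logged.
conn.execute("UPDATE Employees SET Salary = 45000 WHERE EmployeeID = 1")
# A change that leaves the salary untouched is not logged.
conn.execute("UPDATE Employees SET Name = 'Ravi K' WHERE EmployeeID = 2")

audit_rows = conn.execute(
    "SELECT EmployeeID, OldSalary, NewSalary FROM AuditLog").fetchall()
print(audit_rows)  # [(1, 40000.0, 45000.0)]
```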
Active Databases
2. Reactive Behavior: The database can react to changes without
external applications or users having to intervene, thanks to the
ECA rules.
Schema Refinement
Problems caused by redundancy:
Insertion anomaly
Deletion anomaly
Updation anomaly
Insertion Anomaly
Deletion Anomaly
If the details of students in this table are deleted, then the details of
the college are deleted as well, which should not happen. This anomaly
occurs when deleting a record also removes unrelated information that
was stored as part of the same record.
In other words, it is not possible to delete some information without
losing other information in the table as well.
Updation Anomaly
Suppose the rank of the college changes. The change then has to be made
everywhere the rank is stored in the database, which is time-consuming
and computationally costly.
If the update does not occur at all places, the database will be in an
inconsistent state.
Redundancy in a database occurs when the same data is stored in
multiple places. Redundancy can cause various problems such as data
inconsistencies, higher storage requirements, and slower data retrieval.
troubleshoot and resolve issues and can require more time and
resources to maintain the database.
Data Duplication: Redundancy can lead to data duplication, where
the same data is stored in multiple locations, resulting in wasted
storage space and increased maintenance complexity. This can also
lead to confusion and errors, as different copies of the data may have
different values or be out of sync.
Data Integrity: Redundancy can also compromise data integrity, as
changes made to one copy of the data may not be reflected in the
other copies. This can result in inconsistencies and errors and can
make it difficult to ensure that the data is accurate and up-to-date.
Usability Issues: Redundancy can also create usability issues, as
users may have difficulty accessing the correct version of the data or
may be confused by inconsistencies and errors. This can lead to
frustration and decreased productivity, as users spend more time
searching for the correct data or correcting errors.
To prevent redundancy in a database, normalization techniques can be
used. Normalization is the process of organizing data in a database to
eliminate redundancy and improve data integrity. Normalization involves
breaking down a larger table into smaller tables and establishing
relationships between them. This reduces redundancy and makes the
database more efficient and reliable.
Advantages of data redundancy in DBMS
o Deleting Unused Data: Data that is no longer needed adds to the
redundancy in the DBMS, so it is good practice to remove it from
the database.
o Master Data: The data administrator shares master data across
multiple systems. This does not remove data redundancy, but it
ensures that redundant copies are updated whenever the data changes.
Decomposition
Decomposition in DBMS
Types of Decomposition
There are two types of Decomposition:
Lossless Decomposition
Lossy Decomposition
Lossless Decomposition
If no information is lost from the relation that is decomposed,
then the decomposition is lossless.
A lossless decomposition guarantees that joining the decomposed
relations yields the same relation that was decomposed.
In other words, a decomposition is lossless if the natural join of
all the decomposed relations gives back the original relation.
Example:
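EMPLOYEE_DEPARTMENT table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing
EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table:
DEPT_ID EMP_ID DEPT_NAME
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column
"EMP_ID", the resultant relation will look like:
Employee ⋈ Department
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing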
Example:
R(A, B, C)
A B C
55 16 27
48 52 89
Now we decompose this relation into two sub-relations R1 and R2.
R1(A, B)
A B
55 16
48 52
R2(B, C)
B C
16 27
52 89
After performing the join operation we get the same original relation:
A B C
55 16 27
48 52 89
Example: Consider a relation R(A,B,C) with the following data:
|A |B |C |
|----|----|----|
|1 |X |P |
|1 |Y |P |
|2 |Z |Q |
Suppose we decompose R into R1(A,B) and R2(A,C).
R1(A, B):
|A |B |
|----|----|
|1 |X |
|1 |Y |
|2 |Z |
R2(A, C):
|A |C |
|----|----|
|1 |P |
|2 |Q |
Now, if we take the natural join of R1 and R2 on attribute A, we get
back the original relation R. Therefore, this is a lossless decomposition.
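The lossless-join check for this example can be sketched in plain Python. The helper names `project` and `natural_join` are hypothetical, and relations are modelled as sets of rows so that duplicate rows collapse, as they do in the relational model.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, dropping duplicates."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join of two relations given as sets of (attribute, value) tuples."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            common = ld.keys() & rd.keys()
            # Rows combine only when they agree on every shared attribute.
            if all(ld[a] == rd[a] for a in common):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined

R = [
    {"A": 1, "B": "X", "C": "P"},
    {"A": 1, "B": "Y", "C": "P"},
    {"A": 2, "B": "Z", "C": "Q"},
]

R1 = project(R, ["A", "B"])
R2 = project(R, ["A", "C"])   # the two (1, P) rows collapse into one

original = {tuple(sorted(row.items())) for row in R}
is_lossless = natural_join(R1, R2) == original

print(len(R1), len(R2), is_lossless)  # 3 2 True
```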
Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied
by at least one decomposed table.
o If a relation R is decomposed into relations R1 and R2, then each
dependency of R must either be a part of R1 or R2 or be
derivable from the combination of functional dependencies of R1
and R2.
o For example, suppose there is a relation R(A, B, C, D) with
functional dependency set (A → BC). The relation R is
decomposed into R1(ABC) and R2(AD), which is dependency
preserving because the FD A → BC is a part of relation R1(ABC).
o Dependency Preservation: A decomposition D = {R1, R2,
R3, …, Rn} of R is dependency preserving with respect to a set F
of functional dependencies if (F1 ∪ F2 ∪ … ∪ Fn)+ = F+, where Fi
is the set of dependencies in F+ that involve only attributes of Ri.
Problem:
Consider a relation R(A, B, C, D) with functional dependencies {AB →
C, C → D, D → A}. Relation R is decomposed into R1(A, B, C) and
R2(C, D). Check whether the decomposition is dependency preserving or
not.
Solution:
For R1(A, B, C):
closure(A) = {A} // trivial
closure(B) = {B} // trivial
closure(C) = {C, A, D}, but D cannot be kept because D is not present in R1
= {C, A}
C → A // C removed from the right side because it is trivial
closure(AB) = {A, B, C, D}
= {A, B, C}
AB → C // A and B removed from the right side because they are trivial
closure(BC) = {B, C, D, A}
= {A, B, C}
BC → A // B and C removed from the right side because they are trivial
closure(AC) = {A, C, D}
= {A, C}
NULL SET // only trivial dependencies remain
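The complete check for this problem, including R2 and the final verdict, can be sketched in Python. This is the standard closure-based test rather than a transcription of the worked solution; the function names `closure` and `is_preserved` are hypothetical. An FD X → Y is preserved if Y can be reached from X using only closures restricted to the individual decomposed relations.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_preserved(fd, fds, decomposition):
    """Test whether fd can be enforced using the decomposed relations only."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for relation in decomposition:
            # Attributes derivable inside this relation alone.
            gained = closure(result & relation, fds) & relation
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

F = [({"A", "B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"A"})]
decomposition = [{"A", "B", "C"}, {"C", "D"}]

preserved = {
    "".join(sorted(lhs)) + "->" + "".join(sorted(rhs)):
        is_preserved((lhs, rhs), F, decomposition)
    for lhs, rhs in F
}
is_dependency_preserving = all(preserved.values())

print(preserved)                 # {'AB->C': True, 'C->D': True, 'D->A': False}
print(is_dependency_preserving)  # False
```

Under this test D → A is lost, because A is not in R2 and D is not in R1, so the decomposition is not dependency preserving.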
Once tables are decomposed, certain functional dependencies
might not be preserved, which can lead to the inability to enforce
specific integrity constraints.
Example: Let's consider a relation R with attributes A, B, and C and the
following functional dependencies:
A → B
B → C
Suppose R is decomposed into:
R1(A, B) with FD A → B
R2(B, C) with FD B → C
the decomposition would not be dependency-preserving for that specific
FD.
3. Increased Complexity
4. Redundancy
5. Performance Overhead
Functional Dependency
The functional dependency is a relationship that exists between two
attributes. It typically exists between the primary key and non-key
attribute within a table.
X → Y
Example:
Emp_Id → Emp_Name
Types of Functional dependency
Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional
dependency, as Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name
are trivial dependencies.
When A ∩ B is empty, A → B is called a completely non-trivial
functional dependency.
Example:
ID → Name,
Name → DOB
Reasoning about Functional Dependency:
Inference Rule (IR):
1. Reflexive Rule (IR1):
If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
2. Augmentation Rule (IR2):
If X → Y then XZ → YZ
3. Transitive Rule (IR3):
If X → Y and Y → Z then X → Z
4. Union Rule (IR4):
If X → Y and X → Z then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X, where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
5. Decomposition Rule (IR5):
If X → YZ then X → Y and X → Z
Proof:
1. X → YZ (given)
2. YZ → Y (using IR1)
3. X → Y (using IR3 on 1 and 2)
4. Similarly, YZ → Z (using IR1), so X → Z (using IR3)
6. Pseudo transitive Rule (IR6):
If X → Y and YZ → W then XZ → W
Proof:
1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
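Whether a dependency follows from a given set by these inference rules can be checked mechanically with attribute closure: X → Y is derivable exactly when Y is contained in the closure of X. A minimal sketch in Python follows; the function name `closure` is hypothetical.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Union rule: from X -> Y and X -> Z we can derive X -> YZ.
union_fds = [({"X"}, {"Y"}), ({"X"}, {"Z"})]
union_holds = {"Y", "Z"} <= closure({"X"}, union_fds)

# Pseudo transitive rule: from X -> Y and YZ -> W we can derive XZ -> W.
pseudo_fds = [({"X"}, {"Y"}), ({"Y", "Z"}, {"W"})]
pseudo_holds = {"W"} <= closure({"X", "Z"}, pseudo_fds)

# Z alone does not determine W, so Z -> W is not derivable.
z_determines_w = {"W"} <= closure({"Z"}, pseudo_fds)

print(union_holds, pseudo_holds, z_determines_w)  # True True False
```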
Normalization:
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy in a relation
or set of relations. It is also used to eliminate undesirable
characteristics such as insertion, update, and deletion anomalies.
o Normalization divides a larger table into smaller tables and
links them using relationships.
o Normal forms are used to reduce redundancy in database
tables.
Normal Form   Description
1NF   A relation is in 1NF if every attribute contains only atomic values.
2NF   A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF   A relation is in 3NF if it is in 2NF and no transitive dependency exists.
4NF   A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
5NF   A relation is in 5NF if it is in 4NF, contains no join dependency, and joining is lossless.
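EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389, 8589830302 Punjab
The decomposition of the EMPLOYEE table into 1NF is shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab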
Example: Let's assume, a school can store the data of teachers and the
subjects they teach. In a school, a teacher can teach more than one
subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, non-prime attribute TEACHER_AGE is dependent on
TEACHER_ID which is a proper subset of a candidate key. That's why
it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL Table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT Table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
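The 2NF decomposition above can be checked with a sketch in Python's sqlite3 module: the two smaller tables are derived from TEACHER, and joining them on TEACHER_ID should give back the original rows. The table and column names follow the text; the SQL statements are one possible formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TEACHER (TEACHER_ID INTEGER, SUBJECT TEXT, TEACHER_AGE INTEGER);
    INSERT INTO TEACHER VALUES
        (25, 'Chemistry', 30), (25, 'Biology', 30), (47, 'English', 35),
        (83, 'Math', 38), (83, 'Computer', 38);

    -- TEACHER_AGE depends only on TEACHER_ID, so it moves to its own table.
    CREATE TABLE TEACHER_DETAIL AS
        SELECT DISTINCT TEACHER_ID, TEACHER_AGE FROM TEACHER;
    CREATE TABLE TEACHER_SUBJECT AS
        SELECT TEACHER_ID, SUBJECT FROM TEACHER;
""")

original = conn.execute(
    "SELECT TEACHER_ID, SUBJECT, TEACHER_AGE FROM TEACHER "
    "ORDER BY TEACHER_ID, SUBJECT").fetchall()

rejoined = conn.execute(
    "SELECT s.TEACHER_ID, s.SUBJECT, d.TEACHER_AGE "
    "FROM TEACHER_SUBJECT s "
    "JOIN TEACHER_DETAIL d ON d.TEACHER_ID = s.TEACHER_ID "
    "ORDER BY s.TEACHER_ID, s.SUBJECT").fetchall()

detail_rows = conn.execute(
    "SELECT COUNT(*) FROM TEACHER_DETAIL").fetchone()[0]

print(detail_rows)            # 3: each teacher's age is now stored once
print(rejoined == original)   # True: no information was lost
```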
Third Normal Form (3NF):
o A relation is in 3NF if it is in 2NF and does not contain any
transitive dependency.
o 3NF is used to reduce data duplication. It is also used to
achieve data integrity.
o If there is no transitive dependency for non-prime attributes, then
the relation is in third normal form.
A relation is in third normal form if at least one of the following
conditions holds for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some
candidate key.
Example:
EMPLOYEE_DETAIL table:
Super keys of the table above:
1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME,
EMP_ZIP}, and so on
EMPLOYEE Table:
EMPLOYEE_ZIP Table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
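EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
364 UK
EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing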
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left side of both functional
dependencies is a key.
Multivalued Dependency:
M3001 2013 White
BIKE_MODEL →→ MANUF_YEAR
BIKE_MODEL →→ COLOR
o For a dependency A → B, if multiple values of B exist for a
single value of A, then the relation has a multi-valued
dependency.
Example:
STUDENT
The given STUDENT table is in 3NF, but COURSE and HOBBY are two
independent entities. Hence, there is no relationship between
COURSE and HOBBY.
So to make the above table into 4NF, we can decompose it into two
tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Join Dependency:
2. If the join of R1 and R2 over C is equal to relation R, then we can say
that a join dependency (JD) exists.
5. A JD ⋈ {R1, R2, ..., Rn} is said to hold over a relation R if R1, R2, ...,
Rn is a lossless-join decomposition.
7. Here, *(R1, R2, R3) is used to indicate that relations R1, R2, R3 and so
on are a JD of R.
Example
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1
In the above table, John takes both the Computer and Math classes for
Semester 1, but he doesn't take the Math class for Semester 2. In this
case, the combination of all these fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the
subject or who will be teaching it, so we leave Lecturer and Subject as
NULL. But all three columns together act as the primary key, so we
can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three
relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
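That the three projections P1, P2 and P3 together reproduce the original table, while two of them alone do not, can be checked with a short sketch in plain Python. Relations are modelled as sets of tuples; the variable names are chosen for illustration.

```python
# Original relation as (SUBJECT, LECTURER, SEMESTER) rows.
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# The three projections used in the 5NF decomposition.
P1 = {(sem, sub) for sub, lec, sem in original}   # SEMESTER, SUBJECT
P2 = {(sub, lec) for sub, lec, sem in original}   # SUBJECT, LECTURER
P3 = {(sem, lec) for sub, lec, sem in original}   # SEMESTER, LECTURER

# Joining only P1 and P2 on SUBJECT produces spurious rows.
two_way = {(sub, lec, sem)
           for sem, sub in P1
           for sub2, lec in P2
           if sub == sub2}

# Joining the result with P3 on (SEMESTER, LECTURER) removes them again.
three_way = {row for row in two_way if (row[2], row[1]) in P3}

print(len(two_way))            # 7: two rows more than the original
print(three_way == original)   # True: the three-way join is lossless
```

The two spurious rows in the two-way join are (Math, Akash, Semester 1) and (Math, John, Semester 2), which is why all three relations are needed.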