
Unit-3

Overview

Normalization is the process of organizing the data and the attributes of a database. It is
performed to reduce the data redundancy in a database and to ensure that data is stored logically.
Data redundancy in a DBMS means storing the same data in multiple places. It is necessary to
remove data redundancy because it causes anomalies in a database, which makes it very hard for
a database administrator to maintain.

Why Do We Need Normalization?

As we have discussed above, normalization is used to reduce data redundancy. A database
anomaly is a flaw in the database that occurs because of poor planning and redundancy.
Normalization provides a method to remove the following anomalies from the database and
bring it to a more consistent state:

i) Insertion anomalies: These occur when we are not able to insert data into the database because
some attributes are missing at the time of insertion.

ii) Updation anomalies: These occur when the same data is repeated in multiple rows, so updating
it in one place and not in the others leaves the database inconsistent.

iii) Deletion anomalies: These occur when deleting one part of the data also deletes other
necessary information from the database.

Normal Forms
There are four normal forms that are commonly used in relational databases:
1. 1NF: A relation is in 1NF if all its attributes have an atomic value.
2. 2NF: A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally
dependent on a candidate key.
3. 3NF: A relation is in 3NF if it is in 2NF and there is no transitive dependency.
4. BCNF: A relation is in BCNF if it is in 3NF and, for every functional dependency, the LHS
is a super key.

To understand the above-mentioned normal forms, we first need to have an understanding of the
functional dependencies.

Functional dependency is a relationship that exists between two sets of attributes of a relational
table where one set of attributes can determine the value of the other set of attributes. It is
denoted by X -> Y, where X is called a determinant and Y is called dependent.

There are various levels of normalizations. Let’s go through them one by one:

First Normal Form (1NF)


A relation is in 1NF if every attribute is single-valued, i.e., the relation does not contain any
multi-valued or composite attribute and every attribute value is atomic. If there is a composite or
multi-valued attribute, it violates 1NF. To fix this, we can create a new row for each value of the
multi-valued attribute and thereby convert the table into 1NF.

Let’s take an example of a relational table <EmployeeDetail> that contains the details of the
employees of the company.

<EmployeeDetail>

Employee Code Employee Name Employee Phone Number


101 John 98765623,998234123
101 John 89023467
102 Ryan 76213908
103 Stephanie 98132452

Here, the Employee Phone Number is a multi-valued attribute. So, this relation is not in 1NF.

To convert this table into 1NF, we make new rows with each Employee Phone Number as a new
row as shown below:

<EmployeeDetail>

Employee Code Employee Name Employee Phone Number


101 John 998234123
101 John 98765623
101 John 89023467
102 Ryan 76213908
103 Stephanie 98132452
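
As a rough sketch in SQL of what the 1NF table could look like, assuming underscore column names and simple data types (these are illustrative choices, not part of the original notes):

-- Hypothetical 1NF schema: one atomic phone number per row.
CREATE TABLE EmployeeDetail (
    Employee_Code         INT         NOT NULL,
    Employee_Name         VARCHAR(50) NOT NULL,
    Employee_Phone_Number VARCHAR(15) NOT NULL,
    -- An employee may have several phone numbers, so the phone number
    -- must be part of the key to keep each row unique.
    PRIMARY KEY (Employee_Code, Employee_Phone_Number)
);

-- Each phone number of employee 101 becomes its own row.
INSERT INTO EmployeeDetail VALUES (101, 'John', '998234123');
INSERT INTO EmployeeDetail VALUES (101, 'John', '98765623');
INSERT INTO EmployeeDetail VALUES (101, 'John', '89023467');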

Second Normal Form (2NF)


The normalization of 1NF relations to 2NF involves the elimination of partial dependencies.
A partial dependency in a DBMS exists when a non-prime attribute, i.e., an attribute that is not
part of any candidate key, is not fully functionally dependent on a candidate key.

For a relational table to be in second normal form, it must satisfy the following rules:

1. The table must be in first normal form.


2. It must not contain any partial dependency, i.e., all non-prime attributes are fully
functionally dependent on the primary key.

If a partial dependency exists, we can divide the table to remove the partially dependent
attributes and move them to some other table where they fit in well.

Let us take an example of the following <EmployeeProjectDetail> table to understand what a
partial dependency is and how to normalize the table to the second normal form:

<EmployeeProjectDetail>

Employee Code Project ID Employee Name Project Name


101 P03 John Project103
101 P01 John Project101
102 P04 Ryan Project104
103 P02 Stephanie Project102

In the above table, the prime attributes are Employee Code and Project ID. We have
partial dependencies in this table because Employee Name can be determined by Employee Code
alone and Project Name can be determined by Project ID alone. Thus, the above relational table
violates the rule of 2NF.

The prime attributes in DBMS are those which are part of one or more candidate keys.

To remove partial dependencies from this table and normalize it into second normal form, we
can decompose the <EmployeeProjectDetail> table into the following three tables:
<EmployeeDetail>

Employee Code Employee Name


101 John
102 Ryan
103 Stephanie

<EmployeeProject>

Employee Code Project ID


101 P03
101 P01
102 P04
103 P02

<ProjectDetail>

Project ID Project Name


P03 Project103
P01 Project101
P04 Project104
P02 Project102

Thus, we’ve converted the <EmployeeProjectDetail> table into 2NF by decomposing it into the
<EmployeeDetail>, <ProjectDetail> and <EmployeeProject> tables. As you can see, the above
tables satisfy the two rules of 2NF: they are in 1NF, and every non-prime attribute is fully
functionally dependent on the primary key.
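
As a minimal SQL sketch of the decomposed 2NF schema (column names, data types, and foreign keys are assumed for illustration; the notes themselves give no DDL):

-- In each table, every non-prime attribute depends on the whole key.
CREATE TABLE EmployeeDetail (
    Employee_Code INT PRIMARY KEY,
    Employee_Name VARCHAR(50)
);

CREATE TABLE ProjectDetail (
    Project_ID   VARCHAR(10) PRIMARY KEY,
    Project_Name VARCHAR(50)
);

-- The linking table keeps only the relationship between the two keys.
CREATE TABLE EmployeeProject (
    Employee_Code INT REFERENCES EmployeeDetail(Employee_Code),
    Project_ID    VARCHAR(10) REFERENCES ProjectDetail(Project_ID),
    PRIMARY KEY (Employee_Code, Project_ID)
);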

The relations in 2NF are clearly less redundant than relations in 1NF. However, the decomposed
relations may still suffer from one or more anomalies due to the transitive dependency. We will
remove the transitive dependencies in the Third Normal Form.

Third Normal Form (3NF)

The normalization of 2NF relations to 3NF involves the elimination of transitive dependencies in
DBMS.

A functional dependency X -> Z is said to be transitive if there is a set of attributes Y such that
the following hold:

 X -> Y
 Y -> X does not hold (Y does not determine X)
 Y -> Z
For a relational table to be in third normal form, it must satisfy the following rules:

1. The table must be in the second normal form.


2. No non-prime attribute is transitively dependent on the primary key.
3. For each functional dependency X -> Z at least one of the following conditions hold:

 X is a super key of the table.


 Z is a prime attribute of the table.

If a transitive dependency exists, we can divide the table to remove the transitively dependent
attributes and place them in a new table along with a copy of the determinant.

Let us take an example of the following <EmployeeDetail> table to understand what a transitive
dependency is and how to normalize the table to the third normal form:

<EmployeeDetail>

Employee Code Employee Name Employee Zipcode Employee City


101 John 110033 Model Town
101 John 110044 Badarpur
102 Ryan 110028 Naraina
103 Stephanie 110064 Hari Nagar

The above table is not in 3NF because it contains the transitive dependency Employee Code ->
Employee City, since:

 Employee Code -> Employee Zipcode


 Employee Zipcode -> Employee City

Also, Employee Zipcode is not a super key and Employee City is not a prime attribute.

To remove transitive dependency from this table and normalize it into the third normal form, we
can decompose the <EmployeeDetail> table into the following two tables:

<EmployeeDetail>

Employee Code Employee Name Employee Zipcode


101 John 110033
101 John 110044
102 Ryan 110028
103 Stephanie 110064
<EmployeeLocation>

Employee Zipcode Employee City


110033 Model Town
110044 Badarpur
110028 Naraina
110064 Hari Nagar

Thus, we’ve converted the <EmployeeDetail> table into 3NF by decomposing it into
<EmployeeDetail> and <EmployeeLocation> tables as they are in 2NF and they don’t have any
transitive dependency.
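
A brief SQL sketch of the 3NF decomposition, under the same naming and typing assumptions as before (the composite key on <EmployeeDetail> mirrors the sample data, where one employee can have more than one zipcode):

-- Employee Zipcode -> Employee City now lives in its own table.
CREATE TABLE EmployeeLocation (
    Employee_Zipcode VARCHAR(6) PRIMARY KEY,
    Employee_City    VARCHAR(50)
);

CREATE TABLE EmployeeDetail (
    Employee_Code    INT,
    Employee_Name    VARCHAR(50),
    Employee_Zipcode VARCHAR(6) REFERENCES EmployeeLocation(Employee_Zipcode),
    PRIMARY KEY (Employee_Code, Employee_Zipcode)
);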

2NF and 3NF impose extra conditions on dependencies involving candidate keys and remove the
redundancy caused by them. However, there may still exist some dependencies that cause
redundancy in the database. Such redundancy is removed by a stricter normal form
known as BCNF.

Boyce-Codd Normal Form (BCNF)


Boyce-Codd Normal Form(BCNF) is an advanced version of 3NF as it contains additional
constraints compared to 3NF.

For a relational table to be in Boyce-Codd normal form, it must satisfy the following rules:

1. The table must be in the third normal form.


2. For every non-trivial functional dependency X -> Y, X must be a super key of the table. That
means X cannot be a non-prime attribute if Y is a prime attribute.

A superkey is a set of one or more attributes that can uniquely identify a row in a database table.

Let us take an example of the following <EmployeeProjectLead> table to understand how to
normalize a table to BCNF:

<EmployeeProjectLead>

Employee Code Project ID Project Leader


101 P03 Grey
101 P01 Christian
102 P04 Hudson
103 P02 Petro

The above table satisfies all the normal forms till 3NF, but it violates the rules of BCNF because
the candidate key of the above table is {Employee Code, Project ID}. For the non-trivial
functional dependency, Project Leader -> Project ID, Project ID is a prime attribute but Project
Leader is a non-prime attribute. This is not allowed in BCNF.
To convert the given table into BCNF, we decompose it into two tables:

<EmployeeProject>

Employee Code Project ID


101 P03
101 P01
102 P04
103 P02

<ProjectLead>

Project Leader Project ID


Grey P03
Christian P01
Hudson P04
Petro P02

Thus, we’ve converted the <EmployeeProjectLead> table into BCNF by decomposing it into
<EmployeeProject> and <ProjectLead> tables.
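
A short SQL sketch of the BCNF decomposition (again with assumed names and types); the key of <ProjectLead> reflects the dependency Project Leader -> Project ID:

CREATE TABLE ProjectLead (
    Project_Leader VARCHAR(50) PRIMARY KEY,  -- determinant of Project_Leader -> Project_ID
    Project_ID     VARCHAR(10)
);

CREATE TABLE EmployeeProject (
    Employee_Code INT,
    Project_ID    VARCHAR(10),
    PRIMARY KEY (Employee_Code, Project_ID)
);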

Conclusion
 Normal forms are a mechanism to remove redundancy and optimize database storage.
 In 1NF, we check for atomicity of the attributes of a relation.
 In 2NF, we check for partial dependencies in a relation.
 In 3NF, we check for transitive dependencies in a relation.
 In BCNF, we check for the superkeys in LHS of all functional dependencies.

Decomposition in DBMS

Decomposition in a Database Management System means breaking a relation into multiple
relations in order to bring it into an appropriate normal form. It helps to
remove redundancy, inconsistencies, and anomalies from a database. The decomposition of a
relation R in a relational schema is the process of replacing the original relation R with two or
more relations in the relational schema. Each of these relations contains a subset of the attributes
of R, and together they include all attributes of R.
Rules for Decomposition
Whenever we decompose a relation, there are certain properties that must be satisfied to ensure
no information is lost while decomposing the relations. These properties are:

1. Lossless Join Decomposition.


2. Dependency Preserving.

Lossless Join Decomposition


A lossless join decomposition ensures two things:

 No information is lost while decomposing the original relation.
 If we join the decomposed sub relations back, we obtain the same relation that was
decomposed.

We can follow certain rules to ensure that a decomposition is a lossless join
decomposition. Let’s say we have a relation R and we decompose it into R1 and R2; then the
rules are:

1. The union of attributes of both the sub relations R1 and R2 must contain all the attributes
of original relation R.

R1 ∪ R2 = R

2. The intersection of attributes of both the sub relations R1 and R2 must not be null, i.e.,
there should be some attributes that are present in both R1 and R2.

R1 ∩ R2 ≠ ∅

3. The intersection of attributes of both the sub relations R1 and R2 must be the superkey of
R1 or R2, or both R1 and R2.

R1 ∩ R2 = Super key of R1 or R2

Let’s see an example of a lossless join decomposition. Suppose we have the following
relation EmployeeProjectDetail as:

<EmployeeProjectDetail>

Employee_Code Employee_Name Employee_Email Project_Name Project_ID


101 John [email protected] Project103 P03
101 John [email protected] Project101 P01
102 Ryan [email protected] Project104 P04
103 Stephanie [email protected] Project102 P02
Now, we decompose this relation into EmployeeProject and ProjectDetail relations as:

<EmployeeProject>

Employee_Code Project_ID Employee_Name Employee_Email


101 P03 John [email protected]
101 P01 John [email protected]
102 P04 Ryan [email protected]
103 P02 Stephanie [email protected]

The primary key of the above relation is {Employee_Code, Project_ID}.

<ProjectDetail>

Project_ID Project_Name
P03 Project103
P01 Project101
P04 Project104
P02 Project102

The primary key of the above relation is {Project_ID}.

Now, let’s see if this is a lossless join decomposition by evaluating the rules discussed
above:

Let’s first check the EmployeeProject ∪ ProjectDetail:

<EmployeeProject ∪ ProjectDetail>

Employee_Code Project_ID Employee_Name Employee_Email Project_Name

101 P03 John [email protected] Project103
101 P01 John [email protected] Project101
102 P04 Ryan [email protected] Project104
103 P02 Stephanie [email protected] Project102

As we can see, all the attributes of EmployeeProject and ProjectDetail are in the
EmployeeProject ∪ ProjectDetail relation, and it is the same as the original relation. So
the first condition holds.

Now let’s check the EmployeeProject ∩ ProjectDetail:

<EmployeeProject ∩ ProjectDetail>
Project_ID
P03
P01
P04
P02

As we can see, this is not null, so the second condition holds as well. Also,
EmployeeProject ∩ ProjectDetail = {Project_ID}, which is the super key (in fact, the
primary key) of the ProjectDetail relation, so the third condition holds as well.

Now, since all three conditions hold for our decomposition, this is a lossless join
decomposition.
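
To see the lossless property in practice, the two sub relations can be joined back on their common attribute; a sketch, assuming the decomposed tables exist as SQL tables with the column names shown above:

-- Joining on Project_ID reconstructs the original EmployeeProjectDetail
-- rows, with no tuples lost and no spurious tuples created.
SELECT ep.Employee_Code,
       ep.Employee_Name,
       ep.Employee_Email,
       pd.Project_Name,
       pd.Project_ID
FROM   EmployeeProject ep
JOIN   ProjectDetail pd
       ON pd.Project_ID = ep.Project_ID;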

Lossless vs Lossy Decomposition

In a lossy decomposition, one or more of these conditions fail, and we are not able to
recover the complete information present in the original relation. For example,
let's say we decompose our original relation EmployeeProjectDetail into
EmployeeProject and ProjectDetail relations as:

<EmployeeProject>

Employee_Code Employee_Name Employee_Email


101 John [email protected]
102 Ryan [email protected]
103 Stephanie [email protected]

The primary key of the above relation is {Employee_Code}.

<ProjectDetail>

Project_ID Project_Name
P03 Project103
P01 Project101
P04 Project104
P02 Project102

The primary key of the above relation is {Project_ID}.

Now, the intersection EmployeeProject ∩ ProjectDetail is null, since the two relations
have no common attribute. Therefore, there is no way for us to map a project to its
employees. Thus, this is a lossy decomposition.
Dependency Preserving

The second property of a good decomposition is dependency preservation, which says
that after decomposing a relation R into R1 and R2, all dependencies of the original
relation R must be present either in R1 or in R2, or they must be derivable using the
combination of the functional dependencies present in R1 and R2.

Let’s understand this from the same example above:

<EmployeeProjectDetail>

Employee_Code Employee_Name Employee_Email Project_Name Project_ID


101 John [email protected] Project103 P03
101 John [email protected] Project101 P01
102 Ryan [email protected] Project104 P04
103 Stephanie [email protected] Project102 P02

In this relation we have the following FDs:

 Employee_Code -> {Employee_Name, Employee_Email}


 Project_ID -> Project_Name

Now, after decomposing the relation into EmployeeProject and ProjectDetail as:

<EmployeeProject>

Employee_Code Project_ID Employee_Name Employee_Email


101 P03 John [email protected]
101 P01 John [email protected]
102 P04 Ryan [email protected]
103 P02 Stephanie [email protected]

In this relation we have the following FDs:

 Employee_Code -> {Employee_Name, Employee_Email}

<ProjectDetail>

Project_ID Project_Name
P03 Project103
P01 Project101
P04 Project104
P02 Project102

In this relation we have the following FDs:


 Project_ID -> Project_Name

As we can see, all the FDs of EmployeeProjectDetail are part of either EmployeeProject
or ProjectDetail, so this decomposition is dependency preserving.
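
One informal way to confirm that a preserved dependency such as Project_ID -> Project_Name really holds in its sub relation is to look for violating groups; a hedged sketch, assuming the ProjectDetail table above exists as a SQL table:

-- If this query returns no rows, every Project_ID maps to exactly one
-- Project_Name, i.e. the functional dependency holds in ProjectDetail.
SELECT Project_ID
FROM   ProjectDetail
GROUP  BY Project_ID
HAVING COUNT(DISTINCT Project_Name) > 1;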

Conclusion

 Decomposition is the process of breaking an original relation into multiple sub relations.
 Decomposition helps to remove anomalies, redundancy, and other problems in a DBMS.
 Decomposition can be lossy or lossless.
 An ideal decomposition should be lossless join decomposition and dependency
preserving.

What is NULL?
In Structured Query Language (SQL), NULL is a special marker used to indicate that a data
value is not present in the database. NULL is a predefined keyword used to identify this marker.
It is very important to understand that a NULL value is totally different from a zero value.

In other words, a NULL attribute value represents nothing: the attribute value is missing or does
not exist. In a database table, a NULL value is a value in a field that appears to be blank, i.e., a
field that has no value.
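
The difference between NULL and zero can be seen directly in SQL; a small sketch, written against the CUSTOMERS table used in the example below:

-- A salary of 0 is a known value; a NULL salary is an absent value.
SELECT * FROM CUSTOMERS WHERE SALARY = 0;        -- rows whose salary is exactly zero
SELECT * FROM CUSTOMERS WHERE SALARY IS NULL;    -- rows with no salary recorded at all
-- Note that SALARY = NULL is never true, because any comparison with
-- NULL evaluates to unknown; IS NULL must be used instead.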

An example to illustrate testing for NULL in SQL:

Suppose there is a table named CUSTOMERS having the records given below.

ID NAME AGE ADDRESS SALARY

1 RAJESH 45 INDORE 48000.00
2 ANURAG 40 UJJAIN 57000.00
3 MAYANK 38 BHOPAL 45000.00
4 GAURAV 23 PUNE 35000.00
5 DEEPAK 29 MUMBAI 28000.00
6 NAMAN 25 NOIDA
7 AYUSH 33 GWALIOR

Now we can use the IS NOT NULL operator and write a query as follows.

SQL> SELECT *

FROM CUSTOMERS

WHERE SALARY IS NOT NULL;

After execution, this query would produce the following result:

ID NAME AGE ADDRESS SALARY

1 RAJESH 45 INDORE 48000.00
2 ANURAG 40 UJJAIN 57000.00
3 MAYANK 38 BHOPAL 45000.00
4 GAURAV 23 PUNE 35000.00
5 DEEPAK 29 MUMBAI 28000.00

Here we can see that in the CUSTOMERS table, the rows with ID 6 and 7, named NAMAN and
AYUSH, have an empty salary column; in other words, it is NULL. That is why, after the query
executes, the rows for NAMAN and AYUSH are not present in the result, because we used the
IS NOT NULL operator.

Now we can use the IS NULL operator and write a query.

SQL> SELECT *

FROM CUSTOMERS

WHERE SALARY IS NULL;

After execution, this query would produce the following result:

ID NAME AGE ADDRESS SALARY

6 NAMAN 25 NOIDA
7 AYUSH 33 GWALIOR

Here we can see that in the CUSTOMERS table, the rows with ID 6 and 7, named NAMAN and
AYUSH, have an empty salary column; in other words, it is NULL. That is why, after the query
executes, the result contains only the rows for NAMAN and AYUSH, because we used the
IS NULL operator.

What is Dangling tuple problem?


In a DBMS, a tuple that does not participate in a natural join is called a dangling
tuple. It may indicate a consistency problem in the database.

Another definition of a dangling tuple is a tuple whose foreign key value does not
appear in the referenced relation. In a DBMS, referential integrity
constraints specify exactly when dangling tuples indicate a problem.

1. Problems with NULL Values and Dangling Tuples

We must carefully consider the problems associated with NULLs when designing a relational
database schema. There is no fully satisfactory relational design theory as yet that
includes NULL values. One problem occurs when some tuples have NULL values for attributes
that will be used to join individual relations in the decomposition. To illustrate this, consider the
database shown in Figure 16.2(a), where two relations EMPLOYEE and DEPARTMENT are
shown. The last two employee tuples— ‘Berger’ and ‘Benitez’—represent newly hired
employees who have not yet been assigned to a department (assume that this does not violate any
integrity constraints). Now suppose that we want to retrieve a list of (Ename, Dname) values for
all the employees. If we apply the NATURAL JOIN operation
on EMPLOYEE and DEPARTMENT (Figure 16.2(b)), the two aforementioned tuples
will not appear in the result. The OUTER JOIN operation, discussed in Chapter 6, can deal with
this problem. Recall that if we take the LEFT OUTER
JOIN of EMPLOYEE with DEPARTMENT, tuples in EMPLOYEE that have NULL for the join
attribute will still appear in the result, joined with an imaginary tuple in DEPARTMENT that
has NULLs for all its attribute values. Figure 16.2(c) shows the result.

In general, whenever a relational database schema is designed in which two or more relations are
interrelated via foreign keys, particular care must be devoted to watching for
potential NULL values in foreign keys. This can cause unexpected loss of information in queries
that involve joins on that foreign key. Moreover, if NULLs occur in other attributes, such
as Salary, their effect on built-in functions such as SUM and AVERAGE must be carefully
evaluated.
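
A short sketch of the aggregate issue, using the CUSTOMERS table from the NULL section above (column names as in that example):

-- SUM and AVG silently ignore NULL salaries, so AVG is taken over the
-- customers that have a salary, not over all customers.
SELECT SUM(SALARY)               AS total_salary,
       AVG(SALARY)               AS avg_of_known_salaries,
       COUNT(*)                  AS all_customers,
       COUNT(SALARY)             AS customers_with_salary,
       AVG(COALESCE(SALARY, 0))  AS avg_treating_null_as_zero
FROM   CUSTOMERS;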

A related problem is that of dangling tuples, which may occur if we carry a decomposition too
far. Suppose that we decompose the EMPLOYEE relation in Figure 16.2(a) further
into EMPLOYEE_1 and EMPLOYEE_2, shown in Figure 16.3(a) and 16.3(b). If we apply
the NATURAL JOIN operation to EMPLOYEE_1 and EMPLOYEE_2, we get the
original EMPLOYEE relation. However, we may use the alternative representation, shown in
Figure 16.3(c), where we do not include a tuple
in EMPLOYEE_3 if the employee has not been assigned a department (instead of including a
tuple with NULL for Dnum as in EMPLOYEE_2). If we use EMPLOYEE_3 instead
of EMPLOYEE_2 and apply a NATURAL JOIN on EMPLOYEE_1 and
EMPLOYEE_3, the tuples for Berger and Benitez will not appear in the result; these are
called dangling tuples in EMPLOYEE_1 because they are represented in only one of the two
relations that represent employees, and hence are lost if we apply an (INNER) JOIN operation.

The query optimizer (also known as the optimizer) is database software that identifies the most
efficient way (for example, the one that takes the least time) for a SQL statement to access data.

Introduction to Query Optimization in DBMS

The process of selecting an efficient execution plan for processing a query is known as query
optimization.

After query parsing, during which the database works out the different ways in which a given
query can be run, the parsed query is delivered to the query optimizer, which generates various
execution plans for the parsed query and selects the plan with the lowest estimated cost. The
catalog manager assists the optimizer in selecting the optimum plan to perform the query by
generating the cost of each plan.

Query optimization is used to access and modify the database in the most efficient way possible.
It is the art of obtaining necessary information in a predictable, reliable, and timely manner.
Query optimization is formally described as the process of transforming a query into an
equivalent form that may be evaluated more efficiently. The goal of query optimization is to find
an execution plan that reduces the time required to process a query. We must complete two
major tasks to attain this optimization target.

The first is to determine the optimal plan to access the database, and the second is to reduce the
time required to execute the query plan.

Purpose of the Query Optimizer in DBMS

The optimizer tries to come up with the best execution plan possible for a SQL statement.

Among all the candidate plans reviewed, the optimizer chooses the plan with the lowest cost.
The optimizer computes costs based on available facts. The cost computation takes into account
query execution factors such as I/O, CPU, and communication for a certain query in a given
context.

Sr. No Class Name Role

01 10 Shreya CR

02 10 Ritik

For example, there is a query that requests information about students who are in leadership
roles, such as being a class representative. If the optimizer statistics show that 50% of students
are in positions of leadership, the optimizer may decide that a full table search is the most
efficient. However, if data show that just a small number of students are in positions of
leadership, reading an index followed by table access by row id may be more efficient than a full
table scan.

Because the database has so many internal statistics and tools at its disposal, the optimizer is
frequently in a better position than the user to decide the best way to execute a statement. As a
result, the optimizer is used by all SQL statements.

Optimizer Components

The optimizer is made up of three parts: the transformer, the estimator, and the plan generator.
The figure below depicts those components.
 Query Transformer

The query transformer determines whether it is advantageous to rewrite the original SQL
statement into a semantically equivalent SQL statement at a lower cost for some statements.

When a plausible alternative exists, the database compares the costs of each alternative and
chooses the one with the lowest cost. The query transformer shown in the query below can be
taken as an example of how query optimization is done by transforming an OR-based input
query into a UNION ALL-based output query.

SELECT *

FROM sales

WHERE promo_id=12

OR prod_id=125;

The given query is transformed using query transformer

SELECT *

FROM sales
WHERE prod_id=125

UNION ALL

SELECT *

FROM sales

WHERE promo_id=12

AND LNNVL(prod_id=125); /* LNNVL provides a concise way to evaluate a condition when one
or both operands of the condition may be null. */

 Estimator

The estimator is the optimizer component that calculates the total cost of a given execution plan.

To determine the cost, the estimator uses three different measures:

 Selectivity: The query picks a percentage of the rows in the row set, with 0 indicating no
rows and 1 indicating all rows. Selectivity is determined by a query predicate, such
as WHERE last_name LIKE 'X%', or by a mix of predicates. As the selectivity value
approaches zero, a predicate gets more selective, and as the value nears one, it becomes
less selective (or more unselective).

For example, the row set can be a base table, a view, or the result of a join. The selectivity is
tied to a query predicate, such as last_name = 'Prakash', or a combination of predicates, such
as last_name = 'Prakash' AND job_id = 'SDE'.

 Cardinality: The cardinality of an execution plan is the number of rows returned by each
action. This input is shared by all cost functions and is essential for determining the best
strategy. Cardinality in DBMS can be calculated using DBMS STATS table statistics or
after taking into account the impact of predicates (filter, join, and so
on), DISTINCT or GROUP BY operations, and so on. In an execution plan, the Rows
column displays the estimated cardinality.

For example, if the optimizer estimates that a full table scan will yield 100 rows, then the
cardinality estimate for this operation is 100. The cardinality estimate appears in the execution
plan's Rows column.

 Cost: This metric represents the number of units of labor or resources used. The query
optimizer uses disc I/O, CPU utilization, and memory usage as units of effort. For
example, if the plan for query A has a lower cost than the plan for query B, then the
following outcomes are possible: A executes faster than B, A executes slower
than B or A executes in the same amount of time as B.
 Plan Generator

The plan generator investigates multiple plans for a query block by experimenting with various
access paths, join methods, and join orders.

Because of the different combinations that the database can utilize to generate the same outcome,
many plans are available. The plan with the lowest cost is chosen by the optimizer.

Automatic Tuning Optimizer

Depending on how it is invoked, the optimizer performs different actions.

The database offers the following optimization types:

 Normal optimization: The optimizer parses the SQL and produces an execution plan. For
most SQL statements, normal mode gives a reasonable plan. When operating in normal
mode, the optimizer has stringent time limits, usually a fraction of a second, during
which it must identify an optimal plan.
 SQL Tuning Advisor optimization: The optimizer is known as the Automatic Tuning
Optimizer when SQL Tuning Advisor invokes it with one or more SQL statements
as input. In this situation, the optimizer conducts further analysis to improve the plan
generated in normal mode. The optimizer produces a set of actions, along with their
rationale and expected benefit, to produce a considerably better plan.

Methods of Query Optimization in DBMS

There are two methods of query optimization. They are as follows.

Cost-Based Query Optimization in DBMS

Query optimization is the process of selecting the most efficient way to execute a SQL statement.
Because SQL is a nonprocedural language, the optimizer can merge, restructure, and process
data in any sequence.

The Optimizer assigns a numerical cost to each step of a feasible plan for a given
query and environment, and then combines these values to get a cost estimate for the
plan or possible strategy. After evaluating the costs of all feasible plans, the Optimizer aims to
find the plan with the lowest cost estimate. As a result, the Optimizer is sometimes known as the
Cost-Based Optimizer.

 Execution Plans:

An execution plan specifies the best way to execute a SQL statement.

The plan describes the steps taken by Oracle Database to execute a SQL statement. Each step
physically retrieves or prepares rows of data from the database for the statement's user.

An execution plan shows the total cost of the plan, which is stated on line 0, as well as the cost of
each individual operation. A cost is an internal unit that appears solely in the execution plan to
allow for plan comparisons. As a result, the cost value cannot be fine-tuned or adjusted.
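
In Oracle, an execution plan and its Cost and Rows columns can be inspected with EXPLAIN PLAN and DBMS_XPLAN; a brief sketch (the sales table and filter are only illustrative):

EXPLAIN PLAN FOR
SELECT * FROM sales WHERE promo_id = 12;

-- Display the plan that was just explained, including its cost estimates.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);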

 Query Blocks

The optimizer receives a parsed representation of a SQL statement as input. Each SELECT
block in the original SQL statement is internally represented by a query block. A query block
can be a top-level statement, a subquery, or an unmerged view. Let’s take an example where
the SQL statement that follows is made up of two query blocks. The inner query block is the
subquery in parentheses. The outer query block, which is the remainder of the SQL statement,
obtains the names of employees in the departments whose IDs were supplied by the subquery.
The query form specifies how query blocks are connected.

SELECT first_name, last_name

FROM hr.employees
WHERE department_id

IN (SELECT department_id

FROM hr.departments

WHERE location_id = 1800);

 Query Sub Plans

The optimizer creates a query sub-plan for each query block.

From the bottom up, the database optimizes query blocks separately. As a result, the database
optimizes the innermost query block first, generating a sub-plan for it, before generating the
outer query block, which represents the full query.

The number of possible query block plans is proportional to the number of objects in the FROM
clause, and it climbs exponentially as the number of objects rises. The possibilities for a join of
five tables, for example, are far higher than those for a join of two tables.

 Analogy for the Optimizer

An online trip counselor is one analogy for the optimizer.

A biker wishes to find the most efficient bicycle path from point A to point B. A query is
analogous to the phrase "I need the quickest route from point A to point B" or "I need the
quickest route from point A to point B via point C". To choose the most efficient route, the trip
advisor employs an internal algorithm that takes into account factors such as speed and
difficulty. The biker can sway the trip advisor's judgment by saying things like "I want to arrive
as quickly as possible" or "I want the simplest route possible.”

In this example, an execution plan is a possible path generated by the travel advisor. Internally,
the advisor may divide the overall route into multiple subroutes (sub plans) and compute the
efficiency of each subroute separately. For example, the trip advisor may estimate one subroute
to take 15 minutes and be of medium difficulty, another subroute to take 22 minutes and be of
low difficulty, and so on.

Based on the user-specified goals and accessible facts about roads and traffic conditions, the
advisor selects the most efficient (lowest cost) overall route. The better the guidance, the more
accurate the statistics. For example, if the advisor is not kept up to date on traffic delays, road
closures, and poor road conditions, the proposed route may prove inefficient (high cost).

Adaptive Query Optimization in DBMS

Adaptive query optimization allows the optimizer to make run-time changes to execution plans
and uncover new information that can lead to improved statistics.
When existing facts are insufficient to produce an ideal strategy, adaptive optimization comes in
handy. The image below depicts the feature set for adaptive query optimization.

 Adaptive Query Plans The optimizer can defer the final plan decision for a statement
with an adaptive plan until execution time.

 Purpose of Adaptive Query Plans: The optimizer’s ability to alter a plan based on
information gained during execution can significantly increase query performance.

Because the optimizer occasionally chooses an inferior default plan due to a cardinality
misestimate, adaptive plans are important. The capacity to modify the plan during execution
based on actual execution statistics leads to a more optimal end plan. The optimizer uses the final
plan for further executions after selecting it, ensuring that the poor plan is not reused.

 How Adaptive Query Plans Work

An adaptive plan is made up of several predefined sub plans and an optimizer statistics collector.

A sub-plan is a section of a plan that the optimizer can use as an alternative during execution. A
nested loops join, for example, might be converted to a hash join during execution. An optimizer
statistics collector is a row source that is added at crucial points in a plan to collect run-time
statistics. These statistics assist the optimizer in making a final choice amongst numerous sub
plans.
During statement execution, the statistics collector collects execution information and buffers
some rows received by the sub-plan. The optimizer selects a sub-plan based on the information
collected by the collector. At this point, the collector stops collecting statistics and buffering
rows, and permits rows to pass through instead. On subsequent executions of the child cursor, the
optimizer continues to use the same plan unless the plan ages out of the cache, or a different
optimizer feature (for example, adaptive cursor sharing or statistics feedback) invalidates the
plan.

 Adaptive Query Plans: Parallel Distribution Methods

Parallel execution typically necessitates data redistribution in order to conduct operations such as
parallel sorts, aggregations, and joins.

Oracle Database supports a wide range of data dissemination mechanisms. The approach is
selected by the database based on the number of rows to be distributed and the number of
concurrent server processes involved in the operation. Consider the following potential
scenarios:

 Few rows are distributed by many concurrent server processes. The database has the
option of using the broadcast distribution method. Each row in the result set is received
by each simultaneous server process in this situation.

 Few parallel server processes disseminate a large number of rows. If a data skew is found
during the data redistribution, the statement's performance may suffer. To ensure that
each parallel server process receives an equal number of rows, the database is more likely
to use a hash distribution.

These methods do not always apply; the choice also depends on the table size, column size, type
of selection, projection, join, sort, constraints, indexes, statistics, etc. The techniques above
describe the general approach to optimizing queries.

Conclusion
 Query Optimization in DBMS is the process of selecting the most efficient way to
execute a SQL statement. Because SQL is a nonprocedural language, the optimizer can
merge, restructure, and process data in any sequence.

 Query Optimization in DBMS has a significant application in database designing.

 There are two types of query optimization in DBMS: Cost-Based Optimization and
Adaptive Query Optimization.

For any given query, there may be a number of different ways to execute it. The process
of choosing a suitable one for processing a query is known as query optimization.

Forms

The two forms of query optimization are as follows −

 Heuristic optimization − Here the query execution is refined based on heuristic rules for
reordering the individual operations.

 Cost based optimization − the overall cost of executing the query is systematically
reduced by estimating the costs of executing several different execution plans.

Example

Select customer.name from customer, account where customer.name=account.name and
account.balance>2000;

There are two evaluation plans −

 Π customer.name (σ customer.name = account.name ∧ account.balance > 2000 (customer × account))

 Π customer.name (σ customer.name = account.name (customer × σ account.balance > 2000 (account)))

The cost evaluator evaluates the cost of the different evaluation plans and chooses the plan
with the lowest cost. Disk access time, CPU time, the number of operations, the number of
tuples, and the size of tuples are considered in the cost calculation.

The heuristic approach is also called rule-based optimization. Three common ways of
transforming relational-algebra queries are:

 Perform the SELECTION operations first in the query; this should be the first action for
any SQL table (see the sketch after this list). By doing so, we can decrease the number of
records involved in the query, rather than carrying all the rows of the tables through the query.

 Perform all the projections as early as possible in the query. Somewhat like a selection,
this method helps in decreasing the number of columns in the query.
 Perform the most restrictive joins and selection operations. What this means is that select
only those set of tables and/or views which will result in a relatively lesser number of
records and are extremely necessary in the query. Obviously any query will execute
better when tables with few records are joined.
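
A sketch of the selection-first heuristic in SQL, using the customer/account example above (column names are assumed from that example):

-- Filter account rows on balance before the join, so that fewer tuples
-- take part in the join, mirroring the second evaluation plan above.
SELECT c.name
FROM   customer c
JOIN   (SELECT name
        FROM   account
        WHERE  balance > 2000) a
       ON a.name = c.name;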
