Chapter 2-Query Processing_110554

Chapter Two discusses Query Processing and Optimization, outlining the steps involved in transforming high-level SQL queries into low-level queries, including parsing, optimization, and evaluation. It explains the importance of relational algebra operations such as selection, projection, union, and joins in query processing. The chapter also covers approaches to query optimization, emphasizing heuristic methods to improve query performance.

Uploaded by

chala

Chapter Two

Query Processing and Optimization
Chala B. (MTech) CSE
Dept: Computer
Harambe University
Query Processing
• Query processing is the procedure of transforming a high-level query
(SQL) into a low-level query.
• It is the activity performed in extracting data from the database. The
basic steps of query processing are:
1. Parsing and translation
2. Optimization
3. Evaluation
Con’t…
1. Parser and translator:
•Checks the syntax of the query.
•Checks that the schema elements (relations and attributes) exist.
•Converts the query into relational algebra.
•E.g. SELECT empName
FROM employee
HAVING name = 'abc'
HAVING is invalid here (WHERE is the correct clause): syntax error.
•Assume the attribute gmail is not available in the employee table:
SELECT emp.gmail
FROM employee
This gives a schema element error.
Con’t…
2. Optimizer:
• Finds the best way (the best plan) to evaluate the relational algebra expression.
• Increases the performance of the query by selecting the best plan.
• Example: SQL:
SELECT Book, title, price
FROM Book
WHERE price > 30
Book:
title  price  A.Id  Book
DB     10     12    Database
PR     40     13    Programming
Co     50     80    Computer Organization
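As a rough illustration, the example query can be evaluated by hand over the Book table from the slide. The list-of-dictionaries representation is an assumption made for this sketch; a DBMS stores and scans rows very differently.

```python
# Book relation from the slide, one dictionary per row.
book = [
    {"title": "DB", "price": 10, "A.Id": 12, "Book": "Database"},
    {"title": "PR", "price": 40, "A.Id": 13, "Book": "Programming"},
    {"title": "Co", "price": 50, "A.Id": 80, "Book": "Computer Organization"},
]

# WHERE price > 30 (selection), then SELECT Book, title, price (projection).
result = [
    {"Book": row["Book"], "title": row["title"], "price": row["price"]}
    for row in book
    if row["price"] > 30
]

for row in result:
    print(row)
```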
Con’t…
3. Query evaluation plan:
• Query tree: represents the relational algebra expression in the
form of a tree data structure, with:
•Inputs (relations) as leaf nodes
•Operators as internal nodes
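A query tree of this shape can be sketched as a small Python data structure. The Node class and the to_algebra helper are hypothetical names chosen for this sketch; they only show leaves as relations and internal nodes as operators.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                                        # operator symbol, or relation name for a leaf
    arg: str = ""                                  # predicate or attribute list (unary operators)
    children: list = field(default_factory=list)   # empty for a leaf

def to_algebra(node):
    """Print the tree back as a relational algebra expression."""
    if not node.children:                          # leaf node: an input relation
        return node.op
    parts = [to_algebra(child) for child in node.children]
    if len(parts) == 1:                            # unary operator such as σ or π
        return f"{node.op} {node.arg} ({parts[0]})"
    return f"({parts[0]} {node.op} {parts[1]})"    # binary operator such as ×

tree = Node("π", "title, price", [Node("σ", "price > 30", [Node("Book")])])
expr = to_algebra(tree)
print(expr)
```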
Con’t…

• Different evaluation plans for a given query can have different costs. It
is the responsibility of the query optimizer to generate the least costly plan.
Con’t…
4. Query evaluation engine:
• Executes the chosen plan and returns the results.
• It directly accesses the stored data (the database).
Translating SQL queries into Relational Algebra
Relational Algebra
• It is a procedural query language that takes instances of relations as
input and yields an instance of a relation as output.
• It uses operators to perform queries.
• The fundamental operations of relational algebra include:
- select        - union
- project       - set difference
Select Operation (σ)
• It selects the tuples of a relation that satisfy the given predicate.
• Notation: σ p (r), where:
σ : the selection operator
p : a propositional logic formula (the selection condition)
r : the relation
• Example: σ subject = 'database' (books)
• Output: selects the tuples from books where subject is 'database'.
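A minimal sketch of selection in Python, assuming a relation is a list of dictionaries. The select function and the rows in books are invented for illustration only.

```python
def select(relation, predicate):
    """σ: keep the tuples of `relation` for which `predicate` is true."""
    return [t for t in relation if predicate(t)]

# Hypothetical contents of the books relation.
books = [
    {"title": "Intro to DB", "subject": "database", "author": "A"},
    {"title": "Networks 101", "subject": "networking", "author": "B"},
    {"title": "SQL Basics", "subject": "database", "author": "C"},
]

# σ subject = 'database' (books)
db_books = select(books, lambda t: t["subject"] == "database")
print([t["title"] for t in db_books])
```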
Project Operation (π)
• It projects (keeps) only the listed columns of a relation.
• Notation: π A1, A2, …, An (r)
• where A1, A2, …, An are attribute names of relation r. Duplicate rows
are automatically eliminated, because a relation is a set.
• Example: π subject, author (books)

• Output: selects and projects the columns named subject and author from the relation
books.
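Projection with duplicate elimination can be sketched as follows. The project function and the sample rows are assumptions made for this sketch.

```python
def project(relation, attributes):
    """π: keep only `attributes` and drop duplicate rows (set semantics)."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:            # a relation is a set: no duplicates
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

# Hypothetical books relation; the first two rows share subject and author.
books = [
    {"title": "Intro to DB", "subject": "database", "author": "A"},
    {"title": "Advanced DB", "subject": "database", "author": "A"},
    {"title": "Networks 101", "subject": "networking", "author": "B"},
]

# π subject, author (books)
subject_author = project(books, ["subject", "author"])
print(subject_author)
```

Three input rows give two output rows, because the two database books by author A collapse into one tuple.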
Union Operator (∪)
• It performs a binary union between two given relations and is defined
as r ∪ s = { t | t ∈ r or t ∈ s }
• Notation: r ∪ s, where r and s are relations.
• For a union operation to be valid, the following conditions must
hold:
• r and s must have the same number of attributes.
• Attribute domains must be compatible.
• Duplicate tuples are automatically eliminated.
• Example: π author (books) ∪ π author (articles)
• Output: projects the names of the authors who have written either
a book or an article or both.
Set difference (−)
• The result of the set difference operation is the set of tuples that are present in
one relation but not in the second relation.
• Notation: r − s finds all the tuples that are present in r but not
in s.
• Example: π author (books) − π author (articles)
• Output: provides the names of authors who have written books but
not articles.
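Because relations are sets, the union and set difference examples map directly onto Python set operators. The author names below are invented for this sketch.

```python
# Hypothetical results of the two projections.
book_authors = {"Abebe", "Sara", "Musa"}      # π author (books)
article_authors = {"Sara", "Hana"}            # π author (articles)

# Union: authors of a book, an article, or both (duplicates vanish).
either = book_authors | article_authors

# Set difference: authors of books who have written no article.
only_books = book_authors - article_authors

print(sorted(either))
print(sorted(only_books))
```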
Cartesian Product (×)
• Combines the information of two different relations into one.
• Notation: r × s, where r and s are relations.
r × s = { q t | q ∈ r and t ∈ s }
• Example: σ author = 'tutorialspoint' (books × articles)

• Output: yields a relation that shows all the books and articles written by
tutorialspoint.
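A sketch of the Cartesian product followed by a selection. Both relations have an author attribute, so this sketch assumes attributes are prefixed with the relation name and that the condition applies to both author columns. The function name and the rows are hypothetical.

```python
from itertools import product

def cartesian(r, r_name, s, s_name):
    """r × s: pair every tuple of r with every tuple of s.

    Attributes are prefixed with the relation name to avoid name clashes.
    """
    return [
        {**{f"{r_name}.{k}": v for k, v in q.items()},
         **{f"{s_name}.{k}": v for k, v in t.items()}}
        for q, t in product(r, s)
    ]

books = [{"title": "DBMS Guide", "author": "tutorialspoint"},
         {"title": "Algorithms", "author": "someone"}]
articles = [{"topic": "Indexing", "author": "tutorialspoint"},
            {"topic": "Caching", "author": "other"}]

pairs = cartesian(books, "books", articles, "articles")   # 2 x 2 = 4 tuples
written = [p for p in pairs
           if p["books.author"] == "tutorialspoint"
           and p["articles.author"] == "tutorialspoint"]
print(len(pairs), written)
```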
Rename (ρ)
• The results of relational algebra expressions are also relations, but without any
name.
• The rename operation (ρ) allows us to name the output relation.
• Notation: ρ x (E), where the result of expression E is saved with the
name x.
Natural Join (⋈)
• A natural join can only be performed if there is a common attribute
(column) between the relations.
• The name and type of the common attribute must be the same.
• Example: consider the following two tables, C and D, and their natural join.
C               D               C ⋈ D
num  square     num  cube       num  square  cube
2    4          2    8          2    4       8
3    9          3    27         3    9       27
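The natural join of C and D can be sketched as a nested loop that matches tuples on their common attributes. The natural_join function is a hypothetical helper; it assumes every tuple of a relation has the same attributes.

```python
def natural_join(r, s):
    """r ⋈ s: combine tuples that agree on all attributes common to r and s."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])     # attribute names shared by both relations
    return [{**q, **t} for q in r for t in s
            if all(q[a] == t[a] for a in common)]

C = [{"num": 2, "square": 4}, {"num": 3, "square": 9}]
D = [{"num": 2, "cube": 8}, {"num": 3, "cube": 27}]

joined = natural_join(C, D)
print(joined)
```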
Outer Join
• Along with the tuples that satisfy the matching criteria, we also include
some or all of the tuples that do not match the criteria.

Left outer join (A ⟕ B)

• This operation keeps all the tuples of the left relation.
• If no matching tuple is found in the right relation, the
attributes of the right relation in the join result are filled with null
values.
Con’t…
• Example:
A               B               A ⟕ B
Num  Square     Num  Cube       Num  Square  Cube
2    4          2    8          2    4       8
3    9          3    27         3    9       27
4    16         5    125        4    16      --

Right outer join (A ⟖ B)

• This operation keeps all the tuples of the right relation.
• If no matching tuple is found in the left relation, the
attributes of the left relation in the join result are filled with null
values.
Con’t…
• A ⟖ B of the above example is:
Num  Cube  Square
2    8     4
3    27    9
5    125   --

Full outer Join (A ⟗ B)

• In a full outer join, all tuples from both relations are included in the
result.
• A ⟗ B of the above example is:
Num  Cube  Square
2    8     4
3    27    9
4    --    16
5    125   --
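The three outer joins can be sketched with one helper, using None for the null values. The function name and its parameters are assumptions for this sketch. The Square and Cube columns are computed from Num rather than typed in.

```python
def left_outer_join(r, s, key, s_attrs):
    """r ⟕ s on `key`: keep every tuple of r; pad with None when s has no match."""
    result = []
    for q in r:
        matches = [t for t in s if t[key] == q[key]]
        if matches:
            result.extend({**q, **t} for t in matches)
        else:
            result.append({**q, **{a: None for a in s_attrs}})
    return result

A = [{"Num": n, "Square": n ** 2} for n in (2, 3, 4)]
B = [{"Num": n, "Cube": n ** 3} for n in (2, 3, 5)]

left = left_outer_join(A, B, "Num", ["Cube"])
right = left_outer_join(B, A, "Num", ["Square"])   # A ⟖ B is the same as B ⟕ A
# Full outer join: the left result plus the tuples that exist only on the right.
full = left + [t for t in right if t["Square"] is None]

for row in full:
    print(row)
```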
Con’t…
• Example: a simple college admissions database:
College (cname, state, enrollment)
Student (SID, sname, GPA, sizeHS)
Apply (SID, cname, major, decision)
• Find the students with GPA > 3.7:
σ GPA > 3.7 (Student)

• Find the students with GPA > 3.7 and sizeHS < 1000:
σ GPA > 3.7 ∧ sizeHS < 1000 (Student)
Con’t…
• List the major and decision of every application:
π major, decision (Apply)

• Names and GPAs of students with sizeHS > 1000 who applied to CS and
were rejected:
π sname, GPA (σ Student.SID = Apply.SID ∧ sizeHS > 1000 ∧
major = 'cs' ∧ decision = 'R' (Student × Apply))
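The college admissions queries can be traced on a tiny data set. All rows below are invented for illustration; only the attribute names come from the schema.

```python
student = [  # (SID, sname, GPA, sizeHS), hypothetical rows
    {"SID": 1, "sname": "Amy", "GPA": 3.9, "sizeHS": 1500},
    {"SID": 2, "sname": "Bob", "GPA": 3.6, "sizeHS": 800},
    {"SID": 3, "sname": "Cal", "GPA": 3.8, "sizeHS": 2000},
]
apply_ = [   # (SID, cname, major, decision), hypothetical rows
    {"SID": 1, "cname": "X", "major": "cs", "decision": "R"},
    {"SID": 2, "cname": "X", "major": "cs", "decision": "R"},
    {"SID": 3, "cname": "Y", "major": "ee", "decision": "R"},
    {"SID": 3, "cname": "Z", "major": "cs", "decision": "A"},
]

# σ GPA > 3.7 (Student)
high_gpa = [s["sname"] for s in student if s["GPA"] > 3.7]

# π sname, GPA (σ ... (Student × Apply)); the set removes duplicate rows.
rejected_cs = {
    (s["sname"], s["GPA"])
    for s in student for a in apply_            # Student × Apply
    if s["SID"] == a["SID"] and s["sizeHS"] > 1000
    and a["major"] == "cs" and a["decision"] == "R"
}

print(high_gpa)
print(rejected_cs)
```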
Example 2:
• Schema for SRS (student registration system):
Student (Id, Name, Address, Status)
Course (DeptId, CrsCode, CrsName)
Teaching (ProfId, CrsCode, Semester)
Department (DeptId, Name)
• Translate the following SQL to relational algebra.
1. SELECT CrsName
FROM Course
WHERE DeptId = 'CS'
Solution:
π CrsName (σ DeptId = 'CS' (Course))
Con’t…
2. SELECT CrsName
FROM Course C, Teaching T
WHERE C.CrsCode = T.CrsCode AND T.Semester = '52000'
Solution:
π CrsName (σ C.CrsCode = T.CrsCode ∧ T.Semester = '52000' (Course × Teaching))
OR
π CrsName (Course ⋈ σ T.Semester = '52000' (Teaching))
OR
π CrsName (σ T.Semester = '52000' (Course ⋈ Teaching))
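The three solutions are equivalent expressions, which can be checked on sample data. The rows below are hypothetical; only the attribute names and the semester value '52000' come from the slide.

```python
course = [    # (DeptId, CrsCode, CrsName), hypothetical rows
    {"DeptId": "CS", "CrsCode": "CS305", "CrsName": "Databases"},
    {"DeptId": "CS", "CrsCode": "CS315", "CrsName": "Networks"},
]
teaching = [  # (ProfId, CrsCode, Semester), hypothetical rows
    {"ProfId": 1, "CrsCode": "CS305", "Semester": "52000"},
    {"ProfId": 2, "CrsCode": "CS315", "Semester": "F1999"},
]
SEM = "52000"

# Form 1: one selection over the Cartesian product.
form1 = {c["CrsName"] for c in course for t in teaching
         if c["CrsCode"] == t["CrsCode"] and t["Semester"] == SEM}

# Form 2: select on Teaching first, then join with Course.
filtered = [t for t in teaching if t["Semester"] == SEM]
form2 = {c["CrsName"] for c in course for t in filtered
         if c["CrsCode"] == t["CrsCode"]}

# Form 3: join first, then select on Semester.
joined = [(c, t) for c in course for t in teaching
          if c["CrsCode"] == t["CrsCode"]]
form3 = {c["CrsName"] for c, t in joined if t["Semester"] == SEM}

print(form1, form2, form3)
```

All three forms return the same relation; they differ only in how many intermediate tuples are built, which is what the optimizer cares about.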
Approaches to Query Optimization
Heuristic approach: This method
• is also known as rule-based optimization.
• is based on the equivalence rules on relational expressions.
• creates a relational tree for the given query based on the equivalence
rules.
• The most important rules followed in this method are listed
below.
Con’t...
a) Perform all selection operations as early as possible. This
reduces the cardinality (number of tuples) of the relation.
b) Perform projections as early as possible. This reduces the degree
(number of attributes) of the relation.
c) Select and join operations with the most restrictive conditions, which result
in the smallest absolute size, should be executed before other similar
operations.
• This is achieved by reordering the join nodes of the query tree.
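The effect of rule (a) can be illustrated by counting intermediate tuples for two equivalent plans. The relations, attribute names, and sizes below are invented for this sketch, and tuple counts are only a stand-in for a real cost model.

```python
# Hypothetical relations: 100 employees (10 of them in CS) and 100 assignments.
employee = [{"id": i, "dept": "CS" if i % 10 == 0 else "EE"} for i in range(100)]
works_on = [{"emp": i, "proj": i % 5} for i in range(100)]

# Plan 1: Cartesian product first, all selections last.
product_all = [(e, w) for e in employee for w in works_on]
plan1 = [(e, w) for e, w in product_all
         if e["dept"] == "CS" and e["id"] == w["emp"]]

# Plan 2: selection on employee first (rule a), then combine.
cs_only = [e for e in employee if e["dept"] == "CS"]
product_small = [(e, w) for e in cs_only for w in works_on]
plan2 = [(e, w) for e, w in product_small if e["id"] == w["emp"]]

# Same answer, very different intermediate result sizes.
print(len(product_all), len(product_small), plan1 == plan2)
```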
Example:
• Consider the following schema and query, where the Employee
and Project relations are related by the Works_ON relation.
Employee (EEmpID, FName, LName, Salary, Dept, sex, DoB)
Project (PProjID, PName, Plocation, Pfound, PManagerID)
Works_ON (WEmpID, WProjID)
• In Works_ON, WEmpID (employee identification) and WProjID (project
identification) are foreign keys.
Query:
• The manager of the company working on road construction would like
to view the names of employees born before January 1, 1965 who are
working on the project named Ring Road.
• Solution: the relational algebra representation of the query is:

π FName, LName (σ DoB < Jan 1 1965 ∧ WEmpID = EEmpID ∧
PProjID = WProjID ∧ PName = 'Ring Road'
(Employee × Works_ON × Project))
Con’t…
• The SQL equivalent of the above query is:
SELECT FName, LName
FROM Employee, Works_ON, Project
WHERE DoB < DATE '1965-01-01' AND EEmpID = WEmpID AND
WProjID = PProjID AND PName = 'Ring Road'
The initial query tree
Con’t…
• By applying the first step (cascading the selection) we
come up with the following structure.

σ (DoB<Jan 1 1965)
(σ (WEmpID=EEmpID) (σ (PProjID=WProjID)
(σ (PName='Ring Road') (EMPLOYEE × WORKS_ON ×
PROJECT))))

• By applying the second step, it can be seen that some
conditions have attributes that belong to a single
relation (DoB belongs to EMPLOYEE and PName belongs
to PROJECT); thus these selection operations can be
commuted with the Cartesian product.
Con’t…
• Then, since the condition WEmpID=EEmpID involves only the
EMPLOYEE and WORKS_ON relations, the selection with
this condition can be moved down to the product of those two relations.

σ (PProjID=WProjID) ((σ (PName='Ring Road') PROJECT) ×
σ (WEmpID=EEmpID) (WORKS_ON × (σ (DoB<Jan 1 1965) EMPLOYEE)))
Con’t…
• The query tree after this modification will be (children are indented
under their parent):

π <FName, LName>
  σ (PProjID=WProjID)
    ×
      σ (PName='Ring Road')
        PROJECT
      σ (WEmpID=EEmpID)
        ×
          WORKS_ON
          σ (DoB<Jan 1 1965)
            EMPLOYEE
Con’t…
• Using the third step, perform the most restrictive operations first.
• From the given query we can see that the selection on PROJECT is more
restrictive than the selection on EMPLOYEE.
• Thus, it is better to perform the selection on PROJECT before the one on
EMPLOYEE. Rearrange the nodes to achieve this (children are indented
under their parent):

π <FName, LName>
  σ (WEmpID=EEmpID)
    ×
      σ (PProjID=WProjID)
        ×
          σ (PName='Ring Road')
            PROJECT
          WORKS_ON
      σ (DoB<Jan 1 1965)
        EMPLOYEE
• Using the fourth step, combine each Cartesian product with the
selection that follows it into a join (children are indented under their
parent):

π <FName, LName>
  ⋈ (WEmpID=EEmpID)
    ⋈ (PProjID=WProjID)
      σ (PName='Ring Road')
        PROJECT
      WORKS_ON
    σ (DoB<Jan 1 1965)
      EMPLOYEE
• Using the fifth step, perform the projections as early as possible
(children are indented under their parent):

π <FName, LName>
  ⋈ (WEmpID=EEmpID)
    π <WEmpID>
      ⋈ (PProjID=WProjID)
        π <PProjID>
          σ (PName='Ring Road')
            PROJECT
        WORKS_ON
    π <FName, LName, EEmpID>
      σ (DoB<Jan 1 1965)
        EMPLOYEE
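As a rough check, the initial plan and the final plan can be run on a few invented rows to confirm that they return the same answer. The data is hypothetical, and the joins are done with Python set membership as a shortcut, not as a model of how a DBMS executes them.

```python
from datetime import date

employee = [  # hypothetical rows
    {"EEmpID": 1, "FName": "Abel", "LName": "K", "DoB": date(1960, 5, 1)},
    {"EEmpID": 2, "FName": "Beti", "LName": "M", "DoB": date(1970, 3, 9)},
    {"EEmpID": 3, "FName": "Chal", "LName": "T", "DoB": date(1962, 8, 20)},
]
project = [
    {"PProjID": 10, "PName": "Ring Road"},
    {"PProjID": 20, "PName": "Bridge"},
]
works_on = [
    {"WEmpID": 1, "WProjID": 10},
    {"WEmpID": 2, "WProjID": 10},
    {"WEmpID": 3, "WProjID": 20},
]
CUTOFF = date(1965, 1, 1)

# Initial tree: Employee × Works_ON × Project, then one big selection.
triples = [(e, w, p) for e in employee for w in works_on for p in project]
initial = {(e["FName"], e["LName"]) for e, w, p in triples
           if e["DoB"] < CUTOFF and w["WEmpID"] == e["EEmpID"]
           and p["PProjID"] == w["WProjID"] and p["PName"] == "Ring Road"}

# Final tree: select and project early, then join the small inputs.
proj_ids = {p["PProjID"] for p in project if p["PName"] == "Ring Road"}
emp_ids = {w["WEmpID"] for w in works_on if w["WProjID"] in proj_ids}
old_emps = [(e["FName"], e["LName"], e["EEmpID"])
            for e in employee if e["DoB"] < CUTOFF]
optimized = {(first, last) for first, last, emp_id in old_emps
             if emp_id in emp_ids}

print(len(triples), initial, optimized)
```

The initial plan builds every combination of the three relations before filtering, while the final plan only ever handles the already reduced inputs.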
Reading Assignment
• Cost Estimation Approach to Query Optimization
END!
