L15-16 (Query Decomposition) PDF

The document discusses various steps in query decomposition including normalization, analysis, elimination of redundancy, and rewriting the query in relational algebra. It provides examples of transforming SQL queries into conjunctive normal form and constructing query graphs and operator trees to represent the relational algebra operations. The goal of query decomposition is to transform the input query, which refers to global relations, into an equivalent relational algebra expression for further query processing.

Distributed Data Systems (CS G544)
Lecture 15-16
Wednesday, 23rd September 2020

Dr. Manik Gupta
Assistant Professor
Department of CSIS
BITS Pilani, Hyderabad Campus

Lecture Recap

• Overview of Query Processing

BITS Pilani, Hyderabad Campus


Introduction

• Query decomposition is the first phase of query processing; it transforms a relational calculus query into a relational algebra query.
• Both the input and the output queries refer to global relations, without knowledge of how the data is distributed.
• Hence, query decomposition is the same for centralized and distributed systems.
• We assume that the input query is syntactically correct.



Query decomposition
• The successive steps of Query decomposition are:
–Normalization
–Analysis
–Elimination of redundancy
–Rewriting



Normalization
• The input query may be arbitrarily complex depending on
the facilities provided by the language.
• It is the goal of normalization to transform the query to a
normalized form to facilitate further processing.
• In SQL the most important transformation is that of query
qualification (the WHERE clause), which may be an
arbitrarily complex, quantifier-free predicate, preceded
by all necessary quantifiers.



Normal forms
– The conjunctive normal form is the conjunction (∧ predicate) of disjunctions (∨ predicate):

$(p_{11} \lor p_{12} \lor \dots \lor p_{1n}) \land \dots \land (p_{m1} \lor p_{m2} \lor \dots \lor p_{mn})$

– The disjunctive normal form is the disjunction (∨ predicate) of conjunctions (∧ predicate):

$(p_{11} \land p_{12} \land \dots \land p_{1n}) \lor \dots \lor (p_{m1} \land p_{m2} \land \dots \land p_{mn})$



Normalization
– The transformation of the quantifier-free predicate is
straightforward using the well-known equivalence rules for
logical operations.



Example
EMP(ENO, ENAME, TITLE)
PROJ(PNO, PNAME, BUDGET)
ASG(ENO, PNO, DUR,RESP)
PAY(TITLE, SAL)

• Find the names of employees who have been working on project P1 for 12 or 24 months.

SELECT ENAME
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = 'P1'
AND (DUR = 12 OR DUR = 24)  -- parentheses required: AND binds tighter than OR



Example
Qualification in conjunctive NF is
EMP.ENO = ASG.ENO ∧ ASG.PNO = “P1” ∧ (DUR = 12 ∨ DUR = 24)

Qualification in disjunctive NF is
(EMP.ENO = ASG.ENO ∧ ASG.PNO = “P1” ∧ DUR = 12) ∨
(EMP.ENO = ASG.ENO ∧ ASG.PNO = “P1” ∧ DUR = 24)

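Which one leads to redundant work?

The latter form leads to redundant work if common subexpressions are not eliminated: the predicates EMP.ENO = ASG.ENO and ASG.PNO = “P1” appear in both disjuncts and would be evaluated twice.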



Analysis
– Query analysis enables rejection of normalized queries
for which further processing is impossible or not
necessary.
– The main reasons for rejection are:
• Type incorrectness:
– A query is type incorrect if any of its attribute or relation names are not defined in the global schema,
– or if operations are applied to attributes of the wrong type.
• Semantic incorrectness:
– A query is semantically incorrect if some of its components do not contribute to the result.
– In general, it is not possible to determine the semantic correctness of arbitrary queries; it can be decided only for restricted classes, such as conjunctive queries without negation.



Analysis

• The following SQL query on the engineering database is type incorrect for two reasons.
SELECT E#
FROM EMP
WHERE ENAME > 200

• First, attribute E# is not declared in the schema.
• Second, the operation “> 200” is incompatible with the type string of ENAME.
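A minimal Python sketch of these two checks for a single-relation query with one comparison predicate. The Python types in SCHEMA are assumptions (the slides list only attribute names), and type_errors is a hypothetical helper; a real analyzer would work on a parsed query tree. A comparison is treated as well typed only when the constant has the same type as the attribute.

```python
# Hypothetical attribute types for the engineering database (assumed, not from the slides).
SCHEMA = {
    "EMP": {"ENO": str, "ENAME": str, "TITLE": str},
    "PROJ": {"PNO": str, "PNAME": str, "BUDGET": int},
    "ASG": {"ENO": str, "PNO": str, "DUR": int, "RESP": str},
    "PAY": {"TITLE": str, "SAL": int},
}


def type_errors(select_attrs, relation, predicate):
    """Check SELECT select_attrs FROM relation WHERE attr op constant.

    predicate is an (attribute, operator, constant) triple. Returns a list of
    error messages; an empty list means the query is type correct.
    """
    if relation not in SCHEMA:
        return [f"relation {relation} is not defined in the global schema"]
    attrs = SCHEMA[relation]
    # Undeclared attributes in the SELECT clause
    errors = [f"attribute {a} is not defined in {relation}"
              for a in select_attrs if a not in attrs]
    attr, op, const = predicate
    if attr not in attrs:
        errors.append(f"attribute {attr} is not defined in {relation}")
    elif not isinstance(const, attrs[attr]):
        # Operation applied to an attribute of the wrong type
        errors.append(f"'{attr} {op} {const!r}' compares "
                      f"{attrs[attr].__name__} with {type(const).__name__}")
    return errors


for msg in type_errors(["E#"], "EMP", ("ENAME", ">", 200)):
    print(msg)
```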



Query graph

• A query graph (or connection graph) can be used to determine semantic correctness.
• In a query graph, one node indicates the result relation, and every other node indicates an operand relation.
• An edge between two operand nodes (neither of which is the result) represents a join, whereas an edge whose destination node is the result represents a projection.
• Furthermore, a non-result node may be labeled by a selection or a self-join (join of the relation with itself) predicate.
• An important subgraph of the query graph is the join graph, in which only the joins are considered. The join graph is particularly useful in the query optimization phase.



Example

• “Find the names and responsibilities of programmers who have been working on the CAD/CAM project for more than 3 years.”
• The query expressed in SQL is

SELECT ENAME, RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND PNAME = 'CAD/CAM'
AND DUR > 36
AND TITLE = 'Programmer'



How is query graph useful?

• The query graph is useful to determine the semantic correctness of a conjunctive multivariable query without negation.
• Such a query is semantically incorrect if its query graph is not connected. In this case one or more subgraphs (corresponding to subqueries) are disconnected from the graph that contains the result relation.



Disconnected Query graph

Let us consider the following SQL query:

SELECT ENAME, RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND PNAME = 'CAD/CAM'
AND DUR > 36
AND TITLE = 'Programmer'

The problem is that the join predicate ASG.PNO = PROJ.PNO is missing, so PROJ is disconnected from the result in the query graph; the query should be rejected.
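The connectivity test can be sketched in Python with a breadth-first search from the result node. This is a minimal sketch under assumed inputs: the hypothetical helpers take the SELECT attributes as (relation, attribute) pairs (projection edges to RESULT) and the join predicates as (relation, relation) pairs. Selection predicates such as PNAME = 'CAD/CAM' only label a node and add no edge, so they are omitted.

```python
from collections import defaultdict, deque


def query_graph(result_attrs, joins):
    """Build an undirected query graph with a special RESULT node."""
    graph = defaultdict(set)
    for rel, _attr in result_attrs:      # projection edges
        graph[rel].add("RESULT")
        graph["RESULT"].add(rel)
    for left, right in joins:            # join edges
        graph[left].add(right)
        graph[right].add(left)
    return graph


def is_connected(graph, relations):
    """True if every relation of the FROM clause is reachable from RESULT."""
    seen, queue = {"RESULT"}, deque(["RESULT"])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return set(relations) <= seen


FROM = ["EMP", "ASG", "PROJ"]
SELECT = [("EMP", "ENAME"), ("ASG", "RESP")]
correct = query_graph(SELECT, [("EMP", "ASG"), ("ASG", "PROJ")])
missing_join = query_graph(SELECT, [("EMP", "ASG")])  # ASG.PNO = PROJ.PNO omitted
print(is_connected(correct, FROM), is_connected(missing_join, FROM))  # prints: True False
```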



Elimination of redundancy
• Query qualification may contain redundant predicates.
• Redundancy can be eliminated by applying idempotency rules, such as p ∧ p ⇔ p, p ∨ false ⇔ p, p ∧ ¬p ⇔ false, and p1 ∧ (p1 ∨ p2) ⇔ p1.



Example

SELECT TITLE
FROM EMP
WHERE (NOT (TITLE = 'Programmer')
AND (TITLE = 'Programmer' OR TITLE = 'Elect. Eng.')
AND NOT (TITLE = 'Elect. Eng.'))
OR ENAME = 'J. Doe'
can be written as
SELECT TITLE
FROM EMP
WHERE ENAME = 'J. Doe'

Let p1 be TITLE = “Programmer”, p2 be TITLE = “Elect. Eng.”, and p3 be ENAME = “J. Doe”.
The query qualification is (¬p1 ∧ (p1 ∨ p2) ∧ ¬p2) ∨ p3
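Because the qualification has only three atomic predicates, the simplification can be double-checked by brute force over all 2^3 truth assignments. This Python snippet is only a sanity check, not how a query processor eliminates redundancy (that is done symbolically with the idempotency rules). It treats p1, p2 and p3 as independent, which is a stronger requirement than the real attribute semantics impose.

```python
from itertools import product


def original(p1, p2, p3):
    # (¬p1 ∧ (p1 ∨ p2) ∧ ¬p2) ∨ p3
    return ((not p1) and (p1 or p2) and (not p2)) or p3


def simplified(p1, p2, p3):
    # ENAME = "J. Doe"
    return p3


def equivalent(f, g, arity):
    """Compare two Boolean functions on every assignment of truth values."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=arity))


print(equivalent(original, simplified, 3))  # prints: True
```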



Example

The query qualification is (¬p1 ∧(p1 ∨ p2)∧¬p2)∨ p3
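One way to derive the simplification step by step, first distributing ¬p2 over (p1 ∨ p2) and then applying the idempotency rules p ∧ ¬p ⇔ false, p ∨ false ⇔ p and false ∧ p ⇔ false:

```latex
\begin{aligned}
(\neg p_1 \land (p_1 \lor p_2) \land \neg p_2) \lor p_3
  &\Leftrightarrow (\neg p_1 \land ((p_1 \land \neg p_2) \lor (p_2 \land \neg p_2))) \lor p_3 \\
  &\Leftrightarrow (\neg p_1 \land ((p_1 \land \neg p_2) \lor \mathit{false})) \lor p_3 \\
  &\Leftrightarrow (\neg p_1 \land p_1 \land \neg p_2) \lor p_3 \\
  &\Leftrightarrow (\mathit{false} \land \neg p_2) \lor p_3 \\
  &\Leftrightarrow \mathit{false} \lor p_3 \\
  &\Leftrightarrow p_3
\end{aligned}
```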



Rewriting
– This step of decomposition rewrites the query in
relational algebra.



Operator Tree

• It is customary to represent the relational algebra query graphically by an operator tree.
• An operator tree is a tree in which a leaf node is a
relation stored in the database, and a non-leaf node is
an intermediate relation produced by a relational algebra
operator.
• The sequence of operations is directed from the leaves
to the root, which represents the answer to the query.



Example

• “Find the names of employees other than J. Doe who worked on the CAD/CAM project for either one or two years”, whose SQL expression is

SELECT ENAME
FROM PROJ, ASG, EMP
WHERE ASG.ENO = EMP.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME <> 'J. Doe'
AND PROJ.PNAME = 'CAD/CAM'
AND (DUR = 12 OR DUR = 24)



Example of Operator Tree

• The transformation of a tuple relational calculus query into an operator tree can easily be achieved as follows.
• First, a different leaf is created for each
different tuple variable (corresponding to a
relation). In SQL, the leaves are immediately
available in the FROM clause.
• Second, the root node is created as a
project operation involving the result
attributes. These are found in the SELECT
clause in SQL.
• Third, the qualification (SQL WHERE
clause) is translated into the appropriate
sequence of relational operations (select,
join, union, etc.) going from the leaves to the
root. The sequence can be given directly by
the order of appearance of the predicates
and operators.
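The three steps can be sketched in Python for the example query. This is a minimal sketch with a hypothetical encoding: tree nodes are tuples ("rel", name), ("select", relation, predicate, child), ("join", predicate, left, right) and ("project", attributes, child), and the WHERE clause is assumed to be already split into join and selection predicates tagged with the relations they reference. As on the slide, predicates are applied in order of appearance, so the result is the naive tree with all selections above the joins.

```python
def build_operator_tree(select_attrs, from_rels, predicates):
    """Naive translation of a SELECT-FROM-WHERE query into an operator tree.

    predicates, in WHERE-clause order, are ("join", left_rel, right_rel, text)
    or ("select", rel, text).
    """
    # Step 1: one leaf per FROM relation; each relation starts in its own subtree.
    trees = {r: ("rel", r) for r in from_rels}
    owner = {r: r for r in from_rels}  # relation -> key of the subtree containing it
    # Step 3: translate the qualification bottom-up, in order of appearance.
    for p in predicates:
        if p[0] == "join":
            _, left, right, text = p
            a, b = owner[left], owner[right]
            if a == b:  # operands already combined (cyclic join graph): filter instead
                trees[a] = ("select", left, text, trees[a])
                continue
            trees[a] = ("join", text, trees[a], trees.pop(b))
            for r in owner:
                if owner[r] == b:
                    owner[r] = a
        else:
            _, rel, text = p
            key = owner[rel]
            trees[key] = ("select", rel, text, trees[key])
    if len(trees) != 1:
        raise ValueError("disconnected query graph: the query should be rejected")
    (tree,) = trees.values()
    # Step 2: the root projects on the attributes of the SELECT clause.
    return ("project", tuple(select_attrs), tree)


WHERE = [
    ("join", "ASG", "EMP", "ASG.ENO = EMP.ENO"),
    ("join", "ASG", "PROJ", "ASG.PNO = PROJ.PNO"),
    ("select", "EMP", "ENAME <> 'J. Doe'"),
    ("select", "PROJ", "PROJ.PNAME = 'CAD/CAM'"),
    ("select", "ASG", "DUR = 12 OR DUR = 24"),
]
tree = build_operator_tree(["ENAME"], ["PROJ", "ASG", "EMP"], WHERE)
print(tree[0], tree[2][2])  # prints: project DUR = 12 OR DUR = 24
```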



Transformation rules

– Applying six transformation rules enables the generation of many equivalent trees:
• Commutativity of binary operators
• Associativity of binary operators
• Idempotence of unary operators
• Commuting selection with projection
• Commuting selection with binary operators
• Commuting projection with binary operators

Exercise: Study the relational algebra expressions for these rules in the textbook.



Equivalent Operator Tree

Is this tree good?



Transformation rules

• Applying the six rules generates many equivalent trees; the rules can also be used to restructure the tree in a systematic way so that “bad” operator trees are eliminated.
• First, they allow the separation of the unary operations, simplifying the query
expression.
• Second, unary operations on the same relation may be grouped so that
access to a relation for performing unary operations can be done only once.
• Third, unary operations can be commuted with binary operations so that
some operations (e.g., selection) may be done first.
• Fourth, the binary operations can be ordered.
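A minimal Python sketch of the third point, commuting selections with joins so that each selection is applied directly to its operand relation. It uses a hypothetical tuple encoding in which a selection node records the single relation its predicate refers to, ("select", relation, predicate, child); predicates spanning two relations are assumed to be joins already. Pushing projections down, which a fully rewritten tree would also do, is not shown.

```python
def strip_selections(t, found):
    """Return t without selection nodes, collecting their (relation, predicate) pairs."""
    kind = t[0]
    if kind == "select":
        _, rel, pred, child = t
        found.append((rel, pred))
        return strip_selections(child, found)
    if kind == "join":
        return ("join", t[1], strip_selections(t[2], found), strip_selections(t[3], found))
    if kind == "project":
        return ("project", t[1], strip_selections(t[2], found))
    return t  # ("rel", name) leaf


def place_at_leaves(t, selections):
    """Attach every selection directly above the leaf of the relation it refers to."""
    kind = t[0]
    if kind == "rel":
        name = t[1]
        for rel, pred in selections:
            if rel == name:
                t = ("select", rel, pred, t)
        return t
    if kind == "join":
        return ("join", t[1], place_at_leaves(t[2], selections),
                place_at_leaves(t[3], selections))
    return ("project", t[1], place_at_leaves(t[2], selections))


def push_selections_down(tree):
    found = []
    bare = strip_selections(tree, found)
    return place_at_leaves(bare, found)


naive = ("project", ("ENAME",),
         ("select", "ASG", "DUR = 12 OR DUR = 24",
          ("select", "PROJ", "PNAME = 'CAD/CAM'",
           ("select", "EMP", "ENAME <> 'J. Doe'",
            ("join", "ASG.PNO = PROJ.PNO",
             ("join", "ASG.ENO = EMP.ENO", ("rel", "ASG"), ("rel", "EMP")),
             ("rel", "PROJ"))))))
pushed = push_selections_down(naive)
print(pushed[2][2][2])  # prints: ('select', 'ASG', 'DUR = 12 OR DUR = 24', ('rel', 'ASG'))
```

Applying selections first shrinks the operands before the joins, which is usually the cheapest ordering.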



Rewritten Operator Tree

Why is tree good?



Localization of distributed
data
• So far, query decomposition has not taken data distribution into account. This is the role of the localization layer.
• The localization layer translates a relational algebra query on global relations into a relational algebra query expressed on physical fragments.
• Localization uses the information stored in the fragment schema.



Localization of distributed
data
• Input: Relational algebra expression on global,
distributed relations (distributed query)
• Determine which fragments are involved in a query (over
global, distributed relations) and transform such a query
into an equivalent one over such fragments (localized
query)



Localization of distributed
data
• Recall that fragmentation is obtained by applying rules expressed in relational algebra:
➡ primary horizontal fragmentation: selection σ
➡ derived horizontal fragmentation: semijoin ⋉
➡ vertical fragmentation: projection Π

• Reconstruction (reverse fragmentation) rules are also expressed in relational algebra:
➡ horizontal fragmentation: union ∪
➡ vertical fragmentation: join ⋈



Localization of distributed
data
• A naïve way of localizing a distributed query is to generate a query where each global relation is substituted by its localization program.
• This can be viewed as replacing the leaves of the operator
tree of the distributed query with subtrees corresponding to
the localization programs to obtain the localized query.



Localization of distributed
data
• Localization program: relational algebra expression that
reconstructs a global relation from its fragments, by
reverting the rules employed for fragmentation
• A localized query is obtained from distributed, global
query by replacing leaves (global relations) with (the tree
of) its corresponding localization program
➡ Leaves of localized queries are fragments
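Naive localization is a plain tree substitution. The Python sketch below uses a hypothetical tuple encoding of operator trees and hypothetical localization programs (EMP horizontally fragmented into EMP1 to EMP3, ASG into ASG1 and ASG2); it only replaces leaves and performs none of the reductions discussed next.

```python
def localize(node, programs):
    """Replace each global-relation leaf by its localization program."""
    kind = node[0]
    if kind == "rel":
        return programs.get(node[1], node)  # relations without a program stay as-is
    if kind in ("select", "project"):
        return (kind, node[1], localize(node[2], programs))
    if kind == "join":
        return (kind, node[1], localize(node[2], programs), localize(node[3], programs))
    if kind == "union":
        return (kind, [localize(child, programs) for child in node[1]])
    raise ValueError(f"unknown operator {kind!r}")


def leaves(node):
    """Names of the leaf relations, left to right."""
    kind = node[0]
    if kind == "rel":
        return [node[1]]
    if kind == "union":
        return [name for child in node[1] for name in leaves(child)]
    if kind == "join":
        return leaves(node[2]) + leaves(node[3])
    return leaves(node[2])  # select / project


PROGRAMS = {
    # Horizontal fragmentation: each global relation is the union of its fragments.
    "EMP": ("union", [("rel", "EMP1"), ("rel", "EMP2"), ("rel", "EMP3")]),
    "ASG": ("union", [("rel", "ASG1"), ("rel", "ASG2")]),
}
query = ("project", ("ENAME",), ("join", "ENO", ("rel", "EMP"), ("rel", "ASG")))
print(leaves(localize(query, PROGRAMS)))
# prints: ['EMP1', 'EMP2', 'EMP3', 'ASG1', 'ASG2']
```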



Example



Provides Parallelism



Eliminates Unnecessary Work



Reduction for primary
horizontal fragmentation
– The horizontal fragmentation function distributes a relation based on selection predicates.
EMP(ENO, ENAME, TITLE)
$EMP_1 = \sigma_{ENO \le \text{"E3"}}(EMP)$
$EMP_2 = \sigma_{\text{"E3"} < ENO \le \text{"E6"}}(EMP)$
$EMP_3 = \sigma_{ENO > \text{"E6"}}(EMP)$

– The localization program for a horizontally fragmented relation is the union of the fragments:
$EMP = EMP_1 \cup EMP_2 \cup EMP_3$

– Thus the localized form of any query specified on EMP is obtained by replacing EMP with $EMP_1 \cup EMP_2 \cup EMP_3$.



Reduction for primary
horizontal fragmentation
• The reduction of queries on horizontally fragmented
relations consists primarily of determining, after
restructuring the subtrees, those that will produce empty
relations, and removing them.
• Horizontal fragmentation can be exploited to simplify
both selection and join operations.



Reduction for PHF with
selection



Example - Reduction for PHF
with selection
SELECT * FROM EMP WHERE ENO = 'E5';
• Applying the naive approach to localize EMP from EMP1, EMP2, and EMP3 gives the localized query

• By commuting the selection with the union operation, it is easy to detect that the selection predicate contradicts the predicates of EMP1 and EMP3, so those subtrees produce empty relations. The reduced query is simply applied to EMP2.
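The contradiction test for this example can be sketched in Python by storing each fragment predicate as a range on ENO. This is a minimal sketch with a hypothetical catalog; it compares ENO values as strings, which works only for keys of equal length such as "E1" to "E9" (a real system would compare typed values).

```python
# Hypothetical catalog for the primary horizontal fragmentation of EMP:
# fragment -> (exclusive lower bound, inclusive upper bound) on ENO; None = unbounded.
EMP_FRAGMENTS = {
    "EMP1": (None, "E3"),   # ENO <= "E3"
    "EMP2": ("E3", "E6"),   # "E3" < ENO <= "E6"
    "EMP3": ("E6", None),   # ENO > "E6"
}


def may_contain(bounds, value):
    """False when ENO = value contradicts the fragment predicate (empty result)."""
    low, high = bounds
    return (low is None or value > low) and (high is None or value <= high)


def reduce_selection(fragments, value):
    """Fragments still to be scanned for SELECT * FROM EMP WHERE ENO = value."""
    return [name for name, bounds in fragments.items() if may_contain(bounds, value)]


print(reduce_selection(EMP_FRAGMENTS, "E5"))  # prints: ['EMP2']
```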



Reduction for PHF with Join

• Joins on horizontally fragmented relations can be simplified when the joined relations are fragmented according to the join attribute.
• The simplification consists of distributing joins over unions and eliminating useless joins.



Reduction for PHF with Join



Example - Reduction for PHF
with join
SELECT * FROM EMP, ASG WHERE EMP.ENO = ASG.ENO;
$ASG_1 = \sigma_{ENO \le \text{"E3"}}(ASG)$
$ASG_2 = \sigma_{ENO > \text{"E3"}}(ASG)$

– EMP1 and ASG1 are defined by the same predicate.
– Furthermore, the predicate defining ASG2 is the disjunction of the predicates defining EMP2 and EMP3: ENO > “E3” holds exactly when “E3” < ENO ≤ “E6” or ENO > “E6”.



Example - Reduction for PHF
with join
• The equivalent localized query is shown below.
• The query is reduced by distributing joins over unions and eliminating useless joins; it is implemented as a union of three partial joins, EMP1 ⋈ ASG1, EMP2 ⋈ ASG2 and EMP3 ⋈ ASG2, which can be done in parallel.
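Distributing the join over the unions and dropping useless joins amounts to keeping only fragment pairs whose ENO ranges overlap. A minimal Python sketch with a hypothetical catalog in which ranges are (exclusive lower, inclusive upper) bounds on ENO, compared as equal-length strings:

```python
from itertools import product

# Hypothetical catalog: (exclusive lower bound, inclusive upper bound) on ENO; None = unbounded.
EMP_FRAGMENTS = {"EMP1": (None, "E3"), "EMP2": ("E3", "E6"), "EMP3": ("E6", None)}
ASG_FRAGMENTS = {"ASG1": (None, "E3"), "ASG2": ("E3", None)}


def overlaps(a, b):
    """True if the ENO ranges (low, high] of two fragments can share a value."""
    lows = [x for x in (a[0], b[0]) if x is not None]
    highs = [x for x in (a[1], b[1]) if x is not None]
    if not lows or not highs:
        return True  # the combined range is unbounded on at least one side
    return max(lows) < min(highs)


def reduce_join(left, right):
    """Partial joins left_i ⋈ right_j kept after distributing the join over the unions."""
    return [(lname, rname)
            for (lname, lbounds), (rname, rbounds) in product(left.items(), right.items())
            if overlaps(lbounds, rbounds)]


print(reduce_join(EMP_FRAGMENTS, ASG_FRAGMENTS))
# prints: [('EMP1', 'ASG1'), ('EMP2', 'ASG2'), ('EMP3', 'ASG2')]
```

Each surviving pair is an independent partial join, which is what allows the three joins to run in parallel at different sites.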



Reduction for vertical
fragmentation
• The vertical fragmentation function distributes a relation based on projection attributes.
• Since the reconstruction operator for vertical fragmentation is the join, the localization program for a vertically fragmented relation consists of the join of the fragments on the common attribute.
• Example
EMP(ENO, ENAME, TITLE)
$EMP_1 = \Pi_{ENO, ENAME}(EMP)$
$EMP_2 = \Pi_{ENO, TITLE}(EMP)$

Localization program is
$EMP = EMP_1 \bowtie_{ENO} EMP_2$

Reduction for vertical
fragmentation
• Similar to horizontal fragmentation, queries on vertical
fragments can be reduced by determining the useless
intermediate relations and removing the subtrees
that produce them.
• Projections on a vertical fragment that has no attributes
in common with the projection attributes (except the key
of the relation) can produce useless, though not empty
relations.



Example
SELECT ENAME FROM EMP;

• The equivalent localized query on EMP1 and EMP2 is shown below.
• By commuting the projection with the join (i.e., projecting on ENO, ENAME), we can see that the projection on EMP2 is useless because ENAME is not in EMP2. Therefore, the projection needs to apply only to EMP1.
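The test for useless vertical fragments can be sketched in Python: a fragment is needed only if it supplies a projected attribute other than the key. This is a minimal sketch with a hypothetical catalog; when several fragments survive, they would still be joined on the key.

```python
# Hypothetical catalog of the vertical fragmentation of EMP (ENO is the key).
EMP_VERTICAL = {"EMP1": {"ENO", "ENAME"}, "EMP2": {"ENO", "TITLE"}}
KEY = {"ENO"}


def reduce_projection(fragments, key, projection):
    """Fragments needed to evaluate a projection on a vertically fragmented relation."""
    needed = set(projection) - key
    if not needed:
        return [min(fragments)]  # only key attributes requested: any fragment has them
    # A fragment sharing only the key with the projection produces a useless relation.
    return [name for name, attrs in fragments.items() if (attrs - key) & needed]


print(reduce_projection(EMP_VERTICAL, KEY, ["ENAME"]))  # prints: ['EMP1']
```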



Reduction for Derived
Fragmentation



Reduction for Derived
Fragmentation
• Given a one-to-many relationship from EMP to ASG, relation ASG(ENO, PNO, RESP, DUR) can be indirectly fragmented according to the following rules:
$ASG_1 = ASG \ltimes_{ENO} EMP_1$
$ASG_2 = ASG \ltimes_{ENO} EMP_2$
where
$EMP_1 = \sigma_{TITLE = \text{"Programmer"}}(EMP)$
$EMP_2 = \sigma_{TITLE \ne \text{"Programmer"}}(EMP)$

• The localization program for a horizontally fragmented relation is the union of the fragments:
$ASG = ASG_1 \cup ASG_2$



Example - Reduction for
Derived Fragmentation
SELECT *
FROM ASG, EMP
WHERE ASG.ENO = EMP.ENO
AND TITLE = 'MechEngr';



Example - Reduction for
Derived Fragmentation
• The localized query on fragments EMP1, EMP2, ASG1, and ASG2

• Pushing the selection down to fragments EMP1 and EMP2 reduces the query: the selection predicate conflicts with the predicate of EMP1, so EMP1 can be removed.



Example - Reduction for
Derived Fragmentation
• In order to discover conflicting join predicates, distribute joins over unions.

• The left subtree joins two fragments, ASG1 and EMP2, whose qualifications conflict: ASG1 is derived from EMP1 and therefore contains only assignments of employees with TITLE = “Programmer”, whereas EMP2 has TITLE ≠ “Programmer”. Therefore the left subtree, which produces an empty relation, can be removed, and the reduced query is obtained.



Reduction for Hybrid
Fragmentation
• The goal of hybrid fragmentation is to support, efficiently,
queries involving projection, selection, and join. The
localization program for a hybrid fragmented relation
uses unions and joins of fragments.
• Here is an example of hybrid fragmentation of relation
EMP:

• The localization program is



Example - Reduction for
Hybrid Fragmentation
• Queries on hybrid fragments can be reduced by combining the rules used, respectively, in primary horizontal, vertical, and derived horizontal fragmentation.
• These rules can be summarized as follows:
(1) Remove empty relations generated by contradicting selections on horizontal fragments.
(2) Remove useless relations generated by projections on vertical fragments.
(3) Distribute joins over unions in order to isolate and remove useless joins.



Example - Reduction for
Hybrid Fragmentation
• The following SQL query illustrates the application of rules (1) and (2) to the horizontal-vertical fragmentation of relation EMP into EMP1, EMP2 and EMP3:
SELECT ENAME
FROM EMP
WHERE ENO = 'E5'



Example Reduction for Hybrid
Fragmentation
• The localized query can be reduced by first pushing
selection down, eliminating fragment EMP1, and then
pushing projection down, eliminating fragment EMP3.
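The slide's figure with the fragment definitions is not reproduced here, so the Python sketch below assumes a hybrid fragmentation consistent with the reduction just described: EMP1 and EMP2 split the (ENO, ENAME) projection of EMP at ENO = "E4", and EMP3 holds (ENO, TITLE). Rule (1) is applied first, then rule (2); ENO values are compared as equal-length strings. All names in the catalog are hypothetical.

```python
# Assumed hybrid fragmentation of EMP (not taken from the missing figure):
#   EMP1 = σ ENO <= "E4" (Π ENO, ENAME (EMP))
#   EMP2 = σ ENO >  "E4" (Π ENO, ENAME (EMP))
#   EMP3 = Π ENO, TITLE (EMP)
# Each entry: ((exclusive lower, inclusive upper) bounds on ENO, stored attributes).
FRAGMENTS = {
    "EMP1": ((None, "E4"), {"ENO", "ENAME"}),
    "EMP2": (("E4", None), {"ENO", "ENAME"}),
    "EMP3": ((None, None), {"ENO", "TITLE"}),
}
KEY = {"ENO"}


def in_range(bounds, value):
    low, high = bounds
    return (low is None or value > low) and (high is None or value <= high)


def reduce_hybrid(eno, projection):
    """Fragments left for SELECT <projection> FROM EMP WHERE ENO = eno."""
    # Rule (1): remove fragments whose selection predicate contradicts ENO = eno.
    survivors = {n: f for n, f in FRAGMENTS.items() if in_range(f[0], eno)}
    # Rule (2): remove fragments that contribute no projected non-key attribute.
    needed = set(projection) - KEY
    if not needed:
        return sorted(survivors)[:1]  # only the key is requested
    return [n for n, (_, attrs) in survivors.items() if (attrs - KEY) & needed]


print(reduce_hybrid("E5", ["ENAME"]))  # prints: ['EMP2']
```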



Lecture Summary

• Query Decomposition
• Data Localization



Thanks…

• Next Lecture
Query Optimization

• Questions??
