Lecture 6

Relational Algebra & calculus in DB


Relational Algebra
Relational algebra provides the basic set of operations for the relational model.

It is a procedural language: an expression describes the procedure used to obtain the result. (It provides the theoretical foundation for databases.)

A sequence of relational algebra operations is called a relational algebra expression.

The relational algebra and calculus were developed before the SQL language.
Preliminaries
A query is applied to relation instances, and the result of a query
is also a relation instance.
◦ Schemas of input relations for a query are fixed.
◦ The schema for the result of a given query is also fixed!
Determined by definition of query language constructs.
Relational Algebra
Basic operations:
◦ Selection (σ): Selects a subset of rows from a relation.
◦ Projection (π): Deletes unwanted columns from a relation.
◦ Cross-product (×): Allows us to combine two relations.
◦ Set-difference (−): Tuples in reln. 1, but not in reln. 2.
◦ Union (∪): Tuples in reln. 1 or in reln. 2 (or in both).

Additional operations:
◦ Intersection, join, division, renaming

Since each operation returns a relation, operations can be composed! (The algebra is “closed”: the result of every operation is again a relation.)
Selection and Projection are called unary relational operations.

Set-difference, Union, Cartesian product and Intersection are called relational algebra operations from set theory.

Join and division are called binary relational operations.


Example instances: “Sailors” and “Reserves” relations for our examples.

R1
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

B1
bid  bname      color
101  Interlake  blue
102  Interlake  red
103  Clipper    green
104  Marine     red

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0
Selection
σ<selection condition>(R)
Selects rows that satisfy the selection condition.
No duplicates in result!
Schema of result identical to schema of (only) input relation.
Result relation can be the input for another relational algebra operation! (Operator composition.)

σ rating>8 (S2)
sid  sname  rating  age
28   yuppy  9       35.0
58   rusty  10      35.0

π sname,rating (σ rating>8 (S2))
sname  rating
yuppy  9
rusty  10
Projection
π<attribute list>(R)
Deletes attributes that are not in the projection list.
Schema of result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation.
Projection operator has to eliminate duplicates!

π sname,rating (S2)
sname   rating
yuppy   9
lubber  8
guppy   5
rusty   10

π age (S2)
age
35.0
55.5
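As a rough illustration of the two unary operators, the sketch below models the instance S2 as a Python set of named tuples. The helper names `select` and `project` are assumptions chosen for this example; they are not part of any database library.

```python
from typing import NamedTuple


class Sailor(NamedTuple):
    sid: int
    sname: str
    rating: int
    age: float


# Instance S2 from the example slides, modelled as a set (no duplicate rows).
S2 = {
    Sailor(28, "yuppy", 9, 35.0),
    Sailor(31, "lubber", 8, 55.5),
    Sailor(44, "guppy", 5, 35.0),
    Sailor(58, "rusty", 10, 35.0),
}


def select(relation, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return {row for row in relation if predicate(row)}


def project(relation, *fields):
    """Projection (pi): keep only the named fields; the set drops duplicates."""
    return {tuple(getattr(row, f) for f in fields) for row in relation}


high_rated = select(S2, lambda s: s.rating > 8)          # sigma rating>8 (S2)
names_ratings = project(high_rated, "sname", "rating")   # composed with pi
ages = project(S2, "age")                                # duplicates collapse
```

Because `project` builds a set, the three rows with age 35.0 collapse into one, which mirrors the duplicate elimination the projection operator has to perform.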
Cascade and equivalent expressions
UNION INTERSECTION
Union, Intersection, Set-Difference
All of these operations take two input relations, which must be union-compatible:
◦ Same number of fields.
◦ ‘Corresponding’ fields have the same type.

S1 ∪ S2
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
44   guppy   5       35.0
28   yuppy   9       35.0

S1 − S2
sid  sname   rating  age
22   dustin  7       45.0

S1 ∩ S2
sid  sname   rating  age
31   lubber  8       55.5
58   rusty   10      35.0
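The three set operations map directly onto Python's built-in set operators. The following is a minimal sketch using the S1 and S2 instances from the examples; the `union_compatible` helper is a hypothetical name introduced only for illustration.

```python
# Instances S1 and S2 as sets of (sid, sname, rating, age) tuples.
S1 = {
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0),
}
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}


def union_compatible(r, s):
    """Check the two conditions: same number of fields, same field types."""
    shapes = {tuple(type(v) for v in row) for row in r | s}
    return len(shapes) <= 1


assert union_compatible(S1, S2)

union = S1 | S2          # tuples in S1 or in S2 (or both)
intersection = S1 & S2   # tuples in both S1 and S2
difference = S1 - S2     # tuples in S1 but not in S2
```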
Cross-Product
Each row of S1 is paired with each row of R1.
Result schema has one field per field of S1 and R1, with field names ‘inherited’ if possible.
◦ Conflict: Both S1 and R1 have a field called sid.

S1 × R1
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  22     101  10/10/96
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  22     101  10/10/96
31     lubber  8       55.5  58     103  11/12/96
58     rusty   10      35.0  22     101  10/10/96
58     rusty   10      35.0  58     103  11/12/96

◦ Renaming operator: ρ(C(1 → sid1, 5 → sid2), S1 × R1)


Joins
Condition Join: R ⋈c S = σc(R × S)

S1 ⋈ S1.sid < R1.sid R1
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  58     103  11/12/96

Result schema same as that of cross-product.
Fewer tuples than cross-product, so it might be possible to compute it more efficiently.
Sometimes called a theta-join.
Joins
Equi-Join: A special case of condition join where the condition c contains only equalities.

S1 ⋈sid R1
sid  sname   rating  age   bid  day
22   dustin  7       45.0  101  10/10/96
58   rusty   10      35.0  103  11/12/96

Result schema similar to cross-product, but only one copy of fields for which equality is specified.
Natural Join: Equijoin on all common fields.
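The relationship between cross-product, condition join and equi-join can be sketched as follows. The function names are assumptions made for this example, and the nested-loop evaluation is only the simplest possible strategy, not how a real DBMS would necessarily compute a join.

```python
from itertools import product

# S1(sid, sname, rating, age) and R1(sid, bid, day) from the example slides.
S1 = [
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0),
]
R1 = [
    (22, 101, "10/10/96"),
    (58, 103, "11/12/96"),
]


def cross_product(r, s):
    """Pair every row of r with every row of s."""
    return [a + b for a, b in product(r, s)]


def condition_join(r, s, condition):
    """Theta-join: a selection applied to the cross-product."""
    return [a + b for a, b in product(r, s) if condition(a, b)]


def equi_join_on_sid(r, s):
    """Equi-join on sid (field 0 of both inputs); keeps one copy of sid."""
    return [a + b[1:] for a, b in product(r, s) if a[0] == b[0]]


all_pairs = cross_product(S1, R1)
theta = condition_join(S1, R1, lambda a, b: a[0] < b[0])  # S1.sid < R1.sid
equi = equi_join_on_sid(S1, R1)
```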
Division
For a tuple t to appear in the result T of the DIVISION, the values in t must appear in R in combination with every tuple in S.

A
sno  pno
s1   p1
s1   p2
s1   p3
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4

B1      B2      B3
pno     pno     pno
p2      p2      p1
        p4      p2
                p4

A/B1    A/B2    A/B3
sno     sno     sno
s1      s1      s1
s2      s4
s3
s4

(Figure: R ÷ S, dividing SSN_PNOS by SMITH_PNOS.)
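Division can be checked mechanically: a value of sno qualifies only if it is paired in A with every pno in the divisor. The sketch below uses the A, B1, B2 and B3 instances from the example; `divide` is a hypothetical helper name.

```python
# A(sno, pno) and the three divisor relations from the example.
A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}
B1 = {"p2"}
B2 = {"p2", "p4"}
B3 = {"p1", "p2", "p4"}


def divide(a, b):
    """Return every sno that appears in a together with each pno in b."""
    candidates = {sno for sno, _ in a}
    return {sno for sno in candidates if all((sno, pno) in a for pno in b)}


a_div_b1 = divide(A, B1)
a_div_b2 = divide(A, B2)
a_div_b3 = divide(A, B3)
```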
More Examples
π Sex, Salary (EMPLOYEE) is equivalent to the SQL query SELECT DISTINCT Sex, Salary FROM EMPLOYEE

π Fname, Lname, Salary (σ Dno=5 (EMPLOYEE))

The same expression, giving a name to the intermediate relation:
DEP5_EMPS ← σ Dno=5 (EMPLOYEE)
RESULT ← π Fname, Lname, Salary (DEP5_EMPS)
Relational Calculus
Comes in two flavors: Tuple relational calculus (TRC) and Domain relational
calculus (DRC).
Calculus has variables, constants, comparison ops, logical connectives and
quantifiers.
TRC: Variables range over (i.e., get bound to) tuples.
DRC: Variables range over domain elements (= field values).
We may use:
Logical operators: AND (∧), OR (∨), NOT (¬)
Quantifiers: ∀, which means for all (every)
∃, which means there exists (at least one)
Tuple relational calculus
In tuple relational calculus, we filter tuples based on a given condition.
Syntax: { t | Condition(t) }
For example, if our table is Student, we write Student(t) to state that the tuple variable t ranges over Student.
Then comes the condition part. To specify a condition on a particular attribute (column), we use dot notation with the tuple variable.
E.g. to get data for students with age greater than 17, we write t.age > 17, where t is our tuple variable:
{ t.name | Student(t) AND t.age > 17 }
E.g. 2: { t.Fname, t.Lname | EMPLOYEE(t) AND t.Salary > 50000 }
Tuple Relational Calculus (TRC)

Student
First_Name  Last_Name  Age
Rachel      Eric       30
Samantha    Eric       31
Roy         Johnson    27
Carl        Pratap     28

A. Query to display the last name of those students whose age is greater than 30:
{ t.Last_Name | Student(t) AND t.Age > 30 }
Result: Eric
B. Query to display all the details of students whose last name is ‘Eric’:
{ t | Student(t) AND t.Last_Name = ‘Eric’ }
Result: (Rachel, Eric, 30) and (Samantha, Eric, 31)
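A TRC expression reads much like a Python comprehension in which the loop variable plays the role of the tuple variable t. The sketch below evaluates queries A and B over the Student table; the Python names are illustrative assumptions.

```python
from typing import NamedTuple


class Student(NamedTuple):
    First_Name: str
    Last_Name: str
    Age: int


# The Student table from the slide.
STUDENT = [
    Student("Rachel", "Eric", 30),
    Student("Samantha", "Eric", 31),
    Student("Roy", "Johnson", 27),
    Student("Carl", "Pratap", 28),
]

# A. { t.Last_Name | Student(t) AND t.Age > 30 }
query_a = {t.Last_Name for t in STUDENT if t.Age > 30}

# B. { t | Student(t) AND t.Last_Name = 'Eric' }
query_b = [t for t in STUDENT if t.Last_Name == "Eric"]
```

Here `t` is bound to one whole tuple at a time, and the condition after `if` corresponds to the formula on the right of the bar.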
Domain Relational Calculus (DRC)

In domain relational calculus, filtering is done based on the domains of the attributes rather than on whole tuples.
Syntax: { c1, c2, c3, ..., cn | F(c1, c2, c3, ..., cn) }
where c1, c2, ... are domain variables that range over the domains of the attributes (columns), and F is the formula that includes the condition for fetching the data.
We must have n of these domain variables, one for each attribute.
For example,
{ <name, age> | <name, age> ∈ Student ∧ age > 17 }
The above query will return the names and ages of the students in the table Student who are older than 17.
DRC Example
Consider two relations
◦ EMP(Name, MGR, DEPT, SAL)
◦ CHILDREN(Ename, Cname, Age)
Q1: Retrieve the salary and children’s names of employees whose manager is ‘white’.
{ s, c | (∃u)(∃v)(∃w)(∃x)(∃y) (EMP(u, v, w, s) AND CHILDREN(x, c, y) AND
/* initiate domain variables */
u = x AND /* join condition */
v = ‘white’) } /* selection condition */
/* projection is implied (s, c) */
Domain Relational Calculus (DRC)

Student
First_Name  Last_Name  Age
Rachel      Eric       30
Samantha    Eric       31
Roy         Johnson    27
Carl        Pratap     28

E.g. Query to find the first name and age of students whose age is greater than 27:
{ <First_Name, Age> | <First_Name, Last_Name, Age> ∈ Student ∧ Age > 27 } or
{ f, a | (∃l) (Student(f, l, a) AND a > 27) }
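In contrast to TRC, each DRC variable stands for a single field value. One way to picture this, offered as an informal sketch rather than a formal semantics, is tuple unpacking in a Python comprehension, with one variable per attribute.

```python
# The Student table as (First_Name, Last_Name, Age) rows.
STUDENT = [
    ("Rachel", "Eric", 30),
    ("Samantha", "Eric", 31),
    ("Roy", "Johnson", 27),
    ("Carl", "Pratap", 28),
]

# { f, a | (exists l) (Student(f, l, a) AND a > 27) }
# f, l and a are domain variables: each one is bound to a field value,
# and l is "existentially quantified" simply by not appearing in the output.
result = {(f, a) for (f, l, a) in STUDENT if a > 27}
```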
Examples
Query 1. List the name and address of all employees who work for the ‘Research’ department. (TRC)

Query 2. For every project located in ‘Stafford’, list the project number, the controlling department number, and the department manager’s last name, birth date, and address. (TRC)

Query 3. List the birth date and address of the employee whose name is ‘John B. Smith’. (DRC)
Cont’d
Query 1. List the name and address of all employees who work for the ‘Research’ department.
Q1: { t.Fname, t.Lname, t.Address | EMPLOYEE(t) AND (∃d)(DEPARTMENT(d) AND d.Dname=‘Research’ AND d.Dnumber=t.Dno) }
Query 2. For every project located in ‘Stafford’, list the project number, the controlling department number, and the department manager’s last name, birth date, and address.
Q2: { p.Pnumber, p.Dnum, t.Lname, t.Bdate, t.Address | PROJECT(p) AND EMPLOYEE(t) AND p.Plocation=‘Stafford’ AND ((∃d)(DEPARTMENT(d) AND p.Dnum=d.Dnumber AND d.Mgr_ssn=t.Ssn)) }
Query 3. List the birth date and address of the employee whose name is ‘John B. Smith’.
Q3: { u, v | (∃q)(∃r)(∃s)(∃t)(∃w)(∃x)(∃y)(∃z) (EMPLOYEE(qrstuvwxyz) AND q=‘John’ AND r=‘B’ AND s=‘Smith’) }
Chapter 5 : QUERY PROCESSING AND OPTIMIZATION

Query processing is the procedure of transforming a high-level query (such as SQL) into a correct and efficient execution plan expressed in a low-level language.

When a database system receives a query for update or retrieval of information, the query goes through a series of compilation steps that produce an execution plan.
Query Analysis

The query is analyzed syntactically, using the techniques of programming language compilers (a parser), and its attributes and types are validated.
Example: SELECT emp_Fname FROM EMPLOYEE WHERE emp_Lname > 100
This query will be rejected because the comparison “>100” is incompatible with the data type of emp_Lname, which is a character string.
At the end of the query analysis phase, the high-level query (SQL) is transformed into some internal representation that is more suitable for processing.
This internal representation is typically a kind of query tree.
There are different strategies for query processing.
Query processing goes through various phases:

The first phase is the syntax checking phase: the system parses the query and checks whether it follows the syntax rules. It then matches the objects named in the query with the views, tables and columns listed in the system tables.

In the second phase, the SQL query is translated into an algebraic expression. The process of transforming a high-level SQL query into a relational algebraic form is called query decomposition. The relational algebraic expression is then passed to the query optimizer.

In the third phase, optimization is performed by substituting equivalent expressions.
The query optimization module works with the join manager module to improve the order in which joins are performed.
At this stage, the cost model and several other estimation formulas are used to rewrite the query.

• The modified query is written to make good use of system resources and give the best performance. The resulting action plan is converted into query code that is finally executed by the run-time database processor.
Strategies for query processing

Algorithms for External Sorting

External sorting is the sorting of relations that do not fit in main memory because they are larger than the available memory.

Sorting is one of the primary algorithms used in query processing.

For example, whenever an SQL query specifies an ORDER BY clause, the query result must be sorted.

Sorting is also a key component in sort-merge algorithms used for JOIN and other operations (such as UNION and INTERSECTION).
Cont’d
The typical external sorting algorithm uses a sort-merge strategy, which starts by sorting small subfiles (called runs) of the main file and then merges the sorted runs, creating larger sorted subfiles that are merged in turn.

It consists of two phases: the sorting phase and the merging phase.
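The two phases can be sketched in a few lines. This is a simplified, in-memory model under stated assumptions: each run is a Python list standing in for a sorted subfile on disk, `run_size` stands in for the number of records that fit in the buffer, and all runs are merged in a single pass with `heapq.merge`, whereas a real system may need several merge passes.

```python
import heapq


def external_sort(records, run_size):
    """Sort-merge sketch: sort fixed-size runs, then merge the sorted runs."""
    # Sorting phase: each chunk that "fits in memory" is sorted on its own.
    runs = [
        sorted(records[i:i + run_size])
        for i in range(0, len(records), run_size)
    ]
    # Merging phase: a k-way merge reads the sorted runs side by side.
    return list(heapq.merge(*runs))


# Hypothetical key values used only for illustration.
keys = [58, 22, 31, 44, 28, 95, 71, 3]
sorted_keys = external_sort(keys, run_size=3)
```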
Access structures in DB
Indexing is a data structure technique for efficiently retrieving records from database files based on the attributes on which the index has been built.
Indexing in database systems is similar to the index we see in books.
Because the index file is typically much smaller than the data file and is ordered on the indexing field, it can be searched efficiently with a binary search.
Indexing can be of the following types:
Primary Index − A primary index is defined on an ordered data file. The data file is ordered on a key field. The key field is generally the primary key of the relation.
Clustering Index − A clustering index is defined on an ordered data file. The data file is ordered on a non-key field.
Secondary Index − A secondary index may be built on a field that is a candidate key and has a unique value in every record, or on a non-key field with duplicate values.
Hashing
For hashing, the search condition must be an equality condition on a single field, called the hash field. In most cases, the hash field is also a key field of the file, in which case it is called the hash key.

The idea behind hashing is to provide a function h, called a hash function or randomizing function, which is applied to the hash field value of a record and yields the address of the disk block in which the record is stored.

For internal (in-memory) files, hashing is typically implemented as a hash table through the use of an array of records.
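A minimal sketch of the idea follows, assuming a modulo hash function and a small fixed number of buckets (both are illustrative choices, not prescribed by the lecture). Each bucket stands in for one disk block.

```python
NUM_BUCKETS = 4  # assumed number of blocks, chosen only for the example

# One list per bucket; each bucket models a disk block holding records.
buckets = [[] for _ in range(NUM_BUCKETS)]


def h(hash_field_value):
    """Hash function: maps a hash field value to a bucket address."""
    return hash_field_value % NUM_BUCKETS


def insert(record):
    """Store the record in the bucket chosen by its hash field (sid)."""
    buckets[h(record[0])].append(record)


def lookup(sid):
    """Equality search on the hash field: only one bucket is examined."""
    for record in buckets[h(sid)]:
        if record[0] == sid:
            return record
    return None


for sailor in [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)]:
    insert(sailor)

found = lookup(58)
missing = lookup(99)
```

Note that 22 and 58 both hash to bucket 2 here, so the bucket holds two records; the lookup still scans only that one bucket.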
Algorithms for Search (select)
For simple selection: A number of search algorithms are possible for selecting records from a file. These are also known as file scans. If the search algorithm involves the use of an index, the index search is called an index scan.
Linear search (brute force algorithm): Retrieve every record in the file, and test whether its attribute values satisfy the selection condition. (Each block is read from disk into a buffer.)
Binary search: Applies if the selection condition involves an equality comparison on a key attribute on which the file is ordered. It compares the search value with the middle record and continues in the half that can contain it.
Using a primary index: Applies if the selection condition involves an equality comparison on a key attribute with a primary index. E.g. σ Ssn = ‘123456789’ (EMPLOYEE)
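The difference between the first two methods can be sketched as below. The EMPLOYEE records are hypothetical sample data, and the file is modelled as a list already ordered on Ssn; the binary search uses the standard library's `bisect_left`.

```python
from bisect import bisect_left

# Hypothetical EMPLOYEE file of (Ssn, Lname) records, ordered on the key Ssn.
EMPLOYEE = [
    ("123456789", "Smith"),
    ("333445555", "Wong"),
    ("453453453", "English"),
    ("999887777", "Zelaya"),
]


def linear_search(file, predicate):
    """Brute force: test every record against the selection condition."""
    return [record for record in file if predicate(record)]


def binary_search(file, ssn):
    """Equality search on the ordering key; halves the range at each step."""
    keys = [record[0] for record in file]
    i = bisect_left(keys, ssn)
    if i < len(keys) and keys[i] == ssn:
        return file[i]
    return None


by_scan = linear_search(EMPLOYEE, lambda r: r[0] == "123456789")
by_bisect = binary_search(EMPLOYEE, "123456789")
```

The linear search works for any condition, while the binary search relies on the file being ordered on the attribute used in the equality comparison.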
Cont’d
Search Methods for Complex Selection.

If a condition of a SELECT operation is a conjunctive condition (that is, if it is made up of several simple conditions connected with the AND logical connective), the DBMS can use one of the following methods:

Conjunctive selection using an individual index.

Conjunctive selection using a composite index.

Conjunctive selection by intersection of record pointers.

Reading Assignment
1. Read on how to Translate SQL into the Relational Algebra and vice versa
2. Detail reading on Indexing and hashing techniques
Fundamentals of DB systems 6th edition page 583
Algorithms for Project and set Operations
Algorithm for PROJECT operations
π <attribute list>(R)
1. If <attribute list> has a key of relation R, extract all tuples from R with only the values for the attributes in <attribute list>.
2. If <attribute list> does NOT include a key of relation R, duplicated tuples must be removed from the results.

Of all the operations, the CARTESIAN PRODUCT operation is very expensive and should be avoided if possible.
Cont’d
1. UNION
Sort the two relations on the same attributes.
Scan and merge both sorted files concurrently; whenever the same tuple exists in both relations, only one is kept in the merged results.
2. INTERSECTION
Sort the two relations on the same attributes.
Scan and merge both sorted files concurrently; keep in the merged results only those tuples that appear in both relations.
3. SET DIFFERENCE R − S
Sort the two relations on the same attributes.
Scan and merge both sorted files concurrently; keep in the merged results only those tuples that appear in relation R but not in relation S.
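The three procedures share the same scan-and-merge loop and differ only in which tuples they keep. The following sketch makes that explicit; `sort_merge` and its `op` parameter are names invented for this example, and each input is assumed to contain no duplicate tuples.

```python
def sort_merge(r, s, op):
    """Sort both inputs, then scan them side by side.

    op is one of "union", "intersection" or "difference" (meaning r - s).
    """
    r, s = sorted(r), sorted(s)
    i = j = 0
    out = []
    while i < len(r) and j < len(s):
        if r[i] == s[j]:
            # Same tuple in both inputs: keep a single copy where needed.
            if op in ("union", "intersection"):
                out.append(r[i])
            i += 1
            j += 1
        elif r[i] < s[j]:
            # Tuple appears only in r.
            if op in ("union", "difference"):
                out.append(r[i])
            i += 1
        else:
            # Tuple appears only in s.
            if op == "union":
                out.append(s[j])
            j += 1
    # Whatever is left in one input has no match in the other.
    if op in ("union", "difference"):
        out.extend(r[i:])
    if op == "union":
        out.extend(s[j:])
    return out


# sid values of S1 and S2 from the earlier examples.
R = [58, 22, 31]
S = [44, 58, 28, 31]
merged_union = sort_merge(R, S, "union")
merged_intersection = sort_merge(R, S, "intersection")
merged_difference = sort_merge(R, S, "difference")
```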
Combining Operations using Pipelining

A query is mapped into a sequence of operations.

Each execution of an operation produces a temporary result.

Generating and saving temporary files on disk is time consuming and expensive.

Alternative:

Avoid constructing temporary results as much as possible.

Pipeline the data through multiple operations: pass each result tuple of one operator directly to the next operator, without saving the complete intermediate result first.
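Python generators give a convenient way to picture pipelining, since each operator hands rows to the next one as soon as they are produced and no intermediate list is built. This is only an analogy under that assumption; the operator names are illustrative, and the pipelined `project` here does not eliminate duplicates.

```python
def scan(relation):
    """Leaf operator: produce the rows of a stored relation one at a time."""
    for row in relation:
        yield row


def select(rows, predicate):
    """Pass on each qualifying row as soon as it arrives."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, positions):
    """Keep only the fields at the given positions (no duplicate removal)."""
    for row in rows:
        yield tuple(row[p] for p in positions)


# S2(sid, sname, rating, age) from the earlier examples.
S2 = [
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
]

# pi sname,rating (sigma rating>8 (S2)) as a pipeline of generators.
pipeline = project(select(scan(S2), lambda row: row[2] > 8), (1, 2))
first_row = next(pipeline)        # produced without reading all of S2
remaining_rows = list(pipeline)
```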
Parallel query processing

Parallel query processing designates the transformation of high-level queries into execution plans that can be efficiently executed in parallel on a multiprocessor computer.

Parallel query processing can improve the performance of the following types of queries: select statements that scan large numbers of pages but return relatively few rows, such as table scans or clustered index scans with grouped or ungrouped aggregates.

E.g. SELECT * FROM Vehicles ORDER BY Model_Number;


Query Tree
A query tree is a tree data structure that corresponds to a relational algebra expression. A query tree is also called a relational algebra tree.

Leaf nodes of the tree represent the base input relations of the query.

Internal nodes represent the result of applying an operation in the algebra.

The root of the tree represents the result of the query.

A relational algebra expression may have many equivalent expressions.

E.g. σbalance<2500(πbalance(account)) ≡ πbalance(σbalance<2500(account))
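The equivalence above can be checked on a small instance by building both query trees and evaluating them. In this sketch the node classes and the rows of `account` are hypothetical, introduced only to make the tree structure concrete.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Relation:
    """Leaf node: a base input relation (rows are dicts)."""
    rows: list

    def evaluate(self):
        return list(self.rows)


@dataclass
class Select:
    """Internal node: selection applied to the child's result."""
    predicate: Callable
    child: object

    def evaluate(self):
        return [row for row in self.child.evaluate() if self.predicate(row)]


@dataclass
class Project:
    """Internal node: projection with duplicate elimination."""
    fields: tuple
    child: object

    def evaluate(self):
        seen, out = set(), []
        for row in self.child.evaluate():
            key = tuple(row[f] for f in self.fields)
            if key not in seen:
                seen.add(key)
                out.append({f: row[f] for f in self.fields})
        return out


# Hypothetical account relation used only for illustration.
account = Relation([
    {"acc_no": 1, "balance": 1000},
    {"acc_no": 2, "balance": 3000},
    {"acc_no": 3, "balance": 1000},
])


def low_balance(row):
    return row["balance"] < 2500


# Two trees with different shapes; the root of each yields the query result.
plan_a = Select(low_balance, Project(("balance",), account))
plan_b = Project(("balance",), Select(low_balance, account))
result_a = plan_a.evaluate()
result_b = plan_b.evaluate()
```

Both trees return the same relation on this instance, although they do different amounts of intermediate work, which is the kind of difference a query optimizer looks at when choosing between equivalent expressions.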


Example
SELECT P.proj_no, P.dept_no, E.name, E.add, E.dob
FROM PROJECT P, DEPARTMENT D, EMPLOYEE E
WHERE D.mgr_id = E.emp_id AND P.dept_no = D.d_no AND P.proj_loc = 'Mumbai';
