Relational Operators

The document discusses relational database operators like selection, projection, join, set difference, and union. It explains that relational operators can be composed to process complex queries more efficiently. The main relational operators - selection, projection, join, set difference, union and aggregation - are introduced along with examples of join algorithms like nested loops join, block nested loops join, sort merge join and hash join. Optimizing the order of composing relational operators can improve query performance.

Uploaded by

Gilbert Heß

Relational Operators

First comes thought; then organization of that thought, into ideas and plans; then transformation of those plans into reality. The beginning, as you will observe, is in your imagination.

Napoleon Hill
CS3223 - Relational Operators 1
Introduction
• We’ve covered the basic underlying storage, buffering, and indexing technology
• Now we are ready to move on to query processing
• Some database operations are EXPENSIVE
• Can greatly improve performance by being “smart”
    • e.g., can speed up 1,000,000x over naïve approach
• Main approaches are:
    • clever implementation techniques for operators
    • exploit “equivalences” of relational operators
    • use statistics and cost models to choose among these


Steps of processing a high-level query

[Figure: query processing pipeline. High-level query → Parser → parsed query → Query Optimizer (uses database statistics and a cost model) → QEP → Query Evaluator → query result.]

Example high-level query:
SELECT * FROM EMP
WHERE SAL > 50k

Candidate plans:
• P1: Sequential Scan
• P2: Use SAL index


Relational Operations
• We will consider how to implement:
    • Selection (σ): Selects a subset of rows from relation.
    • Projection (π): Deletes unwanted columns from relation.
    • Join (⋈): Allows us to combine two relations.
    • Set-difference (-): Tuples in reln. 1, but not in reln. 2.
    • Union (∪): Tuples in reln. 1 or in reln. 2.
    • Aggregation (SUM, MIN, etc.) and GROUP BY

Since each op returns a relation, ops can be composed!

Queries that require multiple ops may be composed in different ways, so optimization is necessary for good performance.
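
To make the composition point concrete, here is a minimal sketch that models a relation as a list of dicts. The function names (`select`, `project`, `union`, `difference`) and the sample rows are illustrative assumptions, not part of the course material. Because every function returns a new relation, its output can be fed directly into the next operator.

```python
# Relations are modeled as lists of dicts (an assumption for illustration).

def select(rel, pred):
    """Selection: keep the rows that satisfy pred."""
    return [row for row in rel if pred(row)]

def project(rel, cols):
    """Projection: keep only cols, dropping duplicate rows."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(cols, key)))
    return out

def union(r1, r2):
    """Union: rows that appear in r1 or in r2, without duplicates."""
    out = list(r1)
    for row in r2:
        if row not in out:
            out.append(row)
    return out

def difference(r1, r2):
    """Set-difference: rows in r1 that are not in r2."""
    return [row for row in r1 if row not in r2]

# Hypothetical sample data.
sailors = [
    {"sid": 22, "sname": "dustin", "rating": 7},
    {"sid": 31, "sname": "lubber", "rating": 8},
    {"sid": 58, "sname": "rusty", "rating": 10},
    {"sid": 64, "sname": "horatio", "rating": 3},
]

high = select(sailors, lambda s: s["rating"] > 7)
low = select(sailors, lambda s: s["rating"] < 5)
extremes = union(high, low)

# Operators compose because each one returns a relation.
names_high = project(high, ["sname"])
names_middle = project(difference(sailors, extremes), ["sname"])
print(names_high)
print(names_middle)
```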


Example

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND
      R.bid=100 AND S.rating>5

[Figure: three alternative query trees for this query. Each projects sname at the root and joins Reserves and Sailors on sid=sid; the trees differ in where the selections bid=100 and rating > 5 are applied relative to the join.]
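
The sketch below evaluates the example query under two plausible operator orderings and compares the size of the intermediate join result. The sample rows and the helper `join` are hypothetical, and the two plans are illustrations only, not necessarily the exact trees drawn in the original figure.

```python
def join(r, s, key):
    """Naive equality join on a shared column name."""
    return [{**a, **b} for a in r for b in s if a[key] == b[key]]

# Hypothetical sample data.
reserves = [
    {"sid": 22, "bid": 100},
    {"sid": 22, "bid": 101},
    {"sid": 31, "bid": 100},
    {"sid": 58, "bid": 103},
    {"sid": 64, "bid": 100},
]
sailors = [
    {"sid": 22, "sname": "dustin", "rating": 7},
    {"sid": 31, "sname": "lubber", "rating": 8},
    {"sid": 58, "sname": "rusty", "rating": 10},
    {"sid": 64, "sname": "horatio", "rating": 3},
]

# Plan A: join first, then apply both selections, then project sname.
joined_a = join(reserves, sailors, "sid")
plan_a = sorted({row["sname"] for row in joined_a
                 if row["bid"] == 100 and row["rating"] > 5})

# Plan B: apply each selection to its own relation, then join, then project.
r100 = [r for r in reserves if r["bid"] == 100]
s5 = [s for s in sailors if s["rating"] > 5]
joined_b = join(r100, s5, "sid")
plan_b = sorted({row["sname"] for row in joined_b})

print(plan_a, len(joined_a))
print(plan_b, len(joined_b))
```

Both plans return the same answer, but Plan B joins smaller inputs and produces a smaller intermediate result, which is the kind of difference an optimizer tries to exploit.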


Paradigm
• Iteration-based
• Index
    • B+-tree, Hash
    • assume index entries to be (rid, pointer) pair
    • Clustered, Unclustered
• Sort
• Hash


Schema for Examples
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)

• Reserves (R):
    • ||R|| - number of tuples
    • |R| - number of pages
    • pR tuples per page, |R| = M. Let pR = 100, M = 1000, ||R|| = 100*1000 = 100,000
• Sailors (S):
    • pS tuples per page, |S| = N. Let pS = 80, N = 500, ||S|| = 80*500 = 40,000
• Cost metric: # of I/Os (pages)
• We will ignore output costs in the following discussion


Equality Joins With One Join Column

SELECT *
FROM Reserves R, Sailors S
WHERE R.sid=S.sid

[Figure: query tree joining Sailors and Reserves on sid=sid.]

• In algebra: R ⋈ S
• Most frequently used operation; very costly operation
• ||R ⋈ S|| = join selectivity × (||R|| × ||S||)
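
As a worked illustration of the size formula, the sketch below plugs in the tuple counts from the example schema. The selectivity value is an assumption made only for this example: it supposes that sid is a key of Sailors and that every Reserves tuple refers to exactly one existing sailor, so each R tuple matches exactly one S tuple. The function name `join_size` is likewise hypothetical.

```python
from fractions import Fraction

R_TUPLES = 100 * 1000   # ||R||, from the example schema
S_TUPLES = 80 * 500     # ||S||, from the example schema

def join_size(selectivity, r_tuples, s_tuples):
    """Estimated result size: join selectivity x (||R|| x ||S||)."""
    return selectivity * r_tuples * s_tuples

# Assumption: each R tuple joins with exactly one S tuple,
# so the join selectivity is 1 / ||S||.
selectivity = Fraction(1, S_TUPLES)
estimate = join_size(selectivity, R_TUPLES, S_TUPLES)
print(estimate)
```

Under that assumption the estimate equals ||R||, since every reservation contributes exactly one result tuple.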

Join Example


Join Algorithms
• Iteration-based
    • Block nested loop
• Index-based
    • Index nested loop
• Sort-based
    • Sort-merge join
• Partition-based
    • Hash join


Join Algorithms
• Things to consider when choosing an algorithm
    • Types of join predicates
        • Equality predicates (e.g., R.A = S.B)
        • Inequality predicates (e.g., R.A < S.B)
    • Sizes of join operands
    • Available buffer space
    • Available access methods


Simple (Tuple-based) Nested Loops Join
foreach tuple r in R do
    foreach tuple s in S do
        if r.sid == s.sid then add <r, s> to result

• For each tuple in the outer relation R, we scan the entire inner relation S
• I/O cost: |R| + ||R|| x |S| (scan R once, plus one full scan of S per tuple of R)
    • M + (pR * M) * N = 1000 + 100*1000*500 = 50,001,000 I/Os
• Memory: 3 pages! (one input page for R, one input page for S, one output page)
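
The cost formula can be checked on a toy instance. The sketch below is an illustration under stated assumptions: relations are lists of pages, each page is a list of `(sid, payload)` tuples, and the `Disk` class is a hypothetical stand-in that only counts page reads. It is not an actual buffer manager.

```python
def paginate(tuples, per_page):
    """Split a list of tuples into fixed-size pages."""
    return [tuples[i:i + per_page] for i in range(0, len(tuples), per_page)]

class Disk:
    """Hypothetical disk that counts one I/O per page read."""
    def __init__(self):
        self.reads = 0

    def scan(self, pages):
        for page in pages:
            self.reads += 1
            yield page

def simple_nlj(disk, R, S):
    """Tuple-based nested loops join on the first field (sid)."""
    result = []
    for r_page in disk.scan(R):
        for r in r_page:
            # The whole inner relation is rescanned for every outer tuple.
            for s_page in disk.scan(S):
                for s in s_page:
                    if r[0] == s[0]:
                        result.append((r, s))
    return result

def simple_nlj_cost(M, pR, N):
    """|R| + ||R|| x |S| with |R| = M, ||R|| = pR * M, |S| = N."""
    return M + (pR * M) * N

# Toy instance: R has 6 tuples on 3 pages, S has 4 tuples on 2 pages.
R = paginate([(i % 4, f"r{i}") for i in range(6)], 2)
S = paginate([(i, f"s{i}") for i in range(4)], 2)

disk = Disk()
result = simple_nlj(disk, R, S)
print(disk.reads, simple_nlj_cost(M=3, pR=2, N=2))
print(simple_nlj_cost(M=1000, pR=100, N=500))
```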

Can we do better??
Page-based Nested Loop Join
for each page PR of R do
    for each page PS of S do
        for each tuple r ∈ PR do
            for each tuple s ∈ PS do
                if (r.sid == s.sid) then output (r, s) to result

• I/O cost = |R| + |R| x |S| = M + M * N = 1000 + 1000*500 = 501,000 I/Os

• Memory = 3 pages!
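
A minimal sketch of the page-based variant follows, again under illustrative assumptions: a relation is a list of pages, a page is a list of `(sid, payload)` tuples, and the function `page_nlj` counts one I/O per page it touches. These names are hypothetical and not from the course material.

```python
def paginate(tuples, per_page):
    """Split a list of tuples into fixed-size pages."""
    return [tuples[i:i + per_page] for i in range(0, len(tuples), per_page)]

def page_nlj(R_pages, S_pages):
    """Page-based nested loops join; returns (result, page_reads)."""
    result, reads = [], 0
    for r_page in R_pages:
        reads += 1                      # read one page of R
        for s_page in S_pages:
            reads += 1                  # read one page of S
            # Join the two in-memory pages tuple by tuple.
            for r in r_page:
                for s in s_page:
                    if r[0] == s[0]:
                        result.append((r, s))
    return result, reads

# Toy instance: R has 3 pages, S has 2 pages.
R_pages = paginate([(i % 4, f"r{i}") for i in range(6)], 2)
S_pages = paginate([(i, f"s{i}") for i in range(4)], 2)

result, reads = page_nlj(R_pages, S_pages)
print(len(result), reads)          # reads follows |R| + |R| x |S|

# Same formula with the slide parameters M = 1000, N = 500.
M, N = 1000, 500
slide_cost = M + M * N
print(slide_cost)
```

S is now scanned once per page of R rather than once per tuple of R, which is where the factor-of-pR saving comes from.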
Can we do better??


Block Nested Loops Join
• Motivation: How to better exploit buffer space to minimize number of I/Os?

[Figure: three ways of joining R with S]
• For each R tuple, scan S (memory size = 3 pages). Number of iterations of S: ||R||
• For each page of R tuples, scan S (memory = 3 pages). Number of iterations: |R|
• For each block of R tuples, scan S (memory > 3 pages). Block size (B): buffer size – 2. Number of iterations: |R|/B
Block Nested Loops Join
• Use one page as an input buffer for scanning the inner S, one page as the output buffer, and use all remaining pages (buffer size - 2) to hold a “block” of outer R
• For each matching tuple r in R-block, s in S-page, add (r, s) to result. Then read the next R-block, scan S, etc.
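
The block-based strategy can be sketched as follows. This is an illustration under assumptions, not the course's reference implementation: relations are lists of pages of `(sid, payload)` tuples, `buffer_pages` is the total number of buffer pages, and the value 102 used for the slide-sized estimate is a hypothetical buffer size chosen only to make the arithmetic easy to follow.

```python
import math

def paginate(tuples, per_page):
    """Split a list of tuples into fixed-size pages."""
    return [tuples[i:i + per_page] for i in range(0, len(tuples), per_page)]

def block_nlj(R_pages, S_pages, buffer_pages):
    """Block nested loops join; returns (result, page_reads)."""
    if buffer_pages < 3:
        raise ValueError("need at least 3 buffer pages")
    block_pages = buffer_pages - 2      # one page for S input, one for output
    result, reads = [], 0
    for start in range(0, len(R_pages), block_pages):
        block = R_pages[start:start + block_pages]
        reads += len(block)             # read the next block of R
        r_tuples = [r for page in block for r in page]
        for s_page in S_pages:          # one scan of S per R-block
            reads += 1
            for s in s_page:
                for r in r_tuples:
                    if r[0] == s[0]:
                        result.append((r, s))
    return result, reads

def block_nlj_cost(M, N, buffer_pages):
    """|R| + ceil(|R| / (buffer size - 2)) x |S|."""
    return M + math.ceil(M / (buffer_pages - 2)) * N

# Toy instance: R has 3 pages, S has 2 pages, 4 buffer pages.
R_pages = paginate([(i % 4, f"r{i}") for i in range(6)], 2)
S_pages = paginate([(i, f"s{i}") for i in range(4)], 2)
result, reads = block_nlj(R_pages, S_pages, buffer_pages=4)
print(len(result), reads, block_nlj_cost(3, 2, 4))

# Slide-sized relations with a hypothetical 102-page buffer.
print(block_nlj_cost(M=1000, N=500, buffer_pages=102))
```

With only 3 buffer pages the block holds a single page of R and the cost falls back to the page-based figure.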