Course Notes on Relational Algebra

Relational Algebra: Summary


• Operators
  ✓ Selection
  ✓ Projection
  ✓ Union, Intersection, Difference
  ✓ Cartesian Product
  ✓ Join
  ✓ Division
• Equivalences
• Outer Join, Outer Union
• Transitive Closure

Relational Algebra, October 9, 2008 – 1


What is the Relational Algebra?
• Relational algebra =
  ✓ a collection of operations, each acting on one or two relations and producing
    one relation as result, and
  ✓ a language for combining those operations
• The algebra has played a central role in the relational model: algebraic operations
characterize high-level set-at-a-time access
• The algebra in practice
  ✓ it was never a real user language (calculus-based languages and SQL are
    simpler)
  ✓ its semantics is clear and a de facto standard
  ✓ a precise syntax for the algebra is more complicated than its semantics

Relational Schema for the Company Example


Employee
SSN FName MInit LName BDate Address Sex Salary SuperSSN DNo

Department
DNumber DName MgrSSN MgrStartDate

DeptLocations
DNumber DLocation

Project
PNumber PName PLocation DNumber

WorksOn
PNo ESSN Hours

Dependent
ESSN DependentName Sex BDate Relationship



Company Example: Population of the Database (1)
Employee
FName MInit LName SSN BDate Address Sex Salary SuperSSN DNo
John B Smith 123456789 09-Jan-55 ... M 30000 333445555 5
Franklin T Wong 333445555 08-Dec-45 ... M 40000 888665555 5
Alicia J Zelaya 999887777 19-Jul-58 ... F 25000 987654321 4
Jennifer S Wallace 987654321 20-Jul-31 ... F 43000 888665555 4
Ramesh K Narayan 666884444 15-Sep-52 ... M 38000 333445555 5
Joyce A English 453453453 31-Jul-62 ... F 25000 333445555 5
Ahmad V Jabbar 987987987 29-Mar-59 ... M 25000 987654321 4
James E Borg 888665555 10-Nov-27 ... M 55000 null 1
Department
DNumber DName MgrSSN MgrStartDate
5 Research 333445555 22-May-78
4 Administration 987654321 01-Jan-85
1 Headquarters 888665555 19-Jun-71

DeptLocations
DNumber DLocation
1 Houston
4 Stafford
5 Bellaire
5 Sugarland
5 Houston
Project
PNumber PName PLocation DNum
1 ProductX Bellaire 5
2 ProductY Sugarland 5
3 ProductZ Houston 5
10 Computerization Stafford 4
20 Reorganization Houston 1
30 Newbenefits Stafford 4

Company Example: Population of the Database (2)


WorksOn
ESSN PNo Hours
123456789 1 32.5
123456789 2 7.5
666884444 3 40
453453453 1 20
453453453 2 20
333445555 2 10
333445555 3 10
333445555 10 10
333445555 20 10
999887777 30 30.0
999887777 10 10.0
987987987 10 35.0
987987987 30 5.0
987654321 30 20.0
987654321 20 15.0
888665555 20 null
Dependent
ESSN DependentName Sex BDate Relationship
333445555 Alice F 05-Apr-76 Daughter
333445555 Theodore M 25-Oct-73 Son
333445555 Joy F 03-May-48 Spouse
987654321 Abner M 29-Feb-32 Spouse
123456789 Michael M 01-Jan-78 Son
123456789 Alice F 31-Dec-78 Daughter
123456789 Elizabeth F 05-May-57 Spouse



Selection (or Restriction)
• Select tuples satisfying a condition

σ_{condition}(R) = {r ∈ R | condition(r)}

• Example: σ_{DNo=5}(Employee)

SSN FName MInit LName BDate Address Sex Salary SuperSSN DNo
123456789 John B Smith ... ... M 30000 333445555 5
333445555 Franklin T Wong ... ... M 40000 888665555 5
666884444 Ramesh K Narayan ... ... M 38000 333445555 5
453453453 Joyce A English ... ... F 25000 333445555 5

• Intuition: selection is a “horizontal” slice of relation Employee


• Operational meaning: the condition is applied to every tuple; if it is satisfied,
the tuple is kept in the answer
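
The operational meaning of selection can be sketched in a few lines of Python. This is only an illustration, assuming a relation is held in memory as a list of dicts; the helper name `select` and the reduced attribute set are choices made for the example, not part of the notes.

```python
# Assumed representation: a relation is a list of dicts (one dict per tuple).
employee = [
    {"SSN": "123456789", "LName": "Smith", "Salary": 30000, "DNo": 5},
    {"SSN": "333445555", "LName": "Wong", "Salary": 40000, "DNo": 5},
    {"SSN": "999887777", "LName": "Zelaya", "Salary": 25000, "DNo": 4},
    {"SSN": "987654321", "LName": "Wallace", "Salary": 43000, "DNo": 4},
    {"SSN": "666884444", "LName": "Narayan", "Salary": 38000, "DNo": 5},
    {"SSN": "453453453", "LName": "English", "Salary": 25000, "DNo": 5},
    {"SSN": "987987987", "LName": "Jabbar", "Salary": 25000, "DNo": 4},
    {"SSN": "888665555", "LName": "Borg", "Salary": 55000, "DNo": 1},
]

def select(relation, condition):
    """Apply the condition to every tuple and keep those that satisfy it."""
    return [t for t in relation if condition(t)]

# Selection of the employees of department 5.
dept5 = select(employee, lambda t: t["DNo"] == 5)
dept5_names = [t["LName"] for t in dept5]
```

The condition is passed as a function, so combinations with `and`, `or`, and `not` need no extra machinery.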

Possible Forms of Conditions in Selection


• Simple condition = ⟨attribute⟩ ⟨comparison⟩ ⟨attribute⟩
                  or ⟨attribute⟩ ⟨comparison⟩ ⟨constant⟩

• Comparisons: =, ≠, <, ≤, >, ≥


• Condition: combination of simple conditions with AND, OR, NOT
• Simple conditions are the most frequent



Projection
• Retain a subset of the attributes (columns) of a relation

π_{attributes}(relation)

• Example: π_{LName,FName,Salary}(Employee)


LName FName Salary
Smith John 30000
Wong Franklin 40000
Zelaya Alicia 25000
Wallace Jennifer 43000
Narayan Ramesh 38000
English Joyce 25000
Jabbar Ahmad 25000
Borg James 55000
• Intuition: projection is a “vertical” slice of relation Employee

“Duplicate Removal”
• The result of a projection is a relation ⇒ projection involves duplicate removal
• Example: π_{DNo}(Employee)

Before duplicate removal    After duplicate removal
DNo                         DNo
5                           5
5                           4
4                           1
4
5
5
4
1



• What is wrong with duplicate tuples?

  ✓ “If something is true, then saying it twice does not make it more true” (Codd)
    Even if repetition is a proven pedagogical technique :-)
  ✓ Identical things are indistinguishable; there is no need to represent them twice
  ✓ “Objects in the real world have only one thing in common: they are all different”
    (anonymous, A. Taivalsaari, JOOP, Nov. 1997)
  ✓ Distinct things should have some value to distinguish them
  ✓ It is difficult to manipulate “duplicate tuples”, except for counting, averaging, etc.
  ✓ The obvious modeling of identical objects is as a common description plus the
    number of copies of the object

• Duplicate removal (from a multiset to a set) can be expensive; it can be done with
  ✓ nested loops
  ✓ sort/merge
  ✓ hashing
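
Two of the duplicate-removal strategies listed above (hashing and sorting) might look as follows in Python. This is a sketch under the assumption that tuples are dicts; the function names `project_hash` and `project_sort` are invented for the illustration.

```python
def project_hash(relation, attributes):
    """Projection with duplicate removal by hashing: remember the tuples already seen."""
    seen = set()
    result = []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result

def project_sort(relation, attributes):
    """Projection with duplicate removal by sorting: equal tuples become neighbours."""
    keys = sorted(tuple(t[a] for a in attributes) for t in relation)
    result = []
    for key in keys:
        if not result or result[-1] != key:
            result.append(key)
    return result

# DNo values of the eight Employee tuples, in table order.
employee_dno = [{"DNo": d} for d in (5, 5, 4, 4, 5, 5, 4, 1)]
by_hash = project_hash(employee_dno, ["DNo"])
by_sort = project_sort(employee_dno, ["DNo"])
```

Both functions return the same set of tuples; only the order differs (first occurrence for hashing, sorted order for sorting), which is irrelevant for a relation.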

A Precise Definition of the Relational Algebra


• Algebraic operations:
  ✓ operate on one or two relations and produce a relation as result
  ✓ the result relation has no name
  ✓ rules are needed to specify the attribute names in the result of algebraic
    operations with two operands


• For a precise syntax and semantics of the algebra, see A Precise Definition of Basic
Relational Notions and the Relational Algebra, A. Pirotte, ACM SIGMOD Record,
13-1, 1982, pp. 30-45.



Combining Algebraic Operations
List the name and salary of employees in department 5

(1) Nested form: π_{FName,LName,Salary}(σ_{DNo=5}(Employee))

(2) Sequential form: Temp ← σ_{DNo=5}(Employee)

Temp
SSN FName MInit LName BDate Address Sex Salary SuperSSN DNo
123456789 John B Smith 09-Jan-55 ... M 30000 333445555 5
333445555 Franklin T Wong 08-Dec-45 ... M 40000 888665555 5
666884444 Ramesh K Narayan 15-Sep-52 ... M 38000 333445555 5
453453453 Joyce A English 31-Jul-62 ... F 25000 333445555 5

R(FirstName, LastName, Salary) ← π_{FName,LName,Salary}(Temp)


R
FirstName LastName Salary
John Smith 30000
Franklin Wong 40000
Ramesh Narayan 38000
Joyce English 25000
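
The equivalence of the nested and sequential forms can be checked with a small Python sketch. It assumes relations are lists of dicts and uses a subset of the Employee population; `select` and `project` are illustrative helpers, not a reference implementation.

```python
employee = [
    {"FName": "John", "LName": "Smith", "Salary": 30000, "DNo": 5},
    {"FName": "Franklin", "LName": "Wong", "Salary": 40000, "DNo": 5},
    {"FName": "Alicia", "LName": "Zelaya", "Salary": 25000, "DNo": 4},
    {"FName": "Ramesh", "LName": "Narayan", "Salary": 38000, "DNo": 5},
    {"FName": "Joyce", "LName": "English", "Salary": 25000, "DNo": 5},
    {"FName": "James", "LName": "Borg", "Salary": 55000, "DNo": 1},
]

def select(relation, condition):
    return [t for t in relation if condition(t)]

def project(relation, attributes):
    # Duplicates are removed so that the result is again a relation (closure).
    seen, result = set(), []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

attrs = ["FName", "LName", "Salary"]

# (1) Nested form: the result of the selection is used directly as operand.
nested = project(select(employee, lambda t: t["DNo"] == 5), attrs)

# (2) Sequential form: a named intermediate relation Temp.
temp = select(employee, lambda t: t["DNo"] == 5)
sequential = project(temp, attrs)
```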


Nesting or Sequencing Operations


• Several relational algebra operations may be needed to express a given request:
  ✓ by nesting several algebraic operations within a single relational algebra
    expression
  ✓ by applying operations one at a time in a sequence of steps and creating
    named intermediate relations by assignment operations
  ✓ the correspondence between nested form and sequential form is immediate
• Nesting and closure
  ✓ the result of an algebraic operation is a relation
  ✓ algebraic operations can be nested like functions
  ✓ closure is essential for the full power of the algebra



• Nesting
  ✓ nesting = classical functional composition:
    ∗ Y3 ← f3(f1(X1, X2), X4, f2(X3)) is equivalent to the sequence
      · Y1 ← f1(X1, X2)
      · Y2 ← f2(X3)
      · Y3 ← f3(Y1, X4, Y2)

• Closure
  ✓ closure = make nesting freely usable for combining operations, i.e., the result of
    every algebraic operation is a relation and it can be used as operand of another
    algebraic operation
  ✓ closure can be violated by
    ∗ the definition of language structure (SQL does)
    ∗ operations whose result is not a relation (e.g., “relations” with duplicate
      tuples)
  ✓ What is really needed is “compositionality”
    ∗ the result of a query can be used as argument for another query
    ∗ this is a version of what is called orthogonality (i.e., the generality of
      combining pieces of the definition of a language)

Set-Theoretical Operations
• Standard operations on sets are automatically applicable to relations
• Union compatibility
  ✓ relations have to be defined on the same domains
  ✓ type compatibility would be more adequate
  ✓ more precise definition: two relations R and S are union-compatible if there
    is a one-to-one correspondence between attributes of R and attributes of S
    such that corresponding attributes are associated with the same domain
• A mechanism for defining the attribute names of the result is needed


• A relation is a set of tuples


• Union, intersection, and difference on union-compatible relations R and S have
their usual meaning:
  ✓ R ∪ S = all tuples (without duplication) in R, in S, or in both
  ✓ R ∩ S = all tuples in both R and S
  ✓ R − S = all tuples in R but not in S

• Union compatibility is similar to type checking in programming languages

Union, Intersection, Difference


Student
FN LN
Susan Yao
Ramesh Shah
Barbara Jones
Amy Ford
Jimmy Wang

Prof
FName LName
John Smith
Ricardo Brown
Susan Yao
Francis Johnson
Ramesh Shah

Student ∪ Prof
FN LN
Susan Yao
Ramesh Shah
Barbara Jones
Amy Ford
Jimmy Wang
John Smith
Ricardo Brown
Francis Johnson

Student ∩ Prof
FN LN
Susan Yao
Ramesh Shah

Student − Prof
FN LN
Barbara Jones
Amy Ford
Jimmy Wang


• Rules must be stated to specify the attribute names of the result; for example
  ✓ the attributes of the first operand
  ✓ the attributes of the second operand
  ✓ explicitly-specified attributes, e.g., R1(FN, LN) ← Student ∪ Prof
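
Since a relation is a set of tuples, the Student and Prof example maps directly onto Python's built-in sets. The sketch below assumes tuples are positional (first name, last name), which sidesteps the attribute-naming question discussed above.

```python
# Positional tuples (first name, last name): both relations are union-compatible.
student = {
    ("Susan", "Yao"), ("Ramesh", "Shah"), ("Barbara", "Jones"),
    ("Amy", "Ford"), ("Jimmy", "Wang"),
}
prof = {
    ("John", "Smith"), ("Ricardo", "Brown"), ("Susan", "Yao"),
    ("Francis", "Johnson"), ("Ramesh", "Shah"),
}

union = student | prof          # tuples in Student, in Prof, or in both
intersection = student & prof   # tuples in both
difference = student - prof     # tuples in Student but not in Prof
```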



Cartesian Product
Temp ← Department × DeptLocations
DNumber DName MgrSSN MgrStartDate DNo DLocation
5 Research 333445555 22-May-78 1 Houston
4 Administration 987654321 01-Jan-85 1 Houston
1 Headquarters 888665555 19-Jun-71 1 Houston
5 Research 333445555 22-May-78 4 Stafford
4 Administration 987654321 01-Jan-85 4 Stafford
1 Headquarters 888665555 19-Jun-71 4 Stafford
5 Research 333445555 22-May-78 5 Bellaire
4 Administration 987654321 01-Jan-85 5 Bellaire
1 Headquarters 888665555 19-Jun-71 5 Bellaire
5 Research 333445555 22-May-78 5 Sugarland
4 Administration 987654321 01-Jan-85 5 Sugarland
1 Headquarters 888665555 19-Jun-71 5 Sugarland
5 Research 333445555 22-May-78 5 Houston
4 Administration 987654321 01-Jan-85 5 Houston
1 Headquarters 888665555 19-Jun-71 5 Houston

• 15 tuples in result, if 3 tuples in Department and 5 tuples in DeptLocations


• The Cartesian product associates every tuple of the first relation with every tuple of
the second one
• The result relation has all the attributes of the operand relations: if the attributes of
the operand relations are not all distinct, some of them must be renamed

• The relational algebra thus needs a renaming operation, with as a possible syntax:
  ✓ rename(relation name, (oldname → newname, ...))
  ✓ in the example: rename(DeptLocations, (DNumber → DNo))
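
A possible Python rendering of renaming followed by the Cartesian product is shown below. It assumes relations are lists of dicts with disjoint attribute names after renaming; `rename` and `cartesian_product` are hypothetical helper names, and only two Department attributes are kept for brevity.

```python
from itertools import product

def rename(relation, mapping):
    """Rename attributes according to mapping {oldname: newname}."""
    return [{mapping.get(k, k): v for k, v in t.items()} for t in relation]

def cartesian_product(r, s):
    """Associate every tuple of r with every tuple of s (attribute names must be distinct)."""
    return [{**x, **y} for x, y in product(r, s)]

department = [
    {"DNumber": 5, "DName": "Research"},
    {"DNumber": 4, "DName": "Administration"},
    {"DNumber": 1, "DName": "Headquarters"},
]
dept_locations = [
    {"DNumber": 1, "DLocation": "Houston"},
    {"DNumber": 4, "DLocation": "Stafford"},
    {"DNumber": 5, "DLocation": "Bellaire"},
    {"DNumber": 5, "DLocation": "Sugarland"},
    {"DNumber": 5, "DLocation": "Houston"},
]

# Without the renaming, the DNumber of DeptLocations would overwrite that of Department.
temp = cartesian_product(department, rename(dept_locations, {"DNumber": "DNo"}))
```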



Semantics of Cartesian Product
• The Cartesian product associates every tuple of one relation with every tuple of
the other (this is not a very useful operation)
• The following operation is more useful
π_{DNumber,DName,MgrSSN,MgrStartDate,DLocation}(σ_{DNumber=DNo}(Temp))

DNumber DName MgrSSN MgrStartDate DLocation
1 Headquarters 888665555 19-Jun-71 Houston
4 Administration 987654321 01-Jan-85 Stafford
5 Research 333445555 22-May-78 Bellaire
5 Research 333445555 22-May-78 Sugarland
5 Research 333445555 22-May-78 Houston

• The result is a join of relations Department and DeptLocations, which associates
each department with its locations

• The restriction retains only the tuples of the Cartesian product where the values of
DNumber and of DNo are equal
• The projection suppresses the DNo attribute

• All these operations (Cartesian product, selection, and projection) can be
expressed as a single algebra operation: the join

• The join is a fundamental operation for meaningfully creating bigger relations from
smaller ones: but it is not always the inverse of projection (see later)



Join
  ✓ Join (⋈) combines two relations into one on the basis of a condition
    R ⋈_{condition} S = {concat(r, s) | r ∈ R ∧ s ∈ S ∧ condition(r, s)}

  ✓ Add the location information to the information about departments

DeptLocs ← Department ⋈_{DNumber=DNo} DeptLocations
DNumber DName MgrSSN MgrStartDate DNo DLocation
5 Research 333445555 22-May-78 5 Bellaire
5 Research 333445555 22-May-78 5 Sugarland
5 Research 333445555 22-May-78 5 Houston
4 Administration 987654321 01-Jan-85 4 Stafford
1 Headquarters 888665555 19-Jun-71 1 Houston


  ✓ Remember that, if R and S have an attribute in common, then some attributes
    of R and/or S have to be renamed
  ✓ The condition can be more general than a test for equality of 2 attribute values
    ∗ simple condition = ⟨R’s attribute⟩ ⟨comparison⟩ ⟨S’s attribute⟩
    ∗ comparison: =, ≠, <, ≤, >, ≥
    ∗ condition: combination of simple conditions with AND
  ✓ Two ways of defining the semantics of joins
    ∗ declarative: create a tuple in the result for each pair of tuples in the relation
      arguments that satisfy the condition
    ∗ operational (evaluation strategy): the condition is applied to every tuple
      of the Cartesian product; if it is satisfied, the tuple is kept in the answer
  ✓ As seen above, the join can be evaluated as a combination of Cartesian product,
    selection, and projection, but this is not an efficient evaluation strategy; there are
    various strategies for implementing joins (see later); a basic method to implement
    the operation definition above is with nested loops:
    R ⋈_{A=B} S =
      for each r ∈ R
      do
        for each s ∈ S
        do
          if r.A = s.B
          then concat(r,s) ⇒ result
          fi
        end
      end
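
The nested-loop pseudocode translates almost line by line into Python. The sketch below also includes a hash-based variant as one example of an alternative strategy; that variant is an addition for comparison, not something the notes prescribe. Relations are assumed to be lists of dicts with distinct attribute names.

```python
from collections import defaultdict

def nested_loop_join(r, s, a, b):
    """Equijoin on r.a = s.b, following the nested-loop pseudocode."""
    result = []
    for x in r:
        for y in s:
            if x[a] == y[b]:
                result.append({**x, **y})  # concat(r, s)
    return result

def hash_join(r, s, a, b):
    """Same equijoin, but s is first indexed on b, so each relation is scanned once."""
    index = defaultdict(list)
    for y in s:
        index[y[b]].append(y)
    return [{**x, **y} for x in r for y in index.get(x[a], [])]

department = [
    {"DNumber": 5, "DName": "Research"},
    {"DNumber": 4, "DName": "Administration"},
    {"DNumber": 1, "DName": "Headquarters"},
]
dept_locations = [  # DNumber already renamed to DNo
    {"DNo": 1, "DLocation": "Houston"},
    {"DNo": 4, "DLocation": "Stafford"},
    {"DNo": 5, "DLocation": "Bellaire"},
    {"DNo": 5, "DLocation": "Sugarland"},
    {"DNo": 5, "DLocation": "Houston"},
]

dept_locs = nested_loop_join(department, dept_locations, "DNumber", "DNo")
dept_locs_hashed = hash_join(department, dept_locations, "DNumber", "DNo")
```

The nested loops compare |R| × |S| pairs, whereas the hash variant does work roughly proportional to |R| + |S| plus the size of the result, at the cost of only supporting equality conditions.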



• Kinds of joins
  ✓ theta join: it is the general join (when all the attributes of the operand relations
    appear in the result of the join and the join condition is not simply a test of
    equality of 2 attributes)
  ✓ equijoin: when the join condition is a simple equality (e.g., A = B)
  ✓ natural join = equijoin + only one of the attributes tested for equality is
    included in the result
    ∗ R ∗_{A=B} S: only one of A and B (say, A) is retained in the result
    ∗ R ∗ S: an equijoin is performed that tests equality of the attributes that have
      the same name in R and S, and only one of them is retained in the result (this
      version of join does not require the attributes of R and S to be all different)

Natural Join
• Add the location information to the information about departments (with the
DNumber attribute of DeptLocations renamed as DNo)
DeptLocs ← Department ∗_{DNumber=DNo} DeptLocations
DNumber DName MgrSSN MgrStartDate DLocation
5 Research 333445555 22-May-78 Bellaire
5 Research 333445555 22-May-78 Sugarland
5 Research 333445555 22-May-78 Houston
4 Administration 987654321 01-Jan-85 Stafford
1 Headquarters 888665555 19-Jun-71 Houston
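
The R ∗ S form of the natural join (equality on all attributes with the same name, each kept once) might be sketched as follows. The example assumes list-of-dict relations in which DeptLocations keeps its original DNumber attribute, so no renaming is needed; `natural_join` is an illustrative name.

```python
def natural_join(r, s):
    """R * S: equijoin on all same-named attributes, each retained only once."""
    if not r or not s:
        return []
    common = sorted(set(r[0]) & set(s[0]))
    result = []
    for x in r:
        for y in s:
            if all(x[a] == y[a] for a in common):
                # Common attributes are equal, so merging the dicts keeps one copy.
                result.append({**x, **y})
    return result

department = [
    {"DNumber": 5, "DName": "Research"},
    {"DNumber": 4, "DName": "Administration"},
    {"DNumber": 1, "DName": "Headquarters"},
]
dept_locations = [
    {"DNumber": 1, "DLocation": "Houston"},
    {"DNumber": 4, "DLocation": "Stafford"},
    {"DNumber": 5, "DLocation": "Bellaire"},
    {"DNumber": 5, "DLocation": "Sugarland"},
    {"DNumber": 5, "DLocation": "Houston"},
]

dept_locs = natural_join(department, dept_locations)
```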



Other Example of Natural Join
• Create a relation associating project information with information about the
department on which the projects depend
ProjDept ← Project ∗ Department

PNumber PName PLocation DNum DName MgrSSN MgrStartDate
1 ProductX Bellaire 5 Research 333445555 22-May-78
2 ProductY Sugarland 5 Research 333445555 22-May-78
3 ProductZ Houston 5 Research 333445555 22-May-78
10 Computerization Stafford 4 Administration 987654321 01-Jan-85
20 Reorganization Houston 1 Headquarters 888665555 19-Jun-71
30 NewBenefits Stafford 4 Administration 987654321 01-Jan-85

• The join condition tests equality of DNumber values in Project and Department
• It can also be written Project ∗_{DNumber=DNumber} Department


Relative Order of Selection and Join


Find name and address of employees who work for the Research department
• Selection on an operand of the join, or selection “before” or “inside” join
ResDept ← σ_{DName=‘Research’}(Department)
ResDeptEmps ← ResDept ⋈_{DNumber=DNo} Employee
Result ← π_{LName,Address}(ResDeptEmps)

π_{LName,Address}(σ_{DName=‘Research’}(Department) ⋈_{DNumber=DNo} Employee)

• Selection on the result of the join, or selection “after” or “outside” join


DeptsEmps ← Department ⋈_{DNumber=DNo} Employee
ResDeptEmps ← σ_{DName=‘Research’}(DeptsEmps)
Result ← π_{LName,Address}(ResDeptEmps)

π_{LName,Address}(σ_{DName=‘Research’}(Department ⋈_{DNumber=DNo} Employee))



About query optimization
• The two formulations of the query above are equivalent (it should be clear that they
produce the same result)
• The first one does the selection before the join (i.e., the result of the selection serves
as operand of the join), while the second one evaluates the selection on the result of
the join

• Such equivalences are frequent in the relational algebra


• If the algebraic formulation were taken as guidance for actual evaluation, then in general
there would be differences in performance (in the example above, the first formulation
would probably produce a more efficient execution, as the selection on DName in
Department produces a small relation before the join is evaluated)
• Query optimizers of current relational technology are able to perform comparative
evaluation of performance and select a good strategy
• Query optimizers for SQL will not beat the best Cobol programmer, but
  ✓ good programmers are scarce and differences in individual productivity are
    enormous compared to other human activities
  ✓ it is impossible, in practice and in theory, to similarly optimize the compilation
    of programs in imperative languages (e.g., Cobol or C)
  ✓ the best strategy selected depends on the database populations; if populations
    change so much that the best evaluation strategy ceases to be the best
    ∗ Cobol programs have to be rewritten to adjust to the new situation and
      remain optimal
    ∗ SQL optimization is redone dynamically by the DBMS: queries can be optimized
      without being rewritten
• This demonstrates the clear superiority of nonprocedural approaches over imperative ones:
this was a key factor in establishing the relational model
• For actual evaluation, a useful heuristic is to perform selections before joins,
because joins are expensive operations that should be evaluated on operands as small as
possible

• The user has to choose one of the two algebraic formulations (but the query optimizer
may decide to evaluate the query with the other one)
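
The equivalence of the two formulations, and the difference in intermediate result size, can be observed with a small Python experiment. The helpers and the reduced Employee schema (LName and DNo only) are assumptions made for the illustration.

```python
def select(relation, condition):
    return [t for t in relation if condition(t)]

def join(r, s, condition):
    return [{**x, **y} for x in r for y in s if condition(x, y)]

def project(relation, attributes):
    return {tuple(t[a] for a in attributes) for t in relation}

department = [
    {"DNumber": 5, "DName": "Research"},
    {"DNumber": 4, "DName": "Administration"},
    {"DNumber": 1, "DName": "Headquarters"},
]
employee = [
    {"LName": "Smith", "DNo": 5}, {"LName": "Wong", "DNo": 5},
    {"LName": "Zelaya", "DNo": 4}, {"LName": "Wallace", "DNo": 4},
    {"LName": "Narayan", "DNo": 5}, {"LName": "English", "DNo": 5},
    {"LName": "Jabbar", "DNo": 4}, {"LName": "Borg", "DNo": 1},
]

def same_dept(d, e):
    return d["DNumber"] == e["DNo"]

def is_research(t):
    return t["DName"] == "Research"

# Selection "inside" the join: the join sees a one-tuple operand.
inside_join = join(select(department, is_research), employee, same_dept)
result_inside = project(inside_join, ["LName"])

# Selection "outside" the join: the full join is built first, then filtered.
full_join = join(department, employee, same_dept)
result_outside = project(select(full_join, is_research), ["LName"])
```

On this population the first formulation builds a 4-tuple intermediate relation and the second an 8-tuple one, while both produce the same answer.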



An Example with Two Joins
For every project located in ‘Brussels’, list the project number, the controlling
department number, and the department manager’s last name, address, and birth date
• The result of joining Project and Department is joined with Employee

BrusselsProjs ← σ_{PLocation=‘Brussels’}(Project)
ProjDept ← BrusselsProjs ⋈_{DNum=DNumber} Department
ProjDeptMgr ← ProjDept ⋈_{MgrSSN=SSN} Employee
Result ← π_{PNumber,DNum,LName,Address,BDate}(ProjDeptMgr)

• The result of joining Department and Employee is joined with Project

BrusselsProjs ← σ_{PLocation=‘Brussels’}(Project)
DeptMgr ← Department ⋈_{MgrSSN=SSN} Employee
ProjDeptMgr ← BrusselsProjs ⋈_{DNum=DNumber} DeptMgr
Result ← π_{PNumber,DNum,LName,Address,BDate}(ProjDeptMgr)


• The query requires two binary joins and formulations differ in the ordering of those
joins
• For actual evaluation, the query optimizer will choose the most efficient order (in this
case, most probably the second one).

• A third formulation would express a product of Employee and Project, and would join
the result with Department



Join or Intersection
List the names of managers who have at least one dependent
EmpDep ← Employee ∗_{SSN=ESSN} Dependent
MgrDep ← EmpDep ∗_{SSN=MgrSSN} Department
Result ← π_{LName}(MgrDep)
• Projections can be “moved down” to make join arguments smaller
• Joins of one-attribute relations are intersections
• For actual evaluation, the query optimizer chooses the location of projections
Mgrs(SSN) ← π_{MgrSSN}(Department)
EmpsWithDeps(SSN) ← π_{ESSN}(Dependent)
MgrsWithDeps ← Mgrs ∩ EmpsWithDeps
Result ← π_{LName}(MgrsWithDeps ∗ Employee)


• The join Employee ∗_{SSN=ESSN} Dependent illustrates the loss of information in a join
(see later, outer join)
• The join MgrsWithDeps ∗ Employee is sometimes called a semi-join; it is similar to a
selection: it selects the tuples of Employee whose SSN appears in the one-attribute
relation MgrsWithDeps.



Difference
List the names of employees who have no dependent
(1) AllEmps ← π_{SSN}(Employee)
    EmpsWithDeps(SSN) ← π_{ESSN}(Dependent)
    EmpsWithoutDeps ← AllEmps − EmpsWithDeps
    Result ← π_{LName}(EmpsWithoutDeps ∗ Employee)

(2) π_{LName}((π_{SSN}(Employee) − π_{ESSN}(Dependent)) ∗ Employee)
    where π_{SSN}(Employee) is AllEmps, π_{ESSN}(Dependent) is EmpsWithDeps,
    and their difference is EmpsWithoutDeps

(3) π_{LName}(Employee − (Employee ∗_{SSN=ESSN} π_{ESSN}(Dependent)))
    where π_{ESSN}(Dependent) is EmpsWithDeps and the join
    Employee ∗_{SSN=ESSN} π_{ESSN}(Dependent) is EmpsWithDeps′
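
Formulation (1) can be traced on the example population with Python sets. As a simplifying assumption, Employee is reduced to a mapping from SSN to LName, so the final join with Employee becomes a dictionary lookup.

```python
# Employee restricted to (SSN, LName); an assumed simplification of the full relation.
employee = {
    "123456789": "Smith", "333445555": "Wong", "999887777": "Zelaya",
    "987654321": "Wallace", "666884444": "Narayan", "453453453": "English",
    "987987987": "Jabbar", "888665555": "Borg",
}
# Projection of Dependent on ESSN (duplicates removed).
emps_with_deps = {"333445555", "987654321", "123456789"}

all_emps = set(employee)                         # projection of Employee on SSN
emps_without_deps = all_emps - emps_with_deps    # difference
result = {employee[ssn] for ssn in emps_without_deps}  # join back with Employee, project LName
```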


Union
List of project numbers for projects that involve an employee whose last name is
‘Smith’ as a worker or as a manager of the department that controls the project
Smiths(ESSN) ← π_{SSN}(σ_{LName=‘Smith’}(Employee))
SmithWorkerProjs ← π_{PNo}(WorksOn ∗ Smiths)
Mgrs ← π_{LName,DNumber}(Employee ⋈_{SSN=MgrSSN} Department)
SmithMgrs ← σ_{LName=‘Smith’}(Mgrs)
SmithManagedDepts(DNum) ← π_{DNumber}(SmithMgrs)
SmithMgrProjs(PNo) ← π_{PNumber}(SmithManagedDepts ∗ Project)
Result ← SmithWorkerProjs ∪ SmithMgrProjs

This query really is the union of two simpler queries



Division
• R(A, B) ÷ S(B) =
  ✓ all tuples of π_{A}(R) that appear in R with at least every element of S
  ✓ {a ∈ π_{A}(R) | S ⊆ π_{B}(σ_{A=a}(R))}
  ✓ {a ∈ π_{A}(R) | ∀b ∈ S : (a, b) ∈ R}


An Example with Division


• Retrieve the name of employees who work on all the projects that Smith works on

Smith ← σ_{LName=‘Smith’}(Employee)
SmithPNos ← π_{PNo}(WorksOn ∗_{ESSN=SSN} Smith)
SSNPNos ← π_{ESSN,PNo}(WorksOn)
SSNS(SSN) ← SSNPNos ÷ SmithPNos
Result ← π_{FName,LName}(SSNS ∗ Employee)



Division: Operational Semantics
SSNPNos
ESSN PNo
123456789 1
123456789 2
666884444 3
453453453 1
453453453 2
333445555 2
333445555 3
333445555 10
333445555 20
999887777 30
999887777 10
987987987 10
987987987 30
987654321 30
987654321 20
888665555 20

SmithPNos
PNo
1
2

SSNS
SSN
123456789
453453453

Result
FName LName
John Smith
Joyce English


• Operational semantics of division:

  ✓ partition relation SSNPNos into classes of tuples with the same value of ESSN
  ✓ relation SSNS is the result of the division SSNPNos ÷ SmithPNos, i.e., the set of
    ESSN values that occur in a class with at least the PNo values of SmithPNos
  ✓ relation Result is the result of the query



Semantics of Division
R
A B
a1 b1
a1 b2
a1 b3
a1 b4
a2 b1
a2 b3
a3 b2
a3 b3
a3 b4
a4 b1
a4 b2
a4 b3

S
B
b1
b2
b3

T = R ÷ S
A
a1
a4

• Partition relation R according to the A values; to each value a_i is attached a set
of B values associated with that a_i value in R
• Include in T each a_i such that the set of B values associated with a_i contains S
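
The partition-based description of division can be written directly in Python. In this sketch R is assumed to be a set of (A, B) pairs and S a plain set of B values; `divide` is an illustrative name.

```python
def divide(r, s):
    """R(A, B) ÷ S(B): the A values associated in R with every B value of S."""
    # Partition R according to the A values.
    groups = {}
    for a, b in r:
        groups.setdefault(a, set()).add(b)
    # Keep each a whose set of B values contains S.
    return {a for a, bs in groups.items() if s <= bs}

r = {
    ("a1", "b1"), ("a1", "b2"), ("a1", "b3"), ("a1", "b4"),
    ("a2", "b1"), ("a2", "b3"),
    ("a3", "b2"), ("a3", "b3"), ("a3", "b4"),
    ("a4", "b1"), ("a4", "b2"), ("a4", "b3"),
}
s = {"b1", "b2", "b3"}

t = divide(r, s)
```

With an empty S the subset test is always true, so every A value of R is returned, which agrees with the set-based definition of division.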


Another Example with Division


Find the names of employees who work on all the projects controlled by department
number 5

Dept5Projs(PNo) ← π_{PNumber}(σ_{DNum=5}(Project))
EmpProj(SSN, PNo) ← π_{ESSN,PNo}(WorksOn)
ResultEmpSSNs ← EmpProj ÷ Dept5Projs
Result ← π_{LName,FName}(ResultEmpSSNs ∗ Employee)



Relational Completeness
• A language is “relationally complete” if it has at least the power of the relational
algebra
• Relational completeness is the only widely-accepted measure of power (besides
computational completeness)
• Domain calculus and tuple calculus have the same power of expression as the
relational algebra


Equivalences (Theory)
• Not all operations of the algebra are independent
• {σ, π, ∪, −, ×} is a complete set, i.e., it has all the expression power of the algebra
• ∩, ⋈, and ÷ can be derived from them
• R ∩ S = (R ∪ S) − ((R − S) ∪ (S − R))
• R ⋈_{condition} S = σ_{condition}(R × S)
• R ∗_{condition} S = π_{attr}(σ_{condition}(R × S))
• R ÷ S = T can be reexpressed with projection, difference, and Cartesian product



Division Redefined
R ÷ S = T is equivalent to
T1 ← π_{A}(R)
T2 ← π_{A}((T1 × S) − R)
T ← T1 − T2

R
A B
a1 b1
a1 b2
a1 b3
a1 b4
a2 b1
a2 b3
a3 b2
a3 b3
a3 b4
a4 b1
a4 b2
a4 b3

S
B
b1
b2
b3

T1
A
a1
a2
a3
a4

T1 × S
A B
a1 b1
a1 b2
a1 b3
a2 b1
a2 b2
a2 b3
a3 b1
a3 b2
a3 b3
a4 b1
a4 b2
a4 b3

T2
A
a2
a3

R ÷ S
A
a1
a4


• T1 contains all the candidate a-values, the answer is a subset of T1

• T1 × S associates each a-value with all b-values of S


• if an a-value is in T2, this means that a-value is associated in R with fewer b-values
than the b-values with which it is associated in T1 × S: that a-value should not be
part of the result
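
The three steps T1, T2, T can be replayed with Python sets to confirm that they give the expected quotient. As before, R is assumed to be a set of (A, B) pairs and S a set of B values.

```python
r = {
    ("a1", "b1"), ("a1", "b2"), ("a1", "b3"), ("a1", "b4"),
    ("a2", "b1"), ("a2", "b3"),
    ("a3", "b2"), ("a3", "b3"), ("a3", "b4"),
    ("a4", "b1"), ("a4", "b2"), ("a4", "b3"),
}
s = {"b1", "b2", "b3"}

t1 = {a for a, _ in r}                       # T1: projection of R on A (all candidates)
t1_x_s = {(a, b) for a in t1 for b in s}     # T1 × S: every candidate with every b of S
missing = t1_x_s - r                         # pairs required by S but absent from R
t2 = {a for a, _ in missing}                 # T2: candidates that miss at least one b
t = t1 - t2                                  # T = T1 − T2
```

Here `missing` contains exactly (a2, b2) and (a3, b1), which is why a2 and a3 are eliminated.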



Equivalences (Practical)
(1) σ_{c1 ∧ c2}(R) = σ_{c1}(σ_{c2}(R))
(2) σ_{c1}(σ_{c2}(R)) = σ_{c2}(σ_{c1}(R))
(3) if A ⊆ A1, then π_{A}(R) = π_{A}(π_{A1}(R))
(4) π_{A}(σ_{c}(R)) = σ_{c}(π_{A}(R)) if attributes in c ⊆ attributes in A
(5) σ_{c}(R ⋈ S) = σ_{c}(R) ⋈ S if attributes in c ⊆ attributes in R
(6) π_{A,B}(R ⋈_{c} S) = π_{A}(R) ⋈_{c} π_{B}(S)
    if c involves only attributes in A of R and in B of S
(7) π_{A,B}(R ⋈_{c} S) = π_{A,B}(π_{A,A1}(R) ⋈_{c} π_{B,B1}(S))
    if c involves attributes in A, A1 of R and in B, B1 of S
(8) (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
(9) σ_{c}(R ∪ S) = σ_{c}(R) ∪ σ_{c}(S)
(10) π_{A}(R ∪ S) = π_{A}(R) ∪ π_{A}(S)


(1) Break conjunctive selection: σ_{c1 ∧ c2}(R) = σ_{c1}(σ_{c2}(R))

(2) Commute selections: σ_{c1}(σ_{c2}(R)) = σ_{c2}(σ_{c1}(R))

Used for
• applying the most selective selection first for efficiency
• commuting a simple selection with another operation (join, projection)
(3) Sequence partial projections: if A ⊆ A1, then π_{A}(R) = π_{A}(π_{A1}(R))
(4) Commute selection and projection:
• π_{A}(σ_{c}(R)) = σ_{c}(π_{A}(R)), if attributes in c ⊆ attributes in A
• σ_{c}(π_{A}(R)) can be evaluated as π_{A}(σ_{c}(R)) (but not the other way around)
(5) Enter selection into join: if attributes in c ⊆ attributes in R, then
σ_{c}(R ⋈ S) = σ_{c}(R) ⋈ S
(6) Enter projection into join: if c involves only attributes in A of R and in B of S, then
π_{A,B}(R ⋈_{c} S) = π_{A}(R) ⋈_{c} π_{B}(S)
(7) Enter projection into join (general case):
• if c involves attributes in A, A1 of R and in B, B1 of S, then
π_{A,B}(R ⋈_{c} S) = π_{A,B}(π_{A,A1}(R) ⋈_{c} π_{B,B1}(S))
• the attributes of R (idem for S) comprise A, A1, and A2; A is needed in the
result, A1 participates in the join but is not needed in the result; A2 does not
participate at all



(8) Associate, commute joins (also valid for union and intersection, but not for difference)
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
R ⋈ S = S ⋈ R
Used for choosing the order of joins for efficiency
(9) Enter selection into union (also intersection, difference)
σ_{c}(R ∪ S) = σ_{c}(R) ∪ σ_{c}(S)
(10) Enter projection into union (but not, in general, into intersection or difference)
π_{A}(R ∪ S) = π_{A}(R) ∪ π_{A}(S)

Motivation for Outer Joins: Ordinary Joins are often Lossy


• π_{FName,...,DNo}(Employee ⋈_{SSN=MgrSSN} Department) ⊆ Employee
• π_{FName,...,DNo}(Employee ⟕_{SSN=MgrSSN} Department) = Employee
• π_{FName,...,DNo}(σ_{DName≠null}(Employee ⟕_{SSN=MgrSSN} Department)) =
  π_{FName,...,DNo}(Employee ⋈_{SSN=MgrSSN} Department)


• “Lossy” = information is lost in the result of the join (e.g. employees who are not
department managers disappear in Employee ⋈_{SSN=MgrSSN} Department)

• In R ⋈ S, only tuples satisfying the join condition contribute to the result and
information may be lost (projections of R ⋈ S on R or S may be smaller than R or S)



Outer Joins
• Outer joins preserve information from the operands (outer joins are “lossless”)
• Left outer join R ⟕ S: retains all tuples of the left operand relation R: if, for
a tuple of R, no matching tuple is found in S, the attribute values corresponding
to S in the result are set to null
• Right outer join R ⟖ S: retains all tuples of the right operand relation S
• Full outer join R ⟗ S: retains all tuples of both relations


Example of Outer Join


• Retrieve the name of all employees, plus the name of the departments that they
manage (if any)

Temp ← Employee ⟕_{SSN=MgrSSN} Department
Result ← π_{FName,MInit,LName,DName}(Temp)

Result
FName MInit LName DName
John B Smith null
Franklin T Wong Research
Alicia J Zelaya null
Jennifer S Wallace Administration
Ramesh K Narayan null
Joyce A English null
Ahmad V Jabbar null
James E Borg Headquarters
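
A left outer join differs from the ordinary join only in what happens to unmatched left tuples. The Python sketch below assumes list-of-dict relations, uses `None` for null, and keeps only the attributes needed for the example; `left_outer_join` is an illustrative name.

```python
def left_outer_join(r, s, condition, s_attributes):
    """Keep every tuple of r; pad with None when no tuple of s matches."""
    result = []
    for x in r:
        matches = [y for y in s if condition(x, y)]
        if matches:
            result.extend({**x, **y} for y in matches)
        else:
            result.append({**x, **{a: None for a in s_attributes}})
    return result

employee = [
    {"LName": "Smith", "SSN": "123456789"}, {"LName": "Wong", "SSN": "333445555"},
    {"LName": "Zelaya", "SSN": "999887777"}, {"LName": "Wallace", "SSN": "987654321"},
    {"LName": "Narayan", "SSN": "666884444"}, {"LName": "English", "SSN": "453453453"},
    {"LName": "Jabbar", "SSN": "987987987"}, {"LName": "Borg", "SSN": "888665555"},
]
department = [
    {"DName": "Research", "MgrSSN": "333445555"},
    {"DName": "Administration", "MgrSSN": "987654321"},
    {"DName": "Headquarters", "MgrSSN": "888665555"},
]

temp = left_outer_join(
    employee, department,
    lambda e, d: e["SSN"] == d["MgrSSN"],
    ["DName", "MgrSSN"],
)
result = [(t["LName"], t["DName"]) for t in temp]
```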



Outer Union
• Union of tuples from two partially compatible relations (only some of their
attributes are union compatible)
• Attributes that are not union compatible from either relation are kept in the
result, and tuples with no values for these attributes are padded with null values
• Outer union of
Student(Name,SSN,Department,Advisor)
Professor(Name,SSN,Department,Rank)
is a relation R(Name,SSN,Department,Advisor,Rank) obtained from
Student ∗ Professor ∪
Professors that are not students (with null for Advisor) ∪
Students that are not professors (with null for Rank)
• All tuples of the operand relations appear as a subtuple of the result
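
One way to picture the outer union is to merge tuples that agree on the common attributes and pad the rest with nulls. The sketch below uses invented sample data and assumes that each relation holds at most one tuple per combination of common attribute values; `outer_union` is a hypothetical helper.

```python
def outer_union(r, s, r_attrs, s_attrs):
    """Outer union of two partially compatible relations, padding with None."""
    common = [a for a in r_attrs if a in s_attrs]
    all_attrs = r_attrs + [a for a in s_attrs if a not in r_attrs]
    merged = {}
    for t in r + s:
        key = tuple(t[a] for a in common)
        # First time this key is seen: start from a tuple of nulls.
        row = merged.setdefault(key, {a: None for a in all_attrs})
        row.update(t)
    return list(merged.values())

# Invented sample data, for illustration only.
student = [
    {"Name": "Ann", "SSN": "111", "Department": "CS", "Advisor": "Codd"},
    {"Name": "Bob", "SSN": "222", "Department": "Math", "Advisor": "Date"},
]
professor = [
    {"Name": "Bob", "SSN": "222", "Department": "Math", "Rank": "Assistant"},
    {"Name": "Eve", "SSN": "333", "Department": "CS", "Rank": "Full"},
]

result = outer_union(
    student, professor,
    ["Name", "SSN", "Department", "Advisor"],
    ["Name", "SSN", "Department", "Rank"],
)
by_ssn = {t["SSN"]: t for t in result}
```

Bob appears in both operands and therefore ends up with both an Advisor and a Rank; Ann and Eve are padded with a null Rank and a null Advisor respectively.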


Outer Union as ER Generalization


• Outer union corresponds to generalization in the ER model

[Figure: ER generalization. Entity type Person (attributes SSN, Name, Department) generalizes Student (attribute Advisor) and Professor (attribute Rank).]



Transitive Closure
• Natural and frequent operation for exploring nested structures (e.g., part-subpart
composition)
• Natural in the style of the algebra, but not available as an algebra operation
• Applies to a recursive relationship between tuples of the same relation, e.g.,
between employee and supervisor in relation Employee
Example
• Retrieve all employees supervised by James Borg
= all employees directly supervised by James Borg
+ all employees directly supervised by the previous ones
+ ...
• Although it is possible to specify each level in relational algebra, the number of
levels is not known since it depends on the extension


• Remember that the relational algebra (like TRC, DRC, and SQL, see later) is not
computationally complete (= it does not have the expressive power of algorithmic languages)
• Computational completeness = Church-Turing thesis

  ✓ thesis = any algorithm that you can think of can be formulated with any of the
    popular programming languages
  ✓ revised with Gödel’s theorem and undecidable problems



Recursive Formulations of Transitive Closure
• Prolog-style

Result(s) ← BorgSSN(s)
Result(s1) ← Supervision(s1, s2) and Result(s2)

• Recursive equation

Result(SSN ← SSN1) ← BorgSSN(SSN1) ∪ π_{SSN1}(Supervision ∗_{SSN2=SSN} Result)

• Alternative: usual algorithmic program
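
The "usual algorithmic program" alternative could look like the following Python loop, which repeats the join step until no new employee is found. This is a sketch: `supervised_by` is an invented name, and Supervision is assumed to be a set of (SSN1, SSN2) pairs meaning that SSN1 is directly supervised by SSN2.

```python
# (SSN1, SSN2): employee SSN1 is directly supervised by SSN2 (None = no supervisor).
supervision = {
    ("123456789", "333445555"),
    ("333445555", "888665555"),
    ("999887777", "987654321"),
    ("987654321", "888665555"),
    ("666884444", "333445555"),
    ("453453453", "333445555"),
    ("987987987", "987654321"),
    ("888665555", None),
}

def supervised_by(root, supervision):
    """All employees directly or indirectly supervised by root, and the number of levels."""
    result = set()
    frontier = {root}
    levels = 0
    while frontier:
        # One algebra step: join Supervision with the current frontier on SSN2.
        found = {s1 for (s1, s2) in supervision if s2 in frontier}
        frontier = found - result  # keep only newly discovered employees
        result |= frontier
        if frontier:
            levels += 1
    return result, levels

borg_team, depth = supervised_by("888665555", supervision)
```

The loop stops as soon as a step adds nothing new, so the number of iterations depends on the data rather than on the text of the program, which is exactly what a fixed algebra expression cannot express.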


More about recursion ...

• A dictionary definition:
  ✓ recursive: see “recursive”

• “Anything in computer science that is not recursive is no good” (Jim Gray?)


• Recursion = fundamental linguistic tool (like iteration) for expressing, in a well-defined
finite way, patterns of action that repeat an unknown number of times (in natural
language: “and so on”, “etc.”, “...”)



Incomplete Algebra Solution for Transitive Closure

BorgSSN ← π_{SSN}(σ_{FName=‘James’ ∧ LName=‘Borg’}(Employee))
Supervision(SSN1, SSN2) ← π_{SSN,SuperSSN}(Employee)
Result1(SSN) ← π_{SSN1}(Supervision ⋈_{SSN2=SSN} BorgSSN)
Result2(SSN) ← π_{SSN1}(Supervision ⋈_{SSN2=SSN} Result1)

BorgSSN
SSN
888665555

Supervision
SSN1 SSN2
123456789 333445555
333445555 888665555
999887777 987654321
987654321 888665555
666884444 333445555
453453453 333445555
987987987 987654321
888665555 null


Incomplete Algebra Solution for Transitive Closure


Result1 (supervised by Borg)
SSN
333445555
987654321

Result2 (supervised by Borg’s subordinates)
SSN
123456789
999887777
666884444
453453453
987987987

Result
SSN
333445555
987654321
123456789
999887777
666884444
453453453
987987987
