Relational Algebra PDF
Department
DNumber DName DMgr MgrStartDate
DeptLocations
DNumber DLocation
Project
PNumber PName PLocation DNumber
WorksOn
PNo ESSN Hours
Dependent
ESSN DependentName Sex BDate Relationship
Employee
SSN FName MInit LName BDate Address Sex Salary SuperSSN DNo
123456789 John B Smith ... ... M 30000 333445555 5
333445555 Franklin T Wong ... ... M 40000 888665555 5
666884444 Ramesh K Narayan ... ... M 38000 333445555 5
453453453 Joyce A English ... ... F 25000 333445555 5
πattributes (relation)
“Duplicate Removal”
• The result of a projection is a relation ⇒ projection involves duplicate removal
• Example: πDNo (Employee)
DNo values in Employee (with duplicates): 5, 5, 4, 4, 5, 5, 4, 1
Result of πDNo (Employee): 5, 4, 1
✓ “If something is true, then saying it twice does not make it more true” (Codd)
Even if repetition is a proven pedagogical technique :-)
✓ Identical things are indistinguishable; there is no need to represent them twice
✓ “Objects in the real world have only one thing in common: they are all different”
(anonymous, A. Taivalsaari, JOOP, Nov. 1997)
✓ Distinct things should have some value to distinguish them
✓ It is difficult to manipulate “duplicate tuples” except for counting, averaging, etc.
✓ The obvious modeling of identical objects is as a common description plus the
number of copies of the object
• Duplicate removal (from a multiset to a set) can be expensive; it can be done with
✓ nested loops
✓ sort/merge
✓ hashing
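The three strategies listed above can be sketched as follows. This is a minimal illustration, not a description of how any particular DBMS implements them; the function names and the sample column are assumptions made for the example.

```python
from itertools import groupby

def dedup_nested_loops(rows):
    """O(n^2): keep a row only if no equal row was kept before."""
    result = []
    for row in rows:
        if not any(row == kept for kept in result):
            result.append(row)
    return result

def dedup_sort_merge(rows):
    """O(n log n): sort, then collapse each run of equal neighbours."""
    return [key for key, _ in groupby(sorted(rows))]

def dedup_hashing(rows):
    """Expected O(n): remember the rows already seen in a hash set."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

# Illustrative multiset of one-attribute tuples (DNo values).
dno_column = [(5,), (5,), (4,), (4,), (5,), (5,), (4,), (1,)]

print(dedup_nested_loops(dno_column))
print(dedup_sort_merge(dno_column))
print(dedup_hashing(dno_column))
```

The nested-loop and hashing variants preserve the order of first occurrence, while the sort/merge variant returns the distinct rows in sorted order. As sets, the three results are equal.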
• For a precise syntax and semantics of the algebra, see A Precise Definition of Basic
Relational Notions and the Relational Algebra, A. Pirotte, ACM SIGMOD Record,
13-1, 1982, pp. 30-45.
Temp
SSN FName MInit LName BDate Address Sex Salary SuperSSN DNo
123456789 John B Smith 09-Jan-55 ... M 30000 333445555 5
333445555 Franklin T Wong 08-Dec-45 ... M 40000 888665555 5
666884444 Ramesh K Narayan 15-Sep-52 ... M 38000 333445555 5
453453453 Joyce A English 31-Jul-62 ... F 25000 333445555 5
• Closure
✓ closure = make nesting freely usable for combining operations, i.e., the result of
every algebraic operation is a relation and it can be used as operand of another
algebraic operation
✓ closure can be violated by
∗ the definition of language structure (SQL does)
∗ operations whose result is not a relation (e.g., “relations” with duplicate
tuples)
✓ What is really needed is “compositionality”
∗ the result of a query can be used as argument for another query
∗ this is a version of what is called orthogonality (i.e., the generality of
combining pieces of the definition of a language)
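Closure can be illustrated with a small sketch in which every operation takes relations and returns a relation, so that calls nest freely. The representation (a relation as a Python set of frozen attribute/value pairs) and the helper names `row`, `select`, and `project` are assumptions made for this example.

```python
def row(**attrs):
    """A tuple is an immutable set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def select(relation, predicate):
    """Selection: the result is again a set of tuples, i.e., a relation."""
    return {t for t in relation if predicate(dict(t))}

def project(relation, attrs):
    """Projection: building a set removes duplicates automatically."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

employee = {
    row(SSN="123456789", LName="Smith", Salary=30000, DNo=5),
    row(SSN="333445555", LName="Wong", Salary=40000, DNo=5),
    row(SSN="453453453", LName="English", Salary=25000, DNo=5),
}

# The result of select is a relation, so it can be the operand of project.
well_paid = project(select(employee, lambda t: t["Salary"] >= 30000), {"LName"})
print(well_paid)
```

Because `project` returns a set, the nested expression could itself be passed to a further `select` or `project` without any conversion step.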
Set-Theoretical Operations
• Standard operations on sets are automatically applicable to relations
• Union compatibility
✓ relations have to be defined on the same domains
✓ type compatibility would be more adequate
✓ more precise definition: two relations R and S are union-compatible if there
is a one-to-one correspondence between attributes of R and attributes of S
such that corresponding attributes are associated with the same domain
• A mechanism for defining the attribute names of the result is needed
• Rules must be stated to specify the attribute names of the result; for example
✓ the attributes of the first operand
✓ the attributes of the second operand
✓ explicitly-specified attributes, e.g., R1 (FN, LN) ← Student ∪ Prof
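The union-compatibility test and the naming rules can be sketched as below. The sketch assumes that the one-to-one correspondence between attributes is positional, that a relation is a pair (schema, rows), and that Python types stand in for domains. The attribute names and the rows of Student and Prof are invented for illustration.

```python
def union_compatible(schema_r, schema_s):
    """Same number of attributes, and identical domains position by position."""
    return len(schema_r) == len(schema_s) and all(
        dom_r == dom_s for (_, dom_r), (_, dom_s) in zip(schema_r, schema_s)
    )

def union(r, s, names=None):
    """Union of two relations given as (schema, rows) pairs."""
    schema_r, rows_r = r
    schema_s, rows_s = s
    if not union_compatible(schema_r, schema_s):
        raise ValueError("operands are not union-compatible")
    if names is None:
        # Default rule: the result takes the attributes of the first operand.
        names = [name for name, _ in schema_r]
    schema = [(name, dom) for name, (_, dom) in zip(names, schema_r)]
    return schema, rows_r | rows_s

# Hypothetical schemas and rows, for illustration only.
student = ([("FirstName", str), ("LastName", str)], {("Ann", "Lee"), ("Bob", "Ray")})
prof = ([("FName", str), ("LName", str)], {("Bob", "Ray"), ("Eve", "Kim")})

# R1 (FN, LN) <- Student U Prof, with explicitly specified attribute names.
r1 = union(student, prof, names=["FN", "LN"])
print(r1)
```

Passing `names=None` applies the first rule of the list (attributes of the first operand); the second rule could be obtained by swapping the operands.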
• The Cartesian product associates every tuple of the first relation with every tuple of
the second one
• The result relation has all the attributes of the operand relations: if the attributes of
the operand relations are not all distinct, some of them must be renamed
• The relational algebra thus needs a renaming operation, with as a possible syntax:
✓ rename (relation name, (oldname → newname, ...) )
✓ in the example: rename(DeptLocations,(DNumber→DNo))
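A renaming operation followed by a Cartesian product might look like the sketch below. Relations are represented as lists of dictionaries, which is an assumption of this example, and only the attributes needed for the illustration are kept.

```python
from itertools import product

def rename(relation, mapping):
    """Return a copy of the relation with attributes renamed per mapping."""
    return [{mapping.get(attr, attr): value for attr, value in t.items()}
            for t in relation]

def cartesian_product(r, s):
    """Pair every tuple of r with every tuple of s; attribute names must differ."""
    if r and s and set(r[0]) & set(s[0]):
        raise ValueError("rename clashing attributes before taking the product")
    return [{**t, **u} for t, u in product(r, s)]

department = [
    {"DNumber": 5, "DName": "Research"},
    {"DNumber": 4, "DName": "Administration"},
    {"DNumber": 1, "DName": "Headquarters"},
]
dept_locations = [
    {"DNumber": 5, "DLocation": "Bellaire"},
    {"DNumber": 5, "DLocation": "Sugarland"},
    {"DNumber": 5, "DLocation": "Houston"},
    {"DNumber": 4, "DLocation": "Stafford"},
    {"DNumber": 1, "DLocation": "Houston"},
]

# rename(DeptLocations, (DNumber -> DNo)), then Department x renamed relation
renamed = rename(dept_locations, {"DNumber": "DNo"})
pairs = cartesian_product(department, renamed)
print(len(pairs))
```

With 3 departments and 5 locations the product has 15 tuples, each carrying the four attributes DNumber, DName, DNo, and DLocation.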
• The restriction retains only the tuples of the Cartesian product where the values of
DNumber and of DNo are equal
• The projection suppresses the DNo attribute
• All these operations (Cartesian product, selection, and projection) can be
expressed as a single algebra operation: the join
• The join is a fundamental operation for meaningfully creating bigger relations from
smaller ones: but it is not always the inverse of projection (see later)
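The equivalence between the three-step formulation and a single join can be checked on a small example. The sketch below uses positional tuples, (DNumber, DName) for Department and (DNo, DLocation) for the renamed DeptLocations, which is a simplification chosen for the example.

```python
department = [(5, "Research"), (4, "Administration"), (1, "Headquarters")]
dept_locations = [
    (5, "Bellaire"), (5, "Sugarland"), (5, "Houston"),
    (4, "Stafford"), (1, "Houston"),
]

# Step 1: Cartesian product (every department with every location).
product_rows = [
    (dnum, dname, dno, loc)
    for dnum, dname in department
    for dno, loc in dept_locations
]

# Step 2: restriction keeps the tuples where DNumber = DNo.
restricted = [t for t in product_rows if t[0] == t[2]]

# Step 3: projection suppresses the DNo attribute.
three_steps = {(dnum, dname, loc) for dnum, dname, _, loc in restricted}

def join(r, s):
    """The same result computed as one operation (nested-loop join)."""
    return {(dnum, dname, loc) for dnum, dname in r for dno, loc in s if dnum == dno}

print(three_steps == join(department, dept_locations))
```

The product has 15 tuples, of which 5 survive the restriction. A real query processor would avoid materializing the product, which is one reason the join is treated as an operation in its own right.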
Natural Join
• Add the location information to the information about departments (with the
DNumber attribute of DeptLocations renamed as DNo)
DeptLocs ← Department ∗DNumber=DNo DeptLocations
DNumber DName MgrSSN MgrStartDate Location
5 Research 333445555 22-May-78 Bellaire
5 Research 333445555 22-May-78 Sugarland
5 Research 333445555 22-May-78 Houston
4 Administration 987654321 01-Jan-85 Stafford
1 Headquarters 888665555 19-Jun-71 Houston
• The join condition tests the equality of DNumber values in Project and Department
• It can also be written Project ∗DNumber=DNumber Department
• The user has to choose one of the two algebraic formulations (but the query optimizer
may decide to evaluate the query with the other one)
• The query requires two binary joins and formulations differ in the ordering of those
joins
• For actual evaluation, the query optimizer will choose the most efficient order (in this
case, most probably the second one).
• A third formulation would express a product of Employee and Project, and would join
the result with Department
• The join Employee ∗SSN=ESSN Dependent illustrates the loss of information in a join
(see later, outer join)
• The join MgrsWithDeps ∗ Employee is sometimes called a semi-join; it is similar to a
selection: it selects the tuples of Employee whose SSN appears in the one-attribute
relation MgrsWithDeps.
(2) πLName ((πSSN (Employee) − πESSN (Dependent)) ∗ Employee)
where AllEmps = πSSN (Employee), EmpsWithDeps = πESSN (Dependent),
and EmpsWithoutDeps = AllEmps − EmpsWithDeps
(3) πLName (Employee − Employee ∗SSN=ESSN (πESSN (Dependent)))
where EmpsWithDeps = πESSN (Dependent)
and EmpsWithDeps′ = Employee ∗SSN=ESSN EmpsWithDeps
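Formulations (2) and (3) can be compared on a small instance. In the sketch below, Employee is reduced to (SSN, LName) pairs, and the set of ESSN values standing for πESSN (Dependent) is an assumption made for the example, since the Dependent tuples are not listed here.

```python
employee = {
    ("123456789", "Smith"),
    ("333445555", "Wong"),
    ("666884444", "Narayan"),
    ("453453453", "English"),
}
# Assumed content of the projection of Dependent on ESSN (illustrative only).
dependent_essn = {"123456789", "333445555"}

# Formulation (2): difference on SSN values, then a semi-join with Employee.
all_emps = {ssn for ssn, _ in employee}
emps_without_deps = all_emps - dependent_essn
result_2 = {lname for ssn, lname in employee if ssn in emps_without_deps}

# Formulation (3): join first to get full Employee tuples, then difference.
emps_with_deps = {(ssn, lname) for ssn, lname in employee if ssn in dependent_essn}
result_3 = {lname for _, lname in employee - emps_with_deps}

print(result_2, result_3)
```

Both formulations yield the same last names. They differ in whether the difference is taken on one-attribute tuples (2) or on whole Employee tuples (3).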
Union
List of project numbers for projects that involve an employee whose last name is
‘Smith’ as a worker or as a manager of the department that controls the project
Smiths(ESSN) ← πSSN (σLName=‘Smith’ (Employee))
SmithWorkerProjs ← πPNo (WorksOn ∗ Smiths)
Mgrs ← πLName,DNumber (Employee ⋈SSN=MgrSSN Department)
SmithMgrs ← σLName=‘Smith’ (Mgrs)
SmithManagedDepts(DNum) ← πDNumber (SmithMgrs)
SmithMgrProjs(PNo) ← πPNumber (SmithManagedDepts ∗ Project)
Result ← SmithWorkerProjs ∪ SmithMgrProjs
✓ Partition relation SSNPNos into classes of tuples with the same value of ESSN
✓ Relation SSNS is the result of the division SSNPNos ÷ SmithPNos, i.e., the set of
ESSN values that occur in a class with at least the PNo values of SmithPNos
✓ Relation Result is the result of the query
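The partition-based reading of division described above can be sketched directly. The contents of `ssn_pnos` and `smith_pnos` are illustrative values chosen for the example, not data taken from these notes, and `divide` is a hypothetical helper name.

```python
from collections import defaultdict

def divide(pairs, divisor):
    """Division of a binary relation (ESSN, PNo) by a set of PNo values."""
    # Partition the dividend into classes of PNo values, one class per ESSN.
    classes = defaultdict(set)
    for essn, pno in pairs:
        classes[essn].add(pno)
    # Keep the ESSN values whose class contains at least all divisor values.
    return {essn for essn, pnos in classes.items() if divisor <= pnos}

# Illustrative data (assumed for the example).
ssn_pnos = {
    ("123456789", 1), ("123456789", 2),
    ("453453453", 1), ("453453453", 2),
    ("333445555", 2), ("333445555", 3),
    ("666884444", 3),
}
smith_pnos = {1, 2}

ssns = divide(ssn_pnos, smith_pnos)
print(ssns)
```

An employee qualifies only if the class of PNo values is a superset of the divisor, which is why the employee working on projects 2 and 3 is excluded here.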
Equivalences (Theory)
• Not all operations of the algebra are independent
• {σ, π, ∪, −, ×} is a complete set, i.e., it has all the expressive power of the algebra
• ∩, ⋈, and ÷ can be derived from them
• R ∩ S = (R ∪ S) − ((R − S) ∪ (S − R))
• R ⋈condition S = σcondition (R × S)
• R ∗condition S = πattr (σcondition (R × S))
• R ÷ S = T can be reexpressed with difference and Cartesian product
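One standard way of writing the division with difference and Cartesian product is given below. The notation is an assumption of this sketch: $A$ denotes the attributes of $R$ that are not attributes of $S$, so that $R$ has schema $(A, B)$ and $S$ has schema $(B)$.

```latex
% A = attributes of R that are not in S (assumed notation): R(A, B), S(B)
\[
  R \div S \;=\; \pi_{A}(R) \;-\; \pi_{A}\bigl( (\pi_{A}(R) \times S) - R \bigr)
\]
```

The inner difference collects the combinations of an $A$ value with an $S$ tuple that are missing from $R$. Projecting it on $A$ gives the $A$ values that fail for at least one tuple of $S$, and these are removed from the candidates $\pi_{A}(R)$.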
(1) Break conjunctive selection: σc1∧c2 (R) = σc1 (σc2 (R))
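The rule can be checked on a few Employee tuples. In this sketch the tuples are reduced to (LName, Sex, Salary), and the two conditions are arbitrary choices made for the example.

```python
employee = [
    ("Smith", "M", 30000),
    ("Wong", "M", 40000),
    ("Narayan", "M", 38000),
    ("English", "F", 25000),
]

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def is_male(t):            # plays the role of condition c1
    return t[1] == "M"

def earns_over_35k(t):     # plays the role of condition c2
    return t[2] > 35000

# One selection with the conjunction c1 AND c2 ...
combined = select(employee, lambda t: is_male(t) and earns_over_35k(t))
# ... versus a cascade of two selections, c2 applied first and then c1.
cascaded = select(select(employee, earns_over_35k), is_male)

print(combined == cascaded)
```

Since conjunction is commutative, the two selections of the cascade could also be applied in the other order, which gives an optimizer the freedom to apply the more selective condition first.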
• “Lossy” = information is lost in the result of the join (e.g., employees who are not
department managers disappear in Employee ⋈SSN=MgrSSN Department)
• In R ⋈ S, only tuples satisfying the join condition contribute to the result and
information may be lost (projections of R ⋈ S on R or S may be smaller than R or
S)
Result
FName MInit LName DName
John B Smith null
Franklin T Wong Research
Alicia J Zelaya null
Jennifer S Wallace Administration
Ramesh K Narayan null
Joyce A English null
Ahmad V Jabbar null
James E Borg Headquarters
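A result of this shape can be produced by a left outer join of Employee with Department on SSN = MgrSSN. The sketch below is a minimal nested-loop version on a subset of the tuples; the helper name `left_outer_join` and the dictionary representation are assumptions of the example, and Python's `None` plays the role of null.

```python
def left_outer_join(r, s, condition, s_attrs):
    """Keep every tuple of r; pad with None when no tuple of s matches."""
    result = []
    for t in r:
        matches = [u for u in s if condition(t, u)]
        if matches:
            result.extend({**t, **u} for u in matches)
        else:
            result.append({**t, **{attr: None for attr in s_attrs}})
    return result

employee = [
    {"SSN": "123456789", "LName": "Smith"},
    {"SSN": "333445555", "LName": "Wong"},
    {"SSN": "453453453", "LName": "English"},
    {"SSN": "888665555", "LName": "Borg"},
]
department = [
    {"DName": "Research", "MgrSSN": "333445555"},
    {"DName": "Administration", "MgrSSN": "987654321"},
    {"DName": "Headquarters", "MgrSSN": "888665555"},
]

joined = left_outer_join(
    employee, department,
    condition=lambda e, d: e["SSN"] == d["MgrSSN"],
    s_attrs=["DName", "MgrSSN"],
)
result = [(t["LName"], t["DName"]) for t in joined]
print(result)
```

Unlike the plain join, no Employee tuple disappears: employees who manage no department are kept and paired with null values.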
Person
SSN Name Department
Student
Advisor
Professor
Rank
• Remember that the relational algebra (like TRC, DRC, and SQL, see later) is not
computationally complete (= does not have the expressive power of algorithmic languages)
• Computational completeness = Church-Turing thesis
✓ thesis = any algorithm that you can think of can be formulated with any of the
popular programming languages
✓ revised with Gödel's theorem and undecidable problems
Result(s) ← BorgSSN(s)
Result(s1 ) ← Supervision(s1 , s2 ) and Result(s2 )
• Recursive equation
• A dictionary definition:
✓ recursive: see “recursive”
BorgSSN
SSN
888665555
Supervision
SSN1 SSN2
123456789 333445555
333445555 888665555
999887777 987654321
987654321 888665555
666884444 333445555
453453453 333445555
987987987 987654321
888665555 null
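The recursive equation for Result can be evaluated by iterating until nothing new is derived, which is exactly the kind of unbounded iteration that the algebra itself cannot express. The sketch below is a naive fixpoint computation over the BorgSSN and Supervision tuples; the function name is an assumption, and `None` stands for null.

```python
supervision = {
    ("123456789", "333445555"),
    ("333445555", "888665555"),
    ("999887777", "987654321"),
    ("987654321", "888665555"),
    ("666884444", "333445555"),
    ("453453453", "333445555"),
    ("987987987", "987654321"),
    ("888665555", None),
}
borg_ssn = {"888665555"}

def supervised_closure(base, supervision):
    """Least fixpoint of:
    Result(s)  <- base(s)
    Result(s1) <- Supervision(s1, s2) and Result(s2)
    """
    result = set(base)
    while True:
        # One application of the recursive rule: a join followed by a projection.
        new = {s1 for s1, s2 in supervision if s2 in result} - result
        if not new:
            return result
        result |= new

result = supervised_closure(borg_ssn, supervision)
print(sorted(result))
```

Each pass of the loop is expressible in the algebra; what is not expressible is the number of passes, which depends on the depth of the supervision hierarchy in the data.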