Relational Algebra

Relational Algebra is a formal system used to define the operations of a relational database management system (RDBMS). It consists of operators like select, project, join, union, difference and intersection that allow retrieving and manipulating data stored in relations. The RDBMS translates SQL queries into a sequence of relational algebra operations to retrieve or modify the requested data in the database. Relational Algebra provides a precise mathematical definition of relational operations that underpin SQL and ensures consistent behavior of the RDBMS.

Uploaded by

Matoke Nahason
Copyright
© All Rights Reserved

Relational Algebra

In order to implement a DBMS, there must exist a set of rules which state how the database system will behave.
For instance, somewhere in the DBMS there must be a set of statements which indicate that when someone inserts data
into a row of a relation, it has the effect which the user expects. One way to specify this is to use words to write
an 'essay' on how the DBMS will operate, but words tend to be imprecise and open to interpretation. Instead,
relational databases are more usually defined using Relational Algebra.

Relational Algebra is:

• the formal description of how a relational database operates
• an interface to the data stored in the database itself
• the mathematics which underpins SQL operations

Operators in relational algebra are not necessarily the same as SQL operators, even if they have the same name.
For example, the SELECT statement exists in SQL, and also exists in relational algebra. These two uses of
SELECT are not the same. The DBMS must take whatever SQL statements the user types in and translate them
into relational algebra operations before applying them to the database.

Terminology
• Relation - a set of tuples.
• Tuple - a collection of attributes which describe some real world entity.
• Attribute - a real world role played by a named domain.
• Domain - a set of atomic values.
• Set - a mathematical definition for a collection of objects which contains no duplicates.
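
As a rough illustration of these terms, a relation can be modelled in Python as a set of tuples. This is only a sketch: the `row` helper, the `employee` relation and its sample values are assumptions made for illustration and do not come from the notes.

```python
def row(**attributes):
    """Build one tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attributes.items())

# A relation is a set of tuples, so a repeated tuple is stored only once.
employee = {
    row(empno=1, surname="Smith", depno=1),
    row(empno=2, surname="Jones", depno=2),
    row(empno=1, surname="Smith", depno=1),  # duplicate of the first tuple
}

print(len(employee))  # 2
```

Because the relation is a true set, the duplicate tuple disappears without any extra work, which matches the definition of Set given above.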

Operators - Write
• INSERT - provides a list of attribute values for a new tuple in a relation. This operator is the same as SQL.
• DELETE - provides a condition on the attributes of a relation to determine which tuple(s) to remove from
the relation. This operator is the same as SQL.
• MODIFY - changes the values of one or more attributes in one or more tuples of a relation, as identified
by a condition operating on the attributes of the relation. This is equivalent to SQL UPDATE.
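
The three write operators can be sketched as functions that take a relation and return a new one. The function names, the `row` helper and the sample data below are hypothetical, chosen only to mirror the descriptions above.

```python
def row(**attributes):
    """One tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attributes.items())

def insert(relation, **attributes):
    """INSERT: add a new tuple built from a list of attribute values."""
    return relation | {row(**attributes)}

def delete(relation, condition):
    """DELETE: remove every tuple that satisfies the condition."""
    return {t for t in relation if not condition(dict(t))}

def modify(relation, condition, **changes):
    """MODIFY: change attribute values in every tuple that satisfies the condition."""
    result = set()
    for t in relation:
        values = dict(t)
        if condition(values):
            values.update(changes)
        result.add(frozenset(values.items()))
    return result

employee = insert(set(), empno=1, surname="Smith", depno=1)
employee = insert(employee, empno=2, surname="Jones", depno=2)
employee = modify(employee, lambda r: r["empno"] == 2, depno=1)
employee = delete(employee, lambda r: r["surname"] == "Smith")

print(employee == {row(empno=2, surname="Jones", depno=1)})  # True
```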

Operators - Retrieval
There are two groups of operations:

• Mathematical set theory based operations:
UNION, INTERSECTION, DIFFERENCE, and CARTESIAN PRODUCT.
• Special database operations:
SELECT (not the same as SQL SELECT), PROJECT, and JOIN.

Relational SELECT

Nahason Matoke 1 MMUST


SELECT is used to obtain a subset of the tuples of a relation that satisfy a select condition.

For example, find all employees born after 1st Jan 1950:

SELECT_{dob > '01/JAN/1950'}(employee)
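
A minimal Python sketch of relational SELECT follows. The `select` and `row` helpers and the sample employees are assumptions for illustration; the date '01/JAN/1950' is represented as `date(1950, 1, 1)`.

```python
from datetime import date

def row(**attributes):
    """One tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attributes.items())

def select(condition, relation):
    """Return the subset of tuples for which the select condition holds."""
    return {t for t in relation if condition(dict(t))}

employee = {
    row(empno=1, surname="Smith", dob=date(1945, 3, 12)),
    row(empno=2, surname="Jones", dob=date(1962, 7, 30)),
    row(empno=3, surname="Brown", dob=date(1950, 1, 2)),
}

born_after = select(lambda r: r["dob"] > date(1950, 1, 1), employee)
surnames = sorted(dict(t)["surname"] for t in born_after)
print(surnames)  # ['Brown', 'Jones']
```

SELECT never changes the attributes of a relation; it only reduces the number of tuples.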

Relational PROJECT
The PROJECT operation is used to select a subset of the attributes of a relation by specifying the names of the
required attributes.

For example, to get a list of all employees' surnames and employee numbers:

PROJECT_{surname,empno}(employee)
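
PROJECT can be sketched in the same style. The helper names and sample data are again hypothetical. Note that projecting onto fewer attributes can produce identical tuples, which a relation stores only once.

```python
def row(**attributes):
    """One tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attributes.items())

def project(attributes, relation):
    """Keep only the named attributes of every tuple; duplicates collapse."""
    return {frozenset((a, dict(t)[a]) for a in attributes) for t in relation}

employee = {
    row(empno=1, surname="Smith", depno=1),
    row(empno=2, surname="Jones", depno=1),
    row(empno=3, surname="Smith", depno=2),
}

names_and_numbers = project(["surname", "empno"], employee)
departments = project(["depno"], employee)

print(len(names_and_numbers))  # 3
print(len(departments))        # 2
```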

SELECT and PROJECT


SELECT and PROJECT can be combined together. For example, to get a list of employee numbers for employees
in department number 1:

Figure : Mapping select and project
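
The figure is not reproduced here. One natural reading of the combination is PROJECT_{empno}(SELECT_{depno = 1}(employee)): first keep the tuples for department 1, then keep only the empno attribute. The sketch below assumes that reading and uses hypothetical helpers and sample data.

```python
def row(**attributes):
    """One tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attributes.items())

def select(condition, relation):
    return {t for t in relation if condition(dict(t))}

def project(attributes, relation):
    return {frozenset((a, dict(t)[a]) for a in attributes) for t in relation}

employee = {
    row(empno=1, surname="Smith", depno=1),
    row(empno=2, surname="Jones", depno=2),
    row(empno=3, surname="Brown", depno=1),
}

# The inner SELECT runs first, the outer PROJECT is applied to its result.
result = project(["empno"], select(lambda r: r["depno"] == 1, employee))
empnos = sorted(dict(t)["empno"] for t in result)
print(empnos)  # [1, 3]
```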

Set Operations - semantics


Consider two relations R and S.

• UNION of R and S
The union of R and S is a relation that includes all the tuples that are in R, in S, or in both R
and S. Duplicate tuples are eliminated.
• INTERSECTION of R and S
The intersection of R and S is a relation that includes all the tuples that are in both R and S.
• DIFFERENCE of R and S
The difference of R and S is the relation that contains all the tuples that are in R but not in S.

SET Operations - requirements


For set operations to function correctly the relations R and S must be union compatible. Two relations are union
compatible if

• they have the same number of attributes
• the domain of each attribute in column order is the same in both R and S.
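
The set operations and the union-compatibility test can be sketched with Python's built-in sets. Here tuples are positional, and a schema is assumed to be a list of (attribute name, domain) pairs; the schemas, names and values are hypothetical.

```python
def union_compatible(schema_r, schema_s):
    """Same number of attributes, and the same domain in each column position."""
    return len(schema_r) == len(schema_s) and all(
        dom_r == dom_s for (_, dom_r), (_, dom_s) in zip(schema_r, schema_s)
    )

schema_r = [("empno", int), ("surname", str)]
schema_s = [("staffno", int), ("name", str)]  # different names, same domains

R = {(1, "Smith"), (2, "Jones")}
S = {(2, "Jones"), (3, "Brown")}

print(union_compatible(schema_r, schema_s))  # True
print(sorted(R | S))  # UNION: [(1, 'Smith'), (2, 'Jones'), (3, 'Brown')]
print(sorted(R & S))  # INTERSECTION: [(2, 'Jones')]
print(sorted(R - S))  # DIFFERENCE: [(1, 'Smith')]
```

Attribute names do not need to match for union compatibility under the definition above; only the number of attributes and their domains in column order matter.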

UNION Example

Figure : UNION

INTERSECTION Example

Figure : Intersection

DIFFERENCE Example

Figure : DIFFERENCE

CARTESIAN PRODUCT
The Cartesian Product is also an operator which works on two sets. It is sometimes called the CROSS PRODUCT
or CROSS JOIN.

It combines the tuples of one relation with all the tuples of the other relation.

CARTESIAN PRODUCT example

Figure : CARTESIAN PRODUCT
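
A short sketch of the Cartesian product, assuming the two relations have no attribute names in common. The helper names and sample data are hypothetical.

```python
from itertools import product

def row(**attributes):
    """One tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attributes.items())

def cartesian_product(r, s):
    """Pair every tuple of r with every tuple of s (attribute names assumed disjoint)."""
    return {tr | ts for tr, ts in product(r, s)}

employee = {row(empno=1, surname="Smith"), row(empno=2, surname="Jones")}
department = {
    row(depno=1, dname="Accounts"),
    row(depno=2, dname="Sales"),
    row(depno=3, dname="Research"),
}

result = cartesian_product(employee, department)
print(len(result))  # 6, i.e. 2 tuples x 3 tuples
```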

JOIN Operator
JOIN is used to combine related tuples from two relations:

• In its simplest form the JOIN operator is just the cross product of the two relations.
• As the join becomes more complex, tuples are removed within the cross product to make the result of the
join more meaningful.
• JOIN allows you to evaluate a join condition between the attributes of the relations on which the join is
undertaken.

The notation used is

R JOIN_{join condition} S

JOIN Example

Figure : JOIN
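
The figure is not reproduced here. As a sketch of the idea, a JOIN can be written as a cross product from which the pairs that fail the join condition are removed. The attribute `emp_depno` is a hypothetical name used to keep the two department-number attributes distinct; the data is also made up.

```python
from itertools import product

def row(**attributes):
    """One tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attributes.items())

def join(r, s, condition):
    """Keep only the pairs from the cross product that satisfy the join condition."""
    return {tr | ts for tr, ts in product(r, s) if condition(dict(tr), dict(ts))}

employee = {
    row(empno=1, surname="Smith", emp_depno=1),
    row(empno=2, surname="Jones", emp_depno=2),
    row(empno=3, surname="Brown", emp_depno=1),
}
department = {
    row(depno=1, dname="Accounts"),
    row(depno=2, dname="Sales"),
    row(depno=3, dname="Research"),
}

result = join(employee, department, lambda e, d: e["emp_depno"] == d["depno"])
print(len(result))  # 3 of the 9 pairs in the cross product survive
```

With a condition that is always true, `join` returns the full cross product, which corresponds to the simplest form of JOIN described above.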

Natural Join
Most often the JOIN involves an equality test, and is then described as an equi-join. Such joins result in two
attributes in the resulting relation having exactly the same value. A 'natural join' will remove the duplicate
attribute(s).

• In most systems a natural join will require that the attributes have the same name to identify the
attribute(s) to be used in the join. This may require a renaming mechanism.
• If you do use natural joins make sure that the relations do not have two attributes with the same name by
accident.
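
A natural join can be sketched by matching tuples on every attribute name the two relations share. The helper names and sample data below are hypothetical.

```python
from itertools import product

def row(**attributes):
    """One tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attributes.items())

def natural_join(r, s):
    """Join on all attributes with the same name; each shared attribute appears once."""
    result = set()
    for tr, ts in product(r, s):
        a, b = dict(tr), dict(ts)
        shared = a.keys() & b.keys()
        if all(a[name] == b[name] for name in shared):
            result.add(tr | ts)  # equal (name, value) pairs merge into one
    return result

employee = {
    row(empno=1, surname="Smith", depno=1),
    row(empno=2, surname="Jones", depno=2),
    row(empno=3, surname="Brown", depno=1),
}
department = {row(depno=1, dname="Accounts"), row(depno=2, dname="Sales")}

result = natural_join(employee, department)
print(len(result))  # 3
```

The sketch also shows the danger mentioned above: if both relations happened to have an unrelated attribute with the same name, for example `name`, it would silently become part of the join condition.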

OUTER JOINs
Notice that data can be lost when applying a join to two relations: any tuple with no matching tuple in the
other relation is dropped from the result. In some cases this lost data might hold useful information. An outer
join retains the information that would have been lost from the tables, replacing missing data with nulls.
There are three forms of the outer join, depending on which data is to be kept.

• LEFT OUTER JOIN - keep data from the left-hand table
• RIGHT OUTER JOIN - keep data from the right-hand table
• FULL OUTER JOIN - keep data from both tables
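
The three forms can be sketched on top of a natural join, with Python's `None` standing in for null. The helper names, the explicit attribute lists and the sample data are assumptions made for illustration.

```python
def row(**attributes):
    """One tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attributes.items())

def agrees(tr, ts):
    """True when two tuples have equal values for every attribute they share."""
    a, b = dict(tr), dict(ts)
    return all(a[name] == b[name] for name in a.keys() & b.keys())

def left_outer_join(r, s, s_attributes):
    """Natural join that keeps unmatched tuples of r, padded with None."""
    result = set()
    for tr in r:
        matches = [ts for ts in s if agrees(tr, ts)]
        if matches:
            result.update(tr | ts for ts in matches)
        else:
            padding = {a: None for a in s_attributes if a not in dict(tr)}
            result.add(tr | frozenset(padding.items()))
    return result

employee = {row(empno=1, surname="Smith", depno=1), row(empno=2, surname="Jones", depno=4)}
department = {row(depno=1, dname="Accounts"), row(depno=2, dname="Sales")}

left = left_outer_join(employee, department, ["depno", "dname"])
# A right outer join is a left outer join with the operands swapped.
right = left_outer_join(department, employee, ["empno", "surname", "depno"])
# A full outer join keeps the unmatched tuples from both sides.
full = left | right

print(len(left), len(right), len(full))  # 2 2 3
```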

OUTER JOIN example 1

Figure : OUTER JOIN (left/right)

OUTER JOIN example 2

Figure : OUTER JOIN (full)

Relational Algebra - Example
Consider the following SQL to find which departments have had employees on the 'Further Accounting'
course.

SELECT DISTINCT dname
FROM department, course, empcourse, employee
WHERE cname = 'Further Accounting'
AND course.courseno = empcourse.courseno
AND empcourse.empno = employee.empno
AND employee.depno = department.depno;

The equivalent relational algebra is

PROJECT_{dname}(department JOIN_{depno = depno} (
PROJECT_{depno}(employee JOIN_{empno = empno} (
PROJECT_{empno}(empcourse JOIN_{courseno = courseno} (
PROJECT_{courseno}(SELECT_{cname = 'Further Accounting'}(course))
))
))
))
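
To try the SQL form of this query, the sketch below runs it against an in-memory SQLite database using Python's standard `sqlite3` module. The table definitions and the sample rows are assumptions: the notes give only the table and column names used in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (depno INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE employee   (empno INTEGER PRIMARY KEY, surname TEXT, depno INTEGER);
CREATE TABLE course     (courseno INTEGER PRIMARY KEY, cname TEXT);
CREATE TABLE empcourse  (empno INTEGER, courseno INTEGER);

-- Hypothetical sample data.
INSERT INTO department VALUES (1, 'Accounts'), (2, 'Sales');
INSERT INTO employee   VALUES (1, 'Smith', 1), (2, 'Jones', 2), (3, 'Brown', 1);
INSERT INTO course     VALUES (10, 'Further Accounting'), (20, 'Basic Accounting');
INSERT INTO empcourse  VALUES (1, 10), (3, 10), (2, 20);
""")

rows = conn.execute("""
SELECT DISTINCT dname
FROM department, course, empcourse, employee
WHERE cname = 'Further Accounting'
AND course.courseno = empcourse.courseno
AND empcourse.empno = employee.empno
AND employee.depno = department.depno
""").fetchall()

print(rows)  # [('Accounts',)]
conn.close()
```

Smith and Brown both took the course and both work in Accounts, so DISTINCT is what reduces the answer to a single row.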

Symbolic Notation
From the example, one can see that for complicated cases a large amount of the answer is formed
from operator names, such as PROJECT and JOIN. It is therefore commonplace to use symbolic
notation to represent the operators.

• SELECT -> σ (sigma)
• PROJECT -> π (pi)
• PRODUCT -> × (times)
• JOIN -> |×| (bow-tie)
• UNION -> ∪ (cup)
• INTERSECTION -> ∩ (cap)
• DIFFERENCE -> - (minus)
• RENAME -> ρ (rho)

Usage
The symbolic operators are used as with the verbal ones. So, to find all employees in department 1:

SELECT_{depno = 1}(employee)
becomes σ_{depno = 1}(employee)

Conditions can be combined together using ^ (AND) and v (OR). For example, all employees in
department 1 called 'Smith':

SELECT_{depno = 1 ^ surname = 'Smith'}(employee)
becomes σ_{depno = 1 ^ surname = 'Smith'}(employee)

The use of the symbolic notation can lend itself to brevity. Even better, when the JOIN is a natural join,
the JOIN condition may be omitted from |×|. The earlier example resulted in:

PROJECT_{dname}(department JOIN_{depno = depno} (
PROJECT_{depno}(employee JOIN_{empno = empno} (
PROJECT_{empno}(empcourse JOIN_{courseno = courseno} (
PROJECT_{courseno}(SELECT_{cname = 'Further Accounting'}(course))))))))

becomes

π_{dname}(department |×| (
π_{depno}(employee |×| (
π_{empno}(empcourse |×| (
π_{courseno}(σ_{cname = 'Further Accounting'}(course))))))))

Rename Operator
The rename operator returns an existing relation under a new name. ρ_{A}(B) is the relation B with its
name changed to A. For example, find the employees in the same department as employee 3:

π_{emp2.surname, emp2.forenames}(
σ_{employee.empno = 3 ^ employee.depno = emp2.depno}(
employee × (ρ_{emp2}(employee))
)
)
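
The rename example can be sketched in Python as follows. To obtain qualified attribute names such as `emp2.depno`, the sketch qualifies both copies of the relation, which is an assumption about how the names are resolved; the helpers and sample data are hypothetical.

```python
from itertools import product

def row(**attributes):
    """One tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attributes.items())

def rename(name, relation):
    """Qualify every attribute with `name`, e.g. surname -> emp2.surname."""
    return {frozenset((f"{name}.{a}", v) for a, v in t) for t in relation}

def select(condition, relation):
    return {t for t in relation if condition(dict(t))}

def project(attributes, relation):
    return {frozenset((a, dict(t)[a]) for a in attributes) for t in relation}

def times(r, s):
    return {tr | ts for tr, ts in product(r, s)}

employee = {
    row(empno=1, surname="Smith", forenames="Ann", depno=1),
    row(empno=2, surname="Jones", forenames="Bob", depno=2),
    row(empno=3, surname="Brown", forenames="Cat", depno=1),
}

pairs = times(rename("employee", employee), rename("emp2", employee))
same_dept = select(
    lambda r: r["employee.empno"] == 3 and r["employee.depno"] == r["emp2.depno"],
    pairs,
)
result = project(["emp2.surname", "emp2.forenames"], same_dept)
names = sorted(dict(t)["emp2.surname"] for t in result)
print(names)  # ['Brown', 'Smith']
```

Employee 3 appears in the answer as well, since the expression does not exclude the employee being compared against.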

Derivable Operators
• Fundamental operators: σ, π, ×, ∪, -, ρ
• Derivable operators: |×|, ∩

A ∩ B ≡ A - (A - B)
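
The derivation of INTERSECTION from DIFFERENCE can be checked on a small example. The two relations below are hypothetical, with tuples written positionally.

```python
A = {(1, "Smith"), (2, "Jones"), (3, "Brown")}
B = {(2, "Jones"), (3, "Brown"), (4, "Green")}

# A - B leaves the tuples that are only in A; removing those from A
# leaves exactly the tuples that A shares with B.
derived = A - (A - B)

print(derived == A & B)  # True
```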

Equivalence
A |×|_{c} B ≡ π_{a1,a2,...,aN}(σ_{c}(A × B))

• where c is the join condition (e.g. A.a1 = B.a1),
• and a1,a2,...,aN are all the attributes of A and B without repetition.

c is called the join-condition, and is usually a comparison of a primary key with a foreign key. Where there are
N tables, there are usually N-1 join-conditions. In the case of a natural join, the conditions can be
omitted, but otherwise omitting a condition results in a cartesian product (a common mistake to
make).

Equivalences
The same relational algebraic expression can be written in many different ways. The order in which
tuples appear in relations is never significant.

• A × B ≡ B × A
• A ∩ B ≡ B ∩ A
• A ∪ B ≡ B ∪ A
• (A - B) is not the same as (B - A)
• σ_{c1}(σ_{c2}(A)) ≡ σ_{c2}(σ_{c1}(A)) ≡ σ_{c1 ^ c2}(A)
• π_{a1}(A) ≡ π_{a1}(π_{a1,etc}(A))
where etc represents any other attributes of A.
• many other equivalences exist.
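
The SELECT and PROJECT equivalences can be checked on sample data. This is a spot check under assumed helpers and made-up tuples, not a proof.

```python
def row(**attributes):
    """One tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attributes.items())

def select(condition, relation):
    return {t for t in relation if condition(dict(t))}

def project(attributes, relation):
    return {frozenset((a, dict(t)[a]) for a in attributes) for t in relation}

employee = {
    row(empno=1, surname="Smith", depno=1),
    row(empno=2, surname="Smith", depno=2),
    row(empno=3, surname="Brown", depno=1),
}

def c1(r):
    return r["depno"] == 1

def c2(r):
    return r["surname"] == "Smith"

# Nested selections in either order, and one selection with the combined condition.
nested_12 = select(c1, select(c2, employee))
nested_21 = select(c2, select(c1, employee))
combined = select(lambda r: c1(r) and c2(r), employee)

# Projecting directly, and projecting via a wider intermediate projection.
direct = project(["surname"], employee)
via_wider = project(["surname"], project(["surname", "depno"], employee))

print(nested_12 == nested_21 == combined)  # True
print(direct == via_wider)                 # True
```

All three selections return the same tuples, but the amount of work differs: the nested forms scan a smaller intermediate relation on the second pass, which is the kind of difference a query optimiser exploits.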

While equivalent expressions always give the same result, some may be much easier to evaluate than
others.

When any query is submitted to the DBMS, its query optimiser tries to find the most efficient equivalent
expression before evaluating it.

Comparing RA and SQL


Relational algebra:

• is closed (the result of every expression is a relation)
• has a rigorous foundation
• has simple semantics
• is used for reasoning, query optimization, etc.

SQL:

• is a superset of relational algebra
• has convenient formatting features, etc.
• provides aggregate functions
• has complicated semantics
• is an end-user language.

Note: Any relational language as powerful as relational algebra is called relationally complete. A
relationally complete language can perform all basic, meaningful operations on relations. Since SQL is
a superset of relational algebra, it is also relationally complete.
