Text material (3)
Part 06
Relational Algebra
What is an algebra? I think all of you know what an algebra is:
a language based on operators and a domain of values, where the
operators map values taken from the domain into other domain
values. Hence, an expression involving operators and arguments
produces a value in the domain. For example, plus, minus, star
and the other arithmetic operators take integers from the domain
as inputs and produce integers in the domain as outputs.
The same thing is true of relational algebra: the domain is the
set of all relations, where a relation is a table as defined in
the relational model, and an expression is referred to as a
query. Relational calculus will be seen later on. Understanding
relational algebra and calculus is the key to understanding SQL
and query processing. Only when you know how these operators
work, that is, how relational algebra works, can you understand
how SQL works; and only if you understand how SQL works can you
write queries and do query processing well. So let us turn to
the formal relational
query languages. There are two mathematical query languages
which form the basis of "real" languages like SQL, which is
what is used in implementations: one is called relational
algebra and the other is called relational calculus. Relational
algebra is more operational, or procedural, and it is very
useful for representing execution plans, that is, how queries
are processed, while relational calculus lets users describe
what they want rather than how to compute it, and so it is
called declarative. These are the two flavors of the formal
models.
So relational algebra and relational calculus are formal
languages. Informally you can call relational algebra a
procedural language and relational calculus a declarative
language, but formally the two are equivalent to each other, and
a language that can produce any relation that can be derived
using relational calculus is called relationally complete.
Let us look at relational algebra, which is what this lecture
is about. Relational algebra operations work on one or more
relations. Please note that relational algebra takes its
operands from the set of relations and defines another relation
without changing the original relations. That is the job of
relational algebra. Both the operands, that is, the inputs, and
the results are relations, so the output of one operation can
become the input to another operation. This allows expressions
to be nested, just as in arithmetic. This property is called
closure, and it is what permits nested relational algebra
expressions. Relational algebra is used to specify retrieval
requests: the operand relations are treated as sets, and several
of the operators come from set theory. The result of a
retrieval is a new relation, since each relational operator
takes in relations and produces a new relation. A sequence of
relational algebra operations forms a relational algebra
expression. The algebra defines the ways in which relations can
be operated on to manipulate their data. It is the basis of
SQL for relational databases, as I already told you, and it
illustrates the basic operations specified by SQL. So SQL is
based for the most part on the formal model of relational
algebra.
This algebra is composed of two types of operations: unary
operations, which deal with a single table, and binary
operations, which involve two tables. Its domain is the set of
relations. The basic unary operations are select and project.
The basic binary operations are union, set difference and
Cartesian product. There are also some derived operators, such
as set intersection, division and join, which will be seen in
the next lecture. The algebra is called procedural because a
relational expression specifies a query by describing the
algorithm for determining the result of the expression. It
spells out how the result is obtained, and that is why it is
called procedural.
Selection Operation
Let us look at the five basic relational operators that we are
going to see in this lecture.
Selection selects a subset of the rows of a relation (a
horizontal subset). Projection retains only the wanted columns
of a relation: selection gives a subset of the tuples,
projection keeps only the wanted attributes. Union is a binary
operator; it takes the tuples that are in r1 or in r2.
Set-difference takes the tuples that are in r1 but not in r2.
And cross-product allows us to combine two relations. The last
three come from set theory. Since each operation returns a
relation, operations can be composed; that is, the algebra is
"closed". Because the output of one operation is itself a
relation, it can form the input of another operation.
So here is the picture. This is selection: it picks out some
of the tuples. Projection picks out some of the columns, or
attributes. Then union: this is R, this is S, and this is R
union S. Intersection is actually a derived operator, which will
be seen only in the next lecture. Then set difference: if this
is R and this is S, this gives you what is in R but not in S.
And this is Cartesian product. Suppose you have two relations, R
with rows a and b and S with rows 1, 2 and 3; then all the
combinations of the rows of R and S form R X S, and this is
called the Cartesian product. Selection and projection are
single-table operations, while union, set difference and
Cartesian product are binary, that is, they take two relations.
Intersection is a derived operation, which will be seen in the
next lecture.
Let’s take the unary operations one by one. Let us take
selection first.
The SELECT operation is used to select (or retrieve) the subset
of the tuples of a relation that matches the "selection
condition". Here this is the selection operator, this is the
condition, and this is the relation on which you want to
operate. The expression selects all tuples of the relation
"Employee" that match the criterion that the attribute salary
has a value greater than 3000. So the condition is nothing but
Employee.Salary greater than 3000.
So that is the select operation. The general syntax of the
select operation is as follows:
There is the selection operator, then the condition, and then
the relation, where the condition is of the form: an
expression, a comparison operator and an expression. For
example, it can be attribute name, operator, constant, or it
can be attribute name, operator, attribute name. The operators
you can use are the relational operators like less than,
greater than, equal to, not equal to and so on, and conditions
can be joined with the logical operators AND and OR. This
essentially says that it selects from R all tuples that satisfy
the given selection-condition. The condition refers only to the
attributes of R, not to other attributes. An
atomic-selection-condition is of the form: relation-attribute
operator constant, or relation-attribute operator
relation-attribute. A selection-condition is obtained as a
boolean combination of atomic selection conditions by means of
connectives like AND, OR and NOT.
Some of the properties of the SELECT operation are as follows:
The SELECT operation is unary. It operates on only one
relation; that we have already discussed. The selection
condition is applied to each tuple in the relation individually,
so the condition cannot span more than one tuple. The degree of
the relation resulting from a selection with condition c on R is
the same as that of R, because the attributes are not changed.
Remember, the degree of a relation is the number of attributes
it has. So the number of attributes does not change; only the
cardinality, the number of tuples, can change. The SELECT
operation is commutative. Hence, SELECT operations can be
cascaded in any order.
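These selection properties can be tried out in a few lines of
Python. This is only a sketch: a relation is modelled as a list
of dictionaries, and the helper name select and the Employee
rows are invented for illustration; they are not from the
lecture slides.

```python
# Hypothetical Employee relation: one dict per tuple (data invented).
employee = [
    {"name": "Asha", "dept": "Sales", "salary": 2500},
    {"name": "Ravi", "dept": "Sales", "salary": 3500},
    {"name": "Mina", "dept": "IT", "salary": 4200},
]


def select(relation, condition):
    """Return the tuples of `relation` for which `condition` holds.

    The condition is evaluated on one tuple at a time, so it cannot
    span more than one tuple.
    """
    return [t for t in relation if condition(t)]


# Salary greater than 3000, applied to Employee.
high_paid = select(employee, lambda t: t["salary"] > 3000)

# Cascading two selections in either order yields the same tuples.
first = select(select(employee, lambda t: t["salary"] > 3000),
               lambda t: t["dept"] == "Sales")
second = select(select(employee, lambda t: t["dept"] == "Sales"),
                lambda t: t["salary"] > 3000)

print(len(high_paid))     # cardinality of the result
print(len(high_paid[0]))  # degree, same as in the input relation
print(first == second)    # the two cascades agree
```

Only the number of tuples differs between employee and
high_paid; every result tuple still carries all three
attributes.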
Let’s take an example. Here you have the selection condition:
the selection operator with course equal to CM, applied to the
students relation. This is the students relation, with student
ID, name and course; we are not bothered about the details. We
just take this student relation and, wherever the course is CM,
output the complete tuple. The degree of the relation, which is
three, does not change. The cardinality, the number of tuples,
changes: only those tuples that satisfy the condition are
output. So this is the selection operation.
Find the cardholders from Modena. There is only one input
table. Both the cardholder table and the answer table have the
same schema (list of columns). Every row in the answer has the
value 'Modena' in the b_addr column. The selection is b_addr
equal to Modena on the cardholder table.
This is another example of the selection operation. The schema
is the same; it doesn't change. That is, the number of
attributes does not change, and since we are saying that we
need only Modena, only this one tuple is taken out. All rows in
the answer have the value Modena in the b_addr column.
Projection Operation
Let’s look at the next operation, which is projection. The
project operation selects specific columns from a specified
relation. This is the symbol for the projection operation, here
we list the attributes that we want to appear in the output
relation, and this is Employee, the original relation. For
example, this returns a relation comprising only the
employee.name and employee.salary attributes. All other
attributes are removed from the relation. The operation is said
to have "projected" the employee relation onto the required
attributes. That’s why it is called a projection. Here the
degree does change.
The project operation is written as follows. This is the
format: this is the operator, this is the attribute list and
this is the relation, where the attribute list is a
comma-separated list of attributes belonging to the specified
relation. Of course, all the attributes in the list must belong
to that relation. The attributes in the result of PROJECT
appear in the same order as in the specified list. Only what is
needed is taken; the other attributes are removed.
The projection operator is the operator that helps you to
select columns from a relation. One property of the projection
operation is that it removes duplicates from the result. This
is an important thing: suppose you select two columns and the
primary key is not among them; then some of the resulting
tuples may be duplicates. It is the job of the projection
operation to remove those duplicate tuples. When the attribute
list of PROJECT includes a super key, the number of tuples
returned is equal to the number of tuples in the specified
relation, because a super key uniquely determines each tuple.
Projection is not commutative. A projection on list l1 applied
to a projection on list l2 of R equals the projection on l1 of
R if l1 is a subset of l2; else the operation is an incorrect
expression.
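A small Python sketch may help to see duplicate removal at
work. It assumes a relation stored as a tuple of attribute
names plus a list of rows; the helper name project, the
cardholder rows and the assumption that b_no is a key are all
made up for illustration.

```python
# Hypothetical cardholder relation (rows invented for illustration).
ATTRS = ("b_no", "b_name", "b_addr")
cardholder = [
    (1, "Rossi", "Modena"),
    (2, "Bianchi", "Parma"),
    (3, "Verdi", "Parma"),
]


def project(attrs, rows, wanted):
    """Keep only the `wanted` columns, in the order listed, without duplicates."""
    positions = [attrs.index(a) for a in wanted]
    result = []
    for row in rows:
        projected = tuple(row[p] for p in positions)
        if projected not in result:  # duplicate elimination
            result.append(projected)
    return result


# Projecting a non-key column can shrink the relation.
addresses = project(ATTRS, cardholder, ["b_addr"])

# Including the (assumed) key b_no keeps every tuple.
with_key = project(ATTRS, cardholder, ["b_no", "b_addr"])

# Projecting on a subset of an earlier list gives the same result
# as projecting on that subset directly.
nested = project(("b_no", "b_addr"), with_key, ["b_addr"])

print(addresses)
print(len(with_key))
print(nested == addresses)
```

Here the two Parma rows collapse into one address, while the
projection that includes b_no returns as many tuples as the
input.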
Let’s take an example of projection. The projection operation
selects a list of columns from a table. This is the column
list. Example: student number and name. Here I have students,
with student number, name and course, so it has three
attributes, and now I am selecting only student number and
name; that result comes here. There is no question of
duplicates here. This is the projection operation.
Find the addresses of all cardholders. That’s how the query is
stated. There is only one input table, so it is a unary
operation. The schema of the answer table is the list of
columns in the query; it is not the same as the original schema,
unlike in the selection operation. If there are many
cardholders living at the same address, the address is not
duplicated in the answer table. So you get only the unique
addresses, because they are no longer associated with the
cardholders.
Let’s take another example. Here the schema of the answer
table is the same as the list of columns in the query; it is
not equal to the schema of the input table. Here you have
borrower number, borrower name, borrower address and b-status,
and you have been asked to take out only the b_address. All the
addresses are taken out, and these two, which are duplicates,
are reduced to one. That is once again what I am trying to tell
you: duplicate removal is what is special about projection.
Normally in a database, the interesting queries are
combinations of projection and selection. Since the input and
the output of a relational operator are both relations,
operators can be composed in any fashion, with the output of
one operation being the input of another. What we are doing
here is first a selection, that is, we take all the employees
whose salary is greater than 3000, and then a projection, which
returns for those tuples only the name and salary fields. So we
get the name and salary of all employee records where salary is
greater than 3000. The result of a query can also be assigned
to a name to form a relation by that name. You can write the
expression with a name such as salary statement on the left,
and it creates a new relation called salary statement; this is
also possible.
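The composition of a selection and a projection, with the
result assigned to a name, can be sketched in Python as
follows. The helpers select and project, the variable name
salary_statement and the Employee rows are assumptions made for
this sketch, not the lecture's own notation.

```python
# Hypothetical Employee relation: one dict per tuple (data invented).
employee = [
    {"name": "Asha", "dept": "Sales", "salary": 2500},
    {"name": "Ravi", "dept": "Sales", "salary": 3500},
    {"name": "Mina", "dept": "IT", "salary": 4200},
]


def select(relation, condition):
    """Keep the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]


def project(relation, wanted):
    """Keep only the wanted attributes and drop duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in wanted}
        if row not in result:
            result.append(row)
    return result


# The output of the selection is itself a relation, so it can be
# passed straight to the projection (closure). The result is then
# bound to a name, which plays the role of a new relation.
salary_statement = project(
    select(employee, lambda t: t["salary"] > 3000),
    ["name", "salary"],
)

print(salary_statement)
```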
Selection selects the rows that satisfy the selection
condition. The result is a relation, and the schema of the
result is the same as that of the input relation. Do we need to
do duplicate elimination in selection? It is not necessary,
because all the tuples in the input are unique, so any tuples
you select are also unique. Rating greater than 8 is what we
have selected here, so these two tuples are not included.
Now suppose I want to project sname and rating where rating is
greater than 8; then these columns are removed and you get this
and this. That is the result of the operation.
So selection and projection are usually combined: from the
selection you get only these two tuples, and out of those you
keep only these columns; that’s the result. We have seen the
unary operations, select and project, and the combination of
select and project. In fact, many access queries, where we want
to retrieve certain selected information from the database, are
a combination of select and project.
Set Operations
Now we look at the normal set operations. We are going to look
at three: union, difference and Cartesian product. Intersection
is also a set operation, but it is derived. Set operators
connect two relations; they are binary operations. However,
for a set operation R1 op R2 (union, intersection or
difference) to be valid, R1 and R2 should be union-compatible.
I will explain what that is. R1 and R2 must have the same
number of columns, or attributes. The names of the attributes
must be the same in both relations. And attributes with the
same name in both relations should have the same domain. So the
number of attributes has to be the same, the attribute names
have to be the same (later on I will come to cases where that
is not necessary), and the domains of the attributes have to be
the same. That means corresponding attributes are almost
identical.
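The three conditions for union compatibility can be written
down as a short check. This is a sketch under the assumption
that a schema is represented as a Python dict from attribute
name to domain name; the function name union_compatible and
the sample schemas are hypothetical.

```python
def union_compatible(schema1, schema2):
    """Check the three conditions: same number of attributes,
    same attribute names, and same domain for each shared name."""
    if len(schema1) != len(schema2):
        return False
    for name, domain in schema1.items():
        if name not in schema2:        # names must match
            return False
        if schema2[name] != domain:    # domains must match
            return False
    return True


# Hypothetical schemas, used only to exercise the check.
faculty_names = {"name": "string"}
student_names = {"name": "string"}
student_ids = {"s_id": "integer"}
transcript_ids = {"stud_id": "integer"}

print(union_compatible(faculty_names, student_names))
print(union_compatible(student_ids, transcript_ids))
```

The second pair fails only because the attribute names differ,
even though the number of attributes and the domains match.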
Now let’s look at examples of set operators. This is the union
operator: projection of name (Faculty) union projection of name
(Student). Then projection of address (Faculty) intersection
projection of address (Student), and then minus. Here there is
a problem: the two relations in this case, student and
transcript, are not union-compatible; even though the number of
attributes and the domains match, the names do not. What do we
do about this? You do a rename, which renames all the
attributes in a relation. Given a relation with schema R(A1,
..., An), the expression R[B1, ..., Bn] renames A1 as B1, ...,
An as Bn. The rename operator does not change the domains of
the attributes; just the names are changed.
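The rename R[B1, ..., Bn] can be sketched in Python as
follows, assuming a schema is kept as a list of (name, domain)
pairs. The function name rename and the sample schema are
invented for the sketch.

```python
def rename(schema, new_names):
    """Rename attribute i of `schema` to new_names[i], keeping its domain."""
    if len(new_names) != len(schema):
        raise ValueError("need exactly one new name per attribute")
    return [(new, domain) for (_old, domain), new in zip(schema, new_names)]


# Hypothetical schema R(A1, A2) with its domains.
r_schema = [("A1", "integer"), ("A2", "string")]

renamed = rename(r_schema, ["B1", "B2"])

print(renamed)
```

Only the names in the pairs change; the domains are carried
over untouched, which is what lets a renamed relation become
union-compatible with another one.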
So the rename operator is used to make relations
union-compatible. For example, you can take this minus
expression, assign this part to Temp, do the rename, and then
get the faculty who did not teach courses in both 'fall' and
'spring' of 2005. We will come to this later, as I told you.
Now let’s look at union, the first set operation. We have
talked about relations not being union-compatible and, if they
are not, how we use rename to fix that.
Union takes the set of rows in each table and combines them,
eliminating duplicates. The participating relations must be
compatible, which we have already discussed: they must have the
same number of columns or attributes, the same column or
attribute names, and the same domains and data types. So this
is the operation. This is R, this is S, and R union S is a1 b1,
a2 b2 and a3 b3; a2 b2 does not appear a second time because
duplicates are removed. It is the same as union in set theory
in normal mathematics. We treat the two tables as sets and
perform a set union: table 1 union table 2, with the duplicates
removed. This operation is impossible unless both tables
involved have the same schema. Otherwise how could you combine
them? It is not possible, because rows from both tables must
fit into a single answer table. Hence they must "look alike",
that is, have the same attributes, the same domains and so on.
The same columns belong to both tables.
So part1suppliers is: you take the suppliers of p1 and project
only the S number. For part2suppliers you again take p2 and
project the S number. The answer is part1suppliers union
part2suppliers. You can name the inputs like this, and here
both have exactly the same attribute, S number, because that is
what you have after the projection. For example, this is
part1suppliers, this is part2suppliers after the operation, and
this is part1suppliers union part2suppliers, where the
duplicates have been removed.
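Because relations are treated as sets, Python's built-in set
type gives a quick way to try union. The sketch below uses R
and S with tuples a1 b1, a2 b2 and a3 b3, plus a supplies table
whose rows are invented for illustration.

```python
# R and S as sets of tuples; both have the same two columns.
R = {("a1", "b1"), ("a2", "b2")}
S = {("a2", "b2"), ("a3", "b3")}

# Set union keeps (a2, b2) only once.
r_union_s = R | S

# Hypothetical supplies table: (supplier number, part number).
supplies = {("s1", "p1"), ("s2", "p1"), ("s2", "p2"), ("s3", "p2")}

# Select on the part, then project the supplier number.
part1suppliers = {s_no for (s_no, p_no) in supplies if p_no == "p1"}
part2suppliers = {s_no for (s_no, p_no) in supplies if p_no == "p2"}

answer = part1suppliers | part2suppliers

print(sorted(r_union_s))
print(sorted(answer))
```

Supplier s2 supplies both parts but appears once in the
answer, just as a2 b2 appears once in R union S.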
This is another example: find the borrower numbers of all
borrowers who have either borrowed or reserved a book. So
reservers is equal to the projection of borrower_id on
Reserves, borrowers is equal to the projection of borrower_id
on Borrows, and answer is equal to borrowers union reservers.
This is borrower_id and this is borrower_id; the answer is the
combination of the two, with duplicates not repeated, as shown
here. This is the answer to the union. This shows that, for
what you want to solve, you sometimes have to do a projection
first. You may have to do a selection, and to make the inputs
union-compatible you will most probably also have to do a
projection.
Let’s take the next operation, which is set difference. This
takes the rows, that is, the tuples, that are in the first
relation but not in the second relation. Again, the
participating relations must be compatible. Here in R you have
a1 b1 and a2 b2, and in S you have a2 b2 and a3 b3; from R you
must take the tuples that do not belong to S. The answer is only
a1 b1, because a2 b2 is there in S and has to be removed. Set
difference lets you find all students with a 4.0 GPA, that is,
find students with all As: students who never got anything
other than an A. Temp1 is the projection of student id on the
selection grade not equal to A (Transcript), that is, the
students who got at least one non-A grade. Temp2 is the
projection of student id on Transcript, that is, the students
who got at least one grade. The result is Temp2 - Temp1. So
this is how you get the answer for this.
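The Temp2 - Temp1 idea can be checked with Python sets. This
is a sketch only: the transcript rows and the column layout
(student id, course, grade) are assumptions made for the
example.

```python
# R - S on the small example: only (a1, b1) survives.
R = {("a1", "b1"), ("a2", "b2")}
S = {("a2", "b2"), ("a3", "b3")}
r_minus_s = R - S

# Hypothetical Transcript relation: (student id, course, grade).
transcript = {
    ("s1", "DB", "A"),
    ("s1", "OS", "A"),
    ("s2", "DB", "A"),
    ("s2", "OS", "B"),
    ("s3", "DB", "C"),
}

# Temp1: students with at least one non-A grade.
temp1 = {sid for (sid, course, grade) in transcript if grade != "A"}

# Temp2: students with at least one grade of any kind.
temp2 = {sid for (sid, course, grade) in transcript}

# Students who never got anything other than an A.
all_a_students = temp2 - temp1

print(r_minus_s)
print(all_a_students)
```

Student s2 has one A but also a B, so s2 is in Temp1 and is
removed; selecting grade equal to A directly would wrongly
keep s2.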
Here the two tables are treated as sets and we perform a set
difference. This is table 1 - table 2; this is the answer. This
operation is impossible unless both tables involved have the
same schema, because it only makes sense to calculate the set
difference if the two sets can have elements in common.
The elements in our case are tuples over the same attributes.
This is part1suppliers, this is part2suppliers and this is the
minus: you take everything in the first minus what is in the
second. Next comes the Cartesian product. This operation is
very important because later on we are going to look at join,
which is a derived operation based on the Cartesian product.
There are different types of join, and it is one of the most
important components of a database management system. Given two
sets, A={a1,a2,a3} and B={b1,b2}, the Cartesian product (A X B)
returns the set of all pairs {(a1,b1), (a1,b2), (a2,b1),
(a2,b2), (a3,b1), (a3,b2)}. As for the number of tuples you get
from a Cartesian product: if you have three tuples in the first
relation and two tuples in the second relation, you get 3 X 2,
six tuples in the result. Given two relations with schemas
R1(A1, ..., An) and R2(B1, ..., Bm), let Temp be R1 X R2. R1 X
R2 returns a relation Temp with schema (A1, ..., An, B1, ...,
Bm) such that for all possible tuples t1 in R1 and t2 in R2,
there exists a tuple t in Temp such that t is identical to t1
with respect to A1, ..., An and t is identical to t2 with
respect to B1, ..., Bm. Example: find all the courses that the
student named 'John Doe' has completed. Here you have a
selection on the student relation: you select the student whose
name is 'John Doe'. The student relation has a student id, and
that student id also appears in the other relation. You take
the Cartesian product of the two, keep the rows where the
student ids agree, and then project out the course.
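One way to read the 'John Doe' query is sketched below in
Python: select the student, take the Cartesian product with
the relation holding the courses, keep the pairs whose student
ids agree, and project the course. The relation layouts, the
rows and the step that matches the ids are assumptions made for
this sketch.

```python
from itertools import product

# Hypothetical relations (rows invented for illustration).
student = [(1, "John Doe"), (2, "Jane Roe")]          # (s_id, name)
transcript = [(1, "Databases"), (1, "Algorithms"),
              (2, "Networks")]                        # (s_id, course)

# Selection: the student named 'John Doe'.
john = [s for s in student if s[1] == "John Doe"]

# Cartesian product: every selected student row paired with every
# transcript row, giving rows (s_id, name, s_id, course).
pairs = [s + t for s, t in product(john, transcript)]

# Keep only the pairs in which the two student ids agree.
matching = [row for row in pairs if row[0] == row[2]]

# Projection on course, with duplicates removed.
courses = sorted({row[3] for row in matching})

print(len(pairs))
print(courses)
```

The product alone still contains the Networks row paired with
John Doe, which is why the id comparison is needed before the
projection.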
How many rows will be in the result? As I have already told
you, it is the product of the two cardinalities.
We will just take an example. This is R1 and this is S1.
Please note that s_id comes here and s_id comes here, which
means it appears twice in the result relation. The number of
attributes, the degree, is equal to the degree of the first
relation plus the degree of the second, so seven here; and the
number of tuples is the number of tuples in the first relation
times the number of tuples in the second relation. You have
six; this is R1 X S1.
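The degree and cardinality rules for the Cartesian product can
be confirmed with itertools.product from the Python standard
library. In this sketch the helper name cartesian and the
attribute names and rows of R1 and S1 are invented; only the
sizes (degrees 3 and 4, two and three tuples) are chosen to
give seven attributes and six tuples.

```python
from itertools import product

# The two sets from the lecture.
A = ["a1", "a2", "a3"]
B = ["b1", "b2"]
a_times_b = list(product(A, B))


def cartesian(attrs1, rows1, attrs2, rows2):
    """Return the schema and rows of the Cartesian product."""
    attrs = tuple(attrs1) + tuple(attrs2)
    rows = [t1 + t2 for t1, t2 in product(rows1, rows2)]
    return attrs, rows


# Hypothetical R1 (degree 3, two tuples) and S1 (degree 4, three tuples).
r1_attrs = ("s_id", "b_id", "day")
r1_rows = [(22, 101, "Mon"), (58, 103, "Tue")]
s1_attrs = ("s_id", "sname", "rating", "age")
s1_rows = [(22, "Dustin", 7, 45), (31, "Lubber", 8, 55), (58, "Rusty", 10, 35)]

attrs, rows = cartesian(r1_attrs, r1_rows, s1_attrs, s1_rows)

print(len(a_times_b))   # 3 x 2 pairs
print(len(attrs))       # degree: 3 + 4
print(len(rows))        # cardinality: 2 x 3
```

Note that s_id occurs twice in the result schema, once from
each input relation.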
The Cartesian product is written as table name X table name;
that is the format. There is actually a lot of redundant
information in the result, which is called noise. But it is a
useful first step, the rest of which we will see later on.