Pvp19 Dbms Unit-2 Material
UNIT-2
SYLLABUS:
Relational Model: The Relational Model Concepts, Relational Model Constraints and
Relational Database Schemas.
SQL: Data Definition, Constraints, and Basic Queries and Updates, SQL Advanced Queries,
Assertions, Triggers, and Views
Formal Relational Languages: Relational Algebra: Unary Relational Operations: Select and
Project, Relational Algebra Operations from Set Theory, Binary Relational Operations: Join
and Division, Examples of Queries in Relational Algebra.
1. Relational Model:
Introduction
The relational data model was first introduced by Ted Codd of IBM Research in 1970 in a
classic paper (Codd 1970), and it attracted immediate attention due to its simplicity and
mathematical foundation. The model uses the concept of a mathematical relation which looks
somewhat like a table of values as its basic building block, and has its theoretical basis in set
theory and first-order predicate logic.
The first commercial implementations of the relational model became available in the early
1980s, such as the SQL/DS system on the MVS operating system by IBM and the Oracle
DBMS. Since then, the model has been implemented in a large number of commercial systems.
Current popular relational DBMSs (RDBMSs) include DB2 and Informix Dynamic Server
(from IBM), Oracle and Rdb (from Oracle), Sybase DBMS (from Sybase) and SQLServer and
Access (from Microsoft). In addition, several open source systems, such as MySQL and
PostgreSQL, are available.
A domain D is a set of atomic values. By atomic we mean that each value in the domain is
indivisible as far as the formal relational model is concerned. A common method of specifying a
domain is to specify a data type from which the data values forming the domain are drawn. It
is also useful to specify the name for the domain, to help in interpreting its values.
Some examples of domains follow:
Usa_phone_numbers: The set of ten-digit phone numbers valid in the United States.
Social_security_numbers: The set of valid nine-digit social security numbers.
Names: The set of character strings that represents the names of persons.
Employee_ages: Possible ages of employees in a company; each must be an integer
value between 15 and 80.
The preceding are called logical definitions of domains. A data type or format is also specified
for each domain. For example, the data type for the domain Usa_phone_numbers can be
declared as a character string of the form (ddd)ddddddd, where each d is a numeric (decimal)
digit and the first three digits form a valid telephone area code. The data type for
Employee_ages is an integer number between 15 and 80.
Attribute:
An attribute Ai is the name of a role played by some domain D in the relation schema R. D is
called the domain of Ai and is denoted by dom(Ai).
Tuple:
Mapping from attributes to values drawn from the respective domains of those attributes.
Tuples are intended to describe some entity (or relationship between entities) in the miniworld.
Example: a tuple for a PERSON entity might be
{ Name --> "smith", Gender --> Male, Age --> 25 }
Relation:
A named set of tuples all of the same form i.e., having the same set of attributes.
Relation schema:
A relation schema R, denoted by R(A1, A2, ...,An), is made up of a relation name R and a list of
attributes A1, A2, ...,An. Each attribute Ai is the name of a role played by some domain D in the
relation schema R. D is called the domain of Ai and is denoted by dom(Ai). A relation schema
is used to describe a relation; R is called the name of this relation.
The degree (or arity) of a relation is the number of attributes n of its relation schema. A relation
of degree seven, which stores information about university students, would contain seven
attributes describing each student as follows:
STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, Gpa)
Using the data type of each attribute, the definition is sometimes written as:
STUDENT(Name: string, Ssn: string, Home_phone: string, Address: string,
Office_phone: string, Age: integer, Gpa: real)
Domains for some of the attributes of the STUDENT relation:
dom(Name) = Names;
dom(Ssn) = Social_security_numbers;
dom(Home_phone) = Usa_phone_numbers;
dom(Office_phone) = Usa_phone_numbers.
Relation (or relation state):
A relation (or relation state) r of the relation schema R(A1, A2, ..., An), also denoted by r(R),
is a set of n-tuples r = {t1, t2, ..., tm}. Each n-tuple t is an ordered list of n values t = <v1, v2,
..., vn>, where each value vi, 1 ≤ i ≤ n, is an element of dom(Ai) or is a special NULL value. The ith
value in tuple t, which corresponds to the attribute Ai, is referred to as t[Ai] or t.Ai.
The terms relation intension for the schema R and relation extension for a relation state r(R)
are also commonly used.
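As a rough illustration of these definitions, a relation state can be modelled in Python as a set of tuples over an ordered attribute list. The three-attribute schema, the helper name `value_of`, and the sample rows are assumptions made for this sketch; they are not part of the formal model.

```python
# Relation schema STUDENT(Name, Ssn, Age): an ordered list of attribute names.
schema = ("Name", "Ssn", "Age")

# Relation state r(STUDENT): a set of 3-tuples (sample rows are made up).
# None plays the role of the special NULL value.
state = {
    ("Smith", "123456789", 25),
    ("Wong", "333445555", None),
    ("Smith", "123456789", 25),  # repeated literal: a set keeps only one copy
}

def value_of(t, attribute):
    """Return t[Ai], the component of tuple t for the named attribute."""
    return t[schema.index(attribute)]

degree = len(schema)       # number of attributes n
tuple_count = len(state)   # number of distinct tuples m
ages = {value_of(t, "Age") for t in state}
print(degree, tuple_count, ages)
```

Because the state is a set, the repeated tuple is stored only once, which mirrors the rule that a relation contains no duplicate tuples.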
strings, and variable-length strings are also available, as are date, time, timestamp, and money,
or other special data types.
1.2.2 Key Constraints and Constraints on NULL Values
A key is a set of one or more attributes that can uniquely identify each row in a table. A key
not only identifies the rows of a table but also relates two or more tables.
Different Types of Keys:
1) Super Key
2) Candidate Key
3) Primary Key
4) Foreign Key
5) Secondary Key/Alternate Key
6) Unique Key
7) Composite Key
8) Surrogate Key
9) Partial Key
1) Super Key: A super key is an attribute (or a set of attributes) that uniquely identifies a
tuple, i.e. an entity in an entity set.
Every candidate key is a super key, since candidate keys are the minimal super keys.
Example:
2) Candidate Key: Each relation may have one or more candidate keys, and each of them
qualifies to be the primary key. One of the candidate keys is chosen as the Primary Key, and
each table has only a single primary key. The keys that are candidates for this choice are
therefore called Candidate Keys.
A candidate key can be a single column or a combination of more than one column. A
minimal super key is called a candidate key.
Example:
Above, Student_ID, Student_Enroll and Student_Email are the candidate keys. They
are considered candidate keys since they can uniquely identify the student record.
3) Primary Key: It is an attribute or set of attributes that uniquely identifies an entity (row)
in the entity set (table). The main difference between the primary key and the other candidate
keys is that the primary key cannot contain NULL values.
Primary Key must be UNIQUE and NOT NULL.
Example:
4) Foreign Key: A foreign key is a set of attributes in a table that refers to the primary
key of another table. The foreign key links these two tables.
Example:
5) Secondary Key/Alternate Key: The primary key is the key chosen to uniquely identify a
record in a table. A secondary key is an
additional key, or alternate key, which can be used in addition to the primary key to
locate specific data.
Secondary Key is the key that has not been selected to be the primary key. However, it
is considered a candidate key for the primary key.
Therefore, a candidate key not selected as a primary key is called secondary key.
Candidate key is an attribute or set of attributes that you can consider as a Primary key.
Note: Secondary Key is not a Foreign Key.
Example 1:
Above, Student_ID, Student_Enroll and Student_Email are the candidate keys. They
are considered candidate keys since they can uniquely identify the student record. Select
any one of the candidate keys as the primary key. The remaining two keys are then
Secondary Keys.
If you select Student_ID as the primary key, then Student_Enroll and
Student_Email will be Secondary Keys (candidates for the primary key).
Example 2:
Above, our composite key consists of StudentID and StudentEnrollNo. The table has two
attributes as primary key.
Therefore, a Primary Key consisting of two or more attributes is called a Composite
Key.
8) Surrogate Key: A Surrogate Key’s only purpose is to be a unique identifier in a
database, for example, incremental key.
Surrogate Key has no actual meaning and is used to represent existence. It has an
existence only for data analysis.
Example: The surrogate key is
Key in the <ProductPrice> table.
Here, using partial key Emp_no, we can not identify a tuple uniquely but we can select
a bunch of tuples from the table
All tuples in a relation must also be distinct. This means that no two tuples can have the same
combination of values for all their attributes. There are other subsets of attributes of a relation
schema R with the property that no two tuples in any relation state r of R should have the same
combination of values for these attributes.
Suppose that we denote one such subset of attributes by SK; then for any two distinct tuples t1
and t2 in a relation state r of R, we have the constraint that t1[SK] ≠ t2[SK]. Such a set of attributes SK is
called a superkey of the relation schema R.
Superkey
A superkey SK specifies a uniqueness constraint that no two distinct tuples in any state r of R
can have the same value for SK. Every relation has at least one default superkey: the set of all
its attributes.
Key
A key K of a relation schema R is a superkey of R with the additional property that removing
any attribute A from K leaves a set of attributes K' that is not a superkey of R anymore. Hence,
a key satisfies two properties:
1. Two distinct tuples in any state of the relation cannot have identical values for (all) the
attributes in the key. This first property also applies to a superkey.
2. It is a minimal superkey, that is, a superkey from which we cannot remove any attributes
and still have the uniqueness constraint in condition 1 hold. This property is not
required by a superkey.
Example: Consider the STUDENT relation
The attribute set {Ssn} is a key of STUDENT because no two student tuples can have
the same value for Ssn.
Any set of attributes that includes Ssn, for example {Ssn, Name, Age}, is a superkey.
The superkey {Ssn, Name, Age} is not a key of STUDENT because removing Name
or Age or both from the set still leaves us with a superkey.
In general, any superkey formed from a single attribute is also a key. A key with multiple
attributes must require all its attributes together to have the uniqueness property.
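The two properties can be tested mechanically against a single relation state. The sketch below is only illustrative: the sample rows and the function names `is_superkey_in_state` and `is_key_in_state` are assumptions, and a check on one state can only refute a key, never prove it, because a key must hold in every valid state of the schema.

```python
from itertools import combinations

# One sample state of STUDENT (rows are hypothetical).
attributes = ("Ssn", "Name", "Age")
rows = [
    ("123456789", "Smith", 25),
    ("333445555", "Wong", 25),
    ("999887777", "Smith", 31),
]

def is_superkey_in_state(attrs):
    """True if no two rows agree on every attribute in attrs."""
    idx = [attributes.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)

def is_key_in_state(attrs):
    """True if attrs is a superkey here and no proper subset of it is one."""
    if not is_superkey_in_state(attrs):
        return False
    return not any(
        is_superkey_in_state(subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

print(is_key_in_state(("Ssn",)))                # minimal and unique
print(is_key_in_state(("Ssn", "Name", "Age")))  # unique but not minimal
print(is_superkey_in_state(("Name",)))          # 'Smith' appears twice
```

In this particular state {Name, Age} also happens to be unique, which shows why key constraints are declared on the schema from the meaning of the attributes rather than inferred from the data currently stored.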
Candidate Key
A relation schema may have more than one key. In this case, each of the keys is called a
candidate key.
Example: The CAR relation has two candidate keys: License_number and
Engine_serial_number
Primary Key
It is common to designate one of the candidate keys as the primary key of the relation. This is
the candidate key whose values are used to identify tuples in the relation. We use the
convention that the attributes that form the primary key of a relation schema are underlined.
Other candidate keys are designated as unique keys and are not underlined.
Another constraint on attributes specifies whether NULL values are or are not permitted. For
example, if every STUDENT tuple must have a valid, non-NULL value for the Name attribute,
then Name of STUDENT is constrained to be NOT NULL.
1.2.3 Relational Databases and Relational Database Schemas
A relational database schema S is a set of relation schemas S = {R1, R2, ..., Rm} and a set of
integrity constraints IC.
Example of relational database schema:
COMPANY = {EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT,
WORKS_ON, DEPENDENT}
Figure: Schema diagram for the COMPANY relational database schema. The underlined
attributes represent primary keys
A relational database state is a set of relation states DB = {r1, r2, ..., rm}. Each ri is a state of Ri,
such that the ri relation states satisfy the integrity constraints specified in IC.
Figure: One possible database state for the COMPANY relational database schema.
A database state that does not obey all the integrity constraints is called Invalid state and a state
that satisfies all the constraints in the defined set of integrity constraints IC is called a Valid
state.
Attributes that represent the same real-world concept may or may not have identical names in
different relations.
Example: The Dnumber attribute in both DEPARTMENT and DEPT_LOCATIONS stands
for the same real-world concept: the number given to a department.
1) Domain Constraints:
Domain constraints can be defined as the definition of a valid set of values for an
attribute.
The data type of domain includes string, character, integer, time, date, currency, etc.
The value of the attribute must be available in the corresponding domain.
Example:
2. SQL
Introduction:
SQL was originally called SEQUEL (Structured English Query Language) and was designed and
implemented at IBM Research. The SQL language may be considered one of the major reasons
for the commercial success of relational databases. SQL is a comprehensive database language.
It has statements for data definitions, queries, and updates. Hence, it is both a DDL and a DML.
In addition, it has facilities for defining views on the database, for specifying security and
authorization, for defining integrity constraints, and for specifying transaction controls. It also
has rules for embedding SQL statements into a general-purpose programming language such
as Java, COBOL, or C/C++.
information on all the schemas in the catalog and all the element descriptors in these schemas.
Integrity constraints such as referential integrity can be defined between relations only if they
exist in schemas within the same catalog. Schemas within the same catalog can also share
certain elements, such as domain definitions.
2.1.2 The CREATE TABLE Command in SQL
The CREATE TABLE command is used to specify a new relation by giving it a name and
specifying its attributes and initial constraints. The attributes are specified first, and each
attribute is given a name, a data type to specify its domain of values, and any attribute
constraints, such as NOT NULL. The key, entity integrity, and referential integrity constraints
can be specified within the CREATE TABLE statement after the attributes are declared, or
they can be added later using the ALTER TABLE command.
Typically, the SQL schema in which the relations are declared is implicitly specified in the
environment in which the CREATE TABLE statements are executed. Alternatively, we can
explicitly attach the schema name to the relation name, separated by a period. For example, by
writing
CREATE TABLE COMPANY.EMPLOYEE …
rather than
CREATE TABLE EMPLOYEE …
The relations declared through CREATE TABLE statements are called base tables.
Examples:
1) Timestamp data type (TIMESTAMP) includes the DATE and TIME fields, plus a
minimum of six positions for decimal fractions of seconds and an optional WITH TIME
ZONE qualifier.
2) INTERVAL data type. This specifies an interval, a relative value that can be used to
increment or decrement an absolute value of a date, time, or timestamp. Intervals are
qualified to be either YEAR/MONTH intervals or DAY/TIME intervals.
It is possible to specify the data type of each attribute directly, or a domain can be declared and
the domain name used with the attribute specification. This makes it easier to change the data
type for a domain that is used by numerous attributes in a schema, and improves schema
readability. For example, we can create a domain SSN_TYPE by the following statement:
CREATE DOMAIN SSN_TYPE AS CHAR(9);
We can use SSN_TYPE in place of CHAR(9) for the attributes Ssn and Super_ssn of
EMPLOYEE, Mgr_ssn of DEPARTMENT, Essn of WORKS_ON, and Essn of DEPENDENT.
2.2 Constraints
Basic constraints that can be specified in SQL as part of table creation:
Key and referential integrity constraints
Restrictions on attribute domains and NULLs
Constraints on individual tuples within a relation
2.2.1 Specifying Attribute Constraints and Attribute Defaults
Because SQL allows NULLs as attribute values, a constraint NOT NULL may be specified if
NULL is not permitted for a particular attribute. This is always implicitly specified for the
attributes that are part of the primary key of each relation, but it can be specified for any other
attributes whose values are required not to be NULL.
It is also possible to define a default value for an attribute by appending the clause DEFAULT
<value> to an attribute definition. The default value is included in any new tuple if an explicit
value is not provided for that attribute.
CREATE TABLE DEPARTMENT
(...,
Mgr_ssn CHAR(9) NOT NULL DEFAULT '888665555',
-----------
-----------
)
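The behaviour of NOT NULL and DEFAULT can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, assuming a cut-down DEPARTMENT table with only three columns; SQLite is used here purely for convenience and other DBMSs may word their errors differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DEPARTMENT (
        Dname   VARCHAR(15) NOT NULL,
        Dnumber INT         NOT NULL,
        Mgr_ssn CHAR(9)     NOT NULL DEFAULT '888665555'
    )
""")

# Mgr_ssn is omitted, so the DEFAULT value is stored in the new tuple.
conn.execute("INSERT INTO DEPARTMENT (Dname, Dnumber) VALUES ('Research', 5)")
default_mgr = conn.execute("SELECT Mgr_ssn FROM DEPARTMENT").fetchone()[0]

# An explicit NULL violates NOT NULL and the insert is rejected.
try:
    conn.execute("INSERT INTO DEPARTMENT VALUES ('Payroll', 6, NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

print(default_mgr, null_rejected)
```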
Another type of constraint can restrict attribute or domain values using the CHECK clause
following an attribute or domain definition. For example, suppose that department numbers
are restricted to integer numbers between 1 and 20; then, we can change the attribute
declaration of Dnumber in the DEPARTMENT table to the following:
Dnumber INT NOT NULL CHECK (Dnumber > 0 AND Dnumber < 21);
The CHECK clause can also be used in conjunction with the CREATE DOMAIN
statement. For example, we can write the following statement:
CREATE DOMAIN D_NUM AS INTEGER
CHECK (D_NUM > 0 AND D_NUM < 21);
We can then use the created domain D_NUM as the attribute type for all attributes that refer to
department number such as Dnumber of DEPARTMENT, Dnum of PROJECT, Dno of
EMPLOYEE, and so on.
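A quick way to see the CHECK clause at work is to run it in SQLite through Python's sqlite3 module. SQLite does not support CREATE DOMAIN, so this sketch uses the attribute-level CHECK form only; the helper `try_insert` and the test values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DEPARTMENT (
        Dname   VARCHAR(15) NOT NULL,
        Dnumber INT NOT NULL CHECK (Dnumber > 0 AND Dnumber < 21)
    )
""")

def try_insert(dnumber):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute("INSERT INTO DEPARTMENT VALUES ('Test', ?)", (dnumber,))
        return True
    except sqlite3.IntegrityError:
        return False

# Boundary values on both sides of the permitted range 1..20.
results = {n: try_insert(n) for n in (0, 1, 20, 21)}
print(results)
```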
2.2.2 Specifying Key and Referential Integrity Constraints
The PRIMARY KEY clause specifies one or more attributes that make up the primary key of
a relation. If a primary key has a single attribute, the clause can follow the attribute directly.
For example, the primary key of DEPARTMENT can be specified as:
Dnumber INT PRIMARY KEY;
The UNIQUE clause can also be specified directly for a secondary key if the secondary key is
a single attribute, as in the following example:
Dname VARCHAR(15) UNIQUE;
Referential integrity is specified via the FOREIGN KEY clause:
FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn),
FOREIGN KEY (Dno) REFERENCES DEPARTMENT(Dnumber)
A referential integrity constraint can be violated when tuples are inserted or deleted, or when a
foreign key or primary key attribute value is modified. The default action that SQL takes for
an integrity violation is to reject the update operation that will cause a violation, which is known
as the RESTRICT option.
The schema designer can specify an alternative action to be taken by attaching a referential
triggered action clause to any foreign key constraint. The options include SET NULL,
CASCADE, and SET DEFAULT. An option must be qualified with either ON DELETE or
ON UPDATE.
FOREIGN KEY(Dno) REFERENCES DEPARTMENT(Dnumber) ON DELETE
SET DEFAULT ON UPDATE CASCADE
FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn) ON DELETE SET
NULL ON UPDATE CASCADE
FOREIGN KEY (Dnumber) REFERENCES DEPARTMENT(Dnumber) ON
DELETE CASCADE ON UPDATE CASCADE
In general, the action taken by the DBMS for SET NULL or SET DEFAULT is the same for
both ON DELETE and ON UPDATE: The value of the affected referencing attributes is
changed to NULL for SET NULL and to the specified default value of the referencing attribute
for SET DEFAULT.
The action for CASCADE ON DELETE is to delete all the referencing tuples whereas the
action for CASCADE ON UPDATE is to change the value of the referencing foreign key
attribute(s) to the updated (new) primary key value for all the referencing tuples. It is the
responsibility of the database designer to choose the appropriate action and to specify it in the
database schema. As a general rule, the CASCADE option is suitable for “relationship” relations
such as WORKS_ON; for relations that represent multivalued attributes, such as
DEPT_LOCATIONS; and for relations that represent weak entity types, such as
DEPENDENT.
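The referential triggered actions can be observed in a small SQLite session driven from Python. This is a simplified sketch, not the full COMPANY schema: the tables are reduced to a few columns, Dno uses ON DELETE SET NULL rather than SET DEFAULT, and the sample rows are made up. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
    CREATE TABLE DEPARTMENT (Dnumber INT PRIMARY KEY, Dname VARCHAR(15));
    CREATE TABLE EMPLOYEE (
        Ssn CHAR(9) PRIMARY KEY,
        Dno INT,
        FOREIGN KEY (Dno) REFERENCES DEPARTMENT(Dnumber)
            ON DELETE SET NULL ON UPDATE CASCADE
    );
    CREATE TABLE DEPT_LOCATIONS (
        Dnumber   INT,
        Dlocation VARCHAR(15),
        FOREIGN KEY (Dnumber) REFERENCES DEPARTMENT(Dnumber)
            ON DELETE CASCADE ON UPDATE CASCADE
    );
    INSERT INTO DEPARTMENT VALUES (5, 'Research');
    INSERT INTO EMPLOYEE VALUES ('123456789', 5);
    INSERT INTO DEPT_LOCATIONS VALUES (5, 'Houston');
""")

# ON UPDATE CASCADE: changing the primary key rewrites the referencing rows.
conn.execute("UPDATE DEPARTMENT SET Dnumber = 7 WHERE Dnumber = 5")
dno_after_update = conn.execute("SELECT Dno FROM EMPLOYEE").fetchone()[0]

# ON DELETE: SET NULL keeps the employee, CASCADE removes the location row.
conn.execute("DELETE FROM DEPARTMENT WHERE Dnumber = 7")
dno_after_delete = conn.execute("SELECT Dno FROM EMPLOYEE").fetchone()[0]
locations_left = conn.execute(
    "SELECT COUNT(*) FROM DEPT_LOCATIONS").fetchone()[0]

print(dno_after_update, dno_after_delete, locations_left)
```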
2.2.3 Giving Names to Constraints
The names of all constraints within a particular schema must be unique. A constraint name is
used to identify a particular constraint in case the constraint must be dropped later and replaced
with another constraint.
2.2.4 Specifying Constraints on Tuples Using CHECK
In addition to key and referential integrity constraints, which are specified by special keywords,
other table constraints can be specified through additional CHECK clauses at the end of a
CREATE TABLE statement. These can be called tuple-based constraints because they apply
to each tuple individually and are checked whenever a tuple is inserted or modified.
For example, suppose that the DEPARTMENT table had an additional attribute
Dept_create_date, which stores the date when the department was created. Then we could add
the following CHECK clause at the end of the CREATE TABLE statement for the
DEPARTMENT table to make sure that a manager’s start date is later than the department
creation date.
CHECK (Dept_create_date <= Mgr_start_date);
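The tuple-based constraint can be exercised in SQLite from Python as sketched below. The reduced DEPARTMENT table and the dates are assumptions; the dates are stored as ISO-format text, which compares correctly as strings in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DEPARTMENT (
        Dnumber          INT PRIMARY KEY,
        Dept_create_date DATE,
        Mgr_start_date   DATE,
        CHECK (Dept_create_date <= Mgr_start_date)
    )
""")

# Manager starts after the department was created: accepted.
conn.execute("INSERT INTO DEPARTMENT VALUES (5, '1988-01-01', '1988-05-22')")

# Manager starts before the department existed: rejected by the CHECK.
try:
    conn.execute(
        "INSERT INTO DEPARTMENT VALUES (4, '1995-01-01', '1994-06-30')")
    bad_row_rejected = False
except sqlite3.IntegrityError:
    bad_row_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM DEPARTMENT").fetchone()[0]
print(bad_row_rejected, row_count)
```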
FROM EMPLOYEE
WHERE Fname='John' AND Minit='B' AND Lname='Smith';
The SELECT clause of SQL specifies the attributes whose values are to be retrieved,
which are called the projection attributes. The WHERE clause specifies the Boolean
condition that must be true for any retrieved tuple, which is known as the selection
condition.
2) Retrieve the name and address of all employees who work for the ‘Research’
department.
SELECT Fname, Lname, Address
FROM EMPLOYEE, DEPARTMENT
WHERE Dname='Research' AND Dnumber=Dno;
In the WHERE clause, the condition Dname='Research' is a selection condition that
chooses the particular tuple of interest in the DEPARTMENT table, because Dname is
an attribute of DEPARTMENT. The condition Dnumber = Dno is called a join
condition, because it combines two tuples: one from DEPARTMENT and one from
EMPLOYEE, whenever the value of Dnumber in DEPARTMENT is equal to the value
of Dno in EMPLOYEE. A query that involves only selection and join conditions plus
projection attributes is known as a select-project-join query.
3) For every project located in ‘Stafford’, list the project number, the controlling
department number, and the department manager’s last name, address and birth date.
SELECT Pnumber, Dnum, Lname, Address, Bdate
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE Dnum=Dnumber AND Mgr_ssn=Ssn AND Plocation='Stafford';
The join condition Dnum = Dnumber relates a project tuple to its controlling department
tuple, whereas the join condition Mgr_ssn = Ssn relates the controlling department tuple
to the employee tuple who manages that department. Each tuple in the result will be a
combination of one project, one department, and one employee that satisfies the join
conditions. The projection attributes are used to choose the attributes to be displayed
from each combined tuple.
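The select-project-join queries above can be run end to end on a toy database. The sketch below uses Python's sqlite3 module with cut-down versions of EMPLOYEE, DEPARTMENT and PROJECT; the column subset and the handful of rows are assumptions chosen only to make the join conditions visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Ssn TEXT, Bdate TEXT,
                           Address TEXT, Dno INT);
    CREATE TABLE DEPARTMENT (Dname TEXT, Dnumber INT, Mgr_ssn TEXT);
    CREATE TABLE PROJECT (Pname TEXT, Pnumber INT, Plocation TEXT, Dnum INT);
    INSERT INTO EMPLOYEE VALUES
        ('John', 'Smith', '123456789', '1965-01-09', 'Houston, TX', 5),
        ('Franklin', 'Wong', '333445555', '1955-12-08', 'Houston, TX', 5),
        ('Jennifer', 'Wallace', '987654321', '1941-06-20', 'Bellaire, TX', 4);
    INSERT INTO DEPARTMENT VALUES
        ('Research', 5, '333445555'),
        ('Administration', 4, '987654321');
    INSERT INTO PROJECT VALUES
        ('ProductX', 1, 'Bellaire', 5),
        ('Computerization', 10, 'Stafford', 4);
""")

# Query 2: selection on Dname, join on Dnumber = Dno.
research_staff = conn.execute("""
    SELECT Fname, Lname, Address
    FROM EMPLOYEE, DEPARTMENT
    WHERE Dname = 'Research' AND Dnumber = Dno
    ORDER BY Lname
""").fetchall()

# Query 3: two join conditions chain PROJECT -> DEPARTMENT -> EMPLOYEE.
stafford_projects = conn.execute("""
    SELECT Pnumber, Dnum, Lname, Address, Bdate
    FROM PROJECT, DEPARTMENT, EMPLOYEE
    WHERE Dnum = Dnumber AND Mgr_ssn = Ssn AND Plocation = 'Stafford'
""").fetchall()

print(research_staff)
print(stafford_projects)
```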
2.3.2 Ambiguous Attribute Names, Aliasing, Renaming and Tuple Variables
In SQL, the same name can be used for two or more attributes as long as the attributes are in
different relations. If this is the case, and a multitable query refers to two or more attributes
with the same name, we must qualify the attribute name with the relation name to prevent
ambiguity. This is done by prefixing the relation name to the attribute name and separating the
two by a period.
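Example: Retrieve the name and address of all employees who work for the ‘Research’
department, supposing that the Lname and Dno attributes of EMPLOYEE were instead called
Name and Dnumber, and that the Dname attribute of DEPARTMENT was also called Name.
SELECT Fname, EMPLOYEE.Name, Address
FROM EMPLOYEE, DEPARTMENT
WHERE DEPARTMENT.Name='Research' AND
DEPARTMENT.Dnumber=EMPLOYEE.Dnumber;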
V.RASHMI (Assistant Professor) PVPSIT IT 25
DATABASE MANAGEMENT SYSTEMS PVP19 UNIT-2
The ambiguity of attribute names also arises in the case of queries that refer to the same relation
twice. For example, consider the query: For each employee, retrieve the employee’s first and
last name and the first and last name of his or her immediate supervisor.
SELECT E.Fname, E.Lname, S.Fname, S.Lname
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.Super_ssn=S.Ssn;
In this case, we are required to declare alternative relation names E and S, called aliases or
tuple variables, for the EMPLOYEE relation. An alias can follow the keyword AS, or it can
directly follow the relation name, for example by writing EMPLOYEE E, EMPLOYEE S. It is
also possible to rename the relation attributes within the query in SQL by giving them aliases.
For example, if we write
EMPLOYEE AS E(Fn, Mi, Ln, Ssn, Bd, Addr, Sex, Sal, Sssn, Dno)
in the FROM clause, Fn becomes an alias for Fname, Mi for Minit, Ln for Lname, and so on.
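The supervisor query with the aliases E and S can be tried on a three-row EMPLOYEE table, as in the sketch below. The rows are hypothetical, and only the relation aliases are demonstrated; the attribute-renaming form `EMPLOYEE AS E(Fn, ...)` is left out because SQLite, used here through Python's sqlite3 module, does not accept it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Ssn TEXT, Super_ssn TEXT);
    INSERT INTO EMPLOYEE VALUES
        ('James', 'Borg', '888665555', NULL),
        ('Franklin', 'Wong', '333445555', '888665555'),
        ('John', 'Smith', '123456789', '333445555');
""")

# E ranges over employees, S over the same table in the supervisor role.
pairs = conn.execute("""
    SELECT E.Fname, E.Lname, S.Fname, S.Lname
    FROM EMPLOYEE AS E, EMPLOYEE AS S
    WHERE E.Super_ssn = S.Ssn
    ORDER BY E.Lname
""").fetchall()

print(pairs)
```

Borg has no supervisor (Super_ssn is NULL), so the join condition is never true for him and he does not appear in the E role.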
2.3.3 Unspecified WHERE Clause and Use of the Asterisk
A missing WHERE clause indicates no condition on tuple selection; hence, all tuples of the
relation specified in the FROM clause qualify and are selected for the query result. If more than
one relation is specified in the FROM clause and there is no WHERE clause, then the CROSS
PRODUCT (all possible tuple combinations) of these relations is selected.
Example: Select all EMPLOYEE Ssns and all combinations of EMPLOYEE Ssn and
DEPARTMENT Dname in the database.
SELECT Ssn
FROM EMPLOYEE;
SELECT Ssn, Dname
FROM EMPLOYEE, DEPARTMENT;
To retrieve all the attribute values of the selected tuples, we do not have to list the attribute
names explicitly in SQL; we just specify an asterisk (*), which stands for all the attributes. For
example, the following query retrieves all the attribute values of any EMPLOYEE who works
in DEPARTMENT number 5
SELECT * FROM EMPLOYEE WHERE Dno=5;
SELECT * FROM EMPLOYEE, DEPARTMENT WHERE Dname='Research'
AND Dno=Dnumber;
SELECT * FROM EMPLOYEE, DEPARTMENT;
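The effect of a missing WHERE clause is easiest to see by counting rows. In the sketch below (Python's sqlite3 module, two tiny tables with made-up rows), three employees and two departments give six combinations in the cross product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT, Dno INT);
    CREATE TABLE DEPARTMENT (Dname TEXT, Dnumber INT);
    INSERT INTO EMPLOYEE VALUES
        ('123456789', 5), ('333445555', 5), ('987654321', 4);
    INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
""")

# No WHERE clause, one table: every tuple qualifies.
ssns = conn.execute("SELECT Ssn FROM EMPLOYEE").fetchall()

# No WHERE clause, two tables: the full cross product (3 x 2 rows).
combos = conn.execute("SELECT Ssn, Dname FROM EMPLOYEE, DEPARTMENT").fetchall()

# The asterisk returns every attribute of both tables for the joined rows.
joined = conn.execute(
    "SELECT * FROM EMPLOYEE, DEPARTMENT WHERE Dno = Dnumber").fetchall()

print(len(ssns), len(combos), len(joined), len(joined[0]))
```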
2.3.4 Tables as Sets in SQL
SQL usually treats a table not as a set but rather as a multiset; duplicate tuples can appear more
than once in a table, and in the result of a query. SQL does not automatically eliminate duplicate
tuples in the results of queries, for the following reasons:
SQL has directly incorporated some of the set operations from mathematical set theory, which
are also part of relational algebra. There are
set union (UNION)
set difference (EXCEPT) and
set intersection (INTERSECT)
The relations resulting from these set operations are sets of tuples; that is, duplicate tuples are
eliminated from the result. These set operations apply only to union-compatible relations, so
we must make sure that the two relations on which we apply the operation have the same
attributes and that the attributes appear in the same order in both relations.
Example: Make a list of all project numbers for projects that involve an employee whose last
name is ‘Smith’ either as a worker or as a manager of the department that controls the project
(SELECT DISTINCT Pnumber FROM PROJECT, DEPARTMENT,
EMPLOYEE WHERE Dnum=Dnumber AND Mgr_ssn=Ssn AND
Lname='Smith')
UNION
(SELECT DISTINCT Pnumber FROM PROJECT, WORKS_ON, EMPLOYEE WHERE
Pnumber=Pno AND Essn=Ssn AND Lname='Smith');
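The duplicate elimination performed by UNION can be checked on a small data set. This sketch uses Python's sqlite3 module with reduced tables and invented rows in which Smith manages department 5 and also works on projects 1 and 10. SQLite does not accept parentheses around the operands of UNION, so they are omitted here; the UNION ALL comparison is an addition for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT, Lname TEXT);
    CREATE TABLE DEPARTMENT (Dnumber INT, Mgr_ssn TEXT);
    CREATE TABLE PROJECT (Pnumber INT, Dnum INT);
    CREATE TABLE WORKS_ON (Essn TEXT, Pno INT);
    INSERT INTO EMPLOYEE VALUES ('123456789', 'Smith'), ('333445555', 'Wong');
    INSERT INTO DEPARTMENT VALUES (5, '123456789'), (4, '333445555');
    INSERT INTO PROJECT VALUES (1, 5), (2, 5), (10, 4);
    INSERT INTO WORKS_ON VALUES
        ('123456789', 1), ('123456789', 10), ('333445555', 2);
""")

manager_part = """
    SELECT DISTINCT Pnumber FROM PROJECT, DEPARTMENT, EMPLOYEE
    WHERE Dnum = Dnumber AND Mgr_ssn = Ssn AND Lname = 'Smith'
"""
worker_part = """
    SELECT DISTINCT Pnumber FROM PROJECT, WORKS_ON, EMPLOYEE
    WHERE Pnumber = Pno AND Essn = Ssn AND Lname = 'Smith'
"""

# UNION removes the duplicate project 1; UNION ALL would keep it.
union_rows = conn.execute(
    manager_part + " UNION " + worker_part + " ORDER BY Pnumber").fetchall()
union_all_rows = conn.execute(
    manager_part + " UNION ALL " + worker_part).fetchall()

print(union_rows, len(union_all_rows))
```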
A variation of the INSERT command inserts multiple tuples into a relation in conjunction with
creating the relation and loading it with the result of a query. For example, to create a temporary
table that has the employee last name, project name, and hours per week for each employee
working on a project, we can write the statements in U3A and U3B:
U3A: CREATE TABLE WORKS_ON_INFO (
Emp_name VARCHAR(15),
Proj_name VARCHAR(15),
Hours_per_week DECIMAL(3,1) );
U3B: INSERT INTO WORKS_ON_INFO
(Emp_name, Proj_name, Hours_per_week)
SELECT E.Lname, P.Pname, W.Hours
FROM PROJECT P, WORKS_ON W, EMPLOYEE E
WHERE P.Pnumber=W.Pno AND W.Essn=E.Ssn;
A table WORKS_ON_INFO is created by U3A and is loaded with the joined information
retrieved from the database by the query in U3B. We can now query WORKS_ON_INFO as
we would any other relation.
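The U3A and U3B statements can be run as written in SQLite via Python's sqlite3 module. The source tables in this sketch are reduced to the columns the query needs, and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT, Lname TEXT);
    CREATE TABLE PROJECT (Pnumber INT, Pname TEXT);
    CREATE TABLE WORKS_ON (Essn TEXT, Pno INT, Hours DECIMAL(3,1));
    INSERT INTO EMPLOYEE VALUES ('123456789', 'Smith');
    INSERT INTO PROJECT VALUES (1, 'ProductX'), (2, 'ProductY');
    INSERT INTO WORKS_ON VALUES ('123456789', 1, 32.5), ('123456789', 2, 7.5);

    -- U3A: create the target table.
    CREATE TABLE WORKS_ON_INFO (
        Emp_name       VARCHAR(15),
        Proj_name      VARCHAR(15),
        Hours_per_week DECIMAL(3,1)
    );

    -- U3B: load it with the result of a query.
    INSERT INTO WORKS_ON_INFO (Emp_name, Proj_name, Hours_per_week)
        SELECT E.Lname, P.Pname, W.Hours
        FROM PROJECT P, WORKS_ON W, EMPLOYEE E
        WHERE P.Pnumber = W.Pno AND W.Essn = E.Ssn;
""")

info = conn.execute(
    "SELECT * FROM WORKS_ON_INFO ORDER BY Proj_name").fetchall()
print(info)
```

The new table is a snapshot: later changes to WORKS_ON are not reflected in WORKS_ON_INFO unless it is reloaded.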
2.4.2 The DELETE Command
The DELETE command removes tuples from a relation. It includes a WHERE clause, similar
to that used in an SQL query, to select the tuples to be deleted. Tuples are explicitly deleted
from only one table at a time. The deletion may propagate to tuples in other relations if
referential triggered actions are specified in the referential integrity constraints of the DDL.
Example:
DELETE FROM EMPLOYEE WHERE Lname='Brown';
Depending on the number of tuples selected by the condition in the WHERE clause, zero, one,
or several tuples can be deleted by a single DELETE command. A missing WHERE clause
specifies that all tuples in the relation are to be deleted; however, the table remains in the
database as an empty table.
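The three cases (several tuples, zero tuples, all tuples) can be reproduced with Python's sqlite3 module as sketched below. The table contents are invented, and `rowcount` is the sqlite3 cursor attribute that reports how many rows a statement changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT, Lname TEXT, Dno INT);
    INSERT INTO EMPLOYEE VALUES
        ('123456789', 'Smith', 5),
        ('111111111', 'Brown', 5),
        ('222222222', 'Brown', 4);
""")

# The WHERE condition selects two tuples.
deleted_some = conn.execute(
    "DELETE FROM EMPLOYEE WHERE Lname = 'Brown'").rowcount

# The WHERE condition selects no tuples; nothing happens.
deleted_none = conn.execute(
    "DELETE FROM EMPLOYEE WHERE Lname = 'Nobody'").rowcount

# No WHERE clause: every remaining tuple goes, but the table still exists.
conn.execute("DELETE FROM EMPLOYEE")
remaining = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]

print(deleted_some, deleted_none, remaining)
```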
2.4.3 The UPDATE Command
The UPDATE command is used to modify attribute values of one or more selected tuples. An
additional SET clause in the UPDATE command specifies the attributes to be modified and
their new values. For example, to change the location and controlling department number of
project number 10 to ‘Bellaire’ and 5, respectively, we use
UPDATE PROJECT SET Plocation='Bellaire', Dnum=5 WHERE Pnumber=10;
As in the DELETE command, a WHERE clause in the UPDATE command selects the tuples
to be modified from a single relation. However, updating a primary key value may propagate
to the foreign key values of tuples in other relations if such a referential triggered action is
specified in the referential integrity constraints of the DDL.
Several tuples can be modified with a single UPDATE command. An example is to give all
employees in the ‘Research’ department a 10 percent raise in salary, as shown by the following
command:
UPDATE EMPLOYEE
SET Salary = Salary * 1.1
WHERE Dno = 5;
Each UPDATE command explicitly refers to a single relation only. To modify multiple
relations, we must issue several UPDATE commands.
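Both UPDATE commands can be run against a small SQLite database from Python, as in the following sketch. The reduced tables and salary figures are assumptions; salaries are rounded in Python only to hide floating-point noise from the multiplication by 1.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PROJECT (Pnumber INT, Plocation TEXT, Dnum INT);
    CREATE TABLE EMPLOYEE (Ssn TEXT, Salary DECIMAL(10,2), Dno INT);
    INSERT INTO PROJECT VALUES (10, 'Stafford', 4);
    INSERT INTO EMPLOYEE VALUES
        ('123456789', 30000, 5),
        ('333445555', 40000, 5),
        ('987654321', 43000, 4);
""")

# One tuple, two attributes changed by a single SET clause.
conn.execute(
    "UPDATE PROJECT SET Plocation = 'Bellaire', Dnum = 5 WHERE Pnumber = 10")
project = conn.execute("SELECT * FROM PROJECT").fetchone()

# Several tuples changed at once: everyone in department 5 gets 10 percent.
conn.execute("UPDATE EMPLOYEE SET Salary = Salary * 1.1 WHERE Dno = 5")
salaries = {
    ssn: round(salary, 2)
    for ssn, salary in conn.execute("SELECT Ssn, Salary FROM EMPLOYEE")
}

print(project, salaries)
```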
But the result of the complete expression is true in both cases — it does not depend on
the value you assume for null.
A similar case applies to the AND operator: an AND expression is false as soon as any
operand is false.
The result of the following expression is therefore false:
(NULL = 1) AND (0 = 1)
In all other cases, any unknown operand for not, and, and or causes the logical operation
to return unknown.
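These rules can be verified by letting a DBMS evaluate the expressions. In the sketch below, SQLite (via Python's sqlite3 module) reports true as 1, false as 0, and unknown as NULL, which Python shows as None. The OR expression and the helper name `evaluate` are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expression):
    """Evaluate a SQL expression; None stands for unknown (NULL)."""
    return conn.execute(f"SELECT {expression}").fetchone()[0]

# Decided regardless of what the NULL might stand for.
or_result = evaluate("(NULL = 1) OR (1 = 1)")     # true
and_result = evaluate("(NULL = 1) AND (0 = 1)")   # false

# Not decided: the outcome would depend on the missing value.
unknown_and = evaluate("(NULL = 1) AND (1 = 1)")  # unknown
unknown_not = evaluate("NOT (NULL = 1)")          # unknown

print(or_result, and_result, unknown_and, unknown_not)
```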
III. General Rule: where, having, when, etc.
It is not enough that a condition is not false.
The result of the following query is therefore always the empty set:
SELECT col FROM t
WHERE col = NULL
The result of the equals comparison to null is always unknown. The where clause thus
rejects all rows.
Use the null predicate to search for null values:
WHERE col IS NULL
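The difference between the two predicates shows up immediately on a table that contains NULLs. A minimal sketch with Python's sqlite3 module, using the table `t` and column `col` from the text and three made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col INT);
    INSERT INTO t VALUES (1), (NULL), (NULL);
""")

# col = NULL is unknown for every row, so WHERE rejects all of them.
equals_null = conn.execute("SELECT col FROM t WHERE col = NULL").fetchall()

# The null predicate is true exactly for the rows holding NULL.
is_null = conn.execute("SELECT col FROM t WHERE col IS NULL").fetchall()

print(equals_null, is_null)
```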
Odd Consequence: not in (null, …) is never true
Consider this example:
WHERE 1 NOT IN (NULL)
To check the result, try two values for null that make the expression true and false
respectively. Let’s take 0 and 1.
For 0, the expression becomes 1 NOT IN (0), which is true.
For 1, the expression becomes 1 NOT IN (1), which is clearly false.
The result of the original expression is therefore unknown, because it changes if null is
replaced by different values.
The result of NOT IN predicates that contain a null value is never true:
WHERE 1 NOT IN (NULL, 2)
This expression is again unknown because substituting different values for null (e.g. 0
and 1) still influences the result.
Not in predicates that contain a null value can be false:
WHERE 1 NOT IN (NULL, 1)
No matter which value you substitute for the null (0, 1 or any other value) the result is
always false.
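The three NOT IN cases can be confirmed by evaluating them in SQLite from Python. In this sketch unknown comes back as None and false as 0; the dictionary name `not_in_results` is just a label chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

expressions = [
    "1 NOT IN (NULL)",
    "1 NOT IN (NULL, 2)",
    "1 NOT IN (NULL, 1)",
]

# Map each expression to its value: None = unknown, 0 = false, 1 = true.
not_in_results = {
    expr: conn.execute(f"SELECT {expr}").fetchone()[0] for expr in expressions
}

for expr, value in not_in_results.items():
    print(expr, "->", value)
```

None of the three results is true, so a WHERE clause built on any of them would return no rows.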
Nested Queries:
In nested queries, a query is written inside a query. The result of inner query is used in execution
of outer query. We will use STUDENT, COURSE, STUDENT_COURSE tables for
understanding nested queries.
STUDENT Table data:
EXISTS:
The EXISTS condition in SQL is used to check whether the result of a correlated nested
query is empty (contains no tuples) or not.
The result of EXISTS is a boolean value True or False.
It can be used in a SELECT, UPDATE, INSERT or DELETE statement.
Syntax:
SELECT column_name(s)
FROM table_name
WHERE EXISTS
(SELECT column_name(s)
FROM table_name
WHERE condition);
Examples:
Consider the following two relations “Customers” and “Orders”.
Queries:
I. Using EXISTS condition with SELECT statement
To fetch the first and last name of the customers who placed at least one order.
SELECT fname, lname
FROM Customers
WHERE EXISTS (SELECT *
FROM Orders
WHERE Customers.customer_id = Orders.c_id);
Output:
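To see the EXISTS query in action, the sketch below runs it with Python's sqlite3 module on a few hypothetical rows. The column names customer_id, fname, lname and c_id come from the query; order_id, the ORDER BY clause, and all data values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (customer_id INT, fname TEXT, lname TEXT);
    CREATE TABLE Orders (order_id INT, c_id INT);
    INSERT INTO Customers VALUES
        (1, 'Asha', 'Rao'), (2, 'Ben', 'Lee'), (3, 'Chen', 'Wu');
    INSERT INTO Orders VALUES (101, 1), (102, 1), (103, 3);
""")

# The inner query is re-evaluated for each customer (correlated subquery);
# a customer qualifies as soon as one matching order exists.
with_orders = conn.execute("""
    SELECT fname, lname
    FROM Customers
    WHERE EXISTS (SELECT *
                  FROM Orders
                  WHERE Customers.customer_id = Orders.c_id)
    ORDER BY Customers.customer_id
""").fetchall()

print(with_orders)
```

Customer 1 has two orders but is listed once, because EXISTS only tests whether the inner result is non-empty.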
Aggregate functions:
Aggregate functions return a single value by performing a calculation on a group of values.
Example: Emp
Group By:
The GROUP BY statement in SQL is used to arrange identical data into groups with
the help of some functions, i.e. if a particular column has the same value in different rows,
then it will arrange these rows in a group.
Important Points:
GROUP BY clause is used with the SELECT statement.
In the query, GROUP BY clause is placed after the WHERE clause.
In the query, GROUP BY clause is placed before ORDER BY clause if used any.
Syntax:
SELECT column1, function_name(column2)
FROM table_name
WHERE condition
GROUP BY column1, column2
ORDER BY column1, column2;
function_name: Name of the function used for example, SUM() , AVG().
table_name: Name of the table.
condition: Condition used.
Sample data tables to use in Query:
Example:
Group By single column: Grouping by a single column places all the rows that have the
same value in that particular column into one group.
Consider the query as shown below:
As you can see in the above output, the rows with duplicate NAMEs are grouped under same
NAME and their corresponding SALARY is the sum of the SALARY of duplicate rows.
The SUM() function of SQL is used here to calculate the sum.
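Grouping by a single column can be sketched as follows, assuming Python's built-in sqlite3 module. The table name Employee, its columns and every row are hypothetical stand-ins for the sample table that is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (SL_NO INTEGER, NAME TEXT, SALARY INTEGER);
-- Hypothetical rows: 'Harsh' and 'Ashish' each appear twice.
INSERT INTO Employee VALUES
  (1, 'Harsh', 2000), (2, 'Dhanraj', 3000), (3, 'Ashish', 1500),
  (4, 'Harsh', 3500), (5, 'Ashish', 1500);
""")

# One output row per distinct NAME; SUM adds the salaries inside each group.
name_totals = conn.execute(
    "SELECT NAME, SUM(SALARY) FROM Employee GROUP BY NAME ORDER BY NAME"
).fetchall()

print(name_totals)
```

Five input rows collapse into three groups, one per distinct NAME.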
Group By multiple columns: Grouping by multiple columns, for example GROUP BY
column1, column2, places all the rows that have the same values in both column1 and
column2 into one group.
Query:
SELECT SUBJECT, YEAR, Count(*) FROM Student
GROUP BY SUBJECT, YEAR;
Output:
As you can see in the above output, the students with both the same SUBJECT and the same YEAR are
placed in the same group, while those with the same SUBJECT but a different YEAR belong to
different groups. So here we have grouped the table according to two columns, that is, more than
one column.
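The two-column grouping can be reproduced with a small sketch, assuming Python's built-in sqlite3 module. The rows of Student below are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (NAME TEXT, SUBJECT TEXT, YEAR INTEGER);
-- Hypothetical rows.
INSERT INTO Student VALUES
  ('A', 'English', 1), ('B', 'English', 1), ('C', 'English', 2),
  ('D', 'Maths', 1), ('E', 'Maths', 1), ('F', 'Maths', 2);
""")

# A group is formed for every distinct (SUBJECT, YEAR) pair.
subject_year_counts = conn.execute("""
SELECT SUBJECT, YEAR, COUNT(*) FROM Student
GROUP BY SUBJECT, YEAR
ORDER BY SUBJECT, YEAR
""").fetchall()

print(subject_year_counts)
```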
HAVING Clause:
The HAVING clause places conditions on groups, deciding which groups become part
of the final result set.
We cannot use aggregate functions such as SUM() and COUNT() in the WHERE
clause, because WHERE is evaluated on individual rows before the groups are formed.
So we have to use the HAVING clause if we want to use any of these functions in a
condition.
As you can see in the output, only one of the three groups appears in the result set,
as it is the only group where the sum of SALARY is greater than 3000.
So we have used the HAVING clause here, because the condition must be placed on
groups, not on individual rows.
Syntax:
SELECT column1, function_name(column2)
FROM table_name
WHERE condition
GROUP BY column1, column2
HAVING condition
ORDER BY column1, column2;
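The difference between WHERE and HAVING can be sketched as follows, assuming Python's built-in sqlite3 module and a hypothetical Employee table with invented rows. The exact error raised for an aggregate in WHERE depends on the DBMS; the sketch only relies on SQLite rejecting the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (NAME TEXT, SALARY INTEGER);
-- Hypothetical rows giving three groups with sums 3000, 3000 and 5500.
INSERT INTO Employee VALUES
  ('Harsh', 2000), ('Dhanraj', 3000), ('Ashish', 1500),
  ('Harsh', 3500), ('Ashish', 1500);
""")

# HAVING filters whole groups after aggregation.
having_rows = conn.execute("""
SELECT NAME, SUM(SALARY) FROM Employee
GROUP BY NAME
HAVING SUM(SALARY) > 3000
""").fetchall()

# An aggregate inside WHERE is rejected by SQLite.
try:
    conn.execute("SELECT NAME FROM Employee WHERE SUM(SALARY) > 3000")
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

print(having_rows, where_rejected)
```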
ORDER BY:
The ORDER BY statement in SQL is used to sort the fetched data in either ascending
or descending order according to one or more columns.
By default, ORDER BY sorts the data in ascending order.
We can use the keyword DESC to sort the data in descending order and the keyword
ASC to sort in ascending order.
Sort according to one column
To sort in ascending or descending order, we can use the keywords ASC or DESC
respectively.
Syntax:
SELECT * FROM table_name ORDER BY column_name ASC|DESC;
Where
table_name: name of the table.
column_name: name of the column according to which the data is needed to be arranged.
ASC: to sort the data in ascending order.
DESC: to sort the data in descending order.
|: use either ASC or DESC to sort in ascending or descending order
Sort according to multiple columns:
To sort in ascending or descending order we can use the keywords ASC or DESC respectively.
To sort according to multiple columns, separate the column names with commas (,).
Syntax:
SELECT * FROM table_name ORDER BY column1 ASC|DESC, column2 ASC|DESC;
Now consider the above database table and find the results of different queries.
Sort according to a single column:
In this example, we will fetch all data from the table Student and sort the result in descending
order according to the column ROLL_NO.
Query:
SELECT * FROM Student ORDER BY ROLL_NO DESC;
Output:
In the above example, if we want to sort in ascending order we have to use ASC in place of
DESC.
Sort according to multiple columns:
In this example we will fetch all data from the table Student and then sort the result
first in ascending order according to the column Age, and then in descending order
according to the column ROLL_NO.
Note:
ASC is the default value for the ORDER BY clause. So, if we don’t specify anything
after the column name in the ORDER BY clause, the output will be sorted in ascending
order by default.
Query:
SELECT * FROM Student ORDER BY Age ASC, ROLL_NO DESC;
Output:
In the above output, we can see that the result is first sorted in ascending order according to
Age. There are multiple rows having the same Age. Sorting this result set further
according to ROLL_NO arranges the rows with the same Age by ROLL_NO in
descending order.
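The multi-column sort can be checked with a short sketch, assuming Python's built-in sqlite3 module. The Student rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (ROLL_NO INTEGER, NAME TEXT, Age INTEGER);
-- Hypothetical rows; two students are 18 and two are 19.
INSERT INTO Student VALUES
  (1, 'Harsh', 19), (2, 'Pratik', 18), (3, 'Riyanka', 20),
  (4, 'Deep', 18), (5, 'Saptarhi', 19);
""")

# Age is the primary sort key; ROLL_NO only breaks ties between equal ages.
ordered_rows = conn.execute(
    "SELECT * FROM Student ORDER BY Age ASC, ROLL_NO DESC"
).fetchall()

for row in ordered_rows:
    print(row)
```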
Assertions:
When a constraint involves two or more tables, the table constraint mechanism is
sometimes hard to use, and the results may not come out as expected.
To cover such situations, SQL supports the creation of assertions, which are constraints not
associated with only one table.
An assertion statement ensures that a certain condition always holds in the
database. The DBMS checks the assertion whenever modifications are made to the
corresponding tables.
A data assertion is a query that looks for problems in a dataset. If the query returns any
rows then the assertion fails.
Data assertions are defined this way because it’s much easier to look for problems rather
than the absence of them.
It also means that assertion queries can themselves be used to quickly inspect the data
causing the assertion to fail - making it easy to diagnose and fix the problem.
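The idea of an assertion as a query that looks for problems can be sketched as follows, assuming Python's built-in sqlite3 module. Many DBMSs, SQLite included, do not implement the standard CREATE ASSERTION statement, so the check is run here as an ordinary query. The tables, the rule (total salary of a department must not exceed its budget) and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (dept_id INTEGER PRIMARY KEY, budget INTEGER);
CREATE TABLE Employee (emp_id INTEGER PRIMARY KEY, dept_id INTEGER, salary INTEGER);
-- Hypothetical rows: department 2 pays more than its budget.
INSERT INTO Department VALUES (1, 10000), (2, 3000);
INSERT INTO Employee VALUES (10, 1, 4000), (11, 1, 5000), (12, 2, 3500);
""")

# The assertion query returns the rows that VIOLATE the rule.
ASSERTION_QUERY = """
SELECT D.dept_id, SUM(E.salary) AS total_salary, D.budget
FROM Department AS D JOIN Employee AS E ON E.dept_id = D.dept_id
GROUP BY D.dept_id, D.budget
HAVING SUM(E.salary) > D.budget
"""

violations = conn.execute(ASSERTION_QUERY).fetchall()
assertion_holds = not violations  # the assertion passes only if no rows come back

print(assertion_holds, violations)
```

The returned rows identify exactly which department breaks the rule, which is what makes this style of check easy to diagnose.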
Triggers:
A trigger is a database object that is associated with a table and is activated when
a defined action is executed on that table. In other words,
a trigger is a stored procedure in the database that is invoked automatically whenever a
particular event in the database occurs.
For example, a trigger can be invoked when a row is inserted into a specified table or
when certain table columns are being updated.
The trigger can be executed when we run the following statements:
1. INSERT
2. UPDATE
3. DELETE
And it can be invoked before or after the event.
Syntax –
create trigger [trigger_name]
[before | after]
{insert | update | delete}
on [table_name]
[for each row]
[trigger_body]
Explanation of syntax for Trigger:
create trigger [trigger_name]:
Creates a trigger with the name trigger_name.
[before | after]: This specifies when the trigger will be executed.
{insert | update | delete}: This specifies the DML operation.
on [table_name]: This specifies the name of the table associated with the trigger.
[for each row]: This specifies a row-level trigger, i.e., the trigger will be executed for each
row being affected.
[trigger_body]: This provides the operation to be performed as trigger is fired
BEFORE and AFTER triggers:
BEFORE triggers run the trigger action before the triggering statement is run.
AFTER triggers run the trigger action after the triggering statement is run.
Example:
Consider a Student Report database, in which the assessment of student marks is recorded.
In such a schema, create a trigger so that the total and average of the specified marks are
automatically inserted whenever a record is inserted.
Here, since the trigger must run before the record is inserted, the BEFORE keyword can be used.
Suppose the database Schema shown below is considered.
Example:
SQL trigger for the problem statement (MySQL syntax):
-- NEW refers to the row that is about to be inserted
create trigger stud_marks
before INSERT on Student for each row set
NEW.total = NEW.subj1 + NEW.subj2 + NEW.subj3,
NEW.per = NEW.total * 60 / 100;
The above SQL statement creates a trigger in the student database: whenever subject
marks are entered, the trigger computes those two values before the data is inserted into the
database and stores them along with the entered values, i.e.,
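The effect of such a trigger can be observed with a small sketch, assuming Python's built-in sqlite3 module. SQLite cannot modify the incoming row in a BEFORE trigger, so this sketch swaps in an AFTER INSERT trigger that updates the freshly inserted row instead; the key column tid, the column types and the sample row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (
    tid INTEGER PRIMARY KEY, name TEXT,
    subj1 INTEGER, subj2 INTEGER, subj3 INTEGER,
    total INTEGER, per INTEGER
);
-- AFTER INSERT variant: fill in total and per for the row just inserted.
CREATE TRIGGER stud_marks AFTER INSERT ON Student
FOR EACH ROW
BEGIN
    UPDATE Student
    SET total = NEW.subj1 + NEW.subj2 + NEW.subj3,
        per = (NEW.subj1 + NEW.subj2 + NEW.subj3) * 60 / 100
    WHERE tid = NEW.tid;
END;
""")

# Only the marks are supplied; the trigger computes the other two columns.
conn.execute(
    "INSERT INTO Student (tid, name, subj1, subj2, subj3) VALUES (1, 'ABCDE', 20, 20, 20)"
)
trigger_row = conn.execute("SELECT total, per FROM Student WHERE tid = 1").fetchone()

print(trigger_row)
```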
StudentMarks:
Examples:
In this example, we will create a view named StudentNames from the table StudentDetails.
Query:
CREATE VIEW StudentNames AS
SELECT S_ID, NAME
FROM StudentDetails
ORDER BY NAME;
If we now query the view as,
SELECT * FROM StudentNames;
Output:
In this example we will create a View named MarksView from two tables StudentDetails and
StudentMarks.
To create a View from multiple tables we can simply include multiple tables in the SELECT
statement.
Query:
CREATE VIEW MarksView AS
SELECT StudentDetails.NAME, StudentDetails.ADDRESS, StudentMarks.MARKS
FROM StudentDetails, StudentMarks
WHERE StudentDetails.NAME = StudentMarks.NAME;
To display data of View MarksView:
SELECT * FROM MarksView;
Output:
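Both views can be created and queried in a short sketch, assuming Python's built-in sqlite3 module. The column types and all rows of StudentDetails and StudentMarks are invented for illustration. The results are sorted in Python, because the order of rows returned from a view should not be relied on unless the outer query has its own ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StudentDetails (S_ID INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT);
CREATE TABLE StudentMarks (ID INTEGER PRIMARY KEY, NAME TEXT, MARKS INTEGER);
-- Hypothetical rows: 'Ashish' has no marks and 'Suresh' has no details.
INSERT INTO StudentDetails VALUES (1, 'Harsh', 'Kolkata'), (2, 'Ashish', 'Durgapur'), (3, 'Pratik', 'Delhi');
INSERT INTO StudentMarks VALUES (1, 'Harsh', 90), (2, 'Suresh', 50), (3, 'Pratik', 80);

CREATE VIEW StudentNames AS
SELECT S_ID, NAME
FROM StudentDetails
ORDER BY NAME;

CREATE VIEW MarksView AS
SELECT StudentDetails.NAME, StudentDetails.ADDRESS, StudentMarks.MARKS
FROM StudentDetails, StudentMarks
WHERE StudentDetails.NAME = StudentMarks.NAME;
""")

# A view is queried exactly like a table.
name_rows = sorted(conn.execute("SELECT * FROM StudentNames").fetchall())
marks_rows = sorted(conn.execute("SELECT * FROM MarksView").fetchall())

print(name_rows)
print(marks_rows)
```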
DELETING VIEWS:
SQL allows us to delete an existing View. We can delete or drop a View using the DROP
statement.
Syntax:
DROP VIEW view_name;
view_name: Name of the View which we want to delete.
Example: if we want to delete the View MarksView, we can do this as:
DROP VIEW MarksView;
UPDATING VIEWS:
There are certain conditions needed to be satisfied to update a view. If any one of these
conditions is not met, then we will not be allowed to update the view.
The SELECT statement which is used to create the view should not include GROUP
BY clause or ORDER BY clause.
The SELECT statement should not have the DISTINCT keyword.
The View should have all NOT NULL values.
The view should not be created using nested queries or complex queries.
The view should be created from a single table.
If the view is created using multiple tables then we will not be allowed to update the
view.
We can use the CREATE OR REPLACE VIEW statement to add or remove fields from a view.
Syntax:
Output:
those tuples that satisfy a qualifying condition. Alternatively, we can consider the SELECT
operation to restrict the tuples in a relation to only those tuples that satisfy the condition.
The SELECT operation can also be visualized as a horizontal partition of the relation into two
sets of tuples: those tuples that satisfy the condition and are selected, and those tuples that do
not satisfy the condition and are discarded.
Examples:
1) Select the EMPLOYEE tuples whose department number is 4.
σ Dno=4 (EMPLOYEE)
3) Select the tuples for all employees who either work in department 4 and make over
$25,000 per year, or work in department 5 and make over $30,000.
σ (Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000) (EMPLOYEE)
Hence, a sequence of SELECTs can be applied in any order. In addition, we can always combine a
cascade (or sequence) of SELECT operations into a single SELECT operation with a conjunctive
(AND) condition; that is,
σ<cond1>(σ<cond2>(...(σ<condn>(R))...)) = σ<cond1> AND <cond2> AND ... AND <condn>(R)
In SQL, the SELECT condition is specified in the WHERE clause of a query. For example, the
following operation:
σ Dno=4 AND Salary>25000 (EMPLOYEE)
corresponds to the following SQL query:
SELECT * FROM EMPLOYEE WHERE Dno=4 AND Salary>25000;
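The commutativity and cascade properties of SELECT can be illustrated with a small sketch in Python. A relation is modelled here as a list of dictionaries, and the function name select as well as the sample tuples are made up for this illustration.

```python
# Hypothetical EMPLOYEE tuples with only the attributes needed here.
EMPLOYEE = [
    {"Fname": "John", "Dno": 5, "Salary": 30000},
    {"Fname": "Franklin", "Dno": 5, "Salary": 40000},
    {"Fname": "Alicia", "Dno": 4, "Salary": 25000},
    {"Fname": "Jennifer", "Dno": 4, "Salary": 43000},
]

def select(relation, condition):
    # SELECT keeps exactly the tuples for which the condition is true.
    return [t for t in relation if condition(t)]

def in_dept4(t):
    return t["Dno"] == 4

def high_paid(t):
    return t["Salary"] > 25000

# Two SELECTs applied in either order, and one SELECT with an AND condition.
cascade = select(select(EMPLOYEE, high_paid), in_dept4)
swapped = select(select(EMPLOYEE, in_dept4), high_paid)
combined = select(EMPLOYEE, lambda t: in_dept4(t) and high_paid(t))

print([t["Fname"] for t in combined])
```

All three expressions return the same tuples, which is why a query optimizer is free to reorder or merge selections.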
Project Operation is a unary operation that returns its argument relation, with
certain attributes left out.
Notation: 𝜋 A1,A2,A3 ….Ak (r)
where A1, A2, …, Ak are attribute names and r is a relation name.
The result is defined as the relation of k columns obtained by erasing the columns that
are not listed.
Duplicate rows removed from result, since relations are sets.
Example: eliminate the dept_name attribute of instructor
Query: 𝜋 ID, name, salary (instructor)
Result:
Example:
1) To list each employee’s first and last name and salary, we can use the PROJECT
operation as follows:
π Lname, Fname, Salary (EMPLOYEE)
If the attribute list includes only nonkey attributes of R, duplicate tuples are likely to
occur. The PROJECT operation removes any duplicate tuples, so its result is a set of
distinct tuples, and hence a valid relation. This is known as duplicate elimination. For
example, consider the following PROJECT operation:
π gender, Salary (EMPLOYEE)
The tuple <’F’, 25000> appears only once in the resulting relation even though
this combination of values appears twice in the EMPLOYEE relation.
The number of tuples in a relation resulting from a PROJECT operation is always less
than or equal to the number of tuples in R. Commutativity does not hold on PROJECT; however,
π<list1>(π<list2>(R)) = π<list1>(R)
as long as <list2> contains the attributes in <list1>; otherwise, the left-hand side is an
incorrect expression.
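In SQL, the PROJECT attribute list is specified in the SELECT clause of a query. For
example, the following operation:
π gender, Salary (EMPLOYEE)
corresponds to the following SQL query, where DISTINCT performs the duplicate elimination:
SELECT DISTINCT gender, Salary FROM EMPLOYEE;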
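Duplicate elimination in PROJECT can be sketched in Python by building a set of tuples. The function name project and the sample EMPLOYEE tuples are hypothetical.

```python
# Hypothetical EMPLOYEE tuples; two of them share gender 'F' and salary 25000.
EMPLOYEE = [
    {"Fname": "John", "gender": "M", "Salary": 30000},
    {"Fname": "Alicia", "gender": "F", "Salary": 25000},
    {"Fname": "Joyce", "gender": "F", "Salary": 25000},
    {"Fname": "Jennifer", "gender": "F", "Salary": 43000},
]

def project(relation, attributes):
    # Keep only the listed attributes; the set removes duplicate tuples.
    return {tuple(t[a] for a in attributes) for t in relation}

projected = project(EMPLOYEE, ["gender", "Salary"])

print(len(EMPLOYEE), len(projected))
```

Four input tuples give three output tuples, so the result is never larger than the input.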
For example, to retrieve the first name, last name, and salary of all employees who work in
department number 5, we must apply a SELECT and a PROJECT operation. We can write a
single relational algebra expression, also known as an in-line expression, as follows:
π Fname, Lname, Salary (σ Dno=5 (EMPLOYEE))
Alternatively, we can explicitly show the sequence of operations, giving a name to each
intermediate relation, as follows:
DEP5_EMPS ← 𝝈 Dno=5(EMPLOYEE)
RESULT ← π Fname, Lname, Salary (DEP5_EMPS)
We can also use this technique to rename the attributes in the intermediate and result relations.
To rename the attributes in a relation, we simply list the new attribute names in parentheses.
TEMP ← σ Dno=5 (EMPLOYEE)
R(First_name, Last_name, Salary) ← π Fname, Lname, Salary (TEMP)
If no renaming is applied, the names of the attributes in the resulting relation of a SELECT
operation are the same as those in the original relation and in the same order. For a PROJECT
operation with no renaming, the resulting relation has the same attribute names as those in the
projection list and in the same order in which they appear in the list.
We can also define a formal RENAME operation, which can rename either the relation name
or the attribute names, or both, as a unary operator.
The general RENAME operation when applied to a relation R of degree n is denoted by any of
the following three forms:
ρ S(B1, B2, ..., Bn) (R) or ρ S (R) or ρ (B1, B2, ..., Bn) (R)
where the symbol ρ (rho) denotes the RENAME operator, S is the new relation name, and
B1, B2, ..., Bn are the new attribute names.
The first expression renames both the relation and its attributes, the second renames the relation
only, and the third renames the attributes only. If the attributes of R are (A1, A2, ..., An) in that
order, then each Ai is renamed as Bi.
Renaming in SQL is accomplished by aliasing using AS, as in the following example:
SELECT E.Fname AS First_name,
E.Lname AS Last_name,
E.Salary AS Salary
FROM EMPLOYEE AS E
WHERE E.Dno=5;
STUDENT-INSTRUCTOR INSTRUCTOR-STUDENT
Example: To retrieve the Social Security numbers of all employees who either
work in department 5 or directly supervise an employee who works in department
5.
DEPT5_EMPS ← σ Dno=5 (EMPLOYEE)
RESULT1 ← π Ssn (DEPT5_EMPS)
RESULT2(Ssn) ← π Super_Ssn (DEPT5_EMPS)
RESULT ← RESULT1 ∪ RESULT2
Two relations R(A1, A2, ..., An) and S(B1, B2, ..., Bn) are said to be union compatible (or
type compatible) if they have the same degree n and if dom(Ai) = dom(Bi) for 1 ≤ i ≤ n. This
means that the two relations have the same number of attributes and each corresponding pair
of attributes has the same domain.
In general, set difference is not commutative:
R − S ≠ S − R
INTERSECTION can be expressed in terms of union and set difference as follows:
R ∩ S = ((R ∪ S) − (R − S)) − (S − R)
In SQL, there are three operations, UNION, INTERSECT, and EXCEPT, that correspond
to the set operations.
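These set properties can be verified on a small example. The sketch below uses Python's built-in set type to stand in for union-compatible relations; the two relations and their tuples are hypothetical.

```python
# Two union-compatible relations of (first name, last name) tuples (made-up data).
STUDENT = {("Susan", "Yao"), ("Ramesh", "Shah"), ("John", "Smith")}
INSTRUCTOR = {("John", "Smith"), ("Ricardo", "Browne"), ("Susan", "Yao")}

union = STUDENT | INSTRUCTOR
intersection = STUDENT & INSTRUCTOR
student_only = STUDENT - INSTRUCTOR      # STUDENT − INSTRUCTOR
instructor_only = INSTRUCTOR - STUDENT   # INSTRUCTOR − STUDENT

# INTERSECTION rebuilt from UNION and set difference only.
rebuilt = ((STUDENT | INSTRUCTOR) - (STUDENT - INSTRUCTOR)) - (INSTRUCTOR - STUDENT)

print(sorted(student_only), sorted(instructor_only))
print(rebuilt == intersection)
```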
Union Operation:
The union operation allows us to combine two relations
Notation: r ∪ s
For r ∪ s to be valid:
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (example: 2nd column of r
deals with the same type of values as does the 2nd column of s)
Set-Intersection Operation:
The set-intersection operation allows us to find tuples that are in both the input
relations.
Notation: r ∩ s
Assume:
r, s have the same arity
attributes of r and s are compatible
Example: Find the set of all courses taught in both the Fall 2017 and the Spring
2018 semesters.
Π course_id(𝜎semester=“Fall”Λyear=2017(section))∩
Πcourse_id (𝜎 semester=“Spring” Λ year=2018 (section))
Result
The set-difference operation allows us to find tuples that are in one relation but are
not in another.
Notation: r – s
Set differences must be taken between compatible relations.
r and s must have the same arity
attribute domains of r and s must be compatible
Example: to find all courses taught in the Fall 2017 semester, but not in the Spring
2018 semester
Π course_id(𝜎semester=“Fall”Λyear=2017(section))−
Πcourse_id (𝜎 semester=“Spring” Λ year=2018 (section))
Result:
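The two relational algebra queries above map onto the SQL operators INTERSECT and EXCEPT. The following is a minimal sketch, assuming Python's built-in sqlite3 module; the rows of section are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE section (course_id TEXT, semester TEXT, year INTEGER);
-- Hypothetical rows: only CS-101 is taught in both semesters.
INSERT INTO section VALUES
  ('CS-101', 'Fall', 2017), ('CS-347', 'Fall', 2017), ('PHY-101', 'Fall', 2017),
  ('CS-101', 'Spring', 2018), ('CS-315', 'Spring', 2018);
""")

FALL = "SELECT course_id FROM section WHERE semester = 'Fall' AND year = 2017"
SPRING = "SELECT course_id FROM section WHERE semester = 'Spring' AND year = 2018"

# Set intersection: courses taught in both semesters.
both_rows = conn.execute(f"{FALL} INTERSECT {SPRING} ORDER BY course_id").fetchall()
# Set difference: courses taught in Fall 2017 but not in Spring 2018.
fall_only_rows = conn.execute(f"{FALL} EXCEPT {SPRING} ORDER BY course_id").fetchall()

print(both_rows, fall_only_rows)
```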
Equivalent Queries
There is more than one way to write a query in relational algebra.
Example: Find information about courses taught by instructors in the Physics
department with salary greater than 90,000
Query 1
𝜎 dept_name=“Physics” ⋀ salary > 90,000 (instructor)
Query 2
𝜎 dept_name=“Physics” (𝜎 salary > 90,000 (instructor))
The two queries are not identical; they are, however, equivalent -- they give the
same result on any database.
3.1.2 The CARTESIAN PRODUCT (CROSS PRODUCT) Operation
Cartesian Product Operation in Relational Algebra
Applying the CARTESIAN PRODUCT to two relations, that is, to two sets of tuples,
takes every tuple from the left relation one by one and pairs it with every tuple
in the right relation.
So, the CROSS PRODUCT of two relations A(R1, R2, R3, …, Rp) with degree p and
B(S1, S2, S3, …, Sn) with degree n is a relation C(R1, R2, R3, …, Rp, S1, S2, S3, …,
Sn) with degree p + n.
Notation: A ✕ B
where A and B are the relations,
the symbol ‘✕’ is used to denote the CROSS PRODUCT operator.
Example:
Consider two relations STUDENT(SNO, FNAME, LNAME) and DETAIL(ROLLNO,
AGE) below:
The CARTESIAN PRODUCT creates tuples with the combined attributes of two relations. We
can SELECT related tuples only from the two relations by specifying an appropriate selection
condition after the Cartesian product.
In SQL, CARTESIAN PRODUCT can be realized by using the CROSS JOIN option in joined
tables.
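The size and degree of a CARTESIAN PRODUCT, and the selection that usually follows it, can be sketched in Python with itertools.product. The tuples of STUDENT and DETAIL are invented for illustration.

```python
from itertools import product

# Hypothetical tuples for STUDENT(SNO, FNAME, LNAME) and DETAIL(ROLLNO, AGE).
STUDENT = [(1, "Albert", "Singh"), (2, "Nora", "Fatehi")]
DETAIL = [(1, 18), (2, 21)]

# Every STUDENT tuple is paired with every DETAIL tuple.
cross = [s + d for s, d in product(STUDENT, DETAIL)]

# A selection after the product keeps only the related tuples (SNO = ROLLNO).
matched = [t for t in cross if t[0] == t[3]]

print(len(cross), len(cross[0]))
print(matched)
```

With 2 tuples in each relation the product has 2 * 2 = 4 tuples of degree 3 + 2 = 5, of which only 2 survive the selection.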
1. Natural Join:
A natural join is the set of tuples of all combinations in R and S that are equal on their common
attribute names. It is denoted by ⋈.
Example: Let's use the EMPLOYEE table and SALARY table:
Input: ∏ EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)
Output:
2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with missing
information.
Example:
EMPLOYEE FACT_WORKERS
a) Left Outer Join: The left outer join contains all combinations of tuples in R and
S that are equal on their common attribute names.
In addition, tuples in R that have no matching tuples in S are also kept, with the
attributes of S filled with NULL values.
It is denoted by ⟕.
Example: Using the above EMPLOYEE table and FACT_WORKERS table.
Input: EMPLOYEE ⟕ FACT_WORKERS
Output:
b) Right outer join: The right outer join contains all combinations of tuples in R
and S that are equal on their common attribute names.
In addition, tuples in S that have no matching tuples in R are also kept, with the
attributes of R filled with NULL values.
It is denoted by ⟖.
Example: Using the above EMPLOYEE table and FACT_WORKERS Relation
Input: EMPLOYEE ⟖ FACT_WORKERS
Output:
c) Full outer join: The full outer join is like a left or right outer join, except that it
keeps all rows from both tables.
In the full outer join, tuples in R that have no matching tuples in S and tuples in S that
have no matching tuples in R on their common attribute names are both included, padded
with NULL values.
It is denoted by ⟗.
Example: Using the above EMPLOYEE table and FACT_WORKERS table
Input: EMPLOYEE ⟗ FACT_WORKERS
Output:
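The outer joins can be tried out with a short sketch, assuming Python's built-in sqlite3 module. Only LEFT OUTER JOIN is used, because older SQLite versions do not support RIGHT or FULL OUTER JOIN; the right outer join is obtained by swapping the two tables. The columns and rows of EMPLOYEE and FACT_WORKERS are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (EMP_NAME TEXT, CITY TEXT);
CREATE TABLE FACT_WORKERS (EMP_NAME TEXT, BRANCH TEXT, SALARY INTEGER);
-- Hypothetical rows: 'Hari' has no FACT_WORKERS row, 'Kuber' has no EMPLOYEE row.
INSERT INTO EMPLOYEE VALUES ('Ram', 'Mumbai'), ('Shyam', 'Delhi'), ('Hari', 'Nagpur');
INSERT INTO FACT_WORKERS VALUES ('Ram', 'Infosys', 10000), ('Shyam', 'Wipro', 20000), ('Kuber', 'HCL', 30000);
""")

# EMPLOYEE left outer join FACT_WORKERS: every EMPLOYEE row is kept.
left_rows = conn.execute("""
SELECT E.EMP_NAME, E.CITY, F.BRANCH, F.SALARY
FROM EMPLOYEE AS E LEFT OUTER JOIN FACT_WORKERS AS F ON E.EMP_NAME = F.EMP_NAME
ORDER BY E.EMP_NAME
""").fetchall()

# EMPLOYEE right outer join FACT_WORKERS, written as a left join with the tables swapped.
right_rows = conn.execute("""
SELECT F.EMP_NAME, E.CITY, F.BRANCH, F.SALARY
FROM FACT_WORKERS AS F LEFT OUTER JOIN EMPLOYEE AS E ON E.EMP_NAME = F.EMP_NAME
ORDER BY F.EMP_NAME
""").fetchall()

print(left_rows)
print(right_rows)
```

Unmatched attributes come back as NULL, shown as None in Python.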
3) Equi join: It is the most common kind of inner join. It is based on
matched data as per the equality condition. The equi join uses only the comparison operator (=).
Example:
The JOIN operation, denoted by ⋈, is used to combine related tuples from two relations into
single longer tuples. It allows us to process relationships among relations. The general form of
a JOIN operation on two relations R(A1, A2, ..., An) and S(B1, B2, ..., Bm) is
R ⋈<join condition> S
Example: Retrieve the name of the manager of each department.
To get the manager’s name, we need to combine each department tuple with the employee tuple
whose Ssn value matches the Mgr_ssn value in the department tuple:
DEPARTMENT ⋈ Mgr_ssn=Ssn EMPLOYEE
The result of the JOIN is a relation Q with n + m attributes Q(A1, A2, ..., An, B1, B2, ..., Bm)
in that order. Q has one tuple for each combination of tuples, one from R and one from S,
whenever the combination satisfies the join condition. This is the main difference between
CARTESIAN PRODUCT and JOIN. In JOIN, only combinations of tuples satisfying the join
condition appear in the result, whereas in the CARTESIAN PRODUCT all combinations of
tuples are included in the result. The join condition is specified on attributes from the two
relations R and S and is evaluated for each combination of tuples.
Each tuple combination for which the join condition evaluates to TRUE is included in the
resulting relation Q as a single combined tuple. A general join condition is of the form
<condition> AND <condition> AND...AND <condition>
The attribute Dnum is called the join attribute for the NATURAL JOIN operation, because it
is the only attribute with the same name in both relations.
If the attributes on which the natural join is specified already have the same names in both
relations, renaming is unnecessary. For example, to apply a natural join on the Dnumber
attributes of DEPARTMENT and DEPT_LOCATIONS, it is sufficient to write
DEPARTMENT * DEPT_LOCATIONS
In general, the join condition for NATURAL JOIN is constructed by equating each pair of join
attributes that have the same name in the two relations and combining these conditions with
AND. If no combination of tuples satisfies the join condition, the result of a JOIN is an empty
relation with zero tuples.
A more general, but nonstandard, definition for NATURAL JOIN is
Q ← R *(<list1>),(<list2>) S
where,
<list1> : list of i attributes from R,
<list2> : list of i attributes from S
The lists are used to form equality comparison conditions between pairs of corresponding
attributes, and the conditions are then ANDed together. Only the list corresponding to
attributes of the first relation R, <list1>, is kept in the result Q.
In general, if R has nR tuples and S has nS tuples, the result of a JOIN operation
R ⋈<join condition> S will have between zero and nR * nS tuples. The expected size of the join result
divided by the maximum size nR * nS leads to a ratio called join selectivity, which is a property
of each join condition. If there is no join condition, all combinations of tuples qualify and the
JOIN degenerates into a CARTESIAN PRODUCT, also called CROSS PRODUCT or CROSS
JOIN.
A single JOIN operation is used to combine data from two relations so that related information
can be presented in a single table. These operations are also known as inner joins. Informally,
an inner join is a type of match-and-combine operation, defined formally as a combination of
CARTESIAN PRODUCT and SELECTION. The NATURAL JOIN or EQUIJOIN operation
can also be specified among multiple tables, leading to an n-way join. For example, consider
the following three-way join:
((PROJECT ⋈ Dnum=Dnumber DEPARTMENT) ⋈ Mgr_ssn=Ssn EMPLOYEE)
This combines each project tuple with its controlling department tuple into a single tuple, and
then combines that tuple with an employee tuple that is the department manager. The net result
is a consolidated relation in which each tuple contains this project-department-manager
combined information.
In SQL, JOIN can be realized in several different ways
- The first method is to specify the <join conditions> in the WHERE clause, along with
any other selection conditions.
- The second way is to use a nested relation
- Another way is to use the concept of joined tables
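The three ways of writing a join in SQL can be compared in a small sketch, assuming Python's built-in sqlite3 module. The second method is interpreted here as a nested query; the reduced DEPARTMENT and EMPLOYEE schemas and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Lname TEXT);
CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT, Mgr_ssn TEXT);
-- Hypothetical rows: 'Smith' manages no department.
INSERT INTO EMPLOYEE VALUES ('111', 'Wong'), ('222', 'Wallace'), ('333', 'Smith');
INSERT INTO DEPARTMENT VALUES (5, 'Research', '111'), (4, 'Administration', '222');
""")

# 1) Join condition in the WHERE clause.
where_join = conn.execute("""
SELECT E.Lname FROM DEPARTMENT AS D, EMPLOYEE AS E
WHERE D.Mgr_ssn = E.Ssn ORDER BY E.Lname
""").fetchall()

# 2) Nested query.
nested_join = conn.execute("""
SELECT Lname FROM EMPLOYEE
WHERE Ssn IN (SELECT Mgr_ssn FROM DEPARTMENT) ORDER BY Lname
""").fetchall()

# 3) Joined table in the FROM clause.
joined_table = conn.execute("""
SELECT E.Lname FROM DEPARTMENT AS D JOIN EMPLOYEE AS E ON D.Mgr_ssn = E.Ssn
ORDER BY E.Lname
""").fetchall()

print(where_join, nested_join, joined_table)
```

All three forms list the last names of the department managers; the nested form can only return columns of the outer table.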
3.5.3 A Complete Set of Relational Algebra Operations
The set of relational algebra operations {σ, π, ∪, ρ, −, ×} is a complete set; that is, any of the other
original relational algebra operations can be expressed as a sequence of operations from this
set. For example, the INTERSECTION operation can be expressed by using UNION and
MINUS as follows:
R ∩ S ≡ (R ∪ S) − ((R − S) ∪ (S − R))
The expression:
Smith ← Π Pno(σ Ename = ‘john smith’ (employee * works on Pno=Eno))
The DIVISION operation, denoted by ÷, is useful for a special kind of query that sometimes
occurs in database applications. An example is: Retrieve the names of employees who work on
all the projects that ‘John Smith’ works on. To express this query using the DIVISION
operation, proceed as follows.
First, retrieve the list of project numbers that ‘John Smith’ works on in the intermediate
relation SMITH_PNOS:
Next, create a relation that includes a tuple <Pno, Essn> whenever the employee whose
Ssn is Essn works on the project whose number is Pno in the intermediate relation
SSN_PNOS:
Finally, apply the DIVISION operation to the two relations, which gives the desired
employees’ Social Security numbers:
SSN_PNOS ÷ SMITH_PNOS
In general, the DIVISION operation is applied to two relations R(Z) ÷ S(X), where the
attributes of S are a subset of the attributes of R; that is, X ⊆ Z. Let Y be the set of attributes of
R that are not attributes of S; that is, Y = Z − X (and hence Z = X ∪ Y). The result of DIVISION
is a relation T(Y) that includes a tuple t if tuples tR appear in R with tR[Y] = t, and with
tR[X] = tS for every tuple tS in S. This means that, for a tuple t to appear in the result T of the
DIVISION, the values in t must appear in R in combination with every tuple in S.
Figure below illustrates a DIVISION operation where X = {A}, Y = {B}, and Z = {A, B}.
The tuples (values) b1 and b4 appear in R in combination with all three tuples in S; that is why
they appear in the resulting relation T. All other values of B in R do not appear with all the
tuples in S and are not selected: b2 does not appear with a2, and b3 does not appear with a1.
The DIVISION operation can be expressed as a sequence of π, × and − operations as
follows:
T1 ← π Y (R)
T2 ← π Y ((S × T1) − R)
T ← T1 − T2
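The decomposition of DIVISION into projection, product and difference (T1 ← π Y (R), T2 ← π Y ((S × T1) − R), T ← T1 − T2) can be sketched in Python with sets. The function name divide and the tuples of R and S are made up, chosen so that b1 and b4 appear with every value in S, b2 is missing a2, and b3 is missing a1, as in the description above.

```python
def divide(r, s):
    """Return R ÷ S, for R a set of (a, b) pairs and S a set of a values."""
    t1 = {b for (_, b) in r}                        # T1 <- pi_Y(R)
    missing = {(a, b) for a in s for b in t1} - r   # (S x T1) - R
    t2 = {b for (_, b) in missing}                  # T2 <- pi_Y((S x T1) - R)
    return t1 - t2                                  # T  <- T1 - T2

# Hypothetical relations with X = {A} and Y = {B}.
S = {"a1", "a2", "a3"}
R = {
    ("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a4", "b1"),
    ("a1", "b2"), ("a3", "b2"),
    ("a2", "b3"), ("a3", "b3"), ("a4", "b3"),
    ("a1", "b4"), ("a2", "b4"), ("a3", "b4"),
}

T = divide(R, S)
print(sorted(T))
```

T2 collects the B values that fail to pair with at least one tuple of S, so removing them from T1 leaves exactly the values paired with all of S.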
Leaf nodes P, D, and E represent the three relations PROJECT, DEPARTMENT, and
EMPLOYEE. The relational algebra operations in the expression are represented by internal
tree nodes. The query tree signifies an explicit order of execution in the following sense. The
node marked (1) must begin execution before node (2) because some resulting tuples of
operation (1) must be available before we can begin to execute operation (2). Similarly, node
(2) must begin to execute and produce results before node (3) can start execution, and so on.
A query tree gives a good visual representation and understanding of the query in terms of the
relational operations it uses and is recommended as an additional means for expressing queries
in relational algebra.
Query 2. For every project located in ‘Stafford’, list the project number, the controlling
department number, and the department manager’s last name, address and birth date.
Query 3. Find the names of employees who work on all the projects controlled by
department number 5.
Query 4. Make a list of project numbers for projects that involve an employee whose last
name is ‘Smith’, either as a worker or as a manager of the department that controls the
project.
Query 5. List the names of all employees with two or more dependents.
Query 7. List the names of managers who have at least one dependent.