Lecture 6
The relational algebra and calculus were developed before the SQL language.
Preliminaries
A query is applied to relation instances, and the result of a query
is also a relation instance.
◦ Schemas of input relations for a query are fixed.
◦ The schema for the result of a given query is also fixed!
Determined by definition of query language constructs.
Relational Algebra
Basic operations:
◦ Selection (σ) Selects a subset of rows from relation.
◦ Projection (π) Deletes unwanted columns from relation.
◦ Cross-product (×) Allows us to combine two relations.
◦ Set-difference (−) Tuples in reln. 1, but not in reln. 2.
◦ Union (∪) Tuples in reln. 1 or in reln. 2.
Additional operations:
◦ Intersection, join, division, renaming
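The basic operations can be sketched in a few lines of Python. This is only an illustration: relations are modelled as sets of tuples, and the function names (select, project, cross_product, difference, union) and the sample data are assumptions made for this sketch, not part of any library.

```python
# Relations are modelled as Python sets of tuples; the schema is a tuple of
# column names. All names and data below are hypothetical.

def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return {r for r in rows if predicate(r)}

def project(rows, columns, schema):
    """Projection: keep only the named columns (duplicates vanish in a set)."""
    idx = [schema.index(c) for c in columns]
    return {tuple(r[i] for i in idx) for r in rows}

def cross_product(r1, r2):
    """Cross-product: pair every tuple of r1 with every tuple of r2."""
    return {a + b for a in r1 for b in r2}

def difference(r1, r2):
    """Set-difference: tuples in r1 but not in r2."""
    return r1 - r2

def union(r1, r2):
    """Union: tuples in r1 or in r2."""
    return r1 | r2

schema = ("sid", "name", "age")
s1 = {(1, "Ann", 19), (2, "Bob", 17)}
s2 = {(2, "Bob", 17), (3, "Cy", 21)}

adults = select(s1, lambda r: r[2] > 17)
names = project(union(s1, s2), ["name"], schema)
only_s1 = difference(s1, s2)
pairs = cross_product(s1, s2)
```

Every function takes relations and returns a relation, which is why the operations can be nested, as in the computation of names.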
In domain relational calculus, variables range over the domains (values) of
individual attributes rather than over whole tuples.
Syntax: { c1, c2, c3, ..., cn | F(c1, c2, c3, ..., cn) }
where c1, c2, ..., cn are domain variables, one for each attribute of the result,
and F is the formula that states the condition for fetching the data.
For example,
{<name, age> | <name, age> ∈ Student ∧ age > 17}
The above query will return the names and ages of the students in the table Student
who are older than 17.
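The Student query reads almost directly as a Python set comprehension. The sketch below assumes a Student relation with exactly two attributes (name, age); the sample rows are invented for illustration.

```python
# Hypothetical Student relation with two attributes: (name, age).
Student = {("Asha", 19), ("Ben", 17), ("Chen", 22)}

# Domain variables name and age range over attribute values; the condition
# after "if" plays the role of the formula F.
result = {(name, age) for (name, age) in Student if age > 17}
```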
DRC Example
Consider two relations
◦ EMP(Name, MGR, DEPT, SAL)
◦ CHILDREN(Ename, Cname, Age)
Q1: Retrieve Salary and Children’s name of Employees whose manager is
‘white’
{s, c | (∃u)(∃v)(∃w)(∃x)(∃y)(EMP(u,v,w,s) AND CHILDREN(x,c,y) AND
/* initiate domain variables */
u = x AND /* join condition */
v = ‘white’)} /* selection condition */
/* projection is implied (s, c) */
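Q1 can be mirrored in Python to show how the three parts of the query (domain variables, join condition, selection condition) fit together. This is a sketch with invented sample rows; only the relation and variable names come from the example.

```python
# Hypothetical instances of EMP(Name, MGR, DEPT, SAL) and
# CHILDREN(Ename, Cname, Age).
EMP = {
    ("smith", "white", "sales", 3000),
    ("jones", "black", "hr", 2500),
}
CHILDREN = {
    ("smith", "amy", 7),
    ("smith", "tom", 4),
    ("jones", "kim", 9),
}

result = {
    (s, c)                        # projection is implied: only s and c are kept
    for (u, v, w, s) in EMP       # domain variables bound to EMP attributes
    for (x, c, y) in CHILDREN     # domain variables bound to CHILDREN attributes
    if u == x and v == "white"    # join condition and selection condition
}
```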
Domain Relational Calculus (DRC)
‘Research’ department.(TRC)
Query 2. For every project located in ‘Stafford’, list the project number, the
B. Smith’
The first phase is the syntax checking phase: the system parses the
query and checks whether it follows the syntax rules. It then matches
the objects named in the query with the tables, views, and columns listed
in the system tables.
External sorting is the sorting of relations that do not fit in memory because
they are larger than the available main memory.
Sorting is also a key component in sort-merge algorithms used for JOIN and
other operations (such as UNION and INTERSECTION).
The typical external sorting algorithm uses a sort-merge strategy, which starts
by sorting small subfiles, called runs, of the main file and then merges the
sorted runs, creating larger sorted subfiles that are merged in turn.
The algorithm consists of two phases: the sorting phase and the merging phase.
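The two phases can be sketched as follows. This is a simplified illustration, not a disk-based implementation: Python lists stand in for the file and for the runs, and the names external_sort, buffer_size, and merge_degree are assumptions made for this sketch.

```python
import heapq

def external_sort(records, buffer_size, merge_degree=2):
    """Sort-merge sketch: lists stand in for disk files and runs."""
    if buffer_size < 1 or merge_degree < 2:
        raise ValueError("buffer_size must be >= 1 and merge_degree >= 2")

    # Sorting phase: read as many records as fit in memory, sort them,
    # and write them out as one sorted run.
    runs = [
        sorted(records[i:i + buffer_size])
        for i in range(0, len(records), buffer_size)
    ]

    # Merging phase: merge groups of runs into larger sorted runs,
    # repeating until a single run remains.
    while len(runs) > 1:
        runs = [
            list(heapq.merge(*runs[i:i + merge_degree]))
            for i in range(0, len(runs), merge_degree)
        ]
    return runs[0] if runs else []

sorted_file = external_sort([5, 3, 8, 1, 9, 2, 7], buffer_size=2)
```

With buffer_size=2 the sorting phase produces four runs, and two merge passes are needed to reach one sorted file.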
Access structures in DB
Indexing is a data structure technique to efficiently retrieve records from the
database files based on some attributes on which the indexing has been done.
Indexing in database systems is similar to what we see in books.
Because the index file is typically much smaller than the data file, searching
the index with a binary search is efficient.
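A binary search over an index might look like the sketch below. It assumes a data file ordered on the key field and an index with one entry per block (the first key stored in that block); the block layout, the sample records, and the lookup function are all hypothetical.

```python
import bisect

# Hypothetical data file: blocks of (key, value) records, ordered on the key.
blocks = [
    [(10, "a"), (20, "b")],
    [(30, "c"), (40, "d")],
    [(50, "e"), (60, "f")],
]

# Index: one entry per block, holding the first key stored in that block.
index_keys = [block[0][0] for block in blocks]

def lookup(key):
    """Binary-search the index, then scan only the one candidate block."""
    pos = bisect.bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    for k, value in blocks[pos]:
        if k == key:
            return value
    return None
```

The binary search touches only the small index, and at most one data block is read afterwards.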
Indexing can be of the following types
Primary Index − Primary index is defined on an ordered data file. The data file
is ordered on a key field. The key field is generally the primary key of the
relation.
Clustering Index − Clustering index is defined on an ordered data file. The data
file is ordered on a non-key field.
Secondary Index − Secondary index may be generated from a field which is a
candidate key and has a unique value in every record, or from a non-key field
with duplicate values.
Hashing
The search condition must be an equality condition on a single field, called
the hash field. In most cases, the hash field is also a key field of the file, in
which case it is called the hash key.
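A minimal sketch of hashed access is shown below. The bucket count, the modulo hash function, and the insert/search helpers are assumptions chosen for illustration; real systems use more elaborate hash functions and handle bucket overflow.

```python
NUM_BUCKETS = 4  # hypothetical number of buckets

def hash_fn(key):
    """Map a hash-field value to a bucket number."""
    return key % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key, record):
    """Store the record in the bucket chosen by the hash function."""
    buckets[hash_fn(key)].append((key, record))

def search(key):
    """Equality search on the hash field: only one bucket is examined."""
    for k, record in buckets[hash_fn(key)]:
        if k == key:
            return record
    return None

insert(7, "x")
insert(11, "y")  # 7 and 11 collide: both map to bucket 3
```

The sketch also shows why the condition must be an equality on the hash field: a range condition such as key > 7 gives no single bucket to look in.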
Generating and saving temporary files on disk is time consuming and expensive.
Alternative:
Pipeline the data through multiple operations: pass the result of one operator
directly to the next, without saving it to a temporary file and without waiting
for the previous operation to complete.
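Python generators give a compact way to picture pipelining: each operator pulls one row at a time from the operator below it, so no intermediate result is materialized. The operator names and the sample rows here are assumptions made for this sketch.

```python
def scan(rows):
    """Leaf operator: produce the rows of a base relation one at a time."""
    for row in rows:
        yield row

def select(rows, predicate):
    """Pass on each row that satisfies the predicate as soon as it arrives."""
    for row in rows:
        if predicate(row):
            yield row

def project(rows, columns):
    """Keep only the named columns of each incoming row."""
    for row in rows:
        yield tuple(row[c] for c in columns)

# Hypothetical employee rows.
emp = [
    {"name": "smith", "dept": "sales", "sal": 3000},
    {"name": "jones", "dept": "hr", "sal": 2500},
    {"name": "brown", "dept": "sales", "sal": 3200},
]

# Nothing is computed until rows are requested from the top operator.
pipeline = project(select(scan(emp), lambda r: r["sal"] > 2800), ["name"])
result = list(pipeline)
```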
Parallel query processing
Parallel query processing can improve the performance of the following types of
queries: select statements that scan large numbers of pages but return
relatively few rows, such as table scans or clustered index scans with grouped
or ungrouped aggregates.
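The shape of such a query, a large scan that returns a small result, can be sketched by splitting the scan into partitions and combining partial aggregates. The function names and the partitioning scheme are assumptions for illustration. Threads are used only to keep the sketch short; in standard CPython builds they show the structure of the computation rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_count(partition, predicate):
    """Scan one partition and return a partial aggregate (a row count)."""
    return sum(1 for row in partition if predicate(row))

def parallel_count(rows, predicate, workers=4):
    """Split the scan across workers, then combine the partial counts."""
    size = max(1, (len(rows) + workers - 1) // workers)
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda p: partial_count(p, predicate), partitions)
        return sum(partials)

# Hypothetical table of 1000 rows; the aggregate returns a single value.
table = list(range(1000))
multiples_of_ten = parallel_count(table, lambda x: x % 10 == 0)
```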
In a query tree, the leaf nodes represent the base input relations of the query.