
Relational Model and Algebra: Introduction To Databases Compsci 316 Fall 2014

The document discusses the relational model and relational algebra for querying relational databases. It introduces the relational model as a collection of tables where each table has columns and rows. It describes key concepts like schema vs instance, and relational algebra operators like selection, projection, cross product, and join. These operators allow composing queries to retrieve and relate data across multiple tables.


Relational Model and Algebra

Introduction to Databases
CompSci 316 Fall 2014

Announcements (Thu. Aug. 28)


• Registration
• As a courtesy to others, please add/drop ASAP
• Tonight: permission #’s will be emailed to 18 on the wait list
• Monday evening: another round of permission #’s
• If you are not on the official wait list, check

• UTAs and office hours to be announced soon


• Next week
• Brett will run the class (I will be away at a conference)
• Tuesday: lab to help with setup, VM, RA—bring laptop!
• Thursday: relational database design
• Homework #1 assigned; due in ~2 weeks
• Sign up for Gradiance and Piazza
• Wait for our email to start setting up VM (and signing up for
Amazon if needed)

Edgar F. Codd (1923-2003)

• Pilot in the Royal Air Force in WW2


• Inventor of the relational model
and algebra while at IBM
• Turing Award, 1981


Relational data model


• A database is a collection of relations (or tables)
• Each relation has a set of attributes (or columns)
• Each attribute has a name and a domain (or type)
• Set-valued attributes are not allowed
• Each relation contains a set of tuples (or rows)
• Each tuple has a value for each attribute of the relation
• Duplicate tuples are not allowed
• Two tuples are duplicates if they agree on all attributes

Simplicity is a virtue!
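
The set semantics above can be sketched in Python, whose built-in `set` also discards duplicates. This is only an illustration, not part of the original slides: the `User` type mirrors the User table used in later examples, and the sample rows are hypothetical.

```python
from typing import NamedTuple


class User(NamedTuple):
    """One tuple (row) of the User relation; the fields are its attributes."""
    uid: int
    name: str
    age: int
    pop: float


# A relation is a set of tuples. The rows below are hypothetical sample data.
users = {
    User(1, "Bart", 10, 0.9),
    User(2, "Lisa", 8, 0.7),
    User(1, "Bart", 10, 0.9),  # agrees on all attributes, so it is a duplicate
}

print(len(users))  # the duplicate is stored only once
```

Because a set has no inherent order, this representation also reflects the point that the ordering of rows does not matter.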

Example
[Tables: User (uid, name, age, pop), Group (gid, name), Member (uid, gid)]

Ordering of rows doesn’t matter (even though output is always in some order)

Schema vs. instance


• Schema (metadata)
• Specifies the logical structure of data
• Is defined at setup time
• Rarely changes
• Instance
• Represents the data content
• Changes rapidly, but always conforms to the schema
Compare to type vs. objects of type in a
programming language

Example
• Schema
• User (uid int, name string, age int, pop float)
• Group (gid string, name string)
• Member (uid int, gid string)
• Instance
• User: a set of 〈uid, name, age, pop〉 tuples
• Group: a set of 〈gid, name〉 tuples
• Member: a set of 〈uid, gid〉 tuples
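
One way to picture the schema/instance split is to keep the schema as fixed metadata and check that any instance conforms to it. The sketch below assumes Python types stand in for the domains (`int`, `str`, `float`); the `conforms` helper and the sample rows are hypothetical, not from the course.

```python
# Schema (metadata): relation name -> ordered (attribute, type) pairs.
SCHEMA = {
    "User": (("uid", int), ("name", str), ("age", int), ("pop", float)),
    "Group": (("gid", str), ("name", str)),
    "Member": (("uid", int), ("gid", str)),
}


def conforms(relation: str, rows: set) -> bool:
    """Return True if every row has the right arity and attribute types."""
    attrs = SCHEMA[relation]
    return all(
        len(row) == len(attrs)
        and all(isinstance(value, typ) for value, (_, typ) in zip(row, attrs))
        for row in rows
    )


# Instance (data content): hypothetical rows that may change over time.
instance = {
    "User": {(1, "Bart", 10, 0.9)},
    "Group": {("g1", "Chess Club")},
    "Member": {(1, "g1")},
}

instance_ok = all(conforms(name, rows) for name, rows in instance.items())
bad_row_ok = conforms("Member", {("1", "g1")})  # uid given as a string
print(instance_ok, bad_row_ok)
```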

Relational algebra
A language for querying relational data based on “operators”

[Figure: RelOp boxes]

• Core operators:
• Selection, projection, cross product, union, difference,
and renaming
• Additional, derived operators:
• Join, natural join, intersection, etc.
• Compose operators to make complex queries

Selection
• Input: a table $R$
• Notation: $\sigma_p R$
• $p$ is called a selection condition (or predicate)
• Purpose: filter rows according to some criteria
• Output: same columns as $R$, but only rows of $R$ that satisfy $p$
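
Selection can be sketched as a filter over a set of rows. The `select` helper and the sample rows are hypothetical; the predicate plays the role of the selection condition.

```python
from typing import Callable, NamedTuple


class User(NamedTuple):
    uid: int
    name: str
    age: int
    pop: float


def select(rows: set, predicate: Callable) -> set:
    """Selection: keep the rows that satisfy the predicate; columns are unchanged."""
    return {row for row in rows if predicate(row)}


# Hypothetical sample rows.
users = {
    User(1, "Bart", 10, 0.9),
    User(2, "Ann", 10, 0.2),
    User(3, "Lisa", 8, 0.7),
    User(4, "Bob", 8, 0.3),
}

# Users with popularity higher than 0.5.
popular = select(users, lambda u: u.pop > 0.5)
print(sorted(u.name for u in popular))
```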

Selection example
• Users with popularity higher than 0.5
$\sigma_{pop > 0.5} User$

[Figure: the User table (uid, name, age, pop) and the result of the selection, which keeps the same columns but only the rows with pop > 0.5]

More on selection
• Selection condition can include any column of $R$, constants, comparison (=, ≤, etc.) and Boolean connectives (∧: and, ∨: or, ¬: not)
• Example: users with popularity at least 0.9 and age under 10 or above 12
$\sigma_{pop \ge 0.9 \land (age < 10 \lor age > 12)} User$

• You must be able to evaluate the condition over a single row of the input table!
• Example: the most popular user cannot be found by a selection alone, because a condition such as “pop ≥ every pop in User” looks beyond the current row
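
The single-row restriction can be made concrete with a small sketch. The rows are hypothetical; `condition` sees exactly one row, while finding the highest popularity needs a pass over the whole table.

```python
# Hypothetical rows: (uid, name, age, pop).
users = {
    (1, "Bart", 10, 0.9),
    (2, "Lisa", 8, 0.7),
    (3, "Ann", 13, 0.95),
    (4, "Bob", 8, 0.9),
}


def condition(row: tuple) -> bool:
    """pop >= 0.9 AND (age < 10 OR age > 12), evaluated on one row only."""
    _, _, age, pop = row
    return pop >= 0.9 and (age < 10 or age > 12)


selected = {row for row in users if condition(row)}
selected_names = sorted(name for (_, name, _, _) in selected)

# "Most popular" is different: the threshold depends on every row of the
# table, so it cannot be written as a per-row condition with constants only.
highest_pop = max(pop for (_, _, _, pop) in users)

print(selected_names, highest_pop)
```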

Projection
• Input: a table $R$
• Notation: $\pi_L R$
• $L$ is a list of columns in $R$
• Purpose: output chosen columns
• Output: same rows, but only the columns in $L$
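
Projection can be sketched as building shorter tuples and collecting them in a set, so duplicate output rows disappear automatically. The `project` helper and the sample rows are hypothetical.

```python
from typing import NamedTuple


class User(NamedTuple):
    uid: int
    name: str
    age: int
    pop: float


def project(rows: set, columns: list) -> set:
    """Projection: keep only the named columns; the set removes duplicates."""
    return {tuple(getattr(row, c) for c in columns) for row in rows}


# Hypothetical sample rows.
users = {
    User(1, "Bart", 10, 0.9),
    User(2, "Ann", 10, 0.2),
    User(3, "Lisa", 8, 0.7),
    User(4, "Bob", 8, 0.3),
}

ids_and_names = project(users, ["uid", "name"])  # one output row per user
ages = project(users, ["age"])                   # four users, two distinct ages
print(len(ids_and_names), sorted(ages))
```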

Projection example
• IDs and names of all users
$\pi_{uid, name} User$

[Figure: the User table (uid, name, age, pop) and the result, which keeps every row but only the uid and name columns]

More on projection
• Duplicate output rows are removed (by definition)
• Example: user ages
$\pi_{age} User$

[Figure: the User table (uid, name, age, pop) and the result, a single age column with repeated ages removed]

Cross product
• Input: two tables R and S
• Notation: $R \times S$
• Purpose: pairs rows from two tables
• Output: for each row $r$ in $R$ and each row $s$ in $S$, output a row $rs$ (concatenation of $r$ and $s$)
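
A cross product can be sketched with `itertools.product`, concatenating each pair of rows. The rows are hypothetical, and the last lines check the "commutative up to column ordering" property by moving the columns of one result around.

```python
from itertools import product


def cross(r: set, s: set) -> set:
    """Cross product: concatenate every row of r with every row of s."""
    return {a + b for a, b in product(r, s)}


# Hypothetical rows: User (uid, name, age, pop) and Member (uid, gid).
users = {(1, "Bart", 10, 0.9), (2, "Lisa", 8, 0.7)}
members = {(1, "g1"), (2, "g1"), (2, "g2")}

user_x_member = cross(users, members)
member_x_user = cross(members, users)

# Move the four User columns behind the two Member columns and compare.
reordered = {row[4:] + row[:4] for row in user_x_member}
print(len(user_x_member), reordered == member_x_user)
```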

Cross product example


$User \times Member$

[Figure: User (uid, name, age, pop) × Member (uid, gid); the result has columns uid, name, age, pop, uid, gid and one row for every pairing of a User row with a Member row]

A note on column ordering

• Ordering of columns is unimportant as far as contents are concerned

[Figure: the same cross product shown with columns (uid, name, age, pop, uid, gid) and with columns (uid, gid, uid, name, age, pop); the two tables are equal]

• So cross product is commutative, i.e., for any $R$ and $S$, $R \times S = S \times R$ (up to the ordering of columns)

Derived operator: join


(A.k.a. “theta-join”)
• Input: two tables $R$ and $S$
• Notation: $R \bowtie_p S$
• $p$ is called a join condition (or predicate)
• Purpose: relate rows from two tables according to some criteria
• Output: for each row $r$ in $R$ and each row $s$ in $S$, output a row $rs$ if $r$ and $s$ satisfy $p$
• Shorthand for $\sigma_p (R \times S)$
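
The "shorthand" claim can be checked directly: a theta-join gives the same rows as a selection over the cross product. The helper name `theta_join` and the sample rows are hypothetical.

```python
from itertools import product


def theta_join(r: set, s: set, cond) -> set:
    """Theta-join: concatenate pairs of rows that satisfy the join condition."""
    return {a + b for a in r for b in s if cond(a, b)}


# Hypothetical rows: User (uid, name, age, pop) and Member (uid, gid).
users = {(1, "Bart", 10, 0.9), (2, "Lisa", 8, 0.7)}
members = {(1, "g1"), (2, "g1"), (2, "g2")}


def same_uid(user: tuple, member: tuple) -> bool:
    """Join condition User.uid = Member.uid."""
    return user[0] == member[0]


joined = theta_join(users, members, same_uid)

# Same result via selection over the cross product.
via_cross = {a + b for a, b in product(users, members) if same_uid(a, b)}
print(len(joined), joined == via_cross)
```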

Join example
• Info about users, plus IDs of their groups
$User \bowtie_{User.uid = Member.uid} Member$

Prefix a column reference with table name and “.” to disambiguate identically named columns from different tables

[Figure: User × Member with columns uid, name, age, pop, uid, gid; the join keeps only the rows whose two uid values match]

Derived operator: natural join


• Input: two tables $R$ and $S$
• Notation: $R \bowtie S$
• Purpose: relate rows from two tables, and
• Enforce equality between identically named columns
• Eliminate one copy of identically named columns
• Shorthand for $\pi_L (R \bowtie_p S)$, where
• $p$ equates each pair of columns common to $R$ and $S$
• $L$ is the union of column names from $R$ and $S$ (with duplicate columns removed)
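
Because natural join depends on column names, a sketch needs the names alongside the rows. The version below assumes a relation is passed as a tuple of column names plus a set of rows; the `natural_join` helper and the data are hypothetical.

```python
def natural_join(r_cols: tuple, r_rows: set, s_cols: tuple, s_rows: set):
    """Natural join: equate common columns and keep one copy of each."""
    common = [c for c in r_cols if c in s_cols]
    extra = [c for c in s_cols if c not in common]
    out_cols = tuple(r_cols) + tuple(extra)
    out_rows = set()
    for a in r_rows:
        for b in s_rows:
            # p: every identically named column must agree.
            if all(a[r_cols.index(c)] == b[s_cols.index(c)] for c in common):
                # L: all columns of r, then the columns only s has.
                out_rows.add(a + tuple(b[s_cols.index(c)] for c in extra))
    return out_cols, out_rows


# Hypothetical sample data.
user_cols = ("uid", "name", "age", "pop")
users = {(1, "Bart", 10, 0.9), (2, "Lisa", 8, 0.7)}
member_cols = ("uid", "gid")
members = {(1, "g1"), (2, "g1"), (2, "g2")}

cols, rows = natural_join(user_cols, users, member_cols, members)
print(cols, len(rows))
```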

Natural join example


$User \bowtie Member = \pi_{?} (User \bowtie_{?} Member)$
$= \pi_{uid, name, age, pop, gid} (User \bowtie_{User.uid = Member.uid} Member)$

[Figure: User (uid, name, age, pop) joined with Member (uid, gid) on matching uid; one copy of the uid column is eliminated from the result]

Union
• Input: two tables $R$ and $S$
• Notation: $R \cup S$
• $R$ and $S$ must have identical schema
• Output:
• Has the same schema as $R$ and $S$
• Contains all rows in $R$ and all rows in $S$ (with duplicate rows removed)

Difference
• Input: two tables $R$ and $S$
• Notation: $R - S$
• $R$ and $S$ must have identical schema
• Output:
• Has the same schema as $R$ and $S$
• Contains all rows in $R$ that are not in $S$

Derived operator: intersection


• Input: two tables $R$ and $S$
• Notation: $R \cap S$
• $R$ and $S$ must have identical schema
• Output:
• Has the same schema as $R$ and $S$
• Contains all rows that are in both $R$ and $S$
• Shorthand for $R - (R - S)$
• Also equivalent to $S - (S - R)$
• And to $R \bowtie S$
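
Python's set operators line up with union, difference, and intersection, which makes the identities above easy to check on a small case. The two relations below are hypothetical and share the schema (uid, gid).

```python
# Two hypothetical relations with identical schema (uid, gid).
r = {(1, "g1"), (2, "g1")}
s = {(2, "g1"), (3, "g2")}

union = r | s         # all rows of r and of s, duplicates removed
difference = r - s    # rows of r that are not in s
intersection = r & s  # rows in both

# Intersection expressed with difference only.
via_r = r - (r - s)
via_s = s - (s - r)

# With identical schemas every column is common, so a natural join keeps
# exactly the rows that appear in both inputs.
via_join = {a for a in r for b in s if a == b}

print(sorted(union), difference, intersection)
```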

Renaming
• Input: a table $R$
• Notation: $\rho_S R$, $\rho_{(A_1, A_2, \ldots)} R$, or $\rho_{S(A_1, A_2, \ldots)} R$
• Purpose: “rename” a table and/or its columns
• Output: a table with the same rows as $R$, but called differently
• Used to
• Avoid confusion caused by identical column names
• Create identical column names for natural joins
• As with all other relational operators, it doesn’t
modify the database
• Think of the renamed table as a copy of the original

Renaming example
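• IDs of users who belong to at least two groups
$Member \bowtie_{?} Member$
$\pi_{uid} (Member \bowtie_{Member.uid = Member.uid \land Member.gid \ne Member.gid} Member)$ (ambiguous: both inputs are named Member, so the column references cannot be told apart)

$\pi_{uid1} (\rho_{(uid1, gid1)} Member \bowtie_{uid1 = uid2 \land gid1 \ne gid2} \rho_{(uid2, gid2)} Member)$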
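
The self-join with renaming can be sketched as follows, assuming a relation is a pair of column names and a set of rows. The helpers `rename`, `theta_join`, and `project` and the Member rows are hypothetical. Note that the join condition looks columns up by name, which only works because the two copies were renamed first.

```python
def rename(relation, new_cols):
    """Renaming: same rows, new column names; the input is not modified."""
    cols, rows = relation
    assert len(new_cols) == len(cols)
    return tuple(new_cols), rows


def theta_join(r, s, cond):
    """Theta-join; cond receives a dict mapping column name to value."""
    (r_cols, r_rows), (s_cols, s_rows) = r, s
    cols = r_cols + s_cols
    rows = {a + b for a in r_rows for b in s_rows
            if cond(dict(zip(cols, a + b)))}
    return cols, rows


def project(relation, keep):
    """Projection onto the named columns, with duplicates removed."""
    cols, rows = relation
    idx = [cols.index(c) for c in keep]
    return tuple(keep), {tuple(row[i] for i in idx) for row in rows}


# Hypothetical Member rows: user 1 is in two groups, user 2 in one.
member = (("uid", "gid"), {(1, "g1"), (1, "g2"), (2, "g1")})

m1 = rename(member, ("uid1", "gid1"))
m2 = rename(member, ("uid2", "gid2"))
joined = theta_join(
    m1, m2, lambda t: t["uid1"] == t["uid2"] and t["gid1"] != t["gid2"]
)
result = project(joined, ("uid1",))
print(result)
```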

Expression tree notation

$\pi_{uid1}$
  $\bowtie_{uid1 = uid2 \land gid1 \ne gid2}$
    $\rho_{(uid1, gid1)}$        $\rho_{(uid2, gid2)}$
      Member                       Member

Summary of core operators


• Selection: $\sigma_p R$
• Projection: $\pi_L R$
• Cross product: $R \times S$
• Union: $R \cup S$
• Difference: $R - S$
• Renaming: $\rho_{S(A_1, A_2, \ldots)} R$
• Does not really add “processing” power

Summary of derived operators


• Join: $R \bowtie_p S$
• Natural join: $R \bowtie S$
• Intersection: $R \cap S$

• Many more
• Semijoin, anti-semijoin, quotient, …

An exercise
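• Names of users in Lisa’s groups
Writing a query bottom-up:
• Who’s Lisa? $\sigma_{name = \text{"Lisa"}} User$
• Lisa’s groups: $\pi_{gid} (Member \bowtie \sigma_{name = \text{"Lisa"}} User)$
• Users in Lisa’s groups: $\pi_{uid} (Member \bowtie \pi_{gid} (Member \bowtie \sigma_{name = \text{"Lisa"}} User))$
• Their names: $\pi_{name} (User \bowtie \pi_{uid} (Member \bowtie \pi_{gid} (Member \bowtie \sigma_{name = \text{"Lisa"}} User)))$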
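
The bottom-up construction can be traced step by step on hypothetical data. In this sketch each set comprehension stands for one layer of the query, and a membership test plays the role of a natural join followed by a projection.

```python
# Hypothetical rows: User (uid, name, age, pop) and Member (uid, gid).
users = {(1, "Bart", 10, 0.9), (2, "Lisa", 8, 0.7), (3, "Ann", 9, 0.4)}
members = {(1, "g1"), (2, "g1"), (2, "g2"), (3, "g3")}

# Who's Lisa? (selection on name, keeping her uid)
lisa_uids = {uid for (uid, name, _, _) in users if name == "Lisa"}

# Lisa's groups (join with Member on uid, project gid)
lisa_gids = {gid for (uid, gid) in members if uid in lisa_uids}

# Users in Lisa's groups (join with Member on gid, project uid)
uids_in_groups = {uid for (uid, gid) in members if gid in lisa_gids}

# Their names (join with User on uid, project name)
names = {name for (uid, name, _, _) in users if uid in uids_in_groups}

print(sorted(names))
```

Lisa appears in her own result because she is a member of her own groups.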

Another exercise
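• IDs of groups that Lisa doesn’t belong to
Writing a query top-down:
• All group IDs: $\pi_{gid} Group$
• IDs of Lisa’s groups: $\pi_{gid} (Member \bowtie \sigma_{name = \text{"Lisa"}} User)$
• Result: $\pi_{gid} Group - \pi_{gid} (Member \bowtie \sigma_{name = \text{"Lisa"}} User)$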
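
The top-down query is a difference of two projections, which the following sketch mirrors with Python's set difference. All rows are hypothetical.

```python
# Hypothetical rows: User (uid, name, age, pop), Group (gid, name),
# Member (uid, gid).
users = {(1, "Bart", 10, 0.9), (2, "Lisa", 8, 0.7)}
groups = {("g1", "Chess Club"), ("g2", "Band"), ("g3", "Drama")}
members = {(1, "g1"), (2, "g1"), (2, "g2"), (1, "g3")}

# All group IDs.
all_gids = {gid for (gid, _) in groups}

# IDs of Lisa's groups.
lisa_uids = {uid for (uid, name, _, _) in users if name == "Lisa"}
lisa_gids = {gid for (uid, gid) in members if uid in lisa_uids}

# Groups Lisa does not belong to.
not_lisa = all_gids - lisa_gids
print(not_lisa)
```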

A trickier exercise
• Who are the most popular?
• Who do NOT have the highest pop rating?
• Whose pop is lower than somebody else’s?

$\pi_{uid} User - \pi_{User1.uid} (\rho_{User1} User \bowtie_{User1.pop < User2.pop} \rho_{User2} User)$

A deeper question:
When (and why) is “−” needed?
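
The three bullets above build on each other: find everyone whose pop is lower than somebody else's, then subtract them from all users. A sketch on hypothetical rows, where iterating over `users` twice stands in for the two renamed copies User1 and User2:

```python
# Hypothetical rows: (uid, name, age, pop). Two users tie for the highest pop.
users = {
    (1, "Bart", 10, 0.9),
    (2, "Lisa", 8, 0.7),
    (3, "Ann", 9, 0.9),
}

# Whose pop is lower than somebody else's? (self-join on User1.pop < User2.pop)
not_highest = {u1[0] for u1 in users for u2 in users if u1[3] < u2[3]}

# Everyone, minus those who do NOT have the highest pop.
all_uids = {u[0] for u in users}
most_popular = all_uids - not_highest

print(not_highest, most_popular)
```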

Monotone operators
[Figure: a RelOp box with its input and output]
Add more rows to the input... What happens to the output?

• If some old output rows may need to be removed
• Then the operator is non-monotone
• Otherwise the operator is monotone
• That is, old output rows always remain “correct” when more rows are added to the input
• Formally, for a monotone operator $op$: $R \subseteq R'$ implies $op(R) \subseteq op(R')$ for any $R$, $R'$
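
The definition can be tried out on tiny inputs. The sketch below uses plain integers as hypothetical rows: growing the input of a selection never removes old output rows, while growing the second input of a difference can.

```python
def sel(rows: set) -> set:
    """A selection with the (hypothetical) condition x > 1."""
    return {x for x in rows if x > 1}


r = {1, 2, 3}
r_bigger = r | {4}  # r is a subset of r_bigger

s = {3}
s_bigger = s | {2}  # s is a subset of s_bigger

# Selection: old output rows survive when the input grows.
selection_monotone = sel(r) <= sel(r_bigger)

# Difference, growing the first input: old output rows survive.
diff_monotone_in_r = (r - s) <= (r_bigger - s)

# Difference, growing the second input: row 2 drops out of the output.
diff_monotone_in_s = (r - s) <= (r - s_bigger)

print(selection_monotone, diff_monotone_in_r, diff_monotone_in_s)
```

A passing check on one input pair does not prove monotonicity in general, but a single failing pair, as in the last case, is enough to show an operator is non-monotone.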

Classification of relational operators


• Selection: $\sigma_p R$  Monotone
• Projection: $\pi_L R$  Monotone
• Cross product: $R \times S$  Monotone
• Join: $R \bowtie_p S$  Monotone
• Natural join: $R \bowtie S$  Monotone
• Union: $R \cup S$  Monotone
• Difference: $R - S$  Monotone w.r.t. $R$; non-monotone w.r.t. $S$
• Intersection: $R \cap S$  Monotone

Why is “−” needed for “highest”?


• Composition of monotone operators produces a
monotone query
• Old output rows remain “correct” when more rows are
added to the input
• Is the “highest” query monotone?
• No!
• Current highest pop is 0.9
• Add another row with pop 0.91
• Old answer is invalidated
So it must use difference!

Why do we need each core operator?


• Difference
• The only non-monotone operator
• Cross product
• The only operator that adds columns
• Union
• The only operator that allows you to add rows?
• A more rigorous argument?
• Selection? Projection?
• Homework problem

Extensions to relational algebra


• Duplicate handling (“bag algebra”)
• Grouping and aggregation
• “Extension” (or “extended projection”) to allow
new column values to be computed

All these will come up when we talk about SQL


But for now we will stick to standard relational
algebra without these extensions

Why is r.a. a good query language?


• Simple
• A small set of core operators
• Semantics are easy to grasp
• Declarative?
• Yes, compared with older languages like CODASYL
• Though operators do look somewhat “procedural”
• Complete?
• With respect to what?

Relational calculus
• $\{ u.uid \mid u \in User \land \neg (\exists u' \in User : u.pop < u'.pop) \}$, or
• $\{ u.uid \mid u \in User \land (\forall u' \in User : u.pop \ge u'.pop) \}$
• Relational algebra = “safe” relational calculus
• Every query expressible as a safe relational calculus query is also expressible as a relational algebra query
• And vice versa
• Example of an “unsafe” relational calculus query
• $\{ u.name \mid \neg (u \in User) \}$
• Cannot evaluate it just by looking at the database
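
Python set comprehensions read much like the two calculus queries above, with `any` standing in for ∃ and `all` for ∀. The rows are hypothetical; the unsafe query has no counterpart here because there is no finite collection of "everything not in User" to iterate over.

```python
from typing import NamedTuple


class User(NamedTuple):
    uid: int
    name: str
    age: int
    pop: float


# Hypothetical sample rows.
users = {
    User(1, "Bart", 10, 0.9),
    User(2, "Lisa", 8, 0.7),
    User(3, "Ann", 9, 0.4),
}

# No other user has a strictly higher pop (the "not exists" form).
top_not_exists = {u.uid for u in users
                  if not any(u.pop < v.pop for v in users)}

# Every user has a pop that is no higher (the "for all" form).
top_for_all = {u.uid for u in users
               if all(u.pop >= v.pop for v in users)}

print(top_not_exists, top_for_all)
```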

Turing machine
• A conceptual device that can execute any computer algorithm
• Approximates what general-purpose programming languages can do
• E.g., Python, Java, C++, …

[Photo: Alan Turing (1912-1954)]

So how does relational algebra compare with a Turing machine?

Limits of relational algebra


• Relational algebra has no recursion
• Example: given relation Friend(uid1, uid2), who can Bart
reach in his social network with any number of hops?
• Writing this query in r.a. is impossible!
• So r.a. is not as powerful as general-purpose languages
• But why not?
• Optimization becomes undecidable
• Besides, you can always implement it at the application level, and recursion is added to SQL nevertheless!
Simplicity is empowering
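
As an illustration of implementing it at the application level, the reachability query can be answered with a loop that keeps applying one join step until nothing new is found. The sketch assumes Friend(uid1, uid2) is a set of directed pairs; the `reachable` helper and the data are hypothetical.

```python
def reachable(friend: set, start: int) -> set:
    """Everyone reachable from start in one or more hops along Friend pairs."""
    reached = set()
    frontier = {start}
    while frontier:
        # One hop: friends of the current frontier that are not yet reached.
        # This is a join of the frontier with Friend, followed by a difference.
        next_hop = {b for (a, b) in friend if a in frontier} - reached
        reached |= next_hop
        frontier = next_hop
    return reached


# Hypothetical Friend(uid1, uid2) pairs; user 1 plays the role of Bart.
friend = {(1, 2), (2, 3), (3, 1), (4, 5)}

print(sorted(reachable(friend, 1)))
```

Each pass through the loop is expressible in relational algebra; what the algebra lacks is the "repeat until no change" part, since the number of passes depends on the data.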
