Introduction To Relational Databases: (Notes For Introductory Lecture)
Printed at: 15:37 on Friday, 7 March, 2008
1. Cover slide
These notes are adapted from the ones given to Hugh Darwen's students at Warwick University.
Paragraphs in italics, like this one, are ones I have added for the benefit of M359 students, mostly
addressing slight mismatches between my Warwick course and M359.
Introductory Remarks
In Block 2 of M359, Relational Databases: Theory and Practice, we study the theory upon which
relational databases are based. Or perhaps it would be better to say, “should be based”, or “were
intended to be based (by its original proponent)”. We will also study a database language that is
firmly based on this theory and in so doing take note of some general principles of good computer
language design.
Block 3 of M359 introduces you to the “state of the art” of relational database support in the
industry. This takes the form of SQL, the language that was dubbed “intergalactic dataspeak” by
database guru Michael Stonebraker in the 1980s (since when its use and its number of
implementations have grown enormously). You are strongly encouraged to compare and contrast
the theory of Block 2 and the practice of SQL!
Section 1: Introduction
Section 1 (i.e., this first lecture) gives a very broad overview of
• what a database is
• what a relational database is
• what a database management system (DBMS) is
• what a DBMS does
• how a relational DBMS does what a DBMS does
We start to familiarise ourselves with terminology and notation used on this course.
We get a brief introduction to each topic that will be considered in more detail in later sections of
the course (this remark applies reasonably well to M359 as well as the Warwick course).
2. Some Preliminaries
Note that the word “database” had not been coined when Ted Codd, actually a hardware specialist
working for IBM on its mainframe architecture at the time, did this important work for which he
was later a recipient of the Turing Award. Codd died in 2003.
What follows is only partially relevant to M359 students. On M359 we use our own notation for
our study of the theory in Block 2. It is very similar to Tutorial D and for many operations it is
identical.
Tutorial D (which is always written that way, in bold face) was invented in the late 1990s by Hugh
Darwen and C.J. (Chris) Date for the very purpose that its name suggests—teaching, as opposed to
commercial use. It is used for examples and exercises in several books coauthored by them and in
particular in the two books by Chris Date that are recommended reading for this course. The
official definition of the language is given in their book, “Databases, Types, and The Relational
Model: The Third Manifesto” (Addison-Wesley, 2005, ISBN 0-321-39942-0), but this book is not
suitable for an introductory course of this nature and in any case we will not use all of the language.
3. What Is a Database?
The organised, machine-readable collection of symbols is what you “see” if you “look at” a
database at a particular point in time. It is to be interpreted as a true account of the enterprise at that
point in time. Of course it might happen to be incorrect, incomplete or inaccurate, so perhaps it is
better to say “believed to be true” rather than just “true”.
The alternative view of a database as a collection of variables reflects the fact that the account of the
enterprise has to change from time to time, depending on the frequency of change in the details we
choose to include in that account.
The suitability of a particular kind of database (such as “relational”, or “object-oriented”) might
depend to some extent on the requirements of its user(s). When E.F. Codd developed his theory of
relational databases (first published in 1969), he sought to minimise that extent. Thus, when
designing a relational database we should be able to focus on the level of detail of the account
without having to anticipate the uses to which it will be put. That is perhaps the distinguishing
feature of the relational approach, and you should bear it in mind as we explore some of its
ramifications on this course.
6. “Collection of Variables”
Under our agreed interpretation, we can conclude that the following sentences are all true:
Student S1, named Anne, is enrolled on course C1.
Student S1, named Anne, is enrolled on course C2.
Student S2, named Boris, is enrolled on course C1.
Student S3, named Cindy, is enrolled on course C3.
Student S4, named Devinder, is enrolled on course C1.
Notice that in English we can join all these sentences together to form a single sentence, using
conjunctions like “and”, “or”, “because” and so on. If we join them using “and” in particular, we
get a single sentence that is logically equivalent to the given set of sentences in the sense that it is
true if each one of them is true (and false if any one of them is false). A database, then, can be
thought of as a representation of an account of the enterprise expressed as a single sentence!
We might also be able to conclude that the following sentences (for example) are false:
Student S2, named Boris, is enrolled on course C2.
Student S2, named Beth, is enrolled on course C1.
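The idea that each row stands for one true sentence, and that the whole database amounts to their conjunction, can be sketched in a few lines of Python. This is only an illustration and not the notation used on the course. The attribute order (student identifier, name, course identifier) and the function names are my own assumptions, and treating every absent row as a false sentence is the assumption hinted at by "might" above.

```python
# Each tuple stands for one true sentence of the form
# "Student <student_id>, named <name>, is enrolled on course <course_id>".
# The attribute order and all names below are assumptions made for illustration.
ENROLMENT = frozenset({
    ("S1", "Anne", "C1"),
    ("S1", "Anne", "C2"),
    ("S2", "Boris", "C1"),
    ("S3", "Cindy", "C3"),
    ("S4", "Devinder", "C1"),
})


def sentence(student_id, name, course_id):
    """Instantiate the sentence form with the values from one tuple."""
    return f"Student {student_id}, named {name}, is enrolled on course {course_id}"


def is_true(student_id, name, course_id):
    """Assumption: a sentence is true exactly when its tuple is in the relation."""
    return (student_id, name, course_id) in ENROLMENT


def account():
    """The whole account as one sentence: the conjunction of the true ones."""
    return " AND ".join(sentence(*t) for t in sorted(ENROLMENT))
```

Under these assumptions, `is_true("S2", "Boris", "C2")` and `is_true("S2", "Beth", "C1")` are both false, matching the two false sentences given above.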
Whenever the variable is updated, the set of true sentences represented by its value changes in some
way. This usually reflects perceived changes in the enterprise, affecting our beliefs about it and
therefore our account of it.
Note that if the variable were updated by the addition or removal of a column, that would not
necessarily reflect a perceived change in the enterprise. Rather, it would reflect a decision to
change the structure of the variable, thereby also changing the form of our account under the
interpretation of that structure.
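The distinction between a variable and its value at a point in time can be sketched in Python as follows. This is a rough illustration under my own assumptions (rows as plain tuples in the order student identifier, name, course identifier), not the course notation. An update is modelled as assigning a new relation value to the variable.

```python
# A relation variable: a name whose value, at any moment, is a relation.
# Rows are (student_id, name, course_id); this layout is an assumption.
enrolment = frozenset({
    ("S1", "Anne", "C1"),
    ("S2", "Boris", "C1"),
})

# Remember the value the variable had before the updates.
before = enrolment

# "Insert": assign the old value plus one more tuple (one more true sentence).
enrolment = enrolment | {("S3", "Cindy", "C3")}

# "Delete": assign the old value minus a tuple we no longer believe.
enrolment = enrolment - {("S2", "Boris", "C1")}

# The relation value held in `before` is unchanged;
# only the variable `enrolment` now refers to a different value.
```

Each assignment changes the set of sentences taken to be true. Neither changes the structure of the variable, which would correspond to adding or removing a column.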
8. Relation ≠ Table
The title of this slide is trying to say that the terms “relation” and “table” are not synonymous. A
table is one way of denoting a particular relation, but several different tables can all denote the same
relation, because we can change the left-to-right order in which the columns are shown and/or the
top-to-bottom order in which the rows are shown and yet still be depicting (denoting) the same
relation.
What does it mean, to say that the order of columns and the order of rows don’t matter? We will
find out the answer to this question when we later study the typical operators that are defined for
operating on relations (e.g., to compute results of queries against the database) and relation
variables (e.g., to update the database). None of these operators will depend on the notion of some
row or some column being the first or last, or immediately before or after some other row or column.
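One way to see that several tables can denote the same relation is the following Python sketch. It is only an illustration under my own assumptions: the function name `relation_from_table` and the attribute names are hypothetical, and a row is modelled as a set of (attribute name, value) pairs so that no position is involved.

```python
def relation_from_table(columns, rows):
    """Read a table (ordered columns, ordered rows) as the relation it denotes.

    Both orderings are discarded: the heading becomes a set of attribute
    names and the body becomes a set of tuples, each tuple being a set of
    (attribute name, value) pairs.
    """
    heading = frozenset(columns)
    body = frozenset(frozenset(zip(columns, row)) for row in rows)
    return heading, body


# Two different tables: the columns and the rows appear in different orders.
table_1 = (["StudentId", "Name", "CourseId"],
           [("S1", "Anne", "C1"), ("S2", "Boris", "C1")])

table_2 = (["CourseId", "StudentId", "Name"],
           [("C1", "S2", "Boris"), ("C1", "S1", "Anne")])

# Both tables denote one and the same relation.
same = relation_from_table(*table_1) == relation_from_table(*table_2)
```

Because the comparison is between sets, nothing in it can depend on which row or column came first in either table.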
9. Anatomy of a Relation
Because of the distinction I have noted between the terms “relation” and “table”, we prefer not to
use the terminology of tables for the anatomical parts of a relation. We use instead the terms
proposed by E.F. Codd, the researcher who first proposed relational theory as a basis for database
technology, in 1969.
Try to get used to these terms. You might not find them very intuitive. Their counterparts in the
tabular representation might help:
relation = table
(n-)tuple = row
attribute = column
Also (repeating what is shown in the slide):
The degree is the number of attributes.
The cardinality is the number of tuples.
The heading is the set of attributes (note set, because the attributes are not ordered in any
way and no attribute appears more than once).
The body is the set of tuples (again, note set—the tuples are not ordered and no tuple
appears more than once).
An attribute has an attribute name, and no two have the same name.
Each attribute has an attribute value in each tuple.
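These terms can be made concrete with a small Python sketch. It is an illustration only, not the course notation. The attribute names and the helper `make_tuple` are my own assumptions, and the five tuples are the enrolments listed in the sentences earlier in these notes.

```python
# The heading: a set of attribute names (assumed names, for illustration).
HEADING = frozenset({"StudentId", "Name", "CourseId"})


def make_tuple(**values):
    """Hypothetical helper: a tuple as a set of (attribute name, value) pairs."""
    return frozenset(values.items())


# The body: a set of tuples, so there is no order and no duplicates.
BODY = frozenset({
    make_tuple(StudentId="S1", Name="Anne", CourseId="C1"),
    make_tuple(StudentId="S1", Name="Anne", CourseId="C2"),
    make_tuple(StudentId="S2", Name="Boris", CourseId="C1"),
    make_tuple(StudentId="S3", Name="Cindy", CourseId="C3"),
    make_tuple(StudentId="S4", Name="Devinder", CourseId="C1"),
})

degree = len(HEADING)      # the number of attributes
cardinality = len(BODY)    # the number of tuples


def attribute_names(t):
    """The set of attribute names that appear in one tuple."""
    return frozenset(name for name, _ in t)


# Each attribute of the heading has exactly one value in each tuple.
every_tuple_matches_heading = all(
    attribute_names(t) == HEADING and len(t) == degree for t in BODY
)
```

Here the degree is 3 and the cardinality is 5. Writing the same tuple twice in the body would not change the cardinality, because a set holds each element only once.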
17. EXERCISE
Consider this table:
A B A
1 2 3
4 5
6 7 8
9 9 ?
1 2 3
Give three reasons why it cannot be representing a relation. By the way, this table is supported by
SQL, and the three reasons represent SQL's most serious and far-reaching deviations from relational
theory.
End of Notes