An Introduction To Relational Database Management System
History
The concept of relational databases was first described by Edgar Frank Codd (almost
exclusively referenced as E. F. Codd in technical literature) in the IBM research report
RJ599, dated August 19, 1969 [1]. However, the article that is usually considered the
cornerstone of this technology is "A Relational Model of Data for Large Shared Data
Banks," published in Communications of the ACM (Vol. 13, No. 6, June 1970, pp. 377-87).
Additional articles by E. F. Codd throughout the 1970s and 80s are still considered gospel
for relational database implementations. His famous "Twelve Rules for Relational
Databases" [2] were published in two Computerworld articles, "Is Your DBMS Really
Relational?" and "Does Your DBMS Run By the Rules?", on October 14, 1985, and
October 21, 1985, respectively. He later expanded on the 12 rules, and they
number 333 as published in his book "The Relational Model for Database Management,
Version 2" (Addison-Wesley, 1990).
The language, SQL, was originally developed in the research division of IBM (initially at
Yorktown Heights, N.Y., and later at San Jose, Calif.; Raymond Boyce and Donald
Chamberlin were the original designers) [3] and has been adopted by all major relational
database vendors. The name SQL stands for Structured Query Language. The language was
originally named SEQUEL (for Structured English QUEry Language), and the name was later
changed to SQL for legal reasons. IBM's first commercial product built on the language
was SQL/DS. Because of the original name, many long-time database developers use
the pronunciation "see-quell."
SQL has been adopted as an ANSI/ISO standard. Although the standard was revised in 1999
(usually referenced as SQL99 or SQL3), most vendors are still not fully compliant with the
1992 version. The 1992 standard is smaller and simpler to reference for a user,
and since only some of the 1999-specific requirements are typically implemented at this
time, it may be a better starting point for learning the language.
Introduction
The database design phase is a very important step for all IT projects developing systems
that rely on a database to adequately store, query, import and export data and to support
reporting. For such systems the operation of the database is critical; hence, its design and
implementation must be long lasting, flawless and closely tailored to meet the
requirements of the system.
The problem with data is that it changes. Not only do the values of individual items
change, but so do their structure and use, especially when the data is kept over extended
periods of time. Even for public records that may have been kept for hundreds of years,
there are occasionally changes in what data elements are captured and recorded, and how.
Therefore, a method to avoid problems due to duplication of data values and modification
of structure and content has been developed. This method is called normalization.
You normalize a database in order to ensure data consistency and stability, to minimize
data redundancy, and to ensure consistent updateability and maintainability of the data, and
avoid update and delete anomalies that result in ambiguous data or inconsistent results.
Database
A database is a collection of data that is organized in a systematic way so that its contents
can easily be accessed, managed and updated. The most prevalent type of database is the
relational database, a tabular database in which data is defined so that it can be reorganized
and accessed in a number of different ways. A distributed database is one that can be
dispersed or replicated among different points in a network. The software used to manage
and query a database is known as a database management system (DBMS).
Table
A table is a set of data elements that has a horizontal dimension (rows) and a vertical
dimension (columns) in a relational database system. A table has a specified number of
columns but can have any number of rows. Rows stored in a table are structurally
equivalent to records from flat files. Columns are often referred to as attributes or fields.
In a database managed by a DBMS, the format of each attribute is a fixed datatype. For
example, an attribute named date can only contain information in the date-time format.
Identifier
An identifier is an attribute that is used either as a primary key or as a foreign key. The
integer datatype is commonly used for identifiers. In cases where the number of records
exceeds the range of values allowed by the integer datatype, a larger type such as bigint
is used.
Primary key
A column (or combination of columns) in a table whose values uniquely identify the rows
in the table. A primary key value cannot be NULL.
Foreign key
A column in a table that does not uniquely identify rows in that table, but is used as a link
to matching columns in other tables.
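The two key definitions above can be sketched with Python's built-in sqlite3 module.
This is only an illustration under stated assumptions: the customer and booking tables
and all of their columns are hypothetical names invented for the example, and SQLite
enforces foreign keys only after the foreign_keys pragma is switched on.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,  -- primary key: uniquely identifies each row
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE booking (
        booking_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- foreign key
        check_in    TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'SteelCo')")
conn.execute("INSERT INTO booking (customer_id, check_in) VALUES (1, '2024-01-15')")

# A second row with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Acme Corp')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A foreign key value with no matching customer row is rejected.
try:
    conn.execute("INSERT INTO booking (customer_id, check_in) VALUES (99, '2024-02-01')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(duplicate_rejected, orphan_rejected)
```

Note that booking.customer_id may repeat across rows (one customer, many bookings), which
is why it does not identify rows in its own table but still links each row to customer.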
Winter School on "Data Mining Techniques and Tools for Knowledge Discovery in Agricultural Datasets”
Relationship
A relationship is an association between two tables. For example, the relationship between
the tables "hotel" and "customer" maps the customers to the hotels they have used.
Index
An index is a data structure that enables a query to run in sublinear time. Instead of
having to go through all records one by one to identify those that match its criteria, the
query uses the index to filter out those that don't and focus on those that do.
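As a rough illustration of the difference an index makes, the sketch below asks SQLite
(through Python's sqlite3 module) how it plans to run the same query before and after an
index is created. The customer table, its columns, the generated rows and the index name
idx_customer_city are all assumptions made for the example. The exact wording of the plan
text varies between SQLite versions, so it is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
# Hypothetical sample data: 1000 customers spread over 100 made-up cities.
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [(f"customer {i}", f"city {i % 100}") for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return the descriptive text SQLite reports for each step of a query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


sql = "SELECT name FROM customer WHERE city = ?"

# Without an index the only option is to scan every row of the table.
plan_without_index = query_plan(conn, sql, ("city 7",))

# With an index on city, matching rows can be located through the index.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
plan_with_index = query_plan(conn, sql, ("city 7",))

print(plan_without_index)
print(plan_with_index)
```

The first plan should describe a scan of the whole table, while the second should mention
idx_customer_city, showing that the query no longer visits every record.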
View
A view is a virtual or logical table composed of the result set of a stored query.
Unlike ordinary tables in a relational database, a view is not part of the physical schema: it
is a dynamic, virtual table computed or collated from data in the database. Changing the
data in an updatable view alters the underlying data stored in the database.
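The dynamic nature of a view can be seen in a short sketch using Python's sqlite3 module.
The orders table and the steelco_orders view are hypothetical names chosen for the
example. One caveat: SQLite views are read-only unless INSTEAD OF triggers are defined, so
this sketch only shows the view following changes made to its base table, not updates
made through the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_no INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(245, "SteelCo"), (246, "Acme Corp")],
)

# The view stores only the query text, not a copy of the rows.
conn.execute("""
    CREATE VIEW steelco_orders AS
    SELECT order_no FROM orders WHERE customer = 'SteelCo'
""")

before = conn.execute("SELECT order_no FROM steelco_orders ORDER BY order_no").fetchall()

# A new row in the base table shows up in the view without touching the view itself.
conn.execute("INSERT INTO orders VALUES (247, 'SteelCo')")
after = conn.execute("SELECT order_no FROM steelco_orders ORDER BY order_no").fetchall()

print(before)  # [(245,)]
print(after)   # [(245,), (247,)]
```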
Query
A query is a request to retrieve data from a database with the SQL SELECT instruction or
to manipulate data stored in tables.
SQL
Structured Query Language (SQL), pronounced "sequel", is a language that provides an
interface to relational database systems. It was developed by IBM in the 1970s for use in
System R. SQL is a de facto standard, as well as an ISO and ANSI standard.
Normalization
The normalization process is based on collecting an exhaustive list of all data items to be
maintained in the database and starting the design with a few "superset" tables.
Theoretically, it may be possible, although not very practical, to start by placing all the
attributes in a single table. For best results, start with a reasonable breakdown.
Reduce entities to first normal form (1NF) by removing repeating or multivalued attributes
to another, child entity.
Basically, make sure that the data is represented as a (proper) table. While key to the
relational principles, this is somewhat a motherhood statement. However, there are six
properties of a relational table (the formal name for "table" is "relation"):
To make this first normal form, we would have to create a child entity of Orders (Order
Items) where we would store the information about the line items on the order. Each order
could then have multiple Order Items related to it.
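The move to first normal form described above can be sketched in plain Python. The
record layout, the item names and the quantities below are invented for illustration;
only the idea of moving the repeating line items into a child Order Items entity comes
from the text.

```python
# Hypothetical un-normalized orders: each record carries a repeating group of line items.
unnormalized_orders = [
    {"order_no": 245, "customer": "SteelCo",
     "items": [("bolt", 100, 25), ("nut", 100, 10)]},
    {"order_no": 246, "customer": "Acme Corp",
     "items": [("washer", 50, 5)]},
]


def to_first_normal_form(orders):
    """Split each order into one parent row and one child row per line item."""
    order_rows = []
    order_item_rows = []
    for order in orders:
        order_rows.append((order["order_no"], order["customer"]))
        for line_no, (item, qty, price) in enumerate(order["items"], start=1):
            # (order_no, line_no) together identify a line item.
            order_item_rows.append((order["order_no"], line_no, item, qty, price))
    return order_rows, order_item_rows


orders, order_items = to_first_normal_form(unnormalized_orders)

print(orders)       # [(245, 'SteelCo'), (246, 'Acme Corp')]
for row in order_items:
    print(row)
```

Every value in the resulting rows is a single value, and an order can now have any number
of line items without changing the structure of either table.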
Reduce first normal form entities to second normal form (2NF) by removing attributes that
are not dependent on the whole primary key.
The purpose here is to make sure that each column is defined in the correct table. Using the
more formal names may make this a little clearer. Make sure each attribute is kept with the
entity that it describes.
Consider the Order Items table that we established above. If we place Customer reference
in the Order Items table (Order Number, Line Item Number, Item, Qty, Price, Customer)
and assume that we use Order Number and Line Item Number as the Primary Key, it
quickly becomes obvious that the Customer reference becomes repeated in the table
because it is only dependent on a portion of the Primary Key - namely the Order Number.
Therefore, it is defined as an attribute of the wrong entity. In such an obvious case, it
should be immediately clear that the Customer reference should be in the Orders table, not
the Order Items table.
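The partial dependency described above can be checked mechanically on sample data. The
sketch below is plain Python with invented rows; is_determined_by is a hypothetical helper
written for this example, not a library function. A check on sample rows can only suggest
a dependency: whether Customer really depends on Order Number alone is decided by the
business rules, not by the data that happens to be present.

```python
def is_determined_by(rows, determinant, attribute):
    """True if each combination of determinant values maps to exactly one attribute value."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in determinant)
        value = row[attribute]
        # setdefault returns the value stored earlier for this key, if there was one.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical Order Items rows with primary key (order_no, line_no).
order_items = [
    {"order_no": 245, "line_no": 1, "item": "bolt", "qty": 100, "price": 25, "customer": "SteelCo"},
    {"order_no": 245, "line_no": 2, "item": "nut", "qty": 100, "price": 10, "customer": "SteelCo"},
    {"order_no": 246, "line_no": 1, "item": "bolt", "qty": 50, "price": 25, "customer": "Acme Corp"},
]

# Customer depends on only part of the key, so it violates second normal form here.
customer_depends_on_order_only = is_determined_by(order_items, ["order_no"], "customer")
# Item needs the whole key, so it belongs in Order Items.
item_depends_on_order_only = is_determined_by(order_items, ["order_no"], "item")

# Move Customer to the Orders entity and drop it from Order Items.
orders = sorted({(row["order_no"], row["customer"]) for row in order_items})
order_items_2nf = [
    {column: value for column, value in row.items() if column != "customer"}
    for row in order_items
]

print(customer_depends_on_order_only, item_depends_on_order_only)  # True False
print(orders)  # [(245, 'SteelCo'), (246, 'Acme Corp')]
```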
So instead of:
We get:

OrderNo  Customer
245      SteelCo
246      Acme Corp
247      SteelCo

Customer   Address      City
SteelCo    Delhi        Delhi
Acme Corp  Maharashtra  Bombay
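Splitting the data into an order table and a customer table loses nothing, because a join
brings the columns back together when they are needed. The sketch below loads the rows
shown above into SQLite through Python's sqlite3 module. The table names orders and
customers, the lower-case column names, and the replacement city value 'New Delhi' are
assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer TEXT PRIMARY KEY, address TEXT, city TEXT)")
conn.execute("""
    CREATE TABLE orders (
        order_no INTEGER PRIMARY KEY,
        customer TEXT REFERENCES customers(customer)
    )
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("SteelCo", "Delhi", "Delhi"), ("Acme Corp", "Maharashtra", "Bombay")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(245, "SteelCo"), (246, "Acme Corp"), (247, "SteelCo")],
)

join_sql = """
    SELECT o.order_no, o.customer, c.address, c.city
    FROM orders AS o
    JOIN customers AS c ON c.customer = o.customer
    ORDER BY o.order_no
"""
joined = conn.execute(join_sql).fetchall()

# Each customer's details are stored once, so one UPDATE corrects every related order.
conn.execute("UPDATE customers SET city = 'New Delhi' WHERE customer = 'SteelCo'")
joined_after_update = conn.execute(join_sql).fetchall()

for row in joined_after_update:
    print(row)
```

Had the address and city been repeated on every order row, the same correction would have
needed one change per order, with the risk of missing some of them.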
Many database designers stop at 3NF, and those first three levels of normalization do
provide the most bang for the buck. Indeed, these were the original normal forms described
in E. F. Codd's first papers. However, there are currently four additional levels of
normalization, so read on. Be aware of what you don't do, even if you stop with 3NF. In
some cases, you may even need to de-normalize some for performance reasons.
To conclude, the following section presents 10 tips that can help to ensure
that databases are well designed and can be easily exported and manipulated with the
minimum of difficulty.
To Develop a Prototype
Significant time can be saved by creating the structure in a simple desktop database (such
as Microsoft Access) before finalising the design in one of the enterprise databases. The
developer will be able to recognise simple faults and make changes more rapidly than
would be possible at a later date.
a. data definition
b. view definition
c. data manipulation (interactive and by program)
d. integrity constraints
e. authorization
f. transaction boundaries (begin, commit, and rollback).
6. View Updating Rule: All views that are theoretically updateable are also
updateable by the system.
7. High-Level Insert, Update, and Delete: The capability of handling a base relation
or a derived relation as a single operand applies not only to the retrieval of data, but
also to the insertion, update, and deletion of data.
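Rule 7 can be illustrated with set-level statements: one UPDATE or DELETE operates on
every row that matches its condition, with no row-by-row loop in the application. The
sketch below uses Python's sqlite3 module; the order_items table, its rows, and the choice
of storing prices as whole numbers are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_items (
        order_no INTEGER,
        line_no  INTEGER,
        item     TEXT,
        qty      INTEGER,
        price    INTEGER,              -- whole currency units, to keep arithmetic exact
        PRIMARY KEY (order_no, line_no)
    )
""")
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?, ?, ?)",
    [(245, 1, "bolt", 100, 25), (245, 2, "nut", 100, 10), (246, 1, "washer", 50, 5)],
)

# One statement updates every line item of order 245 as a single operand.
cursor = conn.execute("UPDATE order_items SET price = price * 2 WHERE order_no = ?", (245,))
rows_updated = cursor.rowcount

# One statement deletes every line item below a quantity threshold.
cursor = conn.execute("DELETE FROM order_items WHERE qty < ?", (60,))
rows_deleted = cursor.rowcount

remaining = conn.execute(
    "SELECT order_no, line_no, price FROM order_items ORDER BY order_no, line_no"
).fetchall()

print(rows_updated, rows_deleted)  # 2 1
print(remaining)                   # [(245, 1, 50), (245, 2, 20)]
```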