
A Level CS CH 11 9618

1. The document discusses the limitations of using flat files to store database information, such as duplication of data, difficulty running queries, and data inconsistencies. 2. It then introduces the concept of a relational database as a solution, where data is stored in tables that can be related through common fields. This avoids data duplication and allows easier querying of information across multiple tables. 3. The document covers database normalization through three forms. The goal is to eliminate anomalies like modification anomalies that could occur from non-key attributes being partially dependent on the primary key. An example order database is normalized through three forms to eliminate these issues.

Uploaded by

calvin esau

Computer Science 9618 Notes Databases Subject Teacher: Fahim Siddiq 03336581412

Limitations of a Flat File-Based Approach


Originally, all data held in computers was stored in flat files. A typical file used for a
database-type application would consist of a large number of records, each of which would
consist of a number of fields. Each field would have its own data type and hold a single item of
data. For example, a stock file would contain records describing stock. Each record may consist
of the fields shown in the table.

This approach led to very large files that were difficult to process. Suppose we want to know
which items of stock need to be reordered. This is fairly straightforward. We search the file
sequentially; if the number in stock is less than the re-order level, we output the details of the
item and supplier.

The problem is that when we check the stock the next day, we create another order because the
stock that has been ordered has not been delivered. To overcome this, we could introduce a new
field called OnOrder of type Boolean. This can be set to True when an order has been placed and
reset to False when an order has been delivered. Unfortunately, it is not that straightforward.
The original software is expecting seven fields, not eight fields. This means that the software
designed to manipulate the original file must be modified to read the new file layout, i.e. the
program code needs modifying.

Ad-hoc enquiries are virtually impossible. What happens if management ask for a list of the
bestselling products?
The file has not been set up for this and to change it so that such a request can be satisfied
involves modifying all existing software.

Further, suppose we want to know which products are supplied by the company Food & Drink
Ltd. In some cases, the company’s name has been entered as “Food & Drink Ltd.”, sometimes as
“Food and Drink Ltd.” and sometimes the full stop after “Ltd” has been omitted. This means that
a match is very difficult because the data are inconsistent. Each time a new product is added to
the database, the name and address of the supplier must be entered. This leads to redundant
data or data duplication, as we already have the supplier address recorded as part of several
other product records. The figure below shows how data can proliferate when each department
keeps its own files.

1. Separation and Isolation of Data: Suppose we wish to know which customers have
bought items from a particular supplier. We first need to find the items supplied by a
particular supplier from one file and then use a second file to find which customers have
bought those products. This difficulty can be compounded if data are needed from more
than two files.
2. Duplication of Data: The figure above suggests that the supplier data will be duplicated
for every stock record. Duplication is wasteful as it costs time and money. Data has to be
entered more than once, therefore it takes up user time and storage space. Duplication
is also likely to lead to a loss of data integrity and data inconsistency. What happens if a
customer changes their address? The Sales Department may update their files but the
Accounts Department may not do this at the same time. Worse still, suppose the
Purchasing Department orders some parts and there is an increase in price. The
Purchasing Department increases the cost and sale prices but the Accounts Department
does not; there is now a discrepancy. When we have two copies of a data item which
should be the same and they are not, this is called “inconsistent data”.

3. Data Dependence: Data formats (typically a record description) are defined in the
application programs. If there is a need to change any of these formats, whole programs
may have to be changed. Different applications may hold the data in different forms,
again causing a problem. Suppose an extra field is needed in a file, again all application
programs using that file have to be re-coded.

4. Queries/Reports: Processing files by computer was a huge advance on the manual
processing of queries on the data. This led to end users wanting more and more
information. Each time a new query was asked for by a user, a new program had to be
written. Often, the data needed to answer the query were stored in more than one file,
some of which were incompatible.

The Need for Database Software

A solution to many of these problems with using flat files was the arrival of relational database
software. The data are stored in tables which have relationships between the various tables.
Each table stores data about an entity – i.e. some “thing” about which data are stored, for
example, a customer or a product. Each table has a primary key field, by which all the values in
that table are identified. The table can be viewed just like a spreadsheet grid, so one row in the
table is one record.

The practical design of relational databases is based on the theory developed by Edgar F. (Ted)
Codd, first published in 1970. In the theory, entities are called relations, and they are
implemented as tables. Each record in the table is called a “tuple” (also known as a row). A data
item is known as an attribute (or a column).

The records in one table can be related to records in other tables by having common fields in
both tables. So, the problem of the supplier details being duplicated can be solved by the
relevant field in the Product table simply containing the key of the supplier entity. The likely
data design here would be:

● The Supplier table has a primary key of SupplierID.


● The Product table also has the SupplierID field (to link back to the Supplier table).
● The SupplierID field in the Product table is called a foreign key.

The user can search the supplier table for details of the relevant supplier using the supplier key
when it is necessary. In this way only the foreign key SupplierID needs to be stored in the Product
table. The inclusion of other supplier data, such as the SupplierName and SupplierAddress, would
be a duplication. We already have these details of the supplier stored in the Supplier table.
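
The Supplier and Product design described above can be tried out with a short script. This is only a sketch, using Python's built-in sqlite3 module as a stand-in DBMS; the column sizes, the ProductID and ProductName fields and the sample rows are assumptions made for illustration and do not come from the original stock file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE Supplier (
        SupplierID      INTEGER PRIMARY KEY,
        SupplierName    VARCHAR(40),
        SupplierAddress VARCHAR(80)
    );
    CREATE TABLE Product (
        ProductID   INTEGER PRIMARY KEY,                     -- assumed field
        ProductName VARCHAR(40),                             -- assumed field
        SupplierID  INTEGER REFERENCES Supplier(SupplierID)  -- foreign key
    );
    -- Sample data (invented): one supplier, two products
    INSERT INTO Supplier VALUES (1, 'Food & Drink Ltd', '1 Example Road');
    INSERT INTO Product VALUES (10, 'Tea', 1);
    INSERT INTO Product VALUES (11, 'Coffee', 1);
""")

# Look up each product's supplier through the foreign key
rows = conn.execute("""
    SELECT Product.ProductName, Supplier.SupplierName
    FROM Product, Supplier
    WHERE Product.SupplierID = Supplier.SupplierID
    ORDER BY Product.ProductID
""").fetchall()

# The supplier's name and address are stored exactly once
supplier_count = conn.execute("SELECT COUNT(*) FROM Supplier").fetchone()[0]

print(rows)
print(supplier_count)
```

Both products show the same supplier name, yet that name is held in a single Supplier record, so there is only one place to correct a spelling such as “Food & Drink Ltd”.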

The differing needs of the departments are met by the software that is used to control the data.
As all the data are stored somewhere in the system, a department only needs software that can
search for it. In this way each department does not need its own set of data, simply its own view
of the centralized database to which all users have access.

Designing a Relational Database

Normalisation is a set of formal rules that must be considered once we have a set of table
designs. By following the normalisation rules we ensure that the final table designs do not result
in duplicated data. If the initial designs were well thought through then the normalisation process
will not result in any changes to the table designs.

Table: ORDER

First normal form (1NF)


A table with no repeating groups is said to be in first normal form. The ORDER table has
repeating groups in the attributes ProdID and Description. We remove the repeating groups by:
● moving the ProdID and Description attributes to a new table.
● linking the new table to the original table ORDER with a foreign key.

The tables below show the data in first normal form. The primary key of each table is shown in red.

Table: ORDER (1NF).

Table: ORDER-PRODUCTS

Second normal form (2NF)


A table is in second normal form if any partial dependencies have been removed. That is, every
non-key attribute must be fully dependent on all of the primary key.

In our ORDER-PRODUCTS table, Description depends only on ProdID and not on Num. Hence the
non-key attribute (Description) is not dependent on all of the primary key. We say that
Description is dependent on ProdID or, turned around: ProdID determines Description or ProdID
→ Description.
We remove the partial dependency by:
● moving the Description attribute to a new table.
● linking the new table to the ORDER-PRODUCTS table with a foreign key.

The tables below show the data in second normal form.



Table: ORDER-PRODUCTS (2NF). Table: PRODUCT (2NF).

At this stage, the ORDER-PRODUCTS table is fully normalised:


● 1NF – it does not have a repeated group of attributes.
● 2NF – there are no non-key attributes.

The PRODUCT table is also fully normalised:


● 1NF – it does not have a repeated group of attributes.
● 2NF – it has a single-attribute primary key.

Third normal form (3NF)

Third normal form (like second normal form) is concerned with the non-key attributes. To be in
3NF, there must be no dependencies between any of the non-key attributes. A table with no or
one non-key attribute must be in 3NF, so PRODUCT and ORDER-PRODUCTS are in 3NF.

There is a problem with the original ORDER table. City determines the Country, so we have two
non-key attributes which are dependent. This means that ORDER is not in 3NF. The tables below
show the data in third normal form.

Table: ORDER table (3NF). Table: CITY-COUNTRIES (3NF).



To summarise, we have been through the stages shown in the table below. The primary key is
underlined.
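
The end result of the three stages can be sketched as four small tables. This is a hedged illustration using Python's sqlite3 module. The table names Orders, OrderProducts and CityCountries are assumed spellings (ORDER is a reserved word in SQL and hyphens are not allowed in names), only the attributes named in the text are included, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CityCountries (              -- 3NF: City determines Country
        City    VARCHAR(30) PRIMARY KEY,
        Country VARCHAR(30)
    );
    CREATE TABLE Orders (                     -- stands in for ORDER
        Num  INTEGER PRIMARY KEY,
        City VARCHAR(30) REFERENCES CityCountries(City)
    );
    CREATE TABLE Product (                    -- 2NF: ProdID determines Description
        ProdID      INTEGER PRIMARY KEY,
        Description VARCHAR(40)
    );
    CREATE TABLE OrderProducts (              -- 1NF: repeating group moved out
        Num    INTEGER REFERENCES Orders(Num),
        ProdID INTEGER REFERENCES Product(ProdID),
        PRIMARY KEY (Num, ProdID)
    );
    -- Invented sample data
    INSERT INTO CityCountries VALUES ('Lahore', 'Pakistan');
    INSERT INTO Orders VALUES (1, 'Lahore');
    INSERT INTO Orders VALUES (2, 'Lahore');
    INSERT INTO Product VALUES (100, 'Pen');
    INSERT INTO Product VALUES (200, 'Ruler');
    INSERT INTO OrderProducts VALUES (1, 100);
    INSERT INTO OrderProducts VALUES (1, 200);
    INSERT INTO OrderProducts VALUES (2, 100);
""")

# A description is stored once, so changing it touches a single row
conn.execute("UPDATE Product SET Description = 'Ballpoint pen' WHERE ProdID = 100")

# The original "flat" view of the orders can still be rebuilt by linking the tables
flat = conn.execute("""
    SELECT Orders.Num, Product.Description, Orders.City, CityCountries.Country
    FROM Orders, OrderProducts, Product, CityCountries
    WHERE Orders.Num = OrderProducts.Num
      AND OrderProducts.ProdID = Product.ProdID
      AND Orders.City = CityCountries.City
    ORDER BY Orders.Num, Product.ProdID
""").fetchall()

for row in flat:
    print(row)
```

Every order that includes product 100 picks up the new description, which is the modification anomaly that normalisation is meant to remove.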

The Three Relationships

One-to-one
A one-to-one relationship is when each record in one table only connects to one record in
another table. Each foreign key value will link to one primary key value and each primary key
value will only be linked to by one foreign key value. The foreign key can exist on either side of
the relationship.

Figure: One-to-one relationship between the Sales Rep table and the Employee table

The Sales Rep table stores details of the sales representatives within a business. This only
contains basic information about their name but their full employee details are stored in a
separate table called Employee. Each sales representative only has one employee record and
each employee record can only refer to one sales rep record.

One-to-Many
A one-to-many relationship is when each record in one table can connect to many (zero or more)
records in another table. A foreign key will exist within the table on the many side of the
relationship and will connect to a primary key in the one side of the relationship. This is the most
common type of relationship within relational databases.

Figure: One to Many relationship Between Product table & Category table

Many-to-Many

Many-to-many relationships are only conceptual. They are not used in relational databases
because they are converted into two sets of one-to-many relationships. In a many-to-many
relationship, each record in one table can connect to many records in another table but each
record in the other table can also connect to many records in the original table.

Create and Interpret an Entity Relationship Diagram

An entity relationship diagram (ERD) shows the relationships (connections) between each entity.
Each entity is represented by a rectangle. Each relationship is represented by a line.

Figure shows a one-to-one relationship between a Sales Rep and an Employee. Each sales rep is
related to one employee and each employee can only be one sales rep.

Figure shows a one-to-many relationship between Category and Product. Each category can have
many products, but each product has only one category.

Figure shows a many-to-many relationship between Order and Product. Each order can be for
many products and each product can exist on many orders. This is a conceptual diagram only.
Other RDBMSs may use two symbols at each end of the relationship. For example, 0:1 or 0| could
be used to depict that there can be between zero and one related record on that side of the
relationship, whereas 1:1 or || could be used to depict that there must be exactly one related
record on that side of the relationship.

Primary key
A primary key is a unique identifier for each record in a table. Therefore, the field used for the
primary key must contain unique values and not have any repeating values.
Examples of primary keys could include:
• registration plate for a car
• student number for a student
• product code for a product.

Compound key
A compound key is two or more fields combined to form a unique identity.

Foreign key
A foreign key is a field in a table that refers to the primary key in another table. It is used to create
the relationship between the two tables. The foreign key must always have the same data type
and field size as the primary key it is linking to.

Candidate key: A key that could be chosen as the primary key.


Secondary key: A candidate key that has not been chosen as the primary key.

Referential Integrity
Referential integrity exists when data in the foreign key of the table on the many side of a
relationship exists in the primary key of the table on the one side of a relationship.

In the Order table above, Customer ID 5 does not exist in the Customer table. This means that
the Order table does not contain referential integrity because the related customer does not
exist.
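
A DBMS can be asked to enforce referential integrity so that a record like this is never stored. The following is a minimal sketch with Python's sqlite3 module; the table layouts and the customer name are assumptions, and the PRAGMA line is specific to SQLite, which only checks foreign keys when told to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: switch on foreign key checks
conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name VARCHAR(30))")
conn.execute("""
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID)
    )
""")
conn.execute("INSERT INTO Customer VALUES (1, 'Amina')")  # invented customer
conn.execute("INSERT INTO Orders VALUES (100, 1)")        # valid: customer 1 exists

try:
    # Customer ID 5 does not exist, so this order would break referential integrity
    conn.execute("INSERT INTO Orders VALUES (101, 5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(rejected, order_count)
```

The second insert is refused, so the Orders table only ever holds orders whose customer can be found in the Customer table.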

Difference Between Flat-Files & Relational Database

1. Storage
Flat files: Data are stored in a number of files.
Relational database: Data are contained in a single software application – the relational
database or DBMS software.

2. Duplication
Flat files: Data are highly likely to be duplicated and may become inconsistent – it can never
be certain that all copies of a piece of data have been updated.
Relational database: Duplication of data is minimized and so the chance of data inconsistency
is reduced. As long as there is a link to the table storing the data, they can always be accessed
via the link rather than repeating the data. Good database design avoids data duplication.

3. Volume of data
Flat files: Because of data duplication, the volume of data stored is large.
Relational database: Because data duplication is minimized, the volume of data is reduced,
leading to faster searching and sorting of data.

4. Changes to data structures
Flat files: When data structures need to be altered, the software must be re-written.
Relational database: Data structures remain the same even when the tables are altered. Existing
programs do not need to be altered when a table design is changed.

5. Views, queries and reports
Flat files: Views of the data are governed by the different files used to control the data and
produced by individual departments. All views of the data have to be programmed and this is
very time-consuming.
Relational database: Queries and reports can be set up with simple “point and click” features
or using the data manipulation language. A novice user can write queries quickly.

The Database Management System (DBMS)

It is vital to understand that a database is not just a collection of data. A database is an
implementation according to the rules of a theoretical model. The basic concept was proposed
some 40 years ago by ANSI (American National Standards Institute) in its three-level model. The
three levels are:
• The external level
• The conceptual level
• The internal level.
The architecture is illustrated in figure below in the context of a database to be set up for our
theatrical agency.

The physical storage of the data is represented here as being on disk. The details of the storage
(the internal schema) are known only at the internal level, the lowest level in the ANSI
architecture. This is controlled by the database management system (DBMS) software. The
programmers who wrote this software are the only ones who know the structure for the storage
of the data on disk. The software will accommodate any changes that might be needed in the
storage medium.
At the next level, the conceptual level, there is a single universal view of the database. This is
controlled by the database administrator (DBA) who has access to the DBMS. In the ANSI
architecture the conceptual level has a conceptual schema describing the organization of the
data as perceived by a user or programmer. This may also be described as a logical schema. At
the external level there are individual user and programmer views. Each view has an external
schema describing which parts of the database are accessible. A view can support a number of
user programs.
An important aspect of the provision of views is that they can be used by the DBA as
a mechanism for ensuring security. Individual users or groups of users can be given
appropriate access rights to control what actions are allowed for that view. For example,
a user may be allowed to read data but not to amend data. Alternatively, there may only be
access to a limited number of the tables in the database.

The Facilities Provided by a DBMS


The DBMS provides software tools through a developer interface.

Developer Interface: Gives access to the software tools provided by a DBMS for creating tables.
The DBMS also provides facilities for a programmer to develop a user interface.
Query: It provides a query processor that allows a query to be created and processed. The query
is the mechanism for extracting and manipulating data from the database.

Report: The other feature likely to be provided by the DBMS is the capability for creating a report
to present formatted output.

Database Administrator
The DBA is responsible for setting up the user and programmer views and for defining the
appropriate, specific access rights. An important feature of the DBMS is the data dictionary which
is part of the database that is hidden from view from everyone except the DBA. It contains
metadata about the data. This includes details of all the definitions of tables, attributes and so
on but also of how the physical storage is organized.

There are a number of features that can improve performance. Of special note is the capability
to create an index for a table. This is needed if the table contains a large number of attributes
and a large number of tuples. An index is a secondary table that is associated with an attribute
that has unique values. The index table contains the attribute values and pointers to the
corresponding tuples in the original table. The index can be on the primary key or on a secondary
key. Searching an index table is much quicker than searching the full table.
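
As a rough illustration of the idea, the sketch below creates an index and asks the DBMS how it intends to run a search. It uses Python's sqlite3 module; the generated rows and the index name idx_crooks_name are invented, and EXPLAIN QUERY PLAN is an SQLite feature rather than standard SQL, so other systems report this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crooks (ID INTEGER PRIMARY KEY, name VARCHAR(16), town VARCHAR(20))")

# 1000 invented rows, each with a unique name
conn.executemany(
    "INSERT INTO crooks VALUES (?, ?, ?)",
    [(i, f"crook{i}", f"town{i % 50}") for i in range(1, 1001)],
)

# Build an index on a secondary key (name)
conn.execute("CREATE INDEX idx_crooks_name ON crooks (name)")

# Ask SQLite how it would carry out a search on the indexed attribute
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM crooks WHERE name = 'crook42'"
).fetchall()
plan_text = " ".join(row[3] for row in plan)  # the fourth column holds the description
print(plan_text)
```

The plan mentions idx_crooks_name, showing that the search goes through the index and then follows the pointer to the matching tuple instead of reading every row of the table.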

The integrity of the data in the database is a key concern. One potential cause of problems occurs
when a transaction is started but a system problem prevents its completion. The result would be
a database in an undefined state. The DBMS should have a built-in feature that prevents this from
happening. As with all systems, regular backup is a requirement. The DBA will be responsible for
backup of the stored data.
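
The built-in feature referred to here is usually transaction handling: either every step of a transaction is stored or none of them is. A hedged sketch with Python's sqlite3 module follows. It borrows the OnOrder idea from the stock file example; the Stock and PurchaseOrder tables and the place_order function are hypothetical, and the "system problem" is simulated with an exception.

```python
import sqlite3

def place_order(conn, item_id, fail_midway=False):
    """Record a purchase order and flag the item as on order, as one transaction."""
    with conn:  # commits if the block finishes, rolls back if an exception escapes
        conn.execute("INSERT INTO PurchaseOrder (ItemID) VALUES (?)", (item_id,))
        if fail_midway:
            raise RuntimeError("simulated system problem")
        conn.execute("UPDATE Stock SET OnOrder = 1 WHERE ItemID = ?", (item_id,))

def snapshot(conn):
    """Return (number of purchase orders, OnOrder flag for item 1)."""
    orders = conn.execute("SELECT COUNT(*) FROM PurchaseOrder").fetchone()[0]
    flag = conn.execute("SELECT OnOrder FROM Stock WHERE ItemID = 1").fetchone()[0]
    return (orders, flag)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stock (ItemID INTEGER PRIMARY KEY, OnOrder INTEGER)")
conn.execute("CREATE TABLE PurchaseOrder (OrderID INTEGER PRIMARY KEY, ItemID INTEGER)")
conn.execute("INSERT INTO Stock VALUES (1, 0)")
conn.commit()

try:
    place_order(conn, 1, fail_midway=True)  # fails after the first step
except RuntimeError:
    pass
after_failure = snapshot(conn)

place_order(conn, 1)  # runs to completion
after_success = snapshot(conn)

print(after_failure, after_success)
```

After the failed attempt there is no half-finished order left behind: the inserted row is rolled back, so the database is never left with an order recorded but the OnOrder flag unset.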

Structured Query Language (SQL)


SQL is the programming language provided by a DBMS to support all of the operations associated
with a relational database. Even when a database package offers high-level software tools for
user interaction, they create an implementation using SQL.

Data definition language

Data Definition Language (DDL) is a way to adjust the structure of a database. You might have
created databases in the past using a graphical tool such as Access, or in MySQL. DDL allows you
to create databases from pure code, including the ability to:

• Create tables: CREATE TABLE
• Change the structure of a table: ALTER TABLE
• Delete tables: DROP TABLE

CREATE
You need to know what they all do (as listed above), though you only need to know how to
implement the CREATE TABLE command. Let's look at how we could have made the crooks
table that is used in the examples later in these notes:

CREATE TABLE crooks
(
ID INTEGER PRIMARY KEY,
NAME VARCHAR(16),
GENDER VARCHAR(6),
DOB DATE,
TOWN VARCHAR(20),
NUMSCARS INTEGER
)

ALTER
An ALTER statement in SQL changes the properties of a table in a relational database without
the need to access the table manually.

ALTER TABLE crooks ADD convictions INTEGER

ALTER TABLE crooks DROP COLUMN convictions

DROP
Dropping a table is like dropping a nuclear bomb. It is irreversible and is frowned upon in
modern society.

DROP TABLE crooks

By running this line of code, the table "crooks" will be removed from the database with no
chance of it being recovered unless backups have been previously made.

Setting Primary Keys
Primary keys can be set after table creation via the ALTER statement.

ALTER TABLE Persons
ADD PRIMARY KEY (id)

Primary keys can also be set during table creation:

CREATE TABLE users
(
user_id int NOT NULL,
username varchar(255) NOT NULL,
password varchar(255) NOT NULL,
Address varchar(255),
PRIMARY KEY (user_id)
)

Setting Composite Keys


To set a primary key made up of two columns during table creation you could do something
such as this:

CREATE TABLE users
(
user_id int NOT NULL,
username varchar(255) NOT NULL,
password varchar(255) NOT NULL,
Address varchar(255),
CONSTRAINT pk_UserId PRIMARY KEY (user_id,username)
)

Here the constraint name would be pk_UserId and the table's primary key would be made up of
the user_id and the username columns.
This could also be done after table creation:

ALTER TABLE users
ADD CONSTRAINT pk_UserId PRIMARY KEY (user_id,username)

Data Manipulation Language (DML)


There are three categories of use for Data Manipulation Language (DML):
• The insertion of data into the tables when the database is created.
• The modification or removal of data in the database.
• The reading of data stored in the database.

To help us understand how these things work we are going to use a test data set. Databases are
used in all areas of the computer industry, but for the moment we are going to use a dataset
that keeps track of crooks in England, noting names, gender, date of birth, towns and numbers
of scars. Take a look at the crooks data table below:

ID name gender DoB town numScars

1 Geoff male 12/05/1982 Hull 0

2 Jane female 05/08/1956 York 1

3 Keith male 07/02/1999 Snape 6

4 Oliver male 22/08/1976 Blaxhall 2

5 Kelly female 11/11/1911 East Ham 10

6 Marea female 14/07/1940 Wythenshawe 6

To select all the items from this table we can use:

SELECT * FROM crooks

This would display all the results. But what if we just want to display the names and number of
scars of the female crooks?

SELECT name, numScars FROM crooks
WHERE gender = 'female'

The result of this query would be:

name numScars

Jane 1

Kelly 10

Marea 6

SELECT

The SELECT statement allows you to ask the database a question (query it), and specify what data
it returns. We might want to ask something like: Tell me the names and dates of birth of all the
crooks. Of course, this wouldn't work as written, so we need to put this into a language that a
computer can understand: Structured Query Language, or SQL for short:

SELECT name, DoB --what to return
FROM crooks --where are you returning it from

This would return the following:

name DoB

Geoff 12/05/1982

Jane 05/08/1956

Keith 07/02/1999

Oliver 22/08/1976

Kelly 11/11/1911

Marea 14/07/1940

But suppose we wanted to filter these results, for instance: Tell me the ID, name and date of
birth of all the crooks who are male and come from Snape. We need to add another part to the
statement, the WHERE clause, allowing us to give the query some criteria (or options):

SELECT ID, name, DoB
FROM crooks
WHERE town = 'Snape' AND gender = 'male' --Criteria

This would return the following:

ID name DoB

3 Keith 07/02/1999

Say the police know that a crime has been committed by a heavily scarred woman (4+ scars)
and they want a list of all such women:

SELECT name, town, numScars
FROM crooks
WHERE numScars >= 4 AND gender = 'female' --Criteria

This would return:

name town numScars

Kelly East Ham 10

Marea Wythenshawe 6

However, the police want to quickly sort through and see who is the most heavily scarred. We
are going to use an ORDER BY clause:

SELECT name, town
FROM crooks
WHERE numScars >= 4 AND gender = 'female' --Criteria
ORDER BY numScars DESC --sorts the numScars values in big to small order
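
To see the sorted query run, the crooks table can be rebuilt in a few lines. This sketch uses Python's sqlite3 module as one possible DBMS; the data rows are the six crooks from the table in these notes, with dates kept as plain text in the same day/month/year form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE crooks (
        ID INTEGER PRIMARY KEY,
        name VARCHAR(16),
        gender VARCHAR(6),
        DoB DATE,
        town VARCHAR(20),
        numScars INTEGER
    )
""")
conn.executemany("INSERT INTO crooks VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Geoff", "male", "12/05/1982", "Hull", 0),
    (2, "Jane", "female", "05/08/1956", "York", 1),
    (3, "Keith", "male", "07/02/1999", "Snape", 6),
    (4, "Oliver", "male", "22/08/1976", "Blaxhall", 2),
    (5, "Kelly", "female", "11/11/1911", "East Ham", 10),
    (6, "Marea", "female", "14/07/1940", "Wythenshawe", 6),
])

# Heavily scarred women, most scarred first
result = conn.execute("""
    SELECT name, town
    FROM crooks
    WHERE numScars >= 4 AND gender = 'female'
    ORDER BY numScars DESC
""").fetchall()
print(result)  # [('Kelly', 'East Ham'), ('Marea', 'Wythenshawe')]
```

Kelly (10 scars) is listed before Marea (6 scars) even though numScars itself is not one of the columns returned.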

INNER JOIN
We spoke earlier about how databases are great because they allow you to link tables together
and perform complex searches on linked data. So far, we have only looked at searching one table.
When you use a social network such as Facebook you can often see a list of all your friends in the
side bar as well as details from your record such as your name and place of work. How did they
find this data? They would have searched for all the relationships that involve your ID, returning
the names of the people involved, and also returned values such as job title from your personal
record. This looks like using two queries:

--return relationship information
--return personal record information

It would be possible to do this, but it's far easier to use one query for both things.
Take a look at this example. The police want to know the name and town of a criminal (ID = 45)
along with all the descriptions of crimes they have performed:

SELECT name, town, description --select things to return (from different tables)
FROM crooks, crime --name tables that data comes from
WHERE crooks.ID = crime.crimId --specify the link; dot notation means table.field, and the IDs are the same
AND crooks.ID = 45 --specify which crook you are looking at
ORDER BY date ASC --order the results by the oldest first
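
The same link can also be written with the INNER JOIN ... ON keywords. The sketch below runs both forms with Python's sqlite3 module and compares them. The crime table layout (crimeNo, crimId, description, date), crook number 45 and all the data rows are assumptions made up for the example; dates are stored as year-month-day text so that sorting them as text gives oldest first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crooks (ID INTEGER PRIMARY KEY, name VARCHAR(16), town VARCHAR(20));
    CREATE TABLE crime (
        crimeNo     INTEGER PRIMARY KEY,
        crimId      INTEGER REFERENCES crooks(ID),  -- which crook committed it
        description VARCHAR(40),
        date        DATE
    );
    -- Invented sample data
    INSERT INTO crooks VALUES (45, 'Sam', 'Hull');
    INSERT INTO crooks VALUES (12, 'Alex', 'York');
    INSERT INTO crime VALUES (1, 45, 'Burglary', '2001-03-14');
    INSERT INTO crime VALUES (2, 45, 'Fraud', '1998-07-02');
    INSERT INTO crime VALUES (3, 12, 'Arson', '2005-01-01');
""")

# Link written in the WHERE clause, as in the query above
implicit = conn.execute("""
    SELECT name, town, description
    FROM crooks, crime
    WHERE crooks.ID = crime.crimId
    AND crooks.ID = 45
    ORDER BY date ASC
""").fetchall()

# The same link written with INNER JOIN ... ON
explicit = conn.execute("""
    SELECT name, town, description
    FROM crooks
    INNER JOIN crime ON crooks.ID = crime.crimId
    WHERE crooks.ID = 45
    ORDER BY date ASC
""").fetchall()

print(explicit)
print(implicit == explicit)
```

Both queries return the same rows. The INNER JOIN form keeps the linking condition (ON) apart from the filtering condition (WHERE), which many people find easier to read once more than two tables are involved.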

Operators used in the WHERE clauses

Operator                  Meaning of the operator                                            Example

=                         Checks if they're equivalent                                       Id1 = 123

> and <                   Checks if a field is greater than or less than                     Id1 > 123
                                                                                             Id1 < 123

<>                        Checks if it's not equal to                                        Id1 <> 123

>= and <=                 Similar to "> and <", but also checks for equality                 Id1 >= 123
                                                                                             Id1 <= 123

OR                        Will be accepted if either the left, right or both are true        Id1 = 123 OR Id2 <> 444

AND                       Only accepted if both the left part and the right part are true    Id1 = 123 AND Id2 <> 444

NOT                       Inverts the boolean value of the statement                         NOT Id1 = 123

IS NULL                   Checks if it's a null value contained within the variable          Id1 IS NULL

... BETWEEN ... AND ...   Checks if something is within a range                              Id1 BETWEEN 2.8 AND 3.14159265

INSERT
We might also want to add new things to our database, for example when we are adding new
criminal records or a new friendship link on Facebook. To add new records, we use the INSERT
command with the values of the fields we want to insert:

INSERT INTO crooks
VALUES (1234,'Julie', 'female','12/12/1994','Little Maplestead',67)

Sometimes we might not want to insert all the fields, some of them might not be compulsory:

INSERT INTO crooks (ID, name, town) --specific fields to insert into
VALUES (999, 'Frederick', 'Shotley')

ID name gender DoB town numScars

1 Geoff male 12/05/1982 Hull 0

2 Jane female 05/08/1956 York 1

3 Keith male 07/02/1999 Snape 6

4 Oliver male 22/08/1976 Blaxhall 2

5 Kelly female 11/11/1911 East Ham 10

6 Marea female 14/07/1940 Wythenshawe 6

1234 Julie female 12/12/1994 Little Maplestead 67

999 Frederick Shotley

DELETE

Sometimes you might fall out with friends on Facebook so that you don't even want them to see
your restricted page. You'd want to delete your relationship (it's actually likely that Facebook
retains the connection but flags it as 'deleted', that isn't important here). The police might also
find that a crook is in fact innocent and then want to delete their criminal record. We need a
DELETE command to do all of these things, to permanently delete records from databases.
Imagine that we find out that Geoff was framed and he is completely innocent:

DELETE FROM crooks
WHERE name = 'Geoff'
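
Because two crooks could share a name, it is usually safer to pick out the record by its primary key. The sketch below, using Python's sqlite3 module on a cut-down copy of the crooks table, deletes Geoff by ID and also shows an UPDATE statement, which covers the "modification" use of DML listed earlier. The three-row table and the new scar count are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crooks (ID INTEGER PRIMARY KEY, name VARCHAR(16), numScars INTEGER)")
conn.executemany("INSERT INTO crooks VALUES (?, ?, ?)", [
    (1, "Geoff", 0),
    (2, "Jane", 1),
    (3, "Keith", 6),
])

# Remove exactly one record, identified by its primary key
cur = conn.execute("DELETE FROM crooks WHERE ID = 1")
deleted = cur.rowcount  # number of rows the DELETE removed

# Modify an existing record instead of removing it (the new value is made up)
conn.execute("UPDATE crooks SET numScars = 7 WHERE ID = 3")
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM crooks").fetchone()[0]
keith_scars = conn.execute("SELECT numScars FROM crooks WHERE ID = 3").fetchone()[0]
print(deleted, remaining, keith_scars)
```

Leaving out the WHERE clause in either statement would make it apply to every row in the table, so it is worth checking the condition before running a DELETE or UPDATE.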
