A Level CS CH 11 9618
This approach led to very large files that were difficult to process. Suppose we want to know
which items of stock need to be reordered. This is fairly straightforward. We search the file
sequentially; if the number in stock is less than the re-order level, we output the details of the
item and supplier.
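The sequential search just described can be sketched in Python. This is a minimal illustration under assumptions: the flat file is modelled as comma-separated text, and the seven-field layout, item codes and values are hypothetical rather than taken from a real stock file.

```python
# Hypothetical flat stock file: one record per line, seven comma-separated fields:
# ItemCode, Description, NumberInStock, ReorderLevel, SupplierName, SupplierAddress, Price
STOCK_FILE = """\
A101,Baked beans,12,20,Food & Drink Ltd.,1 High St,0.85
A102,Cola 2L,50,30,Food and Drink Ltd.,1 High St,1.60
A103,Crisps,5,10,Snack Co,9 Mill Rd,0.50"""


def items_to_reorder(file_text):
    """Read the file sequentially and return records whose stock is below the re-order level."""
    to_reorder = []
    for line in file_text.splitlines():  # visit every record in turn
        code, desc, in_stock, reorder, supplier, address, price = line.split(",")
        if int(in_stock) < int(reorder):
            to_reorder.append((code, desc, supplier, address))  # item and supplier details
    return to_reorder


for item in items_to_reorder(STOCK_FILE):
    print(item)
```

The unpacking line hard-codes the seven-field layout, so a record with an eighth field such as OnOrder would make it raise a ValueError. This is the program-data dependence problem discussed in the next paragraph.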
The problem is that when we check the stock the next day, we create another order because the
stock that has been ordered has not been delivered. To overcome this, we could introduce a new
field called OnOrder of type Boolean. This can be set to True when an order has been placed and
reset to False when an order has been delivered. Unfortunately, it is not that straightforward.
The original software is expecting seven fields, not eight fields. This means that the software
designed to manipulate the original file must be modified to read the new file layout, i.e. the
program code needs modifying.
Ad-hoc enquiries are virtually impossible. What happens if management ask for a list of the
bestselling products?
The file has not been set up for this and to change it so that such a request can be satisfied
involves modifying all existing software.
Further, suppose we want to know which products are supplied by the company Food & Drink
Ltd. In some cases, the company’s name has been entered as “Food & Drink Ltd.”, sometimes as
“Food and Drink Ltd.”, and sometimes the full stop after “Ltd” has been omitted. This means that
a match is very difficult because the data are inconsistent. Each time a new product is added to
the database, the name and address of the supplier must be entered. This leads to redundant
data or data duplication as we already have the supplier address recorded as part of several other
product records. The figure below shows how data proliferate when each department keeps
its own files.
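A short illustration of why inconsistent entries defeat a search. The list of entries is hypothetical and simply uses the spellings mentioned above; an exact-match comparison, which is how a simple search program would typically compare names, finds only some of the records for the same company.

```python
# Hypothetical supplier names as typed into different product records.
supplier_entries = [
    "Food & Drink Ltd.",
    "Food and Drink Ltd.",
    "Food & Drink Ltd",  # full stop omitted
    "Food & Drink Ltd.",
]

# An exact-match search only finds records typed in exactly the same way.
exact_matches = [s for s in supplier_entries if s == "Food & Drink Ltd."]
print(len(exact_matches), "of", len(supplier_entries), "records found")
```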
Computer Science 9618 Notes Databases Subject Teacher: Fahim Siddiq 03336581412
1. Separation and Isolation of Data: Suppose we wish to know which customers have
bought items from a particular supplier. We first need to find the items supplied by a
particular supplier from one file and then use a second file to find which customers have
bought those products. This difficulty can be compounded if data are needed from more
than two files.
2. Duplication of Data: The figure above suggests that the supplier data will be duplicated
for every stock record. Duplication is wasteful as it costs time and money. Data have to be
entered more than once, so duplication takes up user time and storage space. Duplication
is also likely to lead to a loss of data integrity and data inconsistency. What happens if a
customer changes their address? The Sales Department may update their files but the
Accounts Department may not do this at the same time. Worse still, suppose the
Purchasing Department orders some parts and there is an increase in price. The
Purchasing Department increases the cost and sale prices but the Accounts Department
does not; there is now a discrepancy. When we have two copies of a data item which
should be the same and they are not, this is called “inconsistent data”.
3. Data Dependence: Data formats (typically a record description) are defined in the
application programs. If there is a need to change any of these formats, whole programs
may have to be changed. Different applications may hold the data in different forms,
again causing a problem. If an extra field is needed in a file, all the application
programs that use that file have to be re-coded.
information. Each time a new query was asked for by a user, a new program had to be
written. Often, the data needed to answer the query were stored in more than one file,
some of which were incompatible.
A solution to many of these problems with using flat files was the arrival of relational database
software. The data are stored in tables which have relationships between the various tables.
Each table stores data about an entity – i.e. some “thing” about which data are stored, for
example, a customer or a product. Each table has a primary key field, which uniquely identifies
each record in that table. The table can be viewed just like a spreadsheet grid, so one row in the
table is one record.
The practical design of relational databases is based on the theory developed by Ted Codd
around 1970. In this theory entities are called relations, and they are implemented as tables.
Each record in the table is called a “tuple” (also known as a row). A data item is known as an
attribute (or a column).
The records in the tables can be related to records in other tables by having common fields. So,
the problem of the supplier details being duplicated can be solved by the relevant field in the
Product table simply containing the key of the supplier entity. The likely data design
here would be:
The user can search the supplier table for details of the relevant supplier using the supplier key
when it is necessary. In this way only the foreign key SupplierID needs to be stored in the Product
table. The inclusion of other supplier data, such as the SupplierName and SupplierAddress, would
be a duplication. We already have these details of the supplier stored in the Supplier table.
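The Supplier/Product design can be sketched with SQL run through Python's built-in sqlite3 module. SupplierID, SupplierName and SupplierAddress come from the notes; ProductID, Description and the sample rows are hypothetical. The supplier's name is stored once, and a join brings it back for each product when needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on
conn.executescript("""
CREATE TABLE Supplier (
    SupplierID      INTEGER PRIMARY KEY,
    SupplierName    TEXT NOT NULL,
    SupplierAddress TEXT
);
CREATE TABLE Product (
    ProductID   TEXT PRIMARY KEY,
    Description TEXT,
    SupplierID  INTEGER REFERENCES Supplier(SupplierID)  -- only the foreign key is stored
);
""")
conn.execute("INSERT INTO Supplier VALUES (1, 'Food & Drink Ltd.', '1 High St')")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [("A101", "Baked beans", 1), ("A102", "Cola 2L", 1)])

# The supplier's name is stored once; the join looks it up for every product.
rows = conn.execute("""
    SELECT Product.ProductID, Supplier.SupplierName
    FROM Product INNER JOIN Supplier ON Product.SupplierID = Supplier.SupplierID
    WHERE Supplier.SupplierName = 'Food & Drink Ltd.'
    ORDER BY Product.ProductID
""").fetchall()
print(rows)
```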
The differing needs of the departments are met by the software that is used to control the data.
As all the data are stored somewhere in the system, a department only needs software that can
search for it. In this way each department does not need its own set of data, simply its own view
of the centralized database to which all users have access.
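Departmental views of one centralised table can be sketched with SQL CREATE VIEW statements (run here with Python's sqlite3 module). The Customer table, its columns and the two view names are hypothetical. Updating the address once in the base table is seen through both views, which avoids the Sales/Accounts inconsistency described earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerID  INTEGER PRIMARY KEY,
    Name        TEXT,
    Address     TEXT,
    CreditLimit REAL
);
INSERT INTO Customer VALUES (1, 'A. Khan', '3 Park Lane', 500.0);

-- Each department gets its own view of the same centralised table.
CREATE VIEW SalesView    AS SELECT CustomerID, Name, Address FROM Customer;
CREATE VIEW AccountsView AS SELECT CustomerID, Name, CreditLimit FROM Customer;
""")

# A change made once to the base table is seen through every view.
conn.execute("UPDATE Customer SET Address = '7 New Road' WHERE CustomerID = 1")
sales_row = conn.execute("SELECT * FROM SalesView").fetchone()
accounts_cols = [d[0] for d in conn.execute("SELECT * FROM AccountsView").description]
print(sales_row, accounts_cols)
```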
Normalisation is a set of formal rules that must be considered once we have a set of table
designs. By following the normalisation rules we ensure that the final table designs do not result
in duplicated data. If the initial designs were well thought through then the normalisation process
will not result in any changes to the table designs.
Table: ORDER
The tables below show the data in first normal form (1NF). The primary key of each table is shown in red.
Table: ORDER-PRODUCTS
In our ORDER-PRODUCTS table, the primary key is the compound key (Num, ProdID), but Description
depends only on ProdID and not on Num. Hence the non-key attribute (Description) is not dependent
on all of the primary key: it is a partial dependency, so the table is not in second normal form
(2NF). We say that Description is dependent on ProdID or, turned around: ProdID determines
Description, or ProdID → Description.
We remove the partial dependency by:
● moving the Description attribute to a new table.
● linking the new table to the ORDER-PRODUCTS table with a foreign key.
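The two steps above can be sketched in SQL via Python's sqlite3 module. The sample rows are hypothetical, and the table is named ORDER_PRODUCTS because a hyphen cannot appear in an unquoted SQL name. Each description is now stored once in PRODUCT, and joining the two tables reproduces the original 1NF rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- New table: Description now depends on the whole primary key of its own table.
CREATE TABLE PRODUCT (
    ProdID      TEXT PRIMARY KEY,
    Description TEXT NOT NULL
);
CREATE TABLE ORDER_PRODUCTS (                -- hyphens are not allowed in unquoted SQL names
    Num    INTEGER,
    ProdID TEXT REFERENCES PRODUCT(ProdID),  -- foreign key linking to PRODUCT
    PRIMARY KEY (Num, ProdID)                -- compound primary key
);
""")

# Hypothetical 1NF rows (Num, ProdID, Description): the description repeats.
rows_1nf = [(1, "P1", "Pen"), (1, "P2", "Ruler"), (2, "P1", "Pen"), (3, "P1", "Pen")]

# Move Description out: one PRODUCT row per distinct (ProdID, Description) pair.
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?)",
                 {(prod, desc) for _, prod, desc in rows_1nf})
conn.executemany("INSERT INTO ORDER_PRODUCTS VALUES (?, ?)",
                 [(num, prod) for num, prod, _ in rows_1nf])

product_count = conn.execute("SELECT COUNT(*) FROM PRODUCT").fetchone()[0]
rejoined = conn.execute("""
    SELECT op.Num, op.ProdID, p.Description
    FROM ORDER_PRODUCTS AS op INNER JOIN PRODUCT AS p ON op.ProdID = p.ProdID
    ORDER BY op.Num, op.ProdID
""").fetchall()
print(product_count, rejoined == rows_1nf)
```

If the sample data gave two different descriptions for the same ProdID, the INSERT into PRODUCT would fail on the primary key, which is a sign that ProdID → Description does not hold.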
Third normal form (like second normal form) is concerned with the non-key attributes. To be in
3NF, a table must already be in 2NF and there must be no dependencies between any of the
non-key attributes. A 2NF table with no or one non-key attribute is therefore automatically in 3NF,
so PRODUCT and ORDER-PRODUCTS are in 3NF.
There is a problem with the original ORDER table. City determines Country, so we have two
non-key attributes which are dependent. This means that ORDER is not in 3NF. The tables below
show the data in third normal form.
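A dependency such as City → Country can be checked against sample rows and then split out by projection. This Python sketch assumes a cut-down ORDER table with only Num, City and Country (the real table has more columns) and hypothetical data. A check on sample data can only show that a dependency is broken; it cannot prove that the dependency will hold for all future data.

```python
# Hypothetical 2NF ORDER rows: (Num, City, Country). City -> Country is a
# dependency between two non-key attributes, so this table is not in 3NF.
orders = [
    (1, "Leeds", "UK"),
    (2, "Paris", "France"),
    (3, "Leeds", "UK"),
]


def determines(rows, lhs, rhs):
    """True if column lhs functionally determines column rhs in these rows."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


print("City -> Country holds:", determines(orders, 1, 2))

# Split the dependency out into its own table.
order_3nf = [(num, city) for num, city, _ in orders]                  # ORDER(Num, City)
city_3nf = sorted({(city, country) for _, city, country in orders})  # CITY(City, Country)
print(order_3nf)
print(city_3nf)
```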
To summarise, we have been through the stages shown in the table below. The primary key is
underlined.
One-to-one
A one-to-one relationship is when each record in one table only connects to one record in
another table. Each foreign key value will link to one primary key value and each primary key
value will only be linked to by one foreign key value. The foreign key can exist on either side of
the relationship.
The Sales Rep table stores details of the sales representatives within a business. This only
contains basic information about their name but their full employee details are stored in a
separate table called Employee. Each sales representative only has one employee record and
each employee record can only refer to one sales rep record.
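One way to enforce this one-to-one relationship in SQL is to put a UNIQUE constraint on the foreign key, sketched here with Python's sqlite3 module. The column names and rows are hypothetical. Without UNIQUE the same design would allow many sales reps per employee, that is, a one-to-many relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (
    EmployeeID INTEGER PRIMARY KEY,
    FullName   TEXT,
    Address    TEXT
);
CREATE TABLE SalesRep (
    RepID      INTEGER PRIMARY KEY,
    Name       TEXT,
    -- UNIQUE stops two sales reps pointing at the same employee: one-to-one
    EmployeeID INTEGER UNIQUE REFERENCES Employee(EmployeeID)
);
INSERT INTO Employee VALUES (10, 'Sara Ali', '4 Oak St');
INSERT INTO SalesRep VALUES (1, 'Sara', 10);
""")

try:
    conn.execute("INSERT INTO SalesRep VALUES (2, 'Sara again', 10)")
    second_rep_allowed = True
except sqlite3.IntegrityError:
    second_rep_allowed = False
print(second_rep_allowed)  # False
```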
One-to-Many
A one-to-many relationship is when each record in one table can connect to many (zero or more)
records in another table. A foreign key will exist within the table on the many side of the
relationship and will connect to a primary key in the one side of the relationship. This is the most
common type of relationship within relational databases.
Figure: One-to-many relationship between Product table and Category table
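The Category/Product relationship can be sketched in SQL (via Python's sqlite3 module); the column names and rows are hypothetical. The foreign key sits in Product, the many side. A LEFT JOIN is used so that a category with zero products still appears, with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Category (
    CategoryID INTEGER PRIMARY KEY,
    Name       TEXT
);
CREATE TABLE Product (
    ProductID  INTEGER PRIMARY KEY,
    Name       TEXT,
    CategoryID INTEGER REFERENCES Category(CategoryID)  -- foreign key on the many side
);
INSERT INTO Category VALUES (1, 'Drinks'), (2, 'Snacks'), (3, 'Frozen');
INSERT INTO Product VALUES (1, 'Cola', 1), (2, 'Juice', 1), (3, 'Crisps', 2);
""")

# One category can have many (zero or more) products.
counts = conn.execute("""
    SELECT Category.Name, COUNT(Product.ProductID)
    FROM Category LEFT JOIN Product ON Product.CategoryID = Category.CategoryID
    GROUP BY Category.CategoryID
    ORDER BY Category.CategoryID
""").fetchall()
print(counts)  # [('Drinks', 2), ('Snacks', 1), ('Frozen', 0)]
```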
Many-to-Many
Many-to-many relationships are only conceptual. They are not implemented directly in relational
databases; instead, each one is converted into two one-to-many relationships by adding an
intermediate (link) table. In a many-to-many
relationship, each record in one table can connect to many records in another table but each
record in the other table can also connect to many records in the original table.
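The conversion of a many-to-many relationship into two one-to-many relationships can be sketched with an intermediate link table, using Python's sqlite3 module. The table and column names (Orders, Product, OrderLine, Quantity) and the rows are hypothetical; Orders is used instead of ORDER because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Orders  (OrderID INTEGER PRIMARY KEY);   -- ORDER is a reserved word
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT);

-- Link table: one order has many OrderLine rows,
-- and one product appears on many OrderLine rows.
CREATE TABLE OrderLine (
    OrderID   INTEGER REFERENCES Orders(OrderID),
    ProductID INTEGER REFERENCES Product(ProductID),
    Quantity  INTEGER,
    PRIMARY KEY (OrderID, ProductID)
);
INSERT INTO Orders  VALUES (1), (2);
INSERT INTO Product VALUES (1, 'Pen'), (2, 'Ruler');
INSERT INTO OrderLine VALUES (1, 1, 3), (1, 2, 1), (2, 1, 5);
""")

products_on_order_1 = [r[0] for r in conn.execute(
    "SELECT ProductID FROM OrderLine WHERE OrderID = 1 ORDER BY ProductID")]
orders_for_pen = [r[0] for r in conn.execute(
    "SELECT OrderID FROM OrderLine WHERE ProductID = 1 ORDER BY OrderID")]
print(products_on_order_1, orders_for_pen)  # [1, 2] [1, 2]
```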
An entity relationship diagram (ERD) shows the relationships (connections) between each entity.
Each entity is represented by a rectangle. Each relationship is represented by a line.
Figure shows a one-to-one relationship between a Sales Rep and an Employee. Each sales rep is
related to one employee and each employee can only be one sales rep.
Figure shows a one-to-many relationship between Category and Product. Each category can have
many products, but each product has only one category.
Figure shows a many-to-many relationship between Order and Product. Each order can be for
many products and each product can exist on many orders. This is a conceptual diagram only.
Other RDBMSs may use two symbols at each end of the relationship. For example, 0:1 or 0| could
be used to depict that there can be between zero and one related record on that side of the
relationship, whereas 1:1 or || could be used to depict that there must be exactly one related
record on that side of the relationship.
Primary key
A primary key is a unique identifier for each record in a table. Therefore, the field used for the
primary key must contain unique values and not have any repeating values.
Examples of primary keys could include:
• registration plate for a car
• student number for a student
• product code for a product.
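A DBMS enforces the uniqueness of a primary key by rejecting a repeated value, sketched here with Python's sqlite3 module. The Car table and its values are hypothetical. NOT NULL is written out because SQLite, unlike the SQL standard, allows NULL in a primary key column that is not declared INTEGER PRIMARY KEY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the registration plate uniquely identifies each car.
conn.execute("CREATE TABLE Car (RegPlate TEXT PRIMARY KEY NOT NULL, Make TEXT)")
conn.execute("INSERT INTO Car VALUES ('AB12 CDE', 'Ford')")

try:
    conn.execute("INSERT INTO Car VALUES ('AB12 CDE', 'Honda')")  # repeated key value
    duplicate_accepted = True
except sqlite3.IntegrityError as err:
    duplicate_accepted = False
    print("Rejected:", err)
```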
Compound key
A compound key is two or more fields combined to form a unique identity.
Foreign key
A foreign key is a field in a table that refers to the primary key in another table. It is used to create
the relationship between the two tables. The foreign key must always have the same data type
and field size as the primary key it is linking to.
Referential Integrity
Referential integrity exists when data in the foreign key of the table on the many side of a
relationship exists in the primary key of the table on the one side of a relationship.
In the Order table above, Customer ID 5 does not exist in the Customer table. This means that
the Order table does not have referential integrity, because the related customer does not
exist.
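The Customer ID 5 example can be reproduced as a sketch with Python's sqlite3 module. The table layouts, names and IDs other than 5 are hypothetical, and "Order" is quoted because ORDER is a reserved word. SQLite only checks foreign keys after PRAGMA foreign_keys = ON; with it enabled, the DBMS refuses the order for the non-existent customer, so referential integrity is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE "Order" (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID)
);
INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Ben');
""")

conn.execute('INSERT INTO "Order" VALUES (100, 1)')     # customer 1 exists: accepted
try:
    conn.execute('INSERT INTO "Order" VALUES (101, 5)')  # customer 5 does not exist
    order_for_customer_5_saved = True
except sqlite3.IntegrityError:
    order_for_customer_5_saved = False
print(order_for_customer_5_saved)  # False
```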
The physical storage of the data is represented here as being on disk. The details of the storage
(the internal schema) are known only at the internal level, the lowest level in the ANSI
architecture. This is controlled by the database management system (DBMS) software. The
programmers who wrote this software are the only ones who know the structure for the storage
of the data on disk. The software will accommodate any changes that might be needed in the
storage medium.
At the next level, the conceptual level, there is a single universal view of the database. This is
controlled by the database administrator (DBA) who has access to the DBMS. In the ANSI
architecture the conceptual level has a conceptual schema describing the organization of the
data as perceived by a user or programmer. This may also be described as a logical schema. At
the external level there are individual user and programmer views. Each view has an external
schema describing which parts of the database are accessible. A view can support a number of
user programs.
An important aspect of the provision of views is that they can be used by the DBA as
a mechanism for ensuring security. Individual users or groups of users can be given
appropriate access rights to control what actions are allowed for that view. For example,
a user may be allowed to read data but not to amend data. Alternatively, there may only be
access to a limited number of the tables in the database.
Developer Interface: The DBMS gives access to software tools for creating tables, and it provides
facilities for a programmer to develop a user interface.
Query: The DBMS provides a query processor that allows a query to be created and processed. The query
is the mechanism for extracting and manipulating data from the database.
Report: The other feature likely to be provided by the DBMS is the capability for creating a report
to present formatted output.
Database Administrator
The DBA is responsible for setting up the user and programmer views and for defining the
appropriate, specific access rights. An important feature of the DBMS is the data dictionary, which
is the part of the database that is hidden from everyone except the DBA. It contains
metadata about the data. This includes details of all the definitions of tables, attributes and so
on but also of how the physical storage is organized.
There are a number of features that can improve performance. Of special note is the capability
to create an index for a table. This is needed if the table contains a large number of attributes
and a large number of tuples. An index is a secondary table that is associated with an attribute
that has unique values. The index table contains the attribute values and pointers to the
corresponding tuples in the original table. The index can be on the primary key or on a secondary
key. Searching an index table is much quicker than searching the full table.
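The idea of an index table (attribute values plus pointers to tuples) can be sketched in Python with a dictionary that maps each key value to the tuple's position in a list. This illustrates the concept only, with hypothetical data; a real DBMS normally stores its indexes in structures such as B-trees, and most SQL DBMSs (including SQLite and MySQL) create one with a CREATE INDEX statement.

```python
# Hypothetical table: a list of tuples (ProductID, Name, Price); the list position acts as the "pointer".
products = [("P07", "Pen", 0.5), ("P02", "Ruler", 1.2), ("P11", "Stapler", 4.0)]

# Build the index: attribute value -> position of the tuple in the table.
index_on_id = {row[0]: pos for pos, row in enumerate(products)}


def find_with_index(product_id):
    pos = index_on_id.get(product_id)  # one lookup in the index
    return None if pos is None else products[pos]


def find_by_scan(product_id):
    for row in products:  # may have to check every tuple
        if row[0] == product_id:
            return row
    return None


print(find_with_index("P11"))  # ('P11', 'Stapler', 4.0)
```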
The integrity of the data in the database is a key concern. One potential cause of problems occurs
when a transaction is started but a system problem prevents its completion. The result would be
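a database in an undefined state. The DBMS should have a built-in feature that prevents this from
happening: a transaction is either completed in full or, if it cannot be completed, all of its changes
are undone (rolled back). As with all systems, regular backup is a requirement. The DBA will be responsible for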
backup of the stored data.
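The all-or-nothing behaviour of a transaction can be sketched with Python's sqlite3 module. The notes describe a system problem interrupting the transaction; this sketch substitutes a simpler cause, a hypothetical CHECK constraint (balances cannot go negative) that fails half-way through a transfer between two hypothetical accounts. The first update is undone, so the database is not left in an undefined state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (AccNo INTEGER PRIMARY KEY, Balance REAL CHECK (Balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()


def transfer(amount, src, dst):
    """Move money as one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE Account SET Balance = Balance + ? WHERE AccNo = ?", (amount, dst))
            conn.execute("UPDATE Account SET Balance = Balance - ? WHERE AccNo = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint failed part-way through
        return False


ok = transfer(500.0, 1, 2)  # account 1 cannot go negative, so the second update fails
balances = conn.execute("SELECT Balance FROM Account ORDER BY AccNo").fetchall()
print(ok, balances)  # False [(100.0,), (50.0,)]
```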
Data Definition Language (DDL) is a way to adjust the structure of a database. You might have
created databases in the past using a GUI such as Access or even MySQL. DDL allows you to create
databases from pure code including the ability to:
CREATE
You need to know what they all do (as listed above), though you only need to know how to
implement the CREATE TABLE command. Let's look at how we could have made the crooks
table that is used in the SQL examples later in these notes:
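A possible CREATE TABLE for the crooks table, run through Python's sqlite3 module. The column names come from the crooks data used in these notes; the data types and sizes are assumptions. SQLite accepts types such as VARCHAR(20) and DATE but does not enforce the length, so other DBMSs may behave more strictly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the crooks data in these notes; the types are assumptions.
conn.execute("""
    CREATE TABLE crooks (
        ID       INTEGER PRIMARY KEY,
        name     VARCHAR(20) NOT NULL,
        gender   VARCHAR(6),
        DoB      DATE,
        town     VARCHAR(30),
        numScars INTEGER
    )
""")
# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(crooks)")]
print(columns)
```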
ALTER
An ALTER statement in SQL changes the properties of a table in a relational database without
the need to access the table manually.
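A sketch of ALTER TABLE adding a column to an existing table, using Python's sqlite3 module. The nickname column is hypothetical. Existing rows are given NULL for the new column. In the flat-file approach, adding a field like this (compare the OnOrder example) meant rewriting the programs that read the file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crooks (ID INTEGER PRIMARY KEY, name TEXT, town TEXT)")
conn.execute("INSERT INTO crooks VALUES (3, 'Keith', 'Snape')")

# Add a new column without rebuilding the table or touching its rows by hand.
conn.execute("ALTER TABLE crooks ADD COLUMN nickname TEXT")

print(conn.execute("SELECT * FROM crooks").fetchall())  # [(3, 'Keith', 'Snape', None)]
```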
DROP
Dropping a table is like dropping a nuclear bomb. It is irreversible and is frowned upon in
modern society.
By running this line of code, the table "crooks" will be removed from the database with no
chance of it being recovered unless backups have been previously made.
Setting Primary Keys
Primary keys can be set when a table is created, or afterwards via the ALTER statement.
Where the constraint name would be UserId and the table's primary key would be made up of
the user_id and the username columns.
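The compound key described above can be sketched as a CREATE TABLE statement with a named constraint, run through Python's sqlite3 module. The column types and the usernames are assumptions. SQLite's ALTER TABLE cannot add a primary key constraint, so only the at-creation form is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        user_id  INTEGER NOT NULL,
        username VARCHAR(255) NOT NULL,
        CONSTRAINT UserId PRIMARY KEY (user_id, username)
    )
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO users VALUES (1, 'bob')")  # same user_id, different pair: allowed

try:
    conn.execute("INSERT INTO users VALUES (1, 'alice')")  # whole pair repeated
    repeated_pair_allowed = True
except sqlite3.IntegrityError:
    repeated_pair_allowed = False
print(repeated_pair_allowed)  # False
```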
This could also be done after table creation:
To help us understand how these things work we are going to use a test data set. Databases are
used in all areas of the computer industry, but for the moment we are going to use a dataset
that keeps track of crooks in England, noting names, gender, date of birth, towns and numbers
of scars. Take a look at the crooks data table below:
This would display all the results. But what if we just want to display the names and number of
scars of the female crooks?
name numScars
Jane 1
Kelly 10
Marea 6
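A query that produces this result, sketched with Python's sqlite3 module. The sample rows use only the values that appear in these notes; None (NULL) marks values the notes do not give, and the IDs of everyone except Keith, and the genders of Geoff and Oliver, are assumptions.

```python
import sqlite3

# Partial sample rows: (ID, name, gender, DoB, town, numScars). None = not given in the notes.
CROOKS = [
    (1, "Geoff",  "male",   "12/05/1982", None,          None),
    (2, "Jane",   "female", "05/08/1956", None,          1),
    (3, "Keith",  "male",   "07/02/1999", "Snape",       None),
    (4, "Oliver", "male",   "22/08/1976", None,          None),
    (5, "Kelly",  "female", "11/11/1911", None,          10),
    (6, "Marea",  "female", "14/07/1940", "Wythenshawe", 6),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crooks (ID INTEGER PRIMARY KEY, name TEXT, gender TEXT, "
             "DoB TEXT, town TEXT, numScars INTEGER)")
conn.executemany("INSERT INTO crooks VALUES (?, ?, ?, ?, ?, ?)", CROOKS)

# Only two columns are returned, and only for rows that meet the WHERE condition.
female = conn.execute(
    "SELECT name, numScars FROM crooks WHERE gender = 'female'").fetchall()
print(female)
```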
SELECT
The SELECT statement allows you to ask the database a question (query it) and specify what data
it returns. We might want to ask something like “Tell me the names and dates of birth of all the
crooks.” Of course, the database cannot understand this, so we need to put it into a language that a
computer can understand: Structured Query Language, or SQL for short:
name DoB
Geoff 12/05/1982
Jane 05/08/1956
Keith 07/02/1999
Oliver 22/08/1976
Kelly 11/11/1911
Marea 14/07/1940
But suppose we wanted to filter these results, for instance: “Tell me the ID, name and date of birth
of all the crooks who are male and come from Snape.” We need to add a WHERE clause to the query,
which allows us to give it some criteria (or conditions):
ID name DoB
3 Keith 07/02/1999
Say the police know that a crime has been committed by a heavily scarred woman (4+ scars)
and they want a list of all the heavily scarred women:
name town numScars
Marea Wythenshawe 6
However, the police want to quickly sort through and see who is the most heavily scarred. We
can add an ORDER BY clause:
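The heavily scarred query with an ORDER BY clause added, sketched with Python's sqlite3 module and the same partial sample rows as assumptions (None marks values the notes do not give). With these rows Kelly (10 scars) also meets both conditions, so she is listed first; her town is not given in the notes, so it shows as None.

```python
import sqlite3

# Partial sample rows: (ID, name, gender, DoB, town, numScars). None = not given in the notes.
CROOKS = [
    (1, "Geoff",  "male",   "12/05/1982", None,          None),
    (2, "Jane",   "female", "05/08/1956", None,          1),
    (3, "Keith",  "male",   "07/02/1999", "Snape",       None),
    (4, "Oliver", "male",   "22/08/1976", None,          None),
    (5, "Kelly",  "female", "11/11/1911", None,          10),
    (6, "Marea",  "female", "14/07/1940", "Wythenshawe", 6),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crooks (ID INTEGER PRIMARY KEY, name TEXT, gender TEXT, "
             "DoB TEXT, town TEXT, numScars INTEGER)")
conn.executemany("INSERT INTO crooks VALUES (?, ?, ?, ?, ?, ?)", CROOKS)

# Both conditions must be true; DESC puts the most scars first.
scarred = conn.execute("""
    SELECT name, town, numScars
    FROM crooks
    WHERE gender = 'female' AND numScars >= 4
    ORDER BY numScars DESC
""").fetchall()
print(scarred)  # [('Kelly', None, 10), ('Marea', 'Wythenshawe', 6)]
```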
INNER JOIN
We spoke earlier about how databases are great because they allow you to link tables together
and perform complex searches on linked data. So far, we have only looked at searching one table.
When you use a social network such as Facebook you can often see a list of all your friends in the
side bar as well as details from your record such as your name and place of work. How did they
find this data? They would have searched for all the relationships that involve your ID, returning
the names of people involved AND returned values such as job title from your personal record.
This looks like it needs two queries: one to return the relationship information and one to return
the personal record information. It would be possible to do this, but it's far easier to use one query for both things.
Take a look at this example. The police want to know the name and town of a criminal (ID = 45)
along with all the descriptions of crimes they have performed:
SELECT name, town, description --select things to return (from different tables)
FROM crooks, crime --name the tables that the data comes from
WHERE crooks.ID = crime.crimId --specify the link: dot notation means table.field; the IDs match
AND crooks.ID = 45 --specify which crook you are looking at
ORDER BY date ASC; --order the results with the oldest first
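The same query can be written with explicit INNER JOIN ... ON syntax, which matches the heading of this section. This sketch uses Python's sqlite3 module; the crimeNo key, the crook names, towns and crime rows are all hypothetical. Dates are stored as YYYY-MM-DD text so that sorting the text also sorts by date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crooks (ID INTEGER PRIMARY KEY, name TEXT, town TEXT);
CREATE TABLE crime (
    crimeNo     INTEGER PRIMARY KEY,            -- hypothetical key for the crime table
    crimId      INTEGER REFERENCES crooks(ID),  -- the crook who committed the crime
    description TEXT,
    date        TEXT                            -- YYYY-MM-DD, so text order = date order
);
INSERT INTO crooks VALUES (45, 'Sam', 'Hull'), (46, 'Lou', 'York');
INSERT INTO crime VALUES
    (1, 45, 'Shoplifting', '2019-03-02'),
    (2, 46, 'Burglary',    '2020-01-15'),
    (3, 45, 'Fraud',       '2017-11-30');
""")

joined = conn.execute("""
    SELECT crooks.name, crooks.town, crime.description
    FROM crooks
    INNER JOIN crime ON crooks.ID = crime.crimId  -- the link between the two tables
    WHERE crooks.ID = 45
    ORDER BY crime.date ASC
""").fetchall()
print(joined)  # [('Sam', 'Hull', 'Fraud'), ('Sam', 'Hull', 'Shoplifting')]
```

Both forms return the same rows; INNER JOIN keeps the link condition separate from the filter on crook 45, which many people find easier to read.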
INSERT
We might also want to add new things to our database, for example when we are adding new
criminal records or a new friendship link on Facebook. To add new records, we use the INSERT
command with values of the fields we want to insert:
Sometimes we might not want to insert all the fields, because some of them might not be compulsory:
INSERT INTO crooks (ID, name, town) --specific fields to insert into
VALUES (999, 'Frederick', 'Shotley')
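Both forms of INSERT, sketched with Python's sqlite3 module. The full-row values for ID 7 are hypothetical; the second statement is the one from the notes. Fields that are not supplied are left as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crooks (ID INTEGER PRIMARY KEY, name TEXT, gender TEXT, "
             "DoB TEXT, town TEXT, numScars INTEGER)")

# Every field supplied, in column order (hypothetical values).
conn.execute("INSERT INTO crooks VALUES (7, 'Dan', 'male', '01/01/1990', 'Ipswich', 2)")
# Only some fields supplied, as in the notes: the rest are left NULL.
conn.execute("INSERT INTO crooks (ID, name, town) VALUES (999, 'Frederick', 'Shotley')")

print(conn.execute("SELECT * FROM crooks WHERE ID = 999").fetchone())
# (999, 'Frederick', None, None, 'Shotley', None)
```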
DELETE
Sometimes you might fall out with friends on Facebook so that you don't even want them to see
your restricted page. You'd want to delete your relationship (it's actually likely that Facebook
retains the connection but flags it as 'deleted', but that isn't important here). The police might also
find that a crook is in fact innocent and then want to delete their criminal record. We need a
DELETE command to do these things: it permanently deletes records from a database.
Imagine that we find out that Geoff was framed and he is completely innocent:
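A sketch of the DELETE statement for this case, using Python's sqlite3 module with a cut-down, partly assumed crooks table (only Keith's ID of 3 comes from the notes). Deleting by the primary key (for example WHERE ID = 1) would be safer in practice, because two crooks could share a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crooks (ID INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO crooks VALUES (?, ?)",
                 [(1, "Geoff"), (2, "Jane"), (3, "Keith")])  # IDs 1 and 2 are assumed

# The WHERE clause limits the deletion; without it, every row would be deleted.
deleted = conn.execute("DELETE FROM crooks WHERE name = 'Geoff'").rowcount
remaining = [r[0] for r in conn.execute("SELECT name FROM crooks ORDER BY ID")]
print(deleted, remaining)  # 1 ['Jane', 'Keith']
```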