Database and Design by Bandeshah

This document discusses databases and database management systems (DBMS). It begins by explaining flat file databases, which store all information in a single file, and their limitations when handling large amounts of data. The document then introduces relational databases as an improvement, which split data across multiple tables that can be linked together. Key benefits of relational databases over flat files include avoiding data duplication, enabling complex queries, and providing better security and flexibility. The document also provides a brief overview of what a DBMS is and its role in handling the storage, retrieval, and updating of data.

HND with Bandeshah

HND Database & Design
DATABASE AND DATA MODELLING
Compiled by: Engr. Shahzadah Ashraf Bande’Shah

shahzadah.ashraf@gmail.com
03332076121

Follow on Facebook: https://www.facebook.com/shahzadah.ashraf
For more resources: http://www.bandashah.com

STUDENT'S NAME:
DATE ISSUED:
CONTACT NUMBER:

1.8.1 Database Management Systems (DBMS)

What Are Databases?

A database is a stored collection of information arranged in a logical, structured manner. Databases have
been a staple of business computing from the very beginning of the digital era.

Flat File

Originally, databases were flat. This means that the information was stored in one long text file, called
a delimited file. Each entry in the delimited file is separated from the next by a special character, such
as a vertical bar (|). Each entry contains multiple pieces of information (fields) about a particular object
or person, grouped together as a record; here the fields within a record are separated by commas. For example:

Lname, FName, Age, Salary|Smith, John, 35, $280|Doe, Jane, 28, $325|Brown, Scott, 41,
$265|Howard, Shemp, 48, $359|Taylor, Tom, 22, $250
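
As a rough sketch of how a program might read the flat file above, the text can be split into records on the vertical bar and then into fields on the comma. Python is used here purely for illustration, and the function name parse_flat_file is an assumption rather than part of any database product.

```python
# The sample flat file from the text: records separated by '|',
# fields within a record separated by ','.
FLAT_FILE = (
    "Lname, FName, Age, Salary|Smith, John, 35, $280|Doe, Jane, 28, $325|"
    "Brown, Scott, 41, $265|Howard, Shemp, 48, $359|Taylor, Tom, 22, $250"
)


def parse_flat_file(text):
    """Split the text into records, then each record into named fields."""
    rows = [[field.strip() for field in record.split(",")]
            for record in text.split("|")]
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


records = parse_flat_file(FLAT_FILE)

# Every question asked of the data is a scan of the whole file,
# for example the surnames of everyone older than 30.
over_30 = [r["Lname"] for r in records if int(r["Age"]) > 30]
print(over_30)
```

Note that the first record is treated as the list of field names, and that every value (including Age and Salary) is stored as plain text until the program converts it.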

Flat File Database

When there is only a single table in the database, this is called a 'flat file database'.

A flat file database looks something like this:

ID  Title  First name  Surname  Address         City    Postcode  Telephone
1   Mr     Tom         Smith    42 Mill Street  London  WE13GW    010344044
2   Mrs    Sandra      Jones    10 Low Lane     Hull    HU237HJ   022344033
2   Mr     John        Jones    10 Low Lane     Hull    HU237HJ   022344033

A flat file database is an excellent way of storing a relatively small number of records (a few thousand,
perhaps).

For example, a spreadsheet application such as Excel can be used as a flat file database. Each row in a
worksheet can be a record and each column a field. The worksheet is effectively a table.

Features Of Flat File Database:

Placing data in a flat file database has the following advantages:

- All records are stored in one place
- Easy to set up using a number of standard office applications
- Easy to understand
- Simple sorting of records can be carried out
- Records can be viewed or extracted on the basis of simple criteria

Everyday things like business contacts, customer lists and so on can be stored and used in a flat file
database.

But they do have some serious disadvantages when it comes to more than a few thousand records.

Disadvantages

1. Potential Duplication
As more and more records are added to the database it becomes difficult to avoid duplicate records. This is
because there is no mechanism built in to the system to prevent duplication. Later you will see how
'primary keys' are used to prevent this.

2. Non-Unique Records

Notice that Mr. & Mrs. Jones have identical IDs. This is because the person producing this database
decided they may want to sort on identical telephone numbers and so has applied the same ID to the two
records. This is fine for that purpose, but suppose you only wanted to extract Mrs. Jones' record. Now it is
much more difficult, because the ID no longer identifies a single record.

3. Harder To Update

Suppose that this flat file database also stored each person's workplace details - this will result in
multiple records for each person. Again, this is fine - but suppose Sandra Jones now wanted to be known as
'Sandra Thompson' after re-marrying? The change would have to be made in potentially many records, and so
flat file updates are more error-prone than other methods.

4. Inherently Inefficient
Consider a situation where the database now needs an extra field to hold each person's email address. If
there are tens of thousands of records, many people may have no email address, but each
record in a flat file database has to have the same fields, whether they are used or not. Other methods
avoid this wasted storage.

5. Harder To Change Data Format


Suppose the telephone numbers now have to have a dash between the area code and the rest of the
number, like this 0223-44033. Adding that extra dash over tens of thousands of records would be a
significant task in a flat file database.

6. Poor At Complex Queries


If we wanted to find all records with a specific telephone number, this is a simple single-field criterion
that a flat file can easily deal with. But now suppose we wanted all people living in Hull who share the same
surname and a similar postcode? The criteria can quickly become too complex for a flat file to manage.

7. Poor At Limiting Access


Suppose this flat file database held a confidential field in each record that only certain staff are allowed to
see - perhaps salaries. This is difficult to achieve in a flat file database - once a person has entered a valid
password to gain access, that person is able to see everything.

Because of these limitations other types of database have been developed.

Relational Database

To overcome the limitations of a simple flat file database that has only a single table, another type of
database has been developed called a 'relational database'.

A relational database holds its data over a number of tables instead of one. Records within the tables are
linked (related) to records held in other tables.

The picture below shows two tables. The main one is called 'customers'. This contains almost the same
fields as we have seen in the flat file database. But there is one key difference - the city is now held in a
separate table called 'city'. The line between them shows there is a link (relationship) between a record in
the city table and records in the main table.

The line between the fields has a '1' on one side and the infinity sign on the other. In Access this indicates a
'one-to-many' relationship. This is described in more detail under Entity Relationship Diagrams.
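
The two tables described above can also be sketched in code. This is a minimal illustration using Python's built-in sqlite3 module rather than Access; the table and column names (city_id, first_name and so on) are assumptions, and the two Jones records are given distinct IDs so that each row is unique.

```python
import sqlite3

# Hypothetical two-table layout based on the description above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (
        city_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE customers (
        id         INTEGER PRIMARY KEY,
        title      TEXT,
        first_name TEXT,
        surname    TEXT,
        city_id    INTEGER REFERENCES city(city_id)  -- the 'many' side
    );
    INSERT INTO city VALUES (1, 'London'), (2, 'Hull');
    INSERT INTO customers VALUES
        (1, 'Mr',  'Tom',    'Smith', 1),
        (2, 'Mrs', 'Sandra', 'Jones', 2),
        (3, 'Mr',  'John',   'Jones', 2);
""")

# One city record is linked to many customer records.
hull_customers = conn.execute("""
    SELECT customers.first_name, customers.surname, city.name
    FROM customers JOIN city ON customers.city_id = city.city_id
    WHERE city.name = 'Hull'
    ORDER BY customers.id
""").fetchall()
print(hull_customers)
```

The word 'Hull' is stored once, in the city table; each customer record only stores the number 2 that links to it.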

A small relational database may only contain two tables whilst a large corporate database could contain
hundreds of tables.

Advantages Of A Relational Database

Splitting data into a number of related tables brings many advantages over a flat file database. These
include:

1. Data Is Only Stored Once

In the previous example, the city data was gathered into one table so now there is only one record per
city. The advantages of this are:

- No multiple record changes needed
- More efficient storage
- Simple to delete or modify details
- All records in other tables having a link to that entry will show the change
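
The last point in the list above can be illustrated with a small sketch. It assumes Python's sqlite3 module as the DBMS, and the table layout and the new city name are made up for the example: one update to the single city record is seen by every customer record linked to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (city_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customers (
        id      INTEGER PRIMARY KEY,
        surname TEXT,
        city_id INTEGER REFERENCES city(city_id)
    );
    INSERT INTO city VALUES (2, 'Hull');
    -- Hypothetical sample records, all linked to city 2.
    INSERT INTO customers VALUES (2, 'Jones', 2), (3, 'Jones', 2), (4, 'Smith', 2);
""")

# One change to the single city record...
changed = conn.execute(
    "UPDATE city SET name = 'Kingston upon Hull' WHERE city_id = 2"
).rowcount

# ...is seen by every customer record linked to it.
names = [row[0] for row in conn.execute(
    "SELECT city.name FROM customers JOIN city USING (city_id)"
)]
print(changed, names)
```

In a flat file the same change would have to be repeated in every record that mentions the city.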

2. Complex Queries Can Be Carried Out

A language called SQL (Structured Query Language) has been developed to allow programmers to 'Select',
'Insert', 'Update' and 'Delete' records, and to 'Create' and 'Drop' tables. Select, Update and Delete
statements can be further refined by a 'Where' clause. For example:

SELECT * FROM Customer WHERE ID = 2

This SQL statement will extract the record whose ID is 2 from the Customer table. Far more complicated
queries can be written that can extract data from many tables at once.

3. Better Security

By splitting data into tables, certain tables can be made confidential. When a person logs on with their
username and password, the system can then limit access only to those tables whose records they are
authorized to view. For example, a receptionist would be able to view employee location and contact details
but not their salary. A salesman may see his team's sales performance but not that of competing teams.

4. Cater For Future Requirements

By having data held in separate tables, it is simple to add records that are not yet needed but may be in the
future. For example, the city table could be expanded to include every city and town in the country, even
though no other records are using them all as yet. A flat file database cannot do this.

Summary - advantages of a relational database over flat file

- Avoids data duplication
- Avoids inconsistent records
- Easier to change data
- Easier to change data format
- Data can be added and removed easily
- Easier to maintain security

DBMS (Database Management System)

A DBMS is software that handles the storage, retrieval, and updating of data in a computer system.

A DBMS provides an environment that is both convenient and efficient to use when there is a large volume
of data and many transactions to be processed. Different categories of DBMS can be used, ranging from
small systems that run on personal computers to huge systems that run on mainframes.

Benefits of DBMS:

A DBMS is responsible for processing data and converting it into information. For this purpose, the database
has to be manipulated, which includes querying the database to retrieve specific data, updating the
database, and finally, generating reports.

These reports are the source of information, that is, processed data. A DBMS is also responsible for data
security and integrity.

The benefits of a typical DBMS are as follows:

Data Storage

A DBMS handles the physical storage of data by creating the complex data structures required. This
process is called data storage management.

Data Definition

A DBMS provides functions to define the structure of the data in the application. These include defining and
modifying the record structure, the type and size of fields, and the various constraints/conditions to be
satisfied by the data in each field.

Data Manipulation

Once the data structure is defined, data needs to be inserted, modified, or deleted. The functions that
perform these operations are also part of a DBMS. These functions can handle planned and unplanned data
manipulation needs. Planned queries are those that form part of the application. Unplanned queries are
ad-hoc queries, which are performed as the need arises.

Data Security And Integrity

Data security is of utmost importance when there are multiple users accessing the database. It is required
for keeping a check on data access by users. The security rules specify which users have access to the
database, what data elements each user has access to, and the data operations that the user can perform.
Data in the database should contain as few errors as possible. For example, the employee number for a new
employee should not be left blank, and a telephone number should contain only digits. Such checks are
taken care of by a DBMS.
Thus, the DBMS contains functions that handle the security and integrity of data in the application. These
can be easily invoked by the application and hence the application programmer need not code these
functions in the programs.
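
The two integrity checks mentioned above can be sketched as constraints that the DBMS enforces itself. This is only an illustration using Python's sqlite3 module; the employee table, its column names and the helper try_insert are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical employee table: the DBMS itself rejects a blank employee
# number and a telephone number containing anything other than digits.
conn.execute("""
    CREATE TABLE employee (
        employee_no TEXT NOT NULL CHECK (employee_no <> ''),
        name        TEXT NOT NULL,
        telephone   TEXT CHECK (telephone NOT GLOB '*[^0-9]*')
    )
""")


def try_insert(row):
    """Return True if the DBMS accepts the row, False if a check fails."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(("E001", "Tom Smith", "010344044")),    # valid record
    try_insert((None, "Sandra Jones", "022344033")),   # blank employee number
    try_insert(("E003", "John Jones", "0223-44033")),  # non-digit in telephone
]
print(results)
```

Because the rules live in the table definition, every program that writes to the table is checked in the same way without repeating the validation code.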

Data Recovery And Concurrency

Recovery of data after a system failure and concurrent access of records by multiple users are also handled
by a DBMS.
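
One part of recovery is the ability to undo a half-finished change. The sketch below assumes Python's sqlite3 module and a made-up account table; it simulates a failure part-way through a transaction and shows that the DBMS rolls the data back to its last consistent state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

try:
    # 'with conn' commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE account SET balance = balance - 70 WHERE name = 'A'")
        raise RuntimeError("simulated failure before the matching credit to B")
except RuntimeError:
    pass

# The debit from A was undone, so no money has gone missing.
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```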

Performance

Optimizing the performance of the queries is one of the important functions of a DBMS. Hence, the DBMS
has a set of programs forming the Query Optimizer, which evaluates the different implementations of a
query and chooses the best among them.

Multi-User Access Control

At any point of time, more than one user can access the same data. A DBMS takes care of the sharing of data
among multiple users, and maintains data integrity.

Database Access Languages And Application Programming Interfaces (APIs)

The query language of a DBMS implements data access. SQL is the most commonly used query language. A
query language is a non-procedural language: the user states what is required and need
not specify how it is to be done. Procedural languages such as C, Visual Basic, Pascal, and others
provide data access to programmers through APIs and other tools.
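
As a small sketch of this split, the procedural code below hands a non-procedural SQL query to the DBMS through an API. Python's standard sqlite3 module is used as the API here; the Customer table contents and the function name find_customer are assumptions for the example.

```python
import sqlite3  # Python's standard database API module for SQLite

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Surname TEXT)")
conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [(1, "Smith"), (2, "Jones")])


def find_customer(customer_id):
    """Procedural code says when to ask; the SQL only says what is wanted."""
    cursor = conn.execute("SELECT * FROM Customer WHERE ID = ?", (customer_id,))
    return cursor.fetchone()


print(find_customer(2))
```

The '?' placeholder lets the API pass the value to the DBMS separately from the SQL text, which is the usual way to supply user input to a query.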

Data Modelling

Conceptual, logical and physical schema

This shows that a data model can be an external model (or view), a conceptual model, or a physical model.
This is not the only way to look at data models, but it is a useful way, particularly when comparing models.

Conceptual Schema

Describes the semantics of a domain (the scope of the model). For example, it may be a model of the
interest area of an organization or of an industry. This consists of entity classes, representing kinds of
things of significance in the domain, and relationship assertions about associations between pairs of
entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using
the model. In that sense, it defines the allowed expressions in an artificial "language" with a scope that is
limited by the scope of the model. Simply described, a conceptual schema is the first step in organizing the
data requirements.

Logical Schema

Describes the structure of some domain of information. This consists of descriptions of (for example)
tables, columns, object-oriented classes, and XML tags. The logical schema and conceptual schema are
sometimes implemented as one and the same.

Physical Schema

Describes the physical means used to store data. This is concerned with partitions, CPUs, table spaces, and
the like.

According to ANSI, this approach allows the three perspectives to be relatively independent of each other.
Storage technology can change without affecting either the logical or the conceptual schema. The
table/column structure can change without (necessarily) affecting the conceptual schema. In each case, of
course, the structures must remain consistent across all schemas of the same data model.

Database Schema

A database schema is the skeleton of a database. It is designed before the database exists, and it is very
hard to change once the database is operational. A database schema does not contain any data or
information.

A database schema defines the database's entities and the relationships among them. It is a descriptive
detail of the database, which can be depicted by means of schema diagrams. The database designer carries
out this work to help programmers understand all aspects of the database.

Database Schema Can Be Divided Broadly In Two Categories:

Physical Database Schema: This schema pertains to the actual storage of data and its form of storage, such
as files, indices etc. It defines how the data will be stored in secondary storage.

Logical Database Schema: This defines all logical constraints that need to be applied to the data stored. It
defines tables, views, integrity constraints etc.

Developer Environment And Query Processor

Every database software package provides an interface for designing schemas and for manipulating data
through its query processor.

1.8.2 Relational Database Modelling

RDBMS (Relational Database Management System)

A relational database consists of at least two tables along with a definition of the relationships between
the two tables.

At this level of complexity, it is not straightforward to keep the data consistent. The term used is 'data
integrity'. For example, if a record in one table is deleted, then a related record in the other table may have
to be deleted as well because they share a relationship. If the deletion is not done properly then you end
up with 'orphan' records that have no relevance and are just taking up storage.
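
A short sketch of how a DBMS can prevent orphan records is given here. It assumes Python's sqlite3 module, and the customer and orders tables are made up for the example. The foreign key is declared with ON DELETE CASCADE, so deleting a customer also deletes that customer's orders, and an order for a non-existent customer is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, surname TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
            REFERENCES customer(id) ON DELETE CASCADE
    );
    INSERT INTO customer VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Deleting customer 1 also removes orders 10 and 11, so no orphans remain.
conn.execute("DELETE FROM customer WHERE id = 1")
remaining = conn.execute("SELECT order_id, customer_id FROM orders").fetchall()

# An order that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

print(remaining, orphan_allowed)
```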

Another problem might be trying to insert data into a field of the wrong data type; for example, inserting a
string into a Boolean field will cause an error.
What is needed is a piece of software that can keep track of all the rules implicit in the database and
maintain data integrity.

This software is called a 'Relational Database Management System' or RDBMS.

The prime purpose of a relational database management system is to maintain data integrity. This means
all the rules and relationships between data are consistent at all times.
But a good DBMS will have other features as well.
These include:

- A command language that allows you to create, delete and alter the database (data description
  language or DDL)
- A way of documenting all the internal structures that make up the database (data dictionary)
- A language to support the manipulation and processing of the data (data manipulation language or DML)
- The ability to view the database from different viewpoints according to the requirements of
  the user
- Some level of security and access control to the data

The simplest RDBMS may be designed with a single user in mind, e.g. the database is 'locked' until that
person has finished with it. Such an RDBMS will only cost a few hundred pounds at most and will have only a
basic capability.

On the other hand, an enterprise-level DBMS can support a huge number of simultaneous users, with
thousands of internal tables and complex 'roll back' capabilities should things go wrong.

This kind of system will cost thousands and will also need professional database
administrators to look after it and database specialists to create complex queries for management and
staff.

Data Dictionary:

A 'data dictionary' describes the structure and attributes of data 'items' to be used within a software
application (usually a database).

A data dictionary includes the names and descriptions of the tables and the fields contained in each table.
It also documents information about the data type, field length and other things such as validation.
The main purpose of the data dictionary is to provide metadata, or information about data. Technically, it
is a database about a database.
There is no one set standard in terms of layout or the level of detail to which a data dictionary should be
written.
Software development teams need a comprehensive data dictionary to refer to during the development
and maintenance of a new database. This is so that they are all working using the same data formats when
reading or writing data.
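
The idea of a 'database about a database' can be seen in practice, because most DBMSs keep their own catalogue of tables and fields. The sketch below assumes Python's sqlite3 module, where the catalogue is the sqlite_master table and the PRAGMA table_info command; other products use different names, and the Customer table here is only an example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        Surname    VARCHAR(15) NOT NULL,
        FirstName  VARCHAR(15)
    )
""")

# sqlite_master lists the tables; PRAGMA table_info describes their fields.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# For each field: (name, data type, must be present?, part of primary key?)
dictionary = {
    table: [(name, col_type, bool(notnull), bool(pk))
            for _cid, name, col_type, notnull, _default, pk
            in conn.execute(f"PRAGMA table_info({table})")]
    for table in tables
}
print(dictionary)
```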

Typical Data Dictionary


When putting together a data dictionary for a database project, the analyst considers what kind of
information needs to be documented.

The table below shows some of them:

Item                          Description

Table name                    The unique name of every table in the database
Field names                   List of every field name in the database
Field data type               The data type allocated for a particular field, for example,
                              text, date, Boolean
Field length                  The number of elements that have been allocated for storing
                              data in that field e.g. Integer(11) or Varchar(50)
Default value of fields       i.e. what will automatically appear in the field of any new
                              record
Field validation              Presence checks, lookup, range checks, picture check which
                              are applied to a field
Keys                          The primary and foreign keys for each table
Relationships                 e.g. one-to-many
Indexes                       Any field that has been indexed to improve search speed
Access rights or permissions  i.e. who can change / edit / modify / read only
for the database

Data Dictionary Example


The table below is an example of a typical data dictionary entry. The IT staff uses this to develop and
maintain the database.

Field Name     Data Type   Other information

CustomerID     Autonumber  Primary key field
Title          Text        Lookup: Mr, Mrs, Miss, Ms
                           Field size 4
Surname        Text        Field size 15
                           Indexed
FirstName      Text        Field size 15
DateOfBirth    Date/Time   Format: Medium Date
                           Range check: >=01/01/1930
HomeTelephone  Text        Field size: 12
                           Presence check
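
As a sketch of how a developer might turn the data dictionary entry above into a real table, the example below uses Python's sqlite3 module. The mapping from Access-style types to SQL is an assumption: Autonumber becomes an auto-incrementing integer, the lookup and range check become CHECK constraints, the presence check becomes NOT NULL, and dates are stored as ISO text. SQLite does not enforce the VARCHAR sizes, so they act only as documentation here. The sample rows and the helper accepted are made up for testing the rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        CustomerID    INTEGER PRIMARY KEY AUTOINCREMENT,     -- Autonumber
        Title         VARCHAR(4)
                      CHECK (Title IN ('Mr', 'Mrs', 'Miss', 'Ms')),  -- lookup
        Surname       VARCHAR(15),
        FirstName     VARCHAR(15),
        DateOfBirth   TEXT CHECK (DateOfBirth >= '1930-01-01'),  -- range check
        HomeTelephone VARCHAR(12) NOT NULL                       -- presence check
    );
    CREATE INDEX idx_customer_surname ON Customer (Surname);     -- Indexed
""")


def accepted(row):
    """Return True if the row passes every rule in the table definition."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Customer (Title, Surname, FirstName, DateOfBirth,"
                " HomeTelephone) VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


checks = [
    accepted(("Mr", "Smith", "Tom", "1975-06-01", "010344044")),     # valid
    accepted(("Dr", "Jones", "Sandra", "1980-02-14", "022344033")),  # bad title
    accepted(("Mrs", "Jones", "Sandra", "1929-12-31", "022344033")), # too early
    accepted(("Mrs", "Jones", "Sandra", "1980-02-14", None)),        # no phone
]
print(checks)
```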

Terminology Associated With A Relational Database Model

There are a lot of new terms to learn when you begin to cover the database section at AS level. Some of
these terms that you must understand and be able to use are entities and attributes. You must also
understand how to draw and interpret an entity relationship (E-R) diagram.

Entities

An Entity is a person, place, thing or concept about which data can be collected. Examples include
EMPLOYEE, HOUSE, and CAR

A database contains one or more related tables.


Each table holds all of the information about an object, person or thing.

Some examples of database tables might be:

- a customer table

- an appointments table

- an exam sessions table

- a teachers' names table

- a concert venue table

Attributes

An attribute describes the facts, details or characteristics of an entity


Remember that an entity is a person, place, thing or concept about which data can be collected.
Each entity is made up of a number of 'attributes' which represent that entity.

Example

Database Relationships

Entity Relationship Diagrams

These relationships can be shown in the form of a diagram. This diagram is known as an 'entity
relationship diagram', E-R diagram or ERD.
As part of your exam, you will probably have to either draw or interpret an E-R diagram. Before
you can do this, you need to be able to interpret the relationships between the entities.
These relationships take the form of:

- One-to-one
- One-to-many
- Many-to-many
Name          Description

Many to Many  An Author can write several Books, and a Book can be written by several Authors
One to Many   A Biological Mother can have many children, and a child can have only one
              Biological Mother
One to One    A Country has only one Capital City, and a Capital City is in only one Country

Changing A Many-To-Many Into A One-To-Many Relationship

A many-to-many relationship is not a good idea when designing database relationships.

To overcome this, an extra entity is usually added to the database design, which then allows the
relationship to become a many-to-one or a one-to-many relationship.
Remember our example on the previous page.

A third entity can be added called 'rentals'.

Thus the relationship becomes:

- Customers can have many different rentals
- A rental belongs to one customer

AND

- A video can be rented many times
- Many rentals can contain that video
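
The customers, videos and rentals arrangement above might be set up as in the sketch below. It assumes Python's sqlite3 module; the column names and the sample rows ('Video A', 'Video B' and so on) are made up for illustration. Each rental holds one foreign key to a customer and one to a video, which gives two one-to-many relationships in place of the many-to-many one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE videos    (video_id    INTEGER PRIMARY KEY, title TEXT);
    -- The extra 'rentals' entity sits on the 'many' end of both relationships.
    CREATE TABLE rentals (
        rental_id   INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        video_id    INTEGER NOT NULL REFERENCES videos(video_id)
    );
    INSERT INTO customers VALUES (1, 'Tom'), (2, 'Sandra');
    INSERT INTO videos VALUES (1, 'Video A'), (2, 'Video B');
    INSERT INTO rentals VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")

# A customer can have many rentals.
rentals_per_customer = dict(conn.execute(
    "SELECT name, COUNT(*) FROM customers JOIN rentals USING (customer_id)"
    " GROUP BY name"))

# A video can be rented many times.
rentals_per_video = dict(conn.execute(
    "SELECT title, COUNT(*) FROM videos JOIN rentals USING (video_id)"
    " GROUP BY title"))

print(rentals_per_customer, rentals_per_video)
```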

Designing Relationships

When designing or solving relationships for an E-R diagram, it is helpful to remember the following
information:

- The 'Many' side usually holds the foreign key
- The 'One' side usually holds the primary key

Before you design or set up a database, you should work out:

- The entities
- The attributes
- The entity relationships

This process is called 'data modelling'

Database Keys

We have discussed flat-file and relational databases.

A fundamental feature of a database is the ability to point to any given record. After all, what is the use of
a database if you can't get to the right records?
With this in mind, the concept of a 'key' was developed. The types of key covered here are:

- Primary key
- Secondary key
- Foreign key
- Compound primary Key

Primary Key

Primary keys are an essential part of relational database design.


The 'primary key' of a table uniquely identifies each record.

Many primary keys are single field values but more complex situations may use several attributes to define
a primary key. Then it is called a 'compound primary key'.

*Additional point

Primary keys do not have to be numbers; they can be anything that makes each record unique.
For example, a username-password table might have a fixed-length 'hash' value as its primary key, like this:

ID           username  password
1dab223bffh  joe       ninety
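
The sketch below shows a DBMS enforcing both kinds of primary key described above: a single-field key that is not a number, and a compound key made of two fields. It assumes Python's sqlite3 module; the table layouts, the second user and the helper rejected are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id       TEXT PRIMARY KEY,   -- a primary key need not be a number
        username TEXT,
        password TEXT
    );
    -- Compound primary key: the pair (venue, artist) identifies a row.
    CREATE TABLE concert (
        venue TEXT, artist TEXT, attendance INTEGER,
        PRIMARY KEY (venue, artist)
    );
    INSERT INTO users VALUES ('1dab223bffh', 'joe', 'ninety');
    INSERT INTO concert VALUES ('Wembley', 'Girls Aloud', 53000);
""")


def rejected(sql, params):
    """Return True if the DBMS refuses the insert as a duplicate key."""
    try:
        with conn:
            conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


dup_user = rejected("INSERT INTO users VALUES (?, ?, ?)",
                    ("1dab223bffh", "ann", "secret"))
dup_concert = rejected("INSERT INTO concert VALUES (?, ?, ?)",
                       ("Wembley", "Girls Aloud", 100))
# Same artist at a different venue: the pair is new, so it is allowed.
new_concert = rejected("INSERT INTO concert VALUES (?, ?, ?)",
                       ("NEC", "Girls Aloud", 76090))
print(dup_user, dup_concert, new_concert)
```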

Foreign Key

A foreign key points to a record in another table whose primary key is the same value.
What is happening is that a copy of the primary key value in the City table is being stored in the customers
table. For example, a typical record entry looks like this:

Mr. & Mrs. Jones live in the city of Hull and the foreign key in each record contains '2'. This then points to
'Hull' in the city table whose primary key is also '2'.

Note that it’s quite common to have more than one record having the same foreign key.

This simply reflects the fact that a one-to-many relationship exists between the two tables.

Secondary Key

The primary key is the main way that records within a table are identified and sorted. But it is also useful
to be able to select certain categories of records without having to use the primary key.

For example, with the customer records shown above, perhaps a useful way to view records would be by
title. In that case a 'secondary key' is set up within the database. This secondary key indexes the records
according to title.

The good thing about secondary keys is that the secondary index allows often-queried categories of records
to be extracted far more quickly. But the downside is that the indexes could make the database significantly
larger. So the database analyst has to carefully weigh up the benefit of adding secondary keys to the
database.

Secondary keys do not have to be single fields; they can be a combination of fields. For example, a
secondary key could index records based on title and surname, so it would be a really quick query to extract
all people called 'Mr. Jones'.
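
A secondary key on title and surname might look like the sketch below. It assumes Python's sqlite3 module, where a secondary key is created with CREATE INDEX; the index name idx_title_surname and the helper plan are made up for the example. SQLite's EXPLAIN QUERY PLAN command is used to ask the DBMS how it intends to run the query before and after the index exists. The exact wording of the plan differs between SQLite versions, so it is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer (
    id INTEGER PRIMARY KEY, title TEXT, first_name TEXT, surname TEXT)""")
conn.executemany(
    "INSERT INTO customer (title, first_name, surname) VALUES (?, ?, ?)",
    [("Mr", "Tom", "Smith"), ("Mrs", "Sandra", "Jones"), ("Mr", "John", "Jones")])

QUERY = "SELECT first_name FROM customer WHERE title = 'Mr' AND surname = 'Jones'"


def plan(sql):
    """Return SQLite's description of how it would run the query."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(QUERY)   # without the index the whole table is scanned
conn.execute("CREATE INDEX idx_title_surname ON customer (title, surname)")
plan_after = plan(QUERY)    # with the index the DBMS can search it directly
result = conn.execute(QUERY).fetchall()

print(plan_before)
print(plan_after)
print(result)
```

With only three records the speed difference is invisible; the benefit, and the extra storage cost, only matter on large tables.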

Normalization
Desired characteristics of a database include it being efficient in terms of storage and easy to maintain.
The first point, of storage, means redundant data should be avoided and the second point, of maintenance,
means that a good design will logically separate data into tables.

Normalization is a design method that can be used to achieve this.


What is normalization?

“a technique for designing relational database tables to minimize duplication of information and, in so
doing, to safeguard the database against certain types of logical or structural problems” (wikipedia.com)

Normalization provides rules that help:

- Organize the data efficiently.


- Eliminate redundant data.
- Ensure that only related data are stored in a table.

First Normal Form - 1NF

For a database to be in first normal form (1NF), the following rules have to be met for each table in
the database

- There are no columns with repeated or similar data


- Each data item cannot be broken down any further.
- Each row is unique i.e. it has a primary key
- Each field has a unique name

'Atomic' is the word used to describe a data item that cannot be broken down any further.

Which of these tables are NOT in first normal form?

1.

Title  Firstname  Surname  Full name  Address         City    Postcode
Mr     Tom        Smith    Tom Smith  42 Mill Street  London  WE13GW

2.

ID    IP Address   username  last accessed   Activity   Result   active
1003  198.168.1.5  Smith     20081021:14.10  Save file  success  y

3.

ItemID  Product  Description  Size  Colour  Colour  Colour
234     Shoe     High Heel    6     red     blue    brown

4.

StudentID  Firstname  Surname  SchoolID*  ClassID*
354        Tom        Smith    6          5F

Comments:

Table 1. This is not in 1NF. There is no primary key defined and so this record cannot be guaranteed to be
unique. Also, Full name is redundant, as it is simply a combination of Firstname and Surname, and so it is
not atomic.

Table 2. This is in at least 1NF. It has a primary key identified by the underline. The data is atomic. Each
field has a unique name. There is no repeated data.

Table 3. This is not in 1NF. It has a primary key, so it passes that test, and the data is atomic, but
the colour the shoe can come in is being repeated. Furthermore, the repeated fields have the same name.

Table 4. This is in 1NF as it meets all the rules for the first normal form.

Questions to ask yourself to spot 1NF:

- Does it have a primary key?
- Is each field name unique?
- Is the data atomic?
- Are there repeating / redundant fields?

1NF Examples 2

Suppose a designer has been asked to compile a database for a fan club web site. Fans visit the web site to
find like-minded friends.
The entities to be stored are BAND and FAN.
Each band has many fans. Each person is a fan of only one group.

The attributes of band are:

- BandID
- band name
- musictype

The attributes of a fan are:

- FanID
- firstname
- surname
- email address(es)

The database needs to be in first normal form.

First Attempt

This is the first time this person has designed a database and he is not really sure how to go about it. He
designs the FAN table and loads it with the following records:

FanID  Firstname  Surname  BandID*  email
1      Tom        Smith    23       [email protected]
2      Mary       Holden   56       [email protected] , [email protected]

He has correctly set up a primary key. He also used a foreign key to refer to the band. But this is not in 1NF
because Mary has two email addresses loaded into the email field. The data is not atomic. Loading data in
this way is also going to make it very difficult to extract email addresses. Also, the email field now has
to be long enough to handle many email addresses. This is very inefficient and would be a
problem if the length were exceeded.

Second Attempt

He soon realizes this is not a good idea. So then he decides to create two email fields:

FanID  Firstname  Surname  BandID*  email              email2
1      Tom        Smith    23       [email protected]
2      Mary       Holden   56       [email protected]  [email protected]

This is also a poor approach. Note that email2 is not being used in Tom's record and so is causing wasted
storage, so the table is not in 1NF, which seeks to avoid wasted / redundant data.
Another problem is what happens if a fan has many more emails. Adding more and more email fields will make
the wasted storage even worse.
Another problem is that the query to extract email addresses is far more complex than it needs to be, as it
has to examine each email field.

Solution

After trying out various ideas, he comes up with a good solution - create another entity called 'email' and
use a foreign key in that table to link the fan and email tables together. The ER diagram is as follows:

The ER diagram shows that each fan can have many emails, but an email can only belong to one fan.
The FAN and EMAIL table now look like this

FAN

FanID  Firstname  Surname  BandID*
1      Tom        Smith    23
2      Mary       Holden   56

EMAIL

EID  FanID*  email
1    1       [email protected]
2    2       [email protected]
3    2       [email protected]

Mary (FanID = 2) has two entries in the email table. There is no problem adding even more emails for her.
Extracting emails is now simple as there is only one email column. There is no wasted storage.
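
The FAN and EMAIL tables above could be created as in this sketch, which assumes Python's sqlite3 module. The email addresses are placeholders invented for the example, and the BAND table that BandID refers to is left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE fan (
        FanID INTEGER PRIMARY KEY, Firstname TEXT, Surname TEXT, BandID INTEGER
    );
    CREATE TABLE email (
        EID   INTEGER PRIMARY KEY,
        FanID INTEGER NOT NULL REFERENCES fan(FanID),
        email TEXT NOT NULL
    );
    INSERT INTO fan VALUES (1, 'Tom', 'Smith', 23), (2, 'Mary', 'Holden', 56);
    -- Placeholder addresses, made up for this example.
    INSERT INTO email VALUES
        (1, 1, 'tom@example.com'),
        (2, 2, 'mary@example.com'),
        (3, 2, 'mary.h@example.org');
""")

# One simple query works however many addresses a fan has.
emails_per_fan = dict(conn.execute("""
    SELECT fan.Firstname, COUNT(email.EID)
    FROM fan LEFT JOIN email ON email.FanID = fan.FanID
    GROUP BY fan.FanID
"""))
mary_emails = [row[0] for row in conn.execute(
    "SELECT email FROM email WHERE FanID = 2 ORDER BY EID")]
print(emails_per_fan, mary_emails)
```

Adding a third address for Mary is just one more row in the email table; no field has to be widened and no new column is needed.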

The tables are now in first normal form (1NF) as they obey the following rules

- Each table has a primary key


- Each field name is unique
- Data is atomic
- No repeating / redundant fields

Second Normal Form 2NF

Most tables tend to have a single-attribute (i.e. simple) primary key, like this:

CUSTOMER

ID  Firstname  Surname  Telephone  email
2   Tom        Smith    22323      [email protected]

But sometimes a table has a primary key made up of more than one attribute i.e. it has a compound
primary key.

CONCERT

Venue    Artist       Attendance  Profit  Style
Wembley  Girls Aloud  53000       12334   Girl band
NEC      Leona Lewis  45000       66433   Female soloist

The table above is using both the venue and artist as the compound primary key.
It is in this situation that the extra rule for second normal form comes in handy. The rule states:

- Non-key attributes must depend on every part of the primary key
- The table must already be in first normal form

So inherently, any table that is already in 1NF and has a simple primary key is automatically in second
normal form as well.

Consider the Concert example above - this is NOT in second normal form. Notice the attribute called Style.
This is describing the style of artist - it has nothing to do with where the concert took place! And so its
value does not depend on EVERY part of the primary key, so the rule for second normal form is not being
met.

The reason for this rule is to ensure there is no redundant data being stored.

For example, let's add another Girls Aloud concert to the table

Venue Artist Attendance Profit Style


Wembley Girls Aloud 53000 12334 Girl band
NEC Leona Lewis 45000 66433 Female soloist
NEC Girls Aloud 76090 53789 Girl band

Notice that the 'girl band' value is being repeated and so is causing the database to be bigger than it needs
to be.

Of course there could be more than one attribute related to different parts of the primary key. Consider a
table like this:

CONCERT

Venue Artist Date Attendance Profit City No1Hits Style


Wembley Girls Aloud 1/10/09 53000 12334 London 5 Girl band
NEC Leona Lewis 1/10/09 45000 66433 Birmingham 2 Female soloist
NEC Girls Aloud 7/11/09 76090 53789 Birmingham 5 Girl band

As before the Style attribute only depends on Artist, but now No1Hits also only depends on the Artist. This
table also includes City and this only depends on the Venue.

So to make this database into second normal form, four tables need to be created

CONCERT

VenueID ArtistID Date Attendance Profit


005 0112 1/10/09 53000 12334
006 0115 1/10/09 45000 66433
006 0112 7/11/09 76090 53789

STYLE
StyleID Style
001 Girl band
002 Female soloist
003 Rap

ARTIST
ArtistID Artist No1Hits StyleID
0112 Girls Aloud 5 001
0115 Leona Lewis 2 002

VENUE
VenueID Venue City
005 Wembley London
006 NEC Birmingham
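
As a rough check that nothing is lost by the split, the four tables can be built and joined back together. This is a minimal sketch using Python's sqlite3 module (an assumed environment); the values are taken from the wide CONCERT table earlier in this section, and the IDs are stored as text to keep their leading zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE style  (StyleID TEXT PRIMARY KEY, Style TEXT);
CREATE TABLE venue  (VenueID TEXT PRIMARY KEY, Venue TEXT, City TEXT);
CREATE TABLE artist (ArtistID TEXT PRIMARY KEY, Artist TEXT,
                     No1Hits INTEGER,
                     StyleID TEXT REFERENCES style (StyleID));
CREATE TABLE concert (VenueID  TEXT REFERENCES venue (VenueID),
                      ArtistID TEXT REFERENCES artist (ArtistID),
                      Date TEXT, Attendance INTEGER, Profit INTEGER,
                      PRIMARY KEY (VenueID, ArtistID));

INSERT INTO style  VALUES ('001', 'Girl band'), ('002', 'Female soloist');
INSERT INTO venue  VALUES ('005', 'Wembley', 'London'),
                          ('006', 'NEC', 'Birmingham');
INSERT INTO artist VALUES ('0112', 'Girls Aloud', 5, '001'),
                          ('0115', 'Leona Lewis', 2, '002');
INSERT INTO concert VALUES ('005', '0112', '1/10/09', 53000, 12334),
                           ('006', '0115', '1/10/09', 45000, 66433),
                           ('006', '0112', '7/11/09', 76090, 53789);
""")

# Join the four tables to rebuild the original wide rows.
rows = conn.execute("""
SELECT v.Venue, a.Artist, c.Date, c.Attendance, c.Profit,
       v.City, a.No1Hits, s.Style
FROM concert AS c
JOIN venue  AS v ON v.VenueID  = c.VenueID
JOIN artist AS a ON a.ArtistID = c.ArtistID
JOIN style  AS s ON s.StyleID  = a.StyleID
ORDER BY v.Venue, a.Artist
""").fetchall()

for row in rows:
    print(row)
```

'Girl band' is now stored once in the STYLE table, yet both Girls Aloud concerts still show it after the join.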

Summary
The rules for second normal form are

- Non-key attributes must depend on every part of the primary key
- The table must already be in first normal form
Third Normal Form
For a database to be in third normal form, the following rules have to be met

- It is already in 2NF
- There are no non-key attributes that depend on another non-key attribute

What this is trying to do is to spot yet another source of redundant data. If the value of an attribute can be
obtained by simply making use of another attribute in the table, then it does not need to be there. Loading
that attribute into another table and linking to it will make the database smaller.

To clarify, consider the table below

CONCERT

Venue Artist Date Attendance Profit City Country


Wembley Girls Aloud 1/10/08 53000 12334 London UK
NEC Leona Lewis 1/10/08 45000 66433 Birmingham UK
Carnegie Hall Girls Aloud 7/11/08 76090 53789 New York USA

Notice that the country could be obtained by referring to the City - if the concert was in London then you
know it is also in the UK - no need to look at the primary key!

So to make this database into third normal form, split the table as follows

CONCERT

Venue Artist Date Attendance Profit City*


Wembley Girls Aloud 1/10/08 53000 12334 London
NEC Leona Lewis 1/10/08 45000 66433 Birmingham
Carnegie Hall Girls Aloud 7/11/08 76090 53789 New York

CITIES

City Country
London UK
Birmingham UK
New York USA

The new table called CITIES has City as the primary key and country as an attribute. The Concert table has
City as a foreign key. So now you can obtain the country in which any particular concert took place and
there is no redundant data.
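
The lookup described above can be sketched as a join between the two tables. The example assumes Python's sqlite3 module as the test environment and uses the rows shown in the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (City TEXT PRIMARY KEY, Country TEXT);
CREATE TABLE concert (
    Venue TEXT, Artist TEXT, Date TEXT,
    Attendance INTEGER, Profit INTEGER,
    City TEXT REFERENCES cities (City),   -- foreign key into CITIES
    PRIMARY KEY (Venue, Artist)
);
INSERT INTO cities VALUES ('London', 'UK'), ('Birmingham', 'UK'),
                          ('New York', 'USA');
INSERT INTO concert VALUES
    ('Wembley', 'Girls Aloud', '1/10/08', 53000, 12334, 'London'),
    ('NEC', 'Leona Lewis', '1/10/08', 45000, 66433, 'Birmingham'),
    ('Carnegie Hall', 'Girls Aloud', '7/11/08', 76090, 53789, 'New York');
""")

# Country is no longer stored per concert; it is looked up through City.
countries = dict(conn.execute("""
SELECT concert.Venue, cities.Country
FROM concert
JOIN cities ON cities.City = concert.City
"""))
print(countries)
```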

3NF Examples

Reminder, 3NF means:

- It is already in 2NF
- There are no non-key attributes that depend on another non-key attribute

Example 1

CUSTOMER

CustomerID Firstname Surname City PostCode


12123 Harry Enfield London SW7 2AP
12443 Leona Lewis London WC2H 7JY
354 Sarah Brightman Coventry CV4 7AL

This is not in strict 3NF as the City could be obtained from the Post code attribute. If you created a table
containing postcodes then city could be derived.

CustomerID Firstname Surname PostCode*


12123 Harry Enfield SW7 2AP
12443 Leona Lewis WC2H 7JY
354 Sarah Brightman CV4 7AL

POSTCODES

PostCode City
SW7 2AP London
WC2H 7JY London
CV4 7AL Coventry

Example 2

VideoID Title Certificate Description


12123 Saw IV 18 Eighteen and over
12443 Igor PG Parental Guidance
354 Bambi U Universal Classification

The Description of what the certificate means could be obtained from the certificate attribute - it does not
need to refer to the primary key VideoID. So split it out into its own table and use the primary key /
foreign key approach.

Example 3

CLIENT

ClientID CinemaID* CinemaAddress


12123 LON23 1 Leicester Square. London
12443 COV2 34 Bramby St, Coventry
354 MAN4 56 Croydon Rd, Manchester

CINEMAS

CinemaID CinemaAddress
LON23 1 Leicester Square. London
COV2 34 Bramby St, Coventry
MAN4 56 Croydon Rd, Manchester

In this case the database is almost in 3NF - for some reason the Cinema Address is being repeated in the
Client table, even though it can be obtained from the Cinemas table. So simply remove the column from
the client table

Example 4

ORDER

OrderID Quantity Price Cost


12123 2 10.00 20.00
12443 3 20.00 60.00
354 4 30.00 120.00

In this case the cost of any order can be obtained by multiplying quantity by price. This is a 'calculated
field'. The database is larger than it needs to be as a query could work out the cost of any order. So to be in
strict 3NF you would remove the Cost column.
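
A query can work out the cost on demand, as the following sketch suggests. It assumes Python's sqlite3 module, and the table is named orders here because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No Cost column: it can always be derived from Quantity and Price.
conn.execute(
    "CREATE TABLE orders (OrderID INTEGER PRIMARY KEY, Quantity INTEGER, Price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(12123, 2, 10.00), (12443, 3, 20.00), (354, 4, 30.00)])

# The calculated field is produced by the query instead of being stored.
costs = conn.execute(
    "SELECT OrderID, Quantity * Price AS Cost FROM orders ORDER BY OrderID"
).fetchall()
print(costs)
```

The trade-off is that the multiplication is repeated every time the query runs, which is usually a small price for not having to keep a stored Cost in step with Quantity and Price.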

Benefits Of Normalization

1. The database does not have redundant data; it is smaller in size so less money needs to be spent on
storage.

2. Because there is less data to search through, it is much faster to run a query on the data.
3. Because there is no data duplication there is better data integrity and less risk of mistakes.
4. Because there is no data duplication there is less chance of storing two or more different copies of the
data.

5. One change can be made, which can then be cascaded across any related records.

Problems With Normalization

1. You need to be careful with trying to make data atomic. Just because you can split some types
of data further, it isn't always necessarily the correct thing to do. For example, telephone
number might contain the code followed by the number 01234 567890. It wouldn't be sensible
to separate out these two items.

2. You can end up with more tables than an unnormalised database

3. The more tables and the more complex the database, the slower queries can be to run

4. It is necessary to assign more relationships to interact with larger numbers of tables

5. With more tables, setting up queries can become more complex

1.8.3 Data Definition Language (DDL) And Data Manipulation Language (DML)

DML (Data Manipulation Language)

One of the functions of a RDBMS is to provide a means of manipulating data within the database. This
includes operations such as

- Insert
- Delete
- Update
- Process data

This is the role of the data manipulation language (DML) built in to the system.

However, writing commands by hand can be slow and error prone. So to help with this, many systems
allow the user to set up a task by using 'Query by Example'. The users are presented with a graphical view
of the tables and they then use a number of icons such as 'filter' to manipulate the data. With some
systems you can drag fields graphically into appropriate areas on the screen to set up the query.

Behind the scenes, the QBE tool is compiling and running the required DML commands.

It means that users do not need to have a sophisticated understanding of database command languages in
order to use the database.

DDL (Data Definition Language)

One of the basic functions of a RDBMS is to provide a method of creating a database from scratch. This is
the role of the data definition language (DDL), sometimes also called a data description language.

The language allows tables to be defined in terms of:

- Field names
- Data type
- Data size / length
- Validation rules
- Default values
- Presence check
- Auto incrementing requirement
- Indexing requirement
- Primary key

As each table is defined and the relationships between them are established, the overall design
emerges. This is called the 'Schema'.

It should be noted that a single large database may have several groups of people using it for different
purposes. In which case a different schema is needed for each of them.

SQL

Structured Query Language (SQL) is used to perform operations on a database. There are four main
statements that you should be familiar with: SELECT, INSERT, UPDATE, and DELETE.

Built-in SQL commands are normally written in capital letters, making your statements easier to read.
However, SQL keywords are not case sensitive, so lowercase works as well.

To help us understand how these things work we are going to use a test data set. Databases are used in all
areas of the computer industry, but for the moment we are going to use a dataset that keeps track of
crooks in England, noting names, gender, date of birth, towns and numbers of scars. Take a look at the
crooks data table below:

ID name gender DoB town numScars


1 Geoff male 12/05/1982 Hull 0
2 Jane female 05/08/1956 York 1
3 Keith male 07/02/1999 Snape 6
4 Oliver male 22/08/1976 Blaxhall 2
5 Kelly female 11/11/1911 East Ham 10
6 Marea female 14/07/1940 Wythenshawe 6

To select all the items from this table we can use:

SELECT * FROM crooks

This would display all the results. But what if we just want to display the names and number of scars of the
female crooks?

SELECT name, numScars FROM crooks


WHERE gender = 'female'

The result of this query would be:

name numScars
Jane 1
Kelly 10
Marea 6

SELECT

The SELECT statement allows you to ask the database a question (Query it), and specify what data it
returns. We might want to ask something like Tell me the name and ages of all the crooks. Of course this
wouldn't work, so we need to put this into a language that a computer can understand: Structured Query
Language or SQL for short:

SELECT name, DoB --what to return
FROM crooks --where are you returning it from

This would return the following:


name DoB
Geoff 12/05/1982
Jane 05/08/1956
Keith 07/02/1999
Oliver 22/08/1976
Kelly 11/11/1911
Marea 14/07/1940

But suppose we wanted to filter these results, for instance: Tell me the ID, name and ages of all the crooks
who are male and come from Snape. We need to use another statement, the WHERE clause, allowing us to
give the query some criteria (or options):

SELECT ID, name, DoB


FROM crooks
WHERE town = 'Snape' AND gender = 'male' --Criteria

This would return the following:


ID name DoB
3 Keith 07/02/1999

Say the police know that a crime has been committed by a heavily scarred woman (4+ scars) and want a
list of all the heavily scarred women:

SELECT name, town, numScars
FROM crooks
WHERE numScars >= 4 AND gender = 'female' --Criteria

This would return:


name town numScars
Kelly East Ham 10
Marea Wythenshawe 6

However, the police want to quickly sort through and see who is the most heavily scarred. We are going to
use an ORDER command:

SELECT name, town


FROM crooks
WHERE numScars >= 4 AND gender = 'female' --Criteria
ORDER BY numScars DESC --sorts the numScars values in big to small order

ORDER BY numScars sorts your returned data into DESCending (big to small) or ASCending (small to big)
order
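
To see WHERE and ORDER BY working together, the crooks table can be loaded into a throwaway database. This is a minimal sketch assuming Python's sqlite3 module; dates of birth are stored as plain text exactly as printed in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE crooks (
    ID INTEGER PRIMARY KEY, name TEXT, gender TEXT,
    DoB TEXT, town TEXT, numScars INTEGER
)""")
conn.executemany("INSERT INTO crooks VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Geoff", "male", "12/05/1982", "Hull", 0),
    (2, "Jane", "female", "05/08/1956", "York", 1),
    (3, "Keith", "male", "07/02/1999", "Snape", 6),
    (4, "Oliver", "male", "22/08/1976", "Blaxhall", 2),
    (5, "Kelly", "female", "11/11/1911", "East Ham", 10),
    (6, "Marea", "female", "14/07/1940", "Wythenshawe", 6),
])

# Filter with WHERE, then sort the surviving rows from most to fewest scars.
scarred_women = conn.execute("""
SELECT name, town, numScars
FROM crooks
WHERE numScars >= 4 AND gender = 'female'
ORDER BY numScars DESC
""").fetchall()
print(scarred_women)
```

Including numScars in the SELECT list makes the sort order visible in the output.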

INNER JOIN

We spoke earlier about how databases are great because they allow you to link tables together and
perform complex searches on linked data. So far we have only looked at searching one table.

When you use a social network such as Facebook you can often see a list of all your friends in the side bar
as well as details from your record such as your name and place of work. How did they find this data? They
would have searched for all the relationships that involve your ID, returning the names of people involved
AND returned values such as job title from your personal record.

This looks like using two queries:

--return relationship information
--return personal record information

It would be possible to do this, but it's far easier to use one query for both things.

Take a look at this example. The police want to know the name and town of a criminal (ID = 45) along with
all the descriptions of crimes they have performed:

SELECT name, town, description --select things to return (from different tables)
FROM crooks, crime --name tables that data comes from
WHERE crooks.ID = crime.crimId --specify the link; dot notation means table.field, and the two IDs must match
AND crooks.ID = 45 --specify which crook you are looking at
ORDER BY date ASC --order the results by the oldest first
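
The same link can also be written with the explicit INNER JOIN ... ON syntax, which keeps the join condition separate from the filter. The sketch below assumes Python's sqlite3 module. The crime table layout (crimId, description, date) is inferred from the query above, and the crook names and crimes are made-up sample rows, not data from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crooks (ID INTEGER PRIMARY KEY, name TEXT, town TEXT);
CREATE TABLE crime (
    crimId      INTEGER REFERENCES crooks (ID),  -- which crook did it
    description TEXT,
    date        TEXT                             -- ISO dates sort correctly as text
);
-- Hypothetical sample rows, for illustration only.
INSERT INTO crooks VALUES (45, 'Alex', 'Hull'), (46, 'Sam', 'York');
INSERT INTO crime VALUES
    (45, 'Burglary', '2009-03-14'),
    (45, 'Shoplifting', '2007-06-02'),
    (46, 'Fraud', '2008-01-20');
""")

# The ON clause holds the link; WHERE holds the filter.
crimes_of_45 = conn.execute("""
SELECT crooks.name, crooks.town, crime.description
FROM crooks
INNER JOIN crime ON crooks.ID = crime.crimId
WHERE crooks.ID = ?
ORDER BY crime.date ASC
""", (45,)).fetchall()
print(crimes_of_45)
```

Both forms should return the same rows; the explicit form makes it harder to forget the link and accidentally pair every crook with every crime.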

Operators Used In The WHERE Clauses

Operator                  Meaning of the operator                                           Example

=                         Checks if they're equivalent                                      Id1 = 123
> and <                   Checks if a field is greater than or less than                    Id1 > 123
                                                                                            Id1 < 123
<>                        Checks if it's not equal to                                       Id1 <> 123
>= and <=                 Similar to "> and <", but also checks for equality                Id1 >= 123
                                                                                            Id1 <= 123
OR                        Will be accepted if either the left, right or both are true       Id1 = 123 OR Id2 <> 444
AND                       Only accepted if both the left part and the right part are true   Id1 = 123 AND Id2 <> 444
NOT                       Inverts the boolean value of the statement                        NOT Id1 = 123
IS NULL                   Checks if it's a null value contained within the variable         Id1 IS NULL
... BETWEEN ... AND ...   Checks if something is within a range                             Id1 BETWEEN 2.8 AND 3.14159265
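
A few of these operators can be tried against the crooks data. This is a sketch assuming Python's sqlite3 module; an extra crook with missing gender and numScars values is added so that IS NULL has something to find.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crooks (ID INTEGER PRIMARY KEY, name TEXT, gender TEXT, "
    "town TEXT, numScars INTEGER)")
conn.executemany("INSERT INTO crooks VALUES (?, ?, ?, ?, ?)", [
    (1, "Geoff", "male", "Hull", 0),
    (2, "Jane", "female", "York", 1),
    (3, "Keith", "male", "Snape", 6),
    (4, "Oliver", "male", "Blaxhall", 2),
    (5, "Kelly", "female", "East Ham", 10),
    (6, "Marea", "female", "Wythenshawe", 6),
    (999, "Frederick", None, "Shotley", None),  # None is stored as NULL
])


def names(where):
    """Return crook names matching a fixed WHERE condition, in ID order."""
    # The condition is a literal written in this file, never user input.
    sql = f"SELECT name FROM crooks WHERE {where} ORDER BY ID"
    return [row[0] for row in conn.execute(sql)]


in_range = names("numScars BETWEEN 2 AND 6")          # both ends included
unknown_gender = names("gender IS NULL")
hull_or_york = names("town = 'Hull' OR town = 'York'")
not_male = names("NOT gender = 'male'")               # NULL gender is not matched
print(in_range, unknown_gender, hull_or_york, not_male)
```

Note that Frederick does not appear in the NOT result: comparing NULL with anything gives NULL rather than true, which is why IS NULL exists as a separate test.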

DML (Data Manipulation Language)

- Insert
- Delete
- Update
- Process data

UPDATE

Databases aren't always perfect and there may be times that we want to change the data inside our
database. For example in Facebook if someone is annoying you and you limit their access to your profile,
you'd update the access field from 'normal' to 'restricted'. Or if one of our crooks gets an additional scar
you'd have to update the numScars field.
Let's take a look at that example, where our crook Peter gains a scar on his right cheek. This was his initial
state:

name: Peter
numScars: 7

UPDATE crooks
SET numScars = 8

But we have a problem here: this statement updates all records to numScars = 8. This means that every
crook will now have 8 scars!

ID name gender DoB town numScars


1 Geoff male 12/05/1982 Hull 8
2 Jane female 05/08/1956 York 8

3 Keith male 07/02/1999 Snape 8

4 Oliver male 22/08/1976 Blaxhall 8


5 Kelly female 11/11/1911 East Ham 8
6 Marea female 14/07/1940 Wythenshawe 8

We need to specify which crooks we want to update by using a WHERE clause, which you saw earlier in the
SELECT example.

UPDATE crooks
SET numScars = 8
WHERE name = 'Peter' --only updates those people who are called Peter
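
The difference between the two statements shows up in the number of rows each one touches. The sketch below assumes Python's sqlite3 module and a cut-down crooks table; the unfiltered update is rolled back so that the filtered one starts from the original data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crooks (ID INTEGER PRIMARY KEY, name TEXT, numScars INTEGER)")
conn.executemany("INSERT INTO crooks VALUES (?, ?, ?)", [
    (1, "Geoff", 0), (2, "Jane", 1), (7, "Peter", 7),
])
conn.commit()  # save the starting data so a rollback returns to it

# Without WHERE: every row is changed.
rows_without_where = conn.execute("UPDATE crooks SET numScars = 8").rowcount
conn.rollback()  # undo the mistake

# With WHERE: only Peter's row is changed.
rows_with_where = conn.execute(
    "UPDATE crooks SET numScars = 8 WHERE name = ?", ("Peter",)).rowcount
conn.commit()

scars = dict(conn.execute("SELECT name, numScars FROM crooks"))
print(rows_without_where, rows_with_where, scars)
```

If two crooks were both called Peter, both would be updated, so in practice the primary key (ID) is the safer thing to filter on.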

INSERT

We might also want to add new things to our database, for example when we are adding new criminal
records or a new friendship link on Facebook. To add new records we use the INSERT command with values
of the fields we want to insert:

INSERT INTO crooks


VALUES (1234,'Julie', 'female','12/12/1994','Little Maplestead',67)

Sometimes we might not want to insert all the fields, as some of them might not be compulsory:

INSERT INTO crooks (ID, name, town) --specific fields to insert into
VALUES (999, 'Frederick', 'Shotley')

ID name gender DoB town numScars


1 Geoff male 12/05/1982 Hull 0
2 Jane female 05/08/1956 York 1
3 Keith male 07/02/1999 Snape 6
4 Oliver male 22/08/1976 Blaxhall 2
5 Kelly female 11/11/1911 East Ham 10
6 Marea female 14/07/1940 Wythenshawe 6
1234 Julie female 12/12/1994 Little Maplestead 67
999 Frederick Shotley
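
The blank cells in Frederick's row are NULL values. A short sketch, assuming Python's sqlite3 module, shows what is actually stored when only some fields are named in the INSERT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE crooks (
    ID INTEGER PRIMARY KEY, name TEXT, gender TEXT,
    DoB TEXT, town TEXT, numScars INTEGER
)""")

# Only three of the six fields are supplied.
conn.execute(
    "INSERT INTO crooks (ID, name, town) VALUES (?, ?, ?)",
    (999, "Frederick", "Shotley"))

# Columns left out of the INSERT come back as None (SQL NULL).
frederick = conn.execute("SELECT * FROM crooks WHERE ID = 999").fetchone()
print(frederick)
```

If a column had been declared NOT NULL without a default value, leaving it out of the INSERT would be rejected instead.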

DELETE

Sometimes you might fall out with friends on Facebook so that you don't even want them to see your
restricted page. You'd want to delete your relationship (it's actually likely that Facebook retains the
connection but flags it as 'deleted', that isn't important here). The police might also find that a crook is in
fact innocent and then want to delete their criminal record. We need a DELETE command to do all of these
things, to permanently delete records from databases. Imagine that we find out that Geoff was framed and
he is completely innocent:

DELETE FROM crooks


WHERE name = 'Geoff'
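
As with UPDATE, the WHERE clause decides which rows are affected. A minimal sketch, assuming Python's sqlite3 module and a cut-down crooks table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crooks (ID INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO crooks VALUES (?, ?)", [
    (1, "Geoff"), (2, "Jane"), (3, "Keith"),
])

# Delete only the rows that match the condition.
deleted = conn.execute(
    "DELETE FROM crooks WHERE name = ?", ("Geoff",)).rowcount
conn.commit()

remaining = [row[0] for row in conn.execute(
    "SELECT name FROM crooks ORDER BY ID")]
print(deleted, remaining)
```

A DELETE FROM crooks with no WHERE clause would empty the whole table, so it is worth checking the condition with a SELECT first.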

Data Definition Language (DDL)

- Create tables: CREATE TABLE


- Change the structure of a table: ALTER
- Delete tables: DROP

CREATE

You need to know what they all do (as listed above), though you only need to know how to implement the
CREATE TABLE command. Let's look at how we could have made the crooks table above:

CREATE TABLE crooks


(
ID INTEGER PRIMARY KEY,
NAME VARCHAR(16),
GENDER VARCHAR(6),
DOB DATE,
TOWN VARCHAR(20),
NUMSCARS INTEGER
)

ALTER

An ALTER statement in SQL changes the properties of a table in a relational database without the need to
access the table manually.

ALTER TABLE crooks ADD convictions INTEGER


ALTER TABLE crooks DROP COLUMN convictions

DROP

Dropping a table is like dropping a nuclear bomb. It is irreversible and is frowned upon in modern society.

DROP TABLE crooks

By running this line of code, the table "crooks" will be removed from the database with no chance of it
being recovered unless backups have been previously made.
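
The effect of ALTER and DROP can be checked by asking the database what it contains. This sketch assumes Python's sqlite3 module; PRAGMA table_info and the sqlite_master table are specific to SQLite, and other systems have their own ways of listing columns and tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crooks (ID INTEGER PRIMARY KEY, name TEXT)")

# ALTER: add a column to the existing table.
conn.execute("ALTER TABLE crooks ADD COLUMN convictions INTEGER")
columns = [row[1] for row in conn.execute("PRAGMA table_info(crooks)")]

# DROP: remove the table and everything in it.
conn.execute("DROP TABLE crooks")
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(columns, tables)
```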

Setting Primary Keys

Primary keys can be set after table creation via the ALTER TABLE statement.

ALTER TABLE Persons


ADD PRIMARY KEY (id)

Primary keys can also be set during table creation

CREATE TABLE users


(
user_id int NOT NULL,
username varchar(255) NOT NULL,
password varchar(255) NOT NULL,
Address varchar(255),
PRIMARY KEY (user_id)
)

Setting Composite Keys

To set a primary key made up of two columns during table creation you could do something such as this

CREATE TABLE users


(
user_id int NOT NULL,
username varchar(255) NOT NULL,
password varchar(255) NOT NULL,
Address varchar(255),
CONSTRAINT pk_UserId PRIMARY KEY (user_id,username)
)

Where the constraint name would be pk_UserId and the table's primary key would be made up of the user_id
and the username columns.
This could also be done after table creation:

ALTER TABLE users


ADD CONSTRAINT pk_UserID PRIMARY KEY (user_id,username)
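
With a composite key, it is the combination of values that must be unique, not each column on its own. A small sketch, assuming Python's sqlite3 module and made-up sample rows, illustrates this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE users
(
user_id int NOT NULL,
username varchar(255) NOT NULL,
password varchar(255) NOT NULL,
Address varchar(255),
CONSTRAINT pk_UserId PRIMARY KEY (user_id, username)
)""")

# Hypothetical rows: same user_id, different username, so the pairs differ.
conn.execute("INSERT INTO users VALUES (1, 'tom', 'secret', NULL)")
conn.execute("INSERT INTO users VALUES (1, 'mary', 'secret', NULL)")

# Repeating an existing (user_id, username) pair breaks the primary key.
try:
    conn.execute("INSERT INTO users VALUES (1, 'tom', 'other', NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(duplicate_rejected, user_count)
```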

Setting Foreign Keys

To set a foreign key or group of foreign keys during table creation, you could do something like this:

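CREATE TABLE FoodProduct
(
product_id int NOT NULL,
product_name varchar(255) NOT NULL,
ingredients varchar(1023) NOT NULL,
allergen varchar(511) NOT NULL,
PRIMARY KEY (product_id)
)

Where the primary key is "product_id" and all the other attributes are declared within the ( ) structure,
each followed by a comma except the last. A foreign key is declared in the same place, with a
FOREIGN KEY (column) REFERENCES other_table (column) clause.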
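
As an illustration of a foreign key in use, the sketch below goes back to the FAN and EMAIL tables from the normalisation section. It assumes Python's sqlite3 module, where foreign key checking has to be switched on for each connection, and the email addresses are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves checking off by default
conn.executescript("""
CREATE TABLE fan (
    FanID     INTEGER PRIMARY KEY,
    Firstname TEXT,
    Surname   TEXT,
    BandID    INTEGER
);
CREATE TABLE email (
    EID   INTEGER PRIMARY KEY,
    FanID INTEGER NOT NULL,
    email TEXT,
    FOREIGN KEY (FanID) REFERENCES fan (FanID)
);
INSERT INTO fan VALUES (1, 'Tom', 'Smith', 23);
""")

# FanID 1 exists in the fan table, so this row is accepted.
conn.execute("INSERT INTO email VALUES (1, 1, 'tom@example.com')")

# FanID 99 does not exist, so the foreign key rejects the row.
try:
    conn.execute("INSERT INTO email VALUES (2, 99, 'nobody@example.com')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

email_count = conn.execute("SELECT COUNT(*) FROM email").fetchone()[0]
print(orphan_rejected, email_count)
```

The foreign key stops an email being stored for a fan who does not exist, which is the kind of data integrity the normalised design relies on.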