Bcs Database - Complete Reference 2022

The document discusses key concepts related to database systems including: 1) File systems and database systems are the two traditional approaches to data management, with databases offering advantages like reduced redundancy, improved consistency and integrity, and increased sharing and security of data. 2) A database management system (DBMS) is software that allows creation, maintenance and access to a database, providing an interface between users and the integrated data. 3) Other concepts covered include data dictionaries, database administrators, the relational data model, and types of data integrity inherent in relational databases.

Uploaded by

Nuwantha
Copyright
© All Rights Reserved

BCS THE CHARTERED INSTITUTE FOR IT

BCS HIGHER EDUCATION QUALIFICATIONS


BCS Level 5 Diploma in IT

DATABASE SYSTEMS
COMPLETE REFERENCE

SEPTEMBER 2022

FILE SYSTEMS

The traditional approach to data management is called a file system (some call it a
conventional file system). In this approach, each department of the organization has its
own data files and application programs. There is no integration between different
applications. Conventional file systems are created and accessed using conventional
programming languages such as COBOL and Pascal.

This has many drawbacks:

Data redundancy - Data is duplicated in several files (e.g. customer data in accounts
dept for credit sales, same customer data in advertising dept for promotions). This
wastes storage, forces every insertion and update to be repeated in each file, and
creates a potential for data inconsistencies (e.g. customer address changed in one file,
not in the other file).

Example: Accounts Department maintains a Customer file for credit purchases

Data inconsistency – Same data field can contain two different values in two different
files (e.g. a customer changes his address and it is changed in one file but not in the
other, now the same customer has two different addresses)

Poor data sharing – data sharing across applications is not supported

Data dependence – file/data structures are embedded into application programs. As a
result, changes to file structures require changes to programs. This increases the cost of
software maintenance.

Poor data integrity – All data validations must be coded in application programs. This
makes room for invalid data to get in.

Procedural query language – Data are manipulated by using procedural code such as
“if else” and “while”. This is difficult and time consuming.

DATABASE SYSTEMS

This is the modern approach to data management. A database is a collection of
integrated files that can be shared by many users. It is a collection of organized data. A
database integrates all corporate data into a single uniform structure.

A database is created and maintained by software called a DBMS (Database
Management System). Examples: Oracle, Microsoft SQL Server. The DBMS provides an
interface between the users and the database. Advantages of database systems:

1. Minimum Data Redundancy – In a database each data item is stored just once.
This is due to the fact that a database integrates ALL corporate data into a single
structure. This allows users to extract data from multiple files as if they were in a
single file.
2. Consistency of data – Minimum data redundancy helps to achieve data
consistency. Since each data item is stored just once, a situation such as the same
customer having two different addresses cannot arise.
3. Support for data sharing – Data can be shared across multiple applications.
Access paths are created between related files/tables so that different
applications can access the same data.
4. Improved data integrity – A database offers stronger data integrity via three
types of integrity that are inherent to a relational database: Entity Integrity,
Referential Integrity and Domain Integrity. For additional integrity, a database
offers triggers and stored procedures. Integrity constraints are stored within the
database itself (in the data dictionary/catalog) so applications need not code
them.
5. Increased security – The DBMS provides sophisticated security facilities via
authentication, access privileges and SQL views. These security constraints are
stored within the system catalog so that the DBMS can monitor them easily.
Most DBMSs also allow data encryption to protect sensitive data.
6. Data independence – Data structures are separated from application programs.
They are created and modified via the DBMS. Application programs do not refer
to data structures. As a result, data structures can be changed without requiring
changes to application programs. This reduces software maintenance effort and
cost.
7. Non procedural query language (e.g. SQL) – This makes query writing easier.

Drawbacks / challenges of database systems:

DBMSs are expensive to acquire. Hardware such as database servers can be expensive.
A database environment is a highly complex set up requiring sophisticated technology
(e.g. transaction logs, checkpoints, concurrency control). It demands the presence of a
DBA. It also requires extensive user training.

NOTE: File systems are still beneficial in some systems. They are economically feasible
in small, single user business applications that have only a few data files, each with a few
hundred records (e.g. a staff payroll or a stock control system of a small business or a
library management system of a small library). For small scale systems the complex
technology of a database, the high cost of DBMS and hardware and the hiring of a
DBA do not provide adequate ROI. Many advanced features of a DBMS including
concurrency control, recovery, integrity and security are not relevant to such systems.

DATA DICTIONARY / SYSTEM CATALOG

A DD serves as a repository for metadata (i.e. data about data). It is stored within the
database and is accessible to the DBMS on-line. Information stored in the DD includes:

• Descriptions of base tables (table names, attribute names and their data types
and sizes etc.)
• Integrity constraints (e.g. entity integrity, referential integrity etc.)
• Security constraints (declarations of authorized users, their passwords and access
privileges etc.)
• Descriptions of views
• Information about indexes, clusters etc.
• Database statistics (e.g. number of tuples in each table)

An example of SQL code that accesses metadata (here, Oracle's USER_SYS_PRIVS dictionary view):

Select *
From User_Sys_Privs
Where Username = 'TOM';

This will retrieve all the system privileges granted to user 'TOM'.

A data dictionary is vital for the correct functioning of a database.
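The catalog query above targets an Oracle dictionary view. As a minimal sketch of the same idea in another DBMS, the script below uses Python's built-in sqlite3 module; SQLite is an assumption made for illustration (the notes name Oracle and SQL Server), and the employee table is hypothetical. SQLite exposes its catalog through the sqlite_master table and the table_info pragma.

```python
import sqlite3

# In-memory database; the employee table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empno TEXT PRIMARY KEY, name TEXT NOT NULL, salary INTEGER)"
)

# sqlite_master is SQLite's catalog: one row per table, index, view or trigger.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]

# PRAGMA table_info returns one row of metadata per column:
# (cid, name, type, notnull, default_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]

print(tables)
print(columns)
```

The point is that table and column descriptions are themselves stored as data and can be queried like any other table.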



DATABASE ADMINISTRATOR

A high level manager responsible for design, implementation and maintenance of a
computer database. His/her duties include:

• Creating logical and physical schema (table structures)
• Ensuring security – Access control (create user accounts/logins and assign
passwords to each user), grant access privileges to users, identify user groups and
assign privileges to roles, identify data requirements of each user/user group and
create SQL views to ensure confidentiality of data, implement additional security
measures such as data encryption, physical controls and biometric devices.
• Making backups and initiating recovery operations when a failure occurs
• Ensuring data integrity
• Database tuning – Monitor the performance of the database and make
modifications to improve performance. The steps that can be used to improve
performance include:
- Table de-normalization to reduce the number of joins
- Creating indexes to speed up queries
- Creating clusters to speed up joins
- Changes to schema (e.g. splitting a table into two)
- Hardware level improvements (e.g. change cabling, change topology)

RELATIONAL MODEL

The relational model of data representation is based on set theory and predicate
logic. It uses the concept of a relation to represent a file. A relation is a two
dimensional table in which rows represent records and columns represent fields. It is
equivalent to a set (of tuples) in mathematics. Data are stored in relations.
Rows are called tuples while the columns are called attributes.
The intersection of a row and a column forms a cell. A cell can contain only an atomic
value.
Each relation/table has a primary key that can be used to identify rows/records
uniquely. Joining two tables is facilitated via foreign keys.

Three types of integrity are inherent to a relational database:

1. Entity integrity – This is concerned with primary keys. This rule says that the
column or group of columns chosen as the primary key should be unique and
not null.
2. Referential integrity – This rule says that mismatching foreign keys are not
permitted into the system: every non-null foreign key value must match an
existing primary key value in the related table.
3. Domain integrity – This is concerned with the validity of values within
columns. The values entered into a column of a table should be drawn from
the correct domain.
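The three integrity rules above can be tried out in a small script. This is a minimal sketch using Python's sqlite3 module; SQLite, the table names and the sample rows are assumptions for illustration. SQLite is loosely typed, so a CHECK constraint stands in for a domain here, and foreign key enforcement has to be switched on explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    deptno TEXT PRIMARY KEY NOT NULL
);
CREATE TABLE employee (
    empno  TEXT PRIMARY KEY NOT NULL,            -- entity integrity
    salary INTEGER CHECK (salary >= 0),          -- domain integrity (approximated)
    deptno TEXT REFERENCES department(deptno)    -- referential integrity
);
INSERT INTO department VALUES ('A1');
INSERT INTO employee VALUES ('001', 5000, 'A1');
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_key = rejected("INSERT INTO employee VALUES ('001', 7000, 'A1')")
null_key = rejected("INSERT INTO employee VALUES (NULL, 7000, 'A1')")
dangling_foreign_key = rejected("INSERT INTO employee VALUES ('002', 7000, 'Z9')")
out_of_domain = rejected("INSERT INTO employee VALUES ('003', -1, 'A1')")

print(duplicate_key, null_key, dangling_foreign_key, out_of_domain)
```

Each rejected statement is refused by the DBMS itself, so no application program has to code the check.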

A relational database is manipulated by a high level non-procedural query language
(e.g. SQL). The manipulative power of the relational model was based on relational
algebra. In particular, the operators PROJECT, RESTRICT and JOIN are fundamental to
the manipulation of a relational database.

Terminology:

Tuple – A row in a table. It represents a record.

Attribute – A column in a table. It represents a field.

Primary key – A column or group of columns that can be used to identify a row/record
of a table uniquely.
e.g. Employee (EmployeeNo, Name, Salary), where EmployeeNo is the primary key.

Domain – A pool of values, from which the actual values appearing in a column of a
table are drawn. All the values of a given domain belong to the same data type. e.g.
Domain of marks: Type: Integer, Values: numbers from 0 to 100

Foreign key – An attribute (or group of attributes) of one table which draws its values
from the primary key domain of some related table. In a relational database, foreign
keys are used to join related tables together.

Customer (CustomerNo, Name, Address)

Order (OrderNo, Date, Amount, CustomerNo*), where CustomerNo* is the foreign key

Null – A special marker which is used to represent missing or inapplicable
information. It is different from zero or space. When performing queries, the DBMS
treats nulls specially: aggregate functions such as SUM and AVG ignore them, and a
comparison with null is never true.
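How nulls behave in queries is easy to get wrong, so here is a minimal sketch using Python's sqlite3 module. SQLite and the sample marks are assumptions for illustration; the table loosely follows the Assessment example used in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assessment (studentno TEXT, marks INTEGER);
INSERT INTO assessment VALUES ('S1', 80), ('S2', NULL), ('S3', 0);
""")

# COUNT(*) counts rows; COUNT(marks) and AVG(marks) skip the NULL mark,
# but they do include the zero, which is a real value.
count_rows, count_marks, avg_marks = conn.execute(
    "SELECT COUNT(*), COUNT(marks), AVG(marks) FROM assessment"
).fetchone()

# A comparison with NULL is never true, so 'marks = NULL' matches nothing.
# IS NULL is the correct test for a missing value.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM assessment WHERE marks = NULL"
).fetchone()[0]
is_null = conn.execute(
    "SELECT COUNT(*) FROM assessment WHERE marks IS NULL"
).fetchone()[0]

print(count_rows, count_marks, avg_marks, equals_null, is_null)
```

The average is taken over the two known marks only, which is why null and zero must not be confused.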

Composite key - A group of two or more attributes that can be used to identify a record
uniquely (i.e. a primary key consisting of two or more fields)

e.g. Assessment (StudentNo, SubjectNo, Marks)

Candidate key – An attribute or group of attributes that could serve as the primary key.

Example: Employee (EmployeeID, Name, Address, NICNumber, DateofBirth, email)

EmployeeID, NICNumber and email are candidate keys

Alternate key – Once we choose the primary key from a set of candidate keys, the other
candidate keys become alternate keys.

In the Employee table, if we choose EmployeeID as the primary key then NICNumber
and email will become alternate keys.

Degree – Number of columns in a table. e.g. in the above employee table the degree is
4 because it has 4 columns.

Cardinality – Number of rows in a table. e.g. in the above employee table the
cardinality is 3 because it has 3 rows/tuples

VERY IMPORTANT NOTE

In ER Diagrams - Degree means how many Entity Types are involved in a relationship.
This can be UNARY, BINARY and TERNARY.

In ER Diagrams - Cardinality means number of instances involved in the relationship.
This can be 1 TO 1 (1:1), 1 TO Many (1:N) or Many to Many (M:N).

In the relational model - Degree means the number of COLUMNS in a table. The degree of
the student table below is 3 (i.e. 3 columns/attributes)

STUDENTS

STUDENTID  NAME  AGE
125        Roy   13
127        Jane  16
128        Ben   13
129        Tina  17

In the relational model, Cardinality means the number of rows in a table. In the above
table, cardinality is 4 (i.e. 4 rows/tuples)

RELATIONAL ALGEBRA
A set of 8 operators for querying a relational database. In algebra, we formulate a query
by writing a series of instructions. Each instruction produces an output which is taken as
the input to the next instruction. This feature is called the property of “Closure”.

PROJECT π

This takes one relation as the input and produces another relation as the output, which contains a
subset of columns from the input relation. It allows us to partition a table vertically and select certain
columns.

Employee

Empno Name Deptno Salary

001 Brown A1 5000

002 Thomas A2 8500

003 David A1 10000

004 Sam A2 7000

PROJECT EMPLOYEE OVER NAME, SALARY GIVING T1

T1 ← π name,salary (Employee)

T1

Name    Salary
Brown   5000
Thomas  8500
David   10000
Sam     7000

SQL implements PROJECT by using the “Select” clause.

Select name, salary
From Employee;

SELECT / RESTRICT σ
This takes one relation as the input and produces another relation as the output, which
is a subset of rows from the input relation. It allows us to partition a table horizontally
and select certain rows that satisfy a condition.

RESTRICT EMPLOYEE WHERE SALARY > 8000 GIVING T1.

T1 ← σ (Salary>8000) (Employee)

T1

Empno Name Deptno Salary

002 Thomas A2 8500

003 David A1 10000

The SQL clause “Where” corresponds to Select/Restrict.

Select *

From Employee

Where salary>8000;

JOIN ⋈
This is used to combine related tuples from two relations into single tuples over a
common attribute. Joins are performed between a foreign key and a matching primary
key. Where the foreign key value of one table matches with a primary key value of
another table, the two corresponding tuples are joined.

Employee

Empno Name Deptno Salary

001 Brown A1 5000

002 Thomas A2 8500

003 David A1 10000

004 Sam A2 7000

Department

DeptNo DepName

A1 IT

A2 Accounts

EMPLOYEE JOIN DEPARTMENT GIVING T1

T1 ← Employee ⋈ Deptno = Deptno Department

JOIN RESULT

T1

Empno Name Salary Employee.Deptno DepName Department.Deptno

001 Brown 5000 A1 IT A1

002 Thomas 8500 A2 Accounts A2

003 David 10000 A1 IT A1

004 Sam 7000 A2 Accounts A2

In SQL, this is implemented by equating the primary key of one table
with the foreign key of the other table.

Select *

From Employee, Department

Where Employee.DeptNo = Department.DeptNo;
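The query above uses the older comma-separated FROM list with the join condition in the WHERE clause. As a minimal sketch, the same equi-join can be written with the explicit JOIN ... ON form. It is run here through Python's sqlite3 module (SQLite is an assumption for illustration) with the Employee and Department rows from this section. Only one Deptno column is carried into the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (deptno TEXT PRIMARY KEY, depname TEXT);
CREATE TABLE employee (
    empno  TEXT PRIMARY KEY,
    name   TEXT,
    deptno TEXT REFERENCES department(deptno),
    salary INTEGER
);
INSERT INTO department VALUES ('A1', 'IT'), ('A2', 'Accounts');
INSERT INTO employee VALUES
    ('001', 'Brown',  'A1', 5000),
    ('002', 'Thomas', 'A2', 8500),
    ('003', 'David',  'A1', 10000),
    ('004', 'Sam',    'A2', 7000);
""")

# Foreign key employee.deptno is matched against primary key department.deptno.
rows = conn.execute("""
    SELECT e.empno, e.name, e.salary, d.depname
    FROM employee AS e
    JOIN department AS d ON e.deptno = d.deptno
    ORDER BY e.empno
""").fetchall()

for row in rows:
    print(row)
```

Both forms describe the same join; the explicit form keeps the join condition separate from any row filters.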



UNION ∪
It takes two union compatible relations (i.e. the two input tables must have the same number of
columns and each corresponding pair of columns must have the same domain) as the input and
produces a result consisting of all tuples appearing in either or both original relations. Duplicate tuples
are eliminated from the result.

UNION COMPATIBLE TABLES: Two tables with the same number of columns where
each corresponding pair of columns has the same domain.

ADMINISTRATOR

StaffNo  Name
001      Brown
002      Jane
003      Ajit

LECTURER

StaffNo  Name
004      Lee
002      Jane

ADMINISTRATOR UNION LECTURER GIVING T1

T1 ← ADMINISTRATOR ∪ LECTURER

T1

StaffNo  Name
001      Brown
002      Jane
003      Ajit
004      Lee

SQL implementation

Select * From Administrator
UNION
Select * From Lecturer;

CARTESIAN PRODUCT ×

It takes two tables as the input and produces a result, which contains all possible
combinations of rows from the input tables. The two input tables need not be union
compatible.

EMP

EMPID  NAME  SALARY
111    Tom   5000
222    Sam   6000

DEP

DEPNAME  TELNO
IT       0813344
Admin    0774444

EMP PRODUCT DEP GIVING T3

T3 ← EMP × DEP

T3

EMPID  NAME  SALARY  DEPNAME  TELNO
111    Tom   5000    IT       0813344
111    Tom   5000    Admin    0774444
222    Sam   6000    IT       0813344
222    Sam   6000    Admin    0774444

The SQL clause ‘FROM’ corresponds to Cartesian Product.

Select *
From EMP, DEP;

INTERSECT ∩
This takes two union compatible relations as the input and produces tuples that
are common to both relations as the output.

ADMINISTRATOR INTERSECT LECTURER GIVING T1

T1 ← ADMINISTRATOR ∩ LECTURER

T1

StaffNo  Name
002      Jane

SQL implementation

(Select * From Administrator)
INTERSECT
(Select * From Lecturer)

DIFFERENCE/MINUS -
This takes two union compatible relations as the input and produces tuples that are found in the first
one but not in the second one as the output. (i.e. When denoted by R MINUS S, the result is all tuples in
R that are not in S).

ADMINISTRATOR MINUS LECTURER GIVING T1

T1 ← ADMINISTRATOR – LECTURER

T1

StaffNo  Name
001      Brown
003      Ajit

The SQL clause ‘MINUS’ corresponds to DIFFERENCE. (MINUS is Oracle syntax; standard SQL
uses EXCEPT.)

(Select * From Administrator)
MINUS
(Select * From Lecturer)

DIVIDE ÷

This takes one binary table (Table with two columns - X,Y) and one unary table (Table
with one column - X) as the input. The unary table’s column is also present in the binary
table. We divide the binary table by the unary table to produce a result that contains all
Y column values which are common to the whole set of X column values in the unary
table.

Find the suppliers who supply ALL the products

T1

PRODUCT  SUPPLIER
TV       Brown
DVD      Sam
USB      Brown
TV       Ben
USB      Ben
DVD      Brown
TV       Sam

T2

PRODUCT
TV
DVD
USB

T3 ← T1 ÷ T2

T3

SUPPLIER
Brown

SQL does not support the DIVIDE operation directly.
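Although SQL has no DIVIDE operator, the same result can be obtained with two nested NOT EXISTS subqueries ("there is no product that this supplier does not supply"). The following is a minimal sketch of that well-known workaround, not a construct from these notes. It runs through Python's sqlite3 module (SQLite is an assumption) and loads the T1 and T2 rows of the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (product TEXT, supplier TEXT);
CREATE TABLE t2 (product TEXT);
INSERT INTO t1 VALUES
    ('TV', 'Brown'), ('DVD', 'Sam'), ('USB', 'Brown'), ('TV', 'Ben'),
    ('USB', 'Ben'), ('DVD', 'Brown'), ('TV', 'Sam');
INSERT INTO t2 VALUES ('TV'), ('DVD'), ('USB');
""")

# Keep a supplier only if no product in t2 is missing from that supplier's rows in t1.
suppliers = [row[0] for row in conn.execute("""
    SELECT DISTINCT s.supplier
    FROM t1 AS s
    WHERE NOT EXISTS (
        SELECT 1
        FROM t2 AS p
        WHERE NOT EXISTS (
            SELECT 1
            FROM t1 AS x
            WHERE x.supplier = s.supplier
              AND x.product = p.product
        )
    )
""")]

print(suppliers)
```

Sam and Ben each miss one product, so only Brown survives the double negation, matching T3.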

ALGEBRA EXERCISES

Customer (cno, cname , address, gender)

Orders (orderno , date , amount , cno* )

Find the names and addresses of male customers

T1 ← σ gender = ‘M’ (Customer)

T2 ← π cname, address (T1)

We could also write it as follows in nested notation

π cname, address (σ gender = ‘M’ (Customer))

Find the order numbers and amounts of orders placed by Roy

T1 ← σ cname = ‘Roy’ (Customer)

T2 ← T1 ⋈ cno = cno Orders

T3 ← π orderno, amount (T2)

We could also write it as follows

π orderno, amount ((σ cname = ‘Roy’ (Customer)) ⋈ cno = cno Orders)

Alternative answer

T1 ← Customer ⋈ cno = cno Orders

T2 ← σ cname = ‘Roy’ (T1)

T3 ← π orderno, amount (T2)

Find the names of customers who don’t have orders

T1 ← π cno (Customer)

T2 ← π cno (Orders)

T3 ← T1 – T2

T4 ← T3 ⋈ cno = cno Customer

T5 ← π cname (T4)

We could also write it as follows

π cname (((π cno (Customer)) – (π cno (Orders))) ⋈ cno = cno Customer)
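The last two algebra exercises translate almost step by step into SQL. The sketch below is one possible translation, run through Python's sqlite3 module; SQLite and all sample rows are assumptions made for illustration, since the notes give only the Customer and Orders schema. EXCEPT plays the role of the difference operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cno TEXT PRIMARY KEY, cname TEXT, address TEXT, gender TEXT);
CREATE TABLE orders (
    orderno TEXT PRIMARY KEY,
    date    TEXT,
    amount  INTEGER,
    cno     TEXT REFERENCES customer(cno)
);
-- Hypothetical sample rows, not taken from the notes.
INSERT INTO customer VALUES
    ('C1', 'Roy', 'Kandy', 'M'), ('C2', 'Ann', 'Galle', 'F'), ('C3', 'Tom', 'Colombo', 'M');
INSERT INTO orders VALUES
    ('001', '2022-01-05', 500, 'C1'),
    ('002', '2022-02-10', 750, 'C1'),
    ('003', '2022-03-15', 300, 'C2');
""")

# RESTRICT Customer on cname, JOIN with Orders, then PROJECT orderno and amount.
roy_orders = conn.execute("""
    SELECT o.orderno, o.amount
    FROM customer AS c
    JOIN orders AS o ON c.cno = o.cno
    WHERE c.cname = 'Roy'
    ORDER BY o.orderno
""").fetchall()

# Difference of the two cno projections, joined back to Customer, then PROJECT cname.
no_orders = [row[0] for row in conn.execute("""
    SELECT c.cname
    FROM customer AS c
    JOIN (SELECT cno FROM customer EXCEPT SELECT cno FROM orders) AS t
      ON t.cno = c.cno
""")]

print(roy_orders)
print(no_orders)
```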



ENTITY RELATIONSHIP DIAGRAMMING

• An Entity Relationship model is a detailed, conceptual representation of the
data requirements of an organization (it is called a conceptual model).

• The ER model is expressed in terms of entity types in the business
environment, the relationships among those entity types and the attributes
(properties) of those entity types

• An ER model is normally expressed as an ER Diagram

• An ER model serves as a blueprint for a database. Once the ER model is
complete, it can be mapped easily to the data structures of the desired
database model.

Entity Type – The term entity type may refer to a living thing such as a person or
animal, an object, an organization or an event which is important to the organization.
The organization wishes to store data about it. e.g. Student, Customer, Course, Product

Student

Entity instance – A single occurrence of an entity type. (e.g. BCS is an entity instance of
Course, Ann Davis is an instance of Student). Sometimes the word “entity” is used to
refer to an entity instance.

Attributes – The properties of an entity are known as attributes (e.g. name and salary
are attributes of Employee entity)

[Figure: ER diagram of the Employee entity type with the attributes Name, Salary, Dob,
Age, qualification and Address (sub-parts HouseNo, Street, Town)]

Simple attribute – An attribute that cannot be broken down into components (e.g.
salary in the diagram above)

Composite attribute – An attribute that can be broken down into sub parts or
components (e.g. address)

Multi value attribute – an attribute that may store several values for a single entity
occurrence/instance. It is shown by a double line ellipse (e.g. qualification). Later in the
design, each multi value attribute maps to a separate table in the database.

Single value attribute - an attribute that stores a single value for a single entity
occurrence/instance. (e.g. salary, name, date of birth)

Derived attribute - an attribute that is computed dynamically from one or more stored
attributes (e.g. Age is a derived attribute because it can be computed from date of
birth). A derived attribute has a dashed line ellipse.

Stored attribute - an attribute that is physically stored in the database

Relationship – An association among entity types. Also called relationship type. (e.g. Student
follows Course). A relationship involves a property called Cardinality/Multiplicity. This is
concerned with the number of instances of one entity type which is related to a single instance
of another entity type. Based on cardinality, we can identify three types of relationships:

One to Many (1:N)

Customer (1) --- Places --- (N) Order

One to One (1:1)

Country (1) --- Ruled by --- (1) President

Many to Many (M:N)

Student (M) --- Follows --- (N) Course

Participation
This refers to the extent to which an entity type participates in a relationship. This can
be partial (i.e. optional) or total (i.e. mandatory).

Total – Every entity instance must participate in the relationship. Also called mandatory
participation. Total participation is shown by a double line connecting the relevant
entity type to the relationship.

Partial - It is not necessary for all instances to participate in the relationship. There can
be some entity instances that do not participate in the relationship. Also called optional
participation. Partial participation is shown by a single line connecting the relevant
entity type to the relationship.

In the above example, customer’s participation in the relationship is partial because
there can be customers without orders. Order’s participation in the relationship is total
because every order must relate to a customer.

Degree of relationship

This refers to the number of entity types involved in a relationship.

Unary – An entity type relates to itself. Also called a recursive relationship.

EMPLOYEE (1) --- Manages --- (N) EMPLOYEE

Binary – A relationship between two entity types. This is the most common type of
relationship.

Student (M) --- Follows --- (N) Course

Ternary – A simultaneous relationship between three entity types.

Strong Entity Types Vs Weak Entity Types

A strong entity type (also called regular entity type) has its own identifier or primary
key. Its existence does not depend on another entity type. That is, it can exist
independently.
A weak entity type cannot exist independently. Its existence depends on a strong
entity type or owner. A weak entity type has total participation in the relationship with
its owner. A weak entity type does not have its own primary key. It has a partial key
which can uniquely identify weak entities related to the same owner entity.

In the example below, “Dependant” (e.g. wife, child) is a weak entity type. Its partial key
would be first name, assuming two persons do not have the same name within the
family. At a later design stage, this partial key would be combined with the owner’s
primary key to form a full primary key. So the primary key of Dependant would become
EmpNo+FirstName.

ASSOCIATIVE ENTITY TYPE

An entity type that associates the instances of one or more entity types & contains
attributes that are peculiar to the relationship between those entity instances.

EXERCISE:

Draw an ER diagram for the following entity types:

Patient

Appointment

Doctor

Ward

Mapping ER MODEL to Relational Schema

Entity Types and Attributes

Each entity type becomes a table. Attributes of the entity become columns of the
table. Identification attribute of the entity becomes the primary key.

Each multi value attribute is mapped to a separate table. The primary key of the original
entity will be taken to the new table as foreign key.

A composite attribute is mapped by creating a separate column for each sub-part



One to Many (1:N)

For each one-to-many relationship, take the primary key on the “one” end to the
table representing the “many” end as the foreign key.

Many to Many (M:N)

A many-to-many relationship cannot be mapped directly to a relational schema. If
we do that, foreign keys will get multiple values. Therefore, it should be simplified by
breaking it into two one-to-many relationships. This is done by introducing a link entity.
The “many” ends of the relationship will come to the link entity.

If the relationship is an associative entity type, then its attributes will be taken to the
link table.
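As a minimal sketch of the link entity idea, the script below maps the Student Follows Course relationship to three tables. It uses Python's sqlite3 module; SQLite, the column names and the sample rows are assumptions for illustration. The link table follows holds two foreign keys, which together form its primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (studentno TEXT PRIMARY KEY, name TEXT);
CREATE TABLE course  (courseno  TEXT PRIMARY KEY, title TEXT);
-- Link table: the "many" end of each new one-to-many relationship.
CREATE TABLE follows (
    studentno TEXT REFERENCES student(studentno),
    courseno  TEXT REFERENCES course(courseno),
    PRIMARY KEY (studentno, courseno)
);
-- Hypothetical sample data.
INSERT INTO student VALUES ('S1', 'Ann'), ('S2', 'Ben');
INSERT INTO course  VALUES ('C1', 'Databases'), ('C2', 'Networks');
INSERT INTO follows VALUES ('S1', 'C1'), ('S1', 'C2'), ('S2', 'C1');
""")

# One student, many courses.
courses_of_s1 = [row[0] for row in conn.execute("""
    SELECT c.title
    FROM follows AS f JOIN course AS c ON c.courseno = f.courseno
    WHERE f.studentno = 'S1'
    ORDER BY c.title
""")]

# One course, many students.
students_on_c1 = [row[0] for row in conn.execute("""
    SELECT s.name
    FROM follows AS f JOIN student AS s ON s.studentno = f.studentno
    WHERE f.courseno = 'C1'
    ORDER BY s.name
""")]

print(courses_of_s1, students_on_c1)
```

Any attribute that belongs to the relationship itself would be added as a further column of the link table.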

One to One (1:1)

There are three variations:

Both Ends Total/Mandatory Participation

We can combine both entity types into a single table. The identification attribute of
either entity type could be chosen as the primary key of the new table.

One End Mandatory other end Optional

In this example, every vehicle relates to a single manager who can have only one vehicle, but
there can be managers without vehicles. Create two tables and take the primary key of the
optional end (i.e. Manager) to mandatory end (i.e. Vehicle) as the foreign key. This will avoid
the need for storing null values for foreign keys.

Both Ends Optional

Assume that there can be drivers without a bus currently assigned and there can be buses
without a driver being assigned. Create two tables and take the primary key of either end to
the other end as the foreign key.

Weak Entity Types

Weak entity type has a partial identifier (firstname in this example). Forming a primary
key is done by combining the partial id with the owner’s primary key.

Unary One to One (1:1)

Create a single table and form a foreign key whose values are drawn from the
identification attribute/primary key of the same table

Person

PersonID Name Age Gender MarriedToID

P1 Brown 34 M P2

P2 Jane 27 F P1

P3 Ted 25 M P4

P4 Tina 22 F P3

Unary One to Many (1:N)

Create a single table and form a foreign key whose values are drawn from the
identification attribute/primary key of the same table (same procedure as the one used
for unary 1:1)

Employee (EmpID, Name, Salary, ManagedBy*)

Employee

EmpID  Name  Salary  ManagedBy*
E1     John  5000    (null)
E2     Sam   3000    E1
E3     Jane  2000    E1
E4     Kate  6000    (null)
E5     Ben   3000    E4
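A table that refers to itself is queried by joining it to itself under two aliases. The sketch below lists each employee with the name of his or her manager, using the rows shown above. It runs through Python's sqlite3 module; SQLite and the lower-case column names are assumptions for illustration. A LEFT JOIN is used so that employees without a manager still appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    empid     TEXT PRIMARY KEY,
    name      TEXT,
    salary    INTEGER,
    managedby TEXT REFERENCES employee(empid)   -- foreign key into the same table
);
INSERT INTO employee VALUES
    ('E1', 'John', 5000, NULL),
    ('E2', 'Sam',  3000, 'E1'),
    ('E3', 'Jane', 2000, 'E1'),
    ('E4', 'Kate', 6000, NULL),
    ('E5', 'Ben',  3000, 'E4');
""")

# Alias e is the employee, alias m is the same table in the role of manager.
pairs = conn.execute("""
    SELECT e.name, m.name
    FROM employee AS e
    LEFT JOIN employee AS m ON e.managedby = m.empid
    ORDER BY e.empid
""").fetchall()

for employee_name, manager_name in pairs:
    print(employee_name, manager_name)
```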

Unary Many to Many (M:N)

A unary many-to-many relationship is mapped by creating two tables: one for the original
entity and another to serve as a link entity. The link table should contain two foreign keys, both
drawing values from the primary key domain of the original entity.

ResearchArticle

ArticleID Title Date

A1 Factors that affect employee motivation 2/5/2010

A2 Challenges to HR management 1/25/1995

A3 Self Actualization 10/10/1998

Cites

ReferencingID ReferredToID

A1 A2

A1 A4

A3 A1

A3 A2

Ternary

Create four tables – three for the original entity types and the other for linking the
different entity instances of the original three. The new link table has three foreign keys
whose values are drawn from the primary key domains of the original entity types.

AN ORDER PROCESSING SYSTEM

Relational Schema
Customer (cno, name, address, contactno)
Order (orderno, date, orderTotal, cno*)
OrderLine (orderno*, productno* , orderQuantity)
Product (pno, pname, price, QuantityOnHand)

Sketches of table structures


CUSTOMER

CNO CNAME ADDRESS


C1 Shalini Fonseka Kandy
C2 Brown Gomez Colombo
C3 Tilak De Silva Galle

ORDER

ORDERNO DESCRIPTION DATE CNO


001 School get together 18/10/2020 C1
002 Birthday party 22/11/2020 C1
003 Christmas Party 25/12/2020 C2
004 Office Dinner 31/12/202 C3

ORDERLINE

ORDERNO PRODUCT_NO QUANTITY


001 P1 25
001 P2 50
001 P3 4
002 P1 10
002 P3 10
003 P2 60

PRODUCT

PRODNO PNAME UNITPRICE


P1 Dinner Pack – Chicken 300
P2 Chinese Roll 60
P3 Pepsi 1 Lt 200
P4 Milk Shake Vanilla 180
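With the ORDERLINE and PRODUCT rows sketched above, the value of each order can be derived by a join and an aggregate rather than typed in by hand. The following is a minimal sketch using Python's sqlite3 module; SQLite and the lower-case table and column names are assumptions for illustration, while the data rows are the ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (prodno TEXT PRIMARY KEY, pname TEXT, unitprice INTEGER);
CREATE TABLE orderline (
    orderno  TEXT,
    prodno   TEXT REFERENCES product(prodno),
    quantity INTEGER,
    PRIMARY KEY (orderno, prodno)    -- composite key of the link table
);
INSERT INTO product VALUES
    ('P1', 'Dinner Pack - Chicken', 300),
    ('P2', 'Chinese Roll', 60),
    ('P3', 'Pepsi 1 Lt', 200),
    ('P4', 'Milk Shake Vanilla', 180);
INSERT INTO orderline VALUES
    ('001', 'P1', 25), ('001', 'P2', 50), ('001', 'P3', 4),
    ('002', 'P1', 10), ('002', 'P3', 10),
    ('003', 'P2', 60);
""")

# Join each order line to its product, then sum quantity * unit price per order.
totals = conn.execute("""
    SELECT l.orderno, SUM(l.quantity * p.unitprice) AS order_total
    FROM orderline AS l
    JOIN product AS p ON p.prodno = l.prodno
    GROUP BY l.orderno
    ORDER BY l.orderno
""").fetchall()

for orderno, order_total in totals:
    print(orderno, order_total)
```

For order 001 this gives 25 × 300 + 50 × 60 + 4 × 200 = 11300.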

INSTANCE DIAGRAMS
This shows how entity instances connect at run time depending on their cardinality.

Example: Binary 1 to Many relationship between Customer and Order

CUSTOMER --- PLACES --- ORDER

C1 --- 001
C1 --- 002
C2 --- 003
C3

NOTE: Customer C3 does not currently have any orders (optional participation)

NORMALIZATION
Normalization is a way of grouping data in relational database design that will eliminate
data redundancies. That in turn removes data anomalies. The difficulties of inserting,
updating and deleting data are known as data anomalies.

Update anomaly – a modification to a single data item will require looking for multiple
occurrences of the same data item.

Insert anomaly – Inserting data about one thing depends on data about another thing.

Delete anomaly – Deleting unwanted data will also result in the deletion of useful data.

Normalization is a bottom up approach for database design. It is a process of non-loss
decomposition. A concept central to normalization is functional dependency. We
say attribute B is functionally dependent on attribute A if, for every value of attribute A,
there is exactly one value of attribute B.

In other words, the value of A uniquely determines the value of B.

A → B

Product No → Price

In a properly normalized relation, each non-key attribute is fully functionally dependent
on the key, the whole key and nothing but the key.

What is wrong with the following table?

EMPLOYEE INFORMATION
ENO ENAME SALARY DNO DNAME DLOCATION

125 Roy 5000 A1 Admin Colombo

126 Akram 7000 A2 Factory Galle

127 Jane 5000 A1 Admin Colombo

128 Rajan 4000 A2 Factory Galle

129 Nisha 8000 A1 Admin Colombo

130 Amal 7000 A2 Sales Kelaniya

131 Kate 8500 A3 Research Kandy

THERE ARE REDUNDANCIES.

BECAUSE OF REDUNDANCIES THERE ARE ANOMALIES.

Changing Dep Name or Dep Location requires changes in many places (UPDATE ANOMALY)

Cannot insert a new department without an employee (INSERT ANOMALY)

Deleting an employee (e.g. Kate) also deletes information about department (DELETE
ANOMALY)

The following is an extract of a student grading report required in a school. This is an
un-normalized table because it contains repeating groups (i.e. multi value attributes).
Such information cannot be stored in a relation.

We improve the above table by removing repeating groups.

Student

Std# Sname Major

001 Sam IT

002 Brown Arts

003 Tiran IT

The primary key of the student table is Std#. All the attributes of the Student table are
fully functionally dependent on the primary key. Therefore, the student table is in third
normal form (3NF).

Subject-Lecturer-Grade 1NF

Std# Subject# Title Lec# LecName LecAdd Grade

001 111 Databases 156 Ann Kandy A

001 222 ICT 157 John Galle C

001 333 Networks 156 Ann Kandy B

002 444 Geography 160 Dave Colombo A

002 222 ICT 157 John Galle C

Above table has a composite key: Std#+Subject#.

Grade depends fully on the whole key, but other attributes such as Title, Lec#, LecName
and LecAdd depend only on Subject#. This is called a partial dependency. As a result, there are
anomalies (e.g. updating the address of a lecturer such as Ann requires changes in several
rows, and a new subject cannot be inserted without at least one student).

A table with partial dependencies is in first normal form (1NF).

A 1NF table violates 2NF.

We improve on the table by removing partial dependencies.
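The dependencies in the Subject-Lecturer-Grade table can be checked mechanically against its rows. The helper below is a minimal sketch in Python; the function name holds and the lower-case column names are assumptions for illustration. Note that sample rows can only refute a functional dependency. They cannot prove that it holds for all possible data, which is a design decision.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

columns = ("std", "subject", "title", "lec", "lecname", "lecadd", "grade")
data = [
    ("001", "111", "Databases", "156", "Ann", "Kandy", "A"),
    ("001", "222", "ICT", "157", "John", "Galle", "C"),
    ("001", "333", "Networks", "156", "Ann", "Kandy", "B"),
    ("002", "444", "Geography", "160", "Dave", "Colombo", "A"),
    ("002", "222", "ICT", "157", "John", "Galle", "C"),
]
rows = [dict(zip(columns, r)) for r in data]

full_key = holds(rows, ["std", "subject"], ["grade"])      # whole key determines grade
student_only = holds(rows, ["std"], ["grade"])             # part of the key is not enough
partial = holds(rows, ["subject"], ["title", "lec"])       # partial dependency
transitive = holds(rows, ["lec"], ["lecname", "lecadd"])   # non-key dependency

print(full_key, student_only, partial, transitive)
```

Student 001 has three different grades, so Std# alone does not determine Grade, while Subject# alone is enough to determine Title and Lec#.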



StudGrade – 3NF

Std# Subject# Grade

001 111 A

001 222 C

001 333 B

002 444 A

002 222 C

SubjectLecturer – 2NF

Subject# Title Lec# LecName LecAdd

111 Databases 156 Ann Kandy

222 ICT 157 John Galle

333 Networks 156 Ann Kandy

444 Geography 160 Dave Colombo

The primary key of the SubjectLecturer table is Subject#. Title and Lec# functionally
depend on the key, but lecturer name and address depend on Lec#, which is not a key.
This is called a non-key/transitive/hidden dependency. As a result, there will be
anomalies (e.g. changing Ann’s address requires changes in many rows; if ICT is deleted,
John’s details are lost as well; a new lecturer cannot be inserted unless he is given a
subject).

A table with non-key dependencies is in second normal form (2NF) only.

Such a table violates third normal form (3NF).



So we remove non-key/transitive dependencies to yield the following:

Subject – 3NF

Subject# Title Lec#

111 Databases 156

222 ICT 157

333 Networks 156

444 Geography 160

Lecturer – 3NF

Lec# LecName LecAdd

156 Ann Kandy

157 John Galle

160 Dave Colombo

Relational schema of the final set of 3rd normal form tables:

Student ( Std#, Sname , Major)

StudGrade ( Std#, Subject#, Grade)

Subject ( Subject#, Title , Lec# )

Lecturer ( Lec#, LecName , LecAdd)
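
The final schema can be tried out with a small script. The following is only a sketch, assuming Python's built-in sqlite3 module; the column names StdNo, SubjectNo and LecNo stand in for Std#, Subject# and Lec# (the "#" character is not allowed in an unquoted SQL name), and the new address 'Matale' is a made-up value. It shows that, after normalization, a lecturer's address is changed in exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student  (StdNo TEXT PRIMARY KEY, Sname TEXT, Major TEXT);
CREATE TABLE Lecturer (LecNo TEXT PRIMARY KEY, LecName TEXT, LecAdd TEXT);
CREATE TABLE Subject  (SubjectNo TEXT PRIMARY KEY, Title TEXT,
                       LecNo TEXT REFERENCES Lecturer(LecNo));
CREATE TABLE StudGrade(StdNo TEXT REFERENCES Student(StdNo),
                       SubjectNo TEXT REFERENCES Subject(SubjectNo),
                       Grade TEXT,
                       PRIMARY KEY (StdNo, SubjectNo));
INSERT INTO Student VALUES ('001','Sam','IT'), ('002','Brown','Arts'), ('003','Tiran','IT');
INSERT INTO Lecturer VALUES ('156','Ann','Kandy'), ('157','John','Galle'), ('160','Dave','Colombo');
INSERT INTO Subject VALUES ('111','Databases','156'), ('222','ICT','157'),
                           ('333','Networks','156'), ('444','Geography','160');
INSERT INTO StudGrade VALUES ('001','111','A'), ('001','222','C'), ('001','333','B'),
                             ('002','444','A'), ('002','222','C');
""")

# Ann's address is stored once, so a single-row update is enough (hypothetical new value).
changed = conn.execute(
    "UPDATE Lecturer SET LecAdd = 'Matale' WHERE LecNo = '156'").rowcount

# Joining the 3NF tables rebuilds the original grading report.
report = conn.execute("""
    SELECT g.StdNo, s.Title, l.LecName, l.LecAdd, g.Grade
    FROM StudGrade g
    JOIN Subject  s ON s.SubjectNo = g.SubjectNo
    JOIN Lecturer l ON l.LecNo = s.LecNo
    ORDER BY g.StdNo, g.SubjectNo
""").fetchall()
for row in report:
    print(row)
```

Every report line for Ann's subjects shows the new address, although only one row was updated.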



EMPLOYEE INFORMATION
ENO ENAME SALARY DNO DNAME DLOCATION

125 Roy 5000 A1 Admin Colombo

126 Akram 7000 A2 Factory Galle

127 Jane 6000 A1 Admin Colombo

128 Rajan 4000 A2 Factory Galle

129 Nisha 8000 A1 Admin Colombo

130 Amal 3000 A3 Sales Kelaniya

… … … … … ..

What is the key?

ENO

What normal form is the table in?

2NF

Because there is a non-key/transitive dependency (DNAME and DLOCATION do not depend
directly on ENO; they depend on DNO)

Following dependency chart shows this:

What normal form does the table violate? 3NF


IMPROVED DESIGN (NORMALIZED TABLES)

EMPLOYEE

ENO ENAME SALARY DNO*

125 Roy 5000 A1

126 Akram 7000 A2

127 Jane 6000 A1

128 Rajan 4000 A2

129 Nisha 8000 A1

130 Amal 3000 A3

… … … …

DEPARTMENT

DNO DNAME DLOCATION

A1 Admin Colombo

A2 Factory Galle

A3 Sales Kelaniya

… … ..

EMPLOYEE (ENO, ENAME, SALARY, DNO*)

DEPARTMENT (DNO, DNAME, DLOCATION)



DBMS Architecture / ANSI-SPARC Three Schema Architecture


Internal level - The internal level has internal schema, which describes the physical
storage structure of the database. It is concerned with how data are physically stored in
some storage medium such as hard disk. Issues like indexes, data compression (Zipping),
physical representation of data fields, encryption etc. are relevant to this level.

Conceptual level - The conceptual level has conceptual schema that describes the
structure of the whole database for a community of users. It describes the entity types,
their relationships and the integrity constraints of the entire database. The conceptual
schema does not include the details of physical storage structures and access
mechanisms.

External level - The external level includes a number of external schemas or user views.
Each external schema describes the part of the database that a particular user group is
interested in and hides the rest of the database from that user group. Application
programs and user interfaces such as database forms/web forms work at this level (i.e.
they access the database via the external schemas or views).

An example of three levels

External Level

Create view EmpContacts as Select Name as FullName, address from employee;

Create view EmpSal as Select Empno, Salary as basicSalary from employee;

Conceptual Level

Employee Table

CREATE TABLE EMPLOYEE

(Empno char (5) primary key,

Name varchar (20),

Address varchar (30),

Salary decimal (8,2));

Internal level

Eno B+ Tree UNIQUE Index CREATED

Na bytes 20

Add bytes 30

Sal floating point binary, total of 8 digits rounded to 2 decimals
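
The three levels above can be put together in one runnable sketch. It assumes Python's built-in sqlite3 module as the DBMS; the index name idx_employee_name and the sample row are hypothetical. SQLite chooses its own storage formats, so the internal level is only represented here by an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: the base table
CREATE TABLE Employee (
    Empno   CHAR(5) PRIMARY KEY,
    Name    VARCHAR(20),
    Address VARCHAR(30),
    Salary  DECIMAL(8,2)
);

-- External level: one view per user group
CREATE VIEW EmpContacts AS SELECT Name AS FullName, Address FROM Employee;
CREATE VIEW EmpSal      AS SELECT Empno, Salary AS BasicSalary FROM Employee;

-- Internal level: a storage structure that users never name in their queries
CREATE INDEX idx_employee_name ON Employee(Name);

INSERT INTO Employee VALUES ('E0001', 'Jane', 'Galle', 6000.00);
""")

# Each user group sees only the columns of its own view.
contacts = conn.execute("SELECT FullName, Address FROM EmpContacts").fetchall()
salary_columns = [d[0] for d in conn.execute("SELECT * FROM EmpSal").description]
print(contacts, salary_columns)
```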



Mappings

The correspondence from one level of the three-schema architecture to another level is
called a mapping. The conceptual/internal mapping defines the correspondence
between the conceptual level and the stored database. The external/conceptual
mapping defines the correspondence between the conceptual level and the user views.

Data independence

The ability to change the schema at one level of a database system without affecting
the schema at the next higher level is known as data independence.

1. Logical data independence


It is the ability to change the conceptual schema without having to change external
schemas or application programs. The separation of external schema from conceptual
schema provides logical data independence. We may change the conceptual structure
to expand the database (add a new attribute or table) or to reduce the database (delete
an attribute or a table) or to restructure the database (e.g. split a table, merge two
tables into one). Provided the external/conceptual mappings (i.e. the view definitions)
are adjusted where necessary, these changes will not impact the external schemas or
application programs that do not refer to a removed item.

2. Physical data independence


It is the ability to change the internal schema without having to change the conceptual
schema. Hence, the external schemas need not be changed either. The separation of
conceptual schema from internal schema provides physical data independence. Changes
to the internal schema may be needed because some physical data have to be reorganized
to improve retrieval performance (e.g. creating an index).
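
Both kinds of data independence can be observed in a small experiment. This is a sketch only, assuming Python's sqlite3 module; the Email column and the index name idx_emp_name are hypothetical. The same query against the view is run before and after each change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Empno TEXT PRIMARY KEY, Name TEXT, Address TEXT, Salary REAL);
CREATE VIEW EmpContacts AS SELECT Name AS FullName, Address FROM Employee;
INSERT INTO Employee VALUES ('E0001', 'Jane', 'Galle', 6000);
""")

query = "SELECT FullName, Address FROM EmpContacts"   # what the application runs
before = conn.execute(query).fetchall()

# Logical change: expand the conceptual schema with a new attribute.
conn.execute("ALTER TABLE Employee ADD COLUMN Email TEXT")
after_logical = conn.execute(query).fetchall()

# Physical change: add an index to speed up retrieval by name.
conn.execute("CREATE INDEX idx_emp_name ON Employee(Name)")
after_physical = conn.execute(query).fetchall()

print(before == after_logical == after_physical)
```

The application query is untouched by either change, which is the point of keeping the three levels separate.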

CHALLENGES OF IMPLEMENTING THREE SCHEMA ARCHITECTURE

Database vendors adopted the three-schema terminology, but they have implemented
it in incompatible ways. There was no single standard. Various groups attempted to
define their own standards for the conceptual schema and physical schema. However,
external schema (SQL Views) has one standard among most vendors. As a result,
achieving Logical Data Independence is fairly uniform across many RDBMS products
such as Oracle and MS SQL Server.

TRANSACTION MANAGEMENT
A transaction is a collection of operations that forms a single logical unit of work (e.g.
transferring a sum of 1000 from account A to account B).
A transaction takes the database from one consistent state to another. To ensure the
integrity of the database, the DBMS must maintain the following properties of a
transaction known as ACID (Atomicity, Consistency, Isolation and Durability).

1. Atomicity
When executing a transaction, the DBMS must ensure that either the entire transaction
is performed or none of it is performed (All or nothing). Atomicity is ensured by the
operations COMMIT and ROLLBACK.

COMMIT
This signals the successful end of a transaction. It tells the transaction manager that a
logical unit of work has been successfully completed and that all updates made by the
transaction should be made permanent.

ROLLBACK
This signals the unsuccessful end of a transaction. It tells the transaction manager that
something has gone wrong and all updates made so far by that transaction should be
undone/removed.

In addition to the above operations, most DBMSs allow “Save points” to be defined in a
transaction so that partial rollbacks can be performed.
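
COMMIT, ROLLBACK and save points can be tried out as follows. This is a sketch, assuming Python's sqlite3 module opened in autocommit mode so that BEGIN, COMMIT and ROLLBACK are issued by hand; the Account table, the balances and the save point name bonus are made up for illustration.

```python
import sqlite3

# isolation_level=None: the module does not open transactions on our behalf
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Account (AccNo TEXT PRIMARY KEY, Balance INTEGER)")
conn.execute("INSERT INTO Account VALUES ('A', 5000), ('B', 2000)")

def balances():
    return dict(conn.execute("SELECT AccNo, Balance FROM Account"))

# A transfer that fails halfway: ROLLBACK undoes the debit (all or nothing).
conn.execute("BEGIN")
conn.execute("UPDATE Account SET Balance = Balance - 1000 WHERE AccNo = 'A'")
conn.execute("ROLLBACK")
after_rollback = balances()

# A successful transfer with a partial rollback to a save point.
conn.execute("BEGIN")
conn.execute("UPDATE Account SET Balance = Balance - 1000 WHERE AccNo = 'A'")
conn.execute("UPDATE Account SET Balance = Balance + 1000 WHERE AccNo = 'B'")
conn.execute("SAVEPOINT bonus")
conn.execute("UPDATE Account SET Balance = Balance + 50 WHERE AccNo = 'B'")
conn.execute("ROLLBACK TO bonus")      # undoes only the work done after the save point
conn.execute("COMMIT")
after_commit = balances()

print(after_rollback, after_commit)
```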

2. Consistency
All transactions must preserve the consistency and the integrity of the database.
Transaction must not leave the database in an inconsistent state. This is enforced by
data integrity constraints (Entity, Referential, Domain and Additional integrity).

3. Isolation
Concurrent transactions must not interfere with each other. Any given transaction’s
updates should be concealed from other transactions until that transaction commits.
Isolation is enforced by concurrency control methods such as locking.

4. Durability
Once a transaction completes successfully, all updates carried out by that transaction on
the database must persist, even if a software or hardware failure occurs afterwards. The
changes made by the transaction must be preserved. This is supported by back up and
recovery methods.

Concurrency
Allowing multiple transactions to access the same database at the same time is
called concurrent processing. Concurrent processing is a desirable feature of a
DBMS as it improves data utilization.
Despite its many benefits, concurrent processing has its own problems which, IF
NOT CONTROLLED, would lead to an inconsistent database.

Concurrency Problems

Lost Update - Two concurrent transactions read a record to update it, and the first one
to write the record loses its update when the second one completes (the second
transaction overwrites the update made by the first one).

Time  Transaction A           Transaction B           Record1 (Disk)
0                                                     Seats = 10
1     Read Rec1 [Seats=10]                            Seats = 10
2     Seats = Seats + 3                               Seats = 10
3                             Read Rec1 [Seats=10]    Seats = 10
4     Write Rec1, Commit                              Seats = 13
5                             Seats = Seats + 5       Seats = 13
6                             Write Rec1, Commit      Seats = 15

The correct No. of seats should be 18
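
The lost update can be replayed step by step. The sketch below is plain Python with no DBMS involved: the dictionary disk stands for Record1 and the variables a_local and b_local stand for each transaction's private copy of Seats (all three names are illustrative).

```python
# Interleaved execution, following the time steps of the table
disk = {"seats": 10}
a_local = disk["seats"]        # time 1: A reads 10
a_local = a_local + 3          # time 2: A computes 13
b_local = disk["seats"]        # time 3: B reads 10 (A has not written yet)
disk["seats"] = a_local        # time 4: A writes 13 and commits
b_local = b_local + 5          # time 5: B computes 15 from its stale copy
disk["seats"] = b_local        # time 6: B writes 15, overwriting A's update
interleaved_result = disk["seats"]

# Serial execution: B starts only after A has finished
disk = {"seats": 10}
disk["seats"] = disk["seats"] + 3
disk["seats"] = disk["seats"] + 5
serial_result = disk["seats"]

print(interleaved_result, serial_result)
```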

Uncommitted Dependency (Dirty read) - One transaction B reads a value that has
been changed by an as yet incomplete transaction A. If transaction A rolls back, then
transaction B is using a data value which is no longer valid.

Time  Transaction A           Transaction B           Record1 (Disk)
0                                                     Seats = 10
1     Read Rec1 [Seats=10]                            Seats = 10
2     Seats = Seats + 5                               Seats = 10
3     Write Rec1                                      Seats = 15
4                             Read Rec1 [Seats=15]    Seats = 15
5     Rollback                                        Seats = 10
6                             Seats = Seats + 2       Seats = 10
7                             Write Rec1, Commit      Seats = 17

The correct No. of seats should be 12

Inconsistent Analysis - A transaction reads a long series of records one at a time,
and meanwhile some of these records are modified by other transactions, some before
they are read and some afterwards. As a result, the reading transaction can show
inconsistent values. Example: Transaction A is summing up account balances.
Transaction B is transferring an amount of 10 from account 3 to account 1.

Initial balances: Acc1 = 40 : Acc2 = 50 : Acc3 = 30

Time  Transaction A            Transaction B
1     Read Acc1, Sum = 40
2                              Read Acc3
3                              Update 30 -> 20
4     Read Acc2, Sum = 90
5                              Read Acc1
6                              Update 40 -> 50, Commit
7     Read Acc3, Sum = 110

The correct sum should be 120

Unrepeatable read - A transaction A reads a data item twice and the item is changed
by another transaction B between the two reads. Hence the reading transaction A
receives two different values of the same data item within the same transaction.

Time  Transaction A           Transaction B           Record1 (Disk)
0                                                     Seats = 10
1     Read Rec1 [Seats=10]                            Seats = 10
2                             Read Rec1 [Seats=10]    Seats = 10
3                             Seats = Seats + 1       Seats = 10
4                             Write Rec1              Seats = 11
5     Read Rec1 [Seats=11]                            Seats = 11

Phantom read - A transaction applies a query to display a set of tuples based on a
condition. Meanwhile, another transaction inserts a tuple that will satisfy the condition
set by the first transaction. If the first transaction applies the same query again, then it
sees a tuple that did not exist earlier (a phantom).

Solutions for Concurrency Problems:

Locking
A transaction can lock a data granule such as a record or table so that other
transactions will not be able to access it. The transaction can retain this lock until
COMMIT so that other transactions will not be able to interfere with it. Locking can
prevent concurrency problems such as Lost update, Dirty read, Unrepeatable read etc.

Types of Locks:
Shared Lock (Read Lock) : This gives read-only access to a data unit such as a record and
prevents any transaction from updating the data. Any number of transactions can hold a
shared lock on a data unit at a given time.

Exclusive Lock (Write Lock) : The transaction holding the write lock has both read and
write access to the data. Other transactions cannot read from or write to it. A data unit
can have only one write lock at a given time.
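
The effect of an exclusive lock can be imitated with threads. This is only an analogy, assuming Python's threading module: seats_lock plays the role of a write lock on the record, and each thread plays the role of a transaction (the function name book is illustrative). Because the read and the write happen while the lock is held, neither update can be lost.

```python
import threading

seats = 10
seats_lock = threading.Lock()      # stands in for an exclusive (write) lock on Rec1

def book(extra):
    """One 'transaction': read the record, change it, write it back."""
    global seats
    with seats_lock:               # acquire the lock; released when the block ends
        current = seats            # read
        seats = current + extra    # write

threads = [threading.Thread(target=book, args=(n,)) for n in (3, 5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seats)
```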

SCHEDULES AND SERIALIZABILITY


When database transactions are executing concurrently in an interleaved manner,
the order of execution of operations from the various transactions is known as a
schedule.

T1 T2

Write (X)

Read (x)

Write (A)

Read (A)

Write (A)

SERIAL SCHEDULE
A serial schedule is a schedule where all operations of each transaction are
executed consecutively without any interleaved operation from other
transactions.

T1            T2

Read (x)
Write (x)
              Read (x)
              Write (x)

SERIAL Vs. CONCURRENT/NON-SERIAL SCHEDULES


A serial schedule is a schedule where all operations of each transaction are executed
consecutively without any interleaved operation from other transactions. A non-serial
schedule is a schedule where the operations from a set of concurrent transactions are
interleaved.

Example - With X = 50 and Y = 1 as initial correct state and X = 65 and Y = 4 as the
final correct state

Serial schedule – T1 followed by T2

T1            T2

read (x)
x = x + 10
write (x)
read (y)
y = y + 2
write (y)
              read (x)
              x = x + 5
              write (x)
              read (y)
              y = y + 1
              write (y)

A non-serial / concurrent schedule equivalent to the serial schedule given
earlier (This schedule achieves serializability and as a result produces a correct
and consistent database state)

T1            T2

read (x)
x = x + 10
write (x)
              read (x)
              x = x + 5
              write (x)
read (y)
y = y + 2
write (y)
              read (y)
              y = y + 1
              write (y)

The above schedule is serializable since its effect is the same as that of a serial schedule



A non-serial / concurrent schedule that results in an incorrect state

T1            T2

read (x)
x = x + 10
              read (x)
              x = x + 5
              write (x)
              read (y)
write (x)
read (y)
y = y + 2
write (y)
              y = y + 1
              write (y)

Final incorrect values: X = 60 Y = 2

This schedule is NOT serializable

SERIALIZABILITY
To ensure that concurrent execution of transactions does not take the database to an
incorrect state, the system must enforce Serializability. This concept says that a
non-serial schedule must have the same effect as some serial schedule of the same
transactions.
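
The three schedules of the example can be executed by a small interpreter. This is a sketch in Python with illustrative names (run, steps), and it makes one simplifying assumption: a schedule is accepted when its final state equals that of some serial schedule for the given initial state. A real DBMS uses stricter tests, so this is only a check on this particular example.

```python
def run(schedule, db):
    """Execute a list of (transaction, action, item, delta) steps on a copy of db."""
    db = dict(db)
    local = {}                                  # private workspace per (transaction, item)
    for txn, action, item, delta in schedule:
        if action == "read":
            local[txn, item] = db[item]
        elif action == "add":
            local[txn, item] += delta
        elif action == "write":
            db[item] = local[txn, item]
    return db

def steps(txn, item, delta):
    """The read / compute / write triple used by every transaction in the example."""
    return [(txn, "read", item, 0), (txn, "add", item, delta), (txn, "write", item, 0)]

initial = {"x": 50, "y": 1}
t1x, t1y = steps("T1", "x", 10), steps("T1", "y", 2)
t2x, t2y = steps("T2", "x", 5), steps("T2", "y", 1)

serial_results = [run(t1x + t1y + t2x + t2y, initial),   # T1 then T2
                  run(t2x + t2y + t1x + t1y, initial)]   # T2 then T1

good = run(t1x + t2x + t1y + t2y, initial)               # the serializable schedule
bad = run([t1x[0], t1x[1], t2x[0], t2x[1], t2x[2], t2y[0],
           t1x[2], t1y[0], t1y[1], t1y[2], t2y[1], t2y[2]], initial)

print(good, good in serial_results)
print(bad, bad in serial_results)
```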

DATABASE RECOVERY

Database recovery is concerned with restoring the database to a correct and
consistent state after hardware or software failure.

Recovery facilities:

1. Regular back ups

Back up copies of the database should be made on a regular basis. These back up copies
should be kept off site for added security. Magnetic tape cartridges are commonly
used as the back up medium. Guidelines for making effective back ups:

- Take back ups at a regular frequency (e.g. every day at 6PM)
- Label each back up with file name, date, time
- Encrypt the back up and keep the decryption key with the DBA
- Keep back ups off site for added security
- Choose a high capacity medium with fast data transfer rate (e.g. tape cartridge)
- Synchronize database back up with transaction log back up

2. Transaction Log
A transaction log records essential details of each transaction. Data that are recorded
for each transaction include:

- Log sequence number (LSN)
- Date and time
- Data item name
- Type of operation (e.g. update, delete)
- Before image (old value) of the changed data item
- After image (new value) of the changed data item

A single transaction can have several log records depending on the number of data
items accessed. The log allows two types of recovery:

Backward recovery - When a transaction fails halfway through, it should be undone (i.e.
rollback). This is done by using before images of changed data items.

Forward recovery – This is used to reapply changes (i.e. redo) to the database. This is
done by using after images of changed data items.
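
Backward and forward recovery can be sketched with a log held in a Python list. The record layout, the account names and the helper names undo and redo are assumptions made for illustration, not the format of any particular DBMS.

```python
# Log of transaction T1, which moves 1000 from account A to account B
log = [
    {"lsn": 1, "txn": "T1", "item": "A", "before": 5000, "after": 4000},
    {"lsn": 2, "txn": "T1", "item": "B", "before": 2000, "after": 3000},
]

def undo(db, log, txn):
    """Backward recovery: apply before images, newest log record first."""
    for rec in reversed(log):
        if rec["txn"] == txn:
            db[rec["item"]] = rec["before"]

def redo(db, log, txn):
    """Forward recovery: apply after images, oldest log record first."""
    for rec in log:
        if rec["txn"] == txn:
            db[rec["item"]] = rec["after"]

# Case 1: T1 failed after debiting A but before crediting B, so it is undone.
half_done = {"A": 4000, "B": 2000}
undo(half_done, log, "T1")

# Case 2: the database was restored from an older back up, so T1 is redone.
restored = {"A": 5000, "B": 2000}
redo(restored, log, "T1")

print(half_done, restored)
```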

3. Checkpoints
Periodically (e.g. every 15 minutes) the DBMS takes a checkpoint where it ensures the
consistency of the database. At the checkpoint, any committed transactions still waiting
in the buffer are force written to disk and a special checkpoint record is written to the
transaction log. This checkpoint record indicates the time of the check point and a list of
transactions currently in progress. Checkpoints make recovering from a failure easier.

Write Ahead Logging (WAL) Protocol

To be effective in recovery operations, the log must record database
modifications before they are applied to the actual database.

Types of failure and methods of recovery:

1. Media failure (Disk crash)

Database is physically damaged. (e.g. disk crash)

How to recover: We must take the most recent back up copy of the database and copy
it to a new disk. We need to bring the database to a state immediately before it got lost.
This is done by REDOING the transactions that have occurred since the last back up.
Redoing involves applying after images of changed data items from the log to the
database. This is called FORWARD RECOVERY.

2. Transaction failure
A single transaction aborts due to an exception/run time error.

How to recover: Rollback/undo the changes made by the transaction by applying
BEFORE images from the log.

3. System failure

All the transactions currently in progress in the database server are lost. e.g. power
failure. There are 2 recovery methods for system failure:

Recovering under Immediate update

Modifications to data items are written to disk as they occur, without waiting for the
transaction to reach its COMMIT point. In case of a failure, all the transactions that
were in progress at the time of failure must be rolled back/undone. Since a transaction
is allowed to commit before all its changes are written to the actual database (they are
fully recorded in the log under the WAL protocol), there may also be a need to REDO
committed transactions in case of a failure. So the transactions that were committed
after the last checkpoint are redone. This is called the UNDO/REDO recovery algorithm.

Undo/Redo recovery technique used for immediate update

T1 , T3 → UNDO/ROLLBACK
T2 , T4 → REDO
T5 → DO NOTHING
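
The undo/redo decision can be written as a short routine. This is a sketch in Python; the order of events in the sample log is an assumption, chosen so that it produces the same outcome as listed above, and the function name classify is illustrative.

```python
def classify(log):
    """Split transactions into (undo, redo, nothing) after a system failure."""
    last_checkpoint = max(
        (i for i, rec in enumerate(log) if rec[0] == "checkpoint"), default=-1)
    started, committed_at = [], {}
    for i, rec in enumerate(log):
        if rec[0] == "start":
            started.append(rec[1])
        elif rec[0] == "commit":
            committed_at[rec[1]] = i
    # Still in progress at the time of failure: roll back.
    undo = sorted(t for t in started if t not in committed_at)
    # Committed after the last checkpoint: changes may not be on disk yet.
    redo = sorted(t for t, i in committed_at.items() if i > last_checkpoint)
    # Committed before the checkpoint: already forced to disk.
    nothing = sorted(t for t, i in committed_at.items() if i < last_checkpoint)
    return undo, redo, nothing

# Hypothetical log; the failure happens after the last entry.
log = [
    ("start", "T5"), ("start", "T1"), ("start", "T2"),
    ("commit", "T5"),
    ("checkpoint",),
    ("start", "T3"), ("commit", "T2"),
    ("start", "T4"), ("commit", "T4"),
]
undo_list, redo_list, nothing_list = classify(log)
print(undo_list, redo_list, nothing_list)
```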

Recovering under Deferred Update

Defer or postpone any actual updates to the database until the transaction completes
its execution successfully and reaches its commit point.

After the transaction reaches its commit point the updates are recorded in the actual
database.

If a transaction fails before reaching its commit point, there is no need to UNDO any
operation. However, it may be necessary to REDO a committed transaction from the
log because its effect may not have been recorded in the actual database. Hence,
deferred update is known as NO_UNDO/REDO recovery algorithm.

NO Undo/Redo recovery used for Deferred update

T1 , T3 → Nothing to do since their changes did not affect the database
T2 , T4 → REDO
T5 → DO NOTHING since it was committed before the checkpoint

DATABASE DESIGN STAGES

Requirements collection and specification

Gather the client’s data requirements using methods like interviews, questionnaires etc.

Conceptual Design

This is a detailed conceptual model of all entity types, relationships and constraints,
free of any technology or database model. This is expressed as an ER diagram.

Logical design

This maps the conceptual model (i.e. ER) to the logical schema of the target database
model. This is independent of any technology (DBMS, platform etc.) but dependent on a
particular data model. It is a concise description of the data requirements. If the target
model is relational, then the result is a fully normalized relational schema in bracketing
notation.

logical relational schema

Airplane (regno, make, capacity, countryname*, pilotid*)


Country (countryname, capital)
Airport (ID, Location, capacity)
Pilot (pilotid, name, salary, contactno)

Physical design

The result of logical design is converted to data structures of the target database model
using the target DBMS. The result is a physical schema which can be used to create the
physical database. This is technology dependent. The physical schema is created using
SQL DDL.

Create Table Airplane

( regno int primary key,

Make varchar(20) not null,

capacity int,

….. );

Create Table Pilot

( ID int primary key ,

fullname varchar(20),

age int

);

DATABASE SECURITY
Authentication
Authentication is concerned with the positive identification of database users. It
ensures that only authorized users with a login account and a valid password are
allowed access to the database. To enroll users into the DBMS, the DBA allocates a
unique user name, a password and a predefined profile. Certain initial privileges, such
as server roles, the databases the user is allowed to access and the disk space quota,
are assigned to the profile.

Create User Brown Identified By star4567 Profile app_developer;

Database Access Privileges / Access Rights

Once the users are logged into the database, they should be given appropriate database
privileges. For a given user, these privileges define the types of action he/she can take
against the database.

Types of privileges include SELECT, INSERT, UPDATE and DELETE. Some users might be
given read only access while others may be given the full set of privileges. Access
privileges can be defined for individual users as well as for user roles.

Assigning privileges is done by using the SQL command GRANT.

GRANT SELECT, UPDATE ON EMPLOYEE TO JANE;

Above statement issues read and update rights on employee table to Jane.

GRANT ALL ON EMPLOYEE TO SAM;

Above statement issues ALL privileges (insert, update, delete, select) on employee table to
Sam.

Removing privileges is done by using the SQL command REVOKE

REVOKE INSERT, DELETE ON STUDENT FROM BROWN;

Roles - We can group users according to their roles. For example, in a bank, the roles
could include Teller, Manager and System Administrator. Once we classify the users into
roles, we can then issue access privileges to roles rather than individual users.

Create Role Teller Identified By ‘abc64’;

Grant Select, Update on Account To Teller;

Views

Confidentiality of data can be enforced via SQL Views. Views can be used to hide
sensitive/confidential data from inappropriate staff. Views ensure that a given user
sees only what is relevant to him/her. Views are used by the DBA to control access to
the database.

A view is a virtual table. It derives data from underlying base tables.

Example: Suppose there is an employee table with empid, name, address, contactno and
salary. The reception staff need to know only the employees' contact numbers. So we create a
view that has only employee names and contact numbers. Confidential data like salary are
hidden from this user group.

Create View EmpContact As Select Name, ContactNo From Employee;

Security can be further enhanced by providing Access Privileges to Views via GRANT
statement of SQL.

NOTE: GRANT clause can only provide column level or table level security (i.e. it can
only restrict access to columns or tables). It cannot provide row level security. To restrict
access to rows, views can be used. Grants can then be defined on views to define who
can access them.

Create View SalesEmp As

Select * From EMPLOYEE Where Department = 'Sales';

GRANT Select, Insert ON SalesEmp TO Brown;
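
The row filtering performed by such a view can be seen in a quick test. This sketch assumes Python's sqlite3 module and three made-up employee rows. SQLite has no user accounts and no GRANT statement, so only the view part is shown here; in a multi-user DBMS the GRANT on the view would be issued as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Empid TEXT PRIMARY KEY, Name TEXT,
                       Department TEXT, Salary INTEGER);
INSERT INTO Employee VALUES ('E1', 'Roy',  'Sales', 5000),
                            ('E2', 'Jane', 'Admin', 6000),
                            ('E3', 'Amal', 'Sales', 3000);

-- Row level restriction: only Sales employees are visible through the view
CREATE VIEW SalesEmp AS
    SELECT * FROM Employee WHERE Department = 'Sales';
""")

visible = conn.execute("SELECT Name FROM SalesEmp ORDER BY Name").fetchall()
total = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(visible, total)
```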

Data Encryption

This involves scrambling data so that they cannot be understood without the proper
decryption key. Sensitive information that is typically encrypted includes credit card
numbers, sensitive personal information (e.g. health details) and user passwords. The
decryption key should be given only to authorized users.

Techniques and Precautions to protect a database from unauthorized users

- Create user accounts with user names and passwords
- Assign access privileges to users on database objects (e.g. read only rights to
some users)
- Create SQL Views to hide confidential data from inappropriate staff
- Encrypt sensitive data
- Maintain good password policy (minimum size 10 characters, mix of upper case,
lower case and numbers, automatic password aging check, password encryption)
- Apply physical and network controls (firewall, CCTV, guards, id tags, alarms etc.)

OCTOBER 2021

DATABASE SERVER ARCHITECTURES

CLIENT/SERVER

The client/server model divides duties between client machines and a server. The
server is a powerful machine that contains the database. Clients send queries to
the server. The server processes the queries and sends the answers to the clients.
User authentication, concurrency control and integrity checking are all done by the
server.

There are two types of client/server architectures: two-tier and three-tier.



Three Tier ADVANTAGES:


1. Client has only input and output (thin client) – this improves access speed. It
enables fast download of web based information.
2. Easy modifications to application logic/business rules - If business rules are
changed, only one place (i.e. application server) needs to be updated
3. More secure – Business logic is not downloaded to clients

Above advantages/reasons make Three Tier more suitable for Web applications.

CLOUD COMPUTING

The idea of Cloud is to obtain computing services such as servers, data analytics,
storage and other things over the Internet at a price (usually a monthly rental fee). This
is similar to the way we obtain services such as electricity and water. Companies
offering these computing services are called cloud providers.

The advantages of cloud based database services include:

1. Cost

Cloud computing eliminates the high initial cost of buying expensive hardware and
software. It also eliminates many costs related to running and maintaining an on-site
data center.

2. Expertise

Companies that provide cloud services specialize in the necessary expertise and technical
“know how”. Such expert knowledge is not easily available to small companies.

3. Performance

Cloud computing services run on powerful, fast and secure data centres. They are
regularly upgraded to the latest computing hardware. This can reduce processing delays
for business applications.

Difference between Cloud and Client/Server:

The client/server model is a traditional network arrangement that divides duties between
client machines and a server. The entire client/server set up, management and
maintenance is “in house”. It is owned and maintained by the company that uses it. Its
services are for that company only (not for rent). Cloud is an outside, Internet based
service provider.

SQL

DATA DEFINITION LANGUAGE (DDL)

This is used to CREATE, MODIFY/ALTER and DELETE data structures. DDL
statements modify meta data stored in the data dictionary/catalog.

Creating a table structure : Example

Create Table Customer
(CustomerID Char(5) Primary Key,
Name varchar(20),
Age int ) ;
Modifying a table structure – Add a column: Example

Alter Table Customer


ADD email varchar(25) ;
Modifying a table structure – Remove a column: Example

Alter Table Customer


DROP COLUMN email ;
Removing a table structure : Example

DROP TABLE Customer;
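
The effect of DDL statements on the catalog can be watched directly. This is a sketch assuming Python's sqlite3 module, where PRAGMA table_info reads the column list that the DBMS keeps for a table; the helper name columns is illustrative. DROP COLUMN is left out because older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def columns(table):
    """Column names recorded in the catalog for the given table (empty if none)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

conn.execute("""CREATE TABLE Customer
                (CustomerID CHAR(5) PRIMARY KEY,
                 Name VARCHAR(20),
                 Age INT)""")
after_create = columns("Customer")

conn.execute("ALTER TABLE Customer ADD COLUMN email VARCHAR(25)")
after_alter = columns("Customer")

conn.execute("DROP TABLE Customer")
after_drop = columns("Customer")

print(after_create, after_alter, after_drop)
```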



DATA TYPES IN SQL

INT – A fairly large integer.

DECIMAL(M,D) – A fixed-point number. Defining the total number of digits M and the
number of decimals D is required.

DATE – A date in YYYY-MM-DD format between 1000-01-01 and 9999-12-31.

DATETIME – A date and time combination in YYYY-MM-DD hh:mm:ss format.

TIME – Stores time in hh:mm:ss format

YEAR(M) – Stores a year in 2 digit or 4 digit format.

e.g. YearJoined YEAR(2) will store years in 2 digits.

CHAR(size) – A fixed size string between 1 and 255 characters in length.

VARCHAR(size) - A variable length string between 1 and 255 characters in length.

BLOB (Binary Large Object) – It is used to store large amounts of binary data such

as images, audio and video.

CLOB (Character Large Object) – It is used to store very large text objects

DATA MANIPULATION LANGUAGE (DML)

This is used to INSERT, UPDATE, DELETE and SELECT (Retrieve) data.

EMP

EMPNO NAME SALARY GENDER

125 Brown 7000 M

126 Jane 6000 F

128 Tom 10000 M

129 Anne 15000 F

1. Retrieve Data (SELECT)

Select Name, Salary From Emp ;

ROW LEVEL FILTERING


We could apply a condition/predicate to data retrieval by using the “WHERE” clause. For
example, to retrieve names and salaries of male employees, we could write:

Select Name, Salary

From Emp

Where Gender='M';

The condition can have multiple parts – the logical operators AND, OR, NOT can be used
to construct multi-part conditions.

Select * From Emp Where Salary>6000 AND Gender='F';



2. Insert Data

Insert Into Emp Values ('130','Grey',25000,'M');

Insert Into Emp (Empno, Name, Salary, Gender ) Values ('130','Grey',25000,'M');

3. Update Data

CHANGE BROWN’S SALARY TO 18000

Update Emp

Set Salary = 18000

Where Name = ‘Brown’ ;

INCREASE ALL SALARIES BY 2000/=

Update Emp

Set Salary = Salary + 2000 ;

CHANGE JANE’S NAME TO ‘JANE DE SILVA’

Update Emp

Set Name= ‘Jane De Silva’

Where Name = ‘Jane’ ;



4. Delete Data

DELETE BROWN

Delete From Emp

Where Name = 'Brown' ;

DELETE ALL EMPLOYEES

Delete From Emp;

TRUNCATE TABLE Command

This command deletes all the records/rows, but the table structure remains. In DBMSs
such as Oracle and MySQL, this operation cannot be rolled back because it performs an
automatic COMMIT.

e.g. TRUNCATE TABLE EMP;

Note: The result is the same as DELETE FROM EMP, but the difference is that a DELETE
command can be rolled back while, in those DBMSs, TRUNCATE cannot.
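
The DML statements of this section, including a DELETE that is rolled back, can be run as follows. This is a sketch assuming Python's sqlite3 module with transactions controlled by hand; SQLite has no TRUNCATE TABLE command, so only the DELETE side of the comparison is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)   # BEGIN/ROLLBACK issued by hand
conn.execute("CREATE TABLE Emp (Empno TEXT PRIMARY KEY, Name TEXT, Salary INTEGER, Gender TEXT)")
conn.execute("""INSERT INTO Emp VALUES ('125', 'Brown', 7000, 'M'), ('126', 'Jane', 6000, 'F'),
                                       ('128', 'Tom', 10000, 'M'), ('129', 'Anne', 15000, 'F')""")

def count_rows():
    return conn.execute("SELECT COUNT(*) FROM Emp").fetchone()[0]

# UPDATE: change Brown's salary to 18000
conn.execute("UPDATE Emp SET Salary = 18000 WHERE Name = 'Brown'")
brown_salary = conn.execute("SELECT Salary FROM Emp WHERE Name = 'Brown'").fetchone()[0]

# DELETE inside a transaction, then change our mind
conn.execute("BEGIN")
conn.execute("DELETE FROM Emp")
rows_during = count_rows()        # the table looks empty inside the transaction
conn.execute("ROLLBACK")
rows_after = count_rows()         # the rows are back after the rollback

print(brown_salary, rows_during, rows_after)
```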

Sorting Information
Select Name, Salary

From Emp Where Gender = 'M'

Order By Salary DESC ;

NOTE: ASC – ascending, DESC – descending, default is ascending

Pattern Matching

SQL pattern matching enables you to use “_” to match any single character and “%” to
match an arbitrary number of characters (including none). Whether patterns are
case-sensitive depends on the DBMS and its settings (e.g. they are case-insensitive by
default in MySQL). Pattern matching is done by using the LIKE or NOT LIKE comparison
operators.

EMP

ENO NAME SALARY


125 Amal Perera 25000
425 Tom Jones 45000
762 Rani Perera 15000
545 Ajit Silva 26000

List employees who have names starting with ‘A’


Select *
from emp
where name like 'A%' ;

List employees whose second name is ‘Perera’

Select *
from emp
where name like '% Perera' ;

List employees whose employee numbers have 2 as second digit

Select *
from emp
where eno like '_2%' ;
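
The three patterns can be checked against the sample EMP rows. This is a sketch assuming Python's sqlite3 module; ENO is stored as text here so that the "_2%" pattern is applied to its digits, and the helper name names_like is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (eno TEXT PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("125", "Amal Perera", 25000),
    ("425", "Tom Jones",   45000),
    ("762", "Rani Perera", 15000),
    ("545", "Ajit Silva",  26000),
])

def names_like(column, pattern):
    """Names of employees whose given column matches the LIKE pattern."""
    sql = f"SELECT name FROM emp WHERE {column} LIKE ? ORDER BY eno"
    return [row[0] for row in conn.execute(sql, (pattern,))]

starts_with_a = names_like("name", "A%")        # names starting with 'A'
perera = names_like("name", "% Perera")         # second name is 'Perera'
second_digit_2 = names_like("eno", "_2%")       # 2 as the second digit of ENO

print(starts_with_a, perera, second_digit_2)
```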

Summary Information

Summaries are produced by using the “Group By” clause. It is used with one or more
aggregate functions. When a condition applies to the summarised (aggregate) values
rather than to individual rows, the “where” clause cannot be used; instead, a clause
called HAVING is used.
PURCHASE
SUPPLIER PRODUCT QUANTITY
LG AC 25
SONY TV 10
LG DVD PLAYER 40
SONY DIGITAL CAMERA 15
APPLE IPOD 7

List total quantity for each supplier

Select Supplier, SUM(Quantity) As Totqty

From Purchase

Group By Supplier ;

List the number of products supplied by each supplier, from highest to lowest

Select Supplier, COUNT(*) As NumProd

From Purchase

Group By Supplier

Order By COUNT(*) DESC;

List the total and average quantity supplied by each supplier, provided
he/she supplies more than one product

Select Supplier, SUM(Quantity) As Totqty , AVG(Quantity) As Avgqty

From Purchase

Group By Supplier

Having COUNT(*) > 1 ;
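
The summary queries can be run against the PURCHASE data. This is a sketch assuming Python's sqlite3 module; an ORDER BY is added to each query so that the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Purchase (Supplier TEXT, Product TEXT, Quantity INTEGER)")
conn.executemany("INSERT INTO Purchase VALUES (?, ?, ?)", [
    ("LG",    "AC",             25),
    ("SONY",  "TV",             10),
    ("LG",    "DVD PLAYER",     40),
    ("SONY",  "DIGITAL CAMERA", 15),
    ("APPLE", "IPOD",            7),
])

# Total quantity for each supplier
totals = conn.execute("""
    SELECT Supplier, SUM(Quantity) AS Totqty
    FROM Purchase
    GROUP BY Supplier
    ORDER BY Supplier
""").fetchall()

# HAVING filters whole groups: only suppliers with more than one product remain
multi = conn.execute("""
    SELECT Supplier, SUM(Quantity) AS Totqty, AVG(Quantity) AS Avgqty
    FROM Purchase
    GROUP BY Supplier
    HAVING COUNT(*) > 1
    ORDER BY Supplier
""").fetchall()

print(totals)
print(multi)
```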



Table Joins
EMP

EMPNO ENAME DEPNO SALARY GENDER

125 Joe A1 7000 M

128 Sam A2 10000 M

137 Jane A1 12000 F

140 Grey A2 15000 M

145 Tina A3 20000 F

167 Kelly A1 5000 F

DEP

DEPNO DEPNAME TELNO

A1 IT 45454

A2 Admin 56565

A3 Sales 67677

A4 R&D 22222

List the employee names along with their department numbers and department names

Select ENAME, DEP.DEPNO, DEPNAME


From EMP, DEP
Where EMP.DEPNO = DEP.DEPNO ;

List the names and salaries of employees who belong to “IT” department

Select ENAME, SALARY

From EMP e, DEP d

Where e.DEPNO = d.DEPNO And DEPNAME ='IT';


List the names, salaries and department names of female employees who earn more than
6000

Select ENAME, SALARY, DEPNAME

From EMP e, DEP d

Where GENDER='F' AND SALARY > 6000 AND

e.DEPNO = d.DEPNO;


List the names of departments that have got at least one employee

Select DISTINCT DEPNAME

From EMP, DEP

Where EMP.DEPNO = DEP.DEPNO;

Note: We can also answer the above query using “IN” clause

Select DEPNAME

From DEP

Where DEPNO IN

( Select DEPNO

From EMP) ;

Using “IN” clause method, List the names of employees who belong to “Admin” department

Select ENAME

From EMP

Where DEPNO IN

( Select DEPNO

From DEP

Where DEPNAME = 'Admin') ;

List the names of departments that haven’t got any employees

Select DEPNAME

From DEP

Where DEPNO NOT IN

( Select DEPNO

From EMP) ;
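
The join and sub-query versions can be compared on the EMP and DEP tables. This is a sketch assuming Python's sqlite3 module; only the columns needed by the queries are loaded, and an ORDER BY is added so that the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEP (DEPNO TEXT PRIMARY KEY, DEPNAME TEXT, TELNO TEXT);
CREATE TABLE EMP (EMPNO TEXT PRIMARY KEY, ENAME TEXT,
                  DEPNO TEXT REFERENCES DEP(DEPNO), SALARY INTEGER, GENDER TEXT);
INSERT INTO DEP VALUES ('A1','IT','45454'), ('A2','Admin','56565'),
                       ('A3','Sales','67677'), ('A4','R&D','22222');
INSERT INTO EMP VALUES ('125','Joe','A1',7000,'M'),   ('128','Sam','A2',10000,'M'),
                       ('137','Jane','A1',12000,'F'), ('140','Grey','A2',15000,'M'),
                       ('145','Tina','A3',20000,'F'), ('167','Kelly','A1',5000,'F');
""")

def names(sql):
    return [row[0] for row in conn.execute(sql)]

# Departments with at least one employee: join version and IN version
by_join = names("""SELECT DISTINCT DEPNAME FROM EMP, DEP
                   WHERE EMP.DEPNO = DEP.DEPNO ORDER BY DEPNAME""")
by_in = names("""SELECT DEPNAME FROM DEP
                 WHERE DEPNO IN (SELECT DEPNO FROM EMP) ORDER BY DEPNAME""")

# Departments without any employees
empty = names("""SELECT DEPNAME FROM DEP
                 WHERE DEPNO NOT IN (SELECT DEPNO FROM EMP)""")

print(by_join, by_in, empty)
```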

Using “IN” clause method, List the names of male employees who belong to “IT” department

Select ENAME

From EMP Where Gender = 'M' AND DEPNO IN

( Select DEPNO

From DEP

Where DEPNAME = 'IT') ;



Types of TABLE JOINS


A table join operation combines rows from two tables into a single table. This is normally
done by equating the primary key values of one table with the foreign key values of the
other table. Where the primary key values match with the foreign key values, the two
corresponding tuples are joined.

Inner Join

Inner join is the most common type of join. It combines rows/tuples of two tables based
on a join predicate which refers to the primary keys and foreign keys. Where the primary
key value of one table matches with the foreign key value of the other table, the two
corresponding tuples are joined.

Select * From EMP, DEP

Where EMP.DEPNO = DEP.DEPNO;



Above query can also be written as:

Select * From EMP Inner Join DEP

On EMP.DEPNO = DEP.DEPNO;

Outer Join

An outer join returns the tuples whose primary and foreign key values match and, in
addition, the tuples from one or both tables that have no match. The attributes that
have no matching value are filled with NULLs. There are three types of outer joins: LEFT
OUTER JOIN, RIGHT OUTER JOIN and FULL OUTER JOIN.

LEFT OUTER JOIN – Where table A and table B are joined and table A is specified as left,
a left outer join produces all the tuples from table A including the ones that do not
match with table B.

SQL

Select * From EMP

Left Outer Join DEP On EMP.DEPNO = DEP.DEPNO ;

ALGEBRA

Emp ⟕ (Emp.depno = Dep.depno) Dep



NAME   EMP.DEPNO  DEPNAME  DEP.DEPNO
JOE    A1         IT       A1
SAM    A2         ADMIN    A2
BROWN  A1         IT       A1
PAT    A2         ADMIN    A2
TOM    NULL       NULL     NULL

RIGHT OUTER JOIN – a right outer join produces all the rows from the right table
including the tuples that do not match with the left table.

SQL

Select * From EMP

Right Outer Join DEP On EMP.DEPNO = DEP.DEPNO;

ALGEBRA

Emp ⟖ (Emp.depno = Dep.depno) Dep

NAME   EMP.DEPNO  DEPNAME  DEP.DEPNO
JOE    A1         IT       A1
SAM    A2         ADMIN    A2
BROWN  A1         IT       A1
PAT    A2         ADMIN    A2
NULL   NULL       QM       A3

FULL OUTER JOIN – Produces all the tuples from both tables (it combines left outer
join and right outer join)

SQL

Select * From EMP

Full Outer Join DEP On EMP.DEPNO = DEP.DEPNO;

ALGEBRA

Emp ⟗ (Emp.depno = Dep.depno) Dep
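
The three kinds of join can be compared side by side with a small runnable sketch. It assumes Python's built-in sqlite3 module and uses the sample rows from the result tables above (TOM has no department; department A3, QM, has no employees). Older SQLite versions (before 3.39) do not accept RIGHT or FULL OUTER JOIN, so here the right outer join is emulated by swapping the tables in a left outer join, and the full outer join is taken as the union of the two results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEP (DEPNO TEXT PRIMARY KEY, DEPNAME TEXT);
    CREATE TABLE EMP (NAME TEXT, DEPNO TEXT);
    INSERT INTO DEP VALUES ('A1','IT'), ('A2','ADMIN'), ('A3','QM');
    INSERT INTO EMP VALUES ('JOE','A1'), ('SAM','A2'), ('BROWN','A1'),
                           ('PAT','A2'), ('TOM',NULL);
""")

# Inner join: only the rows whose DEPNO values match.
inner = conn.execute("""
    SELECT NAME, DEPNAME FROM EMP
    INNER JOIN DEP ON EMP.DEPNO = DEP.DEPNO
""").fetchall()

# Left outer join: every EMP row, with NULL where no department matches.
left = conn.execute("""
    SELECT NAME, DEPNAME FROM EMP
    LEFT OUTER JOIN DEP ON EMP.DEPNO = DEP.DEPNO
""").fetchall()

# Right outer join, emulated by making DEP the left table.
right = conn.execute("""
    SELECT NAME, DEPNAME FROM DEP
    LEFT OUTER JOIN EMP ON EMP.DEPNO = DEP.DEPNO
""").fetchall()

# Full outer join: matched rows plus the unmatched rows from both sides.
full = set(left) | set(right)

print(len(inner), len(left), len(right), len(full))
```

In Python the SQL NULL appears as None, so the unmatched rows show up as ('TOM', None) and (None, 'QM').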



SQL VIEWS
A view is a virtual table. It derives its data from underlying base tables.

Views are used by the DBA to control access to the database. Views can be used to
hide sensitive/confidential data from inappropriate staff. Views ensure that a given
user sees only what is relevant to him/her. In a multiuser database, each user/user
group has their own view of the database.

Views belong to the external level of the ANSI-SPARC Three Schema Architecture.

Views can be manipulated just like other tables within some limits. We can also create a
view from another view.

Querying a view can be less efficient than querying base tables directly, because the
DBMS has to evaluate the view's defining query each time the view is used.

EMP
Fname Lname Address Salary Gender Dno
Tom Jones Kandy 5000 M A1
Rita Heyworth Colombo 8000 F A2
Alan Turing Galle 4000 M A1
Roy Silva Jaffna 9000 M A1

CREATE VIEW EmpContact AS

SELECT Fname, Lname, Address FROM Emp ;

Views can be accessed and manipulated just like ordinary tables:

SELECT *

FROM EmpContact ;

UPDATE EmpContact SET Address = 'Colombo'

Where Fname = 'Tom';



Views also can have derived attributes.

CREATE VIEW Salaries As SELECT Lname,Salary, Salary*8/100 As EPF

FROM Emp;
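
The fact that a view is virtual can be seen in a short sketch. It assumes Python's built-in sqlite3 module and the EMP rows shown above. The derived EPF column is not stored anywhere; it is recomputed from the base table each time the view is queried, so a change to Emp shows up in the view immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Emp (Fname TEXT, Lname TEXT, Address TEXT,
                      Salary INTEGER, Gender TEXT, Dno TEXT);
    INSERT INTO Emp VALUES
        ('Tom','Jones','Kandy',5000,'M','A1'),
        ('Rita','Heyworth','Colombo',8000,'F','A2'),
        ('Alan','Turing','Galle',4000,'M','A1'),
        ('Roy','Silva','Jaffna',9000,'M','A1');
    CREATE VIEW Salaries AS
        SELECT Lname, Salary, Salary*8/100 AS EPF FROM Emp;
""")

query = "SELECT EPF FROM Salaries WHERE Lname = 'Jones'"
before = conn.execute(query).fetchone()[0]

# Change the base table, not the view.
conn.execute("UPDATE Emp SET Salary = 6000 WHERE Fname = 'Tom'")
after = conn.execute(query).fetchone()[0]

print(before, after)  # 400 480
```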

Views can have table joins too.

CREATE VIEW KandyOrders As


SELECT Custname, orderno, amount FROM Customer C, Orders OD
WHERE C.CITY = 'KANDY' AND C.CNO = OD.CNO;

Limits that apply to views:

- Derived information (i.e. calculations) and aggregate functions (SUM, AVG etc.)
cannot be updated through a view
- Views defined with GROUP BY, DISTINCT or ORDER BY cannot be updated

Modifying a view

The definition of an existing view is changed by replacing the view with a new
definition, like this:

Create Or Replace View DEMO As Select * From STUDENT Where Gender = 'M';

Advantages of Views:

- Views do not occupy storage space (they are virtual tables)


- Views allow query simplification – complex queries can be turned into views so
that end users can simply call the view instead of entering complex code
- Views provide security – confidential data can be hidden
- Views enable logical data independence – base table can be restructured without
impacting on end user queries and applications

Deleting a view

Example: DROP VIEW empcontact ;



DATA CONTROL LANGUAGE (DCL)

This is used to create user accounts and grant access privileges to those users on
database objects such as tables and views. Privileges are access rights (e.g. SELECT
privilege for data retrieval) over the database objects given to users. DCL is also used to
remove those access privileges.

DCL has GRANT statement to issue privileges and REVOKE statement to remove
privileges.

DML privileges include SELECT, INSERT, UPDATE and DELETE (‘ALL’ can be given when
issuing all privileges)

GRANT SELECT, INSERT ON employee TO 'Tina';

GRANT ALL ON student TO 'Ben';

DDL privileges include CREATE TABLE, ALTER TABLE and DROP TABLE.

GRANT CREATE TABLE TO 'Brown';


Creating a user account with DCL

CREATE USER 'Ted' IDENTIFIED BY 'moon123';

Creating a role with DCL

CREATE ROLE manager IDENTIFIED BY 'star536';

Removing privileges

REVOKE INSERT, DELETE ON STUDENT FROM 'BROWN';

REVOKE ALL ON EMPLOYEE FROM 'ROY';



GRAPH DATABASES
A graph database organizes data into nodes, relationships and properties. Nodes are
entities or objects that have properties, such as name and age. Nodes have
relationships to other nodes. Graph databases are used in NoSQL systems.

A graph database is a group of nodes related to each other. Each edge of the graph
connects one node to another via a relationship.

A node can hold any number of attributes (e.g. empid, name, salary) called properties.
Nodes can be tagged with labels (e.g. Employee, Company, City).

Relationships provide directed, meaningful connections between two node entities
(e.g. Company LOCATED_IN City). The direction of the relationship controls
navigation/traversal through the graph.
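
The node, label, property and relationship ideas can be sketched in a few lines of Python. This is only an in-memory illustration, not how any particular graph DBMS stores data. The class names, node ids, the WORKS_FOR relationship and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                       # e.g. Employee, Company, City
    properties: dict = field(default_factory=dict)   # e.g. name, salary


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)   # node id -> Node
    edges: list = field(default_factory=list)   # (source id, type, target id)

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(label, properties)

    def relate(self, source, rel_type, target):
        # Relationships are directed: source -> target.
        self.edges.append((source, rel_type, target))

    def outgoing(self, source, rel_type):
        """Follow relationships of one type in their stored direction."""
        return [t for s, r, t in self.edges if s == source and r == rel_type]


g = Graph()
g.add_node("c1", "Company", name="Acme")
g.add_node("k1", "City", name="Kandy")
g.add_node("e1", "Employee", empid=1, name="Tom", salary=5000)
g.relate("c1", "LOCATED_IN", "k1")
g.relate("e1", "WORKS_FOR", "c1")

company_cities = [g.nodes[n].properties["name"] for n in g.outgoing("c1", "LOCATED_IN")]
print(company_cities)                  # ['Kandy']
print(g.outgoing("k1", "LOCATED_IN"))  # []: the relationship is not followed backwards
```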

KEY–VALUE DATABASES/MODEL
The database consists of lots of aggregates (an aggregate is a grouping of related data)
with each aggregate having a key / ID to retrieve data. The aggregate may contain
alphanumerical data (e.g. product name, price), images, audio and video.

With a key-value database, we can only access an aggregate by a lookup based on its
key. That is, it allows us to search for a given key (e.g. product id) and retrieve the
corresponding data value (e.g. product price).
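
A minimal sketch of key-based access, using a plain Python dictionary to stand in for the store. The product ids, names and prices are hypothetical. Note that the only way in is the key: there is no query on the contents of the aggregate.

```python
store = {}  # key -> aggregate


def put(key, aggregate):
    """Store (or overwrite) the aggregate held under a key."""
    store[key] = aggregate


def get(key):
    """Return the aggregate for a key, or None if the key is unknown."""
    return store.get(key)


put("P100", {"name": "Keyboard", "price": 2500})
put("P200", {"name": "Mouse", "price": 900})

price = get("P100")["price"]
missing = get("P999")
print(price, missing)  # 2500 None
```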

OBJECT ORIENTED DATABASE


An OO Database is an integrated collection of objects.

An object is a software component that has a unique ObjectID (OID), a state and a set
of operations that work on the state. The Object ID is system generated. The OID is
used by the system to identify objects uniquely and to implement interobject references
(e.g. a customer object referring to order objects).

The state of an object is made up of its attribute values. The operations specify the
behavior of objects. These are implemented as methods or functions.

Objects are grouped into classes. A class is a blueprint for a set of similar objects. A class
is a type. An object is an instance of a class. Classes are created using Object Definition
Language (ODL).

An OO database has PERSISTENT objects, that is they continue to exist even after
terminating the program.
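
The ideas of OID, state, operations and interobject references can be sketched in Python. This is only an illustration: the class names and sample values are hypothetical, the OIDs come from a simple counter, and persistence is not implemented (the objects live in an in-memory dictionary).

```python
import itertools

_next_oid = itertools.count(1)
object_store = {}  # OID -> object


class DbObject:
    def __init__(self):
        # The OID is generated by the system, never supplied by the user.
        self.oid = next(_next_oid)
        object_store[self.oid] = self


class Order(DbObject):
    def __init__(self, amount):
        super().__init__()
        self.amount = amount  # state


class Customer(DbObject):
    def __init__(self, name):
        super().__init__()
        self.name = name      # state
        self.order_oids = []  # interobject references held as OIDs

    def place_order(self, order):  # operation (method)
        self.order_oids.append(order.oid)

    def total_spent(self):
        return sum(object_store[oid].amount for oid in self.order_oids)


customer = Customer("Tina")
customer.place_order(Order(1500))
customer.place_order(Order(2500))
total = customer.total_spent()
print(total)  # 4000
```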

PAST PAPER QUESTIONS AND ANSWERS

March 2013 – Q5. - (a) Describe the various interfaces, tools and techniques that a
user (technical or otherwise) may employ when interacting with a database.
(10 Marks)

SQL interface at command line terminal - A basic text only interface where the user can
enter and execute SQL code (example: Oracle SQL plus). This can be used for complex
queries and database administration work. It requires a comprehensive training on SQL.
Suitable for advanced users such as DBA, not for end users.

QBE (Query By Example) interfaces (example: QBE of Microsoft Access) that allow end
users to perform queries simply by filling a template. This is easy to use and does not
require any special training but it is not suitable for complex queries.

Report generators and Form generators – An easy to use GUI based interface that
enables end users to create their own database reports (example: Report Wizard of
Microsoft Access) and database access forms (example: Form Wizard of Microsoft
Access). Users do not need coding knowledge. These facilities enable faster application
development but may require more memory and processing power. Also there may be
customizing problems. Suitable for end users.

SQL script - A file of SQL code used to perform a batch of database tasks. (e.g. a
Notepad file with .sql extension containing a series of CREATE TABLE statements).
Created by technical users such as the DBA or developers. It demands strong coding
skills and technical knowledge.

Database Utilities – Provides automated support for common database related tasks.
(e.g. ETL tools for loading data into a data warehouse). Requires high levels of technical
knowledge. Used by the DBA and operators

Middleware – Software that interposes between a client and a server so that a client
application can connect to a remote database (examples: ODBC, JDBC)

Web forms – Provides a simple and easy to use facility to query, insert, view or update
data in a database. It is used at the client layer (presented in a browser) of three tier
client/server architecture. Anyone with Internet access can use it (e.g. customers,
employees).

Client side processing tools (e.g. JavaScript) to carry out client side processing such as
validating user input. Performing validations on the client itself is more efficient as it
saves the round trip to the server.

Server side processing tools (e.g. PHP) to carry out server side processing such as
requesting data from a database.

(b) Explain what the term data validation means. Using your own examples, describe
the various data validation techniques that may be embedded into a forms-based
interface to a database. (10 Marks)

Data validation is concerned with ensuring that only valid, accurate and well-
formatted data is accepted into the database. Data validation ensures data integrity.
A form based interface could carry out the following validations:

Validating user name and password at log on – This is done by comparing the input
user name and password with the stored ones in the database

Range checks for numerical quantities (for example, marks entered for a single subject
should be in the range 0 to 100)

Format masks (for example, date of birth can have DD/MM/YY format)

Consistency checking (e.g. a user who enters “Mr” for title, must enter “Male” for
gender)

Membership/existence check (example: A supplier must be a valid entry in the
database)

Presence checks to ensure important fields are not left blank (example: customer
name and address cannot be null)
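
The checks listed above can be sketched as a single validation routine. This is an illustrative Python sketch only: the field names, the record layout and the set of known suppliers are hypothetical, and a real form would normally run such checks in the form tool or in client side script.

```python
import re
from datetime import datetime


def is_valid_date(text):
    """Format mask DD/MM/YY, and the date must actually exist."""
    if not re.fullmatch(r"\d{2}/\d{2}/\d{2}", text):
        return False
    try:
        datetime.strptime(text, "%d/%m/%y")
    except ValueError:
        return False
    return True


def validate(record, known_suppliers):
    errors = []
    # Presence check: important fields must not be blank.
    for field_name in ("name", "address"):
        if not str(record.get(field_name) or "").strip():
            errors.append(f"{field_name} is required")
    # Range check: marks must be between 0 and 100.
    mark = record.get("mark")
    if not isinstance(mark, int) or not 0 <= mark <= 100:
        errors.append("mark must be between 0 and 100")
    # Format mask.
    if not is_valid_date(str(record.get("dob") or "")):
        errors.append("dob must be in DD/MM/YY format")
    # Consistency check between two fields.
    if record.get("title") == "Mr" and record.get("gender") != "Male":
        errors.append("title and gender are inconsistent")
    # Membership/existence check against stored data.
    if record.get("supplier") not in known_suppliers:
        errors.append("unknown supplier")
    return errors


good = {"name": "Ann", "address": "Kandy", "mark": 75, "dob": "05/11/98",
        "title": "Mr", "gender": "Male", "supplier": "S1"}
bad = {"name": "", "address": "Kandy", "mark": 120, "dob": "1998-11-05",
       "title": "Mr", "gender": "Female", "supplier": "S9"}

print(validate(good, {"S1"}))       # []
print(len(validate(bad, {"S1"})))   # 5
```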

(c) Describe the form components that may be used to implement these data
validation techniques. (5 Marks)

Drop down list boxes (combo boxes) ensure only a predefined item is selected out of
many (example: selecting country of the user)

Radio buttons ensure only a single choice is made out of several (example: selecting
gender out of male or female)

An on-screen calendar, which prevents typing mistakes when entering date fields such as
reservation date

Check boxes enable the user to select several items out of many

Double entry text boxes (example: entering same password twice)

APR 2013 - ER QUESTION

(c) Using examples derived from the ER model explain the difference between :-

i) Strong and Weak Entity Types

A strong entity type can exist independently on its own. Its existence does not depend on
another entity type. A strong entity type has its own primary key. Order is an example of a
strong entity type. Its primary key is ordernumber.

A weak entity type cannot exist on its own. Its existence depends on another entity type
(the owner). It cannot form its own primary key; it only has a partial key. A complete primary
key is formed by combining the partial key with the owner’s primary key, which the weak entity
type holds as a foreign key.

Orderitem is a weak entity type. It has itemnumber as its partial key, which is unique only
among the items of a single order. We can form a complete primary key by combining the
partial key itemnumber with the foreign key ordernumber taken from the owner.

ii) Binary and Ternary Relationship (6 marks)

A binary relationship exists between two entity types. OrderAuthor relationship
between Customer and Order is a Binary relationship.

A ternary relationship is a simultaneous relationship between three entity types.



Following instance diagram shows how instances of three entity types relate.

March 2018 - A1

a) Explain the MAIN objectives of the ANSI-SPARC architecture for a DBMS. Discuss
briefly the challenges of achieving these objectives in practice. (10 marks)

The main objective is to achieve DATA INDEPENDENCE by describing the database at 3
levels, namely the INTERNAL, CONCEPTUAL and EXTERNAL levels. Data independence
enables the schema of one level to be changed without affecting the schema of the next
higher level.

(Explain 3 levels BRIEFLY and draw the diagram)

The separation of the 3 levels achieves 2 types of data independence:

1. Physical data independence (explain briefly)

2. Logical data independence (explain briefly)

Challenges facing 3 schema architecture

The main challenge is that there is no universal standard among database vendors ……..
(explain)

A2. The integrity of database transactions must be maintained in a highly concurrent
multi-user online transaction processing environment. Given the above context,
describe the techniques a DBMS uses to maintain database integrity. For guidance,
your answer must address, with the aid of relevant examples, the following five
topics: (25 marks)

a) What function is performed by a TYPICAL database transaction compared to a
program that reads/writes to a traditional file-based system?

A database transaction performs insert, update, delete and retrieve operations on
database tables. These operations are coded in SQL.

A database transaction must adhere to ACID principles.

Atomicity ……… (briefly explain)

Consistency ……… (briefly explain)

Isolation …….. (briefly explain)

Durability ……… (briefly explain)

b) How database integrity is affected by concurrent transactions.

When concurrent transactions update the same data at the same time, problems such as lost
update, dirty read, etc. could make the database inconsistent. The following example shows
the lost update problem: two concurrent transactions read a record in order to update it, and
the first one to write the record loses its update when the second one completes (the second
transaction overwrites the update made by the first one).

Time  Transaction A           Transaction B           Record1 (Disk)
0                                                     Seats = 10
1     Read Rec1 [Seats=10]                            Seats = 10
2     Seats = Seats + 3                               Seats = 10
3                             Read Rec1 [Seats=10]    Seats = 10
4     Write Rec1, Commit                              Seats = 13
5                             Seats = Seats + 5       Seats = 13
6                             Write Rec1, Commit      Seats = 15

The correct No. of seats should be 18
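
The schedule in the table can be replayed step by step. The Python sketch below is only a simulation (plain variables stand in for each transaction's private copy and for the record on disk), but it shows how the interleaving produces 15 while running the two transactions one after the other produces 18.

```python
# Interleaved schedule from the table above.
disk = {"seats": 10}

a_copy = disk["seats"]   # time 1: A reads 10
a_copy = a_copy + 3      # time 2: A computes 13
b_copy = disk["seats"]   # time 3: B reads 10 (A has not written yet)
disk["seats"] = a_copy   # time 4: A writes 13 and commits
b_copy = b_copy + 5      # time 5: B computes 15 from its stale copy
disk["seats"] = b_copy   # time 6: B writes 15, overwriting A's update
interleaved = disk["seats"]

# Serial schedule: B starts only after A has committed.
disk = {"seats": 10}
disk["seats"] = disk["seats"] + 3   # A
disk["seats"] = disk["seats"] + 5   # B
serial = disk["seats"]

print(interleaved, serial)  # 15 18
```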


c) Why database transactions have to be isolated to preserve database integrity.

Concurrent transactions must not interfere with each other. They should be isolated
from one another. Even though there will be many transactions running concurrently,
any given transaction’s updates should be concealed from other transactions until that
transaction commits. If not, the updates made by one transaction can get overwritten by
another (lost update) or changes made by one transaction may be read by another
before first one commits (dirty read). Isolation is enforced by concurrency control
methods such as locking. When a transaction locks a data unit, it cannot be accessed
by others until the first transaction completes.
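
The effect of locking can be sketched with two Python threads that update the same record. This is an analogy only: threading.Lock stands in for a DBMS exclusive lock on the data unit, and the record is an ordinary dictionary. Because each thread holds the lock for its whole read, modify and write sequence, neither update is lost.

```python
import threading

record = {"seats": 10}
record_lock = threading.Lock()  # stands in for an exclusive lock on Record1


def book(extra_seats):
    # The second transaction waits here until the first releases the lock.
    with record_lock:
        current = record["seats"]                 # read
        record["seats"] = current + extra_seats   # modify and write


threads = [threading.Thread(target=book, args=(n,)) for n in (3, 5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(record["seats"])  # 18
```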

d) What happens when transactions have to be aborted?

An aborted transaction leaves the database in a partially updated, inconsistent state.
Therefore, its effect must be undone using the ROLLBACK operation. The rollback operation
is carried out by applying the before images of changed data items from the transaction log.

e) How the DBMS recovers transactions that are lost following system failure or
crashes.

The transaction log is needed to recover from a system failure. Undo/rollback the
transactions that failed halfway through using BEFORE IMAGES. Redo the transactions
that were committed after the last checkpoint using AFTER IMAGES.

Both the transaction log and the most recent backup copy are needed to recover from a
disk crash. Copy the backup to a new disk and REDO all the transactions that have been
committed between the time of the backup and the time of the crash. This is done by
applying the AFTER images of changed data items.
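
The use of before and after images can be sketched with a toy log. The Python code below is a simplified illustration, not a real recovery manager: the log record layout, the transaction names and the data items are hypothetical, and it assumes that no two transactions changed the same item at the same time (as would be the case under locking).

```python
def recover(database, log):
    """Redo committed transactions, undo uncommitted ones, using the log."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}

    # REDO: apply AFTER images of committed transactions, oldest first.
    for rec in log:
        if rec[0] == "WRITE" and rec[1] in committed:
            _, _, item, _before, after = rec
            database[item] = after

    # UNDO: apply BEFORE images of uncommitted transactions, newest first.
    for rec in reversed(log):
        if rec[0] == "WRITE" and rec[1] not in committed:
            _, _, item, before, _after = rec
            database[item] = before

    return database


# Log records: ("WRITE", transaction, item, before image, after image)
log = [
    ("WRITE", "T1", "seats", 10, 13),
    ("COMMIT", "T1"),
    ("WRITE", "T2", "balance", 500, 300),   # T2 never committed
]

# State found on disk after the crash: T1's committed update had not reached
# the disk, while T2's uncommitted update had.
on_disk = {"seats": 10, "balance": 300}

recovered = recover(on_disk, log)
print(recovered)  # {'seats': 13, 'balance': 500}
```

Rolling back a single aborted transaction is the UNDO half of the same procedure, applied to that transaction only.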
