DATABASE SYSTEMS More lecture notes
Consider a savings bank enterprise that keeps information about all customers and
savings accounts in permanent system files at the bank.
The bank will need a number of applications e.g.
iii. Data isolation - Since data is scattered in various files, and files may be in
different formats, it may be difficult to write new application programs to
retrieve the appropriate data.
iv. Concurrent access anomalies - The interaction of concurrent updates may result
in inconsistent data, e.g. if two customers withdraw funds, say 50/= and 100/=,
from an account at about the same time, the result of the concurrent execution
may leave the account in an incorrect state.
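To make the anomaly concrete, here is a minimal Python sketch that replays one bad interleaving of the two withdrawals step by step. The starting balance of 500/= and the function name `interleaved_withdrawals` are assumptions chosen for illustration; only the 50/= and 100/= amounts come from the example above.

```python
def interleaved_withdrawals(balance: int, w1: int, w2: int) -> int:
    """Replay one unsafe interleaving of two concurrent withdrawals."""
    read_a = balance        # customer A's program reads the balance
    read_b = balance        # customer B's program reads the same balance
    balance = read_a - w1   # A writes its result
    balance = read_b - w2   # B overwrites A's result: A's update is lost
    return balance

start = 500                 # hypothetical opening balance
final = interleaved_withdrawals(start, 50, 100)
correct = start - 50 - 100
print(final, correct)       # 400 350
```

The account ends at 400/= instead of 350/=, so 50/= has effectively been withdrawn without being recorded. A DBMS avoids this by serializing conflicting updates.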
v. Security problems - Not every user of the database system should be able to
access all the data. Since application programs are added to the system in an
ad-hoc manner, it is difficult to enforce security constraints.
vi. Integrity - The data values stored in the database must satisfy certain types of
consistency constraints, e.g. the balance of a bank account may never fall below
a prescribed value such as 5,000/=. In a file system these constraints are enforced by
adding appropriate code to the various application programs. However, when
new constraints are added, all the affected programs have to be changed to
enforce them, which is difficult.
Conclusion.
These difficulties among others have prompted the development of DBMS.
Unlike the file system, with its many separate and unrelated files, the database consists of
logically related data stored in a single data repository. The problems inherent in file
systems make using a database system very desirable; the database therefore
represents a change in the way end-user data are stored, accessed and arranged.
A single-user database system supports one user at a time, such that if user A is using the
database, users B and C must wait until user A completes his or her database work.
If a single-user database runs on a personal computer, it is called a desktop database.
A multi-user database supports multiple users at the same time. When it supports a relatively
small number of users, e.g. 50 users in a department, it is referred to as a workgroup database,
while one which supports many departments is called an enterprise database.
A distributed database system supports a database distributed across several different
sites.
This is a database system that supports immediate-response transactions, e.g. the sale of a
product.
It also eliminates the extra processing needed to trace the required data in a large mass
of data, and it eliminates inconsistencies. Any redundancies that exist in the DBMS
are controlled, and the system ensures that the multiple copies are consistent.
4. Integrity - Centralized control can also ensure that adequate checks are incorporated
into the DBMS to provide data integrity. Data integrity means that the data contained in
the database is both accurate and consistent, e.g. an employee's age must lie within a
prescribed range of years.
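As a sketch of centralized integrity checking, the following uses SQLite (through Python's built-in sqlite3 module) as a stand-in DBMS. The rule is the minimum-balance constraint of 5,000/= from the bank example in these notes; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule is declared once, in the schema, instead of in every application program.
conn.execute("""
    CREATE TABLE account (
        account_no INTEGER PRIMARY KEY,
        balance    INTEGER NOT NULL CHECK (balance >= 5000)
    )
""")
conn.execute("INSERT INTO account VALUES (1, 20000)")  # accepted

try:
    # Any program that tries to break the rule is stopped by the DBMS itself.
    conn.execute("UPDATE account SET balance = 3000 WHERE account_no = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute("SELECT balance FROM account").fetchone()[0]
print(rejected, balance)  # True 20000
```

Because the check lives in the database, a new application program inherits it automatically.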
5. Security - Only authorized people should have access to confidential data. The DBA ensures
that proper access procedures are followed, including proper authentication schemes
within the DBMS and additional checks before permitting access to sensitive
data. Different levels of security can be implemented for various types of data or
operations.
7. Data Independence - This involves both logical and physical independence. Logical data
independence indicates that the conceptual schema can be changed without affecting
the existing external schemas. Physical data independence indicates that the physical
storage structures/devices used for storing the data can be changed without
necessitating a change in the conceptual view or any of the external views.
2. Centralization Problems
[Figure: File System Environment vs. database environment. In the database environment, the Personnel, Sales and Accounting departments share an integrated DBMS whose database holds Employees, Customers, Sales, Inventory and Accounts data.]
The database eliminates most of the file systems' data inconsistencies, anomalies and
structural dependency problems.
The current generation of DBMS software stores not only the data structures in a central
location but also the relationships between the database components.
The DBMS also takes care of defining all the required access paths to the required
components.
1.6 The Database System Environment
The term database system refers to an organisation of components that define and
regulate the collection, storage, management and use of data within a database
environment.
The database system is composed of 5 major parts:
a. Hardware
b. Software
c. People
d. Procedures
e. Data
Hardware
This identifies all the system's physical devices, e.g. the computers, peripherals, storage
devices etc.
Software
These are the collection of programs used by the computers within the database system.
i. O.S - manages all hardware components and makes it possible for all other
software to run on the computers.
ii. The DBMS - manages the database within the database system, e.g. Oracle,
DB2, MS Access etc.
iii. Application programs and utilities - used to access and manipulate data in the
DBMS.
People
These are all the users of the database system:
1. Systems administrator - Oversees the database system's general operations.
2. Database administrator (DBA) - Manages the use of the DBMS and ensures that the
database is functioning properly. The DBA's functions include:
3. Database designers - These are the database architects who design the database
structure.
5. End users - These are the people who use the application programs to run the
organization's daily operations. They fall into the following classes:
i. Sophisticated users - These interact with the system without writing programs.
They form their requests in a database query language. They include all users
with technical skills such as programming, who are able to formulate their own
commands to manipulate the database, e.g. DB administrators, DB designers and
system administrators.
ii. Specialized users - These write specialized database applications that do not fit
into the traditional data processing framework, e.g. CAD systems, knowledge-based
and expert systems.
iii. Application programmers - These interact with the system through the DML
and applications.
iv. Naive users - Unsophisticated users who interact with the system by invoking one
of the permanent application programs that have been written previously.
They rely on the application software and procedures when querying the
database and cannot design their own commands.
Procedures
These are instructions and rules that govern the design and use of the database
system.
They enforce standards by which business is conducted within the organisation and
with customers.
They also ensure that there is an organized way to monitor and audit both the data
that enter the database and the information that is generated through the use of such
data.
DATA
This covers the collection of facts stored in the database. Since data is the raw
material from which information is generated, determining what data is to be
stored in the database and how the data is to be organized is a vital part of the
database designer's job.
2.0 DATABASE ARCHITECTURE AND ENVIRONMENT
Application 1 will contain values for the attributes Employee.Name and
Employee.Address, and this record can be described in pseudo-code as
In a database environment, the data for such applications can be stored together and their
requirements integrated by whoever is responsible for centralized control (the DBA).
The integrated version would appear as a record containing the attributes required by both
applications.
The views supported are derived from the conceptual record by using appropriate
mapping.
The application programs no longer require information about the storage structure,
storage device types or access methods. These are absorbed by the DBMS.
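[Figure: User 1 and User 2 each see an external view derived from the conceptual view, which is defined by the DBA and mapped onto the internal view.]
Conceptual view:
Employee.Name: String
Employee.Soc_Sec_No: Integer
Employee.Address: String
Employee.Annual_Sal: Double
Internal view:
Name: String, length 25, offset 0
Soc_Sec_No: Integer, length 9, offset 25
Address: String, length 5, offset 34
Salary: 9,2 decimal, offset 39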
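The following Python sketch shows the kind of mapping the DBMS performs between the conceptual employee record and the internal record layout above (lengths 25, 9, 5 and 9 at offsets 0, 25, 34 and 39). It assumes the fields are stored as fixed-width text and that the salary is kept as a 9-digit number of cents, i.e. 2 implied decimal places. The function names, the external view `user1_view` and the sample values are hypothetical.

```python
from dataclasses import dataclass

# Internal layout taken from the internal view: (field, offset, length).
LAYOUT = [("name", 0, 25), ("soc_sec_no", 25, 9), ("address", 34, 5), ("salary", 39, 9)]
RECORD_LEN = 48  # last offset (39) + last length (9)

@dataclass
class Employee:
    """Conceptual record: defined by the DBA; the external views are derived from it."""
    name: str
    soc_sec_no: int
    address: str
    annual_sal: float

def to_internal(e: Employee) -> str:
    """Conceptual -> internal mapping: pack the fields into one fixed-width record."""
    fields = {
        "name": e.name.ljust(25)[:25],                 # padded, truncated if too long
        "soc_sec_no": f"{e.soc_sec_no:09d}",
        "address": e.address.ljust(5)[:5],
        "salary": f"{round(e.annual_sal * 100):09d}",  # assumed: cents, 2 implied decimals
    }
    return "".join(fields[name] for name, _, _ in LAYOUT)

def from_internal(record: str) -> Employee:
    """Internal -> conceptual mapping: slice each field out at its offset."""
    raw = {name: record[off:off + length] for name, off, length in LAYOUT}
    return Employee(raw["name"].rstrip(), int(raw["soc_sec_no"]),
                    raw["address"].rstrip(), int(raw["salary"]) / 100)

def user1_view(e: Employee) -> dict:
    """A hypothetical external view exposing only name and address."""
    return {"Name": e.name, "Address": e.address}

emp = Employee("Jane Doe", 123456789, "Nbi", 54000.50)
rec = to_internal(emp)
print(len(rec) == RECORD_LEN, rec[25:34])  # True 123456789
print(user1_view(from_internal(rec)))      # {'Name': 'Jane Doe', 'Address': 'Nbi'}
```

An application that only calls `user1_view` never sees offsets or lengths, which is why changing the internal layout need not affect it.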
The three-level schema architecture is called the ANSI/SPARC model (American National
Standards Institute / Standards Planning and Requirements Committee).
It is divided into 3 levels:
External
Conceptual
Internal
The view at each level is described by a schema, which is an outline or plan that
describes the records and relationships existing in the view. It also describes the way in
which entities at one level of abstraction can be mapped onto the next level.
External Level (External or User view)
This is the highest level of database abstraction, where only those portions of the
database of concern to the user or application program are included.
Any number of user views may exist, some of which may be identical.
Each external view is described by means of a schema called an external schema, which
consists of a definition of the logical records and the relationships in the external view.
It also contains the method of deriving the objects in the external view from the objects in
the conceptual view (entities, attributes and relationships).
Internal View
This is the lowest level of abstraction, closest to the physical storage method used.
It indicates how data will be stored and describes the data structures and access methods
to be used by the database. It is implemented by the internal schema.
Mapping between views
Two mappings are required: one between the external and conceptual views, and another
between the conceptual and internal views.
Data Independence
This is the immunity of users/application programs from changes in storage structure and
access mechanism.
The three levels of abstraction, along with the mappings from internal to conceptual and
from conceptual to external, provide two distinct levels of data independence:
Logical Data Independence
Physical Data Independence
Logical data independence indicates that the conceptual schema can be changed without
affecting the existing external schemas.
The mapping between the external and conceptual levels absorbs the change.
It also insulates application programs from operations such as combining two records into
one or splitting an existing record into two or more records. Logical data independence is
achieved by providing the external level, or user views, of the database.
The application programs or users see the database as described by their respective
external views.
The DBMS provides a mapping from each external view to the conceptual view.
NB: The view at conceptual level of the database is the sum total of the current and
anticipated views of the database.
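The sketch below illustrates logical data independence with SQLite (via Python's sqlite3 module), using an SQL view as the external schema. The conceptual schema is changed by splitting one employee record into two tables, as described above; only the view definition (the external/conceptual mapping) is rewritten, and the application's query text stays the same. All table, view and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual schema, version 1: one employee table.
conn.executescript("""
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, address TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'Charles', 'Nairobi', 50000), (2, 'Mary', 'Mombasa', 62000);
    -- External schema for a hypothetical contacts application.
    CREATE VIEW emp_contact AS SELECT emp_no, name, address FROM employee;
""")

APP_QUERY = "SELECT name, address FROM emp_contact ORDER BY emp_no"  # never changes
before = conn.execute(APP_QUERY).fetchall()

# Conceptual schema, version 2: split the record into two tables.
conn.executescript("""
    CREATE TABLE emp_personal (emp_no INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE emp_pay (emp_no INTEGER PRIMARY KEY REFERENCES emp_personal, salary REAL);
    INSERT INTO emp_personal SELECT emp_no, name, address FROM employee;
    INSERT INTO emp_pay SELECT emp_no, salary FROM employee;
    DROP VIEW emp_contact;
    DROP TABLE employee;
    -- Only the mapping (the view definition) is rewritten.
    CREATE VIEW emp_contact AS SELECT emp_no, name, address FROM emp_personal;
""")

after = conn.execute(APP_QUERY).fetchall()
print(before == after, after)  # True [('Charles', 'Nairobi'), ('Mary', 'Mombasa')]
```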
Physical data independence indicates that the physical storage structures or devices used
for storing the data can be changed without necessitating a change in the conceptual view
or any of the external views. Any change is absorbed by the mapping between the
conceptual and internal views.
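A minimal illustration of physical data independence, again assuming SQLite through Python's sqlite3 module: adding an index changes the internal level (a new access path), yet the query text and its result are unchanged. The loan data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT PRIMARY KEY, branch_name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)",
                 [("L-11", "Moi Avenue", 900), ("L-14", "River Road", 1500),
                  ("L-15", "River Road", 1500), ("L-16", "Moi Avenue", 1300)])

QUERY = "SELECT loan_number FROM loan WHERE branch_name = 'River Road' ORDER BY loan_number"
before = conn.execute(QUERY).fetchall()

# Internal-level change: add a new access path. No view or query is edited.
conn.execute("CREATE INDEX loan_branch_idx ON loan (branch_name)")
after = conn.execute(QUERY).fetchall()

print(before == after, after)  # True [('L-14',), ('L-15',)]
```

The DBMS may now choose to use the index, but that choice is made inside the DBMS and is invisible to the program issuing the query.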
2.3 Components Of The DBMS
A DBMS is software used to build, maintain and control database systems. It allows a
systematic approach to the storage and retrieval of data in a computer.
Most DBMS(s) have several major components, which include the following:
1. Data Definition Language (DDL) - These are commands used for creating and
altering the structure of the database.
The structures comprise field names, field sizes, the type of data for each field and the
file organization technique. DDL commands are used to create new objects, alter
the structure of existing ones or completely remove objects from the system.
2. Data Manipulation Language (DML) - This is the user language interface and is
used for retrieving and modifying the contents of the database. These commands
allow access to and manipulation of data for output. They include commands for
adding, inserting, deleting, sorting, displaying, printing etc. These are the most
frequently used commands once the database has been created.
3. Data Control Language (DCL) - These are commands used to control access to the
database in response to DML commands. The DCL acts as an interface between the DML
and the OS, and provides security and control over the data.
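The difference between DDL and DML commands can be seen in a short SQLite session run from Python's sqlite3 module. The customer table and its rows are hypothetical. SQLite has no user accounts, so DCL statements such as GRANT and REVOKE cannot be shown here; in a server DBMS a grant typically looks like GRANT SELECT ON customer TO clerk, where clerk is a hypothetical user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: create, alter and remove database objects (structure only).
conn.execute("CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("ALTER TABLE customer ADD COLUMN town TEXT")
conn.execute("CREATE TABLE scratch (x INTEGER)")
conn.execute("DROP TABLE scratch")

# DML: add, change, delete and retrieve the data itself.
conn.execute("INSERT INTO customer (cust_no, name, town) VALUES (1, 'Mary', 'Nakuru')")
conn.execute("INSERT INTO customer (cust_no, name, town) VALUES (2, 'Maina', 'Thika')")
conn.execute("UPDATE customer SET town = 'Nairobi' WHERE cust_no = 1")
conn.execute("DELETE FROM customer WHERE cust_no = 2")
rows = conn.execute("SELECT cust_no, name, town FROM customer ORDER BY name").fetchall()
print(rows)  # [(1, 'Mary', 'Nairobi')]
```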
5. Form Generator - A form is a screen display version of a paper form, which can be
used for both input and output.
6. Menu Generator - This is used to generate different types of menus to suit user
requirements.
7. Report Generator - This is a tool that gives non-specialized users the capability of
producing reports from one or more files through easily constructed statements. The
reports may be produced either on screen or on paper. A report generator has the
following features:
Page headings and footings
Page Numbering
Sorting
Combining data from several files
Column headings
Totaling and subtotaling
Grouping of data
Report titling
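To show what several of these features amount to, here is a small Python sketch of the work a report generator does behind its easy statements: titling, page numbering, column headings, sorting, grouping, subtotaling and totaling. It is not modelled on any particular product; the field names (branch, account, balance) and the sample rows are hypothetical, and combining data from several files is not shown.

```python
from itertools import groupby

def branch_report(rows):
    """Sort, group, subtotal and total account rows into a text report."""
    lines = ["ACCOUNT BALANCES BY BRANCH   Page 1",               # title, page number
             f"{'Branch':<12}{'Account':<10}{'Balance':>10}"]     # column headings
    rows = sorted(rows, key=lambda r: (r["branch"], r["account"]))  # sorting
    grand_total = 0
    for branch, group in groupby(rows, key=lambda r: r["branch"]):  # grouping
        subtotal = 0
        for r in group:
            lines.append(f"{branch:<12}{r['account']:<10}{r['balance']:>10}")
            subtotal += r["balance"]
        lines.append(f"{'':<12}{'Subtotal':<10}{subtotal:>10}")   # subtotaling
        grand_total += subtotal
    lines.append(f"{'':<12}{'Total':<10}{grand_total:>10}")      # totaling
    return lines

sample = [
    {"branch": "River Road", "account": "A-102", "balance": 400},
    {"branch": "Moi Avenue", "account": "A-101", "balance": 500},
    {"branch": "River Road", "account": "A-201", "balance": 900},
]
report = branch_report(sample)
print("\n".join(report))
```

`itertools.groupby` only groups adjacent rows, which is why the rows are sorted on the grouping field first.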
8. Business Graphics - Some DBMSs provide a means of generating graphical output,
e.g. bar charts, pie charts, scatter graphs, line plots etc.; others allow users to
export data into graphics software.
A DBMS performs several functions that guarantee the integrity and consistency of the
data in the database. Most of these functions are transparent to end-users and can be
achieved only through the use of a DBMS. They include:
ii. Data Storage Management - The DBMS creates the complex structures required for data
storage, thus relieving us of the difficult task of defining and programming the
physical data characteristics. A modern DBMS provides storage not only for data but
also for related data entry forms or screen definitions, report definitions, data
validation rules, procedural code, structures to handle video and picture formats
etc.
iii. Data Transformation and Presentation - The DBMS transforms entered data to
conform to the data structures that are required to store the data, relieving us of
the chore of distinguishing between the data's logical format and its physical
format. By maintaining data independence, the DBMS translates logical requests
into commands that physically locate and retrieve the requested data. That is, the
DBMS transforms the physically retrieved data to conform to the user's logical
expectations, thereby providing application programs with software independence
and data abstraction.
iv. Security Management - The DBMS creates a security system that enforces
user security and data privacy within the database. Security rules determine
which users can access the database, which data items each user can access and
which data operations (read, add, delete, modify) the user may perform. This
is especially important in multi-user database systems where many users can access
the database simultaneously.
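Security rules of this kind can be pictured as a lookup from (user, data item) to permitted operations. The Python sketch below is only a model of the idea; a real DBMS stores such rules in its catalog and sets them with statements such as GRANT. The user names, the table name and the rules themselves are hypothetical.

```python
# Hypothetical security rules: (user, table) -> permitted operations.
RULES = {
    ("teller", "account"): {"read", "modify"},
    ("auditor", "account"): {"read"},
    ("dba", "account"): {"read", "add", "delete", "modify"},
}

def is_permitted(user: str, table: str, operation: str) -> bool:
    """Deny by default: allow only operations explicitly granted to the user."""
    return operation in RULES.get((user, table), set())

print(is_permitted("auditor", "account", "read"))    # True
print(is_permitted("auditor", "account", "delete"))  # False
print(is_permitted("guest", "account", "read"))      # False
```

Denying by default means a user who is not mentioned in the rules gets no access at all.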
v. Multi-User Access Control - The DBMS creates complex structures that
allow multi-user access to the data. In order to provide data integrity and
consistency, the DBMS uses sophisticated algorithms to ensure that multiple
users can access the database concurrently and still guarantee the integrity of the
database.
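One of the simplest of these algorithms is exclusive locking. The Python sketch below replays the two concurrent withdrawals of 50/= and 100/= from the concurrent-access example earlier in these notes, but each read-modify-write now holds a lock, so the second withdrawal waits instead of reading a stale balance. The opening balance of 500/= and the `Account` class are assumptions for illustration; real DBMSs use more elaborate schemes such as two-phase locking or multiversion concurrency control.

```python
import threading

class Account:
    def __init__(self, balance: int):
        self.balance = balance
        self._lock = threading.Lock()  # one exclusive lock for this data item

    def withdraw(self, amount: int) -> None:
        # The whole read-modify-write runs while holding the lock.
        with self._lock:
            current = self.balance
            self.balance = current - amount

acct = Account(500)
threads = [threading.Thread(target=acct.withdraw, args=(amt,)) for amt in (50, 100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(acct.balance)  # 350
```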
vi. Back-up and Recovery Management - To ensure data safety and integrity,
current DBMSs provide special utilities that allow the DBA to perform
routine and special backup and restore procedures. Recovery management
deals with recovery of the database after a failure such as a bad sector on the
disk, a power failure etc. Such capability is critical to the preservation of the
database's integrity.
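Both ideas can be tried with Python's sqlite3 module: `Connection.backup` copies a whole database, and using the connection as a context manager rolls back an unfinished transaction if an error occurs. In practice the backup target would be a file; two in-memory databases and the simulated failure are used here purely for illustration.

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE account (account_no INTEGER PRIMARY KEY, balance INTEGER)")
live.execute("INSERT INTO account VALUES (1, 20000)")
live.commit()

# Backup: copy the whole database to a second database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Recovery of a failed transaction: work since the last commit is undone.
try:
    with live:  # commits on success, rolls back if an exception escapes
        live.execute("UPDATE account SET balance = balance - 5000 WHERE account_no = 1")
        raise RuntimeError("simulated failure in the middle of a transaction")
except RuntimeError:
    pass

live_balance = live.execute("SELECT balance FROM account").fetchone()[0]
backup_balance = backup.execute("SELECT balance FROM account").fetchone()[0]
print(live_balance, backup_balance)  # 20000 20000
```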
vii. Data Integrity Management - The DBMS promotes and enforces integrity
rules to eliminate data integrity problems, thus minimizing data redundancy
and maximizing data consistency. The relationships stored in the data
dictionary are used to enforce data integrity. Data integrity is especially
important in transaction-oriented database systems.
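As an example of a stored relationship being used to enforce integrity, the following SQLite sketch (via Python's sqlite3 module) declares that every loan must belong to an existing customer and then tries to insert an orphan loan. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`. Table names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE loan (
                    loan_number TEXT PRIMARY KEY,
                    cust_no INTEGER NOT NULL REFERENCES customer (cust_no),
                    amount INTEGER)""")
conn.execute("INSERT INTO customer VALUES (1, 'Judy')")
conn.execute("INSERT INTO loan VALUES ('L-17', 1, 1000)")  # valid: customer 1 exists

try:
    conn.execute("INSERT INTO loan VALUES ('L-23', 99, 2000)")  # there is no customer 99
    orphan_blocked = False
except sqlite3.IntegrityError:
    orphan_blocked = True
print(orphan_blocked)  # True
```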
ix. Database Communication Interfaces - The current generation of DBMSs provides
special communication routines designed to allow the database to accept end-user
requests within a computer network environment. The DBMS may
provide communication functions to access the database through the Internet
using Internet browsers, e.g. Netscape or Explorer, as the front-ends.
A database system is partitioned into modules that deal with each of the responsibilities
of the overall system. The design of the database system must include consideration of
the interface between the database system and the O.S. The functional components of a
database system include:
File Manager
Database manager
Query processor
DML pre-compiler
DDL compiler
File Manager
This manages the allocation of space on disk storage and the data structures used to
represent information stored on the disk. It deals mainly with the physical aspects.
Database Manager
This provides the interface between the low-level data stored in the database and the
application programs and queries submitted to the system.
Query Processor
This translates statements in a query language into low-level instructions that the DB
manager understands. In addition, the query processor attempts to transform a user
request into an equivalent but more efficient form, thus finding a good strategy for
executing the query.
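One classic transformation of this kind is pushing a selection below a join, so that fewer intermediate tuples are built. The Python sketch below evaluates the same request, "customers with a loan at the Moi Avenue branch", with a naive plan and with the rewritten plan and reports how many pairs each one forms. It illustrates the idea only; real query processors choose among many plans using cost estimates. The relations and values are hypothetical.

```python
# Two tiny hypothetical relations, as lists of dicts.
borrower = [{"customer": c, "loan": l} for c, l in
            [("Charles", "L-11"), ("Mary", "L-14"), ("Maina", "L-15"), ("Judy", "L-16")]]
loan = [{"loan": l, "branch": b} for l, b in
        [("L-11", "Moi Avenue"), ("L-14", "River Road"),
         ("L-15", "River Road"), ("L-16", "Moi Avenue")]]

def naive_plan():
    """Form the full Cartesian product, then apply both conditions."""
    product = [(b, l) for b in borrower for l in loan]
    result = [b["customer"] for b, l in product
              if b["loan"] == l["loan"] and l["branch"] == "Moi Avenue"]
    return result, len(product)

def optimized_plan():
    """Apply the branch selection first, so fewer pairs are formed."""
    moi = [l for l in loan if l["branch"] == "Moi Avenue"]
    pairs = [(b, l) for b in borrower for l in moi]
    result = [b["customer"] for b, l in pairs if b["loan"] == l["loan"]]
    return result, len(pairs)

print(naive_plan())      # (['Charles', 'Judy'], 16)
print(optimized_plan())  # (['Charles', 'Judy'], 8)
```

Both plans return the same answer; the rewritten one builds half as many intermediate pairs here, and the saving grows with table size.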
DML Pre-compiler
This converts DML statements embedded in an application program into normal
procedure calls in the host language. The pre-compiler must interact with the query
processor in order to generate the appropriate code.
DDL Compiler - This converts DDL statements into a set of tables containing metadata.
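SQLite makes the result of this step visible: after a CREATE TABLE statement is processed, the definition is recorded as metadata in the catalog table `sqlite_master`, and `PRAGMA table_info` lists the columns. The branch table below is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY, city TEXT, assets INTEGER)")

# The processed definition is stored as metadata in the system catalog.
for obj_type, name, sql in conn.execute(
        "SELECT type, name, sql FROM sqlite_master WHERE type = 'table'"):
    print(obj_type, name)
    print(sql)

columns = [row[1] for row in conn.execute("PRAGMA table_info(branch)")]
print(columns)  # ['branch_name', 'city', 'assets']
```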
[Figure: Major components of a DBMS, showing programmers, users and the DBA; the DML pre-processor, query processor and DDL compiler; system buffers; and the database and system catalog.]
2. Database Design
3. Implementation
5. Operation
Once the database has passed the evaluation stage, it is considered operational. The
database, its management, its users and its application programs constitute a complete I.S.
The beginning of the operational phase starts the process of system evolution.
3.0 CONCEPTUAL DATA MODEL
A database model is a collection of logical constructs used to represent the data structure
and relationships found within the database.
These are models used in describing data at the conceptual and view levels. They are
used to specify the overall logical structure of the database and to provide a higher-level
description of the implementation. It is hard to understand.
These are models that are used to describe data at the lowest level. They are very few in
number and the two widely known ones are:
i. Unifying model
ii. Frame memory model
NB: Like the E-R model, the object-oriented model is based on a collection of objects,
where an object contains values stored in instance variables within the object.
3.2.1 E-R Model Basic Concepts
The model employs the following components:
Entity sets
Relationship sets
Attributes
1. Entity sets
An entity is a thing or object in the real world that is distinguishable from all other
objects. It may be concrete, e.g. a person or a book, or it may be abstract, e.g. a loan, a
holiday or a concept. An entity set is a set of entities of the same type that share the
same properties or attributes, e.g. the set of all persons who are customers of a bank.
2. Relationship sets
An association between two or more entities is called a relationship.
3. Attributes
They are descriptive properties or characteristics possessed by each member of an entity
set.
2. Single-valued and Multi-valued Attributes - The social security number or ID number
can only have a single value at any instant, and therefore it is said to be single-valued.
An attribute like dependent name can take several values, ranging from 0 to n; thus it is
said to be multi-valued.
3. Null Attributes - A null value is used when an entity does not have a value for an
attribute, e.g. dependent name.
4. Calculated (derived) Attributes - The value of this type of attribute can be derived
from the values of other related attributes or entities, e.g.
i. The employment length value can be derived from the values of the start date and
the current date.
ii. Loans held can be a count of the number of loans a customer has.
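Because derived attributes are computed rather than stored, they can be expressed as functions of the stored values. The Python sketch below computes both examples above; the function names, dates and borrower pairs are hypothetical, and counting whole years is an assumed definition of employment length.

```python
from datetime import date

def employment_length_years(start: date, today: date) -> int:
    """Derived attribute: whole years between the start date and today."""
    years = today.year - start.year
    # Subtract one if this year's anniversary has not been reached yet.
    if (today.month, today.day) < (start.month, start.day):
        years -= 1
    return years

def loans_held(customer: str, borrower_pairs) -> int:
    """Derived attribute: count of (customer, loan) pairs for the customer."""
    return sum(1 for cust, _loan in borrower_pairs if cust == customer)

print(employment_length_years(date(2015, 9, 1), date(2024, 3, 15)))  # 8
print(loans_held("Mary", [("Mary", "L-14"), ("Judy", "L-16"), ("Mary", "L-93")]))  # 2
```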
3.2.3 Relationship Sets
A relationship is an association amongst several entities, while a relationship set is a set of
relationships of the same type. It is a mathematical relation on $n \geq 2$ (possibly
non-distinct) entity sets. For example, consider two entity sets, loan and branch. A
relationship set loan_branch can be defined to denote the association between a bank loan
and the branch at which that loan is obtained.
Example
Consider 2 entity sets Customer and loan.
A relationship set borrower can be defined to denote the association between
customers and the bank loans that the customers have.
Types Of Relationships
i. One-to-one relationship (1:1) - An entity in A is associated with at most one entity in
B, and an entity in B is associated with at most one entity in A.
[Figure: one-to-one mapping between entities a1-a4 of A and b1-b4 of B.]
ii. One-to-many relationship (1:M) - An entity in A is associated with any number of
entities in B, while an entity in B can be associated with at most one entity in A.
[Figure: one-to-many mapping between entities a1-a5 of A and b1-b5 of B.]
iii. Many to one relationship (M:1) - An entity in A is associated with at most one entity
in B and an entity in B can be associated with a number of entities in A.
[Figure: many-to-one mapping between entities a1-a5 of A and b1-b5 of B.]
iv. Many-to-many relationship (M:N) - An entity in A can be associated with any number of
entities in B, and an entity in B can be associated with any number of entities in A.
[Figure: many-to-many mapping between entities a1-a4 of A and b1-b4 of B.]
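The four mapping cardinalities can be told apart by counting, for each entity, how many partners it has on the other side. The Python sketch below classifies a relationship set given as (a, b) pairs. Note that it only examines the pairs it is given (one snapshot), whereas a cardinality is a rule about every instance the relationship may ever have; the function name and sample pairs are hypothetical.

```python
def mapping_cardinality(pairs):
    """Classify a binary relationship set given as (a, b) pairs."""
    b_per_a, a_per_b = {}, {}
    for a, b in pairs:
        b_per_a.setdefault(a, set()).add(b)
        a_per_b.setdefault(b, set()).add(a)
    a_many = any(len(s) > 1 for s in b_per_a.values())  # some a has several b's
    b_many = any(len(s) > 1 for s in a_per_b.values())  # some b has several a's
    if not a_many and not b_many:
        return "1:1"
    if a_many and not b_many:
        return "1:M"
    if b_many and not a_many:
        return "M:1"
    return "M:N"

print(mapping_cardinality([("a1", "b1"), ("a2", "b2")]))                # 1:1
print(mapping_cardinality([("a1", "b1"), ("a1", "b2"), ("a2", "b3")]))  # 1:M
print(mapping_cardinality([("a1", "b1"), ("a2", "b1"), ("a3", "b2")]))  # M:1
print(mapping_cardinality([("a1", "b1"), ("a1", "b2"), ("a2", "b1")]))  # M:N
```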
Existence Dependencies
Exercise.
Differentiate between super keys, primary keys and candidate keys.
The three main components of an ERD are:
The entity is a person, object, place or event for which data is collected. For
example, if you consider the information system for a business, entities would
include not only customers, but the customer's address, and orders as well. The
entity is represented by a rectangle and labelled with a singular noun.
The relationship is the interaction between the entities. In the example above, the
customer places an order, so the word "places" defines the relationship between
that instance of a customer and the order or orders that they place. A relationship
may be represented by a diamond shape, or more simply, by the line connecting
the entities. In either case, verbs are used to label the relationships.
The cardinality defines the relationship between the entities in terms of numbers.
An entity may be optional: for example, a sales rep could have no customers or
could have one or many customers; or mandatory: for example, there must be at
least one product listed in an order. There are several different types of cardinality
notation; crow's foot notation, used here, is a common one. In crow's foot
notation, a single bar indicates one, a double bar indicates one and only one (for
example, a single instance of a product can only be stored in one warehouse), a
circle indicates zero, and a crow's foot indicates many. The three main cardinal
relationships are: one-to-one, expressed as 1:1; one-to-many, expressed as 1:M;
and many-to-many, expressed as M:N.
Exercise.
Draw an E-R diagram that shows the hospital environment: theatres, patients (in-patients
and out-patients), doctors, nurses, wards and ward beds.
Specialisation
An entity set may include sub-groupings of entities that are distinct in some way from
other entities in the set. This is called specialization of the entity set, e.g. the entity bank
account could have different types:
Credit account
Savings account - interest rate
Checking account - overdraft amount
Checking accounts may in turn be specialized: a standard checking account is distinguished
by the number of checks, and a gold checking account by its minimum balance and
interest payment.
Senior checking account - age limit
A specialised entity set may be specialised by one or more distinguishing features.
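Specialisation has a close analogue in class inheritance: each lower-level entity set inherits the attributes of the higher-level one and adds its own distinguishing attributes. The Python sketch below models part of the account hierarchy above as dataclasses. It is an analogy for the concept, not a description of how a DBMS stores specialised entities; the class names and sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Account:                      # higher-level (general) entity set
    account_no: str
    balance: float

@dataclass
class SavingsAccount(Account):      # specialisation with its own attribute
    interest_rate: float

@dataclass
class CheckingAccount(Account):
    overdraft_amount: float

@dataclass
class SeniorCheckingAccount(CheckingAccount):  # a further level of specialisation
    age_limit: int

s = SeniorCheckingAccount("A-305", 12000.0, overdraft_amount=5000.0, age_limit=60)
print(isinstance(s, CheckingAccount), isinstance(s, Account))  # True True
print(s.account_no, s.overdraft_amount, s.age_limit)           # A-305 5000.0 60
```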
Aggregation
This is an abstraction through which relationships are treated as higher-level entities, e.g.
the relationship set borrower and the entity sets customer and loan can be treated as a
higher-level entity set called borrower as a whole.
3.4 Entity modeling (Diagrammatic representation) relationships
[Figure: example relationships Student - Payment and Lecturer - Student.]
NB: Whenever a relationship is many-to-many, we must decompose it into one-to-one or
one-to-many relationships. The decomposition process creates a new
entity.
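The decomposition can be sketched in SQLite (via Python's sqlite3 module) using the Student/Course data that appears later in these notes. The many-to-many relationship between student and course becomes a new entity, here called enrolment, which has a one-to-many relationship with each original entity. The table name enrolment and the sample enrolments are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (stud_no TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_code TEXT PRIMARY KEY, title TEXT);
    -- New entity created by the decomposition:
    -- student 1:M enrolment and course 1:M enrolment.
    CREATE TABLE enrolment (
        stud_no     TEXT REFERENCES student,
        course_code TEXT REFERENCES course,
        PRIMARY KEY (stud_no, course_code)
    );
    INSERT INTO student VALUES ('001', 'Charles'), ('002', 'Mary');
    INSERT INTO course  VALUES ('BIT', 'Bachelor of IT'), ('CIT', 'Cert in IT');
    INSERT INTO enrolment VALUES ('001', 'BIT'), ('002', 'BIT'), ('002', 'CIT');
""")

rows = conn.execute("""
    SELECT s.name, c.title
    FROM student AS s
    JOIN enrolment AS e ON e.stud_no = s.stud_no
    JOIN course AS c ON c.course_code = e.course_code
    ORDER BY s.name, c.title
""").fetchall()
print(rows)
# [('Charles', 'Bachelor of IT'), ('Mary', 'Bachelor of IT'), ('Mary', 'Cert in IT')]
```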
Exercise
A company consists of a number of departments, each having a number of employees.
Each department has a manager who must be on a monthly payroll; other employees are
either on a monthly or a weekly payroll and are members of the sports club if they so wish.
Construct an entity-relationship diagram depicting the scenario.
[Figure: optional participation (Family - Child) and mandatory participation (Course - Student).]
Representing Attributes
Although E-R diagrams describe many of the features of the logical model, they do not
show the attributes associated with each entity; this additional information is represented
conveniently in the form of a table.
Exercise
Consider the entity relationship Student_Course that defines a course undertaken by
many students.
Generate a sample tabular representation of the above assuming key attributes are course-
code and stud-no respectively.
A hospital wishes to maintain a database to assist the administration of its wards and
operating theatres, and to maintain information relating to its patients, surgeons and
nurses.
Only one surgeon may perform an operation; any other surgeons present are
considered to be assisting at the operation. Surgeons come under the direction of senior
surgeons, called consultants, who may also perform or assist at operations. Information
recorded about a surgeon includes name, address and phone number.
An operation can be performed in only one theatre but a given theatre may be the location
of many operations.
A nurse may or may not be assigned to a theatre and he/she cannot be assigned to more
than one theatre. A theatre may have many nurses assigned to it.
Required.
Design and develop a database system for the above application. This should include:
Integrity and security features.
3.5 DATA NORMALIZATION
Normalisation is the process of applying a number of rules to the tables that have been
identified, in order to simplify them. The aim is to highlight dependencies between the
various data items so that we can reduce these dependencies.
1NF
A table or relation is said to be in first normal form if and only if it contains no repeating
groups, i.e. it has no repeated values for particular attributes in a single record. If there
are repeating groups of attributes, they should be isolated to form a new entity.
2NF
A table is said to be in 2NF if and only if it is in 1NF and every non-key attribute is fully
dependent on the whole key. Attributes not fully dependent on the key should be isolated
to form a new entity.
3NF
A table is said to be in 3NF if and only if it is in 2NF and no non-key attribute is
dependent on any other non-key attribute. All non-key attributes that are dependent on
other non-key attributes should be isolated to form a new entity.
Example: An invoice
[Figure: sample invoice form.]
Un-normalised data.
Invoice (Invoice_no, Date, Customer, Cust_address, Deliv_to, Product_code, Quantity,
Unit_price, Amount, Invoice_amount)
2NF (identify and separate the non-key attributes that are not fully dependent on the key)
[Figure: Corresponding ERD, relating the entities Customer, Invoice, Invoice Product and Product.]
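The Python sketch below shows one possible 3NF decomposition of the invoice data, built by projecting flattened (1NF) rows onto one relation per determinant. It assumes these functional dependencies, which the notes do not spell out: Invoice_no determines Date, Customer and Deliv_to; Customer determines Cust_address; Product_code determines Unit_price; and (Invoice_no, Product_code) determines Quantity. Amount and Invoice_amount are treated as derivable and therefore not stored. The sample rows are hypothetical, and this is an illustration rather than the notes' official solution.

```python
# 1NF rows: one row per invoice line (the repeating product group flattened).
rows = [
    {"invoice_no": 101, "date": "2024-01-05", "customer": "Mary", "cust_address": "Box 12 Thika",
     "deliv_to": "Thika", "product_code": "P1", "quantity": 2, "unit_price": 150},
    {"invoice_no": 101, "date": "2024-01-05", "customer": "Mary", "cust_address": "Box 12 Thika",
     "deliv_to": "Thika", "product_code": "P2", "quantity": 1, "unit_price": 400},
    {"invoice_no": 102, "date": "2024-01-06", "customer": "Judy", "cust_address": "Box 7 Nyeri",
     "deliv_to": "Nyeri", "product_code": "P1", "quantity": 5, "unit_price": 150},
]

def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    return sorted({tuple(r[a] for a in attrs) for r in rows})

# 3NF relations, one per assumed determinant.
invoice      = project(rows, ["invoice_no", "date", "customer", "deliv_to"])
customer     = project(rows, ["customer", "cust_address"])
product      = project(rows, ["product_code", "unit_price"])
invoice_line = project(rows, ["invoice_no", "product_code", "quantity"])

# The omitted derived attributes can be recomputed from the 3NF relations.
price = dict(product)
invoice_amount = {}
for inv, code, qty in invoice_line:
    invoice_amount[inv] = invoice_amount.get(inv, 0) + qty * price[code]
print(invoice_amount)  # {101: 700, 102: 750}
```

Each customer address and each unit price is now stored once, so changing it cannot leave inconsistent copies behind.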
3. The 3NF produces well-designed databases, which provide a high degree of
independence.
3.5.2 Disadvantages
Exercise
A bank stores customer account details in a table that has the following
structure. Normalise this data to 3NF. Customer (branch_no, account_no, address,
postcode, tel)
A hospital drug dispensing record requires that, for each patient, the pharmacy must
record the following information:
Total …………………….
Paid ……………………..
Balance …………………
(a) Explain what you understand by data normalization, stating each of the
three normal forms.
(b) Perform data normalization on the table up to 3NF, showing clearly the
results of each stage.
4.0 RELATIONAL DATABASE SYSTEM
Motivation
1. To shield programmers and users from the structural complexities of the database.
2. For conceptual simplicity
Properties of Relations
1. There are no duplicate tuples - The body of a relation is a mathematical set, which by
definition does not include duplicate elements.
2. Tuples are unordered - Sets are unordered.
3. Attributes are unordered - The heading of a relation is a set, which is unordered.
4. All attribute values are atomic, meaning that relations do not contain repeating
groups (they are normalized).
Primary Keys
A primary key is a special case of a more general construct, the candidate key. A candidate
key is a unique identifier, and each relation has at least one candidate key. For a given
relation, one of the candidate keys is chosen to be the primary key and the rest are called
alternate keys.
Let $R$ be a relation with attributes $A_1, A_2, \ldots, A_n$. A set of attributes
$K = \{A_i, A_j, \ldots, A_k\}$ of $R$ is said to be a candidate key of $R$ if it satisfies the
following two time-independent properties:
i. Uniqueness - At any given time, no two distinct tuples of $R$ have the same values
for $A_i, A_j, \ldots, A_k$.
ii. Minimality - None of $A_i, A_j, \ldots, A_k$ can be discarded from $K$ without
destroying the uniqueness property.
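The two properties can be tested mechanically against a relation instance, as in the Python sketch below, which uses the STUDENT table from the select-clause example in these notes. One caveat: because the properties are time-independent, a single instance can prove that a set is *not* a key but cannot prove that it is one. In this instance Name also happens to be unique, although names clearly cannot serve as a key in general. The function names are hypothetical.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """Uniqueness: no two distinct tuples agree on all of attrs."""
    return len({tuple(r[a] for a in attrs) for r in rows}) == len(rows)

def is_candidate_key(rows, attrs):
    """Uniqueness plus minimality, checked against one relation instance."""
    if not is_unique(rows, attrs):
        return False
    # Minimality: no proper subset of attrs is itself unique.
    return not any(is_unique(rows, subset)
                   for size in range(1, len(attrs))
                   for subset in combinations(attrs, size))

student = [
    {"code": "IMIS", "stud_id": "001", "name": "Charles"},
    {"code": "BIT", "stud_id": "002", "name": "Mary"},
    {"code": "BIT", "stud_id": "003", "name": "Maina"},
    {"code": "CIT", "stud_id": "004", "name": "Judy"},
]
print(is_candidate_key(student, ["stud_id"]))          # True
print(is_candidate_key(student, ["stud_id", "code"]))  # False: not minimal
print(is_candidate_key(student, ["code"]))             # False: BIT repeats
```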
4.2 Relational Database Language
Components of SQL
i. Data Definition Language (DDL) - The DDL provides commands for defining relation
schemas, deleting relations, creating indices and modifying relation schemas.
ii. Interactive Data Manipulation Language (DML) - The DML includes a query language
based on both the relational algebra and the tuple relational calculus. It includes
commands to insert tuples into, delete tuples from and modify tuples in the database.
iii. Embedded DML - This is designed for use within general-purpose programming
languages such as PL/I, COBOL, Pascal, Fortran and C.
iv. View Definition - The SQL DDL includes commands for defining views and for
specifying access rights to relations and views.
v. Integrity - The SQL DDL includes commands for specifying integrity constraints that
the data stored in the database must satisfy. Updates that violate integrity constraints
are disallowed.
vi. Transaction Control - SQL includes commands for specifying the beginning and
end of transactions. Several implementations also allow explicit locking of data
for concurrency control.
SELECT
This corresponds to the projection operation of the relational algebra. It is used to list the
attributes desired in the result of a query.
FROM
This corresponds to the Cartesian product operation of the relational algebra. It lists the
relations to be scanned in the evaluation of the expression.
WHERE
This corresponds to the selection predicate of the relational algebra. It consists of a
predicate involving attributes of the relations that appear in the FROM clause.
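The correspondence can be checked directly. The Python sketch below evaluates a query step by step as Cartesian product (FROM), selection (WHERE) and projection (SELECT) on plain tuples, then runs the equivalent SQL in SQLite and compares the results. Because SQL results are multisets, DISTINCT is needed to make SELECT behave exactly like the set-based projection. The borrower and loan data are hypothetical.

```python
import sqlite3

borrower = [("Charles", "L-11"), ("Judy", "L-16"), ("Mary", "L-14")]
loan = [("L-11", "Moi Avenue", 900), ("L-14", "River Road", 1500), ("L-16", "Moi Avenue", 1300)]

# FROM borrower, loan -> Cartesian product
product = [b + l for b in borrower for l in loan]
# WHERE ...           -> selection: tuple is (customer, b.loan_no, l.loan_no, branch, amount)
selected = [t for t in product if t[1] == t[2] and t[3] == "Moi Avenue"]
# SELECT customer_name -> projection onto the first attribute (as a set)
algebra_result = sorted({(t[0],) for t in selected})

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE borrower (customer_name TEXT, loan_number TEXT)")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO borrower VALUES (?, ?)", borrower)
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)", loan)
sql_result = conn.execute("""
    SELECT DISTINCT customer_name
    FROM borrower, loan
    WHERE borrower.loan_number = loan.loan_number AND branch_name = 'Moi Avenue'
    ORDER BY customer_name
""").fetchall()

print(algebra_result == sql_result, sql_result)  # True [('Charles',), ('Judy',)]
```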
A typical SQL query has the form:
SELECT A1, A2, ..., An
FROM R1, R2, ..., Rm
WHERE P
Each Ai represents an attribute, each Ri a relation, and P is a predicate.
Select clause
Examples (i) SELECT branch_name
FROM loan
STUDENT
Code | Stud.id | Name
IMIS | 001 | Charles
BIT | 002 | Mary
BIT | 003 | Maina
CIT | 004 | Judy
COURSE
Code | Title
IMIS | Info. Systems
BIT | Bachelor of IT
CIT | Cert in IT
DIT | Dip in IT
The select clause can also contain arithmetic expressions involving the operators +, -, *
and /, operating on constants or attributes of tuples, e.g.
SELECT branch_name, loan_number, amount * 100
FROM loan
Where Clause
This specifies a condition that has to be met. SQL uses the logical connectives AND, OR
and NOT in the where clause, together with the comparison operators <, <=, >, >=, =
and <>. It also includes a BETWEEN comparison operator, e.g.
(i) SELECT loan_number
FROM loan
(ii) SELECT loan_number
FROM loan
WHERE branch_name = 'River Road' AND amount BETWEEN 10000 AND 15000
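Both clauses can be tried in SQLite from Python's sqlite3 module. The loan rows below are hypothetical; note that BETWEEN is inclusive at both ends, so a loan of exactly 10000 qualifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)", [
    ("L-14", "River Road", 12000), ("L-15", "River Road", 20000),
    ("L-16", "Moi Avenue", 13000), ("L-17", "River Road", 10000),
])

# Arithmetic expression in the SELECT clause.
scaled = conn.execute(
    "SELECT branch_name, loan_number, amount * 100 FROM loan ORDER BY loan_number").fetchall()

# BETWEEN combined with AND in the WHERE clause.
in_range = conn.execute("""
    SELECT loan_number FROM loan
    WHERE branch_name = 'River Road' AND amount BETWEEN 10000 AND 15000
    ORDER BY loan_number
""").fetchall()

print(scaled[0])  # ('River Road', 'L-14', 1200000)
print(in_range)   # [('L-14',), ('L-17',)]
```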
From Clause
This specifies the source (relations), which is a Cartesian product. The SQL uses the
notion relation-name. Attribute-name to avoid ambiguity in case where an attribute
appears in the schemer of more that one relation e.g.
Example
Select Customer_name, borrower.loan_number
From borrower, loan
Where borrower.loan_number = loan.loan_number
AND branch_name = 'Moi Avenue'
This returns the name and loan number of every customer who has a loan at the Moi
Avenue branch.
SQL provides a mechanism for renaming both relations and attributes using the AS
clause, which takes the form
old_name AS new_name
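The following sketch, assuming Python's sqlite3 module and invented borrower and loan rows, shows both points: an unqualified loan_number is rejected as ambiguous because it appears in both relations, and AS renames the relations (to the hypothetical aliases b and l) and an output attribute (to the hypothetical name loan_id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE borrower (customer_name TEXT, loan_number TEXT)")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO borrower VALUES (?, ?)",
                 [("Charles", "L-1"), ("Mary", "L-2")])
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)",
                 [("L-1", "Moi Avenue", 5000), ("L-2", "River Road", 7000)])

# Without a relation name, loan_number could refer to either relation.
try:
    conn.execute("SELECT loan_number FROM borrower, loan")
    ambiguous = False
except sqlite3.OperationalError:     # SQLite reports "ambiguous column name"
    ambiguous = True

# AS renames the relations (b, l) and the output attribute (loan_id).
cur = conn.execute(
    "SELECT b.customer_name, b.loan_number AS loan_id "
    "FROM borrower AS b, loan AS l "
    "WHERE b.loan_number = l.loan_number AND l.branch_name = 'Moi Avenue'")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(ambiguous, columns[1], rows)   # True loan_id [('Charles', 'L-1')]
```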
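The ORDER BY clause sorts the tuples in the result of a query. By default it lists items in
ascending order; to specify the sort order, use DESC for descending order or ASC for
ascending order, e.g.
Select *
From loan
Order by amount desc, loan_number desc
Loans with equal amounts are then ordered by loan number, also in descending order.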
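A minimal sketch, assuming Python's sqlite3 module and three invented loans, two of which share the same amount so that the second sort key decides their order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)", [
    ("L-1", "Moi Avenue", 5000),
    ("L-2", "River Road", 9000),
    ("L-3", "River Road", 5000),
])

# Sort by amount (largest first); break ties by loan_number, also descending.
rows = conn.execute(
    "SELECT * FROM loan ORDER BY amount DESC, loan_number DESC").fetchall()
for row in rows:
    print(row)
# ('L-2', 'River Road', 9000)
# ('L-3', 'River Road', 5000)
# ('L-1', 'Moi Avenue', 5000)
```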
4.2.2 Aggregate Functions
These are functions that take a collection (set or multiset) of values as input and return a
single value. SQL offers five built-in aggregate functions:
Average: Avg
Minimum: Min
Maximum: Max
Total: Sum
Count: Count
The input to SUM and AVG must be a collection of numbers, but the other operators can
also operate on collections of non-numeric data types, e.g. strings.
Example
(i) SELECT Branch_name, AVG(balance)
FROM Account
GROUP BY Branch_name
This finds the average account balance at each branch.
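To show what GROUP BY does, here is a sketch assuming Python's sqlite3 module and invented account rows: SQL's answer is compared with a hand computation that partitions the tuples by branch_name and averages each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", [
    ("A-1", "Moi Avenue", 4000), ("A-2", "Moi Avenue", 6000),
    ("A-3", "River Road", 9000),
])

averages = dict(conn.execute(
    "SELECT branch_name, AVG(balance) FROM account GROUP BY branch_name"))

# The same computation by hand: one group per branch, then one average per group.
groups = {}
for _, branch, balance in conn.execute("SELECT * FROM account"):
    groups.setdefault(branch, []).append(balance)
by_hand = {b: sum(v) / len(v) for b, v in groups.items()}

print(averages)   # {'Moi Avenue': 5000.0, 'River Road': 9000.0}
```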
Null Values
Null values indicate the absence of information about the value of an attribute. The
predicate IS NULL tests for a null value, e.g. to find all loan numbers whose amount is null:
SELECT loan_number
FROM loan
WHERE amount IS NULL
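A sketch, assuming Python's sqlite3 module (Python's None is stored as NULL) and invented loans, of why the special predicate is needed: a comparison such as amount = NULL evaluates to unknown rather than true, so it selects nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?)",
                 [("L-1", 5000), ("L-2", None), ("L-3", None)])

is_null = [r[0] for r in conn.execute(
    "SELECT loan_number FROM loan WHERE amount IS NULL ORDER BY loan_number")]
# Comparing with NULL gives unknown, never true, so no tuple qualifies.
equals_null = [r[0] for r in conn.execute(
    "SELECT loan_number FROM loan WHERE amount = NULL")]
not_null = [r[0] for r in conn.execute(
    "SELECT loan_number FROM loan WHERE amount IS NOT NULL")]

print(is_null, equals_null, not_null)   # ['L-2', 'L-3'] [] ['L-1']
```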
The query to find the names of all branches that have assets greater than those of at least
one branch located in Brooklyn would be:
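Pattern matching uses the LIKE operator with two special characters: % matches any
substring and _ matches any single character. In standard SQL, patterns are case sensitive.
Examples
(i) 'Mary%' matches any string beginning with "Mary".
(ii) '%ry' matches any string ending with "ry", e.g. very, Mary, ary. ('%ry%' matches any
string containing "ry" as a substring.)
(iii) '___' matches any string of exactly three characters.
(iv) '___%' matches any string of at least three characters.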
The query to find the names of all customers whose street address includes the substring
"main" is:
SELECT Customer_name
FROM Customer
WHERE Customer_street LIKE '%main%'
For patterns to include the special pattern characters (% and _), SQL allows the
specification of an escape character. The escape character is placed immediately before a
special pattern character to indicate that the special pattern character is to be treated like
a normal character. The keyword ESCAPE is used to define the escape character.
Examples.
LIKE 'ab\%cd%' ESCAPE '\' matches all strings beginning with "ab%cd"
LIKE 'ab\\cd%' ESCAPE '\' matches all strings beginning with "ab\cd"
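The patterns above can be tried with a small sketch, assuming Python's sqlite3 module and an invented list of strings chosen to exercise each case. One caveat: SQLite's LIKE ignores case for ASCII letters by default, unlike standard SQL, so the sample strings avoid case-only differences. The helper like() is hypothetical.

```python
import sqlite3

# Invented sample strings, one or more per pattern in the notes.
WORDS = ["Mary", "Maryland", "very", "ary", "cat", "ab",
         "ab%cdXY", "abZcdXY", "ab\\cd", "12 main street"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (s TEXT)")
conn.executemany("INSERT INTO words VALUES (?)", [(w,) for w in WORDS])

def like(pattern, escape=None, negate=False):
    """Return, sorted, the stored strings that match (or with negate, do not match) pattern."""
    op = "NOT LIKE" if negate else "LIKE"
    sql = f"SELECT s FROM words WHERE s {op} ?"
    params = [pattern]
    if escape is not None:
        sql += " ESCAPE ?"
        params.append(escape)
    return sorted(row[0] for row in conn.execute(sql, params))

print(like("Mary%"))                  # ['Mary', 'Maryland']
print(like("%ry"))                    # ['Mary', 'ary', 'very']
print(like("___"))                    # ['ary', 'cat']
print(like("%main%"))                 # ['12 main street']
print(like(r"ab\%cd%", escape="\\"))  # ['ab%cdXY']: the escaped % is literal
print(like("ab%cd%"))                 # unescaped, % is a wildcard: 3 matches
print(like(r"ab\\cd%", escape="\\"))  # ['ab\\cd']
```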
Mismatches.
SQL allows searching for mismatches using the NOT LIKE comparison operator.
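4.2.5 Set Operations
The SQL operations UNION, INTERSECT and EXCEPT operate on relations and correspond to
the relational algebra operations ∪, ∩ and −.
(i) Union
UNION eliminates duplicates automatically; to retain duplicates, use UNION ALL.
(ii) Intersect
To find customers who have both a loan and an account at the bank, intersect the
customer names in depositor with those in borrower.
(iii) Except
To find customers who have an account but no loan at the bank, subtract the customer
names in borrower from those in depositor using EXCEPT.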
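A minimal sketch of all four forms, assuming Python's sqlite3 module and invented depositor and borrower rows (Judy has two accounts, so UNION ALL shows her twice). SQLite does not accept parentheses around the individual SELECTs of a compound query, so they are written without them here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE depositor (customer_name TEXT, account_number TEXT)")
conn.execute("CREATE TABLE borrower (customer_name TEXT, loan_number TEXT)")
conn.executemany("INSERT INTO depositor VALUES (?, ?)",
                 [("Mary", "A-1"), ("Judy", "A-2"), ("Judy", "A-3")])
conn.executemany("INSERT INTO borrower VALUES (?, ?)",
                 [("Mary", "L-1"), ("Maina", "L-2")])

def names(sql):
    """Run a query returning one column of names and sort the result."""
    return sorted(r[0] for r in conn.execute(sql))

D = "SELECT customer_name FROM depositor"
B = "SELECT customer_name FROM borrower"
union = names(f"{D} UNION {B}")             # duplicates removed
union_all = names(f"{D} UNION ALL {B}")     # duplicates kept
both = names(f"{D} INTERSECT {B}")          # account and loan
account_only = names(f"{D} EXCEPT {B}")     # account but no loan

print(union)          # ['Judy', 'Maina', 'Mary']
print(union_all)      # ['Judy', 'Judy', 'Maina', 'Mary', 'Mary']
print(both)           # ['Mary']
print(account_only)   # ['Judy']
```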
Returning to null values, the predicate IS NULL selects the loans whose amount is unknown:
SELECT Loan_number
FROM Loan
WHERE Amount IS NULL
To test for the absence of a null value, we use the predicate IS NOT NULL.
4.4.6 VIEWS
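A view is defined with the statement CREATE VIEW v AS <query expression>, where v is the
view name.
Example
CREATE VIEW Customer AS
SELECT Branch_name, Customer_name
FROM Depositor, Account
WHERE Depositor.Account_number = Account.Account_number
This view lists each customer together with the branch at which the customer has an account.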
NB: A CREATE VIEW statement stores the view definition in the database, where it remains
until a DROP VIEW view_name command is executed.
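The sketch below, assuming Python's sqlite3 module and invented depositor and account rows, creates the view from the example, shows that it stores only a definition (a tuple added to a base relation appears in the view automatically), and then drops it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE depositor (customer_name TEXT, account_number TEXT)")
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance REAL)")
conn.executemany("INSERT INTO depositor VALUES (?, ?)", [("Mary", "A-1"), ("Judy", "A-2")])
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [("A-1", "Moi Avenue", 500), ("A-2", "River Road", 700)])

conn.execute("""
    CREATE VIEW customer AS
    SELECT branch_name, customer_name
    FROM depositor, account
    WHERE depositor.account_number = account.account_number
""")

before = conn.execute("SELECT * FROM customer ORDER BY customer_name").fetchall()
# Only the definition is stored, so a new base-relation tuple shows up in the view.
conn.execute("INSERT INTO depositor VALUES ('Maina', 'A-1')")
after = conn.execute("SELECT * FROM customer ORDER BY customer_name").fetchall()

conn.execute("DROP VIEW customer")   # removes the stored definition
views_left = conn.execute("SELECT name FROM sqlite_master WHERE type = 'view'").fetchall()
print(after, views_left)
```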
(i) Deletion
DELETE FROM r
WHERE P
P represents the predicate and r represents the relation.
The statement first finds all tuples t in r for which P(t) is true and then deletes them from r.
The WHERE clause can be omitted, in which case all tuples in r are deleted.
Example
DELETE FROM Loan
- Deletes all tuples from the loan relation.
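A minimal sketch, assuming Python's sqlite3 module and invented loans, of both forms: with a WHERE clause only the tuples satisfying P are removed, and without it every tuple goes while the relation itself remains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)", [
    ("L-1", "Moi Avenue", 500), ("L-2", "River Road", 2000), ("L-3", "Moi Avenue", 3000),
])

# DELETE FROM r WHERE P: remove only the tuples t for which P(t) is true.
conn.execute("DELETE FROM loan WHERE branch_name = 'Moi Avenue' AND amount < 1000")
remaining = [r[0] for r in conn.execute("SELECT loan_number FROM loan ORDER BY loan_number")]

# No WHERE clause: every tuple is deleted, but the (now empty) relation still exists.
conn.execute("DELETE FROM loan")
count = conn.execute("SELECT COUNT(*) FROM loan").fetchone()[0]

print(remaining, count)   # ['L-2', 'L-3'] 0
```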
(ii) Insertion
To insert data into a relation:-
Specify a tuple to be inserted or
Write a query whose result is a set of tuples to be inserted
Tuples to be inserted must have the correct arity, i.e. the correct number of attributes.
Example
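A sketch of both ways of inserting, assuming Python's sqlite3 module and invented relations and rows. The second insertion uses a hypothetical rule (each Moi Avenue loan holder gets a new account numbered after the loan, with balance 200) purely to show a query whose result is inserted; the last statement shows a tuple of the wrong arity being rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.execute("INSERT INTO loan VALUES ('L-7', 'Moi Avenue', 1200)")

# 1. Specify the tuple itself; it has 3 values, matching account's 3 attributes.
conn.execute("INSERT INTO account VALUES ('A-9', 'River Road', 1500)")

# 2. Insert the result of a query (hypothetical rule: 'L-7' becomes account 'A-7').
conn.execute("""
    INSERT INTO account
    SELECT 'A' || substr(loan_number, 2), branch_name, 200
    FROM loan
    WHERE branch_name = 'Moi Avenue'
""")

# A tuple with the wrong arity (2 values for 3 attributes) is refused.
try:
    conn.execute("INSERT INTO account VALUES ('A-10', 'River Road')")
    wrong_arity_rejected = False
except sqlite3.OperationalError:
    wrong_arity_rejected = True

rows = conn.execute("SELECT * FROM account ORDER BY account_number").fetchall()
print(rows, wrong_arity_rejected)
```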
(iii) Updates
To change a value in a tuple without changing all values the UPDATE statement can be
used.
Examples
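(i) UPDATE Account
SET Balance = Balance * 1.05
This pays 5% interest on every account.
(ii) UPDATE Account
SET Balance = Balance * 1.06
WHERE Balance > 10000
This pays 6% interest only on accounts whose balance exceeds 10,000.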
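When both updates are applied, their order matters. The sketch below, assuming Python's sqlite3 module and two invented accounts, runs them in each order on a fresh copy of the data: an account just under 10,000 is lifted over the limit by the 5% update and then also receives 6% only if the 5% update runs first.

```python
import sqlite3

def run(statements):
    """Apply the UPDATE statements, in order, to a fresh copy of the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (account_number TEXT, balance REAL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("A-1", 9800.0), ("A-2", 20000.0)])
    for sql in statements:
        conn.execute(sql)
    # Round to cents so floating-point noise does not hide the result.
    return {k: round(v, 2)
            for k, v in conn.execute("SELECT account_number, balance FROM account")}

ALL_5 = "UPDATE account SET balance = balance * 1.05"
BIG_6 = "UPDATE account SET balance = balance * 1.06 WHERE balance > 10000"

print(run([ALL_5, BIG_6]))   # A-1 gets 5% (reaching 10,290) and then 6% as well
print(run([BIG_6, ALL_5]))   # A-1 gets only the 5%
```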
Update Of A View
A modification is permitted through a view only if the view in question is defined in
terms of a single relation of the actual relational database, i.e. of the logical-level database.
Example
CREATE VIEW Branch_loan AS
SELECT Branch_name, loan_number
FROM loan
INSERT INTO Branch_loan
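VALUES ('Moi Avenue', 'Acc008')
This insertion is carried out as an insertion into the loan relation, with a null value for the
amount attribute, which does not appear in the view.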
Syntax
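CREATE TABLE r (A1 D1, A2 D2, ..., An Dn,
    integrity-constraint1,
    ...,
    integrity-constraintk)
where r is the name of the relation, each Ai is an attribute name and Di is the domain
(data type) of the values of Ai.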
Examples
(i) CREATE TABLE Customer
(Customer_name CHAR(20) NOT NULL,
Customer_street CHAR(30),
Customer_city CHAR(30),
PRIMARY KEY (customer_name))
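A sketch of the constraints at work, assuming Python's sqlite3 module and invented customer rows: the DBMS rejects an update that repeats the primary key or leaves customer_name null, as described under Integrity above. (SQLite accepts CHAR(n) but does not enforce the length n.) The helper rejected() is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_name   CHAR(20) NOT NULL,
        customer_street CHAR(30),
        customer_city   CHAR(30),
        PRIMARY KEY (customer_name)
    )
""")
conn.execute("INSERT INTO customer VALUES ('Mary', 'Moi Avenue', 'Nairobi')")

def rejected(sql):
    """Return True if the DBMS refuses the update because it violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_key = rejected("INSERT INTO customer VALUES ('Mary', 'River Road', 'Nairobi')")
null_name = rejected("INSERT INTO customer VALUES (NULL, 'River Road', 'Nairobi')")
valid_row = rejected("INSERT INTO customer VALUES ('Judy', NULL, NULL)")

print(duplicate_key, null_name, valid_row)   # True True False
```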
5.0 TRANSACTIONS MANAGEMENT AND CONCURRENCY CONTROL
(i) Atomicity
(ii) Consistency
(iii) Isolation
Transactions execute independently of one another: even though multiple transactions may
execute concurrently, the system guarantees that for every pair of transactions Ti and Tj, it
appears to Ti that either Tj finished execution before Ti started, or Tj started execution
after Ti finished. Thus each transaction is unaware of other transactions executing
concurrently in the system.
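(i) Lost Update Problem
An apparently successful update by one transaction can be overwritten by another
transaction. Consider:
Transaction A    Time    Transaction B
Fetch R          t1      -
-                t2      Fetch R
Update R         t3      -
-                t4      Update R
Transaction A's update at t3 is lost, because at t4 transaction B overwrites R with a value
computed from the copy it fetched at t2, before A's update.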
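The interleaving in the table can be replayed in a few lines of Python. In this sketch a dictionary stands in for the database, and the starting value 100 and the amounts +50 and -30 are invented; each transaction computes from its own private copy of R.

```python
# A dictionary stands in for the database; R starts at a hypothetical value of 100.
db = {"R": 100}
local_a, local_b = {}, {}          # each transaction's private copy of what it fetched

local_a["R"] = db["R"]             # t1: A fetches R (sees 100)
local_b["R"] = db["R"]             # t2: B fetches R (also sees 100)
db["R"] = local_a["R"] + 50        # t3: A updates R to 150
db["R"] = local_b["R"] - 30        # t4: B updates R from its stale copy: 70

serial_result = 100 + 50 - 30      # either serial order would leave 120
print(db["R"], serial_result)      # 70 120: A's +50 has been lost
```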
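Violations of the integrity constraints governing the database can arise when 2
transactions are allowed to execute concurrently without being synchronized. Consider:
Transaction A    Time    Transaction B
-                t1      Fetch R
-                t2      Update R
Fetch R          t3      -
-                t4      Roll back
At t3, transaction A sees the uncommitted update made by B at t2. When B rolls back at
t4, A is left depending on a value of R that, in effect, never existed in the database (the
uncommitted dependency problem).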
Transactions that only read the database can also obtain the wrong result if they are
allowed to read partial results of incomplete transactions that have simultaneously updated
the database. Consider 2 transactions A and B operating on account records: transaction A
is summing account balances while transaction B is transferring amount 10 from account
3 to account 1. If A reads account 1 before the transfer and account 3 after it, its total is
10 too low.
A transaction consists of a sequence of reads and writes of the database. The entire
sequence of reads and writes by all concurrent transactions in a database, taken together, is
known as a schedule. The order in which operations from different transactions are
interleaved is crucial to maintaining the consistency of the database.
A serial schedule is one in which all the reads and writes of each transaction are run
together, one transaction after another.
A schedule is said to be serialisable if the reads and writes of each transaction can be
reordered so that, when they are grouped together as in a serial schedule, the net effect of
executing this reorganised schedule is the same as that of the original schedule.
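A common practical test is conflict serialisability, which is a sufficient (and stricter) condition than the "same net effect" definition above. The sketch below builds a precedence graph with an edge Ti -> Tj whenever an operation of Ti precedes a conflicting operation of Tj (same item, different transactions, at least one write) and declares the schedule conflict-serialisable if the graph has no cycle. The (transaction, operation, item) encoding and the function names are assumptions made for illustration.

```python
from itertools import combinations

def precedence_graph(schedule):
    """Edges Ti -> Tj for each pair of conflicting operations, Ti's first.

    schedule is a list of (transaction, operation, item) triples in execution
    order, e.g. ("T1", "W", "X"), where operation is "R" or "W".
    """
    edges = set()
    for (ti, op_i, x), (tj, op_j, y) in combinations(schedule, 2):
        if ti != tj and x == y and "W" in (op_i, op_j):
            edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as edge pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {node: "new" for node in graph}   # new -> active -> done

    def visit(node):
        state[node] = "active"
        for nxt in graph[node]:
            if state[nxt] == "active" or (state[nxt] == "new" and visit(nxt)):
                return True
        state[node] = "done"
        return False

    return any(state[n] == "new" and visit(n) for n in graph)

def conflict_serialisable(schedule):
    return not has_cycle(precedence_graph(schedule))

serial = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"), ("T2", "W", "X")]
lost_update = [("A", "R", "R"), ("B", "R", "R"), ("A", "W", "R"), ("B", "W", "R")]
print(conflict_serialisable(serial))        # True
print(conflict_serialisable(lost_update))   # False: edges A -> B and B -> A
```

The lost update schedule from the earlier table fails the test, as expected.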
(i) Locking method
A lock guarantees exclusive use of a data item to the current transaction: transaction T1
does not have access to a data item that is currently used by transaction T2. A transaction
acquires a lock prior to data access, and the lock is released (unlocked) when the
transaction is completed, so that another transaction can lock the data item for its
exclusive use.
Shared Locks
These are used during read operations, since read operations cannot conflict. More than
one transaction is permitted to hold read (shared) locks on the same data item simultaneously.
2-Phase locking
To ensure serialisability, the 2-phase locking protocol defines how transactions acquire
and relinquish locks. 2-phase locking guarantees serialisability, but it does not prevent
deadlocks. The 2 phases are:
(a) Growing phase, in which a transaction acquires all the required locks without
unlocking any data. Once all the locks have been acquired, the transaction is
at its lock point.
(b) Shrinking phase, in which a transaction releases all its locks and cannot obtain any
new lock.
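The locking rules above can be sketched as a toy lock manager. This is a hypothetical design, not any particular DBMS's: shared (S) locks are compatible with each other, an exclusive (X) lock is compatible with nothing, a refused request returns False (the caller would have to wait), and once a transaction has released any lock it may not acquire another, which is the 2-phase rule. Waiting queues and deadlock handling are left out.

```python
# Lock compatibility: shared (S) locks coexist; an exclusive (X) lock excludes all others.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class LockManager:
    """Toy lock table that also enforces the 2-phase rule (hypothetical design)."""

    def __init__(self):
        self.locks = {}          # item -> {transaction: "S" or "X"}
        self.shrinking = set()   # transactions that have released a lock

    def lock(self, txn, item, mode):
        """Grant the lock and return True, or return False if txn must wait."""
        if txn in self.shrinking:
            raise RuntimeError(f"{txn} breaks 2PL: lock requested after an unlock")
        holders = self.locks.setdefault(item, {})
        if all(COMPATIBLE[(held, mode)]
               for other, held in holders.items() if other != txn):
            if holders.get(txn) != "X":      # never downgrade an X lock to S
                holders[txn] = mode
            return True
        return False

    def unlock(self, txn, item):
        self.shrinking.add(txn)              # the first unlock starts the shrinking phase
        del self.locks[item][txn]

lm = LockManager()
print(lm.lock("T1", "A", "S"))   # True: T1 is in its growing phase
print(lm.lock("T2", "A", "S"))   # True: shared locks are compatible
print(lm.lock("T2", "A", "X"))   # False: T1 still holds S on A, so T2 must wait
print(lm.lock("T1", "B", "X"))   # True: T1 now holds all its locks (lock point)
lm.unlock("T1", "A")             # T1 enters its shrinking phase
print(lm.lock("T2", "A", "X"))   # True: T2 is now the only holder of A
try:
    lm.lock("T1", "C", "S")      # a new lock after an unlock violates 2PL
except RuntimeError as err:
    print(err)
```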
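Deadlocks
A deadlock occurs when two or more transactions each wait for a lock held by another of
them, so that none of them can proceed.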
Techniques To Control Deadlocks
1. Deadlock Prevention
A transaction requesting a new lock is aborted if there is a possibility that a deadlock can
occur. If the transaction is aborted, all the changes made by this transaction are rolled
back and all locks obtained by the transaction are released. The transaction is then
rescheduled for execution. Deadlock prevention works because it avoids the conditions
that lead to deadlock.
2. Deadlock Detection
The DBMS periodically tests the database for deadlocks. If a deadlock is found, one of
the transactions (the "victim") is aborted (rolled back and restarted) and the other
transaction continues.
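A common way to carry out this periodic test is to search a wait-for graph, in which an edge Ti -> Tj means Ti is waiting for a lock held by Tj; a deadlock is a cycle. The sketch below assumes the graph is given as a dictionary of sets, and its victim rule (abort the transaction with the largest id) is a hypothetical policy chosen only for illustration.

```python
def find_deadlock(waits_for):
    """waits_for maps each transaction to the set of transactions it waits on.
    Return the transactions on one cycle, or None if there is no deadlock."""
    visited, on_stack = set(), []

    def dfs(t):
        if t in on_stack:
            return on_stack[on_stack.index(t):]   # back edge: the cycle found
        if t in visited:
            return None                           # already fully explored
        visited.add(t)
        on_stack.append(t)
        for u in waits_for.get(t, ()):
            cycle = dfs(u)
            if cycle:
                return cycle
        on_stack.pop()
        return None

    for t in waits_for:
        cycle = dfs(t)
        if cycle:
            return cycle
    return None

waits_for = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}, "T4": {"T1"}}
cycle = find_deadlock(waits_for)
print(cycle)                                # ['T1', 'T2', 'T3']
victim = max(cycle)                         # hypothetical policy: largest id is aborted
# Rolling back the victim releases its locks, so no one waits for it any more.
waits_for = {t: w - {victim} for t, w in waits_for.items() if t != victim}
print(victim, find_deadlock(waits_for))     # T3 None
```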
3. Deadlock Avoidance
The transaction must obtain all the locks it needs before it can be executed. This
technique avoids the rollback of conflicting transactions by requiring that locks be
obtained in succession, but the serial lock assignment increases response times.
Conclusion:
The best deadlock control method depends on the database environment: if the
probability of deadlock is low, deadlock detection is recommended; if it is high, deadlock
prevention is recommended; and if response time is not high on the system's priority list,
deadlock avoidance might be employed.
Time Stamping Method
The time stamping approach assigns a unique time stamp to each transaction. All database
operations (reads and writes) within the same transaction must have the same time stamp.
The DBMS executes conflicting operations in time stamp order, thereby ensuring
serialisability of the transactions.
If 2 transactions conflict, one is often stopped, rescheduled and assigned a new time
stamp value. The main drawback of the time stamping approach is that each value stored in
the database requires 2 additional time stamp fields: one for the last time the field was
read and one for the last update. Time stamping thus increases the memory needs and the
database's processing overhead.
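The two extra fields per item are exactly what a basic timestamp-ordering scheduler checks. In this sketch (a textbook-style rule set, with timestamps as positive integers and 0 meaning "never read or written"), an operation is refused when a younger transaction, one with a larger timestamp, has already read or written the item in a conflicting way; the refused transaction would then be rolled back and restarted with a new timestamp.

```python
class TimestampOrdering:
    """Basic timestamp-ordering checks using two time stamps per data item."""

    def __init__(self):
        self.read_ts = {}    # item -> largest timestamp of a transaction that read it
        self.write_ts = {}   # item -> timestamp of the transaction that last wrote it

    def read(self, ts, item):
        if ts < self.write_ts.get(item, 0):
            return False     # a younger transaction already wrote item: roll back
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False     # a younger transaction already read or wrote item: roll back
        self.write_ts[item] = ts
        return True

s = TimestampOrdering()
print(s.read(2, "X"))    # True: T2 (timestamp 2) reads X
print(s.write(1, "X"))   # False: older T1 would overwrite a value T2 has already read
print(s.write(2, "X"))   # True
print(s.read(1, "X"))    # False: T1 is too old to see the value written by T2
print(s.read(3, "X"))    # True: a younger transaction may read it
```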
Optimistic Method
The optimistic approach is based on the assumption that the majority of database
operations do not conflict. It requires neither locking nor time stamping techniques;
instead, a transaction is executed without restrictions until it is committed. In this
approach, each transaction moves through 2 or 3 phases: read, validation and write.
Read Phase
The transaction reads the database, executes the needed computations and makes the
updates to a private copy of the database values. All update operations of the transaction
are recorded in a temporary update file, which is not accessed by the remaining
transactions.
Validation Phase
The transaction is validated to ensure that the changes made will not affect the integrity
and consistency of the database. If the validation test is positive, the transaction goes to
the write phase; if it is negative, the transaction is restarted and the changes are discarded.
Write Phase
The changes are permanently applied (written) to the database.
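The three phases can be sketched as follows. This assumes one particular validation rule (backward validation, with validation and write done one transaction at a time): a transaction fails if any transaction that committed after it started wrote an item it read. The class, its method names and the sample value 100 are hypothetical.

```python
class OptimisticDB:
    """Toy optimistic concurrency control (hypothetical design)."""

    def __init__(self, data):
        self.data = dict(data)
        self.commits = []            # write sets of committed transactions, in order

    def begin(self):
        # Remember how many transactions had committed when this one started.
        return {"start": len(self.commits), "reads": set(), "writes": {}}

    def read(self, txn, item):
        # Read phase: use the transaction's own private update if there is one.
        txn["reads"].add(item)
        return txn["writes"].get(item, self.data[item])

    def write(self, txn, item, value):
        txn["writes"][item] = value  # private copy, invisible to other transactions

    def commit(self, txn):
        # Validation phase: did anyone commit a write to something we read?
        for write_set in self.commits[txn["start"]:]:
            if write_set & txn["reads"]:
                return False         # negative: discard changes, caller restarts
        # Write phase: apply the private updates permanently.
        self.data.update(txn["writes"])
        self.commits.append(set(txn["writes"]))
        return True

db = OptimisticDB({"A": 100})
t1, t2 = db.begin(), db.begin()
db.write(t1, "A", db.read(t1, "A") + 10)
db.write(t2, "A", db.read(t2, "A") - 20)
print(db.commit(t1))                  # True: nothing committed since t1 began
print(db.commit(t2))                  # False: t1 committed a write to A, which t2 read
t3 = db.begin()                       # t2 restarted
db.write(t3, "A", db.read(t3, "A") - 20)
print(db.commit(t3), db.data["A"])    # True 90
```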
Conclusion
The optimistic approach is acceptable for database systems that are mostly read or query
systems and require very few update transactions.