CIT843
Basic Concepts in DBMS
COURSE GUIDE
Course Developer/Writer
Course Editor
Programme Leader
Course Coordinator
CONTENTS PAGE
Introduction
Let me welcome you to this course titled CIT 843 - Database Management System. I
promise enjoyment and reward as you study Database Management System.
Database Management System is a two-credit unit course available to all students
offering the Master of Science (M.Sc.) in Computer and Information Technology (CIT).
A Database Management System (DBMS) is a set of software programs that controls the
organization, storage, management, and retrieval of data in a database. It is a set of pre-
written programs that are used to store, update and retrieve a Database. The DBMS
accepts requests for data from the application program and instructs the operating system
to transfer the appropriate data. When a DBMS is used, information systems can be
changed much more easily as the organization's information requirements change. New
categories of data can be added to the database without disruption to the existing system.
Organizations may use one kind of DBMS for daily transaction processing and then
move the detail onto another computer that uses another DBMS better suited for random
inquiries and analysis. Overall systems design decisions are performed by data
administrators and systems analysts. Detailed database design is performed by database
administrators.
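To make the idea of an application program sending requests to a DBMS more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite is only a stand-in DBMS chosen for illustration, and the customers table, its columns and the list_customer_names function are hypothetical. The sketch also shows a new category of data (a new column) being added without changing the application code that already uses the table.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS in this sketch.
conn = sqlite3.connect(":memory:")

# Hypothetical table, used only for this illustration.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (customer_id, name) VALUES (?, ?)",
    [(124, "Mrs. James"), (125, "Mr. Ade")],
)


def list_customer_names(connection):
    """The 'application program': it sends a request (SQL) to the DBMS and
    never deals with how or where the data is physically stored."""
    rows = connection.execute("SELECT name FROM customers ORDER BY customer_id")
    return [name for (name,) in rows]


before = list_customer_names(conn)

# A new category of data (a phone number) is added to the existing table.
conn.execute("ALTER TABLE customers ADD COLUMN phone TEXT")
conn.execute(
    "UPDATE customers SET phone = ? WHERE customer_id = ?", ("placeholder-phone", 124)
)

# The existing application code keeps working without any change.
after = list_customer_names(conn)
print(before == after)  # True
```

Because the application reaches the data only through requests to the DBMS, the schema change does not force the application to be rewritten.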
Course Aims
The aim of the course is straightforward: to allow you to develop background
knowledge as well as core expertise in Database Management Systems.
Course Objectives
To achieve the aims set out, the course has a set of objectives. Each unit has specific
objectives, which are included at the beginning of the unit. You should read these objectives
before you study the unit. You may wish to refer to them during your study to check on
your progress. You should always look at the unit objectives after completing each
unit. By doing so, you can confirm that you have followed the instructions in the unit.
Below are the comprehensive objectives of the course as a whole. By meeting these
objectives, you should have achieved the aims of the course as a whole. After going
through the course, you should be able to:
To complete this course you are required to read each study unit, read the textbooks and
read other materials which may be provided by the National Open University of Nigeria.
Each unit contains self-assessment exercises and at certain points in the course you would
be required to submit assignments for assessment purposes. At the end of the course there
is a final examination. The course should take you about 15 weeks in total to complete.
Below you will find listed all the components of the course, what you have to do and how
you should allocate your time to each unit in order to complete the course on time and
successfully.
This course requires you to spend a lot of time reading. I would advise you to avail
yourself of the opportunity to attend the tutorial sessions, where you can compare
your knowledge with that of other people.
This course consists of study units and a course guide. This course guide tells you briefly what
the course is all about, what course materials you will be using and how you can work
with these materials. In addition, it suggests some general guidelines for the amount of
time you are likely to spend on each unit of the course in order to complete it
successfully.
It also gives you advice on your Tutor-Marked Assignments, which will be made
available in the assignment file. There will be regular tutorial classes that are related to
the course. It is advisable for you to attend these tutorial sessions. The course will prepare
you for the challenges you will meet in the field of database management systems.
Study Units
Brainbell.com (2008). Microsoft Access Tutorial. Retrieved June 20, 2008, from
http://www.brainbell.com/tutorials/ms-office/Access_2003/
Bcschool.net (2003-2006). Create Database Applications using Microsoft Access.
Retrieved June 20, 2008, from http://www.bcshool.net/staff/accesshelp.htm
cisnet.baruch.cuny.edu (2008). Microsoft Access Tutorial. Retrieved January 15, 2008,
from http://cisnet.baruch.cuny.edu/holowczak/classes/2200/access/accessall.html
databasedev.co.uk (2003-2006). Data Redundancy Defined - Relational Database
Design. In Database Solutions for Microsoft Access. Retrieved October 10, 2006,
from http://www.databasedev.co.uk/data-redundancy.html
David M. Kroenke, David J. Auer (2008). Database Concepts. New Jersey: Prentice
Hall.
Ramez Elmasri, Shamkant B. Navathe (2003). Fundamentals of Database Systems. England:
Addison Wesley.
Fred R. McFadden, Jeffrey A. Hoffer (1994). Modern Database Management. England:
Addison Wesley Longman.
Graeme C. Simsion, Graham C. Witt (2004). Data Modeling Essentials. San Francisco:
Morgan Kaufmann.
Microsoft.com (2009). Microsoft Access help file. Retrieved March 15, 2009, from
http://microsoft.com/office/access/default.htm
Microsoft.com (2009). Microsoft Access Tutorial. Retrieved March 15, 2009, from
http://www.bcshool.net/staff/accesshelp.htm
Philip J. Pratt, Joseph J. Adamski (2007). Concepts of Database Management. United
States: Course Technology.
Presentation Schedule
Your course materials have important dates for the early and timely completion and
submission of your TMA and attending tutorials. You should remember that you are
required to submit all your assignments by the stipulated time and date. You should guard
against falling behind in your work.
Assessment
There are three aspects to the assessment of the course. The first is made up of
self-assessment exercises, the second consists of the tutor-marked assignments and the third is
the written examination/end of course examination.
You are advised to do the exercises. In tackling the assignments, you are expected to
apply the information, knowledge and techniques you gathered during the course. The
assignments must be submitted to your facilitator for formal assessment in accordance
with the deadlines stated in the presentation schedule and the assignment file. The work
you submit to your tutor for assessment will count for 30% of your total course mark. At
the end of the course you will need to sit for a final or end of course examination lasting
about three hours. This examination will count for 70% of your total course mark.
Assessment File
An assessment file for this course will be made available to you. In this file, you will find
details of work that you must submit to your tutor for marking. The marks you obtain in
the continuous assessment will count towards your final marks. You are expected to pass
both the continuous assessment and the final examination.
Tutor-Marked assignment
The TMA is a continuous assessment component of your course. It accounts for 30% of
the total score. You will be given four (4) TMAs to answer. Three of these must be
answered before you are allowed to sit for the end of course examination. The TMAs
will be given to you by your facilitator and returned after you have done the
assignment. Assignment questions for the units in this course are contained in the
assignment file. You will be able to complete your assignments from the information and
materials contained in your reading, references and study units. However, it is desirable
at all degree levels of education to demonstrate that you have read and researched more
widely in your references, which will give you a wider viewpoint and may provide you with a
deeper understanding of the subject.
Make sure that each assignment reaches your facilitator on or before the deadline given in
the presentation schedule and assignment file. If for any reason you cannot complete
your work on time, contact your facilitator before the assignment is due to discuss the
possibility of an extension. Extensions will not be granted after the due date unless there
are exceptional circumstances.
The end of course examination for this course will last about three hours and has a value
of 70% of the total course mark. The examination will consist of questions that reflect
the type of self-testing, practice exercises and tutor-marked assignment problems you have
previously encountered. All areas of the course will be assessed.
Kindly use the time between finishing the last unit and sitting for the examination to
revise the whole course. You might find it useful to review your self-tests, your TMAs and
the comments on them before the examination. The end of course examination covers
information from all parts of the course.
Assignment                   Marks
Assignments 1-4              Four assignments; the best three marks of the four count at
                             10% each (30% of course marks)
End of course examination    70% of overall course marks
Total                        100% of course marks
Course Overview
The first module focuses on the meaning, concepts and advantages of database
management systems. Module two deals with the architecture of database management
systems, relational database integrity, transaction and concurrency management, and
redundancy and its associated problems. The third module introduces you to Microsoft
Access as an example of a database management system.
Although you will be required to study the units on your own, arrangements have been
made for regular interactions with your tutor at the study center. The tutor is expected to
conduct tutorials and useful discussion sessions with you and the other members at the
study center. Please be available at each tutorial session and participate actively.
There are 16 hours of tutorials provided in support of this course. You will be notified of
the dates, times and location of these tutorials as well as the name and phone number of
your facilitator, as soon as you are allocated a tutorial group.
Your facilitator will mark and comment on your assignments, keep a close watch on your
progress and any difficulty you might face and provide assistance to you during the
course. You are expected to mail your Tutor-Marked Assignments to your facilitator
before the scheduled date (at least two working days are required). They will be marked by
your tutor and returned to you as soon as possible.
Do not hesitate to contact your facilitator by telephone or e-mail if you need assistance.
The following are circumstances in which you might find assistance necessary; you should
contact your facilitator if:
• You do not understand any part of the study units or the assigned readings.
• You have difficulty with the self-tests.
• You have a question or problem with an assignment or with the grading of an
assignment.
You should endeavour to attend the tutorials. This is the only chance to have face-to-face
contact with your course facilitator and to have your questions answered instantly.
You can raise any problem encountered in the course of your study.
To gain the most benefit from the course tutorials, prepare a question list before attending
them. You will learn a lot from participating actively in discussions.
Summary
You will be required to design your own simple information retrieval system for a given
application. This will give you thorough exposure to a multitude of DBMS tasks, such as
database creation, maintenance and query processing. At this stage you will summarize
your experiences and the knowledge gained. You will also be asked to provide
feedback to the course instructor on the pros and cons of a DBMS from your perspective,
on how the course enhanced your knowledge, and on how the course can be improved
further. This allows the course instructor to learn from the learners about the practical
application of the material and about how to structure courses better.
I wish you success in the course and I hope that you will find it both interesting and
useful. Thank you.
Course Code CIT 843
1.0 Introduction
2.0 Objectives
3.0 What is Database?
3.1 Database Management System (DBMS)
3.2 Advantages of DBMS
3.3 Example Database
3.4 Brief History of Database
3.5 Contents of a Database
3.5.1 User Data
3.5.2 Metadata
3.5.3 Indexes
3.6 Data Modeling and Database Design
3.6.1 Database Development Process
3.6.2 Designing a Database – A Brief Example
4.0 Conclusion
5.0 Summary
6.0 Tutor Marked Assignment
7.0 Further Reading and Other Resources
1.0 Introduction
Data Management is one of the areas of Computer Science that has applications in almost
every field. In this unit, we shall examine some basic terms in database management
systems.
2.0 Objectives
By the end of this unit, you should be able to:
a. Define database
b. Know why you need a database management system
c. Know the advantages of using a database management system
a. The Database;
b. The DBMS; and
c. Application Programs (what users interact with)
CustomerId  Name             Address          City     State  AccountNumber  Balance
124         Mrs. James       15 Awolowo Ave.  Lagos    LA     0003           1000
125         Mr. Ade          43 Gwagwa Ln.    Maitama  AB     0004           6000
125         Mr. Ade          43 Gwagwa Ln.    Maitama  AB     0005           9000
127         Mr. & Mrs. Bayo  61 Zik Rd.       Garki    AB     0006           500
127         Mr. & Mrs. Bayo  61 Zik Rd.       Garki    AB     0007           800
Activity A
Use Table 1 to answer the following questions:
b. 1968 File-Based:
i. Predecessor of the database; data maintained in a flat file.
ii. Processing characteristics determined by the common use of the magnetic tape
medium.
iii. Data are stored in files, with an interface between programs and files.
iv. Mapping happens between logical files and physical files; one file
corresponds to one or several programs.
v. Various access methods exist, e.g., sequential, indexed, random.
vi. Requires extensive programming in third-generation languages such as
COBOL and BASIC.
vii. Limitations:
1. Separation and isolation: Each program maintains its own set of data;
users of one program may not be aware of data held or blocked by other
programs.
2. Duplication: The same data is held by different programs, which wastes
space and resources.
3. High maintenance costs, such as ensuring data consistency and
controlling access.
4. Sharing granularity is very coarse.
5. Weak security.
d. In 1970, Ted Codd at IBM’s San Jose Lab proposed the relational model. Two major
projects started, and both were operational in the late 1970s. INGRES, at the University of
California, Berkeley, became commercial and was followed by POSTGRES, which was
incorporated into Informix. System R, at IBM’s San Jose Lab, later evolved into
DB2, which became one of the first DBMS products based on the relational model.
(Oracle produced a similar product just prior to DB2.)
e. 1976: Peter Chen defined the Entity-Relationship (ER) model.
a. User Data
b. Metadata
c. Indexes
d. Application metadata
vi. The customers table has 4 records and 5 columns. The Accounts table has 7
records and 3 columns.
vii. Note the relationship between the two tables: the CustomerID column.
viii. How should we split data into the tables? What are the relationships between the
tables?
These are questions that are answered by Database Modeling and Database
Design. We shall consider data modeling in Unit 2.
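The split described above can be sketched with Python's standard-library sqlite3 module. This is an illustrative sketch, not the unit's own method: it uses only the five rows of the example bank data shown earlier in this unit (so it yields 3 customers and 5 accounts rather than the 4 and 7 mentioned in the text), and the staging table name flat is hypothetical. Customer details are stored once, accounts keep only the CustomerId that links them to their owner, and a join on CustomerId recreates the combined rows.

```python
import sqlite3

# Rows of the example bank data:
# (CustomerId, Name, Address, City, State, AccountNumber, Balance)
flat_rows = [
    (124, "Mrs. James", "15 Awolowo Ave.", "Lagos", "LA", "0003", 1000),
    (125, "Mr. Ade", "43 Gwagwa Ln.", "Maitama", "AB", "0004", 6000),
    (125, "Mr. Ade", "43 Gwagwa Ln.", "Maitama", "AB", "0005", 9000),
    (127, "Mr. & Mrs. Bayo", "61 Zik Rd.", "Garki", "AB", "0006", 500),
    (127, "Mr. & Mrs. Bayo", "61 Zik Rd.", "Garki", "AB", "0007", 800),
]

conn = sqlite3.connect(":memory:")
# Hypothetical staging table holding the single, redundant table.
conn.execute("""CREATE TABLE flat (CustomerId INTEGER, Name TEXT, Address TEXT,
    City TEXT, State TEXT, AccountNumber TEXT, Balance INTEGER)""")
conn.executemany("INSERT INTO flat VALUES (?, ?, ?, ?, ?, ?, ?)", flat_rows)

# Customer details are stored once per customer (5 columns)...
conn.execute("""CREATE TABLE customers AS
    SELECT DISTINCT CustomerId, Name, Address, City, State FROM flat""")
# ...and each account keeps only the CustomerId that links it to its owner (3 columns).
conn.execute("""CREATE TABLE accounts AS
    SELECT CustomerId, AccountNumber, Balance FROM flat""")

# Joining on CustomerId recreates the original combined rows.
rejoined = conn.execute("""
    SELECT c.CustomerId, c.Name, c.Address, c.City, c.State,
           a.AccountNumber, a.Balance
    FROM customers AS c JOIN accounts AS a ON a.CustomerId = c.CustomerId
    ORDER BY a.AccountNumber""").fetchall()

print(conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 3
print(rejoined == flat_rows)  # True
```

Account numbers are kept as text so that leading zeros such as 0003 are preserved.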
3.5.2 Metadata
Recall that a database is self-describing; therefore, Metadata can be described as:
Have a look at the Database Documenter feature of MS Access (under the Tools
menu, choose Analyze and then Documenter). This tool queries the system tables
to give all kinds of Metadata for tables, etc. in an MS Access database.
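Other DBMSs expose metadata in a similar way. As a rough analogue (not the Access tool itself), SQLite keeps table definitions in its own system table, sqlite_master, and reports column metadata through PRAGMA table_info. The sketch below, using Python's sqlite3 module and a hypothetical accounts table, queries both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only to have something to describe.
conn.execute("""CREATE TABLE accounts (
    AccountNumber INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL,
    Balance REAL)""")

# The database describes itself: table definitions live in a system table.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Column metadata: each row is (cid, name, type, notnull, dflt_value, pk).
columns = [
    (name, col_type, bool(pk))
    for (_cid, name, col_type, _notnull, _dflt, pk)
    in conn.execute("PRAGMA table_info(accounts)")
]

print(tables)   # ['accounts']
print(columns)  # [('AccountNumber', 'INTEGER', True), ('CustomerId', 'INTEGER', False), ('Balance', 'REAL', False)]
```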
3.5.3 Indexes
In keeping with our desire to provide users with several different views of data, indexes
provide an alternate means of accessing, sorting and searching data.
An index for our new banking example might include the account numbers in a sorted
order.
Indexes allow the database to access a record without having to search through the entire
table.
Updating data requires an extra step: the index must also be updated.
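The principle can be illustrated in plain Python. This is a toy sketch under stated assumptions: the "table" is a Python list of rows and the index is a sorted list of (account number, row position) pairs searched with the standard bisect module; real DBMSs typically use B-tree structures instead. A lookup uses binary search rather than scanning every row, and every insert has to update the index as well as the table.

```python
import bisect

# Toy table: rows kept in insertion order, not sorted by account number.
accounts = [
    {"AccountNumber": 7, "Balance": 800},
    {"AccountNumber": 3, "Balance": 1000},
    {"AccountNumber": 5, "Balance": 9000},
]

# Toy index: (AccountNumber, position in table) pairs kept in sorted order.
index = sorted((row["AccountNumber"], pos) for pos, row in enumerate(accounts))


def find(account_number):
    """Binary-search the index instead of scanning the whole table."""
    i = bisect.bisect_left(index, (account_number,))
    if i < len(index) and index[i][0] == account_number:
        return accounts[index[i][1]]
    return None


def insert(row):
    """Adding a row is a two-step update: the table and the index."""
    accounts.append(row)
    bisect.insort(index, (row["AccountNumber"], len(accounts) - 1))


insert({"AccountNumber": 4, "Balance": 6000})
print(find(4))   # {'AccountNumber': 4, 'Balance': 6000}
print(find(99))  # None
```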
Example: Look at the Documenter tool in MS Access. It can also show metadata for
Queries, Forms, Reports, etc.
c. Data Model:
i. A set of primitives for defining the structure of a database.
ii. A set of operations for specifying retrieval and updates on a database
iii. Examples: Relational, Hierarchical, Network, Object-Oriented
The following is a brief outline of the database development process.
a. User needs assessment and requirements gathering: Determine what the users
are looking for, what functions should be supported, how the system should
behave.
b. Data Modeling: Based on user requirements, form a logical model of the system.
This logical model is then converted to a physical data model (tables, columns,
relationships, etc.) that will be implemented.
c. Implementation: Based on the data model, a database can be created.
Applications are then written to perform the required functions.
d. Testing: The system is tested using real data.
e. Deployment: The system is deployed to users. Maintenance of the system begins.
For our bank example, let's assume that the managers are interested in creating a database
to track their customers and accounts.
a. Tables
CUSTOMERS
CustomerId, Name, Street, City, State, Zip
ACCOUNTS
CustomerId, AccountNumber, AccountType, DateOpened, Balance
Note that we use an artificial identifier (a number we make up) for the customer
called CustomerId. Given a CustomerId, we can uniquely identify the remaining
information. We call CustomerId a Key for the CUSTOMERS table.
b. Relationships
The relationship between CUSTOMERS and ACCOUNTS is through CustomerId.
Since a customer may have more than one account at the bank, we call this a
one-to-many (1:N) relationship.
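A minimal SQLite sketch of this design, written with Python's sqlite3 module, is shown below. The column types are simplified assumptions (the unit's own domains are given in the next step). The foreign key on ACCOUNTS.CustomerId implements the one-to-many relationship: one customer may own several accounts, but every account must refer to an existing customer. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""CREATE TABLE customers (
    CustomerId INTEGER PRIMARY KEY,   -- the key of CUSTOMERS
    Name TEXT, Street TEXT, City TEXT, State TEXT, Zip TEXT)""")
conn.execute("""CREATE TABLE accounts (
    AccountNumber INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL REFERENCES customers(CustomerId),
    AccountType TEXT, DateOpened TEXT, Balance REAL)""")

conn.execute("INSERT INTO customers (CustomerId, Name) VALUES (125, 'Mr. Ade')")
# One customer, many accounts (the "N" side of 1:N).
conn.executemany(
    "INSERT INTO accounts (AccountNumber, CustomerId, Balance) VALUES (?, ?, ?)",
    [(4, 125, 6000), (5, 125, 9000)],
)

count = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE CustomerId = 125").fetchone()[0]
print(count)  # 2

# An account for a customer that does not exist is rejected by the DBMS.
try:
    conn.execute("INSERT INTO accounts (AccountNumber, CustomerId) VALUES (9, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```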
c. Domains
A domain is a set of values that a column may have. Domain also includes the
type and length or size of data found in each column.
CUSTOMERS
Column Domain
ACCOUNTS
Column                 Domain
                       Data Type    Size
CustomerId (FK)        Integer      20
AccountNumber (Key)    Integer      15
AccountType            Character    2
DateOpened             Date
Balance                Real         12,2
This logical model is then converted to a physical model and implemented as tables.
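As a hedged sketch of that conversion, the ACCOUNTS domain table could be turned into a SQLite table as follows (Python's sqlite3 module). SQLite accepts declared sizes such as CHARACTER(2) or NUMERIC(12,2) but does not enforce them by itself, so this sketch assumes CHECK constraints are used to express the domain; many other DBMSs enforce declared lengths directly. The CHECK expressions, the account type code SV and the dates are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE accounts (
    CustomerId    INTEGER NOT NULL,
    AccountNumber INTEGER PRIMARY KEY,
    AccountType   CHARACTER(2) CHECK (length(AccountType) <= 2),
    DateOpened    DATE,
    -- Real, size 12,2: at most 12 digits, 2 of them after the decimal point
    Balance       NUMERIC(12,2) CHECK (abs(Balance) < 10000000000)
)""")


def try_insert(row):
    """Return True if the DBMS accepts the row, False if a domain check fails."""
    try:
        conn.execute("INSERT INTO accounts VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert((125, 4, "SV", "2008-06-20", 6000.00)))      # True
print(try_insert((125, 5, "SAVINGS", "2008-06-20", 9000.00)))  # False: type code too long
```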
The following is some example data for the Accounts and Customers tables:
Customers Table
Accounts Table
d. Business Rules
Business rules allow us to specify constraints on what data can appear in tables
and what operations can be performed on data in tables. For example:
Activity B
Briefly explain the following terms:
(a) User data (b) Metadata (c) Indexes (d) Tables (e) Relationship (f)
Domains
4.0 Conclusion
A database is a collection of information that is organized so that it can easily be
accessed, managed, and updated. A Database Management System is a software package
designed to store and manage databases.
5.0 Summary
In this unit we have learnt that:
1.0 Introduction
2.0 Objectives
3.0 What is Data Modeling?
3.1 Data Modeling in the Context of Database Design
3.2 Components of a Data Model
3.3 Why is Data Modeling Important?
3.4 What Makes a Good Data Model?
3.5 Entity-Relationship Model
3.6 Basic Constructs of E-R modeling
3.6.1 Entities
3.6.2 Attributes
3.6.3 Identifiers
3.6.4 Relationships
3.6.5 Generalization Hierarchies
3.7 E-R Notation
4.0 Conclusion
5.0 Summary
6.0 Tutor Marked Assignment
7.0 Further Reading and Other Resources
1.0 Introduction
This unit is about one of the most critical stages in the development of a computerized
information system: the design of data structures and the documentation of that design
in a data model.
2.0 Objectives
By the end of this unit, you should be able to:
a. Know what data modeling and Entity-Relationship modeling are all about
b. Understand the E-R modeling constructs
c. Identify an entity in an E-R model
d. Know what a relationship is in the E-R model
e. Draw diagrams of relationships in the E-R model
f. Know the advantages of using a database management system
There are two major methodologies used to create a data model: the Entity-Relationship
(ER) approach and the Object Model. In this unit, we shall focus on the Entity-
Relationship approach.
The data model has two outputs. The first is an entity-relationship diagram, which
represents the data structure in a pictorial form. Because the diagram is easily learned, it
is a valuable tool for communicating the model to the end-user. The second component is a
data document. This is a document that describes in detail the data objects, relationships,
and rules required by the database.
The data model is also detailed enough to be used by the database developers as a
“blueprint” for building the physical database.
The information contained in a data model will be used to define the relational tables, the
primary and the foreign keys, stored procedures, and triggers.
A poorly designed database will require more time in the long run. Without careful
planning, you may create a database that omits data required to create critical reports,
produces results that are incorrect or inconsistent, and is unable to accommodate changes
in users' requirements.
Activity A
1. What is Data Modeling?
The Entity-Relationship (ER) model is a conceptual data model that views the real world
as entities and relationships. A basic component of the model is the Entity-Relationship
diagram, which is used to visually represent data objects. Today, the ER model is commonly
used for database design. For the database designer, the utility of the ER model is:
a. It maps well to the relational model. The constructs used in the ER model can
easily be transformed into relational tables.
b. It is simple and easy to understand with a minimum of training. Therefore, the
model can be used by the database designer to communicate the design to the end
user.
c. In addition, the model can be used as a design plan by the database developer to
implement a data model in specific database management software.
The ER model views the real world as a construct of entities and associations between
entities. The E-R modeling constructs are: Entity, Relationship, Attributes, and Identifiers.
It is important to get used to this terminology and to be able to use it at the appropriate
time. For example, in the ER model, we do not refer to tables. Here we call them entities.
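Because ER constructs map well to relational tables, an entity definition can be turned into a table definition almost mechanically. The Python sketch below assumes a hypothetical Entity class (not part of any library): its attributes become columns and its identifier becomes the primary key of the generated CREATE TABLE statement. The EMPLOYEE attributes and SQL types are illustrative choices.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical, minimal description of an ER entity."""
    name: str
    identifier: str                                   # becomes the primary key
    attributes: dict = field(default_factory=dict)    # attribute name -> SQL type

    def to_create_table(self) -> str:
        """Map the entity to a relational CREATE TABLE statement."""
        cols = [f"{attr} {sql_type}" for attr, sql_type in self.attributes.items()]
        cols.append(f"PRIMARY KEY ({self.identifier})")
        return f"CREATE TABLE {self.name} ({', '.join(cols)})"


employee = Entity(
    name="EMPLOYEE",
    identifier="EmployeeID",
    attributes={"EmployeeID": "INTEGER", "FirstName": "TEXT", "LastName": "TEXT"},
)
print(employee.to_create_table())
# CREATE TABLE EMPLOYEE (EmployeeID INTEGER, FirstName TEXT, LastName TEXT, PRIMARY KEY (EmployeeID))
```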
3.6.1 Entities
Entities are the principal data objects about which information is to be collected. Entities
are usually recognizable concepts, either concrete or abstract, such as persons, places,
things, or events which have relevance to the database. Some specific examples of
entities are:
i. EMPLOYEES
ii. PROJECTS
iii. CUSTOMER
iv. ORGANIZATION
v. PART
vi. INGREDIENT
vii. PURCHASE ORDER
viii. CUSTOMER ORDER
ix. PRODUCT
x. INVOICES
Entities are classified as independent or dependent (in some methodologies, the terms
used are strong and weak, respectively). An independent entity is one that does not rely
on another for identification. A dependent entity is one that relies on another for
identification. The following terms are used with entity:
3.6.2 Attributes
Attributes describe the entity with which they are associated, i.e., they are properties used to
distinguish one entity instance from another.
i. EmployeeID
ii. First Name
iii. Last Name
iv. Street Address
v. City
vi. Local Government Area
vii. State
viii. Date of First Appointment
ix. Current Status
x. Date of Birth
i. ProductID
ii. Product_Description
iii. Weight
iv. Size
v. Cost
The domain of an attribute is the collection of all possible values an attribute can have.
The domain of Name is a character string.
3.6.3 Identifier
An identifier is a special attribute used to identify a specific instance of an entity.
3.6.4 Relationships
A Relationship represents an association between two or more entities. An example of a
relationship would be:
Relationships are classified by their degree, connectivity, cardinality, direction, type, and
existence. Not all modeling methodologies use all these classifications.
Binary relationships, the association between two entities, are the most common type in
the real world. A recursive binary relationship occurs when an entity is related to itself.
An example might be "some employees are married to other employees".
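In a relational implementation, a recursive binary relationship usually becomes a foreign key that refers back to the same table. A minimal sketch with Python's sqlite3 module, assuming a hypothetical employee table with a nullable SpouseId column for the "married to" example (names are made up); a self-join then pairs each employee with his or her spouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    EmployeeId INTEGER PRIMARY KEY,
    Name TEXT NOT NULL,
    SpouseId INTEGER REFERENCES employee(EmployeeId)  -- points back to the same table
)""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ada", None), (2, "Bola", None), (3, "Chidi", None)],
)
# Ada and Bola are married to each other; Chidi is not married to an employee.
conn.execute("UPDATE employee SET SpouseId = 2 WHERE EmployeeId = 1")
conn.execute("UPDATE employee SET SpouseId = 1 WHERE EmployeeId = 2")

# Self-join: the same table appears twice under different aliases.
# The "<" condition lists each couple once instead of twice.
couples = conn.execute("""
    SELECT e.Name, s.Name
    FROM employee AS e JOIN employee AS s ON e.SpouseId = s.EmployeeId
    WHERE e.EmployeeId < s.EmployeeId""").fetchall()
print(couples)  # [('Ada', 'Bola')]
```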
A ternary relationship involves three entities and is used when a binary relationship is
inadequate. Many modeling approaches recognize only binary relationships. Ternary or
n-ary relationships are decomposed into two or more binary relationships.
Employees can be assigned to no more than two projects at the same time.
A single employee can be assigned to many projects; conversely, a single project can
have many employees assigned to it. Here the cardinality for the relationship between
employees and projects is two, and the cardinality between projects and employees is three.
Many-to-many relationships cannot be directly translated to relational tables but instead
must be transformed into two or more one-to-many relationships using associative
entities.
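A minimal sketch of that transformation for the employee/project example, using Python's sqlite3 module: a hypothetical assignment table acts as the associative entity, so the many-to-many relationship becomes two one-to-many relationships (employee to assignment, and project to assignment). The table and column names and the sample data are assumptions; the composite primary key stops the same employee from being assigned to the same project twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE employee (EmployeeId INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE project (ProjectId INTEGER PRIMARY KEY, Title TEXT)")
# Associative entity: one row per (employee, project) pairing.
conn.execute("""CREATE TABLE assignment (
    EmployeeId INTEGER NOT NULL REFERENCES employee(EmployeeId),
    ProjectId  INTEGER NOT NULL REFERENCES project(ProjectId),
    PRIMARY KEY (EmployeeId, ProjectId))""")

conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ada"), (2, "Bola")])
conn.executemany("INSERT INTO project VALUES (?, ?)", [(10, "Payroll"), (20, "Website")])
conn.executemany("INSERT INTO assignment VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])

# Many projects per employee...
ada_projects = [t for (t,) in conn.execute("""
    SELECT p.Title FROM project p JOIN assignment a ON a.ProjectId = p.ProjectId
    WHERE a.EmployeeId = 1 ORDER BY p.Title""")]
# ...and many employees per project.
payroll_staff = [n for (n,) in conn.execute("""
    SELECT e.Name FROM employee e JOIN assignment a ON a.EmployeeId = e.EmployeeId
    WHERE a.ProjectId = 10 ORDER BY e.Name""")]
print(ada_projects)   # ['Payroll', 'Website']
print(payroll_staff)  # ['Ada', 'Bola']
```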
(c) Direction
(d) Type
An identifying relationship is one in which one of the child entities is also a dependent
entity. A non-identifying relationship is one in which both entities are independent.
(e) Existence
Existence denotes whether the existence of an entity instance is dependent upon the
existence of another, related, entity instance. The existence of an entity in a relationship
is defined as either mandatory or optional. If an instance of an entity must always occur
for an entity to be included in a relationship, then it is mandatory. An example of
mandatory existence is the statement "every project must be managed by a single
department". If the instance of the entity is not required, it is optional. An example of
optional existence is the statement, "employees may be assigned to work on projects".
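In a relational implementation, mandatory existence is commonly expressed with a NOT NULL foreign key and optional existence with a nullable one. The sketch below (Python's sqlite3 module) assumes hypothetical department, project and employee tables. As a simplifying assumption it gives each employee a single, optional ProjectId, whereas a real design allowing several projects per employee would use an associative table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (DeptId INTEGER PRIMARY KEY, Name TEXT)")
# Mandatory: every project must be managed by a department (NOT NULL foreign key).
conn.execute("""CREATE TABLE project (
    ProjectId INTEGER PRIMARY KEY,
    DeptId INTEGER NOT NULL REFERENCES department(DeptId))""")
# Optional: an employee may or may not be working on a project (nullable foreign key).
conn.execute("""CREATE TABLE employee (
    EmployeeId INTEGER PRIMARY KEY,
    ProjectId INTEGER REFERENCES project(ProjectId))""")

conn.execute("INSERT INTO department VALUES (1, 'Accounts')")
conn.execute("INSERT INTO project VALUES (10, 1)")
conn.execute("INSERT INTO employee VALUES (100, NULL)")  # allowed: existence is optional


def accepted(sql):
    """Return True if the DBMS accepts the statement, False on a constraint error."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


print(accepted("INSERT INTO project VALUES (20, NULL)"))  # False: existence is mandatory
```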
A generalization hierarchy is a form of abstraction that specifies that two or more entities
that share common attributes can be generalized into a higher-level entity type called a
supertype or generic entity. The lower-level entities become the subtypes, or categories,
of the supertype. Subtypes are dependent entities.
Generalization occurs when two or more entities represent categories of the same real-
world object. For example, Wages_Employees and Classified_Employees represent
categories of the same entity, Employees. In this example, Employees would be the
supertype; Wages_Employees and Classified_Employees would be the subtypes.
Generalization hierarchies can be nested. That is, a subtype of one hierarchy can be a
supertype of another. The level of nesting is limited only by the constraint of simplicity.
Subtype entities may be the parent entity in a relationship but not the child.
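One common way (among several) to implement a generalization hierarchy is a table for the supertype plus one table per subtype that shares the supertype's primary key. The sketch below, using Python's sqlite3 module, assumes hypothetical columns for Employees, Wages_Employees and Classified_Employees: shared attributes live in the supertype table, subtype-specific attributes live in the subtype tables, and each subtype row depends on a supertype row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Supertype: attributes shared by every employee.
conn.execute("CREATE TABLE employees (EmployeeId INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
# Subtypes: each row is identified by (and depends on) a supertype row.
conn.execute("""CREATE TABLE wages_employees (
    EmployeeId INTEGER PRIMARY KEY REFERENCES employees(EmployeeId),
    HourlyRate REAL)""")
conn.execute("""CREATE TABLE classified_employees (
    EmployeeId INTEGER PRIMARY KEY REFERENCES employees(EmployeeId),
    AnnualSalary REAL)""")

conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Ada"), (2, "Bola")])
conn.execute("INSERT INTO wages_employees VALUES (1, 12.5)")
conn.execute("INSERT INTO classified_employees VALUES (2, 48000)")

# Reassemble each employee with a LEFT JOIN per subtype to find the category.
rows = conn.execute("""
    SELECT e.Name,
           CASE WHEN w.EmployeeId IS NOT NULL THEN 'Wages'
                WHEN c.EmployeeId IS NOT NULL THEN 'Classified' END AS Category
    FROM employees e
    LEFT JOIN wages_employees w ON w.EmployeeId = e.EmployeeId
    LEFT JOIN classified_employees c ON c.EmployeeId = e.EmployeeId
    ORDER BY e.EmployeeId""").fetchall()
print(rows)  # [('Ada', 'Wages'), ('Bola', 'Classified')]
```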
3.7 ER Notation
There is no standard for representing data objects in ER diagrams. Each modeling
methodology uses its own notation. Today, a number of notations are in use; among
the more common are Bachman, crow's foot, and IDEF1X.
All notational styles represent entities as rectangular boxes and relationships as lines
connecting boxes. Each style uses a special set of symbols to represent the cardinality of
a connection. The symbols used for the basic ER constructs are:
i. Entities are represented by labeled rectangles. The label is the name of the entity.
Entity names should be singular nouns.
ii. Relationships are represented by a solid line connecting two entities. The name of
the relationship is written above the line. Relationship names should be verbs.
iii. Attributes, when included, are listed inside the entity rectangle. Attributes which
are identifiers are underlined. Attribute names should be singular nouns.
iv. Cardinality of many is represented by a line ending in a crow's foot. If the crow's
foot is omitted, the cardinality is one.
v. Existence is represented by placing a circle or a perpendicular bar on the line.
Mandatory existence is shown by the bar (looks like a 1) next to the entity for which an
instance is required. Optional existence is shown by placing a circle next to the
entity that is optional.
Activity B
1. Come up with a list of attributes for each of the entities in section 3.6.1.
2. Choose one of your attributes as the identifier for each of the entities.
4.0 Conclusion
The data model is a relatively small part of the total systems specification but has a high
impact on the quality and useful life of the system. Time spent producing the best
possible design is very likely to be repaid many times over in the future.
5.0 Summary
In this unit, we have learnt that:
1.0 Introduction
2.0 Objectives
3.0 Requirements Analysis
3.1 Steps in Building the Data Model
3.2 Identifying Data Objects and Relationships
3.2.1 Entities
3.2.2 Attributes
3.2.3 Relationship
3.2.4 Naming Data Objects
3.3 Developing the Basic Schema
3.3.1 Binary Relationships
3.3.2 Recursive Relationships
3.4 Refining the Entity-Relationship Diagram
3.4.1 Entities must participate in a Relationship
3.4.2 Resolve many-to-many Relationships
3.4.3 Eliminate redundant relationships
3.5 Set Primary and Foreign Keys
3.5.1 Define Primary Key Attributes
3.5.2 Foreign Keys
3.6 Adding Attributes to the Model
3.6.1 Relate attributes to entities
3.6.2 Parent-Child Relationships
3.6.3 Multivalued Attributes
3.6.4 Attributes That Describe Relations
3.6.5 Derived Attributes and Code Values
3.7 Generalization Hierarchies
3.7.1 Description
3.7.2 Creating a Generalization Hierarchy
3.7.3 Types of Hierarchies
3.7.4 Rules
3.8 Add Data Integrity Rules
3.9 Domains
4.0 Conclusion
5.0 Summary
6.0 Tutor Marked Assignment
7.0 Further Reading and Other Resources
1.0 Introduction
The data model is one part of the conceptual design process. The other is the function
model. The data model focuses on what data should be stored in the database while the
function model deals with how the data is processed. To put this in the context of the
relational database, the data model is used to design the relational tables. The function
model is used to design the queries that will access and perform operations on those
tables.
Data modeling is preceded by planning and analysis. The effort devoted to this stage is
proportional to the scope of the database. The planning and analysis of a database
intended to serve the needs of an enterprise will require more effort than one intended to
serve a small workgroup.
The information needed to build a data model is gathered during the requirements
analysis. Although not formally considered part of the data modeling stage by some
methodologies, in reality the requirements analysis and the ER diagramming part of the
data model are done at the same time.
2.0 Objectives
By the end of this unit, you should be able to:
a. Know what data modeling and Entity-Relationship modeling are all about
b. Understand the E-R modeling constructs
c. Identify an entity in an E-R model
d. Know what a relationship is in the E-R model
e. Draw diagrams of relationships in the E-R model
f. Know the advantages of using a database management system
The modeler, or modelers, work with the end users of an organization to determine the
data requirements of the database. Information needed for the requirements analysis can
be gathered in several ways:
The requirements analysis is usually done at the same time as the data modeling. As
information is collected, data objects are identified and classified as entities,
attributes, or relationships; assigned names; and defined using terms familiar to the
end-users. The objects are then modeled and analysed using an ER diagram. The diagram can
be reviewed by the modeler and the end-users to determine its completeness and
accuracy. If the model is not correct, it is modified, which sometimes requires additional
information to be collected. The review and edit cycle continues until the model is
certified as correct.
In order to begin constructing the basic model, the modeler must analyze the information
gathered during the requirements analysis for the purpose of:
To accomplish these goals, the modeler must analyze narratives from users, notes from
meetings, policy and procedure documents, and, if lucky, design documents from the
current information system.
While the definitions of the constructs in the ER Model are simple, the model does not
address the fundamental issue of how to identify them. Some commonly given guidelines
are:
3.2.1 Entities
3.2.2 Attributes
Attributes are data objects that either identify or describe entities. Attributes that identify
entities are called key attributes. Attributes that describe an entity are called non-key
attributes.
Attribute values should be atomic, that is, present a single fact. Having disaggregated
data allows simpler programming, greater reusability of data, and easier implementation
of changes. Normalization also depends upon the "single fact" rule being followed.
Common types of violations include:
Two areas where data modeling experts disagree are whether derived attributes, and
attributes whose values are codes, should be permitted in the data model. Those in favor
of including derived attributes argue that:
a. derived data is often important to both managers and users and therefore should
be included in the data model;
b. it is just as important, perhaps more so, to document derived attributes as it is to
document other attributes;
c. including derived attributes in the data model does not imply how they will be
implemented.
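To make the last point concrete, here is a minimal Python sketch of one way a derived attribute could be implemented: computed on demand from the attributes it depends on, rather than stored. The OrderLine entity and its quantity, unit_price and line_total attributes are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class OrderLine:
    """Hypothetical entity with two stored attributes and one derived one."""
    quantity: int
    unit_price: float

    @property
    def line_total(self) -> float:
        # Derived attribute: recomputed from the stored attributes on every
        # access, so it can never disagree with them.
        return self.quantity * self.unit_price


line = OrderLine(quantity=3, unit_price=2.5)
print(line.line_total)  # 7.5
line.quantity = 4
print(line.line_total)  # 10.0
```

Storing line_total in a column instead would also be a valid implementation of the same model; the data model only records that the attribute exists and how it is derived.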
A coded value uses one or more letters or numbers to represent a fact. For example, the
attribute Gender might use the letters "M" and "F" as values rather than "Male" and
"Female". Those who are against this practice argue that codes have no intuitive meaning to
the end-users and add complexity to processing data. Those in favor argue that many
organizations have a long history of using coded attributes, that codes save space, and
that they improve flexibility, because values can easily be added or modified by means of
look-up tables.
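The look-up table approach can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, assuming hypothetical gender_code and employee tables: the foreign key makes the database reject any code that is not listed in the look-up table, and a join recovers the readable description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE gender_code (
        code        TEXT PRIMARY KEY NOT NULL,
        description TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        gender TEXT NOT NULL REFERENCES gender_code (code)
    );
    INSERT INTO gender_code VALUES ('M', 'Male'), ('F', 'Female');
    INSERT INTO employee VALUES (1, 'Ada', 'F');
""")

# The join turns the stored code back into a meaningful value.
row = conn.execute("""
    SELECT e.name, g.description
    FROM employee AS e JOIN gender_code AS g ON g.code = e.gender
""").fetchone()
print(row)  # ('Ada', 'Female')

# A code missing from the look-up table is rejected by the foreign key.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'Bola', 'X')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Adding a new permitted code only requires inserting a row into gender_code; no table definition has to change.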
3.2.3 Relationships
Here the cardinality of the relationship from employees to projects is three; from projects
to employees, the cardinality is two. Therefore, this relationship can be classified as a
many-to-many relationship.
Mandatory relationships, on the other hand, are indicated by words such as "must have".
For example:
In the case of the specific relationship form (1:1 and 1:M), there is always a parent entity
and a child entity. In one-to-many relationships, the parent is always the entity with the
cardinality of one. In one-to-one relationships, the choice of the parent entity must be
made in the context of the business being modeled. If a decision cannot be made, the
choice is arbitrary.
Names for data objects should be:
• unique
• meaningful to the end-user
• made up of the minimum number of words needed to uniquely and accurately
describe the object
For entities and attributes, names are singular nouns while relationship names are
typically verbs.
Some authors advise against using abbreviations or acronyms because they might lead to
confusion about what they mean. Others believe that abbreviations or acronyms are
acceptable provided that they are universally used and understood within the
organization.
You should also take care to identify and resolve synonyms for entities and attributes.
This can happen in large projects where different departments use different terms for the
same thing.
Once entities and relationships have been identified and defined, the first draft of the
entity relationship diagram can be created. This section introduces the ER diagram by
demonstrating how to diagram binary relationships. Recursive relationships are also
shown.
(a) One-To-One: Figure 3.1A shows an example of a one-to-one diagram. Reading the
diagram from left to right represents the relationship "every employee is assigned a
workstation". Because every employee must have a workstation, the symbol for
mandatory existence (in this case, the crossbar) is placed next to the WORKSTATION
entity. Reading from right to left, the diagram shows that not all workstations are assigned
to employees. This condition may reflect that some workstations are kept as spares or for
loans. Therefore, we use the symbol for optional existence, the circle, next to
EMPLOYEE. The cardinality and existence of a relationship must be derived from the
"business rules" of the organization. For example, if all workstations owned by an
organization were assigned to employees, then the circle would be replaced by a crossbar
to indicate mandatory existence. One-to-one relationships are rarely seen in "real-world"
data models.
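As a rough sketch of how these business rules might eventually be enforced in a relational database, the example below (Python's sqlite3 module, hypothetical table and column names) places the foreign key in EMPLOYEE. NOT NULL captures the mandatory side, and UNIQUE keeps the relationship one-to-one; this is one possible design, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE workstation (
        ws_id TEXT PRIMARY KEY NOT NULL
    );
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        -- NOT NULL: every employee must have a workstation (mandatory).
        -- UNIQUE: a workstation is assigned to at most one employee.
        ws_id  TEXT NOT NULL UNIQUE REFERENCES workstation (ws_id)
    );
    INSERT INTO workstation VALUES ('WS1'), ('WS2'), ('WS3');  -- WS3 is a spare
    INSERT INTO employee VALUES (1, 'WS1'), (2, 'WS2');
""")


def try_insert(sql: str) -> bool:
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("INSERT INTO employee VALUES (3, 'WS1')"),  # WS1 is already taken
    try_insert("INSERT INTO employee VALUES (4, NULL)"),   # workstation is mandatory
    try_insert("INSERT INTO employee VALUES (5, 'WS3')"),  # the spare can be assigned
]
print(results)  # [False, False, True]
```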
relationship reflects the "business rule" that not all departments in the organization will
be responsible for managing projects. Reading from right to left, the diagram tells us that
every project must be the responsibility of exactly one department.
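The department/project rule can be sketched the same way. In this minimal example (hypothetical names, Python's sqlite3 module) the foreign key sits in PROJECT, the "many" side, and is declared NOT NULL because every project must belong to exactly one department; a department with no projects simply has no matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (
        dept_id TEXT PRIMARY KEY NOT NULL
    );
    CREATE TABLE project (
        proj_id TEXT PRIMARY KEY NOT NULL,
        -- Foreign key on the "many" side; NOT NULL because every project
        -- must be the responsibility of exactly one department.
        dept_id TEXT NOT NULL REFERENCES department (dept_id)
    );
    INSERT INTO department VALUES ('HR'), ('IT');  -- HR manages no projects
    INSERT INTO project VALUES ('P1', 'IT'), ('P2', 'IT');
""")

counts = conn.execute("""
    SELECT d.dept_id, COUNT(p.proj_id)
    FROM department AS d LEFT JOIN project AS p ON p.dept_id = d.dept_id
    GROUP BY d.dept_id
    ORDER BY d.dept_id
""").fetchall()
print(counts)  # [('HR', 0), ('IT', 2)]
```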
A recursive relationship is a relationship in which an entity is associated with itself.
Figure 3.2 shows an example of a recursive relationship.
An employee may manage many employees, and each employee is managed by one
employee.
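A recursive relationship is commonly implemented as a foreign key that refers back to its own table. The sqlite3 sketch below assumes a hypothetical manager_id column and, as an assumption beyond the stated rule, lets it be NULL so that the employee at the top of the hierarchy can be stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        -- Foreign key to the same table. NULL is allowed only for the
        -- employee at the top, who has no manager (an assumption).
        manager_id INTEGER REFERENCES employee (emp_id)
    );
    INSERT INTO employee VALUES
        (1, 'Grace', NULL), (2, 'Alan', 1), (3, 'Edsger', 1), (4, 'Barbara', 2);
""")

# Who does Grace manage? Join the table to itself.
reports = conn.execute("""
    SELECT e.name
    FROM employee AS e JOIN employee AS m ON e.manager_id = m.emp_id
    WHERE m.name = 'Grace'
    ORDER BY e.name
""").fetchall()
print(reports)  # [('Alan',), ('Edsger',)]
```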
Notice that the schema changes the semantics of the original relation to
1. identify and define the primary key attributes for each entity
2. validate primary keys and relationships
3. migrate the primary keys to establish foreign keys
To qualify as a primary key for an entity, an attribute must have the following properties:
In some instances, an entity will have more than one attribute that can serve as a primary
key. Any attribute, or minimal set of attributes, that could serve as the primary key is called
a candidate key. Once candidate keys are identified, choose one, and only one, primary key
for each entity. Choose the identifier most commonly used by the user as long as it conforms
to the properties listed above. Candidate keys which are not chosen as the primary key are
known as alternate keys.
An example of an entity that could have several possible primary keys is Employee. Let's
assume that for each employee in an organization there are three candidate keys:
Employee ID, Social Security Number, and Name.
Name is the least desirable candidate. While it might work for a small department where
it would be unlikely that two people would have exactly the same name, it would not
work for a large organization that had hundreds or thousands of employees. Moreover,
there is the possibility that an employee's name could change because of marriage.
Employee ID would be a good candidate as long as each employee was assigned a unique
identifier at the time of hire. Social Security Number would work best since every employee
is required to have one before being hired.
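Checking sample data can help rule candidate keys out, though it can never prove that an attribute is a key; only the business rules can do that. The Python sketch below searches a small, made-up EMPLOYEE sample for minimal attribute sets whose values are unique. The attribute names, the dept attribute and the rows are hypothetical.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal attribute sets whose values are unique in this sample."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # A superset of a key already found is not minimal.
            if any(set(key) <= set(combo) for key in keys):
                continue
            if is_unique(rows, combo):
                keys.append(combo)
    return keys


rows = [
    {"emp_id": 1, "ssn": "111-11-1111", "name": "J. Okafor", "dept": "IT"},
    {"emp_id": 2, "ssn": "222-22-2222", "name": "A. Bello", "dept": "IT"},
    {"emp_id": 3, "ssn": "333-33-3333", "name": "J. Okafor", "dept": "HR"},
]
print(candidate_keys(rows, ["emp_id", "ssn", "name", "dept"]))
# [('emp_id',), ('ssn',), ('name', 'dept')]
```

Note that (name, dept) appears only because this sample happens to contain no clash; nothing in the business rules guarantees it, which is the same weakness that makes Name a poor choice.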
Sometimes more than one attribute is required to uniquely identify an entity. A primary
key that is made up of more than one attribute is known as a composite key. Figure 3.4
shows an example of a composite key. Each instance of the entity Work can be uniquely
identified only by a composite key composed of Employee ID and Project ID.
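A composite primary key like the one for WORK can be declared directly in SQL. In this minimal sqlite3 sketch, the hours column is a hypothetical non-key attribute; the database rejects a second row for the same employee and project pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work (
        employee_id INTEGER NOT NULL,
        project_id  INTEGER NOT NULL,
        hours       REAL,                      -- hypothetical non-key attribute
        PRIMARY KEY (employee_id, project_id)  -- composite key
    );
    INSERT INTO work VALUES (1, 10, 5.0), (1, 20, 3.0), (2, 10, 8.0);
""")

# Neither column is unique on its own, but the pair is.
try:
    conn.execute("INSERT INTO work VALUES (1, 10, 2.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```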
An artificial key is one that has no meaning to the business or organization. Artificial keys
are permitted when
1) no attribute has all the primary key properties, or
2) the primary key is large and complex.
Dependent entities, entities that depend on the existence of another entity for their
identification, inherit the entire primary key from the parent entity. Every entity within a
generalization hierarchy inherits the primary key of the root generic entity.
Once the keys have been identified for the model, it is time to name and define the
attributes that have been used as keys.
There is no standard method for representing primary keys in ER diagrams. For this
document, the name of the primary key followed by the notation (PK) is written inside
the entity box. An example is shown in Figure 3.5A.
Basic rules governing the identification and migration of primary keys are:
a. Every entity in the data model shall have a primary key whose values uniquely
identify entity instances.
b. The primary key attribute cannot be optional (i.e., it cannot have null values).
c. The primary key cannot have repeating values. That is, the attribute may not have
more than one value at a time for a given entity instance. This is known as the No
Repeat Rule.
d. Entities with compound primary keys cannot be split into multiple entities with
simpler primary keys. This is called the Smallest Key Rule.
e. Two entities may not have identical primary keys, with the exception of entities
within generalization hierarchies.
f. The entire primary key must migrate from parent entities to child entities, and
from supertype (generic) entities to subtype (category) entities.
Every dependent and category (subtype) entity in the model must have a foreign key for
each relationship in which it participates. Foreign keys are formed in dependent and
subtype entities by migrating the entire primary key from the parent or generic entity. If
the primary key is composite, it may not be split.
Foreign key attributes are not considered to be owned by the entities to which they
migrate, because they are reflections of attributes in the parent entities. Thus, each
attribute in an entity is either owned by that entity or belongs to a foreign key in that
entity.
If the primary key of a child entity contains all the attributes in a foreign key, the child
entity is said to be "identifier dependent" on the parent entity, and the relationship is
called an "identifying relationship." If any attributes in a foreign key do not belong to the
child's primary key, the child is not identifier dependent on the parent, and the
relationship is called "non-identifying."
Foreign key attributes are indicated by the notation (FK) beside them. An example is
shown in Figure 3.5 (B) above.
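The identifying/non-identifying distinction reduces to a subset test, sketched here in Python. The function name and the key sets passed to it are illustrative assumptions.

```python
def relationship_kind(child_pk: set, foreign_key: set) -> str:
    """Classify a parent-child relationship from the child's key attributes."""
    # Identifying: every foreign key attribute is part of the child's primary key.
    if foreign_key <= child_pk:
        return "identifying"
    return "non-identifying"


# WORK's primary key contains the whole foreign key to EMPLOYEE.
print(relationship_kind({"employee_id", "project_id"}, {"employee_id"}))  # identifying
# A hypothetical PROJECT carries dept_id, which is outside its primary key.
print(relationship_kind({"proj_id"}, {"dept_id"}))  # non-identifying
```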
Activity A
1. Explain the following terms in relation to data objects
i. Primary Key
ii. Candidate Key
iii. Composite Key
iv. Artificial Key
v. Foreign Key
2. What do you understand by Binary relationship in database design?
Non-key attributes describe the entities to which they belong. In this section, we discuss
the rules for assigning non-key attributes to entities and how to handle multivalued
attributes.
The process of relating attributes to entities begins with the modeler, with the assistance
of the end-users, placing attributes with the entities that they appear to describe. Once this
is completed, the assignments are validated by the formal method of normalization.
Before beginning formal normalization, the rule is to place non-key attributes in entities
where the value of the primary key determines the values of the attributes. In general,
entities with the same primary key should be combined into one entity. Some other
guidelines for relating attributes to entities are given below.
If an attribute is dependent upon the primary key but is multivalued (that is, it has more
than one value for a particular value of the key), reclassify the attribute as a new child
entity. If the multivalued attribute is unique within the new entity, it becomes the primary
key. If not, migrate the primary key from the original, now parent, entity.
For example, assume an entity called PROJECT with the attributes Proj_ID (the key),
Proj_Name, Task_ID, and Task_Name.
Task_ID and Task_Name have multiple values for each value of the key attribute. The
solution is to create a new entity, let's call it TASK, and make it a child of PROJECT. Move
Task_ID and Task_Name from PROJECT to TASK. Since neither attribute uniquely identifies a
task, the final step is to migrate Proj_ID to TASK.
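The PROJECT/TASK decomposition can be traced on sample data. The minimal Python sketch below (hypothetical values) builds the two resulting relations; note that Task_ID T1 occurs in two projects, which is why Proj_ID has to migrate into TASK and become part of its key.

```python
# Hypothetical PROJECT rows before the change: the Task attributes repeat
# for each Proj_ID.
flat = [
    {"Proj_ID": "P1", "Proj_Name": "Payroll", "Task_ID": "T1", "Task_Name": "Design"},
    {"Proj_ID": "P1", "Proj_Name": "Payroll", "Task_ID": "T2", "Task_Name": "Build"},
    {"Proj_ID": "P2", "Proj_Name": "Intranet", "Task_ID": "T1", "Task_Name": "Design"},
]


def split_multivalued(rows):
    """Move the multivalued Task attributes out of PROJECT into a child TASK."""
    project = {}   # Proj_ID -> Proj_Name
    task = set()   # (Proj_ID, Task_ID, Task_Name), with Proj_ID migrated in
    for r in rows:
        project[r["Proj_ID"]] = r["Proj_Name"]
        task.add((r["Proj_ID"], r["Task_ID"], r["Task_Name"]))
    return project, sorted(task)


project, task = split_multivalued(flat)
print(project)  # {'P1': 'Payroll', 'P2': 'Intranet'}
print(task)     # [('P1', 'T1', 'Design'), ('P1', 'T2', 'Build'), ('P2', 'T1', 'Design')]
```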
In some cases, it appears that an attribute describes a relationship rather than an entity (in
the Chen notation of ER diagrams this is permissible). For example,
Possible attributes are the date the books were checked out and when they are due.
Typically, such a situation will occur with a many-to-many relationship, and the solution
is the same: reclassify the relationship as a new entity which is a child of both original
entities. In some methodologies, the newly created entity is called an associative entity.
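The library example might be implemented along the following lines. This sqlite3 sketch assumes hypothetical MEMBER, BOOK and LOAN tables; LOAN is the associative entity, and including checkout_date in its primary key (an assumption) allows the same member to borrow the same book more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE member (member_id INTEGER PRIMARY KEY);
    CREATE TABLE book   (book_id   INTEGER PRIMARY KEY);
    -- Associative entity: a child of both MEMBER and BOOK. The dates that
    -- seemed to describe the relationship become its ordinary attributes.
    CREATE TABLE loan (
        member_id     INTEGER NOT NULL REFERENCES member (member_id),
        book_id       INTEGER NOT NULL REFERENCES book (book_id),
        checkout_date TEXT NOT NULL,
        due_date      TEXT NOT NULL,
        PRIMARY KEY (member_id, book_id, checkout_date)
    );
    INSERT INTO member VALUES (1);
    INSERT INTO book VALUES (100), (101);
    INSERT INTO loan VALUES (1, 100, '2024-01-10', '2024-01-24');
""")

due = conn.execute(
    "SELECT due_date FROM loan WHERE member_id = 1 AND book_id = 100"
).fetchone()[0]
print(due)  # 2024-01-24

# A loan must refer to an existing member and an existing book.
try:
    conn.execute("INSERT INTO loan VALUES (1, 999, '2024-02-01', '2024-02-15')")
    unknown_book_accepted = True
except sqlite3.IntegrityError:
    unknown_book_accepted = False
print(unknown_book_accepted)  # False
```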
• derived data is often important to both managers and users and therefore should
be included in the data model.
• it is just as important, perhaps more so, to document derived attributes just as you
would other attributes
• including derived attributes in the data model does not imply how they will be
implemented.
A coded value uses one or more letters or numbers to represent a fact. For example, the
attribute Gender might use the letters "M" and "F" as values rather than "Male" and
"Female". Those who are against this practice argue that codes have no intuitive meaning to
the end-users and add complexity to processing data. Those in favor argue that many
organizations have a long history of using coded attributes, that codes save space, and
that they improve flexibility, because values can easily be added or modified by means of
look-up tables.
Up to this point, we have discussed describing an object, the entity, by its shared
characteristics, the attributes. For example, we can characterize an employee by their
employee id, name, job title, and skill set.
3.7.1 Description
In a disjoint hierarchy, an entity instance can be in only one subtype. For example, the
entity EMPLOYEE may have two subtypes, CLASSIFIED and WAGES. An employee
may be one type or the other, but not both. Figure 1 shows A) an overlapping and B) a
disjoint generalization hierarchy.
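One common way to enforce a disjoint hierarchy in SQL, sketched below with sqlite3, is a type (discriminator) column in the supertype that each subtype's composite foreign key must match. The emp_type column and the table layout are assumptions made for illustration, not part of the original model. This enforces "not both"; it does not by itself force every employee to appear in some subtype.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_type TEXT NOT NULL CHECK (emp_type IN ('CLASSIFIED', 'WAGES')),
        UNIQUE (emp_id, emp_type)  -- target of the subtype foreign keys
    );
    -- Each subtype pins emp_type to its own value, so an employee can only
    -- match the one subtype named in the supertype row.
    CREATE TABLE classified (
        emp_id   INTEGER PRIMARY KEY,
        emp_type TEXT NOT NULL DEFAULT 'CLASSIFIED' CHECK (emp_type = 'CLASSIFIED'),
        FOREIGN KEY (emp_id, emp_type) REFERENCES employee (emp_id, emp_type)
    );
    CREATE TABLE wages (
        emp_id   INTEGER PRIMARY KEY,
        emp_type TEXT NOT NULL DEFAULT 'WAGES' CHECK (emp_type = 'WAGES'),
        FOREIGN KEY (emp_id, emp_type) REFERENCES employee (emp_id, emp_type)
    );
    INSERT INTO employee VALUES (1, 'CLASSIFIED'), (2, 'WAGES');
    INSERT INTO classified (emp_id) VALUES (1);
    INSERT INTO wages (emp_id) VALUES (2);
""")

# Employee 1 is CLASSIFIED, so a WAGES row for it has no matching parent.
try:
    conn.execute("INSERT INTO wages (emp_id) VALUES (1)")
    wrong_subtype_rejected = False
except sqlite3.IntegrityError:
    wrong_subtype_rejected = True
print(wrong_subtype_rejected)  # True
```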
3.7.4 Rules
The primary rule of generalization hierarchies is that each instance of the supertype entity
must appear in at least one subtype; likewise, an instance of the subtype must appear in
the supertype.
Subtypes can be a part of only one generalization hierarchy. That is, a subtype cannot be
related to more than one supertype. However, generalization hierarchies may be nested
by having the subtype of one hierarchy be the supertype for another.
Subtypes may be the parent entity in a relationship but not the child. If this were allowed,
the subtype would inherit two primary keys.
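The first rule, and the disjointness condition from the previous section, can also be checked against data. This small Python function is a hypothetical helper rather than part of any tool: it reports supertype instances that appear in no subtype, subtype instances missing from the supertype and, for a disjoint hierarchy, instances that appear in more than one subtype.

```python
def check_hierarchy(supertype, subtypes, disjoint=False):
    """List violations of the generalization rules for one hierarchy.

    supertype: set of primary key values in the supertype entity.
    subtypes:  dict mapping each subtype name to its set of key values.
    """
    problems = []
    in_any_subtype = set().union(*subtypes.values())
    # Every supertype instance must appear in at least one subtype.
    for key in sorted(supertype - in_any_subtype):
        problems.append(f"{key} is in no subtype")
    # Every subtype instance must appear in the supertype.
    for key in sorted(in_any_subtype - supertype):
        problems.append(f"{key} is missing from the supertype")
    # In a disjoint hierarchy, an instance may belong to only one subtype.
    if disjoint:
        for key in sorted(supertype):
            count = sum(key in members for members in subtypes.values())
            if count > 1:
                problems.append(f"{key} is in {count} subtypes")
    return problems


print(check_hierarchy({1, 2, 3}, {"CLASSIFIED": {1}, "WAGES": {2, 4}}))
# ['3 is in no subtype', '4 is missing from the supertype']
print(check_hierarchy({1, 2}, {"CLASSIFIED": {1, 2}, "WAGES": {2}}, disjoint=True))
# ['2 is in 2 subtypes']
```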
Data integrity is one of the cornerstones of the relational model. Simply stated, data
integrity means that the data values in the database are correct and consistent.
Data integrity is enforced in the relational model by entity and referential integrity rules.
Although not part of the relational model, most database software enforces attribute
integrity through the use of domain information.
The entity integrity rule states that for every instance of an entity, the value of the
primary key must exist, be unique, and cannot be null. Without entity integrity, the
primary key could not fulfill its role of uniquely identifying each instance of an entity.
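Entity integrity corresponds to a PRIMARY KEY that is also NOT NULL. The sqlite3 sketch below uses a hypothetical COURSE table. NOT NULL is written out explicitly because SQLite, for historical reasons, accepts NULL in a PRIMARY KEY column unless the column is an INTEGER PRIMARY KEY or is declared NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out: without it, SQLite would accept NULL in this
# TEXT primary key column for backward-compatibility reasons.
conn.execute("CREATE TABLE course (course_no TEXT PRIMARY KEY NOT NULL, title TEXT)")
conn.execute("INSERT INTO course VALUES ('CIT843', 'Database Management System')")


def accepted(sql: str) -> bool:
    """Return True if the INSERT succeeds, False if an integrity rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = accepted("INSERT INTO course VALUES ('CIT843', 'Another title')")
null_ok = accepted("INSERT INTO course VALUES (NULL, 'No key')")
print(duplicate_ok, null_ok)  # False False
```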
A foreign key creates a hierarchical relationship between two associated entities. The
entity containing the foreign key is the child, or dependent, and the entity containing the
primary key from which the foreign key values are obtained is the parent.
In order to maintain referential integrity between the parent and child as data is inserted
into or deleted from the database, certain insert and delete rules must be considered. The
insert rules are:
i. Dependent. The dependent insert rule permits the insertion of a child entity instance
only if a matching parent entity instance already exists.
ii. Automatic. The automatic insert rule always permits the insertion of a child entity
instance. If a matching parent entity instance does not exist, it is created.
iii. Nullify. The nullify insert rule always permits the insertion of a child entity
instance. If a matching parent entity instance does not exist, the foreign key in the
child is set to null.
iv. Default. The default insert rule always permits the insertion of a child entity instance.
If a matching parent entity instance does not exist, the foreign key in the child is
set to a previously defined value.
v. Customized. The customized insert rule permits the insertion of a child entity
instance only if certain customized validity constraints are met.
vi. No Effect. This rule states that the insertion of a child entity instance is always
permitted. No matching parent entity instance need exist, and thus no validity
checking is done.
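An SQL foreign key gives the dependent insert rule directly; most of the other rules need extra logic. The sqlite3 sketch below contrasts the dependent rule with an emulation of the automatic rule using a trigger. Table and trigger names are hypothetical, and a trigger is only one of several ways to get this behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (dept_id TEXT PRIMARY KEY NOT NULL);

    -- Dependent insert rule: an ordinary foreign key.
    CREATE TABLE project_dep (
        proj_id TEXT PRIMARY KEY NOT NULL,
        dept_id TEXT NOT NULL REFERENCES department (dept_id)
    );

    -- Automatic insert rule, emulated by a trigger that creates a missing
    -- parent row before the child row is inserted.
    CREATE TABLE project_auto (
        proj_id TEXT PRIMARY KEY NOT NULL,
        dept_id TEXT NOT NULL REFERENCES department (dept_id)
    );
    CREATE TRIGGER project_auto_parent BEFORE INSERT ON project_auto
    BEGIN
        INSERT OR IGNORE INTO department (dept_id) VALUES (NEW.dept_id);
    END;
""")


def try_insert(sql: str) -> bool:
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


dependent_ok = try_insert("INSERT INTO project_dep VALUES ('P1', 'SALES')")
automatic_ok = try_insert("INSERT INTO project_auto VALUES ('P2', 'SALES')")
departments = [r[0] for r in conn.execute("SELECT dept_id FROM department")]
print(dependent_ok, automatic_ok, departments)  # False True ['SALES']
```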
The delete rules are:
i. Restrict. The restrict delete rule permits the deletion of a parent entity instance only if
there are no matching child entity instances.
ii. Cascade. The cascade delete rule always permits the deletion of a parent entity
instance and deletes all matching instances in the child entity.
iii. Nullify. The nullify delete rule always permits the deletion of a parent entity
instance. If any matching child entity instances exist, the values of the foreign
keys in those instances are set to null.
iv. Default. The default delete rule always permits the deletion of a parent entity instance. If
any matching child entity instances exist, the values of the foreign keys are set to a
predefined default value.
v. Customized. The customized delete rule permits the deletion of a parent entity
instance only if certain validity constraints are met.
vi. No Effect. The no effect delete rule always permits the deletion of a parent entity
instance. No validity checking is done.
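Several of the delete rules map onto standard SQL referential actions: Restrict to ON DELETE RESTRICT, Cascade to ON DELETE CASCADE, Nullify to ON DELETE SET NULL, and Default to ON DELETE SET DEFAULT. The minimal sqlite3 sketch below, with hypothetical parent and child tables, demonstrates the first three.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child_restrict (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent (id) ON DELETE RESTRICT
    );
    CREATE TABLE child_cascade (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent (id) ON DELETE CASCADE
    );
    CREATE TABLE child_nullify (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent (id) ON DELETE SET NULL
    );
    INSERT INTO parent VALUES (1), (2);
    INSERT INTO child_restrict VALUES (10, 1);
    INSERT INTO child_cascade VALUES (20, 2);
    INSERT INTO child_nullify VALUES (30, 2);
""")

# Restrict: parent 1 still has a matching child, so the delete is refused.
try:
    conn.execute("DELETE FROM parent WHERE id = 1")
    restrict_blocked = False
except sqlite3.IntegrityError:
    restrict_blocked = True

# Parent 2: the cascade child is deleted, the nullify child keeps a NULL key.
conn.execute("DELETE FROM parent WHERE id = 2")
cascade_left = conn.execute("SELECT COUNT(*) FROM child_cascade").fetchone()[0]
nullified = conn.execute("SELECT parent_id FROM child_nullify WHERE id = 30").fetchone()[0]
print(restrict_blocked, cascade_left, nullified)  # True 0 None
```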
3.9 Domains
A domain is a valid set of values for an attribute, which ensures that values from an insert
or update make sense. Each attribute in the model should be assigned domain information,
which includes:
a. Data Type: Basic data types are integer, decimal, or character. Most databases
support variants of these plus special data types for date and time.
b. Length: This is the number of digits or characters in the value. For example, a
value of 5 digits or 40 characters.
c. Date Format: The format for date values, such as dd/mm/yy or yy/mm/dd.
d. Range: The range specifies the lower and upper boundaries of the values the
attribute may legally have.
e. Constraints: These are special restrictions on allowable values. For example, the
Beginning_Pay_Date for a new employee must always be the first work day of
the month of hire.
f. Null support: Indicates whether the attribute can have null values.
g. Default value (if any): The value an attribute instance will have if a value is not
entered.
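Most of this domain information can be declared in the table definition. The sqlite3 sketch below is a minimal illustration with a hypothetical EMPLOYEE table; the 1 to 17 grade range and the yyyy-mm-dd date format are assumptions. SQLite's column types are loose by default, so the data type is enforced here with a CHECK on typeof().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        -- Length and null support: 1 to 40 characters, never NULL.
        name      TEXT NOT NULL CHECK (length(name) BETWEEN 1 AND 40),
        -- Data type and range: an integer from 1 to 17.
        grade     INTEGER NOT NULL
                  CHECK (typeof(grade) = 'integer' AND grade BETWEEN 1 AND 17),
        -- Date format: date() returns its input unchanged only for a valid yyyy-mm-dd date.
        hire_date TEXT NOT NULL CHECK (hire_date IS date(hire_date)),
        -- Default value, used when no status is entered.
        status    TEXT NOT NULL DEFAULT 'ACTIVE'
    )
""")


def accepted(name, grade, hire_date):
    """Try to insert one employee; report whether the domain rules allowed it."""
    try:
        conn.execute(
            "INSERT INTO employee (name, grade, hire_date) VALUES (?, ?, ?)",
            (name, grade, hire_date),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    accepted("Ada", 7, "2006-10-19"),    # valid
    accepted("Bola", 20, "2006-10-19"),  # grade out of range
    accepted("Chidi", 7, "19/10/06"),    # wrong date format
]
print(results)  # [True, False, False]
print(conn.execute("SELECT status FROM employee").fetchall())  # [('ACTIVE',)]
```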
The values of primary keys must be unique, and nulls are not allowed.
The data type, length, and format of a foreign key must be the same as those of the
corresponding primary key. The uniqueness property must be consistent with the
relationship type: a one-to-one relationship implies a unique foreign key; a one-to-many
relationship implies a non-unique foreign key.
Activity B
1. What do you understand by the term Generalization Hierarchy?
2. Explain the following terms:
i. Entity Integrity
ii. Referential integrity
iii. Entity
iv. Attributes
v. Relationship
vi. Supertypes
4.0 Conclusion
The data modeling stage is a very important stage in database or information system
design. Problems will arise, either now or in the near future, if proper and complete data
modeling is not done.
5.0 Summary
In this unit, we have learnt that:
i. Data modeling must be preceded by planning and analysis. Planning defines the
goals of the database, explains why the goals are important, and sets out the path
by which the goals will be reached. Analysis involves determining the
requirements of the database. This is typically done by examining existing
documentation and interviewing users.
ii. An effective data model completely and accurately represents the data
requirements of the end users. It is simple enough to be understood by the end
user yet detailed enough to be used by a database designer to build the database.
The model eliminates redundant data, it is independent of any hardware and
i. Each student has a first and last name, and a student number.
ii. Each course has a course number (e.g. CIT843) and a title.
iii. A course will have multiple offerings, identified by year, term, and possibly
section.
iv. A student may enroll in multiple course offerings.
v. Each course offering divides its overall evaluation into one or more components
(e.g. Assignments, Quizzes, Seminars, Final Exam), each weighted some
specified fraction of the offering's final grade (e.g. assignments are worth 35%,
quizzes worth 35% and the final exam worth 30%).
vi. Each component is made up of one or more graded items (e.g. assignment #3 is a
single graded item).
vii. Each graded item records the order number of the item within its component, the
date of evaluation, and the maximum mark possible (e.g. quiz number four will be
held October 19, 2006, and it is out of a total of 10 marks).
viii. A student's mark is recorded for each graded item in a course offering.
ix. When evaluating some components, one or more of the graded items with the
lowest marks are dropped from the calculation.
x. In extenuating circumstances, an instructor may drop a student's mark from his or
her evaluation.
xi. At the end of the term, the student's final mark in a course offering is converted to
a letter grade and GPA.
1.0 Introduction
2.0 Objectives
3.0 Data Structure and Terminologies
3.1 Schema Conversion Rules
3.2 Null Values
3.3 Keys
3.3.1 Candidate Keys
3.3.2 Primary Keys
3.3.3 Foreign Keys
3.3.4 Surrogate Keys
3.4 Schema Diagram Notation
3.5 Conversion Specifics
3.5.1 One-to-One Relationships
3.5.2 One-to-Many Relationships
3.5.3 Many-to-Many Relationships
3.6 Relationship Participation
3.7 Subtype Entities
3.8 Reflexive Relationships
3.9 Properties of Relational Tables
3.10 Relational Data Integrity
4.0 Conclusion
5.0 Summary
6.0 Tutor Marked Assignment
7.0 Further Reading and Other Resources
1.0 Introduction
In the previous unit, we modeled the user's view as an E-R diagram, using entities,
relationships, attributes, and identifiers.
The relational model was formally introduced by Dr. E. F. Codd in 1970 and has evolved
since then through a series of writings. The model provides a simple, yet rigorously
defined, concept of how users perceive data. The relational model represents data in the
form of two-dimensional tables. Each table represents some real-world person, place, thing,
or event about which information is collected. A relational database is a collection of
two-dimensional tables. The organization of data into relational tables is known as the
logical view of the database, that is, the form in which a relational database presents data
to the user and the programmer. The way the database software physically stores the data
on a computer disk system is called the internal view.
2.0 Objectives
This unit discusses the basic concepts that are the basis of the relational model: data
structures, relationships, and data integrity.
There are alternate names used to describe relational tables. Some manuals use the terms
tables, fields, and records to describe relational tables, columns, and rows, respectively.
The formal literature tends to use the mathematical terms, relations, attributes, and tuples.
Figure 4.2 summarizes these naming conventions.
We will always follow this procedure from start to finish when converting a model to a
schema.
i. Convert all entities to tables. The entity name becomes the table name, and the
entity attributes become the table columns.
ii. Find the candidate keys for each table, and from them choose a primary key for
each table (if possible).
iii. Replace one-to-one, one-to-many, and subtype entity relationships with foreign
key columns in the appropriate tables.
iv. Replace many-to-many entity relationships with a new join table that contains
foreign key columns of the related tables.
v. Based on the participation of the entity relationships, declare the foreign key columns
to allow or disallow NULL values.
vi. Now that the foreign key columns are in place, find the candidate keys for the
tables again (including any newly-added tables), and select a primary key from
these. Add surrogate keys if necessary.
vii. Write down the functional dependencies between columns for each table and
validate the tables against the COMP210 interpretation of the first 5 normal forms
for potential modification anomalies. If there are problems, revisit the E-R model,
make corrections, and begin the schema conversion procedure again from the top.
viii. Translate the validated schema into SQL DDL and create the tables, indices, and
any referential or unique integrity constraints, in an RDBMS.
Relational databases introduce the useful, but occasionally misused, concept of a NULL
value. NULL should be thought of as the absence of any meaningful value: it is not the
same as zero, or false, or an empty string. Bear in mind that any arithmetic operation
involving a NULL will result in NULL: 2 + 0 = 2, but 2 + NULL is just NULL
again. Note that the equality operator (=) can't be used to test for NULL, because
NULL = NULL is not true, and, even more strangely, NULL <> NULL is not true either:
both comparisons evaluate to NULL (unknown). Use IS NULL and IS NOT NULL instead.
NULL values are most often used in columns to indicate either Unknown or Not
Applicable. Of the two, Not Applicable is the more "proper" interpretation, but both uses
are common.
A column in a table may be declared to either allow or disallow NULL values. Despite
their utility, columns that allow NULL values should be kept to an absolute minimum
(just as we tried to minimize the number of E-R relationships with optional participation).
An optional attribute in the E-R model will convert to a column that accepts NULL
values (that's why we tried to minimize the number of those too).
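The behaviour of NULL described above can be observed directly in SQLite through Python's sqlite3 module, which returns SQL NULL as None. The person table in the second half is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Arithmetic with NULL yields NULL; comparisons with NULL yield NULL (unknown).
row = conn.execute(
    "SELECT 2 + 0, 2 + NULL, NULL = NULL, NULL <> NULL, NULL IS NULL"
).fetchone()
print(row)  # (2, None, None, None, 1)

conn.execute("CREATE TABLE person (name TEXT NOT NULL, middle_name TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)", [("Ada", None), ("Alan", "Mathison")]
)
# "= NULL" never matches anything; IS NULL finds the missing value.
equals_count = conn.execute(
    "SELECT COUNT(*) FROM person WHERE middle_name = NULL"
).fetchone()[0]
is_null_count = conn.execute(
    "SELECT COUNT(*) FROM person WHERE middle_name IS NULL"
).fetchone()[0]
print(equals_count, is_null_count)  # 0 1
```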
3.3 Keys
Keys are very important to relational schemas; much more so than identifiers to E-R
models.
• A primary key must not contain any columns that allow NULL values.
• The value of a primary key, whether composed of one or many columns, must
never change.
As long as these restrictions are followed, the choice of primary key is largely a design
decision, often influenced by either performance or business concerns.
i. Performance: If the primary key for a table is a long text string, the RDBMS will
have to create an enormous index for the table. Foreign keys that reference this
table will be similarly huge. Surrogate keys are usually 32-bit integers and can
therefore be manipulated very efficiently.
ii. Primary keys must not change values: if, for example, the primary key for a table
is an employee's full name, the values of the primary key could potentially change
(through marriage, or for celebrity-induced reasons: "The Artist Formerly Known
As Prince"). A surrogate key is not affected by such changes, particularly if the
users of the system never see the surrogate key values.
iii. No other primary key can be found for a table.
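A minimal sqlite3 sketch of the second reason: the hypothetical employee table uses an INTEGER PRIMARY KEY surrogate, so when the person's name changes, the timesheet rows that refer to it are unaffected. In SQLite such a key is a 64-bit integer assigned automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,  -- surrogate key, assigned by SQLite
        full_name TEXT NOT NULL
    );
    CREATE TABLE timesheet (
        sheet_id INTEGER PRIMARY KEY,
        emp_id   INTEGER NOT NULL REFERENCES employee (emp_id),
        hours    REAL NOT NULL
    );
""")

emp_id = conn.execute(
    "INSERT INTO employee (full_name) VALUES ('Ngozi Ade')"
).lastrowid
conn.execute("INSERT INTO timesheet (emp_id, hours) VALUES (?, 40)", (emp_id,))

# The name changes (say, on marriage); the foreign key in timesheet does not.
conn.execute(
    "UPDATE employee SET full_name = 'Ngozi Bello' WHERE emp_id = ?", (emp_id,)
)
row = conn.execute("""
    SELECT e.full_name, t.hours
    FROM timesheet AS t JOIN employee AS e ON e.emp_id = t.emp_id
""").fetchone()
print(row)  # ('Ngozi Bello', 40.0)
```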
The notation used for relational schemas is arbitrary at best. There are no standards. I
have used a simplified version of Crow's Foot notation, although it's not unusual to just
use unmarked lines to connect tables and infer their relationship from the primary and
foreign keys. Microsoft Access's "Relationships" tool can also be used to draw schema
diagrams.
Primary key columns are underlined in the schema notation (as were identifier attributes)
and may also be followed by "(PK)". Foreign key columns are indicated by "(FKx)"
where x is a number assigned to the foreign key to distinguish it from other foreign keys
in the table. Columns that accept NULL values are identified by ": NULL" following the
column name. All other columns are assumed not to accept NULL values.
The actual mechanics of converting entities into tables are relatively straightforward. Each
of the special cases is described in the following sections.
Since USER_ACCOUNT already had a primary key (from the user_name identifier),
PERSON will receive the user_name foreign key column, and it will become the
primary key for the PERSON table.
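The USER_ACCOUNT/PERSON conversion can be sketched in SQL as follows (sqlite3). The password_hash and full_name columns are hypothetical; user_name serves as PERSON's primary key and as its foreign key, so each account has at most one person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE user_account (
        user_name     TEXT PRIMARY KEY NOT NULL,
        password_hash TEXT NOT NULL
    );
    -- user_name is both PERSON's primary key and its foreign key, so each
    -- account can be linked to at most one person.
    CREATE TABLE person (
        user_name TEXT PRIMARY KEY NOT NULL REFERENCES user_account (user_name),
        full_name TEXT NOT NULL
    );
    INSERT INTO user_account VALUES ('ada', 'x1f9');
    INSERT INTO person VALUES ('ada', 'Ada Obi');
""")


def try_insert(sql: str) -> bool:
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


second_person_ok = try_insert("INSERT INTO person VALUES ('ada', 'Someone Else')")
orphan_person_ok = try_insert("INSERT INTO person VALUES ('bob', 'Bob Eze')")
print(second_person_ok, orphan_person_ok)  # False False
```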
In this example, the foreign key is inserted into the REGION table because it lies on the
"many" side of the relationship. Since REGION already had a column named "name", it
was renamed to region_name and the foreign key was renamed to country_name so there
wouldn't be any confusion. The primary key for REGION can now be chosen as the
combination of country_name and region_name because both together guarantee
uniqueness (a country will not have two regions with the same name).
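The REGION table described above might be declared as follows (a sqlite3 sketch with made-up country and region values). Two countries may each have a region called North, but one country cannot have two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE country (name TEXT PRIMARY KEY NOT NULL);
    CREATE TABLE region (
        country_name TEXT NOT NULL REFERENCES country (name),  -- renamed foreign key
        region_name  TEXT NOT NULL,                            -- renamed from "name"
        PRIMARY KEY (country_name, region_name)
    );
    INSERT INTO country VALUES ('A'), ('B');
    INSERT INTO region VALUES ('A', 'North'), ('B', 'North');
""")

# The same region name in one country a second time breaks the composite key.
try:
    conn.execute("INSERT INTO region VALUES ('A', 'North')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False
print(duplicate_accepted)  # False
```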
In this example, both the ACTOR and FILM entities had unique identifiers, so those are
chosen as the primary keys and are received by the new join table, FILM_ACTOR, as
foreign keys. The role attribute of the many-to-many relationship becomes an attribute of
the join table. The primary key for the FILM_ACTOR table is the combination of all
foreign key columns.
If no other name suggests itself for the new join table, it's acceptable to concatenate the
names of the two formerly-related tables as above (although ROLE might have made
more sense--there's no reason a table can't have a column with the same name).
The "relationships" that connect the join table to the other tables will always be one-to-
many, with the "many" side closest to the join table. Imagine slicing the many-to-many
relationship in half, and then swapping the two halves and placing them on either side of
the join table.
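The FILM_ACTOR join table can be sketched in sqlite3 as follows. The identifier and name columns, and the sample rows, are hypothetical; the role attribute lives in the join table, and the primary key is the combination of both foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE film  (film_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Join table: one foreign key per related table, plus the relationship's
    -- own attribute; the primary key combines both foreign keys.
    CREATE TABLE film_actor (
        film_id  INTEGER NOT NULL REFERENCES film (film_id),
        actor_id INTEGER NOT NULL REFERENCES actor (actor_id),
        role     TEXT,
        PRIMARY KEY (film_id, actor_id)
    );
    INSERT INTO actor VALUES (1, 'Actor One'), (2, 'Actor Two');
    INSERT INTO film  VALUES (10, 'Film X'), (11, 'Film Y');
    INSERT INTO film_actor VALUES (10, 1, 'Lead'), (10, 2, 'Villain'), (11, 1, 'Narrator');
""")

cast = conn.execute("""
    SELECT a.name, fa.role
    FROM film_actor AS fa JOIN actor AS a ON a.actor_id = fa.actor_id
    WHERE fa.film_id = 10
    ORDER BY a.name
""").fetchall()
print(cast)  # [('Actor One', 'Lead'), ('Actor Two', 'Villain')]
```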
Activity A
1. Explain the following terms
i. Candidate Key
ii. Foreign Key
iii. Surrogate Key
iv. Null value
2. Explain, with the aid of a diagram, how the following E-R notation can be
converted to the relational model. Pay close attention to primary, foreign
and surrogate keys.
• If the optional symbol is on the same side as the foreign key, no further action is
necessary.
• If the optional symbol is on the opposite side from the foreign key, the foreign
key column(s) must accept NULL values.
For example, the optional symbol is on the same side as the foreign key in this
model/schema ("a country may not have any sub-regions"):
The resulting schema is exactly the same as it would have been for a full mandatory
relationship.
In this next example, the optional symbol is on the opposite side from the foreign key
("some extra-country regions exist"):
i. The foreign key still goes on the "many" side of the one-to-many relationship,
and, as before, it is renamed country_name to avoid confusion.
ii. Because this foreign key is on the opposite side from the optional symbol, its
column must accept NULL values.
iii. A primary key cannot contain columns that accept NULL values, so the
combination of country_name and region_name no longer serves.
iv. A new surrogate key, region_id, is instead chosen as the REGION table's
primary key.
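A minimal sqlite3 sketch of the resulting REGION schema follows. The column names region_id, region_name and country_name come from the text. The COUNTRY table's key column (`name`) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE COUNTRY (name TEXT NOT NULL PRIMARY KEY);

CREATE TABLE REGION (
    region_id    INTEGER PRIMARY KEY,               -- surrogate key
    region_name  TEXT NOT NULL,
    country_name TEXT REFERENCES COUNTRY(name)      -- foreign key, may be NULL
);
""")

conn.execute("INSERT INTO COUNTRY VALUES ('Nigeria')")
conn.execute("INSERT INTO REGION (region_name, country_name) VALUES ('Lagos', 'Nigeria')")
# An extra-country region: the foreign key is left NULL.
conn.execute("INSERT INTO REGION (region_name, country_name) VALUES ('Antarctica', NULL)")

regions = conn.execute(
    "SELECT region_id, region_name, country_name FROM REGION ORDER BY region_id"
).fetchall()

# A non-NULL foreign key must still match an existing country.
try:
    conn.execute("INSERT INTO REGION (region_name, country_name) VALUES ('Nowhere', 'Atlantis')")
    bad_fk_rejected = False
except sqlite3.IntegrityError:
    bad_fk_rejected = True

print(regions)          # [(1, 'Lagos', 'Nigeria'), (2, 'Antarctica', None)]
print(bad_fk_rejected)  # True
```

The surrogate key is generated by SQLite because `INTEGER PRIMARY KEY` aliases the row id. A different DBMS would use its own auto-numbering feature.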
Now you can see why we try to avoid optional participation when modelling. It can make
quite a difference to a schema, but only when the optional symbol appears on the "one"
side of a relationship. (Many-to-many relationships are therefore immune from this
complexity: join tables never contain NULL foreign key columns.)
In this example, EXECUTIVE is a subtype of EMPLOYEE, and has extra attributes that
apply only to executive employees. Once converted to schema form, the relationship is
nothing more than one-to-one. A subtype table will always carry a foreign key to the
supertype or parent table, and that foreign key will almost always serve as the primary
key for the subtype table. Because the foreign key is on the same side of the relationship
as the optional symbol, it does not accept NULL values.
The same principle applies to all non-subtype one-to-one relationships with one side that
has an optional participation: whenever possible, put the foreign key into the table on the
same side as the optional symbol to avoid NULL foreign key columns.
Note that the "is a" sense of the subtype relationship evaporates after schema conversion.
The EXECUTIVE table doesn't automatically inherit EMPLOYEE's attributes; the
association between the two tables is now indistinguishable from a normal one-to-one
relationship.
In this example, the reflexive relationship represents cat mothers: "a cat may have many
kittens, or none if it is not a mother, and a cat must have one mother, or none at all if the
mother is unknown." Because this is a one-to-many relationship, the foreign key goes on
the "many" side of the relationship, or right back into the CAT table. There is already a
tag_no column, so the foreign key must be renamed ("mother_tag_no" seems reasonable
because it mentions the reflexive relationship's label).
It may not be immediately obvious, but this foreign key is on the opposite side from the
optional symbol that appears on the "one" side of the reflexive relationship. Therefore the
mother_tag_no column must accept NULL values.
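The reflexive CAT table can be sketched in sqlite3 as shown below. The tag_no and mother_tag_no columns follow the text. The `name` column and the sample cats are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Reflexive one-to-many: the foreign key points back into the same table.
conn.execute("""
CREATE TABLE CAT (
    tag_no        INTEGER PRIMARY KEY,
    name          TEXT,
    mother_tag_no INTEGER REFERENCES CAT(tag_no)   -- NULL when the mother is unknown
)""")

conn.executemany("INSERT INTO CAT VALUES (?, ?, ?)", [
    (1, "Mama", None),  # mother unknown
    (2, "Kit1", 1),
    (3, "Kit2", 1),
])

# Self-join: pair each mother row (m) with her kittens (k).
kittens = [name for (name,) in conn.execute("""
    SELECT k.name FROM CAT m JOIN CAT k ON k.mother_tag_no = m.tag_no
    WHERE m.tag_no = 1 ORDER BY k.name""")]
unknown_mother = conn.execute(
    "SELECT COUNT(*) FROM CAT WHERE mother_tag_no IS NULL").fetchone()[0]

print(kittens, unknown_mother)  # ['Kit1', 'Kit2'] 1
```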
Values are atomic. This property implies that columns in a relational table are not
repeating groups or arrays. Such tables are referred to as being in "first normal form"
(1NF). The atomic value property of relational tables is important because it is one of
the cornerstones of the relational model.
Column values are of the same kind. In relational terms, this means that all values in a
column come from the same domain. A domain is the set of values which a column may
have. For example, a Monthly_Salary column contains only specific monthly salaries. It
never contains other information such as comments, status flags, or even weekly salary.
This property simplifies data access because developers and users can be certain of the
type of data contained in a given column. It also simplifies data validation. Because all
values are from the same domain, the domain can be defined and enforced with the Data
Definition Language (DDL) of the database software.
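One way to declare a domain in DDL is a CHECK constraint. The sketch below assumes the Monthly_Salary domain is "a positive number". The table name PAYROLL and the emp_id column are hypothetical. The `typeof()` test is specific to SQLite, which otherwise lets a text value through a numeric comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CHECK clause declares the Monthly_Salary domain in DDL:
# a positive number. typeof() is needed because SQLite would
# otherwise compare a text value as greater than any number.
conn.execute("""
CREATE TABLE PAYROLL (
    emp_id         INTEGER PRIMARY KEY,
    Monthly_Salary NUMERIC NOT NULL
        CHECK (typeof(Monthly_Salary) IN ('integer', 'real') AND Monthly_Salary > 0)
)""")

def try_insert(value):
    """Return True if the DBMS accepts value as a Monthly_Salary."""
    try:
        conn.execute("INSERT INTO PAYROLL (Monthly_Salary) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False

results = {v: try_insert(v) for v in (250000, -100, "on leave")}
print(results)  # {250000: True, -100: False, 'on leave': False}
```

The DBMS itself rejects the comment-like value "on leave", so application code cannot put out-of-domain data into the column.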
Each row is unique. This property ensures that no two rows in a relational table are
identical; there is at least one column, or set of columns, whose values uniquely identify
each row in the table. Such columns are called primary keys and are discussed in more
detail elsewhere in this course.
The sequence of columns is insignificant. This property states that the ordering of the
columns in the relational table has no meaning. Columns can be retrieved in any order
and in various sequences. The benefit of this property is that it enables many users to
share the same table without concern for how the table is organized. It also permits the
physical structure of the database to change without affecting the relational tables.
The sequence of rows is insignificant. This property is analogous to the one above but
applies to rows instead of columns. The main benefit is that the rows of a relational table
can be retrieved in different orders and sequences. Adding information to a relational
table is simplified and does not affect existing queries.
Data integrity means, in part, that you can correctly and consistently navigate and
manipulate the tables in the database. There are two basic rules to ensure data integrity:
entity integrity and referential integrity.
The entity integrity rule states that the value of the primary key can never be a null value
(a null value is one that has no value and is not the same as a blank). Because a primary
key is used to identify a unique row in a relational table, its value must always be
specified and should never be unknown. The entity integrity rule therefore requires that
insert, update, and delete operations maintain the uniqueness and existence of all primary keys.
The referential integrity rule states that if a relational table has a foreign key, then every
value of the foreign key must either be null or match a value of the primary key in the
relational table that the foreign key references.
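The two integrity rules can be written as small checks over rows held as Python dictionaries. This is a sketch that assumes single-column keys. The city and status values follow the suppliers-and-cities example used later in this course. The supplier row with an unknown city (s5) and the Abuja row are hypothetical.

```python
def entity_integrity_ok(rows, pk):
    """Entity integrity: every primary-key value is present (not None) and unique."""
    seen = set()
    for row in rows:
        key = row[pk]
        if key is None or key in seen:
            return False
        seen.add(key)
    return True

def referential_integrity_ok(child_rows, fk, parent_rows, pk):
    """Referential integrity: each foreign-key value is None or matches a parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    return all(row[fk] is None or row[fk] in parent_keys for row in child_rows)

CITY_STATUS = [{"city": "Lagos", "status": 10}, {"city": "Jos", "status": 15}]
SUPPLIER_CITY = [{"s#": "s1", "city": "Lagos"}, {"s#": "s2", "city": "Jos"},
                 {"s#": "s5", "city": None}]  # s5: city not yet known (hypothetical)

print(entity_integrity_ok(CITY_STATUS, "city"))                               # True
print(referential_integrity_ok(SUPPLIER_CITY, "city", CITY_STATUS, "city"))   # True
print(referential_integrity_ok([{"s#": "s9", "city": "Abuja"}],
                               "city", CITY_STATUS, "city"))                  # False
print(entity_integrity_ok(CITY_STATUS + [{"city": None, "status": 20}], "city"))  # False
```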
Activity B
1. List six properties of Relational Tables
2. Explain the following terms
i. Reflexive Relationships
ii. Entity Integrity
iii. Referential Integrity
4.0 Conclusion
5.0 Summary
In this unit, we have learnt:
i. Conversion from the E-R Model to the Relational Model (RM) and the rules governing
the conversion.
ii. NULL values are most often used in columns to indicate either Unknown or Not
Applicable.
iii. Keys are very important to relational schemas, much more so than identifiers are to
E-R models.
iv. A one-to-many relationship is converted by inserting a foreign key into the table
that lies on the "many" side of the relationship.
v. To implement a many-to-many relationship in a schema, an additional table must
be added that contains foreign keys for each of the two related tables.
vi. Relational tables have six properties: Values are atomic; Column values are of
the same kind; Each row is unique; The sequence of columns is insignificant; The
sequence of rows is insignificant; and Each column must have a unique name.
vii. There are two basic rules to ensure data integrity: entity integrity and referential
integrity.
viii. Relational tables are sets. The rows of the tables can be considered as elements of
the set. Operations that can be performed on sets can be done on relational tables.
The eight relational operations are: Union; Product; Division; Projection; Join;
Selection; Intersection; and Difference.
1.0 Introduction
2.0 Objectives
3.1 Data Redundancy
3.1.1 Reasons against most types of unnecessary duplicate data
3.1.2 Types of data anomalies
3.1.3 How to eliminate redundant data
3.2 Basic Concepts of Normalization
3.3 Functional Dependencies
3.4 Overview of Normalization
3.5 Sample Data
3.6 First Normal Form
3.7 Second Normal Form
3.8 Third Normal Form
3.9 Advanced Normal Form
3.9.1 Boyce-Codd Normal Form
3.9.2 Fourth Normal Form
3.9.3 Fifth Normal Form
3.9.4 Domain Key Normal Form (DK/NF)
3.10 De-Normalization
3.11 Example of Normalization
4.0 Conclusion
5.0 Summary
6.0 Tutor Marked Assignment
7.0 Further Reading and other Resources
1.0 Introduction
Normalization is a design technique that is widely used as a guide in designing relational
databases. Normalization is essentially a two-step process that puts data into tabular form
by removing repeating groups and then removes duplicated data from the relational
tables.
Normalization theory is based on the concept of normal forms. A relational table is said
to be in a particular normal form if it satisfies a certain set of constraints. Five numbered
normal forms (1NF to 5NF) have been defined, together with Boyce-Codd normal form
(BCNF) and domain/key normal form (DK/NF). In this unit, we concentrate on the first
three normal forms, which were defined by E. F. Codd, and then briefly survey the
advanced forms.
2.0 Objectives
By the end of this unit, you should be able to:
Unnecessary duplicate data can occur when several departments within an organization
(e.g., Sales, Support, and Marketing) each maintain their "own" customer database
(e.g., SALES_CUST, SUPPORT_CUST, and MARKETING_CUST). It can also occur when
repeatable data types are stored in repeating fields instead of being segregated into their
own tables and related by a unique ID key (e.g., CUST_ID).
To eliminate redundant data from your database, you must take special care to organize
the data in your data tables. Normalization is a method of organizing your data to prevent
redundancy. Normalization involves establishing and maintaining the integrity of your
data tables as well as eliminating inconsistent data dependencies.
Normalization requires that you adhere to rules, established by the database community,
to ensure that data is organized efficiently. These rules are called normal form rules.
Normalization may require that you include additional data tables in your database.
For most applications, only the first three normal form rules are needed. The rules are
cumulative: the rules of the 2nd normal form include the rules of the 1st normal form,
the rules of the 3rd normal form include the rules of the 1st and 2nd normal forms, and
so on.
A relational database should be in third normal form (3NF). A relational table is in 3NF
if and only if all non-key columns are (a) mutually independent and (b) fully dependent
upon the primary key.
Mutual independence means that no non-key column is dependent upon any combination
of the other columns. The first two normal forms are intermediate steps to achieve the
goal of having all tables in 3NF. In order to better understand the 2NF and higher forms,
it is necessary to understand the concepts of functional dependencies and lossless
decomposition.
A functional dependency is written R.x → R.y, which can be read as: in the relational
table named R, column x functionally determines (identifies) column y.
A company obtains parts from a number of suppliers. Each supplier is located in one city.
A city can have more than one supplier located there and each city has a status code
associated with it. Each supplier may provide many parts. The company creates a simple
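relational table to store this information that can be expressed in relational notation as:
FIRST(s#, status, city, p#, qty)
where s# identifies the supplier, status is the status code of the supplier's city, city is the
city where the supplier is located, p# identifies the part, and qty is the quantity supplied.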
In order to uniquely associate quantity supplied (qty) with part (p#) and supplier (s#), a
composite primary key composed of s# and p# is used.
A relational table, by definition, is in first normal form if all values of the columns are
atomic; that is, they contain no repeating groups. Figure 1 shows the table FIRST in 1NF.
Although the table FIRST is in 1NF it contains redundant data. For example, information
about the supplier's location and the location's status has to be repeated for every part
supplied. Redundancy causes what are called update anomalies. Update anomalies are
problems that arise when information is inserted, deleted, or updated. For example, the
following anomalies could occur in FIRST:
• INSERT. The fact that a certain supplier (s5) is located in a particular city
(Athens) cannot be added until that supplier supplies a part.
• DELETE. If a row is deleted, then not only is the information about quantity and
part lost but also information about the supplier.
• UPDATE. If supplier s1 moved from London to New York, then six rows would
have to be updated with this new information.
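The UPDATE anomaly can be reproduced with the FIRST rows given in Question 1 at the end of this unit. Those rows use Lagos and Jos, because Figure 1 is not reproduced here. The sketch counts the rows that store s1's city. It then shows that changing only one of them breaks the dependency s# → city. The new city "Abuja" and the helper `holds_fd` are hypothetical.

```python
COLS = ("s#", "status", "city", "p#", "qty")
FIRST = [dict(zip(COLS, row)) for row in [
    ("s1", 10, "Lagos", "p1", 300), ("s1", 10, "Lagos", "p2", 200),
    ("s1", 10, "Lagos", "p3", 400), ("s1", 10, "Lagos", "p4", 200),
    ("s1", 10, "Lagos", "p5", 100), ("s1", 10, "Lagos", "p6", 100),
    ("s2", 15, "Jos", "p1", 300), ("s2", 15, "Jos", "p2", 400),
    ("s3", 15, "Jos", "p2", 200), ("s4", 10, "Lagos", "p2", 200),
    ("s4", 10, "Lagos", "p4", 300), ("s4", 10, "Lagos", "p5", 400),
]]

def holds_fd(rows, lhs, rhs):
    """True if the columns in lhs functionally determine the columns in rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

fd_before = holds_fd(FIRST, ["s#"], ["city"])   # True: one city per supplier

# UPDATE anomaly: s1's city is repeated once per part supplied.
s1_rows = [row for row in FIRST if row["s#"] == "s1"]
print(len(s1_rows))                              # 6

# Changing only one copy leaves FIRST contradicting itself.
s1_rows[0]["city"] = "Abuja"                     # hypothetical new city
fd_after = holds_fd(FIRST, ["s#"], ["city"])
print(fd_before, fd_after)                       # True False
```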
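A relational table is in second normal form (2NF) if it is in 1NF and every non-key
column is fully dependent upon the primary key; that is, every non-key column must be
dependent upon the entire primary key. It follows that only tables with composite
primary keys can be in 1NF but not in 2NF. FIRST is in 1NF but not in 2NF because
status and city are functionally dependent only upon the column s# of the composite key
(s#, p#). This can be illustrated by listing the functional dependencies in the table:
FIRST.s# → FIRST.city, FIRST.status
FIRST.city → FIRST.status
(FIRST.s#, FIRST.p#) → FIRST.qty
The process for transforming a 1NF table to 2NF is: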
1. Identify any determinants other than the composite key, and the columns they
determine.
2. Create and name a new table for each determinant and the unique columns it
determines.
3. Move the determined columns from the original table to the new table. The
determinant becomes the primary key of the new table.
4. Delete the columns you just moved from the original table, except for the
determinant, which will serve as a foreign key.
5. The original table may be renamed to maintain semantic meaning.
To transform FIRST into 2NF, we move the columns s#, status, and city to a new table
called SECOND. The column s# becomes the primary key of this new table. The columns
s#, p#, and qty remain in the original table, which is renamed PARTS. The results are
shown below in Figure 2.
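The 2NF steps amount to two projections of FIRST. Rejoining them on s# confirms that no information is lost. This is a sketch that uses Python sets and the FIRST rows from Question 1 at the end of this unit. The rows are written as (s#, status, city, p#, qty) tuples.

```python
FIRST = {  # columns: s#, status, city, p#, qty
    ("s1", 10, "Lagos", "p1", 300), ("s1", 10, "Lagos", "p2", 200),
    ("s1", 10, "Lagos", "p3", 400), ("s1", 10, "Lagos", "p4", 200),
    ("s1", 10, "Lagos", "p5", 100), ("s1", 10, "Lagos", "p6", 100),
    ("s2", 15, "Jos", "p1", 300), ("s2", 15, "Jos", "p2", 400),
    ("s3", 15, "Jos", "p2", 200), ("s4", 10, "Lagos", "p2", 200),
    ("s4", 10, "Lagos", "p4", 300), ("s4", 10, "Lagos", "p5", 400),
}

# Step 3: move the columns determined by s# into SECOND (duplicates vanish in a set).
SECOND = {(s, status, city) for s, status, city, _, _ in FIRST}
# Step 4: the determinant s# stays behind in PARTS as a foreign key.
PARTS = {(s, p, qty) for s, _, _, p, qty in FIRST}

# Rejoin on s# to confirm the decomposition is lossless.
supplier = {s: (status, city) for s, status, city in SECOND}
rejoined = {(s, *supplier[s], p, qty) for s, p, qty in PARTS}

print(len(SECOND), len(PARTS), rejoined == FIRST)  # 4 12 True
```

Each supplier's status and city are now stored once (4 rows) instead of once per part (12 rows).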
Tables in 2NF but not in 3NF still contain modification anomalies. In the example of
SECOND, they are:
• INSERT. The fact that a particular city has a certain status (Rome has a
status of 50) cannot be inserted until there is a supplier in the city.
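The third normal form requires that all columns in a relational table are dependent only
upon the primary key. A more formal definition is: a relational table is in third normal
form (3NF) if it is already in 2NF and every non-key column is non-transitively
dependent upon its primary key.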
Table PARTS is already in 3NF. The non-key column, qty, is fully dependent upon the
primary key (s#, p#). SUPPLIER is in 2NF but not in 3NF because it contains a transitive
dependency. A transitive dependency occurs when a non-key column that is itself
determined by the primary key is the determinant of other columns. The concept of a
transitive dependency can be illustrated by showing the functional dependencies in
SUPPLIER:
SUPPLIER.s# → SUPPLIER.status
SUPPLIER.s# → SUPPLIER.city
SUPPLIER.city → SUPPLIER.status
Note that SUPPLIER.status is determined both by the primary key s# and the non-key
column city. The process of transforming a table into 3NF is:
1. Identify any determinants, other than the primary key, and the columns they determine.
2. Create and name a new table for each determinant and the unique columns it
determines.
3. Move the determined columns from the original table to the new table. The
determinant becomes the primary key of the new table.
4. Delete the columns you just moved from the original table, except for the
determinant, which will serve as a foreign key.
5. The original table may be renamed to maintain semantic meaning.
To transform SUPPLIER into 3NF, we create a new table called CITY_STATUS and
move the columns city and status into it. Status is deleted from the original table, city is
left behind to serve as a foreign key to CITY_STATUS, and the original table is renamed
to SUPPLIER_CITY to reflect its semantic meaning. The results are shown in Figure 3
below.
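Putting the original table into 3NF has created three tables. These can be
represented in "pseudo-SQL" as:
PARTS(s#, p#, qty)
Primary Key (s#, p#)
Foreign Key (s#) references SUPPLIER_CITY.s#
SUPPLIER_CITY(s#, city)
Primary Key (s#)
Foreign Key (city) references CITY_STATUS.city
CITY_STATUS(city, status)
Primary Key (city)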
The advantage of having relational tables in 3NF is that it eliminates redundant data,
which in turn saves space and reduces manipulation anomalies. For example, the
improvements to our sample database are:
• INSERT. Facts about the status of a city (Rome has a status of 50) can be
added even though there is no supplier in that city. Likewise, facts about
new suppliers can be added even though they have not yet supplied parts.
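The three 3NF tables can be declared in real SQL and the INSERT improvement checked directly. The sketch below uses SQLite through Python's sqlite3 module. Column names containing '#' are double-quoted to be safe. The Lagos status of 10 comes from the Question 1 data. The new supplier s5 in Lagos is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE CITY_STATUS (
    city   TEXT    NOT NULL PRIMARY KEY,
    status INTEGER NOT NULL
);
CREATE TABLE SUPPLIER_CITY (
    "s#" TEXT NOT NULL PRIMARY KEY,
    city TEXT NOT NULL REFERENCES CITY_STATUS(city)
);
CREATE TABLE PARTS (
    "s#" TEXT    NOT NULL REFERENCES SUPPLIER_CITY("s#"),
    "p#" TEXT    NOT NULL,
    qty  INTEGER NOT NULL,
    PRIMARY KEY ("s#", "p#")
);
""")

# A city's status can be recorded before any supplier is located there.
conn.execute("INSERT INTO CITY_STATUS VALUES ('Rome', 50)")
# A new supplier can be recorded before it supplies any parts.
conn.execute("INSERT INTO CITY_STATUS VALUES ('Lagos', 10)")
conn.execute("INSERT INTO SUPPLIER_CITY VALUES ('s5', 'Lagos')")

rome_status = conn.execute(
    "SELECT status FROM CITY_STATUS WHERE city = 'Rome'").fetchone()[0]
rome_suppliers = conn.execute(
    "SELECT COUNT(*) FROM SUPPLIER_CITY WHERE city = 'Rome'").fetchone()[0]
s5_parts = conn.execute(
    """SELECT COUNT(*) FROM PARTS WHERE "s#" = 's5'""").fetchone()[0]

print(rome_status, rome_suppliers, s5_parts)  # 50 0 0
```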
Activity A
After 3NF, all normalization problems involve only tables which have three or more
columns and all the columns are keys. Many practitioners argue that placing entities in
3NF is generally sufficient because it is rare that entities that are in 3NF are not also in
4NF and 5NF. They further argue that the benefits gained from transforming entities into
4NF and 5NF are so slight that it is not worth the effort. However, advanced normal
forms are presented because there are cases where they are required.
A relational table is in the fourth normal form (4NF) if it is in BCNF and all multivalued
dependencies are also functional dependencies.
Fourth normal form (4NF) is based on the concept of multivalued dependencies (MVD).
A multivalued dependency occurs when, in a relational table containing at least three
columns, one column has multiple rows whose values match a value of a single row of
one of the other columns. A more formal definition given by Date is: given a relational
table R with columns A, B, and C, the multivalued dependency
R.A →→ R.B
is true if and only if the set of B-values matching a given pair of A-values
and C-values in R depends only on the A-value and is independent of the
C-value.
MVDs always occur in pairs. That is, R.A →→ R.B holds if and only if
R.A →→ R.C also holds.
Suppose that employees can be assigned to multiple projects. Also suppose that
employees can have multiple job skills. If we record this information in a single table, all
three attributes must be used as the key since no single attribute can uniquely identify an
instance.
The relationship between emp# and prj# is a multivalued dependency because for each
pair of emp#/skill values in the table, the associated set of prj# values is determined only
by emp# and is independent of skill. The relationship between emp# and skill is also a
multivalued dependency, since the set of Skill values for an emp#/prj# pair is always
dependent upon emp# only.
To transform a table with multivalued dependencies into 4NF, move each MVD pair
to a new table. The result is shown in Figure 1.
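The figure is not reproduced here, so the sketch below uses hypothetical emp#/prj#/skill rows. It tests the MVD in an equivalent form: for a three-column table, emp# →→ prj# holds exactly when the table equals the join of its (emp#, prj#) and (emp#, skill) projections. It then performs the 4NF split into those two tables.

```python
def holds_mvd(rows):
    """rows: set of (a, b, c) triples. True iff A ->> B holds
    (equivalently A ->> C): the table equals the join of its
    (A, B) and (A, C) projections."""
    ab = {(a, b) for a, b, _ in rows}
    ac = {(a, c) for a, _, c in rows}
    return rows == {(a, b, c) for a, b in ab for a2, c in ac if a == a2}

# Hypothetical emp#/prj#/skill rows: every project of an employee is
# paired with every skill of that employee.
EMP_PRJ_SKILL = {
    (101, "P1", "Design"), (101, "P1", "Program"),
    (101, "P2", "Design"), (101, "P2", "Program"),
    (102, "P1", "Program"),
}

# 4NF: move each MVD pair to its own table.
EMP_PRJ = {(e, p) for e, p, _ in EMP_PRJ_SKILL}
EMP_SKILL = {(e, s) for e, _, s in EMP_PRJ_SKILL}

print(holds_mvd(EMP_PRJ_SKILL), len(EMP_PRJ), len(EMP_SKILL))  # True 3 3
```

Deleting a single row such as (101, "P2", "Design") would make `holds_mvd` return False. This shows why the combined table must store every project/skill combination for each employee.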
While the earlier normal forms are based on functional dependence (and 4NF on
multivalued dependence), the fifth normal form is based on the concept of join
dependence. Join dependency means that a table, after it has been decomposed into three
or more smaller tables, must be capable of being joined again on common keys to form
the original table. Stated another way, 5NF indicates when an entity cannot be further
decomposed. 5NF is complex and not intuitive. Most experts agree that tables that are in
4NF are also in 5NF except for "pathological" cases. Teorey suggests that true
many-to-many-to-many ternary relations are one such case.
Adding an instance to a table that is not in 5NF creates spurious results when the table
is decomposed and then rejoined. For example, let's suppose that we have an employee
who uses design skills on one project and programming skills on another. This
information is shown below.
emp# prj# skill
1211 11 Design
1211 28 Program
Next we add an employee (1544) who uses programming skills on Project 11.
emp# prj# skill
1211 11 Design
1211 28 Program
1544 11 Program
Next, we project this information into three tables as we did above. However, when we
rejoin the tables, the recombined table contains spurious results.
emp# prj# skill
1211 11 Design
1211 11 Program << spurious data
1211 28 Program
1544 11 Design << spurious data
1544 11 Program
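By adding one new instance to a table not in 5NF, two false assertions were stated:
Assertion 1: Employee 1211 uses programming skills on project 11.
Assertion 2: Employee 1544 uses design skills on project 11.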
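The decomposition and rejoin can be reproduced with Python sets, using the three rows from the text. Joining the (emp#, prj#) and (prj#, skill) projections on prj# gives exactly the five rows listed above, including both spurious rows. If the (emp#, skill) projection is also required to match, the 1544/11/Design row is filtered out. The 1211/11/Program row still remains spurious, so the decomposition is lossy in either case.

```python
# emp#, prj#, skill rows from the text, after employee 1544 is added.
WORK = {(1211, 11, "Design"), (1211, 28, "Program"), (1544, 11, "Program")}

# Project the table into three two-column tables.
EMP_PRJ = {(e, p) for e, p, _ in WORK}
EMP_SKILL = {(e, s) for e, _, s in WORK}
PRJ_SKILL = {(p, s) for _, p, s in WORK}

# Rejoin (emp#, prj#) with (prj#, skill) on prj#.
two_way = {(e, p, s) for e, p in EMP_PRJ for p2, s in PRJ_SKILL if p == p2}
# Additionally require each (emp#, skill) pair to appear in EMP_SKILL.
three_way = {row for row in two_way if (row[0], row[2]) in EMP_SKILL}

print(sorted(two_way - WORK))    # [(1211, 11, 'Program'), (1544, 11, 'Design')]
print(sorted(three_way - WORK))  # [(1211, 11, 'Program')]
```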
Constraint: A rule governing static values of an attribute such that we can determine
whether the constraint is true or false. Examples:
i. Functional Dependencies
ii. Multivalued Dependencies
iii. Inter-relation rules
iv. Intra-relation rules
Domain: The physical (data type, size, NULL values) and semantic (logical) description
of what values an attribute can hold.
3.10 De-Normalization
This relation is not in DK/NF because it contains a functional dependency not implied by
the key.
We can normalize this into DK/NF by splitting the CUSTOMER relation into two:
CUSTOMER (CustomerID, Name, Address, Zip)
CODES (Zip, City, State)
We may pay a performance penalty: each customer address lookup requires us to look in
two relations (tables).
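The DK/NF split and its performance cost can be sketched in sqlite3. The table and column names follow the split above. The text does not show the original CUSTOMER relation, so this assumes it held City and State alongside Zip. The zip code, names and addresses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE CODES (
    Zip   TEXT NOT NULL PRIMARY KEY,   -- Zip determines City and State
    City  TEXT NOT NULL,
    State TEXT NOT NULL
);
CREATE TABLE CUSTOMER (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT,
    Address    TEXT,
    Zip        TEXT REFERENCES CODES(Zip)
);
""")

# Hypothetical rows: City and State are stored once per Zip, not once per customer.
conn.execute("INSERT INTO CODES VALUES ('100271', 'Ikeja', 'Lagos')")
conn.executemany("INSERT INTO CUSTOMER VALUES (?, ?, ?, ?)", [
    (1, "Ade", "1 Allen Avenue", "100271"),
    (2, "Bola", "5 Opebi Road", "100271"),
])

# The performance cost: a full address now needs a join across two tables.
addresses = conn.execute("""
    SELECT c.Name, c.Address, z.City, z.State
    FROM CUSTOMER c JOIN CODES z ON z.Zip = c.Zip
    ORDER BY c.CustomerID""").fetchall()

print(addresses)
# [('Ade', '1 Allen Avenue', 'Ikeja', 'Lagos'), ('Bola', '5 Opebi Road', 'Ikeja', 'Lagos')]
```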
3.11 Example
This is an example that runs through all of the normal forms from beginning to end using
the same tables.
Example relation:
EMPLOYEE ( Name, Project, Task, Office, Phone )
Example Data:
Is EMPLOYEE in 1NF?
Split EMPLOYEE_OFFICE_PHONE.
Bili 200Y T2
Sule 100X T33
Sule 200Y T33
Sule 300Z T33
Edo 100X T2
Office Phone
400 1400
442 1442
588 1588
Split EMPLOYEE_PROJECT_TASK.
Name Project
Bili 100X
Bili 200Y
Sule 100X
Sule 200Y
Sule 300Z
Edo 100X
Name Task
Bili T1
Bili T2
Sule T33
Edo T2
R4 (Office, Phone)
Office Phone
400 1400
442 1442
588 1588
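Splitting EMPLOYEE_PROJECT_TASK into (Name, Project) and (Name, Task) is safe only if the combined table holds every project/task combination for each employee. That multivalued dependency is what justifies the split. The sketch assumes this holds and rejoins the two tables given above on Name. The first rows of EMPLOYEE_PROJECT_TASK are missing from the text, so it checks only that the visible rows are reproduced.

```python
# Name/Project and Name/Task tables produced by the split above.
NAME_PROJECT = {("Bili", "100X"), ("Bili", "200Y"), ("Sule", "100X"),
                ("Sule", "200Y"), ("Sule", "300Z"), ("Edo", "100X")}
NAME_TASK = {("Bili", "T1"), ("Bili", "T2"), ("Sule", "T33"), ("Edo", "T2")}

# Natural join on Name: each employee's projects paired with each of their tasks.
rejoined = {(n, p, t) for n, p in NAME_PROJECT for n2, t in NAME_TASK if n == n2}

# Rows of EMPLOYEE_PROJECT_TASK visible in the text.
shown = {("Bili", "200Y", "T2"), ("Sule", "100X", "T33"), ("Sule", "200Y", "T33"),
         ("Sule", "300Z", "T33"), ("Edo", "100X", "T2")}

print(len(rejoined), shown <= rejoined)  # 8 True
```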
4. Starting with 1NF, go through each normal form and state why the relation is in
the given normal form.
Activity B
1. Explain the following terms:
i. Boyce-Codd Normal Form
ii. Fourth Normal Form
iii. Fifth Normal Form
4.0 Conclusion
Once our E-R model has been converted into relations, we may find that some relations
are not properly specified. There can be a number of problems:
The typical way to solve these anomalies is to split the relation into two or more
relations, a process called normalization.
5.0 Summary
In this unit, we have learnt:
i. Data redundancy is a condition that exists when a data environment contains
unnecessarily duplicated data.
ii. Normalization is a set of techniques for organizing data into tables in such a way as
to eliminate certain types of redundancy and incompleteness, and the associated
complexity and/or anomalies when updating it.
iii. The designer starts with a single file and divides it into tables based on
dependencies among the data items.
iv. Normalization relies on correct identification of determinants and keys.
v. A table is in 3NF if every determinant of a non-key item is a candidate key.
vi. In practice, normalization is used primarily as a check on the correctness of a
model developed using a top-down approach.
Question 1
A company obtains parts from a number of suppliers. Each supplier is located in one city.
A city can have more than one supplier located there and each city has a status code
associated with it. Each supplier may provide many parts. The company creates a simple
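relational table to store this information that can be expressed in relational notation as:
FIRST(s#, status, city, p#, qty)
where s# identifies the supplier, status is the status code of the supplier's city, city is the
city where the supplier is located, p# identifies the part, and qty is the quantity supplied.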
In order to uniquely associate quantity supplied (qty) with part (p#) and supplier (s#), a
composite primary key composed of s# and p# is used.
FIRST
s# status city p# qty
s1 10 Lagos p1 300
s1 10 Lagos p2 200
s1 10 Lagos p3 400
s1 10 Lagos p4 200
s1 10 Lagos p5 100
s1 10 Lagos p6 100
s2 15 Jos p1 300
s2 15 Jos p2 400
s3 15 Jos p2 200
s4 10 Lagos p2 200
s4 10 Lagos p4 300
s4 10 Lagos p5 400
d. Update anomalies
e. Second Normal Form
f. Third Normal Form
David M. Kroenke, David J. Auer (2008). Database Concepts. New Jersey: Prentice
Hall.
Ramez Elmasri, Shamkant B. Navathe (2003). Fundamentals of Database Systems.
England: Addison Wesley.
Fred R. McFadden, Jeffrey A. Hoffer (1994). Modern Database Management. England:
Addison Wesley Longman.
Graeme C. Simsion, Graham C. Witt (2004). Data Modeling Essentials. San Francisco:
Morgan Kaufmann.
Philip J. Pratt, Joseph J. Adamski (2007). Concepts of Database Management. United
States: Course Technology.
1.0 Introduction
2.0 Objectives
3.0 Relational Algebra
3.1 Set Theoretic Operations
3.1.1 Union
3.1.2 Difference
3.1.3 Intersection
3.2 Union Compatible Relations
3.3 Cartesian Product
3.4 Selection and Projection Operations
3.4.1 Selection Operators
3.4.2 Selection Examples
3.4.3 Projection Operator
3.4.4 Projection Examples
3.4.5 Combining Selection and Projection
3.5 Aggregate Functions
3.6 Join Operations
3.6.1 Join Examples
4.0 Conclusion
5.0 Summary
6.0 Tutor Marked Assignment
7.0 Further Reading and other Resources
1.0 Introduction
Relational tables are sets. The rows of the tables can be considered as elements of the set.
Operations that can be performed on sets can be done on relational tables. The relational
operations are the main focus of this unit:
2.0 Objectives
By the end of this unit, you should be able to:
a. The Relational Model consists of the elements: relations, which are made up of
attributes.
b. A relation is a set of tuples (rows), each of which holds a value for every attribute.
c. Relational Algebra is a collection of operations on Relations.
d. Relations are operands and the result of an operation is another relation.
e. Two main collections of relational operators:
In this section, we shall consider the following set operations: Union, Intersection, and
Difference.
A
Firstname Surname Score
John Ayodeji 75
Sheu Abdul 76
Taiwo Lawrence 77
Dan Musa 78
B
Firstname Surname Score
Wale Osa 65
Chidi Brown 66
Dan Musa 78
3.1.1 Union: A ∪ B
The union operation of two relational tables is formed by appending rows from one table
to those of a second table to produce a third. Duplicate rows are eliminated.
A ∪ B
3.1.2 Difference: A - B
The difference of two relational tables is a third that contains those rows that occur in the
first table but not in the second. The Difference operation requires that the tables be union
compatible. As with arithmetic, the order of subtraction matters. That is, A - B is not the
same as B - A.
A-B
3.1.3 Intersection: A ∩ B
The intersection of two relational tables is a third table that contains common rows. Both
tables must be union compatible.
A ∩ B
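The three set operations can be tried on tables A and B with Python sets, with each row written as a (Firstname, Surname, Score) tuple. Python does not check union compatibility. This sketch simply relies on both tables having the same three columns in the same order, as the relational operations require.

```python
# Tables A and B from above, as sets of (Firstname, Surname, Score) rows.
A = {("John", "Ayodeji", 75), ("Sheu", "Abdul", 76),
     ("Taiwo", "Lawrence", 77), ("Dan", "Musa", 78)}
B = {("Wale", "Osa", 65), ("Chidi", "Brown", 66), ("Dan", "Musa", 78)}

union = A | B          # rows in either table; the shared row appears once
difference = A - B     # rows in A but not in B
intersection = A & B   # rows common to both

print(len(union), len(difference), len(B - A))  # 6 3 2
print(intersection)                             # {('Dan', 'Musa', 78)}
```

A - B has three rows but B - A has only two, which confirms that the order of subtraction matters.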
Activity A
• Assume relation C
• Compute A ∪ C
• Compute A ∩ C
• Show that A - C is not equal to C - A
The Cartesian product of two relational tables is the concatenation of every row in one
table with every row in the second. The product of table A (having m rows) and table B
(having n rows) is the table C (having m x n rows). The product is denoted as A X B or A
TIMES B.
Lunch Drink
Pounded Yam Malt
Bean Cake Beer
A X B
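A Cartesian product is a double loop over the two tables. The sketch assumes the Lunch/Drink table above is read as two one-column tables, LUNCH and DRINK. It uses `itertools.product` to concatenate every LUNCH row with every DRINK row, so the result has m × n = 2 × 2 rows.

```python
from itertools import product

# Assumed reading of the table above: two one-column tables.
LUNCH = [("Pounded Yam",), ("Bean Cake",)]
DRINK = [("Malt",), ("Beer",)]

# Concatenate every LUNCH row with every DRINK row.
lunch_x_drink = [meal + drink for meal, drink in product(LUNCH, DRINK)]

print(len(lunch_x_drink))  # 4  (2 rows x 2 rows)
print(lunch_x_drink[0])    # ('Pounded Yam', 'Malt')
```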
The project operator retrieves a subset of columns from a table, removing duplicate rows
from the result. The select operator, sometimes called restrict to prevent confusion with
the SQL SELECT command, retrieves a subset of rows from a relational table based on
the value(s) in a column or columns.
AND T F
T T F
F F F
OR T F
T T T
F T F
NOT
T F
F T
Result:
b. Select only those Staff with last name Ajayi who are professors:
Result:
c. Select only those Staff who are either Professors or in the Economics department:
σ Designation = 'Professor' ∨ Department = 'ECO' (STAFF)
Result:
d. Select only those Employees who are not in the CIT department or Lecturer I:
σ ¬(Designation = 'Lecturer I' ∨ Department = 'CIT') (STAFF)
Result:
Activity B
b. Do expressions ii, iii and iv above all evaluate to the same thing?
Results:
Name Department
Ajayi CIT
Chidi ECO
Musa ECO
Bello CIT
Ajayi ACC
The selection and projection operators can be combined to perform both operations.
Results:
Name
Ajayi
Bello
b. Show the name and designation of those Employees who are not in the CIT
department or Lecturer I:
π Name, Designation ( σ ¬(Designation = 'Lecturer I' ∨ Department = 'CIT') (STAFF) )
Result:
Name Designation
Musa Professor
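Selection and projection can be written as two small functions over rows stored as dictionaries. This sketch holds only the Name and Department columns of STAFF, taken from the Name/Department result shown above. The other STAFF columns are not reproduced. Selecting the CIT staff and projecting Name gives the two names Ajayi and Bello. Projecting Name over all staff shows duplicate elimination: Ajayi appears once even though two Ajayi rows exist.

```python
# Name and Department columns of STAFF, as listed in the result shown above.
STAFF = [
    {"Name": "Ajayi", "Department": "CIT"},
    {"Name": "Chidi", "Department": "ECO"},
    {"Name": "Musa", "Department": "ECO"},
    {"Name": "Bello", "Department": "CIT"},
    {"Name": "Ajayi", "Department": "ACC"},
]

def select(rows, predicate):
    """sigma: keep the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]

def project(rows, columns):
    """pi: keep only the given columns, dropping duplicate rows (first occurrence wins)."""
    result = []
    for row in rows:
        t = tuple(row[c] for c in columns)
        if t not in result:
            result.append(t)
    return result

cit_names = project(select(STAFF, lambda r: r["Department"] == "CIT"), ["Name"])
all_names = project(STAFF, ["Name"])

print(cit_names)  # [('Ajayi',), ('Bello',)]
print(all_names)  # [('Ajayi',), ('Chidi',), ('Musa',), ('Bello',)]
```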
Activity C
Aggregate functions are sometimes written using the Projection operator or the script F
character (ℱ), as in the Elmasri/Navathe book. In this section, we shall consider the
following aggregate functions:
o SUM
o MINIMUM
o MAXIMUM
o AVERAGE, MEAN, MEDIAN
o COUNT
Let us assume that the table STAFF has the following records:
Results:
MIN(salary)
35000
Results:
AVG(salary)
51000
Results:
COUNT(name)
d. Find the total payroll for the Economics department: SUM(salary) ( σ Department =
'ECO' (STAFF) )
Results:
SUM(salary)
85000
A join operation combines the product, selection, and, possibly, projection. The join
operator horizontally combines (concatenates) data from one row of a table with rows
from another or the same table when certain criteria are met. The criteria involve a
relationship among the columns of the tables being joined. If the join criterion is based
on equality of column values, the result is called an equijoin. A natural join is an equijoin
with redundant columns removed. The following are the properties of the join operation:
• Join operations bring together two tables and combine their columns and records
or rows in a specific fashion.
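The definition "product, then selection, then possibly projection" can be followed literally. The sketch joins the SUPPLIER_CITY and CITY_STATUS tables from the normalization unit, with values taken from the FIRST table in that unit's Question 1. The equijoin keeps both city columns. The natural join projects one of them away.

```python
from itertools import product

# SUPPLIER_CITY(s#, city) and CITY_STATUS(city, status).
SUPPLIER_CITY = [("s1", "Lagos"), ("s2", "Jos"), ("s3", "Jos"), ("s4", "Lagos")]
CITY_STATUS = [("Lagos", 10), ("Jos", 15)]

# Equijoin = Cartesian product followed by selection on equal city values.
equijoin = [sc + cs for sc, cs in product(SUPPLIER_CITY, CITY_STATUS) if sc[1] == cs[0]]

# Natural join = equijoin with the redundant second city column projected away.
natural = [(s, city, status) for s, city, _, status in equijoin]

print(equijoin[0])  # ('s1', 'Lagos', 'Lagos', 10)
print(natural)      # [('s1', 'Lagos', 10), ('s2', 'Jos', 15), ('s3', 'Jos', 15), ('s4', 'Lagos', 10)]
```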
Let us assume we have the STAFF table from above and the following DEPARTMENT
table:
Results:
b. Find all information on every employee including their department info where the
employee works in an office numbered less than the department main office:
STAFF ⋈ (staff.room < department.mainoffice) ∧ (staff.department = department.dept)
DEPARTMENT
Results:
4.0 Conclusion
The relational algebra is a procedural query language with six fundamental operations:
select (unary), project (unary), rename (unary), Cartesian product (binary), union
(binary) and set difference (binary). Additional operations, such as set intersection,
natural join, division and assignment, add no expressive power but make many queries
simpler to write. Every operation produces a new relation as its result.
5.0 Summary
i. The union operation of two relational tables is formed by appending rows from
one table to those of a second table to produce a third.
ii. The difference of two relational tables is a third that contains those rows that
occur in the first table but not in the second.
iii. The intersection of two relational tables is a third table that contains common
rows.
iv. The product of two relational tables, also called the Cartesian product, is the
concatenation of every row in one table with every row in the second.
v. The project operator retrieves a subset of columns from a table, removing
duplicate rows from the result.
vi. A join operation combines the product, selection, and, possibly, projection.
i. Union
ii. Intersection
iii. Difference
iv. Cartesian product.
v. Selection
vi. Projection
vii. Join
viii. Division
1.0 Introduction
2.0 Objectives
3.0 What can SQL do?
3.1 Database Tables
3.2 SQL Data Type
3.2.1 Numeric Data Type
3.2.2 Character Strings
3.2.3 Date and Time
3.2.4 Microsoft Access Data Types
3.2.5 MySQL Data Types
3.2.6 SQL Server Data Types
3.3 Data Definition Language
3.3.1 Create Database Statement
3.3.2 Create Table Statement
3.3.3 SQL Constraints
3.3.4 Not Null Constraint
3.3.5 SQL Unique Constraint
3.3.6 Primary Key Constraint
3.3.7 SQL Foreign Key Constraint
3.3.8 Use Command
3.3.9 Alter
3.3.10 Drop
3.4 Data Manipulation Language
3.4.1 The SQL Select Statement
3.4.2 The SQL SELECT DISTINCT Statement
3.4.3 The WHERE Clause
3.4.4 The AND & OR Operators
3.4.5 The ORDER BY Keyword
3.4.6 The INSERT INTO Statement
3.4.7 The UPDATE Statement
3.4.8 The DELETE Statement
3.4.9 The LIKE Operator
3.4.10 The IN Operator
3.4.11 The BETWEEN Operator
3.4.12 SQL JOIN
4.0 Conclusion
5.0 Summary
SQL was developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early
1970s. This version, initially called SEQUEL, was designed to manipulate and retrieve
data stored in IBM's original relational database product, System R.
a. Data Definition Language (DDL) is used to create (define) data structures such as
tables, indexes, and clusters.
b. Data Manipulation Language (DML) is used to store, retrieve and update data
in tables.
2.0 Objectives
A database most often contains one or more tables. Each table is identified by a name
(e.g. "Customers" or "Orders"). Tables contain records (rows) with data.
The table above contains three records (one for each person) and five columns (P_Id,
LastName, FirstName, Address, and City).
Each implementation of SQL uses slightly different names for the data types.
MySQL is another powerful RDBMS in use today. In MySQL there are three main data
types: text, number, and Date/Time types (see table 7.3 for more detail).
Text types:
Note: The values are sorted in the order you enter them.
Number types:
*The integer types have an extra option called UNSIGNED. Normally, an integer column
ranges from a negative to a positive value. Adding the UNSIGNED attribute moves that
range up so that it starts at zero instead of a negative number.
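How far the range moves follows from the number of bytes each integer type uses. The minimal Python sketch below calculates the ranges. It assumes the storage sizes given in the MySQL manual (TINYINT 1 byte, SMALLINT 2, MEDIUMINT 3, INT 4, BIGINT 8). The helper name `int_range` is hypothetical.

```python
# Storage sizes in bytes for MySQL's integer types (assumed from the MySQL manual).
INT_SIZES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}


def int_range(type_name, unsigned=False):
    """Return the (smallest, largest) value a column of this type can hold."""
    bits = 8 * INT_SIZES[type_name]
    if unsigned:
        # UNSIGNED uses every bit pattern for values from zero upwards.
        return 0, 2 ** bits - 1
    # Signed types split the bit patterns between negative and non-negative values.
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name in INT_SIZES:
    print(f"{name:<9} signed {int_range(name)}  unsigned {int_range(name, unsigned=True)}")
```

For example, TINYINT moves from -128..127 to 0..255. The number of possible values stays the same; only the starting point changes.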
Date types:
*Even though DATETIME and TIMESTAMP return the same format, they work very
differently. In an INSERT or UPDATE query, a TIMESTAMP column can set itself
automatically to the current date and time. TIMESTAMP also accepts various formats, like
YYYYMMDDHHMMSS, YYMMDDHHMMSS, YYYYMMDD, or YYMMDD.
Table 7.4 lists some of the available data types in Microsoft SQL Server.
Character strings:
Unicode strings:
Binary types:
Number types:
Date types:
Example: Let us create a database called "my_db". We use the following CREATE
DATABASE statement:
CREATE DATABASE my_db
This statement creates an empty database named "my_db" on your DBMS. After creating
the database, your next step is to create tables that will contain data.
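The CREATE TABLE statement is used to create a table in a database. The SQL CREATE
TABLE syntax is:
CREATE TABLE table_name
(
column_name1 data_type,
column_name2 data_type,
column_name3 data_type,
....
)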
The data type specifies what type of data the column can hold. See tables 7.2 to 7.4 for a
complete reference of all the data types available in MS Access, MySQL, and SQL
Server.
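Example: Let us create a table called "Persons" that contains five columns: P_Id,
LastName, FirstName, Address, and City. We use the following CREATE TABLE
statement:
CREATE TABLE Persons
(
P_Id int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)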
The P_Id column is of type int and will hold a number. The LastName, FirstName,
Address, and City columns are of type varchar with a maximum length of 255 characters.
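The Persons table can be tried out in a minimal sketch using Python's built-in sqlite3 module. SQLite has no CREATE DATABASE statement: a database is a file, or an in-memory database as assumed here. SQLite also accepts the varchar(255) type name but does not enforce the length limit.

```python
import sqlite3

# An in-memory SQLite database stands in for "my_db".
conn = sqlite3.connect(":memory:")

conn.execute("""
    CREATE TABLE Persons (
        P_Id      int,
        LastName  varchar(255),
        FirstName varchar(255),
        Address   varchar(255),
        City      varchar(255)
    )
""")

# PRAGMA table_info returns (cid, name, declared type, notnull, default, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Persons)")]
print(columns)
```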
Constraints are used to limit the type of data that can go into a table. Constraints can be
specified when a table is created (with the CREATE TABLE statement) or after the table
is created (with the ALTER TABLE statement).
a. NOT NULL
b. UNIQUE
c. PRIMARY KEY
d. FOREIGN KEY
The NOT NULL constraint forces a column to reject NULL values.
The NOT NULL constraint requires a field to always contain a value. This means that
you cannot insert a new record, or update a record, without supplying a value for this field.
The following SQL forces the "P_Id" column and the "LastName" column to reject
NULL values:
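The effect of NOT NULL is easy to see in SQLite, sketched here with Python's sqlite3 module. The inserted names are assumed sample data. A row without a LastName is rejected, while a missing FirstName is allowed because that column has no NOT NULL constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        P_Id      int NOT NULL,
        LastName  varchar(255) NOT NULL,
        FirstName varchar(255),
        Address   varchar(255),
        City      varchar(255)
    )
""")

# FirstName, Address and City may be left NULL.
conn.execute("INSERT INTO Persons (P_Id, LastName) VALUES (1, 'Hansen')")

try:
    # LastName is omitted, so it would be NULL: the constraint rejects the row.
    conn.execute("INSERT INTO Persons (P_Id, FirstName) VALUES (2, 'Tove')")
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("Rejected:", err)
```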
The UNIQUE and PRIMARY KEY constraints both provide a guarantee for uniqueness
for a column or set of columns.
Note that you can have many UNIQUE constraints per table, but only one PRIMARY
KEY constraint per table.
Example 1: The following SQL creates a UNIQUE constraint on the "P_Id" column when
the "Persons" table is created:
Example 2: To create a UNIQUE constraint on the "P_Id" column when the table is
already created, use the following SQL:
The PRIMARY KEY constraint uniquely identifies each record in a database table.
Primary keys must contain unique values. A primary key column cannot contain NULL
values.
Each table should have a primary key, and each table can have only one primary key.
Example 1: The following SQL creates a PRIMARY KEY on the "P_Id" column when
the "Persons" table is created:
Example 2: To create a PRIMARY KEY constraint on the "P_Id" column when the table
is already created, use the following SQL:
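The uniqueness rules of PRIMARY KEY and UNIQUE can be checked in a minimal SQLite sketch. Two SQLite specifics are assumed here. First, SQLite cannot add a constraint with ALTER TABLE, so a unique index stands in for "UNIQUE on an existing table". Second, NOT NULL is written out because SQLite, unlike the SQL standard, allows NULL in a non-INTEGER primary key. The Email column and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# PRIMARY KEY declared when the table is created (NOT NULL spelled out for SQLite).
conn.execute("CREATE TABLE Persons (P_Id int NOT NULL PRIMARY KEY, LastName varchar(255))")

# UNIQUE added to the existing table through a unique index on a hypothetical column.
conn.execute("ALTER TABLE Persons ADD COLUMN Email varchar(255)")
conn.execute("CREATE UNIQUE INDEX uq_Persons_Email ON Persons (Email)")


def accepted(params):
    """Try to insert one person; return False if a constraint rejects the row."""
    try:
        conn.execute("INSERT INTO Persons (P_Id, LastName, Email) VALUES (?, ?, ?)", params)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "new row": accepted((1, "Hansen", "ola@example.com")),
    "duplicate P_Id": accepted((1, "Svendson", "tove@example.com")),
    "NULL P_Id": accepted((None, "Pettersen", "kari@example.com")),
    "duplicate Email": accepted((2, "Svendson", "ola@example.com")),
}
print(results)
```

Only the first row is stored. The primary key rejects both a repeated value and a missing value, and the unique index rejects a repeated e-mail address.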
Let us illustrate the foreign key with an example. Look at table 7.1 above and table 7.5:
Note that the "P_Id" column in the "Orders" table points to the "P_Id" column in the
"Persons" table.
The "P_Id" column in the "Persons" table is the PRIMARY KEY in the "Persons" table.
The "P_Id" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.
The FOREIGN KEY constraint is used to prevent actions that would destroy links
between tables.
The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign
key column, because the value has to be one of the values contained in the table it points to.
Example 1: The following SQL creates a FOREIGN KEY on the "P_Id" column when the
"Orders" table is created:
Example 2: To create a FOREIGN KEY constraint on the "P_Id" column when the
"Orders" table is already created, use the following SQL:
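A minimal SQLite sketch of both protections described above. It assumes that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` on each connection. The Persons row is assumed sample data. The two orders reuse rows 4 and 5 of the Orders table shown later in this unit. Row 5 points at P_Id 15, which has no matching person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only after this pragma is switched on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Persons (P_Id int PRIMARY KEY, LastName varchar(255))")
conn.execute("""
    CREATE TABLE Orders (
        O_Id    int PRIMARY KEY,
        OrderNo int NOT NULL,
        P_Id    int REFERENCES Persons (P_Id)
    )
""")

conn.execute("INSERT INTO Persons VALUES (1, 'Hansen')")
conn.execute("INSERT INTO Orders VALUES (4, 24562, 1)")  # person 1 exists: accepted

errors = []
for sql in (
    "INSERT INTO Orders VALUES (5, 34764, 15)",  # invalid data: no person has P_Id 15
    "DELETE FROM Persons WHERE P_Id = 1",         # would break the link to order 4
):
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as err:
        errors.append(f"{sql} -> {err}")

print("\n".join(errors))
```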
3.3.7 USE
The USE command allows you to specify the database you wish to work with within your
DBMS.
USE employees
3.3.8 ALTER
Once you have created a table within a database, you may wish to modify its definition.
The ALTER command allows you to make changes to the structure of a table without
deleting and recreating it. Take a look at the following command:
ALTER TABLE personal_info
ADD salary money null
This example adds a new attribute to the personal_info table -- an employee's salary. The
"money" argument specifies that an employee's salary will be stored using a dollars and
cents format. Finally, the "null" keyword tells the database that it's OK for this field to
contain no value for any given employee.
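The same change can be sketched in SQLite through Python's sqlite3 module. The personal_info table, its other columns and the sample row are hypothetical. SQLite has no money type, but it accepts the type name and stores the column with numeric affinity. Its columns allow NULL unless NOT NULL is given, so the explicit null keyword is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical personal_info table with one existing employee.
conn.execute("CREATE TABLE personal_info (employee_id int PRIMARY KEY, last_name varchar(50))")
conn.execute("INSERT INTO personal_info VALUES (101, 'Adeola')")

# Add the salary column without recreating the table.
conn.execute("ALTER TABLE personal_info ADD COLUMN salary money")

rows = conn.execute("SELECT * FROM personal_info").fetchall()
print(rows)  # the existing employee has no salary yet (None)
```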
3.3.9 DROP
The final command of the Data Definition Language, DROP, allows us to remove entire
database objects from our DBMS. For example, if we want to permanently remove the
personal_info table that we created, we would use the following command:
DROP TABLE personal_info
Similarly, the command below would be used to remove the entire employees database:
DROP DATABASE employees
Use this command with care! Remember that the DROP command removes entire data
structures from your database. If you want to remove individual records, use the
DELETE command of the Data Manipulation Language.
Activity A
1a. What do you understand by Data Definition Language? List some of the
available DDL commands.
b. Write briefly on the following commands:
i. Create
ii. Use
iii. Alter
iv. Drop
Data Manipulation Language (DML) is used to manipulate (select, insert, update, delete)
data.
The SELECT statement is used to select data from a database. The result is stored in a
result table, called the result-set. The SQL SELECT syntax:
In a table, some of the columns may contain duplicate values. This is not a problem;
however, sometimes you will want to list only the different (distinct) values in a table.
The DISTINCT keyword can be used to return only distinct (different) values. The syntax
is:
The WHERE clause is used to extract only those records that fulfill a specified criterion.
The syntax is:
SELECT column_name(s)
FROM table_name
WHERE column_name operator value
Example:
Note: SQL uses single quotes around text values (most database systems will also accept
double quotes). Numeric values, however, should not be enclosed in quotes.
Operator Description
= Equal
<> Not equal
> Greater than
< Less than
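DISTINCT and WHERE can be compared side by side in a minimal SQLite sketch. The sample rows are assumed; this unit's Persons table is not reproduced here. The sketch also follows the quoting note above: the text value is in single quotes, and the number is not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id int, LastName text, FirstName text, City text)")
# Assumed sample rows; two people share the city Sandnes.
conn.executemany("INSERT INTO Persons VALUES (?, ?, ?, ?)", [
    (1, "Hansen", "Ola", "Sandnes"),
    (2, "Svendson", "Tove", "Sandnes"),
    (3, "Pettersen", "Kari", "Stavanger"),
])

# DISTINCT collapses the repeated "Sandnes" into a single row.
cities = conn.execute("SELECT DISTINCT City FROM Persons ORDER BY City").fetchall()

# WHERE with a quoted text value, then with an unquoted number and the <> operator.
in_sandnes = conn.execute(
    "SELECT LastName FROM Persons WHERE City = 'Sandnes' ORDER BY P_Id").fetchall()
not_first = conn.execute(
    "SELECT LastName FROM Persons WHERE P_Id <> 1 ORDER BY P_Id").fetchall()
print(cities, in_sandnes, not_first)
```

When a value comes from a user, application code should pass it as a parameter rather than pasting it into the SQL text, for example `conn.execute("SELECT LastName FROM Persons WHERE City = ?", ("Sandnes",))`. The driver then takes care of the quoting.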
The AND operator displays a record if both the first condition and the second condition are
true.
Example
This will select only the persons with the first name equal to "Tove" AND the last name
equal to "Svendson":
The OR operator displays a record if either the first condition or the second condition is
true.
Example
This will select only the persons with the first name equal to "Tove" OR the first name
equal to "Ola":
The ORDER BY keyword is used to sort the result-set by a specified column. It sorts the
records in ascending order by default. If you want to sort the records in a descending
order, you can use the DESC keyword. The syntax is:
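AND, OR and ORDER BY can be tried together in a minimal SQLite sketch. The sketch uses the names from the examples above, with assumed sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id int, LastName text, FirstName text, City text)")
# Assumed sample rows.
conn.executemany("INSERT INTO Persons VALUES (?, ?, ?, ?)", [
    (1, "Hansen", "Ola", "Sandnes"),
    (2, "Svendson", "Tove", "Sandnes"),
    (3, "Pettersen", "Kari", "Stavanger"),
])

# AND: both conditions must hold.
both = conn.execute(
    "SELECT P_Id FROM Persons WHERE FirstName = 'Tove' AND LastName = 'Svendson'").fetchall()
# OR: either condition may hold.
either = conn.execute(
    "SELECT P_Id FROM Persons WHERE FirstName = 'Tove' OR FirstName = 'Ola' ORDER BY P_Id"
).fetchall()
# ORDER BY sorts ascending unless DESC is given.
by_name_desc = conn.execute("SELECT LastName FROM Persons ORDER BY LastName DESC").fetchall()
print(both, either, by_name_desc)
```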
The INSERT INTO statement is used to insert a new row in a table. The syntax is:
The UPDATE statement is used to update existing records in a table. The SQL UPDATE
syntax is:
UPDATE table_name
SET column1=value1, column2=value2,...
WHERE some_column=some_value
Note: Notice the WHERE clause in the UPDATE syntax. The WHERE clause specifies
which record or records should be updated. If you omit the WHERE clause, all
records will be updated!
The DELETE statement is used to delete rows in a table. The SQL DELETE Syntax is:
Note: Notice the WHERE clause in the DELETE syntax. The WHERE clause specifies
which record or records should be deleted. If you omit the WHERE clause, all
records will be deleted!
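The warning about omitting WHERE can be made concrete with a minimal SQLite sketch. The cursor's `rowcount` attribute reports how many rows each statement touched. The sample rows are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id int, LastName text, City text)")
# Assumed sample rows.
conn.executemany("INSERT INTO Persons VALUES (?, ?, ?)", [
    (1, "Hansen", "Sandnes"), (2, "Svendson", "Sandnes"), (3, "Pettersen", "Stavanger")])

# With a WHERE clause only the matching rows are affected.
one = conn.execute("UPDATE Persons SET City = 'Oslo' WHERE P_Id = 3").rowcount
gone = conn.execute("DELETE FROM Persons WHERE LastName = 'Hansen'").rowcount

# The same UPDATE without a WHERE clause rewrites every remaining row.
every = conn.execute("UPDATE Persons SET City = 'Oslo'").rowcount
cities = conn.execute("SELECT DISTINCT City FROM Persons").fetchall()
print(one, gone, every, cities)
```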
Activity B
Using the customer table shown in table 7.6, run the following SQL statements and
display the resulting record sets:
The LIKE operator is used to search for a specified pattern in a column. The SQL LIKE
Syntax is:
SELECT column_name(s)
FROM table_name
WHERE column_name LIKE pattern
Example: To select the persons living in a city that starts with "s" from table 7.6, we
use the following SELECT statement:
The "%" sign can be used to define wildcards (missing letters in the pattern) both before
and after the pattern.
The IN operator allows you to specify multiple values in a WHERE clause. The syntax is:
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1,value2,...)
Example: To select the persons with a last name equal to "Hansen" or "Pettersen" from
the Persons table, we use the following SELECT statement:
SELECT * FROM Persons
WHERE LastName IN ('Hansen','Pettersen')
The BETWEEN operator selects a range of data between two values. The values can be
numbers, text, or dates. The SQL BETWEEN Syntax is:
SELECT column_name(s)
FROM table_name
WHERE column_name
BETWEEN value1 AND value2
Example: To select the persons with a last name alphabetically between "Hansen" and
"Pettersen" from the Persons table, we use the following SELECT
statement:
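LIKE, IN and BETWEEN can be compared on the same data in a minimal SQLite sketch. The rows are assumed, and the fourth person is added only so that the filters give different results. Two SQLite specifics apply: LIKE ignores upper and lower case for ASCII letters, and BETWEEN includes both end values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id int, LastName text, City text)")
# Assumed sample rows.
conn.executemany("INSERT INTO Persons VALUES (?, ?, ?)", [
    (1, "Hansen", "Sandnes"), (2, "Svendson", "Sandnes"),
    (3, "Pettersen", "Stavanger"), (4, "Nilsen", "Bergen")])


def ids(condition):
    """Return the P_Id of every row matching a fixed WHERE condition."""
    sql = f"SELECT P_Id FROM Persons WHERE {condition} ORDER BY P_Id"
    return [row[0] for row in conn.execute(sql)]


starts_with_s = ids("City LIKE 's%'")     # % stands for any run of characters
ends_with_nes = ids("City LIKE '%nes'")
in_list = ids("LastName IN ('Hansen', 'Pettersen')")
between = ids("LastName BETWEEN 'Hansen' AND 'Pettersen'")  # alphabetical range
print(starts_with_s, ends_with_nes, in_list, between)
```

"Nilsen" falls alphabetically between "Hansen" and "Pettersen", so BETWEEN returns it even though IN does not.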
The JOIN keyword is used in an SQL statement to query data from two or more tables,
based on a relationship between certain columns in these tables.
A primary key is a column (or a combination of columns) with a unique value for each
row. Each primary key value must be unique within the table. The purpose is to bind data
together, across tables, without repeating all of the data in every table.
Persons table
Note that the "P_Id" column is the primary key in the "Persons" table. This means that no
two rows can have the same P_Id. The P_Id distinguishes two persons even if they have
the same name.
4 24562 1
5 34764 15
Note that the "O_Id" column is the primary key in the "Orders" table and that the "P_Id"
column refers to the persons in the "Persons" table without using their names.
Notice that the relationship between the two tables above is the "P_Id" column.
• JOIN: Return rows when there is at least one match in both tables
• LEFT JOIN: Return all rows from the left table, even if there are no matches in
the right table
• RIGHT JOIN: Return all rows from the right table, even if there are no matches
in the left table
• FULL JOIN: Return all rows from both tables, matching them where possible
The INNER JOIN keyword returns rows when there is at least one match in both tables.
The SQL INNER JOIN Syntax is:
SELECT column_name(s)
FROM table_name1
INNER JOIN table_name2
ON table_name1.column_name=table_name2.column_name
Example:
Using tables 7.1 and 7.7 above, if we want to list all the persons with any orders, we use
the following SELECT statement:
The LEFT JOIN keyword returns all rows from the left table (table_name1), even if there
are no matches in the right table (table_name2). The SQL LEFT JOIN Syntax is:
SELECT column_name(s)
FROM table_name1
LEFT JOIN table_name2
ON table_name1.column_name=table_name2.column_name
Example: If we want to list all the persons and their orders, if any, from tables 7.1 and
7.7 above, we use the following SELECT statement:
The LEFT JOIN keyword returns all the rows from the left table (Persons), even if there
are no matches in the right table (Orders).
The RIGHT JOIN keyword returns all rows from the right table (table_name2), even if
there are no matches in the left table (table_name1). The SQL RIGHT JOIN syntax is:
SELECT column_name(s)
FROM table_name1
RIGHT JOIN table_name2
ON table_name1.column_name=table_name2.column_name
Example: Let us list all the orders and the persons who placed them, if any, from tables 7.1
and 7.7 above. We use the following SELECT statement:
The RIGHT JOIN keyword returns all the rows from the right table (Orders), even if
there are no matches in the left table (Persons).
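The three joins can be compared in a minimal SQLite sketch. Orders 4 and 5 follow table 7.7. The person names and orders 1 to 3 are assumptions made for the sketch. Order 5 refers to P_Id 15, which has no matching person. Older SQLite versions (before 3.39.0) have no RIGHT JOIN, so the right join is written as a LEFT JOIN with the tables swapped. That form returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id int PRIMARY KEY, LastName text)")
conn.execute("CREATE TABLE Orders (O_Id int PRIMARY KEY, OrderNo int, P_Id int)")
conn.executemany("INSERT INTO Persons VALUES (?, ?)",
                 [(1, "Hansen"), (2, "Svendson"), (3, "Pettersen")])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (1, 77895, 3), (2, 44678, 3), (3, 22456, 1), (4, 24562, 1), (5, 34764, 15)])

# INNER JOIN: only persons who have orders, and only orders that have a person.
inner = conn.execute("""
    SELECT p.LastName, o.OrderNo FROM Persons AS p
    INNER JOIN Orders AS o ON p.P_Id = o.P_Id
    ORDER BY p.LastName, o.OrderNo""").fetchall()

# LEFT JOIN: every person, with None where there is no order.
left = conn.execute("""
    SELECT p.LastName, o.OrderNo FROM Persons AS p
    LEFT JOIN Orders AS o ON p.P_Id = o.P_Id
    ORDER BY p.LastName, o.OrderNo""").fetchall()

# Persons RIGHT JOIN Orders, written as Orders LEFT JOIN Persons: every order.
right = conn.execute("""
    SELECT p.LastName, o.OrderNo FROM Orders AS o
    LEFT JOIN Persons AS p ON p.P_Id = o.P_Id
    ORDER BY o.OrderNo""").fetchall()
print(inner, left, right, sep="\n")
```

Svendson appears only in the left join, and order 34764 appears only in the right join.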
The UNION operator is used to combine the result-set of two or more SELECT
statements.
Notice that each SELECT statement within the UNION must have the same number of
columns. The columns must also have similar data types. Also, the columns in each
SELECT statement must be in the same order. SQL UNION Syntax:
(a) Employees_Norway:
E_ID E_Name
01 Hansen, Ola
02 Svendson, Tove
03 Svendson, Stephen
04 Pettersen, Kari
(b) Employees_USA:
E_ID E_Name
01 Turner, Sally
02 Kent, Clark
03 Svendson, Stephen
04 Scott, Stephen
E_Name
Hansen, Ola
Svendson, Tove
Svendson, Stephen
Pettersen, Kari
Turner, Sally
Kent, Clark
Scott, Stephen
If we want to list all the different employees in Norway and the USA, we use the following
SELECT statement:
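A minimal SQLite sketch using the two employee tables listed above. UNION removes duplicate rows, so "Svendson, Stephen" appears only once. UNION ALL is shown for contrast; it keeps every row. The order of a UNION result is not guaranteed unless an ORDER BY is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees_Norway (E_ID text, E_Name text)")
conn.execute("CREATE TABLE Employees_USA (E_ID text, E_Name text)")
conn.executemany("INSERT INTO Employees_Norway VALUES (?, ?)", [
    ("01", "Hansen, Ola"), ("02", "Svendson, Tove"),
    ("03", "Svendson, Stephen"), ("04", "Pettersen, Kari")])
conn.executemany("INSERT INTO Employees_USA VALUES (?, ?)", [
    ("01", "Turner, Sally"), ("02", "Kent, Clark"),
    ("03", "Svendson, Stephen"), ("04", "Scott, Stephen")])

# UNION: each distinct name once.
union = conn.execute(
    "SELECT E_Name FROM Employees_Norway UNION SELECT E_Name FROM Employees_USA").fetchall()
# UNION ALL: every row from both SELECT statements, duplicates included.
union_all = conn.execute(
    "SELECT E_Name FROM Employees_Norway UNION ALL SELECT E_Name FROM Employees_USA").fetchall()
print(len(union), len(union_all))
```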
Activity C
1. Using tables 7.1 (Persons table) and 7.7 (Orders table), demonstrate how to
execute the following commands:
i. Join
ii. Left Join
iii. Right Join
iv. Full Join
4.0 Conclusion
5.0 Summary
Employees Table
Department Table
David M. Kroenke, David J. Auer (2008). Database Concepts. New Jersey: Prentice Hall.
Ramez Elmasri, Shamkant B. Navathe (2003). Fundamentals of Database Systems. England:
Addison Wesley.
Fred R. McFadden, Jeffrey A. Hoffer (1994). Modern Database Management. England:
Addison Wesley Longman.
Graeme C. Simsion, Graham C. Witt (2004). Data Modeling Essentials. San Francisco:
Morgan Kaufmann.
Philip J. Pratt, Joseph J. Adamski (2007). Concepts of Database Management. United
States: Course Technology.
1.0 Introduction
2.0 Objectives
3.1 SQL Aggregate Functions
3.1.1 AVG Function
3.1.2 COUNT Function
3.1.3 FIRST Function
3.1.4 LAST Function
3.1.5 MAX Function
3.1.6 MIN Function
3.1.7 SUM Function
3.1.8 GROUP BY Statement
3.1.9 HAVING Clause
3.2 SQL Scalar functions
3.2.1 UCASE Function
3.2.2 LCASE Function
3.2.3 MID Function
3.2.4 LEN Function
3.2.5 ROUND Function
3.2.6 NOW Function
3.2.7 FORMAT Function
4.0 Conclusion
5.0 Summary
6.0 Tutor Marked Assignment
7.0 Further Reading and other Resources
1.0 Introduction
A function is a special type of command word in the SQL command set. In effect,
functions are one-word commands that return a single value. The value of a function can
be determined by input parameters, as with a function that averages a list of database
values. But many functions do not use any type of input parameter, such as the function
that returns the current system time, CURRENT_TIME.
SQL supports a number of useful functions. This unit covers those functions,
providing detailed descriptions and examples.
2.0 Objectives
By the end of this unit, you should be able to:
a. Perform arithmetic operations such as finding the average of a column, finding the sum
of a column, finding the number of records in a table, and finding the minimum and
maximum values in a column.
b. Convert a field to upper or lower case.
c. Extract characters from a text field.
d. Format how a column or field should be displayed.
The AVG function returns the average value of a numeric column. AVG Syntax is:
AveragePrice
950
We may decide to find the customers who have an order Price value higher than the
average Price value.
Customername
Henry Bank
Niyi Alade
James Adeola
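Both results can be reproduced with a minimal SQLite sketch. The Orders rows are rebuilt from the results shown in this unit: the first and last prices, the maximum and minimum, the total of 5700 and the per-customer sums. The prices of orders 3 and 4 (700 and 300) are assumptions that keep Henry Bank's total at 2000. The second query places AVG in a subquery so that each order can be compared with the overall average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (O_Id int, Price int, Customername text)")
# Rebuilt Orders data; the prices of orders 3 and 4 are assumed.
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (1, 1000, "Henry Bank"), (2, 1600, "Niyi Alade"), (3, 700, "Henry Bank"),
    (4, 300, "Henry Bank"), (5, 2000, "James Adeola"), (6, 100, "Niyi Alade"),
])

average = conn.execute("SELECT AVG(Price) AS AveragePrice FROM Orders").fetchone()[0]

# Orders priced above the average, compared through a subquery.
above = conn.execute("""
    SELECT Customername FROM Orders
    WHERE Price > (SELECT AVG(Price) FROM Orders)
    ORDER BY O_Id""").fetchall()
print(average, above)
```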
The COUNT(column_name) function returns the number of values (NULL values will
not be counted) of the specified column:
Now we want to count the number of orders from customer "Niyi Alade".
The result of the SQL statement above will be 2, because the customer Niyi Alade has
made 2 orders in total:
NiyiAlade
2
Example: Let us consider our order table again. Now we want to find the number of
records in the order table.
NumberOfOrders
6
Example: Now we want to count the number of unique customers in the "Orders" table.
TotalCustomers
3
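The three COUNT forms used above can be compared in a minimal SQLite sketch. It uses the same rebuilt Orders data, with the prices of orders 3 and 4 assumed. A hypothetical seventh order with no price is then added. It shows that COUNT(*) counts rows, while COUNT(column) skips NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (O_Id int, Price int, Customername text)")
# Rebuilt Orders data; the prices of orders 3 and 4 are assumed.
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (1, 1000, "Henry Bank"), (2, 1600, "Niyi Alade"), (3, 700, "Henry Bank"),
    (4, 300, "Henry Bank"), (5, 2000, "James Adeola"), (6, 100, "Niyi Alade"),
])


def scalar(sql):
    """Run a query that returns one value and return that value."""
    return conn.execute(sql).fetchone()[0]


niyi_orders = scalar("SELECT COUNT(Customername) FROM Orders WHERE Customername = 'Niyi Alade'")
all_orders = scalar("SELECT COUNT(*) FROM Orders")
customers = scalar("SELECT COUNT(DISTINCT Customername) FROM Orders")

# Hypothetical order without a price: counted as a row, skipped by COUNT(Price).
conn.execute("INSERT INTO Orders VALUES (7, NULL, 'Niyi Alade')")
rows_now = scalar("SELECT COUNT(*) FROM Orders")
priced = scalar("SELECT COUNT(Price) FROM Orders")
print(niyi_orders, all_orders, customers, rows_now, priced)
```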
Example: We will still make use of our orders table in section 3.11
FirstOrderPrice
1000
LastOrderPrice
100
This time around we want to find the largest value of the Price column.
LargestPrice
2000
Example from our Orders table: let us find the smallest value of the Price column.
SmallestPrice
100
The SUM function is used to calculate the total for a column. The syntax is:
Example from the Orders table: we want to find the sum of all values in the Price field.
OrderTotal
5700
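The FIRST, LAST, MAX, MIN and SUM results can be checked together in a minimal SQLite sketch with the same rebuilt Orders data. The prices of orders 3 and 4 are assumed. FIRST() and LAST() are not part of standard SQL and are not available in SQLite. Ordering by O_Id and taking a single row gives the same answers here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (O_Id int, Price int, Customername text)")
# Rebuilt Orders data; the prices of orders 3 and 4 are assumed.
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (1, 1000, "Henry Bank"), (2, 1600, "Niyi Alade"), (3, 700, "Henry Bank"),
    (4, 300, "Henry Bank"), (5, 2000, "James Adeola"), (6, 100, "Niyi Alade"),
])


def scalar(sql):
    """Run a query that returns one value and return that value."""
    return conn.execute(sql).fetchone()[0]


first_price = scalar("SELECT Price FROM Orders ORDER BY O_Id LIMIT 1")       # like FIRST(Price)
last_price = scalar("SELECT Price FROM Orders ORDER BY O_Id DESC LIMIT 1")   # like LAST(Price)
largest = scalar("SELECT MAX(Price) FROM Orders")
smallest = scalar("SELECT MIN(Price) FROM Orders")
total = scalar("SELECT SUM(Price) FROM Orders")
print(first_price, last_price, largest, smallest, total)
```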
Now we want to find the total sum (total order) of each customer.
Customername SUM(Price)
Henry Bank 2000
Niyi Alade 1700
James Adeola 2000
Customername SUM(Price)
Henry Bank 5700
Niyi Alade 5700
Henry Bank 5700
Henry Bank 5700
James Adeola 5700
Niyi Alade 5700
Example: Now we want to find if any of the customers have a total order of less than
2000.
Customername SUM(Price)
Niyi Alade 1700
Now we want to find if the customers "Henry Bank" or "James Adeola" have a total order
of more than 1500.
Customername SUM(Price)
Henry Bank 2000
James Adeola 2000
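The GROUP BY and HAVING examples above can be reproduced in a minimal SQLite sketch with the same rebuilt Orders data. The prices of orders 3 and 4 are assumed. WHERE filters individual rows before grouping, and HAVING filters the groups after SUM has been calculated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (O_Id int, Price int, Customername text)")
# Rebuilt Orders data; the prices of orders 3 and 4 are assumed.
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (1, 1000, "Henry Bank"), (2, 1600, "Niyi Alade"), (3, 700, "Henry Bank"),
    (4, 300, "Henry Bank"), (5, 2000, "James Adeola"), (6, 100, "Niyi Alade"),
])

# One total per customer.
per_customer = conn.execute("""
    SELECT Customername, SUM(Price) FROM Orders
    GROUP BY Customername ORDER BY Customername""").fetchall()

# HAVING keeps only the groups whose total is below 2000.
under_2000 = conn.execute("""
    SELECT Customername, SUM(Price) FROM Orders
    GROUP BY Customername HAVING SUM(Price) < 2000""").fetchall()

# WHERE picks the two customers first; HAVING then tests their totals.
over_1500 = conn.execute("""
    SELECT Customername, SUM(Price) FROM Orders
    WHERE Customername = 'Henry Bank' OR Customername = 'James Adeola'
    GROUP BY Customername HAVING SUM(Price) > 1500
    ORDER BY Customername""").fetchall()
print(per_customer, under_2000, over_1500, sep="\n")
```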
Activity A
1. Write out the SQL syntax for the following functions
i. AVG()
ii. COUNT()
iii. FIRST()
iv. LAST()
v. MAX()
vi. MIN()
vii. SUM()
We shall make use of the following Persons table throughout this section
Example: Using the Persons table in section 3.2, we now want to select the content of
the Surname and FirstName columns, and convert the Surname column to uppercase.
Surname FirstName
HENRY Bank
EBUKA Tunji
PETER Kasim
The LCASE() function converts the value of a column to lowercase. The syntax is:
Example: Let us select the content of the Surname and FirstName columns from our
Persons table, and convert the Surname column to lowercase.
Surname FirstName
henry Bank
ebuka Tunji
peter Kasim
Parameters: Description
Example: Let us extract the first four characters of the "City" column from Persons table.
City
Lago
Abuj
Kadu
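Function names for these operations differ between database products. SQLite, used in this minimal sketch, spells them upper(), lower(), substr() and length(). The surnames and first names come from the results in this section. The full city names (Lagos, Abuja, Kaduna) are assumed from the four-letter results. The Address values are not shown in this unit, so the length example uses City instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id int, Surname text, FirstName text, City text)")
# Names from this section's results; the full city names are assumed.
conn.executemany("INSERT INTO Persons VALUES (?, ?, ?, ?)", [
    (1, "Henry", "Bank", "Lagos"),
    (2, "Ebuka", "Tunji", "Abuja"),
    (3, "Peter", "Kasim", "Kaduna"),
])


def column(sql):
    """Return the first column of every row, in P_Id order."""
    return [row[0] for row in conn.execute(sql + " ORDER BY P_Id")]


upper_surnames = column("SELECT upper(Surname) FROM Persons")   # like UCASE
lower_surnames = column("SELECT lower(Surname) FROM Persons")   # like LCASE
first_four = column("SELECT substr(City, 1, 4) FROM Persons")   # like MID; counts from 1
city_lengths = column("SELECT length(City) FROM Persons")       # like LEN
print(upper_surnames, lower_surnames, first_four, city_lengths)
```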
Example: Let us select the length of the values in the Address column of Persons table.
LengthOfAddress
15
14
14
Parameter Description
column_name Required. The field to round.
Decimals Required. Specifies the number of decimals to be returned.
Now we want to display the product name and the price rounded to the nearest integer.
ProductName UnitPrice
Sugar 10
Salt 33
Palm Oil 16
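A minimal SQLite sketch of ROUND with 0 decimals. The unrounded prices (10.45, 32.56 and 15.67) are assumptions; the unit shows only the rounded results. SQLite's ROUND returns a floating-point value, so the results print as 10.0, 33.0 and 16.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName text, UnitPrice real)")
# Assumed unrounded prices.
conn.executemany("INSERT INTO Products VALUES (?, ?)",
                 [("Sugar", 10.45), ("Salt", 32.56), ("Palm Oil", 15.67)])

# ROUND(column, decimals): 0 decimals rounds to the nearest integer.
rounded = conn.execute(
    "SELECT ProductName, ROUND(UnitPrice, 0) AS UnitPrice FROM Products ORDER BY rowid"
).fetchall()
print(rounded)
```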
Example: Let us consider the product table again. Now we want to display the products
and prices per today’s date.
The FORMAT function is used to format how a field is to be displayed. The syntax is:
Parameter Description
column_name Required. The field to be formatted.
Format Required. Specifies the format.
Example: Let us make use of the products table here. Now we want to display the
products and prices per today's date (with today's date displayed in the following format
"YYYY-MM-DD").
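SQLite has neither NOW() nor FORMAT(). In this minimal sketch, datetime('now') stands in for NOW(), and strftime('%Y-%m-%d', 'now') stands in for formatting the date as "YYYY-MM-DD". SQLite evaluates 'now' in UTC. The product prices are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName text, UnitPrice real)")
# Assumed product prices.
conn.executemany("INSERT INTO Products VALUES (?, ?)",
                 [("Sugar", 10.45), ("Salt", 32.56), ("Palm Oil", 15.67)])

rows = conn.execute("""
    SELECT ProductName, UnitPrice,
           datetime('now')             AS PerDateTime,  -- like NOW(): 'YYYY-MM-DD HH:MM:SS'
           strftime('%Y-%m-%d', 'now') AS PerDate       -- like FORMAT(NOW(), 'YYYY-MM-DD')
    FROM Products ORDER BY rowid""").fetchall()
for row in rows:
    print(row)
```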
Activity B
i. UCASE
ii. LEN
iii. MID
iv. LCASE
v. ROUND
vi. NOW
vii. FORMAT
4.0 Conclusion
SQL has many built-in functions for performing calculations on data. These functions
fall into two categories: SQL aggregate functions and SQL scalar functions. Aggregate
functions operate on a collection of values but return a single, summarizing value.
Scalar functions operate on a single value and return a single value based on the input
value. Some scalar functions, CURRENT_TIME for example, do not require any arguments.
5.0 Summary
i. The basics of retrieving data from the database using SQL.
ii. AVG function returns the average value of a column in a database table.
iii. COUNT function returns the number of rows in a database table.
iv. FIRST function returns the first value of the selected column.
v. LAST function returns the last value of the selected column.
vi. MAX function returns the largest value of the selected column.
vii. MIN function returns the smallest value of the selected column.
viii. SUM function returns the total sum of a numeric column.
ix. UCASE function converts a field to upper case.
x. LCASE function converts a field to lower case.
xi. MID function extracts characters from a text field.
xii. LEN function returns the length of a text field.
xiii. ROUND function rounds a numeric field to the number of decimals specified.
xiv. NOW function returns the current system date and time.
xv. FORMAT function formats how a field is to be displayed.
Employees Table
1.0 Introduction
2.0 Objectives
3.0 Multi Users Databases
3.1 What is Transaction?
3.1.1 Transaction Examples
3.1.2 Multi-Statement Transactions
3.2 Transaction Management with SQL
3.2.1 Rolling Back
3.2.2 Transaction Log
3.2.3 Stored Procedures
3.3 Concurrency Control and Locking
3.3.1 The Scheduler
3.3.2 Characteristics of Locks
3.3.3 Two Phase Locking (2PL)
3.3.4 Deadlocks
3.3.5 How to Prevent Deadlock
3.4 Database Recovery and Management
3.4.1 Reprocessing
3.4.2 Automated Recovery with Rollback / Rollforward
3.5 Database Backup
4.0 Conclusion
5.0 Summary
6.0 Tutor Marked Assignment
7.0 Further Reading and other Resources
1.0 Introduction
In a database management system (DBMS), a transaction consists of one or more data-
manipulation statements and queries, each of which is reading and/or writing information
into the database. A transaction is usually issued to the DBMS as SQL statements wrapped
in a transaction, using a pattern similar to the following:
If there is no error during the execution of the transaction, the system commits the
transaction. A transaction commit operation applies all data manipulations within the
scope of the transaction and stores the results in the database. If an error occurs during the
transaction, or if the user specifies a rollback operation, the data manipulations within the
transaction are not persisted to the database. In no case can a partial transaction be
committed to the database, since that would leave the database in an inconsistent state.
In this unit, we shall make use of the pubs database found in the Microsoft SQL Server
DBMS where the need arises.
2.0 Objectives
By the end of this unit, you should be able to answer the following questions:
• How can we prevent users from interfering with each other's work?
• How can we safely process transactions on the database without
corrupting or losing data?
• If there is a problem (e.g., power failure or system crash), how can we
recover without losing all of our data?
Transaction processing systems are systems with large databases and hundreds of
concurrent users.
a. Atomic means that all the work in the transaction is treated as a single unit. Either
it is all performed or none of it is.
b. Consistent means that a completed transaction leaves the database in a consistent
internal state.
c. Isolation means that the transaction sees the database in a consistent state. The
transaction operates on a consistent view of the data. If two transactions try to
update the same table, one will go first and then the other will follow.
d. Durability means that the results of the transaction are permanently stored in the
system.
The simplest transaction in SQL Server is a single data modification statement. The
following is a transaction even though it does not do much.
UPDATE authors
SET au_fname = 'John'
WHERE au_id = '172-32-1176'
SQL Server first writes to the log file what it is going to do. Then it does the actual
update statement and finally it writes to the log that it completed the update statement.
The writes to the log file are written directly to disk but the update itself is probably done
to a copy of the data that resides in memory. At some future point that database will be
written to disk. If the server fails after a transaction has been committed and written to
the log, SQL Server will use the transaction log to "roll forward" that transaction when it
starts up next.
Example #1:
User A                          User B
Read Salary for emp 101         Read Salary for emp 101
Multiply salary by 1.03         Multiply salary by 1.04
Write Salary for emp 101        Write Salary for emp 101
Example #2:
User A                          User B
Read inventory for Prod 200     Read inventory for Prod 200
Decrement inventory by 5        Decrement inventory by 7
Write inventory for Prod 200    Write inventory for Prod 200
First, what should the values for salary (in the first example) really be?
The DBMS must find a way to execute these two transactions concurrently and
ensure the result is what the users (and designers) intended.
These two are examples of the Lost Update or Concurrent Update problem. Some
changes to the database can be overwritten.
Consider how the operations for users A and B might be interleaved as in example
#2. Assume there are 10 units in inventory for Prod 200:
In the first case, the incorrect amount is written to the database. This is called the
Lost Update problem because we lost the update from User A - it was overwritten by
user B.
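The second example works because we let User A write the new value of Prod 200
before User B can read it. User B therefore reads the 5 remaining units, and its request
to decrement inventory by 7 fails instead of silently overwriting User A's update.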
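The two orderings of example #2 can be replayed in a minimal Python simulation. It is not DBMS code: each user keeps a private copy of the value it read, as a transaction would. A write that would take the inventory below zero is treated as a failure. The names `run` and `AMOUNTS` are hypothetical.

```python
AMOUNTS = {"A": 5, "B": 7}  # units each user wants to take from Prod 200


def run(schedule, stock=10):
    """Apply (user, step) operations to one shared inventory value."""
    db = {"Prod 200": stock}
    copy = {}  # what each user last read
    for user, step in schedule:
        if step == "read":
            copy[user] = db["Prod 200"]
        else:  # "write": store the user's own copy minus its decrement
            new_value = copy[user] - AMOUNTS[user]
            if new_value < 0:
                raise ValueError(
                    f"User {user} wants {AMOUNTS[user]} units but only {copy[user]} remain")
            db["Prod 200"] = new_value
    return db["Prod 200"]


interleaved = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]
serial = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]

lost = run(interleaved)  # B writes 10 - 7 = 3 over A's 5: A's update is lost
try:
    run(serial)
    serial_failed = False
except ValueError as err:
    serial_failed = True
    print(err)
print("interleaved result:", lost)
```

The interleaved schedule ends with 3 units, as if User A had never taken any stock. The serial schedule instead stops User B, which is the intended outcome.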
Here is another example. Users A and B share a bank account. Assume an initial
balance of $200.
The reason we get the wrong final result (a remaining balance of $100) is that
transaction B was allowed to read stale data. This is called the inconsistent read
problem.
If we insist only one transaction can execute at a time, in serial order, then
performance will be quite poor.
BEGIN TRAN
UPDATE authors
SET au_fname = 'John'
WHERE au_id = '172-32-1176'
UPDATE authors
SET au_fname = 'Marg'
WHERE au_id = '213-46-8915'
COMMIT TRAN
Note that we have a BEGIN TRAN at the beginning and a COMMIT TRAN at the end.
These statements start and complete a transaction. Everything inside these statements is
considered a logical unit of work. If the system fails after the first update, neither update
statement will be applied when SQL Server is restarted. The log file will contain a
BEGIN TRAN but no corresponding COMMIT TRAN.
BEGIN TRAN
UPDATE authors
SET au_fname = 'John'
WHERE au_id = '172-32-1176'
UPDATE authors
SET au_fname = 'JohnY'
WHERE city = 'Lawrence'
IF @@ROWCOUNT = 5
COMMIT TRAN
ELSE
ROLLBACK TRAN
Suppose that, for whatever reason, the second update statement should update exactly five
rows. If @@ROWCOUNT, which holds the number of rows affected by the most recent
statement, is five, the transaction commits; otherwise it rolls back. The ROLLBACK TRAN
statement undoes all the work since the matching BEGIN TRAN statement, so neither
update statement takes effect. Note that Query Analyzer will show you messages
indicating that rows were updated, but you can query the database to verify that no actual
data modifications took place.
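The same commit-or-rollback decision can be sketched with Python's sqlite3 module. The cursor's `rowcount` plays the part of @@ROWCOUNT. Opening the connection with `isolation_level=None` lets the sketch issue BEGIN, COMMIT and ROLLBACK itself. The authors rows are hypothetical, not the real pubs data. Only three authors live in Lawrence, so the check fails and both updates are undone.

```python
import sqlite3

# isolation_level=None: no implicit transactions, so BEGIN/COMMIT/ROLLBACK are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE authors (au_id text PRIMARY KEY, au_fname text, city text)")
# Hypothetical rows: three authors in Lawrence instead of the expected five.
conn.executemany("INSERT INTO authors VALUES (?, ?, ?)", [
    ("172-32-1176", "Johnson", "Menlo Park"),
    ("111-11-1111", "Ann", "Lawrence"),
    ("222-22-2222", "Ben", "Lawrence"),
    ("333-33-3333", "Cy", "Lawrence"),
])


def rename_authors(expected_rows=5):
    """Run both updates as one transaction; keep them only if the row count matches."""
    conn.execute("BEGIN")
    conn.execute("UPDATE authors SET au_fname = 'John' WHERE au_id = '172-32-1176'")
    cur = conn.execute("UPDATE authors SET au_fname = 'JohnY' WHERE city = 'Lawrence'")
    if cur.rowcount == expected_rows:  # like IF @@ROWCOUNT = 5
        conn.execute("COMMIT")
        return True
    conn.execute("ROLLBACK")           # undoes both updates
    return False


committed = rename_authors()
first_update_kept = conn.execute(
    "SELECT au_fname FROM authors WHERE au_id = '172-32-1176'").fetchone()[0] == "John"
print(committed, first_update_kept)
```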
Typical uses for stored procedures include data validation (integrated into the database)
or access control mechanisms. Furthermore, stored procedures are used to consolidate
and centralize logic that was originally implemented in applications. Large or complex
processing that might require the execution of several SQL statements is moved into
stored procedures, and all applications call the procedures only.
Ideally, most transactions will occur in stored procedures. Let us look at the second
example inside a stored procedure.
CREATE PROC TranTest1 -- illustrative procedure name
AS
BEGIN TRAN
INSERT INTO authors (au_id,
au_lname,
au_fname,
phone,
contract)
VALUES ('172-32-1176',
'Gates',
'Bill',
'800-BUY-MSFT',
1)
UPDATE authors
SET au_fname = 'Johnzzz'
WHERE au_id = '172-32-1176'
COMMIT TRAN
GO
The problem with this stored procedure is that transactions do not care whether the
statements run correctly or not. They only care whether SQL Server failed in the middle. If
you run this stored procedure, it will try to insert a duplicate entry into the authors table.
You will get a primary key violation error message. The message will even tell you the
statement has been terminated. But the transaction is still going. The UPDATE statement
runs just fine and SQL Server then commits the transaction. The proper way to code this is:
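CREATE PROC TranTest2 -- illustrative procedure name
AS
BEGIN TRAN
INSERT INTO authors (au_id, au_lname, au_fname, phone, contract)
VALUES ('172-32-1176', 'Gates', 'Bill', '800-BUY-MSFT', 1)
IF @@ERROR <> 0
BEGIN
ROLLBACK TRAN
return 10
END
UPDATE authors
SET au_fname = 'Johnzzz'
WHERE au_id = '172-32-1176'
IF @@ERROR <> 0
BEGIN
ROLLBACK TRAN
return 11
END
COMMIT TRAN
GO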
You will notice that we check each statement for failure. If the statement failed (i.e.
@@ERROR <> 0) then we rollback the work performed so far and use the RETURN
statement to exit the stored procedure. It is very important to note that if we do not check
for errors after each statement we may commit a transaction improperly.
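The same check-every-statement pattern can be sketched with Python's sqlite3 module. Here an exception takes the place of testing @@ERROR after each statement. The existing author row is hypothetical. SQLite behaves like SQL Server in the description above: a constraint failure aborts only the failing statement and leaves the transaction open. Without the explicit ROLLBACK, the UPDATE could still run and be committed.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE authors (au_id text PRIMARY KEY, au_lname text, au_fname text)")
# Hypothetical existing row with the au_id that the procedure tries to insert again.
conn.execute("INSERT INTO authors VALUES ('172-32-1176', 'White', 'Johnson')")


def tran_test():
    """Insert a duplicate author, then update; undo everything on the first error."""
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO authors VALUES ('172-32-1176', 'Gates', 'Bill')")
        conn.execute("UPDATE authors SET au_fname = 'Johnzzz' WHERE au_id = '172-32-1176'")
    except sqlite3.Error as err:
        # Equivalent of IF @@ERROR <> 0: roll back the work so far and stop.
        conn.execute("ROLLBACK")
        return f"rolled back: {err}"
    conn.execute("COMMIT")
    return "committed"


outcome = tran_test()
name = conn.execute("SELECT au_fname FROM authors WHERE au_id = '172-32-1176'").fetchone()[0]
print(outcome, name)
```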
Activity A
1. What is a transaction?
2. What are the properties of a transaction?
We need the ability to control how transactions are run in a multiuser database. Let us
define some basic terms that are related to concurrency control:
Locks may be applied to data items in two ways: Implicit and Explicit.
Implicit Locks are applied by the DBMS, while Explicit Locks are applied by
application programs.
Locks may be of the following types depending on the requirements of the transaction:
i. An Exclusive Lock prevents any other transaction from reading or modifying the
locked item.
ii. A Shared Lock allows another transaction to read an item but prevents another
transaction from writing the item.
i. A transaction acquires locks on data items it will need to complete the transaction.
This is called the growing phase.
ii. Once any lock is released, no other lock may be acquired. This is called the
shrinking phase.
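The two phases can be enforced mechanically. The minimal Python sketch below tracks the locks held by one transaction and refuses any new lock after the first release. It models only the two-phase rule; it does not model conflicts between transactions. The class name is hypothetical.

```python
class TwoPhaseTransaction:
    """Records the locks one transaction holds and enforces the two-phase rule."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False  # switches to True at the first release

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name} cannot lock {item!r} after releasing a lock")
        self.held.add(item)  # growing phase

    def unlock(self, item):
        self.held.remove(item)
        self.shrinking = True  # the shrinking phase has begun


t1 = TwoPhaseTransaction("T1")
t1.lock("Prod 200")
t1.lock("Salary emp 101")
t1.unlock("Prod 200")
try:
    t1.lock("Prod 300")  # breaks the rule: a new lock after a release
    violated = False
except RuntimeError as err:
    violated = True
    print(err)
```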
Let us consider our previous examples in section 3.11, this time using exclusive locks:
3.3.4 Deadlock
Deadlock refers to a specific condition when two or more processes are each waiting for
each other to release a resource, or more than two processes are waiting for resources in a
circular chain. Deadlock is a common problem in multiprocessing where many processes
share a specific type of mutually exclusive resource known as a software lock or soft lock.
Deadlock occurs when two transactions wait for each other to unlock data.
Client applications using the database may require exclusive access to a table, and in
order to gain exclusive access they ask for a lock. If one client application holds a lock on
a table and attempts to obtain the lock on a second table that is already held by a second
client application, this may lead to deadlock if the second application then attempts to
obtain the lock that is held by the first application. (But this particular type of deadlock is
easily prevented, e.g., by using an all-or-none resource allocation algorithm.)
This is also called a deadlock. One transaction has locked some of the resources and is
waiting for locks so it can complete. A second transaction has locked those needed items
but is awaiting the release of locks the first transaction is holding so it can continue.
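The all-or-none allocation mentioned above can be sketched in a few lines of Python. A transaction receives every lock it requests at once, or none of them. It therefore never holds one table while waiting for another, so the circular wait described above cannot form. The class and table names are hypothetical, and the sketch does not queue waiting transactions.

```python
class AllOrNoneLockManager:
    """Grants a transaction every lock it asks for at once, or none of them."""

    def __init__(self):
        self.owner = {}  # item -> name of the transaction holding it

    def request(self, txn, items):
        # Refuse the whole request if any item is held by another transaction.
        if any(self.owner.get(item, txn) != txn for item in items):
            return False
        for item in items:
            self.owner[item] = txn
        return True

    def release_all(self, txn):
        for item in [i for i, holder in self.owner.items() if holder == txn]:
            del self.owner[item]


mgr = AllOrNoneLockManager()
t1_granted = mgr.request("T1", ["Persons", "Orders"])
# T2 asks in the opposite order, which could deadlock with one-lock-at-a-time requests.
t2_first_try = mgr.request("T2", ["Orders", "Persons"])
t2_holds_nothing = "T2" not in mgr.owner.values()
mgr.release_all("T1")  # T1 finishes and frees both tables
t2_retry = mgr.request("T2", ["Orders", "Persons"])
print(t1_granted, t2_first_try, t2_holds_nothing, t2_retry)
```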
In any of these situations, data in the database may become inconsistent or lost.
Database Recovery is the process of restoring the database and the data to a consistent
state. This may include restoring lost data up to the point of the event (e.g. system crash).
3.4.1 Reprocessing
In a Reprocessing approach, the database is periodically backed up (a database save) and
all transactions applied since the last save are recorded.
If the system crashes, the latest database save is restored and all of the transactions are re-
applied (by users) to bring the database back up to the point just before the crash.
• Before Image: A copy of the table record (or page) of data before it was changed
by the transaction.
• After Image: A copy of the table record (or page) of data after it was changed by
the transaction.
• Rollback: Undo any partially completed transactions (ones in progress when the
crash occurred) by applying the before images to the database.
• Rollforward: Redo the transactions by applying the after images to the database.
This is done for transactions that were committed before the crash.
• Recovery process uses both rollback and rollforward to restore the database.
• In the worst case, we would need to rollback to the last database save and then
rollforward to the point just before the crash.
• Checkpoints, which are less time-consuming than full saves, can also be taken between
database saves.
• At a checkpoint, the DBMS flushes all pending transactions and writes all data to disk and
to the transaction log.
• The database can then be recovered from the last checkpoint in much less time.
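The rollback and rollforward steps above can be sketched in Python, assuming a record-level log in which each entry holds a transaction id, a record key, the before image and the after image, plus the set of transactions that committed before the crash. The functions `rollback` and `rollforward` are illustrative names only. With checkpoints, the same rollforward would start from the state written at the last checkpoint instead of the last database save, so fewer log entries need to be replayed.

```python
# Each log entry: (transaction id, record key, before image, after image).
log = [
    ("T1", "A1", 100, 70),
    ("T2", "A2", 50, 80),
    ("T1", "A3", 0, 30),
]
committed = {"T1"}  # T2 was still in progress when the crash occurred

def rollback(db, log, committed):
    # Undo: walk the log backwards, restoring before images of uncommitted work.
    for txn, key, before, _after in reversed(log):
        if txn not in committed:
            db[key] = before

def rollforward(db, log, committed):
    # Redo: walk the log forwards, applying after images of committed work.
    for txn, key, _before, after in log:
        if txn in committed:
            db[key] = after

# Case 1: the database on disk survived but holds T2's partial change.
on_disk = {"A1": 70, "A2": 80, "A3": 30}
rollback(on_disk, log, committed)
print(on_disk)  # {'A1': 70, 'A2': 50, 'A3': 30}

# Case 2 (worst case): restore the last database save, then roll forward.
restored = {"A1": 100, "A2": 50, "A3": 0}
rollforward(restored, log, committed)
print(restored)  # {'A1': 70, 'A2': 50, 'A3': 30}
```

Both routes end in the same consistent state: T1's committed changes are present and T2's partial change is gone.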
One solution is to shut down the DBMS (and thus all applications), do a full backup by
copying everything onto tape, and then start up again.
• An incremental backup copies only the data changed or added since the previous
backup (full or incremental). It is sometimes called a delta backup.
• Follows something like:
a. Weekend: Do a shutdown of the DBMS, and full backup of the database
onto a fresh tape(s).
b. Nightly: Do an incremental backup onto different tapes for each night of
the week.
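Under this schedule, restoring the database needs the most recent full backup followed by every later incremental backup, applied in order. The small Python function below is a hypothetical helper, assuming each incremental backup holds only the changes since the previous backup of any kind; it picks out that set from a week's backups.

```python
def restore_plan(backups):
    """Return the labels of the backups needed for a restore, oldest first.

    `backups` is a chronological list of (label, kind) pairs, where kind is
    "full" or "incremental". The restore starts from the most recent full
    backup and then applies every incremental backup taken after it.
    Raises ValueError if the list contains no full backup.
    """
    last_full = max(i for i, (_, kind) in enumerate(backups) if kind == "full")
    return [label for label, _ in backups[last_full:]]


week = [("Sunday full", "full"), ("Monday", "incremental"),
        ("Tuesday", "incremental"), ("Wednesday", "incremental")]
print(restore_plan(week))  # ['Sunday full', 'Monday', 'Tuesday', 'Wednesday']
```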
Activity B
1. Explain the following terms in relation to database transaction management:
i. Concurrency Control
ii. Transaction Throughput
iii. Serialization
iv. Implicit Locking
4.0 Conclusion
5.0 Summary
lv. A transaction is a set of read and write operations that must either execute to
completion or not at all.
lvi. A transaction has four key properties, abbreviated ACID: Atomicity, Consistency,
Isolation, and Durability.
lvii. Concurrency control is a method of controlling or scheduling operations in
such a way that concurrent transactions can be executed without interfering with one another.
i. Before Image
ii. After image
iii. Rollback
iv. Roll forward
v. Checkpoints
1.0 Introduction
2.0 Objectives
3.1 Database Security
3.1.1 Need for Database Security
3.1.2 Approaches to Database Security
3.1.3 Database Security Goals and Threats
3.1.4 Security Threat Classification
3.1.5 Classification of Database Security
3.1.6 Database Security at Design Level
3.1.7 Database Security at Maintenance Level
3.1.8 Database Security System
3.1.9 Authorization Subsystem
3.2 The SQL GRANT and REVOKE Statements
3.2.1 The Grant Statement
3.2.2 The Revoke Statement
4.0 Conclusion
5.0 Summary
6.0 Tutor Marked Assignment
7.0 Further Reading and other Resources
1.0 Introduction
Database security refers to the protection of data against unauthorized access, alteration,
or destruction. The system needs to be aware of certain constraints that users must not violate;
those constraints must be specified in a suitable language and maintained in the
system catalog, and the DBMS must monitor users to ensure that the constraints are enforced.
Security in a database involves mechanisms to protect the data and to ensure that it is not
accessed, altered or deleted without proper authorization. Access to data should be
restricted and should be protected from accidental destruction.
This unit provides an overview of database security and recovery. The need for database
security, the classification of database security and the different types of database failures
are discussed.
2.0 Objectives
By the end of this unit, you should be familiar with the following concepts:
c. Policy questions
d. Operational problems
e. Hardware control
f. Operating system support
g. Issues that are the specific concern of the database system itself
Security threats can be classified as accidental or intentional, according to the way they
occur.
The accidental threats include human errors, software errors, and natural or accidental
disasters:
a. Human errors include giving incorrect input and using applications incorrectly.
Intentional threats include authorized users who abuse their privileges and authority,
and hostile agents such as unauthorized users who improperly read or write data.
a. Physical Security: This refers to the security of the hardware associated with the
system and the protection of the site where the computer resides. Natural events such
as fires, floods, and earthquakes can be considered part of the physical threats. It
is advisable to keep backup copies of databases to guard against major disasters.
b. Logical Security: This refers to the security measures residing in the operating
system or the DBMS to handle threats to the data.
a. Operating system Availability: The operating system should verify that users
and application programs attempting to access the system are authorized.
Authorization rules are controls incorporated in the data management system that restrict
access to data and also restrict the actions that people may take when they access data.
Authentication may be carried out by the operating system or the relational database
management system. Users are given individual accounts, each with a username and password.
c. Encryption: This is the coding of data so that they cannot be read and understood
easily. Some DBMS products include encryption routines that automatically encode
sensitive data when they are stored. They also provide complementary routines for
decoding the data.
Activity A
1. What are the goals of database security?
2. What do you understand by database security threat?
The database security system stores authorization rules and enforces them for each
database access. When a group of users accesses the data in the database, privileges
may be assigned to the group rather than to individual users. In this section we shall
consider the authorization subsystem of the database security system.
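A minimal Python sketch of how an authorization subsystem might store rules and check them on each access, including privileges assigned to a group rather than to individual users. The rule format, the group name and the `is_authorized` function are assumptions made for illustration.

```python
# Hypothetical authorization rules: (subject, object, action) triples.
# A subject may be an individual user or a group of users.
rules = {
    ("clerks", "ORDERS", "read"),
    ("clerks", "ORDERS", "insert"),
    ("salim", "PRODUCTS", "modify"),
}
groups = {"clerks": {"salim", "john"}}

def is_authorized(user, obj, action):
    # A request is allowed if the user holds the privilege directly
    # or through membership of a group that holds it.
    subjects = {user} | {g for g, members in groups.items() if user in members}
    return any((s, obj, action) in rules for s in subjects)

print(is_authorized("john", "ORDERS", "insert"))    # True, via the clerks group
print(is_authorized("john", "PRODUCTS", "modify"))  # False
```

Granting to the group means one rule covers every clerk, and a new clerk gains the privileges simply by being added to the group.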
iv. Constraint: A more specific rule regarding an aspect of the object and
action.
i. Subject-Based Security
a. Subjects are individually defined in the DBMS and each object and action is
specified.
b. For example, user Salim (a Subject) has the following authorizations:
                  Objects
Actions   EMPLOYEES   ORDERS   PRODUCTS   ...
Read      Y           Y        Y          ...
Insert    N           Y        N          ...
Modify    N           Y        Y          ...
Delete    N           N        N          ...
Grant     N           N        N          ...
a. Objects are individually defined in the DBMS and each subject and action is
specified.
b. For example, the EMPLOYEES table (an Object) has the following
authorizations:
                  Subjects
Actions   SALIM   JOHN   GAFAR   DBA   ...
Read      Y       Y      N       Y     ...
Insert    N       N      N       Y     ...
Modify    N       N      N       Y     ...
Delete    N       N      N       Y     ...
Grant     N       N      N       Y     ...
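Subject-based and object-based security can be seen as two views of the same set of authorization rules. The Python sketch below stores the "Y" entries from the two tables above as (subject, object, action) triples and derives each view; the function names `subject_view` and `object_view` are hypothetical.

```python
# Authorizations taken from the two tables above, stored once as
# (subject, object, action) triples. Only the "Y" entries are recorded.
rules = {
    ("SALIM", "EMPLOYEES", "Read"), ("SALIM", "ORDERS", "Read"),
    ("SALIM", "PRODUCTS", "Read"), ("SALIM", "ORDERS", "Insert"),
    ("SALIM", "ORDERS", "Modify"), ("SALIM", "PRODUCTS", "Modify"),
    ("JOHN", "EMPLOYEES", "Read"),
}
# The DBA holds every action on EMPLOYEES.
rules |= {("DBA", "EMPLOYEES", a) for a in ("Read", "Insert", "Modify", "Delete", "Grant")}

def subject_view(subject):
    # Subject-based security: everything one subject may do, per object.
    view = {}
    for s, obj, action in rules:
        if s == subject:
            view.setdefault(obj, set()).add(action)
    return view

def object_view(obj):
    # Object-based security: who may do what to one object.
    view = {}
    for s, o, action in rules:
        if o == obj:
            view.setdefault(s, set()).add(action)
    return view

print(sorted(subject_view("SALIM")["ORDERS"]))   # ['Insert', 'Modify', 'Read']
print(sorted(object_view("EMPLOYEES")["JOHN"]))  # ['Read']
```

GAFAR has no "Y" entries, so he does not appear in either view at all.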
Another option of the GRANT statement is WITH GRANT OPTION. This allows the
grantee to propagate the authorization to another subject.
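The effect of WITH GRANT OPTION can be modelled with a small privilege table in Python. This is a simplified sketch under stated assumptions (one owner per object, privileges recorded as (subject, object, privilege) keys, and a flag recording whether the grant option was given); the class `PrivilegeTable` and its methods are hypothetical and do not mirror any particular RDBMS.

```python
class PrivilegeTable:
    """Toy model of GRANT ... WITH GRANT OPTION (hypothetical names)."""

    def __init__(self, owner, obj, privileges=("SELECT", "INSERT", "UPDATE", "DELETE")):
        # The owner of the object starts with each privilege and may pass it on.
        # Each entry maps (subject, object, privilege) -> holds grant option?
        self.grants = {(owner, obj, p): True for p in privileges}

    def grant(self, grantor, grantee, obj, priv, with_grant_option=False):
        # Only a subject holding the privilege WITH GRANT OPTION may propagate it.
        if not self.grants.get((grantor, obj, priv), False):
            raise PermissionError(f"{grantor} cannot grant {priv} on {obj}")
        key = (grantee, obj, priv)
        # Never downgrade a grant option the grantee already holds.
        self.grants[key] = self.grants.get(key, False) or with_grant_option

    def can(self, subject, obj, priv):
        return (subject, obj, priv) in self.grants


table = PrivilegeTable(owner="dba", obj="products")
# GRANT SELECT ON products TO salim WITH GRANT OPTION;
table.grant("dba", "salim", "products", "SELECT", with_grant_option=True)
# salim passes the privilege on (allowed, because he holds the grant option).
table.grant("salim", "john", "products", "SELECT")
try:
    table.grant("john", "gafar", "products", "SELECT")  # john has no grant option
except PermissionError as err:
    print(err)  # john cannot grant SELECT on products
print(table.can("john", "products", "SELECT"))   # True
print(table.can("gafar", "products", "SELECT"))  # False
```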
Example:
Let us assume that we have a database with:
GRANT SELECT
ON products
TO salim;
GRANT SELECT
ON employees, departments
TO john;
Example:
If Mr. Salim leaves the company, then we should revoke the privileges given to him
as follows:
REVOKE SELECT
ON products
FROM salim;
Many RDBMS have an ALL PRIVILEGES option that will revoke all of the
privileges on an object from a subject:
REVOKE ALL PRIVILEGES
ON products
FROM salim;
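A toy Python model of REVOKE, including the ALL PRIVILEGES form, assuming privileges are held as (privilege, object) pairs per subject. Salim's privileges on the orders table are invented for the example, and `revoke` is a hypothetical helper. Real RDBMSs must also deal with privileges that were passed on WITH GRANT OPTION (standard SQL offers CASCADE or RESTRICT for this), which this sketch ignores.

```python
# Hypothetical privilege store: subject -> set of (privilege, object) pairs.
privileges = {
    "salim": {("SELECT", "products"), ("INSERT", "orders"), ("UPDATE", "orders")},
    "john": {("SELECT", "employees"), ("SELECT", "departments")},
}

def revoke(privilege, obj, subject):
    # Models REVOKE privilege ON obj FROM subject; "ALL PRIVILEGES" removes
    # every privilege the subject holds on that object.
    held = privileges.get(subject, set())
    if privilege == "ALL PRIVILEGES":
        held -= {pair for pair in held if pair[1] == obj}
    else:
        held.discard((privilege, obj))

revoke("SELECT", "products", "salim")        # REVOKE SELECT ON products FROM salim;
revoke("ALL PRIVILEGES", "orders", "salim")  # REVOKE ALL PRIVILEGES ON orders FROM salim;
print(privileges["salim"])  # set()
```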
Activity B
1. Explain the following SQL commands with examples:
a. Grant statement
b. Revoke statement
2. Let us assume that we have a database with the following information:
4.0 Conclusion
Database security is the system, processes, and procedures that protect a database from
unintended activity. Unintended activity can be categorized as authenticated misuse,
malicious attacks or inadvertent mistakes made by authorized individuals or processes.
Databases provide many layers and types of information security, typically specified in
the data dictionary, including: Access control, Auditing, Authentication, Encryption, and
Integrity controls.
5.0 Summary
lxv. Database security refers to the protection of data against unauthorized access,
alteration, or destruction.
lxvi. Access to data should be restricted and should be protected from accidental
destruction.
lxvii. In a multi-user environment, database security is needed in order to maintain the
consistency of the data.
lxviii. Some of the goals of database security include confidentiality, integrity, and
availability.
lxix. Security threats can be classified as accidental or intentional, according to the
way they occur.
lxx. Database security can be classified into physical and logical security.
lxxi. The database security system stores authorization rules and enforces them for
each database access.
lxxii. Authorization rules take into account a few main ideas such as: Subjects, Objects,
Actions, and Constraints.
lxxiii. GRANT is used to grant an action on an object to a subject.
2. How would you ensure database security at Design and Maintenance levels?
1.0 Introduction
2.0 Objectives
3.0 Database System Architectures
3.1 Traditional Mainframe Architecture
3.1.1 Advantages of Traditional Mainframe Architecture
3.1.2 Disadvantages of Traditional Mainframe Architecture
3.2 Personal Computer - Stand-Alone Database
3.3 File Sharing Architecture
3.3.1 Advantages of File Sharing Architecture
3.3.2 Disadvantages of File Sharing Architecture
3.4 Two-Tier Client/Server Architecture
3.4.1 Advantages of Client/Server
3.4.2 Disadvantages of Client/Server
3.5 N-Tier Client/Server Architectures
3.5.1 Advantages of N-Tier Client/Server
3.5.2 Disadvantages of N-Tier Client/Server
3.6 Open Database Connectivity (ODBC)
3.6.1 ODBC Client
3.6.2 ODBC Driver for the ODBC Server
3.6.3 DBMS Server
3.6.4 How do these three components interact?
3.6.5 What is so great about ODBC?
3.6.7 ODBC Implementation
4.0 Conclusion
5.0 Summary
6.0 Tutor Marked Assignment
7.0 Further Reading and other Resources
1.0 Introduction
The database architecture is the set of specifications, rules, and processes that dictate how
data is stored in a database and how data is accessed by components of a system. It
includes data types, relationships, and naming conventions. The database architecture
describes the organization of all database objects and how they work together. It affects
integrity, reliability, scalability, and performance. The database architecture involves
anything that defines the nature of the data, the structure of the data, or how the data
flows.
2.0 Objectives
There are a number of database system architectures presently in use. One must examine
several criteria:
Figure 11.1 is a block diagram of the mainframe architecture. Some of the properties of
traditional mainframe database system architecture are:
d. Multiple users access the applications through simple terminals (e.g., IBM 3270
terminals or VT220 terminals) that have no processing power of their own. User
interface is text-mode screens.
e. Example: DB2 database and COBOL application programs running on an IBM
390.
This is referred to as single-user database mode. In the single-user database mode, you
can create any number of databases on local or network drives for individual use. This
framework is suitable for users who wish to maintain private databases of personal or
corporate information, but without losing the ability to easily exchange data with other
users' private databases, or with central corporate databases.
Figure 11.2 is a block diagram of the stand-alone architecture. Some of the properties of
stand-alone database system architecture are:
Figure 11.3 is a block diagram of the file sharing architecture. Some of the properties of
file sharing database system architecture are:
Figure 11.4 is a block diagram of the client-server architecture. Some of the properties of
client-server database system architecture are:
a. Client machines:
a. Processing of the entire Database System is spread out over clients and server.
b. DBMS can achieve high performance because it is dedicated to processing
transactions (not running applications).
c. Client Applications can take full advantage of advanced user interfaces such
as Graphical User Interfaces.
A variation of the n-tier architecture is the web-based n-tier application. These systems
combine the scalability benefits of n-tier client/server systems with the rich user interface
of web-based systems (see figure 11.5).
Because the middle tier in three-tier architecture contains the business logic, there is
greatly increased scalability and isolation of the business logic, as well as added
flexibility in the choice of database vendors.
Activity A
Open Database Connectivity (ODBC) is Microsoft's strategic interface for accessing data
in a heterogeneous environment of relational and non-relational database management
systems. ODBC provides an open, vendor-neutral way of accessing data stored in a
variety of proprietary personal computer, minicomputer, and mainframe databases.
ODBC alleviates the need for independent software vendors and corporate developers to
learn multiple application programming interfaces. ODBC now provides a universal data
access interface. With ODBC, application developers can allow an application to
concurrently access, view, and modify data from multiple, diverse databases.
This ODBC driver is software that resides on the front-end. The ODBC Driver Catalog
contains an extensive listing of ODBC Drivers. For example, the Microsoft ODBC Driver
Pack is a collection of seven ODBC Drivers ready to be used or bundled with ODBC
clients. A SQL Server ODBC Driver is included with Access.
Any ODBC client can access any DBMS for which there is an ODBC driver. The DBMS
server is a back-end or server DBMS, for example SQL Server, Oracle, AS/400,
FoxPro, Microsoft Access, or any DBMS for which an ODBC driver exists.
Open the Control Panel and select Administrative Tools (shown in figure 11.6, from
Windows XP):
Open the Data Sources (ODBC) icon. This is called the ODBC Data Source
Administrator.
Click on the ODBC Drivers tab to see which drivers are installed:
In the above example (see figure 11.7), we have ODBC drivers for:
To add more drivers, download or install the ODBC driver from the database
manufacturer. The ODBC driver will then appear on this list.
Clicking on the User DSN tab shows those data sources that have been defined for a user
(see figure 11.8).
System DSNs may be used by anyone with an account on the computer system.
Both User and System DSNs are maintained in the registry of the local machine.
File DSNs store all of the DSN information in a file that can be shared between users of
many machines, e.g., by putting the File DSN on a file server.
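Because a File DSN is an ordinary INI-style text file with an [ODBC] section, it can be written and read with Python's standard configparser module. The sketch below assumes a driver named "SQL Server" and a server called dbhost (both hypothetical values), and turns the file back into a DSN-less connection string; with the third-party pyodbc package installed, such a string could be passed to pyodbc.connect. The helper function names are illustrative only.

```python
import configparser
import io

def write_file_dsn(settings):
    # A File DSN is a plain INI-style text file with an [ODBC] section.
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case, e.g. DRIVER rather than driver
    parser["ODBC"] = settings
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()

def connection_string_from_dsn(text):
    # Turn the [ODBC] keys back into a DSN-less connection string.
    parser = configparser.ConfigParser()
    parser.optionxform = str
    parser.read_string(text)
    parts = []
    for key, value in parser["ODBC"].items():
        if key == "DRIVER":
            value = "{" + value + "}"  # driver names containing spaces are wrapped in braces
        parts.append(f"{key}={value}")
    return ";".join(parts)

dsn_text = write_file_dsn({"DRIVER": "SQL Server", "SERVER": "dbhost", "DATABASE": "Bank"})
print(connection_string_from_dsn(dsn_text))
# DRIVER={SQL Server};SERVER=dbhost;DATABASE=Bank
```

Keeping the settings in a shared file rather than in each machine's registry is exactly what lets a File DSN be placed on a file server.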
Figure 11.9 shows the setup dialog for a Microsoft Access ODBC driver:
Activity B
1. Explain the following terms:
a. ODBC
b. ODBC Client
c. ODBC Driver
d. DBMS Server
2. Explain how to set up an ODBC data source for SQL Server.
4.0 Conclusion
By determining which tier(s) these components are processed on, we can get a good idea
of what type of architecture and subtype we are dealing with.
5.0 Summary
i. Application Logic
ii. One-tier Database Architecture
iii. Two-tier Database Architecture
iv. Three-tier Database Architecture
David M. Kroenke, David J. Auer (2008). Database Concepts. New Jersey: Prentice Hall.
Ramez Elmasri, Shamkant B. Navathe (2003). Fundamentals of Database Systems. England:
Addison Wesley.
Fred R. McFadden, Jeffrey A. Hoffer (1994). Modern Database Management. England:
Addison Wesley Longman.
Philip J. Pratt, Joseph J. Adamski (2007). Concepts of Database Management. United
States: Course Technology.
1.0 Introduction
2.0 Objectives
3.0 Database Design Steps
3.1 Planning a Microsoft Access Application
3.2 Creating a Microsoft Access Application
3.3 Introduction to Microsoft Access
3.3.1 Starting Microsoft Access
3.3.2 Create a database using the Database Wizard
3.3.3 Create a database without using the Database Wizard
3.4 Creating Tables in Microsoft Access
3.4.1 Creating a Table Using the Design View
3.4.2 Primary Key
3.4.3 Switching Views
3.4.4 Entering Data
3.4.5 Manipulating Data
3.5 Creating Relationships between Tables
4.0 Conclusion
5.0 Summary
6.0 Tutor Marked Assignment
7.0 Further Reading and other Resources
1.0 Introduction
2.0 Objectives
This section serves as a reminder of what we have learnt in all the previous units. The
basic steps to design a database are as follows:
To design a Microsoft Access database application, you will first need to define the
purpose of the application by determining how it will be used and what results it
must produce. You can gather this information by talking to the people who will be
using the application. You will want to list the tasks that the users must perform with the
database and gather together examples of the current paper forms and reports that they
use and produce.
After analyzing the database users' needs and workflow, you can then decide how users
will navigate through your application and complete their tasks. Some of the
customized navigation tools that you can incorporate into your application design include
command buttons, custom menu commands and custom toolbar buttons. You can also
design an interface that controls how the database application starts up and which
parts of the database application are available to individual users or groups of
users. Other elements of the graphical user interface that you should consider are the
layout that you will use, how you will group particular objects and the logic that you
will apply to allow the user to move from one object to another.
As you will be designing database queries, forms, reports and other objects based upon
your database table design, it is extremely important that you take time up front to plan a
sound database structure and its relationships. As you analyze the data that the users will
be working with, separate it into different subjects, each of which will become an entity.
You can eliminate data redundancy and inconsistent data dependency by normalizing
your data to ensure that all tables are in at least third normal form.
You will also need to plan the database security. Planning security means that you can
control what individuals or groups of users can do with the database tables, queries,
forms, reports, macros and modules. You will need to determine, again by interviewing
the application's users, who should have access to an application's objects and data and
who should be able to change an object's design.
Once you have worked through the stages of Planning a Microsoft Access Application,
you will then move onto creating the application in Microsoft Access. The following
checklist details the application needs and data sources:
a. Investigation Phase
i. Talk to the users who will be working with the database application to find out
their data input needs, reporting needs, querying and other data needs and
application security needs.
ii. Create a rough prototype by using the Database Wizard and other wizards and
templates.
j. Implementing Security
i. Create user and group security according to who can access what in the
database application.
ii. Assign permissions to the security groups.
iii. Make backups of the application.
Microsoft Access is a powerful program to create and manage your databases. It has
many built in features to assist you in constructing and viewing your information.
First of all you need to understand how Microsoft Access breaks down a database. Some
keywords involved in this process are: Database File, Table, Record, Field, Data-type.
Database File: This is your main file that encompasses the entire database and that is
saved to your hard drive or floppy disk. Example: Bank.mdb
Table: A table is a collection of data about a specific topic. There can be multiple tables
in a database. Examples: Customers and Accounts tables
Field: Fields are the different categories within a table. Tables usually contain multiple
fields. Example: LastName and FirstName in the Customers table
Data types: Data types are the properties of each field. A field has only one data type.
Example: the LastName field in a Students table could be of type Text.
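The same ideas (a database file holding several tables, each made of typed fields) can be tried outside Access with Python's built-in sqlite3 module. SQLite is only a stand-in here: its type names (TEXT, REAL) differ from Access data types such as Text, Number or Currency, and the table layout below is an assumed version of the Bank example.

```python
import sqlite3

# One database holds many tables. ":memory:" keeps the demo off disk;
# a path such as "bank.db" would create a real file, much like Bank.mdb.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customers (
        CustomerID TEXT,   -- each field has exactly one data type
        LastName   TEXT,
        FirstName  TEXT
    )""")
conn.execute("CREATE TABLE Accounts (AccountNo TEXT, CustomerID TEXT, Balance REAL)")

# List the tables in the database, then the fields and types of one table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
fields = [(col[1], col[2]) for col in conn.execute("PRAGMA table_info(Customers)")]
print(tables)  # ['Accounts', 'Customers']
print(fields)  # [('CustomerID', 'TEXT'), ('LastName', 'TEXT'), ('FirstName', 'TEXT')]
```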
ii. Click on Start --> Programs --> Microsoft Access (see figure 12.2)
a. When Microsoft Access first starts up, a dialog box is automatically displayed
with options to create a new database or open an existing one. If this dialog box is
displayed, click Access Database Wizards, pages, and projects and then click
OK.
If you have already opened a database or closed the dialog box that displays when
Microsoft Access starts up, click New Database on the toolbar.
b. On the Databases tab, double-click the icon for the kind of database you want to
create.
c. Specify a name and location for the database.
a. When Microsoft Access first starts up, a dialog box is automatically displayed
with options to create a new database or open an existing one. If this dialog box is
displayed, click Blank Access Database, and then click OK.
If you have already opened a database or closed the dialog box that displays when
Microsoft Access starts up, click New Database on the toolbar, and then double-click
the Blank Database icon on the General tab.
b. Specify a name and location for the database and click Create. (Figure 12.3 is the
screen that appears after this step.)
The two main features of this main screen are the menu bar that runs along the top of the
window and the series of tabs in the main window. The menu bar is similar to other
Microsoft Office products such as Excel. The menus include:
i. File - Menu items to Open, Close, Create new, Save and Print databases and their
contents. This menu also has the Exit item to exit Access.
ii. Edit - Cut, Copy, Paste, Delete
iii. View - View different database objects (tables, queries, forms, reports)
iv. Insert - Insert a new Table, Query, Form, Report, etc.
v. Tools - A variety of tools to check spelling, create relationships between tables,
perform analysis and reports on the contents of the database.
vi. Window - Switch between different open databases.
vii. Help - Get help on Access.
Tables are the main units of data storage in Access. There are a number of ways to create
a table in Access. Access provides wizards that guide the user through creating a table by
suggesting names for tables and columns. The other main way to create a table is by
using the Design View to manually define the columns (fields) and their data types.
To create a table in Access using the Design View, perform the following steps:
i. Click on the Tables tab in the left hand pane of the database dialog box
ii. Double click on the "Create Table in Design View" item in the right hand pane.
The Table Design View will appear. Fill in the Field Name, Data Type and Description
for each column/field in the table. Refer to table 7.2, Module 2, Unit 1 for available data
types in Microsoft Access.
A primary key is one or more fields (columns) whose value or values uniquely identify
each record in a table. A primary key does not allow Null values and must always have a
unique value. A primary key is used to relate a table to foreign keys in other tables.
NOTE: You do not have to define a primary key, but it is usually a good idea. If you do
not define a primary key, Microsoft Access asks you if you would like to create one when
you save the table.
To define a primary key, simply select the field or fields to be used and click the
Primary Key button (see figure 12.4).
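A sketch in Python's sqlite3 module of what a primary key enforces, using an assumed Customers table with made-up names. SQLite is only a stand-in for Access, and one difference matters: SQLite accepts NULL in a non-integer PRIMARY KEY column unless NOT NULL is also declared, so the sketch declares it explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite, unlike Access, tolerates NULL in a
# non-integer PRIMARY KEY column unless told otherwise.
conn.execute("CREATE TABLE Customers (CustomerID TEXT NOT NULL PRIMARY KEY, LastName TEXT)")
conn.execute("INSERT INTO Customers VALUES ('C001', 'Okafor')")

def try_insert(customer_id):
    # Attempt to add a record; report whether the primary key accepted it.
    try:
        conn.execute("INSERT INTO Customers VALUES (?, 'Bello')", (customer_id,))
        return "inserted"
    except sqlite3.IntegrityError:
        return "rejected"

# Duplicate key, missing (Null) key, then a new unique key.
results = [try_insert("C001"), try_insert(None), try_insert("C002")]
print(results)  # ['rejected', 'rejected', 'inserted']
```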
To switch between the datasheet (spreadsheet) view and the design view, simply click
the view button in the top-left corner of the Access window.
The Datasheet View button allows you to enter raw data into your database table (see
figure 12.5).
The Design View button allows you to enter fields, data types, and descriptions into your
database table (see figure 12.6).
Click on the Datasheet View and simply start entering the data into each field (see figure
12.7). NOTE: Before starting a new record, the primary key field must have something
in it.
To add a new row, simply move down to a new line and enter the information.
To update a record, simply select the record and field you want to update, and change its
data to what you want.
To delete a record, select the entire row and press the Delete key on the keyboard.
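Each of these datasheet actions corresponds to an SQL statement. The sketch below shows the equivalents using Python's sqlite3 module on an assumed Customers table with made-up names; the same INSERT, UPDATE and DELETE statements also exist in Access SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID TEXT NOT NULL PRIMARY KEY, LastName TEXT)")

# Add new rows (typing on a new line of the datasheet).
conn.execute("INSERT INTO Customers VALUES ('C001', 'Okafor')")
conn.execute("INSERT INTO Customers VALUES ('C002', 'Bello')")
# Update one field of one record.
conn.execute("UPDATE Customers SET LastName = 'Okeke' WHERE CustomerID = 'C001'")
# Delete an entire row.
conn.execute("DELETE FROM Customers WHERE CustomerID = 'C002'")
conn.commit()

print(conn.execute("SELECT * FROM Customers").fetchall())  # [('C001', 'Okeke')]
```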
Activity A
After you have set up multiple tables in your Microsoft Access database, you need a way
of telling Access how to bring that information back together again. The first step in this
process is to define relationships between your tables.
A relationship works by matching data in key fields - usually a field with the same name
in both tables. In most cases, these matching fields are the primary key from one table,
which provides a unique identifier for each record, and a foreign key in the other table.
In the Bank database we have created, the Customers table is related to the Accounts
table by virtue of the CustomerID field appearing in both tables.
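The matching of key fields can be illustrated with a join in Python's sqlite3 module, assuming simplified Customers and Accounts tables with made-up data. Customers.CustomerID is the primary key and Accounts.CustomerID is the foreign key, so one customer row can match several account rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, LastName TEXT);
    CREATE TABLE Accounts  (AccountNo TEXT PRIMARY KEY, CustomerID TEXT, Balance REAL);
    INSERT INTO Customers VALUES ('C001', 'Okafor'), ('C002', 'Bello');
    INSERT INTO Accounts VALUES ('A100', 'C001', 500.0), ('A101', 'C001', 75.5),
                                ('A200', 'C002', 20.0);
""")

# Customers.CustomerID (primary key) matches Accounts.CustomerID (foreign key),
# which brings the information from both tables back together.
rows = conn.execute("""
    SELECT c.LastName, a.AccountNo
    FROM Customers AS c JOIN Accounts AS a ON a.CustomerID = c.CustomerID
    ORDER BY a.AccountNo""").fetchall()
print(rows)  # [('Okafor', 'A100'), ('Okafor', 'A101'), ('Bello', 'A200')]
```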
i. On the menu bar click on Tools --> Relationships. The Show Table dialog box
will appear as shown in figure 12.8
ii. Highlight both the Customers table and the Accounts table as shown below and
then click on the Add button.
iii. Click on the Close button to close this dialog box. The Relationships screen will
now reappear with the two tables displayed in figure 12.9
iv. To connect the Customers table with the Accounts table to form a relationship,
click on the CustomerID field in the Customers table and drag it over on top of
the CustomerID field on the Accounts table. Upon releasing the mouse button, the
Edit Relationships dialog box will appear as in figure 12.10
Access tries to determine the Relationship Type between the two tables. Most times two
tables have a One-to-Many relationship, and this is usually the default chosen by Access.
For this example, Access knows that CustomerID is the primary key of the Customers table,
so it chooses this field as the "One" side. This makes the Accounts table the "Many" side,
as one customer may have many accounts.
v. One additional step to be taken is to check the box labeled "Enforce
Referential Integrity". This option puts constraints into effect such that an
Accounts record cannot be created without a valid Customer, and Access will also
prevent a user from deleting a Customer record if a related Accounts record
exists. At this point, click on the Create button to create the relationship. The
Relationships screen should reappear with the new relationship in place, as shown
in figure 12.11.
a. When the Cascade Update Related Fields check box is set, changing a
primary key value in the primary table automatically updates the matching
value in all related records.
b. When the Cascade Delete Related Records check box is set, deleting a
record in the primary table deletes any related records in the related table.
Note the symbols "1" (indicating the "One" side) and the infinity symbol (indicating the
"Many" side) on the relationship. Close the relationships screen and select Yes to save the
changes to the Relationships layout.
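The Enforce Referential Integrity, Cascade Update Related Fields and Cascade Delete Related Records options correspond roughly to SQL foreign-key clauses. The sketch below demonstrates them with Python's sqlite3 module on assumed tables and made-up data. SQLite checks foreign keys only after PRAGMA foreign_keys = ON, and the mapping to Access's check boxes is an approximation, not Access's own implementation.

```python
import sqlite3

def make_db(cascade):
    # Build a small Customers/Accounts database, with or without cascading.
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    actions = "ON UPDATE CASCADE ON DELETE CASCADE" if cascade else ""
    conn.executescript(f"""
        CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, LastName TEXT);
        CREATE TABLE Accounts (
            AccountNo  TEXT PRIMARY KEY,
            CustomerID TEXT NOT NULL REFERENCES Customers(CustomerID) {actions}
        );
        INSERT INTO Customers VALUES ('C001', 'Okafor');
        INSERT INTO Accounts VALUES ('A100', 'C001');
    """)
    return conn

def attempt(conn, sql):
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "refused"

# Referential integrity only: orphan accounts and customer deletes are refused.
strict = make_db(cascade=False)
orphan_insert = attempt(strict, "INSERT INTO Accounts VALUES ('A999', 'C404')")
parent_delete = attempt(strict, "DELETE FROM Customers WHERE CustomerID = 'C001'")
print(orphan_insert, parent_delete)  # refused refused

# With cascading: key changes and deletes flow through to related Accounts rows.
cascading = make_db(cascade=True)
cascading.execute("UPDATE Customers SET CustomerID = 'C010' WHERE CustomerID = 'C001'")
print(cascading.execute("SELECT CustomerID FROM Accounts").fetchall())  # [('C010',)]
cascading.execute("DELETE FROM Customers WHERE CustomerID = 'C010'")
print(cascading.execute("SELECT COUNT(*) FROM Accounts").fetchone())  # (0,)
```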
Activity B
4.0 Conclusion
Microsoft Access is a very powerful relational database management tool for creating
database applications. It has many built-in features to assist you in constructing and
viewing your information.
5.0 Summary
lxxxv. Microsoft Access is a powerful program to create and manage your databases.
lxxxvi. Database file is the main file that encompasses the entire database.
lxxxvii. Tables are the main units of data storage in Microsoft Access.
lxxxviii. Tables can be created by using design view or by using wizards
lxxxix. Fields are the different attributes of a Table.
xc. Data types are the properties of each field and a field only has one data type.
xci. Primary key is one or more fields (columns) whose value or values uniquely
identify each record in a table.
xcii. Before starting a new record, the primary key field must have something in it.
xciii. It is necessary to define relationships between tables in a database.
xciv. A relationship works by matching data in key fields in both tables.
xcv. The referential integrity option in a relationship puts constraints into effect.
xcvi. When the Cascade Update Related Fields check box is set, changing a primary
key value in the primary table automatically updates the matching value in all
related records.
xcvii. When the Cascade Delete Related Records check box is set, deleting a record in
the primary table deletes any related records in the related table.
2. Examine the following flat file and design the relational model for this kind of a
1.0 Introduction
2.0 Objectives
3.1 Types of Microsoft Access Queries
3.1.1 Select Query
3.1.2 Action Query
3.1.3 Parameter Query
3.1.4 Aggregate Query
3.2 Creating Select Queries in MS Access
3.3 Creating a Calculated Field
3.4 Working with IIf Function
3.5 Summarising Groups of Records
4.0 Conclusion
5.0 Summary
6.0 Tutor Marked Assignment
7.0 Further Reading and other Resources
1.0 Introduction
Queries are very useful tools when it comes to databases and they are often called by the
user through a form. They can be used to search for and grab data from one or more of
your tables, perform certain actions on the database and even carryout a variety of
calculations depending on your needs.
In this unit, we will use Access to create a variety of queries that analyze and manipulate
database information.
2.0 Objectives
Microsoft Access allows for many types of queries, some of the main ones being select,
action, parameter and aggregate queries. Table 13.1 shows the different types of queries in
Microsoft Access.
Query Type     Description
Append Query   Appends or adds selected records from one table to another table. Useful
               for importing information into a table.
Delete Query   Deletes selected records from one or more tables.
Update Query   Updates selected information in a table. For example, you could raise the
The select query is the simplest type of query and because of that, it is also the most
commonly used one in Microsoft Access databases. It can be used to select and display
data from either one table or a series of them depending on what is needed.
In the end, it is the user-determined criteria that tell the database what the selection is to
be based on. When the select query is run, it creates a "virtual" table in which the data can
be changed, but only one record at a time.
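In SQL terms a select query is a SELECT statement whose WHERE clause holds the user's criteria. A sketch using Python's sqlite3 module, assuming an Accounts table with invented balances:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Accounts (AccountNo TEXT PRIMARY KEY, CustomerID TEXT, Balance REAL);
    INSERT INTO Accounts VALUES ('A100', 'C001', 500.0), ('A101', 'C001', 75.5),
                                ('A200', 'C002', 20.0);
""")

# A select query only reads data: the criteria decide which rows appear.
selected = conn.execute(
    "SELECT AccountNo, Balance FROM Accounts WHERE Balance >= ? ORDER BY AccountNo",
    (50,)).fetchall()
print(selected)  # [('A100', 500.0), ('A101', 75.5)]
```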
When the action query is called, the database undergoes a specific action depending on
what was specified in the query itself. This can include such things as creating new
tables, deleting rows from existing ones and updating records or creating entirely new
ones.
Action queries are very popular in data management because they allow for many records
to be changed at one time instead of only single records like in a select query.
a. Append Query – takes the result set of a query and appends (adds) it to an
existing table.
b. Delete Query – deletes from an underlying table all records that appear in the result set
of a query.
c. Make Table Query – as the name suggests, creates a table based on the result set
of a query.
d. Update Query – allows one or more fields in your table to be updated.
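The four action queries map onto SQL statements that change many rows at once. The sketch below, again using Python's sqlite3 module with assumed table names and data, runs an append, a delete, an update and a make-table query in turn. Access SQL writes append and make-table queries as INSERT INTO ... SELECT and SELECT ... INTO respectively, whereas SQLite uses CREATE TABLE ... AS SELECT for the latter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Accounts (AccountNo TEXT PRIMARY KEY, CustomerID TEXT, Balance REAL);
    CREATE TABLE ClosedAccounts (AccountNo TEXT PRIMARY KEY, CustomerID TEXT, Balance REAL);
    INSERT INTO Accounts VALUES ('A100', 'C001', 500.0), ('A101', 'C001', 0.0),
                                ('A200', 'C002', 20.0);
""")

# Append query: add a query's results to an existing table.
conn.execute("INSERT INTO ClosedAccounts SELECT * FROM Accounts WHERE Balance = 0")
# Delete query: remove the same rows from the source table.
conn.execute("DELETE FROM Accounts WHERE Balance = 0")
# Update query: change a field in many records at once (credit every account with 10).
conn.execute("UPDATE Accounts SET Balance = Balance + 10")
# Make table query: create a new table from a query's results.
conn.execute("CREATE TABLE BigAccounts AS SELECT * FROM Accounts WHERE Balance > 100")
conn.commit()

closed = conn.execute("SELECT AccountNo FROM ClosedAccounts").fetchall()
remaining = conn.execute("SELECT AccountNo, Balance FROM Accounts ORDER BY AccountNo").fetchall()
big = conn.execute("SELECT AccountNo FROM BigAccounts").fetchall()
print(closed)     # [('A101',)]
print(remaining)  # [('A100', 510.0), ('A200', 30.0)]
print(big)        # [('A100',)]
```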
v. If you select an existing table, click one of the following options: Current
Database (if the table is in the currently open database) or Another Database (and
type the name of the other database, including the path, if necessary). Click OK,
and then add the fields you want to append and identify a matching field if Access
does not supply one.
vi. Click OK and click the View button on the toolbar to view the results of the query
or the Run button on the toolbar to append the records.
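In SQL terms, the append query built by these steps is an INSERT INTO ... SELECT statement, a form Access SQL also uses. The sketch below runs one in SQLite through Python's sqlite3 module, using hypothetical Customers and NewCustomers tables. The column list after INSERT INTO is where each appended field is matched to its field in the target table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, SName TEXT);
CREATE TABLE NewCustomers (ID INTEGER, Surname TEXT);
INSERT INTO Customers VALUES (1, 'Okafor');
INSERT INTO NewCustomers VALUES (2, 'Bello'), (3, 'Eze');
""")

# Append query: add the selected records from NewCustomers to Customers.
# Fields are matched by position: ID -> CustomerID, Surname -> SName.
conn.execute("""
    INSERT INTO Customers (CustomerID, SName)
    SELECT ID, Surname FROM NewCustomers
""")
conn.commit()

names = [r[0] for r in conn.execute("SELECT SName FROM Customers ORDER BY CustomerID")]
print(names)
```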
i. In the Database window, click the Queries tab in the Objects bar and click the
New button.
ii. Select Design view and click OK.
iii. Add the appropriate tables and/or queries, click Close, and then join any tables
that are not already related.
iv. Click the Query Type button list arrow on the toolbar and select Delete Query.
v. Click the View button to view the results of the delete query.
vi. If you are satisfied that the appropriate records will be deleted, click the Run
button on the toolbar and click Yes to confirm the deletion.
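The View-then-Run pattern in these steps (preview first, delete second) can be sketched in SQL as a SELECT followed by a DELETE that uses the same criteria. The sketch uses SQLite via Python's sqlite3 rather than Access. The Accounts table and its Status field are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounts (AccountNumber INTEGER PRIMARY KEY, Status TEXT);
INSERT INTO Accounts VALUES (100, 'Open'), (101, 'Closed'), (102, 'Closed');
""")

# Step 1 (like clicking View): preview the records that would be deleted.
preview = conn.execute(
    "SELECT AccountNumber FROM Accounts WHERE Status = 'Closed' ORDER BY AccountNumber"
).fetchall()
print("Will delete:", preview)

# Step 2 (like clicking Run and confirming): delete the same records.
conn.execute("DELETE FROM Accounts WHERE Status = 'Closed'")
conn.commit()

remaining = conn.execute("SELECT AccountNumber FROM Accounts").fetchall()
print("Remaining:", remaining)
```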
i. Create a new query in Design view, and then select the tables and/or queries you
want to use in the update query.
ii. Click the Query Type button list arrow on the toolbar and select Update Query, or
select Query > Update Query from the menu.
iii. Double-click the fields that you want to appear in the query or click and drag the
fields onto the design grid.
iv. Enter an expression to update the selected field and enter any criteria, if needed,
to select which records should be updated.
v. Click the View button to view the results of the update query. If you're satisfied
that the appropriate records will be updated, click the Run button on the toolbar to
update the records.
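An update query corresponds to an SQL UPDATE statement. The expression entered for the field (the Update To row in the design grid) goes in the SET clause, and the criteria go in the WHERE clause. The sketch below uses SQLite via Python's sqlite3 and adds 5% interest to savings balances only. The table, fields and rate are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounts (AccountNumber INTEGER PRIMARY KEY, AccountType TEXT, Balance REAL);
INSERT INTO Accounts VALUES (100, 'Savings', 2000.0), (101, 'Current', 400.0);
""")

# Update query: SET holds the update expression, WHERE selects the records.
conn.execute("""
    UPDATE Accounts
    SET Balance = ROUND(Balance * 1.05, 2)
    WHERE AccountType = 'Savings'
""")
conn.commit()

balances = dict(conn.execute("SELECT AccountNumber, Balance FROM Accounts"))
print(balances)
```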
In Microsoft Access, a parameter query works together with other types of queries to
produce the results you want. With this type of query, you pass a parameter to another
query, such as an action or a select query. The parameter can be either a value or a
condition, and it tells the other query exactly what you want it to do.
Parameter queries are often chosen because they display a dialog box in which the end
user can enter a different parameter value each time the query is run. In its simplest form,
a parameter query is just a modified select query.
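In Access, a parameter is usually typed in the Criteria row as a bracketed prompt such as [Enter City:], and Access asks for its value each time the query runs. The sketch below imitates that in SQLite via Python's sqlite3. The ? placeholder is the parameter, and the function argument stands in for what the user types into the dialog box. Table and field names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, SName TEXT, City TEXT);
INSERT INTO Customers VALUES
  (1, 'Okafor', 'Lagos'), (2, 'Bello', 'Abuja'), (3, 'Eze', 'Lagos');
""")

def customers_in(city):
    """Run the same select query with a different parameter value each time."""
    # The value is passed separately from the SQL text, so unusual input
    # (quotes, etc.) cannot change the query itself.
    return conn.execute(
        "SELECT SName FROM Customers WHERE City = ? ORDER BY CustomerID", (city,)
    ).fetchall()

print(customers_in("Lagos"))
print(customers_in("Abuja"))
```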
The aggregate query is a special type of query. Like the parameter query, it works on
other queries (such as select, action or parameter queries), but instead of passing a
parameter to another query it groups records and calculates totals for each group. In
effect, it summarizes any selected attribute (field) in the table. The available options are:
a. Sum
b. Avg
c. Min
d. Max
e. First
f. Last
g. Group By
h. Count
i. StDev
j. Var
k. Expression
l. Where
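Several of these options map directly onto SQL. Group By becomes GROUP BY, and Sum, Avg, Min, Max and Count become SUM, AVG, MIN, MAX and COUNT. StDev, Var, First and Last have no built-in equivalent in core SQLite. The minimal sketch below, in SQLite via Python's sqlite3, totals a hypothetical Accounts table by account type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounts (AccountNumber INTEGER PRIMARY KEY, AccountType TEXT, Balance REAL);
INSERT INTO Accounts VALUES
  (100, 'Savings', 2500.0), (101, 'Current', 400.0),
  (102, 'Savings', 500.0),  (103, 'Current', 100.0);
""")

# One output row per group: GROUP BY picks the groups, and the aggregate
# functions summarize the Balance field within each group.
totals = conn.execute("""
    SELECT AccountType, SUM(Balance), AVG(Balance), MIN(Balance), MAX(Balance), COUNT(Balance)
    FROM Accounts
    GROUP BY AccountType
    ORDER BY AccountType
""").fetchall()

for row in totals:
    print(row)
```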
A query can be created using either Design view or the Query Wizard. In this section, we
will use Design view.
Queries are accessed by clicking on the Queries tab in the Access main screen as shown
in figure 13.1:
i. Click the Queries icon in the Objects bar, then double-click Create query in Design
view.
ii. Select the table or query you want to use and click Add.
iii. Repeat step ii as necessary for additional tables or queries. Click Close when you
are through.
iv. Double-click each field you want to include in the field list, or drag the field
from the field list onto the design grid (see figure 13.2).
v. In the design grid enter any desired criteria for the field in the criteria row (see
figure 13.3).
vi. Click the sort box list arrow for the field and select a sort order (see figure 13.4).
Activity A
1. Display the list of customers that reside in Lagos sorted by their surname.
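One possible answer to Activity A, written as SQL, approximates what Design view generates. It runs in SQLite via Python's sqlite3 and assumes a Customers table with hypothetical FName, SName and City fields. In the design grid, this corresponds to the criterion "Lagos" under City and Ascending in the Sort row under SName.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, FName TEXT, SName TEXT, City TEXT);
INSERT INTO Customers VALUES
  (1, 'Chidi', 'Okafor', 'Lagos'),
  (2, 'Amina', 'Bello', 'Abuja'),
  (3, 'Tunde', 'Adeyemi', 'Lagos');
""")

# Criteria row -> WHERE; Sort row (Ascending) -> ORDER BY ... ASC.
lagos = conn.execute("""
    SELECT SName, FName FROM Customers
    WHERE City = 'Lagos'
    ORDER BY SName ASC
""").fetchall()
print(lagos)
```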
A calculated field performs some type of arithmetic on one or more fields in a database to
come up with a completely new field.
You must create an expression (or formula) to perform a calculation. To enter a field in an
expression, type the field name in brackets, such as [FName]. If a field name exists in more
than one table, you need to enter the name of the table that contains the field in brackets,
such as [Customers], followed by an exclamation mark (!), and then the field name in
brackets. For example, the customer surname field in a Customers table could be
represented as [Customers]![SName].
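In the query design grid, a calculated field is typed as Name: expression, for example FullName: [FName] & " " & [SName]. The SQL sketch below shows the same ideas. It runs in SQLite via Python's sqlite3, with hypothetical tables and an assumed Rate field. AS names the new field, and || replaces Access's & for joining text. Table.Field plays the role of [Table]![Field] when a field name such as CustomerID appears in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, FName TEXT, SName TEXT);
CREATE TABLE Accounts (AccountNumber INTEGER PRIMARY KEY, CustomerID INTEGER,
                       Balance REAL, Rate REAL);
INSERT INTO Customers VALUES (1, 'Chidi', 'Okafor');
INSERT INTO Accounts VALUES (100, 1, 2000.0, 0.05);
""")

# CustomerID exists in both tables, so it is qualified with its table name.
row = conn.execute("""
    SELECT Customers.CustomerID,
           Customers.FName || ' ' || Customers.SName AS FullName,
           Accounts.Balance * Accounts.Rate AS Interest
    FROM Customers
    JOIN Accounts ON Accounts.CustomerID = Customers.CustomerID
""").fetchone()
print(row)
```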
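Functions are used to create more complicated calculations or expressions than operators
alone can.
There are several hundred functions in Microsoft Access, but all of them are used in a
similar way:
FunctionName(argument1, argument2, ...)
An argument in Microsoft Access is a value that a function uses to perform its calculation.
This section introduces a very useful database function: the IIf function. The IIf function
evaluates a condition and returns one value if the condition is true and another value if
the condition is false. The syntax is:
IIf(expr, truepart, falsepart)
Part        Description
expr        Required. The expression you want to evaluate.
truepart    Required. The value or expression returned if expr is True.
falsepart   Required. The value or expression returned if expr is False.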
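Consider an Access calculated field such as Class: IIf([Balance] >= 1000, "Premium", "Standard"), which returns one of two values for each record. The sketch below expresses the same logic in SQLite via Python's sqlite3. It substitutes a CASE expression for IIf, which works in every SQLite version; recent SQLite versions also accept iif(). The Class field, the 1000 threshold and the labels are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounts (AccountNumber INTEGER PRIMARY KEY, Balance REAL);
INSERT INTO Accounts VALUES (100, 2500.0), (101, 400.0), (102, 1000.0);
""")

# CASE WHEN <expr> THEN <truepart> ELSE <falsepart> END mirrors
# IIf(expr, truepart, falsepart).
classes = conn.execute("""
    SELECT AccountNumber,
           CASE WHEN Balance >= 1000 THEN 'Premium' ELSE 'Standard' END AS Class
    FROM Accounts
    ORDER BY AccountNumber
""").fetchall()
print(classes)
```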
When you work with queries, you will often be less interested in the individual records
and more interested in summarized information about groups of records. A query can
calculate information about a group of records in one or more tables. For example, you
could create a query that finds the total tuition fees paid by students in each department
for a particular academic session. The Total row lets you group and summarize
information in a query. The Total row is normally hidden in the query design window;
you can display it by selecting View > Totals from the menu. Once the Total row is
displayed, you can tell Microsoft Access how you want to summarize each field. Table
13.2 summarizes the available Total options in Microsoft Access.
Option       Description
Group By     Groups the values in the field so that you can perform calculations on the groups.
Sum          Calculates the total (sum) of values in a field.
Avg          Calculates the average of values in a field.
Min          Finds the lowest value in a field.
Max          Finds the highest value in a field.
Count        Counts the number of entries in a field, not including blank (Null) values.
StDev        Calculates the standard deviation of values in a field.
Var          Calculates the variance of values in a field.
First        Finds the value of the field in the first record.
Last         Finds the value of the field in the last record.
Expression   Tells Access that you want to create your own expression to calculate a field.
Where        Specifies criteria for a field to limit the records included in a calculation.
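Two of these options are easy to misread: Where filters records before they are grouped, and Count skips Null values. The sketch below runs in SQLite via Python's sqlite3 against a hypothetical Payments table. It returns the total tuition fee per department for one academic session. COUNT(Fee) skips the student with no fee recorded, while COUNT(*) counts every record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Payments (StudentID INTEGER, Department TEXT, Session TEXT, Fee REAL);
INSERT INTO Payments VALUES
  (1, 'Computing', '2008/2009', 50000),
  (2, 'Computing', '2008/2009', NULL),
  (3, 'Physics',   '2008/2009', 40000),
  (4, 'Computing', '2007/2008', 45000);
""")

summary = conn.execute("""
    SELECT Department, SUM(Fee), COUNT(Fee), COUNT(*)
    FROM Payments
    WHERE Session = '2008/2009'   -- Where: filter records before grouping
    GROUP BY Department           -- Group By: one row per department
    ORDER BY Department
""").fetchall()
print(summary)
```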
Activity B
e. In the next panel, you will be asked to choose between a detail and a summary query.
Choose Detail and click on the Next button.
f. Save the query as qryAccounts and click on the Finish button.
2. Modify the query to sort the output on the account number and display only savings
accounts.
a. From the Queries tab on the Access main screen, highlight the qryAccounts and
click on the Design button.
b. Change the Sort order for the AccountNumber field to Ascending.
Add the following criterion to the Criteria row under the AccountType field:
= 'Savings'
c. Run the query by pulling down the Query menu and choosing the Run menu item.
The output is shown below:
d. Save and close the query to return to the Access main screen.
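The modified qryAccounts corresponds roughly to the SQL below. The criterion = 'Savings' becomes the WHERE clause, and the Ascending sort becomes ORDER BY. The sketch runs in SQLite via Python's sqlite3. The CustomerID and Balance columns and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounts (AccountNumber INTEGER PRIMARY KEY, CustomerID INTEGER,
                       AccountType TEXT, Balance REAL);
INSERT INTO Accounts VALUES
  (103, 2, 'Savings', 150.0), (101, 1, 'Current', 400.0), (102, 1, 'Savings', 2500.0);
""")

# Criteria row -> WHERE AccountType = 'Savings'; Sort row -> ORDER BY ... ASC.
qry_accounts = conn.execute("""
    SELECT AccountNumber, CustomerID, AccountType, Balance
    FROM Accounts
    WHERE AccountType = 'Savings'
    ORDER BY AccountNumber ASC
""").fetchall()

for row in qry_accounts:
    print(row)
```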
a. Create a new query called "Accounts Summary Query" that joins the Customers
table (include the CustomerID and Name fields) with the Accounts table (include
the Balance field only).
b. In the second step of the wizard, click on the Summary choice (instead of Detail)
and then click on the Summary Options... button.
c. Check all of the Summary Options boxes: Sum, Avg, Min and Max.
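With these Summary options checked, the wizard builds a totals query that groups by customer. An approximate SQL equivalent joins the two tables and applies Sum, Avg, Min and Max to Balance for each customer. It runs in SQLite via Python's sqlite3, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Accounts (AccountNumber INTEGER PRIMARY KEY, CustomerID INTEGER, Balance REAL);
INSERT INTO Customers VALUES (1, 'Chidi Okafor'), (2, 'Amina Bello');
INSERT INTO Accounts VALUES (100, 1, 2500.0), (101, 2, 400.0), (102, 1, 500.0);
""")

# Join the tables, then summarize Balance once per customer.
summary = conn.execute("""
    SELECT c.CustomerID, c.Name,
           SUM(a.Balance), AVG(a.Balance), MIN(a.Balance), MAX(a.Balance)
    FROM Customers AS c
    JOIN Accounts AS a ON a.CustomerID = c.CustomerID
    GROUP BY c.CustomerID, c.Name
    ORDER BY c.CustomerID
""").fetchall()

for row in summary:
    print(row)
```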
4.0 Conclusion
Queries are a fundamental means of accessing and displaying data from tables. Queries
can access a single table or multiple tables.
5.0 Summary
xcviii. Queries can be used to search for and retrieve data from one or more tables.
xcix. Some of the main query types in Microsoft Access are select, action, parameter
and aggregate queries.
c. A select query can be used to select and display data from one table or from
several tables, depending on what is needed.
ci. There are four kinds of action queries: Append, Delete, Make-Table and Update.
cii. A parameter query can be used to pass a parameter to select and action queries.
ciii. An aggregate query groups records and summarizes any selected attribute (field) in
the table.
1. Which of the following criteria is NOT written using the proper syntax?
A. "Harris"
B. Between 1/1/2000 and 12/31/2000
C. NO VALUE
D. 500
2. Which of the following types of queries are action queries? (Select all that apply.)
A. Parameter queries.
B. Append queries.
C. Update queries.
D. Crosstab queries.
3. Which of the following expressions is NOT written in the correct syntax?
A. [Order Total]*[Tax Rate]
B. "Order Total"*0.1
C. [tblCustomerTours]![Cost]*[tblEmployees]![Commission]
D. 100+10
4. If you are having trouble remembering how to write expressions using the correct
syntax, you can use the Expression Builder to help you create the expression.
(True or False?)
5. Rebate: IIF([Age]65,"Senior","Adult") This expression is an example of:
A. Something I learned back in high school algebra and thought I would
never see again.
B. A financial expression.
C. Something that belongs in a Microsoft Excel book.
D. A conditional expression.
6. A query prompts a user for a date and then displays only records that contain the
specified date. Which type of query is this?
A. A parameter query.
B. A crosstab query.
C. An action query.
D. An update query.
7. You must create a report if you want to calculate totals for a group of records, as
queries can't perform this task. (True or False?)
8. A query summarizes information in a grid, organized by regions and months.
Which type of query is this?
A. A parameter query.
B. A crosstab query.
C. An action query.
D. An update query.
9. Your company finally agreed to buy you a nifty 3COM Palm palmtop. Now you
want to extract your clients from the company's database and put them into a
separate table that you can export to your Palm. Which type of query could help
you accomplish this task?
A. A parameter query.
B. A crosstab query.
C. An update query.
D. A make-table query.
10. If you are creating a crosstab query, what must the table you are querying
contain?
A. At least one text field.
B. At least one number field.
C. More than 100 records.
D. Lots of confusing information.
11. How can you add a table to the query design window?
A. Select Edit > Add Table from the menu.
B. Click the Show Table button on the toolbar.
C. Select the table from the Table list on the toolbar.
D. Select Tools > Add Table from the menu.
12. You want a query to calculate the total sales for your employees. How can you do
this from the query design window?
A. Click the Totals button on the toolbar. In the Total row select "Group By"
under the Employee field and "Sum" under the Sales field.
B. Click in the Sales field and click the AutoSum button on the toolbar.
C. You need to export this information to Microsoft Excel and calculate it
there.
Brainbell.com (2008). Microsoft Access Tutorial. Retrieved June 20th, 2008, from
http://www.brainbell.com/tutorials/ms-office/Access_2003/
Bcschool.net (2003-2006). Create Database Applications using Microsoft Access.
Retrieved June 20th, 2008, from http://www.bcshool.net/staff/accesshelp.htm
Cisnet.baruch.cuny.edu (2009). Microsoft Access Tutorial. Retrieved March 15th, 2009,
from http://cisnet.baruch.cuny.edu/holowczak/classes/2200/access/accessall.html
Databasedev.co.uk (2009). Microsoft Access Tutorial. Retrieved March 15th, 2009, from
http://www.databasedev.co.uk/plan-an-access-application.html
1.0 Introduction
2.0 Objectives
3.0 Introduction to Forms
3.1 Controls
3.2 Creating Forms using Forms Wizard
3.3 Making Simple Design Changes
3.4 Creating a Calculated Control
3.5 Form/Subforms
4.0 Conclusion
5.0 Summary
6.0 Tutor Marked Assignment
7.0 Further Reading and other Resources
1.0 Introduction
Data entry forms are the primary means of entering data into tables in the database. In the
previous units, we described how to add data to a table using a spreadsheet-like view of
the data. Data entry forms offer a more user-friendly interface by adding labels for each
field and other helpful information.
Microsoft Access provides several different ways of creating data entry forms. These
include creating the forms by hand using a Design View as well as a number of wizards
that walk the user through the forms creation process.
This unit explains everything you have ever wanted to know about forms.
2.0 Objectives
Microsoft Access provides the tools for developing graphical user interfaces (GUI) that
facilitate the use of database applications.
An Access GUI consists of a set of forms. Forms are front ends for accessing the data
that is stored in database tables or that is generated by queries. Forms are made up of
controls; some of the available controls are text labels, text boxes, list boxes, combo
boxes, option groups, buttons, objects created by other applications, and decorative lines
and boxes.
3.1 Controls
Every control has a set of properties. Properties determine where a form/control gets its
data from, whether the form/control can be used for editing data or for displaying data
only as well as several details which determine how the form/control is displayed.
Form/control properties are set automatically by the “Wizard” programs that Access
provides to simplify the creation of forms. Users only need to edit them occasionally, in
order to fine-tune the appearance and behavior of the forms they create. Some of the
available properties in Microsoft Access are shown in Tables 14.1a and 14.1b.
Allow Datasheet View
Allow PivotTable View
Allow PivotChart View
Scroll Bars *          Format   Determines whether scroll bars appear on the form.
Record Selectors *     Format   Determines whether a form contains a record selector.
Navigation Buttons *   Format   Determines whether a form has navigation buttons.
Dividing Lines         Format   Determines if lines appear between records in continuous
                                forms.
Auto Resize            Format   Resizes the form automatically to display a complete record.
Microsoft Access provides a set of Wizards that facilitate the creation of new forms.
d. Select which fields of the selected table/query will actually appear on the form.
(See figure 14.2)
e. Click on the Next button
f. Select the general layout of the form (See figure 14.3). The available options are:
i. Columnar - Places the labels to the left of each field. This is similar
to a paper form. This layout is suitable for viewing data one record
at a time.
ii. Tabular - Places the field labels at the top of the screen and the
records are displayed below. This is similar to how a spreadsheet
would display the data and is suitable for displaying multiple
records of data at a time.
iii. Datasheet - The data appears in the same fashion as when viewing
or adding data to a table.
iv. Justified - Places the labels above each field with the fields spread
out on the form. This is suitable for viewing a single record at a
time as with the columnar layout.
g. Click on the Next button
h. Choose the desired form background pattern (see figure 14.4).
i. Enter the name under which the new form will be stored (see figure 14.5).
j. Click the Finish button.
k. Finally, the form will be created and open for data display/editing (see figure
14.6).
In Design View, the structure of the form in terms of its controls and their properties can
be manipulated. To make changes to a form:
a. Open the Form.
b. From the View Menu, select Design view.
c. Select any control and then resize it, or move it around to any part of the form.
d. Selected controls can be deleted by pressing the Delete key (CTRL-X also removes them, but cuts them to the clipboard).
e. The property list of a control can be accessed by doing a right-mouse-click on a
selected control and then selecting “Properties” from the menu that appears.
f. Property lists are conveniently organized into categories:
i. Format --includes properties that affect how the control is displayed
ii. Data --includes properties that affect where the control gets its data from
(notice that some controls get their data from locally defined queries,
example “Reports To”)
iii. Event --includes properties that specify what happens when various
events involving the control occur
g. The properties of a FORM can be accessed by right-clicking on an area that lies
outside of the boundaries of the form (the “gray area”). The most important form
properties specify where a form gets its contents from (which table or query),
whether it is used for editing or display only, whether it can be resized, etc.
h. New controls can be added to a form by dragging and dropping them from the
Toolbox. Newly created controls then need to be sized and moved to their final
position in the form, and their properties need to be set. In most cases, the only
property that matters is the “Control Source”, which specifies the field to which
the new control is bound.
Activity A
a. Click on the Forms tab on the Access main screen and then click on Create form
by using wizard.
b. Select the Accounts table.
c. Select all of the available fields and click on the Next button.
d. Choose a Tabular layout and click on the Next button.
e. Choose the Standard style and click on the Next button.
f. Name the form: AccountsDataEntry
g. Click on the Finish button to create, save and view the new form.
h. Close the form and return to the Access main screen by pulling down the File
menu and choosing Close.
A calculated control is an unbound control that displays totals and other arithmetic
computations on a form. You create calculated controls by entering an expression (or
formula) to perform the calculation in the control's Control Source property.
3.5 Form/Subforms
A subform is a form within a form. The primary form is called the main form, and the
form within the form is called the subform. Subforms are especially useful when you
want to show data from tables or queries with a one-to-many relationship. For example, a
Customer form might have a subform that displays each customer's accounts.
The main form and subform are linked so that the subform displays only records that are
related to the current record in the main form. For example, when the main form displays
a particular customer, the subform displays only accounts for that customer.
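Access links the two forms through the subform control's Link Master Fields and Link Child Fields properties, typically CustomerID in both. The Python sketch below, using SQLite, imitates that behaviour. For each main-form record it runs a query filtered on the linking field, so only that customer's accounts appear. The tables and data are hypothetical, and the sketch models only the record filtering, not the form display.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Accounts (AccountNumber INTEGER PRIMARY KEY, CustomerID INTEGER, Balance REAL);
INSERT INTO Customers VALUES (1, 'Chidi Okafor'), (2, 'Amina Bello');
INSERT INTO Accounts VALUES (100, 1, 2500.0), (101, 2, 400.0), (102, 1, 500.0);
""")

def subform_records(master_customer_id):
    """Rows the subform shows for the current main-form record (child field = master field)."""
    return conn.execute(
        "SELECT AccountNumber, Balance FROM Accounts WHERE CustomerID = ? ORDER BY AccountNumber",
        (master_customer_id,),
    ).fetchall()

# Move through the main form's records one at a time.
main_records = conn.execute("SELECT CustomerID, Name FROM Customers ORDER BY CustomerID").fetchall()
for customer_id, name in main_records:
    print(name, subform_records(customer_id))
```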
Activity B
2. Use AutoForm to create and save a columnar form named "Customers," using the
Customers table as the underlying data source.
3. Add a text box control with today's date in the bottom-right corner of the
Customers form. Hint: You will need to change the text box control's Control Source
to the expression =Date().
4. Rearrange the control fields on the form, so that the LastName and FirstName
fields appear before the SSN field.
5. Change the Customer form's tab order to reflect the new field order.
6. Delete the DOB field control from the form.
7. Resize the Customers form as necessary, then use the SubForm Wizard to create a
subform based on the Insurance Claims table.
8. Modify the Insurance Claims subform so that its Default View property is Single
Form View.
Save your changes to the main form and the subform. Then close the form and the
Homework database.
4.0 Conclusion
A form is essentially a graphical front end to a table (or query). You can add, update,
and delete records in a table by using a form. Forms are especially useful when a table
has numerous fields, because you can see all the fields on one screen.
5.0 Summary
civ. Forms are front ends for accessing the data that is stored in database tables or that
is generated by queries.
cv. Forms are made up of controls, and each individual control is typically “bound” to a
particular field of the table or query that is associated with the form.
cvi. Properties determine where a form/control gets its data from.
cvii. Forms can be created in two ways: Design view and Wizards.
cviii. A subform is a form within a form. The primary form is called the main form, and
the form within the form is called the subform.
1. Which of the following statements about the AutoForm Wizard is NOT true?
A. The AutoForm Wizard is the fastest and easiest way to create a form in
Microsoft Access.
B. The AutoForm Wizard can only create five types of forms: Datasheet,
Columnar, Tabular, PivotTable, or PivotChart.
C. Forms created with the AutoForm Wizard usually come out looking sharp
and professional and don't require any further clean-up work.
D. The AutoForm Wizard can only create forms based on a single table or
query.
2. Which of the following statements is NOT true?
A. The Field List displays all the fields from a form's underlying table or
query.
B. Click the Field List button on the Toolbar to display the Field List.
C. You can add fields to a form by dragging them from the Field List onto
the form.
D. The Field List displays all the fields from every table in a database.
3. Controls and their corresponding text labels cannot be moved independently of
one another. (True or False?)
4. If you move a control on a form, the Tab Order, in which you advance from one
field to the next when you press the Tab key, is automatically updated. (True or
False?)
5. A form that has a Datasheet Default View property would display one record at a
time in the form. (True or False?)
6. A calculated field... (Select all that apply.)
A. ...is a bound control.
B. ...is a control that contains an expression.
C. ...can perform calculations on field values, such as
=[Cost]*[Commission].
D. ...can perform calculations on explicit values, such as =2+4.
7. Which of the following set(s) of tables would benefit from a subform? (Select all
that apply.)
A. A Customer table and the Customer Orders table.
B. A Customer table and Products table.
C. A Customer table and Foreign Currency table.
D. A Customer table and a Customer Contacts table.
8. When you add a subform to a main form, Access always recognizes how the two
forms are related. (True or False?)
Brainbell.com (2008). Microsoft Access Tutorial. Retrieved June 20th, 2008, from
http://www.brainbell.com/tutorials/ms-office/Access_2003/
Bcschool.net (2003-2006). Create Database Applications using Microsoft Access.
Retrieved June 20th, 2008, from http://www.bcshool.net/staff/accesshelp.htm
Cisnet.baruch.cuny.edu (2009). Microsoft Access Tutorial. Retrieved March 15th, 2009,
from http://cisnet.baruch.cuny.edu/holowczak/classes/2200/access/accessall.html
Databasedev.co.uk (2009). Microsoft Access Tutorial. Retrieved March 15th, 2009, from
http://www.databasedev.co.uk/plan-an-access-application.html
1.0 Introduction
2.0 Objectives
3.0 Introduction to Reports
3.1 Understanding Report Sections
3.2 Creating a Single Report using Wizards
3.3 Report Controls
3.4 Design View
4.0 Conclusion
5.0 Summary
6.0 Tutor Marked Assignment
7.0 Further Reading and other Resources
1.0 Introduction
A report is an effective way to present your data in a printed format. Because you have
control over the size and appearance of everything on a report, you can display the
information the way you want to see it.
This unit explains everything you will need to create and work with reports.
2.0 Objectives
On successful completion of this unit, you will be able to create and modify a variety of reports.
Reports present information from tables and queries in a format that looks great when
printed. Reports help you print records from tables or queries in a professional way; you
can even include calculations, graphics, or a customized header or footer.
Reports are similar to queries in that they retrieve data from one or more tables and
display the records. However, reports add formatting to the output including fonts, colors,
backgrounds and other features. Reports are often printed out on paper rather than just
viewed on the screen.
Reports can also summarize and analyze the information in the database. The following
are some of the available features in Microsoft Access Reports:
a. Formatting Options: Change the type, size, and color of the fonts used in a report
or add lines, boxes, and graphics.
b. Sorting and Grouping Options: Reports are great for summarizing and organizing
information.
c. Combine Data from Linked Tables: One report can display data from several
related tables or queries.
Microsoft Access breaks reports up into separate parts called sections. Each section has
its own specific purpose and always prints in the same order on a report. Table 15.1
shows the available sections.
Section        Description
report's column headings.
Group Header   Used to place text, such as a group name, at the beginning of each group of
               records.
Detail         Contains text and the actual fields that are displayed for each record. This
               would be equivalent to the main body in a word-processing document.
Group Footer   Used to place text and numeric summaries, such as totals or averages, at
               the end of each group of records.
Page Footer    Contains text that appears at the bottom of each page of a report, such as
               page numbers.
Report Footer  Contains text that appears at the end of the last page of a report. Often also
               contains numeric summaries for the report, such as a grand total.
b. Select Create report by using wizard, then select the Customer table as
shown in figure 15.1.
c. Next, specify the fields from the Customer table that will appear on the report. In
this case, we want all of the fields to appear. Move each of the fields from the
Available Fields side over to the Selected Fields side as shown in figure 15.2.
Then click on the Next button.
d. In the next step, we have the opportunity to add grouping levels to the report. A
grouping level is used when several records have the same value for a given field:
that value is displayed only once, for the first record in the group. In this case, we
will not use any grouping levels, so simply click on the Next button as shown in
figure 15.3.
e. In the next step, we are given the opportunity to specify the sorting order of the
report. For this example, we will sort the records on the CustomerID field. To
achieve this, pull down the list box next to the number 1: and choose the
CustomerID field as shown in figure 15.4. Then click on the Next button.
The next step is to specify the layout of the report. The three options are:
i. Columnar - Places the labels to the left of each field. This is similar to a paper
form.
ii. Tabular - Places the field labels at the top of the report page and the records are
displayed below. This is similar to how a spreadsheet would display the data.
iii. Justified - Places the labels above each field with the fields spread out on the
report page.
Generally, reports use the tabular layout. For this example, choose Tabular layout and
set the page Orientation to Landscape so that all of the fields will fit across one page.
This is shown in figure 15.5. Click on the Next button to continue.
In the next step, the style of the report can be selected. For this example, choose the
Corporate style as shown in figure 15.6 and click on the Next button to continue.
Finally, give a name for the new report: CustomerReport and then click on the Finish
button to create, save and display the new report (see figures 15.7 and 15.8).
The output from the report is shown in the figure below. Note that on some screens, the
first or last fields may not display without scrolling over to the left or right.
Once the report is displayed, it can be viewed, printed or transferred into Microsoft Word
or Microsoft Excel. The button bar across the top of the screen is shown in figure 15.9.
To close the report and return to the Access main screen, pull down the File menu and
choose Close or click on the Close button.
Activity A
1. From the Reports tab on the Access main screen, click on Create report by using
wizard.
2. Select the Accounts table.
3. Select all of the fields in the Accounts table by moving them all over to the
Selected Fields side, then click Next.
4. Group the report by CustomerID by clicking on the CustomerID field and then
clicking on the right arrow button. This is shown in the following figure:
Click on the Next button.
5. Choose to sort the report on the AccountNumber field. Note that a new button
will appear called Summary Options. Click on the Summary Options button.
Choose the Balance field and select the Sum option. Choose the option to show
both Detail and Summary data (see figure 15.10.) Then click on the OK button.
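The report this wizard builds combines several of the sections from Table 15.1. It has a Group Header for each CustomerID and Detail lines sorted by AccountNumber. A Group Footer holds the Sum of Balance, and the Report Footer holds a grand total. The Python sketch below produces a plain-text version of that layout from an SQLite query. It imitates only the section structure (Page Header/Footer are omitted), not Access's formatting, and the sample data are hypothetical.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounts (AccountNumber INTEGER PRIMARY KEY, CustomerID INTEGER, Balance REAL);
INSERT INTO Accounts VALUES (102, 1, 500.0), (100, 1, 2500.0), (101, 2, 400.0);
""")

# Sort by the grouping field first, then by the sort field within each group.
rows = conn.execute(
    "SELECT CustomerID, AccountNumber, Balance FROM Accounts ORDER BY CustomerID, AccountNumber"
).fetchall()

lines = ["Accounts Report"]                                  # Report Header
grand_total = 0.0
for customer_id, group in groupby(rows, key=lambda r: r[0]):
    group = list(group)
    lines.append(f"CustomerID {customer_id}")               # Group Header
    for _, account_number, balance in group:
        lines.append(f"  {account_number}  {balance:.2f}")  # Detail
    subtotal = sum(balance for _, _, balance in group)
    lines.append(f"  Sum: {subtotal:.2f}")                  # Group Footer
    grand_total += subtotal
lines.append(f"Grand Total: {grand_total:.2f}")              # Report Footer

print("\n".join(lines))
```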
To close the report and return to the Access main screen, pull down the File menu and
choose Close.
Any object that appears on a report is called a control. A text box used to display record
information or a column heading are both examples of controls. You add controls to a
report by clicking the control you want to use from the tool box and then dragging it onto
the report. See Table 15.1 for the Toolbox buttons.
Toolbox Button         Description
Label                  Creates a text label that appears the same for every record, such as a
                       heading. Most controls already include a text label.
Text Box               Creates a text box that displays information from tables and queries in a
                       report.
Option Group           Creates a box around a group of option buttons so that the user is only
                       allowed to make one selection from the group box. Normally used in forms,
                       not reports.
Toggle Button          Creates a toggle button. Normally used in forms, not reports.
Option Button          Creates an option button (or radio button) that displays data from two or
                       more options. Normally used in forms, not reports.
Check Box              Creates a box that is empty or contains a checkmark. Use to display data
                       from a Yes/No field.
Combo Box              Creates a combo box. Normally used in forms, not reports.
List Box               Creates a list box. Normally used in forms, not reports.
Command Button         Creates a button that runs a macro or Visual Basic function. Normally used
                       in forms, not reports.
Image                  Displays a picture by using a graphic file that you specify.
Unbound Object Frame   Inserts an OLE object that is not bound to a field in the current database.
                       Use an Unbound Object Frame to display information from an external source
                       or program, such as a spreadsheet, graphic, or other file.
Bound Object Frame     Inserts an OLE object that is bound to a field in the database. Use Bound
                       Object Frames to display pictures or other OLE information in the database.
Page Break             Inserts a page break.
Tab Control            Creates a tab control. Normally used in forms, not reports.
Subform/Subreport      Inserts another report within the main report. Use when you want to show
                       data from a one-to-many relationship.
Line                   Enables you to draw a line in the report.
Rectangle              Enables you to draw a rectangle in the report.
More Controls          Click to display other toolboxes and OLE objects.
Design view is used to modify a report so as to make it easier to read and understand. For
example, you might want to add or delete a field, change a column heading, or change the
locations of the fields in the report. Figure 15.12 shows a sample report in design view.
a. In the Database window, click the Reports icon in the Objects bar
b. Select the report you want to modify and click the Design button.
Activity B
What are the types of reports that can be created in Microsoft Access?
4.0 Conclusion
Microsoft Access Reports have powerful built in tool that allow you to present your data
in a professional way. It is possible to include calculations, graphics, and a customized
header or footer in a report.
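The way a report combines sections with calculated totals can be sketched outside Access. The following is a minimal Python illustration, not Access or VBA code, of how a grouped report is assembled: a report header, then for each group a header, the detail records and a footer holding a calculated total, and finally a report footer with a grand total. The sample records and the field names Month, Region and Sales are hypothetical, and the Page Header and Page Footer sections are left out because the sketch does not paginate.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample records standing in for a table or query; the field
# names (Month, Region, Sales) are illustrative assumptions.
SALES = [
    {"Month": "2008-01", "Region": "North", "Sales": 120.0},
    {"Month": "2008-02", "Region": "North", "Sales": 150.0},
    {"Month": "2008-01", "Region": "South", "Sales": 80.0},
]


def render_report(rows, group_field, label_field, value_field, title):
    """Build the lines of a grouped text report, section by section."""
    lines = [title]  # Report Header: printed once, at the top
    # Sort on the grouping field so that each group's records are adjacent;
    # itertools.groupby only merges consecutive rows with the same key.
    ordered = sorted(rows, key=itemgetter(group_field))
    grand_total = 0.0
    for key, group in groupby(ordered, key=itemgetter(group_field)):
        group_rows = list(group)
        lines.append(f"{group_field}: {key}")  # Group Header
        for row in group_rows:  # Detail: one line per record
            lines.append(f"  {row[label_field]:<8}{row[value_field]:>10.2f}")
        # Group Footer: a per-group total, the job a calculated control
        # such as =Sum([Sales]) does when placed in a group footer
        subtotal = sum(row[value_field] for row in group_rows)
        lines.append(f"  Total for {key}: {subtotal:.2f}")
        grand_total += subtotal
    lines.append(f"Grand total: {grand_total:.2f}")  # Report Footer
    return lines


if __name__ == "__main__":
    for line in render_report(SALES, "Month", "Region", "Sales", "Sales by Month"):
        print(line)
```

Running the script prints one block per month, each ending with its own total, followed by the grand total. Sorting before grouping matters here because a per-month total only makes sense once all records for that month sit together.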
5.0 Summary
i. Reports present information from tables and queries in a format that looks great
when printed.
ii. Microsoft Access breaks reports up into separate parts called sections.
iii. Reports can be created by using the Report Wizard or Design view.
iv. Design view is used to modify a report so as to make it easier to read and
understand.
v. Any object that appears on a report is called a control.
1. Which of the following statements about the AutoReport Wizard is NOT true?
A. The AutoReport Wizard is the fastest and easiest way to create a report in
Microsoft Access.
B. The AutoReport Wizard can only create two types of reports: Columnar
and Tabular.
C. Reports created with the AutoReport Wizard usually come out looking
sharp and professional and don't require further clean-up work.
D. The AutoReport Wizard can only create reports based on a single table or
query.
2. Which of the following statements is NOT true?
A. The Field List displays all the fields from a report's underlying table or
query.
B. Click the Field List button on the Toolbar to display the Field List.
C. You can add fields to a report by dragging them from the Field List onto
the report.
D. The Field List displays all the fields from every table in a database.
3. Controls and their corresponding text labels cannot be moved independently of
one another. (True or False?)
4. Which of the following statements is NOT true?
A. You can move a control to a different location on a report by clicking,
dragging, and dropping the control.
B. To add a page number to a report, select View > Header/Footer from the
menu and click the Page Number button on the Header/Footer toolbar.
C. You can resize a report by clicking and dragging the right edge of the
report.
D. You can resize a control by clicking the control to select it, grabbing one
of its sizing handles, and dragging and releasing the mouse button when
the control reaches the desired size.
5. You want a report to group and total sales by month. Where would you place a
calculated control containing the expression =SUM([Sales]) to calculate
the totals for each month?
A. In the Month Group Footer section.
B. In the Page Footer section.
C. In the Report Footer section.
D. In the Summary section.
6. Which of the following is NOT a report section?
A. Report Header section.
B. Page Header section.
C. Summary section.
D. Detail section.
7. The only way to sort a report's records is to base the report on a query, which
actually does the work of sorting the records. (True or False?)
8. Which of the following expressions is incorrect?
A. =Total for: [Employee]
B. =[InvoiceDate]+30
C. =[LastName]&" "&[FirstName]
D. =[Units]*[UnitPrice]
9. You want to track the progress of the stock market on a daily basis. Which type of
chart should you use?
A. Line chart.
B. Column chart.
C. Row chart.
D. Pie chart.
10. How do you adjust a page's margins?
A. Click and drag the edge of the page to where you want the margin set.
B. Select Format > Page Setup from the menu, click the Margins tab, and
adjust the margins.
C. Select File > Page Setup from the menu, click the Margins tab, and
adjust the margins.
D. Click the Margins button on the Formatting toolbar.
11. How can you view a report's sorting and grouping options?
A. Select Format > Sorting and Grouping from the menu.
B. By double-clicking the Report Selector box in the upper left corner of the
report.
C. Select File > Page Setup from the menu and click the Sorting and
Grouping tab.
D. Click the Sorting and Grouping button on the toolbar.
12. What is the procedure for selecting multiple controls on a report?
A. Press and hold down the Shift key as you click each object that you want
to select.
B. Use the arrow pointer to draw a box around the objects that you want to
select.
C. If the controls are aligned along a horizontal or vertical line, click the
horizontal or vertical ruler above or to the left of the controls.
D. All of these.