Introduction Chapter
Introduction Chapter
Chapter 1
1.1 Introduction
An organization must have accurate and reliable data (information) for effective decision
making. Data (information) is the backbone and most critical resource of an organization that
enables managers and organizations to gain a competitive edge. In this age of information
explosion, where people are bombarded with data, getting the right information, in the right
amount, at the right time is not an easy task. So, only those organizations will survive that
successfully manage information.
A database system simplifies the tasks of managing the data and extracting useful
information in a timely fashion. A database system is an integrated collection of related files,
along with the details of the interpretation of the data. A Data Base Management System
is a software system or program that allows access to data contained in a database. The
objective of the DBMS is to provide a convenient and effective method of defining, storing,
and retrieving the information stored in the database.
The database and database management systems have become essential for managing
business, governments, schools, universities, banks etc.
In an organization, the data is the most basic resource. To run the organization efficiently,
the proper organization and management of data is essential. The formal definition of the
major terms used in databases and database systems is defined in this section.
1.2.1 Data
The term data may be defined as known facts that could be recorded and stored on Computer
Media. It is also defined as raw facts from which the required information is produced.
1
2 Introduc tion to Database Management System
1.2.2 Information
Data and information are closely related and are often used interchangeably. Information is
nothing but refined data. In other way, we can say, information is processed, organized or
summarized data. According to Burch et. al., “Information is data that have been put into
a meaningful and useful content and communicated to a recipient who uses it to made
decisions”. Information consists of data, images, text, documents and voice, but always in
a meaningful content. So we can say, that information is something more than mere data.
Data are processed to create information. The recipient receives the information and then
makes a decision and takes an action, which may triggers other actions
In these days, there is no lack of data, but there is lack of quality information. The
quality information means information that is accurate, timely and relevant, which are the
three major key attributes of information.
1. Accuracy : It means that the information is free from errors, and it clearly and
accurately reflects the meaning of data on which it is based. It also means it is free
from bias and conveys an accurate picture to the recipient.
2. Timeliness : It means that the recipients receive the information when they need it
and within the required time frame.
3. Relevancy : It means the usefulness of the piece of information for the corresponding
persons. It is a very subjective matter. Some information that is relevant for one
person might not be relevant for another and vice versa e.g., the price of printer is
irrelevant for a person who wants to purchase computer.
So, organization that have good information system, which produce information that is
accurate, timely and relevant will survive and those that do not realize the importance of
information will soon be out of business.
1.2.5 Database
A database is a collection of interrelated data stored together with controlled redundancy
to serve one or more applications in an optimal way. The data are stored in such a way
that they are independent of the programs used by the people for accessing the data. The
approach used in adding the new data, modifying and retrieving the existing data from the
database is common and controlled one.
It is also defined as a collection of logically related data stored together that is designed
to meet information requirements of an organization. We can also define it as an electronic
filling system.
The example of a database is a telephone directory that contains names, addresses and
telephone numbers of the people stored in the computer storage.
4 Introduc tion to Database Management System
Databases are organized by fields, records and files. These are described briefly as
follows :
1.2.5.1 Fields
It is the smallest unit of the data that has meaning to its users and is also called data
item or data element. Name, Address and Telephone number are examples of fields. These
are represented in the database by a value.
1.2.5.2 Records
A record is a collection of logically related fields and each field is possessing a fixed
number of bytes and is of fixed data type. Alternatively, we can say a record is one complete
set of fields and each field have some value. The complete information about a particular
phone number in the database represents a record. Records are of two types fixed length
records and variable length records.
1.2.5.3 Files
A file is a collection of related records. Generally, all the records in a file are of same
size and record type but it is not always true. The records in a file may be of fixed length
or variable length depending upon the size of the records contained in a file. The telephone
directory containing records about the different telephone holders is an example of file. More
detail is available in chapter 3.
Data Items
Relationships
Physical
Database Constraints
Schema
Application Programs
Physical
Meta Data
Database
The DBMS software together with the database is called a Database System.
Conventionally, the data were stored and processed using traditional file processing systems.
In these traditional file systems, each file is independent of other file, and data in different
files can be integrated only by writing individual program for each application. The data
and the application programs that uses the data are so arranged that any change to the data
requires modifying all the programs that uses the data. This is because each file is hard-coded
with specific information like data type, data size etc. Some time it is even not possible to
identify all the programs using that data and is identified on a trial-and-error basis.
A file processing system of an organization is shown in Figure 1.3. All functional areas in
the organization creates, processes and disseminates its own files. The files such as inventory
and payroll generate separate files and do not communicate with each other.
File 1 File 2 File 3 File 1 File 2 File 3 File 1 File 2 File 3 File 1 File 2
No doubt such an organization was simple to operate and had better local control but
the data of the organization is dispersed throughout the functional sub-systems. These days,
databases are preferred because of many disadvantages of traditional file systems.
Introduc tion to Database Systems 7
1.3.1 Disadvantages of Traditional File System
A traditional file system has the following disadvantages:
1. Data Redundancy : Since each application has its own data file, the same data may
have to be recorded and stored in many files. For example, personal file and payroll
file, both contain data on employee name, designation etc. The result is unnecessary
duplicate or redundant data items. This redundancy requires additional or higher
storage space, costs extra time and money, and requires additional efforts to keep
all files upto-date.
2. Data Inconsistency : Data redundancy leads to data inconsistency especially when
data is to be updated. Data inconsistency occurs due to the same data items that
appear in more than one file do not get updated simultaneously in each and every
file. For example, an employee is promoted from Clerk to Superintendent and the
same is immediately updated in the payroll file may not necessarily be updated
in provident fund file. This results in two different designations of an employee at
the same time. Over the period of time, such discrepencis degrade the quality of
information contain in the data file that affects the accuracy of reports.
3. Lack of Data Integration : Since independent data file exists, users face difficulty in
getting information on any ad hoc query that requires accessing the data stored in
many files. In such a case complicated programs have to be developed to retrieve
data from every file or the users have to manually collect the required information.
4. Program Dependence : The reports produced by the file processing system are
program dependent, which means if any change in the format or structure of data
and records in the file is to be made, the programs have to modified correspondingly.
Also, a new program will have to be developed to produce a new report.
5. Data Dependence : The Applications/programs in file processing system are data
dependent i.e., the file organization, its physical location and retrieval from the storage
media are dictated by the requirements of the particular application. For example, in
payroll application, the file may be organised on employee records sorted on their
last name, which implies that accessing of any employee’s record has to be through
the last name only.
6. Limited Data Sharing : There is limited data sharing possibilities with the traditional
file system. Each application has its own private files and users have little choice
to share the data outside their own applications. Complex programs required to be
written to obtain data from several incompatible files.
7. Poor Data Control : There was no centralised control at the data element level,
hence a traditional file system is decentralised in nature. It could be possible that
the data field may have multiple names defined by the different departments of an
organization and depending on the file it was in. This situation leads to different
meaning of a data field in different context or same meaning for different fields.
This causes poor data control.
8. Problem of Security : It is very difficult to enforce security checks and access rights
in a traditional file system, since application programs are added in an adhoc manner.
8 Introduc tion to Database Management System
9. Data Manipulation Capability is Inadequate : The data manipulation capability is
very limited in traditional file systems since they do not provide strong relationships
between data in different files.
10. Needs Excessive Programming : An excessive programming effort was needed to
develop a new application program due to very high interdependence between
program and data in a file system. Each new application requires that the developers
start from the scratch by designing new file formats and descriptions and then write
the file access logic for each new file.
Database
(Hardware)
User
(Data)
Application
programs User
(Software)
(Data)
User
(Data)
User
(Users)
1. Data : The whole data in the system is stored in a single database. This data in the
database are both shared and integrated. Sharing of data means individual pieces of data
in the database is shared among different users and every user can access the same piece
of data but may be for different purposes. Integration of data means the database can be
function of several distinct files with redundancy controlled among the files.
Introduc tion to Database Systems 9
2. Hardware : The hardware consists of the secondary storage devices like disks, drums
and so on, where the database resides together with other devices. There is two types of
hardware. The first one, i.e., processor and main memory that supports in running the DBMS.
The second one is the secondary storage devices, i.e., hard disk, magnetic disk etc., that are
used to hold the stored data.
3. Software : A layer or interface of software exists between the physical database and
the users. This layer is called the DBMS. All requests from the users to access the database
are handled by the DBMS. Thus, the DBMS shields the database users from hardware details.
Furthermore, the DBMS provides the other facilities like accessing and updating the data in
the files and adding and deleting files itself.
4. Users : The users are the people interacting with the database system in any way.
There are four types of users interacting with the database systems. These are Application
Programmers, online users, end users or naive users and finally the Database Administrator
(DBA). More about users in section 1.4.
The users of a database system can be classified into various categories depending upon
their interaction and degree of expertise of the DBMS.
12 Introduc tion to Database Management System
1.4.1 End Users or Naive Users
The end users or naive users use the database system through a menu-oriented application
program, where the type and range of response is always displayed on the screen. The user
need not be aware of the presence of the database system and is instructed through each
step. A user of an ATM falls in this category.
The DBMS provides different languages and interfaces for each category of users to express
database queries and updations. When the design of the database is complete and the DBMS
is chosen to implement it, the first thing to be done is to specify the conceptual and internal
schemas for the database and the corresponding mappings. The following five languages are
available to specify different schemas.
1. Data Definition Language (DDL) 2. Storage Definition Language (SDL)
3. View Definition Language (VDL) 4. Data Manipulation Language (DML)
5. Fourth-Generation Language (4-GL)
The plans of the database and data stored in the database are most important for an
organization, since database is designed to provide information to the organization. The data
stored in the database changes regularly but the plans remain static for longer periods of time.
1.6.1 Schema
A schema is plan of the database that give the names of the entities and attributes and
the relationship among them. A schema includes the definition of the database name, the
record type and the components that make up the records. Alternatively, it is defined as
a frame-work into which the values of the data items are fitted. The values fitted into the
frame-work changes regularly but the format of schema remains the same e.g., consider the
database consisting of three files ITEM, CUSTOMER and SALES. The data structure diagram
for this schema is shown in Figure 1.5. The schema is shown in database language.
Generally, a schema can be partitioned into two categories, i.e., (i) Logical schema and
(ii) Physical schema.
(i) The logical schema is concerned with exploiting the data structures offered by the
DBMS so that the schema becomes understandable to the computer. It is important
as programs use it to construct applications.
(ii) The physical schema is concerned with the manner in which the conceptual database
get represented in the computer as a stored database. It is hidden behind the logical
schema and can usually be modified without affecting the application programs.
The DBMS’s provide DDL and DSDL to specify both the logical and physical schema.
Introduc tion to Database Systems 15
FIGURE 1.5. Data structure diagram for the item sales record.
1.6.2 Subschema
A subschema is a subset of the schema having the same properties that a schema has. It
identifies a subset of areas, sets, records, and data names defined in the database schema
available to user sessions. The subschema allows the user to view only that part of the
database that is of interest to him. The subschema defines the portion of the database as
seen by the application programs and the application programs can have different view of
data stored in the database.
The different application programs can change their respective subschema without affecting
other’s subschema or view.
The Subschema Definition Language (SDL) is used to specify a subschema in the DBMS.
1.6.3 Instances
The data in the database at a particular moment of time is called an instance or a database
state. In a given instance, each schema construct has its own current set of instances. Many
instances or database states can be constructed to correspond to a particular database schema.
Everytime we update (i.e., insert, delete or modify) the value of a data item in a record,
one state of the database changes into another state. The Figure 1.6 shows an instance of
the ITEM relation in a database schema.
16 Introduc tion to Database Management System
ITEM
The architecture is a framework for describing database concepts and specifying the structure
of database system. The three level architecture was suggested by Ansi/Sparc. Here
database is divided into three levels external level, conceptual level and internal level as
shown in Figure 1.7.
Views and External External External External
mappings view 1 view 2 view N level
are
maintained
by DBA
Conceptual/Internal
mapping
A data model is a collection of concepts that can be used to describe the structure of the
database including data types, relationships and the constraints that apply on the data.
A data model helps in understanding the meaning of the data and ensures that, we
understand.
Introduc tion to Database Systems 19
– The data requirements of each user.
– The use of data across various applications.
– The nature of data independent of its physical representations.
A data model supports communication between the users and database designers. The major
use of data model is to understand the meaning of the data and to facilitate communication
about the user requirements.
WORLD
LAN Server
Operating System
Database Files (Storage)
Communications
PC PC
— User Interface
— DBMS
— Operating System
— Database Files (Storage)
— Operating System
Terminal Terminal
— Display — Display
— Keyboard — Keyboard
Database Database
Network
Client/Server Client/Server
Database Database
Client Client
Network
Server
Client Client
1. Server : Server is DBMS itself. It consists of DBMS and supports all basic DBMS
functions. Server components of DBMS are installed at server. It acts as monitor of all of
its clients. It distributes work-load to other computers. Clients must obey their servers.
Functions of Server : The server performs various functions, which are as follows.
1. It supports all basic DBMS functions.
2. Monitor all his clients.
3. Distribute work-load over clients.
4. Solve problems which are not solved by clients.
5. Maintain security and privacy.
6. Avoiding unauthorized access of data.
2. Clients : Client machine is a personal computer or workstation which provide services
to both server and users. It must obey his server. Client components of DBMS are installed
at client site. Clients are taking instructions from server and help them by taking their load.
When any user want to execute a query on client, the client first take data from server then
execute the query on his own hardware and returns the result to the server. As a result,
server is free to do more complex applications.
3. Network Interface : Clients are connected to server by network interface. It is useful
in connecting the server interface with user interface so that server can run his applications
over his clients.
In the client server architecture, there are more than one server. Sometimes, a server is
used as Database Server, other as Application Server, other as Backup Server etc.
Advantages of Client-Server Database System
1. It increase the overall performance of DBMS.
2. Load can be distributed among clients.
3. It provides better user interface.
4. It is used to develop highly complex applications.
5. Clients with different operating systems can be connected with each other.
6. Single copy of DBMS is shared.
7. It reduces cost.
Disadvantages of Client-Server Database System
1. Network is error prone.
2. It is a kind of centralized system. If server is crashed or failed, there is loss of data.
32 Introduc tion to Database Management System
3. Recovery is typical and additional burden on DBMS server to handle concurrency
control.
4. Programming cost is high.
5. The implementation is more complex since one needs to deal with the middle ware
and the network.
True/False
1. A database actually consists of three parts: information, the logical structure of that information,
and tables.
2. A data dictionary, or relation, is a two-dimensional table used to store data within a relational
database.
3. A database management system (DBMS) allows you to specify the logical organization for a
database and access and use the information within a database.
4. A physical view represents how the users view the data.
5. A database may have numerous physical views.
6. Fixed length record sometimes wastes space while variable length record does not waste space.
7. A database is any collection of related data.
8. A DBMS is a software system to facilitate the creation and maintenance of a computerized
database.
9. End-users can be categorized into casual, designer, or parametric users.