DBMS - UNIT I
Data is defined as a collection of raw facts about a place, person, thing or object involved in the
transactions of an organization.
Data can be represented in various forms like text, numbers, images, audio, video, graphs, document
files, etc.
Data constitutes the building blocks of information.
Database:
DBMS:
A database management system can be defined as an organized collection of logically related data and a
set of programs used for creating, storing, updating and retrieving data from the database.
DBMS acts as a mediator between end-user and the database.
Database management system (DBMS): can be defined as a collection of programs that manages the
database structure and controls access to data.
DBMS enables data to be shared.
The primary goal of a DBMS is to provide a way to store and retrieve database information that is
both convenient and efficient.
A database system consists of logically related data stored in a single logical data repository. The database
may be physically distributed among multiple storage facilities. A DBMS eliminates most of the
file system’s problems. A current-generation DBMS stores data structures, the relationships between those
structures, and access paths. It also defines, stores, and manages all access paths and components.
1. Enterprise Information
Accounting: For payments, receipts, account balances, assets and other accounting information.
Human resources: For information about employees, salaries, payroll taxes, and benefits, and for
generation of paychecks.
Manufacturing: For management of the supply chain and for tracking production of items in
factories, inventories of items in warehouses and stores, and orders for items.
Online retailers: For sales data noted above plus online order tracking, generation of
recommendation lists, and maintenance of online product evaluations.
2. Banking and Finance
Credit card transactions: For purchases on credit cards and generation of monthly statements.
Finance: For storing information about holdings, sales, and purchases of financial instruments such
as stocks and bonds; also for storing real-time market data to enable online trading by customers
and automated trading by the firm.
3. Others
Universities: For student information, course registrations, and grades (in addition to standard
enterprise information such as human resources and accounting).
Airlines: For reservations and schedule information. Airlines were among the first to use databases
in a geographically distributed manner.
Database systems arose in response to early methods of computerized management of commercial data.
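1. Data redundancy and inconsistency: Since different programmers create the files and application
programs over a long period, the various files are likely to have different structures and the programs may be
written in several programming languages. Moreover, the same information may be duplicated in several files.
2. Difficulty in accessing data: Suppose that one of the university clerks needs to find out the names of all
students who live within a particular postal-code area. If no application program exists for that request, the
clerk must either extract the names manually or ask a programmer to write a new program.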
3. Data isolation : Because data are scattered in various files, and files may be in different formats, writing
new application programs to retrieve the appropriate data is difficult.
4. Integrity problems : The data values stored in the database must satisfy certain types of consistency
constraints.
5. Atomicity problems : A computer system, like any other device, is subject to failure. In many
applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed prior
to the failure.
6. Concurrent-access anomalies: When many users update the data simultaneously, interleaved updates may
leave the data in an inconsistent state. To guard against this possibility, the system must maintain some form of
supervision. But supervision is difficult to provide because data may be accessed by many different
application programs that have not been coordinated previously.
7. Security problems: Not every user of the database system should be able to access all the data.
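As a rough illustration of the integrity problem in item 4, the sketch below declares a consistency constraint once, in the database, instead of re-checking it in every application program. It uses Python's built-in sqlite3 module; the account table, its columns and the try_withdraw helper are hypothetical names chosen for the example, not part of these notes.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency constraint
)
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
conn.commit()


def try_withdraw(account_number, amount):
    """Return True if the update is accepted, False if the DBMS rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, account_number),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint refused a negative balance.
        return False


print(try_withdraw("A-101", 200))   # accepted: balance becomes 300
print(try_withdraw("A-101", 1000))  # rejected: balance stays 300
```

In a file-processing system the same rule would have to be coded separately in each program that touches the balance.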
Databases have evolved since their inception in the 1960s, beginning with hierarchical and network
databases, through the 1980s with object-oriented databases, and today with SQL and NoSQL databases
and cloud databases.
I. RELATIONAL DATABASE
A relational database, invented by E.F. Codd at IBM in 1970, is a tabular database in which data is defined
so that it can be reorganized and accessed in a number of different ways.
Relational databases are made up of a set of tables with data that fits into a predefined category. Each
table has at least one data category in a column, and each row has a certain data instance for the
categories which are defined in the columns.
The Structured Query Language (SQL) is the standard user and application program interface for a
relational database. Relational databases are easy to extend, and a new data category can be added after
the original database creation without requiring that you modify all the existing applications.
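A minimal sketch of the extensibility claim above, using Python's built-in sqlite3 module. The student table and its columns are hypothetical. The point is only that a query written before the new column was added keeps working unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: each column is a data category, each row one instance.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])


def student_names():
    """An 'existing application' query that names only the columns it needs."""
    rows = conn.execute("SELECT name FROM student ORDER BY roll_no")
    return [row[0] for row in rows]


before = student_names()
# Add a new data category after the database was created.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after = student_names()

print(before, after)  # the existing query is unaffected by the new column
```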
II. DISTRIBUTED DATABASE
A distributed database is a database in which portions of the database are stored in multiple physical
locations, and in which processing is dispersed or replicated among different points in a network.
Distributed databases can be homogeneous or heterogeneous. All the physical locations in a homogeneous
distributed database system have the same underlying hardware and run the same operating systems and
database applications. The hardware, operating systems or database applications in a heterogeneous
distributed database may be different at each of the locations.
III. CLOUD DATABASE
A cloud database is a database that has been optimized or built for a virtualized environment, either in a
hybrid cloud, public cloud or private cloud. Cloud databases provide benefits such as the ability to pay for
storage capacity and bandwidth on a per-use basis, and they provide scalability on demand, along with
high availability.
A cloud database also gives enterprises the opportunity to support business applications in a software-as-
a-service deployment.
IV. OBJECT-ORIENTED DATABASE
Items created using object-oriented programming languages are often stored in relational databases, but
object-oriented databases are well-suited for those items.
An object-oriented database is organized around objects rather than actions, and data rather than logic.
For example, a multimedia record in a relational database can be a definable data object, as opposed to an
alphanumeric value.
V. GRAPH DATABASE
A graph-oriented database, or graph database, is a type of NoSQL database that uses graph theory to
store, map and query relationships. Graph databases are basically collections of nodes and edges, where
each node represents an entity, and each edge represents a connection between nodes.
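The node-and-edge idea can be sketched in a few lines of plain Python. This is only an in-memory illustration of the data model, not how any particular graph database is implemented; the node names and edge labels are made up.

```python
from collections import defaultdict

# Hypothetical data: nodes are entities, edges are labelled connections.
edges = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("alice", "WORKS_AT", "acme"),
]

# Adjacency list: node -> list of (edge label, neighbouring node).
adjacency = defaultdict(list)
for source, label, target in edges:
    adjacency[source].append((label, target))


def neighbours(node, label):
    """Nodes reachable from `node` over one edge with the given label."""
    return [t for (l, t) in adjacency.get(node, []) if l == label]


def follows_of_follows(node):
    """A two-hop relationship query: whom do the people I follow, follow?"""
    return sorted(
        {fof for f in neighbours(node, "FOLLOWS") for fof in neighbours(f, "FOLLOWS")}
    )


print(neighbours("alice", "FOLLOWS"))
print(follows_of_follows("alice"))
```

Relationship queries like the two-hop lookup are walks along edges, which is the kind of access a graph database is designed for.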
a. Program-data dependence:
File descriptions are stored within each application program that accesses a given file. As a consequence,
any change to a file structure requires changes to the file descriptions for all programs that access the file.
Suppose it is decided to change the customer address field length in the records in a file from 30 to 40
characters. The file descriptions in each program that is affected would have to be modified. It is often
difficult to locate all programs affected by such changes.
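The address-length example can be made concrete with a small sketch. It assumes a hypothetical fixed-length record (a 4-byte customer id followed by the address field) and uses Python's struct module to stand in for the file description compiled into each program.

```python
import struct

# Hypothetical record layouts; each program carries its own copy of one of these.
OLD_LAYOUT = struct.Struct("<I30s")  # address field is 30 bytes
NEW_LAYOUT = struct.Struct("<I40s")  # address field widened to 40 bytes


def write_record(layout, customer_id, address):
    """Pack one record; the address is padded with NUL bytes to the field width."""
    return layout.pack(customer_id, address.encode("ascii"))


def read_record(layout, raw):
    """Unpack one record using the layout this program was written against."""
    customer_id, address = layout.unpack(raw)
    return customer_id, address.rstrip(b"\x00").decode("ascii")


record = write_record(NEW_LAYOUT, 7, "12 Lake View Road, Puducherry 605107")
print(read_record(NEW_LAYOUT, record))  # an updated program reads it correctly

try:
    read_record(OLD_LAYOUT, record)  # a program still using the old description
except struct.error as exc:
    print("old program fails:", exc)
```

Every program holding the old layout has to be found and changed, which is exactly the maintenance burden described above.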
b. Duplication of data:
Because applications are often developed independently in file processing systems, unplanned duplicate data
files are the rule rather than the exception. This duplication is wasteful because it requires additional storage
space and increased effort to keep all files up to date. Duplicate data files often result in loss of data integrity
because the data formats may be inconsistent or the data values may not agree. For example, the same data
item may have different names in different files.
Dr. Antony Paul Raj. A
School of Arts and Science, SMVEC – Dept. of Computational Studies
c. Limited data sharing:
With the traditional file processing approach, each application has its own private files and users have little
opportunity to share data outside their own applications. It is often frustrating to managers to find that a
requested report will require a major programming effort to obtain data from several incompatible files in
separate systems. Data are scattered in various files, and the files may be in different formats, so writing a new
application program to retrieve the data is difficult.
d. Lengthy development times:
With the traditional file processing approach, there is little opportunity to leverage the previous development
efforts. Each new application requires that the developer essentially start from scratch by designing new file
formats and descriptions. The lengthy development times required are often inconsistent with today’s fast-
paced business environment.
The separation of data descriptions from the application programs that use the data is called data
independence. The data descriptions are stored in a central location called the repository.
The design goal with the database approach is that previously separate data files are integrated into a single,
logical structure. Each primary fact is recorded in only one place in the database. The database approach does
not eliminate redundancy entirely, but it allows the designer to carefully control the type and amount
of redundancy.
By eliminating or controlling data redundancy, we greatly reduce the opportunities for inconsistency. For
example, if a customer’s address is stored only once, we cannot have disagreement on the stored values.
We also avoid the wasted storage space that results from redundant data storage.
Enforcement of Standards:
1. A database system is a collection of interrelated data and a set of programs that allow users to access
and modify these data.
2. A major purpose of a database system is to provide users with an abstract view of the data.
3. There are three levels of abstraction for viewing the data in a database:
Physical level
1. The lowest level of abstraction describes how the data are actually stored.
Logical level
1. The next-higher level of abstraction describes what data are stored in the database, and what relationships
exist among those data.
2. The logical level thus describes the entire database in terms of a small number of relatively simple structures.
Although implementation of the simple structures at the logical level may involve complex physical-level
structures, the user of the logical level does not need to be aware of this complexity. This is referred to as
physical data independence.
View level
1. The highest level of abstraction describes only part of the entire database.
2. Even though the logical level uses simpler structures, complexity remains because of the variety of
information stored in a large database. The view level exists to simplify users' interaction with the system.
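A small sketch of the view level, using Python's built-in sqlite3 module. The employee table and the employee_directory view are hypothetical; the view exposes only part of the logical-level relation, and the physical level (how SQLite lays the rows out on disk) never appears in the code at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full employee relation (hypothetical schema).
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Meena", "Sales", 52000), (2, "Arun", "HR", 48000)],
)

# View level: a directory that shows only part of the database (no salaries).
conn.execute("CREATE VIEW employee_directory AS SELECT name, dept FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY name")
directory_columns = [column[0] for column in cursor.description]
directory_rows = cursor.fetchall()

print(directory_columns)
print(directory_rows)
```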
DBMS vs. RDBMS
1. DBMS: Stands for Database Management System.
   RDBMS: Stands for Relational Database Management System.
2. DBMS: The relationship between two files is maintained programmatically.
   RDBMS: The relationship between two tables is specified when the tables are created.
3. DBMS: Is capable of supporting only a single user at a time.
   RDBMS: Can support a range of users at a time.
4. DBMS: Inconsistencies may arise because the data is not stored using the ACID model.
   RDBMS: Is more difficult to build, but follows the ACID model, which keeps the data structured and consistent.
5. DBMS: Its main purpose is to control the databases present on the computer network and its hard disks.
   RDBMS: Is used to maintain the relationships among a set of tables.
6. DBMS: Is good for managing small amounts of data.
   RDBMS: Is used to manage large amounts of data.
7. DBMS: Altering data is quite complex.
   RDBMS: Altering data is very easy.
8. DBMS: Is mostly used by small companies where small amounts of data are involved, as it supports only a
   single user.
   RDBMS: Can support a great variety of users and is built so that much larger data sets can be controlled, so
   it is used by big companies.
Many data models have been proposed, and we can categorize them according to the types of concepts
they use to describe the database structure.
An entity represents a real-world object or concept, such as an employee or a project, that is described in
the database. An attribute represents some property of interest that further describes an entity, such as the
employee’s name or salary. A relationship among two or more entities represents an association among the
entities. These concepts are the basis of the Entity-Relationship
model, a popular high-level conceptual data model.
Representational or implementation data models are the models used most frequently in traditional
commercial DBMSs, and they include the widely-used relational data model, as well as
the so-called legacy data models—the network and hierarchical models—that have been widely used in the
past.
We can regard object data models as a new family of higher level implementation data models that are
closer to conceptual data models.
Object data models are also frequently utilized as high-level conceptual models, particularly in the software
engineering domain.
Physical data models describe how data is stored in the computer by representing information such as
record formats, record orderings, and access paths. An access path is a structure that makes the search
for particular database records efficient.
o Logical data independence refers to the ability to change the conceptual schema
without having to change the external schema.
o Logical data independence separates the external level from the conceptual view.
o If we make any changes to the conceptual view of the data, the user view of the data is not
affected.
o Logical data independence occurs at the user interface level.
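The points above can be sketched with Python's built-in sqlite3 module. In this hypothetical example the conceptual schema changes (one customer table is split into two), the external view is redefined to match, and the user's query against the view is left untouched. All table, view and function names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO customer VALUES (1, 'Kumar', 'Chennai'), (2, 'Latha', 'Puducherry');
    -- External schema: what the user-facing program queries.
    CREATE VIEW customer_view AS SELECT id, name, city FROM customer;
""")


def user_report():
    """User-level query; it only ever mentions the view."""
    return conn.execute("SELECT name, city FROM customer_view ORDER BY id").fetchall()


before = user_report()

# Change the conceptual schema: split customer into two tables,
# then redefine the view so the external schema stays the same.
conn.executescript("""
    CREATE TABLE customer_core (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_address (id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO customer_core SELECT id, name FROM customer;
    INSERT INTO customer_address SELECT id, city FROM customer;
    DROP VIEW customer_view;
    DROP TABLE customer;
    CREATE VIEW customer_view AS
        SELECT c.id, c.name, a.city
        FROM customer_core AS c JOIN customer_address AS a ON a.id = c.id;
""")

after = user_report()
print(before == after)  # the user view of the data is not affected
```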
A relational database consists of a collection of tables, each of which is assigned a unique name.
Basic Structure
Consider the account table of Figure 3.1. It has three column headers: account-number, branch-name, and
balance. Following the terminology of the relational model, these headers are attributes. For each
attribute, there is a set of permitted values, called the domain of that attribute. For the attribute branch-
name, for example, the domain is the set of all branch names.
Let D1 denote the set of all account numbers, D2 the set of all branch names, and D3 the set of all balances.
Any row of account must consist of a 3-tuple (v1, v2, v3), where v1 is an account number (that is, v1 is in
domain D1), v2 is a branch name (that is, v2 is in domain D2), and v3 is a balance (that is, v3 is in domain
D3). In general, account will contain only a subset of the set of all possible rows. Therefore, account is a
subset of D1 × D2 × D3.
Mathematicians define a relation to be a subset of a Cartesian product of a list of domains. This definition
corresponds almost exactly with our definition of table. The only difference is that we have assigned names
to attributes, whereas mathematicians rely on numeric “names,” using the integer 1 to denote the attribute
whose domain appears first in the list of domains, 2 for the attribute whose domain appears second, and so
on. Because tables are essentially relations, we shall use the mathematical terms relation and tuple in
place of the terms table and row. A tuple variable is a variable that stands for a tuple; in other words, a
tuple variable is a variable whose domain is the set of all tuples.
In the account relation of Figure 3.1, there are seven tuples. Let the tuple variable t refer to the first tuple of
the relation. We use the notation t[account-number] to denote the value of t on the account-number
attribute. Thus, t[account-number] = “A-101,” and t[branch-name] = “Downtown”. Alternatively, we may write
t[1] to denote the value of tuple t on the first attribute (account-number), t[2] to denote branch-name, and so
on. Since a relation is a set of tuples, we use the mathematical notation t ∈ r to denote that tuple t is in
relation r.
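The definition of a relation as a subset of a Cartesian product, and the t[attribute] notation, can be tried out directly in Python. The tiny domains below and the balance value are made up for the illustration; only the values "A-101" and "Downtown" come from the text above.

```python
from collections import namedtuple
from itertools import product

# Small hypothetical domains standing in for D1, D2 and D3.
D1 = {"A-101", "A-102"}          # account numbers
D2 = {"Downtown", "Perryridge"}  # branch names
D3 = {400, 500}                  # balances

Account = namedtuple("Account", ["account_number", "branch_name", "balance"])

# D1 x D2 x D3: the set of all possible rows.
all_possible_rows = {Account(*values) for values in product(D1, D2, D3)}

# The relation itself contains only some of those rows.
account = {
    Account("A-101", "Downtown", 500),
    Account("A-102", "Perryridge", 400),
}

t = Account("A-101", "Downtown", 500)  # a tuple variable bound to one tuple

print(account <= all_possible_rows)  # account is a subset of D1 x D2 x D3
print(t in account)                  # t ∈ r
print(t.account_number, t[0])        # by name, or by position (0-based in Python)
```

Note that Python indexes positions from 0, whereas the notation in the text counts attributes from 1.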
We require that, for all relations r, the domains of all attributes of r be atomic. A domain is atomic if elements
of the domain are considered to be indivisible units. For example, the set of integers is an atomic domain, but
the set of all sets of integers is a nonatomic domain.
1. Requirements Analysis: Talk to the potential users! Understand what data is to be stored, and what
operations and requirements are desired.
2. Conceptual Database Design: Develop a high-level description of the data and constraints (we will
use the ER data model).
3. Logical Database Design: Convert the conceptual model to a schema in the chosen data model of
the DBMS. For a relational database, this means converting the conceptual to a relational schema
(logical schema).
4. Schema Refinement: Look for potential problems in the original choice of schema and try to
redesign.
5. Physical Database Design: Direct the DBMS into choice of underlying data layout (e.g., indexes and
clustering) in hopes of optimizing the performance.
6. Applications and Security Design: How will the underlying database interact with surrounding
applications?
Attributes: These are used to describe a particular entity (e.g. name, SS#, height).
Relationships
A relationship is an association among two or more entities. The relationship must be uniquely
identified by the participating entities.
A relationship can also have descriptive attributes, to record additional information about the
relationship (as opposed to about any one participating entity).
DATA DICTIONARY
1. We can define a data dictionary as a DBMS component that stores the definition of data characteristics
and relationships.
2. The DBMS data dictionary provides the DBMS with its self-describing characteristic. In effect, the data
dictionary resembles an X-ray of the company’s entire data set, and is a crucial element in the data
administration function.
1. Data elements that are defined in all tables of all databases. Specifically, the data dictionary stores the
name, data types, display formats, internal storage formats, and validation rules. The data dictionary tells
where an element is used, by whom it is used, and so on.
3. Indexes defined for each database table. For each index, the DBMS stores at least the index name, the
attributes used, the location, specific index characteristics, and the creation date.
4. Databases defined: who created each database, the date of creation, where the database is located, who
the
7. Programs that access the database, including screen formats, report formats, application formats, SQL
queries, and so on.
9. Relationships among data elements: which elements are involved, whether the relationships are
mandatory or optional, the connectivity and cardinality, and so on.
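As one concrete, simplified illustration of a self-describing catalog, SQLite keeps its own table and index definitions in a built-in table called sqlite_master and reports column details through PRAGMA table_info. The sketch below queries both with Python's sqlite3 module. The account table and index are hypothetical, and SQLite's catalog stores far less than the full data dictionary described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY, branch_name TEXT, balance INTEGER)"
)
conn.execute("CREATE INDEX idx_account_branch ON account (branch_name)")

# sqlite_master is SQLite's catalog of tables, indexes and views.
# Names starting with "sqlite" are internal objects, so they are skipped here.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master"
    " WHERE name NOT LIKE 'sqlite%' ORDER BY type, name"
).fetchall()

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]

print(catalog)
print(columns)
```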
There are four different types of database-system users, differentiated by the way they expect to interact
with the system. Different types of user interfaces have been designed for the different types of users.
NAIVE USERS
1. Are unsophisticated users who interact with the system by invoking one of the application programs that
have been written previously.
2. For example, a bank teller who needs to transfer $50 from account A to account B invokes a program
called transfer.
APPLICATION PROGRAMMERS
1. Are computer professionals who write application programs. Application programmers can choose from
many tools to develop user interfaces.
2. Rapid application development (RAD) tools are tools that enable an application programmer to
construct forms and reports without writing a program.
SOPHISTICATED USERS
1. Interact with the system without writing programs. Instead, they form their requests in a database query
language.
2. Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them view summaries of data in
different ways.
3. For instance, an analyst can see total sales by region (for example, North, South, East, and West), or by
product, or by a combination of region and product (that is, total sales of each product in each region).
QUERY PROCESSOR:
DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
DML compiler, which translates DML statements in a query language into an evaluation plan consisting
of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that all give the same
result. The DML compiler also performs query optimization, that is, it picks the lowest cost evaluation plan
from among the alternatives.
Query evaluation engine, which executes low-level instructions generated by the DML compiler
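To see alternative evaluation plans in practice, the sketch below asks SQLite (through Python's sqlite3 module) to describe its plan for the same query before and after an index exists. The table, index and query are hypothetical, and the exact wording of the plan text differs between SQLite versions, so the code only checks whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)"
)

QUERY = "SELECT account_number FROM account WHERE branch_name = 'Downtown'"


def plan(sql):
    """Return the optimizer's chosen plan as text.

    The last column of each EXPLAIN QUERY PLAN row is a readable description.
    """
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)


plan_without_index = plan(QUERY)  # only a full table scan is available

conn.execute("CREATE INDEX idx_branch ON account (branch_name)")
plan_with_index = plan(QUERY)     # an index lookup is now an alternative

print(plan_without_index)
print(plan_with_index)
```

Both plans return the same result; the optimizer simply picks the one it estimates to be cheaper.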
STORAGE MANAGER:
A storage manager is a program module that provides the interface between the low level data stored in
the database and the application programs and queries submitted to the system. The storage manager is
responsible for the interaction with the file manager.
The storage manager components include:
1. Authorization and integrity manager, which tests for the satisfaction of integrity constraints and
checks the authority of users to access data.
2. Transaction manager, which ensures that the database remains in a consistent (correct) state despite
system failures, and that concurrent transaction executions proceed without conflicting.
3. File manager, which manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
4. Buffer manager, which is responsible for fetching data from disk storage into main memory, and
deciding what data to cache in main memory. The buffer manager is a critical part of the database system,
since it enables the database to handle data sizes that are much larger than the size of main memory.
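The role of the buffer manager can be sketched as a small page cache. The notes do not say which replacement policy is used; the toy class below assumes least-recently-used (LRU) eviction, and a plain dictionary stands in for disk storage. All names are illustrative.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer pool: keeps up to `capacity` pages in memory, evicting by LRU."""

    def __init__(self, disk, capacity):
        self.disk = disk           # stand-in for disk storage: page_id -> bytes
        self.capacity = capacity
        self.pool = OrderedDict()  # pages in main memory, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.pool:
            self.pool.move_to_end(page_id)  # cache hit: mark as most recently used
            return self.pool[page_id]
        self.disk_reads += 1                # cache miss: fetch the page from "disk"
        page = self.disk[page_id]
        self.pool[page_id] = page
        if len(self.pool) > self.capacity:
            self.pool.popitem(last=False)   # evict the least recently used page
        return page


# Ten pages on "disk", but room for only two in memory.
disk = {n: f"page-{n}".encode() for n in range(10)}
buffers = BufferManager(disk, capacity=2)
for page_id in (0, 1, 0, 2, 1):
    buffers.get_page(page_id)

print(buffers.disk_reads, list(buffers.pool))
```

The database (ten pages) is larger than memory (two pages), yet every page stays reachable; only the number of disk reads depends on the access pattern.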
TRANSACTION MANAGER:
A transaction is a collection of operations that performs a single logical function in a database application.
Each transaction is a unit of both atomicity and consistency. Thus, we require that transactions do not
violate any database-consistency constraints.
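A funds transfer is the usual example of a transaction: two updates that must succeed or fail together. The sketch below uses Python's sqlite3 module, where `with conn:` commits if the block finishes and rolls back if it raises. The account table, the balances and the transfer function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()


def transfer(source, target, amount):
    """Move `amount` from source to target as one atomic unit."""
    try:
        with conn:  # commit if both updates succeed, otherwise roll back both
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would break the consistency constraint; the credit is undone too.
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer("A", "B", 50)       # succeeds
failed = transfer("A", "B", 500)  # fails after the credit, so the credit is rolled back
print(ok, failed, balances())
```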
DATABASE ARCHITECTURE
The DBMS design depends upon its architecture.
o The basic client/server architecture is used to deal with a large number of PCs, web servers, database
servers and other components that
o The client/server architecture consists of many PCs and a workstation which are connected via the
network.
o DBMS architecture depends upon how users are connected to the database to get their requests done.
1-Tier Architecture
o In this architecture, the database is directly available to the user. It means the user can work directly on
the DBMS and use it.
o Any changes done here will directly be done on the database itself. It doesn't provide a handy tool for
end users.
o The 1-Tier architecture is used for development of the local application, where programmers can directly
communicate with the database for the quick response.
2-Tier Architecture
o The 2-Tier architecture is the same as the basic client-server architecture. In the two-tier architecture,
applications on the client end can directly communicate with the database at the server side. For this
interaction, APIs such as ODBC and JDBC are used.
o The user interfaces and application programs are run on the client-side.
o The server side is responsible to provide the functionalities like: query processing and transaction
management.
o To communicate with the DBMS, client-side application establishes a connection with the server side.
3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and server. In this architecture, client
can't directly communicate with the server.
o The application on the client-end interacts with an application server which further communicates with
the database system.
o End user has no idea about the existence of the database beyond the application server. The database
also has no idea about any other user beyond the application.
o This tier is responsible for storage, and it has a storage manager with the following functions: