Introduction to DBMS.docx
Unit – I
1- Introduction
1. A database-management system (DBMS) is a collection of interrelated data and a set of programs to
access those data.
2. The collection of data, usually referred to as the database, contains information relevant to an
enterprise.
3. The primary goal of a DBMS is to provide a way to store and retrieve database information that is both
convenient and efficient.
4. Database systems are designed to manage large bodies of information. Management of data involves
both defining structures for storage of information and providing mechanisms for the manipulation of
information.
5. The database system must also ensure the safety of the information stored, despite system crashes or
attempts at unauthorized access.
Databases are widely used. Some representative applications are:
• Banking: For customer information, accounts, loans, and banking transactions.
• Airlines: For reservations and schedule information. Airlines were among the first to use databases in a
geographically distributed manner—terminals situated around the world accessed the central database
system through phone lines and other data networks.
• Universities: For student information, course registrations, and grades.
• Credit card transactions: For purchases on credit cards and generation of monthly statements.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.
• Finance: For storing information about holdings, sales, and purchases of financial instruments such as
stocks and bonds.
• Sales: For customer, product, and purchase information.
• Manufacturing: For management of supply chain and for tracking production of items in factories,
inventories of items in warehouses/stores, and orders for items.
• Human resources: For information about employees, salaries, payroll taxes and benefits, and for
generation of paychecks.
A typical file-processing system is supported by a conventional operating system. The system stores
permanent records in various files, and it needs different application programs to extract records from,
and add records to, the appropriate files. Before database management systems (DBMSs), organizations
usually stored information in such systems.
• Data redundancy and inconsistency.
1. The various files are likely to have different formats, and the programs may be written in several
programming languages.
2. The same information may be duplicated in several places (files). This redundancy leads to higher
storage and access cost.
3. It also leads to data inconsistency; that is, the various copies of the same data may no longer agree.
• Data isolation.
Because data are scattered in various files, and files may be in different formats, writing new application
programs to retrieve the appropriate data is difficult.
• Integrity problems.
1. The data values stored in the database must satisfy certain types of consistency constraints.
2. For example, the balance of a bank account may never fall below a prescribed amount (say, $25).
Developers enforce these constraints in the system by adding appropriate code in the various application
programs.
3. When new constraints are added, it is difficult to change the programs to enforce them.
4. The problem is compounded when constraints involve several data items from different files.
• Atomicity problems.
1. A computer system, like any other mechanical or electrical device, is subject to failure.
2. In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that
existed prior to the failure.
3. Consider a program to transfer $50 from account A to account B. If a system failure occurs during the
execution of the program, it is possible that the $50 was removed from account A but was not credited to
account B, resulting in an inconsistent database state.
4. It is essential to database consistency that either both the credit and the debit occur, or that neither occurs.
That is, the funds transfer must be atomic: it must happen in its entirety or not at all.
5. It is difficult to ensure atomicity in a conventional file-processing system.
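The transfer example above can be sketched with a transaction. This is only an illustration, assuming Python's standard sqlite3 module as the database system; the starting balances, the helper names (open_bank, transfer, balances), and the fail_midway flag that simulates a crash are all hypothetical.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with accounts A and B (assumed balances)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table account (account_number text primary key, balance integer)"
    )
    conn.executemany("insert into account values (?, ?)", [("A", 500), ("B", 200)])
    conn.commit()
    return conn


def transfer(conn, source, target, amount, fail_midway=False):
    """Move `amount` from `source` to `target` as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "update account set balance = balance - ? where account_number = ?",
                (amount, source),
            )
            if fail_midway:
                # Stand-in for a system failure between the debit and the credit.
                raise RuntimeError("simulated failure")
            conn.execute(
                "update account set balance = balance + ? where account_number = ?",
                (amount, target),
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return {account_number: balance} for every account."""
    return dict(conn.execute("select account_number, balance from account"))
```

When the simulated failure occurs, the debit is rolled back, so the database returns to the state that existed before the transfer started.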
• Concurrent-access anomalies.
1. For the overall performance of the system and faster response, many systems allow multiple users to
update the data simultaneously.
2. In such an environment, interaction of concurrent updates may result in inconsistent data.
3. Consider bank account A, containing $500. If two customers withdraw funds (say $50 and $100
respectively) from account A at about the same time, the result of the concurrent executions may leave
the account in an incorrect (or inconsistent) state. Suppose that the programs executing on behalf of each
withdrawal read the old balance, reduce that value by the amount being withdrawn, and write the result
back. If the two programs run concurrently, they may both read the value $500, and write back $450 and
$400, respectively. Depending on which one writes the value last, the account may contain either $450 or $400,
rather than the correct value of $350.
4. To guard against this possibility, the system must maintain some form of supervision.
5. But supervision is difficult to provide because data may be accessed by many different application
programs that have not been coordinated previously.
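The withdrawal scenario above can be replayed step by step. The sketch below does not use real concurrency; it simply fixes the order of reads and writes described in the text so the lost update is easy to see. The function names are hypothetical.

```python
def interleaved_withdrawals(balance, first, second):
    """Replay the unsupervised schedule: both programs read before either writes."""
    read_by_first = balance             # program 1 reads the old balance
    read_by_second = balance            # program 2 reads the same old balance
    balance = read_by_first - first     # program 1 writes its result back
    balance = read_by_second - second   # program 2 overwrites it; the first update is lost
    return balance


def serial_withdrawals(balance, first, second):
    """Run the withdrawals one after the other, as supervision would enforce."""
    balance = balance - first
    balance = balance - second
    return balance
```

With a starting balance of 500 and withdrawals of 50 and 100, the interleaved schedule ends at 400 when the second program writes last, while the serial schedule ends at the correct 350.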
• Security problems.
1. Not every user of the database system should be able to access all the data.
2. For example, in a banking system, payroll personnel need to see only that part of the database that has
information about the various bank employees. They do not need access to information about customer
accounts.
3. But, since application programs are added to the system in an ad hoc manner, enforcing such security
constraints is difficult.
4- View of Data
A major purpose of a database system is to provide users with an abstract view of the data.
That is, the system hides certain details of how the data are stored and maintained.
• Physical level.
The lowest level of abstraction describes how the data are actually stored. The physical level describes
complex low-level data structures in detail.
• Logical level.
The next-higher level of abstraction describes what data are stored in the database, and what
relationships exist among those data. The logical level of abstraction is used by the database
administrator, who decides what information is to be kept in the database.
• View level.
The highest level of abstraction describes only part of the entire database.
The logical level uses simpler structures, but still the complexity remains because of the variety of
information stored in a large database.
Many users of the database system do not need all this information; instead, they need to access only a
part of the database.
The view level of abstraction exists to simplify their interaction with the system. The system may
provide many views for the same database.
At the physical level, a customer, account, or employee record can be described as a block of
consecutive storage locations (for example, words or bytes). The language compiler hides this level of
detail from programmers. Similarly, the database system hides many of the lowest-level storage details
from database programmers. Database administrators, on the other hand, may be aware of certain details
of the physical organization of the data.
At the logical level, each such record is described by a type definition, and the interrelationship of
these record types is defined. Programmers using a programming language work at this level of
abstraction. Similarly, database administrators usually work at this level of abstraction.
At the view level, computer users see a set of application programs that hide details of the data types.
Several views of the database are also defined at this level, and database users see these views. In
addition to hiding details of the logical level of the database, the views also provide a security
mechanism to prevent users from accessing certain parts of the database.
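A view that exposes only part of a table might look like the following sketch, assuming SQLite through Python's sqlite3 module. The branch column, its values, and the view name account_directory are hypothetical. SQLite itself has no user accounts, so this shows only what a view hides; in a multi-user system, access rights would additionally be granted on the view rather than on the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table account (account_number text primary key, branch text, balance integer);
    insert into account values ('A-101', 'Downtown', 500), ('A-201', 'Uptown', 900);
    -- Hypothetical view: shows which accounts exist, but not their balances.
    create view account_directory as
        select account_number, branch from account;
    """
)


def exposed_columns(connection):
    """Return the column names a user of the view can see."""
    cursor = connection.execute("select * from account_directory")
    return [column[0] for column in cursor.description]
```

A user who queries account_directory sees account numbers and branches, while the balance column stays hidden at the logical level.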
Database systems have several schemas, partitioned according to the levels of abstraction.
● The physical schema describes the database design at the physical level.
● The logical schema describes the database design at the logical level.
● A database may also have several schemas at the view level, sometimes called subschemas, that describe
different views of the database.
Data Independence
The programmers construct applications by using the logical schema.
The physical schema is hidden beneath the logical schema, and can be changed easily without affecting
the application programs.
The ability to modify a schema definition in one level without affecting a schema definition in the next
higher level is called data independence.
Logical data independence is more difficult to achieve than is physical data independence, since
application programs are heavily dependent on the logical structure of the data that they access.
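Physical data independence can be illustrated with an index. In the sketch below, which assumes SQLite through Python's sqlite3 module, an index is added (a change to the physical organization) and the application's query is run unchanged before and after. The index name and the threshold of 600 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, balance integer)")
conn.executemany(
    "insert into account values (?, ?)", [("A-101", 500), ("A-201", 900)]
)

# The application's query is written against the logical schema only.
QUERY = "select account_number from account where balance >= 600 order by account_number"

before = conn.execute(QUERY).fetchall()

# Physical-level change: add an access structure. The query text is untouched.
conn.execute("create index account_balance_idx on account (balance)")

after = conn.execute(QUERY).fetchall()
```

Both runs return the same rows; only the way the system may locate them has changed.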
5- Data Models
A data model is a collection of conceptual tools for describing data, data relationships, data semantics,
and consistency constraints.
The following are the two data models that describe the design of a database at the logical level.
The entity-relationship (E-R) data model is based on a perception of the real world that consists of a
collection of basic objects, called entities, and of relationships among these objects.
An entity is a “thing” or “object” in the real world that is distinguishable from other objects.
For example, each person is an entity, and bank accounts can be considered as entities.
The set of all entities of the same type and the set of all relationships of the same type are termed an
entity set and relationship set, respectively.
The overall logical structure (schema) of a database can be expressed graphically by an E-R diagram,
which is built up from the following components:
• Rectangles, which represent entity sets
• Ellipses, which represent attributes
• Diamonds, which represent relationships among entity sets
• Lines, which link attributes to entity sets and entity sets to relationships
Each component is labeled with the entity or relationship that it represents.
As an illustration, consider part of a banking database system consisting of customers and of the
accounts that these customers have.
The E-R diagram indicates that there are two entity sets, customer and account, with attributes.
The diagram also shows a relationship depositor between customer and account.
The E-R model also represents certain constraints to which the contents of a database must conform.
One important constraint is mapping cardinalities, which express the number of entities to which
another entity can be associated via a relationship set.
For example, if each account must belong to only one customer, the E-R model can express that
constraint.
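Once a design is translated to tables, a cardinality constraint of this kind is commonly enforced with a key. The sketch below is one possible rendering, assuming SQLite through Python's sqlite3 module: making account_number the primary key of a depositor table allows at most one owner per account. The pairing of account A-101 with customer 192-83-7465 and the second customer id are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table depositor ("
    " account_number text primary key,"  # at most one row, hence one customer, per account
    " customer_id text not null)"
)
conn.execute("insert into depositor values ('A-101', '192-83-7465')")
conn.commit()


def try_second_owner(connection):
    """Attempt to attach a second (hypothetical) customer to account A-101."""
    try:
        connection.execute("insert into depositor values ('A-101', '000-00-0000')")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"
```

The second insert violates the key, so the system refuses it and the account keeps its single owner.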
The relational model uses a collection of tables to represent both data and the relationships among those
data. Each table has multiple columns, and each column has a unique name.
Figure 1.3 below presents a sample relational database comprising three tables: One shows details of
bank customers, the second shows accounts, and the third shows which accounts belong to which
customers.
The customer table shows, for example, that the customer identified by customer-id 192-83-7465 is
named Johnson and lives at 12 Alma St. in Palo Alto.
The account table shows, for example, that account A-101 has a balance of $500, and A-201 has a
balance of $900.
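The three tables can be sketched as follows, assuming SQLite through Python's sqlite3 module. The customer and account rows come from the text; the column names and the rows in depositor are assumptions, since the text does not list them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table customer (customer_id text primary key, customer_name text,
                           customer_street text, customer_city text);
    create table account (account_number text primary key, balance integer);
    create table depositor (customer_id text, account_number text);

    insert into customer values ('192-83-7465', 'Johnson', '12 Alma St.', 'Palo Alto');
    insert into account values ('A-101', 500), ('A-201', 900);
    -- Assumed ownership rows; the text does not give the depositor table's contents.
    insert into depositor values ('192-83-7465', 'A-101'), ('192-83-7465', 'A-201');
    """
)


def accounts_of(connection, customer_id):
    """Join the three tables to list a customer's accounts and balances."""
    return connection.execute(
        """
        select c.customer_name, a.account_number, a.balance
        from customer c
        join depositor d on d.customer_id = c.customer_id
        join account a on a.account_number = d.account_number
        where c.customer_id = ?
        order by a.account_number
        """,
        (customer_id,),
    ).fetchall()
```

The depositor table carries no data of its own beyond the two identifiers; it exists to record the relationship between the other two tables.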
The relational model is an example of a record-based model. Record-based models are so named
because the database is structured in fixed-format records of several types.
Each table contains records of a particular type. Each record type defines a fixed number of fields, or
attributes.
The columns of the table correspond to the attributes of the record type.
The relational model hides low-level implementation details from database developers and users.
The relational data model is the most widely used data model.
The relational model is at a lower level of abstraction than the E-R model.
Database designs are often carried out in the E-R model, and then translated to the relational model.
5.3 Other Data Models
The object-relational data model combines features of the object-oriented data model and relational
data model.
Historically, two other data models, the network data model and the hierarchical data model,
preceded the relational data model.
6- Database Languages
A database system provides a data definition language to specify the database schema and a data
manipulation language to express database queries and updates.
In practice, the data definition and data manipulation languages are not two separate languages; they are parts of a
single database language, such as the widely used SQL language.
The following statement in the SQL language defines the account table:
create table account (account_number char(10), balance integer);
The storage structure and access methods used by the database system are specified by a set of
statements in a special type of DDL called a data storage and definition language.
These statements define the implementation details of the database schemas, which are usually hidden
from the users.
The data values stored in the database must satisfy certain consistency constraints.
For example, suppose the balance on an account should not fall below $100.
The DDL provides facilities to specify such constraints.
The database system checks these constraints every time the database is updated.
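A constraint of this kind can be declared in the DDL with a check clause. The sketch below assumes SQLite through Python's sqlite3 module; the withdraw helper and the sample account row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account ("
    " account_number char(10) primary key,"
    " balance integer check (balance >= 100))"  # the constraint lives in the schema
)
conn.execute("insert into account values ('A-101', 500)")
conn.commit()


def withdraw(connection, account_number, amount):
    """Try to withdraw; the database rejects updates that break the constraint."""
    try:
        with connection:  # rolls back automatically if the update is rejected
            connection.execute(
                "update account set balance = balance - ? where account_number = ?",
                (amount, account_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because the rule is stated once in the schema, every application program that updates the table is subject to it without carrying its own checking code.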
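Procedural DMLs require a user to specify what data are needed and how to get those data.
Declarative (nonprocedural) DMLs require a user to specify what data are needed without specifying how
to get them. Declarative DMLs are usually easier to learn and use than are procedural DMLs.
Since a user does not have to specify how to get the data, the database system has to figure out an
efficient means of accessing data. The DML component of the SQL language is nonprocedural.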
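The difference can be sketched by answering the same question in two ways, assuming SQLite through Python's sqlite3 module. The first function imitates a procedural style by scanning and testing every row itself; the second states only the condition and leaves the access path to the system. Both function names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, balance integer)")
conn.executemany(
    "insert into account values (?, ?)", [("A-101", 500), ("A-201", 900)]
)


def procedural_lookup(connection, minimum):
    """Spell out how: fetch every row and test each one in the program."""
    result = []
    rows = connection.execute("select account_number, balance from account")
    for account_number, balance in rows:
        if balance > minimum:
            result.append(account_number)
    return sorted(result)


def declarative_lookup(connection, minimum):
    """State only what is wanted; the system decides how to find it."""
    rows = connection.execute(
        "select account_number from account where balance > ? order by account_number",
        (minimum,),
    )
    return [row[0] for row in rows]
```

Both functions return the same accounts, but only the declarative version lets the system choose a different strategy, for example an index, without any change to the program.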
Application programs are programs that are used to interact with the database.
Application programs are usually written in a host language, such as Cobol, C, C++, or Java.
To access the database, DML statements need to be executed from the host language.
• Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously.
For example, a bank teller who needs to transfer $50 from account A to account B invokes a program
called transfer. This program asks the teller for the amount of money to be transferred, the account from
which the money is to be transferred, and the account to which the money is to be transferred.
• Application programmers are computer professionals who write application programs. Application
programmers can choose from many tools to develop user interfaces. Rapid application development
(RAD) tools are tools that enable an application programmer to construct forms and reports without
writing a program.
• Sophisticated users interact with the system without writing programs. They form their requests in a
database query language. They submit the query to a query processor, whose function is to break down
DML statements into instructions that the storage manager understands. Analysts who submit queries to
explore data in the database fall in this category.
• Specialized users are sophisticated users who write specialized database applications that do not fit
into the traditional data-processing framework. Among these applications are computer-aided design
systems, knowledge-base and expert systems, systems that store data with complex data types (for
example, graphics data and audio data), and environment-modeling systems.
• Storage structure and access-method definition. The DBA creates appropriate storage structures and
access methods by writing a set of definitions, which is translated by the data storage and data definition
language compiler.
• Schema and physical-organization modification. The DBA carries out changes to the schema and
physical organization to reflect the changing needs of the organization, or to alter the physical
organization to improve performance.
• Granting of authorization for data access. By granting different types of authorization, the database
administrator can regulate which parts of the database various users can access. The authorization
information is kept in a special system structure that the database system consults whenever someone
attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator’s routine maintenance activities are:
  • Periodically backing up the database, either onto tapes or onto remote servers, to prevent loss of data
  in case of disasters such as flooding.
  • Ensuring that enough free disk space is available for normal operations and upgrading disk space as
  required.
  • Monitoring jobs running on the database and ensuring that performance is not degraded by very
  expensive tasks submitted by some users.
The functional components of a database system can be broadly divided into the storage manager and
the query processor components.
The storage manager is important because databases typically require a large amount of storage space.
Corporate databases range in size from hundreds of gigabytes to, for the largest databases, terabytes of
data.
Since the main memory of computers cannot store this much information, the information is stored on
disks. Data are moved between disk storage and main memory as needed.
The query processor is important because it helps the database system simplify and facilitate access to
data.
The storage manager implements several data structures as part of the physical system implementation:
• Data files, which store the database itself.
• Data dictionary, which stores metadata about the structure of the database, in particular the schema of
the database.
• Indices, which provide fast access to data items that hold particular values.
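As one concrete illustration of a data dictionary, SQLite records its schema in a built-in table named sqlite_master, which can be queried like any other table. The sketch below assumes Python's sqlite3 module; the index name and the catalog helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, balance integer)")
conn.execute("create index account_number_idx on account (account_number)")


def catalog(connection):
    """Return (type, name) pairs recorded in SQLite's schema table."""
    return connection.execute(
        "select type, name from sqlite_master order by type, name"
    ).fetchall()
```

The result lists both the table and the index, which shows that the metadata describing the database is itself stored and queried as data.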