1-Introduction To DBMS
Topic Covered:
Introduction to Databases and Transactions
What is a database system, purpose of a database system, view of data, relational databases, database architecture,
transaction management.
Data: Raw facts that are processed by a computer.
Database: A collection of related data.
View of Data
A major purpose of a database system is to provide users with an abstract view of the data.
Data Abstraction
Since many database-system users are not computer trained, developers hide the complexity from users through several
levels of abstraction, to simplify users' interactions with the system:
• Physical level:
The lowest level of abstraction describes how the data are actually stored.
The physical level describes complex low-level data structures in detail.
• Logical level:
The next-higher level of abstraction describes what data are stored in the database, and what relationships exist
among those data.
The logical level thus describes the entire database in terms of a small number of relatively simple structures.
Although implementation of the simple structures at the logical level may involve complex physical-level
structures, the user of the logical level does not need to be aware of this complexity. This is referred to as
physical data independence.
Database administrators, who must decide what information to keep in the database, use the logical level of
abstraction.
• View level:
The highest level of abstraction describes only part of the entire database. Even though the logical level uses
simpler structures, complexity remains because of the variety of information stored in a large database.
Many users of the database system do not need all this information; instead, they need to access only a part of
the database. The view level of abstraction exists to simplify their interaction with the system. The system may
provide many views for the same database.
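The view level can be illustrated with a small sketch using Python's built-in sqlite3 module. The instructor table, its columns, the sample rows, and the view name instructor_public are assumptions made up for this illustration; the point is only that a view exposes part of the database and hides the rest.

```python
import sqlite3


def build_instructor_view():
    """Create a hypothetical table and a view that exposes only part of it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instructor ("
        "id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL)"
    )
    conn.executemany(
        "INSERT INTO instructor VALUES (?, ?, ?, ?)",
        [(1, "Asha", "Physics", 90000.0), (2, "Ravi", "History", 60000.0)],
    )
    # The view hides the salary column from anyone who queries it.
    conn.execute(
        "CREATE VIEW instructor_public AS "
        "SELECT id, name, dept_name FROM instructor"
    )
    return conn


conn = build_instructor_view()
for row in conn.execute("SELECT * FROM instructor_public ORDER BY id"):
    print(row)
```

A second view over the same table, for example one restricted to a single department, could be defined in the same way, which is what "many views for the same database" means in practice.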
Data Independence
Data independence can be classified into two types:
1. Logical data independence
It is the ability to modify the conceptual schema without affecting the existing external schemas.
In logical data independence, the users are shielded from changes in the logical structure of the data or changes in the
choice of relations to be stored.
The changes to the conceptual schema, such as the addition and deletion of entities, addition and deletion of attributes, or
addition and deletion of relationships must be possible without changing existing external schemas or having to rewrite
application programs.
Only the view definition and the mapping need be changed in a DBMS that supports logical data independence.
2. Physical data independence
The ability to modify the internal schema without having to change the conceptual or external schemas is called physical
data independence.
In physical data independence, the conceptual schema insulates the users from changes in the physical storage of the data.
The changes to the internal schema, such as using different file organizations or storage structures, using different storage
devices, or modifying indexes or hashing algorithms, must be possible without changing the conceptual or external schemas.
In other words, physical data independence indicates that the physical storage structures or devices used for storing the data
could be changed without necessitating a change in the conceptual view or any of the external views.
Note:
Logical data independence is more difficult to achieve than physical data independence, because it requires flexibility in
the design of the database, and the programmer has to anticipate future requirements or modifications to the design of the
database.
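Both kinds of data independence can be sketched with Python's built-in sqlite3 module. The table, view, and index names below are hypothetical. The sketch runs the same query against a view three times: once as is, once after a physical change (adding an index), and once after a logical change (adding an attribute to the base table). It is only an illustration under these assumptions, not a statement about how any particular DBMS implements independence.

```python
import sqlite3


def demo_data_independence():
    """Run one external-level query before and after schema changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO instructor VALUES (?, ?, ?)",
        [(1, "Asha", "Physics"), (2, "Ravi", "History"), (3, "Kiran", "Physics")],
    )
    # External schema: a view that application programs query.
    conn.execute(
        "CREATE VIEW physics_staff AS "
        "SELECT id, name FROM instructor WHERE dept_name = 'Physics'"
    )
    query = "SELECT id, name FROM physics_staff ORDER BY id"
    original = conn.execute(query).fetchall()

    # Physical change: add an index. The query text stays the same.
    conn.execute("CREATE INDEX idx_instructor_dept ON instructor (dept_name)")
    after_physical = conn.execute(query).fetchall()

    # Logical change: add an attribute to the base table. The view is untouched.
    conn.execute("ALTER TABLE instructor ADD COLUMN email TEXT")
    after_logical = conn.execute(query).fetchall()

    conn.close()
    return original, after_physical, after_logical


original, after_physical, after_logical = demo_data_independence()
print(original == after_physical == after_logical)
```

Adding an attribute is one of the easier logical changes. Removing or splitting an attribute that a view depends on would require redefining the view, which is why logical independence is the harder of the two.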
Database Languages
A database system provides a data-definition language to specify the database
schema and a data-manipulation language to express database queries and updates.
In practice, the data-definition and data-manipulation languages are not two separate languages; instead they
simply form parts of a single database language, such as the widely used SQL language.
Data-Manipulation Language
A data-manipulation language (DML) is a language that enables users to access or manipulate data as organized by the
appropriate data model. The types of access are:
• Retrieval of information stored in the database
• Insertion of new information into the database
• Deletion of information from the database
• Modification of information stored in the database
There are basically two types of DML:
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what
data are needed without specifying how to get those data.
Declarative DMLs are usually easier to learn and use than are procedural DMLs.
A query is a statement requesting the retrieval of information. The portion of a DML that involves information retrieval is called
a query language.
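The four types of access, and the difference between declarative and procedural styles, can be sketched as follows. SQL, run here through Python's built-in sqlite3 module, serves as the declarative DML. The table, its rows, and the helper names_above_procedural are hypothetical and exist only for this illustration.

```python
import sqlite3


def run_dml_examples():
    """Apply the four kinds of DML access to a small in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
    )
    # Insertion of new information.
    conn.executemany(
        "INSERT INTO instructor VALUES (?, ?, ?)",
        [(1, "Asha", 90000.0), (2, "Ravi", 60000.0)],
    )
    # Modification of stored information.
    conn.execute("UPDATE instructor SET salary = salary + 5000 WHERE id = ?", (2,))
    # Deletion of information.
    conn.execute("DELETE FROM instructor WHERE id = ?", (1,))
    # Retrieval (a query): we state what we want, not how to find it.
    rows = conn.execute("SELECT id, name, salary FROM instructor").fetchall()
    conn.close()
    return rows


def names_above_procedural(records, threshold):
    """Procedural counterpart: spell out how to scan the records and pick matches."""
    names = []
    for _id, name, salary in records:
        if salary > threshold:
            names.append(name)
    return names


print(run_dml_examples())
print(names_above_procedural([(1, "Asha", 90000.0), (2, "Ravi", 60000.0)], 80000))
```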
Data-Definition Language
We specify a database schema by a set of definitions expressed by a special language called a data-definition language
(DDL). The DDL is also used to specify additional properties of the data.
We specify the storage structure and access methods used by the database system by a set of statements in a special type
of DDL called a data storage and definition language. These statements define the implementation details of the
database schemas, which are usually hidden from the users.
The data values stored in the database must satisfy certain consistency constraints.
For example, suppose the university requires that the account balance of a department must never be negative.
The DDL provides facilities to specify such constraints. The database system checks these constraints every time the
database is updated.
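The balance constraint above can be sketched with a CHECK clause in SQL DDL, run through Python's built-in sqlite3 module. The department table layout and the sample values are assumptions for illustration only.

```python
import sqlite3


def try_negative_balance():
    """Declare a consistency constraint in DDL and attempt to violate it."""
    conn = sqlite3.connect(":memory:")
    # DDL: the CHECK clause states that a balance must never be negative.
    conn.execute(
        "CREATE TABLE department ("
        "dept_name TEXT PRIMARY KEY, "
        "balance REAL NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO department VALUES ('Physics', 1000.0)")
    try:
        # This update would drive the balance below zero.
        conn.execute(
            "UPDATE department SET balance = balance - 5000 "
            "WHERE dept_name = 'Physics'"
        )
        rejected = False
    except sqlite3.IntegrityError:
        # The database system checked the constraint and refused the update.
        rejected = True
    balance = conn.execute("SELECT balance FROM department").fetchone()[0]
    conn.close()
    return rejected, balance


print(try_negative_balance())
```

The constraint is stated once in the schema, so every program that updates the table is checked without having to repeat the rule in application code.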
Relational Databases
A relational database is based on the relational model and uses a collection of tables to represent both data and the
relationships among those data. It also includes a DML and DDL.
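As a minimal sketch of how tables represent both data and relationships, the example below stores two hypothetical tables in SQLite (through Python's sqlite3 module) and links them with a join on a shared dept_name column. All table names, column names, and rows are assumptions for illustration.

```python
import sqlite3


def instructors_with_buildings():
    """Link two tables through a shared column and project two attributes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT)")
    conn.execute(
        "CREATE TABLE instructor ("
        "id INTEGER PRIMARY KEY, name TEXT, "
        "dept_name TEXT REFERENCES department (dept_name))"
    )
    conn.executemany(
        "INSERT INTO department VALUES (?, ?)",
        [("Physics", "Watson"), ("History", "Painter")],
    )
    conn.executemany(
        "INSERT INTO instructor VALUES (?, ?, ?)",
        [(1, "Asha", "Physics"), (2, "Ravi", "History")],
    )
    # The relationship "instructor works in department" is itself data:
    # the dept_name value stored in each instructor row.
    rows = conn.execute(
        "SELECT i.name, d.building "
        "FROM instructor AS i JOIN department AS d ON i.dept_name = d.dept_name "
        "ORDER BY i.id"
    ).fetchall()
    conn.close()
    return rows


print(instructors_with_buildings())
```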
Advantages:
1. Ease of use: The representation of information as tables consisting of rows and columns is much easier to understand.
2. Flexibility: Different tables from which information has to be linked and extracted can be easily manipulated by
operators such as project and join to give information in the form in which it is desired.
3. Precision: The usage of relational algebra and relational calculus in the manipulation of the relations between the tables
ensures that there is no ambiguity, which may otherwise arise in establishing the linkages in a complicated network type
database.
4. Security: Security control and authorization can also be implemented more easily by moving sensitive attributes in a
given table into a separate relation with its own authorization controls. If authorization requirement permits, a particular
attribute could be joined back with others to enable full information retrieval.
5. Data Independence: Data independence is achieved more easily with normalization structure used in a relational
database than in the more complicated tree or network structure.
6. Data Manipulation Language: Responding to queries by means of a language based on relational algebra and relational
calculus, e.g. SQL, is easy in the relational database approach. For data organized in other structures,
the query language either becomes complex or is extremely limited in its capabilities.
Disadvantages:
1. Performance: A major constraint, and therefore disadvantage, in the use of a relational database system is machine
performance. If the number of tables between which relationships must be established is large, and the tables themselves
are large, performance in responding to SQL queries is affected.
2. Physical Storage Consumption: In an interactive system, the cost of an operation such as a join also depends on the
physical storage. It is therefore common to tune relational databases, and in such a case the physical
data layout is chosen so as to give good performance in the most frequently run operations. As a natural
consequence, the less frequently run operations tend to become even slower.
3. Slow extraction of meaning from data: If the data is naturally organized in a hierarchical manner and stored as such,
the hierarchical approach may extract meaning from that data more quickly than the relational approach.
Q.Write a Note on Database Architecture.
Database Architecture
A database system is partitioned into modules that deal with each of the responsibilities
of the overall system. The functional components of a database system can be broadly divided into the storage manager and the
query processor components.
The storage manager is important because databases typically require a large amount of storage space. Corporate databases range in
size from hundreds of gigabytes to, for the largest databases, terabytes of data. A gigabyte is approximately 1000 megabytes
(actually 1024), or about 1 billion bytes, and a terabyte is approximately 1 million megabytes, or about 1 trillion bytes. Since the main memory of computers
cannot store this much information, the information is stored on disks. Data are moved between disk
storage and main memory as needed. Since the movement of data to and from disk is slow relative to the speed of the central
processing unit, it is imperative that the database system structure the data so as to minimize the need
to move data between disk and main memory.
The query processor is important because it helps the database system to simplify and facilitate access to data. The query processor
allows database users to obtain good performance while being able to work at the view level and not be burdened with
understanding the physical-level details of the implementation of the system. It is the job of the database system to translate
updates and queries written in a nonprocedural language, at the logical level, into an efficient sequence of operations at the
physical level.
1.7.1 Storage Manager
The storage manager is the component of a database system that provides the interface between the low-level data
stored in the database and the application programs and queries submitted to the system.
The storage manager is responsible for the interaction with the file manager. The raw data are stored on the disk
using the file system provided by the operating system. The storage manager translates the various DML statements
into low-level file-system commands.
Thus, the storage manager is responsible for storing, retrieving, and updating data in the database.
The storage manager components include:
• Authorization and integrity manager, which tests for the satisfaction of integrity constraints and
checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct) state despite
system failures, and that concurrent transaction executions proceed without conflicting.
• File manager, which manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main memory, and
deciding what data to cache in main memory. The buffer manager is a critical part of the database
system, since it enables the database to handle data sizes that are much larger than the size of main
memory.
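The buffer manager's role can be sketched with a toy buffer pool. The class below is a hypothetical illustration: a Python dict stands in for the disk, and least-recently-used (LRU) eviction is an assumed replacement policy chosen for simplicity, not one prescribed by the text.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer pool that caches disk blocks in main memory (assumed LRU policy)."""

    def __init__(self, disk, capacity):
        self.disk = disk            # dict: block number -> contents (stands in for disk)
        self.capacity = capacity    # number of blocks that fit in main memory
        self.pool = OrderedDict()   # cached blocks, least recently used first
        self.disk_reads = 0

    def get_block(self, block_no):
        if block_no in self.pool:
            # Hit: no disk access needed, just mark the block as recently used.
            self.pool.move_to_end(block_no)
            return self.pool[block_no]
        # Miss: fetch from disk, evicting the least recently used block if full.
        self.disk_reads += 1
        data = self.disk[block_no]
        if len(self.pool) >= self.capacity:
            self.pool.popitem(last=False)
        self.pool[block_no] = data
        return data


disk = {0: "block-0", 1: "block-1", 2: "block-2"}
buffers = BufferManager(disk, capacity=2)
for block_no in [0, 1, 0, 2, 1]:
    buffers.get_block(block_no)
print(buffers.disk_reads)
```

Here the "disk" holds three blocks but only two fit in the pool at once, which mirrors how a buffer manager lets a database work with more data than main memory can hold.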
The storage manager implements several data structures as part of the physical system implementation:
• Data files, which store the database itself.
• Data dictionary, which stores metadata about the structure of the database, in particular the schema of
the database.
• Indices, which can provide fast access to data items. Like the index in this textbook, a database index
provides pointers to those data items that hold a particular value. For example, we could use an index
to find the instructor record with a particular ID, or all instructor records with a particular name.
Hashing is an alternative to indexing that is faster in some but not all cases.
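The idea of an index as a set of pointers can be sketched in a few lines. The functions build_index and lookup, and the sample instructor records, are hypothetical. Because a Python dict is itself a hash table, this sketch is closer to the hashing alternative than to an ordered index, but the pointer idea is the same.

```python
from collections import defaultdict


def build_index(records, field):
    """Map each value of `field` to the positions of the records holding it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[field]].append(position)
    return index


def lookup(records, index, value):
    """Follow the index pointers instead of scanning every record."""
    return [records[position] for position in index.get(value, [])]


instructors = [
    {"id": 101, "name": "Asha"},
    {"id": 102, "name": "Ravi"},
    {"id": 103, "name": "Asha"},
]
by_name = build_index(instructors, "name")
print(lookup(instructors, by_name, "Asha"))
```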
1.7.2 The Query Processor
The query processor components include:
• DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that all give the same
result. The DML compiler also performs query optimization; that is, it picks the lowest cost evaluation plan
from among the alternatives.
• Query evaluation engine, which executes low-level instructions generated by the DML compiler.
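The idea that one query has several evaluation plans with the same result but different costs can be sketched in plain Python. The two plan functions, the sample data, and the use of "comparisons performed" as the cost measure are all assumptions for illustration. A real optimizer estimates costs before execution rather than running every plan.

```python
def plan_join_then_filter(instructors, departments, building):
    """Plan A: pair every instructor with every department, then filter by building."""
    work = 0
    joined = []
    for inst in instructors:
        for dept in departments:
            work += 1  # one comparison per (instructor, department) pair
            if inst["dept_name"] == dept["dept_name"]:
                joined.append((inst["name"], dept["building"]))
    result = sorted(name for name, b in joined if b == building)
    return result, work


def plan_filter_then_join(instructors, departments, building):
    """Plan B: keep only departments in the building first, then join."""
    work = 0
    wanted = []
    for dept in departments:
        work += 1  # one comparison per department
        if dept["building"] == building:
            wanted.append(dept)
    result = []
    for inst in instructors:
        for dept in wanted:
            work += 1  # one comparison per (instructor, remaining department) pair
            if inst["dept_name"] == dept["dept_name"]:
                result.append(inst["name"])
    return sorted(result), work


instructors = [
    {"name": "Asha", "dept_name": "Physics"},
    {"name": "Ravi", "dept_name": "History"},
    {"name": "Kiran", "dept_name": "Physics"},
]
departments = [
    {"dept_name": "Physics", "building": "Watson"},
    {"dept_name": "History", "building": "Painter"},
    {"dept_name": "Music", "building": "Packard"},
]
result_a, cost_a = plan_join_then_filter(instructors, departments, "Watson")
result_b, cost_b = plan_filter_then_join(instructors, departments, "Watson")
print(result_a == result_b, cost_a, cost_b)
```

Both plans answer "which instructors work in a given building", but filtering before joining touches fewer pairs on this data, so an optimizer using this cost measure would pick Plan B.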
Database Users:
Users are differentiated by the way they expect to interact with the system:
• Application programmers: interact with the system through DML calls.
• Sophisticated users: form requests in a database query language.
• Specialized users: write specialized database applications that do not fit into the traditional data-processing
framework.
• Naive users: invoke one of the permanent application programs that have been written previously.
Examples: people accessing a database over the web, bank tellers, clerical staff.
The database administrator coordinates all the activities of the database system and has a good understanding of the
enterprise's information resources and needs.
The database administrator's duties include:
• Schema definition
• Storage structure and access method definition
• Schema and physical organization modification
• Granting user authority to access the database
• Specifying integrity constraints
• Acting as liaison with users
• Monitoring performance and responding to changes in requirements
Transaction Management
A transaction is a collection of operations that performs a single logical function in a database application. Each transaction
is a unit of both atomicity and consistency. Thus, we require that transactions do not violate any database consistency
constraints. That is, if the database was consistent when a transaction started, the database must be consistent when the
transaction successfully terminates.
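Atomicity and consistency can be sketched with a funds transfer in SQLite, using Python's built-in sqlite3 module. The account table, the helper names, and the amounts are hypothetical. The sketch credits the target first so that a failing debit shows the earlier step being undone.

```python
import sqlite3


def make_accounts():
    """Create two hypothetical accounts with a non-negative balance constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "name TEXT PRIMARY KEY, balance REAL NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100.0), ("B", 50.0)])
    conn.commit()
    return conn


def transfer(conn, source, target, amount):
    """Move funds as one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the constraint, so the credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


conn = make_accounts()
print(transfer(conn, "A", "B", 500.0), balances(conn))
print(transfer(conn, "A", "B", 30.0), balances(conn))
```

The failed transfer leaves both balances exactly as they were, so the database is consistent before and after, and the total of the two balances is preserved by the successful transfer.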