
UNIT 1 Introduction To DBMS

This document provides an introduction to Database Management Systems (DBMS), explaining key concepts such as data, records, tables, and the importance of management systems in maintaining data integrity and security. It outlines the advantages of using a DBMS over traditional file processing systems, including improved data sharing, consistency, and reduced redundancy, while also addressing the complexities and potential drawbacks of DBMS. The document emphasizes the significance of DBMS in various applications, such as banking, education, and telecommunications.

Uploaded by

ankitfrnd45
Copyright
© All Rights Reserved

Unit 1

Introduction to Database Management System

1.1 Basic concepts:


As the name suggests, the database management system consists of two parts. They are:
1. Database and
2. Management System

What is a Database?
To find out what a database is, we have to start from data, which is the basic building block of any DBMS.

Data: Facts, figures, statistics etc. having no particular meaning (e.g. 1, ABC, 19 etc).
Record: Collection of related data items, e.g. in the above example the three data items had no
meaning. But if we organize them in the following way, then they collectively represent
meaningful information.
Roll Name Age
1 ABC 19

Table or Relation: Collection of related records.

Roll Name Age
1 ABC 19
2 DEF 22
3 XYZ 28

The columns of this relation are called Fields or Attributes; the set of values an attribute may
take is called its Domain. The rows are called Tuples or Records.
Database: Collection of related relations. Consider the following collection of tables:
T1                        T2
Roll   Name   Age         Roll   Address
1      ABC    19          1      KOL
2      DEF    22          2      DEL
3      XYZ    28          3      MUM

T3                        T4
Roll   Year               Year   Hostel
1      I                  I      H1
2      II                 II     H2
3      I

We now have a collection of 4 tables. They can be called a “related collection” because we can
clearly find out that there are some common attributes existing in a selected pair of tables. Because
of these common attributes we may combine the data of two or more tables together to find out the
complete details of a student. Questions like “Which hostel does the youngest student live in?”
can be answered now, although Age and Hostel attributes are in different tables.
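The question above can be answered by following the common attributes from table to table. The sketch below is only an illustration: it assumes Python's built-in sqlite3 module as the DBMS, loads the four example tables, and uses a hypothetical helper named hostel_of_youngest.

```python
import sqlite3

# In-memory database holding the four example tables T1..T4.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (Roll INTEGER, Name TEXT, Age INTEGER);
    CREATE TABLE T2 (Roll INTEGER, Address TEXT);
    CREATE TABLE T3 (Roll INTEGER, Year TEXT);
    CREATE TABLE T4 (Year TEXT, Hostel TEXT);
    INSERT INTO T1 VALUES (1, 'ABC', 19), (2, 'DEF', 22), (3, 'XYZ', 28);
    INSERT INTO T2 VALUES (1, 'KOL'), (2, 'DEL'), (3, 'MUM');
    INSERT INTO T3 VALUES (1, 'I'), (2, 'II'), (3, 'I');
    INSERT INTO T4 VALUES ('I', 'H1'), ('II', 'H2');
""")


def hostel_of_youngest(connection):
    """Follow Roll (T1 to T3) and then Year (T3 to T4) to reach the hostel."""
    return connection.execute("""
        SELECT T1.Name, T4.Hostel
        FROM T1
        JOIN T3 ON T3.Roll = T1.Roll
        JOIN T4 ON T4.Year = T3.Year
        ORDER BY T1.Age ASC
        LIMIT 1
    """).fetchone()


youngest = hostel_of_youngest(conn)
print(youngest)  # ('ABC', 'H1')
```

Age lives in T1 and Hostel in T4, yet the two joins on Roll and Year are enough to connect them.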

A database in a DBMS could be viewed by lots of different people with different responsibilities.

Figure 1.1: Employees are accessing Data through DBMS

For example, within a company there are different departments, as well as customers, who each need
to see different kinds of data. Each employee in the company will have different levels of access to
the database with their own customized front-end application.

In a relational database, data is organized strictly in row and column format. The rows are called
Tuples or Records. The data items within one row may belong to different data types. The columns,
on the other hand, are called Attributes, and all the data items within a single attribute are drawn
from the same Domain, that is, they are of the same data type.

What is a Management System?

A database-management system (DBMS) is a collection of interrelated data and a set of programs
to access those data. The collection of data, usually referred to as the database, contains
information relevant to an enterprise. By data, we mean known facts that can be recorded and that
have implicit meaning; a collection of related data with an implicit meaning is therefore a database.
The primary goal of a DBMS is to provide a way to store and retrieve database information that is
both convenient and efficient.

The management system is important because a database cannot be maintained without some set of
rules. We have to decide which attributes should be included in a particular table, which common
attributes create a relationship between two tables, which tables have to be handled when a record
is inserted or deleted, and so on. These issues must be resolved by rules that are followed in
order to maintain the integrity of the database.

Database systems are designed to manage large bodies of information. Management of data involves
both defining structures for storage of information and providing mechanisms for the manipulation
of information. In addition, the database system must ensure the safety of the information stored,
despite system crashes or attempts at unauthorized access. If data are to be shared among several
users, the system must avoid possible anomalous results.

Because information is so important in most organizations, computer scientists have developed a
large body of concepts and techniques for managing data. These concepts and techniques form the
focus of this book. This chapter briefly introduces the principles of database systems.

Database Management System (DBMS) and Its Applications:

A database management system is a computerized record-keeping system. It is a repository or
container for a collection of computerized data files. The overall purpose of a DBMS is to allow the
users to define, store, retrieve and update the information contained in the database on demand.
Information can be anything that is of significance to an individual or organization.
Databases touch all aspects of our lives. Some of the major areas of application are as follows:
1. Banking
2. Airlines
3. Universities
4. Manufacturing and selling
5. Human resources

• Enterprise Information
◦ Sales: For customer, product, and purchase information.
◦ Accounting: For payments, receipts, account balances, assets and other accounting information.
◦ Human resources: For information about employees, salaries, payroll taxes, and benefits, and for
generation of paychecks.
◦ Manufacturing: For management of the supply chain and for tracking production of items
in factories, inventories of items in warehouses and stores, and orders for items.
◦ Online retailers: For sales data noted above plus online order tracking, generation of
recommendation lists, and maintenance of online product evaluations.
• Banking and Finance
◦ Banking: For customer information, accounts, loans, and banking transactions.
◦ Credit card transactions: For purchases on credit cards and generation of monthly statements.
◦ Finance: For storing information about holdings, sales, and purchases of financial instruments
such as stocks and bonds; also for storing real-time market data to enable online trading by
customers and automated trading by the firm.
• Universities: For student information, course registrations, and grades (in addition to standard
enterprise information such as human resources and accounting).
• Airlines: For reservations and schedule information. Airlines were among the first to use databases
in a geographically distributed manner.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.

Purpose of Database Systems


Database systems arose in response to early methods of computerized management of commercial data.
As an example of such methods, typical of the 1960s, consider part of a university organization that,
among other data, keeps information about all instructors, students, departments, and course offerings.
One way to keep the information on a computer is to store it in operating system files. To allow
users to manipulate the information, the system has a number of application programs that
manipulate the files, including programs to:

✓ Add new students, instructors, and courses

✓ Register students for courses and generate class rosters


✓ Assign grades to students, compute grade point averages (GPA), and generate transcripts.

System programmers wrote these application programs to meet the needs of the university.
New application programs are added to the system as the need arises. For example, suppose that a
university decides to create a new major (say, computer science). As a result, the university creates a
new department and creates new permanent files (or adds information to existing files) to record
information about all the instructors in the department, students in that major, course offerings,
degree requirements, etc. The university may have to write new application programs to deal with
rules specific to the new major. New application programs may also have to be written to handle
new rules in the university. Thus, as time goes by, the system acquires more files and more
application programs.

This typical file-processing system is supported by a conventional operating system. The system
stores permanent records in various files, and it needs different application programs to extract
records from, and add records to, the appropriate files. Before database management systems
(DBMSs) were introduced, organizations usually stored information in such systems. Keeping
organizational information in a file-processing system has a number of major disadvantages:

Disadvantages of File Processing System:

Data redundancy and inconsistency. Since different programmers create the files and application
programs over a long period, the various files are likely to have different structures and the programs
may be written in several programming languages. Moreover, the same information may be
duplicated in several places (files). For example, if a student has a double major (say, music and
mathematics) the address and telephone number of that student may appear in a file that consists of
student records of students in the Music department and in a file that consists of student records of
students in the Mathematics department. This redundancy leads to higher storage and access cost. In
addition, it may lead to data inconsistency; that is, the various copies of the same data may no
longer agree. For example, a changed student address may be reflected in the Music department
records but not elsewhere in the system.
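The double-major example can be played through in a few lines. This is a minimal sketch, not a real file-processing system: the two department files are modelled as Python lists, and the record contents and function names are invented for illustration.

```python
# Two department "files", each holding its own copy of the student's address.
music_file = [{"roll": 7, "name": "Student S", "address": "Old Street 1"}]
maths_file = [{"roll": 7, "name": "Student S", "address": "Old Street 1"}]


def update_address(records, roll, new_address):
    """Update the address in ONE file only, as a single application program would."""
    for record in records:
        if record["roll"] == roll:
            record["address"] = new_address


def inconsistent_rolls(file_a, file_b):
    """Return roll numbers whose address differs between the two files."""
    addresses_b = {r["roll"]: r["address"] for r in file_b}
    return [r["roll"] for r in file_a
            if r["roll"] in addresses_b and addresses_b[r["roll"]] != r["address"]]


# The Music department's program records the move; nobody tells Mathematics.
update_address(music_file, 7, "New Street 9")
print(inconsistent_rolls(music_file, maths_file))  # [7]
```

Storing the address once, in a single shared table, would leave nothing for the two copies to disagree about.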

Difficulty in accessing data. Suppose that one of the university clerks needs to find out the names
of all students who live within a particular postal-code area. The clerk asks the data-processing
department to generate such a list. Because the designers of the original system did not anticipate
this request, there is no application program on hand to meet it. There is, however, an application
program to generate the list of all students.

The university clerk now has two choices: either obtain the list of all students and extract the needed
information manually or ask a programmer to write the necessary application program. Both
alternatives are obviously unsatisfactory. Suppose that such a program is written, and that, several
days later, the same clerk needs to trim that list to include only those students who have taken at
least 60 credit hours. As expected, a program to generate such a list does not exist. Again, the clerk
has the preceding two options, neither of which is satisfactory. The point here is that conventional
file-processing environments do not allow needed data to be retrieved in a convenient and efficient
manner. More responsive data-retrieval systems are required for general use.

Data isolation. Because data are scattered in various files, and files may be in different formats,
writing new application programs to retrieve the appropriate data is difficult.

Integrity problems. The data values stored in the database must satisfy certain types of consistency
constraints. Suppose the university maintains an account for each department, and records the
balance amount in each account. Suppose also that the university requires that the account balance of
a department may never fall below zero. Developers enforce these constraints in the system by
adding appropriate code in the various application programs. However, when new constraints are
added, it is difficult to change the programs to enforce them. The problem is compounded when
constraints involve several data items from different files.
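In a DBMS the same rule can be declared once, next to the data, instead of being coded into every application program. The sketch below assumes Python's sqlite3 module; the table layout and the debit helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraint is declared once, with the data, instead of in every program.
conn.execute("""
    CREATE TABLE department (
        dept_name TEXT PRIMARY KEY,
        balance   NUMERIC NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO department VALUES ('Music', 1000)")
conn.commit()


def debit(connection, dept_name, amount):
    """Try to debit a department; return True if the DBMS accepted the change."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "UPDATE department SET balance = balance - ? WHERE dept_name = ?",
                (amount, dept_name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok_small = debit(conn, "Music", 400)    # leaves 600
ok_large = debit(conn, "Music", 5000)   # would go below zero, so it is rejected
balance = conn.execute("SELECT balance FROM department").fetchone()[0]
print(ok_small, ok_large, balance)  # True False 600
```

Any program that updates the table is subject to the same check, including programs written later.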

Atomicity problems. A computer system, like any other device, is subject to failure. In many
applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that
existed prior to the failure.
Consider a program to transfer $500 from the account balance of department A to the account
balance of department B. If a system failure occurs during the execution of the program, it is possible
that the $500 was removed from the balance of department A but was not credited to the balance of
department B, resulting in an inconsistent database state. Clearly, it is essential to database
consistency that either both the credit and debit occur, or that neither occur.
That is, the funds transfer must be atomic—it must happen in its entirety or not at all. It is difficult to
ensure atomicity in a conventional file-processing system.
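A database transaction gives exactly this all-or-nothing behaviour. The following is a sketch under assumptions: it uses Python's sqlite3 module, starting balances chosen for illustration, and a hypothetical fail_midway flag that simulates a failure between the debit and the credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO department VALUES ('A', 1000), ('B', 200)")
conn.commit()


def transfer(connection, source, target, amount, fail_midway=False):
    """Debit source and credit target as one transaction."""
    try:
        with connection:  # commit if the block finishes, roll back if it raises
            connection.execute(
                "UPDATE department SET balance = balance - ? WHERE dept_name = ?",
                (amount, source))
            if fail_midway:
                raise RuntimeError("simulated system failure")
            connection.execute(
                "UPDATE department SET balance = balance + ? WHERE dept_name = ?",
                (amount, target))
        return True
    except RuntimeError:
        return False


def balances(connection):
    """Return the current balances as a {dept_name: balance} dictionary."""
    return dict(connection.execute("SELECT dept_name, balance FROM department"))


transfer(conn, "A", "B", 500, fail_midway=True)
after_failure = balances(conn)   # the debit was rolled back
transfer(conn, "A", "B", 500)
after_success = balances(conn)
print(after_failure)  # {'A': 1000, 'B': 200}
print(after_success)  # {'A': 500, 'B': 700}
```

After the simulated failure neither balance has changed, so the $500 is never "in flight".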

Concurrent-access anomalies. For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. Indeed, today, the
largest Internet retailers may have millions of accesses per day to their data by shoppers. In such an
environment, interaction of concurrent updates is possible and may result in inconsistent data.
Consider department A, with an account balance of $10,000. If two department clerks debit the
account balance (by say $500 and $100, respectively) of department A at almost exactly the same
time, the result of the concurrent executions may leave the budget in an incorrect (or inconsistent)
state. Suppose that the programs executing on behalf of each withdrawal read the old balance, reduce
that value by the amount being withdrawn, and write the result back. If the two programs run
concurrently, they may both read the value $10,000, and write back $9500 and $9900, respectively.
Depending on which one writes the value last, the account balance of department A may contain
either $9500 or $9900, rather than the correct value of $9400. To guard against this possibility, the
system must maintain some form of supervision.
But supervision is difficult to provide because data may be accessed by many different application
programs that have not been coordinated previously.
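The lost update described above can be replayed step by step. In this sketch the first function reproduces the bad interleaving deterministically, and the Account class shows one possible form of supervision, a lock around the read-modify-write sequence. Both names are illustrative; a real DBMS uses its own concurrency-control mechanisms.

```python
import threading


def interleaved_withdrawals(balance, amount_1, amount_2):
    """Replay the bad schedule: both programs read before either writes."""
    read_1 = balance              # program 1 reads the old balance
    read_2 = balance              # program 2 reads the same old balance
    balance = read_1 - amount_1   # program 1 writes its result
    balance = read_2 - amount_2   # program 2 overwrites it; first debit is lost
    return balance


class Account:
    """One form of supervision: a lock makes read-modify-write indivisible."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        with self._lock:
            current = self.balance
            self.balance = current - amount


lost = interleaved_withdrawals(10_000, 500, 100)

account = Account(10_000)
workers = [threading.Thread(target=account.withdraw, args=(amt,)) for amt in (500, 100)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(lost)             # 9900, the $500 debit has disappeared
print(account.balance)  # 9400
```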

As another example, suppose a registration program maintains a count of students registered for a
course, in order to enforce limits on the number of students registered. When a student registers, the
program reads the current count for the course, verifies that the count is not already at the limit,
adds one to the count, and stores the count back in the database. Suppose two students register
concurrently, with the count at (say) 39. The two program executions may both read the value 39,
and both would then write back 40, leading to an incorrect increase of only 1, even though two
students successfully registered for the course and the count should be 41. Furthermore, suppose the
course registration limit was 40; in the above case both students would be able to register, leading to
a violation of the limit of 40 students.
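One way to avoid the separate read, check and write steps is to let the DBMS do the check and the increment in a single statement. This is a sketch only, assuming Python's sqlite3 module and a hypothetical course table whose capacity column holds the registration limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course (course_id TEXT PRIMARY KEY, enrolled INTEGER, capacity INTEGER)")
conn.execute("INSERT INTO course VALUES ('C1', 39, 40)")
conn.commit()


def register(connection, course_id):
    """Increment the count only if the limit is not yet reached."""
    with connection:
        cursor = connection.execute(
            "UPDATE course SET enrolled = enrolled + 1 "
            "WHERE course_id = ? AND enrolled < capacity",
            (course_id,),
        )
    return cursor.rowcount == 1   # 0 rows changed means the course was full


first = register(conn, "C1")
second = register(conn, "C1")
enrolled = conn.execute("SELECT enrolled FROM course").fetchone()[0]
print(first, second, enrolled)  # True False 40
```

Because the test and the update are one statement, no program ever acts on a count it read earlier and that has since become stale.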

Security problems. Not every user of the database system should be able to access all the data. For
example, in a university, payroll personnel need to see only that part of the database that has
financial information. They do not need access to information about academic records. But, since
application programs are added to the file-processing system in an ad hoc manner, enforcing such
security constraints is difficult.

These difficulties, among others, prompted the development of database systems. In what follows,
we shall see the concepts and algorithms that enable database systems to solve the problems with
file-processing systems.

1.2 Advantages of DBMS over File Processing System:


Controlling of Redundancy: Data redundancy refers to the duplication of data (i.e., storing the same
data multiple times). In a database system, a centralized database under the centralized control
of the DBA avoids unnecessary duplication of data. This saves storage space and also eliminates
the extra time needed to process the duplicated data.

Improved Data Sharing: A DBMS allows the same data to be shared by any number of users and
application programs.

Data Integrity: Integrity means that the data in the database is accurate. Centralized
control of the data allows the administrator to define integrity constraints on
the data in the database. For example, in a customer database we can enforce a
constraint that customers are accepted only from the cities of Noida and Meerut.

Security: Having complete authority over the operational data enables the DBA to
ensure that the only means of access to the database is through proper channels. The
DBA can define authorization checks to be carried out whenever access to sensitive
data is attempted.

Data Consistency: By eliminating data redundancy, we greatly reduce the
opportunities for inconsistency. For example, if a customer address is stored only once,
we cannot have disagreement on the stored values. Also, updating data values is greatly
simplified when each value is stored in one place only. Finally, we avoid the wasted
storage that results from redundant data storage.

Efficient Data Access: In a database system, the data is managed by the DBMS and all
access to the data goes through the DBMS, which is the key to effective data processing.

Enforcement of Standards: With centralized control of data, the DBA can establish and
enforce data standards, which may include naming conventions, data quality
standards, etc.

Data Independence: In a database system, the database management system provides
the interface between the application programs and the data. When changes are made to
the data representation, the metadata maintained by the DBMS is changed, but the DBMS
continues to provide the data to application programs in the previously used way. The
DBMS handles the task of transforming the data wherever necessary.

Reduced Application Development and Maintenance Time: A DBMS supports many
important functions that are common to the applications accessing the data stored in
it, which facilitates quick application development.

Disadvantages of DBMS

1) It is somewhat complex. Since it supports many functions to serve the user well,
the underlying software has become complex. Designers and developers need
thorough knowledge of the software to get the most out of it.

2) Because of its complexity and functionality, it uses a large amount of memory, and
it needs that memory to run efficiently.

3) A DBMS works as a centralized system, i.e., all users, wherever they are, access
the same database. Hence any failure of the DBMS will impact all the users.

4) A DBMS is generalized software, i.e., it is written to work across all kinds of
systems rather than for one specific application. Hence some applications will run slower.

1.3 Data Abstraction

Data Abstraction is a process of hiding unwanted or irrelevant details from the end user. It
provides a different view and helps in achieving data independence which is used to enhance
the security of data.

The database systems consist of complicated data structures and relations. For users to access
the data easily, these complications are kept hidden, and only the relevant part of the database
is made accessible to the users through data abstraction.

Levels of abstraction for DBMS

Database systems include complex data structures. To retrieve data efficiently, to reduce
complexity for users, and to keep the system usable, developers use levels of abstraction
that hide irrelevant details from the users. Levels of abstraction also simplify database
design. There are three main levels of abstraction for a DBMS, which are as follows −

• Physical or Internal Level

• Logical or Conceptual Level

• View or External Level


For the system to be usable, it must retrieve data efficiently. The need for efficiency has
led designers to use complex data structures to represent data in the database. Since many
database-system users are not computer trained, developers hide the complexity from
users through several levels of abstraction, to simplify users’ interactions with the system:

Figure 1.2: Levels of Abstraction in a DBMS

• Physical level (or Internal View / Schema): The lowest level of abstraction describes
how the data are actually stored. The physical level describes complex low-level data
structures in detail.

• Logical level (or Conceptual View / Schema): The next-higher level of abstraction
describes what data are stored in the database, and what relationships exist among those
data. The logical level thus describes the entire database in terms of a small number of
relatively simple structures. Although implementation of the simple structures at the
logical level may involve complex physical-level structures, the user of the logical level
does not need to be aware of this complexity. This is referred to as physical data
independence. Database administrators, who must decide what information to keep in
the database, use the logical level of abstraction.

• View level (or External View / Schema): The highest level of abstraction describes
only part of the entire database. Even though the logical level uses simpler structures,
complexity remains because of the variety of information stored in a large database.
Many users of the database system do not need all this information; instead, they need
to access only a part of the database. The view level of abstraction exists to simplify
their interaction with the system. The system may provide many views for the same
database. Figure 1.2 shows the relationship among the three levels of abstraction.

An analogy to the concept of data types in programming languages may clarify the
distinction among levels of abstraction. Many high-level programming languages
support the notion of a structured type. For example, we may describe a record as
follows:
type instructor = record
    ID : char (5);
    name : char (20);
    dept_name : char (20);
    salary : numeric (8,2);
end;

This code defines a new record type called instructor with four fields. Each field has a
name and a type associated with it. A university organization may have several such
record types, including

• department, with fields dept_name, building, and budget
• course, with fields course_id, title, dept_name, and credits
• student, with fields ID, name, dept_name, and tot_cred

At the physical level, an instructor, department, or student record can be described as a
block of consecutive storage locations. The compiler hides this level of detail from
programmers. Similarly, the database system hides many of the lowest-level storage
details from database programmers. Database administrators, on the other hand, may be
aware of certain details of the physical organization of the data.

At the logical level, each such record is described by a type definition, as in the previous
code segment, and the interrelationship of these record types is defined as well.
Programmers using a programming language work at this level of abstraction.
Similarly, database administrators usually work at this level of abstraction.

Finally, at the view level, computer users see a set of application programs that hide
details of the data types. At the view level, several views of the database are defined,
and a database user sees some or all of these views. In addition to hiding details of the
logical level of the database, the views also provide a security mechanism to prevent
users from accessing certain parts of the database. For example, clerks in the university
registrar's office can see only that part of the database that has information about
students; they cannot access information about salaries of instructors.
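A view of this kind can be sketched with Python's sqlite3 module, used here only as a convenient stand-in for a DBMS. The instructor table follows the record type given earlier; the view name instructor_public and the sample row are invented for illustration. SQLite itself has no user accounts, so in a multi-user DBMS one would additionally grant access to the view and not to the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (
        ID        TEXT PRIMARY KEY,
        name      TEXT,
        dept_name TEXT,
        salary    NUMERIC
    );
    INSERT INTO instructor VALUES ('10101', 'Instructor A', 'Music', 65000.00);
    -- View level: expose everything except the salary column.
    CREATE VIEW instructor_public AS
        SELECT ID, name, dept_name FROM instructor;
""")

cursor = conn.execute("SELECT * FROM instructor_public")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)  # ['ID', 'name', 'dept_name']
print(view_rows)     # [('10101', 'Instructor A', 'Music')]
```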

1.4 Database Languages

Once data is stored, it needs to be manipulated: inserted, deleted, updated, and otherwise
modified. For these operations the database management system (DBMS) provides a set of
languages. The database languages are thus used to read, update and store data in the
database.

The DBMS languages are pictorially represented as follows −


The different types of DBMS languages are as follows −

• Data Definition Language (DDL) − Create, Alter, Rename, Drop, Truncate.

• Data Manipulation Language (DML) − Select, Insert, Delete, Update.

• Data Control Language (DCL) − Revoke, Grant.

• Transaction Control Language (TCL) − Rollback, Commit.

Data Definition Language (DDL)

It is a language that allows the user to define the data and their relationship to other types of
data. The DDL commands are: Create, Alter, Rename, Drop, Truncate.

Data Manipulation Language (DML)

It is a language that provides a set of operations to support the basic data manipulation
operations on data held in the database. The DML commands are: Insert, Delete, Update, Select,
Merge, Call.

Data Control Language (DCL)

DCL is used to control access to the stored data. It is mainly used to grant and revoke user
access to a database. The DCL commands are: Grant, Revoke.

Transaction Control Language (TCL)

TCL is a language which manages the transactions within the database. It is used to make
permanent (commit) or undo (roll back) the changes made by data manipulation language
statements. The TCL commands are: Commit, Rollback.
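The command groups can be seen side by side in a short session. This sketch assumes Python's sqlite3 module, where commit and rollback are issued through connection methods; SQLite has no user accounts, so the DCL commands Grant and Revoke are not shown. The student table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and then alter the structure.
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("ALTER TABLE student ADD COLUMN address TEXT")

# DML + TCL: insert rows and make them permanent with a commit.
conn.execute("INSERT INTO student (roll, name, age) VALUES (1, 'ABC', 19)")
conn.execute("INSERT INTO student (roll, name, age) VALUES (2, 'DEF', 22)")
conn.commit()

# DML + TCL: an update that is undone with a rollback.
conn.execute("UPDATE student SET age = 99 WHERE roll = 1")
conn.rollback()

# DML: delete one row, commit, then select what is left.
conn.execute("DELETE FROM student WHERE roll = 2")
conn.commit()
remaining = conn.execute("SELECT roll, name, age, address FROM student").fetchall()
print(remaining)  # [(1, 'ABC', 19, None)]
```

The age is still 19 because the update was rolled back, and the address is empty because the column was added without a value.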

1.5 Data Models


Underlying the structure of a database is the data model: a collection of conceptual
tools for describing data, data relationships, data semantics, and consistency
constraints. A data model provides a way to describe the design of a database at the
physical, logical, and view levels.
The data models can be classified into four different categories:

• Relational Model. The relational model uses a collection of tables to represent both
data and the relationships among those data. Each table has multiple columns, and each
column has a unique name. Tables are also known as relations. The relational model is
an example of a record-based model.
Record-based models are so named because the database is structured in fixed-format
records of several types. Each table contains records of a particular type. Each record
type defines a fixed number of fields, or attributes. The columns of the table correspond
to the attributes of the record type. The relational data model is the most widely used
data model, and a vast majority of current database systems are based on the relational
model.

• Entity-Relationship Model. The entity-relationship (E-R) data model uses a collection of
basic objects, called entities, and relationships among these objects.
An entity is a “thing” or “object” in the real world that is distinguishable from other
objects. The entity-relationship model is widely used in database design.

• Object-Based Data Model. Object-oriented programming (especially in Java, C++, or
C#) has become the dominant software-development methodology. This led to the
development of an object-oriented data model that can be seen as extending the E-R
model with notions of encapsulation, methods (functions), and object identity. The
object-relational data model combines features of the object-oriented data model and
relational data model.

• Semi-structured Data Model. The semi-structured data model permits the
specification of data where individual data items of the same type may have different
sets of attributes. This is in contrast to the data models mentioned earlier, where every
data item of a particular type must have the same set of attributes. The Extensible
Markup Language (XML) is widely used to represent semi-structured data.
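To make the semi-structured case concrete, the sketch below parses a small XML document in which two items of the same type carry different sets of attributes. It uses Python's standard xml.etree.ElementTree module; the element names and values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two items of the same type (<student>) with different sets of attributes.
document = """
<students>
    <student><roll>1</roll><name>ABC</name><age>19</age></student>
    <student><roll>2</roll><name>DEF</name><hostel>H2</hostel><phone>000</phone></student>
</students>
"""

root = ET.fromstring(document)
students = [{child.tag: child.text for child in student}
            for student in root.findall("student")]
field_sets = [sorted(student) for student in students]
print(field_sets)
# [['age', 'name', 'roll'], ['hostel', 'name', 'phone', 'roll']]
```

In a relational table both students would need the same columns, with empty values wherever an attribute does not apply.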

Historically, the network data model and the hierarchical data model preceded the
relational data model. These models were tied closely to the underlying implementation,
and complicated the task of modeling data. As a result they are used little now, except
in old database code that is still in service in some places.

1.6 Data Independence

A database system normally contains a lot of data in addition to users’ data. For example, it
stores data about data, known as metadata, to locate and retrieve data easily. It is rather
difficult to modify or update a set of metadata once it is stored in the database. But as a
database grows, it needs to change over time to satisfy the requirements of the users. If
every layer depended directly on every other, each such change would become a tedious and
highly complex job.

Metadata itself follows a layered architecture, so that when we change data at one layer, it
does not affect the data at another layer. The layers are independent of each other but
mapped to each other.

Logical Data Independence


Logical data is data about the database, that is, information about how data is managed
inside it: for example, a table (relation) stored in the database and all the constraints
applied to that relation.

Logical data independence is the ability to change this logical description without
affecting what is built around it. If we make some change to the table format, for example
by adding a column, it should not change the data residing on the disk, and views and
application programs that do not use the changed part should keep working.
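A small experiment illustrates the idea. This is a sketch assuming Python's sqlite3 module; the student table, the student_names view and the application_query function are hypothetical stand-ins for an existing schema and an existing application program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO student VALUES (1, 'ABC', 19), (2, 'DEF', 22);
    CREATE VIEW student_names AS SELECT roll, name FROM student;
""")


def application_query(connection):
    """Stands in for an existing application program; it only knows the view."""
    return connection.execute(
        "SELECT roll, name FROM student_names ORDER BY roll").fetchall()


before = application_query(conn)
# Logical schema change: the table gains a column.
conn.execute("ALTER TABLE student ADD COLUMN address TEXT")
after = application_query(conn)
print(before == after)  # True
```

The stored rows and the program's results are the same before and after the table format changed.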

Physical Data Independence

All the schemas are logical, and the actual data is stored in bit format on the disk. Physical
data independence is the power to change the physical data without impacting the schema or
logical data.

For example, in case we want to change or upgrade the storage system itself − suppose we
want to replace hard-disks with SSD − it should not have any impact on the logical data or
schemas.
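Swapping disks cannot be shown in a few lines, so the sketch below uses a different physical change as a stand-in: adding an index, which alters how the data is stored and reached but not the table definition or the query. It assumes Python's sqlite3 module and a hypothetical student table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "ABC", 19), (2, "DEF", 22), (3, "XYZ", 28)])
conn.commit()

QUERY = "SELECT name FROM student WHERE age > 20 ORDER BY name"

result_before = conn.execute(QUERY).fetchall()
# Physical change: add an access structure; the table definition is untouched.
conn.execute("CREATE INDEX idx_student_age ON student (age)")
result_after = conn.execute(QUERY).fetchall()
print(result_before)  # [('DEF',), ('XYZ',)]
print(result_after)   # [('DEF',), ('XYZ',)]
```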

1.7 Multi User DBMS Architecture

The design of a DBMS depends on its architecture. It can be centralized or decentralized or
hierarchical. The architecture of a DBMS can be seen as either single-tier or multi-tier. An
n-tier architecture divides the whole system into n related but independent modules, which
can be independently modified, altered, changed, or replaced.

In a 1-tier architecture, the user works directly on the DBMS itself. Any changes done here
are made directly on the DBMS. It does not provide handy tools for end-users. Database
designers and programmers normally prefer to use single-tier architecture.

If the architecture of DBMS is 2-tier, then it must have an application through which the
DBMS can be accessed. Programmers use 2-tier architecture where they access the DBMS
by means of an application. Here the application tier is entirely independent of the database
in terms of operation, design, and programming.

3-tier Architecture

A 3-tier architecture separates its tiers from each other based on the complexity of the users
and how they use the data present in the database. It is the most widely used
architecture to design a DBMS.

• Database (Data) Tier − At this tier, the database resides along with its query
processing languages. We also have the relations that define the data and their
constraints at this level.

• Application (Middle) Tier − At this tier reside the application server and the
programs that access the database. For a user, this application tier presents an
abstracted view of the database. End-users are unaware of any existence of the
database beyond the application. At the other end, the database tier is not aware of
any other user beyond the application tier. Hence, the application layer sits in the
middle and acts as a mediator between the end-user and the database.

• User (Presentation) Tier − End-users operate on this tier and they know nothing
about any existence of the database beyond this layer. At this layer, multiple views
of the database can be provided by the application. All views are generated by
applications that reside in the application tier.

Multiple-tier database architecture is highly modifiable, as almost all its components are
independent and can be changed independently.
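
The separation of the three tiers can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the table student, the sample rows, and the function
names open_database, list_student_names, and render are all hypothetical, and an in-memory
SQLite database stands in for a real database server.

```python
import sqlite3

# --- Database (data) tier: holds the relations and their constraints ---
def open_database():
    conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
    conn.execute(
        "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
    )
    conn.executemany(
        "INSERT INTO student (id, name, age) VALUES (?, ?, ?)",
        [(1, "Asha", 19), (2, "Ravi", 21)],
    )
    return conn

# --- Application (middle) tier: the only code that talks to the database ---
def list_student_names(conn):
    rows = conn.execute("SELECT name FROM student ORDER BY id").fetchall()
    return [name for (name,) in rows]

# --- User (presentation) tier: sees only what the application tier returns ---
def render(names):
    return "Students: " + ", ".join(names)

conn = open_database()
print(render(list_student_names(conn)))
```

Because render never touches the connection, the data tier could be replaced without changing
the presentation code, which is the kind of independence described above.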

1.8 Components of DBMS:

The database management system (DBMS) software is divided into several components.
Each component will perform a specific operation. Some of the functions of the DBMS are
supported by operating systems.
The DBMS accepts the SQL commands that are generated from a variety of user interfaces,
produces a query evaluation plan for each command, executes these plans against the
database, and returns the answers.
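
As a rough illustration of a query evaluation plan, SQLite can be asked to describe the plan
it would use instead of running the query. The table emp and its columns are assumptions made
for this sketch, and the exact wording of the plan differs between SQLite versions and between
database products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, emp_name TEXT, address TEXT)"
)

# EXPLAIN QUERY PLAN asks SQLite to describe the plan instead of running the query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT emp_name, address FROM emp WHERE emp_id = ?", (1,)
).fetchall()

for row in plan:
    print(row[-1])  # the last column is a human-readable description of the step
```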

Let's have a look at the major software components of a DBMS −

Components

The components of the DBMS are as follows −

• DBA − The Database Administrator (DBA) is responsible for creating the DBMS
structure and has the ability to control that structure.

• Application Programs − These are used to create, change, and update the records.
They are mainly useful in designing the interface.

• DML processor − The Data Manipulation Language (DML) processor updates and
manipulates data based on user requests and checks statements against the syntax of SQL.

• DDL Processor − The Data Definition Language (DDL) processor checks the structure of
the database. It checks for improper statements and checks the syntax of statements
according to SQL.

• Data Dictionary − Stores the metadata of the database, such as the definitions of
tables, columns, and constraints. Queries are checked against these definitions; if a
query is valid it proceeds, otherwise an error is generated.

• Integrity Checker − Checks the data being stored against the integrity constraints
designed by the database administrator, for example the primary key or unique key.

• Authenticate control − Authenticate control checks whether a user is valid or not.

• Command Processor − It processes the SQL query. For example, SQL -> Oracle ->
optimize -> generate file.

• Query optimizer − It rewrites the query into a more efficient form, which reduces the
response time.

• Transaction manager − The transaction manager manages the changes that queries make
to the data.

• Scheduler − When a number of requests arrive at the same time, the scheduler forms a
queue ordered by arrival time.

• Buffer manager − The buffer manager performs storage management operations, moving
data between the disk and main memory.

• Recovery manager − The recovery manager recovers the data after a failure and
manages the log files or recovery files.

• Query processor − Query processor processes the query coming from the user side.
Its responsibility is to manage DML and DDL commands.

Example

SELECT emp_name, address FROM emp;

SELECT is a DML command.

Both processors work on this query −

• It is checked whether the syntax is correct and whether the table has been created.

• The SELECT query is run, and the data is retrieved from the hard disk.
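
The two steps above can be tried out with SQLite from Python. This is a minimal sketch, not a
description of how any particular DBMS is built internally: the column names emp_name and
address and the sample row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Before any DDL has run, the table does not exist, so the query is rejected.
try:
    conn.execute("SELECT emp_name, address FROM emp")
    missing_table_error = None
except sqlite3.OperationalError as exc:
    missing_table_error = str(exc)

# DDL: define the structure of the table.
conn.execute("CREATE TABLE emp (emp_name TEXT, address TEXT)")

# DML: add a row and then retrieve it.
conn.execute("INSERT INTO emp VALUES (?, ?)", ("Asha", "Pune"))
rows = conn.execute("SELECT emp_name, address FROM emp").fetchall()

print(missing_table_error)
print(rows)
```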

Run-time database manager


The run-time database manager performs the operations mentioned below −

• Authenticate control

• Integrity checker

• Command processor

Data Manager

The data manager works at the physical level and monitors how much space is to be
allocated to the database (DB).

1.9 Data Modelling:

The Entity-Relationship (ER) model defines the conceptual view of a database. It works
around real-world entities and the associations among them. At view level, the ER model is
considered a good option for designing databases.

Entity

An entity can be a real-world object, either animate or inanimate, that is easily
identifiable. For example, in a school database, students, teachers, classes, and courses
offered can be considered as entities. All these entities have some attributes or properties
that give them their identity.

An entity set is a collection of similar types of entities. An entity set may contain entities
whose attributes share similar values. For example, a Students set may contain all the
students of a school; likewise a Teachers set may contain all the teachers of a school from all
faculties. Entity sets need not be disjoint.

Attributes

Entities are represented by means of their properties, called attributes. All attributes have
values. For example, a student entity may have name, class, and age as attributes.

There exists a domain or range of values that can be assigned to attributes. For example, a
student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be
negative, etc.
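
One way a domain can be enforced is with CHECK constraints. The sketch below uses SQLite from
Python; the table student and the helper try_insert are hypothetical. The name rule here only
rejects names that contain a digit, which is a simplification of "alphabetic".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        name TEXT NOT NULL CHECK (name NOT GLOB '*[0-9]*'),  -- no digits in a name
        age  INTEGER CHECK (age >= 0)                        -- age cannot be negative
    )
""")

def try_insert(name, age):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("Asha", 19))
print(try_insert("Ravi", -1))
print(try_insert("R2D2", 20))
```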

Types of Attributes
• Simple attribute − Simple attributes are atomic values, which cannot be divided
further. For example, a student's phone number is an atomic value of 10 digits.

• Composite attribute − Composite attributes are made of more than one simple
attribute. For example, a student's complete name may have first_name and
last_name.

• Derived attribute − Derived attributes are the attributes that do not exist in the
physical database, but their values are derived from other attributes present in the
database. For example, average_salary in a department should not be saved directly
in the database; instead it can be derived. For another example, age can be derived
from date_of_birth.

• Single-value attribute − Single-value attributes contain a single value. For example −
Social_Security_Number.

• Multi-value attribute − Multi-value attributes may contain more than one value.
For example, a person can have more than one phone number, email_address, etc.

These attribute types can come together in a way like −

• simple single-valued attributes

• simple multi-valued attributes

• composite single-valued attributes

• composite multi-valued attributes
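
The attribute types can be pictured with a small Python sketch. The class names Name and
Student and the field roll_no are assumptions made for illustration; the derived attribute age
is computed from date_of_birth rather than stored, as described above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class Name:
    """Composite attribute: made of more than one simple attribute."""
    first_name: str
    last_name: str

@dataclass
class Student:
    roll_no: int                 # simple, single-valued
    name: Name                   # composite, single-valued
    date_of_birth: date          # simple, single-valued, stored
    phone_numbers: List[str] = field(default_factory=list)  # simple, multi-valued

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

student = Student(1, Name("Asha", "Rao"), date(2005, 6, 15), ["111", "222"])
print(student.age(date(2024, 6, 14)))
```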

Entity-Set and Keys

o Keys play an important role in the relational database.
o They are used to uniquely identify any record or row of data in a table. They are also
used to establish and identify relationships between tables.

For example: In Student table, ID is used as a key because it is unique for each student. In
PERSON table, passport_number, license_number, SSN are keys since they are unique for
each person.
Types of key:

1. Primary key

o It is the first key used to identify one and only one instance of an entity uniquely. An
entity can contain multiple keys, as we saw in the PERSON table. The key which is
most suitable from that list becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each
employee. In the EMPLOYEE table, we could instead select License_Number or
Passport_Number as the primary key since they are also unique.
o For each entity, the primary key selection is based on requirements and developers.
2. Candidate key

o A candidate key is a minimal attribute or set of attributes that can uniquely identify a
tuple.
o The primary key is chosen from the candidate keys. The remaining candidate keys are
as strong as the primary key.

For example: In the EMPLOYEE table, id is best suited for the primary key. The other
uniquely identifying attributes, like SSN, Passport_Number, License_Number, etc., are
also candidate keys.
3. Super Key

A super key is an attribute set that can uniquely identify a tuple. A super key is a superset
of a candidate key.

For example: In the above EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME),
the name of two employees can be the same, but their EMPLOYEE_ID can't be the same.
Hence, this combination can also be a key.

The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), etc.
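
The difference between a super key and a candidate key can be checked on sample data with a
short Python sketch. The function names is_super_key and candidate_keys and the sample rows are
hypothetical. Note the limitation: a key is a rule about every possible row, so passing this
check on a few rows only shows that the rows do not contradict the key.

```python
from itertools import combinations

def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, all_attrs):
    """Minimal super keys: super keys that contain no smaller super key."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            if any(set(key) <= set(attrs) for key in found):
                continue  # contains a smaller key, so it is not minimal
            if is_super_key(rows, attrs):
                found.append(attrs)
    return found

employees = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Asha"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Ravi"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Asha"},  # same name, different ID
]

print(is_super_key(employees, ("EMPLOYEE_ID", "EMPLOYEE_NAME")))
print(candidate_keys(employees, ["EMPLOYEE_ID", "EMPLOYEE_NAME"]))
```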

4. Foreign key

o A foreign key is a column (or set of columns) of a table that points to the primary
key of another table.
o Every employee works in a specific department in a company, and employee and
department are two different entities. So we don't store the department's information
in the employee table. Instead, we link these two tables through the primary key of
one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new
attribute in the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are
related.
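
The EMPLOYEE and DEPARTMENT link can be sketched with SQLite from Python. The column lists
and sample values are assumptions; Department_Id is taken from the text. SQLite only enforces
foreign keys when the foreign_keys pragma is switched on, which is a detail of SQLite rather
than of foreign keys in general.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when on
conn.execute("CREATE TABLE department (department_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employee (
        id            INTEGER PRIMARY KEY,
        name          TEXT,
        department_id INTEGER REFERENCES department (department_id)  -- foreign key
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'Accounts')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")  # department 10 exists

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Ravi', 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The foreign key lets the two tables be joined back together.
joined = conn.execute("""
    SELECT e.name, d.name
    FROM employee AS e
    JOIN department AS d ON d.department_id = e.department_id
""").fetchall()
print(rejected, joined)
```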
5. Alternate key

There may be one or more attributes or a combination of attributes that uniquely identify each
tuple in a relation. These attributes or combinations of the attributes are called the candidate
keys. One key is chosen as the primary key from these candidate keys, and each remaining
candidate key, if any exists, is termed an alternate key. In other words, the total number of
alternate keys is the total number of candidate keys minus one (the primary key). An
alternate key may or may not exist. If there is only one candidate key in a relation, it does
not have an alternate key.

For example, the employee relation has two attributes, Employee_Id and PAN_No, that act
as candidate keys. In this relation, Employee_Id is chosen as the primary key, so the other
candidate key, PAN_No, acts as the alternate key.
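
In SQL, an alternate key is commonly declared with a UNIQUE constraint next to the primary
key. The sketch below assumes a name column and uses made-up PAN values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,   -- candidate key chosen as the primary key
        pan_no      TEXT NOT NULL UNIQUE,  -- remaining candidate key: the alternate key
        name        TEXT
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'ABCDE1234F', 'Asha')")

try:
    # Different employee_id, but the same pan_no as the first row.
    conn.execute("INSERT INTO employee VALUES (2, 'ABCDE1234F', 'Ravi')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```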

6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key.
This key is also known as Concatenated Key.

For example, in an employee relation, we assume that an employee may be assigned multiple
roles, and an employee may work on multiple projects simultaneously. So the primary key
will be composed of all three attributes, namely Emp_ID, Emp_role, and Proj_ID in
combination. These attributes act as a composite key since the primary key comprises more
than one attribute.
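
A composite key can be declared by listing all of its attributes in the PRIMARY KEY clause.
In this sketch the table name assignment and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE assignment (
        emp_id   INTEGER,
        emp_role TEXT,
        proj_id  INTEGER,
        PRIMARY KEY (emp_id, emp_role, proj_id)  -- composite key
    )
""")

# The same employee may appear many times, as long as the full combination differs.
rows = [(1, "developer", 100), (1, "tester", 100), (1, "developer", 200)]
conn.executemany("INSERT INTO assignment VALUES (?, ?, ?)", rows)

try:
    conn.execute("INSERT INTO assignment VALUES (1, 'developer', 100)")  # exact repeat
    repeated_rejected = False
except sqlite3.IntegrityError:
    repeated_rejected = True

print(repeated_rejected)
```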

7. Artificial key

Keys created using arbitrarily assigned data are known as artificial keys. These keys are
created when a primary key is large and complex and has no relationship with many other
relations. The data values of the artificial keys are usually numbered in a serial order.

For example, the primary key, which is composed of Emp_ID, Emp_role, and Proj_ID, is
large in the employee relation. So it would be better to add a new virtual attribute to
identify each tuple in the relation uniquely.
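
A serially numbered artificial key might look like the following in SQLite. The column name
assignment_id is hypothetical. Keeping a UNIQUE constraint on the original three attributes is
a design choice made for this sketch, so that duplicate combinations are still rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE assignment (
        assignment_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- artificial key
        emp_id        INTEGER NOT NULL,
        emp_role      TEXT NOT NULL,
        proj_id       INTEGER NOT NULL,
        UNIQUE (emp_id, emp_role, proj_id)
    )
""")

insert = "INSERT INTO assignment (emp_id, emp_role, proj_id) VALUES (?, ?, ?)"
first_id = conn.execute(insert, (1, "developer", 100)).lastrowid
second_id = conn.execute(insert, (1, "tester", 100)).lastrowid

print(first_id, second_id)  # values assigned by the database in serial order
```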
Relationship

The association among entities is called a relationship. For example, an employee works_at
a department, and a student enrolls in a course. Here, Works_at and Enrolls are called
relationships.

Relationship Set

A set of relationships of similar type is called a relationship set. Like entities, a relationship
too can have attributes. These attributes are called descriptive attributes.

Degree of Relationship

The number of participating entities in a relationship defines the degree of the relationship.

• Binary = degree 2

• Ternary = degree 3

• n-ary = degree n

Mapping Cardinalities

Cardinality defines the number of entities in one entity set that can be associated with
entities of the other set via a relationship set.

• One-to-one − One entity from entity set A can be associated with at most one entity
of entity set B and vice versa.

• One-to-many − One entity from entity set A can be associated with more than one
entity of entity set B; however, an entity from entity set B can be associated with at
most one entity of entity set A.

• Many-to-one − More than one entity from entity set A can be associated with at
most one entity of entity set B; however, an entity from entity set B can be associated
with more than one entity from entity set A.

• Many-to-many − One entity from A can be associated with more than one entity
from B and vice versa.
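
The four cases can be told apart mechanically from a list of (A, B) pairs. The function
mapping_cardinality below is a hypothetical helper written for illustration. It classifies the
pairs it is given; the cardinality of a real relationship set is a design rule, so sample data
can only show which rules it does not break.

```python
from collections import defaultdict

def mapping_cardinality(pairs):
    """Classify a relationship given as (a, b) pairs, a from set A and b from set B."""
    b_per_a = defaultdict(set)
    a_per_b = defaultdict(set)
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)

    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())   # some A links to several B
    b_has_many = any(len(xs) > 1 for xs in a_per_b.values())   # some B links to several A

    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"
    if b_has_many:
        return "many-to-one"
    return "one-to-one"

# Two employees work at the same department.
print(mapping_cardinality([("e1", "d1"), ("e2", "d1")]))
```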
