Database Management System Unit 1

Unit 1 Introduction to Database Systems and E-R Model


Data, Information and File
Data
Data is a set of known facts or figures that have implicit meaning. It can also be defined as the representation of
facts, concepts or instructions in a formal manner that is suitable for understanding and processing. Data can
be represented using letters (A-Z, a-z), digits (0-9) and special characters (+, -, #, $, etc.),
e.g. 25, “ajit”.
Data is unprocessed and unorganized in nature.
Information
Information is the processed data on which decisions and actions are based. Information can be defined as the
organized and classified data to provide meaningful values.
e.g. “The age of Ravi is 25”.
File
A file is a collection of related data stored in secondary memory.

Database Management System (DBMS)


A DBMS is a collection of interrelated data and a set of programs to access those data. The primary goal of a
DBMS is to provide a way to store and retrieve database information that is both convenient and efficient.
Database systems are designed to manage large bodies of information. Management of data involves:
• defining structures for storage of information
• providing mechanisms for the manipulation of information.
In addition, the database system must ensure the safety of the information stored, despite system crashes or
attempts at unauthorized access. If data are to be shared among several users, the system must avoid possible
anomalous results.
The DBMS is hence a general-purpose software system that facilitates the processes of defining, constructing,
and manipulating databases for various applications.
Defining a database involves specifying the data types, structures, and constraints for the data to be stored in the
database.
Constructing the database is the process of storing the data itself on some storage medium that is controlled by
the DBMS.
Manipulating a database includes such functions as querying the database to retrieve specific data, updating the
database to reflect changes in the mini world, and generating reports from the data.
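The three activities above (defining, constructing and manipulating) can be sketched in a few lines. This is only an illustration, assuming SQLite through Python's standard sqlite3 module as the DBMS; the student table, its columns and the sample rows are hypothetical and do not come from the notes.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")

# Defining: specify data types, structure, and constraints.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER CHECK (age > 0))"
)

# Constructing: store the data itself under DBMS control.
conn.executemany(
    "INSERT INTO student (roll_no, name, age) VALUES (?, ?, ?)",
    [(1, "Ravi", 25), (2, "Ajit", 22)],
)

# Manipulating: update to reflect a change, then query the result.
conn.execute("UPDATE student SET age = 26 WHERE roll_no = 1")
rows = conn.execute("SELECT name, age FROM student ORDER BY roll_no").fetchall()
print(rows)
```

The same three steps apply to any relational DBMS; only the connection call is specific to SQLite.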
The main features of data in a database are:
1. It is well organized.
2. It is related.
3. It is accessible in a logical order without any difficulty.
4. It is stored only once.
Prepared By- Charu Kavadia
Functions of DBMS
1. Defining the Database Schema: The DBMS must provide a facility for defining the database structure and for
specifying the access rights of authorized users.
2. Manipulation of the Database: The DBMS must provide functions for inserting records into the database,
updating data, deleting data and retrieving data.
3. Sharing of the Database: The DBMS must let multiple users share data items while maintaining consistency of
the data.
4. Protection of the Database: The DBMS must protect the database against unauthorized users.
5. Database Recovery: If the system fails for any reason, the DBMS must facilitate database recovery.
Applications of DBMS
• Universities: For student information, course registrations, and grades (in addition to standard enterprise
information such as human resources and accounting).
• Airlines: For reservations and schedule information. Airlines were among the first to use databases in a
geographically distributed manner.
• Navigation systems: For maintaining the locations of various places of interest along with the exact routes of
roads, train systems, buses, etc.
• Banking: For customer information, accounts, loans, and banking transactions.
• Credit card transactions: For purchases on credit cards and generation of monthly statements.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on
prepaid calling cards, and storing information about the communication networks.
• Finance: For storing information about holdings, sales, and purchases of financial instruments such as stocks
and bonds.
• Sales: For customer, product, and purchase information.
• Manufacturing: For management of the supply chain and for tracking production of items in factories,
inventories of items in warehouses/stores, and orders for items.
• Human resources: For information about employees, salaries, payroll taxes and benefits, and for generation
of paychecks.
Modes of DBMS
There are two modes in which databases are used:
• The first mode is to support Online Transaction Processing, where a large number of users use the
database, with each user retrieving relatively small amounts of data and performing small updates. This
is the primary mode of use for the vast majority of users of database applications such as those that we
outlined earlier.
• The second mode is to support Data Analytics, that is, the processing of data to draw conclusions and
infer rules or decision procedures, which are then used to drive business decisions.

History of Database
Techniques for data storage and processing have evolved over the years:
• 1950s and early 1960s: Magnetic tapes were developed for data storage. Data processing tasks such as
payroll were automated, with data stored on tapes. Processing of data consisted of reading data from one
or more tapes and writing data to a new tape.
• Late 1960s and early 1970s: Widespread use of hard disks in the late 1960s changed the scenario for
data processing greatly, since hard disks allowed direct access to data. The position of data on disk was
immaterial, since any location on disk could be accessed in just tens of milliseconds. Data were thus
freed from the tyranny of sequentiality. With the
advent of disks, the network and hierarchical data models were developed, which allowed data structures
such as lists and trees to be stored on disk. Programmers could construct and manipulate these data
structures. A landmark paper by Edgar Codd in 1970 defined the relational model and nonprocedural
ways of querying data in the relational model, and relational databases were born.
• Late 1970s and 1980s: By the early 1980s, relational databases had become competitive with network
and hierarchical database systems even in the area of performance. Relational databases were so easy to
use that they eventually replaced network and hierarchical databases. The 1980s also saw much research
on parallel and distributed databases, as well as initial work on object-oriented databases.
• 1990s: The SQL language was designed primarily for decision support applications, which are query-intensive,
yet the mainstay of databases in the 1980s was transaction processing applications, which are
update-intensive.
• 2000s: The types of data stored in database systems evolved rapidly during this period. Semi-structured
data became increasingly important. XML emerged as a data-exchange standard. JSON, a more compact
data-exchange format well suited for storing objects from JavaScript or other programming languages,
subsequently grew increasingly important.
• 2010s: The limitations of NoSQL systems, such as lack of support for consistency and lack of support
for declarative querying, were found acceptable by many applications (e.g., social networks), in return
for the benefits they provided, such as scalability and availability.

File System v/s DBMS


File System Approach
File-based systems were an early attempt to computerize the manual system. A file system is a method of
organising files on a hard disk or other storage medium; it is the software that manages the data
files in a computer system. The file system arranges the files and helps in retrieving them when required. It is
compatible with different file types, such as mp3, doc, txt, mp4, etc., and files are grouped into directories.
In this approach the data is simply a collection of files, and the user has to write the procedures to manage it. It is
also called the traditional approach: a decentralized approach was followed, in which each department
stored and controlled its own data with the help of a data processing specialist. The main role of the data
processing specialist was to create the necessary computer file structures, manage the data within those
structures, and design application programs that create reports based on the file data.

In the above figure, consider an example of a student's file system. The student file will contain information
regarding the student (i.e. roll no, student name, course etc.). Similarly, we have a subject file that contains
information about the subject and the result file which contains the information regarding the result.
Some fields are duplicated in more than one file, which leads to data redundancy. So to overcome this problem,
we need to create a centralized system, i.e. DBMS approach.
DBMS
In the database approach, data is kept as a well-organized collection, related in a meaningful way, that can be
accessed by different users but is stored only once (centralized) in the system. The user need not write the procedures
for handling the database. The operations performed by the DBMS include insertion, deletion,
selection, sorting, etc.

Disadvantages of File System over DBMS


1. Data Redundancy and Inconsistency (redundancy means repeated data; inconsistency means
different values for the same data)
A file processing system leads to many copies of the same data being kept. This is data redundancy. If we need to
change any of the data, then we need to change it in all copies. If not, this will lead to inconsistency.
For example, let us assume a file for storing addresses of students. If we make three copies of the address file
and store them on three different computers, the data is redundant. If we want to change the
address of any student, then the change must be made on all three computers; failing to do so leads to
inconsistent data.
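The address-file example can be replayed in a few lines of Python. This is a minimal sketch: the three dictionaries stand in for the three copies of the file, and the student name and addresses are made-up values.

```python
# Three independent copies of the same address file, as in the example above.
# The student name and addresses are made-up illustrative values.
copies = [{"Ravi": "12 Lake Road"} for _ in range(3)]

# An update reaches only the first copy; the other two are forgotten.
copies[0]["Ravi"] = "7 Hill Street"

# The copies now disagree about the same fact: the data is inconsistent.
distinct_addresses = {copy["Ravi"] for copy in copies}
is_consistent = len(distinct_addresses) == 1
print(sorted(distinct_addresses), is_consistent)
```

With a single stored copy there is nothing to keep in step, which is the point of the centralized approach.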
2. Difficulty in Accessing Data
In a file processing system, each different way of accessing the data requires a different program.
For example, if we want to access student names from a file, we need a program that does the job. If we want to
view only the addresses of all students from a specific city, then we need a different program that does that job.
The list is endless. Hence, it is difficult to access data.
3. Data Isolation
Files are stored in different locations and in different formats. Thus they are isolated, and writing new programs to
retrieve the data is difficult.
For example, in one location the student data may be stored in .txt format, while in another location the same file
may be stored in .doc format.
4. Integrity Problems
Integrity problems arise when the stored data fails to satisfy certain integrity conditions.
For example, a phone number cannot be longer than 10 digits, a bank balance should not go below 1000, etc.
The real difficulty arises when we want to add new conditions of this kind to an existing system, since the
checks are written into the application programs. It is
hard to make those changes. The problem is compounded when constraints involve several data items from
different files.
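For contrast, a DBMS lets such conditions be declared once, with the data, instead of being coded into every program. The sketch below assumes SQLite through Python's sqlite3 module; the account table and its columns are hypothetical, and the two CHECK constraints encode the two rules from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical account table; both rules from the text are declared as CHECK constraints.
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " phone TEXT CHECK (length(phone) <= 10),"
    " balance NUMERIC CHECK (balance >= 1000))"
)

conn.execute("INSERT INTO account VALUES (1, '9876543210', 5000)")  # satisfies both rules

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

too_long_phone = try_insert((2, "98765432101", 5000))  # 11 digits
low_balance = try_insert((3, "9876543210", 500))       # below 1000
print(too_long_phone, low_balance)
```

Both invalid rows are rejected by the database itself, whichever program attempts the insert.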
5. Atomicity Problems
Atomicity means that an operation must happen entirely or not at all. A computer system, like any other device, is
subject to failure. The database must be in a consistent state in spite of failures.
For example, consider a banking system with a program to transfer Rs 500 from account A to account B. If a system
failure occurs during the execution of the program, it is possible that the Rs 500 was removed from the balance
of account A but was not credited to the balance of account B, resulting in an inconsistent database state. Clearly,
it is essential to database consistency that either both the credit and debit occur, or that neither occurs. That is,
the funds transfer must be atomic: it must happen in its entirety or not at all. It is difficult to ensure
atomicity in a conventional file-processing system.
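The transfer example can be sketched with a database transaction. This assumes SQLite through Python's sqlite3 module; the account table, the starting balances of Rs 1000 and the fail_midway flag used to simulate a crash are all assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 1000), ('B', 1000)")  # assumed balances
conn.commit()

def transfer(amount, fail_midway=False):
    """Move `amount` from A to B; the debit and credit commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,)
            )
            if fail_midway:
                raise RuntimeError("simulated system failure after the debit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,)
            )
    except RuntimeError:
        pass  # the partial debit has already been rolled back

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer(500, fail_midway=True)
after_failure = balances()
transfer(500)
after_success = balances()
print(after_failure, after_success)
```

After the simulated failure both balances are unchanged; only the complete transfer moves the money.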
6. Concurrent Access Anomalies
Simultaneous access to a data item must be handled carefully.
For example, consider account A, with a balance of Rs 10,000. If two bank clerks debit the account balance (by
say Rs 500 and Rs 100, respectively) of account A at almost exactly the same time, the result of the concurrent
executions may leave the account balance in an incorrect (or inconsistent) state. Suppose that the programs
executing on behalf of each withdrawal read the old balance, reduce that value by the amount being withdrawn,
and write the result back. If the two programs run concurrently, they may both read the value Rs 10,000, and
write back Rs 9,500 and Rs 9,900, respectively. Depending on which one writes the value last, the balance of
account A may contain either Rs 9,500 or Rs 9,900, rather than the correct value of Rs 9,400.
It is difficult to handle concurrent access in a file processing system because of data isolation, redundancy,
etc.
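The arithmetic of this lost update can be checked by replaying the interleaving step by step. The sketch below is deterministic and uses no real threads; the variable names are illustrative.

```python
# Deterministic replay of the interleaving described above (no real threads).
balance = 10000

# Both clerks read the old balance before either one writes.
read_by_clerk_1 = balance
read_by_clerk_2 = balance

# Each writes back its own result; the later write overwrites the earlier one.
balance = read_by_clerk_1 - 500   # clerk 1 writes 9500
balance = read_by_clerk_2 - 100   # clerk 2 writes 9900, losing the Rs 500 debit
lost_update_result = balance

# Serial execution: the second debit starts from the first one's result.
serial_result = 10000 - 500 - 100
print(lost_update_result, serial_result)
```

If clerk 1 had written last instead, the balance would have been 9500; either way one of the two debits is lost.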
7. Security Problems
Not every user of the database system should be able to access all the data. For example, in a university, payroll
personnel need to see only that part of the database that has financial information. They do not need access to
information about academic records. But since application programs are added to the file-processing system in
an ad hoc manner, enforcing such security constraints is difficult.
Advantages of File System
• The file-based system is not complicated and is simpler to use.
• Because of its simplicity, this system is quite inexpensive.
• Because the file-based system is simple and cheap, it is normally suitable for home users and owners of
small businesses.
• Since the file-based system is used by smaller organisations or individual users, it stores a comparatively
smaller amount of data. Hence, the data can be accessed faster and more easily.

Advantages of DBMS (over File System or in general)


1. Reduction of Redundancy
Unlike traditional file-system storage, data redundancy is reduced or eliminated in a DBMS because all data are
stored at a centralized location rather than being created by individual users for each application. Data
redundancy occurs when the same data are stored unnecessarily at different places. Centralized control of data
by the DBA (database administrator) avoids unnecessary duplication of data and effectively reduces the total
amount of data storage required.
2. Data Consistency
In traditional file-system storage, a change made by one user in one application is not propagated to
other applications, even when they hold the same set of details. This is not the case with a DBMS:
there is a single repository of data that is defined once and accessed by many users, so the data are consistent.
3. Sharing of Data
Data sharing is the primary advantage of database management systems. A DBMS allows data to be shared among
multiple applications and users.
4. Data Concurrency
A DBMS allows multiple users to access and modify the same set of data at the same time, and reflects these
changes in real time. The DBMS executes the actions of programs in such a way that concurrent access is
permitted but conflicting operations are not permitted to proceed concurrently.
Another aspect is that the DBMS allows multiple views for a single database schema, i.e. it offers different
interfaces to the same data according to user capabilities. The DBMS provides various concurrency control
protocols for ensuring the atomicity and serializability of concurrent data access.
Since a database system lets multiple users access the same data from different locations at the same time,
the working speed on the database is increased.
5. Fast data Access
In the traditional file-based approach, it might take hours to find very specific information that is
needed in the context of some business emergency, whereas a DBMS can reduce this time to a few seconds. This is a
great advantage of a DBMS: we can write small queries that search the database for us, and the DBMS
retrieves the information efficiently using its built-in searching operations.
6. Data Backup and Recovery
A DBMS provides a strong framework for data backup. Users are not required to
back up their data periodically and manually; this is taken care of automatically by the DBMS. Moreover, in case of a
server crash, the DBMS restores the database to its previous condition.
7. Data Integrity
Data integrity means that the data contained in the database is both accurate and consistent. It is essential because
a DBMS may manage multiple databases, all of which contain data that is visible to multiple users.
The DBMS therefore ensures that data is consistent and correct in all databases for all users. For example, data values
being entered for storage can be checked to ensure that they fall within a specified range and are of the
correct format.
8. Data Security
A DBMS provides a strong framework to protect data privacy and security. It ensures that only
authorized users have access to data, and there is a mechanism to define access privileges. The DBA, who has the
ultimate responsibility for the data in the DBMS, can ensure that proper access procedures are followed,
including proper authentication schemes for access to the database system and additional checks before
permitting access to sensitive data.
9. Data Atomicity
A DBMS ensures atomicity, i.e. a transaction is performed on the database completely or not at all. If a transaction is
only partially completed, the DBMS rolls it back.
For example, if we make an online purchase, money is deducted from our account; if the purchase
fails for some reason, then no money is deducted, or if it was deducted, it is returned within a few days.
10. Conflict Resolution
The DBA resolves the conflicting requirements of various users and applications. The DBA chooses the best file
structure (storage) and access method (time) to get optimal performance for the response-critical applications,
while permitting less critical applications to continue to use the database with a relatively slower response.
11. Data Independence
Data independence is usually considered from two points of view: physical data independence and logical
data independence.
a) Physical Data Independence: It allows changes in the physical storage devices or organization of the files
to be made without requiring changes in the conceptual view or any of the external views and hence in the
application programs using the database. Thus, the files may migrate from one type of physical media to
another or the file structure may change without any need for changes in the application program.
Physical data independence refers to the immunity of the conceptual (logical) schema to changes in the internal
(physical) schema. The
logical schema stays unchanged even though changes are made to file organization, storage structures, storage
devices or indexing strategy. Physical data independence deals with hiding the details of the storage structure
from user applications. The applications should not be involved with these issues, since there is no difference in
the operation carried out against the data.
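A small illustration of physical data independence, assuming SQLite through Python's sqlite3 module: adding an index changes how the data can be reached on disk, yet the application's query text and its result stay the same. The student table, the sample rows and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ravi", "Udaipur"), (2, "Ajit", "Jaipur"), (3, "Meena", "Udaipur")],
)

# The application's query; it mentions no storage detail.
QUERY = "SELECT name FROM student WHERE city = ? ORDER BY roll_no"

before = conn.execute(QUERY, ("Udaipur",)).fetchall()

# Physical-level change: add an index (a new access path) on city.
conn.execute("CREATE INDEX idx_student_city ON student (city)")

after = conn.execute(QUERY, ("Udaipur",)).fetchall()
print(before == after, after)
```

Whether the DBMS actually uses the new index is its own decision; the application is not rewritten either way.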
b) Logical Data Independence: It implies that application programs need not be changed if fields are added to
an existing record; nor do they have to be changed if fields not used by the application programs are deleted.
Logical data independence indicates that the conceptual schema can be changed without affecting the existing
external schemas.
A logical schema is a conceptual design of the database done on paper or a whiteboard, much like architectural
drawings for a house. The ability to change the logical schema without changing the external schema or user
view is called logical data independence. For example, the addition or removal of entities, attributes or
relationships in this conceptual schema should be possible without having to change existing external schemas
or rewrite existing application programs. In other words, changes to the logical schema (e.g., alterations to the
structure of the database such as adding a column or other tables) should not affect the function of the application
(external views).
Data Independence is an advantage with DBMS environment since it allows for changes at one level of the
database without affecting other levels.
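Logical data independence can be sketched in the same way, again assuming SQLite through Python's sqlite3 module. An application reads through an external view; a column is then added to the underlying table, and the application's query keeps working unchanged. The table, view and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Ravi', 'DBMS')")

# External view used by an application.
conn.execute("CREATE VIEW student_names AS SELECT roll_no, name FROM student")
APP_QUERY = "SELECT roll_no, name FROM student_names"

before = conn.execute(APP_QUERY).fetchall()

# Logical-level change: add a new attribute to the conceptual schema.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")

after = conn.execute(APP_QUERY).fetchall()
print(before == after, after)
```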

Disadvantages of DBMS
1. Cost of software/hardware and migration: DBMS software and hardware (networking installation) cost is
high. In addition to the cost of purchasing or developing the software, the hardware has to be upgraded to allow
for the extensive programs and work spaces required for their execution and storage. An additional cost is that
of migration from a traditionally separate application environment to an integrated one.
2. Processing overhead: The DBMS incurs processing overhead for implementing security, integrity and sharing of the data.
3. Problems associated with centralization: While centralization reduces duplication, the lack of duplication
requires that the database be adequately backed up so that in the case of failure the data can be recovered.
Centralization also means that the data is accessible from a single source. This increases the potential severity of
security breaches and disruption of the operation of the organization because of downtimes and failures. The
replacement of a monolithic centralized database by a federation of independent and cooperating distributed
databases resolves some of the problems resulting from failures and downtimes.
4. Complexity of Backup and Recovery: Backup and recovery operations are fairly complex in a DBMS
environment, and this is exacerbated in a concurrent multi-user database system.
5. Setup of the database system requires more knowledge, money, skills, and time.

Differences between File System and DBMS

Describing and Storing Data


Data Abstraction (Describing Data)
The need for efficiency has led database system developers to use complex data structures to represent data in
the database. Since many database-system users are not computer trained, developers hide the complexity from
users through several levels of data abstraction, to simplify users’ interactions with the system:
• Physical Level: The lowest level of abstraction describes how the data are actually stored. The physical
level describes complex low-level data structures in detail.
• Logical Level: The next-higher level of abstraction describes what data are stored in the database, and
what relationships exist among those data. The logical level thus describes the entire database in terms
of a small number of relatively simple structures. Although implementation of the simple structures at
the logical level may involve complex physical-level structures, the user of the logical level does not
need to be aware of this complexity. This is referred to as physical data independence. Database
administrators, who must decide what information to keep in the database, use the logical level of
abstraction.
• View Level: The highest level of abstraction describes only part of the entire database. Even though the
logical level uses simpler structures, complexity remains because of the variety of information stored in
a large database. Many users of the database system do not need all this information; instead, they need
to access only a part of the database. The view level of abstraction exists to simplify their interaction
with the system. The system may provide many views for the same database.

Figure: Types of Views


An important feature of data models, such as the relational model, is that they hide such low-level
implementation details not just from database users but even from database-application developers. The
database system allows application developers to store and retrieve data using the abstractions of the data
model, and converts the abstract operations into operations on the low-level implementation.
Example of View
Let us consider an example:
type instructor = record
ID : char (5);
name : char (20);
dept_name : char (20);
salary : numeric (8,2);
end;
This code defines a new record type called instructor with four fields. Each field has a name and a type
associated with it. For example, char(20) specifies a string with 20 characters, while numeric(8,2) specifies a
number with 8 digits, two of which are to the right of the decimal point.
A university organization may have several such record types, including:
• department, with fields dept_name, building, and budget.
• course, with fields course_id, title, dept_name, and credits.
• student, with fields ID, name, dept_name, and tot_credits.
At the physical level, an instructor, department, or student record can be described as a block of consecutive
bytes.
At the logical level, each such record is described by a type definition. The interrelationship of these record
types is also defined at the logical level; a requirement that the dept_name value of an instructor record must
appear in the department table is an example of such an interrelationship.
Finally, at the view level, computer users see a set of application programs that hide details of the data types. At
the view level, several views of the database are defined, and a database user sees some or all of these views. In
addition to hiding details of the logical level of the database, the views also provide a security mechanism to
prevent users from accessing certain parts of the database. For example, clerks in the university registrar office
can see only that part of the database that has information about students; they cannot access information about
salaries of instructors.
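A view that hides part of the data can be sketched as follows, assuming SQLite through Python's sqlite3 module. The view name instructor_public and the sample row are hypothetical. SQLite has no user accounts, so this shows only what the view exposes; in a multi-user DBMS the clerks would also be denied direct access to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary NUMERIC)"
)
conn.execute("INSERT INTO instructor VALUES ('00001', 'Asha', 'History', 60000)")  # made-up row

# View for the clerks: same rows, but the salary column is left out.
conn.execute(
    "CREATE VIEW instructor_public AS SELECT ID, name, dept_name FROM instructor"
)

cursor = conn.execute("SELECT * FROM instructor_public")
visible_columns = [col[0] for col in cursor.description]
visible_rows = cursor.fetchall()
print(visible_columns, visible_rows)
```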
Instances and Schemas (Storage of Data)
The collection of information stored in the database at a particular moment is called an instance of the database.
The overall design of the database is called the database schema.
A database schema corresponds to the variable declarations (along with associated type definitions) in a program.
Each variable has a particular value at a given instant. The values of the variables in a program at a point in time
correspond to an instance of a database schema.
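The distinction can be observed directly, assuming SQLite through Python's sqlite3 module: the stored table definition (the schema) stays fixed while the set of rows (the instance) changes after an insert. The helper names schema() and instance() and the sample row are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_name TEXT, building TEXT, budget NUMERIC)")

def schema():
    # The stored table definition: the database schema.
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'department'"
    ).fetchone()[0]

def instance():
    # The rows present right now: the current instance.
    return conn.execute("SELECT * FROM department").fetchall()

schema_before, instance_before = schema(), instance()
conn.execute("INSERT INTO department VALUES ('History', 'Main Block', 50000)")  # made-up row
schema_after, instance_after = schema(), instance()

print(schema_before == schema_after)      # the schema did not change
print(instance_before != instance_after)  # the instance did
```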
A database can have three kinds of schemas:
Physical schema: It describes the database design at the physical level.
Logical schema: It describes the database design at the logical level.
Subschemas: They describe different views of the database.
Of these, the logical schema is by far the most important in terms of its effect on application programs, since
programmers construct applications by using the logical schema. The physical schema is hidden beneath the
logical schema and can usually be changed easily without affecting application programs. Application programs
are said to exhibit physical data independence if they do not depend on the physical schema and thus need not
be rewritten if the physical schema changes.

Database Languages
A database system provides a Data-Definition Language (DDL) to specify the database schema and a Data-
Manipulation Language (DML) to express database queries and updates. The data definition and data
manipulation languages are not two separate languages; instead they simply form parts of a single database
language, such as the widely used SQL language.
Data-Definition Language
We specify a database schema by a set of definitions expressed in a special language called a Data-Definition
Language (DDL). The DDL defines the schema for the database. The DDL is also used to specify the following additional
properties of the data:
a) Domain Constraints: A domain of possible values must be associated with every attribute (also called a
field), for example integer types, character types, and date/time types. Declaring an attribute to be of a particular
domain acts as a constraint on the values that it can take.
b) Referential Integrity: There are cases where we wish to ensure that a value that appears in one relation for a
given set of attributes also appears in a certain set of attributes in another relation (referential integrity). For
example, the department listed for each course must be one that actually exists. More precisely, the dept_name
value in a course record must appear in the dept_name attribute of some record of the department relation.
c) Assertions: An assertion is any condition that the database must always satisfy. Domain constraints and
referential-integrity constraints are special forms of assertions. However, there are many constraints that we
cannot express by using only these special forms. For example, “Every department must have at least three
courses offered every semester” must be expressed as an assertion.
d) Authorization: We may want to differentiate among the users as far as the type of access they are permitted
on various data values in the database. These differentiations are expressed in terms of authorization, the most
common being: read authorization, which allows reading, but not modification, of data; insert authorization,
which allows insertion of new data, but not modification of existing data; update authorization, which allows
modification, but not deletion, of data; and delete authorization, which allows deletion of data. We may assign
the user all, none, or a combination of these types of authorization.
Example:
create table department
(dept_name char (20),
building char (15),
budget numeric (12,2));

The DDL, just like any other programming language, gets as input some instructions (statements) and generates
some output. The output of the DDL is placed in the data dictionary, which contains metadata—that is, data
about data.
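The referential-integrity constraint described above (a course's dept_name must exist in department) can be declared in DDL and is then enforced by the DBMS. The sketch below assumes SQLite through Python's sqlite3 module; SQLite checks foreign keys only after PRAGMA foreign_keys = ON. The add_course helper and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute(
    "CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT, budget NUMERIC)"
)
conn.execute(
    "CREATE TABLE course ("
    " course_id TEXT PRIMARY KEY,"
    " title TEXT,"
    " dept_name TEXT REFERENCES department (dept_name),"
    " credits INTEGER)"
)
conn.execute("INSERT INTO department VALUES ('History', 'Main Block', 50000)")  # made-up row

def add_course(course_id, title, dept_name, credits):
    """Return True if the course row is accepted, False if it violates the foreign key."""
    try:
        conn.execute(
            "INSERT INTO course VALUES (?, ?, ?, ?)",
            (course_id, title, dept_name, credits),
        )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = add_course("H-101", "World History", "History", 3)
rejected = add_course("X-101", "Orphan Course", "Astrology", 3)  # no such department
print(accepted, rejected)
```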
Data Manipulation Language
A Data-Manipulation Language (DML) is a language that enables users to access or manipulate data as
organized by the appropriate data model. The types of access are:
• Retrieval of information stored in the database
• Insertion of new information into the database
• Deletion of information from the database
• Modification of information stored in the database
There are basically two types:
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are needed
without specifying how to get those data.
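The difference between the two styles can be seen by fetching the same data both ways. In this sketch a Python loop plays the role of the procedural style, and an SQL query run through Python's sqlite3 module plays the declarative one; the instructor rows are made up.

```python
import sqlite3

# Made-up instructor rows: (ID, name, dept_name).
rows = [("1", "Asha", "History"), ("2", "Vikram", "Physics"), ("3", "Neha", "History")]

# Procedural style: the program spells out how to find the data (scan, test, collect).
procedural = []
for _id, name, dept_name in rows:
    if dept_name == "History":
        procedural.append(name)

# Declarative style: the query states only what is wanted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", rows)
declarative = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM instructor WHERE dept_name = 'History' ORDER BY ID"
    )
]
print(procedural, declarative)
```

Both produce the same names; in the declarative case the choice of access path is left to the DBMS.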


Differences between DDL and DML

Query
A query is a statement requesting the retrieval of information. The portion of a DML that involves information
retrieval is called a query language.
Example:
select instructor.name from instructor where instructor.dept_name = 'History';
This query retrieves the names of the instructors in the instructor table whose department is History.
Data Dictionary
We can define a data dictionary as a DBMS component that stores the definition of data characteristics and
relationships. Such “data about data” is known as metadata. The DBMS data dictionary provides the DBMS
with its self-describing characteristic.
For example, the data dictionary typically stores descriptions of all:
• Data elements that are defined in all tables of all databases. Specifically, the data dictionary stores the names,
data types, display formats, internal storage formats, and validation rules. The data dictionary tells where an
element is used, by whom it is used, and so on.
• Tables defined in all databases. For example, the data dictionary is likely to store the name of the table
creator, the date of creation, access authorizations, the number of columns, and so on.
• Indexes defined for each database table. For each index, the DBMS stores at least the index name, the
attributes used, the location, specific index characteristics, and the creation date.
• Defined databases: who created each database, the date of creation, where the database is located, who the DBA
is, and so on.
• End users and administrators of the database.
• Programs that access the database, including screen formats, report formats, application formats, SQL queries,
and so on.
• Access authorizations for all users of all databases.
• Relationships among data elements: which elements are involved, whether the relationships are mandatory or
optional, the connectivity and cardinality, and so on.
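A few of these descriptions can be read back from a running system. The sketch below assumes SQLite through Python's standard sqlite3 module; the table and index names are hypothetical, and SQLite's catalog stores only part of what the list above mentions (for example, it keeps no creator or creation date).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table instructor ("
    " id integer primary key, name text not null, dept_name text)"
)
conn.execute("create index idx_instructor_dept on instructor (dept_name)")

# Tables and indexes defined in the database.
objects = conn.execute(
    "select type, name, tbl_name from sqlite_master order by type, name"
).fetchall()

# Data elements of one table: column name and declared data type.
columns = [
    (row[1], row[2]) for row in conn.execute("pragma table_info(instructor)")
]

# Attributes used by the index.
index_columns = [
    row[2] for row in conn.execute("pragma index_info(idx_instructor_dept)")
]

print(objects, columns, index_columns)
```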


Structure of DBMS
The goal of a database system is to retrieve information from and store new information in the database. The
following figure shows the architecture of a database system that runs on a centralized server machine. It
summarizes how different types of users interact with a database, and how the different components of a database
engine are connected to each other. The centralized architecture shown in the figure is applicable to
shared-memory server architectures, which have multiple CPUs and exploit parallel processing, but in which all the
CPUs access a common shared memory. The database system is divided into three main components: Query Processor,
Storage Manager, and Disk Storage.
1. Query Processor
The query processor is important because it helps the database system to simplify and facilitate access to data.
It is the job of the database system to translate updates and queries written in a non-procedural language, at the
logical level, into an efficient sequence of operations at the physical level. The query processor components
include:
• DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan consisting of
low-level instructions that the query-evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that all give the same
result.
• Query Optimization: The DML compiler also performs query optimization; that is, it picks the lowest-cost
evaluation plan from among the alternatives.
• Query Evaluation Engine: It executes low-level instructions generated by the DML compiler.
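The choice among alternative evaluation plans can be observed in practice. The sketch below assumes SQLite (through Python's standard sqlite3 module) and its explain query plan statement; the table and index names are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table instructor (id integer primary key, name text, dept_name text)"
)

QUERY = "select name from instructor where dept_name = 'History'"


def plan(connection, sql):
    """Return the plan steps chosen by SQLite, joined into one string."""
    rows = connection.execute("explain query plan " + sql).fetchall()
    # The last column of each row holds the human-readable step description.
    return "; ".join(row[-1] for row in rows)


# With no index available, the only plan is to scan the whole table.
plan_without_index = plan(conn, QUERY)

# After an index is created, a cheaper alternative exists for the same query.
conn.execute("create index idx_instructor_dept on instructor (dept_name)")
plan_with_index = plan(conn, QUERY)

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both calls; only the plan picked by the system changes, and both plans give the same result.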
2. Storage Manager
A storage manager is a program module that provides the interface between the low level data stored in the
database and the application programs and queries submitted to the system. The storage manager is responsible
for the interaction with the file manager. The raw data are stored on the disk using the file system, which is
usually provided by a conventional operating system. The storage manager translates the various DML
statements into low level file-system commands. Thus, the storage manager is responsible for storing,
retrieving, and updating data in the database.
The storage manager components include:
• Authorization and Integrity Manager, which tests for the satisfaction of integrity constraints and checks the
authority of users to access data.
• Transaction Manager, which ensures that the database remains in a consistent (correct) state despite system
failures, and that concurrent transaction executions proceed without conflicting.
• File manager, which manages the allocation of space on disk storage and the data structures used to represent
information stored on disk.


• Buffer manager, which is responsible for fetching data from disk storage into main memory, and deciding
what data to cache in main memory. The buffer manager is a critical part of the database system, since it
enables the database to handle data sizes that are much larger than the size of main memory.
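The roles of the integrity manager and the transaction manager can be illustrated with a small sketch. It assumes SQLite through Python's standard sqlite3 module; the account table, the transfer function, and the balances are hypothetical. A check constraint stands in for an integrity constraint, and a failed transfer is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account ("
    " id integer primary key,"
    " balance integer not null check (balance >= 0))"
)
conn.executemany("insert into account values (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(connection, src, dst, amount):
    """Move amount from src to dst; either both updates happen or neither."""
    try:
        # As a context manager, the connection commits on success and
        # rolls back if an exception is raised inside the block.
        with connection:
            connection.execute(
                "update account set balance = balance + ? where id = ?",
                (amount, dst),
            )
            connection.execute(
                "update account set balance = balance - ? where id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)         # allowed
rejected = transfer(conn, 1, 2, 500)  # would make the balance negative
balances = conn.execute("select id, balance from account order by id").fetchall()
print(ok, rejected, balances)
```

In the second call the credit to account 2 has already been applied when the debit violates the constraint; the rollback undoes it, so the database stays in a consistent state.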
3. Disk Storage
The storage manager implements several data structures as part of the physical system implementation:
➢ Data files, which store the database itself.
➢ Data dictionary, which stores metadata about the structure of the database, in particular the schema of the
database.
➢ Indices, which provide fast access to data items that hold particular values.
4. Database Users and Administrators
a) Database Users and User Interfaces
There are four different types of database-system users, differentiated by the way they expect to interact with
the system. Different types of user interfaces have been designed for the different types of users.
• Naive users are unsophisticated users who interact with the system by using predefined user interfaces, such
as web or mobile applications. The typical user interface for naive users is a forms interface, where the user can
fill in appropriate fields of the form. Naive users may also view or read reports generated from the database.
As an example, consider a student who, during the class registration period, wishes to register for a class by
using a web interface. Such a user connects to a web application program that runs at a web server. The application
first verifies the identity of the user and then allows the user to access a form in which to enter the desired
information. The form information is sent back to the web application at the server, which then determines if
there is room in the class (by retrieving information from the database) and, if so, adds the student information
to the class roster in the database.
• Application programmers are computer professionals who write application programs. Application
programmers can choose from many tools to develop user interfaces.
• Sophisticated users interact with the system without writing programs. Instead, they form their requests either
using a database query language or by using tools such as data analysis software. Analysts who submit queries
to explore data in the database fall in this category.
b) Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data and the programs that
access those data. A person who has such central control over the system is called a Database Administrator
(DBA).
The functions of a DBA include:
• Schema definition: The DBA creates the original database schema by executing a set of data definition
statements in the DDL.


• Storage structure and access-method definition. The DBA may specify some parameters pertaining to the
physical organization of the data and the indices to be created.
• Schema and physical-organization modification. The DBA carries out changes to the schema and physical
organization to reflect the changing needs of the organization, or to alter the physical organization to improve
performance.
• Granting of authorization for data access. By granting different types of authorization, the database
administrator can regulate which parts of the database various users can access. The authorization information is
kept in a special system structure that the database system consults whenever a user tries to access the data in
the system.
• Routine maintenance. Examples of the database administrator’s routine maintenance activities are:
➢ Periodically backing up the database onto remote servers.
➢ Ensuring that enough free disk space is available for normal operations, and upgrading disk space as
required.
➢ Monitoring jobs running on the database.

Figure: Database Structure


To scale up to even larger data volumes and even higher processing speeds, parallel databases are designed to
run on a cluster consisting of multiple machines. Further, distributed databases allow data storage and query
processing across multiple geographically separated machines.
The following figure shows the architecture of applications that use databases as their backend. Database
applications can be partitioned into two or three parts, as shown in the figure.

Figure: Two and three Tier architecture


Earlier-generation database applications used a two-tier architecture, where the application resides at the client
machine, and invokes database system functionality at the server machine through query language statements.
In contrast, modern database applications use a three-tier architecture, where the client machine acts as merely a
front end and does not contain any direct database calls; web browsers and mobile applications are the most
commonly used application clients today. The front end communicates with an application server. The
application server, in turn, communicates with a database system to access data. Three tier applications provide
better security as well as better performance than two-tier applications.

Data Models
Underlying the structure of a database is the data model: a collection of conceptual tools for describing data,
data relationships, data semantics, and consistency constraints. A data model provides a way to describe the
design of a database at each level of data abstraction. Based on the level of abstraction, data models are
classified into the following three types:
1) Conceptual Data Model: This data model defines WHAT the system contains. This model is typically
created by business stakeholders and data architects. The purpose is to organize, scope and define business
concepts and rules. This model is used in the requirement-gathering process, i.e., before the database
designers start building a particular database. Example: E-R Model
2) Logical Data Model: This data model defines HOW the system should be implemented regardless of the DBMS.
This model is typically created by data architects and business analysts. The purpose is to develop a technical
map of rules and data structures. This data model allows us to focus primarily on the design of the
database. Example: Relational Model
3) Physical Data Model: This data model describes HOW the system will be implemented using a specific
DBMS. This model is typically created by the DBA and developers. The purpose is the actual
implementation of the database. Ultimately, all data in a database is stored physically on a secondary storage
device such as disks and tapes, in the form of files, records and certain other data structures. The physical
model records the format in which the files are stored, the structure of the database, and any external data
structures and their relation to each other.
Most common types of data models:
1. Hierarchical Data Model - The hierarchical model was the first DBMS model. This model organises
data in a hierarchical tree structure. The hierarchy starts from the root, which holds the root data, and
then expands in the form of a tree by adding child nodes to parent nodes. This model easily represents
some real-world relationships, such as the sitemap of a website. Example: We can represent the
relationship between the shoes present on a shopping website in the following way:

2. Network Model - This model is an extension of the hierarchical model. It was the most popular model
before the relational model. It is the same as the hierarchical model, except that a record can have more
than one parent, so the hierarchical tree is replaced with a graph. Example: In the example below, the
node Student has two parents, CSE Department and Library. This was not possible in the hierarchical
model.
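The difference between the two models can be sketched as a small data structure in which each record lists its parent records. The record named College is a hypothetical addition used only to give the tree a root.

```python
# Each record maps to the list of its parent records.
hierarchical_parents = {
    "College": [],                      # hypothetical root record
    "CSE Department": ["College"],
    "Student": ["CSE Department"],
}

network_parents = {
    "CSE Department": [],
    "Library": [],
    "Student": ["CSE Department", "Library"],  # two parents: needs a graph
}


def is_hierarchy(parents):
    """Return True when every record has at most one parent (a tree)."""
    return all(len(parent_list) <= 1 for parent_list in parents.values())


print(is_hierarchy(hierarchical_parents), is_hierarchy(network_parents))
```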

3. Entity Relationship Model - The Entity–Relationship (E-R) Model is a high-level data model. This
model was designed by Peter Chen and published in a 1976 paper. It is based on a perception of the real
world as a collection of basic objects, called entities, and of relationships among these
objects. While mapping a real-world scenario into the database model, the E-R model creates entity sets,
relationship sets, general attributes and constraints.


Entity − An entity in an ER Model is a real-world entity having properties called attributes.


Every attribute is defined by its set of values called domain. For example, in a school database, a student
is considered as an entity. Student has various attributes like name, age, class, etc.
Relationship − The logical association among entities is called relationship. Relationships are mapped
with entities in various ways. Example: Teacher works for a department.
Example-

In the above diagram, the entities are Teacher and Department. The attributes of the Teacher entity are
Teacher_Name, Teacher_id, Age, Salary, Mobile_Number. The attributes of the Department entity
are Dept_id, Dept_name. The two entities are connected by the relationship. Here, each teacher
works for a department.
4. Relational Model - The relational model is a lower-level model. It uses a collection of tables to
represent both data and the relationships among those data. This model was initially designed by Edgar
F. Codd in 1969. Its conceptual simplicity has led to its widespread adoption; today a vast majority of
database products are based on the relational model. Designers often formulate a database schema design
by first modelling data at a high level, using the E-R model, and then translating it into the relational
model. Example: Employee table.

Features of Relational Model
➢ Tuples: Each row in the table is called a tuple. A row contains all the information about one instance of the
object. In the above example, each row holds all the information about one specific individual; for instance,
the first row has the information about John.
➢ Attribute or field: An attribute is a property that describes the table or relation. The values of an
attribute should be from the same domain. In the above example, the employee has different attributes such as
Salary, Mobile_no, etc.
Advantages of Relational Model
➢ Simple: This model is simpler than the network and hierarchical models.
➢ Scalable: This model can be easily scaled, as we can add as many rows and columns as we want.
➢ Structural Independence: We can make changes to the database structure without changing the way the data
is accessed. When changes to the database structure do not affect the capability of the DBMS
to access the data, we can say that structural independence has been achieved.
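Structural independence can be sketched as follows: a query written before a structural change keeps working after it. The example assumes SQLite through Python's standard sqlite3 module, and the employee row uses a hypothetical salary value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employee (name text, salary integer)")
conn.execute("insert into employee values ('John', 50000)")  # sample value

QUERY = "select name, salary from employee"
before = conn.execute(QUERY).fetchall()

# Structural change: a new column is added to the table.
conn.execute("alter table employee add column mobile_no text")

# The same query text still works and returns the same result.
after = conn.execute(QUERY).fetchall()
print(before, after)
```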
Disadvantages of Relational Model
➢ Hardware Overheads: To hide the complexities and make things easier for the user, this model
requires more powerful computers and data storage devices.
➢ Bad Design: The relational model is very easy to design and use, and users do not need to know how
the data is stored in order to access it. This ease of design can lead to a poorly designed database,
which slows down as the database grows.
These disadvantages are minor compared to the advantages of the relational model, and the problems
can be avoided with proper implementation and organization.
5. Object Oriented Data Model - This is an extension of the E-R model with notions of functions,
encapsulation, and object identity. This model supports a rich type system that includes
structured and collection types. In the 1980s, various database systems following the object-oriented
approach were developed. Here, an object is the data together with its properties. We can store
audio, video, images, etc. in the database, which is awkward in the relational model (although we
can store audio and video in a relational database, it is generally advised not to). In
this model, two or more objects are connected through links. We use a link to relate one object to
other objects. Example: In the following figure, we have two objects, Employee and Department. All the
data and relationships of each object are contained in a single unit. The attributes of the employee, such
as Name and Job_title, and the methods that will be performed by that object are stored as a single object.
The two objects are connected through a common attribute, Department_id, and the communication
between the two is done with the help of this common id.

Entity Relationship Model


An E-R model is a design or blueprint of a database that can later be implemented as a database. The
Entity–Relationship (E-R) Model is a high-level data model. This model was designed by Peter Chen and published
in a 1976 paper.
The entity-relationship data model perceives the real world as consisting of basic objects, called entities, and
relationships among these objects. It was developed to facilitate database design by allowing specification of an
enterprise schema, which represents the overall logical structure of a database.
Entity
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects. For example,
each person in a university is an entity. An entity has a set of properties, and the values for some set of
properties must uniquely identify an entity. For instance, a person may have a person id property whose value
uniquely identifies that person.
Entity Set
An entity set is a set of entities of the same type that share the same properties, or attributes.
Example: The set of all people who are instructors at a given university can be defined as the entity set
instructor. Similarly, the entity set student might represent the set of all students in the university.
Attribute
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of
an entity set. The designation of an attribute for an entity set expresses that the database stores similar
information concerning each entity in the entity set; however, each entity may have its own value for each
attribute. Possible attributes of the instructor entity set are ID, name, dept name, and salary. Possible attributes
of the course entity set are course id, title, dept name, and credits.
Each entity has a value for each of its attributes. For instance, a particular instructor entity may have the value
12121 for ID, the value Wu for name, the value Finance for dept name, and the value 90000 for salary.

Figure : Example of entity and attributes


Domain of Attribute
For each attribute, there is a set of permitted values, called the domain, or value set, of that attribute. The
domain of attribute course id might be the set of all text strings of a certain length.
Similarly, the domain of attribute semester might be strings from the set {Fall, Winter, Spring, Summer}.
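One way a domain might be enforced once the design is implemented is a check constraint, sketched below. The example assumes SQLite through Python's standard sqlite3 module; the section table, the course identifier CS-101 and the rejected value Monsoon are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table section ("
    " course_id text,"
    " semester text check (semester in ('Fall', 'Winter', 'Spring', 'Summer')))"
)

# A value from the domain is accepted.
conn.execute("insert into section values ('CS-101', 'Fall')")

# A value outside the domain is rejected by the constraint.
try:
    conn.execute("insert into section values ('CS-101', 'Monsoon')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("select semester from section").fetchall()
print(rejected, stored)
```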
Attribute Types
An attribute, as used in the E-R model, can be characterized by the following attribute types.
a) Simple and Composite Attributes: Simple attributes are those that cannot be divided into subparts.
Composite attributes, on the other hand, can be divided into subparts (i.e., other attributes).
For example, an attribute student name could be structured as a composite attribute consisting of first name,
middle initial, and last name. Suppose we need to add an address to the student entity set.

The address can be defined as the composite attribute address with the attributes street, city, state, and postal
code. Composite attributes help us to group together related attributes, making the modelling cleaner. A
composite attribute may appear as a hierarchy. In the composite attribute address, its component attribute street
can be further divided into street number, street name, and apartment number. The following figure depicts these
examples of composite attributes for the student entity set.

Figure: Composite attributes student name and student address.


b) Single-valued and Multi-valued Attributes: The attributes which have a single value for a particular entity
are single valued. For instance, the student ID attribute for a specific student entity refers to only one student
ID. Such attributes are said to be single valued. There may be instances where an attribute has a set of values
for a specific entity. Suppose we add to the instructor entity set a phone number attribute. An instructor may
have zero, one, or several phone numbers, and different instructors may have different numbers of phones. This
type of attribute is said to be multi-valued.
c) Derived attributes: The value for this type of attribute can be derived from the values of other related
attributes or entities. As an example, suppose that the instructor entity set has an attribute age that indicates the
instructor’s age. If the instructor entity set also has an attribute date of birth, we can calculate age from date of
birth and the current date. Thus, age is a derived attribute. In this case, date of birth may be referred to as a base
attribute, or a stored attribute. The value of a derived attribute is not stored but is computed when required.

Figure: E-R diagram with composite, multivalued, and derived attributes


The figure above shows how composite attributes can be represented in the E-R notation. Here, a composite
attribute name with component attributes first name, middle initial, and last name replaces the simple attribute
name of instructor. As another example, suppose we were to add an address to the instructor entity set. The
address can be defined as the composite attribute address with the attributes street, city, state, and postal code.
The attribute street is itself a composite attribute whose component attributes are street number, street name,
and apartment number. The figure also illustrates a multi-valued attribute phone number, denoted by
“{phone number}”, and a derived attribute age, depicted by “age ( )”.
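The three attribute types can be sketched in code. The example below uses Python dataclasses as an illustration only; the class names, the first name, the date of birth and the phone numbers are hypothetical, and the address attribute is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Name:
    """Composite attribute: divided into component attributes."""
    first_name: str
    middle_initial: str
    last_name: str


@dataclass
class Instructor:
    id: str
    name: Name                      # composite attribute
    date_of_birth: date             # base (stored) attribute
    phone_numbers: List[str] = field(default_factory=list)  # multi-valued

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        birthday_passed = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if birthday_passed else 1)


# Hypothetical sample entity.
wu = Instructor(
    "12121", Name("Mei", "L", "Wu"), date(1980, 6, 15), ["555-0100", "555-0101"]
)
age_day_before_birthday = wu.age(date(2024, 6, 14))
age_on_birthday = wu.age(date(2024, 6, 15))
print(age_day_before_birthday, age_on_birthday)
```

Because age is computed on request from the stored date of birth and the current date, it never goes out of date.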


Relationship
A relationship is an association among several entities. For example, we can define a relationship advisor that
associates instructor Katz with student Shankar. This relationship specifies that Katz is an advisor to student
Shankar.

Figure: Relationship set advisor (only some attributes of instructor and student are shown).
Relationship Sets
A relationship set is a set of relationships of the same type.
A relationship set is represented in an E-R diagram by a diamond, which is linked via lines to a number of
different entity sets (rectangles).

Figure: E-R diagram showing relationship set advisor.


The E-R diagram in above figure shows the two entity sets instructor and student, related through a binary
relationship set advisor.

Figure: E-R diagram with a ternary relationship proj_guide.

Degree of Relationship Set


The number of entity sets that participate in a relationship set is the degree of the relationship set. A binary
relationship set is of degree 2; a ternary relationship set is of degree 3.
The relationship set advisor (in the above diagram) is an example of a binary relationship set, that is, one that
involves two entity sets (instructor and student). Most of the relationship sets in a database system are binary.
The following diagram is an example of a ternary relationship set. Here the relationship set Prescribes involves
three entity sets: Doctor, Medicine and Patient.
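As a small sketch, a relationship set can be modelled as a set of tuples, one tuple per relationship, so that the degree is simply the length of each tuple. The doctor, medicine and patient names are hypothetical.

```python
# Each relationship is a tuple with one entity from each participating set.
advisor = {("Katz", "Shankar")}                     # (instructor, student)
prescribes = {("Dr. Rao", "Paracetamol", "Anil")}   # (doctor, medicine, patient)


def degree(relationship_set):
    """Return the number of entity sets taking part in the relationship set."""
    lengths = {len(relationship) for relationship in relationship_set}
    if len(lengths) != 1:
        raise ValueError("all relationships in a set must have the same type")
    return lengths.pop()


print(degree(advisor), degree(prescribes))
```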


Mapping Cardinality (or Relationship Cardinality)


Mapping cardinalities express the number of entities to which another entity can be associated via a relationship
set, that is, the maximum number of relationship instances in which an entity can participate.
For a binary relationship set between entity sets A and B, the mapping cardinality is one of the following types:
1. Many-to-Many cardinality (m:n)
By this cardinality constraint,
• An entity in set A can be associated with any number (zero or more) of entities in set B.
• An entity in set B can be associated with any number (zero or more) of entities in set A.
Symbol Used-

Example-

Here,
• One student can enroll in any number (zero or more) of courses.
• One course can have any number (zero or more) of students enrolled.
2. Many-to-One cardinality (m:1)
By this cardinality constraint,
• An entity in set A can be associated with at most one entity in set B.
• An entity in set B can be associated with any number (zero or more) of entities in set A.
Symbol Used-

Example-

Here,
• One student can enroll in at most one course.
• One course can have any number (zero or more) of students enrolled.
3. One-to-Many cardinality (1:n)
By this cardinality constraint,
• An entity in set A can be associated with any number (zero or more) of entities in set B.
• An entity in set B can be associated with at most one entity in set A.
Symbol Used-

Example-

Here,
• One student can enroll in any number (zero or more) of courses.
• One course can have at most one student enrolled.
4. One-to-One cardinality (1:1)
By this cardinality constraint,
• An entity in set A can be associated with at most one entity in set B.
• An entity in set B can be associated with at most one entity in set A.
Symbol Used-

Example-

Here,
• One student can enroll in at most one course.
• One course can have at most one student enrolled.
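When a design is later implemented as tables, these cardinalities are commonly enforced with key constraints. The sketch below shows one possible encoding, assuming SQLite through Python's standard sqlite3 module; the table names and the try_insert helper are hypothetical, and the one-to-many case is omitted because it mirrors many-to-one with the roles swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- m:n: any (student, course) pair may appear, but only once.
    create table enrolls_many_to_many (
        student_id integer, course_id integer,
        primary key (student_id, course_id));
    -- m:1: a student appears at most once, so has at most one course.
    create table enrolls_many_to_one (
        student_id integer primary key, course_id integer);
    -- 1:1: a student and a course each appear at most once.
    create table enrolls_one_to_one (
        student_id integer primary key, course_id integer unique);
""")


def try_insert(table, student_id, course_id):
    """Return True if the table's constraints accept the pair."""
    try:
        with conn:
            conn.execute(
                f"insert into {table} values (?, ?)", (student_id, course_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = {}
for table in ("enrolls_many_to_many", "enrolls_many_to_one", "enrolls_one_to_one"):
    results[table] = [
        try_insert(table, 1, 10),  # student 1 enrolls in course 10
        try_insert(table, 2, 10),  # a second student joins course 10
        try_insert(table, 1, 20),  # student 1 enrolls in a second course
    ]
print(results)
```

The same three insertions are attempted against each table; the stricter the cardinality, the more of them are rejected.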


E-R Diagram
An Entity–relationship model (E-R model) describes the structure of a database with the help of a diagram,
which is known as Entity Relationship Diagram (E-R Diagram).
Components of E-R Diagram:-

Alternative E-R Notation

Main features of E-R Model
➢ The entity-relationship model is a high-level conceptual model.
➢ The E-R model allows us to draw a database design. It allows us to describe the data involved in a real-world
enterprise in terms of objects and their relationships. The E-R diagram is used as a visual tool for
representing the model.
➢ It is widely used to develop an initial design of a database.
➢ It provides a set of useful concepts that make it convenient for a developer to move from a basic set of
information to a detailed description of information that can be easily implemented in a database
system.
➢ It describes data as a collection of entities, relationships and attributes.
➢ It is a graphical representation of the logical structure of a database.
Advantages of E-R Model
➢ Simple: Conceptually, an E-R model is very easy to build. If we know the relationships between the
attributes and the entities, we can easily build the E-R diagram for the model.
➢ Effective Communication Tool: This model is widely used by database designers for communicating
their ideas.
➢ Easy Conversion to Other Models: This model maps well to the relational model and can be easily
converted to it by converting the E-R model into tables. It can also be converted
to other models, such as the network model or the hierarchical model.
Disadvantages of E-R Model
➢ No industry standard for notation: There is no industry standard for developing an E-R model, so one
developer might use notations that are not understood by other developers.
➢ Hidden information: Some information might be lost or hidden in the E-R model. Because it is a high-level
view, some details might not be shown.
Example of E-R Diagram
E-R Diagram of Library Management System

Extended E-R Features


1. Class Hierarchy: Class hierarchy is a method of classifying entities into subclasses.
A class hierarchy can be viewed in one of two ways:
A. Specialization (Top-Down Approach)
B. Generalization (Bottom-Up Approach)
A) Specialization: The process of designating subgroupings within an entity set is called specialization.
Specialization identifies subsets of an entity set that have distinguishing characteristics. It breaks
an entity set into multiple entity sets, going from the higher level (superclass) to the lower level (subclass).
As an example, the entity set person may be further classified as one of the following:
• Employee
• Student
The specialization of person allows us to distinguish among person entities according to whether they
correspond to employees or students: in general, a person could be an employee, a student, both, or neither.
B) Generalization: Generalization is the process of combining entity sets that share common attributes or
properties into a single higher-level entity set. The entity set that is created contains the common features.
Generalization is a bottom-up process: multiple entity sets are synthesized into a higher-level entity set on the
basis of common features. The database designer may have first identified:
Instructor entity set with attributes instructor id, instructor name, instructor salary, and rank.
Secretary entity set with attributes secretary id, secretary name, secretary salary, and hours per week.
There are similarities between the instructor entity set and the secretary entity set in the sense that they have
several attributes that are conceptually the same across the two entity sets: namely, the identifier, name, and
salary attributes. This commonality can be expressed by generalization, which is a containment relationship that
exists between a higher-level entity set and one or more lower-level entity sets. In our example, employee is the
higher level entity set and instructor and secretary are lower-level entity sets. In this case, attributes that are
conceptually the same had different names in the two lower level entity sets. To create a generalization, the
attributes must be given a common name and represented with the higher-level entity person.

Figure: Specialization and generalization example


2) Attribute Inheritance
A crucial property of the higher- and lower-level entities created by specialization and generalization is attribute
inheritance. The attributes of the higher-level entity sets are said to be inherited by the lower-level entity sets.
For example, student and employee inherit the attributes of person. Thus, student is described by its ID, name,
street, and city attributes, and additionally a tot cred (total credit) attribute; employee is described by its ID,
name, street, and city attributes, and additionally a salary attribute. Attribute inheritance applies through
all tiers of lower-level entity sets; thus, instructor and secretary, which are subclasses of employee, inherit the
attributes ID, name, street, and city from person, in addition to inheriting salary from employee, as shown in
the following figure.

Figure: Attribute Inheritance


A lower-level entity set (or subclass) also inherits participation in the relationship sets in which its higher-level
entity (or superclass) participates. Like attribute inheritance, participation inheritance applies through all tiers
of lower-level entity sets. For example, suppose the person entity set participates in a relationship person_dept
with department. Then the student, employee, instructor and secretary entity sets, which are subclasses of the
person entity set, also implicitly participate in the person_dept relationship with department. These entity sets
can participate in any relationship in which the person entity set participates.
Whether a given portion of an E-R model was arrived at by specialization or by generalization, the outcome is the same:
 A higher-level entity set with attributes and relationships that apply to all of its lower-level entity sets.
 Lower-level entity sets with distinctive features that apply only within a particular lower-level entity set.
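Attribute inheritance can be illustrated with a minimal Python sketch. The class hierarchy mirrors the person / student / employee / instructor / secretary example above; modelling entity sets as dataclasses and the helper name attribute_names are assumptions made only for this illustration, and instructor and secretary are given no attributes of their own because none are listed here.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    # Attributes of the higher-level entity set person.
    ID: str
    name: str
    street: str
    city: str


@dataclass
class Student(Person):
    # Student adds tot_cred to the inherited attributes.
    tot_cred: int


@dataclass
class Employee(Person):
    # Employee adds salary to the inherited attributes.
    salary: float


@dataclass
class Instructor(Employee):
    # Second tier: inherits from both employee and person.
    pass


@dataclass
class Secretary(Employee):
    pass


def attribute_names(entity_set):
    """Return the attribute names of an entity set, inherited ones first."""
    return [f.name for f in fields(entity_set)]


student_attrs = attribute_names(Student)
instructor_attrs = attribute_names(Instructor)
print(student_attrs)
print(instructor_attrs)
```

The inherited attributes appear for every tier, which is the behaviour described for instructor and secretary.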
3) Constraints on Specialization and Generalization
There are three types of constraints on generalization/specialization:
A) The first determines which entities can be members of a given lower-level entity set.
Such membership may be one of the following:
Condition-defined: In condition-defined lower-level entity sets, membership is evaluated on the basis of
whether or not an entity satisfies an explicit condition or predicate. For example, assume that the higher-level
entity set account has the attribute account-type. All account entities are evaluated on the defining account-type
attribute. Only those entities that satisfy the condition account-type = “savings account” are allowed to belong
to the lower-level entity set “savings account”. All entities that satisfy the condition account-type = “checking
account” are included in “checking account”. Since all the lower-level entities are evaluated on the basis of the
same attribute (in this case, on account-type), this type of generalization is said to be attribute-defined.
User-defined: User-defined lower-level entity sets are not constrained by a membership condition; rather, the
database user assigns entities to a given entity set. For instance, let us assume that, after 3 months of
employment, bank employees are assigned to one of four work teams. We therefore represent the teams as four
lower-level entity sets of the higher-level employee entity set. A given employee is not assigned to a specific
team entity automatically on the basis of an explicit defining condition. Instead, the user in charge of this
decision makes the team assignment on an individual basis. The assignment is implemented by an operation that
adds an entity to an entity set.
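The difference between the two kinds of membership can be sketched as follows. This is only an illustration: the account numbers, employee ID and team names are hypothetical, and storing entities as dictionaries is an assumption of the sketch.

```python
# Higher-level entity set "account" with the defining attribute account_type.
accounts = [
    {"account_number": "A-101", "account_type": "savings account"},
    {"account_number": "A-102", "account_type": "checking account"},
    {"account_number": "A-103", "account_type": "savings account"},
]


def members(entities, attribute, value):
    """Condition-defined membership: an entity belongs to the lower-level
    entity set exactly when it satisfies the predicate attribute == value."""
    return [e for e in entities if e[attribute] == value]


savings_accounts = members(accounts, "account_type", "savings account")
checking_accounts = members(accounts, "account_type", "checking account")

# User-defined membership: no predicate decides the team; the user in charge
# adds an employee to a team explicitly.
work_teams = {"team_1": set(), "team_2": set(), "team_3": set(), "team_4": set()}


def assign(team, employee_id):
    """The operation that adds an entity to a lower-level entity set."""
    work_teams[team].add(employee_id)


assign("team_2", "E-7")
```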
B) A second type of constraint relates to whether or not entities may belong to more than one lower-level entity
set within a single generalization. The lower-level entity sets may be one of the following:
• Disjoint. A disjointness constraint requires that an entity belong to no more than one lower-level entity set. In
our example, an account entity can satisfy only one condition for the account-type attribute; an entity can be
either a savings account or a checking account, but cannot be both.
• Overlapping. In overlapping generalizations, the same entity may belong to more than one lower-level entity
set within a single generalization. For example, consider the employee work team example, and assume that
certain managers participate in more than one work team. A given employee may therefore appear in more than
one of the team entity sets that are lower-level entity sets of employee. Thus, the generalization is overlapping.
As another example, suppose generalization applied to entity sets customer and employee leads to a higher-level
entity set person. The generalization is overlapping if an employee can also be a customer.
Lower-level entity overlap is the default case; a disjointness constraint must be placed explicitly on a
generalization (or specialization). We can note a disjointness constraint in an E-R diagram by adding the
word disjoint next to the triangle symbol.
C) A final constraint, the completeness constraint on a generalization or specialization, specifies whether or not
an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within the
generalization/specialization. This constraint may be one of the following:
• Total generalization or specialization. Each higher-level entity must belong to a lower-level entity set.
• Partial generalization or specialization. Some higher-level entities may not belong to any lower-level entity
set.
Partial generalization is the default. We can specify total generalization in an E-R diagram by using a double
line to connect the box representing the higher-level entity set to the triangle symbol. (This notation is similar to
the notation for total participation in a relationship.)
The account generalization is total: All account entities must be either a savings account or a checking account.
Because the higher-level entity set arrived at through generalization is generally composed of only those entities
in the lower-level entity sets, the completeness constraint for a generalized higher-level entity set is usually
total. When the generalization is partial, a higher-level entity is not constrained to appear in a lower-level entity
set. The work team entity sets illustrate a partial specialization. Since employees are assigned to a team only
after 3 months on the job, some employee entities may not be members of any of the lower-level team entity
sets.
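The disjointness and completeness constraints can be checked over sets of entity identifiers, as in the following sketch. The function names is_disjoint and is_total and all identifiers in the sample data are hypothetical.

```python
def is_disjoint(lower_level_sets):
    """True if no entity belongs to more than one lower-level entity set."""
    seen = set()
    for members in lower_level_sets.values():
        if seen & members:          # some entity was already seen elsewhere
            return False
        seen |= members
    return True


def is_total(higher_level_set, lower_level_sets):
    """True if every higher-level entity belongs to some lower-level set."""
    covered = set().union(*lower_level_sets.values())
    return higher_level_set <= covered


# The account generalization: disjoint and total.
accounts = {"A-101", "A-102", "A-103"}
account_kinds = {
    "savings": {"A-101", "A-103"},
    "checking": {"A-102"},
}

# The work teams: overlapping (E-2 is on two teams) and partial
# (E-3 has not yet been assigned to any team).
employees = {"E-1", "E-2", "E-3"}
teams = {
    "team_1": {"E-1", "E-2"},
    "team_2": {"E-2"},
}

print(is_disjoint(account_kinds), is_total(accounts, account_kinds))
print(is_disjoint(teams), is_total(employees, teams))
```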
4) Aggregation
One limitation of the E-R model is that it cannot express relationships among relationships. Aggregation
overcomes this limitation.
Aggregation is an abstraction in which relationship sets (along with their associated entity sets) are treated as
higher-level entity sets, and can participate in relationships.
For example, the relationship in which the Center entity offers the Course entity acts as a single entity, which is
in turn in a relationship with another entity, Visitor. In the real world, a visitor to a coaching center will not
enquire about the Course alone or about the Center alone; instead, he will enquire about both.

Figure: E-R diagram with Aggregation

Keys
Keys help us to identify any row of data in a table. In a real-world application, a table could contain thousands
of records, and some of them could hold the same values in many of their columns. Keys ensure that we can
uniquely identify a table record despite these challenges.
Keys also allow us to establish and identify relationships between tables, and help to enforce identity and
integrity in those relationships.
Types of Keys
1) Super Key- A super key is a set of one or more attributes (columns), which can uniquely identify a row in a
table.
Example- Consider an Employee table with the attributes Emp_SSN, Emp_Number and Emp_Name.

This table has the following super keys. Each of the following sets is able to uniquely identify a
row of the Employee table.
 {Emp_SSN}
 {Emp_Number}
 {Emp_SSN, Emp_Number}
 {Emp_SSN, Emp_Name}
 {Emp_SSN, Emp_Number, Emp_Name}
 {Emp_Number, Emp_Name}
All the attributes in a super key are definitely sufficient to identify each tuple uniquely in the given relation but
all of them may not be necessary.
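As a minimal sketch, the super key property can be tested against sample rows: a set of attributes is a super key for the data if no two rows agree on all of those attributes. The rows below are hypothetical and only use the attribute names from the example. Strictly, a key is a property of the schema, so a check on sample data can only show that the data does not contradict it.

```python
# Hypothetical sample rows; two employees deliberately share a name.
employee_rows = [
    {"Emp_SSN": "SSN-001", "Emp_Number": 226, "Emp_Name": "Ajeet"},
    {"Emp_SSN": "SSN-002", "Emp_Number": 227, "Emp_Name": "Steve"},
    {"Emp_SSN": "SSN-003", "Emp_Number": 228, "Emp_Name": "Ajeet"},
]


def is_super_key(rows, attributes):
    """True if the projection of rows onto attributes has no duplicates."""
    projected = [tuple(row[a] for a in attributes) for row in rows]
    return len(set(projected)) == len(projected)


print(is_super_key(employee_rows, ["Emp_SSN"]))
print(is_super_key(employee_rows, ["Emp_Number", "Emp_Name"]))
print(is_super_key(employee_rows, ["Emp_Name"]))
```

In this data {Emp_Name} alone fails because a name repeats, while any set containing Emp_SSN or Emp_Number succeeds.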
2) Candidate Key- A candidate key is a minimal super key, i.e. a super key with no redundant attributes. The
following two sets are chosen from the super keys above because they contain no redundant attributes.
 {Emp_SSN}
 {Emp_Number}
Only these two sets are candidate keys as all other sets are having redundant attributes that are not necessary for
unique identification.
 All the attributes in a candidate key are sufficient as well as necessary to identify each tuple uniquely.
 Removing any attribute from the candidate key fails in identifying each tuple uniquely.
 The value of candidate key must always be unique.
 The value of candidate key can never be NULL.
 It is possible to have multiple candidate keys in a relation.
 The attributes that appear in some candidate key are called prime attributes.
 All candidate keys are super keys, but not all super keys are candidate keys.
 Adding zero or more attributes to a candidate key generates a super key.
 The maximum possible number of candidate keys in a relation with n attributes is nC(floor(n/2)); for example,
a relation with 5 attributes, R(A,B,C,D,E), can have at most 5C(floor(5/2)) = 5C2 = 10 candidate keys.
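The following sketch finds the candidate keys of a small table by testing attribute subsets in increasing size and skipping any subset that already contains a smaller key. The sample rows and function names are hypothetical. The last function evaluates the nC(floor(n/2)) bound using Python's math.comb.

```python
from itertools import combinations
from math import comb

# Hypothetical sample rows; two employees deliberately share a name.
employee_rows = [
    {"Emp_SSN": "SSN-001", "Emp_Number": 226, "Emp_Name": "Ajeet"},
    {"Emp_SSN": "SSN-002", "Emp_Number": 227, "Emp_Name": "Steve"},
    {"Emp_SSN": "SSN-003", "Emp_Number": 228, "Emp_Name": "Ajeet"},
]


def is_super_key(rows, attributes):
    """True if no two rows agree on all of the given attributes."""
    projected = [tuple(row[a] for a in attributes) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows, attributes):
    """Return the minimal super keys, smallest subsets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            # A subset that contains an already-found key is not minimal.
            if any(set(key) <= set(subset) for key in keys):
                continue
            if is_super_key(rows, subset):
                keys.append(subset)
    return keys


def max_candidate_keys(n):
    """Upper bound on the number of candidate keys for n attributes."""
    return comb(n, n // 2)


found = candidate_keys(employee_rows, ["Emp_SSN", "Emp_Number", "Emp_Name"])
print(found)
print(max_candidate_keys(5))
```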
3) Primary Key- The primary key is selected from one of the candidate keys and becomes the identifying key
of a table. It can uniquely identify any data row of the table. A primary key is a candidate key that the database
designer selects while designing the database.
In the above example, either {Emp_SSN} or {Emp_Number} can be chosen as a primary key for the table
Employee.
 The value of primary key can never be NULL.
 The value of primary key must always be unique.
 The value of a primary key should not be changed once it is assigned, i.e. it is treated as non-updatable.
 The value of primary key must be assigned when inserting a record.
 A relation is allowed to have only one primary key.
 A primary key need not be a single attribute (column). It can be a set of more than one attribute
(column).

4) Alternate Key- Out of all the candidate keys, only one is selected as the primary key; the remaining candidate
keys, which are left unused after the primary key is chosen, are called alternate keys. Alternate keys are also
known as secondary keys.
For example- In the above case, if {Emp_Number} is selected as the primary key, then {Emp_SSN} is an alternate
key.
5) Foreign Key- A foreign key is an attribute (or set of attributes) in one table that refers to the primary key of
another table. Hence, the foreign key is useful in linking two tables together. Data should be entered in the
foreign key column with great care, as wrongly entered data can invalidate the relationship between the two
tables. An attribute ‘X’ in a table is called a foreign key to an attribute ‘Y’ in another table when its values are
dependent on the values of attribute ‘Y’: the attribute ‘X’ can assume only those values which are assumed by
the attribute ‘Y’. Here, the relation in which attribute ‘Y’ is present is called the referenced relation, and the
attribute ‘Y’ is called the referenced attribute. The relation in which attribute ‘X’ is present is called the
referencing relation, and the attribute ‘X’ is called the referencing attribute.
Example- STUD_NO in STUDENT_COURSE is a foreign key to STUD_NO in STUDENT relation shown
below.

 A foreign key references the primary key of the referenced table.
 Foreign key can take only those values which are present in the primary key of the referenced relation.
 Foreign key may have a name other than that of a primary key.
 Foreign key can take the NULL value.
 There is no restriction on a foreign key to be unique. In fact, foreign key is not unique most of the time
(STUD_NO in STUDENT_COURSE relation is not unique. It has been repeated for the first and third tuples.
However, the STUD_NO in STUDENT relation is a primary key and it needs to be always unique, and it
cannot be null.).
 Referenced relation may also be called as the master table or primary table.
 Referencing relation may also be called as the foreign table.
 Foreign keys help to maintain data and referential integrity.
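The properties listed above can be tried out with Python's built-in sqlite3 module, as in the sketch below. Only the table names and STUD_NO come from the example; the other columns and all data values are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enable foreign key enforcement

# Referenced relation (master table).
conn.execute(
    "CREATE TABLE STUDENT (STUD_NO INTEGER PRIMARY KEY, STUD_NAME TEXT)"
)
# Referencing relation (foreign table).
conn.execute(
    """CREATE TABLE STUDENT_COURSE (
           STUD_NO   INTEGER REFERENCES STUDENT(STUD_NO),
           COURSE_NO TEXT
       )"""
)

conn.execute("INSERT INTO STUDENT VALUES (1, 'RAM')")
conn.execute("INSERT INTO STUDENT VALUES (2, 'RAMESH')")

# A foreign key value may repeat and may be NULL.
conn.execute("INSERT INTO STUDENT_COURSE VALUES (1, 'C1')")
conn.execute("INSERT INTO STUDENT_COURSE VALUES (1, 'C2')")
conn.execute("INSERT INTO STUDENT_COURSE VALUES (NULL, 'C3')")

# A value that is absent from the referenced primary key is rejected.
try:
    conn.execute("INSERT INTO STUDENT_COURSE VALUES (9, 'C1')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM STUDENT_COURSE").fetchone()[0]
print(rejected, row_count)
```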
6) Composite Key- A primary key comprising multiple attributes, and not just a single attribute, is called a
Composite Key.
Example-

None of these columns alone can play the role of a key in this table. Column cust_id alone cannot be a key, as
the same customer can place multiple orders, so the same customer can have multiple entries. Column
order_id alone cannot be a primary key, as the same order can contain multiple products, so the same
order_id can be present multiple times. Column product_code cannot be a primary key, as more than one
customer can place an order for the same product. Column product_count alone cannot be a primary key, because
two orders can be placed for the same product count. Based on this, it is safe to assume that the key should
have more than one attribute:
Key in above table: {cust_id, product_code}. This is a composite key as it is made up of more than one
attribute.
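A composite key can be declared in SQL as a table-level PRIMARY KEY over several columns. The sketch below uses Python's sqlite3 module; the table name sales and all inserted values are hypothetical, while the column names and the key {cust_id, product_code} follow the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sales (
           cust_id       INTEGER NOT NULL,
           order_id      INTEGER NOT NULL,
           product_code  TEXT    NOT NULL,
           product_count INTEGER,
           PRIMARY KEY (cust_id, product_code)   -- composite key
       )"""
)

# Repeating cust_id alone or product_code alone is allowed.
conn.execute("INSERT INTO sales VALUES (1001, 1, 'P007', 23)")
conn.execute("INSERT INTO sales VALUES (1001, 2, 'P010', 5)")
conn.execute("INSERT INTO sales VALUES (1002, 3, 'P007', 23)")

# Repeating the same (cust_id, product_code) combination is rejected.
try:
    conn.execute("INSERT INTO sales VALUES (1001, 4, 'P007', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

sales_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(duplicate_rejected, sales_rows)
```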
7) Partial Key- Partial key is a key using which all the records of the table can’t be identified uniquely.
However, a bunch of related tuples can be selected from the table using the partial key.
Example-

Here, using partial key Emp_no, we can’t identify a tuple uniquely but we can select a bunch of tuples from the
table.
8) Unique Key- Unique key is a key with the following properties-
 It is unique for all the records of the table.
 Once assigned, its value can’t be changed i.e. it is non-updatable.
 It may have a NULL value.
Example- A good example of a unique key is the Aadhaar Card Number.
 The Aadhaar Card Number is unique for all the citizens (tuples) of India (table).
 If the card gets lost and a duplicate copy is issued, the duplicate copy always has the same number
as before.
 Thus, it is non-updatable.
 A few citizens may not have got their Aadhaar cards, so for them its value is NULL.
9) Surrogate Key-Surrogate key is a key with the following properties-


 It is unique for all the records of the table.
 It is updatable.
 It can’t be NULL i.e. it must have some value.
Example- Mobile Number of students in a class where every student owns a mobile number.

Key Constraints
Constraints are the rules that must be followed while entering data into the columns of a database
table. Constraints ensure that the data entered by the user into a column satisfies the specified
condition.
For example, we may want to allow only unique IDs in the employee table, or only ages under 18 in the
student table.
Types of Key Constraints
1) NOT NULL: It ensures that the specified column doesn’t contain a NULL value. NULL represents a missing or
unknown value, for instance where the data for that record is optional. Once NOT NULL is applied to a
particular column, we cannot enter NULL values in that column; every row must hold a proper value other than
NULL.

2) UNIQUE: It ensures that all values in the specified column are distinct. Sometimes we need to maintain only
unique data in a column of a database table; this is possible by using a unique constraint.
3) DEFAULT: It provides a default value for a column if none is specified. When a column is declared with a
default value, every row that does not supply a value for that column uses the default, so we need not enter
that value each time. The default can be overridden by giving an explicit value when inserting a row, based on
the requirement.
4) CHECK: It checks a predefined condition before data is inserted into the table. Suppose we want to give
access to an application only if the age entered by the user is greater than 18; this is done at the back end by
using a check constraint. A check constraint ensures that the data entered by the user for that column is within
the specified range or set of possible values.
5) PRIMARY KEY: A primary key constraint designates one or more columns of a table as the primary key, which
uniquely identifies each row of the table. Primary keys must contain UNIQUE values, and cannot contain NULL
values.

6) FOREIGN KEY: It ensures referential integrity of the relationship. The foreign key constraint is placed on a
column or list of columns which refers to the primary key column(s) of another table. Its main purpose is to
allow in the present table only those values that match the primary key column of the other table.
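The first five constraints can be seen in action with Python's built-in sqlite3 module. This is a sketch under assumptions: the table app_user, its columns and all data values are hypothetical, and the CHECK condition uses the "age greater than 18" rule from the text. Each violating INSERT raises sqlite3.IntegrityError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE app_user (
           user_id INTEGER PRIMARY KEY,          -- PRIMARY KEY
           name    TEXT NOT NULL,                -- NOT NULL
           email   TEXT UNIQUE,                  -- UNIQUE
           country TEXT DEFAULT 'India',         -- DEFAULT
           age     INTEGER CHECK (age > 18)      -- CHECK
       )"""
)


def violates(sql):
    """Run an INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


conn.execute(
    "INSERT INTO app_user (user_id, name, email, age) "
    "VALUES (1, 'Ravi', 'ravi@example.com', 25)"
)
# country was not supplied, so the default value is stored.
default_country = conn.execute(
    "SELECT country FROM app_user WHERE user_id = 1"
).fetchone()[0]

not_null_violation = violates(
    "INSERT INTO app_user (user_id, name, age) VALUES (2, NULL, 30)"
)
unique_violation = violates(
    "INSERT INTO app_user (user_id, name, email, age) "
    "VALUES (3, 'Ajit', 'ravi@example.com', 30)"
)
check_violation = violates(
    "INSERT INTO app_user (user_id, name, email, age) "
    "VALUES (4, 'Mina', 'mina@example.com', 16)"
)
primary_key_violation = violates(
    "INSERT INTO app_user (user_id, name, email, age) "
    "VALUES (1, 'Copy', 'copy@example.com', 40)"
)
```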

Participation Constraint
Participation Constraint specifies whether the existence of an entity depends on its being related to another
entity through a relationship type. It is also called the minimum cardinality constraint, because it specifies the
minimum number of relationship instances in which each entity must participate. There are two types of
participation constraint –
1) Total Participation
Each entity in the entity set is involved in at least one relationship in the relationship set, i.e. the number of
relationships in which every entity is involved is greater than 0.
For Example-

2) Partial Participation
An entity in the entity set may or may not participate in a relationship in the relationship set.
For example:
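As a minimal sketch (the entity names and data below are hypothetical and do not reproduce the figures), both kinds of participation can be checked over a relationship set stored as pairs:

```python
def participation(entity_set, relationship_set, position):
    """Classify the participation of entity_set in relationship_set.

    position is the index (0 or 1) at which the entity appears in each pair.
    """
    participating = {pair[position] for pair in relationship_set}
    return "total" if entity_set <= participating else "partial"


customers = {"C-1", "C-2", "C-3"}
loans = {"L-17", "L-23"}
# borrower relates customers to loans as (customer, loan) pairs.
borrower = {("C-1", "L-17"), ("C-2", "L-17"), ("C-1", "L-23")}

loan_participation = participation(loans, borrower, 1)
customer_participation = participation(customers, borrower, 0)
print(loan_participation, customer_participation)
```

Here every loan has a borrower, so loan participates totally, while customer C-3 has no loan, so customer participates only partially.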

Weak and Strong Entity and Entity Sets


An entity type that has a key attribute, which uniquely identifies each entity in the entity set, is known as a
Strong Entity. There also exist entity types for which a key attribute cannot be defined; these are called
Weak Entities.
The entity sets which do not have sufficient attributes to form a primary key are known as Weak Entity Sets,
and the entity sets which have a primary key are known as Strong Entity Sets.
As weak entities do not have a primary key, they cannot be identified on their own, so they depend on
some other entity (known as the owner entity). A weak entity has a total participation constraint (existence
dependency) in its identifying relationship with the owner entity. Weak entity types have partial keys. A partial
key is a set of attributes with the help of which the weak entities that belong to the same owner entity can be
distinguished and identified.
A weak entity always has total participation, but a strong entity may not. A weak entity depends on a strong
entity for its existence. Unlike a strong entity, a weak entity does not have a primary key; it has a partial
(discriminator) key. A weak entity is represented by a double rectangle. The relationship between one strong
and one weak entity is represented by a double diamond.
Example

Example for weak entity


o In the ER diagram, we have two entities Employee and Dependents.
o Employee is a strong entity because it has a primary key attribute called Employee number
(Employee_No) which is capable of uniquely identifying all the employee.
o Unlike Employee, Dependents is weak entity because it does not have any primary key.
o D_Name along with the Employee_No can uniquely identify the records of Dependents. So here
D_Name (Dependent's Name) is the partial key.
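One common way to realise this weak entity in tables is sketched below with Python's sqlite3 module: the key of Dependents combines the owner's Employee_No with the partial key D_Name, and a dependent is removed when its owner is removed. The column E_Name, the ON DELETE CASCADE choice and all data values are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enable foreign key enforcement

# Strong entity: identified by its own primary key.
conn.execute(
    "CREATE TABLE Employee (Employee_No INTEGER PRIMARY KEY, E_Name TEXT)"
)
# Weak entity: identified by the owner's key plus the partial key D_Name.
conn.execute(
    """CREATE TABLE Dependents (
           Employee_No INTEGER NOT NULL
               REFERENCES Employee(Employee_No) ON DELETE CASCADE,
           D_Name      TEXT NOT NULL,
           PRIMARY KEY (Employee_No, D_Name)
       )"""
)

conn.execute("INSERT INTO Employee VALUES (1, 'Ravi')")
conn.execute("INSERT INTO Employee VALUES (2, 'Ajit')")
# The same D_Name may occur under different owners.
conn.execute("INSERT INTO Dependents VALUES (1, 'Asha')")
conn.execute("INSERT INTO Dependents VALUES (2, 'Asha')")

# Existence dependency: deleting the owner removes its dependents.
conn.execute("DELETE FROM Employee WHERE Employee_No = 1")
remaining = conn.execute(
    "SELECT Employee_No, D_Name FROM Dependents"
).fetchall()
print(remaining)
```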

Difference between Strong and Weak Entity


S.No. | Strong Entity | Weak Entity
1. | Strong entity always has a primary key. | Weak entity has a partial (discriminator) key.
2. | Strong entity is not dependent on any other entity. | Weak entity is dependent on a strong entity.
3. | Strong entity is represented by a single rectangle. | Weak entity is represented by a double rectangle.
4. | The relationship between two strong entities is represented by a single diamond. | The relationship between one strong and one weak entity is represented by a double diamond.
5. | Strong entity may or may not have total participation. | Weak entity always has total participation.

Conceptual Database Design with E-R Model (High Level Conceptual Data Model)
Following figure shows a simplified overview of the Database Design Process-

Figure: Main Phases of Database Design


1) The first step shown is requirements collection and analysis. During this step, the database designers
interview prospective database users to understand and document their data requirements. The result of this
step is a concisely written set of users’ requirements. These requirements should be specified in as detailed and
complete a form as possible. In parallel with specifying the data requirements, it is useful to specify the known
functional requirements of the application. These consist of the user defined operations (or transactions) that
will be applied to the database, including both retrievals and updates. In software design, it is common to use
data flow diagrams, sequence diagrams, scenarios, and other techniques to specify functional requirements.
2) Once the requirements have been collected and analysed, the next step is to create a conceptual schema for
the database, using a high-level conceptual data model i.e. E-R Model. This step is called Conceptual Design.
The conceptual schema is a concise description of the data requirements of the users and includes detailed
descriptions of the entity types, relationships, and constraints; these are expressed using the concepts provided
by the high-level data model. Because these concepts do not include implementation details, they are usually
easier to understand and can be used to communicate with nontechnical users. The high-level conceptual
schema can also be used as a reference to ensure that all users’ data requirements are met and that the
requirements do not conflict. This approach enables database designers to concentrate on specifying the
properties of the data, without being concerned with storage and implementation details, which makes it
easier to create a good conceptual database design.
During or after the conceptual schema design, the basic data model operations can be used to specify the high-
level user queries and operations identified during functional analysis. This also serves to confirm that the
conceptual schema meets all the identified functional requirements. Modifications to the conceptual schema can
be introduced if some functional requirements cannot be specified using the initial schema.
3) The next step in database design is the actual implementation of the database, using a commercial DBMS.
Most current commercial DBMSs use an implementation data model—such as the relational (SQL) model—so
the conceptual schema is transformed from the high-level data model into the implementation data model.
This step is called logical design or data model mapping; its result is a database schema in the implementation
data model of the DBMS. Data model mapping is often automated or semi-automated within the database
design tools.
4) The last step is the physical design phase, during which the internal storage structures, file organizations,
indexes, access paths, and physical design parameters for the database files are specified. In parallel with these
activities, application programs are designed and implemented as database transactions corresponding to the
high-level transaction specifications.
Example-
Database Design for Banking Enterprise
1) Data Requirements
The initial specification of user requirements may be based on interviews with the database users, and on the
designer’s own analysis of the enterprise. The description that arises from this phase serves as the basis for
specifying the conceptual structure of the database. The major characteristics of the banking enterprise-
 The bank is organized into branches. Each branch is located in a particular city and is identified by a unique
name. The bank monitors the assets of each branch.
 Bank customers are identified by their customer-id values. The bank stores each customer’s name, and the
street and city where the customer lives. Customers may have accounts and can take out loans. A customer
may be associated with a particular banker, who may act as a loan officer or personal banker for that customer.
 Bank employees are identified by their employee-id values. The bank administration stores the name and
telephone number of each employee, the names of the employee’s dependents, and the employee-id number of
the employee’s manager. The bank also keeps track of the employee’s start date and, thus, length of
employment.
 The bank offers two types of accounts—savings and checking accounts. Accounts can be held by more than
one customer, and a customer can have more than one account. Each account is assigned a unique account
number. The bank maintains a record of each account’s balance, and the most recent date on which the
account was accessed by each customer holding the account. In addition, each savings account has an interest
rate, and overdrafts are recorded for each checking account.
 A loan originates at a particular branch and can be held by one or more customers. A loan is identified by a
unique loan number. For each loan, the bank keeps track of the loan amount and the loan payments. Although
a loan payment number does not uniquely identify a particular payment among those for all the bank’s loans, a
payment number does identify a particular payment for a specific loan. The date and amount are recorded for
each payment.
2) Entity Set Designation
Our specification of data requirements serves as the starting point for constructing a conceptual schema for the
database. From the characteristics listed in previous step, we begin to identify entity sets and their attributes:
 The branch entity set, with attributes branch-name, branch-city, and assets.
 The customer entity set, with attributes customer-id, customer-name, customer-street, and customer-city.
A possible additional attribute is banker-name.
 The employee entity set, with attributes employee-id, employee-name, telephone-number, salary, and
manager. Additional descriptive features are the multivalued attribute dependent-name, the base
attribute start-date, and the derived attribute employment-length.
 Two account entity sets—savings-account and checking-account—with the common attributes of
account-number and balance; in addition, savings-account has the attribute interest-rate and checking-
account has the attribute overdraft-amount.
 The loan entity set, with the attributes loan-number, amount, and originating-branch.
 The weak entity set loan-payment, with attributes payment-number, payment-date, and payment-amount.
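The distinction between the base attribute start-date, the derived attribute employment-length and the multivalued attribute dependent-name can be sketched as follows. The class layout, the choice of whole years as the unit and the sample values are assumptions of this illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Employee:
    employee_id: str
    employee_name: str
    start_date: date                                       # base attribute
    dependent_names: list = field(default_factory=list)    # multivalued

    def employment_length(self, today):
        """Derived attribute: completed years since start_date.

        It is computed on demand from the base attribute, not stored.
        """
        years = today.year - self.start_date.year
        # Subtract one if this year's anniversary has not been reached yet.
        if (today.month, today.day) < (self.start_date.month, self.start_date.day):
            years -= 1
        return years


emp = Employee("E-1", "Ravi", date(2015, 6, 15), ["Asha", "Mohan"])
print(emp.employment_length(date(2020, 6, 14)))
print(emp.employment_length(date(2020, 6, 15)))
```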
3) Relationship Set Designation
Now we will specify the following relationship sets and mapping cardinalities. In the process, we also refine
some of the decisions we made earlier regarding attributes of entity sets.
 borrower, a many-to-many relationship set between customer and loan.
 loan-branch, a many-to-one relationship set that indicates in which branch a loan originated. Note that
this relationship set replaces the attribute originating-branch of the entity set loan.
 loan-payment, a one-to-many relationship from loan to payment, which documents that a payment is
made on a loan.
 depositor, with relationship attribute access-date, a many-to-many relationship set between customer and
account, indicating that a customer owns an account.
 cust-banker, with relationship attribute type, a many-to-one relationship set expressing that a customer
can be advised by a bank employee, and that a bank employee can advise one or more customers. Note
that this relationship set has replaced the attribute banker-name of the entity set customer.
 works-for, a relationship set between employee entities with role indicators manager and worker; the
mapping cardinalities express that an employee works for only one manager and that a manager
supervises one or more employees. Note that this relationship set has replaced the manager attribute of
employee.
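The mapping cardinalities named above can be checked on sample data when a relationship set is stored as a set of pairs. In this sketch the function name is_many_to_one and all loan numbers, branch names and customer IDs are hypothetical.

```python
from collections import Counter


def is_many_to_one(relationship):
    """True if every left-hand entity is related to exactly one right-hand
    entity (several left-hand entities may share the same right-hand one)."""
    counts = Counter(left for left, _ in relationship)
    return all(count == 1 for count in counts.values())


# loan-branch: each loan originates at one branch (many-to-one).
loan_branch = {("L-17", "Downtown"), ("L-23", "Redwood"), ("L-15", "Downtown")}

# borrower: a customer may hold several loans and a loan may have several
# customers (many-to-many), stored as (customer, loan) pairs.
borrower = {("C-1", "L-17"), ("C-2", "L-17"), ("C-1", "L-23")}
borrower_reversed = {(loan, customer) for customer, loan in borrower}

print(is_many_to_one(loan_branch))
print(is_many_to_one(borrower), is_many_to_one(borrower_reversed))
```

A relationship set that fails the test in both directions, as borrower does here, is many-to-many.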
4) E-R Diagram
Drawing on the discussions in the previous steps, we now present the completed E-R diagram for our example
banking enterprise. The following figure depicts the full representation of a conceptual model of a bank,
expressed in terms of E-R concepts. The diagram includes the entity sets, attributes, relationship sets, and
mapping cardinalities arrived at through the design processes of steps 1 and 2, and refined in step 3.

Figure: E-R diagram for a banking enterprise.

Entity v/s Attribute

Entity v/s Relationship

Example-

In the above diagram, Lecturer, Course and Student are entities. They are also called strong entities as they do
not depend on other entities. The Lecturer entity has the attributes id, name, and specialty. The Course entity
has the attributes course_id and course_name. The Student entity has the attributes id and name. The Exam
entity depends on the Course entity. Therefore, Exam is a weak entity. It has the attributes name, date,
starting_time and duration.
In the above diagram, follows, conducts and has are relationships of cardinality n:m, 1:m and 1:1 respectively.

Binary v/s Ternary Relationship


Binary Relationship
A binary relationship is one in which two entity types participate; we say that its degree is 2. This is the most
common degree of relationship. Such relationships are easy to deal with, as they can be easily converted into
relational tables.
For example, we have two entity types ‘Customer’ and ‘Account’, where each ‘Customer’ has an ‘Account’
which stores the account details of the ‘Customer’. Since two entity types participate, we call it a
binary relationship. Also, one ‘Customer’ can have many ‘Account’s, but each ‘Account’ should belong to only
one ‘Customer’, so it is a one-to-many binary relationship.
Ternary Relationship
A ternary relationship exists when exactly three entity types participate. When such a relationship is
present we say that the degree is 3. As the number of entity types in a relationship increases, it becomes more
complex to convert the relationship into relational tables.
For example, we have three entity types ‘Employee’, ‘Department’ and ‘Location’. The relationship between
these entities is defined as: an employee works in a department, and an employee works at a particular location. So,
we can see that three entities participate in the relationship, so it is a ternary relationship. The degree of this
relationship is 3.

Aggregation v/s Ternary Relationship


Aggregation
Aggregation is used when we have to model a relationship involving (entity sets and) a relationship set.
Aggregation allows a relationship set to be treated as an entity set for purposes of participation in (other)
relationships. In other words, the relationship between two entities is treated as a single entity.
Example-

Figure : Aggregation
In the diagram above, the relationship between Center and Course together acts as an entity, which is in a
relationship with another entity, Visitor. In the real world, if a Visitor or a Student visits a Coaching Center,
he/she will never enquire about the center only or just about the course; rather, he/she will enquire about
both.
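The idea can be sketched in Python by letting an enquiry refer to a whole (center, course) pair of the offers relationship set rather than to a center or a course on its own. The relationship names offers and enquiries, the function enquire and all data values are hypothetical.

```python
# The relationship set between Center and Course, as (center, course) pairs.
offers = {
    ("Center A", "DBMS"),
    ("Center A", "Networks"),
    ("Center B", "DBMS"),
}

# The relationship between Visitor and the aggregated offers relationship.
enquiries = set()


def enquire(visitor, center, course):
    """Relate a visitor to an existing (center, course) pair as one unit."""
    offer = (center, course)
    if offer not in offers:
        raise ValueError(f"{center} does not offer {course}")
    enquiries.add((visitor, offer))


enquire("Ravi", "Center A", "DBMS")

# An enquiry about a pair that is not in offers is refused.
try:
    enquire("Ravi", "Center B", "Networks")
    refused = False
except ValueError:
    refused = True

print(enquiries, refused)
```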
Ternary Relationship
A ternary relationship is a relationship of degree three, that is, a relationship that contains three participating
entities. Cardinalities for ternary relationships can take the form of 1:1:1, 1:1:M, 1:M:N or M:N:P. The
cardinality constraint of an entity in a ternary relationship is defined with respect to a pair of entity instances,
one from each of the other two entities. For example, in a ternary relationship R(X, Y, Z) of cardinality M:N:1,
for each pair of (X, Y) there is only one instance of Z; for each pair of (X, Z) there are N instances of Y;
for each pair of (Y, Z) there are M instances of X. For example, note the relationships (and their consequences)
in the following Figure which are represented by the following business rules:
• A DOCTOR writes one or more PRESCRIPTIONs.
• A PATIENT may receive one or more PRESCRIPTIONs.


• A DRUG may appear in one or more PRESCRIPTIONs. (To simplify this example, assume that the business
rule states that each prescription contains only one drug. In short, if a doctor prescribes more than one drug, a
separate prescription must be written for each drug.)
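Returning to the M:N:1 example R(X, Y, Z) described earlier, the rule that each (X, Y) pair is associated with only one Z can be checked over a set of triples. This is a sketch only: the function name and the sample triples are hypothetical and are not derived from the prescription example.

```python
def z_is_unique_per_xy(triples):
    """True if every (x, y) pair is associated with a single z value."""
    seen = {}
    for x, y, z in triples:
        # setdefault stores z the first time (x, y) is met and returns the
        # stored value afterwards, so a different z exposes a violation.
        if seen.setdefault((x, y), z) != z:
            return False
    return True


valid = [
    ("x1", "y1", "z1"),
    ("x1", "y2", "z1"),   # the same z may serve many (x, y) pairs
    ("x2", "y1", "z2"),
]
invalid = valid + [("x1", "y1", "z2")]   # (x1, y1) now has two z values

print(z_is_unique_per_xy(valid), z_is_unique_per_xy(invalid))
```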

Figure: Ternary Relationship

Conceptual Design for a Large Enterprise


 The process of conceptual design consists of more than just describing small fragments of the
application in terms of ER diagrams.
 For a large enterprise, the design may require the efforts of more than one designer and span data and
application code used by a number of user groups.
 An important aspect of the design process is the methodology used to structure the development of the
overall design and to ensure that the design takes into account all user requirements and is consistent.
 The usual approach is that the requirements of various user groups are considered, any conflicting
requirements are somehow resolved, and a single set of global requirements is generated at the end of
the requirements analysis phase.
 An alternative approach is to develop separate conceptual schemas for different user groups and to then
integrate these conceptual schemas.
 To integrate multiple conceptual schemas, we must establish correspondences between entities,
relationships, and attributes, and we must resolve numerous kinds of conflicts.
