DBMS Notes Unit 1
(BCS501)
UNIT -1
Table of Contents
1. Introduction
Data and Information
Data
Information
Every organization works around five resources, generally referred to as the 5Ms: Man, Machine, Material, Money, and Methods.
Here, Man refers to the human resources or employees of the organization, Machine refers to the machinery used to perform various tasks, Material refers to the raw materials and the finished goods produced, Money refers to the finances available in various forms, and Methods refers to the way the organization works.
If an organization does not have accurate information about these five resources, it cannot work efficiently; that is why information is called the sixth resource.
Information about these resources cannot be generated until we have the relevant data about them: how many people we have in the organization, what their expertise is, how much money we have to invest, what kinds of machines we have, and how a particular task is accomplished.
Data is the raw material used to produce information, which is the most valuable input for effective decision-making in organizations today. It is generally kept under high security, just like cash.
Data can be managed in various formats depending on the type of data, the level of security required (high, medium, or low), the type of access required (such as web, mobile, or desktop), and some other factors.
This chapter introduces data, information, and data management techniques that use specialized systems called Relational Database Management Systems (RDBMS).
Data
Data are facts that can be recorded in the form of text, numbers, video, audio, or images.
For example:
Roll number, student name, course name, branch name, date of birth, and admission date are facts about a student.
Book number, book title, author name, publisher name, purchase date, and price are facts about a book in a library.
Location, date, and time are facts managed by Google Maps while we move from one place to another.
Videos on YouTube are facts posted by a diverse set of users for a diverse set of viewers.
Each individual fact is technically called a datum; data is the plural of datum.
The collection of related data fields for a particular individual or item is called a data record. For example:
Roll number   Student name   Course name   Branch name   Date of birth   Admission date
101           Ram            B.Tech        CSE           12-May-2001     03-Jun-2020
102           Shyam          MCA           NA            16-Apr-2002     04-Apr-2021
Figure 1.1: Representation of Data Record
In Figure 1.1, the roll number, student name, course name, branch name, date of birth, and admission date of one student, say Ram, together form a data record.
Information
Data in itself has very limited utility when it comes to decision-making. Data needs to be
processed using a set of procedures to convert it into information. This information can
now be used for the desired purpose. Information is always need-based or purpose-specific.
For example:
Roll number, student name, course name, branch name, date of birth, and admission date are facts about a student; this is "data".
If we process this data to find the number of students in a particular branch, or the number of students below the age of 18 years, the result is “information”.
Information is specific to the user's need. Something that is information for one user can be just a piece of data for another, and vice versa.
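The step from data to information can be sketched in a few lines of Python. This is only an illustration: the third student record, the function names, and the fixed reference date are assumptions added here, not part of the notes.

```python
from collections import Counter
from datetime import date

# Raw data: one record per student. Roll number 103 is a made-up record
# added only so that the age query has something to find.
students = [
    {"roll_no": 101, "name": "Ram", "branch": "CSE", "date_of_birth": date(2001, 5, 12)},
    {"roll_no": 102, "name": "Shyam", "branch": None, "date_of_birth": date(2002, 4, 16)},
    {"roll_no": 103, "name": "Sita", "branch": "CSE", "date_of_birth": date(2004, 9, 30)},
]


def age_on(birth, today):
    """Completed years between the date of birth and the given day."""
    before_birthday = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - before_birthday


def students_per_branch(records):
    """Information: how many students are there in each branch?"""
    return Counter(r["branch"] for r in records if r["branch"] is not None)


def students_below_age(records, limit, today):
    """Information: which students are younger than `limit` years on `today`?"""
    return [r["name"] for r in records if age_on(r["date_of_birth"], today) < limit]


reference_day = date(2021, 1, 1)  # fixed day so that the result is reproducible
branch_counts = students_per_branch(students)
minors = students_below_age(students, 18, reference_day)
```

The list `students` is the data; `branch_counts` and `minors` are information, because each one answers a specific question asked by a specific user.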
If a student scores more than 90 marks in a specific course, that can be an indicator of the student's area of interest. This represents “knowledge”.
An intelligent system can be built using the acquired knowledge to recommend jobs to students according to their areas of interest. This represents “intelligence”.
Therefore, data plays a very important role in building intelligent systems. We cannot build intelligent systems until we have a correct and comprehensive set of data.
Managing Data
We often need to access and re-sort data for various uses. These may include:
o Creating mailing lists
o Writing management reports
o Generating lists of selected news stories
o Identifying various client needs
To manipulate the data, we need to store the data in well-defined structures. These
requirements, such as persistence of data, operations on data, etc., prompted the
development of database systems.
Database
A database is an organized collection of related data. Here, organized means that all data is described and associated with other data. All data in a database are related to each other; separate databases should be created to manage unrelated data.
A database is integral to any company or organization. This is because the database stores
all the pertinent details about the company, such as employee records, business
transactional records, salary details, inventory, etc.
A Database Management System (DBMS) is the software used to manage the database. A DBMS allows users to carry out the following tasks:
Data Definition: It helps in the creation, modification, and removal of the definitions that describe the organization of data in the database.
Data Updation: It helps in the insertion, modification, and deletion of the actual data in the database.
Data Retrieval: It helps in the retrieval of data from the database, which can be used by applications for various purposes.
User Administration: It helps in registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control, and recovering information corrupted by unexpected failure.
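The first three tasks can be tried out with SQLite through Python's built-in `sqlite3` module. This is a minimal sketch: the table name, column names, and rows are assumptions chosen for illustration. SQLite has no user accounts, so user administration is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Data definition: create the structure that will hold the data.
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, branch TEXT)"
)

# Data updation: insert, modify, and delete the actual rows.
conn.executemany(
    "INSERT INTO student (roll_no, name, branch) VALUES (?, ?, ?)",
    [(101, "Ram", "CSE"), (102, "Shyam", None), (103, "Mohan", "ECE")],
)
conn.execute("UPDATE student SET branch = ? WHERE roll_no = ?", ("IT", 102))
conn.execute("DELETE FROM student WHERE roll_no = ?", (103,))
conn.commit()

# Data retrieval: query the stored data.
rows = conn.execute(
    "SELECT roll_no, name, branch FROM student ORDER BY roll_no"
).fetchall()
```

After these statements, `rows` holds the two remaining students, with Shyam's branch changed to IT.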
A database needs to be created and hosted on a DBMS. Some well-known DBMSs are:
PostgreSQL
MySQL
Microsoft Access
SQLite
To access data in a database from an application, we need the DBMS layer. An application uses the functionality of the DBMS to store data in and retrieve data from the database. The following images show how an application accesses data in the database.
File System Vs Database Management System
The first general-purpose DBMS, called the Integrated Data Store, was designed by Charles Bachman at General Electric in the early 1960s. It formed the basis for the network data model, which was standardized by the Conference on Data Systems Languages (CODASYL) and strongly influenced database systems through the 1960s.
The other important events in history are mentioned below:
1960s: navigational databases, mostly hierarchical and network databases, are first used
1970: E. F. Codd of IBM introduces the concept of relational databases and the first normal form
1971: the second and third normal forms are introduced by Codd
1973: project INGRES (the predecessor of PostgreSQL) begins at Berkeley
1974: project System R begins at IBM, developing the first implementation of SQL
1974: Boyce-Codd normal form is introduced
1976: the entity-relationship model is introduced by Peter Chen of MIT
1978: the first version of Oracle DB is developed, based on IBM’s papers
1979: public release of Oracle Version 2 (version 1 was never released)
1983: the very first release of the IBM DB2 database
1985: project INGRES ends, and the post-Ingres project starts at Berkeley
1990: the first public release of the Postgres database
1990s: relational databases are the industry standard
2000s: the NoSQL and NewSQL movements emerge
Database Design
Data Models
A model represents a perception of reality. For example, a city map is a model of the real city.
A data model is a collection of concepts that can be used to represent the structure of a database in a simple way. It provides a way to describe the design of the database at the physical, logical, and view levels.
Importance of Data Models:
The most prevalent database model for application development today is the relational model, which is the focus of this course. For completeness, we also briefly discuss two legacy data models, the hierarchical model and the network model:
1. Hierarchical Model
The hierarchical database model, as the name suggests, is a database model in which data is arranged in a hierarchical tree structure. Every record except the root has exactly one parent, and each parent has one or more child records, except for the records at the last level, which have no children. Data is accessed by following this structure, always starting from the root, or first parent. Hence the name hierarchical database model.
The hierarchical model is one of the oldest database models; it originated in the 1960s. One of the first hierarchical database systems was the Information Management System (IMS), developed by IBM together with the North American Rockwell company.
In this model, data is stored in the form of records, which are collections of fields. Records are connected through links. Each field in a record can contain only one value. The first node of the tree is called the root node. When data needs to be retrieved, the tree is traversed starting from the root node. This model can represent only one-to-many relationships; it cannot represent many-to-many relationships. (Note: we will study relationships in detail later in the course.)
Let us take the example of college students who take different courses. A course can be taken by many students, and a student can take as many courses as they want.
Example:
It does not indicate the relationships and cannot answer the following queries:
Which courses are offered by a faculty member?
Which courses are attended by a student?
Which students are taught by a faculty member?
Which faculty members teach a student?
As shown in Figure 1.7(a), when one student takes more than one course, we need to replicate the student record. In this example, Anil has registered for the courses Java and Python, and thus the record of Anil appears twice. The record of Rohan also appears twice in this model. This leads to redundancy of data. The same hierarchical structure can be used for the link from course to teacher, shown in Figure 1.7(b).
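The redundancy described above can be made concrete with a small Python sketch in which every course (parent) owns its own copies of the student records (children). The nested dictionary is only a stand-in for the tree in the figure, and the courses taken by Rohan are an assumption, since the notes say only that his record appears twice.

```python
# Hierarchical sketch: each course (parent) owns its own copy of every
# enrolled student record (child). Rohan's courses are assumed.
college = {
    "Java": [{"name": "Anil"}, {"name": "Rohan"}],
    "Python": [{"name": "Anil"}, {"name": "Rohan"}],
}


def copies_of(student_name, tree):
    """Count how many separate records exist for one student."""
    return sum(
        1
        for students in tree.values()
        for record in students
        if record["name"] == student_name
    )


def courses_of(student_name, tree):
    """Find a student's courses; every course has to be visited from the root."""
    return [
        course
        for course, students in tree.items()
        if any(record["name"] == student_name for record in students)
    ]


anil_copies = copies_of("Anil", college)
anil_courses = courses_of("Anil", college)

# Updating only one copy leaves the two copies of Anil inconsistent.
college["Java"][0]["name"] = "Anil Kumar"
names_after_update = sorted(
    record["name"] for students in college.values() for record in students
)
```

Because Anil is stored once per course, a change made to one copy does not reach the other copy, which is how redundancy turns into inconsistency.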
The hierarchical database model was widely used during the mainframe computer era. Today, it is used mainly for storing file systems and geographic information. It is also used in applications where high performance is required, such as telecommunications and banking. A hierarchical database is also used for the Windows Registry in the Microsoft Windows operating system. It is useful where the following two conditions are met:
1. The data should be in a hierarchical pattern, i.e., a parent-child relationship must
be present.
2. The data in a hierarchical pattern must be accessed through a single path only.
Advantages:
Simplicity: It is conceptually simple because of the parent-child relationship, which gives a clear chain of command or authority. Data can be retrieved easily because of the explicit links between the table structures.
Data Integrity: Referential integrity is always maintained, i.e., any change made in the parent table is automatically propagated to the child table.
Data Security: It is provided and enforced by the DBMS.
Efficiency: It is efficient when a database contains a large number of one-to-many relationships.
Limitations:
A child record cannot be created unless it is linked to a parent record. In our example above, we cannot create a student record until the student opts for a course.
M: N relationship (many-to-many) is not supported.
Redundancy can result in data inconsistency.
Change in structure leads to change in all application programs.
No data manipulation language or data definition language.
Poor flexibility
Rigid structure
2. Network Model
The network model replaces the hierarchical tree with a graph, thus allowing more general
connections among the nodes. It allows a record to have more than one parent and multiple child records, and thus permits many-to-many relationships. The network model was adopted by the Conference on Data Systems Languages (CODASYL) Data Base Task Group in 1969 and underwent a major update in 1971. Operations on the network model are carried out with the help of circular linked lists. A child node in the network model can have more than one parent record, as shown in Figure 1.8.
Example:
The example of a network model in Figure 1.9 shows that the record of Anil has two parent records, Python and Java. Similarly, the record of Rohan has two parent records of record type Course. This reduces the data redundancy seen in Figure 1.7(a), where the hierarchical structure stores the same set of records.
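The same records can be sketched as a graph in Python. The `Record` class and the `link` helper are hypothetical names used only for this illustration (plain lists stand in for the circular linked lists of a real network DBMS), and Rohan's courses are again assumed.

```python
class Record:
    """One record in the network; it may have several owners (parents)."""

    def __init__(self, record_type, name):
        self.record_type = record_type
        self.name = name
        self.owners = []   # parent records
        self.members = []  # child records


def link(owner, member):
    """Connect an owner record to a member record in both directions."""
    owner.members.append(member)
    member.owners.append(owner)


java = Record("Course", "Java")
python_course = Record("Course", "Python")
anil = Record("Student", "Anil")
rohan = Record("Student", "Rohan")

# Each student record is stored once and shared by both courses.
for course in (java, python_course):
    link(course, anil)
    link(course, rohan)

anil_courses = [owner.name for owner in anil.owners]
java_students = [member.name for member in java.members]
```

Both courses point to the same `anil` object, so an update to Anil's record is seen from either course, and the query "which courses does Anil take?" is answered by following his owner links.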
Advantages:
Conceptual Simplicity: It is conceptually simple and easy to design.
Capability to handle more relationship types: It handles many-to-many relationships, which is a real help in modeling real-life situations.
Ease of data access: Data access is easier and more flexible than in the hierarchical model.
Data Integrity: It ensures data integrity, as a user must define the owner record before defining the member record.
Limitations:
Implementation Complexity: It requires knowledge of the actual physical data storage for navigational data access. Database administrators, designers, programmers, and even end users must be familiar with the internal data structure, which makes it unsuitable for building a user-friendly DBMS.
Absence of structural independence: The data access method is navigational, which makes structural changes very difficult, and in most cases impossible. If changes are made to the database structure, all the application programs need to be modified.
Data operations such as retrieval, insertion, updation, and deletion are very complex, because the network model uses a graph structure for data storage.
3. Relational Model
The relational model was introduced in 1970 by E. F. Codd (of IBM) in his landmark paper “A Relational Model of Data for Large Shared Data Banks” (Communications of the ACM, June 1970, pp. 377-387). The foundation of the relational model is a mathematical concept known as a relation. A relation can be viewed as a table: a matrix composed of intersecting rows and columns. These tables represent both data and the relationships among the data. Each table corresponds to an application entity. Each row in a relation is called a tuple and represents an instance of that entity. Each column represents an attribute. The relational model also describes a precise set of data manipulation constructs based on advanced mathematical concepts.
The relational data model is implemented through a sophisticated relational database management system (RDBMS). The RDBMS performs the same basic functions provided by hierarchical and network DBMSs, in addition to other functions. Figure 1.10 shows a relational model corresponding to the network model of the records Course, Teacher, and Student.
Relation          Attributes
Student           (S_id, S_name, S_age)
Course            (C_id, C_name)
Teacher           (T_id, T_name)
Teacher Course    (T_id, C_id)
Student Course    (S_id, C_id)
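These relations can be created and queried with SQLite from Python, as sketched below. The sketch assumes that the Teacher Course relation links a teacher to a course through (T_id, C_id); the underscores in the table names, the sample rows, and the teacher named Meera are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (S_id INTEGER PRIMARY KEY, S_name TEXT, S_age INTEGER);
CREATE TABLE Course  (C_id INTEGER PRIMARY KEY, C_name TEXT);
CREATE TABLE Teacher (T_id INTEGER PRIMARY KEY, T_name TEXT);
CREATE TABLE Teacher_Course (
    T_id INTEGER REFERENCES Teacher(T_id),
    C_id INTEGER REFERENCES Course(C_id),
    PRIMARY KEY (T_id, C_id)
);
CREATE TABLE Student_Course (
    S_id INTEGER REFERENCES Student(S_id),
    C_id INTEGER REFERENCES Course(C_id),
    PRIMARY KEY (S_id, C_id)
);
-- Sample rows (made up for this sketch).
INSERT INTO Student VALUES (1, 'Anil', 20), (2, 'Rohan', 21);
INSERT INTO Course VALUES (10, 'Java'), (20, 'Python');
INSERT INTO Teacher VALUES (100, 'Meera');
INSERT INTO Teacher_Course VALUES (100, 10);
INSERT INTO Student_Course VALUES (1, 10), (1, 20), (2, 10);
""")

# Which courses are attended by a student?
courses_of_anil = [row[0] for row in conn.execute(
    """SELECT c.C_name
       FROM Student s
       JOIN Student_Course sc ON sc.S_id = s.S_id
       JOIN Course c ON c.C_id = sc.C_id
       WHERE s.S_name = ?
       ORDER BY c.C_name""",
    ("Anil",),
)]

# Which students are taught by a teacher?
students_of_meera = [row[0] for row in conn.execute(
    """SELECT DISTINCT s.S_name
       FROM Teacher t
       JOIN Teacher_Course tc ON tc.T_id = t.T_id
       JOIN Student_Course sc ON sc.C_id = tc.C_id
       JOIN Student s ON s.S_id = sc.S_id
       WHERE t.T_name = ?
       ORDER BY s.S_name""",
    ("Meera",),
)]
```

Each student is stored exactly once, and the many-to-many relationship between students and courses lives in the Student Course relation, so the queries that the hierarchical model could not answer become simple joins.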
According to DB-Engines, in January 2021 the most widely used systems were Oracle, MySQL (free software), Microsoft SQL Server, PostgreSQL (open source, a continuation of the development after INGRES), IBM DB2, SQLite (free software), Microsoft Access, MariaDB (free software), Teradata, Microsoft Azure SQL Database, and Apache Hive (free software; specialized for data warehouses).
According to the research company Gartner, in 2011 the five leading proprietary relational database vendors by revenue were Oracle (48.8%), IBM (20.2%), Microsoft (17.0%), SAP including Sybase (4.6%), and Teradata (3.7%).[24]
Advantages:
Structural independence is promoted through the use of independent tables. Database users need not be familiar with the internal data storage.
The tabular view improves conceptual simplicity. Database designers can focus on the logical design rather than on the physical data storage details.
Ad hoc query capability is based on SQL. SQL is a fourth-generation language (4GL), which lets the user specify what needs to be done without specifying how it will be done.
It isolates the end user from physical-level details.
It improves implementation and management simplicity.
Precision: Data manipulation is based on relational algebra and relational calculus.
Limitations:
Requires substantial hardware and system software overhead: It requires more powerful hardware and physical storage devices to hide the complexity of implementation and data storage.
Conceptual simplicity gives untrained people the tools to use a good system poorly. Poor database design degrades performance and slows the system down as the database grows.
May promote information problems, i.e., the information island phenomenon. Because the relational data model is easy to use, many departments may create their own database applications, which makes integrating information difficult and may create data inconsistency, redundancy, duplication, etc.
Some relational databases limit the length of fields, and this limit cannot be exceeded to accommodate more information.
Some database models that extend the concepts of object-oriented programming are discussed briefly below. A detailed discussion of these models is beyond the scope of this book.
Relational database models may fail to handle the needs of complex information systems. The problem with an RDBMS is that it forces an information system to be modeled in the form of tables. Increasingly complex real-world problems demonstrated the need for a data model that represents the real world more closely. In the object-oriented data model (OODM), both data and their relationships are contained in a single structure known as an object. Object-oriented models represent an entity as a class and an instance of it as an object. A class represents both the attributes and the behavior of an entity. Object-oriented data models can typically be represented using Unified Modeling Language (UML) class diagrams, which depict data and their relationships. Some object-oriented databases are designed to work well with object-oriented programming languages such as Delphi, Ruby, Python, JavaScript, Perl, Java, C#, Visual Basic .NET, C++, Objective-C, and Smalltalk; others, such as JADE, have their own programming languages.
Advantages:
a. Capability to handle a large number of different data types. It can store any type of data, including text, numbers, pictures, voice, and video.
b. Combines object-oriented programming with database technology.
c. Semantic content is added.
d. Object-oriented features improve productivity: inheritance, polymorphism, and dynamic binding.
Limitations:
a. Difficult to maintain
b. Not suited for all applications.
Advantages:
a. Complex data sets can be saved and retrieved quickly and easily.
b. Works well with object-oriented programming languages.
Limitations:
a. Object databases are not widely adopted.
b. In some situations, the high complexity can cause performance problems.
c. High system overhead slows transactions
Figure 1.11: Database System Architecture
External level
It is also called the view level. This level is called “view” because several users can view their desired data at this level; the data is fetched internally from the database with the help of the conceptual and internal level mappings.
The user does not need to know database schema details such as the database structure, table definitions, etc. The user is only concerned with the data that is returned to the view level after it has been fetched from the database (present at the internal level).
Each external-level view caters to the needs of a particular category of users. For example, the FACULTY of a university are interested in the course details of students, while STUDENTS are interested in all details related to academics, accounts, courses, and hostels. So, different views can be generated for different users. The main focus of the external level is data abstraction.
The external level is the top level of the three-level DBMS architecture.
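In a relational DBMS, an external-level view is commonly realized as an SQL view. The sketch below uses SQLite from Python; the table, its columns, and the view name `faculty_view` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: the full student table.
CREATE TABLE student (
    roll_no     INTEGER PRIMARY KEY,
    name        TEXT,
    course      TEXT,
    fees_due    REAL,
    hostel_room TEXT
);
INSERT INTO student VALUES (101, 'Ram', 'B.Tech', 1500.0, 'H1-12');
INSERT INTO student VALUES (102, 'Shyam', 'MCA', 0.0, NULL);

-- External level: faculty see only the course details of students.
CREATE VIEW faculty_view AS
    SELECT roll_no, name, course FROM student;
""")

cursor = conn.execute("SELECT * FROM faculty_view ORDER BY roll_no")
faculty_columns = [column[0] for column in cursor.description]
faculty_rows = cursor.fetchall()
```

A faculty user who queries `faculty_view` never sees the accounts or hostel columns and does not need to know how the underlying table is defined or stored.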
Conceptual level
It is also called the logical level. The conceptual schema describes the structure of the whole database for the community of users. This schema hides information about the physical storage structures and focuses on describing entities (tables) and their relationships, attributes (data elements) with their data types and constraints, user authorization and authentication, etc.
For example, the STUDENT database may contain STUDENT and COURSE tables that are visible to users, but users are unaware of how they are stored.
Internal level
This level is also known as the physical level. It describes how the data is actually stored on storage devices such as disks and tapes. This level is also responsible for allocating space to the data. It is the lowest level of the architecture.
Data Independence
The main objective of the three-level architecture is to provide data independence, which means that upper levels are unaffected by changes at lower levels. There are two types of data independence:
1. Physical Data Independence: Any change in the physical location of tables and indexes should not affect the conceptual level or the external view of data. This type of data independence is easy to achieve and is implemented by most DBMSs today.
2. Conceptual (Logical) Data Independence: The conceptual schema and the external schemas must be independent, which means that a change in the conceptual schema should not affect the external schemas. For example, adding or deleting attributes of a table should not affect the user's view of the table. This type of independence is more difficult to achieve than physical data independence, because some unavoidable changes in the conceptual schema are reflected in the user's view. Logical data independence requires flexibility in the design of the database, and the programmer has to foresee future requirements or modifications in the design. Both of these conditions are difficult to meet completely in practice.
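Both kinds of data independence can be observed in a small SQLite experiment run from Python. This is a sketch under assumed names (`student`, `student_names`, `idx_student_branch`): an index is added as a physical-level change, a column is added as a conceptual-level change, and the same view is read after each step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, branch TEXT);
INSERT INTO student VALUES (101, 'Ram', 'CSE'), (102, 'Shyam', 'IT');
CREATE VIEW student_names AS SELECT roll_no, name FROM student;
""")


def read_view():
    """What a user of the external view sees."""
    return conn.execute(
        "SELECT roll_no, name FROM student_names ORDER BY roll_no"
    ).fetchall()


before = read_view()

# Physical-level change: add an index (changes how data is accessed).
conn.execute("CREATE INDEX idx_student_branch ON student (branch)")
after_index = read_view()

# Conceptual-level change: add an attribute to the table.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_new_column = read_view()
```

The view returns the same rows after both changes. Removing a column that the view selects, such as `name`, would still affect it, which is the kind of unavoidable change mentioned above.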
Database design for a real-world application runs from capturing the requirements to physical implementation using DBMS software, and consists of the following steps shown below:
Conceptual Design: The requirements of the database are captured using a high-level conceptual data model. For example, the ER model* is used for the conceptual design of the database.
Logical Design: Logical design represents the data in the form of a relational model. The ER diagram* produced in the conceptual design phase is converted into the relational model.
Single-tier Architecture
our PC or laptop.
Figure 1.13: Single-Tier Architecture
Two-tier Architecture
In a two-tier architecture, the application (including both the presentation layer and the business logic layer) resides on the client machine, where it invokes database system (DBMS) functionality on the server machine, over the network, through query language statements. Application program interface standards such as ODBC and JDBC are used for interaction between the client and the server. Examples include desktop applications, games, and music players.
Figure 1.14: Two-Tier Architecture
Three-tier Architecture
In a three-tier architecture, the client machine acts as merely a front end and does not
contain any direct database calls. Instead, the client end (application client) communicates
with an application server, usually through a presentation layer. The application server, in
turn, communicates with a database system (DBMS) to access data. The business logic of
the application, which says what actions to carry out under what conditions, is embedded
in the application server instead of being distributed across multiple clients. Three-tier architectures are more appropriate for large applications and for applications that run on the World Wide Web.
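The separation of the three tiers can be sketched in Python with everything running in one process. The class names, the `account` table, and the withdrawal rule are hypothetical; in a real deployment each tier would typically run on a different machine and communicate over the network.

```python
import sqlite3


class DataTier:
    """Database tier: the only place that issues SQL."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
        )
        self._conn.execute("INSERT INTO account VALUES (1, 100)")
        self._conn.commit()

    def get_balance(self, account_id):
        row = self._conn.execute(
            "SELECT balance FROM account WHERE id = ?", (account_id,)
        ).fetchone()
        return None if row is None else row[0]

    def set_balance(self, account_id, balance):
        self._conn.execute(
            "UPDATE account SET balance = ? WHERE id = ?", (balance, account_id)
        )
        self._conn.commit()


class ApplicationServer:
    """Middle tier: business logic decides which actions are allowed."""

    def __init__(self, data_tier):
        self._data = data_tier

    def withdraw(self, account_id, amount):
        balance = self._data.get_balance(account_id)
        if balance is None:
            return "unknown account"
        if amount <= 0 or amount > balance:
            return "rejected"
        self._data.set_balance(account_id, balance - amount)
        return "ok"


def client_request(server, account_id, amount):
    """Client tier: formats the result for display; contains no database calls."""
    return f"Withdrawal of {amount}: {server.withdraw(account_id, amount)}"


data_tier = DataTier()
server = ApplicationServer(data_tier)
first = client_request(server, 1, 30)
second = client_request(server, 1, 500)
```

The client only calls `client_request`; the rule that a withdrawal cannot exceed the balance lives in one place, the application server, rather than in every client.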
Database server architecture defines where an organization's database management systems are located and managed. It depends upon the number of users, the type of data access, and the security requirements of the data.
A database management system can be used by a single person or by a group of people. It can be used on a single computer through a single application, or by multiple people on multiple devices through desktop, web, and mobile applications. Sometimes we have very critical data that needs to be protected from destruction by people in the organization, hackers, or natural disasters.
Based on these requirements, the database administrator needs to decide the location of the database management system, which is known as the database server architecture.
Database Server Architecture can be broadly categorized into three categories.
1. Centralized Database Server Architecture
2. Decentralized Database Server Architecture
3. Distributed Database Server Architecture
In a centralized server architecture, the database management system is placed on a single, central server. It is generally used when the organization is very small. This kind of database management requires fewer resources and costs less, but it has some limitations. It requires regular backups: if backups have not been taken regularly and something goes wrong with the server, data can be lost.
All information sharing and communication is done via the central server. If the central server fails, all users are disconnected and no information sharing is possible. This is known as a single point of failure.
In a decentralized server architecture, there are multiple servers, and each server holds the same copy of the database. Different users can access data from different servers. If one server goes down, only the users accessing that particular server are affected; the rest of the users keep working normally.
This server architecture also helps in sharing the load among the servers, so the performance of the database management system is not affected.
However, there is a limit to how many such servers can be managed, and such servers can still be hacked or damaged. Data is controlled by database administrators, who have permission to access and update the data.
Distributed Database
When the database is placed in such a manner that multiple copies of the same data are kept on multiple machines in a peer-to-peer model, it is known as a distributed database server architecture. All such machines are known as nodes.
Once the data is updated on one machine, it is automatically updated on its peer nodes.
If data gets corrupted on one node, it can be restored from another peer node. Since the data is placed on thousands of machines, it is considered very difficult to tamper with.
For example, cryptocurrencies such as Bitcoin and Ethereum use distributed database management. Data is managed on thousands of nodes. Anyone can set up their own node to keep a local copy of the data. If you make a transaction on the Bitcoin or Ethereum network, the data is updated on all the peer nodes connected to the network.
Even if your node is not online at a specific moment, its data will be updated automatically when it reconnects to the network.
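The replication behaviour described above can be imitated with a toy Python model. This is a deliberately simplified sketch: the `Node` class and its methods are hypothetical, every node keeps an append-only list, and conflicting histories are assumed never to occur. Real networks such as Bitcoin rely on consensus protocols that are not modelled here.

```python
class Node:
    """A peer that keeps its own full copy of the ledger."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.peers = []
        self.ledger = []  # ordered list of transactions seen by this node

    def submit(self, transaction):
        """Record a transaction locally and push it to all online peers."""
        self.ledger.append(transaction)
        for peer in self.peers:
            if peer.online:
                peer.catch_up_from(self)

    def catch_up_from(self, peer):
        """Copy the entries this node is missing (no conflict handling)."""
        self.ledger.extend(peer.ledger[len(self.ledger):])

    def reconnect(self):
        """Come back online and fetch missed transactions from the peers."""
        self.online = True
        for peer in self.peers:
            if peer.online:
                self.catch_up_from(peer)


def connect(nodes):
    """Make every node a peer of every other node."""
    for node in nodes:
        node.peers = [other for other in nodes if other is not node]


a, b, c = Node("a"), Node("b"), Node("c")
connect([a, b, c])

c.online = False          # node c is offline for a while
a.submit("tx1")
b.submit("tx2")
ledger_c_while_offline = list(c.ledger)

c.reconnect()             # c catches up automatically
```

While node c is offline its ledger stays empty; after `reconnect()` it holds the same two transactions as nodes a and b.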
Telecom: A database keeps track of information regarding calls made, network usage, customer details, etc. Without database systems, it is hard to maintain such a huge amount of data that keeps updating every millisecond.
Online shopping: You must be aware of online shopping websites such as Amazon, Flipkart, etc. These sites store product information, your addresses and preferences, and credit details, and provide you with a relevant list of products based on your query. All this involves a database management system.
Industry: Whether it is a manufacturing unit, a warehouse, or a distribution center, each one needs a database to keep records of what comes in and what goes out. For example, a distribution center should keep track of the product units that are supplied to the center as well as the products that are delivered from it each day; this is where a DBMS comes into the picture.
Banking System: Storing customer information, tracking day-to-day credit and debit transactions, generating bank statements, etc., are done with the help of database management systems.
Sales: To store customer information, production information, and invoice details.
Airlines: Travelling by air requires making reservations in advance; this reservation information, along with the flight schedule, is stored in a database.
Education sector: Database systems are frequently used in schools and colleges to store and retrieve data regarding student details, staff details, course details, exam details, payroll data, attendance details, fee details, etc.
Database Users
There are three main types of database-system users, differentiated by the way they
expect to interact with the system.
Naive users are unsophisticated users who interact with the system by invoking one
of the application programs that have been written previously. Their main job function
revolves around constantly querying and updating the database. These kinds of
transactions are called canned transactions. For example, a clerk in the University
who needs to add a new instructor to Department A invokes a program
called NEW_HIRE. This program asks the clerk for the name of the new instructor, her
new ID, the name of the Department (i.e., A), and the salary. The typical user interface
for naive users is a forms interface, where the user can fill in appropriate fields of the
form. Naive users may also simply read reports generated from the database.
Application programmers are computer professionals who write application
programs. Application programmers can choose from many tools to develop user
interfaces.
Sophisticated users interact with the system without writing application programs. Instead, they form their requests either using a database query language or by using tools such as data analysis software. These include engineers, scientists, business analysts, and others who are thoroughly familiar with the DBMS and use it to meet their complex requirements.
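The difference between a canned transaction and an ad hoc request can be sketched with SQLite in Python. The function `new_hire` stands in for the NEW_HIRE program mentioned above; its parameters, the `instructor` table, and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "department TEXT NOT NULL, salary REAL NOT NULL)"
)


def new_hire(connection, instructor_id, name, department, salary):
    """Canned transaction written by an application programmer.

    A naive user only fills in these four values through a form.
    """
    with connection:  # commits on success, rolls back on an error
        connection.execute(
            "INSERT INTO instructor (id, name, department, salary) "
            "VALUES (?, ?, ?, ?)",
            (instructor_id, name, department, salary),
        )


# Naive user: invokes the prewritten program.
new_hire(conn, 1, "Asha", "A", 50000.0)

# Sophisticated user: writes an ad hoc query directly in the query language.
department_totals = conn.execute(
    "SELECT department, COUNT(*), SUM(salary) FROM instructor GROUP BY department"
).fetchall()
```

The clerk never sees SQL, the application programmer writes `new_hire` once, and the analyst's query needs no application program at all.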
Database Designer:
Database designers are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data. These tasks are mostly undertaken before the database is actually implemented and populated with data. It is the responsibility of the database designer to communicate with all prospective database users in order to understand their requirements and to create a design that meets these requirements.
Database Administrator:
In a database environment, the primary resource is the database itself, and the secondary
resource is the DBMS and related software. Administering these resources is the
responsibility of the database administrator (DBA). The function of a DBA include:
Schema definition: The DBA creates the original database schema by executing a
set of data definition statements in the DDL.
Storage structure and access method definition
Schema and physical organization modification: The DBA carries out changes to
the schema and physical organization to reflect the changing needs of the organization orto
alter the physical organization to improve performance.
Granting of authorization for data access: By granting the different types of
authorization, the database administrator can regulate which parts of the database various
users can access. The authorization information is kept in a special system structure that
the database system consults whenever someone attempts to access the data in the
system.
Routine maintenance: Examples of database administrator’s routine maintenance
activities are:
o Periodically backing up the database, either onto tapes or onto remote
servers, to prevent loss of data in case of disasters such as flooding.
o Ensuring that enough free space is available for normal applications and
upgrading disk space as required.
o Monitoring jobs running on the database and ensuring that performance is
not degraded by very expensive tasks submitted by some users.
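A few of the DBA functions listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: SQLite has no user accounts or GRANT statement, so authorization is not shown, and the table, column, and index names (student, email, idx_student_name) are made up for the example.

```python
import sqlite3

# Schema definition: the DBA creates the original schema with DDL statements.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Schema modification: a later requirement adds a new column.
live.execute("ALTER TABLE student ADD COLUMN email TEXT")

# Physical organization modification: an index to speed up lookups by name.
live.execute("CREATE INDEX idx_student_name ON student (name)")

live.execute("INSERT INTO student (roll_no, name, email) VALUES (1, 'Asha', 'asha@example.com')")
live.commit()

# Routine maintenance: copy the live database into a separate backup database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

backed_up_rows = backup.execute("SELECT roll_no, name, email FROM student").fetchall()
print(backed_up_rows)
```

In a real deployment the backup target would be a file or a remote server rather than a second in-memory database.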
Review Questions
Multiple Choice Questions
Immunity of the external schemas (or application programs) to changes in the
conceptual schema is referred to as: (ISRO CS 2018)
a) Physical Data Independence
b) Logical Data Independence
c) Both (a) and (b)
d) None of the above
Which of the following products was an early implementation of the relational model
developed by E.F. Codd of IBM?
a) IDMS
b) DB2
c) dBase-II
d) R:base
An application where only one user accesses the database at a given time is an example of a
______.
a) single-user database application
b) multiuser database application
c) e-commerce database application
d) data mining database application
Database is generally
a) System-centred
b) User-centred
c) Company-centred
d) Data-centred
Which of the following isn’t a level of abstraction?
a) physical
b) logical
c) user
d) view
The ______ level helps application programs hide the details of data types.
a) physical
b) logical
c) user
d) view
A level that describes data stored in a database and the relationships among the data.
a) physical
b) logical
c) user
d) view
There are numerous merits of using the database approach in contrast to file processing
approaches. Which among the listed options is/are not true of using the database approach?
a) Data redundancy can be reduced.
b) Data inconsistency can be avoided to some extent.
c) Computing resources needed for data processing can be reduced
d) Data processing standards can be enforced.
Conceptual Questions
o Why would you choose a database system instead of simply storing data in operating system
files? When would it make sense not to use a database system?
o What is data independence and how does a DBMS support it?
o Explain the difference between external, internal, and conceptual schemas. How are these
different schema layers related to the concepts of logical and physical data independence?
o Define the roles and responsibilities of DBA.
o Differentiate between 1-Tier, 2-Tier, and 3-Tier architecture with diagrams.
o What are the three types of data models?
o What are the disadvantages of data model?
o What are the advantages of data model?
o What are the limitations of hierarchical data model?
o Compare the network data model with hierarchical data model.
o Compare the network model with the relational model in terms of ease of learning and ease of
use.
o What is the difference between a spreadsheet and a database? List three differences between
them.
o What do you understand by Data Redundancy?
Project-Based Questions:
Q1. Suppose you want to build a video site similar to YouTube. Consider each of the points listed
in Section 1.2 as disadvantages of keeping data in a file-processing system. Discuss the relevance
of each of these points to the storage of actual video data and to metadata about the video, such
as title, the user who uploaded it, tags, and which users viewed it.
Q2. Suppose the scenario of weather forecasting application, and we would like to store data,
time, day, hours and forecasting status (rainy, dry, sunny, etc.). Discuss the following points
1. Types and structure of data we can use in this project.
2. Disadvantages in case of the file system.
3. Best Data model suitable for this scenario.
UNIT -1 Part-2
ER MODEL
Table of Contents
1. Introduction
2. Case Study
ER Modelling
Entities, Attributes and Relationships
Constraints on Binary Relationship Types
Specialization
Generalization
Attribute Inheritance
Constraints on generalization
Aggregation
Completion of ER Diagram – College Case Study
1. Introduction
Software applications are created to manage data and to help transform data into
information. A database design is an essential component of any application development
that supports and automates business processes. Database designing is a necessary step in
the process of creating any complex software. It helps developers understand the domain and
organize their work accordingly. Developing a concise and extensible database design is as
critical as arriving at the right foundation plan while building a house.
The overall design of the database of an organization is called a schema. We will explain here
different phases in the overall design of the database. These phases are mainly divided based
on different levels of abstraction. The schemas defined at the design and implementation
levels are:
1. Conceptual schema
2. Logical schema
3. Physical schema
4. External level schema/View level schema/Subschema
The conceptual schema design is a high-level data model, providing a conceptual framework
to specify the data requirements of the database users and how the database will be
structured to fulfill these requirements. The database designer needs to interact extensively
with domain experts and users to carry out this task. The schema developed at the conceptual
design phase provides a detailed overview of the enterprise.
The logical schema defines the design of the database at the logical level of the data
abstraction. In this step, the conceptual schema is transformed from the high-level data
model into the implementation data model. It defines how the database should be
implemented using a specific DBMS. This schema defines all the logical constraints that need
to be applied to the data stored in the database.
The physical schema is the database design at the physical level of data abstraction. The
logical schema is mapped to the physical schema using RDBMS tools like Microsoft SQL Server,
Oracle SQL, or IBM's DB2. It describes how the data is organized in files, internal storage
structure, and access paths.
View Schema or external schema, also called subschema, defines the design of the
database at the view level of the data abstraction. It describes how an end-user will interact
with the database system. There can be many view schemas for a database system. Each view
schema defines the view of data for a particular group of people. It shows only those data
to a view group in which they are interested and hides the remaining details.
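The difference between the logical, physical, and view schemas can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a full design: the faculty table, its columns, the index, and the faculty_public view are assumed names, and a real RDBMS offers many more physical options than a single index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical schema: tables, columns, and logical constraints.
conn.execute("""
    CREATE TABLE faculty (
        faculty_id  INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        designation TEXT NOT NULL,
        salary      REAL NOT NULL CHECK (salary >= 0)
    )
""")

# Physical schema: an access path (index) chosen to speed up lookups by name.
conn.execute("CREATE INDEX idx_faculty_name ON faculty (name)")

# View (external) schema: one user group sees names and designations, not salaries.
conn.execute("CREATE VIEW faculty_public AS SELECT name, designation FROM faculty")

conn.execute("INSERT INTO faculty VALUES (1, 'Dr. Rao', 'Professor', 150000.0)")

view_columns = [col[0] for col in conn.execute("SELECT * FROM faculty_public").description]
view_rows = conn.execute("SELECT * FROM faculty_public").fetchall()
print(view_columns, view_rows)
```

Dropping or adding the index changes only the physical schema; queries against the table and the view keep working unchanged.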
In the database design phases, data is represented by using a specific data model. The data
model is a collection of concepts or notations for describing data, data relationships, data
semantics, and data constraints. Most data models also include a set of basic operations for
manipulating data in the database.
Data modeling is a technique to define and organize the data requirements of an enterprise.
It allows creating a visual description of the business by analyzing, understanding, and
clarifying its data requirements. This chapter introduces Data Modelling, its development,
and concepts around it.
2. Case Study
The primary purpose of this case study is to design a database for a student
information management system in the department of student affairs – ABESEC. The system
moves the work of this department from a manual to a computer-based system, which
improves accuracy, efficiency, security, and so on.
User Requirement
Discussions with the college employee responsible for the administration of the
department of Student Affairs determined the information required in the system,
according to the actual needs of the college. The required
information for the student information management system is described as follows:
Student information
Faculty Information
Course information
Other information
In a College, there are several departments, and each department has one head of the
department (HOD). Department has a name, its location, and students that belong to the
department. A student can belong to only one department, and a department can have many
students. If a department has recently come into existence, it might not have any students.
Students have roll number, name, date of birth, gender, hobby, phone number, and address.
Faculty has a name, designation, date of joining, gender & salary and belongs to a particular
department. A department can run many courses with assigned credits, and students can
study any number of courses being offered by various departments. There are sections within
each department, and each section has many students. Each section has its name, maximum
capacity of that section, and the number of students in that section. Students need to do one
mini-project individually. Each available mini-project has to be chosen based on its name,
domain, subject, and description. Faculty members can teach multiple courses in multiple
departments. One course can be taught by many faculty members across departments.
Faculty members can be part of multiple research projects. These projects are either
sponsored by the government, industry, or the college itself. The faculty member can do one
or more projects, and one project can have more than one faculty member. Research projects
have a fixed duration, and their status needs to be tracked regularly.
After collecting the user requirement, just go through the entire story again and then break
the requirement story into well-defined parts to understand the requirements in a better
way.
In a College,
There are several departments, and each department has many faculties.
Out of the faculties from each department, one faculty is acting as Head of
Department.
Department has a name, Id, phone extension, specific mailing address, and Students
that belong to the department.
Students can belong to only one department at a time.
Department can have more than one or no student.
Students have their name, unique identification number, address, age, gender,
hobbies, and other information.
Faculty also have information similar to Students except for hobbies.
A student studies different Courses.
A department can run many courses, and a course can run in many departments.
Faculty teaches these Courses. Faculty can teach more than one course.
Department can run many sections.
Many students can be in one section, and each section has its name and the max
capacity.
A student must do one mini-project, and one mini-project can be opted by one student
only.
Faculty members can teach in multiple Departments.
Each course can be taught by many faculty members or no one.
Faculty members are also working on multiple research projects.
Research projects are either sponsored by the government, industry, or the college
itself.
One project can have more than one faculty member, and one faculty member can
work on more than one project.
After understanding the requirements, we will start designing the database. ER Diagram is the
first step of designing any database. Now we will understand the concept of ER Model.
ER Modelling
The entity-relationship (ER) data model is based on a perception of the real world that consists
of a collection of basic objects, called entities, and of relationships among these objects. An
entity is a “thing” or “object” in the real world distinguishable from other objects. For
example, each person is an entity, and bank accounts can be considered as entities.
A set of attributes describes entities. For example, the attributes account-number and
balance may describe one particular account in a bank, and they form attributes of the
account entity. Similarly, attributes customer-name, customer-street address, and customer-
city may describe a customer entity.
The entity-relationship (ER) data model describes data as entities, attributes and
relationships. It was developed to facilitate database design by allowing the specification of
an enterprise schema, representing the overall logical structure of a database.
Here we will try to understand key concepts of ER Modelling and how to use them for
database design using a set of graphical symbols.
The database design created using these graphical symbols is called Entity-Relationship
Diagram or ER Diagram or simply ERD.
Entity
An entity is a real-world object which can be a person, place, or thing that can be uniquely
identified and distinguished from other objects. For example, every STUDENT has a unique
roll number. Every DEPARTMENT has a unique department code. Every COURSE has a unique
course code. Every COUNTRY has a unique country code, and every CAR has a unique
registration number. Here, STUDENT, DEPARTMENT, COURSE, COUNTRY, and CAR are a few
examples of entities.
If we cannot distinguish an item from another item, it is simply an object but not an entity.
E.g., each leaf on a tree is an object but not an entity since we cannot uniquely identify every
leaf on a tree.
Entities can be tangible or intangible. Tangible entities exist in the real world physically, e.g.,
STUDENT, CAR, etc. Intangible entities exist only logically and have no physical existence, e.g.,
COURSE, DEPARTMENT, etc.
Entities are represented in a rectangular box in an entity-relationship diagram.
Attributes
Every entity has a set of properties that describes the entity. For example, a student can have
attributes like roll number, first name, middle name, last name, date of birth, gender. A
department can have attributes like dept id, dept name, and CAR can have attributes like
registration number, make, model, color, price, etc.
Attributes are represented in oval shape attached to the entity.
Entity Type
An entity type defines the set of entities having a common set of attributes, e.g., all the
students in a college or university will have the same set of attributes, then STUDENT becomes
entity type.
Each entity type is defined by its name and attributes, as shown in the above image.
Entity Set
The collection of entities of a particular entity type in the database is called an entity set at
any point in time. It is also called the extension of the entity type. The entity set is usually
referred to using the same name as the entity type.
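The distinction between an entity type and an entity set can be sketched in Python. In this sketch, which uses made-up student data, a dataclass plays the role of the entity type (a name plus a common set of attributes) and a Python set of its instances plays the role of the entity set at one point in time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    """Entity type STUDENT: a name plus a common set of attributes."""
    roll_no: int      # identifies each student uniquely
    first_name: str
    last_name: str
    gender: str


# Entity set: the STUDENT entities stored at this point in time.
student_set = {
    Student(101, "Asha", "Verma", "F"),
    Student(102, "Ravi", "Kumar", "M"),
}

# Adding an entity changes the entity set, not the entity type.
student_set = student_set | {Student(103, "Meena", "Singh", "F")}

roll_numbers = sorted(s.roll_no for s in student_set)
print(roll_numbers)
```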
Example:
Entity Type: STUDENT
Entity Set:
Types of Attributes
The attributes of an entity can be of different types:
General Attribute or Attribute
Key Attribute
Multivalued attribute
Composite Attribute
Derived Attribute
Attributes
All the characteristics or properties defined for an entity are its attributes, e.g., the STUDENT
entity type can have the attributes like roll_no, first_name, middle_name, last_name, dob,
gender, etc.
Use Oval to represent an attribute.
Key Attributes
The attributes which help in the unique identification of entities are called key attributes.
Such attributes have unique values, e.g., roll_no of a STUDENT, registration_no of a CAR,
course_id of a COURSE, and dept_id of a DEPARTMENT are the key attributes.
While representing such attributes, we need to underline them.
Multivalued attribute
An attribute that can have multiple values for an entity is called a multivalued
attribute, e.g., a STUDENT can have multiple phone numbers using attribute phone_no.
Use a double-line oval to represent a multivalued attribute.
Figure 2.7 Representation of Multivalued Attribute (phone_no)
Composite Attribute
An attribute composed of multiple sub-attributes is called a composite attribute, e.g., the
address attribute can be divided into five sub-attributes house_no, street_name, city, state,
pin_code.
Use rounded brackets to represent such attributes inside the oval shape.
Derived Attribute
An attribute whose value is calculated from another attribute(s) is called a derived attribute,
e.g., the age attribute can also be derived from another attribute, dob (date of birth).
Use dotted line oval shape to represent a derived attribute.
Figure 2.9 Representation of Derived Attribute (age)
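The attribute types described above can be put side by side in a short Python sketch. It is only an analogy, since ER attributes are a design notation and not program code: a nested dataclass stands in for the composite attribute, a list for the multivalued attribute, and a method for the derived attribute. The method name age_on and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Address:
    """Composite attribute: address is made of five sub-attributes."""
    house_no: str
    street_name: str
    city: str
    state: str
    pin_code: str


@dataclass
class Student:
    roll_no: int                      # key attribute
    first_name: str                   # general attribute
    dob: date                         # stored attribute used to derive age
    address: Address                  # composite attribute
    phone_no: list[str] = field(default_factory=list)  # multivalued attribute

    def age_on(self, today: date) -> int:
        """Derived attribute: age is computed from dob, not stored."""
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)


student = Student(
    roll_no=101,
    first_name="Asha",
    dob=date(2004, 8, 15),
    address=Address("12", "MG Road", "Ghaziabad", "Uttar Pradesh", "201009"),
    phone_no=["9000000001", "9000000002"],
)
age_before_birthday = student.age_on(date(2024, 8, 14))
age_on_birthday = student.age_on(date(2024, 8, 15))
print(age_before_birthday, age_on_birthday)
```

Because age is computed on demand, it never goes stale the way a stored age column would.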
Weak Entity Type
An entity type may not have a key attribute(s) to identify each entity instance uniquely. Such
an entity type is termed a weak entity type. For a weak entity type to be meaningful, it must
be associated with another entity type, called the identifying or owner entity type. The weak
entity type is said to be existence dependent on the identifying (owner) entity type. The
identifying entity type is said to own the weak entity type that it identifies.
The relationship associating the weak entity type with the identifying entity type is called an
identifying relationship.
Although a weak entity type does not have a primary key, it normally has a partial key, which
is the attribute that can uniquely identify weak entities that are related to the same owner
entity. In the worst case, a composite attribute of all the weak entity’s attributes will be the
partial key. The partial key is sometimes called the discriminator. For example, we have a
weak entity type SECTION and the identifying (owner) entity type is DEPARTMENT. The partial
key of the weak entity type SECTION is the attribute section name since, for each department,
a section name uniquely identifies one single section for that department.
Although a weak entity type is existence dependent on its identifying entity type,
not every existence dependency results in a weak entity type. For example, the
PASSPORT entity type cannot exist unless it is related to a PERSON entity type, but
it has its own key attribute (passport_no) and hence is not a weak entity type.
A double outlined rectangle represents a weak entity type. The partial key is represented by
a dotted underline.
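When a weak entity type such as SECTION is later turned into a table, one common approach is to combine the owner's key with the partial key. The sketch below uses Python's built-in sqlite3 module; the column names (dept_id, section_name, max_capacity) and the ON DELETE CASCADE rule are assumptions chosen to illustrate existence dependency, not a design prescribed by these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)")

# SECTION is identified by its owner's key plus its own partial key (section_name).
conn.execute("""
    CREATE TABLE section (
        dept_id      INTEGER NOT NULL REFERENCES department (dept_id) ON DELETE CASCADE,
        section_name TEXT    NOT NULL,
        max_capacity INTEGER NOT NULL,
        PRIMARY KEY (dept_id, section_name)
    )
""")

conn.executemany("INSERT INTO department VALUES (?, ?)", [(1, "CSE"), (2, "ECE")])
# The same section name may appear under different departments.
conn.executemany("INSERT INTO section VALUES (?, ?, ?)", [(1, "A", 60), (2, "A", 60)])

try:
    conn.execute("INSERT INTO section VALUES (1, 'A', 70)")  # duplicate within CSE
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Existence dependency: deleting the owner removes its sections.
conn.execute("DELETE FROM department WHERE dept_id = 1")
remaining_sections = conn.execute("SELECT dept_id, section_name FROM section").fetchall()
print(duplicate_rejected, remaining_sections)
```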
What is a relationship?
When two or more entities are linked together to manage the related information, such
linkage is called a relationship, e.g., every STUDENT belongs to a DEPARTMENT. These two
entities have a relationship.
Use a diamond symbol to create the relationship between the entity types.
Relationship Type
A relationship type defines the linkage between two or more entity types, i.e., the set of
all possible relationships of the same kind, e.g., every STUDENT belongs to a DEPARTMENT.
These two entity types have a relationship type – "BELONGS TO"
Use the diamond symbol to create the relationship type between the entity types.
Relationship instance and a Relationship Set
A relationship instance associates one entity instance from each participating entity
type, e.g., a particular student belongs to a particular department.
A relationship set is the set of all relationship instances of a relationship type at a
given point in time.
Figure 2.15 ER diagram with a ternary relationship (COURSE ALLOCATION)
In some cases, the same entity type participates more than once in
a relationship type in different roles. In such cases, the role name becomes essential for
distinguishing the meaning of each participating entity's role. Such relationship types are
called recursive relationships.
The WORKS FOR relationship type relates a faculty to a HOD, where both faculty and HOD
entities are members of the same FACULTY entity set. Every HOD is a faculty also. Hence, the
FACULTY entity type participates twice in WORKS FOR relationship type, once in the role of a
faculty and another in the role of a HOD.
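A recursive relationship such as WORKS FOR is often stored as a column that refers back to the same table. The sketch below uses Python's built-in sqlite3 module; the column name hod_id and the sample rows are assumptions made for the example. The two aliases in the query correspond to the two role names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# hod_id points back into the same table: FACULTY plays two roles in WORKS FOR.
conn.execute("""
    CREATE TABLE faculty (
        faculty_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        hod_id     INTEGER REFERENCES faculty (faculty_id)
    )
""")
conn.executemany(
    "INSERT INTO faculty VALUES (?, ?, ?)",
    [(1, "Dr. Rao", None), (2, "Ms. Gupta", 1), (3, "Mr. Khan", 1)],
)

# Self-join: alias f plays the faculty role, alias h plays the HOD role.
works_for = conn.execute("""
    SELECT f.name, h.name
    FROM faculty AS f
    JOIN faculty AS h ON f.hod_id = h.faculty_id
    ORDER BY f.faculty_id
""").fetchall()
print(works_for)
```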
A relationship type can also have attributes attached with it which are called descriptive
attributes.
Example:
COURSE ALLOCATION in Figure 2.19 is done for a particular academic year and a semester. The figure below shows the
ER diagram with acad_yr & sem as descriptive attributes associated with the relationship type
COURSE ALLOCATION. Here acad_yr represents an academic year, and sem represents a
semester. This relationship states the department-wise allocation of course and faculty in the
particular academic year (acad_yr) and semester (sem).
Relationship types usually have certain constraints that limit the possible combination of
entities that may participate in the corresponding relationship set. These constraints are
determined from the real-world situation that the relationships represent. For example, a
student can be part of exactly one department, and then we would like to describe this
constraint in the database design. We will now look at two main types of binary relationship
constraints: cardinality ratio and participation.
Cardinality Ratio
The cardinality ratio expresses the number of entities to which another entity can be
associated via a relationship set.
For a binary relationship type R between entity types A and B, the mapping cardinality must
be one of the following:
One-to-one. An entity in A is associated with at most one entity in B, and an entity in
B is associated with at most one entity in A.
Figure 2.20 Representation of One-To-One Cardinality
Example:
Students do mini-projects as part of the curriculum. “A student should opt for only one
mini-project, and one mini-project can be opted by one student only". Therefore, there is
one-to-one cardinality between the two entity types, STUDENT and MINI PROJECT, via the OPTS
relationship type.
A student must opt for only one of the mini-projects from the MINI PROJECT entity type.
Hence the STUDENT entity set is associated with one cardinality limit to the MINI PROJECT
entity set via the OPTS relationship set. To represent this, "1" is put on the opposite side of the
STUDENT entity type (near MINI PROJECT entity type). Similarly, a mini-project will be
assigned to only one student. Hence MINI PROJECT entity set is also associated with one
cardinality limit to the STUDENT entity set, so “1” is put on the opposite side of MINI PROJECT
(near to STUDENT entity type).
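One way to carry this one-to-one cardinality into a table design is a UNIQUE constraint on the column that links the two tables. The sketch below uses Python's built-in sqlite3 module; placing the roll_no column in mini_project, and the sample project names, are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# roll_no is UNIQUE here, so each student appears in at most one mini-project row,
# and each mini-project row holds at most one student.
conn.execute("""
    CREATE TABLE mini_project (
        project_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        roll_no    INTEGER UNIQUE REFERENCES student (roll_no)
    )
""")
conn.execute("INSERT INTO student VALUES (101, 'Asha')")
conn.execute("INSERT INTO mini_project VALUES (1, 'Library System', 101)")

try:
    # The same student tries to opt for a second mini-project.
    conn.execute("INSERT INTO mini_project VALUES (2, 'Chat App', 101)")
    second_project_rejected = False
except sqlite3.IntegrityError:
    second_project_rejected = True
print(second_project_rejected)
```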
Example:
“There are several departments, each department has many faculties, and one faculty can
be a part of only one department”. There is one-to-many cardinality between the
DEPARTMENT entity type and FACULTY entity type via HAS relationship type.
Each department has many faculty means that in one department, there can be several
faculty members. Therefore DEPARTMENT entity set is associated with many cardinality limit
to the FACULTY entity set via HAS relationship set. To represent this, "M" is put on the
FACULTY side (opposite to the DEPARTMENT entity type). A faculty can report to one
department only. Therefore, the FACULTY entity set is associated with one cardinality limit to
the DEPARTMENT entity set, so “1” is put near the DEPARTMENT entity type (opposite to the
FACULTY entity type).
Example:
“Many students can belong to one department at a time”. Therefore, there is a many-to-one cardinality ratio
between the STUDENT and DEPARTMENT entity types via the BELONGS TO relationship type.
A student can belong to one department. Therefore, the STUDENT entity set is associated
with one cardinality limit to the DEPARTMENT entity set via BELONGS TO relationship set. To
represent this “1” is placed opposite to STUDENT entity type (near to DEPARTMENT entity
type). Similarly, a department can have many students; hence DEPARTMENT entity set is
associated with many cardinality limit to the STUDENT entity set, so “M” is placed opposite
the DEPARTMENT entity type (near to STUDENT entity type).
Example:
“A faculty member can take multiple courses, and multiple faculty members can take one
course”. Therefore, there is a many-to-many cardinality between the FACULTY and COURSE entity types via
the TEACHES relationship type.
A faculty can teach multiple courses. Therefore, the FACULTY entity set is associated with
many cardinality limit to the COURSE entity set via TEACHES relationship set. To represent
this, “M” is placed opposite the FACULTY entity type (near COURSE entity type). Similarly, a
course can be taught by multiple faculty members; hence COURSE entity set is associated
with many cardinality limit to the FACULTY entity set, so “N” is placed opposite to the COURSE
entity type (near to FACULTY entity type).
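A many-to-many relationship type cannot be stored as a single column in either table, so it is usually given a table of its own with one row per relationship instance. The sketch below uses Python's built-in sqlite3 module; the table name teaches, the course codes C101 and C102, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faculty (faculty_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT NOT NULL)")
# One row per (faculty, course) pair in the TEACHES relationship set.
conn.execute("""
    CREATE TABLE teaches (
        faculty_id INTEGER NOT NULL REFERENCES faculty (faculty_id),
        course_id  TEXT    NOT NULL REFERENCES course (course_id),
        PRIMARY KEY (faculty_id, course_id)
    )
""")
conn.executemany("INSERT INTO faculty VALUES (?, ?)", [(1, "Dr. Rao"), (2, "Ms. Gupta")])
conn.executemany("INSERT INTO course VALUES (?, ?)", [("C101", "DBMS"), ("C102", "Networks")])
conn.executemany(
    "INSERT INTO teaches VALUES (?, ?)",
    [(1, "C101"), (1, "C102"), (2, "C101")],
)

# One faculty member teaches many courses ...
courses_of_rao = [r[0] for r in conn.execute(
    "SELECT course_id FROM teaches WHERE faculty_id = 1 ORDER BY course_id")]
# ... and one course is taught by many faculty members.
teachers_of_dbms = [r[0] for r in conn.execute(
    "SELECT faculty_id FROM teaches WHERE course_id = 'C101' ORDER BY faculty_id")]
print(courses_of_rao, teachers_of_dbms)
```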
Limit on Cardinality Ratio
ER diagrams also provide a way to indicate more complex constraints on the number of times
each entity participates in relationships in a relationship set. An edge between an entity set
and a binary relationship set can have an associated minimum and maximum cardinality. This
is shown in the form l..h, where l is the minimum and h the maximum cardinality.
Example:
Faculty are supposed to work on research projects. One faculty member can work on zero to a
maximum of three research projects. A research project can have a minimum of one and a
maximum of any number of faculty members.
The limit 0..3 on the line near the FACULTY entity type indicates that a faculty member can
work on 0 or a maximum of 3 research projects at a time. Similarly, limit 1..* on the line near
the RESEARCH PROJECT entity type indicates that a research project can have “1” or any
number of faculty members working in that research project.
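The l..h limits can be checked mechanically against a relationship set. The following Python sketch is an illustration only: the function name violations, the identifiers F1..F3 and P1..P5, and the sample WORKS ON pairs are all assumptions. It counts how often each entity participates and reports those outside the allowed range.

```python
from collections import Counter


def violations(entities, pairs, position, low, high=None):
    """Return entities whose participation count falls outside low..high.

    pairs is the relationship set as (faculty, project) tuples; position selects
    which side of each pair to count. high=None stands for "*" (no upper limit).
    """
    counts = Counter(pair[position] for pair in pairs)
    return sorted(
        e for e in entities
        if counts[e] < low or (high is not None and counts[e] > high)
    )


faculty = {"F1", "F2", "F3"}
projects = {"P1", "P2", "P3", "P4", "P5"}
works_on = {("F1", "P1"), ("F1", "P2"), ("F1", "P3"), ("F1", "P4"), ("F2", "P1")}

# 0..3: a faculty member works on at most three research projects.
faculty_violations = violations(faculty, works_on, position=0, low=0, high=3)
# 1..*: every research project needs at least one faculty member.
project_violations = violations(projects, works_on, position=1, low=1)
print(faculty_violations, project_violations)
```

Here F1 works on four projects, which breaks the 0..3 limit, and P5 has no faculty member, which breaks the 1..* limit.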
The participation of an entity set E in a relationship set R is said to be total if every entity
instance in E participates in at least one relationship in R. If only some entities in E participate
in relationships in R, the participation of entity set E in relationship R is said to be partial. Total
participation is also called existence dependency.
In an ER diagram, total participation of an entity set in a relationship set is shown by
double lines, and partial participation is shown by a single line.
Example:
Let's take the relationship between the DEPARTMENT entity type and the FACULTY entity
type. A faculty has to be part of a department; therefore, the participation of FACULTY entity
type in HAS relationship type is total and is shown by a double line. It indicates that for a
faculty to exist, it must be associated with a department. A department has partial
participation in HAS relationship type, as a department may not have even a single faculty
when it is just established. So a department can exist without any faculty for some time and
is shown by a single line in the ER diagram.
Figure 2.29 Entity relationship showing partial (single line) and total participation (double lines)
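Whether participation is total or partial in a given database state can be sketched as a simple set comparison in Python. The function name participation and the sample data are assumptions made for the example. Note that the constraint itself is a rule about every valid state, while this check only inspects one state.

```python
def participation(entity_set, relationship_set, position):
    """Classify participation of entity_set in relationship_set as total or partial."""
    participating = {instance[position] for instance in relationship_set}
    return "total" if entity_set <= participating else "partial"


departments = {"CSE", "ECE", "AI"}                     # AI was just established
faculty = {"F1", "F2", "F3"}
has = {("CSE", "F1"), ("CSE", "F2"), ("ECE", "F3")}    # (department, faculty) pairs

faculty_participation = participation(faculty, has, position=1)
department_participation = participation(departments, has, position=0)
print(faculty_participation, department_participation)
```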
A weak entity type always has a total participation constraint (existence dependency) with
respect to its identifying entity type because a weak entity type cannot exist without an owner
(identifying) entity type. The relationship associating the weak entity type with its identifying
entity type is called the identifying relationship type. Identifying relationship type is shown
by a double outlined diamond symbol.
Example:
The SECTION entity type is a weak entity type as it does not have any key attributes.
DEPARTMENT entity type helps to uniquely identify each entity instance of the SECTION entity
type.
Figure 2.30 Entity relationship showing identifying relationship (double outlined diamond)
Keys
A key is an attribute or a set of attributes of an entity type that help to uniquely identify an
entity instance in an entity set and is also used to establish relationships between the
different entity types.
When we convert the ER model into a database structure, all entity types in the ER model get
converted into Tables (or Relations). The entity type attributes are converted into Table (or
Relation) columns and all the entity instances get converted into Table (or Relation) rows.
These Tables (or Relations) have different types of keys. Two important types of keys are
discussed here.
Primary Key or Composite Primary Key
Foreign Key
A Table column(s) with the following set of features can be designated as the primary key. A
Table can have one and only one primary key.
Its value can never be NULL.
Its value is always unique.
Its value, once assigned, should not be changed.
Example:
The roll_no of a student is a primary key that helps in identifying each student uniquely.
The dept_id of a department is also a primary key that helps in uniquely identifying a
department.
If a Table does not have a single Table column that can identify Table rows uniquely, we have
to combine two or more Table columns to make the key.
Combining two or more Table columns that can be used to identify Table rows uniquely is
called the composite key or composite primary key.
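The need for a composite key can be seen by testing which column combinations identify rows uniquely. The Python sketch below uses a hypothetical enrolment table (the column names and rows are assumptions). Sample data can only show that a column set fails to be a key; whether it really is a key is a design decision about all possible rows.

```python
def identifies_rows_uniquely(rows, columns):
    """True if no two rows share the same values in the given columns."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in columns)
        if value in seen:
            return False
        seen.add(value)
    return True


# Hypothetical enrolment rows: one student studies many courses.
enrolment = [
    {"roll_no": 101, "course_id": "C1", "grade": "A"},
    {"roll_no": 101, "course_id": "C2", "grade": "B"},
    {"roll_no": 102, "course_id": "C1", "grade": "A"},
]

roll_no_alone = identifies_rows_uniquely(enrolment, ["roll_no"])
course_id_alone = identifies_rows_uniquely(enrolment, ["course_id"])
composite = identifies_rows_uniquely(enrolment, ["roll_no", "course_id"])
print(roll_no_alone, course_id_alone, composite)
```

Neither column works on its own, but the pair (roll_no, course_id) does, so the pair is a candidate for the composite primary key.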
Foreign Key
When the primary key of one Table is included as a column in another Table to relate
the two Tables, that column is called a foreign key in the other Table.
Example:
Every student belongs to one department. To know the student department, we need to add
dept_id to the STUDENT Table and create a relationship between DEPARTMENT and STUDENT
Tables. In such a case DEPARTMENT Table will be called a Referenced Table (Relation), and
the STUDENT Table will be called a Referencing Table (Relation).
The Table column used as a foreign key in a Table (Relation) must be a primary key in another
Table (Relation).
Detailed discussion on keys is covered in Chapter 3.
ER diagram showing the relationship between DEPARTMENT and STUDENT
Converting the above ER diagram into Tables (Relations) and connecting them with
primary and foreign key
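The DEPARTMENT and STUDENT Tables and the foreign key that connects them can be sketched with Python's built-in sqlite3 module. The column list is a shortened assumption (only roll_no, name, dept_id, and dept_name are used), and the PRAGMA line is specific to SQLite, which does not enforce foreign keys unless asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign key checks are off by default in SQLite

# Referenced Table
conn.execute("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    )
""")
# Referencing Table: dept_id is a foreign key pointing at department.dept_id
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department (dept_id)
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'CSE')")
conn.execute("INSERT INTO student VALUES (101, 'Asha', 10)")

try:
    conn.execute("INSERT INTO student VALUES (102, 'Ravi', 99)")  # no department 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key lets us look up each student's department.
student_departments = conn.execute("""
    SELECT s.name, d.dept_name
    FROM student AS s JOIN department AS d ON s.dept_id = d.dept_id
""").fetchall()
print(orphan_rejected, student_departments)
```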
Extended ER Concepts
Although the basic ER concepts can model most database features, some aspects of a
database may be more aptly expressed by certain extensions to the basic ER model. This
section discusses the extended ER features of specialization, generalization, higher- and
lower-level entity types, attribute inheritance, and aggregation.
Specialization
An entity type may include subgroupings of entities that are distinct in some way from other entities
in the set. For instance, a subset of entities within an entity type may have attributes that are
not shared by all the entities in the entity type. The ER model provides a means for representing these
distinctive entity groupings. Consider an entity type EMPLOYEE in the college, with attributes
employee id, first name, middle name, last name, gender, date of joining, designation,
salary. An employee may be further classified as one of the following:
faculty
staff
Each of these employee types is described by attributes that include all the attributes of entity
type EMPLOYEE plus possibly additional attributes. For example, the FACULTY entity type may
be described further by the attributes Ph.D. status and number of research papers published.
In contrast, the STAFF entity type may be described further by the attribute technical or
non-technical. The process of designating subgroupings within an entity type is called
specialization. The specialization of an employee allows us to distinguish among persons
according to whether they are faculty or staff.
Generalization
The refinement from an initial entity type into successive levels of entity subgroupings
represents a top-down design process in which distinctions are made explicit. The design
process may also proceed in a bottom-up manner, in which multiple entity types are
synthesized into a higher-level entity type based on common features.
The database designer may have first identified a FACULTY entity type with the attributes
employee id, first name, middle name, last name, gender, date of joining, designation,
salary, Ph.D. status, number of research papers published, and a STAFF entity type with the
attributes employee id, first name, middle name, last name, gender, date of joining,
designation, salary, technical or non-technical. There are similarities between the FACULTY
entity type and the STAFF entity type because they have several attributes in common.
In our example, the EMPLOYEE is the higher-level entity type and FACULTY and STAFF are
lower-level entity types. Higher- and lower-level entity types also may be designated by the
terms superclass and subclass, respectively. The EMPLOYEE entity type is the superclass of
the FACULTY and STAFF subclasses. For all practical purposes, generalization is a simple
inversion of specialization. We will apply both processes, in combination, in the course of
designing the ER schema for an enterprise.
Attribute Inheritance
A crucial property of the higher- and lower-level entities created by specialization and
generalization is attribute inheritance. The attributes of the higher-level entity types are said
to be inherited by the lower-level entity types. For example, FACULTY and STAFF inherit the
attributes of the EMPLOYEE. Thus, FACULTY is described by its employee id, first name,
middle name, last name, gender, date of joining, designation, salary attributes, and
additionally, Ph.D. status, number of research papers published attributes. STAFF is
described by its employee id, first name, middle name, last name, gender, date of joining,
designation, salary attributes, and additionally technical or non-technical attribute. A lower-
level entity type (or subclass) also inherits participation in the relationship types in which its
higher-level entity (or superclass) participates.
Whether a given portion of an ER model was arrived at by specialization or generalization, the
outcome is the same:
A higher-level entity type with attributes and relationships that apply to all of its
lower-level entity types
Lower-level entity types with distinctive features that apply only within a particular
lower-level entity type
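As an illustration only, attribute inheritance can be mimicked with class inheritance in Python. The class and field names below are assumptions chosen to mirror the EMPLOYEE, FACULTY, and STAFF attributes listed above; the ER model itself does not prescribe any programming language.

```python
from dataclasses import dataclass, fields


@dataclass
class Employee:
    """Higher-level entity type (superclass)."""
    employee_id: int
    first_name: str
    middle_name: str
    last_name: str
    gender: str
    date_of_joining: str
    designation: str
    salary: float


@dataclass
class Faculty(Employee):
    """Lower-level entity type: inherits every EMPLOYEE attribute."""
    phd_status: bool
    research_papers_published: int


@dataclass
class Staff(Employee):
    """Lower-level entity type with its own distinguishing attribute."""
    technical: bool  # True = technical, False = non-technical


def attribute_names(entity_type):
    """Return the attribute names of an entity type, inherited ones first."""
    return [f.name for f in fields(entity_type)]


inherited = attribute_names(Employee)
faculty_only = [a for a in attribute_names(Faculty) if a not in inherited]
staff_only = [a for a in attribute_names(Staff) if a not in inherited]

print(faculty_only)
print(staff_only)
```

FACULTY and STAFF each carry the eight EMPLOYEE attributes plus their own, which is the outcome described above regardless of whether the design was reached by specialization or generalization.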
Constraints on generalization
There are certain constraints that database designers choose to put on a particular
generalization.
One type of constraint involves determining which entities can be members of a given lower-
level entity type. Such membership may be one of the following:
Condition-defined: In condition-defined lower-level entity types, membership
is evaluated based on whether or not an entity satisfies an explicit condition or predicate.
For example, assume that the higher-level entity type EMPLOYEE has the attribute
employee-type. All employee entities are evaluated on the defining employee-type
attribute. Only those entities that satisfy the condition employee-type = “FACULTY” are
allowed to belong to the lower-level entity type FACULTY. All entities that satisfy the
condition employee-type = “STAFF” are included in STAFF. Since all the lower-level entities
are evaluated based on the same attribute (in this case, on employee-type), this type of
generalization is said to be attribute-defined.
User-defined: User-defined lower-level entity types are not constrained by a membership
condition; rather, the database user assigns entities to a given entity type. For instance,
let us assume that employees are assigned to one of four work teams after three months
of employment. We, therefore, represent the teams as four lower-level entity types of
the higher-level EMPLOYEE entity type. A given employee is not assigned to a specific
team entity automatically based on an explicit defining condition. Instead, the user in
charge of this decision makes the team assignment on an individual basis. The assignment
is implemented by an operation that adds an entity to an entity type.
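The difference between the two kinds of membership can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the dictionary layout, the function names, and the sample employees and team names are hypothetical and serve only to contrast a predicate-based assignment with a manual one.

```python
# Condition-defined: membership follows from a predicate on employee_type.
CONDITIONS = {
    "FACULTY": lambda e: e["employee_type"] == "FACULTY",
    "STAFF": lambda e: e["employee_type"] == "STAFF",
}


def lower_level_types(employee):
    """Return the lower-level entity types whose condition the employee satisfies."""
    return [name for name, holds in CONDITIONS.items() if holds(employee)]


# Hypothetical sample entities.
employees = [
    {"employee_id": 1, "employee_type": "FACULTY"},
    {"employee_id": 2, "employee_type": "STAFF"},
]

# User-defined: there is no predicate; a user places each employee in a team.
teams = {"TEAM_1": set(), "TEAM_2": set()}


def assign_to_team(team_name, employee_id):
    """The operation that adds an entity to a lower-level entity type."""
    teams[team_name].add(employee_id)


assign_to_team("TEAM_2", 1)

print(lower_level_types(employees[0]))
print(teams)
```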
The second type of constraint relates to whether or not entities may belong to more than one
lower-level entity type within a single generalization. The lower-level entity types may be one
of the following:
Disjoint: A disjointness constraint requires that an entity belongs to no more than one
lower-level entity type. In our example, an EMPLOYEE entity can satisfy only one
condition; an entity can be either a faculty or a staff, but cannot be both.
Overlapping: In overlapping generalizations, the same entity may belong to more than
one lower-level entity type within a single generalization. For an illustration, consider the
employee work team example, and assume that certain managers participate in more
than one work team. Therefore, a given employee may appear in more than one of the
team entity types that are lower-level entity types of the EMPLOYEE. Thus, the
generalization is overlapping.
Lower-level entity overlap is the default case; a disjointness constraint must be placed
explicitly on a generalization (or specialization). We can note a disjointness constraint in an
ER diagram by adding the word disjoint next to the triangle symbol.
The final constraint, the completeness constraint on a generalization or specialization,
specifies whether or not an entity in the higher-level entity type must belong to at least one
of the lower-level entity types within the generalization/specialization. This constraint may
be one of the following:
Total generalization or specialization. Each higher-level entity must belong to a lower-
level entity type.
Partial generalization or specialization. Some higher-level entities may not belong to
any lower-level entity type.
Partial generalization is the default. We can specify total generalization in an ER diagram using
a double line to connect the box representing the higher-level entity type to the triangle
symbol. (This notation is similar to the notation for total participation in a relationship.)
The employee generalization is total: all employee entities must be either faculty or staff.
Because the higher-level entity type arrived at through generalization is generally composed
of only those entities in the lower-level entity types, the completeness constraint for a
generalized higher-level entity type is usually total. When the generalization is partial, a
higher-level entity is not constrained to appear in a lower-level entity type. The work team
entity types illustrate a partial specialization. Since employees are assigned to a team only
after three months on the job, some employee entities may not be members of any lower-
level team entity types.
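The disjointness and completeness constraints can be checked mechanically once membership is written down as sets. The following is a minimal sketch, assuming each lower-level entity type is represented by the set of employee ids that belong to it; the ids and team names are made-up sample data.

```python
def is_disjoint(lower_level):
    """True if no entity appears in more than one lower-level entity type."""
    seen = set()
    for members in lower_level.values():
        if seen & members:
            return False
        seen |= members
    return True


def is_total(higher_level, lower_level):
    """True if every higher-level entity belongs to at least one lower-level type."""
    covered = set().union(*lower_level.values())
    return higher_level <= covered


employee_ids = {1, 2, 3, 4}

# FACULTY/STAFF: every employee is exactly one of the two.
by_type = {"FACULTY": {1, 2}, "STAFF": {3, 4}}

# Work teams: employee 2 is in two teams, employee 4 is not yet in any team.
by_team = {"TEAM_1": {1, 2}, "TEAM_2": {2, 3}}

print(is_disjoint(by_type), is_total(employee_ids, by_type))
print(is_disjoint(by_team), is_total(employee_ids, by_team))
```

With this sample data the FACULTY/STAFF generalization comes out disjoint and total, while the work-team specialization comes out overlapping and partial, matching the discussion above.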
Aggregation
One limitation of ER modeling is that it cannot express relationships among relationships.
Aggregation addresses this need. Aggregation is an abstraction through
which a relationship, together with its participating entities, is aggregated into a higher-level entity.
As shown in Figure 2.34, a department has many sections, and a student needs to be
seated in a section of the enrolled department. So, SEATED IN relationship is needed between
the relationship HAS and entity type STUDENT. Using aggregation, HAS relationship with its
entity types DEPARTMENT and SECTION is aggregated into a single entity and the relationship
SEATED IN is created between the aggregated entity and STUDENT entity type.
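One way to picture aggregation is to treat each instance of the HAS relationship, that is, each (department, section) pair, as a single unit that SEATED IN can refer to. The sketch below assumes this pair representation; the department names, section names, and roll numbers are hypothetical.

```python
# Instances of the HAS relationship: (department, section) pairs.
has = {("CSE", "A"), ("CSE", "B"), ("ECE", "A")}

# SEATED IN relates a student (by roll number) to one aggregated HAS instance.
seated_in = {}


def seat_student(roll_no, department, section):
    """Seat a student in a section that the department actually has."""
    pair = (department, section)
    if pair not in has:
        raise ValueError(f"{department} has no section {section}")
    seated_in[roll_no] = pair


seat_student(101, "CSE", "B")

try:
    seat_student(102, "ECE", "B")  # ECE has no section B in the sample data
    rejected = False
except ValueError:
    rejected = True

print(seated_in, rejected)
```

Because SEATED IN points at a HAS instance rather than at SECTION alone, a student can only be seated in a section that really exists in the given department.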
Completion of ER Diagram – College Case Study
As we now understand the requirements from the user (please refer to 2.1), we will develop the ER
diagram of the requirements we have understood.
Below are the four steps in designing an ERD for a DBMS.
1. Identify Entity
2. Decide relationships and cardinality
3. Draw Entities and relationships separately
4. Connect relationships and entities
Identify Entity
Identify the nouns in the statement below and mark them in bold italics.
In a College, there are several departments, and each department has one head of the
department (HOD). Department has a name, its location, and students that belong to the
department. A student can belong to only one department, and a department can have many
students. If a department has recently come into existence, it might not have any students.
Students have roll number, name, date of birth, gender, hobby, phone number, and address.
Faculty has a name, designation, date of joining, gender & salary and belongs to a particular
department. A department can run many courses with assigned credits, and students can
study any number of courses being offered by various departments. There are sections within
each department, and each section has many students. Each section has its name, maximum
capacity of that section, and the number of students in that section. Students need to do one
mini-project individually. Each available mini-project has to be chosen based on its name,
domain, subject, and description. Faculty members can teach multiple courses in multiple
departments. One course can be taught by many faculty members across departments.
Faculty members can be part of multiple research projects. These projects are either
sponsored by the government, industry, or the college itself. The faculty member can do one
or more projects, and one project can have more than one faculty member. Research projects
have fixed duration, and their status needs to be tracked regularly.
A student has a roll number, first name, middle name, last name, date of birth, hobby, gender,
phone numbers, and address.
Faculty has faculty id, first name, middle name, last name, gender, date of joining,
designation, and salary.
Figure 2.36 FACULTY entity type and its attributes
Research Project has research project id, research project name, sponsoring agency, duration,
and status.
Section has name, max capacity and the number of students in that section.
Mini project has project id, project name, domain, subject, and description.
Statement – A faculty can teach zero or more courses, and a course can be taught by zero or
more faculty.
Relationship- Teaches
Cardinality- M : N
Participation- Partial for FACULTY entity type (a new faculty might not have been allocated a
course yet) and Partial for COURSE entity type (a new course might not have been opted for by any
faculty yet).
Statement –A student belongs to one department, and a department can have zero or more
students
Relationship- Belongs To
Cardinality- M : 1
Participation- Total for STUDENT entity type (a student has to belong to a department to
exist) and Partial for DEPARTMENT entity type (a new department might not have any student
yet).
Figure 2.43 Representation of Total and Partial relationship between STUDENT and DEPARTMENT
Statement – A department can have zero or more sections; a section must belong to a
department. SECTION is a weak entity type as it does not have a key attribute of its own. The same
section name can exist in all departments. Therefore, the relationship between department
and section is an identifying relationship. A section is identified by its department.
Relationship- Has (Identifying Relationship)
Cardinality- 1 : M
Participation- Partial for DEPARTMENT entity type (a new department might not have
sections created yet) and Total for SECTION entity type (a section has no existence without
a department).
Figure 2.44 Representation of Partial and Total relationship between SECTION and DEPARTMENT
Statement –A department can have zero or more faculty. Every faculty must be part of only
one department.
Relationship- Has
Cardinality- 1 : M
Participation- Partial for DEPARTMENT entity type (a new department might not have any
faculty yet) and Total for FACULTY entity type (a faculty has to belong to a department to
exist).
Figure 2.45 Representation of Total and Partial relationship between FACULTY and DEPARTMENT
Statement – Each student has to do one mini project individually and every available mini-
project has to be opted by a student.
Relationship- Opts
Cardinality- 1 : 1
Participation- Total for STUDENT entity type (a student has to opt for a mini-project) and Total
for MINI PROJECT entity type (a mini-project has to be opted for by a student).
Figure 2.46 Representation of Total relationship between STUDENT and MINI PROJECT
Statement – A department can run zero or more courses, and each course must belong to
only one department.
Relationship- Runs
Cardinality- 1 : M
Participation- Partial for DEPARTMENT entity type (a new department might not be running
any course yet) and Total for COURSE entity type (a course has to belong to a department to
exist).
Figure 2.47 Representation of Total and Partial relationship between COURSE and DEPARTMENT
Statement – A section can have zero or more students, and each student must belong to
exactly one section.
Relationship- Seated In
Cardinality- M : 1
Participation- Total for STUDENT entity type (a student has to be seated in a section) and Partial
for SECTION entity type (a new section might not have been allocated a student yet).
Figure 2.48 Representation of Total and Partial relationship between STUDENT and SECTION
Statement – Each student studies one or more courses as part of the curriculum, and each
course can be opted by zero or more students.
Relationship- Studies
Cardinality- M : N
Participation- Total for STUDENT entity type (a student has to study a course) and Partial for
COURSE entity type (a new course might not have been opted for by a student yet).
Figure 2.49 Representation of Total and Partial relationship between STUDENT and COURSE
Statement – A faculty may work on many research projects, and each research project must have
one or more faculty working on it.
Relationship- Works On
Cardinality- M : N
Participation- Partial for FACULTY entity type (a faculty may not be working on any research project) and
Total for RESEARCH PROJECT entity type (each research project needs to have faculty working
on it).
Figure 2.50 Representation of Total and Partial relationship between FACULTY and RESEARCH PROJECT
Statement – Each faculty has an HOD, who is also a faculty. An HOD can have one or
more faculty under him/her.
Relationship- Works For
Cardinality- M : 1
Participation- Total for HOD Role (HOD will have faculty under him/her) and Partial for
Subordinate Role (The faculty who is a HOD will not have a HOD for him/her).
Figure 2.51 Representation of Total and Partial in FACULTY relationship as different Role
Complete ER Diagram-
Figure 2.51a Complete ER diagram of Case Study
A Relational database is the type of database based on the relational model proposed by E.F.
Codd in 1970. A relational database is a collection of Relations. E.F. Codd originally
described a Relation as a set of tuples/records (d1, d2, …, dn) where each element di belongs to
a domain Di. The domain is a set of values that an attribute can hold. For example, the domain
for attribute roll_no of an entity type STUDENT is the set of all roll numbers. A relational
database of any organization or what we can call a mini-world stores information about all its
entity types in Relations. These Relations are also called Tables and each Relation is assigned
a unique name in the database. A Relation or Table is shown in Figure 2.52, in which a row
keeps the record of an entity instance and each column holds the values of an attribute for
the complete entity set. A row in a Relation represents the relationship among values of
different attributes for an entity instance. The software used to manage a relational database
is called a Relational Database Management System (RDBMS).
A Table as shown in Figure 2.53 shows a Relation named STUDENT. It keeps a record of five
students in five rows, which are also called tuples. This Relation has eleven attributes named
roll_no, first_name, middle_name, last_name, dob, gender, house_no, street_name, city,
state, pin-code.
Figure 2.53 The STUDENT Relation (6 entity instances and 11 attributes)
1. Relation schema
A Relation schema describes the structure of a Relation (Table): it includes the Relation name
and the names of its attributes.
For example, the Relation schema of two Relations FACULTY and COURSE can be written as:
FACULTY (faculty_id, first_name, middle_name, last_name, designation, doj, gender,
salary)
COURSE (course_id, course_name, course_credit)
The degree or arity of a Relation is the number of attributes in a Relation schema. For
example, the degree of Relation schema FACULTY is eight, and the degree of Relation schema
COURSE is three.
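The degree of a Relation can be read off its schema programmatically. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the column types are assumptions, since the schemas above list only attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE faculty (
    faculty_id  INTEGER PRIMARY KEY,
    first_name  TEXT,
    middle_name TEXT,
    last_name   TEXT,
    designation TEXT,
    doj         TEXT,
    gender      TEXT,
    salary      REAL
);
CREATE TABLE course (
    course_id     INTEGER PRIMARY KEY,
    course_name   TEXT,
    course_credit INTEGER
);
""")


def degree(table):
    """Degree (arity) = number of attributes in the Relation schema."""
    # PRAGMA table_info returns one row per column of the table.
    return len(conn.execute(f"PRAGMA table_info({table})").fetchall())


print(degree("faculty"), degree("course"))
```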
2. Relation instance
The snapshot of data stored in a Relation at any instant of time is called a Relation instance.
For example, an instance of the Relation STUDENT is shown in
Figure 2.
Figure 2.54 A Relational Database that stores information of FACULTY, STUDENT, COURSE and DEPARTMENT
(An instance of relational database: one possible state).
The collection of data stored in the database at any instant of time is called a database
instance. The overall design or structure of the database is called the database schema. A
relational database schema is the collection of relation schemas and is shown in
Figure 2.42
or it can also be represented as:
After designing the conceptual model of the database using an ER diagram, we need to
convert the conceptual model into the relational model, which can be implemented using
any RDBMS, such as Oracle or MySQL.
Let’s see what a Relational Model is.
Relational Model
The relational model represents how data is stored in Relational Databases. A relational
database stores data in the form of Relations (Tables). In the relational model, entity types
and their corresponding relationships are represented by a collection of inter-related Tables.
Each Table is a group of columns and rows, where a column represents an attribute of an
entity type and each row represents an instance of it.
The following rules are used for converting an ER diagram into relational tables:
For each regular (strong) entity type E in the ER schema, create a Relation R that includes all
the simple attributes of E.
A regular (strong) entity type with only simple attributes will be converted into a single
Relation (Table).
Attributes of the strong entity type will become the fields (column) of Relation with
their respective data types.
The key attribute of the entity type will become the primary key of the Table.
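The rule for strong entity types can be tried out with Python's built-in sqlite3 module. This is a sketch only: dept_id is the DEPARTMENT key used later in these notes, while the column names dept_name and location, the data types, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Strong entity type DEPARTMENT becomes one Table; its key attribute
# becomes the primary key.
conn.execute("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL,
    location  TEXT
)
""")

conn.execute("INSERT INTO department VALUES (1, 'Computer Science', 'Block A')")

# A second row with the same key value violates the primary key.
try:
    conn.execute("INSERT INTO department VALUES (1, 'Electronics', 'Block B')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM department").fetchone()[0]
print(duplicate_rejected, row_count)
```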
For Example:-
For each weak entity type W in the ER schema with owner entity type E, create a Relation R,
and include all simple attributes (or simple components of composite attributes) of W as
attributes of R. In addition, have as foreign key attributes of R, the primary key attribute(s) of
the Relation(s) that correspond to the owner entity type(s). This takes care of the identifying
relationship type of W. The primary key of R is the combination of the primary key(s) of the
owner(s) and the partial key of the weak entity type W.
Figure 2.59 Weak entity type SECTION and its owner entity type DEPARTMENT
As shown in ER diagram in Figure 2.59, SECTION is a weak entity type and its owner entity
type is DEPARTMENT; the Relation schema of SECTION will be represented as follows:
SECTION (section_name, max_capacity, student_nos, dept_id)
In this example, section_name is a partial key attribute of a weak entity type SECTION, and
dept_id is the primary key attribute of a strong entity type DEPARTMENT. We will create a
foreign key constraint on the SECTION schema referencing the primary key attribute of the
DEPARTMENT schema. The combination of dept_id and section_name will form the primary
key of Relation SECTION.
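The SECTION mapping can be sketched with Python's built-in sqlite3 module. The attribute names follow the SECTION schema above; the data types, the dept_name column, and the sample rows are assumptions. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT
);
CREATE TABLE section (
    section_name TEXT    NOT NULL,   -- partial key of the weak entity type
    max_capacity INTEGER,
    student_nos  INTEGER,
    dept_id      INTEGER NOT NULL REFERENCES department(dept_id),
    PRIMARY KEY (dept_id, section_name)
);
INSERT INTO department VALUES (1, 'CSE'), (2, 'ECE');
-- The same section name may exist in different departments.
INSERT INTO section VALUES ('A', 60, 0, 1), ('A', 60, 0, 2);
""")


def try_insert(sql):
    """Return True if the row is accepted, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Same section name twice in one department: primary key violation.
duplicate_ok = try_insert("INSERT INTO section VALUES ('A', 40, 0, 1)")
# Section for a department that does not exist: foreign key violation.
orphan_ok = try_insert("INSERT INTO section VALUES ('B', 40, 0, 99)")

section_count = conn.execute("SELECT COUNT(*) FROM section").fetchone()[0]
print(duplicate_ok, orphan_ok, section_count)
```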
Multivalued Attributes
For each multivalued attribute A, create a new Relation R. This Relation R will include an
attribute corresponding to A, plus the primary key attribute K—as a foreign key in R—of the
Relation representing the entity type or relationship type that has A as an attribute. The primary
key of R is the combination of A and K. For example, phone_no and hobby attributes of
STUDENT entity type are multivalued attributes which mean a student can have multiple
phone numbers and multiple hobbies.
In this case, we need to create a separate Relation student_phone_no to hold all the phone
numbers related to a student. In this new Relation, roll_no from the STUDENT Relation will
be included as a foreign key to map different phone numbers of a student to his/her roll_no.
The combination of roll_no and phone_no will be the primary key of Relation
student_phone_no. We will similarly handle the hobby attribute of the STUDENT entity type.
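A minimal sketch of the student_phone_no Relation, using Python's built-in sqlite3 module, is given below. The STUDENT table is cut down to two columns for brevity; the data types and sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (
    roll_no    INTEGER PRIMARY KEY,
    first_name TEXT
);
-- One row per (student, phone number) pair.
CREATE TABLE student_phone_no (
    roll_no  INTEGER NOT NULL REFERENCES student(roll_no),
    phone_no TEXT    NOT NULL,
    PRIMARY KEY (roll_no, phone_no)
);
INSERT INTO student VALUES (1, 'Asha');
INSERT INTO student_phone_no VALUES (1, '555-0101'), (1, '555-0102');
""")

# All phone numbers of student 1.
phones = [row[0] for row in conn.execute(
    "SELECT phone_no FROM student_phone_no WHERE roll_no = 1 ORDER BY phone_no"
)]

# Storing the same number twice for the same student violates the primary key.
try:
    conn.execute("INSERT INTO student_phone_no VALUES (1, '555-0101')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

print(phones, duplicate_accepted)
```

A student_hobby Relation with the columns roll_no and hobby would follow the same pattern.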
Figure 2.46 Representation of Multivalued attribute (Schema)
Composite Attributes
For each binary 1:1 relationship type R in the ER schema, identify the Relations S and T
corresponding to the entity types participating in R.
1. Foreign key approach: Choose one of the Relations say S and include as a foreign key
in S the primary key of T. It is better to choose an entity type with total participation
in R in the role of S. Include all the simple attributes (or simple components of
composite attributes) of the 1:1 relationship type R as attributes of S.
For example, in our case study – Each student has to do one mini-project individually
and every available mini-project has to be opted by a student.
Relationship- Opts
Cardinality- 1 : 1
Relation Schema: MINI_PROJECT (miniproj_id, domain, subject, description, roll_no)
Note that it is possible to include the primary key of S as a foreign key in T instead.
Both entity types have total participation in the relationship in this case. So, as an
alternative, we can add the miniproj_id attribute to the Relation schema of STUDENT
as a foreign key attribute referencing the Relation schema of MINI_PROJECT.
2. Merged relation approach: An alternative mapping of a 1:1 relationship type is to
merge the two entity types and the relationship into a single Relation. This is possible
when both participations are total, as this would indicate that the two Tables will have
the exact same number of tuples at all times.
3. Cross-reference or relationship relation approach: The third option is to set up a third
Relation R to cross-reference the primary keys of the two Relations S and T
representing the entity types. As we will see, this approach is required for binary M:N
relationships. The Relation R is called a relationship relation (or sometimes a lookup
Table) because each tuple in R represents a relationship instance that relates one tuple
from S with one tuple from T. The Relation R will include the primary key attributes of
S and T as foreign keys to S and T. The primary key of R will be one of the two foreign
keys, and the other foreign key will be a unique key of R. The drawback is having an
extra Relation and requiring extra join operations when combining related tuples from
the Tables.
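The foreign key approach for the Opts relationship can be sketched with Python's built-in sqlite3 module. The MINI_PROJECT columns follow the Relation schema given above; the UNIQUE constraint on roll_no is an assumption added here to keep the relationship 1:1 (without it the same foreign key would model 1:N), and the data types and sample rows are also assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (
    roll_no    INTEGER PRIMARY KEY,
    first_name TEXT
);
CREATE TABLE mini_project (
    miniproj_id INTEGER PRIMARY KEY,
    domain      TEXT,
    subject     TEXT,
    description TEXT,
    -- NOT NULL: every mini-project is opted for by a student.
    -- UNIQUE:   a student opts for at most one mini-project.
    roll_no     INTEGER NOT NULL UNIQUE REFERENCES student(roll_no)
);
INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO mini_project VALUES (10, 'Web', 'DBMS', 'Library portal', 1);
""")


def try_insert(sql):
    """Return True if the row is accepted, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second mini-project for student 1 breaks the 1:1 cardinality.
second_for_same_student = try_insert(
    "INSERT INTO mini_project VALUES (11, 'AI', 'ML', 'Spam filter', 1)")
# A mini-project for student 2 is fine.
first_for_other_student = try_insert(
    "INSERT INTO mini_project VALUES (12, 'AI', 'ML', 'Spam filter', 2)")

print(second_for_same_student, first_for_other_student)
```

This schema does not by itself force every student to have a mini-project; total participation on the STUDENT side would need an additional check in the application or the RDBMS.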
For binary 1:N relationship types, two approaches are possible: the foreign key approach and the cross-reference or relationship relation approach.
1. The foreign key approach: For each regular binary 1:N relationship type R, identify the
Relation S representing the participating entity type at the N-side of the relationship
type. Include as foreign key in S the primary key of the Relation T representing the
other entity type participating in R; we do this because each entity instance on the N-
side is related to at most one entity instance on the 1-side of the relationship type.
Include any simple attributes (or simple components of composite attributes) of the
1:N relationship type as attributes of S.
For example, in our case study – A department can run zero or more courses and each
course must belong to only one department.
Relationship- Runs
Cardinality- 1 : M
Relation Schema: COURSE (course_id, course_name, course_credit, dept_id)
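The Runs relationship can be sketched with Python's built-in sqlite3 module as follows. The COURSE columns follow the Relation schema above; the NOT NULL constraint on dept_id is an assumption that expresses the total participation of COURSE, and the data types, dept_name column, and sample rows are likewise assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT
);
-- COURSE is on the N-side, so it carries the foreign key.
CREATE TABLE course (
    course_id     INTEGER PRIMARY KEY,
    course_name   TEXT,
    course_credit INTEGER,
    dept_id       INTEGER NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'CSE'), (2, 'ECE');
INSERT INTO course VALUES (101, 'DBMS', 4, 1), (102, 'OS', 4, 1);
""")

# Number of courses run by each department (a new department may run none).
courses_per_dept = conn.execute("""
    SELECT d.dept_name, COUNT(c.course_id)
    FROM department AS d
    LEFT JOIN course AS c ON c.dept_id = d.dept_id
    GROUP BY d.dept_id
    ORDER BY d.dept_id
""").fetchall()

# A course without a department is rejected.
try:
    conn.execute("INSERT INTO course VALUES (103, 'Networks', 3, NULL)")
    orphan_course_accepted = True
except sqlite3.IntegrityError:
    orphan_course_accepted = False

print(courses_per_dept, orphan_course_accepted)
```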
In the traditional relational model with no multivalued attributes, the only option for M:N
relationships is the relationship relation (cross-reference) option. For each binary M:N
relationship type R, create a new Relation S to represent R. Include as foreign key attributes
in S the primary keys of the Relations that represent the participating entity types; their
combination will form the primary key of S. Also include any simple attributes of the M:N
relationship type (or simple components of composite attributes) as attributes of S. Notice
that we cannot represent an M:N relationship type by a single foreign key attribute in one of
the participating Relations (as we did for 1:1 or 1:N relationship types) because of the M:N
cardinality ratio; we must create a separate relationship Relation S.
For example, in our case study – A faculty can teach zero or more courses, and a course can be
taught by zero or more faculty.
Relationship- Teaches
Cardinality- M : N
Relation Schema: FACULTY_COURSE (faculty_id, course_id)
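The Teaches relationship can be sketched with Python's built-in sqlite3 module. The FACULTY_COURSE columns follow the Relation schema above; FACULTY and COURSE are cut down to two columns each, and the data types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE faculty (
    faculty_id INTEGER PRIMARY KEY,
    first_name TEXT
);
CREATE TABLE course (
    course_id   INTEGER PRIMARY KEY,
    course_name TEXT
);
-- Relationship relation: one row per (faculty, course) pair.
CREATE TABLE faculty_course (
    faculty_id INTEGER NOT NULL REFERENCES faculty(faculty_id),
    course_id  INTEGER NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (faculty_id, course_id)
);
INSERT INTO faculty VALUES (1, 'Meera'), (2, 'Arjun');
INSERT INTO course VALUES (101, 'DBMS'), (102, 'OS');
INSERT INTO faculty_course VALUES (1, 101), (1, 102), (2, 101);
""")

# Courses taught by faculty 1 (one faculty, many courses).
courses_of_faculty_1 = [row[0] for row in conn.execute("""
    SELECT c.course_name
    FROM faculty_course AS fc
    JOIN course AS c ON c.course_id = fc.course_id
    WHERE fc.faculty_id = 1
    ORDER BY c.course_name
""")]

# Faculty teaching course 101 (one course, many faculty).
faculty_of_course_101 = [row[0] for row in conn.execute("""
    SELECT f.first_name
    FROM faculty_course AS fc
    JOIN faculty AS f ON f.faculty_id = fc.faculty_id
    WHERE fc.course_id = 101
    ORDER BY f.first_name
""")]

print(courses_of_faculty_1, faculty_of_course_101)
```

Both directions of the M:N relationship are answered by joining through FACULTY_COURSE, which is the extra join mentioned above as the cost of a relationship relation.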
Complete Relational Model of Case Study
Figure 2.68 Representation of various Tables and the relationship between them
Appendix 1: Various Notations to Represent the ER Model
Appendix 2: ER Diagram of Case Study in Crow’s Foot notation