DBMS Unit1 Notes


Database Management System

UNIT –I
Database System Applications
A Historical Perspective
The concept of a database was made possible by the emergence of direct access
storage media such as magnetic disks, which became widely available in the mid-1960s;
earlier systems relied on sequential storage of data on magnetic tape. The subsequent
development of database technology can be divided into three eras based on data model or
structure: navigational, SQL/relational, and post-relational.

The two main early navigational data models were the hierarchical model and
the CODASYL model (network model). These were characterized by the use of pointers
(often physical disk addresses) to follow relationships from one record to another.

The relational model, first proposed in 1970 by Edgar F. Codd, departed from
this tradition by insisting that applications should search for data by content, rather than by
following links. The relational model employs sets of ledger-style tables, each used for a
different type of entity. Only in the mid-1980s did computing hardware become powerful
enough to allow the wide deployment of relational systems (DBMSs plus applications). By
the early 1990s, however, relational systems dominated in all large-scale data
processing applications, and as of 2018 they remain dominant: IBM Db2, Oracle, MySQL,
and Microsoft SQL Server are the most searched DBMS. The dominant database language,
standardized SQL for the relational model, has influenced database languages for other data
models.

Object databases were developed in the 1980s to overcome the inconvenience
of object–relational impedance mismatch, which led to the coining of the term "post-relational" and also the
development of hybrid object–relational databases.

The next generation of post-relational databases in the late 2000s became known
as NoSQL databases, introducing fast key–value stores and document-oriented databases. A
competing "next generation" known as NewSQL databases attempted new implementations
that retained the relational/SQL model while aiming to match the high performance of
NoSQL compared to commercially available relational DBMSs.
1960s, Navigational DBMS
The introduction of the term database coincided with the availability of direct-
access storage (disks and drums) from the mid-1960s onwards. The term represented a
contrast with the tape-based systems of the past, allowing shared interactive use rather than
daily batch processing. The Oxford English Dictionary cites a 1962 report by the System
Development Corporation of California as the first to use the term "data-base" in a specific
technical sense.

As computers grew in speed and capability, a number of general-purpose
database systems emerged; by the mid-1960s a number of such systems had come into
commercial use. Interest in a standard began to grow, and Charles Bachman, author of one
such product, the Integrated Data Store (IDS), founded the Database Task Group
within CODASYL, the group responsible for the creation and standardization of COBOL.

1970s, Relational DBMS


Edgar F. Codd worked at IBM in San Jose, California, in one of their offshoot
offices that were primarily involved in the development of hard disk systems. He was
unhappy with the navigational model of the CODASYL approach, notably the lack of a
"search" facility. In 1970, he wrote a number of papers that outlined a new approach to
database construction that eventually culminated in the groundbreaking A Relational Model
of Data for Large Shared Data Banks.
In this paper, he described a new system for storing and working with large
databases. Instead of records being stored in some sort of linked list of free-form records as in
CODASYL, Codd's idea was to organize the data as a number of "tables", each table being
used for a different type of entity. Each table would contain a fixed number of columns
containing the attributes of the entity. One or more columns of each table were designated as
a primary key by which the rows of the table could be uniquely identified; cross-references
between tables always used these primary keys, rather than disk addresses, and queries would
join tables based on these key relationships, using a set of operations based on the
mathematical system of relational calculus (from which the model takes its name). Splitting
the data into a set of normalized tables (or relations) aimed to ensure that each "fact" was
only stored once, thus simplifying update operations. Virtual tables called views could present
the data in different ways for different users, but views could not be directly updated.
Codd used mathematical terms to define the model: relations, tuples, and
domains rather than tables, rows, and columns. The terminology that is now familiar came
from early implementations. A common use of a database system is to track information
about users: their names, login information, various addresses, and phone numbers. In the
relational approach, the data would be normalized into a user table, with repeating items
such as addresses and phone numbers held in separate tables that refer back to the user by key.
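The table-and-key idea described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (users, phone), the column names, and the sample rows are assumptions made for the example, not part of Codd's paper or of these notes.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per entity type, each with a primary key.
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        login   TEXT NOT NULL
    );
    -- Phone numbers refer to users by key, not by disk address.
    CREATE TABLE phone (
        phone_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id),
        number   TEXT NOT NULL
    );
    -- A view presents the joined data as a virtual table.
    CREATE VIEW user_phones AS
        SELECT u.name, p.number
        FROM users AS u JOIN phone AS p ON p.user_id = u.user_id;
""")

conn.execute("INSERT INTO users VALUES (1, 'Asha', 'asha01')")
conn.executemany(
    "INSERT INTO phone (user_id, number) VALUES (?, ?)",
    [(1, "555-0100"), (1, "555-0101")],
)

# The user's name is stored once, however many phone numbers exist.
rows = conn.execute(
    "SELECT name, number FROM user_phones ORDER BY number"
).fetchall()
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(rows, user_count)
```

The query finds data by content (the matching user_id values) rather than by following stored pointers, which is the contrast with the navigational model drawn above.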
Integrated approach
In the 1970s and 1980s, attempts were made to build database systems with
integrated hardware and software. The underlying philosophy was that such integration
would provide higher performance at a lower cost. Examples were IBM System/38, the early
offering of Teradata, and the Britton Lee, Inc. database machine.
Late 1970s, SQL DBMS
IBM started working on a prototype system loosely based on Codd's concepts
as System R in the early 1970s. The first version was ready in 1974/5, and work then started
on multi-table systems in which the data could be split so that all of the data for a record
(some of which is optional) did not have to be stored in a single large "chunk". Subsequent
multi-user versions were tested by customers in 1978 and 1979, by which time a
standardized query language – SQL – had been added.
1980s, on the Desktop
The 1980s ushered in the age of desktop computing. The new computers
empowered their users with spreadsheets like Lotus 1-2-3 and database software like dBASE.
The dBASE product was lightweight and easy for any computer user to understand out of the
box. C. Wayne Ratliff, the creator of dBASE, stated: "dBASE was different from programs
like BASIC, C, FORTRAN, and COBOL in that a lot of the dirty work had already been
done. The data manipulation is done by dBASE instead of by the user, so the user can
concentrate on what he is doing, rather than having to mess with the dirty details of opening,
reading, and closing files, and managing space allocation." dBASE was one of the top selling
software titles in the 1980s and early 1990s.
1990s, Object-Oriented
The 1990s, along with a rise in object-oriented programming, saw a growth in
how data in various databases were handled. Programmers and designers began to treat the
data in their databases as objects. That is to say that if a person's data were in a database, that
person's attributes, such as their address, phone number, and age, were now considered to
belong to that person instead of being extraneous data. This allows for relations between data
to be related to objects and their attributes and not to individual fields.
2000s, NoSQL and NewSQL
XML databases are a type of structured document-oriented database that allows
querying based on XML document attributes. XML databases are mostly used in applications
where the data is conveniently viewed as a collection of documents, with a structure that can
vary from the very flexible to the highly rigid: examples include scientific articles, patents,
tax filings, and personnel records.
NoSQL databases are often very fast, do not require fixed table schemas, avoid
join operations by storing denormalized data, and are designed to scale horizontally.
In recent years, there has been a strong demand for massively distributed
databases with high partition tolerance, but according to the CAP theorem, it is impossible
for a distributed system to simultaneously provide consistency, availability, and partition
tolerance guarantees. A distributed system can satisfy any two of these guarantees at the same
time, but not all three. For that reason, many NoSQL databases are using what is
called eventual consistency to provide both availability and partition tolerance guarantees
with a reduced level of data consistency.
NewSQL is a class of modern relational databases that aims to provide the same
scalable performance of NoSQL systems for online transaction processing (read-write)
workloads while still using SQL and maintaining the ACID guarantees of a traditional
database system.

Difference between File System and DBMS:


Basis | File System | DBMS

Structure | The file system is software that manages and organizes the files in a storage medium within a computer. | DBMS is software for managing the database.
Data Redundancy | Redundant data can be present in a file system. | Redundancy is controlled and can be kept to a minimum.
Backup and Recovery | It doesn't provide backup and recovery of data if it is lost. | It provides backup and recovery of data even if it is lost.
Query processing | There is no efficient query processing in the file system. | Efficient query processing is available in DBMS.
Consistency | There is less data consistency in the file system. | There is more data consistency because of the process of normalization.
Complexity | It is less complex as compared to DBMS. | It has more complexity in handling as compared to the file system.
Security Constraints | File systems provide less security in comparison to DBMS. | DBMS has more security mechanisms as compared to file systems.
Cost | It is less expensive than DBMS. | It has a comparatively higher cost than a file system.
Data Independence | There is no data independence. | In DBMS data independence exists.
User Access | Only one user can access data at a time. | Multiple users can access data at a time.
Meaning | The user has to write procedures for managing the data. | The user is not required to write procedures.
Sharing | Data is distributed in many files, so it is not easy to share data. | Due to its centralized nature, sharing is easy.
Data Abstraction | It exposes details of the storage and representation of data. | It hides the internal details of the database.
Integrity Constraints | Integrity constraints are difficult to implement. | Integrity constraints are easy to implement.
Example | COBOL, C++ | Oracle, SQL Server

What are Data Models in DBMS?


Data models in DBMS help us understand the design at the conceptual, logical, and physical levels, as they
provide a clear picture of the data, making it easier for developers to create a physical database.

Data models are used to describe how the data is stored, accessed, and updated in a DBMS.

There are many types of data models that are used in the industry.
Types of Data Models in DBMS
1) Hierarchical model
In the hierarchical model, data is organized into a tree-like structure in which each
record has exactly one parent record and may have many children. The main drawback of this
model is that it can represent only one-to-many relationships between nodes.
Let's say we have a few students and a few courses, and a course can be assigned to a
single student only, whereas a student can take any number of courses; this relationship
is then one to many.
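The tree structure can be sketched in a few lines of Python. This is a minimal illustration, not how a hierarchical DBMS is implemented; the Record class, its methods, and the sample names are all hypothetical. It also shows the cascade effect of deleting a segment: everything under it goes too.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A node in the hierarchy: one parent (implicit), many children."""
    name: str
    children: list = field(default_factory=list)

    def add_child(self, child: "Record") -> "Record":
        self.children.append(child)
        return child

    def count(self) -> int:
        """Number of records in this subtree, including this one."""
        return 1 + sum(child.count() for child in self.children)

    def remove_child(self, name: str) -> int:
        """Remove a child and its whole subtree; return how many records went."""
        for index, child in enumerate(self.children):
            if child.name == name:
                removed = child.count()
                del self.children[index]
                return removed
        return 0


# One student owns many courses; each course hangs under a single student.
root = Record("College")
student = root.add_child(Record("Student: Ravi"))
student.add_child(Record("Course: DBMS"))
student.add_child(Record("Course: OS"))

total_before = root.count()
removed = root.remove_child("Student: Ravi")  # the courses disappear with the student
total_after = root.count()
print(total_before, removed, total_after)
```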

Advantages of the hierarchical model:

• Because the database is built on this architecture, the relationships between the various
layers are logically simple, so the database structure is easy to understand.
• It supports data sharing: all data is held in a common database, so
sharing of data becomes practical.
• It offers data security; this was the first database model to offer
data security.
• It preserves data integrity, because it is based on the parent-child relationship and
there is always a link between the parent and the child segments.
Disadvantages of the hierarchical model:
• Although this model is conceptually simple and easy to design, it is
quite complex to implement.
• It lacks flexibility: adding or changing tables or segments often
leads to very complex system management tasks, and deleting one segment
deletes all segments under it.
• It has no standards: no specific standard governs how the model is
implemented.
• It is limited, because many common relationships do not conform to the 1 to N
format required by the hierarchical model.

2) Network model

• The network model extends the hierarchical model by allowing a record to have more than one parent.
• It supports many-to-many relationships: a parent can have many children, and
a child can have many parents (as shown in the figure).
• Entities are represented as a network of connected records, which can lead to a
complex structure.
• It offers high performance.
• A query facility is not available in the network model.

Advantages of the network model

• It represents many-to-many relationships directly, which the hierarchical model cannot.
• The network model is considered an enhancement of the hierarchical database model.
• Data access is fast, because records are reached by following links.

Disadvantages of a network model

• The network model is a very complex database model, so the user must be very familiar with
the overall structure of the database.
• Updating the database is difficult and tedious, because it depends on the application
programs that are used to navigate the data.
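A rough sketch of the network idea in Python follows: records are connected by links in both directions, so a child can have several parents. The NetworkDB class and its method names are hypothetical and chosen only for this illustration; real CODASYL systems used owner/member sets and physical pointers.

```python
from collections import defaultdict


class NetworkDB:
    """Records connected by explicit links; a child may have many parents."""

    def __init__(self):
        self.children = defaultdict(set)  # parent -> linked children
        self.parents = defaultdict(set)   # child  -> linked parents

    def link(self, parent: str, child: str) -> None:
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def members_of(self, parent: str) -> list:
        """Navigate downwards from a parent record."""
        return sorted(self.children[parent])

    def owners_of(self, child: str) -> list:
        """Navigate upwards from a child record."""
        return sorted(self.parents[child])


db = NetworkDB()
db.link("Ravi", "DBMS")
db.link("Ravi", "OS")
db.link("Meena", "DBMS")  # the DBMS course now has two parents

courses_of_ravi = db.members_of("Ravi")
students_of_dbms = db.owners_of("DBMS")
print(courses_of_ravi, students_of_dbms)
```

Note that every lookup is a navigation step written by the programmer; there is no general query facility, which matches the limitation listed above.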

3) Entity-Relationship Model
Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships
among them. While formulating real-world scenario into the database model, the ER Model
creates entity set, relationship set, general attributes and constraints.
ER Model is best used for the conceptual design of a database.
ER Model is based on −
• Entities and their attributes.
• Relationships among entities.

These concepts are explained below.

• Entity − An entity in an ER Model is a real-world entity having properties called attributes.
Every attribute is defined by its set of values, called its domain. For example, in a school
database, a student is considered an entity. A student has various attributes such as name, age,
class, etc.
• Relationship − The logical association among entities is called a relationship. Relationships
are mapped with entities in various ways. Mapping cardinalities define the number of
associations between two entities.
Mapping cardinalities −
o one to one
o one to many
o many to one
o many to many
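The four mapping cardinalities can be told apart by counting how many partners each entity has. The sketch below classifies a set of (left, right) pairs; the function name and sample data are assumptions for illustration. Strictly, cardinality is a rule stated in the design, so inspecting one set of pairs only shows what that particular data is consistent with.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Classify a relationship instance given as (left, right) pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)

    # Does any left entity relate to more than one right entity, and vice versa?
    left_has_many = any(len(r) > 1 for r in rights_per_left.values())
    right_has_many = any(len(l) > 1 for l in lefts_per_right.values())

    if left_has_many and right_has_many:
        return "many to many"
    if left_has_many:
        return "one to many"
    if right_has_many:
        return "many to one"
    return "one to one"


print(mapping_cardinality([("student1", "idcard1"), ("student2", "idcard2")]))
print(mapping_cardinality([("CSE", "Asha"), ("CSE", "Ravi")]))
print(mapping_cardinality([("Asha", "CSE"), ("Ravi", "CSE")]))
print(mapping_cardinality([("Asha", "DBMS"), ("Asha", "OS"), ("Ravi", "DBMS")]))
```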

Advantages
1. SIMPLE : It is simple to draw an ER diagram when we know entities and
relationships.
2. EASY TO UNDERSTAND : The design of ER is very logical and hence they
are easy to design and understand.
3. INTEGRATED : The ER Model can be easily integrated with relational
model.
4. USEFUL IN DECISION MAKING : By drawing an ER-Diagram we come to
know what kind of attributes and relationship exist between them.
5. EASY CONVERSION : It can be easily converted to other type of models.

Disadvantages
1. LOSS OF INFORMATION: While drawing an ER Model some of the
information can be hidden or lost.
2. NO REPRESENTATION FOR DATA MANIPULATION: It is not possible
to represent data manipulation (commands like insert(), delete(), alter(), update()) in an ER
model.
3. DATA INCONSISTENCY: Improper normalization can cause data
inconsistency, so the design captured in an ER diagram should be in at least third
normal form.

4) Relational Model
The relational model in DBMS is an abstract model used to organize and manage the data
stored in a database. It stores data in two-dimensional inter-related tables, also known as
relations in which each row represents an entity and each column represents the properties of
the entity.
The main highlights of this model are −
• Data is stored in tables called relations.
• Relations can be normalized.
• In normalized relations, the values saved are atomic values.
• Each row in a relation is unique.
• Each column in a relation contains values from the same domain.
Relational Model Concepts

• Relation : Two-dimensional table used to store a collection of data elements.
• Tuple : Row of the relation, depicting a real-world entity.
• Attribute/Field : Column of the relation, depicting properties that define the relation.
• Attribute Domain : Set of pre-defined atomic values that an attribute can take, i.e., it
describes the legal values that an attribute can take.
• Degree : It is the total number of attributes present in the relation.
• Cardinality : It specifies the number of entities involved in the relation, i.e., it is the total
number of rows present in the relation.
• Relational Schema : It is the logical blueprint of the relation, i.e., it describes the design
and the structure of the relation. It contains the table name, its attributes, and their types:

TABLE_NAME(ATTRIBUTE_1 TYPE_1, ATTRIBUTE_2 TYPE_2, ...)

For our Student relation example, the relational schema will be:

STUDENT(ROLL_NUMBER INTEGER, NAME VARCHAR(20), CGPA FLOAT)

• Relational Instance : It is the collection of records present in the relation at a given time.
• Relation Key : It is an attribute or a group of attributes that can be used to uniquely
identify an entity in a table or to determine the relationship between two tables. Relation
keys can be of 6 different types:
1. Candidate Key
2. Super Key
3. Composite Key
4. Primary Key
5. Alternate Key
6. Foreign Key
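The STUDENT schema above can be created with Python's built-in sqlite3 module to make degree, cardinality, and the primary key concrete. The three sample rows and the choice of ROLL_NUMBER as primary key are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relational schema: table name, attributes, and their types.
conn.execute("""
    CREATE TABLE STUDENT (
        ROLL_NUMBER INTEGER PRIMARY KEY,
        NAME        VARCHAR(20),
        CGPA        FLOAT
    )
""")

# Relational instance: the rows present at this moment (made-up data).
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(1, "Asha", 8.9), (2, "Ravi", 7.4), (3, "Meena", 9.1)],
)

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(STUDENT)").fetchall()

degree = len(columns)                                    # number of attributes
cardinality = conn.execute(
    "SELECT COUNT(*) FROM STUDENT"
).fetchone()[0]                                          # number of tuples
primary_key = [col[1] for col in columns if col[5] > 0]  # columns flagged as pk

print(degree, cardinality, primary_key)
```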

Highlights:

1. A relation is a collection of rows (tuples) and columns (attributes).
2. In a relation, the tuples depict real-world entities, while the attributes are the properties
that define the relation.
3. The structure of the relation is described by the relational schema.
4. Relational keys are used to uniquely identify a row in a table or to determine the
relationship between two tables.

Constraints in Relational Model

Relational models make use of some rules to ensure the accuracy and accessibility of the
data. These rules or constraints are known as Relational Integrity Constraints.

These constraints are checked before performing any operation such as insertion, deletion, or
update on the data present in a relational database. These constraints include:

• Domain Constraint : It specifies that every attribute must take a value that lies
inside a specific range of values (its domain).

• Key Constraint : It states that every relation must contain an attribute or a set of
attributes (Primary Key) that can uniquely identify a tuple in that relation. This key can
never be NULL or contain the same value for two different tuples.
• Referential Integrity Constraint : It is defined between two inter-related tables. It
states that if a given relation refers to a key attribute of a different or the same table, then
that key value must exist in the referenced table.
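The three kinds of constraint can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the tables, columns, and the 0 to 10 CGPA range are invented for the example, and SQLite only enforces foreign keys once PRAGMA foreign_keys is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE department (
        dept_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE student (
        roll_number INTEGER PRIMARY KEY,                   -- key constraint
        cgpa        REAL CHECK (cgpa BETWEEN 0 AND 10),    -- domain constraint
        dept_no     INTEGER NOT NULL
                    REFERENCES department(dept_no)         -- referential integrity
    );
    INSERT INTO department VALUES (10, 'CSE');
    INSERT INTO student VALUES (1, 8.5, 10);
""")


def rejected(sql: str) -> bool:
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


domain_violation = rejected("INSERT INTO student VALUES (2, 11.5, 10)")    # cgpa out of range
key_violation = rejected("INSERT INTO student VALUES (1, 7.0, 10)")        # duplicate roll_number
reference_violation = rejected("INSERT INTO student VALUES (3, 7.0, 99)")  # no department 99
print(domain_violation, key_violation, reference_violation)
```

In each case the check happens before the row is stored, so the invalid data never reaches the table.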

Anomalies in Relational Model

When we notice unexpected behavior while working with a relational database, the cause
may be too much redundancy in the data stored in the database. This redundancy can
cause anomalies in the DBMS, which can be of various types:
• Insertion Anomalies - It is the inability to insert data in the database due to the absence of
other data. For example: Suppose we are dividing the whole class into groups for a project
and the GroupNumber attribute is defined so that null values are not allowed. If a new student
is admitted to the class but not immediately assigned to a group, then this student can't be
inserted into the database.

• Deletion Anomalies - It is the accidental loss of data in the database upon deletion of some
other data element. For example: Suppose we have an employee relation that contains the
details of each employee along with the department they are working in. If a department
has only one employee and we remove that employee from the
table, the data related to the department is lost as well. This can lead to data
inconsistency.

• Modification/Update Anomalies - It is the data inconsistency that arises from updating
data in the database. For example: Suppose the same data is stored redundantly in several
rows. If the user updates one copy without realizing that other copies exist,
the copies disagree and there will be data inconsistency in the database.
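The deletion anomaly from the employee example can be reproduced with Python's built-in sqlite3 module, alongside a normalized design that avoids it. The table names, columns, and sample rows below are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Redundant design: department details live inside each employee row.
    CREATE TABLE employee_flat (
        emp_id        INTEGER PRIMARY KEY,
        emp_name      TEXT,
        dept_name     TEXT,
        dept_location TEXT
    );
    INSERT INTO employee_flat VALUES
        (1, 'Asha', 'Sales',    'Pune'),
        (2, 'Ravi', 'Research', 'Delhi');

    -- Normalized design: each fact about a department is stored once.
    CREATE TABLE department (
        dept_name     TEXT PRIMARY KEY,
        dept_location TEXT
    );
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        emp_name  TEXT,
        dept_name TEXT REFERENCES department(dept_name)
    );
    INSERT INTO department VALUES ('Sales', 'Pune'), ('Research', 'Delhi');
    INSERT INTO employee VALUES (1, 'Asha', 'Sales'), (2, 'Ravi', 'Research');
""")

# Remove the only employee of the Research department from both designs.
conn.execute("DELETE FROM employee_flat WHERE emp_id = 2")
conn.execute("DELETE FROM employee WHERE emp_id = 2")

flat_rows_about_research = conn.execute(
    "SELECT COUNT(*) FROM employee_flat WHERE dept_name = 'Research'"
).fetchone()[0]
normalized_rows_about_research = conn.execute(
    "SELECT COUNT(*) FROM department WHERE dept_name = 'Research'"
).fetchone()[0]
print(flat_rows_about_research, normalized_rows_about_research)
```

In the redundant table the department's location vanishes together with its last employee; in the normalized tables it survives.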

Advantages

• Manageability - Because each relation in a relational database is independent,
it is easy to manipulate and manage. This improves the performance of the
database.
• Query capability - With the introduction of relational algebra, relational databases
provide easy access to data via a high-level query language such as SQL.
• Data integrity - With the introduction and implementation of relational constraints, the
relational model can maintain data integrity in the database.

Disadvantages

• The performance of the relational model depends upon the number of relations present in
the database.
• As the number of tables increases, the requirement for physical memory increases.
• The structure becomes complex, and queries take longer to
answer.
• Because of all these factors, the cost of implementing a relational database increases.
5) Object Oriented Data Model
The foundation of any object-oriented programming language is based on the concept of
objects and classes. This enables the user to achieve abstraction, inheritance, polymorphism,
and encapsulation. A similar concept is used in the Object Oriented Model in DBMS.

The Object-Oriented Model in DBMS or OODM is the data model where data is stored in the
form of objects. This model is used to represent real-world entities. The data and the
relationships between data are stored together in a single entity known as an object in the Object Oriented
Model. The Object-Oriented Database Management System is built on top of the Object Oriented
Model.

As we have discussed earlier, we can use the Object Oriented Model in DBMS to store real-
world entities. Here, we can store pictures, audio, video, and other types of data.

Example

• Here Transport, Bus, Ship, and Plane are objects.
• Bus has Road Transport as the attribute.
• Ship has Water Transport as the attribute.
• Plane has Air Transport as the attribute.
• The Transport object is the base object, and the Bus, Ship, and Plane objects derive from
it.

Take a look at another example:

Here Student and Department are two different objects. Each one of
them has its own attributes and methods. They are linked by a common attribute,
Dept_no, which establishes a relationship between the objects.

Components of Object-Oriented Data Model:

The components of the Object-Oriented Data Model, namely objects, classes, object
attributes, class hierarchy, etc., are explained as follows.

Object- It is a physical or real-world entity. An object defines a single instance of
an entity; it is known as an 'instance of a class'.

Object attribute- The objects have certain characteristics. These are known as
the attributes of the object.

Object method- The object's behavior is shown using object methods.

Class- It is a collection of similar kinds of objects. It is an entity that has attributes
and methods together.

Inheritance- It is the ability of an object within the class hierarchy to inherit the
attributes and methods of the classes above it. A new class can be derived from an
existing class; the new class has the attributes and methods described in the
existing class and also has its own attributes and methods. This helps in code
reusability.
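The Transport example and the components listed above map naturally onto classes in an object-oriented language. The Python sketch below is only an illustration of the concepts (class, object, attribute, method, inheritance); the attribute names and the describe method are assumptions, and an actual object-oriented DBMS would also persist such objects.

```python
class Transport:
    """Base class: attributes and methods shared by every kind of transport."""

    def __init__(self, name: str, medium: str):
        self.name = name      # object attribute
        self.medium = medium  # object attribute

    def describe(self) -> str:
        """Object method: behaviour shared by all subclasses."""
        return f"{self.name} uses {self.medium}"


class Bus(Transport):
    def __init__(self, name: str):
        super().__init__(name, "Road Transport")


class Ship(Transport):
    def __init__(self, name: str):
        super().__init__(name, "Water Transport")


class Plane(Transport):
    def __init__(self, name: str):
        super().__init__(name, "Air Transport")
        self.max_altitude_m = 12000  # an attribute only Plane has (made-up value)


# Each object is an instance of a class; describe() is inherited, not rewritten.
fleet = [Bus("City Bus"), Ship("Ferry"), Plane("Jet")]
descriptions = [vehicle.describe() for vehicle in fleet]
print(descriptions)
```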

Advantages of Object-Oriented Data Model

• It adds semantic content that the user can understand well.
• It makes the code resemble real-world objects.
• Database integrity can be achieved.
• Structural and database independence is created.
• We can store pictures, audio, video, and other types of data that were previously
impossible to store.

Disadvantages of Object-Oriented Data Model

• It has complex navigational data access.
• There is a steep learning curve.
• Transactions might be slow.

6) Object-relational Data Model


An object-relational model is a combination of an object-oriented database model and a
relational database model. It supports objects, classes, inheritance, etc., just like object-oriented
models, and has support for data types, tabular structures, etc., like the relational data
model.
One of the major goals of the object-relational data model is to close the gap between relational
databases and the object-oriented practices frequently used in many programming languages
such as C++, C#, and Java.
The object-relational model tries to bring the main concepts from the OO domain to the relational
model. Its heart is the relational model, extended mainly through user-defined types: complex
data types can be built from the existing data types according to our requirements. The problem with this
model is that it can get complex and difficult to handle, so a proper understanding of the
model is required.

Object-relational database systems (ORDBSs) are generated from a combination of relational
models and OO thinking. This object-relational approach both inherits existing RDBS
technologies and provides support for object data management. Overall, most ORDBSs are
mainly implemented within relational models and add only partial support for simple object
types.

The basic goal of the object-relational database is to bridge the gap between relational
databases and the object-oriented modeling techniques used in programming languages such
as Java, C++, Visual Basic .NET, or C#. However, a more popular alternative for achieving
such a bridge is to use a standard relational database system with some form of
object-relational mapping (ORM) software. Whereas traditional RDBMS or SQL-DBMS products
focused on the efficient management of data drawn from a limited set of datatypes (defined
by the relevant language standards), an object-relational DBMS allows software developers to
integrate their own types and the methods that apply to them into the DBMS.
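To give a feel for storing a user-defined type in a table column, the sketch below uses the adapter and converter hooks of Python's built-in sqlite3 module. This is an approximation on the application side, closer in spirit to the ORM alternative mentioned above: SQLite is not an object-relational DBMS, and the Address type, the "city;pin" storage format, and the customer table are assumptions made for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """A user-defined type built from existing (text) types."""
    city: str
    pin: str


def adapt_address(address: Address) -> str:
    """Python object -> value SQLite can store."""
    return f"{address.city};{address.pin}"


def convert_address(raw: bytes) -> Address:
    """Stored bytes -> Python object."""
    city, pin = raw.decode().split(";")
    return Address(city, pin)


sqlite3.register_adapter(Address, adapt_address)
sqlite3.register_converter("address", convert_address)

# PARSE_DECLTYPES makes sqlite3 apply the converter to columns declared "address".
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, home address)")
conn.execute("INSERT INTO customer VALUES (?, ?)", (1, Address("Pune", "411001")))

stored = conn.execute("SELECT home FROM customer WHERE id = 1").fetchone()[0]
print(stored)
```

In a true ORDBMS the type and its methods would be defined inside the database itself, so that queries could use them directly rather than relying on conversion in the client program.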

History of Object Relational Data Model


Both relational data models and object-oriented data models are very useful. But it was felt
that both lacked some characteristics, so work was started to build a model
that combined the two. Hence, the object-relational data model was created as a
result of research that was carried out in the 1990s.
Characteristics of an ORDBMS

• Base datatype extension
• Support for complex objects
• Inheritance

Advantages of Object Relational model


The advantages of the Object Relational model are −
Inheritance
The object-relational data model allows its users to inherit objects, tables, etc., so that they can
extend their functionality. Inherited objects contain new attributes as well as the attributes
that were inherited.
Complex Data Types
Complex data types can be formed using existing data types. This is useful in Object relational
data model as complex data types allow better manipulation of the data.
Extensibility
The functionality of the system can be extended in Object relational data model. This can be
achieved using complex data types as well as advanced concepts of object oriented model
such as inheritance.
Disadvantages of Object Relational model
The object relational data model can get quite complicated and difficult to handle at times as it
is a combination of the Object oriented data model and Relational data model and utilizes the
functionalities of both of them.
7) Semi-Structured Data Model
What is the Semi-Structured Data Model in DBMS?
This model is a DB (database) model in which the data and the schema are not separated, and
the amount of structure employed is determined by the goal or the purpose.
What is Semi-Structured Data in DBMS?
Semi-structured data refers to the structured data that doesn’t adhere to the tabular structure of
the data models that are associated with relational DBs or any other types of data tables. It
includes tags or any other markers in order to segregate semantic pieces and enforce
hierarchies of fields and records within the data. As a result, it’s known as a self-descriptive
structure.
Although they are grouped together, entities of the same class in semi-structured data
may have different attributes, and the order of the attributes is irrelevant.
Because full-length texts, documents, and DB are no longer the sole types of data, semi-
structured data has become more common since the internet’s inception. Semi-structured data
is ubiquitous in object-oriented DBs, and many applications require a means or a medium for
information transmission.
One standard for expressing semi-structured data is the Object Exchange Model (OEM), while
another is XML.
Example
A common example of this type of data model is web-based data sources in which the data
and the schema of the website are indistinguishable. Some entities may be missing attributes,
while others may have extra attributes in this model. This approach allows for data
storage flexibility. It also allows the attributes to be more flexible: any value that we store in
an attribute can be either an atomic value or a collection of data.
Emails, HTML, web pages, etc., are a few more examples of this type of data model.
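JSON is one widely used self-describing format, and it shows these properties compactly (the notes name OEM and XML as standards; JSON is used here only because Python parses it out of the box). The records below are made-up sample data: each carries its own attribute names, some attributes are missing, and a value may be atomic or a collection.

```python
import json

# Three records of the same class with different attributes (made-up data).
raw = """
[
  {"name": "Asha",  "email": "asha@example.com"},
  {"name": "Ravi",  "phones": ["555-0100", "555-0101"]},
  {"name": "Meena", "email": "meena@example.com", "address": {"city": "Pune"}}
]
"""
records = json.loads(raw)

# The "schema" is discovered from the data itself rather than declared up front.
attribute_sets = [sorted(record.keys()) for record in records]
all_attributes = sorted({key for record in records for key in record})

# Queries must cope with attributes that some records simply do not have.
names_with_email = [r["name"] for r in records if "email" in r]

print(attribute_sets)
print(all_attributes)
print(names_with_email)
```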
Advantages

• It can represent data from data sources that aren't bound by a
schema.
• It offers a versatile format for data sharing between various DBs.
• Viewing structured data as semi-structured can be beneficial for browsing purposes.
• It is simple to alter the schema.
• The data transfer format can be portable.
Disadvantages
The main disadvantage of using a semi-structured data model is that queries cannot be
performed as quickly as they can in a more limited structure, such as the relational model.
In a semi-structured DB, records are typically stored with unique IDs referenced with pointers
to their disc location.
It makes navigational or path-based queries very efficient, but it is inefficient for searching
multiple records (as is common in SQL) because it must seek around the disc following
pointers.
8) Flat Data Model
A flat database is a simple database system in which each database is represented as a single
table in which all of the records are stored as single rows of data, which are separated by
delimiters such as tabs or commas. The table is usually stored and physically represented as a
simple text file.
Because of their limitations, flat databases are unsuitable for most software
applications in which there is a need to represent and store complex business relationships.
However, some application developers still use flat files in order to reduce the cost and
complexity of integrating a relational database.

Flat databases are also sometimes referred to as flat-file databases.

Unlike relational databases, flat databases cannot represent complex relationships between
entities. They also have no way of enforcing constraints between data.
For example, in an application used by a commercial bank, it is a good idea to ensure that, at
the time of creation, a new account must be linked to an existing customer. In a relational
database this is easily enforced using the concept of foreign keys to ensure that customer IDs
are filled in while creating an account, and also that said customer IDs already exist in
another table. This is not possible with flat databases.
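
The bank example can be sketched in a few lines of Python with the standard sqlite3 module. This is only a minimal illustration, not a production schema: the table names customer and account, their columns, and the helper try_open_account are assumptions made for the example. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE account (
           account_no  INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
       )"""
)
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")


def try_open_account(account_no, customer_id):
    """Hypothetical helper: True if the row is accepted, False if rejected."""
    try:
        conn.execute(
            "INSERT INTO account VALUES (?, ?)", (account_no, customer_id)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Opening an account for customer 1 succeeds, while an account that names a customer ID missing from the customer table is rejected by the database itself. A flat file has no equivalent check; the application would have to scan the file and enforce the rule on its own.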
Another limitation of flat databases vis-a-vis relational databases is the former’s lack of query
and indexing capability.
Some real-life examples of flat databases are contact lists in a mobile phone and the storage of
a high-scores list in a simple video game.
Data Abstraction in DBMS

When you send an email to someone, do you ever think about where the email is physically stored
or which data model is used to store it? Usually not. All you need to know is the recipient's
email address. This hiding of detail is called data abstraction.

What is Data abstraction in Database Management System?


Data abstraction in a DBMS refers to hiding unnecessary details from the end user.
Database systems have complex data structures and relationships. This complexity is
hidden so that users can readily access the data, and only the relevant part of the database
is made accessible to them. Let's understand this more with an
example.
Example: If we want to retrieve any email from Gmail, we don't know where that data is
physically kept, such as in India or the United States, or what data model was utilized to store
it. These things are not essential to us. Only our email is of interest to us.

Levels of Data Abstractions in DBMS


Organizing a DBMS into levels of abstraction hides complexity from users and makes the
system easier to use and to modify.

Now let's look at the levels of data abstractions in DBMS and discuss them in detail.
1. Physical or Internal Level

It is the lowest level of abstraction in a DBMS. It defines how data is stored, which data
structures are used to store it, and how the database is accessed.
Database administrators and developers decide how data is stored at this level. It is the
most complex level to understand.
2. Logical or Conceptual Level
The logical level is the next higher level or intermediate level. It explains what data is stored
in the database and how those data are related. It seeks to explain the complete or entire data
by describing what tables should be constructed and what the linkages between those tables
should be. It is less complex than the physical level.
3. View or External Level

This is the top level. There are various views at the view level, with each view defining only a
portion of the total data. It also makes the database easier to use by offering different
views of the same database to different users. All users have access to the view level. This is the
simplest level.
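
The view level can be illustrated with SQL views. The sketch below uses Python's sqlite3 module; the employee table and the view names staff_directory and payroll are hypothetical and chosen only for the example. Each view exposes just a portion of the data held at the logical level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: one table describing all employee data.
conn.execute(
    "CREATE TABLE employee "
    "(emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Ravi", "Sales", 52000.0), (2, "Meena", "HR", 61000.0)],
)

# View level: a directory view that hides the salary column.
conn.execute(
    "CREATE VIEW staff_directory AS SELECT emp_id, name, dept FROM employee"
)
# View level: a payroll view that shows only what payroll staff need.
conn.execute("CREATE VIEW payroll AS SELECT emp_id, salary FROM employee")

directory_columns = [
    col[0] for col in conn.execute("SELECT * FROM staff_directory").description
]
payroll_rows = conn.execute("SELECT * FROM payroll ORDER BY emp_id").fetchall()
```

A user of staff_directory never sees salaries, and neither view reveals how or where the rows are physically stored.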

Data Independence
The primary goal of data abstractions in DBMS is to obtain data independence in order to save
time and money when modifying or altering a database.
Data independence is the ability to change the schema at one level without having to rewrite
the programs and applications that use the database.
Data Independence is mainly of two types :
1. Physical level independence

It refers to the ability to change the physical schema without changing the conceptual or
logical schema, which is done for optimization purposes.
2. Logical level independence

It refers to the ability to change the logical schema without changing the
external schema or application programs.
Modifications to the conceptual representation of the data do not affect the user's
view of the data.
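
Both kinds of independence can be tried out with a small sqlite3 sketch. All table, view and index names here are assumptions for the illustration. The application only ever runs APP_QUERY against a view; first an index is added (a physical change), then the base table is split in two and the view is redefined (a logical change).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (stu_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Anu", "Pune"), (2, "Bala", "Chennai")],
)
conn.execute(
    "CREATE VIEW student_view AS SELECT stu_id, name, city FROM student"
)

# The only query the application knows about.
APP_QUERY = "SELECT stu_id, name, city FROM student_view ORDER BY stu_id"
before = conn.execute(APP_QUERY).fetchall()

# Physical change: add an index. The logical schema is untouched.
conn.execute("CREATE INDEX idx_student_city ON student(city)")
after_physical = conn.execute(APP_QUERY).fetchall()

# Logical change: split the table in two and redefine the view on top.
conn.execute("DROP VIEW student_view")
conn.execute("CREATE TABLE student_core (stu_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE address (stu_id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("INSERT INTO student_core SELECT stu_id, name FROM student")
conn.execute("INSERT INTO address SELECT stu_id, city FROM student")
conn.execute("DROP TABLE student")
conn.execute(
    """CREATE VIEW student_view AS
       SELECT s.stu_id, s.name, a.city
       FROM student_core AS s JOIN address AS a ON a.stu_id = s.stu_id"""
)
after_logical = conn.execute(APP_QUERY).fetchall()
```

The application query returns the same rows at all three points, although the storage structures and then the table layout changed underneath it.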

Advantages of Data Abstraction in DBMS


 It reduces the complexity for the users.
 It makes data retrieval simpler and the system more efficient to use.
 Increases usability for the users.
 Increases the security of the application, as implementation details are hidden from
the users.
 Reduces code duplication and increases reusability.

Structure of Database Management System


The structure of a Database Management System is also referred to as the Overall System
Structure or Database Architecture. A DBMS is software that allows access to data stored in a
database and provides an easy and effective method of –

 Defining the information.
 Storing the information.
 Manipulating the information.
 Protecting the information from system crashes or data theft.
 Differentiating access permissions for different users.
The database system is divided into three components: Query Processor, Storage Manager,
and Disk Storage. These are explained as below.
1. Query Processor: It interprets the requests (queries) received from end user via an
application program into instructions. It also executes the user request which is
received from the DML compiler.
Query Processor contains the following components –

 DML Compiler: It translates DML statements into low-level instructions (machine
language) so that they can be executed.
 DDL Interpreter: It processes DDL statements into a set of tables containing metadata
(data about data).
 Embedded DML Pre-compiler: It converts DML statements embedded in an
application program into procedural calls.
 Query Optimizer: It selects an efficient evaluation plan for a query from the available
alternatives. The instructions generated by the DML Compiler are then executed by the
query evaluation engine.
2. Storage Manager: Storage Manager is a program that provides an interface between
the data stored in the database and the queries received. It is also known as Database
Control System. It maintains the consistency and integrity of the database by applying
the constraints and executing the DCL statements. It is responsible for updating,
storing, deleting, and retrieving data in the database.
It contains the following components –
 Authorization Manager: It ensures role-based access control, i.e., it checks whether
a particular user is privileged to perform the requested operation.

 Integrity Manager: It checks the integrity constraints when the database is modified.

 Transaction Manager: It controls concurrent access by scheduling the operations of the
transactions it receives. Thus, it ensures that the database
remains in a consistent state before and after the execution of a transaction.

 File Manager: It manages the file space and the data structure used to represent
information in the database.

 Buffer Manager: It is responsible for cache memory and the transfer of data between
the secondary storage and main memory.

3. Disk Storage: It contains the following components –


 Data Files: They store the data itself.

 Data Dictionary: It contains information about the structure of every database
object. It is the repository that holds the metadata.
 Indices: They provide faster retrieval of data items.
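
Several of these components can be observed from outside using SQLite through Python's sqlite3 module. SQLite is used here only as a convenient small DBMS, and the account table, index name and transfer helper are assumptions for the sketch. EXPLAIN QUERY PLAN shows the plan chosen by the query processor, a failed transfer shows integrity checking together with transaction rollback, and the sqlite_master table plays the role of the data dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account "
    "(acc_no INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("CREATE INDEX idx_account_balance ON account(balance)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 300)])
conn.commit()

# Query processor: ask which plan was chosen for a query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT acc_no FROM account WHERE balance = 300"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)  # last column is the detail


def transfer(src, dst, amount):
    """Hypothetical helper: move money in one transaction, all or nothing."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:  # CHECK constraint violated
        return False


def balances():
    return conn.execute(
        "SELECT acc_no, balance FROM account ORDER BY acc_no"
    ).fetchall()


# Data dictionary: SQLite keeps its metadata in the sqlite_master table.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
```

A transfer that would make a balance negative is refused, and the credit that had already been applied to the other account is undone, so the database stays consistent.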
INTRODUCTION TO DATABASE DESIGN

Database Design and E-R Diagrams


An Entity Relationship Diagram is a diagram that represents relationships among entities in a
database. It is commonly known as an ER Diagram. An ER Diagram in DBMS plays a
crucial role in designing the database. In practice, the requirements
stated by the users are first captured in the form of an ER Diagram. Later, it is forwarded to the
database administrators to design the database.

What is an ER Diagram?

An Entity Relationship Diagram (ER Diagram) pictorially explains the relationship between
entities to be stored in a database. Fundamentally, the ER Diagram is a structural design of
the database. It acts as a framework created with specialized symbols for the purpose of
defining the relationship between the database entities. ER diagram is created based on three
principal components: entities, attributes, and relationships.

The following diagram showcases two entities - Student and Course, and their relationship.
The relationship described between student and course is many-to-many, as a course can be
opted by several students, and a student can opt for more than one course. Student entity
possesses attributes - Stu_Id, Stu_Name & Stu_Age. The course entity has attributes such as
Cou_ID & Cou_Name.
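
One common way to turn this many-to-many design into tables is to add a third table that holds one row per (student, course) pair. The sketch below uses Python's sqlite3 module; the attribute names follow the text, while the enrollment table and the two helper functions are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT, stu_age INTEGER);
    CREATE TABLE course  (cou_id INTEGER PRIMARY KEY, cou_name TEXT);
    -- One row per (student, course) pair realises the many-to-many relationship.
    CREATE TABLE enrollment (
        stu_id INTEGER NOT NULL REFERENCES student(stu_id),
        cou_id INTEGER NOT NULL REFERENCES course(cou_id),
        PRIMARY KEY (stu_id, cou_id)
    );
    INSERT INTO student VALUES (1, 'Anu', 19), (2, 'Bala', 20);
    INSERT INTO course  VALUES (10, 'DBMS'), (20, 'Networks');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
    """
)


def courses_of(stu_id):
    """Names of the courses one student has opted for."""
    rows = conn.execute(
        """SELECT c.cou_name FROM enrollment AS e
           JOIN course AS c ON c.cou_id = e.cou_id
           WHERE e.stu_id = ? ORDER BY c.cou_name""",
        (stu_id,),
    )
    return [r[0] for r in rows]


def students_in(cou_id):
    """Names of the students who opted for one course."""
    rows = conn.execute(
        """SELECT s.stu_name FROM enrollment AS e
           JOIN student AS s ON s.stu_id = e.stu_id
           WHERE e.cou_id = ? ORDER BY s.stu_name""",
        (cou_id,),
    )
    return [r[0] for r in rows]
```

Student 1 appears in two courses and course 10 has two students, which is exactly what many-to-many means.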
What is an ER Model?

An Entity-Relationship Model represents the structure of the database with the help of a
diagram. ER Modelling is a systematic process to design a database as it would require you to
analyze all data requirements before implementing your database.

History of ER models

Peter Chen proposed the ER model in 1976 to create a uniform convention that can be used as
a conceptual modeling tool. Many models had been presented and discussed before, but none
was considered suitable. His model was also inspired by the data structure diagrams of Charles Bachman.

Why Use ER Diagrams in DBMS?

 ER Diagram helps you conceptualize the database and lets you know which fields need to
be embedded for a particular entity
 ER Diagram gives a better understanding of the information to be stored in a database
 It reduces complexity and allows database designers to build databases quickly
 It helps to describe elements using Entity-Relationship models
 It allows users to get a preview of the logical structure of the database

Symbols Used in ER Diagrams:

 Rectangles: This symbol represents entity types
 Ellipses: This symbol represents attributes
 Diamonds: This symbol represents relationship types
 Lines: They link attributes to entity types and entity types to relationship types
 Underline: An underlined attribute name marks the primary key
 Double Ellipses: Represent multi-valued attributes
Components of ER Diagram

You base an ER Diagram on three basic concepts:

 Entities
 Strong Entity
 Weak Entity
 Attributes
 Key Attribute
 Composite Attribute
 Multi-valued Attribute
 Derived Attribute
 Relationships
 One-to-One Relationships
 One-to-Many Relationships
 Many-to-One Relationships
 Many-to-Many Relationships

Entities
An entity can be any living or non-living thing about which data is stored.

An entity is shown as a rectangle in an ER diagram.

For example, when a student studies a course, both the student and the course are entities.

Weak Entity
An entity that depends on another entity for its identification is called a weak entity.

A weak entity is shown as a double rectangle in an ER diagram.

In the example below, school is a strong entity because it has a primary key attribute - school
number.

Unlike school, the classroom is a weak entity because it does not have any primary key and
the room number here acts only as a discriminator.
Attribute
An attribute describes a property of an entity.

An attribute is shown as an oval in an ER diagram.

Key Attribute
A key attribute uniquely identifies an entity within an entity set.

The name of a key attribute is underlined.

For example: For a student entity, the roll number can uniquely identify a student from a set of
students.

Composite Attribute
An attribute that is composed of several other attributes is known as a composite attribute.

An oval showcases the composite attribute, and the composite attribute oval is further
connected with other ovals.

Multi-valued Attribute
Attributes that can hold more than one value are called multivalued attributes.
A double oval is used to represent a multivalued attribute.

Derived Attribute
An attribute that can be derived from other attributes of the entity is known as a derived
attribute.

In the ER diagram, the dashed oval represents the derived attribute.
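
The four kinds of attribute can be pictured as fields of a record. The Python sketch below is only an illustration: the Student and Name classes, their field names and the age rule are assumptions, not part of any ER notation. The roll number is the key attribute, the name is composite, the phone numbers are multi-valued, and the age is derived from the date of birth instead of being stored.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Name:
    """Composite attribute: made up of simpler attributes."""
    first: str
    last: str


@dataclass
class Student:
    roll_no: int                  # key attribute: unique per student
    name: Name                    # composite attribute
    date_of_birth: date           # simple stored attribute
    phone_numbers: List[str] = field(default_factory=list)  # multi-valued

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


student = Student(
    roll_no=1,
    name=Name("Anu", "Rao"),
    date_of_birth=date(2005, 6, 15),
    phone_numbers=["111-1111", "222-2222"],
)
```

Passing the current date into age keeps the example deterministic; a real system would normally supply today's date itself.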

Relationship
The diamond shape showcases a relationship in the ER diagram.

It depicts the relationship between two entities.

In the example below, both the student and the course are entities, and study is the relationship
between them.

One-to-One Relationship
When a single element of an entity is associated with a single element of another entity, it is
called a one-to-one relationship.
For example, a student has only one identification card, and an identification card is issued
to only one student.

One-to-Many Relationship
When a single element of an entity is associated with more than one element of another entity,
it is called a one-to-many relationship

For example, a customer can place many orders, but an order cannot be placed by many
customers.

Many-to-One Relationship
When more than one element of an entity is related to a single element of another entity, then
it is called a many-to-one relationship.

For example, students have to opt for a single course, but a course can have many students.

Many-to-Many Relationship
When more than one element of an entity is associated with more than one element of another
entity, this is called a many-to-many relationship.
For example, you can assign an employee to many projects and a project can have many
employees.
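
In a relational schema, these cardinalities are usually expressed through key constraints. The sqlite3 sketch below is one possible mapping under assumed table names: a UNIQUE foreign key gives one-to-one (student and identification card), and a plain foreign key gives one-to-many (customer and orders). Read from the orders side, the same foreign key is the many-to-one direction. A many-to-many relationship needs a separate linking table and is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE student (stu_id INTEGER PRIMARY KEY);
    -- One-to-one: UNIQUE stops a student from holding two cards.
    CREATE TABLE id_card (
        card_no INTEGER PRIMARY KEY,
        stu_id  INTEGER NOT NULL UNIQUE REFERENCES student(stu_id)
    );
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY);
    -- One-to-many: many orders may point at the same customer,
    -- but each order row names exactly one customer.
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER NOT NULL REFERENCES customer(cust_id)
    );
    INSERT INTO student  VALUES (1);
    INSERT INTO customer VALUES (1);
    """
)


def accepted(sql, params):
    """Hypothetical helper: True if the database accepts the row."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

A second card for the same student is refused, while a second order for the same customer is accepted.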

How to Draw an ER Diagram?

Below are some important points to draw ER diagram:

 First, identify all the Entities. Embed all the entities in a rectangle and label them
properly.
 Identify relationships between entities and connect them using a diamond in the middle,
illustrating the relationship. Do not connect relationships with each other.
 Connect attributes for entities and label them properly.
 Eradicate any redundant entities or relationships.
 Make sure your ER Diagram supports all the data provided to design the database.
 Effectively use colors to highlight key areas in your diagrams.

Conclusion

ER Diagram in DBMS is widely used to describe the conceptual design of databases. It helps
both users and database developers to preview the structure of the database before
implementing the database.

Entity

An entity in DBMS (Database Management System) is a real-world object that has certain
properties called attributes that define the nature of the entity.

Entities consist of attributes that define their characteristic features/properties.


For example: If we consider a car entity, it can have its attributes as a car's registration
number, car's model, car's name, car's color, number of seats that are there inside the car, etc.
Below is the tabular representation of the car entities.

Entities are divided into two categories. These two categories of an entity are tangible entities
and non-tangible entities.

Tangible Entities: Tangible entities are the entities that physically exist in the real world. For
example, the entity of cars, the entity of books, etc.

Non-Tangible Entities: Non-tangible entities are entities that do not physically exist in the
real world. For example, email id, social media account, etc.

Entity Set in DBMS:


An entity set is a collection of entities of the same type. An entity refers to any object having-
 Either a physical existence such as a particular person, office, house or car.
 Or a conceptual existence such as a school, a university, a company or a job.

In ER diagram,
 Attributes are associated with an entity set.
 Attributes describe the properties of entities in the entity set.
 Based on the values of certain attributes, an entity can be identified uniquely.

Types of Entity Sets-

An entity set may be of the following two types-


1. Strong Entity Set-
 A strong entity set is an entity set that contains sufficient attributes to uniquely identify
all its entities.
 In other words, a primary key exists for a strong entity set.
 Primary key of a strong entity set is represented by underlining it.

Symbols Used-

 A single rectangle is used for representing a strong entity set.
 A diamond symbol is used for representing the relationship that exists between two
strong entity sets.
 A single line is used for representing the connection of the strong entity set with the
relationship set.
 A double line is used for representing the total participation of an entity set with the
relationship set.
 Total participation may or may not exist in the relationship.

Example-

Consider the following ER diagram-

In this ER diagram,

 Two strong entity sets “Student” and “Course” are related to each other.
 Student ID and Student name are the attributes of entity set “Student”.
 Student ID is the primary key using which any student can be identified uniquely.
 Course ID and Course name are the attributes of entity set “Course”.
 Course ID is the primary key using which any course can be identified uniquely.
 Double line between Student and relationship set signifies total participation.
 It suggests that each student must be enrolled in at least one course.
 Single line between Course and relationship set signifies partial participation.
 It suggests that there might exist some courses for which no enrollments are made.
Weak Entity Set-

 A weak entity set is an entity set that does not contain sufficient attributes to
uniquely identify its entities.
 In other words, a primary key does not exist for a weak entity set.
 However, it contains a partial key called a discriminator.
 The discriminator distinguishes the weak entities that belong to the same owner entity.
 The discriminator is represented by underlining it with a dashed line.

NOTE-

 The combination of discriminator and primary key of the strong entity set makes it
possible to uniquely identify all entities of the weak entity set.
 Thus, this combination serves as a primary key for the weak entity set.
 Clearly, this primary key is not formed by the weak entity set completely.

Symbols Used-

 A double rectangle is used for representing a weak entity set.
 A double diamond symbol is used for representing the relationship that exists between
the strong and weak entity sets and this relationship is known as identifying
relationship.
 A double line is used for representing the connection of the weak entity set with the
relationship set.
 Total participation always exists in the identifying relationship.

Example-

Consider the following ER diagram-

In this ER diagram,

 One strong entity set “Building” and one weak entity set “Apartment” are related to each
other.
 Strong entity set “Building” has building number as its primary key.
 Door number is the discriminator of the weak entity set “Apartment”.
 This is because door number alone cannot identify an apartment uniquely, as apartments in
several different buildings may have the same door number.
 Double line between Apartment and relationship set signifies total participation.
 It suggests that each apartment must be present in at least one building.
 Single line between Building and relationship set signifies partial participation.
 It suggests that there might exist some buildings that have no apartments.

To uniquely identify any apartment,


 First, building number is required to identify the particular building.
 Secondly, door number of the apartment is required to uniquely identify the apartment.

Thus,
Primary key of Apartment = Primary key of Building + Its own discriminator
= Building number + Door number
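
In a relational schema, this composite key can be written down directly. The sqlite3 sketch below is a minimal illustration with assumed table and column names; the ON DELETE CASCADE clause is also an assumption, added to model the idea that an apartment cannot exist without its building.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE building (building_no INTEGER PRIMARY KEY);
    CREATE TABLE apartment (
        building_no INTEGER NOT NULL
            REFERENCES building(building_no) ON DELETE CASCADE,
        door_no     INTEGER NOT NULL,          -- discriminator
        PRIMARY KEY (building_no, door_no)     -- owner key + discriminator
    );
    INSERT INTO building VALUES (1), (2);
    """
)


def add_apartment(building_no, door_no):
    """Hypothetical helper: True if the apartment row is accepted."""
    try:
        conn.execute(
            "INSERT INTO apartment VALUES (?, ?)", (building_no, door_no)
        )
        return True
    except sqlite3.IntegrityError:
        return False


def apartments_in(building_no):
    return conn.execute(
        "SELECT COUNT(*) FROM apartment WHERE building_no = ?", (building_no,)
    ).fetchone()[0]
```

Door number 101 may exist in both buildings, but not twice in the same building, and not in a building that does not exist.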
Differences between Strong entity set and Weak entity set-
| Strong entity set | Weak entity set |
|---|---|
| A single rectangle is used for the representation of a strong entity set. | A double rectangle is used for the representation of a weak entity set. |
| It contains sufficient attributes to form its primary key. | It does not contain sufficient attributes to form its primary key. |
| A diamond symbol is used for the representation of the relationship that exists between the two strong entity sets. | A double diamond symbol is used for the representation of the identifying relationship that exists between the strong and weak entity set. |
| A single line is used for the representation of the connection between the strong entity set and the relationship. | A double line is used for the representation of the connection between the weak entity set and the relationship set. |
| Total participation may or may not exist in the relationship. | Total participation always exists in the identifying relationship. |

Note-

In ER diagram, weak entity set is always present in total participation with the identifying
relationship set.
Relationship Set
Relationship Example:

‘Enrolled in’ is a relationship that exists between entities Student and Course.
Relationship Set:
A relationship set is a set of relationships of same type.

Set representation of above ER diagram is

Degree of a Relationship Set-

The number of entity sets that participate in a relationship set is termed as the degree of that
relationship set. Thus,
Degree of a relationship set = Number of entity sets participating in that relationship set

Types of Relationship Sets-

On the basis of degree of a relationship set, a relationship set can be classified into the
following types-

1. Unary relationship set
2. Binary relationship set
3. Ternary relationship set
4. N-ary relationship set

1. Unary Relationship Set-

Unary relationship set is a relationship set where only one entity set participates in a
relationship set.
Example:

One person is related to only one person.

2. Binary Relationship Set-

Binary relationship set is a relationship set where two entity sets participate in a relationship
set.

Example:

Student is enrolled in a Course.

3. Ternary Relationship Set-

Ternary relationship set is a relationship set where three entity sets participate in a relationship
set.
Example:

4. N-ary Relationship Set-

N-ary relationship set is a relationship set where ‘n’ entity sets participate in a relationship set.
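
The idea of degree can be made concrete by treating a relationship set as a set of tuples. The Python sketch below is only an illustration: the RelationshipSet class and the names MARRIED_TO, SUPPLIES and EXAMPLE_4 are assumptions. Following the definition above, the degree is computed as the number of distinct entity sets taking part, so a relationship between Person and Person counts as unary even though each tuple has two components.

```python
from dataclasses import dataclass, field


@dataclass
class RelationshipSet:
    name: str
    roles: tuple  # the entity set that fills each position of a tuple
    relationships: set = field(default_factory=set)

    def degree(self) -> int:
        """Number of distinct entity sets that participate."""
        return len(set(self.roles))

    def kind(self) -> str:
        return {1: "unary", 2: "binary", 3: "ternary"}.get(self.degree(), "n-ary")


married_to = RelationshipSet(
    "MARRIED_TO", ("Person", "Person"), {("P1", "P2")}
)
enrolled_in = RelationshipSet(
    "ENROLLED_IN", ("Student", "Course"), {("S1", "C1"), ("S2", "C1")}
)
supplies = RelationshipSet(
    "SUPPLIES", ("Supplier", "Part", "Project"), {("Sup1", "Bolt", "ProjX")}
)
four_way = RelationshipSet("EXAMPLE_4", ("A", "B", "C", "D"))
```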

Additional features of E-R Model

Additional features of the E-R model are Generalization, Specialization and Aggregation. In the ER
model they are used for data abstraction, in which an abstraction mechanism hides the
details of a set of objects.
Generalization:
Generalization is the process of extracting common properties from a set of entities and
creating a generalized entity from them. It is a bottom-up approach in which two or more entities
can be generalized to a higher level entity if they have some attributes in common. For
Example, STUDENT and FACULTY can be generalized to a higher level entity called
PERSON as shown in Figure 1. In this case, common attributes like P_NAME, P_ADD
become part of higher entity (PERSON) and specialized attributes like S_FEE become part
of specialized entity (STUDENT).
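
Generalization resembles class inheritance in programming, which gives a quick way to picture it. In the Python sketch below, the attribute names P_NAME, P_ADD and S_FEE come from the example above, while f_sal for FACULTY is a hypothetical attribute added so that the second subclass has something of its own.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Higher level entity holding the common attributes."""
    p_name: str
    p_add: str


@dataclass
class Student(Person):
    s_fee: float  # attribute specific to STUDENT


@dataclass
class Faculty(Person):
    f_sal: float  # hypothetical attribute specific to FACULTY


student = Student(p_name="Anu", p_add="Pune", s_fee=1200.0)
faculty = Faculty(p_name="Dr. Rao", p_add="Chennai", f_sal=90000.0)
```

Both objects carry the common PERSON attributes, but only the student has a fee. The same class hierarchy read from the top down illustrates specialization.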

Specialization:
In specialization, an entity is divided into sub-entities based on their characteristics. It is a
top-down approach where higher level entity is specialized into two or more lower level
entities. For Example, EMPLOYEE entity in an Employee management system can be
specialized into DEVELOPER, TESTER etc. as shown in Figure 2. In this case, common
attributes like E_NAME, E_SAL etc. become part of higher entity (EMPLOYEE) and
specialized attributes like TES_TYPE become part of specialized entity (TESTER).

Aggregation –
An ER diagram cannot directly represent a relationship between an entity and a
relationship, which may be required in some scenarios. In those cases, a relationship with its
corresponding entities is aggregated into a higher level entity. Aggregation is an abstraction
through which we can represent relationships as higher level entity sets.
For example, an employee working for a project may require some machinery. So, a REQUIRE
relationship is needed between the relationship WORKS_FOR and the entity MACHINERY. Using
aggregation, the WORKS_FOR relationship with its entities EMPLOYEE and PROJECT is
aggregated into a single entity, and the relationship REQUIRE is created between the aggregated
entity and MACHINERY.
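
One possible relational reading of this example is sketched below with Python's sqlite3 module. The column names and the helper are assumptions. The works_for table stands for the aggregated relationship, and require refers to a whole (employee, project) pair from works_for instead of referring to employee and project separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE employee  (emp_id  INTEGER PRIMARY KEY);
    CREATE TABLE project   (proj_id INTEGER PRIMARY KEY);
    CREATE TABLE machinery (mach_id INTEGER PRIMARY KEY);
    -- WORKS_FOR with its entities: the unit that gets aggregated.
    CREATE TABLE works_for (
        emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
        proj_id INTEGER NOT NULL REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    -- REQUIRE links the aggregated (emp_id, proj_id) pair to MACHINERY.
    CREATE TABLE require (
        emp_id  INTEGER NOT NULL,
        proj_id INTEGER NOT NULL,
        mach_id INTEGER NOT NULL REFERENCES machinery(mach_id),
        PRIMARY KEY (emp_id, proj_id, mach_id),
        FOREIGN KEY (emp_id, proj_id) REFERENCES works_for(emp_id, proj_id)
    );
    INSERT INTO employee  VALUES (1);
    INSERT INTO project   VALUES (1), (2);
    INSERT INTO machinery VALUES (7);
    INSERT INTO works_for VALUES (1, 1);
    """
)


def require_machine(emp_id, proj_id, mach_id):
    """Hypothetical helper: True if the REQUIRE row is accepted."""
    try:
        conn.execute(
            "INSERT INTO require VALUES (?, ?, ?)", (emp_id, proj_id, mach_id)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Machinery can be required only for an employee-project pair that actually exists in works_for. Employee 1 and project 2 both exist, but since employee 1 does not work for project 2, that requirement is rejected.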
