DBMS Unit1 Notes
DBMS Unit1 Notes
UNIT –I
Database System Applications
A Historical Perspective
The concept of a database was made possible by the emergence of direct access
storage media such as magnetic disks, which became widely available in the mid-1960s;
earlier systems relied on sequential storage of data on magnetic tape. The subsequent
development of database technology can be divided into three eras based on data model or
structure: navigational, SQL/relational, and post-relational.
The two main early navigational data models were the hierarchical model and
the CODASYL model (network model). These were characterized by the use of pointers
(often physical disk addresses) to follow relationships from one record to another.
The relational model, first proposed in 1970 by Edgar F. Codd, departed from
this tradition by insisting that applications should search for data by content, rather than by
following links. The relational model employs sets of ledger-style tables, each used for a
different type of entity. Only in the mid-1980s did computing hardware become powerful
enough to allow the wide deployment of relational systems (DBMSs plus applications). By
the early 1990s, however, relational systems dominated in all large-scale data
processing applications, and as of 2018 they remain dominant: IBM Db2, Oracle, MySQL,
and Microsoft SQL Server are the most searched DBMS. The dominant database language,
standardized SQL for the relational model, has influenced database languages for other data
models.
The next generation of post-relational databases in the late 2000s became known
as NoSQL databases, introducing fast key–value stores and document-oriented databases. A
competing "next generation" known as NewSQL databases attempted new implementations
that retained the relational/SQL model while aiming to match the high performance of
NoSQL compared to commercially available relational DBMSs.
1960s, Navigational DBMS
The introduction of the term database coincided with the availability of direct-
access storage (disks and drums) from the mid-1960s onwards. The term represented a
contrast with the tape-based systems of the past, allowing shared interactive use rather than
daily batch processing. The Oxford English Dictionary cites a 1962 report by the System
Development Corporation of California as the first to use the term "data-base" in a specific
technical sense.
It doesn’t provide backup and It provides backup and recovery of data even if it is
Backup and Recovery recovery of data if it is lost. lost.
There is less data consistency in the There is more data consistency because of the
Consistency file system. process of normalization.
File systems provide less security in DBMS has more security mechanisms as compared
Security Constraints comparison to DBMS. to file systems.
Data models are used to describe how the data is stored, accessed, and updated in a DBMS.
There are many types of data models that are used in the industry.
Types of Data Models in DBMS
1) Hierarchical model
In hierarchical model, data is organized into a tree like structure with each
record is having one parent record and many children. The main drawback of this model is
that, it can have only one to many relationships between nodes.
Lets say we have few students and few courses and a course can be assigned to a
single student only, however a student take any number of courses so this relationship
becomes one to many.
2) Network model
The network model is a very complex database model, so the user must be very familiar with
the overall structure of the database.
Updating the database is a quite difficult and boring task. We need the help of the application
programs that are being used to navigate the data.
3) Entity-Relationship Model
Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships
among them. While formulating real-world scenario into the database model, the ER Model
creates entity set, relationship set, general attributes and constraints.
ER Model is best used for the conceptual design of a database.
ER Model is based on −
Entitiesand their attributes.
Relationships among entities.
Advantages
1. SIMPLE : It is simple to draw an ER diagram when we know entities and
relationships.
2. EASY TO UNDERSTAND : The design of ER is very logical and hence they
are easy to design and understand.
3. INTEGRATED : The ER Model can be easily integrated with relational
model.
4. USEFUL IN DECISION MAKING : By drawing an ER-Diagram we come to
know what kind of attributes and relationship exist between them.
5. EASY CONVERSION : It can be easily converted to other type of models.
Disadvantages
1. LOSS OF INFORMATION: While drawing an ER Model some of the
information can be hidden or lost.
2. NO REPRESENTATION FOR DATA MANIPULATION: It is not possible
to represent data manipulation(commands like insert(),delete(),alter(),update()) in ER
model.
3. DATA INCONSISTENCY: Due to improper Normalization some data
inconsistency may occur so, while creating an ER diagram at least it should be in third
normal form.
4) Relational Model
The relational model in DBMS is an abstract model used to organize and manage the data
stored in a database. It stores data in two-dimensional inter-related tables, also known as
relations in which each row represents an entity and each column represents the properties of
the entity.
The main highlights of this model are −
Data is stored in tables called relations.
Relations can be normalized.
In normalized relations, values saved are atomic values.
Each row in a relation contains a unique value.
Each column in a relation contains values from a same domain.
Relational Model Concepts
Relational Schema : It is the logical blueprint of the relation i.e., it describes the design
and the structure of the relation. It contains the table name, its attributes, and their types:
For our Student relation example, the relational schema will be:
Relational Instance : It is the collection of records present in the relation at a given time.
Relation Key : It is an attribute or a group of attributes that can be used to uniquely
identify an entity in a table or to determine the relationship between two tables. Relation
keys can be of 6 different types:
1. Candidate Key
2. Super Key
3. Composite Key
4. Primary Key
5. Alternate Key
6. Foreign Key
Highlights:
Relational models make use of some rules to ensure the accuracy and accessibility of the
data. These rules or constraints are known as Relational Integrity Constraints.
These constraints are checked before performing any operation like insertion, deletion, or
updation on the data present in a relational database. These constraints include:
Domain Constraint : It specifies that every attribute is bound to have a value that lies
inside a specific range of values.
Key Constraint : It states that every relation must contain an attribute or a set of
attributes (Primary Key) that can uniquely identify a tuple in that relation. This key can
never be NULL or contain the same value for two different tuples.
Referential Integrity Constraint : It is defined between two inter-related tables. It
states that if a given relation refers to a key attribute of a different or same table.
When we notice any unexpected behavior while working with the relational databases, there
may be a presence of too much redundancy in the data stored in the database. This can
cause anomalies in the DBMS and it can be of various types such as:
InsertionAnomalies: It is the inability to insert data in the database due to the absence of
other data. For example: Suppose we are dividing the whole class into groups for a project
and the GroupNumber attribute is defined so that null values are not allowed. If a new student
is admitted to the class but not immediately assigned to a group then this student can't be
inserted into the database.
Deletion Anomalies - It is the accidental loss of data in the database upon deletion of any
other data element. For example: Suppose, we have an employee relation that contains the
details of the employee along with the department they are working in. Now, if a department
has one employee working in it and we remove the information of this employee from the
table, there will be the loss of data related to the department also. This can lead to data
inconsistency.
Advantages
Disadvantages
The performance of the relational model depends upon the number of relations present in
the database.
Hence, as the number of tables increases, the requirement of physical memory increases.
The structure becomes complex and there is a decrease in the response time for the
queries.
Because of all these factors, the cost of implementing a relational database increase.
5) Object Oriented Data Model
The foundation of any object-oriented programming language is based on the concept of
objects and classes. This enables the user to achieve abstraction, inheritance, polymorphism,
and encapsulation. A similar concept is used in the Object Oriented Model in DBMS.
The Object-Oriented Model in DBMS or OODM is the data model where data is stored in the
form of objects. This model is used to represent real-world entities. The data and data
relationship is stored together in a single entity known as an object in the Object Oriented
Model. The Object-Oriented Database Management System is built on top of Object Oriented
Model.
As we have discussed earlier, we can use the Object Oriented Model in DBMS to store real-
world entities. Here, we can store pictures, audio, video, and other types of data.
Example
Object attribute- The objects have certain characteristics. These are known as
the attributes of the object.
Inheritance- It is the ability of the object within the class hierarchy to inherit the
attributes and methods of the classes above it. A new class can be derived from an
existing class, the new class has the attributes and methods described in the
existing class and also has its attributes and methods. This helps in code
reusability.
It is used to add semantic content that can be well understood by the user.
It is used to make the code resemble real-world objects.
Database integrity can be achieved.
Structural and database independence is created.
We can store pictures, audio, video, and other types of data, which was previously
impossible to store earlier.
The basic goal for the Object-relational database is to bridge the gap between relational
databases and the object-oriented modeling techniques used in programming languages such
as Java, C++, Visual Basic .NET or C#. However, a more popular alternative for achieving
such a bridge is to use a standard relational database systems with some form of object-
relational mapping (ORM) software. Whereas traditional RDBMS or SQL-DBMS products
focused on the efficient management of data drawn from a limited set of datatypes (defined
by the relevant language standards), an objectrelational DBMS allows software developers to
integrate their own types and the methods that apply to them into the DBMS.
It can be used to represent data from some data sources that aren’t bound by a
schema.
It offers a versatile format for data sharing between various DBs.
Viewing structured data as semi-structured can be beneficial for browsing purposes.
It is simple to alter the schema.
It’s possible that the data transfer format is portable.
Disadvantages
The main disadvantage of using a semi-structured data model is that queries cannot be
performed as quickly as they can in a more limited structure, such as the relational model.
In a semi-structured DB, records are typically stored with unique IDs referenced with pointers
to their disc location.
It makes navigational or path-based queries very efficient, but it is inefficient for searching
multiple records (as is common in SQL) because it must seek around the disc following
pointers.
8) Flat Data Model
A flat database is a simple database system in which each database is represented as a single
table in which all of the records are stored as single rows of data, which are separated by
delimiters such as tabs or commas. The table is usually stored and physically represented as a
simple text file.
Because of the limitations of flat databases, they are not unsuitable for most software
applications in which there is a need to represent and store complex business relationships.
However, some application developers still use flat files in order to reduce the cost and
complexity of integrating a relational database.
Unlike relational databases, flat databases cannot represent complex relationships between
entities. They also have no way of enforcing constraints between data.
For example, in an application used by a commercial bank, it is a good idea to ensure that, at
the time of creation, a new account must be linked to an existing customer. In a relational
database this is easily enforced using the concept of foreign keys to ensure that customer IDs
are filled in while creating an account, and also that said customer IDs already exist in
another table. This is not possible with flat databases.
Another limitation of flat databases vis-a-vis relational databases is the former’s lack of query
and indexing capability.
Some real-life examples of flat databases are contact lists in a mobile phone and the storage of
a high-scores list in a simple video game.
Data Abstraction in DBMS
While sending an email to some have you ever thought of where the email is physically stored
, what data model is used ? The answer is No. What we need to know is that we have to send
an email to that particular email address. This is called Data abstraction.
Now let's look at the levels of data abstractions in DBMS and discuss them in detail.
1. Physical or Internal Level
It is the lowest level of abstraction for DBMSs, defining how data is stored, data structures for
storing data, and database access mechanisms.
Developers or database application programmers decide how to store data in the database. It is
complex to understand.
2. Logical or Conceptual Level
The logical level is the next higher level or intermediate level. It explains what data is stored
in the database and how those data are related. It seeks to explain the complete or entire data
by describing what tables should be constructed and what the linkages between those tables
should be. It is less complex than the physical level.
3. View or External Level
This is the top level. There are various views at the view level, with each view defining only a
portion of the total data. It also facilitates user engagement by providing a variety of views or
numerous views of a single database. All users have access to the view level. This is the
easiest and most simple level.
Data Independence
The primary goal of data abstractions in DBMS is to obtain data independence in order to save
time and money when modifying or altering a database.
Data independence is known as the ability to change the scheme without affecting the
programs and applications to be rewritten.
Data Independence is mainly of two types :
1. Physical level independence
It refers to the ability to change the physical schema without changing the conceptual or
logical schema, which is done for optimization purposes.
2. Logical level independence
This feature is referred to as the ability to change the logical schema without changing the
external schema or application program.
Any modifications to the conceptual representation of the data would not affect the user's
perception of the data.
DML Compiler: It processes the DML statements into low level instruction (machine
language), so that they can be executed.
DDL Interpreter: It processes the DDL statements into a set of table containing meta
data (data about data).
Embedded DML Pre-compiler: It processes DML statements embedded in an
application program into procedural calls.
Query Optimizer: It executes the instruction generated by DML Compiler.
2. Storage Manager: Storage Manager is a program that provides an interface between
the data stored in the database and the queries received. It is also known as Database
Control System. It maintains the consistency and integrity of the database by applying
the constraints and executing the DCL statements. It is responsible for updating,
storing, deleting, and retrieving data in the database.
It contains the following components –
Authorization Manager: It ensures role-based access control, i.e,. checks whether
the particular person is privileged to perform the requested operation or not.
Integrity Manager: It checks the integrity constraints when the database is modified.
File Manager: It manages the file space and the data structure used to represent
information in the database.
Buffer Manager: It is responsible for cache memory and the transfer of data between
the secondary storage and main memory.
Data Dictionary: It contains the information about the structure of any database
object. It is the repository of information that governs the metadata.
Indices: It provides faster retrieval of data item.
INTRODUCTION TO DATABASE DESIGN
What is an ER Diagram?
An Entity Relationship Diagram (ER Diagram) pictorially explains the relationship between
entities to be stored in a database. Fundamentally, the ER Diagram is a structural design of
the database. It acts as a framework created with specialized symbols for the purpose of
defining the relationship between the database entities. ER diagram is created based on three
principal components: entities, attributes, and relationships.
The following diagram showcases two entities - Student and Course, and their relationship.
The relationship described between student and course is many-to-many, as a course can be
opted by several students, and a student can opt for more than one course. Student entity
possesses attributes - Stu_Id, Stu_Name & Stu_Age. The course entity has attributes such as
Cou_ID & Cou_Name.
What is an ER Model?
An Entity-Relationship Model represents the structure of the database with the help of a
diagram. ER Modelling is a systematic process to design a database as it would require you to
analyze all data requirements before implementing your database.
History of ER models
Peter Chen proposed ER Diagrams in 1971 to create a uniform convention that can be used as
a conceptual modeling tool. Many models were presented and discussed, but none were
suitable. The data structure diagrams offered by Charles Bachman also inspired his model.
ER Diagram helps you conceptualize the database and lets you know which fields need to
be embedded for a particular entity
ER Diagram gives a better understanding of the information to be stored in a database
It reduces complexity and allows database designers to build databases quickly
It helps to describe elements using Entity-Relationship models
It allows users to get a preview of the logical structure of the database
Entities
Strong Entity
Weak Entity
Attributes
Key Attribute
Composite Attribute
Multi-valued Attribute
Derived Attribute
Relationships
One-to-One Relationships
One-to-Many Relationships
Many-to-One Relationships
Many-to-Many Relationships
Entities
An entity can be either a living or non-living component.
For example, in a student study course, both the student and the course are entities.
Weak Entity
An entity that makes reliance over another entity is called a weak entity
In the example below, school is a strong entity because it has a primary key attribute - school
number.
Unlike school, the classroom is a weak entity because it does not have any primary key and
the room number here acts only as a discriminator.
Attribute
An attribute exhibits the properties of an entity.
Key Attribute
Key attribute uniquely identifies an entity from an entity set.
For example: For a student entity, the roll number can uniquely identify a student from a set of
students.
Composite Attribute
An attribute that is composed of several other attributes is known as a composite attribute.
An oval showcases the composite attribute, and the composite attribute oval is further
connected with other ovals.
Multi-valued Attribute
Some attributes can possess over one value, those attributes are called multivalued attributes.
The double oval shape is used to represent a multivalued attribute.
Derived Attribute
An attribute that can be derived from other attributes of the entity is known as a derived
attribute.
Relationship
The diamond shape showcases a relationship in the ER diagram.
In the example below, both the student and the course are entities, and study is the relationship
between them.
One-to-One Relationship
When a single element of an entity is associated with a single element of another entity, it is
called a one-to-one relationship.
For example, a student has only one identification card and an identification card is given to
one person.
One-to-Many Relationship
When a single element of an entity is associated with more than one element of another entity,
it is called a one-to-many relationship
For example, a customer can place many orders, but an order cannot be placed by many
customers.
Many-to-One Relationship
When more than one element of an entity is related to a single element of another entity, then
it is called a many-to-one relationship.
For example, students have to opt for a single course, but a course can have many students.
Many-to-Many Relationship
When more than one element of an entity is associated with more than one element of another
entity, this is called a many-to-many relationship.
For example, you can assign an employee to many projects and a project can have many
employees.
First,
identify all the Entities. Embed all the entities in a rectangle and label them
properly.
Identify relationships between entities and connect them using a diamond in the middle,
illustrating the relationship. Do not connect relationships with each other.
Connect attributes for entities and label them properly.
Eradicate any redundant entities or relationships.
Make sure your ER Diagram supports all the data provided to design the database.
Effectively use colors to highlight key areas in your diagrams.
Conclusion
ER Diagram in DBMS is widely used to describe the conceptual design of databases. It helps
both users and database developers to preview the structure of the database before
implementing the database.
Entity
An entity in DBMS (Database Management System) is a real-world object that has certain
properties called attributes that define the nature of the entity.
Entities are divided into two categories. These two categories of an entity are tangible entities
and non-tangible entities.
Tangible Entities: Tangible entities are the entities that physically exist in the real world. For
example, the entity of cars, the entity of books, etc.
Non-Tangible Entities: Non-tangible entities are entities that do not physically exist in the
real world. For example, email id, social media account, etc.
In ER diagram,
Attributes are associated with an entity set.
Attributes describe the properties of entities in the entity set.
Based on the values of certain attributes, an entity can be identified uniquely.
Symbols Used-
Example-
In this ER diagram,
Two strong entity sets “Student” and “Course” are related to each other.
Student ID and Student name are the attributes of entity set “Student”.
Student ID is the primary key using which any student can be identified uniquely.
Course ID and Course name are the attributes of entity set “Course”.
Course ID is the primary key using which any course can be identified uniquely.
Double line between Student and relationship set signifies total participation.
It suggests that each student must be enrolled in at least one course.
Single line between Course and relationship set signifies partial participation.
It suggests that there might exist some courses for which no enrollments are made.
Weak Entity Set-
A weak entity set is an entity set that does not contain sufficient attributes to
uniquely identify its entities.
In other words, a primary key does not exist for a weak entity set.
However, it contains a partial key called as a discriminator.
Discriminator can identify a group of entities from the entity set.
Discriminator is represented by underlining with a dashed line.
NOTE-
The combination of discriminator and primary key of the strong entity set makes it
possible to uniquely identify all entities of the weak entity set.
Thus, this combination serves as a primary key for the weak entity set.
Clearly, this primary key is not formed by the weak entity set completely.
Symbols Used-
Example-
In this ER diagram,
One strong entity set “Building” and one weak entity set “Apartment” are related to each
other.
Strong entity set “Building” has building number as its primary key.
Door number is the discriminator of the weak entity set “Apartment”.
This is because door number alone can not identify an apartment uniquely as there may be
several other buildings having the same door number.
Double line between Apartment and relationship set signifies total participation.
It suggests that each apartment must be present in at least one building.
Single line between Building and relationship set signifies partial participation.
It suggests that there might exist some buildings which has no apartment.
Thus,
Primary key of Apartment = Primary key of Building + Its own discriminator= Building
number + Door number
Differences between Strong entity set and Weak entity set-
Strong entity set Weak entity set
A single rectangle is used for the A double rectangle is used for the
representation of a strong entity set. representation of a weak entity set.
A diamond symbol is used for the A double diamond symbol is used for the
representation of the relationship that representation of the identifying relationship
exists between the two strong entity that exists between the strong and weak
sets. entity set.
Total participation may or may not Total participation always exists in the
exist in the relationship. identifying relationship.
Note-
In ER diagram, weak entity set is always present in total participation with the identifying
relationship set.
Relationship Set
Relationship Example:
‘Enrolled in’ is a relationship that exists between entities Student and Course.
Relationship Set:
A relationship set is a set of relationships of same type.
The number of entity sets that participate in a relationship set is termed as the degree of that
relationship set. Thus,
On the basis of degree of a relationship set, a relationship set can be classified into the
following types-
Unary relationship set is a relationship set where only one entity set participates in a
relationship set.
Example:
Binary relationship set is a relationship set where two entity sets participate in a relationship
set.
Example:
Ternary relationship set is a relationship set where three entity sets participate in a relationship
set.
Example:
N-ary relationship set is a relationship set where ‘n’ entity sets participate in a relationship set.
Specialization:
In specialization, an entity is divided into sub-entities based on their characteristics. It is a
top-down approach where higher level entity is specialized into two or more lower level
entities. For Example, EMPLOYEE entity in an Employee management system can be
specialized into DEVELOPER, TESTER etc. as shown in Figure 2. In this case, common
attributes like E_NAME, E_SAL etc. become part of higher entity (EMPLOYEE) and
specialized attributes like TES_TYPE become part of specialized entity (TESTER).
Aggregation –
An ER diagram is not capable of representing relationship between an entity and a
relationship which may be required in some scenarios. In those cases, a relationship with its
corresponding entities is aggregated into a higher level entity. Aggregation is an abstraction
through which we can represent relationships as higher level entity sets.
For Example, Employee working for a project may require some machinery. So, REQUIRE
relationship is needed between relationship WORKS_FOR and entity MACHINERY. Using
aggregation, WORKS_FOR relationship with its entities EMPLOYEE and PROJECT is
aggregated into single entity and relationship REQUIRE is created between aggregated
entity and MACHINERY.