
UNIT-I

Data and Information:


Data: Data is a collection of raw facts without meaning: raw, unorganized facts that need to be processed. Data can be simple, random, and useless until it is organized.

For example, a list of dates is just data; it is meaningless without the information that makes the dates relevant (for example, that they are holiday dates).

Information:
Information is a collection of facts with meaning. When data is processed, organized, structured, or presented in a given context so as to make it useful, it is called information. Data are the raw facts that constitute the building blocks of information, and data are the heart of the DBMS. Note that not all data conveys useful information; useful information is obtained from processed data. In other words, data has to be interpreted in order to obtain information.

Good, timely, relevant information is the key to decision making, and good decision making is the key to organizational survival. Data are a representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by humans or by automatic means.

The data in a DBMS can be broadly classified into two types: the collection of information needed by the organization, and "metadata", which is information about the database itself. Data are the most stable part of an organization's information system. For example, a company needs to store information about employees, departments, and salaries; these pieces of information are called data. Permanently stored data is referred to as persistent data. Generally, we perform operations on data or data items to supply information about an entity.

For example, a library keeps a list of members, books, due dates, and fines.

Data and information are interrelated. Data usually refers to raw, unprocessed data: the basic form of data, data that hasn't been analyzed or processed in any manner. Once the data is analyzed, it is considered information. Information is "knowledge communicated or received concerning a particular fact or circumstance"; it is a sequence of symbols that can be interpreted as a message, providing knowledge or insight about a certain matter.

"Data" and "information" are tied together, whether one recognizes them as two separate words or uses them interchangeably, as is common today. Whether they can be used interchangeably depends somewhat on the usage of "data": its context and grammar.

Database: A database can be defined as a collection of a large amount of data stored in one place, or as a collection of logically related data that can be recorded.

A database is a well-organized collection of data that are related in a meaningful way and can be accessed in different logical orders. Database systems are systems in which the interpretation and storage of information are of primary importance.

The database should contain all the data needed by the organization. As a result, a huge volume of data, the need for long-term storage of the data, and access to the data by a large number of users generally characterize database systems. Several users can access the data in an organization, yet the integrity of the data must still be maintained. A database is integrated when the same information is not recorded in two places.

Database Management System (DBMS): A DBMS is generally defined as a collection of logically related data and a set of programs to access the data.

(OR)

A database management system is a set of programs (software) for defining, creating, manipulating, and maintaining the database.

The DBMS hides much of the database's internal complexity from application programmers and users.
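As a minimal sketch of this definition, Python's built-in sqlite3 module (an embedded DBMS) can play both roles: the connection holds logically related data, and SQL statements act as the programs that access it. The table and names below are invented purely for illustration:

```python
import sqlite3

# A database: logically related data stored together (in-memory for this sketch).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
con.execute("INSERT INTO employee (name, salary) VALUES ('Asha', 52000), ('Ravi', 48000)")

# The "set of programs to access the data": a query run through the DBMS,
# which hides file layout, indexing, and storage details from the user.
rows = con.execute("SELECT name FROM employee WHERE salary > 50000").fetchall()
print(rows)
```

The application never touches files directly; it only issues declarative requests, which is exactly the complexity-hiding described above.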

Objectives of DBMS:
Mass Storage: A DBMS can store a large amount of data, so it is an ideal technology for big firms. It can store thousands of records, and any of that data can be fetched whenever it is needed.

Removes Duplication: With a large amount of data, duplication is bound to occur at some point. A DBMS can guarantee that there is no duplication among the records: while storing new records, it makes sure the same data was not inserted before.
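One common way a DBMS enforces this is a uniqueness constraint: the system itself rejects a record that duplicates an existing one. A small sketch with Python's sqlite3 (the table and values are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A UNIQUE constraint lets the DBMS reject duplicate records automatically.
con.execute("CREATE TABLE student (roll_no INTEGER UNIQUE, name TEXT)")
con.execute("INSERT INTO student VALUES (1, 'Meena')")

try:
    con.execute("INSERT INTO student VALUES (1, 'Meena')")  # same record again
    duplicated = True
except sqlite3.IntegrityError:
    duplicated = False  # the DBMS refused the duplicate

print(duplicated)
```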

Multiple User Access: No one handles the whole database alone; many users are able to access the database, so two or more users may be working on it at the same time, each changing what they need. The DBMS makes sure they can work concurrently without corrupting each other's changes.

Data Protection: Information such as bank details, employee salary details, and sale/purchase records should always be kept secure, and companies need their data protected from unauthorized use. A DBMS gives master-level security to the data: no one can alter or modify the information without the privilege to use that data.

Data Backup and Recovery: Database failures sometimes occur, and losing all the data is not an option. There should be a backup of the database so that it can be recovered after a failure. A DBMS has the ability to back up and recover all the data in the database.
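As an illustration, Python's sqlite3 module exposes SQLite's online backup API; the sketch below copies a live database to a backup and "recovers" from it after the original is closed (the account data is invented):

```python
import sqlite3

# "Live" database with some data in it.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE account (id INTEGER, balance REAL)")
src.execute("INSERT INTO account VALUES (1, 500.0)")
src.commit()

# Back it up to a second database; after a failure, the copy is used to recover.
backup = sqlite3.connect(":memory:")
src.backup(backup)          # sqlite3's online backup API (Python 3.7+)
src.close()                 # simulate losing the original

recovered = backup.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(recovered)
```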

Everyone can work on a DBMS: You do not need to master a programming language to work on a DBMS. Even an accountant with limited technical knowledge can work on it; all the definitions and descriptions are provided, so a person from a non-technical background can use it.

Integrity: Integrity means your data is authentic and consistent. A DBMS has various validity checks that keep the data accurate and consistent.
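Validity checks like these can be expressed as constraints that the DBMS enforces on every insert or update. A small sketch using a CHECK constraint in sqlite3 (the table and the age range are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Validity check: the DBMS enforces that age stays in a sensible range.
con.execute("CREATE TABLE person (name TEXT, age INTEGER CHECK (age BETWEEN 0 AND 150))")

con.execute("INSERT INTO person VALUES ('Kiran', 30)")     # passes the check
try:
    con.execute("INSERT INTO person VALUES ('Oops', -5)")  # violates the check
    bad_row_accepted = True
except sqlite3.IntegrityError:
    bad_row_accepted = False  # the DBMS kept the data consistent

print(bad_row_accepted)
```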

Platform Independent: A DBMS can run on any platform; no particular platform is required to work on a database management system.

Evolution of Database Management Systems (DBMS):

The evolution of the data base management system is as follows


1960's: Before databases, data was maintained in flat files. The file-based system was the predecessor of the present database management system; at that time, there was no system available to handle large volumes of data.

Flat Files: Earlier, punched-card technology was used to store data, and later, files. But files offer no particular advantage and have several limitations.

The network database was developed to fulfill the need to represent more complex data relationships.

Advantages: various access methods, e.g., sequential, indexed, and random.


Limitations:

Requires extensive programming in a third-generation language such as COBOL or BASIC.

In the network data model, files are related as owners and members, similar to the common network model except that each member file can have more than one owner.

The network data model identified the following three database components:

1. Network schema - the database organization (structure).
2. Sub-schema - views of the database per user.
3. Data management language - low level, procedural.

1970's:
1. Relational Data Model
2. Relational Database Implementation

The drawbacks of the network data model were minimal data independence, a minimal theoretical foundation, and complex data access. To overcome these drawbacks, in 1970 E. F. Codd published the relational data model and a relational database management system implementation.

In 1976, Peter Chen presented the Entity-Relationship model (ER model), which is widely used in database design.

[1970-Present] Era of Relational Database and Database Management:

The relational database model was conceived by E. F. Codd in 1970. It can be defined using the following two terms:

1. Instance - a table with rows and columns.

2. Schema - specifies the structure (name of the relation, and the name and type of each column).

This model is based on branches of mathematics called set theory and predicate logic.
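The schema/instance distinction can be made concrete: the schema is the declared structure, while the instance is whatever rows the table holds at a given moment. A small sqlite3 sketch (the table and rows are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Schema: the structure -- relation name, column names and types.
con.execute("CREATE TABLE course (code TEXT, title TEXT, credits INTEGER)")

# Instance: the rows the relation holds at one moment in time.
con.execute("INSERT INTO course VALUES ('CS101', 'DBMS', 4), ('CS102', 'OS', 3)")

# PRAGMA table_info reports the schema; SELECT returns the instance.
schema = [(col[1], col[2]) for col in con.execute("PRAGMA table_info(course)")]
instance = con.execute("SELECT * FROM course ORDER BY code").fetchall()
print(schema)
print(instance)
```

The schema rarely changes once designed; the instance changes with every insert, update, or delete.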

1980's: Advanced Database Systems:

1. RDBMS, Advanced Data Models.

2. Application-Oriented DBMS.

In the 1980s, IBM released two commercial relational database management systems, known as DB2 and SQL/DS. SQL (Structured Query Language) was developed by IBM (International Business Machines).
1985: Object-oriented DBMS (OODBMS) develops.

1990's: Advanced Data Analysis:

1. Data Mining.
2. Data Warehousing.
3. Multimedia Databases and Web-Enabled Databases.

In the 1990s, two approaches to DBMS became more popular: the Object-Oriented DBMS (OODBMS) and the Object-Relational DBMS (ORDBMS).

1990s: Incorporation of object orientation into relational DBMSs; new application areas such as data warehousing and OLAP, the web and Internet, interest in text and multimedia, enterprise resource planning (ERP), and manufacturing resource planning (MRP).

1991: Microsoft Access, a personal DBMS created as an element of Windows, gradually supplanted all other personal DBMS products.

1995: First Internet database applications.

1997: XML applied to database processing, solving long-standing database problems. Major vendors begin to integrate XML into DBMS products.

2000's:

1. Data Mining and its applications.

2. Web technology and web-enabled databases.

In the 2000s, IBM, Oracle, Informix, and others developed powerful DBMSs for handling large databases.

A web-enabled database is simply a database with a web-based interface.

Advantages:
 A web-enabled database allows users to get the information they need from a central repository on demand.
 The database is easy and simple to use.
 Data is easily accessible via a web-enabled database.
Disadvantages:
 The main disadvantage is that it can be hacked easily.
 Web-enabled databases support the full range of DB operations, but in order to make them easy to use, they must be "dumbed down".

Types of Databases:
A DBMS supports many different types of databases.

1. Classification based on the Data model


2. Classification according to Number of Users
3. Classification according to Location
4. Classification based on Usage.
5. Classification based on Structuring.
6. Classification based on the Cost.
7. Classification Based on the Access.

DBMS:
A DBMS is generally defined as a collection of logically related data and a set of programs to access the data. Or: a database management system is a set of programs (software) for defining, creating, manipulating, and maintaining the database.

The DBMS hides much of the database's internal complexity from application programmers and users.
Advantages of DBMS:
There are many advantages of data base management system.

Some of the advantages are:

1. Improved data sharing


2. Improved data security
3. Better data integration
4. Minimized data inconsistency
5. Improved data access
6. Improved decision making
7. Increased end user productivity
8. Minimized data redundancy
9. Concurrent access
10. Reduced application development time
11. Data independence
12. Centralized data management
13. Data base administration

1. Improved data sharing: The DBMS helps create an environment in which data sharing is improved, because the data is stored logically.

2. Improved data security: A DBMS provides a framework for better enforcement of data privacy and security policies, protecting the data from unauthorized users.

3. Better data integration: A DBMS provides better data integration in the database and promotes an integrated view of the entire organization.

4. Minimized data inconsistency: Data inconsistency exists when different versions of the same data appear in different places, leaving users in an ambiguous position. The DBMS helps reduce data inconsistency.

5. Improved data access: The DBMS makes it possible to produce quick answers to ad hoc queries. A DBMS uses a variety of techniques to store and retrieve data efficiently.

6. Improved decision making: Improved data access makes it possible to generate better quality information, on which better decisions are based.

7. Increased end-user productivity: The availability of data, combined with tools that transform data into usable information, can make the difference between success and failure in the global economy. This empowers end users to make quick, informed decisions.

8. Minimized data redundancy: Since all the data is kept in one central database, the same data present in one file need not be duplicated in another; such duplication causes data redundancy. The DBMS helps reduce data redundancy.

9. Concurrent access: The DBMS schedules concurrent access to the data in the database, i.e., the same data can be accessed by multiple users at the same time. The DBMS also protects users from system failures.

10. Reduced application development time: The DBMS supports important functions that are common to many applications, reducing the development time of applications.

11. Data independence: Data independence means that programs are isolated from changes in the way the data is structured or stored. Because the data is stored centrally, programs are independent of its physical representation. In a DBMS there are two kinds of data independence: physical data independence and logical data independence.
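Logical data independence is often provided through views: applications query the view, so the base table can be restructured without the application's query having to change. A minimal sqlite3 sketch (the table and names are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary REAL)")
con.execute("INSERT INTO emp VALUES ('Asha', 'Sales', 52000), ('Ravi', 'HR', 48000)")

# Applications query the view; the base table underneath can gain or lose
# columns without the application's query changing.
con.execute("CREATE VIEW emp_public AS SELECT name, dept FROM emp")
rows = con.execute("SELECT * FROM emp_public ORDER BY name").fetchall()
print(rows)
```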
12. Centralized data management: In a DBMS, all files are integrated into one system, reducing redundancy and making data management more efficient.

13. Database administration: In a DBMS the data is stored centrally, and the person who organizes and administers the database is called the database administrator. The administrator maintains the entire operation of the database and is responsible for the whole of its operation.

Disadvantages of DBMS:
In the database approach, maintaining and developing a database carries risk and requires an investment of money, time, and effort.

A DBMS has a number of disadvantages:

 Newly skilled personnel: Implementing database applications requires managing a staff of new people, because the technology changes rapidly. The organization therefore needs newly skilled people to develop database applications, and keeping staff current is a main drawback.

 Frequent upgrade and replacement cycles: DBMS vendors frequently upgrade their products by adding new functionality. Such new features often come in new upgrade versions of the software, and some of these versions require hardware upgrades.

 Cost of hardware and software: To run DBMS software, a processor with high speed and a large memory is needed, which means you have to upgrade the hardware as well as the software. This process is very costly.

 Cost of data conversion: Converting data from files into a database is a difficult and costly process. Database system designers have to be hired along with application programmers, so maintaining the staff means a lot of money has to be spent on developing the DBMS.

Database System Components (OR) Database System Environment:
The database system is composed of five major parts:
1. Hardware: Hardware refers to all the system's physical devices.
2. Software: Three types of software are needed:
1. Operating system software.
2. DBMS software.
3. Application programs and utility software.
3. People: This component includes all users of the database system.
Database administrators: DBAs manage the DBMS and ensure that the database is functioning properly.
Database designers: These people design the database structure.

Database manager: The database manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. The database manager translates DML statements into low-level file system commands for storing, retrieving, and updating data in the database.
System analysts and programmers: They design and implement the application programs.
End users: These are the people who use the application programs to run the organization's daily operations. Database users are the people who need information from the database to carry out their business responsibilities. Database users can be broadly classified into two categories: application programmers and end users.
4. Procedures: Procedures are the instructions and rules that govern the design and use of the database system. They may contain information on how to log on to the DBMS, start and stop the DBMS, identify a failed component, recover the database, change the structure of a table, and improve performance.
5. Data: The word data covers the collection of facts stored in the database; data are the raw material from which information is generated.
A database is a repository for data which, in general, is both integrated and shared. Integration means that the database may be thought of as a unification of several otherwise distinct files, with any redundancy among those files partially or wholly eliminated. The sharing of a database refers to the sharing of data by different users, in the sense that each of those users may have access to the same piece of data and may use it for different purposes. Any given user will normally be concerned with only a subset of the whole database. The main features of the data in the database are:
 The data in the database is well organized (structured).
 The data in the database is related.
 The data is accessible in different orders without great difficulty.
 The data in the database is persistent, integrated, structured, and shared.
Integrated Data:
When data can be considered a unification of several distinct data files, and any redundancy among those files is eliminated, the data is said to be integrated.

Shared Data:
A database contains data that can be shared by different users for different applications simultaneously. In this way of sharing data, redundancy is reduced, and since repetitions are avoided, the possibility of inconsistencies is reduced.

Metadata:
The information (data) about the data in a database is called metadata. Metadata is available for query and manipulation, just like any other data in the database.
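For example, SQLite keeps its metadata in the catalog table sqlite_master, which is queried with the same SQL used for ordinary data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT)")

# sqlite_master holds data ABOUT the data (metadata): object types, names,
# and the SQL that defined them. It is queried like any user table.
meta = con.execute(
    "SELECT type, name FROM sqlite_master WHERE name = 'book'"
).fetchone()
print(meta)
```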

Database Architecture:
Database architecture essentially describes the location of all the pieces of information that make up the database application. Database architecture can be broadly classified into two-tier, three-tier, and multi-tier architecture. The design of a DBMS depends on its architecture, which can be centralized, decentralized, or hierarchical.
The architecture of a DBMS can be seen as either single-tier or multi-tier. An n-tier architecture divides the whole system into n related but independent modules, each of which can be independently modified, altered, changed, or replaced.

1-tier architecture:
In 1-tier architecture, the DBMS is the only entity: the user sits directly on the DBMS and uses it, and any changes are made directly on the DBMS itself. It does not provide handy tools for end users. Database designers and programmers normally prefer single-tier architecture.
Two-Tier Architecture:
If the architecture of a DBMS is 2-tier, there must be an application through which the DBMS is accessed. Programmers use 2-tier architecture when they access the DBMS by means of an application. Here the application tier is entirely independent of the database in terms of operation, design, and programming.

The two-tier architecture is a client-server architecture in which the client contains the presentation code and the SQL statements for data access. The database server processes the SQL statements and sends the query results back to the client. Two-tier client/server provides a basic separation of tasks: the client, or first tier, is primarily responsible for presenting data to the user, and the server, or second tier, is primarily responsible for supplying data services to the client.

Presentation Services:
"Presentation services" refers to the portion of the application that presents data to the user.
Business Services:
"Business services" are a category of application services. Business services encapsulate an organization's business processes and requirements.
Application Services:
"Application services" provide other functions necessary for the application.
Data Services:
"Data services" provide access to data independent of its location. The data can come from a legacy mainframe, a SQL RDBMS, or proprietary data access systems. Here again, the data services provide a standard interface for accessing data.
Advantages:
 The two-tier architecture is a good approach for systems with stable requirements and a moderate
number of clients.
 The two-tier architecture is the simplest to implement, due to the number of good commercial
development environments.
Drawbacks:
 Software maintenance can be difficult because PC clients contain a mixture of presentation,
validation, and business logic code.
 To make a significant change in the business logic, code must be modified on many PC clients.
Three-tier Architecture:
3-tier architecture separates its tiers from each other based on the complexity of the users and how they use the data present in the database. It is the most widely used architecture for designing a DBMS.

 Presentation layer / user layer is the layer where the user interacts with the database. The user has no knowledge of the underlying database and simply interacts as though all the data were in front of them. You can imagine this layer as a registration form where you input your details. Have you ever wondered where the data goes after you press the 'submit' button? You just know that your details are saved. This is the presentation layer: all the details are taken from the user here and sent to the next layer for processing.
 Application layer is the underlying program responsible for saving the details you have entered and retrieving them to show on the page. This layer holds all the business logic, such as validation, calculations, and manipulation of data, and then sends requests to the database to get the actual data. If this layer sees that a request is invalid, it sends a message back to the presentation layer and does not hit the database layer at all.
 Data layer or database layer is the layer where the actual database resides. All the tables, their mappings, and the actual data are present in this layer. When you save your details from the front end, they are inserted into the respective tables in the database layer by the programs in the application layer. When you want to view your details in the web browser, a request is sent to the database layer by the application layer. The database layer runs queries and gets the data, which is then transferred to the browser (presentation layer) by the programs in the application layer.
Advantages of 3-tier architecture:
 Easy to maintain and modify. Any requested change will not affect any other data in the database; the application layer does all the validations.
 Improved security. Since there is no direct access to the database, data security is increased and there is no fear of mishandling the data: the application layer filters out all malicious actions.
 Good performance. Since this architecture caches data once it is retrieved, there is no need to hit the database for each request. This reduces the time consumed by multiple requests and lets the system serve them concurrently.
Disadvantages of 3-tier architecture:
The disadvantages of 3-tier architecture are that it is a little more complex, and a little more effort is required to reach the database.
Multi-tier Architecture:
A multi-tier, three-tier, or n-tier implementation employs a three-tier logical architecture superimposed on a distributed physical model. Application servers can access other application servers in order to supply services to the client application as well as to other application servers.
The multi-tier architecture is the most general client-server architecture. It can be the most difficult to implement because of its generality, but a good design and implementation of a multi-tier architecture can provide the most benefits in terms of scalability, interoperability, and flexibility.

For example, the client application looks to Application Server 1 to supply data from a mainframe-based application. Application Server 1 has no direct access to the mainframe application, but it knows, through the published application services, that Application Server 2 provides a service to access the data from the mainframe application that satisfies the client request. Application Server 1 then invokes the appropriate service on Application Server 2 and receives the requested data, which is then passed on to the client.

UNIT I

2 Entity-Relationship Model:
2.1 Introduction:

Peter Chen first proposed modelling databases using a graphical technique that humans can relate to easily. Humans can easily perceive entities and their characteristics in the real world and represent the relationships between them. The Entity-Relationship (ER) model gives a conceptual model of the world to be represented in the database. The main motivation for the ER model is to provide a high-level model for conceptual database design, which acts as an intermediate stage prior to mapping the enterprise being modelled onto the conceptual level.
The ER model achieves a high degree of data independence, which means that the database designer does not have to worry about the physical structure of the database. A database schema in the ER model can be represented pictorially by an Entity-Relationship diagram.

2.2 The Building Blocks of an Entity–Relationship Diagram


The basic building blocks of Entity- Relationship diagram are Entity, Attribute and
Relationship.
Entity
An entity is a real-world object, either animate or inanimate, that is easily identifiable. For example, in a school database, students, teachers, classes, and courses offered can be considered entities.
Entity Type
An entity type or entity set is a collection of similar entities. Some examples of entity types
are:
– All students in PSG, say STUDENT.
– All courses in PSG, say COURSE.
– All departments in PSG, say DEPARTMENT.
Relationship
The association among entities is called a relationship. For example, an employee
works_at a department, a student enrolls in a course. Here, Works_at and Enrolls are
called relationships.
Attributes
Attributes are properties of entity types; in other words, entities are described in a database by a set of attributes. The following are examples of attributes:
– Brand, cost, and weight are the attributes of CELLPHONE.
– Roll number, name, and grade are the attributes of STUDENT.
ER Diagram
The ER diagram is used to represent database schema. In ER diagram:
– A rectangle represents an entity set.
– An ellipse represents an attribute.
– A diamond represents a relationship.

– Lines represent linking of attributes to entity sets and of entity sets to
relationship sets.

Example of ER diagram
In this ER diagram the two entities are STUDENT and CLASS. Two simple attributes associated with STUDENT are Roll number and Name. The attributes associated with the entity CLASS are Subject Name and Hall Number. The relationship between the two entities STUDENT and CLASS is Attends.
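One way to see how such a diagram maps onto an actual database: each entity set becomes a table, and the Attends relationship becomes a linking table holding pairs of keys. A sketch in Python's sqlite3 (the sample rows are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Entity sets become tables; the Attends relationship becomes a linking table.
con.executescript("""
CREATE TABLE student (roll_number INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE class   (hall_number INTEGER PRIMARY KEY, subject_name TEXT);
CREATE TABLE attends (
    roll_number INTEGER REFERENCES student(roll_number),
    hall_number INTEGER REFERENCES class(hall_number)
);
""")
con.execute("INSERT INTO student VALUES (1, 'Meena')")
con.execute("INSERT INTO class VALUES (101, 'DBMS')")
con.execute("INSERT INTO attends VALUES (1, 101)")

# Which classes does the student with roll number 1 attend?
subjects = [r[0] for r in con.execute(
    "SELECT c.subject_name FROM attends a "
    "JOIN class c ON a.hall_number = c.hall_number "
    "WHERE a.roll_number = 1")]
print(subjects)
```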

2.3 Classification of Entity Sets


Entity sets can be broadly classified into:
1. Strong entity.
2. Weak entity.
3. Associative entity.

Strong Entity
A strong entity is one whose existence does not depend on another entity.

Example
Consider the example "student takes course". Here student is a strong entity.

In this example, course is considered a weak entity because, if there are no students to take a particular course, that course cannot be offered. The COURSE entity depends on the STUDENT entity.
Weak Entity
A weak entity is one whose existence depends on another entity.

Example
Consider the example "customer borrows loan". Here loan is a weak entity: for every loan there must be at least one customer, so the entity loan depends on the entity customer, and hence loan is a weak entity.

2.4 Attribute Classification


Attributes describe the properties of an entity. Attributes can be broadly classified based on value and on structure. Based on value, attributes can be classified into single-valued, multivalued, derived, and null-valued attributes. Based on structure, attributes can be classified as simple or composite.

Symbols Used in ER Diagram


The elements in ER diagram are Entity, Attribute, and Relationship. The different types
of entities like strong, weak, and associative entity, different types of attributes like
multivalued and derived attributes and identifying relationship and their corresponding
symbols are shown later.

Single-Valued Attribute
A single-valued attribute has only one value associated with it.
Example
Examples of single-valued attributes are the age of a person, the roll number of a student, the registration number of a car, etc.
Multivalued Attribute
A multivalued attribute has more than one value associated with it. For example, a person can have more than one phone number, email address, etc.

Examples of Multivalued Attribute
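In a relational database, a multivalued attribute such as phone numbers is typically stored in its own table, one row per value, linked back to the owning entity. A small sqlite3 sketch (names and numbers are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A multivalued attribute (several phone numbers per person) gets its own
# table, one row per value, keyed back to the person it belongs to.
con.execute("CREATE TABLE person (pid INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE phone (pid INTEGER REFERENCES person(pid), number TEXT)")
con.execute("INSERT INTO person VALUES (1, 'Ravi')")
con.executemany("INSERT INTO phone VALUES (1, ?)", [("98765",), ("91234",)])

numbers = [r[0] for r in
           con.execute("SELECT number FROM phone WHERE pid = 1 ORDER BY number")]
print(numbers)
```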

Derived Attribute
Derived attributes do not exist in the physical database; their values are derived from other attributes present in the database. For example, average_salary in a department should not be stored directly in the database; instead, it can be derived. As another example, age can be derived from date_of_birth.
Example of Derived Attribute
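The age-from-date-of-birth derivation can be computed on demand rather than stored. A short Python sketch (the sample dates and the helper name age_from_dob are invented for illustration):

```python
from datetime import date

def age_from_dob(dob: date, today: date) -> int:
    """Derived attribute: age is computed from date_of_birth, not stored."""
    # Subtract one year if this year's birthday has not happened yet.
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

# Hypothetical person born 15 June 2000, checked on 1 Jan 2024.
age = age_from_dob(date(2000, 6, 15), date(2024, 1, 1))
print(age)  # 23: the June birthday has not yet occurred in 2024
```

Storing age directly would let it drift out of date; deriving it keeps the database consistent.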

Null-Valued Attribute
In some cases, a particular entity may not have an applicable value for an attribute. For such situations, a special value called the null value is used.

Example
In application forms there is often a column for phone number; if a person does not have a phone, a null value is entered in that column.

Simple attribute - Simple attributes have atomic values, which cannot be divided further. For example, a student's phone number is an atomic value of 10 digits.

Composite attribute - Composite attributes are made up of more than one simple attribute. For example, a student's complete name may have first_name and last_name.

Consider the attribute "address", which can be further subdivided into street name, city, and state.

2.5 Relationship Degree

Relationship degree refers to the number of entity sets associated in a relationship. The relationship degree can be broadly classified into unary, binary, and ternary relationships.
2.5.1 Unary Relationship
A unary relationship is otherwise known as a recursive relationship. In a unary relationship the number of associated entity sets is one: an entity related to itself forms a recursive relationship.

Roles and Recursive Relation

When an entity set appears in more than one relationship, it is useful to add labels to the connecting lines. These labels are called roles.
Example
In this example, husband and wife are referred to as roles.

2.5.2 Binary Relationship


In a binary relationship, two entities are involved. Consider the example; each staff will
be assigned to a particular department. Here the two entities are STAFF and DEPARTMENT.
2.5.3 Ternary Relationship

In a ternary relationship, three entities are simultaneously involved. Ternary relationships


are required when binary relationships are not sufficient to accurately describe the
semantics of an association among three entities.
Example
Consider the example of employees assigned to a project. Here we consider three
entities: EMPLOYEE, PROJECT, and LOCATION, linked by the relationship "assigned-to."
Many employees may be assigned to one project, so the employee-project side of this
ternary relationship is one-to-many.

2.5.4 Quaternary Relationships


Quaternary relationships involve four entities. An example of a quaternary relationship is
"A professor teaches a course to students using slides." Here the four entities are
PROFESSOR, SLIDES, COURSE, and STUDENT, and the relationship between the entities is
"Teaches."
2.6 Relationship Classification
Relationship is an association among one or more entities. Relationships can be broadly
classified into one-to-one, one-to-many, many-to-many, and recursive relationships.
2.6.1 One-to-Many Relationship Type
The relationship that associates one entity to more than one entity is called one-to-many
relationship. Example of one-to-many relationship is Country having states. For one
country there can be more than one state hence it is an example of one-to-many
relationship. Another example of one-to-many relationship is parent–child relationship. For
one parent there can be more than one child. Hence it is an example of one-to-many
relationship.
2.6.2 One-to-One Relationship Type
One-to-one relationship is a special case of one-to-many relationship. True one-to-one
relationships are rare. The relationship between the President and the country is an example
of one-to-one relationship. For a particular country there will be only one President. In
general, a country will not have more than one President hence the relationship between
the country and the President is an example of one-to-one relationship. Another example
of one-to-one relationship is House to Location. A house is obviously in only one location.
2.6.3 Many-to-Many Relationship Type
The relationship between EMPLOYEE entity and PROJECT entity is an example of many-
to-many relationship. Many employees will be working on many projects; hence the
relationship between employee and project is a many-to-many relationship.
2.6.4 Many-to-One Relationship Type
The relationship between EMPLOYEE and DEPARTMENT is an example of many-to-one
relationship. There may be many EMPLOYEES working in one DEPARTMENT. Hence
relationship between EMPLOYEE and DEPARTMENT is many-to-one relationship. The four
relationship types are summarized and shown in Table 2.1.

2.7 Reducing ER Diagram to Tables


To implement the database, it is necessary to use the relational model. There is a simple
way of mapping from ER model to the relational model. There is almost one-to-one
correspondence between ER constructs and the relational ones.
2.7.1 Mapping Algorithm
The mapping algorithm gives the procedure to map ER diagram to tables.
The rules in mapping algorithm are given as:
While determining the minimum number of tables required for binary relationships with
given cardinality ratios, keep these thumb rules in your mind-
 For binary relationship with cardinality ratio m : n, separate and individual tables
will be drawn for each entity set and the relationship set.
 For binary relationship with cardinality ratio either m : 1 or 1 : n, always remember
"many side will consume the relationship", i.e. a combined table will be drawn for the
many-side entity set and the relationship set.
 For binary relationship with cardinality ratio 1 : 1 , two tables will be required. You
can combine the relationship set with any one of the entity sets.
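The thumb rules above can be sketched with SQLite; the table and column names below are assumed for illustration. In the m : 1 case, the "many" side (STAFF) consumes the relationship, so only two tables are needed and the department key becomes a foreign key in STAFF:

```python
import sqlite3

# Sketch: mapping an m : 1 binary relationship (STAFF works-in DEPARTMENT).
# The many side consumes the relationship, so two tables suffice.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE department (dept_no TEXT PRIMARY KEY, dept_name TEXT)")
cur.execute("""CREATE TABLE staff (
    staff_id INTEGER PRIMARY KEY,
    staff_name TEXT,
    dept_no TEXT REFERENCES department(dept_no))""")
cur.execute("INSERT INTO department VALUES ('D1', 'Sales')")
cur.executemany("INSERT INTO staff VALUES (?, ?, ?)",
                [(1, 'Asha', 'D1'), (2, 'Ravi', 'D1')])
# Two staff rows point at one department: the m : 1 ratio in data form.
rows = cur.execute("""SELECT s.staff_name, d.dept_name
                      FROM staff s JOIN department d ON s.dept_no = d.dept_no
                      ORDER BY s.staff_id""").fetchall()
print(rows)   # [('Asha', 'Sales'), ('Ravi', 'Sales')]
```

For an m : n ratio the same data would need a third table holding a pair of foreign keys, since neither side can absorb the relationship.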

Regular Entity
Regular entities are entities that have an independent existence and generally represent
real-world objects such as persons and products. Regular entities are represented by
rectangles with a single line.
2.7.2 Mapping Regular Entities
– Each regular entity type in an ER diagram is transformed into a relation. The name
given to the relation is generally the same as the entity type.
– Each simple attribute of the entity type becomes an attribute of the relation.
– The identifier of the entity type becomes the primary key of the corresponding relation.
Example 1
Mapping regular entity type tennis player

This diagram is converted into corresponding table as

Here,
– Entity name = Name of the relation or table.
In our example, the entity name is PLAYER which is the name of the table
– Attributes of ER diagram=Column name of the table.
In our example, the Name, Nation, Position, and Number of Grand slams won form
the columns of the table.
2.7.3 Converting Composite Attribute in an ER Diagram to Tables
When a regular entity type has a composite attribute, only the simple component
attributes of the composite attribute are included in the relation.
Example
In this example the composite attribute is the Customer address, which consists of
Street, City, State, and Zip.

When the regular entity type contains a multi-valued attribute, two new relations are
created.
The first relation contains all of the attributes of the entity type except the multi-valued
attribute.
The second relation contains two attributes that form the primary key of the second
relation. The first of these attributes is the primary key from the first relation, which
becomes a foreign key in the second relation. The second is the multi-valued attribute.

2.7.4 Mapping Multi-valued Attributes in ER Diagram to Tables


A multi-valued attribute has more than one value. One way to map a multi-valued
attribute is to create two tables.
Example
In this example, the skill associated with the EMPLOYEE is a multi-valued attribute, since
an EMPLOYEE can have more than one skill as fitter, electrician, turner, etc.
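A sketch of this two-table mapping in SQLite (identifiers are assumed, not taken from the original diagram): the first relation keeps the regular attributes, and the second pairs the foreign key with the multi-valued attribute, the two together forming its primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# First relation: all attributes except the multi-valued one.
cur.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT)")
# Second relation: (foreign key, multi-valued attribute) forms the primary key.
cur.execute("""CREATE TABLE employee_skill (
    emp_id INTEGER REFERENCES employee(emp_id),
    skill TEXT,
    PRIMARY KEY (emp_id, skill))""")
cur.execute("INSERT INTO employee VALUES (1, 'Kumar')")
cur.executemany("INSERT INTO employee_skill VALUES (?, ?)",
                [(1, 'fitter'), (1, 'electrician'), (1, 'turner')])
skills = [s for (s,) in cur.execute(
    "SELECT skill FROM employee_skill WHERE emp_id = 1 ORDER BY skill")]
print(skills)   # ['electrician', 'fitter', 'turner']
```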

2.7.5 Converting “Weak Entities” in ER Diagram to Tables


Weak entity type does not have an independent existence and it exists only through an
identifying relationship with another entity type called the owner.
For each weak entity type, create a new relation and include all of the simple attributes as
attributes of the relation. Then include the primary key of the identifying relation as a
foreign key attribute to this new relation.
The primary key of the new relation is the combination of the primary key of the identifying
relation and the partial identifier of the weak entity type. In this example DEPENDENT is a
weak entity.

The corresponding table is given by
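In SQL terms, the mapping can be sketched as follows (column names are assumed). The composite primary key rejects a duplicate partial identifier under the same owner, which is exactly what makes the weak entity identifiable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT)")
cur.execute("""CREATE TABLE dependent (
    emp_id INTEGER REFERENCES employee(emp_id),  -- owner's primary key
    dependent_name TEXT,                         -- partial identifier
    relationship TEXT,
    PRIMARY KEY (emp_id, dependent_name))""")    # composite primary key
cur.execute("INSERT INTO employee VALUES (1, 'Meena')")
cur.execute("INSERT INTO dependent VALUES (1, 'Arun', 'son')")
try:
    # Same owner + same partial identifier violates the composite key.
    cur.execute("INSERT INTO dependent VALUES (1, 'Arun', 'son')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)   # False
```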

2.7.6 Converting Binary Relationship to Table
A relationship which involves two entities can be termed as binary relationship. This
binary relationship can be one-to-one, one-to-many, many-to-one, and many-to-many.
Mapping one-to-Many Relationship
For each 1 – M relationship, first create a relation for each of the two entity types
participating in the relationship.
Example
One customer can give many orders. Hence the relationship between the two entities
CUSTOMER and ORDER is one-to-many relationship. In one-to-many relationship, include
the primary key attribute of the entity on the one-side of the relationship as a foreign key
in the relation that is on the many side of the relationship.

Here we have two entities CUSTOMER and ORDER. The relationship between CUSTOMER
and ORDER is one-to-many. For two entities CUSTOMER and ORDER, two tables namely
CUSTOMER and ORDER are created as shown later. The primary key CUSTOMER ID in the
CUSTOMER relation becomes the foreign key in the ORDER relation.
CUSTOMER

Binary one-to-one relationship can be viewed as a special case of one-to-many


relationships.
The process of mapping one-to-one relationship requires two steps. First, two relations are
created, one for each of the participating entity types. Second, the primary key of one of
the relations is included as a foreign key in the other relation.

2.7.7 Mapping Associative Entity to Tables


Many-to-many relationship can be modeled as an associative entity in the ER diagram.

Example 1. (Without Identifier)


Here the associative entity is ORDERLINE, which is without an identifier. That is the
associative entity ORDERLINE is without any key attribute.

The first step is to create three relations: one for each of the two participating entity types
and a third for the associative entity. The relation formed from the associative entity is an
associative relation.

Example 2. (With Identifier)


Sometimes data modelers will assign an identifier (surrogate identifier) to the associative
entity type on the ER diagram. There are two reasons that motivate this approach:
1. The associative entity type has a natural identifier that is familiar to end user.
2. The default identifier may not uniquely identify instances of the associative entity.

(a) Shipment-No is a natural identifier to end user.


(b) The default identifier consisting of the combination of Customer-ID and Vendor-ID
does not uniquely identify the instances of SHIPMENT.

2.7.8 Converting Unary Relationship to Tables


Unary relationships are also called recursive relationships. The two most important cases
of unary relationship are one-to-many and many-to-many.
One-to-many Unary Relationship
Each employee has exactly one manager. A given employee may manage zero to many
employees. The foreign key in the relation is named Manager-ID. This attribute has the
same domain as the primary key Employee-ID.
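This self-referencing foreign key can be sketched in SQLite (names assumed): Manager-ID is just another Employee-ID, pointing back into the same table.

```python
import sqlite3

# Sketch: a recursive 1 : M relationship stored as a self-referencing
# foreign key; manager_id draws on the same domain as employee_id.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name TEXT,
    manager_id INTEGER REFERENCES employee(employee_id))""")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, 'Priya', None),   # top manager: no manager of her own
                 (2, 'Sam', 1),
                 (3, 'Lee', 1)])
reports = [n for (n,) in cur.execute(
    "SELECT name FROM employee WHERE manager_id = 1 ORDER BY name")]
print(reports)   # ['Lee', 'Sam']
```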

2.7.9 Converting Ternary Relationship to Tables


A ternary relationship is a relationship among three entity types. The three entities given
in this example are PATIENT, PHYSICIAN, and TREATMENT. The PATIENT–TREATMENT is
an associative entity.

The primary key attributes – Patient ID, Physician ID, and Treatment Code – become
foreign keys in PATIENT TREATMENT. These attributes are components of the primary key
of PATIENT TREATMENT.
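A sketch of the PATIENT_TREATMENT mapping in SQLite (column names assumed): one foreign key per participating entity, with the three keys together forming the composite primary key of the associative relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY)")
cur.execute("CREATE TABLE physician (physician_id TEXT PRIMARY KEY)")
cur.execute("CREATE TABLE treatment (treatment_code TEXT PRIMARY KEY)")
# Three foreign keys, together the primary key of the ternary relation.
cur.execute("""CREATE TABLE patient_treatment (
    patient_id TEXT REFERENCES patient(patient_id),
    physician_id TEXT REFERENCES physician(physician_id),
    treatment_code TEXT REFERENCES treatment(treatment_code),
    PRIMARY KEY (patient_id, physician_id, treatment_code))""")
cur.execute("INSERT INTO patient VALUES ('P1')")
cur.execute("INSERT INTO physician VALUES ('D1')")
cur.execute("INSERT INTO treatment VALUES ('T1')")
cur.execute("INSERT INTO patient_treatment VALUES ('P1', 'D1', 'T1')")
count = cur.execute("SELECT COUNT(*) FROM patient_treatment").fetchone()[0]
print(count)   # 1
```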

UNIT-2
I. Codd’s Rules in DBMS
Codd's rules were proposed by the computer scientist Dr. Edgar F. Codd, who also invented the
relational model for database management. These rules are made to ensure data integrity,
consistency, and usability. This set of rules basically signifies the characteristics and requirements of a
relational database management system (RDBMS). In this article, we will learn about the various Codd's
rules.
Rule 1: The Information Rule
All information, whether it is user information or metadata, that is stored in a database must be
entered as a value in a cell of a table. It is said that everything within the database is organized in a
table layout.
Rule 2: The Guaranteed Access Rule
Each data element is guaranteed to be accessible logically with a combination of the table name,
primary key (row value), and attribute name (column value).
Rule 3: Systematic Treatment of NULL Values
Every Null value in a database must be given a systematic and uniform treatment.
Rule 4: Active Online Catalog Rule
The database catalog, which contains metadata about the database, must be stored and accessed
using the same relational database management system.
Rule 5: The Comprehensive Data Sublanguage Rule
A crucial component of any efficient database system is its ability to offer an easily understandable
data manipulation language (DML) that facilitates defining, querying, and modifying information
within the database.
Rule 6: The View Updating Rule
All views that are theoretically updatable must also be updatable by the system.
Rule 7: High-level Insert, Update, and Delete
A successful database system must possess the feature of facilitating high-level insertions, updates,
and deletions that can grant users the ability to conduct these operations with ease through a single
query.
Rule 8: Physical Data Independence
Application programs and activities should remain unaffected when changes are made to the
physical storage structures or methods.
Rule 9: Logical Data Independence
Application programs and activities should remain unaffected when changes are made to the logical
structure of the data, such as adding or modifying tables.
Rule 10: Integrity Independence
Integrity constraints should be specified separately from application programs and stored in the
catalog. They should be automatically enforced by the database system.
Rule 11: Distribution Independence
The distribution of data across multiple locations should be invisible to users, and the database
system should handle the distribution transparently.
Rule 12: Non-Subversion Rule
If the interface of the system provides access to low-level records, then that interface must not be
able to subvert the system by bypassing security and integrity constraints.

II. Integrity Constraints


o Integrity constraints are a set of rules. It is used to maintain the quality of information.
o Integrity constraints ensure that the data insertion, updating, and other processes have to be
performed in such a way that data integrity is not affected.
o Thus, integrity constraint is used to guard against accidental damage to the database.
Types of Integrity Constraint
1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an attribute.
o The data type of domain includes string, character, integer, time, date, currency, etc. The
value of the attribute must be available in the corresponding domain.
Example:

2. Entity integrity constraints


o The entity integrity constraint states that primary key value can't be null.
o This is because the primary key value is used to identify individual rows in relation and if the
primary key has a null value, then we can't identify those rows.
o A table can contain a null value other than the primary key field.
Example:

3. Referential Integrity Constraints


o A referential integrity constraint is specified between two tables.
o In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary Key of
Table 2, then every value of the Foreign Key in Table 1 must be null or be available in Table 2.
Example:
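A minimal sketch of the constraint in SQLite (table names assumed); note that SQLite enforces foreign keys only when the pragma below is switched on. A foreign key value must either be null or exist in the referenced table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforcement switch
conn.execute("CREATE TABLE dept (dept_no TEXT PRIMARY KEY)")
conn.execute("""CREATE TABLE emp (
    emp_id INTEGER PRIMARY KEY,
    dept_no TEXT REFERENCES dept(dept_no))""")
conn.execute("INSERT INTO dept VALUES ('D1')")
conn.execute("INSERT INTO emp VALUES (1, 'D1')")   # referenced value exists: OK
conn.execute("INSERT INTO emp VALUES (2, NULL)")   # NULL is also allowed
try:
    conn.execute("INSERT INTO emp VALUES (3, 'D9')")  # no such department
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)   # True
```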

4. Key constraints
o Keys are the entity set that is used to identify an entity within its entity set uniquely.
o An entity set can have multiple keys, out of which one key will be the primary key. A
primary key must contain a unique value and cannot be null in the relational table.
Example:

III. Relational Algebra


Relational algebra is a procedural query language. It gives a step by step process to obtain the
result of the query. It uses operators to perform queries.
Types of Relational operation

1. Select Operation:
o The select operation selects tuples that satisfy a given predicate.
o It is denoted by sigma (σ).
1. Notation: σ p(r)
Where:
σ is used for the selection predicate
r is used for the relation
p is a propositional logic formula which may use connectives like AND, OR, and NOT, and
relational operators like =, ≠, ≥, <, >, ≤.
For example: LOAN Relation

BRANCH_NAME LOAN_NO AMOUNT

Downtown L-17 1000

Redwood L-23 2000

Perryride L-15 1500

Downtown L-14 1500

Mianus L-13 500

Roundhill L-11 900

Perryride L-16 1300


Input:
1. σ BRANCH_NAME="Perryride" (LOAN)
Output:

BRANCH_NAME LOAN_NO AMOUNT

Perryride L-15 1500

Perryride L-16 1300
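The select operation can be sketched in Python, modelling the LOAN relation as a list of dictionaries and σ as a filter on tuples:

```python
# Sketch of σ BRANCH_NAME="Perryride" (LOAN) over an in-memory relation.
LOAN = [
    {"BRANCH_NAME": "Downtown",  "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "Redwood",   "LOAN_NO": "L-23", "AMOUNT": 2000},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "Downtown",  "LOAN_NO": "L-14", "AMOUNT": 1500},
    {"BRANCH_NAME": "Mianus",    "LOAN_NO": "L-13", "AMOUNT": 500},
    {"BRANCH_NAME": "Roundhill", "LOAN_NO": "L-11", "AMOUNT": 900},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]

def select(relation, predicate):
    """sigma: keep the tuples for which the predicate holds."""
    return [t for t in relation if predicate(t)]

result = select(LOAN, lambda t: t["BRANCH_NAME"] == "Perryride")
print([t["LOAN_NO"] for t in result])   # ['L-15', 'L-16']
```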


2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the result. The rest
of the attributes are eliminated from the table.
o It is denoted by ∏.
1. Notation: ∏ A1, A2, ..., An (r)
Where
A1, A2, ..., An are attribute names of relation r.
Example: CUSTOMER RELATION

NAME STREET CITY

Jones Main Harrison

Smith North Rye

Hays Main Harrison

Curry North Rye

Johnson Alma Brooklyn

Brooks Senator Brooklyn


Input:
1. ∏ NAME, CITY (CUSTOMER)
Output:

NAME CITY

Jones Harrison

Smith Rye

Hays Harrison

Curry Rye

Johnson Brooklyn

Brooks Brooklyn
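The project operation can be sketched in Python over the CUSTOMER relation; duplicate tuples are dropped because relations are sets of tuples:

```python
# Sketch of ∏ NAME, CITY (CUSTOMER) with duplicate elimination.
CUSTOMER = [
    ("Jones",   "Main",    "Harrison"),
    ("Smith",   "North",   "Rye"),
    ("Hays",    "Main",    "Harrison"),
    ("Curry",   "North",   "Rye"),
    ("Johnson", "Alma",    "Brooklyn"),
    ("Brooks",  "Senator", "Brooklyn"),
]

def project(relation, *indexes):
    """pi: keep the chosen columns, dropping duplicate tuples."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[i] for i in indexes)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

result = project(CUSTOMER, 0, 2)   # the NAME and CITY columns
print(result)
```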
3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the tuples that are
either in R or S or in both R and S.
o It eliminates the duplicate tuples. It is denoted by ∪.
1. Notation: R ∪ S
A union operation must hold the following condition:
o R and S must have the same number of attributes, with compatible domains.
o Duplicate tuples are eliminated automatically.
Example:
DEPOSITOR RELATION

CUSTOMER_NAME ACCOUNT_NO

Johnson A-101

Smith A-121
Mayes A-321

Turner A-176

Johnson A-273

Jones A-472

Lindsay A-284
BORROW RELATION

CUSTOMER_NAME LOAN_NO

Jones L-17

Smith L-23

Hayes L-15

Jackson L-14

Curry L-93

Smith L-11

Williams L-17
Input:
1. ∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:

CUSTOMER_NAME

Johnson

Smith

Hayes

Turner

Jones

Lindsay

Jackson

Curry

Williams

Mayes
4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation contains all tuples that
are in both R and S.
o It is denoted by intersection ∩.
1. Notation: R ∩ S
Example: Using the above DEPOSITOR table and BORROW table
Input:
1. ∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:

CUSTOMER_NAME

Smith

Jones
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation contains all tuples that
are in R but not in S.
o It is denoted by minus (-).
1. Notation: R - S
Example: Using the above DEPOSITOR table and BORROW table
Input:
1. ∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)
Output:

CUSTOMER_NAME

Jackson

Hayes

Williams

Curry
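The three set operations can be sketched with Python sets over the CUSTOMER_NAME projections of the two relations:

```python
# Union, intersection, and difference over the CUSTOMER_NAME columns
# of the DEPOSITOR and BORROW relations.
depositor = {"Johnson", "Smith", "Mayes", "Turner", "Jones", "Lindsay"}
borrow    = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Williams"}

union        = borrow | depositor   # R ∪ S: names in either relation
intersection = borrow & depositor   # R ∩ S: names in both
difference   = borrow - depositor   # R − S: names in BORROW only
print(sorted(intersection), sorted(difference))
```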
6. Cartesian product
o The Cartesian product is used to combine each row in one table with each row in the other
table. It is also known as a cross product.
o It is denoted by X.
1. Notation: E X D
Example:
EMPLOYEE

EMP_ID EMP_NAME EMP_DEPT

1 Smith A

2 Harry C

3 John B

DEPARTMENT

DEPT_NO DEPT_NAME

A Marketing

B Sales

C Legal
Input:
1. EMPLOYEE X DEPARTMENT
Output:

EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME

1 Smith A A Marketing

1 Smith A B Sales

1 Smith A C Legal

2 Harry C A Marketing

2 Harry C B Sales

2 Harry C C Legal

3 John B A Marketing

3 John B B Sales

3 John B C Legal
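The cross product can be sketched with itertools.product: every row of one table is paired with every row of the other, giving 3 × 3 = 9 rows here.

```python
from itertools import product

# Sketch of EMPLOYEE X DEPARTMENT as a Cartesian product of tuples.
EMPLOYEE   = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

cross = [e + d for e, d in product(EMPLOYEE, DEPARTMENT)]
print(len(cross))    # 9
print(cross[0])      # (1, 'Smith', 'A', 'A', 'Marketing')
```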
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
1. ρ(STUDENT1, STUDENT)
IV. Join Operations:
A Join operation combines related tuples from different relations, if and only if a given join
condition is satisfied. It is denoted by ⋈.
Example:
EMPLOYEE

EMP_CODE EMP_NAME

101 Stephan

102 Jack

103 Harry
SALARY

EMP_CODE SALARY

101 50000

102 30000

103 25000
1. Operation: (EMPLOYEE ⋈ SALARY)
Result:

EMP_CODE EMP_NAME SALARY

101 Stephan 50000

102 Jack 30000

103 Harry 25000


Types of Join operations:

1. Natural Join:
o A natural join is the set of tuples of all combinations in R and S that are equal on their
common attribute names.
o It is denoted by ⋈.
Example: Let's use the above EMPLOYEE table and SALARY table:
Input:
1. ∏EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)
Output:

EMP_NAME SALARY

Stephan 50000

Jack 30000

Harry 25000
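The natural join can be sketched in Python as a nested loop that merges tuples agreeing on all common attribute names (here, EMP_CODE):

```python
# Sketch of EMPLOYEE ⋈ SALARY on the shared EMP_CODE attribute.
EMPLOYEE = [{"EMP_CODE": 101, "EMP_NAME": "Stephan"},
            {"EMP_CODE": 102, "EMP_NAME": "Jack"},
            {"EMP_CODE": 103, "EMP_NAME": "Harry"}]
SALARY   = [{"EMP_CODE": 101, "SALARY": 50000},
            {"EMP_CODE": 102, "SALARY": 30000},
            {"EMP_CODE": 103, "SALARY": 25000}]

def natural_join(r, s):
    common = set(r[0]) & set(s[0])       # attributes shared by both relations
    out = []
    for a in r:
        for b in s:
            if all(a[c] == b[c] for c in common):
                out.append({**a, **b})   # merge the matching tuples
    return out

joined = natural_join(EMPLOYEE, SALARY)
print([(t["EMP_NAME"], t["SALARY"]) for t in joined])
```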
2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with missing
information.
Example:
EMPLOYEE

EMP_NAME STREET CITY

Ram Civil line Mumbai

Shyam Park street Kolkata

Ravi M.G. Street Delhi

Hari Nehru nagar Hyderabad


FACT_WORKERS

EMP_NAME BRANCH SALARY

Ram Infosys 10000

Shyam Wipro 20000

Kuber HCL 30000

Hari TCS 50000


Input:
1. (EMPLOYEE ⋈ FACT_WORKERS)
Output:

EMP_NAME STREET CITY BRANCH SALARY

Ram Civil line Mumbai Infosys 10000

Shyam Park street Kolkata Wipro 20000

Hari Nehru nagar Hyderabad TCS 50000

An outer join is basically of three types:


a. Left outer join
b. Right outer join
c. Full outer join
a. Left outer join:
o Left outer join contains the set of tuples of all combinations in R and S that are equal on their
common attribute names.
o In addition, tuples in R that have no matching tuples in S are included, padded with nulls.
o It is denoted by ⟕.
Example: Using the above EMPLOYEE table and FACT_WORKERS table
Input:
1. EMPLOYEE ⟕ FACT_WORKERS

EMP_NAME STREET CITY BRANCH SALARY

Ram Civil line Mumbai Infosys 10000

Shyam Park street Kolkata Wipro 20000

Hari Nehru nagar Hyderabad TCS 50000

Ravi M.G. Street Delhi NULL NULL

b. Right outer join:


o Right outer join contains the set of tuples of all combinations in R and S that are equal on
their common attribute names.
o In addition, tuples in S that have no matching tuples in R are included, padded with nulls.
o It is denoted by ⟖.
Example: Using the above EMPLOYEE table and FACT_WORKERS Relation
Input:
1. EMPLOYEE ⟖ FACT_WORKERS
Output:

EMP_NAME BRANCH SALARY STREET CITY

Ram Infosys 10000 Civil line Mumbai

Shyam Wipro 20000 Park street Kolkata

Hari TCS 50000 Nehru nagar Hyderabad

Kuber HCL 30000 NULL NULL


c. Full outer join:
o Full outer join is like a left or right join except that it contains all rows from both tables.
o In full outer join, tuples in R that have no matching tuples in S, and tuples in S that have no
matching tuples in R, are included, padded with nulls on the missing side.
o It is denoted by ⟗.
Example: Using the above EMPLOYEE table and FACT_WORKERS table
Input:
1. EMPLOYEE ⟗ FACT_WORKERS
Output:

EMP_NAME STREET CITY BRANCH SALARY

Ram Civil line Mumbai Infosys 10000

Shyam Park street Kolkata Wipro 20000

Hari Nehru nagar Hyderabad TCS 50000

Ravi M.G. Street Delhi NULL NULL

Kuber NULL NULL HCL 30000
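The full outer join can be sketched in Python; None stands in for the NULL paddings shown in the table above. Matching rows are merged, and unmatched rows from either side survive with nulls on the missing side:

```python
# Sketch of EMPLOYEE ⟗ FACT_WORKERS joined on EMP_NAME.
EMPLOYEE = {"Ram":   ("Civil line",  "Mumbai"),
            "Shyam": ("Park street", "Kolkata"),
            "Ravi":  ("M.G. Street", "Delhi"),
            "Hari":  ("Nehru nagar", "Hyderabad")}
FACT_WORKERS = {"Ram":   ("Infosys", 10000),
                "Shyam": ("Wipro",   20000),
                "Kuber": ("HCL",     30000),
                "Hari":  ("TCS",     50000)}

rows = []
for name in EMPLOYEE.keys() | FACT_WORKERS.keys():
    street, city   = EMPLOYEE.get(name, (None, None))      # None = NULL
    branch, salary = FACT_WORKERS.get(name, (None, None))
    rows.append((name, street, city, branch, salary))

rows.sort()   # set iteration order is arbitrary, so sort for display
print(rows)
```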

3. Equi join:
It is also known as an inner join. It is the most common join. It is based on matched data as per the
equality condition. The equi join uses the comparison operator(=).
Example:
CUSTOMER RELATION

CLASS_ID NAME

1 John

2 Harry

3 Jackson
PRODUCT

PRODUCT_ID CITY

1 Delhi

2 Mumbai

3 Noida

Input:
1. CUSTOMER ⋈ CLASS_ID = PRODUCT_ID PRODUCT
Output:

CLASS_ID NAME PRODUCT_ID CITY

1 John 1 Delhi

2 Harry 2 Mumbai

3 Jackson 3 Noida

V. Relational Calculus
There is an alternate way of formulating queries known as Relational Calculus. Relational calculus is a non-
procedural query language. In a non-procedural query language, the user specifies what results are required
without giving the details of how to obtain them. The relational calculus tells what to do but never explains how to do it.
Most commercial relational languages are based on aspects of relational calculus, including SQL, QBE, and
QUEL.
Why it is called Relational Calculus?
It is based on Predicate Calculus, a branch of symbolic logic. A predicate is a truth-valued function with
arguments. On substituting values for the arguments, the function results in an expression called a
proposition, which can be either true or false. Relational calculus is a tailored version of a subset of the
Predicate Calculus used to communicate with the relational database.
Many of the calculus expressions involves the use of Quantifiers.
There are two types of quantifiers:
o Universal Quantifier: The universal quantifier, denoted by ∀, is read as "for all"; it means
that all tuples in a given set satisfy a given condition.
o Existential Quantifier: The existential quantifier, denoted by ∃, is read as "there exists"; it
means that in a given set of tuples there is at least one tuple whose values satisfy a given condition.
Before using the concept of quantifiers in formulas, we need to know the concept of Free and Bound
Variables.
A tuple variable t is bound if it is quantified, i.e., it appears within the scope of a ∃ or ∀ quantifier. A
variable that is not bound is said to be free.
Free and bound variables may be compared with global and local variable of programming languages.
Types of Relational calculus:
1. Tuple Relational Calculus (TRC)
It is a non-procedural query language based on tuple variables (also known as range variables) for which
a predicate holds true. It describes the desired information without giving a specific procedure for
obtaining that information. The tuple relational calculus is used to select tuples in a relation; in TRC, the
filtering variable ranges over the tuples of a relation. The result of the query can have one or more tuples.
Notation:
A Query in the tuple relational calculus is expressed as following notation
1. {T | P (T)} or {T | Condition (T)}
Where
T is the resulting tuples
P(T) is the condition used to fetch T.
For example:
1. { T.name | Author(T) AND T.article = 'database' }
Output: This query selects the tuples from the AUTHOR relation. It returns a tuple with 'name' from
Author who has written an article on 'database'.
TRC (tuple relation calculus) can be quantified. In TRC, we can use Existential (∃) and Universal
Quantifiers (∀).
For example:
1. { R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}
Output: This query will yield the same result as the previous one.
2. Domain Relational Calculus (DRC)
The second form of relation is known as Domain relational calculus. In domain relational calculus, filtering
variable uses the domain of attributes. Domain relational calculus uses the same operators as tuple
calculus. It uses logical connectives ∧ (and), ∨ (or) and ┓ (not). It uses Existential (∃) and Universal
Quantifiers (∀) to bind the variable. The QBE or Query by example is a query language related to domain
relational calculus.
Notation:
1. { a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Where
a1, a2 are attributes
P stands for formula built by inner attributes
For example:
1. {< article, page, subject > | < article, page, subject > ∈ javatpoint ∧ subject = 'database'}
Output: This query will yield the article, page, and subject from the relational javatpoint, where the
subject is a database.
VI. Query By Example (QBE)
QBE stands for Query By Example and it was developed in the 1970s by Moshé Zloof at IBM.
It is a graphical query language where we get a user interface and then we fill some required fields to
get our proper result.
In SQL we get an error if the query is not correct, but in QBE a wrong query either returns a wrong
answer or simply does not execute; it never produces an error message.
Note:
In QBE we do not write complete queries as in SQL or other database languages; the interface comes
with blanks that we simply fill in to get the required result.
Example
Consider the example where a table ‘SAC’ is present in the database with Name, Phone_Number, and
Branch fields. And we want to get the name of the SAC-Representative name who belongs to the MCA
Branch. If we write this query in SQL we have to write it like
SELECT NAME
FROM SAC
WHERE BRANCH = 'MCA';
And we will certainly get the correct result. In QBE, by contrast, there is a field present; we just fill it
with "MCA", click the SEARCH button, and get the required result.
Points about QBE:

 Supported by most of the database programs.


 It is a Graphical Query Language.
 Created in parallel to SQL development.
UNIT-3
NORMALIZATION
1. What is 1NF 2NF and 3NF?
First Normal Form, or 1NF, removes repeated groups from a table to guarantee atomicity. The Second
Normal Form, or 2NF, lessens redundancy by eliminating partial dependencies. In a relational database,
the Third Normal Form, or 3NF, reduces data duplication by removing transitive
dependencies.
2. What are the four 4 types of database normalization?
First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and Boyce-Codd
Normal Form (BCNF) are the four stages of database normalization. They enhance data integrity in
relational databases by gradually removing redundant data.
3.What are the 3 rules in normalizing database?
Normalization rules in database design include: 1) Eliminate data redundancy by organizing data into
separate tables, 2) Ensure each table has a primary key for unique identification, and 3) Establish
relationships between tables using foreign keys for data integrity.
4.Why do we need normalization in databases?
Redundancy occurs when the same piece of information is stored in multiple places in a database.
Database redundancy can lead to many drawbacks and introduces three anomalies (or abnormalities).
These anomalies are-
Insertion Anomaly
This anomaly occurs when specific data cannot be inserted into the table or database due to the
absence of some other data where both of these are independent of each other.
Deletion Anomaly
While deleting some data, when some critical information is lost that was necessary to maintain the
integrity of data, it is known as a deletion anomaly.
Updation / Modification Anomaly
This type of anomaly occurs when a single piece of data has to be updated but the change demands
multiple rows of data to be updated. This further leads to data inconsistency if one forgets to update
the data in some places.

Due to these anomalies, storage costs increase as the size of the database grows (because of
redundant data), further increasing the database's complexity and making it more challenging to
maintain.
To rectify and address these issues, we need to optimize the given database by using the normalization
technique so that no redundant values are present in the database
5.What is DBMS Normalisation?
Normalization in a database is the process in which we organize the given data by minimizing the
redundancy present in a relation. In this, we eliminate the anomalies present, namely - update,
insertion and deletion. Normalization divides the single table into smaller tables and links them using
relationships. The different normal forms help us minimize redundancy in the database table.
To perform normalization in the database, we decompose the table into multiple tables. This process
keeps repeating until we achieve the Single Responsibility Principle (SRP), which states that one table
should have one role only.
6.What is Normalization? Explain about 1NF,2NF,3NF and BCNF with suitable examples.

Types of DBMS Normal Form


Normalization in a database is done through a series of normal forms.
Normal Form	Description

1NF	If a table has no repeating groups (every cell holds an atomic value), it is in 1NF.

2NF	If a table is in 1NF and every non-key attribute is fully dependent on the whole
primary key, then it is in 2NF.
3NF	If a table is in 2NF and has no transitive dependencies, it is in 3NF.
BCNF	If a table is in 3NF and, for every functional dependency X → Y, X is a super key,
then it is in BCNF.
4NF	If a table is in BCNF and has no multi-valued dependencies, it is in 4NF.
First Normal Form (1NF)
In 1NF, every cell of a relation contains an atomic value that can't be further divided, i.e., the
relation shouldn't have multivalued attributes.
Example:
The following table contains two phone number values for a single attribute.

So to convert it into 1NF, we decompose the table as the following -

Here, we can notice data repetition, but 1NF doesn’t care about it.
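The 1NF decomposition above can be sketched in Python with assumed sample data (the original table images are not reproduced here): a multi-valued phone column is split so that every cell holds a single atomic value, at the cost of repeating the name.

```python
# Sketch: converting a multi-valued phone attribute to 1NF.
unnormalised = [("Ravi",  ["9876543210", "9123456780"]),
                ("Meena", ["9988776655"])]

# One row per (name, phone) pair: every cell is now atomic.
first_normal_form = [(name, phone)
                     for name, phones in unnormalised
                     for phone in phones]
print(first_normal_form)
```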
Second Normal Form (2NF)
In 2NF, the relation should already be in 1NF, and no partial dependency should exist. A partial
dependency arises when a non-prime attribute depends on only a part of a composite candidate or
primary key rather than on the whole key.
Example 1: (depicting partial dependency issues)
If given a relation R(A, B, C, D) where {A, B} is the primary key, A and B can't be NULL
simultaneously but either can be NULL independently, and C, D are non-prime attributes, suppose
we are given the functional dependency B → C. Can this ever hold?
When B contains NULL, it can never determine the value of C. B → C is a partial dependency (C
depends on only part of the key), and it creates a problem: non-prime attributes must not be
determined by only a part of the primary key. We can remove the partial dependency by creating
two relations (the 2NF conversion):
Relation 1 = R1(A, B, D), where {A, B} is the primary key and AB determines D.
Relation 2 = R2(B, C), where B is the primary key and B determines C.
Example 2:
Consider the following table. Its primary key is {StudentId, ProjectId}.
The Functional dependencies given are -
StudentId → StudentName
ProjectId → ProjectName

As it represents partial dependency, we decompose the table as follows -

Here projectId is mentioned in both tables to set up a relationship between them.


Third Normal Form (3NF)
In 3NF, the given relation should be in 2NF, and no transitive dependency should exist, i.e., a
non-prime attribute should not determine another non-prime attribute.
Example:
Consider the following scenario where the functional dependencies are -
A → B and B → C, where A is the primary key.
Here the non-prime attribute C is determined by another non-prime attribute B, so a transitive
dependency (A → B → C) exists. To remove it and convert the relation into 3NF, we decompose it
into two relations -
R1(A, B), where A is the primary key, and R2(B, C), where B is the primary key.
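A minimal Python sketch of this 3NF decomposition, using hypothetical data in which A is an employee id, B is a department, and C is the department's location (so B → C is the transitive step):

```python
# 3NF split of R(A, B, C) with A -> B and B -> C (transitive A -> C).
# Hypothetical data: A = employee id, B = department, C = department location.
r = [
    (1, "Sales", "Building-1"),
    (2, "Sales", "Building-1"),   # location repeated for every Sales employee
    (3, "HR",    "Building-2"),
]

# Decompose: R1(A, B) keyed on A, and R2(B, C) keyed on B.
r1 = [(a, b) for a, b, _ in r]
r2 = sorted({(b, c) for _, b, c in r})  # each department's location stored once

print(r1)
print(r2)
```

After the split, changing a department's location is a single-row update in r2 instead of one update per employee, which is exactly the anomaly 3NF removes.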
Boyce-Codd Normal Form (BCNF)
For BCNF, the relation should be in 3NF and, for every functional dependency A → B that holds, A
should be a super key. This implies that no attribute, prime or non-prime, may be determined by
anything other than a super key.
Example:
Given the following table. Its candidate keys are {Student, Teacher} and {Student, Subject}.
The Functional dependencies given are -
{Student, Teacher} → Subject
{Student, Subject} → Teacher
Teacher → Subject
The dependency Teacher → Subject violates BCNF because Teacher is not a super key, so we decompose the table into the following tables:
Here Teacher is mentioned in both tables to set up a relationship between them.
Advantages of Normalization
The following are the advantages of normalization in a database:
1. The redundancy in data is minimized, leading to a smaller size of the database.
2. It removes the data inconsistency.
3. The database becomes easy to maintain when we organize it using normal forms.
4. It becomes comparatively easier to write queries as the size of the database decreases.
5. Decreased database size further reduces the complexity of sorting and finding any value in
the database.
Disadvantages of Normalization
The following are the disadvantages of normalization in a database:
1. Over-decomposing tables during normalization can lead to a poorer database design and
severe problems, such as queries that require many joins.
2. The process of normalization is time-consuming, as we repeatedly decompose the table
into different normal forms until the desired normal form is reached.
3. It becomes tough to normalize relations of a higher degree.
Transaction Processing – Database Security
1. Transaction in DBMS
When the data of users is stored in a database, that data needs to be accessed and modified from time
to time. This task should be performed with a specified set of rules and in a systematic way to maintain
the consistency and integrity of the data present in a database. In DBMS, this task is called a
transaction. It is similar to a bank transaction, where the user requests to withdraw some amount of
money from his account. Subsequently, several operations take place such as fetching the user’s
balance from the database, subtracting the desired amount from it, and updating the user’s account
balance. This series of operations can be called a transaction. Transactions are very common in DBMS.
In this article, we will discuss what a transaction means, various operations of transactions, transaction
states, and properties of transactions in DBMS.
2. What does a Transaction mean in DBMS?
Transaction in Database Management Systems (DBMS) can be defined as a set of logically related
operations. It is the result of a request made by the user to access the contents of the database and
perform operations on it. It consists of various operations and has various states in its completion
journey. It also has some specific properties that must be followed to keep the database consistent.
Operations of Transaction
A user can make different types of requests to access and modify the contents of a database. So, we
have different types of operations relating to a transaction. They are discussed as follows:
i) Read(X)
A read operation is used to read the value of X from the database and store it in a buffer in the main
memory for further actions such as displaying that value. Such an operation is performed when a user
wishes just to see any content of the database and not make any changes to it. For example, when a
user wants to check his/her account’s balance, a read operation is performed on the user’s account
balance from the database.
ii) Write(X)
A write operation is used to write the value to the database from the buffer in the main memory. For a
write operation to be performed, first a read operation is performed to bring its value in buffer, and
then some changes are made to it, e.g. some set of arithmetic operations are performed on it
according to the user’s request, then to store the modified value back in the database, a write
operation is performed. For example, when a user requests to withdraw some money from his
account, his account balance is fetched from the database using a read operation, then the amount to
be deducted from the account is subtracted from this value, and then the obtained value is stored back
in the database using a write operation.
iii) Commit
This operation in transactions is used to maintain integrity in the database. Due to some failure of
power, hardware, or software, etc., a transaction might get interrupted before all its operations are
completed. This may cause ambiguity in the database, i.e. it might get inconsistent before and after
the transaction. To ensure that further operations of any other transaction are performed only after
the work of the current transaction is done, a commit operation is performed to save the changes
made by a transaction permanently to the database.
iv) Rollback
This operation is performed to bring the database to the last saved state when any transaction is
interrupted due to a power, hardware, or software failure. In simple words, a rollback operation
undoes the operations of the transaction that were performed before its interruption, restoring a
safe state of the database and avoiding any kind of ambiguity or inconsistency.
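The four operations can be sketched with Python's built-in sqlite3 module; the account data below is hypothetical:

```python
import sqlite3

# Read, Write, Commit, and Rollback against a hypothetical account table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO account VALUES (1, 1000)")
con.commit()

# Read(X): fetch the balance into program memory (the "buffer").
(balance,) = con.execute("SELECT balance FROM account WHERE id = 1").fetchone()

# Write(X) + Commit: subtract the withdrawal and save the change permanently.
con.execute("UPDATE account SET balance = ? WHERE id = 1", (balance - 300,))
con.commit()

# Rollback: an interrupted change is undone, restoring the last saved state.
con.execute("UPDATE account SET balance = balance - 9999 WHERE id = 1")
con.rollback()

(final,) = con.execute("SELECT balance FROM account WHERE id = 1").fetchone()
print(final)  # 700: the committed withdrawal persisted, the rolled-back one did not
```

Only the committed update survives; the uncommitted one disappears on rollback, which is exactly the behaviour the Commit and Rollback operations describe.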
3. Transaction Schedules
When multiple transaction requests are made at the same time, we need to decide their order of
execution. Thus, a transaction schedule can be defined as a chronological order of execution of
multiple transactions. There are broadly two types of transaction schedules discussed as follows,
i) Serial Schedule
In this kind of schedule, when multiple transactions are to be executed, they are executed serially, i.e.
at one time only one transaction is executed while others wait for the execution of the current
transaction to be completed. This ensures consistency in the database as transactions do not execute
simultaneously. But, it increases the waiting time of the transactions in the queue, which in turn lowers
the throughput of the system, i.e. the number of transactions executed per unit time. To improve the
throughput of the system, another kind of schedule is used, with stricter rules that help the
database remain consistent even when transactions execute simultaneously.
ii) Non-Serial Schedule
To reduce the waiting time of transactions in the waiting queue and improve the system efficiency, we
use nonserial schedules which allow multiple transactions to start before a transaction is completely
executed. This may sometimes result in inconsistency and errors in database operation. So, these
errors are handled with specific algorithms to maintain the consistency of the database and improve
CPU throughput as well. Non-Serial Schedules are also sometimes referred to as parallel schedules as
transactions execute in parallel in this kind of schedules.
4. Serializability
Serializability in DBMS is the property of a non-serial schedule that determines whether it would
maintain database consistency or not. A non-serial schedule that ensures the database will be
consistent after its transactions are executed in the order it specifies is called a serializable
schedule. Serial schedules always maintain database consistency, since each transaction starts only
after the execution of the previous one has completed; thus, serial schedules are always
serializable.
A transaction is a series of operations, so various states occur in its completion journey. They are
discussed as follows:
i) Active
It is the first stage of any transaction when it has begun to execute. The execution of the transaction
takes place in this state. Operations such as insertion, deletion, or updation are performed during this
state. During this state, the data records are under manipulation and they are not saved to the
database, rather they remain somewhere in a buffer in the main memory.
ii) Partially Committed
This state of transaction is achieved when it has completed most of the operations and is executing its
final operation. It can be a signal to the commit operation, as after the final operation of the
transaction completes its execution, the data has to be saved to the database through the commit
operation. If some kind of error occurs during this state, the transaction goes into a failed state, else it
goes into the Committed state.
iii) Committed
This state of transaction is achieved when all the transaction-related operations have been executed
successfully along with the Commit operation, i.e. data is saved into the database after the required
manipulations in this state. This marks the successful completion of a transaction.
iv) Failed
If any of the transaction-related operations cause an error during the active or partially committed
state, further execution of the transaction is stopped and it is brought into a failed state. Here, the
database recovery system makes sure that the database is in a consistent state.
v) Aborted
If the error is not resolved in the failed state, the transaction is aborted, and a rollback operation
is performed to bring the database to the last saved consistent state. When the transaction is
aborted, the database recovery module either restarts the transaction or kills it.
In summary, a transaction moves from Active to Partially Committed to Committed on success; if an
error occurs, it moves from the Active or Partially Committed state to Failed and then Aborted.
5. Properties of Transaction
As transactions deal with accessing and modifying the contents of the database, they must have some
basic properties which help maintain the consistency and integrity of the database before and after the
transaction. Transactions follow 4 properties, namely, Atomicity, Consistency, Isolation, and Durability.
Generally, these are referred to as ACID properties of transactions in DBMS. ACID is the acronym used
for transaction properties. A brief description of each property of the transaction is as follows.
i) Atomicity
This property ensures that either all operations of a transaction are executed or it is aborted. In any
case, a transaction can never be completed partially. Each transaction is treated as a single unit (like an
atom). Atomicity is achieved through commit and rollback operations, i.e. changes are made to the
database only if all operations related to a transaction are completed, and if it gets interrupted, any
changes made are rolled back using rollback operation to bring the database to its last saved state.
ii) Consistency
This property of a transaction keeps the database consistent before and after a transaction is
completed. Execution of any transaction must ensure that after its execution, the database is either in
its prior stable state or a new stable state. In other words, the result of a transaction should be the
transformation of a database from one consistent state to another consistent state. Consistency, here
means, that the changes made in the database are a result of logical operations only which the user
desired to perform and there is not any ambiguity.
iii) Isolation
This property states that two transactions must not interfere with each other, i.e. if some data is used
by a transaction for its execution, then any other transaction can not concurrently access that data
until the first transaction has completed. It ensures that the integrity of the database is maintained and
we don’t get any ambiguous values. Thus, any two transactions are isolated from each other. This
property is enforced by the concurrency control subsystem of DBMS.
iv) Durability
This property ensures that the changes made to the database after a transaction is completely
executed, are durable. It indicates that permanent changes are made by the successful execution of a
transaction. In the event of any system failures or crashes, the consistent state achieved after the
completion of a transaction remains intact. The recovery subsystem of DBMS is responsible for
enforcing this property.
6. Database Security
Security of databases refers to the array of controls, tools, and procedures designed to ensure and
safeguard confidentiality, integrity, and availability. This tutorial concentrates on confidentiality
because it is the component most at risk in data security breaches.
Security for databases must cover and safeguard the following aspects:
o The database containing data.
o Database management systems (DBMS)
o Any applications that are associated with it.
o Physical database servers or the database server virtual, and the hardware that runs it.
o The infrastructure for computing or network that is used to connect to the database.
Security of databases is a complicated and challenging task that involves all aspects of security
practices and technologies. It is also inherently at odds with database accessibility: the more
usable and accessible the database is, the more susceptible it is to security threats; conversely,
the more invulnerable the database is to attacks, the more difficult it becomes to access and use.
Why Database Security is Important?
By definition, a data breach is a failure to maintain the confidentiality of the data in a database.
How much damage a data breach does to our business depends on several consequences or
factors:
o Intellectual property that is compromised: Our intellectual property--trade secrets,
inventions, or proprietary methods -- could be vital for our ability to maintain an advantage in
our industry. If our intellectual property has been stolen or disclosed and our competitive
advantage is lost, it could be difficult to keep or recover.
o The damage to our brand's reputation: Customers or partners may not want to purchase
goods or services from us (or deal with our business) If they do not feel they can trust our
company to protect their data or their own.
o The concept of business continuity (or lack of it): Some businesses cannot continue to
function until a breach has been resolved.
o Penalties or fines to be paid for not complying: The cost of not complying with international
regulations such as the Sarbanes-Oxley Act (SOX) or the Payment Card Industry Data Security
Standard (PCI DSS), with industry-specific data privacy regulations such as HIPAA, or with
regional privacy laws such as the European Union's General Data Protection Regulation (GDPR)
can be severe, with fines in the worst cases exceeding several million dollars per violation.
o Costs for repairing breaches and notifying consumers about them: Alongside notifying
customers of a breach, the company that has been breached is required to cover the
investigation and forensic services such as crisis management, triage repairs to the affected
systems, and much more.
7. Common Threats and Challenges
Many software misconfigurations, vulnerabilities, and patterns of carelessness or misuse can lead to
a security breach. The following are some of the most prevalent types and causes of security
attacks.
Insider Threats
An insider threat is a security risk from any of three kinds of actors with privileged access to the
database:
o A malicious insider who intends to cause harm
o A negligent insider who makes mistakes that leave the database vulnerable to attack
o An infiltrator: an outsider who obtains credentials through a scheme such as phishing or by
accessing the credential database itself
Insider threats are among the most common sources of database security breaches. They often arise
when access to privileged user credentials is granted too widely.
Human Error
The unintentional mistakes, weak passwords or sharing passwords, and other negligent or uninformed
behaviours of users remain the root causes of almost half (49 percent) of all data security breaches.
Exploitation of Database Software Vulnerabilities
Hackers make their living by finding and exploiting vulnerabilities in software such as database
management software. Major database software vendors and open-source database management platforms
release regular security patches to address these weaknesses, but failing to apply the patches
promptly increases the risk of compromise.
SQL/NoSQL Injection Attacks
A threat specific to databases is the injection of arbitrary SQL (or non-SQL) attack strings into
database queries issued by web applications or HTTP headers. Organizations that do not follow
secure coding practices for web applications and do not run regular vulnerability tests are open to
these attacks.
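A minimal Python sketch (using the built-in sqlite3 module and hypothetical user data) of why parameterized queries block this attack, contrasting string concatenation with a placeholder:

```python
import sqlite3

# Injection-prone query vs. a parameterized one; the users table is hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, secret TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?)",
                [("alice", "s3cret"), ("bob", "hunter2")])

malicious = "nobody' OR '1'='1"

# UNSAFE: string concatenation lets the attacker rewrite the WHERE clause.
unsafe = con.execute(
    "SELECT name FROM users WHERE name = '" + malicious + "'").fetchall()

# SAFE: the placeholder treats the whole input as a single literal value.
safe = con.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)).fetchall()

print(len(unsafe), len(safe))  # injection returned every row; the safe query none
```

The concatenated query matches every row because the injected `OR '1'='1'` becomes part of the SQL, while the placeholder version compares against the literal string and matches nothing.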
Buffer Overflow Exploits
A buffer overflow happens when a program tries to copy more data into a fixed-length block of
memory than the block can hold. Attackers can use the excess data, which ends up in adjacent memory
addresses, as a foundation from which to launch attacks.
Denial-of-Service (DoS/DDoS) Attacks
In a denial-of-service (DoS) attack, the attacker overwhelms the target server (in this case, the
database server) with such a large volume of requests that it can no longer service legitimate
requests from actual users. In many cases, the server becomes unstable or fails entirely.
Malware
Malware is software written to exploit vulnerabilities or otherwise cause harm to the database.
Malware can arrive via any endpoint device connecting to the database's network.
Attacks on Backups
Companies that fail to protect backup data with the same rigorous controls used for the database
itself are vulnerable to attacks on those backups.
8. Control methods of Database Security
Database security means keeping sensitive information safe and preventing the loss of data. Security
of the database is controlled by the Database Administrator (DBA).
The following are the main control measures used to provide security of data in databases:
1. Authentication
2. Access control
3. Inference control
4. Flow control
5. Database Security applying Statistical Method
6. Encryption
These are explained as following below.
1. Authentication :
Authentication is the process of confirming that a user logs in only according to the rights
granted to them to perform database activities. A particular user can log in only up to their
privilege level and cannot access other sensitive data; the privilege of accessing sensitive data
is restricted by using authentication.
Authentication tools based on biometrics, such as retina scans and fingerprints, can protect the
database from unauthorized or malicious users.
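Password-based login checks are one common piece of authentication. A hedged Python sketch using only the standard library, storing a salted PBKDF2 hash instead of the password itself (the sample password and iteration count are illustrative):

```python
import hashlib
import hmac
import os

# Store a salted hash, not the password: a leaked table reveals nothing directly.
def make_record(password: str) -> tuple:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

salt, digest = make_record("correct horse")
print(verify("correct horse", salt, digest))  # True
print(verify("wrong guess", salt, digest))    # False
```

The constant-time comparison avoids leaking, through response timing, how many leading bytes of a guess were correct.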
2. Access Control :
The security mechanism of a DBMS must include provisions for restricting access to the database
by unauthorized users. Access control is enforced by creating user accounts and controlling the
login process through the DBMS, so that access to sensitive data is possible only to those
database users who are allowed it and is denied to unauthorized persons.
The database system must also keep track of all operations performed by each user throughout the
entire login session.
3. Inference Control :
This method is known as the countermeasure to the statistical database security problem. It is
used to prevent the user from completing any inference channel, and it protects sensitive
information from indirect disclosure.
Inferences are of two types: identity disclosure and attribute disclosure.
4. Flow Control :
This prevents information from flowing in a way that lets it reach unauthorized users. Pathways
through which information can flow implicitly, in ways that violate a company's privacy policy,
are called covert channels.
5. Database Security applying Statistical Method :
Statistical database security focuses on protecting the confidential individual values stored in
the database while allowing retrieval of summaries of values based on categories; it does not
permit retrieval of individual information.
For example, this allows access to statistical information such as the number of employees in the
company, but not to detailed confidential or personal information about a specific individual
employee.
6. Encryption :
This method is mainly used to protect sensitive data (such as credit card numbers and OTPs). The
data is encoded using an encryption algorithm.
An unauthorized user who tries to access this encoded data will have difficulty decoding it, while
authorized users are given decoding keys to decrypt the data.
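A toy Python sketch of the encode/decode idea: the same key transforms the data in and out. The XOR scheme below is for illustration only and is not secure; real systems use vetted algorithms such as AES through an audited cryptography library:

```python
# Toy illustration of symmetric encryption: one shared key encodes and decodes.
# XOR with a repeating key is NOT a real cipher; never use this in production.
def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"decoding-key"                                  # hypothetical shared key
ciphertext = xor_cipher(b"4111-2222-3333-4444", key)   # e.g. a card number

print(ciphertext != b"4111-2222-3333-4444")  # True: stored form is unreadable
print(xor_cipher(ciphertext, key))           # authorized users recover the data
```

Applying the same function twice with the same key restores the original bytes, which is the symmetric-key property the section describes.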
Q.1: What is meant by a transaction in DBMS?
In DBMS, a transaction is a set of logical operations performed to access and modify the contents of the
database as per the user’s request.
Q.2: What is meant by ACID properties in transactions?
ACID is an acronym used for the properties of transaction in DBMS. It is used to denote the properties of
transactions, i.e. Atomicity, Consistency, Isolation and Durability.
Q.3: Which operation of transactions ensures the durability property?
In DBMS, the durability of a transaction, i.e. the changes made by it are saved to the database
permanently, is ensured by the ‘Commit’ operation. A transaction is completed only if data is saved
using ‘Commit’ operation. And then, the changes remain durable, i.e. in case of any system failures, the
last saved state of the database can be recovered through database recovery subsystem in DBMS.
Q.4: What is meant by schedules of transactions in DBMS?
When multiple transaction requests are made at the same time, we need to decide the order of
execution of these transactions. This chronological order of execution is called a schedule of
transactions in DBMS. Schedules are mainly of two types: serial schedules and non-serial
schedules.
Q.5: What do you mean by serializability in DBMS?
Serializability is the property of a schedule of transactions in DBMS that determines whether the
database would remain in a consistent state if the transactions are executed following the given
schedule.
Q.6: What is primary key, candidate key, and foreign key?
The primary key uniquely identifies each record in a table. A candidate key is any minimal set of
attributes that could serve as the primary key (a relation may have several candidate keys, one of
which is chosen as the primary key). A foreign key is an attribute that references the primary key
of another table to establish relationships between tables. These keys form the foundation of
effective data management.