Dbms 123
For example, a list of dates (data) is meaningless without the context that makes those dates
relevant, such as knowing that they are holiday dates.
Information:
Information is a collection of facts that carries meaning. When data are processed, organized,
structured or presented in a given context so as to make them useful, they are called information. Data are raw facts
that constitute the building blocks of information. Data are the heart of the DBMS. It is to be noted that not all data
convey useful information; useful information is obtained from processed data. In other words, data
have to be interpreted in order to obtain information.
Good, timely, relevant information is the key to decision making, and good decision making is the key to
organizational survival. Data are a representation of facts, concepts or instructions in a formalized manner
suitable for communication, interpretation or processing by humans or by automatic means.
The data in a DBMS can be broadly classified into two types: the collection of information needed
by the organization, and "metadata", which is information about the database. Data are
the most stable part of an organization's information system. A company needs to save information about
employees, departments and salaries; these pieces of information are called data. Data kept in permanent storage
are referred to as persistent data. Generally, we perform operations on data or data items to obtain some
information about an entity.
Database: A database can be defined as a collection of a large amount of data stored at one place, or as a
collection of logically related data that can be recorded.
A database is a well-organized collection of data that are related in a meaningful way and can be
accessed in different logical orders. Database systems are systems in which the interpretation and
storage of information are of primary importance.
(OR)
A database management system (DBMS) is a set of programs (software) for defining, creating,
manipulating and maintaining the database.
The DBMS hides much of the database's internal complexity from the application programmers and
users.
Objectives of DBMS:
Mass Storage: A DBMS can store a very large amount of data, so it is an ideal technology for big firms.
It can store thousands of records and fetch any of that data whenever it is needed.
Removes Duplication: If you have a lot of data, duplication is bound to occur at some point. A DBMS
guarantees that there will be no duplication among the records: while storing new records, it
makes sure the same data has not been inserted before.
Multiple User Access: No one handles the whole database alone. There are many users who access the
database, so two or more users may be working on it at the same time. They can change whatever they
need to, and the DBMS makes sure that they can work concurrently.
Data Protection: Information such as bank details, employees' salary details and sale/purchase details
should always be kept secure. Companies also need their data protected from unauthorized use.
A DBMS gives master-level security to the data: no one can alter or modify the information without the
privilege to use that data.
Data Backup and Recovery: Database failures sometimes occur, and it is not acceptable to simply say
that all the data has been lost. There should be a backup of the database so that it can be recovered
after a failure. A DBMS has the ability to back up and recover all the data in the database.
Everyone can work on DBMS: There is no need to master a programming language to work with a
DBMS. Even an accountant with little technical knowledge can work on it, because all the
definitions and descriptions are provided within it.
Integrity: Integrity means your data is authentic and consistent. A DBMS has various validity checks that
keep your data accurate and consistent.
Platform Independent: A DBMS can run on any platform. No particular platform is required to work with
a database management system.
The file-based system was the predecessor of the present database management system;
at that time there was no system available to handle large volumes of data.
Flat Files: Earlier, punched-card technology was used to store data, and later, files. But files offer no
real advantage and have several limitations.
The network database was developed to fulfill the need to represent more complex data relationships.
In the network data model, files are related as owners and members, similar to the hierarchical
model except that each member file can have more than one owner.
1970’s:
1. Relational Data Model
2. Relational Database Implementation
The drawbacks of the network data model were minimal data independence, minimal theoretical
foundation and complex data access.
To overcome these drawbacks, in 1970 Codd published the relational data model and a relational
database management system implementation.
In 1976, Peter Chen presented the Entity–Relationship model (ER model), which is widely used in
database design.
The relational database model was conceived by E. F. Codd in 1970. It is based on two branches of
mathematics: set theory and predicate logic.
In the 1980s, IBM released two commercial relational database management systems, known as SQL/DS and
DB2. SQL (Structured Query Language) was also developed by IBM (International Business Machines).
1985: Object-oriented DBMSs (OODBMS) are developed.
1990s: Incorporation of object orientation into relational DBMSs; new application areas such as data
warehousing and OLAP, the web and Internet, interest in text and multimedia, and enterprise resource
planning (ERP) and material requirements planning (MRP).
1991: Microsoft Access, a personal DBMS created as an element of Windows, gradually supplanted all other
personal DBMS products.
1997: XML applied to database processing, which solves long-standing database problems. Major vendors
begin to integrate XML into DBMS products.
2000's:
In the 2000s, IBM, Oracle, Informix and others developed powerful DBMSs for handling very large databases.
Types of Databases:
A DBMS supports many different types of databases.
DBMS:
A DBMS is generally defined as a collection of logically related data and a set of programs to access that
data. Or:
A database management system is a set of programs (software) for defining, creating,
manipulating and maintaining the database.
The DBMS hides much of the database's internal complexity from the application programmers and users.
Advantages of DBMS:
There are many advantages of data base management system.
1. Improved data sharing: The DBMS helps create an environment in which data sharing is
improved because the data are stored in one logical place.
2. Improved data security: A DBMS provides a framework for better enforcement of data privacy
and security policies. A DBMS protects the data more securely from unauthorized users.
3. Better data integration: A DBMS provides better data integration in the database. It promotes an
integrated view of the entire organization.
4. Minimized data inconsistency: Data inconsistency exists when different versions of the same data
appear in different places, leaving users in an ambiguous position. The DBMS helps to reduce data
inconsistency.
5. Improved data access: The DBMS makes it possible to produce quick answers to ad hoc
(unplanned) queries. A DBMS utilizes a variety of techniques to store and retrieve the data efficiently.
6. Improved decision making: Improved data access makes it possible to generate better quality
information, on which better decisions are based.
7. Increased end-user productivity: The availability of data, combined with the tools that transform
data into usable information, empowers end users to make quick and informed decisions, which can
make the difference between success and failure in the global economy.
8. Minimized data redundancy: Since all the data reside in one central database, the same data
present in one file need not be duplicated in another; such duplication causes data redundancy. The
DBMS helps to reduce data redundancy.
9. Concurrent access: The DBMS schedules concurrent access to the data in the database, i.e. the
same data can be accessed by multiple users at the same time. The DBMS also protects users from the
effects of system failures.
10. Reduced application development time: The DBMS supports important functions that are
common to many applications, thereby reducing application development time.
11. Data independence: Data independence means that programs are isolated from changes in the
way the data are structured or stored. Because the data are managed centrally, application programs
are independent of the physical details of the data. In a DBMS there are two kinds of data
independence: physical data independence and logical data independence.
12. Centralized data management: In a DBMS, all files are integrated into one system, reducing
redundancy and making data management more efficient.
13. Database administration: In a DBMS the data are stored in a centralized manner; the person who
organizes and administers the database is called the database administrator (DBA), and he or she is
responsible for the entire operation of the database.
Disadvantages of DBMS:
In the database approach, developing and maintaining a database involves risk and requires an
investment of money, time and effort.
Newly skilled personnel: Because technology changes rapidly, implementing and managing database
applications requires hiring or training newly skilled people, and the organization must keep their
skills current; this is a significant drawback.
Frequent upgrades and replacement cycles: DBMS vendors frequently upgrade their products by adding
new functionality. Such new features often come in new upgrade versions of the software, and some of
these versions require hardware upgrades as well.
Cost of hardware and software: To run DBMS software, a processor with high speed and a large amount
of memory is needed, which means the hardware as well as the software must be upgraded. This process
is very costly.
Cost of data conversion: It is difficult and costly to convert data held in files into a database.
Database system designers have to be hired along with application programmers, and maintaining such
a staff means a lot of money has to be spent on developing the DBMS.
Database System Components (OR) Database System Environment:
The database system is composed of five major parts:
1. Hardware: Hardware refers to all of the system's physical devices.
2. Software: Three types of software are needed:
1. Operating system software
2. DBMS software
3. Application programs and utility software
3. People: This component includes all users of the database system.
Database administrators: Database administrators (DBAs) manage the DBMS and ensure that the database is functioning properly.
Database designers: These people design the database structure.
Database Manager: The database manager is a program module which provides the interface between the low-
level data stored in the database and the application programs and queries submitted to the system.
The database manager translates DML statements into low-level file system commands for storing,
retrieving and updating data in the database.
System analysts and programmers: They design and implement the application programs.
End users: These are the people who use the application programs to run the organization's daily
operations. Database users are the people who need information from the database to carry out their business
responsibilities. Database users can be broadly classified into two categories: application programmers
and end users.
4. Procedures: Procedures are the instructions and rules that govern the design and use of the database
system.
Procedures are the rules that govern the design and the use of database. The procedure may contain
information on how to log on to the DBMS, start and stop the DBMS, procedure on how to identify the
failed component, how to recover the database, change the structure of the table, and improve the
performance.
5. Data: The word data covers the collection of facts stored in the database; data are the raw
material from which information is generated.
A database is a repository for data which, in general, is both integrated and shared. Integration means
that the database may be thought of as unifications of several otherwise distinct files, with any redundancy
among those files partially or wholly eliminated. The sharing of a database refers to the sharing of data by
different users, in the sense that each of those users may have access to the same piece of data and may use
it for different purposes. Any given user will normally be concerned with only a subset of the whole
database. The main features of the data in the database are listed below:
The data in the database is well organized (structured)
The data in the database is related
The data are accessible in different orders without great difficulty
The data in the database is persistent, integrated, structured, and shared.
Integrated Data:
The data in a database can be considered a unification of several distinct data files; when any redundancy among
those files is eliminated, the data are said to be integrated.
Shared Data:
A database contains data that can be shared by different users for different applications simultaneously. With
this kind of sharing, the redundancy of data is reduced, and since repetitions are avoided, the possibility of
inconsistencies is also reduced.
Metadata:
The information (data) about the data in a database is called Metadata. The Metadata are available for query
and manipulation, just as other data in the database.
Database Architecture:
Database architecture essentially describes the location of all the pieces of information that make up
the database application. The database architecture can be broadly classified into two-tier, three-tier and
multitier architecture. The design of a DBMS depends on its architecture. It can be centralized or
decentralized or hierarchical.
The architecture of a DBMS can be seen as either single tier or multi-tier. N-tier architecture divides
the whole system into related but independent n modules, which can be independently modified, altered,
changed or replaced.
1-tier architecture:
In 1-tier architecture, the DBMS is the only entity where the user directly sits on the DBMS and uses
it. Any changes done here will directly be done on the DBMS itself. It does not provide handy tools for end-
users. Database designers and programmers normally prefer to use single-tier architecture.
Two-Tier Architecture:
If the architecture of DBMS is 2-tier, then it must have an application through which the DBMS can
be accessed. Programmers use 2-tier architecture where they access the DBMS by means of an application.
Here the application tier is entirely independent of the database in terms of operation, design, and
programming.
The two-tier architecture is a client–server architecture in which the client contains the presentation
code and the SQL statements for data access. The database server processes the SQL statements and sends
query results back to the client. Two-tier client/server provides a basic separation of tasks.
The client or first tier, is primarily responsible for the presentation of data to the user and the
“server,” or second tier, is primarily responsible for supplying data services to the client
Presentation Services:
"Presentation services" refers to the portion of the application which presents data to the user.
Business Services:
"Business services" are a category of application services. Business services encapsulate an
organization's business processes and requirements.
Application Services:
"Application services" provide other functions necessary for the application.
Data Services:
"Data services" provide access to data independent of their location. The data can come from a legacy
mainframe, an SQL RDBMS, or proprietary data access systems. Once again, the data services provide a
standard interface for accessing data.
Advantages :
The two-tier architecture is a good approach for systems with stable requirements and a moderate
number of clients.
The two-tier architecture is the simplest to implement, due to the number of good commercial
development environments.
Drawbacks:
Software maintenance can be difficult because PC clients contain a mixture of presentation,
validation, and business logic code.
To make a significant change in the business logic, code must be modified on many PC clients.
Three-tier Architecture:
3-tier architecture separates its tiers from each other based on the complexity of the users and how
they use the data present in the database. It is the most widely used architecture to design a DBMS.
Presentation layer / User layer is the layer through which the user uses the database. The user does not
have any knowledge of the underlying database and simply interacts with the system as though all the
data were in front of him. You can imagine this layer as a registration form where you input
your details. Have you ever wondered where the data goes after you press the 'submit' button?
You just know that your details are saved. This is the presentation layer: all the details from the
user are taken here and sent to the next layer for processing.
Application layer is the underlying program which is responsible for saving the details that you
have entered and for retrieving your details to show on the page. This layer holds all the business logic,
such as validations, calculations and manipulations of data, and then sends requests to the database to get
the actual data. If this layer finds that a request is invalid, it sends a message back to the presentation
layer and does not hit the database layer at all.
Data layer or Database layer is the layer where the actual database resides. In this layer, all the tables,
their mappings and the actual data are present. When you save your details from the front end, they are
inserted into the respective tables in the database layer by the programs in the application
layer. When you want to view your details in the web browser, a request is sent to the database layer by the
application layer. The database layer runs the queries and gets the data, and these data are then transferred
to the browser (presentation layer) by the programs in the application layer.
Advantages of 3-tier architecture:
Easy to maintain and modify. Any change requested will not affect any other data in the database;
the application layer will do all the validations.
Improved security. Since there is no direct access to the database, data security is increased and there is
no fear of mishandling the data; the application layer filters out all malicious actions.
Good performance. Since this architecture caches data once retrieved, there is no need to hit the
database for each request. This reduces the time consumed by multiple requests and enables
the system to serve many requests at the same time.
Disadvantages of 3-tier Architecture
The disadvantages of 3-tier architecture are that it is a little more complex and a little more effort is
required to reach the database.
Multi-tier Architecture:
A multi-tier, three-tier, or N-tier implementation employs a three-tier logical architecture superimposed
on a distributed physical model. Application servers can access other application servers in order to
supply services to the client application as well as to other application servers.
The multiple-tier architecture is the most general client–server architecture. It can be the most difficult to
implement because of its generality. However, a good design and implementation of a multiple-tier
architecture can provide the most benefits in terms of scalability, interoperability and flexibility.
For example, the client application looks to Application Server-1 to supply data from a mainframe-based application.
Application Server-1 has no direct access to the mainframe application, but it does know, through the
development of application services, that Application Server-2 provides a service to access the data from the
mainframe application which satisfies the client request. Application Server-1 then invokes the appropriate
service on Application Server-2 and receives the requested data, which is then passed on to the client.
UNIT I
2 Entity-Relationship Model:
2.1 Introduction:
Peter Chen first proposed modelling databases using a graphical technique that
humans can relate to easily. Humans can easily perceive entities and their characteristics
in the real world and represent any relationship with one another. Entity–Relationship (ER)
model gives the conceptual model of the world to be represented in the database. The
main motivation for defining the ER model is to provide a high level model for conceptual
database design, which acts as an intermediate stage prior to mapping the enterprise
being modelled onto a conceptual level.
The ER model achieves a high degree of data independence, which means that the
database designer does not have to worry about the physical structure of the database. A
database schema in the ER model can be pictorially represented by an Entity–Relationship
diagram.
– Rectangles represent entity sets.
– Ellipses represent attributes.
– Diamonds represent relationship sets.
– Lines represent linking of attributes to entity sets and of entity sets to
relationship sets.
Example of ER diagram
In the ER diagram the two entities are STUDENT and CLASS. Two simple attributes which
are associated with the STUDENT are Roll number and the name. The attributes associated
with the entity CLASS are Subject Name and Hall Number. The relationship between the
two entities STUDENT and CLASS is Attends.
Strong Entity
Strong entity is one whose existence does not depend on other entity.
Example
Consider the example "student takes course". Here student is a strong entity.
In this example, course is considered a weak entity because, if there are no students to
take a particular course, then that course cannot be offered. The COURSE entity depends
on the STUDENT entity.
Weak Entity
Weak entity is one whose existence depends on other entity.
Example
Consider the example, customer borrows loan. Here loan is a weak entity. For every loan,
there should be at least one customer. Here the entity loan depends on the entity customer
hence loan is a weak entity.
Examples of Multivalued Attribute
Derived Attribute
Derived attributes are attributes that do not exist in the physical database; their
values are derived from other attributes present in the database. For example,
average_salary in a department should not be saved directly in the database; instead it
can be derived. As another example, age can be derived from date_of_birth.
Example of Derived Attribute
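Since derived attributes are computed rather than stored, a small sketch in SQL may help; the EMPLOYEE table, its columns and the date functions used here are illustrative assumptions, not part of the notes (date arithmetic syntax varies between DBMSs):

-- A hypothetical table in which age is NOT stored; only date_of_birth is.
CREATE TABLE EMPLOYEE (
    emp_id        INT PRIMARY KEY,
    emp_name      VARCHAR(50),
    date_of_birth DATE,
    salary        DECIMAL(10,2)
);

-- The derived attribute age is computed at query time from date_of_birth.
SELECT emp_name,
       EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM date_of_birth) AS age
FROM EMPLOYEE;

-- average_salary is likewise derived, never stored:
SELECT AVG(salary) AS average_salary FROM EMPLOYEE;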
Null Value Attribute
Example
In application forms there is often a column for phone number; if a person does not have a phone,
a null value is entered in that column.
Simple attribute − Simple attributes are atomic values which cannot be divided
further. For example, a student's phone number is an atomic value of 10 digits.
Composite attribute − Composite attributes are made of more than one simple attribute.
For example, a student's complete name may have first_name and last_name.
Consider the attribute "address", which can be further subdivided into Street name, City,
and State.
Relationship degree refers to the number of associated entities. The relationship degree
can be broadly classified into unary, binary, and ternary relationship.
2.5.1 Unary Relationship
The unary relationship is otherwise known as a recursive relationship. In a unary
relationship the number of associated entities is one; an entity related to itself is known as a
recursive relationship.
Roles and Recursive Relation
When an entity set appears in more than one relationship, it is useful to add labels to the
connecting lines. These labels are called roles.
Example
In this example, Husband and Wife are the roles.
2.6 Relationship Classification
Relationship is an association among one or more entities. This relationship can be broadly
classified into one-to-one relation, one-to-many relation, many-to-many relation and
recursive relation.
2.6.1 One-to-Many Relationship Type
The relationship that associates one entity to more than one entity is called one-to-many
relationship. Example of one-to-many relationship is Country having states. For one
country there can be more than one state hence it is an example of one-to-many
relationship. Another example of one-to-many relationship is parent–child relationship. For
one parent there can be more than one child. Hence it is an example of one-to-many
relationship.
2.6.2 One-to-One Relationship Type
One-to-one relationship is a special case of one-to-many relationship. True one-to-one
relationship is rare. The relationship between the President and the country is an example
of one-to-one relationship. For a particular country there will be only one President. In
general, a country will not have more than one President hence the relationship between
the country and the President is an example of one-to-one relationship. Another example
of one-to-one relationship is House to Location. A house is obviously in only one location.
2.6.3 Many-to-Many Relationship Type
The relationship between the EMPLOYEE entity and the PROJECT entity is an example of
many-to-many relationship. Many employees will be working on many projects, hence the
relationship between employee and project is a many-to-many relationship.
2.6.4 Many-to-One Relationship Type
The relationship between EMPLOYEE and DEPARTMENT is an example of many-to-one
relationship. There may be many EMPLOYEES working in one DEPARTMENT. Hence
relationship between EMPLOYEE and DEPARTMENT is many-to-one relationship. The four
relationship types are summarized and shown in Table 2.1.
"The many side will consume the relationship", i.e. a combined table will be drawn for the
many-side entity set and the relationship set.
For a binary relationship with cardinality ratio 1:1, two tables will be required. You
can combine the relationship set with either one of the entity sets.
Regular Entity
Regular entities are entities that have an independent existence and generally represent
real-world objects such as persons and products. Regular entities are represented by
rectangles with a single line.
2.7.2 Mapping Regular Entities
– Each regular entity type in an ER diagram is transformed into a relation. The name
given to the relation is generally the same as the entity type.
– Each simple attribute of the entity type becomes an attribute of the relation.
– The identifier of the entity type becomes the primary key of the corresponding relation.
Example 1
Mapping regular entity type tennis player
Here,
– Entity name = name of the relation or table.
In our example, the entity name is PLAYER, which is the name of the table.
– Attributes of the ER diagram = column names of the table.
In our example, Name, Nation, Position and Number of Grand Slams won
form the columns of the table.
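A minimal SQL sketch of this mapping (the exact attribute names and data types are assumptions for illustration); the entity identifier Name becomes the primary key of the PLAYER relation:

-- Relation created from the regular entity type PLAYER.
CREATE TABLE PLAYER (
    Name            VARCHAR(50) PRIMARY KEY,  -- identifier of the entity type
    Nation          VARCHAR(50),
    Position        VARCHAR(30),
    Grand_Slams_Won INT
);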
2.7.3 Converting Composite Attribute in an ER Diagram to Tables
When a regular entity type has a composite attribute, only the simple component
attributes of the composite attribute are included in the relation.
Example
In this example the composite attribute is the Customer address, which consists of
Street, City, State, and Zip.
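A sketch of the resulting relation (Customer_ID is a hypothetical identifier added only for illustration); note that only the simple components of the address appear as columns, never the composite attribute itself:

CREATE TABLE CUSTOMER (
    Customer_ID INT PRIMARY KEY,  -- hypothetical identifier
    Street      VARCHAR(50),      -- simple components of the composite attribute
    City        VARCHAR(30),
    State       VARCHAR(30),
    Zip         VARCHAR(10)
);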
When the regular entity type contains a multi-valued attribute, two new relations are
created.
The first relation contains all of the attributes of the entity type except the multi-valued
attribute.
The second relation contains two attributes that form the primary key of the second
relation. The first of these attributes is the primary key from the first relation, which
becomes a foreign key in the second relation. The second is the multi-valued attribute.
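As a sketch, suppose an EMPLOYEE entity has a multi-valued attribute Skill (both names are assumptions for illustration). The two relations described above would then look like this:

-- First relation: all attributes except the multi-valued one.
CREATE TABLE EMPLOYEE (
    Emp_ID   INT PRIMARY KEY,
    Emp_Name VARCHAR(50)
);

-- Second relation: the primary key of the first relation (as a foreign key)
-- together with the multi-valued attribute; both form the primary key.
CREATE TABLE EMPLOYEE_SKILL (
    Emp_ID INT REFERENCES EMPLOYEE(Emp_ID),
    Skill  VARCHAR(30),
    PRIMARY KEY (Emp_ID, Skill)
);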
2.7.6 Converting Binary Relationship to Table
A relationship which involves two entities can be termed as binary relationship. This
binary relationship can be one-to-one, one-to-many, many-to-one, and many-to-many.
Mapping One-to-Many Relationship
For each 1:M relationship, first create a relation for each of the two entity types
participating in the relationship.
Example
One customer can give many orders. Hence the relationship between the two entities
CUSTOMER and ORDER is one-to-many relationship. In one-to-many relationship, include
the primary key attribute of the entity on the one-side of the relationship as a foreign key
in the relation that is on the many side of the relationship.
Here we have two entities CUSTOMER and ORDER. The relationship between CUSTOMER
and ORDER is one-to-many. For two entities CUSTOMER and ORDER, two tables namely
CUSTOMER and ORDER are created as shown later. The primary key CUSTOMER ID in the
CUSTOMER relation becomes the foreign key in the ORDER relation.
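A sketch of the two tables in SQL (the non-key columns are assumptions for illustration); the point to notice is that Customer_ID from the one side reappears as a foreign key on the many side:

CREATE TABLE CUSTOMER (
    Customer_ID   INT PRIMARY KEY,
    Customer_Name VARCHAR(50)
);

-- "ORDER" is a reserved word in SQL, so the table is named ORDERS here.
CREATE TABLE ORDERS (
    Order_ID    INT PRIMARY KEY,
    Order_Date  DATE,
    Customer_ID INT REFERENCES CUSTOMER(Customer_ID)  -- foreign key from the one side
);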
The first step is to create a relation for each of the participating entity types, plus one for the
associative entity. The relation formed from the associative entity is called the associative relation.
The primary key attributes – Patient ID, Physician ID, and Treatment Code – become
foreign keys in PATIENT TREATMENT. These attributes are components of the primary key
of PATIENT TREATMENT.
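A sketch in SQL, following the key structure just described (non-key columns and data types are assumptions for illustration); the primary key of PATIENT_TREATMENT is composed of the foreign keys:

CREATE TABLE PATIENT (
    Patient_ID   INT PRIMARY KEY,
    Patient_Name VARCHAR(50)
);

CREATE TABLE PHYSICIAN (
    Physician_ID   INT PRIMARY KEY,
    Physician_Name VARCHAR(50)
);

CREATE TABLE TREATMENT (
    Treatment_Code VARCHAR(10) PRIMARY KEY,
    Description    VARCHAR(80)
);

-- Associative relation: its primary key is built from the three foreign keys.
CREATE TABLE PATIENT_TREATMENT (
    Patient_ID     INT REFERENCES PATIENT(Patient_ID),
    Physician_ID   INT REFERENCES PHYSICIAN(Physician_ID),
    Treatment_Code VARCHAR(10) REFERENCES TREATMENT(Treatment_Code),
    PRIMARY KEY (Patient_ID, Physician_ID, Treatment_Code)
);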
UNIT-2
I. Codd’s Rules in DBMS
Codd's rules were proposed by the computer scientist Dr. Edgar F. Codd, who also invented the
relational model for database management. These rules are meant to ensure data integrity,
consistency, and usability, and they essentially describe the characteristics and requirements of a
relational database management system (RDBMS). In this section, we will learn about the various Codd's
rules.
Rule 1: The Information Rule
All information, whether it is user information or metadata, that is stored in a database must be
entered as a value in a cell of a table. It is said that everything within the database is organized in a
table layout.
Rule 2: The Guaranteed Access Rule
Each data element is guaranteed to be accessible logically with a combination of the table name,
primary key (row value), and attribute name (column value).
Rule 3: Systematic Treatment of NULL Values
Every Null value in a database must be given a systematic and uniform treatment.
Rule 4: Active Online Catalog Rule
The database catalog, which contains metadata about the database, must be stored and accessed
using the same relational database management system.
Rule 5: The Comprehensive Data Sublanguage Rule
A crucial component of any efficient database system is its ability to offer an easily understandable
data manipulation language (DML) that facilitates defining, querying, and modifying information
within the database.
Rule 6: The View Updating Rule
All views that are theoretically updatable must also be updatable by the system.
Rule 7: High-level Insert, Update, and Delete
A successful database system must possess the feature of facilitating high-level insertions, updates,
and deletions that can grant users the ability to conduct these operations with ease through a single
query.
Rule 8: Physical Data Independence
Application programs and activities should remain unaffected when changes are made to the
physical storage structures or methods.
Rule 9: Logical Data Independence
Application programs and activities should remain unaffected when changes are made to the logical
structure of the data, such as adding or modifying tables.
Rule 10: Integrity Independence
Integrity constraints should be specified separately from application programs and stored in the
catalog. They should be automatically enforced by the database system.
Rule 11: Distribution Independence
The distribution of data across multiple locations should be invisible to users, and the database
system should handle the distribution transparently.
Rule 12: Non-Subversion Rule
If the system provides a low-level (record-at-a-time) interface, that interface must not be able to
subvert the system by bypassing its security and integrity constraints.
4. Key constraints
o Keys are attributes that are used to identify an entity uniquely within its entity set.
o An entity set can have multiple keys, but one of them is chosen as the primary key. A
primary key must contain unique values and cannot contain a null value in the relational table.
Example:
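The original example is not reproduced in these notes; as a hedged sketch, a hypothetical STUDENT table shows how key constraints are declared, with the primary key required to be unique and non-null and a second candidate key declared UNIQUE:

CREATE TABLE STUDENT (
    Roll_Number INT PRIMARY KEY,      -- primary key: unique and never null
    Email       VARCHAR(80) UNIQUE,   -- another candidate (alternate) key
    Name        VARCHAR(50) NOT NULL
);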
1. Select Operation:
o The select operation selects tuples that satisfy a given predicate.
o It is denoted by sigma (σ).
Notation: σp(r)
Where:
σ is used for the selection predicate,
r is the relation, and
p is a propositional logic formula which may use connectives such as AND, OR and NOT, and
relational operators such as =, ≠, ≥, <, >, ≤.
For example: LOAN Relation
NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
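The notes do not show the selection itself, so here is a hedged illustration on the relation above: σ CITY = "Harrison" (LOAN) returns the tuples of Jones and Hays. The equivalent SQL is shown below; note that selection (σ) picks whole rows, whereas projection (∏), used in the following examples, picks columns.

-- Relational-algebra selection written in SQL: rows of LOAN whose CITY is Harrison.
SELECT *
FROM LOAN
WHERE CITY = 'Harrison';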
3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the tuples that are
either in R or in S, or in both R and S.
o It eliminates duplicate tuples. It is denoted by ∪.
Notation: R ∪ S
A union operation is valid only if the following conditions hold:
o R and S must have the same number of attributes (with compatible domains).
o Duplicate tuples are eliminated automatically.
Example:
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
Input:
∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation contains all tuples that
are in both R and S.
o It is denoted by ∩.
Notation: R ∩ S
Example: Using the above DEPOSITOR table and BORROW table
Input:
∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Smith
Jones
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation contains all tuples that
are in R but not in S.
o It is denoted by minus (−).
Notation: R − S
Example: Using the above DEPOSITOR table and BORROW table
Input:
∏ CUSTOMER_NAME (BORROW) − ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
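For reference, the union, intersection and set-difference operations above correspond directly to SQL's set operators. A sketch using the same relations (note that some DBMSs, e.g. older MySQL versions, do not support INTERSECT and EXCEPT):

-- Union: customers who have a loan, an account, or both.
SELECT CUSTOMER_NAME FROM BORROW
UNION
SELECT CUSTOMER_NAME FROM DEPOSITOR;

-- Intersection: customers who have both a loan and an account.
SELECT CUSTOMER_NAME FROM BORROW
INTERSECT
SELECT CUSTOMER_NAME FROM DEPOSITOR;

-- Set difference: customers who have a loan but no account.
SELECT CUSTOMER_NAME FROM BORROW
EXCEPT
SELECT CUSTOMER_NAME FROM DEPOSITOR;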
6. Cartesian product
o The Cartesian product is used to combine each row in one table with each row in the other
table. It is also known as a cross product.
o It is denoted by X.
Notation: E X D
Example:
EMPLOYEE
1 Smith A
2 Harry C
3 John B
DEPARTMENT
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal
Input:
EMPLOYEE X DEPARTMENT
Output:
1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
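In SQL the same Cartesian product can be written as a cross join (a sketch; the column list is whatever the two tables define):

-- Every EMPLOYEE row is paired with every DEPARTMENT row (3 x 3 = 9 rows here).
SELECT *
FROM EMPLOYEE
CROSS JOIN DEPARTMENT;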
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)
IV. Join Operations:
A Join operation combines related tuples from different relations, if and only if a given join
condition is satisfied. It is denoted by ⋈.
Example:
EMPLOYEE
EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry
SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000
Operation: (EMPLOYEE ⋈ SALARY)
Result:
1. Natural Join:
o A natural join is the set of tuples of all combinations in R and S that are equal on their
common attribute names.
o It is denoted by ⋈.
Example: Let's use the above EMPLOYEE table and SALARY table:
Input:
∏ EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)
Output:
EMP_NAME SALARY
Stephan 50000
Jack 30000
Harry 25000
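A sketch of the same query in SQL; NATURAL JOIN matches the two tables on their common attribute EMP_CODE:

-- ∏ EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY) expressed in SQL.
SELECT EMP_NAME, SALARY
FROM EMPLOYEE
NATURAL JOIN SALARY;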
2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with missing
information.
3. Equi join:
It is also known as an inner join. It is the most common join. It is based on matched data as per the
equality condition. The equi join uses the comparison operator(=).
Example:
CUSTOMER RELATION
CLASS_ID NAME
1 John
2 Harry
3 Jackson
PRODUCT
PRODUCT_ID CITY
1 Delhi
2 Mumbai
3 Noida
Input:
CUSTOMER ⋈ PRODUCT
Output:
CLASS_ID NAME PRODUCT_ID CITY
1 John 1 Delhi
2 Harry 2 Mumbai
3 Jackson 3 Noida
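Because the joining columns have different names (CLASS_ID and PRODUCT_ID), the equi join is written in SQL with an explicit equality condition; a sketch using the two tables above:

-- Equi join: combine rows where CLASS_ID equals PRODUCT_ID.
SELECT *
FROM CUSTOMER
JOIN PRODUCT
  ON CUSTOMER.CLASS_ID = PRODUCT.PRODUCT_ID;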
V. Relational Calculus
There is an alternate way of formulating queries known as relational calculus. Relational calculus is a non-
procedural query language: the user is not concerned with the details of how to obtain the end results.
The relational calculus tells what to do but never explains how to do it.
Most commercial relational languages are based on aspects of relational calculus, including SQL, QBE and
QUEL.
Why is it called Relational Calculus?
It is based on predicate calculus, a branch of symbolic logic. A predicate is a truth-
valued function with arguments. On substituting values for the arguments, the function results in an
expression called a proposition, which can be either true or false. Relational calculus is a tailored version
of a subset of predicate calculus used to communicate with the relational database.
Many calculus expressions involve the use of quantifiers.
There are two types of quantifiers:
o Universal Quantifier: The universal quantifier, denoted by ∀ and read as "for all", means
that in a given set of tuples every tuple satisfies the given condition.
o Existential Quantifier: The existential quantifier, denoted by ∃ and read as "there exists", means
that in a given set of tuples there is at least one tuple whose values satisfy the given condition.
Before using quantifiers in formulas, we need to know the concept of free and bound
variables.
A tuple variable t is bound if it is quantified, that is, if it appears within the scope of a ∀ or ∃ quantifier;
a variable that is not bound is said to be free.
Free and bound variables may be compared with the global and local variables of programming languages.
Types of Relational calculus:
1. Tuple Relational Calculus (TRC)
It is a non-procedural query language based on specifying a number of tuple variables (also known
as range variables) for which a predicate holds true. It describes the desired information without giving a
specific procedure for obtaining that information. The tuple relational calculus is used to select
tuples from a relation; in TRC, the filtering variable ranges over the tuples of a relation. The result can
contain one or more tuples.
Notation:
A query in the tuple relational calculus is expressed with the following notation:
{T | P(T)} or {T | Condition(T)}
where
T is the resulting tuple, and
P(T) is the condition used to fetch T.
For example:
{ T.name | Author(T) AND T.article = 'database' }
Output: This query selects the tuples from the AUTHOR relation. It returns a tuple with 'name' from
Author who has written an article on 'database'.
TRC (tuple relation calculus) can be quantified. In TRC, we can use Existential (∃) and Universal
Quantifiers (∀).
For example:
{ R | ∃T ∈ Authors (T.article = 'database' AND R.name = T.name) }
Output: This query will yield the same result as the previous one.
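For comparison, a sketch of the same request in SQL, assuming a relation Authors(name, article) as in the calculus expressions above:

-- Names of authors who have written an article on 'database'.
SELECT name
FROM Authors
WHERE article = 'database';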
2. Domain Relational Calculus (DRC)
The second form of relational calculus is known as domain relational calculus. In domain relational calculus,
the filtering variables range over the domains of attributes. Domain relational calculus uses the same
operators as tuple calculus: the logical connectives ∧ (and), ∨ (or) and ¬ (not), and the Existential (∃) and
Universal (∀) quantifiers to bind variables. QBE, or Query By Example, is a query language related to domain
relational calculus.
Notation:
{ <a1, a2, a3, ..., an> | P(a1, a2, a3, ..., an) }
where
a1, a2, ..., an are attributes, and
P stands for a formula built from those attributes.
For example:
{ <article, page, subject> | <article, page, subject> ∈ javatpoint ∧ subject = 'database' }
Output: This query yields the article, page, and subject from the relation javatpoint where the
subject is 'database'.
VI. Query By Example (QBE)
QBE stands for Query By Example; it was developed in the mid-1970s by Moshé Zloof at IBM.
It is a graphical query language: the user is given an interface with some required fields to fill in to
obtain the result.
In SQL we get an error if the query is not correct, but in QBE, if the query is wrong, we either get a wrong
answer or the query simply does not execute; we never get an error message.
Note:
In QBE we do not write complete queries as in SQL or other database languages; the interface comes with
some blanks, and we just need to fill in those blanks to get the required result.
Example
Consider the example where a table 'SAC' is present in the database with Name, Phone_Number, and
Branch fields, and we want to get the name of the SAC representative who belongs to the MCA
branch. In SQL we would write the query like this:
SELECT NAME
FROM SAC
WHERE BRANCH = 'MCA'
and we will certainly get the correct result. In QBE, however, there may simply be a field on the form that
we fill with "MCA"; after clicking on the SEARCH button we get the required result.
Due to these anomalies, storage costs increase as the size of the database grows (because of
redundant data), further increasing the database's complexity and making it more challenging to
maintain.
To rectify and address these issues, we need to optimize the given database by using the normalization
technique so that no redundant values are present in the database.
5.What is DBMS Normalisation?
Normalization in a database is the process in which we organize the given data by minimizing the
redundancy present in a relation. In this, we eliminate the anomalies present, namely - update,
insertion and deletion. Normalization divides the single table into smaller tables and links them using
relationships. The different normal forms help us minimize redundancy in the database table.
To perform normalization in the database, we decompose the table into multiple tables. This process
keeps repeating until we achieve SRP (Single Responsibility Principle). The SRP states that one table
should have one role only.
6. What is Normalization? Explain 1NF, 2NF, 3NF and BCNF with suitable examples.
Here, we can notice data repetition, but 1NF doesn’t care about it.
Second Normal Form (2NF)
In 2NF, the relation must already be in 1NF, and no partial dependency should exist. A partial
dependency exists when a non-prime attribute depends on only part of the candidate or primary key
(which is possible when the primary key is composite) rather than on the whole key.
Example 1: (depicting partial dependency issues)
Suppose we are given a relation R(A, B, C, D) with {A, B} as the primary key, where A and B cannot both be
NULL simultaneously but either can be NULL independently, and C, D are non-prime attributes. If B is
NULL and we are given a functional dependency, say, B → C, can this ever hold?
As B contains NULL, it can never determine the value of C. So B → C, being a partial dependency,
creates a problem. Therefore, non-prime attributes must not be determined by only a part of the
primary key. We can remove the partial dependency by creating two relations (the 2NF
conversion):
Relation 1 = R1(A, B, D), where {A, B} is the primary key and AB determines D.
Relation 2 = R2(B, C), where B is the primary key and B determines C.
Example 2:
Consider the following table. Its primary key is {StudentId, ProjectId}.
The Functional dependencies given are -
StudentId → StudentName
ProjectId → ProjectName
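Given the primary key {StudentId, ProjectId} and the two partial dependencies listed above, the 2NF decomposition can be sketched in SQL as follows (data types are assumptions for illustration):

-- StudentId → StudentName moves into its own relation.
CREATE TABLE STUDENT (
    StudentId   INT PRIMARY KEY,
    StudentName VARCHAR(50)
);

-- ProjectId → ProjectName moves into its own relation.
CREATE TABLE PROJECT (
    ProjectId   INT PRIMARY KEY,
    ProjectName VARCHAR(50)
);

-- The remaining relation keeps only the composite key and is free of partial dependencies.
CREATE TABLE STUDENT_PROJECT (
    StudentId INT REFERENCES STUDENT(StudentId),
    ProjectId INT REFERENCES PROJECT(ProjectId),
    PRIMARY KEY (StudentId, ProjectId)
);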
Transaction in DBMS
5.Properties of Transaction
As transactions deal with accessing and modifying the contents of the database, they must have some
basic properties which help maintain the consistency and integrity of the database before and after the
transaction. Transactions follow 4 properties, namely, Atomicity, Consistency, Isolation, and Durability.
Generally, these are referred to as ACID properties of transactions in DBMS. ACID is the acronym used
for transaction properties. A brief description of each property of the transaction is as follows.
i) Atomicity
This property ensures that either all operations of a transaction are executed or it is aborted. In any
case, a transaction can never be completed partially. Each transaction is treated as a single unit (like an
atom). Atomicity is achieved through commit and rollback operations, i.e. changes are made to the
database only if all operations related to a transaction are completed, and if it gets interrupted, any
changes made are rolled back using rollback operation to bring the database to its last saved state.
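A small sketch of atomicity in SQL, using a hypothetical transfer between two accounts (the statement for starting a transaction varies slightly between DBMSs):

BEGIN;                                                              -- start the transaction
UPDATE ACCOUNT SET balance = balance - 500 WHERE acc_no = 'A-101';
UPDATE ACCOUNT SET balance = balance + 500 WHERE acc_no = 'A-305';
COMMIT;                                                             -- both updates become permanent together

-- If anything fails before COMMIT, the partial work is undone instead:
-- ROLLBACK;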
ii) Consistency
This property of a transaction keeps the database consistent before and after a transaction is
completed. Execution of any transaction must ensure that after its execution, the database is either in
its prior stable state or a new stable state. In other words, the result of a transaction should be the
transformation of a database from one consistent state to another consistent state. Consistency, here
means, that the changes made in the database are a result of logical operations only which the user
desired to perform and there is not any ambiguity.
iii) Isolation
This property states that two transactions must not interfere with each other, i.e. if some data is used
by a transaction for its execution, then any other transaction can not concurrently access that data
until the first transaction has completed. It ensures that the integrity of the database is maintained and
we don’t get any ambiguous values. Thus, any two transactions are isolated from each other. This
property is enforced by the concurrency control subsystem of DBMS.
iv) Durability
This property ensures that the changes made to the database after a transaction is completely
executed, are durable. It indicates that permanent changes are made by the successful execution of a
transaction. In the event of any system failures or crashes, the consistent state achieved after the
completion of a transaction remains intact. The recovery subsystem of DBMS is responsible for
enforcing this property.
6.Database Security
Security of databases refers to the array of controls, tools, and procedures designed to ensure and
safeguard confidentiality, integrity, and availability. This section concentrates on confidentiality,
because it is the component most at risk in data security breaches.
Security for databases must cover and safeguard the following aspects:
o The database containing data.
o Database management systems (DBMS)
o Any applications that are associated with it.
o The physical database server or the virtual database server, and the hardware that runs it.
o The infrastructure for computing or network that is used to connect to the database.
Security of databases is a complicated and challenging task that involves all aspects of security
practices and technologies. It is also inherently at odds with the usability of databases: the more
usable and accessible the database is, the more vulnerable it is to security threats, while the more
invulnerable it is to threats, the more difficult it is to access and use.
Why Database Security is Important?
By definition, a data breach is a failure to maintain the confidentiality of data in a database. How much
damage an incident like a data breach causes our business depends on various consequences or
factors:
o Intellectual property that is compromised: Our intellectual property (trade secrets,
inventions, or proprietary methods) could be vital for our ability to maintain an advantage in
our industry. If our intellectual property has been stolen or disclosed and our competitive
advantage is lost, it could be difficult to keep or recover.
o The damage to our brand's reputation: Customers or partners may not want to purchase
goods or services from us (or deal with our business) If they do not feel they can trust our
company to protect their data or their own.
o The concept of business continuity (or lack of it): Some businesses cannot continue to
function until a breach has been resolved.
o Penalties or fines for non-compliance: The cost of failing to comply with global regulations
such as the Sarbanes-Oxley Act (SOX) or the Payment Card Industry Data Security Standard
(PCI DSS), with industry-specific data privacy regulations such as HIPAA, or with regional privacy laws
such as the European Union's General Data Protection Regulation (GDPR) can be severe, with
fines in the worst cases exceeding several million dollars per violation.
o Costs of repairing breaches and notifying customers: Alongside notifying customers of a breach,
the breached company must pay for investigation and forensic services, crisis management,
triage, repairs to the affected systems, and more.
7. Common Threats and Challenges
Numerous misconfigurations, vulnerabilities, or patterns of carelessness or misuse can lead to a
breach of security. Here are some of the most prevalent types and causes of attacks on database
security.
Insider Threats
An insider threat is a security threat from any of three kinds of sources, each having privileged access
to the database:
o A malicious insider who intends to cause harm
o A negligent insider who makes mistakes that leave the database vulnerable to attack
o An infiltrator: an outsider who obtains credentials through a scheme such as phishing or by
gaining access to the credential database itself
Insider threats are among the most common causes of database security breaches, and they often
result from too many employees being granted privileged user credentials.
Human Error
Unintentional mistakes, weak passwords, password sharing, and other negligent or uninformed
user behaviours remain the root cause of almost half (49 percent) of all data security breaches.
Exploitation of Database Software Vulnerabilities
Hackers make their living by finding and exploiting vulnerabilities in software such as database
management software. The major database software vendors and open-source database management
platforms issue regular security patches to address these vulnerabilities, but failure to apply the
patches in a timely fashion can increase the risk of being hacked.
SQL/NoSQL Injection Attacks
A database-specific threat is the insertion of arbitrary SQL (or non-SQL) attack strings into database
queries, typically delivered through web application forms or HTTP headers. Organizations that do not
follow secure web application coding practices and perform regular vulnerability testing are open to
these attacks.
Buffer Overflow Exploits
Buffer overflow occurs when a program attempts to copy more data into a fixed-length block of
memory than it can hold. Attackers may use the excess data, which is stored in adjacent memory
addresses, as a foundation from which to launch attacks.
Denial of Service (DoS/DDoS) Attacks
In a denial-of-service (DoS) attack, the attacker overwhelms the target server (in this case the
database server) with so many requests that the server can no longer fulfil legitimate requests from
actual users; in most cases the server becomes unstable or fails altogether.
Malware
Malware is software written specifically to exploit vulnerabilities or otherwise cause damage to the
database. Malware may arrive via any endpoint device connected to the database's network.
Attacks on Backups
Companies that do not protect backup data using the same rigorous controls employed to protect
databases themselves are at risk of cyberattacks on backups.
8. Control methods of Database Security
Database security means keeping sensitive information safe and preventing the loss of data. Security of
the database is controlled by the database administrator (DBA).
The following are the main control measures used to provide security of data in databases:
1. Authentication
2. Access control
3. Inference control
4. Flow control
5. Database Security applying Statistical Method
6. Encryption
These are explained as following below.
1. Authentication :
Authentication is the process of confirming that a user logs in only according to the rights granted to
him or her to perform the activities of the database. A particular user can log in only up to his or her
privilege level and cannot access other sensitive data; the privilege of accessing sensitive data is
restricted by using authentication.
Authentication tools based on biometrics, such as retina scans and fingerprints, can protect the
database from unauthorized or malicious users.
2. Access Control :
The security mechanism of a DBMS must include provisions for restricting access to the database
by unauthorized users. Access control is done by creating user accounts and controlling the login
process in the DBMS, so that access to sensitive data is possible only for those people (database
users) who are allowed to access such data, and is restricted for unauthorized persons.
The database system must also keep track of all operations performed by a given user
throughout each login session.
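Authentication and access control are usually set up with the DBMS's own SQL statements. A hedged sketch (the user name, table name and exact syntax are assumptions; CREATE USER syntax differs between Oracle, MySQL and PostgreSQL):

-- Create an account that the user must authenticate with.
CREATE USER clerk1 IDENTIFIED BY 'Str0ngPassword!';

-- Grant only the privileges this user needs (principle of least privilege).
GRANT SELECT, INSERT ON EMPLOYEE TO clerk1;

-- Withdraw a privilege when it is no longer required.
REVOKE INSERT ON EMPLOYEE FROM clerk1;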
3. Inference Control :
This method is the countermeasure to the statistical database security problem. It is used to
prevent a user from completing any inference channel, and it protects sensitive information
from indirect disclosure.
Inferences are of two types: identity disclosure and attribute disclosure.
4. Flow Control :
Flow control prevents information from flowing in a way that lets it reach unauthorized users.
Pathways through which information flows implicitly, in ways that violate the privacy policy of a
company, are called covert channels.
5. Database Security applying Statistical Methods :
Statistical database security focuses on protecting confidential individual values stored in a database
that is used for statistical purposes, while still allowing summaries of values to be retrieved by
category; it does not permit retrieval of individual information.
For example, a user may be allowed to query the database for statistical information such as the
number of employees in the company, but not for detailed confidential or personal information about
a specific individual employee.
6. Encryption :
This method is mainly used to protect sensitive data (such as credit card numbers and OTPs)
and other sensitive values. The data are encoded using an encryption algorithm.
An unauthorized user who tries to access the encoded data will have difficulty decoding it, whereas
authorized users are given decryption keys to decode the data.