Dbms Notes Mbu
MBU
A.RANGAMPETA
Raw facts are called data. The word “raw” indicates that they have not been processed.
What is information?
What is Knowledge?
DATA/INFORMATION PROCESSING:
The process of converting data (raw facts) into meaningful information is called data/information
processing.
Note: In business processing, knowledge is more useful for making decisions in any organization.
DATA                              INFORMATION
1. Raw facts                      1. Processed data
2. It is in unorganized form      2. It is in organized form
3. Data doesn’t help in the       3. Information helps in the
   decision-making process           decision-making process
INTRODUCTION TO DATABASES:
• Data processing tasks such as payroll were automated, with data stored on tapes.
• Data could also be input from punched card decks, and output to printers.
• With disks, network and hierarchical databases could be created that allowed data structures such as lists and
trees to be stored on disk. Programmers could construct and manipulate these data structures.
• Initial commercial relational database systems, such as IBM DB2, Oracle, Ingres, and DEC Rdb, played a
major role in advancing techniques for efficient processing of declarative queries.
• By the early 1980s, relational databases had become competitive with network and hierarchical database
systems even in the area of performance.
• The 1980s also saw much research on parallel and distributed databases, as well as initial work on object-
oriented databases.
Early 1990s:
• Decision support and querying re-emerged as a major application area for databases.
• The major event was the explosive growth of the World Wide Web.
• Databases were deployed much more extensively than ever before. Database systems now had to support very
high transaction processing rates, as well as very high reliability and 24×7 availability (availability 24 hours a
day, 7 days a week, meaning no downtime for scheduled maintenance activities).
The file management system (FMS) is one in which all data is stored in a single large file. Its main
disadvantage is that searching for a record or data item takes a long time. This led to the introduction of
indexing in this system. Even then, the FMS had many drawbacks: for example, updates or modifications to the
data could not be handled easily, and sorting the records took a long time. All these drawbacks led to the
introduction of the Hierarchical Database System.
The FMS drawback of slow record access and sorting was removed in this system by introducing a
parent-child relationship between records in the database. The origin of the data is called the root, from which
several branches hold data at different levels; the last level is called the leaf. The main drawback was that any
modification or addition to the structure required the whole structure to be altered, which made the task
tedious. To avoid this, the next system was developed: the Network Database System.
This system introduced the concept of many-to-many relationships. It still used the same pointer
technology to define relationships; the difference was the introduction of grouping data items into sets.
To overcome the drawbacks of the previous systems, the Relational Database System was introduced,
in which data is organized as tables and each record forms a row with many fields or attributes.
Relationships between tables can also be defined in this system.
DATABASE:
(OR)
A database is a collection of information that is organized so that it can be easily accessed, managed and
updated.
Many kinds of applications and organizations use databases for their day-to-day business processing
activities. They include:
1. Banking: For customer information, accounts, and loans, and banking transactions.
2. Airlines: For reservations and schedule information. Airlines were among the first to use databases in a
geographically distributed manner—terminals situated around the world accessed the central database system
through phone lines and other data networks.
4. Credit Card Transactions: For purchases on credit cards and generation of monthly statements.
5. Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on
prepaid calling cards, and storing information about the communication networks.
6. Finance: For storing information about holdings, sales, and purchases of financial instruments such as stocks
and bonds.
8. Manufacturing: For management of supply chain and for tracking production of items in factories,
inventories of items in warehouses/stores, and orders for items.
9. Human resources: For information about employees, salaries, payroll taxes and benefits, and for generation
of paychecks.
11. Web: For accessing bank accounts and checking the balance amount.
12. E-Commerce: For buying a book or music CD and browsing for things like watches and mobiles on the
Internet.
The database approach has some very characteristic features which are discussed in detail below:
A fundamental feature of the database approach is that the database system contains not only the data
but also the complete definition and description of these data. These descriptions are details about the
extent, the structure, the type and the format of all data and, additionally, the relationships between the data.
This kind of stored data is called metadata ("data about data").
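The idea that a database stores its own description can be sketched with Python's built-in sqlite3 module. This is only an illustration: the customer table and its columns are assumed names, and SQLite is used simply because it ships with Python. The catalog table sqlite_master and the PRAGMA table_info command are SQLite's own way of exposing metadata; other DBMSs expose theirs differently.

```python
import sqlite3

# Hypothetical table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER PRIMARY KEY, customer_name TEXT, city TEXT)"
)

# sqlite_master is SQLite's catalog: it holds the definition of every table.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, default_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

print(table_names)  # names of the tables the database knows about
print(columns)      # (column name, declared type) pairs
conn.close()
```

No data rows were inserted, yet the database can already describe the structure of the table. That description is the metadata.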
Application software does not need any knowledge about the physical data storage like encoding,
format, storage place, etc. It only communicates with the management system of a database (DBMS) via a
standardized interface with the help of a standardized language like SQL. The access to the data and the
metadata is entirely done by the DBMS. In this way all the applications can be totally separated from the data.
Data Integrity:
Data integrity is a byword for the quality and the reliability of the data of a database system. In a broader
sense data integrity includes also the protection of the database from unauthorized access (confidentiality) and
unauthorized changes. Data reflect facts of the real world.
Transactions:
A transaction is a bundle of actions carried out within a database to bring it from one consistent
state to a new consistent state. In between, the data are inevitably inconsistent. A transaction is atomic, which
means that it cannot be divided up any further. Within a transaction, either all or none of the actions must be
carried out. Doing only a part of the actions would lead to an inconsistent database state.
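The all-or-none behaviour can be sketched with Python's sqlite3 module. The account table, the names A and B, and the transfer function are assumptions made for this illustration. The transfer deliberately credits the target first, so a failed debit would leave a visible half-finished state if the transaction were not rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move amount from source to target as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            # Fails the CHECK constraint if the source would go negative.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


ok_small = transfer(conn, "A", "B", 50)   # both updates succeed
ok_large = transfer(conn, "A", "B", 500)  # debit fails, credit is undone
final = balances(conn)
print(ok_small, ok_large, final)
```

After the failed transfer, the credit to B is no longer present: the database is back in the consistent state it had before that transaction started.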
Data Persistence:
Data persistence means that in a DBMS all data is maintained as long as it is not deleted explicitly. The
life span of data needs to be determined directly or indirectly by the user and must not depend on system
features. Additionally, data once stored in a database must not be lost. Changes to a database that are made by
a transaction are persistent. Once a transaction is finished, even a system crash cannot put the data in danger.
TYPES OF DATABASES:
➢ Databases can be classified according to the following factors:
1. Number of Users
2. Database Location
3. Expected type
4. Extent of use
➢ A desktop or personal computer database is an example of a single-user database.
➢ Workgroup databases and enterprise databases are examples of multiuser databases.
Workgroup database:
If a multiuser database supports a relatively small number of users (fewer than 50) within an
organization, it is called a workgroup database.
Enterprise database:
If the database is used by the entire organization and supports many users (more than 50) across many
departments, it is called an enterprise database.
2. Based on Location:
According to their location, databases can be classified into the following types:
Centralized Database:
It is a database that is located, stored, and maintained in a single location. This location is most often a
central computer or database system, for example a desktop or server CPU, or a mainframe computer. In most
cases, a centralized database would be used by an organization (e.g. a business company) or an institution (e.g.
a university.)
A distributed database is a database in which storage devices are not all attached to a common CPU. It
may be stored in multiple computers located in the same physical location, or may be dispersed over a network
of interconnected computers.
• The DBMS is a general-purpose software system that facilitates the process of defining, constructing, and
manipulating databases for various applications.
Goals of DBMS:
The primary goal of a DBMS is to provide a way to store and retrieve database information that is both
convenient and efficient.
1. A database represents some aspect of the real world. Changes to the real world are reflected in the database.
Need of DBMS:
1. Before the advent of DBMSs, organizations typically stored information using file-processing systems.
Examples of such systems are the file-handling facilities of high-level languages such as C, BASIC, and COBOL.
These systems have major disadvantages for data manipulation, and DBMSs are now used to overcome those
drawbacks.
3. In addition to that the database system must ensure the safety of the information stored, despite system
crashes or attempts at unauthorized access. If data are to be shared among several users, the system must avoid
possible anomalous results.
Data Independence:
Application programs should be as independent as possible from details of data representation and
storage. The DBMS can provide an abstract view of the data to insulate application code from such details.
A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently. This feature
is especially important if the data is stored on external storage devices.
If data is always accessed through the DBMS, the DBMS can enforce integrity constraints on the data.
For example, before inserting salary information for an employee, the DBMS can check that the department
budget is not exceeded. Also, the DBMS can enforce access controls that govern what data is visible to
different classes of users.
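The budget check described above can be sketched with a trigger in SQLite, using Python's sqlite3 module. The department and employee tables, their columns, and the hire function are assumed names. A trigger is only one possible mechanism, and other DBMSs may enforce such a rule differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (name TEXT PRIMARY KEY, budget INTEGER NOT NULL);
CREATE TABLE employee (
    emp_name TEXT PRIMARY KEY,
    dept     TEXT NOT NULL REFERENCES department(name),
    salary   INTEGER NOT NULL
);
-- Reject an insert if existing salaries plus the new one exceed the budget.
CREATE TRIGGER budget_check BEFORE INSERT ON employee
BEGIN
    SELECT RAISE(ABORT, 'department budget exceeded')
    WHERE (SELECT COALESCE(SUM(salary), 0) FROM employee WHERE dept = NEW.dept)
          + NEW.salary
          > (SELECT budget FROM department WHERE name = NEW.dept);
END;
INSERT INTO department VALUES ('sales', 1000);
""")


def hire(name, dept, salary):
    """Try to insert an employee; return False if the DBMS rejects the row."""
    try:
        with conn:
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (name, dept, salary))
        return True
    except sqlite3.DatabaseError:
        return False


first = hire("asha", "sales", 600)   # 600 <= 1000, accepted
second = hire("ravi", "sales", 600)  # 600 + 600 > 1000, rejected by the trigger
headcount = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(first, second, headcount)
```

Because the rule lives in the database, every application that inserts salary data is subject to the same check.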
A database system allows several users to access the database concurrently. Answering different
questions from different users with the same (base) data is a central aspect of an information system. Such
concurrent use of data increases the economy of a system.
An example of concurrent use is the travel database of a large travel agency. The employees of
different branches can access the database concurrently and book journeys for their clients. Each travel agent
can see on their interface whether seats are still available for a specific journey or whether it is already fully
booked.
A DBMS also protects data from failures such as power failures and crashes, using recovery
schemes such as backup mechanisms and log files.
When several users share the data, centralizing the administration of data can offer significant
improvements. Experienced professionals, who understand the nature of the data being managed, and how
different groups of users use it, can be responsible for organizing the data representation to minimize
redundancy and fine-tuning the storage of the data to make retrieval efficient.
A DBMS supports many important functions that are common to many applications accessing data stored
in the DBMS. This, in conjunction with the high-level interface to the data, facilitates quick development of
applications. Such applications are also likely to be more robust than applications developed from scratch
because many important tasks are handled by the DBMS instead of being implemented by the application.
DISADVANTAGES OF DBMS:
Danger of Overkill:
For small and simple applications for single users a database system is often not advisable.
Complexity:
A database system creates additional complexity and requirements. The supply and operation of a
database management system with several users and databases is quite costly and demanding.
Qualified Personnel:
The professional operation of a database system requires appropriately trained staff. Without a qualified
database administrator nothing will work for long.
Through the use of a database system new costs are generated for the system itself but also for additional
hardware and the more complex handling of the system.
Lower Efficiency:
A database system is a multi-use software which is often less efficient than specialized software which
is produced and optimized exactly for one problem.
People who work with a database can be categorized as database users or database administrators.
Database Users:
There are four different types of database-system users, differentiated by the way they expect to interact with
the system.
Naive users:
Naive users are unsophisticated users who interact with the system by invoking one of the application programs
that have been written previously.
For example, a bank teller who needs to transfer $50 from account A to account B invokes a program
called transfer. This program asks the teller for the amount of money to be transferred, the account from which
the money is to be transferred, and the account to which the money is to be transferred.
Application programmers:
Application programmers are computer professionals who write application programs. Application
programmers can choose from many tools to develop user interfaces. Rapid application development (RAD)
Sophisticated users:
Sophisticated users interact with the system without writing programs. Instead, they form their requests
in a database query language. They submit each such query to a query processor, whose function is to break
down DML statements into instructions that the storage manager understands. Analysts who submit queries to
explore data in the database fall in this category.
Specialized users:
Specialized users are sophisticated users who write specialized database applications that do not fit into
the traditional data-processing framework.
Database Administrator:
One of the main reasons for using DBMSs is to have central control of both the data and the programs
that access those data. A person who has such central control over the system is called a database
administrator (DBA).
Schema definition:
The DBA creates the original database schema by executing a set of data definition statements in the
DDL. The DBA also defines the storage structures and access methods.
The DBA carries out changes to the schema and physical organization to reflect the changing needs of
the organization, or to alter the physical organization to improve performance.
By granting different types of authorization, the database administrator can regulate which parts of the
database various users can access. The authorization information is kept in a special system structure that the
database system consults whenever someone attempts to access the data in the system.
Routine maintenance:
1. Periodically backing up the database, either onto tapes or onto remote servers, to prevent loss of data in case
of disasters such as flooding.
2. Ensuring that enough free disk space is available for normal operations, and upgrading disk space as required.
3. Monitoring jobs running on the database and ensuring that performance is not degraded by very expensive
tasks submitted by some users.
Data abstraction means hiding certain details of how the data are stored and maintained. A major purpose
of a database system is to provide users with an “abstract view” of the data. In a DBMS there are three levels of
data abstraction. The goal of abstraction in the DBMS is to separate the users’ requests from the physical
storage of data in the database.
Physical Level:
• The lowest Level of Abstraction describes “How” the data are actually stored.
• The physical level describes complex low level data structures in detail.
Logical Level:
• This level of data Abstraction describes “What” data are to be stored in the database and what relationships
exist among those data.
View Level:
• It is the highest level of data abstraction and describes only part of the entire database.
• Different users require different types of data elements from each database.
• The system may provide many views of the same database.
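The view level can be sketched with an SQL view, here created through Python's sqlite3 module. The employee table, its columns, and the view name employee_directory are assumptions for this illustration. The view exposes only part of the table and hides the salary column from its users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept TEXT, salary INTEGER
);
INSERT INTO employee VALUES (1, 'Asha', 'sales', 52000), (2, 'Ravi', 'accounts', 61000);
-- A view for users who need names and departments but not salaries.
CREATE VIEW employee_directory AS
    SELECT emp_id, emp_name, dept FROM employee;
""")

directory = conn.execute(
    "SELECT * FROM employee_directory ORDER BY emp_id"
).fetchall()
print(directory)
conn.close()
```

Several such views can be defined over the same stored table, one for each group of users.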
Schema:
The overall design of the database is called the “schema” or “metadata”. A database schema
corresponds to a type definition in a programming language. The value of a variable in a programming language
corresponds to an “instance” of a database schema.
The goal of this architecture is to separate the user applications and the physical database. In this
architecture, schemas can be defined at the following three levels:
1. The internal level has an internal schema, which describes the physical storage structure of the database.
The internal schema uses a physical data model and describes the complete details of data storage and
access paths for the database.
2. The conceptual level has a conceptual schema, which describes the structure of the whole database for a
community of users. The conceptual schema hides the details of physical storage structures and concentrates
on describing entities, data types, relationships, user operations, and constraints. A high-level data model or
an implementation data model can be used at this level.
3. The external or view level includes a number of external schemas or user views. Each external schema
describes the part of the database that a particular user group is interested in and hides the rest of the
database from that user group. A high-level data model or an implementation data model can be used at this
level.
• The ability to modify a schema definition at one level without affecting a schema definition at a higher
level is called data independence.
• Physical data independence: the ability to modify the physical schema without causing application programs
to be rewritten.
• Modifications at this level are usually made to improve performance.
• Logical data independence: the ability to modify the conceptual schema without causing application programs
to be rewritten.
• This is usually done when the logical structure of the database is altered.
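Independence from the physical schema can be sketched as follows, assuming a hypothetical customer table in SQLite. Adding an index changes how the data is stored and accessed, but the application's query text and its result stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, customer_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1001, "Venkat", "Hyderabad"), (1002, "Latha", "Chennai"), (1003, "Kiran", "Hyderabad")],
)

# The application's query; it never mentions how the data is stored.
QUERY = "SELECT customer_name FROM customer WHERE city = ? ORDER BY customer_id"

before = conn.execute(QUERY, ("Hyderabad",)).fetchall()

# A change to the physical schema: add an access path on the city column.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")

after = conn.execute(QUERY, ("Hyderabad",)).fetchall()
print(before == after)
conn.close()
```

The index may make the lookup faster on a large table, but the program that issues the query does not need to be rewritten.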
A database system is partitioned into modules that deal with each of the responsibilities of the overall
system. The functional components of a database system can be broadly divided into the storage manager and
the query processor components.
The storage manager is important because databases typically require a large amount of storage space.
Databases of some big organizations range from gigabytes to terabytes. Since the main memory of computers
cannot store this much information, the information is stored on disks. Data are moved between disk storage
and main memory as needed.
The query processor is also very important because it helps the database system simplify and facilitate
access to data. So quick processing of updates and queries is important. It is the job of the database system to
translate updates and queries written in a nonprocedural language,
A storage manager is a program module that provides the interface between the low level data stored in
the database and the application programs and queries submitted to the system. The storage manager is
responsible for the interaction with the file manager. The storage manager translates the various DML
statements into low-level file-system commands. Thus, the storage manager is responsible for storing,
retrieving, and updating data in the database.
Authorization and integrity manager: tests for the satisfaction of integrity constraints and checks the
authority of users to access data.
Transaction manager: ensures that the database remains in a consistent state despite system
failures, and that concurrent transaction executions proceed without conflicting.
File manager: manages the allocation of space on disk storage and the data structures used to represent
information stored on disk.
Buffer manager: is responsible for fetching data from disk storage into main memory.
The storage manager implements several data structures as part of the physical system implementation. Data
files store the database itself. The data dictionary stores metadata about the structure of the database, in
particular the schema of the database.
DDL interpreter: It interprets DDL statements and records the definitions in the data dictionary.
DML compiler: It translates DML statements in a query language into an evaluation plan consisting of low-
level instructions that the query evaluation engine understands.
Application Architectures:
Most users of a database system today are not present at the site of the database system, but connect to it
through a network. We can therefore differentiate between client machines, on which remote database users
work, and server machines, on which the database system runs. Database applications are usually partitioned
into two or three parts. They are:
Two-Tier Architecture:
The application is partitioned into a component that resides at the client machine, which invokes
database system functionality at the server machine through query language statements. Application program
interface standards like ODBC and JDBC are used for interaction between the client and the server.
Three-Tier Architecture:
The client machine acts as merely a front end and does not contain any direct database calls. Instead, the
client end communicates with an application server, usually through a forms interface. The application server in
turn communicates with a database system to access data. The business logic of the application, which says
what actions to carry out under what conditions, is embedded in the application server, instead of being
distributed across multiple clients. Three-tier applications are more appropriate for large applications, and for
applications that run on the World Wide Web.
The database design process can be divided into six steps. The ER Model is most relevant to the first
three steps. The next three steps are beyond the ER Model.
1. Requirements Analysis:
The very first step in designing a database application is to understand what data is to be stored in the
database, what applications must be built on top of it, and what operations are most frequent and subject to
performance requirements. The database designers collect information about the organization and analyze it
to identify the users’ requirements. The database designers must find out what the users want from
the database.
Once the information is gathered in the requirements analysis step a conceptual database design is developed
and is used to develop a high level description of the data to be stored in the database, along with the constraints
that are known to hold over this data. This step is often carried out using the ER model, or a similar high-level
data model.
In this step, we convert the conceptual database design into a database schema (logical database design) in
the data model of the chosen DBMS. We will only consider relational DBMSs, and therefore, the task in the
The first three steps are most relevant to the ER Model. Once the logical schema is defined, the designer
considers the physical-level implementation and finally provides certain security measures. The remaining three
steps of database design are briefly described below:
4. Schema Refinement:
The fourth step in database design is to analyze the collection of relations in our relational database
schema to identify potential problems, and to refine it. In contrast to the requirements analysis and conceptual
design steps, which are essentially subjective, schema refinement can be guided by some elegant and powerful
theory.
In this step we must consider typical expected workloads that our database must support and further
refine the database design to ensure that it meets desired performance criteria. This step may simply involve
building indexes on some tables and clustering some tables, or it may involve a substantial redesign of parts of
the database schema obtained from the earlier design steps.
6. Security Design:
The last step of database design is to include security features. This is required to avoid unauthorized
access to the database. In practice, after all six steps, a tuning step is required in which all the steps are
interleaved and repeated until the design is satisfactory.
• DBMS performs several important functions that guarantee the integrity and consistency of the data in the
database.
• These functions are transparent to end users and can be accessed only through the use of the DBMS. They include:
• Data Dictionary Management
• Data Storage Management
• Data transformation and Presentation
• Security Management
• Multiple Access Control
• Backup and Recovery Management
• Data Integrity Management
• Database Access Languages
• Databases Communication Interfaces
• DBMS stores definitions of database elements and their relationship (Metadata) in the data dictionary.
• The DBMS uses the data dictionary to look up the required data component structures and relationships.
• Any change made in database structure is automatically recorded in the data dictionary.
• A modern DBMS provides storage not only for data but also for related data entities.
• Data storage management is also important for database “performance tuning”.
• Performance tuning relates to activities that make the database perform more efficiently.
Security Management:
• DBMS creates a security system that enforces the user security and data privacy.
• Security rules determine which users can access the database, which data items each user can access, etc.
• The DBA and authenticated users log in to the DBMS with a username and password or through biometric
authentication such as fingerprint and face recognition.
Multiuser Access Control:
• To provide data integrity and data consistency, DBMS uses sophisticated algorithms to ensure that
multiple users can access the database concurrently without compromising the integrity of database.
• DBMS provides backup and recovery to ensure data safety and integrity.
• Recovery management deals with the recovery of database after failure such as bad sector in the disk
or power failure. Such capability is critical to preserve database integrity.
• DBMS provides and enforces integrity rules, thus minimizing data redundancy and maximizing
data consistency.
• A query language is a non-procedural language i.e. it lets the user specify what must be done
without specifying how it is to be done.
• Current DBMSs accept end-user requests via different network environments.
• For example, DBMS might provide access to database via Internet through the use of web browsers such
as Mozilla Firefox or Microsoft Internet Explorer.
What is Schema?
A database schema is the skeleton structure that represents the logical view of the entire database. (or)
• It defines how the data is organized and how the relations among them are associated.
STUDENT
What is Instance?
The data stored in the database at any given time is an instance of the database
Student
In the above table, 1201, 1202, Venkat, etc. are said to be an instance of the student table.
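The difference between a schema and an instance can be sketched in plain Python. The attribute names student_id and name, and the second student's name, are assumptions; only the values 1201, 1202 and Venkat come from the notes. The class plays the role of the schema, and the list of rows held at a given moment plays the role of the instance.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Student:
    """Schema: which attributes a student row has (assumed attribute names)."""
    student_id: int
    name: str


# The schema is fixed: it lists attribute names, not data.
schema_attributes = [f.name for f in fields(Student)]

# The instance is the data held at a given time, and it changes as rows are added.
instance_now = [Student(1201, "Venkat")]
instance_later = instance_now + [Student(1202, "Ravi")]

print(schema_attributes)
print(len(instance_now), len(instance_later))
```

The schema is the same before and after the second row is added. Only the instance has changed.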
Relational Algebra
Preliminaries
A query language is a language in which a user requests information from the database.
Query languages are considered higher-level languages than programming languages. Query languages
are of two types:
Procedural Language
Non-Procedural Language
1. In procedural language, the user has to describe the specific procedure to retrieve the information
from the database.
2. In non-procedural language, the user retrieves the information from the database without describing
the specific procedure to retrieve it.
Example: The Tuple Relational Calculus and the Domain Relational Calculus are non-procedural
languages.
Relational Algebra
The relational algebra is a procedural query language. It consists of a set of operations that take one or
two relations (tables) as input and produce a new relation, on the request of the user to retrieve the specific
information, as the output.
The Selection, Projection and Rename operations are called unary operations because they operate only
on one relation. The other operations operate on pairs of relations and are therefore called binary operations.
Selection is a relational algebra operation that uses a condition to select rows from a relation. A new
relation (output) is created from an existing relation by selecting only the rows requested by the user that
satisfy a specified condition. The lowercase Greek letter sigma (σ) is used to denote the selection operation.
Example: Find the details of customers who live in Hyderabad city from the customer relation.
The selection operation uses column names in specifying the selection condition. Selection
conditions are the same as the conditions used in the ‘if’ statement of any programming language; a selection
condition uses the relational operators = < > <= >= != . It is possible to combine several conditions into a larger
condition using the logical connectives ‘and’, represented by ‘∧’, and ‘or’, represented by ‘∨’.
Example:
Find the customer details who are living in Hyderabad city and whose customer_id is greater than 1000
in Customer relation.
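Selection can be sketched in Python by treating a relation as a list of rows and the condition as a function. The select function, the column name city, and the sample rows are assumptions made for illustration.

```python
def select(relation, predicate):
    """sigma_predicate(relation): keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# Hypothetical customer relation.
customer = [
    {"customer_id": 900, "customer_name": "Venkat", "city": "Hyderabad"},
    {"customer_id": 1201, "customer_name": "Latha", "city": "Hyderabad"},
    {"customer_id": 1202, "customer_name": "Kiran", "city": "Chennai"},
]

# Customers living in Hyderabad.
in_hyderabad = select(customer, lambda r: r["city"] == "Hyderabad")

# Two conditions combined with 'and'.
hyderabad_over_1000 = select(
    customer, lambda r: r["city"] == "Hyderabad" and r["customer_id"] > 1000
)

print([r["customer_id"] for r in in_hyderabad])
print([r["customer_id"] for r in hyderabad_over_1000])
```

The result has the same columns as the input relation. Only the number of rows changes.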
Projection is a relational algebra operation that creates a new relation by deleting columns from an
existing relation, i.e., a new relation (output) is created from an existing relation by selecting only those
columns requested by the user. Projection is denoted by the Greek letter pi (π).
The Selection operation eliminates unwanted rows whereas the projection operation eliminates
unwanted columns. The projection operation extracts specified columns from a table.
In the above example, the selection operation is performed first. Next, the projection of the resulting
relation on the customer_name column is carried out. Thus, instead of all customer details of customers living
in Hyderabad city, we can display only the customer names of customers living in Hyderabad city.
The above example is also known as a relational algebra expression because it combines two or
more relational algebra operations (i.e., selection and projection) into one.
Example: Find the customer names (not all customer details) from the customer relation.
π customer_name ( customer )
The above query lists all customer names in the customer relation. It is not called a
relational algebra expression because it performs only one relational algebra operation.
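Projection, and its combination with selection, can be sketched in Python as follows. The project function, the column names, and the sample rows are assumptions. Duplicate rows are dropped because the result of a relational algebra operation is a set of tuples.

```python
def project(relation, attributes):
    """pi_attributes(relation): keep the named columns and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


# Hypothetical customer relation.
customer = [
    {"customer_id": 1201, "customer_name": "Venkat", "city": "Hyderabad"},
    {"customer_id": 1202, "customer_name": "Latha", "city": "Chennai"},
    {"customer_id": 1203, "customer_name": "Kiran", "city": "Hyderabad"},
]

# A single operation: projection on customer_name.
all_names = project(customer, ["customer_name"])

# Projection on city: 'Hyderabad' appears once although two rows contain it.
cities = project(customer, ["city"])

# Selection first, then projection of the result on customer_name.
hyderabad_rows = [row for row in customer if row["city"] == "Hyderabad"]
hyderabad_names = project(hyderabad_rows, ["customer_name"])

print(all_names)
print(cities)
print(hyderabad_names)
```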
3) The Set Operations: (Union, Intersection, Set-Difference, Cartesian product)
i) Union ‘∪’ Operation:
Union, denoted by ‘∪’, is a relational algebra operation that creates a union or combination of two
relations. The result of this operation, denoted by d ∪ b, is a relation that includes all tuples that are either in d
or in b or in both d and b, with duplicate tuples eliminated.
Example: Find the customer_id of all customers in the bank who have either an account or a loan or both.
To solve the above query, first find the customers with an account in the bank, that is, π customer_id (
depositor ). Then find all customers with a loan in the bank, π customer_id ( borrower ). Now, to
answer the above query, we need the union of these two sets, that is, all customer ids that appear in either or
both of the two relations.
If some customers A, B and C are both depositors as well as borrowers, then in the resulting relation,
their customer ids will occur only once because duplicate values are eliminated.
For a union operation d ∪ b to be valid, two conditions must be satisfied,
i) The relations depositor (d) and borrower (b) must have the same number of attributes / columns.
ii) The domains of the ith attribute of the depositor relation and the ith attribute of the borrower relation must be the
same, for all i.
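As a minimal sketch of the union and its two validity conditions, the relations can be modelled as Python sets of tuples. The customer ids are made up, and using Python value types as a stand-in for attribute domains is an assumption made only for illustration.

```python
# Hypothetical results of projecting customer_id from depositor and borrower.
depositor_ids = {(101,), (102,), (103,)}
borrower_ids = {(102,), (103,), (104,)}

def union_compatible(d, b):
    """Check both conditions: same number of columns and matching column types."""
    signatures = {tuple(type(value) for value in row) for row in d | b}
    return len(signatures) <= 1

def union(d, b):
    """Union of two compatible relations; a set keeps each tuple only once."""
    if not union_compatible(d, b):
        raise ValueError("relations are not union compatible")
    return d | b

either = union(depositor_ids, borrower_ids)
```

Customers 102 and 103 appear in both inputs but only once in either, which mirrors the elimination of duplicates.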
ii) Intersection ‘∩’ Operation:
The intersection, denoted by ‘∩’, is a relational algebra operation that finds the tuples that are in
both relations. The result of this operation, denoted by d ∩ b, is a relation that includes all tuples common to
both the depositor and borrower relations.
Example: Find the customer_id of all customers in the bank who have both an account and a loan.
π customer_id ( depositor ) ∩ π customer_id ( borrower )
The resulting relation lists the customer ids of the customers who have both an
account and a loan. For an intersection operation d ∩ b to be valid, the same two conditions must
be satisfied as for the union operation stated above.
iii) Set-Difference ‘−’ Operation:
The set-difference, denoted by ‘−’, is a relational algebra operation that finds the tuples that are
in one relation but not in another.
Example: Find the customer_id of all customers in the bank who have an account but not a loan.
π customer_id ( depositor ) − π customer_id ( borrower )
The resulting relation lists the customer ids of all customers who have an account but not
a loan. For a difference operation d − b to be valid, the same two conditions must be satisfied as for the
union operation stated above.
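Intersection and set-difference map directly onto Python set operators. The sketch below assumes the same made-up customer ids for the projected depositor and borrower relations; none of the values come from the notes.

```python
# Hypothetical results of projecting customer_id from depositor and borrower.
depositor_ids = {(101,), (102,), (103,)}
borrower_ids = {(102,), (103,), (104,)}

# Intersection: customers who have both an account and a loan.
both = depositor_ids & borrower_ids

# Set-difference: customers who have an account but not a loan.
account_only = depositor_ids - borrower_ids

# Difference is not symmetric: this gives customers with a loan but no account.
loan_only = borrower_ids - depositor_ids
```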
iv) Cartesian-Product ‘×’ Operation:
The Cartesian-product, denoted by a cross ‘×’, is a relational algebra operation that
combines information from two relations into one relation.
Assume that there are n1 tuples in the borrower relation and n2 tuples in the loan relation. Then the result of this
operation, denoted by r = borrower × loan, is a relation ‘r’ that includes all the tuples formed by each possible
pair of tuples, one from the borrower relation and one from the loan relation. Thus, ‘r’ is a large relation
containing n1 * n2 tuples.
A drawback of the Cartesian-product is that an attribute name that appears in both relations is repeated in the
result, so it has to be qualified with its relation name (for example, borrower.loan_no and loan.loan_no).
Example: Find the customer_id of all customers in the bank who have a loan of more than 10,000.
The customer_id comes from the borrower relation and the loan_amount from the loan relation. First, form the
Cartesian product borrower × loan, so that the new relation pairs every customer_id with every loan_amount.
Most of these pairs are unrelated, so keep only the tuples in which borrower.loan_no = loan.loan_no, that is,
the tuples whose loan_no entries match in both relations. Then select the tuples with loan_amount > 10000
and project on customer_id:
π customer_id ( σ borrower.loan_no = loan.loan_no ∧ loan_amount > 10000 ( borrower × loan ) )
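The steps of this example can be sketched as follows. The rows are hypothetical, and prefixing each attribute with its relation name is just one way to handle the repeated loan_no attribute; it is an assumption for illustration.

```python
from itertools import product

# Hypothetical sample rows (illustrative data only).
borrower = [
    {"customer_id": 1, "loan_no": "L1"},
    {"customer_id": 2, "loan_no": "L2"},
]
loan = [
    {"loan_no": "L1", "loan_amount": 5000},
    {"loan_no": "L2", "loan_amount": 25000},
]

# Cartesian product: every borrower row paired with every loan row.
# Attribute names are qualified so that the two loan_no columns do not clash.
pairs = [
    {**{f"borrower.{k}": v for k, v in b.items()},
     **{f"loan.{k}": v for k, v in ln.items()}}
    for b, ln in product(borrower, loan)
]

# Selection: matching loan numbers and an amount above 10000.
matching = [
    row for row in pairs
    if row["borrower.loan_no"] == row["loan.loan_no"]
    and row["loan.loan_amount"] > 10000
]

# Projection on customer_id.
customer_ids = sorted({row["borrower.customer_id"] for row in matching})
```

With n1 = 2 and n2 = 2 the product holds 4 rows, of which only one survives the selection.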
The rename operation is denoted by rho ‘ρ’. It is a relational algebra operation that gives a
new name to the result of a relational algebra expression. Thus, we can apply the rename operation to a relation ‘borrower’
to get the same relation under a new name. Given a relation ‘customer’, the following expression returns the same
relation ‘customer’ under a new name ‘x’.
ρ x ( customer )
After this operation, there are two relations with the same contents, one named ‘customer’ and the other named
‘x’. The rename operation is useful when we want to compare the values of one column with each other
within a single relation.
For example, to find the largest account balance in the bank, we have to compare the values of the
balance column with each other within the same relation, account, which is not possible with a single copy of it.
So, we rename the relation with a new name ‘d’. Now we have two copies of account, one named
account and the other named ‘d’, and we can compare the balance values of the two copies with each
other.
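One way to carry out this comparison, sketched under the assumption that account has the columns (account_no, balance), is to pair account with its renamed copy d, collect every balance that is smaller than some other balance, and subtract those from the set of all balances. The sample rows are made up.

```python
from itertools import product

# Hypothetical account relation: (account_no, balance).
account = [("A-101", 500), ("A-102", 900), ("A-103", 700)]

# Rename: d is the same relation under a new name.
d = list(account)

# Balances in account that are smaller than some balance in d.
not_largest = {a[1] for a, other in product(account, d) if a[1] < other[1]}

# All balances minus the non-largest ones leaves the largest balance.
largest = {a[1] for a in account} - not_largest
```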
The join operation is denoted by ‘⋈’. It is a relational algebra operation that combines
(joins) two relations like the Cartesian-product, but removes the duplicate attributes and so makes the later operations
(selection, projection, ...) simpler. In simple words, a join connects relations on columns
containing comparable information.
i) Natural Join:
The natural join is a binary operation that combines two relations into one relation
and merges the columns that the two relations have in common into a single column in the result. Suppose
we have relations with the following schemas, which contain data on full-time employees.
employee ( emp_name, street, city )
employee_works ( emp_name, branch_name, salary )
To generate a single relation with all the information (emp_name, street, city, branch_name
and salary) about full-time employees, a possible approach is to use the natural-join operation as
follows,
employee ⋈ employee_works
In the result, we have lost the street and city information about Smith, since the tuple describing Smith is absent from
employee_works. Similarly, we have lost the branch_name and salary information about Gates, since the tuple
describing Gates is absent from employee.
Example: Find the names and cities of the employees who have salary details.
The join operation selects all employees with salary details, from which we can easily project the
employee names, cities and salaries.
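A minimal sketch of the natural join follows. It assumes the schemas employee(emp_name, street, city) and employee_works(emp_name, branch_name, salary); the rows, including the ones for Smith and Gates, are invented for illustration.

```python
# Hypothetical sample rows (illustrative data only).
employee = [
    {"emp_name": "Coyote", "street": "Toon", "city": "Hollywood"},
    {"emp_name": "Smith", "street": "Revolver", "city": "Death Valley"},
]
employee_works = [
    {"emp_name": "Coyote", "branch_name": "Mesa", "salary": 1500},
    {"emp_name": "Gates", "branch_name": "Redmond", "salary": 5300},
]

def natural_join(left, right):
    """Pair rows that agree on all common attributes; common columns appear once."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if all(lrow[name] == rrow[name] for name in common)
    ]

joined = natural_join(employee, employee_works)
```

Only Coyote appears in joined; Smith and Gates are dropped because each is missing from one of the two relations.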
The drawback of the natural join operation is this loss of information. To overcome it,
we use the outer-join operation. The outer-join operation is of three types,
a) Left outer-join ( ⟕ )
b) Right outer-join ( ⟖ )
c) Full outer-join ( ⟗ )
a) Left Outer-join:
The left outer-join takes all tuples in the left relation that did not match any tuple in the right relation,
pads them with null values for all the columns that come from the right relation, and adds them to the result of the natural
join as follows,
employee ⟕ employee_works
c) Full Outer-join:
The full outer-join does the work of both the left and the right outer-join: it pads with null values the tuples from
the left relation that did not match any tuple from the right relation, as well as the tuples from the right relation
that did not match any tuple from the left relation, and adds them to the result of the natural join as follows,
employee ⟗ employee_works
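The three outer joins can be sketched as below, with None standing in for a null value. The sample rows and the helper left_outer_join are assumptions; the right outer-join is obtained here simply by swapping the arguments, which is one possible way to express it.

```python
# Hypothetical sample rows (illustrative data only).
employee = [
    {"emp_name": "Coyote", "street": "Toon", "city": "Hollywood"},
    {"emp_name": "Smith", "street": "Revolver", "city": "Death Valley"},
]
employee_works = [
    {"emp_name": "Coyote", "branch_name": "Mesa", "salary": 1500},
    {"emp_name": "Gates", "branch_name": "Redmond", "salary": 5300},
]

def left_outer_join(left, right, key, right_attrs):
    """Natural join on key, plus unmatched left rows padded with None."""
    result = []
    for lrow in left:
        matches = [rrow for rrow in right if rrow[key] == lrow[key]]
        if matches:
            result.extend({**lrow, **rrow} for rrow in matches)
        else:
            result.append({**lrow, **{name: None for name in right_attrs}})
    return result

# Left outer-join: keeps Smith with null branch_name and salary.
left = left_outer_join(employee, employee_works, "emp_name", ["branch_name", "salary"])

# Right outer-join: the same helper with the relations swapped; keeps Gates.
right = left_outer_join(employee_works, employee, "emp_name", ["street", "city"])

# Full outer-join: the left result plus the right rows not already present.
full = left + [row for row in right if row not in left]
```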
The theta join operation is denoted by the symbol ‘⋈θ’. It is an extension of the natural join operation
that combines two relations into one relation subject to a selection condition ( θ ).
For example, the theta join employee ⋈ salary > 20000 employee_works keeps only the joined tuples that satisfy
the condition salary > 20000.
In the result, two tuples are selected because their salary is greater than 20000.
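A theta join can be sketched as a filter over all pairs of rows. The rows below are hypothetical and were chosen so that two tuples satisfy salary > 20000, as in the explanation above; including the emp_name match in the condition is an assumption that follows the notes' description of theta join as an extension of the natural join.

```python
# Hypothetical sample rows (illustrative data only).
employee = [
    {"emp_name": "Coyote", "street": "Toon", "city": "Hollywood"},
    {"emp_name": "Rabbit", "street": "Tunnel", "city": "Carrotville"},
    {"emp_name": "Williams", "street": "Seaview", "city": "Seattle"},
]
employee_works = [
    {"emp_name": "Coyote", "branch_name": "Mesa", "salary": 25000},
    {"emp_name": "Rabbit", "branch_name": "Mesa", "salary": 15000},
    {"emp_name": "Williams", "branch_name": "Redmond", "salary": 30000},
]

def theta_join(left, right, theta):
    """Keep every pair of rows for which the condition theta holds."""
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if theta(lrow, rrow)
    ]

well_paid = theta_join(
    employee,
    employee_works,
    lambda e, w: e["emp_name"] == w["emp_name"] and w["salary"] > 20000,
)
```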
Let relation A be (x1, x2, …, xn, y1, y2, …, ym) and
relation B be (y1, y2, …, ym),
where the attributes y1, y2, …, ym are common to both relations A and B and must have the
same domains.
Then A ÷ B is a new relation with the attributes x1, x2, …, xn. Relations A and B are the
dividend and the divisor respectively. A tuple ‘t’ is in A ÷ B if and only if two conditions are
satisfied,
i) t is in π A−B ( A ), where A−B denotes the attributes x1, x2, …, xn
ii) for every tuple tb in B, there is a tuple ta in A satisfying the following two things,
1. ta[B] = tb[B]
2. ta[A−B] = t
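The two conditions translate almost word for word into a set comprehension. In this sketch the relations are sets of tuples with the x columns first; the data (suppliers s1 to s3 and parts p1, p2) and the parameter n, the number of x columns, are assumptions for illustration.

```python
# Hypothetical dividend A with columns (x1, y1) and divisor B with column (y1).
A = {("s1", "p1"), ("s1", "p2"), ("s2", "p1"), ("s3", "p1"), ("s3", "p2")}
B = {("p1",), ("p2",)}

def divide(a, b, n):
    """Return the x-parts of A that are paired with every tuple of B.

    n is the number of leading x columns in each tuple of A.
    """
    candidates = {row[:n] for row in a}          # condition i): t is in the projection of A
    return {
        t for t in candidates
        if all((t + tb) in a for tb in b)        # condition ii): a matching ta exists for every tb
    }

quotient = divide(A, B, 1)
```

Here s2 is excluded because it is paired with p1 only, not with every tuple of B.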
Relational Calculus
Relational calculus allows the user to describe the set of answers without specifying the procedure by which they
should be computed. Relational calculus has had a big influence on the design of commercial query
languages such as SQL and QBE (Query-by-Example). It has two variants, tuple relational
calculus (TRC) and domain relational calculus (DRC).
Variables in TRC take tuples (rows) as values, and TRC has had a strong influence on SQL.
Variables in DRC take fields (attributes) as values, and DRC has had a strong influence on
QBE.
Examples:
1) Find all loan details in the loan relation.
{ t | t ∈ loan }
This query gives all loan details, such as loan_no, loan_date and loan_amt, for every loan in the loan table
of a bank.
2) Find all loan details for loan amounts over 100000 in the loan relation.
{ t | t ∈ loan ∧ t[loan_amt] > 100000 }
This query gives all loan details, such as loan_no, loan_date and loan_amt, for every loan over
100000 in the loan table of a bank.
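A TRC query reads much like a Python set comprehension in which the variable t ranges over whole tuples. The sketch below uses made-up loan rows and a namedtuple with the three attributes named above; it only illustrates the declarative style and is not how a database evaluates the query.

```python
from collections import namedtuple

# Hypothetical loan relation with the attributes named in the examples.
Loan = namedtuple("Loan", ["loan_no", "loan_date", "loan_amt"])
loan = {
    Loan("L1", "2020-01-05", 50000),
    Loan("L2", "2021-03-10", 250000),
}

# 1) All loan details: the tuple variable t ranges over loan.
all_loans = {t for t in loan}

# 2) Loan details for loan amounts over 100000.
big_loans = {t for t in loan if t.loan_amt > 100000}
```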
A domain variable in Domain Relational Calculus (DRC) is a variable that ranges over the values of the
domain (data type) of some column (attribute). A DRC query has the form
{ < x1, x2, …, xn > | p(< x1, x2, …, xn >) }
where each xi is either a domain variable or a constant and p(< x1, x2, …, xn >)
denotes a DRC formula.
A DRC formula is defined in a manner that is very similar to the definition of a TRC
formula. The main difference is that the variables are domain variables.
Examples:
{ < N, D, A > | < N, D, A > ∈ loan }
This query gives all loan details, such as loan_no, loan_date and loan_amt, for every loan in the loan table
of a bank. Each column is represented by its initial: N for loan_no, D for loan_date, A for
loan_amt. The condition < N, D, A > ∈ loan ensures that the domain variables N, D and A are
restricted to the domains of the corresponding columns.
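In Python terms, the difference from TRC is that the comprehension unpacks each row into one variable per column. The rows are hypothetical, and the second query (loan numbers of loans over 100000) is an added illustration that does not appear in the notes.

```python
# Hypothetical loan relation as (loan_no, loan_date, loan_amt) tuples.
loan = {
    ("L1", "2020-01-05", 50000),
    ("L2", "2021-03-10", 250000),
}

# All loan details: the domain variables N, D, A each range over one column.
all_loans = {(N, D, A) for (N, D, A) in loan}

# Illustrative extra query: loan numbers of loans over 100000.
big_loan_nos = {N for (N, D, A) in loan if A > 100000}
```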
The tuple relational calculus restricted to safe expressions is equal in expressive
power to relational algebra. Thus, for every relational algebra expression, there is an equivalent
safe expression in the tuple relational calculus, and for every safe tuple relational calculus expression, there is
an equivalent relational algebra expression.
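A query Q, evaluated on a database instance I, is safe if the following conditions hold, where dom(Q, I) is the
set of constants that appear in Q or in I:
1) For any given I, the set of answers for Q contains only values that are in dom(Q, I).
2) For each sub expression of the form ∃R(p(R)) in Q, if a tuple r makes the formula
true, then r contains only constants in dom(Q, I).
3) For each sub expression of the form ∀R(p(R)) in Q, if a tuple r contains a constant
that is not in dom(Q, I), then r must make the formula true.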
The expressive power of relational algebra is often used as a measure of how powerful a
relational database query language is. If a query language can express all the queries that we can
express in relational algebra, it is said to be relationally complete. A practical query language is
expected to be relationally complete. In addition, commercial query languages typically support
features that allow us to express some queries that cannot be expressed in relational algebra.