Dbms Unit II
UNIT II
Relational algebra is a procedural query language that takes relations as input and produces a relation as output. Because it is procedural, it specifies both what data is to be retrieved and how it is to be retrieved. Relational algebra provides the theoretical foundation for relational databases and SQL.
The relational model represents data as a table with columns and rows. Each row is known as a tuple. Each column of the table has a name, called an attribute.
In the relational model, data and relationships are represented by a collection of inter-related tables. Each table is a group of columns and rows, where each column represents an attribute of an entity and each row represents a record.
Attribute: The name of a column in a particular table. Each attribute Ai must have a domain, dom(Ai).
Relational instance: In the relational database system, the relational instance is represented by a
finite set of tuples. Relation instances do not have duplicate tuples.
Relational schema: A relational schema contains the name of the relation and name of all
columns or attributes.
Relational key: A set of one or more attributes whose values identify each row in the relation uniquely.
Degree: The number of attributes in the relation is known as the degree of the relation.
Integrity Constraints
o Integrity constraints are a set of rules used to maintain the quality of information.
o Integrity constraints ensure that data insertion, updating, and other processes are performed in such a way that data integrity is not affected.
o Thus, integrity constraints guard against accidental damage to the database.
1. Domain constraints
o A domain constraint defines the valid set of values for an attribute.
o The data type of a domain can be string, character, integer, time, date, currency, etc. The value of the attribute must belong to the corresponding domain.
2. Entity integrity constraints
o The entity integrity constraint states that a primary key value cannot be null.
o This is because the primary key value is used to identify individual rows in a relation; if the primary key were null, those rows could not be identified.
o A table can contain null values in fields other than the primary key.
3. Referential integrity constraints
o If a foreign key in Table 1 refers to the primary key of Table 2, then every value of the foreign key in Table 1 must either be null or be present in Table 2.
Example:
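The referential integrity rule can be tried out with a minimal sketch using Python's built-in sqlite3 module. The table and column names (dept, emp, dept_id) and the rows are hypothetical, chosen only for illustration. SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Hypothetical tables: emp.dept_id refers to the primary key of dept.
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, "
             "dept_id INTEGER REFERENCES dept(dept_id))")
conn.execute("INSERT INTO dept VALUES (11, 'Sales')")

conn.execute("INSERT INTO emp VALUES (1, 'Jack', 11)")     # value exists in dept: accepted
conn.execute("INSERT INTO emp VALUES (2, 'Harry', NULL)")  # null foreign key: accepted

try:
    conn.execute("INSERT INTO emp VALUES (3, 'John', 99)")  # 99 is not a dept_id
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

emp_count = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
print(rejected, emp_count)
```

The third insert is refused because its foreign key value is neither null nor present in the referenced table, so only two employee rows are stored.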
4. Key constraints
o Keys are the attributes used to identify an entity uniquely within its entity set.
o An entity set can have multiple keys, out of which one key is chosen as the primary key. A primary key must contain unique values and cannot contain a null value in the relational table.
Example:
Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to obtain the
result of the query. It uses operators to perform queries.
SELECT (symbol: σ) : The SELECT operation selects the subset of tuples of a relation that satisfy a given selection condition (predicate). It is denoted by sigma (σ).
Notation: σ_p(r)
Where: σ is the selection operator, p is the selection predicate and r is the relation.
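For example:
σ subject = "database" (Books)
Output: Selects tuples from Books where subject is 'database'.
σ subject = "database" ∧ price = 450 (Books)
Output: Selects tuples from Books where subject is 'database' and price is 450.
σ (subject = "database" ∧ price = 450) ∨ year > 2010 (Books)
Output: Selects tuples from Books where subject is 'database' and price is 450, or those books published after 2010 (the attribute name year is assumed here).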
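Selection can be sketched in a few lines of Python by treating a relation as a list of tuples and the predicate as a function. The Books rows and the attribute names title and year below are hypothetical, introduced only so the predicates have something to filter.

```python
# Hypothetical Books relation; each tuple is a dict of attribute -> value.
books = [
    {"title": "DB Concepts", "subject": "database", "price": 450, "year": 2008},
    {"title": "SQL Primer", "subject": "database", "price": 300, "year": 2012},
    {"title": "Networks", "subject": "networking", "price": 450, "year": 2015},
]

def select(relation, predicate):
    """sigma_p(r): keep the tuples of r for which predicate p is true."""
    return [t for t in relation if predicate(t)]

by_subject = select(books, lambda t: t["subject"] == "database")
by_subject_and_price = select(
    books, lambda t: t["subject"] == "database" and t["price"] == 450)
with_year = select(
    books,
    lambda t: (t["subject"] == "database" and t["price"] == 450) or t["year"] > 2010)

print([t["title"] for t in by_subject])
print([t["title"] for t in by_subject_and_price])
print([t["title"] for t in with_year])
```

Selection never changes the columns of a relation; it only reduces the number of rows.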
PROJECT (symbol: π) : The projection operation eliminates all attributes of the input relation except those mentioned in the projection list. It defines a relation that contains a vertical subset of the input relation. Projection extracts the values of the specified attributes and eliminates duplicate tuples from the result. It is denoted by pi (π). In short, it keeps specific columns of a relation and discards the others.
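Example of Projection:
Input relation (columns: customer number, CustomerName, Status):
1  Google   Active
2  Amazon   Active
3  Apple    Inactive
4  Alibaba  Active
Projecting onto CustomerName and Status gives:
CustomerName  Status
Google        Active
Amazon        Active
Apple         Inactive
Alibaba       Active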
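The projection above, including the removal of duplicates, can be sketched as follows. The key name "id" for the first column is an assumed label; the other names and values come from the example.

```python
# The customer relation from the example; the key name "id" is an assumed label.
customers = [
    {"id": 1, "CustomerName": "Google", "Status": "Active"},
    {"id": 2, "CustomerName": "Amazon", "Status": "Active"},
    {"id": 3, "CustomerName": "Apple", "Status": "Inactive"},
    {"id": 4, "CustomerName": "Alibaba", "Status": "Active"},
]

def project(relation, attributes):
    """pi_attributes(r): keep only the listed columns and drop duplicate rows."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:          # a relation is a set, so duplicates vanish
            seen.add(row)
            result.append(row)
    return result

name_status = project(customers, ["CustomerName", "Status"])
status_only = project(customers, ["Status"])
print(name_status)
print(status_only)
```

Projecting onto Status alone leaves only two rows, which shows why projection can return fewer tuples than the input relation.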
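RENAME (symbol: ρ) : The rename operation gives a new name to a relation. It is denoted by rho (ρ). For example, the following expression renames the relation STUDENT to STUDENT1:
ρ(STUDENT1, STUDENT)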
UNION (symbol: ∪) : The union of relations A and B includes all tuples that are in A or in B, and eliminates duplicate tuples. The union of A and B is expressed as:
A ∪ B
Example:
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
Input:
π CUSTOMER_NAME (BORROW) ∪ π CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
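The union above maps directly onto Python sets, which drop duplicates in the same way. This is a minimal sketch that uses only the CUSTOMER_NAME columns of the two relations.

```python
# CUSTOMER_NAME columns of the DEPOSITOR and BORROW relations above.
depositor = ["Johnson", "Smith", "Mayes", "Turner", "Johnson", "Jones", "Lindsay"]
borrow = ["Jones", "Smith", "Hayes", "Jackson", "Curry", "Smith", "Williams"]

# Converting to sets removes duplicates, mirroring projection followed by union.
all_customers = set(borrow) | set(depositor)
print(sorted(all_customers))
print(len(all_customers))
```

Fourteen input rows shrink to ten names because Johnson and Smith each appear twice within one relation, and Jones and Smith appear in both relations.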
INTERSECTION (symbol: ∩) : Intersection is denoted by the ∩ symbol and is used to select the common rows (tuples) from two tables (relations).
Let's say we have two relations R1 and R2 that have the same columns, and we want to select all those tuples (rows) that are present in both relations. In that case we can apply the intersection operation on the two relations: R1 ∩ R2.
Example
Table 1: COURSE
Table 2: STUDENT
S901 Aditya 19
S911 Steve 18
Sukshma RD,Asst Prof,Hindustan College Page 9
Database Management Systems
S921 Paul 19
S931 Lucy 17
S941 Carl 16
S951 Rick 18
Query:
π Student_Name (COURSE) ∩ π Student_Name (STUDENT)
Output:
Student_Name
------------
Aditya
Steve
Paul
Lucy
SET DIFFERENCE (symbol: -) : Set difference is denoted by the - symbol. Let's say we have two relations R1 and R2 and we want to select all those tuples (rows) that are present in R1 but not present in R2. This can be done using the set difference R1 - R2.
Example
Table 1: COURSE
Table 2: STUDENT
S901 Aditya 19
S911 Steve 18
S921 Paul 19
S931 Lucy 17
S941 Carl 16
S951 Rick 18
Query:
π Student_Name (STUDENT) - π Student_Name (COURSE)
Output:
Student_Name
------------
Carl
Rick
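Intersection and set difference can be compared side by side with Python sets. The STUDENT names come from the table above; the COURSE names are an assumption, chosen to be consistent with the two outputs shown.

```python
# Student_Name values of STUDENT, taken from the table above.
student = {"Aditya", "Steve", "Paul", "Lucy", "Carl", "Rick"}
# Assumed Student_Name values of COURSE, consistent with the outputs shown.
course = {"Aditya", "Steve", "Paul", "Lucy"}

in_both = course & student        # intersection
only_student = student - course   # set difference STUDENT - COURSE
only_course = course - student    # the reverse difference is a different result

print(sorted(in_both))
print(sorted(only_student))
print(sorted(only_course))
```

Unlike union and intersection, set difference is not commutative: with this data STUDENT - COURSE has two names, while COURSE - STUDENT is empty.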
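CARTESIAN PRODUCT (symbol: X) : The Cartesian product of two relations R1 and R2 combines every tuple of R1 with every tuple of R2. It is written as:
R1 X R2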
Table 1: R
Col_A Col_B
----- ------
AA 100
BB 200
CC 300
Table 2: S
Col_X Col_Y
----- -----
XX 99
YY 11
ZZ 101
Query:
Let's find the Cartesian product of tables R and S.
R X S
Output:
Col_A Col_B Col_X Col_Y
----- ----- ----- -----
AA    100   XX    99
AA    100   YY    11
AA    100   ZZ    101
BB    200   XX    99
BB    200   YY    11
BB    200   ZZ    101
CC    300   XX    99
CC    300   YY    11
CC    300   ZZ    101
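The same product can be reproduced with itertools.product from the Python standard library. This is a small sketch using the R and S rows above.

```python
from itertools import product

r = [("AA", 100), ("BB", 200), ("CC", 300)]   # (Col_A, Col_B)
s = [("XX", 99), ("YY", 11), ("ZZ", 101)]     # (Col_X, Col_Y)

# Pair every tuple of R with every tuple of S and concatenate the attributes.
r_x_s = [a + b for a, b in product(r, s)]
for row in r_x_s:
    print(*row)
```

With 3 tuples in R and 3 in S the result has 3 x 3 = 9 tuples, each with 2 + 2 = 4 attributes.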
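Join Operations
A join operation combines related tuples from two relations when a given join condition is satisfied. It is denoted by ⋈.
Example: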
EMPLOYEE
EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry
SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000
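Operation: EMPLOYEE ⋈ SALARY
Result:
EMP_CODE EMP_NAME SALARY
101 Stephan 50000
102 Jack 30000
103 Harry 25000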
Natural Join:
o A natural join is the set of tuples of all combinations in R and S that are equal on their
common attribute names.
o It is denoted by ⋈.
Example:
EMPLOYEE
EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry
SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000
Input:
π EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)
Output:
EMP_NAME SALARY
Stephan 50000
Jack 30000
Harry 25000
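A natural join can be sketched in Python by matching tuples on whatever attribute names the two relations share. The helper name natural_join is hypothetical; the data is the EMPLOYEE and SALARY example above.

```python
employee = [{"EMP_CODE": 101, "EMP_NAME": "Stephan"},
            {"EMP_CODE": 102, "EMP_NAME": "Jack"},
            {"EMP_CODE": 103, "EMP_NAME": "Harry"}]
salary = [{"EMP_CODE": 101, "SALARY": 50000},
          {"EMP_CODE": 102, "SALARY": 30000},
          {"EMP_CODE": 103, "SALARY": 25000}]

def natural_join(r, s):
    """Combine tuples of r and s that agree on all attributes they share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])   # here: {"EMP_CODE"}
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]

joined = natural_join(employee, salary)
name_salary = [(t["EMP_NAME"], t["SALARY"]) for t in joined]  # projection step
print(name_salary)
```

The common attribute EMP_CODE appears only once in each joined tuple, which is what distinguishes a natural join from a Cartesian product followed by a selection.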
Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with missing
information.
Example:
EMPLOYEE
FACT_WORKERS
Input:
(EMPLOYEE ⋈ FACT_WORKERS)
Output:
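a. Left outer join:
o Left outer join contains the set of tuples of all combinations in R and S that are equal on their common attribute names.
o In addition, it keeps the tuples in R that have no matching tuples in S, padding the attributes of S with nulls.
o It is denoted by ⟕.
Input:
EMPLOYEE ⟕ FACT_WORKERS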
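b. Right outer join:
o Right outer join contains the set of tuples of all combinations in R and S that are equal on their common attribute names.
o In addition, it keeps the tuples in S that have no matching tuples in R, padding the attributes of R with nulls.
o It is denoted by ⟖.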
Input:
EMPLOYEE ⟖ FACT_WORKERS
Output:
c. Full outer join:
o Full outer join is like a left or right outer join except that it contains all rows from both tables.
o It keeps the tuples in R that have no matching tuples in S and the tuples in S that have no matching tuples in R, padding the missing attributes with nulls.
o It is denoted by ⟗.
Input:
EMPLOYEE ⟗ FACT_WORKERS
Output:
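The three outer joins can be compared with a small Python sketch. The rows, the attribute names CITY and SALARY, and the helper left_outer_join are all hypothetical; None plays the role of a null.

```python
# Hypothetical relations sharing the attribute EMP_NAME.
employee = [{"EMP_NAME": "Ram", "CITY": "Mumbai"},
            {"EMP_NAME": "Shyam", "CITY": "Delhi"}]
fact_workers = [{"EMP_NAME": "Ram", "SALARY": 10000},
                {"EMP_NAME": "Hari", "SALARY": 50000}]

def left_outer_join(r, s, key, s_attrs):
    """Keep every tuple of r; pad with None where s has no match on key."""
    out = []
    for t in r:
        matches = [u for u in s if u[key] == t[key]]
        if matches:
            out.extend({**t, **u} for u in matches)
        else:
            out.append({**t, **{a: None for a in s_attrs}})
    return out

left = left_outer_join(employee, fact_workers, "EMP_NAME", ["SALARY"])
# A right outer join is a left outer join with the operands swapped.
right = left_outer_join(fact_workers, employee, "EMP_NAME", ["CITY"])
# Full outer join: the left result plus right-side tuples that found no match.
full = left + [t for t in right if t["CITY"] is None]

print(left, right, full, sep="\n")
```

Shyam survives only in the left and full results, Hari only in the right and full results, and Ram, who matches on both sides, appears once in each.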
d. Equi join:
It is also known as an inner join. It is the most common join. It is based on matched data as per the equality condition; the equi join uses the comparison operator (=).
Example:
CUSTOMER RELATION
CLASS_ID NAME
1 John
2 Harry
3 Jackson
PRODUCT
PRODUCT_ID CITY
1 Delhi
2 Mumbai
3 Noida
Input:
CUSTOMER ⋈ CUSTOMER.CLASS_ID = PRODUCT.PRODUCT_ID PRODUCT
Output:
CLASS_ID NAME PRODUCT_ID CITY
1 John 1 Delhi
2 Harry 2 Mumbai
3 Jackson 3 Noida
Aggregate functions:
An aggregate function allows you to perform a calculation on a set of values to return a single
scalar value.
The following are the most commonly used SQL aggregate functions: COUNT, SUM, AVG, MIN and MAX.
example:
Id Name Salary
-----------------------
1 A 80
2 B 40
3 C 60
4 D 70
5 E 60
6 F Null
SUM(salary): sum of all non-null values of the column salary, i.e., 80 + 40 + 60 + 70 + 60 = 310.
SUM(DISTINCT salary): sum of all distinct non-null values, i.e., 80 + 40 + 60 + 70 = 250.
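The aggregate values for the example table can be checked with Python's built-in sqlite3 module. This is a minimal sketch; the table name employee is an assumption, and the rows are the six shown in the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "A", 80), (2, "B", 40), (3, "C", 60),
                  (4, "D", 70), (5, "E", 60), (6, "F", None)])

# Every aggregate except COUNT(*) skips the NULL salary of employee F.
query = """SELECT COUNT(*), COUNT(salary), COUNT(DISTINCT salary),
                  SUM(salary), SUM(DISTINCT salary),
                  AVG(salary), MIN(salary), MAX(salary)
           FROM employee"""
result = conn.execute(query).fetchone()
print(result)
```

Note that AVG divides by the five non-null salaries rather than by all six rows, so the average is 310 / 5 = 62.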
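AND Example
The following SQL statement selects all fields from "Customers" where country is "Germany" AND city is "Berlin":
Example
SELECT * FROM Customers
WHERE Country = 'Germany' AND City = 'Berlin';
OR Example
The following SQL statement selects all fields from "Customers" where city is "Berlin" OR "München":
Example
SELECT * FROM Customers
WHERE City = 'Berlin' OR City = 'München';
NOT Example
The following SQL statement selects all fields from "Customers" where country is NOT "Germany":
Example
SELECT * FROM Customers
WHERE NOT Country = 'Germany';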
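The three filters described above can be run end to end with Python's sqlite3 module. The rows, the column names (CustomerName, City, Country) and the helper names() are hypothetical, introduced only for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Customers rows; only the columns needed for the filters are included.
conn.execute("CREATE TABLE Customers (CustomerName TEXT, City TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)",
                 [("Alpha", "Berlin", "Germany"),
                  ("Beta", "München", "Germany"),
                  ("Gamma", "Paris", "France")])

def names(where):
    """Return the customer names matching a WHERE clause, in name order."""
    sql = "SELECT CustomerName FROM Customers WHERE " + where + " ORDER BY CustomerName"
    return [row[0] for row in conn.execute(sql)]

and_rows = names("Country = 'Germany' AND City = 'Berlin'")
or_rows = names("City = 'Berlin' OR City = 'München'")
not_rows = names("NOT Country = 'Germany'")
print(and_rows, or_rows, not_rows)
```

AND narrows the result to rows meeting both conditions, OR widens it to rows meeting either, and NOT returns the complement of the condition.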
GROUP BY Statement
The GROUP BY statement groups rows that have the same values into summary rows, like
"find the number of customers in each country".
The GROUP BY statement is often used with aggregate functions (COUNT, MAX, MIN,
SUM, AVG) to group the result-set by one or more columns.
Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);
The following SQL statement lists the number of customers in each country:
Example
SELECT COUNT(*), Country
FROM Customers
GROUP BY Country;
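Counting customers per country can be tried with a short sqlite3 sketch. The Customers rows and column names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Customers rows for illustration.
conn.execute("CREATE TABLE Customers (CustomerName TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [("Alpha", "Germany"), ("Beta", "Germany"),
                  ("Gamma", "France"), ("Delta", "Brazil")])

# One summary row per country, largest group first, ties broken by name.
per_country = conn.execute(
    "SELECT Country, COUNT(*) FROM Customers "
    "GROUP BY Country ORDER BY COUNT(*) DESC, Country").fetchall()
print(per_country)
```

Four input rows collapse into three summary rows, one for each distinct value of Country.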
VIEW
A view is a virtual table based on the result-set of an SQL statement. In essence, a view does not physically exist as stored data; rather, it is defined by a SQL statement that selects from one or more tables.
A view contains rows and columns, just like a real table. The fields in a view are fields from one
or more real tables in the database.
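Views syntax
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Example: The following SQL creates a view that shows all customers from Brazil (the view name Brazil_Customers is illustrative):
CREATE VIEW Brazil_Customers AS
SELECT *
FROM Customers
WHERE Country = 'Brazil';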
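That a view is a stored query rather than a stored copy of the data can be seen in a short sqlite3 sketch. The view name BrazilCustomers, the columns and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [("Alpha", "Brazil"), ("Beta", "Germany")])

# The view stores only the query, not a copy of the rows.
conn.execute("CREATE VIEW BrazilCustomers AS "
             "SELECT CustomerName FROM Customers WHERE Country = 'Brazil'")
before = conn.execute("SELECT * FROM BrazilCustomers").fetchall()

# A later insert into the base table is visible through the view.
conn.execute("INSERT INTO Customers VALUES ('Gamma', 'Brazil')")
after = conn.execute(
    "SELECT * FROM BrazilCustomers ORDER BY CustomerName").fetchall()
print(before, after)
```

The view was never modified, yet it returns the new customer, because its defining query is re-evaluated each time the view is read.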
UNIT III
Normalization
Normalization is a process of organizing the data in a database to avoid data redundancy and the insertion, update and deletion anomalies.
o Normalization divides a larger table into smaller tables and links them using relationships.
o Normal forms are used to reduce redundancy in the database tables.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These are –
Insertion, update and deletion anomaly. Let’s take an example to understand this.
Example: Suppose a manufacturing company stores the employee details in a table named
employee that has four attributes: emp_id for storing employee’s id, emp_name for storing
employee’s name, emp_address for storing employee’s address and emp_dept for storing the
department details in which the employee works. At some point of time the table looks like this:
The above table is not normalized. We will see the problems that we face when a table is not
normalized.
Update anomaly: In the above table we have two rows for employee Rick as he belongs to two
departments of the company. If we want to update the address of Rick then we have to update the
same in two rows or the data will become inconsistent. If somehow, the correct address gets
updated in one department but not in other then as per the database, Rick would be having two
different addresses, which is not correct and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training and
currently not assigned to any department then we would not be able to insert the data into the
table if emp_dept field doesn’t allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the department D890 then
deleting the rows that are having emp_dept as D890 would also delete the information of
employee Maggie since she is assigned only to this department.
To overcome these anomalies we need to normalize the data. The following sections discuss the normal forms.
First normal form (1NF)
As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It should hold only atomic values.
Example: Suppose a company wants to store the names and contact details of its employees. It
creates a table that looks like this:
emp_id emp_name emp_address emp_mobile
102    Jon      Kanpur      8812121212, 9900012222
104    Lester   Bangalore   9990000123, 8123450987
Two employees (Jon & Lester) have two mobile numbers each, so the company stored them in the same field, as you can see in the table above.
This table is not in 1NF: the rule says “each attribute of a table must have atomic (single) values”, and the emp_mobile values for employees Jon & Lester violate that rule.
To make the table comply with 1NF we should have the data like this:
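The conversion to 1NF amounts to repeating a row once for each value of the multi-valued field. The sketch below applies that to the rows for Jon and Lester; the variable names are illustrative.

```python
# The two multi-valued rows: (emp_id, emp_name, emp_address, list of mobiles).
unnormalized = [
    (102, "Jon", "Kanpur", ["8812121212", "9900012222"]),
    (104, "Lester", "Bangalore", ["9990000123", "8123450987"]),
]

# 1NF: repeat the row once per mobile number so every field holds one value.
first_nf = [(emp_id, name, address, mobile)
            for emp_id, name, address, mobiles in unnormalized
            for mobile in mobiles]
for row in first_nf:
    print(*row)
```

Each employee now occupies two rows, and every emp_mobile field holds exactly one number.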
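Second normal form (2NF)
A table is said to be in 2NF if both the following conditions hold:
o The table is in 1NF.
o No non-prime attribute is dependent on a proper subset of any candidate key of the table.
An attribute that is not part of any candidate key is known as a non-prime attribute.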
Example: Suppose a school wants to store the data of teachers and the subjects they teach. They create a table that looks like this. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher.
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Candidate keys: {teacher_id, subject}
Non-prime attribute: teacher_age
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the candidate key. This violates the rule for 2NF, which says “no non-prime attribute is dependent on the proper subset of any candidate key of the table”.
To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
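A decomposition is only useful if the original table can be rebuilt from the parts. The sketch below checks this for the teacher example by projecting onto the two smaller schemas and joining them back on teacher_id.

```python
# Original rows: (teacher_id, subject, teacher_age).
teacher = {(111, "Maths", 38), (111, "Physics", 38), (222, "Biology", 38),
           (333, "Physics", 40), (333, "Chemistry", 40)}

# Decompose by projecting onto the two smaller schemas.
teacher_details = {(tid, age) for tid, _, age in teacher}
teacher_subject = {(tid, subj) for tid, subj, _ in teacher}

# Joining on teacher_id must give back exactly the original rows.
rejoined = {(tid, subj, age)
            for tid, subj in teacher_subject
            for tid2, age in teacher_details if tid == tid2}
print(len(teacher_details), rejoined == teacher)
```

Each age is now stored once per teacher (three rows instead of five), and the join reproduces the original table exactly, so no information is lost.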
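Third normal form (3NF)
A table is said to be in 3NF if both the following conditions hold:
o The table is in 2NF.
o No non-prime attribute is transitively dependent on any super key.
An attribute that is not part of any candidate key is known as a non-prime attribute.
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and for each functional dependency X -> Y at least one of the following conditions holds:
o X is a super key of the table.
o Y is a prime attribute of the table.
An attribute that is a part of one of the candidate keys is known as a prime attribute.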
Example: Suppose a company wants to store the complete address of each employee. They create a table named employee_details that looks like this:
Here, emp_state, emp_city & emp_district are dependent on emp_zip, and emp_zip is dependent on emp_id. That makes the non-prime attributes (emp_state, emp_city & emp_district) transitively dependent on the super key (emp_id). This violates the rule of 3NF.
To make this table comply with 3NF we have to break it into two tables to remove the transitive dependency:
employee table:
employee_zip table:
Boyce-Codd normal form (BCNF)
BCNF is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every functional dependency X -> Y, X is a super key of the table.
Example: Suppose there is a company wherein employees work in more than one department.
They store the data like this:
The table is not in BCNF because neither emp_id nor emp_dept alone is a key.
To make the table comply with BCNF we can break it into three tables like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_dept_mapping table:
emp_id emp_dept
1001 Stores
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF as in both the functional dependencies left side part is a key.
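Whether the left side of a dependency is a super key can be checked mechanically by computing an attribute closure. The sketch below does this for the undecomposed table, assuming its attributes are exactly the five that appear in the functional dependencies; the helper names closure and is_superkey are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered and adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"emp_id"}, {"emp_nationality"}),
       ({"emp_dept"}, {"dept_type", "dept_no_of_emp"})]
all_attrs = {"emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"}

def is_superkey(attrs):
    return closure(attrs, fds) == all_attrs

print(is_superkey({"emp_id"}), is_superkey({"emp_dept"}),
      is_superkey({"emp_id", "emp_dept"}))
```

Neither emp_id nor emp_dept alone determines all attributes, so both dependencies violate BCNF in the single table; only the pair {emp_id, emp_dept} is a key, which is why it becomes the key of the mapping table.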
A transaction may be an entire program, a piece of a program or a single command (such as the SQL commands INSERT or UPDATE), and it may engage in any number of operations on the database.
Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500
from A's account to B's account. This very simple and small transaction involves several low-
level tasks.
A’s Account
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B’s Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
Now that we understand what a transaction is, we should understand the problems associated with it.
The main problem that can happen during a transaction is that it can fail before finishing all the operations in the set. This can happen due to a power failure, a system crash, etc. This is a serious problem that can leave the database in an inconsistent state. Assume that the transaction fails after the debit from A's account has been written (see the example above): the amount would then be deducted from A's account, but B would not receive it.
Commit: If all the operations in a transaction are completed successfully then commit those
changes to the database permanently.
Transaction Properties
A transaction is a very small unit of a program and it may contain several low-level tasks. A
transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability
− commonly known as ACID properties − in order to ensure accuracy, completeness, and data
integrity.
1. Atomicity : This property states that a transaction must be treated as an atomic unit, that is,
either all of its operations are executed or none. There must be no state in a database where a
transaction is left partially completed. States should be defined either before the execution of the
transaction or after the execution/abortion/failure of the transaction. Atomicity is also known as
the ‘All or nothing rule’.
Abort: If a transaction aborts then all the changes made are not visible.
Commit: If a transaction commits then all the changes made are visible.
Example: Let's assume that a transaction T consists of two parts, T1 and T2. Account A holds Rs 600 and account B holds Rs 300. T transfers Rs 100 from account A to account B.
T1              T2
Read(A)         Read(B)
A := A - 100    B := B + 100
Write(A)        Write(B)
If the transaction T fails after the completion of transaction T1 but before completion of
transaction T2, then the amount will be deducted from A but not added to B. This shows the
inconsistent database state. In order to ensure correctness of database state, the transaction must
be executed in entirety.
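The all-or-nothing behaviour can be demonstrated with sqlite3 by wrapping both updates in one transaction and rolling back on failure. This is a minimal sketch: the table name account, the helper functions and the simulated crash are assumptions made for illustration.

```python
import sqlite3

# isolation_level=None: the sketch issues BEGIN/COMMIT/ROLLBACK itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 600), ("B", 300)])

def transfer(amount, fail_midway=False):
    """Move amount from A to B; undo everything if any step fails."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'",
                     (amount,))
        if fail_midway:
            raise RuntimeError("simulated crash between T1 and T2")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'",
                     (amount,))
        conn.execute("COMMIT")
        return True
    except (RuntimeError, sqlite3.Error):
        conn.execute("ROLLBACK")   # the debit from A is undone as well
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

failed = transfer(100, fail_midway=True)
after_failure = balances()
succeeded = transfer(100)
after_success = balances()
print(failed, after_failure, succeeded, after_success)
```

After the simulated failure both balances are unchanged; after the successful run A holds 500 and B holds 400, and the total stays at 900 in every case.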
2. Consistency: The database must remain in a consistent state after any transaction. If the
database was in a consistent state before the execution of a transaction, it must remain consistent
after the execution of the transaction as well. The transaction is used to transform the database
from one consistent state to another consistent state. The execution of a transaction will leave a
database in either its prior stable state or a new stable state. The integrity constraints are
maintained so that the database is consistent before and after the transaction.
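For example: The total amount must be the same before and after the transaction. In the transfer above, A + B is 600 + 300 = 900 before the transaction and 500 + 400 = 900 after it.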
3. Isolation: Isolation states that concurrently executing transactions must not interfere with one another; the result must be the same as if the transactions had been executed one after the other. The isolation property means that the data used during the execution of a transaction cannot be used by a second transaction until the first one is completed. This property isolates transactions from one another. In other words, if a transaction T1 is being executed and is using the data item X, that data item cannot be accessed by any other transaction (T2 ... Tn) until T1 ends.
4. Durability : This property ensures that once the transaction has completed execution, the
updates and modifications to the database are stored in and written to disk. The database should
be durable enough to hold all its latest updates even if the system fails or restarts. If a transaction
updates a chunk of data in a database and commits, then the database will hold the modified data.
If a transaction commits but the system fails before the data could be written on to the disk, then
that data will be updated once the system springs back into action.
These updates now become permanent and are stored in non-volatile memory. The effects of the
transaction, thus, are never lost.
States of Transactions
1. Active
2. Partially committed
3. Failed
4. Aborted
5. Committed
State: Description
Active state: A transaction goes into the active state immediately after it starts execution, where it can issue READ and WRITE operations. A transaction may be aborted when it detects an error during execution from which it cannot recover, for example a transaction trying to debit a loan amount from an employee's insufficient gross salary. A transaction may also be aborted before it has been committed, due to system failure or other circumstances beyond its control.
Partially committed: When the transaction has executed its last operation, it moves to the partially committed state. At this point, recovery protocols need to ensure that a system failure will not result in an inability to record the changes of the transaction permanently. Once this check is successful, the transaction is said to have reached its commit point and enters the committed state.
Failed state:
o If any of the checks made by the database recovery system fails, the transaction is said to be in the failed state.
o For example, in a total-marks calculation, if the database is not able to fire a query to fetch the marks, the transaction will fail to execute.
Aborted:
o If any of the checks fail and the transaction has reached the failed state, the database recovery system makes sure that the database is in its previous consistent state. If it is not, the recovery system aborts or rolls back the transaction to bring the database into a consistent state.
Single User vs Multi User Systems
One criterion for classifying a database system is the number of users that can connect to the system concurrently.
Single user: only one user can use the system at a time.
Multi user: many users can use the system at the same time.
Transactions, Read and Write Operations, and DBMS Buffers
What is a Transaction?
A transaction includes one or more database access operations. The database operations can be embedded within an application or can be specified in a high-level query language. The boundaries of a transaction are specified by Begin transaction and End transaction statements. If the database operations in a transaction do not update the database, it is called a “read-only transaction”.
Failure Classification
To find where a problem has occurred, we classify failures into the following categories:
1. Transaction failure
2. System crash
3. Disk failure
1. Transaction failure
o A transaction failure occurs when a transaction fails to execute or reaches a point from which it cannot go any further. If a few transactions or processes are affected, this is called a transaction failure.
2. System crash
o A system failure can occur due to a power failure or another hardware or software failure. Example: an operating system error.
3. Disk failure
o Disk failures were a common problem in the early days of technology evolution, when hard-disk drives and other storage drives failed frequently.
o A disk failure occurs due to the formation of bad sectors, a disk head crash, unreachability of the disk, or any other failure that destroys all or part of the disk storage.
Concurrency Control
Reasons for allowing concurrency: improved throughput of transactions, better utilization of system resources, and reduced waiting time of transactions.
Concurrency control is the management procedure required for controlling the concurrent execution of the operations that take place on a database.
Before looking at concurrency control, we should understand concurrent execution.
In a database transaction, the two main operations are READ and WRITE. These operations need to be managed during the concurrent execution of transactions, because if they are interleaved without control, the data may become inconsistent. The following problems can occur with the concurrent execution of operations:
Lost update problem (W-W conflict)
The problem occurs when two different database transactions perform read/write operations on the same database items in an interleaved manner (i.e., concurrent execution), which makes the values of the items incorrect and hence makes the database inconsistent.
For example:
Consider two transactions TX and TY that are performed on the same account A, where the balance of account A is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
o At time t2, transaction TX deducts $50 from account A that becomes $250 (only deducted
and not updated/write).
o Alternately, at time t3, transaction TY reads the value of account A that will be $300 only
because TX didn't update the value yet.
o At time t4, transaction TY adds $100 to account A that becomes $400 (only added but not
updated/write).
o At time t6, transaction TX writes the value of account A that will be updated as $250 only,
as TY didn't update the value yet.
o Similarly, at time t7, transaction TY writes the value of account A as computed at time t4, i.e., $400. This means the value written by TX is lost, i.e., $250 is lost.
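The interleaving above can be replayed step by step in Python. In this sketch each transaction works on a private local copy of the balance; the variable names are illustrative.

```python
# Replay of the interleaving above; each transaction works on a private copy.
account_a = 300

tx_local = account_a        # t1: TX reads A (300)
tx_local -= 50              # t2: TX deducts 50 locally (250), nothing written yet
ty_local = account_a        # t3: TY reads A, still 300
ty_local += 100             # t4: TY adds 100 locally (400), nothing written yet
account_a = tx_local        # t6: TX writes 250
account_a = ty_local        # t7: TY writes 400, overwriting TX's update

interleaved_result = account_a
serial_result = 300 - 50 + 100   # what running TX and TY one after the other gives
print(interleaved_result, serial_result)
```

The interleaved schedule ends with $400, while either serial order would end with $350, so the $50 deduction made by TX has been lost.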
The dirty read problem occurs when one transaction updates an item of the database and then fails, and before the update is rolled back, the updated item is read by another transaction. This creates a write-read (W-R) conflict between the two transactions.
For example: