Dbms Unit3 - Nep

Uploaded by

Sam
Copyright
© © All Rights Reserved

Unit 3

Functional Dependency
A functional dependency (FD) is a constraint that describes how one set of
attributes in a relation determines another set of attributes in a Database
Management System (DBMS). Functional dependencies help to maintain the quality of
data in the database and play a vital role in telling a good database design from a
bad one.
Formally, a functional dependency is a constraint between two sets of attributes in
a relation. It says that if two tuples have the same values for attributes
A1, A2, ..., An, then those two tuples must also have the same values for
attributes B1, B2, ..., Bn.
A functional dependency is represented by an arrow sign (→), that is, X → Y, which
is read as "X functionally determines Y". The attributes on the left-hand side
determine the values of the attributes on the right-hand side.
Example:

Employee

Employee number   Employee Name   Salary   City
1                 Dana            50000    San Francisco
2                 Francis         38000    London
3                 Andrew          25000    Tokyo
In this example, if we know the value of Employee number, we can obtain
Employee Name, City, Salary, etc. Because of this, we can say that City, Employee
Name, and Salary are functionally dependent on Employee number.
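
The definition above can be checked mechanically on a table instance. The following is a minimal sketch in Python, assuming each row is stored as a dictionary; the function name fd_holds and the column names (emp_no, name, salary, city) are illustrative choices, not part of the original notes. Note that a check like this can only show that an FD holds for the rows at hand. It cannot prove that the FD is a rule of the design.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# The Employee table from the example (column names are hypothetical).
employees = [
    {"emp_no": 1, "name": "Dana", "salary": 50000, "city": "San Francisco"},
    {"emp_no": 2, "name": "Francis", "salary": 38000, "city": "London"},
    {"emp_no": 3, "name": "Andrew", "salary": 25000, "city": "Tokyo"},
]

print(fd_holds(employees, ["emp_no"], ["name", "salary", "city"]))  # True
```

If a second row with emp_no 1 but a different name were added, the same call would return False, because two tuples would agree on the left-hand side and disagree on the right-hand side.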
Key Terms       Description

Axiom           Axioms are a set of inference rules used to infer all the
                functional dependencies on a relational database.

Decomposition   A rule which suggests that if a table appears to contain two
                entities that are determined by the same primary key, you
                should consider breaking it up into two different tables.

Dependent       It is displayed on the right side of the functional
                dependency diagram.

Determinant     It is displayed on the left side of the functional
                dependency diagram.

Union           A rule which suggests that if two tables are separate and the
                primary key (PK) is the same, you should consider putting
                them together.

Rules of Functional Dependencies


Below are the three most important rules for functional dependencies in a database:

• Reflexive rule: If X is a set of attributes and Y is a subset of X, then
X → Y holds.
• Augmentation rule: When X → Y holds and C is a set of attributes, then
XC → YC also holds. That is, adding the same attributes to both sides does
not change the basic dependency.
• Transitivity rule: This rule is very similar to the transitive rule in
algebra: if X → Y holds and Y → Z holds, then X → Z also holds. X → Y is
read as "X functionally determines Y".

Types of Functional Dependencies in DBMS


There are mainly four types of Functional Dependency in DBMS. Following are
the types of Functional Dependencies in DBMS:

• Multivalued Dependency
• Trivial Functional Dependency
• Non-Trivial Functional Dependency
• Transitive Dependency
Multivalued Dependency in DBMS
A multivalued dependency occurs when there are multiple independent multivalued
attributes in a single table. It is a full constraint between two sets of
attributes in a relation, and it requires that certain tuples be present in the
relation. Consider the following example.

Example:
Car_model Maf_year Color
H001 2017 Metallic
H001 2017 Green
H005 2018 Metallic
H005 2018 Blue
H010 2015 Metallic
H033 2012 Gray

In this example, maf_year and color are independent of each other but dependent
on car_model. These two columns are therefore said to be multivalued dependent
on car_model.

This dependence can be represented like this (a multivalued dependency is usually
written with a double arrow):

car_model →→ maf_year

car_model →→ color

Trivial Functional Dependency in DBMS


A functional dependency is called trivial if the set of attributes on its
right-hand side is included in the set of attributes on its left-hand side.

So, X -> Y is a trivial functional dependency if Y is a subset of X. Let's
understand this with an example.
For example:

Emp_id Emp_name
AS555 Harry
AS811 George
AS999 Kevin
Consider this table with two columns, Emp_id and Emp_name.

{Emp_id, Emp_name} -> Emp_id is a trivial functional dependency, as Emp_id is
a subset of {Emp_id, Emp_name}.

Non Trivial Functional Dependency in DBMS


A non-trivial functional dependency occurs when A -> B holds and B is not a subset
of A. In a relation, if attribute set B is not a subset of attribute set A, then
A -> B is considered a non-trivial dependency.

Company CEO Age


Microsoft Satya Nadella 51
Google Sundar Pichai 46
Apple Tim Cook 57

Example:

{Company} -> {CEO} (if we know the company, we know the CEO's name)

But CEO is not a subset of Company, and hence it is a non-trivial functional
dependency.
Transitive Dependency in DBMS
A transitive dependency is a type of functional dependency that is formed
indirectly by two other functional dependencies. Let's understand this with the
following example.

Example:

Company CEO Age


Microsoft Satya Nadella 51
Google Sundar Pichai 46
Alibaba Jack Ma 54

{Company} -> {CEO} (if we know the company, we know its CEO's name)

{CEO} -> {Age} (if we know the CEO, we know the CEO's age)

Therefore, according to the rule of transitive dependency:

{Company} -> {Age} should hold. That makes sense, because if we know the
company name, we can find out its CEO's age.

Note: You need to remember that a transitive dependency can only occur in a
relation of three or more attributes.

Armstrong's Axioms

If F is a set of functional dependencies, then the closure of F, denoted as F+, is
the set of all functional dependencies logically implied by F. Armstrong's Axioms
are a set of rules that, when applied repeatedly, generate the closure of a set of
functional dependencies.
• Reflexive rule − If alpha is a set of attributes and beta is a subset of alpha,
then alpha → beta holds.
• Augmentation rule − If a → b holds and y is a set of attributes, then ay → by
also holds. That is, adding the same attributes to both sides of a dependency
does not change the basic dependency.
• Transitivity rule − Same as the transitive rule in algebra: if a → b holds and
b → c holds, then a → c also holds. a → b is read as "a functionally determines
b".
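
In practice, F+ is rarely listed in full. A common shortcut is to compute the closure of a set of attributes: X → Y is in F+ exactly when Y is contained in the attribute closure of X. The sketch below is one way to do this in Python, assuming each FD is stored as a pair of sets. The function name closure is an illustrative choice, and the FDs are taken from the Company/CEO/Age example earlier in these notes.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side being a set of attributes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know every attribute on the left, we also know
            # the right (this step combines the three axioms).
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [({"Company"}, {"CEO"}), ({"CEO"}, {"Age"})]
print(sorted(closure({"Company"}, fds)))  # ['Age', 'CEO', 'Company']
```

Because Age appears in the closure of {Company}, the dependency {Company} → {Age} is implied by the two given dependencies, which is the transitivity rule at work.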

Trivial Functional Dependency

• Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset
of X, then it is called a trivial FD. Trivial FDs always hold.
• Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is
called a non-trivial FD.
• Completely non-trivial − If an FD X → Y holds, where X ∩ Y = Φ, it
is said to be a completely non-trivial FD.
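
The three cases can be told apart with two set operations. This is a small sketch, assuming the two sides of the FD are given as Python sets; the function name classify_fd is hypothetical.

```python
def classify_fd(lhs, rhs):
    """Classify the FD lhs -> rhs by comparing its two attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                  # Y is a subset of X
    if lhs & rhs:
        return "non-trivial"              # Y is not a subset, but overlaps X
    return "completely non-trivial"       # X and Y share no attribute


print(classify_fd({"Emp_id", "Emp_name"}, {"Emp_id"}))   # trivial
print(classify_fd({"Company"}, {"CEO"}))                 # completely non-trivial
```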

Normalization
If a database design is not perfect, it may contain anomalies, which are like a bad
dream for any database administrator. Managing a database with anomalies is next
to impossible.
Anomalies
• Update anomalies − If data items are scattered and are not linked to each
other properly, then it could lead to strange situations. For example, when we
try to update one data item having its copies scattered over several places, a
few instances get updated properly while a few others are left with old values.
Such instances leave the database in an inconsistent state.
• Deletion anomalies − We tried to delete a record, but parts of it were left
undeleted because, unknown to us, the data was also saved somewhere else.
• Insert anomalies − We tried to insert data into a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database
to a consistent state.

First Normal Form

For a table to be in the First Normal Form, it should follow the following 4 rules:

1. It should only have single(atomic) valued attributes/columns.


2. Values stored in a column should be of the same domain
3. All the columns in a table should have unique names.
4. And the order in which data is stored, does not matter.
If tables in a database are not even in the 1st Normal Form, it is considered
as bad database design.

Rules for First Normal Form

The first normal form expects you to follow a few simple rules while designing your
database, and they are:

Rule 1: Single Valued Attributes


Each column of your table should be single valued which means they should
not contain multiple values. We will explain this with help of an example later,
let's see the other rules for now.

Rule 2: Attribute Domain should not change


This is more of a "Common Sense" rule. In each column the values stored
must be of the same kind or type.

For example: If you have a column dob to save date of births of a set of
people, then you cannot or you must not save 'names' of some of them in that
column along with 'date of birth' of others in that column. It should hold only
'date of birth' for all the records/rows.

Rule 3: Unique name for Attributes/Columns


This rule expects that each column in a table should have a unique name. This
is to avoid confusion at the time of retrieving data or performing any other
operation on the stored data.

If two or more columns have the same name, the DBMS cannot tell them
apart.

Rule 4: Order doesn't matter


This rule says that the order in which you store the data in your table doesn't
matter.
Example

Although all the rules are self-explanatory, let's still take an example where we
create a table to store student data, which will have each student's roll no., their
name and the names of the subjects they have opted for.

Here is our table, with some sample data added to it.

roll_no name subject

101 Akon OS, CN

103 Ckon Java

102 Bkon C, C++

Our table already satisfies 3 of the 4 rules: all our column names are
unique, we have stored data in the order we wanted to, and we have not inter-mixed
different types of data in columns.

But out of the 3 different students in our table, 2 have opted for more than 1 subject,
and we have stored the subject names in a single column. As per the 1st Normal
Form, each column must contain an atomic value.

How to solve this Problem?


It's very simple, because all we have to do is break the values into atomic values.
Here is our updated table and it now satisfies the First Normal Form.

roll_no name subject


101 Akon OS
101 Akon CN
103 Ckon Java
102 Bkon C
102 Bkon C++
By doing so, although a few values are repeated, the values in
the subject column are now atomic for each record/row.

With the First Normal Form, data redundancy increases, as the same data will be
repeated in some columns across multiple rows, but each row as a whole will be unique.
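
The step from the first table to the second can be sketched in a few lines of Python. This is only an illustration, assuming the rows are dictionaries and that the multi-valued column uses a comma as separator; the function name to_first_normal_form is hypothetical.

```python
def to_first_normal_form(rows, column, sep=","):
    """Split a multi-valued column so that every row holds one atomic value."""
    result = []
    for row in rows:
        for value in row[column].split(sep):
            new_row = dict(row)              # copy the other columns as they are
            new_row[column] = value.strip()  # keep exactly one value per row
            result.append(new_row)
    return result


students = [
    {"roll_no": 101, "name": "Akon", "subject": "OS, CN"},
    {"roll_no": 103, "name": "Ckon", "subject": "Java"},
    {"roll_no": 102, "name": "Bkon", "subject": "C, C++"},
]

for row in to_first_normal_form(students, "subject"):
    print(row["roll_no"], row["name"], row["subject"])
```

The loop prints five rows, one per student and subject pair, matching the updated table above.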

Second Normal Form

For a table to be in the Second Normal Form, it must satisfy two conditions:

1. The table should be in the First Normal Form.


2. There should be no Partial Dependency.

What is Partial Dependency? First let's understand what is Dependency in a table?

What is Dependency?

Let's take an example of a Student table with columns student_id, name,
reg_no (registration number), branch and address (student's home address).

student_id name reg_no branch address


In this table, student_id is the primary key and will be unique for every row, hence
we can use student_id to fetch any row of data from this table.

Even in a case where student names are the same, if we know the student_id we can
easily fetch the correct record.

student_id name reg_no branch address


10 Akon 07-WY CSE Kerala
11 Akon 08-WY IT Gujarat

Hence we can say that a Primary Key for a table is the column or group of
columns (composite key) which can uniquely identify each record in the table.

I can ask for the branch name of the student with student_id 10, and I can get it.
Similarly, if I ask for the name of the student with student_id 10 or 11, I will get
it. So all I need is student_id, and every other column depends on it, or can be
fetched using it.
This is Dependency, and we also call it Functional Dependency.

What is Partial Dependency?

Now that we know what dependency is, we are in a better state to understand what
partial dependency is.

For a simple table like Student, a single column like student_id can uniquely identify
all the records in a table.

But this is not true all the time. So now let's extend our example to see how more than
1 column together can act as a primary key.

Let's create another table for Subject, which will have subject_id and
subject_name fields, and subject_id will be the primary key.

subject_id subject_name
1 Java
2 C++
3 Php
Now we have a Student table with student information and another
table Subject for storing subject information.

Let's create another table Score, to store the marks obtained by students in the
respective subjects. We will also be saving name of the teacher who teaches that
subject along with marks.

score_id student_id subject_id marks teacher


1 10 1 70 Java Teacher
2 10 2 75 C++ Teacher
3 11 1 80 Java Teacher
In the Score table we are saving the student_id to know which student's marks
these are and the subject_id to know which subject the marks are for.

Together, student_id + subject_id form a Candidate Key for this table, which can
be the Primary Key.

How can this combination be a primary key?


See, if I ask you to get me marks of student with student_id 10, can you get it from
this table? No, because you don't know for which subject. And if I give
you subject_id, you would not know for which student. Hence we need student_id +
subject_id to uniquely identify any row.

But where is Partial Dependency?

Now if you look at the Score table, we have a column named teacher which is only
dependent on the subject: for Java it's Java Teacher, for C++ it's C++ Teacher,
and so on.

As we just discussed, the primary key for this table is a composition of two
columns, student_id & subject_id, but the teacher's name only depends on the
subject, hence on subject_id, and has nothing to do with student_id.

This is Partial Dependency, where an attribute in a table depends on only a part of
the primary key and not on the whole key.

How to remove Partial Dependency?

There can be many different solutions for this, but our objective is to remove
the teacher's name from the Score table.

The simplest solution is to remove the teacher column from the Score table and add it
to the Subject table. Hence, the Subject table will become:

subject_id subject_name teacher


1 Java Java Teacher
2 C++ C++ Teacher
3 Php Php Teacher
And our Score table is now in the second normal form, with no partial dependency.

score_id student_id subject_id marks


1 10 1 70
2 10 2 75
3 11 1 80
Quick Recap

1. For a table to be in the Second Normal form, it should be in the First Normal
form and it should not have Partial Dependency.
2. Partial Dependency exists, when for a composite primary key, any attribute in
the table depends only on a part of the primary key and not on the complete
primary key.
3. To remove Partial dependency, we can divide the table, remove the attribute
which is causing partial dependency, and move it to some other table where it
fits in well.
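
The recap can be turned into a small check. The sketch below, in Python, looks for non-prime attributes that are determined by a proper subset of a composite key. It makes several assumptions that are not in the notes: the FDs are written down by hand as pairs of sets, the table has a single candidate key, score_id is left out for simplicity, and the function names closure and partial_dependencies are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """List (part_of_key, attribute) pairs that violate 2NF."""
    key = set(key)
    non_prime = set(attributes) - key      # assumes key is the only candidate key
    found = []
    for size in range(1, len(key)):        # proper, non-empty subsets of the key
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_prime):
                found.append((set(subset), attr))
    return found


# Score table before the fix: teacher depends on subject_id alone.
score_attrs = {"student_id", "subject_id", "marks", "teacher"}
score_fds = [
    ({"student_id", "subject_id"}, {"marks"}),
    ({"subject_id"}, {"teacher"}),
]
print(partial_dependencies({"student_id", "subject_id"}, score_attrs, score_fds))
# [({'subject_id'}, 'teacher')]
```

Once teacher is moved to the Subject table and its FD is dropped from the Score table, the same call returns an empty list.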
Third Normal Form (3NF)
Third Normal Form is an upgrade to Second Normal Form. When a table is in the
Second Normal Form and has no transitive dependency, then it is in the Third
Normal Form.

Let's use the same example, where we have 3 tables: Student, Subject and Score.

Student Table
student_id name reg_no branch address
10 Akon 07-WY CSE Kerala
11 Akon 08-WY IT Gujarat
12 Bkon 09-WY IT Rajasthan
Subject Table
subject_id subject_name teacher
1 Java Java Teacher
2 C++ C++ Teacher
3 Php Php Teacher
Score Table
score_id student_id subject_id marks
1 10 1 70
2 10 2 75
3 11 1 80
In the Score table, we need to store some more information, which is the exam name
and total marks, so let's add 2 more columns to the Score table.
score_id student_id subject_id marks exam_name total_marks

Requirements for Third Normal Form

For a table to be in the third normal form,

1. It should be in the Second Normal form.


2. And it should not have Transitive Dependency.

What is Transitive Dependency?

With exam_name and total_marks added to our Score table, it saves more data now.
The primary key for our Score table is a composite key, which means it's made up of
two attributes or columns: student_id + subject_id.

Our new column exam_name depends on both student and subject. For example, a
mechanical engineering student will have a Workshop exam but a computer science
student won't. And for some subjects you have Practical exams and for some you
don't. So we can say that exam_name is dependent on
both student_id and subject_id.

And what about our second new column total_marks? Does it depend on our Score
table's primary key?

Well, the column total_marks depends on exam_name, as the total score changes
with the exam type. For example, practicals carry fewer marks while theory exams
carry more marks.

But exam_name is just another column in the Score table. It is not a primary key or
even a part of the primary key, and total_marks depends on it.

This is Transitive Dependency: a non-prime attribute depends on other non-prime
attributes rather than depending upon the prime attributes or primary key.
How to remove Transitive Dependency?

Again the solution is very simple. Take out the columns exam_name and
total_marks from the Score table, put them in an Exam table and use
the exam_id wherever required.

Score Table: In 3rd Normal Form

score_id student_id subject_id marks exam_id

The new Exam table

exam_id exam_name total_marks


1 Workshop 200
2 Mains 70
3 Practicals 30

Advantage of removing Transitive Dependency

The advantage of removing transitive dependency is,

• Amount of data duplication is reduced.


• Data integrity achieved.
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form or BCNF is an extension to the third normal form, and
is also known as 3.5 Normal Form.

Rules for BCNF

For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following
two conditions:

1. It should be in the Third Normal Form.


2. And, for any non-trivial dependency A → B, A should be a super key.
The second point sounds a bit tricky, right? In simple words, it means that for a
dependency A → B, A cannot be a non-prime attribute if B is a prime
attribute.

Example

Below we have a college enrolment table with columns student_id, subject
and professor.

student_id subject professor


101 Java P.Java
101 C++ P.Cpp
102 Java P.Java2
103 C# P.Chash
104 Java P.Java
As you can see, we have also added some sample data to the table.

In the table above:

• One student can enrol for multiple subjects. For example, student
with student_id 101, has opted for subjects - Java & C++
• For each subject, a professor is assigned to the student.
• And, there can be multiple professors teaching one subject like we have for
Java.
What do you think should be the Primary Key?

Well, in the table above, student_id and subject together form the primary key,
because using student_id and subject we can find all the columns of the table.

One more important point to note here is that one professor teaches only one subject,
but one subject may have two different professors.

Hence, there is a dependency between subject and professor here,
where subject depends on the professor name.
This table satisfies the 1st Normal form because all the values are atomic, column
names are unique and all the values stored in a particular column are of same
domain.

This table also satisfies the 2nd Normal Form as there is no Partial Dependency.

And, there is no Transitive Dependency, hence the table also satisfies the 3rd
Normal Form.

But this table is not in Boyce-Codd Normal Form.

Why is this table not in BCNF?

In the table above, student_id and subject form the primary key, which
means the subject column is a prime attribute.
But there is one more dependency: professor → subject.

And while subject is a prime attribute, professor is a non-prime attribute, which
is not allowed by BCNF.
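
The super key condition from the rules can be tested by computing attribute closures. The following Python sketch assumes the FDs of the enrolment table are written down by hand as pairs of sets; the function names closure and bcnf_violations are illustrative and not part of any library.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    attributes = set(attributes)
    violations = []
    for lhs, rhs in fds:
        is_trivial = rhs <= lhs
        is_superkey = attributes <= closure(lhs, fds)
        if not is_trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


enrolment_attrs = {"student_id", "subject", "professor"}
enrolment_fds = [
    ({"student_id", "subject"}, {"professor"}),  # the primary key determines professor
    ({"professor"}, {"subject"}),                # one professor teaches one subject
]
print(bcnf_violations(enrolment_attrs, enrolment_fds))
# [({'professor'}, {'subject'})]
```

Only professor → subject is reported: professor alone does not determine student_id, so it is not a super key, which is exactly the violation described above.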

How to satisfy BCNF?

To make this relation(table) satisfy BCNF, we will decompose this table into two
tables, student table and professor table.

Below we have the structure for both the tables.

Student Table

student_id p_id
101 1
101 2
and so on...
And, Professor Table

p_id professor subject


1 P.Java Java
2 P.Cpp C++
and so on...
And now, this relation satisfies Boyce-Codd Normal Form.

A more Generic Explanation

In the picture below, we have tried to explain BCNF in terms of relations.

Introduction to Transaction Processing

Single-user system: At most one user at a time can use the system.
Multi-user system: Many users can access the system concurrently.
Concurrency can be provided through:
1. Interleaved Processing –
In this, the concurrent execution of processes is interleaved on a single
CPU. The transactions are interleaved, meaning the second transaction
is started before the first one has finished, and execution can switch
back and forth between the transactions. It can also switch between more
than two transactions. Without proper control, this can cause
inconsistency in the system.
2. Parallel Processing –
It is defined as processing in which a large task is divided into various
smaller tasks, and the smaller tasks execute concurrently on several nodes.
In this, the processes are executed concurrently on multiple CPUs.
Transaction:
It is a logical unit of database processing that includes one or more access
operations (read: retrieval; write: insert or update). It is a unit of program
execution that accesses and, if required, updates various data items.
A transaction is a set of operations that can either be embedded within an
application program or be specified interactively via a high-level language
such as SQL.
Example –
Consider a transaction that involves transferring $1700 from a customer’s savings
account to a customer’s checking account. This transaction involves two separate
operations: debiting the savings account by $1700 and crediting the checking
account by $1700. If one operation succeeds but the other doesn’t, the books of
the bank will not balance.
Transaction boundaries:
A transaction is delimited by begin and end boundaries. An application program may
contain several transactions, each separated from the others by its begin-transaction
and end-transaction markers.
Granularity of data:
• The size of data item is called its granularity.
• A data item can be an individual field (attribute), value of some record,
a record, or a whole disk block.
• Concepts are independent of granularity
Advantages:
• Batch processing or real-time processing available.
• Reduction in processing time, lead time and order cycle time.
• Reduction in inventory, personnel and ordering costs.
• Increase in productivity and customer satisfaction
Disadvantages:
• High setup costs.
• Lack of standard formats.
• Hardware and software incompatibility.

Transaction
o A transaction is a set of logically related operations. It contains a group of
tasks.
o A transaction is an action or series of actions. It is performed by a single user
to perform operations for accessing the contents of the database.
Example: Suppose a bank employee transfers Rs 800 from X's account to Y's
account. This small transaction contains several low-level tasks:

X's Account

1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)

Y's Account

1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)

Operations of Transaction:

Following are the main operations of transaction:

Read(X): Read operation is used to read the value of X from the database and stores
it in a buffer in main memory.

Write(X): Write operation is used to write the value back to the database from the
buffer.

Let's take the example of a debit transaction on an account, which consists of the
following operations:

1. R(X);
2. X = X - 500;
3. W(X);

Let's assume the value of X before starting of the transaction is 4000.

o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will
contain 3500.
o The third operation will write the buffer's value to the database. So X's final
value will be 3500.

But it is possible that, because of a hardware, software or power failure, etc.,
the transaction fails before finishing all the operations in the set.

For example: If the debit transaction above fails after executing
operation 2, then X's value will remain 4000 in the database, which is not
acceptable to the bank.

To solve this problem, we have two important operations:

Commit: It is used to save the work done permanently.

Rollback: It is used to undo the work done.
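
To see Commit and Rollback in action, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table name account, the helper names debit and balance, and the fail flag that simulates a crash are all assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('X', 4000)")
conn.commit()


def balance(conn, name):
    """Read the current balance of one account."""
    row = conn.execute(
        "SELECT balance FROM account WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def debit(conn, name, amount, fail=False):
    """Debit an account; commit on success, roll back on failure."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, name),
        )
        if fail:
            # Simulated failure after the update but before the commit.
            raise RuntimeError("simulated failure")
        conn.commit()      # Commit: save the work done permanently
    except RuntimeError:
        conn.rollback()    # Rollback: undo the work done


debit(conn, "X", 500, fail=True)
print(balance(conn, "X"))  # 4000, the failed debit was undone
debit(conn, "X", 500)
print(balance(conn, "X"))  # 3500, the successful debit was committed
```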

DBMS Concurrency Control

Concurrency Control is the management procedure that is required for controlling
concurrent execution of the operations that take place on a database.

But before knowing about concurrency control, we should know about concurrent
execution.

Concurrent Execution in DBMS

o In a multi-user system, multiple users can access and use the same database
at one time, which is known as the concurrent execution of the database. It
means that the same database is used simultaneously by different users on a
multi-user system.
o While working with database transactions, multiple users often need to use
the database to perform different operations at the same time, and in that
case the database is executed concurrently.
o The simultaneous execution should be done in an interleaved manner, and no
operation should affect the other executing operations, so that the
consistency of the database is maintained. Concurrent execution of
transaction operations therefore raises several challenging problems that
need to be solved.

Problems with Concurrent Execution

In a database transaction, the two main operations are READ and WRITE. These
two operations need to be managed during the concurrent execution of transactions,
because if they are interleaved without control, the data may become inconsistent.
The following problems can occur with the concurrent execution of operations:

Problem 1: Lost Update Problems (W - W Conflict)

The problem occurs when two different database transactions perform
read/write operations on the same database items in an interleaved manner (i.e.,
concurrent execution), which makes the values of the items incorrect and hence
makes the database inconsistent.

For example:

Consider two transactions TX and TY that are performed on the same account A,
where the balance of account A is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
o At time t2, transaction TX deducts $50 from account A, which becomes $250
(only deducted, not yet written).
o Alternately, at time t3, transaction TY reads the value of account A, which is
still $300 because TX has not written its value yet.
o At time t4, transaction TY adds $100 to account A, which becomes $400 (only
added, not yet written).
o At time t6, transaction TX writes the value of account A, which is updated
to $250, as TY has not written its value yet.
o Similarly, at time t7, transaction TY writes the value of account A as computed
at time t4, i.e., $400. It means the value written by TX is
lost, i.e., $250 is lost.

Hence the data becomes incorrect, and the database is left in an inconsistent state.
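
The schedule above can be replayed step by step in plain Python. This is only a simulation under simple assumptions: the database is a dictionary, each transaction keeps its own private buffer, and the variable names are illustrative. It follows the time steps exactly as listed.

```python
database = {"A": 300}

tx_buffer = database["A"]   # t1: TX reads 300
tx_buffer -= 50             # t2: TX computes 250 in its own buffer
ty_buffer = database["A"]   # t3: TY reads 300, TX has not written yet
ty_buffer += 100            # t4: TY computes 400 in its own buffer
database["A"] = tx_buffer   # t6: TX writes 250
database["A"] = ty_buffer   # t7: TY writes 400 and overwrites TX's update

# Running the two transactions one after the other would give this instead.
serial_result = 300 - 50 + 100

print(database["A"])        # 400
print(serial_result)        # 350
```

The final balance of $400 differs from the $350 that any serial order of TX and TY would produce, which is what makes the update of TX a lost update.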


Dirty Read Problems (W-R Conflict)

The dirty read problem occurs when one transaction updates an item of the database
and then fails, and before the update is rolled back, the updated database item is
read by another transaction. This creates a Write-Read conflict between the two
transactions.

For example:

Consider two transactions TX and TY performing read/write operations on
account A, where the available balance in account A is $300:

o At time t1, transaction TX reads the value of account A, i.e., $300.
o At time t2, transaction TX adds $50 to account A, which becomes $350.
o At time t3, transaction TX writes the updated value in account A, i.e., $350.
o Then at time t4, transaction TY reads account A, which is read as $350.
o Then at time t5, transaction TX rolls back due to a server problem, and the value
changes back to $300 (as initially).
o But transaction TY still holds $350 for account A, as if that value had been
committed. This is a dirty read, and the situation is therefore known as the
Dirty Read Problem.
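
The same kind of replay works for the dirty read. The sketch below assumes a system without any isolation, modelled by two dictionaries: committed holds the last committed state and working holds uncommitted changes that other transactions can see. Both names are illustrative.

```python
committed = {"A": 300}
working = dict(committed)    # uncommitted state, visible to everyone

tx_value = working["A"]      # t1: TX reads 300
tx_value += 50               # t2: TX computes 350
working["A"] = tx_value      # t3: TX writes 350 but does not commit
ty_seen = working["A"]       # t4: TY reads the uncommitted 350
working = dict(committed)    # t5: TX rolls back, A is 300 again

print(ty_seen)               # 350
print(working["A"])          # 300
```

TY is left holding $350, a value that was never committed and no longer exists in the database.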
Unrepeatable Read Problem (R-W Conflict)

Also known as the Inconsistent Retrievals Problem, it occurs when, within one
transaction, two different values are read for the same database item.

For example:

Consider two transactions, TX and TY, performing read/write operations on
account A, which has an available balance of $300:

o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to
the available balance, and then it becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A,
and that will be read as $400.
o It means that within the same transaction TX, two different values of
account A are read, i.e., $300 initially, and $400 after the update made by
transaction TY. It is an unrepeatable read and is therefore known as the
Unrepeatable Read Problem.
Thus, in order to maintain consistency in the database and avoid such problems that
take place in concurrent execution, management is needed, and that is where the
concept of Concurrency Control comes into play.

Concurrency Control

Concurrency Control is the working concept that is required for controlling and
managing the concurrent execution of database operations and thus avoiding the
inconsistencies in the database. Thus, for maintaining the concurrency of the
database, we have the concurrency control protocols.

Concurrency Control Protocols

The concurrency control protocols ensure the atomicity, consistency, isolation,
durability and serializability of the concurrent execution of database
transactions. These protocols are categorized as:

o Lock Based Concurrency Control Protocol
o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol

We will understand and discuss each protocol one by one in our next sections.

Failure Classification In DBMS

A failure in DBMS is categorized into the following three classifications to ease the
process of determining the exact nature of the problem:
1. Transaction failure
2. Disk failure
3. System crash
Types of failure

Transaction failure
A failure in transaction occurs when the transaction is unable to execute or if it
reaches a point from where the execution cannot proceed any further. If two or
more operations are hampered, then a transaction failure takes place.

Some of the reasons for transaction failure include:


Logical flaws
When the execution of a transaction fails due to internal errors or mistakes in the
code, then that is a logical flaw or error.

Syntax flaw
A syntax flaw (more commonly called a system error) takes place when the DBMS
itself terminates an ongoing transaction because the system is unable to execute
it. For instance, the current transaction is aborted during a deadlock or when
resources are unavailable.

Disk failure
Disk failure occurs when hard disks or other storage drives fail.

A few reasons for disk failure are the formation of bad sectors, a head crash,
unreachability of the disk, and destruction of the disk.

System crash
Power cuts and failures in software or hardware cause a system crash, for
instance, an OS crash.

The non-volatile memory does not get affected in a system crash (Fail-stop
assumption).

States of Transactions
A transaction in a database can be in one of the following states –
• Active − In this state, the transaction is being executed. This is the initial state of
every transaction.
• Partially Committed − When a transaction executes its final operation, it is said to
be in a partially committed state.
• Failed − A transaction is said to be in a failed state if any of the checks made by
the database recovery system fails. A failed transaction can no longer proceed
further.
• Aborted − If any of the checks fails and the transaction has reached a failed state,
then the recovery manager rolls back all its write operations on the database to bring
the database back to its original state where it was prior to the execution of the
transaction. Transactions in this state are called aborted. The database recovery
module can select one of the two operations after a transaction aborts –
- Re-start the transaction
- Kill the transaction
• Committed − If a transaction executes all its operations successfully, it is said to
be committed. All its effects are now permanently established on the database
system.
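
The states above form a small state machine. The sketch below encodes the allowed moves as a Python dictionary. The transitions follow the descriptions in the list, plus one assumption taken from the usual textbook state diagram: a partially committed transaction can still fail. The names ALLOWED and advance are illustrative.

```python
# Allowed moves between transaction states (assumed from the usual diagram).
ALLOWED = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),   # terminal state
    "aborted": set(),     # terminal state (a restart begins a new transaction)
}


def advance(state, new_state):
    """Move a transaction to new_state, rejecting moves the diagram forbids."""
    if new_state not in ALLOWED[state]:
        raise ValueError(f"cannot go from {state!r} to {new_state!r}")
    return new_state


state = "active"
state = advance(state, "partially committed")
state = advance(state, "committed")
print(state)  # committed
```

Trying advance("committed", "active") raises a ValueError, since a committed transaction cannot return to the active state.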
Transaction Properties: ACID Properties
A transaction is a very small unit of a program and it may contain several low level
tasks. A transaction in a database system must maintain Atomicity, Consistency,
Isolation, and Durability − commonly known as ACID properties − in order to ensure
accuracy, completeness, and data integrity.
• Atomicity − This property states that a transaction must be treated as an atomic
unit, that is, either all of its operations are executed or none. There must be no state
in a database where a transaction is left partially completed. States should be defined
either before the execution of the transaction or after the execution/abortion/failure
of the transaction.
• Consistency − The database must remain in a consistent state after any transaction.
No transaction should have any adverse effect on the data residing in the database.
If the database was in a consistent state before the execution of a transaction, it must
remain consistent after the execution of the transaction as well.
• Durability − The database should be durable enough to hold all its latest updates
even if the system fails or restarts. If a transaction updates a chunk of data in a
database and commits, then the database will hold the modified data. If a transaction
commits but the system fails before the data could be written on to the disk, then that
data will be updated once the system springs back into action.
• Isolation − In a database system where more than one transaction is being
executed simultaneously and in parallel, the property of isolation states that each
transaction will be carried out and executed as if it were the only transaction in the
system. No transaction will affect the execution of any other transaction.
