Advanced Database Ch2 and 3


QUERY PROCESSING
AND OPTIMIZATION
Chapter 2

Overview of query processing


■ Query processing: the activities involved in parsing, validating,
optimizing and executing a query.
■ The aims of query processing are to transform a query written in a
high-level language, typically SQL, into a correct and efficient
execution strategy expressed in a low-level language.
■ Query Optimization: the activity of choosing an efficient execution
strategy for processing a query.
■ There are many equivalent transformations of the same high-level
query.


■ The aim of query optimization is to choose the one that
minimizes resource usage.
■ Estimating the resource usage and running time of each option
depends on database statistics, which the optimizer uses to
evaluate properly the different options that are available.
■ The statistics cover information about relations, attributes,
and indexes.

■ Find all managers who work at a London branch.

SELECT * FROM Staff s, Branch b WHERE s.branchNo = b.branchNo AND (s.position = ‘Manager’ AND b.city = ‘London’);


■ Three equivalent relational algebra queries corresponding to this SQL
statement are:
1. σ(position=‘Manager’) ∧ (city=‘London’) ∧ (Staff.branchNo=Branch.branchNo)(Staff × Branch)
2. σ(position=‘Manager’) ∧ (city=‘London’)(Staff ⋈Staff.branchNo=Branch.branchNo Branch)
3. (σposition=‘Manager’(Staff)) ⋈Staff.branchNo=Branch.branchNo (σcity=‘London’(Branch))

Assignment
Assume there are:
– 1000 tuples in Staff;
– 50 tuples in Branch;
– 50 Managers;
– 5 London branches.
■ Find the total cost to access disk for each query.
■ Which one is the best query?


First query
σ(position=‘Manager’) ∧ (city=‘London’) ∧ (Staff.branchNo=Branch.branchNo)(Staff × Branch)
■ It calculates the Cartesian product of Staff and Branch, which requires
(1000 + 50) disk accesses to read the relations, and creates a relation
with (1000*50) tuples.
■ We then have to write this intermediate relation to disk and read each of
its tuples back again to test them against the selection predicate, at a
cost of another 2*(1000*50) disk accesses, giving a total cost of:
(1000+50) + 2*(1000*50) = 101050 disk accesses

The second query
σ(position=‘Manager’) ∧ (city=‘London’)(Staff ⋈Staff.branchNo=Branch.branchNo Branch)
■ The second query joins Staff and Branch on the branch number branchNo, which again
requires (1000 + 50) disk accesses to read each of the relations.
■ We know that the join of the two relations has 1000 tuples, one for each member of staff (a
member of staff can only work at one branch).
■ Writing this intermediate relation to disk and reading it back for the Selection operation
costs another 2*1000 disk accesses, giving a total cost of:
2*1000 + (1000 + 50) = 3050 disk accesses


The third query
(σposition=‘Manager’(Staff)) ⋈Staff.branchNo=Branch.branchNo (σcity=‘London’(Branch))

■ The final query first reads each Staff tuple to determine the Manager tuples, which requires 1000
disk accesses, and produces a relation with 50 tuples that is written back to disk.
■ The second Selection operation reads each Branch tuple to determine the London branches,
which requires 50 disk accesses, and produces a relation with 5 tuples (the London branches)
that is also written back to disk.
■ The final operation is the join of the reduced Staff and Branch relations, which requires (50 + 5)
disk accesses to read them, giving a total cost of:

1000 + 2*50 + 5 + (50 + 5) = 1160 disk accesses


■ Clearly the third option is the best of the three queries.
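The arithmetic above can be checked with a short Python sketch (an illustration only; it follows the slides' convention of counting one disk access per tuple read or written):

# Disk-access costs for the three equivalent queries, using the
# statistics given in the assignment above.
n_staff, n_branch = 1000, 50        # tuples in Staff and Branch
n_managers, n_london = 50, 5        # tuples satisfying each Selection

# Query 1: read both relations, write the Cartesian product to disk,
# then read it back to apply the Selection.
cost1 = (n_staff + n_branch) + 2 * (n_staff * n_branch)

# Query 2: read both relations, write the 1000-tuple join result,
# then read it back to apply the Selection.
cost2 = (n_staff + n_branch) + 2 * n_staff

# Query 3: read Staff, write the 50 Manager tuples, read Branch,
# write the 5 London tuples, then read both reduced relations to join.
cost3 = n_staff + n_managers + n_branch + n_london + (n_managers + n_london)

print(cost1, cost2, cost3)          # 101050 3050 1160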

■ Query processing can be divided into four main phases:


1. Decomposition (consisting of parsing and validation),
2. Optimization,
3. Code generation
4. Execution.


Query Decomposition
■ It is the first phase of query processing.
■ Used to transform a high-level query into a relational
algebra query.
■ Aids to check whether the query is syntactically correct
(it conforms to the grammar of the query language) and
semantically correct (it is meaningful with respect to the
database schema).

Query Decomposition Cont…

■ The typical stages of query decomposition are:
1. Analysis
2. Normalization
3. Semantic analysis
4. Simplification
5. Query restructuring


Analysis
■ The query is lexically and syntactically analyzed using the
techniques of programming language compilers.
■ In addition, this stage verifies that the relations and attributes
specified in the query are defined in the system catalog.
■ It also verifies that any operations applied to database objects are
appropriate for the object type.
■ For example, consider the following query:
SELECT staffNumber FROM Staff WHERE position >10;
 This query would be rejected on two grounds:
1. In the select list, the attribute staffNumber is not defined for the Staff
relation (should be staffNo).
2. In the WHERE clause, the comparison “>10” is incompatible with the
data type of position, which is a variable-length character string.

Normalization
■ Converts the query into a normalized form that can
be more easily manipulated.
■ The predicate (in SQL, the WHERE condition), which
may be arbitrarily complex, can be converted into
one of two forms by applying a few transformation
rules.


■ Conjunctive normal form: A sequence of conjuncts that are
connected with the ∧ (AND) operator. Each conjunct contains
one or more terms connected by the ∨ (OR) operator.
 For example:
(position = ‘Manager’ ∨ salary > 20000) ∧ branchNo = ‘B003’
A conjunctive selection contains only those tuples that satisfy
all conjuncts.
■ Disjunctive normal form: A sequence of disjuncts that are
connected with the ∨ (OR) operator. Each disjunct contains one
or more terms connected by the ∧ (AND) operator. For
example, we could rewrite the previous conjunctive normal
form as:
(position = ‘Manager’ ∧ branchNo = ‘B003’) ∨ (salary > 20000
∧ branchNo = ‘B003’)
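For illustration, both normal forms can be reproduced with sympy's boolean helpers (a sketch; the propositional symbols standing for the atomic predicates are our own naming):

# m: position = 'Manager',  s: salary > 20000,  b: branchNo = 'B003'
from sympy import symbols
from sympy.logic.boolalg import to_cnf, to_dnf

m, s, b = symbols('m s b')
predicate = (m | s) & b             # (Manager OR salary>20000) AND B003

print(to_cnf(predicate))            # b & (m | s): conjunctive normal form
print(to_dnf(predicate))            # (b & m) | (b & s): disjunctive normal form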

Semantic analysis
■ The objective of semantic analysis is to reject normalized queries that
are incorrectly formulated or contradictory.
■ A query is incorrectly formulated if components do not contribute to
the generation of the result, which may happen if some join
specifications are missing.
■ A query is contradictory if its predicate cannot be satisfied by any
tuple.
■ For example, the predicate (position = ‘Manager’ ∧ position =
‘Assistant’) on the Staff relation is contradictory, as a member of staff
cannot be both a Manager and an Assistant simultaneously.
■ Thus the predicate ((position = ‘Manager’ ∧ position = ‘Assistant’) ∨
salary > 20000) could be simplified to (salary > 20000) by interpreting
the contradictory clause as the Boolean value FALSE.
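A minimal sketch of this check for conjunctive equality predicates (a hypothetical helper, not a DBMS API): the predicate is contradictory as soon as the same attribute is equated to two different constants.

def is_contradictory(conjuncts):
    # conjuncts: list of (attribute, value) equality terms joined by AND
    bound = {}
    for attr, value in conjuncts:
        if attr in bound and bound[attr] != value:
            return True             # e.g. position='Manager' AND position='Assistant'
        bound[attr] = value
    return False

print(is_contradictory([('position', 'Manager'), ('position', 'Assistant')]))  # True
print(is_contradictory([('position', 'Manager'), ('branchNo', 'B003')]))       # False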


Simplification
■ The objectives of the simplification stage are:
 Detect redundant qualifications
 Eliminate common subexpressions
 Transform the query to a semantically equivalent but more
easily and efficiently computed form.
■ An initial optimization is to apply the well-known idempotency rules
(operations that achieve the same result no matter how many times
they are executed) of Boolean algebra, such as:
p ∧ p = p          p ∨ p = p
p ∧ true = p       p ∨ false = p
p ∧ false = false  p ∨ true = true
p ∧ ¬p = false     p ∨ ¬p = true

■ For example, consider the following view definition and query on
the view:
CREATE VIEW Staff3 AS SELECT staffNo, fName, lName, salary, branchNo
FROM Staff WHERE branchNo = ‘B003’;
SELECT * FROM Staff3 WHERE (branchNo = ‘B003’ AND salary > 20000);

■ During view resolution this query will become:


SELECT staffNo, fName, lName, salary, branchNo FROM Staff WHERE
(branchNo = ‘B003’ AND salary > 20000) AND branchNo = ‘B003’;
■ And the WHERE condition reduces to (branchNo = ‘B003’ AND salary
>20000).
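The same reduction follows mechanically from the idempotency rule p ∧ p = p; sympy can demonstrate it (illustrative only):

from sympy import symbols
from sympy.logic.boolalg import simplify_logic

b, s = symbols('b s')               # b: branchNo='B003', s: salary>20000
resolved = (b & s) & b              # the predicate produced by view resolution

print(simplify_logic(resolved))     # b & s, i.e. (branchNo='B003' AND salary>20000)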


■ Integrity constraints may also be applied to help simplify


queries. For example, consider the following integrity constraint,
which ensures that only Managers have a salary greater than
£20,000:
■ CREATE ASSERTION OnlyManagerSalaryHigh CHECK ((position
<> ‘Manager’ AND salary < 20000) OR (position = ‘Manager’
AND salary > 20000));
■ and consider the effect on the query:
■ SELECT * FROM Staff WHERE (position = ‘Manager’ AND salary
< 15000);
■ The predicate in the WHERE clause, which searches for a
manager with a salary below £15,000, is now a contradiction of
the integrity constraint so there can be no tuples that satisfy
this predicate.

Query restructuring
■ In the final stage of query decomposition, the query is
restructured to provide a more efficient implementation.
■ Two techniques (approaches) are used for query optimization:
Heuristic rules that order the operations in a query
Comparing different strategies based on their relative costs


Dynamic Vs Static Optimization


■ The first three phases of query processing can be carried out in two
ways.
1. Dynamic optimization: One option is to dynamically carry out
decomposition and optimization every time the query is run.
 Dynamic query optimization takes place at execution time.
 Database access strategy is defined when the program is executed.
 Therefore, access strategy is dynamically determined by the DBMS at
run time, using the most up-to-date information about the database.

Dynamic Vs Static Optimization Cont…


2. Static optimization: Static query optimization takes place at compilation
time.
 In other words, the best optimization strategy is selected when the
query is compiled by the DBMS.
 This approach is common when SQL statements are embedded in
procedural programming languages such as C# or Visual Basic .


Dynamic optimization
■ Advantage
 All information required to select an optimum strategy is up to date.
■ Disadvantage
 The performance of the query is affected because the query has to be
parsed, validated, and optimized every time before it can be executed.
 To reduce this runtime overhead, the optimizer may examine fewer
execution strategies, which may have the effect of selecting a less than
optimum strategy.

Static optimization
■ Advantage
 The query is parsed, validated, and optimized only once.
 Runtime overhead is removed.
 There may be more time available to evaluate a larger
number of execution strategies, thereby increasing
the chances of finding a more optimum strategy.
■ Disadvantage
 The execution strategy that is chosen as being
optimal when the query is compiled may no longer
be optimal when the query is run.


Heuristic Approach to Query Optimization

■ The heuristic approach to query optimization uses
transformation rules to convert one relational algebra
expression into an equivalent form that is known to be
more efficient.
■ For example, there is a transformation rule allowing the
order of Join and Selection operations to be changed so
that Selection can be performed first.

Transformation Rules for the Relational
Algebra Operations
1. Conjunctive Selection operations can cascade into individual Selection
operations (and vice versa):
σp∧q∧r(R) = σp(σq(σr(R)))


2. Commutativity of Selection operations:
σp(σq(R)) = σq(σp(R))

3. In a sequence of Projection operations, only the last in the
sequence is required:
ΠL(ΠM(… ΠN(R) …)) = ΠL(R)


4. Commutativity of Selection and Projection.

If the predicate p involves only the attributes in the projection
list, then the Selection and Projection operations commute:
ΠA1,…,Am(σp(R)) = σp(ΠA1,…,Am(R)), where p involves only attributes in {A1, …, Am}

5. Commutativity of Theta join (and Cartesian product):
R ⋈p S = S ⋈p R        R × S = S × R

■ As the Equijoin and Natural join are special cases of the Theta
join, this rule also applies to these Join operations. For
example, using the Equijoin of Staff and Branch:
Staff ⋈Staff.branchNo=Branch.branchNo Branch = Branch ⋈Staff.branchNo=Branch.branchNo Staff


6. Commutativity of Selection and Theta join (or Cartesian product).

If the selection predicate involves only attributes of one of the
relations being joined, then the Selection and Join (or Cartesian
product) operations commute:
σp(R ⋈r S) = σp(R) ⋈r S, where p involves only attributes of R

■ Alternatively, if the selection predicate is a conjunctive predicate
of the form (p ∧ q), where p involves only attributes of R, and q
involves only attributes of S, then the Selection and Theta join
operations commute as:
σp∧q(R ⋈r S) = σp(R) ⋈r σq(S)


■ For example:
σposition=‘Manager’ ∧ city=‘London’(Staff ⋈Staff.branchNo=Branch.branchNo Branch) =
(σposition=‘Manager’(Staff)) ⋈Staff.branchNo=Branch.branchNo (σcity=‘London’(Branch))
7. Commutativity of Projection and Theta join (or Cartesian
product).

If the projection list is of the form L = L1 ∪ L2, where L1 involves
only attributes of R, and L2 involves only attributes of S, then
provided the join condition contains only attributes of L, the
Projection and Theta join operations commute as:
ΠL1∪L2(R ⋈r S) = (ΠL1(R)) ⋈r (ΠL2(S))


■ For example:
Πposition,city,branchNo(Staff ⋈Staff.branchNo=Branch.branchNo Branch) =
(Πposition,branchNo(Staff)) ⋈Staff.branchNo=Branch.branchNo (Πcity,branchNo(Branch))

■ If the join condition contains additional attributes not in L, say
attributes M = M1 ∪ M2, where M1 involves only attributes of R and
M2 involves only attributes of S, then a final Projection operation is
required:
ΠL1∪L2(R ⋈r S) = ΠL1∪L2((ΠL1∪M1(R)) ⋈r (ΠL2∪M2(S)))


8. Commutativity of Union and Intersection (but not Set
difference):
R ∪ S = S ∪ R        R ∩ S = S ∩ R

9. Commutativity of Selection and set operations (Union,
Intersection, and Set difference):
σp(R ∪ S) = σp(R) ∪ σp(S)
σp(R ∩ S) = σp(R) ∩ σp(S)
σp(R − S) = σp(R) − σp(S)

10. Commutativity of Projection and Union:
ΠL(R ∪ S) = ΠL(R) ∪ ΠL(S)


11. Associativity of Theta join (and Cartesian product).

Cartesian product and Natural join are always associative:
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
(R × S) × T = R × (S × T)

If the join condition q involves only attributes from the relations
S and T, then Theta join is associative in the following manner:
(R ⋈p S) ⋈q T = R ⋈p (S ⋈q T)

12. Associativity of Union and Intersection (but not Set difference):
(R ∪ S) ∪ T = R ∪ (S ∪ T)
(R ∩ S) ∩ T = R ∩ (S ∩ T)


Heuristic Processing Strategies

■ Many DBMSs use heuristics to determine strategies for query
processing.
1. Perform Selection operations as early as possible.
Selection reduces the cardinality of the relation and reduces the
subsequent processing of that relation.
2. Combine the Cartesian product with a subsequent Selection
operation whose predicate represents a join condition into a Join
operation.
We have already noted that we can rewrite a Selection with a Theta
join predicate and a Cartesian product operation as a Theta join
operation:
σp(R × S) = R ⋈p S

3. Use associativity of binary operations to rearrange leaf nodes so that
the leaf nodes with the most restrictive Selection operations are
executed first.
Again, our general rule of thumb is to perform as much reduction as
possible before performing binary operations. Thus, if we have two
consecutive Join operations to perform, such as (R ⋈ S) ⋈ T, the
associativity rule lets us join the smaller (more restricted) relations
first.


4. Perform Projection operations as early as possible.


Again, Projection reduces the cardinality of the relation and reduces
the subsequent processing of that relation.
5. Compute common expressions once.
If a common expression appears more than once in the tree, and the
result it produces is not too large, store the result after it has been
computed once and then reuse it when required.

Materialization
■ Materialization: the results of intermediate relational algebra
operations are written temporarily to disk.
■ The output of one operation is stored in a temporary relation for
processing by the next operation.
■ Evaluate one operation at a time, starting at the lowest level.
■ Use intermediate results materialized into temporary relations to
evaluate next-level operations.
■ Materialized evaluation is always applicable.
■ The cost of writing results to disk and reading them back can be quite
high. Our cost formulas for operations ignore the cost of writing
results to disk, so:
Overall cost = sum of costs of individual operations +
cost of writing intermediate results to disk


Pipelining
■ Pipelining (stream-based or on-the-fly processing) passes the
results of one operation to another operation without
creating a temporary relation to hold the intermediate result.
■ Clearly, if we can use pipelining we can save on the cost of creating
temporary relations and reading the results back in again.
■ Several operations are evaluated simultaneously, passing the results
of one operation on to the next.
■ Much cheaper than materialization: no need to store a temporary
relation to disk.
■ Pipelining may not always be possible – e.g., sort, hash-join.

■ For example, consider a Selection such as
σposition=‘Manager’ ∧ salary>20000(Staff).
If we assume that there is an index on the salary attribute, then we
could use the cascade of selection rule to transform this Selection into
two operations:
σposition=‘Manager’(σsalary>20000(Staff))

■ We can use the index to efficiently process the first Selection on salary,
store the result in a temporary relation and then apply the second
Selection to the temporary relation.
■ The pipeline approach dispenses with the temporary relation and
instead applies the second Selection to each tuple in the result of the
first Selection as it is produced, and adds any qualifying tuples from
the second operation to the result.
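Python generators give a compact sketch of pipelined evaluation: each Selection pulls one tuple at a time from the operator below it, so no temporary relation is ever created (illustrative only; real executors use an open-next-close iterator protocol, and the sample tuples are our own):

def select(rows, predicate):
    for row in rows:                # pull one tuple at a time from the producer
        if predicate(row):
            yield row               # pass qualifying tuples straight downstream

staff = [{'staffNo': 'SG5',  'position': 'Manager',   'salary': 24000},
         {'staffNo': 'SG14', 'position': 'Assistant', 'salary':  9000}]

# Cascaded Selections: salary > 20000 first, then position = 'Manager'.
pipeline = select(select(iter(staff), lambda r: r['salary'] > 20000),
                  lambda r: r['position'] == 'Manager')

print(list(pipeline))               # tuples flow through both operators unbuffered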


Cost Estimation for the Relational Algebra
Operations
■ There are many different ways of implementing relational algebra
operations.
■ The aim of query optimization is to choose the most efficient one. To
do this, it uses formulae that estimate the cost of a number of
options and selects the one with the lowest cost.
■ The dominant cost in query processing is usually that of disk
accesses, which are slow compared with memory accesses, so cost
estimates concentrate on the number of disk accesses.

Database Statistics
■ The success of estimating the size and cost of intermediate relational
algebra operations depends on the amount and currency of the
statistical information that the DBMS holds.
■ The DBMS should hold the following types of information in its system
catalog.
■ For each base relation R:
■ nTuples(R) – the number of tuples (records) in relation R (that is, its
cardinality).
■ bFactor(R) – the blocking factor of R (that is, the number of tuples of R
that fit into one block).


■ nBlocks(R) – the number of blocks required to store R. If the tuples of R
are stored physically together, then:
nBlocks(R) = [nTuples(R)/bFactor(R)]
■ nDistinctA(R) – the number of distinct values that appear for attribute A
in relation R.
■ minA(R), maxA(R) – the minimum and maximum possible values for the
attribute A in relation R.
■ SCA(R) – the selection cardinality of attribute A in relation R. This is the
average number of tuples that satisfy an equality condition on attribute
A.
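A sketch of how these per-relation statistics might be held, with the derived values used by the cost formulas below (the class and the sample numbers are our own illustration; only the statistic names follow the slides):

import math
from dataclasses import dataclass

@dataclass
class RelationStats:
    n_tuples: int                   # nTuples(R)
    b_factor: int                   # bFactor(R): tuples per block
    n_distinct: dict                # nDistinctA(R), keyed by attribute A

    def n_blocks(self):             # nBlocks(R) = [nTuples(R)/bFactor(R)]
        return math.ceil(self.n_tuples / self.b_factor)

    def sc(self, attr):             # SCA(R), assuming uniformly distributed values
        return math.ceil(self.n_tuples / self.n_distinct[attr])

staff = RelationStats(n_tuples=1000, b_factor=20, n_distinct={'position': 10})
print(staff.n_blocks(), staff.sc('position'))   # 50 blocks, ~100 tuples per position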

■ Keeping these statistics current can be problematic. If the DBMS


updates the statistics every time a tuple is inserted, updated, or
deleted, at peak times this would have a significant impact on
performance.
■ An alternative, and generally preferable, approach is for the DBMS to
update the statistics on a periodic basis, such as nightly or whenever
the system is idle.
■ Another approach taken by some systems is to make it the users’
responsibility to indicate that the statistics should be updated.


Selection Operation
■ The Selection operation in the relational algebra works on a single
relation R and defines a relation S containing only those tuples of R
that satisfy the specified predicate.
■ There are a number of different implementations for the Selection
operation, depending on the structure of the file in which the relation
is stored.
■ The main strategies that we consider are:
 Linear search ( unordered file, no index)
 Binary search ( ordered file, no index)
 Equality on hash key;
 Equality condition on primary key;
 Inequality condition on primary key;

Summary of estimated I/O cost of
strategies for selection operations

Linear search (unordered file, no index): [nBlocks(R)/2] for equality on a
key attribute; nBlocks(R) otherwise
Binary search (ordered file, no index): [log2(nBlocks(R))] for equality on the
ordering attribute; [log2(nBlocks(R))] + [SCA(R)/bFactor(R)] – 1 otherwise
Equality on hash key: 1, assuming no overflow
Equality condition on primary key: nLevelsA(I) + 1
Inequality condition on primary key: nLevelsA(I) + [nBlocks(R)/2]
(where nLevelsA(I) is the number of levels of the index I on attribute A)


Estimating the cardinality of the Selection
operation
■ We first present estimates for the expected number of tuples and
the expected number of distinct values for an attribute in the
result relation S obtained from the Selection operation on R.
■ For an equality predicate of the form A = x, the expected number of
tuples in the result is nTuples(S) = SCA(R).

Linear search (unordered file, no index)

■ With this approach it may be necessary to scan each tuple in each
block to determine whether it satisfies the predicate.
■ This is sometimes referred to as a full table scan.
■ In the case of an equality condition on a key attribute, on average
only half the blocks would be searched before the specific tuple is
found, so the cost estimate is:
[nBlocks(R)/2]
■ For any other condition, the entire file may need to be searched, so
the more general cost estimate is: nBlocks(R)


Binary Search
■ If the predicate is of the form (A = x) and the file is ordered on
attribute A, which is also a key attribute of relation R, then the
cost estimate for the search is:
[log2(nBlocks(R))]
■ More generally, if A is not a key attribute, the matching tuples may
span several blocks, and the cost estimate becomes:
[log2(nBlocks(R))] + [SCA(R)/bFactor(R)] – 1
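Both estimates as functions (a sketch; the statistics passed in are assumed, not read from a real catalog):

import math

def linear_search_cost(n_blocks, equality_on_key=False):
    # Equality on a key: on average only half the blocks are scanned.
    return n_blocks // 2 if equality_on_key else n_blocks

def binary_search_cost(n_blocks, sc_a=1, b_factor=1):
    # Ordered file on A; when A is a key, SCA(R) = 1 and the second
    # term vanishes, leaving [log2(nBlocks(R))].
    return math.ceil(math.log2(n_blocks)) + math.ceil(sc_a / b_factor) - 1

print(linear_search_cost(50, equality_on_key=True))   # 25
print(binary_search_cost(50))                         # 6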


End of Chapter 2

Chapter 3
Transaction processing concepts
Transaction
• A transaction is a set of logically related operations.
It contains a group of tasks.
• A transaction is an action, or series of actions,
performed by a single user or application program to
access the contents of the database.

• Example: Assume an employee of CBE transfers ETB


800 from X's account to Y's account. This small
transaction contains several low-level tasks:
X's Account
Open_Account(X)
Old_Balance = X.balance
New_Balance = Old_Balance - 800
X.balance = New_Balance
Close_Account(X)

Y's Account
Open_Account(Y)
Old_Balance = Y.balance
New_Balance = Old_Balance + 800
Y.balance = New_Balance
Close_Account(Y)
Operations of Transaction
• Following are the main operations of a
transaction:
• Read(X): reads the value of X from the database
and stores it in a buffer in main memory.
• Write(X): writes the value of X back to the
database from the buffer.
• Let's take an example of a debit transaction on an account, which consists
of the following operations:
1. R(X);
2. X = X - 500;
3. W(X);
• Let's assume the value of X before the start of the transaction is 4000.
• The first operation reads X's value from the database and stores it in a buffer.
• The second operation decreases the value of X by 500, so the buffer will contain
3500.
• The third operation writes the buffer's value to the database, so X's final value
will be 3500.
• But because of a hardware, software, or power failure, etc., the
transaction may fail before all the operations in the set are
finished.
For example: If in the above transaction the debit
transaction fails after executing operation 2, then X's value
will remain 4000 in the database, which is not acceptable to
the bank.
To solve this problem, we have two important operations:
• Commit: It is used to save the work done permanently.
• Rollback: It is used to undo the work done.
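A sketch of the transfer with commit/rollback semantics: updates are staged in a buffer and written back only on commit (illustrative; Y's opening balance is an assumption, not from the slides):

database = {'X': 4000, 'Y': 1000}

def transfer(db, src, dst, amount):
    buffer = dict(db)               # read phase: work on buffered copies
    buffer[src] -= amount
    buffer[dst] += amount
    if buffer[src] < 0:
        return db                   # rollback: discard the buffer, db is unchanged
    db.update(buffer)               # commit: make the buffered changes permanent
    return db

print(transfer(database, 'X', 'Y', 800))   # {'X': 3200, 'Y': 1800}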
Transaction property
• A transaction has four properties, known as the ACID
properties.
• These are used to maintain consistency in a database, before
and after the transaction:
Atomicity
Consistency
Isolation
Durability
Atomicity
• It states that all operations of the transaction take place at once;
if not, the transaction is aborted.
• There is no midway, i.e., the transaction cannot occur partially.
Each transaction is treated as one unit and either run to
completion or is not executed at all.
• Atomicity involves the following two operations:
Abort: If a transaction aborts then all the changes made are not
visible.
Commit: If a transaction commits then all the changes made are
visible.
• Example: Let's assume the following transaction T consists of T1 and
T2. A consists of ETB 600 and B consists of ETB 300. Transfer ETB 50
from account A to account B.
T1           T2

Read(A)      Read(B)
A := A - 50  B := B + 50
Write(A)     Write(B)

• After completion of the transaction, A consists of ETB 550 and B consists
of ETB 350.
• If the transaction T fails after the completion of transaction T1 but before
completion of transaction T2, then the amount will be deducted from A
but not added to B. This shows the inconsistent database state. In order to
ensure correctness of database state, the transaction must be executed in
entirety.
Consistency
• The integrity constraints are maintained so that
the database is consistent before and after the
transaction.
• The execution of a transaction will leave the database
in either its prior stable state or a new stable state.
• The consistency property of a database states that
every transaction sees a consistent database
instance.
• A transaction is used to transform the database
from one consistent state to another consistent
state.
• For example: The total amount must be the same
before and after the transaction.
Total before T occurs = 600 + 300 = 900
Total after T occurs = 550 + 350 = 900
• Therefore, the database is consistent. In the
case when T1 is completed but T2 fails,
inconsistency will occur.
Isolation
•It shows that the data which is used at the time of
execution of a transaction cannot be used by the
second transaction until the first one is completed.
•In isolation, if the transaction T1 is being
executed and using the data item X, then that data
item can't be accessed by any other transaction T2
until the transaction T1 ends.
• The concurrency control subsystem of the DBMS
enforces the isolation property.
Durability
• The durability property states that once a transaction
completes, the changes it has made to the database are
permanent.
• They cannot be lost by the erroneous operation of a
faulty transaction or by a system failure. When a
transaction is completed, the database reaches a
state known as the consistent state.
• That consistent state cannot be lost, even in the
event of a system failure.
• The recovery subsystem of the DBMS is responsible
for ensuring the durability property.
States of Transaction
• In a database, a transaction can be in one of the following states.
Active state
• The active state is the first state of every transaction.
In this state, the transaction is being executed.
• For example: inserting, deleting, or updating a record
is done here. But the records are still not saved to
the database.
Partially committed
• In the partially committed state, a transaction has executed
its final operation, but the data is still not saved to the
database.
• In the total mark calculation example, a final display of
the total marks step is executed in this state.
Committed
• A transaction is said to be in a committed state if it
executes all its operations successfully. In this state,
all the effects are now permanently saved on the
database system.
Failed state
• If any of the checks made by the database recovery
system fail, then the transaction is said to be in the
failed state.
• In the example of total mark calculation, if the
database is not able to fire a query to fetch the marks,
then the transaction will fail to execute.
Aborted
• If any of the checks fail and the transaction has reached
a failed state, then the database recovery system will
make sure that the database is in its previous consistent
state. If not, it will abort or roll back the
transaction to bring the database into a consistent state.
• If the transaction fails in the middle of execution, then
all of its executed operations are rolled back to return
the database to its prior consistent state.
• After aborting the transaction, the database recovery
module will select one of the two operations:
• Re-start the transaction
• Kill the transaction
Transaction Scheduling
• A schedule is a sequence of operations from one or more transactions.
It is used to preserve the order of the operations within each of the
individual transactions.
1. Serial Schedule
• The serial schedule is a type of schedule where one transaction is
executed completely before starting another transaction. In the
serial schedule, when the first transaction completes its cycle,
then the next transaction is executed.
• For example: Suppose there are two transactions T1 and T2
which have some operations. If there is no interleaving of
operations, then there are the following two possible outcomes:
Execute all the operations of T1 followed by all the operations of T2.
Execute all the operations of T2 followed by all the operations of T1.
In figure (a), Schedule A shows the serial schedule where T1 is followed by
T2.
In figure (b), Schedule B shows the serial schedule where T2 is followed by
T1.
Schedule A and Schedule B are serial schedules.
2. Non-serial Schedule
• If interleaving of operations is allowed, then the
schedule is non-serial.
• It contains many possible orders in which the system
can execute the individual operations of the
transactions.
• In figures (c) and (d), Schedule C and Schedule D are
non-serial schedules; they have interleaving of
operations.
3. Serializable schedule
• The serializability of schedules is used to find non-serial
schedules that allow the transactions to execute
concurrently without interfering with one another.
• It identifies which schedules are correct when
executions of the transactions have interleaving of
their operations.
• A non-serial schedule is serializable if its result
is equal to the result of its transactions executed
serially.
Testing of Serializability
• A serialization graph is used to test the serializability of a schedule.
• Assume a schedule S. For S, we construct a graph known as a precedence graph.
• This graph is a pair G = (V, E), where V is a set of vertices and E
is a set of edges.
• The set of vertices contains all the transactions participating in the
schedule.
• The set of edges contains all edges Ti → Tj for which one of the
three conditions holds:
1. Create an edge Ti → Tj if Ti executes write(Q) before Tj executes read(Q).
2. Create an edge Ti → Tj if Ti executes read(Q) before Tj executes write(Q).
3. Create an edge Ti → Tj if Ti executes write(Q) before Tj executes write(Q).
 If a precedence graph contains an edge Ti → Tj, then
all the instructions of Ti must be executed before the first
conflicting instruction of Tj is executed.
 If the precedence graph for schedule S contains a cycle, then S
is non-serializable. If the precedence graph has no cycle, then
S is serializable.
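The three edge rules and the cycle test can be sketched in Python (illustrative; representing each operation as a (transaction, action, item) triple is our own convention):

def precedence_graph(schedule):
    edges = set()
    for i, (ti, act_i, item_i) in enumerate(schedule):
        for tj, act_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and 'W' in (act_i, act_j):
                edges.add((ti, tj))  # covers the W-R, R-W, and W-W rules
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def reachable(start, target, seen=()):
        return any(v == target or (v not in seen and reachable(v, target, seen + (v,)))
                   for v in graph.get(start, ()))
    return any(reachable(u, u) for u in graph)

s = [('T1', 'W', 'A'), ('T2', 'R', 'A'), ('T2', 'W', 'B'), ('T1', 'R', 'B')]
edges = precedence_graph(s)
print(edges)             # {('T1', 'T2'), ('T2', 'T1')}
print(has_cycle(edges))  # True: this schedule is non-serializable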
For example:
Explanation:
Read(A): In T1, no subsequent writes to A, so no new edges
Read(B): In T2, no subsequent writes to B, so no new edges
Read(C): In T3, no subsequent writes to C, so no new edges
Write(B): B is subsequently read by T3, so add edge T2 → T3
Write(C): C is subsequently read by T1, so add edge T3 → T1
Write(A): A is subsequently read by T2, so add edge T1 → T2
Write(A): In T2, no subsequent reads to A, so no new edges
Write(C): In T1, no subsequent reads to C, so no new edges
Write(B): In T3, no subsequent reads to B, so no new edges
The precedence graph for schedule S1 contains a cycle, so
Schedule S1 is non-serializable.
Example 2
Explanation:
Read(A): In T4,no subsequent writes to A, so no new edges
Read(C): In T4, no subsequent writes to C, so no new edges
Write(A): A is subsequently read by T5, so add edge T4 → T5
Read(B): In T5,no subsequent writes to B, so no new edges
Write(C): C is subsequently read by T6, so add edge T4 → T6
Write(B): B is subsequently read by T6, so add edge T5 → T6
Write(C): In T6, no subsequent reads to C, so no new edges
Write(A): In T5, no subsequent reads to A, so no new edges
Write(B): In T6, no subsequent reads to B, so no new edges
The precedence graph for schedule S2 contains no cycle, so Schedule S2
is serializable.
Conflict Serializable Schedule
• A schedule is called conflict serializable if,
after swapping of non-conflicting operations, it
can be transformed into a serial schedule.
• That is, the schedule is conflict serializable if it is
conflict equivalent to a serial schedule.
Conflicting Operations
Two operations are conflicting if all of the following
conditions hold:
1. Both belong to separate transactions.
2. They operate on the same data item.
3. At least one of them is a write operation.
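The three conditions amount to a simple predicate over two operations (same triple convention as the precedence-graph sketch above):

def conflicts(op1, op2):
    (t1, a1, x1), (t2, a2, x2) = op1, op2
    return t1 != t2 and x1 == x2 and 'W' in (a1, a2)

print(conflicts(('T1', 'W', 'A'), ('T2', 'R', 'A')))   # True: write-read on A
print(conflicts(('T1', 'R', 'A'), ('T2', 'R', 'A')))   # False: two reads never conflict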
Example:
• Two operations can be swapped only if the schedules S1 and S2 that
result are logically equal.
Here, S1 = S2, which means the operations are non-conflicting.
Here, S1 ≠ S2, which means the operations are conflicting.
Conflict Equivalent
Two schedules are conflict equivalent if one can be transformed into
the other by swapping non-conflicting operations. In the given example,
S2 is conflict equivalent to S1 (S1 can be converted to S2 by
swapping non-conflicting operations).

Two schedules are said to be conflict equivalent if and only if:

1. They contain the same set of transactions.
2. Each pair of conflicting operations is ordered in the same way.
Example:
• Schedule S2 is a serial schedule because, in it, all operations of T1 are
performed before starting any operation of T2. Schedule S1 can be
transformed into a serial schedule by swapping non-conflicting operations
of S1.
After swapping of non-conflicting operations, the schedule S1 becomes:

Hence, S1 is conflict serializable.


View Serializability
A schedule is view serializable if it is view equivalent to a serial
schedule.
If a schedule is conflict serializable, then it is also view serializable.
A schedule that is view serializable but not conflict serializable
contains blind writes.
View Equivalent
• Two schedules S1 and S2 are said to be view equivalent if they
satisfy the following conditions:
1. Initial Read
• The initial read of both schedules must be the same. Suppose two
schedules S1 and S2. If in schedule S1 a transaction T1 is reading the
data item A, then in S2 transaction T1 should also read A.

The two schedules above are view equivalent because the initial read
operation in S1 is done by T1 and in S2 it is also done by T1.
2. Updated Read
If in schedule S1, Ti is reading A which is updated by Tj, then in S2
also, Ti should read A which is updated by Tj.

The two schedules above are not view equivalent because, in S1, T3 is
reading A updated by T2, while in S2, T3 is reading A updated by T1.
3. Final Write
• The final write must be the same in both schedules. If in schedule S1 a
transaction T1 updates A last, then in S2 the final write operation on A
should also be done by T1.

The two schedules above are view equivalent because the final write
operation in S1 is done by T3 and in S2 the final write operation is also
done by T3.
END OF CHAPTER 3
