Data Management - Exam of 18/02/2010 Solutions

The document provides solutions to 5 problems related to data management. Problem 1 asks about view-serializability of a schedule S and provides a solution showing S is not view-serializable or conflict-serializable. Problem 2 asks if schedule S can be made ACR and the solution shows this is not possible.

Data Management – exam of 18/02/2010

Solutions

Problem 1 Consider the following schedule

S = r1(A) r3(C) w3(B) r2(A) w1(B) w1(A) w2(A) r1(C) w3(C) r3(A) r2(D).

1. Tell whether S is view-serializable or not. Explain the answer in detail.


2. If the answer to the previous question was yes, then show a serial schedule that is
view-equivalent to S, otherwise exhibit a minimal set (i.e., a set of minimal cardinality) of
actions of S to remove so that the remaining schedule is not only view-serializable, but also
conflict-serializable.

Solution

1. S is not view-serializable. Indeed, every serial schedule S′ on the set of transactions
constituting S will have either transaction t1 before t2, or t2 before t1. In the former case (t1
before t2), the pair ⟨w1(A), r2(A)⟩ is in the READS-FROM relation associated to S′, while
it is not in the READS-FROM relation associated to S. In the latter case (t2 before t1),
the pair ⟨w2(A), r1(A)⟩ is in the READS-FROM relation associated to S′, while it is not in
the READS-FROM relation associated to S. Actually, it is easy to see that S is not even
conflict-serializable, because the precedence graph associated to S is cyclic.

2. The answer to the previous question was no, and therefore we have to exhibit a minimal set
(i.e., a set of minimal cardinality) of actions of S to remove so that the remaining schedule is
conflict-serializable. To answer the question, we build the precedence graph associated to S,
and we label each edge of such graph with the set of all conflicting pairs giving rise to such
edge. The edges of the graph and their labels are:

• t1 → t2 labeled with {⟨w1(A), w2(A)⟩, ⟨r1(A), w2(A)⟩}
• t1 → t3 labeled with {⟨w1(A), r3(A)⟩, ⟨r1(C), w3(C)⟩}
• t2 → t1 labeled with {⟨r2(A), w1(A)⟩}
• t2 → t3 labeled with {⟨w2(A), r3(A)⟩}
• t3 → t1 labeled with {⟨w3(B), w1(B)⟩}

The graph contains the cycles t1 → t2 → t1, t1 → t3 → t1 and t1 → t2 → t3 → t1. No single
action exists such that, removing the action from S, we get a schedule that is conflict-serializable:
a single removal breaks the cycle t1 → t2 → t1 only if the action is r2(A), w1(A) or w2(A),
whereas it breaks the cycle t1 → t3 → t1 only if the action is w3(B) or w1(B). On the other hand,
if we remove w1(A), we remove the edge t2 → t1 (the edge t1 → t2 remains, because of the pair
⟨r1(A), w2(A)⟩), and by removing one of the actions in the set {w3(B), w1(B)} we remove the
edge t3 → t1, which eliminates both the cycle t1 → t3 → t1 and the cycle t1 → t2 → t3 → t1.
It follows that, for example,

{w1(A), w3(B)}

is one minimal set (i.e., a set of minimal cardinality) of actions of S to remove so that the
remaining schedule is not only view-serializable, but also conflict-serializable.
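
The reasoning of Problem 1 can be checked mechanically. The following Python sketch is only an illustration, not part of the original solution: it encodes S as (operation, transaction, item) triples, builds the precedence graph, and searches by brute force for the smallest sets of actions whose removal makes the schedule conflict-serializable. All function names (precedence_edges, is_acyclic, minimal_removals) are hypothetical.

```python
from itertools import combinations

# Schedule S of Problem 1 as (operation, transaction, item) triples.
S = [("r", 1, "A"), ("r", 3, "C"), ("w", 3, "B"), ("r", 2, "A"),
     ("w", 1, "B"), ("w", 1, "A"), ("w", 2, "A"), ("r", 1, "C"),
     ("w", 3, "C"), ("r", 3, "A"), ("r", 2, "D")]

def precedence_edges(schedule):
    """Return the set of edges (ti, tj) of the precedence graph."""
    edges = set()
    for i, (op1, t1, x1) in enumerate(schedule):
        for op2, t2, x2 in schedule[i + 1:]:
            # Two actions conflict if they belong to different transactions,
            # act on the same item, and at least one of them is a write.
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges

def is_acyclic(edges):
    """Check acyclicity by repeatedly deleting nodes without incoming edges."""
    edges = set(edges)
    nodes = {n for e in edges for n in e}
    while nodes:
        sources = {n for n in nodes if all(dst != n for _, dst in edges)}
        if not sources:
            return False  # every remaining node has an incoming edge: cycle
        nodes -= sources
        edges = {(a, b) for a, b in edges if a not in sources}
    return True

def conflict_serializable(schedule):
    return is_acyclic(precedence_edges(schedule))

def minimal_removals(schedule):
    """Smallest sets of actions whose removal yields a conflict-serializable schedule.

    Assumes every action occurs at most once in the schedule (true for S).
    """
    for k in range(len(schedule) + 1):
        found = [set(combo) for combo in combinations(schedule, k)
                 if conflict_serializable([a for a in schedule if a not in combo])]
        if found:
            return found
    return []

if __name__ == "__main__":
    print(conflict_serializable(S))
    for removal in minimal_removals(S):
        print(sorted(removal))
```

On S, the search finds no solution of size 0 or 1, and the sets of size 2 it reports include {w1(A), w3(B)}. Each of them pairs one action from {r2(A), w1(A), w2(A)} with one action from {w3(B), w1(B)}.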

Problem 2 Consider again the schedule S shown in Problem 1, and tell whether there is a way
to insert the “commit” command for every transaction in S such that the resulting schedule S′
is ACR (Avoid Cascading Rollback). If the answer is yes, then show the schedule S′, otherwise
explain the answer.

Solution A schedule is ACR if no transaction reads from a transaction that has not committed
yet. By referring to the schedule S of Problem 1, it is easy to see that t3 reads from t2 (indeed,
r3(A) reads from w2(A)), but t2 cannot commit before r3(A) is executed, since the action r2(D)
of t2 appears after r3(A). It follows that there is no way to insert the “commit” commands for
every transaction in S such that the resulting schedule is ACR.
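
The ACR argument amounts to a simple test: for every read that takes its value from a write of another transaction, the writer must be able to commit before that read, which is possible only if the writer has no action after the read. The Python sketch below illustrates this under the assumptions that no transaction aborts and that a commit must follow all actions of its transaction; the function names are hypothetical.

```python
# Schedule S of Problem 1 as (operation, transaction, item) triples.
S = [("r", 1, "A"), ("r", 3, "C"), ("w", 3, "B"), ("r", 2, "A"),
     ("w", 1, "B"), ("w", 1, "A"), ("w", 2, "A"), ("r", 1, "C"),
     ("w", 3, "C"), ("r", 3, "A"), ("r", 2, "D")]

def reads_from_other(schedule):
    """Triples (writer, reader, position of the read) for reads of another transaction's write."""
    last_writer = {}
    pairs = []
    for pos, (op, t, x) in enumerate(schedule):
        if op == "w":
            last_writer[x] = t
        elif x in last_writer and last_writer[x] != t:
            pairs.append((last_writer[x], t, pos))
    return pairs

def acr_commits_possible(schedule):
    """True if commits can be inserted so that every read is from a committed transaction."""
    # Position of the last action of each transaction: its commit cannot come earlier.
    last_action = {t: pos for pos, (_, t, _) in enumerate(schedule)}
    return all(last_action[writer] < pos
               for writer, _, pos in reads_from_other(schedule))

if __name__ == "__main__":
    print(reads_from_other(S))        # the only pair: t3 reads A from t2
    print(acr_commits_possible(S))    # False, because r2(D) follows r3(A)
```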

Problem 3 We define WOB to be the class of schedules defined as follows: a schedule S
on transactions T = {t1, . . . , tn} belongs to WOB if and only if (i) S is a complete schedule
constituted only by “write” actions, and (ii) there exists a function ψ : T → {1..n} such that for
every pair ⟨α, β⟩ of conflicting actions in S, where β is an action of transaction tk occurring in S
after the action α of transaction th, we have that ψ(tk) > ψ(th).
1. Provide the definition of view-equivalence and view-serializability.
2. By using only the definition of view-equivalence and view-serializability, and the definition
of WOB, prove or disprove that every schedule in WOB is view-serializable.

Solution
1. A complete schedule S is view-serializable if there is a serial schedule S′ that is view-equivalent
to S, where two schedules are view-equivalent if they have the same READS-FROM relation,
and the same FINAL-WRITE set.
2. We now prove that every schedule in WOB is view-serializable.
By definition of WOB, we know that, if S is in WOB, then there exists a function ψ : T →
{1..n} such that for every pair ⟨α, β⟩ of conflicting actions in S, with β occurring after α in S,
we have that ψ(tk) > ψ(th), where th and tk are the transactions of α and β, respectively.
Let S′ be the serial schedule on T reflecting the order represented by ψ, i.e., such that for all
i, j ∈ {1..n}, transaction ti appears before tj in S′ if and only if ψ(ti) < ψ(tj).
We show that S is view-equivalent to S′. We proceed by contradiction. Suppose that S and S′
are not view-equivalent. Since S and S′ are constituted only by “write” actions, and therefore
the READS-FROM relation is empty for both, it follows that the FINAL-WRITE set of S is
different from the FINAL-WRITE set of S′. This implies that there is an item A such that
wh(A) is the final write on A in S, and wk(A) is the final write on A in S′, with h ≠ k. It
follows that for the pair of conflicting actions wh(A), wk(A), we have that wh(A) appears before
wk(A) in S′, whereas wh(A) appears after wk(A) in S. The fact that wh(A) appears before wk(A)
in S′ implies that ψ(th) < ψ(tk), and therefore the fact that wh(A) appears after wk(A) in S
contradicts the hypothesis that ψ : T → {1..n} is such that for every pair ⟨α, β⟩ of conflicting
actions in S, with β occurring after α in S, the transaction of β has a greater value of ψ than
the transaction of α (which here would require ψ(th) > ψ(tk)).
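
As an illustration of the definitions used in this proof, the following Python sketch computes the READS-FROM relation and the FINAL-WRITE set of a schedule and tests view-serializability by comparing the schedule with every serial schedule on the same transactions. It is a brute-force check for small examples only, not the method of the solution. The names view_signature, serial_schedules and view_serializable, as well as the two write-only example schedules, are hypothetical.

```python
from collections import Counter
from itertools import permutations

def view_signature(schedule):
    """Return (READS-FROM, FINAL-WRITE) for a list of (op, transaction, item) triples."""
    last_writer = {}      # item -> transaction of the most recent write
    seen = Counter()      # number of reads of each item by each transaction so far
    reads_from = set()
    for op, t, x in schedule:
        if op == "r":
            seen[(t, x)] += 1
            # None as writer stands for "reads the initial value of the item".
            reads_from.add((last_writer.get(x), t, x, seen[(t, x)]))
        else:
            last_writer[x] = t
    # After the loop, last_writer maps each written item to its final writer.
    return reads_from, last_writer

def serial_schedules(schedule):
    """Yield every serial schedule on the transactions of the given schedule."""
    transactions = sorted({t for _, t, _ in schedule})
    for order in permutations(transactions):
        yield [a for t in order for a in schedule if a[1] == t]

def view_serializable(schedule):
    target = view_signature(schedule)
    return any(view_signature(s) == target for s in serial_schedules(schedule))

# Write-only schedule satisfying condition (ii) with psi(t1)=1, psi(t2)=2, psi(t3)=3.
IN_WOB = [("w", 1, "A"), ("w", 2, "A"), ("w", 1, "B"),
          ("w", 3, "B"), ("w", 2, "C"), ("w", 3, "C")]
# Write-only schedule violating condition (ii): t1 precedes t2 on A, t2 precedes t1 on B.
NOT_IN_WOB = [("w", 1, "A"), ("w", 2, "A"), ("w", 2, "B"), ("w", 1, "B")]

if __name__ == "__main__":
    print(view_serializable(IN_WOB))
    print(view_serializable(NOT_IN_WOB))
```

For the first example the serial schedule t1 t2 t3 has the same FINAL-WRITE set as the schedule itself, in line with the proof. The second example is not in WOB, and no serial schedule matches its final writes.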

Problem 4 Consider the relation MATCH(referee,date,hometeam,hostteam), updated monthly,
that stores about 1.000.000 tuples, where each tuple ⟨r, d, h, t⟩ means that r was the referee of a
match played at date d between teams h and t. We assume that (i) no referee can be in more
than 20 matches with the same home team, (ii) the size of each page in our system is sufficient for
storing 100 data entries, and 100 index entries, (iii) there is no good hash function for the search
key ⟨referee, hometeam⟩, and (iv) the following are the most important queries on MATCH:

Query 1                              Query 2
select referee                       select hostteam, date
from MATCH                           from MATCH
where date > α and date < β          where referee = γ and hometeam = δ

Tell which is the method for representing the relation MATCH you would choose in order to optimize
the computation of the above queries, explaining in detail your answer. Also, tell how many pages
are accessed during the execution of Query 2.

Solution Query 1 is an interval-based selection, which is well supported by a clustered tree index.
Relation MATCH is subject to updates, and therefore is not static. Hence, to support query 1, we
define a clustered B∗-tree index (called index 1) on search key ⟨date⟩, using alternative 1. To
support query 2, we cannot use a hash index (because there is no good hash function for the search
key ⟨referee, hometeam⟩), and therefore we define a second B∗-tree index (called index 2) on search
key ⟨referee, hometeam⟩, using alternative 2. Since index 1 is clustered, it follows that index 2 is
unclustered.
Let us now turn our attention to computing the number of pages which are accessed during the
execution of Query 2. The MATCH relation contains 1.000.000 tuples, every page contains 100 data
entries (and, therefore, F = 100), and we assume that every data entry page is 67% full. Thus,
1.5 · 1.000.000 / 100 = 15.000 pages are required for the leaves of index 2. It follows that, in the
execution of query 2, given a pair ⟨γ, δ⟩ for the search key ⟨referee, hometeam⟩, we need
⌈log_100 15.000⌉ = 3 page accesses to get to the first data entry. Since no referee can be in more
than 20 matches with the same home team, in the worst case we need at most another page access
to reach all relevant data entries, and at most 20 page accesses to reach the relevant data records
(index 2 is unclustered). Therefore, the total number of page accesses in the execution of query 2
is as follows:

⌈log_100 15.000⌉ + 1 + 20 = 3 + 1 + 20 = 24.
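
The page count for Query 2 can be reproduced with a few lines of Python. This is only a sketch of the arithmetic above; the constant names and the helper ceil_log are hypothetical, and the factor 1.5 encodes the assumed 67% occupancy of the leaf pages.

```python
def ceil_log(value, base):
    """Smallest integer h such that base**h >= value (integer arithmetic, no floats)."""
    h, reach = 0, 1
    while reach < value:
        reach *= base
        h += 1
    return h

TUPLES = 1_000_000        # tuples in MATCH
ENTRIES_PER_PAGE = 100    # data entries per page
FAN_OUT = 100             # index entries per page
MAX_MATCHES = 20          # matches per (referee, hometeam) pair, at most

# Leaf pages of index 2 at 67% occupancy: 1.5 * (TUPLES / ENTRIES_PER_PAGE).
leaf_pages = TUPLES // ENTRIES_PER_PAGE * 3 // 2
descent = ceil_log(leaf_pages, FAN_OUT)   # accesses to reach the first data entry
extra_leaf = 1                            # matching entries may span two leaf pages
record_fetches = MAX_MATCHES              # unclustered: one page per data record
total = descent + extra_leaf + record_fetches

if __name__ == "__main__":
    print(leaf_pages, descent, total)     # 15000 3 24
```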

Problem 5 Let T be a relation such that B = 666.666.667 (number of pages), and R = 100
(number of records per page). Assuming that D = 15 ms (time for accessing a page), C = 100 ns
(time to process one record), and that we have a clustered B∗-tree index using alternative 1, with
search key equal to the key of T, and fan-out equal to 100, tell which is the time needed for
inserting a new tuple in T, explaining in detail your answer.

Solution The method used for inserting a new record on T is as follows:

1. we first execute a selection based on equality to locate the page of the clustered B∗-tree index
where we have to insert (we assume such a page to be not full),

2. since we are using alternative 1, we simply execute the insertion.

The cost of executing the selection based on equality on the search key on the basis of the
clustered B∗-tree index is D · ⌈log_F (1.5 · B)⌉. The cost of locating where to insert in the page is
C · ⌈log_2 R⌉. The cost of inserting into the page is C, and the cost of writing the page is D. Thus,
the total time for inserting a new record in T is (15 ms correspond to 15.000.000 ns):

D · ⌈log_F (1.5 · B)⌉ + C · ⌈log_2 R⌉ + C + D
= 15.000.000 · ⌈log_100 (1.5 · 666.666.667)⌉ + 100 · ⌈log_2 100⌉ + 100 + 15.000.000
≅ 15.000.000 · ⌈log_100 1.000.000.000⌉ + 100 · ⌈log_2 100⌉ + 100 + 15.000.000
= 15.000.000 · 5 + 100 · 7 + 100 + 15.000.000 = 90.000.800 ns ≅ 90 ms
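
The insertion cost can be recomputed as follows, working in nanoseconds throughout and taking D = 15 ms = 15.000.000 ns. This is a sketch of the arithmetic only; the constant names and the helper ceil_log are hypothetical, and both logarithms are rounded up to whole levels or steps.

```python
def ceil_log(value, base):
    """Smallest integer h such that base**h >= value (integer arithmetic, no floats)."""
    h, reach = 0, 1
    while reach < value:
        reach *= base
        h += 1
    return h

D_NS = 15 * 1_000_000   # 15 ms per page access, expressed in nanoseconds
C_NS = 100              # 100 ns to process one record
B = 666_666_667         # pages of T
R = 100                 # records per page
F = 100                 # fan-out of the index

leaf_pages = 3 * B // 2                   # 1.5 * B, about 1,000,000,000 pages
search = D_NS * ceil_log(leaf_pages, F)   # descend the tree: 5 page accesses
locate = C_NS * ceil_log(R, 2)            # find the slot within the page: 7 steps
insert = C_NS                             # place the record in the page
write_back = D_NS                         # write the modified page
total_ns = search + locate + insert + write_back

if __name__ == "__main__":
    print(total_ns)               # 90000800
    print(total_ns / 1_000_000)   # total in milliseconds
```

The total is dominated by the six page accesses (five to descend the tree, one to write the page back); the in-memory work contributes only 800 ns.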
