BD2 Text and Sol
If it is the first exam of the date, a new tuple is inserted; otherwise, the tuple for that date is updated.
With a for each row trigger, the total number of students must be computed with "+1", since each row counts
one student. If the same student participates in more than one exam on the same date, s/he will be counted
once for each exam.
b) The average of the published grades could also be computed with a for each row or a for each statement
trigger on the bulk insert event. However, with for each row the average grade would be recomputed from
scratch for every single student that appears in the list of the exam (i.e., if the professor inserts
200 marks, the average is computed 200 times), which is not an efficient solution.
An alternative solution considers the event "update of publication date" on the exam table, an
operation performed just after the bulk insertion. In that case a single tuple is updated, and a for each row
trigger can be written to compute the average mark only once.
Here we show the case of the for each statement trigger with the bulk insert event. Notice that, since we assume
that all the marks are published together, there are no other tuples for the same exam and we can consider
only the insert action.
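The efficiency argument above can be illustrated with a small sketch (plain Python, not the exam's SQL trigger; the 200 marks are hypothetical data): a row-level trigger recomputes the average once per inserted mark, a statement-level trigger only once per bulk insert.

```python
marks = [18 + i % 13 for i in range(200)]  # 200 hypothetical published marks

recomputations = 0

def recompute_average(current_marks):
    """Stand-in for the trigger body that recomputes the average from scratch."""
    global recomputations
    recomputations += 1
    return sum(current_marks) / len(current_marks)

# FOR EACH ROW: the trigger fires once per inserted mark -> 200 recomputations
inserted = []
for m in marks:
    inserted.append(m)
    avg = recompute_average(inserted)
assert recomputations == 200

# FOR EACH STATEMENT: the trigger fires once for the whole bulk insert
recomputations = 0
avg = recompute_average(marks)
assert recomputations == 1
```

Both variants end with the same average; they differ only in how many times it is computed.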
c) After publication, grades may change. However, we need to compute the new average grade as soon as the
publication is closed.
d) To maintain only the last 10 statistics of each exam, we can use the event "insert on Statistics" and delete the
oldest statistic of the same course. Notice that MIN (as well as all the other aggregate functions) appears in the
SELECT clause. The syntax min(SELECT ...) is wrong!
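A sketch of the maintenance logic this trigger performs (illustrative Python, with hypothetical course and date values; the real solution is a SQL trigger with MIN inside the SELECT clause of a subquery):

```python
from collections import defaultdict

stats = defaultdict(list)  # course -> list of (date, avg_grade) statistics

def insert_statistic(course, date, avg_grade):
    """Mimics the trigger on 'insert on Statistics': when more than 10
    statistics are stored for a course, delete the oldest one."""
    stats[course].append((date, avg_grade))
    if len(stats[course]) > 10:
        # corresponds to DELETE ... WHERE date = (SELECT MIN(date) ...):
        # the aggregate is computed in a subquery, never as min(SELECT ...)
        oldest = min(stats[course], key=lambda s: s[0])
        stats[course].remove(oldest)

for day in range(1, 13):          # 12 exam sessions of the same course
    insert_statistic('DB2', day, 24.0)

assert len(stats['DB2']) == 10    # only the last 10 statistics survive
assert stats['DB2'][0] == (3, 24.0)  # sessions 1 and 2 were deleted as oldest
```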
In this first version, transactions T1 and T3 request all the locks required by their operations as the operations
are issued, even if a previous operation is still in a waiting condition:
operation        | Px                    | ta                  | tc  | Py                     | te
r1(ta)           | ISL1                  | SL1                 |     |                        |
w2(te)           | ISL1                  | SL1                 |     | IXL2                   | XL2
r1(Py)           | ISL1                  | SL1                 |     | IXL2, SL1[waits]       | XL2
r2(ta)           | ISL1+ISL2             | SL1+SL2             |     | IXL2, SL1[waits]       | XL2
w3(te)           | ISL1+ISL2             | SL1+SL2             |     | IXL2+IXL3, SL1[waits]  | XL2, XL3[waits]
Please note that here we are making a strong assumption:
- T3 obtains its IXL3(Py) by actually "overtaking" T1, which was waiting to acquire a lock (SL1(Py)) that is incompatible with IXL3.
By letting this happen, we may condemn T1 to starvation, as many other transactions Ti asking for IXLi(Py) could prevent T1 from proceeding.
w1(ta)           | IXL1(upgrade)+ISL2    | SL1+SL2, XL1[waits] |     | IXL2+IXL3, SL1[waits]  | XL2, XL3[waits]
Please note that here we are making another strong assumption to allow T1 to attempt the execution of w1(ta):
- transaction T1 asks for new locks despite the fact that it has not acquired SL1 on Py yet and is still waiting to perform r1(Py).
This does not mean that the execution of T1 is not sequential (we do not state that w1(ta) executes before r1(Py)); we only assume lock "anticipation".
c2               | release ISL2: IXL1    | release SL2: XL1(upgrade) | | release IXL2: IXL3, SL1[waits] | release XL2: XL3
w3(tc)           | IXL1+IXL3             | XL1                 | XL3 | IXL3, SL1[waits]       | XL3
r3(Px)           | IXL1+IXL3, SL3[waits] | XL1                 | XL3 | IXL3, SL1[waits]       | XL3
Deadlock: T1 waits for T3 (SL1(Py) is incompatible with IXL3(Py)) while T3 waits for T1 (SL3(Px) is incompatible with IXL1(Px)).
r1(te), c1, c3 are not reached: the schedule stops at the deadlock.
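The deadlock above can be confirmed mechanically by building the wait-for graph of the final state and looking for a cycle (a sketch, not part of the exam solution; edges mean "waits for"):

```python
waits_for = {
    'T1': {'T3'},  # T1 waits for SL1(Py), blocked by T3's IXL3(Py)
    'T3': {'T1'},  # T3 waits for SL3(Px), blocked by T1's IXL1(Px)
}

def has_cycle(graph):
    """Standard DFS cycle detection on a directed graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}
    def visit(n):
        color[n] = GREY
        for m in graph.get(n, ()):
            if color.get(m, BLACK) == GREY:
                return True
            if color.get(m, BLACK) == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False
    return any(color[n] == WHITE and visit(n) for n in graph)

assert has_cycle(waits_for)                        # T1 -> T3 -> T1: deadlock
assert not has_cycle({'T1': set(), 'T3': set()})   # no waits, no deadlock
```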
If we do not make the assumptions above, and instead (i) rigorously block the transactions waiting for
locks and (ii) do not let intention locks "overtake" other queued lock requests, we obtain the following system behavior:
operation         | Px                  | ta                | tc          | Py                                      | te
r1(ta)            | ISL1                | SL1               |             |                                         |
w2(te)            | ISL1                | SL1               |             | IXL2                                    | XL2
r1(Py)            | ISL1                | SL1               |             | IXL2, SL1[waits]                        | XL2
r2(ta)            | ISL1+ISL2           | SL1+SL2           |             | IXL2, SL1[waits]                        | XL2
w3(te)            | ISL1+ISL2           | SL1+SL2           |             | IXL2, SL1[waits], IXL3[waits]           | XL2, XL3[waits for IXL3(Py)]
w1(ta)            | T1 is suspended: no more lock requests nor execution of operations until its pending request is satisfied
c2                | ISL1 (release ISL2) | SL1 (release SL2) |             | SL1 granted, IXL3[waits] (release IXL2) | XL3[waits for IXL3(Py)] (release XL2)
w1(ta) (resumed)  | IXL1(upgrade)       | XL1(upgrade)      |             | SL1, IXL3[waits]                        | XL3[waits for IXL3(Py)]
w3(tc)            | T3 is suspended
r3(Px)            | T3 is suspended
r1(te)            | IXL1                | XL1               |             | SL1, IXL3[waits]                        | XL3[waits for IXL3(Py)]
c1                | (release IXL1)      | (release XL1)     |             | IXL3 granted (release SL1)              | XL3 granted
w3(tc) (resumed)  | IXL3                |                   | XL3         | IXL3                                    | XL3
r3(Px) (resumed)  | IXL3+SL3            |                   | XL3         | IXL3                                    | XL3
c3                | release IXL3, SL3   |                   | release XL3 | release IXL3                            | release XL3
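All the waits in both schedules follow from the standard compatibility matrix of hierarchical locking (IS, IX, S, X), encoded here as a quick sanity check (a sketch; rows are the held lock, columns the requested one):

```python
compatible = {
    ('IS', 'IS'): True,  ('IS', 'IX'): True,  ('IS', 'S'): True,  ('IS', 'X'): False,
    ('IX', 'IS'): True,  ('IX', 'IX'): True,  ('IX', 'S'): False, ('IX', 'X'): False,
    ('S',  'IS'): True,  ('S',  'IX'): False, ('S',  'S'): True,  ('S',  'X'): False,
    ('X',  'IS'): False, ('X',  'IX'): False, ('X',  'S'): False, ('X',  'X'): False,
}

# r1(Py) waits: T1's S on Py is incompatible with T2's IX on Py
assert not compatible[('IX', 'S')]
# w3(te) waits: T3's X on te is incompatible with T2's X on te
assert not compatible[('X', 'X')]
# r2(ta) proceeds: T2's S on ta is compatible with T1's S on ta
assert compatible[('S', 'S')]
```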
C.
1.
for $s in doc(..)/Theater/Show
for $d in distinct-values(doc(..)/Theater/Agent/Ticket[ShowTitle=$s/Title]/Date)
let $tickets := doc(..)/Theater/Agent/Ticket[ShowTitle=$s/Title and Date=$d and count(Refund)=0]
where count($tickets) < 10 and count($tickets) > 0
return ($s/Title, $d)
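The selection performed by this query, restated in Python over illustrative data (the tickets below are hypothetical, not from the exam): keep the (show title, date) pairs with at least one but fewer than 10 non-refunded tickets.

```python
from collections import Counter

tickets = [  # (ShowTitle, Date, refunded?)
    ('Godot', '2024-01-10', False),
    ('Godot', '2024-01-10', False),
    ('Godot', '2024-01-11', True),     # only refunded tickets: excluded
    ('Endgame', '2024-01-10', False),
] + [('Endgame', '2024-01-12', False)] * 12  # 12 tickets: excluded (>= 10)

# count non-refunded tickets per (title, date), like count($tickets)
valid = Counter((t, d) for t, d, refunded in tickets if not refunded)
result = sorted((t, d) for (t, d), n in valid.items() if 0 < n < 10)

assert result == [('Endgame', '2024-01-10'), ('Godot', '2024-01-10')]
```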
2.
for $SC in distinct-values(doc(..)/Theater/Agent/Ticket/SeatColumn)
for $SR in distinct-values(doc(..)/Theater/Agent/Ticket/SeatRow)
let $MaxNumberOfTickets := max(
    for $a in //Agent
    return count($a/Ticket[SeatColumn=$SC and SeatRow=$SR]))
return ($SC, $SR,
    doc(..)//Agent[count(Ticket[SeatColumn=$SC and SeatRow=$SR])=$MaxNumberOfTickets]/Name,
    sum(//Ticket[SeatColumn=$SC and SeatRow=$SR and count(Refund)=0]/Price))
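The logic of the second query in Python, over hypothetical data: for each seat position, find the agent(s) that sold the most tickets for it and the total price of its non-refunded tickets.

```python
from collections import Counter

tickets = [  # (agent, seat column, seat row, price, refunded?) - illustrative
    ('Anna', 'A', 1, 30.0, False),
    ('Anna', 'A', 1, 30.0, False),
    ('Bob',  'A', 1, 25.0, True),
    ('Bob',  'B', 2, 40.0, False),
]

def seat_report(col, row):
    """Agents with the max ticket count for the seat, plus non-refunded revenue."""
    per_agent = Counter(a for a, c, r, _, _ in tickets if (c, r) == (col, row))
    top = max(per_agent.values())
    agents = sorted(a for a, n in per_agent.items() if n == top)
    revenue = sum(p for _, c, r, p, refunded in tickets
                  if (c, r) == (col, row) and not refunded)
    return agents, revenue

assert seat_report('A', 1) == (['Anna'], 60.0)  # refunded ticket excluded from sum
assert seat_report('B', 2) == (['Bob'], 40.0)
```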
D.
a) As far as SEAT is concerned, the query requires counting the total number of seats, which is 400. The 5 blocks
can be loaded into memory for counting (with a cost of 5 I/O); alternatively, you can take advantage of the
statistics that the system maintains for each table and use them directly without accessing the blocks of the
table, so the count of the SEAT tuples costs 0. These considerations also apply to cases
b) and c). We will therefore omit the cost of accessing SEAT (alternatively, it was correct to include +5 in
each formula).
We must therefore calculate the cost of the join between SHOW and TICKET. A scan can be made on
SHOW (approximately 15 blocks), selecting the shows by the author Beckett, which are found in 6
tuples (150 shows / 25 distinct authors = 6). These 6 tuples can be cached.
For the 6 tuples we scan TICKET (10K blocks).
Notice that in TICKET the data are sorted by date, but not by title (needed for the GROUP BY). TICKET
contains 1.6K different dates, and for each date there are 125 tuples. Each block contains 20 tuples: this
means that we need to sort the 6-7 blocks with the same date to obtain the sorting by title as well. We can assume
that this can be done in main memory, without adding further costs for the sorting.
The total cost is 15 + 10K (15 + 6*10K if the 6 tuples are not cached).
b) Differently from point a), we can access the leaves of the B+ tree, thus obtaining the cost 15 + 3.6K.
The B+ tree on Author can also be used to get the tuples of Beckett: 2 accesses are needed on the B+ tree, and 6
pointers are followed to get the tuples containing the data about his shows (they can be cached). The
cost is 2 + 6 + 3.6K.
c) Differently from point b), after accessing the shows of Beckett we get his show titles, which can be used to
access the new B+ tree on (ShowTitle, Date). For each show of Beckett, 3 accesses are needed. In the leaves we
find the following number of tickets: 200K tuples / 150 shows = 1333 tuples (with ShowTitle, Date and
pointer) for each show, over the different dates. Each leaf contains 200K tuples / 3.6K leaves = 55 tuples, so the
1333 tuples of a show occupy around 24-25 leaf blocks that need to be read.
Therefore the cost is 2 + 6 + 3*24 = 80 I/O.
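The cost arithmetic of the three plans can be checked mechanically (all figures are taken from the text above):

```python
from math import ceil

# a) full scan of SHOW (15 blocks) + one scan of TICKET (10K blocks)
beckett_shows = 150 // 25            # 150 shows / 25 distinct authors = 6
cost_a = 15 + 10_000
assert beckett_shows == 6 and cost_a == 10_015

# b) with the B+ tree on Author: 2 index accesses + 6 pointers + the
#    3.6K leaves of the B+ tree on TICKET
cost_b = 2 + 6 + 3_600
assert cost_b == 3_608

# c) with the B+ tree on (ShowTitle, Date): entries per show and per leaf
tuples_per_show = 200_000 // 150     # ~1333 (ShowTitle, Date, pointer) entries
tuples_per_leaf = 200_000 // 3_600   # ~55 entries per leaf
leaf_blocks = ceil(tuples_per_show / tuples_per_leaf)  # ~24-25 blocks per show
cost_c = 2 + 6 + 3 * 24
assert tuples_per_show == 1333 and tuples_per_leaf == 55
assert leaf_blocks in (24, 25) and cost_c == 80
```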