BD2 Text and Sol

The document contains the text and solutions of a Databases 2 exam: 1) an active database supporting an exam registration process, with tables for students, courses, exams, grades, and statistics; triggers are required to maintain the total number of students per date and the average grades of each exam. 2) The concurrency-control behavior of transactions requesting locks on hierarchically structured pages and tuples, with the detailed outcome of each lock request. 3) An XML document for a theatre company, with elements for shows, agents, and tickets; XQuery expressions extract the under-attended shows and the most popular seats. 4) The physical storage of the tables for tickets, seats, and shows, with query plans and cost estimates for several indexing scenarios.



DATA BASES 2 – FEBRUARY 19TH, 2019 – DURATION: 2H

PROF. SARA COMAI, PROF. DANIELE M. BRAGA


A. Active Databases (9 p.)
STUDENT (StudentID, LastName, Name, Country)
COURSE (CourseID, Name, Professor)
EXAM (CourseID, ExamDate, ClassRoom, Time, PublicationDate, ClosedY/N)
GRADES (CourseID, StudentID, ExamDate, Grade, RejectedY/N)
STATSBYDATE (ExamDate, TotalNumberOfStudents)
STATSBYEXAM (CourseID, ExamDate, AveragePublishedGrade, AverageFinalGrade)
The database above supports (a simplified version of) the exam registration process in a technical university. Professors
insert the grades of each exam through a Web form: they fill in the data of all the students who passed the exam on that
date, and then click on the SEND button: all data are sent to the server and inserted into table GRADES with a “bulk” command
like: INSERT INTO GRADES (CourseID, StudentID, ExamDate, Grade) VALUES (808080, 12345, 19/02/2019, 18), (808080, 12543, 19/02/2019, 30), ...;
and the PublicationDate of the exam is set. Students can reject their grade within 5 days, and grades can possibly be updated
by professors in this period. After five days from publication the exam is closed.
Write a set of triggers that keep the following statistics updated: a) for each date, the total number of students who took the exam (regardless of the acceptance/rejection of the grade); b) for each exam, the average grade initially published by the professor and the average final grade of the exam (considering only the accepted grades) once it is "closed". Only
the last 10 statistics for each course are kept in table STATSBYEXAM.

B. Concurrency Control (6 p.)


A DBMS applies strict 2PL and hierarchical locking on the pages (P1, P2) and tuples (t1..t5), hierarchically structured as in the original figure (P1 contains t1, t2, t3; P2 contains t4, t5). Describe the system behavior (the detailed outcome of lock/upgrade/unlock requests and the lock status of the involved resources, possibly shaped as a table) when transactions T1, T2, T3 attempt to execute the following operations in the given order. Assume that each transaction always requests the minimum lock required to perform each operation.
r1(t1) w2(t5) r1(P2) r2(t1) w3(t5) w1(t1) c2 w3(t3) r3(P1) r1(t5) c1 c3
C. XML (9 p.)
<!ELEMENT Theatre ( Show+, Agent+ )>
<!ELEMENT Show ( Title, Genre, Author, Director, Description )>
<!ELEMENT Agent ( Name, Address, Ticket* )>
<!ELEMENT Ticket ( ShowTitle, Date, SeatColumn, SeatRow, Spectator, Price, Refund? )>
The DTD above describes the tickets sold by the agents of an avant-garde repertory theatre company. Performances are hardly ever planned in advance; rather, they are scheduled based on the tickets that the agents manage to sell, offering multiple options. If tickets for different shows are sold for the same date, only the show with the most spectators is performed. The other tickets are refunded, marked with a tag, and kept in storage for statistics. Seats are identified by row and column. Refund is EMPTY; the other unspecified elements contain only PCDATA. Extract in XQuery:
(4 p.) 1. The title(s) and date(s) of the show(s), if any, performed at least once for an audience of fewer than 10 people.
(5 p.) 2. For each seat, the Name of the Agent who sold it the largest number of times (regardless of whether the tickets were then refunded), and the overall revenue generated by that seat (now considering only non-refunded tickets).
D. Physical Databases (6 p.)
A table TICKET(TktId, ShowTitle, Date, SeatRow, SeatCol, Spectator, Price, RefundY/N) stores 200K tickets, sold during 15 years of activity, in 10K blocks in a primary storage sequentially ordered by Date. A table SEAT(Row, Column, ComfortLevel) stores 400 tuples, relative to the seats of the theatre, in a 5-block hash structure hashed on the primary key, and the table SHOW(Title, Genre, Author, Director, Description) stores 150 tuples in 15 blocks, structured as a tiny B+ tree built on Title. We know that val(Date)=1.6K and val(Author)=25; consider the following query:

select count(*) as NumberOfSoldOutBeckettPlays
from ( select Date, Title
       from TICKET join SHOW on ShowTitle = Title
       where Author = 'Samuel Beckett'
       group by Date, Title
       having count(*) = ( select count(*) from SEAT ) )

Describe briefly (but precisely) a reasonably efficient query plan and estimate its execution cost in the following scenarios (cost estimates provided without a clear description of the associated plan will not be considered).
a) There are no secondary indexes;
b) There are a B+(Date,ShowTitle) index for TICKET (F=60, 3 levels, 3.6K leaf nodes) and a B+(Author) index for SHOW;
c) There is also a B+( ShowTitle,Date) index for TICKET (also F=60, 3 levels, 3.6K leaf nodes).
A.
a) It can be written with a for-each-row or with a for-each-statement trigger.

If it is the first exam of the date, a new tuple is inserted; otherwise, the tuple for that date is updated.
In the for-each-row case, the total number of students must be computed with "+ 1", since each row counts one student. If the same student takes more than one exam on the same date, s/he will be counted once for each exam.

CREATE TRIGGER StatisticsByDate -- for each row version
AFTER INSERT ON GRADES
FOR EACH ROW
BEGIN
    IF (EXISTS (SELECT * FROM STATSBYDATE
                WHERE ExamDate = NEW.ExamDate))
        UPDATE STATSBYDATE
        SET TotalNumberOfStudents = TotalNumberOfStudents + 1
        WHERE ExamDate = NEW.ExamDate;
    ELSE
        INSERT INTO STATSBYDATE VALUES (NEW.ExamDate, 1);
END;

CREATE TRIGGER StatisticsByDate -- for each statement version
AFTER INSERT ON GRADES
REFERENCING NEW_TABLE AS INSERTEDGRADES
FOR EACH STATEMENT
BEGIN
    DECLARE newDate DATE;
    SELECT DISTINCT ExamDate INTO newDate FROM INSERTEDGRADES;
    -- for simplicity, the date of the exam is saved in a variable
    IF (EXISTS (SELECT * FROM STATSBYDATE
                WHERE ExamDate = newDate))
        UPDATE STATSBYDATE
        SET TotalNumberOfStudents = TotalNumberOfStudents +
            (SELECT COUNT(*) FROM INSERTEDGRADES)
        WHERE ExamDate = newDate;
    ELSE
        INSERT INTO STATSBYDATE
        VALUES (newDate, (SELECT COUNT(*) FROM INSERTEDGRADES));
END;
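The row-level version of the trigger can be tried out concretely. Below is a minimal runnable sketch in Python with SQLite; an assumption here is the transcription to SQLite syntax, which has no IF/ELSE in trigger bodies, so the "insert if first exam of the date, otherwise update" branch becomes an INSERT OR IGNORE followed by an unconditional UPDATE.

```python
import sqlite3

# Sketch of the row-level StatisticsByDate trigger, transcribed to SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GRADES (CourseID, StudentID, ExamDate, Grade, Rejected DEFAULT 'N');
CREATE TABLE STATSBYDATE (ExamDate PRIMARY KEY, TotalNumberOfStudents);

CREATE TRIGGER StatisticsByDate AFTER INSERT ON GRADES
FOR EACH ROW
BEGIN
    -- first exam of the date: create the counter at 0...
    INSERT OR IGNORE INTO STATSBYDATE VALUES (NEW.ExamDate, 0);
    -- ...then count the newly inserted row in any case
    UPDATE STATSBYDATE
       SET TotalNumberOfStudents = TotalNumberOfStudents + 1
     WHERE ExamDate = NEW.ExamDate;
END;
""")

# The professor's "bulk" insert fires the row-level trigger once per tuple
conn.executemany(
    "INSERT INTO GRADES (CourseID, StudentID, ExamDate, Grade) VALUES (?,?,?,?)",
    [(808080, 12345, "2019-02-19", 18),
     (808080, 12543, "2019-02-19", 30),
     (808080, 12999, "2019-02-19", 25)])
total = conn.execute("SELECT TotalNumberOfStudents FROM STATSBYDATE"
                     " WHERE ExamDate = '2019-02-19'").fetchone()[0]
print(total)  # 3
```

Note that SQLite only supports row-level triggers, which is why the statement-level variant with REFERENCING NEW_TABLE cannot be sketched the same way.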

b) The average of the published grades could also be computed with a for-each-row or a for-each-statement trigger on the bulk insert event. However, in the for-each-row case the average grade would be recomputed from scratch for every single student that appears in the list of the exam (i.e., if the professor inserts 200 grades, the average is computed 200 times; this is not an efficient solution).
An alternative solution considers the event "update of PublicationDate" on the EXAM table, an operation performed just after the bulk insertion. In that case a single tuple is updated, and a for-each-row trigger can compute the average grade only once.
Here we show the for-each-statement trigger on the bulk insert event. Notice that, since we assume that all the grades are published together, there are no other tuples for the same exam and we can consider only the insert event.

CREATE TRIGGER AveragePublishedExams -- for each statement version
AFTER INSERT ON GRADES
REFERENCING NEW_TABLE AS INSERTEDGRADES
FOR EACH STATEMENT
BEGIN
    INSERT INTO STATSBYEXAM
    SELECT CourseID, ExamDate, AVG(Grade), 0
    FROM INSERTEDGRADES
    GROUP BY CourseID, ExamDate;
END;

c) After publication, grades may change. However, we need to compute the final average grade as soon as the exam is closed.

CREATE TRIGGER FinalAverage
AFTER UPDATE OF Closed ON EXAM
FOR EACH ROW
WHEN (OLD.Closed = 'N' AND NEW.Closed = 'Y')
BEGIN
    UPDATE STATSBYEXAM
    SET AverageFinalGrade = (SELECT AVG(Grade) FROM GRADES
                             WHERE CourseID = NEW.CourseID
                               AND ExamDate = NEW.ExamDate
                               AND Rejected = 'N')
    WHERE CourseID = NEW.CourseID AND ExamDate = NEW.ExamDate;
END;

d) To keep only the last 10 statistics of each course, we can use the event "insert on STATSBYEXAM" and delete the oldest statistic of the same course. Notice that MIN (like all the other aggregate functions) appears in the SELECT clause: the syntax MIN(SELECT ...) is wrong!

CREATE TRIGGER Last10
AFTER INSERT ON STATSBYEXAM
FOR EACH ROW
WHEN (10 < (SELECT COUNT(*) FROM STATSBYEXAM
            WHERE CourseID = NEW.CourseID))
BEGIN
    DELETE FROM STATSBYEXAM
    WHERE CourseID = NEW.CourseID
      AND ExamDate = (SELECT MIN(ExamDate) FROM STATSBYEXAM
                      WHERE CourseID = NEW.CourseID);
END;
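Trigger d) can also be exercised end to end. The sketch below transcribes it to SQLite syntax (an assumption; SQLite requires DELETE FROM and supports WHEN on row-level triggers) and checks that after 12 insertions for the same course only the 10 most recent statistics survive.

```python
import sqlite3

# Sketch of the Last10 trigger in SQLite: keep only the newest 10
# statistics per course by deleting the oldest one on overflow.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STATSBYEXAM (CourseID, ExamDate,
                          AveragePublishedGrade, AverageFinalGrade);

CREATE TRIGGER Last10 AFTER INSERT ON STATSBYEXAM
FOR EACH ROW
WHEN (SELECT COUNT(*) FROM STATSBYEXAM
      WHERE CourseID = NEW.CourseID) > 10
BEGIN
    -- drop the oldest statistic of the same course
    DELETE FROM STATSBYEXAM
     WHERE CourseID = NEW.CourseID
       AND ExamDate = (SELECT MIN(ExamDate) FROM STATSBYEXAM
                       WHERE CourseID = NEW.CourseID);
END;
""")

# 12 exam dates for the same course: only the last 10 must survive
for i in range(1, 13):
    conn.execute("INSERT INTO STATSBYEXAM VALUES (808080, ?, 24.5, 0)",
                 (f"2019-{i:02d}-01",))
dates = [r[0] for r in conn.execute(
    "SELECT ExamDate FROM STATSBYEXAM ORDER BY ExamDate")]
print(len(dates), dates[0])  # 10 2019-03-01
```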
B.

In this first version, transactions T1 and T3 request all the locks needed by their operations as soon as each operation is submitted, even if the transaction is already waiting for a (previous) operation:

| Operation | P1                    | t1                  | t3  | P2                               | t5                 |
| r1(t1)    | ISL1                  | SL1                 |     |                                  |                    |
| w2(t5)    | ISL1                  | SL1                 |     | IXL2                             | XL2                |
| r1(P2)    | ISL1                  | SL1                 |     | IXL2, SL1[waits]                 | XL2                |
| r2(t1)    | ISL1+ISL2             | SL1+SL2             |     | IXL2, SL1[waits]                 | XL2                |
| w3(t5)    | ISL1+ISL2             | SL1+SL2             |     | IXL2+IXL3, SL1[waits]            | XL2, XL3[waits]    |

Please note that here we are making a strong assumption:
- T3 obtains its IXL3(P2) by actually "overtaking" T1, which was waiting to acquire a lock (SL1(P2)) that is incompatible with IX. By letting this happen we may condemn T1 to starvation, since any number of other transactions Ti asking for IXLi(P2) would prevent T1 from proceeding.

| w1(t1)    | IXL1(upgrade)+ISL2    | SL1+SL2, XL1[waits] |     | IXL2+IXL3, SL1[waits]            | XL2, XL3[waits]    |

Please note that here we are making another strong assumption to allow T1 to attempt the execution of w1(t1):
- T1 asks for new locks even though it has not acquired SL1 on P2 yet and is still waiting to perform r1(P2). This does not mean that the execution of T1 is not sequential (we do not state that w1(t1) executes before r1(P2)): we merely assume lock "anticipation".

| c2        | IXL1 (ISL2 released)  | XL1 upgraded (SL2 released) |  | IXL3, SL1[waits] (IXL2 released) | XL3 (XL2 released) |
| w3(t3)    | IXL1+IXL3             | XL1                 | XL3 | IXL3, SL1[waits]                 | XL3                |
| r3(P1)    | IXL1+IXL3, SL3[waits] | XL1                 | XL3 | IXL3, SL1[waits]                 | XL3                |

Deadlock: T1 waits for T3 (its SL1(P2) is blocked by IXL3), and T3 waits for T1 (its SL3(P1) is blocked by IXL1). The remaining operations r1(t5), c1, and c3 cannot be executed until the deadlock is resolved by aborting one of the two transactions.

If we do not make the assumptions of the execution above, and instead (i) rigorously block a transaction that is waiting for a lock and (ii) do not let intention locks "overtake" other queued lock requests, we obtain the following system behavior:

| Operation        | P1                   | t1                 | t3  | P2                                       | t5                           |
| r1(t1)           | ISL1                 | SL1                |     |                                          |                              |
| w2(t5)           | ISL1                 | SL1                |     | IXL2                                     | XL2                          |
| r1(P2)           | ISL1                 | SL1                |     | IXL2, SL1[waits]                         | XL2                          |
| r2(t1)           | ISL1+ISL2            | SL1+SL2            |     | IXL2, SL1[waits]                         | XL2                          |
| w3(t5)           | ISL1+ISL2            | SL1+SL2            |     | IXL2, SL1[waits], IXL3[waits]            | XL2, XL3[waits for IXL3(P2)] |
| w1(t1)           | T1 is suspended: no more lock requests nor operations until its pending request is satisfied      |
| c2               | ISL1 (ISL2 released) | SL1 (SL2 released) |     | SL1 granted, IXL3[waits] (IXL2 released) | XL3[waits] (XL2 released)    |
| w1(t1) (resumed) | IXL1(upgrade)        | XL1(upgrade)       |     | SL1, IXL3[waits]                         | XL3[waits for IXL3(P2)]      |
| w3(t3)           | T3 is suspended      |                    |     |                                          |                              |
| r3(P1)           | T3 is still suspended |                   |     |                                          |                              |
| r1(t5)           | IXL1                 | XL1                |     | SL1, IXL3[waits]                         | XL3[waits] (the read of t5 is covered by SL1(P2)) |
| c1               | (IXL1 released)      | (XL1 released)     |     | IXL3 granted (SL1 released)              | XL3 granted                  |
| w3(t3) (resumed) | IXL3                 |                    | XL3 | IXL3                                     | XL3                          |
| r3(P1) (resumed) | IXL3+SL3             |                    | XL3 | IXL3                                     | XL3                          |
| c3               | IXL3, SL3 released   |                    | XL3 released | IXL3 released                   | XL3 released                 |
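Every wait in the two tables above follows from the standard compatibility matrix of multi-granularity locks. A minimal sketch of that matrix (the textbook IS/IX/S/SIX/X matrix, not anything specific to this exam) lets each decision be checked mechanically:

```python
# Standard compatibility matrix for hierarchical (multi-granularity) locks:
# COMPAT[requested][held] is True iff the requested lock can coexist.
COMPAT = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def can_grant(requested, held):
    """A lock is granted only if compatible with ALL locks currently held."""
    return all(COMPAT[requested][h] for h in held)

# r1(P2) while T2 holds IXL2(P2): SL1 must wait (S vs IX incompatible)
print(can_grant("S", ["IX"]))   # False
# r2(t1) while T1 holds SL1(t1): granted (S vs S compatible)
print(can_grant("S", ["S"]))    # True
# w3(t5)'s intention on P2: IX vs IX is compatible, so in version 1 the
# IXL3 can be granted; in version 2 it still queues behind T1's earlier
# (incompatible) S request because overtaking is forbidden.
print(can_grant("IX", ["IX"]))  # True
```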

C.

1.
for $s in doc(..)/Theatre/Show
for $d in distinct-values(doc(..)/Theatre/Agent/Ticket[ShowTitle = $s/Title]/Date)
let $tickets := doc(..)/Theatre/Agent/Ticket[ShowTitle = $s/Title and Date = $d and count(Refund) = 0]
where count($tickets) < 10 and count($tickets) > 0
return ($s/Title, $d)
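The grouping logic of query 1 (non-refunded tickets identify the audience of the performed show) can be sketched in plain Python on toy data; the show names, dates, and counts below are illustrative, not taken from the exam text:

```python
from collections import Counter

# Toy data mirroring the DTD: each ticket is (ShowTitle, Date, refunded?)
tickets = [
    ("Godot", "2019-02-19", False)] * 8 + [
    ("Endgame", "2019-02-19", True)] * 3 + [
    ("Godot", "2019-03-01", False)] * 12

# Non-refunded tickets per (show, date): the audience actually present
audience = Counter((t, d) for (t, d, refunded) in tickets if not refunded)

# Shows performed at least once for fewer than 10 people (and at least 1)
small = sorted((t, d) for (t, d), n in audience.items() if 0 < n < 10)
print(small)  # [('Godot', '2019-02-19')]
```

As in the XQuery, fully refunded (show, date) pairs never reach the result, because only non-refunded tickets are counted.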

2.
for $SC in distinct-values(doc(..)/Theatre/Agent/Ticket/SeatColumn)
for $SR in distinct-values(doc(..)/Theatre/Agent/Ticket/SeatRow)
let $MaxNumberOfTickets :=
    max( for $a in doc(..)//Agent
         return count($a/Ticket[SeatColumn = $SC and SeatRow = $SR]) )
return ($SC, $SR,
        doc(..)//Agent[count(Ticket[SeatColumn = $SC and SeatRow = $SR]) = $MaxNumberOfTickets]/Name,
        sum(doc(..)//Ticket[SeatColumn = $SC and SeatRow = $SR and count(Refund) = 0]/Price))
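The two different ticket populations in query 2 (all sales for the "best agent" ranking, non-refunded sales only for the revenue) can likewise be sketched in Python; agents, seats, and prices are illustrative placeholders:

```python
from collections import Counter

# Toy tickets as (agent, seat_row, seat_col, price, refunded?)
tickets = [
    ("Anna", 1, "A", 20.0, False),
    ("Anna", 1, "A", 20.0, True),   # refunded: counts as a sale, no revenue
    ("Bruno", 1, "A", 25.0, False),
    ("Anna", 2, "B", 15.0, False),
]

seats = {(r, c) for (_, r, c, _, _) in tickets}
for seat in sorted(seats, key=str):
    # ranking: refunded tickets still count as sales
    sold = Counter(a for (a, r, c, _, _) in tickets if (r, c) == seat)
    best_agent, _ = sold.most_common(1)[0]
    # revenue: non-refunded tickets only
    revenue = sum(p for (_, r, c, p, refunded) in tickets
                  if (r, c) == seat and not refunded)
    print(seat, best_agent, revenue)
```

For seat (1, "A") the best agent is Anna (2 sales including the refunded one) while the revenue is 45.0 (her refunded 20.0 excluded, Bruno's 25.0 included), exactly mirroring the two predicates in the XQuery.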
D.

a) As far as SEAT is concerned, the query only needs the total number of seats, which is 400. The 5 blocks can be loaded into memory for counting (with a cost of 5 I/O); alternatively, one can exploit the statistics that the system maintains for each table and use them directly, without accessing the blocks of the table, with a cost of 0 for counting the SEAT tuples. These considerations also apply to cases b) and c). We will therefore omit the cost of accessing SEAT (alternatively, it was correct to include +5 in each formula).

We must then compute the cost of the join between SHOW and TICKET. A scan can be made on SHOW (about 15 blocks) to select Beckett's shows, which amount to about 6 tuples (150 shows / 25 distinct authors = 6). These 6 tuples can be cached.
For these 6 tuples we scan TICKET (10K blocks).
Notice that in TICKET the data are sorted by date, but not by title (needed for the group by). TICKET contains 1.6K distinct dates, and for each date there are 125 tuples. Each block contains 20 tuples: this means that the 6-7 blocks sharing the same date must be sorted to obtain the ordering by title as well. We can assume this is done in main memory, without further sorting cost.
The total cost is 15 + 10K (or 15 + 6*10K if the 6 tuples are not cached).
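The block-count arithmetic of case a) can be sanity-checked with plain Python; all figures are those given in the exam text:

```python
# Cost bookkeeping for case a): full scans, no secondary indexes.
n_tickets, n_blocks_ticket = 200_000, 10_000
n_shows, n_blocks_show = 150, 15
val_date, val_author = 1_600, 25

beckett_shows = n_shows // val_author            # 6 tuples for Beckett
tuples_per_block = n_tickets // n_blocks_ticket  # 20 tuples per block
per_date = n_tickets // val_date                 # 125 tuples per date
blocks_per_date = -(-per_date // tuples_per_block)  # ceil(125/20) = 7

cost_cached = n_blocks_show + n_blocks_ticket    # shows cached: one scan
cost_uncached = n_blocks_show + beckett_shows * n_blocks_ticket
print(beckett_shows, per_date, blocks_per_date)  # 6 125 7
print(cost_cached, cost_uncached)                # 10015 60015
```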

b) Differently from case a), we can access the leaves of the B+ tree on (Date, ShowTitle), obtaining a cost of 15 + 3.6K.
The B+ tree on Author can also be used to fetch Beckett's tuples: 2 accesses are needed on the index, and 6 pointers are followed to get the tuples containing the data about his shows (which can be cached). The cost is then 2 + 6 + 3.6K.

c) Differently from case b), after accessing Beckett's shows we obtain his show titles, which can be used to access the new B+ tree on (ShowTitle, Date). For each of Beckett's shows, 3 accesses are needed. In the leaves we find 200K tuples / 150 shows = 1333 entries (with ShowTitle, Date and a pointer) per show, spread over its dates. Each leaf contains 200K tuples / 3.6K leaves = 55 entries, so the 1333 entries occupy around 24-25 blocks that must be read.
Therefore the cost will be 2 + 6 + 3*24 = 80 I/O.
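The index arithmetic of cases b) and c) can be checked the same way; again the figures are those stated in the text:

```python
# Cost bookkeeping for cases b) and c): B+ tree accesses on TICKET.
n_tickets, n_shows, leaves = 200_000, 150, 3_600

# b) scan the leaves of B+(Date, ShowTitle) instead of the whole table
cost_b = 15 + leaves                      # 3615 (2 + 6 + 3600 variant: 3608)

# c) descend B+(ShowTitle, Date) once per Beckett show
entries_per_show = n_tickets // n_shows           # 1333 entries per show
entries_per_leaf = n_tickets // leaves            # 55 entries per leaf
leaf_blocks = entries_per_show // entries_per_leaf  # ~24 leaf blocks
cost_c = 2 + 6 + 3 * leaf_blocks
print(entries_per_show, entries_per_leaf, leaf_blocks, cost_c)  # 1333 55 24 80
```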
