exercises
Databases, exercises
1. Exercises
1.1. ER modeling
1.2. Relational schemas, relational algebra
1.3. Physical organization
1.4. Functional dependencies
1.5. Normal forms
1.6. Transaction management
2. Solutions of exercises
2.1. ER modeling
2.2. Relational schemas, relational algebra
2.3. Physical organization
2.4. Functional dependencies
2.5. Normal forms
2.6. Transaction management
1. Exercises
1.1. ER modeling
1. The monthly meals of a students’ restaurant shall be stored in a database. Each day
the meal contains a soup, a main course and a dessert. One dish can occur multiple
times in the same month, but we know that to each soup – main course combination,
only one dessert is suitable. For each dish, we want to store its name, its number of
calories, and the list of ingredients with the amount which is needed from the given
ingredient for the given dish. Create an ER diagram for this database [–]
1.2. Relational schemas, relational algebra
5. Transform the below ER diagram into relational schemes. Use as few relational
schemes as possible! [–]
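[Figure: ER diagram with entity sets A (attributes A1, A2, A3), B (B1, B2), C (C1, C2) and D (D1, D2, D3), connected by relationship sets P, Q, R, S, T and U.]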
7. Given are two relational schemes R(A, B) and S(B, C) furthermore two relations r(R)
and s(S). We know that the union of r and s includes (f, c) and (d, e). We also know
that their natural join includes the following elements:
A B C
a b a
f c g
Determine the two relations! [–]
8. Given are relations r and s, and their schemes R(A, B) and S(B, C) respectively. r has
$n_r$ different rows, while s has $n_s$ different rows. What is the maximal and minimal
number of rows in the natural join of the two relations (as a function of $n_r$ and $n_s$), if
• A is a key in R,
• B is a key in R,
• B is a key in both R and S,
• A is a key in R, and B is a key in S. [●]
9. If the cardinality of attribute A is less than the number of elements of the domain of A,
then A cannot be a (simple) key. Prove this statement right or wrong. [–]
10. Let r and s be two relations with the same attributes. Let furthermore X be a subset of
this common attribute set. Which of the following statements are true?
a) $\pi_X(r \cup s) = \pi_X(r) \cup \pi_X(s)$
b) $\pi_X(r \setminus s) = \pi_X(r) \setminus \pi_X(s)$ [–]
11. Given the following scheme description, provide relational algebra expressions for
the questions below.
PRODUCT(MANUFACTURER, MODEL, TYPE)
PC(MODEL, SPEED, RAM, HDD, CD, PRICE)
LAPTOP(MODEL, SPEED, RAM, HDD, SCREEN, PRICE)
PRINTER(MODEL, COLOR, TYPE, PRICE)
Questions:
• Which PC models have a speed of at least 1500?
• Which manufacturers produce a laptop with an at least 1000 GB HDD?
• Provide the model number and price of each product made by manufacturer B,
regardless of their type.
• Which manufacturers produce laptops but not PCs?
• Which manufacturers produce at least two different PCs or laptops with a speed
of at least 3 GHz? (There are no two identical model numbers!)
• All PCs faster than 3 GHz, and their manufacturers
• Those manufacturers, who produce laptops that have the same parameters as a
PC made by themselves (and possibly other laptops too)
• Those manufacturers, who produce PCs that have the same parameters as a
laptop made by them (and possibly other PCs too) [○]
13. Given are the following relations (with the obvious semantics):
likes(person, beer), sells(pub, beer), visits(person, pub).
Using relational algebra, express
a) the list of those beers which are liked by every person visiting the bars selling
the beer.
b) the list of those persons, who like each beer sold in all pubs visited by them. [○]
29. A file of 25 000 records shall be stored using a sparse index. Record length is 850 bytes,
block capacity (excluding the header) is 4000 bytes. Key size is 50 bytes, pointer size
is 18 bytes.
• At least how many blocks are needed for storing the entire structure?
• How much time does it take at most to read the contents of a record, if the RAM
has 6000 bytes free? (One block operation takes 5 milliseconds.)
• If we have 10 times as much free RAM, can the record access time be reduced?
What if we have 100 times as much RAM? How should we use this extra RAM?
[–]
30. A file of 15 525 records shall be stored using a sparse index. Record length is 850 bytes,
block capacity (excluding the header) is 4000 bytes. Key size is 50 bytes, pointer size
is 18 bytes.
• At least how many blocks are needed for storing the entire structure?
• How much time does it take at most to read the contents of a record, if the RAM
has 6000 bytes free? (One block operation takes 5 milliseconds.)
• If we have 10 times as much free RAM, can the record access time be reduced?
What if we have 100 times as much RAM? How should we use this extra RAM?
[–]
31. A file shall be stored using a dense index and a sparse index built on top of the dense
index. Give a reasonable estimation for the number of necessary blocks, given the
below conditions:
• the file contains $3 \cdot 10^6$ records,
• one record is 300 bytes,
• one block is 1000 bytes,
• key size is 45 bytes,
• pointer size is 5 bytes. [○]
32. A file of 270 000 records shall be stored. We can choose from two possibilities:
building a single-layer sparse index on top a dense index, or we build a three-layered
sparse index. Which solution can be realized with fewer blocks if we furthermore want
to achieve that 20% of each and every block in the main file and the index files is
unused (i.e. no block can be filled more than 80% of its capacity)? One block is 1900
bytes, one record is 300 bytes, the key size is 35 bytes, and the size of a pointer is 15
bytes. (We assume optimal storage that is, beyond fulfilling the above requirements our
data uses the fewest possible blocks) [○]
33. One billion records shall be stored in a database. Record size is 100 bytes, block size
is 4000 bytes, one block operation takes 5 milliseconds. We have two keys, both 10
bytes in size. Pointers occupy 32 bits. For simplicity we assume that a single block fits
in the RAM at a time. We also assume that records are stored in the most compact way.
• Suggest a storage method if we would like to search with both keys. Search can
take 40 milliseconds at maximum. The method shall support interval searches
too. Create an explanatory figure about your structure.
• A given search is expected to return 8% of all records. Suggest the most
efficient search method you can think of. [●]
34. Using bucket hashing, what should be modified on the storage structure to reduce data
access time to its half? [–]
35. We are designing a hash based data structure for storing data on CDs in our database.
For each CD we store whether the given CD stores pictures, music, videos or data. We
use a single-character field for this, whose values are P, M, V and
D, respectively. What kind of hash function should we use if we’d like to base our hash
on this field? What is the cardinality of the field? [–]
36. 1 000 000 records shall be stored in a database using bucket hashing. The size of a
record is 110 bytes, one block is 3000 bytes, a key is 25 bytes, and a pointer is 64 bits
long. Block access time is 5 ms. Record access time can be 20 ms at most. The hash
table fits in the RAM, and the hash function spreads the records evenly.
• What is the average record access time?
• How many bytes does the hash table occupy from the RAM?
• How much extra RAM would be needed to reduce record access time to its half?
[●]
37. Prove that Armstrong’s axioms can be deduced from the below three rules.
If X, Y, Z, C are attribute sets of a relational schema, then:
B1. X → X
B2. if X → YZ and Z → C then X → YZC
B3. if X → YZ then X → Y . [–]
40. Is the below set of axioms complete that is, are all logical consequences deducible from
them?
• If X ⊆ R, then X → X.
• If X, Y ⊆ R and X → Y, then XW → YW is true for an arbitrary W ⊆ R.
• If X, Y, Z ⊆ R, X → Y and Y → Z, then X → Z. [–]
41. Provide a relation r matching schema R(A, B, C), where r has 4 rows, and no non-trivial
functional dependency is true on it. [●]
43. Are the below rules true? (A, B, C, D are arbitrary sets of attributes on schema R.)
a) A → B, C → D ⇒ A ∪ (C \ B) → BD,
b) A → B, C → D ⇒ C ∪ (D \ A) → BD. [○]
44. Relation r matches schema R( A, B, C ) , and has 3 rows. Prove that there exists a
nontrivial functional dependency that r fulfills! [●]
45. Show that a 2NF relation can be redundant. Explain how its redundancy can be
eliminated. Give an example for a 2NF non-redundant relation having at least 3
elements. [–]
46. Prove that dependency sets F and G are equivalent exactly if F ⊆ G⁺ and G ⊆ F⁺. [–]
49. Prove that if R is not in BCNF, then there exist attributes A, B ∈ R such that R \ AB → A. [●]
50. Determine whether a relation matching (R, F) can contain redundancy and if yes, what
kind of redundancy. R(X, Y, Z, W), F = {XY → Z, YZ → W, X → W, WY → X}. [–]
62. Consider the following scheduling of transactions T1, T2, T3, and T4 (consecutive
operations are listed from left to right):
T2: RLOCK A; T3: RLOCK A; T2: WLOCK B; T2: UNLOCK A;
T3: WLOCK A; T2: UNLOCK B; T1: RLOCK B; T3: UNLOCK A;
T4: RLOCK B; T1: RLOCK A; T4: UNLOCK B; T1: WLOCK C;
T1: UNLOCK A; T4: WLOCK A; T4: UNLOCK A; T1: UNLOCK B;
T1: UNLOCK C.
Draw the precedence graph of the scheduling, and decide whether the scheduling is
serializable! [○]
63. Draw the precedence graph. Use locks. How does the graph change if the system is
two-phase? [●]
T1          T2
            WRITE A
WRITE B
            WRITE B
WRITE A
64. What happens according to the redo protocol if the transaction aborts in the indicated
steps (1-6)? [–]
log(T, BEGIN)
1.
LOCK(A)
LOCK(B)
2.
log(T, <old value of A>, <new value of A>)
log(T, <old value of B>, <new value of B>)
3.
log(T, COMMIT)
4.
WRITE(A)
WRITE(B)
5.
UNLOCK(A)
UNLOCK(B)
6.
65. Is the following transaction strict 2PL? If not, modify it to make it strict 2PL. What
does this protocol guarantee? [○]
LOCK A
READ A
A=A×2
WRITE A
COMMIT
UNLOCK A
66. Why can it be beneficial to use locks with timestamp-based transaction management?
[–]
67. Optional exercise: Is the below scheduling serializable with timestamp-based (R/W)
scheduling? [●]
T1 T2
t(T1) = 10 t(T2) = 20
(1) READ A
(2) WRITE A
(3) WRITE A
2. Solutions of exercises
2.1. ER modeling
Exercise 2
The ER diagram:
Notes:
• The exercise didn’t specify whether the identifiers of facilities are unique globally
or only within the same hospital. Thus, we could make Facility a weak entity set,
which is beneficial as this way we can guarantee that each facility belongs to
exactly one hospital.
• The solution cannot model that the director is also an employee of the given
hospital. One possible solution for this: Instead of making Director and Attending
Physician specialized entity sets of Doctor, we transform Works At to a weak
entity set between Doctor and Hospital, and create weak entity set Director as a
specialized entity set of Works At. Weak entity set Director will furthermore have
attribute "economy degree num". The drawback of this solution is that it doesn’t
constrain the uniqueness of Director.
• We have not defined cardinality for ternary relationship sets. Thus, Treats cannot
model that a patient is treated in a single facility.
• The diagram cannot model that a doctor is employed by at most 3 hospitals, so
this has to be included as a side note.¹ Another possible solution is defining 3
many-to-one relationships:
This method can be used well if only a few relationship sets have to be defined.
The advantage of this method is that many-to-one relationship sets can be mapped
to fewer relational schemas, as can be seen in the solution of Exercise 6.
¹ In a database using the relational data model, these restrictions can be guaranteed by using so-called consistency
conditions (check constraints) or triggers.
Exercise 3
The ER diagram:
Exercise 4
[Figure: ternary relationship set R connecting entity sets E1, E2 and E3.]
The ternary relationship set is transformed to three binary relationship sets. For this, we
create an entity set R, representing the relationship set. The attributes of the relationship
set generally do not provide uniqueness, so R will be a weak entity set the elements of
which are identified by the keys of entity sets E1, E2, and E3.
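[Figure: weak entity set R with attributes A1, A2, ..., An, connected to entity sets E1, E2 and E3 by binary relationship sets RE1, RE2 and RE3.]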
Analyzing the cardinality of ternary relationship types goes beyond the scope of the
course and is thus omitted here.²
Exercise 6
For schemas the keys or foreign keys of which are not clear from the notation of the
schema, we included the attributes of the keys separately. In the case of schemas where
each attribute in itself is a foreign key, all attributes together form the key.
If in part b) the attributes of a schema are not listed then they are the same as in part a).
² For those interested in deeper insights, see for example Trevor H. Jones, Il-Yeol Song, Binary
Equivalents of Ternary Relationships in Entity-Relationship Modeling: a Logical Decomposition Approach,
Journal of Database Management, April–June, 2000, pp. 12–19,
http://www.ischool.drexel.edu/faculty/song/publications/p_JDB99.PDF
Exercise 2
a) Schemas corresponding to entity sets and one-to-one relationship sets:
• PERSON(personID, place_of_birth, birthdate, name)
• DOCTOR(personID, place_of_birth, birthdate, name, MD_degree_num)
o DOCTOR(personID, MD_degree_num), key: personID, alternative
solution
• PATIENT(personID, place_of_birth, birthdate, name, ssn)
o PATIENT(personID, ssn), key: personID, alternative solution
• DIRECTOR(personID, place_of_birth, birthdate, name, MD_degree_num,
economy_degree_num, leads_hospital_name)
o DIRECTOR(personID, MD_degree_num, economy_degree_num,
leads_hospital_name), key: personID, alternative solution
• ATTENDING_PHYSICIAN(personID, place_of_birth, birthdate, name,
MD_degree_num)
o ATTENDING_PHYSICIAN(personID, MD_degree_num), key:
personID, alternative solution
• HOSPITAL(name)
• DISEASE(code, name)
• FACILITY(facility_code, address, hospital_name),
key: hospital_name, facility_code
Schemas corresponding to binary many-to-many and ternary relationship types:
• HAS(personID, disease_code)
• TREATS(doctor_personID, patient_personID, hospital_name, facility_code), key:
doctor_personID, patient_personID, hospital_name, facility_code
• WORKS_AT(personID, hospital_name)
The above alternative solutions work like this: For each specialized entity, a relation
element is created in the relation of the parent schema. This stores the common attribute
values for the specialized entity. Furthermore, a relation element is created for the
specialized entity too, in the corresponding relation. This stores the attribute values
specific to the specialized entity. The key of the latter is also a foreign key referencing
the former. This way, the key constraint will hold in the mapped relational schema as
well (e.g. it will not be possible for a director and a patient to have the same
personID).
Note 4 of the solution of Exercise 2 (the constraint that one doctor is employed by at most
three hospitals cannot be represented on an ER diagram) holds after the transformation
as well. If we created three many-to-one relationship types as suggested in the above
note, then these can be mapped to the following schemas instead of a single WORKS_AT
schema:
• WORKS_AT_1(personID, hospital_name)
• WORKS_AT_2(personID, hospital_name)
• WORKS_AT_3(personID, hospital_name)
b) Schemas corresponding to entity sets and one-to-one relationship sets: alternative
mappings for specialized entity sets are not used, PERSON schema is eliminated (as all
relevant people are either doctors or patients³). The personID and economy degree
number of the director will be stored in the HOSPITAL schema (guaranteeing that each
hospital can have at most one director). Thus, the DIRECTOR schema will be
unnecessary.
• PATIENT
• DOCTOR
• ATTENDING_PHYSICIAN
• HOSPITAL (name, director_personID, director_economy_degree_num)
• DISEASE
• FACILITY
Schemas of binary many-to-many and ternary relationship sets are unchanged
• HAS
• TREATS
• WORKS_AT
Exercise 3
a) Schemas corresponding to entity sets and one-to-one relationship sets (alternative
solutions for specialized entity sets work the same way as in Exercise 2 – these will not
be listed here):
• PEOPLE(person_ID, name, place_of_birth, date_of_birth)
• STAFF(person_ID, name, place_of_birth, date_of_birth, emp_ID)
• PATIENT(person_ID, name, place_of_birth, date_of_birth, ssn)
• DEPT_STAFF(person_ID, name, place_of_birth, date_of_birth, emp_ID)
• HOSP_SUPPL(person_ID, name, place_of_birth, date_of_birth, emp_ID)
• DOCTOR(person_ID, name, place_of_birth, date_of_birth, emp_ID,
MD_degree_num, position, doc_assoc_ID)
• NURSE(person_ID, name, place_of_birth, date_of_birth, emp_ID)
• DEPT_SUPPL(person_ID, name, place_of_birth, date_of_birth, emp_ID)
• DIRECTOR(person_ID, name, place_of_birth, date_of_birth, emp_ID,
MD_degree_num, position, doc_assoc_ID, eco_degree_num,
leads_h_hospital_name)
• DISEASE(code, name)
• HOSPITAL(name)
³ The PEOPLE schema can be considered ’abstract’ in object-oriented programming terminology. Note:
It is not abstract if the alternative solution for the mapping of Exercise 2 is used, as in that case, it will contain
entities.
• DEPARTMENT(dep_name, hospital_name, leads_d_person_ID), key: dep_name,
hospital_name
Schemas for many-to-one relationship types:
• WORKS_D(person_ID, hospital_name, dep_name), key: person_ID
• WORKS_H(person_ID, hospital_name), key: person_ID
Schema corresponding to the ternary relationship set:
• HAS(person_ID, hospital_name, department_name, disease_code)
Note: DEPARTMENT is a weak entity set, thus determining relationship set
BELONGS_TO cannot be mapped to a separate schema, as that way DEPARTMENT
would not have a key. In schema WORKS_D, the foreign key referring to
DEPARTMENT is a composite foreign key containing the entire key of DEPARTMENT
(dep_name, hospital_name). Thus, in WORKS_D, dep_name and hospital_name are not
separate foreign keys.
b) Schemas corresponding to entity sets and one-to-one relationship sets: similarly to the
schemas in Exercise 2, schema PEOPLE can be eliminated, and DOCTOR, and
DIRECTOR can be converted to a single schema. The STAFF schema can be similarly
eliminated. For many-to-one relationship types WORKS_H and WORKS_D, no separate
schemas are created. For the latter, we include the appropriate foreign keys in
DEPT_STAFF, while for the former one, we include the appropriate foreign key in
HOSP_SUPPL.
• PATIENT
• DEPT_STAFF(person_ID, name, place_of_birth, date_of_birth, emp_ID,
works_d_hospital_name, works_d_dep_name)
• HOSP_SUPPL(person_ID, name, place_of_birth, date_of_birth, emp_ID,
works_h_hospital_name)
• DOCTOR(person_ID, name, place_of_birth, date_of_birth, emp_ID,
MD_degree_num, position, doc_assoc_ID, works_d_hospital_name,
works_d_dep_name)
• NURSE(person_ID, name, place_of_birth, date_of_birth, emp_ID,
works_d_hospital_name, works_d_dep_name)
• DEPT_SUPPL(person_ID, name, place_of_birth, date_of_birth, emp_ID,
works_d_hospital_name, works_d_dep_name)
• DISEASE
• HOSPITAL(name, director_person_ID, director_eco_degree_num)
• DEPARTMENT
Schema corresponding to the ternary relationship set:
• HAS
Exercise 8
Natural join will have at least 0 rows – 0 if and only if there is no common value of
attribute B in r(R) and s(S).
A is a key in R
At most $n_r \cdot n_s$, if the B values in all rows of r and s are equal.
B is a key in R
Let’s create the join by finding the matching rows in r for each row of s. Since B is a key
in R, r’s attribute values on B are unique. Thus, we can find at most one matching row in
r for each row of s, that is, the resulting relation will have at most $n_s$ rows.
A is a key in R, B is a key in S
From the viewpoint of the number of rows in the resulting relation, it is irrelevant that A
is a key. The solution is thus the mirror image of the case where B is a key in R: the
resulting relation will have at most $n_r$ rows.
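The bounds above can be checked on small hand-made relations. The following is a minimal sketch, assuming relations are represented as Python sets of tuples; the sample values and the helper name `natural_join` are illustrative and not part of the exercise.

```python
def natural_join(r, s):
    """Natural join of r(A, B) and s(B, C), both given as sets of tuples."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}

# A is a key in R, and every row of r and s has the same B value:
# each row of r matches each row of s, so the join has n_r * n_s rows.
r_same_b = {(1, "x"), (2, "x"), (3, "x")}
s_same_b = {("x", 10), ("x", 20)}
join_max = natural_join(r_same_b, s_same_b)

# B is a key in R: every row of s matches at most one row of r,
# so the join cannot have more than n_s rows.
r_b_key = {(1, "x"), (2, "y"), (3, "z")}
s_any = {("x", 10), ("x", 20), ("y", 30), ("w", 40)}
join_b_key = natural_join(r_b_key, s_any)

# No common B value at all: the join is empty.
join_empty = natural_join({(1, "p")}, {("q", 10)})

print(len(join_max), len(join_b_key), len(join_empty))
```

The second case shows that the upper bound $n_s$ need not be reached: the row of s with B value "w" has no partner in r.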
Exercise 11
Provide the model number and price of each product made by manufacturer B,
regardless of their type.
$$r = \sigma_{MANUFACTURER='B'}\big(\pi_{MANUFACTURER,MODEL,PRICE}(product \bowtie pc) \cup \pi_{MANUFACTURER,MODEL,PRICE}(product \bowtie laptop) \cup \pi_{MANUFACTURER,MODEL,PRICE}(product \bowtie printer)\big)$$
Which manufacturers produce at least two different PCs or laptops with a speed of
at least 3 GHz? (There are no two identical model numbers!)
$s = \sigma_{SPEED \ge '3000'}(\pi_{MODEL,SPEED}(pc) \cup \pi_{MODEL,SPEED}(laptop)) \bowtie product$
$t = \pi_{MANUFACTURER,MODEL}(s)$
$r = \pi_{MANUFACTURER}(t \bowtie_{MANUFACTURER_1 = MANUFACTURER_2 \wedge MODEL_1 \ne MODEL_2} t)$
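The self-join used for the "at least two different models" question can be traced on sample data. This is only a sketch: the rows below are invented for illustration, speeds are assumed to be stored in MHz, and the PC and LAPTOP relations are reduced to a MODEL to SPEED mapping.

```python
# Hypothetical sample data; model numbers are unique across all products.
product = {
    ("A", 1001, "pc"), ("A", 2001, "laptop"),
    ("B", 1002, "pc"), ("B", 1003, "pc"),
    ("C", 2002, "laptop"),
}
pc = {1001: 3200, 1002: 2800, 1003: 3500}   # MODEL -> SPEED
laptop = {2001: 3000, 2002: 3100}           # MODEL -> SPEED

# s: models of PCs or laptops with SPEED >= 3000, joined with product.
fast_models = {m for m, speed in {**pc, **laptop}.items() if speed >= 3000}

# t: projection of s onto (MANUFACTURER, MODEL).
t = {(manu, model) for (manu, model, _kind) in product if model in fast_models}

# r: theta-join of t with itself on equal manufacturer and different model,
# projected onto MANUFACTURER.
r = {m1 for (m1, model1) in t for (m2, model2) in t
     if m1 == m2 and model1 != model2}

print(sorted(r))
```

Manufacturer B has two PCs, but only one of them is fast enough, so it does not appear in the result.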
Those manufacturers, who produce laptops that have the same parameters as a PC
made by themselves (and possibly other laptops too).
$l = \pi_{MANUFACTURER,MODEL,SPEED,RAM,HDD}(laptop \bowtie product)$
$p = \pi_{MANUFACTURER,MODEL,SPEED,RAM,HDD}(pc \bowtie product)$
$r = \pi_{l.MANUFACTURER}(l \bowtie_{l.MANUFACTURER = p.MANUFACTURER \wedge l.SPEED = p.SPEED \wedge l.RAM = p.RAM \wedge l.HDD = p.HDD} p)$
Those manufacturers, who produce PCs, that have the same parameters as a laptop
made by them (and possibly other PCs too).
The solution is identical to that of the previous one.
Exercise 13
Exercise 31
Exercise 32
The useful block size is 1520 bytes, the data file contains 54 000 blocks.
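The two figures above follow directly from the data of the exercise. A minimal arithmetic check, using integer arithmetic only (the variable names are illustrative):

```python
block_size = 1900        # bytes
record_size = 300        # bytes
n_records = 270_000

# Only 80% of every block may be used.
useful_block_size = block_size * 80 // 100

# Records are not split across blocks, so round down.
records_per_block = useful_block_size // record_size

# Ceiling division: a partially filled last block still counts as a block.
data_blocks = -(-n_records // records_per_block)

print(useful_block_size, records_per_block, data_blocks)
```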
Exercise 33
One block fits $f_i = \lfloor b / (k + p) \rfloor = \lfloor 4000 / (10 + 4) \rfloor = 285$ index entries.
1. The need for interval searches excludes the possibility for hash organization. We need
to use some kind of an index based organization. In order for a search to not need more
than 40 ms, we can perform at most 8 block operations. One possible solution is to build
dense indices for each search key, on top of the data file, and then we build a B* tree on
top of each dense index.
Dense indices have $10^9$ entries, for this $\lceil 10^9 / 285 \rceil = 3\,508\,772$ blocks are needed.
In case of a B* tree, we can fit an extra pointer into a block, even if its key does not have
any space left. Here, we can do this, as $285 \cdot (10 + 4) + 4 = 3994$, so the branching factor of
the tree will be 286. In order to be able to search in the dense index, we need a tree of
$\lceil \log_{286} 3\,508\,772 \rceil = 3$ layers. Thus, to read a record, $3 + 1 + 1 = 5$ block operations are
needed, which only takes 25 ms.
Note: In the course of solving this exercise, it has not yet been necessary to calculate the
number of blocks in the data file due to the indirection provided by the dense index.
2. 8% of all records: $10^9 \cdot 0.08 = 8 \cdot 10^7$ records. Let’s assume that the data file is not
ordered by the search key (in case of two keys it is very unlikely to be sortable by both
keys anyway). In this case, as reading one record takes 5 block operations, in order to read
all the records included in the result, $(8 \cdot 10^7) \cdot 5 = 4 \cdot 10^8$ block operations are needed. If the
block access time is 5 ms, this is more than 23 days.
The problem is caused by always having to traverse the entire index structure. Idea: if we
only traverse the blocks of the dense index (3 508 772 block operations) and read the
corresponding $8 \cdot 10^7$ records based on that, $(3\,508\,772 + 8 \cdot 10^7) \cdot 5\ \text{ms} \approx 4.8$ days are
enough to retrieve all the needed records.
Interestingly, the best solution is traversing the data file itself. One block fits
$f_r = \lfloor b / s_r \rfloor = \lfloor 4000 / 100 \rfloor = 40$ data records and thus, $b_r = n_r / f_r = 10^9 / 40 = 2.5 \cdot 10^7$ blocks make up the
data file. To traverse this, we ”only” need $(2.5 \cdot 10^7) \cdot 5\ \text{ms} \approx 1.5$ days.
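The arithmetic of this solution can be reproduced in a few lines. The sketch below assumes 4-byte pointers (32 bits, as stated in the exercise) and uses illustrative variable names; it compares the three ways of retrieving 8% of the records.

```python
n_records = 10**9
record_size, block_size = 100, 4000      # bytes
key_size, pointer_size = 10, 4           # bytes (32-bit pointers)
block_time_ms = 5
MS_PER_DAY = 24 * 60 * 60 * 1000

# Index entries per block and size of one dense index.
entries_per_block = block_size // (key_size + pointer_size)
dense_index_blocks = -(-n_records // entries_per_block)    # ceiling division

# B* tree on top of the dense index: one extra pointer fits in each block.
assert entries_per_block * (key_size + pointer_size) + pointer_size <= block_size
fanout = entries_per_block + 1
levels, reach = 0, 1
while reach < dense_index_blocks:
    reach *= fanout
    levels += 1

ops_per_record = levels + 1 + 1          # tree levels + dense index + data block
hits = n_records * 8 // 100              # 8% of all records

# Strategy 1: look up every hit through the whole index structure.
days_via_tree = hits * ops_per_record * block_time_ms / MS_PER_DAY
# Strategy 2: scan the dense index once, then fetch each hit.
days_via_dense_scan = (dense_index_blocks + hits) * block_time_ms / MS_PER_DAY
# Strategy 3: scan the data file itself.
data_blocks = n_records // (block_size // record_size)
days_full_scan = data_blocks * block_time_ms / MS_PER_DAY

print(levels, ops_per_record, ops_per_record * block_time_ms)
print(round(days_via_tree, 1), round(days_via_dense_scan, 1), round(days_full_scan, 1))
```

The comparison makes the point of the solution visible: once a query returns a sizeable fraction of the file, reading every data block sequentially costs fewer block operations than following one pointer per matching record.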
Exercise 36
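2. One data block fits $f_r = \lfloor b / s_r \rfloor = \lfloor 3000 / 110 \rfloor = 27$ records at maximum. As a record has to be
reached in at most 20 ms, that is, in 4 block operations, one bucket contains
$4 \cdot 27 = 108$ records in this case. The file contains $B = \lceil n_r / 108 \rceil = \lceil 10^6 / 108 \rceil = 9260$ buckets, for
addressing which we need $9260 \cdot 8 = 74\,080$ bytes, and this is how much space the hash
table occupies in the RAM.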
3. See Exercise 34. Here $N / B = 10^6 / 9260 \approx 108 \gg 1$, thus the number of buckets has to be
doubled, needing 74 080 bytes of extra RAM space; this is by how much the hash table
will grow. Let’s see how this affects record access times!
In case we have $B' = 9260 \cdot 2 = 18\,520$ buckets, $n_r / B' = 10^6 / 18\,520 \approx 54$ records will be in a
single bucket, which needs $54 / f_r = 54 / 27 = 2$ blocks per bucket.
Average record access time: $t_{average} = \frac{1 + 2}{2} \cdot 5\ \text{ms} = 7.5\ \text{ms}$.
Notes:
• We haven’t considered that the last blocks of buckets are not full and thus, our
average is not exactly precise. See the following example!
[Figure: bucket contents in case a) and case b)]
In case a) we store 5 records. To reach the first four, we need 1 block operation,
while to reach the 5th, we need 2 block operations. This adds up to an average of
$(4 \cdot 1 + 1 \cdot 2) / 5 = 1.2$ block operations. In case b) 8 records are stored, so the average
number of block operations is $(4 \cdot 1 + 4 \cdot 2) / 8 = 1.5$. With the above method, we get
$(1 + 2) / 2 = 1.5$ block operations. Since this is an insignificant difference, but many
calculations can be spared, we consider the block-level estimate to be good enough.
In any case, this is just an estimate, and the exact block access times depend on a
number of factors.
• Key size is irrelevant to the solution.
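The numbers of this solution can be recomputed as follows. This is a sketch under the assumptions used in the solution text: hash table entries are 8 bytes each, the hash table itself is in RAM, and the average is taken at block level (the estimate discussed in the notes above). Variable names are illustrative.

```python
n_records = 10**6
record_size, block_size = 110, 3000      # bytes
table_entry_size = 8                     # bytes per bucket pointer, as in the solution
block_time_ms, max_access_ms = 5, 20

records_per_block = block_size // record_size
max_blocks_per_bucket = max_access_ms // block_time_ms
bucket_capacity = max_blocks_per_bucket * records_per_block

buckets = -(-n_records // bucket_capacity)            # ceiling division
hash_table_bytes = buckets * table_entry_size

# Doubling the number of buckets halves the length of the bucket chains.
buckets_doubled = 2 * buckets
extra_ram_bytes = (buckets_doubled - buckets) * table_entry_size
records_per_bucket = -(-n_records // buckets_doubled)
blocks_per_bucket = -(-records_per_bucket // records_per_block)

def average_access_ms(blocks_in_bucket):
    """Block-level estimate: on average, half of the chain is read."""
    return (1 + blocks_in_bucket) / 2 * block_time_ms

print(buckets, hash_table_bytes, extra_ram_bytes)
print(average_access_ms(blocks_per_bucket))
```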
Exercise 37
Exercise 38
According to the definition, X → Y holds if for any two rows t, t' ∈ r(R) of the relation,
at each point in time it is true that if t[X] = t'[X], then t[Y] = t'[Y] (where t[Z] means
π_Z(t), that is, the projection of tuple t onto attribute set Z).
1. Due to X → Y, for all t, t' ∈ r(R): if t[X] = t'[X], then t[Y] = t'[Y].
2. Due to Y → Z, for all u, u' ∈ r(R): if u[Y] = u'[Y], then u[Z] = u'[Z].
According to the first statement, rows that are equal on X are equal on Y too.
According to the second statement, rows that are equal on Y are equal on Z too. By
connecting the two statements, we can see that if (at any point in time) there are two rows
that are equal on X, then these will be equal on Z too. This fulfills the requirement of
dependency X → Z.
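The definition used in this proof translates directly into a check on a concrete relation. The sketch below is not a proof; it only tests the definition on a small relation and then searches all relations with at most 3 rows over a two-valued domain for a counterexample to transitivity. The helper name `fd_holds` and the sample rows are illustrative.

```python
from itertools import combinations, product

def fd_holds(rows, lhs, rhs):
    """True if lhs -> rhs holds on rows (an iterable of dicts)."""
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in lhs)     # t[X]
        y = tuple(t[a] for a in rhs)     # t[Y]
        # Two rows that agree on X must agree on Y as well.
        if seen.setdefault(x, y) != y:
            return False
    return True

rows = [
    {"X": 1, "Y": "a", "Z": "p"},
    {"X": 2, "Y": "a", "Z": "p"},
    {"X": 3, "Y": "b", "Z": "q"},
]
print(fd_holds(rows, "X", "Y"), fd_holds(rows, "Y", "Z"), fd_holds(rows, "X", "Z"))

# Exhaustive search for a relation where X -> Y and Y -> Z hold but X -> Z fails.
all_tuples = [dict(zip("XYZ", values)) for values in product((0, 1), repeat=3)]
counterexamples = [
    rel
    for size in range(4)
    for rel in combinations(all_tuples, size)
    if fd_holds(rel, "X", "Y") and fd_holds(rel, "Y", "Z") and not fd_holds(rel, "X", "Z")
]
print(len(counterexamples))
```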
Exercise 39
Exercise 40
False. Note that rule 2 is identical to the augmentation axiom, and rule 3 is identical to
the transitivity axiom.
The problem is that the reflexivity axiom (trivial dependency) cannot be deduced from
the provided rules. Thus, those dependencies X → Y where Y ≠ X, but
which are true due to Y ⊆ X, cannot be deduced from the provided axioms if they are
not present in the original set of dependencies. For example, in case of the empty
dependency set F = ∅:
• Rule 1 can only be used to deduce dependencies like X → X.
• Using rule 2, we have to expand both sides of the dependencies at the same time.
Since Y ≠ X and since we can only expand dependencies like X → X, we cannot
obtain dependencies like X → Y.
• In order to get a dependency like X → Y, we would need an attribute set Z for which
X → Z and Z → Y. Such a Z does not exist, however, as using the other rules, we
could not generate a pair of different dependencies where the right hand side of one
dependency is the same as the left hand side of the other dependency.
Those dependencies X → Y which are consequences of Y ⊆ X thus cannot
necessarily be deduced (e.g. in case of an empty dependency set), and thus this set of axioms
is not complete.
Example: In case of relational schema R(AB) and an empty dependency set, dependencies
AB → A, AB → B, and trivial dependencies concerning the empty set (A → ∅, B → ∅,
AB → ∅) cannot be deduced.
• Using rule 1, trivial dependencies A → A, B → B, AB → AB and ∅ → ∅ can be
deduced.
• Using rule 2, both sides of dependencies have to be expanded at the same time and
thus, for example AB → A cannot be deduced.
• Rule 3 cannot be applied, as we do not have different dependencies where the left hand side
of one dependency is identical to the right hand side of the other dependency.
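The example can also be verified mechanically by applying the three rules until nothing new appears. The following is a brute-force sketch for small schemas only; the function names are illustrative, and attribute sets are represented as frozensets.

```python
from itertools import combinations

def subsets(attrs):
    """All subsets of the attribute set, including the empty set."""
    items = sorted(attrs)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def derive(attrs, given=()):
    """Dependencies derivable from `given` using rules 1-3 of Exercise 40."""
    all_sets = subsets(attrs)
    deps = set(given) | {(x, x) for x in all_sets}                 # rule 1
    while True:
        new = {(x | w, y | w) for (x, y) in deps for w in all_sets}      # rule 2
        new |= {(x, z) for (x, y) in deps for (y2, z) in deps if y == y2}  # rule 3
        if new <= deps:          # fixed point reached
            return deps
        deps |= new

derived = derive("AB")
for lhs, rhs in sorted(derived, key=lambda d: sorted(d[0])):
    print("".join(sorted(lhs)) or "∅", "→", "".join(sorted(rhs)) or "∅")

ab, a = frozenset("AB"), frozenset("A")
print((ab, a) in derived)
```

On R(AB) with an empty dependency set, the fixed point contains only the four dependencies of the form X → X, so AB → A is indeed missing even though it holds in every relation.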
Exercise 41
Exercise 43
a) true
b) false
Exercise 44
When solving Exercise 41, we prove that in case of relational schema R( A, B, C ) the below
dependencies can hold.
2.5. Normal forms
Exercise 47
Exercise 48
Exercise 49
Exercise 62
Exercise 63
The schedule is not serializable, as there is no serial schedule whose effects are all
equal to those of our schedule. The reason is as follows: At the end of the schedule, it
was T1 who modified item A most recently, while it was T2 who modified item B most
recently. Among the possible serial schedules, in the case of T1T2, the last transaction
to modify both data items is T2. In the case of T2T1, the last transaction to modify both data
items is T1. The precedence graph of any legal schedule, in the case of any locking method
(e.g. if we replace each WRITE X command with a LOCK X, WRITE X, UNLOCK X
series) will be as follows:
[Figure: precedence graph on nodes T1 and T2]
In case of 2PL: We know that if each transaction of a legal schedule follows 2PL, then the
schedule is serializable. Thus, if a schedule is not serializable, there cannot be a legal
schedule composed of 2PL transactions.
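The precedence graph test can be sketched in code. The example treats every WRITE X as LOCK X, WRITE X, UNLOCK X with a single lock type, and draws an edge Ti → Tj whenever Tj is the next transaction to lock an item after Ti. The interleaving used below is an assumption chosen to agree with the solution text (T1 is the last writer of A, T2 is the last writer of B, and both transactions write both items); the function names are illustrative.

```python
def precedence_edges(schedule):
    """schedule: list of (transaction, item) steps, each meaning
    LOCK item; WRITE item; UNLOCK item. Returns the set of edges (Ti, Tj)."""
    last_holder = {}
    edges = set()
    for txn, item in schedule:
        previous = last_holder.get(item)
        if previous is not None and previous != txn:
            edges.add((previous, txn))
        last_holder[item] = txn
    return edges

def has_cycle(edges):
    """True if the directed graph given by the edge set contains a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)

    def reachable(start, target, seen):
        for nxt in graph.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen and reachable(nxt, target, seen | {nxt}):
                return True
        return False

    return any(reachable(node, node, {node}) for node in graph)

# Assumed interleaving of the four WRITE operations (see the lead-in).
schedule = [("T2", "A"), ("T1", "B"), ("T2", "B"), ("T1", "A")]
edges = precedence_edges(schedule)
print(sorted(edges), has_cycle(edges))
```

A cycle in the graph means that no serial order of T1 and T2 is consistent with the order in which the items were accessed, which matches the argument given in the solution.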
Exercise 65
The transaction is 2PL, but not strict. By exchanging the WRITE A and COMMIT rows, the transaction will be
strict 2PL. The protocol ensures serializability and avoids cascading aborts.
LOCK A       (synchronization point)
READ A
A=A×2
COMMIT       (commit point)
WRITE A      (writing over)
UNLOCK A
Exercise 67
If the transaction that wrote X most recently – i.e. the transaction because of which C(X) is
false – commits, the scheduler sets the C(X) bit to true. If this transaction aborts, both X
and W(X) have to be reset to their previous values, and all transactions waiting for X have
to repeat their read/write attempts. Using the commit bit eliminates the problems of dirty
reads and the problem of inconsistency caused by Thomas’ Write Rule, as this way we
only apply the rule for ”committed items”, otherwise (in the case of C(X) = false) the
transaction willing to write has to wait.
Note: The commit bit works the same way as locks used for timestamp scheduling.
If we use Thomas’ Write Rule in the above example (that is, we do not modify timestamps,
we omit the write operation in step (3), and do not abort the transaction), the effect of the
resulting schedule will be identical to that of serial schedule T1, T2, on any consistent
database:
T1 T2 T1 T2
(1) READ A (1) READ A
(2) WRITE A (2) WRITE A
(3) – (3) WRITE A
Using Thomas’ Write Rule, we have thus found a ”trick” by which we can find serial
equivalents to certain originally non-serializable schedules.
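The effect of Thomas’ Write Rule on the example can be traced with a small simulation of read/write timestamps. This is a simplified sketch: it ignores commit bits and does not restart aborted transactions. The assignment of the three operations to T1 and T2 is an assumption (T1 reads A, T2 writes A, then T1 writes A), chosen because it is the case in which step (3) is the write that the rule omits; the timestamps 10 and 20 come from the exercise.

```python
def run(schedule, timestamps, thomas=False):
    """Timestamp-based scheduling with read and write timestamps per item.
    Returns one outcome per step: 'ok', 'skip' or 'abort'."""
    read_ts, write_ts = {}, {}
    outcomes = []
    for txn, op, item in schedule:
        ts = timestamps[txn]
        r, w = read_ts.get(item, 0), write_ts.get(item, 0)
        if op == "READ":
            if ts < w:                       # item already written by a younger transaction
                outcomes.append("abort")
            else:
                read_ts[item] = max(r, ts)
                outcomes.append("ok")
        else:                                # WRITE
            if ts < r:                       # a younger transaction has already read the item
                outcomes.append("abort")
            elif ts < w:                     # obsolete write
                outcomes.append("skip" if thomas else "abort")
            else:
                write_ts[item] = ts
                outcomes.append("ok")
    return outcomes

timestamps = {"T1": 10, "T2": 20}
schedule = [("T1", "READ", "A"), ("T2", "WRITE", "A"), ("T1", "WRITE", "A")]

print(run(schedule, timestamps))
print(run(schedule, timestamps, thomas=True))
```

Without the rule, step (3) forces T1 to abort. With the rule, the obsolete write is simply dropped, and the final value of A is the one written by T2, just as in the serial schedule T1, T2.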