DBMS Module-6
The registers are present inside the CPU, so they have the least access time. Registers are the most
expensive and the smallest in size, generally in kilobytes. They are implemented using flip-flops.
Level-1 − Cache
Cache memory is used to store the segments of a program that are frequently accessed by the processor. It is expensive and
smaller in size generally in Megabytes and is implemented by using static RAM.
Main memory communicates directly with the CPU, and with auxiliary memory devices through an I/O processor. It is less
expensive than cache memory and larger in size, generally in gigabytes. This memory is implemented using dynamic RAM.
Secondary storage devices such as magnetic disks are present at level 3. They are used as backup storage. They are cheaper than
main memory and larger in size, generally a few TB.
• Here the records of each file are stored one after the other in a sequential
manner. This can be achieved in two ways:
• Records are stored one after the other as they are inserted into the
tables.
• This method is called the pile file method.
•Indexed Clusters: Here records are grouped based on the cluster key and stored
together. Our example above illustrating the STUDENT-COURSE cluster is an indexed
cluster. The records are grouped based on the cluster key, COURSE_ID, and all the
related records are stored together. This method is used when data is retrieved
for a range of cluster key values or when there is large data growth in the clusters,
for example when we have to select the students who are attending the courses with
COURSE_ID 230-240, or when there is a large number of students attending the same
course, say 250.
•Hash Clusters: This is similar to the indexed cluster. Instead of storing the
records based on the cluster key itself, we generate a hash key value for the cluster key
and store the records with the same hash key value together on disk.
Advantages of Clustered File Organization
This method is best suited when there are frequent requests to join
the tables with the same joining condition.
When there is a 1:M mapping between the tables, it gives efficient
results.
Example (division method, h(k) = k mod M):
k = 12345, M = 95
h(12345) = 12345 mod 95 = 90
k = 1276, M = 11
h(1276) = 1276 mod 11 = 0
Example (mid-square method):
k = 60
k x k = 60 x 60 = 3600
h(60) = 60
The hash value obtained is 60.
Example (folding method):
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3 = 12 + 34 + 5 = 51
h(K) = 51
Example (multiplication method):
k = 12345, A = 0.357840, M = 100
h(12345) = floor[ 100 (12345*0.357840 mod 1) ]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
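The four worked examples above can be checked with a short Python sketch. The function names and the conventional method names (division, mid-square, folding, multiplication) are assumptions added here for illustration; the slide gives only the arithmetic.

```python
import math

def division_hash(k, m):
    """Division method: remainder of k divided by m."""
    return k % m

def mid_square_hash(k, digits=2):
    """Mid-square method: square k and keep the middle `digits` digits."""
    squared = str(k * k)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])

def folding_hash(k, part_len=2):
    """Folding method: split the decimal digits into parts and add them."""
    text = str(k)
    parts = [int(text[i:i + part_len]) for i in range(0, len(text), part_len)]
    return sum(parts)

def multiplication_hash(k, a, m):
    """Multiplication method: floor(m * fractional part of k * a)."""
    return math.floor(m * ((k * a) % 1))

print(division_hash(12345, 95))                # 90
print(division_hash(1276, 11))                 # 0
print(mid_square_hash(60))                     # 60 (middle digits of 3600)
print(folding_hash(12345))                     # 51 (12 + 34 + 5)
print(multiplication_hash(12345, 0.357840, 100))  # 53
```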
The above diagram shows data block addresses that are the same as the primary key
values. The hash function can also be a simple mathematical function such as
exponential, mod, cos, or sin. Suppose we have a mod (5) hash
function to determine the address of the data block. In this case, it
applies the mod (5) hash function to the primary keys and generates 3, 3, 1,
4 and 2 respectively, and the records are stored at those data block
addresses.
Types of Hashing:
• Static Hashing
• Dynamic Hashing
Static Hashing
• In static hashing, the resultant data bucket address will always be the
same. That means if we generate an address for EMP_ID = 103 using the
hash function mod (5), it will always result in the same bucket address 3.
There will be no change in the bucket address.
• Hence in static hashing, the number of data buckets in memory
remains constant throughout. In this example, mod (5) gives five data
buckets (addresses 0 to 4) in memory to store the data.
Operations of Static Hashing:
Searching a record:
When a record needs to be searched, then the same hash function retrieves the
address of the bucket where the data is stored.
Insert a Record:
When a new record is inserted into the table, we generate an address for the
new record based on the hash key, and the record is stored in that location.
Delete a Record:
To delete a record, we first fetch the record that is to be deleted.
Then we delete the record at that address in memory.
Update a Record:
To update a record, we will first search it using a hash function, and then the data
record is updated.
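The four operations above can be sketched in Python as follows. This is a minimal illustration, assuming the mod (5) hash function from the earlier example and buckets of unlimited capacity (so bucket overflow is not modelled); the function names and the sample record values are hypothetical.

```python
NUM_BUCKETS = 5  # fixed for the lifetime of the file (static hashing)

def bucket_address(key):
    """Hash function mod (5): the same key always maps to the same bucket."""
    return key % NUM_BUCKETS

# One dictionary per bucket, mapping key -> record (assumed layout).
buckets = [dict() for _ in range(NUM_BUCKETS)]

def insert(key, record):
    buckets[bucket_address(key)][key] = record

def search(key):
    # The same hash function locates the bucket; None means "not found".
    return buckets[bucket_address(key)].get(key)

def update(key, record):
    bucket = buckets[bucket_address(key)]
    if key not in bucket:
        raise KeyError(key)
    bucket[key] = record

def delete(key):
    # Fetch the record first, then remove it from its bucket.
    return buckets[bucket_address(key)].pop(key, None)

insert(103, "record for EMP_ID 103")
print(bucket_address(103))  # 3
print(search(103))          # record for EMP_ID 103
```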
Suppose we want to insert a new record into the file, but the address of the data bucket
generated by the hash function is not empty, that is, data already exists at that
address. This situation in static hashing is known as bucket overflow. This is a
critical situation in this method.
To overcome this situation, there are various methods. Some commonly used
methods are as follows:
1. Open Hashing
When buckets are full, then a new data bucket is allocated for the same hash
result and is linked after the previous one. This mechanism is known as Overflow
chaining.
For example: Suppose R3 is a new record which needs to be inserted into the
table, and the hash function generates the address 110 for it. But this bucket is too
full to store the new data. In this case, a new bucket is added at the end of bucket 110
and is linked to it.
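Overflow chaining as described above can be sketched in Python like this. The `Bucket` class, its capacity of two records, and the record names R1 and R2 are assumptions for illustration; only R3 and address 110 come from the example.

```python
class Bucket:
    """A fixed-capacity bucket with a link to an overflow bucket."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None  # next bucket in the chain, if any

    def insert(self, record):
        bucket = self
        # Walk the chain until a bucket with free space is found,
        # allocating and linking a new bucket at the end if needed.
        while len(bucket.records) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(bucket.capacity)
            bucket = bucket.overflow
        bucket.records.append(record)

    def __contains__(self, record):
        bucket = self
        while bucket is not None:
            if record in bucket.records:
                return True
            bucket = bucket.overflow
        return False

    def chain_length(self):
        length, bucket = 1, self
        while bucket.overflow is not None:
            length += 1
            bucket = bucket.overflow
        return length

bucket_110 = Bucket(capacity=2)
for record in ["R1", "R2", "R3"]:  # R3 arrives when the bucket is full
    bucket_110.insert(record)
print(bucket_110.chain_length())    # 2
print(bucket_110.overflow.records)  # ['R3']
```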
2. Dynamic Hashing
•The dynamic hashing method is used to overcome the
problems of static hashing, such as bucket overflow.
•In this method, data buckets grow or shrink as the number of records
increases or decreases. This method is also known as the
extendible hashing method.
•This method makes hashing dynamic, i.e., it allows insertion or
deletion without resulting in poor performance.
How to search a key
•First, calculate the hash address of the key.
•Check how many bits are used in the directory; call this
number i.
•Take the least significant i bits of the hash address. This gives
an index of the directory.
•Now using the index, go to the directory and find bucket address
where the record might be.
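Taking the least significant i bits of a hash address is a single bit-mask operation. A minimal Python sketch, where the function name `directory_index` is an assumption:

```python
def directory_index(hash_address, i):
    """Return the least significant i bits of hash_address."""
    return hash_address & ((1 << i) - 1)

# Hash address 10001 with i = 2 gives 01; with i = 3 it gives 001.
print(format(directory_index(0b10001, 2), "02b"))  # 01
print(format(directory_index(0b10001, 3), "03b"))  # 001
```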
How to insert a new record
•Firstly, you have to follow the same procedure for retrieval,
ending up in some bucket.
•If there is still space in that bucket, then place the record in it.
•If the bucket is full, then we will split the bucket and redistribute
the records.
For example:
•Consider the following grouping of keys into buckets,
depending on the last bits of their hash address:
•The last two bits of 2 and 4 are 00, so they go into bucket
B0. The last two bits of 5 and 6 are 01, so they go into bucket
B1. The last two bits of 1 and 3 are 10, so they go into bucket
B2. The last two bits of 7 are 11, so it goes into B3.
• Insert key 9 with hash address 10001 into the above structure:
• Since the last two bits of 10001 are 01, key 9 must go into bucket B1.
But bucket B1 is full, so it gets split.
• The splitting separates 5 and 9 from 6: the last three bits of 5 and 9 are
001, so they go into bucket B1, and the last three bits of 6 are 101,
so it goes into bucket B5.
• Keys 2 and 4 are still in B0. B0 is pointed to by the 000 and
100 entries because the last two bits of both entries are 00.
• Keys 1 and 3 are still in B2. B2 is pointed to by the 010 and
110 entries because the last two bits of both entries are 10.
• Key 7 is still in B3. B3 is pointed to by the 111 and 011
entries because the last two bits of both entries are 11.
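The search and insert procedure, including the split in this example, can be sketched in Python as below. This is an illustrative sketch, not a complete implementation: the class names, the bucket capacity of 2, and the 5-bit hash addresses for keys 1 to 7 are assumptions chosen to match the last-bit groupings described above (only the address 10001 for key 9 is given in the text). Keys that share an identical hash address beyond the bucket capacity would need extra overflow handling, which is omitted.

```python
# Assumed hash addresses; only 9 -> 10001 comes from the example itself.
HASH_ADDRESS = {1: 0b11010, 2: 0b00000, 3: 0b11110, 4: 0b00000,
                5: 0b01001, 6: 0b10101, 7: 0b10111, 9: 0b10001}

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of bits this bucket uses
        self.keys = []

class ExtendibleHash:
    def __init__(self, hash_fn, global_depth=2, capacity=2):
        self.hash_fn = hash_fn
        self.global_depth = global_depth  # the "i" bits used by the directory
        self.capacity = capacity
        self.directory = [Bucket(global_depth) for _ in range(1 << global_depth)]

    def _index(self, key):
        # Least significant i bits of the hash address.
        return self.hash_fn(key) & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # Double the directory: entry j + 2^i mirrors entry j.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)

    def _split(self, bucket):
        bit = 1 << bucket.local_depth  # the new distinguishing bit
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        pending, bucket.keys = bucket.keys, []
        for i, entry in enumerate(self.directory):
            if entry is bucket and i & bit:
                self.directory[i] = sibling
        for k in pending:  # redistribute the old keys
            self.directory[self._index(k)].keys.append(k)

table = ExtendibleHash(HASH_ADDRESS.__getitem__)
for key in [1, 2, 3, 4, 5, 6, 7, 9]:
    table.insert(key)
print(table.directory[0b001].keys)  # [5, 9]
print(table.directory[0b101].keys)  # [6]
```

After inserting key 9, directory entries 000 and 100 still share one bucket, as do 010 and 110, and 011 and 111; only the full bucket was split.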
Advantages of dynamic hashing
•In this method, the performance does not decrease as the data grows in
the system. It simply increases the size of memory to accommodate the
data.
•In this method, memory is well utilized as it grows and shrinks with the
data. There will not be any unused memory lying idle.
•This method is good for the dynamic database where data grows and
shrinks frequently.
•This is because the data address will keep changing as buckets grow and
shrink.
•In this case, the bucket overflow situation will also occur. But it might
Indexing
• We know that data is stored in the form of records. Every record
has a key field, which helps it to be recognized uniquely.
• Indexing is a data structure technique to efficiently retrieve
records from the database files based on some attributes on which
the indexing has been done. Indexing in database systems is
similar to what we see in books.
• Indexing is defined based on its indexing attributes. Indexing can
be of the following types −
– Primary Index
– Secondary Index
– Clustering Index
Primary Index − Primary index is defined on an ordered data file.
The data file is ordered on a key field. The key field is generally the
primary key of the relation.
Primary Indexes
• A primary index is an ordered file with two fields:
– Primary key, K(i)
– Pointer to a disk block, P(i)
• There is one index entry in the index file for each block in the data file.
• Indexes may be dense or sparse:
– A dense index has an index entry for every search key value in the data
file.
– A sparse index has entries for only some search values.
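The difference between a dense and a sparse index can be sketched in Python. The data file below (three blocks of three keys each) and all names are hypothetical; the sparse index is assumed to hold the first key of each block, which matches the one-entry-per-block primary index described above.

```python
from bisect import bisect_right

# Hypothetical data file: blocks of records ordered on the key field.
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Dense index: one entry (key -> block number) for every search key value.
dense_index = {key: b for b, block in enumerate(blocks) for key in block}

# Sparse index: one entry per block, holding the block's first key.
sparse_keys = [block[0] for block in blocks]

def lookup_dense(key):
    """Return the block number for key, or None if the key is absent."""
    return dense_index.get(key)

def lookup_sparse(key):
    """Binary-search the sparse index, then scan the chosen block."""
    pos = bisect_right(sparse_keys, key) - 1
    if pos < 0:
        return None
    return pos if key in blocks[pos] else None

print(len(dense_index), len(sparse_keys))    # 9 3
print(lookup_dense(50), lookup_sparse(50))   # 1 1
```

The sparse index is smaller, but a lookup still has to read the block to find out whether the key is actually there.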
Primary Indexes: Problem
Clustering Indexes
•If file records are physically ordered on a non-key field, one that does not
have a distinct value for each record, that field is called the clustering
field and the data file is called a clustered file.
•This differs from a primary index, which requires that the ordering field
of the data file have a distinct value for each record.
•A clustering index is also an ordered file with two fields: the first field is of
the same type as the clustering field of the data file, and the second field is
a disk block pointer.
Figure: clustering index on the Dept_number ordering nonkey field of an EMPLOYEE file
• Suppose that we consider the same ordered file with r =
300,000 records stored on a disk with block size B = 4,096
bytes.
• Imagine that it is ordered by the attribute Zipcode and there
are 1,000 zip codes in the file (with an average 300 records
per zip code, assuming even distribution across zip codes.)
• The index in this case has ri = 1,000 index entries of Ri = 11 bytes each
(5-byte Zipcode and 6-byte block pointer), with a blocking
factor
bfri = ⌊B/Ri⌋ = ⌊4,096/11⌋ = 372 index entries per block.
• The number of index blocks is hence
bi = ⌈ri/bfri⌉ = ⌈1,000/372⌉ = 3 blocks.
• To perform a binary search on the index file would need
⌈log2 bi⌉ = ⌈log2 3⌉ = 2 block accesses.
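The arithmetic in this example can be reproduced with a few lines of Python. The function name and parameter names are assumptions; the numbers are the ones given above.

```python
import math

def clustering_index_cost(block_size, entry_size, num_entries):
    """Return (blocking factor, index blocks, binary-search block accesses)."""
    bfr = block_size // entry_size           # floor(B / Ri)
    index_blocks = math.ceil(num_entries / bfr)
    accesses = math.ceil(math.log2(index_blocks))
    return bfr, index_blocks, accesses

# B = 4,096 bytes, Ri = 5 + 6 = 11 bytes, ri = 1,000 zip codes
print(clustering_index_cost(4096, 11, 1000))  # (372, 3, 2)
```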
Secondary Index − Secondary index may be generated from a field
which is a candidate key and has a unique value in every record, or a
non-key with duplicate values.
Secondary Indexes
• Provide a secondary means of accessing a data file for which some primary
access already exists.
• The data file records could be ordered, unordered, or hashed.
• The secondary index may be created on a field that is a candidate key
and has a unique value in every record, or on a non-key field with
duplicate values.
• Indexing field, K(i)
• A secondary index usually needs more storage space and a longer search time
than a primary index.
• It is better than a linear search, but not as good as a primary index.
Sparse Index:
An index with entries for only some of the search values is called a sparse index.
Dense Index:
If you create an index entry for every record, it is called a dense index.
Multilevel Index
Index records comprise search-key values and data pointers. A multilevel index is
stored on the disk along with the actual database files. As the size of the database
grows, so does the size of the indices. There is a strong need to keep the index
records in main memory so as to speed up search operations. If a single-level
index is used, a large index cannot be kept in memory, which leads to
multiple disk accesses.
• Suppose that the dense secondary index of Example is converted into a
multilevel index.
• We calculated the index blocking factor bfri = 273 index entries per block,
which is also the fan-out fo for the multilevel index;
• the number of first-level blocks b1 = 1,099 blocks was also calculated.
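As a rough check of how many levels such an index would need, the sketch below assumes the usual construction in which each higher level holds one entry per block of the level beneath it, with the same fan-out, until a single block remains. The function name is an assumption; the inputs are the values quoted above.

```python
import math

def multilevel_blocks(first_level_blocks, fan_out):
    """Return the number of blocks at each index level, bottom level first."""
    levels = [first_level_blocks]
    while levels[-1] > 1:
        levels.append(math.ceil(levels[-1] / fan_out))
    return levels

levels = multilevel_blocks(1099, 273)
print(levels)       # [1099, 5, 1]
print(len(levels))  # 3
```

Under this assumption the index has three levels, so a lookup would read one index block per level instead of binary-searching 1,099 first-level blocks.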
If not all the locks are granted, the transaction rolls back
and waits until all the locks are granted.
Locking protocols
3. Two-Phase Locking (2PL): This locking protocol divides the
execution phase of a transaction into three parts.
In the first part, when the transaction starts executing, it seeks
permission for the locks it requires.
The second part is where the transaction acquires all the locks. As
soon as the transaction releases its first lock, the third part starts. In
this part, the transaction cannot demand any new locks; it only
releases the acquired locks.
Two-Phase Locking 2PL Cont…
• Two-phase locking has two phases, one is growing,
where all the locks are being acquired by the
transaction; and the second phase is shrinking, where
the locks held by the transaction are being released.
After acquiring all the locks in the first phase, the transaction continues to execute
normally.
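The rule that no lock may be requested after the first release can be sketched in Python. This is a minimal single-transaction illustration with no real lock manager or conflict detection; the class name, method names, and data item names are hypothetical.

```python
class TwoPhaseTransaction:
    """Tracks one transaction's locks and enforces the two-phase rule."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False  # becomes True at the first release

    def acquire(self, item):
        # Growing phase only: new locks are refused once shrinking starts.
        if self.shrinking:
            raise RuntimeError(
                f"{self.name}: cannot lock {item!r} after releasing a lock")
        self.held.add(item)

    def release(self, item):
        self.held.remove(item)
        self.shrinking = True

t1 = TwoPhaseTransaction("T1")
t1.acquire("A")
t1.acquire("B")   # growing phase
t1.release("A")   # shrinking phase begins
try:
    t1.acquire("C")
except RuntimeError as err:
    print(err)    # T1: cannot lock 'C' after releasing a lock
```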