DBMS Unit-3 Notes
B-trees are balanced, multi-level index structures in which each node can
hold multiple keys and point to multiple child nodes. B-trees are often used
to index large amounts of data in disk-based systems, because searching,
insertion, and deletion each touch only a logarithmic number of disk blocks,
even when the data is spread across many blocks.
B+ trees are a variant of B-trees in which the non-leaf nodes hold only keys
and child pointers, which are used to route a search. All the data values (or
pointers to the records) are stored in the leaf nodes, together with their
keys, and the leaf nodes are linked to one another in key order. This design
keeps the non-leaf nodes small, so each one holds more keys and the tree stays
shallow. It also makes range queries efficient, because a scan can follow the
leaf links instead of returning to the upper levels of the tree.
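As an illustration, here is a minimal sketch of lookup and range scan in a
B+ tree, assuming the usual textbook layout (separator keys in the non-leaf
nodes, key/value pairs in linked leaves). The names Leaf, Internal, find_leaf,
search, and range_scan, and the sample records, are hypothetical; the tree is
built by hand, and insertion and node splitting are left out.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]
    values: List[str]
    next: Optional["Leaf"] = None  # link to the right sibling leaf


@dataclass
class Internal:
    keys: List[int]                          # separator keys, used only for routing
    children: List[Union["Internal", Leaf]]  # always len(keys) + 1 children


def find_leaf(node, key):
    """Descend from the root to the leaf whose key range covers `key`."""
    while isinstance(node, Internal):
        # A key equal to a separator lives in the right-hand child.
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    """Return the value stored under `key`, or None if it is absent."""
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i]
    return None


def range_scan(root, lo, hi):
    """Yield (key, value) pairs with lo <= key <= hi by following leaf links."""
    leaf = find_leaf(root, lo)
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return
            if k >= lo:
                yield k, v
        leaf = leaf.next


# Hand-built example tree with made-up records.
leaf3 = Leaf([30, 40], ["r30", "r40"])
leaf2 = Leaf([20, 25], ["r20", "r25"], leaf3)
leaf1 = Leaf([5, 10], ["r5", "r10"], leaf2)
root = Internal([20, 30], [leaf1, leaf2, leaf3])

print(search(root, 25))
print(list(range_scan(root, 10, 30)))
```

The range scan descends the tree only once, to find the starting leaf, and
then moves sideways through the leaf links.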
Hashing
Hashing is a technique used in computer science to map data values to an
index in an array, called a hash table. The idea behind hashing is to use a
hash function to convert the data values into a hash code, which serves as the
index in the hash table where the data will be stored.
Suppose you have a database of employee records, and each record contains
the employee’s name and their associated salary. To store this data in a hash
table, we can use the employee’s name as the key and the salary as the value.
First, we need to choose a hash function that will map the employee’s name
to a hash code. For example, we can use the following hash function:
hash_code = (sum of the ASCII values of the characters in the name) % (size of the hash table)
Let’s take the name “John Doe” as an example. The ASCII values for the
characters in the name are 74, 111, 104, 110, 32, 68, 111, 101 (the 32 is the
space). The sum of these values is 711. If the size of the hash table is 10,
then the hash code for “John Doe” would be:
hash_code = 711 % 10 = 1
So, “John Doe” will be stored in the hash table at index 1, along with their
salary. When we need to search for John Doe’s salary, we compute the hash
code again with the same hash function, and then access the value stored at
that index in the hash table.
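The hash function described above can be sketched in a few lines. The names
hash_code, TABLE_SIZE, and lookup are hypothetical, the salary figure is made
up, and collisions are deliberately ignored here.

```python
def hash_code(name: str, table_size: int) -> int:
    """Sum the character codes of `name` and reduce modulo the table size."""
    return sum(ord(ch) for ch in name) % table_size


TABLE_SIZE = 10
table = [None] * TABLE_SIZE

# Store the record at the index given by the hash function.
index = hash_code("John Doe", TABLE_SIZE)
table[index] = ("John Doe", 50000)  # salary is an illustrative value


def lookup(name):
    """Recompute the hash code and read the slot it points to."""
    entry = table[hash_code(name, TABLE_SIZE)]
    if entry is not None and entry[0] == name:
        return entry[1]
    return None


print(index, lookup("John Doe"))
```

Storing the name next to the salary lets lookup confirm that the slot really
belongs to the requested key and not to another name with the same hash code.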
Collision resolution
Collision resolution is the handling of conflicts that occur when two or more
elements map to the same index in a hash table. If a collision is left
unhandled, one element overwrites the other and data is lost, so every hash
table needs a method of resolving these conflicts.
There are two common methods of resolving collisions: chaining and open
addressing.
1. Chaining: In chaining, each entry in the hash table is associated with a linked
list. When a collision occurs, the new item is added to the linked list at the
corresponding entry. This way, multiple items can be stored at the same hash
index. A lookup hashes the key to find the entry and then searches that
entry's list for the key.
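Example: Suppose we insert three items into a hash table: “apple”, “banana”,
and “cherry”. The hash function maps “banana” to index 2 and both “apple” and
“cherry” to index 3, so “apple” and “cherry” are stored in the same linked
list at index 3. The table would look like this:
2: [banana]
3: [apple, cherry]
2. Open addressing: In open addressing, every item is stored in the table
itself. When a collision occurs, the table is probed for another free slot,
for example the next index (linear probing), and the new item is placed there.
Example: With the same three items, “apple” takes index 3. “cherry” also
hashes to index 3, finds it occupied, and moves on to index 4. The table would
look like this:
2: [banana]
3: [apple]
4: [cherry]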
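The two strategies named above, chaining and open addressing, can be compared
in a short sketch. It assumes a hypothetical toy_hash that simply looks up
fixed indices, so that “apple” and “cherry” collide at index 3 as in the
example. Python lists stand in for linked lists, and linear probing is used as
one common form of open addressing.

```python
# Hypothetical fixed mapping that reproduces the collision in the example:
# "apple" and "cherry" both land on index 3, "banana" on index 2.
TOY_HASH = {"apple": 3, "banana": 2, "cherry": 3}
SIZE = 5


def toy_hash(key):
    return TOY_HASH[key] % SIZE


def build_chained(keys):
    """Chaining: every slot holds a list of all keys that hash to it."""
    table = [[] for _ in range(SIZE)]
    for key in keys:
        table[toy_hash(key)].append(key)
    return table


def build_linear_probing(keys):
    """Open addressing with linear probing: on a collision, step to the
    next slot (wrapping around) until a free one is found."""
    table = [None] * SIZE
    for key in keys:
        i = toy_hash(key)
        for _ in range(SIZE):
            if table[i] is None:
                table[i] = key
                break
            i = (i + 1) % SIZE
        else:
            raise RuntimeError("hash table is full")
    return table


keys = ["apple", "banana", "cherry"]
chained = build_chained(keys)
probed = build_linear_probing(keys)

for i in range(SIZE):
    print(i, chained[i], probed[i])
```

Chaining lets a slot hold any number of items at the cost of extra list
storage, while open addressing keeps everything in one array but can never
hold more items than the table has slots.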
Extendible hashing
Extendible hashing is a dynamic hashing technique used to store and retrieve
data efficiently from a large, changing collection of keys. Unlike a static
hash table, whose number of buckets is fixed in advance, an extendible hash
table grows and shrinks as the number of keys changes, and it does so without
rehashing every key: only the bucket that overflows is split. This avoids both
the wasted space of an oversized table and the long overflow chains of an
undersized one.
In extendible hashing, each key is hashed to a bit string, and only some of
those bits are used to select a bucket. The buckets are reached through a
directory of pointers indexed by those bits, which can be pictured as a binary
tree over the bits. The number of bits currently in use is called the global
depth, and it grows one bit at a time as the number of keys grows.
Each bucket holds the keys whose hash values agree on the bits that the bucket
distinguishes (its local depth), and several directory entries may point to
the same bucket. When a new key is added, its hash value is calculated and the
appropriate bucket is found. If the bucket is full, it is split in two using
one more bit of the hash value, and its keys are redistributed between the two
buckets. The directory itself doubles only when the full bucket was already
using all the bits of the global depth.
Example: Suppose we have a hash table with three keys: “apple”, “banana”,
and “cherry”. Initially, each key is hashed to 4 bits and lands in its own
bucket:
0000: [apple]
0001: [banana]
0010: [cherry]
When a new key “date” is added, the hash function maps it to the value 0011.
No existing bucket is full, so no extra bit is needed, and a bucket for 0011
is simply allocated:
0000: [apple]
0001: [banana]
0010: [cherry]
0011: [date]
In this example, the hash table grew by one bucket as the number of keys
increased, without moving any of the existing keys. An extra bit of the hash
value would be brought into use only if a key hashed to a bucket that was
already full.
Extendible hashing relies on two components:
1. Hash Function: The hash function maps each key to a hash value. Different
keys can receive the same hash value, so the hash function must be chosen
carefully to spread the keys evenly and keep the number of collisions small.
2. Dynamic Table: The dynamic table is a collection of buckets that stores the
keys. Each bucket is associated with a range of hash values, and the size of
the range is adjusted dynamically based on the number of keys stored in the
table.
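Example: Suppose “apple” and “banana” hash into the range of Bucket 1 and
“cherry” into the range of Bucket 2:
Bucket 1: [apple, banana]
Bucket 2: [cherry]
If a new key “date” is added, and the hash function maps it to the same hash
value as “apple” and “banana”, the range of the bucket would be increased to
accommodate the new key. The table would now look like this:
Bucket 1: [apple, banana, date]
Bucket 2: [cherry]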
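To make the growth mechanism concrete, here is a minimal sketch of an
extendible hash table in the usual textbook formulation: a directory of bucket
pointers, a global depth for the directory, and a local depth per bucket. The
class and method names are hypothetical. The sketch indexes the directory by
the low-order bits of the hash (some textbooks use the leading bits instead),
uses the integer keys themselves as hash values, and omits deletion and
shrinking.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of hash bits this bucket distinguishes
        self.items = {}


class ExtendibleHashTable:
    MAX_DEPTH = 16  # guard against endless splitting when keys share all hash bits

    def __init__(self, bucket_capacity=2, hash_fn=hash):
        self.capacity = bucket_capacity
        self.hash_fn = hash_fn
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        # The low-order `global_depth` bits of the hash select a directory slot.
        return self.hash_fn(key) & ((1 << self.global_depth) - 1)

    def get(self, key, default=None):
        return self.directory[self._index(key)].items.get(key, default)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < self.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket is full: split it and try again

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            if self.global_depth >= self.MAX_DEPTH:
                raise RuntimeError("too many keys share the same hash bits")
            # Double the directory; the new half mirrors the old half.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        # Slots that pointed at the old bucket and have the new bit set
        # now point at the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling
        # Redistribute only the keys of the bucket that overflowed.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._index(k)].items[k] = v


table = ExtendibleHashTable(bucket_capacity=2, hash_fn=lambda k: k)
for k in range(9):
    table.put(k, f"record-{k}")

print(table.global_depth, len(table.directory))
```

Only the overflowing bucket is ever rehashed; every other bucket stays where
it is, and doubling the directory copies pointers, not records.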