DBMS Unit-3 Notes

The document discusses different types of file organizations used in database management systems, including heap, hash, B-tree, and ISAM organizations. It also describes indexed sequential access files (ISAM), which store data sequentially on disk and use an index to map keys to records for efficient searching. B-tree and B+ tree structures provide fast search, insertion, and deletion operations by storing data in a balanced tree. Hashing is used to map data values to indexes in a hash table through hash functions. Collision resolution techniques like chaining and open addressing are used to handle collisions in hash tables.

Unit-3: File organization

In a Database Management System (DBMS), file organization refers to the way data is stored and organized within a database. There are several types of file organizations, including:

1. Heap file organization: data is stored without any specific order.
2. Hash file organization: data is stored based on the result of a hash function.
3. B-tree file organization: data is stored in a balanced tree structure, allowing
for efficient searching, insertion, and deletion operations.
4. ISAM (Indexed Sequential Access Method): data is stored in a file and an
index is used to allow for fast searching.
5. Clustered file organization: data is stored in a way that related records are
stored close to each other on disk.

The choice of file organization depends on the specific requirements of the database and the type of operations that will be performed on it.

Indexed sequential access files


Indexed Sequential Access Method (ISAM) is a type of file organization used
in database management systems to store data in a way that allows for
efficient searching, insertion, and deletion operations. In ISAM, data is stored
in a file and an index is used to allow for fast searching. The data is stored
sequentially on disk, and the index is used to map keys to the corresponding
record in the file. This organization provides quick access to the data, as well
as efficient use of disk space.

When a search operation is performed on an ISAM file, the system first searches the index to locate the desired record. The index provides the
location of the record on disk, and the system then retrieves the record
directly from the file. This process is much faster than searching through the
entire file one record at a time.
ISAM is typically used in applications where there is a need for fast search
operations and efficient use of disk space. It is particularly useful for
applications where the data is stored in a sorted order, as the index can be
used to efficiently locate records even in large files.
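The index-then-block lookup described above can be sketched as follows. An in-memory list stands in for the sorted data file, and a sparse index maps the first key of each block to that block. All names and the block size are assumptions for the sketch, not a real ISAM API.

```python
# Sketch of ISAM-style lookup: records are kept sorted by key, and a sparse
# index holds the first key of each "disk block" so a search touches only
# the index plus one block.
import bisect

records = [(5, "a"), (12, "b"), (20, "c"), (33, "d"), (41, "e"), (57, "f")]
BLOCK = 2  # records per disk block (assumption for the sketch)

# Sparse index: the first key of each block, in order.
index_keys = [records[i][0] for i in range(0, len(records), BLOCK)]

def isam_lookup(key):
    # 1. Search the index to find which block could hold the key.
    block = bisect.bisect_right(index_keys, key) - 1
    if block < 0:
        return None  # key is smaller than every key in the file
    # 2. Scan only that one block of the data file.
    for k, v in records[block * BLOCK:(block + 1) * BLOCK]:
        if k == key:
            return v
    return None
```

Only the small index is searched with binary search; the file itself is touched at a single block, which is why ISAM lookups avoid scanning the whole file.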

implementation using B & B+ trees


B-trees and B+ trees are two popular file organization methods used in
database management systems to store and manage data. Both B-trees and B+
trees are balanced tree data structures that provide fast search, insertion, and
deletion operations.

B-trees are multi-level index structures where each node in the tree can have
multiple keys and multiple child nodes. B-trees are often used to store large
amounts of data in disk-based systems, as they allow for efficient searching,
insertion, and deletion of data even when the data is spread across multiple
disk blocks.

B+ trees are a variant of B-trees in which the internal (non-leaf) nodes store only keys, which are used to guide the search, while all data records are stored in the leaf nodes. The leaf nodes are typically linked together in key order, which makes range scans efficient and allows the data values to be stored in contiguous blocks on disk.

Both B-trees and B+ trees are used in database management systems to provide fast and efficient access to data. The choice between a B-tree and a B+ tree depends on the specific requirements of the database and the type of operations that will be performed on it.
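To make the multi-way search concrete, here is a minimal search-only sketch over B-tree-style nodes. The class and function names are illustrative, and insertion, deletion, and rebalancing are omitted.

```python
# Minimal B-tree-style search: each node holds a sorted list of keys, and an
# internal node holds len(keys) + 1 children. Searching picks the child
# whose key range could contain the target.
import bisect

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # sorted keys in this node
        self.children = children  # None for a leaf node

def search(node, key):
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return True               # key found in this node
    if node.children is None:
        return False              # reached a leaf without finding the key
    return search(node.children[i], key)

# Hypothetical two-level tree for illustration:
root = Node([10, 20], [Node([3, 7]), Node([12, 15]), Node([25, 30])])
```

Because every node holds many keys, each step of the search eliminates a large fraction of the data, which is what keeps the tree shallow and disk accesses few.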

hashing
Hashing is a technique used in computer science to map data values to an
index in an array, called a hash table. The idea behind hashing is to use a
hash function to convert the data values into a hash code, which serves as the
index in the hash table where the data will be stored.

Here’s a simple example to illustrate the concept of hashing:

Suppose you have a database of employee records, and each record contains
the employee’s name and their associated salary. To store this data in a hash
table, we can use the employee’s name as the key and the salary as the value.

First, we need to choose a hash function that will map the employee’s name
to a hash code. For example, we can use the following hash function:

hash_code = (sum of the ASCII values of the characters in the name) % size
of the hash table

Let’s take the name “John Doe” as an example. The ASCII values for the eight characters in the name are 74, 111, 104, 110, 32, 68, 111, 101. The sum of these values is 711. If the size of the hash table is 10, then the hash code for “John Doe” would be:

hash_code = 711 % 10 = 1

So, “John Doe” will be stored in the hash table at index 1, along with their salary. When we need to search for John Doe’s salary, we can simply use the hash function to compute the hash code, and then access the value stored at that index in the hash table.
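The toy hash function above can be written directly. This is an illustrative sketch only; real systems use much better-distributed hash functions.

```python
# Toy hash function: sum the ASCII codes of the characters in the name and
# take the remainder modulo the table size. For illustration only.
def hash_code(name, table_size):
    return sum(ord(c) for c in name) % table_size

hash_code("John Doe", 10)  # 711 % 10 == 1
```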

This is a simple example of how hashing works. In practice, more sophisticated hash functions and collision resolution techniques are used to ensure that the hash table remains efficient even with large amounts of data.
hashing functions
Hashing functions are mathematical algorithms used to convert data of any
size into a fixed-length output called a hash or message digest. This output is
typically represented as a string of hexadecimal characters and has the
following properties:

1. Irreversibility: It is computationally infeasible to derive the original data from the hash.
2. Avalanche effect: A small change in the original data results in a completely different hash.
3. Consistency: Given the same input, the hash function will always produce the same output.

Hashing functions are commonly used in cryptography, data structures, digital signatures, and databases.

Examples of Hashing Functions:

1. SHA (Secure Hash Algorithm): SHA is a family of cryptographic hash functions designed by the National Security Agency (NSA) and standardized by the National Institute of Standards and Technology (NIST). The most commonly used SHA algorithms are SHA-1, SHA-256, and SHA-512.
2. MD5 (Message-Digest Algorithm 5): MD5 is a widely used hashing function that generates a 128-bit hash value. It is commonly used to verify the integrity of data transmission over a network, though it is no longer considered secure against deliberate tampering.
3. BCrypt: BCrypt is a secure hash algorithm designed to be computationally
expensive to crack. It uses a key derivation function that increases the cost of
cracking the hash as the number of rounds increases.
4. CRC (Cyclic Redundancy Check): CRC is a commonly used hashing
function in error-detection systems. It generates a checksum that is used to
verify the integrity of data transmission over a network.
In conclusion, hashing functions play an important role in maintaining the
security and integrity of data. They are used in various applications to ensure
that data remains unaltered during transmission and storage.
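The three properties listed above can be checked with Python’s standard hashlib module; the input strings here are arbitrary examples.

```python
# Checking the properties of a cryptographic hash with Python's hashlib.
import hashlib

h1 = hashlib.sha256(b"hello world").hexdigest()
h2 = hashlib.sha256(b"hello world").hexdigest()
h3 = hashlib.sha256(b"hello worle").hexdigest()  # one character changed

assert h1 == h2       # consistency: same input, same output
assert h1 != h3       # avalanche: a tiny change gives a different digest
assert len(h1) == 64  # SHA-256 digest is 256 bits = 64 hex characters
```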

collision resolution
Collision resolution is a technique used in computer science to resolve
conflicts that occur when two or more elements map to the same hash value
in a hash table. This can lead to data loss or corruption, so it’s important to
have a method of resolving these conflicts.

There are two common methods of resolving collisions: chaining and open
addressing.

1. Chaining: In chaining, each entry in the hash table is associated with a linked list. When a collision occurs, the new item is added to the linked list at the corresponding entry. This way, multiple items can be stored at the same hash index, so a collision never causes data to be lost or overwritten.

Example: Suppose we have a hash table with three items: “apple”, “banana”,
and “cherry”. The hash function maps “apple” and “cherry” to the same index
(3), so they are both stored in the same linked list at index 3. The list would
look like this:

3: [apple, cherry]

2: [banana]

2. Open addressing: In open addressing, when a collision occurs, the algorithm looks for the next available entry in the hash table to store the item. This method is also called probing. There are several probing methods, such as linear probing, quadratic probing, and double hashing.
Example: Suppose we have a hash table with three items: “apple”, “banana”,
and “cherry”. The hash function maps “apple” and “cherry” to the same index
(3), so when a collision occurs, the algorithm checks the next available index
(4) to store the item. The hash table would look like this:

3: [apple]

4: [cherry]

2: [banana]
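The chaining scheme from example 1 can be sketched as a small hash table whose slots are Python lists; the class and method names are illustrative.

```python
# Minimal chained hash table: each slot holds a list of (key, value) pairs,
# so keys that collide simply share a slot.
class ChainedHashTable:
    def __init__(self, size=10):
        self.slots = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.slots)

    def put(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # key already present: update it
                return
        chain.append((key, value))       # new key: append to the chain

    def get(self, key):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        return None                      # key not in the table
```

With a table of only a few slots, collisions are guaranteed, yet every key remains retrievable because each slot can hold arbitrarily many pairs.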

In conclusion, collision resolution is an important aspect of hash tables, as it ensures that all elements can be stored and retrieved correctly. The choice between chaining and open addressing depends on the specific use case; both methods have their pros and cons, so it’s important to choose the right method for the task at hand.
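Open addressing with linear probing can be sketched in the same style. The sketch assumes the table never fills completely, since it performs no resizing; the names are illustrative.

```python
# Minimal open-addressing table with linear probing: on a collision, step
# forward one slot at a time until a free slot (or the same key) is found.
class LinearProbingTable:
    def __init__(self, size=8):
        self.keys = [None] * size
        self.values = [None] * size

    def put(self, key, value):
        i = hash(key) % len(self.keys)
        while self.keys[i] is not None and self.keys[i] != key:
            i = (i + 1) % len(self.keys)  # probe the next slot
        self.keys[i], self.values[i] = key, value

    def get(self, key):
        i = hash(key) % len(self.keys)
        while self.keys[i] is not None:
            if self.keys[i] == key:
                return self.values[i]
            i = (i + 1) % len(self.keys)  # keep probing past other keys
        return None                       # hit an empty slot: key absent
```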

Extendible hashing
Extendible Hashing is a dynamic hash table technique used to efficiently
store and retrieve data from a large collection of keys. Unlike traditional hash
tables, extendible hashing allows for the hash table to grow and shrink
dynamically as the number of keys changes. This results in reduced overhead
in terms of memory usage and better performance in terms of look-up time.

In extendible hashing, each key is hashed to a certain number of bits and the
hash table is organized into a binary tree structure. The number of bits used to
hash each key determines the depth of the tree, and the tree is extended by
adding additional bits as the number of keys grows.

Each node in the tree represents a bucket, and each bucket contains a set of
keys that have the same hash value when hashed to the same number of bits.
When a new key is added to the tree, the hash value is calculated and the
appropriate bucket is found. If the bucket is full, the tree is extended by
adding another bit to the hash value, and the keys are redistributed into new
buckets.

Example: Suppose we have a hash table with three keys: “apple”, “banana”,
and “cherry”. Initially, each key is hashed to 4 bits, resulting in the following
binary tree structure:

0000: [apple]

0001: [banana]

0010: [cherry]

When a new key “date” is added, the hash function maps it to the value 0011. Since no bucket exists yet for 0011 and the 4-bit hash already identifies it uniquely, a new bucket is simply created for it (no extra bit is needed):

0000: [apple]

0001: [banana]

0010: [cherry]

0011: [date]

In this example, the extendible hashing mechanism allowed the hash table to grow dynamically as the number of keys increased, reducing the overhead in terms of memory usage and improving look-up performance.
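A simplified sketch of extendible hashing follows: a directory indexed by the low-order bits of each key’s hash, with the bit count (depth) increased on overflow. Real implementations track a local depth per bucket and split only the overflowing bucket; this sketch rehashes everything on growth to keep the idea clear, and the bucket size is an assumption.

```python
# Simplified extendible hashing: the directory maps each `depth`-bit hash
# prefix to a bucket; when a bucket overflows, one more bit is used and all
# keys are redistributed.
BUCKET_SIZE = 2  # max keys per bucket (assumption for the sketch)

class ExtendibleTable:
    def __init__(self):
        self.depth = 1                 # number of hash bits in use
        self.buckets = {0: [], 1: []}  # directory: bit pattern -> bucket

    def _index(self, key):
        return hash(key) & ((1 << self.depth) - 1)  # low `depth` bits

    def insert(self, key):
        bucket = self.buckets[self._index(key)]
        bucket.append(key)
        if len(bucket) > BUCKET_SIZE:  # overflow: use one more hash bit
            self._grow()

    def _grow(self):
        self.depth += 1
        keys = [k for b in self.buckets.values() for k in b]
        self.buckets = {i: [] for i in range(1 << self.depth)}
        for k in keys:
            self.buckets[self._index(k)].append(k)

    def lookup(self, key):
        return key in self.buckets[self._index(key)]
```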

Dynamic hashing: approach, implementation, and performance
Dynamic Hashing is a technique used to efficiently store and retrieve data
from a large collection of keys. The main idea behind dynamic hashing is to
create a hash table that can grow and shrink dynamically as the number of
keys changes. This allows for better performance and reduced memory usage
compared to traditional hash tables.

Implementation: The implementation of dynamic hashing involves two steps:

1. Hash Function: The hash function is used to map each key to a unique hash
value. The hash function must be chosen carefully to ensure that the number
of collisions is minimized.
2. Dynamic Table: The dynamic table is a collection of buckets that stores the
keys. Each bucket is associated with a range of hash values, and the size of
the range is adjusted dynamically based on the number of keys stored in the
table.

For example, suppose we have a dynamic hash table with a range of 4 buckets, initially storing the keys “apple”, “banana”, and “cherry”. The hash function maps “apple” and “banana” to the same hash value, so they are stored in the same bucket. The table would look like this:

Bucket 1: [apple, banana]

Bucket 2: [cherry]

If a new key “date” is added and the hash function maps it to the same hash value as “apple” and “banana”, it is placed in the same bucket, and the bucket grows to accommodate it; once buckets become too full, the table is resized and the keys are redistributed. After the insertion, the table would look like this:

Bucket 1: [apple, banana, date]

Bucket 2: [cherry]

Performance: Dynamic hashing provides improved performance over traditional hash tables in two ways:
1. Reduced Collisions: Dynamic hashing reduces the number of collisions by
adjusting the size of the range dynamically, ensuring that each bucket
contains a smaller number of keys.
2. Reduced Memory Usage: Dynamic hashing reduces memory usage by only
allocating memory for the buckets that are actually used, instead of allocating
a fixed amount of memory upfront.

In conclusion, dynamic hashing is a useful technique for efficiently storing and retrieving data from a large collection of keys. Its dynamic nature allows for reduced collisions and memory usage, resulting in improved performance compared to traditional hash tables.
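The grow-on-demand behaviour described above can be sketched as a chained table that doubles its bucket count when the average chain length passes a threshold. The threshold and all names are assumptions for the sketch.

```python
# Sketch of a dynamic hash table: buckets double (and keys are rehashed)
# whenever the load factor, keys per bucket, exceeds a threshold.
class DynamicHashTable:
    LOAD_FACTOR = 2.0  # max average chain length before growing

    def __init__(self):
        self.buckets = [[] for _ in range(2)]
        self.count = 0

    def put(self, key, value):
        b = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(b):
            if k == key:
                b[i] = (key, value)  # existing key: update in place
                return
        b.append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.LOAD_FACTOR:
            self._grow()

    def _grow(self):
        # Double the bucket count and redistribute every stored pair.
        old = [pair for b in self.buckets for pair in b]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        for k, v in old:
            self.buckets[hash(k) % len(self.buckets)].append((k, v))

    def get(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return None
```

Because memory is allocated only as keys arrive, the table stays small for small data sets, and chains stay short as the data grows, which is exactly the trade-off the section describes.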
