• It is a very efficient method to search for exact data items based on a hash table.
• The storage block of a record is located using the hash value h(K) of its key K.
• A hash function is a mathematical function that maps the search key to the address where the actual record is placed; this address is passed to the operating system, and the record is retrieved.
Figure: Mapping in a hashed file
Internal Hashing
• Hashing an internal file is called internal hashing.
• Internal hashing is implemented as a hash table through the use of an array of records held in memory.
• The array index ranges from 0 to m-1.
• A hash function that transforms the hash field value into an integer between 0 and m-1 is used.
• A common one is h(K) = K mod m, which yields a value used as the record address.
• For character strings, the numeric ASCII codes of the characters can be used (for instance by summing them).
• E.g.: a hashing algorithm applying the mod hash function to a character string K of 20 characters.
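A minimal Python sketch of this scheme, assuming an in-memory array of m record slots, the hash function h(K) = K mod m, and an ASCII-code sum for strings; the names m, hash_key and hash_string are illustrative, not from the source.

m = 13                                # number of array positions (0 .. m-1)

def hash_key(k):
    # hash a numeric key to an array index in the range 0 .. m-1
    return k % m

def hash_string(s):
    # hash a character string by summing the ASCII codes of its characters
    return sum(ord(c) for c in s) % m

print(hash_key(127))                  # 127 mod 13 = 10
print(hash_string("SMITH"))           # sum of the ASCII codes, mod 13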
Internal Hashing
Figure 1.9: Modulo division
Internal Hashing (cont'd)
• A collision occurs when the hash field value of a record being inserted hashes to an address that already contains a different record.
Collisions
Internal Hashing
Open Addressing:
Once a position specified by the hash address is found to be occupied, the
program checks the subsequent positions in order until an unused position
is found.
Chaining:
Various overflow locations are maintained by extending the array with a
number of overflow positions.
A pointer field is added to each record location.
A collision is resolved by placing the new record in an unused overflow
location and setting the pointer of the occupied hash address location to
the address of that overflow location.
Multiple hashing:
If the first hash function results in a collision, then the program applies a
second hash function. If another collision results, the program uses open
addressing or applies a third hash function and then uses open addressing
if necessary.
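To make the first two strategies concrete, here is a minimal Python sketch of open addressing (linear probing) and chaining; the table size and the function names are illustrative, not from the source.

m = 13                                      # number of slots in the hash table

def h(key):
    return key % m

# open addressing: probe the subsequent positions in order until a free slot is found
open_table = [None] * m

def insert_open_addressing(key):
    start = h(key)
    for i in range(m):
        slot = (start + i) % m
        if open_table[slot] is None:
            open_table[slot] = key
            return slot
    raise OverflowError("hash table is full")

# chaining: each slot heads a list (overflow chain) of records that hashed to it
chain_table = [[] for _ in range(m)]

def insert_chaining(key):
    chain_table[h(key)].append(key)

for k in (16, 29, 42):                      # 16, 29 and 42 all hash to slot 3
    insert_open_addressing(k)
    insert_chaining(k)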
Figure: Open addressing resolution
External Hashing
Types of External Hashing
• Using a fixed address space is called static
hashing.
• Dynamically changing address space:
– Extendible hashing / Dynamic hashing
– Linear hashing
Static Hashing:
The hashing scheme where a fixed number of buckets ‘m’
is allocated for storage of records is called static hashing.
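For example, with m = 10 buckets, a record with key 4217 always maps to bucket h(4217) = 4217 mod 10 = 7; the number of buckets cannot change without reorganizing the whole file.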
Figure: Static hashing
Extendible Hashing
• In Extendible Hashing, a type of directory is
maintained as an array of 2d bucket addresses.
Where d refers to the first d high (left most) order
bits and is referred to as the global depth of the
directory. However, there does NOT have to be a
DISTINCT bucket for each directory entry.
• A local depth d’ is stored with each bucket to
indicate the number of bits used for that bucket.
Example: insert the keys 16, 4, 6, 22, 24 and 5 (in binary: 16 = 10000, 4 = 00100, 6 = 00110, 22 = 10110, 24 = 11000, 5 = 00101).
Initially, the global depth and the local depth are both 1.
Assume the bucket size is 3.
(Figures: the directory and the buckets after each of these insertions.)
Overflow (Bucket Splitting)
• When an overflow occurs in a bucket, that bucket is split.
• This is done by dynamically allocating a new bucket and redistributing the contents of the old bucket between the old and the new bucket based on the increased local depth d'+1 of both buckets.
Overflow (Bucket Splitting)
• Now the new bucket's address must be added to the directory.
• If, after the split, the new local depth d'+1 is still less than or equal to the global depth d, only the directory entries are adjusted; no change in the directory size is made.
Overflow (Bucket Doubling)
• If the overflow occurred in a bucket whose new local depth d'+1 is now greater than the global depth d, the global depth must be increased accordingly.
• The directory size doubles each time d is increased by 1, and the entries are adjusted appropriately; see the sketch below.
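The following Python sketch illustrates bucket splitting and directory doubling as described above. It is an illustrative implementation under assumed parameters (5-bit keys and a bucket size of 3, as in the earlier example); the class and method names are made up, and keys are assumed to be distinct values smaller than 2^5.

KEY_BITS = 5          # keys are viewed as 5-bit values, as in the example above
BUCKET_SIZE = 3       # assumed bucket capacity

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]   # 2**1 directory entries

    def _dir_index(self, key):
        # index the directory with the first global_depth high-order bits of the key
        return key >> (KEY_BITS - self.global_depth)

    def insert(self, key):
        bucket = self.directory[self._dir_index(key)]
        if len(bucket.keys) < BUCKET_SIZE:
            bucket.keys.append(key)
            return
        # overflow: if the bucket is already at global depth, double the directory
        if bucket.local_depth == self.global_depth:
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_depth += 1
        # split the overflowing bucket into two buckets of local depth d' + 1
        new_depth = bucket.local_depth + 1
        b0, b1 = Bucket(new_depth), Bucket(new_depth)
        for i, b in enumerate(self.directory):
            if b is bucket:
                bit = (i >> (self.global_depth - new_depth)) & 1
                self.directory[i] = b1 if bit else b0
        # redistribute the old contents plus the new key
        for k in bucket.keys + [key]:
            self.insert(k)

eh = ExtendibleHash()
for k in (16, 4, 6, 22, 24, 5):     # the keys from the example above
    eh.insert(k)
eh.insert(20)                       # 20 = 10100 overflows a bucket and forces a split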
Linear Hashing
• Linear Hashing allows the hash file to expand
and shrink its number of buckets dynamically
without needing a directory.
• It starts with M buckets numbered 0 to M-1 and uses the mod hash function
h(K) = K mod M
as the initial hash function, called h_i.
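A minimal Python sketch of linear hashing follows. The source only specifies the initial hash function h_i(K) = K mod M; the split policy used here (split one bucket, in linear order, whenever any bucket overflows), the bucket size and all names are illustrative assumptions, and overflow keys simply stay in an over-full bucket until that bucket is itself split.

M = 4                 # initial number of buckets
BUCKET_SIZE = 2       # assumed bucket capacity

class LinearHash:
    def __init__(self):
        self.i = 0                      # number of completed rounds of splitting
        self.next_to_split = 0          # next bucket to split, in linear order
        self.buckets = [[] for _ in range(M)]

    def _address(self, key):
        # h_i(K) = K mod (2**i * M); if that bucket has already been split in
        # the current round, use h_{i+1}(K) = K mod (2**(i+1) * M) instead
        addr = key % (2 ** self.i * M)
        if addr < self.next_to_split:
            addr = key % (2 ** (self.i + 1) * M)
        return addr

    def insert(self, key):
        addr = self._address(key)
        self.buckets[addr].append(key)
        if len(self.buckets[addr]) > BUCKET_SIZE:   # overflow triggers one split
            self._split()

    def _split(self):
        # split the bucket pointed to by next_to_split (not necessarily the one
        # that overflowed) and add one new bucket at the end of the file
        s = self.next_to_split
        old_keys, self.buckets[s] = self.buckets[s], []
        self.buckets.append([])
        for k in old_keys:
            self.buckets[k % (2 ** (self.i + 1) * M)].append(k)
        self.next_to_split += 1
        if self.next_to_split == 2 ** self.i * M:   # a full round of splits is done
            self.i += 1
            self.next_to_split = 0

lh = LinearHash()
for k in (3, 7, 11, 15, 19, 23):    # illustrative keys
    lh.insert(k)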
Linear Hashing (cont'd)
Redundant Array of Independent Disks
• RAID, or Redundant Array of Independent Disks, combines multiple physical disks into a single logical unit to improve performance and/or provide fault tolerance.
RAID Levels
0: Striped array with no fault tolerance
1: Disk mirroring
3: Parallel access array with dedicated parity disk
4: Striped array with independent disks and a dedicated parity disk
5: Striped array with independent disks and distributed parity
6: Striped array with independent disks and dual distributed parity
Solution: Exploit Parallelism with Data Striping
Striping: the data is distributed transparently across an array of disks so that they appear as a single disk.
Example: consider a big file striped across N disks:
• the stripe width is S bytes
• hence each stripe unit is S/N bytes
• a sequential read proceeds S bytes at a time, one full stripe across the N disks
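As an illustration of the striping arithmetic above, the following Python sketch maps a logical byte offset to a (disk, offset-within-disk) pair, assuming N disks and a stripe width of S bytes (so each stripe unit is S/N bytes). The function name and the example numbers are made up.

def locate(offset, n_disks, stripe_width):
    # return (disk index, byte offset within that disk) for a logical file offset
    unit = stripe_width // n_disks          # stripe unit = S / N bytes
    stripe_no = offset // stripe_width      # which stripe the byte falls in
    within = offset % stripe_width          # position inside that stripe
    disk = within // unit                   # which disk holds that stripe unit
    disk_offset = stripe_no * unit + within % unit
    return disk, disk_offset

# example: 4 disks, 64 KiB stripe width (16 KiB stripe unit per disk)
print(locate(0, 4, 65536))        # (0, 0)
print(locate(20000, 4, 65536))    # (1, 3616)  -> second disk, first stripe
print(locate(70000, 4, 65536))    # (0, 20848) -> first disk, second stripe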
Disk Mirroring
Figure: the host writes through the RAID controller, which places a copy of Block 0 on each of the two mirrored disks.
RAID Level-0
Figure: file data (block 0 to block 4) striped across two disks; the sectors of Disk 0 hold blocks 0, 2 and 4, and the sectors of Disk 1 hold blocks 1 and 3.
Data is striped across the HDDs in a RAID set.
RAID 1
Data is mirrored to improve fault tolerance
A RAID 1 group consists of at least two mirrored disks, providing redundancy and improved read performance.
In the event of disk failure, the impact on data recovery is the least
among all RAID implementations.
RAID 1 is suitable for applications that require high availability.
RAID Level-1
Figure: file data (block 0 to block 4) mirrored; the sectors of Disk 0 and Disk 1 each hold identical copies of blocks 0 to 4.
RAID Level 2
This uses bit-level striping: it stripes the bits across the disks.
In the diagram, b1, b2, b3 are data bits and E1, E2, E3 are error-correction codes.
Two groups of disks are needed: one group is used to store the data, and the other is used to store the error-correction codes.
RAID 2 uses a Hamming error-correction code (ECC) and stores this information on the redundancy disks.
When data is written to the disks, the ECC code for the data is calculated on the fly, the data bits are striped to the data disks, and the ECC code is written to the redundancy disks.
When data is read from the disks, the corresponding ECC code is also read from the redundancy disks and used to detect and correct errors.
RAID 3: Bit-Interleaved Parity
Figure: each logical record (e.g., 10010011) is split bit by bit across the data disks as striped physical records, and a parity bit is written to a separate parity disk.
• Error detection and correction
• One separate parity disk
• Splitting the bits of each byte across multiple disks: bit-level striping
• RAID 3 always reads and writes complete stripes of data across all disks, as the drives operate in parallel; there are no partial writes that update one out of many strips in a stripe.
• Only one request can be serviced at a time
• Targeted for high-bandwidth applications: multimedia, image processing
RAID 3 – Parallel Transfer with Dedicated Parity Disk
Figure: the RAID controller takes the data from the host, stripes it bit-wise (Bit 0 to Bit 3) across the data disks, and writes the generated parity P0123 to the dedicated parity disk.
Figure (RAID 4): blocks 0 to 7 from the host are striped across the data disks, and the controller writes the generated parity blocks P0123 and P4567 to a dedicated parity disk.
RAID 5: Block-Interleaved Distributed Parity
Example stripe: block 0, block 1, block 2, block 3, P(0-3)
•It uses striping and the disks (strips) are independently accessible.
•The difference between RAID 4 and RAID 5 is the parity location.
•In RAID 4, parity is written to a dedicated disk, creating a write bottleneck
for the parity disk.
•In RAID 5, parity is distributed across all disks.
•The distribution of parity in RAID 5 overcomes the write bottleneck.
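To illustrate how the parity used in RAID 3/4/5 works, here is a small Python sketch (strip contents and names are made up): the parity strip is the bitwise XOR of the data strips, so a single failed strip can be rebuilt by XOR-ing the surviving strips with the parity.

def xor_strips(strips):
    # bitwise XOR of equal-length byte strings
    out = bytearray(len(strips[0]))
    for strip in strips:
        for i, b in enumerate(strip):
            out[i] ^= b
    return bytes(out)

d0, d1, d2, d3 = b"\x10\x01", b"\x22\x02", b"\x34\x03", b"\x46\x04"
parity = xor_strips([d0, d1, d2, d3])        # P(0-3), distributed across disks in RAID 5

# the disk holding d2 fails: rebuild it from the surviving strips plus the parity
rebuilt = xor_strips([d0, d1, d3, parity])
assert rebuilt == d2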
RAID 5 – Independent Disks with Distributed Parity
Figure: blocks 0 to 7 from the host are striped across the disks, and the generated parity blocks P0123 and P4567 are distributed across different disks instead of being placed on a dedicated parity disk.