DSA Unit VI Hashing and File Organization

Uploaded by

rsghewande2004

Unit VI

Hashing and File Organization

Unit VI - Syllabus
Hashing: Hash tables and scattered tables:
basic concepts, hash function, characteristics
of a good hash function, different key-to-address
transformation techniques, synonyms or
collisions, collision resolution techniques (linear
probing, quadratic probing, rehashing, chaining
with and without replacement).
File: Concept of file, file types and file
organization (sequential, index sequential and
direct access), comparison of different file
organizations.
Hash tables and scattered tables

Hash Table
⚫A hash table is a data structure used to
store <key : value> pairs

⚫It uses a hash function to compute an index into
an array at which an element will be inserted
or searched

⚫Under reasonable assumptions, the average
time required to search for an element in a
hash table is O(1)

Introduction
⚫Hashing is the process of finding the address at which data is to be
stored, and later located, by applying a function to a key
⚫Hashing directly computes the address of
a record from its key using a suitable
mathematical function called the hash function
⚫The resulting address is used as the basis for storing and
retrieving records and is called the home
address of the record
⚫To store a record in a hash table, the hash function
is applied to the key of the record being stored, returning
an index within the range of the hash table
⚫The item is then stored in the table at that index position

Hash Function
⚫A hash function is a mathematical function that
converts a numerical input value into another,
compressed numerical value.

⚫The mod method: In this method for creating hash
functions, we map a key into one of the slots of the
table by taking the remainder of the key divided by the
table size. That is, the hash function is

⚫h(key) = key mod table_size

⚫For example, with table_size = 10, h(16) = 16 % 10 = 6
⚫A perfect hash function is one that maps the set of
actual key values to the table without any collisions.

Hash Function

Sr.No.  Key  Hash           Array Index
1       1    1 % 20 = 1     1
2       2    2 % 20 = 2     2
3       42   42 % 20 = 2    2
4       4    4 % 20 = 4     4
5       12   12 % 20 = 12   12
6       14   14 % 20 = 14   14
7       17   17 % 20 = 17   17
8       13   13 % 20 = 13   13
9       37   37 % 20 = 17   17
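
The table above can be reproduced with a few lines of Python. This is only a minimal sketch: the function names `mod_hash` and `group_by_index` are made up for illustration and do not come from the slides. It applies key % 20 to the nine keys and reports which array indices received more than one key.

```python
from collections import defaultdict


def mod_hash(key: int, table_size: int) -> int:
    """Division (mod) method: remainder of key divided by table_size."""
    return key % table_size


def group_by_index(keys, table_size):
    """Map each array index to the list of keys that hash to it."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[mod_hash(key, table_size)].append(key)
    return dict(buckets)


keys = [1, 2, 42, 4, 12, 14, 17, 13, 37]   # keys from the table above
buckets = group_by_index(keys, 20)

# Indices that received more than one key are collisions.
collisions = {idx: ks for idx, ks in buckets.items() if len(ks) > 1}
print(collisions)
```

With these keys, indices 2 and 17 each receive two keys (2 and 42, 17 and 37), which is what the following slides call a collision.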
Bucket
⚫A bucket is an index position in a hash table that
can store more than one record

⚫When two keys map to the same index,
both records are stored in the
same bucket

Collision
⚫When two keys hash to the same
address, the result is called a collision

Synonym

⚫Keys that hash to the same
address are called synonyms

Overflow

⚫When more keys hash to the same address
than the bucket has room for,
an overflow is said to have occurred

Load density and Load factor

⚫The maximum storage capacity, that is, the
maximum number of records that can be
accommodated, is called the load density.

⚫Load factor is the number of records
stored in the table divided by the maximum
capacity of the table, expressed as a
percentage

Features of a good hash function
⚫The average performance of hashing depends on
how the hash function distributes the set of keys among
the slots

⚫The assumption is that any given record is equally
likely to hash into any of the slots, independently
of whether any other record has already been
hashed to it or not

⚫This assumption is called simple uniform
hashing

⚫A good hash function is one that satisfies
the assumption of simple uniform hashing
Hash Functions

⚫Division Method
⚫Multiplication Method
⚫Extraction Method
⚫Mid-Square Hashing
⚫Folding Technique
⚫Rotation
⚫Universal Hashing
Division method
⚫One of the required features of a hash function is
that the resultant index must be within the table index
range

⚫One simple choice for a hash function is to use
modulus division, indicated as MOD (the operator %
in C/C++)

⚫The function returns an integer

⚫If any parameter is NULL, the result is NULL

⚫Hash(Key) = Key % M
Multiplication method
⚫The multiplication method works as follows:
1. Multiply the key ‘Key’ by a constant A in the
range 0 < A < 1 and extract the fractional part of
Key × A

2. Then multiply this value by M and take the
floor of the result
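
The two steps can be sketched in Python as follows. The slide does not say which constant A to use. The value (√5 − 1)/2 below is an assumption, chosen because it is a commonly suggested constant, and any other value with 0 < A < 1 fits the description equally well. The function name `multiplication_hash` is also illustrative.

```python
import math

# Assumption: A = (sqrt(5) - 1) / 2, roughly 0.618. The slide only requires 0 < A < 1.
DEFAULT_A = (math.sqrt(5) - 1) / 2


def multiplication_hash(key: int, m: int, a: float = DEFAULT_A) -> int:
    """Multiplication method: floor(M * fractional_part(Key * A))."""
    fractional = (key * a) % 1.0        # step 1: fractional part of Key x A
    return math.floor(m * fractional)   # step 2: scale by M and take the floor


# Every result lies in the table index range 0 .. M-1.
print([multiplication_hash(k, 10) for k in (3, 2, 9, 6, 11)])
```

Unlike the division method, the table size M does not have to be chosen with care here, because the spreading of keys comes from the fractional part rather than from the remainder.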
Extraction method
❖ When a portion of the key is used for the address
calculation, the technique is called the extraction method
❖ In digit extraction, a few digits are selected and extracted
from the key and used as the address

Key Hashed Address


345678 357
234137 243
952671 927
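
In the table above, each hashed address consists of the 1st, 3rd and 5th digits of the key. A minimal sketch of digit extraction under that reading follows. The function name and the `positions` parameter are illustrative, not part of the slides.

```python
def digit_extraction_hash(key: int, positions=(1, 3, 5)) -> int:
    """Build the address from the digits of key at the given 1-based positions."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))


for key in (345678, 234137, 952671):
    print(key, digit_extraction_hash(key))
```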
Mid-Square Hashing method
⚫Mid-square hashing takes the
square of the key
and extracts the middle digits of the squared
key as the address

⚫The difficulty arises when the key is large. As the
entire key participates in the address
calculation, the square of a large key
may exceed the storage limit of the
variable that holds it

⚫So mid-square is used when the key size is less
than or equal to 4 digits
Key and address using mid-square

Key     Square     Hashed Address
2341    5480281    802
1671    2792241    922

The difficulty of storing the square of a large number can be overcome
if we square only a few digits of the key instead of the
whole key.
If the key is large, we can select a portion of it and then square
that portion.

Keys and addresses obtained by extracting a few digits,
squaring them, and again extracting the middle digits

Key      Square                Hashed Address
234137   234 x 234 = 054756    475
567187   567 x 567 = 321489    148
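
Both variants can be sketched as below. The helper names are illustrative. Two assumptions are made: the middle three digits are taken, and for a six-digit square "middle" means digits 3 to 5, which is the choice that yields 148 for 321489. Note that 234 × 234 = 54756, which is padded to 054756 here, so the sketch yields 475 for the key 234137.

```python
def middle_digits(number: int, r: int, width: int = 0) -> int:
    """Return r middle digits of number, left-padded with zeros to width digits."""
    s = str(number).zfill(width)
    start = (len(s) - r + 1) // 2      # for even lengths, lean towards the right
    return int(s[start:start + r])


def mid_square_hash(key: int, r: int = 3) -> int:
    """Square the whole key and take the middle r digits."""
    return middle_digits(key * key, r)


def partial_mid_square_hash(key: int, take: int = 3, r: int = 3) -> int:
    """Square only the first `take` digits of the key, then take the middle r digits."""
    portion = int(str(key)[:take])
    return middle_digits(portion * portion, r, width=2 * take)


print(mid_square_hash(2341), mid_square_hash(1671))
print(partial_mid_square_hash(234137), partial_mid_square_hash(567187))
```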
Folding technique
⚫In the folding technique, the key is subdivided
into subparts that are combined, or folded,
to form the address

⚫For a key made of digits, we can subdivide the
digits into three parts, add them up, and use the
result as an address.

⚫Here the size of each subpart of the key can be the same as
that of the address
Folding technique
⚫There are two types of folding methods:
⚫Fold shift: The key value is divided into several
parts, each of the size of the address. The left,
right, and middle parts are added

⚫Fold boundary: The key value is divided into
parts, each of the size of the address

⚫The left and right parts are folded (reversed) on the fixed
boundary between them and the centre part, and the parts are then added
Folding technique
⚫For example, if the key is 987654321, it is
understood as
⚫Left 987 Centre 654 Right 321

⚫For fold shift, the addition is
⚫987 + 654 + 321 = 1962

⚫Now discard digit 1 and the address is 962

⚫For fold boundary, the left and right parts are reversed before the addition:
⚫789 + 654 + 123 = 1566
⚫Discard digit 1 and the address is 566
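
A minimal Python sketch of both folding methods follows, with illustrative function names. It follows the description of fold boundary given earlier, in which only the left and right parts are folded (reversed) and the centre part is added unchanged. For the key 987654321 that gives 789 + 654 + 123 = 1566, i.e. address 566. Reversing the centre part as well would instead give 789 + 456 + 123 = 1368, i.e. 368. The carry digit is discarded by taking the sum modulo 10^size.

```python
def split_parts(key: int, size: int):
    """Split the decimal digits of key into consecutive parts of `size` digits."""
    digits = str(key)
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def fold_shift(key: int, size: int = 3) -> int:
    """Add the parts as they are and discard any carry beyond `size` digits."""
    total = sum(int(part) for part in split_parts(key, size))
    return total % 10 ** size


def fold_boundary(key: int, size: int = 3) -> int:
    """Reverse the leftmost and rightmost parts, add all parts, discard the carry."""
    parts = split_parts(key, size)
    parts[0] = parts[0][::-1]
    parts[-1] = parts[-1][::-1]
    total = sum(int(part) for part in parts)
    return total % 10 ** size


print(fold_shift(987654321))     # 987 + 654 + 321 = 1962, address 962
print(fold_boundary(987654321))  # 789 + 654 + 123 = 1566, address 566
```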
Rehashing
⚫Rehashing means hashing again.

⚫When the load factor rises above
its pre-defined value (the default value of the load
factor is 0.75), collisions become more frequent and the cost of operations increases.

⚫The table might then no longer give the required time
complexity of O(1).

⚫To overcome this, the size of the table is
increased (doubled), and all the values are hashed
again and stored in the new double-sized
array to maintain a low load factor and low
complexity.
How Rehashing is done?
⚫For each addition of a new entry to the
table, check the load factor.
⚫If it is greater than its pre-defined value (or the
default value of 0.75 if none is given), then
rehash.
⚫To rehash, make a new array of double the
previous size and make it the new bucket
array.
⚫Then traverse each element in the old
bucket array and call insert() for each so
as to insert it into the new, larger bucket
array.
How Rehashing is done?

⚫Load factor (λ) = n/M, where n is the number of keys stored and M is the table size

⚫Before mapping keys, we have to find the load
factor
⚫If load factor (λ) < 0.75, there is no need to apply
rehashing (i.e., n < 0.75 M)
⚫If load factor (λ) > 0.75, apply rehashing
⚫Step I: Increase the number of buckets, i.e.
the initial bucket count N becomes the modified bucket count N’
⚫where N’ = closest prime number to 2N
⚫Step II: Modify the hash function,
i.e. hash(key) = key % N becomes hash’(key) = key % N’
Example - Rehashing
⚫Insert keys (6, 7, 8) into a hash table of
size 3 with the hash function hash(key) =
key MOD 3
⚫6 % 3 = 0
⚫7 % 3 = 1
⚫8 % 3 = 2

Hash value   Key
0            6
1            7
2            8

⚫Load factor = n/M = 3/3 = 1 > 0.75, so apply
rehashing
Example - Rehashing
⚫hash’(key) = key % N’ (N’ is the closest prime
number to 2N)
⚫2N = 2*3 = 6; the primes 5 and 7 are equally close to 6,
and 7 is chosen, so the new hash table size is now 7
⚫Modify the hash function, i.e. hash’(key) =
key % 7
⚫Insert 6: 6 % 7 = 6
⚫Insert 7: 7 % 7 = 0
⚫Insert 8: 8 % 7 = 1

Hash value   Key
0            7
1            8
2
3
4
5
6            6

⚫Load factor = 3/7 ≈ 0.43 < 0.75
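
The rehashing example can be traced with the sketch below. Several details are assumptions rather than statements from the slides: the class name `HashTable`, the use of linear probing to resolve collisions (the example itself has none), and reading "closest prime number to 2N" as the first prime at or above 2N, which gives 7 for N = 3 as in the example.

```python
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime_at_least(n: int) -> int:
    """Assumption: 'closest prime to 2N' is taken as the first prime >= 2N."""
    while not is_prime(n):
        n += 1
    return n


class HashTable:
    def __init__(self, size: int, max_load: float = 0.75):
        self.slots = [None] * size
        self.count = 0
        self.max_load = max_load

    def _place(self, key: int) -> None:
        # Division-method home address; linear probing is an assumption here.
        m = len(self.slots)
        idx = key % m
        while self.slots[idx] is not None:
            idx = (idx + 1) % m
        self.slots[idx] = key

    def insert(self, key: int) -> None:
        self._place(key)
        self.count += 1
        if self.count / len(self.slots) > self.max_load:
            self._rehash()

    def _rehash(self) -> None:
        old_keys = [k for k in self.slots if k is not None]
        self.slots = [None] * next_prime_at_least(2 * len(self.slots))
        for k in old_keys:          # hash every key again with the new size
            self._place(k)


table = HashTable(3)
for key in (6, 7, 8):
    table.insert(key)
print(len(table.slots), table.slots)
```

After the third insertion the load factor reaches 3/3 = 1, so the table grows to 7 slots and the keys move to indices 6, 0 and 1, as in the example.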
Collision resolution strategies
❖ No hash function is perfect.

❖ If Hash(Key1) = Hash(Key2), then Key1 and Key2 are
synonyms, and if the bucket size is 1, we say that a collision has
occurred

❖ As a consequence, we have to store the record Key2 at
some other location

❖ A search is made for a bucket in which the record
containing Key2 can be stored, using one of the several collision
resolution strategies
Collision resolution strategies

❖ Open addressing (closed hashing)
  ❖ Linear probing
  ❖ Quadratic probing
  ❖ Double hashing, and
  ❖ Key offset

❖ Closed addressing (open hashing), also called separate chaining
(linked list)

Scattered Tables

Open addressing
⚫In open addressing, when a collision
occurs, it is resolved by finding an
available empty location other than
the home address
⚫If the location Hash(Key) is not empty, the
positions are probed in a fixed
sequence until an empty location is
found.
⚫When we reach the end of the table, the
search wraps around to the start and
continues until it returns to the
location where the collision occurred
Linear Probing
⚫Linear probing resolves a collision
by putting the item in the next empty place
following the occupied place
⚫The strategy of looking for the next free
location until one is found is called probing.
⚫The function that we can use for probing
linearly from the next location is as follows:

⚫(Hash(x) + p(i)) MOD Max

⚫As p(i) = i for linear probing, the function
becomes (Hash(x) + i) MOD Max
⚫Initially i = 1; if the location is not empty
then i becomes 2, 3, 4, …, and so on until an empty location is found
Linear Probing
Keys: 43, 135, 72, 23, 99, 19, 82
H(k) = k mod M
H’(k) = (H(k) + i) mod M

Index  Key
0
1
2      89
3
4
5
6
7
8
9
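
The insertion of the keys listed above can be traced with the sketch below. It assumes M = 10 (the index column runs from 0 to 9) and an initially empty table, neither of which is stated explicitly. The function name `linear_probe_insert` is illustrative.

```python
def linear_probe_insert(table, key):
    """Insert key using H'(k) = (H(k) + i) mod M with H(k) = k mod M."""
    m = len(table)
    home = key % m
    for i in range(m):                 # i = 0 is the home address itself
        idx = (home + i) % m           # wraps around at the end of the table
        if table[idx] is None:
            table[idx] = key
            return idx
    raise OverflowError("hash table is full")


table = [None] * 10                    # assumption: M = 10, table starts empty
for key in (43, 135, 72, 23, 99, 19, 82):
    linear_probe_insert(table, key)
print(table)
```

Under these assumptions, 23 is displaced from index 3 to 4, 19 wraps around from index 9 to 0, and 82 has to step over indices 2 to 5 before landing at 6, which illustrates how occupied runs grow.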
Disadvantages of Linear Probing

⚫Search time is O(n) in the worst case
⚫Difficulty in deletion
⚫Primary clustering
⚫Secondary clustering

Variants of Linear Probing

❖ With replacement

❖ Without replacement

Linear Probing With replacement

❖ If the slot is already occupied by a key, there are two
possibilities: either the slot is that key’s home address
(a collision) or it is not that key’s home address

❖ If the occupying key’s home address is different, then the new
key, whose home address is that slot, is placed at that
position and the displaced key is placed in
the next empty position

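
A minimal sketch of insertion with replacement is given below, assuming the division method (key % M) as the hash function and a plain Python list as the table. The function name is illustrative. A key that is sitting in a slot which is not its home address is displaced by a new key whose home address is that slot.

```python
def insert_with_replacement(table, key):
    """Linear probing with replacement, using home address = key % M."""
    if None not in table:
        raise OverflowError("hash table is full")
    m = len(table)
    home = key % m
    occupant = table[home]
    if occupant is None:
        table[home] = key
        return
    if occupant % m != home:
        # The occupant is not at its own home address: the new key takes
        # the slot and the occupant becomes the key to be relocated.
        table[home] = key
        key = occupant
    # Place the remaining key in the next empty position after `home`.
    for i in range(1, m):
        idx = (home + i) % m
        if table[idx] is None:
            table[idx] = key
            return


table = [None] * 10
for key in (11, 21, 12):
    insert_with_replacement(table, key)
print(table)
```

Here 21 first settles at index 2 because its home address 1 is taken by 11. When 12 arrives, index 2 is its home address, so 12 replaces 21 and 21 moves on to index 3.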
Linear Probing Without replacement

❖ When some data is to be stored in the hash table, and
the slot is already occupied by a key, then another
empty location is searched for the new record

❖ There are two possibilities when the location is
occupied: it is the occupying key’s home address, or it is
not

❖ In both cases, the without-replacement strategy
searches for an empty position for the key that is to be
stored
2. Quadratic Probing

⚫Quadratic probing is an open-addressing
scheme where we look
for the i²-th slot in the i-th iteration if the
given hash value x collides in the
hash table.

⚫It operates by taking the original
index and adding successive values
of an arbitrary quadratic
polynomial until an open slot is found
How Quadratic Probing is done ?

Let hash(x) be the slot index computed using
the hash function.

⚫If the slot hash(x) % M is full, then we try
(hash(x) + 1*1) % M.
⚫If (hash(x) + 1*1) % M is also full, then we try
(hash(x) + 2*2) % M.
⚫If (hash(x) + 2*2) % M is also full, then we try
(hash(x) + 3*3) % M.
⚫This process is repeated for all the values of i
until an empty slot is found.
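
The order in which slots are tried can be listed with a short sketch. The function name `quadratic_probe_sequence` is illustrative.

```python
def quadratic_probe_sequence(home: int, m: int):
    """Slots tried for a key with the given home index: (home + i*i) % m."""
    return [(home + i * i) % m for i in range(m)]


print(quadratic_probe_sequence(7, 10)[:5])        # first five slots tried
print(sorted(set(quadratic_probe_sequence(0, 10))))
```

The second line shows that for M = 10 the sequence starting at index 0 only ever reaches six distinct slots, so quadratic probing can fail to find an empty slot even when the table is not full.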
Example - Quadratic Probing

⚫Problem statement
⚫Insert key Ki at the first free location from (u + i²)
MOD M, where u = H(Ki) MOD M and i = 0 to (M-1)
⚫KEYS : 3,2,9,6,11,13,7,12
⚫H(k) = 2k+3
⚫M=10
⚫Use the modulo-division method and quadratic
probing to store these keys.

Example - Quadratic Probing

Index  Keys
0
1
2
3
4
5
6
7
8
9

Key  Location  No. of probe(s)
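
The blank table can be filled in by tracing the problem statement in code. The sketch below is one possible working, with illustrative names. It takes u = (2k + 3) MOD 10 as the home address and counts every location examined, including the home address, as a probe.

```python
def quadratic_insert(table, key, hash_fn):
    """Insert key at the first free slot (u + i*i) % M; return (location, probes)."""
    m = len(table)
    u = hash_fn(key) % m               # home address by modulo division
    for i in range(m):
        idx = (u + i * i) % m
        if table[idx] is None:
            table[idx] = key
            return idx, i + 1
    raise OverflowError("no free slot found in the probe sequence")


table = [None] * 10                    # M = 10
results = {}
for key in (3, 2, 9, 6, 11, 13, 7, 12):
    results[key] = quadratic_insert(table, key, lambda k: 2 * k + 3)

for key, (location, probes) in results.items():
    print(key, location, probes)
print(table)
```

Keys 3, 2, 9 and 6 go straight to their home addresses 9, 7, 1 and 5. Keys 11, 13 and 7 each need two probes, and key 12 needs five probes (slots 7, 8, 1, 6, 3) before it is stored at index 3.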
Concept of File
file organization
