DSA Unit VI Hashing and File Organization

Uploaded by

rsghewande2004

Unit VI

Hashing and File Organization

Unit VI - Syllabus
Hashing: Hash tables and scattered tables: basic concepts, hash function, characteristics of a good hash function, different key-to-address transformation techniques, synonyms or collisions, collision resolution techniques (linear probing, quadratic probing, rehashing, chaining with and without replacement).
File: Concept of file, file types and file organization (sequential, index sequential and direct access), comparison of different file organizations.
Hash tables and scattered tables

Hash Table
⚫A hash table is a data structure used to store <key : value> pairs
⚫It uses a hash function to compute an index into an array at which an element will be inserted or searched
⚫Under reasonable assumptions, the average time required to search for an element in a hash table is O(1)
Introduction
⚫Hashing is the process of finding the address where data is to be stored, and later located, by applying a function to a key
⚫Hashing is a method of directly computing the address of a record from its key by using a suitable mathematical function called the hash function
⚫The resulting address is used as the basis for storing and retrieving records; it is called the home address of the record
⚫To store a record in a hash table, the hash function is applied to the key of the record, returning an index within the range of the hash table
⚫The item is then stored in the table at that index position
Hash Function
⚫A hash function is a mathematical function that converts a numerical input value into another, compressed numerical value
⚫The mod method: to create a hash function, we map a key into one of the slots of the table by taking the remainder of the key divided by the table size. That is, the hash function is
⚫h(key) = key mod table_size
⚫For example, with table_size = 10, h(16) = 16 % 10 = 6
⚫A perfect hash function is one that maps the set of actual key values to the table without any collisions
Hash Function

Sr.No.  Key  Hash           Array Index
1       1    1 % 20 = 1     1
2       2    2 % 20 = 2     2
3       42   42 % 20 = 2    2
4       4    4 % 20 = 4     4
5       12   12 % 20 = 12   12
6       14   14 % 20 = 14   14
7       17   17 % 20 = 17   17
8       13   13 % 20 = 13   13
9       37   37 % 20 = 17   17
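The table above can be reproduced with a few lines of Python. This is only a minimal sketch: the names mod_hash, TABLE_SIZE and slots are chosen here for illustration and do not come from the slides. It also shows which indexes receive more than one key.

```python
from collections import defaultdict

TABLE_SIZE = 20  # table size used in the table above


def mod_hash(key: int, table_size: int = TABLE_SIZE) -> int:
    """Mod method: remainder of the key divided by the table size."""
    return key % table_size


keys = [1, 2, 42, 4, 12, 14, 17, 13, 37]

# Group the keys by the index they hash to, so shared indexes stand out.
slots = defaultdict(list)
for key in keys:
    slots[mod_hash(key)].append(key)

for index in sorted(slots):
    print(index, slots[index])

# Indexes that received more than one key.
collisions = {index: ks for index, ks in slots.items() if len(ks) > 1}
print(collisions)
```

Keys 2 and 42 share index 2, and keys 17 and 37 share index 17.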
Bucket
⚫A bucket is an index position in the hash table that can store more than one record
⚫When two keys map to the same index, both records are stored in the same bucket
Collision
⚫When two keys hash to the same address, the result is called a collision
Synonym

⚫Keys that hash to the same address are called synonyms
Overflow

⚫When more keys hash to the same address than the bucket has room for, overflow is said to have occurred
Load density and Load factor

⚫Load density is the maximum storage capacity, that is, the maximum number of records that can be accommodated
⚫Load factor is the number of records stored in the table divided by the maximum capacity of the table, expressed as a percentage
Features of a good hash function
⚫The average performance of hashing depends on how the hash function distributes the set of keys among the slots
⚫The assumption is that any given record is equally likely to hash into any of the slots, independently of whether any other record has already been hashed to it
⚫This assumption is called simple uniform hashing
⚫A good hash function is one that satisfies the assumption of simple uniform hashing
Hash Functions

⚫Division Method
⚫Multiplication Method
⚫Extraction Method
⚫Mid-Square Hashing
⚫Folding Technique
⚫Rotation
⚫Universal Hashing
Division method
⚫One of the required features of the hash function is that the resultant index must be within the table index range
⚫One simple choice for a hash function is to use modulus division, indicated as MOD (the operator % in C/C++)
⚫The function returns an integer
⚫If any parameter is NULL, the result is NULL
⚫Hash(Key) = Key % M
Multiplication method
⚫The multiplication method works as follows:
1. Multiply the key ‘Key’ by a constant A in the range 0 < A < 1 and extract the fractional part of Key × A
2. Then multiply this value by M and take the floor of the result
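The two steps above can be sketched in Python as follows. The slides do not give a value for A, so the choice A = (sqrt(5) - 1) / 2 is an assumption made here (it is a commonly suggested constant). The function name multiplication_hash is also illustrative.

```python
import math

# Assumed constant, not taken from the slides: A = (sqrt(5) - 1) / 2.
A = (math.sqrt(5) - 1) / 2


def multiplication_hash(key: int, m: int, a: float = A) -> int:
    """Return floor(m * frac(key * a)) for a constant 0 < a < 1."""
    fractional = (key * a) % 1.0       # step 1: fractional part of Key x A
    return math.floor(m * fractional)  # step 2: multiply by M, take the floor


print(multiplication_hash(123456, 16384))  # prints 67
```

Because the fractional part always lies in [0, 1), the result is always a valid index in the range 0 to M - 1.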
Extraction method
❖ When a portion of the key is used for the address calculation, the technique is called the extraction method
❖ In digit extraction, a few digits are selected and extracted from the key and used as the address

Key      Hashed Address
345678   357
234137   243
952671   927
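In the table above, each address appears to be formed from the first, third and fifth digits of the key. A minimal Python sketch under that reading follows. The function name extract_digits and the positions parameter are illustrative, not from the slides.

```python
def extract_digits(key: int, positions=(0, 2, 4)) -> int:
    """Digit extraction: build the address from selected digits of the key.

    positions are zero-based digit positions counted from the left;
    (0, 2, 4) selects the first, third and fifth digits.
    """
    digits = str(key)
    return int("".join(digits[p] for p in positions))


for key in (345678, 234137, 952671):
    print(key, extract_digits(key))
```

Running it reproduces the three addresses in the table: 357, 243 and 927.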
Mid-Square Hashing method
⚫Mid-square hashing takes the square of the key and extracts the middle digits of the squared key as the address
⚫The difficulty arises when the key is large. As the entire key participates in the address calculation, the square of a large key may exceed the storage limit
⚫So mid-square is used when the key size is less than or equal to 4 digits
Key and address using mid-square

Key    Square    Hashed Address
2341   5480281   802
1671   2792241   922

The difficulty of storing the square of a large number can be overcome by squaring only a few digits of the key instead of the whole key. If the key is large, we can select a portion of it and then square that portion.

Keys and addresses using extracting a few digits, squaring them, and again extracting the middle digits

Key      Square               Hashed Address
234137   234 x 234 = 054756   475
567187   567 x 567 = 321489   148
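A small Python sketch of mid-square hashing follows. The slides do not say how the middle is chosen when the square has an even number of digits. The sketch assumes the window leans to the right, which reproduces the addresses 802, 922 and 148 shown above. The function name mid_square_hash is illustrative.

```python
def mid_square_hash(key: int, address_digits: int = 3) -> int:
    """Square the key and return its middle address_digits digits."""
    squared = str(key * key)
    # Start of the middle window. For an even-length square this leans
    # right (an assumption), e.g. "321489" -> "148".
    start = max(0, (len(squared) - address_digits + 1) // 2)
    return int(squared[start:start + address_digits])


print(mid_square_hash(2341))  # 2341 * 2341 = 5480281 -> 802
print(mid_square_hash(1671))  # 1671 * 1671 = 2792241 -> 922
print(mid_square_hash(567))   # 567 * 567 = 321489 -> 148
```

For a large key, the same function can be applied to a selected portion of the key, for example mid_square_hash(567) for the key 567187.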
Folding technique
⚫In the folding technique, the key is subdivided into subparts that are folded and then combined to form the address
⚫For a key with many digits, we can subdivide the digits into three parts, add them up, and use the result as the address
⚫Here the size of each subpart of the key can be the same as that of the address
⚫There are two types of folding methods:
⚫Fold shift: the key value is divided into several parts, each of the size of the address. The left, right, and middle parts are added
⚫Fold boundary: the key value is divided into parts, each of the size of the address. The left and right parts are folded on the fixed boundary between them and the centre part, and the parts are then added
⚫For example, if the key is 987654321, it is divided as
⚫Left 987, Centre 654, Right 321
⚫For fold shift, the addition is
⚫987 + 654 + 321 = 1962
⚫Now discard the leading digit 1 and the address is 962
⚫For fold boundary, the left and right parts are reversed before the addition
⚫789 + 654 + 123 = 1566
⚫Discard the leading digit 1 and the address is 566
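Both folding methods can be sketched in Python. The sketch assumes that fold boundary reverses only the left and right parts and leaves the centre part unchanged, which is how the fold boundary description reads. Under that assumption the key 987654321 folds to 566. The function names are illustrative.

```python
def split_key(key: int, size: int) -> list[str]:
    """Split the decimal digits of key into parts of `size` digits."""
    digits = str(key)
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def fold_shift(key: int, size: int = 3) -> int:
    """Fold shift: add the parts as they are."""
    total = sum(int(part) for part in split_key(key, size))
    return total % 10 ** size  # discard carry digits beyond the address size


def fold_boundary(key: int, size: int = 3) -> int:
    """Fold boundary: reverse the outer parts (assumption), then add."""
    parts = split_key(key, size)
    parts[0] = parts[0][::-1]
    if len(parts) > 1:
        parts[-1] = parts[-1][::-1]
    total = sum(int(part) for part in parts)
    return total % 10 ** size


print(fold_shift(987654321))     # 987 + 654 + 321 = 1962 -> 962
print(fold_boundary(987654321))  # 789 + 654 + 123 = 1566 -> 566
```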
Rehashing
⚫Rehashing means hashing again
⚫When the load factor rises above its pre-defined value (the default value of the load factor is 0.75), the complexity increases
⚫The table might then no longer give the required time complexity of O(1)
⚫To overcome this, the size of the table is increased (doubled), and all the values are hashed again and stored in the new double-sized array to maintain a low load factor and low complexity
How Rehashing is done?
⚫For each addition of a new entry to the
table, check the load factor.
⚫If it’s greater than its pre-defined value (or
default value of 0.75 if not given), then
Rehash.
⚫To rehash, make a new array of double the
previous size and make it the new bucket
array.
⚫Then traverse each element in the old bucket array and call insert() for each, so as to insert it into the new, larger bucket array
⚫Load factor = n/M, where n is the number of records stored and M is the table size
⚫Before mapping keys, we have to find the load factor
⚫If load factor < 0.75 (i.e., n < 0.75 M), there is no need to apply rehashing
⚫If load factor > 0.75, apply rehashing
⚫Step I: Increase the number of buckets, i.e. the initial bucket count N becomes the modified bucket count N’
⚫where N’ = closest prime number to 2N
⚫Step II: Modify the hash function, i.e. hash(key) = key % N becomes hash’(key) = key % N’
Example - Rehashing
⚫Insert keys (6, 7, 8) into a hash table of size 3 with hash function hash(key) = key MOD 3
⚫6 % 3 = 0
⚫7 % 3 = 1
⚫8 % 3 = 2

Hash value  Key
0           6
1           7
2           8

⚫Load factor = n/M = 3/3 = 1 > 0.75, so apply rehashing
⚫hash’(key) = key % N’ (N’ is the closest prime number to 2N)
⚫2N = 2 * 3 = 6 and the closest prime number to 6 is taken as 7 (5 is equally close; the larger prime is chosen), so the new hash table size is 7
⚫Modify the hash function, i.e. hash’(key) = key % 7
⚫Insert 6: 6 % 7 = 6
⚫Insert 7: 7 % 7 = 0
⚫Insert 8: 8 % 7 = 1

Hash value  Key
0           7
1           8
2
3
4
5
6           6

⚫Load factor = 3/7 ≈ 0.43 < 0.75
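The rehashing example can be traced with the following Python sketch. It is an illustration and not a definitive implementation. The class name RehashingTable and the helper next_prime_at_least are invented here. The new size is taken as the smallest prime not below 2N, which is an assumption that yields 7 for N = 3. Each bucket is a plain list.

```python
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime_at_least(n: int) -> int:
    """Smallest prime >= n (assumed reading of 'closest prime to 2N')."""
    while not is_prime(n):
        n += 1
    return n


class RehashingTable:
    def __init__(self, size: int = 3, max_load: float = 0.75):
        self.size = size
        self.max_load = max_load
        self.count = 0
        self.buckets = [[] for _ in range(size)]

    def _place(self, key: int) -> None:
        self.buckets[key % self.size].append(key)
        self.count += 1

    def insert(self, key: int) -> None:
        self._place(key)
        # Check the load factor n/M after each insertion.
        if self.count / self.size > self.max_load:
            self._rehash()

    def _rehash(self) -> None:
        old_buckets = self.buckets
        self.size = next_prime_at_least(2 * self.size)
        self.buckets = [[] for _ in range(self.size)]
        self.count = 0
        # Re-insert every key using the new hash function key % N'.
        for bucket in old_buckets:
            for key in bucket:
                self._place(key)


table = RehashingTable(size=3)
for key in (6, 7, 8):
    table.insert(key)
print(table.size, table.buckets)
```

After inserting 6, 7 and 8, the load factor reaches 3/3 = 1, the table grows to size 7, and the keys land in slots 6, 0 and 1 respectively.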
Collision resolution strategies
❖ No hash function is perfect
❖ If Hash(Key1) = Hash(Key2), then Key1 and Key2 are synonyms, and if the bucket size is 1, we say that a collision has occurred
❖ As a consequence, we have to store the record Key2 at some other location
❖ A search is made for a bucket in which the record containing Key2 can be stored, using one of the several collision resolution strategies
❖ Open addressing (closed hashing):
❖ Linear probing
❖ Quadratic probing
❖ Double hashing
❖ Key offset

❖ Closed addressing (open hashing), also called separate chaining (using linked lists)
Scattered Tables
Open addressing
⚫In open addressing, when a collision occurs, it is resolved by finding an available empty location other than the home address
⚫If the location Hash(Key) is not empty, further positions are probed in sequence until an empty location is found
⚫When we reach the end of the table, the search wraps around to the start and continues until it returns to the location where the collision occurred
Linear Probing
⚫In linear probing, a collision is resolved by putting the item in the next empty place following the occupied place
⚫The strategy of looking for the next free location until one is found is called probing
⚫The function that we can use for probing linearly from the next location is as follows:
⚫(Hash(x) + p(i)) MOD Max
⚫As p(i) = i for linear probing, the function becomes (Hash(x) + i) MOD Max
⚫Initially i = 1; if the location is not empty, then i becomes 2, 3, 4, …, and so on until an empty location is found
Linear Probing
Keys: 43, 135, 72, 23, 99, 19, 82
H(k) = k mod M
H’(k) = (H(k) + i) mod M

Index  Key
0
1
2      89
3
4
5
6
7
8
9
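The following Python sketch inserts the keys listed on this slide into an initially empty table using linear probing. The table size M = 10 is an assumption read off the index column (0 to 9). The function name linear_probe_insert is illustrative.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key using H'(k) = (H(k) + i) mod M with H(k) = k mod M.

    Returns the index where the key was stored.
    """
    m = len(table)
    home = key % m
    for i in range(m):           # i = 0 is the home address itself
        index = (home + i) % m   # wraps around at the end of the table
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError("hash table is full")


table = [None] * 10  # assumed M = 10
for key in (43, 135, 72, 23, 99, 19, 82):
    linear_probe_insert(table, key)
print(table)
# [19, None, 72, 43, 23, 135, 82, None, None, 99]
```

Key 23 collides with 43 at index 3 and moves to 4, key 19 collides with 99 at index 9 and wraps around to 0, and key 82 probes indexes 2 to 5 before settling at 6.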
Disadvantages of Linear Probing

⚫Search time O(n) in worst case

⚫Difficulty in deletion
⚫Primary clustering
⚫Secondary clustering

Variants of Linear Probing

❖ With replacement

❖ Without replacement

Linear Probing With replacement

❖ If the slot is already occupied by a key, there are two possibilities: either the slot is that key’s home address (a collision) or it is not that key’s home address
❖ If the occupying key’s home address is different, then the new key, whose home address is that slot, is placed at that position, and the displaced key is placed in the next empty position
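A minimal Python sketch of the with-replacement variant follows. It assumes H(k) = k mod M and that the displaced key is re-placed by probing linearly from the slot it was removed from. The function name insert_with_replacement is illustrative.

```python
def insert_with_replacement(table: list, key: int) -> None:
    """Linear probing with replacement, assuming H(k) = k mod M."""
    if None not in table:
        raise OverflowError("hash table is full")
    m = len(table)
    home = key % m
    occupant = table[home]
    if occupant is not None and occupant % m != home:
        # The occupant is not at its own home address: the new key takes
        # the slot and the occupant becomes the key to be re-placed.
        table[home] = key
        key = occupant
    # Probe linearly from the slot for the first empty position.
    for i in range(m):
        index = (home + i) % m
        if table[index] is None:
            table[index] = key
            return


table = [None] * 10
for key in (12, 22, 3):
    insert_with_replacement(table, key)
print(table)
# [None, None, 12, 3, 22, None, None, None, None, None]
```

Here 22 collides with 12 at index 2 and is stored at index 3. When 3 arrives, index 3 is its home address but is held by 22, whose home is 2, so 3 takes index 3 and 22 moves to the next empty position, index 4.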
Linear Probing Without replacement

❖ When some data is to be stored in the hash table and the slot is already occupied by a key, another empty location is searched for the new record
❖ There are two possibilities when the location is occupied: it is the occupying key’s home address, or it is not
❖ In both cases, the without replacement strategy searches for an empty position for the key that is to be stored
2. Quadratic Probing

⚫Quadratic probing is an open-addressing scheme where we look for the i²-th slot in the i-th iteration if the given hash value x collides in the hash table
⚫It operates by taking the original index and adding successive values of an arbitrary quadratic polynomial until an open slot is found
How Quadratic Probing is done ?

Let hash(x) be the slot index computed using the hash function.

⚫If the slot hash(x) % M is full, then we try (hash(x) + 1*1) % M
⚫If (hash(x) + 1*1) % M is also full, then we try (hash(x) + 2*2) % M
⚫If (hash(x) + 2*2) % M is also full, then we try (hash(x) + 3*3) % M
⚫This process is repeated for successive values of i until an empty slot is found
Example - Quadratic Probing

⚫Problem statement
⚫Insert Ki at the first free location from (u + i²) MOD M, where i = 0 to (M-1)
⚫KEYS: 3, 2, 9, 6, 11, 13, 7, 12
⚫H(k) = 2k + 3
⚫M = 10
⚫Use the modulo-division method and quadratic probing to store these keys
Index  Keys
0
1
2
3
4
5
6
7
8
9

Key  Location  No. of probe(s)
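The problem statement can be worked through with the Python sketch below. It assumes that u is H(k) MOD M, that is u = (2k + 3) mod 10, since the slide does not define u explicitly. The function name quadratic_probe_insert is illustrative.

```python
def quadratic_probe_insert(table: list, key: int) -> tuple[int, int]:
    """Insert key at the first free (u + i*i) mod M for i = 0 .. M-1.

    Returns (location, number of probes). Assumes u = (2*key + 3) mod M.
    """
    m = len(table)
    u = (2 * key + 3) % m
    for i in range(m):
        index = (u + i * i) % m
        if table[index] is None:
            table[index] = key
            return index, i + 1
    raise OverflowError("no free slot found in M probes")


table = [None] * 10  # M = 10
probes = {}
for key in (3, 2, 9, 6, 11, 13, 7, 12):
    probes[key] = quadratic_probe_insert(table, key)

print(table)
# [13, 9, None, 12, None, 6, 11, 2, 7, 3]
for key, (location, count) in probes.items():
    print(key, location, count)
```

Under these assumptions, keys 3, 2, 9 and 6 are placed in one probe each, keys 11, 13 and 7 need two probes, and key 12 needs five probes before landing at index 3.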
Concept of File
file organization

49
file organization

50
file organization

51
file organization

52
file organization

53
file organization

54
file organization

55
file organization

Chem 1220 Assignment 4
No ratings yet
Chem 1220 Assignment 4
5 pages
Case 2
100% (1)
Case 2
8 pages
Pizza Bomb Feasib Compiled
100% (3)
Pizza Bomb Feasib Compiled
235 pages
Hashing
No ratings yet
Hashing
30 pages
Hashing new
No ratings yet
Hashing new
48 pages
Unit-5 2
No ratings yet
Unit-5 2
9 pages
Hashing and Graphs
No ratings yet
Hashing and Graphs
28 pages
Hash Tables: Dr. Dibakar Saha
No ratings yet
Hash Tables: Dr. Dibakar Saha
26 pages
3 Hashing
No ratings yet
3 Hashing
20 pages
Hashing Part 1 Lecture
No ratings yet
Hashing Part 1 Lecture
33 pages
Hashing
No ratings yet
Hashing
56 pages
MODULE-5
No ratings yet
MODULE-5
33 pages
Hashing
No ratings yet
Hashing
20 pages
Hashing
No ratings yet
Hashing
42 pages
Hashing
No ratings yet
Hashing
34 pages
Study_Material_on_Hashing
No ratings yet
Study_Material_on_Hashing
4 pages
UNIT 1- Hashing
No ratings yet
UNIT 1- Hashing
118 pages
ADS M TECH MID 2
No ratings yet
ADS M TECH MID 2
26 pages
Hashing PDF
No ratings yet
Hashing PDF
56 pages
Hash
No ratings yet
Hash
7 pages
Done DS GTU Study Material Presentations Unit-4 13032021035653AM
No ratings yet
Done DS GTU Study Material Presentations Unit-4 13032021035653AM
24 pages
HAshing (Satish sir)
No ratings yet
HAshing (Satish sir)
52 pages
Hashing PPT For Student
No ratings yet
Hashing PPT For Student
53 pages
Hashing
No ratings yet
Hashing
75 pages
GROUP 15.Pptx Presentation
No ratings yet
GROUP 15.Pptx Presentation
29 pages
Hashing
No ratings yet
Hashing
30 pages
Hashing
No ratings yet
Hashing
23 pages
Hashing
No ratings yet
Hashing
44 pages
Hash Function
No ratings yet
Hash Function
9 pages
Cse373 10 Hashing
No ratings yet
Cse373 10 Hashing
36 pages
Hash Table: Didih Rizki Chandranegara
No ratings yet
Hash Table: Didih Rizki Chandranegara
33 pages
UNIT V - Hashing
No ratings yet
UNIT V - Hashing
20 pages
Hashing
No ratings yet
Hashing
23 pages
DSA Lab 11 Hashing
No ratings yet
DSA Lab 11 Hashing
9 pages
Hashing Algorithms
No ratings yet
Hashing Algorithms
22 pages
Hashing and Skiplist_removed
No ratings yet
Hashing and Skiplist_removed
113 pages
Hashing
No ratings yet
Hashing
16 pages
HASHING
No ratings yet
HASHING
63 pages
Week13 1
No ratings yet
Week13 1
16 pages
DS Lecture - 6 (Hashing)
No ratings yet
DS Lecture - 6 (Hashing)
32 pages
2,2Hashing
No ratings yet
2,2Hashing
30 pages
unit 1 Hashing
No ratings yet
unit 1 Hashing
61 pages
DSA MK Lect2 PDF
No ratings yet
DSA MK Lect2 PDF
92 pages
DS Lecture - 6 (Hashing)
No ratings yet
DS Lecture - 6 (Hashing)
26 pages
Unit-5
No ratings yet
Unit-5
50 pages
MODULE 5_BCS304_HASHING_Leftisht trees_OBST_Notes
No ratings yet
MODULE 5_BCS304_HASHING_Leftisht trees_OBST_Notes
32 pages
Unit 1 Dsa Hashing
No ratings yet
Unit 1 Dsa Hashing
137 pages
DS Lecture - 6 (Hashing)
No ratings yet
DS Lecture - 6 (Hashing)
27 pages
Unit 1 Dsa Hashing 2022 Compressed 1
No ratings yet
Unit 1 Dsa Hashing 2022 Compressed 1
115 pages
05 Hashing
No ratings yet
05 Hashing
47 pages
Hashing
No ratings yet
Hashing
23 pages
Hashing Techniques
No ratings yet
Hashing Techniques
13 pages
Hashing
No ratings yet
Hashing
4 pages
Hashing in Data Structure
No ratings yet
Hashing in Data Structure
43 pages
Lect Hashing
No ratings yet
Lect Hashing
36 pages
Hashing
No ratings yet
Hashing
37 pages
DS Module-X
No ratings yet
DS Module-X
74 pages
CO4 - Hashing in Data Structure
No ratings yet
CO4 - Hashing in Data Structure
13 pages
Chapter 4 Hashing and File Structure
No ratings yet
Chapter 4 Hashing and File Structure
46 pages
DSA G5 Hashing Handouts
No ratings yet
DSA G5 Hashing Handouts
7 pages
Hashing
From Everand
Hashing
Prakash Hegade
No ratings yet
Secrets of Business Math Using Excel!
From Everand
Secrets of Business Math Using Excel!
Andrei Besedin
No ratings yet
Basic Math Notes
From Everand
Basic Math Notes
Ernest Bywater
5/5 (2)
Phase 8 Coverage: Coverpoint
No ratings yet
Phase 8 Coverage: Coverpoint
4 pages
COS3701 2024 Oct-Nov Examination
No ratings yet
COS3701 2024 Oct-Nov Examination
4 pages
Hydro Graphs
100% (1)
Hydro Graphs
40 pages
High-Precision Sampling For Brillouin-Zone Integration in Metals
No ratings yet
High-Precision Sampling For Brillouin-Zone Integration in Metals
6 pages
The Adjective: The Rich Should Help The Poor
No ratings yet
The Adjective: The Rich Should Help The Poor
3 pages
Indira Gandhi National Open University: Rcchandigarh@ignou - Ac.in WWW - Ignou.ac - in
No ratings yet
Indira Gandhi National Open University: Rcchandigarh@ignou - Ac.in WWW - Ignou.ac - in
5 pages
Our Commonwealth - December 2009
No ratings yet
Our Commonwealth - December 2009
3 pages
Sri 1
No ratings yet
Sri 1
2 pages
Lowereyelid Blepharoplasty: Gregory H. Branham
No ratings yet
Lowereyelid Blepharoplasty: Gregory H. Branham
10 pages
Space Marine Battlefleet Gothic Reglas y Medidas de Naves
No ratings yet
Space Marine Battlefleet Gothic Reglas y Medidas de Naves
12 pages
Misprision of Perjury of Oath of Office
100% (5)
Misprision of Perjury of Oath of Office
2 pages
Pro Prompt Civit AI
No ratings yet
Pro Prompt Civit AI
5 pages
Quick-Teck Design Document (GSM+GPS Tracking Module)
No ratings yet
Quick-Teck Design Document (GSM+GPS Tracking Module)
8 pages
Edsc 304 - Choice Board - Tic Tac Toe
No ratings yet
Edsc 304 - Choice Board - Tic Tac Toe
4 pages
Airport
No ratings yet
Airport
14 pages
(Ebook) Fundamentals of Thermal-Fluid Sciences by Yunus A. Cengel, John M. Cimbala, Afshin J. Ghajar ISBN 9781260597585, 126059758X - Download the ebook today and own the complete version
100% (3)
(Ebook) Fundamentals of Thermal-Fluid Sciences by Yunus A. Cengel, John M. Cimbala, Afshin J. Ghajar ISBN 9781260597585, 126059758X - Download the ebook today and own the complete version
72 pages
Nglish Speaking/listening - Lessons: Unit 1 - Introducing Each Others
No ratings yet
Nglish Speaking/listening - Lessons: Unit 1 - Introducing Each Others
7 pages
M1 NutritionAndDietTherapy-TRANSES
No ratings yet
M1 NutritionAndDietTherapy-TRANSES
5 pages
Tutorial 8
No ratings yet
Tutorial 8
10 pages
School Project Proposal
No ratings yet
School Project Proposal
36 pages
Hive Installation on Windows 10
No ratings yet
Hive Installation on Windows 10
13 pages
Ship Details-Macao Strait
No ratings yet
Ship Details-Macao Strait
9 pages
Public Relation in Pakistan: Prepared By: Nida Kifayat Tayyeba Bibi
No ratings yet
Public Relation in Pakistan: Prepared By: Nida Kifayat Tayyeba Bibi
25 pages
Download EU Foreign Policy and Post Soviet Conflicts Stealth Intervention 1st Edition Nicu Popescu ebook All Chapters PDF
100% (5)
Download EU Foreign Policy and Post Soviet Conflicts Stealth Intervention 1st Edition Nicu Popescu ebook All Chapters PDF
81 pages
Features: Cabinet Reference Boxes, Doors, Dress Panels, Rack Mounting, and Accessories
No ratings yet
Features: Cabinet Reference Boxes, Doors, Dress Panels, Rack Mounting, and Accessories
6 pages
Cply
No ratings yet
Cply
1 page
ECC Summary Approval of Framework Agreement
No ratings yet
ECC Summary Approval of Framework Agreement
30 pages