Hashing
♯ Introduction
Why is hashing needed?
Operation    Array   Linked list   BST
Add first    O(n)    O(1)          O(height)
Add last     O(1)    O(1)          O(height)
Search       O(n)    O(n)          O(height), i.e. O(log n) if the tree is balanced
Remove       O(n)    O(1)          O(height)
Hashing aims to bring these operations close to O(1) on average.
♯ Introduction…
Hashing: partitioning a large set into subsets
♯ Learning Outcomes
♯ Contents
1- Basics of Hashing
2- Common Hash Functions
3- Data storage of a hash structure
4- Common methods of a hash table
5- Collision Resolution
6- Load Factors, Rehashing, and Efficiency
7- Deletion
8- Perfect Hash Functions - Definition
9- External Hashing: Hash Functions for Extendible (Extensible) Files
10- Hashing in java.util
Your work: re-implement the given project
♯ 1- Basics of Hashing
• What is hashing? A process in which a large data set is partitioned into subsets.
• What is the tool for hashing? A hash function.
• What does a hash function do? The implementer writes this function. It accepts input data (the whole data item, a chunk of it, or its memory address) and outputs the index of the subset the data belongs to.
• Who implements the hash function? The implementer of the hashing scheme.
• The data storage used in hashing is called a hash table. The number of subsets is called the table size.
• What is hashing used for?
– Storing a group of objects so that they can be searched efficiently.
– Adding security to content in cryptography (self-study).
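The partitioning idea can be sketched in Java. This is a minimal illustration, not the course's reference code. The class name `HashPartition` and the character-sum hash function are hypothetical choices, picked only because they are easy to trace by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: partition strings into tableSize subsets with a hash function.
class HashPartition {
    // Illustrative hash function: sum of the character codes, reduced to [0, tableSize).
    static int hash(String key, int tableSize) {
        int sum = 0;
        for (char c : key.toCharArray()) {
            sum += c;
        }
        return Math.floorMod(sum, tableSize);
    }

    // Each data item goes into the subset whose index the hash function returns.
    static List<List<String>> partition(List<String> data, int tableSize) {
        List<List<String>> subsets = new ArrayList<>();
        for (int i = 0; i < tableSize; i++) {
            subsets.add(new ArrayList<>());
        }
        for (String item : data) {
            subsets.get(hash(item, tableSize)).add(item);
        }
        return subsets;
    }

    public static void main(String[] args) {
        List<List<String>> subsets = partition(List.of("ab", "ba", "cd", "x"), 7);
        for (int i = 0; i < subsets.size(); i++) {
            System.out.println(i + ": " + subsets.get(i));
        }
    }
}
```

With a table size of 7, "ab" and "ba" have the same character sum (195), so both land in subset 6. "cd" goes to subset 3 and "x" to subset 1.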
♯ 1- Basics of Hashing…
Example: A hash function
♯ 2- Common Hash Functions
• If the hash function returns the same index for different objects, they belong to the same subset.
• If h maps different keys to different numbers, it is called a perfect hash function: each subset contains at most ONE element (the ideal case).
• To create a perfect hash function, the table must contain at least as many positions as there are elements being hashed.
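The definition can be checked in code. A hash function is perfect on a given key set exactly when no two keys share an index. By the pigeonhole principle, that is impossible when there are more keys than positions. The helper name `isPerfect` is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntUnaryOperator;

class PerfectHashCheck {
    // Returns true if h maps every key in `keys` to a distinct index in [0, tableSize).
    static boolean isPerfect(int[] keys, int tableSize, IntUnaryOperator h) {
        if (keys.length > tableSize) {
            return false; // pigeonhole: two keys must share a position
        }
        Set<Integer> usedIndices = new HashSet<>();
        for (int key : keys) {
            int index = h.applyAsInt(key);
            if (index < 0 || index >= tableSize || !usedIndices.add(index)) {
                return false; // index out of range, or a collision
            }
        }
        return true;
    }

    public static void main(String[] args) {
        IntUnaryOperator mod4 = k -> k % 4;
        System.out.println(isPerfect(new int[] {10, 11, 12, 13}, 4, mod4)); // true
        System.out.println(isPerfect(new int[] {10, 14, 12, 13}, 4, mod4)); // false: 10 and 14 both map to 2
    }
}
```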
♯ 2- Common Hash Functions…
(Figure: object's key -> hash function -> integer index, computed in O(1).)
♯ 2- Common Hash Functions
• The division method (%, modulo) is the preferred choice: h(K) = K mod TSize (in code, K % TSize), where TSize = sizeof(table), the number of positions in the table.
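Here is a minimal Java sketch of the division method. The formula hides one detail: Java's `%` returns a negative result when the left operand is negative, so the sketch uses `Math.floorMod` to keep the index in [0, TSize). For string keys, it assumes that `String.hashCode()` supplies the integer K. The class name `DivisionHash` is hypothetical.

```java
class DivisionHash {
    // h(K) = K mod TSize, always in [0, TSize) even when K is negative.
    static int hash(int key, int tSize) {
        return Math.floorMod(key, tSize);
    }

    // For non-integer keys, first turn the key into an integer (here: hashCode()).
    static int hash(String key, int tSize) {
        return Math.floorMod(key.hashCode(), tSize);
    }

    public static void main(String[] args) {
        System.out.println(hash(25, 10)); // 5
        System.out.println(hash(-3, 10)); // 7, whereas -3 % 10 == -3 in Java
        System.out.println(hash("abc", 13));
    }
}
```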
♯ 3- Data Storage of a Hash Structure
- Common arrays and hash tables
- In a common array, all data objects are identified by unique indices and stored in consecutive blocks.
- In a hash table, the hash function determines the index of each stored data object.
There can be empty entries in the hash table.
Each hash table entry contains the following information:
♯ 3- Data Storage of a Hash Structure…
(Figure: the hash function h maps the data pairs (K1, val1), (K2, val2), …, (Kn, valn) to storage indices 0 to m-1. Caption: structure of a hash table entry.)
♯ 4- Common Methods of a Hash Table
Method            Purpose
get(key)          Get the value associated with a given key
put(key, value)   Add a data object to the hash table
remove(key)       Remove a data object
size()            Number of stored data objects
isEmpty()         Check whether the hash table is empty
keySet()          Get the set of keys
values()          Get the collection of values
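The method names in this table match those of Java's `java.util.Map` interface, so `java.util.HashMap` can demonstrate them directly. This is a short sketch; the keys and values are arbitrary examples.

```java
import java.util.HashMap;
import java.util.Map;

class MapMethodsDemo {
    // Builds a small table; putting an existing key replaces its value.
    static Map<String, Integer> build() {
        Map<String, Integer> table = new HashMap<>();
        table.put("apple", 3);
        table.put("pear", 5);
        table.put("apple", 4); // same key: the old value 3 is replaced
        return table;
    }

    public static void main(String[] args) {
        Map<String, Integer> table = build();
        System.out.println(table.get("apple")); // 4
        System.out.println(table.size());       // 2
        table.remove("pear");
        System.out.println(table.isEmpty());    // false
        System.out.println(table.keySet());     // [apple]
        System.out.println(table.values());     // [4]
    }
}
```

In `java.util.Map`, `values()` returns a `Collection` view rather than a `Set`, because different keys may map to equal values.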
♯ 5- Collision Resolution
Case: Central Storage
Open addressing method:
Linear probing: p(i) = i, h’(K) = (h(K) + i) mod TSize, i = 1, 2, 3, …
(Figure: resolving collisions with the linear probing method; subscripts indicate the home positions of the keys being hashed.)
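Here is a minimal sketch of open addressing with linear probing. It assumes integer keys, distinct insertions, h(K) = K mod TSize, and no deletion (deletion is covered in section 7). The class `LinearProbingTable` is hypothetical.

```java
// Hypothetical open-addressing table with linear probing: h'(K) = (h(K) + i) mod TSize.
class LinearProbingTable {
    private final Integer[] slots; // null marks an empty position

    LinearProbingTable(int tSize) {
        slots = new Integer[tSize];
    }

    private int h(int key) {
        return Math.floorMod(key, slots.length);
    }

    // Tries the home position (i = 0), then i = 1, 2, ...; returns the position or -1 if full.
    int insert(int key) {
        int home = h(key);
        for (int i = 0; i < slots.length; i++) {
            int pos = (home + i) % slots.length;
            if (slots[pos] == null) {
                slots[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    // Follows the same probe sequence; reaching an empty slot means the key is absent.
    int find(int key) {
        int home = h(key);
        for (int i = 0; i < slots.length; i++) {
            int pos = (home + i) % slots.length;
            if (slots[pos] == null) {
                return -1;
            }
            if (slots[pos] == key) {
                return pos;
            }
        }
        return -1;
    }
}
```

With TSize = 10, inserting 12, 22 and 32 places them at 2, 3 and 4. Each key after the first collides at home position 2 and takes the next free slot. A key such as 29, whose home slot 9 is already taken, wraps around to 0.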
♯ 5- Collision Resolution…
Case: Central Storage
Open addressing method:
Quadratic probing: p(i) = ±i², h’(K) = (h(K) ± i²) mod TSize, i = 1, 2, 3, …
Example (subscripts give the home positions; in this example a negative h(K) - i² is rejected rather than wrapped around):
Insert B9: collision; probe i = 1: h(9+1) = h(10) = 0, OK.
Insert B5: collision; probe i = 1: h(5+1) = h(6) = 6, OK.
Insert B2: collision; probe i = 1: h(2+1) = h(3) = 3, not OK; h(2-1) = h(1) = 1, OK.
Insert C2: collision;
– probe i = 1: h(2+1) = h(3) = 3, not OK; h(2-1) = h(1) = 1, not OK;
– probe i = 2, i² = 4: h(2+4) = 6, not OK; h(2-4): -2 < 0, not OK;
– probe i = 3, i² = 9: h(2+9) = 1, not OK; h(2-9): 2-9 < 0, not OK;
– probe i = 4, i² = 16: h(2+16) = 8, OK.
Using quadratic probing for collision resolution: h’(K) = (h(K) ± i²) mod 10
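A short sketch can replay the probe sequence in the example. It follows the example's own convention: for each i, it tries h(K) + i² (wrapped mod TSize) first, then h(K) - i², and it skips the minus probe when that value is negative instead of wrapping it. Integer keys with h(K) = K mod TSize are an assumption. B9, B5, B2 and C2 are modeled as keys 19, 15, 12 and 22. Slots 2, 3, 5 and 9 are assumed to be occupied beforehand, as the collisions in the example imply. The class name is hypothetical.

```java
// Hypothetical quadratic-probing table: h'(K) = (h(K) ± i²) mod TSize.
class QuadraticProbingTable {
    private final Integer[] slots;

    QuadraticProbingTable(int tSize) {
        slots = new Integer[tSize];
    }

    // Stores key at the first free position in the sequence
    // h, h+1, h-1, h+4, h-4, h+9, h-9, ...; returns that position, or -1 if none is found.
    int insert(int key) {
        int t = slots.length;
        int home = Math.floorMod(key, t);
        if (slots[home] == null) {
            slots[home] = key;
            return home;
        }
        for (int i = 1; i < t; i++) {
            int plus = (int) ((home + (long) i * i) % t);
            if (slots[plus] == null) {
                slots[plus] = key;
                return plus;
            }
            long minus = home - (long) i * i; // rejected when negative, as in the example
            if (minus >= 0 && slots[(int) minus] == null) {
                slots[(int) minus] = key;
                return (int) minus;
            }
        }
        return -1; // quadratic probing does not always reach every free slot
    }

    public static void main(String[] args) {
        QuadraticProbingTable table = new QuadraticProbingTable(10);
        for (int key : new int[] {2, 3, 5, 9}) {
            table.insert(key); // assumed earlier occupants
        }
        System.out.println(table.insert(19)); // B9 -> 0
        System.out.println(table.insert(15)); // B5 -> 6
        System.out.println(table.insert(12)); // B2 -> 1
        System.out.println(table.insert(22)); // C2 -> 8
    }
}
```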
♯ 5- Collision Resolution…
Case: Central Storage
♯ 5- Collision Resolution…
Chaining Method
• Keys do not have to be stored in the table itself: each position of the table is associated with a linked list, or chain, of structures whose info fields store keys or references to keys.
• This method is called separate chaining, and a table of references (pointers) is called a scatter table (distribution table).
♯ 5- Collision Resolution…
Separate chaining method: h(K) is the index of a linked list of elements that have the same hash value.
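Here is a minimal separate-chaining sketch. It assumes integer keys and h(K) = K mod TSize, and each table position holds a `java.util.LinkedList` of the keys that hash there. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class SeparateChainingTable {
    private final List<LinkedList<Integer>> chains = new ArrayList<>();

    SeparateChainingTable(int tSize) {
        for (int i = 0; i < tSize; i++) {
            chains.add(new LinkedList<>());
        }
    }

    // h(K) selects the linked list (chain) for the key.
    private LinkedList<Integer> chainFor(int key) {
        return chains.get(Math.floorMod(key, chains.size()));
    }

    void insert(int key) {
        LinkedList<Integer> chain = chainFor(key);
        if (!chain.contains(key)) {
            chain.addFirst(key); // colliding keys simply share the chain
        }
    }

    boolean contains(int key) {
        return chainFor(key).contains(key);
    }

    boolean remove(int key) {
        return chainFor(key).remove(Integer.valueOf(key)); // remove(Object), not remove(index)
    }

    int chainLength(int index) {
        return chains.get(index).size();
    }
}
```

With TSize = 7, the keys 3, 10 and 17 all hash to index 3 and share one chain, while 5 gets a chain of its own.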
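♯ 5- Collision Resolution…
Coalesced chaining
• Each entry stores the index of the next element in the same group (default: -1).
• Main area: the positions addressed by the hash function.
• Cellar: the overflow area. Free positions for colliding elements are taken bottom-up (mechanism: bottom-up).
• When the cellar is full, an inserted element is put in the main region.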
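Here is a minimal sketch of coalesced chaining with a cellar, under these assumptions:
- integer keys and no deletion;
- h(K) = K mod addressSize, so every home position falls in the main area;
- the cellar placed after the main area;
- free positions taken bottom-up, from the highest index, so the cellar is used first and the main area only once the cellar is full.

Each entry's `next` field holds the index of the next element in the same group, with -1 as the default. The class and field names are hypothetical.

```java
import java.util.Arrays;

class CoalescedHashTable {
    private final Integer[] keys;   // null marks an empty entry
    private final int[] next;       // index of next element in the same group, -1 if none
    private final int addressSize;  // size of the main area (range of h)
    private int free;               // bottom-up pointer for finding an empty entry

    CoalescedHashTable(int addressSize, int cellarSize) {
        this.addressSize = addressSize;
        int m = addressSize + cellarSize; // cellar occupies indices addressSize..m-1
        keys = new Integer[m];
        next = new int[m];
        Arrays.fill(next, -1);
        free = m - 1;
    }

    private int hash(int key) {
        return Math.floorMod(key, addressSize);
    }

    // Returns the index where key is stored, or -1 if the table is full.
    int insert(int key) {
        int i = hash(key);
        if (keys[i] == null) {
            keys[i] = key;
            return i;
        }
        while (true) { // walk to the end of the group (chain)
            if (keys[i] == key) {
                return i; // already present
            }
            if (next[i] == -1) {
                break;
            }
            i = next[i];
        }
        while (free >= 0 && keys[free] != null) {
            free--; // bottom-up: cellar first, then the main area
        }
        if (free < 0) {
            return -1;
        }
        keys[free] = key;
        next[i] = free; // link the new entry to the end of the group
        return free;
    }

    boolean contains(int key) {
        for (int i = hash(key); i != -1; i = next[i]) {
            if (keys[i] != null && keys[i] == key) {
                return true;
            }
        }
        return false;
    }
}
```

With a main area of 7 and a cellar of 3, the keys 10 and 17 (home 3) go to cellar entries 9 and 8. Once the cellar is full, the next colliding key takes the free main-area entry 6. A later key whose home is 6 must then join that group, which is the "coalescing" of chains.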
♯ 5- Collision Resolution…
Bucket addressing
Each table position is a bucket that can hold several elements.
Example: inserting C2 causes a collision. Using linear probing, bucket 3 still contains a free space, so C2 is inserted into bucket 3.
♯ 5- Collision Resolution…
Bucket addressing with an overflow area: each bucket holds a reference to a separate overflow area.
(Figure: collision resolution with buckets and an overflow area.)
♯ 6- Load Factors, Rehashing, and Efficiency
Array-based hash table
• Efficiency:
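The load factor is the fraction of occupied positions, α = n / TSize. A common remedy when α grows too large is rehashing: allocate a bigger table and insert every key again, because each key's index depends on TSize. Below is a minimal sketch built on linear probing. The threshold (passed in as `maxLoad`), the growth rule 2·TSize + 1, and all names are assumptions, not values from the slides.

```java
// Hypothetical linear-probing table that rehashes before the load factor exceeds maxLoad.
class RehashingTable {
    private Integer[] slots;
    private int n; // number of stored keys
    private final double maxLoad;

    RehashingTable(int tSize, double maxLoad) {
        if (maxLoad <= 0 || maxLoad >= 1) {
            throw new IllegalArgumentException("maxLoad must be in (0, 1)");
        }
        this.slots = new Integer[tSize];
        this.maxLoad = maxLoad;
    }

    double loadFactor() {
        return (double) n / slots.length; // alpha = n / TSize
    }

    int tableSize() {
        return slots.length;
    }

    void insert(int key) {
        if (contains(key)) {
            return; // keys are kept distinct
        }
        if ((double) (n + 1) / slots.length > maxLoad) {
            rehash(2 * slots.length + 1);
        }
        place(slots, key);
        n++;
    }

    boolean contains(int key) {
        int t = slots.length;
        int home = Math.floorMod(key, t);
        for (int i = 0; i < t; i++) {
            Integer k = slots[(home + i) % t];
            if (k == null) {
                return false;
            }
            if (k == key) {
                return true;
            }
        }
        return false;
    }

    // Linear probing into the given array; a free slot exists because maxLoad < 1.
    private static void place(Integer[] table, int key) {
        int t = table.length;
        int pos = Math.floorMod(key, t);
        while (table[pos] != null) {
            pos = (pos + 1) % t;
        }
        table[pos] = key;
    }

    // Every key's index depends on TSize, so all keys are inserted again into the larger array.
    private void rehash(int newSize) {
        Integer[] old = slots;
        slots = new Integer[newSize];
        for (Integer k : old) {
            if (k != null) {
                place(slots, k);
            }
        }
    }
}
```

With TSize = 5 and maxLoad = 0.5, the third insertion would raise α to 0.6, so the table first grows to 11 positions and then holds 3 keys (α ≈ 0.27).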
♯ 7- Deletion
Begin -> index = h(k) -> search the group at that index -> if the group contains k, delete k -> End
♯ 7- Deletion…
• The structure and collision-resolution method of the hash table determine how its elements are deleted. The options are:
– Linear search for the element, then deletion (open addressing).
– Linear search for the element in the linked list of its subgroup, then deletion from that linked list (separate chaining).
– Linear search to locate the element in its subgroup, then deletion from the subgroup, updating the references to the next elements in the same subgroup (coalesced chaining).
♯ 7- Deletion…
Linear probing, H’(k) = H(k) + i: after A4 is deleted, the locations of the keys that follow it in the same cluster must be updated.
(Figure: linear search in the situation where both insertion and deletion of keys are permitted.)
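One way to update the locations after deleting from a linear-probing table is to empty the slot and then re-insert every key that follows it in the same cluster. This way no later search stops early at the new gap. Below is a minimal sketch, assuming integer keys, h(K) = K mod TSize, and a table that is never full. Key 4 stands in for A4, and the class name is hypothetical.

```java
// Hypothetical linear-probing table with deletion by re-inserting the rest of the cluster.
class LinearProbingDeletion {
    private final Integer[] slots;

    LinearProbingDeletion(int tSize) {
        slots = new Integer[tSize];
    }

    private int h(int key) {
        return Math.floorMod(key, slots.length);
    }

    // Assumes at least one empty slot remains.
    void insert(int key) {
        int pos = h(key);
        while (slots[pos] != null) {
            pos = (pos + 1) % slots.length;
        }
        slots[pos] = key;
    }

    int find(int key) {
        int pos = h(key);
        for (int i = 0; i < slots.length && slots[pos] != null; i++) {
            if (slots[pos] == key) {
                return pos;
            }
            pos = (pos + 1) % slots.length;
        }
        return -1;
    }

    boolean delete(int key) {
        int pos = find(key);
        if (pos < 0) {
            return false;
        }
        slots[pos] = null;
        // Re-insert each following key of the cluster so it can move into the gap.
        int next = (pos + 1) % slots.length;
        while (slots[next] != null) {
            int k = slots[next];
            slots[next] = null;
            insert(k);
            next = (next + 1) % slots.length;
        }
        return true;
    }
}
```

After inserting 4, 14, 24 and 5 into a table of size 10 (positions 4, 5, 6, 7), deleting 4 moves 14, 24 and 5 to positions 4, 5 and 6. Without this step, a search for 5 would stop at the emptied slot. Marking deleted slots with a special "deleted" marker is another common approach, not shown here.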
♯ 8- Perfect Hash Functions
♯ 9- External Hashing
♯ 10- Hashing in the java.util Package
♯ The java.util.HashMap class
♯ The java.util.HashMap class…
Demonstrating the operation of the methods in class HashMap
♯ The java.util.HashSet class
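`java.util.HashSet` stores distinct elements and, according to its documentation, is backed by a `HashMap` instance. A small sketch that collects the distinct words of a text; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetDemo {
    // Collects distinct lower-case words; duplicates are ignored by the set.
    static Set<String> uniqueWords(String text) {
        Set<String> words = new HashSet<>();
        for (String w : text.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> s = uniqueWords("to be or not to be");
        System.out.println(s.size());          // 4
        System.out.println(s.contains("not")); // true
        System.out.println(s.add("be"));       // false: already present
    }
}
```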
♯ The java.util.Hashtable class
♯ Summary: Learning Outcomes
LO7.1 Explain the concept of "hash". Define the concepts of hash function and hash table and their applications.
LO7.2 Demonstrate the types of hash functions: division, folding, ...
LO7.2 Explain collisions and collision handling.
LO7.3 Explain the open addressing method for collision resolution: linear and quadratic probing.
LO7.4 Explain the chaining method for collision resolution: separate chaining and coalesced chaining.
LO7.5 Define perfect hash functions and extendible hashing.
♯ Summary
♯ Summary (continued)
♯ Notes on hash tables
• When should hash tables be used?
– When the elements in a group are distinct and insertion and search are the main operations.
• What should be decided before a hash table is implemented?
– Choose a key for each element: a number or a string?
– Choose a hash function.
– Choose a collision-resolution method.
These choices determine the algorithms used in the hash table.
♯ Lab 1: Using HashMap to compute probabilities of characters in a text file
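Here is a possible solution sketch for Lab 1, not the lab's official code. It counts each character with `HashMap.merge`, then divides each count by the total number of characters. The default file name `input.txt` and the class name are hypothetical. Every character, including spaces and line breaks, is counted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class CharProbabilities {
    // Probability of each character = its count / total number of characters.
    static Map<Character, Double> probabilities(String text) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : text.toCharArray()) {
            counts.merge(c, 1, Integer::sum); // start at 1 or add 1 to the existing count
        }
        Map<Character, Double> probs = new HashMap<>();
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            probs.put(e.getKey(), e.getValue().doubleValue() / text.length());
        }
        return probs;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file name; pass another path as the first argument.
        Path file = Path.of(args.length > 0 ? args[0] : "input.txt");
        String text = Files.readString(file);
        probabilities(text).forEach((c, p) -> System.out.printf("'%c': %.4f%n", c, p));
    }
}
```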
♯ Lab 2: Using Hashtable to manage a student list
SE140606,NGUYỄN TRỌNG HẢI,7
SE141127,VÕ TRỌNG ĐẠT,4
SE140913,TRẦN MINH HIẾU,7
SE62440,ĐOÀN LƯƠNG PHÚ,6
SE141153,THÁI ĐỨC THẢO,5
SE140244,PHẠM NHẬT TÂN,8
SE140861,PHẠM ĐĂNG HẢI,5
SE140929,NGUYỄN LÊ ANH LONG,9
SE140755,LÊ ANH DUY,8
SE140618,LÝ GIA HUY,8
SE63394,VŨ VĂN KHẢI,9
SE63391,BÙI LÊ QUỐC THẮNG,4
SE140367,CAO DUY QUANG,9
SE140130,TRẦN VĂN TÂM,4
SE140923,NGUYỄN VĂN TÂN,5
SE130182,DIỆP MINH THÔNG,6
SE140877,NGUYỄN HỒNG SƠN,6
SE140813,NGUYỄN ĐĂNG HUY,6
SE140503,LÊ VĨNH HƯNG,3
SE140874,LÊ HỮU HIẾU,6
SE141086,NGUYỄN MẠNH LỰC,9
SE140873,TÔN THẤT BẢO,4
SE140067,NGUYỄN TRẦN HOÀNG LONG,5
SE140855,TRẦN HOÀNG HẢI DUY,5
SE140885,CAO HOÀNG QUY,7
SE140203,HÀ GIA PHƯỚC,3
SE130610,THÁI TIẾN ĐẠT,7
SE151525,TẠ MINH TIẾN,3
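Here is a possible sketch for Lab 2, not the lab's official code. It loads lines in the format above into a `java.util.Hashtable` keyed by student ID. The third field is assumed to be an integer score. The file name `students.txt` and all class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Hashtable;
import java.util.List;

class StudentList {
    // One student record: ID, full name, score.
    static final class Student {
        final String id;
        final String name;
        final int score;

        Student(String id, String name, int score) {
            this.id = id;
            this.name = name;
            this.score = score;
        }

        @Override
        public String toString() {
            return id + "," + name + "," + score;
        }
    }

    // Parses "ID,NAME,SCORE" lines into a Hashtable keyed by student ID.
    static Hashtable<String, Student> load(List<String> lines) {
        Hashtable<String, Student> table = new Hashtable<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length != 3) {
                continue; // skip blank or malformed lines
            }
            Student s = new Student(parts[0].trim(), parts[1].trim(), Integer.parseInt(parts[2].trim()));
            table.put(s.id, s);
        }
        return table;
    }

    static double averageScore(Hashtable<String, Student> table) {
        return table.values().stream().mapToInt(s -> s.score).average().orElse(0.0);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name; one student per line, saved as UTF-8.
        Path file = Path.of(args.length > 0 ? args[0] : "students.txt");
        Hashtable<String, Student> table = load(Files.readAllLines(file, StandardCharsets.UTF_8));
        System.out.println("Students: " + table.size());
        System.out.println("SE140244 -> " + table.get("SE140244"));
        System.out.printf("Average score: %.2f%n", averageScore(table));
    }
}
```

`Hashtable` is the older, synchronized hash table class. In single-threaded code, `HashMap` offers the same operations.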
♯ Bonus: Hashing in Cryptography
Two groups of algorithms:
- SHA: Secure Hash Algorithms
- MDA: Message Digest Algorithms
(Figure: content -> hash function -> digest.)
- Digest: a bit string of fixed or variable length; a summary of the content, or a digital signature.
- The transformations are very elaborate so that they are hard to break. Extra data, called salt, may be padded in.
- A good hash function has few collisions.
Applications:
(1) Protecting passwords stored in databases so that even the system administrator cannot recover them.
(2) Establishing the trustworthiness of transaction data (blockchain technique).
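A small sketch shows how a salt changes a digest, using the standard `java.security.MessageDigest` API with SHA-256. The 16-byte salt length and the class and method names are assumptions. For real password storage, a deliberately slow key-derivation function (for example PBKDF2, bcrypt or Argon2) is preferred over a single SHA-256 pass.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

class SaltedHash {
    // Random salt, stored next to the digest so the same computation can be repeated at login.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // SHA-256(salt || password), as a lowercase hex string of 64 characters.
    static String sha256Hex(byte[] salt, String password) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(salt);
            byte[] digest = md.digest(password.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        byte[] salt = newSalt();
        String stored = sha256Hex(salt, "secret");
        // At login: recompute with the stored salt and compare.
        System.out.println(stored.equals(sha256Hex(salt, "secret"))); // true
        System.out.println(stored.equals(sha256Hex(salt, "Secret"))); // false
    }
}
```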
♯ Thank you.