
Lecture 8 Hashing I 6.006 Fall 2011

Lecture 8: Hashing I

Lecture Overview
• Dictionaries and Python

• Motivation

• Prehashing

• Hashing

• Chaining

• Simple uniform hashing

• “Good” hash functions

Dictionary Problem
Abstract Data Type (ADT) — maintain a set of items, each with a key, subject to

• insert(item): add item to set

• delete(item): remove item from set

• search(key): return item with key if it exists

We assume items have distinct keys (or that inserting a new item clobbers the old one with the same key).
Balanced BSTs solve this in O(lg n) time per operation (and also support inexact searches like next-largest).
Goal: O(1) time per operation.

Python Dictionaries:
Items are (key, value) pairs, e.g. d = {'algorithms': 5, 'cool': 42}

d.items() → [('algorithms', 5), ('cool', 42)]

d['cool'] → 42
d[42] → KeyError
'cool' in d → True
42 in d → False

Python set is really dict where items are keys (no values)
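
The dictionary operations above can be tried directly. This is a minimal sketch assuming Python 3.7 or later, where d.items() returns a view in insertion order rather than a list; the variable names are illustrative only.

```python
# Build the example dictionary from the notes.
d = {'algorithms': 5, 'cool': 42}

# In Python 3, items() returns a view; wrap it in list() to see the pairs.
pairs = list(d.items())

# Search by key.
cool_value = d['cool']

# A missing key raises KeyError rather than returning None.
try:
    d[42]
    missing_raises = False
except KeyError:
    missing_raises = True

# Membership tests look at keys, not values.
has_cool = 'cool' in d
has_42 = 42 in d

# A set behaves like a dict that stores keys only.
s = {'algorithms', 'cool'}
set_has_cool = 'cool' in s
```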


Motivation
Dictionaries are perhaps the most popular data structure in CS

• built into most modern programming languages (Python, Perl, Ruby, JavaScript,
Java, C++, C#, . . . )

• e.g. best docdist code: word counts & inner product

• implement databases: (DB_HASH in Berkeley DB)

– English word → definition (literal dict.)

– English words: for spelling correction
– word → all webpages containing that word
– username → account object

• compilers & interpreters: names → variables

• network routers: IP address → wire

• network server: port number → socket/app.

• virtual memory: virtual address → physical

Less obvious, using hashing techniques:

• substring search (grep, Google) [L9]

• string commonalities (DNA) [PS4]

• file or directory synchronization (rsync)

• cryptography: file transfer & identification [L10]

How do we solve the dictionary problem?

Simple Approach: Direct Access Table
This means items would need to be stored in an array, indexed by key (random access)


[Figure 1: Direct-access table. An array with slots 0, 1, 2, ...; the slot at index key holds the item with that key.]

Problems:

1. keys must be nonnegative integers (or using two arrays, integers)

2. large key range ⇒ large space, e.g. one key of $2^{256}$ is bad news.
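
A direct-access table can be sketched in a few lines, which also makes both problems concrete. The class name, the universe_size parameter, and the (key, item) method signatures below are assumptions chosen for illustration, not part of the lecture.

```python
class DirectAccessTable:
    """Array indexed directly by key; keys must be integers in range(universe_size)."""

    def __init__(self, universe_size):
        # Space is proportional to the key range, not to the number of items stored.
        self.slots = [None] * universe_size

    def insert(self, key, item):
        self.slots[key] = item      # O(1) random access

    def delete(self, key):
        self.slots[key] = None

    def search(self, key):
        return self.slots[key]      # None if no item has this key


table = DirectAccessTable(universe_size=100)
table.insert(7, 'seven')
found = table.search(7)
absent = table.search(8)
table.delete(7)
after_delete = table.search(7)
```

With a universe of size $2^{256}$ the list allocation alone is infeasible, and a string key cannot be used as a list index at all, which is what the two solutions address.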

2 Solutions:

Solution to 1 : “prehash” keys to integers.

• In theory, possible because keys are finite ⇒ set of keys is countable

• In Python: hash(object) (actually hash is a misnomer; it should be “prehash”), where object is a number, string, tuple, etc., or an object implementing __hash__ (default = id = memory address)

• In theory, x = y ⇔ hash(x) = hash(y)

• Python applies some heuristics for practicality: for example, hash('\0B') = 64 = hash('\0\0C')

• Object’s key should not change while in table (else cannot find it anymore)

• No mutable objects like lists
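
The prehashing rules above can be illustrated with a small user-defined key type. The Point class here is hypothetical and exists only for this sketch; it defines __hash__ and __eq__ consistently so that equal points prehash to the same integer.

```python
class Point:
    """Hypothetical 2-D point, treated as immutable so it is safe to use as a key."""

    def __init__(self, x, y):
        self._x, self._y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Delegate to the prehash of an immutable tuple of the fields.
        return hash((self._x, self._y))


d = {Point(1, 2): 'a'}
found = d[Point(1, 2)]          # a different but equal object finds the item

# Equal objects must have equal prehashes.
same_prehash = hash(Point(1, 2)) == hash(Point(1, 2))

# Mutable built-ins such as lists refuse to be prehashed.
try:
    hash([1, 2])
    list_hashable = True
except TypeError:
    list_hashable = False
```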

Solution to 2 : hashing (verb from French ‘hache’ = hatchet, & Old High German ‘happja’
= scythe)

• Reduce universe U of all keys (say, integers) down to reasonable size m for table

• idea: m ≈ n = # keys stored in dictionary

• hash function h: U → {0, 1, . . . , m − 1}


[Figure 2: Mapping keys to a table. Keys k1, k2, k3, k4 from the universe U are mapped by h into slots 0, 1, ..., m − 1 of table T; for example, h(k1) = 1.]

• two keys $k_i, k_j \in K$ collide if $h(k_i) = h(k_j)$

How do we deal with collisions?

We will see two ways

1. Chaining: TODAY

2. Open addressing: L10

Chaining
Linked list of colliding elements in each slot of table

[Figure 3: Chaining in a Hash Table. Keys k1, k4, k2 collide, h(k1) = h(k2) = h(k4), and are stored together in the linked list of one slot; k3 is stored in a different slot.]

• Search must go through whole list T[h(key)]

• Worst case: all n keys hash to same slot ⇒ Θ(n) per operation
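
Chaining can be sketched as follows. This is an illustration under stated assumptions rather than the course's reference code: the class name and method signatures are made up here, each chain is a Python list instead of a linked list, and the slot is computed as hash(key) % m.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of (key, value) pairs."""

    def __init__(self, m=8):
        self.m = m
        self.table = [[] for _ in range(m)]

    def _slot(self, key):
        # Prehash with Python's hash, then reduce to a slot index.
        return hash(key) % self.m

    def insert(self, key, value):
        chain = self.table[self._slot(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                 # existing key: new item clobbers old
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        # Must scan the whole chain T[h(key)].
        for k, v in self.table[self._slot(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        chain = self.table[self._slot(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
        raise KeyError(key)


t = ChainedHashTable(m=8)
for key in (1, 9, 17):                   # small ints prehash to themselves; all land in slot 1
    t.insert(key, str(key))
chain_length = len(t.table[1])
value_9 = t.search(9)
t.delete(9)
chain_length_after = len(t.table[1])
```

The three keys in the usage lines collide on purpose, so the search for 9 walks past 1 before finding its item, which is the per-chain cost the worst-case bound refers to.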


Simple Uniform Hashing:

An assumption (cheating): Each key is equally likely to be hashed to any slot of table,
independent of where other keys are hashed.

let n = # keys stored in table

m = # slots in table
load factor α = n/m = expected # keys per slot = expected length of a chain

Performance
This implies that expected running time for search is Θ(1+α) — the 1 comes from applying
the hash function and random access to the slot whereas the α comes from searching the
list. This is equal to O(1) if α = O(1), i.e., m = Ω(n).
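
A small simulation can make the load factor concrete. It is only a sketch: it models simple uniform hashing by drawing a uniformly random slot for each key with Python's random module, and the function name and parameters are assumptions for illustration.

```python
import random


def chain_lengths(n, m, seed=0):
    """Throw n keys into m slots uniformly at random; return the length of each chain."""
    rng = random.Random(seed)
    lengths = [0] * m
    for _ in range(n):
        lengths[rng.randrange(m)] += 1
    return lengths


n, m = 10_000, 5_000
lengths = chain_lengths(n, m)

alpha = n / m                       # load factor
average_chain = sum(lengths) / m    # equals alpha by counting
longest_chain = max(lengths)        # typically a small multiple of alpha
```

The average chain length equals α exactly, since the n keys are spread over m chains. The longest chain is what a single unlucky search pays, and under this model it stays far below n.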

Hash Functions
We cover three methods to achieve the above performance:

Division Method:
h(k) = k mod m
This is practical when m is prime but not too close to a power of 2 or 10 (for m exactly a power of 2 or 10, h(k) depends only on the low bits/digits of k).

But it is inconvenient to find a prime number, and division is slow.
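
The low-bits problem is easy to see on a few keys. In this sketch the function name h_division and the sample keys are chosen for illustration; the keys differ only in their high bits.

```python
def h_division(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


# Keys that differ only above the low four bits: 16, 32, 48, 64.
keys = [0b0001_0000, 0b0010_0000, 0b0011_0000, 0b0100_0000]

# m = 16 is a power of 2, so only the low four bits matter: every key collides.
slots_pow2 = [h_division(k, 16) for k in keys]

# m = 13 is prime, so the high bits influence the slot.
slots_prime = [h_division(k, 13) for k in keys]
```

Here slots_pow2 is [0, 0, 0, 0] while slots_prime is [3, 6, 9, 12].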

Multiplication Method:
$h(k) = [(a \cdot k) \bmod 2^w] \gg (w - r)$
where a is random, k is w bits, and $m = 2^r$.
This is practical when a is odd, $2^{w-1} < a < 2^w$, and a is not too close to $2^{w-1}$ or $2^w$.

Multiplication and bit extraction are faster than division.
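
The multiplication method translates almost directly into bit operations. The sketch below uses a toy word size w = 8 and a hand-picked odd multiplier a = 181 purely for illustration; a real table would use the machine word size and a randomly chosen a.

```python
def h_multiplication(k, a, w, r):
    """Multiplication method: keep the low w bits of a*k, then take their top r bits."""
    return ((a * k) % (1 << w)) >> (w - r)


w, r = 8, 3            # 8-bit words, table of size m = 2**3 = 8
a = 0b1011_0101        # 181: odd, and strictly between 2**7 and 2**8

slots = [h_multiplication(k, a, w, r) for k in range(5)]
```

For k = 0, 1, 2, 3, 4 this gives slots 0, 5, 3, 0, 6. For example, k = 2 gives 362 mod 256 = 106, and 106 >> 5 = 3.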


[Figure 4: Multiplication Method. The w-bit key k is multiplied by a, and r bits of the product are kept as the hash value.]

Universal Hashing
[6.046; CLRS 11.3.3]

For example: h(k) = [(ak + b) mod p] mod m where a and b are random ∈ {0, 1, . . . , p − 1}, and p is a large prime (> |U|).
This implies that for worst-case keys $k_1 \neq k_2$ (and for a, b choice of h):

$$\Pr_{a,b}\{\text{event } X_{k_1 k_2}\} = \Pr_{a,b}\{h(k_1) = h(k_2)\} = \frac{1}{m}$$

This lemma is not proved here.
This implies that:

$$\begin{aligned}
E_{a,b}[\#\text{ collisions with } k_1] &= E\Big[\sum_{k_2} X_{k_1 k_2}\Big] \\
&= \sum_{k_2} E[X_{k_1 k_2}] \\
&= \sum_{k_2} \underbrace{\Pr\{X_{k_1 k_2} = 1\}}_{1/m} \\
&= \frac{n}{m} = \alpha
\end{aligned}$$

This is just as good as above!
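
The universal family can be sketched and checked by brute force on a small prime. The function names and the toy values p = 101, m = 10 are assumptions for illustration. The code follows the range stated above, a, b ∈ {0, ..., p − 1}; note that a = 0 makes every pair of keys collide, so the measured fraction is close to 1/m rather than exactly 1/m (CLRS restricts a to nonzero values).

```python
import random


def make_universal_hash(p, m, rng=random):
    """Draw a and b at random and return h(k) = ((a*k + b) mod p) mod m."""
    a = rng.randrange(p)
    b = rng.randrange(p)
    return lambda k: ((a * k + b) % p) % m


def collision_fraction(k1, k2, p, m):
    """Fraction of all (a, b) choices under which k1 and k2 land in the same slot."""
    collisions = sum(
        1
        for a in range(p)
        for b in range(p)
        if ((a * k1 + b) % p) % m == ((a * k2 + b) % p) % m
    )
    return collisions / (p * p)


p, m = 101, 10                       # toy prime and table size
h = make_universal_hash(p, m, random.Random(0))
slot = h(42)

# Exhaustive check over every (a, b) for one fixed pair of distinct keys.
frac = collision_fraction(3, 7, p, m)
```

The point of the exhaustive count is that the keys 3 and 7 are fixed in advance, and the probability is taken over the random choice of h, not over the keys.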

MIT OpenCourseWare
http://ocw.mit.edu

6.006 Introduction to Algorithms

Fall 2011

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
