Hash Tables: Professor Jennifer Rexford COS 217

Hash Tables

Professor Jennifer Rexford

COS 217

Goals of Today’s Lecture
• Motivation for hash tables
  ▪ Examples of (key, value) pairs
  ▪ Limitations of using arrays and linked lists
• Hash tables
  ▪ Hash table data structure
  ▪ Hash functions
  ▪ Example hashing code
• Implementing “mod” efficiently
  ▪ Binary representation of numbers
  ▪ Bitwise operators

Accessing Data By a Key
• Student grades: (name, grade)
  ▪ E.g., (“john smith”, 84), (“jane doe”, 93), (“bill clinton”, 81)
  ▪ Gradeof(“john smith”) returns 84
  ▪ Gradeof(“joe schmoe”) returns NULL
• Wine inventory: (name, #bottles)
  ▪ E.g., (“tapestry”, 3), (“latour”, 12), (“margaux”, 3)
  ▪ Bottlesof(“latour”) returns 12
  ▪ Bottlesof(“giesen”) returns NULL
• Years when a war started: (year, war)
  ▪ E.g., (1776, “Revolutionary”), (1861, “Civil War”), (1939, “WW2”)
  ▪ Warstarted(1939) returns “WW2”
  ▪ Warstarted(1984) returns NULL
• Symbol table: (variable name, variable value)
  ▪ E.g., (“MAXARRAY”, 2000), (“FOO”, 7), (“BAR”, -10)

Limitations of Using an Array
• An array stores n values indexed 0, …, n-1
  ▪ Index is an integer
  ▪ Max size must be known in advance
• But, the key in a (key, value) pair might not be a number
  ▪ Well, could convert it to a number
  ▪ And, have a separate number for each possible name
• But, we’d need an extremely large array
  ▪ Large number of possible keys (e.g., all names, all years, etc.)
  ▪ And, the number of unique keys might even be unknown
  ▪ And, most of the array elements would be empty
[Figure: a large array indexed by year, with entries at 1776, 1861, and 1939]

Could Use an Array of (key, value)
• Alternative way to use an array
  ▪ Array element i is a struct that stores a key and a value
      0   1776   Revolutionary
      1   1861   Civil
      2   1939   WW2
• Managing the array
  ▪ Add an element: add it to the end
  ▪ Remove an element: find the element, and copy the last element over it
  ▪ Find an element: search from the beginning of the array
• Problems
  ▪ Allocating too little memory: run out of space
  ▪ Allocating too much memory: wasteful of space

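The three array operations above can be sketched in C as follows. This is a minimal sketch assuming integer keys and a fixed capacity; the names `MAXENTRIES`, `struct Pair`, `array_find`, `array_add`, and `array_remove` are hypothetical, not from the lecture.

```c
#include <stddef.h>

#define MAXENTRIES 100   /* hypothetical fixed capacity */

struct Pair {
    int key;
    const char *value;
};

static struct Pair table[MAXENTRIES];
static size_t count = 0;   /* number of slots in use */

/* Find: search from the beginning; returns the index, or -1 if absent. */
long array_find(int key) {
    for (size_t i = 0; i < count; i++)
        if (table[i].key == key)
            return (long)i;
    return -1;
}

/* Add: append at the end; returns -1 if the array is already full. */
int array_add(int key, const char *value) {
    if (count == MAXENTRIES)
        return -1;   /* allocated too little memory: out of space */
    table[count].key = key;
    table[count].value = value;
    count++;
    return 0;
}

/* Remove: find the element, then copy the last element over it. */
int array_remove(int key) {
    long i = array_find(key);
    if (i < 0)
        return -1;
    table[i] = table[count - 1];
    count--;
    return 0;
}
```

Copying the last element into the hole makes removal cheap but does not preserve order; that is acceptable here because every lookup scans the array from the beginning anyway. The fixed `MAXENTRIES` is exactly the sizing problem listed under “Problems”.
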
Linked List to Adapt Memory Size
• Each element is a struct
  ▪ Key
  ▪ Value
  ▪ Pointer to next element

struct Entry {
    int key;
    char *value;
    struct Entry *next;
};

• Linked list
  ▪ Pointer to the first element in the list
  ▪ Functions for adding and removing elements
  ▪ Function for searching for an element with a particular key
[Figure: head points to a chain of entries, each holding key, value, and next; the last next is null]

Adding Element to a List
• Add new element at front of list
  ▪ Make the next pointer of the new element point to the current first element
    – new->next = head;
  ▪ Make the head of the list point to the new element
    – head = new;
[Figure: new is linked in before the former first element, and head now points to new]

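The two assignments above can be wrapped in a function. This is a minimal sketch: the name `push_front` and its pointer-to-head parameter are assumptions (the slide updates `head` directly), and `struct Entry` follows the previous slide.

```c
#include <stdlib.h>

struct Entry {
    int key;
    char *value;
    struct Entry *next;
};

/* Allocate an entry and link it in front of *headp.
   Returns the new entry, or NULL if malloc fails. */
struct Entry *push_front(struct Entry **headp, int key, char *value) {
    struct Entry *new = malloc(sizeof *new);
    if (new == NULL)
        return NULL;
    new->key = key;
    new->value = value;
    new->next = *headp;   /* new->next = head; */
    *headp = new;         /* head = new;       */
    return new;
}
```

Inserting 1939, then 1861, then 1776 leaves 1776 at the head, because each new element goes in front. Both steps take constant time regardless of the list length.
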
Locating an Element in a List
• Sequence through the list by key value
  ▪ Return pointer to the element
  ▪ … or NULL if no element is found

for (p = head; p != NULL; p = p->next) {
    if (p->key == 1861)
        return p;
}
return NULL;

[Figure: p advances from head through the list 1776 → 1861 → 1939 → null until it reaches 1861]

Locate and Remove an Element (1)
• Sequence through the list by key value
  ▪ Keep track of the previous element in the list

prev = NULL;
for (p = head; p != NULL; prev = p, p = p->next) {
    if (p->key == 1861) {
        /* delete the element (see next slide!) */
        break;
    }
}

[Figure: prev trails one entry behind p in the list 1776 → 1861 → 1939 → null]

Locate and Remove an Element (2)
• Delete the element
  ▪ Head element: make head point to the second element
  ▪ Non-head element: make the previous Entry point to the next element

if (p == head)
    head = head->next;
else
    prev->next = p->next;

[Figure: prev at 1776 and p at 1861 in the list 1776 → 1861 → 1939 → null]

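The search loop and the unlinking step from the last two slides combine into one function. This is a sketch under stated assumptions: the name `remove_key`, the pointer-to-head parameter, and the `free(p)` call are not on the slides (they update a global `head` and do not show deallocation), and entries are assumed to have been allocated with `malloc`.

```c
#include <stdlib.h>

struct Entry {
    int key;
    char *value;
    struct Entry *next;
};

/* Walk the list keeping prev, unlink the first entry with this key, and
   free it. Returns 1 if an entry was removed, 0 if the key was absent. */
int remove_key(struct Entry **headp, int key) {
    struct Entry *prev = NULL, *p;
    for (p = *headp; p != NULL; prev = p, p = p->next) {
        if (p->key == key) {
            if (p == *headp)
                *headp = p->next;       /* head element */
            else
                prev->next = p->next;   /* non-head element */
            free(p);
            return 1;
        }
    }
    return 0;
}
```

Only pointers change; the remaining entries stay where they are in memory.
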
List is Not Good for (key, value)
• Good place to start
  ▪ Simple algorithm and data structure
  ▪ Good to allow early start on design and test of client code
• But, testing might show that this is not efficient enough
  ▪ Removing or locating an element
    – Requires walking through the elements in the list, in time proportional to its length
  ▪ Could store elements in sorted order
    – But, keeping them in sorted order is time consuming
    – And, searching by key in a sorted list still requires walking it, since a list cannot be binary-searched
• Ultimately, we need a better approach
  ▪ Memory efficient: adds extra memory as needed
  ▪ Time efficient: finds element by its key instantly (or nearly)

Hash Table
• Fixed-size array where each element points to a linked list
[Figure: array with indices 0 to TABLESIZE-1, each element pointing to a linked list]

struct Entry *hashtab[TABLESIZE];

• Function mapping each key to an array index
  ▪ For example, for an integer key h
    – Hash function: i = h % TABLESIZE (mod function)
  ▪ Go to array element i, i.e., the linked list hashtab[i]
    – Search for element, add element, remove element, etc.

Example
• Array of size 5 with hash function “h mod 5”
  ▪ “1776 % 5” is 1
  ▪ “1861 % 5” is 1
  ▪ “1939 % 5” is 4

    0
    1   1776 Revolutionary → 1861 Civil
    2
    3
    4   1939 WW2

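Here is a sketch of the integer-keyed table from this example, assuming TABLESIZE = 5, non-negative keys, and the `struct Entry` layout from the linked-list slides. The names `bucket_of`, `ht_insert`, and `ht_lookup` are hypothetical, and `ht_insert` does not check whether the key is already present.

```c
#include <stdlib.h>

#define TABLESIZE 5   /* matches the example; real tables are much larger */

struct Entry {
    int key;
    const char *value;
    struct Entry *next;
};

static struct Entry *hashtab[TABLESIZE];   /* every bucket starts out NULL */

/* Hash function from the slide: i = h % TABLESIZE (assumes h >= 0). */
static unsigned bucket_of(int h) {
    return (unsigned)h % TABLESIZE;
}

/* Search only the list in bucket i; NULL if the key is absent. */
struct Entry *ht_lookup(int key) {
    for (struct Entry *p = hashtab[bucket_of(key)]; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}

/* Add (key, value) at the front of bucket i; NULL if malloc fails. */
struct Entry *ht_insert(int key, const char *value) {
    unsigned i = bucket_of(key);
    struct Entry *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->key = key;
    p->value = value;
    p->next = hashtab[i];
    hashtab[i] = p;
    return p;
}
```

Because new entries go at the front, inserting 1776 and then 1861 leaves bucket 1 as 1861 → 1776, the reverse of the figure; the order within a bucket does not affect lookups. Looking up 1984 searches only bucket 4 and returns NULL.
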
How Large an Array?
• Large enough that the average “bucket” size is 1
  ▪ Short buckets mean fast look-ups
  ▪ Long buckets mean slow look-ups
• Small enough to be memory efficient
  ▪ Not an excessive number of elements
  ▪ Fortunately, each array element is just storing a pointer
• This is OK:
[Figure: hash table with buckets 0 to TABLESIZE-1]

What Kind of Hash Function?
• Good at distributing elements across the array
  ▪ Distribute results over the range 0, 1, …, TABLESIZE-1
  ▪ Distribute results evenly to avoid very long buckets
• This is not so good:
[Figure: hash table with buckets 0 to TABLESIZE-1]

Hashing String Keys to Integers
• Simple schemes don’t distribute the keys evenly enough
  ▪ Number of characters, mod TABLESIZE
  ▪ Sum the ASCII values of all characters, mod TABLESIZE
  ▪ …
• Here’s a reasonably good hash function
  ▪ Weighted sum of the characters $x_i$ in the string
    – $\left(\sum_i a^i x_i\right) \bmod \mathrm{TABLESIZE}$
  ▪ Best if a and TABLESIZE are relatively prime
    – E.g., a = 65599, TABLESIZE = 1024

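To see why the simple schemes fall short, this sketch compares the character-sum scheme with the weighted sum, using a = 65599 and TABLESIZE = 1024 as on the slide. The function names `hash_sum` and `hash_weighted` are hypothetical. Summing ignores character order, so anagrams such as “cat” and “act” always collide, and sums of short lowercase strings stay small (a five-letter word sums to at most 5 × 122 = 610), so most of the 1024 buckets are never used.

```c
#include <stddef.h>

#define TABLESIZE 1024

/* Simple scheme: sum of character codes, mod TABLESIZE. */
unsigned hash_sum(const char *x) {
    unsigned h = 0;
    for (size_t i = 0; x[i] != '\0'; i++)
        h += (unsigned char)x[i];
    return h % TABLESIZE;
}

/* Weighted sum with a = 65599, evaluated by nesting the multiplications;
   unsigned arithmetic wraps around, which is harmless for hashing. */
unsigned hash_weighted(const char *x) {
    unsigned h = 0;
    for (size_t i = 0; x[i] != '\0'; i++)
        h = h * 65599u + (unsigned char)x[i];
    return h % TABLESIZE;
}
```

With the weighted version, each character's contribution depends on its position, so “cat” and “act” land in different buckets.
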
Implementing Hash Function
• Potentially expensive to compute $a^i$ for each value of $i$
  ▪ Instead, do (((x[0] * 65599 + x[1]) * 65599 + x[2]) * 65599 + x[3]) * …

unsigned hash(const char *x) {
    int i;
    unsigned int h = 0;
    /* unsigned overflow simply wraps around, which is fine for hashing */
    for (i = 0; x[i] != '\0'; i++)
        h = h * 65599 + (unsigned char)x[i];
    return h % 1024;
}

Can be more clever than this for powers of two!

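The nesting above is Horner's rule. This sketch, with hypothetical function names `hash_direct` and `hash_horner`, checks that it agrees with computing each power explicitly. Note that the nested form gives x[0] the highest power of a, i.e., it computes the weighted sum with the powers in reverse order; that spreads keys just as well. Both versions use 32-bit unsigned arithmetic, which wraps modulo $2^{32}$, so they agree exactly.

```c
#include <stdint.h>
#include <string.h>

#define A 65599u

/* Direct evaluation: sum of x[i] * A^(n-1-i), recomputing each power
   from scratch (the "potentially expensive" approach). */
uint32_t hash_direct(const char *x) {
    size_t n = strlen(x);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t power = 1;
        for (size_t k = 0; k < n - 1 - i; k++)
            power *= A;                      /* wraps modulo 2^32 */
        sum += power * (unsigned char)x[i];
    }
    return sum;
}

/* Horner's rule: one multiply and one add per character. */
uint32_t hash_horner(const char *x) {
    uint32_t h = 0;
    for (size_t i = 0; x[i] != '\0'; i++)
        h = h * A + (unsigned char)x[i];
    return h;
}
```

For an n-character key the direct version performs about n²/2 multiplications in its inner loop, versus n for Horner's rule.
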

Hash Table Example
Example: TABLESIZE = 7
Lookup (and enter, if not present) these strings: the, cat, in, the, hat
Hash table initially empty.
(Here hash() returns the full unsigned value, without the final % 1024; the result is then taken % 7.)
First word: “the”. hash(“the”) = 965156977. 965156977 % 7 = 1.
Search the linked list table[1] for the string “the”; not found.
[Table: buckets 0 to 6, all empty]

Hash Table Example
Example: TABLESIZE = 7
Lookup (and enter, if not present) these strings: the, cat, in, the, hat
Hash table initially empty.
First word: “the”. hash(“the”) = 965156977. 965156977 % 7 = 1.
Search the linked list table[1] for the string “the”; not found
Now: table[1] = makelink(key, value, table[1])
[Table: bucket 1: the; all other buckets empty]

Hash Table Example
Second word: “cat”. hash(“cat”) = 824252214. 824252214 % 7 = 2.
Search the linked list table[2] for the string “cat”; not found
Now: table[2] = makelink(key, value, table[2])
[Table: bucket 1: the; bucket 2: cat; all other buckets empty]

Hash Table Example
Third word: “in”. hash(“in”) = 6888005. 6888005 % 7 = 5.
Search the linked list table[5] for the string “in”; not found
Now: table[5] = makelink(key, value, table[5])
[Table: bucket 1: the; bucket 2: cat; bucket 5: in; all other buckets empty]

Hash Table Example
Fourth word: “the”. hash(“the”) = 965156977. 965156977 % 7 = 1.
Search the linked list table[1] for the string “the”; found it!
[Table: bucket 1: the; bucket 2: cat; bucket 5: in; all other buckets empty]

Hash Table Example
Fifth word: “hat”. hash(“hat”) = 865559739. 865559739 % 7 = 2.
Search the linked list table[2] for the string “hat”; not found.
Now, insert “hat” into the linked list table[2].
At beginning or end? Doesn’t matter.
[Table: bucket 1: the; bucket 2: cat; bucket 5: in; all other buckets empty]

Hash Table Example
Inserting at the front is easier, so add “hat” at the front
[Table: bucket 1: the; bucket 2: hat → cat; bucket 5: in; all other buckets empty]

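The walkthrough can be replayed in code. This is a sketch, assuming the slides' hash without the final “% 1024” (named `hash_full` here), TABLESIZE = 7, and insertion at the front of each bucket; `struct Node` and `lookup_or_enter` are hypothetical names. For brevity it stores pointers to the caller's strings instead of copying them, which is safe only because the words are string literals that live for the whole program.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TABLESIZE 7

struct Node {
    const char *key;      /* points at the caller's string (not copied) */
    struct Node *next;
};

static struct Node *table[TABLESIZE];

/* The slides' hash without the final % 1024. */
uint32_t hash_full(const char *x) {
    uint32_t h = 0;
    for (size_t i = 0; x[i] != '\0'; i++)
        h = h * 65599u + (unsigned char)x[i];
    return h;
}

/* Look up key; if absent, enter it at the front of its bucket.
   Returns 1 if already present, 0 if just entered, -1 if malloc fails. */
int lookup_or_enter(const char *key) {
    size_t i = hash_full(key) % TABLESIZE;
    for (struct Node *p = table[i]; p != NULL; p = p->next)
        if (strcmp(p->key, key) == 0)
            return 1;                    /* found it */
    struct Node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = key;
    n->next = table[i];                  /* insert at the front */
    table[i] = n;
    return 0;
}
```

Feeding it “the”, “cat”, “in”, “the”, “hat” enters four words, reports the second “the” as already present, and leaves bucket 2 holding hat → cat.
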
Example Hash Table C Code
• Element in the hash table

struct Nlist {
    char *key;
    char *value;
    struct Nlist *next;
};

• Hash table
  ▪ struct Nlist *hashtab[1024];
• Three functions
  ▪ Hash function: unsigned hash(char *x)
  ▪ Look up with key: struct Nlist *lookup(char *s)
  ▪ Install entry: struct Nlist *install(char *key, char *value)

Lookup Function
• Lookup based on key
  ▪ Key is a string s
  ▪ Return pointer to matching hash-table element
  ▪ … or return NULL if no match is found

struct Nlist *lookup(char *s) {
    struct Nlist *p;
    for (p = hashtab[hash(s)]; p != NULL; p = p->next)
        if (strcmp(s, p->key) == 0)
            return p;   /* found */
    return NULL;        /* not found */
}

Install an Entry (1)
• Install a (key, value) pair
  ▪ Add a new Entry if none exists, or overwrite the old value
  ▪ Return a pointer to the Entry

struct Nlist *install(char *key, char *value) {
    struct Nlist *p;
    if ((p = lookup(key)) == NULL) {   /* not found */
        /* create and add new Entry (see next slide) */
    } else                             /* already there, so discard old value */
        free(p->value);
    p->value = malloc(strlen(value) + 1);
    assert(p->value != NULL);
    strcpy(p->value, value);
    return p;
}

Install an Entry (2)
• Create and install a new Entry
  ▪ Allocate memory for the new struct and the key
  ▪ Insert into the appropriate linked list in the hash table

p = malloc(sizeof(*p));
assert(p != NULL);
p->key = malloc(strlen(key) + 1);
assert(p->key != NULL);
strcpy(p->key, key);

/* add to front of linked list */
unsigned hashval = hash(key);
p->next = hashtab[hashval];
hashtab[hashval] = p;

Why Bother Copying the Key?
• In the example, why did I do
    p->key = malloc(strlen(key) + 1);
    strcpy(p->key, key);
• Instead of simply
    p->key = key;
• After all, the client passed me key, which is a pointer
  ▪ So, storage for the key has already been allocated
  ▪ Don’t I simply need to copy the address where the string is stored?
• I want to preserve the integrity of the hash table
  ▪ Even if the client program ultimately “frees” or overwrites the memory for key
  ▪ So, the install function makes a copy of the key
• The hash table owns its copy of the key, because the key is part of the data structure

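Putting the last few slides together gives a self-contained sketch of `hash`, `lookup`, and `install` for `struct Nlist`. It follows the slides but assumes `const char *` parameters, a hypothetical helper `copy_string` (it does what `strdup` does; `strdup` is POSIX and only became standard C in C23), and the slides' style of `assert`-ing that `malloc` succeeded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 1024

struct Nlist {
    char *key;
    char *value;
    struct Nlist *next;
};

static struct Nlist *hashtab[HASHSIZE];

unsigned hash(const char *x) {
    unsigned h = 0;
    for (size_t i = 0; x[i] != '\0'; i++)
        h = h * 65599 + (unsigned char)x[i];
    return h % HASHSIZE;
}

struct Nlist *lookup(const char *s) {
    for (struct Nlist *p = hashtab[hash(s)]; p != NULL; p = p->next)
        if (strcmp(s, p->key) == 0)
            return p;                    /* found */
    return NULL;                         /* not found */
}

/* Allocate and return a private copy of s. */
static char *copy_string(const char *s) {
    char *t = malloc(strlen(s) + 1);
    assert(t != NULL);
    strcpy(t, s);
    return t;
}

struct Nlist *install(const char *key, const char *value) {
    struct Nlist *p = lookup(key);
    if (p == NULL) {                     /* new entry: the table owns a copy of key */
        p = malloc(sizeof *p);
        assert(p != NULL);
        p->key = copy_string(key);
        unsigned hashval = hash(key);
        p->next = hashtab[hashval];      /* add to front of linked list */
        hashtab[hashval] = p;
    } else {
        free(p->value);                  /* already there: discard old value */
    }
    p->value = copy_string(value);
    return p;
}
```

A client that reads each key into the same buffer and calls `install(buf, ...)` shows why the copy matters: without it, every stored key would point at whatever the buffer holds last, and earlier entries could no longer be found.
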

Revisiting Hash Functions
• Potentially expensive to compute “mod c”
  ▪ Involves division by c and keeping the remainder
  ▪ Easier when c is a power of 2 (e.g., $16 = 2^4$)
• Binary (base 2) representation of numbers
  ▪ E.g., 53 = 32 + 16 + 4 + 1
      128  64  32  16   8   4   2   1
        0   0   1   1   0   1   0   1
  ▪ E.g., 53 % 16 is 5, the last four bits of the number
      128  64  32  16   8   4   2   1
        0   0   0   0   0   1   0   1
  ▪ Would like an easy way to isolate the last four bits…

Bitwise Operators in C
• Bitwise AND (&)
      0 & 0 = 0    0 & 1 = 0    1 & 0 = 0    1 & 1 = 1
  ▪ Mod on the cheap! For unsigned h and c a power of 2, h % c equals h & (c - 1)
    – E.g., h = 53 & 15;
          53   0 0 1 1 0 1 0 1
        & 15   0 0 0 0 1 1 1 1
        =  5   0 0 0 0 0 1 0 1
• Bitwise OR (|)
      0 | 0 = 0    0 | 1 = 1    1 | 0 = 1    1 | 1 = 1
• One’s complement (~)
  ▪ Turns 0 to 1, and 1 to 0
  ▪ E.g., set last three bits to 0
    – x = x & ~7;

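The mask trick can be checked directly. This is a minimal sketch with hypothetical function names; it assumes unsigned operands, since for a negative signed h the `%` operator can return a negative remainder while the mask never does.

```c
/* h % c computed as h & (c - 1); valid only when c is a power of two. */
unsigned mod_pow2(unsigned h, unsigned c) {
    return h & (c - 1);
}

/* Clear the low three bits, as in x = x & ~7. */
unsigned clear_low3(unsigned x) {
    return x & ~7u;
}

/* Returns 1 if h % c == (h & (c - 1)) for every h in [0, limit). */
int check_mask_identity(unsigned c, unsigned limit) {
    for (unsigned h = 0; h < limit; h++)
        if (h % c != (h & (c - 1)))
            return 0;
    return 1;
}
```

For c = 16 or c = 1024 the identity holds for every h checked; for c = 10, which is not a power of two, it fails (for example, 10 % 10 is 0 but 10 & 9 is 8). Clearing the low three bits of 53 (binary 110101) gives 48 (binary 110000).
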
Bitwise Operators in C (Continued)
• Shift left (<<)
  ▪ Shift some # of bits to the left, filling the blanks with 0
  ▪ E.g., n << 2 shifts left by 2 bits
    – If n is $101_2$ (i.e., $5_{10}$), then n << 2 is $10100_2$ (i.e., $20_{10}$)
  ▪ Multiplication by powers of two on the cheap!
• Shift right (>>)
  ▪ Shift some # of bits to the right
    – For unsigned integers, fill in the blanks with 0
    – What about signed integers?
        • Can vary from one machine to another! (For negative values, the C standard leaves the result implementation-defined.)
  ▪ E.g., n >> 2 shifts right by 2 bits
    – If n is $10110_2$ (i.e., $22_{10}$), then n >> 2 is $101_2$ (i.e., $5_{10}$)
  ▪ Division by powers of two on the cheap! (for unsigned values)

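Both shift identities can be sketched for unsigned values. The function names are hypothetical, and the sketch assumes the shift count k is smaller than the width of `unsigned` (shifting by the full width or more is undefined in C).

```c
/* n << k equals n * 2^k, as long as no 1 bits are shifted out the top. */
unsigned times_pow2(unsigned n, unsigned k) {
    return n << k;
}

/* For unsigned n, n >> k equals n / 2^k, rounded down. */
unsigned div_pow2(unsigned n, unsigned k) {
    return n >> k;
}
```

With the slide's numbers, `times_pow2(5, 2)` is 20 and `div_pow2(22, 2)` is 5.
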
Stupid Programmer Tricks
• Confusing (val % 1024) with (val & 1024)
  ▪ Drops from 1024 bins to two useful bins: val & 1024 is always either 0 or 1024 (and 1024 is not even a valid index into a 1024-element array)
  ▪ You really wanted (val & 1023)
• Speeding up compare
  ▪ For any non-trivial value comparison function
  ▪ Trick: store full hash result in structure (this assumes struct Nlist gains an extra field, unsigned hash, filled in by install)

struct Nlist *lookup(char *s) {
    struct Nlist *p;
    unsigned val = hash(s);   /* no % in hash function; unsigned keeps val % 1024 non-negative */
    for (p = hashtab[val % 1024]; p != NULL; p = p->next)
        if (p->hash == val && strcmp(s, p->key) == 0)
            return p;
    return NULL;
}

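A small experiment makes the first pitfall concrete. This is a sketch with hypothetical helper names: it counts how many distinct bucket indices each expression produces for val = 0 through n-1, assuming all indices stay below 2048.

```c
#include <stdbool.h>

/* Count distinct indices produced by bucket(val) for val in [0, n). */
static int distinct_buckets(unsigned (*bucket)(unsigned), unsigned n) {
    bool seen[2048] = {false};
    int count = 0;
    for (unsigned val = 0; val < n; val++) {
        unsigned b = bucket(val);
        if (!seen[b]) {
            seen[b] = true;
            count++;
        }
    }
    return count;
}

static unsigned wrong_mask(unsigned val) { return val & 1024; }  /* the bug */
static unsigned right_mask(unsigned val) { return val & 1023; }  /* same as val % 1024 */
```

Over val = 0 through 4095, `wrong_mask` yields only two distinct indices (0 and 1024) while `right_mask` yields all 1024. The cached-hash trick is independent of this: comparing `p->hash == val` first rejects most non-matching entries with one integer comparison before `strcmp` runs.
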
Summary of Today’s Lecture
• Linked lists
  ▪ A list is always the size it needs to be to store its contents
    – Useful when the number of items may change frequently!
  ▪ A list can be rearranged simply by manipulating pointers
    – When items are added/deleted, other items aren’t moved
    – Useful when items are large and, hence, expensive to move!
• Hash tables
  ▪ Invaluable for storing (key, value) pairs
  ▪ Very efficient lookups
    – If the hash function is good and the table size is large enough
• Bit-wise operators in C
  ▪ AND (&) and OR (|) – note: they are different from && and ||
  ▪ One’s complement (~) to flip all bits
  ▪ Left shift (<<) and right shift (>>) by some number of bits
