Hash Tables: Professor Jennifer Rexford COS 217
Goals of Today’s Lecture
• Motivation for hash tables
Examples of (key, value) pairs
Limitations of using arrays and linked lists
• Hash tables
Hash table data structure
Hash functions
Example hashing code
Accessing Data By a Key
• Student grades: (name, grade)
E.g., (“john smith”, 84), (“jane doe”, 93), (“bill clinton”, 81)
Gradeof(“john smith”) returns 84
Gradeof(“joe schmoe”) returns NULL
Could Use an Array of (key, value)
• Alternative way to use an array
Array element i is a struct that stores key and value
0 1776 Revolutionary
1 1861 Civil
2 1939 WW2
• Problems
Allocating too little memory: run out of space
Allocating too much memory: wasteful of space
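The array-of-structs approach might be sketched as follows. The capacity MAX_ENTRIES and the names struct Pair, array_add, and array_find are hypothetical and chosen only for illustration. The sketch shows the first problem directly: once the fixed-size array is full, adding another pair fails.

```c
#include <stddef.h>

#define MAX_ENTRIES 3   /* hypothetical capacity, fixed at compile time */

struct Pair {
    int key;            /* e.g., a year */
    const char *value;  /* e.g., the name of a war */
};

struct Pair pairs[MAX_ENTRIES];
size_t npairs = 0;

/* Append a pair; return 1 on success, 0 if the array is already full. */
int array_add(int key, const char *value) {
    if (npairs == MAX_ENTRIES)
        return 0;
    pairs[npairs].key = key;
    pairs[npairs].value = value;
    npairs++;
    return 1;
}

/* Linear search; return the value, or NULL if the key is absent. */
const char *array_find(int key) {
    for (size_t i = 0; i < npairs; i++)
        if (pairs[i].key == key)
            return pairs[i].value;
    return NULL;
}
```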
Linked List to Adapt Memory Size
• Each element is a struct
Key key
Value value
Pointer to next element next
struct Entry {
    int key;
    char* value;
    struct Entry *next;
};
• Linked list
Pointer to the first element in the list
Functions for adding and removing elements
Function for searching for an element with a particular key
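A search function for such a list might look like the sketch below. The name list_search is hypothetical; struct Entry follows the definition on this slide.

```c
#include <stddef.h>

struct Entry {
    int key;
    char *value;
    struct Entry *next;
};

/* Walk the list from head; return the matching element, or NULL if
   no element has the given key. */
struct Entry *list_search(struct Entry *head, int key) {
    struct Entry *p;
    for (p = head; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}
```

In the worst case the loop visits every element, so the cost of a search grows with the length of the list.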
[Figure: head → (key, value, next) → (key, value, next) → (key, value, next) → null]
Adding Element to a List
• Add new element at front of list
Make ptr of new element point to the current first element
– new->next = head;
Make the head of the list point to the new element
– head = new;
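The two pointer updates above can be wrapped in a function, sketched here under the assumption that the caller passes the address of its head pointer. The name list_add_front and the return convention are hypothetical.

```c
#include <stdlib.h>

struct Entry {
    int key;
    char *value;
    struct Entry *next;
};

/* Allocate a new element and link it in at the front of the list.
   Returns 1 on success, 0 if malloc fails (the list is left unchanged). */
int list_add_front(struct Entry **headp, int key, char *value) {
    struct Entry *new = malloc(sizeof(*new));
    if (new == NULL)
        return 0;
    new->key = key;
    new->value = value;   /* stores the pointer; does not copy the string */
    new->next = *headp;   /* new element points to the current first element */
    *headp = new;         /* head of the list points to the new element */
    return 1;
}
```

Adding at the front takes the same small number of steps no matter how long the list is.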
[Figure: list 1776 → 1861 → 1939 → null, showing the pointers new and head]
Locate and Remove an Element (1)
• Sequence through the list by key value
Keep track of the previous element in the list
prev = NULL;
for (p = head; p != NULL; prev = p, p = p->next) {
    if (p->key == 1861) {
        /* delete the element (see next slide!) */
        break;
    }
}
[Figure: head → 1776 → 1861 → 1939 → null, with pointers prev and p]
Locate and Remove an Element (2)
• Delete the element
Head element: make head point to the second element
Non-head element: make previous Entry point to next element
if (p == head)
head = head->next;
else
prev->next = p->next;
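Putting the search loop and the two deletion cases together gives something like the following sketch. The name list_remove is hypothetical, and the call to free assumes every element was allocated with malloc.

```c
#include <stdlib.h>

struct Entry {
    int key;
    char *value;
    struct Entry *next;
};

/* Unlink and free the first element whose key matches.
   Returns 1 if an element was removed, 0 if the key was not found. */
int list_remove(struct Entry **headp, int key) {
    struct Entry *prev = NULL;
    struct Entry *p;
    for (p = *headp; p != NULL; prev = p, p = p->next) {
        if (p->key == key) {
            if (p == *headp)
                *headp = p->next;       /* head element */
            else
                prev->next = p->next;   /* non-head element */
            free(p);                    /* assumes p came from malloc */
            return 1;
        }
    }
    return 0;
}
```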
[Figure: head → 1776 → 1861 → 1939 → null, with pointers prev and p]
List is Not Good for (key, value)
• Good place to start
Simple algorithm and data structure
Good to allow early start on design and test of client code
Hash Table
• Fixed-size array where each element points to a linked list
[Figure: array indexed 0 to TABLESIZE-1 (here 0 to 4); the buckets hold linked lists containing (1776, Revolution), (1861, Civil), and (1939, WW2)]
How Large an Array?
• Large enough that average “bucket” size is 1
Short buckets mean fast look-ups
Long buckets mean slow look-ups
• This is OK:
[Figure: table with buckets 0 to TABLESIZE-1]
What Kind of Hash Function?
• Good at distributing elements across the array
Distribute results over the range 0, 1, …, TABLESIZE-1
Distribute results evenly to avoid very long buckets
Hashing String Keys to Integers
• Simple schemes don’t distribute the keys evenly enough
Number of characters, mod TABLESIZE
Sum the ASCII values of all characters, mod TABLESIZE
…
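The two simple schemes listed above might be written as in this sketch. The function names and the value TABLESIZE = 7 are assumptions made for illustration.

```c
#include <string.h>

#define TABLESIZE 7   /* hypothetical size chosen for illustration */

/* Scheme 1: number of characters, mod TABLESIZE. */
unsigned hash_length(const char *s) {
    return (unsigned)(strlen(s) % TABLESIZE);
}

/* Scheme 2: sum of the character codes, mod TABLESIZE. */
unsigned hash_sum(const char *s) {
    unsigned sum = 0;
    for (; *s != '\0'; s++)
        sum += (unsigned char)*s;
    return sum % TABLESIZE;
}
```

Under scheme 1, all words of the same length land in the same bucket. Under scheme 2, any two words made of the same letters, such as "tea" and "eat", collide because their sums are equal.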
Implementing Hash Function
• Potentially expensive to compute a^i for each value of i (here a = 65599)
Computing a^i for each value of i
Instead, do (((x[0] * 65599 + x[1]) * 65599 + x[2]) * 65599 + x[3]) * …
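The nested form can be computed with one multiplication and one addition per character. The sketch below assumes 32-bit unsigned arithmetic (uint32_t), so that overflow wraps around modulo 2^32; the signature is also an assumption. For a two-character string such as "in", it evaluates 105 * 65599 + 110 = 6888005.

```c
#include <stdint.h>

/* Evaluate (((x[0] * 65599 + x[1]) * 65599 + x[2]) * 65599 + ...)
   from left to right. Unsigned overflow wraps modulo 2^32. */
uint32_t hash(const char *x) {
    uint32_t h = 0;
    for (; *x != '\0'; x++)
        h = h * 65599u + (unsigned char)*x;
    return h;
}
```

The result still has to be reduced with % TABLESIZE before it is used as an array index.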
Hash Table Example
Example: TABLESIZE = 7
Lookup (and enter, if not present) these strings: the, cat, in, the, hat
Hash table initially empty.
First word: “the”. hash(“the”) = 965156977. 965156977 % 7 = 1.
Search the linked list table[1] for the string “the”; not found
Now: table[1] = makelink(key, value, table[1])
[Figure: table with buckets 0 to 6; bucket 1: the]
Hash Table Example
Second word: “cat”. hash(“cat”) = 824252214. 824252214 % 7 = 2.
Search the linked list table[2] for the string “cat”; not found
Now: table[2] = makelink(key, value, table[2])
[Figure: table with buckets 0 to 6; bucket 1: the]
Hash Table Example
Third word: “in”. hash(“in”) = 6888005. 6888005 % 7 = 5.
Search the linked list table[5] for the string “in”; not found
Now: table[5] = makelink(key, value, table[5])
[Figure: table with buckets 0 to 6; bucket 1: the; bucket 2: cat]
Hash Table Example
Fourth word: “the”. hash(“the”) = 965156977. 965156977 % 7 = 1.
Search the linked list table[1] for the string “the”; found it!
[Figure: table with buckets 0 to 6; bucket 1: the; bucket 2: cat; bucket 5: in]
Hash Table Example
Fifth word: “hat”. hash(“hat”) = 865559739. 865559739 % 7 = 2.
Search the linked list table[2] for the string “hat”; not found.
Now, insert “hat” into the linked list table[2].
At beginning or end? Doesn’t matter.
[Figure: table with buckets 0 to 6; bucket 1: the; bucket 2: cat; bucket 5: in]
Hash Table Example
Inserting at the front is easier, so add “hat” at the front
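The whole walk-through can be reproduced with a short sketch. The struct Link type, the int value, the exact signature of makelink, and the helper lookup_or_enter are assumptions; the hash function is the nested 65599 scheme with 32-bit unsigned arithmetic.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TABLESIZE 7

struct Link {                     /* hypothetical element type */
    const char *key;
    int value;
    struct Link *next;
};

struct Link *table[TABLESIZE];    /* all buckets start out NULL */

uint32_t hash(const char *x) {
    uint32_t h = 0;
    for (; *x != '\0'; x++)
        h = h * 65599u + (unsigned char)*x;
    return h;
}

/* Build a new element whose next pointer is the old front of the bucket. */
struct Link *makelink(const char *key, int value, struct Link *next) {
    struct Link *p = malloc(sizeof(*p));
    if (p == NULL)
        abort();
    p->key = key;
    p->value = value;
    p->next = next;
    return p;
}

/* Look up key; enter it at the front of its bucket if not present.
   Returns the index of the bucket that was searched. */
unsigned lookup_or_enter(const char *key, int value) {
    unsigned b = hash(key) % TABLESIZE;
    struct Link *p;
    for (p = table[b]; p != NULL; p = p->next)
        if (strcmp(p->key, key) == 0)
            return b;             /* found it */
    table[b] = makelink(key, value, table[b]);
    return b;
}
```

Calling lookup_or_enter on the, cat, in, the, hat in that order searches buckets 1, 2, 5, 1, 2 and leaves “hat” in front of “cat” in bucket 2.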
[Figure: table with buckets 0 to 6; bucket 1: the; bucket 2: hat → cat; bucket 5: in]
Example Hash Table C Code
• Element in the hash table
struct Nlist {
char *key;
char *value;
struct Nlist *next;
};
• Hash table
struct Nlist *hashtab[1024];
• Three functions
Hash function: unsigned hash(char *x)
Look up with key: struct Nlist *lookup(char *s)
Install entry: struct Nlist *install(char *key, char *value)
Lookup Function
• Lookup based on key
Key is a string *s
Return pointer to matching hash-table element
… or return NULL if no match is found
/* Inside install: allocate a new element and copy the key */
p = malloc(sizeof(*p));
assert(p != NULL);
p->key = malloc(strlen(key) + 1);
assert(p->key != NULL);
strcpy(p->key, key);
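For context, here is one way the three functions could fit together. This is a sketch, not the lecture's own code: the body of hash (nested 65599 scheme with % 1024 inside the function), the helper copy_string, and the choice to replace the value when the key already exists are all assumptions. The allocation and key-copying steps follow the fragment above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Nlist {
    char *key;
    char *value;
    struct Nlist *next;
};

struct Nlist *hashtab[1024];

/* Assumed hash: nested 65599 scheme, reduced to a bucket index here. */
unsigned hash(char *x) {
    unsigned h = 0;
    for (; *x != '\0'; x++)
        h = h * 65599 + (unsigned char)*x;
    return h % 1024;
}

/* Return the element whose key equals s, or NULL if no match is found. */
struct Nlist *lookup(char *s) {
    struct Nlist *p;
    for (p = hashtab[hash(s)]; p != NULL; p = p->next)
        if (strcmp(s, p->key) == 0)
            return p;
    return NULL;
}

/* Copy a string into freshly allocated memory (hypothetical helper). */
static char *copy_string(char *s) {
    char *t = malloc(strlen(s) + 1);
    assert(t != NULL);
    strcpy(t, s);
    return t;
}

/* Add (key, value), or replace the value if key is already present. */
struct Nlist *install(char *key, char *value) {
    struct Nlist *p = lookup(key);
    if (p == NULL) {                /* not found: create a new element */
        unsigned h = hash(key);
        p = malloc(sizeof(*p));
        assert(p != NULL);
        p->key = copy_string(key);
        p->next = hashtab[h];       /* insert at the front of the bucket */
        hashtab[h] = p;
    } else {                        /* found: discard the old value */
        free(p->value);
    }
    p->value = copy_string(value);
    return p;
}
```

Copying the key and value means the table owns its strings, so the caller may reuse or free its own buffers afterwards.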
Bitwise Operators in C (Continued)
• Shift left (<<)
Shift some # of bits to the left, filling the blanks with 0
E.g., n << 2 shifts left by 2 bits
– If n is 101₂ (i.e., 5₁₀), then n<<2 is 10100₂ (i.e., 20₁₀)
Multiplication by powers of two on the cheap!
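As a small illustration of the shift example, the hypothetical helper below multiplies by a power of two. It assumes an unsigned operand and a shift count smaller than the width of the type; bits shifted past the top are lost.

```c
/* Multiply n by 2^k using a left shift. */
unsigned times_pow2(unsigned n, unsigned k) {
    return n << k;
}
```

For instance, times_pow2(5, 2) shifts 101 to 10100 and returns 20, the same as 5 * 4.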
• Speeding up compare
For any non-trivial value comparison function
Trick: store full hash result in structure
struct Nlist *lookup(char *s) {
struct Nlist *p;
unsigned val = hash(s); /* no % in hash function */
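The fragment above is cut off. One way the trick could be completed is sketched here, assuming an extra field named hash in struct Nlist (filled in when the element is installed) and a hash function that returns the full, unreduced value. Both are assumptions, as is the TABLESIZE macro.

```c
#include <stddef.h>
#include <string.h>

#define TABLESIZE 1024

struct Nlist {
    unsigned hash;          /* full hash of key, stored at insertion time */
    char *key;
    char *value;
    struct Nlist *next;
};

struct Nlist *hashtab[TABLESIZE];

/* Full hash value: note that there is no % here. */
unsigned hash(char *x) {
    unsigned h = 0;
    for (; *x != '\0'; x++)
        h = h * 65599 + (unsigned char)*x;
    return h;
}

struct Nlist *lookup(char *s) {
    struct Nlist *p;
    unsigned val = hash(s);
    for (p = hashtab[val % TABLESIZE]; p != NULL; p = p->next)
        /* Cheap integer comparison first; strcmp runs only when the
           stored hash already matches. */
        if (p->hash == val && strcmp(s, p->key) == 0)
            return p;
    return NULL;
}
```

Keeping val unsigned matters: a negative int would make val % TABLESIZE negative and index outside the array.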
• Hash tables
Invaluable for storing (key, value) pairs
Very efficient lookups
– If the hash function is good and the table size is large enough
• Bit-wise operators in C
AND (&) and OR (|) – note: they are different from && and ||
One’s complement (~) to flip all bits
Left shift (<<) and right shift (>>) by some number of bits
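The difference between the bitwise and the logical operators can be seen in a few one-line functions. The function names are hypothetical wrappers used only to make the comparison easy to test.

```c
/* Bitwise operators combine the individual bits of their operands. */
unsigned bit_and(unsigned a, unsigned b) { return a & b; }
unsigned bit_or(unsigned a, unsigned b)  { return a | b; }
unsigned bit_not(unsigned a)             { return ~a; }

/* Logical operators treat each operand as true (nonzero) or false (zero)
   and always produce 0 or 1. */
int logical_and(unsigned a, unsigned b)  { return a && b; }
int logical_or(unsigned a, unsigned b)   { return a || b; }
```

With a = 5 (binary 101) and b = 2 (binary 010), a & b is 0 because no bit is set in both, while a && b is 1 because both operands are nonzero.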