
Sets Maps and Hash Tables Review

The document discusses sets, maps, and hash tables. Sets are collections with no duplicate elements that support operations like union and intersection. Maps are collections of key-value pairs where keys must be unique. Hash tables use hash functions to map keys to buckets, and must handle collisions when different keys hash to the same bucket using techniques like separate chaining or open addressing.

Uploaded by

Jenesis Escobar

Sets Maps and Hash Tables

REVIEW

Set – collection that contains NO duplicate elements


• Cannot access elements by index (cannot do set[index])

Operations:

A ∪ B = A OR B (union)
A ∩ B = A AND B (intersection)
A – B = in A but NOT in B (difference)
A ⊆ B = A is a SUBSET of B

[Figure: Venn diagrams of A ∪ B, A ∩ B, A – B and A ⊆ B]
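The four operations above can be sketched with the standard algorithms in <algorithm>, which work directly on the sorted ranges of std::set. This is only an illustration: the helper names (union_of, intersection_of, difference_of, is_subset_of) are made up for this sketch and are not part of the standard library.

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// A ∪ B: every element that is in A OR in B.
std::set<int> union_of(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(out, out.end()));
    return out;
}

// A ∩ B: every element that is in A AND in B.
std::set<int> intersection_of(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.end()));
    return out;
}

// A – B: every element that is in A but NOT in B.
std::set<int> difference_of(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    return out;
}

// A ⊆ B: true when every element of A is also in B.
// std::includes(big, small) checks that the second range is contained in the first.
bool is_subset_of(const std::set<int>& a, const std::set<int>& b) {
    return std::includes(b.begin(), b.end(), a.begin(), a.end());
}
```

For example, with A = {1, 2, 3} and B = {2, 3, 4}, these helpers would give A ∪ B = {1, 2, 3, 4}, A ∩ B = {2, 3} and A – B = {1}.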

std::set<type> s; → Ordered

Methods:
insert(element) – adds element (no effect if it is already in the set)
erase(element) – removes element
find(element) – returns iterator to element if it is found, or returns s.end() otherwise
count(element) – returns 1 if element is found or 0 otherwise
size() – gives number of elements in set
empty() – returns true if set is empty, false otherwise

Implemented as: BST (typically a self-balancing red-black tree)
Time complexity: O(log(n))

std::unordered_set<type> s; → Not Ordered

Methods: Same as ordered sets + 2 new:
bucket_count() – # buckets
load_factor() – # elements / # buckets

Example: # buckets = 4, # elements = 3, so Load factor = 3/4 = 0.75

Implemented as: Hash Table
Time complexity: O(1) on average (+ O(k) to hash a key of length k); O(n) in the worst case

std::set<int> s;

s.insert(5);
s.insert(2);
s.insert(4);
s.insert(11);
s.insert(2); // won't add 2 again

// printing set using iterator yields:
// 2, 4, 5, 11

s.erase(4);

// printing set using iterator yields:
// 2, 5, 11

std::unordered_set<int> s;

s.insert(5);
s.insert(2);
s.insert(4);
s.insert(11);
s.insert(2); // won't add 2 again

// printing set using iterator yields (order depends on the implementation), e.g.:
// 11, 4, 5, 2

s.bucket_count(); // say it = 7
s.load_factor();  // 4 / 7 = 0.571429
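As a rough sketch of how find() and load_factor() behave in practice, the helpers below wrap the calls listed above. The function names (sorted_contents, contains, manual_load_factor) are hypothetical and exist only for this example.

```cpp
#include <set>
#include <unordered_set>
#include <vector>

// Copies an ordered set into a vector; iteration order is always ascending.
std::vector<int> sorted_contents(const std::set<int>& s) {
    return std::vector<int>(s.begin(), s.end());
}

// find() returns end() when the element is absent, so comparing
// against end() tells us whether the element is in the set.
bool contains(const std::unordered_set<int>& s, int x) {
    return s.find(x) != s.end();
}

// Recomputes the load factor by hand: # elements / # buckets.
float manual_load_factor(const std::unordered_set<int>& s) {
    if (s.bucket_count() == 0) {
        return 0.0f;  // guard: avoid dividing by zero for a table with no buckets
    }
    return static_cast<float>(s.size()) / static_cast<float>(s.bucket_count());
}
```

The value returned by manual_load_factor should agree with s.load_factor(); the actual bucket count (and therefore the number itself) depends on the standard library implementation.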

Map – collection of (key, value) pairs where key is unique

Many-to-one relationship: several keys may map to the same value, but each key maps to exactly one value
Operations:

std::map<key_type, value_type> m; → Ordered

Methods:
insert({key, value}) – if key already exists in map, leaves the map unchanged and returns a pair whose bool is false; otherwise inserts new entry with key, value pair
m[key] = value – if key already exists in map, overwrites with new value; otherwise inserts new entry
erase(key) – deletes key (and its value) from map
find(key) – searches for key in map, and returns iterator to it if found; otherwise returns m.end()
count(key) – returns 1 if key is found in map or 0 otherwise
size() – gives number of elements in map
empty() – returns true if map is empty, false otherwise

Implemented as: BST
Time complexity:
insert = O(log(n))
[] = O(log(n))

std::unordered_map<key_type, value_type> m; → Not Ordered

Methods: Same as ordered maps + 2 new:
bucket_count() – # buckets
load_factor() – # elements / # buckets

Implemented as: Hash Table
Time complexity (faster than ordered maps on average :D):
insert average case = O(1)
[] average case = O(1)

map<char, int> table;

table['b'] = 30;
table['a'] = 10;
table['c'] = 50;
table['a'] = 40; // overwrites previous value of 'a'

// printing using an iterator will yield (prints in order of keys):
// a : 40
// b : 30
// c : 50

unordered_map<char, int> table;

table['b'] = 30;
table['a'] = 10;
table['c'] = 50;
table['a'] = 40; // overwrites previous value of 'a'

// printing using an iterator will yield (does not print in any sort of order), e.g.:
// c : 50
// b : 30
// a : 40
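The difference between insert and [] is easy to miss, so here is a small sketch of both. The wrapper names try_insert and put are invented for this example; the underlying calls (std::map::insert and operator[]) are the standard ones.

```cpp
#include <map>
#include <utility>

// insert() never overwrites: it returns a pair whose .second is true
// only when a new entry was actually added.
bool try_insert(std::map<char, int>& table, char key, int value) {
    return table.insert(std::make_pair(key, value)).second;
}

// operator[] creates the entry if it is missing and then assigns,
// so an existing value is overwritten.
void put(std::map<char, int>& table, char key, int value) {
    table[key] = value;
}
```

With the table from the example above, try_insert(table, 'a', 99) would return false and leave 'a' at 40, while put(table, 'a', 99) would overwrite it.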

Hash Table – uses a hash function to compute an index (a hash code) which maps to a “bucket” containing the value

Example hash function: string_length % table_size

key = “macaroons” → Hash code = 2 → value = yummy

Having a good hash function is critical for hash table efficiency…
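A minimal sketch of the string_length % table_size hash function might look like this. The name length_hash is hypothetical, and the sheet does not state the table size for “macaroons”; a 7-bucket table is assumed here only because 9 % 7 = 2 matches the hash code shown.

```cpp
#include <cstddef>
#include <string>

// Hash code = length of the key modulo the number of buckets.
// Assumes table_size > 0 (otherwise the modulo is undefined).
std::size_t length_hash(const std::string& key, std::size_t table_size) {
    return key.size() % table_size;
}
```

Note that this function is cheap to compute but distributes data poorly: every key of the same length lands in the same bucket.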

Good hash functions will:


- Evenly distribute data (therefore minimizing the potential for
data collisions)
- Be easy to compute (and very fast)

Bad/Invalid hash functions will:


- Produce different outputs for the same input
- Take lots of time
- Result in high potential for data collisions
What is a data collision?

To understand data collisions, we first must understand what load_factor is.

Load_factor = # entries / # buckets

It is a way to describe how “full” the hash table is becoming…

If load_factor becomes “too large” (table becomes too full), we should dynamically resize the table and rehash our values to reduce the load_factor → making our table more time efficient :)

[Figure: the same 3 entries in a 4-bucket table (Load Factor = 3/4 = 0.75) and in an 8-bucket table (Load Factor = 3/8 = 0.375)]
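To make the resize rule concrete, here is a small sketch that computes the load factor and picks a new bucket count. The doubling policy and the names load_factor_of and grown_bucket_count are assumptions made for this example; real hash tables choose their own growth strategy and threshold.

```cpp
#include <cstddef>

// Load factor = # entries / # buckets. Assumes buckets > 0.
double load_factor_of(std::size_t entries, std::size_t buckets) {
    return static_cast<double>(entries) / static_cast<double>(buckets);
}

// Hypothetical growth policy: keep doubling the bucket count until the
// load factor is no larger than max_load.
// Assumes buckets > 0 and max_load > 0, otherwise the loop would not end.
std::size_t grown_bucket_count(std::size_t entries, std::size_t buckets,
                               double max_load) {
    while (load_factor_of(entries, buckets) > max_load) {
        buckets *= 2;
    }
    return buckets;
}
```

With 3 entries in 4 buckets and a maximum load of 0.5, this policy would grow the table to 8 buckets, dropping the load factor from 0.75 to 0.375. After growing, every entry has to be rehashed, because its bucket index depends on the table size. In the standard library, std::unordered_set and std::unordered_map expose max_load_factor() and rehash() for the same purpose.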

Okay so what is a Collision then?

Example:

Hash table size = 4 buckets
Hash function = string_length % table_size

Initially, our load_factor = 0 entries / 4 buckets = 0

Then let’s say we insert the key “Julia” (length = 5): 5 % 4 = 1, so it goes in bucket 1
Load factor now becomes 1 entry / 4 buckets = 0.25

Then we insert “John” (length = 4): 4 % 4 = 0, so it goes in bucket 0
Load factor = 2 entries / 4 buckets = 0.50

Now let’s say we try to insert “Mariannae” (length = 9): 9 % 4 = 1 → Collision!

“Mariannae” hashes to the same value that “Julia” does.
This is a data collision.

[Figure: bucket 0 holds John → 222-333-4444; bucket 1 holds Julia → 888-555-1111]
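The walk-through above can be replayed in code. This sketch only detects collisions (it does not resolve them), and the function name colliding_keys is made up for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Inserts keys in order using hash = string_length % table_size and
// returns the keys whose bucket was already taken when they arrived.
// Assumes table_size > 0.
std::vector<std::string> colliding_keys(const std::vector<std::string>& keys,
                                        std::size_t table_size) {
    std::vector<bool> occupied(table_size, false);
    std::vector<std::string> collisions;
    for (const std::string& key : keys) {
        std::size_t bucket = key.size() % table_size;
        if (occupied[bucket]) {
            collisions.push_back(key);  // bucket already in use: collision
        } else {
            occupied[bucket] = true;    // first key to land in this bucket
        }
    }
    return collisions;
}
```

Calling it with “Julia”, “John”, “Mariannae” and 4 buckets should report only “Mariannae”, since 9 % 4 = 1 is the bucket “Julia” already occupies.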

Collision Resolution policies:

1. Separate chaining: each bucket stores a linked list; colliding entries are simply appended to the end of the list

[Figure: bucket 0 holds John (4 % 4 = 0) → 222-333-4444; bucket 1 holds the chain Julia (5 % 4 = 1) → 888-555-1111, then Mariannae (9 % 4 = 1) → 777-666-5555]

2. Open Addressing (Linear Probing): each bucket stores only one entry; if you try to add an entry and there is a collision, move the “problem” entry forward one bucket at a time (wrapping around at the end of the table) to the next available free bucket and put it there

3. Open Addressing (Quadratic Probing): same as linear probing, except you try the bucket 1 past the original bucket, then 4 past it, then 9, then 16, etc. (offsets 1², 2², 3², 4², …)
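The three policies can be sketched as follows, reusing the string_length % table_size hash from the example. Everything here (ChainedTable, linear_probe, quadratic_probe, insert_linear, and the use of an empty string to mark a free slot) is an assumption made for illustration, not how any particular library implements its tables.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hash from the example: string length modulo the number of buckets.
std::size_t length_hash(const std::string& key, std::size_t table_size) {
    return key.size() % table_size;
}

// 1. Separate chaining: each bucket is a linked list of (key, value) entries.
struct ChainedTable {
    typedef std::pair<std::string, std::string> Entry;
    std::vector<std::list<Entry> > buckets;

    explicit ChainedTable(std::size_t table_size) : buckets(table_size) {}

    void insert(const std::string& key, const std::string& value) {
        std::list<Entry>& chain = buckets[length_hash(key, buckets.size())];
        for (Entry& entry : chain) {
            if (entry.first == key) {  // key already present: overwrite value
                entry.second = value;
                return;
            }
        }
        chain.push_back(Entry(key, value));  // collision or not, append to chain
    }
};

// 2. Linear probing: attempt 0 is the home bucket, then +1, +2, ... (wrapping).
std::size_t linear_probe(std::size_t home, std::size_t attempt,
                         std::size_t table_size) {
    return (home + attempt) % table_size;
}

// 3. Quadratic probing: offsets from the home bucket are 1, 4, 9, 16, ...
std::size_t quadratic_probe(std::size_t home, std::size_t attempt,
                            std::size_t table_size) {
    return (home + attempt * attempt) % table_size;
}

// Inserts a key with linear probing; an empty string marks a free slot.
// Returns the bucket used, or slots.size() if the table is full.
std::size_t insert_linear(std::vector<std::string>& slots,
                          const std::string& key) {
    std::size_t home = length_hash(key, slots.size());
    for (std::size_t attempt = 0; attempt < slots.size(); ++attempt) {
        std::size_t bucket = linear_probe(home, attempt, slots.size());
        if (slots[bucket].empty()) {
            slots[bucket] = key;
            return bucket;
        }
    }
    return slots.size();
}
```

With 4 buckets, chaining would leave “Julia” and “Mariannae” in the same list in bucket 1, while linear probing would move “Mariannae” from bucket 1 to the next free bucket, bucket 2. One caveat with quadratic probing is that, depending on the table size, the probe sequence may not visit every bucket.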
