06 Hashtables
1 Data Structures
A DBMS uses various data structures in many different parts of the system's internals:
• Internal Meta-Data: Keep track of information about the database and the system state.
• Core Data Storage: Can be used as the base storage for tuples in the database.
• Temporary Data Structures: The DBMS can build data structures on the fly while processing a
query to speed up execution (e.g., hash tables for joins).
• Table Indexes: Auxiliary data structures to make it easier to find specific tuples.
Design Decisions:
1. Data organization: How we lay out memory and what information to store inside the data structure.
2. Concurrency: How to enable multiple threads to access the data structure without causing problems.
2 Hash Table
A hash table implements an associative array abstract data type that maps keys to values. It provides
O(1) time complexity per operation on average (O(n) in the worst case) and O(n) storage complexity.
A hash table implementation consists of two parts:
• Hash Function: How to map a large key space into a smaller domain. This is used to compute an
index into an array of buckets or slots. We need to consider the trade-off between fast execution and
collision rate.
• Hashing Scheme: How to handle key collisions after hashing. We need to consider the trade-off between
allocating a large hash table to reduce collisions and executing additional instructions to
find/insert keys.
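The two parts above can be seen in a minimal Python sketch of a fixed-capacity table. The caller supplies the hash function, and linear probing (one possible hashing scheme, chosen here only for illustration) resolves collisions by scanning forward to the next slot. The names `ProbingHashTable`, `put`, and `get` are hypothetical, and deletion is omitted because it would need tombstones or rehashing.

```python
from typing import Callable, Generic, Hashable, List, Optional, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class ProbingHashTable(Generic[K, V]):
    """Hypothetical fixed-capacity hash table.

    Hash function: supplied by the caller, maps a key to an integer.
    Hashing scheme: linear probing, i.e. on a collision, try the next slot.
    """

    def __init__(self, capacity: int, hash_fn: Callable[[K], int]) -> None:
        self.capacity = capacity
        self.hash_fn = hash_fn
        self.slots: List[Optional[Tuple[K, V]]] = [None] * capacity

    def _home_slot(self, key: K) -> int:
        # Hash function step: shrink the key space down to [0, capacity).
        return self.hash_fn(key) % self.capacity

    def put(self, key: K, value: V) -> None:
        i = self._home_slot(key)
        for _ in range(self.capacity):
            entry = self.slots[i]
            if entry is None or entry[0] == key:
                self.slots[i] = (key, value)  # empty slot or same key: store here
                return
            i = (i + 1) % self.capacity  # collision: probe the next slot
        raise RuntimeError("hash table is full")

    def get(self, key: K) -> Optional[V]:
        i = self._home_slot(key)
        for _ in range(self.capacity):
            entry = self.slots[i]
            if entry is None:
                return None  # reached an empty slot: the key is absent
            if entry[0] == key:
                return entry[1]
            i = (i + 1) % self.capacity
        return None


table = ProbingHashTable(8, hash_fn=lambda k: k)
table.put(1, "a")
table.put(9, "b")  # 9 % 8 == 1 collides with key 1, so it lands in slot 2
print(table.get(9))  # b
```

The second insert costs an extra probe: that is the "additional instructions" side of the hashing-scheme trade-off, which a larger table (fewer collisions) would reduce at the cost of memory.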
3 Hash Functions
A hash function takes in any key as its input and returns an integer representation of that key (i.e., the
“hash”). The function’s output is deterministic (i.e., the same key should always generate the same hash
output).
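As a concrete instance of a deterministic hash function, the sketch below implements 64-bit FNV-1a, a simple and widely documented non-cryptographic hash. It is used here only because it fits in a few lines; it is not XXHash3, and nothing here implies that a particular DBMS uses it. `bucket_of` is a hypothetical helper showing how the integer hash becomes an index into an array of buckets.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1  # keep arithmetic within 64 bits, like a C uint64_t


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR in each byte, then multiply by the FNV prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def bucket_of(key: str, num_buckets: int) -> int:
    """Hypothetical helper: map a key's hash to a bucket index."""
    return fnv1a_64(key.encode("utf-8")) % num_buckets


print(hex(fnv1a_64(b"a")))  # 0xaf63dc4c8601ec8c (standard FNV-1a test vector)
print(bucket_of("tuple-42", 16))  # same bucket on every call and every run
```

Python's built-in `hash()` on `str` is salted per process (see `PYTHONHASHSEED`), so its values are not stable across runs; a function like the one above returns the same value for the same bytes in every process.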
The DBMS does not want to use a cryptographic hash function (e.g., SHA-256) because we do not need
to worry about protecting the contents of keys. These hash functions are primarily used internally by the
DBMS, so no information is leaked outside of the system. For this lecture, we only care about the
hash function’s speed and collision rate.
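To make "collision rate" concrete, the sketch below counts how many keys land in an already occupied bucket under two hash functions: a deliberately weak byte-sum hash (hypothetical, for illustration only) and 64-bit FNV-1a. It measures collisions only, not speed; comparing speed would need a separate benchmark. The key set (`key0` to `key999`) and the 1024-bucket array are arbitrary assumptions.

```python
from typing import Callable, Iterable

MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a, a fast non-cryptographic hash."""
    h = 0xCBF29CE484222325
    for byte in data:
        h = ((h ^ byte) * 0x100000001B3) & MASK64
    return h


def byte_sum_hash(data: bytes) -> int:
    """Deliberately weak hash: anagrams such as b"key12" and b"key21" collide."""
    return sum(data)


def count_collisions(hash_fn: Callable[[bytes], int],
                     keys: Iterable[bytes], num_buckets: int) -> int:
    """Count keys whose bucket was already occupied by an earlier key."""
    occupied = set()
    collisions = 0
    for key in keys:
        bucket = hash_fn(key) % num_buckets
        if bucket in occupied:
            collisions += 1
        else:
            occupied.add(bucket)
    return collisions


keys = [f"key{i}".encode() for i in range(1000)]
weak = count_collisions(byte_sum_hash, keys, 1024)
good = count_collisions(fnv1a_64, keys, 1024)
print(weak, good)
```

The byte-sum hash maps these 1000 keys to only 55 distinct values, so 945 of them collide; FNV-1a spreads the same keys much more evenly. A cryptographic hash would also spread them well, but it does more work per byte than the DBMS needs.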
The current state-of-the-art hash function is Facebook's XXHash3 (XXH3).
Fall 2019– Lecture #06 Hash Tables
• If a bucket is full, add another bucket to that chain. The hash table can grow indefinitely because the
DBMS keeps adding new buckets.
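The chaining behavior described above can be sketched as follows: each slot holds a fixed-capacity bucket, and when every bucket in a slot's chain is full, a new bucket is appended to the end of that chain. `BUCKET_CAPACITY`, `Bucket`, and `ChainedHashTable` are hypothetical names, a capacity of 4 entries per bucket is an arbitrary assumption, and Python's built-in `hash()` stands in for the hash function.

```python
from typing import Any, List, Optional, Tuple

BUCKET_CAPACITY = 4  # hypothetical: entries per bucket (e.g. one page)


class Bucket:
    def __init__(self) -> None:
        self.entries: List[Tuple[Any, Any]] = []
        self.next: Optional["Bucket"] = None  # next bucket in this chain


class ChainedHashTable:
    def __init__(self, num_slots: int) -> None:
        self.num_slots = num_slots
        self.slots = [Bucket() for _ in range(num_slots)]

    def _head(self, key: Any) -> Bucket:
        return self.slots[hash(key) % self.num_slots]

    def put(self, key: Any, value: Any) -> None:
        head = self._head(key)
        # First pass: if the key already exists anywhere in the chain, overwrite it.
        bucket: Optional[Bucket] = head
        while bucket is not None:
            for i, (k, _) in enumerate(bucket.entries):
                if k == key:
                    bucket.entries[i] = (key, value)
                    return
            bucket = bucket.next
        # Second pass: place the new key in the first bucket with room.
        bucket = head
        while len(bucket.entries) >= BUCKET_CAPACITY:
            if bucket.next is None:
                bucket.next = Bucket()  # every bucket is full: extend the chain
            bucket = bucket.next
        bucket.entries.append((key, value))

    def get(self, key: Any) -> Optional[Any]:
        bucket: Optional[Bucket] = self._head(key)
        while bucket is not None:
            for k, v in bucket.entries:
                if k == key:
                    return v
            bucket = bucket.next
        return None

    def chain_length(self, slot: int) -> int:
        n, bucket = 0, self.slots[slot]
        while bucket is not None:
            n += 1
            bucket = bucket.next
        return n


table = ChainedHashTable(num_slots=1)  # one slot forces every key into one chain
for i in range(10):
    table.put(i, i * i)
print(table.chain_length(0))  # 3 buckets holding 4 + 4 + 2 entries
```

Nothing in `put` limits how long a chain may get, which is exactly the unbounded growth described above.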