
11-Searching and Hashing Final

The document compares linear search and binary search algorithms, highlighting their efficiency and implementation. Linear search is optimal for unsorted arrays but slower, while binary search is faster on sorted arrays, operating in logarithmic time. Additionally, it discusses the use of hash tables and direct access tables for efficient record storage and retrieval, emphasizing the trade-off between speed and memory usage.

Uploaded by

Santosh Deshmukh

Linear Search

vs
Binary Search

Kumkum Saxena
Linear Search
◼ Your code should look something like this:

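int search(int array[], int len, int value) {
    int i;
    /* Check every element in turn; stop as soon as the value is found. */
    for (i = 0; i < len; i++) {
        if (array[i] == value)
            return 1;   /* found */
    }
    return 0;           /* not found after checking all len elements */
}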

Kumkum Saxena Searching and Hashing page 2


Linear Search
◼ Analyze code:
◼ Clearly, if the array is unsorted, this algorithm is
optimal
▪ The ONLY way to be sure that a value isn’t in the array is
to look at every single spot of the array
▪ Just like you can’t be sure that you DON’T have some
piece of paper or form unless you look through ALL of your
pieces of paper

◼ But we ask a question:


◼ Could we find an item in an array faster if it were
already sorted?



Binary Search
◼ Number Guessing Game from childhood
◼ Remember the game you most likely played as
a child
◼ I have a secret number between 1 and 100.
◼ Make a guess and I’ll tell you whether your guess is
too high or too low.
◼ Then you guess again. The process continues until
you guess the correct number.
◼ Your job is to MINIMIZE the number of guesses you
make.



Binary Search
◼ Number Guessing Game from childhood
◼ What is the first guess of most people?
◼ 50.
◼ Why?
◼ No matter the response (too high or too low), at most
50 possible values remain to be searched
(either 1-49 or 51-100)
◼ Any other first guess risks leaving more than 50
possible values.
▪ Example: you guess 75
▪ I respond: too high
▪ So now you have to guess between 1 and 74
▪ 74 values to guess from instead of 50
Binary Search
◼ Number Guessing Game from childhood
◼ Basic Winning Strategy
◼ Always guess the number that is halfway between the
lowest possible value in your search range and the
highest possible value in your search range

◼ Can we now adapt this idea to work for
searching for a given value in an array?
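
The halving strategy from the guessing game can be sketched in C as follows. This is only an illustration: the function name count_guesses and the idea of passing the secret in as a parameter are assumptions made here, not part of the slides.

```c
/* Hypothetical helper: plays the guessing game against a known secret
 * in the range [low, high] and returns how many guesses were needed,
 * or -1 if the secret is outside the range. */
int count_guesses(int low, int high, int secret) {
    int guesses = 0;
    while (low <= high) {
        int guess = low + (high - low) / 2;  /* halfway point of the range */
        guesses++;
        if (guess == secret)
            return guesses;
        else if (guess < secret)
            low = guess + 1;    /* "too low": discard the lower half */
        else
            high = guess - 1;   /* "too high": discard the upper half */
    }
    return -1;
}
```

For the range 1 to 100, a secret of 50 is found on the first guess, and a secret of 100 takes 7 guesses.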



Binary Search
◼ Array Search
◼ We are given the following sorted array:
index 0 1 2 3 4 5 6 7 8
value 2 6 19 27 33 37 38 41 118

◼ We are searching for the value, 19


◼ So where is halfway between?
◼ One guess would be to look at 2 and 118 and take
their average (60).
◼ But 60 isn’t even in the list
◼ And if we look at the number closest to 60
▪ It is almost at the end of the array



Binary Search
◼ Array Search
◼ We quickly realize that if we want to adapt the
number guessing game strategy to searching an
array, we MUST search in the middle INDEX of
the array.
◼ In this case:
◼ The lowest index is 0
◼ The highest index is 8
◼ So the middle index is 4



Binary Search
◼ Array Search
◼ Correct Strategy
◼ We would ask, “Is the number I am searching for, 19,
greater or less than the number stored in index 4?”
▪ Index 4 stores 33
◼ The answer would be “less than”
◼ So we would narrow our search range to index 0
through index 3
▪ Note that index 4 is no longer in the search space
◼ We then continue this process
▪ The second index we’d look at is index 1, since (0+3)/2 = 1
(integer division)
▪ Index 1 stores 6, which is less than 19, so the range
becomes index 2 through index 3
▪ Then we’d finally get to index 2, since (2+3)/2 = 2
▪ And at index 2, we would find the value, 19, in the array
Binary Search
◼ Binary Search code:
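int binsearch(int a[], int len, int value) {
    int low = 0, high = len - 1;

    /* The search range is a[low..high]; stop when it becomes empty. */
    while (low <= high) {
        int mid = (low + high) / 2;   /* middle index, rounded down */
        if (value < a[mid])
            high = mid - 1;           /* value can only be left of mid */
        else if (value > a[mid])
            low = mid + 1;            /* value can only be right of mid */
        else
            return 1;                 /* found */
    }

    return 0;                         /* range is empty: not found */
}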
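
A common variation reports where the value was found rather than only whether it exists. The sketch below is one way to do that; the name binsearch_index and the -1 "not found" result are assumptions for illustration. It also computes the midpoint as low + (high - low) / 2, which avoids integer overflow when low + high would exceed the range of int.

```c
/* Hypothetical variant of binsearch: returns the index of value in the
 * sorted array a, or -1 if value is not present. */
int binsearch_index(const int a[], int len, int value) {
    int low = 0, high = len - 1;

    while (low <= high) {
        int mid = low + (high - low) / 2;  /* overflow-safe midpoint */
        if (value < a[mid])
            high = mid - 1;
        else if (value > a[mid])
            low = mid + 1;
        else
            return mid;                    /* found: report the position */
    }
    return -1;                             /* not found */
}
```

On the nine-element example array from the earlier slide, searching for 19 returns index 2, and searching for 60 returns -1.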
Binary Search
◼ Binary Search code:

◼ At the end of each loop iteration, all we do is
update either low or high
◼ This modifies our search region
◼ Essentially halving it



Binary Search
◼ Efficiency of Binary Search
◼ Analysis:
◼ Let’s analyze how many comparisons (guesses) are
necessary when running this algorithm on an array of
n items
◼ First, let’s try n = 100
▪ After 1 guess, we have 50 items left,
▪ After 2 guesses, we have 25 items left,
▪ After 3 guesses, we have 12 items left,
▪ After 4 guesses, we have 6 items left,
▪ After 5 guesses, we have 3 items left,
▪ After 6 guesses, we have 1 item left
▪ After 7 guesses, we have 0 items left.
Binary Search
◼ Efficiency of Binary Search
◼ Analysis:
◼ Notes:
▪ The last iteration is needed because the number of
items left represents the number of other possible values to
search
▪ We need to reduce this to 0.
▪ Also, when n is odd, such as when n=25
▪ We search the middle element, # 13
▪ There are 12 elements smaller than # 13
▪ And 12 elements bigger than # 13
▪ This is why the number of items left is slightly less than ½ in
those cases



Binary Search
◼ Efficiency of Binary Search
◼ Analysis:
◼ General case:

◼ After 1 guess, we have n/2 items left


◼ After 2 guesses, we have n/4 items left
◼ After 3 guesses, we have n/8 items left
◼ After 4 guesses, we have n/16 items left
◼ …
◼ After k guesses, we have n/2^k items left



Binary Search
◼ Efficiency of Binary Search
◼ Analysis:
◼ General case:
◼ So, after k guesses, we have n/2^k items left
◼ The question is:
▪ How many guesses, k, do we need to make in order to find
our answer?
▪ Or until we have one and only one guess left to make?
◼ So we want to get only 1 item left
◼ If we can find the value that makes the above fraction
equal to 1, then we know that in one more guess, we’ll
narrow down the item



Binary Search
◼ Efficiency of Binary Search
◼ Analysis:
◼ General case:
◼ So, after k guesses, we have n/2^k items left
▪ Again, we want only 1 item left
▪ So set this equal to 1 and solve for k

$$\frac{n}{2^k} = 1 \quad\Longrightarrow\quad n = 2^k \quad\Longrightarrow\quad k = \log_2 n$$

◼ This means that a binary search takes roughly log_2 n
comparisons when searching in a sorted array of n
items
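
The estimate can be checked by counting loop iterations directly. The sketch below is an illustration under stated assumptions: the helper name worst_case_guesses is made up here, and it simulates a search for a value larger than every element, so the search range always shrinks to its upper part until it is empty.

```c
/* Hypothetical helper: counts the loop iterations ("guesses") binary
 * search performs on n items when the value is absent and larger than
 * every element. No array is needed; only the indices matter. */
int worst_case_guesses(int n) {
    int low = 0, high = n - 1, guesses = 0;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        guesses++;
        low = mid + 1;   /* pretend a[mid] < value on every comparison */
    }
    return guesses;
}
```

For n = 100 this returns 7, matching the 50, 25, 12, 6, 3, 1, 0 sequence worked out for n = 100. For a power of two such as n = 1024 it returns 11, which is log_2 n plus the one final guess that reduces the range to 0.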
Binary Search
◼ Efficiency of Binary Search
◼ Analysis:
◼ Runs in logarithmic (log n) time
◼ This is MUCH faster than searching linearly
◼ Consider the following chart:
n             log_2 n
8             3
1024          10
65536         16
1048576       20
33554432      25
1073741824    30

◼ Basically, any log n algorithm is SUPER FAST.


Hash Tables
& Hashing

Kumkum Saxena
Terminology
◼ Table
◼ An abstract data type that stores & retrieves
records according to their search key values

◼ Record
◼ Each individual row in the table
◼ Example:
◼ A database of student records
◼ So each record will have a pid, first name, last name,
SSN, address, phone, email, etc.



Record Example
This is an example of a table. Each individual row is a record.

sid (key)   name    score
0012345     andy    81.5
0033333     betty   90
0056789     david   56.8
...
9903030     tom     73
9908080     bill    49
...

Consider this problem. We want to store 1,000
student records and search them by student id.



Motivation
◼ Problem:
◼ Given this table of records
◼ We need to be able to:
◼ Add new records
◼ Delete records
◼ Search for records

◼ What’s the most efficient way of doing this?



Motivation
◼ Problem:
◼ What’s the most efficient way of doing this?
◼ Use an array to store the records, in unsorted order
◼ Running time:
▪ Adding a record:
▪ O(1) since we simply add at the end of the unsorted array
▪ Deleting a record:
▪ Very slow, or O(n), since we have to search through the entire
array to find the desired record to delete
▪ We then have a “hole” in the array.
▪ We can quickly fill that hole by moving the last element into it,
which can happen in O(1) time.
▪ Search for a record:
▪ Very slow, or O(n), since we search through the entire table
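
The three operations on an unsorted array can be sketched as below. The Record and UnsortedTable types, the function names, and the capacity of 1000 are assumptions chosen for illustration; only the running-time behavior comes from the slide.

```c
#define MAX_RECORDS 1000   /* assumed capacity */

/* Hypothetical record layout, loosely based on the sid/score columns. */
typedef struct {
    int    sid;
    double score;
} Record;

typedef struct {
    Record items[MAX_RECORDS];
    int    count;
} UnsortedTable;

/* O(1): append at the end. Returns 0 if the array is full. */
int table_add(UnsortedTable *t, Record r) {
    if (t->count == MAX_RECORDS)
        return 0;
    t->items[t->count++] = r;
    return 1;
}

/* O(n): linear scan. Returns the index of the record, or -1. */
int table_find(const UnsortedTable *t, int sid) {
    for (int i = 0; i < t->count; i++)
        if (t->items[i].sid == sid)
            return i;
    return -1;
}

/* O(n) to find the record, then O(1) to fill the hole by moving the
 * last element into it. Returns 1 if a record was deleted. */
int table_delete(UnsortedTable *t, int sid) {
    int i = table_find(t, sid);
    if (i < 0)
        return 0;
    t->items[i] = t->items[t->count - 1];
    t->count--;
    return 1;
}
```

Note that filling the hole with the last element changes the order of the records, which is acceptable only because the array is unsorted.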



Motivation
◼ Problem:
◼ What’s the most efficient way of doing this?
◼ Use an array to store the records, in sorted order
◼ Running time:
▪ Adding a record:
▪ Must insert at correct position
▪ And then ALL other records, after insertion spot, must be moved
▪ Very slow, or O(n)
▪ Deleting a record:
▪ Must find the record to delete, O(log n) with binary search
▪ Must fill the “hole”, which means moving all later items, O(n)
▪ Search for a record:
▪ Binary search!
▪ Fast, or O(log n)
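
The shifting that makes insertion and deletion O(n) in a sorted array can be sketched as follows. The function names are hypothetical, and plain int keys stand in for full records to keep the example short.

```c
/* Hypothetical helper: inserts value into the sorted array a, which
 * currently holds len items and has room for at least len + 1.
 * Returns the new length. */
int sorted_insert(int a[], int len, int value) {
    int i = len;
    /* Shift every item greater than value one slot to the right: O(n). */
    while (i > 0 && a[i - 1] > value) {
        a[i] = a[i - 1];
        i--;
    }
    a[i] = value;
    return len + 1;
}

/* Hypothetical helper: deletes the item at index pos by shifting all
 * later items one slot to the left: O(n). Returns the new length. */
int sorted_delete_at(int a[], int len, int pos) {
    for (int i = pos; i < len - 1; i++)
        a[i] = a[i + 1];
    return len - 1;
}
```

Inserting 10 into {2, 6, 19, 27} moves 19 and 27 one slot to the right; deleting index 0 afterwards moves every remaining item one slot to the left.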
Motivation
◼ Problem:
◼ What’s the most efficient way of doing this?
◼ Use a binary search tree to store the records
◼ Running time:
▪ Adding a record:
▪ Inserting into proper position in BST
▪ Fast, or O(log n) when the tree is reasonably balanced
▪ Deleting a record:
▪ Must find correct position to delete
▪ Fast, or O(log n)
▪ Search for a record:
▪ Also fast, or O(log n)



Motivation
◼ Problem:
◼ What’s the most efficient way of doing this?
◼ Use a binary search tree to store the records
◼ BSTs seem to be the best solution to this
◼ But there’s something that is WAAAAAY faster
▪ Adding, Deleting, and Searching are all O(1): CONSTANT time
◼ A very simple, naive solution that you could come up with
before even taking this class
◼ Just use an array! But a special type of an array.
◼ Specifically, use an array that is SOOOOO large that every
record has its own, exclusive cell in the array
◼ Often called a Direct Access Table
Direct Access Table
Assume we stored records based on a social security #.
One way is to store the records in a huge array, index 0..999999999.
The index into the array is simply an individual’s SSN.
So this is VERY FAST. Adding, Deleting, and Searching: O(1)

index        name    score
0
:            :       :
123456789    andy    81.5
:            :       :
334561894    betty   90
:            :       :
589224751    david   56.8
:            :       :
990847852    bill    49
:            :       :
999999999



Motivation
◼ Problem:
◼ What’s the most efficient way of doing this?
◼ Use a Direct Access Table
◼ So a Direct Access Table is WAAAAAY fast
◼ But what is the obvious, HUUUGE problem???
◼ Let’s say we want to store 1000 students based on SSN
◼ SSN is 9 digits
▪ Assume the largest SSN is 999-99-9999
◼ So we need an array that is 1 BILLION in size
◼ So, yeah, this direct access table is O(1) in speed
◼ But it is O(stupid) in size and memory
▪ HUGE overkill to have an array of 1 billion to store 1000 records
Motivation
◼ We need a better solution!
◼ We want constant add/delete/search time
◼ And a reasonably sized array
◼ What we ideally want:
◼ Let’s say we want to store 1000 students
◼ So ideally, we only want an array of size 1000
▪ So we don’t waste space
◼ But we still want the “direct access” that results in O(1)
lookup time
◼ How can we do this?
▪ Remembering that it was the SIZE of the array that allowed for
direct access in the first place



Motivation
◼ What we ideally want:
◼ This array is size 1000
◼ And we will place students into
this array based on their SSN.
◼ So we need a way of mapping
an SSN to an index
◼ Example:
◼ We want SSN: 527-44-7521 to
somehow refer to index 368.
◼ If we can do that, then we
accomplish our goal

index   SSN            name
0
:       :              :
150     842-33-5821    Andy
:       :              :
368     527-44-7521    Betty
:       :              :
527     452-85-6829    David
:       :              :
884     651-54-3218    Bill
:       :              :
999



Magic Address Calculator
◼ Solution:
◼ Let’s build a make-believe function:
◼ the “magic address calculator”
◼ The input to this function is the “key” (e.g., the SSN)
◼ The function converts this SSN into an index into the
reasonably sized array
◼ Ideally, each SSN will “map” into its own index in the array
◼ So this is still in constant time!
◼ Assuming the “magic address calculator” does the
conversion in constant time …which it does!
◼ And we are using a reasonably sized array!
◼ This is the concept of a hash table.
Terminology
◼ Hash table
◼ An array of table items, where the index is
calculated by a hash function
◼ Searching in a hash table:
◼ Let’s say you are searching for a record with key 4256
◼ To find an item in a hash table, you do NOT follow the
standard protocol of searching the entire table, record by
record, comparing the key you are looking for to the key
in each record.
◼ Rather, we use a hash function on the search key to
quickly calculate the index of the item
▪ The hash function converts the key into the correct index into
the table
Terminology
◼ Hash function
◼ A mathematical calculation that maps the search
key to an index in a hash table
◼ Should be fast to calculate
▪ Time for calculation should be O(1)
◼ Should distribute items evenly

◼ Hashing
◼ A way to access a table (array) in relatively
constant (quick) time
◼ Uses a hash function & collision resolution scheme
Hash Example
◼ UCF System for storing student records
◼ Could store everyone’s records with name,
address, and telephone number using SSN as the
search key
◼ Could use entire SSN, but wastes too much space
▪ Again, SSN’s have 9 digits…that’s 1 BILLION different #’s to
account for
▪ But UCF has only 50,000 students...so in an array of size 1
BILLION, only 50,000 spots will be used
▪ EPIC WASTE!
▪ On a side note, there will be no “collisions”
▪ Each record will have its own, personal spot in the array based
on its key (SSN)



Hash Example
◼ UCF System for storing student records
◼ Could store everyone’s records with name,
address, and telephone number using SSN as the
search key
◼ Better to use last five digits of SSN number
◼ For example, instead of using HashTable [589475127] to
access that record, use HashTable[75127]
◼ Now you need an array of size 100,000
▪ Since we are using 5 digits
▪ The array can go from index 0 to index 99999
◼ So this is still twice the # of UCF students
◼ BUT, much better than an array of size 1 BILLION



Hash Example
◼ UCF System for storing student records
◼ Could store everyone’s records with name,
address, and telephone number using SSN as the
search key
◼ Better to use last five digits of SSN number
◼ However, there is a chance of collisions
▪ SSN # 589475127 and SSN # 428475127 have the same last
five digits
▪ So they will end up “mapping” to the same index in the array
▪ This is called a “collision”
▪ That is CLEARLY a problem.
▪ Can’t store two items in one index of the array
▪ So, we will need to know how to handle collisions
▪ Will discuss in a bit
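
The collision described above is easy to reproduce. The sketch below implements the "last five digits" idea with the remainder operator; the function name hash_last5 is an assumption made for this example.

```c
/* Sketch of the "last five digits" hash. A nine-digit SSN fits in a
 * long, and the remainder after dividing by 100000 is exactly its
 * last five decimal digits. */
long hash_last5(long ssn) {
    return ssn % 100000L;
}
```

Both 589475127 and 428475127 produce 75127, so the two records compete for the same array index.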
Hash Function
◼ A hash function is written h(x)=i
◼ h is the name of the hash function
◼ x is the record search key
◼ Such as the SSN in our example
◼ i is the output of the hash function
◼ which refers to an index in the array (hash table)
◼ Let’s say we are trying to add to a hash table
◼ Once i is calculated, we can then add the record at
HashTable[i]



Hash Function
◼ A hash function is written h(x)=i
◼ In the UCF student example,
h(589475127)=75127
◼ So now we can take the record (name, address,
phone, etc.) of the student with SSN 589475127
◼ and we can store that record at HashTable[75127]
◼ So this mock UCF hash function simply takes an
SSN and keeps the last five digits
◼ Hash functions can be as easy or as difficult as you
want



Example Hash Functions
◼ Three simple hash functions for integers
1. Selecting digits
2. Folding
3. Modulo arithmetic
◼ Again, these are just examples!
◼ Remember the goal here
◼ Given some key (e.g., SSN, student ID, phone #, etc.)
◼ We want to make a “smaller” version of that key
▪ Because when a key is smaller, that means the size of the
array needed can also be smaller
◼ Use this new key to index the record



3 Simple Hash Functions
◼ Selecting digits hash function
◼ Instead of using the whole integer, only select
several digits
◼ For example, if you have the SS#123-45-6789, just
use the first 3 digits
◼ h(123456789)=123
◼ This is like the example we already did
◼ Fast & easy to calculate, but usually does not
distribute keys evenly
◼ The first three digits of a social security number
are based on location, so people from the same state
usually share the same first three digits
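
Selecting the leading digits of a nine-digit key amounts to an integer division. A minimal sketch, with the hypothetical name hash_first3 and assuming the key always has nine digits:

```c
/* Sketch of a digit-selection hash: keeps the first three digits of a
 * nine-digit key by discarding the last six. */
int hash_first3(long ssn) {
    return (int)(ssn / 1000000L);
}
```

Every key that starts with 123 lands on index 123, which illustrates why this choice distributes keys poorly when the leading digits are shared by many people.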



3 Simple Hash Functions
◼ Folding hash function
◼ Add the digits of the integer together
◼ For example, if you have the SS# 123-45-6789, add all
the digits together
◼ h(123456789)=1+2+3+4+5+6+7+8+9=45 with hash
table index range 0 ≤ h(search key) ≤ 81
◼ Can add in different ways for hash tables of
different sizes
◼ h(123456789)=123+456+789=1368 with hash table
index range 0 ≤ h(search key) ≤ 2997
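
Both folding variants can be sketched with a loop that peels digits off the right end of the key. The function names are assumptions for this example, and keys are assumed to be non-negative.

```c
/* Folding, variant 1: add the individual decimal digits of the key. */
int fold_digits(long key) {
    int sum = 0;
    while (key > 0) {
        sum += (int)(key % 10);   /* take the last digit */
        key /= 10;                /* and drop it */
    }
    return sum;
}

/* Folding, variant 2: add groups of three decimal digits. */
int fold_groups_of_3(long key) {
    int sum = 0;
    while (key > 0) {
        sum += (int)(key % 1000); /* take the last three digits */
        key /= 1000;              /* and drop them */
    }
    return sum;
}
```

For the key 123456789 these return 45 and 1368. The largest nine-digit key, 999999999, gives 81 and 2997, the upper ends of the two index ranges.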



3 Simple Hash Functions
◼ Modulo arithmetic hash function
◼ Using modulus as a hash function
◼ h(x) = x mod tableSize
◼ Using a prime number as tableSize reduces
collisions
◼ For tableSize = 31,
h(123456789) = 123456789 mod 31 = 2
with hash table index range 0 ≤ h(search key) ≤ 30
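
In C the modulo hash is a single expression. The sketch below assumes non-negative keys, because the % operator in C can return a negative result for a negative left operand; the names TABLE_SIZE and hash_mod are chosen for this example.

```c
#define TABLE_SIZE 31   /* prime table size from the example */

/* Modulo hash for non-negative keys: result is in 0..TABLE_SIZE-1. */
int hash_mod(long key) {
    return (int)(key % TABLE_SIZE);
}
```

With this definition, hash_mod(123456789) evaluates to 2.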



Hash Functions
◼ Hash functions only need to be designed to
operate on integers
◼ Although objects such as strings can be used as a
search key, they can be easily converted into an
integer value
◼ Then apply hash function to the integer value



Convert String to Integer
◼ Ways to convert a string to an integer
1. Assign A to Z the numbers 0 to 25, and add the
integers together
2. Use the ASCII or Unicode integer value for each
character, and add the integers together
3. Use the binary number for the ASCII or Unicode
integer value for each character, and
concatenate the binary numbers together



Convert String to Integer
◼ Examples of converting a string to an integer
1. “ABC” would be 0 + 1 + 2 = 3
2. “ABC” would be 65 + 66 + 67 = 198
3. “ABC” would be 01000001, 01000010, and
01000011 concatenated: 010000010100001001000011 =
4,276,803
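
The three conversions can be sketched in C as below. The function names are hypothetical. The first function assumes the string contains only the uppercase letters A to Z in an ASCII-compatible character set, and the third silently loses the leading characters once the string is longer than the number of bytes in an unsigned long.

```c
/* Method 1: map A..Z to 0..25 and add the values together. */
unsigned long sum_letter_ranks(const char *s) {
    unsigned long sum = 0;
    for (; *s != '\0'; s++)
        sum += (unsigned long)(*s - 'A');
    return sum;
}

/* Method 2: add the character codes together. */
unsigned long sum_char_codes(const char *s) {
    unsigned long sum = 0;
    for (; *s != '\0'; s++)
        sum += (unsigned char)*s;
    return sum;
}

/* Method 3: concatenate the 8-bit character codes by shifting the
 * running value left 8 bits and placing the next code in the low bits. */
unsigned long concat_char_codes(const char *s) {
    unsigned long v = 0;
    for (; *s != '\0'; s++)
        v = (v << 8) | (unsigned char)*s;
    return v;
}
```

For "ABC" the three functions return 3, 198, and 4276803. Because methods 1 and 2 only add, they give the same result for any reordering of the same letters, such as "CBA".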



Terminology
◼ Perfect hash function
◼ Ideal situation where hash function maps each
search key into a different location in the hash
table
◼ Telephone numbers would all map to different indexes
◼ Collision
◼ When a hash function maps two or more search
keys into the same location in the hash table
◼ h(key1) = h(key2), so have the same index value



Example Collision
◼ Need to store the student records of ICS 211
students based on student ID
◼ Student ID has 8 digits, so need array of size
100,000,000
◼ This is a waste of space, so instead use an
array of size 31, with hash function h(x) = x mod
31
◼ h(12345678)=h(26508090)=21 is an example of
a collision
◼ Both should be stored at table[21]



Collision Resolution
◼ In case of a collision, a collision resolution
scheme must be implemented
◼ Assigns search keys that have the same hash
value to different locations in the hash table
◼ Whenever possible, items should be placed evenly in the
hash table in order to avoid these collisions
◼ Or we use another method called Bucket Hashing
or Separate Chaining



Resolving Collisions
◼ Two main approaches to collision resolution
1. Open addressing
2. Restructure the hash table
❖ Bucket Hashing
❖ Separate Chaining



Open Addressing
◼ Open addressing
◼ Probe (search) for open locations in the hash
table
◼ Probe sequence
◼ The sequence of locations that are examined
for a possible open location to put the next
item



Open Addressing
◼ Three types of probing
1. Linear probing
2. Quadratic probing
3. Double hashing



Open Addressing
◼ Linear probing
◼ In the case of a collision, keep going to the
next hash table location until you find an open
location
◼ In other words, if table[i] is occupied, check
table[i+1], table[i+2], table[i+3], …
◼ Need 3 states for each hash table location:
empty, occupied, deleted
◼ Common problem
◼ Items tend to cluster together in the hash table



Open Addressing
◼ Linear probing example
◼ Table size = 31
◼ Hash function = key mod 31
◼ h(1234) = 25 table[25] = 1234
◼ h(4055) = 25+1 table[26] = 4055
◼ h(3962) = 25+2 table[27] = 3962
◼ h(5853) = 25+3 table[28] = 5853
◼ h(1766) = 30 table[30] = 1766
◼ h(1270) = 30+1 table[0] = 1270 (wraps around)
◼ All other table entries are empty



Open Addressing
◼ Empty, occupied, & deleted states
◼ Assume we delete record #3962
◼ Its slot’s state must be changed to deleted (not
empty), so we can still locate record #5853
◼ h(1234) = 25 table[25] = 1234
◼ h(4055) = 25 table[26] = 4055
◼ delete(3962) table[27] = “deleted”
◼ h(5853) = 25 table[28] = 5853
◼ no record added table[29] = “empty”
◼ h(1766) = 30 table[30] = 1766
◼ h(1270) = 30 table[0] = 1270 (wraps around)
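
A minimal sketch of linear probing with the three slot states follows. The type and function names (Slot, lp_insert, lp_search, lp_delete) are assumptions for illustration, keys are assumed to be non-negative, and insertion does not check for duplicates.

```c
#define TABLE_SIZE 31

typedef enum { EMPTY, OCCUPIED, DELETED } SlotState;

typedef struct {
    int       key;
    SlotState state;
} Slot;

/* Static storage is zero-initialized, and EMPTY is 0, so every slot
 * starts out EMPTY. */
static Slot table[TABLE_SIZE];

static int h(int key) { return key % TABLE_SIZE; }

/* Stores key in the first slot along the probe sequence that is not
 * OCCUPIED. Returns the index used, or -1 if the table is full. */
int lp_insert(int key) {
    for (int step = 0; step < TABLE_SIZE; step++) {
        int i = (h(key) + step) % TABLE_SIZE;   /* wraps around */
        if (table[i].state != OCCUPIED) {
            table[i].key = key;
            table[i].state = OCCUPIED;
            return i;
        }
    }
    return -1;
}

/* Returns the index of key, or -1. Probing continues past DELETED
 * slots and stops at the first EMPTY slot. */
int lp_search(int key) {
    for (int step = 0; step < TABLE_SIZE; step++) {
        int i = (h(key) + step) % TABLE_SIZE;
        if (table[i].state == EMPTY)
            return -1;
        if (table[i].state == OCCUPIED && table[i].key == key)
            return i;
    }
    return -1;
}

/* Marks the slot DELETED rather than EMPTY so that later keys in the
 * same cluster stay reachable. Returns 1 if the key was found. */
int lp_delete(int key) {
    int i = lp_search(key);
    if (i < 0)
        return 0;
    table[i].state = DELETED;
    return 1;
}
```

Inserting 1234, 4055, 3962, 5853, 1766, 1270 in that order fills slots 25, 26, 27, 28, 30, and 0. After deleting 3962, a search for 5853 still succeeds because the probe walks past the DELETED slot 27 instead of stopping there.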



Open Addressing
[Figure: an empty hash table with slots 0 to 9. Insert: 38, 19, 8, 109, 10]
◼ Linear Probing:
after checking spot h(k), try spot h(k)+1,
if that is full, try h(k)+2, then h(k)+3,
etc.
Linear Probing – Clustering

[Figure: linear probing clustering, with cases labeled “no collision”, “collision in small cluster”, and “collision in large cluster”. Source: R. Sedgewick]
Open Addressing
◼ Quadratic probing
◼ Instead of checking the next location
sequentially, check the next location based on
a sequence of squares
◼ In other words, if table[i] is occupied, check
table[i+1^2], table[i+2^2], table[i+3^2], …
◼ Still have clustering (called “secondary clustering”),
but this method is not as problematic as linear
probing



Open Addressing
◼ Quadratic probing example
◼ Table size = 31
◼ Hash function = key mod 31
◼ h(1234) = 25 table[25] = 1234
◼ h(4055) = 25+1^2 table[26] = 4055
◼ h(3962) = 25+2^2 table[29] = 3962
◼ h(5853) = 25+3^2 table[3] = 5853 (wraps around)
◼ h(1766) = 30 table[30] = 1766
◼ h(1270) = 30+1^2 table[0] = 1270 (wraps around)
◼ All other table entries are empty
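
The example above can be reproduced with a short insertion routine. This is a sketch under stated assumptions: the names qp_init and qp_insert are made up, keys are non-negative, and -1 is used as an "empty" marker, which the slides do not specify.

```c
#define TABLE_SIZE 31
#define EMPTY_KEY  (-1)   /* assumed sentinel for an unused slot */

static int table[TABLE_SIZE];

/* Marks every slot as unused. Must be called before the first insert. */
void qp_init(void) {
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY_KEY;
}

/* Probes home, home+1^2, home+2^2, ... (mod TABLE_SIZE).
 * Returns the index used, or -1 if no open slot was found within
 * TABLE_SIZE probes. */
int qp_insert(int key) {
    int home = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (home + i * i) % TABLE_SIZE;
        if (table[idx] == EMPTY_KEY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

After qp_init(), inserting 1234, 4055, 3962, 5853, 1766, 1270 in order places them at indices 25, 26, 29, 3, 30, and 0.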



Quadratic Probing
[Figure: an empty hash table with slots 0 to 9. Insert: 89, 18, 49, 58, 79]
Open Addressing
◼ Double hashing
◼ Use two hash functions, where second hash
function determines the step size to next hash
table index
◼ Some restrictions
◼ h2(searchKey) != 0 (step size should not be zero)
◼ h2 != h1 (avoids clustering)



Quadratic Probing:
Success guarantee for λ < ½
◼ Here λ is the load factor: the fraction of table slots in use.
◼ If size is prime and λ < ½, then quadratic probing
will find an empty slot in size/2 probes or fewer.
◼ show for all $0 \le i, j \le size/2$ and $i \ne j$:
$(h(x) + i^2) \bmod size \ne (h(x) + j^2) \bmod size$
◼ by contradiction: suppose that for some $i \ne j$:
$(h(x) + i^2) \bmod size = (h(x) + j^2) \bmod size$
$\Rightarrow i^2 \bmod size = j^2 \bmod size$
$\Rightarrow (i^2 - j^2) \bmod size = 0$
$\Rightarrow [(i + j)(i - j)] \bmod size = 0$

Because size is prime, (i - j) or (i + j) must be zero mod size, and
neither can be, since 0 < |i - j| < size and 0 < i + j < size.
Quadratic Probing: Properties
◼ For any λ < ½, quadratic probing will find an empty
slot; for bigger λ, quadratic probing may find a slot
◼ Quadratic probing does not suffer from primary
clustering: keys hashing to the same area are not
bad
◼ But what about keys that hash to the same spot?
◼ Secondary Clustering!
Open Addressing
◼ Double hashing example
◼ Table size = 31
◼ Hash function #1 = key mod 31
◼ Hash function #2 = 23 – (key mod 23)
◼ h1(1234) = 25, table[25] = 1234
◼ h1(4055) = 25, h2(4055) = 16, (25+16) mod 31 = 10, table[10] = 4055
◼ h1(3962) = 25, h2(3962) = 17, (25+17) mod 31 = 11, table[11] = 3962
◼ h1(5853) = 25, h2(5853) = 12, (25+12) mod 31 = 6, table[6] = 5853
◼ h1(1766) = 30, table[30] = 1766
◼ h1(1270) = 30, h2(1270) = 18, (30+18) mod 31 = 17, table[17] = 1270
◼ All other table entries are empty



Open Addressing
◼ Double hashing example
◼ h1(key) = key mod 13
◼ h2(key) = 11 – (key mod 11)
◼ If key = 30, probe sequence would be 4, 7, 10, 0, 3,
6, 9, 12, 2, 5, 8, 11, 1 (step 3 each time)
◼ If key = 50, probe sequence would be 11, 3, 8, 0, 5,
10, 2, 7, 12, 4, 9, 1, 6 (step 5 each time)
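
The two probe sequences can be generated mechanically. The sketch below uses the two hash functions from this example; the names M and probe_sequence are assumptions, and keys are assumed to be non-negative.

```c
#define M 13   /* table size from this example */

static int h1(int key) { return key % M; }
static int h2(int key) { return 11 - (key % 11); }   /* step size, 1..11 */

/* Writes the first M probe positions for key into out[0..M-1]:
 * h1, h1 + step, h1 + 2*step, ... (all mod M). */
void probe_sequence(int key, int out[M]) {
    int step = h2(key);
    for (int i = 0; i < M; i++)
        out[i] = (h1(key) + i * step) % M;
}
```

Because 13 is prime and the step size is between 1 and 11, the 13 positions produced for any key are all different, so every slot in the table is visited once.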



Resolving Collisions with Double Hashing
Hash Functions:
H(K) = K mod M
H2(K) = 1 + ((K/M) mod (M-1))
M =

Insert these values into the hash table
in this order. Resolve any collisions
with double hashing:
13, 28, 33, 147, 43

[Figure: an empty hash table with slots 0 to 9]
Open Addressing
◼ If table size is prime, then probe sequence
will visit all table locations
◼ With open addressing, increasing table size
will reduce collisions
◼ When increasing the size, the hash function
needs to be reapplied to every item in the old
hash table to place it in the new hash table



Restructuring the Hash Table
◼ How is a hash table restructured for
collision resolution?
◼ The structure of the hash table is changed so
that the same index location can store multiple
items
◼ Two ways to restructure a hash table for
collision resolution
1. Bucket hashing
2. Separate chaining



Restructuring the Hash Table
◼ Bucket hashing
◼ A hash table that has an array at each location
table[i], so that items of the same hash index
are stored here
◼ Choosing the size of the bucket is problematic
◼ If too small, will have collisions
◼ If too big, will waste space



Restructuring the Hash Table
◼ Bucket hashing example
◼ Table size = 31
◼ Hash function = key mod 31
◼ h(1234) = 25 table[25][0] = 1234
◼ h(4055) = 25 table[25][1] = 4055
◼ h(3962) = 25 table[25][2] = 3962
◼ h(5853) = 25 table[25][3] = 5853
◼ h(1766) = 30 table[30][0] = 1766
◼ h(1270) = 30 table[30][1] = 1270
◼ All other table entries are empty
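
Bucket hashing maps naturally onto a two-dimensional array. In the sketch below the bucket capacity of 4, the -1 "empty" marker, and the function names are all assumptions; the slides only fix the table size and the hash function. Keys are assumed to be non-negative.

```c
#define TABLE_SIZE  31
#define BUCKET_SIZE 4      /* assumed capacity of each bucket */
#define EMPTY_KEY   (-1)   /* assumed sentinel for an unused cell */

static int table[TABLE_SIZE][BUCKET_SIZE];

/* Marks every cell as unused. Must be called before the first insert. */
void bucket_init(void) {
    for (int i = 0; i < TABLE_SIZE; i++)
        for (int j = 0; j < BUCKET_SIZE; j++)
            table[i][j] = EMPTY_KEY;
}

/* Stores key in the first free cell of its bucket. Returns the
 * position within the bucket, or -1 if the bucket is already full. */
int bucket_insert(int key) {
    int b = key % TABLE_SIZE;
    for (int j = 0; j < BUCKET_SIZE; j++) {
        if (table[b][j] == EMPTY_KEY) {
            table[b][j] = key;
            return j;
        }
    }
    return -1;
}
```

With a capacity of 4, the four keys that hash to 25 exactly fill table[25]. A fifth key with the same hash value, such as 56, is rejected, which is the "too small" case named on the previous slide.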



Restructuring the Hash Table
◼ Separate chaining
◼ A hash table that has linked list (a chain) at
each location table[i], so that items of the
same hash index are stored here
◼ Size of each chain is dynamic
◼ Less problematic than static bucket implementation



Restructuring the Hash Table
◼ Separate chaining example
◼ Table size = 31
◼ Hash function = key mod 31
◼ h(1234) = 25, table[25]=>1234
◼ h(4055) = 25, table[25]=>4055=>1234
◼ h(3962) = 25, table[25]=>3962=>4055=>1234
◼ h(5853) = 25, table[25]=>5853=>3962=>4055=>1234
◼ h(1766) = 30, table[30]=>1766
◼ h(1270) = 30, table[30]=>1270=>1766
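
The chains in this example grow at the front: each new key becomes the first node of its list. A minimal sketch of that behavior follows; the Node type and the function names are assumptions, keys are assumed to be non-negative, and nodes are never freed in this short example.

```c
#include <stdlib.h>

#define TABLE_SIZE 31

typedef struct Node {
    int          key;
    struct Node *next;
} Node;

/* Static storage is zero-initialized, so every chain starts empty. */
static Node *table[TABLE_SIZE];

/* Inserts key at the head of its chain. Returns 1 on success,
 * or 0 if memory could not be allocated. */
int chain_insert(int key) {
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    int i = key % TABLE_SIZE;
    n->key = key;
    n->next = table[i];   /* old head becomes the second node */
    table[i] = n;
    return 1;
}

/* Walks the chain for key's hash index. Returns 1 if key is present. */
int chain_search(int key) {
    for (Node *p = table[key % TABLE_SIZE]; p != NULL; p = p->next)
        if (p->key == key)
            return 1;
    return 0;
}
```

Inserting at the head keeps insertion O(1) regardless of chain length; searching still has to walk the chain, so its cost grows with the number of keys that share an index.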



Hash Tables
◼ Summary:
◼ We use a hash table to accomplish O(1) access
time into a table
◼ While keeping the table to a reasonable size
◼ Use a hash function to map the record “keys” into an
index in the hash table
◼ Collisions are bound to happen and are taken care of
using several possible methods
◼ Comparison of Implementations (slowest to
quickest)
◼ Linear probing, quadratic probing, double hashing,
separate chaining