FCE Review (Quiz3)

The document summarizes several computer science data structures and algorithms. It describes heaps, including their definition as a complete binary tree that satisfies the max-heap or min-heap property. It also discusses binary search trees, dynamic programming through the longest common subsequence problem, algorithms for finding medians and order statistics, hashing techniques like hash tables and functions, and the heapsort sorting algorithm.

Fundamentals of Computer Engineering

Zhengnan Li

October 26, 2019

Contents

1 Heap (Data Structure)
  1.1 Definition
  1.2 Properties
  1.3 Operations
  1.4 Algorithms
    1.4.1 Max-Heapify
    1.4.2 Build a heap
    1.4.3 Heap Sort
  1.5 Current Implementations

2 Binary Search Tree (Data Structure)
  2.1 Definition
  2.2 Properties
  2.3 Operations
  2.4 Algorithms
    2.4.1 Insertion

3 Dynamic Programming
  3.1 Elements of Dynamic Programming
  3.2 The Longest Common Subsequence Problem
    3.2.1 Problem
    3.2.2 Recursive, top-down
    3.2.3 Bottom-up, tabulation

4 Medians and Order Statistics
  4.1 Quadratic Worst-case Running Time Algorithm
  4.2 Linear Worst-case Running Time Algorithm

5 Hashing
  5.1 Direct-address tables
  5.2 Hash tables
  5.3 Hash table with chaining
  5.4 Hash functions
  5.5 Open addressing
  5.6 Universal hashing
  5.7 Perfect hashing

1 Heap (Data Structure)

Figure 1: A max-heap viewed as (a) a binary tree and (b) an array A = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]. The number within the circle at each node in the tree is the value stored at that node; the number above a node is the corresponding index in the array. Above and below the array are lines showing parent-child relationships; parents are always to the left of their children. The tree has height three; the node at index 4 (with value 8) has height one.

1.1 Definition

A heap is a specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property: in a max heap, for any given node C, if P is a parent node of C, then the key (the value) of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap (with no parent) is called the root node.

1.2 Properties

• Root of the tree – node i = 1

• Parent of node i – ⌊i/2⌋

• Left child of node i – 2i

• Right child of node i – 2i + 1

• Max (min)-heap – for every node i other than the root node, the value of node i is smaller than (larger than) or equal to the value of its parent, i.e., A[Parent(i)] ≥ A[i] (A[Parent(i)] ≤ A[i]).

On most computers, Left can compute 2i in one instruction by simply shifting the binary representation of i left by one bit position. Similarly, Right can compute 2i + 1 by shifting left and adding a 1 as the low-order bit, and Parent can compute ⌊i/2⌋ by shifting i right one bit position. Good implementations of heapsort often implement these procedures as "macros" or "inline" procedures, as in the sketch below.
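A minimal C++ sketch of these index computations using the bit-shift trick just described (the heap is stored 1-indexed, as in Figure 1; the function names are illustrative):

#include <cstddef>

// 1-indexed binary heap laid out in an array: A[1] is the root.
inline std::size_t parent(std::size_t i) { return i >> 1; }       // floor(i / 2)
inline std::size_t left(std::size_t i)   { return i << 1; }       // 2i
inline std::size_t right(std::size_t i)  { return (i << 1) | 1; } // 2i + 1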
1.3 Operations

• Basic

– find-max (or find-min): find a maximum item of a max-heap, or a minimum item of a min-heap, respectively (a.k.a. peek)
– insert: adding a new key to the heap (a.k.a. push)
– extract-max (or extract-min): returns the node of maximum value from a max heap (or minimum value from a min heap) after removing it from the heap (a.k.a. pop)
– delete-max (or delete-min): removing the root node of a max heap (or min heap), respectively
– replace: pop the root and push a new key. More efficient than a pop followed by a push, since we only need to re-balance once, not twice; appropriate for fixed-size heaps.

• Creation

– create-heap: create an empty heap
– heapify: create a heap out of a given array of elements
– merge (union): joining two heaps to form a valid new heap containing all the elements of both, preserving the original heaps
– meld: joining two heaps to form a valid new heap containing all the elements of both, destroying the original heaps

• Inspection

– size: return the number of items in the heap.


– is-empty: return true if the heap is empty, false otherwise.

• Internal

– increase-key or decrease-key: updating a key within a max- or min-heap, respectively


– delete: delete an arbitrary node (followed by moving last node and sifting to maintain heap)
– sift-up: move a node up in the tree, as long as needed; used to restore heap condition after inser-
tion. Called "sift" because node moves up the tree until it reaches the correct level, as in a sieve.
– sift-down: move a node down in the tree, similar to sift-up; used to restore heap condition after
deletion or replacement.

1.4 Algorithms
1.4.1 Max-Heapify

Algorithm 1 Max-Heapify (A, i)

l = 2i ; // the left child of i
r = 2i + 1 ; // the right child of i
if l ≤ A.heap-size and A[l] > A[i] then ; // find the largest among node i and its children
    largest = l
else
    largest = i
if r ≤ A.heap-size and A[r] > A[largest] then
    largest = r
if largest ≠ i then ; // a child was larger: swap it with node i and recursively fix the subtree rooted at largest
    swap A[i] with A[largest]
    Max-Heapify (A, largest)
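A direct C++ rendering of Algorithm 1, using the 1-indexed layout from the sketch above (heapSize and the function name are assumptions of this note, not a fixed API):

#include <cstddef>
#include <utility>
#include <vector>

// A[1..heapSize] holds the heap; A[0] is unused so the index math stays simple.
void maxHeapify(std::vector<int>& A, std::size_t heapSize, std::size_t i) {
    std::size_t l = 2 * i;           // left child of i
    std::size_t r = 2 * i + 1;       // right child of i
    std::size_t largest = i;
    if (l <= heapSize && A[l] > A[largest]) largest = l;
    if (r <= heapSize && A[r] > A[largest]) largest = r;
    if (largest != i) {              // a child holds a larger key
        std::swap(A[i], A[largest]);
        maxHeapify(A, heapSize, largest);
    }
}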

An example is shown in Fig. 2.

Figure 2: The action of Max-Heapify (A, 2). (a) The initial configuration, with A[2] violating the max-heap property since it is not larger than both children. (b) The property is restored for node 2 by exchanging A[2] with A[4], which destroys the max-heap property for node 4. (c) The recursive call Max-Heapify (A, 4) swaps A[4] with A[9]; node 4 is fixed up, and the further recursive call Max-Heapify (A, 9) yields no change to the data structure.

1.4.2 Build a heap
Algorithm 2 Build-Max-Heap (A)

A.heap-size = A.length
forall i = ⌊A.length/2⌋ downto 1 do
    Max-Heapify (A, i)

An example is shown in Fig. 3. The nodes ⌊A.length/2⌋ + 1, . . . , A.length are all leaves, and hence trivially one-element max-heaps; whenever Max-Heapify is called on a node, the two subtrees of that node are therefore already max-heaps.

The running time of Max-Heapify on a subtree of size n rooted at a given node i is Θ(1) to fix up the relationships among the elements A[i], A[Left(i)], and A[Right(i)], plus the time to run Max-Heapify on a subtree rooted at one of the children of node i (assuming that the recursive call occurs). The children's subtrees each have size at most 2n/3 (the worst case occurs when the bottom level of the tree is exactly half full), so we can describe the running time of Max-Heapify by the recurrence

T(n) ≤ T(2n/3) + Θ(1),

which gives T(n) = O(log n).

1.4.3 Heap Sort

Algorithm 3 Heap-Sort (A)

Build-Max-Heap (A)
forall i ← A.length downto 2 do
    swap A[1] with A[i]
    A.heap-size = A.heap-size − 1
    Max-Heapify (A, 1)

Build-Max-Heap takes O(n) time and each of the n − 1 calls to Max-Heapify takes O(log n), which leads to an overall Heap-Sort complexity of O(n log n).
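A C++ sketch of both procedures, reusing maxHeapify (and the includes) from the earlier sketch; the 1-indexed layout with A[0] unused is an assumption of these notes:

// Builds a max-heap over A[1..n]; the nodes n/2+1..n are already leaves.
void buildMaxHeap(std::vector<int>& A, std::size_t n) {
    for (std::size_t i = n / 2; i >= 1; --i)
        maxHeapify(A, n, i);
}

// Sorts A[1..n] in place in O(n log n) time.
void heapSort(std::vector<int>& A, std::size_t n) {
    buildMaxHeap(A, n);
    for (std::size_t i = n; i >= 2; --i) {
        std::swap(A[1], A[i]);      // move the current maximum to its final place
        maxHeapify(A, i - 1, 1);    // restore the heap on the shrunken prefix
    }
}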

1.5 Current Implementations


The C++ Standard Library provides the make_heap, push_heap and pop_heap algorithms for heaps (usually implemented as binary heaps), which operate on arbitrary random access iterators. It treats the iterators as a reference to an array, and uses the array-to-heap conversion. It also provides the container adaptor priority_queue, which wraps these facilities in a container-like class. However, there is no standard support for the replace, sift-up/sift-down, or decrease/increase-key operations.
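For example (a minimal, self-contained usage sketch):

#include <algorithm>
#include <iostream>
#include <queue>
#include <vector>

int main() {
    std::vector<int> v = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7};

    std::make_heap(v.begin(), v.end());  // array-to-heap conversion, O(n)
    std::pop_heap(v.begin(), v.end());   // moves the maximum to v.back()
    int largest = v.back();              // 16
    v.pop_back();

    v.push_back(11);
    std::push_heap(v.begin(), v.end());  // sift the new key up

    // The container adaptor wraps the same facilities:
    std::priority_queue<int> pq(v.begin(), v.end());
    std::cout << largest << ' ' << pq.top() << '\n';  // prints "16 14"
}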

Figure 3: The operation of Build-Max-Heap (A) on the 10-element input array A = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]. (a) We start with i = ⌊10/2⌋ = 5; the call Max-Heapify (A, 5) changes nothing, since A[5] = 16 already exceeds its only child. (b) At i = 4, A[4] = 2 is smaller than one of its children, so we swap it with A[8] = 14; no further recursive call is needed. (c) At i = 3, we swap A[3] = 3 with A[7] = 10. (d) At i = 2, A[2] = 1 is swapped with A[5] = 16 and then with A[10] = 7. (e) At i = 1, we swap A[1] = 4 with its larger child A[2] = 16, then with A[4] = 14, then with A[9] = 8, leading to the final max-heap in (f). Observe that whenever Max-Heapify is called on a node, the two subtrees of that node are both max-heaps.

2 Binary Search Tree (Data Structure)


2.1 Definition
Binary search trees (BST), sometimes called ordered or sorted binary trees, are a particular type of container:
a data structure that stores "items" (such as numbers, names etc.) in memory. They allow fast lookup, addi-
tion and removal of items, and can be used to implement either dynamic sets of items, or lookup tables that
allow finding an item by its key (e.g., finding the phone number of a person by name).
Binary search trees keep their keys in sorted order, so that lookup and other operations can use the princi-
ple of binary search: when looking for a key in a tree (or a place to insert a new key), they traverse the tree
from root to leaf, making comparisons to keys stored in the nodes of the tree and deciding, on the basis of the
comparison, to continue searching in the left or right subtrees. On average, this means that each comparison
allows the operations to skip about half of the tree, so that each lookup, insertion or deletion takes time pro-
portional to the logarithm of the number of items stored in the tree. This is much better than the linear time
required to find items by key in an (unsorted) array, but slower than the corresponding operations on hash
tables.

2.2 Properties
Let x be a node in a binary search tree. If y is a node in the left subtree of x, then y.key ≤ x.key. If y is a
node in the right subtree of x, then y.key ≥ x.key.

2.3 Operations
• Searching: search an element in the tree. We begin by examining the root node. If the tree is null, the
key we are searching for does not exist in the tree. Otherwise, if the key equals that of the root, the
search is successful and we return the node. If the key is less than that of the root, we search the left
subtree. Similarly, if the key is greater than that of the root, we search the right subtree. This process is
repeated until the key is found or the remaining subtree is null. If the searched key is not found after a
null subtree is reached, then the key is not present in the tree. This process runs in O(h) time on a tree
with height h.

• Insertion: insert an element in the tree. Insertion begins as a search would begin; if the key is not equal to that of the root, we search the left or right subtree as before. Eventually, we reach an external node and add the new key-value pair as its left or right child, depending on the node's key. In other words, we examine the root and recursively insert the new node into the left subtree if its key is less than that of the root, or into the right subtree if its key is greater than or equal to that of the root. This process runs in O(h) time on a tree with height h.

• Minimum (maximum): find the minimum (maximum) element in the tree. This process begins with the root. Because of the binary search tree property, the left (right) child is always smaller (larger) than its parent, so we follow left (right) children until we encounter nil, and return the last element visited as the minimum (maximum). This process takes O(h) running time.

• Successor and predecessor: the successor of a node x is the node with the smallest key greater than
x.key, likewise, the predecessor is the node with the largest key that is smaller than x.key.

2.4 Algorithms
2.4.1 Insertion

Algorithm 4 Tree-Insert (T , z)

if T == nil then
    return a new node with key z
if z < T.key then
    T.left ← Tree-Insert (T.left, z)
else
    T.right ← Tree-Insert (T.right, z)
return T
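A C++ sketch of the corresponding node structure with recursive search and insert (the node layout and function names are illustrative, not from the notes):

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Returns the node containing key, or nullptr if it is absent. O(h) time.
Node* treeSearch(Node* t, int key) {
    if (t == nullptr || t->key == key) return t;
    return key < t->key ? treeSearch(t->left, key) : treeSearch(t->right, key);
}

// Inserts key and returns the (possibly new) root of the subtree. O(h) time.
Node* treeInsert(Node* t, int key) {
    if (t == nullptr) return new Node(key);
    if (key < t->key) t->left = treeInsert(t->left, key);
    else              t->right = treeInsert(t->right, key);  // duplicates go right
    return t;
}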

The Decision Tree/Comparison Model


• All sorting algorithms we have seen so far use only comparisons to gain information about the input.

• We will prove that such algorithms have to perform Ω(n log n) comparisons. To prove the bound, we need a formal model, the Comparison (or Decision Tree) Model:
    – Binary tree where each internal node is labeled ai ≤ aj (ai is the i'th input element)
    – An execution corresponds to a root-leaf path
        ∗ at each internal node a comparison ai ≤ aj is performed, and execution branches on the outcome
    – Each leaf contains the result of the computation

• Decision tree model corresponds to algorithms where only comparisons can be used to gain knowledge
about input.

• Any algorithm has a corresponding decision tree (just ignore everything else but the comparisons made
by the algorithm); and any decision tree corresponds to an algorithm.

• Example: Decision tree for sorting 3 elements.


                          a1 < a2
                 yes /             \ no
              a2 < a3               a1 < a3
             /       \             /       \
       <1,2,3>     a1 < a3    <2,1,3>    a2 < a3
                  /       \             /       \
            <1,3,2>   <3,1,2>     <2,3,1>   <3,2,1>

• The algorithm must be able to sort any possible input (of n elements, for any n). Put differently, for
each input the algorithm must be able to compute the permutation of the input that gives sorted order.

• An execution of the algorithm corresponds to a root-to-leaf path in the tree; it corresponds to identify-
ing the permutation that sorts the input.

• Each leaf of the tree represents a permutation (that corresponds to sorted order for that path).

• Worst case number of comparisons performed corresponds to maximal root-leaf path (=height of tree).

• Therefore lower bound on height ⇒ lower bound on sorting.

7
• If we could prove that any decision tree on n inputs must have height Ω(n lg n) ⇒ any sorting algorithm
(that uses only comparisons) must make at least Ω(n lg n) comparisons in the worst case.

Sorting Lower Bound in the Decision Tree/Comparison Model


Theorem: Any decision tree sorting n elements has height Ω(n log n).
Proof:
• Assume elements are the (distinct) numbers 1 through n

• There must be n! leaves (one for each of the n! permutations of n elements)

• A tree of height h has at most 2^h leaves

• Thus the height h must satisfy 2^h ≥ n!, and we get

h ≥ log(n!)
  = Σ_{i=2}^{n} log i
  = Σ_{i=2}^{n/2−1} log i + Σ_{i=n/2}^{n} log i
  ≥ 0 + Σ_{i=n/2}^{n} log(n/2)
  = (n/2) · log(n/2)
  = Ω(n log n)

From here it follows that: Any sorting algorithm that uses only comparisons takes Ω(n log n) in the worst case.

Counting sort
Input: integers in the range {0, .., N − 1}, for some N ≥ 2.
Counting-sort uses the same idea as bucket sort, except that, instead of creating buckets (with linked lists), it
stores everything in an array. It’s more elegant.

Algorithm 5 B = Counting-Sort (A)


Create an auxiliary array C[0..N − 1]
forall i = 0 to N − 1 do
    C[i] = 0
forall j = 0 to n − 1 do ; // count the occurrences of each value in A[0..n − 1]
    C[A[j]] + +
forall i = 1 to N − 1 do ; // prefix sums: C[i] = number of elements ≤ i
    C[i] = C[i] + C[i − 1]
forall j = n − 1 downto 0 do ; // scan backwards so that the sort is stable
    C[A[j]] − −
    B[C[A[j]]] = A[j]

Analysis: O(n + N ) time and O(N + n) extra space.
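A C++ rendering of Algorithm 5 (0-indexed throughout; a sketch under the stated input assumption that values lie in {0, ..., N − 1}):

#include <vector>

// Sorts A, whose values lie in {0, ..., N - 1}; returns the sorted output B.
std::vector<int> countingSort(const std::vector<int>& A, int N) {
    std::vector<int> C(N, 0);
    std::vector<int> B(A.size());
    for (int x : A) ++C[x];                        // count each value
    for (int i = 1; i < N; ++i) C[i] += C[i - 1];  // C[i] = #elements <= i
    for (int j = static_cast<int>(A.size()) - 1; j >= 0; --j)
        B[--C[A[j]]] = A[j];                       // backwards scan keeps it stable
    return B;
}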

Counting-sort is stable.

A sorting algorithm is called stable if it leaves equal elements in the same order that it
found them.

To understand why stability is important, assume you want to sort a bunch of names by Last name. Let’s say
they were previously sorted by first name. If we use a stable sorting algorithm to sort by Last name, it would
keep all the people with same last name in the order they were in the input, which means that people with
same last name would appear in the order of their first names.

Bucket Sort (Radix Sort)

Algorithm 6 Bucket-Sort (A)


Create an auxiliary array B[0..N − 1] of empty buckets (linked lists)
forall j = 0 to n − 1 do
    Insert A[j] into bucket B[A[j]]
forall i = 0 to N − 1 do
    Traverse B[i], outputting its elements

Input: integers in the range {0, .., N − 1}, for some N ≥ 2.

Analysis: O(n + N ) time and O(N + n) extra space.


How does Θ(n + N ) compare with Θ(n lg n)? Put differently, when is Bucket-sort efficient?

• When N is small. E.g., if N = O(n), then Bucket-sort runs in O(n) time.
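A C++ sketch of Algorithm 6, with the linked-list buckets replaced by vectors for brevity (an assumption of this sketch):

#include <cstddef>
#include <vector>

// Sorts A in place; values are assumed to lie in {0, ..., N - 1}. O(n + N).
void bucketSort(std::vector<int>& A, int N) {
    std::vector<std::vector<int>> B(N);   // one bucket per possible value
    for (int x : A) B[x].push_back(x);    // distribute into buckets
    std::size_t out = 0;
    for (int i = 0; i < N; ++i)           // traverse buckets in increasing order
        for (int x : B[i]) A[out++] = x;
}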

3 Dynamic Programming
3.1 Elements of Dynamic Programming
• Optimal substructure – an optimal solution to the problem contains within it optimal solutions to subproblems. A couple of common patterns for discovering optimal substructure:

    – A solution to the problem consists of making a choice, and making that choice leaves one or more subproblems to be solved
    – Usually, once a problem is given, the choices that can lead to an optimal solution are also apparent
    – Given this choice, one should determine which subproblems ensue and how to best characterize the resulting subproblem space
    – You show that the solutions to the subproblems used within an optimal solution must themselves be optimal by using a cut-and-paste technique (usually proved by contradiction)

• Overlapping subproblems – the space of subproblems must be small in the sense that a recursive algorithm for the problem solves the same subproblems over and over, rather than always generating new subproblems. Typically the total number of distinct subproblems is a polynomial in the input size.

3.2 The Longest Common Subsequence Problem


3.2.1 Problem
Given two sequences X[1 · · · m] and Y[1 · · · n], find a longest subsequence common to both. E.g., X = {A, B, C, B, D, A, B} and Y = {B, D, C, A, B, A}. We denote C[i, j] = |LCS (X[1 · · · i], Y[1 · · · j])|.

C[i, j] = C[i − 1, j − 1] + 1 if X[i] = Y[j]
C[i, j] = max {C[i, j − 1], C[i − 1, j]} otherwise

with the base cases C[i, 0] = C[0, j] = 0.

3.2.2 Recursive, top-down

Algorithm 7 LCS (X, Y, i, j)


if i = 0 or j = 0 then
    return 0
if X[i] = Y[j] then
    C[i, j] = LCS (X, Y, i − 1, j − 1) + 1
else
    C[i, j] = max {LCS (X, Y, i, j − 1), LCS (X, Y, i − 1, j)}
return C[i, j]

3.2.3 Bottom-up, tabulation

Algorithm 8 LCS (X, Y, m, n)


forall i ← 0 to m do
    C[i, 0] ← 0
forall j ← 0 to n do
    C[0, j] ← 0
forall i ← 1 to m do
    forall j ← 1 to n do
        if X[i] = Y[j] then
            C[i, j] ← C[i − 1, j − 1] + 1
            B[i, j] ← “↖”
        else if C[i − 1, j] ≥ C[i, j − 1] then
            C[i, j] ← C[i − 1, j]
            B[i, j] ← “↑”
        else
            C[i, j] ← C[i, j − 1]
            B[i, j] ← “←”
return C, B

Algorithm 9 Print-LCS (B, X, i, j)


if i = 0 or j = 0 then
    return
if B[i, j] = “↖” then
    Print-LCS (B, X, i − 1, j − 1)
    print X[i]
else if B[i, j] = “←” then
    Print-LCS (B, X, i, j − 1)
else
    Print-LCS (B, X, i − 1, j)
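Both algorithms translate directly to C++; a sketch of the bottom-up table plus reconstruction (walking the table itself instead of a separate arrow array B, a simplification of this sketch):

#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns one longest common subsequence of X and Y.
std::string lcs(const std::string& X, const std::string& Y) {
    const std::size_t m = X.size(), n = Y.size();
    std::vector<std::vector<int>> C(m + 1, std::vector<int>(n + 1, 0));
    for (std::size_t i = 1; i <= m; ++i)
        for (std::size_t j = 1; j <= n; ++j)
            C[i][j] = (X[i - 1] == Y[j - 1]) ? C[i - 1][j - 1] + 1
                                             : std::max(C[i - 1][j], C[i][j - 1]);
    // Walk back from C[m][n], mirroring Print-LCS.
    std::string out;
    for (std::size_t i = m, j = n; i > 0 && j > 0; ) {
        if (X[i - 1] == Y[j - 1]) { out.push_back(X[i - 1]); --i; --j; }
        else if (C[i - 1][j] >= C[i][j - 1]) --i;
        else --j;
    }
    std::reverse(out.begin(), out.end());
    return out;
}
// E.g., lcs("ABCBDAB", "BDCABA") returns a length-4 subsequence such as "BCBA".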

4 Medians and Order Statistics
4.1 Quadratic Worst-case Running Time Algorithm
The following algorithm outputs the i-th smallest element in array A. On average the running time is Θ(n)
but the worst case running time is Θ(n2 ).

Algorithm 10 Rand-Select(A, p, r, i)
if p = r then
return A[p]
q ← Rand-Partition(A, p, r)
k ←q−p+1
if i = k then
return A[q]
else if i < k then
return Rand-Select(A, p, q − 1, i)
else
return Rand-Select(A, q + 1, r, i − k)
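A C++ sketch of Rand-Select with a randomized Lomuto partition (Rand-Partition is not spelled out in the notes, so this version is an assumption):

#include <cstdlib>
#include <utility>
#include <vector>

// Lomuto partition around a randomly chosen pivot; returns the pivot's index.
int randPartition(std::vector<int>& A, int p, int r) {
    std::swap(A[p + std::rand() % (r - p + 1)], A[r]);
    int x = A[r], i = p - 1;
    for (int j = p; j < r; ++j)
        if (A[j] <= x) std::swap(A[++i], A[j]);
    std::swap(A[i + 1], A[r]);
    return i + 1;
}

// Returns the i-th smallest (i >= 1) element of A[p..r].
int randSelect(std::vector<int>& A, int p, int r, int i) {
    if (p == r) return A[p];
    int q = randPartition(A, p, r);
    int k = q - p + 1;                   // rank of the pivot within A[p..r]
    if (i == k) return A[q];
    if (i < k)  return randSelect(A, p, q - 1, i);
    return randSelect(A, q + 1, r, i - k);
}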

4.2 Linear Worst-case Running Time Algorithm


We can achieve linear worst-case running time, and the algorithm is as follows.
The SELECT algorithm determines the i-th smallest of an input array of n > 1 distinct elements by executing
the following steps. (If n = 1, then SELECT merely returns its only input value as the i-th smallest.)

1. Divide the n elements of the input array into ⌊n/5⌋ groups of 5 elements each and at most one group made up of the remaining n mod 5 elements.

2. Find the median of each of the ⌈n/5⌉ groups by first insertion-sorting the elements of each group (of which there are at most 5) and then picking the median from the sorted list of group elements.

3. Use SELECT recursively to find the median x of the ⌈n/5⌉ medians found in step 2.

4. Partition the input array around the median-of-medians x using the modified version of PARTITION. Let k be one more than the number of elements on the low side of the partition, so that x is the k-th smallest element and there are n − k elements on the high side of the partition.

5. If i = k, then return x. Otherwise, use SELECT recursively to find the i-th smallest element on the low side if i < k, or the (i − k)-th smallest element on the high side if i > k.
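For intuition (this is the standard analysis, not derived in these notes): at least half of the ⌈n/5⌉ group medians are ≥ x, and each such group contributes at least 3 elements ≥ x (discounting x's own group and the leftover group), so at least 3n/10 − 6 elements are ≥ x; symmetrically for ≤ x. Step 5 therefore recurses on at most 7n/10 + 6 elements, giving the recurrence

T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + O(n),

which solves to T(n) = O(n) by the substitution method.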

5 Hashing
5.1 Direct-address tables
Direct addressing is a simple technique that works well when the universe U of keys is reasonably small. Sup-
pose that an application needs a dynamic set in which each element has a key drawn from the universe U =
{0, 1, · · · , m − 1}, where m is not too large. We shall assume that no two elements have the same key.

5.2 Hash tables


When the set K of keys stored in a dictionary is much smaller than the universe U of all possible keys, a hash table requires much less storage than a direct-address table. Specifically, we can reduce the storage requirement to Θ(|K|) while we maintain the benefit that searching for an element in the hash table still requires only O(1) time. The catch is that this bound is for the average-case time, whereas for direct addressing it holds for the worst-case time.

The downside of direct addressing is obvious: if the universe U is large, storing a table T of size |U| may be impractical, or even impossible, given the memory available on a typical computer. Furthermore, the set K of keys actually stored may be so small relative to U that most of the space allocated for T would be wasted.

Figure 4: A direct-address table. Each key in the universe U = {0, 1, · · · , 9} corresponds to an index in the table T. The set K = {2, 3, 5, 8} of actual keys determines the slots in the table that contain pointers to elements; the other slots contain NIL.

With hashing, an element with key k is stored in slot h(k); that is, we use a hash function h to compute the slot from the key k. Here, h maps the universe U of keys into the slots of a hash table T[0, · · · , m − 1]:

h : U → {0, 1, · · · , m − 1},

where the size m of the hash table is typically much less than |U|. We say that an element with key k hashes to slot h(k), and that h(k) is the hash value of key k. The hash function reduces the range of array indices and hence the size of the array: instead of a size of |U|, the array can have size m. Because |U| > m, two keys may hash to the same slot; we call this situation a collision.

Figure 5: Using a hash function h to map keys to hash-table slots. Because keys k2 and k5 map to the same slot, they collide.

5.3 Hash table with chaining

In chaining, we place all the elements that hash to the same slot into the same linked list: slot j contains a pointer to the head of the list of all stored elements that hash to j, and if there are no such elements, slot j contains NIL. The list can be singly or doubly linked; doubly linked lists make deletion faster.

Figure 6: Collision resolution by chaining. Each hash-table slot T[j] contains a linked list of all the keys whose hash value is j; here h(k1) = h(k4) and h(k5) = h(k7) = h(k2).

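A minimal C++ sketch of a chained hash table, assuming non-negative integer keys and the division-method hash h(k) = k mod m from the next subsection (the class shape is illustrative):

#include <cstddef>
#include <list>
#include <vector>

struct ChainedHashTable {
    std::vector<std::list<int>> T;   // one list (chain) per slot
    explicit ChainedHashTable(std::size_t m) : T(m) {}

    std::size_t h(int k) const { return static_cast<std::size_t>(k) % T.size(); }

    void insert(int k) { T[h(k)].push_front(k); }   // O(1) worst case
    bool contains(int k) const {                    // O(1 + alpha) expected
        for (int x : T[h(k)]) if (x == k) return true;
        return false;
    }
    void erase(int k) { T[h(k)].remove(k); }        // removes every copy of k
};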

5.4 Hash functions
• Division method
h(k) = k mod m
A prime number not too close to an exact power of 2 could be a good choice for m.
• Multiplication method: choose the table size as m = 2^r and compute

h(k) = (A · k mod 2^w) >> (w − r)

where w is the number of bits in a machine word and A is an integer between 2^{w−1} and 2^w, not too close to a power of 2 (so m = 2^r is the number of slots).

Example. m = 2^3, r = 3, w = 7, 2^6 < A < 2^7 ⇒ A = 1011001b, k = 1101011b. Therefore, we have A · k = 10010100110011b, and A · k mod 2^7 = 0110011b; we then right-shift by w − r = 4 bits and finally obtain h(k) = 011b.
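The example maps directly to a shift and a mask; a C++ sketch with the same parameters (w = 7 is unrealistically small and only mirrors the example; a real table would use the machine word size):

#include <cstdint>
#include <iostream>

// Multiplication-method hash: h(k) = (A*k mod 2^w) >> (w - r), with m = 2^r slots.
std::uint32_t mulHash(std::uint32_t k, std::uint32_t A, unsigned w, unsigned r) {
    std::uint32_t mask = (1u << w) - 1;        // taking & mask reduces mod 2^w
    return ((A * k) & mask) >> (w - r);
}

int main() {
    // Parameters from the example: m = 2^3, w = 7, A = 1011001b, k = 1101011b.
    std::cout << mulHash(0b1101011, 0b1011001, 7, 3) << '\n';  // prints 3 (= 011b)
}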
5.5 Open addressing

• Linear probing – the hash function depends on the key k and the probe number i. We keep increasing i until we find an empty slot for the key k:

h(k, i) = (h(k, 0) + i) mod m

• Double hashing:

h(k, i) = (h1(k) + i · h2(k)) mod m

Given an open-address hash table with load factor α = n/m < 1, the expected number of probes in an unsuccessful search is at most 1/(1 − α), assuming uniform hashing.
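A C++ sketch of insertion and search with linear probing (non-negative keys; deletion, which needs tombstones, is omitted from this sketch):

#include <cstddef>
#include <optional>
#include <vector>

// Open addressing with linear probing: h(k, i) = (k mod m + i) mod m.
struct LinearProbingTable {
    std::vector<std::optional<int>> T;   // an empty optional marks a free slot
    explicit LinearProbingTable(std::size_t m) : T(m) {}

    std::size_t h(int k, std::size_t i) const {
        return (static_cast<std::size_t>(k) % T.size() + i) % T.size();
    }
    bool insert(int k) {
        for (std::size_t i = 0; i < T.size(); ++i) {
            std::size_t j = h(k, i);
            if (!T[j]) { T[j] = k; return true; }
        }
        return false;                    // table is full
    }
    bool contains(int k) const {
        for (std::size_t i = 0; i < T.size(); ++i) {
            std::size_t j = h(k, i);
            if (!T[j]) return false;     // an empty slot ends the probe sequence
            if (*T[j] == k) return true;
        }
        return false;
    }
};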
5.6 Universal hashing

We define U as the universe of keys, and let H be a finite collection of hash functions mapping U into {0, · · · , m − 1}. We call H universal if for all x, y ∈ U with x ≠ y, the number of functions in H under which x and y collide is

|{h ∈ H : h(x) = h(y)}| = |H| / m,

i.e., a randomly chosen h ∈ H maps x and y to the same slot with probability exactly 1/m. If we choose h ∈ H at random and use it to map n keys into m slots, the expected number of keys colliding with a given key x is at most n/m.
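The family h(k) = ((ak + b) mod p) mod m used in the perfect-hashing figure below is one classic universal family; a C++ sketch of drawing a random member (the class shape is an assumption of this sketch):

#include <cstdint>
#include <random>

// A random member of the universal family h(k) = ((a*k + b) mod p) mod m,
// where p is a prime larger than any key. Assumes p < 2^32 so a*k fits in 64 bits.
struct UniversalHash {
    std::uint64_t a, b, p, m;
    UniversalHash(std::uint64_t p_, std::uint64_t m_, std::mt19937_64& rng)
        : a(std::uniform_int_distribution<std::uint64_t>(1, p_ - 1)(rng)),
          b(std::uniform_int_distribution<std::uint64_t>(0, p_ - 1)(rng)),
          p(p_), m(m_) {}

    std::uint64_t operator()(std::uint64_t k) const { return ((a * k + b) % p) % m; }
};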

5.7 Perfect hashing
Given n keys, we can construct a static hash table of size m = O(n) in which search takes O(1) time in the worst case – yet another example of trading memory for time.

Figure 7: Using perfect hashing to store the set K = {10, 22, 37, 40, 52, 60, 70, 72, 75}. The outer hash function is h(k) = ((ak + b) mod p) mod m, where a = 3, b = 42, p = 101, and m = 9. For example, h(75) = 2, so key 75 hashes to slot 2 of table T. A secondary hash table Sj stores all keys hashing to slot j. The size of hash table Sj is mj = nj², and the associated hash function is hj(k) = ((aj · k + bj) mod p) mod mj. Since h2(75) = 7, key 75 is stored in slot 7 of secondary hash table S2. No collisions occur in any of the secondary hash tables, so searching takes constant time in the worst case.

We call a hashing technique perfect hashing if O(1) memory accesses are required to perform a search in the worst case. To create a perfect hashing scheme, we use two levels of hashing, with universal hashing at each level; Figure 7 illustrates the approach.

The first level is essentially the same as for hashing with chaining: we hash the n keys into m slots using a hash function h carefully selected from a family of universal hash functions. Instead of making a linked list of the keys hashing to slot j, however, we use a small secondary hash table Sj with an associated hash function hj. By choosing the hash functions hj carefully, we can guarantee that there are no collisions at the secondary level.

To guarantee that there are no collisions at the secondary level, we let the size mj of hash table Sj be the square of the number nj of keys hashing to slot j. Although the quadratic dependence of mj on nj might seem likely to make the overall storage requirement excessive, by choosing the first-level hash function well we can limit the expected total amount of space used to O(n). The hash functions are chosen from universal classes; the first-level hash function uses a prime p greater than any key value.
