Afshine Amidi, Shervine Amidi - Algorithms & Data Structures_ Super Study Guide (2022)
Algorithms &
Data Structures
First edition
We would like to dedicate this book to our beloved grandparents,
Dr. Mohammad Sadegh Azimi and Dr. Atieh Azimi,
who will always stay in our hearts.
Contents
1 Foundations
1.1 Algorithmic concepts
1.1.1 Overview
1.1.2 Types of algorithms
1.1.3 Complexity
1.2 Mathematical concepts
1.2.1 Combinatorics
1.2.2 Mathematical analysis
1.2.3 Bit manipulation
1.3 Classic problems
1.3.1 Traveling salesman
1.3.2 Knapsack
1.3.3 N-Queens
1.3.4 Coin change
2 Data structures
2.1 Arrays and strings
2.1.1 Arrays
2.1.2 Strings
2.2 Stacks and queues
2.2.1 Stacks
2.2.2 Queues
2.3 Hash tables
2.3.1 General concepts
2.3.2 Collisions
2.4 Advanced hash tables
2.4.1 Bloom filters
2.4.2 Count-min sketches
2.5 Linked lists
2.5.1 Singly linked lists
2.5.2 Doubly linked lists
Index
Super Study Guide Foundations
SECTION 1
Foundations
In this first section, we will start with the basic concepts of algorithms, along
with mathematics notions that are used in many problems.
1.1.1 Overview
❒ Algorithm – Given a problem, an algorithm A is a set of well-defined instructions
that runs in a finite amount of time and space. It receives an input I and returns
an output O that satisfies the constraints of the problem.
Suppose we want to return the sum of all of the elements of a given list. An example
of an iterative algorithm would be to sequentially add each element of the list to a
variable, and return its final value.
❒ Recursion – A recursive algorithm uses a function that calls itself. It is composed
of the following components:
• Base case: This is the set of inputs for which the outputs are known.
• Recursive formula: The answer of the current step is based on function calls
relying on previous steps, eventually using the base case answer.
❒ Call stack – In a recursive algorithm, the space used by function calls ci is called
the stack space.
❒ Stack overflow – The problem of stack overflow occurs when a recursive algo-
rithm uses more stack space than the maximum allowed N .
A solution to circumvent this bottleneck is to convert the code from being recursive to being iterative so that it relies on heap memory, which is typically much larger than stack space.
❒ Memoization – Memoization is an optimization technique aimed at speeding up the runtime by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
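As an illustrative sketch of this technique, using the Fibonacci sequence as the expensive computation and Python's built-in functools.lru_cache as the cache:

```python
from functools import lru_cache

def fib(n):
    # naive recursion: recomputes the same values over and over
    return n if n < 2 else fib(n - 1) + fib(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # each value is computed once, then served from the cache
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)
```

With memoization, fib_memo(100) returns instantly, while the uncached version would take an impractical amount of time.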
To illustrate the contrast between brute-force and more refined approaches, let's consider the following problem: given a sorted
array A, we want to return all pairs of elements that sum up to a given number.
• A brute-force approach would try all possible pairs (ai , aj ) and return those
that sum up to that number. This method produces the desired result but
not in minimal time.
• A non-brute-force approach could use the fact that the array is sorted and
scan the array using the two-pointer technique.
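A minimal sketch of the two-pointer scan, under the assumption that pairs_with_sum is a hypothetical helper name and that duplicate handling is ignored for simplicity:

```python
def pairs_with_sum(A, target):
    """Two-pointer scan of a sorted array; returns all pairs summing to target."""
    pairs = []
    i, j = 0, len(A) - 1
    while i < j:
        s = A[i] + A[j]
        if s == target:
            pairs.append((A[i], A[j]))
            i, j = i + 1, j - 1
        elif s < target:
            i += 1          # need a bigger sum: move the left pointer
        else:
            j -= 1          # need a smaller sum: move the right pointer
    return pairs
```

Because the array is sorted, each element is visited at most once, giving O(n) time instead of the O(n^2) brute-force scan.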
❒ Greedy – A greedy algorithm makes the locally optimal choice at each step of the resolution process.
To illustrate this concept, let’s consider the problem of finding the longest path
from a given starting point in a weighted graph. A greedy approach constructs
the final path by iteratively selecting the next edge that has the highest weight.
The resulting solution may miss a longer path that has large edge weights "hidden"
behind a low-weighted edge.
❒ Divide and conquer – A divide and conquer (D&C) algorithm computes the
final result of a problem by recursively dividing it into independent subproblems:
• Solve: Each subproblem Pi is solved independently, producing a result Si .
• Combine: The result of each subproblem is combined to find the final answer.
Algorithms following the D&C principle include sorting algorithms such as merge
sort and quick sort.
❒ Dynamic programming – Dynamic programming computes the solution of a problem by combining the solutions of overlapping subproblems, which can be done in two ways. To illustrate them, consider the computation of the n-th Fibonacci number Fn .
Top-down This approach finds the target value by recursively computing previous values.
• Step 1 : Try computing the desired value Fn and notice that it is based on
previous values.
• Step 2 : Try computing the previous values, which themselves rely on earlier
values, some of which may have already been computed. In this case, we can
use memoization to avoid duplicate operations.
Bottom-up This approach starts from already-known results and iteratively com-
putes succeeding values until it reaches the target value.
• Step 1 : Compute F0 , F1 , F2 , etc. in a predetermined way. These values are
typically stored in an array.
• Step 2 : Deduce Fn .
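The bottom-up approach can be sketched as follows for the Fibonacci numbers:

```python
def fib_bottom_up(n):
    """Bottom-up dynamic programming: fill F[0..n] from the known base cases."""
    if n < 2:
        return n
    F = [0] * (n + 1)
    F[1] = 1
    for i in range(2, n + 1):
        F[i] = F[i - 1] + F[i - 2]   # step 1: compute values in a predetermined order
    return F[n]                      # step 2: read off the target value
```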
1.1.3 Complexity
❒ Definition – The concept of complexity is used to quantify the efficiency of an
algorithm. There are two types of complexities: the time complexity, which quantifies how long an algorithm takes to run, and the space complexity, which quantifies how much memory it uses.
Both measures are usually given as a function of the input size n, although other
parameters can also be used.
❒ Notations – The complexity f of an algorithm can be described using a known
function g with the following notations:
• f = o(g), "little oh of g": ∀ϵ > 0, ∃n0 , ∀n ⩾ n0 , |f (n)| ⩽ ϵ|g(n)|. f is negligible compared to g.
• f = O(g), "big oh of g": ∃c > 0, ∃n0 , ∀n ⩾ n0 , |f (n)| ⩽ c|g(n)|. f is upper-bounded by g.
• f = Θ(g), "theta of g": ∃c1 , c2 > 0, ∃n0 , ∀n ⩾ n0 , c1 |g(n)| ⩽ |f (n)| ⩽ c2 |g(n)|. f is similar to g.
Remark: The big oh notation is frequently used to describe the time and space
complexity of a given algorithm.
❒ Orders of magnitude – The table below highlights the main kinds of runtime
complexities T (n) as a function of the input size n:
O(1) < O(log(n)) < O(n) < O(n log(n)) < O(n^2) < O(2^n) < O(n!)
For instance, in the case d > logb (a) of the master theorem, T (n) = Θ(n^d): the recurrence T (n) = 4T (n/2) + Θ(n^3) gives T (n) = Θ(n^3), since d = 3 > log2 (4) = 2.
❒ Problem complexity – Problems can be divided into classes that quantify how
hard it is to solve them and to verify whether a proposed solution works. The table
below presents two well-known classes:
• P ⊇? NP: It is unclear whether being able to verify a solution in polynomial
time implies that we can solve the problem in polynomial time. The general
consensus is that P ̸⊇ NP (hence P ̸= NP), but this has not been formally
proven yet.
1.2.1 Combinatorics
❒ Factorial – The factorial n! of a given integer n is defined as follows:
n! ≜ n × (n − 1) × ... × 2 × 1
Remark: By convention, 0! = 1.
❒ Binomial coefficient – For given integers 0 ⩽ k ⩽ n, the notation C(n, k), read
"n choose k", is called a binomial coefficient. It is defined as follows:
C(n, k) ≜ n! / (k!(n − k)!)
For x, y and a non-negative integer n, the binomial theorem states:
(x + y)^n = Σ_{k=0}^{n} C(n, k) x^{n−k} y^k
The number of k-permutations of n elements is given by:
P (n, k) = C(n, k) × k! = n! / (n − k)!
Permutations can be enumerated in lexicographic order: for instance, the two permutations of {1, 2} are (1, 2) followed by (2, 1).
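These identities can be checked with Python's built-in math module:

```python
from math import comb, factorial, perm

# binomial coefficient: C(5, 2) = 5! / (2! 3!)
assert comb(5, 2) == factorial(5) // (factorial(2) * factorial(3)) == 10
# k-permutations: P(5, 2) = C(5, 2) x 2!
assert perm(5, 2) == comb(5, 2) * factorial(2) == 20
```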
We note that this way of ordering permutations can be applied to sets containing
any type of ordered elements, such as integers and characters.
❒ Next lexicographic permutation – Given an array A = [a0 , ..., an−1 ] rep-
resenting a permutation of the set of n elements {a0 , ..., an−1 }, the goal of this
problem is to find the next permutation in the lexicographic order.
The illustrated example transforms A = [2, 3, 8, 6, 5, 2] into its next permutation [2, 5, 2, 3, 6, 8]:
• Find the largest index k such that ak < ak+1 (here k = 1, since a1 = 3 < a2 = 8).
• Find the largest index l > k such that al > ak (here l = 4, since a4 = 5 > 3).
• Swap ak and al , giving [2, 5, 8, 6, 3, 2].
• Reverse the suffix starting at index k + 1, giving [2, 5, 2, 3, 6, 8].
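The steps above can be sketched as follows; next_permutation is a hypothetical helper name, and the update is done in place:

```python
def next_permutation(A):
    """In-place next lexicographic permutation; returns False if A is the last one."""
    # step 1: find the largest k with A[k] < A[k + 1]
    k = len(A) - 2
    while k >= 0 and A[k] >= A[k + 1]:
        k -= 1
    if k < 0:
        return False  # A is sorted in decreasing order: no next permutation
    # step 2: find the largest l > k with A[l] > A[k], then swap
    l = len(A) - 1
    while A[l] <= A[k]:
        l -= 1
    A[k], A[l] = A[l], A[k]
    # step 3: reverse the decreasing suffix
    A[k + 1:] = A[k + 1:][::-1]
    return True
```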
❒ Euclidean division – For a dividend a and a non-zero divisor b, there exist a unique quotient q and remainder r such that:
a = bq + r with 0 ⩽ r < |b|
a is called the dividend, b the divisor, q the quotient and r the remainder. This relation is also written a ≡ r [b].
❒ Fibonacci sequence – The Fibonacci sequence is defined by F0 = 0, F1 = 1 and, for n ⩾ 2, Fn = Fn−1 + Fn−2 .
Remark: The first ten numbers of the sequence are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34.
n = Σ_{k=0}^{+∞} nk b^k where nk ∈ {0, ..., b − 1}
n = (...nk ...n3 n2 n1 n0 )b
The most commonly used bases are summarized in the table below:
• Base 2, "binary": possible values nk ∈ {0, 1}; n = 154 is written (10011010)2 ; used in logic.
• Base 10, "decimal": possible values nk ∈ {0, ..., 9}; n = 154 is written (154)10 ; used in day-to-day life.
• Base 16, "hexadecimal": possible values nk ∈ {0, ..., 9, A, ..., F }; n = 154 is written (9A)16 ; used in memory allocation.
In particular, a binary number is written:
n = Σ_{k=0}^{+∞} nk 2^k where nk ∈ {0, 1}
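A quick check of these representations in Python:

```python
n = 154
assert format(n, "b") == "10011010"    # base 2
assert format(n, "x").upper() == "9A"  # base 16
# converting back from either base recovers the same integer
assert int("10011010", 2) == int("9A", 16) == 154
```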
❒ Bit notations – A binary number represented with N bits has its most significant bit (MSB) at position N − 1 and its least significant bit (LSB) at position 0.
❒ Bitwise operators – The truth table of the bitwise operators OR, XOR, AND
between x ∈ {0, 1} and y ∈ {0, 1} is given below:
x  y | x|y (OR)  x^y (XOR)  x&y (AND)
0  0 |    0         0          0
1  0 |    1         1          0
0  1 |    1         1          0
1  1 |    1         0          1
❒ Tricks – The table below shows a few bit-related tricks that are useful in estab-
lishing non-trivial results with minimal effort:
Remark: The first trick stems from the fact that 2^n − 1 = Σ_{k=0}^{n−1} 2^k.
❒ Integer overflow – The problem of integer overflow occurs when the number of
bits needed to encode an integer n exceeds the number of available bits N . More
precisely, unsigned binary numbers encoded on N bits cannot exceed 2^N − 1.
This issue can sometimes be alleviated by changing the order in which operations
are performed. For example, a safe way to average two integers a and b is shown
below:
• Naive: formula (a + b)/2, computed as s1 = a + b, then s2 = s1 /2. The intermediate sum s1 may overflow.
• Safe: formula a + (b − a)/2, computed as s1 = b − a, then s2 = s1 /2, then s3 = a + s2 . No intermediate value exceeds the larger of a and b.
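The difference only shows up with fixed-width integers; the sketch below simulates 32-bit arithmetic in Python, where to_int32 is a hypothetical helper that wraps values the way a 32-bit machine would:

```python
def to_int32(x):
    # keep the low 32 bits and reinterpret as a signed 32-bit integer
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

def naive_avg(a, b):
    # the intermediate sum a + b may exceed 2**31 - 1 and wrap around
    return to_int32(to_int32(a + b) // 2)

def safe_avg(a, b):
    # b - a and a + (b - a) // 2 both stay within the 32-bit range
    return to_int32(a + to_int32(b - a) // 2)
```

For a = 2,000,000,000 and b = 2,100,000,000, the naive formula wraps around to a negative value while the safe formula returns the correct average 2,050,000,000.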
1.3.1 Traveling salesman
The traveling salesman problem aims at finding the shortest cycle that visits each of n cities c1 , ..., cn exactly once. The distance between each pair of cities (ci , cj ) is noted di,j and is known.
A naive approach would consist of enumerating all possible solutions and finding
the one that has the minimum cumulative distance. Given a starting city, there are
(n − 1)! possible paths, so this algorithm would take O(n!) time. This approach is
impractical even for moderately sized inputs.
Suppose c1 is the starting city. This arbitrary choice does not influence the final
result since the resulting path is a cycle. We define the following quantities:
• 𝒞k contains all sets of k distinct cities in {c2 , ..., cn }:
for k ∈ [[0, n − 1]], 𝒞k = { Ck | Ck ⊆ {c2 , ..., cn }, #Ck = k }
We note that 𝒞k has a size of C(n − 1, k).
• g(Ck , cj ) is the distance of the shortest path that starts from c1 , goes through
each city in Ck ∈ Ck exactly once and ends at cj ∈ / Ck .
• Initialization: The shortest path between the starting city c1 and each of the
other cities cj with no intermediary city is directly given by d1,j :
g(C0 = Ø, cj ) = d1,j
• Compute step: Suppose that for all Ck−1 ∈ Ck−1 and ci ̸∈ Ck−1 , we know the
distance of the shortest path g(Ck−1 , ci ) between c1 and ci via the k − 1 cities
in Ck−1 .
Let’s take Ck ∈ Ck . For all ci ∈ Ck , we note that Ck \{ci } ∈ Ck−1 and that
(naturally) ci ̸∈ Ck \{ci }, which means that g(Ck \{ci }, ci ) is known.
We can deduce the distance of the shortest path between c1 and cj ̸∈ Ck via
k cities by taking the minimum value over all second-to-last cities ci ∈ Ck :
∀Ck ∈ 𝒞k , ∀cj ∉ Ck , g(Ck , cj ) = min_{ci ∈ Ck} { g(Ck \{ci }, ci ) + di,j }
Each compute step at level k considers C(n − 1, k) sets, (n − 1 − k) possible end cities and k candidate second-to-last cities, for a time cost of C(n − 1, k) × (n − 1 − k) × k and a space cost of C(n − 1, k) × (n − 1 − k). Summed over k, this yields an overall time complexity of O(n^2 2^n) and a space complexity of O(n 2^n).
Since Cn−1 = {c2 , ..., cn }, the last step of the algorithm gives:
g(Cn−1 , c1 ) = min_{i ∈ [[2,n]]} { g(Cn−1 \{ci }, ci ) + di,1 }
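The full procedure (known as the Held-Karp algorithm) can be sketched as follows; tsp_held_karp is a hypothetical helper name and the distance matrix layout is an assumption:

```python
from itertools import combinations

def tsp_held_karp(d):
    """d: n x n distance matrix; returns the length of the shortest tour
    that starts and ends at city 0 and visits every city exactly once."""
    n = len(d)
    cities = range(1, n)
    # g[(C, j)]: shortest path 0 -> (every city of C exactly once) -> j, j not in C
    g = {(frozenset(), j): d[0][j] for j in cities}
    for k in range(1, n - 1):
        for C in combinations(cities, k):
            Cs = frozenset(C)
            for j in cities:
                if j in Cs:
                    continue
                # best second-to-last city i inside Cs
                g[(Cs, j)] = min(g[(Cs - {i}, i)] + d[i][j] for i in Cs)
    full = frozenset(cities)
    # close the cycle by returning to city 0
    return min(g[(full - {i}, i)] + d[i][0] for i in cities)
```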
1.3.2 Knapsack
The 0/1 knapsack problem is a classic problem where the goal is to maximize the
sum of values of items put in a bag that has a weight limit. Here, "0/1" means that
for each item, we want to know whether we should include it (1) or not (0).
More formally, we have a bag of capacity C and n items where each item i ∈ [[1, n]]
has value vi and weight wi . We want to find a subset of items I ⊆ {1, ..., n} such
that:
I = argmax_{I ⊆ [[1,n]]} Σ_{i ∈ I} vi with Σ_{i ∈ I} wi ⩽ C
A naive approach would consist of trying out every combination of the n items and
taking the one with the maximum value that also satisfies the cumulative weight
constraint. Such an approach has a time complexity of O(2^n).
Luckily, this problem can be solved with a bottom-up dynamic programming ap-
proach that has a time complexity of O(nC) and a space complexity of O(nC).
We note Vi,j the maximum value of a bag of capacity j ∈ [[0, C]] that contains items
among {1, 2, ..., i}. Our objective is to get Vn,C .
The values Vi,j are stored in an (n + 1) × (C + 1) table, where row i ∈ [[0, n]] tracks the allowed items and column j ∈ [[0, C]] tracks the capacity.
• Initialization: V0,j = 0 for all j, since no item is available, and Vi,0 = 0 for all i, since a bag of capacity 0 holds nothing.
• Update step: Vi,j is the better of two hypothetical bags:
– Bag 2: does not contain item i. VB2 is already known: it is the maximum
value of the bag of capacity j containing items among {1, ..., i − 1}:
VB2 = Vi−1,j
– Bag 1: contains item i (only possible when wi ⩽ j). Its value VB1 is vi plus the best value achievable with the remaining capacity j − wi :
VB1 = Vi−1,j−wi + vi
We then set Vi,j = max(VB1 , VB2 ).
• Backtracking step: To recover which items were included, start from position (n, C) and repeatedly compare Vi,j with Vi−1,j :
– Case Vi,j = Vi−1,j : This means that item i was not included. We go to
position (i − 1, j).
– Otherwise, item i was included. We go to position (i − 1, j − wi ).
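The bottom-up table fill described above can be sketched as:

```python
def knapsack(values, weights, C):
    """0/1 knapsack: maximum value of a bag of capacity C."""
    n = len(values)
    # V[i][j]: max value using items among the first i, with capacity j
    V = [[0] * (C + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(C + 1):
            V[i][j] = V[i - 1][j]        # bag 2: item i excluded
            if weights[i - 1] <= j:      # bag 1: item i included, if it fits
                V[i][j] = max(V[i][j],
                              V[i - 1][j - weights[i - 1]] + values[i - 1])
    return V[n][C]
```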
1.3.3 N -Queens
A queen is a chess piece that can move horizontally, vertically or diagonally by any
number of steps.
• Step 1 : Put the 1st queen in the first column of the chessboard.
• Step 2 : Put the 2nd queen in the second column of the chessboard. As long as
the partial configuration is invalid, keep changing its position until conditions
are satisfied.
• Step 3 : If there is nowhere to place the next queen, go back one step and try
the next partial configuration.
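This column-by-column backtracking can be sketched as follows; n_queens is a hypothetical helper returning all valid configurations as lists of row indices:

```python
def n_queens(n):
    """Return all valid placements; rows[c] is the row of the queen in column c."""
    solutions = []

    def place(col, rows):
        if col == n:
            solutions.append(rows[:])  # all n queens placed: record the configuration
            return
        for r in range(n):
            # valid if no earlier queen shares a row or a diagonal
            if all(r != r2 and abs(r - r2) != col - c2
                   for c2, r2 in enumerate(rows)):
                rows.append(r)
                place(col + 1, rows)   # go one step deeper
                rows.pop()             # backtrack: try the next position

    place(0, [])
    return solutions
```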
1.3.4 Coin change
Given an unlimited number of coins of values {c1 , .., ck }, the coin change problem
aims at finding the minimum number of coins that sum up to a given amount S.
By convention, we assume that c1 < ... < ck .
In other words, noting xi the number of coins of value ci used, we want to find the minimum value of x = x1 + ... + xk such that:
S = Σ_{i=1}^{k} xi ci
We note ds the minimum number of coins needed to obtain each amount s ∈ [[0, S]], stored in an array [d0 , ..., dS ], where ds = 0 means that amount s has not been obtained yet.
• Initialization: Set dci = 1 for each coin value ci ⩽ S, since each of these amounts is obtained with a single coin.
• Update step: For each amount s ∈ [[1, S]] in increasing order:
– Case ∃i, ds−ci > 0: We look back at valid values ds−ci and see which
coin ci minimizes the total number of coins needed to obtain amount s:
ds = min_{i ∈ [[1,k]], ds−ci > 0} { ds−ci + 1 }
– Otherwise, ds stays at 0 for now.
At the end of this step, amounts s ∈ {1, .., S} with ds = 0 cannot be obtained
with the given coins.
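The procedure above can be sketched as follows; coin_change is a hypothetical helper name, and it returns 0 when the amount cannot be formed, matching the convention above:

```python
def coin_change(coins, S):
    """Minimum number of coins summing to S; 0 if S cannot be obtained."""
    d = [0] * (S + 1)
    for c in coins:              # initialization: one coin per coin value
        if c <= S:
            d[c] = 1
    for s in range(1, S + 1):    # update step, in increasing order of amounts
        if d[s] == 1:
            continue
        candidates = [d[s - c] + 1 for c in coins if c < s and d[s - c] > 0]
        if candidates:
            d[s] = min(candidates)
    return d[S]
```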
Super Study Guide Data Structures
SECTION 2
Data structures
In this second section, we will go through the most common data structures
that are used in everyday algorithms, such as arrays, strings, queues, stacks, hash
maps, linked lists and more.
2.1 Arrays and strings
In this part, we will learn about arrays and strings along with common tricks such
as Kadane's algorithm and the sliding window trick.
2.1.1 Arrays
❒ Definition – An array A is a data structure that stores an ordered collection of elements a0 , ..., an−1 , each accessible by its index.
Most programming languages start their indices from 0. The resulting arrays are
said to be 0-indexed.
❒ Maximum subarray sum – Given an array A = [a0 , ..., an−1 ], the goal of this problem is to find the maximum subarray sum S^max:
S^max = max_{i1 ⩽ i2} Σ_{i=i1}^{i2} ai
Trick Scan the array and keep track of the following quantities:
• S^current_i : the maximum subarray sum of A up to index i that contains ai .
• S^max_i : the maximum subarray sum of A up to index i.
• Initialization: Set S^current_0 = a0 and S^max_0 = a0 .
• Update step: For each index i ∈ [[1, n − 1]], compute S^current_i by distinguishing
two cases:
– Case S^current_{i−1} < 0: Appending ai to the currently tracked subarray sum
is worse than starting over. We set S^current_i = ai .
– Case S^current_{i−1} ⩾ 0: It is worth continuing on the current subarray and
appending ai to it. We set S^current_i = S^current_{i−1} + ai .
Then, the maximum subarray sum seen so far is updated if a new maximum
is detected:
S^max_i = max(S^max_{i−1}, S^current_i)
• Final step: The answer is S^max = S^max_{n−1}.
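A compact sketch of this scan:

```python
def max_subarray_sum(A):
    """Kadane-style scan: O(n) time, O(1) space."""
    s_current = s_max = A[0]
    for a in A[1:]:
        # start over if the running sum is negative, otherwise extend it
        s_current = a if s_current < 0 else s_current + a
        s_max = max(s_max, s_current)
    return s_max
```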
❒ Merge intervals – The merge intervals problem is a classic problem that aims at
producing an array of non-overlapping intervals based on an array I = [I0 , ..., In−1 ]
of intervals that are potentially overlapping.
We note Ik = [ak , bk ] with ak ⩽ bk for k ∈ [[0, n − 1]] the input intervals, and
I′k = [a′k , b′k ] with a′k ⩽ b′k for k ∈ [[0, m − 1]], m ⩽ n, the non-overlapping output intervals.
The output array I ′ is obtained in O(n log(n)) time and O(n) space:
• Initialization:
– The input array I is sorted with respect to the lower bound ak of each
interval Ik in O(n log(n)) time. Without loss of generality, we renumber
intervals so that a0 ⩽ ... ⩽ an−1 .
– The output array is initialized with I0′ = I0 .
• Update step: For each interval Ik , k ∈ [[1, n − 1]], compare it with the last
interval added to the output, whose upper bound is noted bj :
– Case bk ⩽ bj : Ik does not add any new information since it is entirely
overlapping with the last added interval. We discard Ik .
– Case ak ⩽ bj < bk : Ik partially overlaps with the last added interval. We extend the last added interval by setting its upper bound to bk .
– Case bj < ak : There is no overlap between Ik and the last added interval,
so we add Ik to the final array I ′ at index i + 1.
We repeat the update step for all intervals Ik in a sequential way to obtain the
output array of non-overlapping intervals I ′ .
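A sketch of this loop; merge_intervals is a hypothetical helper name and intervals are represented as [lower, upper] pairs:

```python
def merge_intervals(I):
    """Merge potentially overlapping intervals: O(n log n) time."""
    I = sorted(I)                  # sort by lower bound
    out = [list(I[0])]
    for a, b in I[1:]:
        if b <= out[-1][1]:
            continue               # entirely overlapping: discard
        if a <= out[-1][1]:
            out[-1][1] = b         # partial overlap: extend the last interval
        else:
            out.append([a, b])     # no overlap: append a new interval
    return out
```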
2.1.2 Strings
❒ Definition – A string s is a data structure that can be seen as an array of
characters s0 , ..., sn−1 .
s0 s1 ... sn−1
0 1 ... n−1
Some categories of strings have special properties, among which:
• Palindrome: a string that reads the same forward and backward, e.g. "kayak".
• Anagram: a string that contains the same characters as another string, arranged in a different order, e.g. "listen" and "silent".
❒ Longest substring – Given a string s0 ...sn−1 , the goal of this problem is to find
the length of the longest substring that does not contain repeated characters.
A naive approach would check the constraint on every possible substring using a
nested for-loop in O(n^2) time. However, there is a more efficient approach that
uses the sliding window trick.
Trick Use a left pointer l and a right pointer r to delimit a window which has the
constraint to not have repeated characters. We have the following rules:
• Constraint is not violated: The current solution may have not reached its full
potential yet. The r pointer advances until the constraint is violated.
• Constraint is violated: The repeated character must leave the window. The l pointer advances until the constraint is satisfied again.
• Initialization:
– Initialize an empty hash set that aims at storing the characters that are
part of the current substring.
– Set a global counter c to 0, which keeps track of the maximum substring
length encountered so far.
– Set the left pointer l and right pointer r to 0.
• Update step: For each position r ∈ [[0, n − 1]]:
– If sr is not in the hash set, add it, and update the counter c if the current window length r − l + 1 is a new maximum.
– If sr is already in the hash set, then adding it again would make us have
sr twice in the substring.
Increment l and remove sl from the hash set until sr is not there anymore.
The substring gets trimmed down.
– Increment r by 1.
• Final step: c is the length of the biggest substring without repeated characters.
Remark: The sliding window trick is found in many string problems.
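The sliding window above can be sketched as:

```python
def longest_unique_substring(s):
    """Length of the longest substring without repeated characters: O(n) time."""
    seen = set()   # characters of the current window s[l..r]
    c = l = 0
    for r, ch in enumerate(s):
        while ch in seen:        # trim the window until ch can be added
            seen.remove(s[l])
            l += 1
        seen.add(ch)
        c = max(c, r - l + 1)    # track the best window length seen so far
    return c
```

On the "cutebear" example used above, the longest such substring is "cuteb", of length 5.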
2.2.1 Stacks
❒ Definition – A stack s is a data structure that deals with elements s1 , ..., sn in
a Last In First Out (LIFO) order.
• Push: Insert an element on the top of the stack.
• Pop: Remove the element from the top of the stack and return its value.
As a result, the last element added to the data structure will be the first one
removed.
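In Python, a plain list can serve as a stack:

```python
stack = []
stack.append("s1")  # push
stack.append("s2")  # push
top = stack.pop()   # pop: returns the last element pushed (LIFO)
```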
Remark: Stacks are commonly used for Depth-First Search (DFS) graph traversals.
❒ Operations – The main operations that can be performed on a stack are ex-
plained below:
• Push – O(1): The element is added on the top of the stack.
• Pop – O(1): The element on the top of the stack is removed and returned.
• Search – O(n): Depending on whether the desired value gets found right away, the stack needs to be popped at most n times.
❒ Daily temperatures – Given an array of daily temperatures [t0 , ..., tn−1 ], this classic problem asks, for each day i, how many days one has to wait for a warmer temperature.
In other words, for each day i ∈ [[0, n − 1]], we want to find the number of days di
such that:
di = min_{j > i, tj > ti} (j − i)
A naive approach would compute the answer for each day in isolation. By scanning
up to O(n − i) elements for each day i, this approach leads to an overall time
complexity of O(n^2).
However, a more efficient approach uses stacks and computes the solution in O(n)
time and O(n) space:
• Initialization:
– Initialize an empty stack s.
– Initialize the output array D = [d0 , ..., dn−1 ] with zeros.
We start from day 0. For each day i, we have the following cases:
– If the temperature tk at the top of the stack is lower than the current
temperature ti , pop (k, tk ) from the stack, and write the corresponding
day difference in the final array:
dk = i − k
Repeat the popping procedure until the condition is not satisfied any-
more. Then, push (i, ti ) onto the stack and move on to the next day.
The array D contains the final answer. Entries with zeros correspond to days that
do not have future warmer days.
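The stack-based scan can be sketched as follows; daily_temperatures is a hypothetical helper name:

```python
def daily_temperatures(T):
    """For each day, number of days until a warmer temperature; 0 if none."""
    D = [0] * len(T)
    s = []  # stack of (day index, temperature), temperatures decreasing
    for i, t in enumerate(T):
        while s and s[-1][1] < t:  # current day is warmer than the stack top
            k, _ = s.pop()
            D[k] = i - k           # day k waited i - k days
        s.append((i, t))
    return D
```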
2.2.2 Queues
❒ Definition – A queue q is a data structure that deals with elements q1 , ..., qn
in a First In First Out (FIFO) order, where its tail has the most recently arrived
elements and its head has elements that arrived the earliest.
• Enqueue: Insert an element at the tail of the queue.
• Dequeue: Remove the element from the head of the queue.
As a result, the first element added to the data structure will be the first one to be
removed.
Remark: Queues are commonly used for Breadth-First Search (BFS) graph traver-
sals.
❒ Operations – The main operations that can be performed on a queue are enqueue and dequeue, both running in O(1) time, along with search, which runs in O(n) time.
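In Python, collections.deque provides O(1) operations at both ends:

```python
from collections import deque

q = deque()
q.append("q1")       # enqueue at the tail
q.append("q2")       # enqueue at the tail
front = q.popleft()  # dequeue from the head: FIFO order
```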
2.3 Hash tables
In this part, we will go through the basics of hash tables along with methods to
resolve collisions.
❒ Hash set – A hash set is a data structure that stores an unordered collection of unique elements s1 , s2 , ..., sn in buckets.
A hash function f is used to link an element s to the address f (s) of the bucket
that contains it. f (s) is called the hash value of s.
❒ Hash table – A hash table, also called hash map, is a data structure used to
index large amounts of data with fast key search times. It consists of an unordered
collection of key-value pairs {(k1 , v1 ), ..., (kn , vn )}.
k1 → v1 k2 → v2 ... kn → vn
A hash function f is used to link a key k to the address f (k) of the bucket containing
the associated value v. f (k) is called the hash value of k.
In addition, a hash table must have a resolution method in case two distinct keys have the same hash value, i.e. form a hash collision.
Remark: Basic hash functions include summing the ASCII representation of char-
acters, whereas more sophisticated ones use advanced algebraic properties.
• Insertion – O(1): Given the key k, compute the corresponding bucket f (k) and add the value v there.
• Deletion – O(1): Given the key k, compute the corresponding bucket f (k) and remove the value v from there.
• Lookup – O(1): Given the key k, compute the corresponding bucket f (k) and return the value v stored there.
Remark: It is crucial to have a good hash function to ensure the O(1) runtimes.
2.3.2 Collisions
❒ Definition – A hash collision happens when two different keys have the same
hash value. When this is the case, we need a way to resolve it.
❒ Load factor – The load factor ρ of a hash table is defined based on the total
number of items stored n = #{(ki , vi )} and the number of buckets b as follows:
ρ ≜ n / b
Remark: If there are more items than there are buckets, then we have ρ > 1. In
this situation, the pigeonhole principle guarantees the presence of hash collisions.
❒ Collision resolution – The most common resolution methods can be put into
two categories:
• Open addressing: We look for another free bucket within the hash table using a probing function p:
– Linear probing, p(x) = x: Tries to put the value in the next bucket until there is no more collision.
– Quadratic probing, p(x) = x^2: Tries to put the value in buckets that are further and further away in a quadratic fashion, until there is no more collision.
– Double hashing: Uses a secondary hash function to determine the next bucket to try.
• Closed addressing: We append the value to the existing bucket. This works
best when ρ is close to 1.
The risk of collisions highlights the importance of choosing a hash function that leads
to a uniform distribution of hash values.
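A minimal open-addressing sketch with linear probing; this is illustrative only, as it does no resizing and therefore assumes fewer items than buckets:

```python
class LinearProbingTable:
    """Hash map resolving collisions with linear probing, p(x) = x."""
    def __init__(self, b=8):
        self.buckets = [None] * b  # each slot holds a (key, value) pair or None

    def _probe(self, key):
        i = hash(key) % len(self.buckets)
        # advance to the next bucket until a free slot or the same key is found
        while self.buckets[i] is not None and self.buckets[i][0] != key:
            i = (i + 1) % len(self.buckets)
        return i

    def put(self, key, value):
        self.buckets[self._probe(key)] = (key, value)

    def get(self, key):
        slot = self.buckets[self._probe(key)]
        return slot[1] if slot else None
```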
2.4 Advanced hash tables
2.4.1 Bloom filters
❒ Definition – A bloom filter is a space-efficient probabilistic data structure that checks whether an element x belongs to a set B. A positive answer means that x is possibly in the set, while a negative answer means that x is definitely not in it. It is composed of an array of m bits b0 , b1 , ..., bm−1 along with k hash functions f0 , f1 , ..., fk−1 .
Element bj of the array is set to 1 when an element x is inserted and verifies fi (x) = j
for a given i ∈ [[0, k − 1]], with j ∈ [[0, m − 1]].
The false positive rate ϵ quantifies how unreliable a positive prediction given by the
bloom filter is:
ϵ ≈ (1 − e^{−kn/m})^k
where n is the number of inserted elements. For fixed m and n, the false positive rate is minimized when the number of hash functions is k = (m/n) loge (2).
❒ Operations – An initialized bloom filter has all the bits of its array B set to 0.
Insertion In order to insert an element x to the bloom filter, we follow the steps
below:
Compute the k hash values f0 (x), ..., fk−1 (x) and set the corresponding bits of the array B to 1.
Deletion Deleting an element from the bloom filter is not possible, since removing
associated bits may inadvertently remove other elements.
Search In order to check whether an element x is in the bloom filter, compute its k hash values and inspect the corresponding bits of B:
– All corresponding bits are on: This means that the element x is possibly
in the set. It can also not be, in which case the prediction is a false
positive.
– One of the bits is off : This means that the element is definitely not in
the set, and we are sure of it.
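A sketch of a bloom filter; deriving the k hash functions from salted SHA-256 digests is an illustrative choice, not the only possible one:

```python
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=5):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _hashes(self, x):
        # k hash values, each derived from a differently salted digest
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{x}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def insert(self, x):
        for j in self._hashes(x):
            self.bits[j] = 1

    def __contains__(self, x):
        # all bits on: possibly in the set; any bit off: definitely not
        return all(self.bits[j] for j in self._hashes(x))
```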
2.4.2 Count-min sketches
❒ Definition – A count-min sketch is a probabilistic data structure that approximates the number of occurrences #x of each element x in a stream, using:
• a k × m table of counts ci,j , initialized at 0
• k independent hash functions f0 , ..., fk−1 with uniform distributions over val-
ues between 0 and m − 1
The estimate #x̂ that it returns is an upper bound of the true count: #x̂ ⩾ #x.
Each element ci,j is an integer that corresponds to the number of times an element
x was inserted and verified fi (x) = j for a given i ∈ [[0, k − 1]], with j ∈ [[0, m − 1]].
k and m are hyperparameters that are chosen when initializing the data structure.
The bigger they are, the more accurate the approximation will be, but also the more
space the data structure will take and the more time each operation will take.
Application Suppose we would like to check how many times an element ap-
peared in a large stream of data.
• A naive approach would consist of keeping track of the stream of data and
counting the number of times each element appeared in a large entry table.
The space taken by this approach is as big as the number of distinct elements
seen. This could be very big.
• Now let’s assume we add a count-min sketch that approximately counts each
element. We would keep a fixed-size data structure that could handle large
streams of data and provide reasonable approximations.
Insertion In order to insert an element x, compute its k hash values and increment each corresponding count ci,fi (x) by 1.
Deletion Deleting an element from the count-min sketch is not possible, since
removing associated counts may inadvertently remove other elements.
• Check step: Take the minimum across all the corresponding counts:
#x̂ = min( c0,f0 (x) , ..., ck−1,fk−1 (x) )
This estimate is an upper bound because it might have been inflated by hash colli-
sions.
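A sketch mirroring the bloom filter above, with counts instead of bits; the salted SHA-256 hashing is again an illustrative choice:

```python
import hashlib

class CountMinSketch:
    def __init__(self, k=4, m=256):
        self.k, self.m = k, m
        self.counts = [[0] * m for _ in range(k)]

    def _hashes(self, x):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{x}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def insert(self, x):
        # increment the count in each of the k rows
        for i, j in enumerate(self._hashes(x)):
            self.counts[i][j] += 1

    def estimate(self, x):
        # the minimum is the least inflated of the k counts
        return min(self.counts[i][j] for i, j in enumerate(self._hashes(x)))
```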
2.5 Linked lists
2.5.1 Singly linked lists
❒ Definition – A singly linked list is a data structure composed of nodes, where each node contains a value v along with a pointer next to the next node of the list.
The first node of a singly linked list is called the head. The overall data structure
is represented as follows:
head → (v1 , next) → (v2 , next) → ... → (vn , next) → NULL
❒ Operations – Accessing or searching for an element of a singly linked list takes O(n) time, while inserting or deleting a node takes O(1) time once its position is known.
❒ Floyd’s algorithm – Floyd’s algorithm, also known as the tortoise and hare
algorithm, is able to detect the starting point of a cycle in a linked list.
In the following, we note a the distance between the head of the linked list and the start of the cycle, D the length of the cycle, and d the distance between the start of the cycle and the point where the two pointers pT (tortoise) and pH (hare) first meet.
It finds the start of the cycle using the two-pointer method in 3 steps, with a time
complexity of O(n) and a space complexity of O(1):
The tortoise and the hare both start from the head of the linked list and
respectively travel distances dT and dH until they meet. At the meeting
point, these quantities verify dH = 2dT .
We note ∆1 the difference between the distances traveled by the hare and the
tortoise:
∆1 ≜ dH − dT = dT
Given that the two animals necessarily meet somewhere in the cycle, we can
also write:
dT = a + d + n1 D and dH = a + d + n2 D, where n2 > n1
This means that we can rewrite ∆1 as:
∆1 ≜ dH − dT = (n2 − n1 )D
Hence, we have:
dT = (n2 − n1 )D
• Restart step: Now, we keep the hare at the meeting point and place the
tortoise back to the head of the linked list.
Since the tortoise has moved back by a distance dT , the new distance ∆2
between them is such that:
∆2 ≜ ∆1 + dT = 2dT = 2(n2 − n1 )D
Hence, we have:
∆2 = kD with k ∈ N∗
• Detection step: We make each animal move 1 step at a time, thus keeping the
distance ∆2 between them constant.
Since ∆2 is a multiple of the length D of the cycle, the new meeting point
between the two animals will precisely be the start of the cycle.
Remark: This algorithm can be used to detect cycles and to measure the lengths of portions of linked lists, among other use cases.
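The three steps can be sketched as follows; Node and cycle_start are hypothetical names:

```python
class Node:
    def __init__(self, v):
        self.v, self.next = v, None

def cycle_start(head):
    """Floyd's tortoise and hare: return the node starting the cycle, or None."""
    tortoise = hare = head
    while hare and hare.next:
        tortoise, hare = tortoise.next, hare.next.next
        if tortoise is hare:          # meeting point inside the cycle
            tortoise = head           # restart step: tortoise back to the head
            while tortoise is not hare:
                tortoise, hare = tortoise.next, hare.next
            return tortoise           # detection step: they meet at the cycle start
    return None                       # reached the end: no cycle
```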
❒ Duplicate number – Given an array A = [a0 , ..., an ] of size n + 1 where each
element ai is in the range [[1, n]], suppose there exist exactly two elements ai1 , ai2
such that ai1 = ai2 ≜ adup . All other elements are assumed to be distinct. The
goal of the problem is to find the duplicate value adup .
A naive approach would sort the array and scan the result to find the two consecutive
elements that are equal. This would take O(n log(n)) time.
However, a more efficient approach takes O(n) time and leverages Floyd’s algorithm.
Trick Suppose we represent the array by n + 1 distinct nodes that compose one
or more linked lists. Each element ai of the array A is seen as a node i that points
to node ai .
• Since every element ai is in [[1, n]], no node points to node 0, so node 0 cannot be part of a cycle.
• Both nodes i1 and i2 point to node adup , which makes adup the start of a cycle.
From the two points above, we deduce that the linked list starting from node 0
contains adup at the start of its cycle.
Algorithm The duplicate number is found in O(n) time and O(1) space:
• Initialization: Think of array A in terms of linked list(s) using the trick above.
Note that no extra space is needed. For a given element ai , the next element
can be accessed via aai .
For example, A = [2, 4, 3, 5, 3, 1] is seen as the linked list 0 → 2 → 3 → 5 → 1 → 4 → 3 → ..., which enters a cycle at node 3.
• Compute step: Apply Floyd’s algorithm from node 0, which takes O(n) time
and O(1) space.
• Final step: The start of the cycle is the duplicate number adup . In the example above, the cycle starts at node 3, hence adup = 3.
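The trick can be sketched as follows; find_duplicate is a hypothetical helper name:

```python
def find_duplicate(A):
    """Treat A as a linked list (node i points to node A[i]) and
    find the start of the cycle with Floyd's algorithm: O(n) time, O(1) space."""
    tortoise = hare = 0
    while True:
        tortoise = A[tortoise]        # one step
        hare = A[A[hare]]             # two steps
        if tortoise == hare:
            break
    tortoise = 0                      # restart step
    while tortoise != hare:
        tortoise, hare = A[tortoise], A[hare]
    return tortoise                   # start of the cycle = duplicate value
```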
2.5.2 Doubly linked lists
❒ Definition – A doubly linked list is a data structure composed of nodes, where each node contains:
• a value v
• a pointer prev to the previous node
• a pointer next to the next node
The first and last nodes of a doubly linked list are called the head and tail respec-
tively. The overall data structure is represented as follows:
❒ LRU cache – A Least Recently Used (LRU) cache of capacity C stores elements e1 , ..., eC using two data structures:
• a doubly linked list whose nodes hold the cached elements
• a hash table mapping each element to its node
An LRU (least recently used) cache can be implemented by combining two data structures:
• The doubly linked list provides an ordering of elements with respect to their last use, with the most recently used ones being closer to the head.
• The hash table maps values to nodes and enables O(1) access to existing nodes.
(figure: the hash table gives direct access to the node of element ei in the doubly linked list)
• Update step: Move the node to the head of the list to convey the fact that it is now the most recently used element.
Insertion This operation is achieved in O(1) time and depends on the situation:
• Inserted element not in cache. When a new element e is inserted, we have the
following cases:
– If the cache is below capacity: Insert the new element at the front of the
doubly linked list and add it to the hash table.
– If the cache is at capacity:
∗ Eviction step: Remove the least recently used element, i.e. the tail of the doubly linked list, from both the list and the hash table.
∗ Insertion step: Insert the new element at the front of the doubly linked list and add it to the hash table.
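The access and insertion operations above can be sketched as follows; the class and method names are ours, and we assume a cache that stores raw values (sentinel head/tail nodes simplify the pointer updates):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.nodes = {}                          # hash table: value -> node
        self.head, self.tail = Node(None), Node(None)
        self.head.next, self.tail.prev = self.tail, self.head

    def _remove(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _add_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def access(self, value):
        """Return True if value is cached; move it to the front of the list."""
        if value not in self.nodes:
            return False
        node = self.nodes[value]
        self._remove(node)
        self._add_front(node)
        return True

    def insert(self, value):
        if self.access(value):
            return
        if len(self.nodes) == self.capacity:
            lru = self.tail.prev                 # least recently used element
            self._remove(lru)
            del self.nodes[lru.value]
        node = Node(value)
        self.nodes[value] = node
        self._add_front(node)
```

For example, with a capacity of 2, inserting e1 and e2 then accessing e1 makes e2 the least recently used element, so inserting e3 evicts e2.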
Super Study Guide Graphs and Trees
SECTION 3
In this third section, we will go through the basics of graphs along with useful
graph traversal algorithms. Then, we will focus on both standard trees and
trees that have special properties, such as binary search trees and tries.
3.1 Graphs
In this part, we will study the main notions of graphs and graph traversal algorithms
such as BFS, DFS and topological sorting. We will also learn about the main shortest
path algorithms, such as Dijkstra's, A⋆ , Bellman-Ford, and Floyd-Warshall.
(figure: a directed graph with nodes VA , VB , VC , VD )
❒ Adjacency list – Collection of unordered lists where each entry Vi maps to all the nodes Vj such that Eij exists in the graph, e.g. VA → {VB , VD }, VB → ∅, VC → {VA }, VD → ∅.
❒ Degree – A node can have the following characteristics based on its adjacent
edges:
For directed graphs:
• In-degree: number of connected inbound edges.
• Out-degree: number of connected outbound edges.
Remark: Nodes of even (resp. odd) degree are called even (resp. odd) nodes.
Indeed, every edge links two nodes and is therefore responsible for increasing the sum of
degrees by 2 (one per node). We have the following formula:
∑v∈V deg(v) = 2|E|
Remark: A consequence of the above formula is that there is an even number of odd
nodes.
Acyclic: does not contain any cycle. Cyclic: contains at least one cycle.
❒ Breadth-first search – Breadth-first search (BFS) is a graph traversal algorithm that visits nodes level by level, e.g. VA first, then VB , VC , VD , then VE , VF , VG .
The graph traversal is performed in O(|V | + |E|) time and O(|V |) space:
• Initialization:
– Queue q that keeps track of the next nodes to potentially visit. In the
beginning, the first node is the only element inside.
– Hash set hv that keeps track of visited nodes. It is initially empty.
• Final step: Repeat the update step until the queue gets empty. The set hv
represents the set of visited nodes.
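The traversal above can be sketched as follows; the function name and the adjacency-list example are ours:

```python
from collections import deque

def bfs(graph, start):
    """Visit every node reachable from start, level by level."""
    q = deque([start])   # queue of next nodes to potentially visit
    hv = {start}         # hash set of visited nodes
    while q:
        node = q.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in hv:
                hv.add(neighbor)
                q.append(neighbor)
    return hv

graph = {'VA': ['VB', 'VC', 'VD'], 'VB': ['VE'], 'VC': ['VF'], 'VD': ['VG']}
print(bfs(graph, 'VA'))
```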
❒ Depth-first search – Depth-first search (DFS) is a graph traversal algorithm that explores as deep as possible along each branch before backtracking.
The graph traversal is performed in O(|V | + |E|) time and O(|V |) space:
• Initialization:
– Stack s that keeps track of the next nodes to potentially visit. In the
beginning, the first node is the only element inside.
– Hash set hv that keeps track of visited nodes. It is initially empty.
• Final step: Repeat the update step until the stack gets empty. The set hv
represents the set of visited nodes.
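The iterative, stack-based version above can be sketched as follows; the function name and example graph are ours:

```python
def dfs(graph, start):
    """Visit every node reachable from start, going as deep as possible first."""
    s = [start]          # stack of next nodes to potentially visit
    hv = set()           # hash set of visited nodes
    order = []           # visit order, for illustration
    while s:
        node = s.pop()
        if node in hv:
            continue
        hv.add(node)
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in hv:
                s.append(neighbor)
    return order

graph = {'VA': ['VB', 'VC', 'VD'], 'VB': ['VE'], 'VC': ['VF'], 'VD': ['VG']}
print(dfs(graph, 'VA'))
```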
❒ Graph traversal summary – The table below highlights the main differences
between BFS and DFS:
Breadth-First Search (BFS): explores the graph level by level; implemented iteratively using a queue.
Depth-First Search (DFS): explores the graph depth first; implemented either iteratively using a stack or recursively.
❒ Number of islands – Given a grid of size m × n where 1 represents land and 0 represents water, the goal is to count the number of islands, i.e. groups of adjacent land cells.
Trick Perform a BFS starting from each unvisited land cell of the grid to explore
the associated island and skip cells that have already been visited.
Algorithm The entire grid is explored in O(m × n) time and O(m × n) space as
follows:
• Explore step: For each cell of the grid, we have the following situations:
– Water not visited: Skip this cell as it is not part of any island.
(figure: the scan skips water cells; land cells that have already been visited are marked with X)
– Land not visited: Perform a BFS that starts from that cell and visits all
neighboring land cells. This process uses a temporary queue q to track
nodes to visit, and marks visited nodes by changing their values to X.
(figure: the BFS marks every cell of the current island with X; once the scan completes, all cells of the grid have been visited)
We note that:
• Using an iterative graph traversal algorithm is helpful in preventing stack
overflow in case the exploration of an island would otherwise lead to too many recursive calls.
• It would have been equally possible to make island explorations using DFS.
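The exploration above can be sketched as follows; the function name is ours, and visited cells are marked in place with 'X' as in the figures:

```python
from collections import deque

def count_islands(grid):
    """Count connected groups of 1s, exploring each island with a BFS."""
    m, n = len(grid), len(grid[0])
    count = 0
    for r in range(m):
        for c in range(n):
            if grid[r][c] != 1:
                continue              # water, or land already marked 'X'
            count += 1
            q = deque([(r, c)])
            grid[r][c] = 'X'          # mark visited cells in place
            while q:
                i, j = q.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < m and 0 <= nj < n and grid[ni][nj] == 1:
                        grid[ni][nj] = 'X'
                        q.append((ni, nj))
    return count
```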
❒ Robot room cleaner – The goal of this classic problem is to clean a room
composed of n cells using a robot cleaner that can only perform the 3 following
actions: move forward, turn left, and turn right.
The configuration of the room is not known by the robot but is assumed to be of
finite size. The room has obstacles along the way that the robot needs to avoid. We assume
that the robot initially starts in a cell with no obstacle.
Trick
• Perform a DFS using an exploration strategy that consists of trying all possible
directions and that backtracks if an obstacle is found.
• Rotating to the right can be done by keeping track of the direction (dx , dy )
that the robot is pointing towards.
Rotating to the right maps the direction vector dx x⃗ + dy y⃗ to dy x⃗ − dx y⃗ , i.e. (dx , dy ) ←− (dy , −dx ).
Algorithm The robot cleans the room in O(n) time and O(n) space as follows:
• Initialization: A hash set is used to keep track of the coordinates of the cells visited by the robot.
• Explore step: This is the main step where the robot cleans the cell and moves
to an unvisited cell.
– When it does not have any more new cells to visit, the robot backtracks
by turning right twice, moving, and turning right twice again to point
back in the initial direction.
• Final step: When the robot has cleaned all cells, it backtracks to its original cell.
❒ Topological sorting – A topological sort of a directed acyclic graph is an ordering of its vertices such that for every edge from Vi to Vj , Vi appears before Vj . A given graph can admit more than one valid topological ordering.
(figure: a directed acyclic graph along with two of its valid topological orderings)
The ordering is obtained as follows:
• Initialization: Initialize an empty array that keeps track of the final ordering
of vertices.
• Compute step: Repeat the following procedure until all nodes are visited: pick an unvisited node that has no incoming edge from unvisited nodes, visit it, and append it to the array.
• Final step: When all nodes are visited, the array contains a valid ordering of
vertices.
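A common way to implement this procedure is Kahn's algorithm, which tracks the number of incoming edges of each node; the sketch below and its example graph are ours:

```python
from collections import deque

def topological_sort(graph):
    """Kahn's algorithm: repeatedly output a node with no remaining incoming edge."""
    indegree = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            indegree[w] += 1
    q = deque(v for v in graph if indegree[v] == 0)
    order = []
    while q:
        v = q.popleft()
        order.append(v)
        for w in graph[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                q.append(w)
    # If some nodes were never output, the graph contains a cycle.
    return order if len(order) == len(graph) else None
```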
❒ Weighted graph – A weighted graph is a graph where each edge Eij has an
associated weight wij .
Remark: When the weight is interpreted as a distance, it is often noted di,j .
❒ Dijkstra's algorithm – Dijkstra's algorithm finds the shortest paths from a source node VS to all other nodes of a weighted graph with non-negative edge weights. It finds a solution in O(|E| log(|V |)) time and O(|V |) space:
• Initialization: Set the distance from VS to itself to 0 and all other distances dVS ,j to +∞. A hash table hp that keeps track of the predecessor of each node along its shortest path is initialized with VS → VS .
• Compute step: Repeat the following procedure until all nodes are visited:
– Pick the unvisited node i with the lowest distance dVS ,i and visit it.
– Look at the unvisited nodes j that have an edge coming from the newly-
visited node i.
∗ Case dVS , j > dVS , i + di,j : This means that the distance of the path
from node VS to j via node i is smaller than the current distance.
We update the corresponding distance:
dVS ,j ←− dVS ,i + di,j
• Final step:
– The final distances are those of the shortest paths between VS and each
of the other nodes j ∈ V of the graph.
– The associated shortest paths can be reconstructed using hp .
(figure: final shortest distances from VS : 0, 5, 2, 5, 3, 7 to nodes VS , VA , VB , VC , VD , VE , with the associated predecessors stored in hp )
Remark: Given that choices are based on local optima, Dijkstra’s algorithm does not
guarantee a correct solution when there are negative edge weights. Indeed, if such
an edge were to be found "too late" in the exploration process, the algorithm may
visit a node earlier than desired and produce a suboptimal path.
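The algorithm above can be sketched with a min-heap; the function name and the edge weights of the example graph are ours, chosen to reproduce the final distances of the running example:

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source, using a min-heap keyed by distance."""
    dist = {source: 0}
    visited = set()
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue                  # stale heap entry
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float('inf')):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

graph = {'VS': [('VA', 5), ('VB', 2)], 'VA': [],
         'VB': [('VD', 1), ('VC', 5)], 'VC': [('VE', 2)],
         'VD': [('VC', 2)], 'VE': []}
print(dijkstra(graph, 'VS'))
```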
❒ A⋆ algorithm – The A⋆ algorithm is a variant of Dijkstra's algorithm. To better direct the search towards the target node, it modifies the distance used
to select the next node by using a heuristic hj,VT that approximates the remaining
distance between unvisited nodes j and the target node VT .
The distance used to determine the next node j to visit is given by:
d⋆VS ,j = dVS ,j + hj,VT
❒ Bellman-Ford algorithm – The Bellman-Ford algorithm finds the shortest paths from a source node VS and, contrary to Dijkstra's algorithm, supports negative edge weights. It runs in O(|V ||E|) time and O(|V |) space:
• Initialization: Set the distance from VS to itself to 0 and all other distances dVS ,j to +∞.
• Compute step: Repeat the following procedure until there are no more updates,
which happens after at most |V | − 1 iterations:
For each node i of the graph, look at its neighbors j and update the distance
dVS , j between the source node VS and the node j depending on the following
situations:
– Case dVS , j > dVS , i + di,j : This means that the distance of the path from
node VS to j via node i is smaller than the current distance. We update
the corresponding distance:
dVS ,j ←− dVS ,i + di,j
• Final step:
– The final distances are those of the shortest paths between VS and each
of the other nodes j ∈ V of the graph.
– The associated shortest paths can be reconstructed using hp .
(figure: final shortest distances from VS : 0, 5, 2, 5, 3, 7, identical to those found by Dijkstra's algorithm on this graph)
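The relaxation loop above can be sketched as follows; the function name and example edges are ours, reusing the running graph:

```python
def bellman_ford(edges, nodes, source):
    """Relax every edge up to |V| - 1 times; handles negative edge weights."""
    dist = {v: float('inf') for v in nodes}
    dist[source] = 0
    for _ in range(len(nodes) - 1):
        updated = False
        for i, j, w in edges:
            if dist[i] + w < dist[j]:
                dist[j] = dist[i] + w
                updated = True
        if not updated:              # early exit when no distance changed
            break
    return dist

edges = [('VS', 'VA', 5), ('VS', 'VB', 2), ('VB', 'VD', 1),
         ('VB', 'VC', 5), ('VD', 'VC', 2), ('VC', 'VE', 2)]
nodes = ['VS', 'VA', 'VB', 'VC', 'VD', 'VE']
print(bellman_ford(edges, nodes, 'VS'))
```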
❒ Floyd-Warshall algorithm – The Floyd-Warshall algorithm finds the shortest distances between all pairs of nodes in O(|V |3 ) time and O(|V |2 ) space:
• Initialization: Initialize the distance matrix di,j with 0 on the diagonal, the edge weight wi,j when edge Eij exists, and +∞ otherwise.
• Update step: For each intermediary node k ∈ V , loop through start nodes
i ∈ V and end nodes j ∈ V :
– Case di,j > di,k + dk,j : This means that the distance from node i to j
via node k is smaller than the current distance. We make the following
update:
di,j ←− di,k + dk,j
– Case di,j ⩽ di,k + dk,j : This means that the proposed path does not
improve the current distance. We do not make any updates.
• Final step: The resulting matrix gives the shortest path between each pair of
nodes i and j.
(figure: the final matrix of shortest distances di,j between every pair of nodes)
This algorithm allows for negatively weighted edges but, like the other shortest path algorithms, it does not allow for negative cycles.
Remark: In the same fashion as in Dijkstra’s and Bellman-Ford’s algorithms, we
can reconstruct the resulting shortest paths by keeping track of nodes during updates.
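The triple loop over the intermediary node k can be sketched as follows; the function name and the small example are ours:

```python
def floyd_warshall(nodes, weights):
    """All-pairs shortest distances; weights maps (i, j) to an edge weight."""
    INF = float('inf')
    dist = {(i, j): 0 if i == j else weights.get((i, j), INF)
            for i in nodes for j in nodes}
    # For each intermediary node k, try to improve every pair (i, j).
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    return dist
```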
❒ Shortest path summary – The differences between the main shortest paths
algorithms are summarized below:
Remark: In general, a connected graph can have more than one spanning tree.
❒ Cayley’s formula – A complete undirected graph with N vertices has N N −2
different spanning trees.
❒ Minimum spanning tree – A minimum spanning tree (MST) of a weighted connected undirected graph is a spanning tree whose total edge weight is minimal. For example, the MST of the running example has a total edge weight of 11.
Remark: A graph can have more than one MST.
❒ Prim’s algorithm – Prim’s algorithm aims at finding an MST of a weighted
connected undirected graph.
A solution is found in O(|E| log(|V |)) time and O(|V | + |E|) space with an implementation that uses a min-heap:
• Initialization: Pick an arbitrary starting node and visit it.
• Compute step: Repeat the following procedure until all nodes are visited:
– Pick an unvisited node that is connected to one of the visited nodes by the
edge of smallest weight.
– Visit it.
• Final step: The resulting MST is composed of the edges that were selected
by the algorithm.
❒ Kruskal's algorithm – Kruskal's algorithm also finds an MST of a weighted connected undirected graph, this time by greedily selecting edges:
• Compute step: Repeat the following procedure until all nodes are visited:
– Pick an edge that has the smallest weight such that it does not connect
two already-visited nodes.
• Final step: The resulting MST is composed of the edges that were selected
by the algorithm.
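Prim's algorithm can be sketched with a min-heap of candidate edges; the function name and the example graph (with MST weight 6) are ours:

```python
import heapq

def prim_mst(graph, start):
    """Grow the MST from start, always taking the cheapest edge to an unvisited node."""
    visited = {start}
    heap = [(w, start, v) for v, w in graph[start]]
    heapq.heapify(heap)
    mst, total = [], 0
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue                 # edge would connect two visited nodes
        visited.add(v)
        mst.append((u, v, w))
        total += w
        for nxt, nw in graph[v]:
            if nxt not in visited:
                heapq.heappush(heap, (nw, v, nxt))
    return mst, total

graph = {'A': [('B', 1), ('C', 4)], 'B': [('A', 1), ('C', 2)],
         'C': [('B', 2), ('A', 4), ('D', 3)], 'D': [('C', 3)]}
print(prim_mst(graph, 'A'))
```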
3.2.2 Components
❒ Connected components – In a given undirected graph, a connected component
is a maximal connected subgraph.
❒ Strongly connected components – Kosaraju's algorithm finds the strongly connected components of a directed graph using two passes of DFS:
• Initialization:
– An initially empty hash set hv that keeps track of the visited nodes.
– An initially empty stack s that keeps track of the order in which the
nodes have had all their neighbors visited.
– Once a visited node has all its neighbors visited, push the visited node
into the stack s.
The same process is successively initiated on the remaining unvisited nodes of the
graph until all nodes are visited.
• Compute step: Reset hv , reverse all edges of the graph, and pop nodes from the stack s one by one:
– Node is not visited: Perform a DFS starting from that node. For the nodes that are being
explored:
∗ DFS node is not visited: Add it to hv and add the node to the hash
set identifying the strongly connected component.
∗ DFS node is visited: Skip it.
– Node is visited: Skip it.
(figure: successive DFS passes on the graph reveal its strongly connected components)
3.3 Trees
In this part, we will learn about trees and focus on various types of binary trees,
including heaps and binary search trees. Then, we will cover generalized types of trees,
such as N -ary trees and tries.
❒ Definition – A tree is a connected graph with the following properties:
• Incoming edge:
– There is exactly one node that has no incoming edge, and that node is
called the root.
– Each of the other nodes has exactly one incoming edge.
• Outgoing edge: A node cannot have an outgoing edge that points to itself.
Here are some examples of graphs that are not trees, along with the associated
reason:
❒ Depth – The depth of a given node n, noted depth(n), is the number of edges
from the root r of the tree to node n.
(figure: the root r has depth 0 and each level below increases the depth by 1)
❒ Height – The height of a given node n, noted height(n), is the number of edges
of the longest path from node n to the deepest leaf.
(figure: leaves have height 0 and the height increases by 1 at each level up to the root)
❒ Lowest common ancestor – The lowest common ancestor s ≜ LCA(p,q) of two nodes p and q is the deepest node of the tree that has both p and q as descendants.
Remark: For the purposes of this definition, we may consider a node to be a descendant of itself.
❒ Node distance – The distance d(p,q) between two nodes p and q is the minimum
number of edges between these two nodes. If we note s ≜ LCA(p,q), we have the
following formula:
d(p,q) = d(s,p) + d(s,q) = depth(p) + depth(q) − 2 × depth(s)
(figure: a binary tree with root VA serialized level by level as [VA , VB , VC , VD , VE ])
Remark: There can be more than one way to serialize a tree. The figure above
illustrates a level-by-level approach in the case of binary trees.
❒ Definition – A binary tree is a tree where each node has at most 2 children.
Possible properties of binary trees include:
• Full: each node has either 0 or 2 children.
• Complete: every level is filled, except possibly the last one, which is filled from left to right.
• Perfect: all internal nodes have 2 children and all leaves are at the same depth.
Remark: A perfect tree has exactly 2h+1 − 1 nodes, with h the height of the tree.
❒ Diameter – The diameter of a binary tree is the longest distance between any
of its two nodes.
❒ Main tree traversals – The table below summarizes the 3 main ways to recur-
sively traverse a binary tree:
Remark: Pre-order, in-order and post-order traversals are all variations of DFS.
3.3.3 Heaps
❒ Definition – A heap is a complete binary tree with an additional property that
makes it either a min-heap or a max-heap:
• Min-heap: the value of each node is lower than or equal to those of its children.
• Max-heap: the value of each node is greater than or equal to those of its children.
Since the tree is complete, a heap can be stored compactly as an array, where the node at index i has its children at indices 2i + 1 and 2i + 2.
The following parts will use max-heaps for consistency. However, one could also
use min-heaps to obtain the same results.
❒ Heapify up – Let’s suppose that the last child is potentially not fulfilling the
properties of the max-heap. The heapify up operation, also called bubble up, aims
at finding the correct place for this node.
• Update step: While the node’s parent has a lower value than the node’s, swap
the two.
❒ Heapify down – Let’s suppose that the root is potentially not fulfilling the
properties of the max-heap. The heapify down operation, also called bubble down,
aims at finding the correct place for this node.
• Update step: While the highest-value child of the node has a higher value than
the node’s, swap the two.
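Both operations can be sketched on the array representation of a max-heap; the function names are ours, and the example inputs reproduce the book's heaps rooted at 24 and 2:

```python
def heapify_up(heap, i):
    """Bubble the node at index i up while its parent has a lower value (max-heap)."""
    while i > 0 and heap[(i - 1) // 2] < heap[i]:
        parent = (i - 1) // 2
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent

def heapify_down(heap, i):
    """Bubble the node at index i down while a child has a higher value (max-heap)."""
    n = len(heap)
    while True:
        largest, left, right = i, 2 * i + 1, 2 * i + 2
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest
```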
• Maximum value: Look at the value corresponding to the root of the heap. It
takes O(1) time.
• Any other value: Traverse the tree, given that we have no information as to
where each node is. It takes O(n) time.
Insertion A new element is inserted in O(log(n)) time by adding it as the last node of the heap and moving it to its final position with the heapify up operation.
Removal The root of the heap is removed in O(log(n)) time as follows:
• Swap step: Swap node with last child and remove new last child.
• Heapify step: Move the newly-placed node to its final position depending on
the situation:
❒ k smallest elements – Given array A = [a0 , ..., an−1 ], the goal is to find the k
smallest elements, with k ⩽ n.
Algorithm The solution is found in O(n log(k)) time and O(k) space:
• Initialization:
– Set an empty max-heap that will keep track of the k smallest elements.
– Add the first k elements into the max-heap.
At any given point, we note M the value of the root of the max-heap, i.e. its
maximum value.
• Update step: We need to see whether any of the remaining elements ai∈[[k,n−1]]
is potentially part of the k smallest elements:
– If ai < M : Remove the root of the max-heap and insert ai , so that the heap keeps the k smallest elements seen so far.
– If ai ⩾ M : Skip it.
• Final step: The max-heap contains the k smallest elements of the array.
❒ Definition – A binary search tree (BST) is a binary tree where the value v of each node is such that:
• Its value is greater than all node values in its left subtree.
• Its value is less than all node values in its right subtree.
(figure: an example BST rooted at 12, with left subtree {5, 3, 10} and right subtree {24, 15})
❒ Operations – The main operations that can be performed on a BST each have
a time complexity of O(h) and are explained below:
Search Starting from the root, compare the node value with the value v we want
to search for.
Continue this process recursively until either finding the target node or hitting the
end of the tree, in which case v is not present in the BST.
Insertion Suppose we want to insert a new value v (e.g. 11) into the BST.
In order to do that, we check if there is already a node with the same value in O(h)
time. At the end of the search, there are two possible situations:
• Node is found: There is nothing else to do, since the element that we want to
insert is already in the tree.
• Node is not found: By construction, the last node seen during the search is a
leaf.
– If v is higher than the value of the last node, add it as its right child.
– If v is lower than the value of the last node, add it as its left child.
Deletion After locating the node to delete in O(h) time, we distinguish three cases depending on its number of children:
• 0 children: Remove the node directly.
• 1 child: Replace the node with its only child.
• 2 children: Replace the node with the max of its left subtree, which is also
equal to the in-order predecessor.
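The search and insertion operations above can be sketched as follows; the class and function names are ours:

```python
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def search(node, v):
    """Walk down the tree, going left or right after each comparison; O(h) time."""
    while node is not None and node.value != v:
        node = node.left if v < node.value else node.right
    return node

def insert(node, v):
    """Insert v as a new leaf; returns the (possibly new) subtree root."""
    if node is None:
        return BSTNode(v)
    if v < node.value:
        node.left = insert(node.left, v)
    elif v > node.value:
        node.right = insert(node.right, v)
    return node        # v already present: nothing to do
```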
❒ Definition – An N -ary tree is a tree where each node has at most N children.
❒ Trie – A trie, also called prefix tree, is an N -ary tree that allows for efficient
storing and fast retrieval of words. The path from the root node to another given
node forms a word that is deduced by concatenating the characters along that path.
Each node of a trie contains:
• A character c.
• A hash table h = {(c1 , n1 ), ..., (ck , nk )} gathering the children of that node,
where ci is the child's character and ni its associated node.
• A boolean that indicates whether the word formed by the path leading to that
node is in the dictionary (true) or not (false).
(figure: an example trie storing a small dictionary of words)
Note that even though all characters of a word are present in the correct order, it
does not necessarily mean that the word itself is present in the trie. In order to
ensure that the word is indeed in the trie, an important additional condition is for
the boolean of the last character to be set to true.
The operations that can be done with a trie are described below:
Search The search for a word w is performed as follows:
• Character step: Starting from the root, traverse the trie by checking one
character of w at a time. If any character is missing, it means that the word
is not present.
• Word step: Check whether the boolean of the node corresponding to the last
character is set to true. If it is not, it means that the word is not present.
(figure: searching for the word "bear" character by character)
Insertion The insertion of a word w is performed as follows:
• Character step: Starting from the root, traverse the trie one character of
w at a time. Starting from the first missing character (if applicable), add
corresponding nodes until completing the word.
• Word step: Set the boolean of the node corresponding to the last character to true.
(figure: inserting the word "poet" and marking the word "poem" as present)
Remark: Use cases of a trie include searching for a word starting with a given prefix.
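The node structure and the operations above can be sketched as follows; the class and method names are ours:

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # hash table: character -> child node
        self.is_word = False    # True if the path to this node spells a full word

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for c in word:
            node = node.children.setdefault(c, TrieNode())
        node.is_word = True

    def search(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._walk(prefix) is not None

    def _walk(self, s):
        # Follow the path of characters; return None on the first missing one.
        node = self.root
        for c in s:
            if c not in node.children:
                return None
            node = node.children[c]
        return node
```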
❒ Self-balancing trees – A self-balancing BST rebalances itself after insertions and deletions so that its height stays in O(log(n)), which keeps search, insertion and deletion in O(log(n)) time.
We arbitrarily focus on the three following nodes, which are useful during a tree
rotation:
Left rotation:
• Pivot: the right child of the root becomes the new root.
• Original root: the root becomes the left child of the pivot.
• Moving child: the left child of the pivot becomes the right child of the original root.
Right rotation:
• Pivot: the left child of the root becomes the new root.
• Original root: the root becomes the right child of the pivot.
• Moving child: the right child of the pivot becomes the left child of the original root.
Remark: Performing tree rotations does not change the in-order traversal of the
BST.
❒ Red-black tree – A red-black tree is a self-balancing BST that satisfies the following properties:
• Coloring: Each node is either red or black, the root and the NIL leaves are black, and a red node cannot have a red child.
• Count: All paths from a given node to any of the leaves have the same number
of black nodes, often referred to as the black-height.
(figure: an example red-black tree whose leaves are NIL nodes)
❒ AVL tree – An AVL tree is a self-balancing BST where, for every node, the heights hL and hR of its left subtree TL and right subtree TR differ by at most 1:
|hL − hR | ⩽ 1
❒ Range query – Given an array A = [a0 , ..., an−1 ], a range query qf (A, i, j) is
defined as being an operation f over elements ai , ..., aj .
(figure: range queries between indices i and j on A = [5, 4, 9, 0, 1, 3]: minimum 0, maximum 9, sum 5 + 4 + 9 + 0 + 1 + 3 = 22)
In the case of a range sum query qS , we note that for 0 ⩽ i ⩽ j < n, we have:
qS (A, i, j) = qS (A, 0, j) − qS (A, 0, i − 1)
This means that if we can compute qS (A, 0, i) for all i, then we can handle any
range sum query on array A.
❒ Prefix sum array – A prefix sum array P = [p0 , ..., pn−1 ] is an array that allows
range sum queries on A to be answered in O(1) time. It is built in O(n) time using p0 = a0 and pi = pi−1 + ai .
Access The computation of a range sum query takes O(1) time as follows, with the convention p−1 = 0:
qS (A, i, j) = pj − pi−1
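The construction and access formulas above can be sketched as follows; the function names are ours, and the example reuses A = [5, 4, 9, 0, 1, 3]:

```python
from itertools import accumulate

def prefix_sums(A):
    """Build P with p_i = a_0 + ... + a_i in O(n) time."""
    return list(accumulate(A))

def range_sum(P, i, j):
    """Sum of a_i..a_j in O(1) time: p_j - p_(i-1), with p_(-1) taken as 0."""
    return P[j] - (P[i - 1] if i > 0 else 0)
```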
Update Updating an element ai by x requires updating all subsequent prefix sums, which takes O(n) time:
∀k ∈ [[i, n − 1]], pk ←− pk + x
❒ Binary indexed tree – A binary indexed tree B, also called Fenwick tree, is a
data structure that efficiently handles range sum queries on array A = [a0 , ..., an−1 ].
Updates in A are reflected in B in O(log(n)) time.
Rescaling A prerequisite is to map the input array indices from [[0, n − 1]] to
[[1,n]]. We rewrite A as A′ = [0, a′1 , ..., a′n ] with a′i = ai−1 .
Intuition Elements of the binary indexed tree leverage the binary representations
of their indices i ∈ [[1,n]]. In order to do that, it relies on the concept of lowest set
bit: we note li the position of the lowest set bit of i, which contributes 2li to its value.
For example, 1 = (001)2 , 3 = (011)2 and 5 = (101)2 have li = 0, while 2 = (010)2 has li = 1 and 4 = (100)2 has li = 2.
A given element bi of the binary indexed tree B is said to be responsible for the
range of indices [[i − 2li + 1, i]] and we have:
bi = a′i−2li +1 + ... + a′i
The binary indexed tree can be represented either as an array, where the range of indices each bi is responsible for is indicated by a horizontal line, or as a tree, where ranges are written next to nodes, the parent of bi is bi−2li , and the tree depth is O(log(n)).
• Compute step: For each index i ∈ [[1,n]], the relevant partial sums are propa-
gated through the overlapping ranges of responsible values:
(figure: during construction, partial sums are propagated through the overlapping responsible ranges)
Access The prefix sum S of the first i elements is computed in O(log(n)) time. Starting from S = 0 and j = i, we repeatedly accumulate bj and move to j ←− j − 2lj until j reaches 0:
S ←− S + bj
Update Adding x to element a′i takes O(log(n)) time and starts by updating the responsible value bi :
bi ←− bi + x
• Compute step: We continue the same update process with j ←− i + 2li , and
then by further increments of 2lj until reaching the maximum index of the
array:
bj ←− bj + x
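The access and update walks above can be sketched compactly, using the identity that `i & (-i)` extracts the lowest set bit 2li of i; the class name is ours:

```python
class BIT:
    """1-indexed binary indexed tree supporting point updates and prefix sums."""
    def __init__(self, n):
        self.b = [0] * (n + 1)

    def update(self, i, x):
        # Add x to a'_i, then propagate to every responsible range above.
        while i < len(self.b):
            self.b[i] += x
            i += i & (-i)          # move to the next responsible index
        # (i & (-i) is the value 2^l_i of the lowest set bit of i)

    def prefix_sum(self, i):
        # Sum a'_1 + ... + a'_i by descending through responsible ranges.
        s = 0
        while i > 0:
            s += self.b[i]
            i -= i & (-i)
        return s
```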
❒ Segment tree – A segment tree is a binary tree that handles range queries over an array A = [a0 , ..., an−1 ], where each node stores the result of the query over a given range of indices.
(figure: a segment tree over indices 0 to 4, with root s[[0,4]] and leaves s{0} , ..., s{4} )
Intuition Each node of the segment tree is responsible for a range [[l, r]] of indices
of the original array. In particular, each node s[[l,r]] falls into one of the following
categories:
• Case l ̸= r: It has two children. By noting m = ⌊(l + r)/2⌋:
– Its left child is responsible for indices [[l, m]].
– Its right child is responsible for indices [[m + 1, r]].
97
Super Study Guide Graphs and Trees
(figure: a node s[[l,r]] with children s[[l,m]] and s[[m+1,r]] )
• Case l = r: It is a leaf responsible for the single index l.
The root node is responsible for indices [[0, n − 1]]. The segment tree has a height
of O(log(n)).
For illustrative purposes, we are going to focus on range sum queries so that oper-
ations of the segment tree can be easily compared to those of the data structures we
saw previously.
Construction The segment tree is built recursively in O(n) time, where the value
of each node depends on where it is located:
• Node has no children: Its value is the corresponding value in the original
array:
s{e} = ae
• Node has two children: Its value is the sum of the values of its child nodes:
s[[l,r]] = s[[l,m]] + s[[m+1,r]]
Given an interval I = [[i, j]], we start from the root node of the segment tree and
make recursive calls.
We compare the interval IN = [[l, r]] associated with the current node with I:
• Case IN ⊆ I: The node fully contributes to the query; return its value.
• Case IN ∩ I = ∅: The node is irrelevant to the query; return 0.
• Otherwise: Recurse on the two children and combine their results.
99
Super Study Guide Graphs and Trees
We start at the root node and make recursive calls. We check whether i is in the
interval IN = [[l, r]] associated with the node:
• Case i ∈ IN : The node's value depends on ai : recurse into the child whose range contains i, then recompute the node's value.
• Case i ∈
/ IN : The node does not contain information related to ai .
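A range-sum segment tree following the construction, query and update rules above can be sketched as follows; the class layout (a flat array where node k has children 2k and 2k + 1) is a common implementation choice of ours:

```python
class SegmentTree:
    """Range-sum segment tree: build O(n), query and update O(log n)."""
    def __init__(self, A):
        self.n = len(A)
        self.tree = [0] * (4 * self.n)
        self._build(A, 1, 0, self.n - 1)

    def _build(self, A, node, l, r):
        if l == r:                          # leaf: s{l} = a_l
            self.tree[node] = A[l]
            return
        m = (l + r) // 2
        self._build(A, 2 * node, l, m)
        self._build(A, 2 * node + 1, m + 1, r)
        self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def query(self, i, j, node=1, l=0, r=None):
        if r is None:
            r = self.n - 1
        if j < l or r < i:                  # node range disjoint from query
            return 0
        if i <= l and r <= j:               # node range fully inside query
            return self.tree[node]
        m = (l + r) // 2
        return (self.query(i, j, 2 * node, l, m) +
                self.query(i, j, 2 * node + 1, m + 1, r))

    def update(self, i, x, node=1, l=0, r=None):
        if r is None:
            r = self.n - 1
        if l == r:                          # leaf holding a_i
            self.tree[node] = x
            return
        m = (l + r) // 2
        if i <= m:
            self.update(i, x, 2 * node, l, m)
        else:
            self.update(i, x, 2 * node + 1, m + 1, r)
        self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
```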
Super Study Guide Sorting and Search
SECTION 4
In this last section, we will go through the most common sorting algorithms,
along with useful search techniques that are used to solve many problems.
In this part, we will learn about common sorting algorithms such as bubble sort,
insertion sort, selection sort, merge sort, heap sort and quick sort. We will also
study sorting algorithms such as counting sort and radix sort that are particularly
efficient in some special cases.
In this part, arrays of n elements are visually represented as histograms. The height
of each bar represents the value of the associated element in the array.
❒ Sorting algorithm – A sorting algorithm takes an unsorted array A = [a0 , ..., an−1 ]
as input and returns a sorted array A′ = [a′0 , ..., a′n−1 ] as output. A′ is a permutation
of A such that:
a′0 ⩽ a′1 ⩽ ... ⩽ a′n−1
A A′
❒ Stability – A sorting algorithm is said to be stable if it preserves the relative order of equal elements in the sorted output, and unstable otherwise.
Remark: Examples of stable sorting algorithms include merge sort and insertion
sort.
The next sections will go through each sorting algorithm in more detail.
❒ Bubble sort – Bubble sort is a stable sorting algorithm that has a time com-
plexity of O(n2 ) and a space complexity of O(1).
Intuition Compare consecutive elements and swap them if they are not in the
correct order.
Algorithm
• Compute step: Starting from the beginning of the array, compare the element
al at position i with the element ar at position i + 1: if al > ar , swap the two
elements. Move on to the next position until reaching the end of the array.
• Repeat step: Repeat the compute step until no swap can be done.
We note that:
• At the end of the k th pass, the last k elements of the array are guaranteed to
be in their final positions.
– If the input array is already sorted, then the algorithm finishes in 1 pass.
– If the input array is reverse sorted, then the algorithm finishes in n − 1
passes.
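The passes above can be sketched as follows; the function name is ours, and the early exit implements the 1-pass behavior on an already-sorted input:

```python
def bubble_sort(A):
    """Repeatedly swap out-of-order neighbors until a full pass makes no swap."""
    n = len(A)
    for k in range(n - 1):
        swapped = False
        for i in range(n - 1 - k):   # the last k elements are already in place
            if A[i] > A[i + 1]:
                A[i], A[i + 1] = A[i + 1], A[i]
                swapped = True
        if not swapped:
            break                    # no swap done: the array is sorted
    return A
```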
❒ Insertion sort – Insertion sort is a stable sorting algorithm that has a time
complexity of O(n2 ) and a space complexity of O(1).
Intuition Grow a sorted subarray at the beginning of the array by inserting each
new element at its correct position.
Algorithm
• Compute step: Starting from i = 1, insert the element at position i into the
sorted subarray of size i at its correct position, shifting larger elements to the right.
• Repeat step: Repeat the compute step until the sorted subarray reaches a size
of n.
❒ Selection sort – Selection sort is a sorting algorithm, unstable in its usual swap-based implementation, that has a time
complexity of O(n2 ) and a space complexity of O(1).
104
Super Study Guide Sorting and Search
Algorithm
• Initialization: The sorted subarray at the beginning of the array is initially empty.
• Compute step: Starting from i = 1, we want to insert a new element into the
sorted subarray of size i.
– Find the minimum value amin among the remaining elements at positions
i, ..., n − 1 of the array. By construction, amin is greater than or equal to all
elements of the current sorted subarray.
– Swap amin with the element at position i of the array.
• Repeat step: Repeat the compute step until the sorted subarray reaches a size
of n.
Remark: Insertion and selection sort are very similar in that they build the sorted
array from scratch and add elements one at a time.
❒ Cycle sort – Cycle sort is an unstable sorting algorithm that has a time com-
plexity of O(n2 ) and a space complexity of O(1).
Intuition Determine the index of the final position of each element in the sorted
array.
Algorithm
• Compute step: For each element ai , count the number Ni of elements strictly smaller than it:
Ni = #{k : ak < ai }
The final position of ai is at index Ni , since we know that there are Ni smaller
elements than ai in the final sorted array. Move ai there and apply the same logic to the element previously at that position.
Keep moving elements using this logic until getting back to position i.
• Repeat step: Repeat the compute step until reaching the end of the array. We
can see cycles being formed from the way elements are moved.
Remark: This algorithm sorts the array with a minimum amount of rewrites since
it only moves elements to their final positions.
❒ Basic sorting algorithms summary – The table below summarizes the main
basic sorting algorithms:
Bubble sort: best O(n), average O(n2 ), worst O(n2 ), space O(1), stable.
❒ Merge sort – Merge sort is a stable sorting algorithm that has a time complexity of O(n log(n)) and a space complexity of O(n).
Algorithm
• Divide step: Divide the array into as many subarrays as there are elements.
Each resulting subarray has only one element and can be considered as sorted.
• Conquer step: For each pair of sorted subarrays, build a sorted array by
merging them using the two-pointer technique.
Repeat the process until all subarrays are merged into one final sorted array.
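The divide and conquer steps above can be sketched in Python (here the division happens implicitly through recursion):

```python
def merge_sort(arr):
    """Return a new sorted list built by divide and conquer."""
    if len(arr) <= 1:
        return list(arr)  # a one-element array is already sorted
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    # Conquer step: merge the two sorted halves with two pointers.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```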
❒ Heap sort – Heap sort is an unstable sorting algorithm that has a time com-
plexity of O(n log(n)) and a space complexity of O(1).
Algorithm
• Initialization: Build a max-heap from the unsorted array in O(n) time. This
is done by recursively swapping each parent with its child of highest value.
• Compute step: Swap the root of the max-heap (its largest element) with the
last element of the heap, shrink the heap by one, and sift the new root down
to restore the heap property. The element swapped out joins the sorted
subarray that grows from the end of the array.
• Repeat step: Repeat the compute step until the sorted subarray reaches a size
of n.
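A minimal in-place Python sketch, with an explicit sift-down helper (our own naming):

```python
def heap_sort(arr):
    """Sort arr in place using a max-heap."""
    n = len(arr)

    def sift_down(root, size):
        # Swap root with its largest child until the heap property holds.
        while True:
            largest = root
            for child in (2 * root + 1, 2 * root + 2):
                if child < size and arr[child] > arr[largest]:
                    largest = child
            if largest == root:
                return
            arr[root], arr[largest] = arr[largest], arr[root]
            root = largest

    # Initialization: build a max-heap in O(n).
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    # Repeatedly move the maximum to the end and restore the heap.
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(0, end)
    return arr
```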
❒ Quick sort – Quick sort is an unstable sorting algorithm that has a time com-
plexity of O(n²) and a space complexity of O(n).
Intuition Recursively choose an element of the array to be the pivot, and put
elements smaller than or equal to the pivot to its left and bigger elements to its
right.
Algorithm
• Compute step: Choose a pivot, then partition the array using two pointers l
and r so that elements smaller than or equal to the pivot end up on the left
side.
– Pivot inclusion step: The final position of the left pointer l is such that
all elements to its left are smaller than or equal to the pivot. We swap
the pivot with the element at position l.
At the end of this step, the pivot is at its correct and final position.
• Recursion step: The compute step is run recursively on the resulting subarrays
on the left and right of the pivot. They are then merged back together to form
the final sorted array.
We note that the runtime of the algorithm is sensitive to the choice of the pivot and
the patterns found in the input array. In general, a time complexity of O(n log(n))
and a space complexity of O(log(n)) can be obtained when the pivot is a good
approximation of the median of the array.
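The procedure can be sketched in Python; picking the pivot at random (our choice) makes it a good approximation of the median on average, and we use a Lomuto-style partition with a single left pointer l:

```python
import random

def quick_sort(arr, lo=0, hi=None):
    """Sort arr in place with a randomly chosen pivot."""
    if hi is None:
        hi = len(arr) - 1
    if lo >= hi:
        return arr
    # Choosing a random pivot makes the O(n^2) worst case unlikely.
    p = random.randint(lo, hi)
    arr[p], arr[hi] = arr[hi], arr[p]
    pivot = arr[hi]
    l = lo
    for r in range(lo, hi):
        if arr[r] <= pivot:
            arr[l], arr[r] = arr[r], arr[l]
            l += 1
    # Pivot inclusion step: put the pivot at its final position l.
    arr[l], arr[hi] = arr[hi], arr[l]
    quick_sort(arr, lo, l - 1)
    quick_sort(arr, l + 1, hi)
    return arr
```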
❒ Efficient sorting algorithms summary – The table below summarizes the
main efficient sorting algorithms:

Type        Best          Average       Worst         Space   Stability
Merge sort  O(n log(n))   O(n log(n))   O(n log(n))   O(n)    Yes
Heap sort   O(n log(n))   O(n log(n))   O(n log(n))   O(1)    No
Quick sort  O(n log(n))   O(n log(n))   O(n²)         O(n)    No
❒ Counting sort – Counting sort is a stable sorting algorithm that is well suited
for integer arrays whose elements are in the range [[0, k]]. It has a time complexity
of O(n + k) and a space complexity of O(n + k).
Intuition Determine the final position of each element by counting the number
of times its associated value appears in the array.
Algorithm
• Number of occurrences: Count the number of occurrences cv of each value
v ∈ {0, ..., k} in an auxiliary array C = [c0, ..., ck]. This step takes O(n + k)
time and O(k) space.
Each count cv represents the number of times the value v ∈ {0, ..., k}
appears in array A.
– Cumulative step: Compute the cumulative sum of array C = [c0 , .., ck ]
and shift each resulting element one position to the right. This operation
is done in-place and takes O(k) time and O(1) space. For example,
C = [0, 1, 1, 1, 2, 0, 1, 1] becomes [0, 0, 1, 2, 3, 5, 5, 6].
For each value v present in A, the associated element cv in the resulting array
C indicates its starting index i in the final sorted array A′ .
• Construction of the sorted array: Iterate through A and place each element at
the index indicated by C, incrementing the corresponding count after each
placement. This step takes O(n) time and O(n) space.
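The three steps can be sketched in Python, assuming elements are integers in the range [0, k]:

```python
def counting_sort(arr, k):
    """Stable counting sort for integers in the range [0, k]."""
    # Number of occurrences of each value.
    counts = [0] * (k + 1)
    for v in arr:
        counts[v] += 1
    # Cumulative step, shifted right: counts[v] = starting index of value v.
    total = 0
    for v in range(k + 1):
        counts[v], total = total, total + counts[v]
    # Construction of the sorted array.
    result = [0] * len(arr)
    for v in arr:
        result[counts[v]] = v
        counts[v] += 1
    return result
```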
❒ Radix sort – Radix sort is a stable sorting algorithm that is well suited for
integer arrays where elements are written with a limited number of digits d, each
digit being in the range [[0, k]]. This algorithm has a time complexity of O(d(n + k))
and a space complexity of O(n + k).
Intuition Successively sort elements based on their digits using counting sort.
Algorithm
• Compute step: Perform counting sort based on the rightmost digit. This step
takes O(n + k) time and O(n + k) space.
• Repeat step: Repeat the compute step on the remaining digits until reaching
the leftmost digit.
The trick of this algorithm lies in the stability of counting sort: the relative ordering
obtained when sorting on a lower-order digit is preserved, which breaks ties when
later sorting on higher-order digits.
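A Python sketch for non-negative integers; for brevity it uses per-digit buckets (which are stable) instead of the in-place counting variant:

```python
def radix_sort(arr, base=10):
    """Stable least-significant-digit radix sort for non-negative integers."""
    if not arr:
        return arr
    digit = 1
    while digit <= max(arr):
        # Stable pass: distribute by the current digit, then concatenate.
        buckets = [[] for _ in range(base)]
        for v in arr:
            buckets[(v // digit) % base].append(v)
        arr = [v for bucket in buckets for v in bucket]
        digit *= base
    return arr
```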
❒ Special sorting algorithms summary – The table below summarizes the main
special sorting algorithms:
Type           Best          Average       Worst         Space     Stability
Counting sort  O(n + k)      O(n + k)      O(n + k)      O(n + k)  Yes
Radix sort     O(d(n + k))   O(d(n + k))   O(d(n + k))   O(n + k)  Yes
❒ Linear search – Linear search checks the elements of an array one by one, in
O(n) time and O(1) space.
• Scan step: Check whether the constraint of the problem is verified on the
current element of the array, starting from the first one.
• Final step: Stop the search when the desired condition is satisfied.
❒ Two-pointer technique – The two-pointer technique moves two pointers across
the array in a coordinated way, which often avoids a nested loop. To illustrate this
concept, suppose A = [a0 , ..., an−1 ] is a sorted array. Given S, the goal is to find
two elements ai , aj such that ai + aj = S.
• Initialization: Initialize pointers l and r that are set to be the first and last
element of the array respectively.
• Compute step: Compare the sum al + ar of the elements at the two pointers
with S:
– If al + ar < S, the sum is too small, so we increment l.
– If al + ar > S, the sum is too big, so we decrement r.
• Final step: Repeat the compute step until one of the following situations
happens:
– We find al + ar = S: the desired pair of elements is found.
– The two pointers meet: there is no solution.
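This search can be sketched in Python (the function name is ours):

```python
def pair_with_sum(arr, target):
    """Find (ai, aj) with ai + aj == target in a sorted array, else None."""
    l, r = 0, len(arr) - 1
    while l < r:
        current = arr[l] + arr[r]
        if current == target:
            return arr[l], arr[r]
        if current < target:
            l += 1   # sum too small: move the left pointer right
        else:
            r -= 1   # sum too big: move the right pointer left
    return None
```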
❒ Trapping rain water – A classic problem that uses the two-pointer technique
is the trapping rain water problem. Given a histogram H = [h0 , ..., hn−1 ] that
represents the height hi ⩾ 0 of each building i, the goal is to find the total volume
V of water that can be trapped between the buildings.
Tricks
The volume of water vi that can be trapped above building i is limited by the
maximum heights hlM and hrM of the buildings on its left and on its right:
vi = min(hlM, hrM) − hi
• Initialization: Set two pointers l = 0 and r = n − 1 at the two ends of the
histogram, and initialize the running maximum heights hlM and hrM to 0.
• Compute step: Update the values of the maximum heights of left and right
buildings with the current values:
hlM = max(hlM, hl) and hrM = max(hrM, hr)
– If hlM < hrM, then the left side is the limiting side, so:
the volume of water trapped above building l is vl = hlM − hl. We add
it to the total volume V and advance the pointer:
l ←− l + 1
– Conversely, if hlM ⩾ hrM, the right side is the limiting side: the volume
trapped above building r is vr = hrM − hr. We add it to V and move
the pointer:
r ←− r − 1
• Final step: The algorithm finishes when the two pointers meet, in which case
the amount of water is given by V .
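The whole procedure can be sketched in Python:

```python
def trap_rain_water(heights):
    """Total trapped water with the two-pointer technique: O(n) time, O(1) space."""
    l, r = 0, len(heights) - 1
    max_left = max_right = 0
    volume = 0
    while l < r:
        max_left = max(max_left, heights[l])
        max_right = max(max_right, heights[r])
        if max_left < max_right:
            volume += max_left - heights[l]   # left side is the limiting side
            l += 1
        else:
            volume += max_right - heights[r]  # right side is the limiting side
            r -= 1
    return volume
```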
❒ Binary search – Binary search finds the position of a target value t in a sorted
array A = [a0 , ..., an−1 ] in O(log(n)) time and O(1) space.
Algorithm
• Initialization: Choose the lower bound l and the upper bound r corresponding
to the desired search space. Here, l = 0 and r = n − 1.
• Compute step: Compute the index m of the middle element of the search
space:
m = l + ⌊(r − l)/2⌋
The above formula is written in a way that avoids integer overflow in case l
and r are too large. We then compare the target t with the value am:
– Case t < am : Shrink the search space to the left side by updating r to
r = m − 1.
– Case t > am : Shrink the search space to the right side by updating l to
l = m + 1.
– Case t = am : We found a solution.
• Final step: Repeat the compute step until one of the following situations
happens:
– Case t = am: the target is found and we return the index m.
– Case l > r: the search space is empty and the target is not in the array.
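The algorithm can be sketched in Python (the overflow-safe midpoint formula matters mostly in fixed-width languages such as C or Java):

```python
def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent."""
    l, r = 0, len(arr) - 1
    while l <= r:
        m = l + (r - l) // 2  # midpoint, written to avoid overflow
        if target < arr[m]:
            r = m - 1   # shrink the search space to the left side
        elif target > arr[m]:
            l = m + 1   # shrink the search space to the right side
        else:
            return m
    return -1
```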
❒ Median of two sorted arrays – Given two sorted arrays A = [a0 , ..., an1−1 ]
and B = [b0 , ..., bn2−1 ] of respective sizes n1 and n2, the goal is to find the median
of their combined elements.
A naive approach would combine the two sorted arrays in O(n1 + n2 ) time, and then
select the median of the resulting array. However, an O(log(n1 )) approach exists
that uses binary search in a clever way.
Tricks
• Partition A and B each into a left part and a right part, such that the two
left parts together contain (about) half of the elements:
#Aleft + #Bleft ≈ (n1 + n2)/2 ≈ #Aright + #Bright
Then we can deduce the median by looking at the maximum of the left par-
tition and the minimum of the right partition.
• If we know how many elements from A are in the left partition, then we can
deduce how many elements from B are in there too.
• By noting Aleft , Aright , Bleft , Bright the left and right partitions from A and B,
and ai, bj the first elements of Aright and Bright, we know the partitioning is
done correctly when the following conditions are satisfied:
(C1) ai−1 ⩽ bj
(C2) bj−1 ⩽ ai
The binary search solution is based on the position i of the partitioning within
array A. If Aleft contains i elements (hence Aright contains n1 − i), then Bleft must
contain
j = ⌊(n1 + n2 + 1)/2⌋ − i
elements. The solution is found in O(log(n1 )) time by running a binary search
on i:
– (C1) is not satisfied: We restrict the search space to the left side of i,
i.e. r = i − 1.
– (C2) is not satisfied: We restrict the search space to the right side of i,
i.e. l = i + 1.
– Both (C1) and (C2) are satisfied: The partitioning is done correctly.
• Final result: Once the correct partitioning is found, we deduce the value of
the median: if n1 + n2 is odd, it is the maximum of the left partition; if
n1 + n2 is even, it is the average of the maximum of the left partition and
the minimum of the right partition.
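A Python sketch of the full procedure, using ±infinity as sentinels for empty partition sides (an implementation convenience):

```python
def median_of_sorted_arrays(A, B):
    """Median of two sorted arrays in O(log(min(n1, n2))) time."""
    if len(A) > len(B):
        A, B = B, A  # binary-search over the smaller array
    n1, n2 = len(A), len(B)
    half = (n1 + n2 + 1) // 2
    lo, hi = 0, n1
    while True:
        i = (lo + hi) // 2        # elements of A in the left partition
        j = half - i              # elements of B in the left partition
        a_left = A[i - 1] if i > 0 else float("-inf")
        a_right = A[i] if i < n1 else float("inf")
        b_left = B[j - 1] if j > 0 else float("-inf")
        b_right = B[j] if j < n2 else float("inf")
        if a_left > b_right:      # (C1) violated: move left
            hi = i - 1
        elif b_left > a_right:    # (C2) violated: move right
            lo = i + 1
        else:                     # correct partitioning found
            if (n1 + n2) % 2 == 1:
                return max(a_left, b_left)
            return (max(a_left, b_left) + min(a_right, b_right)) / 2
```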
Given a string s = s0 ...sn−1 of length n and a pattern p = p0 ...pm−1 of length m,
the goal is to determine whether the pattern p appears in s. The examples below
use s = "apapypapypapps" and p = "papypapp".
We note si:i+m−1 the substring of s of length m that starts from si and finishes at
si+m−1 .
❒ Brute-force approach – The brute-force approach consists of checking whether
pattern p matches any possible substring of s of size m in O(n × m) time and O(1)
space.
Starting from i = 0, compare the substring that starts from index i with pattern p.
• One of the characters is not matching: The pattern does not match the asso-
ciated substring, so we move on to the next starting index i + 1.
• All characters are matching: The pattern is found in the string.
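A Python sketch of the brute-force search, returning the first matching index or -1 (a convention we choose):

```python
def brute_force_search(s, p):
    """Return the first index where pattern p occurs in s, or -1, in O(n*m) time."""
    n, m = len(s), len(p)
    for i in range(n - m + 1):
        # Compare the substring starting at i with p, character by character.
        j = 0
        while j < m and s[i + j] == p[j]:
            j += 1
        if j == m:  # all characters matched
            return i
    return -1
```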
❒ Knuth-Morris-Pratt algorithm – The Knuth-Morris-Pratt (KMP) algorithm
finds a pattern p of length m in a string s of length n in O(n + m) time and O(m)
space.
Intuition Avoid starting the pattern search from scratch after a failed match.
We do so by checking whether the pattern matched so far has a prefix that is also
a suffix:
For example, the prefix-suffix array of p = "papypapp" is [0, 0, 1, 0, 1, 2, 3, 1].
Algorithm The algorithm runs in two steps:
• Step 1 : Build the prefix-suffix array Q = [q0 , ..., qm−1 ] of the pattern.
• Step 2 : Find the pattern in the original string using the prefix-suffix array.
Each value qi of the prefix-suffix array has two interpretations:
• It is the length of the largest prefix that is also a suffix in the substring p0 ...pi .
• It is also the index in the pattern that is right after the prefix.
• Initialization: Set q0 = 0, and initialize two pointers j = 0 and i = 1.
• Compute step: Determine the length of the largest prefix that is also a suffix
for substring p0 ...pi :
– Case pi ̸= pj and j = 0: The substring does not have a prefix that is
also a suffix, so we set qi to 0:
qi ←− 0
We increment i by 1.
– Case pi = pj : The current prefix can be extended by one character, so
we set:
qi ←− j + 1
We increment both i and j by 1.
– Case pi ̸= pj and j > 0: The substring may have a smaller prefix that is
also a suffix:
j ←− qj−1
We repeat this process until we fall in one of the two previous cases.
• Step 2 : Scan the string with a pointer i and the pattern with a pointer j,
reusing the prefix-suffix array after each failed match:
– Case si = pj : The characters match, so we increment both i and j by 1.
– Case si ̸= pj and j > 0: We fall back in the pattern by setting j to qj−1 ,
without moving i.
– Case si ̸= pj and j = 0: We increment i by 1.
• Final step: The algorithm stops when one of the following situations happens:
– j reaches m: the pattern is found in the string, ending at index i − 1.
– i reaches n while j < m: the pattern does not appear in the string.
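Both steps can be sketched in Python; note how the same fallback logic (j ←− qj−1) appears in the construction of the prefix-suffix array and in the search itself:

```python
def prefix_suffix_array(p):
    """q[i] = length of the longest proper prefix of p[:i+1] that is also a suffix."""
    m = len(p)
    q = [0] * m
    j = 0
    for i in range(1, m):
        while j > 0 and p[i] != p[j]:
            j = q[j - 1]       # fall back to a smaller prefix-suffix
        if p[i] == p[j]:
            j += 1
        q[i] = j
    return q

def kmp_search(s, p):
    """Return the first index where p occurs in s, or -1, in O(n + m) time."""
    q = prefix_suffix_array(p)
    j = 0
    for i, c in enumerate(s):
        while j > 0 and c != p[j]:
            j = q[j - 1]       # failed match: reuse the prefix-suffix array
        if c == p[j]:
            j += 1
        if j == len(p):        # full match ending at index i
            return i - len(p) + 1
    return -1
```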
❒ Rabin-Karp algorithm – The Rabin-Karp algorithm searches for a pattern p
of length m in a string s of length n using hashing. Its average time complexity is
O(n + m) and its space complexity is O(1).
Intuition Compute the hash value of the pattern along with those of the sub-
strings of the same length. If the hash values are equal, confirm whether the un-
derlying strings indeed match.
The trick is to choose a hash function h that can deduce h(si+1:i+m ) from the hash
value of the previous substring h(si:i+m−1 ) in O(1) time via a known function f :
Algorithm
• Initialization: Compute the hash value h(p) of the pattern in O(m) time.
• Compute step: Starting from index i = 0, we use the hashing trick to compute
the hash value of substring si:i+m−1 that starts from i and that has the same
length as p in O(1) time.
– If h(si:i+m−1 ) = h(p), we compare the substring with the pattern char-
acter by character to rule out a hash collision and confirm the match.
– Otherwise, we move on to index i + 1 and compute the next hash value
in O(1) time.
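A Python sketch with a polynomial rolling hash; the base and modulus below are illustrative choices, not prescribed by the book:

```python
def rabin_karp_search(s, p, base=256, mod=1_000_000_007):
    """Return the first index where p occurs in s, or -1, using a rolling hash."""
    n, m = len(s), len(p)
    if m > n:
        return -1
    # Hash of the pattern and of the first window of s.
    hp = hs = 0
    for k in range(m):
        hp = (hp * base + ord(p[k])) % mod
        hs = (hs * base + ord(s[k])) % mod
    power = pow(base, m - 1, mod)   # weight of the leading character
    for i in range(n - m + 1):
        # On equal hashes, confirm character by character (hash collisions).
        if hs == hp and s[i:i + m] == p:
            return i
        if i < n - m:
            # Roll the hash: drop s[i], append s[i + m], in O(1) time.
            hs = ((hs - ord(s[i]) * power) * base + ord(s[i + m])) % mod
    return -1
```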
Super Study Guide Notations
Notations
This page summarizes the meaning behind the conventions taken in this book.
Coloring
• ❒ Blue: definitions, notations
• ❒ Red: theorems, properties, algorithms, techniques, summaries
• ❒ Green: operations, applications, classic problems
• Purple: remarks
Symbols
Notation Meaning Illustration
≜ by definition, equal to 3! ≜ 3 × 2 × 1
←− is assigned the value of i ←− 1
≈ approximately equal to e ≈ 2.72
⊆ is a subset of {2} ⊆ {2, 3}
∅ empty set {2} ∩ {3} = ∅
#S or |S| number of elements in S #{4, 6, 7} = 3
=⇒ implies that x > 0 =⇒ x2 > 0
⇐⇒ equivalent to x = 0 ⇐⇒ x2 = 0
â estimate of a μ̂ = (x1 + ... + xn)/n
Indexing
• Indices of arrays are chosen to start from 0 to align with the convention taken
by most programming languages.
• The numbering of elements in other data structures, such as stacks, queues,
linked lists, hash sets and hash tables, starts from 1.
Super Study Guide References
References
Adelson-Velsky G., Landis E. (1962) An algorithm for the organization of informa-
tion, Proceedings of the USSR Academy of Sciences, Vol. 146, pages 263-266
Bayer R. (1972) Symmetric binary B-Trees: Data structure and maintenance algo-
rithms, Acta Informatica, Vol. 1, pages 290-306
Cayley A. (1889) A theorem on trees, Quarterly Journal of Pure and Applied Math-
ematics, Vol. 23, pages 376-378
Fenwick P. M. (1994) A new data structure for cumulative frequency tables, Software:
Practice and Experience, Vol. 24, No. 3, pages 327-336
Floyd R. W. (1962) Algorithm 97: Shortest path, Communications of the ACM, Vol.
5, No. 6, page 345
Floyd R.W. (1967) Nondeterministic Algorithms, Journal of the ACM, Vol. 14, No.
4, pages 636-644
Fredkin E. (1960) Trie Memory, Communications of the ACM, Vol. 3, No. 9, pages
490-499
Hibbard T. N. (1962) Some Combinatorial Properties of Certain Trees With Ap-
plications to Searching and Sorting, Journal of the ACM, Vol. 9, No. 1, pages
13-28
Knuth D. (1998) Sorting and searching, The Art of Computer Programming, Volume
3
Kruskal J. B. (1956) On the shortest spanning subtree of a graph and the traveling
salesman problem, Proceedings of the American Mathematical Society, Vol. 7, No.
1, pages 48-50
Merigoux D. (2013) Pense-bête : analyse
Prim R. C. (1957) Shortest Connection Networks And Some Generalizations, The
Bell System Technical Journal, pages 1389-1401
Windley P. F. (1960) Trees, Forests and Rearranging, The Computer Journal, Vol.
3, No. 2, pages 84-88
Index
A
A⋆ algorithm . . . 65
addressing
  closed . . . 39
  open . . . 39
adjacency
  list . . . 54
  matrix . . . 54
algorithm . . . 1
array . . . 26
AVL tree . . . 92

B
backtracking . . . 3
Bellman-Ford algorithm . . . 65
binary digit . . . 14
binary indexed tree . . . 94
binary number . . . 14
binary search . . . 119
binary search tree . . . 84
binary tree . . . 78
  balanced . . . 79
  complete . . . 78
  diameter . . . 78
  full . . . 78
  perfect . . . 78
binomial
  coefficient . . . 9
  theorem . . . 10
bit . . . 14
  highest set . . . 14
  least significant . . . 14
  left shift . . . 15
  lowest set . . . 14
  most significant . . . 14
  right shift . . . 15
bloom filter . . . 40
breadth-first search . . . 55
brute-force . . . 3
bubble
  down . . . 81
  up . . . 81
bubble sort . . . 102

C
call stack . . . 2
Cayley’s formula . . . 69
coin change problem . . . 23
collision
  chaining . . . 39
  double hashing . . . 39
  linear probing . . . 39
  quadratic probing . . . 39
combination . . . 10
complexity . . . 6
  space . . . 6
  time . . . 6
complexity notation
  Ω(f ) . . . 6
  O(f ) . . . 6
  θ(f ) . . . 6
  o(f ) . . . 6
connected component . . . 71
count-min sketch . . . 42
counting sort . . . 112
cycle sort . . . 106

D
DAG . . . 55
daily temperatures . . . 33
degree . . . 54
  even . . . 54
  in- . . . 54
  odd . . . 54
  out- . . . 54
depth-first search . . . 57
Dijkstra’s algorithm . . . 63
divide and conquer . . . 4
dynamic programming . . . 4
  bottom-up . . . 5
  top-down . . . 5

E
edge . . . 53
  weight . . . 69
Euclidean division . . . 12
  dividend . . . 13
Super Study Guide Index
  divisor . . . 13
  quotient . . . 13
  remainder . . . 13

F
factorial . . . 9
Fenwick tree . . . 94
Fibonacci sequence . . . 13
First In First Out . . . 35
Floyd’s algorithm . . . 45
Floyd-Warshall algorithm . . . 67

G
graph . . . 53
  acyclic . . . 55
  bipartite . . . 55
  complete . . . 55
  connected . . . 55
  cyclic . . . 55
  directed . . . 53
  undirected . . . 53
greedy . . . 3

H
handshaking lemma . . . 54
hash
  collision . . . 38
  function . . . 37
  map . . . 37
  set . . . 37
  table . . . 37
  value . . . 37
heap . . . 80
  max- . . . 80
  min- . . . 80
heap sort . . . 108
heapify
  down . . . 81
  up . . . 81
Held-Karp algorithm . . . 17

I
insertion sort . . . 103
integer overflow . . . 16
integer representation . . . 13
iteration . . . 1
  for . . . 1
  while . . . 1

K
k largest elements . . . 84
k smallest elements . . . 83
Kadane’s algorithm . . . 27
Kahn’s algorithm . . . 62
knapsack problem . . . 19
Knuth-Morris-Pratt algorithm . . . 123
Kosaraju’s algorithm . . . 72
Kruskal’s algorithm . . . 70

L
Last In First Out . . . 32
lexicographic ordering . . . 11
linear search . . . 115
linked list
  doubly . . . 49
  singly . . . 45
load factor . . . 39
lowest common ancestor . . . 77
LRU cache . . . 50

M
master theorem . . . 8
maximum subarray sum . . . 28
mean . . . 12
median . . . 12
median of sorted arrays . . . 120
memoization . . . 2
merge intervals problem . . . 28
merge sort . . . 107
minimum spanning tree . . . 69
mode . . . 12
modulo . . . 13
monotonic stack . . . 34

N
N -ary tree . . . 87
N -Queens problem . . . 22
node
  depth . . . 76
  distance . . . 77
  height . . . 77
number of islands . . . 58
Super Study Guide Authors
You can find a more detailed individual bio for each of them below.