Algorithm complexity — Algorithm Design 1 documentation
Algorithm complexity
Introductory examples
We intuitively understand that the number of operations required by an algorithm usually
depends on the size of the input data. Let us consider a simple example with an algorithm, often
called linear-search, that tries to find the first cell in an array of size n that contains a given
element:
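A minimal sketch of such an algorithm, consistent with the description and the example calls below (the function name find is taken from those calls; the exact code of the course is not reproduced here):

def find(array, element):
    # browse the array from the first cell to the last one
    for i in range(len(array)):
        if array[i] == element:
            # the element was found at position i
            return i
    # the element is not in the array
    return -1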
This algorithm takes as input an array of size n (given by len(array)) and an element. It browses the
array elements from the first to the last one. If, at iteration i, it encounters the value
element at position i of array, it stops and returns the index i. If we reach the
end of the for loop, meaning that the element is not in the array, it returns the value -1. Some
examples of executions of linear-search:
i1 = find([3,5,6], 3) # i1 = 0
i2 = find([3,5,6], 6) # i2 = 2
i3 = find([3,5,6], 8) # i3 = -1
Pseudocode
Algorithms are usually written in pseudocode. In this course, we will follow the Python
syntax for pseudocode. Pseudocode algorithms should be easy to implement in common
programming languages such as C, C++, or Java, so we will refrain from using
Python specificities that are not common to other languages (for example dictionaries, list
comprehensions, high-level functions). We will also avoid using specific libraries (like
numpy, pandas, etc.) that are not part of the standard libraries of most programming
languages.
In practice we will only use simple Python constructs like loops, conditions, and basic
data structures (numbers, lists, strings).
We can see that in the worst case, if the searched element is not in the array, the algorithm
has to browse the whole array, and it performs n comparisons. In other words, in the worst
case, the number of operations performed by this algorithm is proportional to the size of the
input. On the other hand, we might get lucky: if the searched element is located in the first
cell of the array, we will perform a single comparison.
So, in general, the number of operations depends on the size of the input n. However, the
number of operations for two different inputs of the same size can be different: we can
thus ask ourselves what the number of operations is in the worst case, in the average case,
and in the best case. We will often concentrate on the worst case as:
The worst case gives a guaranteed upper bound on the runtime for any input.
The average case analysis requires assuming a probability distribution of the input, which
is difficult to test in practice.
The best case is often meaningless. For example, in linear search, the best case (the
searched element is in the first cell of the array) requires a single comparison, but that
doesn't help us much to understand the algorithm.
Warning
In the following, time complexities are given by default in the worst case.
Amortized complexity
In the chapter about data structures, we will see a fourth possibility called amortized
complexity, where we consider the average number of operations when an algorithm is
called several times on the same data structure.
When we perform the analysis of an algorithm, we usually have to make some assumptions and
simplifications about the cost of basic operations. We usually use the RAM (Random Access
Machine) model, where all the basic operations take a bounded time (this time might be
different for different operations, but it does not depend on the size of the input):
simple arithmetic and comparison operations on integers and floating point numbers
(+, -, /, *, %, <, <=, ==, …)
random memory access (we can store and read an integer or a float anywhere in
memory)
control (if, function call, return…)
Math reminder
Here is a quick reminder on some useful mathematical properties. In the following a, b,
and c are positive numbers, and n is a positive integer:
$a^b \cdot a^c = a^{b+c}$
$\frac{1}{a^b} = a^{-b}$
$(a^b)^c = a^{b \cdot c}$
$\sqrt{a} = a^{0.5}$
Asymptotic complexity
Complexity analysis is at the heart of algorithm design. In the previous section, we have seen
on a simple example that the number of operations done by an algorithm is a function f of
the parameter n denoting the size of the input problem. Usually, this function f is increasing:
the larger the problem, the more operations we have to do to solve it. In complexity analysis,
we want to study at which rate f grows when n increases. In other words, we are not
interested in small inputs or in small variations in the number of operations: we want to
know how the algorithm behaves when the data becomes large. This leads us to asymptotic
analysis, where one wants to compare the growth rates of different functions when we look
toward infinity.
In practice, the growth rate of this function f will determine the maximal size of the problem
that can be solved by the algorithm. Assume that we are working on a single-core processor
that is able to execute 10^10 operations per second (10 GOPS). We have 5 different
algorithms which respectively need to execute n, n log(n), n^2, n^3, and e^n operations, with n
the size of the problem. We wonder what is the maximal size n of the problem we can solve
with each of these algorithms in a given amount of time.
For example, consider an algorithm that requires n^2 operations to solve a problem of size n,
and assume we have 1 day (86400 seconds) of computing time. During this time, we can execute
86400 ⋅ 10^10 operations on our processor. Because we need n^2 operations to solve our
problem of size n, we can process a problem of size √(86400 ⋅ 10^10) ≈ 2.9 ⋅ 10^7 during this
time. The order of this number is 10^7, which means that with our n^2 algorithm, in 1 day we
can solve a problem whose size is comprised between 10 000 000 and 100 000 000
elements. This is of course a very rough estimation, but it gives us a good insight into which
data we will be able to process with algorithms of different runtime complexities.
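As a rough cross-check of this kind of estimate, here is a small sketch (not taken from the course) that computes the order of the maximal problem size for several complexities, assuming the same budget of 86400 ⋅ 10^10 operations:

import math

# assumption: 1 day of computation at 10^10 operations per second
budget = 86400 * 10**10

# largest problem size n such that the number of operations stays within the budget
print("n       ->", budget)                          # linear: n = budget
print("n log n ->", int(budget / math.log(budget)))  # rough inverse of n*log(n)
print("n^2     ->", int(budget ** 0.5))              # n = sqrt(budget), about 2.9e7
print("n^3     ->", int(budget ** (1 / 3)))          # n = budget^(1/3)
print("e^n     ->", int(math.log(budget)))           # n = ln(budget), about 34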
The following table shows the order of magnitude of the maximal problem size that can be solved for
different algorithm complexities and different amounts of computation time (expressed here as the total number of operations available).
| Complexity | 10^10 ops (10G) | 10^11 ops (100G) | 10^13 ops (10T) | 10^14 ops (100T) | 10^16 ops (10P) |
| n | 10^10 (10G) | 10^11 (100G) | 10^13 (10T) | 10^14 (100T) | 10^16 (10P) |
| n log(n) | 10^8 (100M) | 10^10 (10G) | 10^12 (1T) | 10^13 (10T) | 10^14 (100T) |
| n^2 | 10^5 (100k) | 10^5 (100k) | 10^6 (1M) | 10^7 (10M) | 10^8 (100M) |
| n^3 | 10^3 (1k) | 10^3 (1k) | 10^4 (10k) | 10^4 (10k) | 10^5 (100k) |
| e^n | 10 | 10 | 10 | 10 | 10 |
You can see that algorithms requiring n or n log(n) operations scale very well and will be
able to work with very large data (more than billions of elements). As soon as we reach
polynomial complexity with an order strictly greater than 1, the maximal size of the problem
we can process will quickly drop, and we can see that adding more computation time does
not really help. In the last case, with a number of operations growing as an exponential
function of the size of the problem, only very small problems (usually less than 100 elements)
can be solved, whatever the computation power and time we can put on it.
Thus, in asymptotic analysis, we need to be able to compare the growth rate of functions.
This is not a completely trivial question because we want:
to ignore the behavior of the function for small values. For example, in Figure Fig. 1, the
function i(x) = x^x is below the function g(x) = x^2 for x < 2. However, i then grows much
faster than g.
to ignore non-dominant terms. For example, compare k(x) = x^3 and l(x) = x^3 + x: the second
function grows faster than the first one because it has an additional linear term x. However,
this linear term will become insignificant compared to x^3 when x tends toward ∞.
Fig. 1 Plot of common functions with different growth rates. The left part of the graph (close to 0)
is irrelevant in asymptotic analysis: we only want to know how fast the curves increase on the
right.
The O (pronounced “big o”, like in orange), o (pronounced “small o”, like in orange), and Θ
(pronounced “big Theta”), which are part of the Bachmann–Landau notations, are the most
classical tools used in algorithm complexity analysis to express those ideas. These notations
have quite simple intuitive interpretations. Let f and g be two functions:
| Notation | Interpretation |
| f = O(g) | The growth rate of the function f is smaller than or equal to the growth rate of the function g. Somehow, it is like writing f ≲ g (for the growth rate ordering). We also say that g is an upper bound of f. |
| f = o(g) | The growth rate of the function f is strictly smaller than the growth rate of the function g. Somehow, it is like writing f < g (for the growth rate ordering). We also say that g is a strict upper bound of f, or that g dominates f. |
| f = Θ(g) | The growth rate of the function f is equal to the growth rate of the function g. Somehow, it is like writing f ∼ g (for the growth rate ordering). We also say that g is a tight bound of f. |
Formal definition of O
Let f and g be two functions, we have f = O(g) if
$\exists n_0, M \in \mathbb{R}^+ \text{ s.t. } \forall n \geq n_0,\; 0 \leq f(n) \leq M \cdot g(n)$
Which reads: "the function f is big O of g if there exist two positive real numbers n0 and
M such that, for all n greater than or equal to n0, the value f(n) is smaller than or
equal to the value g(n) multiplied by the constant M". Thus, we ignore everything
that happens before n0, and we can multiply g by an arbitrarily large constant M such
that after n0, f(n) is smaller than M ⋅ g(n).
Fig. 2 f = O(g): after n0, f(n) is smaller than M ⋅ g(n)
Formal definition of Θ
Let f and g be two functions, we have f = Θ(g) if
$\exists n_0, M_1, M_2 \in \mathbb{R}^+ \text{ s.t. } \forall n \geq n_0,\; M_1 \cdot g(n) \leq f(n) \leq M_2 \cdot g(n)$
In other words, g is a tight bound of f if we can find two constants M1 and M2 such that,
toward infinity, M1 ⋅ g(n) is smaller than or equal to f(n) and M2 ⋅ g(n) is greater than or
equal to f(n).
Fig. 3 f = Θ(g): after n0, f(n) is smaller than M2 ⋅ g(n) and greater than M1 ⋅ g(n)
Formal definition of o
Let f and g be two functions, we have f = o(g) if
$\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0$
Note that these definitions also impose that the functions are asymptotically positive: for any
x ≥ x0, f(x) is greater than or equal to 0. In the following, we assume that all the considered
functions are asymptotically positive; this is not a restriction in our case as, in algorithm
design, we are interested in counting instructions, which always leads to (asymptotically)
positive functions.
Note
Two other notations are also commonly found in books: Ω (pronounced "big Omega") and
ω (pronounced "small omega"), which are the counterparts of the O and o notations:
f = Ω(g) means that g is a lower bound of f, and f = ω(g) means that g is a strict lower
bound of f.
Those notations are rarely seen in practice (for example in documentation) because they
don't correspond to very useful results for developers: we want a guaranteed maximum or
expected runtime, not a minimum runtime. They are however useful in some theoretical
contexts (for example in complexity proofs).
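For completeness, the formal definition of Ω mirrors the one of O with the inequality reversed (equivalently, for asymptotically positive functions, f = Ω(g) if and only if g = O(f)):
$\exists n_0, M \in \mathbb{R}^+ \text{ s.t. } \forall n \geq n_0,\; 0 \leq M \cdot g(n) \leq f(n)$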
Example 1: Let us show, in a first simple example, how this definition allows us to get rid of
multiplicative and additive constants. Let f(x) = 3x + 6; we will show that f(x) = O(x).
We want to find x0 and M such that, for any x ≥ x0:
$3x + 6 \leq M x \Leftrightarrow 3 + \frac{6}{x} \leq M$
At this point we have 2 unknowns, x0 and M, but only 1 equation. Let us arbitrarily fix x0 = 2,
which implies 6 ≤ M; thus we can fix M = 6, which gives us: for all x greater than or equal to
2, we have 3x + 6 ≤ 6x, which can be visually verified in Figure Fig. 4. Thus, we have shown
that 3x + 6 = O(x): this naturally generalizes to any linear function.
Fig. 4 The two curves 3x + 6 and 6x cross at x0 = 2; after this point, 6x is always above 3x + 6.
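A quick numerical sanity check of this inequality (a small sketch, not part of the original material, that only tests integer values):

# verify that 3x + 6 <= 6x for all integers x >= x0 = 2 (up to an arbitrary limit)
print(all(3 * x + 6 <= 6 * x for x in range(2, 10**6)))  # prints True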
Example 2: Now, let us show in a more complex example that these definitions also enable us to
get rid of non-dominant terms. Let f(x) = 3x^2 + 2x + 4; we will show that f(x) = Θ(x^2).
We want to find x0, M1, and M2 such that, for any x ≥ x0:
$M_1 x^2 \leq 3x^2 + 2x + 4 \leq M_2 x^2 \Leftrightarrow M_1 \leq 3 + \frac{2}{x} + \frac{4}{x^2} \leq M_2$
At this point we have 3 unknowns x0, M1, and M2, but only 1 equation. Let us arbitrarily fix
x0 = 2, which implies 5 ≤ M2; thus we can fix M2 = 5. For M1, any value lower than 3 will
do the trick for all positive x; let's take M1 = 2, which gives us: for all x greater than or equal to
2, we have 2x^2 ≤ 3x^2 + 2x + 4 ≤ 5x^2, which can be visually verified in Figure Fig. 5. Thus
we have shown that 3x^2 + 2x + 4 = Θ(x^2): this naturally generalizes to any
quadratic function.
Fig. 5 The three curves 2x^2, 3x^2 + 2x + 4, and 5x^2. 2x^2 is always below 3x^2 + 2x + 4, while
5x^2 crosses 3x^2 + 2x + 4 at x0 = 2; after this point, 5x^2 is always above 3x^2 + 2x + 4. The function
3x^2 + 2x + 4 is thus tightly bounded by x^2.
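The same kind of numerical sanity check for this double inequality (again only a sketch over integer values):

# verify that 2x^2 <= 3x^2 + 2x + 4 <= 5x^2 for all integers x >= x0 = 2
print(all(2 * x**2 <= 3 * x**2 + 2 * x + 4 <= 5 * x**2 for x in range(2, 10**6)))  # prints True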
Note
In practice, the runtime complexities of algorithms are often given by default as big O
(upper bound) even if the bound is tight (Θ).
Memory complexity
Up to this point, we talked about runtime complexity, where we are interested in the
number of operations we need to perform to solve a problem of a given size. Another useful
piece of information is the amount of memory we need; this is referred to as memory complexity. It
works like runtime complexity, but instead of counting operations, we count the maximum
amount of memory we need at any one time in the algorithm. Note that the runtime
complexity is always at least as large as the memory complexity.
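As an illustration (a sketch that is not part of the course text), here are two ways of reversing an array of size n, both with a Θ(n) runtime but with different memory complexities:

def reverse_copy(a):
    # builds a new array of size n: Θ(n) extra memory
    r = []
    for i in range(len(a) - 1, -1, -1):
        r.append(a[i])
    return r

def reverse_in_place(a):
    # swaps elements inside the input array: Θ(1) extra memory
    i, j = 0, len(a) - 1
    while i < j:
        a[i], a[j] = a[j], a[i]
        i = i + 1
        j = j - 1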
$1 < \log(n) < \sqrt{n} < n < n \log(n) < n^2 < n^3 < 2^n < e^n < n!$
Here are some names which are commonly used for the growth rate of common functions:
| Notation | Name |
| Θ(log(n)) | logarithmic |
| Θ(n) | linear |
| Θ(n^2) | quadratic |
| Θ(n^c), c ≥ 1 | polynomial |
| Θ(c^n), c > 1 | exponential |
| Θ(n!) | factorial |
Multiplication by a constant
Outer multiplication by a constant can always be ignored:
$\forall k \in \mathbb{R}^+,\; k \cdot f = \Theta(f)$
In words, f is a tight bound of k ⋅ f for any positive constant k. For example, 3 ⋅ n^2 = Θ(n^2).
In other words, it means that any constant multiplicative factor k can be ignored in
complexity analysis. In particular, ∀k ∈ R^+, k = Θ(1): this means that any constant number
of operations is equivalent to a single operation in complexity analysis.
a = 2 * 3 # Θ(1)
b = a - 5 # Θ(1)
This algorithm has 2 operations, each with constant runtime complexity; its overall
runtime complexity is thus Θ(1) + Θ(1) = 2 ⋅ Θ(1) = Θ(1).
The same thing happens when we have a for loop with a fixed number of iterations:
# a is an array of size n
for i in range(10):
    process(a) # Θ(f(n))
The number of operations is 10 ⋅ Θ(f(n)), but this factor 10 is irrelevant in complexity
analysis and can be ignored. The time complexity of this portion of code is thus Θ(f(n)).
Of course, when the number of iterations in the for loop depends on the input size n, then
the situation is different:
# a is an array of size n
for i in range(len(a)):
    a[i] = a[i] * 2 # Θ(1)
The for loop has a single operation with a constant runtime complexity. The overall runtime
complexity of the algorithm is
$\underbrace{\Theta(1) + \Theta(1) + \ldots}_{n \text{ times}} = n \cdot \Theta(1) = \Theta(n)$.
Here, the previous rule does not apply: the number of iterations is not a constant, it depends on the size n of the input.
Additions
In a sum, only the dominant term matters:
f + g = Θ(f) if g = o(f)
This situation happens in algorithms whenever you have a sequence of function calls with
different time complexities:
# a is an array of size n
process_1(a) # given complexity: Θ(n^3)
process_2(a) # given complexity: Θ(n^2)
The total number of operations performed by this portion of code is equal to the sum of the
operations performed by process_1 and process_2 . However, the call to process_2 is
irrelevant in complexity analysis because its complexity is dominated by that of process_1 .
The time complexity of this portion of code is thus Θ(n^3).
Polynomial
This is a direct implication of the two rules above: the term of highest degree is a tight
bound of any polynomial expression. For any a_n, …, a_0 ∈ R with a_n > 0:
$a_n x^n + \ldots + a_1 x + a_0 = \Theta(x^n)$
For example, 0.1n^4 − 28n^2 + 10^18 ⋅ n = Θ(n^4).
Multiplication by a Function
When we multiply functions, we multiply their growth rates. If e = Θ(f ) and g = Θ(h) ,
then we have e ⋅ g = Θ(f ⋅ h) .
This situation happens in algorithms whenever you have loops whose number of iterations
depends on the size of the input:
# a is an array of size n
for i in range(1, len(a) - 1): # n-2 iterations
    process(a, i) # given complexity: Θ(n^3)
The total number of operations performed by this portion of code is equal to the product of
n − 2 and the number of operations performed by process . The fact that there are n − 2 and
not exactly n iterations is irrelevant for the complexity analysis. The time complexity of this
portion of code is obtained by multiplying the two time complexities: Θ(n) ⋅ Θ(n^3) = Θ(n^4).
Several variables
Algorithms sometimes have several inputs of different sizes (for examples 2 arrays of sizes n1
and n2) or an input with several dimensions (for example an image with n lines and m
columns). There are several possible extensions of the asymptotic notations to this case (see
the Wikipedia page for example), we won’t go into the formal details, but we will sometimes
use notations such as Θ(n^2 + m), meaning that the runtime complexity grows quadratically
with respect to n but only linearly with respect to m.
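For instance, here is a sketch of code where such a notation would naturally appear (compare_lines and process_column are hypothetical helpers, each assumed to run in Θ(1)):

# image is an n x m matrix (a list of n lines of m values)
def analyse_image(image, n, m):
    # compare every pair of lines: Θ(n^2)
    for i in range(n):
        for j in range(n):
            compare_lines(image, i, j) # hypothetical Θ(1) helper
    # then one pass over the columns: Θ(m)
    for j in range(m):
        process_column(image, j) # hypothetical Θ(1) helper
    # overall runtime complexity: Θ(n^2 + m)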
| A | B | o (A < B) | O (A ≲ B) | Θ (A ∼ B) |
| n | n | T / F | T / F | T / F |
| n | n^2 | T / F | T / F | T / F |
| √n | log(n) | T / F | T / F | T / F |
| n√n | n log(n) | T / F | T / F | T / F |
| n√n | n^2 | T / F | T / F | T / F |
| n^2 | n(n − 1)/2 | T / F | T / F | T / F |
| n log(n) | n^2 + n log(n) | T / F | T / F | T / F |
| 2^n | 2^(n+4) | T / F | T / F | T / F |
| 2^n | 3^n | T / F | T / F | T / F |
| 2^(3n+2) | 3^n | T / F | T / F | T / F |
In the following, whenever we ask for a runtime complexity, we ask you to express the
complexity in the simplest possible form. For example, if an algorithm has a quadratic
complexity, the expected answer is Θ(n^2) and not Θ(n^2 + 1), even if both are correct.
def algo1(n):
    for i in range(1000):
        f1(n)
        f2(n)
def algo3(n):
    for i in range(log(n)):
        f2(n)
    for i in range(sqrt(n)):
        f1(n)
def algo4(n):
    for i in range(n):
        for j in range(n):
            f1(n)
            f2(n)
In a second use case, we are working with square upper triangular matrices of size
n × n, thus we know that all the values below the diagonal are 0 and can thus be
ignored (we don’t even need to store them), leading to an optimized addition algorithm:
def matrix_triup_add(m1, m2, m3):
    for i in range(len(m1)): # number of lines: n = len(m1)
        for j in range(i, len(m1[i])): # only the columns j >= i: len(m1[i]) - i iterations
            m3[i][j] = m1[i][j] + m2[i][j]
    return m3
def algo_if(n):
    for i in range(n):
        if (...): # some condition evaluated in Θ(1)
            instruction1 # Θ(1)
            instruction2 # Θ(1)
        else:
            instruction3 # Θ(1)
            instruction4 # Θ(1)
        instruction5 # Θ(1)
This case is simple because, even if we have two branches, the complexity of each branch is
the same: both branches execute a bounded number of instructions with a constant runtime
complexity Θ(1), so the runtime complexity of each branch is also Θ(1). Therefore, for the
runtime complexity analysis, the result of the condition does not matter: in all cases, the
runtime complexity of the instructions executed inside the for loop is Θ(1). The overall
runtime complexity is thus n ⋅ Θ(1) = Θ(n).
In more complex cases, the different branches of the condition don’t have the same runtime
complexity:
def algo_if2(n):
    for i in range(n):
        if (...): # some condition evaluated in Θ(1)
            instruction1 # Θ(1)
            instruction2 # Θ(1)
        else:
            fun(n) # Θ(n)
This becomes more complicated: if the condition is true then the branch we execute has a
constant runtime complexity Θ(1) and otherwise it has a linear runtime complexity Θ(n).
Sometimes, we are able to quantify how many times we will take each branch, allowing us to
perform a tight (Θ) analysis of the overall runtime complexity of the algorithm. However, this
is not always possible, and we then have to rely on an upper bound (O) analysis thanks to the
following observation: Assume that the first branch of the if statement runs in O(c1) and the
second branch runs in O(c2) for some functions c1 and c2, then the runtime complexity of
the whole if statement is bounded by O(max(c1, c2)).
According to this rule, we can say that the algorithm algo_if2 has a quadratic runtime
complexity O(n^2) (this is the worst that can happen, if the condition is always false).
However, we don't know if this bound is tight: maybe the second branch of the if statement
is almost never taken and the real runtime complexity is smaller.
Moreover, cond(n) is a boolean function (it returns true or false) which runs in constant
time Θ(1).
def algo1(n):
    if cond(n):
        f1(n)
    else:
        f1(n)
        f1(n)
def algo2(n):
    for i in range(n):
        if cond(n):
            f1(n)
        else:
            f2(n)
def algo3(n):
    for i in range(log(n)):
        if cond(n):
            f2(n)
        elif cond(n*3+1):
            f1(n)
            f2(n)
        else:
            f2(n)
            f1(n)
def algo4(n):
    for i in range(n):
        if cond(n):
            for j in range(n):
                f1(n)
        else:
            f2(n)
def algo5(n):
    for i in range(n):
        if cond(n):
            for j in range(n):
                f1(n)
        else:
            if cond(n/2):
                f2(n)
            else:
                f3(n)
Runtime complexity of algo5(n) : O( ), can we say that the bound is tight?
Exercises
Exercise 5: Selection and insertion sort
The sorting problem is one of the most famous in algorithmic courses because it is easy
to understand, widely used in practice, and many algorithms with various complexities
exist for it. In the sorting problem, the input is an array a of numbers and the objective is
to find a new array r which is a permutation (a reordering) of the input array a such that
the elements of the new array are sorted in non-decreasing order
(∀i, j, i < j ⇒ r[i] <= r[j]).
A very simple algorithm, which is often used intuitively when we have, for example, to sort
a deck of cards, is the selection sort algorithm. In this algorithm, the input array is divided
into two parts, a sorted one (at the beginning of the array) and an unsorted one (the
remaining part of the array). At the very beginning of the algorithm, the sorted part is
empty, and we repeat the following step (find the minimum of the unsorted part and move it to the end of the sorted part) until the unsorted part is empty:
def selection_sort(array):
    # i is the limit between the sorted and the unsorted part of the array
    for i in range(len(array) - 1):
        # find the index of the minimum of array[i:len(array)]
        jmin = i
        for j in range(i + 1, len(array)):
            if array[j] < array[jmin]:
                jmin = j
        # move this minimum to the end of the sorted part
        array[i], array[jmin] = array[jmin], array[i]
Another intuitive approach is the insertion sort. Here, instead of looking for the minimum
element in the unsorted part, we just take its first element and search where to insert it in the
sorted part of the array. The steps are as follows:
def insertion_sort(array):
    # i is the limit between the sorted and the unsorted part of the array
    for i in range(1, len(array)):
        # j decreases until array[i] reaches its correct position, i.e. when array[j] <= array[j + 1]
        j = i - 1
        while j >= 0 and array[j] > array[j + 1]:
            array[j], array[j + 1] = array[j + 1], array[j]
            j = j - 1
Note that both algorithms do not allocate new memory, they directly sort the input array:
we say that they work in place.
For each of these two algorithms, find the number of comparisons done and the runtime
complexity in the worst case and in the best case. Hint: you might consider 3 cases: 1)
the input array is already sorted (e.g. [1, 2, 3, 4] ), 2) the input array is
sorted in reverse order (e.g. [4, 3, 2, 1] ), and 3) the input array is random (e.g.
[2, 1, 4, 3] ).
selection_sort Θ( )
insertion_sort Θ( )
times = []
for n in [1e3, 1e4, 1e5, 1e6]: # problem sizes: 1000, 10000, 100000...
    data = create_data(n) # create an input of size n
    t1 = time()
    algorithm(data) # run the algorithm
    t2 = time()
    times.append(t2 - t1) # store the elapsed time
It is then possible to plot the experimental runtime against the problem size:
Fig. 6 Experimental runtime of 3 algorithms (orange, green, and blue curves).
In practice, the runtime curves are noisy, but we should be able to fit a curve that
corresponds to the theoretical complexity on them. A simple trick to find out whether a curve
corresponds to a given polynomial complexity Θ(n^c) is to take the measured times t1 and
t2 at sizes n and 2n (for some arbitrary size value n); then we should have t2/t1 ≈ 2^c.
So for a linear complexity, we should have a ratio t2/t1 ≈ 2; for a quadratic complexity,
a ratio t2/t1 ≈ 4; for a cubic complexity, a ratio t2/t1 ≈ 8; and so on.
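A sketch of this doubling trick, reusing the hypothetical create_data and algorithm helpers from the timing snippet above:

from time import time

def doubling_ratio(algorithm, create_data, n):
    # measure the runtime at size n
    data = create_data(n)
    start = time()
    algorithm(data)
    t1 = time() - start
    # measure the runtime at size 2n
    data = create_data(2 * n)
    start = time()
    algorithm(data)
    t2 = time() - start
    # for a Θ(n^c) algorithm, this ratio should be close to 2^c
    return t2 / t1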
Programming exercise 1 : Implement selection sort and verify that the experimental
runtime matches with the theoretical complexity in the python notebook.
Programming exercise 2 : Implement insertion sort and verify that the experimental
runtime matches with the theoretical complexity in the python notebook.
def alg_0(n):
    return n + 1
and for any integer k greater than 0:
def alg_k(n):
    res = n
    for i in range(n):
        # note that the input of alg_(k-1) is the output of the previous call to alg_(k-1)
        res = alg_(k-1)(res)
    return res
alg_0(n) : Θ( )
alg_1(n) : Θ( )
alg_2(n) : Θ( )