Thesis
Department of Mathematics
Faculty of Science
Universidad de los Andes
May 29, 2024
Abstract
Contents
1 Introduction 2
3 Numerical Semigroups 12
3.1 Introduction 12
3.2 Invariants 13
3.3 Wilf's Conjecture 16
5 Experiments 22
5.1 ER-type model experiments 22
6 Results 32
6.1 Introduction 32
6.2 Lower Bound 32
6.3 Upper bound 35
6.3.1 Proof of the upper bound 37
A Useful Bounds 42
Chapter 1
Introduction
The Probabilistic Method is a powerful tool, with applications in Combinatorics, Graph The-
ory, Number Theory and Computer Science. It is a nonconstructive method that proves the
existence of an object with a certain property, usually a graph, by showing that the probability
that a randomly chosen object has that property is greater than zero. In this thesis, we will
apply the probabilistic method to numerical semigroups.
A numerical semigroup is a subset of N that is closed under addition (Definition 3.1.1). These
objects are studied in the context of commutative algebra and algebraic geometry, and they have
applications in integer programming, coding theory and cryptography [1]. There are numerical
invariants that are used to study numerical semigroups, such as the embedding dimension,
the genus and the Frobenius number (Definitions 3.2.2, 3.2.6 and 3.2.4). For example, the
Frobenius number is the largest integer that does not belong to the numerical semigroup.
The Erdős–Rényi (ER) model is a commonly used model of random graphs, in which each edge is
chosen with probability p, independently of the other edges (Definition 2.4.1). In contrast to
graphs, the study of random models of numerical semigroups requires the use of number theory
tools due to their algebraic nature. This thesis investigates the average behavior of random
numerical semigroup invariants using a probabilistic model similar to the ER model (Definition
4.2.1).
Our central result is Theorem 6.1.1, which parallels Theorem 4.2.1, the main result of

• J. De Loera, C. O'Neill, and D. Wilburne, "Random numerical semigroups and a simplicial
complex of irreducible semigroups," The Electronic Journal of Combinatorics, P4.37, 2018.
Theorem 4.2.1 describes the behavior of the expected embedding dimension, genus, and Frobe-
nius number of a random numerical semigroup, depending on the parameters of the model. It
gives an explicit bound of the expected value of these invariants. On the other hand, Theorem
6.1.1 describes the behavior of the invariants almost surely, that is, with probability that tends
to one as the parameters of the model converge to certain values. Our proof is more elementary
and provides asymptotically tighter bounds on the behaviour of these invariants.
We used experiments to study the behavior of ER-type random numerical semigroups, which
led to the proof of Theorem 6.1.1. For our experiments, we used numsgps-sage [3, O'Neill] [4,
Delgado], and for visualizations we used IntPic [5, Delgado]. We also implemented our own
publicly available Python repository, randnumsgps [6], for generating and visualizing random
numerical semigroups.
The structure of the thesis is as follows:
• Chapter 2 discusses the Probabilistic Method, based on the work of Noga Alon and Joel
H. Spencer.
• Chapter 3 focuses on numerical semigroups, providing definitions, examples, and results
necessary for understanding their structure.
• Chapter 4 introduces three models of random numerical semigroups, including our newly
proposed model. We also present recent results in the field, including Theorem 4.2.1.
• Chapter 5 details the algorithms and experiments conducted.
• Chapter 6 presents the main results, including the proof of Theorem 6.1.1, its implications
and its relation with Theorem 4.2.1.
To sum up, we provide a detailed study of random numerical semigroups using probabilistic
models, experimental data and software tools.
Chapter 2

The Probabilistic Method

2.1 Introduction
This chapter is based on the book The Probabilistic Method by Noga Alon and Joel H. Spencer
[7].
Definition 2.1.1. A tournament is a directed graph T on n vertices such that for every pair
of vertices i, j ∈ V (T ), exactly one of the edges (i, j) or (j, i) is in E(T ).
The name of a tournament comes from the fact that it can be thought of as a sports tournament
where each vertex represents a team and each team plays every other team exactly once. The
edge (i, j) represents a win for team i over team j. A tournament T has property Sk if for
every subset K ⊆ V (T ) of size k, there is a vertex v ∈ V (T ) such that (v, s) ∈ E(T ) for all
s ∈ K. That is, for every set of k teams there is a team that beats all of them. For example,
the tournament in Figure 2.1 has property S1 since every team is beaten by another team.
A natural question to ask is: for every k, is there a tournament with property Sk ? The answer
is yes. We will prove this using the Probabilistic Method. First we define a probability space
over the set of tournaments on n vertices:
A random tournament on a set of n vertices is a tournament T such that for every pair of
vertices i, j ∈ V(T), the edge (i, j) is in E(T) with probability 1/2 and the edge (j, i) is in E(T)
with probability 1/2, independently of all other edges. Thus, every tournament on n vertices has
the same probability, which means that this probability space is symmetric.
The main idea is to show that, for n sufficiently large as a function of k, the probability
that a random tournament on n vertices has property Sk is greater than zero. This implies
that there is at least one tournament with property Sk.

[Figure 2.1: a tournament with property S1.]
Theorem 2.1.1 (Theorem 1.2.1 [7]). For every k ∈ N, there is a tournament with property Sk .
Proof. Fix a subset K ⊆ V(T) of size k. Consider the event A_K that there is no vertex
v ∈ V(T) such that (v, s) ∈ E(T) for all s ∈ K. For any vertex v ∈ V(T) \ K, the probability
that (v, s) ∈ E(T) for all s ∈ K is 2^{-k}. Since these events are independent for distinct
vertices v,

\Pr[A_K] = \left(1 - 2^{-k}\right)^{n-k}.
Now, if we consider all subsets K ⊆ V(T) of size k, then the probability that T does not have
property Sk is the probability that at least one of the events A_K occurs. Since there are
\binom{n}{k} such subsets, by the union bound,

\Pr\Big[\bigvee_{\substack{K\subseteq V(T)\\|K|=k}} A_K\Big] \le \sum_{\substack{K\subseteq V(T)\\|K|=k}} \Pr[A_K] = \binom{n}{k}\left(1 - 2^{-k}\right)^{n-k}.
We want to show that, for some n, the probability of this event is less than 1. Using Propositions
A.0.1 and A.0.3, we have that

\Pr\Big[\bigvee_{\substack{K\subseteq V(T)\\|K|=k}} A_K\Big] \le \binom{n}{k}\left(1 - 2^{-k}\right)^{n-k}   (2.1)

\le \left(\frac{en}{k}\right)^k e^{-2^{-k}(n-k)} = e^{k\log\left(\frac{en}{k}\right) - \frac{n-k}{2^k}}.   (2.2)

Then, (2.2) is less than 1 if

k\log\left(\frac{en}{k}\right) - \frac{n-k}{2^k} < 0.

Since k\log k \ge k for k \ge e, this follows from

0 > k\log n - \frac{n-k}{2^k} \ge k\log n - k\log k + k - \frac{n-k}{2^k} = k\log\left(\frac{en}{k}\right) - \frac{n-k}{2^k},

and the leftmost inequality holds whenever k2^k\log n < n - k; in particular, for all sufficiently large n,

\frac{n}{\log n} > k2^k \implies \Pr\Big[\bigvee_{\substack{K\subseteq V(T)\\|K|=k}} A_K\Big] < 1.
Hence, for sufficiently large n, the probability that a random tournament on n vertices does not
have property Sk is less than one. Therefore, the probability that there exists a tournament
on n vertices with property Sk is greater than zero, which means that there exists at least one
tournament with property Sk .
We make two observations:
1. We used the union bound, a common technique in the Probabilistic Method. It states
that, for any events A_1, ..., A_n,

\Pr[A_1 \vee \cdots \vee A_n] \le \sum_{i=1}^{n} \Pr[A_i].

We will extensively use this technique in this thesis. In a measure space, the union bound
is the same property as subadditivity.
2. The proof is nonconstructive: it does not give us a way to find a tournament with property
Sk, it only shows that at least one exists. This is a common feature of the Probabilistic
Method. However, in this case, for large enough n the probability that a random
tournament on n vertices has property Sk is close to one. This means that we can find a
tournament with property Sk by repeatedly generating random tournaments until one has
the desired property, although for large n the property itself becomes harder to verify.
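This random search is easy to sketch in Python (a toy illustration, not part of the thesis software; all function names here are ours):

```python
import itertools
import random

def random_tournament(n):
    """For each pair i < j, orient the edge i -> j or j -> i with probability 1/2."""
    beats = {v: set() for v in range(n)}
    for i, j in itertools.combinations(range(n), 2):
        if random.random() < 0.5:
            beats[i].add(j)
        else:
            beats[j].add(i)
    return beats

def has_property_sk(beats, k):
    """Property S_k: every set of k teams is beaten, all at once, by some other team."""
    vertices = set(beats)
    return all(
        any(K <= beats[v] for v in vertices - K)
        for K in map(set, itertools.combinations(vertices, k))
    )

def find_tournament(n, k, max_tries=10_000):
    """Generate random tournaments until one has property S_k."""
    for _ in range(max_tries):
        T = random_tournament(n)
        if has_property_sk(T, k):
            return T
    return None

# A directed 3-cycle is the smallest tournament with property S_1.
cycle = {0: {1}, 1: {2}, 2: {0}}
print(has_property_sk(cycle, 1))  # True
```

For k = 1 a random tournament on 7 vertices already succeeds with high probability per trial, so `find_tournament(7, 1)` returns almost immediately.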
In this chapter, we will introduce some tools that are useful for applying the Probabilistic
Method in discrete settings. We will also give some examples of the method in action.
= E[X] + E[Y].

Also, if a ∈ ℝ,

E[aX] = \sum_{x\in Rg(X)} ax\,\Pr[X = x] = a\sum_{x\in Rg(X)} x\,\Pr[X = x] = aE[X].
Note that there is a point x such that x ≥ E[X] and Pr[X = x] > 0, and there is a point
x ≤ E[X] such that Pr[X = x] > 0. The following result by Szele (1943) is often considered as
one of the first applications of the Probabilistic Method.
Theorem 2.2.2 (Theorem 2.1.1 [7]). There is a tournament with n players and at least n!\,2^{-(n-1)}
Hamiltonian paths.

Proof. Let X be the number of Hamiltonian paths in a random tournament on n vertices. For
a permutation σ of the vertices, let X_σ be the indicator variable of the event that σ is a
Hamiltonian path of the random tournament; that is, that (σ(1), σ(2)), ..., (σ(n−1), σ(n)) are
all edges of the tournament. Then X = \sum_\sigma X_\sigma. By the linearity of expectation,

E[X] = E\Big[\sum_\sigma X_\sigma\Big] = \sum_\sigma E[X_\sigma] = \sum_\sigma \frac{1}{2^{n-1}} = n!\,2^{-(n-1)}.

Since there is a point of the probability space at which X ≥ E[X], some tournament on n
vertices has at least n!\,2^{-(n-1)} Hamiltonian paths.
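For small n this expectation can be verified exactly by brute force, since the average number of Hamiltonian paths over all 2^{n(n−1)/2} tournaments equals E[X] (a sketch for illustration; not part of the thesis code):

```python
import itertools
from fractions import Fraction

def count_hamiltonian_paths(orient, pairs, n):
    """orient[t] == 1 orients pairs[t] = (i, j) as i -> j, otherwise as j -> i."""
    edge = {}
    for t, (i, j) in enumerate(pairs):
        edge[(i, j)] = orient[t] == 1
        edge[(j, i)] = orient[t] == 0
    return sum(
        all(edge[(s[t], s[t + 1])] for t in range(n - 1))
        for s in itertools.permutations(range(n))
    )

def average_hamiltonian_paths(n):
    """Exact average of the number of Hamiltonian paths over all tournaments on n vertices."""
    pairs = list(itertools.combinations(range(n), 2))
    total = sum(
        count_hamiltonian_paths(orient, pairs, n)
        for orient in itertools.product((0, 1), repeat=len(pairs))
    )
    return Fraction(total, 2 ** len(pairs))

print(average_hamiltonian_paths(3))  # 3/2 = 3!/2^2
print(average_hamiltonian_paths(4))  # 3   = 4!/2^3
```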
The standard practice is to denote the expected value by μ and the variance by σ². The use of
the following inequality is called the second moment method.

Theorem 2.3.1 (Chebyshev's inequality). For λ > 0,

\Pr[|X - \mu| \ge \lambda\sigma] \le \frac{1}{\lambda^2}.

Proof.

\sigma^2 = Var[X] = E[(X - \mu)^2] \ge \lambda^2\sigma^2\,\Pr[|X - \mu| \ge \lambda\sigma].
where
Cov[Xi , Xj ] = E[Xi Xj ] − E[Xi ]E[Xj ].
Note that Cov[X_i, X_j] = 0 if X_i and X_j are independent. Furthermore, if, for each i, X_i is an
indicator variable of an event A_i, that is, X_i = 1 if A_i occurs and X_i = 0 otherwise, then
Var[X_i] ≤ E[X_i], and so

Var[X] \le E[X] + \sum_{i \ne j} Cov[X_i, X_j].
Suppose that X only takes nonnegative integer values, and we are interested in bounding
Pr[X = 0]. First, note that
Pr[X > 0] ≤ E[X]. (2.4)
For a sequence of variables X1 , X2 , ..., we say that X satisfies a property A almost always if
limn→∞ Pr[Xn satisfies A] = 1.
Thus, using (2.4), if E[X] → 0, then X = 0 almost always. On the other hand, if E[X] → ∞, it
is not necessarily true that X > 0 almost always. For instance, consider an (admittedly imaginary)
game where you toss a coin until it lands heads up; if this takes n ≥ 2 tosses you get paid 2^n
dollars, while if the first toss is heads you get nothing. Then E[X] = ∞, but X = 0 with
probability 1/2. In some cases, we can use the second moment method to show that if E[X] → ∞
and we have more information about Var[X], then X > 0 almost always.
Theorem 2.3.2 (Theorem 4.3.1 [7]). \Pr[X = 0] \le \frac{Var[X]}{E[X]^2}.

Proof. We apply Chebyshev's inequality (Theorem 2.3.1) with λ = μ/σ. Thus,

\Pr[X = 0] \le \Pr[|X - \mu| \ge \lambda\sigma] \le \frac{1}{\lambda^2} = \frac{\sigma^2}{\mu^2}.

More generally, for any ε > 0, the same argument gives

\Pr[|X - E[X]| \ge \varepsilon E[X]] \le \frac{Var[X]}{\varepsilon^2 E[X]^2}.

Thus, we have a tighter result:
Corollary 2.3.2. If Var[X] ∈ o(E[X]²), then X ∼ E[X] almost always.
Finally, suppose that X = X_1 + ⋯ + X_n, where each X_i is the indicator variable of an event A_i.
For indices i, j such that i ≠ j, we say that i ∼ j if the events A_i and A_j are not independent. Let

\Delta = \sum_{i \sim j} \Pr[A_i \wedge A_j].   (2.5)

Corollary 2.3.3. If E[X] → ∞ and Δ = o(E[X]²), then X > 0 almost always. Also, X ∼ E[X]
almost always.

Proof. When i ∼ j, Cov[X_i, X_j] ≤ E[X_iX_j] = \Pr[A_i \wedge A_j], and Cov[X_i, X_j] = 0 otherwise,
and so

Var[X] \le E[X] + \sum_{i \ne j} Cov[X_i, X_j] \le E[X] + \sum_{i \sim j} \Pr[A_i \wedge A_j] = E[X] + \Delta.   (2.6)

Since E[X] → ∞ and Δ ∈ o(E[X]²), this gives Var[X] ∈ o(E[X]²), and both claims follow from
Theorem 2.3.2 and Corollary 2.3.2.
In the Erdős–Rényi random graph G(n, p), each possible edge {i, j} is included independently
with probability p:

\Pr[\{i, j\} \in G] = p.
Proof. For each vertex i in G, let A_i be the event that i is an isolated vertex, and define its
indicator variable

X_i = 1 if i is an isolated vertex, and X_i = 0 otherwise.

Let p = k\,\frac{\ln n}{n} for k ∈ ℝ_{>0}. Then, since 1 - k\frac{\ln n}{n} = e^{-k\frac{\ln n}{n} \pm O\left(k^2\frac{(\ln n)^2}{n^2}\right)},

\lim_{n\to\infty} E[X] = \lim_{n\to\infty} n\left(1 - k\frac{\ln n}{n}\right)^{n-1} = \lim_{n\to\infty} n e^{-k\ln n} = \lim_{n\to\infty} n^{1-k}.
Thus, if k > 1, then E[X] → 0, so almost always G(n, p) has no isolated vertices:

\lim_{n\to\infty} \Pr[G(n, p) \vDash A] = 1.

If k < 1, then E[X] → ∞, and to show that X > 0 almost always we have to use the second
moment method. We will prove that Var[X] = o(E[X]²).
First,

\sum_{i\ne j} E[X_iX_j] = \sum_{i\ne j} \Pr[X_i = X_j = 1] = \sum_{i\ne j} (1-p)^{2n-3},

for if i is an isolated vertex, then there is no edge between i and j, so we only have to
account for the remaining n − 2 possible edges that contain j.
Thus, since \sum_{i=1}^{n} E[X_i^2] = \sum_{i=1}^{n} E[X_i] = E[X] and \lim_{n\to\infty} p(n) = 0,

Var[X] \le E[X] + \sum_{i\ne j}\left((1-p)^{2n-3} - (1-p)^{2n-2}\right) \le E[X] + n^2 p(1-p)^{2n-3} \in o(E[X]^2),

since E[X]^2 = n^2(1-p)^{2n-2}, E[X] → ∞ and p → 0. By Corollary 2.3.2, X ∼ E[X] > 0
almost always.
Chapter 3
Numerical Semigroups
3.1 Introduction
So far we have only discussed graphs. In this chapter, we will introduce a new object which
has a different structure, but for which the Probabilistic Method can be used to prove results.
Definitions and results in this chapter can be found in [9] and [1].
Definition 3.1.1. A numerical semigroup is a subset S ⊆ N for which
1. 0 ∈ S,
2. S is closed under addition, i.e. a, b ∈ S implies a + b ∈ S, and
3. S has finite complement in N.
Examples of numerical semigroups include ℕ and ℕ \ {1}. Subsets of ℕ which are not numerical
semigroups include the set of even numbers, any finite set, and ℕ \ {2}.
Example 3.1.1. The McNugget Semigroup is the set of all non-negative integers which can be
expressed as a sum of non-negative multiples of 6, 9 and 20 (see Figure 3.1).
Suppose you are in the United Kingdom and you wish to order 43 McNuggets. The cashier
will hesitate for a while before telling you that they do not sell 43 McNuggets, since there is no
combination of boxes of 6, 9 and 20 McNuggets which add up to 43 [10]. However, if you order
44 McNuggets, one possibility is that you will receive one box of 20 McNuggets, two boxes of
9 McNuggets and one box of 6 McNuggets. This is because 44 can be expressed as a sum of
non-negative multiples of 6, 9 and 20, namely 44 = 20 + 2 · 9 + 6. In general, if you order
more than 43 McNuggets, you will receive your order.
Let us see why the McNugget Semigroup is a numerical semigroup. First, we note that 0 can
be expressed as a sum of non-negative multiples of 6, 9 and 20, namely 0 = 0 · 6 + 0 · 9 + 0 · 20.
Next, we note that if a and b can be expressed as a sum of non-negative multiples of 6, 9 and
20, then so can a + b. Finally, we note that the complement of the McNugget Semigroup in N0
is finite, since
44 = 20 + 2 · 9 + 6, 45 = 5 · 9,
46 = 2 · 20 + 6, 47 = 20 + 3 · 9,
48 = 8 · 6, 49 = 2 · 20 + 9.
And every integer greater than 49 can be expressed as a sum of one of these numbers plus a
multiple of 6.
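The 43-McNugget anecdote is easy to verify computationally (a small illustration in Python; the function name is ours):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def is_mcnugget(n):
    """Can n be written as a non-negative combination of 6, 9 and 20?"""
    if n == 0:
        return True
    return any(n >= a and is_mcnugget(n - a) for a in (6, 9, 20))

gaps = [n for n in range(1, 50) if not is_mcnugget(n)]
print(gaps[-1])         # 43, the largest non-representable order
print(is_mcnugget(44))  # True, e.g. 44 = 20 + 2*9 + 6
```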
The McNugget semigroup is an example of a numerical semigroup which is finitely generated.
This means that there exists a finite set A = {a_1, ..., a_n} such that S = ⟨A⟩, where

⟨A⟩ = \{\lambda_1 a_1 + \cdots + \lambda_n a_n : \lambda_1, \ldots, \lambda_n \in \mathbb{N}\}.

Theorem 3.1.1. ⟨A⟩ is a numerical semigroup if and only if gcd(a_1, ..., a_n) = 1.

If the gcd is some d > 1, every element of ⟨A⟩ is a multiple of d, so the complement is infinite.
Conversely, if the gcd is 1, by Bézout's identity there are integers λ_1, ..., λ_n with

\lambda_1 a_1 + \cdots + \lambda_n a_n = 1.

Then,

k := \sum_{i=1}^{n} \lambda_i a_i + a_1 \sum_{i=1}^{n} |\lambda_i| a_i \equiv 1 \pmod{a_1},

and k ∈ ⟨A⟩, since each coefficient λ_i + a_1|λ_i| is non-negative. The multiples 0, k, 2k, ...,
(a_1 − 1)k meet every congruence class modulo a_1, so every sufficiently large integer belongs
to ⟨A⟩ and the complement of ⟨A⟩ in ℕ is finite.
3.2 Invariants
Let S be a numerical semigroup.
[Figure 3.1: the integers from −4 to 49 arranged in six columns, one for each congruence class modulo 6.]
Definition 3.2.1. The multiplicity of S, denoted by m(S), is the smallest non-zero element of
S.
For instance, the multiplicity of the McNugget Semigroup is 6.
Let A and B be non-empty finite subsets of N. Then we denote by A + B the set
{a + b : a ∈ A, b ∈ B}.
Theorem 3.2.1. There exists a unique minimal generating set A with S = ⟨A⟩.

Proof. Let S* = S \ {0} and A = S* \ (S* + S*); that is, A consists of the non-zero elements
of S that are not the sum of two non-zero elements of S. First we prove that A generates S;
note that 0 is the empty sum. Suppose that s ∈ S* \ A. Then s = a + b with a, b ∈ S* and
a, b < s. Proceeding recursively, in a finite number of steps we can express s as a sum of
elements of A.

Now, we show that A is minimal. If S = ⟨A′⟩, then every a ∈ A is a sum of non-negative
multiples of elements of A′; since a is not the sum of two non-zero elements of S, a must
itself be an element of A′.
Since the minimal generating set is unique, we can define the following invariant.
Definition 3.2.2. The embedding dimension of S, denoted by e(S), is the cardinality of the
minimal generating set of S.
Corollary 3.2.1. e(S) ≤ m(S).
Proof. Apply Theorem 3.2.1 and the proof of Theorem 3.1.1.
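The construction A = S* \ (S* + S*) from Theorem 3.2.1 translates directly into code (an illustrative sketch; it assumes the chosen limit exceeds F(S) + m(S), which holds comfortably for the examples below):

```python
def semigroup_elements(gens, limit):
    """Elements of <gens> up to `limit`, by dynamic programming."""
    in_s = [False] * (limit + 1)
    in_s[0] = True
    for n in range(1, limit + 1):
        in_s[n] = any(n >= g and in_s[n - g] for g in gens)
    return [n for n in range(limit + 1) if in_s[n]]

def minimal_generators(gens, limit=200):
    """A = S* \\ (S* + S*): non-zero elements that are not sums of two non-zero
    elements. Every minimal generator is at most F(S) + m(S), so a large
    enough `limit` captures them all."""
    s_star = semigroup_elements(gens, limit)[1:]  # drop 0
    sums = {a + b for a in s_star for b in s_star}
    return [a for a in s_star if a not in sums]

print(minimal_generators([6, 9, 20]))     # [6, 9, 20]
print(minimal_generators([4, 6, 9, 10]))  # [4, 6, 9]; 10 = 4 + 6 is redundant
```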
Let n be a non-zero element of S.
Definition 3.2.3. The Apéry set of n in S is the set

Ap(S, n) = \{s \in S : s - n \notin S\}.

Equivalently, Ap(S, n) = {b_0, b_1, ..., b_{n−1}}, where b_i is the first element of S such that
b_i ≡ i (mod n). For instance,

Ap(⟨6, 9, 20⟩, 6) = {0, 49, 20, 9, 40, 29}.
Proposition 3.2.1. Each element of Ap(S, n) is either an element of the minimal generating
set or a sum of two elements of Ap(S, n).
Proof. First, 0 = 0 + 0. Let 0 < s ∈ Ap(S, n) and suppose that s is not a minimal generator,
so that s = a + b with a and b non-zero elements of S. If a ∉ Ap(S, n), then a − n ∈ S, and
so s − n = (a − n) + b ∈ S, which contradicts s ∈ Ap(S, n). Hence a ∈ Ap(S, n) and, by the
same argument, b ∈ Ap(S, n).
For example, in the case of the McNugget semigroup,
29 = 20 + 9, 40 = 20 + 20 and 49 = 40 + 9.
Definition 3.2.4. The Frobenius number of S, denoted by F (S), is the largest element of
(N ∪ {−1}) \ S.
The Frobenius number of the McNugget semigroup is 43. For n ∈ S, using the definition of
the Frobenius number, we have that
F (S) = max(Ap(S, n)) − n, (3.1)
since any number larger than this will belong to S. It has been proven that finding the Frobenius
number from a variable number of generators is an NP-hard problem [11]. On the other hand,
integer programming can be used to find the Frobenius number of a numerical semigroup and,
vice versa, the Frobenius number plays an important role in the analysis of integer programming
algorithms [12].
Theorem 3.2.2. Let a, b ∈ N such that gcd(a, b) = 1. Then
F (⟨a, b⟩) = ab − a − b.
Proof. If a and b are relatively prime, then b modulo a generates Za and (a − 1)b is the
maximum element of Ap(⟨a, b⟩, a). Apply equation 3.1.
Definition 3.2.5. The conductor of S, denoted by c(S), is the first element of S greater than
F (S), so c(S) = F (S) + 1.
In Figure 3.1, the elements of the McNugget semigroup are colored with light blue, the elements
of the minimal generating set are shown in red and the conductor is shown in dark blue. The
lowest element of each column is the first element of its congruence class modulo 6, which is
the Apéry set of 6 in the McNugget semigroup.
In general, the Apéry set can be used to calculate many properties of numerical semigroups
[13]. For instance, it can be used to calculate the cardinality of the set of gaps G(S) = ℕ \ S.
Definition 3.2.6. The genus of S, denoted by g(S), is the cardinality of G(S).
Proposition 3.2.2. Let n be a non-zero element of S. Then

g(S) = \frac{1}{n}\sum_{s\in Ap(S,n)} s - \frac{n-1}{2}.   (3.2)
Proof. First, if we divide the complement of S in ℕ into congruence classes modulo n, we
can find the number of gaps in congruence class i by counting the number of elements in that
class before the first element of S in that class. Thus, if b_i ∈ Ap(S, n) and b_i ≡ i (mod n), we
can write b_i = k_i n + i, and the number of gaps in congruence class i is k_i. Therefore,

g(S) = \sum_{i=1}^{n-1} k_i = \frac{1}{n}\left(\sum_{i=1}^{n-1} k_i n + \frac{n(n-1)}{2}\right) - \frac{n-1}{2} = \frac{1}{n}\sum_{i=1}^{n-1}(k_i n + i) - \frac{n-1}{2} = \frac{1}{n}\sum_{s\in Ap(S,n)} s - \frac{n-1}{2}.
Equations 3.1 and 3.2 are known as the Selmer formulas [14]. Note that, for S such that
g(S) > 0,

g(S) \le F(S),

since F(S) is the largest element of ℕ \ S. Also,

g(S) \ge \frac{c(S)}{2} = \frac{F(S)+1}{2},

since s ∈ S implies that F(S) − s ∉ S, which means that at least half of the non-negative
integers less than c(S) do not belong to S. This shows the following for a numerical semigroup S
with g(S) > 0.

Proposition 3.2.3. g(S) ≤ F(S) ≤ 2g(S).
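These invariants and both Selmer formulas can be checked computationally for the McNugget semigroup (an illustrative sketch; the numbers below match the worked example):

```python
def apery_set(gens, limit=1000):
    """Ap(S, n) for n = min(gens): the first element of S in each congruence
    class modulo n (S is tabulated up to `limit`, assumed to exceed F(S) + n)."""
    n = min(gens)
    in_s = [False] * (limit + 1)
    in_s[0] = True
    for m in range(1, limit + 1):
        in_s[m] = any(m >= g and in_s[m - g] for g in gens)
    return [next(m for m in range(i, limit + 1, n) if in_s[m]) for i in range(n)]

ap = apery_set([6, 9, 20])
n = 6
print(ap)                                        # [0, 49, 20, 9, 40, 29]
frobenius = max(ap) - n                          # equation (3.1)
genus = (2 * sum(ap) - n * (n - 1)) // (2 * n)   # equation (3.2), cleared of fractions
print(frobenius, genus)                          # 43 22
print(genus <= frobenius <= 2 * genus)           # True (Proposition 3.2.3)
```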
Finally, we define a class of semigroups which are used in the proof of Theorem 4.2.1.
Definition 3.2.7. A numerical semigroup S is irreducible if it is maximal with respect to
inclusion among all numerical semigroups with the same Frobenius number.
3.3 Wilf's Conjecture

Let n(S) denote the number of elements of S which are less than the conductor c(S). Wilf's
conjecture states that

\frac{n(S)}{c(S)} \ge \frac{1}{e(S)}.
Wilf’s conjecture states that the density of the elements of a numerical semigroup which are
less than the conductor (also called the small elements) is bounded below by the inverse of
the embedding dimension. This conjecture has been verified for specific classes of numerical
semigroups [1]: for a numerical semigroup S, Wilf’s conjecture holds whenever one of these
conditions is satisfied:
• e(S) ≤ 3,
• e(S) = m(S),
• F (S) − 1 + g(S) ≤ 4,
• 4g(S) ≤ 3c(S),
• n(S) ≤ 4,
• 4n(S) ≥ c(S),
• F (S) ≤ 20.
This is not an exhaustive list; there are several published papers on partial solutions, and a
recent survey on Wilf's conjecture can be found in [16].
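Wilf's inequality is also easy to test for any given semigroup (an illustration only; this brute-force check is ours, not one of the verification techniques from the literature):

```python
def wilf_check(gens, limit=1000):
    """Return (n(S), c(S), e(S)) for S = <gens>, assuming `limit` > c(S)."""
    in_s = [False] * (limit + 1)
    in_s[0] = True
    for m in range(1, limit + 1):
        in_s[m] = any(m >= g and in_s[m - g] for g in gens)
    c = max(m for m in range(limit + 1) if not in_s[m]) + 1  # conductor = F(S) + 1
    small = sum(in_s[:c])                                    # n(S), the "small elements"
    s_star = [m for m in range(1, limit + 1) if in_s[m]]
    sums = {a + b for a in s_star for b in s_star}
    e = sum(1 for a in s_star if a not in sums)              # embedding dimension
    return small, c, e

nS, cS, eS = wilf_check([6, 9, 20])
print(nS, cS, eS)     # 22 44 3
print(nS * eS >= cS)  # True: n(S)/c(S) >= 1/e(S)
```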
Chapter 4
We present three distinct models of random numerical semigroups. The first model employs a
uniform distribution for its generators and is the most extensively studied. The second model
is inspired by the Erdös-Rényi approach to random graphs. The third model is a new approach
we introduce, characterized by a fixed Frobenius number.
In other words, a point in G(T) is chosen uniformly at random and the corresponding semigroup
is returned. V. I. Arnold was the first to study this model [17]. For any numerical semigroup ⟨A⟩
with A = {a_1, ..., a_n} and a_1 ≤ ... ≤ a_n, Erdős and Graham [18] proved that

F(⟨A⟩) \le 2a_n\left\lfloor\frac{a_1}{n}\right\rfloor - a_1.
n
On the other hand, Aliev and Gruber [19] proved an optimal lower bound for the Frobenius
number, namely:

F(⟨A⟩) > (n-1)!^{\frac{1}{n-1}}\,(a_1\cdots a_n)^{\frac{1}{n-1}} - (a_1 + \cdots + a_n).

In [17] and [20], Arnold conjectured that the average behavior of the Frobenius number is, up
to a constant, given by the lower bound, i.e.:

F(⟨A⟩) \sim (n-1)!^{\frac{1}{n-1}}\,(a_1\cdots a_n)^{\frac{1}{n-1}}.
In 2009, Aliev, Henk and Hinrichs [12] proved Arnold's conjecture, by showing the following
theorem.

Theorem 4.1.1. Let n ≥ 3. Then, for every constant D,

\Pr\left[\frac{F(⟨A⟩)}{(a_1\cdots a_n)^{\frac{1}{n-1}}} \ge D\right] \in o\left(D^{-\frac{2(n-1)}{n+1}}\right).
The statement of this theorem does not depend on T , it only depends on n. The proof is based
on a discrete inverse arithmetic-geometric mean inequality.
In the ER-type model S(M, p) (Definition 4.2.1), each integer n ∈ {1, ..., M} is selected as a
generator independently with probability

\Pr[n \in A] = p,

and S = ⟨A⟩. For constant p, Theorem 4.2.1 implies that

\lim_{M\to\infty} E[e(S)] < \infty, \quad \lim_{M\to\infty} E[g(S)] < \infty \quad and \quad \lim_{M\to\infty} E[F(S)] < \infty,
a fixed Frobenius number n (Definition 3.2.7). It turns out that the faces of the n-th simplicial
complex count the number of sets A ⊂ {1, ..., n − 1} satisfying n ∉ ⟨A⟩ ([2, Proposition 23]).
Thus, the expected value of the embedding dimension can be calculated from the entries of the
h-vector (in the sense of algebraic combinatorics [21]) of this simplicial complex.
The authors also provide bounds for the expected value of the embedding dimension and the
genus, when p is constant.
Theorem 4.2.2. Let S ∼ S(M, p), where p is a constant. Then,

\frac{6 - 8p + 3p^2}{2 - 2p^2 + p^3} \le \lim_{M\to\infty} E[e(S)] \le \frac{2 - p^2}{p},

\frac{6 - 14p + 11p^2 - 3p^3}{2p - 2p^3 + p^4} \le \lim_{M\to\infty} E[g(S)] \le \frac{(1-p)(2-p^2)}{p^2}, \quad and

\frac{6 - 14p + 11p^2 - 3p^3}{2p - 2p^3 + p^4} \le \lim_{M\to\infty} E[F(S)] \le \frac{2(1-p)(2-p^2)}{p^2}.
As p tends to 0, this says that the limit of the expected value of the embedding dimension is
between a function that tends to a constant and a function that is asymptotically linear in 1/p.
Also, the expected values of the genus and the Frobenius number are between a function that
is asymptotically linear in 1/p and a function that is asymptotically quadratic in 1/p. It is useful
to think of these functions in terms of 1/p, since 1/p is the expected value of the gap between
consecutive randomly selected elements of A in the ER-type model.
We now prove part (a) of Theorem 4.2.1.

Theorem 4.2.3 (Theorem 5 [2]). 1/M is a threshold function for co-finiteness.

Proof. We prove the two parts of the theorem separately.

Part 1

Suppose that p ∈ o(1/M). As in Inequality 2.4, \Pr[A \ne \emptyset] \le E[|A|] = Mp \to 0, so
almost always no generator is selected at all, S = {0}, and S is not co-finite.
Part 2

Suppose that 1/M ∈ o(p) and \lim_{M\to\infty} p = 0. We prove that S is co-finite almost always, by
showing that A almost always contains a co-prime pair of elements.

For each pair of distinct co-prime integers i, j ∈ {1, ..., M}, let A_{i,j} be the event that i, j ∈ A.
Let 1_{i,j} be the indicator variable of the event A_{i,j} and let

X = \sum_{\substack{i<j\\ \gcd(i,j)=1}} 1_{i,j}.
We show that X > 0 almost always using the second moment method. First we calculate E[X].
It is a well-established result in number theory [22, Theorem 332] that the limiting probability
of two integers being co-prime is 6/π². That is, the number of co-prime pairs i < j in {1, ..., M}
is asymptotically (3/π²)M². Therefore, E[X] ∼ \frac{3}{\pi^2}M^2p^2. Now, {i, j} ∼ {i′, j′} (see (2.5)) if
the pairs share exactly one element. Thus, by varying over triples i, j, j′, we obtain

\Delta = \sum_{\{i,j\}\sim\{i',j'\}} \Pr[A_{i,j} \wedge A_{i',j'}] \le M^3p^3.
Therefore, Δ ∈ o(E[X]²), since 1/M ∈ o(p) implies M³p³ ∈ o(M⁴p⁴). Thus, by the second
moment method (Corollary 2.3.3), X > 0 almost always. This means that A almost always
contains a co-prime pair of elements and so S is co-finite.
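The 6/π² density used above is easy to observe numerically (a quick illustration):

```python
from math import gcd, pi

def coprime_fraction(M):
    """Fraction of pairs 1 <= i < j <= M that are co-prime."""
    hits = sum(gcd(i, j) == 1 for i in range(1, M + 1) for j in range(i + 1, M + 1))
    return hits / (M * (M - 1) // 2)

print(coprime_fraction(300))  # close to 6/pi^2
print(6 / pi**2)              # 0.6079...
```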
In Chapter 6, we prove part (b) of Theorem 4.2.1 using standard probabilistic methods. We
also prove a result related to part (c) (Lemma 6.3.2).
Chapter 5
Experiments
For the experiments, we generated random numerical semigroups for several values of 1/p in
the range [4, 1000]. For each 1/p, the sample size was 1000 and we calculated the average
embedding dimension (Table 5.1) and the average Frobenius number (Table 5.2). As shown in
the experiments done in [2], the bounds found in Theorem 4.2.2 are not tight.
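A scaled-down sketch of one experimental run follows (our actual scripts live in [6] and may differ; the parameters here are deliberately small so the example runs quickly, and e(S) is computed by brute force):

```python
import random
from functools import reduce
from math import gcd

def sample_generators(M, p, rng):
    """ER-type model: each of 1, ..., M is selected as a generator with probability p."""
    return [n for n in range(1, M + 1) if rng.random() < p]

def embedding_dimension(gens):
    """e(<gens>): count generators that are not the sum of two non-zero elements."""
    limit = max(gens)
    in_s = [False] * (limit + 1)
    in_s[0] = True
    for m in range(1, limit + 1):
        in_s[m] = any(m >= g and in_s[m - g] for g in gens)
    def is_sum(x):
        return any(in_s[a] and in_s[x - a] for a in range(1, x))
    return sum(1 for g in gens if not is_sum(g))

rng = random.Random(0)
M, p, samples = 500, 0.1, 30
results = []
for _ in range(samples):
    gens = sample_generators(M, p, rng)
    if gens and reduce(gcd, gens) == 1:  # keep only co-finite samples
        results.append((embedding_dimension(gens), min(gens)))

avg = sum(e for e, _ in results) / len(results)
print(len(results), avg)  # average e(S) for 1/p = 10
```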
1/p Lower Bound e(S) Upper bound
4.00 2.21 2.79 7.75
75.14 2.95 9.05 150.27
146.29 2.97 10.67 292.56
217.43 2.98 12.28 434.85
288.57 2.99 12.94 577.14
359.71 2.99 13.65 719.43
430.86 2.99 14.14 861.71
502.00 2.99 14.73 1,004.00
573.14 2.99 15.29 1,146.28
644.29 2.99 16.17 1,288.57
715.43 2.99 16.07 1,430.86
786.57 2.99 16.34 1,573.14
857.71 3.00 16.61 1,715.43
928.86 3.00 17.39 1,857.71
1,000.00 3.00 17.22 2,000.00
Table 5.1: Average embedding dimension of random numerical semigroups generated using the
ER-type model (15 samples of 1000 random numerical semigroups).
Table 5.2: Average Frobenius number of random numerical semigroups generated using the
ER-type model (15 samples of 1000 random numerical semigroups).
We plot the average embedding dimension and the average Frobenius number found in the
experiments in Figures 5.1 and 5.2. These experiments led us to conjecture that the av-
erage embedding dimension grows as log(1/p) and the average Frobenius number grows as
(1/p) log(1/p). Although we did not prove this conjecture, it led us in the right direction to
prove Theorem 6.1.1.
Figure 5.1: Average embedding dimension of random numerical semigroups generated using the
ER-type model.
Figure 5.2: Average Frobenius number of random numerical semigroups generated using the
ER-type model vs 4(1/p) log(1/p)
Figure 5.3: Histogram of the Frobenius number of random numerical semigroups generated
using the ER-type model, for 1/p = 4.
Figure 5.4: Histogram of the Frobenius number of random numerical semigroups generated
using the ER-type model, for 1/p = 220.
We also plot the histograms of the embedding dimension and the Frobenius number for 1/p =
4, 220, 1000 (Figures 5.3-5.8). These histograms hint that the distribution of the embedding
dimension and the Frobenius number of ER-type random numerical semigroups may converge
to a known distribution, which is a topic for future research.
Figure 5.5: Histogram of the Frobenius number of random numerical semigroups generated
using the ER-type model, for 1/p = 1000.
Figure 5.6: Histogram of the embedding dimension of random numerical semigroups generated
using the ER-type model, for 1/p = 4.
Figure 5.7: Histogram of the embedding dimension of random numerical semigroups generated
using the ER-type model, for 1/p = 220.
Figure 5.8: Histogram of the embedding dimension of random numerical semigroups generated
using the ER-type model, for 1/p = 1000.
Finally, we show some visualizations of ER-type random numerical semigroups, using the algo-
rithm below for the generation of Ap(S, m(S)), with M, p and m(S) as parameters.
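The algorithm itself did not survive extraction here, so the following is a reconstruction of one standard way to generate Ap(S, m(S)) from M, p and m(S): sample the remaining generators, then run Dijkstra's algorithm on the residues modulo m(S), where each generator g contributes an edge of weight g. The distance to residue i is then exactly the first element of S congruent to i. (Our implementation in [6] may differ in details.)

```python
import heapq
import random

def apery_by_dijkstra(gens):
    """Ap(<gens>, m) for m = min(gens): dist[i] is the smallest non-negative
    combination of generators congruent to i modulo m, i.e. the i-th Apery element."""
    m = min(gens)
    dist = [None] * m
    dist[0] = 0
    heap = [(0, 0)]
    while heap:
        d, r = heapq.heappop(heap)
        if d > dist[r]:
            continue  # stale heap entry
        for g in gens:
            nd, nr = d + g, (r + g) % m
            if dist[nr] is None or nd < dist[nr]:
                dist[nr] = nd
                heapq.heappush(heap, (nd, nr))
    return dist

print(apery_by_dijkstra([6, 9, 20]))  # [0, 49, 20, 9, 40, 29]

def random_apery(M, p, m, seed=None):
    """ER-type sample conditioned on multiplicity m: every integer in (m, M]
    is selected as a generator with probability p."""
    rng = random.Random(seed)
    gens = [m] + [x for x in range(m + 1, M + 1) if rng.random() < p]
    return apery_by_dijkstra(gens)
```

The Frobenius number and the genus then follow from the Selmer formulas (3.1) and (3.2) applied to the returned list.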
To generate a visualization, run the script smgps_visualizer.py found in the repository [6]
and introduce a number n. The script will generate an ER-type random numerical semigroup
with multiplicity n and p = 1/n, and will plot a visualization.
These visualizations are useful for understanding the structure of ER-type random numerical
semigroups. For example, the larger a number is, the less likely it is to belong to the minimal
generating set: as the generation of the numerical semigroup progresses, more numbers already
lie in the span of the previously selected generators.

A related observation is that there are patterns in the density of the numerical semigroup,
in the density of the minimal generating set and in the density of the Apéry set. One could
think of these patterns as phase transitions. A topic for further research would be to formalize
a definition of this density and to study its behavior.
[Visualization: S(M, p) with M = 100000 and p = 0.1, showing the semigroup, its minimal generating set and its Apéry set.]
[Visualization: S(M, p) with M = 100000 and p = 0.01, showing the semigroup, its minimal generating set and its Apéry set.]
Chapter 6
Results
6.1 Introduction
In this chapter, we present the main results of this thesis. We will prove a theorem similar to
parts (b) and (c) of Theorem 4.2.1 using standard probabilistic arguments.
Theorem 6.1.1. Let S ∼ S(M, p), where p = p(M) is a monotone decreasing function of M
and 1/M ∈ o(p(M)). Then,

(a) If \lim_{M\to\infty} p(M) = 0, then for every K ∈ ℕ,

\lim_{M\to\infty} \Pr[e(S) > K] = \lim_{M\to\infty} \Pr[g(S) > K] = \lim_{M\to\infty} \Pr[F(S) > K] = 1.

(b) If \lim_{M\to\infty} p(M) > 0, then e(S), g(S) and F(S) are bounded in probability, i.e., for every
ε > 0, there exists K_ε such that

\Pr[e(S) < K_\varepsilon] > 1 - \varepsilon, \quad \Pr[g(S) < K_\varepsilon] > 1 - \varepsilon \quad and \quad \Pr[F(S) < K_\varepsilon] > 1 - \varepsilon.
We will also show that this theorem is stronger than Theorem 4.2.1. The proof of part
(b) of this theorem is based on Lemma 6.3.1, which is a result on sums of random subsets of
cyclic groups.
• E1: no selected generator is less than 1/(ap).

Let X_1 be the number of generators selected from {1, ..., ⌊1/(ap)⌋}. Then

\Pr[\neg E_1] = \Pr[X_1 > 0] \le E[X_1] \le p \cdot \frac{1}{ap} = \frac{1}{a}.   (6.1)

• E2: at most 3a/2 generators are selected from T.

Let X_2 be the number of generators selected in T; then X_2 ∼ Bin(a/p, p) and we can use
the bound in Proposition A.0.5 with r = 3a/2 to get that

\Pr\left[X_2 > \frac{3a}{2}\right] \le \frac{\frac{3a}{2}(1-p)}{\left(\frac{3a}{2}-a\right)^2} = \frac{\frac{3a}{2}(1-p)}{\left(\frac{a}{2}\right)^2} \le \frac{6}{a}.

P[b \in A_T] = P[b \in A_T \mid Y_{(1)} = y_1] \quad \text{if } b > y_1.

Since y_1 ≥ 1/(ap), we have that

|⟨y_1⟩ \cap A_T| \le |⟨y_1⟩ \cap T| \le a^2.

\sum_{b\in ⟨y_1⟩\cap T} \frac{p}{1 - \frac{7}{a}} \le \frac{pa^2}{1 - \frac{7}{a}}.

Also,

and so

\Pr[b \in A_T \mid E_1 \wedge E_2 \wedge Y_{(1)} = y_1 \wedge \cdots \wedge Y_{(t-1)} = y_{t-1}] \le \frac{p}{1 - \frac{7}{a}}.

Then, as in (6.6),

\Pr[Y_{(t)} \text{ is not minimal} \mid E_1 \wedge E_2] \le \frac{pa^2 t}{1 - \frac{7}{a}}.

Therefore, we can use the union bound and k ≤ 3a/2 to conclude that

p\sum_{t=1}^{\frac{3a}{2}-1}
6.3 Upper bound
Before proving part (b) of Theorem 6.1.1, we will prove a lemma showing that a cyclic
group of prime order is almost always covered by the sums of a random subset of logarithmic
size.
Lemma 6.3.1. Let q be a prime number and A be a random subset of Zq of size 4⌊6 log2 q⌋.
As q tends to infinity, 2⌊6 log2 q⌋A covers Zq almost always.
Proof. Let s ∈ ℕ such that s ≤ q, and let A be a uniformly random subset of Z_q of size s, that
is,

\Pr(A) = \binom{q}{s}^{-1}.

For a given z ∈ Z_q and k ∈ ℕ for which k ≤ s/2, let

N_z^k := \left\{K \subseteq Z_q : |K| = k, \sum_{t\in K} t = z\right\}.

Note that |N_z^k| = \frac{1}{q}\binom{q}{k}, since K ∈ N_0^k if and only if K + k^{-1}z ∈ N_z^k, for every z ∈ Z_q
(here K + x denotes {t + x : t ∈ K}).

For K ∈ N_z^k, let E_K be the event that K ⊂ A, and let X_K be the indicator variable of E_K. We
define the random variable

X_z = \sum_{K\in N_z^k} X_K.
Note that X_z counts the number of subsets of A of size k which add up to z. We now find E[X_z].
Since the sum of every k-element subset K ⊂ A lies in Z_q,

\sum_{z\in Z_q} X_z = \binom{s}{k},

and so

\binom{s}{k} = E\left[\sum_{z\in Z_q} X_z\right] = \sum_{z\in Z_q} E[X_z].

Since |N_z^k| does not depend on z, it follows that E[X_z] = \frac{1}{q}\binom{s}{k} for every z ∈ Z_q.
If |K ∩ L| = j,

\Pr[E_K \wedge E_L] = \frac{\binom{q-2k+j}{s-2k+j}}{\binom{q}{s}}.
We can bound the number of pairs of events for which |K ∩ L| = j: first we choose K as any set in
N_z^k, and then we choose the remaining k − j elements of L as any subset of Z_q \ K of size k − j.
Thus,

\Delta_j \le \frac{1}{q}\binom{q}{k}\binom{q-k}{k-j}\frac{\binom{q-2k+j}{s-2k+j}}{\binom{q}{s}}.
With s = 4⌊6 log₂ q⌋ and k = 2⌊6 log₂ q⌋, this gives

\frac{\Delta_j}{E[X_z]^2} \le \frac{q\binom{2\lfloor 6\log_2 q\rfloor}{\lfloor 6\log_2 q\rfloor}}{\binom{4\lfloor 6\log_2 q\rfloor}{2\lfloor 6\log_2 q\rfloor}} \le \frac{q}{\binom{2\lfloor 6\log_2 q\rfloor}{\lfloor 6\log_2 q\rfloor}} \le \frac{q}{2^{\lfloor 6\log_2 q\rfloor}} \sim \frac{1}{q^5},

since \binom{2\lfloor 6\log_2 q\rfloor}{\lfloor 6\log_2 q\rfloor}^2 \le \binom{4\lfloor 6\log_2 q\rfloor}{2\lfloor 6\log_2 q\rfloor} (Proposition A.0.4).
Thus, using Theorem 2.3.2 and Inequality (2.6),

\Pr[X_z = 0] \le \frac{E[X_z] + \Delta}{E[X_z]^2} = \frac{1}{E[X_z]} + \sum_{j=0}^{k}\frac{\Delta_j}{E[X_z]^2} \le \frac{1}{E[X_z]} + \frac{k+1}{q^5} = \frac{1}{E[X_z]} + \frac{2\lfloor 6\log_2 q\rfloor + 1}{q^5}.
Hence, by the union bound over z ∈ Z_q,

\Pr[\exists z \in Z_q : X_z = 0] \le \frac{q^2}{\binom{4\lfloor 6\log_2 q\rfloor}{2\lfloor 6\log_2 q\rfloor}} + \frac{2\lfloor 6\log_2 q\rfloor + 1}{q^4}   (6.10)

\le \frac{q^2}{2^{2\lfloor 6\log_2 q\rfloor}} + \frac{2\lfloor 6\log_2 q\rfloor + 1}{q^4}   (6.11)

\sim \frac{1}{q^{10}} + \frac{6\log_2 q}{q^4} = o(1).   (6.12)
We conclude that, as q → ∞, Xz > 0 for every z ∈ Zq almost always. Thus, for every z ∈ Zq ,
there exists K ∈ Nzk such that K ⊂ A almost always. This means that 2⌊6 log2 q⌋A covers Zq
almost always.
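The covering phenomenon behind Lemma 6.3.1 can be observed numerically with a small dynamic program over pairs (number of summands, residue). This is an illustration only, with parameters far smaller than the lemma requires, so coverage in the random example is merely empirical:

```python
import random

def k_subset_sums(elements, k, q):
    """Residues in Z_q that are sums of exactly k distinct entries of `elements`:
    0/1-knapsack DP, processing one element at a time."""
    reach = [set() for _ in range(k + 1)]
    reach[0].add(0)
    for a in elements:
        for j in range(k - 1, -1, -1):  # descending: use each element at most once
            for z in reach[j]:
                reach[j + 1].add((z + a) % q)
    return reach[k]

# Deterministic sanity check in Z_7: sums of 2 distinct elements of {0, ..., 6}
# already cover every residue.
print(k_subset_sums(list(range(7)), 2, 7) == set(range(7)))  # True

# Even random subsets much smaller than 4*floor(6*log2(q)) tend to cover Z_q.
rng = random.Random(0)
q = 101
A = rng.sample(range(q), 40)
print(k_subset_sums(A, 20, q) == set(range(q)))  # usually True for random A
```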
6.3.1 Proof of the upper bound
Lemma 6.3.2. Let ψ(x) be a function for which x(\log x)^2 \in o(\psi(x)). Then

\lim_{p\to 0} \Pr\left[F(S) \le \psi\left(\frac{1}{p}\right)\right] = 1.
Part 1

Consider the event D_1 that a prime q is selected, such that

\frac{200}{p}\log\frac{4}{p} \le q \le \frac{4}{p}\log\frac{1}{p}\,\log\left(\frac{4}{p}\log\frac{1}{p}\right).

This interval contains at least \frac{1}{p}\left(4\log\frac{1}{p} - 200\right) primes, so

\Pr[\neg D_1] \le (1-p)^{\frac{1}{p}\left(4\log\frac{1}{p} - 200\right)} \le e^{-\left(4\log\frac{1}{p} - 200\right)} \in O(p^4).
Part 2

Given D_1, let D_2 be the event that more than 24 log q generators are selected. Let X ∼ Bin(q, p).
Since

q \le \frac{4}{p}\log\frac{1}{p}\,\log\left(\frac{4}{p}\log\frac{1}{p}\right) \le \left(\frac{4}{p}\right)^4,

then

24\log q \le 24\log\left(\frac{4}{p}\right)^4 \le 100\log\frac{4}{p}.

Also, since

q \ge \frac{200}{p}\log\frac{4}{p},

then

E[X] = qp \ge 200\log\frac{4}{p}.

Recall Chernoff's bound:

\Pr[X \le E[X] - \lambda] \le e^{-\frac{\lambda^2}{2E[X]}}.   (6.13)

Thus, using λ = E[X]/2,

\Pr[\neg D_2] \le \Pr\left[X \le E[X] - \frac{E[X]}{2}\right] \le e^{-\frac{E[X]}{8}} \le e^{-25\log\frac{4}{p}} \in O(p^{25}).   (6.14)
Part 3

Finally, assume D_1 and D_2. Let A be the set of generators chosen before q. Since the
generators are chosen randomly and |A| ≥ 24 log q, we can apply Lemma 6.3.1 to Z_q ≅ Ap(⟨A ∪
{q}⟩, q). Consider the event D_3 that Ap(⟨A ∪ {q}⟩, q) will be completed before step

12q\log q \in O\left(\frac{1}{p}\log^3\frac{1}{p}\right).

By the proof of Lemma 6.3.1,

\Pr[\neg D_3] \le \frac{1}{q^{10}} + \frac{6\log q}{q^4} \le \frac{1}{\left(\frac{200}{p}\log\frac{4}{p}\right)^{10}} + \frac{6\log\left(\frac{4}{p}\right)^4}{\left(\frac{200}{p}\log\frac{4}{p}\right)^4} \in O(p^4).
Now, note that the expected value of the Frobenius number can be bounded by comparing a
sum with an integral:

\sum_{n=0}^{\infty} \cdots \le p^2\int_0^\infty (2x)^2 e^{-p^2(x-1)}\,dx = \frac{8e^{p^2}}{p^4}.
Therefore,

E[F(S)] \le \frac{K}{p}\log^3\frac{1}{p} + C,

for some constants K and C that do not depend on p.
Corollary 6.3.1.

\lim_{p\to 0} \Pr\left[g(S) \le \frac{K}{p}\log^3\frac{1}{p}\right] = 1.
Also, by the bound on the right tail of the binomial distribution (Proposition A.0.5), we have that

\lim_{p\to 0} \Pr\left[e(S) \le 3K\log^3\frac{1}{p}\right] = 1.
Bibliography
[22] G. H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers. Oxford
University Press, 1979.
[23] W. Feller, An introduction to probability theory and its applications. John Wiley & Sons,
1971, vol. 1.
Appendix A
Useful Bounds
We include some bounds that are useful in the proofs of the main results. By Stirling's Formula,
we have that

k! \sim \sqrt{2\pi k}\left(\frac{k}{e}\right)^k.   (A.1)
Proposition A.0.1. \binom{n}{k} \le \left(\frac{en}{k}\right)^k for 1 ≤ k ≤ n.

Proof. Using (A.1), we have that, for k ≥ 1,

k! \ge \left(\frac{k}{e}\right)^k.

Then

\binom{n}{k} \le \frac{n^k}{k!} \le \left(\frac{en}{k}\right)^k.
Proposition A.0.2. \binom{n}{k} \ge \left(\frac{n}{k}\right)^k for 1 ≤ k ≤ n.

Proof.

\binom{n}{k} = \prod_{i=0}^{k-1}\frac{n-i}{k-i} \ge \left(\frac{n}{k}\right)^k.
Proposition A.0.5. Let X ∼ Bin(n, p). If r > np,

\Pr[X \ge r] \le \frac{r(1-p)}{(r-np)^2}.

If r < np,

\Pr[X \le r] \le \frac{(n-r)p}{(np-r)^2}.