A question of Erdős and Graham on Egyptian fractions

David Conlon∗ Jacob Fox† Xiaoyu He‡ Dhruv Mubayi§ Huy Tuan Pham¶
Andrew Suk‖ Jacques Verstraëte∗∗
arXiv:2404.16016v1 [math.CO] 24 Apr 2024

Abstract
Answering a question of Erdős and Graham, we show that for each fixed positive rational number $x$ the
number of ways to write $x$ as a sum of reciprocals of distinct positive integers each at most $n$ is $2^{(c_x+o(1))n}$ for
an explicit constant $c_x$ increasing with $x$.

1 Introduction
The study of Egyptian fractions, that is, sums of reciprocals of distinct positive integers, has a long history in
combinatorial number theory (see, for example, [4]). The fact that every positive fraction can be written as an
Egyptian fraction goes back at least to work of Fibonacci at the start of the 13th century. Much more recently, a
result of Bloom [2] says that any subset of the natural numbers of positive upper density has a finite subset the
sum of whose reciprocals adds to one.
In this paper, we will be concerned with a problem raised by Erdős and Graham [8, Page 36] in 1980 (see also [3,
Problem 297]): how many ways are there to write one as a sum of distinct unit fractions with denominator at most
n? Very recently, Steinerberger [11] showed that the number of such Egyptian fractions is at most $2^{0.93n}$. This
already answered one particular question of Erdős and Graham, who asked whether the answer was $2^{n-o(n)}$. Here
we answer their question much more precisely by showing that the count is $2^{(1+o_n(1))cn}$ for an explicit constant
$c \approx 0.91117$.
Our main result more generally estimates the number of Egyptian fractions summing to any fixed positive
rational. Let $h : [0,1] \to \mathbb{R}$ be given by $h(p) = -p\log_2 p - (1-p)\log_2(1-p)$ for $p \in (0,1)$ and $h(0) = h(1) = 0$.
Theorem 1. For any fixed $x \in \mathbb{Q}_{>0}$, the number of subsets $A \subseteq [n]$ with $x = \sum_{a \in A} 1/a$ is $2^{c_x n + o(n)}$, where
\[ c_x := \int_0^1 h\left(\frac{1}{1+e^{\lambda/y}}\right) dy \]
and $\lambda$ is the unique real number such that
\[ \int_0^1 \frac{1}{y(1+e^{\lambda/y})}\, dy = x. \]
In particular, $c_x$ is a strictly increasing function with $c_0 = 0$, $c_1 \approx 0.91117$ and $c_x \to 1$ as $x \to \infty$.
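As a numerical sanity check on the constants in Theorem 1, the following sketch solves the constraint for $\lambda$ by bisection and then evaluates $c_x$ with a midpoint rule. The function names, the number of quadrature steps and the bisection bracket are illustrative assumptions and not part of the paper; for $x = 1$ the printed value should be close to the stated $c_1 \approx 0.91117$.

```python
import math

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def p_of(y, lam):
    """1 / (1 + e^{lam / y}); returns 0 when the exponent would overflow."""
    t = lam / y
    return 0.0 if t > 700.0 else 1.0 / (1.0 + math.exp(t))

def midpoint(f, steps=20000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    return sum(f((k + 0.5) / steps) for k in range(steps)) / steps

def constraint(lam):
    """Left-hand side of the equation that defines lambda."""
    return midpoint(lambda y: p_of(y, lam) / y)

def solve_lambda(x, lo=1e-3, hi=50.0, iters=40):
    """Bisection for lambda > 0; the constraint integral decreases as lambda grows.

    The bracket [lo, hi] is an assumption that only covers moderate x (roughly x < 3)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if constraint(mid) > x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def c_of(x):
    """Return (lambda, c_x) for the given x."""
    lam = solve_lambda(x)
    return lam, midpoint(lambda y: h(p_of(y, lam)))

lam1, c1 = c_of(1.0)
print(f"lambda ~ {lam1:.4f}, c_1 ~ {c1:.5f}")
```

The midpoint rule is adequate here because both integrands vanish to all orders as $y \to 0$; a library quadrature routine could be substituted without changing the structure.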
Our proof has two main steps. In the first step, we use entropy methods to show that $2^{(c_x+o(1))n}$ is the correct
asymptotic count for the number of Egyptian fractions formed by adding distinct unit fractions with denominator
at most n whose sum is at most x. Then, in the second step, we use a method reminiscent of the absorption
technique in extremal graph theory to show that the same asymptotic count holds for the number of Egyptian
fractions summing to exactly x. Very roughly, we first set aside a small reservoir subset of [n]. Then, after finding
many subsets of [n] disjoint from this reservoir whose sums of reciprocals are somewhat smaller than x and whose
denominators have no very large prime power factors, we iteratively ‘clean’ these fractions by adding unit fractions
from the reservoir to obtain a sum x′ < x with small denominator. This is accomplished through the use of a recent
result [5] on the existence of homogeneous generalized arithmetic progressions in subset sums. Finally, we find a
small subset of the reservoir whose sum of reciprocals is equal to the remaining difference x − x′ .
∗ Department of Mathematics, California Institute of Technology, Pasadena, CA 91125. Email: [email protected].
† Department of Mathematics, Stanford University, Stanford, CA 94305. Email: [email protected].
‡ Department of Mathematics, Princeton University, Princeton, NJ 08544. Email: [email protected].
§ Department of Mathematics, Statistics and Computer Science, University of Illinois, Chicago, IL 60607. Email: [email protected].
¶ Department of Mathematics, Stanford University, Stanford, CA 94305. Email: [email protected].
‖ Department of Mathematics, University of California at San Diego, La Jolla, CA 92093. Email: [email protected].
∗∗ Department of Mathematics, University of California at San Diego, La Jolla, CA 92093. Email: [email protected].

2 Counting through entropy
In this section, we will use entropy methods to estimate the number of subsets $A \subseteq [n]$ with $s(A) := \sum_{a \in A} 1/a \le x$.
To state our result, we need some notation. As in the introduction, let $h(p) = -p\log_2 p - (1-p)\log_2(1-p)$ for
$0 < p < 1$ and $h(0) = h(1) = 0$. Given $x > 0$, we choose $p_1, \dots, p_n \in [0,1]$ so as to maximize $\sum_{m=1}^n h(p_m)$ given
$\sum_{m=1}^n p_m/m \le x$. We then let $P(x)$ be the distribution of $(Y_1, \dots, Y_n)$, where each $Y_m = 1$ with probability $p_m$ and
0 otherwise independently of each other. Then the (Shannon) entropy of $P(x)$ is given by $H(P(x)) = \sum_{m=1}^n h(p_m)$.
Lemma 2. For $\varepsilon > 0$ and $x \in (0, (1-\varepsilon)(\ln n)/2)$, the number of subsets $A \subseteq [n]$ with $s(A) \le x$ is bounded above
by $2^{H(P(x))}$. Furthermore, there is $c_{x,n} > 0$ with $c_{x,n} = \Theta(e^{-2x})$ such that, for any $U \subseteq [n]$, the number of subsets
$A \subseteq U$ with $s(A) \le x$ is bounded below by $2^{H(P(x)) - (n-|U|) - O(\sqrt{n/c_{x,n}})}$.
We first characterize P (x).
Lemma 3. For $x < (1/2)(\sum_{m=1}^n 1/m)$, the distribution $P(x)$ is given by setting $p_m = \frac{1}{1+e^{c_{x,n} n/m}}$ for the unique
$c_{x,n}$ such that $\sum_{m=1}^n p_m/m = x$. Moreover, there is $C > 0$ such that, provided $x < (1-\varepsilon)(\ln n)/2$, $c_{x,n} > 0$ and
$c_{x,n} \in [C^{-1}e^{-2x}, Ce^{-2x}]$. Finally, for $\varepsilon > 0$ and assuming $x \in [\varepsilon, 1/\varepsilon]$, $c_{x,n} = \lambda + o_n(1)$, where $\lambda > 0$ is the unique
solution to
\[ \int_0^1 \frac{1}{y(1+e^{\lambda/y})}\, dy = x. \]
In particular,
\[ H(P(x)) = (1+o_n(1))\, n \int_0^1 h\left(\frac{1}{1+e^{\lambda/y}}\right) dy = \Theta(n). \tag{1} \]
Note that here the $o_n(1)$ terms may depend on $\varepsilon$.
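To make the finite-$n$ statement of Lemma 3 concrete, the sketch below solves $\sum_{m=1}^n p_m/m = x$ for $c_{x,n}$ by bisection and reports $H(P(x))/n$. The choice $n = 2000$, the bracket for the bisection and all function names are assumptions made for illustration; by (1), the ratio should be close to $c_1 \approx 0.91117$ when $x = 1$.

```python
import math

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def probs(c, n):
    """p_m = 1 / (1 + e^{c n / m}) for m = 1..n, guarding against overflow."""
    out = []
    for m in range(1, n + 1):
        t = c * n / m
        out.append(0.0 if t > 700.0 else 1.0 / (1.0 + math.exp(t)))
    return out

def weighted_sum(c, n):
    """sum_{m=1}^n p_m / m for the probabilities above."""
    return sum(p / m for m, p in enumerate(probs(c, n), start=1))

def solve_c(x, n, lo=1e-3, hi=50.0, iters=60):
    """Bisection for c_{x,n}; weighted_sum is decreasing in c.

    Assumes weighted_sum(lo, n) > x > weighted_sum(hi, n)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if weighted_sum(mid, n) > x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, x = 2000, 1.0
c = solve_c(x, n)
H = sum(h(p) for p in probs(c, n))
print(f"c_(x,n) ~ {c:.4f}, H(P(x))/n ~ {H / n:.5f}")
```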
Proof. Note that $H(P(x))$ is strongly concave and bounded (as a function of $p_1, \dots, p_n$) in $[0,1]^n$. The unique
stationary point of $H(P(x))$ in $[0,1]^n$ is $p_1 = \dots = p_n = 1/2$. Thus, for $x < (1/2)(\sum_{m=1}^n 1/m)$, the maxima of
$H(P(x))$ must be achieved on the boundary of $[0,1]^n \cap \{\sum_m p_m/m \le x\}$ and, since $h(p) = 0$ for $p \in \{0,1\}$, we must
have that the maxima are achieved on $\sum_m p_m/m = x$. At a maximum $(p_1^*, \dots, p_n^*)$, we must have that $\nabla H(P(x))$
is parallel to $(1, 1/2, \dots, 1/n)$. Noting that $h'(p) = \log\frac{1-p}{p}$, we obtain that such a point satisfies $p_m^* = \frac{1}{1+e^{c'/m}}$ for
some $c'$. By the condition $\sum_{m \le n} p_m^*/m = x$, we must also have that $c'$ satisfies
\[ \sum_{m=1}^n \frac{1}{m(1+e^{c'/m})} = x. \]
Letting $c = c_{x,n} = c'/n$, we have that
\[ \frac{1}{n} \sum_{m=1}^n \frac{1}{(m/n)(1+e^{c/(m/n)})} = x. \]

It is easy to check that $c > 0$ when $x < (1-\varepsilon)(\ln n)/2$ and, for $x$ sufficiently large, that $c = \Theta(e^{-2x})$. Indeed, for
the last estimate, we observe that
\[ \sum_{m=cn/(k+1)}^{cn/k} \frac{1}{m(1+e^{cn/m})} = \Theta(e^{-k}k^{-1}), \]
so
\[ \sum_{m \le cn} \frac{1}{m(1+e^{cn/m})} = O(1). \]
For $1 \le k \le 1/c$,
\[ \sum_{m=cnk}^{cn(k+1)} \frac{1}{m(1+e^{cn/m})} \le \frac{1}{k(1+e^{1/(k+1)})}, \qquad \sum_{m=cnk}^{cn(k+1)} \frac{1}{m(1+e^{cn/m})} \ge \frac{1}{(k+1)(1+e^{1/k})}. \]
Note that $1 + e^{1/k} = 2 + O(1/k)$, so we obtain that
\[ \sum_{k=1}^{1/c} \frac{1}{k(1+e^{1/(k+1)})}, \ \sum_{k=1}^{1/c} \frac{1}{(k+1)(1+e^{1/k})} = \frac{1}{2}\ln(1/c) + O(1), \]
from which we immediately deduce the desired estimate on $c$.
Finally, for $x \in [\varepsilon, 1/\varepsilon]$ and $c > 0$, we can approximate $\frac{1}{n}\sum_{m=1}^n \frac{1}{(m/n)(1+e^{c/(m/n)})}$ by the integral
$\int_0^1 \frac{1}{y(1+e^{c/y})}\, dy$. We thus obtain that for $\lambda > 0$ satisfying $\int_0^1 \frac{1}{y(1+e^{\lambda/y})}\, dy = x$, we have $c_{x,n} = \lambda + o_n(1)$, from which (1) follows
readily.
We will use the following version of the standard Berry–Esseen bound [1, 7, 10] in the proof of Lemma 2.
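Lemma 4. Let $X_1, \dots, X_n$ be independent centered random variables with $\mathbb{E}[X_i^2] = \zeta_i$ and $\mathbb{E}[|X_i|^3] = \rho_i$. Let
$Z = \frac{\sum_{i=1}^n X_i}{(\sum_{i=1}^n \zeta_i)^{1/2}}$ and $Z'$ be a standard Gaussian. Then
\[ \sup_{y \in \mathbb{R}} \left|\Pr[Z \le y] - \Pr[Z' \le y]\right| \le C\, \frac{\sum_{i=1}^n \rho_i}{(\sum_{i=1}^n \zeta_i)^{3/2}}. \]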

Proof of Lemma 2. Let $S$ be a finite set of real numbers and $n = |S|$. Let $r_S(x)$ be the number of subsets of $S$
that sum to at most $x$. Define the random variable $X$ to be a uniform random subset of $S$ whose elements sum
to at most $x$. Note that the entropy of $X$ satisfies $H(X) = \log_2 r_S(x)$, so $r_S(x) = 2^{H(X)}$. For each $s \in S$, let $X_s$
be the indicator random variable of the event $s \in X$ and let $p_s = \Pr[X_s = 1]$, so that $\mathbb{E}[\sum_{s \in S} sX_s] = \sum_{s \in S} sp_s \le x$.
Observe that $X$ has the same distribution as the joint distribution of the $n$ random variables $X_s$. Therefore, by
subadditivity of the entropy function, we have
\[ H(X) \le \sum_{s \in S} H(X_s) = \sum_{s \in S} h(p_s). \]
Hence, we get the upper bound
\[ r_S(x) \le 2^{h}, \]
where $h$ is the maximum value of $\sum_{s \in S} h(p_s)$ over all choices of $(p_s)_{s \in S}$ satisfying $\sum_{s \in S} sp_s \le x$. In particular, for
$S = \{1/m : m \in [n]\}$, we obtain that the number of subsets $A \subseteq [n]$ with $s(A) \le x$ is at most $2^{H(P(x))}$, as claimed.
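The entropy upper bound can be checked directly for small $n$. The sketch below counts, with exact rational arithmetic, the subsets $A \subseteq [12]$ with $s(A) \le 1$ and compares the count with $2^{H(P(1))}$, where $H(P(1))$ is approximated by bisection on the parameter $c'$ from the proof of Lemma 3. The choice $n = 12$ and the helper names are assumptions for illustration only.

```python
import math
from fractions import Fraction

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def count_at_most(n, x):
    """Exact number of subsets A of {1, ..., n} with sum_{a in A} 1/a <= x."""
    sums = [Fraction(0)]
    for m in range(1, n + 1):
        # Each existing subset either omits m or includes it.
        sums += [s + Fraction(1, m) for s in sums]
    return sum(1 for s in sums if s <= x)

def max_entropy(n, x, iters=100):
    """Approximate H(P(x)) via p_m = 1 / (1 + e^{c'/m}), with c' found by bisection.

    Assumes x < (1/2) * sum_{m <= n} 1/m, so that the constraint is tight."""
    def probs(cp):
        return [1.0 / (1.0 + math.exp(cp / m)) for m in range(1, n + 1)]
    lo, hi = 0.0, 700.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(p / m for m, p in enumerate(probs(mid), start=1)) > x:
            lo = mid
        else:
            hi = mid
    # Using lo keeps sum p_m/m >= x, so the entropy is not underestimated.
    return sum(h(p) for p in probs(lo))

n, x = 12, 1
count = count_at_most(n, Fraction(x))
bound = 2.0 ** max_entropy(n, float(x))
print(f"subsets with s(A) <= {x}: {count}; entropy bound: {bound:.1f}")
```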
We now turn to the lower bound. Consider independent Bernoulli random variables $Y_m$ for $m \in [n]$ satisfying
$Y_m = 1$ with probability $p_m$ and $Y_m = 0$ otherwise. Let $Y = (Y_m)_{m \in [n]}$, $Z = \sum_{m \in [n]} Y_m/m$ and $E$ be the indicator
of the event $Z \le x$. Recall that, for $a \in \{0,1\}$, the conditional entropy $H(Y|E=a) = -\sum_{y \in \{0,1\}^n} \Pr[Y=y|E=a] \log \Pr[Y=y|E=a]$
and $H(Y|E) = \sum_{a \in \{0,1\}} \Pr[E=a]H(Y|E=a)$. Since $E$ is determined by $Y$, we have
\[ H(Y) = H(Y,E) = H(Y|E) + H(E) = H(E) + \sum_{a \in \{0,1\}} \Pr[E=a]H(Y|E=a). \]
We thus have
\[ H(Y|E=1) = \frac{1}{\Pr[Z \le x]}\Big(H(Y) - H(E) - \Pr[Z > x]H(Y|E=0)\Big). \tag{2} \]
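Identity (2) is just the chain rule for entropy rearranged, and it can be verified by brute force on a small instance. The following sketch enumerates all outcomes of $Y$ for $n = 8$ with an arbitrary choice of probabilities $p_m$ (an assumption for illustration, not the maximizing distribution $P(x)$) and compares both sides of (2).

```python
import math
from fractions import Fraction
from itertools import product

def entropy(dist):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

n, x = 8, 1
# Arbitrary illustrative probabilities p_1, ..., p_n (not taken from the paper).
p = [0.10, 0.30, 0.35, 0.40, 0.40, 0.45, 0.45, 0.50]

outcomes = []  # pairs (Pr[Y = y], value of the indicator E)
for y in product((0, 1), repeat=n):
    prob = math.prod(p[m] if y[m] else 1.0 - p[m] for m in range(n))
    z = sum(Fraction(y[m], m + 1) for m in range(n))  # exact value of Z
    outcomes.append((prob, 1 if z <= x else 0))

pr = {a: sum(prob for prob, e in outcomes if e == a) for a in (0, 1)}
H_Y = entropy([prob for prob, _ in outcomes])
H_E = entropy([pr[0], pr[1]])
H_cond = {a: entropy([prob / pr[a] for prob, e in outcomes if e == a])
          for a in (0, 1)}

lhs = H_cond[1]
rhs = (H_Y - H_E - pr[0] * H_cond[0]) / pr[1]
print(f"H(Y|E=1) = {lhs:.6f}, right-hand side of (2) = {rhs:.6f}")
```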
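Let $c = c_{x,n}$ as in Lemma 3. By that lemma, the random variable $Z$ has variance
\[ \sum_{m=1}^n \left(\frac{1}{(1+e^{cn/m})m^2} - \frac{1}{(1+e^{cn/m})^2 m^2}\right) = \Theta\left(\sum_{m=1}^n \frac{e^{-cn/m}}{m^2}\right) = \Theta\left(\frac{1}{cn}\right). \tag{3} \]
To see the last bound, observe that $\sum_{m=cn/(k+1)}^{cn/k} \frac{e^{-cn/m}}{m^2} = \Theta\left(\frac{e^{-k}}{cn}\right)$. Similarly, for $1 \le k \le 1/c$, $\sum_{m=cnk}^{cn(k+1)} \frac{e^{-cn/m}}{m^2} = \Theta\left(\frac{e^{-1/k}}{cnk^2}\right)$.
The desired bound follows from summing these estimates over $k$.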
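By a similar argument, the sum of the centered third moments of the $Y_i$ is
\[ \sum_{m=1}^n \left[\frac{1}{1+e^{cn/m}}\left|\frac{1}{m} - \frac{1}{m(1+e^{cn/m})}\right|^3 + \left(1 - \frac{1}{1+e^{cn/m}}\right)\left|-\frac{1}{m(1+e^{cn/m})}\right|^3\right] = O\left(\sum_{m=1}^n \frac{e^{-cn/m}}{m^3}\right) = O\left(\frac{1}{(cn)^2}\right). \tag{4} \]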

The Berry–Esseen bound, Lemma 4, then yields that, for $g \sim N(0,1)$,
\[ \Pr(E=1) = \Pr[Z \le x] = \Pr[g \le 0] + O((cn)^{-1/2}) = 1/2 + O((cn)^{-1/2}), \tag{5} \]
where we used that $\mathbb{E}[Z] = x$, together with (3) and (4).
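Estimate (5) says that the constraint $Z \le x$ holds for roughly half of the samples from $P(x)$. A small Monte Carlo experiment illustrates this; the values $n = 1000$, $x = 1$, the number of trials and the fixed seed are all assumptions chosen for a quick run, and the output is only an empirical estimate.

```python
import math
import random

def probs(c, n):
    """p_m = 1 / (1 + e^{c n / m}) for m = 1..n, guarding against overflow."""
    out = []
    for m in range(1, n + 1):
        t = c * n / m
        out.append(0.0 if t > 700.0 else 1.0 / (1.0 + math.exp(t)))
    return out

def solve_c(x, n, lo=1e-3, hi=50.0, iters=60):
    """Bisection for c_{x,n} such that sum_{m} p_m / m = x."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(p / m for m, p in enumerate(probs(mid, n), start=1))
        if total > x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(0)  # fixed seed so the experiment is reproducible
n, x, trials = 1000, 1.0, 1000
p = probs(solve_c(x, n), n)

hits = 0
for _ in range(trials):
    # Sample Y_1, ..., Y_n independently and form Z = sum Y_m / m.
    z = sum(1.0 / m for m, pm in enumerate(p, start=1) if random.random() < pm)
    hits += z <= x
frac = hits / trials
print(f"empirical Pr[Z <= x] ~ {frac:.3f}")
```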
We next bound $H(Y_1, \dots, Y_n|E=0) \le \sum_{m=1}^n H(Y_m|E=0)$. To bound the summands, we note by Bayes' rule
that
\[ \Pr(Y_m=1|E=0) = \frac{\Pr(E=0|Y_m=1)\Pr(Y_m=1)}{\Pr(E=0)} \]
and we will use a similar argument with the Berry–Esseen bound to show that $\Pr(Y_m=1|E=0)$ is close to
$\Pr(Y_m=1)$. Indeed, the calculations above similarly yield that the random variable $Z_m' = Z - Y_m/m + 1/m$ is a sum
of independent random variables with $\mathbb{E}Z_m' = x + \frac{1}{m} - \frac{1}{m(1+e^{cn/m})}$, $\mathrm{Var}(Z_m') = \Theta(1/(cn))$ and the sum of centered
third moments $O(1/(cn)^2)$. By Lemma 4, for $g \sim N(0,1)$,
\begin{align*}
\Pr(E=0|Y_m=1) &= \Pr(Z_m' > x) \\
&= \Pr\left(g > -\frac{1/m - 1/(m(1+e^{cn/m}))}{\mathrm{Var}(Z_m')^{1/2}}\right) + O((cn)^{-1/2}) \\
&= 1/2 + O\left((cn)^{-1/2} + \frac{\sqrt{cn}}{m}\right),
\end{align*}
assuming that $m > 10\sqrt{cn}$ for the last bound, where we used the simple estimate $\Pr(g > z) = \frac{1}{2} + O(z)$ for $|z| \le 1$.
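Therefore,
\[ \left|\frac{\Pr(E=0|Y_m=1)}{\Pr(E=0)} - 1\right| = \left|\frac{1/2 + O\left((cn)^{-1/2} + \frac{\sqrt{cn}}{m}\right)}{1/2 + O((cn)^{-1/2})} - 1\right| \le O\left(\frac{\sqrt{cn}}{m} + \frac{1}{\sqrt{cn}}\right). \]
Thus, by Bayes' rule,
\[ \frac{|\Pr(Y_m=1|E=0) - \Pr(Y_m=1)|}{\Pr(Y_m=1)} \le O\left(\frac{\sqrt{cn}}{m} + \frac{1}{\sqrt{cn}}\right). \tag{6} \]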
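From (6), we have
\[ \Pr(Y_m=1)\left(1 - O\left(\frac{\sqrt{cn}}{m} + \frac{1}{\sqrt{cn}}\right)\right) \le \Pr(Y_m=1|E=0) \le \Pr(Y_m=1)\left(1 + O\left(\frac{\sqrt{cn}}{m} + \frac{1}{\sqrt{cn}}\right)\right). \]
Since $h'(p) = \log\frac{1-p}{p} \le \log\frac{1}{p}$, we have that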

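\begin{align*}
h(\Pr(Y_m=1|E=0)) &\le h(\Pr(Y_m=1)) + \log\left(\frac{1}{\Pr(Y_m=1)\left(1 - O\left(\frac{\sqrt{cn}}{m} + \frac{1}{\sqrt{cn}}\right)\right)}\right) |\Pr(Y_m=1|E=0) - \Pr(Y_m=1)| \\
&\le h(\Pr(Y_m=1)) + \log\left(\frac{1}{\Pr(Y_m=1)\left(1 - O\left(\frac{\sqrt{cn}}{m} + \frac{1}{\sqrt{cn}}\right)\right)}\right) \cdot O\left(\frac{\sqrt{cn}}{m} + \frac{1}{\sqrt{cn}}\right)\Pr(Y_m=1) \\
&\le h(\Pr(Y_m=1)) + O\left(\frac{\sqrt{cn}}{m} + \frac{1}{\sqrt{cn}}\right)\Pr(Y_m=1)\log\frac{1}{\Pr(Y_m=1)},
\end{align*}
where in the last inequality we used that $\Pr(Y_m=1) \le 1/2$, so $\log\left(\frac{1}{\Pr(Y_m=1)\left(1 - O\left(\frac{\sqrt{cn}}{m} + \frac{1}{\sqrt{cn}}\right)\right)}\right) = O\left(\log\frac{1}{\Pr(Y_m=1)}\right)$.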
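Therefore,
\begin{align*}
H(Y_1, \dots, Y_n|E=0) &\le \sum_{m=1}^n H(Y_m|E=0) \\
&\le 10\sqrt{cn} + \sum_{m > 10\sqrt{cn}} h(\Pr(Y_m=1|E=0)) \\
&\le 10\sqrt{cn} + \sum_{m > 10\sqrt{cn}} \left[H(Y_m) + O\left(\frac{\sqrt{cn}}{m} + \frac{1}{\sqrt{cn}}\right)\Pr(Y_m=1)\log\frac{1}{\Pr(Y_m=1)}\right] \\
&\le H(Y_1, \dots, Y_n) + 10\sqrt{cn} + \sum_{m > 10\sqrt{cn}} O\left(\frac{\sqrt{cn}}{m} + \frac{1}{\sqrt{cn}}\right)\Pr(Y_m=1)\log\frac{1}{\Pr(Y_m=1)} \\
&\le H(Y_1, \dots, Y_n) + 10\sqrt{cn} + \sum_{m > 10\sqrt{cn}} O\left(\frac{\sqrt{cn}}{m} + \frac{1}{\sqrt{cn}}\right)\frac{cn/m}{1+e^{cn/m}} \\
&\le H(Y_1, \dots, Y_n) + O(\sqrt{n/c}).
\end{align*}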
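Again, for the last estimate, we note that $\sum_{m=cn/(k+1)}^{cn/k} \left(\frac{\sqrt{cn}}{m} + \frac{1}{\sqrt{cn}}\right)\frac{cn/m}{1+e^{cn/m}} \le O\left(\frac{cn}{k^2} \cdot e^{-k} \cdot \frac{k^2}{\sqrt{cn}}\right) = O(e^{-k}\sqrt{cn})$ and,
for $1 \le k \le 1/c$, $\sum_{m=cnk}^{cn(k+1)} \left(\frac{\sqrt{cn}}{m} + \frac{1}{\sqrt{cn}}\right)\frac{cn/m}{1+e^{cn/m}} \le O\left(cn \cdot \frac{1}{k\sqrt{cn}}\right) \le O\left(\frac{1}{k}\sqrt{cn}\right)$. Summing over $k$, we thus have
\[ \sum_{m > 10\sqrt{cn}} O\left(\frac{\sqrt{cn}}{m} + \frac{1}{\sqrt{cn}}\right)\frac{cn/m}{1+e^{cn/m}} = O(\sqrt{n/c}). \]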

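Combining with (2), and noting that $H(E) \le 1$ and $\Pr(Z \le x) = 1/2 + O((cn)^{-1/2})$ by (5), we obtain that
\[ H(Y_1, \dots, Y_n|E=1) \ge (1 - O((cn)^{-1/2}))(H(P(x)) - O(\sqrt{n/c})) = H(P(x)) - O(\sqrt{n/c}). \]
Using that for any random variable $X$ we have $H(X) \le \log|\mathrm{supp}(X)|$, we obtain that the number of subsets $A \subseteq [n]$
with $s(A) \le x$ is at least $2^{H(P(x)) - O(\sqrt{n/c})}$. Since each such subset restricts to a subset of $U$ with sum of reciprocals at most $x$, and at most $2^{n-|U|}$ subsets of $[n]$ share the same restriction, this implies that the number of subsets $A \subseteq U$ with $s(A) \le x$ is
at least
\[ 2^{-(n-|U|)}\, 2^{H(P(x)) - O(\sqrt{n/c})}, \]
as required.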
3 Subset sums of modular inverses


The main technical tool we still need is the following result, which says that if $q$ is a large prime power and we
take a dense subset $I$ of the interval $[q^{\varepsilon}, 2q^{\varepsilon}]$, then every residue class mod $q$ can be written as the sum of a small
number of reciprocals of elements of $I$. Roughly speaking, this allows us to cancel out any particular prime power
from the denominator of a fraction in the absorption step of the proof of Theorem 1. Given a set $A$ of integers, we
will use the notation $\Sigma^{[s]}(A)$ for the collection of sums of subsets of $A$ of size at most $s$.
Theorem 5. Let $\delta, \varepsilon > 0$ and let $q$ be a prime power which is sufficiently large in terms of $\delta, \varepsilon$. If $I$ is a subset of
$[q^{\varepsilon}, 2q^{\varepsilon}]$ consisting of elements coprime to $q$ with $|I| \ge \delta q^{\varepsilon}$, then $\Sigma^{[s]}(I^{-1}) \pmod{q} = \mathbb{Z}_q$ for $s = q^{\varepsilon/2}$.
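The objects in Theorem 5 are easy to compute for small parameters. The sketch below builds $I$ as all integers in $[q^{\varepsilon}, 2q^{\varepsilon}]$ coprime to $q$, takes their inverses modulo $q$ and enumerates $\Sigma^{[s]}(I^{-1}) \pmod q$ by brute force. The values $q = 101$ and $\varepsilon = 1/2$ are assumptions for illustration; since the theorem only applies for $q$ sufficiently large, full coverage of $\mathbb{Z}_q$ is not guaranteed at this scale, and the code simply reports how many residues are reached.

```python
import math
from itertools import combinations

def inverse_subset_sums(q, eps):
    """Return (I, s, sums), where sums is Sigma^{[s]}(I^{-1}) reduced mod q.

    I is taken to be every integer in [q^eps, 2 q^eps] coprime to q and
    s = floor(q^{eps/2}); the empty subset contributes the residue 0."""
    lo, hi = math.ceil(q ** eps), math.floor(2 * q ** eps)
    I = [i for i in range(lo, hi + 1) if math.gcd(i, q) == 1]
    inverses = [pow(i, -1, q) for i in I]  # modular inverse (Python 3.8+)
    s = math.floor(q ** (eps / 2))
    sums = set()
    for size in range(s + 1):
        for combo in combinations(inverses, size):
            sums.add(sum(combo) % q)
    return I, s, sums

q, eps = 101, 0.5
I, s, sums = inverse_subset_sums(q, eps)
print(f"|I| = {len(I)}, s = {s}, residues covered: {len(sums)} of {q}")
```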
In the proof of Theorem 5, we will make use of the following key result from [5]. Recall that a generalized
arithmetic progression (henceforth GAP) $P$ of dimension $k$ is a set of integers $\{x_0 + \ell_1x_1 + \ell_2x_2 + \dots + \ell_kx_k \mid 0 \le \ell_1 < L_1, \dots, 0 \le \ell_k < L_k\}$.
A GAP is called proper if it has size exactly $L_1L_2 \cdots L_k$. We say that $P$ is homogeneous
if $x_0$ divides $x_1, \dots, x_k$. For a natural number $t$, we define $tP$ to be the $t$-fold sumset of $P$, while if $t$ is a positive
real number which is not an integer and $P = \{\sum_{i=1}^k n_ix_i : a_i \le n_i \le b_i\}$ is a homogeneous GAP, we can generalize
the definition by setting $tP = \{\sum_{i=1}^k n_ix_i : ta_i \le n_i \le tb_i\}$.
Theorem 6. For any $\beta > 1$ and $0 < \eta < 1$, there are positive constants $c$ and $k$ such that the following holds. Let
$A$ be a subset of $[n]$ of size $m$ with $n \le m^{\beta}$ and let $s \in [m^{\eta}, cm/\log m]$. Then there exists a subset $\hat{A}$ of $A$ of size at
least $m - c^{-1}s\log m$ and a proper GAP $P$ of dimension at most $k$ such that $\hat{A} \cup \{0\}$ is a subset of $P$. Furthermore,
there exists $A' \subseteq \hat{A}$ of size at most $s$ such that $\Sigma(A')$ contains a homogeneous translate of $csP$, where $csP$ is proper.
We will also need the following simple variant of Dirichlet's simultaneous approximation theorem. For a residue
class $i \pmod q$, we use the notation $\overline{i}$ for the unique integer in $(-q/2, q/2]$ congruent to $i$ modulo $q$.
Lemma 7. Given a prime power $q$, integers $d_1, \dots, d_k$ coprime to $q$ and positive integers $a_1, \dots, a_k$ such that
$\prod_{i=1}^k a_i = A$, there exists a positive integer $T < q$ and integers $d_1', \dots, d_k'$ such that $Td_i = d_i' \pmod q$ and
$|d_i'| \le 2(q/a_i) \cdot (A/q)^{1/k}$ for all $i \in [k]$.

Proof. Let $b_i = 2(q/a_i) \cdot (A/q)^{1/k}$. Note that $\lfloor \overline{sd_i}/b_i \rfloor$ takes at most $q/b_i$ values as $s$ ranges over $\mathbb{Z}_q$. By the
pigeonhole principle, there exist distinct $s \ne s'$ in $\mathbb{Z}_q$ such that $\lfloor \overline{sd_i}/b_i \rfloor = \lfloor \overline{s'd_i}/b_i \rfloor$ for all $i \in [k]$, since
$\prod_{i=1}^k \frac{q}{b_i} = (q/A)\,2^{-k} \prod_{i=1}^k a_i < q$. Letting $T = s' - s$, we then have that $|\overline{Td_i}| \le b_i = 2(q/a_i) \cdot (A/q)^{1/k}$.
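For small parameters, a multiplier $T$ as in Lemma 7 can simply be found by exhaustive search. The sketch below does this for the illustrative (assumed) instance $q = 101$, $(d_1, d_2) = (17, 43)$ and $(a_1, a_2) = (5, 4)$; the helper names are not from the paper.

```python
import math

def centered(v, q):
    """Representative of v modulo q in the interval (-q/2, q/2]."""
    r = v % q
    return r - q if r > q / 2 else r

def find_multiplier(q, d, a):
    """Search for 1 <= T < q with |centered(T * d_i)| <= 2 (q/a_i) (A/q)^{1/k} for all i.

    Returns (T, representatives, bounds), or None if no T is found."""
    k = len(d)
    A = math.prod(a)
    bounds = [2 * (q / ai) * (A / q) ** (1 / k) for ai in a]
    for T in range(1, q):
        reps = [centered(T * di, q) for di in d]
        if all(abs(r) <= b for r, b in zip(reps, bounds)):
            return T, reps, bounds
    return None

q, d, a = 101, (17, 43), (5, 4)
result = find_multiplier(q, d, a)
print(result)
```

The exhaustive loop costs $O(qk)$ operations, which is fine for a demonstration; the pigeonhole proof itself is non-constructive and gives no faster search.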

We now proceed to the proof of Theorem 5. The basic idea is to use Theorem 6 to argue that there is a large
subset $J$ of the set of inverses $I^{-1}$ which is contained in a proper GAP $P$ of bounded dimension $k$ such that $\Sigma^{[s]}(J)$
contains a proper translate of $csP$. We then exploit the nature of the set of inverses to argue that $k$ must in fact
be 1, that is, $P$ is simply a progression, from which the required result quickly follows.
Proof of Theorem 5. Let $s = q^{\varepsilon/2}$. Let $\overline{I}$ and $\overline{I^{-1}}$ denote the set of integer representations (in $(-q/2, q/2]$) of $I$ and
$I^{-1}$. By Theorem 6, there is $c$ depending only on $\varepsilon$ such that we can find $J \subseteq \overline{I^{-1}}$ of size at least $|I| - c^{-1}s\log|I| = (1-o(1))|I|$
and a proper GAP $P$ of dimension $k = O_{\varepsilon}(1)$ such that $J \cup \{0\} \subseteq P$ and $\Sigma^{[s]}(J)$ contains a translate
of $csP$ which is proper.
By expanding $P$ by a factor of up to $2^k$ if necessary, we can write $P = \sum_{u=1}^k [-a_u, a_u]d_u$. Let $A = \prod_{u=1}^k a_u$.
With these $a_u$ and $d_u$, we apply Lemma 7 to find a value of $T$ satisfying the conclusions of that lemma and let
$T \cdot P = \{Tx : x \in P\}$. Note that, for any $j \in \overline{T \cdot P}$, $|j| \le \sum_{u=1}^k a_u|d_u'| \le 2kq(A/q)^{1/k}$.
Claim. Let $N$ denote the number of solutions to the equation $i \cdot j = T \pmod q$ with $i \in I$ and $j \in \overline{T \cdot J}$. Then
\[ |I|/2 \le N < q^{C/\log\log q} \cdot 8kq^{\varepsilon}(A/q)^{1/k}. \tag{7} \]

We first complete the proof of Theorem 5 assuming the claim. From (7) and the assumption that $|I| \ge \delta q^{\varepsilon}$, we
deduce that
\[ A \ge \delta^k(16k)^{-k}q^{1 - Ck/\log\log q}. \]
On the other hand, since $\Sigma^{[s]}(J)$ contains a translate of $csP$ with $csP$ proper and $\Sigma^{[s]}(J) \subseteq (-sq/2, sq]$, we have
that
\[ c^ks^kA \le c^ks^k|P| = |csP| \le |\Sigma^{[s]}(J)| \le sq. \]
Hence, $A \le c^{-k}qs^{1-k} = c^{-k}q^{1-(k-1)\varepsilon/2}$. Provided $q$ is sufficiently large in terms of $\delta, \varepsilon$, these two estimates on $A$
together imply that $k = 1$. Therefore, $csP$ is an arithmetic progression of length $\Omega(As) > q$. Furthermore, $P$ must
have common difference coprime with $q$ as $J \subseteq \overline{I^{-1}}$ is contained in $P$. Hence, any translate of $csP$ covers all residue
classes in $\mathbb{Z}_q$. This finishes the proof of the theorem assuming the claim.
It remains to verify the claim. Since each $j_0 \in J$ has $j_0^{-1} \in I \pmod q$, the number of solutions to $i \cdot j = T \pmod q$
with $i \in I$, $j = \overline{T \cdot j_0} \in \overline{T \cdot J} \subseteq \overline{T \cdot P}$ is at least $|J| = (1-o(1))|I| \ge |I|/2$. Since $I \subseteq [q^{\varepsilon}, 2q^{\varepsilon}]$, we have that
\[ |i \cdot j| \le 2q^{\varepsilon} \cdot 2kq(A/q)^{1/k}. \]
As such, if $i \cdot j = T \pmod q$, then $i \cdot j = qx + T$, where $0 \le |x| \le 4kq^{\varepsilon}(A/q)^{1/k}$. But the number of solutions to
the equation $i \cdot j = qx + T$ with $0 \le |x| \le 4kq^{\varepsilon}(A/q)^{1/k}$ is bounded above by
\[ \sum_{|x| \le 4kq^{\varepsilon}(A/q)^{1/k}} \tau(qx + T) < q^{C/\log\log q} \cdot 8kq^{\varepsilon}(A/q)^{1/k}, \]
where $\tau(n)$ denotes the number of divisors of $n$ and we have used the standard bound $\tau(n) \le q^{C/\log\log q}$ for an
absolute constant $C$ and all $n \le q^2$. This completes the proof of the claim.

4 Absorption
We are now ready to prove Theorem 1 in the following explicit form. We recall that a positive integer n is t-smooth
if all of its prime factors are at most t and t-powersmooth if all of its prime power factors are at most t.
Theorem 8. Let $\varepsilon > 0$ be sufficiently small. Then there exists $\xi > 0$ such that if $x \le \xi \ln n$ is a rational whose
denominator is $(n^{1-\varepsilon}/2)$-powersmooth, then the number of subsets $A \subseteq [n]$ with $x = \sum_{a \in A} 1/a$ is at least $2^{c_xn - c_{\varepsilon}n}$,
where $c_{\varepsilon} \to 0$ as $\varepsilon \to 0$ and
\[ c_x := \int_0^1 h\left(\frac{1}{1+e^{\lambda/y}}\right) dy \]
with $\lambda$ the unique real number such that
\[ \int_0^1 \frac{1}{y(1+e^{\lambda/y})}\, dy = x. \]
We first record a simple lemma guaranteeing that most integers at most $n$ are $(n^{1-\varepsilon}/2)$-powersmooth.

Lemma 9. For $\delta$ sufficiently small, $n$ sufficiently large in terms of $\delta$ and $t = n^{1-\delta}$, at least $(1-2\delta)n$ positive
integers at most $n$ are $t$-powersmooth.
Proof. It is well known that if $t = n^u$, the number of $t$-smooth numbers up to $n$ is asymptotic to $\psi(u)n$, where $\psi$
is the Dickman function taking values in $(0,1)$ for $u \in (0,1)$. Moreover, for $u > 1/2$, $\psi(u) = 1 + \ln u$. Thus, for
$u = 1-\delta$ and $t = n^{1-\delta}$, at least $(1 + \ln(1-\delta) - o(1))n \ge (1 - \frac{3}{2}\delta)n$ positive integers at most $n$ are $n^{1-\delta}$-smooth.
Among the $t$-smooth numbers, the only ones that are not $t$-powersmooth are those divisible by a prime power $p^{\alpha}$
where $p \le t$ but $p^{\alpha} > t$. Using the prime number theorem, we may upper bound the total count of such exceptional
smooth numbers by
\[ \sum_{p \le t} \left\lfloor \frac{n}{t} \right\rfloor = \pi(t)\left\lfloor \frac{n}{t} \right\rfloor = o(n). \]
Hence, at least $(1 - \frac{3}{2}\delta)n - o(n) \ge (1-2\delta)n$ positive integers at most $n$ are $t$-powersmooth, as required.
Finally, we prove Theorem 8. Recall the notation that, for $A \subseteq [n]$, $s(A) = \sum_{a \in A} 1/a$.
Proof of Theorem 8. By choosing $c_{\varepsilon}$ suitably, we can assume that $n$ is sufficiently large in terms of $\varepsilon$. Let $L$ be
sufficiently large, assuming in particular that Theorem 5 applies for $\varepsilon$ as in the statement of the theorem, $\delta = \frac{1}{2}$
and all $q > L$. Let $K$ denote the least common multiple of all prime powers at most $L$. We first reserve the set $R$
of multiples of $K$ in $[n]$. Let $\mathcal{P}(q) = q \cdot [q^{\varepsilon}, 2q^{\varepsilon}] \setminus R$ and $\mathcal{P} = \bigcup_q \mathcal{P}(q)$, where $q$ ranges over all prime powers at most
$n^{1-\varepsilon}/2$. Here $q \cdot S = \{qs : s \in S\}$ and the notation $[q^{\varepsilon}, 2q^{\varepsilon}]$ refers to the set of integers in this interval. Let $S$ denote
the set of $(n^{1-\varepsilon}/2)$-powersmooth numbers at most $n$ and $U = S \setminus (R \cup \mathcal{P})$. Lemma 9 implies that $|S| \ge (1 - O(\varepsilon))n$
and we also have that $|\mathcal{P} \cap [n]| \le \sum_{q \le n^{1-\varepsilon}} 2q^{\varepsilon} \le \frac{4n}{\log n}$. Thus,
\[ n - |U| \le n/K + O(\varepsilon n). \tag{8} \]

Let $\eta > 0$ be a constant to be chosen later. By Lemma 2, applied with $x$ replaced by $(1-\eta)x$, we can find many
subsets of $U$ whose sums of reciprocals are at most $(1-\eta)x$. Indeed, the number of such subsets is at least
\[ 2^{H(P((1-\eta)x)) - (n-|U|) - O(\sqrt{n/c_{(1-\eta)x,n}})}. \]
Fix one such sum corresponding to a set $A_0 \subseteq [n]$, and let $x_0 = x - s(A_0) \ge \eta x$. Consider the following procedure,
where at each step $i$ we have a real number $x_i$ and a set $A_i$ for which $s(A_i) = x_i$:
1. In decreasing order over the prime powers larger than $L$, consider the largest prime power $q = q_i \le n^{1-\varepsilon}/2$ of
a prime $p = p_i$ which appears as a factor of the denominator of $x_i$. We then find $B_i \subseteq (1/q) \cdot \mathcal{P}(q)$ of size at
most $q^{\varepsilon/2}$ such that, for $x_i = \frac{u_i}{v_i}$ with $u_i, v_i$ coprime, $s(B_i) = -\frac{u_i}{v_i/q} \pmod q$. We say that step $i$ succeeds if
we can find such a $B_i$. If it does succeed, we update $x_{i+1} = x_i - s(q \cdot B_i)$ and $A_{i+1} = A_i \cup q \cdot B_i$, noting by
our choice that no nonzero power of $p$ divides the denominator of $x_{i+1}$. Furthermore, any new prime power
divisor of $x_{i+1}$ is at most $2q^{\varepsilon}$.
2. We iterate until all the prime powers $q_i > L$ have been processed. At this point, the final output $x_f$ is a
rational number whose denominator is $L$-powersmooth. We then find a subset of the reservoir $R$ whose sum
of inverses is equal to $x_f$.
The following claim guarantees that the procedure above succeeds.
Claim. For each $i$, step $i$ succeeds. Furthermore, $s(q_i \cdot B_i) \le q_i^{-1-\varepsilon/2}$ and, for some absolute constant $C > 0$,
\[ |x_f - x_0| \le C\varepsilon^{-1}L^{-\varepsilon/2}. \tag{9} \]
Proof. Theorem 5 implies immediately that step $i$ always succeeds. Furthermore, by our choice of $B_i$,
\[ s(q_i \cdot B_i) \le \frac{q_i^{\varepsilon/2}}{q_i^{1+\varepsilon}} = q_i^{-1-\varepsilon/2}. \]
The estimate (9) follows since the $q_i$ are distinct integers between $L$ and $n$, so
\[ |x_f - x_0| = \sum_i s(q_i \cdot B_i) \le \sum_{L < q_i < n} q_i^{-1-\varepsilon/2} \le \sum_{L < m < n} m^{-1-\varepsilon/2} < O(\varepsilon^{-1}L^{-\varepsilon/2}), \]
as required.

We ensure that $L, \eta > 0$ are chosen (depending on $\varepsilon$) so that $C\varepsilon^{-1}L^{-\varepsilon/2} < \eta x/2$. From (9), we have that $x \ge x_f$
and $x - x_f$ is a positive rational number at most $x$ whose denominator is $L$-powersmooth. As such, we have that
$K(x - x_f)$ is a positive integer with $K(x - x_f) \le Kx$. We now note that there exists a subset $D \subseteq [n/K]$ such
that $s(D) = K(x - x_f)$, where we use the assumption that $x \le \xi \ln n$ for $\xi$ chosen sufficiently small in $\varepsilon$, so that
$Kx < \varepsilon\log(n/K)$. To see that this is the case, one may, for example, make use of Croot's result [6] that the number one can
always be written as the sum of reciprocals of numbers from any interval of the form $[t, (e+o(1))t]$. This allows us
to iteratively remove $K(x - x_f)$ disjoint subsets from $[n/K]$, the sum of the reciprocals of each of which is one. We
then set their union to be $D$, noting that $K \cdot D \subseteq R$ is disjoint from $U$ and $\mathcal{P}$.
We then have $x = \sum_{d \in D} \frac{1}{Kd} + \sum_i s(q_i \cdot B_i) + s(A_0)$ and the number of such distinct representations is at least
the number of choices for $A_0$, which is bounded below by
\[ 2^{H(P((1-\eta)x)) - (n-|U|) - O(\sqrt{n/c_{(1-\eta)x,n}})} \ge 2^{H(P(x)) - c_{\varepsilon}n}, \]
for an appropriate constant $c_{\varepsilon}$ with $c_{\varepsilon} \to 0$ as $\varepsilon \to 0$, where we have used Lemma 2 and (8). This completes the
proof of Theorem 8.

Note added. As we completed this paper, we learned that a result similar to our Theorem 1 was obtained
simultaneously and independently, though using rather different methods, by Yang P. Liu and Mehtaab Sawhney [9].
Acknowledgements. We are grateful to the American Institute of Mathematics for hosting the SQuaREs project
at which this work was initiated. Research supported by NSF Awards DMS-2054452 and DMS-2348859 (David
Conlon), NSF Award DMS-2154129 (Jacob Fox), NSF Award DMS-2103154 (Xiaoyu He), NSF Awards DMS-
1952767 and DMS-2153576 (Dhruv Mubayi), a Clay Research Fellowship and a Stanford Science Fellowship (Huy
Tuan Pham), an NSF CAREER Award and NSF Awards DMS-1952786 and DMS-2246847 (Andrew Suk) and NSF
Award DMS-1800332 (Jacques Verstraëte).

References
[1] A. C. Berry, The accuracy of the Gaussian approximation to the sum of independent variates, Trans. Amer.
Math. Soc. 49 (1941), 122–136.
[2] T. F. Bloom, On a density conjecture about unit fractions, preprint available at arXiv:2112.03726 [math.NT].
[3] T. F. Bloom, www.erdosproblems.com, March 2024.
[4] T. F. Bloom and C. Elsholtz, Egyptian fractions, Nieuw Arch. Wiskd. 23 (2022), 237–245.
[5] D. Conlon, J. Fox and H. T. Pham, Homogeneous structures in subset sums and non-averaging sets, preprint
available at arXiv:2311.01416 [math.CO].
[6] E. S. Croot, On unit fractions with denominators in short intervals, Acta Arith. 99 (2001), 99–114.
[7] C.-G. Esseen, On the Liapunoff limit of error in the theory of probability, Arkiv för Matematik, Astronomi och
Fysik A28 (1942), 1–19.
[8] P. Erdős and R. L. Graham, Old and new problems and results in combinatorial number theory, Monogr.
Enseign. Math., 28, Université de Genève, L’Enseignement Mathématique, Geneva, 1980, 128 pp.
[9] Y. P. Liu and M. Sawhney, On further questions regarding unit fractions, preprint available at arXiv:2404.07113
[math.NT].
[10] I. G. Shevtsova, An improvement of convergence rate estimates in the Lyapunov theorem, Dokl. Math. 82
(2010), 862–864.
[11] S. Steinerberger, On a problem involving unit fractions, preprint available at arXiv:2403.17041 [math.CO].
