1501.05438v2
Preface
Acknowledgements
1 Introduction
1.1 History and new developments
1.2 The circle method: Fourier analysis on Z
1.3 The major arcs M
1.3.1 What do we really know about L-functions and their zeros?
1.3.2 Estimates of f̂(α) for α in the major arcs
1.4 The minor arcs m
1.4.1 Qualitative goals and main ideas
1.4.2 Combinatorial identities
1.4.3 Type I sums
1.4.4 Type II, or bilinear, sums
1.5 Integrals over the major and minor arcs
1.6 Some remarks on computations
I Minor arcs
3 Introduction
3.1 Results
3.2 Comparison to earlier work
3.3 Basic setup
3.3.1 Vaughan’s identity
4 Type I sums
4.1 Trigonometric sums
4.2 Type I estimates
4.2.1 Type I: variations
5 Type II sums
5.1 The sum S_1: cancellation
5.1.1 Reduction to a sum with µ
5.1.2 Explicit bounds for a sum with µ
5.1.3 Estimating the triple sum
5.2 The sum S_2: the large sieve, primes and tails
14 Conclusion
14.1 The ℓ_2 norm over the major arcs: explicit version
14.2 The total major-arc contribution
14.3 The minor-arc total: explicit version
14.4 Conclusion: proof of main theorem
IV Appendices
A Norms of smoothing functions
A.1 The decay of a Mellin transform
A.2 The difference η_+ − η_◦ in ℓ_2 norm
A.3 Norms involving η_+
A.4 Norms involving η′_+
A.5 The ℓ_∞-norm of η_+
The ternary Goldbach conjecture (or three-prime conjecture) states that every odd
number n greater than 5 can be written as the sum of three primes. The purpose of this
book is to give the first full proof of this conjecture.
The proof builds on the great advances made in the early 20th century by Hardy and
Littlewood (1922) and Vinogradov (1937). Progress since then has been more gradual.
In some ways, it was necessary to clear the board and start work using only the main
existing ideas towards the problem, together with techniques developed elsewhere.
Part of the aim has been to keep the exposition as accessible as possible, with
an emphasis on qualitative improvements and new technical ideas that should be of
use elsewhere. The main strategy was to give an analytic approach that is efficient,
relatively clean, and, as it must be for this problem, explicit; the focus does not lie in
optimizing explicit constants, or in performing calculations, necessary as these tasks
are.
Organization. In the introduction, after a summary of the history of the problem,
we will go over a detailed outline of the proof. The rest of the book is divided in three
parts, structured so that they can be read independently: the first two parts do not refer
to each other, and the third part uses only the main results (clearly marked) of the first
two parts.
As is the case in most proofs involving the circle method, the problem is reduced to
showing that a certain integral over the “circle” R/Z is non-zero. The circle is divided
into major arcs and minor arcs. In Part I – in some ways the technical heart of the proof
– we will see how to give upper bounds on the integrand when α is in the minor arcs.
Part II will provide rather precise estimates for the integrand when the variable α is in
the major arcs. Lastly, Part III shows how to use these inputs as well as possible to
estimate the integral.
Each part and each chapter starts with a general discussion of the strategy and
the main ideas involved. Some of the more technical bounds and computations are
relegated to the appendices.
[Figure: diagram of the chapters, with nodes numbered 1, 2, 4, 5, 8, 9, 11, 12 and 14; surviving labels: “Mellin transform of twisted Gaussian”, “Type I sums”, “Smoothing functions and their use”, “Conclusion” (Chapter 14).]
Acknowledgements
The author is very thankful to D. Platt, who, working in close coordination with him,
provided GRH verifications in the necessary ranges, and also helped him with the usage
of interval arithmetic. He is also deeply grateful to O. Ramaré, who, in reply to his
requests, prepared and sent for publication several auxiliary results, and who otherwise
provided much-needed feedback.
The author is also much indebted to A. Booker, B. Green, R. Heath-Brown, H.
Kadiri, D. Platt, T. Tao and M. Watkins for many discussions on Goldbach’s problem
and related issues. Several historical questions became clearer due to the help
of J. Brandes, K. Gong, R. Heath-Brown, Z. Silagadze, R. Vaughan and T. Wooley.
Additional references were graciously provided by R. Bryant, S. Huntsman and I.
Rezvyakova. Thanks are also due to B. Bukh, A. Granville and P. Sarnak for their
valuable advice.
The introduction is largely based on the author’s article for the Proceedings of the
2014 ICM [Hel14b]. That article, in turn, is based in part on the informal note [Hel13b],
which was published in Spanish translation ([Hel13a], translated by M. A. Morales and
the author, and revised with the help of J. Cilleruelo and M. Helfgott) and in a French
version ([Hel14a], translated by M. Bilu and revised by the author). The proof first
appeared as a series of preprints: [Helb], [Hela], [Helc].
Travel and other expenses were funded in part by the Adams Prize and the Philip
Leverhulme Prize. The author’s work on the problem started at the Université de
Montréal (CRM) in 2006; he is grateful to both the Université de Montréal and the
École Normale Supérieure for providing pleasant working environments. During the
last stages of the work, travel was partly covered by ANR Project Caesar No. ANR-
12-BS01-0011.
The present work would most likely not have been possible without free and publicly
available software: SAGE, PARI, Maxima, gnuplot, VNODE-LP, PROFIL / BIAS,
and, of course, LaTeX, Emacs, the gcc compiler and GNU/Linux in general. Some exploratory
work was done in SAGE and Mathematica. Rigorous calculations used either
D. Platt’s interval-arithmetic package (based in part on Crlibm) or the PROFIL/BIAS
interval arithmetic package underlying VNODE-LP.
The calculations contained in this paper used a nearly trivial amount of resources;
they were all carried out on the author’s desktop computers at home and work. However,
D. Platt’s computations [Plab] used a significant amount of resources, kindly donated
to D. Platt and the author by several institutions. This crucial help was provided
by MesoPSL (affiliated with the Observatoire de Paris and Paris Sciences et Lettres),
Université de Paris VI/VII (UPMC - DSI - Pôle Calcul), University of Warwick (thanks
to Bill Hart), University of Bristol, France Grilles (French National Grid Infrastructure,
DIRAC national instance), Université de Lyon 1 and Université de Bordeaux 1. Both
D. Platt and the author would like to thank the donating organizations, their technical
staff, and all those who helped to make these resources available to them.
Chapter 1
Introduction
The question we will discuss, or one similar to it, seems to have been first posed by
Descartes, in a manuscript published only centuries after his death [Des08, p. 298].
Descartes states: “Sed & omnis numerus par fit ex uno vel duobus vel tribus primis”
(“But also every even number is made out of one, two or three prime numbers.”¹) This
statement comes in the middle of a discussion of sums of polygonal numbers, such as
the squares.
Statements on sums of primes and sums of values of polynomials (polygonal numbers,
powers n^k, etc.) have since shown themselves to be much more than mere curiosities
– and not just because they are often very difficult to prove. Whereas the study
of sums of powers can rely on their algebraic structure, the study of sums of primes
leads to the realization that, from several perspectives, the set of primes behaves much
like the set of integers, or like a random set of integers. (It also leads to the realization
that this is very hard to prove.)
If, instead of the primes, we had a random set of odd integers S whose density –
an intuitive concept that can be made precise – equaled that of the primes, then we
would expect to be able to write every odd number as a sum of three elements of S,
and every even number as the sum of two elements of S. We would have to check by
hand whether this is true for small odd and even numbers, but it is relatively easy to
show that, after a long enough check, it would be very unlikely that there would be any
exceptions left among the infinitely many cases left to check.
The question, then, is in what sense we need the primes to be like a random set of
integers; in other words, we need to know what we can prove about the regularities of
the distribution of the primes. This is one of the main questions of analytic number
theory; progress on it has been very slow and difficult.
Fourier analysis expresses information on the distribution of a sequence in terms
of frequencies. In the case of the primes, what may be called the main frequencies –
those in the major arcs – correspond to the same kind of large-scale distribution that
is encoded by L-functions, the family of functions to which the Riemann zeta function
belongs. On some of the crucial questions on L-functions, the limits of our knowledge
have barely budged in the last century. There is something relatively new now, namely,
rigorous numerical data of non-negligible scope; still, such data is, by definition, finite,
and, as a consequence, its range of applicability is very narrow. Thus, the real question
in the major-arc regime is how to use well the limited information we do have on the
large-scale distribution of the primes. As we will see, this requires delicate work on
explicit asymptotic analysis and smoothing functions.
¹ Thanks are due to J. Brandes and R. Vaughan for a discussion on a possible ambiguity in the Latin
wording. Descartes’ statement is mentioned (with a translation much like the one given here) in Dickson’s
History [Dic66, Ch. XVIII].
Outside the main frequencies – that is, in what are called the minor arcs – estimates
based on L-functions no longer apply, and what is remarkable is that one can say
anything meaningful on the distribution of the primes. Vinogradov was the first to give
unconditional, non-trivial bounds, showing that there are no great irregularities in the
minor arcs; this is what makes them “minor”. Here the task is to give sharper bounds
than Vinogradov. It is in this regime that we can genuinely say that we learn a little
more about the distribution of the primes, based on what is essentially an elementary
and highly optimized analytic-combinatorial analysis of exponential sums, i.e., Fourier
coefficients given by series (supported on the primes, in our case).
The circle method reduces an additive problem – that is, a problem on sums, such
as sums of primes, powers, etc. – to the estimation of an integral on the space of
frequencies (the “circle” R/Z). In the case of the primes, as we have just discussed, we
have precise estimates on the integrand on part of the circle (the major arcs), and upper
bounds on the rest of the circle (the minor arcs). Putting them together efficiently to
give an estimate on the integral is a delicate matter; we leave it for the last part, as it
is really what is particular to our problem, as opposed to being of immediate general
relevance to the study of the primes. As we shall see, estimating the integral well does
involve using – and improving – general estimates on the variance of irregularities in
the distribution of the primes, as given by the large sieve.
In fact, one of the main general lessons of the proof is that there is a very close
relationship between the circle method and the large sieve; we will use the large sieve
not just as a tool – which we shall, incidentally, sharpen in certain contexts – but as a
source for ideas on how to apply the circle method more effectively.
This has been an attempt at a first look from above. Let us now undertake a more
leisurely and detailed overview of the problem and its solution.
We would now say “every integer greater than 1”, since we no longer consider 1 to
be a prime number. Moreover, the conjecture is nowadays split into two:
• the weak, or ternary, Goldbach conjecture states that every odd integer greater
than 5 can be written as the sum of three primes;
• the strong, or binary, Goldbach conjecture states that every even integer greater
than 2 can be written as the sum of two primes.
As their names indicate, the strong conjecture implies the weak one (easily: subtract 3
from your odd number n, then express n − 3 as the sum of two primes).
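The reduction just described can be tried out on small cases. The following is a minimal sketch, not part of the proof: the helper names are ours, and the search for a two-prime decomposition is a naive one that is only sensible for small n.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small n used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def two_primes(m: int):
    """Return primes (p, q) with p + q == m, or None if there are none."""
    for p in range(2, m // 2 + 1):
        if is_prime(p) and is_prime(m - p):
            return (p, m - p)
    return None


def three_primes_via_binary(n: int):
    """For odd n > 5: subtract 3, then split the even number n - 3 into two primes."""
    pair = two_primes(n - 3)
    return None if pair is None else (3,) + pair


if __name__ == "__main__":
    for n in (7, 9, 101, 999):
        print(n, three_primes_via_binary(n))
```

Of course, a search of this kind only confirms individual cases; it says nothing about all n at once.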
The strong conjecture remains out of reach. A short while ago – the first complete
version appeared on May 13, 2013 – the author proved the weak Goldbach conjecture.
Theorem 1.1.1. Every odd integer greater than 5 can be written as the sum of three
primes.
In 1937, I. M. Vinogradov proved [Vin37] that the conjecture is true for all odd
numbers n larger than some constant C. (Hardy and Littlewood had proved the same
statement under the assumption of the Generalized Riemann Hypothesis, which we
shall have the chance to discuss later.)
It is clear that a computation can verify the conjecture only for n ≤ c, c a constant:
computations have to be finite. What can make a result coming from analytic number
theory be valid only for n ≥ C?
An analytic proof, generally speaking, gives us more than just existence. In this
kind of problem, it gives us more than the possibility of doing something (here, writing
an integer n as the sum of three primes). It gives us a rigorous estimate for the number
of ways in which this something is possible; that is, it shows us that this number of
ways equals
main term + error term, (1.1)
where the main term is a precise quantity f (n), and the error term is something whose
absolute value is at most another precise quantity g(n). If f (n) > g(n), then (1.1) is
non-zero, i.e., we will have shown the existence of a way to write our number as the
sum of three primes.
(Since what we truly care about is existence, we are free to weigh different ways
of writing n as the sum of three primes however we wish – that is, we can decide that
some primes “count” twice or thrice as much as others, and that some do not count at
all.)
Typically, after much work, we succeed in obtaining (1.1) with f (n) and g(n) such
that f (n) > g(n) asymptotically, that is, for n large enough. To give a highly simplified
example: if, say, f(n) = n^2 and g(n) = 100 n^{3/2}, then f(n) > g(n) for n > C, where
C = 10^4, and so the number of ways (1.1) is positive for n > C.
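The toy example can be checked directly. In the sketch below (function names are ours), the comparison n^2 > 100 n^{3/2} is carried out exactly, by comparing squares of both sides, so that no floating-point rounding enters.

```python
def main_exceeds_error(n: int) -> bool:
    """Exact test of n**2 > 100 * n**1.5 for n >= 1, done by comparing squares."""
    return n**4 > 100**2 * n**3


def first_n_where_main_wins(limit: int):
    """Smallest n <= limit with main term > error bound, or None."""
    for n in range(1, limit + 1):
        if main_exceeds_error(n):
            return n
    return None


if __name__ == "__main__":
    # The inequality first holds just past C = 10**4.
    print(first_n_where_main_wins(20000))
```

In this toy model, the finitely many cases n ≤ 10^4 are exactly the ones that would have to be checked by hand or by computer.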
We want a moderate value of C, that is, a C small enough that all cases n ≤ C can
be checked computationally. To ensure this, we must make the error term bound g(n)
as small as possible. This is our main task. A secondary (and sometimes neglected)
possibility is to rig the weights so as to make the main term f (n) larger in comparison
to g(n); this can generally be done only up to a certain point, but is nonetheless very
helpful.
As we said, the first unconditional proof that odd numbers n ≥ C can be written
as the sum of three primes is due to Vinogradov. Analytic bounds fall into several
categories, or stages; quite often, successive versions of the same theorem will go
through successive stages.
1. An ineffective result shows that a statement is true for some constant C, but gives
no way to determine what the constant C might be. Vinogradov’s first proof of
his theorem (in [Vin37]) is like this: it shows that there exists a constant C such
that every odd number n > C is the sum of three primes, yet gives us no hope of
finding out what the constant C might be.² Many proofs of Vinogradov’s result
in textbooks are also of this type.
2. An effective, but not explicit, result shows that a statement is true for some
unspecified constant C in a way that makes it clear that a constant C could
in principle be determined following and reworking the proof with great care.
Vinogradov’s later proof ([Vin47], translated in [Vin54]) is of this nature. As
Chudakov [Chu47, §IV.2] pointed out, the improvement on [Vin37] given by
Mardzhanishvili [Mar41] already had the effect of making the result effective.³
3. An explicit result gives a value of C. According to [Chu47, p. 201], the first
explicit version of Vinogradov’s result was given by Borozdkin in his unpublished
doctoral dissertation, written under the direction of Vinogradov (1939):
C = exp(exp(exp(41.96))). Such a result is, by definition, also effective.
Borozdkin later [Bor56] gave the value C = e^{e^{16.038}}, though he does not seem to
have published the proof. The best – that is, smallest – value of C known before
the present work was that of Liu and Wang [LW02]: C = 2 · 10^1346.
4. What we may call an efficient proof gives a reasonable value for C – in our case,
a value small enough that checking all cases up to C is feasible.
How far were we from an efficient proof? That is, what sort of computation could
ever be feasible? The situation was paradoxical: the conjecture was known above an
explicit C, but C = 2 · 10^1346 is so large that it could not be said that the problem could
be attacked by any foreseeable computational means within our physical universe. (A
truly brute-force verification up to C takes at least C steps; a cleverer verification takes
well over √C steps. The number of picoseconds since the beginning of the universe is
less than 10^30, whereas the number of protons in the observable universe is currently
estimated at ∼ 10^80 [Shu92]; this limits the number of steps that can be taken in
any currently imaginable computer, even if it were to do parallel processing on an
astronomical scale.) Thus, the only way forward was a series of drastic improvements
in the mathematical, rather than computational, side.
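The orders of magnitude involved are easy to compare on a logarithmic scale. The sketch below uses the figures quoted above; the "budget" of one step per proton per picosecond is our own crude illustrative model, not a claim about any actual machine.

```python
from math import log10

LOG10_C = log10(2) + 1346        # log10 of C = 2 * 10**1346
LOG10_SQRT_C = LOG10_C / 2       # log10 of sqrt(C), the cost of a "clever" check
LOG10_PICOSECONDS = 30           # upper bound quoted in the text
LOG10_PROTONS = 80               # estimate quoted in the text

# Assumption (illustrative only): every proton performs one step per picosecond.
LOG10_BUDGET = LOG10_PICOSECONDS + LOG10_PROTONS


def shortfall_orders() -> float:
    """Orders of magnitude by which sqrt(C) exceeds the step budget."""
    return LOG10_SQRT_C - LOG10_BUDGET


if __name__ == "__main__":
    print(f"sqrt(C) is about 10^{LOG10_SQRT_C:.1f}; budget is 10^{LOG10_BUDGET}")
    print(f"shortfall: about {shortfall_orders():.0f} orders of magnitude")
```

Even under this generous model, the gap is several hundred orders of magnitude, which is why the improvement had to come from the mathematical side.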
I gave a proof with C = 10^29 in May 2013. Since D. Platt and I had verified
the conjecture for all odd numbers up to n ≤ 8.8 · 10^30 by computer [HP13], this
established the conjecture for all odd numbers n.
² Here, as is often the case in ineffective results in analytic number theory, the underlying issue is that of
Siegel zeros, which are believed not to exist, but have not been shown not to; the strongest bounds on (i.e.,
against) such zeros are ineffective, and so are all of the many results using such estimates.
³ The proof in [Mar41] combined the bounds in [Vin37] with a more careful accounting of the effect of
(In December 2013, I reduced C to 10^27. The verification of the ternary Goldbach
conjecture up to n ≤ 10^27 can be done on a home computer over a weekend,
as of the time of writing (2014). It must be said that this uses the verification of the
binary Goldbach conjecture for n ≤ 4 · 10^18 [OeSHP14], which itself required computational
resources far outside the home-computing range. Checking the conjecture
up to n ≤ 10^27 was not even the main computational task that needed to be accomplished
to establish the Main Theorem – that task was the finite verification of zeros of
L-functions in [Plab], a general-purpose computation that should be useful elsewhere.)
What was the strategy of the proof? The basic framework is the one pioneered by
Hardy and Littlewood for a variety of problems – namely, the circle method, which, as
we shall see, is an application of Fourier analysis over Z. (There are other, later routes
to Vinogradov’s result; see [HB85], [FI98] and especially the recent work [Sha14],
which avoids using anything about zeros of L-functions inside the critical strip.) Vinogradov’s
proof, like much of the later work on the subject, was based on a detailed
analysis of exponential sums, i.e., Fourier transforms over Z. So is the proof that we
will sketch.
At the same time, the distance between 2 · 10^1346 and 10^27 is such that we cannot
hope to get to 10^27 (or any other reasonable constant) by fine-tuning previous work.
Rather, we must work from scratch, using the basic outline in Vinogradov’s original
proof and other, initially unrelated, developments in analysis and number theory (notably,
the large sieve). Merely improving constants will not do; rather, we must do
qualitatively better than previous work (by non-constant factors) if we are to have any
chance to succeed. It is on these qualitative improvements that we will focus.
***
It is only fair to review some of the progress made between Vinogradov’s time and
ours. Here we will focus on results; later, we will discuss some of the progress made
in the techniques of proof. See [Dic66, Ch. XVIII] for the early history of the problem
(before Hardy and Littlewood); see R. Vaughan’s ICM lecture notes on the ternary
Goldbach problem [Vau80] for some further details on the history up to 1978.
In 1933, Schnirelmann proved [Sch33] that every integer n > 1 can be written as
the sum of at most K primes for some unspecified constant K. (This pioneering work
is now considered to be part of the early history of additive combinatorics.) In 1969,
Klimov gave an explicit value for K (namely, K = 6 · 10^9); he later improved the
constant to K = 115 (with G. Z. Piltay and T. A. Sheptickaja) and K = 55. Later,
there were results by Vaughan [Vau77a] (K = 27), Deshouillers [Des77] (K = 26)
and Riesel-Vaughan [RV83] (K = 19).
Ramaré showed in 1995 that every even number n > 1 can be written as the sum of
at most 6 primes [Ram95]. In 2012, Tao proved [Tao14] that every odd number n > 1
is the sum of at most 5 primes.
There have been other avenues of attack towards the strong conjecture. Using ideas
close to those of Vinogradov’s, Chudakov [Chu37], [Chu38], Estermann [Est37] and
van der Corput [van37] proved (independently from each other) that almost every even
number (meaning: all elements of a subset of density 1 in the even numbers) can be
written as the sum of two primes. In 1973, J.-R. Chen showed [Che73] that every even
number n larger than a constant C can be written as the sum of a prime number and
the product of at most two primes (n = p_1 + p_2 or n = p_1 + p_2 p_3). Incidentally,
J.-R. Chen himself, together with T.-Z. Wang, was responsible for the best bounds on
C (for ternary Goldbach) before Liu and Wang: C = exp(exp(11.503)) < 4 · 10^43000
[CW89] and C = exp(exp(9.715)) < 6 · 10^7193 [CW96].
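The double exponentials above are far too large to evaluate directly, but their decimal logarithms are not: log10(exp(exp(t))) = exp(t)/ln 10. The following sketch (helper name ours) uses this to compare the stated bounds with the Liu–Wang constant 2 · 10^1346.

```python
from math import exp, log, log10


def log10_double_exp(t: float) -> float:
    """log10 of exp(exp(t)), computed without forming the huge number itself."""
    return exp(t) / log(10)


if __name__ == "__main__":
    for label, t in (("[CW89]", 11.503), ("[CW96]", 9.715)):
        print(f"{label}: C = exp(exp({t})) is about 10^{log10_double_exp(t):.2f}")
    print(f"[LW02]: C = 2 * 10^1346 is about 10^{log10(2) + 1346:.2f}")
```

On this scale, the step from [CW96] to [LW02] removes several thousand decimal digits from C, and yet leaves it hopelessly far from the range of any computer check.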
Matters are different if one assumes the Generalized Riemann Hypothesis (GRH).
A careful analysis [Eff99] of Hardy and Littlewood’s work [HL22] gives that every
odd number n ≥ 1.24 · 10^50 is the sum of three primes if GRH is true.⁴ According
to [Eff99], the same statement with n ≥ 10^32 was proven in the unpublished doctoral
dissertation of B. Lucke, a student of E. Landau’s, in 1926. Zinoviev [Zin97] improved
this to n ≥ 10^20. A computer check ([DEtRZ97]; see also [Sao98]) showed that the
conjecture is true for n < 10^20, thus completing the proof of the ternary Goldbach
conjecture under the assumption of GRH. What was open until now was, of course, the
problem of giving an unconditional proof.
⁴ In fact, Hardy, Littlewood and Effinger use an assumption somewhat weaker than GRH: they assume
that Dirichlet L-functions have no zeroes satisfying ℜ(s) ≥ θ, where θ < 3/4 is arbitrary. (We will review
Dirichlet L-functions in a minute.)
We can see right away from this that (f ∗ g)(n) can be non-zero only if n can be
written as n = m_1 + m_2 for some m_1, m_2 such that f(m_1) and g(m_2) are non-zero.
Similarly, (f ∗ g ∗ h)(n) can be non-zero only if n can be written as n = m_1 + m_2 + m_3
for some m_1, m_2, m_3 such that f(m_1), g(m_2) and h(m_3) are all non-zero. This
suggests that, to study the ternary Goldbach problem, we define f_1, f_2, f_3 : Z → C so
that they take non-zero values only at the primes.
Hardy and Littlewood defined f_1(n) = f_2(n) = f_3(n) = 0 for n non-prime (and
also for n ≤ 0), and f_1(n) = f_2(n) = f_3(n) = (log n) e^{−n/x} for n prime (where x is
a parameter to be fixed later). Here the factor e^{−n/x} is there to provide “fast decay”,
so that everything converges; as we will see later, Hardy and Littlewood’s choice of
e^{−n/x} (rather than some other function of fast decay) comes across in hindsight as
being very clever, though not quite best-possible. (Their “choice” was, to some extent,
not a choice, but an artifact of their version of the circle method, which was framed
in terms of power series, not in terms of exponential sums with arbitrary smoothing
functions.) The term log n is there for technical reasons – in essence, it makes sense
to put it there because a random integer around n has a chance of about 1/(log n) of
being prime.
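With the weights just described, the triple convolution can be computed by hand for small n. The sketch below does so by brute force; the function names are ours, and the value of x is an arbitrary choice made for illustration. Since all weights are non-negative, the convolution is positive exactly when some triple of primes adds up to n.

```python
from math import exp, log


def prime_flags(limit: int) -> list:
    """flags[m] is True exactly when m is prime, for 0 <= m <= limit (limit >= 1)."""
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, int(limit**0.5) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return flags


def triple_convolution(n: int, x: float) -> float:
    """(f*f*f)(n) for f(m) = (log m) e^{-m/x} at primes m and f(m) = 0 elsewhere."""
    limit = max(n, 2)
    flags = prime_flags(limit)
    f = [log(m) * exp(-m / x) if flags[m] else 0.0 for m in range(limit + 1)]
    return sum(f[a] * f[b] * f[n - a - b]
               for a in range(n + 1) for b in range(n - a + 1))


if __name__ == "__main__":
    for n in (5, 6, 7, 9, 31):
        print(n, triple_convolution(n, x=50.0))
```

Here n = 5 gives zero (the smallest sum of three primes is 2 + 2 + 2 = 6), while every n ≥ 6 tried gives a positive value.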
We can see that (f_1 ∗ f_2 ∗ f_3)(n) ≠ 0 if and only if n can be written as the sum
of three primes. Our task is then to show that (f_1 ∗ f_2 ∗ f_3)(n) (i.e., (f ∗ f ∗ f)(n))
is non-zero for every n larger than a constant C ∼ 10^27. Since the transform of a
convolution equals a product of transforms,
\[
(f_1 * f_2 * f_3)(n) = \int_{\mathbb{R}/\mathbb{Z}} \widehat{f_1 * f_2 * f_3}(\alpha)\, e(\alpha n)\, d\alpha = \int_{\mathbb{R}/\mathbb{Z}} (\widehat{f_1} \widehat{f_2} \widehat{f_3})(\alpha)\, e(\alpha n)\, d\alpha. \tag{1.2}
\]
Our task is thus to show that the integral ∫_{R/Z} (f̂_1 f̂_2 f̂_3)(α) e(αn) dα is non-zero.
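The identity between the convolution and the integral over R/Z can be tested numerically. The sketch below makes two assumptions of ours: the weights are truncated to m ≤ N (so that f̂ is a trigonometric polynomial), and we take f̂(α) = Σ f(m) e(−αm) with e(t) = exp(2πit). Under these assumptions, the integral of f̂^3(α) e(αn) equals the average of the integrand over M = 3N + 1 equally spaced points, so no numerical quadrature error is involved beyond rounding.

```python
import cmath
from math import exp, log


def e(t: float) -> complex:
    """e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * cmath.pi * t)


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n**0.5) + 1))


def truncated_weights(N: int, x: float) -> list:
    """f(m) = (log m) e^{-m/x} at primes m <= N, and 0 elsewhere (truncation is ours)."""
    return [log(m) * exp(-m / x) if is_prime(m) else 0.0 for m in range(N + 1)]


def f_hat(f: list, alpha: float) -> complex:
    """Fourier transform on Z: sum of f(m) e(-alpha m)."""
    return sum(f[m] * e(-alpha * m) for m in range(len(f)))


def conv3_direct(f: list, n: int) -> float:
    """(f*f*f)(n) straight from the definition."""
    N = len(f) - 1
    return sum(f[a] * f[b] * f[n - a - b]
               for a in range(N + 1) for b in range(N + 1)
               if 0 <= n - a - b <= N)


def conv3_via_circle(f: list, n: int) -> float:
    """(f*f*f)(n), 0 <= n <= 3N, as the integral of f_hat^3(alpha) e(alpha n) over R/Z."""
    N = len(f) - 1
    M = 3 * N + 1  # more sample points than the degree of f_hat^3, so the average is exact
    total = sum(f_hat(f, j / M) ** 3 * e(j * n / M) for j in range(M))
    return (total / M).real


if __name__ == "__main__":
    f = truncated_weights(30, 10.0)
    for n in (7, 21, 45):
        print(n, conv3_direct(f, n), conv3_via_circle(f, n))
```

The two columns printed should agree up to rounding error.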
As it happens, f̂(α) is particularly large when α is close to a rational with small
denominator. Moreover, for such α, it turns out we can actually give rather precise
estimates for f̂(α). Define M (called the set of major arcs) to be a union of narrow
arcs around the rationals with small denominator:
\[
\mathfrak{M} = \bigcup_{q \le r} \; \bigcup_{\substack{a \bmod q \\ (a,q)=1}} \left( \frac{a}{q} - \frac{1}{qQ},\; \frac{a}{q} + \frac{1}{qQ} \right),
\]
where Q is a constant times x/r, and r will be set later. (This is a slight simplification:
the major-arc set we will actually use in the course of the proof will be a little different,
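Membership in the simplified major-arc set defined above is easy to test with exact rational arithmetic. The following is a sketch of that simplified definition only; the function names and the sample values of r and Q are ours, chosen purely for illustration.

```python
from fractions import Fraction
from math import gcd


def circle_distance(a: Fraction, b: Fraction) -> Fraction:
    """Distance between a and b on the circle R/Z."""
    d = (a - b) % 1
    return min(d, 1 - d)


def in_major_arcs(alpha: Fraction, r: int, Q) -> bool:
    """True if |alpha - a/q| < 1/(qQ) on R/Z for some q <= r and some a coprime to q."""
    for q in range(1, r + 1):
        for a in range(q):
            if gcd(a, q) == 1 and circle_distance(alpha, Fraction(a, q)) < Fraction(1, q) / Q:
                return True
    return False


if __name__ == "__main__":
    r, Q = 5, 1000  # hypothetical sample parameters
    print(in_major_arcs(Fraction(1, 3) + Fraction(1, 10**5), r, Q))  # very close to 1/3
    print(in_major_arcs(Fraction(41, 100), r, Q))                    # not close to any a/q, q <= 5
```

Everything outside this set is, in this simplified picture, a minor arc.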
By (1.2) and (1.3), this will imply immediately that (f_1 ∗ f_2 ∗ f_3)(n) > 0, and so we
will be done.
The name of circle method is given to the study of additive problems by means of
Fourier analysis over Z, and, in particular, to the use of a subdivision of the circle R/Z
into major and minor arcs to estimate the integral of a Fourier transform. There was
a “circle” already in Hardy and Ramanujan’s work [HR00], but the subdivision into
major and minor arcs is due to Hardy and Littlewood, who also applied their method
to a wide variety of additive problems. (Hence “the Hardy-Littlewood method” as an
alternative name for the circle method.) For instance, before working on the ternary
Goldbach conjecture, they studied the question of whether every n > C can be written
as the sum of kth powers (Waring’s problem). In fact, they used a subdivision into
major and minor arcs to study Waring’s problem, and not for the ternary Goldbach
problem: they had no minor-arc bounds for ternary Goldbach, and their use of GRH
had the effect of making every α ∈ R/Z yield to a major-arc treatment.
Vinogradov worked with finite exponential sums, i.e., f_i compactly supported.
From today’s perspective, it is clear that there are applications (such as ours) in which
it can be more important for f_i to be smooth than compactly supported; still, Vinogradov’s
simplifications were an incentive to further developments. In the case of the
ternary Goldbach problem, his key contribution consisted in the fact that he could
give bounds on f̂(α) for α in the minor arcs without using GRH.
An important note: in the case of the binary Goldbach conjecture, the method fails
at (1.4), and not before; if our understanding of the actual value of f̂_i(α) is at all correct,
it is simply not true in general that
\[
\int_{\mathfrak{m}} |\widehat{f_1}(\alpha)|\,|\widehat{f_2}(\alpha)|\, d\alpha < \int_{\mathfrak{M}} \widehat{f_1}(\alpha) \widehat{f_2}(\alpha)\, e(\alpha n)\, d\alpha.
\]
Let us see why this is not surprising. Set f_1 = f_2 = f_3 = f for simplicity, so that
we have the integral of the square (f̂(α))^2 for the binary problem, and the integral of
the cube (f̂(α))^3 for the ternary problem. Squaring, like cubing, amplifies the peaks
of f̂(α), which are at the rationals of small denominator and their immediate neighborhoods
(the major arcs); however, cubing amplifies the peaks much more than squaring.
This is why, even though the arcs making up M are very narrow, ∫_M (f̂(α))^3 e(αn) dα
is larger than ∫_m |f̂(α)|^3 dα; that explains the name major arcs – they are not large, but
they give the major part of the contribution. In contrast, squaring amplifies the peaks
less, and this is why the absolute value of ∫_M f̂(α)^2 e(αn) dα is in general smaller than
∫_m |f̂(α)|^2 dα. As nobody knows how to prove a precise estimate (and, in particular,
lower bounds) on f̂(α) for α ∈ m, the binary Goldbach conjecture is still very much
out of reach.
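The peaks in question can be seen in a small numerical experiment. The sketch below evaluates f̂(α) = Σ_p (log p) e^{−p/x} e(−αp) at α = 0, at α = 1/3 and at a "generic" point α = √2 − 1; the sign convention, the value x = 2000 and the cutoff p ≤ 30x (beyond which the weights are negligible) are all choices of ours, made only for illustration.

```python
import cmath
from math import exp, log, sqrt


def prime_flags(limit: int) -> list:
    """flags[m] is True exactly when m is prime, for 0 <= m <= limit (limit >= 1)."""
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, int(limit**0.5) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return flags


def f_hat(alpha: float, x: float, flags: list) -> complex:
    """Sum over primes p of (log p) e^{-p/x} e(-alpha p), truncated at len(flags) - 1."""
    return sum(log(p) * exp(-p / x) * cmath.exp(-2j * cmath.pi * alpha * p)
               for p in range(2, len(flags)) if flags[p])


if __name__ == "__main__":
    x = 2000.0
    flags = prime_flags(int(30 * x))
    for name, alpha in (("0", 0.0), ("1/3", 1 / 3), ("sqrt(2)-1", sqrt(2) - 1)):
        print(f"|f_hat({name})| = {abs(f_hat(alpha, x, flags)):.1f}")
```

One should see a value close to x at α = 0, a value close to x/2 at α = 1/3, and a much smaller value at the generic point; cubing such values widens the gap further than squaring does.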
To prove the ternary Goldbach conjecture, it is enough to estimate both sides of
(1.4) for carefully chosen f_1, f_2, f_3, and compare them. This is our task from now on.
for ℜ(s) > 1, and by analytic continuation for ℜ(s) ≤ 1. (The Riemann zeta function
ζ(s) is the L-function for the trivial character, i.e., the character χ such that χ(n) = 1
for all n.) Taking logarithms and then derivatives, we see that
\[
-\frac{L'(s,\chi)}{L(s,\chi)} = \sum_{n=1}^{\infty} \chi(n) \Lambda(n) n^{-s}, \tag{1.5}
\]
for ℜ(s) > 1, where Λ is the von Mangoldt function (Λ(n) = log p if n is some prime
power p^α, α ≥ 1, and Λ(n) = 0 otherwise).
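Identity (1.5) can be checked numerically well inside the region of convergence. The sketch below does so at s = 3 by truncating all Dirichlet series at a finite number of terms; the truncation, the choice of s, and the two sample characters (the trivial one and the non-trivial character modulo 4) are ours, and the check is an illustration rather than a rigorous computation.

```python
from math import log


def von_mangoldt(n: int) -> float:
    """Lambda(n) = log p if n = p^k with k >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)  # n itself is prime


def chi_trivial(n: int) -> int:
    return 1


def chi_mod4(n: int) -> int:
    """The non-trivial Dirichlet character modulo 4."""
    return (0, 1, 0, -1)[n % 4]


def both_sides(chi, s: float, terms: int = 20000):
    """Truncated versions of -L'(s, chi)/L(s, chi) and of the sum in (1.5)."""
    L = sum(chi(n) * n**-s for n in range(1, terms + 1))
    L_prime = -sum(chi(n) * log(n) * n**-s for n in range(1, terms + 1))
    rhs = sum(chi(n) * von_mangoldt(n) * n**-s for n in range(1, terms + 1))
    return -L_prime / L, rhs


if __name__ == "__main__":
    for name, chi in (("trivial", chi_trivial), ("mod 4", chi_mod4)):
        lhs, rhs = both_sides(chi, 3.0)
        print(name, lhs, rhs)
```

At s = 3 the tails of all the series involved are tiny, so the two sides should agree to many decimal places.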
Dirichlet introduced his characters and L-series so as to study primes in arithmetic
progressions. In general, and after some work, (1.5) allows us to restate many sums
over the primes (such as our Fourier transforms f̂(α)) as sums over the zeros of L(s, χ).
A non-trivial zero of L(s, χ) is a zero of L(s, χ) such that 0 < ℜ(s) < 1. (The other
zeros are called trivial because we know where they are, namely, at negative integers
and, in some cases, also on the line ℜ(s) = 0. In order to eliminate all zeros on
ℜ(s) = 0 outside s = 0, it suffices to assume that χ is primitive; a primitive character
modulo q is one that is not induced by (i.e., not the restriction of) any character modulo
d|q, d < q.)
The Generalized Riemann Hypothesis for Dirichlet L-functions is the statement
that, for every Dirichlet character χ, every non-trivial zero of L(s, χ) satisfies ℜ(s) = 1/2.
Of course, the Generalized Riemann Hypothesis (GRH) – and the Riemann Hypothesis,
which is the special case of χ trivial – remains unproven. Thus, if we want to
prove unconditional statements, we need to make do with partial results towards GRH.
Two kinds of such results have been proven:
Most work in the literature follows the first alternative, though [Tao14] did use a
finite verification of RH (i.e., GRH for the trivial character). Unfortunately, zero-free
regions seem too narrow to be useful for the ternary Goldbach problem. Thus, we are
left with the second alternative.
In coordination with the present work, Platt [Plab] verified that all zeros s of
L-functions for characters χ with modulus q ≤ 300000 satisfying ℑ(s) ≤ $H_q$ lie on the
line ℜ(s) = 1/2, where
where Λ is the von Mangoldt function (as in (1.5)). The use of α rather than −α is just
a bow to tradition, as is the use of the letter S (for “sum”); however, the use of Λ(n)
rather than just plain log p does actually simplify matters.
The function η here is sometimes called a smoothing function or simply a smooth-
ing. It will indeed be helpful for it to be smooth on (0, ∞), but, in principle, it need
not even be continuous. (Vinogradov’s work implicitly uses, in effect, the “brutal
truncation” $1_{[0,1]}(t)$, defined to be 1 when t ∈ [0, 1] and 0 otherwise; that would be fine for
the minor arcs, but, as will become clear, it is a bad idea as far as the major arcs are
concerned.)
Assume α is on a major arc, meaning that we can write α = a/q + δ/x for some a/q
(q small) and some δ (with |δ| small). We can write $S_\eta(\alpha, x)$ as a linear combination
\[
  S_\eta(\alpha, x) = \sum_\chi c_\chi S_{\eta,\chi}\left(\frac{\delta}{x}, x\right) + \text{tiny error term}, \tag{1.7}
\]
where
\[
  S_{\eta,\chi}\left(\frac{\delta}{x}, x\right) = \sum_n \Lambda(n)\chi(n)e(\delta n/x)\eta(n/x). \tag{1.8}
\]
In (1.7), χ runs over primitive Dirichlet characters of moduli d|q, and $c_\chi$ is small
($|c_\chi| \leq \sqrt{d}/\phi(q)$).
Why are we expressing the sums Sη (α, x) in terms of the sums Sη,χ (δ/x, x), which
look more complicated? The argument has become δ/x, whereas before it was α.
Here δ is relatively small – smaller than the constant c0 r, in our setup. In other words,
e(δn/x) will go around the circle a bounded number of times as n goes from 1 up to a
constant times x (by which time η(n/x) has become small, because η is of fast decay).
This makes the sums much easier to estimate.
To estimate the sums Sη,χ , we will use L-functions, together with one of the most
common tools of analytic number theory, the Mellin transform. This transform is es-
sentially a Laplace transform with a change of variables, and a Laplace transform, in
turn, is a Fourier transform taken on a vertical line in the complex plane. For f of fast
enough decay, the Mellin transform F = M f of f is given by
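\[
  F(s) = \int_0^\infty f(t)\,t^s\,\frac{dt}{t};
\]
f can be recovered from F by the Mellin inversion formula
\[
  f(t) = \frac{1}{2\pi i}\int_{\sigma-i\infty}^{\sigma+i\infty} F(s)\,t^{-s}\,ds
\]
for any σ within an interval. We can thus express e(δt)η(t) in terms of its Mellin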
transform $F_\delta$ and then use (1.5) to express $S_{\eta,\chi}$ in terms of $F_\delta$ and $L'(s,\chi)/L(s,\chi)$;
shifting the integral in the Mellin inversion formula to the left, we obtain what is known
in analytic number theory as an explicit formula:
\[
  S_{\eta,\chi}(\delta/x, x) = [\widehat{\eta}(-\delta)x] - \sum_\rho F_\delta(\rho)x^\rho + \text{tiny error term}.
\]
Here the term between brackets appears only for χ trivial. In the sum, ρ goes over all
non-trivial zeros of L(s, χ), and Fδ is the Mellin transform of e(δt)η(t). (The tiny error
term comes from a sum over the trivial zeros of L(s, χ).) We will obtain the estimate
we desire if we manage to show that the sum over ρ is small.
The point is this: if we verify GRH for L(s, χ) up to imaginary part H, i.e., if
we check that all zeroes ρ of L(s, χ) with |ℑ(ρ)| ≤ H satisfy ℜ(ρ) = 1/2, we have
$|x^\rho| = \sqrt{x}$ for each such ρ. In other words, $x^\rho$ is very small (compared to x). However, for any
ρ whose imaginary part has absolute value greater than H, we know next to nothing
about its real part, other than 0 ≤ ℜ(ρ) ≤ 1. (Zero-free regions are notoriously weak
for ℑ(ρ) large; we will not use them.) Hence, our only chance is to make sure that
$F_\delta(\rho)$ is very small when |ℑ(ρ)| ≥ H.
This has to be true for both δ very small (including the case δ = 0) and for δ not so
small (|δ| up to $c_0 r/q$, which can be large because r is a large constant). How can we
choose η so that $F_\delta(\rho)$ is very small in both cases for τ = ℑ(ρ) large?
The method of stationary phase is useful as an exploratory tool here. In brief, it
suggests (and can sometimes prove) that the main contribution to the integral
\[
  F_\delta(s) = \int_0^\infty e(\delta t)\eta(t)\,t^s\,\frac{dt}{t} \tag{1.9}
\]
can be found where the phase of the integrand has derivative 0. For s = σ + iτ, the
phase is 2πδt + τ log t, and its derivative 2πδ + τ/t vanishes when
t = −τ/2πδ (for sgn(τ) ≠ sgn(δ)); the contribution is then a moderate factor times
η(−τ/2πδ). In other words, if sgn(τ) ≠ sgn(δ) and δ is not too small (|δ| ≥ 8, say),
$F_\delta(\sigma + i\tau)$ behaves like η(−τ/2πδ); if δ is small (|δ| < 8), then $F_\delta$ behaves like $F_0$,
which is the Mellin transform Mη of η. Here is our goal, then: the decay of η(t) as
|t| → ∞ should be as fast as possible, and the decay of the transform Mη(σ + iτ)
should also be as fast as possible.
This is a classical dilemma, often called the uncertainty principle because it is the
mathematical fact underlying the physical principle of the same name: you cannot have
a function η that decreases extremely rapidly and whose Fourier transform (or, in this
case, its Mellin transform) also decays extremely rapidly.
What does “extremely rapidly” mean here? It means (as Hardy himself proved)
“faster than any exponential $e^{-Ct}$”. Thus, Hardy and Littlewood’s choice $\eta(t) = e^{-t}$
seems essentially optimal at first sight.
However, it is not optimal. We can choose η so that Mη decreases exponentially
(with a constant C somewhat worse than for $\eta(t) = e^{-t}$), but η decreases faster than
exponentially. This is a particularly appealing possibility because it is t/|δ|, and not so
much t, that risks being fairly small. (To be explicit: say we check GRH for characters
of modulus q up to $H_q \sim 50 \cdot c_0 r/q \geq 50|\delta|$. Then we only know that |τ/2πδ| ≳
8. So, for $\eta(t) = e^{-t}$, η(−τ/2πδ) may be as large as $e^{-8}$, which is not negligible.
Indeed, since this term will be multiplied later by other terms, $e^{-8}$ is simply not small
enough. On the other hand, we can assume that $H_q \geq 200$ (say), and so $M\eta(s) \sim
e^{-(\pi/2)|\tau|}$ is completely negligible, and will remain negligible even if we replace π/2
by a somewhat smaller constant.)
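To make the orders of magnitude in this remark concrete, here is a quick Python check. It assumes the illustrative values used above (|τ| ≥ 50|δ| for the stationary point, and |τ| = 200 with decay rate π/2 for the Mellin transform); the variable names are ours.

```python
import math

# Worst case allowed by a verification height H_q ~ 50|delta|:
# the stationary point |tau/(2*pi*delta)| is only about 50/(2*pi), i.e. roughly 8.
stationary_point = 50 / (2 * math.pi)
weight_at_stationary_point = math.exp(-stationary_point)  # eta(t) = e^{-t}

# Decay exp(-(pi/2)|tau|) of the Mellin transform of e^{-t} at height |tau| = 200.
mellin_decay = math.exp(-(math.pi / 2) * 200)

print(f"|tau/(2 pi delta)| ~ {stationary_point:.3f}")
print(f"e^(-t) at that point ~ {weight_at_stationary_point:.3e}")
print(f"e^(-(pi/2)*200)      ~ {mellin_decay:.3e}")
```

The first quantity is small but not negligible, whereas the second is smaller by well over a hundred orders of magnitude; this is the imbalance that motivates trading some decay of Mη for faster decay of η.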
We shall take $\eta(t) = e^{-t^2/2}$ (that is, the Gaussian). This is not the only possible
choice, but it is in some sense natural. It is easy to show that the Mellin transform $F_\delta$
for $\eta(t) = e^{-t^2/2}$ is a multiple of what is called a parabolic cylinder function U(a, z)
with imaginary values for z. There are plenty of estimates on parabolic cylinder
functions in the literature – but mostly for a and z real, in part because that is one of the
cases occurring most often in applications. There are some asymptotic expansions and
estimates for U(a, z), a, z general, due to Olver [Olv58], [Olv59], [Olv61], [Olv65],
but unfortunately they come without fully explicit error terms for a and z within our
range of interest. (The same holds for [TV03].)
In the end, I derived bounds for Fδ using the saddle-point method. (The method
of stationary phase, which we used to choose η, seems to lead to error terms that are
too large.) The saddle-point method consists, in brief, in changing the contour of an
integral to be bounded (in this case, (1.9)) so as to minimize the maximum of the
integrand. (To use a metaphor in [dB81]: find the lowest mountain pass.)
Here we strive to get clean bounds, rather than the best possible constants. Consider
the case k = 0 of Corollary 8.0.2; it states the following. For s = σ + iτ
with σ ∈ [0, 1] and $|\tau| \geq \max(100, 4\pi^2|\delta|)$, we obtain that the Mellin transform $F_\delta$ of
η(t)e(δt) with $\eta(t) = e^{-t^2/2}$ satisfies
\[
  |F_\delta(s + k)| + |F_\delta((1 - s) + k)| \leq
  \begin{cases}
    3.001\,e^{-0.1065\left(\frac{2|\tau|}{|\ell|}\right)^2} & \text{if } 4|\tau|/\ell^2 < 3/2,\\
    3.286\,e^{-0.1598|\tau|} & \text{if } 4|\tau|/\ell^2 \geq 3/2.
  \end{cases} \tag{1.10}
\]
Similar bounds hold for σ in other ranges, thus giving us estimates on the Mellin
transform $F_\delta$ for $\eta(t) = t^k e^{-t^2/2}$ and σ in the critical range [0, 1]. (We could do a little
better if we knew the value of σ, but, in our applications, we do not, once we leave
the range in which GRH has been checked. We will give a bound (Theorem 8.0.1) that
does take σ into account, and also reflects and takes advantage of the fact that there
is a transitional region around $|\tau| \sim (3/2)(\pi/\delta)^2$; in practice, however, we will use
Cor. 8.0.2.)
A moment’s thought shows that we can also use (1.10) to deal with the Mellin
transform of η(t)e(δt) for any function of the form $\eta(t) = e^{-t^2/2}g(t)$ (or, more generally,
$\eta(t) = t^k e^{-t^2/2}g(t)$), where g(t) is any band-limited function. By a band-limited
function, we could mean a function whose Fourier transform is compactly supported;
while that is a plausible choice, it turns out to be better to work with functions that are
band-limited with respect to the Mellin transform – in the sense of being of the form
\[
  g(t) = \int_{-R}^{R} h(r)\,t^{-ir}\,dr,
\]
where h : R → C is supported on a compact interval [−R, R], with R not too large (say
R = 200). What happens is that the Mellin transform of the product $e^{-t^2/2}g(t)e(\delta t)$
is a convolution of the Mellin transform $F_\delta(s)$ of $e^{-t^2/2}e(\delta t)$ (estimated in (1.10)) and
that of g(t) (supported in [−R, R]); the effect of the convolution is just to delay decay
of $F_\delta(s)$ by, at most, a shift by y ↦ y − R.
We wish to estimate Sη,χ (δ/x) for several functions η. This motivates us to derive
an explicit formula (§) general enough to work with all the weights η(t) we will work
with, while being also completely explicit, and free of any integrals that may be tedious
to evaluate.
Once that is done, and once we consider the input provided by Platt’s finite verifi-
cation of GRH up to Hq , we obtain simple bounds for different weights.
For $\eta(t) = e^{-t^2/2}$, $x \geq 10^8$, χ a primitive character of modulus q ≤ r = 300000,
and any δ ∈ R with |δ| ≤ 4r/q, we obtain
\[
  S_{\eta,\chi}\left(\frac{\delta}{x}, x\right) = I_{q=1}\cdot\widehat{\eta}(-\delta)x + E\cdot x, \tag{1.11}
\]
Here $\widehat{\eta}$ stands for the Fourier transform from R to R normalized as follows:
$\widehat{\eta}(t) = \int_{-\infty}^{\infty} e(-xt)\eta(x)\,dx$. Thus, $\widehat{\eta}(-\delta)$ is just
$\sqrt{2\pi}\,e^{-2\pi^2\delta^2}$ (self-duality of the Gaussian).
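The closed form for the Fourier transform of the Gaussian under this normalization can be checked numerically. The sketch below approximates the integral of e(−xt)η(x) by the trapezoidal rule on a truncated interval; the truncation width and step count are arbitrary choices of ours, not values from the text.

```python
import cmath
import math

def e(t: float) -> complex:
    """The standard abbreviation e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * math.pi * t)

def gaussian(x: float) -> float:
    """The weight eta(x) = exp(-x^2/2)."""
    return math.exp(-x * x / 2)

def fourier_transform(eta, t: float, half_width: float = 12.0, steps: int = 24000) -> complex:
    """Trapezoidal approximation of the integral of e(-x*t)*eta(x) dx
    over [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * e(-x * t) * eta(x)
    return total * h

for delta in (0.0, 0.25, 0.5):
    numeric = fourier_transform(gaussian, -delta)
    closed_form = math.sqrt(2 * math.pi) * math.exp(-2 * math.pi ** 2 * delta ** 2)
    print(delta, numeric.real, closed_form)
```

The last two columns should agree to many digits, since the trapezoidal rule converges very quickly for rapidly decaying analytic integrands.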
This is one of the main results of Part II; see §7.1. Similar bounds are also proven
there for $\eta(t) = t^2 e^{-t^2/2}$, as well as for a weight of type $\eta(t) = te^{-t^2/2}g(t)$, where
g(t) is a band-limited function, and also for a weight η defined by a multiplicative
convolution. The conditions on q (namely, q ≤ r = 300000) and δ are what we
expected from the outset.
Thus concludes our treatment of the major arcs. This is arguably the easiest part of
the proof; it was actually what I left for the end, as I was fairly confident it would work
out. Minor-arc estimates are more delicate; let us now examine them.
It would be possible to work with narrow major arcs that become narrower as q
increases simply by allowing q to be very large (close to x), and assigning each angle
to the fraction closest to it. This is, in fact, the common procedure. However, this
makes matters more difficult, in that we would have to minimize at the same time the
factors in front of terms x/q, $x/\sqrt{q}$, etc., and those in front of terms q, $\sqrt{qx}$, and so
on. (These terms are being compared to the trivial bound x.) Instead, we choose to
strive for a direct dependence on δ throughout; this will allow us to cap q at a much
lower level, thus making terms such as q and $\sqrt{qx}$ negligible. (This choice has been
taken elsewhere in applications of the circle method, but, strangely, seems absent from
previous work on the ternary Goldbach conjecture.)
How good must our bounds be? Since the major-arc bounds are valid only for
q ≤ r = 300000 and |δ| ≤ 4r/q, we cannot afford even a single factor of log x (or
any other function tending to ∞ as x → ∞) in front of terms such as $x/\sqrt{q|\delta_0|}$: a
factor like that would make the term larger than the trivial bound x if $q|\delta_0|$ is equal to
a constant (r, say) and x is very large. Apparently, there was no such “log-free bound”
with explicit constants in the literature, even though such bounds were considered to
be in principle feasible, and even though previous work ([Che85], [Dab96], [DR01],
[Tao14]) had gradually decreased the number of factors of log x. (In limited ranges for
q, there were log-free bounds without explicit constants; see [Dab96], [Ram10]. The
estimate in [Vin54, Thm. 2a, 2b] was almost log-free, but not quite. There were also
bounds [Kar93], [But11] that used L-functions, and thus were not really useful in a
truly minor-arc regime.)
It also seemed clear that a main bound proportional to $(\log q)^2 x/\sqrt{q}$ (as in [Tao14])
was too large. At the same time, it was not really necessary to reach a bound of the
best possible form that could be found through Vinogradov’s basic approach, namely
\[
  |S_\eta(\alpha, x)| \leq C\,\frac{x\sqrt{q}}{\phi(q)}. \tag{1.13}
\]
Such a bound had been proven by Ramaré [Ram10] for q in a limited range and C
non-explicit; later, in [Ramc] – which postdates the first version of [Helb] – Ramaré
broadened the range to $q \leq x^{1/48}$ and gave an explicit value for C, namely, C = 13000.
Such a bound is a notable achievement, but, unfortunately, it is not useful for our
purposes. Rather, we will aim at a bound whose main term is bounded by a constant
around 1 times $x(\log \delta_0 q)/\sqrt{\delta_0\phi(q)}$; this is slightly worse asymptotically than (1.13),
but it is much better in the delicate range of $\delta_0 q \sim 300000$, and in fact for a much
wider range as well.
***
We see that we have several tasks. One of them is the removal of logarithms: we
cannot afford a single factor of log x, and, in practice, we can afford at most one factor
of log q. Removing logarithms will be possible in part because of the use of previously
existing efficient techniques (the large sieve for sequences with prime support) but also
because we will be able to find cancellation at several places in sums coming from a
combinatorial identity (namely, Vaughan’s identity). The task of finding cancellation
is particularly delicate because we cannot afford large constants or, for that matter,
statements valid only for large x. (Bounding a sum such as $\sum_n \mu(n)$ efficiently, where
µ is the Möbius function
\[
  \mu(n) = \begin{cases}
    (-1)^k & \text{if } n = p_1 p_2 \dotsm p_k, \text{ all } p_i \text{ distinct,}\\
    0 & \text{if } p^2 \mid n \text{ for some prime } p,
  \end{cases}
\]
is harder than estimating a sum such as $\sum_n \Lambda(n)$ equally efficiently, even though we
are used to thinking of the two problems as equivalent.)
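For readers who want to experiment with the two kinds of sums, here is a minimal Python sketch that tabulates µ(n) and Λ(n) with simple sieves and prints the partial sums up to a small x. It only illustrates the objects involved; it says nothing about the explicit bounds discussed here, and the function names and the cutoff are our own choices.

```python
import math

def mobius_sieve(limit: int) -> list[int]:
    """Return a list mu with mu[n] = mu(n) for 1 <= n <= limit (mu[0] is unused)."""
    mu = [1] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for multiple in range(p, limit + 1, p):
                if multiple > p:
                    is_prime[multiple] = False
                mu[multiple] = -mu[multiple]   # one more distinct prime factor
            for multiple in range(p * p, limit + 1, p * p):
                mu[multiple] = 0               # p^2 divides the number
    return mu

def von_mangoldt_table(limit: int) -> list[float]:
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= limit."""
    lam = [0.0] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for multiple in range(2 * p, limit + 1, p):
                is_prime[multiple] = False
            power = p
            while power <= limit:              # Lambda(p^a) = log p
                lam[power] = math.log(p)
                power *= p
    return lam

x = 100_000
mu = mobius_sieve(x)
lam = von_mangoldt_table(x)
mertens = sum(mu[1:])          # sum of mu(n) for n <= x
psi = sum(lam[1:])             # sum of Lambda(n) for n <= x
print("sum of mu(n) for n <= x:        ", mertens)
print("sum of Lambda(n) for n <= x, - x:", psi - x)
```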
We have said that our bounds will improve as |δ| increases. This dependence on
δ will be secured in different ways at different places. Sometimes δ will appear as
an argument, as in $\widehat{\eta}(-\delta)$; for η piecewise continuous with $\eta' \in L^1$, we know that
$|\widehat{\eta}(t)| \to 0$ as |t| → ∞. Sometimes we will obtain a dependence on δ by using several
different rational approximations to the same α ∈ R. Lastly, we will obtain a good
dependence on δ in bilinear sums by supplying a scattered input to a large sieve.
If there is a main moral to the argument, it lies in the close relation between the
circle method and the large sieve. The circle method rests on the estimation of an
integral involving a Fourier transform $\widehat{f}$ : R/Z → C; as we will later see, this leads
naturally to estimating the $\ell_2$-norm of $\widehat{f}$ on subsets (namely, unions of arcs) of the circle
R/Z. The large sieve can be seen as an approximate discrete version of Plancherel’s
identity, which states that $|\widehat{f}|_2 = |f|_2$.
Both in this section and in §1.5, we shall use the large sieve in part so as to use
the fact that some of the functions we work with have prime support, i.e., are non-zero
only on prime numbers. There are ways to use prime support to improve the output
of the large sieve. In §1.5, these techniques will be refined and then translated to the
context of the circle method, where f has (essentially) prime support and $|\widehat{f}|^2$ must be
integrated over unions of arcs. (This allows us to remove a logarithm.) The main point
is that the large sieve is not being used as a black box; rather, we can adapt ideas from
(say) the large-sieve context and apply them to the circle method.
Lastly, there are the benefits of a continuous η. Hardy and Littlewood already
used a continuous η; this was abandoned by Vinogradov, presumably for the sake of
simplicity. The idea that smooth weights η can be superior to sharp truncations is
now commonplace. As we shall see, using a continuous η is helpful in the minor-arcs
regime, but not as crucial there as for the major arcs. We will not use a smooth η; we
will prove our estimates for any continuous η that is piecewise $C^1$, and then, towards
the end, we will choose to use the same weight $\eta = \eta_2$ as in [Tao14], in part because it
has compact support, and in part for the sake of comparison. The moral here is not quite
the common dictum “always smooth”, but rather that different kinds of smoothing can
be appropriate for different tasks; in the end, we will show how to coordinate different
smoothing functions η.
There are other ideas involved; for instance, some of Vinogradov’s lemmas are
improved. Let us now go into some of the details.
This identity is essentially log-free: while a trivial bound on the sum of the right side
for n from 1 to N does seem to have two extra factors of log, they are present only in
the term $\mu * \log^3$, which is not the hardest one to estimate. Ramaré obtained a log-free
bound in [Ram10] using an identity introduced by Diamond and Steinig in the course
of their own work on elementary proofs of the prime number theorem [DS70]; that
identity gives a decomposition for $\Lambda \cdot \log^k$ that can also be derived from the expansion
of $(\zeta'(s)/\zeta(s))^{(k)}$, by a clever grouping of terms.
In the end, I decided to use Vaughan’s identity, motivated in part by [Tao14], and
in part by the lack of free parameters in (1.16); as can be seen in (1.15), Vaughan’s
identity has two parameters U , V that we can set to whatever values we think best. The
form of the identity allowed me to reuse much of my work up to that point, but it also
posed a challenge: since Vaughan’s identity is by no means log-free, one has to obtain
cancellation in Vaughan’s identity at every possible step, beyond the cancellation given
by the phase e(αn). (The presence of a phase, in fact, makes the task of getting can-
cellation from the identity more complicated.) The removal of logarithms will be one
of our main tasks in what follows. It is clear that the presence of the Möbius function
µ should give, in principle, some cancellation; we will show how to use it to obtain as
much cancellation as we need – with good constants, and not just asymptotically.
and
\[
  \sum_{v \leq V} \Lambda(v) \sum_{u \leq U} \mu(u) \sum_n e(\alpha vun)\,\eta\left(\frac{vun}{x}\right). \tag{1.18}
\]
In either case, α = a/q + δ/x, where q is larger than a constant r and $|\delta/x| \leq 1/qQ_0$
for some $Q_0 > \max(q, \sqrt{x})$. For the purposes of this exposition, we will set it as our
task to estimate the slightly simpler sum
\[
  \sum_{m \leq D} \mu(m) \sum_n e(\alpha mn)\,\eta\left(\frac{mn}{x}\right), \tag{1.19}
\]
q is not small, we can afford to bound $(\Lambda_{\leq V} * \mu_{\leq U})(n)$ trivially (by log n) in the less
sensitive terms.
Let us first outline Vinogradov’s procedure for bounding type I sums. Just by
summing a geometric series, we get
\[
  \left|\sum_{n \leq N} e(\alpha n)\right| \leq \min\left(N, \frac{c}{\{\alpha\}}\right), \tag{1.20}
\]
where c is a constant and {α} is the distance from α to the nearest integer. Vinogradov
splits the outer sum in (1.19) into sums of length q. When m runs on an interval of
length q, the angle am/q runs through all fractions of the form b/q; due to the error
δ/x, αm could be close to 0 for two values of m, but otherwise {αm} takes values
bounded below by 1/q (twice), 2/q (twice), 3/q (twice), etc. Thus
\[
  \left|\sum_{y < m \leq y+q} \mu(m) \sum_{n \leq N} e(\alpha mn)\right|
  \leq \sum_{y < m \leq y+q} \left|\sum_{n \leq N} e(\alpha mn)\right|
  \leq \frac{2N}{m} + 2cq\log eq \tag{1.21}
\]
for any y ≥ 0.
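The geometric-series bound (1.20) is easy to test numerically. In the sketch below we take c = 1/2, which is our own choice (the text leaves c unspecified); it is admissible because the sum has absolute value |sin(πNα)/sin(πα)| and |sin(πα)| is at least twice the distance from α to the nearest integer. The sample values of α and N are arbitrary.

```python
import cmath
import math

def e(t: float) -> complex:
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * math.pi * t)

def dist_to_nearest_integer(alpha: float) -> float:
    """The quantity written {alpha} in (1.20)."""
    return abs(alpha - round(alpha))

def exponential_sum(alpha: float, N: int) -> complex:
    """Direct evaluation of the sum of e(alpha*n) over 1 <= n <= N."""
    return sum(e(alpha * n) for n in range(1, N + 1))

def geometric_bound(alpha: float, N: int, c: float = 0.5) -> float:
    """min(N, c/{alpha}), with c = 1/2 as an assumed admissible constant."""
    d = dist_to_nearest_integer(alpha)
    return float(N) if d == 0 else min(float(N), c / d)

N = 1000
for alpha in (0.0003, 0.01, 1 / 3, math.sqrt(2)):
    size = abs(exponential_sum(alpha, N))
    print(f"alpha={alpha:.6f}  |sum|={size:10.4f}  bound={geometric_bound(alpha, N):10.4f}")
```

For α very close to an integer the trivial bound N is the smaller one; otherwise the bound c/{α} takes over, which is what makes the count of values 1/q, 2/q, 3/q, … in the text useful.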
There are several ways to improve this. One is simply to estimate the inner sum
more precisely; this was already done in [DR01]. One can also define a smoothing
function η, as in (1.19); it is easy to get
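\[
  \left|\sum_{n \leq N} e(\alpha n)\,\eta\left(\frac{n}{x}\right)\right| \leq
  \min\left(x|\eta|_1 + \frac{|\eta'|_1}{2},\ \frac{|\eta'|_1}{2|\sin(\pi\alpha)|},\ \frac{|\widehat{\eta''}|_\infty}{4x(\sin \pi\alpha)^2}\right).
\]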
Except for the third term, this is as in [Tao14]. We could also choose carefully which
bound to use for each m; surprisingly, this gives an improvement – in fact, an impor-
tant one, for m large. However, even with these improvements, we still have a term
proportional to N/m as in (1.21), and this contributes about (x log x)/q to the sum
(1.19), thus giving us an estimate that is not log-free.
What we have to do, naturally, is to take out the terms with q|m for m small. (If m
is large, then those may not be the terms for which mα is close to 0; we will later see
what to do.) For y + q ≤ Q/2, |α − a/q| ≤ 1/qQ, we get that
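\[
  \sum_{\substack{y < m \leq y+q \\ q \nmid m}} \min\left(A, \frac{B}{|\sin \pi\alpha n|}, \frac{C}{|\sin \pi\alpha n|^2}\right) \tag{1.22}
\]
is at most
\[
  \min\left(\frac{20}{3\pi^2}Cq^2,\ 2A + \frac{4q}{\pi}\sqrt{AC},\ \frac{2Bq}{\pi}\max\left(2, \log\frac{Ce^3 q}{B\pi}\right)\right). \tag{1.23}
\]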
This is satisfactory. We are left with all the terms m ≤ M = min(D, Q/2) with q|m
– and also with all the terms Q/2 < m ≤ D. For m ≤ M divisible by q, we can
estimate (as opposed to just bound from above) the inner sum in (1.19) by the Poisson
summation formula, and then sum over m, but without taking absolute values; writing
m = aq, we get a main term
\[
  \frac{x\mu(q)}{q}\cdot\widehat{\eta}(-\delta)\cdot\sum_{\substack{a \leq M/q \\ (a,q)=1}} \frac{\mu(a)}{a}, \tag{1.24}
\]
\[
  \left|\sum_{\substack{a \leq x \\ (a,q)=1}} \frac{\mu(a)}{a}\right| \leq \frac{4}{5}\,\frac{q}{\phi(q)}\,\frac{1}{\log x/q} \tag{1.25}
\]
for q ≤ x. (Cf. [EM95], [EM96].) This is neither trivial nor elementary.⁵ We are, so to
speak, allowed to use non-elementary means (that is, methods based on L-functions)
because the only L-function we need to use here is the Riemann zeta function.
What shall we do for m > Q/2? We can always give a bound
\[
  \sum_{y < m \leq y+q} \min\left(A, \frac{C}{|\sin \pi\alpha n|^2}\right) \leq 3A + \frac{4q}{\pi}\sqrt{AC} \tag{1.26}
\]
for y arbitrary; since AC will be of constant size, $(4q/\pi)\sqrt{AC}$ is pleasant enough, but
the contribution of $3A \sim 3|\eta|_1 x/y$ is nasty (it adds a multiple of (x log x)/q to the
total) and seems unavoidable: the values of m for which αm is close to 0 no longer
correspond to the congruence class m ≡ 0 mod q, and thus cannot be taken out.
The solution is to switch approximations. (The idea of using different approxima-
tions to the same α is neither new nor recent in the general context of the circle method:
see [Vau97, §2.8, Ex. 2]. What may be new is its use to clear a hurdle in type I sums.)
What does this mean? If α were exactly, or almost exactly, a/q, then there would be
no other very good approximations in a reasonable range. However, note that we can
define Q = ⌊x/|δq|⌋ for α = a/q + δ/x, and still have |α − a/q| ≤ 1/qQ. If δ is very
small, Q will be larger than 2D, and there will be no terms with Q/2 < m ≤ D to
worry about.
⁵ The current state of knowledge may seem surprising: after all, we expect nearly square-root
cancellation – for instance, $|\sum_{n \leq x} \mu(n)/n| \leq \sqrt{2/x}$ holds for all real $0 < x \leq 10^{12}$
(see also the stronger bound [Dre93]). The classical zero-free region of the Riemann zeta
function ought to give a factor of $\exp(-\sqrt{(\log x)/c})$, which looks much better than 1/log x.
What happens is that (a) such a factor is not actually much better than 1/log x for
$x \sim 10^{30}$, say; (b) estimating sums involving the Möbius function by means of an explicit
formula is harder than estimating sums involving Λ(n): the residues of 1/ζ(s)
at the non-trivial zeros of ζ(s) come into play. As a result, getting non-trivial explicit results on sums of µ(n)
is harder than one would naively expect from the quality of classical effective (but non-explicit) results. See
[Rama] for a survey of explicit bounds.
What happens if δ is not very small? We know that, for any Q′, there is an
approximation a′/q′ to α with |α − a′/q′| ≤ 1/q′Q′ and q′ ≤ Q′. However, for Q′ > Q, we
know that a′/q′ cannot equal a/q: by the definition of Q, the approximation a/q is not
good enough, i.e., |α − a/q| ≤ 1/qQ′ does not hold. Since a/q ≠ a′/q′, we see that
|a/q − a′/q′| ≥ 1/qq′, and this implies that q′ ≥ (ε/(1 + ε))Q.
Thus, for m > Q/2, the solution is to apply (1.26) with a′/q′ instead of a/q. The
contribution of A fades into insignificance: for the first sum over a range y < m ≤
y + q′, y ≥ Q/2, it contributes at most x/(Q/2), and all the other contributions of A
sum up to at most a constant times (x log x)/q′.
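The switch of approximations can be tried out with exact rational arithmetic. The sketch below finds, for a rational α and a bound Q′, the continued-fraction convergent a′/q′ with the largest denominator q′ ≤ Q′; such a convergent satisfies |α − a′/q′| ≤ 1/q′Q′. The sample values (x = 10^6, a/q = 1/7, δ = 40) and the choice Q′ = 2Q are ours, purely for illustration, and the function name is hypothetical.

```python
from fractions import Fraction

def best_approximation(alpha: Fraction, Q: int) -> Fraction:
    """Return the continued-fraction convergent of alpha with the largest
    denominator q <= Q; it satisfies |alpha - a/q| <= 1/(q*Q)."""
    p_prev, q_prev = 1, 0                      # convergent of index -1
    x = alpha
    a_k = x.numerator // x.denominator         # floor of x
    p, q = a_k, 1                              # convergent of index 0
    while x != a_k:
        x = 1 / (x - a_k)
        a_k = x.numerator // x.denominator
        p_next, q_next = a_k * p + p_prev, a_k * q + q_prev
        if q_next > Q:
            break                              # next denominator is too large
        p_prev, q_prev, p, q = p, q, p_next, q_next
    return Fraction(p, q)

# Illustrative data: alpha = a/q + delta/x with a/q = 1/7.
x_size, delta = 10**6, 40
a_over_q = Fraction(1, 7)
alpha = a_over_q + Fraction(delta, x_size)
Q = x_size // (abs(delta) * a_over_q.denominator)   # Q = floor(x/|delta q|)

Q_prime = 2 * Q                                      # assumed choice, i.e. epsilon = 1
approx = best_approximation(alpha, Q_prime)
print("Q =", Q, " a'/q' =", approx)
print("a/q still good for Q: ", abs(alpha - a_over_q) <= Fraction(1, 7 * Q))
print("a/q still good for Q':", abs(alpha - a_over_q) <= Fraction(1, 7 * Q_prime))
```

With these values the old approximation 1/7 passes the test for Q but fails it for Q′, and the new denominator q′ is comparable to Q, as the argument above predicts.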
Proceeding in this way, we obtain a total bound for (1.19) whose main terms are
proportional to
\[
  \frac{1}{\phi(q)}\,\frac{x}{\log\frac{x}{q}}\,\min\left(1, \frac{1}{\delta^2}\right),\quad
  \frac{2}{\pi}\sqrt{|\widehat{\eta''}|_\infty}\cdot D \quad\text{and}\quad
  q\log\max\left(\frac{D}{q}, q\right), \tag{1.27}
\]
with good, explicit constants. The first term – usually the largest one – is precisely what
we needed: it is proportional to (1/φ(q))x/log x for q small, and decreases rapidly as
|δ| increases.
At this point it is convenient to assume that η is the Mellin convolution of two functions.
The multiplicative or Mellin convolution on R+ is defined by
\[
  (\eta_0 *_M \eta_1)(t) = \int_0^\infty \eta_0(r)\,\eta_1\left(\frac{t}{r}\right)\frac{dr}{r}.
\]
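As a concrete instance of this definition, the Mellin convolution of the indicator function of [1/2, 1] with itself can be worked out by hand: the integrand is nonzero only for r between max(1/2, t) and min(1, 2t), which gives log 4t on [1/4, 1/2] and log(1/t) on [1/2, 1]. The sketch below compares a numerical evaluation with this closed form, multiplied by 4 to match the normalization 4 · 1_{[1/2,1]} ∗_M 1_{[1/2,1]} used elsewhere in the text. The quadrature parameters and function names are our own.

```python
import math

def indicator_half_one(t: float) -> float:
    """Indicator function of the interval [1/2, 1]."""
    return 1.0 if 0.5 <= t <= 1.0 else 0.0

def mellin_convolution(eta0, eta1, t: float, lo: float = 0.25, hi: float = 2.0,
                       steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of eta0(r)*eta1(t/r) dr/r
    over [lo, hi]. The substitution r = exp(u) turns dr/r into du."""
    a, b = math.log(lo), math.log(hi)
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        r = math.exp(a + (k + 0.5) * h)
        total += eta0(r) * eta1(t / r)
    return total * h

def eta2_closed_form(t: float) -> float:
    """4 * (1_[1/2,1] *_M 1_[1/2,1])(t), computed by hand."""
    if 0.25 <= t <= 0.5:
        return 4 * math.log(4 * t)
    if 0.5 < t <= 1.0:
        return 4 * math.log(1 / t)
    return 0.0

for t in (0.2, 0.3, 0.5, 0.75, 1.2):
    numeric = 4 * mellin_convolution(indicator_half_one, indicator_half_one, t)
    print(f"t={t:4.2f}  numeric={numeric:.4f}  closed form={eta2_closed_form(t):.4f}")
```

The resulting weight is continuous, piecewise smooth and supported on [1/4, 1], which is bounded away from 0.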
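\[
\begin{aligned}
  S_1(U, W) &= \sum_{\frac{x}{2W} < m \leq \frac{x}{W}} \Biggl|\sum_{\substack{d > U \\ d \mid m}} \mu(d)\Biggr|^2,\\
  S_2(V, W) &= \sum_{\frac{x}{2W} \leq m \leq \frac{x}{W}} \Biggl|\sum_{\max(V, \frac{W}{2}) \leq n \leq W} \Lambda(n)e(\alpha mn)\Biggr|^2.
\end{aligned} \tag{1.29}
\]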
We must bound $S_1(U, W)$ by a constant times x/W. We are able to do this – with
a good constant. (A careless bound would have given a multiple of $(x/U)\log^3(x/U)$,
which is much too large.) First, we reduce $S_1(U, W)$ to an expression involving an
integral of
\[
  \sum_{r_1 \leq x} \sum_{\substack{r_2 \leq x \\ (r_1, r_2) = 1}} \frac{\mu(r_1)\mu(r_2)}{\sigma(r_1)\sigma(r_2)}. \tag{1.30}
\]
We can bound (1.30) by the use of bounds on $\sum_{n \leq t} \mu(n)/n$, combined with the
estimation of infinite products by means of approximations to ζ(s) for $s \to 1^+$. After
some additional manipulations, we obtain a bound for $S_1(U, W)$ whose main term is
at most $(3/\pi^2)(x/W)$ for each W, and closer to 0.22482x/W on average over W.
(This is as good a point as any to say that, throughout, we can use a trick in [Tao14]
that allows us to work with odd values of integer variables, instead of letting
m or n range over all integers. Here, for instance, if m and n are restricted to be odd,
we obtain a bound of $(2/\pi^2)(x/W)$ for individual W, and 0.15107x/W on average
over W. This is so even though we are losing some cancellation in µ by the restriction.)
Let us now bound S2 (V, W ). This is traditionally done by Linnik’s dispersion
method. However, it should be clear that the thing to do nowadays is to use a large
sieve, and, more specifically, a large sieve for primes; that kind of large sieve is nothing
other than a tool for estimating expressions such as S2 (V, W ). (Incidentally, even
though we are trying to save every factor of log we can, we choose not to use small
sieves at all, either here or elsewhere.) In order to take advantage of prime support, we
use Montgomery’s inequality ([Mon68], [Hux72]; see the expositions in [Mon71, pp.
27–29] and [IK04, §7.4]) combined with Montgomery and Vaughan’s large sieve with
weights [MV73, (1.6)], following the general procedure in [MV73, (1.6)]. We obtain a
bound of the form
\[
  \frac{\log W}{\log\frac{W}{2q}}\left(\frac{x}{4\phi(q)} + \frac{qW}{\phi(q)}\right)\cdot\frac{W}{2} \tag{1.31}
\]
on $S_2(V, W)$, where, of course, we can also choose not to gain a factor of log W/2q if
q is close to or greater than W.
It remains to see how to gain a factor of |δ| in the major arcs, and more specifically
in S2 (V, W ). To explain this, let us step back and take a look at what the large sieve is.
The large sieve can be seen as an approximate, or statistical, version of this: for a
“sample” of points $\alpha_1, \alpha_2, \dotsc, \alpha_k$ satisfying $|\alpha_i - \alpha_j| \geq \beta$ for i ≠ j, it tells us that
\[
  \sum_{1 \leq j \leq k} \left|\widehat{f}(\alpha_j)\right|^2 \leq (X + \beta^{-1}) \sum_n |f(n)|^2, \tag{1.32}
\]
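The inequality (1.32) can be checked numerically on a small example. The sketch below makes several assumptions that are ours rather than the text's: f is supported on the X consecutive integers 1, …, X and equals log n on primes; the Fourier transform is taken with the sign convention Σ f(n)e(−αn); the sample points are the reduced fractions a/q with q ≤ 10; and distances between points are measured on the circle R/Z.

```python
import cmath
import math

def e(t: float) -> complex:
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * math.pi * t)

def is_prime(n: int) -> bool:
    return n > 1 and all(n % p for p in range(2, int(n ** 0.5) + 1))

def fourier_coefficient(f: dict, alpha: float) -> complex:
    """Sum of f(n) e(-alpha n) over the support of f (assumed sign convention)."""
    return sum(value * e(-alpha * n) for n, value in f.items())

def circle_distance(a: float, b: float) -> float:
    """Distance between a and b on the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

# f has prime support inside an interval containing X integers.
X = 200
f = {n: math.log(n) for n in range(1, X + 1) if is_prime(n)}

# Sample points: reduced fractions a/q with q <= 10, a well-separated set.
max_q = 10
points = sorted({a / q for q in range(1, max_q + 1)
                 for a in range(q) if math.gcd(a, q) == 1})
beta = min(circle_distance(s, t)
           for i, s in enumerate(points) for t in points[i + 1:])

lhs = sum(abs(fourier_coefficient(f, alpha)) ** 2 for alpha in points)
rhs = (X + 1 / beta) * sum(v * v for v in f.values())
print(f"beta={beta:.5f}  left side={lhs:.1f}  right side={rhs:.1f}")
```

In this example the left side comes out well below the right side; part of the slack reflects the fact that f has prime support, which is the feature that refinements of the large sieve exploit.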
\[
  \left(\frac{x}{4Q}\cdot\frac{W}{2} + \frac{x}{4}\right)\log W
\]
provided that L ≥ x/|δ|q and, as usual, |α − a/q| ≤ 1/qQ. This is very small compared
to the trivial bound ≲ xW/8.
What happens if L < x/|δq|? Then there is never any overlap: we consider all
angles $\alpha_i$, and give them all together to the large sieve. The total bound is
$(W^2/4 + xW/2|\delta|q)\log W$. If L = x/2W is smaller than, say, x/3|δq|, then we see clearly
that there are non-intersecting swarms of angles $\alpha_i$ around the rationals a/q. We can
thus save a factor of log (or rather (φ(q)/q) log(W/|δq|)) by applying Montgomery’s
inequality, which operates by strewing displacements of given angles (or, here, swarms
around angles) around the circle to the extent possible while keeping everything
well-separated. In this way, we obtain a bound of the form
\[
  \frac{\log W}{\log\frac{W}{|\delta|q}}\left(\frac{x}{|\delta|\phi(q)} + \frac{q}{\phi(q)}\,\frac{W}{2}\right)\cdot\frac{W}{2}.
\]
Compare this to (1.31); we have gained a factor of |δ|/4, and so we use this estimate
when |δ| > 4. (We will actually use the criterion |δ| > 8, but, since we will be working
with approximations of the form 2α = a/q + δ/x, the value of δ in our actual work
is twice what it is in this introduction. This is a consequence of working with sums
over the odd integers, as in [Tao14].)
***
We have succeeded in eliminating all factors of log we came across. The only
factor of log that remains is log x/UV, coming from the integral $\int_V^{x/U} dW/W$. Thus,
we want UV to be close to x, but we cannot let it be too close, since we also have a
term proportional to D = UV in (1.27), and we need to keep it substantially smaller
than x. We set U and V so that UV is $x/\sqrt{q\max(4, |\delta|)}$ or thereabouts.
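In the end, after some work, we obtain our main minor-arcs bound (Theorem 3.1.1).
It states the following. Let $x \geq x_0$, $x_0 = 2.16 \cdot 10^{20}$. Recall that
$S_\eta(\alpha, x) = \sum_n \Lambda(n)e(\alpha n)\eta(n/x)$ and
$\eta_2 = \eta_1 *_M \eta_1 = 4 \cdot 1_{[1/2,1]} *_M 1_{[1/2,1]}$. Let 2α = a/q + δ/x,
q ≤ Q, gcd(a, q) = 1, |δ/x| ≤ 1/qQ, where $Q = (3/4)x^{2/3}$. If $q \leq x^{1/3}/6$, then
\[
  |S_\eta(\alpha, x)| \leq \frac{R_{x,\delta_0 q}\log\delta_0 q + 0.5}{\sqrt{\delta_0\phi(q)}}\cdot x
  + \frac{2.5x}{\sqrt{\delta_0 q}} + \frac{2x}{\delta_0 q}\cdot L_{x,\delta_0 q,q} + 3.36x^{5/6}, \tag{1.33}
\]
where
\[
\begin{aligned}
  \delta_0 &= \max(2, |\delta|/4), \qquad
  R_{x,t} = 0.27125\log\left(1 + \frac{\log 4t}{2\log\frac{9x^{1/3}}{2.004t}}\right) + 0.41415,\\
  L_{x,t,q} &= \frac{q}{\phi(q)}\left(\frac{13}{4}\log t + 7.82\right) + 13.66\log t + 37.55.
\end{aligned} \tag{1.34}
\]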
The factor Rx,t is small in practice; for typical “difficult” values of x and δ0 x, it is
less than 1. The crucial things to notice in (1.33) are that there is no factor of log x, and
that, in the main term, there is only one factor of log δ0 q. The fact that δ0 helps us as
it grows is precisely what enables us to take major arcs that get narrower and narrower
as q grows.
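To get a feeling for the size of the bound (1.33), one can simply evaluate its right-hand side. The sketch below does so in Python; the sample values x = 10^27, q = 150000 and δ = 8 are our own illustrative choices (they satisfy q ≤ x^{1/3}/6), and the function names are hypothetical.

```python
import math

def euler_phi(q: int) -> int:
    """Euler's totient function, by trial division."""
    result, n, p = q, q, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result

def R(x: float, t: float) -> float:
    """The factor R_{x,t} from (1.34)."""
    inner = math.log(4 * t) / (2 * math.log(9 * x ** (1 / 3) / (2.004 * t)))
    return 0.27125 * math.log(1 + inner) + 0.41415

def L(t: float, q: int) -> float:
    """The factor L_{x,t,q} from (1.34); as written there, it does not involve x."""
    return (q / euler_phi(q)) * (13 / 4 * math.log(t) + 7.82) + 13.66 * math.log(t) + 37.55

def minor_arc_bound(x: float, q: int, delta: float) -> float:
    """Right-hand side of (1.33)."""
    delta0 = max(2.0, abs(delta) / 4)
    t = delta0 * q
    main = (R(x, t) * math.log(t) + 0.5) / math.sqrt(delta0 * euler_phi(q)) * x
    return (main + 2.5 * x / math.sqrt(t)
            + 2 * x / t * L(t, q) + 3.36 * x ** (5 / 6))

x, q, delta = 1e27, 150_000, 8.0
print("R_{x,t}        =", R(x, max(2.0, delta / 4) * q))
print("bound / x      =", minor_arc_bound(x, q, delta) / x)
```

The ratio printed on the last line measures the saving over the trivial bound of size x.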
Since the sum over n is of the order of x log x, this is not log-free, and so cannot be
good enough; we will later see how to do better. Still, this gets the main shape right:
our bound on (1.36) will be proportional to $|\eta_+|_2^2|\eta_*|_1$. Moreover, we see that $\eta_*$ has
to be such that we know how to bound $|S_{\eta_*}(\alpha, x)|$ for α ∈ m, while our choice of $\eta_+$
is more or less free, at least as far as the minor arcs are concerned.
What about the major arcs? In order to do anything on them, we will have to be
able to estimate both $\eta_+(\alpha)$ and $\eta_*(\alpha)$ for α ∈ M. If that is the case, then, as we
shall see, we will be able to obtain that the main term of (1.35) is an infinite product
(independent of the smoothing functions), times $x^2$, times
\[
\begin{aligned}
  &\int_{-\infty}^{\infty} (\widehat{\eta_+}(-\alpha))^2\,\widehat{\eta_*}(-\alpha)\,e(-\alpha n/x)\,d\alpha\\
  &\qquad = \int_0^\infty\!\!\int_0^\infty \eta_+(t_1)\eta_+(t_2)\,\eta_*\left(\frac{n}{x} - (t_1 + t_2)\right)dt_1\,dt_2.
\end{aligned} \tag{1.38}
\]
In other words, we want to maximize (or nearly maximize) the expression on the right
of (1.38) divided by $|\eta_+|_2^2|\eta_*|_1$.
One way to do this is to let $\eta_*$ be concentrated on a small interval [0, ε). Then the
right side of (1.38) is approximately
\[
  |\eta_*|_1 \cdot \int_0^\infty \eta_+(t)\,\eta_+\left(\frac{n}{x} - t\right)dt. \tag{1.39}
\]
To maximize (1.39), we should make sure that $\eta_+(t) \sim \eta_+(n/x - t)$. We set x ∼ n/2,
and see that we should define $\eta_+$ so that it is supported on [0, 2] and symmetric around
t = 1, or nearly so; this will maximize the ratio of (1.39) to $|\eta_+|_2^2|\eta_*|_1$.
We should do this while making sure that we will know how to estimate $S_{\eta_+}(\alpha, x)$
for α ∈ M. We know how to estimate $S_\eta(\alpha, x)$ very precisely for functions of the
form $\eta(t) = g(t)e^{-t^2/2}$, $\eta(t) = g(t)te^{-t^2/2}$, etc., where g(t) is band-limited. We will
work with a function $\eta_+$ of that form, chosen so as to be very close (in $\ell_2$ norm) to a
function $\eta_\circ$ that is in fact supported on [0, 2] and symmetric around t = 1.
We choose
\[
  \eta_\circ(t) = \begin{cases}
    t^3(2 - t)^3 e^{-(t-1)^2/2} & \text{if } t \in [0, 2],\\
    0 & \text{if } t \notin [0, 2].
  \end{cases}
\]
This function is obviously symmetric (η◦ (t) = η◦ (2 − t)) and vanishes to high order
at t = 0, besides being supported on [0, 2].
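The role of the symmetry can be seen numerically. By the Cauchy–Schwarz inequality, the integral of η(t)η(2 − t) over [0, 2] is at most the integral of η(t)², with equality exactly when η(t) = η(2 − t). The sketch below computes this ratio for η◦ and, for contrast, for a skewed weight with the same support; the skewed weight and the quadrature parameters are our own choices.

```python
import math

def eta_circ(t: float) -> float:
    """The weight t^3 (2-t)^3 exp(-(t-1)^2/2) on [0, 2], zero elsewhere."""
    if 0.0 <= t <= 2.0:
        return t ** 3 * (2 - t) ** 3 * math.exp(-((t - 1) ** 2) / 2)
    return 0.0

def skewed(t: float) -> float:
    """A comparison weight (our choice) with the same support but no symmetry."""
    if 0.0 <= t <= 2.0:
        return t ** 3 * (2 - t) ** 3 * math.exp(-(t ** 2) / 2)
    return 0.0

def overlap_ratio(eta, steps: int = 20_000) -> float:
    """Midpoint-rule value of the integral of eta(t)*eta(2-t) over [0, 2],
    divided by the integral of eta(t)^2 over [0, 2]."""
    h = 2.0 / steps
    nodes = [(k + 0.5) * h for k in range(steps)]
    overlap = sum(eta(t) * eta(2 - t) for t in nodes)
    norm_sq = sum(eta(t) ** 2 for t in nodes)
    return overlap / norm_sq

print("symmetric weight:", overlap_ratio(eta_circ))
print("skewed weight:   ", overlap_ratio(skewed))
```

The symmetric weight attains the maximal ratio 1 (up to rounding), while the skewed one falls short of it.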
We set $\eta_+(t) = h_R(t)\,te^{-t^2/2}$, where $h_R(t)$ is an approximation to the function
\[
  h(t) = \begin{cases}
    t^2(2 - t)^3 e^{t - \frac{1}{2}} & \text{if } t \in [0, 2],\\
    0 & \text{if } t \notin [0, 2].
  \end{cases}
\]
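The function h is chosen so that h(t) · t · e^{−t²/2} is exactly η◦(t) = t³(2 − t)³e^{−(t−1)²/2} on [0, 2], because (t − 1)²/2 = t²/2 − t + 1/2. A short Python check of this identity at a few sample points (the points and function names are arbitrary choices of ours):

```python
import math

def eta_circ(t: float) -> float:
    """t^3 (2-t)^3 exp(-(t-1)^2/2) on [0, 2], zero elsewhere."""
    if 0.0 <= t <= 2.0:
        return t ** 3 * (2 - t) ** 3 * math.exp(-((t - 1) ** 2) / 2)
    return 0.0

def h(t: float) -> float:
    """t^2 (2-t)^3 exp(t - 1/2) on [0, 2], zero elsewhere."""
    if 0.0 <= t <= 2.0:
        return t ** 2 * (2 - t) ** 3 * math.exp(t - 0.5)
    return 0.0

for t in (0.0, 0.4, 1.0, 1.6, 2.0, 2.3):
    product = h(t) * t * math.exp(-t * t / 2)
    print(f"t={t:3.1f}  h(t)*t*exp(-t^2/2)={product:.12f}  eta_circ(t)={eta_circ(t):.12f}")
```

This is why approximating h by h_R in a suitable norm yields a weight η₊ close to η◦.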
We just let $h_R(t)$ be the inverse Mellin transform of the truncation of Mh to an interval
[−iR, iR]. (Explicitly,
\[
  h_R(t) = \int_0^\infty h(ty^{-1})F_R(y)\,\frac{dy}{y},
\]
where $F_R(y) = \sin(R\log y)/(\pi\log y)$, that is, $F_R$ is the Dirichlet kernel with a change
of variables.)
Since the Mellin transform of $te^{-t^2/2}$ is regular at s = 0, the Mellin transform
$M\eta_+$ will be holomorphic in a neighborhood of {s : 0 ≤ ℜ(s) ≤ 1}, even though
the truncation of Mh to [−iR, iR] is brutal. Set R = 200, say. By the fast decay of
Mh(it) and the fact that the Mellin transform M is an isometry, $|(h_R(t) - h(t))/t|_2$ is
very small, and hence so is $|\eta_+ - \eta_\circ|_2$, as we desired.
But what about the requirement that we be able to estimate Sη∗ (α, x) for both
α ∈ m and α ∈ M?
Generally speaking, if we know how to estimate Sη1 (α, x) for some α ∈ R/Z and
we also know how to estimate Sη2 (α, x) for all other α ∈ R/Z, where η1 and η2 are
two smoothing functions, then we know how to estimate Sη3 (α, x) for all α ∈ R/Z,
where η3 = η1 ∗M η2 , or, more generally, η∗ (t) = (η1 ∗M η2 )(κt), κ > 0 a constant.
This is an easy exercise on exchanging the order of integration and summation:
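$$\begin{aligned} S_{\eta_*}(\alpha, x) &= \sum_n \Lambda(n)\, e(\alpha n)\, (\eta_1 *_M \eta_2)\!\left(\kappa \frac{n}{x}\right)\\ &= \int_0^\infty \sum_n \Lambda(n)\, e(\alpha n)\, \eta_1(\kappa r)\, \eta_2\!\left(\frac{n}{rx}\right) \frac{dr}{r} = \int_0^\infty \eta_1(\kappa r)\, S_{\eta_2}(rx)\, \frac{dr}{r}, \end{aligned} \tag{1.40}$$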
and similarly with η1 and η2 switched. Of course, this trick is valid for all exponential sums: any function f(n) would do in place of Λ(n). The only caveat is that η1 (and η2) should be small very near 0, since, for r small, we may not be able to estimate $S_{\eta_2}(rx)$ (or $S_{\eta_1}(rx)$) with any precision. This is not a problem; one of our functions will be $t^2 e^{-t^2/2}$, which vanishes to second order at 0, and the other one will be $\eta_2 = 4 \cdot 1_{[1/2,1]} *_M 1_{[1/2,1]}$, which has support bounded away from 0. We will set κ large (say κ = 49) so that the support of $\eta_*$ is indeed concentrated on a small interval $[0, \epsilon)$, as we wanted.
***
Now that we have chosen our smoothing weights η+ and η∗ , we have to estimate the
major-arc integral (1.35) and the minor-arc integral (1.36). What follows can actually
be done for general η+ and η∗ ; we could have left our particular choice of η+ and η∗
for the end.
Estimating the major-arc integral (1.35) may sound like an easy task, since we have
rather precise estimates for Sη (α, x) (η = η+ , η∗ ) when α is on the major arcs; we
could just replace Sη (α, x) in (1.35) by the approximation given by (1.7) and (1.11). It
is, however, more efficient to express (1.35) as the sum of the contribution of the trivial
character (a sum of integrals of $(\hat\eta(-\delta)x)^3$, where $\hat\eta(-\delta)x$ comes from (1.11)), plus a
where E(q) = E is as in (1.12), plus two other terms of essentially the same form. As
usual, the major arcs $\mathfrak{M}$ are the arcs around rationals a/q with q ≤ r. We will soon
discuss how to bound the integral of $|S_{\eta_+}(\alpha, x)|^2$ over arcs around rationals a/q with
q ≤ s, s arbitrary. Here, however, it is best to estimate the integral over M using the
estimate on Sη+ (α, x) from (1.7) and (1.11); we obtain a great deal of cancellation,
with the effect that, for χ non-trivial, the error term in (1.12) appears only when it gets
squared, and thus becomes negligible.
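The contribution of the trivial character has an easy approximation, thanks to the fast decay of $\widehat{\eta_\circ}$. We obtain that the major-arc integral (1.35) equals a main term $C_0\, C_{\eta_\circ,\eta_*}\, x^2$, where
$$C_0 = \prod_{p \mid n} \left(1 - \frac{1}{(p-1)^2}\right) \cdot \prod_{p \nmid n} \left(1 + \frac{1}{(p-1)^3}\right),$$
$$C_{\eta_\circ,\eta_*} = \int_0^\infty\!\!\int_0^\infty \eta_\circ(t_1)\,\eta_\circ(t_2)\,\eta_*\!\left(\frac{n}{x} - (t_1+t_2)\right) dt_1\, dt_2,$$
plus several small error terms. We have already chosen $\eta_\circ$, $\eta_*$ and x so as to (nearly) maximize $C_{\eta_\circ,\eta_*}$.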
It is time to bound the minor-arc integral (1.36). As we said in §1.5, we must do better than the usual bound (1.37). Since our minor-arc bound (3.2) on $|S_\eta(\alpha, x)|$, α ∼ a/q, decreases as q increases, it makes sense to use partial summation together with bounds on
$$\int_{\mathfrak{m}_s} |S_{\eta_+}(\alpha, x)|^2\, d\alpha = \int_{\mathfrak{M}_s} |S_{\eta_+}(\alpha, x)|^2\, d\alpha - \int_{\mathfrak{M}} |S_{\eta_+}(\alpha, x)|^2\, d\alpha,$$
where $\mathfrak{m}_s$ denotes the arcs around a/q, r < q ≤ s, and $\mathfrak{M}_s$ denotes the arcs around all a/q, q ≤ s. We already know how to estimate the integral on $\mathfrak{M}$. How do we bound the integral on $\mathfrak{M}_s$?
In order to do better than the trivial bound $\int_{\mathfrak{M}_s} \le \int_{\mathbb{R}/\mathbb{Z}}$, we will need to use the
fact that the series (1.6) defining $S_{\eta_+}(\alpha, x)$ is essentially supported on prime numbers. Bounding the integral on $\mathfrak{M}_s$ is closely related to the problem of bounding
$$\sum_{q \le s}\ \sum_{\substack{a \bmod q\\ (a,q)=1}} \left| \sum_{n \le x} a_n\, e(an/q) \right|^2 \tag{1.41}$$
efficiently for s considerably smaller than $\sqrt{x}$ and $a_n$ supported on the primes $\sqrt{x} < p \le x$. This is a classical problem in the study of the large sieve. The usual bound on (1.41) (by, for instance, Montgomery’s inequality) has a gain of a factor of $2e^\gamma (\log s)/(\log x/s^2)$ relative to the bound of $(x + s^2) \sum_n |a_n|^2$ that one would get from the large sieve without using prime support. Heath-Brown proceeded similarly to bound
$$\int_{\mathfrak{M}_s} |S_{\eta_+}(\alpha, x)|^2\, d\alpha \lesssim \frac{2e^\gamma \log s}{\log x/s^2} \int_{\mathbb{R}/\mathbb{Z}} |S_{\eta_+}(\alpha, x)|^2\, d\alpha. \tag{1.42}$$
This already gives us the gain of C(log s)/ log x that we absolutely need, but
the constant C is suboptimal; the factor in the right side of (1.42) should really be
(log s)/ log x, i.e., C should be 1. We cannot reasonably hope to obtain a factor better
than 2(log s)/ log x in the minor arcs due to what is known as the parity problem in
sieve theory. As it turns out, Ramaré [Ram09] had given general bounds on the large
sieve that were clearly conducive to better bounds on (1.41), though they involved a
ratio that was not easy to bound in general.
I used several careful estimations (including [Ram95, Lem. 3.4]) to reduce the
problem of bounding this ratio to a finite number of cases, which I then checked by
a rigorous computation. This approach gave a bound on (1.41) with a factor of size
close to 2(log s)/ log x. (This solves the large-sieve problem for $s \le x^{0.3}$; it would still be worthwhile to give a computation-free proof for all $s \le x^{1/2-\epsilon}$, $\epsilon > 0$.) It was then easy to give an analogous bound for the integral over $\mathfrak{M}_s$, namely,
$$\int_{\mathfrak{M}_s} |S_{\eta_+}(\alpha, x)|^2\, d\alpha \lesssim \frac{2 \log s}{\log x} \int_{\mathbb{R}/\mathbb{Z}} |S_{\eta_+}(\alpha, x)|^2\, d\alpha,$$
where ≲ can easily be made precise by replacing log s by log s + 1.36 and log x by log x + c, where c is a small constant. Without this improvement, the main theorem would still have been proved, but the required computation time would have been multiplied by a factor of considerably more than $e^{3\gamma} = 5.6499\ldots$.
What remained then was just to compare the estimates on (1.35) and (1.36) and check that (1.36) is smaller for $n \ge 10^{27}$. This final step was just bookkeeping. As we already discussed, a check for $n < 10^{27}$ is easy. Thus ends the proof of the main theorem.
$5 < n \le 8.8 \cdot 10^{30}$, there is a prime p in the list such that $4 \le n - p \le 4 \cdot 10^{18} + 2$. (Choose the largest p < n in the ladder, or, if n minus that prime is 2, choose the prime immediately under that.) By [OeSHP14] (and the fact that $4 \cdot 10^{18} + 2$ equals p + q, where p = 2000000000000001301 and q = 1999999999999998701 are both prime), we can write n − p = p1 + p2 for some primes p1, p2, and so n = p + p1 + p2.
Building a prime ladder involves only integer arithmetic, that is, computer manip-
ulation of integers, rather than of real numbers. Integers are something that computers
can handle rapidly and reliably. We look for primes for our ladder only among a special set of integers whose primality can be tested deterministically quite quickly (Proth numbers: $k \cdot 2^m + 1$, $k < 2^m$). Thus, we can build a prime ladder by a rigorous, deterministic algorithm that can be (and was) parallelized trivially.
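To illustrate why Proth numbers are convenient, here is a minimal Python sketch of a primality test based on Proth's theorem: for N = k · 2^m + 1 with k < 2^m and a base a whose Jacobi symbol modulo N is −1, N is prime exactly when a^((N−1)/2) ≡ −1 (mod N). This is an assumption about how such a test can be organized, not a description of the code actually used for the ladder; the names `jacobi` and `proth_is_prime` and the cutoff `max_base` are hypothetical.

```python
import math

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, by the standard reciprocity loop."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def proth_is_prime(k, m, max_base=1000):
    """Decide primality of N = k * 2**m + 1, assuming 0 < k < 2**m."""
    if m < 1 or not (0 < k < 2**m):
        raise ValueError("need m >= 1 and 0 < k < 2**m")
    N = k * 2**m + 1
    root = math.isqrt(N)
    if root * root == N:
        return False  # perfect squares never have a base with symbol -1
    for a in range(2, max_base):
        j = jacobi(a, N)
        if j == 0 and a % N != 0:
            return False  # a shares a proper factor with N
        if j == -1:
            # Euler's criterion if N is prime; Proth's theorem for the converse.
            return pow(a, (N - 1) // 2, N) == N - 1
    raise RuntimeError("no base with Jacobi symbol -1 below max_base")
```

Only integer arithmetic is involved (one modular exponentiation per candidate), which is the point made above about speed and reliability.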
The second computation is more demanding. It consists in verifying that, for every L-function L(s, χ) with χ of conductor q ≤ r = 300000 (for q even) or q ≤ r/2 (for q odd), all zeroes of L(s, χ) such that $|\Im(s)| \le H_q = 10^8/q$ (for q odd) and $|\Im(s)| \le H_q = \max(10^8/q,\ 200 + 7.5 \cdot 10^7/q)$ (for q even) lie on the critical line.
As a matter of fact, Platt went up to conductor q ≤ 200000 (or twice that for q even)
[Plab]; he had already gone up to conductor 100000 in his PhD thesis [Pla11]. The
verification took, in total, about 400000 core-hours (i.e., the total number of processor
cores used times the number of hours they ran equals 400000; nowadays, a top-of-the-
line processor typically has eight cores). In the end, since I used only q ≤ 150000 (or
twice that for q even), the number of hours actually needed was closer to 160000; since
I could have made do with q ≤ 120000 (at the cost of increasing C to $10^{29}$ or $10^{30}$), it
is likely, in retrospect, that only about 80000 core-hours were needed.
Checking zeros of L-functions computationally goes back to Riemann (who did
it by hand for the special case of the Riemann zeta function). It is also one of the
things that were tried on digital computers in their early days (by Turing [Tur53], for
instance; see the exposition in [Boo06b]). One of the main issues to be careful about
arises whenever one manipulates real numbers via a computer: generally speaking, a
computer cannot store an irrational number; moreover, while a computer can handle
rationals, it is really most comfortable handling just those rationals whose denomina-
tors are powers of two. Thus, one cannot really say: “computer, give me the sine of
that number” and expect a precise result. What one should do, if one really wants to
prove something (as is the case here!), is to say: “computer, I am giving you an interval $I = [a/2^k, b/2^k]$; give me an interval $I' = [c/2^\ell, d/2^\ell]$, preferably very short, such that $\sin(I) \subset I'$”. This is called interval arithmetic; it is arguably the easiest way to do floating-point computations rigorously.
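As a toy version of this dialogue, the following Python sketch encloses sin(I) for a dyadic interval I ⊂ [0, 1]. It is an illustration under simplifying assumptions and not the package used in the proof: it works with exact rationals (`fractions.Fraction`) instead of hardware floats, it relies on sin being increasing on [0, 1], and it uses the fact that, for 0 ≤ x ≤ 1, the Taylor partial sums of sin alternate between lower and upper bounds. The names `sin_partial_sum` and `sin_enclosure` are ours.

```python
from fractions import Fraction
import math

def sin_partial_sum(x, n_terms):
    """x - x^3/3! + x^5/5! - ... with n_terms terms, in exact arithmetic."""
    total, term = Fraction(0), Fraction(x)
    for k in range(n_terms):
        total += term
        term = -term * x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def sin_enclosure(a, b, ell=30, pairs=3):
    """Return dyadic (c/2^ell, d/2^ell) with sin([a, b]) inside, for 0 <= a <= b <= 1."""
    a, b = Fraction(a), Fraction(b)
    if not (0 <= a <= b <= 1):
        raise ValueError("this sketch only handles 0 <= a <= b <= 1")
    lower = sin_partial_sum(a, 2 * pairs)       # ends on a negative term: <= sin(a)
    upper = sin_partial_sum(b, 2 * pairs + 1)   # ends on a positive term: >= sin(b)
    scale = 2**ell
    c = math.floor(lower * scale)               # round the lower endpoint down
    d = math.ceil(upper * scale)                # round the upper endpoint up
    return Fraction(c, scale), Fraction(d, scale)
```

For instance, `sin_enclosure(Fraction(1, 2), Fraction(5, 8))` returns two rationals with power-of-two denominators that bracket every value of sin on [1/2, 5/8]; the outward rounding in the last step is what keeps the enclosure rigorous.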
Processors do not do this natively, and if interval arithmetic is implemented purely in software, computations can be slowed down by a factor of about 100. Fortunately, there are ways of running interval-arithmetic computations partly in hardware, partly in software.
Incidentally, there are some basic functions (such as sin) that should always be done in software, not just if one wants to use interval arithmetic, but even if one just wants reasonably precise results: the implementation of transcendental functions in some of the most popular processors does not always round correctly, and errors can accumulate quickly. Fortunately, this problem is already well-known, and there is software that takes care of this. (Platt and I used the crlibm library [DLDDD+10].)
Lastly, there were several relatively minor computations strewn here and there in
the proof. There is some numerical integration, done rigorously; once or twice, this
was done using a standard package based on interval arithmetic [Ned06], but most of
the time I wrote my own routines in C (using Platt’s interval arithmetic package) for
the sake of speed. Another kind of computation (employed much more in [Hela] than
in the somewhat more polished version of the proof given here) was a rigorous version
of a “proof by graph” (“the maximum of a function f is clearly less than 4 because I
can see it on the screen”). There is a standard way to do this (see, e.g., [Tuc11, §5.2]);
essentially, the bisection method combines naturally with interval arithmetic, as we
shall describe in §2.6. Yet another computation (and not a very small one) was that
involved in verifying a large-sieve inequality in an intermediate range (as we discussed
in §1.5).
It may be interesting to note that one of the inequalities used to estimate (1.30) was
proven with the help of automatic quantifier elimination [HB11]. Proving this inequal-
ity was a very minor task, both computationally and mathematically; in all likelihood,
it is feasible to give a human-generated proof. Still, it is nice to know from first-
hand experience that computers can nowadays (pretend to) do something other than
just perform numerical computations – and that this is already applicable in current
mathematical practice.
Chapter 2
We let τ (n) be the number of divisors of an integer n, ω(n) the number of prime
divisors of n, and σ(n) the sum of the divisors of n.
We write (a, b) for the greatest common divisor of a and b. If there is any risk of confusion with the pair (a, b), we write gcd(a, b). Denote by $(a, b^\infty)$ the divisor $\prod_{p \mid b} p^{v_p(a)}$ of a. (Thus, $a/(a, b^\infty)$ is coprime to b, and is in fact the maximal divisor of a with this property.)
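The divisor (a, b^∞) can be computed without factoring b, by repeatedly stripping gcd(a, b) from a. The short Python sketch below is one way to do it (the function name `gcd_with_infinite_power` is ours):

```python
from math import gcd

def gcd_with_infinite_power(a, b):
    """Return the product of p^{v_p(a)} over primes p dividing b, for a, b >= 1."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive integers")
    result = 1
    g = gcd(a, b)
    while g > 1:
        result *= g   # move the common part from a into the result
        a //= g
        g = gcd(a, b)
    return result
```

For a = 360 = 2³ · 3² · 5 and b = 6 this gives 72, and 360/72 = 5 is indeed coprime to 6.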
As is customary, we write e(x) for $e^{2\pi i x}$. We denote the $L_r$ norm of a function f by $|f|_r$. We write $O^*(R)$ to mean a quantity at most R in absolute value. Given a set S, we write $1_S$ for its characteristic function:
$$1_S(x) = \begin{cases} 1 & \text{if } x \in S,\\ 0 & \text{otherwise.} \end{cases}$$
32 CHAPTER 2. NOTATION AND PRELIMINARIES
not induced by any character of smaller modulus. Given a character χ, we write χ∗ for the (uniquely defined) primitive character inducing χ. If a character χ mod q is induced by the trivial character χT, we say that χ is principal and write χ0 for χ (provided the modulus q is clear from the context). In other words, χ0(n) = 1 when (n, q) = 1 and χ0(n) = 0 when (n, q) ≠ 1.
A Dirichlet L-function L(s, χ) (χ a Dirichlet character) is defined as the analytic continuation of $\sum_n \chi(n) n^{-s}$ to the entire complex plane; there is a pole at s = 1 if χ is principal.
A non-trivial zero of L(s, χ) is any s ∈ C such that L(s, χ) = 0 and $0 < \Re(s) < 1$. (In particular, a zero at s = 0 is called “trivial”, even though its contribution can be a little tricky to work out. The same would go for the other zeros with $\Re(s) = 0$ occurring for χ non-primitive, though we will avoid this issue by working mainly with χ primitive.) The zeros that occur at (some) negative integers are called trivial zeros.
The critical line is the line $\Re(s) = 1/2$ in the complex plane. Thus, the generalized Riemann hypothesis for Dirichlet L-functions reads: for every Dirichlet character χ, all non-trivial zeros of L(s, χ) lie on the critical line. Verifiable finite versions of the generalized Riemann hypothesis generally read: for every Dirichlet character χ of modulus q ≤ Q, all non-trivial zeros of L(s, χ) with $|\Im(s)| \le f(q)$ lie on the critical line (where $f : \mathbb{Z} \to \mathbb{R}^+$ is some given function).
(The first bound follows from $\sum_{n \in \mathbb{Z}} |f(n)| \le |f|_1 + (1/2)|f'|_1$, which, in turn, is a quick consequence of the fundamental theorem of calculus; the second bound is proven by summation by parts.) The alternative bound $(1/4)|f''|_1/|\sin(\pi\alpha)|^2$ given in [Tao14, Lemma 3.1] (for f continuous and piecewise $C^1$) can usually be improved by the following estimate.
Lemma 2.3.1. Let $f : \mathbb{R} \to \mathbb{C}$ be compactly supported, continuous and piecewise $C^1$. Then
$$\left| \sum_{n \in \mathbb{Z}} f(n)\, e(\alpha n) \right| \le \frac{\frac14\, |\widehat{f''}|_\infty}{(\sin \pi\alpha)^2} \tag{2.3}$$
for every α ∈ R.
As usual, the assumption of compact support could easily be relaxed to an assump-
tion of fast decay.
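Proof. By the Poisson summation formula,
$$\sum_{n=-\infty}^{\infty} f(n)\, e(\alpha n) = \sum_{n=-\infty}^{\infty} \hat f(n - \alpha).$$
Hence
$$\left| \sum_{n=-\infty}^{\infty} \hat f(n - \alpha) \right| \le |\widehat{f''}|_\infty \sum_{n=-\infty}^{\infty} \frac{1}{(2\pi(n-\alpha))^2} = |\widehat{f''}|_\infty \cdot \frac{1}{(2\pi)^2} \cdot \frac{\pi^2}{(\sin \alpha\pi)^2}.$$
□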
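The last equality in the proof rests on the classical identity Σ_n 1/(n − α)² = π²/sin²(πα). A quick floating-point sanity check in Python (not a proof; the truncation point N and the names `truncated_sum` and `closed_form` are our choices) compares a truncated sum with the closed form 1/(4 sin²(πα)):

```python
import math

def truncated_sum(alpha, N):
    """Sum of 1 / (2*pi*(n - alpha))^2 over -N <= n <= N."""
    return sum(1.0 / (2.0 * math.pi * (n - alpha)) ** 2 for n in range(-N, N + 1))

def closed_form(alpha):
    """The value (1/(2*pi)^2) * pi^2 / sin(pi*alpha)^2 = 1 / (4 sin(pi*alpha)^2)."""
    return 1.0 / (4.0 * math.sin(math.pi * alpha) ** 2)
```

The omitted tail is of size about 1/(2π²N), so with N = 100000 the two values agree to roughly six decimal places.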
The trivial bound $|\widehat{f''}|_\infty \le |f''|_1$, applied to (2.3), recovers the bound in [Tao14, Lemma 3.1]. In order to do better, we will give a tighter bound for $|\widehat{f''}|_\infty$ in Appendix B when f is equal to one of our main smoothing functions (f = η2).
Integrals of multiples of $f''$ (in particular, $|f''|_1$ and $\widehat{f''}$) can still be made sense of when $f''$ is undefined at a finite number of points, provided f is understood as a distribution (and $f'$ has finite total variation). This is the case, in particular, for f = η2.
***
When we need to estimate $\sum_n f(n)$ precisely, we will use the Poisson summation formula:
$$\sum_n f(n) = \sum_n \hat f(n).$$
We will not have to worry about convergence here, since we will apply the Poisson
summation formula only to compactly supported functions f whose Fourier transforms
decay at least quadratically.
Recall that, in the case of the Fourier transform, for $|\hat f|_2 = |f|_2$ to hold, it is enough that f be in $\ell_1 \cap \ell_2$. This gives us that, for (2.6) to hold, it is enough that $f(x)x^{\sigma-1}$ be in $\ell_1$ and $f(x)x^{\sigma-1/2}$ be in $\ell_2$ (again, with respect to dt, in both cases).
We write $f *_M g$ for the multiplicative, or Mellin, convolution of f and g:
$$(f *_M g)(x) = \int_0^\infty f(w)\, g\!\left(\frac{x}{w}\right) \frac{dw}{w}. \tag{2.7}$$
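As a concrete instance of (2.7), the Python sketch below convolves the weight 2 · 1_{[1/2,1]} with itself numerically and compares the result with the closed form 4 · max(log 2 − |log 2x|, 0). That closed form was worked out by hand for this illustration and is not quoted from the text; the function names, the midpoint rule and the number of steps are likewise our assumptions.

```python
import math

def two_times_indicator(t):
    """The weight 2 * 1_{[1/2, 1]}(t)."""
    return 2.0 if 0.5 <= t <= 1.0 else 0.0

def mellin_convolution(f, g, x, w_min, w_max, steps=20000):
    """Midpoint rule for the integral of f(w) g(x/w) dw/w over [w_min, w_max].

    The substitution w = exp(u) turns dw/w into du.
    """
    u0, u1 = math.log(w_min), math.log(w_max)
    du = (u1 - u0) / steps
    total = 0.0
    for j in range(steps):
        w = math.exp(u0 + (j + 0.5) * du)
        total += f(w) * g(x / w)
    return total * du

def closed_form(x):
    """Hand-computed value of the self-convolution of 2 * 1_{[1/2, 1]} at x > 0."""
    return 4.0 * max(math.log(2.0) - abs(math.log(2.0 * x)), 0.0)
```

For example, `mellin_convolution(two_times_indicator, two_times_indicator, 0.5, 0.5, 1.0)` is close to 4 log 2, the peak of the resulting tent-shaped (in log x) function, which is supported on [1/4, 1].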
In general,
M (f ∗M g) = M f · M g (2.8)
2.5. BOUNDS ON SUMS OF µ AND Λ 35
and
$$M(f \cdot g)(s) = \frac{1}{2\pi i} \int_{\sigma - i\infty}^{\sigma + i\infty} Mf(z)\, Mg(s - z)\, dz \quad \text{[GR94, §17.32]} \tag{2.9}$$
provided that z and s − z are within the strips on which M f and M g (respectively) are
well-defined.
We also have several useful transformation rules, just as for the Fourier transform.
For example,
$$\begin{aligned} M(f'(t))(s) &= -(s-1) \cdot Mf(s-1),\\ M(tf'(t))(s) &= -s \cdot Mf(s),\\ M((\log t)f(t))(s) &= (Mf)'(s) \end{aligned} \tag{2.10}$$
(as in, e.g., [BBO10, Table 1.11]).
Let
$$\eta_2 = (2 \cdot 1_{[1/2,1]}) *_M (2 \cdot 1_{[1/2,1]}).$$
Since (see, e.g., [BBO10, Table 11.3] or [GR94, §16.43])
$$(M I_{[a,b]})(s) = \frac{b^s - a^s}{s},$$
we see that
$$M\eta_2(s) = \left(\frac{1 - 2^{-s}}{s}\right)^2, \qquad M\eta_4(s) = \left(\frac{1 - 2^{-s}}{s}\right)^4. \tag{2.11}$$
where the next-to-last step holds by contour integration, and the last step holds by the
definition of the Gamma function Γ(s).
First, let us see some bounds involving Λ. The following bound can be easily
derived from [RS62, (3.23)], supplemented by a quick calculation of the contribution
of powers of primes p < 32:
$$\sum_{n \le x} \frac{\Lambda(n)}{n} \le \log x. \tag{2.12}$$
We can derive a bound in the other direction from [RS62, (3.21)] (for x > 1000, adding the contribution of all prime powers ≤ 1000) and a numerical verification for x ≤ 1000:
$$\sum_{n \le x} \frac{\Lambda(n)}{n} \ge \log x - \log \frac{3}{\sqrt{2}}. \tag{2.13}$$
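A numerical verification of this kind is easy to reproduce in floating point (which is not rigorous, unlike the interval-arithmetic computations described earlier). The Python sketch below checks (2.12) at integer x and (2.13) in the limit x → (n + 1)⁻, which are the worst cases on each interval [n, n + 1) because the sum is a step function; the function names and the tolerance are ours.

```python
import math

def von_mangoldt_table(N):
    """lam[n] = Lambda(n) for 0 <= n <= N, via a simple sieve."""
    lam = [0.0] * (N + 1)
    composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if composite[p]:
            continue
        for multiple in range(p * p, N + 1, p):
            composite[multiple] = True
        power = p
        while power <= N:          # Lambda(p^k) = log p
            lam[power] = math.log(p)
            power *= p
    return lam

def bounds_hold(N, tol=1e-12):
    """Check (2.12) and (2.13) on [1, N + 1) up to a floating-point tolerance."""
    lam = von_mangoldt_table(N)
    shift = math.log(3.0 / math.sqrt(2.0))
    partial = 0.0
    for n in range(1, N + 1):
        partial += lam[n] / n
        if partial > math.log(n) + tol:              # (2.12) at x = n
            return False
        if partial < math.log(n + 1) - shift - tol:  # (2.13) as x -> (n+1)^-
            return False
    return True
```

The tolerance matters: as x approaches 3 from below, the sum equals (log 2)/2 = log 3 − log(3/√2) exactly, so (2.13) is attained in the limit there, which explains the constant.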
for $y \ge 2 \cdot 10^6$.
2.
$$\sum_{n \le y} \Lambda(n) < 1.03883\, y \tag{2.15}$$
where we use (2.15) and partial summation for y > 200000, and a computation for 663 < y ≤ 200000. Using instead the second table in [RR96, p. 423], together with computations for small $y < 10^7$ and partial summation, we get that
$$\sum_{n \le y} \Lambda(n)\, n < 1.0008 \cdot \frac{y^2}{2} \tag{2.17}$$
for all y ≥ 1.
It is also true that
$$\sum_{y/2 < p \le y} (\log p)^2 \le \frac12\, y (\log y) \tag{2.19}$$
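$$\left| \sum_{n \le x:\, \gcd(n,q)=1} \frac{\mu(n)}{n} \right| \le 1 \tag{2.20}$$
for all x, q ≥ 1,
2. (Ramaré [Ram13]; cf. El Marraki [EM95], [EM96])
$$\left| \sum_{n \le x} \frac{\mu(n)}{n} \right| \le \frac{0.03}{\log x} \tag{2.21}$$
for x ≥ 11815.
3. (Ramaré [Ramb])
$$\sum_{n \le x:\, \gcd(n,q)=1} \frac{\mu(n)}{n} = O^*\!\left( \frac{1}{\log x/q} \cdot \frac45\, \frac{q}{\phi(q)} \right) \tag{2.22}$$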
The computation was conducted rigorously by means of interval arithmetic. For the
sake of verification, we record that
$$5.42625 \cdot 10^{-8} \le \sum_{n \le 10^{12}} \frac{\mu(n)}{n} \le 5.42898 \cdot 10^{-8}.$$
$$\left| \sum_{n \le x} \frac{\mu(n)}{n} \right| \le \frac{1}{2\sqrt{x}}$$
bound $r_k^-$ and an upper bound $r_k^+$ on $\{f(x) : x \in I_k\}$; here $r_k^-$ and $r_k^+$ are both of the form $a/2^\ell$, $a, \ell \in \mathbb{Z}$. Let $m_0$ be the minimum of $r_k^+$ over all k. We can discard all the intervals $I_k$ for which $r_k^- > m_0$. Then we apply the main procedure: starting with i = 1, split each surviving interval into two equal halves, recompute the lower and upper bound on each half, define $m_i$, as before, to be the minimum of all upper bounds, and discard, again, the intervals on which the lower bound is larger than $m_i$; increase i by 1. We repeat the main procedure as often as needed. In the end, we obtain that the minimum is no smaller than the minimum of the lower bounds (call them $(r^{(i)})_k^-$) on all surviving intervals $I_k^{(i)}$. Of course, we also obtain that the minimum (or minima, if there is more than one) must lie in one of the surviving intervals.
It is easy to see how the same method can be applied (with a trivial modification)
to find maxima, or (with very slight changes) to find the roots of a real-valued function
on a compact interval.
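The procedure just described can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the routines used in the proof: exact rationals stand in for an interval-arithmetic package, and the sample function f(x) = x³ − x on [0, 1], with the crude enclosure [a³ − b, b³ − a] valid for 0 ≤ a ≤ b, is our choice. The names `bounds_cubic` and `minimize` are hypothetical.

```python
from fractions import Fraction

def bounds_cubic(a, b):
    """Rigorous (lower, upper) bounds for x^3 - x on [a, b], assuming 0 <= a <= b."""
    return a**3 - b, b**3 - a

def minimize(bounds, lo, hi, pieces=4, rounds=12):
    """Bisection with discarding; returns (lower bound, upper bound, survivors)."""
    lo, hi = Fraction(lo), Fraction(hi)
    step = (hi - lo) / pieces
    intervals = [(lo + k * step, lo + (k + 1) * step) for k in range(pieces)]
    for i in range(rounds + 1):
        if i > 0:  # main procedure: split every surviving interval in two
            intervals = [half for (a, b) in intervals
                         for half in ((a, (a + b) / 2), ((a + b) / 2, b))]
        enclosures = [bounds(a, b) for (a, b) in intervals]
        m_i = min(upper for (_, upper) in enclosures)
        kept = [(iv, enc) for iv, enc in zip(intervals, enclosures) if enc[0] <= m_i]
        intervals = [iv for iv, _ in kept]
        lower = min(enc[0] for _, enc in kept)
    return lower, m_i, intervals
```

With the defaults, `minimize(bounds_cubic, 0, 1)` brackets the true minimum −2/(3√3) within less than 10⁻³, and one of the surviving intervals contains the minimizer 1/√3. All endpoints and bounds are dyadic rationals, in line with the form a/2^ℓ above.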
Part I
Minor arcs
Chapter 3
Introduction
The circle method expresses the number of solutions to a given problem in terms of exponential sums. Let $\eta : \mathbb{R}^+ \to \mathbb{C}$ be a smooth function, Λ the von Mangoldt function (defined as in (1.5)) and $e(t) = e^{2\pi i t}$. The estimation of exponential sums of the type
$$S_\eta(\alpha, x) = \sum_n \Lambda(n)\, e(\alpha n)\, \eta(n/x), \tag{3.1}$$
where α ∈ R/Z, already lies at the basis of Hardy and Littlewood’s approach to the
ternary Goldbach problem by means of the circle method [HL22]. The division of the
circle R/Z into “major arcs” and “minor arcs” goes back to Hardy and Littlewood’s
development of the circle method for other problems. As they themselves noted, as-
suming GRH means that, for the ternary Goldbach problem, all of the circle can be,
in effect, subdivided into major arcs – that is, under GRH, (3.1) can be estimated with
major-arc techniques for α arbitrary. They needed to make such an assumption pre-
cisely because they did not yet know how to estimate Sη (α, x) on the minor arcs.
Minor-arc techniques for Goldbach’s problem were first developed by Vinogradov
[Vin37]. These techniques make it possible to work without GRH. The main obstacle
to a full proof of the ternary Goldbach conjecture since then has been that, in spite of
gradual improvements, minor-arc bounds have simply not been strong enough.
As in all work to date, our aim will be to give useful upper bounds on (3.1) for α in the minor arcs, rather than the precise estimates that are typical of the major-arc case. We will have to give upper bounds that are qualitatively stronger than those known before. (In Part III, we will also show how to use them more efficiently.)
Our main challenge will be to give a sufficiently good upper bound whenever q is larger than a constant r. Here “sufficiently good” means “smaller than the trivial bound divided by
a large constant, and getting even smaller quickly as q grows”. Our bound must also be
good for α = a/q + δ/x, where q < r but δ is large. (Such an α may be said to lie on
the tail (δ large) of a major arc (q small).)
Of course, all expressions must be explicit and all constants in the leading terms of
the bound must be small. Still, the main requirement is a qualitative one. For instance,
we know in advance that a single factor of log x would be the end of us. That is, we
know that, if there is a single term of the form, say, (x log x)/q, and the trivial bound
is about x, we are lost: (x log x)/q is greater than x for x large and q constant.
The quality of the results here is due to several new ideas of general applicability.
In particular, §5.1 introduces a way to obtain cancellation from Vaughan’s identity.
Vaughan’s identity is a two-log gambit, in that it introduces two convolutions (each of
them at a cost of log) and offers a great deal of flexibility in compensation. One of the
ideas presented here is that at least one of two logs can be successfully recovered after
having been given away in the first stage of the proof. This reduces the cost of the use
of this basic identity in this and, presumably, many other problems.
There are several other improvements that make a qualitative difference; see the
discussions at the beginning of §4 and §5. Considering smoothed sums – now a com-
mon idea – also helps. (Smooth sums here go back to Hardy-Littlewood [HL22] – both
in the general context of the circle method and in the context of Goldbach’s ternary
problem. In recent work on the problem, they reappear in [Tao14].)
3.1 Results
The main bound we are about to see is essentially proportional to $((\log q)/\sqrt{\phi(q)}) \cdot x$. The term δ0 serves to improve the bound when we are on the tail of an arc.
Theorem 3.1.1. Let $x \ge x_0$, $x_0 = 2.16 \cdot 10^{20}$. Let $S_\eta(\alpha, x)$ be as in (3.1), with η defined in (3.4). Let 2α = a/q + δ/x, q ≤ Q, gcd(a, q) = 1, |δ/x| ≤ 1/qQ, where $Q = (3/4)x^{2/3}$. If $q \le x^{1/3}/6$, then
$$|S_\eta(\alpha, x)| \le \frac{R_{x,\delta_0 q} \log \delta_0 q + 0.5}{\sqrt{\delta_0 \phi(q)}} \cdot x + \frac{2.5x}{\sqrt{\delta_0 q}} + \frac{2x}{\delta_0 q} \cdot L_{x,\delta_0 q,q} + 3.36 x^{5/6}, \tag{3.2}$$
where
$$\delta_0 = \max(2, |\delta|/4), \qquad R_{x,t} = 0.27125 \log\left(1 + \frac{\log 4t}{2 \log \frac{9x^{1/3}}{2.004t}}\right) + 0.41415,$$
$$L_{x,t,q} = \frac{q}{\phi(q)} \left(\frac{13}{4} \log t + 7.82\right) + 13.66 \log t + 37.55. \tag{3.3}$$
If $q > x^{1/3}/6$, then
The factor $R_{x,t}$ is small in practice; for instance, for $x = 10^{25}$ and $\delta_0 q = 5 \cdot 10^5$ (typical “difficult” values), $R_{x,\delta_0 q}$ equals 0.59648 . . . .
The classical choice for η in (3.1) is η(t) = 1 for t ≤ 1, η(t) = 0 for t > 1, which,
of course, is not smooth, or even continuous. We use
as in Tao [Tao14], in part for purposes of comparison. (This is the multiplicative convolution of the characteristic function of an interval with itself.) Nearly all work should be applicable to any other sufficiently smooth function η of fast decay. It is important that $\hat\eta$ decay at least quadratically.
We are not forced to use the same smoothing function as in Part II, and we do not.
As was explained in the introduction, the simple technique (1.40) allows us to work
with one smoothing function on the major arcs and with another one on the minor arcs.
for $20 \le q \le x^{1/48}$. We should underline that, while both the constant 13000 and the condition $q \le x^{1/48}$ keep (3.5) from being immediately useful in the present context, (3.5) is asymptotically better than the results here as q → ∞. (Indeed, qualitatively speaking, the form of (3.5) is the best one can expect from results derived by the family of methods stemming from Vinogradov’s work.) There is also unpublished work by Ramaré (ca. 1993) with different constants for $q \ll (\log x/\log\log x)^4$.
Table 3.1: Worst-case upper bounds on $x^{-1}|S_\eta(a/2q, x)|$ for q ≥ q0, |δ| ≤ 8, $x = 10^{27}$. The trivial bound is 1.
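Indeed, by (3.7),
$$\Lambda_{>V}(n) = \sum_{dm \mid n} \mu(d)\, \Lambda_{>V}(m) = \sum_{dm \mid n} \mu_{\le U}(d)\, \Lambda_{>V}(m) + \sum_{dm \mid n} \mu_{>U}(d)\, \Lambda_{>V}(m).$$
Applying to this the trivial equality $\Lambda_{>V} = \Lambda - \Lambda_{\le V}$, as well as the simple fact that 1 ∗ Λ = log, we obtain that
$$\Lambda_{>V}(n) = \sum_{d \mid n} \mu_{\le U}(d) \log(n/d) - \sum_{dm \mid n} \mu_{\le U}(d)\, \Lambda_{\le V}(m) + \sum_{dm \mid n} \mu_{>U}(d)\, \Lambda_{>V}(m).$$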
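The identity just derived is elementary and can be confirmed by brute force for small n. The Python sketch below does so; the helper names and the sample parameters U, V are ours, and the arithmetic functions are computed by naive trial division, which is adequate only for such a small check.

```python
import math

def smallest_prime_factor(n):
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n

def mobius(n):
    result = 1
    while n > 1:
        p = smallest_prime_factor(n)
        n //= p
        if n % p == 0:      # p^2 divided the original n
            return 0
        result = -result
    return result

def von_mangoldt(n):
    if n < 2:
        return 0.0
    p = smallest_prime_factor(n)
    while n % p == 0:
        n //= p
    return math.log(p) if n == 1 else 0.0

def lambda_above(n, V):
    """Lambda_{>V}(n): Lambda(n) if n > V, and 0 otherwise."""
    return von_mangoldt(n) if n > V else 0.0

def vaughan_rhs(n, U, V):
    """Right side of the identity: three sums over d | n and over pairs with dm | n."""
    total = 0.0
    for d in range(1, n + 1):
        if n % d:
            continue
        mu = mobius(d)
        if d <= U:
            total += mu * math.log(n / d)
        for m in range(1, n // d + 1):
            if (n // d) % m:
                continue
            lam = von_mangoldt(m)
            if d <= U and m <= V:
                total -= mu * lam
            elif d > U and m > V:
                total += mu * lam
    return total
```

For instance, with U = 4 and V = 6, `vaughan_rhs(n, 4, 6)` agrees with `lambda_above(n, 6)` up to rounding for every n ≤ 300; other choices of U and V work equally well, which is the flexibility referred to in the text.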
where
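$$\begin{aligned} S_{I,1} &= \sum_{m \le U} \mu(m) f(m) \sum_n (\log n)\, e(\alpha mn)\, f(n)\, \eta(mn/x),\\ S_{I,2} &= \sum_{d \le V} \Lambda(d) f(d) \sum_{m \le U} \mu(m) f(m) \sum_n e(\alpha dmn)\, f(n)\, \eta(dmn/x),\\ S_{II} &= \sum_{m > U} f(m) \Bigg( \sum_{\substack{d > U\\ d \mid m}} \mu(d) \Bigg) \sum_{n > V} \Lambda(n)\, e(\alpha mn)\, f(n)\, \eta(mn/x),\\ S_{0,\infty} &= \sum_{n \le V} \Lambda(n)\, e(\alpha n)\, f(n)\, \eta(n/x). \end{aligned} \tag{3.9}$$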
where v is a small, positive, square-free integer. (Our final choice will be v = 2.) Then
The sums SI,1 , SI,2 are called “of type I”, the sum SII is called “of type II” (or
bilinear). (The not-all-too colorful nomenclature goes back to Vinogradov.) The sum
S0,∞ is in general negligible; for our later choice of V and η, it will be in fact 0. The
sum S0,v will be negligible as well.
As we already discussed in the introduction, Vaughan’s identity is highly flexible
(in that we can choose U and V at will) but somewhat inefficient in practice (in that a
trivial estimate for the right side of (3.11) is actually larger than a trivial estimate for
the left side of (3.11)). Some of our work will consist in regaining part of what is given
up when we apply Vaughan’s identity.
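$$\Lambda \cdot \log^2 = F_{I,1,u}(n) - 3F_{I,2,V,u}(n) - 3F_{II,V,u}(n) + F_{3,V}(n) + F_{0,V,u}(n), \tag{3.17}$$
where
$$\begin{aligned} F_{I,1,u} &= \mu * \log^3_{>u},\\ F_{I,2,V,u} &= (\mu * \log^2) * \Lambda_{\le V},\\ F_{II,V,u}(n) &= (\Lambda \cdot \log)_{>u} * \Lambda_{>V},\\ F_{0,V,u}(n) &= \mu * \log^3_{\le u} - 3(\Lambda \cdot \log)_{\le u} * \Lambda_{>V} \end{aligned}$$
and $F_{3,V}$ is as in (3.16).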
In the bulk of the present work – in particular, in all steps that are part of the proof
of Theorem 3.1.1 or the Main Theorem – we will use Vaughan’s identity, rather than
(3.17). This choice was made while the proof was still underway; it was due mainly
to back-of-the-envelope estimates that showed that the error terms could be too large
if (3.14) was used. Of course, this might have been the case with Vaughan’s identity
as well, but the fact that the parameters U , V there have a large effect on the outcome
meant that one could hope to improve on insufficient estimates in part by adjusting U
and V , without losing all previous work. (This is what was meant by the “flexibility”
of Vaughan’s identity.)
The question remains: can one prove ternary Goldbach using (3.17) rather than
Vaughan’s identity? This seems likely. If so, which proof would be more complicated?
This is not clear.
There are large parts of the work that are essentially the same in both cases:
• estimates for sums involving Λ>u ∗ Λ>V and the like (“type II”).
Trilinear sums, i.e., sums involving Λ∗Λ∗Λ, can be estimated much like bilinear sums,
i.e., sums involving Λ ∗ Λ.
There are also challenges that appear only for Vaughan’s identity and others that
appear only for (3.17). An example of a challenge that is successfully faced in the main
proof, but does not appear if (3.17) is used, consists in bounding sums of type
$$\sum_{U < m \le x/W} \Bigg( \sum_{\substack{d > U\\ d \mid m}} \mu(d) \Bigg)^2.$$
(In §5.1, we will be able to bound sums of this type by a constant times x/W .) Like-
wise, large tail terms that have to be estimated trivially seem unavoidable in (3.17).
(The choice of a parameter u > 1, as above, is meant to alleviate the problem.)
In the end, losing a factor of about log x/U V seems inevitable when one uses
Vaughan’s identity, but not when one uses (3.17). Another reason why a full treatment
based on (3.17) would also be worthwhile is that it is a somewhat less familiar, and
arguably under-used, identity and deserves more exploration. With these comments,
we close the discussion of (3.17); we will henceforth use Vaughan’s identity.
Chapter 4
Type I sums
and variations thereof. There are three main improvements in comparison to standard
treatments:
1. The terms with m divisible by q get taken out and treated separately by analytic
means. This all but eliminates what would otherwise be the main term.
2. The other terms get handled by improved estimates on trigonometric sums. For
large m, the improvements have a substantial total effect – more than a constant
factor is gained.
3. The “error” term δ/x = α − a/q is used to our advantage. This happens both
through the Poisson summation formula and through the use of two alternative
approximations to the same number α.
The fact that a continuous weight η is used (“smoothing”) is a difference with respect
to the classical literature ([Vin37] and what followed), but not with respect to more
recent work (including [Tao14]); using smooth or continuous weights is an idea that
has become commonplace in analytic number theory, even though it is not consistently
applied. The improvements due to smoothing in type I are both relatively minor and
essentially independent of the improvements due to (1) and (3). The use of a contin-
uous weight combines nicely with (2), but the ideas given here would give qualitative
improvements in the treatment of trigonometric sums even in the absence of smoothing.
Lemma 8b in [Vin04, Ch. I]. See, in particular, the work of Daboussi and Rivat [DR01, Lemma 1].) The main idea is to switch between different types of approximation within the sum, rather than just choosing between bounding all terms either trivially (by A) or non-trivially (by $C/|\sin(\pi\alpha n)|^2$). There will also¹ be improvements in our applications stemming from the fact that Lemmas 4.1.1 and 4.1.2 take quadratic ($|\sin(\pi\alpha n)|^2$) rather than linear ($|\sin(\pi\alpha n)|$) inputs. (These improved inputs come from the use of smoothing elsewhere.)
Lemma 4.1.1. Let α = a/q + β/qQ, (a, q) = 1, |β| ≤ 1, q ≤ Q. Then, for any
A, C ≥ 0,
$$\sum_{y < n \le y+q} \min\left(A, \frac{C}{|\sin(\pi\alpha n)|^2}\right) \le \min\left(2A + \frac{6q^2}{\pi^2} C,\ 3A + \frac{4q}{\pi} \sqrt{AC}\right). \tag{4.1}$$
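Inequality (4.1) is easy to probe numerically. The Python sketch below evaluates both sides in floating point for given parameters; it is a sanity check of the statement on sample inputs, not a substitute for the proof, and the function names are ours. A term with sin(παn) = 0 is taken to contribute A.

```python
import math

def lemma_lhs(a, q, beta, Q, y, A, C):
    """Sum over y < n <= y + q of min(A, C / sin(pi*alpha*n)^2), alpha = a/q + beta/(qQ)."""
    alpha = a / q + beta / (q * Q)
    total = 0.0
    for n in range(y + 1, y + q + 1):
        s2 = math.sin(math.pi * alpha * n) ** 2
        total += A if s2 == 0.0 else min(A, C / s2)
    return total

def lemma_rhs(q, A, C):
    """min(2A + (6 q^2 / pi^2) C, 3A + (4q / pi) sqrt(AC))."""
    first = 2.0 * A + 6.0 * q * q * C / math.pi ** 2
    second = 3.0 * A + 4.0 * q * math.sqrt(A * C) / math.pi
    return min(first, second)
```

For example, with a/q = 3/7, Q = 50, β = 0.8, y = 10, A = 5 and C = 0.3, `lemma_lhs(3, 7, 0.8, 50, 10, 5.0, 0.3)` stays below `lemma_rhs(7, 5.0, 0.3)`. Looping over coprime pairs (a, q), several Q ≥ q and |β| ≤ 1 gives a quick way of seeing which of the two bounds on the right is the smaller one in a given regime.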
$$\alpha n = \frac{aj + c}{q} + \delta_1(j) + \delta_2 \mod 1,$$
where |δ1(j)| and |δ2| are both ≤ 1/2q; we can assume δ2 ≥ 0. The variable r = aj + c mod q occupies each residue class mod q exactly once.
One option is to bound the terms corresponding to r = 0, −1 by A each and all
the other terms by C/| sin(παn)|2 . (This can be seen as the simple case; it will take
us about a page just because we should estimate all sums and all terms here with great
care – as in [DR01], only more so.)
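The terms corresponding to r = −k and r = k − 1 (2 ≤ k ≤ q/2) contribute at most
$$\frac{1}{\sin^2 \frac{\pi}{q}\left(k - \frac12 - q\delta_2\right)} + \frac{1}{\sin^2 \frac{\pi}{q}\left(k - \frac32 + q\delta_2\right)} \le \frac{1}{\sin^2 \frac{\pi}{q}\left(k - \frac12\right)} + \frac{1}{\sin^2 \frac{\pi}{q}\left(k - \frac32\right)},$$
since $x \mapsto \frac{1}{(\sin x)^2}$ is convex-up on (0, ∞). Hence the terms with r ≠ 0, −1 contribute at most
$$\frac{1}{\sin^2 \frac{\pi}{2q}} + 2 \sum_{2 \le r \le \frac{q}{2}} \frac{1}{\sin^2 \frac{\pi}{q}(r - 1/2)} \le \frac{1}{\sin^2 \frac{\pi}{2q}} + 2 \int_1^{q/2} \frac{1}{\sin^2 \frac{\pi}{q} x}\, dx,$$
where we use again the convexity of $x \mapsto 1/(\sin x)^2$. (We can assume q > 2, as otherwise we have no terms other than r = 0, −1.) Now
$$\int_1^{q/2} \frac{1}{\sin^2 \frac{\pi}{q} x}\, dx = \frac{q}{\pi} \int_{\pi/q}^{\pi/2} \frac{1}{(\sin u)^2}\, du = \frac{q}{\pi} \cot \frac{\pi}{q}.$$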
1 This is a change with respect to the first version of the preprint [Helb]. The version of Lemma 4.1.1
there has, however, the advantage of being immediately comparable to results in the literature.
Hence
$$\sum_{y < n \le y+q} \min\left(A, \frac{C}{(\sin \pi\alpha n)^2}\right) \le 2A + \frac{C}{\sin^2 \frac{\pi}{2q}} + C \cdot \frac{2q}{\pi} \cot \frac{\pi}{q}.$$
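$$\begin{aligned} \frac{t}{\sin t} &= 1 + \sum_{k \ge 0} a_{2k+1} t^{2k+2} = 1 + \frac{t^2}{6} + \ldots\\ t \cot t &= 1 - \sum_{k \ge 0} b_{2k+1} t^{2k+2} = 1 - \frac{t^2}{3} - \frac{t^4}{45} - \ldots, \end{aligned} \tag{4.2}$$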
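$$\begin{aligned} \frac{t^2}{\sin^2 t} + t \cot 2t &\le 1 + \frac{t^2}{3} + c_0\!\left(\frac{\pi}{4}\right) t^4 + \frac12 - \frac{2t^2}{3} - \frac{8t^4}{45}\\ &= \frac32 - \frac{t^2}{3} + \left(c_0\!\left(\frac{\pi}{4}\right) - \frac{8}{45}\right) t^4 \le \frac32 - \frac{t^2}{3} \le \frac32 \end{aligned}$$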
$$3A + \frac{2q}{\pi} A \arcsin \sqrt{\frac{C}{A}} + \frac{2q}{\pi} C \sqrt{\frac{A}{C} - 1} \le 3A + \frac{4q}{\pi} \sqrt{AC}.$$
(If C/A > 1, then $3A + (4q/\pi)\sqrt{AC}$ is greater than Aq, which is an obvious upper bound for the left side of (4.1).)
Now we will see that, if we take out terms with n divisible by q and n is not too large, then we can give a bound that does not involve a constant term A at all. (We are referring to the bound $(20/3\pi^2)Cq^2$ below; of course, $2A + (4q/\pi)\sqrt{AC}$ does have a constant term 2A – it is just smaller than the constant term 3A in the corresponding bound in (4.1).)
Lemma 4.1.2. Let α = a/q + β/qQ, (a, q) = 1, |β| ≤ 1, q ≤ Q. Let y2 > y1 ≥ 0. If
y2 − y1 ≤ q and y2 ≤ Q/2, then, for any A, C ≥ 0,
\[ \sum_{\substack{y_1<n\le y_2\\ q\nmid n}} \min\left(A, \frac{C}{|\sin(\pi\alpha n)|^2}\right) \le \min\left(\frac{20}{3\pi^2}Cq^2,\ 2A + \frac{4q}{\pi}\sqrt{AC}\right). \tag{4.5} \]
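As an illustration of the statement (4.5), the following sketch evaluates both sides for a few parameter choices satisfying the hypotheses of Lemma 4.1.2. The function name and all numerical values are hypothetical and chosen only for the example; a numerical spot check of this kind is of course no substitute for the proof.

```python
import math

def lemma_412_sides(a, q, beta, Q, y1, y2, A, C):
    """Left and right sides of (4.5) for alpha = a/q + beta/(q*Q).

    The caller is assumed to respect the hypotheses:
    gcd(a, q) = 1, |beta| <= 1, q <= Q, 0 <= y1 < y2, y2 - y1 <= q, y2 <= Q/2.
    """
    alpha = a / q + beta / (q * Q)
    lhs = 0.0
    # Sum over integers n with y1 < n <= y2 that are not divisible by q.
    for n in range(math.floor(y1) + 1, math.floor(y2) + 1):
        if n % q == 0:
            continue
        s2 = math.sin(math.pi * alpha * n) ** 2
        lhs += min(A, C / s2) if s2 > 0 else A
    rhs = min(20 * C * q * q / (3 * math.pi ** 2),
              2 * A + (4 * q / math.pi) * math.sqrt(A * C))
    return lhs, rhs

# Hypothetical test parameters.
for params in [(3, 7, 1.0, 100, 0, 7, 10.0, 1.0),
               (5, 12, -0.7, 40, 8, 20, 50.0, 0.3),
               (1, 2, 0.5, 10, 0, 2, 1.0, 100.0)]:
    lhs, rhs = lemma_412_sides(*params)
    print(params, round(lhs, 4), round(rhs, 4), lhs <= rhs)
```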
Proof. Clearly, αn equals an/q + (n/Q)β/q; since y2 ≤ Q/2, this means that |αn −
an/q| ≤ 1/2q for n ≤ y2; moreover, again for n ≤ y2, the sign of αn − an/q remains
constant. Hence the left side of (4.5) is at most
\[ \sum_{r=1}^{q/2} \min\left(A, \frac{C}{\left(\sin\frac{\pi}{q}(r - 1/2)\right)^2}\right) + \sum_{r=1}^{q/2} \min\left(A, \frac{C}{\left(\sin\frac{\pi}{q}r\right)^2}\right) \]
for q ≥ 2. (If q = 1, then the left side of (4.5) is trivially zero.) Now, by (4.2),
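\[ \frac{t^2}{(\sin t)^2} + \frac{t}{2}\cot 2t \le 1 + \frac{t^2}{3} + c_0\left(\frac{\pi}{4}\right)t^4 + \frac14\left(1 - \frac{4t^2}{3} - \frac{16t^4}{45}\right) \le \frac54 + \left(c_0\left(\frac{\pi}{4}\right) - \frac{4}{45}\right)t^4 \le \frac54 \]
\[ \frac{t^2}{(\sin t)^2} + t\cot\frac{3t}{2} \le 1 + \frac{t^2}{3} + c_0\left(\frac{\pi}{2}\right)t^4 + \frac23\left(1 - \frac{3t^2}{4} - \frac{81t^4}{2^4\cdot 45}\right) \le \frac53 + \left(-\frac16 + \left(c_0\left(\frac{\pi}{2}\right) - \frac{27}{360}\right)\left(\frac{\pi}{2}\right)^2\right)t^2 \le \frac53 \]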
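The two trigonometric inequalities above can be checked on a grid. This is a rough sketch under an assumption that is not stated in the surrounding lines: the first inequality is taken on 0 < t ≤ π/4 and the second on 0 < t ≤ π/2, as the arguments of c0 suggest. The grid size and the floating-point tolerance are arbitrary.

```python
import math

def cot(u):
    return math.cos(u) / math.sin(u)

def g_quarter(t):
    # t^2/sin^2 t + (t/2) cot 2t, claimed to be at most 5/4
    return t * t / math.sin(t) ** 2 + 0.5 * t * cot(2 * t)

def g_half(t):
    # t^2/sin^2 t + t cot(3t/2), claimed to be at most 5/3
    return t * t / math.sin(t) ** 2 + t * cot(1.5 * t)

def max_on_grid(func, upper, steps=2000):
    # Maximum over the grid points upper*k/steps, k = 1..steps (t = 0 excluded).
    return max(func(upper * k / steps) for k in range(1, steps + 1))

print(max_on_grid(g_quarter, math.pi / 4) <= 1.25 + 1e-12)
print(max_on_grid(g_half, math.pi / 2) <= 5 / 3 + 1e-12)
```

Both expressions approach their bounds as t → 0 (the Taylor expansions start 5/4 − t⁴/45 and 5/3 − t²/6), so the small tolerance only absorbs rounding error near the origin.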
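\[ \sum_{\substack{y_1<n\le y_2\\ q\nmid n}} \min\left(\frac{B}{|\sin(\pi\alpha n)|}, \frac{C}{|\sin(\pi\alpha n)|^2}\right) \le 2B\frac{q}{\pi}\max\left(2, \log\frac{Ce^3q}{B\pi}\right). \tag{4.6} \]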
Proof. As in the proof of Lemma 4.1.2, we can bound the left side of (4.6) by
\[ 2\sum_{r=1}^{q/2} \min\left(\frac{B}{\sin\frac{\pi}{q}\left(r - \frac12\right)}, \frac{C}{\sin^2\frac{\pi}{q}\left(r - \frac12\right)}\right). \]
56 CHAPTER 4. TYPE I SUMS
preprint [Helb].
4.2. TYPE I ESTIMATES 57
terms small thanks to the particular way in which we switch between two different
approximations.
(These are not necessarily successive approximations in the sense of continued
fractions; we do not want to assume that the approximation a/q we are given arises
from a continued fraction, and at any rate we need more control on the denominator q′
of the new approximation a′/q′ than continued fractions would furnish.)
The following lemma is a theme, so to speak, to which several variations will be
given. Later, in practice, we will always use one of the variations, rather than the
original lemma itself. This is so just because, even though (4.8) is the basic type of
sum we treat in type I, the sums that we will have to estimate in practice will always
present some minor additional complication. Proving the lemma we are about to give
in full will give us a chance to see all the main ideas at work, leaving complications for
later.
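Lemma 4.2.1. Let α = a/q + δ/x, (a, q) = 1, |δ/x| ≤ 1/qQ0, q ≤ Q0, Q0 ≥ 16. Let
η be continuous, piecewise C² and compactly supported, with |η|_1 = 1 and η′′ ∈ L1.
Let c0 ≥ |\widehat{η′′}|_∞.
Let 1 ≤ D ≤ x. Then, if |δ| ≤ 1/2c2, where c2 = (3π/5√c0)(1 + √(13/3)), the
absolute value of
\[ \sum_{m\le D}\mu(m)\sum_n e(\alpha mn)\,\eta\!\left(\frac{mn}{x}\right) \tag{4.8} \]
is at most
\[ \frac{x}{q}\min\left(1, \frac{c_0}{(2\pi\delta)^2}\right)\left|\sum_{\substack{m\le\frac{M}{q}\\ (m,q)=1}}\frac{\mu(m)}{m}\right| + O^*\left(c_0\left(\frac14 - \frac{1}{\pi^2}\right)\left(\frac{D^2}{2xq} + \frac{D}{2x}\right)\right) \tag{4.9} \]
plus
\[ \begin{aligned} &\frac{2\sqrt{c_0c_1}}{\pi}D + 3c_1\frac{x}{q}\log^+\frac{D}{c_2x/q} + \frac{\sqrt{c_0c_1}}{\pi}\,q\log^+\frac{D}{q/2}\\ &+ \frac{|\eta'|_1}{\pi}\,q\cdot\max\left(2, \log\frac{c_0e^3q^2}{4\pi|\eta'|_1x}\right) + \left(\frac{2\sqrt{3c_0c_1}}{\pi} + \frac{3c_1}{c_2} + \frac{55c_0c_2}{12\pi^2}\right)q, \end{aligned} \tag{4.10} \]
where c1 = 1 + |η′|_1/(2x/D) and M ∈ [min(Q0/2, D), D]. The same bound holds if
|δ| ≥ 1/2c2 but D ≤ Q0/2.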
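In general, if |δ| ≥ 1/2c2, the absolute value of (4.8) is at most (4.9) plus
\[ \begin{aligned} &\frac{2\sqrt{c_0c_1}}{\pi}\left(D + (1+\epsilon)\min\left(\left\lfloor\frac{x}{|\delta|q}\right\rfloor + 1,\ 2D\right)\left(\varpi_\epsilon + \frac12\log^+\frac{2D}{\frac{x}{|\delta|q}}\right)\right)\\ &+ 3c_1\left(2 + \frac{(1+\epsilon)}{\epsilon}\log^+\frac{2D}{\frac{x}{|\delta|q}}\right)\frac{x}{Q_0} + \frac{35c_0c_2}{6\pi^2}q, \end{aligned} \tag{4.11} \]
for ε ∈ (0, 1] arbitrary, where ϖ_ε = √(3 + 2ε) + ((1 + √(13/3))/4 − 1)/(2(1 + ε)).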
In (4.9), min(1, c0/(2πδ)²) always equals 1 when |δ| ≤ 1/2c2 (since (3/5)(1 +
√(13/3)) > 1).
\[ \max_{|r|\le\frac12}\sum_{n\ne 0}\frac{1}{(n - r)^2} = \sum_{n\ne 0}\frac{1}{\left(n - \frac12\right)^2} = \pi^2 - 4. \]
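The value π² − 4 and the location of the maximum can be confirmed numerically. The sketch below uses the classical identity Σ_{n∈Z} 1/(n − r)² = π²/sin²(πr) for non-integer r; the grid over r and the truncation point N are arbitrary illustrative choices.

```python
import math

def shifted_sum(r):
    """Sum over n != 0 of 1/(n - r)^2 for |r| <= 1/2, using
    sum_{n in Z} 1/(n - r)^2 = pi^2 / sin^2(pi r) and removing the n = 0 term."""
    if r == 0:
        return math.pi ** 2 / 3  # 2 * zeta(2)
    return math.pi ** 2 / math.sin(math.pi * r) ** 2 - 1 / r ** 2

def partial_sum_half(N):
    # Direct partial sum over 1 <= |n| <= N for r = 1/2.
    return sum(1 / (n - 0.5) ** 2 + 1 / (-n - 0.5) ** 2 for n in range(1, N + 1))

target = math.pi ** 2 - 4
grid_max = max(shifted_sum(k / 200) for k in range(-100, 101))
print(grid_max <= target + 1e-9)
print(target - partial_sum_half(10000))  # small positive tail
```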
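\[ = \frac{x\mu(q)}{q}\cdot\hat\eta(-\delta)\cdot\sum_{\substack{m\le\frac{M}{q}\\ (m,q)=1}}\frac{\mu(m)}{m}. \]
\[ + O^*\left(\mu(q)^2c_0\left(\frac14 - \frac{1}{\pi^2}\right)\left(\frac{D^2}{2xq} + \frac{D}{2x}\right)\right). \]
We will bound |η̂(−δ)| by (2.1).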
As we have just seen, estimating the contribution of the terms with m divisible by
q and not too large (m ≤ M ) involves isolating a main term, estimating it carefully
(with cancellation) and then bounding the remaining error terms.
We will now bound the contribution of all other m – that is, m not divisible by q
and m larger than M . Cancellation will now be used only within the inner sum; that
is, we will bound each inner sum
\[ T_m(\alpha) = \sum_n e(\alpha mn)\,\eta\!\left(\frac{mn}{x}\right), \]
and then we will carefully consider how to bound sums of |Tm (α)| over m efficiently.
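By (2.2) and Lemma 2.3.1,
\[ |T_m(\alpha)| \le \min\left(\frac{x}{m} + \frac12|\eta'|_1,\ \frac{\frac12|\eta'|_1}{|\sin(\pi m\alpha)|},\ \frac{m}{x}\frac{c_0}{4}\frac{1}{(\sin\pi m\alpha)^2}\right). \tag{4.13} \]
For any y2 > y1 > 0 with y2 − y1 ≤ q and y2 ≤ Q/2, (4.13) gives us that
\[ \sum_{\substack{y_1<m\le y_2\\ q\nmid m}}|T_m(\alpha)| \le \sum_{\substack{y_1<m\le y_2\\ q\nmid m}}\min\left(A, \frac{C}{(\sin\pi m\alpha)^2}\right) \tag{4.14} \]
for A = (x/y1)(1 + |η′|_1/(2(x/y1))) and C = (c0/4)(y2/x). We must now estimate
the sum
\[ \sum_{\substack{m\le M\\ q\nmid m}}|T_m(\alpha)| + \sum_{\frac{Q}{2}<m\le D}|T_m(\alpha)|. \tag{4.15} \]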
To bound the terms with m ≤ M , we can use Lemma 4.1.2. The question is then
which one is smaller: the first or the second bound given by Lemma 4.1.2? A brief
calculation gives that the second bound is smaller (and hence preferable) exactly when
√(C/A) > (3π/10q)(1 + √(13/3)). Since √(C/A) ∼ (√c0/2)m/x, this means that
it is sensible to prefer the second bound in Lemma 4.1.2 when m > c2x/q, where
c2 = (3π/5√c0)(1 + √(13/3)).
It thus makes sense to ask: does Q/2 ≤ c2 x/q (so that m ≤ M implies m ≤
c2 x/q)? This question divides our work into two basic cases.
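The "brief calculation" can be reproduced as follows. Setting u = q√(C/A)/π, the comparison (20/3π²)Cq² > 2A + (4q/π)√(AC) becomes (20/3)u² − 4u − 2 > 0, whose positive root is (3/10)(1 + √(13/3)). The sketch below checks this; the function names and the sample values of A, q and c0 are hypothetical.

```python
import math

def crossover_ratio():
    # Positive root u of (20/3) u^2 - 4 u - 2 = 0, where u = q*sqrt(C/A)/pi.
    a, b, c = 20 / 3, -4.0, -2.0
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

def c2(c0):
    # c2 = (3 pi / (5 sqrt(c0))) (1 + sqrt(13/3)), as in the text.
    return (3 * math.pi / (5 * math.sqrt(c0))) * (1 + math.sqrt(13 / 3))

def two_bounds(A, C, q):
    # The two bounds offered by Lemma 4.1.2.
    first = 20 * C * q * q / (3 * math.pi ** 2)
    second = 2 * A + (4 * q / math.pi) * math.sqrt(A * C)
    return first, second

u0 = crossover_ratio()
print(u0, 0.3 * (1 + math.sqrt(13 / 3)))

# With A = 1 and q = 10 (illustrative values), compare just below and above the threshold.
A, q = 1.0, 10
s0 = math.pi * u0 / q  # threshold value of sqrt(C/A)
print(two_bounds(A, (0.9 * s0) ** 2, q))
print(two_bounds(A, (1.1 * s0) ** 2, q))
```

Substituting √(C/A) = (√c0/2)·m/x into the threshold gives m = c2·x/q, which is how the constant c2 arises.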
Case (a). δ large: |δ| ≥ 1/2c2, where c2 = (3π/5√c0)(1 + √(13/3)). Then
Q/2 ≤ c2x/q; this will induce us to bound the first sum in (4.15) by the first bound in
Lemma 4.1.2.
Recall that M = min(Q/2, D), and so M ≤ c2 x/q. By (4.14) and Lemma 4.1.2,
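\[ \begin{aligned} \sum_{\substack{1\le m\le M\\ q\nmid m}}|T_m(\alpha)| &\le \sum_{j=0}^{\infty}\ \sum_{\substack{jq<m\le\min((j+1)q,M)\\ q\nmid m}}\min\left(\frac{x}{jq+1} + \frac{|\eta'|_1}{2},\ \frac{\frac{c_0}{4}\frac{(j+1)q}{x}}{(\sin\pi m\alpha)^2}\right)\\ &\le \frac{20}{3\pi^2}\frac{c_0q^3}{4x}\sum_{0\le j\le\frac{M}{q}}(j+1) \le \frac{20}{3\pi^2}\frac{c_0q^3}{4x}\cdot\left(\frac12\frac{M^2}{q^2} + \frac32\frac{c_2x}{q^2} + 1\right)\\ &\le \frac{5c_0c_2}{6\pi^2}M + \frac{5c_0q}{3\pi^2}\left(\frac32c_2 + \frac{q^2}{x}\right) \le \frac{5c_0c_2}{6\pi^2}M + \frac{35c_0c_2}{6\pi^2}q, \end{aligned} \tag{4.16} \]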
where, to bound the smaller terms, we are using the inequality Q/2 ≤ c2x/q, and
where we are also using the observation that, since |δ/x| ≤ 1/qQ0, the assumption
|δ| ≥ 1/2c2 implies that q ≤ 2c2x/Q0; moreover, since q ≤ Q0, this gives us that
q² ≤ 2c2x. In the main term, we are bounding qM²/x from above by M · qQ/2x ≤
M/2δ ≤ c2M.
If D ≤ (Q + 1)/2, then M ≥ ⌊D⌋ and so (4.16) is all we need: the second sum
in (4.15) is empty. Assume from now on that D > (Q + 1)/2. The first sum in (4.15)
is then bounded by (4.16) (with M = Q/2). To bound the second sum in (4.15), we
will use the approximation a′/q′ instead of a/q. The motivation is the following: if
we used the approximation a/q even for m > Q/2, the contribution of the terms with
q|m would be too large. When we use a′/q′, the contribution of the terms with q′|m
(or m ≡ ±1 mod q′) is very small: only a fraction 1/q′ (tiny, since q′ is large) of all
terms are like that, and their individual contribution is always small, precisely because
m > Q/2.
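By (4.14) (without the restriction q ∤ m on either side) and Lemma 4.1.1,
\[ \begin{aligned} \sum_{Q/2<m\le D}|T_m(\alpha)| &\le \sum_{j=0}^{\infty}\ \sum_{jq'+\frac{Q}{2}<m\le\min((j+1)q'+Q/2,\,D)}|T_m(\alpha)|\\ &\le \sum_{j=0}^{\left\lfloor\frac{D-(Q+1)/2}{q'}\right\rfloor}\left(3c_1\frac{x}{jq'+\frac{Q+1}{2}} + \frac{4q'}{\pi}\sqrt{\frac{c_1c_0}{4}\frac{x}{jq'+(Q+1)/2}\frac{(j+1)q'+Q/2}{x}}\right)\\ &\le \sum_{j=0}^{\left\lfloor\frac{D-(Q+1)/2}{q'}\right\rfloor}\left(3c_1\frac{x}{jq'+\frac{Q+1}{2}} + \frac{4q'}{\pi}\sqrt{\frac{c_1c_0}{4}\left(1 + \frac{q'}{jq'+(Q+1)/2}\right)}\right), \end{aligned} \]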
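If q² < 2c2x, we estimate the terms with q/2 < m ≤ c2x/q by Lemma 4.1.2,
which is applicable because min(c2x/q, D) < Q/2:
\[ \begin{aligned} \sum_{\substack{\frac{q}{2}<m\le D'\\ q\nmid m}}|T_m(\alpha)| &\le \sum_{j=1}^{\infty}\ \sum_{\substack{(j-\frac12)q<m\le(j+\frac12)q\\ m\le\min(\frac{c_2x}{q},D)\\ q\nmid m}}\min\left(\frac{x}{\left(j-\frac12\right)q} + \frac{|\eta'|_1}{2},\ \frac{\frac{c_0}{4}\frac{(j+1/2)q}{x}}{(\sin\pi m\alpha)^2}\right)\\ &\le \frac{20}{3\pi^2}\frac{c_0q^3}{4x}\sum_{1\le j\le\frac{D'}{q}+\frac12}\left(j+\frac12\right) \le \frac{20}{3\pi^2}\frac{c_0q^3}{4x}\left(\frac{c_2x}{2q^2}\frac{D'}{q} + \frac32\frac{c_2x}{q^2} + \frac58\right)\\ &\le \frac{5c_0}{6\pi^2}\left(c_2D' + 3c_2q + \frac54\frac{q^3}{x}\right) \le \frac{5c_0c_2}{6\pi^2}\left(D' + \frac{11}{2}q\right), \end{aligned} \tag{4.24} \]
where we write D′ = min(c2x/q, D). If c2x/q ≥ D, we stop here. Assume that
c2x/q < D. Let R = max(c2x/q, q/2). The terms we have already estimated are
precisely those with m ≤ R. We bound the terms R < m ≤ D by the second bound
in Lemma 4.1.1: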
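\[ \begin{aligned} \sum_{R<m\le D}|T_m(\alpha)| &\le \sum_{j=0}^{\infty}\ \sum_{\substack{m>jq+R\\ m\le\min((j+1)q+R,\,D)}}\min\left(\frac{c_1x}{jq+R},\ \frac{\frac{c_0}{4}\frac{(j+1)q+R}{x}}{(\sin\pi m\alpha)^2}\right)\\ &\le \sum_{j=0}^{\left\lfloor\frac1q(D-R)\right\rfloor}\left(\frac{3c_1x}{jq+R} + \frac{4q}{\pi}\sqrt{\frac{c_1c_0}{4}\left(1 + \frac{q}{jq+R}\right)}\right) \end{aligned} \tag{4.25} \]
(Note there is no need to use two successive approximations a/q, a′/q′ as in case (a).
We are also including all terms with m divisible by q, as we may, since |Tm(α)| is
non-negative.) Now, much as before,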
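\[ \sum_{j=0}^{\left\lfloor\frac1q(D-R)\right\rfloor}\frac{x}{jq+R} \le \frac{x}{R} + \frac{x}{q}\int_R^D\frac{1}{t}\,dt \le \min\left(\frac{q}{c_2}, \frac{2x}{q}\right) + \frac{x}{q}\log^+\frac{D}{c_2x/q}, \tag{4.26} \]
and
\[ \begin{aligned} \sum_{j=0}^{\left\lfloor\frac1q(D-R)\right\rfloor}\sqrt{1 + \frac{q}{jq+R}} &\le \sqrt{1 + \frac{q}{R}} + \frac1q\int_R^D\sqrt{1 + \frac{q}{t}}\,dt\\ &\le \sqrt3 + \frac{D-R}{q} + \frac12\log^+\frac{D}{q/2}. \end{aligned} \tag{4.27} \]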
We sum with (4.23) and (4.24), and we obtain that (4.15) is at most
\[ \begin{aligned} &\frac{2\sqrt{c_0c_1}}{\pi}\left(\sqrt3\,q + D + \frac{q}{2}\log^+\frac{D}{q/2}\right) + 3c_1\frac{x}{q}\log^+\frac{D}{c_2x/q}\\ &+ 3c_1\min\left(\frac{q}{c_2}, \frac{2x}{q}\right) + \frac{55c_0c_2}{12\pi^2}q + \frac{|\eta'|_1}{\pi}\,q\cdot\max\left(2, \log\frac{c_0e^3q^2}{4\pi|\eta'|_1x}\right), \end{aligned} \tag{4.28} \]
where we are using the fact that 5c0c2/6π² < 2√(c0c1)/π to make sure that the term
(5c0c2/6π²)D′ from (4.24) is more than compensated by the term −2√(c0c1)R/π coming
from −R/q in (4.27) (by the definition of D′ and R, we have R ≥ D′). We can
also use 5c0c2/6π² < 2√(c0c1)/π to bound the term (5c0c2/6π²)D′ from (4.24) by the
term 2√(c0c1)D/π in (4.28), in case c2x/q ≥ D. (Again by definition, D′ ≤ D.) Thus,
(4.28) is valid both when c2x/q < D and when c2x/q ≥ D.
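is at most
\[ \frac{x}{2q}\min\left(1, \frac{c_0}{(\pi\delta)^2}\right)\left|\sum_{\substack{m\le\frac{M}{q}\\ (m,2q)=1}}\frac{\mu(m)}{m}\right| + O^*\left(\frac{c_0q}{x}\left(\frac18 - \frac{1}{2\pi^2}\right)\left(\frac{D}{q} + 1\right)^2\right) \tag{4.30} \]
plus
\[ \begin{aligned} &\frac{2\sqrt{c_0c_1}}{\pi}D + \frac{3c_1}{2}\frac{x}{q}\log^+\frac{D}{c_2x/q} + \frac{\sqrt{c_0c_1}}{\pi}\,q\log^+\frac{D}{q/2}\\ &+ \frac{2|\eta'|_1}{\pi}\,q\cdot\max\left(1, \log\frac{c_0e^3q^2}{4\pi|\eta'|_1x}\right) + \left(\frac{2\sqrt{3c_0c_1}}{\pi} + \frac{3c_1}{2c_2} + \frac{55c_0c_2}{6\pi^2}\right)q, \end{aligned} \tag{4.31} \]
where c1 = 1 + |η′|_1/(x/D) and M ∈ [min(Q0/2, D), D]. The same bound holds if
|δ| ≥ 1/2c2 but D ≤ Q0/2.
In general, if |δ| ≥ 1/2c2, the absolute value of (4.8) is at most (4.30) plus
\[ \begin{aligned} &\frac{2\sqrt{c_0c_1}}{\pi}\left(D + (1+\epsilon)\min\left(\left\lfloor\frac{x}{|\delta|q}\right\rfloor + 1,\ 2D\right)\left(\sqrt{3+2\epsilon} + \frac12\log^+\frac{2D}{\frac{x}{|\delta|q}}\right)\right)\\ &+ \frac32c_1\left(2 + \frac{(1+\epsilon)}{\epsilon}\log^+\frac{2D}{\frac{x}{|\delta|q}}\right)\frac{x}{Q_0} + \frac{35c_0c_2}{3\pi^2}q, \end{aligned} \tag{4.32} \]
for ε ∈ (0, 1] arbitrary.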
Hence, the absolute value of the sum of all terms with m ≤ M and q|m is given by
(4.30).
We define T_{m,◦}(α) by
\[ T_{m,\circ}(\alpha) = \sum_{n\ \mathrm{odd}} e(\alpha mn)\,\eta\!\left(\frac{mn}{x}\right). \tag{4.35} \]
is at most
c0 /δ 2
x X µ(m) x x [ X µ(m)
min 1, log + |log ·η(−δ)|
q (2π)2 m mq q m
m≤ M
q m≤ M
q (4.40)
(m,q)=1 (m,q)=1
D2 e1/2 x 1
∗ 1 2
+O c0 − log +
2 π2 4qx D e
plus
√
2 c0 c1 ex 3c1 x D q
D log + log+ log
π D 2 q c2 x/q c2
√
2 c0 c1 √
0
c0 e3 q 2
2|η |1 1 + D q
+ max 1, log log x + 3 + log log q
π 4π|η 0 |1 x π 2 q/2 c2
3/2 √
2x 20c0 c2 √
r
3c1 2x 2 ex
+ log + 2x log
2 c2 c2 3π 2 c2
(4.41)
for c1 = 1 + |η 0 |1 /(x/D). The same bound holds if |δ| ≥ 1/2c2 but D ≤ Q0 /2.
In general, if |δ| ≥ 1/2c2 , the absolute value of (4.39) is at most
√
2 c0 c1 ex
D log +
π D
√ !
√ + √
2 c0 c1 x 1 + 2D +
(1 + ) +1 3 + 2 · log 2 e|δ|q + log x log 2|δ|q
π |δ|q 2 |δ|q
√ 3/2 √
3c1 2 1+ 40
+ √ + log x + 2c0 c2 x log x
4 5 2 3
(4.42)
for ∈ (0, 1].
Proof. Define Q, Q0, M, a′ and q′ as in the proof of Lemma 4.2.1. The same method of
proof works as for Lemma 4.2.1; we go over the differences. When applying Poisson
summation or (2.2), use η_{(x/m)}(t) = (log xt/m)η(t) instead of η(t). Then use the
bounds in (4.38) with ρ = x/m; in particular,
\[ |\widehat{\eta''_{(x/m)}}|_\infty \le c_0\log\frac{x}{m}. \]
and so
1X b x 1 δ 1X xn δ
f (n/2) ≤ (x/m) −
η\ + ηb −
2 n m 2 2 2 m2 2
n6=0
x δ m
1x [ δ x c0
= log ·η − + log ηb − + log (π 2 − 4).
2m 2 m 2 x m 2π 2
for q odd. (We can see that this, like the rest of the main term, vanishes for m even.)
In the term in front of π² − 4, we find the sum
\[ \sum_{\substack{m\le M\\ m\ \mathrm{odd}\\ q|m}}\frac{m}{x}\log\frac{x}{m} \le \frac{M}{x}\log\frac{x}{M} + \frac{q}{2x}\int_0^{M/q}t\log\frac{x/q}{t}\,dt = \frac{M}{x}\log\frac{x}{M} + \frac{M^2}{4qx}\log\frac{e^{1/2}x}{M}, \]
where we use the fact that t ↦ t log(x/t) is increasing for t ≤ x/e. By the same fact
(and by M ≤ D), (M²/q) log(e^{1/2}x/M) ≤ (D²/q) log(e^{1/2}x/D). It is also easy to
see that (M/x) log(x/M) ≤ 1/e (since M ≤ D ≤ x).
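The three elementary facts used here (monotonicity of t ↦ t log(x/t) up to x/e, the bound 1/e, and the closed form of the integral) can be checked numerically. The following is a sketch with hypothetical sample values x = 10⁶, M = 10⁴, q = 7; the midpoint rule and its step count are arbitrary choices.

```python
import math

def h(t, x):
    # The function t -> t log(x/t).
    return t * math.log(x / t)

def integral_closed_form(M, q, x):
    # (q/(2x)) * integral_0^{M/q} t log((x/q)/t) dt = (M^2/(4qx)) log(e^{1/2} x / M)
    return M * M / (4 * q * x) * math.log(math.exp(0.5) * x / M)

def integral_midpoint(M, q, x, steps=100000):
    # Midpoint-rule approximation of the same quantity.
    a = M / q
    step = a / steps
    total = sum(h((k + 0.5) * step, x / q) for k in range(steps))
    return (q / (2 * x)) * total * step

x, M, q = 1.0e6, 1.0e4, 7
print(integral_closed_form(M, q, x), integral_midpoint(M, q, x))

# Monotonicity on (0, x/e] and the bound (M/x) log(x/M) <= 1/e.
ts = [x / math.e * k / 1000 for k in range(1, 1001)]
print(all(h(ts[i], x) <= h(ts[i + 1], x) for i in range(len(ts) - 1)))
print(max(h(t, x) / x for t in ts) <= 1 / math.e + 1e-12)
```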
The basic estimate for the rest of the proof (replacing (4.13)) is
X mn X mn
Tm,◦ (α) = e(αmn)(log n)η = e(αmn)η(x/m)
x x
n odd n odd
0 1 0 1 \ 00
x |η (x/m) | 1 2 |η(x/m) | 1 m 2 |η (x/m) | ∞
= O∗ min |η(x/m) |1 + , ,
2m 2 | sin(2πmα)| x (sin 2πmα)2
1 0
|η 0 |1 2 |η |1
x x m c0 1
= O∗ log · min + , , .
m 2m 2 | sin(2πmα)| x 2 (sin 2πmα)2
We wish to bound
X X
|Tm,◦ (α)| + |Tm,◦ (α)|. (4.43)
m≤M Q
2 <m≤D
q-m
m odd
Just as in the proofs of Lemmas 4.2.1 and 4.2.2, we give two bounds, one valid for
|δ| large (|δ| ≥ 1/2c2 ) and the other for δ small (|δ| ≤ 1/2c2 ). Again as in the proof
of Lemma 4.2.2, we ignore the condition that m is odd in (4.15).
Since
X x
(j + 1) log
jq + 1
0≤j≤ M
q
M x X x X x
≤ log x + log + log + j log
q M M
jq jq
1≤j≤ q 1≤j≤ M
q −1
Z Mq Z Mq
M x x x
≤ log x + log + log dt + t log dt
q M 0 tq 1 tq
2 1/2
2M M e x
≤ log x + + 2 log ,
q 2q M
this means that
40 c0 q 3 M2 e1/2 x
X 2M
|Tm (α)| ≤ log x + + log
3π 2 4x q 2q 2 M
1≤m≤M
q-m (4.45)
√
5c0 c2 ex 40 √ 3/2 √
≤ M log + 2c0 c2 x log x,
3π 2 M 3
where we are using the bounds M ≤ Q/2 ≤ c2 x/q and q 2 ≤ 2c2 x (just as in (4.16)).
Instead of (4.17), we have
j k
D−(Q+1)/2
q0
! Z D
X x x x 2x x x dt
log ≤ log + 0 log
j=0
jq + Q+1
0
2
0
jq + 2Q+1 Q/2 Q q Q+1
2
t t
2x 2x x 2x 2D
≤ log + 0 log log+ ;
Q Q q Q Q
recall that the coefficient in front of this sum will be halved by the condition that n is
odd. Instead of (4.18), we obtain
D−(Q+1)/2
b q0
c s !
0
X q0 x
q 1+ 0 log
j=0
jq + (Q + 1)/2 jq 0 + Q+1 2
Z D
√ 0
2x q x
≤ q 0 3 + 2 · log + 1+ log dt
Q+1 Q+1
2
2t t
√ 2x ex
≤ q 0 3 + 2 · log + D log
Q+1 D
0
Q+1 2ex q 2x 2D
− log + log log .
2 Q+1 2 Q+1 Q+1
Rb
(The bound a log(x/t)dt/t ≤ log(x/a)
P log(b/a) will be more practical than the exact
expression for the integral.) Hence Q/2<m≤D |Tm (α)| is at most
√
2 c0 c1 ex
D log
π D
√
√
2 c0 c1 (1 + ) 2D 2x
+ (1 + ) 3 + 2 + log (Q + 1) log
π 2 Q+1 Q+1
√
√ √
2 c0 c1 Q + 1 2ex 3c1 2 1+ D
− · log + √ + log+ x log x.
π 2 Q+1 2 5 Q/2
Summing this to (4.45) (with M = Q/2), and using (4.21) and (4.22) as before, we
obtain that (4.43) is at most
√
2 c0 c1 ex
D log
π D √
√
√
2 c0 c1 2 ex 1 2D 2x
+ (1 + )(Q + 1) 3 + 2 log+ + log+ log+
π Q+1 2 Q+1 Q+1
√ √ √ √
3c1 2 1+ D 40 3/2
+ √ + log+ x log x + 2c0 c2 x log x.
2 5 Q/2 3
Now we go over the case of |δ| small (or D ≤ Q0/2). Instead of (4.23), we have
\[ \sum_{m\le q/2}|T_{m,\circ}(\alpha)| \le \frac{2|\eta'|_1}{\pi}\,q\max\left(1, \log\frac{c_0e^3q^2}{4\pi|\eta'|_1x}\right)\log x. \tag{4.46} \]
Suppose q 2 < 2c2 x. (Otherwise, the sum we are about to estimate is empty.) Instead
of (4.24), we have
40 c0 q 3
X X 1 x
|Tm,◦ (α)| ≤ 2 6x
j+ log 1
q 0
3π D0
2 j − 2 q
2 <m≤D
1
1≤j≤ q +2
q-m
Z 0 Z 0 !
10c0 q 3 2x 1 D x 1 D x D0 x
≤ log + log dt + t log dt + log 0
3π 2 x q q 0 t q 0 t q D
√
10c0 q 3 2D0 (D0 )2
2x ex
= log + + log
2
3π x q q 2q 2 D0
√ √
√ √
5c0 c2 2x ex 0 ex
≤ 4 2c2 x log + 4 2c 2 x log + D log
3π 2 q D0 D0
√ √
√
5c0 c2 0 ex 2 ex
≤ D log + 4 2c 2 x log
3π 2 D0 c2
(4.47)
where D0 = min(c2 x/q, D). (We are using the bounds q 3 /x ≤ (2c2 )3/2 , D0 q 2 /x ≤
3/2 √
c2 q < c 2 2x and D0 q/x ≤ c2 .) Instead of (4.25), we have
b D−R
q c 3c1
s !
2 x 4q c1 c0 q x
X X
|Tm,◦ (α)| ≤ + 1+ log ,
j=0
jq + R π 4 jq + R jq + R
R<m≤D
where R = max(c2x/q, q/2). We can simply reuse (4.26), multiplying it by log x/R;
the only difference is that now we take care to bound min(q/c2, 2x/q) by the geometric
mean √((q/c2)(2x/q)) = √(2x/c2). We replace (4.27) by
b q1 (D−R)
X cr 1 D
r Z r
q x q x q x
1+ log ≤ 1 + log + 1 + log dt
j=0
jq + R jq + R R R q R t t
√
q D ex R ex 1 q D
≤ 3 log + log − log + log log+ .
c2 q D q R 2 c2 R
(4.48)
We sum with (4.46) and (4.47), and obtain (4.41) as an upper bound for (4.43). (Just as
in the proof of Lemma 4.2.1, the term (5c0c2/(3π²))D′ log(√e x/D′) is smaller than
the term (2√(c1c0)/π)R log ex/R in (4.48), and thus gets absorbed by it when D > R.
If D ≤ R, then, again as in Lemma 4.2.1, the sum Σ_{R<m≤D} |T_{m,◦}(α)| is empty, and
we bound (5c0c2/(3π²))D′ log(√e x/D′) by the term (2√(c1c0)/π)D log ex/D, which
would not appear otherwise.)
Now comes the time to focus on our second type I sum, namely,
\[ \sum_{\substack{v\le V\\ v\ \mathrm{odd}}}\Lambda(v)\sum_{\substack{u\le U\\ u\ \mathrm{odd}}}\mu(u)\sum_{n\ \mathrm{odd}}e(\alpha vun)\,\eta(vun/x), \]
which corresponds to the term SI,2 in (3.9). The innermost two sums, on their own,
are a sum of type I we have already seen. Accordingly, for q small, we will be able to
bound them using Lemma 4.2.2. If q is large, then that approach does not quite work,
since then the approximation av/q to vα is not always good enough. (As we shall later
see, we need q ≤ Q/v for the approximation to be sufficiently close for our purposes.)
Fortunately, when q is large, we can also afford to lose a factor of log, since the
gains from q will be large. Here is the estimate we will use for q large.
Lemma 4.2.4. Let α ∈ R/Z with 2α = a/q + δ/x, (a, q) = 1, |δ/x| ≤ 1/qQ0,
q ≤ Q0, Q0 ≥ max(2e, 2√x). Let η be continuous, piecewise C² and compactly
supported, with |η|_1 = 1 and η′′ ∈ L1. Let c0 ≥ |\widehat{η′′}|_∞. Let c2 = 6π/5√c0. Assume
that x ≥ e²c2/2.
Let U, V ≥ 1 satisfy UV + (19/18)Q0 ≤ x/5.6. Then, if |δ| ≤ 1/2c2, the absolute
value of
\[ \sum_{\substack{v\le V\\ v\ \mathrm{odd}}}\Lambda(v)\sum_{\substack{u\le U\\ u\ \mathrm{odd}}}\mu(u)\sum_{n\ \mathrm{odd}}e(\alpha vun)\,\eta(vun/x) \tag{4.49} \]
is at most
x c0
min 1, log V q
2q (πδ)2
2 (4.50)
3c4 U V 2 (U + 1)2 V
1 1 D log V
+ O∗ − 2 · c0 + + log q
4 π 2qx 2 x 2x
plus
√
√
2 c0 c1 D c2 x log D + D
D log √ +q 3 log + log
π e q 2 q/2
0
c0 e 3 q 2
3c1 x + D 2|η |1 q
+ log D log + q max 1, log log (4.51)
2 q c2 x/q π 4π|η 0 |1 x 2
3c1 √ c2 x 25c0 √
+ √ x log + 2
(2c2 )3/2 x log x,
2 2c2 2 4π
Proof. We proceed essentially as in Lemma 4.2.1 and Lemma 4.2.2. Let Q, q 0 and Q0
be as in the proof of Lemma 4.2.2, that is, with 2α where Lemma 4.2.1 uses α.
Let M = min(U V, Q/2). We first consider the terms with uv ≤ M , u and v odd,
uv divisible by q. If q is even, there are no such terms. Assume q is odd. Then, by
(4.33) and (4.34), the absolute value of the contribution of these terms is at most
!!
X X xb
η (−δ/2) a |ηc00 |∞ 2
Λ(v)µ(a/v) +O · (π − 4) . (4.53)
2a x 2π 2
a≤M v|a
a odd a/U ≤v≤V
q|a
Now
X X Λ(v)µ(a/v)
a
a≤M v|a
a odd a/U ≤v≤V
q|a
X Λ(v) X µ(u) X Λ(pα ) X µ(u)
= + ,
v u α
pα u
v≤V u≤min(U,M/V ) p ≤V u≤min(U,M/V )
v odd u odd p odd u odd
(v,q)=1 q
q|u p|q (q,pα )
|u
which equals
and so
X X Λ(v)µ(a/v) 1 X Λ(v)
= · O∗
log q +
a q v
a≤M v|a v≤V
a odd a/U ≤v≤V (v,2)=1
q|a
1
= · O∗ (log q + log V )
q
by (2.12). The absolute value of the sum of the terms with ηb(−δ/2) in (4.53) is thus at
most
x ηb(−δ/2) x c0
(log q + log V ) ≤ min 1, log V q,
q 2 2q (πδ)2
where we are bounding ηb(−δ/2) by (2.1) (with k = 2).
|ηc00 |∞ 1 X X
(π 2 − 4) Λ(v)uv. (4.54)
2π 2 x
u≤U v≤V
uv odd
uv≤M, q|uv
u sq-free
For any R, Σ_{u≤R, u odd, q|u} u ≤ R²/4q + 3R/4. Using the estimates (2.12), (2.15)
and (2.16), we obtain that the double sum in (4.54) is at most
X X X X
Λ(v)v u+ (log p)pα u
v≤V u≤min(U,M/v) pα ≤V u≤U
(v,2q)=1 u odd p odd u odd
q
q|u p|q (q,pα )
|u
(M/v)2 (U + 1)2
X 3M X
≤ Λ(v)v · + + (log p)pα · (4.55)
4q 4v 4
v≤V pα ≤V
(v,2q)=1 p odd
p|q
where c4 = 1.03884.
From this point onwards, we use the easy bound
\[ \left|\sum_{\substack{v|a\\ a/U\le v\le V}}\Lambda(v)\mu(a/v)\right| \le \log a. \]
The inner sum is the same as the sum Tm,◦ (α) in (4.35); we will be using the bound
(4.36). Much as before, we will be able to ignore the condition that m is odd.
Let D = U V . What remains to do is similar to what we did in the proof of Lemma
4.2.1 (or Lemma 4.2.2).
Case (a). δ large: |δ| ≥ 1/2c2 . Instead of (4.16), we have
X 40 c0 q 3 X
(log m)|Tm,◦ (α)| ≤ (j + 1) log(j + 1)q,
3π 2 4x
1≤m≤M 0≤j≤ M
q
q-m
and, since M ≤ min(c2x/q, D), q ≤ √(2c2x) (just as in the proof of Lemma 4.2.1) and
X
(j + 1) log(j + 1)q
0≤j≤ M
q
Z M
M M 1
≤ log M + + 1 log(M + 1) + 2 t log t dt
q q q 0
M2
M M
≤ 2 + 1 log x + 2 log √ ,
q 2q e
we conclude that
X 5c0 c2 M 20c0 √
|Tm,◦ (α)| ≤ M log √ + (2c2 )3/2 x log x.
3π 2 e 3π 2 (4.57)
1≤m≤M
q-m
Since
D 0 + 32 q 2
D0 + 3 q 1
Z
1 0 3 q
x log x dx = D + q log √ 2 − q 2 log √
q 2 2 e 2 e
0
0 3
1 02 3 0 D 3 q 9 D + q 1 q
= D + Dq log √ + + q 2 log √ 2 − q 2 log √
2 2 e 2 D0 8 e 2 e
0
1 D 3 9 2 3 19
= D02 log √ + D0 q log D0 + q 2 + + log D0 + q ,
2 e 2 8 9 2 18
where D0 = min(c2 x/q, D), and since the assumption (U V + (19/18)Q0 ) ≤ x/5.6
implies that (2/9 + 3/2 + log(D0 + (19/18)q)) ≤ x, we conclude that
X
|Tm,◦ (α)|(log m)
q 0
2 <m≤D
q-m
D0 √ √
5c0 c2 0 10c0 3 3/2 9 3/2 (4.60)
≤ D log √ + (2c 2 ) x log x + (2c 2 ) x log x
3π 2 e 3π 2 4 8
5c0 c2 0 D0 25c0 √
≤ D log √ + (2c2 )3/2 x log x.
3π 2 e 4π 2
Let R = max(c2 x/q, q/2). We bound the terms R < m ≤ D as in (4.25), with a
factor of log(jq + R) inside the sum. The analogues of (4.26) and (4.27) are
b q1 (D−R)
X c x D log t
Z
x x
log(jq + R) ≤ log R + dt
jq + R R q R t
j=0 (4.61)
r r
2x c2 x x D
≤ log + log D log+ ,
c2 2 q R
X c
b q1 (D−R) r
q √
log(jq + R) 1 + ≤ 3 log R
jq + R
j=0 (4.62)
1 D R 1 D
+ D log − R log + log D log
q e e 2 R
(or 0 if D < R). We sum with (4.60) and the terms with m ≤ q/2, and obtain, for
D0 = c2 x/q = R,
√
√
2 c0 c1 D c2 x log D + D
D log √ +q 3 log + log
π e q 2 q/2
0
c0 e3 q 2
3c1 x + D 2|η |1 q
+ log D log + q max 1, log log
2 q c2 x/q π 4π|η 0 |1 x 2
3c1 √ c2 x 25c0 √
+ √ x log + 2
(2c2 )3/2 x log x,
2 2c2 2 4π
which, it is easy to check, is also valid even if D0 = D (in which case (4.61) and (4.62)
do not appear) or R = q/2 (in which case (4.60) does not appear).
Chapter 5
Type II sums
Here the main improvements over classical treatments of type II sums are as follows:
It should be clear that these techniques are of general applicability. (It is also clear that
(2) is not new, though, strangely enough, it seems not to have been applied to Goldbach’s
problem. Perhaps this oversight is due to the fact that proofs of Vinogradov’s
result given in textbooks often follow Linnik’s dispersion method, rather than the large
sieve. Our treatment of the large sieve for primes will follow the lines set by Montgomery
and Montgomery-Vaughan [MV73, (1.6)]. The fact that the large sieve for
primes can be combined with the new technique (3) is, of course, a novelty.)
While (1) is particularly useful for the treatment of a term that generally arises in
applications of Vaughan’s identity, all of the points above address issues that can arise
in more general situations in number theory.
78 CHAPTER 5. TYPE II SUMS
For the smoothing function η(t) = η2 (t) = 4 max(log 2 − | log 2t|, 0), equation (5.2)
holds with η1 = 2 · 1[1/2,1] , where 1[1/2,1] is the characteristic function of the interval
[1/2, 1]. We will work with η = η2 , yet most of our work will be valid for any η of the
form η = η1 ∗ η1 .
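The relation η2 = η1 ∗ η1 can be checked numerically. The sketch below assumes that ∗ denotes the multiplicative convolution (f ∗ g)(t) = ∫ f(s) g(t/s) ds/s, which is the form in which η1 appears twice in the integrand of (5.3); the midpoint rule, the step count and the sample points are arbitrary illustrative choices.

```python
import math

def eta1(t):
    # eta_1 = 2 * (characteristic function of [1/2, 1])
    return 2.0 if 0.5 <= t <= 1.0 else 0.0

def eta2(t):
    # eta_2(t) = 4 max(log 2 - |log 2t|, 0)
    if t <= 0:
        return 0.0
    return 4.0 * max(math.log(2) - abs(math.log(2 * t)), 0.0)

def mult_conv(t, steps=20000):
    """Midpoint-rule value of the integral of eta1(s) * eta1(t/s) ds/s.
    Only s in [1/2, 1] contributes, since eta1 vanishes elsewhere."""
    step = 0.5 / steps
    total = 0.0
    for k in range(steps):
        s = 0.5 + (k + 0.5) * step
        total += eta1(s) * eta1(t / s) / s
    return total * step

for t in (0.2, 0.3, 0.5, 0.7, 1.2):
    print(t, round(eta2(t), 4), round(mult_conv(t), 4))
```

On [1/2, 1] both sides equal −4 log t, and on [1/4, 1/2] both equal 4 log 4t; outside [1/4, 1] both vanish.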
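By (5.2), the sum (5.1) equals
\[ \begin{aligned} &4\int_0^\infty\sum_{\substack{m>U\\ (m,v)=1}}\Bigg(\sum_{\substack{d>U\\ d|m}}\mu(d)\Bigg)\sum_{\substack{n>V\\ (n,v)=1}}\Lambda(n)e(\alpha mn)\,\eta_1(t)\,\eta_1\!\left(\frac{mn/x}{t}\right)\frac{dt}{t}\\ &= 4\int_V^{x/U}\sum_{\substack{\max(\frac{x}{2W},U)<m\le\frac{x}{W}\\ (m,v)=1}}\Bigg(\sum_{\substack{d>U\\ d|m}}\mu(d)\Bigg)\sum_{\substack{\max(V,\frac{W}{2})<n\le W\\ (n,v)=1}}\Lambda(n)e(\alpha mn)\frac{dW}{W} \end{aligned} \tag{5.3} \]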
by the substitution t = (m/x)W . (We can assume V ≤ W ≤ x/U because otherwise
one of the sums in (5.4) is empty.) As we can see, the sums within the integral are now
unsmoothed. This will not be truly harmful, and to some extent it will be convenient,
in that ready-to-use large-sieve estimates in the literature have been optimized more
carefully for unsmoothed sums than for smooth sums. The fact that the sums start at
x/2W and W/2 rather than at 1 will also be slightly helpful.
(This is presumably why the weight η2 was introduced in [Tao14], which also uses
the large sieve. As we will later see, the weight η2 – or anything like it – will simply
not do on the major arcs, which are much more sensitive to the choice of weights. On
the minor arcs, however, η2 is convenient, and this is why we use it here. For type I
sums – as should be clear from our work so far, which was stated for general weights
– any function whose second derivative exists almost everywhere and lies in ℓ¹ would
do just as well. The option of having no smoothing whatsoever – as in Vinogradov’s
work, or as in most textbook accounts – would not be quite as good for type I sums,
and would lead to a routine but inconvenient splitting of sums into short intervals in
place of (5.3).)
We now do what is generally the first thing in type II treatments: we use Cauchy-Schwarz.
A minor note, however, that may help avoid confusion: the treatments familiar
to some readers (e.g., the dispersion method, not followed here) start with the
special case of Cauchy-Schwarz that is most common in number theory
\[ \left|\sum_{n\le N}a_n\right|^2 \le N\sum_{n\le N}|a_n|^2, \]
\[ \left|\sum_m a_mb_m\right| \le \sqrt{\sum_m|a_m|^2}\sqrt{\sum_m|b_m|^2}. \]
to the integrand in (5.3). At any rate, we will have reduced the estimation of a sum
to the estimation of two simpler sums Σ_m |a_m|², Σ_m |b_m|², but each of these two
simpler sums will be of a kind that will lead to a loss of a factor of log x (or
(log x)³) if not estimated carefully. Since we cannot afford to lose a single factor of
log x, we will have to deploy and develop techniques to eliminate these factors of log x.
The procedure followed will be quite different for the two sums; a variety of techniques
will be needed.
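We separate n prime and n non-prime in the integrand of (5.3), and, as we were
saying, we apply Cauchy-Schwarz. We obtain that the expression within the integral in
(5.3) is at most √(S1(U, W) · S2(U, V, W)) + √(S1(U, W) · S3(W)), where
\[ \begin{aligned} S_1(U,W) &= \sum_{\substack{\max(\frac{x}{2W},U)<m\le\frac{x}{W}\\ (m,v)=1}}\Bigg|\sum_{\substack{d>U\\ d|m}}\mu(d)\Bigg|^2,\\ S_2(U,V,W) &= \sum_{\substack{\max(\frac{x}{2W},U)<m\le\frac{x}{W}\\ (m,v)=1}}\Bigg|\sum_{\substack{\max(V,\frac{W}{2})<p\le W\\ (p,v)=1}}(\log p)e(\alpha mp)\Bigg|^2. \end{aligned} \tag{5.4} \]
and
\[ \begin{aligned} S_3(W) &= \sum_{\substack{\frac{x}{2W}<m\le\frac{x}{W}\\ (m,v)=1}}\Bigg|\sum_{\substack{n\le W\\ n\ \text{non-prime}}}\Lambda(n)\Bigg|^2\\ &\le \sum_{\substack{\frac{x}{2W}<m\le\frac{x}{W}\\ (m,v)=1}}\left(1.42620\,W^{1/2}\right)^2 \le 1.0171x + 2.0341W \end{aligned} \tag{5.5} \]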
(by [RS62, Thm. 13]). We will assume V ≤ w; thus the condition (p, v) = 1 will be
fulfilled automatically and can be removed.
The contribution of S3 (W ) will be negligible. We must bound S1 (U, W ) and
S2 (U, V, W ) from above.
There will be a surprising amount of cancellation: the expression within the sum
will be bounded by a constant on average – a constant less than 1, and usually less than
1/2, in fact. In other words, the inner sum in (5.6) is exactly 0 most of the time.
Recall that we need explicit constants throughout, and that this essentially constrains
us to elementary means. (We will at one point use Dirichlet series and ζ(s) for
s real and greater than 1.)
where we have set s = m/(r1 r2 l). We begin by simplifying the innermost triple sum.
This we do in the following Lemma; it is not a trivial task, and carrying it out efficiently
actually takes an idea.
5.1. THE SUM S1 : CANCELLATION 81
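equals
\[ \begin{aligned} &\frac{6z}{\pi^2}\frac{v}{\sigma(v)}\sum_{\substack{r_1<y}}\ \sum_{\substack{r_2<y\\ (r_1,r_2)=1\\ (r_1r_2,v)=1}}\frac{\mu(r_1)\mu(r_2)}{\sigma(r_1)\sigma(r_2)}\left(1 - \max\left(\frac12, \frac{r_1}{y}, \frac{r_2}{y}\right)\right)\\ &+ O^*\left(5.08\,\zeta\!\left(\frac32\right)^2y\sqrt z\cdot\prod_{p|v}\left(1 + \frac{1}{\sqrt p}\right)\left(1 - \frac{1}{p^{3/2}}\right)^2\right). \end{aligned} \tag{5.10} \]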
We can change the order of summation of ri and di by defining si = ri /di , and we can
also use the obvious fact that the number of integers in an interval (a, b] divisible by d
is (b − a)/d + O∗ (1). Thus (5.12) equals
X X
µ(d1 )µ(d2 ) µ(d1 s1 )µ(d2 s2 )
d1 ,d2 <y s1 <y/d1
(d1 ,d2 )=1 s2 <y/d2
(d1 d2 ,v)=1 (d1 s1 ,d2 s2 )=1
(s1 s2 ,v)=1
X X µ(m) z 1 s1 d1 s2 d2
µ(d3 ) 2
1 − max , ,
d1 d2 d3 m s1 d1 s2 d2 2 y y
d3 |v z
q
m≤
d2 2
1 s1 d 2 s 2 d 3
(m,d1 s1 d2 s2 v)=1
(5.13)
plus
X X X X
O∗
.
1 (5.14)
d1 ,d2 <y s1 <y/d1 d3 |v m≤ z
q
d2 2
1 s1 d2 s2 d3
(d1 d2 ,v)=1 s2 <y/d2
m sq-free
(s1 s2 ,v)=1
i.e., the main term in (5.10). It remains to estimate the terms used to complete the
sum; their total is, by definition, given exactly by (5.13) with the inequality m ≤
√(z/(d1²s1d2²s2d3)) changed to m > √(z/(d1²s1d2²s2d3)). This is a total of size at most
1 X X X X 1 z
2
. (5.16)
2 d1 d2 d3 m s1 d1 s2 d2
d1 ,d2 <y s1 <y/d1 d3 |v m> z
q
d2 2
1 s1 d 2 s2 d 3
(d1 d2 ,v)=1 s2 <y/d2
m sq-free
(s1 s2 ,v)=1
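where
\[ f(x) := \sum_{\substack{m\le x\\ m\ \text{sq-free}}}1 + \frac12\sum_{\substack{m>x\\ m\ \text{sq-free}}}\frac{x^2}{m^2}. \]
It is easy to see that f(x)/x has a local maximum exactly when x is a square-free
(positive) integer. We can hence check that
\[ f(x) \le \frac12\left(2 + 2\left(\frac{\zeta(2)}{\zeta(4)} - 1.25\right)\right)x = 1.26981\ldots x \]
for all x ≥ 0 by checking all integers smaller than a constant, using {m : m sq-free} ⊂
{m : 4 ∤ m} and 1.5 · (3/4) < 1.26981 to bound f from below for x larger than a
constant. Therefore, (5.17) is at most
\[ \begin{aligned} &1.27\sum_{\substack{d_1,d_2<y\\ (d_1d_2,v)=1}}\ \sum_{\substack{s_1<y/d_1\\ s_2<y/d_2\\ (s_1s_2,v)=1}}\ \sum_{d_3|v}\sqrt{\frac{z}{d_1^2s_1d_2^2s_2d_3}}\\ &= 1.27\sqrt z\prod_{p|v}\left(1 + \frac{1}{\sqrt p}\right)\cdot\Bigg(\sum_{\substack{d<y\\ (d,v)=1}}\ \sum_{\substack{s<y/d\\ (s,v)=1}}\frac{1}{d\sqrt s}\Bigg)^2. \end{aligned} \]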
(d,v)=1 (s,v)=1
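Alternatively, if v = 2, we bound
\[ \sum_{\substack{s<y/d\\ (s,v)=1}}\frac{1}{\sqrt s} = \sum_{\substack{s<y/d\\ s\ \mathrm{odd}}}\frac{1}{\sqrt s} \le 1 + \frac12\int_1^{y/d}\frac{1}{\sqrt s}\,ds = \sqrt{y/d} \]
and thus
\[ \sum_{\substack{d<y\\ (d,v)=1}}\ \sum_{\substack{s<y/d\\ (s,v)=1}}\frac{1}{\sqrt s\,d} \le \sum_{\substack{d<y\\ (d,2)=1}}\frac{\sqrt{y/d}}{d} \le \sqrt y\left(1 - \frac{1}{2^{3/2}}\right)\zeta\!\left(\frac32\right). \]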
where v = 1 or v = 2. Then
\[ |g_1(x)| \le \begin{cases} 1/x & \text{if } 33 \le x \le 10^6,\\ \frac1x(111.536 + 55.768\log x) & \text{if } 10^6 \le x < 10^{10},\\ \frac{0.0044325}{(\log x)^2} + \frac{0.1079}{\sqrt x} & \text{if } x \ge 10^{10}, \end{cases} \]
\[ |g_2(x)| \le \begin{cases} 2.1/x & \text{if } 33 \le x \le 10^6,\\ \frac1x(1634.34 + 817.168\log x) & \text{if } 10^6 \le x < 10^{10},\\ \frac{0.038128}{(\log x)^2} + \frac{0.2046}{\sqrt x} & \text{if } x \ge 10^{10}. \end{cases} \]
The proof involves what may be called a version of Rankin’s trick, using Dirichlet
series and the behavior of ζ(s) near s = 1.
Proof. We prove the statements for x ≤ 10^6 by a direct computation, using interval
arithmetic. (In fact, in that range, one gets 2.0895071/x instead of 2.1/x.) Assume
from now on that x > 10^6.
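Clearly
\[ \begin{aligned} g(x) &= \sum_{r_1\le x}\ \sum_{\substack{r_2\le x\\ (r_1r_2,v)=1}}\ \sum_{d|(r_1,r_2)}\mu(d)\frac{\mu(r_1)\mu(r_2)}{\sigma(r_1)\sigma(r_2)}\\ &= \sum_{\substack{d\le x\\ (d,v)=1}}\mu(d)\sum_{r_1\le x}\ \sum_{\substack{r_2\le x\\ d|(r_1,r_2)\\ (r_1r_2,v)=1}}\frac{\mu(r_1)\mu(r_2)}{\sigma(r_1)\sigma(r_2)}\\ &= \sum_{\substack{d\le x\\ (d,v)=1}}\frac{\mu(d)}{(\sigma(d))^2}\sum_{\substack{u_1\le x/d\\ (u_1,dv)=1}}\ \sum_{\substack{u_2\le x/d\\ (u_2,dv)=1}}\frac{\mu(u_1)\mu(u_2)}{\sigma(u_1)\sigma(u_2)}\\ &= \sum_{\substack{d\le x\\ (d,v)=1}}\frac{\mu(d)}{(\sigma(d))^2}\Bigg(\sum_{\substack{r\le x/d\\ (r,dv)=1}}\frac{\mu(r)}{\sigma(r)}\Bigg)^2. \end{aligned} \tag{5.20} \]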
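Moreover,
\[ \begin{aligned} \sum_{\substack{r\le x/d\\ (r,dv)=1}}\frac{\mu(r)}{\sigma(r)} &= \sum_{\substack{r\le x/d\\ (r,dv)=1}}\frac{\mu(r)}{r}\sum_{d'|r}\prod_{p|d'}\left(\frac{p}{p+1} - 1\right)\\ &= \sum_{\substack{d'\le x/d\\ \mu(d')^2=1\\ (d',dv)=1}}\Bigg(\prod_{p|d'}\frac{-1}{p+1}\Bigg)\sum_{\substack{r\le x/d\\ (r,dv)=1\\ d'|r}}\frac{\mu(r)}{r}\\ &= \sum_{\substack{d'\le x/d\\ \mu(d')^2=1\\ (d',dv)=1}}\frac{1}{d'\sigma(d')}\sum_{\substack{r\le x/dd'\\ (r,dd'v)=1}}\frac{\mu(r)}{r} \end{aligned} \]
and
\[ \sum_{\substack{r\le x/dd'\\ (r,dd'v)=1}}\frac{\mu(r)}{r} = \sum_{\substack{d''\le x/dd'\\ d''|(dd'v)^\infty}}\frac{1}{d''}\sum_{r\le x/dd'd''}\frac{\mu(r)}{r}. \]
Hence
\[ |g(x)| \le \sum_{\substack{d\le x\\ (d,v)=1}}\frac{(\mu(d))^2}{(\sigma(d))^2}\Bigg(\sum_{\substack{d'\le x/d\\ \mu(d')^2=1\\ (d',dv)=1}}\frac{1}{d'\sigma(d')}\sum_{\substack{d''\le x/dd'\\ d''|(dd'v)^\infty}}\frac{1}{d''}f(x/dd'd'')\Bigg)^2, \tag{5.21} \]
where f(t) = |Σ_{r≤t} μ(r)/r|.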
We intend to bound the function f (t) by a linear combination of terms of the form
t−δ , δ ∈ [0, 1/2). Thus it makes sense now to estimate Fv (s1 , s2 , x), defined to be the
quantity
X (µ(d))2 X µ(d01 )2 X 1
0 00 1−s1
· (dd d )
1 1
(σ(d))2 d01 σ(d01 ) d001
d01 d00 0
∞
d 1 |(dd1 v)
(d,v)=1 (d01 ,dv)=1
X µ(d02 )2 X 1
0 00 1−s2
· (dd d ) .
2 2
d02 σ(d02 ) d002
d02 d00 0
∞
2 |(dd2 v)
0
(d2 ,dv)=1
(d,v)=1
X µ(d0 )2 Y 1
· 0 s +1 0−1 0−s
(d ) 1 (1 + p ) (1 − p 1 )
d0 0 0
p |d
(d0 ,dv)=1
X µ(d0 )2 Y 1
· 0 s +1 0−1 0−s
,
(d ) 2 (1 + p ) (1 − p 2 )
d0 0 0
p |d
(d0 ,dv)=1
and so
xy (1 + x − y)(1 − xy)(1 − xy 2 ) (1 − x3 )
1+ = ≤ .
(1 + x)(1 − y) (1 + x)(1 − y)(1 − xy)(1 − xy 2 ) (1 − xy)(1 − xy 2 )
(5.23)
For any x ≤ y1 , y2 < 1 with y12 ≤ x, y22 ≤ x,
y1 y2 (1 − x3 )2 (1 − x4 )
1+ ≤ . (5.24)
(1 − y1 + x)(1 − y2 + x) (1 − y1 y2 )(1 − y1 y22 )(1 − y12 y2 )
This can be checked as follows: multiplying by the denominators and changing vari-
ables to x, s = y1 + y2 and r = y1 y2 , we obtain an inequality where the left side,
quadratic on s with positive leading coefficient, must be less than or equal to the right
side, which is linear on s. The left side minus the right side can be maximal for given
x, r only
√ when s is maximal or minimal. This happens when y1 = y2 or when either
yi = x or yi = x for at least one of i = 1, 2. In each of these cases, we have re-
duced (5.24) to an inequality in two variables that can be proven automatically1 by a
quantifier-elimination program; the author has used QEPCAD [HB11] to do this.
Hence Fv (s1 , s2 , x) is at most
Y (1 − p−3 )2 (1 − p−4 ) Y 1
·
(1 − p−s1 −s2 )(1 − p−2s1 −s2 )(1 − p−s1 −2s2 ) (1 − p−s1 )(1 − p−s2 )
p-v p|v
−3
Y 1−p Y 1 − p−3
·
(1 + p−s1 −1 )(1 + p−2s1 −1 ) (1 + p−s2 −1 )(1
+ p−2s2 −1 )
p-v p-v
(1 − 2−s1 −2s2 )(1 + 2−s1 −1 )(1 + 2−2s1 −1 )(1 + 2−s2 −1 )(1 + 2−2s2 −1 )
(1 − 2−s1 +s2 )−1 (1 − 2−2s1 −s2 )−1 (1 − 2−s1 )(1 − 2−s2 )(1 − 2−3 )4 (1 − 2−4 )
if v = 2.
For 1 ≤ t ≤ x, (2.21) and (2.24) imply
q
2 if x ≤ 1010
t
f (t) ≤ q log 1010 (5.26)
2 + 0.03 x loglogx−log 1010
if x > 1010 ,
t log x t
1 In √
practice, the case yi = x leads to a polynomial of high degree, and quantifier elimination increases
sharply in complexity as the degree increases; a stronger inequality of lower degree (with (1 − 3x3 ) instead
of (1 − x3 )2 (1 − x4 )) was given to QEPCAD to prove in this case.
where we are using the fact that $\log x$ is convex-down. Note that, again by convexity,
\[
\frac{\log\log x - \log\log 10^{10}}{\log x - \log 10^{10}} < (\log t)'|_{t=\log 10^{10}} = \frac{1}{\log 10^{10}} = 0.0434294\ldots
\]
Obviously, $\sqrt{2/t}$ in (5.26) can be replaced by $(2/t)^{1/2-\epsilon}$ for any $\epsilon \ge 0$.
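By (5.21) and (5.26),
\[
|g_v(x)| \le \left(\frac{2}{x}\right)^{1-2\epsilon} F_v(1/2+\epsilon, 1/2+\epsilon, x)
\]
for $x \le 10^{10}$. We set $\epsilon = 1/\log x$ and obtain from (5.25) that
\[
F_v(1/2+\epsilon, 1/2+\epsilon, x) \le C_{v,\frac12+\epsilon,\frac12+\epsilon}\, \frac{\zeta(1+2\epsilon)\,\zeta(3/2)^4\,\zeta(2)^2}{\zeta(3)^4\,\zeta(4)} \le 55.768 \cdot C_{v,\frac12+\epsilon,\frac12+\epsilon} \cdot \left(1 + \frac{\log x}{2}\right), \tag{5.27}
\]
where we use the easy bound $\zeta(s) < 1 + 1/(s-1)$ obtained by
\[
\sum_n n^{-s} < 1 + \int_1^\infty t^{-s}\, dt.
\]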
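The constant 55.768 in (5.27) can be reproduced numerically. The following is a minimal sketch; the helper `zeta` (a partial sum with an Euler-Maclaurin tail) is an illustrative implementation introduced here, not something the text prescribes.

```python
def zeta(s, n_terms=1000):
    """Riemann zeta for real s > 1: partial sum plus Euler-Maclaurin tail."""
    n = n_terms
    head = sum(k ** -s for k in range(1, n))
    tail = n ** (1 - s) / (s - 1) + 0.5 * n ** -s + s * n ** (-s - 1) / 12
    return head + tail

# Constant multiplying zeta(1 + 2*eps) in (5.27).
const_527 = zeta(1.5) ** 4 * zeta(2) ** 2 / (zeta(3) ** 4 * zeta(4))
print(const_527)

# The "easy bound" zeta(s) < 1 + 1/(s - 1), sampled at s = 1.1.
easy_bound_ok = zeta(1.1) < 1 + 1 / 0.1
print(easy_bound_ok)
```

With $\epsilon = 1/\log x$, the easy bound gives $\zeta(1+2\epsilon) < 1 + (\log x)/2$, which is the last factor in (5.27).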
for v = 2, we obtain
0.038128 25.607 1
|g2 (x)| ≤ 2
+√ + (1634.34 + 817.168 log x)
(log x) x log x x
0.038128 0.2046
≤ + √ .
(log x)2 x
Write $f(t) = 1/S$ for $S/2m < t \le S/(m+1)$, $f(t) = 0$ for $t > S/m$ or $t < S/2(m+1)$, $f(t) = 1/t - m/S$ for $S/(m+1) < t \le S/m$ and $f(t) = (m+1)/S - 1/2t$ for $S/2(m+1) < t \le S/2m$; then (5.29) equals $\sum_{n:(n,v)=1} f(n)$. By Euler-Maclaurin (second order),
Z ∞ Z ∞
X 1 00 ∗ 1 00
f (n) = f (x) − B2 ({x})f (x)dx = f (x) + O |f (x)| dx
n −∞ 2 −∞ 12
Z ∞
1 ∗ 0 3 0 s
= f (x)dx + · O f + f
−∞ 6 2m m+1
2 2 !
1 1 1 ∗ 2m m+1
= log 1 + + ·O + .
2 m 6 s s
(5.30)
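The main term $\frac12 \log(1 + 1/m)$ in (5.30) can be checked numerically for $v = 1$ by summing the piecewise function $f$ over the integers. This is only a sketch: the values of $S$ and $m$ are arbitrary, and the tolerance $(m+1)^2/S^2$ used below is a deliberately generous choice made for this illustration (it is not the error term stated in (5.30)).

```python
import math

def f(t, S, m):
    """Piecewise function f from the text (continuous, supported on (S/2(m+1), S/m])."""
    if t <= S / (2 * (m + 1)) or t > S / m:
        return 0.0
    if t <= S / (2 * m):
        return (m + 1) / S - 1 / (2 * t)
    if t <= S / (m + 1):
        return 1 / S
    return 1 / t - m / S

def lattice_sum(S, m):
    # Sum of f over all positive integers n (case v = 1); f vanishes for n > S/m.
    return sum(f(n, S, m) for n in range(1, math.floor(S / m) + 1))

S, m = 1000.5, 3
total = lattice_sum(S, m)
main_term = 0.5 * math.log(1 + 1 / m)
print(total, main_term, abs(total - main_term))
```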
Similarly,
∞
d2 f (2x + 1)
Z
X 1
f (n) = f (2x + 1) − B2 ({x}) dx
−∞ 2 dx2
n odd
1 ∞
Z ∞
x−1
Z
1
= f (x)dx − 2 B2 f 00 (x)dx
2 −∞ −∞ 2 2
1 ∞ 1 ∞ ∗ 00
Z Z
= f (x)dx + O (|f (x)|) dx
2 −∞ 6 −∞
2 2 !
1 1 1 ∗ 2m m+1
= log 1 + + ·O + .
4 m 3 s s
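\[
c_{v,2} = \begin{cases} 55.768\ldots & \text{if } v = 1,\\ 817.168\ldots & \text{if } v = 2,\end{cases}
\qquad
c_{v,3} = \begin{cases} 111.536\ldots & \text{if } v = 1,\\ 1634.34\ldots & \text{if } v = 2,\end{cases}
\]
\[
c_{v,4} = \begin{cases} 0.0044325\ldots & \text{if } v = 1,\\ 0.038128\ldots & \text{if } v = 2,\end{cases}
\qquad
c_{v,5} = \begin{cases} 0.1079\ldots & \text{if } v = 1,\\ 0.2046\ldots & \text{if } v = 2.\end{cases}
\]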
5m2 + 2m + 1
X φ(v) 1 ∗
gv (m) · log 1 + + O cv,0
2v m S2
m≤C0
1 1 ∗ cv,1
X Z
+ O du
s 1/2 uS/s
S/106 ≤s<S/C0
which is
5m2 + 2m + 1
X φ(v) 1 X
∗
gv (m) · log 1 + + |g(m)| · O cv,0
2v m S2
m≤C0 m≤C0
√ !
∗ log 2 log 2 6
2− 2
+ O cv,1 + cv,3 + cv,2 (1 + log 10 ) + cv,5
C0 106 1010/2
X c v,4 /2
+ O∗
10
s(log S/2s)2
s<S/10
P 1
R 2/1010 1
for S ≥ (C0 + 1). Note that s<S/1010 s(log S/2s)2 = 0 t(log t)2 dt.
Now
\[
\frac{c_{v,4}}{2} \int_0^{2/10^{10}} \frac{1}{t(\log t)^2}\, dt = \frac{c_{v,4}/2}{\log(10^{10}/2)} = \begin{cases} 0.00009923\ldots & \text{if } v = 1,\\ 0.000853636\ldots & \text{if } v = 2,\end{cases}
\]
and
√ (
log 2 6
2− 2 0.0006506 . . . if v = 1
cv,3 + cv,2 (1 + log 10 ) + cv,5 =
106 105 0.009525 . . . if v = 2.
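For $C_0 = 10000$,
\[
\frac{\phi(v)}{v} \cdot \frac12 \sum_{m \le C_0} g_v(m) \cdot \log\left(1 + \frac{1}{m}\right) = \begin{cases} 0.362482\ldots & \text{if } v = 1,\\ 0.360576\ldots & \text{if } v = 2,\end{cases}
\]
\[
c_{v,0} \sum_{m \le C_0} |g_v(m)| (5m^2 + 2m + 1) \le \begin{cases} 6204066.5\ldots & \text{if } v = 1,\\ 15911340.1\ldots & \text{if } v = 2,\end{cases}
\]
and
\[
c_{v,1} \cdot (\log 2)/C_0 = \begin{cases} 0.00006931\ldots & \text{if } v = 1,\\ 0.00017328\ldots & \text{if } v = 2.\end{cases}
\]
Thus, for $S \ge 100000$,
\[
\sum_{\substack{s \le S\\ (s,v)=1}} \frac{1}{s} \int_{1/2}^{1} g_v(uS/s)\, du \le \begin{cases} 0.36393 & \text{if } v = 1,\\ 0.37273 & \text{if } v = 2.\end{cases} \tag{5.31}
\]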
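The arithmetic behind (5.31) can be replayed from the quoted constants. The sketch below assumes one particular reading of how the pieces combine: the main sum, the $c_{v,0}$ sum divided by $S^2$, the $c_{v,1}$ term, the $c_{v,2}, c_{v,3}, c_{v,5}$ combination, and the $c_{v,4}$ tail integral are simply added. The dictionary keys are labels invented for this illustration.

```python
# Constants quoted in the text (truncated decimals), keyed by v.
PIECES = {
    1: {"main": 0.362482, "c0_sum": 6204066.5, "c1_term": 0.00006931,
        "mid": 0.0006506, "tail": 0.00009923},
    2: {"main": 0.360576, "c0_sum": 15911340.1, "c1_term": 0.00017328,
        "mid": 0.009525, "tail": 0.000853636},
}
STATED = {1: 0.36393, 2: 0.37273}  # right side of (5.31)

def assembled_bound(v, S):
    """Add up the pieces; only the c_{v,0} piece depends on S (through 1/S^2)."""
    p = PIECES[v]
    return p["main"] + p["c0_sum"] / S**2 + p["c1_term"] + p["mid"] + p["tail"]

for v in (1, 2):
    print(v, assembled_bound(v, 100000.0), STATED[v])
```

Since the only $S$-dependent piece decreases in $S$, the value at $S = 100000$ is the worst case in the range $S \ge 100000$.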
For $S < 100000$, we proceed as above, but using the exact expression (5.29) instead of (5.30). Note (5.29) is of the form $f_{s,m,1}(S) + f_{s,m,2}(S)/S$, where both $f_{s,m,1}(S)$ and $f_{s,m,2}(S)$ depend only on $\lfloor S \rfloor$ (and on $s$ and $m$). Summing over $m \le S$, we obtain a bound of the form
\[
\sum_{\substack{s \le S\\ (s,v)=1}} \frac{1}{s} \int_{1/2}^{1} g_v(uS/s)\, du \le G_v(S)
\]
with
\[
G_v(S) = K_{v,1}(\lfloor S \rfloor) + K_{v,2}(\lfloor S \rfloor)/S,
\]
where $K_{v,1}(n)$ and $K_{v,2}(n)$ can be computed explicitly for each integer $n$. (For example, $G_v(S) = 1 - 1/S$ for $1 \le S < 2$ and $G_v(S) = 0$ for $S < 1$.)
It is easy to check numerically that this implies that (5.31) holds not just for $S \ge 100000$ but also for $40 \le S < 100000$ (if $v = 1$) or $16 \le S < 100000$ (if $v = 2$). Using the fact that $G_v(S)$ is non-negative, we can compare $\int_1^T G_v(S)\, dS/S$ with $\log(T + 1/N)$ for each $T \in [2, 40] \cap \frac{1}{N}\mathbb{Z}$ ($N$ a large integer) to show, again numerically, that
\[
\int_1^T G_v(S) \frac{dS}{S} \le \begin{cases} 0.3698 \log T & \text{if } v = 1,\\ 0.37273 \log T & \text{if } v = 2.\end{cases} \tag{5.32}
\]
(We use N = 100000 for v = 1; already N = 10 gives us the answer above for
v = 2. Indeed, computations suggest the better bound 0.358 instead of 0.37273; we
are committed to using 0.37273 because of (5.31).)
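Multiplying by $6v/\pi^2\sigma(v)$, we conclude that
\[
S_1(U, W) = \frac{x}{W} \cdot H_1\left(\frac{x}{WU}\right) + O^*\left(5.08\,\zeta(3/2)^3 \frac{x^{3/2}}{W^{3/2} U}\right) \tag{5.33}
\]
if $v = 1$,
\[
S_1(U, W) = \frac{x}{W} \cdot H_2\left(\frac{x}{WU}\right) + O^*\left(1.27\,\zeta(3/2)^3 \frac{x^{3/2}}{W^{3/2} U}\right) \tag{5.34}
\]
if $v = 2$, where
\[
H_1(S) = \begin{cases} \frac{6}{\pi^2} G_1(S) & \text{if } 1 \le S < 40,\\ 0.22125 & \text{if } S \ge 40,\end{cases}
\qquad
H_2(S) = \begin{cases} \frac{4}{\pi^2} G_2(S) & \text{if } 1 \le S < 16,\\ 0.15107 & \text{if } S \ge 16.\end{cases} \tag{5.35}
\]
Hence (by (5.32))
\[
\int_1^T H_v(S) \frac{dS}{S} \le \begin{cases} 0.22482 \log T & \text{if } v = 1,\\ 0.15107 \log T & \text{if } v = 2;\end{cases} \tag{5.36}
\]
moreover,
\[
H_1(S) \le \frac{3}{\pi^2}, \qquad H_2(S) \le \frac{2}{\pi^2} \tag{5.37}
\]
for all $S$.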
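The constants in (5.35) and (5.36) follow from (5.31) and (5.32) after multiplying by $6v/\pi^2\sigma(v)$, which equals $6/\pi^2$ for $v = 1$ and $4/\pi^2$ for $v = 2$. A short check, with variable names chosen for this illustration only:

```python
import math

def scale(v):
    """The factor 6v / (pi^2 sigma(v)) for v in {1, 2}; sigma(1) = 1, sigma(2) = 3."""
    sigma = {1: 1, 2: 3}[v]
    return 6 * v / (math.pi ** 2 * sigma)

h1_large = scale(1) * 0.36393   # should not exceed 0.22125
h2_large = scale(2) * 0.37273   # should not exceed 0.15107
h1_integral = scale(1) * 0.3698  # should not exceed 0.22482

print(h1_large, h2_large, h1_integral)
```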
***
Note. There is another way to obtain cancellation on µ, applicable when (x/W ) >
U q (as is unfortunately never the case in our main application). For this alternative
to be taken, one must either apply Cauchy-Schwarz on n rather than m (resulting in
exponential sums over m) or lump together all m near each other and in the same
5.2. THE SUM S2 : THE LARGE SIEVE, PRIMES AND TAILS 93
Proof. For any distinct $i$, $j$, the angles $\alpha_i$, $\alpha_j$ are separated by at least $\nu$ (if $a_i = a_j$) or at least $1/q - |\upsilon_i - \upsilon_j| \ge 1/q - \upsilon \ge \nu$ (if $a_i \ne a_j$). Hence we can apply the large sieve (in the optimal $N + \delta^{-1} - 1$ form due to Selberg [Sel91] and Montgomery-Vaughan [MV74]) and obtain the bound in (5.39) with 1 instead of $\min(1, \ldots)$ immediately.
We can also apply Montgomery’s inequality ([Mon68], [Hux72]; see the exposi-
tions in [Mon71, pp. 27–29] and [IK04, §7.4]). This gives us that the left side of (5.39)
is at most
\[
\left(\sum_{\substack{r \le R\\ (r,q)=1}} \frac{(\mu(r))^2}{\phi(r)}\right)^{-1} \sum_{\substack{r \le R\\ (r,q)=1}} \sum_{\substack{a' \bmod r\\ (a',r)=1}} \sum_{i=1}^{k} \left|\sum_{W' < p \le W} (\log p)\, e((\alpha_i + a'/r)p)\right|^2 \tag{5.40}
\]
If we add all possible fractions of the form $a'/r$, $r \le R$, $(r, q) = 1$, to the fractions $a_i/q$, we obtain fractions that are separated by at least $1/qR^2$. If $\nu + \upsilon \le 1/qR^2$, then the resulting angles $\alpha_i + a'/r$ are still separated by at least $\nu$. Thus we can apply the large sieve to (5.40); setting $R = 1/\sqrt{(\nu + \upsilon)q}$, we see that we gain a factor of
(5.41)
since $\sum_{d \le R} 1/d \ge \log(R)$ for all $R \ge 1$ (integer or not).
Let us first give a bound on sums of the type of S2 (U, V, W ) using prime sup-
port but not the error terms (or Lemma 5.2.1). This is something that can be done
very well using tools available in the literature. (Not all of these tools seem to be
known as widely as they should be.) Bounds (5.42) and (5.44) are completely standard
large-sieve bounds. To obtain the gain of a factor of log in (5.43), we use a lemma
of Montgomery’s, for whose modern proof (containing an improvement by Huxley)
we refer to the standard source [IK04, Lemma 7.15]. The purpose of Montgomery’s
lemma is precisely to gain a factor of log in applications of the large sieve to sequences
supported on the primes. To use the lemma efficiently, we apply Montgomery and
Vaughan’s large sieve with weights [MV73, (1.6)], rather than more common forms of
the large sieve. (The idea – used in [MV73] to prove an improved version of the Brun-
Titchmarsh inequality – is that Farey fractions (rationals with bounded denominator)
are not equidistributed; this fact can be exploited if a large sieve with weights is used.)
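\[
\sum_{A_0 < m \le A_1} \left|\sum_{W' < p \le W} (\log p)\, e(\alpha m p)\right|^2 \le \left\lceil \frac{A_1 - A_0}{\min(q, \lceil Q/2 \rceil)} \right\rceil \cdot (W - W' + 2q) \sum_{W' < p \le W} (\log p)^2. \tag{5.42}
\]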
Proof. Let $k = \min(q, \lceil Q/2 \rceil) \ge \lceil q/2 \rceil$. We split $(A_0, A_1]$ into $\lceil (A_1 - A_0)/k \rceil$ blocks of at most $k$ consecutive integers $m_0 + 1, m_0 + 2, \ldots$. For $m$, $m'$ in such a block, $\alpha m$ and $\alpha m'$ are separated by a distance of at least
2 (5.46)
a0
X X
≤ (log p)e α (m0 + a) + p .
r
a0 mod r W 0 <p≤W
(a0 ,r)=1
is at most
−1 !−1
X W 3 1 1
+ −
2 2 qrR Q
r≤R
(r,q)=1
r sq-free (5.48)
2
q
a0
X X X
(log p)e α (m0 + a) + p
a=1 a0 mod r W 0 <p≤W
r
(a0 ,r)=1
We now apply the large sieve with weights [MV73, (1.6)], recalling that each angle $\alpha(m_0 + a) + a'/r$ is separated from the others by at least $1/qrR - 1/Q$; we obtain that (5.48) is at most $\sum_{W' < p \le W} (\log p)^2$. It remains to estimate the sum in the first line of (5.47). (We are following here a procedure analogous to that used in [MV73] to prove the Brun-Titchmarsh theorem.)
Assume first that $q \le W/13.5$. Set
\[
R = \left(\sigma \frac{W}{q}\right)^{1/2}, \tag{5.49}
\]
where $\sigma = 1/2e^{2 \cdot 0.25068} = 0.30285\ldots$. It is clear that $qR^2 < Q$, $q < W'$ and $R \ge 2$. Moreover, for $r \le R$,
\[
\frac{1}{Q} \le \frac{1}{3.5W} \le \frac{\sigma}{3.5} \frac{1}{\sigma W} = \frac{\sigma}{3.5} \frac{1}{qR^2} \le \frac{\sigma/3.5}{qrR}.
\]
−1
W 3 1 1 W 3 qrR W 3r W
+ − ≤ + = + σ
· 2σ
2 2 qrR Q 2 2 1 − σ/3.5 2 2 1 − 3.5 R 2
W 3σ rW W rW
= 1+ < 1+
2 1 − σ/3.5 R 2 R
and so
−1 !−1
µ(r)2
X W 3 1 1
+ −
2 2 qrR Q φ(r)
r≤R
(r,q)=1
For $R \ge 2$,
\[
\sum_{r \le R} \frac{\mu(r)^2}{\phi(r)} (1 + rR^{-1})^{-1} > \log R + 0.25068;
\]
this is true for $R \ge 100$ by [MV73, Lemma 8] and easily verifiable numerically for $2 \le R < 100$. (It suffices to verify this for $R$ integer with $r < R$ instead of $r \le R$, as that is the worst case.)
Now
\[
\log R = \frac12 \log \frac{W}{2q} + \frac12 \log 2\sigma = \frac12 \log \frac{W}{2q} - 0.25068.
\]
Hence
\[
\sum_{r \le R} \frac{\mu(r)^2}{\phi(r)} (1 + rR^{-1})^{-1} > \frac12 \log \frac{W}{2q}.
\]
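The numerical verification for $2 \le R < 100$ can be sketched as follows. The code takes the worst case described in the text (integer $R = n$ with the sum restricted to $r < n$) for $n = 3, \ldots, 100$, and treats $R = 2$ separately with $r \le 2$; this split, and all function names, are choices made for this illustration.

```python
import math

def phi_if_squarefree(r):
    """Return Euler's phi(r) if r is squarefree, else None (i.e. mu(r)^2 = 0)."""
    phi, p, n = 1, 2, r
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return None  # p^2 divides r
            phi *= p - 1
        p += 1
    if n > 1:
        phi *= n - 1
    return phi

def weighted_sum(R, r_max):
    """Sum over r <= r_max of mu(r)^2 / phi(r) * (1 + r/R)^(-1)."""
    total = 0.0
    for r in range(1, r_max + 1):
        ph = phi_if_squarefree(r)
        if ph is not None:
            total += 1 / (ph * (1 + r / R))
    return total

# Worst case near each integer n: R -> n from below, so r runs over r < n.
margins = {n: weighted_sum(n, n - 1) - (math.log(n) + 0.25068) for n in range(3, 101)}
margin_at_2 = weighted_sum(2, 2) - (math.log(2) + 0.25068)

sigma = 1 / (2 * math.exp(2 * 0.25068))
print(min(margins.values()), margins[5], margin_at_2, sigma)
```

The margin at $n = 5$ is tiny, which suggests that the constant 0.25068 is essentially dictated by that case. The value of `sigma` also confirms $\sigma \cdot 13.5 \ge 4$, so that $R \ge 2$ whenever $q \le W/13.5$.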
We need a version of Lemma 5.2.2 with m restricted to the odd numbers, since we
plan to set the parameter v equal to 2.
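If $A_1 - A_0 \le 2\varrho q$ and $q \le \rho Q$, $\varrho, \rho \in [0, 1]$, the following bound also holds:
\[
\sum_{A_0 < m \le A_1} \left|\sum_{W' < p \le W} (\log p)\, e(\alpha m p)\right|^2 \le (W - W' + q/(1 - \varrho\rho)) \sum_{W' < p \le W} (\log p)^2. \tag{5.52}
\]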
Proof. We follow the proof of Lemma 5.2.2, noting the differences. Let
just as before. We split $(A_0, A_1]$ into $\lceil (A_1 - A_0)/k \rceil$ blocks of at most $2k$ consecutive integers; any such block contains at most $k$ odd numbers. For odd $m$, $m'$ in such a block, $\alpha m$ and $\alpha m'$ are separated by a distance of
\[
|\{\alpha(m - m')\}| = \left|\left\{2\alpha \frac{m - m'}{2}\right\}\right| = |\{(a/q)k\}| - O^*(k/qQ) \ge 1/2q.
\]
We obtain (5.50) and (5.52) just as we obtained (5.42) and (5.44) before. To obtain
(5.51), proceed again as before, noting that the angles we are working with can be
labelled as α(m0 + 2a), 0 ≤ a < q.
The idea now (for large δ) is that, if δ is not negligible, then, as m increases and
αm loops around the circle R/Z, αm roughly repeats itself every q steps – but with a
slight displacement. This displacement gives rise to a configuration to which Lemma
5.2.1 is applicable. The effect is that we can apply the large sieve once instead of many
times, thus leading to a gain of a large factor (essentially, the number of times the large
sieve would have been used). This is how we obtain the factor of |δ| in the denominator
of the main term x/|δ|q in (5.56) and (5.57).
x/W − U 0
q W X
S2 (U 0 , W 0 , W ) ≤ +1 (log p)2 .
2q φ(q) log(W/2q) 0
W <p≤W
Now
x/W − U 0
+ 1 · (W − W 0 + 2q)
q · min(2, ρ−1 )
x W − W0 x
≤ − U0 −1
+ max(1, 2ρ) − U 0 + W/2 + 2q
W q min(2, ρ ) W
x/4 x
≤ + max(1, 2ρ) + W/2 + 2q.
q min(2, ρ−1 ) 2W
This implies (5.53).
If W > x/4q, apply (5.44) with % = x/4W q, ρ = 1. This yields (5.55).
Assume now that $\delta \ne 0$ and $x/4W + q \le x/|\delta q|$. Let $Q' = x/|\delta q|$. For any $m_1$, $m_2$ with $x/2W < m_1, m_2 \le x/W$, we have $|m_1 - m_2| \le x/2W \le 2(Q' - q)$, and so
\[
\left|\frac{m_1 - m_2}{2} \cdot \delta/x + q\delta/x\right| \le Q'|\delta|/x = \frac{1}{q}. \tag{5.58}
\]
The conditions of Lemma 5.2.1 are thus fulfilled with $\upsilon = (x/4W) \cdot |\delta|/x$ and $\nu = |\delta q|/x$. We obtain that $S_2(U', W', W)$ is at most
\[
\min\left(1, \frac{2q}{\phi(q)} \frac{1}{\log((q(\nu + \upsilon))^{-1})}\right) \left(W - W' + \nu^{-1}\right) \sum_{W' < p \le W} (\log p)^2.
\]
Here $W - W' + \nu^{-1} = W - W' + x/|q\delta| \le W/2 + x/|q\delta|$ and
\[
(q(\nu + \upsilon))^{-1} = \left(q \frac{|\delta|}{x} \left(q + \frac{x}{4W}\right)\right)^{-1}.
\]
Lastly, assume $\delta \ne 0$ and $q \le \rho Q$. We let $Q' = x/|\delta q| \ge Q$ again, and we split the range $U' < m \le x/W$ into intervals of length $2(Q' - q)$, so that (5.58) still holds within each interval. We apply Lemma 5.2.1 with $\upsilon = (Q' - q) \cdot |\delta|/x$ and $\nu = |\delta q|/x$. We obtain that $S_2(U', W', W)$ is at most
\[
\left(1 + \frac{x/W - U'}{2(Q' - q)}\right) \left(W - W' + \nu^{-1}\right) \sum_{W' < p \le W} (\log p)^2.
\]
Minor-arc totals
It is now time to make all of our estimates fully explicit, choose our parameters, put
our type I and type II estimates together and give our final minor-arc estimates.
Let $x > 0$ be given. Starting in section 6.3.1, we will assume that $x \ge x_0 = 2.16 \cdot 10^{20}$. We will choose our main parameters $U$ and $V$ gradually, as the need arises; we assume from the start that $2 \cdot 10^6 \le V < x/4$ and $UV \le x$.
We are also given an angle $\alpha \in \mathbb{R}/\mathbb{Z}$. We choose an approximation $2\alpha = a/q + \delta/x$, $(a, q) = 1$, $q \le Q$, $|\delta/x| \le 1/qQ$. The parameter $Q$ will be chosen later; we assume from the start that $Q \ge \max(16, 2\sqrt{x})$ and $Q \ge \max(2U, x/U)$.
(In fact, $U$ and $V$ will be chosen in different ways depending on the size of $q$. Even $Q$ will depend on the size of $q$; this may seem circular, but what actually happens is the following: we will first set a value for $Q$ depending only on $x$, and if the corresponding value of $q \le Q$ is larger than a certain parameter $y$ depending on $x$, then we reset $U$, $V$ and $Q$, and obtain a new value of $q$.)
Let SI,1 , SI,2 , SII , S0 be as in (3.9), with the smoothing function η = η2 as in
(3.4). (We bounded the type I sums SI,1 , SI,2 for a general smoothing function η; it is
only here that we are specifying η.)
The term S0 is 0 because V < x/4 and η2 is supported on [−1/4, 1]. We set v = 2.
as per [Tao14, (5.9)–(5.13)]. Similarly, for η2,ρ (t) = log(ρt)η2 (t), where ρ ≥ 4,
In the first inequality, we are using the fact that log(ρt) is always positive (and less than
log(ρ)) when t is in the support of η2 .
Write log+ x for max(log x, 0).
c0 /δ 2
x X µ(m) x x [ X µ(m)
min 1, log + |log ·η(−δ)|
q (2π)2 m mq q m
m≤ U
q m≤ U
q
(m,q)=1 (m,q)=1
√
ex √
2 c0 c1 q q q + 2U 3c1 x q U
+ U log + 3q log + log log + log log+ c2 x
π U c2 2 c2 q 2q c2 q
r 2 1/2
3c1 2x 2x c0 2c0 U e x 1
+ log + − 2 log +
2 c2 c2 2 π 4qx U e
0
3 2
3/2 √ √
2|η |1 c0 e q 20c0 c2 2 ex
+ q max 1, log log x + 2x log ,
π 4π|η 0 |1 x 3π 2 c2
(6.3)
where $c_0 = 31.521$ (by Lemma B.2.3), $c_1 = 1.0000028 > 1 + (8 \log 2)/V \ge 1 + (8 \log 2)/(x/U)$ and $c_2 = 6\pi/5\sqrt{c_0} = 0.67147\ldots$. By (2.1) (with $k = 2$), (B.17) and Lemma B.2.4,
\[
|\widehat{\log \cdot \eta}(-\delta)| \le \min\left(2 - \log 4, \frac{24 \log 2}{\pi^2 \delta^2}\right).
\]
By (2.20), (2.22) and (2.23), the first line of (6.3) is at most
! !
c00
x 4 q/φ(q) x q
min 1, 2 min , 1 log + 1.00303
q δ 5 log+ qU2 U φ(q)
!
c00
x 4 q/φ(q)
+ min 2 − log 4, 02 min ,1 ,
q δ 5 log+ qU2
where $c_0' = 0.798437 > c_0/(2\pi)^2$, $c_0'' = 1.685532$. Clearly $c_0''/c_0' > 1 > 2 - \log 4$.
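The numerical constants $c_2$, $c_0'$ and $c_0''$ can be recomputed directly. The sketch below assumes that $c_0''$ is the rounded value of $24 \log 2/\pi^2$ appearing in the bound on $|\widehat{\log \cdot \eta}(-\delta)|$; the variable names are illustrative.

```python
import math

c0 = 31.521
c1_lower = 1 + 8 * math.log(2) / 2e6      # 1 + (8 log 2)/V at the smallest allowed V = 2 * 10^6
c2 = 6 * math.pi / (5 * math.sqrt(c0))    # stated as 0.67147...
c0_prime = c0 / (2 * math.pi) ** 2        # stated bound: 0.798437
c0_dprime = 24 * math.log(2) / math.pi ** 2  # stated: 1.685532

print(c1_lower, c2, c0_prime, c0_dprime, c0_dprime / c0_prime, 2 - math.log(4))
```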
Taking derivatives, we see that $t \mapsto (t/2) \log(t/c_2) \log^+ 2U/t$ takes its maximum (for $t \in [1, 2U]$) when $\log(t/c_2) \log^+ 2U/t = \log t/c_2 - \log^+ 2U/t$; since $t \mapsto \log t/c_2 - \log^+ 2U/t$ is increasing on $[1, 2U]$, we conclude that
\[
\frac{q}{2} \log \frac{q}{c_2} \log^+ \frac{2U}{q} \le U \log \frac{2U}{c_2}.
\]
6.2. CONTRIBUTIONS OF DIFFERENT TYPES 103
Similarly, t 7→ t log(x/t) log+ (U/t) takes its maximum at a point t ∈ [0, U for which
log(x/t) log+ (U/t) = log(x/t) + log+ (U/t), and so
x q U U
log log+ c2 x ≤ (log x + log U ).
q c2 q c2
We conclude that
! !
c00
x 4q/φ(q) x q
|SI,1 | ≤ min 1, 2 min ,1 log + c3,I + c4,I
q δ 5 log+ qU2 U φ(q)
2 2
e1/2 x
q c11,I q U
+ c7,I log + c8,I log x max 1, log q + c10,I log
c2 x 4qx U
√
√
2U 2 ex c10,I
+ c5,I log + c6,I log xU U + c9,I x log + ,
c2 c2 e
(6.4)
where c2 and c00 are as above, c3,I = 2.11104 > c000 /c00 , c4,I = 1.00303, √ 5,I =
c
√
3.57422 > 2 c0 c1 /π, c6,I = 2.23389 > 3c1 /2c2 , c7,I = 6.19072 > 2 3c0 c1 /π,
c8,I = 3.53017 > 2(8 log 2)/π,
√ √ 3/2
3 2c1 20 2c0 c2
c9,I = 19.1568 > √ + ,
2 c2 3π 2
c10,I = 9.37301 > c0 (1/2 − 2/π 2 ) and c11,I = 9.0857 > c0 e3 /(4π · 8 log 2).
We can thus estimate $S_{I,2}$ by applying Lemma 4.2.2 to each inner double sum in (6.5). We obtain that, if $|\delta| \le 1/2c_2$, where $c_2 = 6\pi/5\sqrt{c_0}$ and $c_0 = 31.521$, then $|S_{I,2}|$ is at most
x/v 2
X c0 X µ(m) c10,I q U
Λ(v) min 1, + +1 (6.6)
2qv (πδ)2 m 4x/v qv
v≤V m≤Mv /q
(m,2q)=1
plus
√ √ !
X 2 c0 c+ 3c+ x + U c0 c+ + U
Λ(v) U+ log c2 x + qv log
π 2 vqv vqv π qv /2
v≤V
√
c11,I qv2
X 2 3c0 c+ 3c+ 55c0 c2
+ Λ(v) c8,I max log , 1 qv + + + qv ,
x/v π 2c2 6π 2
v≤V
(6.7)
where qv = q/(q, v), Mv ∈ [min(Q/2v, U ), U ] and c+ = 1 + (8 log 2)/(x/U V ); if
|δ| ≥ 1/2c2 , then |SI,2 | is at most (6.6) plus
√
X c0 c1 3c 1 (1 + ) + 2U x 35c 0 c2
Λ(v) U+ 2 + log x/v + qv
π/2 2 Q 3π 2
v≤V |δ|qv
√ log+ x/v 2U
√
X c0 c1 x/v b |δ|qv c +1
+ Λ(v) (1 + ) min + 1, 2U 3 + 2 +
π/2 |δ|qv 2
v≤V
(6.8)
Write $S_V = \sum_{v \le V} \Lambda(v)/(v q_v)$. By (2.12),
log V 1X X 1 X 1 (6.9)
≤ + (log p) vp (q) + −
q q pα pα
p|q α≥1 α≥1
pα+vp (q) ≤V pα ≤V
log V 1X log V q
≤ + (log p)vp (q) = .
q q q
p|q
This helps us to estimate (6.6). We could also use this to estimate the second term in
the first line of (6.7), but, for that purpose, it will actually be wiser to use the simpler
bound
X x U X U/c2 1.0004
Λ(v) log+ c2 x ≤ Λ(v) ≤ UV (6.10)
vqv vqv e ec2
v≤V v≤V
(by (2.14) and the fact that t log+ A/t takes its maximum at t = A/e).
We bound the sum over $m$ in (6.6) by (2.20) and (2.22):
\[
\left|\sum_{\substack{m \le M_v/q\\ (m,2q)=1}} \frac{\mu(m)}{m}\right| \le \min\left(\frac45 \frac{q/\phi(q)}{\log^+ \frac{M_v}{2q^2}}, 1\right).
\]
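\[
\sum_{\substack{v \le V\\ (v,q) \ne 1}} \Lambda(v)(v, q) \le \sum_{p | q} (\log p) \sum_{1 \le \alpha \le \log_p V} p^{v_p(q)} \le \sum_{p | q} (\log p) \frac{\log V}{\log p} p^{v_p(q)} \le (\log V) \sum_{p | q} p^{v_p(q)} \le q \log V
\]
and
\[
\sum_{\substack{v \le V\\ (v,q) \ne 1}} \Lambda(v)(v, q)^2 \le \sum_{p | q} (\log p) \sum_{1 \le \alpha \le \log_p V} p^{v_p(q) + \alpha} \le \sum_{p | q} (\log p) \cdot 2p^{v_p(q)} \cdot p^{\log_p V} \le 2qV \log q.
\]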
The expressions in (6.8) get estimated similarly. The first line of (6.8) is at most
√
2 c0 c+ 3c+ 1+ 2U V |δ|q xV 35c0 c2
1.0004 UV + 2+ log+ + qV
π 2 x Q 3π 2
by (2.14). Since q ≤ Q/V , we can obviously bound qV by Q. As for the second line
of (6.8) –
X x/v 1 2U
Λ(v) min + 1, 2U · log+ j k
|δ|qv 2 x/v
+1
v≤V |δ|qv
X U X U 1.0004
≤ Λ(v) max t log+ ≤ Λ(v) = U V,
t>0 t e e
v≤V v≤V
but
X x/v X
Λ(v) min + 1, 2U ≤ Λ(v) · 2U
|δ|qv
v≤V v≤ 2Ux|δ|q
X x/|δ| X X x/|δ| 1 1
+ Λ(v) + Λ(v) + Λ(v) −
x vq v qv q
2U |δ|q
<v≤V v≤V v≤V
(v,q)=1 (v,q)6=1
x x x 3
≤ 1.03883 + max log V − log + log √ , 0
|δ|q |δ|q 2U |δ|q 2
x 1X
+ 1.0004V + (log p)vp (q)
|δ| q
p|q
x + 6U V |δ|q
≤ 1.03883 + log q + log √ + 1.0004V
|δ|q 2x
!
x c0 4/5
min 1, min , 1 log V q
2φ(q) (πδ)2 log+ 4VQq2
2 (6.12)
U 2V
x UV q 2 c10,I U V
+ c8,I2 1+ + q log V + log V
q x U 2 x x
plus
c11,I q 2 V
U
(c4,I2 +c9,I2 )U V +(c10,I2 log +c5,I2 max log , 2 +c12,I2 )·Q, (6.13)
q x
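where
\[
\begin{aligned}
c_{4,I2} &= 3.57565(1 + \epsilon_0) > 1.0004 \cdot 2\sqrt{c_0 c_+}/\pi,\\
c_{5,I2} &= 3.53312 > 1.0004 \cdot c_{8,I},\\
c_{8,I2} &= 1.17257 > \frac{c_{10,I}}{4} \cdot 0.5004,\\
c_{9,I2} &= 0.82214(1 + 2\epsilon_0) > 3c_+ \cdot 1.0004/2ec_2,\\
c_{10,I2} &= 1.78783\sqrt{1 + 2\epsilon_0} > 1.0004\sqrt{c_0 c_+}/\pi,\\
c_{12,I2} &= 29.3333 + 11.902\epsilon_0\\
&> 1.0004\left(\frac{3}{2c_2} c_+ + \frac{2\sqrt{3c_0}}{\pi}\sqrt{c_+} + \frac{55 c_0 c_2}{6\pi^2}\right) + 1.78783(1 + \epsilon_0) \log 2\\
&= c_{6,I2} + c_{10,I2} \log 2
\end{aligned}
\]
and $c_{10,I} = 9.37301$ as before. Here $\epsilon_0 = (4 \log 2)/(x/UV)$, and $c_{6,I2}$ is as in (6.11).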
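The leading coefficients of these constants can be checked at $\epsilon_0 = 0$ (that is, $c_+ = 1$). The sketch below uses the stated values $c_{8,I} = 3.53017$ and $c_{10,I} = 9.37301$ as inputs; the dictionary layout is an illustrative choice.

```python
import math

c0 = 31.521
c2 = 6 * math.pi / (5 * math.sqrt(c0))
c8_I, c10_I = 3.53017, 9.37301  # values quoted in the text

# name -> (stated leading coefficient, right-hand side evaluated at eps0 = 0)
CHECKS = {
    "c4,I2": (3.57565, 1.0004 * 2 * math.sqrt(c0) / math.pi),
    "c5,I2": (3.53312, 1.0004 * c8_I),
    "c8,I2": (1.17257, c10_I / 4 * 0.5004),
    "c9,I2": (0.82214, 3 * 1.0004 / (2 * math.e * c2)),
    "c10,I2": (1.78783, 1.0004 * math.sqrt(c0) / math.pi),
    "c12,I2": (29.3333,
               1.0004 * (3 / (2 * c2) + 2 * math.sqrt(3 * c0) / math.pi
                         + 55 * c0 * c2 / (6 * math.pi ** 2))
               + 1.78783 * math.log(2)),
}

for name, (stated, computed) in CHECKS.items():
    print(name, stated, computed, stated > computed)
```

In each case the stated constant exceeds the computed right-hand side by a small rounding margin, as the inequalities in the text require.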
If |δ| ≥ 1/2c2 , then |SI,2 | is at most (6.12) plus
+ 6U V |δ|q x
(c4,I2 + (1 + )c13,I2 )U V + c c14,I2 log q + log √ + c15,I2
2x |δ|q
1+ 2U V |δ|q x
+ c16,I2 2 + log+ + c17,I2 Q + c · c4,I2 V,
x Q/V
(6.14)
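where
\[
\begin{aligned}
c_{13,I2} &= 1.31541(1 + \epsilon_0) > \frac{2\sqrt{c_0 c_+}}{\pi} \cdot \frac{1.0004}{e},\\
c_{14,I2} &= 3.57422\sqrt{1 + 2\epsilon_0} > \frac{2\sqrt{c_0 c_+}}{\pi},\\
c_{15,I2} &= 3.71301\sqrt{1 + 2\epsilon_0} > \frac{2\sqrt{c_0 c_+}}{\pi} \cdot 1.03883,\\
c_{16,I2} &= 1.5006(1 + 2\epsilon_0) > 1.0004 \cdot 3c_+/2,\\
c_{17,I2} &= 25.0295 > 1.0004 \cdot \frac{35 c_0 c_2}{3\pi^2},
\end{aligned}
\]
and $c_\epsilon = (1 + \epsilon)\sqrt{3 + 2\epsilon}$. We recall that $c_2 = 6\pi/5\sqrt{c_0} = 0.67147\ldots$. We will choose $\epsilon \in (0, 1)$ later; we also leave the task of bounding $\epsilon_0$ for later.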
The case q > Q/V . We use Lemma 4.2.4 in this case.
where S1 , S2 and S3 are as in (5.4) and (5.5). We bounded S1 in (5.33) and (5.34), S2
in Prop. 5.2.4 and S3 in (5.5).
Let us try to give some structure to the bookkeeping we must now inevitably do.
The second integral in (6.15) will be negligible (because S3 is); let us focus on the first
integral.
Thanks to our work in §5.1, the term $S_1(U, W)$ is bounded by a (small) constant times $x/W$. (This represents a gain of several factors of log with respect to the trivial bound.) We bounded $S_2(U, V, W)$ using the large sieve; we expected, and got, a bound that is better than trivial by a factor of size roughly $\sqrt{q} \log x$ – the exact factor in the bound depends on the value of $W$. In particular, it is only in the central part of the range for $W$ that we will really be able to save a factor of $\sqrt{q} \log x$, as opposed to just $\sqrt{q}$. We will have to be slightly clever in order to get a good total bound in the end.
***
We first recall our estimate for $S_1$. In the whole range $[V, x/U]$ for $W$, we know from (5.33), (5.34) and (5.37) that $S_1(U, W)$ is at most
\[
\frac{2}{\pi^2} \frac{x}{W} + \kappa_0 \zeta(3/2)^3 \frac{x}{W} \sqrt{\frac{x/WU}{U}}, \tag{6.16}
\]
where
\[
\kappa_0 = 1.27.
\]
(We recall we are working with $v = 2$.)
We have better estimates for the constant in front in some parts of the range; in what is usually the main part, (5.34) and (5.36) give us a constant of 0.15107 instead of $2/\pi^2$. Note that $1.27\zeta(3/2)^3 = 22.6417\ldots$. We should choose $U$, $V$ so that the first term in (6.16) dominates. For the time being, assume only
\[
U \ge 5 \cdot 10^5 \cdot \frac{x}{VU}; \tag{6.17}
\]
then (6.16) gives
\[
S_1(U, W) \le \kappa_1 \frac{x}{W}, \tag{6.18}
\]
where
\[
\kappa_1 = \frac{2}{\pi^2} + \frac{22.6418}{\sqrt{10^6/2}} \le 0.2347.
\]
This will suffice for our cruder estimates.
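The passage from (6.16) to (6.18) is a one-line computation: under (6.17), for $W \ge V$ we have $(x/WU)/U \le 1/(5 \cdot 10^5)$, so the square root in (6.16) is at most $1/\sqrt{10^6/2}$. A minimal numerical check follows; the `zeta` helper is an illustrative implementation and not part of the text.

```python
import math

def zeta(s, n_terms=1000):
    """Riemann zeta for real s > 1: partial sum plus Euler-Maclaurin tail."""
    n = n_terms
    head = sum(k ** -s for k in range(1, n))
    return head + n ** (1 - s) / (s - 1) + 0.5 * n ** -s + s * n ** (-s - 1) / 12

kappa0 = 1.27
err_const = kappa0 * zeta(1.5) ** 3                 # stated as 22.6417...
kappa1 = 2 / math.pi ** 2 + err_const / math.sqrt(1e6 / 2)

print(err_const, kappa1)
```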
The second integral in (6.15) is now easy to bound. By (5.5),
where
\[
\kappa_9 = 8 \cdot \sqrt{1.0172 \cdot \kappa_1} \le 3.9086.
\]
Let us now examine S2 , which was bounded in Prop. 5.2.4. We set the parameters
W 0 , U 0 as follows, in accordance with (5.4):
by (2.19).
Bounding S2 for δ arbitrary. We set
W0 = min(max(2θq, V ), x/U ),
where ρ = q/Q.
If W0 > V , the contribution of the terms with V ≤ W < W0 to (6.15) is (by 6.18)
bounded by
Z W0 s
W 2 log W
x ρ0 W dW
4 κ1 + 1 x log W + + qW log W
V W 4 4q 4 W
Z W0 √ Z W0 √
κ2 √ log W κ2 √ log W
≤ ρ0 x 3/2
dW + x 1/2
dW
2 V W 2 V W
s Z W0 √
ρ0 x2 log W (6.20)
+ κ2 + qx dW
16q V W
√ x p p
≤ κ2 ρ0 √ + κ2 xW0 log W0
V
s
2κ2 ρ0 x2
+ + qx (log W0 )3/2 − (log V )3/2 ,
3 16q
We now examine the terms with W ≥ W0 . If 2θq > x/U , then W0 = x/U , the
contribution of the case is nil, and the computations below can be ignored. Thus, we
can assume that 2θq ≤ x/U .
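We use (5.54):
\[
S_2(U', W', W) \le \left(\frac{x}{4\phi(q)} \frac{1}{\log(W/2q)} + \frac{q}{\phi(q)} \frac{W}{\log(W/2q)}\right) \cdot \frac12 W \log W.
\]
By $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$, we can take out the $q/\phi(q) \cdot W/\log(W/2q)$ term and estimate its contribution on its own; it is at most
\[
\begin{aligned}
4 \int_{W_0}^{x/U} &\sqrt{\kappa_1 \frac{x}{W} \cdot \frac{q}{\phi(q)} \cdot \frac12 W^2 \frac{\log W}{\log W/2q}}\ \frac{dW}{W}\\
&= \frac{\kappa_2}{\sqrt 2} \sqrt{\frac{q}{\phi(q)}} \int_{W_0}^{x/U} \sqrt{\frac{x \log W}{W \log W/2q}}\, dW\\
&\le \frac{\kappa_2}{\sqrt 2} \sqrt{\frac{qx}{\phi(q)}} \int_{W_0}^{x/U} \frac{1}{\sqrt W} \left(1 + \sqrt{\frac{\log 2q}{\log W/2q}}\right) dW.
\end{aligned} \tag{6.21}
\]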
Now
\[
\int_{W_0}^{x/U} \frac{1}{\sqrt W} \sqrt{\frac{\log 2q}{\log W/2q}}\, dW = \sqrt{2q \log 2q} \int_{\max(\theta, V/2q)}^{x/2Uq} \frac{1}{\sqrt{t \log t}}\, dt.
\]
if and only if $T > T_0$, where $T_0 = e^{(1 - 2/2.3)^{-1}} = 2135.94\ldots$, it is enough to check (numerically) that (6.22) holds for $T = T_0$.) Since $\theta \ge e$, this gives us that
\[
\int_{W_0}^{x/U} \frac{1}{\sqrt W} \left(1 + \sqrt{\frac{\log 2q}{\log W/2q}}\right) dW \le 2\sqrt{\frac{x}{U}} + 2.3 \sqrt{2q \log 2q} \cdot \sqrt{\frac{x/2Uq}{\log x/2Uq}},
\]
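The numerical check at $T = T_0$ can be sketched as follows. The display (6.22) itself is not reproduced in this part of the text; the code assumes, judging from the bound just derived, that it reads $\int_e^T dt/\sqrt{t \log t} \le 2.3\sqrt{T/\log T}$. The Simpson-rule helper and the substitution $t = e^s$ are implementation choices made for this illustration.

```python
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

T0 = math.exp(1 / (1 - 2 / 2.3))

# Substituting t = e^s turns dt / sqrt(t log t) into e^(s/2) / sqrt(s) ds.
integral = simpson(lambda s: math.exp(s / 2) / math.sqrt(s), 1.0, math.log(T0))
bound = 2.3 * math.sqrt(T0 / math.log(T0))

print(T0, integral, bound)
```

The derivative of $2.3\sqrt{T/\log T}$ exceeds the integrand exactly when $T > T_0$, so the difference between the two sides is smallest at $T_0$; that is why a single evaluation suffices.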
We are left with what will usually be the main term, viz.,
\[
4 \int_{W_0}^{x/U} \sqrt{S_1(U, W) \cdot \frac{x}{8\phi(q)} \frac{\log W}{\log W/2q}\, W}\ \frac{dW}{W}, \tag{6.23}
\]
p
which, by (5.34), is at most x/ φ(q) times the integral of
v r !
u
1 u
t 2H2
x κ
4 x/W U log W
+
W WU 2 U log W/2q
by (5.36), where
κ6 = 0.60428.
Thus the main term is simply
β κ6 x x
+ p log . (6.26)
2 4β φ(q) U W0
β −1
√ · 10−3 · κ4 ≤ β −1 κ7 /2,
2
where √
2κ4
κ7 = ≤ 0.1281.
1000
Thus the contribution of the second summand is at most
β −1 κ7 x
·p .
2 φ(q)
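If $2\theta q > x/U$, the integral is over an empty range and its contribution is hence 0. If $2\theta q \le V$, (6.27) is
\[
\begin{aligned}
\frac{\beta \log 2q}{2} \int_V^{x/U} \frac{1}{\log W/2q} \frac{dW}{W} &= \frac{\beta \log 2q}{2} \int_{V/2q}^{x/2Uq} \frac{1}{\log t} \frac{dt}{t}\\
&= \frac{\beta \log 2q}{2} \cdot \left(\log\log \frac{x}{2Uq} - \log\log V/2q\right)\\
&= \frac{\beta \log 2q}{2} \cdot \log\left(1 + \frac{\log x/UV}{\log V/2q}\right).
\end{aligned} \tag{6.28}
\]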
(Let us stop for a moment and ask ourselves when this will be smaller than what
we can see as the main term, namely, the term (β/2) log x/U W0 in (6.25). Clearly,
log(1 + (log x/U V )/(log V /2q)) ≤ (log x/U V )/(log V /2q), and that is smaller than
(log x/U V )/ log 2q when V /2q > 2q. Of course, it does not actually matter if (6.28)
is smaller than the term from (6.25) or not, since we are looking for upper bounds here,
not for asymptotics.)
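The total bound for (6.23) is thus
\[
\frac{x}{\sqrt{\phi(q)}} \cdot \left(\beta \cdot \left(\frac12 \log \frac{x}{UW_0} + \frac{\Phi}{2}\right) + \beta^{-1} \left(\frac14 \kappa_6 \log \frac{x}{UW_0} + \frac{\kappa_7}{2}\right)\right), \tag{6.29}
\]
where
\[
\Phi = \begin{cases} \log 2q \left(\log\log \frac{x}{2Uq} - \log\log \theta\right) & \text{if } V/2\theta < q < x/(2\theta U),\\[4pt] \log 2q \log\left(1 + \frac{\log x/UV}{\log V/2q}\right) & \text{if } q \le V/2\theta. \end{cases} \tag{6.30}
\]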
where Φ is as in (6.30).
Bounding S2 for |δ| ≥ 8. Let us see how much a non-zero δ can help us. It makes
sense to apply (5.56) only when |δ| ≥ 8; otherwise (5.54) is almost certainly better.
Now, by definition, |δ|/x ≤ 1/qQ, and so |δ| ≥ 8 can happen only when q ≤ x/8Q.
With this in mind, let us apply (5.56), assuming |δ| > 8. Note first that
This is at least 2 min(2Q, W )/|δq|. Thus we are allowed to apply (5.56) when |δq| ≤
2 min(2Q, W ). Since Q ≥ x/U , we know that min(2Q, W ) = W for all W ≤ x/U ,
and so it is enough to assume that |δq| ≤ 2W . We will soon be making a stronger
assumption.
Recalling also (6.19), we see that (5.56) gives us
0 0
2q/φ(q) x W 1
S2 (U , W , W ) ≤ min
1, + · W (log W ).
4W 1
|δq| 2 2
log |δ|q · x/U
1+ 2Q
(6.32)
Similarly to before, we define $W_0 = \max(V, \theta|\delta q|)$, where $\theta \ge 3e^2/8$ will be set later. (Here $\theta \ge 3e^2/8$ is an assumption we do not yet need, but we will be using it soon to simplify matters slightly.) For $W \ge W_0$, we certainly have $|\delta q| \le 2W$. Hence the part of the first term of (6.15) coming from the range $W_0 \le W < x/U$ is
Z x/U p dW
4 S1 (U, W ) · S2 (U, V, W )
W0 W
Z x/U v
Wx W2
r
q u log W dW
≤4 uS1 (U, W ) · + .
u
φ(q) W0 t |δq| 2 W
log 4W|δ|q ·
1
x/U
1+ 2Q
(6.33)
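Note that $1 + (x/U)/2Q \le 3/2$. Proceeding as in (6.23)–(6.31), we obtain that this is at most
\[
\frac{2x}{\sqrt{|\delta|\phi(q)}} \sqrt{\log \frac{x}{UW_0} + \Phi}\ \sqrt{\kappa_6 \log \frac{x}{UW_0} + 2\kappa_7},
\]
where
\[
\Phi = \begin{cases} \log \frac{(1 + \epsilon_1)|\delta q|}{4} \log\left(1 + \frac{\log x/UV}{\log 4V/|\delta|(1 + \epsilon_1)q}\right) & \text{if } |\delta q| \le V/\theta,\\[4pt] \log \frac{3|\delta q|}{8} \left(\log\log \frac{8x}{3U|\delta q|} - \log\log \frac{8\theta}{3}\right) & \text{if } V/\theta < |\delta q| \le x/\theta U, \end{cases} \tag{6.34}
\]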
If W0 > V , we also have to consider the range V ≤ W < W0 . By Prop. 5.2.4 and
(6.19), the part of (6.15) coming from this is
Z θ|δq| s
W2
Wx Wx x dW
4 S1 (U, W ) · (log W ) + + + .
V 2|δq| 4 16(1 − ρ)Q 8(1 − ρ) W
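where we use the facts that $W_0 = \theta|\delta q|$ (by $W_0 > V$) and $\theta \ge 3e^2/8$, and where we recall that $\kappa_2 = 4\sqrt{\kappa_1}$.
The terms $Wx/2|\delta|q$ and $Wx/(16(1 - \rho)Q)$ contribute at most
\[
\begin{aligned}
4\sqrt{\kappa_1} \int_V^{\theta|\delta q|} &\sqrt{\frac{x}{W} \cdot (\log W) W \left(\frac{x}{2|\delta q|} + \frac{x}{16(1 - \rho)Q}\right)}\ \frac{dW}{W}\\
&\le \kappa_2 x \left(\frac{1}{\sqrt{2|\delta|q}} + \frac{1}{4\sqrt{(1 - \rho)Q}}\right) \int_V^{\theta|\delta q|} \sqrt{\log W}\, \frac{dW}{W}\\
&= \frac{2\kappa_2}{3} x \left(\frac{1}{\sqrt{2|\delta|q}} + \frac{1}{4\sqrt{(1 - \rho)Q}}\right) \left((\log \theta|\delta|q)^{3/2} - (\log V)^{3/2}\right).
\end{aligned}
\]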
s √ √
√ θ|δq|
2κ1 x ∞ log W
Z Z
log W dW
2κ1 x ≤√ dW
V W (1 − ρ) W 1−ρ V W 3/2
κ2 x p p
≤p ( log V + 1/ log V ),
2(1 − ρ)V
***
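It is time to collect all type II terms. Let us start with the case of general $\delta$. We will set $\theta \ge e$ later. If $q \le V/2\theta$, then $|S_{II}|$ is at most
\[
\begin{aligned}
&\frac{x}{\sqrt{2\phi(q)}} \cdot \sqrt{\log \frac{x}{UV} + \log 2q \log\left(1 + \frac{\log x/UV}{\log V/2q}\right)} \cdot \sqrt{\kappa_6 \log \frac{x}{UV} + 2\kappa_7}\\
&\quad + \sqrt 2 \kappa_2 \sqrt{\frac{q}{\phi(q)}} \left(1 + 1.15 \sqrt{\frac{\log 2q}{\log x/2Uq}}\right) \frac{x}{\sqrt U} + \kappa_9 \frac{x}{\sqrt V}.
\end{aligned} \tag{6.37}
\]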
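The first line of (6.37) appears to come from choosing the free parameter $\beta$ in (6.29) optimally: an expression $\beta A + \beta^{-1} B$ is minimized at $\beta = \sqrt{B/A}$, where it equals $2\sqrt{AB}$. The sketch below checks this numerically with the common factor $x/\sqrt{\phi(q)}$ dropped; the sample values of $L = \log(x/UV)$ and $\Phi$ are arbitrary, and the function names are invented for this illustration.

```python
import math

KAPPA6, KAPPA7 = 0.60428, 0.1281  # values quoted in the text

def bound_629(beta, L, Phi):
    """Bracket of (6.29) with W0 = V, i.e. L = log(x / UV)."""
    return beta * (L / 2 + Phi / 2) + (KAPPA6 * L / 4 + KAPPA7 / 2) / beta

def optimized(L, Phi):
    """First line of (6.37) without the factor x / sqrt(phi(q))."""
    return math.sqrt((L + Phi) * (KAPPA6 * L + 2 * KAPPA7) / 2)

L, Phi = 10.0, 3.0
A = (L + Phi) / 2
B = KAPPA6 * L / 4 + KAPPA7 / 2
beta_star = math.sqrt(B / A)

print(bound_629(beta_star, L, Phi), optimized(L, Phi))
```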
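Now let us examine the alternative bounds for $|\delta| \ge 8$. Here we assume $\theta \ge 3e^2/8$. If $|\delta q| \le V/\theta$, then $|S_{II}|$ is at most
\[
\begin{aligned}
&\frac{2x}{\sqrt{|\delta|\phi(q)}} \sqrt{\log \frac{x}{UV} + \log \frac{|\delta q|(1 + \epsilon_1)}{4} \log\left(1 + \frac{\log x/UV}{\log \frac{4V}{|\delta|(1 + \epsilon_1)q}}\right)} \cdot \sqrt{\kappa_6 \log \frac{x}{UV} + 2\kappa_7}\\
&\quad + \kappa_2 \sqrt{\frac{2q}{\phi(q)}} \cdot \sqrt{\frac{\log V}{\log 2V/|\delta q|}} \cdot \frac{x}{\sqrt U} + \kappa_9 \frac{x}{\sqrt V},
\end{aligned} \tag{6.40}
\]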
where ρ = q/Q. Note that |δ| ≤ x/Qq implies ρ ≤ x/Q2 , and so ρ will be very small
and Q − q will be very close to Q.
The case |δq| > x/θU will not arise in practice, essentially because of |δ|q ≤ x/Q.
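\[
C_0 UV, \tag{6.42}
\]
where $C_0$ equals
\[
\begin{cases} c_{4,I2} + c_{9,I2} = 4.39779 + 5.21993\epsilon_0 & \text{if } |\delta| \le 1/2c_2 \sim 0.74463,\\ c_{4,I2} + (1 + \epsilon) c_{13,I2} = (4.89106 + 1.31541\epsilon)(1 + \epsilon_0) & \text{if } |\delta| > 1/2c_2 \end{cases} \tag{6.43}
\]
(from (6.13) and (6.14), type I; we will specify $\epsilon$ and $\epsilon_0 = (4 \log 2)/(x/UV)$ later) and
\[
\frac{x}{\sqrt{\delta_0 \phi(q)}} \sqrt{\log \frac{x}{UV} + (\log \delta_0 (1 + \epsilon_1) q) \log\left(1 + \frac{\log \frac{x}{UV}}{\log \frac{V}{\delta_0 (1 + \epsilon_1) q}}\right)} \cdot \sqrt{\kappa_6 \log \frac{x}{UV} + 2\kappa_7} \tag{6.44}
\]
(from (6.37) and (6.40), type II; here $\delta_0 = \max(2, |\delta|/4)$, while $\epsilon_1 = x/2UQ$ for $|\delta| > 8$ and $\epsilon_1 = 0$ for $|\delta| < 8$).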
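We set $UV = \kappa x/\sqrt{q\delta_0}$; we must choose $\kappa > 0$.
Let us first optimize (or, rather, almost optimize) $\kappa$ in the case $|\delta| \le 4$, so that $\delta_0 = 2$ and $\epsilon_1 = 0$. For the purpose of choosing $\kappa$, we replace $\sqrt{\phi(q)}$ by $\sqrt{q}/C_1$, where $C_1 = 2.3536 \sim \sqrt{510510/\phi(510510)}$, and also replace $V$ by $q^2/c$, $c$ a constant. We use the approximation
\[
\begin{aligned}
\log\left(1 + \frac{\log \frac{x}{UV}}{\log \frac{V}{2q}}\right) &= \log\left(1 + \frac{\log(\sqrt{2q}/\kappa)}{\log(q/2c)}\right) = \log\left(\frac32 + \frac{\log 2\sqrt{c}/\kappa}{\log q/2c}\right)\\
&\sim \log \frac32 + \frac{2 \log 2\sqrt{c}/\kappa}{3 \log q/2c}.
\end{aligned}
\]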
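Both the value of $C_1$ and the quality of the linearization can be checked numerically. The sketch below assumes the reading $C_1 = \sqrt{510510/\phi(510510)}$ (with $510510 = 2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 \cdot 13 \cdot 17$), and uses the sample values $q = 5 \cdot 10^5$, $c = 2.5$, $\kappa = 1/2$ discussed in the text; the function names are illustrative.

```python
import math

def euler_phi(n):
    """Euler's totient by trial division."""
    result, p, m = n, 2, n
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

C1 = math.sqrt(510510 / euler_phi(510510))

def exact_log_term(q, c, kappa):
    # log(1 + log(x/UV) / log(V/2q)) with x/UV = sqrt(2q)/kappa and V = q^2/c.
    return math.log(1 + math.log(math.sqrt(2 * q) / kappa) / math.log(q / (2 * c)))

def linearized(q, c, kappa):
    # log(3/2) + (2/3) * log(2 sqrt(c)/kappa) / log(q/2c)
    y = math.log(2 * math.sqrt(c) / kappa) / math.log(q / (2 * c))
    return math.log(1.5) + 2 * y / 3

q, c, kappa = 5e5, 2.5, 0.5
print(C1, exact_log_term(q, c, kappa), linearized(q, c, kappa))
```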
provided that $|\delta| \le 4$. (What we obtain for $|\delta| > 4$ is essentially the same, only with $\delta_0 q = \delta q/4$ instead of $2q$ and $0.27797/((1 + 0.27\epsilon)(1 + \epsilon_0))$ in place of 0.30915.) For $q = 5 \cdot 10^5$, $c = 2.5$ and $|\delta| \le 4$ (typical values in the most delicate range), we get that $\kappa$ should be about $0.5582/(1 + 1.19\epsilon_0)$. Values of $q$, $c$ nearby give similar values for $\kappa$, whether $|\delta| \le 4$ or $|\delta| > 4$.
(Incidentally, at this point, we could already give a back-of-the-envelope estimate for the last line of (6.45), i.e., our main term. It suggests that choosing w = 1 instead of w = 2 would have given bounds worse by about 15 percent.)
We make the choices
\[
\kappa = 1/2, \quad \text{and so} \quad UV = \frac{x}{2\sqrt{q\delta_0}}
\]
for the sake of simplicity. (Unsurprisingly, (6.45) changes very slowly around its minimum.) Note, by the way, that this means that $\epsilon_0 = (2 \log 2)/\sqrt{q\delta_0}$.
Now we must decide how to choose U , V and Q, given our choice of U V . We will
actually make two sets of choices.
6.3. ADJUSTING PARAMETERS. CALCULATIONS. 119
First, we will use the $S_{I,2}$ estimates for $q \le Q/V$ to treat all $\alpha$ of the form $\alpha = a/q + O^*(1/qQ)$, $q \le y$. (Here $y$ is a parameter satisfying $y \le Q/V$.)
Then, the remaining $\alpha$ will get treated with the (coarser) $S_{I,2}$ estimate for $q > Q/V$, with $Q$ reset to a lower value (call it $Q'$). If $\alpha$ was not treated in the first go (so that it must be dealt with by the coarser estimate) then $\alpha = a'/q' + \delta'/x$, where either $q' > y$ or $\delta' q' > x/Q$. (Otherwise, $\alpha = a'/q' + O^*(1/q'Q)$ would be a valid estimate with $q' \le y$.) The value of $Q'$ is set to be smaller than $Q$ both because this is helpful (it diminishes error terms that would be large for large $q$) and because this is harmless (since we are no longer assuming that $q \le Q/V$).
as an alternative to (6.48) for $|\delta| \ge 8$. (In several of these expressions, we are applying some minor simplifications that our later choices will justify. Of course, even if these simplifications were not justified, we would not be getting incorrect results, only potentially suboptimal ones; we are trying to decide how to choose certain parameters.)
In addition, we have a relatively mild but important dependence on $V$ in the main term (6.44), even when we hold $UV$ constant (as we do, in so far as we have already chosen $UV$). We must also respect the condition $q \le Q/V$, the lower bound on $U$ given by (6.17), and the assumptions made at the beginning of the chapter (e.g. $Q \ge x/U$, $V \ge 2 \cdot 10^6$). Recall that $UV = x/2\sqrt{q\delta_0}$.
We set
\[
Q = \frac{x}{8y},
\]
since we will then have not just $q \le y$ but also $q|\delta| \le x/Q = 8y$, and so $q\delta_0 \le 2y$. We want $q \le Q/V$ to be true whenever $q \le y$; this means that
\[
q \le \frac{Q}{V} = \frac{QU}{UV} = \frac{QU}{x/2\sqrt{q\delta_0}} = \frac{U\sqrt{q\delta_0}}{4y}
\]
must be true when $q \le y$, and so it is enough to set $U = 4y^2/\sqrt{q\delta_0}$. The following choices make sense: we will work with the parameters
\[
\begin{aligned}
y &= \frac{x^{1/3}}{6}, \qquad Q = \frac{x}{8y} = \frac34 x^{2/3}, \qquad x/UV = 2\sqrt{q\delta_0} \le 2\sqrt{2y},\\
U &= \frac{4y^2}{\sqrt{q\delta_0}} = \frac{x^{2/3}}{9\sqrt{q\delta_0}}, \qquad V = \frac{x}{(x/UV) \cdot U} = \frac{x}{8y^2} = \frac{9x^{1/3}}{2},
\end{aligned} \tag{6.49}
\]
where, as before, $\delta_0 = \max(2, |\delta|/4)$. So, for instance, we obtain $\epsilon_1 \le x/2UQ = 6\sqrt{q\delta_0}/x^{1/3} \le 2\sqrt3/x^{1/6}$. Assuming
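\[
\begin{aligned}
\frac{V}{2\theta q} &= \frac{x/8y^2}{2\theta q} \ge \frac{x}{16\theta y^3} = \frac{x}{54y^3} = 4 > 1,\\
\frac{V}{\theta|\delta q|} &\ge \frac{x/8y^2}{8\theta y} \ge \frac{x}{64\theta y^3} = \frac{x}{216y^3} = 1.
\end{aligned} \tag{6.52}
\]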
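The parameter relations in (6.49) and (6.52) are easy to check mechanically. The sketch below takes sample values of $x$, $q$ and $\delta_0$ chosen arbitrarily, and assumes $\theta = 27/8$, the value for which $16\theta = 54$ and $64\theta = 216$ as in (6.52); the function name is invented for this illustration.

```python
import math

def parameters(x, q, delta0):
    """Parameter choices of (6.49): returns (y, Q, U, V)."""
    y = x ** (1 / 3) / 6
    Q = x / (8 * y)
    U = 4 * y ** 2 / math.sqrt(q * delta0)
    V = x / (8 * y ** 2)
    return y, Q, U, V

x, q, delta0 = 1e30, 1000.0, 2.0
theta = 27 / 8
y, Q, U, V = parameters(x, q, delta0)

print("Q    ", Q, 0.75 * x ** (2 / 3))
print("U    ", U, x ** (2 / 3) / (9 * math.sqrt(q * delta0)))
print("V    ", V, 4.5 * x ** (1 / 3))
print("x/UV ", x / (U * V), 2 * math.sqrt(q * delta0))
print("Q/V  ", Q / V, y)                      # q <= y then implies q <= Q/V
print("eps1 ", x / (2 * U * Q), 6 * math.sqrt(q * delta0) / x ** (1 / 3))
print("(6.52)", V / (2 * theta * y), V / (8 * theta * y))  # expected 4 and 1
```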
The first type I bound is
4 q
c0
x 5 φ(q) 1p
c q
4,I
|SI,1 | ≤ min 1, 02
min , 1 log 9x qδ + c +
3
0 3,I
2
q δ x3 /9 φ(q)
log+
5 1
q2 δ02
c10,I x1/3
y 1/3
p
+ c7,I log + c8,I log x y + 1 (log 9x eqδ0 )
c2 34 22 q 3/2 δ02
√
2x2/3 x5/3
2/3
√
x 2 ex c10,I
+ c5,I log √ + c6,I log √ √ + c9,I x log + ,
9c2 qδ0 9 qδ0 9 qδ0 c2 e
(6.53)
where the constants are as in §6.2.1. For any c, R ≥ 1, the function
attains its maximum on [R0 , ∞], R0 > R, at x = R0 . Hence, for qδ0 fixed,
4/5 1p
min + 4x2/3 , 1 log 9x 3 qδ0 + c3,I (6.54)
log 5
9(δ0 q) 2
Now, notice that, for smaller values of x, (6.54) increases as x increases, since the term
min(. . . , 1) equals the constant 1. Hence, (6.54) attains its maximum for x ∈ (0, ∞)
at (6.55), and so
4/5 1p
min + 4x2/3 , 1 log 9x 3 qδ0 + c3,I + c4,I
log 5
9(δ0 q) 2
27 2/5 7
≤ log e (δ0 q)7/4 + c3,I + c4,I ≤ log δ0 q + 6.11676.
2 4
Examining the other terms in (6.53) and using (6.50), we conclude that
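\[
\begin{aligned}
|S_{I,1}| \le{}& \frac{x}{q}\min\left(1, \frac{c_0'}{\delta^2}\right)\cdot\frac{q}{\phi(q)}\left(\frac{7}{4}\log\delta_0 q + 6.11676\right)\\
&+ \frac{x^{2/3}}{\sqrt{q\delta_0}}(0.67845\log x - 1.20818) + 0.37864x^{2/3},
\end{aligned}
\tag{6.56}
\]
where we are using (6.50) (and, of course, the trivial bound $\delta_0 q \ge 2$) to simplify the smaller error terms. We recall that $c_0' = 0.798437 > c_0/(2\pi)^2$.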
Let us now consider SI,2 . The terms that appear both for |δ| small and |δ| large are
given in (6.12). The second line in (6.12) equals
2U V 2 qV 2 x2/3 9x1/3
x c10,I q
c8,I2 2
+ + + √ + log
4q δ0 x x 2 2 qδ0 18qδ0 2
1/3
1/6 2/3
x 9x 27 c10,I y x 1 9
≤ c8,I2 + √ + + + log x + log
4q 2 δ0 2 2 8 2 23/2 18qδ0 3 2
2/3 √
x x
≤ 0.29315 2 + (0.08679 log x + 0.39161) + 0.00153 x,
q δ0 qδ0
can be bounded trivially by $\log(9x^{1/3}q/2) \le (2/3)\log x + \log(3/4)$. We can also bound (6.57) as we bounded (6.54) before, namely, by fixing $q$ and finding the maximum for $x$ variable. In this way, we obtain that (6.57) is maximal for $y = 4e^{4/5}q^2$; since, by definition, $x^{1/3}/6 = y$, (6.57) then equals
\[ \log\frac{9(6\cdot 4e^{4/5}q^2)q}{2} = 3\log q + \log 108 + \frac{4}{5} \le 3\log q + 5.48214. \]
\[
\begin{aligned}
&\min\left(1, \frac{4c_0'}{\delta^2}\right)\cdot\left(\frac{3}{2}\log q + 2.74107\right)\frac{x}{\phi(q)}\\
&\quad + 0.29315\frac{x}{q^2\delta_0} + (0.0434\log x + 0.1959)x^{2/3}.
\end{aligned}
\tag{6.58}
\]
q δ0
x x2/3 3
(c4,I2 + c9,I2 ) √ + (c10,I2 log 3/2 √ + 2c5,I2 + c12,I2 ) · x2/3
2 qδ0 9q δ0 4
2.1989x 3.61818x
≤ √ + + (1.77019 log x + 29.2955)x2/3 ,
qδ0 qδ0
√
where we recall that 0 = (4 log 2)/(x/U
p V )√= (2 log 2)/ qδ0 , which can be bounded
√
crudely by 2 log √ 2. (Thus, c10,I2 ≤ 1 + 8 log 2·1.78783 < 3.54037 and c12,I2 ≤
29.3333 + 11.902 2 log 2 ≤ 41.0004.)
If |δ| > 1/2c2 , we must consider (6.14) instead. For = 0.07, that is at most
x 2 log 2
(c4,I2 + (1 + )c13,I2 ) √ 1+ √
2 qδ0 qδ0
2 log 2 x
+ (3.38845 1 + √ log δq 3 + 20.8823)
qδ0 |δ|q
4 log 2
+ 68.8133 1 + √ log |δ|q + 72.0828 x2/3 + 60.4141x1/3
qδ0
x 2 log 2 x
= 2.49157 √ 1+ √ + (3.38845 log δq 3 + 32.6771)
qδ0 qδ0 |δ|q
log |δ|q 2
+ 22.9378 log x + 190.791 √ + 130.691 x 3
qδ0
x x
≤ 2.49157 √ + (3.59676 log δ0 + 27.3032 log q + 91.2218)
qδ0 qδ0
2
+ (22.9378 log x + 411.228)x 3 ,
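where, besides the crude bound $\epsilon_0 \le \sqrt{2}\log 2$, we use the inequalities
\[
\begin{aligned}
\frac{\log|\delta|q}{\sqrt{q\delta_0}} &\le \frac{\log 4q\delta_0}{\sqrt{q\delta_0}} \le \frac{\log 8}{\sqrt{2}}, \qquad \frac{\log q}{\sqrt{q\delta_0}} \le \frac{1}{\sqrt{2}}\frac{\log q}{\sqrt{q}} \le \frac{1}{\sqrt{2}}\frac{\log e^2}{e} = \frac{\sqrt{2}}{e},\\
\frac{1}{|\delta|} &\le \frac{4c_2}{\delta_0}, \qquad \frac{\log|\delta|}{|\delta|} \le \frac{2}{e\log 2}\cdot\frac{\log\delta_0}{\delta_0}.
\end{aligned}
\]
(Obviously, $1/|\delta| \le 4c_2/\delta_0$ is based on the assumption $|\delta| > 1/2c_2$ and on the inequality $16c_2 \ge 1$. The bound on $(\log|\delta|)/|\delta|$ is based on the fact that $(\log t)/t$ reaches its maximum at $t = e$, and $(\log\delta_0)/\delta_0 = (\log 2)/2$ for $|\delta| \le 8$.)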
We sum (6.58) and whichever one of our bounds for (6.13) and (6.14) is greater (namely, the latter). We obtain that, for any $\delta$,
\[
\begin{aligned}
|S_{I,2}| \le{}& 2.49157\frac{x}{\sqrt{q\delta_0}} + \min\left(1, \frac{4c_0'}{\delta^2}\right)\cdot\left(\frac{3}{2}\log q + 2.74107\right)\frac{x}{\phi(q)}\\
&+ \frac{x}{q\delta_0}(3.59676\log\delta_0 + 27.3032\log q + 91.515) + (22.9812\log x + 411.424)x^{2/3},
\end{aligned}
\tag{6.59}
\]
where we bound one of the lower-order terms in (6.58) by $x/q^2\delta_0 \le x/q\delta_0$.
For type II, we have to consider two cases: (a) |δ| < 8, and (b) |δ| ≥ 8. Consider
first |δ| < 8. Then δ0 = 2. Recall that θ = 27/8. We have q ≤ V /2θ and |δq| ≤ V /θ
thanks to (6.52). We apply (6.37), and obtain that, for |δ| < 8,
v !
1
u
x 2 log 4qδ0
u1
|SII | ≤ p · t log 4qδ0 + log 2q log 1 + V
2φ(q) 2 log 2q
p
· 0.30214 log 4qδ0 + 0.2562
r v
q u log 2q
u
1/4 2/3
+ 8.22088 1 + 1.15t 1/3
√ (qδ0 ) x + 1.84251x5/6
φ(q) log 9x 2√q δ0
r
x log 8q p
≤p · Cx,2q log 2q + · 0.30214 log 2q + 0.67506
2φ(q) 2
r
q 3/4
+ 16.406 x + 1.84251x5/6
φ(q)
(6.60)
where we bound
x1/3 x1/3
log 2q log 3√ log 3√
√ ≤ < lim = 2,
9x1/3
√ δ0
1/6 2
9x√ x→∞ 1/6 2
9x√
log 2 q log log
2 1/6 2 1/6
for $0 < t < 9x^{1/3}/2$. (We have 2.004 here instead of 2 because we want a constant $\ge 2(1 + \epsilon_1)$ in later occurrences of $C_{x,t}$, for reasons that will soon become clear.)
For purposes of later comparison, we remark that 16.404 ≤ 1.57863x4/5−3/4 for
x ≥ 2.16 · 1020 .
Consider now case (b), namely, |δ| ≥ 8. Then δ0 = |δ|/4. By (6.52), |δq| ≤ V /θ.
v !
u
2x u1 |δq|(1 + 1 ) log |δ|q
|SII | ≤ p · t log |δq| + log log 1 + 18x1/3
|δ|φ(q) 2 4 2 log |δ|(1+ 1 )q
p
· 0.30214 log |δ|q + 0.2562
v
1/3
log 9x2
r u
q u
+ 8.22088 t · (qδ0 )1/4 x2/3 + 1.84251x5/6
φ(q) log 9x1/3
|δq|
r
x log 4δ0 q p
≤p Cx,δ0 q log δ0 (1 + 1 )q + 0.30214 log δ0 q + 0.67506
δ0 φ(q) 2
r
q 4/5
+ 1.79926 x + 1.84251x5/6 ,
φ(q)
(6.61)
since
v s
9x1/3 1/3
log 9x2
u
u log
8.22088t 2
· (qδ0 )1/4 ≤ 8.22088 · (x1/3 /3)1/4
log 9x1/3
|δq|
log 27
4
≤ 1.79926x4/5−2/3
By Lemma C.2.2, $q/\phi(q) \le z(y) = z(x^{1/3}/6)$ (since $x \ge 18^3$). It is easy to check that $x \mapsto \sqrt{z(x^{1/3}/6)}\,x^{4/5-5/6}$ is decreasing for $x \ge 2.16\cdot 10^{20}$ (in fact, for $x \ge 18^3$). Using (6.50), we conclude that $1.67718\sqrt{q/\phi(q)}\,x^{4/5} \le 0.89657x^{5/6}$ and, by the way, $16.406\sqrt{q/\phi(q)}\,x^{3/4} \le 0.78663x^{5/6}$. This allows us to simplify the last lines of (6.60) and (6.61). We obtain that, for $\delta$ arbitrary,
\[
\begin{aligned}
|S_{II}| \le{}& \frac{x}{\sqrt{\delta_0\phi(q)}}\sqrt{C_{x,\delta_0 q}(\log\delta_0 q + \epsilon_1) + \frac{\log 4\delta_0 q}{2}}\cdot\sqrt{0.30214\log\delta_0 q + 0.67506}\\
&+ 2.73908x^{5/6}.
\end{aligned}
\tag{6.62}
\]
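The two numerical simplifications used for the last lines of (6.60) and (6.61) can be reproduced at the endpoint $x = 2.16\cdot 10^{20}$. The sketch below assumes that $z$ has the form $z(t) = e^\gamma\log\log t + 2.50637/\log\log t$ (this is our reading of (C.19), which is not reproduced in this chapter), and it uses the constant 1.79926 from the last line of (6.61) together with the constant 16.406 from (6.60).

```python
import math

EULER_GAMMA = 0.5772156649015329


def z(t):
    """Assumed form of the function z of (C.19): an upper bound for q/phi(q), q <= t."""
    loglog = math.log(math.log(t))
    return math.exp(EULER_GAMMA) * loglog + 2.50637 / loglog


x = 2.16e20
y = x ** (1.0 / 3.0) / 6.0
root_z = math.sqrt(z(y))

# Coefficient of x^{5/6} obtained from the x^{4/5} term of (6.61).
ratio_45 = 1.79926 * root_z * x ** (4.0 / 5.0 - 5.0 / 6.0)
# Coefficient of x^{5/6} obtained from the x^{3/4} term of (6.60).
ratio_34 = 16.406 * root_z * x ** (3.0 / 4.0 - 5.0 / 6.0)
print(ratio_45, ratio_34)
```

Both values agree, to the precision shown in the text, with 0.89657 and 0.78663; adding 1.84251 to the first gives the constant 2.73908 in (6.62).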
It is time to sum up SI,1 , SI,2 and SII . The main terms come from the first line
of (6.62) and the first term of (6.59). Lesser-order terms can be dealt with roughly:
we bound min(1, c00 /δ 2 ) and min(1, 4c00 /δ 2 ) from above by 2/δ0 (using the fact that
c00 = 0.798437 < 16, which implies that 8/δ > 4c00 /δ 2 for δ > 8; of course, for δ ≤ 8,
we have min(1, 4c00 /δ 2 ) ≤ 1 = 2/2 = 2/δ0 ).
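The crude bound $\min(1, 4c_0'/\delta^2) \le 2/\delta_0$ (and hence the same bound for $\min(1, c_0'/\delta^2)$) can be checked mechanically on a grid of values of $\delta$. This is only an illustrative sketch; the grid and the helper names are ours.

```python
C0_PRIME = 0.798437  # the constant c_0' recalled in the text


def delta0(delta):
    """delta_0 = max(2, |delta|/4)."""
    return max(2.0, abs(delta) / 4.0)


def crude_bound_holds(delta):
    """Check min(1, 4 c_0'/delta^2) <= 2/delta_0 for one non-zero delta."""
    lhs = min(1.0, 4.0 * C0_PRIME / (delta * delta))
    return lhs <= 2.0 / delta0(delta)


# delta ranges over 0.01, 0.02, ..., 1000 (an arbitrary illustrative grid).
all_ok = all(crude_bound_holds(0.01 * k) for k in range(1, 100001))
print(all_ok)
```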
where, for instance, we bound (3/2) log q + 2.74107 by (3/2) log δ0 q + 2.74107 −
(3/2) log 2.
As for the other terms, we use the assumption $x \ge 2.16\cdot 10^{20}$ to bound $x^{2/3}$ and $x^{2/3}\log x$ by a small constant times $x^{5/6}$. We bound $x^{2/3}/\sqrt{q\delta_0}$ by $x^{2/3}/\sqrt{2}$ (in (6.56)). We obtain
\[
\begin{aligned}
&\frac{x^{2/3}}{\sqrt{2}}(0.67845\log x - 1.20818) + 0.37864x^{2/3}\\
&\quad + (22.9812\log x + 411.424)x^{2/3} + 2.73908x^{5/6} \le 3.35531x^{5/6}.
\end{aligned}
\]
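The constant 3.35531 can be confirmed by evaluating the left side at the endpoint $x = 2.16\cdot 10^{20}$ and observing that the ratio to $x^{5/6}$ decreases as $x$ grows. A minimal numerical sketch (helper names ours):

```python
import math


def lower_order_total(x):
    """Left side of the inequality: the lower-order terms from (6.56), (6.59), (6.62)."""
    lx = math.log(x)
    x23 = x ** (2.0 / 3.0)
    return (
        x23 / math.sqrt(2.0) * (0.67845 * lx - 1.20818)
        + 0.37864 * x23
        + (22.9812 * lx + 411.424) * x23
        + 2.73908 * x ** (5.0 / 6.0)
    )


def ratio(x):
    """Ratio of the left side to x^{5/6}."""
    return lower_order_total(x) / x ** (5.0 / 6.0)


for x in (2.16e20, 1e21, 1e25, 1e30):
    print(x, ratio(x))
```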
The sums S0,∞ and S0,w in (3.11) are 0 (by (6.50) and the fact that η2 (t) = 0 for
t ≤ 1/4). We conclude that, for q ≤ y = x1/3 /6, x ≥ 2.16 · 1020 and η = η2 as in
(3.4),
Since $C_{x,t}$ is increasing as a function of $t$ (for $x$ fixed and $t \le 9x^{1/3}/2.004$) and $\delta_0 q \le 2y$, we see that $C_{x,t} \le C_{x,2y}$. It is clear that $x \mapsto C_{x,t}$ (fixed $t$) is a decreasing function of $x$. For $x = 2.16\cdot 10^{20}$, $C_{x,2y} = 1.39942\dots$.
We want $U/(x/UV) \ge 5\cdot 10^5$ (this is (6.17)). We also want $UV$ small. With this in mind, we let
\[ V = \frac{x^{1/3}}{3}, \qquad U = 500\sqrt{6}\,x^{1/3}, \qquad Q = \frac{x}{U} = \frac{x^{2/3}}{500\sqrt{6}}. \tag{6.65} \]
Then (6.17) holds (as an equality). Since we are assuming (6.50), we have $V \ge 2\cdot 10^6$. It is easy to check that (6.50) also implies that $U \le \sqrt{x}/2$ and $Q \ge 2\sqrt{x}$, and so the inequalities in (6.51) all hold.
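The claims made about the second choice of parameters are elementary and can be checked in a few lines. The sketch below uses a sample value $x = 10^{24}$ (our choice; any $x$ above $2.16\cdot 10^{20}$ would do, the inequalities becoming equalities at $x = 2.16\cdot 10^{20}$ itself).

```python
import math


def second_choice_parameters(x):
    """Return (U, V, Q) as in (6.65), in floating point (illustrative helper)."""
    cube_root = x ** (1.0 / 3.0)
    V = cube_root / 3.0
    U = 500.0 * math.sqrt(6.0) * cube_root
    Q = x / U
    return U, V, Q


x = 1e24  # sample value, an assumption made only for illustration
U, V, Q = second_choice_parameters(x)

# (6.17) asks for U/(x/UV) >= 5*10^5; here the quantity equals 5*10^5.
lhs_617 = U / (x / (U * V))
print(lhs_617, V >= 2e6, U <= math.sqrt(x) / 2, Q >= 2 * math.sqrt(x))
```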
Write 2α = a/q + δ/x for the new approximation; we must have either q > y or
|δ| > 8y/q, since otherwise a/q would already be a valid approximation under the first
choice of parameters. Thus, either (a) q > y, or both (b1) |δ| > 8 and (b2) |δ|q > 8y.
Since now V = 2y, we have q > V /2θ in case (a) and |δq| > V /θ in case (b) for any
θ ≥ 1. We set θ = 4.
(Thanks to this choice of θ, we have |δq| ≤ x/Q ≤ x/θU , as we commented at the
end of §6.2.3; this will help us avoid some case-work later.)
By (6.4),
c0 √
x q
|SI,1 | ≤ min 1, 02 log x2/3 − log 500 6 + c3,I + c4,I
q δ φ(q)
2 2 1/2 2/3
Q Q U e x c
+ c7,I log + c8,I log x log c11,I Q + c10,I log √ + 10,I
c2 x 4x 500 6 e
√ 1/3 ! √
1000 6x √ √ √ 2 ex
+ c5,I log + c6,I log 500 6x4/3 · 500 6x1/3 + c9,I x log
c2 c2
0
x c 2 q 2.89 2/3
≤ min 1, 02 log x − 4.99944 + 1.00303 + x (log x)2 ,
q δ 3 φ(q) 1000
Q Q2
c7,I log + c8,I log x log c11,I
c2 x
2 2 1
=c8,I (log x) − c8,I (log 1500000 − log c11,I ) − c7,I log x + c7,I log √
3 500 6c2
≤c8,I (log x)2 − 38 log x.
We are also using the assumption (6.50) repeatedly in order to show that the sum of all lower-order terms is less than $(38c_{8,I}\log x)/(500\sqrt{6})$. Note that $c_{8,I}(\log x)^2 Q \le 0.00289x^{2/3}(\log x)^2$.
We have $q/\phi(q) \le z(Q)$ (where $z$ is as in (C.19)) and, since $Q > \sqrt{6}\cdot 12\cdot 10^9$ for $x \ge 2.16\cdot 10^{20}$,
\[
\begin{aligned}
1.00303\,z(Q) &\le 1.00303\left(e^\gamma\log\log Q + \frac{2.50637}{\log\log(\sqrt{6}\cdot 12\cdot 10^9)}\right)\\
&\le 0.2359\log Q + 0.79 < 0.1573\log x.
\end{aligned}
\]
(It is possible to give a much better estimation, but it is not worthwhile, since this will
be a very minor term.) We have either q > y or q|δ| > 8y; if q|δ| > 8y but q ≤ y, then
|δ| ≥ 8, and so c00 /δ 2 q < 1/8|δ|q < 1/64y < 1/y. Hence
x 2
|SI,1 | ≤ + 0.1573 log x + 0.00289x2/3 (log x)2
y 3
≤ 2.4719x2/3 log x + 0.00289x2/3 (log x)2 .
We bound |SI,2 | using Lemma 4.2.4. First we bound (4.50): this is at most
4c00 x1/3 q
x
min 1, 2 log
2q δ 3
√ √
1
x3 1 2
(U V )2 log 1/3 2
1 1 3 3c4 500 6 (500 6x + 1) x log x
3 3
+ c0 − 2 + + ,
4 π 2x 2 9 6x
where c4 = 1.03884. We bound the second line of this using (6.50). As for the first
line, we have either q ≥ y (and so the first line is at most (x/2y)(log x1/3 y/3)) or
q < y and 4c00 /δ 2 q < 1/16y < 1/y (and so the same bound applies). Hence (4.50) is
at most
\[ 3x^{2/3}\left(\frac{2}{3}\log x - \log 18\right) + 0.02017x^{2/3}\log x = 2.02017x^{2/3}\log x - 3(\log 18)x^{2/3}. \]
Now we bound (4.51), which comes up when $|\delta| \le 1/2c_2$, where $c_2 = 6\pi/5\sqrt{c_0}$, $c_0 = 31.521$ (and so $c_2 = 0.6714769\dots$). Since $1/2c_2 < 8$, it follows that $q > y$ (the alternative $q \le y$, $q|\delta| > 8y$ is impossible, since it implies $|\delta| > 8$). Then (4.51) is at most
√
√
2 c0 c1 UV c2 x log U V UV
U V log √ + Q 3 log + log
π e Q 2 Q/2
3c1 x UV 16 log 2 c0 e3 Q2 Q
+ log U V log + Q log log (6.66)
2 y c2 x/y π 4π · 8 log 2 · x 2
3c1 √ c2 x 25c0 √
+ √ x log + 2
(3c2 )1/2 x log x,
2 2c2 2 4π
If |δ| > 1/2c2 , then we know that |δq| > min(y/2c2 , 8y) = y/2c2 . Thus (4.52)
(with = 0.01) is at most
√
2 c0 c1 UV
U V log √
π e
√ √ x
+1
!
e2 U V
2.02 c0 c1 x y/2c2 1
+ +1 ( 3.02 − 1) log √ + log U V log x
π y/2c2 2 2 y/2c2
√
3c1 1 3.03 20c0
+ + log x + (2c2 )3/2 x log x.
2 2 0.16 3π 2
Again by (6.50), and in much the same way as before, this simplifies to
Now we must estimate SII . As we said before, either (a) q > y, or both (b1)
|δ| > 8 and (b2) |δ|q > 8y. Recall that θ = 4. In case (a), we have q > x1/3 /6 =
V /2 > V /2θ; thus, we can use (6.38), and obtain that, if q ≤ x/8U , |SII | is at most
p s
x z(q) x log x/(2U q) x
√ log + log 2q log κ6 log + 2κ7
2q U · 8q log 4 U · 8q
s !
√
r
x log x/4U x p x
+ 2κ2 z 1 + 1.15 √ + (κ2 log x/U + κ9 ) √
8U log 4 U V
κ2 x
+ (log 8y)3/2 − (log 2y)3/2 √
6 y
p 2 x
+ κ2 8 log x/U + ((log x/U )3/2 − (log V )3/2 ) √ ,
3 8U
(6.67)
where z is as in (C.19). (We are already simplifying the third line; the bound given
is justified by a derivative test.) It is easy to check that q → (log 2q)(log log q)/q is
decreasing for q ≥ y (indeed for q ≥ 9), and so the first line of (6.67) is maximal for
q = y.
Also note that, since $(x^{3/2})' = (3/2)\sqrt{x}$,
\[ \left(\frac{t}{3} + \log\frac{8}{6}\right)^{3/2} - \left(\frac{t}{3} + \log\frac{2}{6}\right)^{3/2} \le \frac{3}{2}\sqrt{\frac{t}{3} + \log\frac{8}{6}}\cdot\log 4 \le 1.20083\sqrt{t}. \]
Of course,
t
− log 2c
t t 3 t t t t
− log 8c + − log 3 log < + log < log t.
3 3 log 4 3 3 3 3
On the remaining interval $\log(2.16\cdot 10^{20}) \le t \le 2000$, we use interval arithmetic (as in §2.6, with 30 iterations) to bound the ratio of (6.68) to $t^{3/2}$. We obtain that
it is at most
0.275964t3/2 .
\[ |\delta q| \le \frac{x}{Q} = U = 500\sqrt{6}\,x^{1/3} \le \frac{x^{2/3}}{2000\sqrt{6}} = \frac{x}{4U} = \frac{x}{\theta U}, \]
again under assumption (6.50). We apply (6.41), and obtain that $|S_{II}|$ is at most
p s
2x z(y) x log x/3U y x
√ log + log 3y log κ6 log + 2κ7
8y U · 4 · 8y log 32/3 U · 4 · 8y
2κ2 x 3 3 x/4 3 3
+ √ ((log 32y) 2 − (log 2y) 2 ) + √ ((log 4U ) 2 − (log 2y) 2 )
3 16y Q−y
!
κ2 p p x
+ p log V + 1/ log V + κ9 √
2(1 − y/Q) V
p p x
+ κ2 z(y) · log 4U · √ ,
U
(6.72)
where we are using the facts that (log 3t/8)/t is increasing for t ≥ 8y > 8e/3 and that
d (log t)3/2 − (log V )3/2 3(log t)1/2 − ((log t)3/2 − (log V )3/2 )
√ =
dt t 2t3/2
t
√
log e3 · log t − (log V )3/2
=− <0
2t3/2
for t ≥ θ · 8y = 16V , thanks to
2
e3
16V 3
log 3 log 16V > (log V ) + log 16 − 2 log (log V )2
e 16
2 !
e3
16
+ log 3 − 2 log log 16 log V > (log V )3
e 16
(valid for log V ≥ 1). Much as before, we can rewrite (6.72) as x5/6 times
s
t
p
2 z(et/3 /6) t − log 3c
t
p − log 32c + − log 2 log 3
8/6 3 3 log 32/3
s r 3 32 !
t 2κ2 3 t 32 2 t
· κ6 − log 32c + 2κ7 + + log − − log 3
3 3 8 3 6 3
3/2 3/2 !
2κ2 1/4 t t
+ q + log 24c − − log 3
3 et/3 1 3 3
6c − 6
√ !
κ2 3 p 1 √
+q t/3 − log 3 + p + κ9 3
c t/3 − log 3
2 1 − et/3
r
t/3 + log 24c
q
+ κ2 z(et/3 /6) ,
6c
(6.73)
where $t = \log x$ and $c = 500/\sqrt{6}$. For $t \ge 100$, we use (6.70) to bound $z(e^{t/3}/6)$,
and we obtain that (6.73) is at most
√ r r 1/2
2 eγ 1 κ6
2κ2 3 1 t 32
p · · (log t)t + · + log · log 16
8/6 3 3 3 8 2 3 6
1/2
2κ2 1/4 1 t
+ q · + log 24c · log 72c
3 e100/3 1 2 3
6c − 6
√ ! r
κ2 3 p 1 √ p t/3 + log 24c
+q t/3 + p + κ9 3 + κ2 e log tγ ,
2 1− c
t/3 6c
e100/3
(6.74)
where we have bounded expressions of the form $a^{3/2} - b^{3/2}$ ($a > b$) by $(a^{1/2}/2)\cdot(a - b)$. The ratio of (6.74) to $t^{3/2}$ is clearly a decreasing function of $t$. For $t = 200$, this ratio is $0.23747\dots$; hence, (6.74) (and thus (6.73)) is at most $0.23748t^{3/2}$ for $t \ge 200$. On the range $\log(2.16\cdot 10^{20}) \le t \le 200$, the bisection method (with 25 iterations) gives that the ratio of (6.73) to $t^{3/2}$ is at most 0.23511.
We conclude that, when $|\delta| > 8$ and $|\delta|q > 8y$,
\[ |S_{II}| \le 0.23511x^{5/6}(\log x)^{3/2}. \]
Thus (6.71) gives the worst case.
We now take totals, and obtain
\[
\begin{aligned}
S_\eta(x, \alpha) &\le |S_{I,1}| + |S_{I,2}| + |S_{II}|\\
&\le (2.4719 + 1230.9)x^{2/3}\log x + (0.00289 + 0.0006406)x^{2/3}(\log x)^2\\
&\quad + 0.275964x^{5/6}(\log x)^{3/2}\\
&\le 0.27598x^{5/6}(\log x)^{3/2} + 1233.38x^{2/3}\log x,
\end{aligned}
\tag{6.75}
\]
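The last step of (6.75) absorbs the $x^{2/3}(\log x)^2$ term into the other two terms. A quick numerical comparison of the two sides for a few values of $x \ge 2.16\cdot 10^{20}$ illustrates that this is legitimate; the sketch below is only a spot check, and the function names are ours.

```python
import math


def combined(x):
    """Second-to-last expression in (6.75)."""
    lx = math.log(x)
    return (
        (2.4719 + 1230.9) * x ** (2.0 / 3.0) * lx
        + (0.00289 + 0.0006406) * x ** (2.0 / 3.0) * lx ** 2
        + 0.275964 * x ** (5.0 / 6.0) * lx ** 1.5
    )


def simplified(x):
    """Last expression in (6.75)."""
    lx = math.log(x)
    return 0.27598 * x ** (5.0 / 6.0) * lx ** 1.5 + 1233.38 * x ** (2.0 / 3.0) * lx


samples = (2.16e20, 1e25, 1e40, 1e100)
for x in samples:
    print(x, combined(x) <= simplified(x))
```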
6.4 Conclusion
Proof of Theorem 3.1.1. We have shown that $|S_\eta(\alpha, x)|$ is at most (6.63) for $q \le x^{1/3}/6$ and at most (6.75) for $q > x^{1/3}/6$. It remains to simplify (6.63) slightly.
By the geometric mean/arithmetic mean inequality,
\[ \sqrt{C_{x,\delta_0 q}(\log\delta_0 q + 0.002) + \frac{\log 4\delta_0 q}{2}}\cdot\sqrt{0.30214\log\delta_0 q + 0.67506} \tag{6.76} \]
is at most
\[ \frac{1}{2\sqrt{\rho}}\left(C_{x,\delta_0 q}(\log\delta_0 q + 0.002) + \frac{\log 4\delta_0 q}{2}\right) + \frac{\sqrt{\rho}}{2}(0.30214\log\delta_0 q + 0.67506) \]
for any $\rho > 0$. We recall that
\[ C_{x,t} = \log\left(1 + \frac{\log 4t}{2\log\frac{9x^{1/3}}{2.004t}}\right). \]
Let
\[ \rho = \frac{C_{x_1,2q_0}(\log 2q_0 + 0.002) + \frac{\log 8q_0}{2}}{0.30214\log 2q_0 + 0.67506} = 3.397962\dots, \]
where x1 = 1025 , q0 = 2 · 105 . (In other words, we are optimizing matters for x = x1 ,
δ0 q = 2q0 ; the losses in nearby ranges will be very slight.) We obtain that (6.76) is at
most
√
ρ · 0.30214
Cx,δ0 q 1
√ (log δ0 q + 0.002) + √ + log δ0 q
2 ρ 4 ρ 2
√
1 log 2 ρ (6.77)
+ √ + · 0.67506
2 ρ 2
≤ 0.27125Cx,t (log δ0 q + 0.002) + 0.4141 log δ0 q + 0.49911.
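The value of $\rho$ and the coefficients $0.27125$ and $0.4141$ can be recomputed directly from the definition of $C_{x,t}$. The following sketch does so for $x_1 = 10^{25}$, $q_0 = 2\cdot 10^5$; it also checks that, at $\delta_0 q = 2q_0$ and $x = x_1$, the arithmetic-mean bound coincides with the geometric mean, which is what choosing $\rho$ as the ratio of the two quantities achieves. Variable names are ours.

```python
import math


def C(x, t):
    """C_{x,t} = log(1 + log(4t) / (2 log(9 x^{1/3} / (2.004 t))))."""
    return math.log(1.0 + math.log(4.0 * t) / (2.0 * math.log(9.0 * x ** (1.0 / 3.0) / (2.004 * t))))


x1, q0 = 1e25, 2e5
t = 2.0 * q0  # delta_0 * q at the point where we optimize

a = C(x1, t) * (math.log(t) + 0.002) + math.log(4.0 * t) / 2.0
b = 0.30214 * math.log(t) + 0.67506
rho = a / b

coef_C = 1.0 / (2.0 * math.sqrt(rho))  # coefficient of C_{x,t}(log(delta_0 q) + 0.002)
coef_log = 1.0 / (4.0 * math.sqrt(rho)) + math.sqrt(rho) * 0.30214 / 2.0  # coefficient of log(delta_0 q)

geometric = math.sqrt(a * b)
arithmetic = 0.5 * (a / math.sqrt(rho) + math.sqrt(rho) * b)
print(rho, coef_C, coef_log, geometric, arithmetic)
```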
where
\[ R_{x,t} = 0.27125\log\left(1 + \frac{\log 4t}{2\log\frac{9x^{1/3}}{2.004t}}\right) + 0.41415. \]
Part II
Major arcs
Chapter 7
Major arcs: overview and results
A key feature of the present work is that it allows one to mimic a wide variety of smoothing functions by means of estimates on the Mellin transform of a single smoothing function: here, the Gaussian $e^{-t^2/2}$.
7.1 Results
Write $\eta_\heartsuit(t) = e^{-t^2/2}$. Let us first give a bound for exponential sums on the primes using $\eta_\heartsuit$ as the smooth weight. Without loss of generality, we may assume that our character $\chi$ mod $q$ is primitive, i.e., that it is not really a character to a smaller modulus $q'|q$.
Theorem 7.1.1. Let $x$ be a real number $\ge 10^8$. Let $\chi$ be a primitive Dirichlet character mod $q$, $1 \le q \le r$, where $r = 300000$.
Then, for any $\delta \in \mathbb{R}$ with $|\delta| \le 4r/q$,
\[ \sum_{n=1}^\infty \Lambda(n)\chi(n)e\left(\frac{\delta}{x}n\right)e^{-\frac{(n/x)^2}{2}} = I_{q=1}\cdot\widehat{\eta_\heartsuit}(-\delta)\cdot x + E\cdot x, \]
The advantage of $\eta(t) = t^2\eta_\heartsuit(t)$ over $\eta_\heartsuit$ is that it vanishes at the origin (to second order); as we shall see, this makes it easier to estimate exponential sums with the smoothing $\eta *_M g$, where $*_M$ is a Mellin convolution and $g$ is nearly arbitrary. Here is a good example that is used, crucially, in Part III.
Corollary 7.1.3. Let $\eta(t) = t^2e^{-t^2/2} *_M \eta_2(t)$, where $\eta_2 = \eta_1 *_M \eta_1$ and $\eta_1 = 2\cdot I_{[1/2,1]}$. Let $x$ be a real number $\ge 10^8$. Let $\chi$ be a primitive character mod $q$, $1 \le q \le r$, where $r = 300000$.
Then, for any $\delta \in \mathbb{R}$ with $|\delta| \le 4r/q$,
\[ \sum_{n=1}^\infty \Lambda(n)\chi(n)e\left(\frac{\delta}{x}n\right)\eta(n/x) = I_{q=1}\cdot\widehat{\eta}(-\delta)\cdot x + E\cdot x, \]
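To make the Mellin convolution concrete, here is a small numerical sketch for $\eta_2 = \eta_1 *_M \eta_1$, with $(f *_M g)(t) = \int_0^\infty f(t/y)g(y)\,dy/y$. It compares a midpoint-rule evaluation with the closed form $\eta_2(t) = 4\max(\log 2 - |\log 2t|, 0)$; that closed form is our own computation for $\eta_1 = 2\cdot I_{[1/2,1]}$ and is exactly what the code is meant to test, so it should be read as an assumption being checked rather than as a statement from the text. The function names and the quadrature parameters are ours.

```python
import math


def eta1(t):
    """eta_1 = 2 * indicator of [1/2, 1]."""
    return 2.0 if 0.5 <= t <= 1.0 else 0.0


def eta2_closed(t):
    """Candidate closed form for eta_2 = eta_1 *_M eta_1 (our computation)."""
    if t <= 0.0:
        return 0.0
    return 4.0 * max(math.log(2.0) - abs(math.log(2.0 * t)), 0.0)


def mellin_convolve(f, g, t, lo=-3.0, hi=3.0, steps=60000):
    """Approximate (f *_M g)(t) by the midpoint rule after substituting y = e^v."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = math.exp(lo + (k + 0.5) * h)
        total += f(t / y) * g(y)
    return total * h


for t in (0.3, 0.5, 0.7, 0.9):
    print(t, mellin_convolve(eta1, eta1, t), eta2_closed(t))
```

In particular $\eta_2$ is supported on $[1/4, 1]$, which is the property of $\eta_2$ used elsewhere in the text.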
Let us now look at a different kind of modification of the Gaussian smoothing. Say we would like a weight of a specific shape; for example, for what we will need to do in Part III, we would like an approximation to the function
\[ \eta_\circ : t \mapsto \begin{cases} t^3(2-t)^3e^{-(t-1)^2/2} & \text{for } t \in [0, 2],\\ 0 & \text{otherwise.} \end{cases} \tag{7.3} \]
At the same time, what we have is an estimate for the Mellin transform of the Gaussian $e^{-t^2/2}$, centered at $t = 0$.
The route taken here is to work with an approximation $\eta_+$ to $\eta_\circ$. We let
\[ \eta_+(t) = h_H(t)\cdot te^{-t^2/2}, \tag{7.4} \]
\[
\begin{aligned}
F_H(t) &= \frac{\sin(H\log y)}{\pi\log y},\\
h_H(t) &= (h *_M F_H)(y) = \int_0^\infty h(ty^{-1})F_H(y)\frac{dy}{y}
\end{aligned}
\tag{7.6}
\]
\[ |E| \le 1.3482\cdot 10^{-14} + \frac{1.617\cdot 10^{-10}}{q} + \frac{1}{\sqrt{x}}\left(\frac{499900}{\sqrt{q}} + 52\right). \]
If $q = 1$, we have the sharper bound
\[ |E| \le 4.772\cdot 10^{-11} + \frac{251400}{\sqrt{x}}. \]
This is a paradigmatic example, in that, following the proof given in §9.4, we can bound exponential sums with weights of the form $h_H(t)e^{-t^2/2}$, where $h_H$ is a band-limited approximation to just about any continuous function of our choosing.
Lastly, we will need an explicit estimate of the $\ell_2$ norm corresponding to the sum in Thm. 7.1.4, for the trivial character.
Proposition 7.1.5. Let $\eta(t) = \eta_+(t) = h_H(t)te^{-t^2/2}$, where $h_H$ is as in (7.6) and $H = 200$. Let $x$ be a real number $\ge 10^{12}$.
Then
\[ \sum_{n=1}^\infty \Lambda(n)(\log n)\eta^2(n/x) = x\cdot\int_0^\infty \eta_+^2(t)\log xt\,dt + E_1\cdot x\log x \]
where $I_{q=1} = 1$ if $q = 1$ and $I_{q=1} = 0$ otherwise. Here $\rho$ runs over the complex numbers $\rho$ with $L(\rho, \chi) = 0$ and $0 < \Re(\rho) < 1$ (“non-trivial zeros”). The function $F_\delta$ is the Mellin transform of $e(\delta t)\eta(t)$ (see §2.4).
The questions are then: where are the non-trivial zeros $\rho$ of $L(s, \chi)$? How fast does $F_\delta(\rho)$ decay as $\Im(\rho) \to \pm\infty$?
Write $\sigma = \Re(s)$, $\tau = \Im(s)$. The belief is, of course, that $\sigma = 1/2$ for every non-trivial zero (Generalized Riemann Hypothesis), but this is far from proven. Most work to date has used zero-free regions of the form $\sigma \le 1 - 1/C\log q|\tau|$, $C$ a constant. This is a classical zero-free region, going back, qualitatively, to de la Vallée-Poussin (1899). The best values of $C$ known are due to McCurley [McC84a] and Kadiri [Kad05].
These regions seem too narrow to yield a proof of the three-primes theorem. What we will use instead is a finite verification of GRH “up to $T_q$”, i.e., a computation showing that, for every Dirichlet character of conductor $q \le r_0$ ($r_0$ a constant, as above), every non-trivial zero $\rho = \sigma + i\tau$ with $|\tau| \le T_q$ satisfies $\Re(\rho) = 1/2$. Such verifications go back to Riemann; modern computer-based methods are descended in part from a paper by Turing [Tur53]. (See the historical article [Boo06b].) In his thesis [Pla11], D. Platt gave a rigorous verification for $r_0 = 10^5$, $T_q = 10^8/q$. In coordination with the present work, he has extended this to
• all odd $q \le 3\cdot 10^5$, with $T_q = 10^8/q$,
• all even $q \le 4\cdot 10^5$, with $T_q = \max(10^8/q, 200 + 7.5\cdot 10^7/q)$.
This was a major computational effort, involving, in particular, a fast implementation of interval arithmetic (used for the sake of rigor).
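For later bookkeeping it can be convenient to have the verification heights $T_q$ just listed available as a function of the conductor. The following is a minimal sketch of such a lookup; the function name `verification_height` and the choice to raise an error outside the verified range are ours.

```python
def verification_height(q):
    """Return T_q for the ranges of conductors q listed above.

    Odd q <= 3*10^5:  T_q = 10^8 / q.
    Even q <= 4*10^5: T_q = max(10^8 / q, 200 + 7.5*10^7 / q).
    """
    if q < 1:
        raise ValueError("conductor must be a positive integer")
    if q % 2 == 1:
        if q > 300000:
            raise ValueError("odd conductor outside the verified range")
        return 1e8 / q
    if q > 400000:
        raise ValueError("even conductor outside the verified range")
    return max(1e8 / q, 200 + 7.5e7 / q)


print(verification_height(1), verification_height(2), verification_height(400000))
```

For even $q$ the second expression in the maximum takes over once $q > 125000$, so that $T_q$ never drops below $200 + 7.5\cdot 10^7/q$.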
What remains to discuss, then, is how to choose η in such a way that Fδ (ρ) decreases
fast enough as |τ | increases, so that (7.7) gives a good estimate. We cannot hope for
Fδ (ρ) to start decreasing consistently before |τ | is at least as large as a constant times
|δ|. Since δ varies within (−cr0 /q, cr0 /q), this explains why Tq is taken inversely
proportional to q in the above. As we will work with r0 ≥ 150000, we also see that we
have little margin for maneuver: we want Fδ (ρ) to be extremely small already for, say,
|τ | ≥ 80|δ|. We also have a Scylla-and-Charybdis situation, courtesy of the uncertainty
principle: roughly speaking, Fδ (ρ) cannot decrease faster than exponentially on |τ |/|δ|
both for |δ| ≤ 1 and for δ large.
The most delicate case is that of $\delta$ large, since then $|\tau|/|\delta|$ is small. It turns out we can manage to get decay that is much faster than exponential for $\delta$ large, while no slower than exponential for $\delta$ small. This we will achieve by working with smoothing functions based on the (one-sided) Gaussian $\eta_\heartsuit(t) = e^{-t^2/2}$.
The Mellin transform of the twisted Gaussian $e(\delta t)e^{-t^2/2}$ is a parabolic cylinder function $U(a, z)$ with $z$ purely imaginary. Since fully explicit estimates for $U(a, z)$, $z$ imaginary, have not been worked out in the literature, we will have to derive them ourselves.
Once we have fully explicit estimates for the Mellin transform of the twisted Gaussian, we are able to use essentially any smoothing function based on the Gaussian $\eta_\heartsuit(t) = e^{-t^2/2}$. As we already saw, we can and will consider smoothing functions obtained by convolving the twisted Gaussian with another function and also functions obtained by multiplying the twisted Gaussian with another function. All we need to do is use an explicit formula of the right kind, that is, a formula that does not assume too much about the smoothing function or the region of holomorphy of its Mellin transform, but still gives very good error terms, with simple expressions.
All results here will be based on a single, general explicit formula (Lem. 9.1.1) valid
for all our purposes. The contribution of the zeros in the critical strip can be handled in
a unified way (Lemmas 9.1.3 and 9.1.4). All that has to be done for each smoothing
function is to bound a simple integral (in (9.24)). We then apply a finite verification of
GRH and are done.
Chapter 8
The Mellin transform of the twisted Gaussian
Our aim in this chapter is to give fully explicit, yet relatively simple bounds for the Mellin transform $F_\delta(\rho)$ of $e(\delta t)\eta_\heartsuit(t)$, where $\eta_\heartsuit(t) = e^{-t^2/2}$ and $\delta$ is arbitrary. The rapid decay that results will establish that the Gaussian $\eta_\heartsuit$ is a very good choice for a smoothing, particularly when the smoothing has to be twisted by an additive character $e(\delta t)$.
The Gaussian smoothing has been used before in number theory; see, notably,
Heath-Brown’s well-known paper on the fourth power moment of the Riemann zeta
function [HB79]. What is new here is that we will derive fully explicit bounds on the
Mellin transform of the twisted Gaussian. This means that the Gaussian smoothing will
be a real option in explicit work on exponential sums in number theory and elsewhere
from now on.¹
¹ There has also been work using the Gaussian after a logarithmic change of variables; see, in particular, [Leh66]. In that case, the Mellin transform is simply a Gaussian (as in, e.g., [MV07, Ex. XII.2.9]). However, for $\delta$ non-zero, the Mellin transform of a twist $e(\delta t)e^{-(\log t)^2/2}$ decays very slowly, and thus would not be useful for our purposes, or, in general, for most applications in which GRH is not assumed.
Theorem 8.0.1. Let $f_\delta(t) = e^{-t^2/2}e(\delta t)$, $\delta \in \mathbb{R}$. Let $F_\delta$ be the Mellin transform of $f_\delta$. Let $s = \sigma + i\tau$, $\sigma \ge 0$, $\tau \ne 0$. Let $\ell = -2\pi\delta$. Then, if $\mathrm{sgn}(\delta) \ne \mathrm{sgn}(\tau)$ and $\delta \ne 0$,
\[ |F_\delta(s)| \le |\Gamma(s)|e^{\frac{\pi}{2}\tau}e^{-E(\rho)\tau}\cdot\begin{cases} c_{1,\sigma,\tau}/\tau^{\sigma/2} & \text{for } \rho \text{ arbitrary},\\ c_{2,\sigma,\tau}/\ell^{\sigma} & \text{for } \rho \le 3/2, \end{cases} \tag{8.1} \]
where $\rho = 4\tau/\ell^2$,
\[ E(\rho) = \frac{1}{2}\arccos\frac{1}{\upsilon(\rho)} - \frac{2(\upsilon(\rho) - 1)}{\rho}, \]
!σ/2 √
2−1
− τ
1 1 2 e 2
c1,σ,τ = 1 + 2 4 + σ
1 + sin2 π8 tan π8
2 (8.2)
q
1 1
sec 2π5 e − τ6
c2,σ,τ = 1 + min 2σ+ 2 , σ + √ .
2 sin π5 (1/ 3)σ
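and
\[ \upsilon(\rho) = \sqrt{\frac{1 + \sqrt{\rho^2 + 1}}{2}}. \]
If $\mathrm{sgn}(\delta) = \mathrm{sgn}(\tau)$ or $\delta = 0$,
\[ |F_\delta(s)| \le |x_0|^{-\sigma}\cdot e^{-\frac{1}{2}\ell^2}|\Gamma(s)|e^{\frac{\pi}{2}|\tau|}\cdot\left(\left(1 + \frac{\pi}{2^{3/2}}\right)e^{-\frac{\pi}{4}|\tau|} + \frac{1}{2}e^{-\pi|\tau|}\right), \tag{8.3} \]
where
\[ |x_0| \ge \begin{cases} 0.51729\sqrt{\tau} & \text{for } \rho \text{ arbitrary},\\ 0.84473\frac{|\tau|}{|\ell|} & \text{for } \rho \le 3/2. \end{cases} \tag{8.4} \]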
As we shall see, the choice of smoothing function $\eta(t) = e^{-t^2/2}$ can be easily motivated by the method of stationary phase, but the problem is actually solved by the saddle-point method. One of the challenges here is to keep all expressions explicit and practical.
(In particular, the more critical estimate, (8.1), is optimal up to a constant depending
on σ; the constants we give will be good rather than optimal.)
The expressions in Thm. 8.0.1 can be easily simplified further, especially if one is
ready to introduce some mild constraints and make some sacrifices in the main term.
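Corollary 8.0.2. Let $f_\delta(t) = e^{-t^2/2}e(\delta t)$, $\delta \in \mathbb{R}$. Let $F_\delta$ be the Mellin transform of $f_\delta$. Let $s = \sigma + i\tau$, where $\sigma \in [0, 1]$ and $|\tau| \ge 20$. Then, for $0 \le k \le 2$,
\[ |F_\delta(s + k)| + |F_\delta((1 - s) + k)| \le \begin{cases} \kappa_{k,0}\left(\frac{|\tau|}{|\ell|}\right)^k e^{-0.1065\left(\frac{2|\tau|}{|\ell|}\right)^2} & \text{if } 4|\tau|/\ell^2 < 3/2,\\ \kappa_{k,1}|\tau|^{k/2}e^{-0.1598|\tau|} & \text{if } 4|\tau|/\ell^2 \ge 3/2, \end{cases} \]
where
\[ \kappa_{0,0} \le 3.001, \quad \kappa_{1,0} \le 4.903, \quad \kappa_{2,0} \le 7.96, \]
\[ \kappa_{0,1} \le 3.286, \quad \kappa_{1,1} \le 4.017, \quad \kappa_{2,1} \le 5.13. \]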
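The bound of Corollary 8.0.2 is simple enough to be turned into a small function, which makes the two regimes easy to compare. The sketch below uses the stated upper bounds for the constants $\kappa_{k,j}$ and takes $\ell = -2\pi\delta$ as in Theorem 8.0.1. Two choices here are ours and should be read as assumptions: the function name `corollary_bound`, and the convention that $\ell = 0$ falls in the second case (since then $4|\tau|/\ell^2$ is infinite).

```python
import math

# Upper bounds for kappa_{k,j} as stated in the corollary.
KAPPA = {
    (0, 0): 3.001, (1, 0): 4.903, (2, 0): 7.96,
    (0, 1): 3.286, (1, 1): 4.017, (2, 1): 5.13,
}


def corollary_bound(k, tau, delta):
    """Upper bound for |F_delta(s+k)| + |F_delta((1-s)+k)|, sigma in [0,1], |tau| >= 20."""
    if k not in (0, 1, 2):
        raise ValueError("k must be 0, 1 or 2")
    if abs(tau) < 20:
        raise ValueError("the corollary assumes |tau| >= 20")
    ell = -2.0 * math.pi * delta
    a = abs(tau)
    if ell != 0.0 and 4.0 * a / (ell * ell) < 1.5:
        r = a / abs(ell)
        return KAPPA[(k, 0)] * r ** k * math.exp(-0.1065 * (2.0 * r) ** 2)
    return KAPPA[(k, 1)] * a ** (k / 2.0) * math.exp(-0.1598 * a)


untwisted = corollary_bound(0, 20.0, 0.0)                       # second case
twisted = corollary_bound(0, 20.0, 10.0 / (2.0 * math.pi))      # |ell| = 10, first case
print(untwisted, twisted)
```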
We are considering $F_\delta(s + k)$, and not just $F_\delta(s)$, because bounding $F_\delta(s + k)$ enables us to work with smoothing functions equal to or based on $t^ke^{-t^2/2}$. Clearly, we can easily derive bounds with $k$ arbitrary from Thm. 8.0.1. It is just that we will
8.1. HOW TO CHOOSE A SMOOTHING FUNCTION? 145
In other words, if sgn(τ ) 6= sgn(δ) and δ is not too small, asking that Fδ (σ + iτ )
decay rapidly as |τ | → ∞ amounts to asking that η(t) decay rapidly as t → 0. Thus,
if we ask for Fδ (σ + iτ ) to decay rapidly as |τ | → ∞ for all moderate δ, we are
requesting that
4] does not make all constants explicit. The constants and trivial-zero terms were fully worked out for
q = 1 by [Wig20] (cited in [MV07, Exercise 12.1.1.8(c)]; the sign of hypκ,q (z) there seems to be off). As
was pointed out by Landau (see [Har66, p. 628]), [HL22] seems to neglect the effect of the zeros ρ with
ℜ(ρ) = 0, ℑ(ρ) ≠ 0 for χ non-primitive. (The author thanks R. C. Vaughan for this information and the
references.)
8.2. THE TWISTED GAUSSIAN: OVERVIEW AND SETUP 147
for $\Re(a) > -1/2$; the function can be extended to all $a, z \in \mathbb{C}$ either by analytic continuation or by other integral representations ([AS64, §19.5], [Tem10, §12.5(i)]). Hence
\[ F_\delta(s) = e^{(\pi i\delta)^2}\Gamma(s)U\left(s - \frac{1}{2}, -2\pi i\delta\right). \tag{8.9} \]
The second argument of $U$ is purely imaginary; it would be otherwise if a Gaussian of non-zero mean were chosen.
Let us briefly discuss the state of knowledge up to date on Mellin transforms of “twisted” Gaussian smoothings, that is, $e^{-t^2/2}$ multiplied by an additive character $e(\delta t)$. As we have just seen, these Mellin transforms are precisely the parabolic cylinder functions $U(a, z)$.
The function U (a, z) has been well-studied for a and z real; see, e.g., [Tem10].
Less attention has been paid to the more general case of a and z complex. The most
notable exception is by far the work of Olver [Olv58], [Olv59], [Olv61], [Olv65]; he
gave asymptotic series for U (a, z), a, z ∈ C. These were asymptotic series in the sense
of Poincaré, and thus not in general convergent; they would solve our problem if and
only if they came with error term bounds. Unfortunately, it would seem that all fully
explicit error terms in the literature are either for a and z real, or for a and z outside
our range of interest (see both Olver’s work and [TV03].) The bounds in [Olv61]
involve non-explicit constants. Thus, we will have to find expressions with explicit
error bounds ourselves. Our case is that of a in the critical strip, z purely imaginary.
There is one important difference between the approach we will follow here and that in [Hela]. In [Hela], the integral (8.8) was estimated by a direct application of the saddle-point method. Here, following a suggestion of N. Temme, we will use the identity
\[ U(a, z) = \frac{e^{\frac{1}{4}z^2}}{\sqrt{2\pi}\,i}\int_{c-i\infty}^{c+i\infty} e^{-zu + \frac{u^2}{2}}u^{-a-\frac{1}{2}}\,du \tag{8.10} \]
(see, e.g., [OLBC10, (12.5.6)]; $c > 0$ is arbitrary). Together, (8.9) and (8.10) give us that
\[ F_\delta(s) = \frac{e^{-2\pi^2\delta^2}\Gamma(s)}{\sqrt{2\pi}\,i}\int_{c-i\infty}^{c+i\infty} e^{2\pi i\delta u + \frac{u^2}{2}}u^{-s}\,du. \tag{8.11} \]
Estimating the integral in (8.11) turns out to be a somewhat cleaner task than estimating (8.8). The overall procedure, however, is in essence the same in both cases.
We write
\[ \phi(u) = -\frac{u^2}{2} - (2\pi i\delta)u + i\tau\log u \tag{8.12} \]
for $u$ real or complex, so that the integral in (8.11) equals
\[ I(s) = \int_{c-i\infty}^{c+i\infty} e^{-\phi(u)}u^{-\sigma}\,du. \tag{8.13} \]
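\[
\begin{aligned}
\phi(u_0) &= -\frac{i\ell u_0 + i\tau}{2} + i\ell u_0 + i\tau\log u_0\\
&= \frac{i\ell}{2}u_0 + i\tau\log\frac{u_0}{\sqrt{e}}.
\end{aligned}
\tag{8.16}
\]
\[ \phi''(u_0) = -\frac{1}{u_0^2}\left(u_0^2 + i\tau\right) = -\frac{1}{u_0^2}(i\ell u_0 + 2i\tau). \tag{8.17} \]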
Assign the names $u_{0,+}$, $u_{0,-}$ to the roots in (8.15) according to the sign in front of the square-root (where the square-root is defined so as to have its argument in the interval $(-\pi/2, \pi/2]$). We will actually have to pay attention just to $u_{0,+}$, since, unlike $u_{0,-}$, it lies on the right half of the plane, where our contour of integration also lies. We remark that
\[ u_{0,+} = \frac{i\ell + |\ell|\sqrt{-1 + \frac{4i\tau}{\ell^2}}}{2} = \frac{\ell}{2}\left(i \pm \sqrt{-1 + \frac{4\tau}{\ell^2}i}\right) \tag{8.18} \]
where the sign $\pm$ is $+$ if $\ell > 0$ and $-$ if $\ell < 0$. If $\ell = 0$, then $u_{0,+} = (1/\sqrt{2} + i/\sqrt{2})\sqrt{\tau}$.
We can assume without loss of generality that $\tau \ge 0$. We will find it convenient to assume $\tau > 0$, since we can deal with $\tau = 0$ simply by letting $\tau \to 0^+$.
8.3. THE SADDLE POINT 149
We should start by determining $u_{0,+}$ explicitly, both in rectangular and polar coordinates. For one thing, we will need to estimate the integrand in (8.13) for $u = u_{0,+}$. The absolute value of the integrand is then $\left|e^{-\phi(u_{0,+})}u_{0,+}^{-\sigma}\right| = |u_{0,+}|^{-\sigma}e^{-\Re\phi(u_{0,+})}$, and, by (8.16),
\[ \Re\phi(u_{0,+}) = -\frac{\ell}{2}\Im(u_{0,+}) - \arg(u_{0,+})\tau. \tag{8.19} \]
If $\ell = 0$, we already know that $\Re(u_{0,+}) = \Im(u_{0,+}) = \sqrt{\tau/2}$, $|u_{0,+}| = \sqrt{\tau}$ and $\arg u_{0,+} = \pi/4$. Assume from now on that $\ell \ne 0$.
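We will use the expression for $u_{0,+}$ in (8.18). Solving a quadratic equation, we see that
\[ \sqrt{-1 + \frac{4\tau}{\ell^2}i} = \sqrt{\frac{j(\rho) - 1}{2}} + i\sqrt{\frac{j(\rho) + 1}{2}}, \tag{8.20} \]
where $j(\rho) = (1 + \rho^2)^{1/2}$. Hence
\[ \Re(u_{0,+}) = \pm\frac{\ell}{2}\sqrt{\frac{j(\rho) - 1}{2}}, \qquad \Im(u_{0,+}) = \frac{\ell}{2}\left(1 \pm \sqrt{\frac{j(\rho) + 1}{2}}\right). \tag{8.21} \]
Here and in what follows, the sign $\pm$ is $+$ if $\ell > 0$ and $-$ if $\ell < 0$. (Notice that $\Re(u_{0,+})$ and $\Im(u_{0,+})$ are always positive, except for $\tau = \ell = 0$, in which case $\Re(u_{0,+}) = \Im(u_{0,+}) = 0$.) By (8.21),
\[
\begin{aligned}
|u_{0,+}| &= \frac{|\ell|}{2}\cdot\left|\sqrt{\frac{-1 + j(\rho)}{2}} + \left(1 \pm \sqrt{\frac{1 + j(\rho)}{2}}\right)i\right|\\
&= \frac{|\ell|}{2}\sqrt{\frac{-1 + j(\rho)}{2} + \frac{1 + j(\rho)}{2} + 1 \pm 2\sqrt{\frac{1 + j(\rho)}{2}}}\\
&= \frac{|\ell|}{2}\sqrt{1 + j(\rho) \pm 2\sqrt{\frac{1 + j(\rho)}{2}}} = \frac{|\ell|}{\sqrt{2}}\sqrt{\upsilon(\rho)^2 \pm \upsilon(\rho)},
\end{aligned}
\tag{8.22}
\]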
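where $\upsilon(\rho) = \sqrt{(1 + j(\rho))/2}$. We now compute the argument of $u_{0,+}$:
\[
\begin{aligned}
\arg(u_{0,+}) &= \arg\left(\ell\left(i \pm \sqrt{-1 + i\rho}\right)\right)\\
&= \arg\left(\sqrt{\frac{-1 + j(\rho)}{2}} + i\left(\pm 1 + \sqrt{\frac{1 + j(\rho)}{2}}\right)\right)\\
&= \arcsin\frac{\pm 1 + \sqrt{\frac{1 + j(\rho)}{2}}}{\sqrt{1 + j(\rho) \pm 2\sqrt{\frac{1 + j(\rho)}{2}}}} = \arcsin\frac{\sqrt{\pm 1 + \sqrt{\frac{1 + j(\rho)}{2}}}}{\sqrt{2\sqrt{\frac{1 + j(\rho)}{2}}}}\\
&= \arcsin\sqrt{\frac{1}{2}\left(1 \pm \sqrt{\frac{2}{1 + j(\rho)}}\right)} = \frac{\pi}{2} - \frac{1}{2}\arccos\left(\pm\sqrt{\frac{2}{1 + j(\rho)}}\right)
\end{aligned}
\tag{8.23}
\]
(by $\cos(\pi - 2\theta) = -\cos 2\theta = 2\sin^2\theta - 1$). Thus
\[ \arg(u_{0,+}) = \begin{cases} \frac{\pi}{2} - \frac{1}{2}\arccos\frac{1}{\upsilon(\rho)} = \frac{1}{2}\arccos\frac{-1}{\upsilon(\rho)} & \text{if } \ell > 0,\\ \frac{1}{2}\arccos\frac{1}{\upsilon(\rho)} & \text{if } \ell < 0. \end{cases} \tag{8.24} \]
In particular, $\arg(u_{0,+})$ lies in $[0, \pi/2]$, and is close to $\pi/2$ only when $\ell > 0$ and $\rho \to 0^+$. Here and elsewhere, we follow the convention that arcsin and arctan have image in $[-\pi/2, \pi/2]$, whereas arccos has image in $[0, \pi]$.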
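The formulas for the modulus and argument of $u_{0,+}$ can be cross-checked numerically against the defining quadratic. In the sketch below, $u_{0,+}$ is computed as the root of $u^2 - i\ell u - i\tau = 0$ taken with the principal square root, the vanishing of $\phi'(u) = -u + i\ell + i\tau/u$ is verified, and the results are compared with (8.22) and (8.24). The function names and the sample pairs $(\ell, \tau)$ are ours.

```python
import cmath
import math


def upsilon(rho):
    """upsilon(rho) = sqrt((1 + sqrt(1 + rho^2)) / 2)."""
    return math.sqrt((1.0 + math.sqrt(1.0 + rho * rho)) / 2.0)


def saddle_point(ell, tau):
    """Root u_{0,+} of u^2 - i*ell*u - i*tau = 0, using the principal square root."""
    return (1j * ell + cmath.sqrt(-ell * ell + 4j * tau)) / 2.0


def phi_prime(u, ell, tau):
    """Derivative of phi(u) = -u^2/2 + i*ell*u + i*tau*log(u)."""
    return -u + 1j * ell + 1j * tau / u


def polar_from_formulas(ell, tau):
    """Modulus from (8.22) and argument from (8.24), for ell != 0 and tau > 0."""
    rho = 4.0 * tau / (ell * ell)
    v = upsilon(rho)
    sign = 1.0 if ell > 0 else -1.0
    modulus = abs(ell) / math.sqrt(2.0) * math.sqrt(v * v + sign * v)
    argument = 0.5 * math.acos(-sign / v)
    return modulus, argument


cases = [(3.0, 5.0), (-3.0, 5.0), (0.7, 40.0), (-4.0, 2.0)]
for ell, tau in cases:
    u = saddle_point(ell, tau)
    print(ell, tau, abs(u), cmath.phase(u), polar_from_formulas(ell, tau))
```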
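By (8.21),
\[
\begin{aligned}
\Re(\ell u_{0,+} + 2\tau) &= \frac{\ell^2}{2}\left(\pm\sqrt{\frac{j(\rho) - 1}{2}} + \frac{4\tau}{\ell^2}\right) = \frac{\ell^2}{2}\left(\rho \pm \sqrt{\frac{j(\rho) - 1}{2}}\right),\\
\Im(\ell u_{0,+} + 2\tau) &= \frac{\ell^2}{2}\left(1 \pm \sqrt{\frac{j(\rho) + 1}{2}}\right).
\end{aligned}
\]
Figure 8.1: $\arg(w) - \pi/2$ as a function of $\rho$ for $\ell < 0$
Figure 8.2: $\arg(w) - \pi/2$ as a function of $\rho$ for $\ell \ge 0$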
There is nothing wrong in using plots here to get an idea of the behavior of arg(w),
since, at any rate, the direction of steepest descent will play only an advisory role in
our choices. See Figures 8.1 and 8.2.
(8.2)), this would be a good choice. However, the vertical line has the defect of going
too close to the origin when ρ → 0.
Instead, we will let $L$ consist of three segments: (a) the straight vertical ray
\[ \{(x_0, y) : y \ge y_0\}, \]
where $x_0 = \Re u_{0,+} \ge 0$, $y_0 = \Im u_{0,+} > 0$; (b) the straight segment going downwards and to the right from $u_{0,+}$ to the $x$-axis, forming an angle of $\pi/2 - \beta$ (where $\beta > 0$ will be determined later) with the $x$-axis at a point $(x_1, 0)$; (c) the straight vertical ray $\{(x_1, y) : y \le 0\}$. Let us call these three segments $L_1$, $L_2$, $L_3$. Shifting the contour in (8.13), we obtain
\[ I = \int_L e^{-\phi(u)}u^{-\sigma}\,du, \]
and so $|I| \le I_1 + I_2 + I_3$, where
\[ I_j = \int_{L_j}\left|e^{-\phi(u)}u^{-\sigma}\right||du|. \tag{8.29} \]
As we shall see, we have chosen the segments $L_j$ so that each of the three integrals $I_j$ will be easy to bound.
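Let us start with $I_1$. Since $\sigma \ge 0$,
\[ I_1 \le |u_{0,+}|^{-\sigma}\int_{y_0}^\infty e^{-\Re\phi(x_0 + iy)}\,dy, \]
where, by (8.12),
\[ \Re\phi(x + iy) = \frac{y^2 - x^2}{2} - \ell y - \tau\arg(x + iy). \tag{8.30} \]
Let us expand the expression on the right of (8.30) for $x = x_0$ and $y$ around $y_0 = \Im u_{0,+} > 0$. The constant term is
\[
\begin{aligned}
\Re\phi(u_{0,+}) &= -\frac{\ell}{2}y_0 - \tau\arg(u_{0,+}) = -\frac{\ell^2}{4}(1 + \upsilon(\rho)) - \frac{\tau}{2}\arccos\frac{-1}{\upsilon(\rho)}\\
&= -\left(\frac{1 + \upsilon(\rho)}{\rho} + \frac{1}{2}\arccos\frac{-1}{\upsilon(\rho)}\right)\tau,
\end{aligned}
\tag{8.31}
\]
where we are using (8.19), (8.21) and (8.24).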
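The closed form (8.31) for the constant term can be compared with a direct evaluation of $\Re\phi$ at the saddle point. The sketch below takes $\ell > 0$, which is the case in which (8.31) is written, uses $\phi(u) = -u^2/2 + i\ell u + i\tau\log u$ with the principal branch of the logarithm, and computes $u_{0,+}$ from the quadratic $u^2 - i\ell u - i\tau = 0$. Names and sample values are ours.

```python
import cmath
import math


def phi(u, ell, tau):
    """phi(u) = -u^2/2 + i*ell*u + i*tau*log(u), principal branch of log."""
    return -u * u / 2.0 + 1j * ell * u + 1j * tau * cmath.log(u)


def saddle_point(ell, tau):
    """Root u_{0,+} of u^2 - i*ell*u - i*tau = 0, using the principal square root."""
    return (1j * ell + cmath.sqrt(-ell * ell + 4j * tau)) / 2.0


def re_phi_closed_form(ell, tau):
    """Right side of (8.31), written for ell > 0."""
    rho = 4.0 * tau / (ell * ell)
    v = math.sqrt((1.0 + math.sqrt(1.0 + rho * rho)) / 2.0)
    return -((1.0 + v) / rho + 0.5 * math.acos(-1.0 / v)) * tau


samples = [(3.0, 5.0), (0.7, 40.0), (10.0, 25.0)]
for ell, tau in samples:
    u0 = saddle_point(ell, tau)
    print(ell, tau, phi(u0, ell, tau).real, re_phi_closed_form(ell, tau))
```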
The linear term vanishes because u0,+ is a saddle-point (and thus a local extremum
on L). It remains to estimate the quadratic term. Now, in (8.30), the term arg(x + iy)
equals arctan(y/x), whose quadratic term we should now examine – but, instead, we
are about to see that we can bound it trivially. In general, for $t_0, t \in \mathbb{R}$ and $f \in C^2$,
\[ f(t) = f(t_0) + f'(t_0)\cdot(t-t_0) + \int_{t_0}^{t}\int_{t_0}^{r} f''(s)\,ds\,dr. \tag{8.32} \]
Now, $\arctan''(s) = -2s/(s^2+1)^2$, and this is negative for $s > 0$ and obeys
\[ \arctan''(-s) = -\arctan''(s) \]
154 CHAPTER 8. THE MELLIN TRANSFORM OF THE TWISTED GAUSSIAN
– namely, (y − y0 )2 /2 – and ignore the quadratic term coming from arg(x + iy). Thus,
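\[ \Re\phi(x_0+iy) \ge \frac{(y-y_0)^2}{2} + \Re\phi(u_{0,+}) \tag{8.34} \]
for $y \ge -y_0$, and, in particular, for $y \ge y_0$. Hence,
\[ \int_{y_0}^{\infty} e^{-\Re\phi(x_0+iy)}\,dy \le e^{-\Re\phi(u_{0,+})}\int_{y_0}^{\infty} e^{-\frac12(y-y_0)^2}\,dy = \sqrt{\pi/2}\cdot e^{-\Re\phi(u_{0,+})}. \tag{8.35} \]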
Notice that, once we choose to use the approximation (8.33), the vertical direction is
actually optimal. (In turn, the fact that the direction of steepest descent is close to
vertical shows us that we are not losing much by using the approximation (8.33).)
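As for $|u_{0,+}|^{-\sigma}$, we will estimate it by the easy bound
\[ |u_{0,+}| = \frac{\ell}{\sqrt2}\sqrt{\upsilon^2+\upsilon} \ge \frac{\ell}{\sqrt2}\max\left(\sqrt{\frac{\rho}{2}}, \sqrt2\right) = \max(\sqrt{\tau}, \ell), \tag{8.36} \]
where we use (8.22).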
Let us now bound I2 . As we already said, the linear term at u0,+ vanishes. Let
u◦ be the point at which L2 meets the line normal to it through the origin. We must
take care that the angle formed by the origin, u0,+ and u◦ be no larger than the angle
formed by the origin, (x1 , 0) and u0 ; this will ensure that we are in the range in which
the approximation (8.33) is valid (namely, t ≥ −t0 , where t0 = tan α0 ). The first
angle is π/2 + β − arg u0,+ , whereas the second angle is π/2 − β. Hence, it is enough
to set β ≤ (arg u0,+ )/2. Then we obtain from (8.12) and (8.33) that
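\[ \Re\phi(u) \ge \Re\phi(u_{0,+}) - \Re\frac{(u-u_{0,+})^2}{2}. \tag{8.37} \]
If we let $s = |u - u_{0,+}|$, we see that
\[ \Re\frac{(u-u_{0,+})^2}{2} = \frac{s^2}{2}\cos\left(2\cdot\left(\frac{\pi}{2}-\beta\right)\right) = -\frac{s^2}{2}\cos 2\beta. \]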
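Hence,
\[
\begin{aligned}
I_2 &\le |u_\circ|^{-\sigma}\int_{L_2} e^{-\Re\phi(u)}\,|du|\\
&< |u_\circ|^{-\sigma}\int_0^{\infty} e^{-\Re\phi(u_{0,+}) - \frac{s^2}{2}\cos 2\beta}\,ds = |u_\circ|^{-\sigma}\sqrt{\frac{\pi}{2\cos 2\beta}}\,e^{-\Re\phi(u_{0,+})}.
\end{aligned}
\tag{8.38}
\]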
Since arg u0 = arg u0,+ − β, we see that, by (8.21),
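\[
\begin{aligned}
|u_\circ| &= \Re\left((x_0+iy_0)(\cos\beta - i\sin\beta)\right)\\
&= \frac{\ell}{2}\left(\sqrt{\frac{j-1}{2}}\cos\beta + \left(1+\sqrt{\frac{j+1}{2}}\right)\sin\beta\right).
\end{aligned}
\tag{8.39}
\]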
8.4. THE INTEGRAL OVER THE CONTOUR 155
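for $y \le 0$. Hence
\[
\begin{aligned}
I_3 &\le |x_1|^{-\sigma}\int_{L_3} e^{-\Re\phi(u)}\,|du| \le |x_1|^{-\sigma} e^{-\Re\phi(x_1)}\int_{-\infty}^{0} e^{-y^2/2}\,dy\\
&\le |x_1|^{-\sigma}\sqrt{\frac{\pi}{2}}\cdot e^{-\frac{1-\tan^2\beta}{4}\tau}\,e^{-\Re\phi(u_{0,+})}.
\end{aligned}
\]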
√
(since (1 − tan2 π/8)/4 = ( 2 − 1)/2) and, when ρ ≤ 3/2,
q p
sec 2π − τ6
1
|I| ≤ 1 + min 2σ+ 2 ,
5
+
e
√ · π/2 e−<φ(u0,+ ) .
π σ σ `σ
sin 5 (1/ 3)
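\[ E(\rho) = \frac12\arccos\frac{1}{\upsilon(\rho)} - \frac{\upsilon(\rho)-1}{\rho}, \tag{8.40} \]
so that
\[ -\Re\phi(u_{0,+}) = \left(\frac{1+\upsilon(\rho)}{\rho} + \frac12\arccos\frac{-1}{\upsilon(\rho)}\right)\tau = \left(\frac{\pi}{2} - E(\rho) + \frac{2}{\rho}\right)\tau. \]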
To finish, we just need to apply (8.11). It makes sense to group together $\Gamma(s)e^{\frac{\pi}{2}\tau}$, since it is bounded on the critical line (by the classical formula $|\Gamma(1/2+i\tau)| = \sqrt{\pi/\cosh\pi\tau}$, as in [MV07, Exer. C.1(b)]), and, in general, of slow growth on bounded strips. Using (8.11), and noting that $2\pi^2\delta^2 = \ell^2/2 = (2/\rho)\cdot\tau$, we obtain
\[
|F_\delta(s)| \le |\Gamma(s)|e^{\frac{\pi}{2}\tau}e^{-E(\rho)\tau}\cdot
\begin{cases}
c_{1,\sigma,\tau}/\tau^{\sigma/2} & \text{for $\rho$ arbitrary,}\\
c_{2,\sigma,\tau}/\ell^{\sigma} & \text{for $\rho \le 3/2$.}
\end{cases}
\tag{8.41}
\]
where !σ/2 √
2−1
− τ
1 1 2 e 2
c1,σ,τ = 1 + 24 + σ
2 1 + sin2 π8 tan π8
q (8.42)
1 1
sec 2π5 e − τ6
c2,σ,τ = 1 + min 2σ+ 2 , σ + √ .
2 sin π5 (1/ 3)σ
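and so, for $\rho \ge 3/2$, $(j(\rho)-1)/\rho$ is minimal at $\rho = 3/2$, where it takes the value $(\sqrt{13}-2)/3$. Hence
\[ x_0 = \frac{|\ell|}{2}\sqrt{\frac{j(\rho)-1}{2}} \ge \frac{|\ell|\sqrt{\rho}}{2}\sqrt{\frac{\sqrt{13}-2}{6}} = \sqrt{\frac{\sqrt{13}-2}{6}}\cdot\sqrt{\tau} \ge 0.51729\sqrt{\tau}. \tag{8.46} \]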
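We now sum $I_1$, $I_2$ and $I_3$, and then use (8.11); we obtain that, when $\ell < 0$ and $\tau \ge 0$,
\[
\begin{aligned}
|F_\delta(s)| &\le \frac{e^{-2\pi^2\delta^2}|\Gamma(s)|}{\sqrt{2\pi}}\left|\int_L e^{-\phi(u)}u^{-\sigma}\,du\right|\\
&\le |x_0|^{-\sigma}\left(\left(1+\frac{\pi}{2^{3/2}}\right)e^{-\Re\phi(u_{0,+})} + \frac12 e^{-\frac{\pi}{2}\tau}\right)e^{-\frac12\ell^2}|\Gamma(s)|.
\end{aligned}
\tag{8.47}
\]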
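By (8.19), (8.21) and (8.24),
\[ -\Re(\phi(u_{0,+})) = \frac{\ell^2}{4}(1-\upsilon(\rho)) + \frac{\tau}{2}\arccos\frac{1}{\upsilon(\rho)} < \frac{\tau}{2}\arccos\frac{1}{\upsilon(\rho)} \le \frac{\pi}{4}\tau. \]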
We conclude that, when sgn(ℓ) ≠ sgn(τ) (i.e., sgn(δ) = sgn(τ)),
\[ |F_\delta(s)| \le |x_0|^{-\sigma}\cdot e^{-\frac12\ell^2}|\Gamma(s)|e^{\frac{\pi}{2}|\tau|}\cdot\left(\left(1+\frac{\pi}{2^{3/2}}\right)e^{-\frac{\pi}{4}|\tau|} + \frac12 e^{-\pi|\tau|}\right), \]
where $x_0$ can be bounded as in (8.45) and (8.46). Here, as before, we reduce the case τ < 0 to the case τ > 0 by reflection. This concludes the proof of Theorem 8.0.1.
8.5 Conclusions
We have obtained bounds on $|F_\delta(s)|$ for sgn(δ) ≠ sgn(τ) (8.41) and for sgn(δ) = sgn(τ) (8.47). Our task is now to simplify them.
First, let us look at the exponent E(ρ), defined as in (8.2). Its plot can be seen in Figure 8.5. We claim that
\[
E(\rho) \ge
\begin{cases}
0.1598 & \text{if } \rho \ge 1.5,\\
0.1065\rho & \text{if } \rho < 1.5.
\end{cases}
\tag{8.48}
\]
This is so for ρ ≥ 1.5 because E(ρ) is increasing in ρ and E(1.5) = 0.15982.... The case ρ < 1.5 is a little more delicate. We can easily see that $\arccos(1 - t^2/2) \ge t$ for $0 \le t \le 2$ (since the derivative of the left side is $1/\sqrt{1-t^2/4}$, which is always ≥ 1). We also have
\[ 1 + \frac{\rho^2}{2} - \frac{\rho^4}{8} \le j(\rho) \le 1 + \frac{\rho^2}{2} \]
for $0 \le \rho \le \sqrt{8}$, and so
\[ 1 + \frac{\rho^2}{8} - \frac{5\rho^4}{128} \le \upsilon(\rho) \le 1 + \frac{\rho^2}{8} \]
for $0 \le \rho \le \sqrt{32/5}$; this, in turn, gives us that $1/\upsilon(\rho) \le 1 - \rho^2/8 + 7\rho^4/128$ (again for $0 \le \rho \le \sqrt{32/5}$), and so $1/\upsilon(\rho) \le 1 - (1 - 7/64)\rho^2/8$ for $0 \le \rho \le 1/2$. We conclude that
\[ \arccos\frac{1}{\upsilon(\rho)} \ge \frac12\sqrt{\frac{57}{64}}\,\rho; \]
therefore,
\[ E(\rho) \ge \frac14\sqrt{\frac{57}{64}}\,\rho - \frac{\rho}{8} > 0.11093\rho > 0.1065\rho. \]
In the remaining range 1/2 ≤ ρ ≤ 3/2, we prove that E(ρ)/ρ > 0.106551 using
the bisection method (with 20 iterations) implemented by means of interval arithmetic.
This concludes the proof of (8.48).
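The claim (8.48) can be sanity-checked in floating point. The sketch below is not the interval-arithmetic bisection used in the proof; it only samples a grid. It assumes the definitions $j(\rho) = \sqrt{1+\rho^2}$ and $\upsilon(\rho) = \sqrt{(1+j(\rho))/2}$, which are stated earlier in the chapter and are inferred here from the bounds on j and υ used above; the function names are illustrative.

```python
import math

def j(rho):
    # assumed definition, consistent with 1 + rho^2/2 - rho^4/8 <= j <= 1 + rho^2/2
    return math.sqrt(1.0 + rho * rho)

def upsilon(rho):
    # assumed definition, consistent with upsilon <= 1 + rho^2/8
    return math.sqrt((1.0 + j(rho)) / 2.0)

def E(rho):
    # the exponent E(rho) as in (8.40)
    u = upsilon(rho)
    return 0.5 * math.acos(1.0 / u) - (u - 1.0) / rho

e15 = E(1.5)
# non-rigorous grid check of E(rho) >= 0.1065 * rho on (0, 1.5]
grid = [0.01 * k for k in range(1, 151)]
min_ratio = min(E(r) / r for r in grid)
print(e15, min_ratio)
```

The smallest sampled ratio occurs at the right endpoint ρ = 1.5, which is why the constant 0.1065 cannot be improved much in this range.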
Assume from this point onwards that |τ | ≥ 20. Let us show that the contribution
of (8.3) is negligible relative to that of (8.1). Indeed,
\[ \left(1 + \frac{\pi}{2^{3/2}}\right)e^{-\frac{\pi}{4}|\tau|} + \frac12 e^{-\pi|\tau|} \le \frac{7.8}{10^6}\,e^{-0.1598\tau}. \]
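As a quick plausibility check of the last inequality, one can evaluate both sides on a grid of values τ ≥ 20. This is a floating-point sketch only, not a proof; the grid and the helper names are arbitrary choices.

```python
import math

A = 1.0 + math.pi / 2 ** 1.5

def lhs(tau):
    # (1 + pi/2^(3/2)) e^{-pi tau/4} + (1/2) e^{-pi tau}
    return A * math.exp(-math.pi * tau / 4) + 0.5 * math.exp(-math.pi * tau)

def rhs(tau):
    # (7.8 / 10^6) e^{-0.1598 tau}
    return 7.8e-6 * math.exp(-0.1598 * tau)

taus = [20 + 0.5 * k for k in range(400)]
holds = all(lhs(t) <= rhs(t) for t in taus)
print(holds, lhs(20.0) / rhs(20.0))
```

The ratio of the two sides decays like $e^{-(\pi/4 - 0.1598)\tau}$, so the binding case is τ = 20.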
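It is useful to note that $e^{-\ell^2/2} = e^{-2\tau/\rho}$, and so, for $\sigma \le k+1$ and $\rho \le 3/2$,
\[
\begin{aligned}
\frac{e^{-2\tau/\rho}}{(0.84473|\tau|/\ell)^{\sigma}} &\le \frac{e^{-40/\rho}}{\left(\frac{0.84473}{4}\rho\right)^{\sigma}\ell^{\sigma}} \le \frac{1}{\ell^{\sigma}}\left(\frac{4}{0.84473\cdot 1.5}\right)^{\sigma}\frac{e^{-80/(3t)}}{t^{\sigma}}\\
&\le \frac{1}{\ell^{\sigma}}\cdot 3.15683^{k+1}\,\frac{e^{-80/(3t)}}{t^{k+1}},
\end{aligned}
\tag{8.49}
\]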
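where $t = 2\rho/3 \le 1$. Since $e^{-c/t}/t^{k+1}$ attains its maximum at $t = c/(k+1)$,
\[ \frac{e^{-80/(3t)}}{t^{k+1}} \le \left(\frac{3(k+1)}{80}\right)^{k+1}e^{-(k+1)}, \]
whereas $|x_0|^{-\sigma}e^{-\ell^2/2} \le |x_0|^{-\sigma} \le (0.51729\sqrt{\tau})^{-\sigma}$ for $\rho \ge 3/2$.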
We conclude that, for |τ | ≥ 20 and σ ≤ 3,
(
4 1
π
τ −0.1598τ 7 σ if ρ ≤ 3/2,
|Fδ (s)| ≤ |Γ(s)|e · e
2 · 106 ` 1 (8.50)
105 τ σ/2 if ρ ≥ 3/2
where √
1/30 2
|R2 (s)| < arg s =
12|s|3 cos3 2
180|s|3
for <(s) ≥ 0. The real part of (s − 1/2) log s − s is
π σ σ
(σ − 1/2) log |s| − τ arg(s) − σ = (σ − 1/2) log |s| − τ + τ arctan −
2 |τ | |τ |
and √
1 2
+ 180|τ
e 12|τ | |3 ≤ 1.004177.
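Thus,
\[
|\Gamma(s)|e^{\frac{\pi}{2}\tau} \le |\tau|^{\sigma-1/2}\cdot
\begin{cases}
2.51868 & \text{if } 0 \le \sigma \le 1,\\
2.53596 & \text{if } 1 \le \sigma \le 2,\\
2.5881 & \text{if } 2 \le \sigma \le 3.
\end{cases}
\tag{8.52}
\]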
Let us now estimate the constants $c_{1,\sigma,\tau}$ and $c_{2,\sigma,\tau}$ in (8.2). By $|\tau| \ge 20$,
\[ e^{-\frac{\sqrt2-1}{2}\tau} \le 0.015889, \qquad e^{-\frac{\tau}{6}} \le 0.035674. \tag{8.53} \]
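where
\[
\begin{aligned}
\kappa_{0,0} &\le (4\cdot 10^{-7} + 1.94511)\cdot 2.51868\cdot\sqrt{3/8} \le 3.001,\\
\kappa_{1,0} &\le (4\cdot 10^{-7} + 3.15692)\cdot 2.53596\cdot\sqrt{3/8} \le 4.903,\\
\kappa_{2,0} &\le (4\cdot 10^{-7} + 5.02186)\cdot 2.5881\cdot\sqrt{3/8} \le 7.96,
\end{aligned}
\]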
and, similarly,
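The arithmetic behind the three bounds on $\kappa_{0,0}$, $\kappa_{1,0}$ and $\kappa_{2,0}$ is easy to reproduce. The sketch below simply multiplies out the stated factors, reading the last factor as $\sqrt{3/8}$ (an assumption about the garbled display that the numbers confirm); the variable names are illustrative.

```python
import math

root = math.sqrt(3.0 / 8.0)

# products appearing in the bounds for kappa_{0,0}, kappa_{1,0}, kappa_{2,0}
k00 = (4e-7 + 1.94511) * 2.51868 * root
k10 = (4e-7 + 3.15692) * 2.53596 * root
k20 = (4e-7 + 5.02186) * 2.5881 * root
print(k00, k10, k20)
```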
Explicit formulas
164 CHAPTER 9. EXPLICIT FORMULAS
Proof. Since (a) $\eta(t)t^{\sigma-1}$ is in $\ell_1$ for σ in an open interval containing 3/2 and (b) $\eta(t)e(\delta t)$ has bounded variation (since $\eta, \eta' \in \ell_1$, implying that the derivative of $\eta(t)e(\delta t)$ is also in $\ell_1$), the Mellin inversion formula (as in, e.g., [IK04, 4.106]) holds:
\[ \eta(n/x)e(\delta n/x) = \frac{1}{2\pi i}\int_{\frac32-i\infty}^{\frac32+i\infty} G_\delta(s)x^s n^{-s}\,ds. \]
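Since $G_\delta(s)$ is bounded for $\Re(s) = 3/2$ (by $\eta(t)t^{3/2-1} \in \ell_1$) and $\sum_n \Lambda(n)n^{-3/2}$ is bounded as well, we can change the order of summation and integration as follows:
\[
\begin{aligned}
\sum_{n=1}^{\infty}\Lambda(n)\chi(n)e(\delta n/x)\eta(n/x) &= \sum_{n=1}^{\infty}\Lambda(n)\chi(n)\cdot\frac{1}{2\pi i}\int_{\frac32-i\infty}^{\frac32+i\infty} G_\delta(s)x^s n^{-s}\,ds\\
&= \frac{1}{2\pi i}\int_{\frac32-i\infty}^{\frac32+i\infty}\sum_{n=1}^{\infty}\Lambda(n)\chi(n)G_\delta(s)x^s n^{-s}\,ds\\
&= \frac{1}{2\pi i}\int_{\frac32-i\infty}^{\frac32+i\infty} -\frac{L'(s,\chi)}{L(s,\chi)}G_\delta(s)x^s\,ds.
\end{aligned}
\tag{9.4}
\]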
(This is the way the procedure always starts: see, for instance, [HL22, Lemma 1] or,
to look at a recent standard reference, [MV07, p. 144]. We are being very scrupulous
about integration because we are working with general η.)
The first question we should ask ourselves is: up to where can we extend Gδ (s)?
Since η(t)tσ−1 is in `1 for σ in an open interval I containing [1/2, 3/2], the transform
Gδ (s) is defined for <(s) in the same interval I. However, we also know that the
transformation rule M (tf 0 (t))(s) = −s · M f (s) (see (2.10); by integration by parts)
is valid when s is in the holomorphy strip for both M (tf 0 (t)) and M f . In our case
(f (t) = η(t)e(δt)), this happens when <(s) ∈ (I − 1) ∩ I (so that both sides of the
equation in the rule are defined). Hence s · Gδ (s) (which equals s · M f (s)) can be
analytically continued to <(s) in (I − 1) ∪ I, which is an open interval containing
[−1/2, 3/2]. This implies immediately that Gδ (s) can be analytically continued to the
same region, with a possible pole at s = 0.
When does $G_\delta(s)$ have a pole at s = 0? This happens when $sG_\delta(s)$ is non-zero at s = 0, i.e., when $M(tf'(t))(0) \ne 0$ for $f(t) = \eta(t)e(\delta t)$. Now
\[ M(tf'(t))(0) = \int_0^{\infty} f'(t)\,dt = \lim_{t\to\infty} f(t) - f(0). \]
Here we were able to exchange the limit and the integral because $f'(t)t^{\sigma}$ is in $\ell_1$ for σ in a neighborhood of 0; in turn, this is true because $f'(t) = (\eta'(t) + 2\pi i\delta\eta(t))e(\delta t)$ and $\eta'(t)t^{\sigma}$ and $\eta(t)t^{\sigma}$ are both in $\ell_1$ for σ in a neighborhood of 0. In fact, we will use the easy bounds $|\eta(t)\log t|_1 \le (2/3)(|\eta(t)t^{-1/2}|_1 + |\eta(t)t^{1/2}|_1)$, $|\eta'(t)\log t|_1 \le (2/3)(|\eta'(t)t^{-1/2}|_1 + |\eta'(t)t^{1/2}|_1)$, resulting from the inequality
\[ |\log t| \le \frac23\left(t^{-\frac12} + t^{\frac12}\right), \tag{9.5} \]
valid for all t > 0.
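The inequality (9.5) bounds $|\log t|$ by $(2/3)(t^{-1/2} + t^{1/2})$. A minimal numerical sketch: writing $t = e^{2u}$ turns the ratio $|\log t|/(t^{-1/2}+t^{1/2})$ into $|u|/\cosh u$, whose maximum is slightly below 2/3. The grid below is an arbitrary choice and the check is not rigorous.

```python
import math

def ratio(t):
    # |log t| / (t^{-1/2} + t^{1/2}); (9.5) says this never exceeds 2/3
    return abs(math.log(t)) / (t ** -0.5 + t ** 0.5)

# logarithmic grid covering t from e^{-20} to e^{20}
ts = [math.exp(-20 + 0.001 * k) for k in range(40001)]
worst = max(ratio(t) for t in ts)
print(worst)
```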
We conclude that the Laurent expansion of $G_\delta(s)$ at s = 0 is
\[ G_\delta(s) = \frac{\eta(0)}{s} + c_0 + c_1 s + \dots, \tag{9.6} \]
where
where
\[ R = \mathrm{Res}_{s=0}\,\frac{L'(s,\chi)}{L(s,\chi)}G_\delta(s). \]
Of course,
\[ G_\delta(1) = M(\eta(t)e(\delta t))(1) = \int_0^{\infty}\eta(t)e(\delta t)\,dt = \widehat{\eta}(-\delta). \]
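Let us work out the Laurent expansion of $L'(s,\chi)/L(s,\chi)$ at s = 0. By the functional equation (as in, e.g., [IK04, Thm. 4.15]),
\[ \frac{L'(s,\chi)}{L(s,\chi)} = \log\frac{\pi}{q} - \frac12\psi\left(\frac{s+\kappa}{2}\right) - \frac12\psi\left(\frac{1-s+\kappa}{2}\right) - \frac{L'(1-s,\overline{\chi})}{L(1-s,\overline{\chi})}, \tag{9.8} \]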
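where $c_0 = O^*(|\eta'(t)\log t|_1 + 2\pi|\delta||\eta(t)\log t|_1)$. If $\eta(0) \ne 0$, then
\[
R = \eta(0)\left(\log\frac{2\pi}{q} + \gamma - \frac{L'(1,\chi)}{L(1,\chi)}\right) +
\begin{cases}
c_0 & \text{if } \chi(-1) = 1,\\
0 & \text{otherwise}
\end{cases}
\]
for q > 1, and
\[ R = \eta(0)\log 2\pi \]
for q = 1.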
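It is time to estimate the integral on the right side of (9.7). For that, we will need to estimate $L'(s,\chi)/L(s,\chi)$ for $\Re(s) = -1/2$ using (9.8) and (9.9).
If $\Re(z) = 3/2$, then $|t^2 + z^2| \ge 9/4$ for all real t. Hence, by [OLBC10, (5.9.15)] and [GR94, (3.411.1)],
\[
\begin{aligned}
\psi(z) &= \log z - \frac{1}{2z} - 2\int_0^{\infty}\frac{t\,dt}{(t^2+z^2)(e^{2\pi t}-1)}\\
&= \log z - \frac{1}{2z} + 2\cdot O^*\left(\int_0^{\infty}\frac{t\,dt}{\frac94(e^{2\pi t}-1)}\right)\\
&= \log z - \frac{1}{2z} + \frac89\,O^*\left(\int_0^{\infty}\frac{t\,dt}{e^{2\pi t}-1}\right)\\
&= \log z - \frac{1}{2z} + \frac89\cdot O^*\left(\frac{1}{(2\pi)^2}\Gamma(2)\zeta(2)\right)\\
&= \log z - \frac{1}{2z} + O^*\left(\frac{1}{27}\right) = \log z + O^*\left(\frac{10}{27}\right).
\end{aligned}
\tag{9.10}
\]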
Thus, in particular, ψ(1 − s) = log(3/2 − iτ) + O∗(10/27), where we write s = −1/2 + iτ. Now
π π π π
π(s + κ) e∓ 4 i− 2 τ + e± 4 i+ 2 τ
cot = ∓ π i− π τ π π = 1.
2 e 4 2 − e± 4 i+ 2 τ
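By (9.12),
\[
\begin{aligned}
\sqrt{\int_{-\frac12-i\infty}^{-\frac12+i\infty}\left|\frac{L'(s,\chi)}{L(s,\chi)}\cdot\frac1s\right|^2|ds|} &\le \sqrt{\int_{-\frac12-i\infty}^{-\frac12+i\infty}\left|\frac{\log q}{s}\right|^2|ds|}\\
&\quad + \sqrt{\int_{-\infty}^{\infty}\frac{\left|\frac12\log\left(\tau^2+\frac94\right) + 4.1396 + \log\pi\right|^2}{\frac14+\tau^2}\,d\tau}\\
&\le \sqrt{2\pi}\log q + \sqrt{226.844},
\end{aligned}
\]
where we compute the last integral numerically.1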
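Again, we use the fact that, by (2.10), $sG_\delta(s)$ is the Mellin transform of
\[ -t\frac{d(e(\delta t)\eta(t))}{dt} = -2\pi i\delta t\,e(\delta t)\eta(t) - t\,e(\delta t)\eta'(t). \tag{9.14} \]
Hence, by Plancherel (as in (2.6)) and the triangle inequality,
\[
\begin{aligned}
\sqrt{\frac{1}{2\pi}\int_{-\frac12-i\infty}^{-\frac12+i\infty}|G_\delta(s)s|^2\,|ds|} &= \sqrt{\int_0^{\infty}\left|-2\pi i\delta t\,e(\delta t)\eta(t) - t\,e(\delta t)\eta'(t)\right|^2 t^{-2}\,dt}\\
&\le 2\pi|\delta|\sqrt{\int_0^{\infty}|\eta(t)|^2\,dt} + \sqrt{\int_0^{\infty}|\eta'(t)|^2\,dt}.
\end{aligned}
\tag{9.15}
\]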
1 By a rigorous integration from τ = −100000 to τ = 100000 using VNODE-LP [Ned06], which runs
on the PROFIL/BIAS interval arithmetic package [Knü99].
9.1. A GENERAL EXPLICIT FORMULA 169
Lemma 9.1.1 leaves us with three tasks: bounding the sum of Gδ (ρ)xρ over all
non-trivial zeroes ρ with small imaginary part, bounding the sum of Gδ (ρ)xρ over all
non-trivial zeroes ρ with large imaginary part, and bounding L0 (1, χ)/L(1, χ). Let
us start with the last task: while, in a narrow sense, it is optional – in that, in the
applications we actually need (Thm. 7.1.2, Cor. 7.1.3 and Thm. 7.1.4), we will have
η(0) = 0, thus making the term L0 (1, χ)/L(1, χ) disappear – it is also very easy and
can be dealt with quickly.
Since we will be using a finite GRH check in all later applications, we might as
well use it here.
Lemma 9.1.2. Let χ be a primitive character mod q, q > 1. Assume that all non-trivial zeroes ρ = σ + it of L(s, χ) with |t| ≤ 5/8 satisfy ℜ(ρ) = 1/2. Then
\[ \left|\frac{L'(1,\chi)}{L(1,\chi)}\right| \le \frac52\log M(q) + c, \]
where $M(q) = \max_n\left|\sum_{m\le n}\chi(m)\right|$ and
\[ c = 5\log\frac{2\sqrt3}{\zeta(9/4)/\zeta(9/8)} = 15.07016\ldots. \]
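The numerical value of the constant c in Lemma 9.1.2 can be reproduced with a few lines of code. The sketch below evaluates ζ at real arguments greater than 1 by a truncated Euler-Maclaurin formula; the helper `zeta` and the cutoff `N` are illustrative choices, not taken from the text.

```python
import math

def zeta(s, N=1000):
    """Riemann zeta for real s > 1: partial sum plus Euler-Maclaurin tail terms."""
    head = sum(n ** -s for n in range(1, N))
    tail = N ** (1 - s) / (s - 1) + 0.5 * N ** -s + s * N ** (-s - 1) / 12
    return head + tail

# c = 5 log( 2 sqrt(3) / (zeta(9/4) / zeta(9/8)) )
c = 5 * math.log(2 * math.sqrt(3) * zeta(9 / 8) / zeta(9 / 4))
print(c)
```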
Proof. By a lemma of Landau’s (see, e.g., [MV07, Lemma 6.3], where the constants
are easily made explicit) based on the Borel-Carathéodory Lemma (as in [MV07,
Lemma 6.2]), any function f analytic and zero-free on a disc Cs0 ,R = {s : |s − s0 | ≤
R} of radius R > 0 around s0 satisfies
f 0 (s)
∗ 2R log M/|f (s0 )|
=O (9.16)
f (s) (R − r)2
for all s with |s − s0 | ≤ r, where 0 < r < R and M is the maximum of |f (z)| on
Cs0 ,R . Assuming L(s, χ) has no non-trivial zeros off the critical line with |=(s)| ≤ H,
where H > 1/2, we set s0 = 1/2 + H, r = H − 1/2, and let R → H − . We obtain
Since s0 = 1/2 + H, Cs0 ,H is contained in {s ∈ C : <(s) > 1/2} for any value of H.
We choose (somewhat arbitrarily) H = 5/8.
Obviously, 15.289 is more than log 2π, the bound for χ trivial. Hence, the absolute
value of the quantity R in the statement of Lemma 9.1.1 is at most
X X
Gδ (ρ)xρ ≤ |Gδ (ρ)| · x<(ρ) .
ρ ρ
Recall that these are sums over the non-trivial zeros ρ of L(s, χ).
We first prove a general lemma on sums of values of functions on the non-trivial
zeros of L(s, χ). This is little more than partial summation, given a (classical) bound
for the number of zeroes N (T, χ) of L(s, χ) with |=(s)| ≤ T . The error term becomes
particularly simple if f is real-valued and decreasing; the statement is then practically
identical to that of [Leh66, Lemma 1] (for χ principal), except for the fact that the error
term is improved here.
Lemma 9.1.3. Let $f : \mathbb{R}^+ \to \mathbb{C}$ be piecewise $C^1$. Assume $\lim_{t\to\infty} f(t)t\log t = 0$. Let χ be a primitive character mod q, q ≥ 1; let ρ denote the non-trivial zeros of L(s, χ). Then, for any y ≥ 1,
\[
\begin{aligned}
\sum_{\substack{\rho\ \text{non-trivial}\\ \Im(\rho)>y}} f(\Im(\rho)) &= \frac{1}{2\pi}\int_y^{\infty} f(T)\log\frac{qT}{2\pi}\,dT\\
&\quad + \frac12\,O^*\left(|f(y)|g_\chi(y) + \int_y^{\infty}|f'(T)|\cdot g_\chi(T)\,dT\right),
\end{aligned}
\tag{9.24}
\]
where
\[ g_\chi(T) = 0.5\log qT + 17.7. \tag{9.25} \]
If f is real-valued and decreasing on [y, ∞), the second line of (9.24) equals
\[ O^*\left(\frac14\int_y^{\infty}\frac{f(T)}{T}\,dT\right). \]
Proof. Write N(T, χ) for the number of non-trivial zeros of L(s, χ) satisfying |ℑ(s)| ≤ T. Write $N^+(T,\chi)$ for the number of (necessarily non-trivial) zeros of L(s, χ) with 0 < ℑ(s) ≤ T. Then, for any $f : \mathbb{R}^+ \to \mathbb{C}$ with f piecewise differentiable and $\lim_{t\to\infty} f(t)N(t,\chi) = 0$,
\[
\begin{aligned}
\sum_{\rho:\,\Im(\rho)>y} f(\Im(\rho)) &= \int_y^{\infty} f(T)\,dN^+(T,\chi)\\
&= -\int_y^{\infty} f'(T)\left(N^+(T,\chi) - N^+(y,\chi)\right)dT\\
&= -\frac12\int_y^{\infty} f'(T)\left(N(T,\chi) - N(y,\chi)\right)dT.
\end{aligned}
\]
Now, by [Ros41, Thms. 17–19] and [McC84a, Thm. 2.1] (see also [Tru, Thm. 1]),
\[ N(T,\chi) = \frac{T}{\pi}\log\frac{qT}{2\pi e} + O^*(g_\chi(T)) \tag{9.26} \]
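\[
\begin{aligned}
\sum_{\rho:\,\Im(\rho)>y} f(\Im(\rho)) &= -\frac12\int_y^{\infty} f'(T)\left(\frac{T}{\pi}\log\frac{qT}{2\pi e} - \frac{y}{\pi}\log\frac{qy}{2\pi e}\right)dT\\
&\quad + \frac12\,O^*\left(|f(y)|g_\chi(y) + \int_y^{\infty}|f'(T)|\cdot g_\chi(T)\,dT\right).
\end{aligned}
\tag{9.27}
\]
Here
\[ -\frac12\int_y^{\infty} f'(T)\left(\frac{T}{\pi}\log\frac{qT}{2\pi e} - \frac{y}{\pi}\log\frac{qy}{2\pi e}\right)dT = \frac{1}{2\pi}\int_y^{\infty} f(T)\log\frac{qT}{2\pi}\,dT. \tag{9.28} \]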
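is at most
\[
\begin{aligned}
&(|\eta|_2 + |\eta\cdot\log|_2)\sqrt{T_0}\log qT_0 + \left(17.21|\eta\cdot\log|_2 - (\log 2\pi\sqrt{e})|\eta|_2\right)\sqrt{T_0}\\
&\quad + \left|\eta(t)/\sqrt{t}\right|_1\cdot(1.32\log q + 34.5)
\end{aligned}
\tag{9.29}
\]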
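Let us now consider zeros ρ with |ℑ(ρ)| > 1. Apply Lemma 9.1.3 with y = 1 and
\[
f(t) =
\begin{cases}
|G_\delta(1/2+it)| & \text{if } t \le T_0,\\
0 & \text{if } t > T_0.
\end{cases}
\]
Hence
\[ \frac{1}{\pi}\int_1^{T_0} f(T)\log\frac{qT}{2\pi}\,dT \le \sqrt{\left(\log\frac{qT_0}{2\pi e}\right)^2 + 1}\cdot|\eta|_2\sqrt{T_0}. \]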
Again by Cauchy-Schwarz,
s s
Z ∞ Z ∞ Z T0
0 1 1
|f (T )| · gχ (T ) dT ≤ |f 0 (T )|2 dT · |gχ (T )|2 dT .
1 2π −∞ π 1
Since |f 0 (T )| = |G0δ (1/2 + iT )| and (M η)0 (s) is the Mellin transform of log(t) ·
e(δt)η(t) (by (2.10)),
Z ∞
1
|f 0 (T )|2 dT = |η(t) log(t)|2 .
2π −∞
Much as before,
\[
\begin{aligned}
\int_1^{T_0}|g_\chi(T)|^2\,dT &\le \int_0^{T_0}(0.5\log qT + 17.7)^2\,dT\\
&= \left(0.25(\log qT_0)^2 + 17.2(\log qT_0) + 296.09\right)T_0.
\end{aligned}
\]
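The closed form for the integral of $(0.5\log qT + 17.7)^2$ over $[0, T_0]$ can be compared against direct numerical integration. The sketch below uses a midpoint rule (which copes with the integrable logarithmic singularity at 0); the sample values of q and $T_0$ are arbitrary.

```python
import math

def closed_form(q, T0):
    # (0.25 (log qT0)^2 + 17.2 log qT0 + 296.09) * T0
    L = math.log(q * T0)
    return (0.25 * L * L + 17.2 * L + 296.09) * T0

def midpoint(q, T0, n=100000):
    # midpoint rule for the integral of (0.5 log(qT) + 17.7)^2 over [0, T0]
    h = T0 / n
    return h * sum((0.5 * math.log(q * (i + 0.5) * h) + 17.7) ** 2 for i in range(n))

exact = closed_form(3, 50.0)
approx = midpoint(3, 50.0)
print(exact, approx)
```

The coefficients arise from $\int_0^{T_0}(\log qT)^2\,dT = T_0((\log qT_0)^2 - 2\log qT_0 + 2)$ and $\int_0^{T_0}\log qT\,dT = T_0(\log qT_0 - 1)$.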
Summing, we obtain
Z ∞
1 T0
Z
qT
f (T ) log dT + |f 0 (T )| · gχ (T ) dT
π 1 2π 1
p
qT0 1 log qT0
≤ log + |η|2 + + 17.21 |η(t)(log t)|2 T0
2πe 2 2
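Finally, by (9.30) and (9.25),
\[ |f(1)|g_\chi(1) \le \left|\eta(t)/\sqrt{t}\right|_1\cdot(0.5\log q + 17.7). \]
By (9.32) and the assumption that all non-trivial zeros with |ℑ(ρ)| ≤ T0 lie on the line ℜ(s) = 1/2, we conclude that
\[
\begin{aligned}
\sum_{\substack{\rho\ \text{non-trivial}\\ 1<|\Im(\rho)|\le T_0}}|G_\delta(\rho)| &\le (|\eta|_2 + |\eta\cdot\log|_2)\sqrt{T_0}\log qT_0\\
&\quad + \left(17.21|\eta\cdot\log|_2 - (\log 2\pi\sqrt{e})|\eta|_2\right)\sqrt{T_0}\\
&\quad + \left|\eta(t)/\sqrt{t}\right|_1\cdot(0.5\log q + 17.7).
\end{aligned}
\]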
\[ \sum_{\substack{\rho\\ |\Im(\rho)|>T_0}}|F_\delta(\rho)| \le \log\frac{qT_0}{2\pi}\cdot\left(3.53e^{-0.1598T_0} + 22.5\,\frac{\delta^2}{T_0}\,e^{-0.1065\left(\frac{T_0}{\pi|\delta|}\right)^2}\right). \]
Here we have preferred to give a bound with a simple form. It is probably feasible to derive from Theorem 8.0.1 a bound essentially proportional to $e^{-E(\rho)T_0}$, where $\rho = T_0/(\pi\delta)^2$ and E(ρ) is as in (8.2). (As we discussed in §8.5, $e^{-E(\rho)T_0}$ behaves as $e^{-(\pi/4)T_0}$ for ρ large and as $e^{-0.125(T_0/(\pi\delta))^2}$ for ρ small.)
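Proof. First of all,
\[ \sum_{\substack{\rho\\ |\Im(\rho)|>T_0}}|F_\delta(\rho)| = \sum_{\substack{\rho\\ \Im(\rho)>T_0}}\left(|F_\delta(\rho)| + |F_\delta(1-\rho)|\right), \]
by the functional equation (which implies that non-trivial zeros come in pairs ρ, 1 − ρ). Hence, by a somewhat brutish application of Cor. 8.0.2,
\[ \sum_{\substack{\rho\\ |\Im(\rho)|>T_0}}|F_\delta(\rho)| \le \sum_{\substack{\rho\\ \Im(\rho)>T_0}} f(\Im(\rho)), \tag{9.33} \]
where
\[ f(\tau) = 3.001e^{-0.1065\left(\frac{\tau}{\pi\delta}\right)^2} + 3.286e^{-0.1598|\tau|}. \tag{9.34} \]
Obviously, f(τ) is a decreasing function of τ for τ ≥ T0.
We now apply Lemma 9.1.3. We obtain that
\[ \sum_{\substack{\rho\\ \Im(\rho)>T_0}} f(\Im(\rho)) \le \int_{T_0}^{\infty} f(T)\left(\frac{1}{2\pi}\log\frac{qT}{2\pi} + \frac{1}{4T}\right)dT. \tag{9.35} \]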
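where $E_1(x) = \int_x^{\infty} e^{-t}\,dt/t$. Clearly, $E_1(x) \le \int_x^{\infty} e^{-t}\,dt/x = e^{-x}/x$. Hence
\[ \int_y^{\infty}\left(\log t + \frac{c_1}{t}\right)e^{-ct}\,dt \le \left(\log y + \left(\frac1c + c_1\right)\frac1y\right)\frac{e^{-cy}}{c}. \]
We conclude that
\[
\begin{aligned}
\int_{T_0}^{\infty} e^{-0.1598t}&\left(\frac{1}{2\pi}\log\frac{qt}{2\pi} + \frac{1}{4t}\right)dt\\
&= \frac{1}{2\pi}\int_{T_0}^{\infty}\left(\log t + \frac{\pi/2}{t}\right)e^{-ct}\,dt + \frac{\log\frac{q}{2\pi}}{2\pi}\int_{T_0}^{\infty} e^{-ct}\,dt\\
&\le \frac{1}{2\pi c}\left(\log T_0 + \log\frac{q}{2\pi} + \left(\frac1c + \frac{\pi}{2}\right)\frac{1}{T_0}\right)e^{-cT_0}
\end{aligned}
\tag{9.36}
\]
with c = 0.1598. Since T0 ≥ 50 and q ≥ 1, this is at most
\[ 1.072\log\frac{qT_0}{2\pi}\,e^{-cT_0}. \tag{9.37} \]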
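The passage from (9.36) to (9.37) amounts to checking that the prefactor in (9.36) is at most $1.072\log(qT_0/2\pi)$ once $T_0 \ge 50$ and $q \ge 1$. A minimal sketch of that check on a few sample values (the sample values are arbitrary; the extreme case is q = 1, $T_0$ = 50):

```python
import math

c = 0.1598

def prefactor(q, T0):
    # (1/(2 pi c)) * (log T0 + log(q/(2 pi)) + (1/c + pi/2)/T0), as in (9.36)
    return (math.log(T0) + math.log(q / (2 * math.pi)) + (1 / c + math.pi / 2) / T0) / (2 * math.pi * c)

def target(q, T0):
    # 1.072 * log(q T0 / (2 pi)), as in (9.37)
    return 1.072 * math.log(q * T0 / (2 * math.pi))

samples = [(q, T0) for q in (1, 2, 3, 10, 1000, 300000) for T0 in (50, 60, 100, 1000, 1e6)]
holds = all(prefactor(q, T0) <= target(q, T0) for q, T0 in samples)
print(holds, prefactor(1, 50) / target(1, 50))
```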
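Now let us deal with the Gaussian term. (It appears only if $T_0 < (3/2)(\pi\delta)^2$, as otherwise $|\tau| \ge (3/2)(\pi\delta)^2$ holds whenever $|\tau| \ge T_0$.) For any y ≥ e, c ≥ 0,
\[ \int_y^{\infty} e^{-ct^2}\,dt = \frac{1}{\sqrt{c}}\int_{\sqrt{c}y}^{\infty} e^{-t^2}\,dt \le \frac{1}{cy}\int_{\sqrt{c}y}^{\infty} te^{-t^2}\,dt \le \frac{e^{-cy^2}}{2cy}, \tag{9.38} \]
\[ \int_y^{\infty}\frac{e^{-ct^2}}{t}\,dt = \int_{cy^2}^{\infty}\frac{e^{-t}}{2t}\,dt = \frac{E_1(cy^2)}{2} \le \frac{e^{-cy^2}}{2cy^2}, \tag{9.39} \]
\[ \int_y^{\infty}(\log t)e^{-ct^2}\,dt \le \int_y^{\infty}\left(\log t + \frac{\log t - 1}{2ct^2}\right)e^{-ct^2}\,dt = \frac{\log y}{2cy}\,e^{-cy^2}. \tag{9.40} \]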
Hence
Z ∞
2 1 qT 1
e−0.1065( πδ )
T
log + dT
T0 2π 2π 4T
∞
|δ|
Z
2 q|δ|t 1
= e−0.1065t log + dt
T0
π|δ|
2 2 4t (9.41)
|δ| T0 |δ| q|δ|
log log 1 T0
−c0 ( π|δ| 2
≤
2 π|δ|
+ 2 2
+ )
2 e
T0 T0
2c0 π|δ| 2c0 π|δ|
T0
8c0 π|δ|
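We need to record a few norms related to the Gaussian $\eta_\heartsuit(t) = e^{-t^2/2}$ before we proceed. Recall we are working with the one-sided Gaussian, i.e., we set $\eta_\heartsuit(t) = 0$ for t < 0. Symbolic integration then gives
\[
\begin{aligned}
|\eta_\heartsuit|_2^2 &= \int_0^{\infty} e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2},\\
|\eta_\heartsuit'|_2^2 &= \int_0^{\infty}(te^{-t^2/2})^2\,dt = \frac{\sqrt{\pi}}{4},\\
|\eta_\heartsuit\cdot\log|_2^2 &= \int_0^{\infty} e^{-t^2}(\log t)^2\,dt\\
&= \frac{\sqrt{\pi}}{16}\left(\pi^2 + 2\gamma^2 + 8\gamma\log 2 + 8(\log 2)^2\right) \le 1.94753,
\end{aligned}
\tag{9.43}
\]
\[
\begin{aligned}
\left|\eta_\heartsuit(t)/\sqrt{t}\right|_1 &= \int_0^{\infty}\frac{e^{-t^2/2}}{\sqrt{t}}\,dt = \frac{\Gamma(1/4)}{2^{3/4}} \le 2.15581,\\
\left|\eta_\heartsuit'(t)/\sqrt{t}\right|_1 = \left|\eta_\heartsuit(t)\sqrt{t}\right|_1 &= \int_0^{\infty} e^{-\frac{t^2}{2}}\sqrt{t}\,dt = \frac{\Gamma(3/4)}{2^{1/4}} \le 1.03045,\\
\left|\eta_\heartsuit'(t)t^{1/2}\right|_1 = \left|\eta_\heartsuit(t)t^{3/2}\right|_1 &= \int_0^{\infty} e^{-\frac{t^2}{2}}t^{\frac32}\,dt = 1.07791.
\end{aligned}
\tag{9.44}
\]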
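The closed forms in (9.43) and (9.44) can be evaluated directly with the standard library. The sketch below only recomputes the stated expressions; the value $2^{1/4}\Gamma(5/4)$ used for the last norm is the standard evaluation of $\int_0^\infty e^{-t^2/2}t^{3/2}\,dt$ and is supplied here as an assumption, since the text gives only the decimal value.

```python
import math

EULER_GAMMA = 0.5772156649015329
LOG2 = math.log(2)

norm2_sq = math.sqrt(math.pi) / 2       # |eta|_2^2
dnorm2_sq = math.sqrt(math.pi) / 4      # |eta'|_2^2
lognorm2_sq = math.sqrt(math.pi) / 16 * (
    math.pi ** 2 + 2 * EULER_GAMMA ** 2 + 8 * EULER_GAMMA * LOG2 + 8 * LOG2 ** 2
)                                       # |eta log|_2^2
n_inv_sqrt = math.gamma(0.25) / 2 ** 0.75   # |eta(t)/sqrt(t)|_1
n_sqrt = math.gamma(0.75) / 2 ** 0.25       # |eta(t) sqrt(t)|_1
n_three_halves = 2 ** 0.25 * math.gamma(1.25)  # |eta(t) t^{3/2}|_1 (assumed closed form)
print(lognorm2_sq, n_inv_sqrt, n_sqrt, n_three_halves)
```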
We can now state what is really our main result for the Gaussian smoothing. (The
version in §7.1 will, as we shall later see, follow from this, given numerical inputs.)
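Proposition 9.2.2. Let $\eta(t) = e^{-t^2/2}$. Let x ≥ 1, δ ∈ R. Let χ be a primitive character mod q, q ≥ 1. Assume that all non-trivial zeros ρ of L(s, χ) with |ℑ(ρ)| ≤ T0 lie on the critical line. Assume that T0 ≥ 50.
Then
\[
\sum_{n=1}^{\infty}\Lambda(n)\chi(n)e\left(\frac{\delta}{x}n\right)\eta\left(\frac{n}{x}\right) =
\begin{cases}
\widehat{\eta}(-\delta)x + O^*(\mathrm{err}_{\eta,\chi}(\delta,x))\cdot x & \text{if } q = 1,\\
O^*(\mathrm{err}_{\eta,\chi}(\delta,x))\cdot x & \text{if } q > 1,
\end{cases}
\tag{9.45}
\]
where
\[
\begin{aligned}
\mathrm{err}_{\eta,\chi}(\delta,x) &= \log\frac{qT_0}{2\pi}\cdot\left(3.53e^{-0.1598T_0} + 22.5\,\frac{\delta^2}{T_0}\,e^{-0.1065\left(\frac{T_0}{\pi|\delta|}\right)^2}\right)\\
&\quad + \left(2.337\sqrt{T_0}\log qT_0 + 21.817\sqrt{T_0} + 2.85\log q + 74.38\right)x^{-\frac12}\\
&\quad + (3\log q + 14|\delta| + 17)x^{-1} + (\log q + 6)\cdot(1 + 5|\delta|)\cdot x^{-3/2}.
\end{aligned}
\]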
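The coefficients of the $x^{-1/2}$ term in Proposition 9.2.2 can be traced back to (9.29) and the Gaussian norms in (9.43) and (9.44). The following sketch recomputes them under that reading; it is a consistency check on the constants, with illustrative variable names, and not a substitute for the proof.

```python
import math

# Norms of the one-sided Gaussian, taken from (9.43) and (9.44).
eta_2 = math.sqrt(math.sqrt(math.pi) / 2)   # |eta|_2
eta_log_2 = math.sqrt(1.94753)              # upper bound for |eta log|_2
eta_inv_sqrt_1 = 2.15581                    # upper bound for |eta(t)/sqrt(t)|_1

# Coefficients of (9.29) specialised to the Gaussian.
coef_sqrtT_log = eta_2 + eta_log_2
coef_sqrtT = 17.21 * eta_log_2 - math.log(2 * math.pi * math.sqrt(math.e)) * eta_2
coef_log_q = 1.32 * eta_inv_sqrt_1
const = 34.5 * eta_inv_sqrt_1
print(coef_sqrtT_log, coef_sqrtT, coef_log_q, const)
```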
Proof. Let Fδ (s) be the Mellin transform of η♥ (t)e(δt). By Lemmas 9.1.4 (with Gδ =
Fδ ) and Lemma 9.2.1,
X
Fδ (ρ)xρ
ρ non-trivial
√
is at most (9.29) (with η = η♥ ) times x, plus
Let us now apply Lemma 9.1.1. We saw that the value of R in Lemma 9.1.1 is
bounded by (9.23). We know that η♥ (0) = 1. Again by (9.43) and (9.44), the quantity
c0 defined in (9.3) is at most 1.4056 + 13.3466|δ|. Hence
Lastly,
0
|η♥ |2 + 2π|δ||η♥ |2 ≤ 0.942 + 4.183|δ| ≤ 1 + 5|δ|.
Clearly
(6.01 − 6) · (1 + 5|δ|) + 13.347|δ| + 16.695 < 14|δ| + 17,
and so we are done.
The fact that this vanishes at t = 0 actually makes it easier to work with at several
levels.
Its Mellin transform is just a shift of that of the Gaussian. Write
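\[
\begin{aligned}
F_\delta(s) &= \left(M\left(e^{-\frac{t^2}{2}}e(\delta t)\right)\right)(s),\\
G_\delta(s) &= \left(M(\eta(t)e(\delta t))\right)(s).
\end{aligned}
\tag{9.47}
\]
\[ G_\delta(s) = F_\delta(s+2). \]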
We start by bounding the contribution of zeros with large imaginary part, just as
before.
9.3. THE CASE OF η∗ (T ) 179
Lemma 9.3.1. Let $\eta(t) = t^2e^{-t^2/2}$. Let $x \in \mathbb{R}^+$, δ ∈ R. Let χ be a primitive character mod q, q ≥ 1. Assume that all non-trivial zeros ρ of L(s, χ) with |ℑ(ρ)| ≤ T0 satisfy ℜ(s) = 1/2. Assume that T0 ≥ max(10π|δ|, 50).
Write $G_\delta(s)$ for the Mellin transform of η(t)e(δt). Then
\[ \sum_{\substack{\rho\\ |\Im(\rho)|>T_0}}|G_\delta(\rho)| \le T_0\log\frac{qT_0}{2\pi}\cdot\left(6.11e^{-0.1598T_0} + 1.578e^{-0.1065\cdot\frac{T_0^2}{(\pi\delta)^2}}\right). \]
|=(ρ)|>T0
where we are using Gδ (ρ) = Fδ (ρ + 2) and the fact that non-trivial zeros come in pairs
ρ, 1 − ρ.
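By Cor. 8.0.2 with k = 2,
\[ \sum_{\substack{\rho\\ |\Im(\rho)|>T_0}}|G_\delta(\rho)| \le \sum_{\substack{\rho\\ \Im(\rho)>T_0}} f(\Im(\rho)), \]
where
\[
f(\tau) =
\begin{cases}
\kappa_{2,1}|\tau|e^{-0.1598|\tau|} + \frac{\kappa_{2,0}}{4}\left(\frac{|\tau|}{\pi\delta}\right)^2 e^{-0.1065\left(\frac{|\tau|}{\pi\delta}\right)^2} & \text{if } |\tau| < \frac32(\pi\delta)^2,\\
\kappa_{2,1}|\tau|e^{-0.1598|\tau|} & \text{if } |\tau| \ge \frac32(\pi\delta)^2,
\end{cases}
\tag{9.48}
\]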
where κ2,0 = 7.96 and κ2,1 = 5.13. We are including the term |τ |e−0.1598|τ | in both
cases in part because we cannot be bothered to take it out (just as we could not be
bothered in the proof of Lem. 9.2.1) and in part to ensure that f (τ ) is a decreasing
function of τ for τ ≥ T0 .
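We can now apply Lemma 9.1.3. We obtain, again,
\[ \sum_{\substack{\rho\\ \Im(\rho)>T_0}} f(\Im(\rho)) \le \int_{T_0}^{\infty} f(T)\left(\frac{1}{2\pi}\log\frac{qT}{2\pi} + \frac{1}{4T}\right)dT. \tag{9.49} \]
where
\[ a = \frac{\frac{\log y}{c} + \frac1c + \frac{c_1}{y}}{\frac{\log y}{c} - \frac{1}{c^2y}}. \]
and
\[ a = \frac{\frac{\log T_0}{0.1598} + \frac{1}{0.1598} + \frac{\pi/2}{T_0}}{\frac{\log T_0}{0.1598} - \frac{1}{0.1598^2\,T_0}} \le 1.299. \]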
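The bound a ≤ 1.299 can be checked by evaluating the quotient at $T_0 = 50$, where it is largest in the range $T_0 \ge 50$. The sketch below assumes the reading of the garbled display in which the numerator is $\log y/c + 1/c + c_1/y$ and the denominator is $\log y/c - 1/(c^2y)$, with $c = 0.1598$ and $c_1 = \pi/2$; the function name is illustrative.

```python
import math

def a(y, c, c1):
    # quotient defining a, under the reading described in the lead-in
    num = math.log(y) / c + 1 / c + c1 / y
    den = math.log(y) / c - 1 / (c * c * y)
    return num / den

a50 = a(50.0, 0.1598, math.pi / 2)
# a should decrease as T0 grows past 50
decreasing = all(a(t, 0.1598, math.pi / 2) >= a(t + 5, 0.1598, math.pi / 2) for t in range(50, 500, 5))
print(a50, decreasing)
```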
It is easy to see that the ratio of the expression within parentheses on the right side of (9.51) to $T_0\log(qT_0/2\pi)$ increases as q decreases and, if we hold q fixed, decreases as T0 ≥ 2π increases; thus, it is maximal for q = 1 and T0 = 50. Multiplying (9.51) by $\kappa_{2,1} = 5.13$ and simplifying by the assumption T0 ≥ 50, we obtain that
\[ \int_{T_0}^{\infty} 5.13\,Te^{-0.1598T}\left(\frac{1}{2\pi}\log\frac{qT}{2\pi} + \frac{1}{4T}\right)dT \le 6.11\,T_0\log\frac{qT_0}{2\pi}\cdot e^{-0.1598T_0}. \tag{9.52} \]
Now let us examine the Gaussian term. First of all – when does it arise? If T0 ≥
(3/2)(πδ)2 , then |τ | ≥ (3/2)(πδ)2 holds whenever |τ | ≥ T0 , and so (9.48) does not
give us a Gaussian term. Recall that T0 ≥ 10π|δ|, which means that |δ| ≤ 20/(3π)
implies that T0 ≥ (3/2)(πδ)2 . We can thus assume from now on that |δ| > 20/(3π),
since otherwise there is no Gaussian term to treat.
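For any y ≥ 1, c, c1 > 0,
\[ \int_y^{\infty} t^2e^{-ct^2}\,dt < \int_y^{\infty}\left(t^2 + \frac{1}{4c^2t^2}\right)e^{-ct^2}\,dt = \left(\frac{y}{2c} + \frac{1}{4c^2y}\right)\cdot e^{-cy^2}, \]
\[
\begin{aligned}
\int_y^{\infty}(t^2\log t + c_1t)\cdot e^{-ct^2}\,dt &\le \int_y^{\infty}\left(t^2\log t + \frac{at\log et}{2c} - \frac{\log et}{2c} - \frac{a}{4c^2t}\right)e^{-ct^2}\,dt\\
&= \frac{(2cy + a)\log y + a}{4c^2}\cdot e^{-cy^2},
\end{aligned}
\]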
4c2
where
c1
c1 y + log ey
2c 1 c1 y + 4c21y2 1 2c1 c
1
2cy log ey + 4c2 y 2
a= y log ey 1
= + y log ey 1
= + + y log ey
.
2c − 4c2 y
y 2c − 4c2 y y log ey 2c − 4c12 y
(Note that a decreases as y ≥ y0 increases, provided that log ey0 > 1/(2cy02 ).) Setting
where we are using several times the assumption that $T_0 \ge 4\pi^2|\delta|$ (and, on one occasion, the fact that |δ| > 20/(3π) > 2).
We sum (9.52) and the estimate for (9.53) we have just got to reach our conclusion.
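\[
\begin{aligned}
|\eta|_2^2 &= \frac38\sqrt{\pi}, \qquad |\eta'|_2^2 = \frac{7}{16}\sqrt{\pi},\\
|\eta\cdot\log|_2^2 &= \frac{\sqrt{\pi}}{64}\left(8(3\gamma-8)\log 2 + 3\pi^2 + 6\gamma^2 + 24(\log 2)^2 + 16 - 32\gamma\right) \le 0.16364,\\
\left|\eta(t)/\sqrt{t}\right|_1 &= \frac{2^{1/4}\Gamma(1/4)}{4} \le 1.07791, \qquad \left|\eta(t)\sqrt{t}\right|_1 = \frac34\,2^{3/4}\Gamma(3/4) \le 1.54568,
\end{aligned}
\]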
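\[
\begin{aligned}
\left|\eta'(t)/\sqrt{t}\right|_1 &= \int_0^{\sqrt2}\frac{\eta'(t)}{\sqrt{t}}\,dt - \int_{\sqrt2}^{\infty}\frac{\eta'(t)}{\sqrt{t}}\,dt \le 1.48469,\\
\left|\eta'(t)\sqrt{t}\right|_1 &\le 1.72169.
\end{aligned}
\tag{9.55}
\]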
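The closed forms in (9.55) for $\eta(t) = t^2e^{-t^2/2}$ can be recomputed with the standard library. This is a floating-point sketch of the stated expressions only; the two norms of η′ that are given without a closed form are not recomputed here.

```python
import math

EULER_GAMMA = 0.5772156649015329
LOG2 = math.log(2)
SQRT_PI = math.sqrt(math.pi)

norm2_sq = 3 * SQRT_PI / 8        # |eta|_2^2
dnorm2_sq = 7 * SQRT_PI / 16      # |eta'|_2^2
lognorm2_sq = SQRT_PI / 64 * (
    8 * (3 * EULER_GAMMA - 8) * LOG2 + 3 * math.pi ** 2 + 6 * EULER_GAMMA ** 2
    + 24 * LOG2 ** 2 + 16 - 32 * EULER_GAMMA
)                                 # |eta log|_2^2
n_inv_sqrt = 2 ** 0.25 * math.gamma(0.25) / 4   # |eta(t)/sqrt(t)|_1
n_sqrt = 0.75 * 2 ** 0.75 * math.gamma(0.75)    # |eta(t) sqrt(t)|_1
print(lognorm2_sq, n_inv_sqrt, n_sqrt)
```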
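Proposition 9.3.2. Let $\eta(t) = t^2e^{-t^2/2}$. Let x ≥ 1, δ ∈ R. Let χ be a primitive character mod q, q ≥ 1. Assume that all non-trivial zeros ρ of L(s, χ) with |ℑ(ρ)| ≤ T0 lie on the critical line. Assume that T0 ≥ max(10π|δ|, 50).
Then
\[
\sum_{n=1}^{\infty}\Lambda(n)\chi(n)e\left(\frac{\delta}{x}n\right)\eta(n/x) =
\begin{cases}
\widehat{\eta}(-\delta)x + O^*(\mathrm{err}_{\eta,\chi}(\delta,x))\cdot x & \text{if } q = 1,\\
O^*(\mathrm{err}_{\eta,\chi}(\delta,x))\cdot x & \text{if } q > 1,
\end{cases}
\tag{9.56}
\]
where
\[
\begin{aligned}
\mathrm{err}_{\eta,\chi}(\delta,x) &= T_0\log\frac{qT_0}{2\pi}\cdot\left(6.11e^{-0.1598T_0} + 1.578e^{-0.1065\cdot\frac{T_0^2}{(\pi\delta)^2}}\right)\\
&\quad + \left(1.22\sqrt{T_0}\log qT_0 + 5.056\sqrt{T_0} + 1.423\log q + 37.19\right)\cdot x^{-1/2}\\
&\quad + (3 + 11|\delta|)x^{-1} + (\log q + 6)\cdot(1 + 6|\delta|)\cdot x^{-3/2}.
\end{aligned}
\tag{9.57}
\]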
Proof. We proceed as in the proof of Prop. 9.2.2. The contribution of Lemma 9.3.1 is
\[ T_0\log\frac{qT_0}{2\pi}\cdot\left(6.11e^{-0.1598T_0} + 1.578e^{-0.1065\cdot\frac{T_0^2}{(\pi\delta)^2}}\right)\cdot x, \]
whereas the contribution of Lemma 9.1.4 is at most
\[ \left(1.22\sqrt{T_0}\log qT_0 + 5.056\sqrt{T_0} + 1.423\log q + 37.188\right)\sqrt{x}. \]
Lastly,
\[ |\eta'|_2 + 2\pi|\delta||\eta|_2 \le 0.881 + 5.123|\delta|. \]
Now that we have Prop. 9.3.2, we can derive from it similar bounds for a smoothing
defined as the multiplicative convolution of η with something else. In general, for
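$\varphi_1, \varphi_2 : [0,\infty) \to \mathbb{C}$, if we know how to bound sums of the form
\[ S_{f,\varphi_1}(x) = \sum_n f(n)\varphi_1(n/x), \tag{9.58} \]
we can bound sums of the form $S_{f,\varphi_1 *_M \varphi_2}$, simply by changing the order of summation and integration:
\[
\begin{aligned}
S_{f,\varphi_1 *_M \varphi_2} &= \sum_n f(n)\cdot(\varphi_1 *_M \varphi_2)\left(\frac{n}{x}\right)\\
&= \int_0^{\infty}\sum_n f(n)\varphi_1\left(\frac{n}{wx}\right)\varphi_2(w)\frac{dw}{w} = \int_0^{\infty} S_{f,\varphi_1}(wx)\varphi_2(w)\frac{dw}{w}.
\end{aligned}
\tag{9.59}
\]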
This is particularly nice if ϕ2 (t) vanishes in a neighbourhood of the origin, since then
the argument wx of Sf,ϕ1 (wx) is always large.
We will use $\varphi_1(t) = t^2e^{-t^2/2}$, $\varphi_2(t) = \eta_1 *_M \eta_1$, where $\eta_1$ is 2 times the characteristic function of the interval [1/2, 1]. The motivation for the choice of $\varphi_1$ and $\varphi_2$
is clear: we have just got bounds based on ϕ1 (t) in the major arcs, and we obtained
minor-arc bounds for the weight ϕ2 (t) in Part I.
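To make the weight $\varphi_2 = \eta_1 *_M \eta_1$ concrete, the sketch below compares a direct numerical evaluation of the multiplicative convolution $\int_0^\infty \eta_1(t/w)\eta_1(w)\,dw/w$ with the closed form $4\max(\log 2 - |\log 2t|, 0)$. The closed form is derived here from the definition of $\eta_1$ rather than quoted from this section, and the function names are illustrative.

```python
import math

def eta1(t):
    # 2 times the characteristic function of [1/2, 1]
    return 2.0 if 0.5 <= t <= 1.0 else 0.0

def eta2_closed(t):
    # derived closed form of eta1 *_M eta1, supported on [1/4, 1]
    if t <= 0:
        return 0.0
    return 4.0 * max(math.log(2.0) - abs(math.log(2.0 * t)), 0.0)

def eta2_numeric(t, n=20000):
    # midpoint rule for the integral of eta1(t/w) * eta1(w) / w over w in [1/2, 1]
    h = 0.5 / n
    total = 0.0
    for i in range(n):
        w = 0.5 + (i + 0.5) * h
        total += eta1(t / w) * 2.0 / w
    return total * h

for t in (0.2, 0.3, 0.5, 0.75, 0.9):
    print(t, eta2_closed(t), eta2_numeric(t))
```

Since $\eta_2$ vanishes below 1/4, the argument wx in (9.59) stays at least x/4, which is the point of the remark about $\varphi_2$ vanishing near the origin.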
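Corollary 9.3.3. Let $\eta(t) = t^2e^{-t^2/2}$, $\eta_1 = 2\cdot I_{[1/2,1]}$, $\eta_2 = \eta_1 *_M \eta_1$. Let $\eta_* = \eta_2 *_M \eta$. Let $x \in \mathbb{R}^+$, δ ∈ R. Let χ be a primitive character mod q, q ≥ 1. Assume that all non-trivial zeros ρ of L(s, χ) with |ℑ(ρ)| ≤ T0 lie on the critical line. Assume that T0 ≥ max(10π|δ|, 50).
Then
\[
\sum_{n=1}^{\infty}\Lambda(n)\chi(n)e\left(\frac{\delta}{x}n\right)\eta_*(n/x) =
\begin{cases}
\widehat{\eta_*}(-\delta)x + O^*(\mathrm{err}_{\eta_*,\chi}(\delta,x))\cdot x & \text{if } q = 1,\\
O^*(\mathrm{err}_{\eta_*,\chi}(\delta,x))\cdot x & \text{if } q > 1,
\end{cases}
\tag{9.60}
\]
where
\[
\begin{aligned}
\mathrm{err}_{\eta_*,\chi}(\delta,x) &= T_0\log\frac{qT_0}{2\pi}\cdot\left(6.11e^{-0.1598T_0} + 0.0102\cdot e^{-0.1065\cdot\frac{T_0^2}{(\pi\delta)^2}}\right)\\
&\quad + \left(1.679\sqrt{T_0}\log qT_0 + 6.957\sqrt{T_0} + 1.958\log q + 51.17\right)\cdot x^{-\frac12}\\
&\quad + (6 + 22|\delta|)x^{-1} + (\log q + 6)\cdot(3 + 17|\delta|)\cdot x^{-3/2}.
\end{aligned}
\tag{9.61}
\]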
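Proof. The left side of (9.60) equals
\[
\begin{aligned}
\int_0^{\infty}\sum_{n=1}^{\infty}&\Lambda(n)\chi(n)e\left(\frac{\delta n}{x}\right)\eta\left(\frac{n}{wx}\right)\eta_2(w)\frac{dw}{w}\\
&= \int_{\frac14}^{1}\sum_{n=1}^{\infty}\Lambda(n)\chi(n)e\left(\frac{\delta wn}{wx}\right)\eta\left(\frac{n}{wx}\right)\eta_2(w)\frac{dw}{w},
\end{aligned}
\]
since $\eta_2$ is supported on [1/4, 1]. By Prop. 9.3.2, the main term (if q = 1) contributes
\[
\begin{aligned}
\int_{\frac14}^{1}\widehat{\eta}(-\delta w)xw\cdot\eta_2(w)\frac{dw}{w} &= x\int_0^{\infty}\widehat{\eta}(-\delta w)\eta_2(w)\,dw\\
&= x\int_0^{\infty}\int_{-\infty}^{\infty}\eta(t)e(\delta wt)\,dt\cdot\eta_2(w)\,dw = x\int_0^{\infty}\int_{-\infty}^{\infty}\eta\left(\frac{r}{w}\right)e(\delta r)\frac{dr}{w}\,\eta_2(w)\,dw\\
&= x\int_{-\infty}^{\infty}\left(\int_0^{\infty}\eta\left(\frac{r}{w}\right)\eta_2(w)\frac{dw}{w}\right)e(\delta r)\,dr = \widehat{\eta_*}(-\delta)\cdot x.
\end{aligned}
\]
The error term is
\[ \int_{\frac14}^{1}\mathrm{err}_{\eta,\chi}(\delta w, wx)\cdot wx\cdot\eta_2(w)\frac{dw}{w} = x\cdot\int_{\frac14}^{1}\mathrm{err}_{\eta,\chi}(\delta w, wx)\eta_2(w)\,dw. \tag{9.62} \]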
and, by rigorous numerical integration from 1/4 to 1/2 and from 1/2 to 1 (using, e.g., VNODE-LP [Ned06]),
\[ \int_0^{\infty} e^{-0.1065\cdot 10^2\left(\frac{1}{w^2}-1\right)}\eta_2(w)\,dw \le 0.006446. \]
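The value 0.006446 can be approximated (non-rigorously) with Simpson's rule. The sketch assumes that the integrand is $e^{-0.1065\cdot 10^2(1/w^2 - 1)}\eta_2(w)$ and that $\eta_2(w) = 4\max(\log 2 - |\log 2w|, 0)$, a closed form derived from the definition of $\eta_1$; both are readings of the garbled display rather than quotations. It also checks that multiplying by the constant 1.578 of Proposition 9.3.2 stays below the constant 0.0102 of Corollary 9.3.3.

```python
import math

def eta2(w):
    # assumed closed form of eta1 *_M eta1 (supported on [1/4, 1])
    return 4.0 * max(math.log(2.0) - abs(math.log(2.0 * w)), 0.0) if w > 0 else 0.0

def integrand(w):
    return math.exp(-0.1065 * 100.0 * (1.0 / (w * w) - 1.0)) * eta2(w)

def simpson(f, a, b, n=20000):
    # composite Simpson rule with an even number n of subintervals
    h = (b - a) / n
    odd = sum(f(a + (2 * i + 1) * h) for i in range(n // 2))
    even = sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return (f(a) + f(b) + 4 * odd + 2 * even) * h / 3

# split at 1/2, where eta2 has a kink
value = simpson(integrand, 0.25, 0.5) + simpson(integrand, 0.5, 1.0)
print(value, 1.578 * 0.006446)
```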
where T00 = T0 − H.
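Proof. As usual,
\[ \sum_{\substack{\rho\\ |\Im(\rho)|>T_0}}|G_\delta(\rho)| = \sum_{\substack{\rho\\ \Im(\rho)>T_0}}\left(|G_\delta(\rho)| + |G_\delta(1-\rho)|\right). \]
Let $F_\delta$ be as in (9.47). Then, since $\eta_+(t)e(\delta t) = h_H(t)te^{-t^2/2}e(\delta t)$, where $h_H$ is as in (7.6), we see by (2.9) that
\[ G_\delta(s) = \frac{1}{2\pi}\int_{-H}^{H} Mh(ir)F_\delta(s+1-ir)\,dr, \]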
where κ1,0 = 4.903 and κ1,1 = 4.017. (As in the proof of Lemmas 9.2.1 and 9.3.1, we
are putting in extra terms so as to simplify our integrals.)
From (9.64), we conclude that
|M h(ir)|1
f (τ ) = · g(τ − H)
2π
is decreasing for τ ≥ T0 (because g(τ ) is decreasing for τ ≥ T0 − H). By (A.17),
|M h(ir)|1 ≤ 16.193918.
Now we just need to estimate some integrals. For any y ≥ e2 , c > 0 and κ, κ1 ≥ 0,
∞ √
√
y
Z
1
te−ct dt ≤ + 2√ e−cy ,
y c 2c y
∞ √
√
y
Z
κ1 −ct a
t log(t + κ) + √ e dt ≤ + 2√ log(y + κ)e−cy ,
y t c c y
where
1 1 + cκ1
a= + .
2 log(y + κ)
κ1,1 |M h(ir)|1 ∞ 1 1 √
Z
qT
log + T − H · e−0.1598(T −H) dT
2π T0 2π 2π 4T
q
1 √ −0.1598T
Z ∞
1 log 2π
≤ 10.3532 log(T + H) + + Te dT (9.67)
T0 −H 2π 2π 4T
√
10.3532 T0 − H a qT0 −0.1598(T0 −H)
≤ + 2
√ log ·e ,
2π 0.1598 0.1598 T0 − H 2π
∞ 2
e−cy
Z
2
te−ct dt = ,
y 2c
1
! 2
∞ κ1 + log(y + κ) · e−cy
Z
−ct2 2cy
(t log(t + κ) + κ1 )e dt ≤ 1+
y y log(y + κ) 2c
Proceeding just as before, we see that the contribution of the Gaussian term in (9.65)
9.4. THE CASE OF η+ (T ) 187
to (9.66) is at most
κ1,0 |M h(ir)|1 ∞ 1
Z
qT 1 T − H −0.1065( Tπδ −H 2
) dT
log + ·e
2π T0 2π 2π 4T 2π|δ|
|δ| ∞
Z
H q|δ| π/2 2
≤ 12.6368 · log T + + log + T e−0.1065T dT
4 Tπ|δ|0 −H π|δ| 2 T
π π|δ|
|δ| 2 + 2·0.1065·(T0 −H)
log qT0 · e−0.1065( 0πδ ) ,
T −H 2
≤ 12.6368 · 1 +
T −H T
8 · 0.1065 π|δ| log π|δ|
0 0 2π
(9.68)
Since $(T_0 - H)/(\pi|\delta|) \ge 10$, this is at most
\[ 16.147|\delta|\log\frac{qT_0}{2\pi}\cdot e^{-0.1065\left(\frac{T_0-H}{\pi\delta}\right)^2}. \]
using (A.30) and (A.33). By (A.25), (A.32) and the assumption H ≥ 25,
\[ |\eta_+|_2 \le 0.80365, \qquad |\eta_+'|_2 \le 10.845789. \]
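where
\[
\begin{aligned}
\mathrm{err}_{\ell_2,\eta_+} &= \left(\left(0.462\frac{(\log T_1)^2}{\log x} + 0.909\log T_1\right)T_1 + 1.71\left(1 + \frac{\log T_1}{\log x}\right)H\right)e^{-\frac{\pi}{4}T_1}\\
&\quad + \left(2.445\sqrt{T_0}\log T_0 + 50.04\right)\cdot x^{-1/2}
\end{aligned}
\tag{9.73}
\]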
The assumption T0 ≥ 200 is stronger than what we strictly need, but, as it happens,
we could make much stronger assumptions still. Proposition 9.5.1 relies on a verifica-
tion of zeros of the Riemann zeta function; such verifications have gone up to values
of T0 much higher than 200.
9.5. A SUM FOR η+ (T )2 189
Proof. We will need to consider two smoothing functions, namely, $\eta_{+,0}(t) = \eta_+(t)^2$ and $\eta_{+,1}(t) = \eta_+(t)^2\log t$. Clearly,
\[ \sum_{n=1}^{\infty}\Lambda(n)(\log n)\eta_+^2(n/x) = (\log x)\sum_{n=1}^{\infty}\Lambda(n)\eta_{+,0}(n/x) + \sum_{n=1}^{\infty}\Lambda(n)\eta_{+,1}(n/x). \]
Since $\eta_+(t) = h_H(t)te^{-t^2/2}$,
\[ \eta_{+,0}(t) = h_H^2(t)t^2e^{-t^2}, \qquad \eta_{+,1}(t) = h_H^2(t)(\log t)t^2e^{-t^2}. \]
Let $\eta_{+,2} = (\log x)\eta_{+,0} + \eta_{+,1} = \eta_+^2(t)\log xt$.
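We wish to apply Lemma 9.1.1. For this, we must first check that some norms are finite. Clearly,
\[
\begin{aligned}
\eta_{+,2}(t) &= \eta_+^2(t)\log x + \eta_+^2(t)\log t,\\
\eta_{+,2}'(t) &= 2\eta_+(t)\eta_+'(t)\log x + 2\eta_+(t)\eta_+'(t)\log t + \eta_+^2(t)/t.
\end{aligned}
\tag{9.74}
\]
Thus, we see that $\eta_{+,2}(t)$ is in $\ell_2$ because $\eta_+(t)$ is in $\ell_2$ and $\eta_+(t)$, $\eta_+(t)\log t$ are both in $\ell_\infty$ (see (A.25), (A.38), (A.40)):
\[
\begin{aligned}
|\eta_{+,2}(t)|_2 &\le \left|\eta_+^2(t)\right|_2\log x + \left|\eta_+^2(t)\log t\right|_2\\
&\le |\eta_+|_\infty|\eta_+|_2\log x + |\eta_+(t)\log t|_\infty|\eta_+|_2.
\end{aligned}
\tag{9.75}
\]
Similarly, $\eta_{+,2}'(t)$ is in $\ell_2$ because $\eta_+(t)$ is in $\ell_2$, $\eta_+'(t)$ is in $\ell_2$ (A.32), and $\eta_+(t)$, $\eta_+(t)\log t$ and $\eta_+(t)/t$ (see (A.41)) are all in $\ell_\infty$:
\[
\begin{aligned}
\left|\eta_{+,2}'(t)\right|_2 &\le \left|2\eta_+(t)\eta_+'(t)\right|_2\log x + \left|2\eta_+(t)\eta_+'(t)\log t\right|_2 + \left|\eta_+^2(t)/t\right|_2\\
&\le 2|\eta_+|_\infty\left|\eta_+'\right|_2\log x + 2|\eta_+(t)\log t|_\infty\left|\eta_+'\right|_2 + |\eta_+(t)/t|_\infty|\eta_+|_2.
\end{aligned}
\tag{9.76}
\]
In the same way, we see that $\eta_{+,2}(t)t^{\sigma-1}$ is in $\ell_1$ for all σ in (−1, ∞) (because the same is true of $\eta_+(t)t^{\sigma-1}$ (A.30), and $\eta_+(t)$, $\eta_+(t)\log t$ are both in $\ell_\infty$) and $\eta_{+,2}'(t)t^{\sigma-1}$ is in $\ell_1$ for all σ in (0, ∞) (because the same is true of $\eta_+(t)t^{\sigma-1}$ and $\eta_+'(t)t^{\sigma-1}$ (A.33), and $\eta_+(t)$, $\eta_+(t)\log t$, $\eta_+(t)/t$ are all in $\ell_\infty$).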
We now apply Lemma 9.1.1 with q = 1, δ = 0. Since η+,2 (0) = 0, the residue
term R equals c0 , which, by (9.74), is at most 2/3 times
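\[
\begin{aligned}
&2\left(|\eta_+|_\infty\log x + |\eta_+(t)\log t|_\infty\right)\left(\left|\eta_+'(t)/\sqrt{t}\right|_1 + \left|\eta_+'(t)\sqrt{t}\right|_1\right)\\
&\quad + |\eta_+(t)/t|_\infty\left(\left|\eta_+(t)/\sqrt{t}\right|_1 + \left|\eta_+(t)\sqrt{t}\right|_1\right).
\end{aligned}
\]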
Using the bounds (A.38), (A.40), (A.41) (with the assumption H ≥ 25), (A.30) and
(A.33), we get that this means that
Since q = 1 and δ = 0, we get from (9.76) (and (A.38), (A.40), (A.41), with the assumption H ≥ 25, and also (A.25) and (A.32)) that
\[
\begin{aligned}
(\log q + 6.01)\cdot\left(\left|\eta_{+,2}'\right|_2 + 2\pi|\delta||\eta_{+,2}|_2\right)x^{-1/2} &= 6.01\left|\eta_{+,2}'\right|_2x^{-1/2}\\
&\le (162.56\log x + 59.325)x^{-1/2}.
\end{aligned}
\]
We will now apply Lemma 9.1.4, as we may, because of the finiteness of the norms we have already checked, together with
\[
\begin{aligned}
|\eta_{+,2}(t)\log t|_2 &\le \left|\eta_+^2(t)\log t\right|_2\log x + \left|\eta_+^2(t)(\log t)^2\right|_2\\
&\le |\eta_+(t)\log t|_\infty\left(|\eta_+(t)|_2\log x + |\eta_+(t)\log t|_2\right)\\
&\le 0.4976\cdot(0.80365\log x + 0.82999) \le 0.3999\log x + 0.41301
\end{aligned}
\tag{9.78}
\]
(by (A.40), (A.25) and (A.28); use the assumption H ≥ 25). We also need the bounds
(from (9.75), by the norm bounds (A.38), (A.40) and (A.25), all with H ≥ 25) and
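\[
\begin{aligned}
\left|\eta_{+,2}(t)/\sqrt{t}\right|_1 &\le \left(|\eta_+(t)|_\infty\log x + |\eta_+(t)\log t|_\infty\right)\left|\eta_+(t)/\sqrt{t}\right|_1\\
&\le 1.4211\log x + 0.49763
\end{aligned}
\tag{9.80}
\]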
$M\eta_{+,2}(\rho)$) over all non-trivial zeros ρ with |ℑ(ρ)| ≤ T0 is at most $x^{1/2}$ times
\[ (1.54189\log x + 0.8129)\sqrt{T_0}\log T_0 + (4.21245\log x + 6.17301)\sqrt{T_0} + 49.1\log x + 17.2, \tag{9.81} \]
The Mellin transform of $e^{-t^2}$ is $\Gamma(s/2)/2$, and so the Mellin transform of $t^2e^{-t^2}$ is $\Gamma(s/2+1)/2$. By (2.10), this implies that the Mellin transform of $(\log t)t^2e^{-t^2}$ is $\Gamma'(s/2+1)/4$. Hence, by (2.9),
\[
M\eta_{+,2}(s) = \frac{1}{4\pi}\int_{-\infty}^{\infty} M(h_H^2)(ir)\cdot F_x(s-ir)\,dr,
\tag{9.82}
\]
where
\[
F_x(s) = (\log x)\,\Gamma\left(\frac{s}{2}+1\right) + \frac{1}{2}\,\Gamma'\left(\frac{s}{2}+1\right).
\tag{9.83}
\]
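The two Mellin transforms quoted above can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes a real value $s = 2$ chosen only for illustration, truncates the integral at $t = 12$, and approximates $\Gamma'$ by a central difference; the helper names `simpson`, `mellin` and `gamma_prime` are hypothetical.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def mellin(f, s, upper=12.0):
    """Approximate int_0^inf f(t) t^(s-1) dt, truncated at `upper`."""
    def integrand(t):
        # At t = 0 we use the value 0, the limit of the integrands used below.
        return f(t) * t ** (s - 1) if t > 0 else 0.0
    return simpson(integrand, 0.0, upper)

def gamma_prime(z, eps=1e-5):
    """Central-difference approximation to Gamma'(z)."""
    return (math.gamma(z + eps) - math.gamma(z - eps)) / (2 * eps)

s = 2.0  # illustrative real value of s (an assumption of this sketch)
plain_num = mellin(lambda t: t * t * math.exp(-t * t), s)
plain_exact = math.gamma(s / 2 + 1) / 2          # Gamma(s/2 + 1)/2
log_num = mellin(lambda t: math.log(t) * t * t * math.exp(-t * t), s)
log_exact = gamma_prime(s / 2 + 1) / 4           # Gamma'(s/2 + 1)/4
print(plain_num, plain_exact)
print(log_num, log_exact)
```

For $s = 2$ the first transform is $\Gamma(2)/2 = 1/2$, which gives a quick sanity check on the quadrature.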
Moreover,
\[
M(h_H^2)(ir) = \frac{1}{2\pi}\int_{-\infty}^{\infty} Mh_H(iu)\,Mh_H(i(r-u))\,du,
\tag{9.84}
\]
and so $M(h_H^2)(ir)$ is supported on $[-2H, 2H]$. We also see that $|Mh_H^2(ir)|_1 \le |Mh_H(ir)|_1^2/2\pi$. We know that $|Mh_H(ir)|_1^2/2\pi \le 41.73727$ by (A.17).
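Hence
\[
\begin{aligned}
|M\eta_{+,2}(s)| &\le \frac{1}{4\pi}\int_{-\infty}^{\infty}|M(h_H^2)(ir)|\,dr\cdot\max_{|r|\le 2H}|F_x(s-ir)|\\
&\le \frac{41.73727}{4\pi}\cdot\max_{|r|\le 2H}|F_x(s-ir)| \le 3.32135\cdot\max_{|r|\le 2H}|F_x(s-ir)|.
\end{aligned}
\tag{9.85}
\]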
\[
|M\eta_{+,2}(\rho)| \le f(\tau)
\]
where
\[
f(T) = 8.45\left(\log x + \frac{1}{2}\log T\right)\left(\frac{|\tau|}{2} - H\right)\cdot e^{-\frac{\pi(|\tau|-2H)}{4}}.
\tag{9.89}
\]
The functions $t \mapsto te^{-\pi t/2}$ and $t \mapsto (\log t)te^{-\pi t/2}$ are decreasing for $t \ge e$ (or in fact for $t \ge 1.762$); setting $t = T/2 - H$, we see that the right side of (9.89) is a decreasing function of $T$ for $T \ge T_0$, since $T_0/2 - H \ge 25/2 > e$.
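The threshold $1.762$ can be located numerically. The sketch below is an illustration only (the function names are hypothetical): it finds the point beyond which $(\log t)te^{-\pi t/2}$ decreases by bisecting on the sign of its derivative, which equals $e^{-\pi t/2}\left(1 + \log t - (\pi/2)t\log t\right)$. The companion function $te^{-\pi t/2}$ has derivative $e^{-\pi t/2}(1 - \pi t/2)$ and so decreases already for $t \ge 2/\pi$.

```python
import math

def log_term_derivative_sign(t):
    """d/dt[(log t) t exp(-pi t/2)], divided by the positive factor exp(-pi t/2)."""
    return 1.0 + math.log(t) - (math.pi / 2) * t * math.log(t)

def bisect(f, lo, hi, iterations=60):
    """Plain bisection, assuming f(lo) > 0 > f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# On [1, 3] the derivative changes sign exactly once (it is decreasing there).
turning_point = bisect(log_term_derivative_sign, 1.0, 3.0)
print(turning_point)   # slightly below 1.762
print(2 / math.pi)     # turning point of t * exp(-pi t / 2)
```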
We can now apply Lemma 9.1.3, and get that
\[
\sum_{\substack{\rho\\ |\Im(\rho)| > T_0}} |M\eta_{+,2}(\rho)| \le \int_{T_0}^{\infty} f(T)\left(\frac{1}{2\pi}\log\frac{T}{2\pi} + \frac{1}{4T}\right)dT.
\tag{9.90}
\]
Since $T \ge T_0 \ge 75 > 2$, we know that $((1/2\pi)\log(T/2\pi) + 1/4T) \le (1/2\pi)\log T$. Hence, the right side of (9.90) is at most
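\[
\begin{aligned}
&\frac{8.39}{4\pi}\int_{T_0}^{\infty}\left((\log x)(\log T) + \frac{(\log T)^2}{2}\right)(T-2H)\,e^{-\frac{\pi(T-2H)}{4}}\,dT\\
&\quad\le 0.668\int_{T_1}^{\infty}\left((\log x)\left(\log t + \frac{2H}{t}\right) + \frac{(\log t)^2}{2} + 2H\frac{\log t}{t}\right)t\,e^{-\frac{\pi t}{4}}\,dt,
\end{aligned}
\tag{9.91}
\]
where $T_1 = T_0 - 2H$ and $t = T - 2H$; we are using the facts that $(\log t)'' < 0$ for $t > 0$ and $((\log t)^2)'' < 0$ for $t > e$. (Of course, $T_1 \ge 25 > e$.)
Of course, $\int_{T_1}^{\infty} e^{-(\pi/4)t}\,dt = (4/\pi)e^{-(\pi/4)T_1}$. We recall (9.36) and (9.50):
\[
\begin{aligned}
\int_{T_1}^{\infty}\log t\cdot e^{-\frac{\pi}{4}t}\,dt &\le \left(\log T_1 + \frac{4/\pi}{T_1}\right)\frac{e^{-\frac{\pi}{4}T_1}}{\pi/4},\\
\int_{T_1}^{\infty}(\log t)\,t\,e^{-\frac{\pi}{4}t}\,dt &\le \left(T_1 + \frac{4a}{\pi}\right)\frac{e^{-\frac{\pi}{4}T_1}\log T_1}{\pi/4}
\end{aligned}
\]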
\[
\left((0.462\log T_1 + 0.909\log x)(\log T_1)T_1 + 1.71(\log T_1 + \log x)H\right)e^{-\frac{\pi}{4}T_1}.
\]
• all even $q \le 4\cdot 10^5$, with $T_q = \max(10^8/q,\ 200 + 7.5\cdot 10^7/q)$.
The method used was rigorous; its implementation uses interval arithmetic.
Let us see what this verification gives us when used as an input to Prop. 9.2.2. We are interested in bounds on $|\mathrm{err}_{\eta,\chi^*}(\delta,x)|$ for $q \le r$ and $|\delta| \le 4r/q$. We set $r = 3\cdot 10^5$. (We will not be using the verification for $q$ even with $3\cdot 10^5 < q \le 4\cdot 10^5$, though we certainly could.)
We let $T_0 = 10^8/q$. Thus,
\[
T_0 \ge \frac{10^8}{3\cdot 10^5} = \frac{1000}{3},\qquad
\frac{T_0}{\pi|\delta|} \ge \frac{10^8/q}{\pi\cdot 4r/q} = \frac{1000}{12\pi}
\tag{9.92}
\]
22.5 e 2
≤ |δ| · 7.715 · 10−34 ≤ 9.258 · 10−28 .
T0
194 CHAPTER 9. EXPLICIT FORMULAS
2π T0
1.54 · 10−26
≤ 4.3054 · 10−22 + ≤ 4.306 · 10−22 .
q
Again by $T_0 = 10^8/q$,
\[
2.337\sqrt{T_0}\log qT_0 + 21.817\sqrt{T_0} + 2.85\log q + 74.38
\]
is at most
\[
\frac{648662}{\sqrt{q}} + 111,
\]
and
\[
3\log q + 14|\delta| + 17 \le 55 + \frac{1.7\cdot 10^7}{q},\qquad
(\log q + 6)\cdot(1 + 5|\delta|) \le 19 + \frac{1.2\cdot 10^8}{q}.
\]
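These three numerical simplifications are easy to re-check by brute force. The following sketch is only a floating-point illustration (not a rigorous interval computation): it assumes, as in the text, $r = 3\cdot 10^5$, $T_0 = 10^8/q$ and the extreme value $|\delta| = 4r/q$, and loops over every modulus $q \le r$; `check_modulus` is a hypothetical helper name.

```python
import math

R_MAX = 3 * 10**5  # the value r = 3 * 10^5 fixed in the text

def check_modulus(q):
    """Check the three simplified bounds for one modulus q, at |delta| = 4r/q."""
    T0 = 1e8 / q
    delta = 4 * R_MAX / q
    first = (2.337 * math.sqrt(T0) * math.log(q * T0)
             + 21.817 * math.sqrt(T0) + 2.85 * math.log(q) + 74.38)
    ok_first = first <= 648662 / math.sqrt(q) + 111
    ok_second = 3 * math.log(q) + 14 * delta + 17 <= 55 + 1.7e7 / q
    ok_third = (math.log(q) + 6) * (1 + 5 * delta) <= 19 + 1.2e8 / q
    return ok_first and ok_second and ok_third

all_ok = all(check_modulus(q) for q in range(1, R_MAX + 1))
print(all_ok)
```

Since the left sides are increasing in $|\delta|$, checking the extreme value $|\delta| = 4r/q$ covers the whole range $|\delta| \le 4r/q$.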
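Hence, assuming $x \ge 10^8$ to simplify, we see that Prop. 9.2.2 gives us that
\[
\begin{aligned}
\mathrm{err}_{\eta,\chi}(\delta,x) &\le 4.306\cdot 10^{-22} + \frac{\frac{648662}{\sqrt{q}} + 111}{\sqrt{x}} + \frac{55 + \frac{1.7\cdot 10^7}{q}}{x} + \frac{19 + \frac{1.2\cdot 10^8}{q}}{x^{3/2}}\\
&\le 4.306\cdot 10^{-22} + \frac{1}{\sqrt{x}}\left(\frac{650400}{\sqrt{q}} + 112\right)
\end{aligned}
\]
for $\eta(t) = e^{-t^2/2}$. This proves Theorem 7.1.1.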
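Let us now see what Platt's calculations give us when used as an input to Prop. 9.3.2 and Cor. 9.3.3. Again, we set $r = 3\cdot 10^5$, $\delta_0 = 8$, $|\delta| \le 4r/q$ and $T_0 = 10^8/q$, so (9.92) is still valid. We obtain
\[
\begin{aligned}
&T_0\log\frac{qT_0}{2\pi}\cdot\left(6.11\,e^{-0.1598T_0} + 1.578\,e^{-0.1065\cdot\frac{T_0^2}{(\pi\delta)^2}}\right)\\
&\quad\le \log\frac{10^8}{2\pi}\left(\frac{1000}{3}\cdot 6.11\cdot e^{-0.1598\cdot\frac{1000}{3}} + 10^8\cdot 1.578\,e^{-0.1065\left(\frac{1000}{12\pi}\right)^2}\right)\\
&\quad\le 2.485\cdot 10^{-19},
\end{aligned}
\]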
q ≤ r, |δ| ≤ 4r/q.
We can see that Platt's verification [Plab], mentioned before, allows us to take
\[
T_0 = H + \frac{250r}{q},\qquad H = 200,
\]
since $T_q$ is always at least this ($T_q = 10^8/q \ge 200 + 7\cdot 10^7/q > 200 + 3.75\cdot 10^7/q$ for $q \le 150000$ odd, $T_q \ge 200 + 7.5\cdot 10^7/q$ for $q \le 300000$ even).
Thus,
\[
T_0 - H = \frac{250r}{q} \ge \frac{250r}{r} = 250,\qquad
\frac{T_0 - H}{\pi\delta} \ge \frac{250r}{\pi\delta q} \ge \frac{250}{4\pi} = 19.89436\ldots
\]
and also
\[
\le 7.9854\cdot 10^{-16} + \frac{4r}{q}\cdot 7.9814\cdot 10^{-18}
\le 7.9854\cdot 10^{-16} + \frac{9.5777\cdot 10^{-12}}{q}.
\]
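\[
\mathrm{err}_{\eta_+,\chi}(\delta,x) \le 1.3482\cdot 10^{-14} + \frac{1.617\cdot 10^{-10}}{q} + \left(\frac{499900}{\sqrt{q}} + 52\right)\frac{1}{\sqrt{x}}.
\]
\[
\le 10^{-1300000} + 9689000\,e^{-0.1065t^2}.
\]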
Looking at (9.70), we get
\[
\begin{aligned}
\mathrm{err}_{\eta_+,\chi_T}(\delta,x) &\le \log\frac{T_0}{2\pi}\cdot\left(10^{-1300000} + 9689000\,e^{-0.1065t^2}\right)\\
&\quad + \left((1.634\log T_0 + 12.43)\sqrt{T_0} + 34.51\right)x^{-1/2} + 6600009\,x^{-1}.
\end{aligned}
\]
The value $t = 20$ seems good enough; we choose it because it is not far from optimal for $x \sim 10^{27}$. We get that $T_0 = 12000000\pi + 200$; since $T_0 < 10^8$, we are within the range of the computations in [Plab] (or for that matter [Wed03] or [Plaa]). We obtain
\[
\mathrm{err}_{\eta_+,\chi_T}(\delta,x) \le 4.772\cdot 10^{-11} + \frac{251400}{\sqrt{x}}.
\]
Lastly, let us look at the sum estimated in (9.72). Here it will be enough to go up
to just T0 = 2H + max(50, H/4) = 450, where, as before, H = 200. Of course, the
9.6. A VERIFICATION OF ZEROS AND ITS CONSEQUENCES 197
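verification of the zeros of the Riemann zeta function does go that far; as we already said, it goes until $10^8$ (or rather more: see [Wed03] and [Plaa]). We make, again, the assumption $x \ge 10^{12}$. We look at (9.73) and obtain that $\mathrm{err}_{\ell_2,\eta_+}$ is at most
\[
\begin{aligned}
&\left(\left(0.462\,\frac{(\log 50)^2}{\log 10^{12}} + 0.909\log 50\right)\cdot 50 + 1.71\left(1 + \frac{\log 50}{\log 10^{12}}\right)\cdot 200\right)e^{-\frac{\pi}{4}50}\\
&\qquad + \left(2.445\sqrt{450}\log 450 + 50.04\right)\cdot x^{-1/2}\\
&\quad\le 5.123\cdot 10^{-15} + \frac{366.91}{\sqrt{x}}.
\end{aligned}
\tag{9.93}
\]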
It remains only to estimate the integral in (9.72). First of all,
\[
\begin{aligned}
\int_0^{\infty}\eta_+^2(t)\log xt\,dt &= \int_0^{\infty}\eta_\circ^2(t)\log xt\,dt\\
&\quad + 2\int_0^{\infty}(\eta_+(t)-\eta_\circ(t))\,\eta_\circ(t)\log xt\,dt + \int_0^{\infty}(\eta_+(t)-\eta_\circ(t))^2\log xt\,dt.
\end{aligned}
\]
where the integrals were computed rigorously using VNODE-LP [Ned06]. (The integral $\int_0^{\infty}\eta_\circ^2(t)\,dt$ can also be computed symbolically.) By Cauchy-Schwarz and the triangle inequality,
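\[
\begin{aligned}
\int_0^{\infty}(\eta_+(t)-\eta_\circ(t))\,\eta_\circ(t)\log xt\,dt &\le |\eta_+-\eta_\circ|_2\,|\eta_\circ(t)\log xt|_2\\
&\le |\eta_+-\eta_\circ|_2\left(|\eta_\circ|_2\log x + |\eta_\circ\cdot\log|_2\right)\\
&\le \frac{274.86}{H^{7/2}}\,(0.80013\log x + 0.214)\\
&\le 1.944\cdot 10^{-6}\cdot\log x + 5.2\cdot 10^{-7},
\end{aligned}
\]
where we are using (A.23) and evaluate $|\eta_\circ\cdot\log|_2$ rigorously as above. By (A.23) and (A.24),
\[
\begin{aligned}
\int_0^{\infty}(\eta_+(t)-\eta_\circ(t))^2\log xt\,dt &\le \left(\frac{274.86}{H^{7/2}}\right)^2\log x + \frac{27428}{H^7}\\
&\le 5.903\cdot 10^{-12}\cdot\log x + 2.143\cdot 10^{-12}.
\end{aligned}
\]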
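The numerical constants in the last two displays follow from plain arithmetic with $H = 200$. A small floating-point sketch (an illustration, assuming only the value $H = 200$ used throughout this section; variable names are hypothetical):

```python
H = 200  # the value of H used in this section

c = 274.86 / H ** 3.5           # bound on |eta_+ - eta_o|_2 from (A.23)

cross_log = c * 0.80013         # coefficient of log x in the cross term
cross_const = c * 0.214         # constant part of the cross term
square_log = c ** 2             # coefficient of log x in the squared term
square_const = 27428 / H ** 7   # constant part of the squared term

print(cross_log, cross_const)   # compare with 1.944e-6 and 5.2e-7
print(square_log, square_const) # compare with 5.903e-12 and 2.143e-12
```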
We conclude that
\[
\int_0^{\infty}\eta_+^2(t)\log xt\,dt = \left(0.640206 + O^*(1.95\cdot 10^{-6})\right)\log x - 0.021095 + O^*(5.3\cdot 10^{-7})
\tag{9.94}
\]
We add to this the error term $5.123\cdot 10^{-15} + 366.91/\sqrt{x}$ from (9.93), and simplify using the assumption $x \ge 10^{12}$. We obtain:
\[
\sum_{n=1}^{\infty}\Lambda(n)(\log n)\,\eta_+^2(n/x) = 0.640206\,x\log x - 0.021095\,x + O^*\left(2\cdot 10^{-6}\,x\log x + 366.91\sqrt{x}\log x\right),
\tag{9.95}
\]
Chapter 10
Let
\[
S_\eta(\alpha,x) = \sum_n \Lambda(n)\,e(\alpha n)\,\eta(n/x),
\tag{10.1}
\]
where η1 , η2 , η3 : R → C. Once we know that this is neither zero nor very close to
zero, we will know that it is possible to write N as the sum of three primes n1 , n2 , n3
in at least one way; that is, we will have proven the ternary Goldbach conjecture.
As can be readily seen, (10.2) equals
\[
\int_{\mathbb{R}/\mathbb{Z}} S_{\eta_1}(\alpha,x)\,S_{\eta_2}(\alpha,x)\,S_{\eta_3}(\alpha,x)\,e(-N\alpha)\,d\alpha.
\tag{10.3}
\]
In the circle method, the set R/Z gets partitioned into the set of major arcs M and the
set of minor arcs m; the contribution of each of the two sets to the integral (10.3) is
evaluated separately.
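The identity between a sum over $n_1 + n_2 + n_3 = N$ and the integral (10.3) can be illustrated numerically. The sketch below makes simplifying assumptions that are not in the text: all three smoothing functions are replaced by the indicator of $[1, N]$, and the integral over $\mathbb{R}/\mathbb{Z}$ is replaced by an average over $M = 3N + 1$ equally spaced points, which is exact here because the integrand is a trigonometric polynomial with all frequencies of absolute value less than $M$. Function names are hypothetical.

```python
import cmath
import math

def von_mangoldt(n):
    """Lambda(n): log p if n is a power of a prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    for p in range(2, n + 1):
        if n % p == 0:               # p is the smallest prime factor of n
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return 0.0

def e(x):
    """e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)

def direct_sum(N, a):
    """Sum of a[n1] a[n2] a[n3] over n1 + n2 + n3 = N with all n_i >= 1."""
    total = 0.0
    for n1 in range(1, N - 1):
        for n2 in range(1, N - n1):
            total += a[n1] * a[n2] * a[N - n1 - n2]
    return total

def circle_integral(N, a):
    """Average of S(alpha)^3 e(-N alpha) over M = 3N + 1 sample points."""
    M = 3 * N + 1
    total = 0j
    for k in range(M):
        alpha = k / M
        S = sum(a[n] * e(alpha * n) for n in range(1, N + 1))
        total += S ** 3 * e(-N * alpha)
    return (total / M).real

N = 31  # a small odd number, for illustration
a = [0.0] + [von_mangoldt(n) for n in range(1, N + 1)]
print(direct_sum(N, a), circle_integral(N, a))
```

A strictly positive value certifies that $N$ is a sum of three prime powers; the smoothing functions and the major/minor arc split are what make the same idea work for large $N$.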
Our objective here is to treat the major arcs: we wish to estimate
\[
\int_{\mathfrak{M}} S_{\eta_1}(\alpha,x)\,S_{\eta_2}(\alpha,x)\,S_{\eta_3}(\alpha,x)\,e(-N\alpha)\,d\alpha
\tag{10.4}
\]
202 CHAPTER 10. THE INTEGRAL OVER THE MAJOR ARCS
where $q' = q/\gcd(q, a_0^{\infty})$. Now, $\sum_{\chi \bmod q'}\chi(a^{-1}a_0) = 0$ unless $a = a_0$ (in which case $\sum_{\chi \bmod q'}\chi(a^{-1}a_0) = \phi(q')$). Thus, (10.7) equals
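provided that $(b, q) = 1$. (We are evaluating a Ramanujan sum in the last step.) Hence, for $\alpha = a/q + \delta/x$, $q \le x$, $(a, q) = 1$,
\[
\frac{1}{\phi(q)}\sum_\chi \tau(\chi, a)\sum_n \chi^*(n)\Lambda(n)\,e(\delta n/x)\,\eta(n/x)
\]
equals
\[
\sum_n \frac{\mu((q, n^{\infty}))}{\phi((q, n^{\infty}))}\,\Lambda(n)\,e(\alpha n)\,\eta(n/x).
\]
Since $(a, q) = 1$, $\tau(\chi, a) = \chi(a)\tau(\chi)$. The factor $\mu((q, n^{\infty}))/\phi((q, n^{\infty}))$ equals $1$ when $(n, q) = 1$; the absolute value of the factor is at most $1$ for every $n$. Clearly
\[
\sum_{\substack{n\\ (n,q)\ne 1}}\Lambda(n)\,\eta\left(\frac{n}{x}\right) = \sum_{p|q}\log p\sum_{\alpha\ge 1}\eta\left(\frac{p^\alpha}{x}\right).
\]
\[
S_\eta(\alpha,x) = \frac{1}{\phi(q)}\sum_{\chi \bmod q}\chi(a)\tau(\chi)\,S_{\eta,\chi^*}\left(\frac{\delta}{x},x\right) + O^*\left(2\sum_{p|q}\log p\sum_{\alpha\ge 1}\eta\left(\frac{p^\alpha}{x}\right)\right),
\tag{10.8}
\]
where
\[
S_{\eta,\chi}(\beta,x) = \sum_n \Lambda(n)\chi(n)\,e(\beta n)\,\eta(n/x).
\tag{10.9}
\]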
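\[
\cdot\ S_{\eta_1,\chi_1^*}(\delta/x,x)\,S_{\eta_2,\chi_2^*}(\delta/x,x)\,S_{\eta_3,\chi_3^*}(\delta/x,x)\,e(-\delta N/x)
\]
plus an error term of absolute value at most
\[
2\sum_{j=1}^{3}\prod_{j'\ne j}|S_{\eta_{j'}}(\alpha,x)|\sum_{p|q}\log p\sum_{\alpha\ge 1}\eta_j\left(\frac{p^\alpha}{x}\right).
\tag{10.11}
\]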
We will later see that the integral of (10.11) over $S^1$ is negligible – for our choices of $\eta_j$, it will, in fact, be of size $O(x(\log x)^A)$, $A$ a constant. The error term $O(x(\log x)^A)$ should be compared to the main term, which will be of size about a constant times $x^2$.
In (10.10), we have reduced our problems to estimating $S_{\eta,\chi}(\delta/x,x)$ for $\chi$ primitive; a more obvious way of reaching the same goal would have made (10.11) worse by a factor of about $\sqrt{q}$.
10.2 The integral over the major arcs: the main term
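We are to estimate the integral (10.4), where the major arcs $\mathfrak{M}_{\delta_0,r}$ are defined as in (10.5). We will use $\eta_1 = \eta_2 = \eta_+$, $\eta_3(t) = \eta_*(\kappa t)$, where $\eta_+$ and $\eta_*$ will be set later.
We can write
\[
\begin{aligned}
S_{\eta,\chi}(\delta/x,x) = S_\eta(\delta/x,x) &= \int_0^{\infty}\eta(t/x)\,e(\delta t/x)\,dt + O^*(\mathrm{err}_{\eta,\chi}(\delta,x))\cdot x\\
&= \widehat{\eta}(-\delta)\cdot x + O^*(\mathrm{err}_{\eta,\chi_T}(\delta,x))\cdot x
\end{aligned}
\tag{10.12}
\]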
for χ primitive and non-trivial. The estimation of the error terms err will come later;
let us focus on (a) obtaining the contribution of the main term, (b) using estimates on
the error terms efficiently.
The main term: three principal characters. The main contribution will be given by the term in (10.10) with $\chi_1 = \chi_2 = \chi_3 = \chi_0$, where $\chi_0$ is the principal character mod $q$.
The sum $\tau(\chi_0, n)$ is a Ramanujan sum; as is well-known (see, e.g., [IK04, (3.2)]),
\[
\tau(\chi_0, n) = \sum_{d|(q,n)}\mu(q/d)\,d.
\tag{10.14}
\]
This simplifies to $\mu(q/(q,n))\,\phi((q,n))$ for $q$ square-free. The special case $n = 1$ gives us that $\tau(\chi_0) = \mu(q)$.
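The divisor formula (10.14) and its square-free simplification can be checked directly on small moduli. This is an illustrative sketch with hypothetical helper names; it evaluates the Ramanujan sum from its definition as a sum of roots of unity and rounds the (integer-valued) result.

```python
import cmath
import math

def mobius(n):
    """Moebius function via trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # square factor
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

def euler_phi(n):
    return sum(1 for a in range(1, n + 1) if math.gcd(a, n) == 1)

def ramanujan_sum(q, n):
    """tau(chi_0, n) = sum of e(a n / q) over a mod q coprime to q."""
    total = sum(cmath.exp(2j * math.pi * a * n / q)
                for a in range(1, q + 1) if math.gcd(a, q) == 1)
    return round(total.real)

def divisor_formula(q, n):
    """Right side of (10.14): sum over d | gcd(q, n) of mu(q/d) d."""
    g = math.gcd(q, n)
    return sum(mobius(q // d) * d for d in range(1, g + 1) if g % d == 0)

agree = all(ramanujan_sum(q, n) == divisor_formula(q, n)
            for q in range(1, 31) for n in range(1, 31))
print(agree)
print(ramanujan_sum(6, 2), mobius(6 // 2) * euler_phi(2))  # square-free case
```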
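Thus, the term in (10.10) with $\chi_1 = \chi_2 = \chi_3 = \chi_0$ equals
\[
\frac{e(-Na/q)}{\phi(q)^3}\,\mu(q)^3\,S_{\eta_+,\chi_0^*}(\delta/x,x)^2\,S_{\eta_*,\chi_0^*}(\delta/x,x)\,e(-\delta N/x),
\tag{10.15}
\]
where, of course, $S_{\eta,\chi_0^*}(\alpha,x) = S_\eta(\alpha,x)$ (since $\chi_0^*$ is the trivial character). Summing (10.15) for $\alpha = a/q + \delta/x$ and $a$ going over all residues mod $q$ coprime to $q$, we obtain
\[
\frac{\mu\left(\frac{q}{(q,N)}\right)\phi((q,N))}{\phi(q)^3}\,\mu(q)^3\,S_{\eta_+,\chi_0^*}(\delta/x,x)^2\,S_{\eta_*,\chi_0^*}(\delta/x,x)\,e(-\delta N/x).
\]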
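The integral of (10.15) over all of $\mathfrak{M} = \mathfrak{M}_{\delta_0,r}$ (see (10.5)) thus equals
\[
\begin{aligned}
&\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\phi((q,N))}{\phi(q)^3}\,\mu(q)^2\mu((q,N))\int_{-\frac{\delta_0 r}{2qx}}^{\frac{\delta_0 r}{2qx}} S^2_{\eta_+,\chi_0^*}(\alpha,x)\,S_{\eta_*,\chi_0^*}(\alpha,x)\,e(-\alpha N)\,d\alpha\\
&\quad + \sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{\phi((q,N))}{\phi(q)^3}\,\mu(q)^2\mu((q,N))\int_{-\frac{\delta_0 r}{qx}}^{\frac{\delta_0 r}{qx}} S^2_{\eta_+,\chi_0^*}(\alpha,x)\,S_{\eta_*,\chi_0^*}(\alpha,x)\,e(-\alpha N)\,d\alpha.
\end{aligned}
\tag{10.16}
\]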
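The main term in (10.16) is
\[
\begin{aligned}
&x^3\cdot\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\phi((q,N))}{\phi(q)^3}\,\mu(q)^2\mu((q,N))\int_{-\frac{\delta_0 r}{2qx}}^{\frac{\delta_0 r}{2qx}}(\widehat{\eta_+}(-\alpha x))^2\,\widehat{\eta_*}(-\alpha x)\,e(-\alpha N)\,d\alpha\\
&\quad + x^3\cdot\sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{\phi((q,N))}{\phi(q)^3}\,\mu(q)^2\mu((q,N))\int_{-\frac{\delta_0 r}{qx}}^{\frac{\delta_0 r}{qx}}(\widehat{\eta_+}(-\alpha x))^2\,\widehat{\eta_*}(-\alpha x)\,e(-\alpha N)\,d\alpha.
\end{aligned}
\tag{10.17}
\]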
We would like to complete both the sum and the integral. Before, we should say that we will want to be able to use smoothing functions $\eta_+$ whose Fourier transforms are not easy to deal with directly. All we want to require is that there be a smoothing function $\eta_\circ$, easier to deal with, such that $\eta_\circ$ be close to $\eta_+$ in $\ell_2$ norm.
Assume, then, that
\[
|\eta_+ - \eta_\circ|_2 \le \epsilon_0|\eta_\circ|_2,
\]
where $\eta_\circ$ is thrice differentiable outside finitely many points and satisfies $\eta_\circ^{(3)} \in L_1$.
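Then (10.17) equals
\[
\begin{aligned}
&x^3\cdot\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\phi((q,N))}{\phi(q)^3}\,\mu(q)^2\mu((q,N))\int_{-\frac{\delta_0 r}{2qx}}^{\frac{\delta_0 r}{2qx}}(\widehat{\eta_\circ}(-\alpha x))^2\,\widehat{\eta_*}(-\alpha x)\,e(-\alpha N)\,d\alpha\\
&\quad + x^3\cdot\sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{\phi((q,N))}{\phi(q)^3}\,\mu(q)^2\mu((q,N))\int_{-\frac{\delta_0 r}{qx}}^{\frac{\delta_0 r}{qx}}(\widehat{\eta_\circ}(-\alpha x))^2\,\widehat{\eta_*}(-\alpha x)\,e(-\alpha N)\,d\alpha.
\end{aligned}
\tag{10.18}
\]
plus
\[
O^*\left(x^2\cdot\sum_q\frac{\mu(q)^2}{\phi(q)^2}\int_{-\infty}^{\infty}\left|(\widehat{\eta_+}(-\alpha))^2 - (\widehat{\eta_\circ}(-\alpha))^2\right|\,|\widehat{\eta_*}(-\alpha)|\,d\alpha\right).
\tag{10.19}
\]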
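\[
\begin{aligned}
&\le |\eta_*|_1\cdot|\widehat{\eta_+} - \widehat{\eta_\circ}|_2\,|\widehat{\eta_+} + \widehat{\eta_\circ}|_2 = |\eta_*|_1\cdot|\eta_+ - \eta_\circ|_2\,|\eta_+ + \eta_\circ|_2\\
&\le |\eta_*|_1\cdot|\eta_+ - \eta_\circ|_2\,(2|\eta_\circ|_2 + |\eta_+ - \eta_\circ|_2) = |\eta_*|_1\,|\eta_\circ|_2^2\cdot(2 + \epsilon_0)\,\epsilon_0.
\end{aligned}
\]
\[
\sum_{q\ge 1}\frac{\phi((q,N))}{\phi(q)^3}\,\mu(q)^2\mu((q,N)) = \prod_{p|N}\left(1 - \frac{1}{(p-1)^2}\right)\cdot\prod_{p\nmid N}\left(1 + \frac{1}{(p-1)^3}\right).
\]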
1 This is obviously crude, in that we are bounding φ((q, N ))/φ(q) by 1. We are doing so in order to
(This is standard. One rigorous way to obtain (10.22) is to approximate the integral
over α ∈ (−∞, ∞) by an integral with a smooth weight, at different scales; as the scale
becomes broader, the Fourier transform of the weight approximates (as a distribution)
the δ function. Apply Plancherel.)
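Hence, (10.17) equals
\[
x^2\cdot\int_0^{\infty}\int_0^{\infty}\eta_\circ(t_1)\,\eta_\circ(t_2)\,\eta_*\left(\frac{N}{x} - (t_1 + t_2)\right)dt_1\,dt_2
\cdot\prod_{p|N}\left(1 - \frac{1}{(p-1)^2}\right)\cdot\prod_{p\nmid N}\left(1 + \frac{1}{(p-1)^3}\right).
\tag{10.23}
\]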
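\[
\left(2.82643\,|\eta_\circ|_2^2\,(2 + \epsilon_0)\cdot\epsilon_0 + \frac{4.31004\,|\eta_\circ|_2^2 + 0.00113\,\frac{|\eta_\circ^{(3)}|_1^2}{\delta_0^5}}{r}\right)|\eta_*|_1\,x^2
\tag{10.24}
\]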
Here (10.23) is just as in the classical case [IK04, (19.10)], except for the fact that
a factor of 1/2 has been replaced by a double integral. Later, in chapter 11, we will see
how to choose our smoothing functions (and x, in terms of N ) so as to make the double
integral as large as possible in comparison with the error terms. This is an important
optimization. (We already had a first discussion of this in the introduction; see (1.39)
and what follows.)
What remains to estimate is the contribution of all the terms of the form $\mathrm{err}_{\eta,\chi}(\delta,x)$ in (10.12) and (10.13). Let us first deal with another matter – bounding the $\ell_2$ norm of $|S_\eta(\alpha,x)|^2$ over the major arcs.
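By (10.8),
\[
\begin{aligned}
\sum_{\substack{a \bmod q\\ \gcd(a,q)=1}}\left|S_\eta\left(\frac{a}{q} + \frac{\delta}{x}, x\right)\right|^2
&= \frac{1}{\phi(q)^2}\sum_\chi\sum_{\chi'}\tau(\chi)\overline{\tau(\chi')}\Bigg(\sum_{\substack{a \bmod q\\ \gcd(a,q)=1}}\chi(a)\overline{\chi'(a)}\Bigg)\cdot S_{\eta,\chi^*}(\delta/x,x)\,\overline{S_{\eta,\chi'^*}(\delta/x,x)}\\
&\quad + O^*\left(2(1+\sqrt{q})(\log x)^2|\eta|_\infty\max_\alpha|S_\eta(\alpha,x)| + \left((1+\sqrt{q})(\log x)^2|\eta|_\infty\right)^2\right)\\
&= \frac{1}{\phi(q)}\sum_\chi|\tau(\chi)|^2\,|S_{\eta,\chi^*}(\delta/x,x)|^2 + K_{q,1}\,(2|S_\eta(0,x)| + K_{q,1}),
\end{aligned}
\]
where
\[
K_{q,1} = (1+\sqrt{q})(\log x)^2|\eta|_\infty.
\]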
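\[
\begin{aligned}
\sum_{\substack{a \bmod q\\ (a,q)=1}}\left|S_\eta\left(\frac{a}{q} + \frac{\delta}{x}, x\right)\right|^2
&= \frac{\mu^2(q)}{\phi(q)}\left|\widehat{\eta}(-\delta)\,x + O^*(\mathrm{err}_{\eta,\chi_T}(\delta,x)\cdot x)\right|^2\\
&\quad + \frac{1}{\phi(q)}\sum_{\chi\ne\chi_T}\mu^2\left(\frac{q}{q^*}\right)q^*\cdot O^*\left(|\mathrm{err}_{\eta,\chi}(\delta,x)|^2x^2\right) + K_{q,1}\,(2|S_\eta(0,x)| + K_{q,1})\\
&= \frac{\mu^2(q)\,x^2}{\phi(q)}\left(|\widehat{\eta}(-\delta)|^2 + O^*\left(\left|\mathrm{err}_{\eta,\chi_T}(\delta,x)\,(2|\eta|_1 + \mathrm{err}_{\eta,\chi_T}(\delta,x))\right|\right)\right)\\
&\quad + O^*\left(\max_{\chi\ne\chi_T}q^*\,|\mathrm{err}_{\eta,\chi^*}(\delta,x)|^2x^2 + K_{q,2}\,x\right),
\end{aligned}
\]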
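\[
\begin{aligned}
&\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\sum_{\substack{a \bmod q\\ (a,q)=1}}\int_{\frac{a}{q}-\frac{\delta_0 r}{2qx}}^{\frac{a}{q}+\frac{\delta_0 r}{2qx}}|S_\eta(\alpha,x)|^2\,d\alpha
+ \sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\sum_{\substack{a \bmod q\\ (a,q)=1}}\int_{\frac{a}{q}-\frac{\delta_0 r}{qx}}^{\frac{a}{q}+\frac{\delta_0 r}{qx}}|S_\eta(\alpha,x)|^2\,d\alpha\\
&\quad = \sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)\,x^2}{\phi(q)}\int_{-\frac{\delta_0 r}{2qx}}^{\frac{\delta_0 r}{2qx}}|\widehat{\eta}(-\alpha x)|^2\,d\alpha
+ \sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{\mu^2(q)\,x^2}{\phi(q)}\int_{-\frac{\delta_0 r}{qx}}^{\frac{\delta_0 r}{qx}}|\widehat{\eta}(-\alpha x)|^2\,d\alpha\\
&\qquad + O^*\left(\sum_q\frac{\mu^2(q)\,x^2}{\phi(q)}\cdot\frac{\gcd(q,2)\,\delta_0 r}{qx}\cdot ET_{\eta,\frac{\delta_0 r}{2}}\left(2|\eta|_1 + ET_{\eta,\frac{\delta_0 r}{2}}\right)\right)\\
&\qquad + \sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\delta_0 r x}{q}\cdot O^*\Bigg(\max_{\substack{\chi \bmod q\\ \chi\ne\chi_T\\ |\delta|\le\delta_0 r/2q}}q^*\,|\mathrm{err}_{\eta,\chi^*}(\delta,x)|^2 + \frac{K_{q,2}}{x}\Bigg)\\
&\qquad + \sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{2\delta_0 r x}{q}\cdot O^*\Bigg(\max_{\substack{\chi \bmod q\\ \chi\ne\chi_T\\ |\delta|\le\delta_0 r/q}}q^*\,|\mathrm{err}_{\eta,\chi^*}(\delta,x)|^2 + \frac{K_{q,2}}{x}\Bigg),
\end{aligned}
\tag{10.25}
\]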
where
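and $\chi_T$ is the trivial character. If all we want is an upper bound, we can simply remark that
\[
\begin{aligned}
&x\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}\int_{-\frac{\delta_0 r}{2qx}}^{\frac{\delta_0 r}{2qx}}|\widehat{\eta}(-\alpha x)|^2\,d\alpha
+ x\sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{\mu^2(q)}{\phi(q)}\int_{-\frac{\delta_0 r}{qx}}^{\frac{\delta_0 r}{qx}}|\widehat{\eta}(-\alpha x)|^2\,d\alpha\\
&\quad\le\Bigg(\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)} + \sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{\mu^2(q)}{\phi(q)}\Bigg)|\widehat{\eta}|_2^2
= 2|\eta|_2^2\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}.
\end{aligned}
\]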
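The last equality rests on the fact that every even square-free $q \le 2r$ is of the form $2m$ with $m \le r$ odd and square-free, and $\phi(2m) = \phi(m)$. A short exact check with rational arithmetic (an illustration; $r = 60$ is an arbitrary choice and the helper names are hypothetical):

```python
from fractions import Fraction
from math import gcd

def phi(n):
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def is_squarefree(n):
    return all(n % (p * p) for p in range(2, int(n ** 0.5) + 1))

def weight_sum(limit, parity):
    """Exact sum of mu(q)^2 / phi(q) over q <= limit with q % 2 == parity."""
    return sum(Fraction(1, phi(q)) for q in range(1, limit + 1)
               if q % 2 == parity and is_squarefree(q))

r = 60  # illustrative value
odd_sum = weight_sum(r, 1)        # q <= r, q odd
even_sum = weight_sum(2 * r, 0)   # q <= 2r, q even
print(odd_sum == even_sum, float(odd_sum))
```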
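$\eta_\circ$ is thrice differentiable outside finitely many points, (c) $\eta_\circ^{(3)} \in L_1$. Clearly,
\[
\begin{aligned}
&x\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}\int_{-\frac{\delta_0 r}{2qx}}^{\frac{\delta_0 r}{2qx}}|\widehat{\eta}(-\alpha x)|^2\,d\alpha\\
&\quad\le\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}\left(\int_{-\frac{\delta_0 r}{2q}}^{\frac{\delta_0 r}{2q}}|\widehat{\eta_\circ}(-\alpha)|^2\,d\alpha + 2\langle|\widehat{\eta_\circ}|, |\widehat{\eta} - \widehat{\eta_\circ}|\rangle + |\widehat{\eta} - \widehat{\eta_\circ}|_2^2\right)\\
&\quad = \sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}\int_{-\frac{\delta_0 r}{2q}}^{\frac{\delta_0 r}{2q}}|\widehat{\eta_\circ}(-\alpha)|^2\,d\alpha
+ O^*\left(\left(\frac{1}{2}\log r + 0.85\right)\left(2|\eta_\circ|_2|\eta - \eta_\circ|_2 + |\eta_\circ - \eta|_2^2\right)\right),
\end{aligned}
\]
where we are using (C.11) and isometry. Also,
\[
\sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{\mu^2(q)}{\phi(q)}\int_{-\frac{\delta_0 r}{qx}}^{\frac{\delta_0 r}{qx}}|\widehat{\eta}(-\alpha x)|^2\,d\alpha
= \sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}\int_{-\frac{\delta_0 r}{2qx}}^{\frac{\delta_0 r}{2qx}}|\widehat{\eta}(-\alpha x)|^2\,d\alpha.
\]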
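and
\[
\begin{aligned}
L_{r,\delta_0} &= 2|\eta_\circ|_2^2\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)} + O^*\left((\log r + 1.7)\cdot\left(2|\eta_\circ|_2|\eta - \eta_\circ|_2 + |\eta_\circ - \eta|_2^2\right)\right)\\
&\quad + O^*\left(\frac{2|\eta_\circ^{(3)}|_1^2}{5\pi^6\delta_0^5}\cdot\left(0.64787 + \frac{\log r}{4r} + \frac{0.425}{r}\right)\right).
\end{aligned}
\tag{10.29}
\]
Here, as elsewhere, $\chi^*$ denotes the primitive character inducing $\chi$, whereas $q^*$ denotes the modulus of $\chi^*$.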
The error term $xr\,ET_{\eta,\delta_0 r}$ will be very small, since it will be estimated using the Riemann zeta function; the error term involving $K_{r,2}$ will be completely negligible. As for the term involving $xr(r+1)E^2_{\eta,r,\delta_0}$, we see that it constrains us to have $|\mathrm{err}_{\eta,\chi}(x,N)|$ less than a constant times $1/r$ if we do not want the main term in the bound (10.26) to be overwhelmed.
There are at least two ways we can evaluate (10.4). One is to substitute (10.10) into (10.4). The disadvantages here are that (a) this can give rise to pages-long formulae, (b) this gives error terms proportional to $xr|\mathrm{err}_{\eta,\chi}(x,N)|$, meaning that, to win, we would have to show that $|\mathrm{err}_{\eta,\chi}(x,N)|$ is much smaller than $1/r$. What we will do instead is to use our $\ell_2$ estimate (10.26) in order to bound the contribution of non-principal terms. This will give us a gain of almost $\sqrt{r}$ on the error terms; in other words, to win, it will be enough to show later that $|\mathrm{err}_{\eta,\chi}(x,N)|$ is much smaller than $1/\sqrt{r}$.
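The contribution of the error terms in $S_{\eta_3}(\alpha,x)$ (that is, all terms involving the quantities $\mathrm{err}_{\eta,\chi}$ in expressions (10.12) and (10.13)) to (10.4) is
\[
\begin{aligned}
&\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{1}{\phi(q)}\sum_{\chi_3 \bmod q}\tau(\chi_3)\sum_{\substack{a \bmod q\\ (a,q)=1}}\chi_3(a)\,e(-Na/q)
\int_{-\frac{\delta_0 r}{2qx}}^{\frac{\delta_0 r}{2qx}}S_{\eta_+}(\alpha + a/q, x)^2\,\mathrm{err}_{\eta_*,\chi_3^*}(\alpha x, x)\,e(-N\alpha)\,d\alpha\\
&\quad + \sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{1}{\phi(q)}\sum_{\chi_3 \bmod q}\tau(\chi_3)\sum_{\substack{a \bmod q\\ (a,q)=1}}\chi_3(a)\,e(-Na/q)
\int_{-\frac{\delta_0 r}{qx}}^{\frac{\delta_0 r}{qx}}S_{\eta_+}(\alpha + a/q, x)^2\,\mathrm{err}_{\eta_*,\chi_3^*}(\alpha x, x)\,e(-N\alpha)\,d\alpha.
\end{aligned}
\tag{10.30}
\]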
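We should also remember the terms in (10.11); we can integrate them over all of $\mathbb{R}/\mathbb{Z}$, and obtain that they contribute at most
\[
\begin{aligned}
&\int_{\mathbb{R}/\mathbb{Z}}2\sum_{j=1}^{3}\prod_{j'\ne j}|S_{\eta_{j'}}(\alpha,x)|\cdot\max_{q\le r}\sum_{p|q}\log p\sum_{\alpha\ge 1}\eta_j\left(\frac{p^\alpha}{x}\right)d\alpha\\
&\quad\le 2\sum_{j=1}^{3}\prod_{j'\ne j}|S_{\eta_{j'}}(\alpha,x)|_2\cdot\max_{q\le r}\sum_{p|q}\log p\sum_{\alpha\ge 1}\eta_j\left(\frac{p^\alpha}{x}\right)\\
&\quad = 2\sum_n\Lambda^2(n)\,\eta_+^2(n/x)\cdot\log r\cdot\max_{p\le r}\sum_{\alpha\ge 1}\eta_*\left(\frac{p^\alpha}{x}\right)\\
&\qquad + 4\sqrt{\sum_n\Lambda^2(n)\,\eta_+^2(n/x)\cdot\sum_n\Lambda^2(n)\,\eta_*^2(n/x)}\cdot\log r\cdot\max_{p\le r}\sum_{\alpha\ge 1}\eta_*\left(\frac{p^\alpha}{x}\right)
\end{aligned}
\]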
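equals
\[
\begin{aligned}
&C_0\,C_{\eta_\circ,\eta_*}\,x^2 + \left(2.82643\,|\eta_\circ|_2^2\,(2+\epsilon_0)\cdot\epsilon_0 + \frac{4.31004\,|\eta_\circ|_2^2 + 0.0012\,\frac{|\eta_\circ^{(3)}|_1^2}{\delta_0^5}}{r}\right)|\eta_*|_1\,x^2\\
&\quad + O^*\left(E_{\eta_*,r,\delta_0}A_{\eta_+} + E_{\eta_+,r,\delta_0}\cdot 1.6812\left(\sqrt{A_{\eta_+}} + 1.6812\,|\eta_+|_2\right)|\eta_*|_2\right)\cdot x^2\\
&\quad + O^*\left(2Z_{\eta_+^2,2}(x)\,LS_{\eta_*}(x,r)\cdot x + 4\sqrt{Z_{\eta_+^2,2}(x)\,Z_{\eta_*^2,2}(x)}\,LS_{\eta_+}(x,r)\cdot x\right),
\end{aligned}
\tag{10.36}
\]
where
\[
\begin{aligned}
C_0 &= \prod_{p|N}\left(1 - \frac{1}{(p-1)^2}\right)\cdot\prod_{p\nmid N}\left(1 + \frac{1}{(p-1)^3}\right),\\
C_{\eta_\circ,\eta_*} &= \int_0^{\infty}\int_0^{\infty}\eta_\circ(t_1)\,\eta_\circ(t_2)\,\eta_*\left(\frac{N}{x} - (t_1+t_2)\right)dt_1\,dt_2,
\end{aligned}
\tag{10.37}
\]
\[
\begin{aligned}
E_{\eta,r,\delta_0} &= \max_{\substack{\chi \bmod q\\ q\le\gcd(q,2)\cdot r\\ |\delta|\le\gcd(q,2)\delta_0 r/2q}}\sqrt{q^*}\cdot|\mathrm{err}_{\eta,\chi^*}(\delta,x)|,
\qquad ET_{\eta,s} = \max_{|\delta|\le s/q}|\mathrm{err}_{\eta,\chi_T}(\delta,x)|,\\
A_\eta &= \frac{1}{x}\int_{\mathfrak{M}}|S_{\eta_+}(\alpha,x)|^2\,d\alpha,
\qquad L_{\eta,r,\delta_0}\le 2|\eta|_2^2\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)},\\
K_{r,2} &= (1+\sqrt{2r})(\log x)^2|\eta|_\infty\left(2Z_{\eta,1}(x)/x + (1+\sqrt{2r})(\log x)^2|\eta|_\infty/x\right),\\
Z_{\eta,k}(x) &= \frac{1}{x}\sum_n\Lambda^k(n)\,\eta(n/x),
\qquad LS_\eta(x,r) = \log r\cdot\max_{p\le r}\sum_{\alpha\ge 1}\eta\left(\frac{p^\alpha}{x}\right),
\end{aligned}
\tag{10.38}
\]
and $\mathrm{err}_{\eta,\chi}$ is as in (10.12) and (10.13).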
Here is how to read these expressions. The error term in the first line of (10.36) will be small provided that $\epsilon_0$ is small and $r$ is large. The third line of (10.36) will be negligible, as will be the term $2\delta_0 r(\log er)K_{r,2}$ in the definition of $A_\eta$. (Clearly, $Z_{\eta,k}(x) \ll_\eta (\log x)^{k-1}$ and $LS_\eta(x,q) \ll_\eta \tau(q)\log x$ for any $\eta$ of rapid decay.)
10.4. THE INTEGRAL OVER THE MAJOR ARCS: CONCLUSION 215
One of our goals is to maximize the quantity $C_{\eta_\circ,\eta_*}$ in (10.37) relative to $|\eta_\circ|_2^2|\eta_*|_1$. One way to do this is to ensure that (a) $\eta_*$ is concentrated on a very short¹ interval $[0,\epsilon)$, (b) $\eta_\circ$ is supported on the interval $[0,2]$, and is symmetric around $t = 1$, meaning that $\eta_\circ(t) \sim \eta_\circ(2-t)$. Then, for $x \sim N/2$, the integral
\[
\int_0^{\infty}\int_0^{\infty}\eta_\circ(t_1)\,\eta_\circ(t_2)\,\eta_*\left(\frac{N}{x} - (t_1+t_2)\right)dt_1\,dt_2
\]
provided that η0 (t) ≥ 0 for all t. It is easy to check (using Cauchy-Schwarz in the
second step) that this is essentially optimal. (We will redo this rigorously in a little
while.)
At the same time, the fact is that major-arc estimates are best for smoothing func-
tions η of a particular form, and we have minor-arc estimates from Part I for a different
specific smoothing η2 . The issue, then, is how do we choose η◦ and η∗ as above so that
218 CHAPTER 11. OPTIMIZING AND ADAPTING SMOOTHING FUNCTIONS
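We define $\eta_\circ : \mathbb{R}\to\mathbb{R}$ by
\[
\eta_\circ(t) = h(t)\,\eta_\heartsuit(t) =
\begin{cases}
t^3(2-t)^3e^{-(t-1)^2/2} & \text{if } t\in[0,2],\\
0 & \text{otherwise.}
\end{cases}
\tag{11.3}
\]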
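The function (11.3) is easy to experiment with. The following sketch (an illustration with hypothetical helper names) evaluates $\eta_\circ$, checks its symmetry around $t = 1$, and approximates $|\eta_\circ|_2^2$ by Simpson's rule; the result can be compared with the constants $0.640206$ in (9.94) and $|\eta_\circ|_2 \le 0.80013$ used in Chapter 9, on the assumption that the same $\eta_\circ$ is meant there.

```python
import math

def eta_circ(t):
    """eta_o(t) = t^3 (2 - t)^3 exp(-(t - 1)^2 / 2) on [0, 2], and 0 elsewhere."""
    if 0.0 <= t <= 2.0:
        return t ** 3 * (2 - t) ** 3 * math.exp(-(t - 1) ** 2 / 2)
    return 0.0

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

l2_squared = simpson(lambda t: eta_circ(t) ** 2, 0.0, 2.0)
print(l2_squared, math.sqrt(l2_squared))
print(eta_circ(0.3), eta_circ(1.7))   # symmetry around t = 1
```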
2 hw−v,vi
s
|w|2 2hw − v, vi + |w − v|22 |v|22
+ δ2
(2δ + δ 2 )2
∗
= 1+ =1+ +O
|v|2 |v|22 2 23/2
1 (5/2)2 |w − v|22
hw − v, vi
= 1 + δ + O∗ + 3/2 δ2 = 1 + + O ∗
2.71 .
2 2 |v|22 |v|22
11.2. THE SMOOTHING FUNCTION η∗ : ADAPTING MINOR-ARC BOUNDS219
where $I_{[1/2,1]}(t)$ is $1$ if $t\in[1/2,1]$ and $0$ otherwise. For major-arc estimates, we will use a function based on
\[
\eta_\heartsuit = e^{-t^2/2}.
\]
We will actually use here the function $t^2e^{-t^2/2}$, whose Mellin transform is $M\eta_\heartsuit(s+2)$ (by, e.g., [BBO10, Table 11.1]).)
We will follow the simple expedient of convolving the two smoothing functions, one good for minor arcs, the other one for major arcs. In general, let $\varphi_1, \varphi_2 : [0,\infty)\to\mathbb{C}$. It is easy to use bounds on sums of the form
\[
S_{f,\varphi_1}(x) = \sum_n f(n)\,\varphi_1(n/x)
\tag{11.8}
\]
where η1 = 2 · I[1/2,1] .
Let us restate the bounds from Theorem 3.1.1 – the main result of Part I. We will
use Lemma C.2.2 to bound terms of the form q/φ(q).
Let $x \ge x_0$, $x_0 = 2.16\cdot 10^{20}$. Let $2\alpha = a/q + \delta/x$, $q\le Q$, $\gcd(a,q) = 1$, $|\delta/x|\le 1/qQ$, where $Q = (3/4)x^{2/3}$. Then, if $3\le q\le x^{1/3}/6$, Theorem 3.1.1 gives us that
\[
|S_{\eta_2}(\alpha,x)| \le g_x\left(\max\left(1,\frac{|\delta|}{8}\right)\cdot q\right)x,
\tag{11.11}
\]
where
\[
g_x(r) = \frac{(R_{x,2r}\log 2r + 0.5)\sqrt{z(r)} + 2.5}{\sqrt{2r}} + \frac{L_{2r}}{r} + 3.36\,x^{-1/6},
\tag{11.12}
\]
with
\[
\begin{aligned}
R_{x,t} &= 0.27125\log\left(1 + \frac{\log 4t}{2\log\frac{9x^{1/3}}{2.004t}}\right) + 0.41415,\\
L_t &= z(t/2)\left(\frac{13}{4}\log t + 7.82\right) + 13.66\log t + 37.55.
\end{aligned}
\tag{11.13}
\]
If $q > x^{1/3}/6$, then, again by Theorem 3.1.1,
\[
|S_{\eta_2}(\alpha,x)| \le h(x)\,x,
\tag{11.14}
\]
where
\[
h(x) = 0.276\,x^{-1/6}(\log x)^{3/2} + 1234\,x^{-1/3}\log x.
\tag{11.15}
\]
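To get a feeling for the sizes involved, the bounds (11.11) to (11.15) can be coded directly. This is a sketch under a stated assumption: the auxiliary function $z(r)$ is not defined in this section, and the code assumes $z(r) = e^\gamma\log\log r + 2.50637/\log\log r$ (a guess consistent with the constants appearing in Chapter 12); if the intended definition differs, only the function `z` needs changing. All function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def z(r):
    # ASSUMPTION: z(r) = e^gamma log log r + 2.50637 / log log r (not defined in this section).
    ll = math.log(math.log(r))
    return math.exp(EULER_GAMMA) * ll + 2.50637 / ll

def R(x, t):
    """R_{x,t} as in (11.13); needs 9 x^(1/3) > 2.004 t."""
    inner = math.log(4 * t) / (2 * math.log(9 * x ** (1 / 3) / (2.004 * t)))
    return 0.27125 * math.log(1 + inner) + 0.41415

def L(t):
    """L_t as in (11.13)."""
    return z(t / 2) * (13 / 4 * math.log(t) + 7.82) + 13.66 * math.log(t) + 37.55

def g(x, r):
    """g_x(r) as in (11.12)."""
    main = ((R(x, 2 * r) * math.log(2 * r) + 0.5) * math.sqrt(z(r)) + 2.5) / math.sqrt(2 * r)
    return main + L(2 * r) / r + 3.36 * x ** (-1 / 6)

def h(x):
    """h(x) as in (11.15)."""
    return 0.276 * x ** (-1 / 6) * math.log(x) ** 1.5 + 1234 * x ** (-1 / 3) * math.log(x)

r = 11
x_r = (6 * r) ** 3          # the crossover point x = (6r)^3
print(h(x_r), g(x_r, r))
print(h(1e30), g(1e30, 1e5))
```

At the crossover $x = (6r)^3$ with $r = 11$, the value of $h$ is far larger than that of $g_x(r)$, which is the direction needed for the piecewise function in Lemma 11.2.1 to be decreasing across the junction.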
We will work with x varying within a range, and so we must pay some attention
to the dependence of (11.11) and (11.14) on x. Let us prove two auxiliary lemmas on
this.
Lemma 11.2.1. Let $g_x(r)$ be as in (11.12) and $h(x)$ as in (11.15). Then
\[
x\mapsto
\begin{cases}
h(x) & \text{if } x < (6r)^3,\\
g_x(r) & \text{if } x\ge(6r)^3
\end{cases}
\]
Hence
\[
\begin{aligned}
R_{x_r,2r}\log 2r + 0.5 &\le 0.27215\log\log x_r\,\log x_r^{1/3} - 0.27215\log 12.5\,\log 3 + 0.5\\
&\le 0.09072\log\log x_r\,\log x_r - 0.255.
\end{aligned}
\]
for $r\ge 37$, and we also get $z(r)\le e^\gamma\log\log x_r$ for $r\in[11,37]$ by the bisection method with 10 iterations. Hence
\[
\begin{aligned}
(R_{x_r,2r}\log 2r + 0.5)\sqrt{z(r)} + 2.5
&\le (0.09072\log\log x_r\,\log x_r - 0.255)\sqrt{e^\gamma\log\log x_r} + 2.5\\
&\le 0.1211\log x_r\,(\log\log x_r)^{3/2} + 2,
\end{aligned}
\]
and so
\[
\frac{(R_{x_r,2r}\log 2r + 0.5)\sqrt{z(r)} + 2.5}{\sqrt{2r}} \le \left(0.21\log x_r\,(\log\log x_r)^{3/2} + 3.47\right)x_r^{-1/6}.
\]
Now, by (11.16),
\[
\begin{aligned}
L_{2r} &\le e^\gamma\log\log x_r\cdot\left(\frac{13}{4}\log(x_r^{1/3}/3) + 7.82\right) + 13.66\log(x_r^{1/3}/3) + 37.55\\
&\le e^\gamma\log\log x_r\cdot\left(\frac{13}{12}\log x_r + 4.25\right) + 4.56\log x_r + 22.55.
\end{aligned}
\]
It is clear that
\[
\frac{4.25\,e^\gamma\log\log x_r + 4.56\log x_r + 22.55}{x_r^{1/3}/6} < 1234\,x_r^{-1/3}\log x_r
\]
for $x_r\ge e$: we make the comparison for $x_r = e$ and take the derivative of the ratio of the left side by the right side.
It remains to show that
\[
0.21\log x_r\,(\log\log x_r)^{3/2} + 3.47 + 3.36 + \frac{13}{2}e^\gamma x_r^{-1/3}\log x_r\log\log x_r
\tag{11.17}
\]
is less than $0.276(\log x_r)^{3/2}$ for $x_r$ large enough. Since $t\mapsto(\log t)^{3/2}/t^{1/2}$ is decreasing for $t > e^3$, we see that
\[
\frac{0.21\log x_r\,(\log\log x_r)^{3/2} + 6.83 + \frac{13}{2}e^\gamma x_r^{-1/3}\log x_r\log\log x_r}{0.276(\log x_r)^{3/2}} < 1
\]
for all $x_r\ge e^{33}$, simply because it is true for $x = e^{33}$, which is greater than $e^{e^3}$.
We conclude that $h(x_r)\ge g_{x_r}(r) = g_{x_r}(x_r^{1/3}/6)$ for $x_r\ge e^{33}$. We check that $h(x_r)\ge g_{x_r}(x_r^{1/3}/6)$ for $\log x_r\in[\log 66^3, 33]$ as well by the bisection method (applied with 30 iterations, with $\log x_r$ as the variable, on the intervals $[\log 66^3, 20]$, $[20, 25]$, $[25, 30]$ and $[30, 33]$). Since $r\ge 11$ implies $x_r\ge 66^3$, we are done.
Lemma 11.2.2. Let $R_{x,r}$ be as in (11.12). Then $t\to R_{e^t,r}(r)$ is convex-up for $t\ge 3\log 6r$.
Proof. Since $t\to e^{-t/6}$ and $t\to t$ are clearly convex-up, all we have to do is to show that $t\to R_{e^t,r}$ is convex-up. In general, since
\[
(\log f)'' = \left(\frac{f'}{f}\right)' = \frac{f''f - (f')^2}{f^2},
\]
a function of the form $(\log f)$ is convex-up exactly when $f''f - (f')^2\ge 0$. If $f(t) = 1 + a/(t-b)$, we have $f''f - (f')^2\ge 0$ whenever
\[
(t + a - b)\cdot(2a)\ge a^2,
\]
i.e., $a^2 + 2at\ge 2ab$, and that certainly happens when $t\ge b$. In our case, $b = 3\log(2.004r/9)$, and so $t\ge 3\log 6r$ implies $t\ge b$.
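Now we come to the point where we prove bounds on exponential sums of the form $S_{\eta_*}(\alpha,x)$ (that is, sums based on the smoothing $\eta_*$) based on our bounds (11.11) and (11.14) on the exponential sums $S_{\eta_2}(\alpha,x)$. This is straightforward, as promised.
Proposition 11.2.3. Let $x\ge Kx_0$, $x_0 = 2.16\cdot 10^{20}$, $K\ge 1$. Let $S_\eta(\alpha,x)$ be as in (10.1). Let $\eta_* = \eta_2 *_M \varphi$, where $\eta_2$ is as in (11.10) and $\varphi : [0,\infty)\to[0,\infty)$ is continuous and in $L^1$.
Let $2\alpha = a/q + \delta/x$, $q\le Q$, $\gcd(a,q) = 1$, $|\delta/x|\le 1/qQ$, where $Q = (3/4)x^{2/3}$. If $q\le(x/K)^{1/3}/6$, then
\[
S_{\eta_*}(\alpha,x) \le g_{x,\varphi}\left(\max\left(1,\frac{|\delta|}{8}\right)q\right)\cdot|\varphi|_1\,x,
\tag{11.18}
\]
where
\[
\begin{aligned}
g_{x,\varphi}(r) &= \frac{(R_{x,K,\varphi,2r}\log 2r + 0.5)\sqrt{z(r)} + 2.5}{\sqrt{2r}} + \frac{L_{2r}}{r} + 3.36\,K^{1/6}x^{-1/6},\\
R_{x,K,\varphi,t} &= R_{x,t} + (R_{x/K,t} - R_{x,t})\,\frac{C_{\varphi,2,K}/|\varphi|_1}{\log K}
\end{aligned}
\tag{11.19}
\]
with $R_{x,t}$ and $L_t$ as in (11.13), and
\[
C_{\varphi,2,K} = -\int_{1/K}^{1}\varphi(w)\log w\,dw.
\tag{11.20}
\]
where
\[
\begin{aligned}
h_\varphi(x) &= h(x) + C_{\varphi,0,K}/|\varphi|_1,\\
C_{\varphi,0,K} &= 1.04488\int_0^{1/K}|\varphi(w)|\,dw
\end{aligned}
\tag{11.21}
\]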
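We bound the first integral by the trivial estimate $|S_{\eta_2}(\alpha,wx)|\le|S_{\eta_2}(0,wx)|$ and Cor. C.1.3:
\[
\int_0^{1/K}|S_{\eta_2}(0,wx)|\,\varphi(w)\frac{dw}{w} \le 1.04488\int_0^{1/K}wx\,\varphi(w)\frac{dw}{w} = 1.04488\,x\cdot\int_0^{1/K}\varphi(w)\,dw.
\]
If $w\ge 1/K$, then $wx\ge x_0$, and we can use (11.11) or (11.14). If $q > (x/K)^{1/3}/6$, then $|S_{\eta_2}(\alpha,wx)|\le h(x/K)\,wx$ by (11.14); moreover, $|S_{\eta_2}(\alpha,y)|\le h(y)\,y$ for $x/K\le y < (6q)^3$ (by (11.14)) and $|S_{\eta_2}(\alpha,y)|\le g_{y,1}(r)$ for $y\ge(6q)^3$ (by (11.11)). Thus, Lemma 11.2.1 gives us that
\[
\int_{1/K}^{\infty}|S_{\eta_2}(\alpha,wx)|\,\varphi(w)\frac{dw}{w} \le \int_{1/K}^{\infty}h(x/K)\,wx\cdot\varphi(w)\frac{dw}{w}
= h(x/K)\,x\int_{1/K}^{\infty}\varphi(w)\,dw \le h(x/K)\,|\varphi|_1\cdot x.
\]
If $q\le(x/K)^{1/3}/6$, we always use (11.11). We can use the coarse bound
\[
\int_{1/K}^{\infty}3.36\,x^{-1/6}\cdot wx\cdot\varphi(w)\frac{dw}{w} \le 3.36\,K^{1/6}|\varphi|_1\,x^{5/6}
\]
Since $L_r$ does not depend on $x$,
\[
\int_{1/K}^{\infty}\frac{L_r}{r}\cdot wx\cdot\varphi(w)\frac{dw}{w} \le \frac{L_r}{r}|\varphi|_1\,x.
\]
Therefore
\[
\begin{aligned}
\int_{1/K}^{\infty}R_{wx,t}\cdot wx\cdot\varphi(w)\frac{dw}{w}
&\le \int_{1/K}^{1}\left(\frac{\log w}{\log\frac{1}{K}}R_{x/K,t} + \left(1 - \frac{\log w}{\log\frac{1}{K}}\right)R_{x,t}\right)x\,\varphi(w)\,dw + \int_1^{\infty}R_{x,t}\,\varphi(w)\,x\,dw\\
&\le R_{x,t}\,x\cdot\int_{1/K}^{\infty}\varphi(w)\,dw + (R_{x/K,t} - R_{x,t})\frac{x}{\log K}\int_{1/K}^{1}\varphi(w)\log w\,dw\\
&\le \left(R_{x,t}|\varphi|_1 + (R_{x/K,t} - R_{x,t})\frac{C_{\varphi,2}}{\log K}\right)\cdot x,
\end{aligned}
\]
where
\[
C_{\varphi,2,K} = -\int_{1/K}^{1}\varphi(w)\log w\,dw.
\]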
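is decreasing on $r$ for $r\ge 670$. Taking logarithms, and then derivatives, we see that we have to show that
\[
\frac{\frac{1}{r}\,\frac{\ell + \log 8r}{2\ell^2}}{\left(1 + \frac{\log 8r}{2\ell}\right)\log\left(1 + \frac{\log 8r}{2\ell}\right)} + \frac{1}{r\log 2r} + \frac{1}{2r\log r\log\log r} < \frac{1}{2r},
\]
where $\ell = \log\frac{9y^{1/3}}{4.008r}$. We multiply by $2r$, and see that this is equivalent to
\[
\frac{\frac{1}{\ell}\left(2 - \frac{1}{1 + \frac{\log 8r}{2\ell}}\right)}{\log\left(1 + \frac{\log 8r}{2\ell}\right)} + \frac{2}{\log 2r} + \frac{1}{\log r\log\log r} < 1.
\tag{11.24}
\]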
Lemma 11.2.5. Let $x\ge 10^{25}$. Let $\phi : [0,\infty)\to[0,\infty)$ be continuous and in $L^1$. Let $g_{x,\phi}(r)$ and $h(x)$ be as in (11.19) and (11.15), respectively. Then
\[
g_{x,\phi}\left(\frac{3}{8}x^{4/15}\right) \ge h(2x/\log x).
\]
Recall that
\[
h(x) = \frac{0.276(\log x)^{3/2}}{x^{1/6}} + \frac{1234\log x}{x^{1/3}}.
\]
We can see that
\[
x\mapsto\frac{(\log x + 3.3)/x^{2/15}}{(\log(2x/\log x))^{3/2}/(2x/\log x)^{1/6}}
\tag{11.27}
\]
is increasing for $x\ge 10^{25}$ (and indeed for $x\ge e^{27}$) by taking the logarithm of the right side of (11.27) and then taking its derivative with respect to $t = \log x$. We can see in the same way that $(1/x^{2/15})/(\log(2x/\log x)/(2x/\log x)^{1/3})$ is increasing for $x\ge e^{22}$. Since
Our aim here is to give a bound on the $\ell_2$ norm of an exponential sum over the minor arcs. While we care about an exponential sum in particular, we will prove a result valid for all exponential sums $S(\alpha,x) = \sum_n a_ne(\alpha n)$ with $a_n$ of prime support.
We start by adapting ideas from Ramaré's version of the large sieve for primes to estimate $\ell_2$ norms over parts of the circle (§12.1). We are left with the task of giving an explicit bound on the factor in Ramaré's work; this we do in §12.2. As a side effect, this finally gives a fully explicit large sieve for primes that is asymptotically optimal, meaning a sieve that does not have a spurious factor of $e^\gamma$ in front; this was an arguably important gap in the literature.
$|S|_\infty|S|_2^2$, we can use the fact that large ("major") values of $S(\alpha)$ have to be multiplied only by $\int_{\mathfrak{M}}|S(\alpha)|^2\,d\alpha$, where $\mathfrak{M}$ is a union (small in measure) of major arcs. Now, can we give an upper bound for $\int_{\mathfrak{M}}|S(\alpha)|^2\,d\alpha$ better than $|S|_2^2 = \int_{\mathbb{R}/\mathbb{Z}}|S(\alpha)|^2\,d\alpha$?
The first version of [Helb] gave an estimate on that integral using a technique due to
Heath-Brown, which in turn rests on an inequality of Montgomery’s ([Mon71, (3.9)];
see also, e.g., [IK04, Lem. 7.15]). The technique was communicated by Heath-Brown
to the present author, who communicated it to Tao, who used it in his own notable work
on sums of five primes (see [Tao14, Lem. 4.6] and adjoining comments). We will be
able to do better than that estimate here.
The role played by Montgomery’s inequality in Heath-Brown’s method is played
here by a result of Ramaré’s ([Ram09, Thm. 2.1]; see also [Ram09, Thm. 5.2]). The
following proposition is based on Ramaré’s result, or rather on one possible proof of
it. Instead of using the result as stated in [Ram09], we will actually be using elements
of the proof of [Bom74, Thm. 7A], credited to Selberg. Simply integrating Ramaré’s
inequality would give a non-trivial if slightly worse bound.
228 CHAPTER 12. THE `2 NORM AND THE LARGE SIEVE
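where
\[
G_q(R) = \sum_{\substack{r\le R\\ (r,q)=1}}\frac{\mu^2(r)}{\phi(r)}.
\tag{12.2}
\]
Proof. By (12.1),
\[
\int_{\mathfrak{M}}|S(\alpha)|^2\,d\alpha = \sum_{q\le Q_0}\int_{-\frac{\delta_0Q_0}{qx}}^{\frac{\delta_0Q_0}{qx}}\sum_{\substack{a \bmod q\\ (a,q)=1}}\left|S\left(\frac{a}{q} + \alpha\right)\right|^2d\alpha.
\tag{12.3}
\]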
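for every $q\le\sqrt{x}$, where we use the assumption that $n$ is prime and $> \sqrt{x}$ (and thus coprime to $q$) when $a_n\ne 0$. Hence
\[
\begin{aligned}
\int_{\mathfrak{M}}|S(\alpha)|^2\,d\alpha
&= \sum_{q\le Q_0}\ \sum_{\substack{q^*|q\\ (q^*,q/q^*)=1\\ \mu^2(q/q^*)=1}}q^*\int_{-\frac{\delta_0Q_0}{qx}}^{\frac{\delta_0Q_0}{qx}}\frac{1}{\phi(q)}\sum^*_{\chi \bmod q^*}\Big|\sum_na_ne(\alpha n)\chi(n)\Big|^2d\alpha\\
&= \sum_{q^*\le Q_0}\frac{q^*}{\phi(q^*)}\sum_{\substack{r\le Q_0/q^*\\ (r,q^*)=1}}\frac{\mu^2(r)}{\phi(r)}\int_{-\frac{\delta_0Q_0}{q^*rx}}^{\frac{\delta_0Q_0}{q^*rx}}\sum^*_{\chi \bmod q^*}\Big|\sum_na_ne(\alpha n)\chi(n)\Big|^2d\alpha\\
&= \sum_{q^*\le Q_0}\frac{q^*}{\phi(q^*)}\int_{-\frac{\delta_0Q_0}{q^*x}}^{\frac{\delta_0Q_0}{q^*x}}\sum_{\substack{r\le\frac{Q_0}{q^*}\min\left(1,\frac{\delta_0}{|\alpha|x}\right)\\ (r,q^*)=1}}\frac{\mu^2(r)}{\phi(r)}\sum^*_{\chi \bmod q^*}\Big|\sum_na_ne(\alpha n)\chi(n)\Big|^2d\alpha
\end{aligned}
\]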
12.1. VARIATIONS ON THE LARGE SIEVE FOR PRIMES 229
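where
\[
\begin{aligned}
\Sigma &= \sum_{q^*\le Q_0}\frac{q^*}{\phi(q^*)}\int_{-\frac{\delta_0Q_0}{q^*x}}^{\frac{\delta_0Q_0}{q^*x}}\sum_{\substack{r\le\frac{Q}{q^*}\min\left(1,\frac{\delta_0}{|\alpha|x}\right)\\ (r,q^*)=1}}\frac{\mu^2(r)}{\phi(r)}\sum^*_{\chi \bmod q^*}\Big|\sum_na_ne(\alpha n)\chi(n)\Big|^2d\alpha\\
&\le \sum_{q\le Q}\frac{q}{\phi(q)}\sum_{\substack{r\le Q/q\\ (r,q)=1}}\frac{\mu^2(r)}{\phi(r)}\int_{-\frac{\delta_0Q}{qrx}}^{\frac{\delta_0Q}{qrx}}\sum^*_{\chi \bmod q}\Big|\sum_na_ne(\alpha n)\chi(n)\Big|^2d\alpha.
\end{aligned}
\]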
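for $\chi$ primitive of modulus $q$. Here $c_r(n)$ stands for the Ramanujan sum
\[
c_r(n) = \sum_{\substack{u \bmod r\\ (u,r)=1}}e^{2\pi inu/r}.
\]
For $n$ coprime to $r$, $c_r(n) = \mu(r)$. Since $\chi$ is primitive, $|\tau(\chi)| = \sqrt{q}$. Hence, for $r\le\sqrt{x}$ coprime to $q$,
\[
q\,\Big|\sum_na_ne(\alpha n)\chi(n)\Big|^2 = \Bigg|\sum_{\substack{b=1\\ (b,qr)=1}}^{qr}\chi(b)\,S\left(\frac{b}{qr} + \alpha\right)\Bigg|^2.
\]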
2
0 δ Q qr
X X µ2 (r) Z qrx X∗ X
b
Σ= χ(b)S +α dα
φ(rq) − δqrx
0Q qr
q≤Q r≤Q/q χ mod q b=1
(r,q)=1 (b,qr)=1
2
Z δ0 Q q
X 1 qx X X b
≤ χ(b)S +α dα
φ(q) δ0 Q
− qx q
q≤Q χ mod q b=1
(b,q)=1
δ0 Q q
XZ 2
qx X b
= S +α dα.
−
δ0 Q q
q≤Q qx b=1
(b,q)=1
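where
\[
G_q(R) = \sum_{\substack{r\le R\\ (r,q)=1}}\frac{\mu^2(r)}{\phi(r)}.
\tag{12.5}
\]
Proof. By (10.5),
\[
\begin{aligned}
\int_{\mathfrak{M}}|S(\alpha)|^2\,d\alpha &= \sum_{\substack{q\le Q_0\\ q\ \mathrm{odd}}}\int_{-\frac{\delta_0Q_0}{2qx}}^{\frac{\delta_0Q_0}{2qx}}\sum_{\substack{a \bmod q\\ (a,q)=1}}\left|S\left(\frac{a}{q} + \alpha\right)\right|^2d\alpha\\
&\quad + \sum_{\substack{q\le Q_0\\ q\ \mathrm{even}}}\int_{-\frac{\delta_0Q_0}{qx}}^{\frac{\delta_0Q_0}{qx}}\sum_{\substack{a \bmod q\\ (a,q)=1}}\left|S\left(\frac{a}{q} + \alpha\right)\right|^2d\alpha.
\end{aligned}
\]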
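We proceed as in the proof of Prop. 12.1.1. We still have (12.3). Hence $\int_{\mathfrak{M}}|S(\alpha)|^2\,d\alpha$ equals
\[
\begin{aligned}
&\sum_{\substack{q^*\le Q_0\\ q^*\ \mathrm{odd}}}\frac{q^*}{\phi(q^*)}\int_{-\frac{\delta_0Q_0}{2q^*x}}^{\frac{\delta_0Q_0}{2q^*x}}\sum_{\substack{r\le\frac{Q_0}{q^*}\min\left(1,\frac{\delta_0}{2|\alpha|x}\right)\\ (r,2q^*)=1}}\frac{\mu^2(r)}{\phi(r)}\sum^*_{\chi \bmod q^*}\Big|\sum_na_ne(\alpha n)\chi(n)\Big|^2d\alpha\\
&\quad + \sum_{\substack{q^*\le 2Q_0\\ q^*\ \mathrm{even}}}\frac{q^*}{\phi(q^*)}\int_{-\frac{\delta_0Q_0}{q^*x}}^{\frac{\delta_0Q_0}{q^*x}}\sum_{\substack{r\le\frac{2Q_0}{q^*}\min\left(1,\frac{\delta_0}{2|\alpha|x}\right)\\ (r,q^*)=1}}\frac{\mu^2(r)}{\phi(r)}\sum^*_{\chi \bmod q^*}\Big|\sum_na_ne(\alpha n)\chi(n)\Big|^2d\alpha.
\end{aligned}
\]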
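(The sum with $q$ odd and $r$ even is equal to the first sum; hence the factor of 2 in front.) Therefore,
\[
\begin{aligned}
\int_{\mathfrak{M}}|S(\alpha)|^2\,d\alpha &\le \Bigg(\max_{\substack{q^*\le Q_0\\ q^*\ \mathrm{odd}}}\ \max_{s\le Q_0/q^*}\frac{G_{2q^*}(Q_0/sq^*)}{G_{2q^*}(Q/sq^*)}\Bigg)\cdot 2\Sigma_1\\
&\quad + \Bigg(\max_{\substack{q^*\le 2Q_0\\ q^*\ \mathrm{even}}}\ \max_{s\le 2Q_0/q^*}\frac{G_{q^*}(2Q_0/sq^*)}{G_{q^*}(2Q/sq^*)}\Bigg)\cdot\Sigma_2,
\end{aligned}
\tag{12.6}
\]
where
\[
\begin{aligned}
\Sigma_1 &= \sum_{\substack{q\le Q\\ q\ \mathrm{odd}}}\frac{q}{\phi(q)}\sum_{\substack{r\le Q/q\\ (r,2q)=1}}\frac{\mu^2(r)}{\phi(r)}\int_{-\frac{\delta_0Q}{2qrx}}^{\frac{\delta_0Q}{2qrx}}\sum^*_{\chi \bmod q}\Big|\sum_na_ne(\alpha n)\chi(n)\Big|^2d\alpha\\
&= \sum_{\substack{q\le Q\\ q\ \mathrm{odd}}}\frac{q}{\phi(q)}\sum_{\substack{r\le 2Q/q\\ (r,q)=1\\ r\ \mathrm{even}}}\frac{\mu^2(r)}{\phi(r)}\int_{-\frac{\delta_0Q}{qrx}}^{\frac{\delta_0Q}{qrx}}\sum^*_{\chi \bmod q}\Big|\sum_na_ne(\alpha n)\chi(n)\Big|^2d\alpha,\\
\Sigma_2 &= \sum_{\substack{q\le 2Q\\ q\ \mathrm{even}}}\frac{q}{\phi(q)}\sum_{\substack{r\le 2Q/q\\ (r,q)=1}}\frac{\mu^2(r)}{\phi(r)}\int_{-\frac{\delta_0Q}{qrx}}^{\frac{\delta_0Q}{qrx}}\sum^*_{\chi \bmod q}\Big|\sum_na_ne(\alpha n)\chi(n)\Big|^2d\alpha.
\end{aligned}
\]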
Let us now check that the intervals of integration $(b/q - \delta_0Q/2qx,\ b/q + \delta_0Q/2qx)$ (for $q$ odd), $(b/q - \delta_0Q/qx,\ b/q + \delta_0Q/qx)$ (for $q$ even) do not overlap. Recall that $\delta_0Q/qx = 1/2qQ$. The absolute value of the difference between two distinct fractions $b/q$, $b'/q'$ is at least $1/qq'$. For $q, q'\le Q$ odd, this is larger than $1/4qQ + 1/4Qq'$, and so the intervals do not overlap. For $q\le Q$ odd and $q'\le 2Q$ even (or vice versa), $1/qq'\ge 1/4qQ + 1/2Qq'$, and so, again the intervals do not overlap. If $q\le Q$ and $q'\le Q$ are both even, then $|b/q - b'/q'|$ is actually $\ge 2/qq'$. Clearly, $2/qq'\ge 1/2qQ + 1/2Qq'$, and so again there is no overlap. We conclude that
\[
2\Sigma_1 + \Sigma_2 \le \int_{\mathbb{R}/\mathbb{Z}}|S(\alpha)|^2 = \sum_n|a_n|^2.
\]
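The non-overlap argument can be tested exhaustively for small $Q$ with exact rational arithmetic. The sketch below is an illustration under explicit assumptions: arcs are centred at reduced fractions $b/q$, with half-width $1/(4qQ)$ for odd $q\le Q$ and $1/(2qQ)$ for even $q\le 2Q$ (this is what $\delta_0Q/2qx$ and $\delta_0Q/qx$ become when $\delta_0Q/qx = 1/2qQ$), and distances are measured on the circle $\mathbb{R}/\mathbb{Z}$. Function names are hypothetical.

```python
from fractions import Fraction
from math import gcd

def arcs(Q):
    """List of (center, half_width) for all arcs attached to a given Q."""
    out = []
    for q in range(1, 2 * Q + 1):
        if q % 2 == 1 and q > Q:
            continue                       # odd moduli only go up to Q
        half = Fraction(1, 4 * q * Q) if q % 2 else Fraction(1, 2 * q * Q)
        for b in range(q):
            if gcd(b, q) == 1:             # reduced fractions b/q in [0, 1)
                out.append((Fraction(b, q), half))
    return out

def circle_distance(u, v):
    """Distance between u and v on R/Z."""
    d = abs(u - v) % 1
    return min(d, 1 - d)

def arcs_disjoint(Q):
    """True if no two open arcs overlap."""
    A = arcs(Q)
    for i in range(len(A)):
        for j in range(i + 1, len(A)):
            (c1, h1), (c2, h2) = A[i], A[j]
            if circle_distance(c1, c2) < h1 + h2:
                return False
    return True

print([arcs_disjoint(Q) for Q in range(1, 9)])
```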
and
X log p
cE = γ + = 1.3325822 . . . (12.14)
p(p − 1)
p≥2
by [RS62, (2.11)].
If R ≥ 182, then
where the upper bound is valid for R ≥ 120. This is true by (12.12) for R ≥ 4 · 107 ;
we check (12.15) for 120 ≤ R ≤ 4 · 107 by a numerical computation.1 Similarly, for
R ≥ 200,
log R + 1.661 log R + 1.698
≤ G2 (R) ≤ (12.16)
2 2
by (12.12) for R ≥ 1.6 · 108 , and by a numerical computation for 200 ≤ R ≤ 1.6 · 108 .
Write $\rho = (\log Q_0)/(\log Q) \le 1$. We obtain immediately from (12.15) and (12.16) that
\[
\frac{G(Q_0)}{G(Q)} \le \frac{\log Q_0 + 1.354}{\log Q + 1.312}, \qquad
\frac{G_2(Q_0)}{G_2(Q)} \le \frac{\log Q_0 + 1.698}{\log Q + 1.661} \tag{12.17}
\]
for $Q, Q_0 \ge 200$. What is hard is to approximate $G_q(Q_0)/G_q(Q)$ for $q$ large and $Q_0$ small.
Let us start by giving an easy bound, off from the truth by a factor of about $e^\gamma$. (Specialists will recognize this as a factor that appears often in first attempts at estimates based on either large or small sieves.) First, we need a simple explicit lemma.
Lemma 12.2.1. Let $m \ge 1$, $q \ge 1$. Then
\[
\prod_{p \mid q \,\vee\, p \le m} \frac{p}{p-1} \le e^\gamma \left(\log(m + \log q) + 0.65771\right). \tag{12.18}
\]
Proof. Let $P = \prod_{p \le m \,\vee\, p \mid q} p$. Then, by [RS75, (5.1)],
\[
P \le q \prod_{p \le m} p = q\, e^{\sum_{p \le m} \log p} \le q\, e^{(1+\epsilon_0) m},
\]
[Lam08] ideas.
\[
\begin{aligned}
\frac{P}{\phi(P)} &\le e^\gamma \log\left((1+\epsilon_0) m + \log q\right) + \frac{2.50637}{\log(m + \log q)} \\
&\le e^\gamma \left( \log(m + \log q) + \epsilon_0 + \frac{2.50637/e^\gamma}{\log(m + \log q)} \right).
\end{aligned}
\]
Thus (12.18) holds when $m + \log q \ge 8.53$, since then $\epsilon_0 + (2.50637/e^\gamma)/\log(m + \log q) \le 0.65771$. We verify all choices of $m, q \ge 1$ with $m + \log q \le 8.53$ computationally; the worst case is that of $m = 1$, $q = 6$, which gives the value $0.65771$ in (12.18).
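The finite verification mentioned at the end of the proof can be imitated in a few lines. This is only an illustrative sketch: it restricts $m$ to integer values (the lemma allows real $m \ge 1$), uses plain floating point, and the helper names `primes_dividing_or_below` and `needed_constant` are hypothetical. For each pair it computes the smallest constant $c$ for which $\prod p/(p-1) \le e^\gamma(\log(m+\log q) + c)$ would hold.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant gamma


def primes_dividing_or_below(m, q):
    """Set of primes p with p | q or p <= m."""
    primes = set()
    n, d = q, 2
    while d * d <= n:  # trial division for the prime factors of q
        if n % d == 0:
            primes.add(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        primes.add(n)
    for p in range(2, int(m) + 1):  # primes up to m
        if all(p % d for d in range(2, math.isqrt(p) + 1)):
            primes.add(p)
    return primes


def needed_constant(m, q):
    """Smallest c such that prod p/(p-1) <= e^gamma (log(m + log q) + c)."""
    product = math.prod(p / (p - 1) for p in primes_dividing_or_below(m, q))
    return product / math.exp(EULER_GAMMA) - math.log(m + math.log(q))


# Integer pairs (m, q) in the range left to computation: m + log q <= 8.53.
candidates = [
    (needed_constant(m, q), m, q)
    for m in range(1, 9)
    for q in range(1, 2000)
    if m + math.log(q) <= 8.53
]
worst, worst_m, worst_q = max(candidates)
print(f"worst case m={worst_m}, q={worst_q}, constant needed {worst:.6f}")
```

On this integer grid the largest required constant should occur at $m = 1$, $q = 6$, in line with the statement in the proof.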
Here is the promised easy bound.
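Lemma 12.2.2. Let $Q_0 \ge 1$, $Q \ge 182 Q_0$. Let $q \le Q_0$, $s \le Q_0/q$, $q$ an integer. Then
\[
\frac{G_q(Q_0/sq)}{G_q(Q/sq)} \le \frac{e^\gamma \log\left(\frac{Q_0}{sq} + \log q\right) + 1.172}{\log \frac{Q}{Q_0} + 1.312}
\le \frac{e^\gamma \log Q_0 + 1.172}{\log \frac{Q}{Q_0} + 1.312}.
\]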
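and so
\[
\frac{G_q(Q_0/sq)}{G_q(Q/sq)} \le \frac{1}{G_P(Q/Q_0)}. \tag{12.19}
\]
Now the lower bound in (12.11) gives us that, for $d = P$, $R = Q/Q_0$,
\[
G_P(Q/Q_0) \ge \frac{\phi(P)}{P}\, G(Q/Q_0).
\]
By Lem. 12.2.1,
\[
\frac{P}{\phi(P)} \le e^\gamma \left( \log\left(\frac{Q_0}{sq} + \log q\right) + 0.658 \right).
\]
Hence, using (12.15), we get that
\[
\frac{G_q(Q_0/sq)}{G_q(Q/sq)} \le \frac{P/\phi(P)}{G(Q/Q_0)}
\le \frac{e^\gamma \log\left(\frac{Q_0}{sq} + \log q\right) + 1.172}{\log \frac{Q}{Q_0} + 1.312}, \tag{12.20}
\]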
Lemma 12.2.2 will play a crucial role in reducing to a finite computation the prob-
lem of bounding Gq (Q0 /sq)/Gq (Q/sq). As we will now see, we can use Lemma
12.2.2 to obtain a bound that is useful when sq is large compared to Q0 – precisely the
case in which asymptotic estimates such as (12.12) are relatively weak.
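Lemma 12.2.3. Let $Q_0 \ge 1$, $Q \ge 200 Q_0$. Let $q \le Q_0$, $s \le Q_0/q$. Let $\rho = (\log Q_0)/\log Q \le 2/3$. Then, for any $\sigma \ge 1.312\rho$,
\[
\begin{aligned}
\log Q_0 + \sigma - \frac{(\log Q_0 + \sigma)\log Q_0}{\log Q + 1.312}
&= (1-\rho)(\log Q_0 + \sigma) + \frac{1.312\rho(\log Q_0 + \sigma)}{\log Q + 1.312} \\
&\ge (1-\rho)(\log Q_0 + \sigma) + 1.312\rho^2
\end{aligned}
\]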
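Define
\[
\omega(\rho) = \frac{\log Q_{0,\min} + c_+}{\frac{1}{\rho}\log Q_{0,\min} + c_E}
= \rho + \frac{c_+ - \rho c_E}{\frac{1}{\rho}\log Q_{0,\min} + c_E}.
\]
Then $\rho \le (\log Q_0 + c_+)/(\log Q + c_E) \le \omega(\rho)$ (because $c_+ \ge \rho c_E$). We conclude that (12.25) (and hence (12.23)) holds provided that
\[
(1 - \omega(\rho)) \Big( \log sq - \sum_{p \mid q} \frac{\log p}{p} \Big) + c_\Delta
\ge \frac{q}{\phi(q)} \left( \mathrm{err}_{q,\frac{Q_0}{sq}} + \omega(\rho) \max\left(0, -\mathrm{err}_{q,\frac{Q}{sq}}\right) \right), \tag{12.26}
\]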
12.2. BOUNDING THE QUOTIENT IN THE LARGE SIEVE FOR PRIMES 237
We will thus be able to assume from now on that (12.27) does not hold, or, what is the same, that
\[
sq < (c_{\rho,2}\, q)^{\frac{1}{1-\omega(\rho)}} \tag{12.28}
\]
holds, where $c_{\rho,2} = \exp((1.4709 - c_E) + \omega(\rho)(c_E - 1.312) - c_\Delta)$.
What values of $R = Q_0/sq$ must we consider for $q$ given? First, by (12.28), we can assume $R > Q_{0,\min}/(c_{\rho,2}\, q)^{1/(1-\omega(\rho))}$. We can also assume
\[
R > c(c_+) \cdot \max(Rq, Q_{0,\min})^{(1-\rho)e^{-\gamma}} - \log q \tag{12.29}
\]
where $c(c_+)$ is as in Lemma 12.2.3, since all smaller $R$ are covered by that Lemma.
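Clearly, (12.29) implies that
\[
R^{1-\tau} > c(c_+) \cdot q^\tau - \frac{\log q}{R^\tau} > c(c_+)\, q^\tau - \log q,
\]
where $\tau = (1-\rho)e^{-\gamma}$, and also that $R > c(c_+)\, Q_{0,\min}^{\tau} - \log q$. Iterating, we obtain that we can assume that $R > \varpi(q)$, where
\[
\varpi(q) = \max\left( \varpi_0(q),\; c(c_+)\, Q_{0,\min}^\tau - \log q,\; \frac{Q_{0,\min}}{(c_{\rho,2}\, q)^{\frac{1}{1-\omega(\rho)}}} \right) \tag{12.30}
\]
and
\[
\varpi_0(q) =
\begin{cases}
\left( c(c_+)\, q^\tau - \dfrac{\log q}{\left(c(c_+)\, q^\tau - \log q\right)^{\frac{\tau}{1-\tau}}} \right)^{\frac{1}{1-\tau}} & \text{if } c(c_+)\, q^\tau > \log q + 1, \\
0 & \text{otherwise.}
\end{cases}
\]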
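Looking at (12.26), we see that it will be enough to show that, for all $R$ satisfying $R > \varpi(q)$, we have
\[
\mathrm{err}_{q,R} + \omega(\rho) \max\left(0, -\mathrm{err}_{q,tR}\right) \le \frac{\phi(q)}{q}\, \kappa(q) \tag{12.31}
\]
for all $t \ge 20000$, where
\[
\kappa(q) = (1 - \omega(\rho)) \Big( \log q - \sum_{p \mid q} \frac{\log p}{p} \Big) + c_\Delta .
\]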
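and
\[
\lambda(q) \le \left( \prod_{p < p_0} \frac{p}{p-1} \cdot
\frac{7.284 (1 + \beta_\rho) \prod_{p < p_0} f_1(p)}{(1 - \omega(\rho)) \left( \log q - \sum_{p < p_0} \frac{\log p}{p} \right) + c_\Delta} \right)^{3}. \tag{12.35}
\]
and
\[
\lambda(q) \le \left( \prod_{\substack{p < p_0\\ p \ne 7}} \frac{p}{p-1} \cdot
\frac{7.284 (1 + \beta_\rho) \prod_{p < p_0,\, p \ne 7} f_1(p)}{(1 - \omega(\rho)) \left( \log q - \sum_{p < p_0,\, p \ne 7} \frac{\log p}{p} \right) + c_\Delta} \right)^{3} \tag{12.37}
\]
for $q < \prod_{p \le p_0} p$. (We are taking out 7 because it is the “least helpful” prime to omit among all primes from 2 to 7, again by the fact that $(p/(p-1)) \cdot f_1(p)$ and $p \mapsto (\log p)/p$ are decreasing functions for $p \ge 3$.)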
We know how to give upper bounds for the expression on the right of (12.35). The task is in essence simple: we can base our bounds on the classic explicit work in [RS62], except that we also have to optimize matters so that they are close to tight for $p_1 = 29$, $p_1 = 31$ and other low $p_1$.
By [RS62, (3.30)] and a numerical computation for $29 \le p_1 \le 43$,
\[
\prod_{p \le p_1} \frac{p}{p-1} < 1.90516 \log p_1
\]
for $p_1 \ge 29$. Since $\omega(\rho)$ is increasing on $\rho$ and we are assuming $\rho \le 0.6$, $Q_{0,\min} = 100000$,
\[
\omega(\rho) \le 0.627312, \qquad \beta_\rho \le 0.023111.
\]
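The numerical value $\omega(0.6) \le 0.627312$ follows directly from the definition of $\omega(\rho)$ with $Q_{0,\min} = 100000$. A small sketch, assuming the constants $c_+ = 1.36$ and $c_E = 1.3325822$ quoted elsewhere in this chapter (the function name `omega` is chosen here for illustration):

```python
import math

C_PLUS = 1.36       # c_+ as in Cor. 12.2.5
C_E = 1.3325822     # c_E as in (12.14), truncated
Q0_MIN = 100000.0   # Q_{0,min}


def omega(rho):
    """omega(rho) = (log Q0min + c_+) / ((1/rho) log Q0min + c_E)."""
    return (math.log(Q0_MIN) + C_PLUS) / (math.log(Q0_MIN) / rho + C_E)


def omega_alt(rho):
    """Second form: rho + (c_+ - rho c_E) / ((1/rho) log Q0min + c_E)."""
    return rho + (C_PLUS - rho * C_E) / (math.log(Q0_MIN) / rho + C_E)


print(f"omega(0.6) = {omega(0.6):.6f}, 1 - omega(0.6) = {1 - omega(0.6):.5f}")
```

The quantity $1 - \omega(0.6)$ is the coefficient that appears later as $0.37268$.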
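For $x > a$, where $a > 1$ is any constant, we obviously have
\[
\sum_{a < p \le x} \log\left(1 + p^{-2/3}\right) \le \sum_{a < p \le x} (\log p)\, \frac{p^{-2/3}}{\log a}.
\]
By Abel summation (13.3) and the estimate [RS62, (3.32)] for $\theta(x) = \sum_{p \le x} \log p$,
\[
\begin{aligned}
\sum_{a < p \le x} (\log p)\, p^{-2/3}
&= (\theta(x) - \theta(a))\, x^{-\frac{2}{3}} - \int_a^x (\theta(u) - \theta(a)) \left(-\frac{2}{3} u^{-\frac{5}{3}}\right) du \\
&\le (1.01624 x - \theta(a))\, x^{-\frac{2}{3}} + \frac{2}{3}\int_a^x (1.01624 u - \theta(a))\, u^{-\frac{5}{3}}\, du \\
&= (1.01624 x - \theta(a))\, x^{-\frac{2}{3}} + 2 \cdot 1.01624\, (x^{1/3} - a^{1/3}) + \theta(a)\,(x^{-2/3} - a^{-2/3}) \\
&= 3 \cdot 1.01624 \cdot x^{1/3} - \left(2.03248\, a^{1/3} + \theta(a)\, a^{-2/3}\right).
\end{aligned}
\]
We conclude that $\sum_{10^4 < p \le x} \log(1 + p^{-2/3}) \le 0.33102\, x^{1/3} - 7.06909$ for $x > 10^4$. Since $\sum_{p \le 10^4} \log(1 + p^{-2/3}) \le 10.09062$, this means that
\[
\sum_{p \le x} \log\left(1 + p^{-2/3}\right) \le \left( 0.33102 + \frac{10.09062 - 7.06909}{10^{4/3}} \right) x^{1/3} \le 0.47126\, x^{1/3}
\]
for $x > 10^4$; a direct computation for all $x$ prime between $29$ and $10^4$ then confirms that
\[
\sum_{p \le x} \log\left(1 + p^{-2/3}\right) \le 0.74914\, x^{1/3}
\]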
whereas, by (12.38),
\[
\lambda(2.2 \cdot 10^{10}) \le 838.227 < 846.765;
\]
this is enough to ensure that $\lambda(q) < \varpi_0(q)$ for $2.2 \cdot 10^{10} \le q < \prod_{p \le 31} p$.
Let us now give some rough bounds that will be enough to cover the case $q \ge \prod_{p \le 31} p$. First, as we already discussed, $\varpi(q) = \varpi_0(q)$ and, since $c(c_+)\, q^\tau > \log q + 1$,
\[
\varpi_0(q) \ge \left(c(c_+)\, q^\tau - \log q\right)^{\frac{1}{1-\tau}} \ge \left(0.911\, q^{0.224} - \log q\right)^{1.289} \ge q^{0.2797} \tag{12.39}
\]
by $q \ge \prod_{p \le 31} p$. We are in the range $\prod_{p \le p_1} p \le q \le \prod_{p \le p_0} p$, where $p_1 < p_0$ are two consecutive primes with $p_1 \ge 31$. By [RS62, (3.16)] and a computation for $31 \le q < 200$, we know that $\log q \ge \sum_{p \le p_1} \log p \ge 0.8009\, p_1$. By (12.38) and (12.39), it follows that we just have to show that
\[
e^{0.224 t} > \frac{190.272\, (\log t)^3\, e^{2.24742\, t^{1/3}}}{(0.8009 t - \log t + 0.07354)^3}
\]
for t ≥ 31. Now, t ≥ 31 implies 0.8009t − log t + 0.07354 ≥ 0.6924t, and so, taking
logarithms we see that we just have to verify
for t ≥ 31, and, since the left side is increasing and the right side is decreasing for
t ≥ 31, this is trivial to check.
We conclude that $(q) > λ(q) whenever q ≥ 2.2 · 1010 .
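It remains to see how we can relax this assumption if we assume that $2 \cdot 3 \cdot 5 \cdot 7 \nmid q$. We repeat the same analysis as before, using (12.36) and (12.37) instead of (12.34) and (12.35). For $p_1 \ge 29$,
\[
\prod_{\substack{p \le p_1\\ p \ne 7}} \frac{p}{p-1} < 1.633 \log p_1, \qquad
\prod_{\substack{p \le p_1\\ p \ne 7}} f_1(p) \le \frac{e^{0.74914\, x^{1/3} - \log(1 + 7^{-2/3})}}{5.8478} \le \frac{e^{0.74914\, x^{1/3}}}{7.44586}
\]
and $\sum_{p \le p_1 : p \ne 7} (\log p)/p < \log p_1 - (\log 7)/7$. So, for $q < \prod_{p \le p_0 : p \ne 7} p$, and $p_1 \ge 29$ the prime immediately preceding $p_0$,
\[
\begin{aligned}
\lambda(q) &\le \left( \frac{1.633 \log p_1 \cdot 7.45235 \cdot \dfrac{e^{0.74914\, p_1^{1/3}}}{7.44586}}{0.37268 \left( \log q - \log p_1 + \frac{\log 7}{7} \right) + 0.02741} \right)^{3} \\
&\le \frac{84.351\, (\log p_1)^3\, e^{2.24742\, p_1^{1/3}}}{(\log q - \log p_1 + 0.35152)^3}.
\end{aligned}
\]
Thus we obtain, just like before, that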
values of $R$. We check all integers $R$ in $[\varpi(q), \lambda(q))$ for all $q < 3.3 \cdot 10^9$ (and all $3.3 \cdot 10^9 \le q < 2.2 \cdot 10^{10}$, $210 \mid q$) by an explicit computation.$^2$
Gq (Q0 /sq)
≤ 1, (12.41)
Gq (Q/sq)
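Corollary 12.2.5. Let $\{a_n\}_{n=1}^\infty$, $a_n \in \mathbb{C}$, be supported on the primes. Assume that $\{a_n\}$ is in $\ell_1 \cap \ell_2$ and that $a_n = 0$ for $n \le \sqrt{x}$. Let $Q_0 \ge 10^5$, $\delta_0 \ge 1$ be such that $(20000 Q_0)^2 \le x/2\delta_0$; set $Q = \sqrt{x/2\delta_0}$.
Let $S(\alpha) = \sum_n a_n e(\alpha n)$ for $\alpha \in \mathbb{R}/\mathbb{Z}$. Let $\mathfrak{M}$ as in (12.1). Then, if $Q_0 \le Q^{0.6}$,
\[
\int_{\mathfrak{M}} |S(\alpha)|^2\, d\alpha \le \frac{\log Q_0 + c_+}{\log Q + c_E} \sum_n |a_n|^2,
\]
where $c_+ = 1.36$ and $c_E = \gamma + \sum_{p \ge 2} (\log p)/(p(p-1)) = 1.3325822\ldots$.
Let $\mathfrak{M}_{\delta_0,Q_0}$ as in (10.5). Then, if $(2Q_0) \le (2Q)^{0.6}$,
\[
\int_{\mathfrak{M}_{\delta_0,Q_0}} |S(\alpha)|^2\, d\alpha \le \frac{\log 2Q_0 + c_+}{\log 2Q + c_E} \sum_n |a_n|^2. \tag{12.42}
\]
Here, of course, $\int_{\mathbb{R}/\mathbb{Z}} |S(\alpha)|^2\, d\alpha = \sum_n |a_n|^2$ (Plancherel). If $Q_0 > Q^{0.6}$, we will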
Proof. Immediate from Prop. 12.1.1, Prop. 12.1.2 and Prop. 12.2.4.
Obviously, one can also give a statement derived from Prop. 12.1.1; the resulting bound is
\[
\int_{\mathfrak{M}} |S(\alpha)|^2\, d\alpha \le \frac{\log Q_0 + c_+}{\log Q + c_E} \sum_n |a_n|^2,
\]
where $\mathfrak{M}$ is as in (12.1).
We also record the large-sieve form of the result.
2 This is by far the heaviest computation in the present work, though it is still rather minor (about two
weeks of computing on a single core of a fairly new (2010) desktop computer carrying out other tasks as well;
this is next to nothing compared to the computations in [Plab], or even those in [HP13]). For the applications
here, we could have assumed ρ ≤ 8/15, and that would have reduced computation time drastically; the
lighter assumption ρ ≤ 0.6 was made with views to general applicability in the future. As elsewhere in this
section, numerical computations were carried out by the author in C; all floating-point operations used D.
Platt’s interval arithmetic package.
Now, instead of using the easy inequality Gq (Q0 )/Gq (Q) ≤ G1 (Q0 )/G1 (Q/Q0 ), use
Prop. 12.2.4.
***
It would seem desirable to prove a result such as Prop. 12.2.4 (or Cor. 12.2.5, or
Cor. 12.2.6) without computations and with conditions that are as weak as possible.
Since, as we said, we cannot make c+ equal to cE , and since c+ does have to increase
when the conditions are weakened (as is shown by computations; this is not an arti-
fact of our method of proof) the right goal might be to show that the maximum of
Gq (Q0 /sq)/Gq (Q/sq) is reached when s = q = 1.
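However, this is also untrue without conditions. For instance, for $Q_0 = 2$ and $Q$ large, the value of $G_q(Q_0/q)/G_q(Q/q)$ at $q = 2$ is larger than at $q = 1$: by (12.12),
\[
\frac{G_2\left(\frac{Q_0}{2}\right)}{G_2\left(\frac{Q}{2}\right)} \sim \frac{1}{\frac{1}{2}\left( \log \frac{Q}{2} + c_E + \frac{\log 2}{2} \right)}
= \frac{2}{\log Q + c_E - \frac{\log 2}{2}} > \frac{2}{\log Q + c_E} \sim \frac{G(Q_0)}{G(Q)}.
\]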
Thus, at the very least, a lower bound on Q0 is needed as a condition. This also dims
the hopes somewhat for a combinatorial proof of Gq (Q0 /q)G(Q) ≤ Gq (Q/q)G(Q0 );
at any rate, while such a proof would be welcome, it could not be extremely straightfor-
ward, since there are terms in Gq (Q0 /q)G(Q) that do not appear in Gq (Q/q)G(Q0 ).
Chapter 13
The time has come to bound the part of our triple-product integral (10.3) that comes from the minor arcs $\mathfrak{m} \subset \mathbb{R}/\mathbb{Z}$. We have an $\ell_\infty$ estimate (from Prop. 11.2.3, based on Theorem 3.1.1) and an $\ell_2$ estimate (from §12.2). Now we must put them together.
There are two ways in which we must be careful. A trivial bound of the form $\ell_3^3 = \int |S(\alpha)|^3\, d\alpha \le \ell_2^2 \cdot \ell_\infty$ would introduce a fatal factor of $\log x$ coming from $\ell_2$. We avoid this by using the fact that we have $\ell_2$ estimates over $\mathfrak{M}_{\delta_0,Q_0}$ for varying $Q_0$.
We must also remember to subtract the major-arc contribution from our estimate for $\mathfrak{M}_{\delta_0,Q_0}$; this is why we were careful to give a lower bound in Lem. 10.3.1, as opposed to just the upper bound (10.28).
\[
\sum_{n=a}^{b} f(n) \cdot g(n) \le \left(\max_{n \ge a} g(n)\right) \cdot F(a) + \int_a^b \left(\max_{n \ge u} g(n)\right) \cdot F'(u)\, du.
\]
Proof. Let $S(n) = \sum_{m=a}^{n} f(m)$. Then, by partial summation,
\[
\sum_{n=a}^{b} f(n) \cdot g(n) \le S(b) g(b) + \sum_{n=a}^{b-1} S(n)\,(g(n) - g(n+1)). \tag{13.2}
\]
246 CHAPTER 13. THE INTEGRAL OVER THE MINOR ARCS
Let $h(x) = \max_{x \le n \le b} g(n)$. Then $h$ is non-increasing. Hence (13.1) and (13.2) imply that
\[
\begin{aligned}
\sum_{n=a}^{b} f(n) g(n) &\le \sum_{n=a}^{b} f(n) h(n) \\
&\le S(b) h(b) + \sum_{n=a}^{b-1} S(n)\,(h(n) - h(n+1)) \\
&\le F(b) h(b) + \sum_{n=a}^{b-1} F(n)\,(h(n) - h(n+1)).
\end{aligned}
\]
In general, for $\alpha_n \in \mathbb{C}$, $A(x) = \sum_{a \le n \le x} \alpha_n$ and $F$ continuous and piecewise differentiable on $[a, x]$,
\[
\sum_{a \le n \le x} \alpha_n F(n) = A(x) F(x) - \int_a^x A(u) F'(u)\, du. \qquad \text{(Abel summation)} \tag{13.3}
\]
Applying this with $\alpha_n = h(n) - h(n+1)$ and $A(x) = \sum_{a \le n \le x} \alpha_n = h(a) - h(\lfloor x \rfloor + 1)$, we obtain
\[
\begin{aligned}
\sum_{n=a}^{b-1} F(n)\,(h(n) - h(n+1))
&= (h(a) - h(b)) F(b-1) - \int_a^{b-1} (h(a) - h(\lfloor u \rfloor + 1))\, F'(u)\, du \\
&= h(a) F(a) - h(b) F(b-1) + \int_a^{b-1} h(\lfloor u \rfloor + 1)\, F'(u)\, du \\
&= h(a) F(a) - h(b) F(b-1) + \int_a^{b-1} h(u) F'(u)\, du \\
&= h(a) F(a) - h(b) F(b) + \int_a^{b} h(u) F'(u)\, du,
\end{aligned}
\]
We will now see our main application of Lemma 13.1.1. We have to bound an integral of the form $\int_{\mathfrak{M}_{\delta_0,r}} |S_1(\alpha)|^2 |S_2(\alpha)|\, d\alpha$, where $\mathfrak{M}_{\delta_0,r}$ is a union of arcs defined as in (10.5). Our inputs are (a) a bound on integrals of the form $\int_{\mathfrak{M}_{\delta_0,r}} |S_1(\alpha)|^2\, d\alpha$, (b) a bound on $|S_2(\alpha)|$ for $\alpha \in (\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{\delta_0,r}$. The input of type (a) is what we derived in §12.1 and §12.2; the input of type (b) is a minor-arcs bound, and as such was the main subject of Part I.
13.1. PUTTING TOGETHER `2 BOUNDS OVER ARCS AND `∞ BOUNDS 247
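for some $\delta_0 \le x/2r_1^2$ and all $r \in [r_0, r_1]$. Assume, moreover, that $H(r_1) = 1$. Let $g : [r_0, r_1] \to \mathbb{R}^+$ be a non-increasing function such that
where
\[
I_0 = \frac{1}{\sum_n |a_n|^2} \int_{\mathfrak{M}_{\delta_0,r_0}} |S_1(\alpha)|^2\, d\alpha. \tag{13.7}
\]
The condition $\delta_0 \le x/2r_1^2$ is there just to ensure that the arcs in the definition of $\mathfrak{M}_{\delta_0,r}$ do not overlap for $r \le r_1$.
Let
\[
f(r_1) = \frac{1}{\sum_n |a_n|^2} \int_{(\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{\delta_0,r_1}} |S_1(\alpha)|^2\, d\alpha.
\]
Then, by (13.5),
\[
\frac{1}{\sum_n |a_n|^2} \int_{(\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{\delta_0,r_0}} |S_1(\alpha)|^2 |S_2(\alpha)|\, d\alpha \le \sum_{r=r_0}^{r_1} f(r) g(r).
\]
By (13.4),
\[
\begin{aligned}
\sum_{r_0 \le r \le x} f(r) &= \frac{1}{\sum_n |a_n|^2} \int_{\mathfrak{M}_{\delta_0,x+1} \setminus \mathfrak{M}_{\delta_0,r_0}} |S_1(\alpha)|^2\, d\alpha \\
&= \left( \frac{1}{\sum_n |a_n|^2} \int_{\mathfrak{M}_{\delta_0,x+1}} |S_1(\alpha)|^2\, d\alpha \right) - I_0 \le H(x) - I_0
\end{aligned} \tag{13.8}
\]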
Then
\[
Z_{r_0} \le \left( \sqrt{\frac{|\varphi|_1 x}{\kappa} (M + T)} + \sqrt{S_{\eta_*}(0, x) \cdot E} \right)^{2},
\]
13.2. THE MINOR-ARC TOTAL 249
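where
\[
\begin{aligned}
S &= \sum_{p > \sqrt{x}} (\log p)^2\, \eta_+^2(p/x), \\
T &= C_{\varphi,3}\left( \frac{1}{2} \log \frac{x}{\kappa} \right) \cdot \left( S - (\sqrt{J} - \sqrt{E})^2 \right), \\
J &= \int_{\mathfrak{M}_{8,r_0}} |S_{\eta_+}(\alpha, x)|^2\, d\alpha,
\end{aligned} \tag{13.11}
\]
\[
\begin{aligned}
C_{\eta_+,0} &= 0.7131 \int_0^\infty \frac{1}{\sqrt{t}} \Big( \sup_{r \ge t} \eta_+(r) \Big)^2 dt, \\
C_{\eta_+,1} &= 0.7131 \int_1^\infty \frac{\log t}{\sqrt{t}} \Big( \sup_{r \ge t} \eta_+(r) \Big)^2 dt, \\
C_{\eta_+,2} &= 0.51942\, |\eta_+|_\infty^2, \\
C_{\varphi,3}(K) &= \frac{1.04488}{|\varphi|_1} \int_0^{1/K} |\varphi(w)|\, dw
\end{aligned} \tag{13.12}
\]
and
\[
\begin{aligned}
M = {} & g(r_0) \cdot \left( \frac{\log(r_0 + 1) + c_+}{\log \sqrt{x} + c_-} \cdot S - (\sqrt{J} - \sqrt{E})^2 \right) \\
& + \left( \frac{2}{\log x + 2c_-} \int_{r_0}^{r_1} \frac{g(r)}{r}\, dr + \left( \frac{7}{15} + \frac{-2.14938 + \frac{8}{15} \log \kappa}{\log x + 2c_-} \right) g(r_1) \right) \cdot S
\end{aligned} \tag{13.13}
\]
where $c_+ = 2.0532$ and $c_- = 0.6394$.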
Proof. Let $y = x/\kappa$. Let $Q = (3/4) y^{2/3}$, as in Thm. 3.1.1 (applied with $y$ instead of $x$). Let $\alpha \in (\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{8,r}$, where $r_0 \le r \le y^{1/3}/6$ and $y$ is used instead of $x$ to define $\mathfrak{M}_{8,r}$ (see (10.5)). There exists an approximation $2\alpha = a/q + \delta/y$ with $q \le Q$, $|\delta|/y \le 1/qQ$. Thus, $\alpha = a'/q' + \delta/2y$, where either $a'/q' = a/2q$ or $a'/q' = (a + q)/2q$ holds. (In particular, if $q'$ is odd, then $q' = q$; if $q'$ is even, then $q' = 2q$.)
There are three cases:
where we use the fact that $g(r)$ is a non-increasing function (Lemma 11.2.4).
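Let
\[
r_1 = \frac{3}{8} y^{4/15}, \qquad
g(r) = \begin{cases} g_{y,\varphi}(r) & \text{if } r \le r_1, \\ g_{y,\varphi}(r_1) & \text{if } r > r_1. \end{cases}
\]
By Lemma 11.2.4, for $r \ge 670$, $g(r)$ is a non-increasing function and $g(r) \ge g_{y,\varphi}(r)$. Moreover, by Lemma 11.2.5, $g_{y,\varphi}(r_1) \ge h(2y/\log y)$, where $h$ is as in (11.15), and so $g(r) \ge h(2y/\log y)$ for all $r \ge r_0 \ge 670$. Thus, we have shown that
\[
|S_{\eta_*}(y, \alpha)| \le \left( g(r) + C_{\varphi,3}\left( \frac{\log y}{2} \right) \right) \cdot |\varphi|_1\, y \tag{13.17}
\]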
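Clearly,
\[
\begin{aligned}
\int_{(\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{8,r_0}} & |S_{\eta_*}(\alpha, x)|\, |S_{2,\eta_+}(\alpha, x)|^2\, d\alpha \\
&\le \max_{\alpha \in \mathbb{R}/\mathbb{Z}} |S_{\eta_*}(\alpha, x)| \cdot \int_{\mathbb{R}/\mathbb{Z}} |S_{2,\eta_+}(\alpha, x)|^2\, d\alpha \\
&\le \sum_{n=1}^{\infty} \Lambda(n)\, \eta_*(n/x) \cdot \Big( \sum_{n \text{ non-prime}} \Lambda(n)^2 \eta_+(n/x)^2 + \sum_{n \le \sqrt{x}} \Lambda(n)^2 \eta_+(n/x)^2 \Big).
\end{aligned}
\]
where $E$ is as in (13.11).
It remains to bound
\[
\int_{(\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{8,r_0}} |S_{\eta_*}(\alpha, x)|\, |S_{1,\eta_+}(\alpha, x)|^2\, d\alpha. \tag{13.18}
\]
We wish to apply Prop. 13.1.2. Corollary 12.2.5 gives us an input of type (13.4); we have just derived a bound (13.17) that provides an input of type (13.5). More precisely,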
x ≥ 1025 · κ,
log((3/8)(x/κ)4/15 + 1) + c+
lim+ H(r) − lim− H(r) = 1 − √
r→r1 r→r1 log x + c−
8 −
4/15 log 83 + c+ − 15
4
log κ − 15
c
≤1− + √ −
1/2 log x + c
8
7 −2.14938 + 15 log κ
≤ + −
.
15 log x + 2c
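As we already showed,
\[
\int_{\mathbb{R}/\mathbb{Z}} |S_{2,\eta_+}(\alpha, x)|^2\, d\alpha = \sum_{\substack{n \text{ non-prime}\\ \text{or } n \le \sqrt{x}}} \Lambda(n)^2 \eta_+(n/x)^2 \le E.
\]
Thus,
\[
I_0 \cdot S \ge (\sqrt{J} - \sqrt{E})^2,
\]
and so we are done.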
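We now should estimate the integral $\int_{r_0}^{r_1} \frac{g(r)}{r}\, dr$ in (13.13). It is easy to see that
\[
\begin{aligned}
& \int_{r_0}^\infty \frac{1}{r^{3/2}}\, dr = \frac{2}{r_0^{1/2}}, \qquad
\int_{r_0}^\infty \frac{\log r}{r^2}\, dr = \frac{\log e r_0}{r_0}, \qquad
\int_{r_0}^\infty \frac{1}{r^2}\, dr = \frac{1}{r_0}, \\
& \int_{r_0}^{r_1} \frac{1}{r}\, dr = \log \frac{r_1}{r_0}, \qquad
\int_{r_0}^\infty \frac{\log r}{r^{3/2}}\, dr = \frac{2 \log e^2 r_0}{\sqrt{r_0}}, \qquad
\int_{r_0}^\infty \frac{\log 2r}{r^{3/2}}\, dr = \frac{2 \log 2e^2 r_0}{\sqrt{r_0}}, \\
& \int_{r_0}^\infty \frac{(\log 2r)^2}{r^{3/2}}\, dr = \frac{2 P_2(\log 2r_0)}{\sqrt{r_0}}, \qquad
\int_{r_0}^\infty \frac{(\log 2r)^3}{r^{3/2}}\, dr = \frac{2 P_3(\log 2r_0)}{r_0^{1/2}},
\end{aligned} \tag{13.22}
\]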
where
We also have
\[
\int_{r_0}^\infty \frac{dr}{r^2 \log r} = E_1(\log r_0) \tag{13.24}
\]
where $E_1$ is the exponential integral
\[
E_1(z) = \int_z^\infty \frac{e^{-t}}{t}\, dt.
\]
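The closed forms in (13.22) can be checked numerically. The sketch below is an illustration only: it uses composite Simpson quadrature after the substitution $r = r_0 e^u$, truncated at a finite `cutoff` (an assumption made here; the tail is negligible because every integrand decays at least like $e^{-u/2}$). It takes $P_2(t) = t^2 + 4t + 8$ as stated later in the text, and $P_3(t) = t^3 + 6t^2 + 24t + 48$, which is inferred here from the relation $P_2^-(t) = P_3(t) - tP_2(t) = 2t^2 + 16t + 48$.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def tail_integral(f, r0, cutoff=150.0, n=30000):
    """Integral of f over [r0, infinity) via r = r0 * exp(u), 0 <= u <= cutoff."""
    return simpson(lambda u: f(r0 * math.exp(u)) * r0 * math.exp(u), 0.0, cutoff, n)


def P2(t):
    return t * t + 4 * t + 8


def P3(t):  # inferred from P2^-(t) = P3(t) - t P2(t)
    return t**3 + 6 * t * t + 24 * t + 48


r0 = 150000.0
s = math.sqrt(r0)
checks = {
    "1/r^(3/2)": (lambda r: r**-1.5, 2 / s),
    "log r / r^2": (lambda r: math.log(r) / r**2, math.log(math.e * r0) / r0),
    "log r / r^(3/2)": (lambda r: math.log(r) / r**1.5, 2 * math.log(math.e**2 * r0) / s),
    "log 2r / r^(3/2)": (lambda r: math.log(2 * r) / r**1.5, 2 * math.log(2 * math.e**2 * r0) / s),
    "(log 2r)^2 / r^(3/2)": (lambda r: math.log(2 * r) ** 2 / r**1.5, 2 * P2(math.log(2 * r0)) / s),
    "(log 2r)^3 / r^(3/2)": (lambda r: math.log(2 * r) ** 3 / r**1.5, 2 * P3(math.log(2 * r0)) / s),
}
relative_errors = {
    name: abs(tail_integral(f, r0) / closed - 1) for name, (f, closed) in checks.items()
}
for name, err in relative_errors.items():
    print(f"{name}: relative error {err:.2e}")
```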
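Clearly, $z(r) - e^\gamma \log\log r = 2.50637/\log\log r$ is decreasing on $r$. Hence, for $r \ge 10^5$,
\[
z(r) \le e^\gamma \log\log r + c_\gamma,
\]
where $c_\gamma = 1.025742$. Let $F(t) = e^\gamma \log t + c_\gamma$. Then $F''(t) = -e^\gamma/t^2 < 0$. Hence
\[
\frac{d^2 \sqrt{F(t)}}{dt^2} = \frac{F''(t)}{2\sqrt{F(t)}} - \frac{(F'(t))^2}{4 (F(t))^{3/2}} < 0
\]
for all $t > 0$. In other words, $\sqrt{F(t)}$ is convex-down, and so we can bound $\sqrt{F(t)}$ from above by $\sqrt{F(t_0)} + \sqrt{F}'(t_0) \cdot (t - t_0)$, for any $t \ge t_0 > 0$. Hence, for $r \ge r_0 \ge 10^5$,
\[
\begin{aligned}
\sqrt{z(r)} \le \sqrt{F(\log r)} &\le \sqrt{F(\log r_0)} + \frac{d\sqrt{F(t)}}{dt}\Big|_{t = \log r_0} \cdot \log \frac{r}{r_0} \\
&= \sqrt{F(\log r_0)} + \frac{e^\gamma}{\sqrt{F(\log r_0)}} \cdot \frac{\log \frac{r}{r_0}}{2 \log r_0}.
\end{aligned}
\]
Thus, by (13.22),
\[
\begin{aligned}
\int_{r_0}^\infty \frac{\sqrt{z(r)}}{r^{3/2}}\, dr
&\le \sqrt{F(\log r_0)} \left( 2 - \frac{e^\gamma}{F(\log r_0)} \right) \frac{1}{\sqrt{r_0}}
+ \frac{e^\gamma}{\sqrt{F(\log r_0)}\, \log r_0} \cdot \frac{\log e^2 r_0}{\sqrt{r_0}} \\
&= \frac{2\sqrt{F(\log r_0)}}{\sqrt{r_0}} \left( 1 + \frac{e^\gamma}{F(\log r_0) \log r_0} \right).
\end{aligned} \tag{13.26}
\]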
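The other integrals in (13.25) are easier. Just as in (13.26), we extend the range of integration to $[r_0, \infty]$. Using (13.22) and (13.24), we obtain
\[
\begin{aligned}
\int_{r_0}^\infty \frac{z(r)}{r^2}\, dr &\le \int_{r_0}^\infty \frac{F(\log r)}{r^2}\, dr = e^\gamma \left( \frac{\log\log r_0}{r_0} + E_1(\log r_0) \right) + \frac{c_\gamma}{r_0}, \\
\int_{r_0}^\infty \frac{z(r) \log r}{r^2}\, dr &\le e^\gamma \left( \frac{(1 + \log r_0) \log\log r_0 + 1}{r_0} + E_1(\log r_0) \right) + \frac{c_\gamma \log e r_0}{r_0},
\end{aligned}
\]
By [OLBC10, (6.8.2)],
\[
\frac{1}{r(\log r + 1)} \le E_1(\log r) \le \frac{1}{r \log r}.
\]
(The second inequality is obvious.) Hence
\[
\begin{aligned}
\int_{r_0}^\infty \frac{z(r)}{r^2}\, dr &\le \frac{e^\gamma (\log\log r_0 + 1/\log r_0) + c_\gamma}{r_0}, \\
\int_{r_0}^\infty \frac{z(r) \log r}{r^2}\, dr &\le \frac{e^\gamma \left( \log\log r_0 + \frac{1}{\log r_0} \right) + c_\gamma}{r_0} \cdot \log e r_0.
\end{aligned}
\]
Finally,
\[
\begin{aligned}
\int_{r_0}^\infty \frac{z(r)}{r^{3/2}}\, dr &\le e^\gamma \left( \frac{2 \log\log r_0}{\sqrt{r_0}} + 2 E_1\left( \frac{\log r_0}{2} \right) \right) + \frac{2 c_\gamma}{\sqrt{r_0}} \\
&\le \frac{2}{\sqrt{r_0}} \left( F(\log r_0) + \frac{2 e^\gamma}{\log r_0} \right).
\end{aligned} \tag{13.27}
\]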
It is time to estimate
\[
\int_{r_0}^{r_1} \frac{R_{z,2r} \log 2r\, \sqrt{z(r)}}{r^{3/2}}\, dr, \tag{13.28}
\]
\[
\sqrt{\int_{r_0}^{r_1} \frac{(R_{z,2r} \log 2r)^2}{r^{3/2}}\, dr} \cdot \sqrt{\int_{r_0}^{r_1} \frac{z(r)}{r^{3/2}}\, dr}. \tag{13.29}
\]
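We have already bounded the second integral. Let us look at the first one. We can write $R_{z,t} = 0.27125\, R^\circ_{z,t} + 0.41415$, where
\[
R^\circ_{z,t} = \log\left( 1 + \frac{\log 4t}{2 \log \frac{9 z^{1/3}}{2.004\, t}} \right). \tag{13.30}
\]
Clearly,
\[
R^\circ_{z, e^t/4} = \log\left( 1 + \frac{t/2}{\log \frac{36 z^{1/3}}{2.004} - t} \right).
\]
In our case, $a = 1/2$, $c = 1$ and $b = \log 36 z^{1/3} - \log(2.004) > 0$. Hence, for $t < b$,
\[
-ab\left( (a - 2c)(b - 2t) - 2ct \right) = \frac{b}{2} \left( 2t + \frac{3}{2}(b - 2t) \right) = \frac{b}{2} \left( \frac{3}{2} b - t \right) > 0,
\]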
1/3
9z 1/3
3 4/15 9 2y
2r1 = y < ≤ .
16 2.004 log y 2.004
We conclude that $r \to R^\circ_{z,2r}$ is convex-up on $\log 8r$ for $r \le r_1$, and hence so is $r \to R_{z,r}$, and so, in turn, is $r \to R^2_{z,r}$. Thus, for $r \in [r_0, r_1]$,
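Therefore, by (13.22),
\[
\begin{aligned}
\int_{r_0}^{r_1} & \frac{(R_{z,2r} \log 2r)^2}{r^{3/2}}\, dr \\
\le {} & \int_{r_0}^{r_1} \left( R^2_{z,2r_0} \frac{\log r_1/r}{\log r_1/r_0} + R^2_{z,2r_1} \frac{\log r/r_0}{\log r_1/r_0} \right) (\log 2r)^2\, \frac{dr}{r^{3/2}} \\
= {} & \frac{2 R^2_{z,2r_0}}{\log \frac{r_1}{r_0}} \left( \left( \frac{P_2(\log 2r_0)}{\sqrt{r_0}} - \frac{P_2(\log 2r_1)}{\sqrt{r_1}} \right) \log 2r_1 - \frac{P_3(\log 2r_0)}{\sqrt{r_0}} + \frac{P_3(\log 2r_1)}{\sqrt{r_1}} \right) \\
& + \frac{2 R^2_{z,2r_1}}{\log \frac{r_1}{r_0}} \left( \frac{P_3(\log 2r_0)}{\sqrt{r_0}} - \frac{P_3(\log 2r_1)}{\sqrt{r_1}} - \left( \frac{P_2(\log 2r_0)}{\sqrt{r_0}} - \frac{P_2(\log 2r_1)}{\sqrt{r_1}} \right) \log 2r_0 \right) \\
= {} & 2 \left( R^2_{z,2r_0} - \frac{\log 2r_0}{\log \frac{r_1}{r_0}} \left( R^2_{z,2r_1} - R^2_{z,2r_0} \right) \right) \cdot \left( \frac{P_2(\log 2r_0)}{\sqrt{r_0}} - \frac{P_2(\log 2r_1)}{\sqrt{r_1}} \right) \\
& + 2\, \frac{R^2_{z,2r_1} - R^2_{z,2r_0}}{\log \frac{r_1}{r_0}} \left( \frac{P_3(\log 2r_0)}{\sqrt{r_0}} - \frac{P_3(\log 2r_1)}{\sqrt{r_1}} \right) \\
= {} & 2 R^2_{z,2r_0} \cdot \left( \frac{P_2(\log 2r_0)}{\sqrt{r_0}} - \frac{P_2(\log 2r_1)}{\sqrt{r_1}} \right) \\
& + 2\, \frac{R^2_{z,2r_1} - R^2_{z,2r_0}}{\log \frac{r_1}{r_0}} \left( \frac{P_2^-(\log 2r_0)}{\sqrt{r_0}} - \frac{P_3(\log 2r_1) - (\log 2r_0) P_2(\log 2r_1)}{\sqrt{r_1}} \right),
\end{aligned} \tag{13.32}
\]
where $P_2(t)$ and $P_3(t)$ are as in (13.23), and $P_2^-(t) = P_3(t) - t P_2(t) = 2t^2 + 16t + 48$.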
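\[
\int_{r_0}^{r_1} \frac{g(r)}{r}\, dr \le f_0(r_0, y) + f_1(r_0) + f_2(r_0, y), \tag{13.33}
\]
where
\[
\begin{aligned}
f_0(r_0, y) &= \left( (1 - c_\varphi) \sqrt{I_{0,r_0,r_1,y}} + c_\varphi \sqrt{I_{0,r_0,r_1,\frac{2y}{\log y}}} \right) \sqrt{\frac{2}{\sqrt{r_0}} I_{1,r_0}}, \\
f_1(r_0) &= \frac{\sqrt{F(\log r_0)}}{\sqrt{2 r_0}} \left( 1 + \frac{e^\gamma}{F(\log r_0) \log r_0} \right) + \frac{5}{\sqrt{2 r_0}} \\
&\quad + \frac{1}{r_0} \left( \left( \frac{13}{4} \log e r_0 + 11.07 \right) J_{r_0} + 13.66 \log e r_0 + 37.55 \right), \\
f_2(r_0, y) &= 3.36\, \frac{((\log y)/2)^{1/6}}{y^{1/6}} \log \frac{r_1}{r_0},
\end{aligned} \tag{13.34}
\]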
Conclusion
\[
\eta_* = (\eta_2 *_M \varphi)(\kappa t),
\]
where $\varphi(t) = t^2 e^{-t^2/2}$, $\eta_2 = \eta_1 *_M \eta_1$ and $\eta_1 = 2 \cdot I_{[-1/2,1/2]}$, and
\[
\eta_+ = h_{200}(t)\, t\, e^{-t^2/2},
\]
where
\[
h_H(t) = \int_0^\infty h(t y^{-1}) F_H(y)\, \frac{dy}{y},
\]
\[
h(t) = \begin{cases} t^2 (2 - t)^3 e^{t - 1/2} & \text{if } t \in [0, 2], \\ 0 & \text{otherwise,} \end{cases}
\qquad F_H(t) = \frac{\sin(H \log y)}{\pi \log y}.
\]
We studied $\eta_*$ and $\eta_+$ in Part II. We saw $\eta_*$ in Thm. 13.2.1 (which actually works for general $\varphi : [0, \infty) \to [0, \infty)$, as its statement says). We will set $\kappa$ soon.
We fix a value for $r$, namely, $r = 150000$. Our results will have to be valid for any $x \ge x_+$, where $x_+$ is fixed. We set $x_+ = 4.9 \cdot 10^{26}$, since we want a result valid for $N \ge 10^{27}$, and, as was discussed in (11.1), we will work with $x_+$ slightly smaller than $N/2$.
260 CHAPTER 14. CONCLUSION
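\[
\begin{aligned}
E_{\eta_+,r,\delta_0} &= \max_{\substack{\chi \bmod q\\ q \le r \cdot \gcd(q,2)\\ |\delta| \le \gcd(q,2)\delta_0 r/2q}} \sqrt{q^*}\, |\mathrm{err}_{\eta_+,\chi^*}(\delta, x)| \\
&\le 1.3482 \cdot 10^{-14} \sqrt{300000} + \frac{1.617 \cdot 10^{-10}}{\sqrt{2}} + \frac{1}{\sqrt{x_+}} \left( 499900 + 52\sqrt{300000} \right) \\
&\le 2.3992 \cdot 10^{-8},
\end{aligned} \tag{14.2}
\]
where, in the latter case, we are using the fact that a stronger bound for $q = 1$ (namely, (14.1)) allows us to assume $q \ge 2$.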
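We also need to bound a few norms: by the estimates in §A.3 and §A.5 (applied with $H = 200$),
\[
\begin{aligned}
|\eta_+|_1 &\le 1.062319, \qquad |\eta_+|_2 \le 0.800129 + \frac{274.8569}{200^{7/2}} \le 0.800132, \\
|\eta_+|_\infty &\le 1 + 2.06440727 \cdot \frac{1 + \frac{4}{\pi} \log H}{H} \le 1.079955.
\end{aligned} \tag{14.3}
\]
By (10.12) and (14.1),
\[
\begin{aligned}
|S_{\eta_+}(0, x)| &= \left| \widehat{\eta_+}(0) \cdot x + O^*\left( \mathrm{err}_{\eta_+,\chi_T}(0, x) \right) \cdot x \right| \\
&\le (|\eta_+|_1 + ET_{\eta_+,\delta_0 r/2})\, x \le 1.063 x.
\end{aligned}
\]
This is far from optimal, but it will do, since all we wish to do with this is to bound the tiny error term $K_{r,2}$ in (10.27):
\[
\begin{aligned}
K_{r,2} &= (1 + \sqrt{300000}) (\log x)^2 \cdot 1.079955 \\
&\quad \cdot \left( 2 \cdot 1.06232 + (1 + \sqrt{300000}) (\log x)^2\, 1.079955 / x \right) \\
&\le 1259.06 (\log x)^2 \le 9.71 \cdot 10^{-21} x
\end{aligned}
\]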
δ0 r(log 2e2 r) Eη2+ ,r,δ0 + Kr,2 /x ≤ 1.00393 · 10−8 .
and
274.856893
|η+ − η◦ |2 ≤ ≤ 2.42942 · 10−6 . (14.5)
H 7/2
(3) (2)
We bound |η◦ |1 using the fact that (as we can tell by taking derivatives) η◦ (t)
(2)
increases from 0 at t = 0 to a maximum within [0, 1/2], and then decreases to η◦ (1) =
14.2. THE TOTAL MAJOR-ARC CONTRIBUTION 261
−7, only to increase to a maximum within [3/2, 2] (equal to the maximum attained
within [0, 1/2]) and then decrease to 0 at t = 2:
(3) (2) (2) (2)
|η◦ |1 = 2 max η◦ (t) − 2η◦ (1) + 2 max η◦ (t)
t∈[0,1/2] t∈[3/2,2]
(2)
(14.6)
= 4 max η◦ (t) + 14 ≤ 4 · 4.6255653 + 14 ≤ 32.5023,
t∈[0,1/2]
where we compute the maximum by the bisection method with 30 iterations (using
interval arithmetic, as always).
We evaluate explicitly
\[
\sum_{\substack{q \le r\\ q \text{ odd}}} \frac{\mu^2(q)}{\phi(q)} = 6.798779\ldots,
\]
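The explicit value of this sum for $r = 150000$ can be reproduced with a sieve for Euler's $\phi$ and for squarefreeness. The following is a minimal floating-point sketch (helper names are hypothetical, and no interval arithmetic is used); it also compares the result with the two-sided bound (12.16) for $G_2(R)$, on the assumption that $G_2(R)$ denotes exactly this sum over odd squarefree $q \le R$.

```python
import math


def totients_and_squarefree(n):
    """Return (phi, squarefree) lists indexed 0..n, computed by sieving."""
    phi = list(range(n + 1))
    squarefree = [True] * (n + 1)
    for p in range(2, n + 1):
        if phi[p] == p:  # still untouched, hence p is prime
            for k in range(p, n + 1, p):
                phi[k] -= phi[k] // p
            for k in range(p * p, n + 1, p * p):
                squarefree[k] = False
    return phi, squarefree


def odd_squarefree_sum(r):
    """Sum of mu^2(q)/phi(q) over odd q <= r."""
    phi, squarefree = totients_and_squarefree(r)
    return sum(1.0 / phi[q] for q in range(1, r + 1, 2) if squarefree[q])


r = 150000
total = odd_squarefree_sum(r)
lower = (math.log(r) + 1.661) / 2
upper = (math.log(r) + 1.698) / 2
print(f"sum = {total:.6f}, (12.16) window = [{lower:.4f}, {upper:.4f}]")
```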
The only prime that we know does not divide $N$ is $2$. Thus, we use the bound
\[
C_0 \ge 2 \prod_{p > 2} \left( 1 - \frac{1}{(p-1)^2} \right) \ge 1.3203236. \tag{14.9}
\]
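The constant in (14.9) is twice the twin-prime constant, and a truncated product gives a quick numerical cross-check. Note the direction of the error: a truncated product is slightly larger than the infinite product, so the sketch below (with a cutoff `limit` chosen here) only confirms the digits approximately and does not by itself prove the lower bound.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def truncated_constant(limit=1_000_000):
    """2 * product over odd primes p <= limit of (1 - 1/(p-1)^2)."""
    product = 1.0
    for p in primes_up_to(limit):
        if p > 2:
            product *= 1.0 - 1.0 / (p - 1) ** 2
    return 2.0 * product


approx = truncated_constant()
print(f"truncated product: {approx:.7f}")
```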
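The other main constant is $C_{\eta_\circ,\eta_*}$, which we defined in (10.37) and already started to estimate in (11.6):
\[
C_{\eta_\circ,\eta_*} = |\eta_\circ|_2^2 \int_0^{\frac{N}{x}} \eta_*(\rho)\, d\rho + 2.71\, |\eta_\circ'|_2^2 \cdot O^*\left( \int_0^{\frac{N}{x}} ((2 - N/x) + \rho)^2\, \eta_*(\rho)\, d\rho \right) \tag{14.10}
\]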
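provided that $N \ge 2x$. Recall that $\eta_* = (\eta_2 *_M \varphi)(\kappa t)$, where $\varphi(t) = t^2 e^{-t^2/2}$. Therefore,
\[
\begin{aligned}
\int_0^{N/x} \eta_*(\rho)\, d\rho &= \int_0^{N/x} (\eta_2 * \varphi)(\kappa\rho)\, d\rho = \int_{1/4}^{1} \eta_2(w) \int_0^{N/x} \varphi\left( \frac{\kappa\rho}{w} \right) d\rho\, \frac{dw}{w} \\
&= \frac{|\eta_2|_1 |\varphi|_1}{\kappa} - \frac{1}{\kappa} \int_{1/4}^{1} \eta_2(w) \int_{\kappa N/xw}^{\infty} \varphi(\rho)\, d\rho\, dw.
\end{aligned} \tag{14.12}
\]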
for $x \in [-1, 1]$, and $\eta_\circ'(x + 1) = 0$ for $x \notin [-1, 1]$. Now, for any even integer $k > 0$,
\[
\int_{-1}^{1} x^k e^{-x^2}\, dx = 2 \int_0^1 x^k e^{-x^2}\, dx = \gamma\left( \frac{k+1}{2}, 1 \right),
\]
14.2. THE TOTAL MAJOR-ARC CONTRIBUTION 263
where $\gamma(a, r) = \int_0^r e^{-t} t^{a-1}\, dt$ is the incomplete gamma function. (We substitute $t = x^2$ in the integral.) By [AS64, (6.5.16), (6.5.22)], $\gamma(a + 1, 1) = a\gamma(a, 1) - 1/e$ for all $a > 0$, and $\gamma(1/2, 1) = \sqrt{\pi}\, \mathrm{erf}(1)$, where
\[
\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^1 e^{-t^2}\, dt.
\]
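The recursion for the incomplete gamma function makes these moments computable in closed form. As an illustration (a sketch with names chosen here, in plain floating point), one can compare $\gamma((k+1)/2, 1)$ obtained from the recursion, starting at $\gamma(1/2, 1) = \sqrt{\pi}\,\mathrm{erf}(1)$, with a direct Simpson quadrature of $\int_{-1}^1 x^k e^{-x^2}\,dx$ for even $k$.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def lower_gamma_half_integers(steps):
    """[gamma(1/2 + j, 1) for j = 0..steps] via gamma(a+1,1) = a*gamma(a,1) - 1/e."""
    values = [math.sqrt(math.pi) * math.erf(1.0)]
    a = 0.5
    for _ in range(steps):
        values.append(a * values[-1] - math.exp(-1.0))
        a += 1.0
    return values


gammas = lower_gamma_half_integers(6)
differences = {}
for k in range(0, 14, 2):
    direct = simpson(lambda x: x**k * math.exp(-x * x), -1.0, 1.0, 4000)
    differences[k] = abs(direct - gammas[k // 2])  # (k+1)/2 = 1/2 + k/2
    print(f"k={k:2d}: gamma((k+1)/2, 1) = {gammas[k // 2]:.10f}")
```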
We conclude that
1 1 2 2.0002
Cη◦ ,η∗ 2 2
≥ |ϕ|1 |η◦ |2 − |η◦ |2 2 + e−2κ − .
κ 2κ 2 κ3
Setting
κ = 49
and using (14.4), we obtain
1
(|ϕ|1 |η◦ |22 − 0.000834).
Cη◦ ,η∗ ≥ (14.14)
κ
Here it is useful to note that |ϕ|1 = π2 , and so, by (14.4), |ϕ|1 |η◦ |22 = 0.80237 . . . .
p
N N
x= c1 = = 0.495461 . . . · N. (14.15)
2+ κ 2 + √9/4 1
2π 49
Thus, we see that, since we are assuming N ≥ 1027 , we in fact have x ≥ 4.95461 . . . ·
1026 , and so, in particular,
x
x ≥ 4.9 · 1026 , ≥ 1025 . (14.16)
κ
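The numbers in (14.15) and (14.16) follow from elementary arithmetic, which the short sketch below replays (variable names are chosen here for illustration).

```python
import math

KAPPA = 49
C1 = (9 / 4) / math.sqrt(2 * math.pi)   # c_1 as it appears in (14.15)

ratio = 1 / (2 + C1 / KAPPA)            # x / N
N_MIN = 1e27
x_min = ratio * N_MIN

print(f"x/N     = {ratio:.7f}")
print(f"x_min   = {x_min:.5e}")
print(f"x_min/k = {x_min / KAPPA:.5e}")
```

Both inequalities in (14.16) hold with room to spare at $N = 10^{27}$, and they only improve as $N$ grows.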
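Let us continue with our determination of the major-arcs total. We should compute the quantities in (10.38). We already have bounds for $E_{\eta_+,r,\delta_0}$, $A_{\eta_+}$ (see (14.7)), $L_{\eta,r,\delta_0}$ and $K_{r,2}$. By Corollary 7.1.3, we have
\[
\begin{aligned}
E_{\eta_*,r,8} &\le \max_{\substack{\chi \bmod q\\ q \le r \cdot \gcd(q,2)\\ |\delta| \le \gcd(q,2)\delta_0 r/2q}} \sqrt{q^*}\, |\mathrm{err}_{\eta_*,\chi^*}(\delta, x)| \\
&\le \frac{1}{\kappa} \left( 2.485 \cdot 10^{-19} + \frac{1}{\sqrt{10^{25}}} \left( 381500 + 76\sqrt{300000} \right) \right) \\
&\le \frac{1.33805 \cdot 10^{-8}}{\kappa},
\end{aligned} \tag{14.17}
\]
where the factor of $\kappa$ comes from the scaling in $\eta_*(t) = (\eta_2 *_M \varphi)(\kappa t)$ (which in effect divides $x$ by $\kappa$). It remains only to bound the more harmless terms of type $Z_{\eta,2}$ and $LS_\eta$.
Clearly, $Z_{\eta_+^2,2} \le (1/x) \sum_n \Lambda(n) (\log n)\, \eta_+^2(n/x)$. Now, by Prop. 7.1.5,
\[
\begin{aligned}
\sum_{n=1}^{\infty} \Lambda(n) (\log n)\, \eta_+^2(n/x)
&= \left( 0.640206 + O^*\left( 2 \cdot 10^{-6} + \frac{366.91}{\sqrt{x}} \right) \right) x \log x - 0.021095 x \\
&\le \left( 0.640206 + O^*(3 \cdot 10^{-6}) \right) x \log x - 0.021095 x.
\end{aligned} \tag{14.18}
\]
Thus,
\[
Z_{\eta_+^2,2} \le 0.640209 \log x. \tag{14.19}
\]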
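We will proceed a little more crudely for $Z_{\eta_*^2,2}$:
\[
\begin{aligned}
Z_{\eta_*^2,2} &= \frac{1}{x} \sum_n \Lambda^2(n)\, \eta_*^2(n/x) \le \frac{1}{x} \sum_n \Lambda(n)\, \eta_*(n/x) \cdot (\eta_*(n/x) \log n) \\
&\le (|\eta_*|_1 + |\mathrm{err}_{\eta_*,\chi_T}(0, x)|) \cdot (|\eta_*(t) \cdot \log^+(\kappa t)|_\infty + |\eta_*|_\infty \log(x/\kappa)),
\end{aligned} \tag{14.20}
\]
where $\log^+(t) := \max(0, \log t)$. It is easy to see that
\[
|\eta_*|_\infty = |\eta_2 *_M \varphi|_\infty \le \left| \frac{\eta_2(t)}{t} \right|_1 |\varphi|_\infty \le 4 (\log 2)^2 \cdot \frac{2}{e} \le 1.414, \tag{14.21}
\]
By Cor. 7.1.3,
\[
|\mathrm{err}_{\eta_*,\chi_T}(0, x)| \le 2.485 \cdot 10^{-19} + \frac{1}{\sqrt{10^{25}}} (381500 + 76) \le 1.20665 \cdot 10^{-7}.
\]
We conclude that
\[
Z_{\eta_*^2,2} \le \left( \sqrt{\pi/2}/49 + 1.20665 \cdot 10^{-7} \right) (0.732513 + 1.414 \log(x/49)) \le 0.0362 \log x. \tag{14.23}
\]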
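We have bounds for $|\eta_*|_\infty$ and $|\eta_+|_\infty$. We can also bound
from (A.42).
We can now bound $LS_\eta(x, r)$ for $\eta = \eta_*, \eta_+$:
\[
\begin{aligned}
LS_\eta(x, r) &= \log r \cdot \max_{p \le r} \sum_{\alpha \ge 1} \eta\left( \frac{p^\alpha}{x} \right) \\
&\le (\log r) \cdot \max_{p \le r} \Big( \frac{\log x}{\log p} |\eta|_\infty + \sum_{\substack{\alpha \ge 1\\ p^\alpha \ge x}} \frac{|\eta \cdot t|_\infty}{p^\alpha/x} \Big) \\
&\le (\log r) \cdot \max_{p \le r} \left( \frac{\log x}{\log p} |\eta|_\infty + \frac{|\eta \cdot t|_\infty}{1 - 1/p} \right) \\
&\le \frac{(\log r)(\log x)}{\log 2} |\eta|_\infty + 2 (\log r) |\eta \cdot t|_\infty,
\end{aligned}
\]
and so
\[
\begin{aligned}
LS_{\eta_*} &\le \left( \frac{1.414}{\log 2} \log x + 2 \cdot \frac{(3/e)^{3/2}}{49} \right) \log r \le 24.32 \log x + 0.57, \\
LS_{\eta_+} &\le \left( \frac{1.07996}{\log 2} \log x + 2 \cdot 1.19073 \right) \log r \le 18.57 \log x + 28.39,
\end{aligned} \tag{14.25}
\]
where we are using the bound on $|\eta_+|_\infty$ in (14.3).
We can now start to put together all terms in (10.36). Let $\epsilon_0 = |\eta_+ - \eta_\circ|_2 / |\eta_\circ|_2$. Then, by (14.5),
\[
\epsilon_0 |\eta_\circ|_2 = |\eta_+ - \eta_\circ|_2 \le 2.42942 \cdot 10^{-6}.
\]
Thus,
\[
2.82643\, |\eta_\circ|_2^2\, (2 + \epsilon_0) \cdot \epsilon_0 + \frac{4.31004\, |\eta_\circ|_2^2 + 0.0012\, \frac{|\eta_\circ^{(3)}|_1^2}{\delta_0^5}}{r}
\]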
is at most
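\[
\begin{aligned}
|\eta_*|_2^2 &= \frac{|\eta_2 *_M \varphi|_2^2}{\kappa} = \frac{1}{\kappa} \int_0^\infty \left( \int_0^\infty \eta_2(t)\, \varphi\left( \frac{w}{t} \right) \frac{dt}{t} \right)^2 dw \\
&\le \frac{1}{\kappa} \left( 1 - \frac{1}{4} \right) \int_0^\infty \int_0^\infty \eta_2^2(t)\, \varphi^2\left( \frac{w}{t} \right) \frac{dt}{t^2}\, dw \\
&= \frac{3}{4\kappa} \int_0^\infty \frac{\eta_2^2(t)}{t} \left( \int_0^\infty \varphi^2\left( \frac{w}{t} \right) \frac{dw}{t} \right) dt \\
&= \frac{3}{4\kappa}\, |\eta_2(t)/\sqrt{t}|_2^2 \cdot |\varphi|_2^2 = \frac{3}{4\kappa} \cdot \frac{32}{3} (\log 2)^3 \cdot \frac{3}{8} \sqrt{\pi} \le \frac{1.77082}{\kappa},
\end{aligned}
\]
\[
\frac{1.33805 \cdot 10^{-8}}{\kappa} \cdot 8.7806 + 2.3922 \cdot 10^{-8} \cdot 1.6812 \cdot \sqrt{\frac{1.77082}{\kappa}} \cdot \left( \sqrt{8.7806} + 1.6812 \cdot 0.80014 \right) \le \frac{1.7316 \cdot 10^{-6}}{\kappa},
\]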
where we are using the bound Aη+ ≤ 8.7806 we obtained in (14.7). (We are also using
the bounds on norms in (14.3) and the value κ = 49.)
By the bounds (14.19), (14.23) and (14.25), we see that the third line of (10.36) is
at most
where we use the assumption x ≥ x+ = 4.9 · 1026 (though a much weaker assumption
would suffice).
Using the assumption x ≥ x+ again, together with (14.22) and the bounds we have
just proven, we conclude that, for r = 150000, the integral over the major arcs
\[
\int_{\mathfrak{M}_{8,r}} S_{\eta_+}(\alpha, x)^2\, S_{\eta_*}(\alpha, x)\, e(-N\alpha)\, d\alpha
\]
14.3. THE MINOR-ARC TOTAL: EXPLICIT VERSION 267
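is
\[
\begin{aligned}
& C_0 \cdot C_{\eta_\circ,\eta_*} x^2 + O^*\left( 2.9387 \cdot 10^{-5} \cdot \frac{\sqrt{\pi/2}}{\kappa} x^2 + \frac{1.7316 \cdot 10^{-6}}{\kappa} x^2 + 43 (\log x)^2 x \right) \\
&= C_0 \cdot C_{\eta_\circ,\eta_*} x^2 + O^*\left( \frac{3.85628 \cdot 10^{-5} \cdot x^2}{\kappa} \right) \\
&= C_0 \cdot C_{\eta_\circ,\eta_*} x^2 + O^*(7.86996 \cdot 10^{-7} x^2),
\end{aligned} \tag{14.26}
\]
where $C_0$ and $C_{\eta_\circ,\eta_*}$ are as in (10.37). Notice that $C_0 C_{\eta_\circ,\eta_*} x^2$ is the expected asymptotic for the integral over all of $\mathbb{R}/\mathbb{Z}$.
Moreover, by (14.9), (14.14) and (14.4), as well as $|\varphi|_1 = \sqrt{\pi/2}$,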
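Similarly,
\[
\begin{aligned}
C_{\eta_+,1} &= 0.7131 \int_1^\infty \frac{\log t}{\sqrt{t}} \Big( \sup_{r \ge t} \eta_+(r) \Big)^2 dt \\
&\le 0.7131 \int_1^\infty \frac{1.19073^2 \log t}{t^{5/2}}\, dt \le 0.44937.
\end{aligned}
\]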
We get
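by (14.18). Let us now estimate $T$. Recall that $\varphi(t) = t^2 e^{-t^2/2}$. Since
\[
\int_0^u \varphi(t)\, dt = \int_0^u t^2 e^{-t^2/2}\, dt \le \int_0^u t^2\, dt = \frac{u^3}{3},
\]
we can bound
\[
C_{\varphi,3}\left( \frac{1}{2} \log \frac{x}{\kappa} \right) = \frac{1.04488}{\sqrt{\pi/2}} \int_0^{\frac{2}{\log x/\kappa}} t^2 e^{-t^2/2}\, dt \le \frac{0.2779}{((\log x/\kappa)/2)^3}.
\]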
√ √
1 x
T = Cϕ,3 log · (S − ( J − E)2 )
2 κ
8 · 0.2779
≤ · (0.640209x log x − 0.021095x − 8.6297x)
(log x/κ)3
8x log x 8x
≤ 0.17792 − 2.40405
(log x/κ)3 (log x/κ)3
x x
≤ 1.42336 − 13.69293 .
(log x/κ)2 (log x/κ)3
It remains to estimate $M$. Let us first look at $g(r_0)$; here $g = g_{x/\kappa,\varphi}$, where $g_{y,\varphi}$ is defined as in (11.19) and $\varphi(t) = t^2 e^{-t^2/2}$, as usual. Write $y = x/\kappa$. We must estimate the constant $C_{\varphi,2,K}$ defined in (11.21):
\[
\begin{aligned}
C_{\varphi,2,K} &= -\int_{1/K}^{1} \varphi(w) \log w\, dw \le -\int_0^1 \varphi(w) \log w\, dw \\
&\le -\int_0^1 w^2 e^{-w^2/2} \log w\, dw \le 0.093426,
\end{aligned}
\]
where again we use VNODE-LP for rigorous numerical integration. Since $|\varphi|_1 = \sqrt{\pi/2}$ and $K = (\log y)/2$, this implies that
and so
\[
R_{y,K,\varphi,t} = \frac{0.07455}{\log \frac{\log y}{2}} R_{y/K,t} + \left( 1 - \frac{0.07455}{\log \frac{\log y}{2}} \right) R_{y,t}. \tag{14.34}
\]
Let $t = 2r_0 = 300000$; we recall that $K = (\log y)/2$. Recall from (14.16) that $y = x/\kappa \ge 10^{25}$; thus, $y/K \ge 3.47435 \cdot 10^{23}$ and $\log((\log y)/2) \ge 3.35976$. Going back to the definition of $R_{x,t}$ in (11.13), we see that
\[
R_{y,2r_0} \le 0.27125 \log\left( 1 + \frac{\log(8 \cdot 150000)}{2 \log \frac{9 \cdot (10^{25})^{1/3}}{2.004 \cdot 2 \cdot 150000}} \right) + 0.41415 \le 0.58341, \tag{14.35}
\]
\[
R_{y/K,2r_0} \le 0.27125 \log\left( 1 + \frac{\log(8 \cdot 150000)}{2 \log \frac{9 \cdot (3.47435 \cdot 10^{23})^{1/3}}{2.004 \cdot 2 \cdot 150000}} \right) + 0.41415 \le 0.60295, \tag{14.36}
\]
and so
\[
R_{y,K,\varphi,2r_0} \le \frac{0.07455}{3.35976}\, 0.60295 + \left( 1 - \frac{0.07455}{3.35976} \right) 0.58341 \le 0.58385.
\]
Using
\[
z(r) = e^\gamma \log\log r + \frac{2.50637}{\log\log r} \le 5.42506,
\]
we see from (11.13) that
\[
L_{2r_0} = 5.42506 \cdot \left( \frac{13}{4} \log 300000 + 7.82 \right) + 13.66 \log 300000 + 37.55 \le 474.608.
\]
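\[
\begin{aligned}
g(r_0) &= \frac{(0.58385 \cdot \log 300000 + 0.5) \sqrt{5.42506} + 2.5}{\sqrt{2 \cdot 150000}}
+ \frac{474.608}{150000} + 3.36 \left( \frac{\log y}{2y} \right)^{1/6} \\
&\le 0.041568.
\end{aligned}
\]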
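The chain of numerical bounds leading to $g(r_0) \le 0.041568$ can be replayed in floating point. The sketch below simply plugs in the rounded constants from the text ($0.58385$, $5.42506$, $474.608$) and evaluates at the smallest admissible $y = 10^{25}$; it is an arithmetic cross-check under those assumptions, not a substitute for the interval-arithmetic computation. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329
R0 = 150000
T = 2 * R0  # t = 2 r_0 = 300000


def z(r):
    """z(r) = e^gamma log log r + 2.50637 / log log r."""
    ll = math.log(math.log(r))
    return math.exp(EULER_GAMMA) * ll + 2.50637 / ll


def L_2r0(z_bound=5.42506):
    """L_{2 r_0} with z replaced by its stated upper bound."""
    return z_bound * (13 / 4 * math.log(T) + 7.82) + 13.66 * math.log(T) + 37.55


def g_r0(y, R_bound=0.58385, z_bound=5.42506, L_bound=474.608):
    """g(r_0) as in the display, using the rounded bounds from the text."""
    first = ((R_bound * math.log(T) + 0.5) * math.sqrt(z_bound) + 2.5) / math.sqrt(T)
    return first + L_bound / R0 + 3.36 * (math.log(y) / (2 * y)) ** (1 / 6)


print(f"z(r0)   = {z(R0):.6f}")
print(f"L_(2r0) = {L_2r0():.4f}")
print(f"g(r0)   = {g_r0(1e25):.7f}")
```

The last term decreases in $y$, so the value at $y = 10^{25}$ is the largest in the admissible range.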
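\[
\begin{aligned}
\frac{\log(150000 + 1) + c_+}{\log \sqrt{x} + c_-} \cdot S - (\sqrt{J} - \sqrt{E})^2
&\le \frac{13.9716}{\frac{1}{2} \log x + 0.6394} \cdot (0.640209\, x \log x - 0.021095 x) - 8.6297 x \\
&\le 17.8895 x - \frac{11.7332 x}{\frac{1}{2} \log x + 0.6394} - 8.6297 x \\
&\le (17.8895 - 8.6297) x \le 9.2598 x,
\end{aligned}
\]
\[
g(r_0) \cdot \left( \frac{\log(150000 + 1) + c_+}{\log \sqrt{x} + c_-} \cdot S - (\sqrt{J} - \sqrt{E})^2 \right) \le 0.041568 \cdot 9.2598 x \le 0.38492 x. \tag{14.37}
\]
This is one of the main terms.
\[
\begin{aligned}
R_{y,2r_1} &= 0.27125 \log\left( 1 + \frac{\log\left( 8 \cdot \frac{3}{8} y^{4/15} \right)}{2 \log \frac{9 y^{1/3}}{2.004 \cdot \frac{3}{4} y^{4/15}}} \right) + 0.41415 \\
&= 0.27125 \log\left( 1 + \frac{\frac{4}{15} \log y + \log 3}{2 \left( \left( \frac{1}{3} - \frac{4}{15} \right) \log y + \log \frac{9}{2.004 \cdot \frac{3}{4}} \right)} \right) + 0.41415 \\
&\le 0.27125 \log\left( 1 + \frac{\frac{4}{15}}{2 \left( \frac{1}{3} - \frac{4}{15} \right)} \right) + 0.41415 \le 0.71215.
\end{aligned} \tag{14.38}
\]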
for $180 \le t \le 30000$; since $f(t) < 0$ for $0 < t < 180$ (by $(4/3) \log t - c < 0$) and since, by $c > 20/3$, we have $f(t) < (5/2)(\log t)/t$ as soon as $t > (\log t)^2$ (and so, in particular, for $t > 30000$), we see that (14.40) is valid for all $t > 0$. Therefore,
Since $r_1 = (3/8) y^{4/15}$ and $z(r)$ is increasing for $r \ge 27$, we know that
\[
\begin{aligned}
z(r_1) \le z(y^{4/15}) &= e^\gamma \log\log y^{4/15} + \frac{2.50637}{\log\log y^{4/15}} \\
&= e^\gamma \log\log y + \frac{2.50637}{\log\log y - \log \frac{15}{4}} - e^\gamma \log \frac{15}{4} \le e^\gamma \log\log y - 1.43644
\end{aligned} \tag{14.42}
\]
for $y \ge 10^{25}$. Hence, (11.13) gives us that
\[
\begin{aligned}
L_{2r_1} &\le (e^\gamma \log\log y - 1.43644) \left( \frac{13}{4} \log \frac{3}{4} y^{\frac{4}{15}} + 7.82 \right) + 13.66 \log \frac{3}{4} y^{\frac{4}{15}} + 37.55 \\
&\le \frac{13}{15} e^\gamma \log y \log\log y + 2.39776 \log y + 12.2628 \log\log y + 23.7304 \\
&\le (2.13522 \log y + 18.118) \log\log y.
\end{aligned}
\]
√
where we are using the fact that $y \mapsto (\log y)^2 \log\log y / y^{2/15}$ is decreasing for $y \ge 10^{25}$ (because $y \mapsto (\log y)^{5/2}/y^{2/15}$ is decreasing for $y \ge e^{75/4}$ and $10^{25} > e^{75/4}$).
It remains only to bound
\[
\frac{2S}{\log x + 2c_-} \int_{r_0}^{r_1} \frac{g(r)}{r}\, dr
\]
in the expression (13.13) for $M$. We will use the bound on the integral given in (13.33). The easiest term to bound there is $f_1(r_0)$, defined in (13.34), since it depends only on $r_0$: for $r_0 = 150000$,
\[
f_1(r_0) = 0.0169073\ldots.
\]
It is also not hard to bound $f_2(r_0, x)$, also defined in (13.34):
\[
\begin{aligned}
f_2(r_0, y) &= 3.36\, \frac{((\log y)/2)^{1/6}}{x^{1/6}} \log \frac{\frac{3}{8} y^{\frac{4}{15}}}{r_0} \\
&\le 3.36\, \frac{(\log y)^{1/6}}{(2y)^{1/6}} \left( \frac{4}{15} \log y + 0.05699 - \log r_0 \right),
\end{aligned}
\]
where we recall again that $x = \kappa y = 49y$. Thus, since $r_0 = 150000$ and $y \ge 10^{25}$,
\[
f_2(r_0, y) \le 0.001399.
\]
Let us now look at the terms I1,r , cϕ in (13.35). We already saw in (14.33) that
2
2
P2 (log 2r0 ) Rz,2r 1
− 0.414152 P2− (log 2r0 )
I0,r0 ,r1 ,z ≤ Rz,2r · √ + √ ,
0
r0 log rr01 r0
where P2 (t) = t2 + 4t + 8 and P2− (t) = 2t2 + 16t + 48. By (14.38) and (14.41),
we obtain that
p q
(1 − cϕ ) I0,r0 ,r1 ,y + cϕ I0,r0 ,r1 , 2y
log y
s
0.49214
≤ 0.97781 · 0.10426 + 4
15 log y − log 400000 (14.45)
s
0.49584
+ 0.02219 0.10439 + 4 ≤ 0.33239
15 log y − log 400000
for y ≥ 10150 .
For y between 1025 and 10150 , we evaluate the left side of (14.45) directly, using
the definition (13.35) of I0,r0 ,r1 ,z instead, as well as the bound
0.07455
cϕ ≤
log log2 y
from (14.33). (It is clear from the second and third lines of (13.32) that I0,r0 ,r1 ,z is
decreasing on z for r0 , r1 fixed, and so the upper bound for cϕ does give the worst case.)
The bisection method (applied to the interval [25, 150] with 30 iterations, including 30
initial iterations) gives us that
\[
(1-c_\varphi)\sqrt{I_{0,r_0,r_1,y}} + c_\varphi\sqrt{I_{0,r_0,r_1,\frac{2y}{\log y}}} \le 0.4153461 \tag{14.46}
\]
for $10^{25} \le y \le 10^{150}$. By (14.45), (14.46) is also true for $y > 10^{150}$. Hence
\[
f_0(r_0,y) \le 0.4153461\cdot\sqrt{\frac{2}{\sqrt{r_0}}\cdot 5.73827} \le 0.071498.
\]
By (14.30),
Putting (14.37), (14.43) and (14.47) together, we conclude that the quantity M
defined in (13.13) is bounded by
M ≤ 0.38492x + 0.30517x + 0.114988x ≤ 0.80508x. (14.48)
Gathering the terms from (14.29), (14.32) and (14.48), we see that Theorem 13.2.1
states that the minor-arc total
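\[
Z_{r_0} = \int_{(\mathbb{R}/\mathbb{Z})\setminus\mathfrak{M}_{8,r_0}} |S_{\eta_*}(\alpha,x)|\,|S_{\eta_+}(\alpha,x)|^2\,d\alpha
\]
is bounded by
\[
\begin{aligned}
Z_{r_0} &\le \left(\sqrt{\frac{|\varphi|_1 x}{\kappa}(M+T)} + \sqrt{S_{\eta_*}(0,x)\cdot E}\right)^2\\
&\le \left(\sqrt{|\varphi|_1(0.80508 + 3.5776\cdot 10^{-4})}\,\frac{x}{\sqrt{\kappa}} + \sqrt{1.0532\cdot 10^{-11}}\,\frac{x}{\sqrt{\kappa}}\right)^2\\
&\le 1.00948\,\frac{x^2}{\kappa}
\end{aligned}
\tag{14.49}
\]
for $r_0 = 150000$, $x \ge 4.9\cdot 10^{26}$, where we use yet again the fact that $|\varphi|_1 = \sqrt{\pi/2}$.
This is our total minor-arc bound.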
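The arithmetic behind the constant in (14.49) can be reproduced in a few lines. The sketch below is a plain floating-point check with hypothetical names; it also recomputes $|\varphi|_1$ for $\varphi(t) = t^2e^{-t^2/2}$ through the Gamma function.

```python
import math

# |phi|_1 = int_0^infty t^2 e^{-t^2/2} dt = sqrt(2) * Gamma(3/2) = sqrt(pi/2)
PHI_L1 = math.sqrt(2) * math.gamma(1.5)


def minor_arc_constant():
    """(sqrt(|phi|_1 (0.80508 + 3.5776e-4)) + sqrt(1.0532e-11))^2, in units of x^2/kappa."""
    lead = math.sqrt(PHI_L1 * (0.80508 + 3.5776e-4)) + math.sqrt(1.0532e-11)
    return lead ** 2


print(PHI_L1, minor_arc_constant())
```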
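\[
\ge 1.058259\,\frac{x^2}{\kappa} + O^*\left(1.00948\,\frac{x^2}{\kappa}\right) \ge 0.04877\,\frac{x^2}{\kappa}
\]
for $r_0 = 150000$, where $x = N/(2 + 9/(196\sqrt{2\pi}))$, as in (14.15). (We are using (14.27) and (14.49).) Recall that $\kappa = 49$ and $\eta_*(t) = (\eta_2 *_M \varphi)(\kappa t)$, where $\varphi(t) = t^2e^{-t^2/2}$.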
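By (14.3) and (14.21), $|\eta_+|_\infty \le 1.079955$ and $|\eta_*|_\infty \le 1.414$. By [RS62, Thms. 12 and 13],
\[
\sum_{\substack{n_1 \le N \text{ non-prime}\\ \text{or } n_1 = 2}} \Lambda(n_1) < 1.4262\sqrt{N} + \log 2 < 1.4263\sqrt{N},
\]
\[
\sum_{\substack{n_1 \le N \text{ non-prime}\\ \text{or } n_1 = 2}} \Lambda(n_1)\sum_{n_2 \le N}\Lambda(n_2) = 1.4263\sqrt{N}\cdot 1.03883N \le 1.48169N^{3/2}.
\]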
\[
\ge 0.04877\,\frac{x^2}{\kappa} - 7.3306\,N^{3/2}\log N
\ge 0.00024433\,N^2 - 1.4412\cdot 10^{-11}\cdot N^2 \ge 0.0002443\,N^2
\]
by $\kappa = 49$ and (14.15). Since $0.0002443N^2 > 0$, this shows that every odd number $N \ge 10^{27}$ can be written as the sum of three odd primes.
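The last chain of inequalities is pure arithmetic and can be replayed directly. A minimal floating-point sketch, with hypothetical names, using $\kappa = 49$, $x = N/(2 + 9/(196\sqrt{2\pi}))$ and the worst case $N = 10^{27}$:

```python
import math

KAPPA = 49
X_OVER_N = 1 / (2 + 9 / (196 * math.sqrt(2 * math.pi)))  # x / N, as in (14.15)


def main_term_coefficient():
    """Coefficient of N^2 in 0.04877 x^2 / kappa."""
    return 0.04877 * X_OVER_N ** 2 / KAPPA


def error_coefficient(n):
    """Coefficient of N^2 in 7.3306 N^{3/2} log N, i.e. 7.3306 log(N) / sqrt(N)."""
    return 7.3306 * math.log(n) / math.sqrt(n)


N = 1e27
print(main_term_coefficient(), error_coefficient(N),
      main_term_coefficient() - error_coefficient(N))
```

The error coefficient decreases in $N$, so checking it at $N = 10^{27}$ covers all larger $N$.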
Since the ternary Goldbach conjecture has already been checked for all $N \le 8.875\cdot 10^{30}$ [HP13], we conclude that every odd number N > 7 can be written as the sum
of three odd primes, and every odd number N > 5 can be written as the sum of three
primes. The main result is hereby proven: the ternary Goldbach conjecture is true.
Part IV
Appendices
Appendix A
Our aim here is to give bounds on the norms of some smoothing functions and, in particular, on several norms of a smoothing function $\eta_+ : [0,\infty) \to \mathbb{R}$ based on the Gaussian $\eta_\heartsuit(t) = e^{-t^2/2}$.
As before, we write
\[
h : t \mapsto \begin{cases} t^2(2-t)^3e^{t-1/2} & \text{if } t \in [0,2],\\ 0 & \text{otherwise} \end{cases} \tag{A.1}
\]
where
\[
F_H(y) = \frac{\sin(H\log y)}{\pi\log y},\qquad
h_H(t) = (h *_M F_H)(t) = \int_0^\infty h(ty^{-1})F_H(y)\,\frac{dy}{y} \tag{A.4}
\]
and H is a positive constant to be set later. By (2.8), M hH = M h · M FH . Now FH is
just a Dirichlet kernel under a change of variables; using this, we get that, for τ real,
\[
MF_H(i\tau) = \begin{cases} 1 & \text{if } |\tau| < H,\\ 1/2 & \text{if } |\tau| = H,\\ 0 & \text{if } |\tau| > H. \end{cases} \tag{A.5}
\]
280 APPENDIX A. NORMS OF SMOOTHING FUNCTIONS
Thus,
\[
Mh_H(i\tau) = \begin{cases} Mh(i\tau) & \text{if } |\tau| < H,\\ \frac{1}{2}Mh(i\tau) & \text{if } |\tau| = H,\\ 0 & \text{if } |\tau| > H. \end{cases} \tag{A.6}
\]
As it turns out, h, η◦ and M h (and hence M hH ) are relatively easy to work with,
whereas we can already see that hH and η+ have more complicated definitions. Part
of our work will consist in expressing norms of hH and η+ in terms of norms of h, η◦
and M h.
We will have to compute Ck , 1 ≤ k ≤ 4, with some care, due to the absolute value
involved in the definition.
The function $(x^2(2-x)^3e^{x-1/2})' = ((x^2(2-x)^3)' + x^2(2-x)^3)e^{x-1/2}$ has the same zeros as $H_1(x) = (x^2(2-x)^3)' + x^2(2-x)^3$, namely, $-4$, $0$, $1$ and $2$. The sign of $H_1(x)$ (and hence of $h'(x)$) is $+$ within $(0,1)$ and $-$ within $(1,2)$. Hence
\[
C_1 = \int_0^\infty |h'(x)|\,dx = |h(1) - h(0)| + |h(2) - h(1)| = 2h(1) = 2\sqrt{e}. \tag{A.10}
\]
The situation with $(x^2(2-x)^3e^{x-1/2})''$ is similar: it has zeros at the roots of $H_2(x) = 0$, where $H_2(x) = H_1(x) + H_1'(x)$ (and, in general, $H_{k+1}(x) = H_k(x) + H_k'(x)$). This time, we will prefer to find the roots numerically. It is enough to find (candidates for) the roots using any available tool and then check rigorously that the sign does change around the purported roots. In this way, we check that $H_2(x) = 0$ has two roots $\alpha_{2,1}$, $\alpha_{2,2}$ in the interval $(0,2)$, another root at $2$, and two more roots outside $[0,2]$; moreover,
\[
\alpha_{2,1} = 0.48756597185712\ldots,\qquad \alpha_{2,2} = 1.48777169309489\ldots, \tag{A.11}
\]
where we verify the root using interval arithmetic. The sign of $H_2(x)$ (and hence of $h''(x)$) is first $+$, then $-$, then $+$. Write $\alpha_{2,0} = 0$, $\alpha_{2,3} = 2$. By integration by parts,
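\[
\begin{aligned}
C_2 &= \int_0^\infty |h''(x)|\,x\,dx = \int_0^{\alpha_{2,1}} h''(x)x\,dx - \int_{\alpha_{2,1}}^{\alpha_{2,2}} h''(x)x\,dx + \int_{\alpha_{2,2}}^{2} h''(x)x\,dx\\
&= \sum_{j=1}^{3}(-1)^{j+1}\left(h'(x)x\Big|_{\alpha_{2,j-1}}^{\alpha_{2,j}} - \int_{\alpha_{2,j-1}}^{\alpha_{2,j}} h'(x)\,dx\right)\\
&= 2\sum_{j=1}^{2}(-1)^{j+1}\left(h'(\alpha_{2,j})\alpha_{2,j} - h(\alpha_{2,j})\right) = 10.79195821037\ldots
\end{aligned}
\tag{A.12}
\]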
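The root-finding step and the closed forms (A.10) and (A.12) can be illustrated numerically. The sketch below works in ordinary double precision, so it only produces candidates of the kind the text then certifies with interval arithmetic; all function names are hypothetical. It uses $h^{(k)}(x) = H_k(x)e^{x-1/2}$ with $H_0(x) = x^2(2-x)^3$ and $H_{k+1} = H_k + H_k'$.

```python
import math

# p(x) = x^2 (2 - x)^3 = 8x^2 - 12x^3 + 6x^4 - x^5, coefficients in increasing degree.
P = [0, 0, 8, -12, 6, -1]


def poly_eval(c, x):
    """Horner evaluation of c[0] + c[1] x + ... ."""
    acc = 0.0
    for a in reversed(c):
        acc = acc * x + a
    return acc


def poly_deriv(c):
    return [k * a for k, a in enumerate(c)][1:]


def H(k):
    """Coefficients of H_k, built from H_0 = p via H_{k+1} = H_k + H_k'."""
    c = list(P)
    for _ in range(k):
        d = poly_deriv(c) + [0]
        c = [a + b for a, b in zip(c, d)]
    return c


def h_deriv(k, x):
    """k-th derivative of h at a point x in [0, 2]."""
    return poly_eval(H(k), x) * math.exp(x - 0.5)


def roots_in(c, lo=0.0, hi=2.0, steps=4000, iters=80):
    """Roots strictly inside (lo, hi): grid scan for sign changes, then bisection."""
    xs = [lo + (hi - lo) * i / steps for i in range(1, steps)]
    found = []
    for a, b in zip(xs, xs[1:]):
        fa, fb = poly_eval(c, a), poly_eval(c, b)
        if fa == 0.0:
            found.append(a)
        elif fa * fb < 0:
            for _ in range(iters):
                m = 0.5 * (a + b)
                if poly_eval(c, a) * poly_eval(c, m) <= 0:
                    b = m
                else:
                    a = m
            found.append(0.5 * (a + b))
    return found


C1 = 2 * h_deriv(0, 1.0)
ALPHA2 = roots_in(H(2))
C2 = 2 * sum((-1) ** (j + 1) * (h_deriv(1, a) * a - h_deriv(0, a))
             for j, a in enumerate(ALPHA2, start=1))
print(C1, ALPHA2, C2, roots_in(H(3)), roots_in(H(4)))
```

The same routine applied to `H(3)` and `H(4)` returns candidates for the roots of $H_3$ and $H_4$ in $(0,2)$.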
To compute $C_3$, we proceed in the same way, finding two roots of $H_3(x) = 0$ (numerically) within the interval $(0,2)$, viz.,
\[
\alpha_{3,1} = 1.04294565694978\ldots,\qquad \alpha_{3,2} = 1.80999654602916\ldots
\]
The sign of $H_3(x)$ on the interval $[0,2]$ is first $-$, then $+$, then $-$. Write $\alpha_{3,0} = 0$, $\alpha_{3,3} = 2$. Proceeding as before – with the only difference that the integration by parts
\[
\begin{aligned}
&= \sum_{j=1}^{3}(-1)^{j}\left(h''(x)x^2\Big|_{\alpha_{3,j-1}}^{\alpha_{3,j}} - \int_{\alpha_{3,j-1}}^{\alpha_{3,j}} h''(x)\cdot 2x\,dx\right)\\
&= \sum_{j=1}^{3}(-1)^{j}\left(h''(x)x^2 - h'(x)\cdot 2x + 2h(x)\right)\Big|_{\alpha_{3,j-1}}^{\alpha_{3,j}}\\
&= 2\sum_{j=1}^{2}(-1)^{j}\left(h''(\alpha_{3,j})\alpha_{3,j}^2 - 2h'(\alpha_{3,j})\alpha_{3,j} + 2h(\alpha_{3,j})\right)
\end{aligned}
\tag{A.13}
\]
\[
C_3 = 75.1295251672\ldots \tag{A.14}
\]
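The treatment of the integral in $C_4$ is very similar, at least at first. There are two roots of $H_4(x) = 0$ in the interval $(0,2)$, namely,
\[
\alpha_{4,1} = 0.45839599852663\ldots,\qquad \alpha_{4,2} = 1.54626346975533\ldots
\]
The sign of $H_4(x)$ on the interval $[0,2]$ is first $-$, then $+$, then $-$. Using integration by parts as before, we obtain
\[
\begin{aligned}
\int_{0^+}^{2^-} \left|h^{(4)}(x)\right|x^3\,dx
&= -\int_{0^+}^{\alpha_{4,1}} h^{(4)}(x)x^3\,dx + \int_{\alpha_{4,1}}^{\alpha_{4,2}} h^{(4)}(x)x^3\,dx - \int_{\alpha_{4,2}}^{2^-} h^{(4)}(x)x^3\,dx\\
&= 2\sum_{j=1}^{2}(-1)^{j}\left(h^{(3)}(\alpha_{4,j})\alpha_{4,j}^3 - 3h^{(2)}(\alpha_{4,j})\alpha_{4,j}^2 + 6h'(\alpha_{4,j})\alpha_{4,j} - 6h(\alpha_{4,j})\right)
\end{aligned}
\]
since $\lim_{t\to 0^+} h^{(k)}(t)t^k = 0$ for $0 \le k \le 3$, $\lim_{t\to 2^-} h^{(k)}(t) = 0$ for $0 \le k \le 2$ and $\lim_{t\to 2^-} h^{(3)}(t) = -24e^{3/2}$. Now
\[
\int_{2^-}^{\infty} |h^{(4)}(x)x^3|\,dx = \lim_{\epsilon\to 0^+} |h^{(3)}(2+\epsilon) - h^{(3)}(2-\epsilon)|\cdot 2^3 = 2^3\cdot 24e^{3/2},
\]
Hence
\[
C_4 = \int_{0^+}^{2^-} \left|h^{(4)}(x)\right|x^3\,dx + 24e^{3/2}\cdot 2^3 = 2013.18185012\ldots \tag{A.15}
\]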
A.2. THE DIFFERENCE η+ − η◦ IN `2 NORM. 283
We will, however, find it easier to deal with M h by means of the bound (A.8), in part
because (A.16) amounts to an invitation to numerical instability.
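For instance, it is easy to use (A.8) to give a bound for the $\ell_1$-norm of $Mh(it)$. Since $C_4/C_3 > C_3/C_2 > C_2/C_1 > C_1/C_0$,
\[
\begin{aligned}
|Mh(it)|_1 &= 2\int_0^\infty |Mh(it)|\,dt\\
&\le 2\left(C_0\frac{C_1}{C_0} + C_1\int_{C_1/C_0}^{C_2/C_1}\frac{dt}{t} + C_2\int_{C_2/C_1}^{C_3/C_2}\frac{dt}{t^2} + C_3\int_{C_3/C_2}^{C_4/C_3}\frac{dt}{t^3} + C_4\int_{C_4/C_3}^{\infty}\frac{dt}{t^4}\right)
\end{aligned}
\]
and so
\[
|Mh(it)|_1 \le 16.1939176. \tag{A.17}
\]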
\[
+\; C_2\log\frac{C_3C_1}{C_2^2} + C_3\left(\frac{C_2}{C_3} - \frac{C_3}{C_4}\right) + \frac{C_4}{2}\,\frac{C_3^2}{C_4^2},
\]
and so
\[
|(t+i)Mh(it)|_1 \le 27.8622803. \tag{A.18}
\]
This is at most
\[
\max_{t\ge 0} e^{-t^2}t^3|\log t| \cdot \int_0^\infty |h_H(t) - h(t)|^2\,\frac{dt}{t}.
\]
Now
\[
\max_{t\ge 0} e^{-t^2}t^3|\log t| = \max\left(\max_{t\in[0,1]} e^{-t^2}t^3(-\log t),\ \max_{t\in[1,5]} e^{-t^2}t^3\log t\right) = 0.14882234545\ldots
\]
where we find the maximum by the bisection method with 40 iterations (see 2.6).
Hence, by (A.22),
\[
\int_0^\infty (\eta_+(t) - \eta_\circ(t))^2|\log t|\,dt \le 0.148822346\cdot\frac{C_4^2}{7\pi H^7} \le \frac{27427.502}{H^7} \le \left(\frac{165.61251}{H^{7/2}}\right)^2. \tag{A.24}
\]
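The maximum of $e^{-t^2}t^3|\log t|$ and the constants in (A.24) can be reproduced as follows. This is a floating-point sketch with hypothetical names; it swaps in a ternary search (valid because the function has a single interior maximum on each of $[0,1]$ and $[1,5]$) in place of the rigorous bisection routine of the text, and it takes $C_4 = 2013.18185012$ from (A.15).

```python
import math

C4 = 2013.18185012  # value quoted in (A.15)


def weight(t):
    """e^{-t^2} t^3 |log t|, extended by 0 at t = 0."""
    if t <= 0.0:
        return 0.0
    return math.exp(-t * t) * t ** 3 * abs(math.log(t))


def ternary_max(func, a, b, iters=200):
    """Maximum of a unimodal function on [a, b] by ternary search."""
    for _ in range(iters):
        m1 = a + (b - a) / 3
        m2 = b - (b - a) / 3
        if func(m1) < func(m2):
            a = m1
        else:
            b = m2
    return func(0.5 * (a + b))


PEAK = max(ternary_max(weight, 0.0, 1.0), ternary_max(weight, 1.0, 5.0))
COEFF = 0.148822346 * C4 ** 2 / (7 * math.pi)  # coefficient of H^{-7} in (A.24)
print(PEAK, COEFF, math.sqrt(27427.502))
```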
A.3. NORMS INVOLVING η+ 285
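Since
\[
\left|t^{\sigma+3/2}e^{-t^2/2}\right|_2 = \sqrt{\int_0^\infty e^{-t^2}t^{2\sigma+3}\,dt} = \sqrt{\frac{\Gamma(\sigma+2)}{2}},
\]
\[
|h(t)/\sqrt{t}|_2 = \sqrt{\frac{31989}{8e} - \frac{585e^3}{8}} \le 1.5023459,
\]
we conclude that
\[
|\eta_+(t)t^\sigma|_1 \le 1.062319\cdot\sqrt{\Gamma(\sigma+2)} \tag{A.30}
\]
for $\sigma > -2$.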
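The closed form for $|h(t)/\sqrt{t}|_2$ can be cross-checked against direct quadrature of $h(t)^2/t = t^3(2-t)^6e^{2t-1}$ on $[0,2]$. The sketch below uses composite Simpson's rule in double precision; the helper names are hypothetical.

```python
import math


def simpson(func, a, b, n):
    """Composite Simpson's rule with n (even) sub-intervals."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3


def h_squared_over_t(t):
    """h(t)^2 / t for h(t) = t^2 (2 - t)^3 e^{t - 1/2} on [0, 2]."""
    return t ** 3 * (2 - t) ** 6 * math.exp(2 * t - 1)


QUADRATURE = simpson(h_squared_over_t, 0.0, 2.0, 2000)
CLOSED_FORM = 31989 / (8 * math.e) - 585 * math.e ** 3 / 8
NORM = math.sqrt(CLOSED_FORM)
print(QUADRATURE, CLOSED_FORM, NORM, NORM / math.sqrt(2))
```

The last printed value is the constant in (A.30): the factor $1/\sqrt{2}$ comes from $\sqrt{\Gamma(\sigma+2)/2}$.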
A.4 Norms involving $\eta_+'$
By one of the standard transformation rules (see (2.10)), the Mellin transform of $\eta_+'$ equals $-(s-1)\cdot M\eta_+(s-1)$. Since the Mellin transform is an isometry in the sense of (2.6),
\[
|\eta_+'|_2^2 = \frac{1}{2\pi i}\int_{\frac12 - i\infty}^{\frac12 + i\infty} \left|M(\eta_+')(s)\right|^2 ds = \frac{1}{2\pi i}\int_{-\frac12 - i\infty}^{-\frac12 + i\infty} |s\cdot M\eta_+(s)|^2\,ds.
\]
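Recall that $\eta_+(t) = h_H(t)\eta_\diamondsuit(t)$, where $\eta_\diamondsuit(t) = te^{-t^2/2}$. Thus, by (2.9), the function $M\eta_+(-1/2+it)$ equals $1/2\pi$ times the (additive) convolution of $Mh_H(it)$ and $M\eta_\diamondsuit(-1/2+it)$. Therefore, for $s = -1/2+it$,
\[
\begin{aligned}
|s|\,|M\eta_+(s)| &= \frac{|s|}{2\pi}\left|\int_{-H}^{H} Mh(ir)M\eta_\diamondsuit(s-ir)\,dr\right|\\
&\le \frac{3}{2\pi}\int_{-H}^{H} |ir-1||Mh(ir)|\cdot|s-ir||M\eta_\diamondsuit(s-ir)|\,dr\\
&= \frac{3}{2\pi}(f*g)(t),
\end{aligned}
\tag{A.31}
\]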
where $f(t) = |it-1||Mh(it)|$ and $g(t) = |-1/2+it||M\eta_\diamondsuit(-1/2+it)|$. (Since $|(-1/2+i(t-r)) + (1+ir)| = |1/2+it| = |s|$, either $|-1/2+i(t-r)| \ge |s|/3$ or $|1+ir| \ge 2|s|/3$; hence $|s-ir||ir-1| = |-1/2+i(t-r)||1+ir| \ge |s|/3$.) By Young's inequality (in a special case that follows from Cauchy-Schwarz), $|f*g|_2 \le |f|_1|g|_2$.
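By (A.18),
\[
|f|_1 = |(r+i)Mh(ir)|_1 \le 27.8622803.
\]
Yet again by Plancherel,
\[
|g|_2^2 = \int_{-\frac12 - i\infty}^{-\frac12 + i\infty} |s|^2|M\eta_\diamondsuit(s)|^2\,ds = \int_{\frac12 - i\infty}^{\frac12 + i\infty} |(M(\eta_\diamondsuit'))(s)|^2\,ds = 2\pi|\eta_\diamondsuit'|_2^2 = \frac{3\pi^{3/2}}{4}.
\]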
Hence
\[
|\eta_+'|_2 \le \frac{1}{\sqrt{2\pi}}\cdot\frac{3}{2\pi}|f*g|_2 \le \frac{1}{\sqrt{2\pi}}\cdot\frac{3}{2\pi}\cdot 27.8622803\sqrt{\frac{3\pi^{3/2}}{4}} \le 10.845789. \tag{A.32}
\]
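Let us now bound $|\eta_+'(t)t^\sigma|_1$ for $\sigma \in (-1,\infty)$. First of all,
\[
\begin{aligned}
|\eta_+'(t)t^\sigma|_1 &= \left|\left(h_H(t)te^{-t^2/2}\right)'t^\sigma\right|_1\\
&\le \left|\left(h_H'(t)te^{-t^2/2} + h_H(t)(1-t^2)e^{-t^2/2}\right)\cdot t^\sigma\right|_1\\
&\le \left|h_H'(t)t^{\sigma+1}e^{-t^2/2}\right|_1 + |\eta_+(t)t^{\sigma-1}|_1 + |\eta_+(t)t^{\sigma+1}|_1.
\end{aligned}
\]
We can bound the last two terms by (A.30). Much as in (A.29), we note that
\[
\left|h_H'(t)t^{\sigma+1}e^{-t^2/2}\right|_1 \le \left|t^{\sigma+1/2}e^{-t^2/2}\right|_2\,|h_H'(t)\sqrt{t}|_2,
\]
we conclude that
\[
\begin{aligned}
|\eta_+'(t)t^\sigma|_1 &\le 1.062319\cdot\left(\sqrt{\Gamma(\sigma+1)} + \sqrt{\Gamma(\sigma+3)}\right) + \sqrt{\frac{\Gamma(\sigma+1)}{2}}\cdot 2.6312226\\
&\le 2.922875\sqrt{\Gamma(\sigma+1)} + 1.062319\sqrt{\Gamma(\sigma+3)}
\end{aligned}
\tag{A.33}
\]
for $\sigma > -1$.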
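Now
\[
\left|\frac{d}{dw}h\left(\frac{t}{e^{w/H}}\right)\right| = \frac{te^{-w/H}}{H}\left|h'\left(\frac{t}{e^{w/H}}\right)\right| \le \frac{t|h'|_\infty}{He^{w/H}}.
\]
Integration by parts easily yields the bounds $|\mathrm{Si}(x) - \pi/2| < 2/x$ for $x > 0$ and $|\mathrm{Si}(x) + \pi/2| < 2/|x|$ for $x < 0$; we also know that $0 \le \mathrm{Si}(x) \le x < \pi/2$ for $x \in [0,1]$ and $-\pi/2 < x \le \mathrm{Si}(x) \le 0$ for $x \in [-1,0]$. Hence
\[
\begin{aligned}
|h_H(t) - h(t)| &\le \frac{2t|h'|_\infty}{\pi H}\left(\int_0^1 \frac{\pi}{2}e^{-w/H}\,dw + \int_1^\infty \frac{2e^{-w/H}}{w}\,dw\right)\\
&= t|h'|_\infty\cdot\left((1 - e^{-1/H}) + \frac{4}{\pi}\frac{E_1(1/H)}{H}\right),
\end{aligned}
\]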
where $E_1$ is the exponential integral
\[
E_1(z) = \int_z^\infty \frac{e^{-t}}{t}\,dt.
\]
A.5. THE `∞ -NORM OF η+ 289
By [AS64, (5.1.20)],
\[
0 < E_1(1/H) < \frac{\log(H+1)}{e^{1/H}},
\]
and, since $\log(H+1) = \log H + \log(1+1/H) < \log H + 1/H < (\log H)(1+1/H) < (\log H)e^{1/H}$ for $H \ge e$, we see that this gives us that $E_1(1/H) < \log H$ (again for $H \ge e$, as is the case). Hence
\[
\frac{|h_H(t) - h(t)|}{t} < |h'|_\infty\cdot\left(1 - e^{-\frac{1}{H}} + \frac{4}{\pi}\frac{\log H}{H}\right) < |h'|_\infty\cdot\frac{1 + \frac{4}{\pi}\log H}{H}, \tag{A.36}
\]
and so, by (A.34),
\[
|\eta_+|_\infty < 1 + \frac{2}{e}\cdot 2.80582038\cdot\frac{1 + \frac{4}{\pi}\log H}{H} < 1 + 2.06440727\cdot\frac{1 + \frac{4}{\pi}\log H}{H}. \tag{A.38}
\]
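We will need three other bounds of this kind, namely, for $\eta_+(t)\log t$, $\eta_+(t)/t$ and $\eta_+(t)t$. We start as in (A.34):
\[
\begin{aligned}
|\eta_+\log t|_\infty &\le |\eta_\circ\log t|_\infty + |(h(t) - h_H(t))\eta_\diamondsuit(t)\log t|_\infty\\
&\le |\eta_\circ\log t|_\infty + |(h - h_H(t))/t|_\infty\,|\eta_\diamondsuit(t)t\log t|_\infty,\\
|\eta_+(t)/t|_\infty &\le |\eta_\circ(t)/t|_\infty + |(h - h_H(t))/t|_\infty\,|\eta_\diamondsuit(t)|_\infty,\\
|\eta_+(t)t|_\infty &\le |\eta_\circ(t)t|_\infty + |(h - h_H(t))/t|_\infty\,|\eta_\diamondsuit(t)t^2|_\infty.
\end{aligned}
\tag{A.39}
\]
\[
|\eta_\circ(t)\log t|_\infty \le 0.279491,\qquad |\eta_\diamondsuit(t)t\log t|_\infty \le 0.3811561.
\]
Taking derivatives, we see that $|\eta_\diamondsuit(t)t^2|_\infty = 3^{3/2}e^{-3/2}$. Hence, yet again by (A.36) and (A.37),
\[
|\eta_+(t)t|_\infty \le 1.06473476 + 3.25312\cdot\frac{1 + \frac{4}{\pi}\log H}{H}. \tag{A.42}
\]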
Appendix B
(This formula, in all likelihood well-known, is easy to derive. First, we can assume without loss of generality that $a = 0$, $b = 1$ and $g(a) = g(b) = 0$. Dividing $g$ by $g(t)$, we see that we can also assume that $g(t)$ is real (and in fact 1). We can also assume that $g$ is real-valued, in that it will be enough to prove (B.3) for the real-valued function $\Re g$, as this will give us the bound $g(t) = \Re g(t) \le (1/8)\max_v |(\Re g)''(v)| \le (1/8)\max_v |g''(v)|$ that we wish for. Lastly, we can assume (by symmetry) that $0 \le t \le 1/2$, and that $g$ has a local maximum or minimum at $t$. Writing $M = \max_{u\in[0,1]} |g''(u)|$, we then have:
\[
\begin{aligned}
g(t) &= \int_0^t g'(v)\,dv = \int_0^t\int_t^v g''(u)\,du\,dv = O^*\left(\int_0^t\left|\int_t^v M\,du\right|dv\right)\\
&= O^*\left(\int_0^t (v-t)M\,dv\right) = O^*\left(\frac{1}{2}t^2M\right) = O^*\left(\frac{1}{8}M\right),
\end{aligned}
\]
292 APPENDIX B. NORMS OF FOURIER TRANSFORMS
as desired.)
We obtain immediately from (B.3) that
\[
\max_{t\in[a,b]} |g(t)| \le \max(|g(a)|,|g(b)|) + \frac{1}{8}(b-a)^2\cdot\max_{v\in[a,b]} |g''(v)|. \tag{B.4}
\]
For any $v \in \mathbb{R}$,
\[
|g''(v)| \le \left(\frac{\pi}{2}\right)^2\cdot 4 + \pi^2\cdot 4 + (2\pi)^2 = 9\pi^2. \tag{B.5}
\]
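The sampling strategy behind (B.4) and (B.5) can be sketched as follows: evaluate $|g|$ on a fine grid and add the interpolation error $\frac18\delta^2\cdot 9\pi^2$. The code assumes $g(t) = 4e(-t/4) - 4e(-t/2) + e(-t)$ with $e(x) = e^{2\pi ix}$ (the form of $g$ quoted in Lemma B.2.2), samples one period $[0,4]$ of that function, and works in double precision rather than interval arithmetic; the names are hypothetical.

```python
import cmath
import math


def e(x):
    """e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)


def g(t):
    return 4 * e(-t / 4) - 4 * e(-t / 2) + e(-t)


def sup_with_error(step=1e-4, period=4.0):
    """Return (sampled maximum of |g|, sampled maximum + (B.4) error term)."""
    n = round(period / step)
    sampled = max(abs(g(i * step)) for i in range(n + 1))
    error = step ** 2 / 8 * 9 * math.pi ** 2  # (1/8) delta^2 max|g''|, by (B.4)-(B.5)
    return sampled, sampled + error


SAMPLED, UPPER = sup_with_error()
print(SAMPLED, UPPER)
```

With step $10^{-4}$ the error term is about $1.1\cdot 10^{-7}$, small enough to recover the bound $|g(t)| \le 7.87052$ used later.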
Clearly $g(t)$ depends only on $t \bmod 4\pi$. Hence, by (B.4) and (B.5), to estimate
\[
\max_{t\in\mathbb{R}} |g(t)|
\]
\[
|\widehat{\eta_2''}|_\infty \le 31.521. \tag{B.6}
\]
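where $\delta_{x_0}$ is the point measure at $x_0$ of mass 1 (Dirac delta function) and
\[
f(x) = \begin{cases} 0 & \text{if } x < 1/4 \text{ or } x \ge 1,\\ -4x^{-2} & \text{if } 1/4 \le x < 1/2,\\ 4x^{-2} & \text{if } 1/2 \le x < 1. \end{cases}
\]
Thus $\widehat{\eta_2''}(t) = 4g(t) + \widehat{f}(t)$, where $g$ is as in (B.2). It is easy to see that $|f'|_1 = 2\max_x f(x) - 2\min_x f(x) = 160$. Therefore,
\[
\left|\widehat{f}(t)\right| = \left|\widehat{f'}(t)/(2\pi it)\right| \le \frac{|f'|_1}{2\pi|t|} = \frac{80}{\pi|t|}. \tag{B.8}
\]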
Since $31.521 - 4\cdot 7.87052 = 0.03892$, we conclude that (B.6) follows from Lemma B.1.1 and (B.8) for $|t| \ge 655 > 80/(\pi\cdot 0.03892)$.
It remains to check the range $t \in (-655, 655)$; since $4g(-t) + \widehat{f}(-t)$ is the complex conjugate of $4g(t) + \widehat{f}(t)$, it suffices to consider $t$ non-negative. We use (B.4) (with $4g + \widehat{f}$ instead of $g$) and obtain that, to estimate $\max_{t\in\mathbb{R}} |4g + \widehat{f}(t)|$ with an error of at most $\epsilon$, it is enough to subdivide $[0,655)$ into intervals of length $\le \sqrt{2\epsilon/|(4g+\widehat{f})''|_\infty}$ each and check $|4g + \widehat{f}(t)|$ at the endpoints. Now, for every $t \in \mathbb{R}$,
\[
\left|\widehat{f}''(t)\right| = \left|(-2\pi i)^2\,\widehat{x^2f}(t)\right| = (2\pi)^2\cdot O^*\left(|x^2f|_1\right) = 12\pi^2.
\]
B.2. BOUNDS INVOLVING A LOGARITHMIC FACTOR 293
By this and (B.5), $|(4g + \widehat{f})''|_\infty \le 48\pi^2$. Thus, intervals of length $\delta_1$ give an error term of size at most $24\pi^2\delta_1^2$. We choose $\delta_1 = 0.001$ and obtain an error term less than 0.000237 for this stage.
To evaluate $\widehat{f}(t)$ (and hence $4g(t) + \widehat{f}(t)$) at a point, we integrate using Simpson's rule on subdivisions of the intervals $[1/4,1/2]$, $[1/2,1]$ into $200\cdot\max(1,\lfloor\sqrt{|t|}\rfloor)$ sub-intervals each.¹ The largest value of $\widehat{f}(t)$ we find is $31.52065\ldots$, with an error term of at most $4.5\cdot 10^{-5}$.
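The evaluation procedure just described can be sketched in a few lines. The code below is illustrative only: it uses double precision instead of interval arithmetic, assumes the convention $\widehat{f}(t) = \int f(x)e(-xt)\,dx$ with $e(x) = e^{2\pi ix}$, takes $g(t) = 4e(-t/4) - 4e(-t/2) + e(-t)$ as in Lemma B.2.2, and samples only a short range of $t$ on a coarse grid. All names are hypothetical.

```python
import cmath
import math


def e(x):
    return cmath.exp(2j * math.pi * x)


def g(t):
    return 4 * e(-t / 4) - 4 * e(-t / 2) + e(-t)


def simpson(func, a, b, n):
    """Composite Simpson's rule with n (even) sub-intervals; func may be complex-valued."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3


def f_hat(t):
    """Fourier transform of f, integrating each smooth piece of f separately."""
    n = 200 * max(1, math.isqrt(int(abs(t))))  # 200 * max(1, floor(sqrt|t|))
    left = simpson(lambda x: -4 / x ** 2 * e(-x * t), 0.25, 0.5, n)
    right = simpson(lambda x: 4 / x ** 2 * e(-x * t), 0.5, 1.0, n)
    return left + right


def sampled_sup(t_max=20.0, step=0.05):
    """Largest sampled value of |4 g(t) + f_hat(t)| for t in [0, t_max]."""
    count = round(t_max / step)
    return max(abs(4 * g(k * step) + f_hat(k * step)) for k in range(count + 1))


print(f_hat(0.0), sampled_sup())
```

At $t = 0$ the transform is just $\int f = -4$, which cancels $4g(0) = 4$, as it must for the transform of a second derivative.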
Lemma B.2.2. Let $y \ge 4$. Let $g(t) = 4e(-t/4) - 4e(-t/2) + e(-t)$ and $k(t) = 2e(-t/4) - e(-t/2)$. Then, for every $t \in \mathbb{R}$,
Proof. By Lemma B.1.1, $|g(t)| \le 7.87052$. Since $y \ge 4$, $|k(t)\cdot(4\log 2)/\log y| \le 6$. For any complex numbers $z_1$, $z_2$ with $|z_1|, |z_2| \le \ell$, we can have $|z_1 - z_2| > \ell$ only if $|\arg(z_1/z_2)| > \pi/3$. It is easy to check that, for all $t \in [-2,2]$,
\[
\left|\arg\left(\frac{g(t)\cdot\log y}{4\log 2\cdot k(t)}\right)\right| = \left|\arg\left(\frac{g(t)}{k(t)}\right)\right| < 0.7 < \frac{\pi}{3}.
\]
(It is possible to bound maxima rigorously as in (B.4).) Hence (B.10) holds.
1 As usual, the code uses interval arithmetic (§2.6).
Lemma B.2.3. Let $\eta_2 : \mathbb{R}^+ \to \mathbb{R}$ be as in (3.4). Let $\eta_{(y)}(t) = (\log yt)\eta_2(t)$, where $y \ge 4$. Then
\[
|\widehat{\eta_{(y)}''}|_\infty < 31.521\cdot\log y. \tag{B.11}
\]
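Proof. Clearly
\[
\begin{aligned}
\eta_{(y)}''(x) &= \eta_2''(x)(\log y) + (\log x)\eta_2''(x) + \frac{2}{x}\eta_2'(x) - \frac{1}{x^2}\eta_2(x)\\
&= \eta_2''(x)(\log y) + 4(\log x)(4\delta_{1/4}(x) - 4\delta_{1/2}(x) + \delta_1(x)) + h(x),
\end{aligned}
\]
where
\[
h(x) = \begin{cases} 0 & \text{if } x < 1/4 \text{ or } x > 1,\\ \frac{4}{x^2}(2 - 2\log 2x) & \text{if } 1/4 \le x < 1/2,\\ \frac{4}{x^2}(-2 + 2\log x) & \text{if } 1/2 \le x < 1. \end{cases}
\]
(Here we are using the expression (B.7) for $\eta_2''(x)$.) Hence
\[
\widehat{\eta_{(y)}''}(t) = (4g(t) + \widehat{f}(t))(\log y) + (-16\log 2\cdot k(t) + \widehat{h}(t)), \tag{B.12}
\]
\[
|\widehat{f}(t)| \le \frac{|f'|_1}{2\pi|t|} \le \frac{80}{\pi|t|},\qquad |\widehat{h}(t)| \le \frac{160(1+\log 2)}{\pi|t|}. \tag{B.13}
\]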
Again as before, this implies that (B.11) holds for
\[
|t| \ge \frac{1}{\pi\cdot 0.03892}\left(80 + \frac{160(1+\log 2)}{\log 4}\right) = 2252.51.
\]
Note also that it is enough to check (B.11) for $t \ge 0$, by symmetry. Our remaining task is to prove (B.11) for $0 \le t \le 2252.51$.
Let $I = [0.3, 2252.51] \setminus [3.25, 3.65]$. For $t \in I$, we will have
\[
\arg\left(\frac{4g(t) + \widehat{f}(t)}{-16\log 2\cdot k(t) + \widehat{h}(t)}\right) \subset \left(-\frac{\pi}{3},\frac{\pi}{3}\right). \tag{B.14}
\]
(This is actually true for $0 \le t \le 0.3$ as well, but we will use a different strategy in that range in order to better control error terms.) Consequently, by Lemma B.1.2 and $\log y \ge \log 4$,
\[
|\widehat{\eta_{(y)}''}(t)| < \max\left(|4g(t) + \widehat{f}(t)|\cdot(\log y),\ |16\log 2\cdot k(t) - \widehat{h}(t)|\right)
\]
$(-1,1)$ for $t \in I$). We decide to evaluate the argument in (B.14) at all $t \in 0.005\mathbb{Z} \cap I$, computing $\widehat{f}(t)$ and $\widehat{h}(t)$ by numerical integration (Simpson's rule) with a subdivision of $[-1/4, 1]$ into 5000 intervals. Proceeding as in the proof of Lemma B.1.1, we see that the sampling induces an error of at most
\[
\frac{1}{2}\,0.005^2\max_{v\in I}\left(4|g''(v)| + |(\widehat{f})''(t)|\right) \le \frac{0.0001}{8}\,48\pi^2 < 0.00593 \tag{B.15}
\]
and so $\max_{t\in[0,0.3]\cup[3.25,3.65]} |\widehat{\eta_{(y)}''}(t)| < 29.1\log y < 31.521\log y$.
The following function will appear only in a lower-order term; thus, an $\ell_1$ estimate will do.
Lemma B.2.4. Let η2 : R+ → R be as in (3.4). Then
298 APPENDIX C. SUMS INVOLVING Λ AND φ
The assumption that all non-trivial zeros up to $T_0 = 3.061\cdot 10^{10}$ lie on the critical line was proven rigorously in [Plaa]; higher values of $T_0$ have been reached elsewhere ([Wed03], [GD04]).
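Proof. By Lemma C.1.1,
\[
\sum_n \Lambda(n)\eta_2\left(\frac{n}{x}\right) = \int_1^\infty \eta_2\left(\frac{t}{x}\right)dt - \int_1^\infty \frac{\eta_2(t/x)}{t(t^2-1)}\,dt - \sum_\rho (M\varphi)(\rho),
\]
where $\varphi(u) = \eta_2(u/x)$ and $\rho$ runs over all non-trivial zeros of $\zeta(s)$. Since $\eta_2$ is non-negative, $\int_1^\infty \eta_2(t/x)\,dt = x|\eta_2|_1 = x$, while
\[
\int_1^\infty \frac{\eta_2(t/x)}{t(t^2-1)}\,dt = O^*\left(\int_{1/4}^{1} \frac{\eta_2(t)}{tx^2(t^2 - 1/100)}\,dt\right) = O^*\left(\frac{9.61114}{x^2}\right).
\]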
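The constant 9.61114 can be approximated by direct quadrature. The sketch below assumes that $\eta_2$ from (3.4), which is not restated here, has the form $\eta_2(t) = 4\max(\log 2 - |\log 2t|, 0)$; this is consistent with $|\eta_2|_1 = 1$ and with the second derivative used in Appendix B, but it is an assumption of the example. It uses composite Simpson's rule in double precision on each smooth piece; the names are hypothetical.

```python
import math


def eta2(t):
    """Assumed form of eta_2: 4 max(log 2 - |log 2t|, 0), supported on [1/4, 1]."""
    if t <= 0.25 or t >= 1.0:
        return 0.0
    return 4 * (math.log(2) - abs(math.log(2 * t)))


def simpson(func, a, b, n):
    """Composite Simpson's rule with n (even) sub-intervals."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3


def integrand(t):
    return eta2(t) / (t * (t * t - 0.01))


# eta_2 has a kink at t = 1/2, so integrate the two smooth pieces separately.
VALUE = simpson(integrand, 0.25, 0.5, 2000) + simpson(integrand, 0.5, 1.0, 2000)
print(VALUE)
```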
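By (2.11),
\[
\sum_\rho (M\varphi)(\rho) = \sum_\rho M\eta_2(\rho)\cdot x^\rho = \sum_\rho \left(\frac{1 - 2^{-\rho}}{\rho}\right)^2 x^\rho = S_1(x) - 2S_1(x/2) + S_1(x/4),
\]
where
\[
S_m(x) = \sum_\rho \frac{x^\rho}{\rho^{m+1}}. \tag{C.3}
\]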
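Setting aside the contribution of all $\rho$ with $|\Im(\rho)| \le T_0$ and all $\rho$ with $|\Im(\rho)| > T_0$ and $\Re(\rho) \le 1/2$, and using the symmetry provided by the functional equation, we obtain
\[
\begin{aligned}
|S_m(x)| &\le x^{1/2}\cdot\sum_\rho \frac{1}{|\rho|^{m+1}} + x\cdot\sum_{\substack{\rho\\ |\Im(\rho)| > T_0\\ |\Re(\rho)| > 1/2}} \frac{1}{|\rho|^{m+1}}\\
&\le x^{1/2}\cdot\sum_\rho \frac{1}{|\rho|^{m+1}} + \frac{x}{2}\cdot\sum_{\substack{\rho\\ |\Im(\rho)| > T_0}} \frac{1}{|\rho|^{m+1}}.
\end{aligned}
\]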
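We bound the first sum by [Ros41, Lemma 17] and the second sum by [RS03, Lemma 2]. We obtain
\[
|S_m(x)| \le \left(\frac{1}{2m\pi T_0^m} + \frac{2.68}{T_0^{m+1}}\right)x\log\frac{eT_0}{2\pi} + \kappa_m x^{1/2}, \tag{C.4}
\]
\[
\left|\sum_\rho (M\eta)(\rho)\cdot x^\rho\right| \le \left(\frac{1}{2\pi T_0} + \frac{2.68}{T_0^2}\right)\frac{9x}{4}\log\frac{eT_0}{2\pi} + \left(\frac{3}{2} + \sqrt{2}\right)\kappa_1x^{1/2}.
\]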
C.2. SUMS INVOLVING φ 299
Corollary C.1.3. Let $\eta_2$ be as in (11.7). Assume that all non-trivial zeros of $\zeta(s)$ with $|\Im(s)| \le T_0$, $T_0 = 3.061\cdot 10^{10}$, lie on the critical line. Then, for all $x \ge 1$,
\[
\sum_n \Lambda(n)\eta_2\left(\frac{n}{x}\right) \le \min\left((1+\epsilon)x + 0.2x^{1/2},\ 1.04488x\right), \tag{C.5}
\]
Proof. Immediate from Lemma C.1.2 for $x \ge 2000$. For $x < 2000$, we use computation as follows. Since $|\eta_2'|_\infty = 16$ and $\sum_{x/4\le n\le x}\Lambda(n) \le x$ for all $x \ge 0$, computing $\sum_{n\le x}\Lambda(n)\eta_2(n/x)$ only for $x \in (1/1000)\mathbb{Z} \cap [0,2000]$ results in an inaccuracy of at most $(16\cdot 0.0005/0.9995)x \le 0.00801x$. This resolves the matter at all points outside $(205,207)$ (for the first estimate) or outside $(9.5,10.5)$ and $(13.5,14.5)$ (for the second estimate). In those intervals, the prime powers $n$ involved do not change (since whether $x/4 < n \le x$ depends only on $n$ and $[x]$), and thus we can find the maximum of the sum in (C.5) just by taking derivatives.
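The computational part of this proof can be imitated on a small scale. The sketch below evaluates $\frac1x\sum_n\Lambda(n)\eta_2(n/x)$ on a coarse grid of $x \le 500$ and reports where the ratio is largest. It assumes $\eta_2(t) = 4\max(\log 2 - |\log 2t|, 0)$ (the definition itself is not restated in this appendix), uses double precision, and samples far fewer points than the proof does; all names are hypothetical.

```python
import math


def mangoldt_table(n):
    """lam[k] = Lambda(k) for 0 <= k <= n (log p on prime powers p^j, else 0)."""
    lam = [0.0] * (n + 1)
    composite = bytearray(n + 1)
    for p in range(2, n + 1):
        if not composite[p]:
            for m in range(2 * p, n + 1, p):
                composite[m] = 1
            power = p
            while power <= n:
                lam[power] = math.log(p)
                power *= p
    return lam


def eta2(t):
    """Assumed form of eta_2: 4 max(log 2 - |log 2t|, 0)."""
    if t <= 0.25 or t >= 1.0:
        return 0.0
    return 4 * (math.log(2) - abs(math.log(2 * t)))


def smoothed_ratio(x, lam):
    """(1/x) * sum_{n <= x} Lambda(n) eta_2(n/x)."""
    return sum(lam[n] * eta2(n / x) for n in range(1, int(x) + 1)) / x


LAM = mangoldt_table(500)
GRID = [k / 4 for k in range(4, 2001)]  # x = 1, 1.25, ..., 500
WORST_X = max(GRID, key=lambda x: smoothed_ratio(x, LAM))
print(WORST_X, smoothed_ratio(WORST_X, LAM))
```

On this grid the largest ratio occurs at $x = 14$, inside one of the two exceptional intervals named in the proof.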
and so
\[
1.295730 \le \sum_{q \text{ odd}} \frac{\mu^2(q)}{\phi(q)q} < 1.295732, \tag{C.8}
\]
since the expression bounded in (C.8) is exactly half of that bounded in (C.7).
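The sum in (C.8) has the Euler product $\prod_{p>2}\left(1 + \frac{1}{p(p-1)}\right)$, so the "easy tail bound followed by a computation" approach can be sketched as follows. The code is a double-precision illustration, not the rigorous computation of the text; the cutoff $P = 4\cdot 10^6$ and the crude tail bound $\sum_{n>P}\frac{1}{n(n-1)} = \frac1P$ are choices made for this example, and the names are hypothetical.

```python
import math


def odd_primes_upto(n):
    """Odd primes <= n by a byte-array sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(3, n + 1, 2) if sieve[p]]


def bracket(cutoff=4_000_000):
    """Lower and upper bounds for prod_{p > 2} (1 + 1/(p(p-1)))."""
    log_partial = sum(math.log1p(1 / (p * (p - 1))) for p in odd_primes_upto(cutoff))
    lower = math.exp(log_partial)
    # Tail: sum_{p > P} log(1 + 1/(p(p-1))) <= sum_{n > P} 1/(n(n-1)) = 1/P.
    upper = lower * math.exp(1 / cutoff)
    return lower, upper


LOWER, UPPER = bracket()
print(LOWER, UPPER)
```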
1 Using D. Platt’s integer arithmetic package.
In what follows, we will use values for convergent sums obtained in much the same
way – an easy tail bound followed by a computation.
By [Ram95, Lemma 3.4],
\[
\sum_{q\le r} \frac{\mu^2(q)}{\phi(q)} = \log r + c_E + O^*(7.284r^{-1/3}),
\]
\[
\sum_{\substack{q\le r\\ q \text{ odd}}} \frac{\mu^2(q)}{\phi(q)} = \frac{1}{2}\left(\log r + c_E + \frac{\log 2}{2}\right) + O^*(4.899r^{-1/3}), \tag{C.10}
\]
where
\[
c_E = \gamma + \sum_p \frac{\log p}{p(p-1)} = 1.332582275 + O^*(10^{-9}/3)
\]
by [RS62, (2.11)]. As we already said in (12.15), this, supplemented by a computation for $r \le 4\cdot 10^7$, gives
\[
\log r + 1.312 \le \sum_{q\le r} \frac{\mu^2(q)}{\phi(q)} \le \log r + 1.354
\]
for $r \ge 195$. (The numerical verification here goes up to $1.38\cdot 10^8$; for $r > 3.18\cdot 10^8$, use C.11.)
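A small-scale version of the supplementary computation can be written with a totient sieve. The sketch below tabulates $\sum_{q\le r}\mu^2(q)/\phi(q) - \log r$ at integer $r \le 10^5$ only, in double precision, so it illustrates the shape of the check rather than reproducing the verification up to $1.38\cdot 10^8$; the names are hypothetical.

```python
import math

C_E = 1.332582275  # constant c_E quoted above


def totients_and_squarefree(n):
    """phi[k] and squarefree flags for 0 <= k <= n, by a sieve over primes."""
    phi = list(range(n + 1))
    squarefree = [True] * (n + 1)
    for p in range(2, n + 1):
        if phi[p] == p:  # p is prime: no smaller prime has touched it
            for m in range(p, n + 1, p):
                phi[m] -= phi[m] // p
            for m in range(p * p, n + 1, p * p):
                squarefree[m] = False
    return phi, squarefree


def deviations(n):
    """dev[r] = sum_{q <= r} mu^2(q)/phi(q) - log r for 1 <= r <= n (dev[0] unused)."""
    phi, squarefree = totients_and_squarefree(n)
    dev = [0.0] * (n + 1)
    running = 0.0
    for q in range(1, n + 1):
        if squarefree[q]:
            running += 1 / phi[q]
        dev[q] = running - math.log(q)
    return dev


DEV = deviations(100_000)
print(min(DEV[195:]), max(DEV[195:]))
```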
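Clearly
\[
\sum_{\substack{q\le 2r\\ q \text{ even}}} \frac{\mu^2(q)}{\phi(q)} = \sum_{\substack{q\le r\\ q \text{ odd}}} \frac{\mu^2(q)}{\phi(q)}. \tag{C.12}
\]
We wish to obtain bounds for the sums
\[
\sum_{q\ge r} \frac{\mu^2(q)}{\phi(q)^2},\qquad \sum_{\substack{q\ge r\\ q \text{ odd}}} \frac{\mu^2(q)}{\phi(q)^2},\qquad \sum_{\substack{q\ge r\\ q \text{ even}}} \frac{\mu^2(q)}{\phi(q)^2},
\]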
\[
f_j(p) = \frac{p^j - (p-1)^j}{(p-1)^jp},\qquad f_j(p^k) = 0 \text{ for } k \ge 2.
\]
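Proof. The right side of (C.14) decreases as $A$ increases, while the left side depends only on $\lceil A\rceil$. Hence, it is enough to prove (C.14) when $A$ is an integer.
For $A = 1$, (C.14) is an equality. Let
\[
C = \frac{\zeta(j)}{\zeta(2j)}\cdot\prod_{p|m}\left(1 + \frac{1}{p^j}\right)^{-1}.
\]
Let $A \ge 2$. Since
\[
\sum_{\substack{a\ge A\\ (a,m)=1}} \frac{\mu^2(a)}{a^j} = C - \sum_{\substack{a<A\\ (a,m)=1}} \frac{\mu^2(a)}{a^j}
\]
and
\[
C = \sum_{\substack{a\\ (a,m)=1}} \frac{\mu^2(a)}{a^j} < \sum_{\substack{a<A\\ (a,m)=1}} \frac{\mu^2(a)}{a^j} + \frac{1}{A^j} + \int_A^\infty \frac{1}{t^j}\,dt
= \sum_{\substack{a<A\\ (a,m)=1}} \frac{\mu^2(a)}{a^j} + \frac{1}{A^j} + \frac{1}{(j-1)A^{j-1}},
\]
we obtain
\[
\begin{aligned}
\sum_{\substack{a\ge A\\ (a,m)=1}} \frac{\mu^2(a)}{a^j} &= \frac{1}{A^{j-1}}\cdot C + \frac{A^{j-1}-1}{A^{j-1}}\cdot C - \sum_{\substack{a<A\\ (a,m)=1}} \frac{\mu^2(a)}{a^j}\\
&< \frac{C}{A^{j-1}} + \frac{A^{j-1}-1}{A^{j-1}}\cdot\left(\frac{1}{A^j} + \frac{1}{(j-1)A^{j-1}}\right) - \frac{1}{A^{j-1}}\sum_{\substack{a<A\\ (a,m)=1}} \frac{\mu^2(a)}{a^j}\\
&\le \frac{C}{A^{j-1}} + \frac{1}{A^{j-1}}\left(\left(1 - \frac{1}{A^{j-1}}\right)\left(\frac{1}{A} + \frac{1}{j-1}\right) - 1\right).
\end{aligned}
\]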
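The final inequality of this chain is easy to test numerically for even $j$, where $\zeta(j)/\zeta(2j)$ has a closed form ($15/\pi^2$ for $j = 2$, $105/\pi^4$ for $j = 4$). The following sketch compares the tail $C - \sum_{a<A}$ with the last right-hand side for a few moduli $m$; it is a double-precision illustration with hypothetical names, not part of the proof.

```python
import math

ZETA_RATIO = {2: 15 / math.pi ** 2, 4: 105 / math.pi ** 4}  # zeta(j) / zeta(2j)


def prime_divisors(m):
    primes, p = [], 2
    while p * p <= m:
        if m % p == 0:
            primes.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        primes.append(m)
    return primes


def is_squarefree(a):
    p = 2
    while p * p <= a:
        if a % (p * p) == 0:
            return False
        p += 1
    return True


def tail_and_bound(j, m, A):
    """Return (sum over squarefree a >= A coprime to m of a^{-j}, last bound in the proof)."""
    C = ZETA_RATIO[j]
    for p in prime_divisors(m):
        C /= 1 + p ** (-j)
    head = sum(1 / a ** j for a in range(1, A)
               if math.gcd(a, m) == 1 and is_squarefree(a))
    scale = A ** (j - 1)
    bound = C / scale + ((1 - 1 / scale) * (1 / A + 1 / (j - 1)) - 1) / scale
    return C - head, bound


for j, m, A in ((2, 1, 2), (2, 2, 10), (4, 6, 50)):
    print(j, m, A, tail_and_bound(j, m, A))
```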
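Lastly,
\[
\begin{aligned}
\sum_{\substack{q\le r\\ q \text{ odd}}} \frac{\mu^2(q)q}{\phi(q)} &= \sum_{\substack{q\le r\\ q \text{ odd}}} \mu^2(q)\sum_{d|q}\frac{1}{\phi(d)} = \sum_{\substack{d\le r\\ d \text{ odd}}} \frac{1}{\phi(d)}\sum_{\substack{q\le r\\ d|q\\ q \text{ odd}}} \mu^2(q) \le \sum_{\substack{d\le r\\ d \text{ odd}}} \frac{1}{2\phi(d)}\left(\frac{r}{d} + 1\right)\\
&\le \frac{r}{2}\sum_{d \text{ odd}} \frac{1}{\phi(d)d} + \frac{1}{2}\sum_{\substack{d\le r\\ d \text{ odd}}} \frac{1}{\phi(d)} \le 0.64787r + \frac{\log r}{4} + 0.425,
\end{aligned}
\tag{C.18}
\]
where we are using (C.8) and (C.11).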
***
Since we are on the subject of φ(q), let us also prove a simple lemma that we use
at various points in the text to bound q/φ(q).
Lemma C.2.2. For any $q \ge 1$ and any $r \ge \max(3,q)$,
\[
\frac{q}{\phi(q)} < z(r),
\]
where
\[
z(r) = e^\gamma\log\log r + \frac{2.50637}{\log\log r}. \tag{C.19}
\]
Proof. Since $z(r)$ is increasing for $r \ge 27$, the statement follows immediately for $q \ge 27$ by [RS62, Thm. 15]:
\[
\frac{q}{\phi(q)} < z(q) \le z(r).
\]
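The inequality of Lemma C.2.2 can be spot-checked for small $q$ with a totient sieve. The sketch below takes $r = \max(3,q)$, the smallest admissible value, for every $q \le 10^5$ and reports the smallest margin $z(r) - q/\phi(q)$; it is a floating-point illustration with hypothetical names and does not replace the proof.

```python
import math

E_GAMMA = math.exp(0.5772156649015329)  # e^gamma


def z(r):
    """z(r) = e^gamma log log r + 2.50637 / log log r, as in (C.19)."""
    ll = math.log(math.log(r))
    return E_GAMMA * ll + 2.50637 / ll


def totients(n):
    """phi[k] for 0 <= k <= n by a sieve over primes."""
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:  # p is prime
            for m in range(p, n + 1, p):
                phi[m] -= phi[m] // p
    return phi


def smallest_margin(n):
    """min over 1 <= q <= n of z(max(3, q)) - q / phi(q), with the minimizing q."""
    phi = totients(n)
    return min((z(max(3, q)) - q / phi[q], q) for q in range(1, n + 1))


print(smallest_margin(100_000))
```

The margin is smallest at primorials such as $q = 30030$, which is where $q/\phi(q)$ is largest relative to $\log\log q$.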
In order to show that every odd number n ≤ N is the sum of three primes, it is enough
to show for some M ≤ N that
1. every even integer 4 ≤ m ≤ M can be written as the sum of two primes,
2. the difference between any two consecutive primes ≤ N is at most M − 4.
(If we want to show that every odd number n ≤ N is the sum of three odd primes,
we just replace M − 4 by M − 6 in (2).) The best known result of type (1) is that
of Oliveira e Silva, Herzog and Pardi ([OeSHP14], $M = 4\cdot 10^{18}$). As for (2), it was
proven in [HP13] for $M = 4\cdot 10^{18}$ and $N = 8.875694\cdot 10^{30}$ by a direct computation
(valid even if we replace M − 4 by M − 6 in the statement of (2)).
Alternatively, one can establish results of type (2) by means of numerical verifica-
tions of the Riemann hypothesis up to a certain height. This is a classical approach,
followed in [RS75] and [Sch76], and later in [RS03]; we will use the version of (1)
kindly provided by Ramaré in [Ramd]. We carry out this approach in full here, not
because it is preferable to [HP13] – it is still based on computations, and it is slightly
more indirect than [HP13] – but simply to show that one can establish what we need
by a different route.
A numerical verification of the Riemann hypothesis up to a certain height consists
simply in checking that all (non-trivial) zeroes z of the Riemann zeta function up to a
height $H$ (meaning: $\Im(z) \le H$) lie on the critical line $\Re(z) = 1/2$.
The height up to which the Riemann hypothesis has actually been fully verified is
not a matter on which there is unanimity. The strongest claim in the literature is in
[GD04], which states that the first $10^{13}$ zeroes of the Riemann zeta function lie on the
critical line $\Re(z) = 1/2$. This corresponds to checking the Riemann hypothesis up to
height $H = 2.44599\cdot 10^{12}$. It is unclear whether this computation was or could be
easily made rigorous; as pointed out in [SD10, p. 2398], it has not been replicated yet.
Before [GD04], the strongest results were those of the ZetaGrid distributed com-
puting project led by S. Wedeniwski [Wed03]; the method followed in it was more
306 APPENDIX D. CHECKING SMALL N BY CHECKING ZEROS OF ζ(S)
traditional, and should allow rigorous verification involving interval arithmetic. Unfor-
tunately, the results were never formally published. The statement that the ZetaGrid
project verified the first $9\cdot 10^{11}$ zeroes (corresponding to $H = 2.419\cdot 10^{11}$) is often
quoted (e.g., [Bom10, p. 29]); this is the point to which the project had got by the
time of Gourdon and Demichel's announcement. Wedeniwski asserts in private communication
that the project verified the first $10^{12}$ zeroes, and that the computation was
double-checked (by the same method).
The strongest claim prior to ZetaGrid was that of van de Lune ($H = 3.293\cdot 10^{9}$,
first $10^{10}$ zeroes; unpublished). Recently, Platt [Plaa] checked the first $1.1\cdot 10^{11}$ zeroes
($H = 3.061\cdot 10^{10}$) rigorously, following a method essentially based on that
in [Boo06a]. Note that [Plaa] uses interval arithmetic, which is highly desirable for
floating-point computations.
Proposition D.0.3. Every odd integer $5 \le n \le n_0$ is the sum of three primes, where
\[
n_0 = \begin{cases} 5.90698\cdot 10^{29} & \text{if [GD04] is used } (H = 2.44\cdot 10^{12}),\\ 6.15697\cdot 10^{28} & \text{if ZetaGrid results are used } (H = 2.419\cdot 10^{11}),\\ 1.23163\cdot 10^{27} & \text{if [Plaa] is used } (H = 3.061\cdot 10^{10}). \end{cases}
\]
Proof. For $n \le 4\cdot 10^{18} + 3$, this is immediate from [OeSHP14]. Let $4\cdot 10^{18} + 3 < n \le n_0$. We need to show that there is a prime $p$ in $[n - 4 - (n-4)/\Delta,\ n - 4]$, where $\Delta$ is large enough for $(n-4)/\Delta \le 4\cdot 10^{18} - 4$ to hold. We will then have that $4 \le n - p \le 4 + (n-4)/\Delta \le 4\cdot 10^{18}$. Since $n - p$ is even, [OeSHP14] will then imply that $n - p$ is the sum of two primes $p'$, $p''$, and so
\[
n = p + p' + p''.
\]
This gives us $(n-4)/\Delta \le 4\cdot 10^{18} - 4$ for $n - 4 < e^{r_0}$, where $r_0 = 67$ in case (a), $r_0 = 66$ in case (b) and $r_0 = 62$ in case (c).
If $n - 4 \ge e^{r_0}$, we can choose (again by [Ramd])
\[
\Delta = \begin{cases} 146869130682 & \text{in case (a)},\\ 15392435100 & \text{in case (b)},\\ 307908668 & \text{in case (c)}. \end{cases}
\]
This is enough for $n - 4 < e^{68}$ in case (a), and without further conditions for (b) or (c).
Finally, if $n - 4 \ge e^{68}$ and we are in case (a), [Ramd] assures us that the choice
\[
\Delta = 147674531294
\]
In other words, the rigorous results in [Plaa] are enough to show the result for all
odd $n \le 10^{27}$. Of course, [HP13] is also more than enough, and gives stronger results
than Prop. D.0.3.
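The values of $n_0$ in Prop. D.0.3 are determined by the condition $(n-4)/\Delta \le 4\cdot 10^{18} - 4$ with the largest admissible $\Delta$ in each case. The sketch below recomputes the largest such $n$ exactly with Python integers and compares it with the rounded values in the statement; the dictionary layout and names are hypothetical.

```python
M = 4 * 10 ** 18  # range covered by [OeSHP14]

# Largest Delta quoted for each case, with the n_0 stated in Prop. D.0.3.
CASES = {
    "GD04": (147674531294, 5.90698e29),
    "ZetaGrid": (15392435100, 6.15697e28),
    "Plaa": (307908668, 1.23163e27),
}


def largest_n(delta):
    """Largest n with (n - 4) / delta <= M - 4."""
    return delta * (M - 4) + 4


for name, (delta, stated) in CASES.items():
    print(name, largest_n(delta), stated)
```

In each case the stated $n_0$ is the exact bound rounded down to six significant digits.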
Bibliography
[But11] Y. Buttkewitz. Exponential sums over primes and the prime twin problem.
Acta Math. Hungar., 131(1-2):46–58, 2011.
[Che73] J. R. Chen. On the representation of a larger even integer as the sum of
a prime and the product of at most two primes. Sci. Sinica, 16:157–176,
1973.
[Che85] J. R. Chen. On the estimation of some trigonometrical sums and their
application. Sci. Sinica Ser. A, 28(5):449–458, 1985.
[Chu37] N.G. Chudakov. On the Goldbach problem. C. R. (Dokl.) Acad. Sci.
URSS, n. Ser., 17:335–338, 1937.
[Chu38] N.G. Chudakov. On the density of the set of even numbers which are
not representable as the sum of two odd primes. Izv. Akad. Nauk SSSR
Ser. Mat. 2, pages 25–40, 1938.
[Chu47] N. G. Chudakov. Introduction to the theory of Dirichlet L-functions.
OGIZ, Moscow-Leningrad, 1947. In Russian.
[CW89] J. R. Chen and T. Z. Wang. On the Goldbach problem. Acta Math.
Sinica, 32(5):702–718, 1989.
[CW96] J. R. Chen and T. Z. Wang. The Goldbach problem for odd numbers.
Acta Math. Sinica (Chin. Ser.), 39(2):169–174, 1996.
[Dab96] H. Daboussi. Effective estimates of exponential sums over primes.
In Analytic number theory, Vol. 1 (Allerton Park, IL, 1995), volume
138 of Progr. Math., pages 231–244. Birkhäuser Boston, Boston, MA,
1996.
[Dav67] H. Davenport. Multiplicative number theory. Markham Publishing
Co., Chicago, Ill., 1967. Lectures given at the University of Michigan,
Winter Term.
[dB81] N. G. de Bruijn. Asymptotic methods in analysis. Dover Publications
Inc., New York, third edition, 1981.
[Des08] R. Descartes. Œuvres de Descartes publiées par Charles Adam et
Paul Tannery sous les auspices du Ministère de l’Instruction publique.
Physico-mathematica. Compendium musicae. Regulae ad directionem
ingenii. Recherche de la vérité. Supplément à la correspondance. X.
Paris: Léopold Cerf. IV u. 691 S. 4◦ , 1908.
[Des77] J.-M. Deshouillers. Sur la constante de Šnirel'man. In Séminaire
Delange-Pisot-Poitou, 17e année: (1975/76), Théorie des nombres:
Fac. 2, Exp. No. G16, page 6. Secrétariat Math., Paris, 1977.
[DEtRZ97] J.-M. Deshouillers, G. Effinger, H. te Riele, and D. Zinoviev. A complete
Vinogradov 3-primes theorem under the Riemann hypothesis.
Electron. Res. Announc. Amer. Math. Soc., 3:99–104, 1997.
[Kad05] H. Kadiri. Une région explicite sans zéros pour la fonction ζ de Riemann.
Acta Arith., 117(4):303–339, 2005.
[LW02] M.-Ch. Liu and T. Wang. On the Vinogradov bound in the three primes
Goldbach conjecture. Acta Arith., 105(2):133–175, 2002.
[McC84a] K. S. McCurley. Explicit estimates for the error term in the prime number
theorem for arithmetic progressions. Math. Comp., 42(165):265–285, 1984.
[Ramc] O. Ramaré. A sharp bilinear form decomposition for primes and Moebius
function. Preprint. To appear in Acta. Math. Sinica.
[Ram95] O. Ramaré. On Šnirel'man's constant. Ann. Scuola Norm. Sup. Pisa
Cl. Sci. (4), 22(4):645–706, 1995.
[Ram13] O. Ramaré. From explicit estimates for primes to explicit estimates for
the Möbius function. Acta Arith., 157(4):365–379, 2013.
[Ros41] B. Rosser. Explicit bounds for some functions of prime numbers. Amer.
J. Math., 63:211–232, 1941.