arXiv:1501.05438v2 [math.NT] 27 Jan 2015

The ternary Goldbach problem

Harald Andrés Helfgott


Contents

Preface
Acknowledgements

1 Introduction
1.1 History and new developments
1.2 The circle method: Fourier analysis on Z
1.3 The major arcs M
1.3.1 What do we really know about L-functions and their zeros?
1.3.2 Estimates of $\hat{f}(\alpha)$ for α in the major arcs
1.4 The minor arcs m
1.4.1 Qualitative goals and main ideas
1.4.2 Combinatorial identities
1.4.3 Type I sums
1.4.4 Type II, or bilinear, sums
1.5 Integrals over the major and minor arcs
1.6 Some remarks on computations

2 Notation and preliminaries
2.1 General notation
2.2 Dirichlet characters and L functions
2.3 Fourier transforms and exponential sums
2.4 Mellin transforms
2.5 Bounds on sums of µ and Λ
2.6 Interval arithmetic and the bisection method

I Minor arcs

3 Introduction
3.1 Results
3.2 Comparison to earlier work
3.3 Basic setup
3.3.1 Vaughan’s identity
3.3.2 An alternative route

4 Type I sums
4.1 Trigonometric sums
4.2 Type I estimates
4.2.1 Type I: variations

5 Type II sums
5.1 The sum $S_1$: cancellation
5.1.1 Reduction to a sum with µ
5.1.2 Explicit bounds for a sum with µ
5.1.3 Estimating the triple sum
5.2 The sum $S_2$: the large sieve, primes and tails

6 Minor-arc totals
6.1 The smoothing function
6.2 Contributions of different types
6.2.1 Type I terms: $S_{I,1}$
6.2.2 Type I terms: $S_{I,2}$
6.2.3 Type II terms
6.3 Adjusting parameters. Calculations.
6.3.1 First choice of parameters: q ≤ y
6.3.2 Second choice of parameters
6.4 Conclusion

II Major arcs

7 Major arcs: overview and results
7.1 Results
7.2 Main ideas

8 The Mellin transform of the twisted Gaussian
8.1 How to choose a smoothing function?
8.2 The twisted Gaussian: overview and setup
8.2.1 Relation to the existing literature
8.2.2 General approach
8.3 The saddle point
8.3.1 The coordinates of the saddle point
8.3.2 The direction of steepest descent
8.4 The integral over the contour
8.4.1 A simple contour
8.4.2 Another simple contour
8.5 Conclusions

9 Explicit formulas
9.1 A general explicit formula
9.2 Sums and decay for the Gaussian
9.3 The case of $\eta_*(t)$
9.4 The case of $\eta_+(t)$
9.5 A sum for $\eta_+(t)^2$
9.6 A verification of zeros and its consequences

III The integral over the circle

10 The integral over the major arcs
10.1 Decomposition of $S_\eta$ by characters
10.2 The integral over the major arcs: the main term
10.3 The $\ell_2$ norm over the major arcs
10.4 The integral over the major arcs: conclusion

11 Optimizing and adapting smoothing functions
11.1 The symmetric smoothing function $\eta_\circ$
11.1.1 The product $\eta_\circ(t)\,\eta_\circ(\rho - t)$
11.2 The smoothing function $\eta_*$: adapting minor-arc bounds

12 The $\ell_2$ norm and the large sieve
12.1 Variations on the large sieve for primes
12.2 Bounding the quotient in the large sieve for primes

13 The integral over the minor arcs
13.1 Putting together $\ell_2$ bounds over arcs and $\ell_\infty$ bounds
13.2 The minor-arc total

14 Conclusion
14.1 The $\ell_2$ norm over the major arcs: explicit version
14.2 The total major-arc contribution
14.3 The minor-arc total: explicit version
14.4 Conclusion: proof of main theorem

IV Appendices

A Norms of smoothing functions
A.1 The decay of a Mellin transform
A.2 The difference $\eta_+ - \eta_\circ$ in $\ell_2$ norm
A.3 Norms involving $\eta_+$
A.4 Norms involving $\eta_+'$
A.5 The $\ell_\infty$-norm of $\eta_+$

B Norms of Fourier transforms
B.1 The Fourier transform of $\eta_2''$
B.2 Bounds involving a logarithmic factor

C Sums involving Λ and φ
C.1 Sums over primes
C.2 Sums involving φ

D Checking small n by checking zeros of ζ(s)


Preface

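ἐγγὺς δ’ ἦν τέλεος· ὃ δὲ τὸ τρίτον ἧκε χ[αμᾶζε·
σὺν τῶι δ’ ἐξέφυγεν θάνατον καὶ κῆ[ρα μέλαιναν
(“He was near the goal; and he threw the third to the ground; with it he escaped death and black doom.”)

Hesiod (?), Ehoiai, fr. 76.21–2 Merkelbach and West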

The ternary Goldbach conjecture (or three-prime conjecture) states that every odd
number n greater than 5 can be written as the sum of three primes. The purpose of this
book is to give the first full proof of this conjecture.
The proof builds on the great advances made in the early 20th century by Hardy and
Littlewood (1922) and Vinogradov (1937). Progress since then has been more gradual.
In some ways, it was necessary to clear the board and start work using only the main
existing ideas towards the problem, together with techniques developed elsewhere.
Part of the aim has been to keep the exposition as accessible as possible, with
an emphasis on qualitative improvements and new technical ideas that should be of
use elsewhere. The main strategy was to give an analytic approach that is efficient,
relatively clean, and, as it must be for this problem, explicit; the focus does not lie in
optimizing explicit constants, or in performing calculations, necessary as these tasks
are.
Organization. In the introduction, after a summary of the history of the problem,
we will go over a detailed outline of the proof. The rest of the book is divided into three
parts, structured so that they can be read independently: the first two parts do not refer
to each other, and the third part uses only the main results (clearly marked) of the first
two parts.
As is the case in most proofs involving the circle method, the problem is reduced to
showing that a certain integral over the “circle” R/Z is non-zero. The circle is divided
into major arcs and minor arcs. In Part I – in some ways the technical heart of the proof
– we will see how to give upper bounds on the integrand when α is in the minor arcs.
Part II will provide rather precise estimates for the integrand when the variable α is in
the major arcs. Lastly, Part III shows how to use these inputs as well as possible to
estimate the integral.
Each part and each chapter starts with a general discussion of the strategy and
the main ideas involved. Some of the more technical bounds and computations are
relegated to the appendices.


Dependencies between the chapters

[Figure: chart of dependencies between the chapters. Chapters shown: 1 Introduction; 2 Notation and preliminaries; 3 Minor arcs: introduction; 4 Type I sums; 5 Type II sums; 6 Minor-arc totals; 7 Major arcs: overview; 8 Mellin transform of twisted Gaussian; 9 Explicit formulas; 10 The integral over the major arcs; 11 Smoothing functions and their use; 12 The $\ell_2$ norm and the large sieve; 13 The integral over the minor arcs; 14 Conclusion.]

Acknowledgements

The author is very thankful to D. Platt, who, working in close coordination with him,
provided GRH verifications in the necessary ranges, and also helped him with the usage
of interval arithmetic. He is also deeply grateful to O. Ramaré, who, in reply to his
requests, prepared and sent for publication several auxiliary results, and who otherwise
provided much-needed feedback.
The author is also much indebted to A. Booker, B. Green, R. Heath-Brown, H.
Kadiri, D. Platt, T. Tao and M. Watkins for many discussions on Goldbach’s problem
and related issues. Several historical questions became clearer due to the help
of J. Brandes, K. Gong, R. Heath-Brown, Z. Silagadze, R. Vaughan and T. Wooley.
Additional references were graciously provided by R. Bryant, S. Huntsman and I.
Rezvyakova. Thanks are also due to B. Bukh, A. Granville and P. Sarnak for their
valuable advice.
The introduction is largely based on the author’s article for the Proceedings of the
2014 ICM [Hel14b]. That article, in turn, is based in part on the informal note [Hel13b],
which was published in Spanish translation ([Hel13a], translated by M. A. Morales and
the author, and revised with the help of J. Cilleruelo and M. Helfgott) and in a French
version ([Hel14a], translated by M. Bilu and revised by the author). The proof first
appeared as a series of preprints: [Helb], [Hela], [Helc].
Travel and other expenses were funded in part by the Adams Prize and the Philip
Leverhulme Prize. The author’s work on the problem started at the Université de
Montréal (CRM) in 2006; he is grateful to both the Université de Montréal and the
École Normale Supérieure for providing pleasant working environments. During the
last stages of the work, travel was partly covered by ANR Project Caesar No.
ANR-12-BS01-0011.
The present work would most likely not have been possible without free and publicly
available software: SAGE, PARI, Maxima, gnuplot, VNODE-LP, PROFIL/BIAS,
and, of course, LaTeX, Emacs, the gcc compiler and GNU/Linux in general. Some
exploratory work was done in SAGE and Mathematica. Rigorous calculations used either
D. Platt’s interval-arithmetic package (based in part on Crlibm) or the PROFIL/BIAS
interval arithmetic package underlying VNODE-LP.
The calculations contained in this paper used a nearly trivial amount of resources;
they were all carried out on the author’s desktop computers at home and work. However,
D. Platt’s computations [Plab] used a significant amount of resources, kindly donated
to D. Platt and the author by several institutions. This crucial help was provided
by MesoPSL (affiliated with the Observatoire de Paris and Paris Sciences et Lettres),


Université de Paris VI/VII (UPMC - DSI - Pôle Calcul), University of Warwick (thanks
to Bill Hart), University of Bristol, France Grilles (French National Grid Infrastructure,
DIRAC national instance), Université de Lyon 1 and Université de Bordeaux 1. Both
D. Platt and the author would like to thank the donating organizations, their technical
staff, and all those who helped to make these resources available to them.
Chapter 1

Introduction

The question we will discuss, or one similar to it, seems to have been first posed by
Descartes, in a manuscript published only centuries after his death [Des08, p. 298].
Descartes states: “Sed & omnis numerus par fit ex uno vel duobus vel tribus primis”
(“But also every even number is made out of one, two or three prime numbers.”)¹ This
statement comes in the middle of a discussion of sums of polygonal numbers, such as
the squares.
Statements on sums of primes and sums of values of polynomials (polygonal numbers,
powers $n^k$, etc.) have since shown themselves to be much more than mere curiosities
– and not just because they are often very difficult to prove. Whereas the study
of sums of powers can rely on their algebraic structure, the study of sums of primes
leads to the realization that, from several perspectives, the set of primes behaves much
like the set of integers, or like a random set of integers. (It also leads to the realization
that this is very hard to prove.)
If, instead of the primes, we had a random set of odd integers S whose density –
an intuitive concept that can be made precise – equaled that of the primes, then we
would expect to be able to write every odd number as a sum of three elements of S,
and every even number as the sum of two elements of S. We would have to check by
hand whether this is true for small odd and even numbers, but it is relatively easy to
show that, after a long enough check, it would be very unlikely that there would be any
exceptions among the infinitely many cases that remain.
The question, then, is in what sense we need the primes to be like a random set of
integers; in other words, we need to know what we can prove about the regularities of
the distribution of the primes. This is one of the main questions of analytic number
theory; progress on it has been very slow and difficult.
Fourier analysis expresses information on the distribution of a sequence in terms
of frequencies. In the case of the primes, what may be called the main frequencies –
those in the major arcs – correspond to the same kind of large-scale distribution that
is encoded by L-functions, the family of functions to which the Riemann zeta function
belongs. On some of the crucial questions on L-functions, the limits of our knowledge
have barely budged in the last century. There is something relatively new now, namely,
rigorous numerical data of non-negligible scope; still, such data is, by definition, finite,
and, as a consequence, its range of applicability is very narrow. Thus, the real question
in the major-arc regime is how to use well the limited information we do have on the
large-scale distribution of the primes. As we will see, this requires delicate work on
explicit asymptotic analysis and smoothing functions.

¹ Thanks are due to J. Brandes and R. Vaughan for a discussion on a possible ambiguity in the Latin
wording. Descartes’ statement is mentioned (with a translation much like the one given here) in Dickson’s
History [Dic66, Ch. XVIII].

Outside the main frequencies – that is, in what are called the minor arcs – estimates
based on L-functions no longer apply, and what is remarkable is that one can say
anything meaningful on the distribution of the primes. Vinogradov was the first to give
unconditional, non-trivial bounds, showing that there are no great irregularities in the
minor arcs; this is what makes them “minor”. Here the task is to give sharper bounds
than Vinogradov. It is in this regime that we can genuinely say that we learn a little
more about the distribution of the primes, based on what is essentially an elementary
and highly optimized analytic-combinatorial analysis of exponential sums, i.e., Fourier
coefficients given by series (supported on the primes, in our case).
The circle method reduces an additive problem – that is, a problem on sums, such
as sums of primes, powers, etc. – to the estimation of an integral on the space of
frequencies (the “circle” R/Z). In the case of the primes, as we have just discussed, we
have precise estimates on the integrand on part of the circle (the major arcs), and upper
bounds on the rest of the circle (the minor arcs). Putting them together efficiently to
give an estimate on the integral is a delicate matter; we leave it for the last part, as it
is really what is particular to our problem, as opposed to being of immediate general
relevance to the study of the primes. As we shall see, estimating the integral well does
involve using – and improving – general estimates on the variance of irregularities in
the distribution of the primes, as given by the large sieve.
In fact, one of the main general lessons of the proof is that there is a very close
relationship between the circle method and the large sieve; we will use the large sieve
not just as a tool – which we shall, incidentally, sharpen in certain contexts – but as a
source for ideas on how to apply the circle method more effectively.
This has been an attempt at a first look from above. Let us now undertake a more
leisurely and detailed overview of the problem and its solution.

1.1 History and new developments


The history of the conjecture starts properly with Euler and his close friend, Christian
Goldbach, both of whom lived and worked in Russia at the time of their correspondence
– about a century after Descartes’ isolated statement. Goldbach, a man of many
interests, is usually classed as a serious amateur; he seems to have awakened Euler’s
passion for number theory, which would lead to the beginning of the modern era of
the subject [Wei84, Ch. 3, §IV]. In a letter dated June 7, 1742, Goldbach made a
conjectural statement on prime numbers, and Euler rapidly reduced it to the following
conjecture, which, he said, Goldbach had already posed to him: every positive integer
can be written as the sum of at most three prime numbers.
We would now say “every integer greater than 1”, since we no longer consider 1 to
be a prime number. Moreover, the conjecture is nowadays split into two:
• the weak, or ternary, Goldbach conjecture states that every odd integer greater
than 5 can be written as the sum of three primes;
• the strong, or binary, Goldbach conjecture states that every even integer greater
than 2 can be written as the sum of two primes.
As their names indicate, the strong conjecture implies the weak one (easily: subtract 3
from your odd number n, then express n − 3 as the sum of two primes).
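The reduction from the strong conjecture to the weak one can be sketched in a few lines. This is only an illustration for small n, not part of the proof; the helper names `primes_up_to` and `three_primes_via_binary` and the bound 1000 are chosen here for the example.

```python
def primes_up_to(limit):
    """Return the set of primes <= limit (sieve of Eratosthenes)."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return {n for n, flag in enumerate(is_prime) if flag}


def three_primes_via_binary(n, primes):
    """Write an odd n > 5 as 3 + p1 + p2, where p1 + p2 = n - 3 is a
    binary Goldbach decomposition found by direct search in `primes`."""
    if n <= 5 or n % 2 == 0:
        raise ValueError("n must be an odd integer greater than 5")
    even = n - 3
    for p1 in sorted(primes):
        if p1 > even // 2:
            break
        if even - p1 in primes:
            return (3, p1, even - p1)
    # Reaching this point would mean n - 3 has no decomposition within `primes`.
    return None


PRIMES = primes_up_to(1000)  # illustrative bound
for n in (7, 9, 27, 999):
    print(n, three_primes_via_binary(n, PRIMES))
```

The search only ever runs over even numbers n − 3 ≥ 4, which is exactly the range covered by the strong conjecture.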
The strong conjecture remains out of reach. A short while ago – the first complete
version appeared on May 13, 2013 – the author proved the weak Goldbach conjecture.
Theorem 1.1.1. Every odd integer greater than 5 can be written as the sum of three
primes.
In 1937, I. M. Vinogradov proved [Vin37] that the conjecture is true for all odd
numbers n larger than some constant C. (Hardy and Littlewood had proved the same
statement under the assumption of the Generalized Riemann Hypothesis, which we
shall have the chance to discuss later.)
It is clear that a computation can verify the conjecture only for n ≤ c, c a constant:
computations have to be finite. What can make a result coming from analytic number
theory be valid only for n ≥ C?
An analytic proof, generally speaking, gives us more than just existence. In this
kind of problem, it gives us more than the possibility of doing something (here, writing
an integer n as the sum of three primes). It gives us a rigorous estimate for the number
of ways in which this something is possible; that is, it shows us that this number of
ways equals
main term + error term, (1.1)
where the main term is a precise quantity f (n), and the error term is something whose
absolute value is at most another precise quantity g(n). If f (n) > g(n), then (1.1) is
non-zero, i.e., we will have shown the existence of a way to write our number as the
sum of three primes.
(Since what we truly care about is existence, we are free to weigh different ways
of writing n as the sum of three primes however we wish – that is, we can decide that
some primes “count” twice or thrice as much as others, and that some do not count at
all.)
Typically, after much work, we succeed in obtaining (1.1) with f (n) and g(n) such
that f (n) > g(n) asymptotically, that is, for n large enough. To give a highly simplified
example: if, say, $f(n) = n^2$ and $g(n) = 100 n^{3/2}$, then f (n) > g(n) for n > C, where
$C = 10^4$, and so the number of ways (1.1) is positive for n > C.
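Spelled out, the threshold in this toy example comes from dividing both sides by $n^{3/2}$ (the functions f and g are the illustrative ones just given, not the ones that arise in the proof):

```latex
\[
  f(n) > g(n)
  \iff n^{2} > 100\, n^{3/2}
  \iff n^{1/2} > 100
  \iff n > 100^{2} = 10^{4} .
\]
```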
We want a moderate value of C, that is, a C small enough that all cases n ≤ C can
be checked computationally. To ensure this, we must make the error term bound g(n)
as small as possible. This is our main task. A secondary (and sometimes neglected)
possibility is to rig the weights so as to make the main term f (n) larger in comparison
to g(n); this can generally be done only up to a certain point, but is nonetheless very
helpful.
As we said, the first unconditional proof that odd numbers n ≥ C can be written
as the sum of three primes is due to Vinogradov. Analytic bounds fall into several
categories, or stages; quite often, successive versions of the same theorem will go
through successive stages.
1. An ineffective result shows that a statement is true for some constant C, but gives
no way to determine what the constant C might be. Vinogradov’s first proof of
his theorem (in [Vin37]) is like this: it shows that there exists a constant C such
that every odd number n > C is the sum of three primes, yet gives us no hope of
finding out what the constant C might be.² Many proofs of Vinogradov’s result
in textbooks are also of this type.
2. An effective, but not explicit, result shows that a statement is true for some
unspecified constant C in a way that makes it clear that a constant C could
in principle be determined following and reworking the proof with great care.
Vinogradov’s later proof ([Vin47], translated in [Vin54]) is of this nature. As
Chudakov [Chu47, §IV.2] pointed out, the improvement on [Vin37] given by
Mardzhanishvili [Mar41] already had the effect of making the result effective.³
3. An explicit result gives a value of C. According to [Chu47, p. 201], the first
explicit version of Vinogradov’s result was given by Borozdkin in his unpublished
doctoral dissertation, written under the direction of Vinogradov (1939):
C = exp(exp(exp(41.96))). Such a result is, by definition, also effective.
Borozdkin later [Bor56] gave the value $C = e^{e^{16.038}}$, though he does not seem to
have published the proof. The best – that is, smallest – value of C known before
the present work was that of Liu and Wang [LW02]: $C = 2 \cdot 10^{1346}$.
4. What we may call an efficient proof gives a reasonable value for C – in our case,
a value small enough that checking all cases up to C is feasible.
How far were we from an efficient proof? That is, what sort of computation could
ever be feasible? The situation was paradoxical: the conjecture was known above an
explicit C, but $C = 2 \cdot 10^{1346}$ is so large that it could not be said that the problem could
be attacked by any foreseeable computational means within our physical universe. (A
truly brute-force verification up to C takes at least C steps; a cleverer verification takes
well over $\sqrt{C}$ steps. The number of picoseconds since the beginning of the universe is
less than $10^{30}$, whereas the number of protons in the observable universe is currently
estimated at $\sim 10^{80}$ [Shu92]; this limits the number of steps that can be taken in
any currently imaginable computer, even if it were to do parallel processing on an
astronomical scale.) Thus, the only way forward was a series of drastic improvements
in the mathematical, rather than computational, side.
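The orders of magnitude in this paragraph can be compared with exact integer arithmetic. The sketch below takes the figures quoted above at face value and adds one deliberately generous, purely hypothetical assumption: that every proton performs one step per picosecond. The variable names are ours.

```python
import math

C = 2 * 10**1346          # the constant of Liu and Wang quoted above
picoseconds = 10**30      # upper bound quoted in the text
protons = 10**80          # estimate quoted in the text

# Hypothetical budget: every proton performs one step per picosecond.
step_budget = picoseconds * protons

# Floor of the square root of C, computed exactly on integers.
sqrt_C = math.isqrt(C)

print("budget is about 10^%d steps" % (len(str(step_budget)) - 1))
print("sqrt(C) is about 10^%d steps" % (len(str(sqrt_C)) - 1))
print("sqrt(C) exceeds the budget:", sqrt_C > step_budget)
```

Even under this assumption the budget falls short of $\sqrt{C}$ by hundreds of orders of magnitude, which is why the improvement had to come from the mathematical side.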
I gave a proof with $C = 10^{29}$ in May 2013. Since D. Platt and I had verified
the conjecture for all odd numbers up to $n \le 8.8 \cdot 10^{30}$ by computer [HP13], this
established the conjecture for all odd numbers n.
² Here, as is often the case in ineffective results in analytic number theory, the underlying issue is that of
Siegel zeros, which are believed not to exist, but have not been shown not to; the strongest bounds on (i.e.,
against) such zeros are ineffective, and so are all of the many results using such estimates.
³ The proof in [Mar41] combined the bounds in [Vin37] with a more careful accounting of the effect of
the single possible Siegel zero within range.

(In December 2013, I reduced C to $10^{27}$. The verification of the ternary Goldbach
conjecture up to $n \le 10^{27}$ can be done on a home computer over a weekend,
as of the time of writing (2014). It must be said that this uses the verification of the
binary Goldbach conjecture for $n \le 4 \cdot 10^{18}$ [OeSHP14], which itself required computational
resources far outside the home-computing range. Checking the conjecture
up to $n \le 10^{27}$ was not even the main computational task that needed to be accomplished
to establish the Main Theorem – that task was the finite verification of zeros of
L-functions in [Plab], a general-purpose computation that should be useful elsewhere.)
What was the strategy of the proof? The basic framework is the one pioneered by
Hardy and Littlewood for a variety of problems – namely, the circle method, which, as
we shall see, is an application of Fourier analysis over Z. (There are other, later routes
to Vinogradov’s result; see [HB85], [FI98] and especially the recent work [Sha14],
which avoids using anything about zeros of L-functions inside the critical strip.) Vinogradov’s
proof, like much of the later work on the subject, was based on a detailed
analysis of exponential sums, i.e., Fourier transforms over Z. So is the proof that we
will sketch.
At the same time, the distance between $2 \cdot 10^{1346}$ and $10^{27}$ is such that we cannot
hope to get to $10^{27}$ (or any other reasonable constant) by fine-tuning previous work.
Rather, we must work from scratch, using the basic outline in Vinogradov’s original
proof and other, initially unrelated, developments in analysis and number theory (notably,
the large sieve). Merely improving constants will not do; rather, we must do
qualitatively better than previous work (by non-constant factors) if we are to have any
chance to succeed. It is on these qualitative improvements that we will focus.

***

It is only fair to review some of the progress made between Vinogradov’s time and
ours. Here we will focus on results; later, we will discuss some of the progress made
in the techniques of proof. See [Dic66, Ch. XVIII] for the early history of the problem
(before Hardy and Littlewood); see R. Vaughan’s ICM lecture notes on the ternary
Goldbach problem [Vau80] for some further details on the history up to 1978.
In 1933, Schnirelmann proved [Sch33] that every integer n > 1 can be written as
the sum of at most K primes for some unspecified constant K. (This pioneering work
is now considered to be part of the early history of additive combinatorics.) In 1969,
Klimov gave an explicit value for K (namely, $K = 6 \cdot 10^9$); he later improved the
constant to K = 115 (with G. Z. Piltay and T. A. Sheptickaja) and K = 55. Later,
there were results by Vaughan [Vau77a] (K = 27), Deshouillers [Des77] (K = 26)
and Riesel-Vaughan [RV83] (K = 19).
Ramaré showed in 1995 that every even number n > 1 can be written as the sum of
at most 6 primes [Ram95]. In 2012, Tao proved [Tao14] that every odd number n > 1
is the sum of at most 5 primes.
There have been other avenues of attack towards the strong conjecture. Using ideas
close to those of Vinogradov’s, Chudakov [Chu37], [Chu38], Estermann [Est37] and
van der Corput [van37] proved (independently from each other) that almost every even
number (meaning: all elements of a subset of density 1 in the even numbers) can be
written as the sum of two primes. In 1973, J.-R. Chen showed [Che73] that every even
number n larger than a constant C can be written as the sum of a prime number and
the product of at most two primes ($n = p_1 + p_2$ or $n = p_1 + p_2 p_3$). Incidentally,
J.-R. Chen himself, together with T.-Z. Wang, was responsible for the best bounds on
C (for ternary Goldbach) before Liu and Wang: $C = \exp(\exp(11.503)) < 4 \cdot 10^{43000}$
[CW89] and $C = \exp(\exp(9.715)) < 6 \cdot 10^{7193}$ [CW96].
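Bounds of the form exp(exp(x)) are easier to compare once rewritten as powers of ten, since $\log_{10} \exp(\exp(x)) = \exp(x)/\ln 10$. The short check below recovers the two decimal bounds quoted for [CW89] and [CW96]; the function name is ours, and ordinary floating point is assumed to be accurate enough for these two inputs.

```python
import math


def double_exp_as_power_of_ten(x):
    """Return (mantissa, exponent) such that exp(exp(x)) = mantissa * 10**exponent
    with 1 <= mantissa < 10, working with logarithms to avoid overflow."""
    log10_value = math.exp(x) / math.log(10)
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent


for x in (11.503, 9.715):
    mantissa, exponent = double_exp_as_power_of_ten(x)
    print(f"exp(exp({x})) is about {mantissa:.2f} * 10^{exponent}")
```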
Matters are different if one assumes the Generalized Riemann Hypothesis (GRH).
A careful analysis [Eff99] of Hardy and Littlewood’s work [HL22] gives that every
odd number $n \ge 1.24 \cdot 10^{50}$ is the sum of three primes if GRH is true.⁴ According
to [Eff99], the same statement with $n \ge 10^{32}$ was proven in the unpublished doctoral
dissertation of B. Lucke, a student of E. Landau’s, in 1926. Zinoviev [Zin97] improved
this to $n \ge 10^{20}$. A computer check ([DEtRZ97]; see also [Sao98]) showed that the
conjecture is true for $n < 10^{20}$, thus completing the proof of the ternary Goldbach
conjecture under the assumption of GRH. What was open until now was, of course, the
problem of giving an unconditional proof.

1.2 The circle method: Fourier analysis on Z


It is common for a first course on Fourier analysis to focus on functions over the re-
als satisfying f (x) = f (x + 1), or, what is the same, functions f : R/Z → C.
Such a function (unless it is fairly pathological) has a Fourier series converging to it;
this is just the same as saying that f has a Fourier transform $\hat{f} : \mathbb{Z} \to \mathbb{C}$ defined
by $\hat{f}(n) = \int_{\mathbb{R}/\mathbb{Z}} f(\alpha) e(-\alpha n)\, d\alpha$ and satisfying $f(\alpha) = \sum_{n \in \mathbb{Z}} \hat{f}(n) e(\alpha n)$ (Fourier
inversion theorem), where $e(t) = e^{2\pi i t}$.
In number theory, we are especially interested in functions f : Z → C. Then things
are exactly the other way around: provided that f decays reasonably fast as n → ±∞
(or becomes 0 for n large enough), f has a Fourier transform $\hat{f} : \mathbb{R}/\mathbb{Z} \to \mathbb{C}$ defined
by $\hat{f}(\alpha) = \sum_n f(n) e(-\alpha n)$ and satisfying $f(n) = \int_{\mathbb{R}/\mathbb{Z}} \hat{f}(\alpha) e(\alpha n)\, d\alpha$. (Highbrow
talk: we already knew that Z is the Fourier dual of R/Z, and so, of course, R/Z is
the Fourier dual of Z.) “Exponential sums” (or “trigonometrical sums”, as in the title
of [Vin54]) are sums of the form $\sum_n f(n) e(-\alpha n)$; of course, the “circle” in “circle
method” is just a name for R/Z. (To see an actual circle in the complex plane, look at
the image of R/Z under the map $\alpha \mapsto e(\alpha)$.)
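As a small sanity check of the two formulas for functions on Z, here is a minimal sketch in Python. It assumes a finitely supported f (the test function below, the indicator of the primes under 10, is only an illustration) and replaces the integral over R/Z by a Riemann sum, which is exact once the number of sample points exceeds the diameter of the support.

```python
import cmath

def e(t):
    """The additive character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)

def fourier_transform(f, alpha):
    """hat f(alpha) = sum_n f(n) e(-alpha*n) for a finitely supported f,
    given as a dict mapping n to f(n)."""
    return sum(value * e(-alpha * n) for n, value in f.items())

def fourier_inverse(f, n, samples):
    """Riemann-sum approximation to the integral over R/Z of
    hat f(alpha) e(alpha*n) d(alpha), using equally spaced sample points."""
    total = sum(fourier_transform(f, k / samples) * e(k * n / samples)
                for k in range(samples))
    return total / samples

# Illustrative test function (an assumption of this sketch):
# the indicator function of the primes below 10.
f = {2: 1.0, 3: 1.0, 5: 1.0, 7: 1.0}

# With 16 sample points and support inside [0, 9], inversion recovers f(n)
# up to rounding error.
recovered = [fourier_inverse(f, n, samples=16) for n in range(10)]
```

The list `recovered` agrees with f on 0, ..., 9 up to floating-point error; with fewer sample points than the width of the support, distinct integers would be confused modulo the number of samples.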
The study of the Fourier transform $\hat{f}$ is relevant to additive problems in number
theory, i.e., questions on the number of ways of writing n as a sum of k integers of
a particular form. Why? One answer could be that $\hat{f}$ gives us information about the
“randomness” of f; if f were the characteristic function of a random set, then $\hat{f}(\alpha)$
would be very small outside a sharp peak at α = 0.
⁴ In fact, Hardy, Littlewood and Effinger use an assumption somewhat weaker than GRH: they assume
that Dirichlet L-functions have no zeroes satisfying $\Re(s) \geq \theta$, where θ < 3/4 is arbitrary. (We will review
Dirichlet L-functions in a minute.)

We can also give a more concrete and immediate answer. Recall that, in general,
the Fourier transform of a convolution equals the product of the transforms; over Z,
this means that for the additive convolution

$$(f * g)(n) = \sum_{\substack{m_1, m_2 \in \mathbb{Z} \\ m_1 + m_2 = n}} f(m_1) g(m_2),$$

the Fourier transform satisfies the simple rule

$$\widehat{f * g}(\alpha) = \hat{f}(\alpha) \cdot \hat{g}(\alpha).$$

We can see right away from this that (f ∗ g)(n) can be non-zero only if n can be
written as $n = m_1 + m_2$ for some $m_1, m_2$ such that $f(m_1)$ and $g(m_2)$ are non-zero.
Similarly, (f ∗ g ∗ h)(n) can be non-zero only if n can be written as $n = m_1 + m_2 + m_3$
for some $m_1, m_2, m_3$ such that $f(m_1)$, $g(m_2)$ and $h(m_3)$ are all non-zero. This
suggests that, to study the ternary Goldbach problem, we define $f_1, f_2, f_3 : \mathbb{Z} \to \mathbb{C}$ so
that they take non-zero values only at the primes.
Hardy and Littlewood defined $f_1(n) = f_2(n) = f_3(n) = 0$ for n non-prime (and
also for n ≤ 0), and $f_1(n) = f_2(n) = f_3(n) = (\log n) e^{-n/x}$ for n prime (where x is
a parameter to be fixed later). Here the factor $e^{-n/x}$ is there to provide “fast decay”,
so that everything converges; as we will see later, Hardy and Littlewood’s choice of
$e^{-n/x}$ (rather than some other function of fast decay) comes across in hindsight as
being very clever, though not quite best-possible. (Their “choice” was, to some extent,
not a choice, but an artifact of their version of the circle method, which was framed
in terms of power series, not in terms of exponential sums with arbitrary smoothing
functions.) The term log n is there for technical reasons – in essence, it makes sense
to put it there because a random integer around n has a chance of about 1/(log n) of
being prime.
We can see that $(f_1 * f_2 * f_3)(n) \neq 0$ if and only if n can be written as the sum
of three primes. Our task is then to show that $(f_1 * f_2 * f_3)(n)$ (i.e., (f ∗ f ∗ f)(n))
is non-zero for every n larger than a constant $C \sim 10^{27}$. Since the transform of a
convolution equals a product of transforms,

$$(f_1 * f_2 * f_3)(n) = \int_{\mathbb{R}/\mathbb{Z}} \widehat{f_1 * f_2 * f_3}(\alpha) e(\alpha n)\, d\alpha = \int_{\mathbb{R}/\mathbb{Z}} (\hat{f}_1 \hat{f}_2 \hat{f}_3)(\alpha) e(\alpha n)\, d\alpha. \tag{1.2}$$
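To make the role of the triple convolution concrete, here is a minimal sketch that computes $(f * f * f)(n)$ directly for Hardy and Littlewood's weights and small n. The cut-off `LIMIT` and the parameter `X` are illustrative choices of this sketch, not values used in the proof; truncating f at `LIMIT` does not change the convolution at any n ≤ `LIMIT`.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            for m in range(p * p, limit + 1, p):
                sieve[m] = False
    return [n for n, is_p in enumerate(sieve) if is_p]

def additive_convolution(f, g, limit):
    """(f * g)(n) = sum over m1 + m2 = n of f(m1) g(m2), for 0 <= n <= limit.
    f and g are lists indexed by n and vanish at negative integers."""
    h = [0.0] * (limit + 1)
    for m1, a in enumerate(f):
        if a == 0.0:
            continue
        for m2 in range(limit - m1 + 1):
            h[m1 + m2] += a * g[m2]
    return h

LIMIT = 60   # illustrative cut-off (assumption of this sketch)
X = 30.0     # illustrative value of the parameter x (assumption)

# f(n) = (log n) e^{-n/x} for n prime, 0 otherwise.
f = [0.0] * (LIMIT + 1)
for p in primes_up_to(LIMIT):
    f[p] = math.log(p) * math.exp(-p / X)

pair = additive_convolution(f, f, LIMIT)
triple = additive_convolution(pair, f, LIMIT)

# n is a sum of three primes exactly when triple[n] is non-zero.
sums_of_three_primes = [n for n in range(LIMIT + 1) if triple[n] > 0]
```

Since the weights are positive, no cancellation can occur, so `triple[n]` is positive precisely at the sums of three primes; in this range that includes every odd n ≥ 7.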

Our task is thus to show that the integral $\int_{\mathbb{R}/\mathbb{Z}} (\hat{f}_1 \hat{f}_2 \hat{f}_3)(\alpha) e(\alpha n)\, d\alpha$ is non-zero.
As it happens, $\hat{f}(\alpha)$ is particularly large when α is close to a rational with small
denominator. Moreover, for such α, it turns out we can actually give rather precise
estimates for $\hat{f}(\alpha)$. Define M (called the set of major arcs) to be a union of narrow
arcs around the rationals with small denominator:

$$M = \bigcup_{q \leq r} \bigcup_{\substack{a \bmod q \\ (a,q)=1}} \left( \frac{a}{q} - \frac{1}{qQ},\ \frac{a}{q} + \frac{1}{qQ} \right),$$

where Q is a constant times x/r, and r will be set later. (This is a slight simplification:
the major-arc set we will actually use in the course of the proof will be a little different,
due to a distinction between odd and even q.) We can write

$$\int_{\mathbb{R}/\mathbb{Z}} (\hat{f}_1 \hat{f}_2 \hat{f}_3)(\alpha) e(\alpha n)\, d\alpha = \int_{M} (\hat{f}_1 \hat{f}_2 \hat{f}_3)(\alpha) e(\alpha n)\, d\alpha + \int_{m} (\hat{f}_1 \hat{f}_2 \hat{f}_3)(\alpha) e(\alpha n)\, d\alpha, \tag{1.3}$$

where m is the complement (R/Z) \ M (called minor arcs).
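The definition of M is easy to experiment with. The following sketch builds the arcs with exact rational arithmetic for toy parameters (r = 5 and Q = 100 are assumptions chosen for illustration; in the proof r is 300000 and Q is of size about x/r). For Q ≥ 2r the arcs are pairwise disjoint, since distinct fractions a/q, a'/q' with q, q' ≤ r differ by at least 1/(qq'), so the sum of the lengths is then the measure of M.

```python
from fractions import Fraction
from math import gcd

def major_arcs(r, Q):
    """Return (centre, half_width) for each arc around a/q with q <= r and
    (a, q) = 1; the half-width of the arc around a/q is 1/(q*Q)."""
    arcs = []
    for q in range(1, r + 1):
        for a in range(q):
            if gcd(a, q) == 1:   # gcd(0, 1) = 1, so a/q = 0/1 is included
                arcs.append((Fraction(a, q), Fraction(1, q * Q)))
    return arcs

def circle_distance(alpha, beta):
    """Distance between alpha and beta on the circle R/Z."""
    d = (alpha - beta) % 1
    return min(d, 1 - d)

def in_major_arcs(alpha, arcs):
    """True if alpha lies in one of the (open) arcs."""
    return any(circle_distance(alpha, centre) < width for centre, width in arcs)

def total_length(arcs):
    """Sum of the arc lengths (the measure of the union when arcs are disjoint)."""
    return sum((2 * width for _, width in arcs), Fraction(0))

# Toy parameters, for illustration only.
R_SMALL, Q_SMALL = 5, 100
arcs = major_arcs(R_SMALL, Q_SMALL)
measure = total_length(arcs)   # (2/Q) * sum over q <= r of phi(q)/q
```

Even in this toy case the major arcs cover well under a tenth of the circle, which is the sense in which they are "narrow"; everything else is minor arcs.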
Now, we simply do not know how to give precise estimates for $\hat{f}(\alpha)$ when α is in
m. However, as Vinogradov realized, one can give reasonable upper bounds on $|\hat{f}(\alpha)|$
for α ∈ m. This suggests the following strategy: show that

$$\int_{m} |\hat{f}_1(\alpha)| |\hat{f}_2(\alpha)| |\hat{f}_3(\alpha)|\, d\alpha < \int_{M} \hat{f}_1(\alpha) \hat{f}_2(\alpha) \hat{f}_3(\alpha) e(\alpha n)\, d\alpha. \tag{1.4}$$

By (1.2) and (1.3), this will imply immediately that $(f_1 * f_2 * f_3)(n) > 0$, and so we
will be done.
The name of circle method is given to the study of additive problems by means of
Fourier analysis over Z, and, in particular, to the use of a subdivision of the circle R/Z
into major and minor arcs to estimate the integral of a Fourier transform. There was
a “circle” already in Hardy and Ramanujan’s work [HR00], but the subdivision into
major and minor arcs is due to Hardy and Littlewood, who also applied their method
to a wide variety of additive problems. (Hence “the Hardy-Littlewood method” as an
alternative name for the circle method.) For instance, before working on the ternary
Goldbach conjecture, they studied the question of whether every n > C can be written
as the sum of kth powers (Waring’s problem). In fact, they used a subdivision into
major and minor arcs to study Waring’s problem, and not for the ternary Goldbach
problem: they had no minor-arc bounds for ternary Goldbach, and their use of GRH
had the effect of making every α ∈ R/Z yield to a major-arc treatment.
Vinogradov worked with finite exponential sums, i.e., $f_i$ compactly supported.
From today’s perspective, it is clear that there are applications (such as ours) in which
it can be more important for $f_i$ to be smooth than compactly supported; still, Vinogradov’s
simplifications were an incentive to further developments. In the case of the
ternary Goldbach problem, his key contribution consisted in the fact that he could
give bounds on $\hat{f}(\alpha)$ for α in the minor arcs without using GRH.
An important note: in the case of the binary Goldbach conjecture, the method fails
at (1.4), and not before; if our understanding of the actual value of $\hat{f}_i(\alpha)$ is at all correct,
it is simply not true in general that

$$\int_{m} |\hat{f}_1(\alpha)| |\hat{f}_2(\alpha)|\, d\alpha < \int_{M} \hat{f}_1(\alpha) \hat{f}_2(\alpha) e(\alpha n)\, d\alpha.$$

Let us see why this is not surprising. Set $f_1 = f_2 = f_3 = f$ for simplicity, so that
we have the integral of the square $(\hat{f}(\alpha))^2$ for the binary problem, and the integral of
the cube $(\hat{f}(\alpha))^3$ for the ternary problem. Squaring, like cubing, amplifies the peaks
of $\hat{f}(\alpha)$, which are at the rationals of small denominator and their immediate
neighborhoods (the major arcs); however, cubing amplifies the peaks much more than squaring.
This is why, even though the arcs making up M are very narrow, $\int_{M} (\hat{f}(\alpha))^3 e(\alpha n)\, d\alpha$
is larger than $\int_{m} |\hat{f}(\alpha)|^3\, d\alpha$; that explains the name major arcs – they are not large, but
they give the major part of the contribution. In contrast, squaring amplifies the peaks
less, and this is why the absolute value of $\int_{M} \hat{f}(\alpha)^2 e(\alpha n)\, d\alpha$ is in general smaller than
$\int_{m} |\hat{f}(\alpha)|^2\, d\alpha$. As nobody knows how to prove a precise estimate (and, in particular,
lower bounds) on $\hat{f}(\alpha)$ for α ∈ m, the binary Goldbach conjecture is still very much
out of reach.
To prove the ternary Goldbach conjecture, it is enough to estimate both sides of
(1.4) for carefully chosen $f_1, f_2, f_3$, and compare them. This is our task from now on.

1.3 The major arcs M


1.3.1 What do we really know about L-functions and their zeros?
Before we start, let us give a very brief review of basic analytic number theory (in the
sense of, say, [Dav67]). A Dirichlet character χ : Z → C of modulus q is a character
of $(\mathbb{Z}/q\mathbb{Z})^*$ lifted to Z. (In other words, χ(n) = χ(n + q) for all n, χ(ab) = χ(a)χ(b)
for all a, b and χ(n) = 0 for $(n, q) \neq 1$.) A Dirichlet L-series is defined by

$$L(s, \chi) = \sum_{n=1}^{\infty} \chi(n) n^{-s}$$

for $\Re(s) > 1$, and by analytic continuation for $\Re(s) \leq 1$. (The Riemann zeta function
ζ(s) is the L-function for the trivial character, i.e., the character χ such that χ(n) = 1
for all n.) Taking logarithms and then derivatives, we see that

$$-\frac{L'(s, \chi)}{L(s, \chi)} = \sum_{n=1}^{\infty} \chi(n) \Lambda(n) n^{-s}, \tag{1.5}$$

for $\Re(s) > 1$, where Λ is the von Mangoldt function (Λ(n) = log p if n is some prime
power $p^{\alpha}$, α ≥ 1, and Λ(n) = 0 otherwise).
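For readers who like to see definitions run, here is a minimal sketch of the von Mangoldt function by trial division. It also checks the standard identity $\sum_{d | n} \Lambda(d) = \log n$, which is the coefficient-level form of (1.5) for the trivial character (it expresses $\zeta'(s) = \zeta(s) \cdot (\zeta'(s)/\zeta(s))$). The function names are choices of this sketch.

```python
import math

def von_mangoldt(n):
    """Lambda(n) = log p if n = p^a for a prime p and a >= 1, else 0."""
    if n < 2:
        return 0.0
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:
            # p is the smallest prime factor of n; strip every power of p.
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)   # no factor up to sqrt(n): n itself is prime

def divisor_sum_of_lambda(n):
    """Sum of Lambda(d) over the divisors d of n; this should equal log n."""
    return sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)

# Largest discrepancy between the divisor sum and log n for n up to 300.
max_error = max(abs(divisor_sum_of_lambda(n) - math.log(n))
                for n in range(1, 301))
```

Here `max_error` is at the level of rounding error, as the identity predicts.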
Dirichlet introduced his characters and L-series so as to study primes in arithmetic
progressions. In general, and after some work, (1.5) allows us to restate many sums
over the primes (such as our Fourier transforms $\hat{f}(\alpha)$) as sums over the zeros of L(s, χ).
A non-trivial zero of L(s, χ) is a zero of L(s, χ) such that $0 < \Re(s) < 1$. (The other
zeros are called trivial because we know where they are, namely, at negative integers
and, in some cases, also on the line $\Re(s) = 0$. In order to eliminate all zeros on
$\Re(s) = 0$ outside s = 0, it suffices to assume that χ is primitive; a primitive character
modulo q is one that is not induced by (i.e., not the restriction of) any character modulo
d|q, d < q.)
The Generalized Riemann Hypothesis for Dirichlet L-functions is the statement
that, for every Dirichlet character χ, every non-trivial zero of L(s, χ) satisfies $\Re(s) = 1/2$.
Of course, the Generalized Riemann Hypothesis (GRH) – and the Riemann Hypothesis,
which is the special case of χ trivial – remains unproven. Thus, if we want to
prove unconditional statements, we need to make do with partial results towards GRH.
Two kinds of such results have been proven:

• Zero-free regions. Ever since the late nineteenth century (Hadamard, de la
Vallée-Poussin) we have known that there are hourglass-shaped regions (more
precisely, of the shape $\frac{c}{\log t} \leq \sigma \leq 1 - \frac{c}{\log t}$, where c is a constant and where we
write s = σ + it) outside which non-trivial zeros cannot lie. Explicit values for
c are known [McC84b], [Kad05], [Kad]. There is also the Vinogradov-Korobov
region [Kor58], [Vin58], which is broader asymptotically but narrower in most
of the practical range (see [For02], however).

• Finite verifications of GRH. It is possible to (ask a computer to) prove small,
finite fragments of GRH, in the sense of verifying that all non-trivial zeros of
a given finite set of L-functions with imaginary part less than some constant H
lie on the critical line $\Re(s) = 1/2$. Such verifications go back to Riemann,
who checked the first few zeros of ζ(s). Large-scale, rigorous computer-based
verifications are now a possibility.

Most work in the literature follows the first alternative, though [Tao14] did use a
finite verification of RH (i.e., GRH for the trivial character). Unfortunately, zero-free
regions seem too narrow to be useful for the ternary Goldbach problem. Thus, we are
left with the second alternative.
In coordination with the present work, Platt [Plab] verified that all zeros s of
L-functions for characters χ with modulus q ≤ 300000 satisfying $\Im(s) \leq H_q$ lie on the
line $\Re(s) = 1/2$, where

• $H_q = 10^8/q$ for q odd, and

• $H_q = \max(10^8/q,\ 200 + 7.5 \cdot 10^7/q)$ for q even.

This was a medium-large computation, taking a few hundreds of thousands of
core-hours on a parallel computer. It used interval arithmetic for the sake of rigor; we will
later discuss what this means.
The choice to use a finite verification of GRH, rather than zero-free regions, had
consequences on the manner in which the major and minor arcs had to be chosen. As
we shall see, such a verification can be used to give very precise bounds on the major
arcs, but also forces us to define them so that they are narrow and their number is
constant. To be precise: the major arcs were defined around rationals a/q with q ≤ r,
r = 300000; moreover, as will become clear, the fact that $H_q$ is finite will force their
width to be bounded by $c_0 r/(qx)$, where $c_0$ is a constant (say $c_0 = 8$).

1.3.2 Estimates of $\hat{f}(\alpha)$ for α in the major arcs

Recall that we want to estimate sums of the type $\hat{f}(\alpha) = \sum f(n) e(-\alpha n)$, where
f(n) is something like (log n)η(n/x) for n equal to a prime, and 0 otherwise; here
η : R → C is some function of fast decay, such as Hardy and Littlewood’s choice,

$$\eta(t) = \begin{cases} e^{-t} & \text{for } t \geq 0, \\ 0 & \text{for } t < 0. \end{cases}$$
Let us modify this just a little – we will actually estimate

$$S_{\eta}(\alpha, x) = \sum_{n} \Lambda(n) e(\alpha n) \eta(n/x), \tag{1.6}$$

where Λ is the von Mangoldt function (as in (1.5)). The use of α rather than −α is just
a bow to tradition, as is the use of the letter S (for “sum”); however, the use of Λ(n)
rather than just plain log p does actually simplify matters.
The function η here is sometimes called a smoothing function or simply a smoothing.
It will indeed be helpful for it to be smooth on (0, ∞), but, in principle, it need
not even be continuous. (Vinogradov’s work implicitly uses, in effect, the “brutal
truncation” $1_{[0,1]}(t)$, defined to be 1 when t ∈ [0, 1] and 0 otherwise; that would be fine for
the minor arcs, but, as it will become clear, it is a bad idea as far as the major arcs are
concerned.)
Assume α is on a major arc, meaning that we can write α = a/q + δ/x for some a/q
(q small) and some δ (with |δ| small). We can write $S_{\eta}(\alpha, x)$ as a linear combination

$$S_{\eta}(\alpha, x) = \sum_{\chi} c_{\chi} S_{\eta,\chi}\left(\frac{\delta}{x}, x\right) + \text{tiny error term}, \tag{1.7}$$

where

$$S_{\eta,\chi}\left(\frac{\delta}{x}, x\right) = \sum_{n} \Lambda(n) \chi(n) e(\delta n/x) \eta(n/x). \tag{1.8}$$

In (1.7), χ runs over primitive Dirichlet characters of moduli d|q, and $c_{\chi}$ is small
($|c_{\chi}| \leq \sqrt{d}/\phi(q)$).
Why are we expressing the sums $S_{\eta}(\alpha, x)$ in terms of the sums $S_{\eta,\chi}(\delta/x, x)$, which
look more complicated? The argument has become δ/x, whereas before it was α.
Here δ is relatively small – smaller than the constant $c_0 r$, in our setup. In other words,
e(δn/x) will go around the circle a bounded number of times as n goes from 1 up to a
constant times x (by which time η(n/x) has become small, because η is of fast decay).
This makes the sums much easier to estimate.
To estimate the sums $S_{\eta,\chi}$, we will use L-functions, together with one of the most
common tools of analytic number theory, the Mellin transform. This transform is
essentially a Laplace transform with a change of variables, and a Laplace transform, in
turn, is a Fourier transform taken on a vertical line in the complex plane. For f of fast
enough decay, the Mellin transform F = Mf of f is given by

$$F(s) = \int_{0}^{\infty} f(t) t^{s} \frac{dt}{t};$$

we can express f in terms of F by the Mellin inversion formula

$$f(t) = \frac{1}{2\pi i} \int_{\sigma - i\infty}^{\sigma + i\infty} F(s) t^{-s}\, ds$$

for any σ within an interval. We can thus express e(δt)η(t) in terms of its Mellin
transform $F_{\delta}$ and then use (1.5) to express $S_{\eta,\chi}$ in terms of $F_{\delta}$ and $L'(s, \chi)/L(s, \chi)$;
shifting the integral in the Mellin inversion formula to the left, we obtain what is known
in analytic number theory as an explicit formula:

$$S_{\eta,\chi}(\delta/x, x) = [\hat{\eta}(-\delta) x] - \sum_{\rho} F_{\delta}(\rho) x^{\rho} + \text{tiny error term}.$$

Here the term between brackets appears only for χ trivial. In the sum, ρ goes over all
non-trivial zeros of L(s, χ), and $F_{\delta}$ is the Mellin transform of e(δt)η(t). (The tiny error
term comes from a sum over the trivial zeros of L(s, χ).) We will obtain the estimate
we desire if we manage to show that the sum over ρ is small.
The point is this: if we verify GRH for L(s, χ) up to imaginary part H, i.e., if
we check that all zeroes ρ of L(s, χ) with $|\Im(\rho)| \leq H$ satisfy $\Re(\rho) = 1/2$, we have
$|x^{\rho}| = \sqrt{x}$ for every such ρ. In other words, $x^{\rho}$ is very small (compared to x). However, for any
ρ whose imaginary part has absolute value greater than H, we know next to nothing
about its real part, other than $0 \leq \Re(\rho) \leq 1$. (Zero-free regions are notoriously weak
for $\Im(\rho)$ large; we will not use them.) Hence, our only chance is to make sure that
$F_{\delta}(\rho)$ is very small when $|\Im(\rho)| \geq H$.
This has to be true for both δ very small (including the case δ = 0) and for δ not so
small (|δ| up to $c_0 r/q$, which can be large because r is a large constant). How can we
choose η so that $F_{\delta}(\rho)$ is very small in both cases for $\tau = \Im(\rho)$ large?
The method of stationary phase is useful as an exploratory tool here. In brief, it
suggests (and can sometimes prove) that the main contribution to the integral

$$F_{\delta}(s) = \int_{0}^{\infty} e(\delta t) \eta(t) t^{s} \frac{dt}{t} \tag{1.9}$$

can be found where the phase of the integrand has derivative 0. This happens when
$t = -\tau/2\pi\delta$ (for $\mathrm{sgn}(\tau) \neq \mathrm{sgn}(\delta)$); the contribution is then a moderate factor times
$\eta(-\tau/2\pi\delta)$. In other words, if $\mathrm{sgn}(\tau) \neq \mathrm{sgn}(\delta)$ and δ is not too small (|δ| ≥ 8, say),
$F_{\delta}(\sigma + i\tau)$ behaves like $\eta(-\tau/2\pi\delta)$; if δ is small (|δ| < 8), then $F_{\delta}$ behaves like $F_0$,
which is the Mellin transform Mη of η. Here is our goal, then: the decay of η(t) as
|t| → ∞ should be as fast as possible, and the decay of the transform Mη(σ + iτ)
should also be as fast as possible.
This is a classical dilemma, often called the uncertainty principle because it is the
mathematical fact underlying the physical principle of the same name: you cannot have
a function η that decreases extremely rapidly and whose Fourier transform (or, in this
case, its Mellin transform) also decays extremely rapidly.
What does “extremely rapidly” mean here? It means (as Hardy himself proved)
“faster than any exponential $e^{-Ct}$”. Thus, Hardy and Littlewood’s choice $\eta(t) = e^{-t}$
seems essentially optimal at first sight.
However, it is not optimal. We can choose η so that Mη decreases exponentially
(with a constant C somewhat worse than for $\eta(t) = e^{-t}$), but η decreases faster than
exponentially. This is a particularly appealing possibility because it is t/|δ|, and not so
much t, that risks being fairly small. (To be explicit: say we check GRH for characters
of modulus q up to $H_q \sim 50 \cdot c_0 r/q \geq 50|\delta|$. Then we only know that $|\tau/2\pi\delta| \gtrsim 8$.
So, for $\eta(t) = e^{-t}$, $\eta(-\tau/2\pi\delta)$ may be as large as $e^{-8}$, which is not negligible.
Indeed, since this term will be multiplied later by other terms, $e^{-8}$ is simply not small
enough. On the other hand, we can assume that $H_q \geq 200$ (say), and so $M\eta(s) \sim e^{-(\pi/2)|\tau|}$
is completely negligible, and will remain negligible even if we replace π/2
by a somewhat smaller constant.)
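The decay $e^{-(\pi/2)|\tau|}$ of the Mellin transform of $e^{-t}$ can be observed numerically. The sketch below approximates $F(s) = \int_0^{\infty} f(t) t^{s}\, dt/t$ by substituting $t = e^{u}$ and summing with a uniform step; the integration range and step size are assumptions of this sketch, chosen so that the neglected tails are far below double precision for $\Re(s) \geq 1/2$. It uses two classical closed forms as reference values: $M(e^{-t})(s) = \Gamma(s)$, with $|\Gamma(1/2 + i\tau)|^2 = \pi/\cosh(\pi\tau)$, and $M(e^{-t^2/2})(s) = 2^{s/2 - 1} \Gamma(s/2)$ for the Gaussian.

```python
import cmath
import math

def mellin(f, s, u_min=-80.0, u_max=6.0, h=0.05):
    """Approximate F(s) = int_0^inf f(t) t^s dt/t.
    After t = e^u this is int f(e^u) e^{s u} du over the real line, which is
    summed with uniform step h (a trapezoid rule whose endpoint terms are
    negligible for the rapidly decaying f used below)."""
    steps = int(round((u_max - u_min) / h))
    total = 0.0 + 0.0j
    for k in range(steps + 1):
        u = u_min + k * h
        total += f(math.exp(u)) * cmath.exp(s * u)
    return total * h

def exponential_weight(t):
    """Hardy and Littlewood's weight e^{-t} (for t >= 0)."""
    return math.exp(-t)

def gaussian_weight(t):
    """The Gaussian weight e^{-t^2/2}."""
    return math.exp(-t * t / 2)

# Real s: compare with the closed forms.
gamma_check = mellin(exponential_weight, 2.5)    # should be Gamma(2.5)
gauss_check = mellin(gaussian_weight, 3.0)       # should be sqrt(pi/2)

# On the critical line: |Gamma(1/2 + i tau)|^2 = pi / cosh(pi tau),
# i.e. decay like e^{-(pi/2)|tau|} in absolute value.
tau = 5.0
decay_check = abs(mellin(exponential_weight, 0.5 + 1j * tau)) ** 2
```

Already at τ = 5 the value of `decay_check` is below $10^{-6}$, which illustrates why a height $H_q \geq 200$ makes this factor negligible.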
We shall take $\eta(t) = e^{-t^2/2}$ (that is, the Gaussian). This is not the only possible
choice, but it is in some sense natural. It is easy to show that the Mellin transform $F_{\delta}$
for $\eta(t) = e^{-t^2/2}$ is a multiple of what is called a parabolic cylinder function U(a, z)
with imaginary values for z. There are plenty of estimates on parabolic cylinder
functions in the literature – but mostly for a and z real, in part because that is one of the
cases occurring most often in applications. There are some asymptotic expansions and
estimates for U(a, z), a, z general, due to Olver [Olv58], [Olv59], [Olv61], [Olv65],
but unfortunately they come without fully explicit error terms for a and z within our
range of interest. (The same holds for [TV03].)
In the end, I derived bounds for Fδ using the saddle-point method. (The method
of stationary phase, which we used to choose η, seems to lead to error terms that are
too large.) The saddle-point method consists, in brief, in changing the contour of an
integral to be bounded (in this case, (1.9)) so as to minimize the maximum of the
integrand. (To use a metaphor in [dB81]: find the lowest mountain pass.)
Here we strive to get clean bounds, rather than the best possible constants. Consider
the case k = 0 of Corollary 8.0.2; it states the following. For s = σ + iτ
with σ ∈ [0, 1] and $|\tau| \geq \max(100, 4\pi^2 |\delta|)$, we obtain that the Mellin transform $F_{\delta}$ of
η(t)e(δt) with $\eta(t) = e^{-t^2/2}$ satisfies

$$|F_{\delta}(s + k)| + |F_{\delta}((1 - s) + k)| \leq \begin{cases} 3.001\, e^{-0.1065 \left( \frac{2|\tau|}{|\ell|} \right)^2} & \text{if } 4|\tau|/\ell^2 < 3/2, \\ 3.286\, e^{-0.1598 |\tau|} & \text{if } 4|\tau|/\ell^2 \geq 3/2. \end{cases} \tag{1.10}$$

Similar bounds hold for σ in other ranges, thus giving us estimates on the Mellin
transform $F_{\delta}$ for $\eta(t) = t^k e^{-t^2/2}$ and σ in the critical range [0, 1]. (We could do a little
better if we knew the value of σ, but, in our applications, we do not, once we leave
the range in which GRH has been checked. We will give a bound (Theorem 8.0.1) that
does take σ into account, and also reflects and takes advantage of the fact that there
is a transitional region around $|\tau| \sim (3/2)(\pi/\delta)^2$; in practice, however, we will use
Cor. 8.0.2.)
A moment’s thought shows that we can also use (1.10) to deal with the Mellin
transform of η(t)e(δt) for any function of the form $\eta(t) = e^{-t^2/2} g(t)$ (or, more generally,
$\eta(t) = t^k e^{-t^2/2} g(t)$), where g(t) is any band-limited function. By a band-limited
function, we could mean a function whose Fourier transform is compactly supported;
while that is a plausible choice, it turns out to be better to work with functions that are
band-limited with respect to the Mellin transform – in the sense of being of the form

$$g(t) = \int_{-R}^{R} h(r) t^{-ir}\, dr,$$

where h : R → C is supported on a compact interval [−R, R], with R not too large (say
R = 200). What happens is that the Mellin transform of the product $e^{-t^2/2} g(t) e(\delta t)$
is a convolution of the Mellin transform $F_{\delta}(s)$ of $e^{-t^2/2} e(\delta t)$ (estimated in (1.10)) and
that of g(t) (supported in [−R, R]); the effect of the convolution is just to delay decay
of $F_{\delta}(s)$ by, at most, a shift by $y \mapsto y - R$.
We wish to estimate $S_{\eta,\chi}(\delta/x, x)$ for several functions η. This motivates us to derive
an explicit formula (§) general enough to work with all the weights η(t) we will work
with, while being also completely explicit, and free of any integrals that may be tedious
to evaluate.
Once that is done, and once we consider the input provided by Platt’s finite
verification of GRH up to $H_q$, we obtain simple bounds for different weights.
For $\eta(t) = e^{-t^2/2}$, $x \geq 10^8$, χ a primitive character of modulus q ≤ r = 300000,
and any δ ∈ R with |δ| ≤ 4r/q, we obtain

$$S_{\eta,\chi}\left(\frac{\delta}{x}, x\right) = I_{q=1} \cdot \hat{\eta}(-\delta) x + E \cdot x, \tag{1.11}$$

where $I_{q=1} = 1$ if q = 1, $I_{q=1} = 0$ if $q \neq 1$, and

$$|E| \leq 4.306 \cdot 10^{-22} + \frac{1}{\sqrt{x}} \left( \frac{650400}{\sqrt{q}} + 112 \right). \tag{1.12}$$

Here $\hat{\eta}$ stands for the Fourier transform from R to R normalized as follows:
$\hat{\eta}(t) = \int_{-\infty}^{\infty} e(-xt) \eta(x)\, dx$. Thus, $\hat{\eta}(-\delta)$ is just $\sqrt{2\pi}\, e^{-2\pi^2 \delta^2}$ (self-duality of the Gaussian).
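Both statements are easy to probe numerically. The sketch below checks the closed form $\hat{\eta}(-\delta) = \sqrt{2\pi}\, e^{-2\pi^2 \delta^2}$ by direct quadrature (the integration window and step are assumptions of this sketch) and evaluates the right-hand side of (1.12) for given x and q. The bound itself is only claimed under the hypotheses stated with (1.11); the helper names are hypothetical.

```python
import cmath
import math

def gaussian(t):
    """eta(t) = e^{-t^2/2}."""
    return math.exp(-t * t / 2)

def fourier_transform_on_R(eta, t, half_width=12.0, h=0.01):
    """hat eta(t) = int e(-x t) eta(x) dx, approximated by a uniform-step sum
    over [-half_width, half_width]; adequate for the Gaussian, whose tails
    beyond |x| = 12 are far below double precision."""
    steps = int(round(2 * half_width / h))
    total = 0.0 + 0.0j
    for k in range(steps + 1):
        x = -half_width + k * h
        total += cmath.exp(-2j * math.pi * x * t) * eta(x)
    return total * h

def major_arc_error_bound(x, q):
    """Right-hand side of (1.12), as a function of x and q."""
    return 4.306e-22 + (650400 / math.sqrt(q) + 112) / math.sqrt(x)

delta = 0.3
numeric = fourier_transform_on_R(gaussian, -delta)
closed_form = math.sqrt(2 * math.pi) * math.exp(-2 * math.pi ** 2 * delta ** 2)

# Size of the relative error term E for an illustrative x and the trivial character.
bound_at_1e30 = major_arc_error_bound(1e30, 1)
```

For x near $10^{30}$ and q = 1 the bound on |E| is below $10^{-9}$; the term with $650400/\sqrt{q}$ dominates until x is very large.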
This is one of the main results of Part II; see §7.1. Similar bounds are also proven
there for $\eta(t) = t^2 e^{-t^2/2}$, as well as for a weight of type $\eta(t) = t e^{-t^2/2} g(t)$, where
g(t) is a band-limited function, and also for a weight η defined by a multiplicative
convolution. The conditions on q (namely, q ≤ r = 300000) and δ are what we
expected from the outset.
Thus concludes our treatment of the major arcs. This is arguably the easiest part of
the proof; it was actually what I left for the end, as I was fairly confident it would work
out. Minor-arc estimates are more delicate; let us now examine them.

1.4 The minor arcs m


1.4.1 Qualitative goals and main ideas
What kind of bounds do we need? What is there in the literature?
We wish to obtain upper bounds on |Sη (α, x)| for some weight η and any α ∈ R/Z
not very close to a rational with small denominator. Every α is close to some rational
a/q; what we are looking for is a bound on |Sη (α, x)| that decreases rapidly when q
increases.
Moreover, we want our bound to decrease rapidly when δ increases, where α =
a/q + δ/x. In fact, the main terms in our bound will be decreasing functions of
max(1, |δ|/8) · q. (Let us write $\delta_0 = \max(2, |\delta|/4)$ from now on.) This will allow
our bound to be good enough outside narrow major arcs, which will get narrower and
narrower as q increases – that is, precisely the kind of major arcs we were presupposing
in our major-arc bounds.
It would be possible to work with narrow major arcs that become narrower as q
increases simply by allowing q to be very large (close to x), and assigning each angle
to the fraction closest to it. This is, in fact, the common procedure. However, this
makes matters more difficult, in that we would have to minimize at the same time the
factors in front of terms $x/q$, $x/\sqrt{q}$, etc., and those in front of terms $q$, $\sqrt{qx}$, and so
on. (These terms are being compared to the trivial bound x.) Instead, we choose to
strive for a direct dependence on δ throughout; this will allow us to cap q at a much
lower level, thus making terms such as $q$ and $\sqrt{qx}$ negligible. (This choice has been
taken elsewhere in applications of the circle method, but, strangely, seems absent from
previous work on the ternary Goldbach conjecture.)
How good must our bounds be? Since the major-arc bounds are valid only for
q ≤ r = 300000 and |δ| ≤ 4r/q, we cannot afford even a single factor of log x (or
any other function tending to ∞ as x → ∞) in front of terms such as $x/\sqrt{q|\delta_0|}$: a
factor like that would make the term larger than the trivial bound x if $q|\delta_0|$ is equal to
a constant (r, say) and x is very large. Apparently, there was no such “log-free bound”
with explicit constants in the literature, even though such bounds were considered to
be in principle feasible, and even though previous work ([Che85], [Dab96], [DR01],
[Tao14]) had gradually decreased the number of factors of log x. (In limited ranges for
q, there were log-free bounds without explicit constants; see [Dab96], [Ram10]. The
estimate in [Vin54, Thm. 2a, 2b] was almost log-free, but not quite. There were also
bounds [Kar93], [But11] that used L-functions, and thus were not really useful in a
truly minor-arc regime.)

It also seemed clear that a main bound proportional to $(\log q)^2 x/\sqrt{q}$ (as in [Tao14])
was too large. At the same time, it was not really necessary to reach a bound of the
best possible form that could be found through Vinogradov’s basic approach, namely

$$|S_{\eta}(\alpha, x)| \leq C \frac{x \sqrt{q}}{\phi(q)}. \tag{1.13}$$
Such a bound had been proven by Ramaré [Ram10] for q in a limited range and C
non-explicit; later, in [Ramc] – which postdates the first version of [Helb] – Ramaré
broadened the range to $q \leq x^{1/48}$ and gave an explicit value for C, namely, C = 13000.
Such a bound is a notable achievement, but, unfortunately, it is not useful for our
purposes. Rather, we will aim at a bound whose main term is bounded by a constant
around 1 times $x (\log \delta_0 q)/\sqrt{\delta_0 \phi(q)}$; this is slightly worse asymptotically than (1.13),
but it is much better in the delicate range of $\delta_0 q \sim 300000$, and in fact for a much
wider range as well.

***

We see that we have several tasks. One of them is the removal of logarithms: we
cannot afford a single factor of log x, and, in practice, we can afford at most one factor
of log q. Removing logarithms will be possible in part because of the use of previously
existing efficient techniques (the large sieve for sequences with prime support) but also
because we will be able to find cancellation at several places in sums coming from a
combinatorial identity (namely, Vaughan’s identity). The task of finding cancellation
is particularly delicate because we cannot afford large constants or, for that matter,
statements valid only for large x. (Bounding a sum such as $\sum_n \mu(n)$ efficiently, where
μ is the Möbius function

$$\mu(n) = \begin{cases} (-1)^k & \text{if } n = p_1 p_2 \cdots p_k, \text{ all } p_i \text{ distinct,} \\ 0 & \text{if } p^2 | n \text{ for some prime } p, \end{cases}$$

is harder than estimating a sum such as $\sum_n \Lambda(n)$ equally efficiently, even though we
are used to thinking of the two problems as equivalent.)
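For orientation, the two sums mentioned here can be tabulated for small x. The sketch below computes μ with a sieve and then the partial sums $\sum_{n \leq x} \mu(n)$ and $\sum_{n \leq x} \Lambda(n)$; the function names are choices of this sketch, and nothing here reflects the explicit estimates used in the proof.

```python
import math

def mobius_sieve(limit):
    """Return a list mu with mu[n] = mu(n) for 1 <= n <= limit (mu[0] is 0)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]          # one more prime factor: flip the sign
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0               # p^2 divides m: mu(m) = 0
    return mu

def mobius_partial_sum(x):
    """Sum of mu(n) over 1 <= n <= x."""
    return sum(mobius_sieve(x)[1:])

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

def von_mangoldt_partial_sum(x):
    """Sum of Lambda(n) over n <= x, i.e. log p summed over prime powers p^k <= x."""
    total = 0.0
    for p in range(2, x + 1):
        if is_prime(p):
            power = p
            while power <= x:
                total += math.log(p)
                power *= p
    return total

mu = mobius_sieve(100)
```

For instance, the sum of μ(n) up to 100 equals 1, while the sum of Λ(n) up to 10 equals log 2520, the logarithm of the least common multiple of 1, ..., 10.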
We have said that our bounds will improve as |δ| increases. This dependence on
δ will be secured in different ways at different places. Sometimes δ will appear as
an argument, as in $\hat{\eta}(-\delta)$; for η piecewise continuous with $\eta' \in L_1$, we know that
$|\hat{\eta}(t)| \to 0$ as |t| → ∞. Sometimes we will obtain a dependence on δ by using several
different rational approximations to the same α ∈ R. Lastly, we will obtain a good
dependence on δ in bilinear sums by supplying a scattered input to a large sieve.
If there is a main moral to the argument, it lies in the close relation between the
circle method and the large sieve. The circle method rests on the estimation of an
integral involving a Fourier transform $\hat{f} : \mathbb{R}/\mathbb{Z} \to \mathbb{C}$; as we will later see, this leads
naturally to estimating the $\ell_2$-norm of $\hat{f}$ on subsets (namely, unions of arcs) of the circle
R/Z. The large sieve can be seen as an approximate discrete version of Plancherel’s
identity, which states that $\int_{\mathbb{R}/\mathbb{Z}} |\hat{f}(\alpha)|^2\, d\alpha = \sum_n |f(n)|^2$.
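To see what "approximate discrete version" can mean, here is a minimal sketch comparing Plancherel's identity with one classical form of the large sieve inequality: for f supported on N consecutive integers, $\sum_{q \leq Q} \sum_{(a,q)=1} |\hat{f}(a/q)|^2 \leq (N + Q^2) \sum_n |f(n)|^2$. This particular statement and the toy parameters N = 100, Q = 10 are brought in for illustration; they are not the refined versions used later in the text.

```python
import cmath
import math

def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)

def fourier_transform(f, alpha):
    """hat f(alpha) = sum_n f(n) e(-alpha*n), with f a dict n -> f(n)."""
    return sum(value * e(-alpha * n) for n, value in f.items())

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

N = 100   # f is supported on the N integers 1..N (toy value)
Q = 10    # denominators q <= Q (toy value)

# A function with prime support: f(p) = log p.
f = {p: math.log(p) for p in range(2, N + 1) if is_prime(p)}
l2_norm_squared = sum(v * v for v in f.values())

# Plancherel: the integral of |hat f|^2 over R/Z, computed exactly (up to
# rounding) by a Riemann sum with K > N equally spaced points.
K = 128
plancherel_sum = sum(abs(fourier_transform(f, k / K)) ** 2 for k in range(K)) / K

# Large sieve: sample |hat f|^2 only at the fractions a/q, q <= Q, (a, q) = 1.
large_sieve_sum = sum(abs(fourier_transform(f, a / q)) ** 2
                      for q in range(1, Q + 1)
                      for a in range(q) if math.gcd(a, q) == 1)
large_sieve_bound = (N + Q * Q) * l2_norm_squared
```

The first sum reproduces $\sum_n |f(n)|^2$; the second stays below the large sieve bound even though it samples $|\hat{f}|^2$ exactly where it is largest.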
Both in this section and in §1.5, we shall use the large sieve in part so as to use
the fact that some of the functions we work with have prime support, i.e., are non-zero
only on prime numbers. There are ways to use prime support to improve the output
of the large sieve. In §1.5, these techniques will be refined and then translated to the
context of the circle method, where f has (essentially) prime support and $|\hat{f}|^2$ must be
integrated over unions of arcs. (This allows us to remove a logarithm.) The main point
is that the large sieve is not being used as a black box; rather, we can adapt ideas from
(say) the large-sieve context and apply them to the circle method.
Lastly, there are the benefits of a continuous η. Hardy and Littlewood already
used a continuous η; this was abandoned by Vinogradov, presumably for the sake of
simplicity. The idea that smooth weights η can be superior to sharp truncations is
now commonplace. As we shall see, using a continuous η is helpful in the minor-arcs
regime, but not as crucial there as for the major arcs. We will not use a smooth η; we
will prove our estimates for any continuous η that is piecewise $C^1$, and then, towards
the end, we will choose to use the same weight $\eta = \eta_2$ as in [Tao14], in part because it
has compact support, and in part for the sake of comparison. The moral here is not quite
the common dictum “always smooth”, but rather that different kinds of smoothing can
be appropriate for different tasks; in the end, we will show how to coordinate different
smoothing functions η.
There are other ideas involved; for instance, some of Vinogradov’s lemmas are
improved. Let us now go into some of the details.

1.4.2 Combinatorial identities


Generally, since Vinogradov, a treatment of the minor arcs starts with a combinatorial
identity expressing Λ(n) (or the characteristic function of the primes) as a sum of two
or more convolutions. (In this section, by a convolution f ∗ g, we will mean the Dirichlet
convolution $(f * g)(n) = \sum_{d | n} f(d) g(n/d)$, i.e., the multiplicative convolution on the
semigroup of positive integers.)
In some sense, the archetypical identity is

$$\Lambda = \mu * \log,$$

but it will not usually do: the contribution of μ(d) log(n/d) with d close to n is too
difficult to estimate precisely. There are alternatives: for example, there is the identity

$$\Lambda \cdot \log = \mu * \log^2 - \Lambda * \Lambda, \tag{1.14}$$

which underlies an estimate of Selberg’s that, in turn, is the basis for the Erdős-Selberg
proof of the prime number theorem; see, e.g., [MV07, §8.2]. More generally, one
can decompose $\Lambda(n) (\log n)^k$ as $\mu * \log^{k+1}$ minus a linear combination of convolutions;
this kind of decomposition – really just a direct consequence of the development
of $(\zeta'(s)/\zeta(s))^{(k)}$ – will be familiar to some from the exposition of Bombieri’s
work [Bom76] in [FI10, §3] (for instance). Another useful identity was that used by
Daboussi [Dab96]; witness its application in [DR01], which gives explicit estimates on
exponential sums over primes.
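The two identities displayed above can be verified numerically for small n. The sketch below implements Dirichlet convolution naively (looping over all divisors) and checks Λ = μ ∗ log and (1.14); the helper names are choices of this sketch.

```python
import math

def von_mangoldt(n):
    """Lambda(n) = log p if n is a power of the prime p, else 0."""
    if n < 2:
        return 0.0
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)

def mobius(n):
    """mu(n) by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # p^2 divides the original n
            result = -result
        p += 1
    return -result if n > 1 else result

def dirichlet_convolution(f, g):
    """Return the function n -> sum over d | n of f(d) g(n/d)."""
    def h(n):
        return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)
    return h

def log_squared(n):
    return math.log(n) ** 2

mu_star_log = dirichlet_convolution(mobius, math.log)
mu_star_log2 = dirichlet_convolution(mobius, log_squared)
lambda_star_lambda = dirichlet_convolution(von_mangoldt, von_mangoldt)

# Largest deviation from Lambda = mu * log for n <= 150.
error_archetypical = max(abs(mu_star_log(n) - von_mangoldt(n))
                         for n in range(1, 151))
# Largest deviation from (1.14): Lambda . log = mu * log^2 - Lambda * Lambda.
error_selberg = max(abs(von_mangoldt(n) * math.log(n)
                        - (mu_star_log2(n) - lambda_star_lambda(n)))
                    for n in range(1, 151))
```

Both errors are at the level of floating-point rounding.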
The proof of Vinogradov’s three-prime result was simplified substantially [Vau77b]
by the introduction of Vaughan’s identity:

$$\Lambda = \mu_{\leq U} * \log - \Lambda_{\leq V} * \mu_{\leq U} * 1 + 1 * \mu_{>U} * \Lambda_{>V} + \Lambda_{\leq V}, \tag{1.15}$$

where we are using the notation

$$f_{\leq W}(n) = \begin{cases} f(n) & \text{if } n \leq W, \\ 0 & \text{if } n > W, \end{cases} \qquad f_{>W}(n) = \begin{cases} 0 & \text{if } n \leq W, \\ f(n) & \text{if } n > W. \end{cases}$$

Of the resulting sums ($\sum_n (\mu_{\leq U} * \log)(n) e(\alpha n) \eta(n/x)$, etc.), the first three are said
to be of type I, type I (again) and type II; the last sum, $\sum_{n \leq V} \Lambda(n)$, is negligible.
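Vaughan's identity (1.15) holds for every n and every choice of U and V, and this is straightforward to confirm by brute force. The sketch below evaluates the right-hand side term by term; the truncation helpers `at_most` and `above` are hypothetical names standing for $f_{\leq W}$ and $f_{>W}$, and the parameter pairs tested are arbitrary.

```python
import math

def von_mangoldt(n):
    """Lambda(n) = log p if n is a power of the prime p, else 0."""
    if n < 2:
        return 0.0
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)

def mobius(n):
    """mu(n) by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def dirichlet_convolution(f, g):
    """Return the function n -> sum over d | n of f(d) g(n/d)."""
    def h(n):
        return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)
    return h

def at_most(f, W):
    """f_{<=W}: equal to f on n <= W and to 0 on n > W."""
    return lambda n: f(n) if n <= W else 0.0

def above(f, W):
    """f_{>W}: equal to 0 on n <= W and to f on n > W."""
    return lambda n: f(n) if n > W else 0.0

def one(n):
    return 1.0

def vaughan_right_side(n, U, V):
    """Right-hand side of (1.15) evaluated at n."""
    conv = dirichlet_convolution
    mu_small, mu_large = at_most(mobius, U), above(mobius, U)
    lam_small, lam_large = at_most(von_mangoldt, V), above(von_mangoldt, V)
    first_type_one = conv(mu_small, math.log)(n)
    second_type_one = conv(conv(lam_small, mu_small), one)(n)
    type_two = conv(conv(one, mu_large), lam_large)(n)
    return first_type_one - second_type_one + type_two + lam_small(n)

# Largest deviation from Lambda(n) over n <= 120 and a few choices of (U, V).
max_error = max(abs(vaughan_right_side(n, U, V) - von_mangoldt(n))
                for (U, V) in [(1, 1), (3, 5), (10, 4)]
                for n in range(1, 121))
```

The deviation is at rounding level for every pair tried, which reflects the flexibility in U and V that the text goes on to discuss.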
One of the advantages of Vaughan’s identity is its flexibility: we can set U and V
to whatever values we wish. Its main disadvantage is that it is not “log-free”, in that it
seems to impose the loss of two factors of log x: if we sum each side of (1.15) from 1
to x, we obtain $\sum_{n \le x} \Lambda(n) \sim x$ on the left side, whereas, if we bound the sum on the
right side without the use of cancellation, we obtain a bound of $x (\log x)^2$. Of course,
we will obtain some cancellation from the phase e(αn); still, even if this gives us a
factor of, say, $1/\sqrt{q}$, we will get a bound of $x (\log x)^2/\sqrt{q}$, which is worse than the
trivial bound x for q bounded and x large. Since we want a bound that is useful for all
q larger than the constant r and all x larger than a constant, this will not do.
As was pointed out in [Tao14], it is possible to get a factor of $(\log q)^2$ instead of a
factor of $(\log x)^2$ in the type II sums by setting U and V appropriately. Unfortunately,
a factor of $(\log q)^2$ is still too large in practice, and there is also the issue of factors of
log x in type I sums.
Vinogradov had already managed to get an essentially log-free result (by a rather
difficult procedure) in [Vin54, Ch. IX]. The result in [Dab96] is log-free. Unfortunately,
the explicit result in [DR01] – the study of which encouraged me at the beginning
of the project – is not. For a while, I worked with the case k = 2 of the expansion
of $(\zeta'(s)/\zeta(s))^{(k)}$, which gives
$$\Lambda \cdot \log^2 = \mu * \log^3 - 3 \cdot (\Lambda \cdot \log) * \Lambda - \Lambda * \Lambda * \Lambda. \tag{1.16}$$

This identity is essentially log-free: while a trivial bound on the sum of the right side
for n from 1 to N does seem to have two extra factors of log, they are present only in
the term $\mu * \log^3$, which is not the hardest one to estimate. Ramaré obtained a log-free
bound in [Ram10] using an identity introduced by Diamond and Steinig in the course
of their own work on elementary proofs of the prime number theorem [DS70]; that
identity gives a decomposition for $\Lambda \cdot \log^k$ that can also be derived from the expansion
of $(\zeta'(s)/\zeta(s))^{(k)}$, by a clever grouping of terms.
In the end, I decided to use Vaughan’s identity, motivated in part by [Tao14], and
in part by the lack of free parameters in (1.16); as can be seen in (1.15), Vaughan’s
identity has two parameters U , V that we can set to whatever values we think best. The
form of the identity allowed me to reuse much of my work up to that point, but it also
posed a challenge: since Vaughan’s identity is by no means log-free, one has to obtain
cancellation in Vaughan’s identity at every possible step, beyond the cancellation given
by the phase e(αn). (The presence of a phase, in fact, makes the task of getting
cancellation from the identity more complicated.) The removal of logarithms will be one
of our main tasks in what follows. It is clear that the presence of the Möbius function
µ should give, in principle, some cancellation; we will show how to use it to obtain as
much cancellation as we need – with good constants, and not just asymptotically.

1.4.3 Type I sums


There are two type I sums, namely,
$$\sum_{m \le U} \mu(m) \sum_n (\log n)\, e(\alpha m n)\, \eta\left(\frac{mn}{x}\right) \tag{1.17}$$
and
$$\sum_{v \le V} \Lambda(v) \sum_{u \le U} \mu(u) \sum_n e(\alpha v u n)\, \eta\left(\frac{vun}{x}\right). \tag{1.18}$$
In either case, $\alpha = a/q + \delta/x$, where q is larger than a constant r and $|\delta/x| \le 1/qQ_0$
for some $Q_0 > \max(q, \sqrt{x})$. For the purposes of this exposition, we will set it as our
task to estimate the slightly simpler sum
$$\sum_{m \le D} \mu(m) \sum_n e(\alpha m n)\, \eta\left(\frac{mn}{x}\right), \tag{1.19}$$
where D can be U or UV or something else less than x.


Why can we consider this simpler sum without omitting anything essential? It is
clear that (1.17) is of the same kind as (1.19). The inner double sum in (1.18) is just
(1.19) with αv instead of α; this enables us to estimate (1.18) by means of (1.19) for q
small, i.e., the more delicate case. If q is not small, then the approximation αv ∼ av/q
may not be accurate enough. In that case, we collapse the two outer sums in (1.18) into
a sum $\sum_n (\Lambda_{\le V} * \mu_{\le U})(n)$, and treat all of (1.18) much as we will treat (1.19); since
q is not small, we can afford to bound $(\Lambda_{\le V} * \mu_{\le U})(n)$ trivially (by log n) in the less
sensitive terms.
Let us first outline Vinogradov’s procedure for bounding type I sums. Just by
summing a geometric series, we get
$$\left| \sum_{n \le N} e(\alpha n) \right| \le \min\left(N, \frac{c}{\{\alpha\}}\right), \tag{1.20}$$
where c is a constant and {α} is the distance from α to the nearest integer. Vinogradov
splits the outer sum in (1.19) into sums of length q. When m runs on an interval of
length q, the angle am/q runs through all fractions of the form b/q; due to the error
δ/x, αm could be close to 0 for two values of m, but otherwise {αm} takes values
bounded below by 1/q (twice), 2/q (twice), 3/q (twice), etc. Thus
$$\sum_{y < m \le y+q} \left| \mu(m) \sum_{n \le N} e(\alpha m n) \right|
\le \sum_{y < m \le y+q} \left| \sum_{n \le N} e(\alpha m n) \right|
\le \frac{2N}{m} + 2cq \log eq \tag{1.21}$$
for any y ≥ 0.
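The bound (1.20) is easy to test numerically. The sketch below assumes the constant c = 1/2, which follows from $|\sum_{n \le N} e(\alpha n)| \le 1/|\sin \pi\alpha|$ and $|\sin \pi\alpha| \ge 2\{\alpha\}$; the text itself leaves c unspecified, and the function names are illustrative.

```python
import cmath
import math


def e(t):
    """The additive character e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * math.pi * t)


def dist_to_nearest_integer(alpha):
    """{alpha} in the notation of (1.20): distance from alpha to the nearest integer."""
    return abs(alpha - round(alpha))


def exponential_sum(alpha, N):
    """The sum of e(alpha n) over 1 <= n <= N."""
    return sum(e(alpha * n) for n in range(1, N + 1))


def geometric_bound(alpha, N, c=0.5):
    """Right side of (1.20), with the assumed constant c = 1/2."""
    d = dist_to_nearest_integer(alpha)
    return N if d == 0 else min(N, c / d)
```

For α close to an integer the minimum is attained by N, which is exactly the situation that the terms with q | m create in the discussion that follows.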
There are several ways to improve this. One is simply to estimate the inner sum
more precisely; this was already done in [DR01]. One can also define a smoothing
function η, as in (1.19); it is easy to get
$$\left| \sum_{n \le N} e(\alpha n)\, \eta\left(\frac{n}{x}\right) \right|
\le \min\left( x|\eta|_1 + \frac{|\eta'|_1}{2},\; \frac{|\eta'|_1}{2 |\sin(\pi\alpha)|},\; \frac{|\widehat{\eta''}|_\infty}{4x (\sin \pi\alpha)^2} \right).$$
Except for the third term, this is as in [Tao14]. We could also choose carefully which
bound to use for each m; surprisingly, this gives an improvement – in fact, an important
one, for m large. However, even with these improvements, we still have a term
proportional to N/m as in (1.21), and this contributes about (x log x)/q to the sum
(1.19), thus giving us an estimate that is not log-free.
What we have to do, naturally, is to take out the terms with q|m for m small. (If m
is large, then those may not be the terms for which mα is close to 0; we will later see
what to do.) For y + q ≤ Q/2, |α − a/q| ≤ 1/qQ, we get that
$$\sum_{\substack{y < m \le y+q \\ q \nmid m}} \min\left( A, \frac{B}{|\sin \pi\alpha m|}, \frac{C}{|\sin \pi\alpha m|^2} \right) \tag{1.22}$$
is at most
$$\min\left( \frac{20}{3\pi^2} C q^2,\; 2A + \frac{4q}{\pi} \sqrt{AC},\; \frac{2Bq}{\pi} \max\left(2, \log \frac{C e^3 q}{B\pi}\right) \right). \tag{1.23}$$

This is satisfactory. We are left with all the terms m ≤ M = min(D, Q/2) with q|m
– and also with all the terms Q/2 < m ≤ D. For m ≤ M divisible by q, we can
estimate (as opposed to just bound from above) the inner sum in (1.19) by the Poisson
summation formula, and then sum over m, but without taking absolute values; writing
m = aq, we get a main term
$$\frac{x \mu(q)}{q} \cdot \widehat{\eta}(-\delta) \cdot \sum_{\substack{a \le M/q \\ (a,q)=1}} \frac{\mu(a)}{a}, \tag{1.24}$$
where (a, q) stands for the greatest common divisor of a and q.


It is clear that we have to get cancellation over µ here. There is an elegant
elementary argument [GR96] showing that the absolute value of the sum in (1.24) is at most
1. We need to gain one more log, however. Ramaré [Ramb] helpfully furnished the
following bound:
$$\left| \sum_{\substack{a \le x \\ (a,q)=1}} \frac{\mu(a)}{a} \right| \le \frac{4}{5} \frac{q}{\phi(q)} \frac{1}{\log x/q} \tag{1.25}$$
for q ≤ x. (Cf. [EM95], [EM96].) This is neither trivial nor elementary.⁵ We are, so to
speak, allowed to use non-elementary means (that is, methods based on L-functions)
because the only L-function we need to use here is the Riemann zeta function.
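To get a feeling for how much room (1.25) leaves, one can tabulate both sides for small q and x. This is only an illustrative sketch (the helper names are assumptions, and a numerical check over a small range is of course no substitute for Ramaré’s proof); it also tracks the elementary bound by 1 mentioned above.

```python
from math import gcd, log


def mobius(n):
    """Moebius function by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0            # p^2 divides the original n
            result = -result
        p += 1
    return -result if n > 1 else result


def phi(n):
    """Euler's totient function by trial division."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result


def coprime_mobius_sum(x, q):
    """Sum of mu(a)/a over 1 <= a <= x with gcd(a, q) = 1."""
    return sum(mobius(a) / a for a in range(1, x + 1) if gcd(a, q) == 1)


def ramare_bound(x, q):
    """Right side of (1.25); only meaningful for x > q."""
    return 0.8 * (q / phi(q)) / log(x / q)
```

For x close to q the right side of (1.25) is large and the elementary bound by 1 is the stronger of the two; the gain of 1/log(x/q) matters once x/q is large.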
What shall we do for m > Q/2? We can always give a bound
$$\sum_{y < m \le y+q} \min\left( A, \frac{C}{|\sin \pi\alpha m|^2} \right) \le 3A + \frac{4q}{\pi} \sqrt{AC} \tag{1.26}$$
for y arbitrary; since AC will be of constant size, $(4q/\pi)\sqrt{AC}$ is pleasant enough, but
the contribution of $3A \sim 3|\eta|_1 x/y$ is nasty (it adds a multiple of (x log x)/q to the
total) and seems unavoidable: the values of m for which αm is close to 0 no longer
correspond to the congruence class m ≡ 0 mod q, and thus cannot be taken out.
The solution is to switch approximations. (The idea of using different approximations
to the same α is neither new nor recent in the general context of the circle method:
see [Vau97, §2.8, Ex. 2]. What may be new is its use to clear a hurdle in type I sums.)
What does this mean? If α were exactly, or almost exactly, a/q, then there would be
no other very good approximations in a reasonable range. However, note that we can
define $Q = \lfloor x/|\delta q| \rfloor$ for α = a/q + δ/x, and still have |α − a/q| ≤ 1/qQ. If δ is very
small, Q will be larger than 2D, and there will be no terms with Q/2 < m ≤ D to
worry about.
⁵ The current state of knowledge may seem surprising: after all, we expect nearly square-root
cancellation – for instance, $|\sum_{n \le x} \mu(n)/n| \le \sqrt{2/x}$ holds for all real $0 < x \le 10^{12}$ (see also the
stronger bound in [Dre93]). The classical zero-free region of the Riemann zeta function ought to give a
factor of $\exp(-\sqrt{(\log x)/c})$, which looks much better than 1/log x. What happens is that (a) such a factor
is not actually much better than 1/log x for $x \sim 10^{30}$, say; (b) estimating sums involving the Möbius
function by means of an explicit formula is harder than estimating sums involving Λ(n): the residues of 1/ζ(s)
at the non-trivial zeros of ζ(s) come into play. As a result, getting non-trivial explicit results on sums of µ(n)
is harder than one would naively expect from the quality of classical effective (but non-explicit) results. See
[Rama] for a survey of explicit bounds.

What happens if δ is not very small? We know that, for any Q′, there is an
approximation a′/q′ to α with |α − a′/q′| ≤ 1/q′Q′ and q′ ≤ Q′. However, for Q′ > Q, we
know that a′/q′ cannot equal a/q: by the definition of Q, the approximation a/q is not
good enough, i.e., |α − a/q| ≤ 1/qQ′ does not hold. Since a/q ≠ a′/q′, we see that
|a/q − a′/q′| ≥ 1/qq′, and this implies that q′ ≥ (ε/(1 + ε))Q.
Thus, for m > Q/2, the solution is to apply (1.26) with a′/q′ instead of a/q. The
contribution of A fades into insignificance: for the first sum over a range y < m ≤
y + q′, y ≥ Q/2, it contributes at most x/(Q/2), and all the other contributions of A
sum up to at most a constant times (x log x)/q′.
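The switch of approximations can be made concrete with continued fractions. In the sketch below, the helper names, the sample values of a, q, δ, x, and the choice Q′ = 2Q (that is, ε = 1, under the reading Q′ = (1 + ε)Q, which is an assumption made here for illustration) are not from the text. The last convergent with denominator at most Q always satisfies the Dirichlet-type inequality |α − a/q| ≤ 1/qQ.

```python
from fractions import Fraction


def convergents(alpha):
    """Yield the continued-fraction convergents of a rational number alpha."""
    p_prev, q_prev, p_cur, q_cur = 0, 1, 1, 0
    x = alpha
    while True:
        a = x.numerator // x.denominator        # floor of x
        p_prev, q_prev, p_cur, q_cur = (
            p_cur, q_cur, a * p_cur + p_prev, a * q_cur + q_prev)
        yield Fraction(p_cur, q_cur)
        if x == a:
            return
        x = 1 / (x - a)


def approximation(alpha, Q):
    """Last convergent a/q with q <= Q; it satisfies |alpha - a/q| <= 1/(qQ)."""
    best = None
    for c in convergents(alpha):
        if c.denominator > Q:
            break
        best = c
    return best


# Illustrative values only: alpha = a/q + delta/x with a/q = 1/3.
x, a, q, delta = 10**6, 1, 3, 50
alpha = Fraction(a, q) + Fraction(delta, x)
Q = x // (delta * q)                    # Q = floor(x / |delta q|)
switched = approximation(alpha, 2 * Q)  # approximation for Q' = 2Q
```

With these values, a/q = 1/3 is still good enough for Q but not for Q′ = 2Q, and the approximation attached to Q′ has a denominator of size comparable to Q, as in the argument above.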
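Proceeding in this way, we obtain a total bound for (1.19) whose main terms are
proportional to
$$\frac{1}{\phi(q)} \frac{x}{\log \frac{x}{q}} \min\left(1, \frac{1}{\delta^2}\right), \qquad
\frac{2}{\pi} \sqrt{|\widehat{\eta''}|_\infty} \cdot D \qquad \text{and} \qquad
q \log \max\left(\frac{D}{q}, q\right), \tag{1.27}$$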

with good, explicit constants. The first term – usually the largest one – is precisely what
we needed: it is proportional to (1/φ(q))x/ log x for q small, and decreases rapidly as
|δ| increases.

1.4.4 Type II, or bilinear, sums


We must now bound
$$S = \sum_m (1 * \mu_{>U})(m) \sum_{n > V} \Lambda(n)\, e(\alpha m n)\, \eta(mn/x).$$
At this point it is convenient to assume that η is the Mellin convolution of two functions.
The multiplicative or Mellin convolution on $\mathbb{R}^+$ is defined by
$$(\eta_0 *_M \eta_1)(t) = \int_0^\infty \eta_0(r)\, \eta_1\left(\frac{t}{r}\right) \frac{dr}{r}.$$
Tao [Tao14] takes $\eta = \eta_2 = \eta_1 *_M \eta_1$, where $\eta_1$ is a brutal truncation, viz., the
function taking the value 2 on [1/2, 1] and 0 elsewhere. We take the same $\eta_2$, in part
for comparison purposes, and in part because this will allow us to use off-the-shelf
estimates on the large sieve. (Brutal truncations are rarely optimal in principle, but, as
they are very common, results for them have been carefully optimized in the literature.)
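For this particular η₁ the Mellin convolution can be written in closed form: a direct computation from the definition gives η₂(t) = 4 max(log 2 − |log 2t|, 0), supported on [1/4, 1]. The sketch below compares that closed form with a crude midpoint rule; the function names and the step count are illustrative assumptions.

```python
from math import exp, log


def eta1(t):
    """The brutal truncation: 2 on [1/2, 1], 0 elsewhere."""
    return 2.0 if 0.5 <= t <= 1.0 else 0.0


def eta2_numeric(t, steps=20000):
    """Midpoint rule for (eta1 *_M eta1)(t), after the substitution r = e^u.

    Since eta1(r) vanishes outside [1/2, 1], u ranges over [-log 2, 0],
    and dr/r becomes du.
    """
    lo, hi = -log(2.0), 0.0
    du = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        r = exp(lo + (k + 0.5) * du)
        total += eta1(r) * eta1(t / r) * du
    return total


def eta2_closed(t):
    """Closed form of eta1 *_M eta1 for t > 0."""
    return 4.0 * max(log(2.0) - abs(log(2.0 * t)), 0.0)
```

The closed form makes visible that η₂ is continuous but not differentiable at t = 1/4, 1/2 and 1, which is the price of working with brutal truncations.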
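Clearly
$$S = \int_V^{x/U} \sum_m \Biggl( \sum_{\substack{d > U \\ d | m}} \mu(d) \Biggr) \eta_1\left(\frac{m}{x/W}\right)
\cdot \sum_{n \ge V} \Lambda(n)\, e(\alpha m n)\, \eta_1\left(\frac{n}{W}\right) \frac{dW}{W}. \tag{1.28}$$
By Cauchy-Schwarz, the integrand is at most $\sqrt{S_1(U, W) S_2(V, W)}$, where
$$\begin{aligned}
S_1(U, W) &= \sum_{\frac{x}{2W} < m \le \frac{x}{W}} \Biggl| \sum_{\substack{d > U \\ d | m}} \mu(d) \Biggr|^2, \\
S_2(V, W) &= \sum_{\frac{x}{2W} \le m \le \frac{x}{W}} \Biggl| \sum_{\max(V, \frac{W}{2}) \le n \le W} \Lambda(n)\, e(\alpha m n) \Biggr|^2.
\end{aligned} \tag{1.29}$$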

We must bound $S_1(U, W)$ by a constant times x/W. We are able to do this – with
a good constant. (A careless bound would have given a multiple of $(x/U) \log^3(x/U)$,
which is much too large.) First, we reduce $S_1(U, W)$ to an expression involving an
integral of
$$\sum_{r_1 \le x} \sum_{\substack{r_2 \le x \\ (r_1, r_2) = 1}} \frac{\mu(r_1) \mu(r_2)}{\sigma(r_1) \sigma(r_2)}. \tag{1.30}$$
We can bound (1.30) by the use of bounds on $\sum_{n \le t} \mu(n)/n$, combined with the
estimation of infinite products by means of approximations to ζ(s) for $s \to 1^+$. After
some additional manipulations, we obtain a bound for $S_1(U, W)$ whose main term is
at most $(3/\pi^2)(x/W)$ for each W, and closer to 0.22482x/W on average over W.
(This is as good a point as any to say that, throughout, we can use a trick in [Tao14]
that allows us to work with odd values of integer variables, instead of letting
m or n range over all integers. Here, for instance, if m and n are restricted to be odd,
we obtain a bound of $(2/\pi^2)(x/W)$ for individual W, and 0.15107x/W on average
over W. This is so even though we are losing some cancellation in µ by the restriction.)
Let us now bound S2 (V, W ). This is traditionally done by Linnik’s dispersion
method. However, it should be clear that the thing to do nowadays is to use a large
sieve, and, more specifically, a large sieve for primes; that kind of large sieve is nothing
other than a tool for estimating expressions such as S2 (V, W ). (Incidentally, even
though we are trying to save every factor of log we can, we choose not to use small
sieves at all, either here or elsewhere.) In order to take advantage of prime support, we
use Montgomery’s inequality ([Mon68], [Hux72]; see the expositions in [Mon71, pp.
27–29] and [IK04, §7.4]) combined with Montgomery and Vaughan’s large sieve with
weights [MV73, (1.6)], following the general procedure in [MV73, (1.6)]. We obtain a
bound of the form
$$\frac{\log W}{\log \frac{W}{2q}} \left( \frac{x}{4\phi(q)} + \frac{qW}{\phi(q)} \right) \frac{W}{2} \tag{1.31}$$
on $S_2(V, W)$, where, of course, we can also choose not to gain a factor of log W/2q if
q is close to or greater than W.
It remains to see how to gain a factor of |δ| in the major arcs, and more specifically
in S2 (V, W ). To explain this, let us step back and take a look at what the large sieve is.
Given a civilized function $f : \mathbb{Z} \to \mathbb{C}$, Plancherel’s identity tells us that
$$\int_{\mathbb{R}/\mathbb{Z}} \bigl| \widehat{f}(\alpha) \bigr|^2 d\alpha = \sum_n |f(n)|^2.$$
The large sieve can be seen as an approximate, or statistical, version of this: for a
“sample” of points $\alpha_1, \alpha_2, \dots, \alpha_k$ satisfying $|\alpha_i - \alpha_j| \ge \beta$ for $i \ne j$, it tells us that
$$\sum_{1 \le j \le k} \bigl| \widehat{f}(\alpha_j) \bigr|^2 \le (X + \beta^{-1}) \sum_n |f(n)|^2, \tag{1.32}$$
assuming that f is supported on an interval of length X.


Now consider $\alpha_1 = \alpha$, $\alpha_2 = 2\alpha$, $\alpha_3 = 3\alpha$, . . . . If α = a/q, then the angles
$\alpha_1, \dots, \alpha_q$ are well-separated, i.e., they satisfy $|\alpha_i - \alpha_j| \ge 1/q$, and so we can apply
(1.32) with β = 1/q. However, $\alpha_{q+1} = \alpha_1$. Thus, if we have an outer sum of
length L > q – in (1.29), we have an outer sum of length L = x/2W – we need
to split it into $\lceil L/q \rceil$ blocks of length q, and so the total bound given by (1.32) is
$\lceil L/q \rceil (X + q) \sum_n |f(n)|^2$. Indeed, this is what gives us (1.31), which is fine, but we
want to do better for |δ| larger than a constant.
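The block-splitting bound $\lceil L/q \rceil (X + q) \sum_n |f(n)|^2$ for α = a/q can be checked on a toy example. The sketch below is illustrative only: the test function f and the parameters are arbitrary choices, `large_sieve_sides` is a hypothetical helper, and the Fourier transform is normalized as $\widehat{f}(\alpha) = \sum_n f(n) e(-\alpha n)$, which is an assumption about the convention.

```python
import cmath
import math


def e(t):
    """The additive character e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * math.pi * t)


def fhat(f, alpha):
    """Fourier transform of a finitely supported f, given as a dict n -> f(n)."""
    return sum(v * e(-alpha * n) for n, v in f.items())


def large_sieve_sides(f, a, q, L):
    """Return (lhs, rhs) for the angles alpha_m = m a/q, 1 <= m <= L.

    lhs is the sum of |fhat(alpha_m)|^2; rhs is ceil(L/q) (X + q) sum |f(n)|^2,
    where X is the length of the interval supporting f. Assumes gcd(a, q) = 1.
    """
    X = max(f) - min(f) + 1
    energy = sum(abs(v) ** 2 for v in f.values())
    lhs = sum(abs(fhat(f, m * a / q)) ** 2 for m in range(1, L + 1))
    rhs = math.ceil(L / q) * (X + q) * energy
    return lhs, rhs
```

Each block of q consecutive values of m sees every angle b/q exactly once, which is why the bound for a single block simply gets multiplied by the number of blocks.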
Suppose, then, that α = a/q + δ/x, where |δ| > 8, say. Then the angles $\alpha_1$
and $\alpha_{q+1}$ are not identical: $|\alpha_1 - \alpha_{q+1}| \le q|\delta|/x$. We also see that $\alpha_{q+1}$ is at a
distance at least q|δ|/x from $\alpha_2, \alpha_3, \dots, \alpha_q$, provided that q|δ|/x < 1/q. We can go
on with $\alpha_{q+2}, \alpha_{q+3}, \dots$, and stop only once there is overlap, i.e., only once we reach
$\alpha_m$ such that m|δ|/x ≥ 1/q. We then give all the angles $\alpha_1, \dots, \alpha_m$ – which are
separated by at least q|δ|/x from each other – to the large sieve at the same time. We
do this $\lceil L/m \rceil \le \lceil L/(x/|\delta|q) \rceil$ times, and obtain a total bound of
$\lceil L/(x/|\delta|q) \rceil (X + x/|\delta|q) \sum_n |f(n)|^2$, which, for L = x/2W, X = W/2, gives us about
$$\left( \frac{x}{4Q} \frac{W}{2} + \frac{x}{4} \right) \log W$$
provided that L ≥ x/|δ|q and, as usual, |α − a/q| ≤ 1/qQ. This is very small compared
to the trivial bound $\lesssim xW/8$.
What happens if L < x/|δq|? Then there is never any overlap: we consider all
angles $\alpha_i$, and give them all together to the large sieve. The total bound is
$(W^2/4 + xW/2|\delta|q) \log W$. If L = x/2W is smaller than, say, x/3|δq|, then we see clearly
that there are non-intersecting swarms of angles $\alpha_i$ around the rationals a/q. We can
thus save a factor of log (or rather (φ(q)/q) log(W/|δq|)) by applying Montgomery’s
inequality, which operates by strewing displacements of given angles (or, here, swarms
around angles) around the circle to the extent possible while keeping everything
well-separated. In this way, we obtain a bound of the form
$$\frac{\log W}{\log \frac{W}{|\delta| q}} \left( \frac{x}{|\delta| \phi(q)} + \frac{q}{\phi(q)} \frac{W}{2} \right) \frac{W}{2}.$$
Compare this to (1.31); we have gained a factor of |δ|/4, and so we use this estimate
when |δ| > 4. (We will actually use the criterion |δ| > 8, but, since we will be working
with approximations of the form 2α = a/q + δ/x, the value of δ in our actual work
is twice what it is in this introduction. This is a consequence of working with sums
over the odd integers, as in [Tao14].)
***
We have succeeded in eliminating all factors of log we came across. The only
factor of log that remains is log x/UV, coming from the integral $\int_V^{x/U} dW/W$. Thus,
we want UV to be close to x, but we cannot let it be too close, since we also have a
term proportional to D = UV in (1.27), and we need to keep it substantially smaller
than x. We set U and V so that UV is $x/\sqrt{q \max(4, |\delta|)}$ or thereabouts.
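In the end, after some work, we obtain our main minor-arcs bound (Theorem 3.1.1).
It states the following. Let $x \ge x_0$, $x_0 = 2.16 \cdot 10^{20}$. Recall that
$S_\eta(\alpha, x) = \sum_n \Lambda(n) e(\alpha n) \eta(n/x)$ and $\eta_2 = \eta_1 *_M \eta_1 = 4 \cdot 1_{[1/2,1]} *_M 1_{[1/2,1]}$.
Let 2α = a/q + δ/x, q ≤ Q, gcd(a, q) = 1, |δ/x| ≤ 1/qQ, where $Q = (3/4) x^{2/3}$. If $q \le x^{1/3}/6$, then
$$|S_\eta(\alpha, x)| \le \frac{R_{x, \delta_0 q} \log \delta_0 q + 0.5}{\sqrt{\delta_0 \phi(q)}} \cdot x + \frac{2.5 x}{\sqrt{\delta_0 q}}
+ \frac{2x}{\delta_0 q} \cdot L_{x, \delta_0 q, q} + 3.36 x^{5/6}, \tag{1.33}$$
where
$$\begin{aligned}
\delta_0 &= \max(2, |\delta|/4), \qquad
R_{x,t} = 0.27125 \log\left( 1 + \frac{\log 4t}{2 \log \frac{9 x^{1/3}}{2.004 t}} \right) + 0.41415, \\
L_{x,t,q} &= \frac{q}{\phi(q)} \left( \frac{13}{4} \log t + 7.82 \right) + 13.66 \log t + 37.55.
\end{aligned} \tag{1.34}$$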
The factor Rx,t is small in practice; for typical “difficult” values of x and δ0 x, it is
less than 1. The crucial things to notice in (1.33) are that there is no factor of log x, and
that, in the main term, there is only one factor of log δ0 q. The fact that δ0 helps us as
it grows is precisely what enables us to take major arcs that get narrower and narrower
as q grows.
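To see the shape of (1.33) in practice, one can simply evaluate its right side. The sketch below transcribes (1.33) and (1.34) directly; the function names and the sample values x = 10²⁵, q = 1000 are illustrative assumptions, and the code does not check the hypotheses of the theorem (x ≥ x₀, q ≤ x^{1/3}/6, |δ/x| ≤ 1/qQ), which the caller must ensure.

```python
from math import log, sqrt


def phi(n):
    """Euler's totient function by trial division."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result


def R(x, t):
    """The factor R_{x,t} of (1.34); requires 9 x^(1/3) > 2.004 t."""
    return 0.27125 * log(1 + log(4 * t) / (2 * log(9 * x ** (1 / 3) / (2.004 * t)))) + 0.41415


def L(x, t, q):
    """The factor L_{x,t,q} of (1.34) (it does not in fact depend on x)."""
    return (q / phi(q)) * (13 / 4 * log(t) + 7.82) + 13.66 * log(t) + 37.55


def minor_arc_bound(x, q, delta):
    """Right side of (1.33), for 2 alpha = a/q + delta/x."""
    d0 = max(2.0, abs(delta) / 4)
    t = d0 * q
    return ((R(x, t) * log(t) + 0.5) / sqrt(d0 * phi(q)) * x
            + 2.5 * x / sqrt(t)
            + 2 * x / t * L(x, t, q)
            + 3.36 * x ** (5 / 6))
```

Evaluating `minor_arc_bound(1e25, 1000, 0.0)` and `minor_arc_bound(1e25, 1000, 400.0)` shows the two features stressed in the text: the bound is a nontrivial fraction of x without any factor of log x, and it improves as |δ| grows.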

1.5 Integrals over the major and minor arcs


So far, we have sketched (§1.3) how to estimate $S_\eta(\alpha, x)$ for α in the major arcs and
η based on the Gaussian $e^{-t^2/2}$, and also (§1.4) how to bound $|S_\eta(\alpha, x)|$ for α in the
minor arcs and $\eta = \eta_2$, where $\eta_2 = 4 \cdot 1_{[1/2,1]} *_M 1_{[1/2,1]}$. We now must show how to
use such information to estimate integrals such as the ones in (1.4).
We will use two smoothing functions $\eta_+$, $\eta_*$; in the notation of (1.3), we set
$f_1 = f_2 = \Lambda(n) \eta_+(n/x)$, $f_3 = \Lambda(n) \eta_*(n/x)$, and so we must give a lower bound for
$$\int_{\mathfrak{M}} (S_{\eta_+}(\alpha, x))^2 S_{\eta_*}(\alpha, x)\, e(-\alpha n)\, d\alpha \tag{1.35}$$
and an upper bound for
$$\int_{\mathfrak{m}} \bigl| S_{\eta_+}(\alpha, x) \bigr|^2 S_{\eta_*}(\alpha, x)\, e(-\alpha n)\, d\alpha \tag{1.36}$$
so that we can verify (1.4).


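The traditional approach to (1.36) is to bound
$$\begin{aligned}
\int_{\mathfrak{m}} \bigl| (S_{\eta_+}(\alpha, x))^2 S_{\eta_*}(\alpha, x)\, e(-\alpha n) \bigr|\, d\alpha
&\le \int_{\mathfrak{m}} \bigl| S_{\eta_+}(\alpha, x) \bigr|^2 d\alpha \cdot \max_{\alpha \in \mathfrak{m}} |S_{\eta_*}(\alpha, x)| \\
&\le \sum_n \Lambda(n)^2 \eta_+^2\left(\frac{n}{x}\right) \cdot \max_{\alpha \in \mathfrak{m}} |S_{\eta_*}(\alpha, x)|.
\end{aligned} \tag{1.37}$$
Since the sum over n is of the order of x log x, this is not log-free, and so cannot be
good enough; we will later see how to do better. Still, this gets the main shape right:
our bound on (1.36) will be proportional to $|\eta_+|_2^2 |\eta_*|_1$. Moreover, we see that $\eta_*$ has
to be such that we know how to bound $|S_{\eta_*}(\alpha, x)|$ for α ∈ m, while our choice of $\eta_+$
is more or less free, at least as far as the minor arcs are concerned.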
What about the major arcs? In order to do anything on them, we will have to be
able to estimate both $S_{\eta_+}(\alpha, x)$ and $S_{\eta_*}(\alpha, x)$ for α ∈ M. If that is the case, then, as we
shall see, we will be able to obtain that the main term of (1.35) is an infinite product
(independent of the smoothing functions), times $x^2$, times
$$\begin{aligned}
&\int_{-\infty}^{\infty} (\widehat{\eta_+}(-\alpha))^2\, \widehat{\eta_*}(-\alpha)\, e(-\alpha n/x)\, d\alpha \\
&\qquad = \int_0^\infty \int_0^\infty \eta_+(t_1)\, \eta_+(t_2)\, \eta_*\left(\frac{n}{x} - (t_1 + t_2)\right) dt_1\, dt_2.
\end{aligned} \tag{1.38}$$
In other words, we want to maximize (or nearly maximize) the expression on the right
of (1.38) divided by $|\eta_+|_2^2 |\eta_*|_1$.
One way to do this is to let $\eta_*$ be concentrated on a small interval [0, ε). Then the
right side of (1.38) is approximately
$$|\eta_*|_1 \cdot \int_0^\infty \eta_+(t)\, \eta_+\left(\frac{n}{x} - t\right) dt. \tag{1.39}$$
To maximize (1.39), we should make sure that $\eta_+(t) \sim \eta_+(n/x - t)$. We set x ∼ n/2,
and see that we should define $\eta_+$ so that it is supported on [0, 2] and symmetric around
t = 1, or nearly so; this will maximize the ratio of (1.39) to $|\eta_+|_2^2 |\eta_*|_1$.
We should do this while making sure that we will know how to estimate $S_{\eta_+}(\alpha, x)$
for α ∈ M. We know how to estimate $S_\eta(\alpha, x)$ very precisely for functions of the
form $\eta(t) = g(t) e^{-t^2/2}$, $\eta(t) = g(t) t e^{-t^2/2}$, etc., where g(t) is band-limited. We will
work with a function $\eta_+$ of that form, chosen so as to be very close (in $\ell_2$ norm) to a
function $\eta_\circ$ that is in fact supported on [0, 2] and symmetric around t = 1.
We choose
$$\eta_\circ(t) = \begin{cases} t^3 (2 - t)^3 e^{-(t-1)^2/2} & \text{if } t \in [0, 2], \\ 0 & \text{if } t \notin [0, 2]. \end{cases}$$
This function is obviously symmetric ($\eta_\circ(t) = \eta_\circ(2 - t)$) and vanishes to high order
at t = 0, besides being supported on [0, 2].
We set $\eta_+(t) = h_R(t)\, t e^{-t^2/2}$, where $h_R(t)$ is an approximation to the function
$$h(t) = \begin{cases} t^2 (2 - t)^3 e^{t - \frac{1}{2}} & \text{if } t \in [0, 2], \\ 0 & \text{if } t \notin [0, 2]. \end{cases}$$
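The function h is chosen precisely so that $h(t) \cdot t e^{-t^2/2} = \eta_\circ(t)$, since $t - \tfrac{1}{2} - \tfrac{t^2}{2} = -\tfrac{(t-1)^2}{2}$. A quick numerical sketch of this factorization and of the symmetry of $\eta_\circ$ (the Python names are illustrative):

```python
from math import exp


def eta_circ(t):
    """eta_circ(t) = t^3 (2 - t)^3 exp(-(t - 1)^2 / 2) on [0, 2], and 0 elsewhere."""
    if not 0.0 <= t <= 2.0:
        return 0.0
    return t ** 3 * (2 - t) ** 3 * exp(-((t - 1) ** 2) / 2)


def h(t):
    """h(t) = t^2 (2 - t)^3 exp(t - 1/2) on [0, 2], and 0 elsewhere."""
    if not 0.0 <= t <= 2.0:
        return 0.0
    return t ** 2 * (2 - t) ** 3 * exp(t - 0.5)


def gaussian_factor(t):
    """The factor t exp(-t^2 / 2) that multiplies h_R in the definition of eta_+."""
    return t * exp(-t * t / 2)
```

So $\eta_+$ differs from $\eta_\circ$ only through the replacement of h by its band-limited approximation $h_R$.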
We just let $h_R(t)$ be the inverse Mellin transform of the truncation of Mh to an interval
[−iR, iR]. (Explicitly,
$$h_R(t) = \int_0^\infty h(t y^{-1})\, F_R(y)\, \frac{dy}{y},$$
where $F_R(y) = \sin(R \log y)/(\pi \log y)$, that is, $F_R$ is the Dirichlet kernel with a change
of variables.)
Since the Mellin transform of $t e^{-t^2/2}$ is regular at s = 0, the Mellin transform
$M\eta_+$ will be holomorphic in a neighborhood of $\{s : 0 \le \Re(s) \le 1\}$, even though
the truncation of Mh to [−iR, iR] is brutal. Set R = 200, say. By the fast decay of
Mh(it) and the fact that the Mellin transform M is an isometry, $|(h_R(t) - h(t))/t|_2$ is
very small, and hence so is $|\eta_+ - \eta_\circ|_2$, as we desired.
But what about the requirement that we be able to estimate Sη∗ (α, x) for both
α ∈ m and α ∈ M?
Generally speaking, if we know how to estimate Sη1 (α, x) for some α ∈ R/Z and
we also know how to estimate Sη2 (α, x) for all other α ∈ R/Z, where η1 and η2 are
two smoothing functions, then we know how to estimate Sη3 (α, x) for all α ∈ R/Z,
where η3 = η1 ∗M η2 , or, more generally, η∗ (t) = (η1 ∗M η2 )(κt), κ > 0 a constant.
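This is an easy exercise on exchanging the order of integration and summation:
$$\begin{aligned}
S_{\eta_*}(\alpha, x) &= \sum_n \Lambda(n)\, e(\alpha n)\, (\eta_1 *_M \eta_2)\left(\kappa \frac{n}{x}\right) \\
&= \int_0^\infty \sum_n \Lambda(n)\, e(\alpha n)\, \eta_1(\kappa r)\, \eta_2\left(\frac{n}{rx}\right) \frac{dr}{r}
= \int_0^\infty \eta_1(\kappa r)\, S_{\eta_2}(rx)\, \frac{dr}{r},
\end{aligned} \tag{1.40}$$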
and similarly with $\eta_1$ and $\eta_2$ switched. Of course, this trick is valid for all exponential
sums: any function f(n) would do in place of Λ(n). The only caveat is that $\eta_1$ (and
$\eta_2$) should be small very near 0, since, for r small, we may not be able to estimate
$S_{\eta_2}(rx)$ (or $S_{\eta_1}(rx)$) with any precision. This is not a problem; one of our functions
will be $t^2 e^{-t^2/2}$, which vanishes to second order at 0, and the other one will be
$\eta_2 = 4 \cdot 1_{[1/2,1]} *_M 1_{[1/2,1]}$, which has support bounded away from 0. We will set κ large
(say κ = 49) so that the support of $\eta_*$ is indeed concentrated on a small interval [0, ε),
as we wanted.

***

Now that we have chosen our smoothing weights η+ and η∗ , we have to estimate the
major-arc integral (1.35) and the minor-arc integral (1.36). What follows can actually
be done for general η+ and η∗ ; we could have left our particular choice of η+ and η∗
for the end.
Estimating the major-arc integral (1.35) may sound like an easy task, since we have
rather precise estimates for $S_\eta(\alpha, x)$ ($\eta = \eta_+, \eta_*$) when α is on the major arcs; we
could just replace $S_\eta(\alpha, x)$ in (1.35) by the approximation given by (1.7) and (1.11). It
is, however, more efficient to express (1.35) as the sum of the contribution of the trivial
character (a sum of integrals of $(\widehat{\eta}(-\delta) x)^3$, where $\widehat{\eta}(-\delta) x$ comes from (1.11)), plus a
term of the form
$$(\text{maximum of } q \cdot E(q) \text{ for } q \le r) \cdot \int_{\mathfrak{M}} \bigl| S_{\eta_+}(\alpha, x) \bigr|^2 d\alpha,$$
where E(q) = E is as in (1.12), plus two other terms of essentially the same form. As
usual, the major arcs M are the arcs around rationals a/q with q ≤ r. We will soon
discuss how to bound the integral of $|S_{\eta_+}(\alpha, x)|^2$ over arcs around rationals a/q with
q ≤ s, s arbitrary. Here, however, it is best to estimate the integral over M using the
estimate on $S_{\eta_+}(\alpha, x)$ from (1.7) and (1.11); we obtain a great deal of cancellation,
with the effect that, for χ non-trivial, the error term in (1.12) appears only when it gets
squared, and thus becomes negligible.
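The contribution of the trivial character has an easy approximation, thanks to the
fast decay of $\widehat{\eta_\circ}$. We obtain that the major-arc integral (1.35) equals a main term
$C_0 C_{\eta_\circ, \eta_*} x^2$, where
$$\begin{aligned}
C_0 &= \prod_{p | n} \left( 1 - \frac{1}{(p-1)^2} \right) \cdot \prod_{p \nmid n} \left( 1 + \frac{1}{(p-1)^3} \right), \\
C_{\eta_\circ, \eta_*} &= \int_0^\infty \int_0^\infty \eta_\circ(t_1)\, \eta_\circ(t_2)\, \eta_*\left(\frac{n}{x} - (t_1 + t_2)\right) dt_1\, dt_2,
\end{aligned}$$
plus several small error terms. We have already chosen $\eta_\circ$, $\eta_*$ and x so as to (nearly)
maximize $C_{\eta_\circ, \eta_*}$.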
It is time to bound the minor-arc integral (1.36). As we said in §1.5, we must do
better than the usual bound (1.37). Since our minor-arc bound (3.2) on $|S_\eta(\alpha, x)|$,
α ∼ a/q, decreases as q increases, it makes sense to use partial summation together
with bounds on
$$\int_{\mathfrak{m}_s} |S_{\eta_+}(\alpha, x)|^2 d\alpha = \int_{\mathfrak{M}_s} |S_{\eta_+}(\alpha, x)|^2 d\alpha - \int_{\mathfrak{M}} |S_{\eta_+}(\alpha, x)|^2 d\alpha,$$
where $\mathfrak{m}_s$ denotes the arcs around a/q, r < q ≤ s, and $\mathfrak{M}_s$ denotes the arcs around all
a/q, q ≤ s. We already know how to estimate the integral on M. How do we bound
the integral on $\mathfrak{M}_s$?
In order to do better than the trivial bound $\int_{\mathfrak{M}_s} \le \int_{\mathbb{R}/\mathbb{Z}}$, we will need to use the
fact that the series (1.6) defining $S_{\eta_+}(\alpha, x)$ is essentially supported on prime numbers.
Bounding the integral on $\mathfrak{M}_s$ is closely related to the problem of bounding
$$\sum_{q \le s} \sum_{\substack{a \bmod q \\ (a,q)=1}} \Bigl| \sum_{n \le x} a_n e(an/q) \Bigr|^2 \tag{1.41}$$
efficiently for s considerably smaller than $\sqrt{x}$ and $a_n$ supported on the primes
$\sqrt{x} < p \le x$. This is a classical problem in the study of the large sieve. The usual bound on
(1.41) (by, for instance, Montgomery’s inequality) has a gain of a factor of
$$2 e^\gamma (\log s)/(\log x/s^2)$$
relative to the bound of $(x + s^2) \sum_n |a_n|^2$ that one would get from the large sieve
without using prime support. Heath-Brown proceeded similarly to bound
$$\int_{\mathfrak{M}_s} |S_{\eta_+}(\alpha, x)|^2 d\alpha \lesssim \frac{2 e^\gamma \log s}{\log x/s^2} \int_{\mathbb{R}/\mathbb{Z}} |S_{\eta_+}(\alpha, x)|^2 d\alpha. \tag{1.42}$$

This already gives us the gain of C(log s)/ log x that we absolutely need, but
the constant C is suboptimal; the factor in the right side of (1.42) should really be
(log s)/ log x, i.e., C should be 1. We cannot reasonably hope to obtain a factor better
than 2(log s)/ log x in the minor arcs due to what is known as the parity problem in
sieve theory. As it turns out, Ramaré [Ram09] had given general bounds on the large
sieve that were clearly conducive to better bounds on (1.41), though they involved a
ratio that was not easy to bound in general.
I used several careful estimations (including [Ram95, Lem. 3.4]) to reduce the
problem of bounding this ratio to a finite number of cases, which I then checked by
a rigorous computation. This approach gave a bound on (1.41) with a factor of size
close to 2(log s)/log x. (This solves the large-sieve problem for $s \le x^{0.3}$; it would
still be worthwhile to give a computation-free proof for all $s \le x^{1/2 - \epsilon}$, ε > 0.) It was
then easy to give an analogous bound for the integral over $\mathfrak{M}_s$, namely,
$$\int_{\mathfrak{M}_s} |S_{\eta_+}(\alpha, x)|^2 d\alpha \lesssim \frac{2 \log s}{\log x} \int_{\mathbb{R}/\mathbb{Z}} |S_{\eta_+}(\alpha, x)|^2 d\alpha,$$
where ≲ can easily be made precise by replacing log s by log s + 1.36 and log x by
log x + c, where c is a small constant. Without this improvement, the main theorem
would still have been proved, but the required computation time would have been
multiplied by a factor of considerably more than $e^{3\gamma} = 5.6499\ldots$.
What remained then was just to compare the estimates on (1.35) and (1.36) and
check that (1.36) is smaller for $n \ge 10^{27}$. This final step was just bookkeeping. As
we already discussed, a check for $n < 10^{27}$ is easy. Thus ends the proof of the main
theorem.

1.6 Some remarks on computations


There were two main computational tasks: verifying the ternary conjecture for all n ≤
C, and checking the Generalized Riemann Hypothesis for modulus q ≤ r up to a
certain height.
The first task was not very demanding. Platt and I verified in [HP13] that every
odd integer $5 < n \le 8.8 \cdot 10^{30}$ can be written as the sum of three primes. (In the
end, only a check for $5 < n \le 10^{27}$ was needed.) We proceeded as follows. In a
major computational effort, Oliveira e Silva, Herzog and Pardi [OeSHP14] had already
checked that the binary Goldbach conjecture is true up to $4 \cdot 10^{18}$ – that is, every even
number up to $4 \cdot 10^{18}$ is the sum of two primes. Given that, all we had to do was
to construct a “prime ladder”, that is, a list of primes from 3 up to $8.8 \cdot 10^{30}$ such
that the difference between any two consecutive primes in the list is at least 4 and at
most $4 \cdot 10^{18}$. (This is a known strategy: see [Sao98].) Then, for any odd integer
$5 < n \le 8.8 \cdot 10^{30}$, there is a prime p in the list such that $4 \le n - p \le 4 \cdot 10^{18} + 2$.
(Choose the largest p < n in the ladder, or, if n minus that prime is 2, choose the prime
immediately under that.) By [OeSHP14] (and the fact that $4 \cdot 10^{18} + 2$ equals p + q,
where p = 2000000000000001301 and q = 1999999999999998701 are both prime),
we can write $n - p = p_1 + p_2$ for some primes $p_1$, $p_2$, and so $n = p + p_1 + p_2$.
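The prime-ladder argument can be replayed at toy scale. The sketch below is only an illustration of the strategy under stated assumptions: it uses a sieve up to 5000 and a maximal gap of 100 in place of 8.8 · 10³⁰ and 4 · 10¹⁸, it finds the two-prime decomposition by brute force instead of quoting [OeSHP14], and it does not restrict the ladder to Proth primes. All function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def primes_up_to(N):
    """All primes <= N, by the sieve of Eratosthenes."""
    is_prime = [False, False] + [True] * (N - 1)
    for p in range(2, int(N ** 0.5) + 1):
        if is_prime[p]:
            for k in range(p * p, N + 1, p):
                is_prime[k] = False
    return [n for n in range(N + 1) if is_prime[n]]


def build_ladder(primes, max_gap):
    """Greedy ladder starting at 3: consecutive rungs differ by >= 4 and <= max_gap."""
    ladder = [3]
    while True:
        cur = ladder[-1]
        # largest prime in the table that is <= cur + max_gap
        nxt = primes[bisect_right(primes, cur + max_gap) - 1]
        if nxt < cur + 4:
            return ladder           # no admissible next rung in the table
        ladder.append(nxt)


def three_primes(n, ladder, primes, prime_set):
    """Write an odd n, 5 < n <= ladder[-1], as p + p1 + p2 with p a rung of the ladder."""
    if n <= 5 or n % 2 == 0 or n > ladder[-1]:
        raise ValueError("n must be odd with 5 < n <= top of the ladder")
    i = bisect_left(ladder, n) - 1  # largest rung below n
    if n - ladder[i] == 2:
        i -= 1                      # step down one rung, as in the text
    p = ladder[i]
    m = n - p                       # even, with 4 <= m <= max_gap + 2
    for p1 in primes:
        if p1 > m // 2:
            break
        if m - p1 in prime_set:     # brute-force binary Goldbach for m
            return p, p1, m - p1
    raise ValueError("binary Goldbach failed for %d" % m)
```

The point of the construction is that the ladder only has to be sparse: the number of rungs is about the target height divided by the verified binary-Goldbach range, so far fewer primes need to be certified than integers need to be decomposed.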
Building a prime ladder involves only integer arithmetic, that is, computer
manipulation of integers, rather than of real numbers. Integers are something that computers
can handle rapidly and reliably. We look for primes for our ladder only among a
special set of integers whose primality can be tested deterministically quite quickly (Proth
numbers: $k \cdot 2^m + 1$, $k < 2^m$). Thus, we can build a prime ladder by a rigorous,
deterministic algorithm that can be (and was) parallelized trivially.
The second computation is more demanding. It consists in verifying that, for every
L-function L(s, χ) with χ of conductor q ≤ r = 300000 (for q even) or q ≤ r/2
(for q odd), all zeroes of L(s, χ) such that $|\Im(s)| \le H_q = 10^8/q$ (for q odd) and
$|\Im(s)| \le H_q = \max(10^8/q, 200 + 7.5 \cdot 10^7/q)$ (for q even) lie on the critical line.
As a matter of fact, Platt went up to conductor q ≤ 200000 (or twice that for q even)
[Plab]; he had already gone up to conductor 100000 in his PhD thesis [Pla11]. The
verification took, in total, about 400000 core-hours (i.e., the total number of processor
cores used times the number of hours they ran equals 400000; nowadays, a
top-of-the-line processor typically has eight cores). In the end, since I used only q ≤ 150000 (or
twice that for q even), the number of hours actually needed was closer to 160000; since
I could have made do with q ≤ 120000 (at the cost of increasing C to $10^{29}$ or $10^{30}$), it
is likely, in retrospect, that only about 80000 core-hours were needed.
Checking zeros of L-functions computationally goes back to Riemann (who did
it by hand for the special case of the Riemann zeta function). It is also one of the
things that were tried on digital computers in their early days (by Turing [Tur53], for
instance; see the exposition in [Boo06b]). One of the main issues to be careful about
arises whenever one manipulates real numbers via a computer: generally speaking, a
computer cannot store an irrational number; moreover, while a computer can handle
rationals, it is really most comfortable handling just those rationals whose
denominators are powers of two. Thus, one cannot really say: “computer, give me the sine of
that number” and expect a precise result. What one should do, if one really wants to
prove something (as is the case here!), is to say: “computer, I am giving you an interval
$I = [a/2^k, b/2^k]$; give me an interval $I' = [c/2^\ell, d/2^\ell]$, preferably very short, such
that $\sin(I) \subset I'$”. This is called interval arithmetic; it is arguably the easiest way to do
floating-point computations rigorously.
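A toy version of this idea, for addition and multiplication only, can be written in a few lines. This is a sketch under stated assumptions and not the package used in the proof: the `Interval` class is hypothetical, it relies on IEEE 754 round-to-nearest arithmetic and on `math.nextafter` (Python 3.9 or later) to push each endpoint outward by one floating-point step, and it ignores overflow. Enclosing a transcendental function such as sin rigorously takes considerably more care.

```python
import math
from fractions import Fraction


class Interval:
    """A closed interval [lo, hi] whose endpoints are floats, i.e. dyadic rationals."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    @classmethod
    def enclosing(cls, value):
        """Smallest convenient interval guaranteed to contain an exact rational."""
        x = float(Fraction(value))      # correctly rounded to the nearest float
        return cls(math.nextafter(x, -math.inf), math.nextafter(x, math.inf))

    def __add__(self, other):
        # The float sum is within half a unit in the last place of the true sum,
        # so one step outward on each side gives a rigorous enclosure.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        products = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Interval(math.nextafter(min(products), -math.inf),
                        math.nextafter(max(products), math.inf))

    def contains(self, value):
        """Exact containment test, using rational arithmetic."""
        return Fraction(self.lo) <= Fraction(value) <= Fraction(self.hi)
```

For example, `Interval.enclosing(Fraction(1, 10)) + Interval.enclosing(Fraction(1, 5))` is a very short interval that provably contains 3/10, even though none of 1/10, 1/5 and 3/10 is exactly representable as a float.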
Processors do not do this natively, and if interval arithmetic is implemented purely
in software, computations can be slowed down by a factor of about 100. Fortunately,
there are ways of running interval-arithmetic computations partly in hardware, partly
in software.
Incidentally, there are some basic functions (such as sin) that should always be computed
in software, not just if one wants to use interval arithmetic, but even if one just wants
reasonably precise results: the implementation of transcendental functions in some of
the most popular processors does not always round correctly, and errors can accumulate
quickly. Fortunately, this problem is already well known, and there is software that
takes care of it. (Platt and I used the crlibm library [DLDDD+10].)

Lastly, there were several relatively minor computations strewn here and there in
the proof. There is some numerical integration, done rigorously; once or twice, this
was done using a standard package based on interval arithmetic [Ned06], but most of
the time I wrote my own routines in C (using Platt’s interval arithmetic package) for
the sake of speed. Another kind of computation (employed much more in [Hela] than
in the somewhat more polished version of the proof given here) was a rigorous version
of a “proof by graph” (“the maximum of a function f is clearly less than 4 because I
can see it on the screen”). There is a standard way to do this (see, e.g., [Tuc11, §5.2]);
essentially, the bisection method combines naturally with interval arithmetic, as we
shall describe in §2.6. Yet another computation (and not a very small one) was that
involved in verifying a large-sieve inequality in an intermediate range (as we discussed
in §1.5).
It may be interesting to note that one of the inequalities used to estimate (1.30) was
proven with the help of automatic quantifier elimination [HB11]. Proving this inequal-
ity was a very minor task, both computationally and mathematically; in all likelihood,
it is feasible to give a human-generated proof. Still, it is nice to know from first-
hand experience that computers can nowadays (pretend to) do something other than
just perform numerical computations – and that this is already applicable in current
mathematical practice.
Chapter 2

Notation and preliminaries

2.1 General notation


Given positive integers $m$, $n$, we say $m \mid n^\infty$ if every prime dividing $m$ also divides $n$.
We say a positive integer $n$ is square-full if, for every prime $p$ dividing $n$, the square
$p^2$ also divides $n$. (In particular, 1 is square-full.) We say $n$ is square-free if $p^2 \nmid n$
for every prime $p$. For $p$ prime, $n$ a non-zero integer, we define $v_p(n)$ to be the largest
non-negative integer $\alpha$ such that $p^\alpha \mid n$.
When we write $\sum_n$, we mean $\sum_{n=1}^\infty$, unless the contrary is stated. As always,
$\Lambda(n)$ denotes the von Mangoldt function:
\[ \Lambda(n) = \begin{cases} \log p & \text{if } n = p^\alpha \text{ for some prime } p \text{ and some integer } \alpha \ge 1, \\ 0 & \text{otherwise,} \end{cases} \]
and $\mu$ denotes the Möbius function:
\[ \mu(n) = \begin{cases} (-1)^k & \text{if } n = p_1 p_2 \cdots p_k, \text{ all } p_i \text{ distinct,} \\ 0 & \text{if } p^2 \mid n \text{ for some prime } p. \end{cases} \]

We let τ (n) be the number of divisors of an integer n, ω(n) the number of prime
divisors of n, and σ(n) the sum of the divisors of n.
We write $(a, b)$ for the greatest common divisor of $a$ and $b$. If there is any risk
of confusion with the pair $(a, b)$, we write $\gcd(a, b)$. Denote by $(a, b^\infty)$ the divisor
$\prod_{p \mid b} p^{v_p(a)}$ of $a$. (Thus, $a/(a, b^\infty)$ is coprime to $b$, and is in fact the maximal divisor
of $a$ with this property.)
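As a small illustration of the notation $(a, b^\infty)$, the following Python sketch computes it by repeatedly stripping the common factor of $a$ and $b$. The function name `gcd_with_power` is a hypothetical choice made here for illustration; it does not come from the text.

```python
from math import gcd

def gcd_with_power(a: int, b: int) -> int:
    """Return (a, b^infinity), i.e. the product of p^{v_p(a)} over primes p | b.

    Assumes a and b are positive integers.
    """
    g = 1
    while True:
        d = gcd(a, b)
        if d == 1:
            return g
        # d collects primes shared by a and b; move them from a into g.
        g *= d
        a //= d

# Example: a = 360 = 2^3 * 3^2 * 5 and b = 6 = 2 * 3.
part = gcd_with_power(360, 6)   # the 2- and 3-parts of 360
cofactor = 360 // part          # what remains is coprime to b
```

In this example the cofactor `360 // part` is coprime to 6, in line with the remark that $a/(a, b^\infty)$ is coprime to $b$.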
As is customary, we write $e(x)$ for $e^{2\pi i x}$. We denote the $L_r$ norm of a function $f$
by $|f|_r$. We write $O^*(R)$ to mean a quantity at most $R$ in absolute value. Given a set
$S$, we write $1_S$ for its characteristic function:
\[ 1_S(x) = \begin{cases} 1 & \text{if } x \in S, \\ 0 & \text{otherwise.} \end{cases} \]

Write $\log^+ x$ for $\max(\log x, 0)$.


2.2 Dirichlet characters and L functions


Let us go over some basic terms. A Dirichlet character $\chi : \mathbb{Z} \to \mathbb{C}$ of modulus $q$ is a
character $\chi$ of $(\mathbb{Z}/q\mathbb{Z})^*$ lifted to $\mathbb{Z}$ with the convention that $\chi(n) = 0$ when $(n, q) \ne 1$.
(In other words: $\chi$ is completely multiplicative and periodic modulo $q$, and vanishes
on integers not coprime to $q$.) Again by convention, there is a Dirichlet character of
modulus $q = 1$, namely, the trivial character $\chi_T : \mathbb{Z} \to \mathbb{C}$ defined by $\chi_T(n) = 1$ for
every $n \in \mathbb{Z}$.
If $\chi$ is a character modulo $q$ and $\chi'$ is a character modulo $q' \mid q$ such that $\chi(n) =
\chi'(n)$ for all $n$ coprime to $q$, we say that $\chi'$ induces $\chi$. A character is primitive if it is
not induced by any character of smaller modulus. Given a character $\chi$, we write $\chi^*$ for
the (uniquely defined) primitive character inducing $\chi$. If a character $\chi$ mod $q$ is induced
by the trivial character $\chi_T$, we say that $\chi$ is principal and write $\chi_0$ for $\chi$ (provided the
modulus $q$ is clear from the context). In other words, $\chi_0(n) = 1$ when $(n, q) = 1$ and
$\chi_0(n) = 0$ when $(n, q) \ne 1$.
A Dirichlet $L$-function $L(s, \chi)$ ($\chi$ a Dirichlet character) is defined as the analytic
continuation of $\sum_n \chi(n) n^{-s}$ to the entire complex plane; there is a pole at $s = 1$ if $\chi$
is principal.
A non-trivial zero of $L(s, \chi)$ is any $s \in \mathbb{C}$ such that $L(s, \chi) = 0$ and $0 < \Re(s) < 1$.
(In particular, a zero at $s = 0$ is called “trivial”, even though its contribution can be
a little tricky to work out. The same would go for the other zeros with $\Re(s) = 0$
occurring for $\chi$ non-primitive, though we will avoid this issue by working mainly with
$\chi$ primitive.) The zeros that occur at (some) negative integers are called trivial zeros.
The critical line is the line $\Re(s) = 1/2$ in the complex plane. Thus, the generalized
Riemann hypothesis for Dirichlet $L$-functions reads: for every Dirichlet character $\chi$,
all non-trivial zeros of $L(s, \chi)$ lie on the critical line. Verifiable finite versions of
the generalized Riemann hypothesis generally read: for every Dirichlet character $\chi$ of
modulus $q \le Q$, all non-trivial zeros of $L(s, \chi)$ with $|\Im(s)| \le f(q)$ lie on the critical
line (where $f : \mathbb{Z} \to \mathbb{R}^+$ is some given function).

2.3 Fourier transforms and exponential sums


The Fourier transform on $\mathbb{R}$ is normalized here as follows:
\[ \hat f(t) = \int_{-\infty}^{\infty} e(-xt) f(x)\,dx. \]

The trivial bound is $|\hat f|_\infty \le |f|_1$. If $f$ is compactly supported (or of fast enough
decay as $t \to \pm\infty$) and piecewise continuous, $\hat f(t) = \widehat{f'}(t)/(2\pi i t)$ by integration by
parts. Iterating, we obtain that if $f$ is of fast decay and differentiable $k$ times outside
finitely many points, then
\[ \hat f(t) = O^*\left(\frac{|\widehat{f^{(k)}}|_\infty}{(2\pi t)^k}\right) = O^*\left(\frac{|f^{(k)}|_1}{(2\pi t)^k}\right). \tag{2.1} \]

Thus, for instance, if $f$ is compactly supported, continuous and piecewise $C^1$, then $\hat f$
decays at least quadratically.
It could happen that $|f^{(k)}|_1 = \infty$, in which case (2.1) is trivial (but not false). In
practice, we require $f^{(k)} \in L_1$. In a typical situation, $f$ is differentiable $k$ times except
at $x_1, x_2, \dots, x_k$, where it is differentiable only $(k - 2)$ times; the contribution of $x_i$
(say) to $|f^{(k)}|_1$ is then $\left|\lim_{x \to x_i^+} f^{(k-1)}(x) - \lim_{x \to x_i^-} f^{(k-1)}(x)\right|$.
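The following bound is standard (see, e.g., [Tao14, Lemma 3.1]): for $\alpha \in \mathbb{R}/\mathbb{Z}$ and
$f : \mathbb{R} \to \mathbb{C}$ compactly supported and piecewise continuous,
\[ \left|\sum_{n \in \mathbb{Z}} f(n) e(\alpha n)\right| \le \min\left(|f|_1 + \frac{1}{2}|f'|_1, \; \frac{\frac{1}{2}|f'|_1}{|\sin(\pi\alpha)|}\right). \tag{2.2} \]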

(The first bound follows from $\sum_{n \in \mathbb{Z}} |f(n)| \le |f|_1 + (1/2)|f'|_1$, which, in turn, is
a quick consequence of the fundamental theorem of calculus; the second bound is
proven by summation by parts.) The alternative bound $(1/4)|f''|_1/|\sin(\pi\alpha)|^2$ given
in [Tao14, Lemma 3.1] (for $f$ continuous and piecewise $C^1$) can usually be improved
by the following estimate.
Lemma 2.3.1. Let $f : \mathbb{R} \to \mathbb{C}$ be compactly supported, continuous and piecewise $C^1$.
Then
\[ \left|\sum_{n \in \mathbb{Z}} f(n) e(\alpha n)\right| \le \frac{\frac{1}{4}|\widehat{f''}|_\infty}{(\sin \pi\alpha)^2} \tag{2.3} \]
for every $\alpha \in \mathbb{R}$.
As usual, the assumption of compact support could easily be relaxed to an assump-
tion of fast decay.
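Proof. By the Poisson summation formula,
\[ \sum_{n=-\infty}^{\infty} f(n) e(\alpha n) = \sum_{n=-\infty}^{\infty} \hat f(n - \alpha). \]
Since $\hat f(t) = \widehat{f'}(t)/(2\pi i t)$,
\[ \sum_{n=-\infty}^{\infty} \hat f(n - \alpha) = \sum_{n=-\infty}^{\infty} \frac{\widehat{f'}(n - \alpha)}{2\pi i (n - \alpha)} = \sum_{n=-\infty}^{\infty} \frac{\widehat{f''}(n - \alpha)}{(2\pi i (n - \alpha))^2}. \]
By Euler's formula $\pi \cot s\pi = 1/s + \sum_{n=1}^{\infty} \left(1/(n + s) - 1/(n - s)\right)$,
\[ \sum_{n=-\infty}^{\infty} \frac{1}{(n + s)^2} = -(\pi \cot s\pi)' = \frac{\pi^2}{(\sin s\pi)^2}. \tag{2.4} \]
Hence
\[ \left|\sum_{n=-\infty}^{\infty} \hat f(n - \alpha)\right| \le |\widehat{f''}|_\infty \sum_{n=-\infty}^{\infty} \frac{1}{(2\pi(n - \alpha))^2} = |\widehat{f''}|_\infty \cdot \frac{1}{(2\pi)^2} \cdot \frac{\pi^2}{(\sin \alpha\pi)^2}. \]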

The trivial bound $|\widehat{f''}|_\infty \le |f''|_1$, applied to (2.3), recovers the bound in [Tao14,
Lemma 3.1]. In order to do better, we will give a tighter bound for $|\widehat{f''}|_\infty$ in Appendix
B when $f$ is equal to one of our main smoothing functions ($f = \eta_2$).
Integrals of multiples of $f''$ (in particular, $|f''|_1$ and $\widehat{f''}$) can still be made sense
of when $f''$ is undefined at a finite number of points, provided $f$ is understood as a
distribution (and $f'$ has finite total variation). This is the case, in particular, for $f = \eta_2$.
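To see the standard bound (2.2) in action, here is a minimal numerical sketch in Python. It takes $f(t) = \eta_2(t/x)$ with $\eta_2(t) = 4\max(\log 2 - |\log 2t|, 0)$, for which $|f|_1 = x$ and $|f'|_1 = 8\log 2$ (the total variation of $\eta_2$). The scale `x = 1000` and the sample values of $\alpha$ are arbitrary choices made here, and the function names are hypothetical; plain floating point is used, so this is an illustration and not a rigorous check.

```python
import cmath
import math

def eta2(t):
    """eta_2(t) = 4 max(log 2 - |log 2t|, 0), supported on [1/4, 1]."""
    return 4 * max(math.log(2) - abs(math.log(2 * t)), 0.0) if t > 0 else 0.0

x = 1000.0
f_l1 = x                   # |f|_1 = x * |eta_2|_1, and |eta_2|_1 = 1
fp_l1 = 8 * math.log(2)    # |f'|_1 = total variation of eta_2 = 2 * (4 log 2)

def exp_sum(alpha):
    """Sum over n of f(n) e(alpha n), with f(n) = eta_2(n / x)."""
    return sum(eta2(n / x) * cmath.exp(2j * math.pi * alpha * n)
               for n in range(1, int(x) + 1))

def bound_2_2(alpha):
    """Right-hand side of (2.2)."""
    first = f_l1 + 0.5 * fp_l1
    s = abs(math.sin(math.pi * alpha))
    return first if s == 0 else min(first, 0.5 * fp_l1 / s)

for alpha in (0.0, 0.001, 0.01, 0.1, 0.37, 0.5):
    print(alpha, abs(exp_sum(alpha)), bound_2_2(alpha))
```

For this smooth choice of $f$ the sum is typically far below the bound once $\alpha$ is away from 0, which is the kind of slack that Lemma 2.3.1 is meant to exploit.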

***
When we need to estimate $\sum_n f(n)$ precisely, we will use the Poisson summation
formula:
\[ \sum_n f(n) = \sum_n \hat f(n). \]

We will not have to worry about convergence here, since we will apply the Poisson
summation formula only to compactly supported functions f whose Fourier transforms
decay at least quadratically.

2.4 Mellin transforms


The Mellin transform of a function $\phi : (0, \infty) \to \mathbb{C}$ is
\[ M\phi(s) := \int_0^\infty \phi(x) x^{s-1}\,dx. \tag{2.5} \]
If $\phi(x) x^{\sigma-1}$ is in $\ell_1$ with respect to $dx$ (i.e., $\int_0^\infty |\phi(x)| x^{\sigma-1}\,dx < \infty$), then the Mellin
transform is defined on the line $\sigma + i\mathbb{R}$. Moreover, if $\phi(x) x^{\sigma-1}$ is in $\ell_1$ for $\sigma = \sigma_1$ and
for $\sigma = \sigma_2$, where $\sigma_2 > \sigma_1$, then it is easy to see that it is also in $\ell_1$ for all $\sigma \in (\sigma_1, \sigma_2)$,
and that, moreover, the Mellin transform is holomorphic on $\{s : \sigma_1 < \Re(s) < \sigma_2\}$. We
then say that $\{s : \sigma_1 < \Re(s) < \sigma_2\}$ is a strip of holomorphy for the Mellin transform.
The Mellin transform becomes a Fourier transform (of $\eta(e^{-2\pi v}) e^{-2\pi v \sigma}$) by means
of the change of variables $x = e^{-2\pi v}$. We thus obtain, for example, that the Mellin
transform is an isometry, in the sense that
\[ \int_0^\infty |f(x)|^2 x^{2\sigma} \frac{dx}{x} = \frac{1}{2\pi} \int_{-\infty}^{\infty} |Mf(\sigma + it)|^2\,dt. \tag{2.6} \]

Recall that, in the case of the Fourier transform, for $|\hat f|_2 = |f|_2$ to hold, it is enough
that $f$ be in $\ell_1 \cap \ell_2$. This gives us that, for (2.6) to hold, it is enough that $f(x) x^{\sigma-1}$ be
in $\ell_1$ and $f(x) x^{\sigma-1/2}$ be in $\ell_2$ (again, with respect to $dx$, in both cases).
We write $f *_M g$ for the multiplicative, or Mellin, convolution of $f$ and $g$:
\[ (f *_M g)(x) = \int_0^\infty f(w)\, g\left(\frac{x}{w}\right) \frac{dw}{w}. \tag{2.7} \]

In general,
\[ M(f *_M g) = Mf \cdot Mg \tag{2.8} \]

and
\[ M(f \cdot g)(s) = \frac{1}{2\pi i} \int_{\sigma - i\infty}^{\sigma + i\infty} Mf(z)\, Mg(s - z)\,dz \quad \text{[GR94, §17.32]} \tag{2.9} \]
provided that $z$ and $s - z$ are within the strips on which $Mf$ and $Mg$ (respectively) are
well-defined.
We also have several useful transformation rules, just as for the Fourier transform.
For example,
\[ \begin{aligned} M(f'(t))(s) &= -(s - 1) \cdot Mf(s - 1), \\ M(t f'(t))(s) &= -s \cdot Mf(s), \\ M((\log t) f(t))(s) &= (Mf)'(s) \end{aligned} \tag{2.10} \]
(as in, e.g., [BBO10, Table 1.11]).
Let
\[ \eta_2 = (2 \cdot 1_{[1/2,1]}) *_M (2 \cdot 1_{[1/2,1]}). \]
Since (see, e.g., [BBO10, Table 11.3] or [GR94, §16.43])
\[ (M I_{[a,b]})(s) = \frac{b^s - a^s}{s}, \]
we see that
\[ M\eta_2(s) = \left(\frac{1 - 2^{-s}}{s}\right)^2, \qquad M\eta_4(s) = \left(\frac{1 - 2^{-s}}{s}\right)^4. \tag{2.11} \]

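Let $f_z(t) = e^{-zt}$, where $\Re(z) > 0$. Then
\[ \begin{aligned} (Mf_z)(s) &= \int_0^\infty e^{-zt} t^{s-1}\,dt = \frac{1}{z^s} \int_0^\infty e^{-zt} (zt)^{s-1}\, z\,dt \\ &= \frac{1}{z^s} \int_0^{z\infty} e^{-u} u^{s-1}\,du = \frac{1}{z^s} \int_0^\infty e^{-t} t^{s-1}\,dt = \frac{\Gamma(s)}{z^s}, \end{aligned} \]
where the next-to-last step holds by contour integration, and the last step holds by the
definition of the Gamma function $\Gamma(s)$.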
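The formula $(Mf_z)(s) = \Gamma(s)/z^s$ is easy to sanity-check numerically in the simplest case of real $z > 0$ and real $s > 1$. The Python sketch below uses a composite Simpson rule on a truncated interval; the helper name `mellin_numeric`, the cutoff `upper` and the step count are assumptions made here for illustration, and the computation is ordinary floating point, not interval arithmetic.

```python
import math

def mellin_numeric(phi, s, upper=60.0, steps=100_000):
    """Composite Simpson approximation of the integral of phi(x) x^(s-1) on (0, upper].

    Assumes s > 1 (so the integrand vanishes at 0) and that phi decays fast
    enough for the tail beyond `upper` to be negligible. `steps` must be even.
    """
    h = upper / steps

    def g(x):
        return phi(x) * x ** (s - 1) if x > 0 else 0.0

    total = g(0.0) + g(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3

z, s = 2.0, 3.0
approx = mellin_numeric(lambda t: math.exp(-z * t), s)
exact = math.gamma(s) / z ** s   # Gamma(3) / 2^3
print(approx, exact)
```

Complex $z$ and $s$ would need a complex-valued integrand; the real case already shows the scaling by $z^{-s}$.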

2.5 Bounds on sums of µ and Λ


We will need some simple explicit bounds on sums involving the von Mangoldt func-
tion Λ and the Moebius function µ. In non-explicit work, such sums are usually
bounded using the prime number theorem, or rather using the properties of the zeta
function ζ(s) underlying the prime number theorem. Here, however, we need robust,
fully explicit bounds valid over just about any range.
For the most part, we will just be quoting the literature, supplemented with some
computations when needed. The proofs in the literature are sometimes based on prop-
erties of ζ(s), and sometimes on more elementary facts.

First, let us see some bounds involving $\Lambda$. The following bound can be easily
derived from [RS62, (3.23)], supplemented by a quick calculation of the contribution
of powers of primes $p < 32$:
\[ \sum_{n \le x} \frac{\Lambda(n)}{n} \le \log x. \tag{2.12} \]

We can derive a bound in the other direction from [RS62, (3.21)] (for $x > 1000$,
adding the contribution of all prime powers $\le 1000$) and a numerical verification for
$x \le 1000$:
\[ \sum_{n \le x} \frac{\Lambda(n)}{n} \ge \log x - \log \frac{3}{\sqrt{2}}. \tag{2.13} \]

We also use the following older bounds:

1. By the second table in [RR96, p. 423], supplemented by a computation for
$2 \cdot 10^6 \le y \le 4 \cdot 10^6$,
\[ \sum_{n \le y} \Lambda(n) \le 1.0004 y \tag{2.14} \]
for $y \ge 2 \cdot 10^6$.

2.
\[ \sum_{n \le y} \Lambda(n) < 1.03883 y \tag{2.15} \]
for every $y > 0$ [RS62, Thm. 12].

For all $y > 663$,
\[ \sum_{n \le y} \Lambda(n) n < 1.03884 \frac{y^2}{2}, \tag{2.16} \]
where we use (2.15) and partial summation for $y > 200000$, and a computation for
$663 < y \le 200000$. Using instead the second table in [RR96, p. 423], together with
computations for small $y < 10^7$ and partial summation, we get that
\[ \sum_{n \le y} \Lambda(n) n < 1.0008 \frac{y^2}{2} \tag{2.17} \]
for $y > 1.6 \cdot 10^6$.

Similarly,
\[ \sum_{n \le y} \frac{\Lambda(n)}{\sqrt{n}} < 2 \cdot 1.0004 \sqrt{y} \tag{2.18} \]
for all $y \ge 1$.
It is also true that
\[ \sum_{y/2 < p \le y} (\log p)^2 \le \frac{1}{2} y (\log y) \tag{2.19} \]
for $y \ge 117$: this holds for $y \ge 2 \cdot 758699$ by [RS75, Cor. 2] (applied to $x = y$,
$x = y/2$ and $x = 2y/3$) and for $117 \le y < 2 \cdot 758699$ by direct computation.
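Bounds such as (2.12) and (2.15) are easy to spot-check on a small range. The following Python sketch tabulates $\Lambda$ with a sieve and tests both inequalities at every integer up to a modest limit (the sums are step functions, so for these two inequalities the integers are the worst case). The limit `LIMIT = 20000` and the function name `mangoldt_table` are arbitrary choices made here; the check uses ordinary floating point and is not a substitute for the rigorous computations cited in the text.

```python
import math

def mangoldt_table(limit):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= limit."""
    lam = [0.0] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
            pk = p
            while pk <= limit:          # Lambda(p^k) = log p
                lam[pk] = math.log(p)
                pk *= p
    return lam

LIMIT = 20000
lam = mangoldt_table(LIMIT)

psi = 0.0               # running value of sum_{n <= y} Lambda(n)
weighted = 0.0          # running value of sum_{n <= y} Lambda(n)/n
worst_psi_ratio = 0.0   # largest value of psi(y)/y seen so far
ok_2_12 = True
for y in range(1, LIMIT + 1):
    psi += lam[y]
    weighted += lam[y] / y
    worst_psi_ratio = max(worst_psi_ratio, psi / y)
    ok_2_12 = ok_2_12 and weighted <= math.log(y) + 1e-12

print(ok_2_12, worst_psi_ratio)
```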
Now let us see some estimates on sums involving $\mu$. The situation here is less
satisfactory than for sums involving $\Lambda$. The main reason is that the complex-analytic
approach to estimating $\sum_{n \le N} \mu(n)$ would involve $1/\zeta(s)$ rather than $\zeta'(s)/\zeta(s)$, and
thus strong explicit bounds on the residues of $1/\zeta(s)$ would be needed. Thus, explicit
estimates on sums involving $\mu$ are harder to obtain than estimates on sums involving $\Lambda$.
This is so even though analytic number theorists are generally used (from the habit of
non-explicit work) to see the estimation of one kind of sum or the other as essentially
the same task.
Fortunately, in the case of sums of the type $\sum_{n \le x} \mu(n)/n$ for $x$ arbitrary (a type of
sum that will be rather important for us), all we need is a saving of $(\log n)$ or $(\log n)^2$
on the trivial bound. This is provided by the following.
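1. (Granville-Ramaré [GR96], Lemma 10.2)
\[ \left|\sum_{n \le x:\, \gcd(n,q)=1} \frac{\mu(n)}{n}\right| \le 1 \tag{2.20} \]
for all $x, q \ge 1$,

2. (Ramaré [Ram13]; cf. El Marraki [EM95], [EM96])
\[ \left|\sum_{n \le x} \frac{\mu(n)}{n}\right| \le \frac{0.03}{\log x} \tag{2.21} \]
for $x \ge 11815$.

3. (Ramaré [Ramb])
\[ \sum_{n \le x:\, \gcd(n,q)=1} \frac{\mu(n)}{n} = O^*\left(\frac{1}{\log x/q} \cdot \frac{4}{5} \frac{q}{\phi(q)}\right) \tag{2.22} \]
for all $x$ and all $q \le x$;
\[ \sum_{n \le x:\, \gcd(n,q)=1} \frac{\mu(n)}{n} \log \frac{x}{n} = O^*\left(1.00303 \frac{q}{\phi(q)}\right) \tag{2.23} \]
for all $x$ and all $q$.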


Improvements on these bounds would lead to improvements on type I estimates, but
not in what are the worst terms overall at this point.
A computation carried out by the author has proven the following inequality for all
real $x \le 10^{12}$:
\[ \left|\sum_{n \le x} \frac{\mu(n)}{n}\right| \le \sqrt{\frac{2}{x}}. \tag{2.24} \]

The computation was conducted rigorously by means of interval arithmetic. For the
sake of verification, we record that
\[ 5.42625 \cdot 10^{-8} \le \sum_{n \le 10^{12}} \frac{\mu(n)}{n} \le 5.42898 \cdot 10^{-8}. \]

Computations also show that the stronger bound
\[ \left|\sum_{n \le x} \frac{\mu(n)}{n}\right| \le \frac{1}{2\sqrt{x}} \]
holds for all $3 \le x \le 7727068587$, but not for $x = 7727068588 - \epsilon$.

Earlier, numerical work carried out by Olivier Ramaré [Ram14] had shown that
(2.24) holds for all $x \le 10^{10}$.
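For orientation, the following Python sketch spot-checks (2.24) and (2.20) at integer values of $x$ up to a small limit. It is only a floating-point illustration on a tiny range, far from the rigorous interval-arithmetic computation up to $10^{12}$ described above; the limit `LIMIT`, the sample moduli and the function names are choices made here.

```python
import math

def mobius_table(limit):
    """Return a list mu with mu[n] = mu(n) for 1 <= n <= limit (and mu[0] = 0)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]          # one more prime factor
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0               # not square-free
    return mu

LIMIT = 20000
mu = mobius_table(LIMIT)

def check_2_24():
    """Check |sum_{n<=x} mu(n)/n| <= sqrt(2/x) at the integers x <= LIMIT."""
    s = 0.0
    for x in range(1, LIMIT + 1):
        s += mu[x] / x
        if abs(s) > math.sqrt(2 / x) + 1e-12:
            return False
    return True

def check_2_20(q):
    """Check |sum_{n<=x, gcd(n,q)=1} mu(n)/n| <= 1 at the integers x <= LIMIT."""
    s = 0.0
    for x in range(1, LIMIT + 1):
        if math.gcd(x, q) == 1:
            s += mu[x] / x
        if abs(s) > 1 + 1e-12:
            return False
    return True

print(check_2_24(), [check_2_20(q) for q in (1, 2, 6, 30)])
```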

2.6 Interval arithmetic and the bisection method


Interval arithmetic has, as its basic data type, intervals of the form $I = [a/2^\ell, b/2^\ell]$,
where $a, b, \ell \in \mathbb{Z}$ and $a \le b$. Say we have a real number $x$, and we want to know $\sin(x)$.
In general, we cannot represent $x$ in a computer, in part because it may have no finite
description. The best we can do is to construct an interval of the form $I = [a/2^\ell, b/2^\ell]$
in which $x$ is contained.
What we ask of a routine in an interval-arithmetic package is to construct an interval
$I' = [a'/2^{\ell'}, b'/2^{\ell'}]$ in which $\sin(I)$ is contained. (In practice, this is done partly in
software, by means of polynomial approximations to sin with precise error terms, and
partly in hardware, by means of an efficient usage of rounding conventions.) This gives
us, in effect, a value for $\sin(x)$ (namely, $(a' + b')/2^{\ell'+1}$) and a bound on the error term
(namely, $(b' - a')/2^{\ell'+1}$).
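The idea of enclosing $\sin(I)$ in an interval with dyadic endpoints can be sketched in a few lines of Python. This is a toy model, not Platt's package or any hardware-assisted implementation: it uses exact rational arithmetic (`fractions.Fraction`) instead of directed rounding, it is restricted to $I \subset [-1, 1]$ (where sin is increasing), and all function names and the default precision `l = 40` are assumptions made here.

```python
from fractions import Fraction
import math

def round_down(q, l):
    """Largest multiple of 2^-l that is <= q."""
    return Fraction(math.floor(q * 2**l), 2**l)

def round_up(q, l):
    """Smallest multiple of 2^-l that is >= q."""
    return Fraction(math.ceil(q * 2**l), 2**l)

def sin_bounds(x, terms=10):
    """Exact rational lower and upper bounds for sin(x), x a Fraction.

    Uses the Taylor polynomial of sin; the remainder is at most the absolute
    value of the first omitted term, |x|^(2N+1)/(2N+1)!.
    """
    total, term = Fraction(0), Fraction(x)
    for j in range(terms):
        total += term
        term = -term * x * x / ((2 * j + 2) * (2 * j + 3))
    err = abs(term)
    return total - err, total + err

def sin_interval(a, b, l=40):
    """Return (lo, hi), multiples of 2^-l, with sin([a, b]) inside [lo, hi]."""
    if not (-1 <= a <= b <= 1):
        raise ValueError("this sketch only handles intervals inside [-1, 1]")
    lower, _ = sin_bounds(a)   # sin is increasing on [-1, 1]
    _, upper = sin_bounds(b)
    return round_down(lower, l), round_up(upper, l)

lo, hi = sin_interval(Fraction(1, 2), Fraction(9, 16))
midpoint = (lo + hi) / 2   # the "value", (a' + b') / 2^(l'+1)
radius = (hi - lo) / 2     # the error bound, (b' - a') / 2^(l'+1)
```

A real package avoids rational arithmetic altogether and relies on the processor's rounding modes, which is where the speed comes from.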
There are several implementations of interval arithmetic available. We will almost
always use D. Platt’s implementation [Pla11] of double-precision interval arithmetic
based on Lambov’s [Lam08] ideas. (At one point, we will use the PROFIL/BIAS inter-
val arithmetic package [Knü99], since it underlies the VNODE-LP [Ned06] package,
which we use to bound an integral.)
The bisection method is a particularly simple method for finding maxima and minima
of functions, as well as roots. It combines rather nicely with interval arithmetic,
which makes the method rigorous. We follow an implementation based on [Tuc11,
§5.2]. Let us go over the basic ideas.
Let us use the bisection method to find the minima (say) of a function $f$ on a
compact interval $I_0$. (If the interval is non-compact, we generally apply the bisection
method to a compact sub-interval and use other tools, e.g., power-series expansions,
in the complement.) The method proceeds by splitting an interval into two repeatedly,
discarding the halves where the minimum cannot be found. More precisely, if we implement
it by interval arithmetic, it proceeds as follows. First, in an optional initial
step, we subdivide (if necessary) the interval $I_0$ into smaller intervals $I_k$ to which the
algorithm will actually be applied. For each $k$, interval arithmetic gives us a lower

bound $r_k^-$ and an upper bound $r_k^+$ on $\{f(x) : x \in I_k\}$; here $r_k^-$ and $r_k^+$ are both of
the form $a/2^\ell$, $a, \ell \in \mathbb{Z}$. Let $m_0$ be the minimum of $r_k^+$ over all $k$. We can discard
all the intervals $I_k$ for which $r_k^- > m_0$. Then we apply the main procedure: starting
with $i = 1$, split each surviving interval into two equal halves, recompute the lower and
upper bound on each half, define $m_i$, as before, to be the minimum of all upper bounds,
and discard, again, the intervals on which the lower bound is larger than $m_i$; increase $i$
by 1. We repeat the main procedure as often as needed. In the end, we obtain that the
minimum is no smaller than the minimum of the lower bounds (call them $(r^{(i)})_k^-$) on
all surviving intervals $I_k^{(i)}$. Of course, we also obtain that the minimum (or minima, if
there is more than one) must lie in one of the surviving intervals.
It is easy to see how the same method can be applied (with a trivial modification)
to find maxima, or (with very slight changes) to find the roots of a real-valued function
on a compact interval.
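The procedure just described can be sketched as follows in Python, for the toy function $f(x) = x^2 - x$ on $I_0 = [0, 2]$ (minimum $-1/4$ at $x = 1/2$). This is an illustration only: it is not the implementation of [Tuc11, §5.2] or the routines used in the proof, it skips the optional initial subdivision, and it uses exact rational endpoints, so no outward rounding is needed. The names `imul`, `isub`, `f_range` and `bisect_min` are hypothetical.

```python
from fractions import Fraction

def imul(p, q):
    """Interval product: the smallest interval containing {x*y : x in p, y in q}."""
    prods = [p[0] * q[0], p[0] * q[1], p[1] * q[0], p[1] * q[1]]
    return (min(prods), max(prods))

def isub(p, q):
    """Interval difference: contains {x - y : x in p, y in q}."""
    return (p[0] - q[1], p[1] - q[0])

def f_range(I):
    """Enclosure (r^-, r^+) of {x*x - x : x in I} by naive interval evaluation."""
    return isub(imul(I, I), I)

def bisect_min(f_range, I0, rounds=20):
    """Return (lower, upper, surviving) with lower <= min f <= upper on I0."""
    live = [I0]
    for _ in range(rounds):
        halves = []
        for (a, b) in live:
            m = (a + b) / 2
            halves += [(a, m), (m, b)]
        bounds = [f_range(I) for I in halves]
        m_i = min(hi for (_, hi) in bounds)       # minimum of all upper bounds
        # Discard the intervals whose lower bound exceeds m_i.
        live = [I for I, (lo, _) in zip(halves, bounds) if lo <= m_i]
    lower = min(f_range(I)[0] for I in live)
    upper = min(f_range(I)[1] for I in live)
    return lower, upper, live

lower, upper, live = bisect_min(f_range, (Fraction(0), Fraction(2)))
print(float(lower), float(upper), len(live))
```

The naive enclosure `f_range` overestimates the true range on each piece; the overestimate shrinks with the width of the interval, which is why repeated halving tightens the bounds.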
Part I

Minor arcs

Chapter 3

Introduction

The circle method expresses the number of solutions to a given problem in terms of
exponential sums. Let $\eta : \mathbb{R}^+ \to \mathbb{C}$ be a smooth function, $\Lambda$ the von Mangoldt function
(defined as in (1.5)) and $e(t) = e^{2\pi i t}$. The estimation of exponential sums of the type
\[ S_\eta(\alpha, x) = \sum_n \Lambda(n) e(\alpha n) \eta(n/x), \tag{3.1} \]

where α ∈ R/Z, already lies at the basis of Hardy and Littlewood’s approach to the
ternary Goldbach problem by means of the circle method [HL22]. The division of the
circle R/Z into “major arcs” and “minor arcs” goes back to Hardy and Littlewood’s
development of the circle method for other problems. As they themselves noted, as-
suming GRH means that, for the ternary Goldbach problem, all of the circle can be,
in effect, subdivided into major arcs – that is, under GRH, (3.1) can be estimated with
major-arc techniques for α arbitrary. They needed to make such an assumption pre-
cisely because they did not yet know how to estimate Sη (α, x) on the minor arcs.
Minor-arc techniques for Goldbach’s problem were first developed by Vinogradov
[Vin37]. These techniques make it possible to work without GRH. The main obstacle
to a full proof of the ternary Goldbach conjecture since then has been that, in spite of
gradual improvements, minor-arc bounds have simply not been strong enough.
As in all work to date, our aim will be to give useful upper bounds on (3.1) for
α in the minor arcs, rather than the precise estimates that are typical of the major-
arc case. We will have to give upper bounds that are qualitatively stronger than those
known before. (In Part III, we will also show how to use them more efficiently.)
Our main challenge will be to give a sufficiently good upper bound whenever q is larger
than a constant r. Here “sufficiently good” means “smaller than the trivial bound divided by
a large constant, and getting even smaller quickly as q grows”. Our bound must also be
good for α = a/q + δ/x, where q < r but δ is large. (Such an α may be said to lie on
the tail (δ large) of a major arc (q small).)
Of course, all expressions must be explicit and all constants in the leading terms of
the bound must be small. Still, the main requirement is a qualitative one. For instance,
we know in advance that a single factor of log x would be the end of us. That is, we


know that, if there is a single term of the form, say, (x log x)/q, and the trivial bound
is about x, we are lost: (x log x)/q is greater than x for x large and q constant.
The quality of the results here is due to several new ideas of general applicability.
In particular, §5.1 introduces a way to obtain cancellation from Vaughan’s identity.
Vaughan’s identity is a two-log gambit, in that it introduces two convolutions (each of
them at a cost of log) and offers a great deal of flexibility in compensation. One of the
ideas presented here is that at least one of two logs can be successfully recovered after
having been given away in the first stage of the proof. This reduces the cost of the use
of this basic identity in this and, presumably, many other problems.
There are several other improvements that make a qualitative difference; see the
discussions at the beginning of §4 and §5. Considering smoothed sums – now a com-
mon idea – also helps. (Smooth sums here go back to Hardy-Littlewood [HL22] – both
in the general context of the circle method and in the context of Goldbach’s ternary
problem. In recent work on the problem, they reappear in [Tao14].)

3.1 Results
The main bound we are about to see is essentially proportional to $((\log q)/\sqrt{\phi(q)}) \cdot x$.
The term $\delta_0$ serves to improve the bound when we are on the tail of an arc.
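Theorem 3.1.1. Let $x \ge x_0$, $x_0 = 2.16 \cdot 10^{20}$. Let $S_\eta(\alpha, x)$ be as in (3.1), with $\eta$
defined in (3.4). Let $2\alpha = a/q + \delta/x$, $q \le Q$, $\gcd(a, q) = 1$, $|\delta/x| \le 1/qQ$, where
$Q = (3/4)x^{2/3}$. If $q \le x^{1/3}/6$, then
\[ |S_\eta(\alpha, x)| \le \frac{R_{x,\delta_0 q} \log \delta_0 q + 0.5}{\sqrt{\delta_0 \phi(q)}} \cdot x + \frac{2.5x}{\sqrt{\delta_0 q}} + \frac{2x}{\delta_0 q} \cdot L_{x,\delta_0 q,q} + 3.36 x^{5/6}, \tag{3.2} \]
where
\[ \begin{aligned} \delta_0 &= \max(2, |\delta|/4), \qquad R_{x,t} = 0.27125 \log\left(1 + \frac{\log 4t}{2 \log \frac{9x^{1/3}}{2.004t}}\right) + 0.41415, \\ L_{x,t,q} &= \frac{q}{\phi(q)} \left(\frac{13}{4} \log t + 7.82\right) + 13.66 \log t + 37.55. \end{aligned} \tag{3.3} \]
If $q > x^{1/3}/6$, then
\[ |S_\eta(\alpha, x)| \le 0.276 x^{5/6} (\log x)^{3/2} + 1234 x^{2/3} \log x. \]

The factor $R_{x,t}$ is small in practice; for instance, for $x = 10^{25}$ and $\delta_0 q = 5 \cdot 10^5$
(typical “difficult” values), $R_{x,\delta_0 q}$ equals $0.59648\ldots$.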
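The quantities in (3.2) and (3.3) are simple to evaluate. The Python sketch below reproduces the sample value of $R_{x,\delta_0 q}$ quoted above and evaluates the right side of (3.2) divided by $x$ for one choice of parameters. The function names and the sample values `q = 30030`, `phi_q = 5760` are choices made here for illustration; the code uses plain floating point and only covers the case $q \le x^{1/3}/6$.

```python
import math

def R(x, t):
    """R_{x,t} as in (3.3)."""
    inner = math.log(4 * t) / (2 * math.log(9 * x ** (1 / 3) / (2.004 * t)))
    return 0.27125 * math.log(1 + inner) + 0.41415

def L(t, q, phi_q):
    """L_{x,t,q} as in (3.3); the formula does not actually involve x."""
    return (q / phi_q) * (13 / 4 * math.log(t) + 7.82) + 13.66 * math.log(t) + 37.55

def bound_3_2_over_x(x, q, phi_q, delta):
    """Right side of (3.2) divided by x, for q <= x^(1/3)/6."""
    d0 = max(2.0, abs(delta) / 4)
    t = d0 * q
    return ((R(x, t) * math.log(t) + 0.5) / math.sqrt(d0 * phi_q)
            + 2.5 / math.sqrt(t)
            + 2 / t * L(t, q, phi_q)
            + 3.36 * x ** (-1 / 6))

print(R(1e25, 5e5))
# q = 2*3*5*7*11*13 = 30030 has phi(q) = 5760, a small value of phi(q)/q.
print(bound_3_2_over_x(1e27, 30030, 5760, 0.0))
```

The second printed value can be compared with the trivial bound 1.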
The classical choice¹ for $\eta$ in (3.1) is $\eta(t) = 1$ for $t \le 1$, $\eta(t) = 0$ for $t > 1$, which,
of course, is not smooth, or even continuous. We use
\[ \eta(t) = \eta_2(t) = 4 \max(\log 2 - |\log 2t|, 0), \tag{3.4} \]
as in Tao [Tao14], in part for purposes of comparison. (This is the multiplicative
convolution of the characteristic function of an interval with itself.) Nearly all work should
be applicable to any other sufficiently smooth function $\eta$ of fast decay. It is important
that $\hat\eta$ decay at least quadratically.

¹ Or, more precisely, the choice made by Vinogradov and followed by most of the literature since him.
Hardy and Littlewood [HL22] worked with $\eta(t) = e^{-t}$.
We are not forced to use the same smoothing function as in Part II, and we do not.
As was explained in the introduction, the simple technique (1.40) allows us to work
with one smoothing function on the major arcs and with another one on the minor arcs.

3.2 Comparison to earlier work


Table 3.1 compares the bounds for the ratio $|S_\eta(a/q, x)|/x$ given by this paper and by
[Tao14, Thm. 1.3] for $x = 10^{27}$ and different values of $q$. We are comparing worst
cases: $\phi(q)$ as small as possible ($q$ divisible by $2 \cdot 3 \cdot 5 \cdots$) in the result here, and $q$
divisible by 4 (implying $4\alpha \sim a/(q/4)$) in Tao's result. The main term in the result in
this paper improves slowly with increasing $x$; the results in [Tao14] worsen slowly with
increasing $x$. The qualitative gain with respect to the main term in [Tao14, (1.10)] is in
the order of $\log(q) \sqrt{\phi(q)/q}$. Notice also that the bounds in [Tao14] are not log-free; in
[Tao14, (1.10)], there is a term proportional to $x(\log x)^2/q$. This becomes larger than
the trivial bound $x$ for $x$ very large.
The results in [DR01] are unfortunately worse than the trivial bound in the range
covered by Table 3.1. Ramaré's results ([Ram10, Thm. 3], [Ramc, Thm. 6]) are not
applicable within the range, since neither of the conditions $\log q \le (1/50)(\log x)^{1/3}$,
$q \le x^{1/48}$ is satisfied. Ramaré's bound in [Ramc, Thm. 6] is
\[ \left|\sum_{x < n \le 2x} \Lambda(n) e(an/q)\right| \le 13000 \frac{\sqrt{q}}{\phi(q)} x \tag{3.5} \]
for $20 \le q \le x^{1/48}$. We should underline that, while both the constant 13000 and the
condition $q \le x^{1/48}$ keep (3.5) from being immediately useful in the present context,
(3.5) is asymptotically better than the results here as $q \to \infty$. (Indeed, qualitatively
speaking, the form of (3.5) is the best one can expect from results derived by the family
of methods stemming from Vinogradov's work.) There is also unpublished work by
Ramaré (ca. 1993) with different constants for $q \ll (\log x/\log\log x)^4$.

3.3 Basic setup


In the minor-arc regime, the first step in estimating an exponential sum on the primes
generally consists in the application of an identity expressing the von Mangoldt func-
tion Λ(n) in terms of a sum of convolutions of other functions.

3.3.1 Vaughan’s identity


We recall Vaughan's identity [Vau77b]:
\[ \Lambda = \mu_{\le U} * \log - \mu_{\le U} * \Lambda_{\le V} * 1 + \mu_{>U} * \Lambda_{>V} * 1 + \Lambda_{\le V}, \tag{3.6} \]
where 1 is the constant function 1, and where we write
\[ f_{\le z}(n) = \begin{cases} f(n) & \text{if } n \le z, \\ 0 & \text{if } n > z, \end{cases} \qquad f_{>z}(n) = \begin{cases} 0 & \text{if } n \le z, \\ f(n) & \text{if } n > z. \end{cases} \]
Here $f * g$ denotes the Dirichlet convolution $(f * g)(n) = \sum_{d \mid n} f(d) g(n/d)$. We can
set the values of $U$ and $V$ however we wish.

    q_0            |S_η(a/q,x)|/x, HH    |S_η(a/q,x)|/x, Tao
    10^5           0.04661               0.34475
    1.5 · 10^5     0.03883               0.28836
    2.5 · 10^5     0.03098               0.23194
    5 · 10^5       0.02297               0.17416
    7.5 · 10^5     0.01934               0.14775
    10^6           0.01756               0.13159
    10^7           0.00690               0.05251

Table 3.1: Worst-case upper bounds on $x^{-1} |S_\eta(a/2q, x)|$ for $q \ge q_0$, $|\delta| \le 8$, $x =
10^{27}$. The trivial bound is 1.
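Vaughan's identity is essentially a consequence of the Möbius inversion formula
\[ (1 * \mu)(n) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{3.7} \]

Indeed, by (3.7),
\[ \Lambda_{>V}(n) = \sum_{dm \mid n} \mu(d) \Lambda_{>V}(m) = \sum_{dm \mid n} \mu_{\le U}(d) \Lambda_{>V}(m) + \sum_{dm \mid n} \mu_{>U}(d) \Lambda_{>V}(m). \]

Applying to this the trivial equality $\Lambda_{>V} = \Lambda - \Lambda_{\le V}$, as well as the simple fact that
$1 * \Lambda = \log$, we obtain that
\[ \Lambda_{>V}(n) = \sum_{d \mid n} \mu_{\le U}(d) \log(n/d) - \sum_{dm \mid n} \mu_{\le U}(d) \Lambda_{\le V}(m) + \sum_{dm \mid n} \mu_{>U}(d) \Lambda_{>V}(m). \]

By $\Lambda = \Lambda_{>V} + \Lambda_{\le V}$, we conclude that Vaughan's identity (3.6) holds.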
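Vaughan's identity, in the form obtained from the derivation above (with a minus sign in front of the term involving $\mu_{\le U} * \Lambda_{\le V} * 1$), can be checked numerically for small $n$. The Python sketch below does so by brute-force Dirichlet convolution; the parameters `N`, `U`, `V` and all function names are arbitrary choices made here, and floating-point logarithms are used.

```python
import math

N, U, V = 300, 5, 7

def mangoldt(n):
    """Lambda(n): log p if n is a power of the prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = next(d for d in range(2, n + 1) if n % d == 0)   # smallest prime factor
    while n % p == 0:
        n //= p
    return math.log(p) if n == 1 else 0.0

def mobius(n):
    """mu(n) for n >= 1, by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result

def conv(f, g):
    """Dirichlet convolution of two lists indexed by 0..N (index 0 unused)."""
    h = [0.0] * (N + 1)
    for d in range(1, N + 1):
        if f[d] == 0:
            continue
        for m in range(1, N // d + 1):
            h[d * m] += f[d] * g[m]
    return h

Lam = [mangoldt(n) for n in range(N + 1)]
mu = [mobius(n) if n else 0 for n in range(N + 1)]
log = [0.0] + [math.log(n) for n in range(1, N + 1)]
one = [0] + [1] * N

mu_le = [mu[n] if n <= U else 0 for n in range(N + 1)]
mu_gt = [mu[n] if n > U else 0 for n in range(N + 1)]
Lam_le = [Lam[n] if n <= V else 0.0 for n in range(N + 1)]
Lam_gt = [Lam[n] if n > V else 0.0 for n in range(N + 1)]

t1 = conv(mu_le, log)
t2 = conv(conv(mu_le, Lam_le), one)
t3 = conv(conv(mu_gt, Lam_gt), one)
rhs = [t1[n] - t2[n] + t3[n] + Lam_le[n] for n in range(N + 1)]
max_error = max(abs(Lam[n] - rhs[n]) for n in range(1, N + 1))
print(max_error)
```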


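Applying Vaughan's identity, we easily get that, for any function $\eta : \mathbb{R} \to \mathbb{R}$, any
completely multiplicative function $f : \mathbb{Z}^+ \to \mathbb{C}$ and any $x > 0$, $U, V \ge 0$,
\[ \sum_n \Lambda(n) f(n) e(\alpha n) \eta(n/x) = S_{I,1} - S_{I,2} + S_{II} + S_{0,\infty}, \tag{3.8} \]
where
\[ \begin{aligned} S_{I,1} &= \sum_{m \le U} \mu(m) f(m) \sum_n (\log n) e(\alpha m n) f(n) \eta(mn/x), \\ S_{I,2} &= \sum_{d \le V} \Lambda(d) f(d) \sum_{m \le U} \mu(m) f(m) \sum_n e(\alpha d m n) f(n) \eta(dmn/x), \\ S_{II} &= \sum_{m > U} f(m) \Biggl(\sum_{\substack{d > U \\ d \mid m}} \mu(d)\Biggr) \sum_{n > V} \Lambda(n) e(\alpha m n) f(n) \eta(mn/x), \\ S_{0,\infty} &= \sum_{n \le V} \Lambda(n) e(\alpha n) f(n) \eta(n/x). \end{aligned} \tag{3.9} \]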

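We will use the function
\[ f(n) = \begin{cases} 1 & \text{if } \gcd(n, v) = 1, \\ 0 & \text{otherwise,} \end{cases} \tag{3.10} \]
where $v$ is a small, positive, square-free integer. (Our final choice will be $v = 2$.) Then
\[ S_\eta(\alpha, x) = S_{I,1} - S_{I,2} + S_{II} + S_{0,\infty} + S_{0,v}, \tag{3.11} \]
where $S_\eta(\alpha, x)$ is as in (3.1) and
\[ S_{0,v} = \sum_{n \mid v} \Lambda(n) e(\alpha n) \eta(n/x). \]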

The sums SI,1 , SI,2 are called “of type I”, the sum SII is called “of type II” (or
bilinear). (The not-all-too colorful nomenclature goes back to Vinogradov.) The sum
S0,∞ is in general negligible; for our later choice of V and η, it will be in fact 0. The
sum S0,v will be negligible as well.
As we already discussed in the introduction, Vaughan’s identity is highly flexible
(in that we can choose U and V at will) but somewhat inefficient in practice (in that a
trivial estimate for the right side of (3.11) is actually larger than a trivial estimate for
the left side of (3.11)). Some of our work will consist in regaining part of what is given
up when we apply Vaughan’s identity.

3.3.2 An alternative route


There is an alternative route – namely, to use a less sacrificial, though also more in-
flexible, identity. While this was not, in the end, the route that was followed, let us
nevertheless discuss it in some detail, in part so that we can understand to what extent
it was, in retrospect, viable, and in part so as to see how much of the work we will
undertake is really more or less independent of the particular identity we choose.

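Since $-\zeta'(s)/\zeta(s) = \sum_n \Lambda(n) n^{-s}$ and
\[ \begin{aligned} \left(\frac{\zeta'(s)}{\zeta(s)}\right)^{(2)} &= \left(\frac{\zeta''(s)}{\zeta(s)} - \frac{(\zeta'(s))^2}{\zeta(s)^2}\right)' \\ &= \frac{\zeta^{(3)}(s)}{\zeta(s)} - \frac{3\zeta''(s)\zeta'(s)}{\zeta(s)^2} + 2\left(\frac{\zeta'(s)}{\zeta(s)}\right)^3 \\ &= \frac{\zeta^{(3)}(s)}{\zeta(s)} - 3\left(\frac{\zeta'(s)}{\zeta(s)}\right)' \cdot \frac{\zeta'(s)}{\zeta(s)} - \left(\frac{\zeta'(s)}{\zeta(s)}\right)^3, \end{aligned} \tag{3.12} \]
we can see, comparing coefficients, that
\[ \Lambda \cdot \log^2 = \mu * \log^3 - 3(\Lambda \cdot \log) * \Lambda - \Lambda * \Lambda * \Lambda, \tag{3.13} \]
as was stated by Bombieri in [Bom76].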
Here the term $\mu * \log^3$ is of the same kind as the term $\mu_{\le U} * \log$ we have to estimate
if we use Vaughan's identity, though the fact that there is no truncation at $U$ means that
one of the error terms will get larger – it will be proportional to $x$, in fact, if we sum
from 1 to $x$. The trivial upper bound on the sum of $\Lambda \cdot \log^2$ from 1 to $x$ is $x(\log x)^2$;
thus, an error term of size $x$ is barely acceptable.
In general, when we have a double or triple sum, we are not very good at getting
better than trivial bounds in ranges in which all but one of the variables are very small.
This is the source of the large error term that appears in the sum involving $\mu * \log^3$
because we are no longer truncating as for $\mu_{\le U} * \log$. It will also be the source of other
large error terms, including one that would be too large – namely, the one coming from
the term $(\Lambda \cdot \log) * \Lambda$ when the variable of $\Lambda \cdot \log$ is large and that of $\Lambda$ is small. (The
trivial bound on that range is $\gg x \log x$.)
We avoid this problem by substituting the identity $\Lambda \cdot \log = \mu * \log^2 - \Lambda * \Lambda$ inside
(3.13):
\[ \Lambda \cdot \log^2 = \mu * \log^3 - 3(\mu * \log^2) * \Lambda + 2\Lambda * \Lambda * \Lambda. \tag{3.14} \]
(We could also have got this directly from the next-to-last line in (3.12).) When the
variable of $\Lambda$ in $(\mu * \log^2) * \Lambda$ is small, the variable of $\mu * \log^2$ is large, and we can
estimate the resulting term using the same techniques as for $\mu * \log^3$.
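It is easy to see that we can in fact mix (3.13) and (3.14):
\[ \begin{aligned} \Lambda \cdot \log^2 = \mu * \log^3 &- 3\left((\Lambda \cdot \log) * \Lambda_{>V} + (\mu * \log^2) * \Lambda_{\le V}\right) \\ &+ \left(-\Lambda_{>V} * \Lambda * \Lambda + 2\Lambda_{\le V} * \Lambda * \Lambda\right) \end{aligned} \tag{3.15} \]
for $V$ arbitrary. Note here that there is some cancellation in the last term: writing
\[ F_{3,V}(n) = \left(-\Lambda_{>V} * \Lambda * \Lambda + 2\Lambda_{\le V} * \Lambda * \Lambda\right)(n), \tag{3.16} \]
we can check easily that, for $n = p_1 p_2 p_3$ square-free with $V^3 < n$, we have
\[ F_{3,V}(n) = \begin{cases} -6 \log p_1 \log p_2 \log p_3 & \text{if all } p_i > V, \\ 0 & \text{if } p_1 \le V < p_2 < p_3, \\ 6 \log p_1 \log p_2 \log p_3 & \text{if } p_1 < p_2 \le V < p_3, \\ 12 \log p_1 \log p_2 \log p_3 & \text{if all } p_i \le V. \end{cases} \]

In contrast, for $n$ square-free, $-\Lambda * \Lambda * \Lambda(n)$ is $-6 \log p_1 \log p_2 \log p_3$ if $n$ is of the form $p_1 p_2 p_3$, and 0
otherwise.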
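The identities (3.13), (3.14) and (3.15) can be confirmed numerically for small $n$ by brute-force Dirichlet convolution. The Python sketch below does this; the range `N`, the cutoff `V` and all names are arbitrary choices made here, and the arithmetic is plain floating point.

```python
import math

N, V = 200, 6

def mangoldt(n):
    """Lambda(n): log p if n is a power of the prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = next(d for d in range(2, n + 1) if n % d == 0)
    while n % p == 0:
        n //= p
    return math.log(p) if n == 1 else 0.0

def mobius(n):
    """mu(n) for n >= 1, by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result

def conv(f, g):
    """Dirichlet convolution of two lists indexed by 0..N (index 0 unused)."""
    h = [0.0] * (N + 1)
    for d in range(1, N + 1):
        if f[d] == 0:
            continue
        for m in range(1, N // d + 1):
            h[d * m] += f[d] * g[m]
    return h

idx = range(N + 1)
Lam = [mangoldt(n) for n in idx]
mu = [mobius(n) if n else 0 for n in idx]
log1 = [0.0] + [math.log(n) for n in range(1, N + 1)]
log2 = [v ** 2 for v in log1]
log3 = [v ** 3 for v in log1]
Lam_log = [Lam[n] * log1[n] for n in idx]            # Lambda . log
Lam_le = [Lam[n] if n <= V else 0.0 for n in idx]
Lam_gt = [Lam[n] if n > V else 0.0 for n in idx]

lhs = [Lam[n] * log2[n] for n in idx]                # Lambda . log^2
A = conv(mu, log3)                                   # mu * log^3
B = conv(mu, log2)                                   # mu * log^2
LL = conv(Lam, Lam)
LLL = conv(LL, Lam)

rhs13 = [A[n] - 3 * conv(Lam_log, Lam)[n] - LLL[n] for n in idx]
rhs14 = [A[n] - 3 * conv(B, Lam)[n] + 2 * LLL[n] for n in idx]
mix = [conv(Lam_log, Lam_gt)[n] + conv(B, Lam_le)[n] for n in idx]
F3 = [-conv(Lam_gt, LL)[n] + 2 * conv(Lam_le, LL)[n] for n in idx]
rhs15 = [A[n] - 3 * mix[n] + F3[n] for n in idx]

err13 = max(abs(lhs[n] - rhs13[n]) for n in range(1, N + 1))
err14 = max(abs(lhs[n] - rhs14[n]) for n in range(1, N + 1))
err15 = max(abs(lhs[n] - rhs15[n]) for n in range(1, N + 1))
print(err13, err14, err15)
```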
We may find it useful to take aside two large terms that may need to be bounded
trivially, namely, $\mu * \log^3_{\le u}$ and $(\Lambda \cdot \log)_{\le u} * \Lambda_{>V}$, where $u$ will be a small parameter.
(We can let, for instance, $u = 3$.) We conclude that
\[ \Lambda \cdot \log^2 = F_{I,1,u}(n) - 3F_{I,2,V,u}(n) - 3F_{II,V,u}(n) + F_{3,V}(n) + F_{0,V,u}(n), \tag{3.17} \]
where
\[ \begin{aligned} F_{I,1,u} &= \mu * \log^3_{>u}, \\ F_{I,2,V,u} &= (\mu * \log^2) * \Lambda_{\le V}, \\ F_{II,V,u}(n) &= (\Lambda \cdot \log)_{>u} * \Lambda_{>V}, \\ F_{0,V,u}(n) &= \mu * \log^3_{\le u} - 3(\Lambda \cdot \log)_{\le u} * \Lambda_{>V} \end{aligned} \]
and $F_{3,V}$ is as in (3.16).
In the bulk of the present work – in particular, in all steps that are part of the proof
of Theorem 3.1.1 or the Main Theorem – we will use Vaughan’s identity, rather than
(3.17). This choice was made while the proof was still underway; it was due mainly
to back-of-the-envelope estimates that showed that the error terms could be too large
if (3.14) was used. Of course, this might have been the case with Vaughan’s identity
as well, but the fact that the parameters U , V there have a large effect on the outcome
meant that one could hope to improve on insufficient estimates in part by adjusting U
and V , without losing all previous work. (This is what was meant by the “flexibility”
of Vaughan’s identity.)
The question remains: can one prove ternary Goldbach using (3.17) rather than
Vaughan’s identity? This seems likely. If so, which proof would be more complicated?
This is not clear.
There are large parts of the work that are essentially the same in both cases:

• estimates for sums involving $\mu_{\leq U} * \log^k$ (“type I”),

• estimates for sums involving $\Lambda_{>u} * \Lambda_{>V}$ and the like (“type II”).

Trilinear sums, i.e., sums involving Λ∗Λ∗Λ, can be estimated much like bilinear sums,
i.e., sums involving Λ ∗ Λ.
There are also challenges that appear only for Vaughan’s identity and others that
appear only for (3.17). An example of a challenge that is successfully faced in the main
proof, but does not appear if (3.17) is used, consists in bounding sums of type
\[
\sum_{U < m \leq x/W} \Bigg( \sum_{\substack{d > U \\ d \mid m}} \mu(d) \Bigg)^2 .
\]
(In §5.1, we will be able to bound sums of this type by a constant times x/W.) Likewise,
large tail terms that have to be estimated trivially seem unavoidable in (3.17).
(The choice of a parameter u > 1, as above, is meant to alleviate the problem.)

In the end, losing a factor of about $\log(x/UV)$ seems inevitable when one uses
Vaughan’s identity, but not when one uses (3.17). Another reason why a full treatment
based on (3.17) would also be worthwhile is that it is a somewhat less familiar, and
arguably under-used, identity and deserves more exploration. With these comments,
we close the discussion of (3.17); we will henceforth use Vaughan’s identity.
Chapter 4

Type I sums

Here, we must bound sums of the basic type
\[
\sum_{m \leq D} \mu(m) \sum_n e(\alpha m n)\, \eta\!\left(\frac{mn}{x}\right)
\]

and variations thereof. There are three main improvements in comparison to standard
treatments:

1. The terms with m divisible by q get taken out and treated separately by analytic
means. This all but eliminates what would otherwise be the main term.

2. The other terms get handled by improved estimates on trigonometric sums. For
large m, the improvements have a substantial total effect – more than a constant
factor is gained.

3. The “error” term δ/x = α − a/q is used to our advantage. This happens both
through the Poisson summation formula and through the use of two alternative
approximations to the same number α.

The fact that a continuous weight η is used (“smoothing”) is a difference with respect
to the classical literature ([Vin37] and what followed), but not with respect to more
recent work (including [Tao14]); using smooth or continuous weights is an idea that
has become commonplace in analytic number theory, even though it is not consistently
applied. The improvements due to smoothing in type I are both relatively minor and
essentially independent of the improvements due to (1) and (3). The use of a contin-
uous weight combines nicely with (2), but the ideas given here would give qualitative
improvements in the treatment of trigonometric sums even in the absence of smoothing.

4.1 Trigonometric sums


The following lemmas on trigonometric sums improve on the best Vinogradov-type
lemmas in the literature. (By this, we mean results of the type of Lemma 8a and
Lemma 8b in [Vin04, Ch. I]. See, in particular, the work of Daboussi and Rivat [DR01,
Lemma 1].) The main idea is to switch between different types of approximation within
the sum, rather than just choosing between bounding all terms either trivially (by A)
or non-trivially (by $C/|\sin(\pi\alpha n)|^2$). There will also¹ be improvements in our
applications stemming from the fact that Lemmas 4.1.1 and 4.1.2 take quadratic
($|\sin(\pi\alpha n)|^2$) rather than linear ($|\sin(\pi\alpha n)|$) inputs. (These improved inputs come
from the use of smoothing elsewhere.)

Lemma 4.1.1. Let α = a/q + β/qQ, (a, q) = 1, |β| ≤ 1, q ≤ Q. Then, for any
A, C ≥ 0,
\[
\sum_{y < n \leq y+q} \min\left(A, \frac{C}{|\sin(\pi\alpha n)|^2}\right)
\leq \min\left(2A + \frac{6q^2}{\pi^2} C,\; 3A + \frac{4q}{\pi}\sqrt{AC}\right). \tag{4.1}
\]

Proof. We start by letting $m_0 = \lfloor y\rfloor + \lfloor (q+1)/2\rfloor$, $j = n - m_0$, so that j ranges in
the interval (−q/2, q/2]. We write
\[
\alpha n = \frac{aj + c}{q} + \delta_1(j) + \delta_2 \mod 1,
\]
where $|\delta_1(j)|$ and $|\delta_2|$ are both ≤ 1/2q; we can assume $\delta_2 \geq 0$. The variable r =
aj + c mod q occupies each residue class mod q exactly once.
One option is to bound the terms corresponding to r = 0, −1 by A each and all
the other terms by $C/|\sin(\pi\alpha n)|^2$. (This can be seen as the simple case; it will take
us about a page just because we should estimate all sums and all terms here with great
care – as in [DR01], only more so.)
The terms corresponding to r = −k and r = k − 1 (2 ≤ k ≤ q/2) contribute at
most
\[
\frac{1}{\sin^2 \frac{\pi}{q}\left(k - \frac12 - q\delta_2\right)} + \frac{1}{\sin^2 \frac{\pi}{q}\left(k - \frac32 + q\delta_2\right)}
\leq \frac{1}{\sin^2 \frac{\pi}{q}\left(k - \frac12\right)} + \frac{1}{\sin^2 \frac{\pi}{q}\left(k - \frac32\right)},
\]
since $x \mapsto 1/(\sin x)^2$ is convex-up on (0, ∞). Hence the terms with r ≠ 0, 1 contribute at
most
\[
\frac{1}{\left(\sin\frac{\pi}{2q}\right)^2} + 2\sum_{2 \leq r \leq \frac{q}{2}} \frac{1}{\left(\sin\frac{\pi}{q}(r - 1/2)\right)^2}
\leq \frac{1}{\left(\sin\frac{\pi}{2q}\right)^2} + 2\int_1^{q/2} \frac{1}{\left(\sin\frac{\pi}{q}x\right)^2}\,dx,
\]
where we use again the convexity of $x \mapsto 1/(\sin x)^2$. (We can assume q > 2, as
otherwise we have no terms other than r = 0, 1.) Now
\[
\int_1^{q/2} \frac{1}{\left(\sin\frac{\pi}{q}x\right)^2}\,dx = \frac{q}{\pi}\int_{\pi/q}^{\pi/2} \frac{1}{(\sin u)^2}\,du = \frac{q}{\pi}\cot\frac{\pi}{q}.
\]

¹ This is a change with respect to the first version of the preprint [Helb]. The version of Lemma 4.1.1
there has, however, the advantage of being immediately comparable to results in the literature.

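Hence
\[
\sum_{y < n \leq y+q} \min\left(A, \frac{C}{(\sin \pi\alpha n)^2}\right)
\leq 2A + \frac{C}{\left(\sin\frac{\pi}{2q}\right)^2} + C\cdot\frac{2q}{\pi}\cot\frac{\pi}{q}.
\]
Now, by [AS64, (4.3.68)] and [AS64, (4.3.70)], for t ∈ (−π, π),
\[
\begin{aligned}
\frac{t}{\sin t} &= 1 + \sum_{k \geq 0} a_{2k+1} t^{2k+2} = 1 + \frac{t^2}{6} + \dots \\
t\cot t &= 1 - \sum_{k \geq 0} b_{2k+1} t^{2k+2} = 1 - \frac{t^2}{3} - \frac{t^4}{45} - \dots,
\end{aligned} \tag{4.2}
\]
where $a_{2k+1} \geq 0$, $b_{2k+1} \geq 0$. Thus, for $t \in [0, t_0]$, $t_0 < \pi$,
\[
\left(\frac{t}{\sin t}\right)^2 = 1 + \frac{t^2}{3} + c_0(t)t^4 \leq 1 + \frac{t^2}{3} + c_0(t_0)t^4, \tag{4.3}
\]
where
\[
c_0(t) = \frac{1}{t^4}\left(\left(\frac{t}{\sin t}\right)^2 - \left(1 + \frac{t^2}{3}\right)\right),
\]
which is an increasing function because $a_{2k+1} \geq 0$. For $t_0 = \pi/4$, $c_0(t_0) \leq 0.074807$.
Hence,
\[
\begin{aligned}
\frac{t^2}{\sin^2 t} + t\cot 2t &\leq \left(1 + \frac{t^2}{3} + c_0\!\left(\frac{\pi}{4}\right)t^4\right) + \left(\frac12 - \frac{2t^2}{3} - \frac{8t^4}{45}\right) \\
&= \frac32 - \frac{t^2}{3} + \left(c_0\!\left(\frac{\pi}{4}\right) - \frac{8}{45}\right)t^4 \leq \frac32 - \frac{t^2}{3} \leq \frac32
\end{aligned}
\]
for t ∈ [0, π/4].

Therefore, the left side of (4.1) is at most
\[
2A + C\cdot\left(\frac{2q}{\pi}\right)^2\cdot\frac32 = 2A + \frac{6}{\pi^2}Cq^2.
\]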
The following is an alternative approach; it yields the other estimate in (4.1). We
bound the terms corresponding to r = 0, r = −1, r = 1 by A each. We let r = ±r′
for r′ ranging from 2 to q/2. We obtain that the sum is at most
\[
3A + \sum_{2 \leq r' \leq q/2} \min\left(A, \frac{C}{\left(\sin\frac{\pi}{q}\left(r' - \frac12 - q\delta_2\right)\right)^2}\right)
+ \sum_{2 \leq r' \leq q/2} \min\left(A, \frac{C}{\left(\sin\frac{\pi}{q}\left(r' - \frac12 + q\delta_2\right)\right)^2}\right). \tag{4.4}
\]
We bound a term $\min(A, C/\sin((\pi/q)(r' - 1/2 \pm q\delta_2))^2)$ by A if and only if
$C/\sin((\pi/q)(r' - 1 \pm q\delta_2))^2 \geq A$. (In other words, we are choosing which of the two
bounds A, $C/|\sin(\pi\alpha n)|^2$ to use on a case-by-case basis, i.e., for each n, instead of making
a single choice for all n in one go. This is hardly anything deep, but it does result in
a marked improvement with respect to the literature, and would give an improvement
even if we were given a bound $B/|\sin(\pi\alpha n)|$ instead of a bound $C/|\sin(\pi\alpha n)|^2$ as
input.) The number of such terms is
\[
\leq \max\left(0, \left\lfloor (q/\pi)\arcsin\left(\sqrt{C/A}\right) \mp q\delta_2 \right\rfloor\right),
\]
and thus at most $(2q/\pi)\arcsin(\sqrt{C/A})$ in total. (Recall that $q\delta_2 \leq 1/2$.) Each
other term gets bounded by the integral of $C/\sin^2(\pi t/q)$ from $r' - 1 \pm q\delta_2$ (≥
$(q/\pi)\arcsin(\sqrt{C/A})$) to $r' \pm q\delta_2$, by convexity. Thus (4.4) is at most
\[
3A + \frac{2q}{\pi}A\arcsin\sqrt{\frac{C}{A}} + 2\int_{\frac{q}{\pi}\arcsin\sqrt{C/A}}^{q/2} \frac{C}{\sin^2\frac{\pi t}{q}}\,dt
\leq 3A + \frac{2q}{\pi}A\arcsin\sqrt{\frac{C}{A}} + \frac{2q}{\pi}C\sqrt{\frac{A}{C} - 1}.
\]
We can easily show (taking derivatives) that $\arcsin x + x\sqrt{1 - x^2} \leq 2x$ for 0 ≤
x ≤ 1. Setting $x = \sqrt{C/A}$, we see that this implies that
\[
3A + \frac{2q}{\pi}A\arcsin\sqrt{\frac{C}{A}} + \frac{2q}{\pi}C\sqrt{\frac{A}{C} - 1} \leq 3A + \frac{4q}{\pi}\sqrt{AC}.
\]
(If C/A > 1, then $3A + (4q/\pi)\sqrt{AC}$ is greater than Aq, which is an obvious upper
bound for the left side of (4.1).)
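As a sanity check on (4.1), one can compare both sides numerically. The following is only a sketch under stated assumptions: the function names, the sampling ranges and the random seed are ours, and floating-point evaluation of the sine stands in for exact arithmetic.

```python
import math
import random

def lhs_4_1(alpha, y, q, A, C):
    """Left side of (4.1): sum of min(A, C/sin^2(pi*alpha*n)) over y < n <= y+q."""
    first = math.floor(y) + 1              # smallest integer n with n > y
    total = 0.0
    for n in range(first, first + q):      # exactly q consecutive integers
        s2 = math.sin(math.pi * alpha * n) ** 2
        # min(A, C/s2), written so that s2 == 0 never causes a division
        total += A if s2 * A <= C else C / s2
    return total

def rhs_4_1(q, A, C):
    """Right side of (4.1)."""
    return min(2 * A + 6 * q * q * C / math.pi ** 2,
               3 * A + 4 * q * math.sqrt(A * C) / math.pi)

def worst_ratio(trials, seed=0):
    """Largest observed lhs/rhs over random admissible (alpha, y, q, A, C)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        q = rng.randint(1, 60)
        Q = rng.randint(q, 200)                                   # q <= Q
        a = rng.choice([r for r in range(q) if math.gcd(r, q) == 1])
        beta = rng.uniform(-1.0, 1.0)                             # |beta| <= 1
        alpha = a / q + beta / (q * Q)
        y = rng.uniform(0.0, 500.0)
        A = math.exp(rng.uniform(-3.0, 6.0))
        C = math.exp(rng.uniform(-6.0, 3.0))
        worst = max(worst, lhs_4_1(alpha, y, q, A, C) / rhs_4_1(q, A, C))
    return worst
```

A value of `worst_ratio(...)` above 1 would contradict the lemma; the sketch does not, of course, replace the proof.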
Now we will see that, if we take out terms with n divisible by q and n is not too
large, then we can give a bound that does not involve a constant term A at all. (We are
referring to the bound $(20/3\pi^2)Cq^2$ below; of course, $2A + (4q/\pi)\sqrt{AC}$ does have
a constant term 2A – it is just smaller than the constant term 3A in the corresponding
bound in (4.1).)
Lemma 4.1.2. Let α = a/q + β/qQ, (a, q) = 1, |β| ≤ 1, q ≤ Q. Let $y_2 > y_1 \geq 0$. If
$y_2 - y_1 \leq q$ and $y_2 \leq Q/2$, then, for any A, C ≥ 0,
\[
\sum_{\substack{y_1 < n \leq y_2 \\ q \nmid n}} \min\left(A, \frac{C}{|\sin(\pi\alpha n)|^2}\right)
\leq \min\left(\frac{20}{3\pi^2}Cq^2,\; 2A + \frac{4q}{\pi}\sqrt{AC}\right). \tag{4.5}
\]

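Proof. Clearly, αn equals an/q + (n/Q)β/q; since $y_2 \leq Q/2$, this means that |αn −
an/q| ≤ 1/2q for $n \leq y_2$; moreover, again for $n \leq y_2$, the sign of αn − an/q remains
constant. Hence the left side of (4.5) is at most
\[
\sum_{r=1}^{q/2} \min\left(A, \frac{C}{\left(\sin\frac{\pi}{q}(r - 1/2)\right)^2}\right) + \sum_{r=1}^{q/2} \min\left(A, \frac{C}{\left(\sin\frac{\pi}{q}r\right)^2}\right).
\]
Proceeding as in the proof of Lemma 4.1.1, we obtain a bound of at most
\[
C\left(\frac{1}{\left(\sin\frac{\pi}{2q}\right)^2} + \frac{1}{\left(\sin\frac{\pi}{q}\right)^2} + \frac{q}{\pi}\cot\frac{\pi}{q} + \frac{q}{\pi}\cot\frac{3\pi}{2q}\right)
\]
for q ≥ 2. (If q = 1, then the left side of (4.5) is trivially zero.) Now, by (4.2),
\[
\frac{t^2}{(\sin t)^2} + \frac{t}{2}\cot 2t \leq \left(1 + \frac{t^2}{3} + c_0\!\left(\frac{\pi}{4}\right)t^4\right) + \frac14\left(1 - \frac{4t^2}{3} - \frac{16t^4}{45}\right)
\leq \frac54 + \left(c_0\!\left(\frac{\pi}{4}\right) - \frac{4}{45}\right)t^4 \leq \frac54
\]
for t ∈ [0, π/4], and
\[
\frac{t^2}{(\sin t)^2} + t\cot\frac{3t}{2} \leq \left(1 + \frac{t^2}{3} + c_0\!\left(\frac{\pi}{2}\right)t^4\right) + \frac23\left(1 - \frac{3t^2}{4} - \frac{81t^4}{2^4\cdot 45}\right)
\leq \frac53 + \left(-\frac16 + \left(c_0\!\left(\frac{\pi}{2}\right) - \frac{27}{360}\right)\left(\frac{\pi}{2}\right)^2\right)t^2 \leq \frac53
\]
for t ∈ [0, π/2]. Hence,
\[
\frac{1}{\left(\sin\frac{\pi}{2q}\right)^2} + \frac{1}{\left(\sin\frac{\pi}{q}\right)^2} + \frac{q}{\pi}\cot\frac{\pi}{q} + \frac{q}{\pi}\cot\frac{3\pi}{2q}
\leq \left(\frac{2q}{\pi}\right)^2\cdot\frac54 + \left(\frac{q}{\pi}\right)^2\cdot\frac53 \leq \frac{20}{3\pi^2}q^2.
\]
Alternatively, we can follow the second approach in the proof of Lemma 4.1.1, and
obtain an upper bound of $2A + (4q/\pi)\sqrt{AC}$.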

The following bound will be useful when the constant A in an application of
Lemma 4.1.2 would be too large. (This tends to happen for n small.)

Lemma 4.1.3. Let α = a/q + β/qQ, (a, q) = 1, |β| ≤ 1, q ≤ Q. Let $y_2 > y_1 \geq 0$. If
$y_2 - y_1 \leq q$ and $y_2 \leq Q/2$, then, for any B, C ≥ 0,
\[
\sum_{\substack{y_1 < n \leq y_2 \\ q \nmid n}} \min\left(\frac{B}{|\sin(\pi\alpha n)|}, \frac{C}{|\sin(\pi\alpha n)|^2}\right)
\leq 2B\,\frac{q}{\pi}\max\left(2, \log\frac{Ce^3q}{B\pi}\right). \tag{4.6}
\]
The upper bound $\leq (2Bq/\pi)\log(2e^2q/\pi)$ is also valid.

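Proof. As in the proof of Lemma 4.1.2, we can bound the left side of (4.6) by
\[
2\sum_{r=1}^{q/2} \min\left(\frac{B}{\sin\frac{\pi}{q}\left(r - \frac12\right)}, \frac{C}{\sin^2\frac{\pi}{q}\left(r - \frac12\right)}\right).
\]
Assume B sin(π/q) ≤ C ≤ B. By the convexity of 1/sin(t) and 1/sin(t)² for
t ∈ (0, π/2],
\[
\begin{aligned}
\sum_{r=1}^{q/2} &\min\left(\frac{B}{\sin\frac{\pi}{q}\left(r - \frac12\right)}, \frac{C}{\sin^2\frac{\pi}{q}\left(r - \frac12\right)}\right) \\
&\leq \frac{B}{\sin\frac{\pi}{2q}} + \int_1^{\frac{q}{\pi}\arcsin\frac{C}{B}} \frac{B}{\sin\frac{\pi}{q}t}\,dt + \int_{\frac{q}{\pi}\arcsin\frac{C}{B}}^{q/2} \frac{C}{\sin^2\frac{\pi}{q}t}\,dt \\
&\leq \frac{B}{\sin\frac{\pi}{2q}} + \frac{q}{\pi}\left(B\left(\log\tan\left(\frac12\arcsin\frac{C}{B}\right) - \log\tan\frac{\pi}{2q}\right) + C\cot\left(\arcsin\frac{C}{B}\right)\right) \\
&\leq \frac{B}{\sin\frac{\pi}{2q}} + \frac{q}{\pi}\left(B\left(\log\cot\frac{\pi}{2q} - \log\frac{C}{B - \sqrt{B^2 - C^2}}\right) + \sqrt{B^2 - C^2}\right).
\end{aligned}
\]
Now, for all t ∈ (0, π/2),
\[
\frac{2}{\sin t} + \frac1t\log\cot t < \frac1t\log\left(\frac{e^2}{t}\right);
\]
we can verify this by comparing series. Thus
\[
\frac{B}{\sin\frac{\pi}{2q}} + \frac{q}{\pi}B\log\cot\frac{\pi}{2q} \leq B\,\frac{q}{\pi}\log\frac{2e^2q}{\pi}
\]
for q ≥ 2. (If q = 1, the sum on the left of (4.6) is empty, and so the bound we are
trying to prove is trivial.) We also have
\[
t\log\left(t - \sqrt{t^2 - 1}\right) + \sqrt{t^2 - 1} < -t\log 2t + t \tag{4.7}
\]
for t ≥ 1 (as this is equivalent to $\log(2t^2(1 - \sqrt{1 - t^{-2}})) < 1 - \sqrt{1 - t^{-2}}$, which we
check easily after changing variables to $\delta = 1 - \sqrt{1 - t^{-2}}$). Hence
\[
\begin{aligned}
\frac{B}{\sin\frac{\pi}{2q}} &+ \frac{q}{\pi}\left(B\left(\log\cot\frac{\pi}{2q} - \log\frac{C}{B - \sqrt{B^2 - C^2}}\right) + \sqrt{B^2 - C^2}\right) \\
&\leq B\,\frac{q}{\pi}\log\frac{2e^2q}{\pi} + \frac{q}{\pi}\left(B - B\log\frac{2B}{C}\right) \leq B\,\frac{q}{\pi}\log\frac{Ce^3q}{B\pi}
\end{aligned}
\]
for q ≥ 2.
Given any C, we can apply the above with C = B instead, as, for any t > 0,
min(B/t, C/t²) ≤ B/t ≤ min(B/t, B/t²). (We refrain from applying (4.7) so as to
avoid worsening a constant.) If C < B sin π/q (or even if C < (π/q)B), we relax the
input to C = B sin π/q and go through the above.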

4.2 Type I estimates


Let us give our first main type I estimate.² One of the main innovations is the manner
in which the “main term” (m divisible by q) is separated; we are able to keep error
terms small thanks to the particular way in which we switch between two different
approximations.
(These are not necessarily successive approximations in the sense of continued
fractions; we do not want to assume that the approximation a/q we are given arises
from a continued fraction, and at any rate we need more control on the denominator $q'$
of the new approximation $a'/q'$ than continued fractions would furnish.)

² The current version of Lemma 4.2.1 is an improvement over that included in the first version of the
preprint [Helb].
The following lemma is a theme, so to speak, to which several variations will be
given. Later, in practice, we will always use one of the variations, rather than the
original lemma itself. This is so just because, even though (4.8) is the basic type of
sum we treat in type I, the sums that we will have to estimate in practice will always
present some minor additional complication. Proving the lemma we are about to give
in full will give us a chance to see all the main ideas at work, leaving complications for
later.
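Lemma 4.2.1. Let α = a/q + δ/x, (a, q) = 1, $|\delta/x| \leq 1/qQ_0$, $q \leq Q_0$, $Q_0 \geq 16$. Let
η be continuous, piecewise $C^2$ and compactly supported, with $|\eta|_1 = 1$ and $\eta'' \in L^1$.
Let $c_0 \geq |\widehat{\eta''}|_\infty$.
Let 1 ≤ D ≤ x. Then, if $|\delta| \leq 1/2c_2$, where $c_2 = (3\pi/5\sqrt{c_0})(1 + \sqrt{13/3})$, the
absolute value of
\[
\sum_{m \leq D} \mu(m)\sum_n e(\alpha mn)\,\eta\!\left(\frac{mn}{x}\right) \tag{4.8}
\]
is at most
\[
\frac{x}{q}\min\left(1, \frac{c_0}{(2\pi\delta)^2}\right)\Bigg|\sum_{\substack{m \leq \frac{M}{q} \\ (m,q)=1}} \frac{\mu(m)}{m}\Bigg|
+ O^*\left(c_0\left(\frac14 - \frac{1}{\pi^2}\right)\left(\frac{D^2}{2xq} + \frac{D}{2x}\right)\right) \tag{4.9}
\]
plus
\[
\begin{aligned}
&\frac{2\sqrt{c_0c_1}}{\pi}D + 3c_1\frac{x}{q}\log^+\frac{D}{c_2x/q} + \frac{\sqrt{c_0c_1}}{\pi}\,q\log^+\frac{D}{q/2} \\
&+ \frac{|\eta'|_1}{\pi}\,q\cdot\max\left(2, \log\frac{c_0e^3q^2}{4\pi|\eta'|_1x}\right) + \left(\frac{2\sqrt{3c_0c_1}}{\pi} + \frac{3c_1}{c_2} + \frac{55c_0c_2}{12\pi^2}\right)q,
\end{aligned} \tag{4.10}
\]
where $c_1 = 1 + |\eta'|_1/(2x/D)$ and $M \in [\min(Q_0/2, D), D]$. The same bound holds if
$|\delta| \geq 1/2c_2$ but $D \leq Q_0/2$.
In general, if $|\delta| \geq 1/2c_2$, the absolute value of (4.8) is at most (4.9) plus
\[
\begin{aligned}
&\frac{2\sqrt{c_0c_1}}{\pi}\left(D + (1+\epsilon)\min\left(\frac{x}{|\delta|q} + 1, 2D\right)\left(\varpi_\epsilon + \frac12\log^+\frac{2D}{\frac{x}{|\delta|q}}\right)\right) \\
&+ 3c_1\left(2 + \frac{(1+\epsilon)}{\epsilon}\log^+\frac{2D}{\frac{x}{|\delta|q}}\right)\frac{x}{Q_0} + \frac{35c_0c_2}{6\pi^2}\,q,
\end{aligned} \tag{4.11}
\]
for ε ∈ (0, 1] arbitrary, where $\varpi_\epsilon = \sqrt{3+2\epsilon} + ((1 + \sqrt{13/3})/4 - 1)/(2(1+\epsilon))$.

In (4.9), $\min(1, c_0/(2\pi\delta)^2)$ always equals 1 when $|\delta| \leq 1/2c_2$ (since $(3/5)(1 +
\sqrt{13/3}) > 1$).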
Proof. Let $Q = \lfloor x/|\delta q|\rfloor$. Then $\alpha = a/q + O^*(1/qQ)$ and q ≤ Q. (If δ = 0, we
let Q = ∞ and ignore the rest of the paragraph, since then we will never need $Q'$ or
the alternative approximation $a'/q'$.) Let $Q' = \lceil(1+\epsilon)Q\rceil \geq Q + 1$. Then α is not
$a/q + O^*(1/qQ')$, and so there must be a different approximation $a'/q'$, $(a', q') = 1$,
$q' \leq Q'$ such that $\alpha = a'/q' + O^*(1/q'Q')$ (since such an approximation always
exists). Obviously, $|a/q - a'/q'| \geq 1/qq'$, yet, at the same time, $|a/q - a'/q'| \leq
1/qQ + 1/q'Q' \leq 1/qQ + 1/((1+\epsilon)q'Q)$. Hence $q'/Q + q/((1+\epsilon)Q) \geq 1$, and so
$q' \geq Q - q/(1+\epsilon) \geq (\epsilon/(1+\epsilon))Q$. (Note also that $(\epsilon/(1+\epsilon))Q \geq (2|\delta q|/x)\cdot\lfloor x/\delta q\rfloor >
1$, and so $q' \geq 2$.)
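The existence of an approximation with denominator at most a given bound (Dirichlet's theorem) can be made concrete by exhaustive search. The sketch below is an illustration only: the function name `approximations` is ours, α is taken to be an exact rational so that the comparison is exact, and the bound `Qp` plays the role of $Q'$ and is assumed to be at least 3 (so that the numerator is the unique nearest integer to $\alpha q'$).

```python
from fractions import Fraction
from math import gcd

def approximations(alpha: Fraction, Qp: int):
    """All reduced fractions a'/q' with q' <= Qp and |alpha - a'/q'| <= 1/(q'*Qp)."""
    found = []
    for qp in range(1, Qp + 1):
        ap = round(alpha * qp)              # nearest integer to alpha * q'
        if gcd(ap, qp) != 1:
            continue                        # keep reduced fractions only
        if abs(alpha - Fraction(ap, qp)) <= Fraction(1, qp * Qp):
            found.append(Fraction(ap, qp))
    return found

# alpha is close to 1/3; for a small bound, 1/3 itself qualifies,
# while for a larger bound a different fraction has to take over.
alpha = Fraction(1, 3) + Fraction(1, 100)
small = approximations(alpha, 10)
large = approximations(alpha, 40)
```

This mirrors the step in the proof: once the bound is raised past the point where a/q still works, some other reduced fraction must appear in the list.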
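Lemma 4.1.2 will enable us to treat separately the contribution from terms with
m divisible by q and m not divisible by q, provided that m ≤ Q/2. Let M =
min(Q/2, D). We start by considering all terms with m ≤ M divisible by q. Then
e(αmn) equals e((δm/x)n). By Poisson summation,
\[
\sum_n e(\alpha mn)\eta(mn/x) = \sum_n \widehat{f}(n),
\]
where f(u) = e((δm/x)u)η((m/x)u). Now
\[
\widehat{f}(n) = \int e(-un)f(u)\,du = \frac{x}{m}\int e\left(\left(\delta - \frac{x}{m}n\right)u\right)\eta(u)\,du = \frac{x}{m}\,\widehat{\eta}\left(\frac{x}{m}n - \delta\right).
\]
By assumption, m ≤ M ≤ Q/2 ≤ x/2|δq|, and so |x/m| ≥ 2|δq| ≥ 2δ. Thus, by
(2.1) (with k = 2),
\[
\begin{aligned}
\sum_n \widehat{f}(n) &= \frac{x}{m}\left(\widehat{\eta}(-\delta) + \sum_{n \neq 0}\widehat{\eta}\left(\frac{nx}{m} - \delta\right)\right) \\
&= \frac{x}{m}\left(\widehat{\eta}(-\delta) + O^*\left(\sum_{n \neq 0}\frac{1}{\left(2\pi\left(\frac{nx}{m} - \delta\right)\right)^2}\right)\cdot\left|\widehat{\eta''}\right|_\infty\right) \\
&= \frac{x}{m}\,\widehat{\eta}(-\delta) + \frac{m}{x}\frac{c_0}{(2\pi)^2}\,O^*\left(\max_{|r| \leq \frac12}\sum_{n \neq 0}\frac{1}{(n-r)^2}\right).
\end{aligned} \tag{4.12}
\]
Since $x \mapsto 1/x^2$ is convex on $\mathbb{R}^+$,
\[
\max_{|r| \leq \frac12}\sum_{n \neq 0}\frac{1}{(n-r)^2} = \sum_{n \neq 0}\frac{1}{\left(n - \frac12\right)^2} = \pi^2 - 4.
\]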
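The value π² − 4 of the extremal sum is easy to confirm numerically. The snippet below is a rough sketch: the function name `shifted_sum` and the truncation point N are our choices, and the truncation leaves an error of order 1/N.

```python
import math

def shifted_sum(r, N=100000):
    """Truncated sum of 1/(n - r)^2 over integers n != 0 with |n| <= N."""
    return sum(1.0 / (n - r) ** 2 for n in range(-N, N + 1) if n != 0)

extremal = shifted_sum(0.5)          # the maximum over |r| <= 1/2 is attained at r = 1/2
target = math.pi ** 2 - 4            # closed form used in the text
```

Evaluating `shifted_sum` at interior points such as r = 0 or r = 0.3 gives smaller values, in line with the convexity argument.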
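Therefore, the sum of all terms with m ≤ M and q|m is
\[
\begin{aligned}
&\sum_{\substack{m \leq M \\ q \mid m}} \mu(m)\frac{x}{m}\,\widehat{\eta}(-\delta) + O^*\Bigg(\sum_{\substack{m \leq M \\ q \mid m}} \frac{m}{x}\frac{c_0}{(2\pi)^2}(\pi^2 - 4)\Bigg) \\
&= \frac{x\mu(q)}{q}\cdot\widehat{\eta}(-\delta)\cdot\sum_{\substack{m \leq \frac{M}{q} \\ (m,q)=1}}\frac{\mu(m)}{m}
+ O^*\left(\mu(q)^2c_0\left(\frac14 - \frac{1}{\pi^2}\right)\left(\frac{D^2}{2xq} + \frac{D}{2x}\right)\right).
\end{aligned}
\]
We will bound $|\widehat{\eta}(-\delta)|$ by (2.1).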
As we have just seen, estimating the contribution of the terms with m divisible by
q and not too large (m ≤ M ) involves isolating a main term, estimating it carefully
(with cancellation) and then bounding the remaining error terms.
We will now bound the contribution of all other m – that is, m not divisible by q,
or m larger than M. Cancellation will now be used only within the inner sum; that
is, we will bound each inner sum
\[
T_m(\alpha) = \sum_n e(\alpha mn)\,\eta\!\left(\frac{mn}{x}\right),
\]
and then we will carefully consider how to bound sums of $|T_m(\alpha)|$ over m efficiently.
By (2.2) and Lemma 2.3.1,
\[
|T_m(\alpha)| \leq \min\left(\frac{x}{m} + \frac12|\eta'|_1,\; \frac{\frac12|\eta'|_1}{|\sin(\pi m\alpha)|},\; \frac{m}{x}\frac{c_0}{4}\frac{1}{(\sin\pi m\alpha)^2}\right). \tag{4.13}
\]
For any $y_2 > y_1 > 0$ with $y_2 - y_1 \leq q$ and $y_2 \leq Q/2$, (4.13) gives us that
\[
\sum_{\substack{y_1 < m \leq y_2 \\ q \nmid m}} |T_m(\alpha)| \leq \sum_{\substack{y_1 < m \leq y_2 \\ q \nmid m}} \min\left(A, \frac{C}{(\sin\pi m\alpha)^2}\right) \tag{4.14}
\]
for $A = (x/y_1)(1 + |\eta'|_1/(2(x/y_1)))$ and $C = (c_0/4)(y_2/x)$. We must now estimate
the sum
\[
\sum_{\substack{m \leq M \\ q \nmid m}} |T_m(\alpha)| + \sum_{\frac{Q}{2} < m \leq D} |T_m(\alpha)|. \tag{4.15}
\]
To bound the terms with m ≤ M, we can use Lemma 4.1.2. The question is then
which one is smaller: the first or the second bound given by Lemma 4.1.2? A brief
calculation gives that the second bound is smaller (and hence preferable) exactly when
$\sqrt{C/A} > (3\pi/10q)(1 + \sqrt{13/3})$. Since $\sqrt{C/A} \sim (\sqrt{c_0}/2)m/x$, this means that
it is sensible to prefer the second bound in Lemma 4.1.2 when $m > c_2x/q$, where
$c_2 = (3\pi/5\sqrt{c_0})(1 + \sqrt{13/3})$.
It thus makes sense to ask: does $Q/2 \leq c_2x/q$ (so that m ≤ M implies m ≤
$c_2x/q$)? This question divides our work into two basic cases.
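The “brief calculation” amounts to solving a quadratic in $s = q\sqrt{C/A}$: the two bounds of Lemma 4.1.2 agree when $(20/3\pi^2)s^2 = 2 + (4/\pi)s$. A small numerical sketch of the comparison follows; the function names are ours and the sample values of (q, A, C) are arbitrary.

```python
import math

# Positive root of (20/(3*pi^2)) s^2 - (4/pi) s - 2 = 0, where s = q*sqrt(C/A).
THRESHOLD = (3 * math.pi / 10) * (1 + math.sqrt(13 / 3))

def first_bound(q, A, C):
    """First bound in (4.5); it has no constant term in A."""
    return 20 * C * q * q / (3 * math.pi ** 2)

def second_bound(q, A, C):
    """Second bound in (4.5)."""
    return 2 * A + 4 * q * math.sqrt(A * C) / math.pi

def second_is_smaller(q, A, C):
    """Criterion from the text: sqrt(C/A) > (3*pi/(10*q)) * (1 + sqrt(13/3))."""
    return math.sqrt(C / A) > THRESHOLD / q
```

For instance, with q = 10 and A = 1 the switch happens near C ≈ 0.084, which is where `second_is_smaller` changes value.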
Case (a). δ large: $|\delta| \geq 1/2c_2$, where $c_2 = (3\pi/5\sqrt{c_0})(1 + \sqrt{13/3})$. Then
$Q/2 \leq c_2x/q$; this will induce us to bound the first sum in (4.15) by the first bound in
Lemma 4.1.2.
Recall that M = min(Q/2, D), and so $M \leq c_2x/q$. By (4.14) and Lemma 4.1.2,
\[
\begin{aligned}
\sum_{\substack{1 \leq m \leq M \\ q \nmid m}} |T_m(\alpha)|
&\leq \sum_{j=0}^{\infty}\sum_{\substack{jq < m \leq \min((j+1)q, M) \\ q \nmid m}} \min\left(\frac{x}{jq+1} + \frac{|\eta'|_1}{2},\; \frac{\frac{c_0}{4}\frac{(j+1)q}{x}}{(\sin\pi m\alpha)^2}\right) \\
&\leq \frac{20}{3\pi^2}\frac{c_0q^3}{4x}\sum_{0 \leq j \leq \frac{M}{q}}(j+1) \leq \frac{20}{3\pi^2}\frac{c_0q^3}{4x}\cdot\left(\frac12\frac{M^2}{q^2} + \frac32\frac{c_2x}{q^2} + 1\right) \\
&\leq \frac{5c_0c_2}{6\pi^2}M + \frac{5c_0q}{3\pi^2}\left(\frac32c_2 + \frac{q^2}{x}\right) \leq \frac{5c_0c_2}{6\pi^2}M + \frac{35c_0c_2}{6\pi^2}\,q,
\end{aligned} \tag{4.16}
\]
where, to bound the smaller terms, we are using the inequality $Q/2 \leq c_2x/q$, and
where we are also using the observation that, since $|\delta/x| \leq 1/qQ_0$, the assumption
$|\delta| \geq 1/2c_2$ implies that $q \leq 2c_2x/Q_0$; moreover, since $q \leq Q_0$, this gives us that
$q^2 \leq 2c_2x$. In the main term, we are bounding $qM^2/x$ from above by $M\cdot qQ/2x \leq
M/2\delta \leq c_2M$.
If D ≤ (Q + 1)/2, then $M \geq \lfloor D\rfloor$ and so (4.16) is all we need: the second sum
in (4.15) is empty. Assume from now on that D > (Q + 1)/2. The first sum in (4.15)
is then bounded by (4.16) (with M = Q/2). To bound the second sum in (4.15), we
will use the approximation $a'/q'$ instead of a/q. The motivation is the following: if
we used the approximation a/q even for m > Q/2, the contribution of the terms with
q|m would be too large. When we use $a'/q'$, the contribution of the terms with $q'|m$
(or $m \equiv \pm 1 \bmod q'$) is very small: only a fraction $1/q'$ (tiny, since $q'$ is large) of all
terms are like that, and their individual contribution is always small, precisely because
m > Q/2.
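By (4.14) (without the restriction $q \nmid m$ on either side) and Lemma 4.1.1,
\[
\begin{aligned}
\sum_{Q/2 < m \leq D} |T_m(\alpha)|
&\leq \sum_{j=0}^{\infty}\;\sum_{jq' + \frac{Q}{2} < m \leq \min((j+1)q' + Q/2,\, D)} |T_m(\alpha)| \\
&\leq \sum_{j=0}^{\left\lfloor\frac{D - (Q+1)/2}{q'}\right\rfloor}\left(3c_1\frac{x}{jq' + \frac{Q+1}{2}} + \frac{4q'}{\pi}\sqrt{\frac{c_1c_0}{4}\,\frac{x}{jq' + (Q+1)/2}\,\frac{(j+1)q' + Q/2}{x}}\right) \\
&\leq \sum_{j=0}^{\left\lfloor\frac{D - (Q+1)/2}{q'}\right\rfloor}\left(3c_1\frac{x}{jq' + \frac{Q+1}{2}} + \frac{4q'}{\pi}\sqrt{\frac{c_1c_0}{4}\left(1 + \frac{q'}{jq' + (Q+1)/2}\right)}\right),
\end{aligned}
\]
where we recall that $c_1 = 1 + |\eta'|_1/(2x/D)$. Since $q' \geq (\epsilon/(1+\epsilon))Q$,
\[
\sum_{j=0}^{\left\lfloor\frac{D - (Q+1)/2}{q'}\right\rfloor}\frac{x}{jq' + \frac{Q+1}{2}} \leq \frac{x}{Q/2} + \frac{x}{q'}\int_{\frac{Q+1}{2}}^{D}\frac{1}{t}\,dt \leq \frac{2x}{Q} + \frac{(1+\epsilon)x}{\epsilon Q}\log^+\frac{D}{\frac{Q+1}{2}}. \tag{4.17}
\]
Recall now that $q' \leq (1+\epsilon)Q + 1 \leq (1+\epsilon)(Q+1)$. Therefore,
\[
\begin{aligned}
q'\sum_{j=0}^{\left\lfloor\frac{D - (Q+1)/2}{q'}\right\rfloor}\sqrt{1 + \frac{q'}{jq' + (Q+1)/2}}
&\leq q'\sqrt{1 + \frac{(1+\epsilon)Q + 1}{(Q+1)/2}} + \int_{\frac{Q+1}{2}}^{D}\sqrt{1 + \frac{q'}{t}}\,dt \\
&\leq q'\sqrt{3 + 2\epsilon} + \left(D - \frac{Q+1}{2}\right) + \frac{q'}{2}\log^+\frac{D}{\frac{Q+1}{2}}.
\end{aligned} \tag{4.18}
\]
We conclude that $\sum_{Q/2 < m \leq D}|T_m(\alpha)|$ is at most
\[
\begin{aligned}
&\frac{2\sqrt{c_0c_1}}{\pi}\left(D + \left((1+\epsilon)\sqrt{3+2\epsilon} - \frac12\right)(Q+1) + \frac{(1+\epsilon)Q + 1}{2}\log^+\frac{D}{\frac{Q+1}{2}}\right) \\
&+ 3c_1\left(2 + \frac{(1+\epsilon)}{\epsilon}\log^+\frac{D}{\frac{Q+1}{2}}\right)\frac{x}{Q}.
\end{aligned} \tag{4.19}
\]
We add this to (4.16) (with M = Q/2), and obtain that (4.15) is at most
\[
\begin{aligned}
&\frac{2\sqrt{c_0c_1}}{\pi}\left(D + (1+\epsilon)(Q+1)\left(\varpi_\epsilon + \frac12\log^+\frac{D}{\frac{Q+1}{2}}\right)\right) \\
&+ 3c_1\left(2 + \frac{(1+\epsilon)}{\epsilon}\log^+\frac{D}{\frac{Q+1}{2}}\right)\frac{x}{Q} + \frac{35c_0c_2}{6\pi^2}\,q,
\end{aligned} \tag{4.20}
\]
where we are bounding
\[
\frac{5c_0c_2}{6\pi^2} = \frac{5c_0}{6\pi^2}\frac{3\pi}{5\sqrt{c_0}}\left(1 + \sqrt{\frac{13}{3}}\right) = \frac{\sqrt{c_0}}{2\pi}\left(1 + \sqrt{\frac{13}{3}}\right) \leq \frac{2\sqrt{c_0c_1}}{\pi}\cdot\frac14\left(1 + \sqrt{\frac{13}{3}}\right) \tag{4.21}
\]
and defining
\[
\varpi_\epsilon = \sqrt{3 + 2\epsilon} + \left(\frac14\left(1 + \sqrt{\frac{13}{3}}\right) - 1\right)\frac{1}{2(1+\epsilon)}. \tag{4.22}
\]
(Note that $\varpi_\epsilon < \sqrt{3}$ for ε < 0.1741.) A quick check against (4.16) shows that (4.20)
is valid also when D ≤ Q/2, even when Q + 1 is replaced by min(Q + 1, 2D). We
bound Q from above by x/|δ|q and $\log^+ D/((Q+1)/2)$ by $\log^+ 2D/(x/|\delta|q + 1)$,
and obtain the result.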
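The numerical remark about the constant in (4.22) can be checked directly. This is a minimal sketch; the function name `varpi` is ours.

```python
import math

def varpi(eps):
    """The constant defined in (4.22), as a function of eps in (0, 1]."""
    gain = (1 + math.sqrt(13 / 3)) / 4 - 1      # negative: about -0.2296
    return math.sqrt(3 + 2 * eps) + gain / (2 * (1 + eps))

below = varpi(0.17)    # slightly under the stated cutoff 0.1741
above = varpi(0.18)    # slightly over it
```

Since both summands in `varpi` increase with eps, the function crosses √3 exactly once, close to the stated value 0.1741.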
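Case (b): |δ| small: $|\delta| \leq 1/2c_2$ or $D \leq Q_0/2$. Then $\min(c_2x/q, D) \leq Q/2$. We
start by bounding the first q/2 terms in (4.15) by (4.13) and Lemma 4.1.3:
\[
\begin{aligned}
\sum_{m \leq q/2} |T_m(\alpha)| &\leq \sum_{m \leq q/2} \min\left(\frac{\frac12|\eta'|_1}{|\sin(\pi m\alpha)|},\; \frac{c_0q/8x}{|\sin(\pi m\alpha)|^2}\right) \\
&\leq \frac{|\eta'|_1}{\pi}\,q\max\left(2, \log\frac{c_0e^3q^2}{4\pi|\eta'|_1x}\right).
\end{aligned} \tag{4.23}
\]
If $q^2 < 2c_2x$, we estimate the terms with $q/2 < m \leq c_2x/q$ by Lemma 4.1.2,
which is applicable because $\min(c_2x/q, D) < Q/2$:
\[
\begin{aligned}
\sum_{\substack{\frac{q}{2} < m \leq D' \\ q \nmid m}} |T_m(\alpha)|
&\leq \sum_{j=1}^{\infty}\sum_{\substack{(j - \frac12)q < m \leq (j + \frac12)q \\ m \leq \min\left(\frac{c_2x}{q}, D\right) \\ q \nmid m}} \min\left(\frac{x}{\left(j - \frac12\right)q} + \frac{|\eta'|_1}{2},\; \frac{\frac{c_0}{4}\frac{(j + 1/2)q}{x}}{(\sin\pi m\alpha)^2}\right) \\
&\leq \frac{20}{3\pi^2}\frac{c_0q^3}{4x}\sum_{1 \leq j \leq \frac{D'}{q} + \frac12}\left(j + \frac12\right) \leq \frac{20}{3\pi^2}\frac{c_0q^3}{4x}\left(\frac{c_2x}{2q^2}\frac{D'}{q} + \frac32\frac{c_2x}{q^2} + \frac58\right) \\
&\leq \frac{5c_0}{6\pi^2}\left(c_2D' + 3c_2q + \frac54\frac{q^3}{x}\right) \leq \frac{5c_0c_2}{6\pi^2}\left(D' + \frac{11}{2}q\right),
\end{aligned} \tag{4.24}
\]
where we write $D' = \min(c_2x/q, D)$. If $c_2x/q \geq D$, we stop here. Assume that
$c_2x/q < D$. Let $R = \max(c_2x/q, q/2)$. The terms we have already estimated are
precisely those with m ≤ R. We bound the terms R < m ≤ D by the second bound
in Lemma 4.1.1:
\[
\begin{aligned}
\sum_{R < m \leq D} |T_m(\alpha)| &\leq \sum_{j=0}^{\infty}\sum_{\substack{m > jq + R \\ m \leq \min((j+1)q + R,\, D)}} \min\left(\frac{c_1x}{jq + R},\; \frac{\frac{c_0}{4}\frac{(j+1)q + R}{x}}{(\sin\pi m\alpha)^2}\right) \\
&\leq \sum_{j=0}^{\left\lfloor\frac1q(D - R)\right\rfloor}\left(\frac{3c_1x}{jq + R} + \frac{4q}{\pi}\sqrt{\frac{c_1c_0}{4}\left(1 + \frac{q}{jq + R}\right)}\right).
\end{aligned} \tag{4.25}
\]
(Note there is no need to use two successive approximations a/q, $a'/q'$ as in case (a).
We are also including all terms with m divisible by q, as we may, since $|T_m(\alpha)|$ is
non-negative.) Now, much as before,
\[
\sum_{j=0}^{\left\lfloor\frac1q(D - R)\right\rfloor}\frac{x}{jq + R} \leq \frac{x}{R} + \frac{x}{q}\int_R^D\frac{1}{t}\,dt \leq \min\left(\frac{q}{c_2}, \frac{2x}{q}\right) + \frac{x}{q}\log^+\frac{D}{c_2x/q}, \tag{4.26}
\]
and
\[
\begin{aligned}
\sum_{j=0}^{\left\lfloor\frac1q(D - R)\right\rfloor}\sqrt{1 + \frac{q}{jq + R}} &\leq \sqrt{1 + \frac{q}{R}} + \frac1q\int_R^D\sqrt{1 + \frac{q}{t}}\,dt \\
&\leq \sqrt{3} + \frac{D - R}{q} + \frac12\log^+\frac{D}{q/2}.
\end{aligned} \tag{4.27}
\]
We sum with (4.23) and (4.24), and we obtain that (4.15) is at most
\[
\begin{aligned}
&\frac{2\sqrt{c_0c_1}}{\pi}\left(\sqrt{3}\,q + D + \frac{q}{2}\log^+\frac{D}{q/2}\right) + 3c_1\frac{x}{q}\log^+\frac{D}{c_2x/q} \\
&+ 3c_1\min\left(\frac{q}{c_2}, \frac{2x}{q}\right) + \frac{55c_0c_2}{12\pi^2}\,q + \frac{|\eta'|_1}{\pi}\,q\cdot\max\left(2, \log\frac{c_0e^3q^2}{4\pi|\eta'|_1x}\right),
\end{aligned} \tag{4.28}
\]
where we are using the fact that $5c_0c_2/6\pi^2 < 2\sqrt{c_0c_1}/\pi$ to make sure that the term
$(5c_0c_2/6\pi^2)D'$ from (4.24) is more than compensated by the term $-2\sqrt{c_0c_1}R/\pi$ coming
from −R/q in (4.27) (by the definition of $D'$ and R, we have $R \geq D'$). We can
also use $5c_0c_2/6\pi^2 < 2\sqrt{c_0c_1}/\pi$ to bound the term $(5c_0c_2/6\pi^2)D'$ from (4.24) by the
term $2\sqrt{c_0c_1}D/\pi$ in (4.28), in case $c_2x/q \geq D$. (Again by definition, $D' \leq D$.) Thus,
(4.28) is valid both when $c_2x/q < D$ and when $c_2x/q \geq D$.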

4.2.1 Type I: variations


We will need a version of Lemma 4.2.1 with m and n restricted to the odd numbers.
(We will barely be using the restriction of m, whereas the restriction on n is both (a)
slightly harder to deal with, (b) something that can be turned to our advantage.)
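Lemma 4.2.2. Let α ∈ R/Z with 2α = a/q + δ/x, (a, q) = 1, $|\delta/x| \leq 1/qQ_0$,
$q \leq Q_0$, $Q_0 \geq 16$. Let η be continuous, piecewise $C^2$ and compactly supported, with
$|\eta|_1 = 1$ and $\eta'' \in L^1$. Let $c_0 \geq |\widehat{\eta''}|_\infty$.
Let 1 ≤ D ≤ x. Then, if $|\delta| \leq 1/2c_2$, where $c_2 = 6\pi/5\sqrt{c_0}$, the absolute value of
\[
\sum_{\substack{m \leq D \\ m \text{ odd}}} \mu(m)\sum_{n \text{ odd}} e(\alpha mn)\,\eta\!\left(\frac{mn}{x}\right) \tag{4.29}
\]
is at most
\[
\frac{x}{2q}\min\left(1, \frac{c_0}{(\pi\delta)^2}\right)\Bigg|\sum_{\substack{m \leq \frac{M}{q} \\ (m,2q)=1}} \frac{\mu(m)}{m}\Bigg|
+ O^*\left(\frac{c_0q}{x}\left(\frac18 - \frac{1}{2\pi^2}\right)\left(\frac{D}{q} + 1\right)^2\right) \tag{4.30}
\]
plus
\[
\begin{aligned}
&\frac{2\sqrt{c_0c_1}}{\pi}D + \frac{3c_1}{2}\frac{x}{q}\log^+\frac{D}{c_2x/q} + \frac{\sqrt{c_0c_1}}{\pi}\,q\log^+\frac{D}{q/2} \\
&+ \frac{2|\eta'|_1}{\pi}\,q\cdot\max\left(1, \log\frac{c_0e^3q^2}{4\pi|\eta'|_1x}\right) + \left(\frac{2\sqrt{3c_0c_1}}{\pi} + \frac{3c_1}{2c_2} + \frac{55c_0c_2}{6\pi^2}\right)q,
\end{aligned} \tag{4.31}
\]
where $c_1 = 1 + |\eta'|_1/(x/D)$ and $M \in [\min(Q_0/2, D), D]$. The same bound holds if
$|\delta| \geq 1/2c_2$ but $D \leq Q_0/2$.
In general, if $|\delta| \geq 1/2c_2$, the absolute value of (4.29) is at most (4.30) plus
\[
\begin{aligned}
&\frac{2\sqrt{c_0c_1}}{\pi}\left(D + (1+\epsilon)\min\left(\frac{x}{|\delta|q} + 1, 2D\right)\left(\sqrt{3 + 2\epsilon} + \frac12\log^+\frac{2D}{\frac{x}{|\delta|q}}\right)\right) \\
&+ \frac32c_1\left(2 + \frac{(1+\epsilon)}{\epsilon}\log^+\frac{2D}{\frac{x}{|\delta|q}}\right)\frac{x}{Q_0} + \frac{35c_0c_2}{3\pi^2}\,q,
\end{aligned} \tag{4.32}
\]
for ε ∈ (0, 1] arbitrary.

If q is even, the sum (4.30) can be replaced by 0.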


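Proof. The proof is almost exactly that of Lemma 4.2.1; we go over the differences.
The parameters Q, $Q'$, $a'$, $q'$ and M are defined just as before (with 2α wherever we
had α).
Let us first consider m ≤ M odd and divisible by q. (Of course, this case arises
only if q is odd.) For n = 2r + 1,
\[
\begin{aligned}
e(\alpha mn) &= e(\alpha m(2r+1)) = e(2\alpha rm)e(\alpha m) = e\left(\frac{\delta}{x}rm\right)e\left(\left(\frac{a}{2q} + \frac{\delta}{2x} + \frac{\kappa}{2}\right)m\right) \\
&= e\left(\frac{\delta(2r+1)}{2x}m\right)e\left(\frac{a + \kappa q}{2}\frac{m}{q}\right) = \kappa'e\left(\frac{\delta(2r+1)}{2x}m\right),
\end{aligned}
\]
where κ ∈ {0, 1} and $\kappa' = e((a + \kappa q)/2) \in \{-1, 1\}$ are independent of m and n.
Hence, by Poisson summation,
\[
\begin{aligned}
\sum_{n \text{ odd}} e(\alpha mn)\eta(mn/x) &= \kappa'\sum_{n \text{ odd}} e((\delta m/2x)n)\eta(mn/x) \\
&= \frac{\kappa'}{2}\left(\sum_n \widehat{f}(n) - \sum_n \widehat{f}(n + 1/2)\right),
\end{aligned} \tag{4.33}
\]
where f(u) = e((δm/2x)u)η((m/x)u). Now
\[
\widehat{f}(t) = \frac{x}{m}\,\widehat{\eta}\left(\frac{x}{m}t - \frac{\delta}{2}\right).
\]
Just as before, |x/m| ≥ 2|δq| ≥ 2δ. Thus
\[
\begin{aligned}
\frac12\left|\sum_n \widehat{f}(n) - \sum_n \widehat{f}(n + 1/2)\right|
&\leq \frac{x}{m}\left(\frac12\left|\widehat{\eta}\left(-\frac{\delta}{2}\right)\right| + \frac12\sum_{n \neq 0}\left|\widehat{\eta}\left(\frac{x}{m}\frac{n}{2} - \frac{\delta}{2}\right)\right|\right) \\
&= \frac{x}{m}\left(\frac12\left|\widehat{\eta}\left(-\frac{\delta}{2}\right)\right| + \frac12\cdot O^*\left(\sum_{n \neq 0}\frac{1}{\left(\pi\left(\frac{nx}{m} - \delta\right)\right)^2}\right)\cdot\left|\widehat{\eta''}\right|_\infty\right) \\
&= \frac{x}{2m}\left|\widehat{\eta}\left(-\frac{\delta}{2}\right)\right| + \frac{m}{x}\frac{c_0}{2\pi^2}(\pi^2 - 4).
\end{aligned} \tag{4.34}
\]
The contribution of the second term in the last line of (4.34) is
\[
\sum_{\substack{m \leq M \\ m \text{ odd} \\ q \mid m}} \frac{m}{x}\frac{c_0}{2\pi^2}(\pi^2 - 4) = \frac{q}{x}\frac{c_0}{2\pi^2}(\pi^2 - 4)\cdot\sum_{\substack{m \leq M/q \\ m \text{ odd}}} m
\leq \frac{qc_0}{x}\left(\frac18 - \frac{1}{2\pi^2}\right)\left(\frac{M}{q} + 1\right)^2.
\]
Hence, the absolute value of the sum of all terms with m ≤ M and q|m is given by
(4.30).
We define $T_{m,\circ}(\alpha)$ by
\[
T_{m,\circ}(\alpha) = \sum_{n \text{ odd}} e(\alpha mn)\,\eta\!\left(\frac{mn}{x}\right). \tag{4.35}
\]
Changing variables by n = 2r + 1, we see that
\[
|T_{m,\circ}(\alpha)| = \left|\sum_r e(2\alpha\cdot mr)\eta(m(2r+1)/x)\right|.
\]
Hence, instead of (4.13), we get that
\[
|T_{m,\circ}(\alpha)| \leq \min\left(\frac{x}{2m} + \frac12|\eta'|_1,\; \frac{\frac12|\eta'|_1}{|\sin(2\pi m\alpha)|},\; \frac{m}{x}\frac{c_0}{2}\frac{1}{(\sin 2\pi m\alpha)^2}\right). \tag{4.36}
\]
We obtain (4.14), but with $T_{m,\circ}$ instead of $T_m$, $A = (x/2y_1)(1 + |\eta'|_1/(x/y_1))$ and
$C = (c_0/2)(y_2/x)$, and so $c_1 = 1 + |\eta'|_1/(x/D)$.
The rest of the proof of Lemma 4.2.1 carries over almost word for word. (For the
sake of simplicity, we do not really try to take advantage of the odd support of m
here.) Since C has doubled, it would seem to make sense to reset the value of $c_2$ to be
$c_2 = (3\pi/5\sqrt{2c_0})(1 + \sqrt{13/3})$; this would cause complications related to the fact that
$5c_0c_2/3\pi^2$ would become larger than $2\sqrt{c_0}/\pi$, and so we set $c_2$ to the slightly smaller
value $c_2 = 6\pi/5\sqrt{c_0}$ instead. This implies
\[
\frac{5c_0c_2}{3\pi^2} = \frac{2\sqrt{c_0}}{\pi}. \tag{4.37}
\]
The bound from (4.16) gets multiplied by 2 (but the value of $c_2$ has changed), the
second line in (4.19) gets halved, (4.21) gets replaced by (4.37), the second term in
the maximum in the second line of (4.23) gets doubled, the bound from (4.24) gets
doubled, and the bound from (4.26) gets halved.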
We will also need a version of Lemma 4.2.1 (or rather Lemma 4.2.2; we will decide
to work with the restriction that n and m be odd) with a factor of (log n) within the
inner sum. This is the sum SI,1 in (3.9).
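Lemma 4.2.3. Let α ∈ R/Z with 2α = a/q + δ/x, (a, q) = 1, $|\delta/x| \leq 1/qQ_0$,
$q \leq Q_0$, $Q_0 \geq \max(16, 2\sqrt{x})$. Let η be continuous, piecewise $C^2$ and compactly
supported, with $|\eta|_1 = 1$ and $\eta'' \in L^1$. Let $c_0 \geq |\widehat{\eta''}|_\infty$. Assume that, for any $\rho \geq \rho_0$,
$\rho_0$ a constant, the function $\eta_{(\rho)}(t) = \log(\rho t)\eta(t)$ satisfies
\[
|\eta_{(\rho)}|_1 \leq \log(\rho)|\eta|_1, \quad |\eta'_{(\rho)}|_1 \leq \log(\rho)|\eta'|_1, \quad |\widehat{\eta''_{(\rho)}}|_\infty \leq c_0\log(\rho). \tag{4.38}
\]
Let $\sqrt{3} \leq D \leq \min(x/\rho_0, x/e)$. Then, if $|\delta| \leq 1/2c_2$, where $c_2 = 6\pi/5\sqrt{c_0}$, the
absolute value of
\[
\sum_{\substack{m \leq D \\ m \text{ odd}}} \mu(m)\sum_{n \text{ odd}} (\log n)\,e(\alpha mn)\,\eta\!\left(\frac{mn}{x}\right) \tag{4.39}
\]
is at most
\[
\begin{aligned}
&\frac{x}{q}\min\left(1, \frac{c_0/\delta^2}{(2\pi)^2}\right)\Bigg|\sum_{\substack{m \leq \frac{M}{q} \\ (m,q)=1}} \frac{\mu(m)}{m}\log\frac{x}{mq}\Bigg|
+ \frac{x}{q}\left|\widehat{\log\cdot\eta}(-\delta)\right|\Bigg|\sum_{\substack{m \leq \frac{M}{q} \\ (m,q)=1}} \frac{\mu(m)}{m}\Bigg| \\
&+ O^*\left(c_0\left(\frac12 - \frac{2}{\pi^2}\right)\left(\frac{D^2}{4qx}\log\frac{e^{1/2}x}{D} + \frac1e\right)\right)
\end{aligned} \tag{4.40}
\]
plus
\[
\begin{aligned}
&\frac{2\sqrt{c_0c_1}}{\pi}D\log\frac{ex}{D} + \frac{3c_1}{2}\frac{x}{q}\log^+\frac{D}{c_2x/q}\log\frac{q}{c_2} \\
&+ \left(\frac{2|\eta'|_1}{\pi}\max\left(1, \log\frac{c_0e^3q^2}{4\pi|\eta'|_1x}\right)\log x + \frac{2\sqrt{c_0c_1}}{\pi}\left(\sqrt{3} + \frac12\log^+\frac{D}{q/2}\right)\log\frac{q}{c_2}\right)q \\
&+ \frac{3c_1}{2}\sqrt{\frac{2x}{c_2}}\log\frac{2x}{c_2} + \frac{20c_0c_2^{3/2}}{3\pi^2}\sqrt{2x}\log\frac{2\sqrt{e}x}{c_2}
\end{aligned} \tag{4.41}
\]
for $c_1 = 1 + |\eta'|_1/(x/D)$. The same bound holds if $|\delta| \geq 1/2c_2$ but $D \leq Q_0/2$.
In general, if $|\delta| \geq 1/2c_2$, the absolute value of (4.39) is at most
\[
\begin{aligned}
&\frac{2\sqrt{c_0c_1}}{\pi}D\log\frac{ex}{D} \\
&+ \frac{2\sqrt{c_0c_1}}{\pi}(1+\epsilon)\left(\frac{x}{|\delta|q} + 1\right)\left(\sqrt{3 + 2\epsilon}\cdot\log^+ 2\sqrt{e}|\delta|q + \frac12\log^+\frac{2D}{\frac{x}{|\delta|q}}\log^+ 2|\delta|q\right) \\
&+ \left(\frac{3c_1}{4}\left(\frac{2}{\sqrt5} + \frac{1+\epsilon}{2\epsilon}\log x\right) + \frac{40}{3}\sqrt{2}\,c_0c_2^{3/2}\right)\sqrt{x}\log x
\end{aligned} \tag{4.42}
\]
for ε ∈ (0, 1].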

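Proof. Define Q, $Q'$, M, $a'$ and $q'$ as in the proof of Lemma 4.2.1. The same method of
proof works as for Lemma 4.2.1; we go over the differences. When applying Poisson
summation or (2.2), use $\eta_{(x/m)}(t) = (\log xt/m)\eta(t)$ instead of η(t). Then use the
bounds in (4.38) with ρ = x/m; in particular,
\[
|\widehat{\eta''_{(x/m)}}|_\infty \leq c_0\log\frac{x}{m}.
\]
For f(u) = e((δm/2x)u)(log u)η((m/x)u),
\[
\widehat{f}(t) = \frac{x}{m}\,\widehat{\eta_{(x/m)}}\left(\frac{x}{m}t - \frac{\delta}{2}\right)
\]
and so
\[
\begin{aligned}
\frac12\sum_n\left|\widehat{f}(n/2)\right| &\leq \frac{x}{m}\left(\frac12\left|\widehat{\eta_{(x/m)}}\left(-\frac{\delta}{2}\right)\right| + \frac12\sum_{n \neq 0}\left|\widehat{\eta_{(x/m)}}\left(\frac{x}{m}\frac{n}{2} - \frac{\delta}{2}\right)\right|\right) \\
&\leq \frac12\frac{x}{m}\left(\left|\widehat{\log\cdot\eta}\left(-\frac{\delta}{2}\right)\right| + \log\left(\frac{x}{m}\right)\left|\widehat{\eta}\left(-\frac{\delta}{2}\right)\right|\right) + \frac{m}{x}\left(\log\frac{x}{m}\right)\frac{c_0}{2\pi^2}(\pi^2 - 4).
\end{aligned}
\]
The part of the main term involving log(x/m) becomes
\[
\frac{x\widehat{\eta}(-\delta)}{2}\sum_{\substack{m \leq M \\ m \text{ odd} \\ q \mid m}} \frac{\mu(m)}{m}\log\left(\frac{x}{m}\right) = \frac{x\mu(q)}{q}\,\widehat{\eta}(-\delta)\cdot\sum_{\substack{m \leq M/q \\ (m,2q)=1}} \frac{\mu(m)}{m}\log\frac{x}{mq}
\]
for q odd. (We can see that this, like the rest of the main term, vanishes for m even.)
In the term in front of π² − 4, we find the sum
\[
\begin{aligned}
\sum_{\substack{m \leq M \\ m \text{ odd} \\ q \mid m}} \frac{m}{x}\log\frac{x}{m} &\leq \frac{M}{x}\log\frac{x}{M} + \frac{q}{x}\cdot\frac12\int_0^{M/q} t\log\frac{x/q}{t}\,dt \\
&= \frac{M}{x}\log\frac{x}{M} + \frac{M^2}{4qx}\log\frac{e^{1/2}x}{M},
\end{aligned}
\]
where we use the fact that $t \mapsto t\log(x/t)$ is increasing for t ≤ x/e. By the same fact
(and by M ≤ D), $(M^2/q)\log(e^{1/2}x/M) \leq (D^2/q)\log(e^{1/2}x/D)$. It is also easy to
see that (M/x) log(x/M) ≤ 1/e (since M ≤ D ≤ x).
The basic estimate for the rest of the proof (replacing (4.13)) is
\[
\begin{aligned}
T_{m,\circ}(\alpha) &= \sum_{n \text{ odd}} e(\alpha mn)(\log n)\,\eta\!\left(\frac{mn}{x}\right) = \sum_{n \text{ odd}} e(\alpha mn)\,\eta_{(x/m)}\!\left(\frac{mn}{x}\right) \\
&= O^*\left(\min\left(\frac{x}{2m}|\eta_{(x/m)}|_1 + \frac{|\eta'_{(x/m)}|_1}{2},\; \frac{\frac12|\eta'_{(x/m)}|_1}{|\sin(2\pi m\alpha)|},\; \frac{m}{x}\frac{\frac12|\widehat{\eta''_{(x/m)}}|_\infty}{(\sin 2\pi m\alpha)^2}\right)\right) \\
&= O^*\left(\log\frac{x}{m}\cdot\min\left(\frac{x}{2m} + \frac{|\eta'|_1}{2},\; \frac{\frac12|\eta'|_1}{|\sin(2\pi m\alpha)|},\; \frac{m}{x}\frac{c_0}{2}\frac{1}{(\sin 2\pi m\alpha)^2}\right)\right).
\end{aligned}
\]
We wish to bound
\[
\sum_{\substack{m \leq M \\ q \nmid m \\ m \text{ odd}}} |T_{m,\circ}(\alpha)| + \sum_{\frac{Q}{2} < m \leq D} |T_{m,\circ}(\alpha)|. \tag{4.43}
\]
Just as in the proofs of Lemmas 4.2.1 and 4.2.2, we give two bounds, one valid for
|δ| large ($|\delta| \geq 1/2c_2$) and the other for δ small ($|\delta| \leq 1/2c_2$). Again as in the proof
of Lemma 4.2.2, we ignore the condition that m is odd in (4.15).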
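Consider the case of |δ| large first. Instead of (4.16), we have
\[
\sum_{\substack{1 \leq m \leq M \\ q \nmid m}} |T_m(\alpha)| \leq \frac{40}{3\pi^2}\frac{c_0q^3}{4x}\sum_{0 \leq j \leq \frac{M}{q}}(j+1)\log\frac{x}{jq+1}. \tag{4.44}
\]
Since
\[
\begin{aligned}
\sum_{0 \leq j \leq \frac{M}{q}}(j+1)\log\frac{x}{jq+1}
&\leq \log x + \frac{M}{q}\log\frac{x}{M} + \sum_{1 \leq j \leq \frac{M}{q}}\log\frac{x}{jq} + \sum_{1 \leq j \leq \frac{M}{q} - 1} j\log\frac{x}{jq} \\
&\leq \log x + \frac{M}{q}\log\frac{x}{M} + \int_0^{\frac{M}{q}}\log\frac{x}{tq}\,dt + \int_1^{\frac{M}{q}} t\log\frac{x}{tq}\,dt \\
&\leq \log x + \left(\frac{2M}{q} + \frac{M^2}{2q^2}\right)\log\frac{e^{1/2}x}{M},
\end{aligned}
\]
this means that
\[
\begin{aligned}
\sum_{\substack{1 \leq m \leq M \\ q \nmid m}} |T_m(\alpha)| &\leq \frac{40}{3\pi^2}\frac{c_0q^3}{4x}\left(\log x + \left(\frac{2M}{q} + \frac{M^2}{2q^2}\right)\log\frac{e^{1/2}x}{M}\right) \\
&\leq \frac{5c_0c_2}{3\pi^2}M\log\frac{\sqrt{e}x}{M} + \frac{40}{3}\sqrt{2}\,c_0c_2^{3/2}\sqrt{x}\log x,
\end{aligned} \tag{4.45}
\]
where we are using the bounds $M \leq Q/2 \leq c_2x/q$ and $q^2 \leq 2c_2x$ (just as in (4.16)).
Instead of (4.17), we have
\[
\begin{aligned}
\sum_{j=0}^{\left\lfloor\frac{D - (Q+1)/2}{q'}\right\rfloor}\frac{x}{jq' + \frac{Q+1}{2}}\log\frac{x}{jq' + \frac{Q+1}{2}}
&\leq \frac{x}{Q/2}\log\frac{2x}{Q} + \frac{x}{q'}\int_{\frac{Q+1}{2}}^{D}\log\frac{x}{t}\,\frac{dt}{t} \\
&\leq \frac{2x}{Q}\log\frac{2x}{Q} + \frac{x}{q'}\log\frac{2x}{Q}\log^+\frac{2D}{Q};
\end{aligned}
\]
recall that the coefficient in front of this sum will be halved by the condition that n is
odd. Instead of (4.18), we obtain
\[
\begin{aligned}
q'\sum_{j=0}^{\left\lfloor\frac{D - (Q+1)/2}{q'}\right\rfloor}&\sqrt{1 + \frac{q'}{jq' + (Q+1)/2}}\,\log\frac{x}{jq' + \frac{Q+1}{2}} \\
&\leq q'\sqrt{3 + 2\epsilon}\cdot\log\frac{2x}{Q+1} + \int_{\frac{Q+1}{2}}^{D}\left(1 + \frac{q'}{2t}\right)\log\frac{x}{t}\,dt \\
&\leq q'\sqrt{3 + 2\epsilon}\cdot\log\frac{2x}{Q+1} + D\log\frac{ex}{D} - \frac{Q+1}{2}\log\frac{2ex}{Q+1} + \frac{q'}{2}\log\frac{2x}{Q+1}\log\frac{2D}{Q+1}.
\end{aligned}
\]
(The bound $\int_a^b \log(x/t)\,dt/t \leq \log(x/a)\log(b/a)$ will be more practical than the exact
expression for the integral.) Hence $\sum_{Q/2 < m \leq D}|T_m(\alpha)|$ is at most
\[
\begin{aligned}
&\frac{2\sqrt{c_0c_1}}{\pi}D\log\frac{ex}{D} + \frac{2\sqrt{c_0c_1}}{\pi}\left((1+\epsilon)\sqrt{3 + 2\epsilon} + \frac{(1+\epsilon)}{2}\log\frac{2D}{Q+1}\right)(Q+1)\log\frac{2x}{Q+1} \\
&- \frac{2\sqrt{c_0c_1}}{\pi}\cdot\frac{Q+1}{2}\log\frac{2ex}{Q+1} + \frac{3c_1}{2}\left(\frac{2}{\sqrt5} + \frac{1+\epsilon}{\epsilon}\log^+\frac{D}{Q/2}\right)\sqrt{x}\log x.
\end{aligned}
\]
Adding this to (4.45) (with M = Q/2), and using (4.21) and (4.22) as before, we
obtain that (4.43) is at most
\[
\begin{aligned}
&\frac{2\sqrt{c_0c_1}}{\pi}D\log\frac{ex}{D} \\
&+ \frac{2\sqrt{c_0c_1}}{\pi}(1+\epsilon)(Q+1)\left(\sqrt{3 + 2\epsilon}\,\log^+\frac{2\sqrt{e}x}{Q+1} + \frac12\log^+\frac{2D}{Q+1}\log^+\frac{2x}{Q+1}\right) \\
&+ \frac{3c_1}{2}\left(\frac{2}{\sqrt5} + \frac{1+\epsilon}{\epsilon}\log^+\frac{D}{Q/2}\right)\sqrt{x}\log x + \frac{40}{3}\sqrt{2}\,c_0c_2^{3/2}\sqrt{x}\log x.
\end{aligned}
\]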
Now we go over the case of |δ| small (or D ≤ Q0 /2). Instead of (4.23), we have
2|η 0 |1 c0 e3 q 2
X  
|Tm,◦ (α)| ≤ q max 1, log log x. (4.46)
π 4π|η 0 |1 x
m≤q/2

Suppose q 2 < 2c2 x. (Otherwise, the sum we are about to estimate is empty.) Instead
of (4.24), we have
40 c0 q 3
 
X X 1 x
|Tm,◦ (α)| ≤ 2 6x
j+ log 1

q 0
3π D0
2 j − 2 q
2 <m≤D
1
1≤j≤ q +2
q-m
Z 0 Z 0 !
10c0 q 3 2x 1 D x 1 D x D0 x
≤ log + log dt + t log dt + log 0
3π 2 x q q 0 t q 0 t q D

10c0 q 3 2D0 (D0 )2
   
2x ex
= log + + log
2
3π x q q 2q 2 D0
√ √ 
√ √

5c0 c2 2x ex 0 ex
≤ 4 2c2 x log + 4 2c 2 x log + D log
3π 2 q D0 D0
√ √ 


5c0 c2 0 ex 2 ex
≤ D log + 4 2c 2 x log
3π 2 D0 c2
(4.47)
where D0 = min(c2 x/q, D). (We are using the bounds q 3 /x ≤ (2c2 )3/2 , D0 q 2 /x ≤
3/2 √
c2 q < c 2 2x and D0 q/x ≤ c2 .) Instead of (4.25), we have
b D−R
q c 3c1
s  !
2 x 4q c1 c0 q x
X X
|Tm,◦ (α)| ≤ + 1+ log ,
j=0
jq + R π 4 jq + R jq + R
R<m≤D

where R = max(c2 x/q, q/2). We can simply reuse (4.26), multiplying it by log x/R;
the only difference is that now we take care to bound min(q/c2 , 2x/q) by the geometric
mean $\sqrt{(q/c_2)(2x/q)} = \sqrt{2x/c_2}$. We replace (4.27) by

b q1 (D−R)
X cr 1 D
r Z r
q x q x q x
1+ log ≤ 1 + log + 1 + log dt
j=0
jq + R jq + R R R q R t t

 
q D ex R ex 1 q D
≤ 3 log + log − log + log log+ .
c2 q D q R 2 c2 R
(4.48)
We sum with (4.46) and (4.47), and obtain (4.41) as an upper bound for (4.43). (Just as
in the proof of Lemma 4.2.1, the term $(5c_0c_2/(3\pi^2))\,D' \log(\sqrt{e}\,x/D')$ is smaller than
the term $(2\sqrt{c_1c_0}/\pi)\,R \log(ex/R)$ in (4.48), and thus gets absorbed by it when D > R.
If D ≤ R, then, again as in Lemma 4.2.1, the sum $\sum_{R<m\le D} |T_{m,\circ}(\alpha)|$ is empty, and
we bound $(5c_0c_2/(3\pi^2))\,D' \log(\sqrt{e}\,x/D')$ by the term $(2\sqrt{c_1c_0}/\pi)\,D \log(ex/D)$, which
would not appear otherwise.)
Now comes the time to focus on our second type I sum, namely,
X X X
Λ(v) µ(u) e(αvun)η(vun/x),
v≤V u≤U n
v odd u odd n odd

which corresponds to the term SI,2 in (3.9). The innermost two sums, on their own,
are a sum of type I we have already seen. Accordingly, for q small, we will be able to
bound them using Lemma 4.2.2. If q is large, then that approach does not quite work,
since then the approximation av/q to vα is not always good enough. (As we shall later
see, we need q ≤ Q/v for the approximation to be sufficiently close for our purposes.)
Fortunately, when q is large, we can also afford to lose a factor of log, since the
gains from q will be large. Here is the estimate we will use for q large.
Lemma 4.2.4. Let α ∈ R/Z with 2α = a/q + δ/x, (a, q) = 1, |δ/x| ≤ 1/qQ0 ,
q ≤ Q0 , $Q_0 \ge \max(2e, 2\sqrt{x})$. Let η be continuous, piecewise $C^2$ and compactly
supported, with $|\eta|_1 = 1$ and $\eta'' \in L^1$. Let $c_0 \ge |\widehat{\eta''}|_\infty$. Let $c_2 = 6\pi/(5\sqrt{c_0})$. Assume
that $x \ge e^2 c_2/2$.
Let U, V ≥ 1 satisfy U V + (19/18)Q0 ≤ x/5.6. Then, if |δ| ≤ 1/2c2 , the absolute
value of

X X X
Λ(v) µ(u) e(αvun)η(vun/x) (4.49)
v≤V u≤U n
v odd u odd n odd

is at most
 
x c0
min 1, log V q
2q (πδ)2
 2 (4.50)
3c4 U V 2 (U + 1)2 V
  
1 1 D log V
+ O∗ − 2 · c0 + + log q
4 π 2qx 2 x 2x

plus
√  


2 c0 c1 D c2 x log D + D
D log √ +q 3 log + log
π e q 2 q/2
0
c0 e 3 q 2
 
3c1 x + D 2|η |1 q
+ log D log + q max 1, log log (4.51)
2 q c2 x/q π 4π|η 0 |1 x 2
3c1 √ c2 x 25c0 √
+ √ x log + 2
(2c2 )3/2 x log x,
2 2c2 2 4π

where D = U V and c1 = 1 + |η 0 |1 /(2x/D) and c4 = 1.03884. The same bound holds


if |δ| ≥ 1/2c2 but D ≤ Q0 /2.
In general, if |δ| ≥ 1/2c2 , the absolute value of (4.49) is at most (4.50) plus

2 c0 c1 D
D log
π e
√ x
!
√ |δ|q + 1
2
 
2 c0 c1 x 1 + e D
+ (1 + ) +1 ( 3 + 2 − 1) log √ + log D log x
π |δ|q 2 2 |δ|q

   
3c1 1 3(1 + ) 20c0
+ + log x + (2c2 )3/2 x log x
2 2 16 3π 2
(4.52)
for ε ∈ (0, 1].

Proof. We proceed essentially as in Lemma 4.2.1 and Lemma 4.2.2. Let Q, q 0 and Q0
be as in the proof of Lemma 4.2.2, that is, with 2α where Lemma 4.2.1 uses α.
Let M = min(U V, Q/2). We first consider the terms with uv ≤ M , u and v odd,
uv divisible by q. If q is even, there are no such terms. Assume q is odd. Then, by
(4.33) and (4.34), the absolute value of the contribution of these terms is at most
 
!!
X X  xb
η (−δ/2) a |ηc00 |∞ 2
Λ(v)µ(a/v) +O · (π − 4) . (4.53)
 
2a x 2π 2

 
a≤M v|a
a odd a/U ≤v≤V
q|a

Now
X X Λ(v)µ(a/v)
a
a≤M v|a
a odd a/U ≤v≤V
q|a
X Λ(v) X µ(u) X Λ(pα ) X µ(u)
= + ,
v u α
pα u
v≤V u≤min(U,M/V ) p ≤V u≤min(U,M/V )
v odd u odd p odd u odd
(v,q)=1 q
q|u p|q (q,pα )
|u

which equals

µ(q) X Λ(v) X µ(u)


q v u
v≤V u≤min(U/q,M/V q)
v odd (u,2q)=1
(v,q)=1
 
q
µ (q,pα ) X Λ(pα ) X µ(u)
+
q pα /(q, pα ) u
pα ≤V u≤min( q/(q,p
M/V
α ) , q/(q,pα ) )
U
p odd
u odd
p|q
(u, (q,pq α ) )=1
 
 
1  X
∗ Λ(v) X log p 
= ·O  + ,

q  v pα /(q, pα ) 

 v≤V pα ≤V 
(v,2q)=1 p odd
p|q

where we are using (2.20) to bound the sums on u by 1. We notice that


 
X log p X  X 1 
≤ (log p) vp (q) +
 
pα /(q, pα ) pα−vp (q) 

pα ≤V

p odd α>vp (q)
p odd p|q pα ≤V
p|q
X X log p X Λ(v)
≤ log q + (log p) β
≤ log q + ,
p v
p odd β>0 v≤V
p|q pβ ≤ v V(q) v odd
p p (v,q)=1

and so
 
X X Λ(v)µ(a/v) 1 X Λ(v) 
= · O∗ 

log q + 
a q  v 
a≤M v|a v≤V
a odd a/U ≤v≤V (v,2)=1
q|a
1
= · O∗ (log q + log V )
q

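by (2.12). The absolute value of the sum of the terms with $\widehat{\eta}(-\delta/2)$ in (4.53) is thus at
most
\[ \frac{x}{q}\,\frac{\widehat{\eta}(-\delta/2)}{2}\,(\log q + \log V) \le \frac{x}{2q}\,\min\left(1, \frac{c_0}{(\pi\delta)^2}\right) \log Vq, \]
where we are bounding $\widehat{\eta}(-\delta/2)$ by (2.1) (with k = 2).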

The other terms in (4.53) contribute at most

|ηc00 |∞ 1 X X
(π 2 − 4) Λ(v)uv. (4.54)
2π 2 x
u≤U v≤V
uv odd
uv≤M, q|uv
u sq-free

For any R, $\sum_{u\le R,\ u\ \mathrm{odd},\ q|u} u \le R^2/4q + 3R/4$. Using the estimates (2.12), (2.15)
and (2.16), we obtain that the double sum in (4.54) is at most
X X X X
Λ(v)v u+ (log p)pα u
v≤V u≤min(U,M/v) pα ≤V u≤U
(v,2q)=1 u odd p odd u odd
q
q|u p|q (q,pα )
|u

(M/v)2 (U + 1)2
 
X 3M X
≤ Λ(v)v · + + (log p)pα · (4.55)
4q 4v 4
v≤V pα ≤V
(v,2q)=1 p odd
p|q

M 2 log V 3c4 (U + 1)2


≤ + MV + V log q,
4q 4 4

where c4 = 1.03884.
From this point onwards, we use the easy bound

X
Λ(v)µ(a/v) ≤ log a.
v|a
a/U ≤v≤V

What we must bound now is


X X
(log m) e(αmn)η(mn/x). (4.56)
m≤U V n odd
m odd
q - m or m > M

The inner sum is the same as the sum Tm,◦ (α) in (4.35); we will be using the bound
(4.36). Much as before, we will be able to ignore the condition that m is odd.
Let D = U V . What remains to do is similar to what we did in the proof of Lemma
4.2.1 (or Lemma 4.2.2).
Case (a). δ large: |δ| ≥ 1/2c2 . Instead of (4.16), we have

X 40 c0 q 3 X
(log m)|Tm,◦ (α)| ≤ (j + 1) log(j + 1)q,
3π 2 4x
1≤m≤M 0≤j≤ M
q
q-m


and, since M ≤ min(c2 x/q, D), q ≤ 2c2 x (just as in the proof of Lemma 4.2.1) and
X
(j + 1) log(j + 1)q
0≤j≤ M
q
  Z M
M M 1
≤ log M + + 1 log(M + 1) + 2 t log t dt
q q q 0
M2
 
M M
≤ 2 + 1 log x + 2 log √ ,
q 2q e
we conclude that
X 5c0 c2 M 20c0 √
|Tm,◦ (α)| ≤ M log √ + (2c2 )3/2 x log x.
3π 2 e 3π 2 (4.57)
1≤m≤M
q-m

Instead of (4.17), we have


D−(Q+1)/2
b q0
c
x D log t
  Z
X x 0 Q+1 xQ+1
log jq + ≤ log
+ dt
j=0
jq 0 + Q+1
2
2 Q+1
2
2 q 0 Q+1
2
t
 2 !
2x Q (1 + )x 2 Q
≤ log + (log D) − log .
Q 2 2Q 2

Instead of (4.18), we estimate


 Q+1 
D−
2
q0  s
q0
 
0
X Q+1 0
q log + jq 1+
j=0
2 jq + Q+1
0
2
 Z D Z D 0


Q+1 q log t
≤ q 0 log D + ( 3 + 2 − 1) log + log t dt + dt
2 Q+1
2
Q+1
2
2t

   
0
 Q+1 D Q+1 Q+1
≤ q log D + 3 + 2 − 1 log + D log − log
2 e 2 2e
q0 D
+ log D log+ Q+1 .
2 2
We conclude that, when D ≥ Q/2, the sum $\sum_{Q/2<m\le D} (\log m)|T_m(\alpha)|$ is at most


  
2 c0 c1 D Q+1 1 Q+1
D log + (Q + 1) (1 + )( 3 + 2 − 1) log − log
π e 2 2 2e
√ 2
c0 c1 e D
+ (Q + 1)(1 + ) log D log+ Q+1
π 2
 2 !!
3c1 2x Q (1 + )x Q
+ log + (log D)2 − log .
2 Q 2 2Q 2

We must now add this to (4.57). Since



p
√ 1 1 + 13/3 √
(1 + )( 3 + 2 − 1) log 2 − log 2e + log 2 e > 0
2 2

and $Q \ge 2\sqrt{x}$, we conclude that (4.56) is at most

2 c0 c1 D
D log
π e
√ !
2 c0 c1 √ Q+1 1 + e D
2
+ (1 + )(Q + 1) ( 3 + 2 − 1) log √ + log D log Q+1
π 2 2 2

   
3c1 1 3(1 + ) 20c0
+ + log x + (2c2 )3/2 x log x.
2 2 16 3π 2
(4.58)
Case (b). δ small: |δ| ≤ 1/2c2 or D ≤ Q0 /2. The analogue of (4.23) is a bound of
2|η 0 |1 c0 e3 q 2
 
q
≤ q max 1, log 0
log
π 4π|η |1 x 2
for the terms with m ≤ q/2. If q 2 < 2c2 x, then, much as in (4.24), we have
10 c0 q 3
 
X X 1
|Tm,◦ (α)|(log m) ≤ 2 j+ log(j + 1/2)q
q 0
π 3x D0
2
2 <m≤D
1
1≤j≤ q +2
q-m (4.59)
Z D 0 + 23 q
10 c0 q
≤ x log x dx.
π 2 3x q

Since
D 0 + 32 q 2
D0 + 3 q 1
Z 
1 0 3 q
x log x dx = D + q log √ 2 − q 2 log √
q 2 2 e 2 e
  0
 0 3
1 02 3 0 D 3 q 9 D + q 1 q
= D + Dq log √ + + q 2 log √ 2 − q 2 log √
2 2 e 2 D0 8 e 2 e
0
  
1 D 3 9 2 3 19
= D02 log √ + D0 q log D0 + q 2 + + log D0 + q ,
2 e 2 8 9 2 18
where D0 = min(c2 x/q, D), and since the assumption (U V + (19/18)Q0 ) ≤ x/5.6
implies that (2/9 + 3/2 + log(D0 + (19/18)q)) ≤ x, we conclude that
X
|Tm,◦ (α)|(log m)
q 0
2 <m≤D
q-m

D0 √ √
 
5c0 c2 0 10c0 3 3/2 9 3/2 (4.60)
≤ D log √ + (2c 2 ) x log x + (2c 2 ) x log x
3π 2 e 3π 2 4 8
5c0 c2 0 D0 25c0 √
≤ D log √ + (2c2 )3/2 x log x.
3π 2 e 4π 2

Let R = max(c2 x/q, q/2). We bound the terms R < m ≤ D as in (4.25), with a
factor of log(jq + R) inside the sum. The analogues of (4.26) and (4.27) are

b q1 (D−R)
X c x D log t
Z
x x
log(jq + R) ≤ log R + dt
jq + R R q R t
j=0 (4.61)
r r
2x c2 x x D
≤ log + log D log+ ,
c2 2 q R

where we use the assumption that $x \ge e^2 c_2/2$, and

X c
b q1 (D−R) r
q √
log(jq + R) 1 + ≤ 3 log R
jq + R
j=0 (4.62)
 
1 D R 1 D
+ D log − R log + log D log
q e e 2 R

(or 0 if D < R). We sum with (4.60) and the terms with m ≤ q/2, and obtain, for
D0 = c2 x/q = R,
√  


2 c0 c1 D c2 x log D + D
D log √ +q 3 log + log
π e q 2 q/2
0
c0 e3 q 2
 
3c1 x + D 2|η |1 q
+ log D log + q max 1, log log
2 q c2 x/q π 4π|η 0 |1 x 2
3c1 √ c2 x 25c0 √
+ √ x log + 2
(2c2 )3/2 x log x,
2 2c2 2 4π

which, it is easy to check, is also valid even if D0 = D (in which case (4.61) and (4.62)
do not appear) or R = q/2 (in which case (4.60) does not appear).
Chapter 5

Type II sums

We must now consider the sum


 
X X  X
SII = 
 µ(d)
 Λ(n)e(αmn)η(mn/x). (5.1)
m>U d>U n>V
(m,v)=1 d|m (n,v)=1

Here the main improvements over classical treatments of type II sums are as fol-
lows:

1. obtaining cancellation in the term


X
µ(d),
d>U
d|m

leading to a gain of a factor of log;

2. using a large sieve for primes, getting rid of a further log;

3. exploiting, via a non-conventional application of the principle of the large sieve


(Lemma 5.2.1), the fact that α is in the tail of an interval (when that is the case).

It should be clear that these techniques are of general applicability. (It is also clear that
(2) is not new, though, strangely enough, it seems not to have been applied to Gold-
bach’s problem. Perhaps this oversight is due to the fact that proofs of Vinogradov’s
result given in textbooks often follow Linnik’s dispersion method, rather than the large
sieve. Our treatment of the large sieve for primes will follow the lines set by Mont-
gomery and Montgomery-Vaughan [MV73, (1.6)]. The fact that the large sieve for
primes can be combined with the new technique (3) is, of course, a novelty.)
While (1) is particularly useful for the treatment of a term that generally arises in
applications of Vaughan’s identity, all of the points above address issues that can arise
in more general situations in number theory.


It is technically helpful to express η as the (multiplicative) convolution of two functions
of compact support – preferably the same function:
\[ \eta(x) = (\eta_1 *_M \eta_1)(x) = \int_0^\infty \eta_1(t)\,\eta_1(x/t)\,\frac{dt}{t}. \tag{5.2} \]

For the smoothing function η(t) = η2 (t) = 4 max(log 2 − | log 2t|, 0), equation (5.2)
holds with η1 = 2 · 1[1/2,1] , where 1[1/2,1] is the characteristic function of the interval
[1/2, 1]. We will work with η = η2 , yet most of our work will be valid for any η of the
form η = η1 ∗ η1 .
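The identity (5.2) for η = η2 can be sanity-checked numerically. The following is only a sketch: the function names and the midpoint rule in the variable u = log t are illustrative choices, not part of the text.

```python
import math

def eta1(t):
    # eta_1 = 2 * (characteristic function of [1/2, 1])
    return 2.0 if 0.5 <= t <= 1.0 else 0.0

def eta2(t):
    # eta_2(t) = 4 * max(log 2 - |log 2t|, 0)
    if t <= 0:
        return 0.0
    return 4.0 * max(math.log(2.0) - abs(math.log(2.0 * t)), 0.0)

def mult_conv(x, steps=20000):
    # (eta_1 *_M eta_1)(x) = int_0^inf eta_1(t) eta_1(x/t) dt/t.
    # With u = log t, dt/t = du, and eta_1(t) vanishes outside [1/2, 1].
    lo, hi = math.log(0.5), 0.0
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        t = math.exp(lo + (k + 0.5) * h)   # midpoint rule
        total += eta1(t) * eta1(x / t) * h
    return total

for x in (0.2, 0.3, 0.5, 0.7, 0.9, 1.2):
    print(x, mult_conv(x), eta2(x))
```

The two columns agree up to the discretization error of the midpoint rule, which is of the order of the step size because the integrand is a step function.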
By (5.2), the sum (5.1) equals
 
Z ∞  
X X  X mn/x dt
4  µ(d)  Λ(n)e(αmn)η 1 (t)η 1
0
  t t
m>U d>U n>V
(m,v)=1 d|m (n,v)=1
 
Z x/U X X  X dW
=4  µ(d) Λ(n)e(αmn)
V
  W
max( 2W ,U )<m≤ W d>U max(V, W
2 )<n≤W
x x
d|m
(m,v)=1 (n,v)=1
(5.3)
by the substitution t = (m/x)W . (We can assume V ≤ W ≤ x/U because otherwise
one of the sums in (5.4) is empty.) As we can see, the sums within the integral are now
unsmoothed. This will not be truly harmful, and to some extent it will be convenient,
in that ready-to-use large-sieve estimates in the literature have been optimized more
carefully for unsmoothed sums than for smooth sums. The fact that the sums start at
x/2W and W/2 rather than at 1 will also be slightly helpful.
(This is presumably why the weight η2 was introduced in [Tao14], which also uses
the large sieve. As we will later see, the weight η2 – or anything like it – will simply
not do on the major arcs, which are much more sensitive to the choice of weights. On
the minor arcs, however, η2 is convenient, and this is why we use it here. For type I
sums – as should be clear from our work so far, which was stated for general weights
– any function whose second derivative exists almost everywhere and lies in ℓ1 would
do just as well. The option of having no smoothing whatsoever – as in Vinogradov’s
work, or as in most textbook accounts – would not be quite as good for type I sums,
and would lead to a routine but inconvenient splitting of sums into short intervals in
place of (5.3).)
We now do what is generally the first thing in type II treatments: we use Cauchy-
Schwarz. A minor note, however, that may help avoid confusion: the treatments fa-
miliar to some readers (e.g., the dispersion method, not followed here) start with the
special case of Cauchy-Schwarz that is most common in number theory
\[ \Bigl|\sum_{n\le N} a_n\Bigr|^2 \le N \sum_{n\le N} |a_n|^2, \]
whereas here we apply the general rule
\[ \sum_m a_m b_m \le \sqrt{\sum_m |a_m|^2}\,\sqrt{\sum_m |b_m|^2} \]
to the integrand in (5.3). At any rate, we will have reduced the estimation of a sum
to the estimation of two simpler sums $\sum_m |a_m|^2$, $\sum_m |b_m|^2$, but each of these two
simpler sums will be of a kind that will lead to a loss of a factor of log x (or
$(\log x)^3$) if not estimated carefully. Since we cannot afford to lose a single factor of
log x, we will have to deploy and develop techniques to eliminate these factors of log x.
The procedure followed will be quite different for the two sums; a variety of techniques
will be needed.
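We separate n prime and n non-prime in the integrand of (5.3), and, as we were
saying, we apply Cauchy-Schwarz. We obtain that the expression within the integral in
(5.3) is at most $\sqrt{S_1(U,W)\cdot S_2(U,V,W)} + \sqrt{S_1(U,W)\cdot S_3(W)}$, where
\[ \begin{aligned}
S_1(U,W) &= \sum_{\substack{\max(\frac{x}{2W},U) < m \le \frac{x}{W} \\ (m,v)=1}} \Biggl(\sum_{\substack{d>U \\ d|m}} \mu(d)\Biggr)^2, \\
S_2(U,V,W) &= \sum_{\substack{\max(\frac{x}{2W},U) < m \le \frac{x}{W} \\ (m,v)=1}} \Biggl|\sum_{\substack{\max(V,\frac{W}{2}) < p \le W \\ (p,v)=1}} (\log p)\,e(\alpha m p)\Biggr|^2
\end{aligned} \tag{5.4} \]
and
\[ S_3(W) = \sum_{\substack{\frac{x}{2W} < m \le \frac{x}{W} \\ (m,v)=1}} \Biggl|\sum_{\substack{n\le W \\ n\ \text{non-prime}}} \Lambda(n)\Biggr|^2
\le \sum_{\substack{\frac{x}{2W} < m \le \frac{x}{W} \\ (m,v)=1}} \left(1.42620\,W^{1/2}\right)^2 \le 1.0171x + 2.0341W \tag{5.5} \]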

(by [RS62, Thm. 13]). We will assume V ≥ v; thus the condition (p, v) = 1 will be
fulfilled automatically and can be removed.
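The constants in (5.5) can be checked on a finite range. The sketch below assumes that the cited bound is ψ(W) − θ(W) < 1.4262 √W, and that the outer sum has at most x/2W + 1 terms; the helper names and the range 10^5 are illustrative.

```python
import math

def psi_minus_theta_table(limit):
    # table[W] = sum of Lambda(n) over non-prime n <= W,
    # i.e. over proper prime powers p^k with k >= 2.
    is_prime = [True] * (limit + 1)
    is_prime[0] = False
    if limit >= 1:
        is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for k in range(p * p, limit + 1, p):
                is_prime[k] = False
    weight = [0.0] * (limit + 1)
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            pk = p * p
            while pk <= limit:
                weight[pk] = math.log(p)
                pk *= p
    table = [0.0] * (limit + 1)
    for n in range(1, limit + 1):
        table[n] = table[n - 1] + weight[n]
    return table

def s3_bound(x, W):
    # at most x/(2W) + 1 values of m, each contributing at most (1.4262 sqrt(W))^2
    return (x / (2 * W) + 1) * (1.4262 ** 2) * W

table = psi_minus_theta_table(100000)
worst = max(range(1, 100001), key=lambda W: table[W] / math.sqrt(W))
print(worst, table[worst] / math.sqrt(worst))
```

On this range the ratio stays just below 1.4262, and 1.4262² = 2.03404..., which accounts for the coefficients 1.0171 and 2.0341.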
The contribution of S3 (W ) will be negligible. We must bound S1 (U, W ) and
S2 (U, V, W ) from above.

5.1 The sum S1 : cancellation


We shall bound
\[ S_1(U,W) = \sum_{\substack{\max(U,x/2W) < m \le x/W \\ (m,v)=1}} \Biggl(\sum_{\substack{d>U \\ d|m}} \mu(d)\Biggr)^2. \tag{5.6} \]

There will be a surprising amount of cancellation: the expression within the sum
will be bounded by a constant on average – a constant less than 1, and usually less than
1/2, in fact. In other words, the inner sum in (5.6) is exactly 0 most of the time.
Recall that we need explicit constants throughout, and that this essentially con-
strains us to elementary means. (We will at one point use Dirichlet series and ζ(s) for
s real and greater than 1.)
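To get a feeling for the inner sum in (5.6), one can tabulate it for small parameters. The sketch below is an illustration only (the values of U and of the range of m are arbitrary, and v = 1 is assumed); it also checks the elementary identity that, for m > 1, the sum over d > U equals minus the sum over d ≤ U.

```python
def mobius_table(n):
    # mu[k] for 0 <= k <= n, by a simple sieve
    mu = [1] * (n + 1)
    mu[0] = 0
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for k in range(p, n + 1, p):
                if k > p:
                    is_prime[k] = False
                mu[k] = -mu[k]
            for k in range(p * p, n + 1, p * p):
                mu[k] = 0
    return mu

def tail_sum(m, U, mu):
    # sum of mu(d) over divisors d of m with d > U
    return sum(mu[d] for d in range(U + 1, m + 1) if m % d == 0)

def head_sum(m, U, mu):
    # sum of mu(d) over divisors d of m with d <= U
    return sum(mu[d] for d in range(1, min(U, m) + 1) if m % d == 0)

def mean_square(lo, hi, U, mu):
    # average of (tail sum)^2 over lo < m <= hi
    vals = [tail_sum(m, U, mu) ** 2 for m in range(lo + 1, hi + 1)]
    return sum(vals) / len(vals)

mu = mobius_table(600)
print(mean_square(300, 600, 30, mu))
```

Printing `mean_square` for a few choices of U gives a rough empirical picture of the average size of the squared inner sum; it proves nothing, of course.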

5.1.1 Reduction to a sum with µ


It is tempting to start by applying Möbius inversion to change d > U to d ≤ U in
(5.6), but this just makes matters worse. We could also try changing variables so that
m/d (which is smaller than x/U W ) becomes the variable instead of d, but this leads
to complications for m non-square-free. Instead, we write
 2
X X  X X

 µ(d)
 = µ(d1 > U )µ(d2 > U )
x x
max(U,x/2W )<m≤x/W d>U 2W <m≤ W d1 ,d2 |m
(m,v)=1 d|m (m,v)=1
X X X X
= µ(r1 l)µ(r2 l) 1,
x x
r1 <x/W U r2 <x/W U l 2W <m≤ W
(r1 ,r2 )=1 (l,r1 r2 )=1 r1 r2 l|m
(r1 r2 ,v)=1 r1 l,r2 l>U (m,v)=1
(`,v)=1
(5.7)
where d1 = r1 l, d2 = r2 l, l = (d1 , d2 ). (The inequality r1 < x/W U comes from
r1 r2 l|m, m ≤ x/W , r2 l > U ; r2 < x/W U is proven in the same way.) Now (5.7)
equals
X X X X
µ(r1 )µ(r2 ) 1, (5.8)
s< WxU r1 < WxU s r2 < WxU s
 
x/W x/W
max min(rU , <l≤ r
1 ,r2 ) 2r1 r2 s 1 r2 s
(s,v)=1 (r1 ,r2 )=1
(l,r1 r2 )=1,(µ(l))2 =1
(r1 r2 ,v)=1
(`,v)=1

where we have set s = m/(r1 r2 l). We begin by simplifying the innermost triple sum.
This we do in the following Lemma; it is not a trivial task, and carrying it out efficiently
actually takes an idea.

Lemma 5.1.1. Let z, y > 0. Then


X X X
µ(r1 )µ(r2 ) 1 (5.9)
r1 <y r2 <y
 
z/y
min min(r ,r , z <l≤ r zr
(r1 ,r2 )=1 1 2 ) 2r1 r2 1 2

(r1 r2 ,v)=1 (l,r1 r2 )=1,(µ(l))2 =1


(`,v)=1

equals
  
6z v X X µ(r1 )µ(r2 ) 1 r1 r2
1 − max , ,
π 2 σ(v) r <y r <y σ(r1 )σ(r2 ) 2 y y
1 2
(r1 ,r2 )=1
(r1 r2 ,v)=1 (5.10)
 
 2 2
√ Y 
3 1 1
+ O∗ 5.08 ζ y z· 1+ √ 1 − 3/2  .
2 p p
p|v

If v = 2, the error term in (5.10) can be replaced by


 2 2 !

 
∗ 3 1 1
O 1.27ζ y z· 1+ √ 1 − 3/2 . (5.11)
2 2 2

Proof. By Möbius inversion, (5.9) equals


X X X X
µ(r1 )µ(r2 ) µ(d1 )µ(d2 )
r1 <y r2 <y l≤ r zr d1 |r1 ,d2 |r2
1 2
(r1 ,r2 )=1 
z/y
 d d |l
1 2
l>min , z
(r1 r2 ,v)=1 min(r1 ,r2 ) 2r1 r2
(`,v)=1 (5.12)
X X
µ(d3 ) µ(m).
d3 |v m2 |l
d3 |l (m,r1 r2 v)=1

We can change the order of summation of ri and di by defining si = ri /di , and we can
also use the obvious fact that the number of integers in an interval (a, b] divisible by d
is (b − a)/d + O∗ (1). Thus (5.12) equals
X X
µ(d1 )µ(d2 ) µ(d1 s1 )µ(d2 s2 )
d1 ,d2 <y s1 <y/d1
(d1 ,d2 )=1 s2 <y/d2
(d1 d2 ,v)=1 (d1 s1 ,d2 s2 )=1
(s1 s2 ,v)=1
  
X X µ(m) z 1 s1 d1 s2 d2
µ(d3 ) 2
1 − max , ,
d1 d2 d3 m s1 d1 s2 d2 2 y y
d3 |v z
q
m≤
d2 2
1 s1 d 2 s 2 d 3
(m,d1 s1 d2 s2 v)=1
(5.13)

plus  
 
 
X X X X
O∗ 
 
 .
1 (5.14)
d1 ,d2 <y s1 <y/d1 d3 |v m≤ z
 q 
 d2 2
1 s1 d2 s2 d3

(d1 d2 ,v)=1 s2 <y/d2
m sq-free
(s1 s2 ,v)=1

If we complete the innermost sum in (5.13) by removing the condition
\[ m \le \sqrt{z/(d_1^2 s_1 d_2^2 s_2 d_3)}, \]
we obtain (reintroducing the variables $r_i = d_i s_i$)


X µ(r1 )µ(r2 )  
1 r1 r2

z· 1 − max , ,
r ,r <y
r1 r2 2 y y
1 2
(r1 ,r2 )=1
(r1 r2 ,v)=1 (5.15)
XX X µ(d1 )µ(d2 )µ(m)µ(d3 )
m
d1 d2 d3 m2
d1 |r1 d3 |v
d2 |r2 (m,r1 r2 v)=1

times z. Now (5.15) equals


X µ(r1 )µ(r2 )z  
1 r1 r2
 Y  
1 Y

1

1 − max , , 1− 1− 2
r ,r <y
r1 r2 2 y y p p
1 2 p|r1 r2 p-r1 r2
(r1 ,r2 )=1 or v p-v
(r1 r2 ,v)=1
  
6z v X µ(r1 )µ(r2 ) 1 r1 r2
= 1 − max , , ,
π 2 σ(v) r1 ,r2 <y
σ(r1 )σ(r2 ) 2 y y
(r1 ,r2 )=1
(r1 r2 ,v)=1

i.e., the main term in (5.10). It remains to estimate the terms used to complete the
sum; their total is, by definition, given exactly by (5.13) with the inequality
$m \le \sqrt{z/(d_1^2 s_1 d_2^2 s_2 d_3)}$ changed to $m > \sqrt{z/(d_1^2 s_1 d_2^2 s_2 d_3)}$. This is a total of size at most
1 X X X X 1 z
2
. (5.16)
2 d1 d2 d3 m s1 d1 s2 d2
d1 ,d2 <y s1 <y/d1 d3 |v m> z
q
d2 2
1 s1 d 2 s2 d 3
(d1 d2 ,v)=1 s2 <y/d2
m sq-free
(s1 s2 ,v)=1

Adding this to (5.14), we obtain, as our total error term,


X X X r z

f , (5.17)
d21 s1 d22 s2 d3
d1 ,d2 <y s1 <y/d1 d3 |v
(d1 d2 ,v)=1 s2 <y/d2
(s1 s2 ,v)=1

where
\[ f(x) := \sum_{\substack{m\le x \\ m\ \text{sq-free}}} 1 + \frac{1}{2} \sum_{\substack{m>x \\ m\ \text{sq-free}}} \frac{x^2}{m^2}. \]
It is easy to see that f(x)/x has a local maximum exactly when x is a square-free
(positive) integer. We can hence check that
\[ f(x) \le \frac{1}{2}\left(2 + 2\left(\frac{\zeta(2)}{\zeta(4)} - 1.25\right)\right) x = 1.26981\ldots x \]
for all x ≥ 0 by checking all integers smaller than a constant, using {m : m sq-free} ⊂
{m : 4 ∤ m} and 1.5 · (3/4) < 1.26981 to bound f from below for x larger than a
constant. Therefore, (5.17) is at most
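\[ 1.27 \sum_{\substack{d_1,d_2<y \\ (d_1d_2,v)=1}}\ \sum_{\substack{s_1<y/d_1 \\ s_2<y/d_2 \\ (s_1s_2,v)=1}}\ \sum_{d_3|v} \sqrt{\frac{z}{d_1^2 s_1 d_2^2 s_2 d_3}}
= 1.27\sqrt{z}\,\prod_{p|v}\left(1+\frac{1}{\sqrt{p}}\right) \cdot \Biggl(\sum_{\substack{d<y \\ (d,v)=1}}\ \sum_{\substack{s<y/d \\ (s,v)=1}} \frac{1}{d\sqrt{s}}\Biggr)^2. \]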
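The constant 1.26981... (rounded up to 1.27 above) can be reproduced numerically. The sketch below evaluates f(x)/x at integers x only, using the fact that the sum of 1/m² over square-free m equals ζ(2)/ζ(4) = 15/π²; the function names and the cutoff 5000 are illustrative.

```python
import math

def squarefree_flags(n):
    # flags[m] is True exactly when m is square-free (1 <= m <= n)
    flags = [True] * (n + 1)
    flags[0] = False
    k = 2
    while k * k <= n:
        for j in range(k * k, n + 1, k * k):
            flags[j] = False
        k += 1
    return flags

def f_over_x_at_integers(n_max):
    # f(x) = #{m <= x sq-free} + (x^2 / 2) * sum_{m > x sq-free} 1/m^2
    total_sq = 15.0 / math.pi ** 2      # zeta(2)/zeta(4)
    flags = squarefree_flags(n_max)
    count, head = 0, 0.0
    ratios = {}
    for x in range(1, n_max + 1):
        if flags[x]:
            count += 1
            head += 1.0 / (x * x)
        tail = total_sq - head
        ratios[x] = (count + 0.5 * x * x * tail) / x
    return ratios

ratios = f_over_x_at_integers(5000)
best = max(ratios, key=ratios.get)
print(best, ratios[best])
```

On this range the largest value of f(x)/x is attained at x = 2, where it equals (1/2)(2 + 2(ζ(2)/ζ(4) − 1.25)).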

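We can bound the double sum simply by
\[ \sum_{\substack{d<y \\ (d,v)=1}}\ \sum_{s<y/d} \frac{1}{\sqrt{s}\,d} \le 2 \sum_{\substack{d<y \\ (d,v)=1}} \frac{\sqrt{y/d}}{d} \le 2\sqrt{y}\cdot\zeta\left(\frac{3}{2}\right) \prod_{p|v}\left(1 - \frac{1}{p^{3/2}}\right). \]
Alternatively, if v = 2, we bound
\[ \sum_{\substack{s<y/d \\ (s,v)=1}} \frac{1}{\sqrt{s}} = \sum_{\substack{s<y/d \\ s\ \text{odd}}} \frac{1}{\sqrt{s}} \le 1 + \frac{1}{2}\int_1^{y/d} \frac{1}{\sqrt{s}}\,ds = \sqrt{y/d} \]
and thus
\[ \sum_{\substack{d<y \\ (d,v)=1}}\ \sum_{\substack{s<y/d \\ (s,v)=1}} \frac{1}{\sqrt{s}\,d} \le \sum_{\substack{d<y \\ (d,2)=1}} \frac{\sqrt{y/d}}{d} \le \sqrt{y}\left(1 - \frac{1}{2^{3/2}}\right)\zeta\left(\frac{3}{2}\right). \]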
Applying Lemma 5.1.1 with y = S/s and z = x/W s, where S = x/W U , we



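obtain that (5.8) equals
\[ \begin{aligned}
&\frac{6x}{\pi^2 W}\,\frac{v}{\sigma(v)} \sum_{\substack{s<S \\ (s,v)=1}} \frac{1}{s} \sum_{r_1<S/s}\ \sum_{\substack{r_2<S/s \\ (r_1,r_2)=1 \\ (r_1r_2,v)=1}} \frac{\mu(r_1)\mu(r_2)}{\sigma(r_1)\sigma(r_2)} \left(1 - \max\left(\frac{1}{2}, \frac{r_1}{S/s}, \frac{r_2}{S/s}\right)\right) \\
&\quad + O^*\Biggl(5.04\,\zeta\left(\frac{3}{2}\right)^3 S \sqrt{\frac{x}{W}} \prod_{p|v}\left(1+\frac{1}{\sqrt{p}}\right)\left(1-\frac{1}{p^{3/2}}\right)^3\Biggr),
\end{aligned} \tag{5.18} \]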
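with 5.04 replaced by 1.27 if v = 2. The main term in (5.18) can be written as
\[ \frac{6x}{\pi^2 W}\,\frac{v}{\sigma(v)} \sum_{\substack{s\le S \\ (s,v)=1}} \frac{1}{s} \int_{1/2}^{1} \sum_{r_1\le \frac{uS}{s}}\ \sum_{\substack{r_2\le \frac{uS}{s} \\ (r_1,r_2)=1 \\ (r_1r_2,v)=1}} \frac{\mu(r_1)\mu(r_2)}{\sigma(r_1)\sigma(r_2)}\,du. \tag{5.19} \]
As we can see, the use of an integral eliminates the unpleasant factor
\[ 1 - \max\left(\frac{1}{2}, \frac{r_1}{S/s}, \frac{r_2}{S/s}\right). \]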
From now on, we will focus on the cases v = 1 and v = 2 for simplicity. (Higher
values of v do not seem to be really profitable in the last analysis.)

5.1.2 Explicit bounds for a sum with µ


We must estimate the expression within parentheses in (5.19). It is not too hard to
show that it tends to 0; the first part of the proof of Lemma 5.1.2 will reduce this to the
fact that $\sum_n \mu(n)/n = 0$. Obtaining good bounds is a more delicate matter. For our
purposes, we will need the expression to converge to 0 at least as fast as $1/(\log)^2$, with
a good constant in front. For this task, the bound (2.21) on $\sum_{n\le x} \mu(n)/n$ is enough.
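Lemma 5.1.2. Let
\[ g_v(x) := \sum_{r_1\le x}\ \sum_{\substack{r_2\le x \\ (r_1,r_2)=1 \\ (r_1r_2,v)=1}} \frac{\mu(r_1)\mu(r_2)}{\sigma(r_1)\sigma(r_2)}, \]
where v = 1 or v = 2. Then
\[ |g_1(x)| \le \begin{cases}
1/x & \text{if } 33 \le x \le 10^6, \\
\frac{1}{x}(111.536 + 55.768 \log x) & \text{if } 10^6 \le x < 10^{10}, \\
\frac{0.0044325}{(\log x)^2} + \frac{0.1079}{\sqrt{x}} & \text{if } x \ge 10^{10},
\end{cases} \]
\[ |g_2(x)| \le \begin{cases}
2.1/x & \text{if } 33 \le x \le 10^6, \\
\frac{1}{x}(1634.34 + 817.168 \log x) & \text{if } 10^6 \le x < 10^{10}, \\
\frac{0.038128}{(\log x)^2} + \frac{0.2046}{\sqrt{x}} & \text{if } x \ge 10^{10}.
\end{cases} \]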

The proof involves what may be called a version of Rankin’s trick, using Dirichlet
series and the behavior of ζ(s) near s = 1.

Proof. We prove the statements for x ≤ 106 by a direct computation, using interval
arithmetic. (In fact, in that range, one gets 2.0895071/x instead of 2.1/x.) Assume
from now on that x > 106 .
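Clearly
\[ \begin{aligned}
g(x) &= \sum_{r_1\le x}\ \sum_{\substack{r_2\le x \\ (r_1r_2,v)=1}} \Biggl(\sum_{d|(r_1,r_2)} \mu(d)\Biggr) \frac{\mu(r_1)\mu(r_2)}{\sigma(r_1)\sigma(r_2)} \\
&= \sum_{\substack{d\le x \\ (d,v)=1}} \mu(d) \sum_{r_1\le x}\ \sum_{\substack{r_2\le x \\ d|(r_1,r_2) \\ (r_1r_2,v)=1}} \frac{\mu(r_1)\mu(r_2)}{\sigma(r_1)\sigma(r_2)} \\
&= \sum_{\substack{d\le x \\ (d,v)=1}} \frac{\mu(d)}{(\sigma(d))^2} \sum_{\substack{u_1\le x/d \\ (u_1,dv)=1}}\ \sum_{\substack{u_2\le x/d \\ (u_2,dv)=1}} \frac{\mu(u_1)\mu(u_2)}{\sigma(u_1)\sigma(u_2)} \\
&= \sum_{\substack{d\le x \\ (d,v)=1}} \frac{\mu(d)}{(\sigma(d))^2} \Biggl(\sum_{\substack{r\le x/d \\ (r,dv)=1}} \frac{\mu(r)}{\sigma(r)}\Biggr)^2.
\end{aligned} \tag{5.20} \]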
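The factorization (5.20) can be confirmed in exact rational arithmetic for small x. The sketch below compares the defining double sum of g_v with the last line of (5.20); the helper names and the range x ≤ 60 are illustrative, and this is not the interval-arithmetic computation used in the proof.

```python
from fractions import Fraction
from math import gcd

def mu_sigma_tables(n):
    # Moebius function and sum-of-divisors function up to n
    mu = [1] * (n + 1)
    mu[0] = 0
    sigma = [0] * (n + 1)
    sieve = [True] * (n + 1)
    for p in range(2, n + 1):
        if sieve[p]:
            for k in range(2 * p, n + 1, p):
                sieve[k] = False
            for k in range(p, n + 1, p):
                mu[k] = -mu[k]
            for k in range(p * p, n + 1, p * p):
                mu[k] = 0
    for d in range(1, n + 1):
        for k in range(d, n + 1, d):
            sigma[k] += d
    return mu, sigma

def g_direct(x, v, mu, sigma):
    # definition of g_v(x): coprime pairs (r1, r2), both coprime to v
    total = Fraction(0)
    for r1 in range(1, x + 1):
        if mu[r1] == 0 or gcd(r1, v) != 1:
            continue
        for r2 in range(1, x + 1):
            if mu[r2] == 0 or gcd(r2, v) != 1 or gcd(r1, r2) != 1:
                continue
            total += Fraction(mu[r1] * mu[r2], sigma[r1] * sigma[r2])
    return total

def g_factored(x, v, mu, sigma):
    # last line of (5.20)
    total = Fraction(0)
    for d in range(1, x + 1):
        if mu[d] == 0 or gcd(d, v) != 1:
            continue
        inner = sum((Fraction(mu[r], sigma[r]) for r in range(1, x // d + 1)
                     if mu[r] != 0 and gcd(r, d * v) == 1), Fraction(0))
        total += Fraction(mu[d], sigma[d] ** 2) * inner ** 2
    return total

mu, sigma = mu_sigma_tables(60)
for x in (1, 2, 10, 33, 60):
    print(x, float(g_direct(x, 1, mu, sigma)), float(g_factored(x, 1, mu, sigma)))
```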

Moreover,
 
X µ(r) X µ(r) X Y p
= −1
σ(r) r 0 0
p+1
r≤x/d r≤x/d d |r p|d
(r,dv)=1 (r,dv)=1
 
X Y −1 X µ(r)
=  
p+1 r
d0 ≤x/d 0p|d r≤x/d
µ(d0 )2 =1 (r,dv)=1
(d0 ,dv)=1 d0 |r
X 1 X µ(r)
=
d0 σ(d0 ) r
d0 ≤x/d r≤x/dd0
µ(d0 )2 =1 (r,dd0 v)=1
(d0 ,dv)=1

and
X µ(r) X 1 X µ(r)
= .
r d00 r
r≤x/dd0 d00 ≤x/dd0 r≤x/dd0 d00
(r,dd0 v)=1 d00 |(dd0 v)∞

Hence
 2
 
X (µ(d))2   X 1 X 1

0 00 

|g(x)| ≤ 
0 σ(d0 )
f (x/dd d ) , (5.21)
(σ(d))2 
 d0 ≤x/d d d00
d≤x
 02 d00 ≤x/dd0 

(d,v)=1 µ(d ) =1 d00 |(dd0 v)∞
(d0 ,dv)=1

where $f(t) = \bigl|\sum_{r\le t} \mu(r)/r\bigr|$.
We intend to bound the function f (t) by a linear combination of terms of the form
t−δ , δ ∈ [0, 1/2). Thus it makes sense now to estimate Fv (s1 , s2 , x), defined to be the
quantity
 
X (µ(d))2  X µ(d01 )2 X 1 
0 00 1−s1 
· (dd d )

1 1
(σ(d))2 d01 σ(d01 ) d001
 
d01 d00 0
 ∞

d 1 |(dd1 v)
(d,v)=1 (d01 ,dv)=1
 
 X µ(d02 )2 X 1 
0 00 1−s2 
· (dd d ) .

2 2
d02 σ(d02 ) d002

d02 d00 0
 ∞

2 |(dd2 v)
0
(d2 ,dv)=1

for s1 , s2 ∈ [1/2, 1]. This is equal to


X µ(d)2 Y 1
d s +s −1 2 −s
Q 1 −s2 )
d
1 2
p|d
(1 + p ) (1 − p 1 ) p|v (1−p−s1 )(1−p −s2 ) (1 − p

(d,v)=1
 
 X µ(d0 )2 Y 1 
· 0 s +1 0−1 0−s

 (d ) 1 (1 + p ) (1 − p 1 )
d0 0 0
p |d
(d0 ,dv)=1
 
 X µ(d0 )2 Y 1 
· 0 s +1 0−1 0−s
,
 (d ) 2 (1 + p ) (1 − p 2 ) 
d0 0 0
p |d
(d0 ,dv)=1

which in turn can easily be seen to equal


Y p−s1 p−s2
Y
1
1+
(1 − p−s1 + p−1 )(1 − p−s2 + p−1 ) (1 − p−s1 )(1 − p−s2 )
p-v p|v
−1 −s
(5.22)
p−1 p−s2
   
Y p p 1 Y
· 1+ · 1+
(1 + p−1 )(1 − p−s1 ) (1 + p−1 )(1 − p−s2 )
p-v p-v

Now, for any 0 < x ≤ y ≤ x1/2 < 1,

(1+x−y)(1−xy)(1−xy 2 )−(1+x)(1−y)(1−x3 ) = (x−y)(y 2 −x)(xy−x−1)x ≤ 0,

and so
xy (1 + x − y)(1 − xy)(1 − xy 2 ) (1 − x3 )
1+ = ≤ .
(1 + x)(1 − y) (1 + x)(1 − y)(1 − xy)(1 − xy 2 ) (1 − xy)(1 − xy 2 )
(5.23)
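The polynomial identity used to obtain (5.23), and the sign of its right side in the stated region, can be checked in exact arithmetic on a grid. This is only a sanity check under illustrative names and grid size; it is not a substitute for the algebra.

```python
from fractions import Fraction

def lhs(x, y):
    return (1 + x - y) * (1 - x * y) * (1 - x * y * y) - (1 + x) * (1 - y) * (1 - x ** 3)

def rhs(x, y):
    return (x - y) * (y * y - x) * (x * y - x - 1) * x

def check_grid(n=40):
    # returns (identity holds on the grid, sign condition holds for x <= y <= sqrt(x))
    ok_identity, ok_sign = True, True
    for i in range(1, n):
        for j in range(1, n):
            x, y = Fraction(i, n), Fraction(j, n)
            ok_identity = ok_identity and (lhs(x, y) == rhs(x, y))
            if x <= y and y * y <= x:
                ok_sign = ok_sign and (lhs(x, y) <= 0)
    return ok_identity, ok_sign

print(check_grid())
```

In the region 0 < x ≤ y ≤ x^{1/2} < 1 the factors x − y and y² − x are non-positive, xy − x − 1 is negative and x is positive, which is why the product is at most 0.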
For any x ≤ y1 , y2 < 1 with y12 ≤ x, y22 ≤ x,

y1 y2 (1 − x3 )2 (1 − x4 )
1+ ≤ . (5.24)
(1 − y1 + x)(1 − y2 + x) (1 − y1 y2 )(1 − y1 y22 )(1 − y12 y2 )

This can be checked as follows: multiplying by the denominators and changing vari-
ables to x, s = y1 + y2 and r = y1 y2 , we obtain an inequality where the left side,
quadratic on s with positive leading coefficient, must be less than or equal to the right
side, which is linear on s. The left side minus the right side can be maximal for given
x, r only when s is maximal or minimal. This happens when y1 = y2 or when either
$y_i = \sqrt{x}$ or $y_i = x$ for at least one of i = 1, 2. In each of these cases, we have reduced
(5.24) to an inequality in two variables that can be proven automatically¹ by a
quantifier-elimination program; the author has used QEPCAD [HB11] to do this.
Hence Fv (s1 , s2 , x) is at most
Y (1 − p−3 )2 (1 − p−4 ) Y 1
·
(1 − p−s1 −s2 )(1 − p−2s1 −s2 )(1 − p−s1 −2s2 ) (1 − p−s1 )(1 − p−s2 )
p-v p|v
−3
Y 1−p Y 1 − p−3
·
(1 + p−s1 −1 )(1 + p−2s1 −1 ) (1 + p−s2 −1 )(1
+ p−2s2 −1 )
p-v p-v

ζ(s1 + 1)ζ(s2 + 1)ζ(2s1 + 1)ζ(2s2 + 1)


= Cv,s1 ,s2 · ,
ζ(3)4 ζ(4)(ζ(s1 + s2 )ζ(2s1 + s2 )ζ(s1 + 2s2 ))−1
(5.25)
where Cv,s1 ,s2 equals 1 if v = 1, and

(1 − 2−s1 −2s2 )(1 + 2−s1 −1 )(1 + 2−2s1 −1 )(1 + 2−s2 −1 )(1 + 2−2s2 −1 )
(1 − 2−s1 +s2 )−1 (1 − 2−2s1 −s2 )−1 (1 − 2−s1 )(1 − 2−s2 )(1 − 2−3 )4 (1 − 2−4 )

if v = 2.
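For 1 ≤ t ≤ x, (2.21) and (2.24) imply
\[ f(t) \le \begin{cases}
\sqrt{\frac{2}{t}} & \text{if } x \le 10^{10}, \\
\sqrt{\frac{2}{t}} + \frac{0.03}{\log x}\left(\frac{x}{t}\right)^{\frac{\log\log x - \log\log 10^{10}}{\log x - \log 10^{10}}} & \text{if } x > 10^{10},
\end{cases} \tag{5.26} \]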
¹ In practice, the case $y_i = \sqrt{x}$ leads to a polynomial of high degree, and quantifier elimination increases
sharply in complexity as the degree increases; a stronger inequality of lower degree (with $(1 - 3x^3)$ instead
of $(1 - x^3)^2(1 - x^4)$) was given to QEPCAD to prove in this case.

where we are using the fact that log x is convex-down. Note that, again by convexity,
\[ \frac{\log\log x - \log\log 10^{10}}{\log x - \log 10^{10}} < (\log t)'|_{t=\log 10^{10}} = \frac{1}{\log 10^{10}} = 0.0434294\ldots \]
Obviously, $\sqrt{2/t}$ in (5.26) can be replaced by $(2/t)^{1/2-\epsilon}$ for any ε ≥ 0.
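By (5.21) and (5.26),
\[ |g_v(x)| \le \left(\frac{2}{x}\right)^{1-2\epsilon} F_v(1/2+\epsilon, 1/2+\epsilon, x) \]
for $x \le 10^{10}$. We set ε = 1/ log x and obtain from (5.25) that
\[ \begin{aligned}
F_v(1/2+\epsilon, 1/2+\epsilon, x) &\le C_{v,\frac12+\epsilon,\frac12+\epsilon}\,\frac{\zeta(1+2\epsilon)\,\zeta(3/2)^4\,\zeta(2)^2}{\zeta(3)^4\,\zeta(4)} \\
&\le 55.768 \cdot C_{v,\frac12+\epsilon,\frac12+\epsilon} \cdot \left(1 + \frac{\log x}{2}\right),
\end{aligned} \tag{5.27} \]
where we use the easy bound ζ(s) < 1 + 1/(s − 1) obtained by
\[ \sum_n n^{-s} < 1 + \int_1^\infty t^{-s}\,dt. \]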

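(For sharper bounds, see [BR02].) Now
\[ C_{2,\frac12+\epsilon,\frac12+\epsilon} \le \frac{(1 - 2^{-3/2-\epsilon})^2 (1 + 2^{-3/2})^2 (1 + 2^{-2})^2 (1 - 2^{-1-2\epsilon})}{(1 - 2^{-1/2})^2 (1 - 2^{-3})^4 (1 - 2^{-4})} \le 14.652983, \]
whereas $C_{1,\frac12+\epsilon,\frac12+\epsilon} = 1$. (We are assuming $x \ge 10^6$, and so $\epsilon \le 1/(\log 10^6)$.) Hence
\[ |g_v(x)| \le \begin{cases}
\frac{1}{x}(111.536 + 55.768 \log x) & \text{if } v = 1, \\
\frac{1}{x}(1634.34 + 817.168 \log x) & \text{if } v = 2,
\end{cases} \]
for $10^6 \le x < 10^{10}$.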
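The two numerical constants entering here, 55.768 in (5.27) and 14.652983 for v = 2, can be recomputed in floating point. The sketch below is an assumption-laden check rather than a rigorous computation: ζ is evaluated by a partial sum with an Euler-Maclaurin tail correction, and the bound on C_2 is evaluated at ε = 1/log 10^6 in the numerator, as the displayed expression suggests.

```python
import math

def zeta(s, n_terms=2000):
    # partial sum plus Euler-Maclaurin tail correction; real s > 1 only
    N = n_terms
    head = sum(k ** (-s) for k in range(1, N))
    return head + N ** (1 - s) / (s - 1) + 0.5 * N ** (-s) + s * N ** (-s - 1) / 12

def constant_5_27():
    # zeta(3/2)^4 zeta(2)^2 / (zeta(3)^4 zeta(4))
    return zeta(1.5) ** 4 * zeta(2) ** 2 / (zeta(3) ** 4 * zeta(4))

def c2_bound(x_min=1e6):
    # upper bound for C_{2, 1/2+eps, 1/2+eps} with eps = 1/log(x_min)
    eps = 1 / math.log(x_min)
    num = ((1 - 2 ** (-1.5 - eps)) ** 2 * (1 + 2 ** -1.5) ** 2
           * (1 + 2 ** -2) ** 2 * (1 - 2 ** (-1 - 2 * eps)))
    den = (1 - 2 ** -0.5) ** 2 * (1 - 2 ** -3) ** 4 * (1 - 2 ** -4)
    return num / den

print(constant_5_27(), c2_bound(), constant_5_27() * c2_bound())
```

The product of the two constants is approximately 817.168, the coefficient of log x in the bound for v = 2, and 111.536 and 1634.34 are twice 55.768 and 817.168 up to rounding.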


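For general x, we must use the second bound in (5.26). Define $c = 1/(\log 10^{10})$.
We see that, if $x > 10^{10}$,
\[ \begin{aligned}
|g_v(x)| &\le \frac{0.03^2}{(\log x)^2}\,F_1(1-c, 1-c) \cdot C_{v,1-c,1-c} \\
&\quad + 2 \cdot \frac{\sqrt{2}}{\sqrt{x}}\,\frac{0.03}{\log x}\,F_1(1-c, 1/2) \cdot C_{v,1-c,1/2} \\
&\quad + \frac{1}{x}(111.536 + 55.768 \log x) \cdot C_{v,\frac12+\epsilon,\frac12+\epsilon}.
\end{aligned} \]
For v = 1, this gives
\[ \begin{aligned}
|g_1(x)| &\le \frac{0.0044325}{(\log x)^2} + \frac{2.1626}{\sqrt{x}\,\log x} + \frac{1}{x}(111.536 + 55.768 \log x) \\
&\le \frac{0.0044325}{(\log x)^2} + \frac{0.1079}{\sqrt{x}};
\end{aligned} \]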

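for v = 2, we obtain
\[ \begin{aligned}
|g_2(x)| &\le \frac{0.038128}{(\log x)^2} + \frac{25.607}{\sqrt{x}\,\log x} + \frac{1}{x}(1634.34 + 817.168 \log x) \\
&\le \frac{0.038128}{(\log x)^2} + \frac{0.2046}{\sqrt{x}}.
\end{aligned} \]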

5.1.3 Estimating the triple sum


We will now be able to bound the triple sum in (5.19), viz.,
\[ \sum_{\substack{s\le S \\ (s,v)=1}} \frac{1}{s} \int_{1/2}^{1} g_v(uS/s)\,du, \tag{5.28} \]
where $g_v$ is as in Lemma 5.1.2.
As we will soon see, Lemma 5.1.2 implies that (5.28) is bounded by a constant (essentially
because the integral $\int_0^{1/2} 1/(t(\log t)^2)\,dt$ converges). We must give as good a constant as
we can, since it will affect the largest term in the final result.
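Clearly $g_v(R) = g_v(\lfloor R\rfloor)$. The contribution of each $g_v(m)$, 1 ≤ m ≤ S, to (5.28)
is exactly $g_v(m)$ times
\[ \begin{aligned}
&\sum_{\substack{\frac{S}{m+1} < s \le \frac{S}{m} \\ (s,v)=1}} \frac{1}{s}\int_{ms/S}^{1} 1\,du
+ \sum_{\substack{\frac{S}{2m} < s \le \frac{S}{m+1} \\ (s,v)=1}} \frac{1}{s}\int_{ms/S}^{(m+1)s/S} 1\,du
+ \sum_{\substack{\frac{S}{2(m+1)} < s \le \frac{S}{2m} \\ (s,v)=1}} \frac{1}{s}\int_{1/2}^{(m+1)s/S} du \\
&= \sum_{\substack{\frac{S}{m+1} < s \le \frac{S}{m} \\ (s,v)=1}} \left(\frac{1}{s} - \frac{m}{S}\right)
+ \sum_{\substack{\frac{S}{2m} < s \le \frac{S}{m+1} \\ (s,v)=1}} \frac{1}{S}
+ \sum_{\substack{\frac{S}{2(m+1)} < s \le \frac{S}{2m} \\ (s,v)=1}} \left(\frac{m+1}{S} - \frac{1}{2s}\right).
\end{aligned} \tag{5.29} \]
Write f(t) = 1/S for S/2m < t ≤ S/(m+1), f(t) = 0 for t > S/m or t < S/2(m+1),
f(t) = 1/t − m/S for S/(m + 1) < t ≤ S/m and f(t) = (m + 1)/S − 1/2t for
S/2(m + 1) < t ≤ S/2m; then (5.29) equals $\sum_{n:(n,v)=1} f(n)$. By Euler-Maclaurin
(second order),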
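\[ \begin{aligned}
\sum_n f(n) &= \int_{-\infty}^{\infty} \left(f(x) - \frac{1}{2} B_2(\{x\}) f''(x)\right) dx = \int_{-\infty}^{\infty} \left(f(x) + O^*\left(\frac{1}{12}|f''(x)|\right)\right) dx \\
&= \int_{-\infty}^{\infty} f(x)\,dx + \frac{1}{6}\cdot O^*\left(\left|f'\left(\frac{3}{2m}\right)\right| + \left|f'\left(\frac{s}{m+1}\right)\right|\right) \\
&= \frac{1}{2}\log\left(1 + \frac{1}{m}\right) + \frac{1}{6}\cdot O^*\left(\left(\frac{2m}{s}\right)^2 + \left(\frac{m+1}{s}\right)^2\right).
\end{aligned} \tag{5.30} \]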

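Similarly,
\[ \begin{aligned}
\sum_{n\ \text{odd}} f(n) &= \int_{-\infty}^{\infty} \left(f(2x+1) - \frac{1}{2} B_2(\{x\}) \frac{d^2 f(2x+1)}{dx^2}\right) dx \\
&= \frac{1}{2}\int_{-\infty}^{\infty} f(x)\,dx - 2\int_{-\infty}^{\infty} \frac{1}{2} B_2\left(\left\{\frac{x-1}{2}\right\}\right) f''(x)\,dx \\
&= \frac{1}{2}\int_{-\infty}^{\infty} f(x)\,dx + \frac{1}{6}\int_{-\infty}^{\infty} O^*(|f''(x)|)\,dx \\
&= \frac{1}{4}\log\left(1 + \frac{1}{m}\right) + \frac{1}{3}\cdot O^*\left(\left(\frac{2m}{s}\right)^2 + \left(\frac{m+1}{s}\right)^2\right).
\end{aligned} \]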
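The main terms (1/2) log(1 + 1/m) and (1/4) log(1 + 1/m) can be observed numerically by summing the piecewise function f over all integers and over odd integers. This is a rough sketch: the sample values m = 3, S = 1000 are arbitrary, and the tolerance (m+1)²/S² used for comparison is an assumption made here (a crude consequence of Euler-Maclaurin and the total variation of f′), not the error term of the text.

```python
import math

def f(t, m, S):
    # the piecewise function f defined before (5.30)
    if t <= S / (2 * (m + 1)) or t > S / m:
        return 0.0
    if t <= S / (2 * m):
        return (m + 1) / S - 1 / (2 * t)
    if t <= S / (m + 1):
        return 1 / S
    return 1 / t - m / S

def lattice_sum(m, S, odd_only=False):
    # sum of f(n) over positive integers n (or over odd n only)
    step = 2 if odd_only else 1
    n_max = int(S // m) + 1
    return sum(f(n, m, S) for n in range(1, n_max + 1, step))

def main_term(m, odd_only=False):
    return (0.25 if odd_only else 0.5) * math.log(1 + 1 / m)

m, S = 3, 1000.0
print(lattice_sum(m, S), main_term(m))
print(lattice_sum(m, S, odd_only=True), main_term(m, odd_only=True))
```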

We use these expressions for m ≤ C0 , where C0 ≥ 33 is a constant to be computed


later; they will give us the main term. For m > C0 , we use the bounds on |g(m)| that
Lemma 5.1.2 gives us.
(Starting now and for the rest of the paper, we will focus on the cases v = 1,
v = 2 when giving explicit computational estimates. All of our procedures would
allow higher values of v as well, but, as will become clear much later, the gains from
higher values of v are offset by losses and complications elsewhere.)
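Let us estimate (5.28). Let
\[ c_{v,0} = \begin{cases} 1/6 & \text{if } v = 1, \\ 1/3 & \text{if } v = 2, \end{cases} \qquad
c_{v,1} = \begin{cases} 1 & \text{if } v = 1, \\ 2.5 & \text{if } v = 2, \end{cases} \]
\[ c_{v,2} = \begin{cases} 55.768\ldots & \text{if } v = 1, \\ 817.168\ldots & \text{if } v = 2, \end{cases} \qquad
c_{v,3} = \begin{cases} 111.536\ldots & \text{if } v = 1, \\ 1634.34\ldots & \text{if } v = 2, \end{cases} \]
\[ c_{v,4} = \begin{cases} 0.0044325\ldots & \text{if } v = 1, \\ 0.038128\ldots & \text{if } v = 2, \end{cases} \qquad
c_{v,5} = \begin{cases} 0.1079\ldots & \text{if } v = 1, \\ 0.2046\ldots & \text{if } v = 2. \end{cases} \]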
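Then (5.28) equals
\[ \begin{aligned}
&\sum_{m\le C_0} g_v(m) \cdot \left(\frac{\phi(v)}{2v}\log\left(1 + \frac{1}{m}\right) + O^*\left(c_{v,0}\,\frac{5m^2 + 2m + 1}{S^2}\right)\right) \\
&+ \sum_{S/10^6 \le s < S/C_0} \frac{1}{s}\int_{1/2}^{1} O^*\left(\frac{c_{v,1}}{uS/s}\right) du \\
&+ \sum_{S/10^{10} \le s < S/10^6} \frac{1}{s}\int_{1/2}^{1} O^*\left(\frac{c_{v,2}\log(uS/s) + c_{v,3}}{uS/s}\right) du \\
&+ \sum_{s < S/10^{10}} \frac{1}{s}\int_{1/2}^{1} O^*\left(\frac{c_{v,4}}{(\log uS/s)^2} + \frac{c_{v,5}}{\sqrt{uS/s}}\right) du,
\end{aligned} \]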

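which is
\[ \begin{aligned}
&\sum_{m\le C_0} g_v(m) \cdot \frac{\phi(v)}{2v}\log\left(1 + \frac{1}{m}\right) + \sum_{m\le C_0} |g(m)| \cdot O^*\left(c_{v,0}\,\frac{5m^2 + 2m + 1}{S^2}\right) \\
&+ O^*\left(c_{v,1}\frac{\log 2}{C_0} + \frac{\log 2}{10^6}\left(c_{v,3} + c_{v,2}(1 + \log 10^6)\right) + \frac{2 - \sqrt{2}}{10^{10/2}}\,c_{v,5}\right) \\
&+ O^*\Biggl(\sum_{s < S/10^{10}} \frac{c_{v,4}/2}{s(\log S/2s)^2}\Biggr)
\end{aligned} \]
for $S \ge (C_0 + 1)$. Note that $\sum_{s<S/10^{10}} \frac{1}{s(\log S/2s)^2} = \int_0^{2/10^{10}} \frac{1}{t(\log t)^2}\,dt$.
Now
\[ \frac{c_{v,4}}{2}\int_0^{2/10^{10}} \frac{1}{t(\log t)^2}\,dt = \frac{c_{v,4}/2}{\log(10^{10}/2)} = \begin{cases} 0.00009923\ldots & \text{if } v = 1, \\ 0.000853636\ldots & \text{if } v = 2, \end{cases} \]
and
\[ \frac{\log 2}{10^6}\left(c_{v,3} + c_{v,2}(1 + \log 10^6)\right) + \frac{2 - \sqrt{2}}{10^5}\,c_{v,5} = \begin{cases} 0.0006506\ldots & \text{if } v = 1, \\ 0.009525\ldots & \text{if } v = 2. \end{cases} \]
For $C_0 = 10000$,
\[ \frac{\phi(v)}{v}\,\frac{1}{2}\sum_{m\le C_0} g_v(m) \cdot \log\left(1 + \frac{1}{m}\right) = \begin{cases} 0.362482\ldots & \text{if } v = 1, \\ 0.360576\ldots & \text{if } v = 2, \end{cases} \]
\[ c_{v,0}\sum_{m\le C_0} |g_v(m)|(5m^2 + 2m + 1) \le \begin{cases} 6204066.5\ldots & \text{if } v = 1, \\ 15911340.1\ldots & \text{if } v = 2, \end{cases} \]
and
\[ c_{v,1} \cdot (\log 2)/C_0 = \begin{cases} 0.00006931\ldots & \text{if } v = 1, \\ 0.00017328\ldots & \text{if } v = 2. \end{cases} \]
Thus, for S ≥ 100000,
\[ \sum_{\substack{s\le S \\ (s,v)=1}} \frac{1}{s}\int_{1/2}^{1} g_v(uS/s)\,du \le \begin{cases} 0.36393 & \text{if } v = 1, \\ 0.37273 & \text{if } v = 2. \end{cases} \tag{5.31} \]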
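The arithmetic leading to (5.31) can be retraced in a few lines. In the sketch below, the values 0.362482, 0.360576, 6204066.5 and 15911340.1 are taken from the text as given (they are not recomputed), the truncated constants c_{v,i} are used at the precision shown, and the dictionary layout and function name are illustrative.

```python
import math

CONSTS = {
    1: dict(c1=1.0, c2=55.768, c3=111.536, c4=0.0044325, c5=0.1079,
            main=0.362482, second=6204066.5),
    2: dict(c1=2.5, c2=817.168, c3=1634.34, c4=0.038128, c5=0.2046,
            main=0.360576, second=15911340.1),
}

def error_budget(v, S=1e5, C0=10000):
    c = CONSTS[v]
    # (c_{v,4}/2) * int_0^{2/10^10} dt / (t (log t)^2)
    tail_log = (c["c4"] / 2) / math.log(1e10 / 2)
    # contribution of the ranges S/10^10 <= s < S/10^6 and s < S/10^10 (c_{v,5} part)
    mid = (math.log(2) / 1e6) * (c["c3"] + c["c2"] * (1 + math.log(1e6))) \
        + (2 - math.sqrt(2)) / 1e5 * c["c5"]
    # contribution of the range S/10^6 <= s < S/C0
    small = c["c1"] * math.log(2) / C0
    # second-order Euler-Maclaurin error, at S = 10^5 by default
    second = c["second"] / S ** 2
    total = c["main"] + second + small + mid + tail_log
    return dict(tail_log=tail_log, mid=mid, small=small, second=second, total=total)

for v in (1, 2):
    print(v, error_budget(v))
```

With these inputs the totals come out just under 0.36393 (v = 1) and 0.37273 (v = 2); since the error term in S decreases as S grows, the value at S = 100000 is the relevant one.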

For S < 100000, we proceed as above, but using the exact expression (5.29) instead
of (5.30). Note (5.29) is of the form $f_{s,m,1}(S) + f_{s,m,2}(S)/S$, where both $f_{s,m,1}(S)$
and $f_{s,m,2}(S)$ depend only on $\lfloor S\rfloor$ (and on s and m). Summing over m ≤ S, we obtain
a bound of the form
\[ \sum_{\substack{s\le S \\ (s,v)=1}} \frac{1}{s}\int_{1/2}^{1} g_v(uS/s)\,du \le G_v(S) \]

with
\[ G_v(S) = K_{v,1}(\lfloor S\rfloor) + K_{v,2}(\lfloor S\rfloor)/S, \]
where $K_{v,1}(n)$ and $K_{v,2}(n)$ can be computed explicitly for each integer n. (For example,
$G_v(S) = 1 - 1/S$ for 1 ≤ S < 2 and $G_v(S) = 0$ for S < 1.)
It is easy to check numerically that this implies that (5.31) holds not just for S ≥
100000 but also for 40 ≤ S < 100000 (if v = 1) or 16 ≤ S < 100000 (if v =
2). Using the fact that $G_v(S)$ is non-negative, we can compare $\int_1^T G_v(S)\,dS/S$ with
log(T + 1/N) for each $T \in [2, 40] \cap \frac{1}{N}\mathbb{Z}$ (N a large integer) to show, again numerically,
that
\[ \int_1^T G_v(S)\,\frac{dS}{S} \le \begin{cases} 0.3698 \log T & \text{if } v = 1, \\ 0.37273 \log T & \text{if } v = 2. \end{cases} \tag{5.32} \]
(We use N = 100000 for v = 1; already N = 10 gives us the answer above for
v = 2. Indeed, computations suggest the better bound 0.358 instead of 0.37273; we
are committed to using 0.37273 because of (5.31).)
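Multiplying by $6v/\pi^2\sigma(v)$, we conclude that
\[
S_1(U,W)=\frac{x}{W}\cdot H_1\left(\frac{x}{WU}\right)+O^*\left(5.08\,\zeta(3/2)^3\,\frac{x^{3/2}}{W^{3/2}U}\right)
\tag{5.33}
\]
if $v=1$,
\[
S_1(U,W)=\frac{x}{W}\cdot H_2\left(\frac{x}{WU}\right)+O^*\left(1.27\,\zeta(3/2)^3\,\frac{x^{3/2}}{W^{3/2}U}\right)
\tag{5.34}
\]
if $v=2$, where
\[
H_1(S)=\begin{cases}\frac{6}{\pi^2}G_1(S) & \text{if } 1\le S<40,\\ 0.22125 & \text{if } S\ge 40,\end{cases}
\qquad
H_2(S)=\begin{cases}\frac{4}{\pi^2}G_2(S) & \text{if } 1\le S<16,\\ 0.15107 & \text{if } S\ge 16.\end{cases}
\tag{5.35}
\]
Hence (by (5.32))
\[
\int_1^T H_v(S)\frac{dS}{S}\le
\begin{cases}0.22482\log T & \text{if } v=1,\\ 0.15107\log T & \text{if } v=2;\end{cases}
\tag{5.36}
\]
moreover,
\[
H_1(S)\le\frac{3}{\pi^2},\qquad H_2(S)\le\frac{2}{\pi^2}
\tag{5.37}
\]
for all $S$.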
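The constants in (5.35) and (5.36) come from the ones in (5.31) and (5.32) through the factor $6v/\pi^2\sigma(v)$. A minimal sketch of that rescaling, with hypothetical helper names `sigma` and `scale`, is:

```python
import math


def sigma(v):
    """Sum of the divisors of v (only v = 1 and v = 2 are needed here)."""
    return sum(d for d in range(1, v + 1) if v % d == 0)


def scale(v):
    """The factor 6v / (pi^2 sigma(v)) applied to (5.31) and (5.32)."""
    return 6 * v / (math.pi ** 2 * sigma(v))


if __name__ == "__main__":
    print(scale(1) * 0.36393, "<= 0.22125")   # constant in H_1 for S >= 40
    print(scale(2) * 0.37273, "<= 0.15107")   # constant in H_2 for S >= 16
    print(scale(1) * 0.3698, "<= 0.22482")    # constant in (5.36), v = 1
```

For $v=1$ the factor is $6/\pi^2$ and for $v=2$ it is $4/\pi^2$, which are the coefficients of $G_1$ and $G_2$ in (5.35).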

***

Note. There is another way to obtain cancellation on $\mu$, applicable when $(x/W)>Uq$ (as is unfortunately never the case in our main application). For this alternative to be taken, one must either apply Cauchy-Schwarz on $n$ rather than $m$ (resulting in exponential sums over $m$) or lump together all $m$ near each other and in the same congruence class modulo $q$ before applying Cauchy-Schwarz on $m$ (one can indeed do this if $\delta$ is small). We could then write
\[
\sum_{\substack{m\sim W\\ m\equiv r \bmod q}}\ \sum_{\substack{d|m\\ d>U}}\mu(d)
=-\sum_{\substack{m\sim W\\ m\equiv r \bmod q}}\ \sum_{\substack{d|m\\ d\le U}}\mu(d)
=-\sum_{d\le U}\mu(d)\,(W/qd+O(1))
\]
and obtain cancellation on $d$. If $Uq\ge(x/W)$, however, the error term dominates.

5.2 The sum S2 : the large sieve, primes and tails


We must now bound
\[
S_2(U',W',W)=\sum_{\substack{U'<m\le x/W\\ (m,v)=1}}\left|\sum_{W'<p\le W}(\log p)\,e(\alpha mp)\right|^2
\tag{5.38}
\]
for $U'=\max(U,x/2W)$, $W'=\max(V,W/2)$. (The condition $(p,v)=1$ will be fulfilled automatically by the assumption $V>v$.)
From a modern perspective, this is clearly a case for a large sieve. It is also clear that
we ought to try to apply a large sieve for sequences of prime support. What is subtler
here is how to do things well for very large q (i.e., x/q small). This is in some sense a
dual problem to that of q small, but it poses additional complications; for example, it is
not obvious how to take advantage of prime support for very large q.
As in type I, we avoid this entire issue by forbidding $q$ large and then taking advantage of the error term $\delta/x$ in the approximation $\alpha=\frac{a}{q}+\frac{\delta}{x}$. This is one of the main innovations here. Note this alternative method will allow us to take advantage of prime support.
A key situation to study is that of frequencies αi clustering around given rationals
a/q while nevertheless keeping at a certain small distance from each other.
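Lemma 5.2.1. Let $q\ge 1$. Let $\alpha_1,\alpha_2,\dots,\alpha_k\in\mathbb{R}/\mathbb{Z}$ be of the form $\alpha_i=a_i/q+\upsilon_i$, $0\le a_i<q$, where the elements $\upsilon_i\in\mathbb{R}$ all lie in an interval of length $\upsilon>0$, and where $a_i=a_j$ implies $|\upsilon_i-\upsilon_j|>\nu>0$. Assume $\nu+\upsilon\le 1/q$. Then, for any $W,W'\ge 1$, $W'\ge W/2$,
\[
\sum_{i=1}^{k}\left|\sum_{W'<p\le W}(\log p)\,e(\alpha_i p)\right|^2
\le\min\left(1,\frac{2q}{\phi(q)}\frac{1}{\log\left((q(\nu+\upsilon))^{-1}\right)}\right)
\cdot\left(W-W'+\nu^{-1}\right)\sum_{W'<p\le W}(\log p)^2.
\tag{5.39}
\]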

Proof. For any distinct $i$, $j$, the angles $\alpha_i$, $\alpha_j$ are separated by at least $\nu$ (if $a_i=a_j$) or at least $1/q-|\upsilon_i-\upsilon_j|\ge 1/q-\upsilon\ge\nu$ (if $a_i\ne a_j$). Hence we can apply the large sieve (in the optimal $N+\delta^{-1}-1$ form due to Selberg [Sel91] and Montgomery-Vaughan [MV74]) and obtain the bound in (5.39) with 1 instead of $\min(1,\dots)$ immediately.

We can also apply Montgomery's inequality ([Mon68], [Hux72]; see the expositions in [Mon71, pp. 27–29] and [IK04, §7.4]). This gives us that the left side of (5.39) is at most
\[
\left(\sum_{\substack{r\le R\\ (r,q)=1}}\frac{(\mu(r))^2}{\phi(r)}\right)^{-1}
\sum_{\substack{r\le R\\ (r,q)=1}}\ \sum_{\substack{a' \bmod r\\ (a',r)=1}}\ \sum_{i=1}^{k}
\left|\sum_{W'<p\le W}(\log p)\,e((\alpha_i+a'/r)p)\right|^2.
\tag{5.40}
\]
If we add all possible fractions of the form $a'/r$, $r\le R$, $(r,q)=1$, to the fractions $a_i/q$, we obtain fractions that are separated by at least $1/qR^2$. If $\nu+\upsilon\ge 1/qR^2$, then the resulting angles $\alpha_i+a'/r$ are still separated by at least $\nu$. Thus we can apply the large sieve to (5.40); setting $R=1/\sqrt{(\nu+\upsilon)q}$, we see that we gain a factor of
\[
\sum_{\substack{r\le R\\ (r,q)=1}}\frac{(\mu(r))^2}{\phi(r)}
\ge\frac{\phi(q)}{q}\sum_{r\le R}\frac{(\mu(r))^2}{\phi(r)}
\ge\frac{\phi(q)}{q}\sum_{d\le R}\frac{1}{d}
\ge\frac{\phi(q)}{2q}\log\left((q(\nu+\upsilon))^{-1}\right),
\tag{5.41}
\]
since $\sum_{d\le R}1/d\ge\log(R)$ for all $R\ge 1$ (integer or not).
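To see the shape of Lemma 5.2.1 on a small instance, one can evaluate both sides of (5.39) directly. The sketch below is illustrative only: the function names, the choice of angles $a/q+\upsilon_i$ for every residue $a$, and the tiny range of primes are assumptions made for the example, and a numerical comparison of this kind is of course no substitute for the proof.

```python
import cmath
import math


def primes_in(lo, hi):
    """Primes p with lo < p <= hi, by trial division (fine for small ranges)."""
    return [n for n in range(max(lo + 1, 2), hi + 1)
            if all(n % d for d in range(2, math.isqrt(n) + 1))]


def e(t):
    """The additive character e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * math.pi * t)


def lemma_sides(q, offsets, nu, w_lo, w_hi):
    """Return (lhs, rhs) of (5.39) for the angles a/q + offset, 0 <= a < q.

    The caller must make sure that distinct offsets differ by more than nu
    and that nu + (max offset - min offset) <= 1/q.
    """
    ps = primes_in(w_lo, w_hi)
    angles = [a / q + u for a in range(q) for u in offsets]
    lhs = sum(abs(sum(math.log(p) * e(alpha * p) for p in ps)) ** 2
              for alpha in angles)
    upsilon = max(offsets) - min(offsets)
    phi_q = sum(1 for a in range(1, q + 1) if math.gcd(a, q) == 1)
    t = q * (nu + upsilon)
    gain = 1.0 if t >= 1 else min(1.0, (2 * q / phi_q) / math.log(1 / t))
    rhs = gain * (w_hi - w_lo + 1 / nu) * sum(math.log(p) ** 2 for p in ps)
    return lhs, rhs


if __name__ == "__main__":
    print(lemma_sides(5, [0.0, 0.02, 0.04], 0.015, 100, 200))
    print(lemma_sides(1, [0.0, 0.01, 0.02, 0.03, 0.04, 0.05], 0.009, 100, 200))
```

In the first call the factor $\min(1,\dots)$ equals 1, so only the plain large sieve is being tested; in the second call ($q=1$, clustered angles) the factor is below 1.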

Let us first give a bound on sums of the type of S2 (U, V, W ) using prime sup-
port but not the error terms (or Lemma 5.2.1). This is something that can be done
very well using tools available in the literature. (Not all of these tools seem to be
known as widely as they should be.) Bounds (5.42) and (5.44) are completely standard
large-sieve bounds. To obtain the gain of a factor of log in (5.43), we use a lemma
of Montgomery’s, for whose modern proof (containing an improvement by Huxley)
we refer to the standard source [IK04, Lemma 7.15]. The purpose of Montgomery’s
lemma is precisely to gain a factor of log in applications of the large sieve to sequences
supported on the primes. To use the lemma efficiently, we apply Montgomery and
Vaughan’s large sieve with weights [MV73, (1.6)], rather than more common forms of
the large sieve. (The idea – used in [MV73] to prove an improved version of the Brun-
Titchmarsh inequality – is that Farey fractions (rationals with bounded denominator)
are not equidistributed; this fact can be exploited if a large sieve with weights is used.)

Lemma 5.2.2. Let $W\ge 1$, $W'\ge W/2$. Let $\alpha=a/q+O^*(1/qQ)$, $q\le Q$. Then
\[
\sum_{A_0<m\le A_1}\left|\sum_{W'<p\le W}(\log p)\,e(\alpha mp)\right|^2
\le\left\lceil\frac{A_1-A_0}{\min(q,\lceil Q/2\rceil)}\right\rceil\cdot(W-W'+2q)\sum_{W'<p\le W}(\log p)^2.
\tag{5.42}
\]
If $q<W/2$ and $Q\ge 3.5W$, the following bound also holds:
\[
\sum_{A_0<m\le A_1}\left|\sum_{W'<p\le W}(\log p)\,e(\alpha mp)\right|^2
\le\left\lceil\frac{A_1-A_0}{q}\right\rceil\cdot\frac{q}{\phi(q)}\cdot\frac{W}{\log(W/2q)}\cdot\sum_{W'<p\le W}(\log p)^2.
\tag{5.43}
\]
If $A_1-A_0\le\varrho q$ and $q\le\rho Q$, $\varrho,\rho\in[0,1]$, the following bound also holds:
\[
\sum_{A_0<m\le A_1}\left|\sum_{W'<p\le W}(\log p)\,e(\alpha mp)\right|^2
\le\left(W-W'+q/(1-\varrho\rho)\right)\sum_{W'<p\le W}(\log p)^2.
\tag{5.44}
\]

Proof. Let $k=\min(q,\lceil Q/2\rceil)\ge\lceil q/2\rceil$. We split $(A_0,A_1]$ into $\lceil(A_1-A_0)/k\rceil$ blocks of at most $k$ consecutive integers $m_0+1,m_0+2,\dots$. For $m$, $m'$ in such a block, $\alpha m$ and $\alpha m'$ are separated by a distance of at least
\[
|\{(a/q)(m-m')\}|-O^*(k/qQ)=1/q-O^*(1/2q)\ge 1/2q.
\]
By the large sieve
\[
\sum_{a=1}^{q}\left|\sum_{W'<p\le W}(\log p)\,e(\alpha(m_0+a)p)\right|^2\le((W-W')+2q)\sum_{W'<p\le W}(\log p)^2.
\tag{5.45}
\]
We obtain (5.42) by summing over all $\lceil(A_1-A_0)/k\rceil$ blocks.

If $A_1-A_0\le\varrho q$ and $q\le\rho Q$, $\varrho,\rho\in[0,1]$, we obtain (5.44) simply by applying the large sieve without splitting the interval $A_0<m\le A_1$.

Let us now prove (5.43). We will use Montgomery's inequality, followed by Montgomery and Vaughan's large sieve with weights. An angle $a/q+a_1'/r_1$ is separated from other angles $a'/q+a_2'/r_2$ ($r_1,r_2\le R$, $(a_i',r_i)=1$) by at least $1/qr_1R$, rather than just $1/qR^2$. We will choose $R$ so that $qR^2<Q$; this implies $1/Q<1/qR^2\le 1/qr_1R$.

By a lemma of Montgomery's [IK04, Lemma 7.15], applied (for each $1\le a\le q$) to $S(\alpha)=\sum_n a_n e(\alpha n)$ with $a_n=\log(n)\,e(\alpha(m_0+a)n)$ if $n$ is prime and $a_n=0$ otherwise,
\[
\frac{1}{\phi(r)}\left|\sum_{W'<p\le W}(\log p)\,e(\alpha(m_0+a)p)\right|^2
\le\sum_{\substack{a' \bmod r\\ (a',r)=1}}\left|\sum_{W'<p\le W}(\log p)\,e\left(\left(\alpha(m_0+a)+\frac{a'}{r}\right)p\right)\right|^2
\tag{5.46}
\]

for each square-free $r\le W'$. We multiply both sides of (5.46) by
\[
\left(\frac{W}{2}+\frac{3}{2}\left(\frac{1}{qrR}-\frac{1}{Q}\right)^{-1}\right)^{-1}
\]
and sum over all $a=0,1,\dots,q-1$ and all square-free $r\le R$ coprime to $q$; we will later make sure that $R\le W'$. We obtain that
\[
\sum_{\substack{r\le R\\ (r,q)=1}}\left(\frac{W}{2}+\frac{3}{2}\left(\frac{1}{qrR}-\frac{1}{Q}\right)^{-1}\right)^{-1}\frac{\mu(r)^2}{\phi(r)}
\cdot\sum_{a=1}^{q}\left|\sum_{W'<p\le W}(\log p)\,e(\alpha(m_0+a)p)\right|^2
\tag{5.47}
\]
is at most
\[
\sum_{\substack{r\le R\\ (r,q)=1\\ r \text{ sq-free}}}\left(\frac{W}{2}+\frac{3}{2}\left(\frac{1}{qrR}-\frac{1}{Q}\right)^{-1}\right)^{-1}
\sum_{a=1}^{q}\ \sum_{\substack{a' \bmod r\\ (a',r)=1}}\left|\sum_{W'<p\le W}(\log p)\,e\left(\left(\alpha(m_0+a)+\frac{a'}{r}\right)p\right)\right|^2.
\tag{5.48}
\]
We now apply the large sieve with weights [MV73, (1.6)], recalling that each angle $\alpha(m_0+a)+a'/r$ is separated from the others by at least $1/qrR-1/Q$; we obtain that (5.48) is at most $\sum_{W'<p\le W}(\log p)^2$. It remains to estimate the sum in the first line of (5.47). (We are following here a procedure analogous to that used in [MV73] to prove the Brun-Titchmarsh theorem.)
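Assume first that $q\le W/13.5$. Set
\[
R=\left(\sigma\frac{W}{q}\right)^{1/2},
\tag{5.49}
\]
where $\sigma=1/2e^{2\cdot 0.25068}=0.30285\ldots$. It is clear that $qR^2<Q$, $q<W'$ and $R\ge 2$. Moreover, for $r\le R$,
\[
\frac{1}{Q}\le\frac{1}{3.5W}\le\frac{\sigma}{3.5}\,\frac{1}{\sigma W}=\frac{\sigma}{3.5}\,\frac{1}{qR^2}\le\frac{\sigma/3.5}{qrR}.
\]
Hence
\[
\begin{aligned}
\frac{W}{2}+\frac{3}{2}\left(\frac{1}{qrR}-\frac{1}{Q}\right)^{-1}
&\le\frac{W}{2}+\frac{3}{2}\,\frac{qrR}{1-\sigma/3.5}
=\frac{W}{2}+\frac{3r}{2\left(1-\frac{\sigma}{3.5}\right)R}\cdot 2\sigma\,\frac{W}{2}\\
&=\frac{W}{2}\left(1+\frac{3\sigma}{1-\sigma/3.5}\,\frac{r}{R}\right)<\frac{W}{2}\left(1+\frac{r}{R}\right)
\end{aligned}
\]
and so
\[
\begin{aligned}
\sum_{\substack{r\le R\\ (r,q)=1}}\left(\frac{W}{2}+\frac{3}{2}\left(\frac{1}{qrR}-\frac{1}{Q}\right)^{-1}\right)^{-1}\frac{\mu(r)^2}{\phi(r)}
&\ge\frac{2}{W}\sum_{\substack{r\le R\\ (r,q)=1}}\frac{\mu(r)^2}{\phi(r)}(1+rR^{-1})^{-1}\\
&\ge\frac{2}{W}\,\frac{\phi(q)}{q}\sum_{r\le R}\frac{\mu(r)^2}{\phi(r)}(1+rR^{-1})^{-1}.
\end{aligned}
\]
For $R\ge 2$,
\[
\sum_{r\le R}\frac{\mu(r)^2}{\phi(r)}(1+rR^{-1})^{-1}>\log R+0.25068;
\]
this is true for $R\ge 100$ by [MV73, Lemma 8] and easily verifiable numerically for $2\le R<100$. (It suffices to verify this for $R$ integer with $r<R$ instead of $r\le R$, as that is the worst case.)

Now
\[
\log R=\frac{1}{2}\left(\log\frac{W}{2q}+\log 2\sigma\right)=\frac{1}{2}\log\frac{W}{2q}-0.25068.
\]
Hence
\[
\sum_{r\le R}\frac{\mu(r)^2}{\phi(r)}(1+rR^{-1})^{-1}>\frac{1}{2}\log\frac{W}{2q}
\]
and the statement follows.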
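The numerical verification mentioned above for $2\le R<100$ can be sketched as follows. The function names are hypothetical; following the remark in the text, the sum is taken over $r<R$ for integer $R\ge 3$, and the single case $R=2$ is checked with $r\le R$.

```python
import math


def phi(n):
    """Euler's totient, by trial factorisation."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result


def is_squarefree(n):
    return all(n % (d * d) for d in range(2, math.isqrt(n) + 1))


def weighted_sum(R, strict):
    """Sum of mu(r)^2/phi(r) * (1 + r/R)^(-1) over r < R (strict) or r <= R."""
    top = R - 1 if strict else R
    return sum(1.0 / (phi(r) * (1.0 + r / R))
               for r in range(1, top + 1) if is_squarefree(r))


def margin(R, strict=True):
    """Left side minus (log R + 0.25068); should be positive."""
    return weighted_sum(R, strict) - (math.log(R) + 0.25068)


if __name__ == "__main__":
    for R in (3, 4, 5, 6, 10, 50, 100):
        print(R, margin(R))
```

At $R=5$ the two sides agree to about six decimal places ($5/6+5/7+5/16$ against $\log 5+0.25068$), which suggests where the constant 0.25068 comes from.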


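Now consider the case $q>W/13.5$. If $q$ is even, then, in this range, inequality (5.42) is always better than (5.43), and so we are done. Assume, then, that $W/13.5<q\le W/2$ and $q$ is odd. We set $R=2$; clearly $qR^2<W\le Q$ and $q<W/2\le W'$, and so this choice of $R$ is valid. It remains to check that
\[
\frac{1}{\frac{W}{2}+\frac{3}{2}\left(\frac{1}{2q}-\frac{1}{Q}\right)^{-1}}+\frac{1}{\frac{W}{2}+\frac{3}{2}\left(\frac{1}{4q}-\frac{1}{Q}\right)^{-1}}\ge\frac{1}{W}\log\frac{W}{2q}.
\]
This follows because
\[
\frac{1}{\frac{1}{2}+\frac{3}{2}\left(\frac{t}{2}-\frac{1}{3.5}\right)^{-1}}+\frac{1}{\frac{1}{2}+\frac{3}{2}\left(\frac{t}{4}-\frac{1}{3.5}\right)^{-1}}\ge\log\frac{t}{2}
\]
for all $2\le t\le 13.5$.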
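The last inequality involves a single real variable, so it is easy to probe on a grid. The following sketch (the function name and the grid step are arbitrary choices) evaluates the left side for $r=1,2$, which contribute the terms with $t/2$ and $t/4$.

```python
import math


def lhs(t):
    """Left side of the inequality in t = W/q, using W/Q <= 1/3.5."""
    inv_q_cap = 1.0 / 3.5
    return sum(1.0 / (0.5 + 1.5 / (t / (2 * r) - inv_q_cap)) for r in (1, 2))


if __name__ == "__main__":
    for t in (2.0, 4.0, 8.0, 13.5):
        print(t, lhs(t), math.log(t / 2))
```

A grid check is only suggestive; both sides are smooth in $t$ and the gap stays comfortably positive at the sampled points.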

We need a version of Lemma 5.2.2 with m restricted to the odd numbers, since we
plan to set the parameter v equal to 2.

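Lemma 5.2.3. Let $W\ge 1$, $W'\ge W/2$. Let $2\alpha=a/q+O^*(1/qQ)$, $q\le Q$. Then
\[
\sum_{\substack{A_0<m\le A_1\\ m \text{ odd}}}\left|\sum_{W'<p\le W}(\log p)\,e(\alpha mp)\right|^2
\le\left\lceil\frac{A_1-A_0}{\min(2q,Q)}\right\rceil\cdot(W-W'+2q)\sum_{W'<p\le W}(\log p)^2.
\tag{5.50}
\]
If $q<W/2$ and $Q\ge 3.5W$, the following bound also holds:
\[
\sum_{\substack{A_0<m\le A_1\\ m \text{ odd}}}\left|\sum_{W'<p\le W}(\log p)\,e(\alpha mp)\right|^2
\le\left\lceil\frac{A_1-A_0}{2q}\right\rceil\cdot\frac{q}{\phi(q)}\cdot\frac{W}{\log(W/2q)}\cdot\sum_{W'<p\le W}(\log p)^2.
\tag{5.51}
\]
If $A_1-A_0\le 2\varrho q$ and $q\le\rho Q$, $\varrho,\rho\in[0,1]$, the following bound also holds:
\[
\sum_{A_0<m\le A_1}\left|\sum_{W'<p\le W}(\log p)\,e(\alpha mp)\right|^2
\le\left(W-W'+q/(1-\varrho\rho)\right)\sum_{W'<p\le W}(\log p)^2.
\tag{5.52}
\]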

Proof. We follow the proof of Lemma 5.2.2, noting the differences. Let
\[
k=\min(q,\lceil Q/2\rceil)\ge\lceil q/2\rceil,
\]
just as before. We split $(A_0,A_1]$ into $\lceil(A_1-A_0)/k\rceil$ blocks of at most $2k$ consecutive integers; any such block contains at most $k$ odd numbers. For odd $m$, $m'$ in such a block, $\alpha m$ and $\alpha m'$ are separated by a distance of
\[
|\{\alpha(m-m')\}|=\left|\left\{2\alpha\,\frac{m-m'}{2}\right\}\right|=|\{(a/q)k\}|-O^*(k/qQ)\ge 1/2q.
\]
We obtain (5.50) and (5.52) just as we obtained (5.42) and (5.44) before. To obtain (5.51), proceed again as before, noting that the angles we are working with can be labelled as $\alpha(m_0+2a)$, $0\le a<q$.
The idea now (for large δ) is that, if δ is not negligible, then, as m increases and
αm loops around the circle R/Z, αm roughly repeats itself every q steps – but with a
slight displacement. This displacement gives rise to a configuration to which Lemma
5.2.1 is applicable. The effect is that we can apply the large sieve once instead of many
times, thus leading to a gain of a large factor (essentially, the number of times the large
sieve would have been used). This is how we obtain the factor of |δ| in the denominator
of the main term x/|δ|q in (5.56) and (5.57).

Proposition 5.2.4. Let $x\ge W\ge 1$, $W'\ge W/2$, $U'\ge x/2W$. Let $Q\ge 3.5W$. Let $2\alpha=a/q+\delta/x$, $(a,q)=1$, $|\delta/x|\le 1/qQ$, $q\le Q$. Let $S_2(U',W',W)$ be as in (5.38) with $v=2$.

For $q\le\rho Q$, where $\rho\in[0,1]$,
\[
S_2(U',W',W)\le\left(\max(1,2\rho)\left(\frac{x}{8q}+\frac{x}{2W}\right)+\frac{W}{2}+2q\right)\cdot\sum_{W'<p\le W}(\log p)^2.
\tag{5.53}
\]
If $q<W/2$,
\[
S_2(U',W',W)\le\left(\frac{x}{4\phi(q)}\,\frac{1}{\log(W/2q)}+\frac{q}{\phi(q)}\,\frac{W}{\log(W/2q)}\right)\cdot\sum_{W'<p\le W}(\log p)^2.
\tag{5.54}
\]
If $W>x/4q$, the following bound also holds:
\[
S_2(U',W',W)\le\left(\frac{W}{2}+\frac{q}{1-x/4Wq}\right)\sum_{W'<p\le W}(\log p)^2.
\tag{5.55}
\]
If $\delta\ne 0$ and $x/4W+q\le x/|\delta|q$,
\[
S_2(U',W',W)\le\min\left(1,\frac{2q/\phi(q)}{\log\left(\frac{x}{|\delta q|}\left(q+\frac{x}{4W}\right)^{-1}\right)}\right)
\cdot\left(\frac{x}{|\delta q|}+\frac{W}{2}\right)\sum_{W'<p\le W}(\log p)^2.
\tag{5.56}
\]
Lastly, if $\delta\ne 0$ and $q\le\rho Q$, where $\rho\in[0,1)$,
\[
S_2(U',W',W)\le\left(\frac{x}{|\delta q|}+\frac{W}{2}+\frac{x}{8(1-\rho)Q}+\frac{x}{4(1-\rho)W}\right)\sum_{W'<p\le W}(\log p)^2.
\tag{5.57}
\]
The trivial bound would be in the order of
\[
S_2(U',W',W)=(x/2\log x)\sum_{W'<p\le W}(\log p)^2.
\]
In practice, (5.55) gets applied when $W\ge x/q$.
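Since Proposition 5.2.4 offers up to five bounds with different ranges of validity, it can help to tabulate, for given parameters, which ones apply and which is smallest. The sketch below does that for the factor multiplying $\sum_{W'<p\le W}(\log p)^2$. The function names and the sample parameters are hypothetical, $\rho$ is taken to be $q/Q$, and the standing hypotheses of the proposition ($Q\ge 3.5W$, $|\delta/x|\le 1/qQ$, $q\le Q$) are left for the caller to ensure.

```python
import math


def phi(n):
    """Euler's totient, by trial factorisation."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result


def s2_factors(x, W, q, Q, delta):
    """Map each applicable bound (5.53)-(5.57) to its factor."""
    rho = q / Q
    out = {"5.53": max(1, 2 * rho) * (x / (8 * q) + x / (2 * W)) + W / 2 + 2 * q}
    if q < W / 2:
        log_term = math.log(W / (2 * q))
        out["5.54"] = x / (4 * phi(q) * log_term) + q * W / (phi(q) * log_term)
    if W > x / (4 * q):
        out["5.55"] = W / 2 + q / (1 - x / (4 * W * q))
    if delta != 0:
        q_prime = x / abs(delta * q)          # Q' = x/|delta q| in the proof
        if x / (4 * W) + q <= q_prime:
            ratio = q_prime / (q + x / (4 * W))
            gain = 1.0 if ratio <= 1 else min(1.0, (2 * q / phi(q)) / math.log(ratio))
            out["5.56"] = gain * (q_prime + W / 2)
        if rho < 1:
            out["5.57"] = (q_prime + W / 2 + x / (8 * (1 - rho) * Q)
                           + x / (4 * (1 - rho) * W))
    return out


if __name__ == "__main__":
    factors = s2_factors(x=1e12, W=1e6, q=100, Q=3.5e6, delta=50)
    for label, value in sorted(factors.items(), key=lambda kv: kv[1]):
        print(label, value)
```

With these sample parameters the smallest factor comes from (5.56), in line with the remark that a non-negligible $\delta$ yields a gain of roughly $|\delta|$ over (5.53).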


Proof. Let us first prove statements (5.54) and (5.53), which do not involve $\delta$. Assume first $q\le W/2$. Then, by (5.51) with $A_0=U'$, $A_1=x/W$,
\[
S_2(U',W',W)\le\left(\frac{x/W-U'}{2q}+1\right)\frac{q}{\phi(q)}\,\frac{W}{\log(W/2q)}\sum_{W'<p\le W}(\log p)^2.
\]
Clearly $(x/W-U')W\le(x/2W)\cdot W=x/2$. Thus (5.54) holds.

Assume now that $q\le\rho Q$. Apply (5.50) with $A_0=U'$, $A_1=x/W$. Then
\[
S_2(U',W',W)\le\left(\frac{x/W-U'}{q\cdot\min(2,\rho^{-1})}+1\right)(W-W'+2q)\sum_{W'<p\le W}(\log p)^2.
\]
Now
\[
\begin{aligned}
\left(\frac{x/W-U'}{q\cdot\min(2,\rho^{-1})}+1\right)\cdot(W-W'+2q)
&\le\left(\frac{x}{W}-U'\right)\frac{W-W'}{q\min(2,\rho^{-1})}+\max(1,2\rho)\left(\frac{x}{W}-U'\right)+W/2+2q\\
&\le\frac{x/4}{q\min(2,\rho^{-1})}+\max(1,2\rho)\frac{x}{2W}+W/2+2q.
\end{aligned}
\]
This implies (5.53).

If $W>x/4q$, apply (5.44) with $\varrho=x/4Wq$, $\rho=1$. This yields (5.55).
Assume now that $\delta\ne 0$ and $x/4W+q\le x/|\delta q|$. Let $Q'=x/|\delta q|$. For any $m_1$, $m_2$ with $x/2W<m_1,m_2\le x/W$, we have $|m_1-m_2|\le x/2W\le 2(Q'-q)$, and so
\[
\left|\frac{m_1-m_2}{2}\cdot\delta/x+q\delta/x\right|\le Q'|\delta|/x=\frac{1}{q}.
\tag{5.58}
\]
The conditions of Lemma 5.2.1 are thus fulfilled with $\upsilon=(x/4W)\cdot|\delta|/x$ and $\nu=|\delta q|/x$. We obtain that $S_2(U',W',W)$ is at most
\[
\min\left(1,\frac{2q}{\phi(q)}\frac{1}{\log\left((q(\nu+\upsilon))^{-1}\right)}\right)\left(W-W'+\nu^{-1}\right)\sum_{W'<p\le W}(\log p)^2.
\]
Here $W-W'+\nu^{-1}=W-W'+x/|q\delta|\le W/2+x/|q\delta|$ and
\[
(q(\nu+\upsilon))^{-1}=\left(q\frac{|\delta|}{x}\right)^{-1}\left(q+\frac{x}{4W}\right)^{-1}.
\]
Lastly, assume $\delta\ne 0$ and $q\le\rho Q$. We let $Q'=x/|\delta q|\ge Q$ again, and we split the range $U'<m\le x/W$ into intervals of length $2(Q'-q)$, so that (5.58) still holds within each interval. We apply Lemma 5.2.1 with $\upsilon=(Q'-q)\cdot|\delta|/x$ and $\nu=|\delta q|/x$. We obtain that $S_2(U',W',W)$ is at most
\[
\left(1+\frac{x/W-U'}{2(Q'-q)}\right)\left(W-W'+\nu^{-1}\right)\sum_{W'<p\le W}(\log p)^2.
\]
Here $W-W'+\nu^{-1}\le W/2+x/q|\delta|$ as before. Moreover,
\[
\begin{aligned}
\left(\frac{W}{2}+\frac{x}{q|\delta|}\right)\left(1+\frac{x/W-U'}{2(Q'-q)}\right)
&\le\left(\frac{W}{2}+Q'\right)\left(1+\frac{x/2W}{2(1-\rho)Q'}\right)\\
&\le\frac{W}{2}+Q'+\frac{x}{8(1-\rho)Q'}+\frac{x}{4W(1-\rho)}\\
&\le\frac{x}{|\delta q|}+\frac{W}{2}+\frac{x}{8(1-\rho)Q}+\frac{x}{4(1-\rho)W}.
\end{aligned}
\]
Hence (5.57) holds.
Chapter 6

Minor-arc totals

It is now time to make all of our estimates fully explicit, choose our parameters, put
our type I and type II estimates together and give our final minor-arc estimates.
Let $x>0$ be given. Starting in section 6.3.1, we will assume that $x\ge x_0=2.16\cdot 10^{20}$. We will choose our main parameters $U$ and $V$ gradually, as the need arises; we assume from the start that $2\cdot 10^6\le V<x/4$ and $UV\le x$.
We are also given an angle $\alpha\in\mathbb{R}/\mathbb{Z}$. We choose an approximation $2\alpha=a/q+\delta/x$, $(a,q)=1$, $q\le Q$, $|\delta/x|\le 1/qQ$. The parameter $Q$ will be chosen later; we assume from the start that $Q\ge\max(16,2\sqrt{x})$ and $Q\ge\max(2U,x/U)$.
(In fact, $U$ and $V$ will be chosen in different ways depending on the size of $q$, and even $Q$ will depend on the size of $q$. This may seem circular, but what actually happens is the following: we will first set a value for $Q$ depending only on $x$, and if the corresponding value of $q\le Q$ is larger than a certain parameter $y$ depending on $x$, then we reset $U$, $V$ and $Q$, and obtain a new value of $q$.)
Let $S_{I,1}$, $S_{I,2}$, $S_{II}$, $S_0$ be as in (3.9), with the smoothing function $\eta=\eta_2$ as in (3.4). (We bounded the type I sums $S_{I,1}$, $S_{I,2}$ for a general smoothing function $\eta$; it is only here that we are specifying $\eta$.)
The term $S_0$ is 0 because $V<x/4$ and $\eta_2$ is supported on $[1/4,1]$. We set $v=2$.

6.1 The smoothing function


For the smoothing function $\eta_2$ in (3.4),
\[
|\eta_2|_1=1,\qquad|\eta_2'|_1=8\log 2,\qquad|\eta_2''|_1=48,
\tag{6.1}
\]
as per [Tao14, (5.9)–(5.13)]. Similarly, for $\eta_{2,\rho}(t)=\log(\rho t)\eta_2(t)$, where $\rho\ge 4$,
\[
\begin{aligned}
|\eta_{2,\rho}|_1&<\log(\rho)|\eta_2|_1=\log(\rho),\\
|\eta_{2,\rho}'|_1&=2\eta_{2,\rho}(1/2)=2\log(\rho/2)\eta_2(1/2)<(8\log 2)\log\rho,\\
|\eta_{2,\rho}''|_1&=4\log(\rho/4)+|2\log\rho-4\log(\rho/4)|+|4\log 2-4\log\rho|\\
&\quad+|\log\rho-4\log 2|+|\log\rho|<48\log\rho.
\end{aligned}
\tag{6.2}
\]

In the first inequality, we are using the fact that log(ρt) is always positive (and less than
log(ρ)) when t is in the support of η2 .
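The three norms in (6.1) can be checked numerically. The sketch below assumes that (3.4), which lies outside this section, defines $\eta_2(t)=4\max(\log 2-|\log 2t|,0)$; under that assumption $|\eta_2'|_1$ is the total variation of $\eta_2$ and $|\eta_2''|_1$ is the total variation of $\eta_2'$, jumps included. The helper names and grid sizes are arbitrary.

```python
import math

LOG2 = math.log(2)


def eta2(t):
    """Assumed form of eta_2 from (3.4): 4*max(log 2 - |log 2t|, 0)."""
    return 4 * max(LOG2 - abs(math.log(2 * t)), 0.0) if t > 0 else 0.0


def eta2_prime(t):
    """Derivative of eta2 away from its corner points 1/4, 1/2 and 1."""
    if 0.25 < t < 0.5:
        return 4 / t
    if 0.5 < t < 1:
        return -4 / t
    return 0.0


def l1_norm(f, a, b, n):
    """Midpoint rule for the integral of |f| over [a, b]."""
    h = (b - a) / n
    return h * sum(abs(f(a + (i + 0.5) * h)) for i in range(n))


def total_variation(f, a, b, n):
    """Total variation of f sampled on a uniform grid of n cells."""
    h = (b - a) / n
    vals = [f(a + i * h) for i in range(n + 1)]
    return sum(abs(vals[i + 1] - vals[i]) for i in range(n))


if __name__ == "__main__":
    print(l1_norm(eta2, 0.2, 1.1, 90000))                # close to 1
    print(total_variation(eta2, 0.2, 1.1, 90000))        # close to 8 log 2
    print(total_variation(eta2_prime, 0.2, 1.1, 90000))  # close to 48
```

The value 48 splits as jumps of size 16, 16 and 4 at $t=1/4$, $1/2$ and $1$, plus $\int 4/t^2\,dt=8+4$ over the two smooth pieces.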
Write log+ x for max(log x, 0).

6.2 Contributions of different types


6.2.1 Type I terms: SI,1 .
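The term $S_{I,1}$ can be handled directly by Lemma 4.2.3, with $\rho_0=4$ and $D=U$. (Condition (4.38) is valid thanks to (6.2).) Since $U\le Q/2$, the contribution of $S_{I,1}$ gets bounded by (4.40) and (4.41): the absolute value of $S_{I,1}$ is at most
\[
\begin{aligned}
&\frac{x}{q}\min\left(1,\frac{c_0/\delta^2}{(2\pi)^2}\right)\left|\sum_{\substack{m\le U/q\\ (m,q)=1}}\frac{\mu(m)}{m}\log\frac{x}{mq}\right|
+\frac{x}{q}\left|\widehat{\log\cdot\eta}(-\delta)\right|\left|\sum_{\substack{m\le U/q\\ (m,q)=1}}\frac{\mu(m)}{m}\right|\\
&\quad+\frac{2\sqrt{c_0c_1}}{\pi}\left(U\log\frac{ex}{U}+\sqrt{3}\,q\log\frac{q}{c_2}+\frac{q}{2}\log\frac{q}{c_2}\log^+\frac{2U}{q}\right)
+\frac{3c_1}{2}\,\frac{x}{q}\log\frac{q}{c_2}\log^+\frac{U}{c_2x/q}\\
&\quad+\frac{3c_1}{2}\sqrt{\frac{2x}{c_2}}\log\frac{2x}{c_2}
+\left(\frac{c_0}{2}-\frac{2c_0}{\pi^2}\right)\left(\frac{U^2}{4qx}\log\frac{e^{1/2}x}{U}+\frac{1}{e}\right)\\
&\quad+\frac{2|\eta'|_1}{\pi}\,q\max\left(1,\log\frac{c_0e^3q^2}{4\pi|\eta'|_1x}\right)\log x
+\frac{20c_0c_2^{3/2}}{3\pi^2}\sqrt{2x}\log\frac{2\sqrt{e}\,x}{c_2},
\end{aligned}
\tag{6.3}
\]
where $c_0=31.521$ (by Lemma B.2.3), $c_1=1.0000028>1+(8\log 2)/V\ge 1+(8\log 2)/(x/U)$ and $c_2=6\pi/5\sqrt{c_0}=0.67147\ldots$. By (2.1) (with $k=2$), (B.17) and Lemma B.2.4,
\[
\left|\widehat{\log\cdot\eta}(-\delta)\right|\le\min\left(2-\log 4,\frac{24\log 2}{\pi^2\delta^2}\right).
\]
By (2.20), (2.22) and (2.23), the first line of (6.3) is at most
\[
\begin{aligned}
&\frac{x}{q}\min\left(1,\frac{c_0'}{\delta^2}\right)\left(\min\left(\frac{4}{5}\,\frac{q/\phi(q)}{\log^+\frac{U}{q^2}},1\right)\log\frac{x}{U}+1.00303\,\frac{q}{\phi(q)}\right)\\
&\quad+\frac{x}{q}\min\left(2-\log 4,\frac{c_0''}{\delta^2}\right)\min\left(\frac{4}{5}\,\frac{q/\phi(q)}{\log^+\frac{U}{q^2}},1\right),
\end{aligned}
\]
where $c_0'=0.798437>c_0/(2\pi)^2$, $c_0''=1.685532$. Clearly $c_0''/c_0'>1>2-\log 4$.
Taking derivatives, we see that $t\mapsto(t/2)\log(t/c_2)\log^+2U/t$ takes its maximum (for $t\in[1,2U]$) when $\log(t/c_2)\log^+2U/t=\log t/c_2-\log^+2U/t$; since $t\to\log t/c_2-\log^+2U/t$ is increasing on $[1,2U]$, we conclude that
\[
\frac{q}{2}\log\frac{q}{c_2}\log^+\frac{2U}{q}\le U\log\frac{2U}{c_2}.
\]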

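Similarly, $t\mapsto t\log(x/t)\log^+(U/t)$ takes its maximum at a point $t\in[0,U]$ for which $\log(x/t)\log^+(U/t)=\log(x/t)+\log^+(U/t)$, and so
\[
\frac{x}{q}\log\frac{q}{c_2}\log^+\frac{U}{c_2x/q}\le\frac{U}{c_2}(\log x+\log U).
\]
We conclude that
\[
\begin{aligned}
|S_{I,1}|&\le\frac{x}{q}\min\left(1,\frac{c_0'}{\delta^2}\right)\left(\min\left(\frac{4q/\phi(q)}{5\log^+\frac{U}{q^2}},1\right)\left(\log\frac{x}{U}+c_{3,I}\right)+c_{4,I}\frac{q}{\phi(q)}\right)\\
&\quad+\left(c_{7,I}\log\frac{q}{c_2}+c_{8,I}\log x\max\left(1,\log\frac{c_{11,I}\,q^2}{x}\right)\right)q+c_{10,I}\frac{U^2}{4qx}\log\frac{e^{1/2}x}{U}\\
&\quad+\left(c_{5,I}\log\frac{2U}{c_2}+c_{6,I}\log xU\right)U+c_{9,I}\sqrt{x}\log\frac{2\sqrt{e}\,x}{c_2}+\frac{c_{10,I}}{e},
\end{aligned}
\tag{6.4}
\]
where $c_2$ and $c_0'$ are as above, $c_{3,I}=2.11104>c_0''/c_0'$, $c_{4,I}=1.00303$, $c_{5,I}=3.57422>2\sqrt{c_0c_1}/\pi$, $c_{6,I}=2.23389>3c_1/2c_2$, $c_{7,I}=6.19072>2\sqrt{3c_0c_1}/\pi$, $c_{8,I}=3.53017>2(8\log 2)/\pi$,
\[
c_{9,I}=19.1568>\frac{3\sqrt{2c_1}}{2\sqrt{c_2}}+\frac{20\sqrt{2}\,c_0c_2^{3/2}}{3\pi^2},
\]
$c_{10,I}=9.37301>c_0(1/2-2/\pi^2)$ and $c_{11,I}=9.0857>c_0e^3/(4\pi\cdot 8\log 2)$.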
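The constants just listed can be recomputed from $c_0$, $c_1$ and $|\eta_2'|_1=8\log 2$. The sketch below (dictionary layout and variable names are illustrative) only confirms that the stated decimals agree with the defining expressions to within $10^{-4}$; it does not attempt to certify the strict inequalities, some of which are decided around the sixth decimal place.

```python
import math

c0 = 31.521
c1 = 1.0000028
c2 = 6 * math.pi / (5 * math.sqrt(c0))
eta_prime_l1 = 8 * math.log(2)            # |eta_2'|_1 from (6.1)

computed = {
    "c0'": c0 / (2 * math.pi) ** 2,
    "c0''": 24 * math.log(2) / math.pi ** 2,
    "c2": c2,
    "c3,I": 1.685532 / 0.798437,          # c0''/c0' with the stated decimals
    "c5,I": 2 * math.sqrt(c0 * c1) / math.pi,
    "c6,I": 3 * c1 / (2 * c2),
    "c7,I": 2 * math.sqrt(3 * c0 * c1) / math.pi,
    "c8,I": 2 * eta_prime_l1 / math.pi,
    "c9,I": (3 * math.sqrt(2 * c1) / (2 * math.sqrt(c2))
             + 20 * math.sqrt(2) * c0 * c2 ** 1.5 / (3 * math.pi ** 2)),
    "c10,I": c0 * (0.5 - 2 / math.pi ** 2),
    "c11,I": c0 * math.e ** 3 / (4 * math.pi * eta_prime_l1),
}

stated = {
    "c0'": 0.798437, "c0''": 1.685532, "c2": 0.67147, "c3,I": 2.11104,
    "c5,I": 3.57422, "c6,I": 2.23389, "c7,I": 6.19072, "c8,I": 3.53017,
    "c9,I": 19.1568, "c10,I": 9.37301, "c11,I": 9.0857,
}

if __name__ == "__main__":
    for name in stated:
        print(f"{name:6s} stated {stated[name]:<10} computed {computed[name]:.7f}")
```

The same script also shows why $c_1=1.0000028$ suffices: for $V\ge 2\cdot 10^6$ one has $1+(8\log 2)/V\le 1.00000278$.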

6.2.2 Type I terms: SI,2 .


The case $q\le Q/V$. If $q\le Q/V$, then, for $v\le V$,
\[
2v\alpha=\frac{va}{q}+O^*\left(\frac{v}{Qq}\right)=\frac{va}{q}+O^*\left(\frac{1}{q^2}\right),
\]
and so $va/q$ is a valid approximation to $2v\alpha$. (Here we are using $v$ to label an integer variable bounded above by $v\le V$; we no longer need $v$ to label the quantity in (3.10), since that has been set equal to the constant 2.) Moreover, for $Q_v=Q/v$, we see that $2v\alpha=(va/q)+O^*(1/qQ_v)$. If $\alpha=a/q+\delta/x$, then $v\alpha=va/q+\delta/(x/v)$. Now
\[
S_{I,2}=\sum_{\substack{v\le V\\ v \text{ odd}}}\Lambda(v)\sum_{\substack{m\le U\\ m \text{ odd}}}\mu(m)\sum_{\substack{n\\ n \text{ odd}}}e((v\alpha)\cdot mn)\,\eta(mn/(x/v)).
\tag{6.5}
\]

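We can thus estimate $S_{I,2}$ by applying Lemma 4.2.2 to each inner double sum in (6.5). We obtain that, if $|\delta|\le 1/2c_2$, where $c_2=6\pi/5\sqrt{c_0}$ and $c_0=31.521$, then $|S_{I,2}|$ is at most
\[
\sum_{v\le V}\Lambda(v)\left(\frac{x/v}{2q_v}\min\left(1,\frac{c_0}{(\pi\delta)^2}\right)\left|\sum_{\substack{m\le M_v/q\\ (m,2q)=1}}\frac{\mu(m)}{m}\right|+\frac{c_{10,I}\,q}{4x/v}\left(\frac{U}{q_v}+1\right)^2\right)
\tag{6.6}
\]
plus
\[
\begin{aligned}
&\sum_{v\le V}\Lambda(v)\left(\frac{2\sqrt{c_0c_+}}{\pi}U+\frac{3c_+}{2}\,\frac{x}{vq_v}\log^+\frac{U}{c_2x/(vq_v)}+\frac{\sqrt{c_0c_+}}{\pi}q_v\log^+\frac{U}{q_v/2}\right)\\
&\quad+\sum_{v\le V}\Lambda(v)\left(c_{8,I}\max\left(\log\frac{c_{11,I}\,q_v^2}{x/v},1\right)q_v+\left(\frac{2\sqrt{3c_0c_+}}{\pi}+\frac{3c_+}{2c_2}+\frac{55c_0c_2}{6\pi^2}\right)q_v\right),
\end{aligned}
\tag{6.7}
\]
where $q_v=q/(q,v)$, $M_v\in[\min(Q/2v,U),U]$ and $c_+=1+(8\log 2)/(x/UV)$; if $|\delta|\ge 1/2c_2$, then $|S_{I,2}|$ is at most (6.6) plus
\[
\begin{aligned}
&\sum_{v\le V}\Lambda(v)\left(\frac{\sqrt{c_0c_1}}{\pi/2}U+\frac{3c_1}{2}\left(2+\frac{1+\epsilon}{\epsilon}\log^+\frac{2U}{\frac{x/v}{|\delta|q_v}}\right)\frac{x}{Q}+\frac{35c_0c_2}{3\pi^2}q_v\right)\\
&\quad+\sum_{v\le V}\Lambda(v)\frac{\sqrt{c_0c_1}}{\pi/2}(1+\epsilon)\min\left(\left\lfloor\frac{x/v}{|\delta|q_v}\right\rfloor+1,2U\right)\left(\sqrt{3+2\epsilon}+\frac{\log^+\frac{2U}{\left\lfloor\frac{x/v}{|\delta|q_v}\right\rfloor+1}}{2}\right)
\end{aligned}
\tag{6.8}
\]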
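Write $S_V=\sum_{v\le V}\Lambda(v)/(vq_v)$. By (2.12),
\[
\begin{aligned}
S_V&\le\sum_{v\le V}\frac{\Lambda(v)}{vq}+\sum_{\substack{v\le V\\ (v,q)>1}}\frac{\Lambda(v)}{v}\left(\frac{(q,v)}{q}-\frac{1}{q}\right)\\
&\le\frac{\log V}{q}+\frac{1}{q}\sum_{p|q}(\log p)\left(v_p(q)+\sum_{\substack{\alpha\ge 1\\ p^{\alpha+v_p(q)}\le V}}\frac{1}{p^{\alpha}}-\sum_{\substack{\alpha\ge 1\\ p^{\alpha}\le V}}\frac{1}{p^{\alpha}}\right)\\
&\le\frac{\log V}{q}+\frac{1}{q}\sum_{p|q}(\log p)\,v_p(q)=\frac{\log Vq}{q}.
\end{aligned}
\tag{6.9}
\]
This helps us to estimate (6.6). We could also use this to estimate the second term in the first line of (6.7), but, for that purpose, it will actually be wiser to use the simpler bound
\[
\sum_{v\le V}\Lambda(v)\frac{x}{vq_v}\log^+\frac{U}{c_2x/(vq_v)}\le\sum_{v\le V}\Lambda(v)\frac{U/c_2}{e}\le\frac{1.0004}{ec_2}UV
\tag{6.10}
\]
(by (2.14) and the fact that $t\log^+A/t$ takes its maximum at $t=A/e$).
We bound the sum over $m$ in (6.6) by (2.20) and (2.22):
\[
\left|\sum_{\substack{m\le M_v/q\\ (m,2q)=1}}\frac{\mu(m)}{m}\right|\le\min\left(\frac{4}{5}\,\frac{q/\phi(q)}{\log^+\frac{M_v}{2q^2}},1\right).
\]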

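To bound the terms involving $(U/q_v+1)^2$, we use
\[
\begin{aligned}
\sum_{v\le V}\Lambda(v)v&\le 0.5004V^2\quad\text{(by (2.17))},\\
\sum_{v\le V}\Lambda(v)v(v,q)^j&\le\sum_{v\le V}\Lambda(v)v+V\sum_{\substack{v\le V\\ (v,q)\ne 1}}\Lambda(v)(v,q)^j,
\end{aligned}
\]
\[
\sum_{\substack{v\le V\\ (v,q)\ne 1}}\Lambda(v)(v,q)\le\sum_{p|q}(\log p)\sum_{1\le\alpha\le\log_pV}p^{v_p(q)}\le\sum_{p|q}(\log p)\frac{\log V}{\log p}\,p^{v_p(q)}
\le(\log V)\sum_{p|q}p^{v_p(q)}\le q\log V
\]
and
\[
\sum_{\substack{v\le V\\ (v,q)\ne 1}}\Lambda(v)(v,q)^2\le\sum_{p|q}(\log p)\sum_{1\le\alpha\le\log_pV}p^{v_p(q)+\alpha}
\le\sum_{p|q}(\log p)\cdot 2p^{v_p(q)}\cdot p^{\log_pV}\le 2qV\log q.
\]
Using (2.14) and (6.9) as well, we conclude that (6.6) is at most
\[
\begin{aligned}
&\frac{x}{2q}\min\left(1,\frac{c_0}{(\pi\delta)^2}\right)\min\left(\frac{4}{5}\,\frac{q/\phi(q)}{\log^+\frac{\min(Q/2V,U)}{2q^2}},1\right)\log Vq\\
&\quad+\frac{c_{10,I}}{4x}\left(0.5004V^2q\left(\frac{U}{q}+1\right)^2+2UVq\log V+2U^2V\log V\right).
\end{aligned}
\]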

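Assume $Q\le 2UV/e$. Using (2.14), (6.10), (2.18) and the inequality $vq\le Vq\le Q$ (which implies $q/2\le U/e$), we see that (6.7) is at most
\[
1.0004\left(\left(\frac{2\sqrt{c_0c_+}}{\pi}+\frac{3c_+}{2ec_2}\right)UV+\frac{\sqrt{c_0c_+}}{\pi}Q\log\frac{U}{q/2}\right)
+\left(c_{5,I_2}\max\left(\log\frac{c_{11,I}\,q^2V}{x},2\right)+c_{6,I_2}\right)Q,
\]
where $c_{5,I_2}=3.53312>1.0004\cdot c_{8,I}$ and
\[
c_{6,I_2}=1.0004\left(\frac{2\sqrt{3c_0c_+}}{\pi}+\frac{3c_+}{2c_2}+\frac{55c_0c_2}{6\pi^2}\right).
\tag{6.11}
\]
The expressions in (6.8) get estimated similarly. The first line of (6.8) is at most
\[
1.0004\left(\frac{2\sqrt{c_0c_+}}{\pi}UV+\frac{3c_+}{2}\left(2+\frac{1+\epsilon}{\epsilon}\log^+\frac{2UV|\delta|q}{x}\right)\frac{xV}{Q}+\frac{35c_0c_2}{3\pi^2}qV\right)
\]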

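by (2.14). Since $q\le Q/V$, we can obviously bound $qV$ by $Q$. As for the second line of (6.8),
\[
\sum_{v\le V}\Lambda(v)\min\left(\left\lfloor\frac{x/v}{|\delta|q_v}\right\rfloor+1,2U\right)\cdot\frac{1}{2}\log^+\frac{2U}{\left\lfloor\frac{x/v}{|\delta|q_v}\right\rfloor+1}
\le\sum_{v\le V}\Lambda(v)\max_{t>0}t\log^+\frac{U}{t}\le\sum_{v\le V}\Lambda(v)\frac{U}{e}=\frac{1.0004}{e}UV,
\]
but
\[
\begin{aligned}
\sum_{v\le V}\Lambda(v)\min\left(\left\lfloor\frac{x/v}{|\delta|q_v}\right\rfloor+1,2U\right)
&\le\sum_{v\le\frac{x}{2U|\delta|q}}\Lambda(v)\cdot 2U+\sum_{\substack{\frac{x}{2U|\delta|q}<v\le V\\ (v,q)=1}}\Lambda(v)\frac{x/|\delta|}{vq}\\
&\quad+\sum_{v\le V}\Lambda(v)+\sum_{\substack{v\le V\\ (v,q)\ne 1}}\Lambda(v)\frac{x/|\delta|}{v}\left(\frac{1}{q_v}-\frac{1}{q}\right)\\
&\le\frac{x}{|\delta|q}\cdot 1.03883+\frac{x}{|\delta|q}\max\left(\log V-\log\frac{x}{2U|\delta|q}+\log\frac{3}{\sqrt{2}},0\right)\\
&\quad+1.0004V+\frac{x}{|\delta|}\,\frac{1}{q}\sum_{p|q}(\log p)\,v_p(q)\\
&\le\frac{x}{|\delta|q}\left(1.03883+\log q+\log^+\frac{6UV|\delta|q}{\sqrt{2}\,x}\right)+1.0004V
\end{aligned}
\]
by (2.12), (2.13), (2.14) and (2.15); we are proceeding much as in (6.9).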


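Let us collect our bounds. If $|\delta|\le 1/2c_2$, then, assuming $Q\le 2UV/e$, we conclude that $|S_{I,2}|$ is at most
\[
\begin{aligned}
&\frac{x}{2\phi(q)}\min\left(1,\frac{c_0}{(\pi\delta)^2}\right)\min\left(\frac{4/5}{\log^+\frac{Q}{4Vq^2}},1\right)\log Vq\\
&\quad+c_{8,I_2}\frac{x}{q}\left(\frac{UV}{x}\right)^2\left(1+\frac{q}{U}\right)^2+\frac{c_{10,I}}{2}\left(\frac{UV}{x}q\log V+\frac{U^2V}{x}\log V\right)
\end{aligned}
\tag{6.12}
\]
plus
\[
(c_{4,I_2}+c_{9,I_2})UV+\left(c_{10,I_2}\log\frac{U}{q}+c_{5,I_2}\max\left(\log\frac{c_{11,I}\,q^2V}{x},2\right)+c_{12,I_2}\right)\cdot Q,
\tag{6.13}
\]
where
\[
\begin{aligned}
c_{4,I_2}&=3.57565(1+\epsilon_0)>1.0004\cdot 2\sqrt{c_0c_+}/\pi,\\
c_{5,I_2}&=3.53312>1.0004\cdot c_{8,I},\\
c_{8,I_2}&=1.17257>\frac{c_{10,I}}{4}\cdot 0.5004,\\
c_{9,I_2}&=0.82214(1+2\epsilon_0)>3c_+\cdot 1.0004/2ec_2,\\
c_{10,I_2}&=1.78783\sqrt{1+2\epsilon_0}>1.0004\sqrt{c_0c_+}/\pi,\\
c_{12,I_2}&=29.3333+11.902\epsilon_0\\
&>1.0004\left(\frac{3}{2c_2}c_++\frac{2\sqrt{3c_0}}{\pi}\sqrt{c_+}+\frac{55c_0c_2}{6\pi^2}\right)+1.78783(1+\epsilon_0)\log 2\\
&=c_{6,I_2}+c_{10,I_2}\log 2
\end{aligned}
\]
and $c_{10,I}=9.37301$ as before. Here $\epsilon_0=(4\log 2)/(x/UV)$, and $c_{6,I_2}$ is as in (6.11).
If $|\delta|\ge 1/2c_2$, then $|S_{I,2}|$ is at most (6.12) plus
\[
\begin{aligned}
&(c_{4,I_2}+(1+\epsilon)c_{13,I_2})UV+c_\epsilon\left(c_{14,I_2}\left(\log q+\log^+\frac{6UV|\delta|q}{\sqrt{2}\,x}\right)+c_{15,I_2}\right)\frac{x}{|\delta|q}\\
&\quad+c_{16,I_2}\left(2+\frac{1+\epsilon}{\epsilon}\log^+\frac{2UV|\delta|q}{x}\right)\frac{x}{Q/V}+c_{17,I_2}Q+c_\epsilon\cdot c_{4,I_2}V,
\end{aligned}
\tag{6.14}
\]
where
\[
\begin{aligned}
c_{13,I_2}&=1.31541(1+\epsilon_0)>\frac{2\sqrt{c_0c_+}}{\pi}\cdot\frac{1.0004}{e},\\
c_{14,I_2}&=3.57422\sqrt{1+2\epsilon_0}>\frac{2\sqrt{c_0c_+}}{\pi},\\
c_{15,I_2}&=3.71301\sqrt{1+2\epsilon_0}>\frac{2\sqrt{c_0c_+}}{\pi}\cdot 1.03883,\\
c_{16,I_2}&=1.5006(1+2\epsilon_0)>1.0004\cdot 3c_+/2,\\
c_{17,I_2}&=25.0295>1.0004\cdot\frac{35c_0c_2}{3\pi^2},
\end{aligned}
\]
and $c_\epsilon=(1+\epsilon)\sqrt{3+2\epsilon}$. We recall that $c_2=6\pi/5\sqrt{c_0}=0.67147\ldots$. We will choose $\epsilon\in(0,1)$ later; we also leave the task of bounding $\epsilon_0$ for later.
The case $q>Q/V$. We use Lemma 4.2.4 in this case.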

6.2.3 Type II terms.


As we showed in (5.1)–(5.5), $S_{II}$ (given in (5.1)) is at most
\[
4\int_V^{x/U}\sqrt{S_1(U,W)\cdot S_2(U,V,W)}\,\frac{dW}{W}+4\int_V^{x/U}\sqrt{S_1(U,W)\cdot S_3(W)}\,\frac{dW}{W},
\tag{6.15}
\]
where $S_1$, $S_2$ and $S_3$ are as in (5.4) and (5.5). We bounded $S_1$ in (5.33) and (5.34), $S_2$ in Prop. 5.2.4 and $S_3$ in (5.5).

Let us try to give some structure to the bookkeeping we must now inevitably do. The second integral in (6.15) will be negligible (because $S_3$ is); let us focus on the first integral.
Thanks to our work in §5.1, the term $S_1(U,W)$ is bounded by a (small) constant times $x/W$. (This represents a gain of several factors of log with respect to the trivial bound.) We bounded $S_2(U,V,W)$ using the large sieve; we expected, and got, a bound that is better than trivial by a factor of size roughly $\gg\sqrt{q}\log x$; the exact factor in the bound depends on the value of $W$. In particular, it is only in the central part of the range for $W$ that we will really be able to save a factor of $\gg\sqrt{q}\log x$, as opposed to just $\gg\sqrt{q}$. We will have to be slightly clever in order to get a good total bound in the end.

***

We first recall our estimate for $S_1$. In the whole range $[V,x/U]$ for $W$, we know from (5.33), (5.34) and (5.37) that $S_1(U,W)$ is at most
\[
\frac{2}{\pi^2}\,\frac{x}{W}+\kappa_0\,\zeta(3/2)^3\,\frac{x}{W}\sqrt{\frac{x/WU}{U}},
\tag{6.16}
\]
where
\[
\kappa_0=1.27.
\]
(We recall we are working with $v=2$.)
We have better estimates for the constant in front in some parts of the range; in what is usually the main part, (5.34) and (5.36) give us a constant of 0.15107 instead of $2/\pi^2$. Note that $1.27\zeta(3/2)^3=22.6417\ldots$. We should choose $U$, $V$ so that the first term in (6.16) dominates. For the time being, assume only
\[
U\ge 5\cdot 10^5\,\frac{x}{VU};
\tag{6.17}
\]
then (6.16) gives
\[
S_1(U,W)\le\kappa_1\frac{x}{W},
\tag{6.18}
\]
where
\[
\kappa_1=\frac{2}{\pi^2}+\frac{22.6418}{\sqrt{10^6/2}}\le 0.2347.
\]
This will suffice for our cruder estimates.
The second integral in (6.15) is now easy to bound. By (5.5),
\[
S_3(W)\le 1.0171x+2.0341W\le 1.0172x,
\]
since $W\le x/U\le x/5\cdot 10^5$. Hence
\[
4\int_V^{x/U}\sqrt{S_1(U,W)\cdot S_3(W)}\,\frac{dW}{W}\le 4\int_V^{x/U}\sqrt{\kappa_1\frac{x}{W}\cdot 1.0172x}\,\frac{dW}{W}\le\kappa_9\frac{x}{\sqrt{V}},
\]
where
\[
\kappa_9=8\cdot\sqrt{1.0172\cdot\kappa_1}\le 3.9086.
\]
Let us now examine $S_2$, which was bounded in Prop. 5.2.4. We set the parameters $W'$, $U'$ as follows, in accordance with (5.4):
\[
W'=\max(V,W/2),\qquad U'=\max(U,x/2W).
\]
Since $W'\ge W/2$ and $W\ge V>117$, we can always bound
\[
\sum_{W'<p\le W}(\log p)^2\le\frac{1}{2}W(\log W)
\tag{6.19}
\]
by (2.19).
Bounding $S_2$ for $\delta$ arbitrary. We set
\[
W_0=\min(\max(2\theta q,V),x/U),
\]
where $\theta\ge e$ is a parameter that will be set later.


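For $V\le W<W_0$, we use the bound (5.53):
\[
\begin{aligned}
S_2(U',W',W)&\le\left(\max(1,2\rho)\left(\frac{x}{8q}+\frac{x}{2W}\right)+\frac{W}{2}+2q\right)\cdot\frac{1}{2}W(\log W)\\
&\le\max\left(\frac{1}{2},\rho\right)\left(\frac{W}{8q}+\frac{1}{2}\right)x\log W+\frac{W^2\log W}{4}+qW\log W,
\end{aligned}
\]
where $\rho=q/Q$.
If $W_0>V$, the contribution of the terms with $V\le W<W_0$ to (6.15) is (by (6.18)) bounded by
\[
\begin{aligned}
&4\int_V^{W_0}\sqrt{\kappa_1\frac{x}{W}\left(\frac{\rho_0}{4}\left(\frac{W}{4q}+1\right)x\log W+\frac{W^2\log W}{4}+qW\log W\right)}\,\frac{dW}{W}\\
&\quad\le\frac{\kappa_2}{2}\sqrt{\rho_0}\,x\int_V^{W_0}\frac{\sqrt{\log W}}{W^{3/2}}\,dW+\frac{\kappa_2}{2}\sqrt{x}\int_V^{W_0}\frac{\sqrt{\log W}}{W^{1/2}}\,dW
+\kappa_2\sqrt{\frac{\rho_0x^2}{16q}+qx}\int_V^{W_0}\frac{\sqrt{\log W}}{W}\,dW\\
&\quad\le\left(\kappa_2\sqrt{\rho_0}\frac{x}{\sqrt{V}}+\kappa_2\sqrt{xW_0}\right)\sqrt{\log W_0}
+\frac{2\kappa_2}{3}\sqrt{\frac{\rho_0x^2}{16q}+qx}\left((\log W_0)^{3/2}-(\log V)^{3/2}\right),
\end{aligned}
\tag{6.20}
\]
where $\rho_0=\max(1,2\rho)$ and
\[
\kappa_2=4\sqrt{\kappa_1}\le 1.93768.
\]
(We are using the easy bound $\sqrt{a+b+c}\le\sqrt{a}+\sqrt{b}+\sqrt{c}$.)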

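We now examine the terms with $W\ge W_0$. If $2\theta q>x/U$, then $W_0=x/U$, the contribution of the case is nil, and the computations below can be ignored. Thus, we can assume that $2\theta q\le x/U$.
We use (5.54):
\[
S_2(U',W',W)\le\left(\frac{x}{4\phi(q)}\,\frac{1}{\log(W/2q)}+\frac{q}{\phi(q)}\,\frac{W}{\log(W/2q)}\right)\cdot\frac{1}{2}W\log W.
\]
By $\sqrt{a+b}\le\sqrt{a}+\sqrt{b}$, we can take out the $q/\phi(q)\cdot W/\log(W/2q)$ term and estimate its contribution on its own; it is at most
\[
\begin{aligned}
&4\int_{W_0}^{x/U}\sqrt{\kappa_1\frac{x}{W}\cdot\frac{q}{\phi(q)}\cdot\frac{1}{2}W^2\frac{\log W}{\log W/2q}}\,\frac{dW}{W}\\
&\quad=\frac{\kappa_2}{\sqrt{2}}\sqrt{\frac{q}{\phi(q)}}\int_{W_0}^{x/U}\sqrt{\frac{x\log W}{W\log W/2q}}\,dW\\
&\quad\le\frac{\kappa_2}{\sqrt{2}}\sqrt{\frac{qx}{\phi(q)}}\int_{W_0}^{x/U}\frac{1}{\sqrt{W}}\left(1+\sqrt{\frac{\log 2q}{\log W/2q}}\right)dW.
\end{aligned}
\tag{6.21}
\]
Now
\[
\int_{W_0}^{x/U}\frac{1}{\sqrt{W}}\sqrt{\frac{\log 2q}{\log W/2q}}\,dW=\sqrt{2q\log 2q}\int_{\max(\theta,V/2q)}^{x/2Uq}\frac{1}{\sqrt{t\log t}}\,dt.
\]
We bound this last integral somewhat crudely: for $T\ge e$,
\[
\int_e^T\frac{1}{\sqrt{t\log t}}\,dt\le 2.3\sqrt{\frac{T}{\log T}}.
\tag{6.22}
\]
(This is shown as follows: since
\[
\frac{1}{\sqrt{T\log T}}<\left(2.3\sqrt{\frac{T}{\log T}}\right)'
\]
if and only if $T>T_0$, where $T_0=e^{(1-2/2.3)^{-1}}=2135.94\ldots$, it is enough to check (numerically) that (6.22) holds for $T=T_0$.) Since $\theta\ge e$, this gives us that
\[
\int_{W_0}^{x/U}\frac{1}{\sqrt{W}}\left(1+\sqrt{\frac{\log 2q}{\log W/2q}}\right)dW
\le 2\sqrt{\frac{x}{U}}+2.3\sqrt{2q\log 2q}\cdot\sqrt{\frac{x/2Uq}{\log x/2Uq}},
\]
and so (6.21) is at most
\[
\sqrt{2}\,\kappa_2\sqrt{\frac{q}{\phi(q)}}\left(1+1.15\sqrt{\frac{\log 2q}{\log x/2Uq}}\right)\frac{x}{\sqrt{U}}.
\]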
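The numerical check of (6.22) at $T=T_0$ mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the integral is computed by Simpson's rule after the substitution $t=e^s$, which turns the integrand into $e^{s/2}/\sqrt{s}$ on $[1,\log T]$; the function names and the number of subintervals are arbitrary, and no rigorous error control is attempted.

```python
import math


def integral(T, n=4000):
    """Simpson's rule for int_e^T dt / sqrt(t log t), via t = exp(s)."""
    a, b = 1.0, math.log(T)
    if b <= a:
        return 0.0

    def f(s):
        return math.exp(s / 2) / math.sqrt(s)

    h = (b - a) / n                      # n must be even
    total = f(a) + f(b)
    total += 4 * sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    total += 2 * sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return total * h / 3


def rhs(T):
    """Right side of (6.22)."""
    return 2.3 * math.sqrt(T / math.log(T))


T0 = math.exp(1 / (1 - 2 / 2.3))         # the critical point from the text

if __name__ == "__main__":
    for T in (10.0, 100.0, T0, 1e4, 1e6):
        print(T, integral(T), rhs(T))
```

The difference between the two sides is smallest at $T_0$, which is why a check at that single point is enough.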

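We are left with what will usually be the main term, viz.,
\[
4\int_{W_0}^{x/U}\sqrt{S_1(U,W)\cdot\left(\frac{x}{8\phi(q)}\,\frac{\log W}{\log W/2q}\right)W}\,\frac{dW}{W},
\tag{6.23}
\]
which, by (5.34), is at most $x/\sqrt{\phi(q)}$ times the integral of
\[
\frac{1}{W}\sqrt{\left(2H_2\left(\frac{x}{WU}\right)+\frac{\kappa_4}{2}\sqrt{\frac{x/WU}{U}}\right)\frac{\log W}{\log W/2q}}
\]
for $W$ going from $W_0$ to $x/U$, where $H_2$ is as in (5.35) and
\[
\kappa_4=4\kappa_0\zeta(3/2)^3\le 90.5671.
\]
By the arithmetic/geometric mean inequality, the integrand is at most $1/W$ times
\[
\frac{\beta+\beta^{-1}\cdot 2H_2(x/WU)}{2}+\frac{\beta^{-1}}{2}\,\frac{\kappa_4}{2}\sqrt{\frac{x/WU}{U}}+\frac{\beta}{2}\,\frac{\log 2q}{\log W/2q}
\tag{6.24}
\]
for any $\beta>0$. We will choose $\beta$ later.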

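The first summand in (6.24) gives what we can think of as the main or worst term in the whole paper; let us compute it first. The integral is
\[
\int_{W_0}^{x/U}\frac{\beta+\beta^{-1}\cdot 2H_2(x/WU)}{2}\,\frac{dW}{W}=\int_1^{x/UW_0}\frac{\beta+\beta^{-1}\cdot 2H_2(s)}{2}\,\frac{ds}{s}
\le\left(\frac{\beta}{2}+\frac{\kappa_6}{4\beta}\right)\log\frac{x}{UW_0}
\tag{6.25}
\]
by (5.36), where
\[
\kappa_6=0.60428.
\]
Thus the main term is simply
\[
\left(\frac{\beta}{2}+\frac{\kappa_6}{4\beta}\right)\frac{x}{\sqrt{\phi(q)}}\log\frac{x}{UW_0}.
\tag{6.26}
\]
The integral of the second summand is at most
\[
\beta^{-1}\cdot\frac{\kappa_4}{4}\,\frac{\sqrt{x}}{U}\int_V^{x/U}\frac{dW}{W^{3/2}}\le\beta^{-1}\cdot\frac{\kappa_4}{2}\sqrt{\frac{x/UV}{U}}.
\]
By (6.17), this is at most
\[
\frac{\beta^{-1}}{\sqrt{2}}\cdot 10^{-3}\cdot\kappa_4\le\beta^{-1}\kappa_7/2,
\]
where
\[
\kappa_7=\frac{\sqrt{2}\,\kappa_4}{1000}\le 0.1281.
\]
Thus the contribution of the second summand is at most
\[
\frac{\beta^{-1}\kappa_7}{2}\cdot\frac{x}{\sqrt{\phi(q)}}.
\]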
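The constants $\kappa_1,\kappa_2,\kappa_4,\kappa_6,\kappa_7,\kappa_9$ used so far are related by simple formulas, which can be retraced as follows. In this sketch the variable names are illustrative, and the values 22.6418 and 90.5671 are taken as printed in the text rather than recomputed from $\zeta(3/2)$.

```python
import math

kappa1 = 2 / math.pi ** 2 + 22.6418 / math.sqrt(1e6 / 2)   # (6.18)
kappa2 = 4 * math.sqrt(kappa1)                              # after (6.20)
kappa9 = 8 * math.sqrt(1.0172 * kappa1)                     # bound for the S_3 integral
kappa4 = 90.5671                                            # upper bound for 4*1.27*zeta(3/2)^3
kappa6 = 4 * 0.15107                                        # from (5.36) with v = 2
kappa7 = math.sqrt(2) * kappa4 / 1000                       # second summand of (6.24)

if __name__ == "__main__":
    print("kappa1", kappa1, "<= 0.2347")
    print("kappa2", kappa2, "<= 1.93768")
    print("kappa9", kappa9, "<= 3.9086")
    print("kappa6", kappa6, "=  0.60428")
    print("kappa7", kappa7, "<= 0.1281")
```

Note that the bounds for $\kappa_2$ and $\kappa_9$ rely on the unrounded value of $\kappa_1$; substituting the rounded value 0.2347 would overshoot them slightly.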

The integral of the third summand in (6.24) is
\[
\frac{\beta}{2}\int_{W_0}^{x/U}\frac{\log 2q}{\log W/2q}\,\frac{dW}{W}.
\tag{6.27}
\]
If $V<2\theta q\le x/U$, this is
\[
\frac{\beta}{2}\int_{2\theta q}^{x/U}\frac{\log 2q}{\log W/2q}\,\frac{dW}{W}=\frac{\beta}{2}\log 2q\cdot\int_\theta^{x/2Uq}\frac{1}{\log t}\,\frac{dt}{t}
=\frac{\beta}{2}\log 2q\cdot\left(\log\log\frac{x}{2Uq}-\log\log\theta\right).
\]
If $2\theta q>x/U$, the integral is over an empty range and its contribution is hence 0.
If $2\theta q\le V$, (6.27) is
\[
\begin{aligned}
\frac{\beta}{2}\int_V^{x/U}\frac{\log 2q}{\log W/2q}\,\frac{dW}{W}&=\frac{\beta\log 2q}{2}\int_{V/2q}^{x/2Uq}\frac{1}{\log t}\,\frac{dt}{t}\\
&=\frac{\beta\log 2q}{2}\cdot\left(\log\log\frac{x}{2Uq}-\log\log V/2q\right)\\
&=\frac{\beta\log 2q}{2}\cdot\log\left(1+\frac{\log x/UV}{\log V/2q}\right).
\end{aligned}
\tag{6.28}
\]
(Let us stop for a moment and ask ourselves when this will be smaller than what we can see as the main term, namely, the term $(\beta/2)\log x/UW_0$ in (6.25). Clearly, $\log(1+(\log x/UV)/(\log V/2q))\le(\log x/UV)/(\log V/2q)$, and that is smaller than $(\log x/UV)/\log 2q$ when $V/2q>2q$. Of course, it does not actually matter if (6.28) is smaller than the term from (6.25) or not, since we are looking for upper bounds here, not for asymptotics.)
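The total bound for (6.23) is thus
\[
\frac{x}{\sqrt{\phi(q)}}\cdot\left(\beta\cdot\left(\frac{1}{2}\log\frac{x}{UW_0}+\frac{\Phi}{2}\right)+\beta^{-1}\left(\frac{1}{4}\kappa_6\log\frac{x}{UW_0}+\frac{\kappa_7}{2}\right)\right),
\tag{6.29}
\]
where
\[
\Phi=\begin{cases}
\log 2q\left(\log\log\frac{x}{2Uq}-\log\log\theta\right) & \text{if } V/2\theta<q<x/(2\theta U),\\[4pt]
\log 2q\,\log\left(1+\frac{\log x/UV}{\log V/2q}\right) & \text{if } q\le V/2\theta.
\end{cases}
\tag{6.30}
\]
Choosing $\beta$ optimally, we obtain that (6.23) is at most
\[
\frac{x}{\sqrt{2\phi(q)}}\sqrt{\left(\log\frac{x}{UW_0}+\Phi\right)\left(\kappa_6\log\frac{x}{UW_0}+2\kappa_7\right)},
\tag{6.31}
\]
where $\Phi$ is as in (6.30).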
Bounding $S_2$ for $|\delta|\ge 8$. Let us see how much a non-zero $\delta$ can help us. It makes sense to apply (5.56) only when $|\delta|\ge 8$; otherwise (5.54) is almost certainly better. Now, by definition, $|\delta|/x\le 1/qQ$, and so $|\delta|\ge 8$ can happen only when $q\le x/8Q$.
With this in mind, let us apply (5.56), assuming $|\delta|>8$. Note first that
\[
\frac{x}{|\delta q|}\left(q+\frac{x}{4W}\right)^{-1}\ge\frac{1/|\delta q|}{\frac{q}{x}+\frac{1}{4W}}\ge\frac{4/|\delta q|}{\frac{1}{2Q}+\frac{1}{W}}
\ge\frac{4W}{|\delta|q}\cdot\frac{1}{1+\frac{W}{2Q}}\ge\frac{4W}{|\delta|q}\cdot\frac{1}{1+\frac{x/U}{2Q}}.
\]
This is at least $2\min(2Q,W)/|\delta q|$. Thus we are allowed to apply (5.56) when $|\delta q|\le 2\min(2Q,W)$. Since $Q\ge x/U$, we know that $\min(2Q,W)=W$ for all $W\le x/U$, and so it is enough to assume that $|\delta q|\le 2W$. We will soon be making a stronger assumption.
Recalling also (6.19), we see that (5.56) gives us
\[
S_2(U',W',W)\le\min\left(1,\frac{2q/\phi(q)}{\log\left(\frac{4W}{|\delta|q}\cdot\frac{1}{1+\frac{x/U}{2Q}}\right)}\right)\left(\frac{x}{|\delta q|}+\frac{W}{2}\right)\cdot\frac{1}{2}W(\log W).
\tag{6.32}
\]
Similarly to before, we define $W_0=\max(V,\theta|\delta q|)$, where $\theta\ge 3e^2/8$ will be set later. (Here $\theta\ge 3e^2/8$ is an assumption we do not yet need, but we will be using it soon to simplify matters slightly.) For $W\ge W_0$, we certainly have $|\delta q|\le 2W$. Hence the part of the first term of (6.15) coming from the range $W_0\le W<x/U$ is
\[
\begin{aligned}
&4\int_{W_0}^{x/U}\sqrt{S_1(U,W)\cdot S_2(U,V,W)}\,\frac{dW}{W}\\
&\quad\le 4\sqrt{\frac{q}{\phi(q)}}\int_{W_0}^{x/U}\sqrt{S_1(U,W)\cdot\frac{\log W}{\log\left(\frac{4W}{|\delta|q}\cdot\frac{1}{1+\frac{x/U}{2Q}}\right)}\left(\frac{Wx}{|\delta q|}+\frac{W^2}{2}\right)}\,\frac{dW}{W}.
\end{aligned}
\tag{6.33}
\]

By (5.34), the contribution of the term W x/|δq| to (6.33) is at most


v !
Z x/U u r
4x u  x  κ
4 x/W U log W dW
u H2 +
p u  
|δ|φ(q) W0 t WU 4 U W
log 4W |δ|q ·
1
x/U
1+ 2Q

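Note that $1 + (x/U)/2Q \le 3/2$. Proceeding as in (6.23)–(6.31), we obtain that this is at most
\[
\frac{2x}{\sqrt{|\delta|\phi(q)}}\sqrt{\left(\log\frac{x}{UW_0}+\Phi\right)\left(\kappa_6\log\frac{x}{UW_0}+2\kappa_7\right)},
\]
where
\[
\Phi=\begin{cases}
\log\frac{(1+\epsilon_1)|\delta q|}{4}\,\log\left(1+\frac{\log x/UV}{\log\frac{4V}{|\delta|(1+\epsilon_1)q}}\right) & \text{if } |\delta q|\le V/\theta,\\[6pt]
\log\frac{3|\delta q|}{8}\,\left(\log\log\frac{8x}{3U|\delta q|}-\log\log\frac{8\theta}{3}\right) & \text{if } V/\theta<|\delta q|\le x/\theta U,
\end{cases}
\tag{6.34}
\]
where $\epsilon_1 = x/2UQ$. This is what we think of as the main term.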


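By (6.18), the contribution of the term $W^2/2$ to (6.33) is at most
\[
4\sqrt{\frac{q}{\phi(q)}}\int_{W_0}^{x/U}\sqrt{\frac{\kappa_1 x}{2}}\,\frac{dW}{\sqrt{W}}\cdot\max_{W_0\le W\le \frac{x}{U}}\sqrt{\frac{\log W}{\log\frac{8W}{3|\delta q|}}}. \tag{6.35}
\]
Since $t \mapsto (\log t)/(\log t/c)$ is decreasing for $t > c$, (6.35) is at most
\[
4\sqrt{\frac{q}{\phi(q)}}\,\sqrt{2\kappa_1}\left(\frac{x}{\sqrt{U}}-\sqrt{xW_0}\right)\sqrt{\frac{\log W_0}{\log\frac{8W_0}{3|\delta q|}}}. \tag{6.36}
\]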

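If $W_0 > V$, we also have to consider the range $V \le W < W_0$. By Prop. 5.2.4 and (6.19), the part of (6.15) coming from this is
\[
4\int_V^{\theta|\delta q|}\sqrt{S_1(U,W)\cdot(\log W)\left(\frac{Wx}{2|\delta q|}+\frac{W^2}{4}+\frac{Wx}{16(1-\rho)Q}+\frac{x}{8(1-\rho)}\right)}\;\frac{dW}{W}.
\]
The contribution of $W^2/4$ is at most
\[
4\int_V^{W_0}\sqrt{\kappa_1\frac{x}{W}\log W\cdot\frac{W^2}{4}}\;\frac{dW}{W}\le 4\sqrt{\kappa_1}\cdot\sqrt{xW_0}\cdot\sqrt{\log W_0};
\]
the sum of this and (6.36) is at most
\[
\begin{aligned}
&4\sqrt{\kappa_1}\left(\sqrt{\frac{2q}{\phi(q)}}\left(\frac{x}{\sqrt U}-\sqrt{xW_0}\right)\sqrt{\frac{\log W_0}{\log\frac{8\theta}{3}}}+\sqrt{xW_0}\sqrt{\log W_0}\right)\\
&\quad\le \kappa_2\sqrt{\frac{q}{\phi(q)}}\cdot\frac{x}{\sqrt U}\sqrt{\log W_0},
\end{aligned}
\]
where we use the facts that $W_0 = \theta|\delta q|$ (by $W_0 > V$) and $\theta \ge 3e^2/8$, and where we recall that $\kappa_2 = 4\sqrt{\kappa_1}$.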
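The terms $Wx/2|\delta|q$ and $Wx/(16(1-\rho)Q)$ contribute at most
\[
\begin{aligned}
&4\int_V^{\theta|\delta q|}\sqrt{\kappa_1\frac{x}{W}\cdot(\log W)\,W\left(\frac{x}{2|\delta q|}+\frac{x}{16(1-\rho)Q}\right)}\;\frac{dW}{W}\\
&\quad\le \kappa_2 x\left(\frac{1}{\sqrt{2|\delta|q}}+\frac{1}{4\sqrt{(1-\rho)Q}}\right)\int_V^{\theta|\delta q|}\sqrt{\log W}\,\frac{dW}{W}\\
&\quad= \frac{2\kappa_2}{3}\,x\left(\frac{1}{\sqrt{2|\delta|q}}+\frac{1}{4\sqrt{(1-\rho)Q}}\right)\left((\log\theta|\delta|q)^{3/2}-(\log V)^{3/2}\right).
\end{aligned}
\]
The term $x/8(1-\rho)$ contributes
\[
\begin{aligned}
\sqrt{2\kappa_1}\,x\int_V^{\theta|\delta q|}\sqrt{\frac{\log W}{W(1-\rho)}}\;\frac{dW}{W}
&\le \frac{\sqrt{2\kappa_1}\,x}{\sqrt{1-\rho}}\int_V^\infty\frac{\sqrt{\log W}}{W^{3/2}}\,dW\\
&\le \frac{\kappa_2 x}{\sqrt{2(1-\rho)V}}\left(\sqrt{\log V}+\sqrt{1/\log V}\right),
\end{aligned}
\]
where we use the estimate
\[
\begin{aligned}
\int_V^\infty\frac{\sqrt{\log W}}{W^{3/2}}\,dW
&= \frac{1}{\sqrt V}\int_1^\infty\frac{\sqrt{\log u+\log V}}{u^{3/2}}\,du\\
&\le \frac{1}{\sqrt V}\int_1^\infty\frac{\sqrt{\log V}}{u^{3/2}}\,du+\frac{1}{\sqrt V}\int_1^\infty\frac{1}{2\sqrt{\log V}}\frac{\log u}{u^{3/2}}\,du\\
&= 2\frac{\sqrt{\log V}}{\sqrt V}+\frac{1}{2\sqrt{V\log V}}\cdot 4\le\frac{2}{\sqrt V}\left(\sqrt{\log V}+\sqrt{1/\log V}\right).
\end{aligned}
\]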

***

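It is time to collect all type II terms. Let us start with the case of general $\delta$. We will set $\theta \ge e$ later. If $q \le V/2\theta$, then $|S_{II}|$ is at most
\[
\begin{aligned}
&\frac{x}{\sqrt{2\phi(q)}}\cdot\sqrt{\left(\log\frac{x}{UV}+\log 2q\,\log\left(1+\frac{\log x/UV}{\log V/2q}\right)\right)\left(\kappa_6\log\frac{x}{UV}+2\kappa_7\right)}\\
&\quad+\sqrt2\,\kappa_2\sqrt{\frac{q}{\phi(q)}}\left(1+1.15\sqrt{\frac{\log 2q}{\log x/2Uq}}\right)\frac{x}{\sqrt U}+\kappa_9\frac{x}{\sqrt V}.
\end{aligned}
\tag{6.37}
\]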

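If $V/2\theta < q \le x/2\theta U$, then $|S_{II}|$ is at most
\[
\begin{aligned}
&\frac{x}{\sqrt{2\phi(q)}}\cdot\sqrt{\left(\log\frac{x}{U\cdot 2\theta q}+\log 2q\,\log\frac{\log x/2Uq}{\log\theta}\right)\left(\kappa_6\log\frac{x}{U\cdot 2\theta q}+2\kappa_7\right)}\\
&\quad+\sqrt2\,\kappa_2\sqrt{\frac{q}{\phi(q)}}\left(1+1.15\sqrt{\frac{\log 2q}{\log x/2Uq}}\right)\frac{x}{\sqrt U}+\left(\kappa_2\sqrt{\log 2\theta q}+\kappa_9\right)\frac{x}{\sqrt V}\\
&\quad+\frac{\kappa_2}{6}\left((\log 2\theta q)^{3/2}-(\log V)^{3/2}\right)\frac{x}{\sqrt q}\\
&\quad+\kappa_2\left(\sqrt{2\theta\cdot\log 2\theta q}+\frac23\left((\log 2\theta q)^{3/2}-(\log V)^{3/2}\right)\right)\sqrt{qx},
\end{aligned}
\tag{6.38}
\]
where we use the fact that $Q \ge x/U$ (implying that $\rho_0 = \max(1, 2q/Q)$ equals 1 for $q \le x/2U$). Finally, if $q > x/2\theta U$,
\[
\begin{aligned}
|S_{II}|\le{}&\left(\kappa_2\sqrt{2\log x/U}+\kappa_9\right)\frac{x}{\sqrt V}+\kappa_2\sqrt{\log x/U}\,\frac{x}{\sqrt U}\\
&+\frac{2\kappa_2}{3}\left((\log x/U)^{3/2}-(\log V)^{3/2}\right)\left(\frac{x}{2\sqrt{2q}}+\sqrt{qx}\right).
\end{aligned}
\tag{6.39}
\]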

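Now let us examine the alternative bounds for $|\delta| \ge 8$. Here we assume $\theta \ge 3e^2/8$. If $|\delta q| \le V/\theta$, then $|S_{II}|$ is at most
\[
\begin{aligned}
&\frac{2x}{\sqrt{|\delta|\phi(q)}}\sqrt{\log\frac{x}{UV}+\log\frac{|\delta q|(1+\epsilon_1)}{4}\,\log\left(1+\frac{\log x/UV}{\log\frac{4V}{|\delta|(1+\epsilon_1)q}}\right)}\cdot\sqrt{\kappa_6\log\frac{x}{UV}+2\kappa_7}\\
&\quad+\kappa_2\sqrt{\frac{2q}{\phi(q)}}\cdot\sqrt{\frac{\log V}{\log 2V/|\delta q|}}\cdot\frac{x}{\sqrt U}+\kappa_9\frac{x}{\sqrt V},
\end{aligned}
\tag{6.40}
\]
where $\epsilon_1 = x/2UQ$. If $V/\theta < |\delta|q \le x/\theta U$, then $|S_{II}|$ is at most
\[
\begin{aligned}
&\frac{2x}{\sqrt{|\delta|\phi(q)}}\sqrt{\left(\log\frac{x}{U\cdot\theta|\delta|q}+\log\frac{3|\delta q|}{8}\,\log\frac{\log\frac{8x}{3U|\delta q|}}{\log 8\theta/3}\right)\left(\kappa_6\log\frac{x}{U\cdot\theta|\delta q|}+2\kappa_7\right)}\\
&\quad+\frac{2\kappa_2}{3}\left(\frac{x}{\sqrt{2|\delta q|}}+\frac{x}{4\sqrt{Q-q}}\right)\left((\log\theta|\delta q|)^{3/2}-(\log V)^{3/2}\right)\\
&\quad+\left(\frac{\kappa_2}{\sqrt{2(1-\rho)}}\left(\sqrt{\log V}+\sqrt{1/\log V}\right)+\kappa_9\right)\frac{x}{\sqrt V}\\
&\quad+\kappa_2\sqrt{\frac{q}{\phi(q)}}\cdot\sqrt{\log\theta|\delta q|}\cdot\frac{x}{\sqrt U},
\end{aligned}
\tag{6.41}
\]
where $\rho = q/Q$. Note that $|\delta| \le x/Qq$ implies $\rho \le x/Q^2$, and so $\rho$ will be very small and $Q - q$ will be very close to $Q$.
The case $|\delta q| > x/\theta U$ will not arise in practice, essentially because of $|\delta|q \le x/Q$.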

6.3 Adjusting parameters. Calculations.


We must bound the exponential sum $\sum_n \Lambda(n)e(\alpha n)\eta(n/x)$. By (3.8), it is enough to sum the bounds we obtained in §6.2. We will now see how it will be best to set $U$, $V$ and other parameters.
Usually, the largest terms will be
\[
C_0UV, \tag{6.42}
\]
where $C_0$ equals
\[
\begin{cases}
c_{4,I_2}+c_{9,I_2}=4.39779+5.21993\,\epsilon_0 & \text{if } |\delta|\le 1/2c_2\sim 0.74463,\\
c_{4,I_2}+(1+\epsilon)c_{13,I_2}=(4.89106+1.31541\,\epsilon)(1+\epsilon_0) & \text{if } |\delta|>1/2c_2
\end{cases}
\tag{6.43}
\]
(from (6.13) and (6.14), type I; we will specify $\epsilon$ and $\epsilon_0 = (4\log 2)/(x/UV)$ later) and
\[
\frac{x}{\sqrt{\delta_0\phi(q)}}\sqrt{\log\frac{x}{UV}+\left(\log\delta_0(1+\epsilon_1)q\right)\log\left(1+\frac{\log\frac{x}{UV}}{\log\frac{V}{\delta_0(1+\epsilon_1)q}}\right)}\cdot\sqrt{\kappa_6\log\frac{x}{UV}+2\kappa_7}
\tag{6.44}
\]
(from (6.37) and (6.40), type II; here $\delta_0 = \max(2, |\delta|/4)$, while $\epsilon_1 = x/2UQ$ for $|\delta| > 8$ and $\epsilon_1 = 0$ for $|\delta| < 8$).

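We set $UV = \kappa x/\sqrt{q\delta_0}$; we must choose $\kappa > 0$.
Let us first optimize (or, rather, almost optimize) $\kappa$ in the case $|\delta| \le 4$, so that $\delta_0 = 2$ and $\epsilon_1 = 0$. For the purpose of choosing $\kappa$, we replace $\sqrt{\phi(q)}$ by $\sqrt{q}/C_1$, where $C_1 = 2.3536 \sim \sqrt{510510/\phi(510510)}$, and also replace $V$ by $q^2/c$, $c$ a constant. We use the approximation
\[
\begin{aligned}
\log\left(1+\frac{\log\frac{x}{UV}}{\log\frac{V}{2q}}\right)
&=\log\left(1+\frac{\log(\sqrt{2q}/\kappa)}{\log(q/2c)}\right)=\log\left(\frac32+\frac{\log 2\sqrt c/\kappa}{\log q/2c}\right)\\
&\sim\log\frac32+\frac{2\log 2\sqrt c/\kappa}{3\log q/2c}.
\end{aligned}
\]
What we must minimize, then, is
\[
\begin{aligned}
&\frac{C_0\kappa}{\sqrt{2q}}+\frac{C_1}{\sqrt{2q}}\sqrt{\left(\log\frac{\sqrt{2q}}{\kappa}+\log 2q\left(\log\frac32+\frac{2\log\frac{2\sqrt c}{\kappa}}{3\log\frac{q}{2c}}\right)\right)\left(\kappa_6\log\frac{\sqrt{2q}}{\kappa}+2\kappa_7\right)}\\
&\quad\le\frac{C_0\kappa}{\sqrt{2q}}+\frac{C_1}{2\sqrt q}\sqrt{\frac{\kappa_6}{\kappa_1'}}\sqrt{\kappa_1'\log q-\left(\frac53+\frac23\frac{\log 4c}{\log\frac{q}{2c}}\right)\log\kappa+\kappa_2'}\\
&\qquad\cdot\sqrt{\kappa_1'\log q-2\kappa_1'\log\kappa+\frac{4\kappa_1'\kappa_7}{\kappa_6}+\kappa_1'\log 2}\\
&\quad\le\frac{C_0}{\sqrt{2q}}\left(\kappa+\kappa_4'\left(\kappa_1'\log q-\left(\left(\frac56+\kappa_1'\right)+\frac13\frac{\log 4c}{\log\frac{q}{2c}}\right)\log\kappa+\kappa_3'\right)\right),
\end{aligned}
\tag{6.45}
\]
where
\[
\begin{aligned}
\kappa_1'&=\frac12+\log\frac32,\qquad \kappa_2'=\log\sqrt2+\log 2\log\frac32+\frac{\log 4c\,\log 2q}{3\log q/2c},\\
\kappa_3'&=\frac12\left(\kappa_2'+\frac{4\kappa_1'\kappa_7}{\kappa_6}+\kappa_1'\log 2\right)=\frac{\log 4c}{6}+\frac{(\log 4c)^2}{6\log\frac{q}{2c}}+\kappa_5',\\
\kappa_4'&=\frac{C_1}{C_0}\sqrt{\frac{\kappa_6}{2\kappa_1'}}\sim
\begin{cases}
\dfrac{0.30915}{1+1.18694\,\epsilon_0} & \text{if } |\delta|\le 4,\\[8pt]
\dfrac{0.27797}{(1+0.26894\,\epsilon)(1+\epsilon_0)} & \text{if } |\delta|>4,
\end{cases}\\
\kappa_5'&=\frac12\left(\log\sqrt2+\log 2\log\frac32+\frac{4\kappa_1'\kappa_7}{\kappa_6}+\kappa_1'\log 2\right)\sim 1.01152.
\end{aligned}
\]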
Taking derivatives, we see that the minimum is attained when
\[
\kappa=\left(\frac56+\kappa_1'+\frac13\frac{\log 4c}{\log\frac{q}{2c}}\right)\kappa_4'\sim\left(1.7388+\frac{\log 4c}{3\log\frac{q}{2c}}\right)\cdot\frac{0.30915}{1+1.19\,\epsilon_0}
\tag{6.46}
\]
provided that $|\delta| \le 4$. (What we obtain for $|\delta| > 4$ is essentially the same, only with $\delta_0 q = \delta q/4$ instead of $2q$ and $0.27797/((1 + 0.27\epsilon)(1 + \epsilon_0))$ in place of $0.30915$.) For $q = 5\cdot 10^5$, $c = 2.5$ and $|\delta| \le 4$ (typical values in the most delicate range), we get that $\kappa$ should be about $0.5582/(1 + 1.19\epsilon_0)$. Values of $q$, $c$ nearby give similar values for $\kappa$, whether $|\delta| \le 4$ or $|\delta| > 4$.
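The numerical value quoted for $\kappa$ is easy to reproduce. The following sketch evaluates (6.46) with $\epsilon_0$ set to $0$; the function names are ours, and the constant $0.30915$ is taken from the text rather than recomputed.

```python
import math

KAPPA1_PRIME = 0.5 + math.log(1.5)   # kappa_1' = 1/2 + log(3/2)
KAPPA4_PRIME = 0.30915               # value quoted in the text for |delta| <= 4, eps_0 = 0

def slope(q, c):
    """Coefficient of -log(kappa) in the last line of (6.45)."""
    return 5 / 6 + KAPPA1_PRIME + math.log(4 * c) / (3 * math.log(q / (2 * c)))

def kappa_opt(q, c):
    """Minimiser of kappa - kappa_4' * slope * log(kappa), i.e. (6.46) with eps_0 = 0."""
    return slope(q, c) * KAPPA4_PRIME

def objective(kappa, q, c):
    """The kappa-dependent part of the bracket in (6.45); constants are dropped."""
    return kappa - KAPPA4_PRIME * slope(q, c) * math.log(kappa)

if __name__ == "__main__":
    print(kappa_opt(5e5, 2.5))
```

Since the objective is of the form $\kappa - a\log\kappa$ with $a > 0$, it is convex and flat near its minimum, which is consistent with the remark that rounding $\kappa$ to $1/2$ costs very little.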
(Incidentally, at this point, we could already give a back-of-the-envelope estimate
for the last line of (6.45), i.e., our main term. It suggests that choosing w = 1 instead
of w = 2 would have given bounds worse by about 15 percent.)
We make the choice
\[
\kappa=1/2,\qquad\text{and so}\qquad UV=\frac{x}{2\sqrt{q\delta_0}}
\]
for the sake of simplicity. (Unsurprisingly, (6.45) changes very slowly around its minimum.) Note, by the way, that this means that $\epsilon_0 = (2\log 2)/\sqrt{q\delta_0}$.
Now we must decide how to choose $U$, $V$ and $Q$, given our choice of $UV$. We will actually make two sets of choices.

First, we will use the $S_{I,2}$ estimates for $q \le Q/V$ to treat all $\alpha$ of the form $\alpha = a/q + O^*(1/qQ)$, $q \le y$. (Here $y$ is a parameter satisfying $y \le Q/V$.)
Then, the remaining $\alpha$ will get treated with the (coarser) $S_{I,2}$ estimate for $q > Q/V$, with $Q$ reset to a lower value (call it $Q'$). If $\alpha$ was not treated in the first go (so that it must be dealt with by the coarser estimate) then $\alpha = a'/q' + \delta'/x$, where either $q' > y$ or $\delta'q' > x/Q$. (Otherwise, $\alpha = a'/q' + O^*(1/q'Q)$ would be a valid estimate with $q' \le y$.) The value of $Q'$ is set to be smaller than $Q$ both because this is helpful (it diminishes error terms that would be large for large $q$) and because this is harmless (since we are no longer assuming that $q \le Q/V$).

6.3.1 First choice of parameters: q ≤ y


The largest items affected strongly by our choices at this point are
\[
\begin{aligned}
&c_{16,I_2}\left(2+\frac{1+\epsilon}{\epsilon}\log^+\frac{2UV|\delta|q}{x}\right)\frac{x}{Q/V}+c_{17,I_2}Q &&(\text{from } S_{I,2},\ |\delta|>1/2c_2),\\
&\left(c_{10,I_2}\log\frac{U}{q}+2c_{5,I_2}+c_{12,I_2}\right)Q &&(\text{from } S_{I,2},\ |\delta|\le 1/2c_2),
\end{aligned}
\tag{6.47}
\]
and
\[
\kappa_2\sqrt{\frac{2q}{\phi(q)}}\left(1+1.15\sqrt{\frac{\log 2q}{\log x/2Uq}}\right)\frac{x}{\sqrt U}+\kappa_9\frac{x}{\sqrt V}\qquad(\text{from } S_{II},\ \text{any } |\delta|),
\tag{6.48}
\]
with
\[
\kappa_2\sqrt{\frac{2q}{\phi(q)}}\cdot\sqrt{\frac{\log V}{\log 2V/|\delta q|}}\cdot\frac{x}{\sqrt U}\qquad(\text{from } S_{II})
\]
as an alternative to (6.48) for $|\delta| \ge 8$. (In several of these expressions, we are applying some minor simplifications that our later choices will justify. Of course, even if these simplifications were not justified, we would not be getting incorrect results, only potentially suboptimal ones; we are trying to decide how to choose certain parameters.) In addition, we have a relatively mild but important dependence on $V$ in the main term (6.44), even when we hold $UV$ constant (as we do, in so far as we have already chosen $UV$). We must also respect the condition $q \le Q/V$, the lower bound on $U$ given by (6.17), and the assumptions made at the beginning of the chapter (e.g. $Q \ge x/U$, $V \ge 2\cdot 10^6$). Recall that $UV = x/2\sqrt{q\delta_0}$.
We set
\[
Q=\frac{x}{8y},
\]
since we will then have not just $q \le y$ but also $q|\delta| \le x/Q = 8y$, and so $q\delta_0 \le 2y$. We want $q \le Q/V$ to be true whenever $q \le y$; this means that
\[
q\le\frac{Q}{V}=\frac{QU}{UV}=\frac{QU}{x/2\sqrt{q\delta_0}}=\frac{U\sqrt{q\delta_0}}{4y}
\]
must be true when $q \le y$, and so it is enough to set $U = 4y^2/\sqrt{q\delta_0}$. The following choices make sense: we will work with the parameters
\[
\begin{aligned}
y&=\frac{x^{1/3}}{6},\qquad Q=\frac{x}{8y}=\frac34x^{2/3},\qquad x/UV=2\sqrt{q\delta_0}\le 2\sqrt{2y},\\
U&=\frac{4y^2}{\sqrt{q\delta_0}}=\frac{x^{2/3}}{9\sqrt{q\delta_0}},\qquad V=\frac{x}{(x/UV)\cdot U}=\frac{x}{8y^2}=\frac{9x^{1/3}}{2},
\end{aligned}
\tag{6.49}
\]
where, as before, $\delta_0 = \max(2, |\delta|/4)$. So, for instance, we obtain $\epsilon_1 \le x/2UQ = 6\sqrt{q\delta_0}/x^{1/3} \le 2\sqrt3/x^{1/6}$. Assuming
\[
x\ge 2.16\cdot 10^{20}, \tag{6.50}
\]
we obtain that $U/(x/UV) \ge (x^{2/3}/9\sqrt{q\delta_0})/(2\sqrt{q\delta_0}) = x^{2/3}/18q\delta_0 \ge x^{1/3}/6 \ge 10^6$, and so (6.17) holds. We also get that $\epsilon_1 \le 0.002$.
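Since $V = x/8y^2 = (9/2)x^{1/3}$, (6.50) also implies that $V \ge 2\cdot 10^6$ (in fact, $V \ge 27\cdot 10^6$). It is easy to check that
\[
V<x/4,\qquad UV\le x,\qquad Q\ge\max(16,2\sqrt x),\qquad Q\ge\max(2U,x/U), \tag{6.51}
\]
as stated at the beginning of the chapter. Let $\theta = (3/2)^3 = 27/8$. Then
\[
\begin{aligned}
\frac{V}{2\theta q}&=\frac{x/8y^2}{2\theta q}\ge\frac{x}{16\theta y^3}=\frac{x}{54y^3}=4>1,\\
\frac{V}{\theta|\delta q|}&\ge\frac{x/8y^2}{8\theta y}=\frac{x}{64\theta y^3}=\frac{x}{216y^3}=1.
\end{aligned}
\tag{6.52}
\]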
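The relations among the parameters in (6.49) through (6.52) are simple enough to check mechanically. Here is a small sketch; the function name `first_choice` and the dictionary layout are ours, and the sample values of $q$ and $\delta$ are arbitrary.

```python
import math

def first_choice(x, q, delta):
    """Parameters of (6.49) for given x, q and delta (first choice, q <= y)."""
    delta0 = max(2.0, abs(delta) / 4)
    y = x ** (1 / 3) / 6
    Q = x / (8 * y)
    U = 4 * y * y / math.sqrt(q * delta0)
    V = x / (8 * y * y)
    eps1 = x / (2 * U * Q)          # upper bound for epsilon_1
    return {"delta0": delta0, "y": y, "Q": Q, "U": U, "V": V, "eps1": eps1}

if __name__ == "__main__":
    x0 = 2.16e20
    p = first_choice(x0, q=1000, delta=0.0)
    theta = 27 / 8
    print(p)
    print("Q/V      =", p["Q"] / p["V"], "(should equal y)")
    print("V/(2θy)  =", p["V"] / (2 * theta * p["y"]), "(should be 4)")
    print("V/(8θy)  =", p["V"] / (8 * theta * p["y"]), "(should be 1)")
```

At $x = 2.16\cdot 10^{20}$ one has $x^{1/3} = 6\cdot 10^6$, so $y = 10^6$ and $V = 27\cdot 10^6$, which is where the numerical thresholds in the text come from.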
The first type I bound is
   
4 q
c0 
 
x 5 φ(q)  1p
 c q
4,I 
|SI,1 | ≤ min 1, 02 

min  , 1 log 9x qδ + c +
3
0 3,I
 2
q δ x3 /9 φ(q) 
log+

5 1
q2 δ02

c10,I x1/3
 
y 1/3
p
+ c7,I log + c8,I log x y + 1 (log 9x eqδ0 )
c2 34 22 q 3/2 δ02

2x2/3 x5/3
 2/3


x 2 ex c10,I
+ c5,I log √ + c6,I log √ √ + c9,I x log + ,
9c2 qδ0 9 qδ0 9 qδ0 c2 e
(6.53)
where the constants are as in §6.2.1. For any $c, R \ge 1$, the function
\[
x\mapsto(\log cx)/(\log x/R)
\]
attains its maximum on $[R', \infty]$, $R' > R$, at $x = R'$. Hence, for $q\delta_0$ fixed,
\[
\min\left(\frac{4/5}{\log^+\frac{4x^{2/3}}{9(\delta_0q)^{5/2}}},1\right)\log 9x^{1/3}\sqrt{q\delta_0}+c_{3,I} \tag{6.54}
\]
attains its maximum for $x \in [(9e^{4/5}(\delta_0 q)^{5/2}/4)^{3/2}, \infty)$ at
\[
x=\left(9e^{4/5}(\delta_0q)^{5/2}/4\right)^{3/2}=(27/8)e^{6/5}(q\delta_0)^{15/4}. \tag{6.55}
\]
Now, notice that, for smaller values of $x$, (6.54) increases as $x$ increases, since the term $\min(\ldots, 1)$ equals the constant 1. Hence, (6.54) attains its maximum for $x \in (0, \infty)$ at (6.55), and so
\[
\begin{aligned}
&\min\left(\frac{4/5}{\log^+\frac{4x^{2/3}}{9(\delta_0q)^{5/2}}},1\right)\log 9x^{1/3}\sqrt{q\delta_0}+c_{3,I}+c_{4,I}\\
&\quad\le\log\frac{27}{2}e^{2/5}(\delta_0q)^{7/4}+c_{3,I}+c_{4,I}\le\frac74\log\delta_0q+6.11676.
\end{aligned}
\]
Examining the other terms in (6.53) and using (6.50), we conclude that
\[
\begin{aligned}
|S_{I,1}|\le{}&\frac{x}{q}\min\left(1,\frac{c_0'}{\delta^2}\right)\cdot\frac{q}{\phi(q)}\left(\frac74\log\delta_0q+6.11676\right)\\
&+\frac{x^{2/3}}{\sqrt{q\delta_0}}(0.67845\log x-1.20818)+0.37864x^{2/3},
\end{aligned}
\tag{6.56}
\]
where we are using (6.50) (and, of course, the trivial bound $\delta_0 q \ge 2$) to simplify the smaller error terms. We recall that $c_0' = 0.798437 > c_0/(2\pi)^2$.
Let us now consider SI,2 . The terms that appear both for |δ| small and |δ| large are
given in (6.12). The second line in (6.12) equals

2U V 2 qV 2 x2/3 9x1/3
   
x c10,I q
c8,I2 2
+ + + √ + log
4q δ0 x x 2 2 qδ0 18qδ0 2
1/3
   1/6 2/3
 
x 9x 27 c10,I y x 1 9
≤ c8,I2 + √ + + + log x + log
4q 2 δ0 2 2 8 2 23/2 18qδ0 3 2
2/3 √
x x
≤ 0.29315 2 + (0.08679 log x + 0.39161) + 0.00153 x,
q δ0 qδ0

where we are using (6.50) to simplify. Now
\[
\min\left(\frac{4/5}{\log^+\frac{Q}{4Vq^2}},1\right)\log Vq=\min\left(\frac{4/5}{\log^+\frac{y}{4q^2}},1\right)\log\frac{9x^{1/3}q}{2} \tag{6.57}
\]
can be bounded trivially by $\log(9x^{1/3}q/2) \le (2/3)\log x + \log 3/4$. We can also bound (6.57) as we bounded (6.54) before, namely, by fixing $q$ and finding the maximum for $x$ variable. In this way, we obtain that (6.57) is maximal for $y = 4e^{4/5}q^2$; since, by definition, $x^{1/3}/6 = y$, (6.57) then equals
\[
\log\frac{9(6\cdot 4e^{4/5}q^2)q}{2}=3\log q+\log 108+\frac45\le 3\log q+5.48214.
\]
We conclude that (6.12) is at most
\[
\min\left(1,\frac{4c_0'}{\delta^2}\right)\cdot\left(\frac32\log q+2.74107\right)\frac{x}{\phi(q)}+0.29315\frac{x}{q^2\delta_0}+(0.0434\log x+0.1959)x^{2/3}. \tag{6.58}
\]

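If $|\delta| \le 1/2c_2$, we must consider (6.13). This is at most
\[
\begin{aligned}
&(c_{4,I_2}+c_{9,I_2})\frac{x}{2\sqrt{q\delta_0}}+\left(c_{10,I_2}\log\frac{x^{2/3}}{9q^{3/2}\sqrt{\delta_0}}+2c_{5,I_2}+c_{12,I_2}\right)\cdot\frac34x^{2/3}\\
&\quad\le\frac{2.1989x}{\sqrt{q\delta_0}}+\frac{3.61818x}{q\delta_0}+(1.77019\log x+29.2955)x^{2/3},
\end{aligned}
\]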

where we recall that 0 = (4 log 2)/(x/U
p V )√= (2 log 2)/ qδ0 , which can be bounded

crudely by 2 log √ 2. (Thus, c10,I2 ≤ 1 + 8 log 2·1.78783 < 3.54037 and c12,I2 ≤
29.3333 + 11.902 2 log 2 ≤ 41.0004.)
If |δ| > 1/2c2 , we must consider (6.14) instead. For  = 0.07, that is at most
 
x 2 log 2
(c4,I2 + (1 + )c13,I2 ) √ 1+ √
2 qδ0 qδ0
 
2 log 2 x
+ (3.38845 1 + √ log δq 3 + 20.8823)
qδ0 |δ|q
   
4 log 2
+ 68.8133 1 + √ log |δ|q + 72.0828 x2/3 + 60.4141x1/3
qδ0
 
x 2 log 2 x
= 2.49157 √ 1+ √ + (3.38845 log δq 3 + 32.6771)
qδ0 qδ0 |δ|q
 
log |δ|q 2
+ 22.9378 log x + 190.791 √ + 130.691 x 3
qδ0
x x
≤ 2.49157 √ + (3.59676 log δ0 + 27.3032 log q + 91.2218)
qδ0 qδ0
2
+ (22.9378 log x + 411.228)x 3 ,

where, besides the crude bound 0 ≤ 2 log 2, we use the inequalities

log |δ|q log 4qδ0 log 8 log q 1 log q 1 log e2 2
√ ≤ √ ≤ √ , √ ≤√ √ ≤√ = ,
qδ0 qδ0 2 qδ0 2 q 2 e e
1 4c2 log |δ| 2 log δ0
≤ , ≤ · .
|δ| δ0 |δ| e log 2 δ0

(Obviously, 1/|δ| ≤ 4c2 /δ0 is based on the assumption |δ| > 1/2c2 and on the inequal-
ity 16c2 ≥ 1. The bound on (log |δ|)/|δ| is based on the fact that (log t)/t reaches its
maximum at t = e, and (log δ0 )/δ0 = (log 2)/2 for |δ| ≤ 8.)

We sum (6.58) and whichever one of our bounds for (6.13) and (6.14) is greater (namely, the latter). We obtain that, for any $\delta$,
\[
\begin{aligned}
|S_{I,2}|\le{}&2.49157\frac{x}{\sqrt{q\delta_0}}+\min\left(1,\frac{4c_0'}{\delta^2}\right)\cdot\left(\frac32\log q+2.74107\right)\frac{x}{\phi(q)}\\
&+(3.59676\log\delta_0+27.3032\log q+91.515)\frac{x}{q\delta_0}+(22.9812\log x+411.424)x^{2/3},
\end{aligned}
\tag{6.59}
\]
where we bound one of the lower-order terms in (6.58) by $x/q^2\delta_0 \le x/q\delta_0$.
For type II, we have to consider two cases: (a) |δ| < 8, and (b) |δ| ≥ 8. Consider
first |δ| < 8. Then δ0 = 2. Recall that θ = 27/8. We have q ≤ V /2θ and |δq| ≤ V /θ
thanks to (6.52). We apply (6.37), and obtain that, for |δ| < 8,

v !
1
u
x 2 log 4qδ0
u1
|SII | ≤ p · t log 4qδ0 + log 2q log 1 + V
2φ(q) 2 log 2q
p
· 0.30214 log 4qδ0 + 0.2562
 
r v
q  u log 2q 
u
1/4 2/3
+ 8.22088 1 + 1.15t 1/3
√  (qδ0 ) x + 1.84251x5/6
φ(q) log 9x 2√q δ0
r
x log 8q p
≤p · Cx,2q log 2q + · 0.30214 log 2q + 0.67506
2φ(q) 2
r
q 3/4
+ 16.406 x + 1.84251x5/6
φ(q)
(6.60)
where we bound

x1/3 x1/3
log 2q log 3√ log 3√
√ ≤ < lim = 2,
9x1/3
√ δ0
1/6 2
9x√ x→∞ 1/6 2
9x√
log 2 q log log
2 1/6 2 1/6

and where we define
\[
C_{x,t}:=\log\left(1+\frac{\log 4t}{2\log\frac{9x^{1/3}}{2.004t}}\right)
\]
for $0 < t < 9x^{1/3}/2$. (We have 2.004 here instead of 2 because we want a constant $\ge 2(1 + \epsilon_1)$ in later occurrences of $C_{x,t}$, for reasons that will soon become clear.)
For purposes of later comparison, we remark that $16.406 \le 1.57863x^{4/5-3/4}$ for $x \ge 2.16\cdot 10^{20}$.
Consider now case (b), namely, |δ| ≥ 8. Then δ0 = |δ|/4. By (6.52), |δq| ≤ V /θ.

Hence, (6.40) gives us that

v !
u
2x u1 |δq|(1 + 1 ) log |δ|q
|SII | ≤ p · t log |δq| + log log 1 + 18x1/3
|δ|φ(q) 2 4 2 log |δ|(1+ 1 )q
p
· 0.30214 log |δ|q + 0.2562
v
1/3
log 9x2
r u
q u
+ 8.22088 t · (qδ0 )1/4 x2/3 + 1.84251x5/6
φ(q) log 9x1/3
|δq|
r
x log 4δ0 q p
≤p Cx,δ0 q log δ0 (1 + 1 )q + 0.30214 log δ0 q + 0.67506
δ0 φ(q) 2
r
q 4/5
+ 1.79926 x + 1.84251x5/6 ,
φ(q)
(6.61)
since
\[
8.22088\sqrt{\frac{\log\frac{9x^{1/3}}{2}}{\log\frac{9x^{1/3}}{|\delta q|}}}\cdot(q\delta_0)^{1/4}\le 8.22088\sqrt{\frac{\log\frac{9x^{1/3}}{2}}{\log\frac{27}{4}}}\cdot(x^{1/3}/3)^{1/4}\le 1.79926x^{4/5-2/3}
\]
for $x \ge 2.16\cdot 10^{20}$. Clearly
\[
\log\delta_0(1+\epsilon_1)q=\log\delta_0q+\log(1+\epsilon_1)\le\log\delta_0q+\epsilon_1.
\]
By Lemma C.2.2, $q/\phi(q) \le z(y) = z(x^{1/3}/6)$ (since $x \ge 18^3$). It is easy to check that $x \mapsto \sqrt{z(x^{1/3}/6)}\,x^{4/5-5/6}$ is decreasing for $x \ge 2.16\cdot 10^{20}$ (in fact, for $x \ge 18^3$). Using (6.50), we conclude that $1.79926\sqrt{q/\phi(q)}\,x^{4/5} \le 0.89657x^{5/6}$ and, by the way, $16.406\sqrt{q/\phi(q)}\,x^{3/4} \le 0.78663x^{5/6}$. This allows us to simplify the last lines of (6.60) and (6.61). We obtain that, for $\delta$ arbitrary,
\[
\begin{aligned}
|S_{II}|\le{}&\frac{x}{\sqrt{\delta_0\phi(q)}}\sqrt{C_{x,\delta_0q}(\log\delta_0q+\epsilon_1)+\frac{\log 4\delta_0q}{2}}\;\sqrt{0.30214\log\delta_0q+0.67506}\\
&+2.73908x^{5/6}.
\end{aligned}
\tag{6.62}
\]
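The two numerical comparisons used to simplify (6.60) and (6.61) can be reproduced approximately as follows. This is a sketch under one assumption: we take $z(t) = e^\gamma\log\log t + 2.50637/\log\log t$, which is the form that appears later in this chapter when $z$ is expanded; the helper names are ours.

```python
import math

EULER_GAMMA = 0.5772156649015329

def z(t):
    """Assumed form of the bound z(t) >= q/phi(q) for q <= t (cf. (C.19))."""
    ll = math.log(math.log(t))
    return math.exp(EULER_GAMMA) * ll + 2.50637 / ll

def scaled_coefficient(c, exponent_gap, x):
    """c * sqrt(z(x^(1/3)/6)) * x^(-exponent_gap).

    With exponent_gap = 5/6 - 4/5 this is the constant multiplying x^(5/6)
    after bounding c * sqrt(q/phi(q)) * x^(4/5); similarly for 5/6 - 3/4.
    """
    y = x ** (1 / 3) / 6
    return c * math.sqrt(z(y)) * x ** (-exponent_gap)

if __name__ == "__main__":
    x0 = 2.16e20
    print(scaled_coefficient(1.79926, 5 / 6 - 4 / 5, x0))  # compare with 0.89657
    print(scaled_coefficient(16.406, 5 / 6 - 3 / 4, x0))   # compare with 0.78663
```

Floating-point evaluation only agrees with the quoted constants up to rounding in the last digit, so this is a plausibility check rather than a rigorous verification; the text's own computations use interval arithmetic.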
It is time to sum up $S_{I,1}$, $S_{I,2}$ and $S_{II}$. The main terms come from the first line of (6.62) and the first term of (6.59). Lesser-order terms can be dealt with roughly: we bound $\min(1, c_0'/\delta^2)$ and $\min(1, 4c_0'/\delta^2)$ from above by $2/\delta_0$ (using the fact that $c_0' = 0.798437 < 16$, which implies that $8/\delta > 4c_0'/\delta^2$ for $\delta > 8$; of course, for $\delta \le 8$, we have $\min(1, 4c_0'/\delta^2) \le 1 = 2/2 = 2/\delta_0$).

The terms inversely proportional to $q$, $\phi(q)$ or $q^2$ thus add up to at most
\[
\begin{aligned}
&\frac{2x}{\delta_0q}\cdot\frac{q}{\phi(q)}\left(\frac74\log\delta_0q+6.11676\right)+\frac{2x}{\delta_0\phi(q)}\left(\frac32\log q+2.74107\right)\\
&\qquad+\frac{x}{q\delta_0}(3.59676\log\delta_0+27.3032\log q+91.515)\\
&\quad\le\frac{2x}{\delta_0\phi(q)}\left(\frac{13}{4}\log\delta_0q+7.81811\right)+\frac{2x}{\delta_0q}(13.6516\log\delta_0q+37.5415),
\end{aligned}
\]
where, for instance, we bound $(3/2)\log q + 2.74107$ by $(3/2)\log\delta_0 q + 2.74107 - (3/2)\log 2$.
As for the other terms, we use the assumption $x \ge 2.16\cdot 10^{20}$ to bound $x^{2/3}$ and $x^{2/3}\log x$ by a small constant times $x^{5/6}$. We bound $x^{2/3}/\sqrt{q\delta_0}$ by $x^{2/3}/\sqrt2$ (in (6.56)). We obtain
\[
\begin{aligned}
&\frac{x^{2/3}}{\sqrt2}(0.67845\log x-1.20818)+0.37864x^{2/3}\\
&\quad+(22.9812\log x+411.424)x^{2/3}+2.73908x^{5/6}\le 3.35531x^{5/6}.
\end{aligned}
\]

The sums $S_{0,\infty}$ and $S_{0,w}$ in (3.11) are 0 (by (6.50) and the fact that $\eta_2(t) = 0$ for $t \le 1/4$). We conclude that, for $q \le y = x^{1/3}/6$, $x \ge 2.16\cdot 10^{20}$ and $\eta = \eta_2$ as in (3.4),
\[
\begin{aligned}
|S_\eta(x,\alpha)|\le{}&|S_{I,1}|+|S_{I,2}|+|S_{II}|\\
\le{}&\frac{x}{\sqrt{\delta_0\phi(q)}}\sqrt{C_{x,\delta_0q}(\log\delta_0q+0.002)+\frac{\log 4\delta_0q}{2}}\;\sqrt{0.30214\log\delta_0q+0.67506}\\
&+\frac{2.49157x}{\sqrt{\delta_0q}}+\frac{2x}{\delta_0\phi(q)}\left(\frac{13}{4}\log\delta_0q+7.81811\right)+\frac{2x}{\delta_0q}(13.6516\log\delta_0q+37.5415)\\
&+3.35531x^{5/6},
\end{aligned}
\tag{6.63}
\]
where
\[
\delta_0=\max(2,|\delta|/4),\qquad C_{x,t}=\log\left(1+\frac{\log 4t}{2\log\frac{9x^{1/3}}{2.004t}}\right). \tag{6.64}
\]
Since $C_{x,t}$ is an increasing function of $t$ (for $x$ fixed and $t \le 9x^{1/3}/2.004$) and $\delta_0 q \le 2y$, we see that $C_{x,t} \le C_{x,2y}$. It is clear that $x \mapsto C_{x,t}$ (fixed $t$) is a decreasing function of $x$. For $x = 2.16\cdot 10^{20}$, $C_{x,2y} = 1.39942\ldots$.

6.3.2 Second choice of parameters


If, with the original choice of parameters, we obtained q > y = x1/3 /6, we now reset
our parameters (Q, U and V ). Recall that, while the value of q may now change (due to
the change in Q), we will be able to assume that either q > y or |δq| > x/(x/8y) = 8y.

We want $U/(x/UV) \ge 5\cdot 10^5$ (this is (6.17)). We also want $UV$ small. With this in mind, we let
\[
V=\frac{x^{1/3}}{3},\qquad U=500\sqrt6\,x^{1/3},\qquad Q=\frac{x}{U}=\frac{x^{2/3}}{500\sqrt6}. \tag{6.65}
\]
Then (6.17) holds (as an equality). Since we are assuming (6.50), we have $V \ge 2\cdot 10^6$. It is easy to check that (6.50) also implies that $U \le \sqrt x/2$ and $Q \ge 2\sqrt x$, and so the inequalities in (6.51) all hold.
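The claims about the second choice of parameters reduce to elementary arithmetic, as the following sketch illustrates (the function name `second_choice` is ours). Note that at $x = 2.16\cdot 10^{20}$ the conditions $V \ge 2\cdot 10^6$, $U \le \sqrt x/2$ and $Q \ge 2\sqrt x$ all hold with equality, so a floating-point comparison there is only meaningful up to rounding; the strict inequalities are checked at a larger $x$.

```python
import math

def second_choice(x):
    """Parameters of (6.65): V = x^(1/3)/3, U = 500*sqrt(6)*x^(1/3), Q = x/U."""
    V = x ** (1 / 3) / 3
    U = 500 * math.sqrt(6) * x ** (1 / 3)
    Q = x / U
    return {"U": U, "V": V, "Q": Q}

if __name__ == "__main__":
    x0 = 2.16e20
    p = second_choice(x0)
    # (6.17): U / (x / (U V)) = U^2 V / x should equal 5 * 10^5 for every x.
    print(p["U"] ** 2 * p["V"] / x0)
    # Boundary case x = x0: all three quantities are (up to rounding) equal to their thresholds.
    print(p["V"], p["U"] / (math.sqrt(x0) / 2), p["Q"] / (2 * math.sqrt(x0)))
```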
Write 2α = a/q + δ/x for the new approximation; we must have either q > y or
|δ| > 8y/q, since otherwise a/q would already be a valid approximation under the first
choice of parameters. Thus, either (a) q > y, or both (b1) |δ| > 8 and (b2) |δ|q > 8y.
Since now V = 2y, we have q > V /2θ in case (a) and |δq| > V /θ in case (b) for any
θ ≥ 1. We set θ = 4.
(Thanks to this choice of θ, we have |δq| ≤ x/Q ≤ x/θU , as we commented at the
end of §6.2.3; this will help us avoid some case-work later.)
By (6.4),

c0 √
  
x q
|SI,1 | ≤ min 1, 02 log x2/3 − log 500 6 + c3,I + c4,I
q δ φ(q)
2 2 1/2 2/3
 
Q Q U e x c
+ c7,I log + c8,I log x log c11,I Q + c10,I log √ + 10,I
c2 x 4x 500 6 e
√ 1/3 ! √
1000 6x √ √ √ 2 ex
+ c5,I log + c6,I log 500 6x4/3 · 500 6x1/3 + c9,I x log
c2 c2
0
  
x c 2 q 2.89 2/3
≤ min 1, 02 log x − 4.99944 + 1.00303 + x (log x)2 ,
q δ 3 φ(q) 1000

where we are bounding

Q Q2
c7,I log + c8,I log x log c11,I
c2 x
 
2 2 1
=c8,I (log x) − c8,I (log 1500000 − log c11,I ) − c7,I log x + c7,I log √
3 500 6c2
≤c8,I (log x)2 − 38 log x.

We are also using the assumption (6.50) repeatedly in order to show that the sum of all lower-order terms is less than $(38c_{8,I}\log x)/(500\sqrt6)$. Note that $c_{8,I}(\log x)^2Q \le 0.00289x^{2/3}(\log x)^2$.
We have $q/\phi(q) \le z(Q)$ (where $z$ is as in (C.19)) and, since $Q > \sqrt6\cdot 12\cdot 10^9$ for $x \ge 2.16\cdot 10^{20}$,
\[
\begin{aligned}
1.00303\,z(Q)&\le 1.00303\left(e^\gamma\log\log Q+\frac{2.50637}{\log\log(\sqrt6\cdot 12\cdot 10^9)}\right)\\
&\le 0.2359\log Q+0.79<0.1573\log x.
\end{aligned}
\]

(It is possible to give a much better estimation, but it is not worthwhile, since this will
be a very minor term.) We have either q > y or q|δ| > 8y; if q|δ| > 8y but q ≤ y, then
|δ| ≥ 8, and so c00 /δ 2 q < 1/8|δ|q < 1/64y < 1/y. Hence
  
x 2
|SI,1 | ≤ + 0.1573 log x + 0.00289x2/3 (log x)2
y 3
≤ 2.4719x2/3 log x + 0.00289x2/3 (log x)2 .

We bound |SI,2 | using Lemma 4.2.4. First we bound (4.50): this is at most

4c00 x1/3 q
 
x
min 1, 2 log
2q δ 3
√ √
 1

x3 1 2
(U V )2 log 1/3 2
 
1 1 3 3c4 500 6 (500 6x + 1) x log x 
3 3
+ c0 − 2  + + ,
4 π 2x 2 9 6x

where c4 = 1.03884. We bound the second line of this using (6.50). As for the first
line, we have either q ≥ y (and so the first line is at most (x/2y)(log x1/3 y/3)) or
q < y and 4c00 /δ 2 q < 1/16y < 1/y (and so the same bound applies). Hence (4.50) is
at most
 
2
3x2/3 log x − log 18 + 0.02017x2/3 log x = 2.02017x2/3 log x − 3(log 18)x2/3 .
3

Now we bound (4.51), which comes up when $|\delta| \le 1/2c_2$, where $c_2 = 6\pi/5\sqrt{c_0}$, $c_0 = 31.521$ (and so $c_2 = 0.6714769\ldots$). Since $1/2c_2 < 8$, it follows that $q > y$ (the alternative $q \le y$, $q|\delta| > 8y$ is impossible, since it implies $|\delta| > 8$). Then (4.51) is at
√  


2 c0 c1 UV c2 x log U V UV
U V log √ + Q 3 log + log
π e Q 2 Q/2
3c1 x UV 16 log 2 c0 e3 Q2 Q
+ log U V log + Q log log (6.66)
2 y c2 x/y π 4π · 8 log 2 · x 2
3c1 √ c2 x 25c0 √
+ √ x log + 2
(3c2 )1/2 x log x,
2 2c2 2 4π

where c1 = 1.000189 > 1 + (8 log 2)/(2x/U V ).
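The constants $c_2$, $1/2c_2$, $c_0'$ and $c_1$ that recur in this chapter can be cross-checked against each other. The sketch below is illustrative only; it takes $c_0 = 31.521$ and the second choice of $U$, $V$ from (6.65) as given, and the function name is ours.

```python
import math

C0 = 31.521
C2 = 6 * math.pi / (5 * math.sqrt(C0))   # c_2 = 6*pi / (5*sqrt(c_0))

def c1_lower_bound(x):
    """1 + (8 log 2) / (2x / UV), with U and V as in (6.65)."""
    U = 500 * math.sqrt(6) * x ** (1 / 3)
    V = x ** (1 / 3) / 3
    return 1 + 8 * math.log(2) / (2 * x / (U * V))

if __name__ == "__main__":
    print("c2            =", C2)                         # text: 0.6714769...
    print("1/(2 c2)      =", 1 / (2 * C2))                # text: ~0.74463
    print("c0/(2 pi)^2   =", C0 / (2 * math.pi) ** 2)     # text: < 0.798437
    print("c1 lower bound=", c1_lower_bound(2.16e20))     # text: < 1.000189
```

Since $x/UV$ grows with $x$, the quantity returned by `c1_lower_bound` decreases in $x$, so checking it at $x = 2.16\cdot 10^{20}$ covers the whole range (6.50).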


The first line of (6.66) is a linear combination of terms of the form $x^{2/3}\log Cx$, $C > 1$; using (6.50), we obtain that it is at most $1144.693x^{2/3}\log x$. (The main contribution comes from the first term.) Similarly, we can bound the first term in the second line by $33.0536x^{2/3}\log x$. Since $\log(c_0e^3Q^2/(4\pi\cdot 8\log 2\cdot x))\log Q/2$ is at most $\log x^{1/3}\log x^{2/3}$, the second term in the second line is at most $0.0006406x^{2/3}(\log x)^2$. The third line of (6.66) can be bounded easily by $0.0122x^{2/3}\log x$.
Hence, (6.66) is at most
\[
1177.76x^{2/3}\log x+0.0006406x^{2/3}(\log x)^2.
\]



If |δ| > 1/2c2 , then we know that |δq| > min(y/2c2 , 8y) = y/2c2 . Thus (4.52)
(with  = 0.01) is at most


2 c0 c1 UV
U V log √
π e
√ √ x
+1
!
e2 U V
 
2.02 c0 c1 x y/2c2 1
+ +1 ( 3.02 − 1) log √ + log U V log x
π y/2c2 2 2 y/2c2

   
3c1 1 3.03 20c0
+ + log x + (2c2 )3/2 x log x.
2 2 0.16 3π 2

Again by (6.50), and in much the same way as before, this simplifies to

≤ (1144.66 + 15.107 + 68.523)x2/3 log x + 29.136x1/2 (log x)2


≤ 1228.85x2/3 (log x).

Hence, in total and for any |δ|,

|SI,2 | ≤ 2.02017x2/3 log x + 1228.85x2/3 (log x) + 0.0006406x2/3 (log x)2


≤ 1230.9x2/3 (log x) + 0.0006406x2/3 (log x)2 .

Now we must estimate $S_{II}$. As we said before, either (a) $q > y$, or both (b1) $|\delta| > 8$ and (b2) $|\delta|q > 8y$. Recall that $\theta = 4$. In case (a), we have $q > x^{1/3}/6 = V/2 > V/2\theta$; thus, we can use (6.38), and obtain that, if $q \le x/8U$, $|S_{II}|$ is at most
\[
\begin{aligned}
&\frac{x\sqrt{z(q)}}{\sqrt{2q}}\sqrt{\left(\log\frac{x}{U\cdot 8q}+\log 2q\,\log\frac{\log x/(2Uq)}{\log 4}\right)\left(\kappa_6\log\frac{x}{U\cdot 8q}+2\kappa_7\right)}\\
&\quad+\sqrt2\,\kappa_2\sqrt{z\left(\frac{x}{8U}\right)}\left(1+1.15\sqrt{\frac{\log x/4U}{\log 4}}\right)\frac{x}{\sqrt U}+\left(\kappa_2\sqrt{\log x/U}+\kappa_9\right)\frac{x}{\sqrt V}\\
&\quad+\frac{\kappa_2}{6}\left((\log 8y)^{3/2}-(\log 2y)^{3/2}\right)\frac{x}{\sqrt y}\\
&\quad+\kappa_2\left(\sqrt{8\log x/U}+\frac23\left((\log x/U)^{3/2}-(\log V)^{3/2}\right)\right)\frac{x}{\sqrt{8U}},
\end{aligned}
\tag{6.67}
\]
where $z$ is as in (C.19). (We are already simplifying the third line; the bound given is justified by a derivative test.) It is easy to check that $q \mapsto (\log 2q)(\log\log q)/q$ is decreasing for $q \ge y$ (indeed for $q \ge 9$), and so the first line of (6.67) is maximal for $q = y$.

We can thus bound (6.67) by x5/6 times


s
t
− log 2c  κ6
   
t t 
3z(et/3 /6) − log 8c + − log 3 log 3 t − 4.214
3 3 log 4 3
√ s   s 
2
e2t/3  t − log 24c

2κ2
+ √ z 1 + 1.15 3 
6c 48c log 4
!

r
2t
+ κ2 − log 6c + κ9 3
3
 3  3 !
κ2 t 8 2 t 2 2
+√ + log − + log
6 3 6 3 6
s     32   32 !!
κ2 2t 2 2t t
+√ 8 − log 6c + − log 6c − − log 3
48c 3 3 3 3
(6.68)
where $t = \log x$ and $c = 500/\sqrt6$. Asymptotically, the largest term in (6.67) comes from the last line (of order $t^{3/2}$), even if the first line is larger in practice (while being of order at most $t\log t$). Let us bound (6.68) by a multiple of $t^{3/2}$.
First of all, notice that
 0
t 2.50637
 t/3  γ
  t/3 
z e e log 3 − log 6 + log( 3 −log 6)
t z e
d 6 6
= −
dt log t log t t(log t)2 (6.69)
γ 2.50637 γ 2.50637
e − log2 t −log 6 e + log2 t −log 6 log t − log 6
(3 ) (3 ) 3
= − · ,
(t − 3 log 6) log t t log t log t
which, for t ≥ 100, is
2·2.50637 log t
eγ log 3 − log2 ( 3t −log 6) 1.95671 − 8.92482
log t
> ≥ > 0.
t(log t)2 t(log t)2
Similarly, for t ≥ 2000,
 2t/3 
z e eγ log 32 − log2.50637 log t
2 2t −log 48c −
2.50637
d 48c (3 ) log( 2t
3 −log 48c)
>
dt log t t(log t)2
0.72216 − 5.45234
log t
≥ > 0.
t(log t)2
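Thus,
\[
\begin{aligned}
z\left(\frac{e^{t/3}}{6}\right)&\le(\log t)\cdot\lim_{s\to\infty}\frac{z\left(\frac{e^{s/3}}{6}\right)}{\log s}=e^\gamma\log t &&\text{for } t\ge 100,\\
z\left(\frac{e^{2t/3}}{48c}\right)&\le(\log t)\cdot\lim_{s\to\infty}\frac{z\left(\frac{e^{2s/3}}{48c}\right)}{\log s}=e^\gamma\log t &&\text{for } t\ge 2000.
\end{aligned}
\tag{6.70}
\]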


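Also note that, since $(x^{3/2})' = (3/2)\sqrt x$,
\[
\left(\frac t3+\log\frac86\right)^{3/2}-\left(\frac t3+\log\frac26\right)^{3/2}\le\frac32\sqrt{\frac t3+\log\frac86}\cdot\log 4\le 1.20083\sqrt t
\]
for $t \ge 2000$. We also have
\[
\begin{aligned}
\left(\frac{2t}{3}-\log 6c\right)^{3/2}-\left(\frac t3-\log 3\right)^{3/2}
&<\left(\frac{2t}{3}-\log 9\right)^{3/2}-\left(\frac t3-\log 3\right)^{3/2}\\
&=(2^{3/2}-1)\left(\frac t3-\log 3\right)^{3/2}<(2^{3/2}-1)\frac{t^{3/2}}{3^{3/2}}\le 0.35189t^{3/2}.
\end{aligned}
\]
Of course,
\[
\left(\frac t3-\log 8c+\left(\frac t3-\log 3\right)\log\frac{\frac t3-\log 2c}{\log 4}\right)<\frac t3+\frac t3\log\frac t3<\frac t3\log t.
\]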

We conclude that, for t ≥ 2000, (6.68) is at most



√
r
t κ6 2κ2 p γ 
3 · eγ log t · log t · t + √ e log t 1 + 0.79749 t
3 3 6c
! !

r

r
2 1/2 κ2 κ2 16t 2 3/2
+ κ2 t + κ9 3 + √ · 1.2009 t + √ + · 0.35189t
3 6 48c 3 3
≤ (0.10181 + 0.00012 + 0.00145 + 0.000048 + 0.00462)t3/2 ≤ 0.10848t3/2 .

On the remaining interval $\log(2.16\cdot 10^{20}) \le t \le 2000$, we use interval arithmetic (as in §2.6, with 30 iterations) to bound the ratio of (6.68) to $t^{3/2}$. We obtain that this ratio is at most $0.275964$.

Hence, for all $x \ge 2.16\cdot 10^{20}$,
\[
|S_{II}|\le 0.275964x^{5/6}(\log x)^{3/2} \tag{6.71}
\]
in the case $y < q \le x/8U$.


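If $x/8U < q \le Q$, we use (6.39). In this range, $x/2\sqrt{2q} + \sqrt{qx}$ adopts its maximum at $q = Q$ (because $x/2\sqrt{2q}$ for $q = x/8U$ is smaller than $\sqrt{qx}$ for $q = Q$, by (6.65) and (6.50)). Hence, (6.39) is at most $x^{5/6}$ times
\[
\begin{aligned}
&\left(\kappa_2\sqrt{2\left(\frac23t-\log c'\right)}+\kappa_9\right)\sqrt3+\kappa_2\sqrt{\frac23t-\log c'}\cdot\frac{1}{\sqrt{c'}}\\
&\quad+\frac{2\kappa_2}{3}\left(\left(\frac23t-\log c'\right)^{3/2}-\left(\frac t3-\log 3\right)^{3/2}\right)\left(\frac{\sqrt{c'}}{2\sqrt2}e^{-t/6}+\frac{1}{\sqrt{c'}}\right),
\end{aligned}
\]
where $t = \log x$ (as before) and $c' = 500\sqrt6$. This is at most
\[
(2\kappa_2+\sqrt3\kappa_9)\sqrt t+\frac{\kappa_2}{\sqrt{c'}}\sqrt{\frac23}\sqrt t+\frac{2\kappa_2}{3}\,\frac{2^{3/2}-1}{3^{3/2}}\,t^{3/2}\left(\frac{\sqrt{c'}}{2\sqrt2}e^{-t/6}+\frac{1}{\sqrt{c'}}\right)\le 0.10327\,t^{3/2}
\]
for $t \ge \log 2.16\cdot 10^{20}$, and so
\[
|S_{II}|\le 0.10327x^{5/6}(\log x)^{3/2},
\]
for $x/8U < q \le Q$, using the assumption $x \ge 2.16\cdot 10^{20}$.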


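Finally, let us treat case (b), that is, $|\delta| > 8$ and $|\delta|q > 8y$; we can also assume $q \le y$, as otherwise we are in case (a), which has already been treated. Since $|\delta/x| \le 1/qQ$, we know that
\[
|\delta q|\le\frac{x}{Q}=U=500\sqrt6\,x^{1/3}\le\frac{x^{2/3}}{2000\sqrt6}=\frac{x}{4U}=\frac{x}{\theta U},
\]
again under assumption (6.50). We apply (6.41), and obtain that $|S_{II}|$ is at most
\[
\begin{aligned}
&\frac{2x\sqrt{z(y)}}{\sqrt{8y}}\sqrt{\left(\log\frac{x}{U\cdot 4\cdot 8y}+\log 3y\,\log\frac{\log x/3Uy}{\log 32/3}\right)\left(\kappa_6\log\frac{x}{U\cdot 4\cdot 8y}+2\kappa_7\right)}\\
&\quad+\frac{2\kappa_2}{3}\left(\frac{x}{\sqrt{16y}}\left((\log 32y)^{3/2}-(\log 2y)^{3/2}\right)+\frac{x/4}{\sqrt{Q-y}}\left((\log 4U)^{3/2}-(\log 2y)^{3/2}\right)\right)\\
&\quad+\left(\frac{\kappa_2}{\sqrt{2(1-y/Q)}}\left(\sqrt{\log V}+\sqrt{1/\log V}\right)+\kappa_9\right)\frac{x}{\sqrt V}\\
&\quad+\kappa_2\sqrt{z(y)}\cdot\sqrt{\log 4U}\cdot\frac{x}{\sqrt U},
\end{aligned}
\tag{6.72}
\]
where we are using the facts that $(\log 3t/8)/t$ is decreasing for $t \ge 8y > 8e/3$ and that
\[
\frac{d}{dt}\frac{(\log t)^{3/2}-(\log V)^{3/2}}{\sqrt t}=\frac{3(\log t)^{1/2}-\left((\log t)^{3/2}-(\log V)^{3/2}\right)}{2t^{3/2}}
=-\frac{\log\frac{t}{e^3}\cdot\sqrt{\log t}-(\log V)^{3/2}}{2t^{3/2}}<0
\]
for $t \ge \theta\cdot 8y = 16V$, thanks to
\[
\begin{aligned}
\left(\log\frac{16V}{e^3}\right)^2\log 16V>{}&(\log V)^3+\left(\log 16-2\log\frac{e^3}{16}\right)(\log V)^2\\
&+\left(\left(\log\frac{16}{e^3}\right)^2-2\log\frac{e^3}{16}\log 16\right)\log V>(\log V)^3
\end{aligned}
\]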

(valid for log V ≥ 1). Much as before, we can rewrite (6.72) as x5/6 times
s
t
p
2 z(et/3 /6) t − log 3c
 
t
p − log 32c + − log 2 log 3
8/6 3 3 log 32/3
s   r  3   32 !
t 2κ2 3 t 32 2 t
· κ6 − log 32c + 2κ7 + + log − − log 3
3 3 8 3 6 3
 3/2  3/2 !
2κ2 1/4 t t
+ q + log 24c − − log 3
3 et/3 1 3 3
6c − 6
√ !
κ2 3 p 1 √
+q t/3 − log 3 + p + κ9 3
c t/3 − log 3

2 1 − et/3
r
t/3 + log 24c
q
+ κ2 z(et/3 /6) ,
6c
(6.73)
where $t = \log x$ and $c = 500/\sqrt6$. For $t \ge 100$, we use (6.70) to bound $z(e^{t/3}/6)$, and we obtain that (6.73) is at most
√ r r 1/2
2 eγ 1 κ6

2κ2 3 1 t 32
p · · (log t)t + · + log · log 16
8/6 3 3 3 8 2 3 6
 1/2
2κ2 1/4 1 t
+ q · + log 24c · log 72c
3 e100/3 1 2 3
6c − 6
√ ! r
κ2 3 p 1 √ p t/3 + log 24c
+q t/3 + p + κ9 3 + κ2 e log tγ ,
2 1− c

t/3 6c
e100/3
(6.74)
where we have bounded expressions of the form a3/2 −b3/2 (a > b) by (a1/2 /2)·(a−b).
The ratio of (6.74) to t3/2 is clearly a decreasing function of t. For t = 200, this ratio
is 0.23747 . . . ; hence, (6.74) (and thus (6.73)) is at most 0.23748t3/2 for t ≥ 200.
On the range log(2.16 · 1020 ) ≤ t ≤ 200, the bisection method (with 25 iterations)
gives that the ratio of (6.73) to t3/2 is at most 0.23511.
We conclude that, when |δ| > 8 and |δ|q > 8y,
|SII | ≤ 0.23511x5/6 (log x)3/2 .
Thus (6.71) gives the worst case.
We now take totals, and obtain
\[
\begin{aligned}
|S_\eta(x,\alpha)|&\le|S_{I,1}|+|S_{I,2}|+|S_{II}|\\
&\le(2.4719+1230.9)x^{2/3}\log x+(0.00289+0.0006406)x^{2/3}(\log x)^2\\
&\qquad+0.275964x^{5/6}(\log x)^{3/2}\\
&\le 0.27598x^{5/6}(\log x)^{3/2}+1233.38x^{2/3}\log x,
\end{aligned}
\tag{6.75}
\]
where we use (6.50) yet again.
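The last step of (6.75) absorbs the $x^{2/3}(\log x)^2$ term into a slightly larger coefficient on $x^{5/6}(\log x)^{3/2}$. A rough floating-point check of that step, on a few sample values of $x$ chosen here for illustration, might look as follows.

```python
import math

def before_simplification(x):
    """Second-to-last expression in (6.75)."""
    L = math.log(x)
    return ((2.4719 + 1230.9) * x ** (2 / 3) * L
            + (0.00289 + 0.0006406) * x ** (2 / 3) * L ** 2
            + 0.275964 * x ** (5 / 6) * L ** 1.5)

def after_simplification(x):
    """Final bound in (6.75)."""
    L = math.log(x)
    return 0.27598 * x ** (5 / 6) * L ** 1.5 + 1233.38 * x ** (2 / 3) * L

if __name__ == "__main__":
    for x in (2.16e20, 1e21, 1e25, 1e30, 1e40):
        print(x, before_simplification(x) <= after_simplification(x))
```

The margin is smallest at the left endpoint $x = 2.16\cdot 10^{20}$ and grows with $x$, because the slack $0.000016\,x^{5/6}(\log x)^{3/2}$ outgrows $0.0035306\,x^{2/3}(\log x)^2$.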

6.4 Conclusion
Proof of Theorem 3.1.1. We have shown that $|S_\eta(\alpha, x)|$ is at most (6.63) for $q \le x^{1/3}/6$ and at most (6.75) for $q > x^{1/3}/6$. It remains to simplify (6.63) slightly.
By the geometric mean/arithmetic mean inequality,
\[
\sqrt{C_{x,\delta_0q}(\log\delta_0q+0.002)+\frac{\log 4\delta_0q}{2}}\;\sqrt{0.30214\log\delta_0q+0.67506} \tag{6.76}
\]
is at most
\[
\frac{1}{2\sqrt\rho}\left(C_{x,\delta_0q}(\log\delta_0q+0.002)+\frac{\log 4\delta_0q}{2}\right)+\frac{\sqrt\rho}{2}(0.30214\log\delta_0q+0.67506)
\]
for any $\rho > 0$. We recall that
\[
C_{x,t}=\log\left(1+\frac{\log 4t}{2\log\frac{9x^{1/3}}{2.004t}}\right).
\]
Let
\[
\rho=\frac{C_{x_1,2q_0}(\log 2q_0+0.002)+\frac{\log 8q_0}{2}}{0.30214\log 2q_0+0.67506}=3.397962\ldots,
\]
where $x_1 = 10^{25}$, $q_0 = 2\cdot 10^5$. (In other words, we are optimizing matters for $x = x_1$, $\delta_0 q = 2q_0$; the losses in nearby ranges will be very slight.) We obtain that (6.76) is at
most

ρ · 0.30214
 
Cx,δ0 q 1
√ (log δ0 q + 0.002) + √ + log δ0 q
2 ρ 4 ρ 2
 √ 
1 log 2 ρ (6.77)
+ √ + · 0.67506
2 ρ 2
≤ 0.27125Cx,t (log δ0 q + 0.002) + 0.4141 log δ0 q + 0.49911.
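The value of $\rho$ and the two leading coefficients in (6.77) follow directly from the definitions above. A short sketch reproducing them (with helper names of our choosing) is:

```python
import math

def C(x, t):
    """C_{x,t} = log(1 + log(4t) / (2 log(9 x^(1/3) / (2.004 t))))."""
    return math.log(1 + math.log(4 * t) / (2 * math.log(9 * x ** (1 / 3) / (2.004 * t))))

def rho(x1=1e25, q0=2e5):
    """Ratio of the two factors under the square roots in (6.76) at x = x1, delta_0 q = 2 q0."""
    t = 2 * q0
    numerator = C(x1, t) * (math.log(t) + 0.002) + math.log(4 * t) / 2
    denominator = 0.30214 * math.log(t) + 0.67506
    return numerator / denominator

if __name__ == "__main__":
    r = rho()
    print("rho                      =", r)                        # text: 3.397962...
    print("coefficient of C         =", 1 / (2 * math.sqrt(r)))    # text: 0.27125
    print("coefficient of log(d0 q) =",
          1 / (4 * math.sqrt(r)) + math.sqrt(r) * 0.30214 / 2)     # text: 0.4141
```

Choosing $\rho$ as the ratio of the two factors makes the arithmetic-geometric mean inequality an equality at the reference point $(x_1, 2q_0)$, which is why the loss in nearby ranges is slight.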

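Now, for $x \ge x_0 = 2.16\cdot 10^{20}$,
\[
\frac{C_{x,t}}{\log t}\le\frac{C_{x_0,t}}{\log t}=\frac{1}{\log t}\log\left(1+\frac{\log 4t}{2\log\frac{54\cdot 10^6}{2.004t}}\right)\le 0.08659
\]
for $8 \le t \le 10^6$ (by the bisection method, with 20 iterations), and
\[
\frac{C_{x,t}}{\log t}\le\frac{C_{(6t)^3,t}}{\log t}\le\frac{1}{\log t}\log\left(1+\frac{\log 4t}{2\log\frac{9\cdot 6}{2.004}}\right)\le 0.08659
\]
if $10^6 < t \le x^{1/3}/6$. Hence
\[
0.27125\cdot C_{x,\delta_0q}\cdot 0.002\le 0.000047\log\delta_0q.
\]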
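The bound $C_{x_0,t}/\log t \le 0.08659$ on $8 \le t \le 10^6$ is obtained in the text by a rigorous bisection method. As a non-rigorous illustration, one can simply sample the function on a logarithmic grid; the grid size below is an arbitrary choice of ours.

```python
import math

def C(x, t):
    """C_{x,t} as defined in (6.64)."""
    return math.log(1 + math.log(4 * t) / (2 * math.log(9 * x ** (1 / 3) / (2.004 * t))))

def max_ratio(x, t_lo, t_hi, n=2000):
    """Largest sampled value of C_{x,t} / log t on a log-spaced grid in [t_lo, t_hi]."""
    best = 0.0
    for k in range(n + 1):
        t = t_lo * (t_hi / t_lo) ** (k / n)
        best = max(best, C(x, t) / math.log(t))
    return best

if __name__ == "__main__":
    m = max_ratio(2.16e20, 8, 1e6)
    print("sampled maximum:", m)
    print("0.27125 * 0.002 * 0.08659 =", 0.27125 * 0.002 * 0.08659)
```

On this grid the largest sampled value occurs at the right endpoint $t = 10^6$, just below $0.08659$; a grid search does not by itself exclude larger values between sample points, which is what the interval-based bisection in the text is for.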



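We conclude that, for $q \le x^{1/3}/6$,
\[
\begin{aligned}
|S_\eta(\alpha,x)|\le{}&\frac{R_{x,\delta_0q}\log\delta_0q+0.49911}{\sqrt{\phi(q)\delta_0}}\cdot x+\frac{2.492x}{\sqrt{q\delta_0}}\\
&+\frac{2x}{\delta_0\phi(q)}\left(\frac{13}{4}\log\delta_0q+7.82\right)+\frac{2x}{\delta_0q}(13.66\log\delta_0q+37.55)+3.36x^{5/6},
\end{aligned}
\]
where
\[
R_{x,t}=0.27125\log\left(1+\frac{\log 4t}{2\log\frac{9x^{1/3}}{2.004t}}\right)+0.41415.
\]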
Part II

Major arcs

Chapter 7

Major arcs: overview and results

Our task, as in Part I, will be to estimate
\[ S_\eta(\alpha, x) = \sum_n \Lambda(n) e(\alpha n) \eta(n/x), \tag{7.1} \]
where $\eta : \mathbb{R}^+ \to \mathbb{C}$ is a smooth function, $\Lambda$ is the von Mangoldt function and $e(t) = e^{2\pi i t}$. Here, we will treat the case of $\alpha$ lying on the major arcs.
We will see how we can obtain good estimates by using smooth functions $\eta$ based on the Gaussian $e^{-t^2/2}$. This will involve proving new, fully explicit bounds for the Mellin transform of the twisted Gaussian, or, what is the same, bounds on parabolic cylinder functions in certain ranges. It will also require explicit formulae that are general and strong enough, even for moderate values of $x$.
Let $\alpha = a/q + \delta/x$. For us, saying that $\alpha$ lies on a major arc will be the same as saying that $q$ and $\delta$ are bounded; more precisely, $q$ will be bounded by a constant $r$ and $|\delta|$ will be bounded by a constant times $r/q$. As is customary on the major arcs, we will express our exponential sum (3.1) as a linear combination of twisted sums
\[ S_{\eta,\chi}(\delta/x, x) = \sum_{n=1}^{\infty} \Lambda(n) \chi(n) e(\delta n/x) \eta(n/x), \tag{7.2} \]
for $\chi : \mathbb{Z} \to \mathbb{C}$ a Dirichlet character mod $q$, i.e., a multiplicative character on $(\mathbb{Z}/q\mathbb{Z})^*$ lifted to $\mathbb{Z}$. (The advantage here is that the phase term is now $e(\delta n/x)$ rather than $e(\alpha n)$, and $e(\delta n/x)$ varies very slowly as $n$ grows.) Our task, then, is to estimate $S_{\eta,\chi}(\delta/x, x)$ for $\delta$ small.
Estimates on $S_{\eta,\chi}(\delta/x, x)$ rely on the properties of Dirichlet $L$-functions $L(s, \chi) = \sum_n \chi(n) n^{-s}$. What is crucial is the location of the zeroes of $L(s, \chi)$ in the critical strip $0 \le \Re(s) \le 1$ (a region in which $L(s, \chi)$ can be defined by analytic continuation). In contrast to most previous work, we will not use zero-free regions, which are too narrow for our purposes. Rather, we use a verification of the Generalized Riemann Hypothesis up to bounded height for all conductors $q \le 300000$ (due to D. Platt [Plab]).


A key feature of the present work is that it allows one to mimic a wide variety of smoothing functions by means of estimates on the Mellin transform of a single smoothing function: here, the Gaussian $e^{-t^2/2}$.

7.1 Results
Write $\eta_\heartsuit(t) = e^{-t^2/2}$. Let us first give a bound for exponential sums on the primes using $\eta_\heartsuit$ as the smooth weight. Without loss of generality, we may assume that our character $\chi$ mod $q$ is primitive, i.e., that it is not really a character to a smaller modulus $q' \mid q$.

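Theorem 7.1.1. Let $x$ be a real number $\ge 10^8$. Let $\chi$ be a primitive Dirichlet character mod $q$, $1 \le q \le r$, where $r = 300000$.
Then, for any $\delta \in \mathbb{R}$ with $|\delta| \le 4r/q$,
\[ \sum_{n=1}^{\infty} \Lambda(n) \chi(n) e\left( \frac{\delta}{x} n \right) e^{-\frac{(n/x)^2}{2}} = I_{q=1} \cdot \widehat{\eta_\heartsuit}(-\delta) \cdot x + E \cdot x, \]
where $I_{q=1} = 1$ if $q = 1$, $I_{q=1} = 0$ if $q \ne 1$, and
\[ |E| \le 4.306 \cdot 10^{-22} + \frac{1}{\sqrt{x}} \left( \frac{650400}{\sqrt{q}} + 112 \right). \]
We normalize the Fourier transform $\widehat{f}$ as follows: $\widehat{f}(t) = \int_{-\infty}^{\infty} e(-xt) f(x)\, dx$. Of course, $\widehat{\eta_\heartsuit}(-\delta)$ is just $\sqrt{2\pi}\, e^{-2\pi^2 \delta^2}$.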
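The closed form for $\widehat{\eta_\heartsuit}$ under this normalization can be checked numerically. The sketch below uses a plain trapezoid rule on a truncated interval; the helper name `fourier` and the truncation parameters `T`, `n` are our own choices, not part of the text.

```python
import cmath
import math

def eta_heart(t):
    """The Gaussian weight e^{-t^2/2}."""
    return math.exp(-t * t / 2)

def fourier(f, xi, T=12.0, n=24000):
    """Trapezoid approximation to int_{-T}^{T} e(-x*xi) f(x) dx, with e(u) = exp(2*pi*i*u)."""
    h = 2 * T / n
    total = 0j
    for k in range(n + 1):
        x = -T + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * cmath.exp(-2j * math.pi * x * xi) * f(x)
    return total * h

delta = 0.3
numeric = fourier(eta_heart, -delta)
closed = math.sqrt(2 * math.pi) * math.exp(-2 * math.pi ** 2 * delta ** 2)
print(numeric, closed)
```

Since the Gaussian is even, the sign of $\delta$ plays no role in the value; the imaginary part of `numeric` is zero up to rounding.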
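As it turns out, smooth weights based on the Gaussian are often better in applications than the Gaussian $\eta_\heartsuit$ itself. Let us give a bound based on $\eta(t) = t^2 \eta_\heartsuit(t)$.

Theorem 7.1.2. Let $\eta(t) = t^2 e^{-t^2/2}$. Let $x$ be a real number $\ge 10^8$. Let $\chi$ be a primitive character mod $q$, $1 \le q \le r$, where $r = 300000$.
Then, for any $\delta \in \mathbb{R}$ with $|\delta| \le 4r/q$,
\[ \sum_{n=1}^{\infty} \Lambda(n) \chi(n) e\left( \frac{\delta}{x} n \right) \eta(n/x) = I_{q=1} \cdot \widehat{\eta}(-\delta) \cdot x + E \cdot x, \]
where $I_{q=1} = 1$ if $q = 1$, $I_{q=1} = 0$ if $q \ne 1$, and
\[ |E| \le 2.485 \cdot 10^{-19} + \frac{1}{\sqrt{x}} \left( \frac{281200}{\sqrt{q}} + 56 \right). \]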

The advantage of $\eta(t) = t^2 \eta_\heartsuit(t)$ over $\eta_\heartsuit$ is that it vanishes at the origin (to second order); as we shall see, this makes it easier to estimate exponential sums with the smoothing $\eta *_M g$, where $*_M$ is a Mellin convolution and $g$ is nearly arbitrary. Here is a good example that is used, crucially, in Part III.

Corollary 7.1.3. Let $\eta(t) = t^2 e^{-t^2/2} *_M \eta_2(t)$, where $\eta_2 = \eta_1 *_M \eta_1$ and $\eta_1 = 2 \cdot I_{[1/2,1]}$. Let $x$ be a real number $\ge 10^8$. Let $\chi$ be a primitive character mod $q$, $1 \le q \le r$, where $r = 300000$.
Then, for any $\delta \in \mathbb{R}$ with $|\delta| \le 4r/q$,
\[ \sum_{n=1}^{\infty} \Lambda(n) \chi(n) e\left( \frac{\delta}{x} n \right) \eta(n/x) = I_{q=1} \cdot \widehat{\eta}(-\delta) \cdot x + E \cdot x, \]
where $I_{q=1} = 1$ if $q = 1$, $I_{q=1} = 0$ if $q \ne 1$, and
\[ |E| \le 2.485 \cdot 10^{-19} + \frac{1}{\sqrt{x}} \left( \frac{381500}{\sqrt{q}} + 76 \right). \]

Let us now look at a different kind of modification of the Gaussian smoothing. Say we would like a weight of a specific shape; for example, as we will need in Part III, we would like an approximation to the function
\[ \eta_\circ : t \mapsto \begin{cases} t^3 (2-t)^3 e^{-(t-1)^2/2} & \text{for } t \in [0, 2], \\ 0 & \text{otherwise.} \end{cases} \tag{7.3} \]
At the same time, what we have is an estimate for the Mellin transform of the Gaussian $e^{-t^2/2}$, centered at $t = 0$.
The route taken here is to work with an approximation $\eta_+$ to $\eta_\circ$. We let
\[ \eta_+(t) = h_H(t) \cdot t e^{-t^2/2}, \tag{7.4} \]
where $h_H$ is a band-limited approximation to
\[ h(t) = \begin{cases} t^2 (2-t)^3 e^{t - 1/2} & \text{if } t \in [0, 2], \\ 0 & \text{otherwise.} \end{cases} \tag{7.5} \]

By band-limited we mean that the restriction of the Mellin transform of $h_H$ to the imaginary axis is of compact support. (We could, alternatively, let $h_H$ be a function whose Fourier transform is of compact support; this would be technically easier in some ways, but it would also lead to using GRH verifications less efficiently.)
To be precise: we define
\[
\begin{aligned}
F_H(t) &= \frac{\sin(H \log t)}{\pi \log t}, \\
h_H(t) &= (h *_M F_H)(t) = \int_0^\infty h(t y^{-1}) F_H(y) \frac{dy}{y},
\end{aligned}
\tag{7.6}
\]
where $H$ is a positive constant. It is easy to check that $M F_H(i\tau) = 1$ for $-H < \tau < H$ and $M F_H(i\tau) = 0$ for $\tau > H$ or $\tau < -H$ (unsurprisingly, since $F_H$ is a Dirichlet kernel under a change of variables). Since, in general, the Mellin transform of a multiplicative convolution $f *_M g$ equals $Mf \cdot Mg$, we see that the Mellin transform of $h_H$, on the imaginary axis, equals the truncation of the Mellin transform of $h$ to $[-iH, iH]$. Thus, $h_H$ is a band-limited approximation to $h$, as we desired.
The distinction between the odd and the even case in the statement that follows simply reflects the two different points up to which computations were carried out in [Plab]; these computations were, in turn, to some extent tailored to the needs of the present work (as was the shape of $\eta_+$ itself).
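Theorem 7.1.4. Let $\eta(t) = \eta_+(t) = h_H(t) t e^{-t^2/2}$, where $h_H$ is as in (7.6) and $H = 200$. Let $x$ be a real number $\ge 10^{12}$. Let $\chi$ be a primitive character mod $q$, where $1 \le q \le 150000$ if $q$ is odd, and $1 \le q \le 300000$ if $q$ is even.
Then, for any $\delta \in \mathbb{R}$ with $|\delta| \le 600000 \cdot \gcd(q, 2)/q$,
\[ \sum_{n=1}^{\infty} \Lambda(n) \chi(n) e\left( \frac{\delta}{x} n \right) \eta(n/x) = I_{q=1} \cdot \widehat{\eta}(-\delta) \cdot x + E \cdot x, \]
where $I_{q=1} = 1$ if $q = 1$, $I_{q=1} = 0$ if $q \ne 1$, and
\[ |E| \le 1.3482 \cdot 10^{-14} + \frac{1.617 \cdot 10^{-10}}{q} + \frac{1}{\sqrt{x}} \left( \frac{499900}{\sqrt{q}} + 52 \right). \]
If $q = 1$, we have the sharper bound
\[ |E| \le 4.772 \cdot 10^{-11} + \frac{251400}{\sqrt{x}}. \]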
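To get a feeling for the size of the error term, the bounds of Theorem 7.1.4 can be evaluated directly. The helper names `delta_limit` and `error_bound` below are ours; the sketch simply transcribes the hypotheses and the two inequalities of the theorem.

```python
import math

def delta_limit(q):
    """Largest |delta| allowed by Theorem 7.1.4: 600000 * gcd(q, 2) / q."""
    return 600000 * math.gcd(q, 2) / q

def error_bound(x, q):
    """Upper bound on |E| as stated in Theorem 7.1.4."""
    if x < 1e12:
        raise ValueError("Theorem 7.1.4 assumes x >= 10^12")
    q_max = 300000 if q % 2 == 0 else 150000
    if not 1 <= q <= q_max:
        raise ValueError("q is outside the range covered by Theorem 7.1.4")
    if q == 1:
        # sharper bound for the trivial character
        return 4.772e-11 + 251400 / math.sqrt(x)
    return 1.3482e-14 + 1.617e-10 / q + (499900 / math.sqrt(q) + 52) / math.sqrt(x)

print(error_bound(1e12, 1), error_bound(1e12, 4), delta_limit(4))
```

At $x = 10^{12}$ the term in $1/\sqrt{x}$ dominates; the constant terms take over only once $x$ is far larger.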
This is a paradigmatic example, in that, following the proof given in §9.4, we can bound exponential sums with weights of the form $h_H(t) e^{-t^2/2}$, where $h_H$ is a band-limited approximation to just about any continuous function of our choosing.
Lastly, we will need an explicit estimate of the $\ell_2$ norm corresponding to the sum in Thm. 7.1.4, for the trivial character.
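Proposition 7.1.5. Let $\eta(t) = \eta_+(t) = h_H(t) t e^{-t^2/2}$, where $h_H$ is as in (7.6) and $H = 200$. Let $x$ be a real number $\ge 10^{12}$.
Then
\[
\begin{aligned}
\sum_{n=1}^{\infty} \Lambda(n) (\log n)\, \eta^2(n/x) &= x \cdot \int_0^\infty \eta_+^2(t) \log xt \, dt + E_1 \cdot x \log x \\
&= 0.640206\, x \log x - 0.021095\, x + E_2 \cdot x \log x,
\end{aligned}
\]
where
\[ |E_1| \le 5.123 \cdot 10^{-15} + \frac{366.91}{\sqrt{x}}, \qquad |E_2| \le 2 \cdot 10^{-6} + \frac{366.91}{\sqrt{x}}. \]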

7.2 Main ideas

An explicit formula gives an expression
\[ S_{\eta,\chi}(\delta/x, x) = I_{q=1} \widehat{\eta}(-\delta) x - \sum_\rho F_\delta(\rho) x^\rho + \text{small error}, \tag{7.7} \]
where $I_{q=1} = 1$ if $q = 1$ and $I_{q=1} = 0$ otherwise. Here $\rho$ runs over the complex numbers $\rho$ with $L(\rho, \chi) = 0$ and $0 < \Re(\rho) < 1$ (“non-trivial zeros”). The function $F_\delta$ is the Mellin transform of $e(\delta t) \eta(t)$ (see §2.4).
The questions are then: where are the non-trivial zeros $\rho$ of $L(s, \chi)$? How fast does $F_\delta(\rho)$ decay as $\Im(\rho) \to \pm\infty$?
Write $\sigma = \Re(s)$, $\tau = \Im(s)$. The belief is, of course, that $\sigma = 1/2$ for every non-trivial zero (Generalized Riemann Hypothesis), but this is far from proven. Most work to date has used zero-free regions of the form $\sigma \le 1 - 1/(C \log q|\tau|)$, $C$ a constant. This is a classical zero-free region, going back, qualitatively, to de la Vallée-Poussin (1899). The best values of $C$ known are due to McCurley [McC84a] and Kadiri [Kad05].
These regions seem too narrow to yield a proof of the three-primes theorem. What we will use instead is a finite verification of GRH “up to $T_q$”, i.e., a computation showing that, for every Dirichlet character of conductor $q \le r_0$ ($r_0$ a constant, as above), every non-trivial zero $\rho = \sigma + i\tau$ with $|\tau| \le T_q$ satisfies $\Re(\rho) = 1/2$. Such verifications go back to Riemann; modern computer-based methods are descended in part from a paper by Turing [Tur53]. (See the historical article [Boo06b].) In his thesis [Pla11], D. Platt gave a rigorous verification for $r_0 = 10^5$, $T_q = 10^8/q$. In coordination with the present work, he has extended this to
• all odd $q \le 3 \cdot 10^5$, with $T_q = 10^8/q$,
• all even $q \le 4 \cdot 10^5$, with $T_q = \max(10^8/q, 200 + 7.5 \cdot 10^7/q)$.
This was a major computational effort, involving, in particular, a fast implementation of interval arithmetic (used for the sake of rigor).
What remains to discuss, then, is how to choose $\eta$ in such a way that $F_\delta(\rho)$ decreases fast enough as $|\tau|$ increases, so that (7.7) gives a good estimate. We cannot hope for $F_\delta(\rho)$ to start decreasing consistently before $|\tau|$ is at least as large as a constant times $|\delta|$. Since $\delta$ varies within $(-c r_0/q, c r_0/q)$, this explains why $T_q$ is taken inversely proportional to $q$ in the above. As we will work with $r_0 \ge 150000$, we also see that we have little margin for maneuver: we want $F_\delta(\rho)$ to be extremely small already for, say, $|\tau| \ge 80|\delta|$. We also have a Scylla-and-Charybdis situation, courtesy of the uncertainty principle: roughly speaking, $F_\delta(\rho)$ cannot decrease faster than exponentially on $|\tau|/|\delta|$ both for $|\delta| \le 1$ and for $\delta$ large.
The most delicate case is that of $\delta$ large, since then $|\tau|/|\delta|$ is small. It turns out we can manage to get decay that is much faster than exponential for $\delta$ large, while no slower than exponential for $\delta$ small. This we will achieve by working with smoothing functions based on the (one-sided) Gaussian $\eta_\heartsuit(t) = e^{-t^2/2}$.
The Mellin transform of the twisted Gaussian $e(\delta t) e^{-t^2/2}$ is a parabolic cylinder function $U(a, z)$ with $z$ purely imaginary. Since fully explicit estimates for $U(a, z)$, $z$ imaginary, have not been worked out in the literature, we will have to derive them ourselves.
Once we have fully explicit estimates for the Mellin transform of the twisted Gaussian, we are able to use essentially any smoothing function based on the Gaussian $\eta_\heartsuit(t) = e^{-t^2/2}$. As we already saw, we can and will consider smoothing functions obtained by convolving the twisted Gaussian with another function and also functions obtained by multiplying the twisted Gaussian with another function. All we need to do is use an explicit formula of the right kind, that is, a formula that does not assume too much about the smoothing function or the region of holomorphy of its Mellin transform, but still gives very good error terms, with simple expressions.
All results here will be based on a single, general explicit formula (Lem. 9.1.1) valid for all our purposes. The contribution of the zeros in the critical strip can be handled in a unified way (Lemmas 9.1.3 and 9.1.4). All that has to be done for each smoothing function is to bound a simple integral (in (9.24)). We then apply a finite verification of GRH and are done.
Chapter 8

The Mellin transform of the twisted Gaussian

Our aim in this chapter is to give fully explicit, yet relatively simple bounds for the Mellin transform $F_\delta(\rho)$ of $e(\delta t) \eta_\heartsuit(t)$, where $\eta_\heartsuit(t) = e^{-t^2/2}$ and $\delta$ is arbitrary. The rapid decay that results will establish that the Gaussian $\eta_\heartsuit$ is a very good choice for a smoothing, particularly when the smoothing has to be twisted by an additive character $e(\delta t)$.
The Gaussian smoothing has been used before in number theory; see, notably, Heath-Brown’s well-known paper on the fourth power moment of the Riemann zeta function [HB79]. What is new here is that we will derive fully explicit bounds on the Mellin transform of the twisted Gaussian. This means that the Gaussian smoothing will be a real option in explicit work on exponential sums in number theory and elsewhere from now on.¹

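Theorem 8.0.1. Let $f_\delta(t) = e^{-t^2/2} e(\delta t)$, $\delta \in \mathbb{R}$. Let $F_\delta$ be the Mellin transform of $f_\delta$. Let $s = \sigma + i\tau$, $\sigma \ge 0$, $\tau \ne 0$. Let $\ell = -2\pi\delta$. Then, if $\mathrm{sgn}(\delta) \ne \mathrm{sgn}(\tau)$ and $\delta \ne 0$,
\[ |F_\delta(s)| \le |\Gamma(s)| e^{\frac{\pi}{2} \tau} e^{-E(\rho) \tau} \cdot \begin{cases} c_{1,\sigma,\tau}/\tau^{\sigma/2} & \text{for } \rho \text{ arbitrary,} \\ c_{2,\sigma,\tau}/\ell^{\sigma} & \text{for } \rho \le 3/2, \end{cases} \tag{8.1} \]
where $\rho = 4\tau/\ell^2$,
\[
\begin{aligned}
E(\rho) &= \frac{1}{2} \arccos \frac{1}{\upsilon(\rho)} - \frac{2(\upsilon(\rho) - 1)}{\rho}, \\
c_{1,\sigma,\tau} &= \frac{1}{2} \left( 1 + 2^{1/4} \left( \frac{2}{1 + \sin^2 \frac{\pi}{8}} \right)^{\sigma/2} + \frac{e^{-\frac{\sqrt{2}-1}{2} \tau}}{\left( \tan \frac{\pi}{8} \right)^{\sigma}} \right), \\
c_{2,\sigma,\tau} &= \frac{1}{2} \left( 1 + \min\left( 2^{\sigma + \frac{1}{2}}, \frac{\sqrt{\sec \frac{2\pi}{5}}}{\left( \sin \frac{\pi}{5} \right)^{\sigma}} \right) + \frac{e^{-\tau/6}}{(1/\sqrt{3})^{\sigma}} \right),
\end{aligned}
\tag{8.2}
\]
and
\[ \upsilon(\rho) = \sqrt{\frac{1 + \sqrt{\rho^2 + 1}}{2}}. \]
If $\mathrm{sgn}(\delta) = \mathrm{sgn}(\tau)$ or $\delta = 0$,
\[ |F_\delta(s)| \le |x_0|^{-\sigma} \cdot e^{-\frac{1}{2} \ell^2} |\Gamma(s)| e^{\frac{\pi}{2} |\tau|} \cdot \left( \left( 1 + \frac{\pi}{2^{3/2}} \right) e^{-\frac{\pi}{4} |\tau|} + \frac{1}{2} e^{-\pi|\tau|} \right), \tag{8.3} \]
where
\[ |x_0| \ge \begin{cases} 0.51729 \sqrt{\tau} & \text{for } \rho \text{ arbitrary,} \\ 0.84473\, \frac{|\tau|}{|\ell|} & \text{for } \rho \le 3/2. \end{cases} \tag{8.4} \]

¹ There has also been work using the Gaussian after a logarithmic change of variables; see, in particular, [Leh66]. In that case, the Mellin transform is simply a Gaussian (as in, e.g., [MV07, Ex. XII.2.9]). However, for $\delta$ non-zero, the Mellin transform of a twist $e(\delta t) e^{-(\log t)^2/2}$ decays very slowly, and thus would not be useful for our purposes, or, in general, for most applications in which GRH is not assumed.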
As we shall see, the choice of smoothing function $\eta(t) = e^{-t^2/2}$ can be easily motivated by the method of stationary phase, but the problem is actually solved by the saddle-point method. One of the challenges here is to keep all expressions explicit and practical.
(In particular, the more critical estimate, (8.1), is optimal up to a constant depending on $\sigma$; the constants we give will be good rather than optimal.)
The expressions in Thm. 8.0.1 can be easily simplified further, especially if one is ready to introduce some mild constraints and make some sacrifices in the main term.

Corollary 8.0.2. Let $f_\delta(t) = e^{-t^2/2} e(\delta t)$, $\delta \in \mathbb{R}$. Let $F_\delta$ be the Mellin transform of $f_\delta$. Let $s = \sigma + i\tau$, where $\sigma \in [0, 1]$ and $|\tau| \ge 20$. Then, for $0 \le k \le 2$,
\[ |F_\delta(s + k)| + |F_\delta((1 - s) + k)| \le \begin{cases} \kappa_{k,0} \left( \frac{|\tau|}{|\ell|} \right)^k e^{-0.1065 \left( \frac{2|\tau|}{|\ell|} \right)^2} & \text{if } 4|\tau|/\ell^2 < 3/2, \\ \kappa_{k,1} |\tau|^{k/2} e^{-0.1598 |\tau|} & \text{if } 4|\tau|/\ell^2 \ge 3/2, \end{cases} \]
where
\[ \kappa_{0,0} \le 3.001, \quad \kappa_{1,0} \le 4.903, \quad \kappa_{2,0} \le 7.96, \]
\[ \kappa_{0,1} \le 3.286, \quad \kappa_{1,1} \le 4.017, \quad \kappa_{2,1} \le 5.13. \]
We are considering $F_\delta(s + k)$, and not just $F_\delta(s)$, because bounding $F_\delta(s + k)$ enables us to work with smoothing functions equal to or based on $t^k e^{-t^2/2}$. Clearly, we can easily derive bounds with $k$ arbitrary from Thm. 8.0.1. It is just that we will use $k = 0, 1, 2$ in practice. Corollary 8.0.2 is meant to be applied to cases where $\tau$ is larger than a constant (10, say) times $|\ell|$, and $\sigma$ cannot be bounded away from 1; if either condition fails to hold, it is better to apply Theorem 8.0.1 directly.
Let us end with a remark that may be relevant to applications outside number theory. By (8.9), Thm. 8.0.1 gives us bounds on the parabolic cylinder function $U(a, z)$ for $z$ purely imaginary. (Surprisingly, there seem to have been no fully explicit bounds for this case in the literature.) The bounds are useful when $|\Im(a)|$ is at least somewhat larger than $|\Im(z)|$ (i.e., when $|\tau|$ is large compared to $\ell$). While Thm. 8.0.1 is stated for $\sigma \ge 0$ (i.e., for $\Re(a) \ge -1/2$), extending the result to larger half-planes for $a$ is not hard.

8.1 How to choose a smoothing function?

Let us motivate our choice of smoothing function $\eta$. The method of stationary phase ([Olv74, §4.11], [Won01, §II.3]) suggests that the main contribution to the integral
\[ F_\delta(s) = \int_0^\infty e(\delta t) \eta(t) t^s \frac{dt}{t} \tag{8.5} \]
should come when the phase has derivative 0. The phase part of (8.5) is
\[ e(\delta t) t^{\Im(s) i} = e^{(2\pi\delta t + \tau \log t) i} \]
(where we write $s = \sigma + i\tau$); clearly,
\[ (2\pi\delta t + \tau \log t)' = 2\pi\delta + \frac{\tau}{t} = 0 \]
when $t = -\tau/2\pi\delta$. This is meaningful when $t \ge 0$, i.e., $\mathrm{sgn}(\tau) \ne \mathrm{sgn}(\delta)$. The contribution of $t = -\tau/2\pi\delta$ to (8.5) is then
\[ \eta(t) e(\delta t) t^{s-1} = \eta\left( \frac{-\tau}{2\pi\delta} \right) e^{-i\tau} \left( \frac{-\tau}{2\pi\delta} \right)^{\sigma + i\tau - 1} \tag{8.6} \]
multiplied by a “width” approximately equal to a constant divided by
\[ \sqrt{|(2\pi\delta t + \tau \log t)''|} = \sqrt{|-\tau/t^2|} = \frac{2\pi|\delta|}{\sqrt{|\tau|}}. \]
The absolute value of (8.6) is
\[ \eta\left( -\frac{\tau}{2\pi\delta} \right) \cdot \left( \frac{-\tau}{2\pi\delta} \right)^{\sigma - 1}. \tag{8.7} \]
In other words, if $\mathrm{sgn}(\tau) \ne \mathrm{sgn}(\delta)$ and $\delta$ is not too small, asking that $F_\delta(\sigma + i\tau)$ decay rapidly as $|\tau| \to \infty$ amounts to asking that $\eta(t)$ decay rapidly as $t \to \infty$, since the stationary point $t = |\tau|/2\pi|\delta|$ grows with $|\tau|$. Thus, if we ask for $F_\delta(\sigma + i\tau)$ to decay rapidly as $|\tau| \to \infty$ for all moderate $\delta$, we are requesting that

1. $\eta(t)$ decay rapidly as $t \to \infty$,

2. the Mellin transform $F_0(\sigma + i\tau)$ decay rapidly as $\tau \to \pm\infty$.
Requirement (2) is there because we also need to consider $F_\delta(\sigma + i\tau)$ for $\delta$ very small, and, in particular, for $\delta = 0$.
There is clearly an uncertainty-principle issue here; one cannot do arbitrarily well in both aspects at the same time. Once we are conscious of this, the choice $\eta(t) = e^{-t}$ in Hardy-Littlewood actually looks fairly good: obviously, $\eta(t) = e^{-t}$ decays exponentially, and its Mellin transform $\Gamma(\sigma + i\tau)$ also decays exponentially as $\tau \to \pm\infty$. Moreover, for this choice of $\eta$, the Mellin transform $F_\delta(s)$ can be written explicitly: $F_\delta(s) = \Gamma(s)/(1 - 2\pi i\delta)^s$.
It is not hard to work out an explicit formula² for $\eta(t) = e^{-t}$. However, it is not hard to see that, for $F_\delta(s)$ as above, $F_\delta(1/2 + it)$ decays like $e^{-t/2\pi|\delta|}$, just as we expected from (8.7). This is a little too slow for our purposes: we will often have to work with relatively large $\delta$, and we would like to have to check the zeroes of $L$-functions only up to relatively low heights $t$, say, up to $50|\delta|$. Then $e^{-t/2\pi|\delta|} > e^{-8} = 0.00033\ldots$, which is not very small. We will settle for a different choice of $\eta$: the Gaussian.
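The decay rate for $\eta(t) = e^{-t}$ can be read off from the closed form. Writing $(1 - 2\pi i\delta)^{-s} = \exp(-s \log(1 - 2\pi i\delta))$ and using the standard Stirling estimate $|\Gamma(\sigma + i\tau)| \approx \sqrt{2\pi}\, |\tau|^{\sigma - 1/2} e^{-\pi|\tau|/2}$ (an input we add here, not taken from the text), one finds for $\mathrm{sgn}(\tau) \ne \mathrm{sgn}(\delta)$ an exponential rate $\pi/2 - \arctan(2\pi|\delta|) = \arctan(1/2\pi|\delta|)$, which is close to $1/2\pi|\delta|$ for $\delta$ large. The sketch below, with illustrative function names, compares the two rates and evaluates the decay at height $50|\delta|$.

```python
import math

def exact_rate(delta):
    """pi/2 - arctan(2 pi |delta|), written as arctan(1 / (2 pi |delta|))."""
    return math.atan(1 / (2 * math.pi * abs(delta)))

def approx_rate(delta):
    """Leading-order rate 1 / (2 pi |delta|) used in the text."""
    return 1 / (2 * math.pi * abs(delta))

delta = 100.0
height = 50 * delta
# e^{-t / (2 pi |delta|)} at t = 50 |delta|, i.e. e^{-50 / (2 pi)}
decay = math.exp(-height * approx_rate(delta))

print(exact_rate(delta), approx_rate(delta), decay, math.exp(-8))
```

Since $50/2\pi < 8$, the decay factor at height $50|\delta|$ is indeed larger than $e^{-8}$, whatever the value of $\delta$.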
The decay of the Gaussian smoothing function $\eta(t) = e^{-t^2/2}$ is much faster than exponential. Its Mellin transform is $2^{s/2 - 1} \Gamma(s/2)$, which decays exponentially as $\Im(s) \to \pm\infty$. Moreover, the Mellin transform $F_\delta(s)$ ($\delta \ne 0$), while not an elementary or very commonly occurring function, equals (after a change of variables) a relatively well-studied special function, namely, a parabolic cylinder function $U(a, z)$ (or, in Whittaker’s [Whi03] notation, $D_{-a-1/2}(z)$).
For $\delta$ not too small, the main term will indeed work out to be proportional to $e^{-(\tau/2\pi\delta)^2/2}$, as the method of stationary phase indicated. This is, of course, much better than $e^{-\tau/2\pi|\delta|}$. The “cost” is that the Mellin transform $2^{s/2 - 1} \Gamma(s/2)$ for $\delta = 0$ now decays like $e^{-(\pi/4)|\tau|}$ rather than $e^{-(\pi/2)|\tau|}$. This we can certainly afford.
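For $\delta = 0$ the substitution $u = t^2/2$ gives the closed form $\int_0^\infty e^{-t^2/2} t^{s-1}\, dt = 2^{s/2 - 1} \Gamma(s/2)$. A minimal numerical check for a few real values of $s$ follows; it uses a midpoint rule with truncation point `T` and step count `n` chosen by us, and it is restricted to real $s \ge 1$ because the standard library gamma function is real-valued.

```python
import math

def mellin_gaussian(s, T=40.0, n=200000):
    """Midpoint-rule approximation to int_0^T e^{-t^2/2} t^{s-1} dt for real s >= 1."""
    h = T / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += math.exp(-t * t / 2) * t ** (s - 1)
    return total * h

def closed_form(s):
    """2^{s/2 - 1} Gamma(s/2)."""
    return 2 ** (s / 2 - 1) * math.gamma(s / 2)

for s in (1.0, 2.0, 3.5):
    print(s, mellin_gaussian(s), closed_form(s))
```

The factor $2^{s/2-1}$ has absolute value $2^{\sigma/2 - 1}$, so it does not affect the exponential decay in $\tau$.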

8.2 The twisted Gaussian: overview and setup

8.2.1 Relation to the existing literature
We wish to approximate the Mellin transform
\[ F_\delta(s) = \int_0^\infty e^{-t^2/2} e(\delta t) t^s \frac{dt}{t}, \tag{8.8} \]
where $\delta \in \mathbb{R}$. The parabolic cylinder function $U : \mathbb{C}^2 \to \mathbb{C}$ is given by
\[ U(a, z) = \frac{e^{-z^2/4}}{\Gamma\left( \frac{1}{2} + a \right)} \int_0^\infty t^{a - \frac{1}{2}} e^{-\frac{1}{2} t^2 - zt}\, dt \]
for $\Re(a) > -1/2$; the function can be extended to all $a, z \in \mathbb{C}$ either by analytic continuation or by other integral representations ([AS64, §19.5], [Tem10, §12.5(i)]). Hence
\[ F_\delta(s) = e^{(\pi i \delta)^2} \Gamma(s)\, U\left( s - \frac{1}{2}, -2\pi i\delta \right). \tag{8.9} \]
The second argument of $U$ is purely imaginary; it would be otherwise if a Gaussian of non-zero mean were chosen.
Let us briefly discuss the state of knowledge up to date on Mellin transforms of “twisted” Gaussian smoothings, that is, $e^{-t^2/2}$ multiplied by an additive character $e(\delta t)$. As we have just seen, these Mellin transforms are precisely the parabolic cylinder functions $U(a, z)$.
The function $U(a, z)$ has been well-studied for $a$ and $z$ real; see, e.g., [Tem10]. Less attention has been paid to the more general case of $a$ and $z$ complex. The most notable exception is by far the work of Olver [Olv58], [Olv59], [Olv61], [Olv65]; he gave asymptotic series for $U(a, z)$, $a, z \in \mathbb{C}$. These were asymptotic series in the sense of Poincaré, and thus not in general convergent; they would solve our problem if and only if they came with error term bounds. Unfortunately, it would seem that all fully explicit error terms in the literature are either for $a$ and $z$ real, or for $a$ and $z$ outside our range of interest (see both Olver’s work and [TV03]). The bounds in [Olv61] involve non-explicit constants. Thus, we will have to find expressions with explicit error bounds ourselves. Our case is that of $a$ in the critical strip, $z$ purely imaginary.

² There may be a minor gap in the literature in this respect. The explicit formula given in [HL22, Lemma 4] does not make all constants explicit. The constants and trivial-zero terms were fully worked out for $q = 1$ by [Wig20] (cited in [MV07, Exercise 12.1.1.8(c)]; the sign of $\mathrm{hyp}_{\kappa,q}(z)$ there seems to be off). As was pointed out by Landau (see [Har66, p. 628]), [HL22] seems to neglect the effect of the zeros $\rho$ with $\Re(\rho) = 0$, $\Im(\rho) \ne 0$ for $\chi$ non-primitive. (The author thanks R. C. Vaughan for this information and the references.)

8.2.2 General approach

We will use the saddle-point method (see, e.g., [dB81, §5], [Olv74, §4.7], [Won01, §II.4]) to obtain bounds with an optimal leading-order term and small error terms. (We used the stationary-phase method solely as an exploratory tool.)
What do we expect to obtain? Both the asymptotic expressions in [Olv59] and the bounds in [Olv61] make clear that, if the sign of $\tau = \Im(s)$ is different from that of $\delta$, there will be a change in behavior when $\tau$ gets to be of size about $(2\pi\delta)^2$. This is unsurprising, given our discussion using stationary phase: for $|\Im(a)|$ smaller than a constant times $|\Im(z)|^2$, the term proportional to $e^{-(\pi/4)|\tau|} = e^{-|\Im(a)|/2}$ should be dominant, whereas for $|\Im(a)|$ much larger than a constant times $|\Im(z)|^2$, the term proportional to $e^{-\frac{1}{2} \left( \frac{\tau}{2\pi\delta} \right)^2}$ should be dominant.
There is one important difference between the approach we will follow here and that in [Hela]. In [Hela], the integral (8.8) was estimated by a direct application of the saddle-point method. Here, following a suggestion of N. Temme, we will use the identity
\[ U(a, z) = \frac{e^{\frac{1}{4} z^2}}{\sqrt{2\pi}\, i} \int_{c - i\infty}^{c + i\infty} e^{-zu + \frac{u^2}{2}} u^{-a - \frac{1}{2}}\, du \tag{8.10} \]
(see, e.g., [OLBC10, (12.5.6)]; $c > 0$ is arbitrary). Together, (8.9) and (8.10) give us that
\[ F_\delta(s) = \frac{e^{-2\pi^2 \delta^2} \Gamma(s)}{\sqrt{2\pi}\, i} \int_{c - i\infty}^{c + i\infty} e^{2\pi i \delta u + \frac{u^2}{2}} u^{-s}\, du. \tag{8.11} \]

Estimating the integral in (8.11) turns out to be a somewhat cleaner task than estimating (8.8). The overall procedure, however, is in essence the same in both cases.
We write
\[ \phi(u) = -\frac{u^2}{2} - (2\pi i\delta) u + i\tau \log u \tag{8.12} \]
for $u$ real or complex, so that the integral in (8.11) equals
\[ I(s) = \int_{c - i\infty}^{c + i\infty} e^{-\phi(u)} u^{-\sigma}\, du. \tag{8.13} \]
We wish to find a saddle point. A saddle point is a point $u$ at which $\phi'(u) = 0$. This means that
\[ -u - 2\pi i\delta + \frac{i\tau}{u} = 0, \quad \text{i.e.,} \quad u^2 - i\ell u - i\tau = 0, \tag{8.14} \]
where $\ell = -2\pi\delta$. The solutions to $\phi'(u) = 0$ are thus
\[ u_0 = \frac{i\ell \pm \sqrt{-\ell^2 + 4i\tau}}{2}. \tag{8.15} \]
The value of $\phi(u)$ at $u_0$ is
\[
\begin{aligned}
\phi(u_0) &= -\frac{i\ell u_0 + i\tau}{2} + i\ell u_0 + i\tau \log u_0 \\
&= \frac{i\ell}{2} u_0 + i\tau \log \frac{u_0}{\sqrt{e}}.
\end{aligned}
\tag{8.16}
\]
The second derivative at $u_0$ is
\[ \phi''(u_0) = -\frac{1}{u_0^2} \left( u_0^2 + i\tau \right) = -\frac{1}{u_0^2} (i\ell u_0 + 2i\tau). \tag{8.17} \]
Assign the names $u_{0,+}$, $u_{0,-}$ to the roots in (8.15) according to the sign in front of the square-root (where the square-root is defined so as to have its argument in the interval $(-\pi/2, \pi/2]$). We will actually have to pay attention just to $u_{0,+}$, since, unlike $u_{0,-}$, it lies on the right half of the plane, where our contour of integration also lies. We remark that
\[ u_{0,+} = \frac{i\ell + |\ell| \sqrt{-1 + \frac{4i\tau}{\ell^2}}}{2} = \frac{\ell}{2} \left( i \pm \sqrt{-1 + \frac{4\tau}{\ell^2} i} \right), \tag{8.18} \]
where the sign $\pm$ is $+$ if $\ell > 0$ and $-$ if $\ell < 0$. If $\ell = 0$, then $u_{0,+} = (1/\sqrt{2} + i/\sqrt{2}) \sqrt{\tau}$.
We can assume without loss of generality that $\tau \ge 0$. We will find it convenient to assume $\tau > 0$, since we can deal with $\tau = 0$ simply by letting $\tau \to 0^+$.

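8.3 The saddle point

8.3.1 The coordinates of the saddle point

We should start by determining $u_{0,+}$ explicitly, both in rectangular and polar coordinates. For one thing, we will need to estimate the integrand in (8.13) for $u = u_{0,+}$. The absolute value of the integrand is then $\left| e^{-\phi(u_{0,+})} u_{0,+}^{-\sigma} \right| = |u_{0,+}|^{-\sigma} e^{-\Re\phi(u_{0,+})}$, and, by (8.16),
\[ \Re\phi(u_{0,+}) = -\frac{\ell}{2} \Im(u_{0,+}) - \arg(u_{0,+}) \tau. \tag{8.19} \]
If $\ell = 0$, we already know that $\Re(u_{0,+}) = \Im(u_{0,+}) = \sqrt{\tau/2}$, $|u_{0,+}| = \sqrt{\tau}$ and $\arg u_{0,+} = \pi/4$. Assume from now on that $\ell \ne 0$.
We will use the expression for $u_{0,+}$ in (8.18). Solving a quadratic equation, we see that
\[ \sqrt{-1 + \frac{4\tau}{\ell^2} i} = \sqrt{\frac{j(\rho) - 1}{2}} + i \sqrt{\frac{j(\rho) + 1}{2}}, \tag{8.20} \]
where $j(\rho) = (1 + \rho^2)^{1/2}$ and $\rho = 4\tau/\ell^2$. Hence
\[ \Re(u_{0,+}) = \pm \frac{\ell}{2} \sqrt{\frac{j(\rho) - 1}{2}}, \qquad \Im(u_{0,+}) = \frac{\ell}{2} \left( 1 \pm \sqrt{\frac{j(\rho) + 1}{2}} \right). \tag{8.21} \]
Here and in what follows, the sign $\pm$ is $+$ if $\ell > 0$ and $-$ if $\ell < 0$. (Notice that $\Re(u_{0,+})$ and $\Im(u_{0,+})$ are always positive, except for $\tau = \ell = 0$, in which case $\Re(u_{0,+}) = \Im(u_{0,+}) = 0$.) By (8.21),
\[
\begin{aligned}
|u_{0,+}| &= \frac{|\ell|}{2} \cdot \left| \sqrt{\frac{-1 + j(\rho)}{2}} + \left( 1 \pm \sqrt{\frac{1 + j(\rho)}{2}} \right) i \right| \\
&= \frac{|\ell|}{2} \sqrt{\frac{-1 + j(\rho)}{2} + \frac{1 + j(\rho)}{2} + 1 \pm 2 \sqrt{\frac{1 + j(\rho)}{2}}} \\
&= \frac{|\ell|}{2} \sqrt{1 + j(\rho) \pm 2 \sqrt{\frac{1 + j(\rho)}{2}}} = \frac{|\ell|}{\sqrt{2}} \sqrt{\upsilon(\rho)^2 \pm \upsilon(\rho)},
\end{aligned}
\tag{8.22}
\]
where $\upsilon(\rho) = \sqrt{(1 + j(\rho))/2}$. We now compute the argument of $u_{0,+}$:
\[
\begin{aligned}
\arg(u_{0,+}) &= \arg\left( \ell \left( i \pm \sqrt{-1 + i\rho} \right) \right) \\
&= \arg\left( \sqrt{\frac{-1 + j(\rho)}{2}} + i \left( \pm 1 + \sqrt{\frac{1 + j(\rho)}{2}} \right) \right) \\
&= \arcsin \frac{\pm 1 + \sqrt{\frac{1 + j(\rho)}{2}}}{\sqrt{1 + j(\rho) \pm 2 \sqrt{\frac{1 + j(\rho)}{2}}}} = \arcsin \frac{\pm 1 + \upsilon(\rho)}{\sqrt{2 \upsilon(\rho) (\upsilon(\rho) \pm 1)}} \\
&= \arcsin \sqrt{\frac{1}{2} \left( 1 \pm \sqrt{\frac{2}{1 + j(\rho)}} \right)} = \frac{\pi}{2} - \frac{1}{2} \arccos\left( \pm \sqrt{\frac{2}{1 + j(\rho)}} \right)
\end{aligned}
\tag{8.23}
\]
(by $\cos(\pi - 2\theta) = -\cos 2\theta = 2\sin^2\theta - 1$). Thus
\[ \arg(u_{0,+}) = \begin{cases} \frac{\pi}{2} - \frac{1}{2} \arccos \frac{1}{\upsilon(\rho)} = \frac{1}{2} \arccos \frac{-1}{\upsilon(\rho)} & \text{if } \ell > 0, \\ \frac{1}{2} \arccos \frac{1}{\upsilon(\rho)} & \text{if } \ell < 0. \end{cases} \tag{8.24} \]
In particular, $\arg(u_{0,+})$ lies in $[0, \pi/2]$, and is close to $\pi/2$ only when $\ell > 0$ and $\rho \to 0^+$. Here and elsewhere, we follow the convention that $\arcsin$ and $\arctan$ have image in $[-\pi/2, \pi/2]$, whereas $\arccos$ has image in $[0, \pi]$.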
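The closed forms (8.21), (8.22) and (8.24), together with (8.19), can be compared against a direct evaluation of the root (8.15). The following is a minimal floating-point sketch for $\ell \ne 0$ and $\tau > 0$; the function names are ours, and the principal branch of `cmath.sqrt` plays the role of the square-root convention fixed above.

```python
import cmath
import math

def saddle_point(ell, tau):
    """u_{0,+} from (8.15): plus sign, principal square root."""
    return (1j * ell + cmath.sqrt(-ell * ell + 4j * tau)) / 2

def closed_form(ell, tau):
    """(Re, Im, modulus, argument) of u_{0,+} from (8.21), (8.22), (8.24); ell != 0."""
    sign = 1.0 if ell > 0 else -1.0
    rho = 4 * tau / ell ** 2
    j = math.sqrt(1 + rho * rho)
    ups = math.sqrt((1 + j) / 2)
    re = sign * (ell / 2) * math.sqrt((j - 1) / 2)
    im = (ell / 2) * (1 + sign * ups)
    mod = abs(ell) / math.sqrt(2) * math.sqrt(ups * ups + sign * ups)
    arg = 0.5 * math.acos(-sign / ups)
    return re, im, mod, arg

def phi(u, ell, tau):
    """phi(u) from (8.12), written with ell = -2 pi delta."""
    return -u * u / 2 + 1j * ell * u + 1j * tau * cmath.log(u)

samples = [(2.0, 1.0), (-2.0, 1.0), (0.5, 30.0), (-7.0, 3.0)]
for ell, tau in samples:
    u = saddle_point(ell, tau)
    re, im, mod, arg = closed_form(ell, tau)
    print(ell, tau, u, (re, im, mod, arg))
```

For each sample, the residual $u^2 - i\ell u - i\tau$ of (8.14) vanishes up to rounding, and $\Re\phi(u_{0,+})$ computed from (8.12) agrees with the right side of (8.19).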

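8.3.2 The direction of steepest descent

As is customary in the saddle-point method, it is now time to determine the direction of steepest descent at the saddle-point $u_{0,+}$. Even if we decide to use a contour that goes through the saddle-point in a direction that is not quite optimal, it will be useful to know what the direction $w$ of steepest descent actually is. A contour that passes through the saddle-point making an angle between $-\pi/4 + \epsilon$ and $\pi/4 - \epsilon$ with $w$ may be acceptable, in that the contribution of the saddle point is then suboptimal by at most a bounded factor depending on $\epsilon$; an angle approaching $-\pi/4$ or $\pi/4$ leads to a contribution suboptimal by an unbounded factor.
Let $w \in \mathbb{C}$ be the unit vector pointing in the direction of steepest descent. Then, by definition, $w^2 \phi''(u_{0,+})$ is real and positive, where $\phi$ is as in (8.12). Thus $\arg(w) = -\arg(\phi''(u_{0,+}))/2 \bmod \pi$. (The direction of steepest descent is defined only modulo $\pi$.) By (8.17),
\[
\begin{aligned}
\arg(\phi''(u_{0,+})) &= -\pi + \arg(i\ell u_{0,+} + 2i\tau) - 2\arg(u_{0,+}) \bmod 2\pi \\
&= -\frac{\pi}{2} + \arg(\ell u_{0,+} + 2\tau) - 2\arg(u_{0,+}) \bmod 2\pi.
\end{aligned}
\]
By (8.21),
\[
\begin{aligned}
\Re(\ell u_{0,+} + 2\tau) &= \frac{\ell^2}{2} \left( \pm \sqrt{\frac{j(\rho) - 1}{2}} + \frac{4\tau}{\ell^2} \right) = \frac{\ell^2}{2} \left( \rho \pm \sqrt{\frac{j(\rho) - 1}{2}} \right), \\
\Im(\ell u_{0,+} + 2\tau) &= \frac{\ell^2}{2} \left( 1 \pm \sqrt{\frac{j(\rho) + 1}{2}} \right).
\end{aligned}
\]
Therefore, $\arg(\ell u_{0,+} + 2\tau) = \arctan \varpi$, where
\[ \varpi = \frac{1 \pm \sqrt{\frac{j(\rho) + 1}{2}}}{\rho \pm \sqrt{\frac{j(\rho) - 1}{2}}}. \]
It is easy to check that $\mathrm{sgn}\, \varpi = \mathrm{sgn}\, \ell$. Hence,
\[ \arctan \varpi = \pm \frac{\pi}{2} - \arctan \frac{\rho \pm \sqrt{\frac{j(\rho) - 1}{2}}}{1 \pm \sqrt{\frac{j(\rho) + 1}{2}}}. \]
At the same time, writing $j = j(\rho)$ and $\upsilon = \upsilon(\rho)$,
\[
\begin{aligned}
\frac{\rho \pm \sqrt{\frac{j-1}{2}}}{1 \pm \sqrt{\frac{j+1}{2}}} &= \frac{\left( \rho \pm \sqrt{\frac{j-1}{2}} \right) \left( 1 \mp \sqrt{\frac{j+1}{2}} \right)}{1 - \frac{j+1}{2}} = \frac{\rho \pm \sqrt{2(j-1)} \mp \rho \sqrt{2(j+1)}}{1 - j} \\
&= \frac{\rho \pm \sqrt{\frac{2}{j+1}} \left( \sqrt{j^2 - 1} - \rho \cdot (j+1) \right)}{1 - j} = \frac{\rho \pm \frac{1}{\upsilon} (\rho - \rho \cdot (j+1))}{1 - j} \\
&= \frac{\rho (1 \mp j/\upsilon)}{1 - j} = \frac{(-1 \pm j/\upsilon)(j+1)}{\rho} = \frac{2\upsilon(-\upsilon \pm j)}{\rho}.
\end{aligned}
\tag{8.25}
\]
Hence, modulo $2\pi$,
\[ \arg(\phi''(u_{0,+})) = -\arctan \frac{2\upsilon(-\upsilon \pm j)}{\rho} - 2\arg(u_{0,+}) - \begin{cases} 0 & \text{if } \ell \ge 0, \\ \pi & \text{if } \ell < 0. \end{cases} \]
Therefore, the direction of steepest descent is
\[ \arg(w) = -\frac{\arg(\phi''(u_{0,+}))}{2} = \arg(u_{0,+}) + \frac{1}{2} \arctan \frac{2\upsilon(-\upsilon \pm j)}{\rho} + \begin{cases} 0 & \text{if } \ell \ge 0, \\ \frac{\pi}{2} & \text{if } \ell < 0. \end{cases} \tag{8.26} \]
By (8.24) and $\arccos 1/\upsilon = \arctan \sqrt{\upsilon^2 - 1} = \arctan \sqrt{(j-1)/2}$, we conclude that
\[ \arg(w) = \begin{cases} \frac{\pi}{2} + \frac{1}{2} \left( -\arctan \frac{2\upsilon(j + \upsilon)}{\rho} + \arctan \sqrt{\frac{j-1}{2}} \right) & \text{if } \ell < 0, \\ \frac{\pi}{2} + \frac{1}{2} \left( \arctan \frac{2\upsilon(j - \upsilon)}{\rho} - \arctan \sqrt{\frac{j-1}{2}} \right) & \text{if } \ell \ge 0. \end{cases} \tag{8.27} \]

Figure 8.1: $\arg(w) - \pi/2$ as a function of $\rho$ for $\ell < 0$
Figure 8.2: $\arg(w) - \pi/2$ as a function of $\rho$ for $\ell \ge 0$

There is nothing wrong in using plots here to get an idea of the behavior of $\arg(w)$, since, at any rate, the direction of steepest descent will play only an advisory role in our choices. See Figures 8.1 and 8.2.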
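In the same advisory spirit, (8.27) can be compared with a direct evaluation of $-\arg(\phi''(u_{0,+}))/2 \bmod \pi$, using $\phi''(u) = -1 - i\tau/u^2$ from (8.12). The sketch below is a floating-point check at a few sample points with $\ell \ne 0$; function names are illustrative.

```python
import cmath
import math

def arg_w_numeric(ell, tau):
    """-arg(phi''(u_{0,+})) / 2 reduced mod pi, with phi''(u) = -1 - i tau / u^2."""
    u0 = (1j * ell + cmath.sqrt(-ell * ell + 4j * tau)) / 2
    phi2 = -1 - 1j * tau / u0 ** 2
    return (-cmath.phase(phi2) / 2) % math.pi

def arg_w_closed(ell, tau):
    """arg(w) from (8.27); ell != 0 is assumed so that rho is finite."""
    rho = 4 * tau / ell ** 2
    j = math.sqrt(1 + rho * rho)
    ups = math.sqrt((1 + j) / 2)
    a = math.atan(math.sqrt((j - 1) / 2))
    if ell < 0:
        return math.pi / 2 + 0.5 * (-math.atan(2 * ups * (j + ups) / rho) + a)
    return math.pi / 2 + 0.5 * (math.atan(2 * ups * (j - ups) / rho) - a)

samples = [(2.0, 1.0), (-2.0, 1.0), (5.0, 0.3), (-0.4, 12.0)]
for ell, tau in samples:
    print(ell, tau, arg_w_numeric(ell, tau) - math.pi / 2, arg_w_closed(ell, tau) - math.pi / 2)
```

The printed differences from $\pi/2$ are the quantities plotted in Figures 8.1 and 8.2.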

8.4 The integral over the contour

We must now choose the contour of integration. The optimal contour should be one on which the phase of the integrand in (8.13) is constant, i.e., $\Im(\phi(u))$ is constant. This is so because, throughout the contour, we want to keep descending from the saddle as rapidly as possible, and so we want to maximize the absolute value of the derivative of the real part of the exponent $-\phi(u)$. At any point $u$, if we are to maximize $|\Re(d\phi(u)/dt)|$, we want our contour to be such that $\Im(d\phi(u)/dt) = 0$. (We can also see this as follows: if $\Im(\phi(u))$ is constant, there is no cancellation in (8.13) for us to miss.)
Writing $u = x + iy$, we obtain from (8.12) that
\[ \Im(\phi(u)) = -xy + \ell x + \tau \log \sqrt{x^2 + y^2}. \tag{8.28} \]
We would thus be considering the curve $\Im(\phi(u)) = c$, where $c$ is a constant. Since we need the contour to pass through the saddle point $u_{0,+}$, we set $c = \Im(\phi(u_{0,+}))$. The only problem is that the curve $\Im(\phi(u)) = c$ given by (8.28) is rather uncomfortable to work with.
Instead, we shall use several rather simple contours, each appropriate for different values of $\ell$ and $\tau$.

8.4.1 A simple contour

Assume first that $\ell > 0$. We could just let our contour $L$ be the vertical line going through $u_{0,+}$. Since the direction of steepest descent is never far from vertical (see Figure 8.2), this would be a good choice. However, the vertical line has the defect of going too close to the origin when $\rho \to 0$.
Instead, we will let $L$ consist of three segments: (a) the straight vertical ray
\[ \{(x_0, y) : y \ge y_0\}, \]
where $x_0 = \Re u_{0,+} \ge 0$, $y_0 = \Im u_{0,+} > 0$; (b) the straight segment going downwards and to the right from $u_{0,+}$ to the $x$-axis, forming an angle of $\pi/2 - \beta$ (where $\beta > 0$ will be determined later) with the $x$-axis at a point $(x_1, 0)$; (c) the straight vertical ray $\{(x_1, y) : y \le 0\}$. Let us call these three segments $L_1$, $L_2$, $L_3$. Shifting the contour in (8.13), we obtain
\[ I = \int_L e^{-\phi(u)} u^{-\sigma}\, du, \]
and so $|I| \le I_1 + I_2 + I_3$, where
\[ I_j = \int_{L_j} \left| e^{-\phi(u)} u^{-\sigma} \right| |du|. \tag{8.29} \]
As we shall see, we have chosen the segments $L_j$ so that each of the three integrals $I_j$ will be easy to bound.
Let us start with $I_1$. Since $\sigma \ge 0$,
\[ I_1 \le |u_{0,+}|^{-\sigma} \int_{y_0}^\infty e^{-\Re\phi(x_0 + iy)}\, dy, \]
where, by (8.12),
\[ \Re\phi(x + iy) = \frac{y^2 - x^2}{2} - \ell y - \tau \arg(x + iy). \tag{8.30} \]
Let us expand the expression on the right of (8.30) for $x = x_0$ and $y$ around $y_0 = \Im u_{0,+} > 0$. The constant term is
\[
\begin{aligned}
\Re\phi(u_{0,+}) &= -\frac{\ell}{2} y_0 - \tau \arg(u_{0,+}) = -\frac{\ell^2}{4} (1 + \upsilon(\rho)) - \frac{\tau}{2} \arccos \frac{-1}{\upsilon(\rho)} \\
&= -\left( \frac{1 + \upsilon(\rho)}{\rho} + \frac{1}{2} \arccos \frac{-1}{\upsilon(\rho)} \right) \tau,
\end{aligned}
\tag{8.31}
\]
where we are using (8.19), (8.21) and (8.24).
The linear term vanishes because $u_{0,+}$ is a saddle-point (and thus a local extremum on $L$). It remains to estimate the quadratic term. Now, in (8.30), the term $\arg(x + iy)$ equals $\arctan(y/x)$, whose quadratic term we should now examine; but, instead, we are about to see that we can bound it trivially. In general, for $t_0, t \in \mathbb{R}$ and $f \in C^2$,
\[ f(t) = f(t_0) + f'(t_0) \cdot (t - t_0) + \int_{t_0}^{t} \int_{t_0}^{r} f''(s)\, ds\, dr. \tag{8.32} \]
Now, $\arctan''(s) = -2s/(s^2 + 1)^2$, and this is negative for $s > 0$ and obeys
\[ \arctan''(-s) = -\arctan''(s) \]
154 CHAPTER 8. THE MELLIN TRANSFORM OF THE TWISTED GAUSSIAN

for all $s$. Hence, for $t_0 \geq 0$ and $t \geq -t_0$,
$$\arctan t \leq \arctan t_0 + (\arctan' t_0)\cdot(t - t_0). \tag{8.33}$$
Therefore, in (8.30), we can consider only the quadratic term coming from $(y^2 - x^2)/2$
– namely, $(y - y_0)^2/2$ – and ignore the quadratic term coming from $\arg(x + iy)$. Thus,
$$\Re\phi(x_0 + iy) \geq \frac{(y-y_0)^2}{2} + \Re\phi(u_{0,+}) \tag{8.34}$$
for $y \geq -y_0$, and, in particular, for $y \geq y_0$. Hence,
$$\int_{y_0}^{\infty} e^{-\Re\phi(x_0+iy)}\, dy \leq e^{-\Re\phi(u_{0,+})}\int_{y_0}^{\infty} e^{-\frac{1}{2}(y-y_0)^2}\, dy = \sqrt{\pi/2}\cdot e^{-\Re\phi(u_{0,+})}. \tag{8.35}$$
Notice that, once we choose to use the approximation (8.33), the vertical direction is
actually optimal. (In turn, the fact that the direction of steepest descent is close to
vertical shows us that we are not losing much by using the approximation (8.33).)
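The tangent-line bound (8.33) is easy to test numerically. The following is only a floating-point sanity check on a finite grid, not a proof; the function names and the grid are choices made for this illustration.

```python
import math

def tangent_gap(t0: float, t: float) -> float:
    """Right-hand side of (8.33) minus its left-hand side."""
    slope = 1.0 / (1.0 + t0 * t0)  # arctan'(t0)
    return math.atan(t0) + slope * (t - t0) - math.atan(t)

def smallest_gap(t0_values, steps=2000, span=50.0):
    """Minimum of the gap over a grid of t in [-t0, -t0 + span]."""
    worst = float("inf")
    for t0 in t0_values:
        for k in range(steps + 1):
            t = -t0 + span * k / steps
            worst = min(worst, tangent_gap(t0, t))
    return worst

worst = smallest_gap([0.0, 0.1, 0.5, 1.0, 3.0, 10.0])
print("smallest gap on the grid:", worst)
```

The gap should be non-negative (up to rounding) throughout the range $t \geq -t_0$; for $t$ far below $-t_0$ it becomes negative, which is why the range restriction matters.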
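As for $|u_{0,+}|^{-\sigma}$, we will estimate it by the easy bound
$$|u_{0,+}| = \frac{\ell}{\sqrt{2}}\sqrt{\upsilon^2+\upsilon} \geq \frac{\ell}{\sqrt{2}}\max\left(\sqrt{\frac{\rho}{2}}, \sqrt{2}\right) = \max(\sqrt{\tau}, \ell), \tag{8.36}$$
where we use (8.22).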
Let us now bound I2 . As we already said, the linear term at u0,+ vanishes. Let
u◦ be the point at which L2 meets the line normal to it through the origin. We must
take care that the angle formed by the origin, u0,+ and u◦ be no larger than the angle
formed by the origin, (x1 , 0) and u0 ; this will ensure that we are in the range in which
the approximation (8.33) is valid (namely, t ≥ −t0 , where t0 = tan α0 ). The first
angle is π/2 + β − arg u0,+ , whereas the second angle is π/2 − β. Hence, it is enough
to set β ≤ (arg u0,+ )/2. Then we obtain from (8.12) and (8.33) that
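$$\Re\phi(u) \geq \Re\phi(u_{0,+}) - \Re\frac{(u-u_{0,+})^2}{2}. \tag{8.37}$$
If we let $s = |u - u_{0,+}|$, we see that
$$\Re\frac{(u-u_{0,+})^2}{2} = \frac{s^2}{2}\cos\left(2\cdot\left(\frac{\pi}{2}-\beta\right)\right) = -\frac{s^2}{2}\cos 2\beta.$$
Hence,
$$I_2 \leq |u_\circ|^{-\sigma}\int_{L_2} e^{-\Re\phi(u)}\,|du| < |u_\circ|^{-\sigma}\int_0^\infty e^{-\Re\phi(u_{0,+}) - \frac{s^2}{2}\cos 2\beta}\, ds = |u_\circ|^{-\sigma} e^{-\Re\phi(u_{0,+})}\sqrt{\frac{\pi}{2\cos 2\beta}}. \tag{8.38}$$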
Since arg u0 = arg u0,+ − β, we see that, by (8.21),
$$|u_\circ| = \Re\left((x_0+iy_0)(\cos\beta - i\sin\beta)\right) = \frac{\ell}{2}\left(\sqrt{\frac{j-1}{2}}\cos\beta + \left(1+\sqrt{\frac{j+1}{2}}\right)\sin\beta\right). \tag{8.39}$$

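The square of the expression within the outer parentheses is at least
$$\frac{j-1}{2}\cos^2\beta + \left(1 + \frac{j+1}{2} + \sqrt{2(j+1)}\right)\sin^2\beta + \left(\sqrt{\frac{j^2-1}{4}} + \sqrt{\frac{j-1}{2}}\right)\sin 2\beta \geq \frac{j}{2} + \frac{7}{2}\sin^2\beta - \frac{1}{2}\cos^2\beta + \frac{j}{2}\sin^2\beta.$$
If $\beta \geq \pi/8$, then $\tan\beta > 1/\sqrt{7}$, and so, since $j > \rho$, we obtain
$$|u_\circ| \geq \frac{\ell}{2}\sqrt{\frac{j}{2}(1+\sin^2\beta)} > \frac{\ell\sqrt{\rho}}{2^{3/2}}\sqrt{1+\sin^2\beta}.$$
We can also apply the trivial bound $j \geq 1$ directly to (8.39). Thus,
$$|u_\circ| \geq \max\left(\sqrt{\frac{\tau}{2}}\sqrt{1+\sin^2\beta},\ \ell\sin\beta\right).$$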
2
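Let us choose $\beta$ as follows. We could always set $\beta = \pi/8$: since $\arg u_{0,+} \geq \pi/4$, we
then have $\beta \leq (\arg u_{0,+})/2$, as required. However, if $\rho \leq 3/2$, then $\upsilon(\rho) \leq 1.18381$,
and so, by (8.24), $\arg u_{0,+} \geq 1.28842$. We can thus set either $\beta = \pi/6 = 0.523598\ldots$
or $\beta = \pi/5 = 0.628318\ldots$, say, either of which is smaller than $(\arg u_{0,+})/2$. Going
back to (8.38), we conclude that
$$I_2 \leq e^{-\Re\phi(u_{0,+})}\cdot\frac{\sqrt{\pi}}{2^{1/4}}\left(\sqrt{\frac{\tau}{2}}\sqrt{1+\sin^2\frac{\pi}{8}}\right)^{-\sigma}$$
for $\rho$ arbitrary, and
$$I_2 \leq e^{-\Re\phi(u_{0,+})}\cdot\min\left(\sqrt{\frac{\pi/2}{\cos 2\pi/5}}\cdot\left(\ell\sin\frac{\pi}{5}\right)^{-\sigma},\ \sqrt{\pi}\left(\frac{\ell}{2}\right)^{-\sigma}\right)$$
when $\rho \leq 3/2$.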


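It remains to estimate $I_3$. For $u = x_1$,
$$-\Re\frac{(u-u_{0,+})^2}{2} = -\Re\frac{y_0^2(\tan\beta - i)^2}{2} = \frac{1}{2}\left(1-\tan^2\beta\right)y_0^2 \geq \left(1-\tan^2\beta\right)\cdot\frac{\ell^2}{8}\left(1+\frac{j+1}{2}\right) \geq \left(1-\tan^2\beta\right)\cdot\frac{\ell^2}{8}\cdot\frac{\rho}{2} = \frac{1}{4}\left(1-\tan^2\beta\right)\tau,$$
where we are using (8.21). Thus, (8.37) tells us that
$$\Re\phi(x_1) \geq \Re\phi(u_{0,+}) + \frac{1-\tan^2\beta}{4}\tau.$$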
At the same time, by (8.30) and $\tau, \ell \geq 0$,
$$\Re\phi(x_1+iy) \geq \Re\phi(x_1) + \frac{y^2}{2}$$

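for $y \leq 0$. Hence
$$I_3 \leq |x_1|^{-\sigma}\int_{L_3} e^{-\Re\phi(u)}\,|du| \leq |x_1|^{-\sigma} e^{-\Re\phi(x_1)}\int_{-\infty}^{0} e^{-y^2/2}\, dy \leq |x_1|^{-\sigma}\cdot\sqrt{\frac{\pi}{2}}\, e^{-\frac{1-\tan^2\beta}{4}\tau}\, e^{-\Re\phi(u_{0,+})}.$$
Here note that $x_1 \geq (\tan\beta)|u_{0,+}|$, and so, by (8.36),
$$x_1 \geq \tan\beta\cdot\max\left(\sqrt{\tau}, \ell\right).$$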

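We conclude that, for $\ell > 0$,
$$|I| \leq \left(1 + 2^{\frac{1}{4}}\left(\frac{2}{1+\sin^2\frac{\pi}{8}}\right)^{\sigma/2} + \frac{e^{-\frac{\sqrt{2}-1}{2}\tau}}{\left(\tan\frac{\pi}{8}\right)^{\sigma}}\right)\cdot\frac{\sqrt{\pi/2}}{\tau^{\sigma/2}}\, e^{-\Re\phi(u_{0,+})}$$
(since $(1-\tan^2\pi/8)/4 = (\sqrt{2}-1)/2$) and, when $\rho \leq 3/2$,
$$|I| \leq \left(1 + \min\left(2^{\sigma+\frac{1}{2}},\ \frac{\sqrt{\sec\frac{2\pi}{5}}}{\left(\sin\frac{\pi}{5}\right)^{\sigma}}\right) + \frac{e^{-\frac{\tau}{6}}}{(1/\sqrt{3})^{\sigma}}\right)\cdot\frac{\sqrt{\pi/2}}{\ell^{\sigma}}\, e^{-\Re\phi(u_{0,+})}.$$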

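We know $\Re\phi(u_{0,+})$ from (8.31). Write
$$E(\rho) = \frac{1}{2}\arccos\frac{1}{\upsilon(\rho)} - \frac{\upsilon(\rho)-1}{\rho}, \tag{8.40}$$
so that
$$-\Re\phi(u_{0,+}) = \left(\frac{1+\upsilon(\rho)}{\rho} + \frac{1}{2}\arccos\frac{-1}{\upsilon(\rho)}\right)\tau = \left(\frac{\pi}{2} - E(\rho) + \frac{2}{\rho}\right)\tau.$$
To finish, we just need to apply (8.11). It makes sense to group together $\Gamma(s)e^{\frac{\pi}{2}\tau}$,
since it is bounded on the critical line (by the classical formula $|\Gamma(1/2 + i\tau)| = \sqrt{\pi/\cosh \pi\tau}$, as in [MV07, Exer. C.1(b)]), and, in general, of slow growth on bounded
strips. Using (8.11), and noting that $2\pi^2\delta^2 = \ell^2/2 = (2/\rho)\cdot\tau$, we obtain
$$|F_\delta(s)| \leq |\Gamma(s)|e^{\frac{\pi}{2}\tau} e^{-E(\rho)\tau}\cdot\begin{cases} c_{1,\sigma,\tau}/\tau^{\sigma/2} & \text{for } \rho \text{ arbitrary,}\\ c_{2,\sigma,\tau}/\ell^{\sigma} & \text{for } \rho \leq 3/2,\end{cases} \tag{8.41}$$
where
$$\begin{aligned} c_{1,\sigma,\tau} &= \frac{1}{2}\left(1 + 2^{\frac{1}{4}}\left(\frac{2}{1+\sin^2\frac{\pi}{8}}\right)^{\sigma/2} + \frac{e^{-\frac{\sqrt{2}-1}{2}\tau}}{\left(\tan\frac{\pi}{8}\right)^{\sigma}}\right),\\ c_{2,\sigma,\tau} &= \frac{1}{2}\left(1 + \min\left(2^{\sigma+\frac{1}{2}},\ \frac{\sqrt{\sec\frac{2\pi}{5}}}{\left(\sin\frac{\pi}{5}\right)^{\sigma}}\right) + \frac{e^{-\frac{\tau}{6}}}{(1/\sqrt{3})^{\sigma}}\right). \end{aligned} \tag{8.42}$$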

We have assumed throughout that $\ell \geq 0$ and $\tau \geq 0$. We can immediately obtain a
bound valid for $\ell \leq 0$, $\tau \leq 0$, by reflection on the x-axis; we simply put absolute
values around $\tau$ and $\ell$ in (8.41).
We see that we have obtained a bound in a neat, closed form without too much
effort. Of course, this effortlessness is usually in part illusory; the contour we have
used here is actually the product of some trial and error, in that some other contours
give results that are comparable in quality but harder to simplify. We will have to
choose a different contour when $\operatorname{sgn}(\ell) \neq \operatorname{sgn}(\tau)$.

8.4.2 Another simple contour


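We now wish to give a bound for the case of $\operatorname{sgn}(\ell) \neq \operatorname{sgn}(\tau)$, i.e., $\operatorname{sgn}(\delta) = \operatorname{sgn}(\tau)$.
We expect a much smaller upper bound than for $\operatorname{sgn}(\ell) = \operatorname{sgn}(\tau)$, given what we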
already know from the method of stationary phase. This also means that we will not
need to be as careful in order to get a bound that is good enough for all practical
purposes.
Our contour $L$ will consist of three segments: (a) the straight vertical ray $\{(x_0, y) : y \geq 0\}$, (b) the quarter-circle from $(x_0, 0)$ to $(0, -x_0)$ (that is, an arc where the argument runs from $0$ to $-\pi/2$), and (c) the straight vertical ray $\{(0, y) : y \leq -x_0\}$. We
call these segments $L_1$, $L_2$, $L_3$, and define the integrals $I_1$, $I_2$ and $I_3$ just as in (8.29).
Much as before, we have
$$I_1 \leq x_0^{-\sigma}\int_0^\infty e^{-\Re\phi(x_0+iy)}\, dy.$$
Since (8.33) is valid for $t \geq 0$, (8.34) holds, and so
$$I_1 \leq x_0^{-\sigma} e^{-\Re\phi(u_{0,+})}\int_{-\infty}^{\infty} e^{-\frac{1}{2}(y-y_0)^2}\, dy = x_0^{-\sigma}\sqrt{2\pi}\cdot e^{-\Re\phi(u_{0,+})}.$$
−∞

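By (8.12) and (8.30),
$$I_2 \leq x_0^{-\sigma}\int_{L_2} e^{-\Re\phi(u)}\,|du| = x_0^{1-\sigma}\int_0^{\pi/2} e^{-\left(-\frac{x_0^2\cos 2\alpha}{2} + \ell x_0\sin\alpha + \tau\alpha\right)}\, d\alpha. \tag{8.43}$$
Now, for $\alpha \geq 0$ and $\ell \leq 0$,
$$(\ell x_0\sin\alpha + \tau\alpha)' = \ell x_0\cos\alpha + \tau \geq \ell x_0 + \tau.$$
Since $j = \sqrt{1+\rho^2} \leq 1 + \rho^2/2$, we have $\sqrt{(j-1)/2} \leq \rho/2$, and so, by (8.21),
$|\ell x_0| \leq \ell^2\rho/4 = \tau$, and thus $\ell x_0 + \tau \geq 0$. In other words, the exponent in (8.43)
equals $(x_0^2\cos 2\alpha)/2$ minus an increasing function, and so, since $\Re\phi(x_0) = -x_0^2/2$,
$$I_2 \leq x_0^{-\sigma}\cdot x_0\int_0^{\pi/2} e^{\frac{x_0^2\cos 2\alpha}{2}}\, d\alpha = x_0^{-\sigma}\cdot x_0\cdot\frac{\pi}{2} I_0(x_0^2/2),$$
where $I_0(t) = \frac{1}{\pi}\int_0^\pi e^{t\cos\theta}\, d\theta$ is the modified Bessel function of the first kind (and
order 0).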
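Since $\cos\theta = \sqrt{1-\sin^2\theta} < 1 - (\sin^2\theta)/2 \leq 1 - 2\theta^2/\pi^2$, we have³
$$I_0(t) \leq \frac{1}{\pi}\int_0^\pi e^{t\left(1-\frac{2\theta^2}{\pi^2}\right)}\, d\theta < \frac{1}{\pi} e^t\int_0^\infty e^{-\frac{2t}{\pi^2}\theta^2}\, d\theta = \frac{1}{\pi} e^t\cdot\frac{\pi/\sqrt{2t}}{2}\sqrt{\pi} = \frac{\sqrt{\pi}}{2^{3/2}}\frac{e^t}{\sqrt{t}}$$
for $t \geq 0$.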
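As a sanity check on the bound for $I_0(t)$, one can evaluate the power series of $I_0$ and look at the quantity $I_0(t)\sqrt{t}\,e^{-t}$ on a grid. This is a floating-point sketch only (the grid and the helper names are choices made here), not the rigorous interval-arithmetic argument mentioned in the footnote.

```python
import math

def bessel_i0(t: float) -> float:
    """I_0(t) via its power series sum_k (t^2/4)^k / (k!)^2."""
    x = t * t / 4.0
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= x / (k * k)
        total += term
    return total

def scaled(t: float) -> float:
    """I_0(t) * sqrt(t) * exp(-t), the quantity bounded by a constant."""
    return bessel_i0(t) * math.sqrt(t) * math.exp(-t)

grid = [0.01 * k for k in range(1, 5001)]        # t in (0, 50]
peak = max(scaled(t) for t in grid)
crude_constant = math.sqrt(math.pi) / 2 ** 1.5   # constant from the bound above
print(peak, crude_constant)
```

On this grid the peak stays well below the crude constant $\sqrt{\pi}/2^{3/2}$ and is close to the sharper constant quoted in the footnote.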
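Using the fact that $\Re\phi(x_0) = -x_0^2/2$, we conclude that
$$I_2 \leq x_0^{-\sigma}\cdot x_0\cdot\frac{\pi}{2}\cdot\frac{\sqrt{\pi}}{2^{3/2}}\frac{e^{x_0^2/2}}{x_0/\sqrt{2}} = \frac{\pi^{3/2}}{4} x_0^{-\sigma} e^{-\Re\phi(x_0)}.$$
By (8.34), which is valid for all $\ell$, we know that $\Re\phi(x_0) \geq \Re\phi(u_{0,+})$.
Let us now estimate the integral on $L_3$. Again by (8.30), for $y < 0$,
$$\Re\phi(iy) = \frac{y^2}{2} - \ell y + \tau\frac{\pi}{2}.$$
Hence
$$\left|\int_{L_3} e^{-\phi(u)} u^{-\sigma}\, du\right| \leq x_0^{-\sigma}\int_{-\infty}^{-x_0} e^{-\left(\frac{y^2}{2} - \ell y + \tau\frac{\pi}{2}\right)}\, dy = x_0^{-\sigma} e^{\frac{1}{2}\ell^2} e^{-\frac{\tau\pi}{2}}\int_{-\infty}^{-x_0} e^{-\frac{1}{2}(y-\ell)^2}\, dy \leq x_0^{-\sigma} e^{-\frac{\tau\pi}{2}}\sqrt{\frac{\pi}{2}},$$
since $y - \ell \leq -\ell$ for $y \leq -x_0$ and $\int_{-\infty}^{-\ell} e^{-t^2/2}\, dt \leq \sqrt{\pi/2}\cdot e^{-\ell^2/2}$ (by [AS64, 7.1.13]).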
Now that we have bounded the integrals over $L_1$, $L_2$ and $L_3$, it remains to bound
$x_0$ from below, starting from (8.21). We will bound it differently for $\rho < 3/2$ and for
$\rho \geq 3/2$. (The choice of $3/2$ is fairly arbitrary.)
Expanding $(\sqrt{1+t} - 1)^2 > 0$, we obtain that $2(1+t) - 2\sqrt{1+t} \geq t$ for all
$t \geq -1$, and so
$$\left(\frac{\sqrt{1+t}-1}{t}\right)' = \frac{1}{t^2}\left(\frac{t}{2\sqrt{1+t}} - (\sqrt{1+t}-1)\right) < 0,$$
i.e., $(\sqrt{1+t}-1)/t$ decreases as $t$ increases. Hence, for $\rho \leq \rho_0$, where $\rho_0 \geq 0$,
$$j(\rho) = \sqrt{1+\rho^2} \geq 1 + \frac{\sqrt{1+\rho_0^2}-1}{\rho_0^2}\rho^2, \tag{8.44}$$
which equals $1 + (2/9)(\sqrt{13}-2)\rho^2$ for $\rho_0 = 3/2$. Thus, for $\rho \leq 3/2$,
$$x_0 \geq \frac{|\ell|}{2}\sqrt{\frac{\frac{2}{9}(\sqrt{13}-2)\rho^2}{2}} = \frac{\sqrt{\sqrt{13}-2}}{6}|\ell|\rho = \frac{2\sqrt{\sqrt{13}-2}}{3}\frac{|\tau|}{|\ell|} \geq 0.84473\frac{|\tau|}{|\ell|}. \tag{8.45}$$
³ It is actually not hard to prove rigorously the better bound $I_0(t) \leq 0.468823\, e^t/\sqrt{t}$. For $t \geq 8$, this can
be done directly by the change of variables $\cos\theta = 1 - 2s^2$, $d\theta = 2\,ds/\sqrt{1-s^2}$, followed by the usage
of different upper bounds on the integrand $\exp(-2ts^2)/\sqrt{1-s^2}$ for $0 \leq s \leq 1/2$ and $1/2 \leq s \leq 1$.
(Thanks are due to G. Kuperberg for this argument.) For $t < 8$, use the Taylor expansion of $I_0(t)$ around
$t = 0$ [AS64, (9.6.12)]: truncate it after 16 terms, and then bound the maximum of the truncated series by
the bisection method, implemented via interval arithmetic (as described in §2.6).

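On the other hand,
$$\left(\frac{j(\rho)-1}{\rho}\right)' = \frac{1}{\rho^2}\left(j'(\rho)\rho - (j(\rho)-1)\right) = \frac{\rho^2 - (1+\rho^2) + \sqrt{1+\rho^2}}{\rho^2\sqrt{1+\rho^2}} \geq 0,$$
and so, for $\rho \geq 3/2$, $(j(\rho)-1)/\rho$ is minimal at $\rho = 3/2$, where it takes the value
$(\sqrt{13}-2)/3$. Hence
$$x_0 = \frac{|\ell|}{2}\sqrt{\frac{j(\rho)-1}{2}} \geq \frac{|\ell|\sqrt{\rho}}{2}\cdot\frac{\sqrt{\sqrt{13}-2}}{\sqrt{6}} = \frac{\sqrt{\sqrt{13}-2}}{\sqrt{6}}\sqrt{\tau} \geq 0.51729\sqrt{\tau}. \tag{8.46}$$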
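The two numerical constants in (8.45) and (8.46), and the chord inequality (8.44) with $\rho_0 = 3/2$, can be spot-checked in floating point. The sketch below uses $j(\rho) = \sqrt{1+\rho^2}$ as in the text; the function names and the grid are illustrative choices.

```python
import math

ROOT = math.sqrt(math.sqrt(13.0) - 2.0)   # sqrt(sqrt(13) - 2)

def j(rho: float) -> float:
    return math.sqrt(1.0 + rho * rho)

const_small = 2.0 * ROOT / 3.0            # constant in (8.45), rho <= 3/2
const_large = ROOT / math.sqrt(6.0)       # constant in (8.46), rho >= 3/2

def chord_gap(rho: float) -> float:
    """j(rho) minus the right side of (8.44) with rho_0 = 3/2."""
    return j(rho) - (1.0 + (2.0 / 9.0) * (math.sqrt(13.0) - 2.0) * rho * rho)

def ratio(rho: float) -> float:
    """(j(rho) - 1)/rho, which the text shows to be non-decreasing."""
    return (j(rho) - 1.0) / rho

min_chord_gap = min(chord_gap(1.5 * k / 1000) for k in range(1001))
ratios = [ratio(1.5 + 0.01 * k) for k in range(2000)]
print(const_small, const_large, min_chord_gap)
```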
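We now sum $I_1$, $I_2$ and $I_3$, and then use (8.11); we obtain that, when $\ell < 0$ and
$\tau \geq 0$,
$$|F_\delta(s)| \leq \frac{e^{-2\pi^2\delta^2}|\Gamma(s)|}{\sqrt{2\pi}}\left|\int_L e^{-\phi(u)} u^{-\sigma}\, du\right| \leq |x_0|^{-\sigma}\left(\left(1+\frac{\pi}{2^{3/2}}\right) e^{-\Re\phi(u_{0,+})} + \frac{1}{2} e^{-\frac{\tau\pi}{2}}\right) e^{-\frac{1}{2}\ell^2}|\Gamma(s)|. \tag{8.47}$$
By (8.19), (8.21) and (8.24),
$$-\Re(\phi(u_{0,+})) = \frac{\ell^2}{4}(1-\upsilon(\rho)) + \frac{\tau}{2}\arccos\frac{1}{\upsilon(\rho)} < \frac{\tau}{2}\arccos\frac{1}{\upsilon(\rho)} \leq \frac{\pi}{4}\tau.$$
We conclude that, when $\operatorname{sgn}(\ell) \neq \operatorname{sgn}(\tau)$ (i.e., $\operatorname{sgn}(\delta) = \operatorname{sgn}(\tau)$),
$$|F_\delta(s)| \leq |x_0|^{-\sigma}\cdot e^{-\frac{1}{2}\ell^2}|\Gamma(s)| e^{\frac{\pi}{2}|\tau|}\cdot\left(\left(1+\frac{\pi}{2^{3/2}}\right) e^{-\frac{\pi}{4}|\tau|} + \frac{1}{2} e^{-\pi|\tau|}\right),$$
where $x_0$ can be bounded as in (8.45) and (8.46). Here, as before, we are reducing the case
$\tau < 0$ to the case $\tau > 0$ by reflection. This concludes the proof of Theorem 8.0.1.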

8.5 Conclusions
We have obtained bounds on $|F_\delta(s)|$ for $\operatorname{sgn}(\delta) \neq \operatorname{sgn}(\tau)$ (8.41) and for $\operatorname{sgn}(\delta) = \operatorname{sgn}(\tau)$ (8.47). Our task is now to simplify them.
First, let us look at the exponent $E(\rho)$, defined as in (8.2). Its plot can be seen in
Figure 8.3. We claim that
$$E(\rho) \geq \begin{cases} 0.1598 & \text{if } \rho \geq 1.5,\\ 0.1065\rho & \text{if } \rho < 1.5.\end{cases} \tag{8.48}$$
This is so for $\rho \geq 1.5$ because $E(\rho)$ is increasing in $\rho$ and $E(1.5) = 0.15982\ldots$. The
case $\rho < 1.5$ is a little more delicate. We can easily see that $\arccos(1 - t^2/2) \geq t$ for
$0 \leq t \leq 2$ (since the derivative of the left side is $1/\sqrt{1 - t^2/4}$, which is always $\geq 1$).
We also have
$$1 + \frac{\rho^2}{2} - \frac{\rho^4}{8} \leq j(\rho) \leq 1 + \frac{\rho^2}{2}$$

Figure 8.3: The function E(ρ)


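for $0 \leq \rho \leq 8$, and so
$$1 + \frac{\rho^2}{8} - \frac{5\rho^4}{128} \leq \upsilon(\rho) \leq 1 + \frac{\rho^2}{8}$$
for $0 \leq \rho \leq \sqrt{32/5}$; this, in turn, gives us that $1/\upsilon(\rho) \leq 1 - \rho^2/8 + 7\rho^4/128$ (again
for $0 \leq \rho \leq \sqrt{32/5}$), and so $1/\upsilon(\rho) \leq 1 - (1 - 7/64)\rho^2/8$ for $0 \leq \rho \leq 1/2$. We
conclude that
$$\arccos\frac{1}{\upsilon(\rho)} \geq \frac{1}{2}\sqrt{\frac{57}{64}}\,\rho;$$
therefore,
$$E(\rho) \geq \frac{1}{4}\sqrt{\frac{57}{64}}\,\rho - \frac{\rho}{8} > 0.11093\rho > 0.1065\rho.$$
In the remaining range $1/2 \leq \rho \leq 3/2$, we prove that $E(\rho)/\rho > 0.106551$ using
the bisection method (with 20 iterations) implemented by means of interval arithmetic.
This concludes the proof of (8.48).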
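A plain floating-point evaluation of $E(\rho)$ (not the rigorous interval-arithmetic bisection used above) gives a quick plausibility check of (8.48) on $[1/2, 3/2]$. The sketch assumes the closed form $\upsilon(\rho) = \sqrt{(1 + \sqrt{1+\rho^2})/2}$; this is an assumption made here, chosen because it reproduces the values $\upsilon(3/2) \leq 1.18381$ and $E(1.5) = 0.15982\ldots$ quoted in the text, and it should be checked against the definition of $\upsilon$ given earlier in the chapter.

```python
import math

def upsilon(rho: float) -> float:
    # Assumed closed form for upsilon(rho); see the remark above.
    return math.sqrt((1.0 + math.sqrt(1.0 + rho * rho)) / 2.0)

def E(rho: float) -> float:
    """E(rho) as in (8.40)."""
    u = upsilon(rho)
    return 0.5 * math.acos(1.0 / u) - (u - 1.0) / rho

rhos = [0.5 + k / 1000 for k in range(1001)]      # grid on [1/2, 3/2]
min_ratio = min(E(r) / r for r in rhos)
print(E(1.5), min_ratio)
```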
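Assume from this point onwards that $|\tau| \geq 20$. Let us show that the contribution
of (8.3) is negligible relative to that of (8.1). Indeed,
$$\left(1+\frac{\pi}{2^{3/2}}\right) e^{-\frac{\pi}{4}|\tau|} + \frac{1}{2} e^{-\pi|\tau|} \leq \frac{7.8}{10^6} e^{-0.1598\tau}.$$
It is useful to note that $e^{-\ell^2/2} = e^{-2\tau/\rho}$, and so, for $\sigma \leq k+1$ and $\rho \leq 3/2$,
$$\frac{e^{-2\tau/\rho}}{(0.84473|\tau|/\ell)^{\sigma}} \leq \frac{e^{-40/\rho}}{\left(\frac{0.84473}{4}\rho\right)^{\sigma}\ell^{\sigma}} \leq \frac{1}{\ell^{\sigma}}\left(\frac{4}{0.84473\cdot 1.5}\right)^{\sigma}\frac{e^{-80/(3t)}}{t^{\sigma}} \leq \frac{1}{\ell^{\sigma}}\cdot 3.15683^{k+1}\frac{e^{-80/(3t)}}{t^{k+1}}, \tag{8.49}$$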

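where $t = 2\rho/3 \leq 1$. Since $e^{-c/t}/t^{k+1}$ attains its maximum at $t = c/(k+1)$,
$$\frac{e^{-80/(3t)}}{t^{k+1}} \leq \left(\frac{3(k+1)}{80}\right)^{k+1} e^{-(k+1)},$$
and so, for $\rho \leq 3/2$,
$$|x_0|^{-\sigma} e^{-\frac{1}{2}\ell^2} \leq \frac{1}{\ell^{\sigma}}\cdot\begin{cases} 0.04355 & \text{if } 0 \leq \sigma \leq 1,\\ 0.00759 & \text{if } 1 \leq \sigma \leq 2,\\ 0.00224 & \text{if } 2 \leq \sigma \leq 3,\end{cases}$$
whereas $|x_0|^{-\sigma} e^{-\ell^2/2} \leq |x_0|^{-\sigma} \leq (0.51729\sqrt{\tau})^{-\sigma}$ for $\rho \geq 3/2$.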
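The numerical claims in this passage are simple enough to recompute. The following floating-point sketch (function names are illustrative) checks the inequality with constant $7.8/10^6$ for integer $\tau$ from 20 to 2000, and recomputes the three table entries as $3.15683^{k+1}\,(3(k+1)/80)^{k+1}\,e^{-(k+1)}$ for $k = 0, 1, 2$.

```python
import math

def reflected_factor(tau: float) -> float:
    """(1 + pi/2^(3/2)) e^(-pi tau/4) + (1/2) e^(-pi tau), for tau >= 0."""
    return ((1 + math.pi / 2 ** 1.5) * math.exp(-math.pi * tau / 4)
            + 0.5 * math.exp(-math.pi * tau))

def claimed_majorant(tau: float) -> float:
    """(7.8 / 10^6) e^(-0.1598 tau)."""
    return 7.8e-6 * math.exp(-0.1598 * tau)

def table_constant(k: int) -> float:
    """3.15683^(k+1) (3(k+1)/80)^(k+1) e^(-(k+1))."""
    n = k + 1
    return (3.15683 * 3 * n / 80) ** n * math.exp(-n)

ok = all(reflected_factor(t) <= claimed_majorant(t) for t in range(20, 2001))
constants = [table_constant(k) for k in range(3)]
print(ok, constants)
```

The margin at $\tau = 20$ is small (well under one percent), which is consistent with $7.8$ having been chosen as a rounded-up constant.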
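We conclude that, for $|\tau| \geq 20$ and $\sigma \leq 3$,
$$|F_\delta(s)| \leq |\Gamma(s)| e^{\frac{\pi}{2}\tau}\cdot e^{-0.1598\tau}\cdot\begin{cases}\dfrac{4}{10^7}\dfrac{1}{\ell^{\sigma}} & \text{if } \rho \leq 3/2,\\[2mm] \dfrac{6}{10^5}\dfrac{1}{\tau^{\sigma/2}} & \text{if } \rho \geq 3/2,\end{cases} \tag{8.50}$$
provided that $\operatorname{sgn}(\delta) = \operatorname{sgn}(\tau)$ or $\delta = 0$. This will indeed be negligible compared to
our bound for the case $\operatorname{sgn}(\delta) = -\operatorname{sgn}(\tau)$.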
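Let us now deal with the factor $|\Gamma(s)| e^{\frac{\pi}{2}\tau}$. By Stirling's formula with remainder
term [GR94, (8.344)],
$$\log\Gamma(s) = \frac{1}{2}\log(2\pi) + \left(s - \frac{1}{2}\right)\log s - s + \frac{1}{12s} + R_2(s),$$
where
$$|R_2(s)| < \frac{1/30}{12|s|^3\cos^3\left(\frac{\arg s}{2}\right)} \leq \frac{\sqrt{2}}{180|s|^3}$$
for $\Re(s) \geq 0$. The real part of $(s - 1/2)\log s - s$ is
$$(\sigma - 1/2)\log|s| - \tau\arg(s) - \sigma = (\sigma - 1/2)\log|s| - \frac{\pi}{2}\tau + \tau\left(\arctan\frac{\sigma}{|\tau|} - \frac{\sigma}{|\tau|}\right)$$
for $s = \sigma + i\tau$, $\sigma \geq 0$. Since $\arctan(r) \leq r$ for $r \geq 0$, we conclude that
$$|\Gamma(s)| e^{\frac{\pi}{2}\tau} \leq \sqrt{2\pi}\,|s|^{\sigma-\frac{1}{2}}\, e^{\frac{1}{12|s|} + \frac{\sqrt{2}}{180|s|^3}}. \tag{8.51}$$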

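Lastly, $|s|^{\sigma-1/2} = |\tau|^{\sigma-1/2}\,|1 + i\sigma/\tau|^{\sigma-1/2}$. For $|\tau| \geq 20$,
$$|1 + i\sigma/\tau|^{\sigma-1/2} \leq \begin{cases} 1.000625 & \text{if } 0 \leq \sigma \leq 1,\\ 1.007491 & \text{if } 1 \leq \sigma \leq 2,\\ 1.028204 & \text{if } 2 \leq \sigma \leq 3\end{cases}$$
and
$$e^{\frac{1}{12|\tau|} + \frac{\sqrt{2}}{180|\tau|^3}} \leq 1.004177.$$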

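Thus,
$$|\Gamma(s)| e^{\frac{\pi}{2}\tau} \leq |\tau|^{\sigma-1/2}\cdot\begin{cases} 2.51868 & \text{if } 0 \leq \sigma \leq 1,\\ 2.53596 & \text{if } 1 \leq \sigma \leq 2,\\ 2.5881 & \text{if } 2 \leq \sigma \leq 3.\end{cases} \tag{8.52}$$
Let us now estimate the constants $c_{1,\sigma,\tau}$ and $c_{2,\sigma,\tau}$ in (8.2). By $|\tau| \geq 20$,
$$e^{-\frac{\sqrt{2}-1}{2}\tau} \leq 0.015889, \qquad e^{-\frac{\tau}{6}} \leq 0.035674. \tag{8.53}$$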
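The constants $c_{1,\sigma,\tau}$ and $c_{2,\sigma,\tau}$ of (8.42) are increasing in $\sigma$ and decreasing in $\tau$, so on each range $k \leq \sigma \leq k+1$, $\tau \geq 20$ their largest value is taken at $\sigma = k+1$, $\tau = 20$. A short floating-point evaluation (a sketch; the function names are chosen here) makes the resulting numbers easy to reproduce.

```python
import math

def c1(sigma: float, tau: float) -> float:
    """c_{1,sigma,tau} from (8.42)."""
    s2 = math.sin(math.pi / 8) ** 2
    middle = 2 ** 0.25 * (2 / (1 + s2)) ** (sigma / 2)
    tail = math.exp(-(math.sqrt(2) - 1) / 2 * tau) / math.tan(math.pi / 8) ** sigma
    return 0.5 * (1 + middle + tail)

def c2(sigma: float, tau: float) -> float:
    """c_{2,sigma,tau} from (8.42)."""
    a = 2 ** (sigma + 0.5)
    b = math.sqrt(1 / math.cos(2 * math.pi / 5)) / math.sin(math.pi / 5) ** sigma
    tail = math.exp(-tau / 6) * math.sqrt(3) ** sigma   # e^{-tau/6} / (1/sqrt 3)^sigma
    return 0.5 * (1 + min(a, b) + tail)

c1_values = [c1(k + 1.0, 20.0) for k in range(3)]
c2_values = [c2(k + 1.0, 20.0) for k in range(3)]
print(c1_values, c2_values)
```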

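Since $8\sin(\pi/8) = 3.061467\ldots > 1$, we obtain that
$$c_{1,\sigma,\tau} \leq \begin{cases} 1.30454 & \text{if } 0 \leq \sigma \leq 1,\\ 1.58361 & \text{if } 1 \leq \sigma \leq 2,\\ 1.98186 & \text{if } 2 \leq \sigma \leq 3,\end{cases} \qquad c_{2,\sigma,\tau} \leq \begin{cases} 1.94511 & \text{if } 0 \leq \sigma \leq 1,\\ 3.15692 & \text{if } 1 \leq \sigma \leq 2,\\ 5.02186 & \text{if } 2 \leq \sigma \leq 3.\end{cases}$$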

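Lastly, note that, for $k \leq \sigma \leq k+1$, we have
$$\frac{1}{\tau^{\sigma/2}}\cdot|\tau|^{\sigma-1/2} = |\tau|^{(\sigma-1)/2} \leq \tau^{k/2},$$
whereas, for $\rho \leq 3/2$ and $0 \leq \gamma \leq 1$,
$$\frac{|\tau|^{\gamma-1/2}}{|\ell|^{\gamma}} \leq \left(\frac{\tau}{\ell^2}\right)^{\gamma/2}|\tau|^{\frac{\gamma}{2}-\frac{1}{2}} \leq \left(\frac{3/2}{4}\right)^{\gamma/2} 20^{\frac{\gamma}{2}-\frac{1}{2}} \leq \left(\frac{3}{8}\right)^{1/2},$$
and so
$$\frac{1}{\ell^{\sigma}}\cdot|\tau|^{\sigma-1/2} = \left(\frac{|\tau|}{\ell}\right)^{k}\frac{|\tau|^{\{\sigma\}-1/2}}{|\ell|^{\{\sigma\}}} \leq \sqrt{\frac{3}{8}}\cdot\left(\frac{|\tau|}{\ell}\right)^{k}.$$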
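Multiplying, and remembering to add (8.50), we obtain that, for $k = 0, 1, 2$, $\sigma \in [0, 1]$ and $|\tau| \geq 20$,
$$|F_\delta(s+k)| + |F_\delta((1-s)+k)| \leq \begin{cases}\kappa_{k,0}\left(\dfrac{|\tau|}{|\ell|}\right)^{k} e^{-0.1065\left(\frac{2|\tau|}{|\ell|}\right)^2} & \text{if } \rho < 3/2,\\[2mm] \kappa_{k,1}|\tau|^{k} e^{-0.1598|\tau|} & \text{if } \rho \geq 3/2,\end{cases}$$
where
$$\begin{aligned}\kappa_{0,0} &\leq (4\cdot 10^{-7} + 1.94511)\cdot 2.51868\cdot\sqrt{3/8} \leq 3.001,\\ \kappa_{1,0} &\leq (4\cdot 10^{-7} + 3.15692)\cdot 2.53596\cdot\sqrt{3/8} \leq 4.903,\\ \kappa_{2,0} &\leq (4\cdot 10^{-7} + 5.02186)\cdot 2.5881\cdot\sqrt{3/8} \leq 7.96,\end{aligned}$$
and, similarly,
$$\begin{aligned}\kappa_{0,1} &\leq (6\cdot 10^{-5} + 1.30454)\cdot 2.51868 \leq 3.286,\\ \kappa_{1,1} &\leq (6\cdot 10^{-5} + 1.58361)\cdot 2.53596 \leq 4.017,\\ \kappa_{2,1} &\leq (6\cdot 10^{-5} + 1.98186)\cdot 2.5881 \leq 5.13.\end{aligned}$$
This concludes the proof of Corollary 8.0.2.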


Chapter 9

Explicit formulas

An explicit formula is an expression restating a sum such as $S_{\eta,\chi}(\delta/x, x)$ as a sum of
the Mellin transform $G_\delta(s)$ over the zeros of the $L$-function $L(s, \chi)$. More specifically,
for us, $G_\delta(s)$ is the Mellin transform of $\eta(t)e(\delta t)$ for some smoothing function $\eta$ and
some $\delta \in \mathbb{R}$. We want a formula whose error terms are good both for $\delta$ very close or
equal to 0 and for $\delta$ farther away from 0. (Indeed, our choice(s) of $\eta$ will be made so
that $F_\delta(s)$ decays rapidly in both cases.)
We will be able to base all of our work on a single, general explicit formula, namely,
Lemma 9.1.1. This explicit formula has simple error terms given purely in terms of a
few norms of the given smoothing function $\eta$. We also give a common framework for
estimating the contribution of zeros on the critical strip (Lemmas 9.1.3 and 9.1.4).
The first example we work out is that of the Gaussian smoothing $\eta(t) = e^{-t^2/2}$.
We actually do this in part for didactic purposes and in part because of its likely applicability elsewhere; for our applications, we will always use smoothing functions
based on $te^{-t^2/2}$ and $t^2e^{-t^2/2}$, generally in combination with something else. Since
$\eta(t) = e^{-t^2/2}$ does not vanish at $t = 0$, its Mellin transform has a pole at $s = 0$
– something that requires some additional work (Lemma 9.1.2; see also the proof of
Lemma 9.1.1).
Other than that, for each function $\eta(t)$, all that has to be done is to bound an integral
(from Lemma 9.1.3) and bound a few norms. Still, both for $\eta_*$ and for $\eta_+$, we find a
few interesting complications. Since $\eta_+$ is defined in terms of a truncation of a Mellin
transform (or, alternatively, in terms of a multiplicative convolution with a Dirichlet
kernel, as in (7.4) and (7.6)), bounding the norms of $\eta_+$ and $\eta_+'$ takes a little work. We
leave this to Appendix A. The effect of the convolution is then just to delay the decay
by a shift, in that a rapidly decaying function $f(\tau)$ will get replaced by $f(\tau - H)$, $H$ a
constant.
The smoothing function $\eta_*$ is defined as a multiplicative convolution of $t^2e^{-t^2/2}$
with something else. Given that we have an explicit formula for $t^2e^{-t^2/2}$, we obtain an
explicit formula for $\eta_*$ by what amounts to just exchanging the order of a sum and an
integral. (We already went over this in the introduction, in (1.40).)


9.1 A general explicit formula


We will prove an explicit formula valid whenever the smoothing $\eta$ and its derivative $\eta'$
satisfy rather mild assumptions – they will be assumed to be $L^2$-integrable and to have
strips of definition containing $\{s : 1/2 \leq \Re(s) \leq 3/2\}$, though any strip of the form
$\{s : \epsilon \leq \Re(s) \leq 1 + \epsilon\}$ would do just as well.
(For explicit formulas with different sets of assumptions, see, e.g., [IK04, §5.5] and
[MV07, Ch. 12].)
The main idea in deriving any explicit formula is to start with an expression giving
a sum as an integral over a vertical line with an integrand involving a Mellin transform
(here, $G_\delta(s)$) and an $L$-function (here, $L(s, \chi)$). We then shift the line of integration to
the left. If stronger assumptions were made (as in Exercise 5 in [IK04, §5.5]), we could
shift the integral all the way to $\Re(s) = -\infty$; the integral would then disappear, replaced
entirely by a sum over zeros (or even, as in the same Exercise 5, by a particularly simple
integral). Another possibility is to shift the line only to $\Re(s) = 1/2 + \epsilon$ for some $\epsilon > 0$
– but this gives a weaker result, and at any rate the factor $L'(s, \chi)/L(s, \chi)$ can be large
and messy to estimate within the critical strip $0 < \Re(s) < 1$.
Instead, we will shift the line to $\Re s = -1/2$. We can do this because the assumptions on $\eta$ and $\eta'$ are enough to continue $G_\delta(s)$ analytically up to there (with a possible
pole at $s = 0$). The factor $L'(s, \chi)/L(s, \chi)$ is easy to estimate for $\Re s < 0$ and $s = 0$
(by the functional equation), and the part of the integral on $\Re s = -1/2$ coming from
$G_\delta(s)$ can be estimated easily using the fact that the Mellin transform is an isometry.
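Lemma 9.1.1. Let $\eta : \mathbb{R}_0^+ \to \mathbb{R}$ be in $C^1$. Let $x \in \mathbb{R}^+$, $\delta \in \mathbb{R}$. Let $\chi$ be a primitive
character mod $q$, $q \geq 1$.
Write $G_\delta(s)$ for the Mellin transform of $\eta(t)e(\delta t)$. Assume that $\eta(t)$ and $\eta'(t)$ are
in $\ell_2$ (with respect to the measure $dt$) and that $\eta(t)t^{\sigma-1}$ and $\eta'(t)t^{\sigma-1}$ are in $\ell_1$ (again
with respect to $dt$) for all $\sigma$ in an open interval containing $[1/2, 3/2]$.
Then
$$\sum_{n=1}^{\infty}\Lambda(n)\chi(n)e\left(\frac{\delta}{x}n\right)\eta(n/x) = I_{q=1}\cdot\widehat{\eta}(-\delta)x - \sum_\rho G_\delta(\rho)x^\rho - R + O^*\left((\log q + 6.01)\cdot(|\eta'|_2 + 2\pi|\delta||\eta|_2)\right)x^{-1/2}, \tag{9.1}$$
where
$$I_{q=1} = \begin{cases} 1 & \text{if } q = 1,\\ 0 & \text{if } q \neq 1,\end{cases} \qquad R = \eta(0)\left(\log\frac{2\pi}{q} + \gamma - \frac{L'(1, \chi)}{L(1, \chi)}\right) + O^*(c_0) \tag{9.2}$$
for $q > 1$, $R = \eta(0)\log 2\pi$ for $q = 1$ and
$$c_0 = \frac{2}{3} O^*\left(\left|\frac{\eta'(t)}{\sqrt{t}}\right|_1 + \left|\eta'(t)\sqrt{t}\right|_1 + 2\pi|\delta|\left(\left|\frac{\eta(t)}{\sqrt{t}}\right|_1 + |\eta(t)\sqrt{t}|_1\right)\right). \tag{9.3}$$
The norms $|\eta|_2$, $|\eta'|_2$, $|\eta'(t)/\sqrt{t}|_1$, etc., are taken with respect to the usual measure $dt$.
The sum $\sum_\rho$ is a sum over all non-trivial zeros $\rho$ of $L(s, \chi)$.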

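Proof. Since (a) $\eta(t)t^{\sigma-1}$ is in $\ell_1$ for $\sigma$ in an open interval containing $3/2$ and (b)
$\eta(t)e(\delta t)$ has bounded variation (since $\eta, \eta' \in \ell_1$, implying that the derivative of
$\eta(t)e(\delta t)$ is also in $\ell_1$), the Mellin inversion formula (as in, e.g., [IK04, 4.106]) holds:
$$\eta(n/x)e(\delta n/x) = \frac{1}{2\pi i}\int_{\frac{3}{2}-i\infty}^{\frac{3}{2}+i\infty} G_\delta(s)x^s n^{-s}\, ds.$$
Since $G_\delta(s)$ is bounded for $\Re(s) = 3/2$ (by $\eta(t)t^{3/2-1} \in \ell_1$) and $\sum_n \Lambda(n)n^{-3/2}$ is
bounded as well, we can change the order of summation and integration as follows:
$$\begin{aligned}\sum_{n=1}^{\infty}\Lambda(n)\chi(n)e(\delta n/x)\eta(n/x) &= \sum_{n=1}^{\infty}\Lambda(n)\chi(n)\cdot\frac{1}{2\pi i}\int_{\frac{3}{2}-i\infty}^{\frac{3}{2}+i\infty} G_\delta(s)x^s n^{-s}\, ds\\ &= \frac{1}{2\pi i}\int_{\frac{3}{2}-i\infty}^{\frac{3}{2}+i\infty}\sum_{n=1}^{\infty}\Lambda(n)\chi(n)G_\delta(s)x^s n^{-s}\, ds\\ &= \frac{1}{2\pi i}\int_{\frac{3}{2}-i\infty}^{\frac{3}{2}+i\infty} -\frac{L'(s, \chi)}{L(s, \chi)}G_\delta(s)x^s\, ds.\end{aligned} \tag{9.4}$$
(This is the way the procedure always starts: see, for instance, [HL22, Lemma 1] or,
to look at a recent standard reference, [MV07, p. 144]. We are being very scrupulous
about integration because we are working with general $\eta$.)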
The first question we should ask ourselves is: up to where can we extend $G_\delta(s)$?
Since $\eta(t)t^{\sigma-1}$ is in $\ell_1$ for $\sigma$ in an open interval $I$ containing $[1/2, 3/2]$, the transform
$G_\delta(s)$ is defined for $\Re(s)$ in the same interval $I$. However, we also know that the
transformation rule $M(tf'(t))(s) = -s\cdot Mf(s)$ (see (2.10); by integration by parts)
is valid when $s$ is in the holomorphy strip for both $M(tf'(t))$ and $Mf$. In our case
($f(t) = \eta(t)e(\delta t)$), this happens when $\Re(s) \in (I - 1)\cap I$ (so that both sides of the
equation in the rule are defined). Hence $s\cdot G_\delta(s)$ (which equals $s\cdot Mf(s)$) can be
analytically continued to $\Re(s)$ in $(I - 1)\cup I$, which is an open interval containing
$[-1/2, 3/2]$. This implies immediately that $G_\delta(s)$ can be analytically continued to the
same region, with a possible pole at $s = 0$.
When does $G_\delta(s)$ have a pole at $s = 0$? This happens when $sG_\delta(s)$ is non-zero at
$s = 0$, i.e., when $M(tf'(t))(0) \neq 0$ for $f(t) = \eta(t)e(\delta t)$. Now
$$M(tf'(t))(0) = \int_0^\infty f'(t)\, dt = \lim_{t\to\infty} f(t) - f(0).$$
We already know that $f'(t) = (d/dt)(\eta(t)e(\delta t))$ is in $\ell_1$. Hence, $\lim_{t\to\infty} f(t)$ exists,
and must be 0 because $f$ is in $\ell_1$. Hence $-M(tf'(t))(0) = f(0) = \eta(0)$.
Let us look at the next term in the Laurent expansion of $G_\delta(s)$ at $s = 0$. It is
$$\begin{aligned}\lim_{s\to 0}\frac{sG_\delta(s) - \eta(0)}{s} &= \lim_{s\to 0}\frac{-M(tf'(t))(s) - f(0)}{s} = -\lim_{s\to 0}\frac{1}{s}\int_0^\infty f'(t)(t^s - 1)\, dt\\ &= -\int_0^\infty f'(t)\lim_{s\to 0}\frac{t^s - 1}{s}\, dt = -\int_0^\infty f'(t)\log t\, dt.\end{aligned}$$

Here we were able to exchange the limit and the integral because $f'(t)t^{\sigma}$ is in $\ell_1$
for $\sigma$ in a neighborhood of 0; in turn, this is true because $f'(t) = \eta'(t) + 2\pi i\delta\eta(t)$
and $\eta'(t)t^{\sigma}$ and $\eta(t)t^{\sigma}$ are both in $\ell_1$ for $\sigma$ in a neighborhood of 0. In fact, we will
use the easy bounds $|\eta(t)\log t| \leq (2/3)(|\eta(t)t^{-1/2}|_1 + |\eta(t)t^{1/2}|_1)$, $|\eta'(t)\log t| \leq
(2/3)(|\eta'(t)t^{-1/2}|_1 + |\eta'(t)t^{1/2}|_1)$, resulting from the inequality
$$|\log t| \leq \frac{2}{3}\left(t^{-\frac{1}{2}} + t^{\frac{1}{2}}\right), \tag{9.5}$$
valid for all $t > 0$.
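The elementary inequality (9.5), read as an upper bound on $|\log t|$ (which is the direction needed for the two bounds just stated), can be checked on a logarithmic grid. This is only a numerical illustration with names chosen here; the inequality is nearly tight at $t = 9$ and $t = 1/9$.

```python
import math

def gap(t: float) -> float:
    """(2/3)(t^(-1/2) + t^(1/2)) - |log t|, which should be positive."""
    return (2.0 / 3.0) * (t ** -0.5 + t ** 0.5) - abs(math.log(t))

# logarithmic grid of t from e^-20 to e^20
ts = [math.exp(-20 + 40 * k / 100000) for k in range(100001)]
worst = min(gap(t) for t in ts)
print(worst, gap(9.0))
```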
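We conclude that the Laurent expansion of $G_\delta(s)$ at $s = 0$ is
$$G_\delta(s) = \frac{\eta(0)}{s} + c_0 + c_1 s + \ldots, \tag{9.6}$$
where
$$c_0 = O^*\left(|f'(t)\log t|_1\right) = \frac{2}{3} O^*\left(\left|\frac{\eta'(t)}{\sqrt{t}}\right|_1 + \left|\eta'(t)\sqrt{t}\right|_1 + 2\pi|\delta|\left(\left|\frac{\eta(t)}{\sqrt{t}}\right|_1 + |\eta(t)\sqrt{t}|_1\right)\right).$$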

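We shift the line of integration in (9.4) to $\Re(s) = -1/2$. We obtain
$$\frac{1}{2\pi i}\int_{\frac{3}{2}-i\infty}^{\frac{3}{2}+i\infty} -\frac{L'(s, \chi)}{L(s, \chi)}G_\delta(s)x^s\, ds = I_{q=1}G_\delta(1)x - \sum_\rho G_\delta(\rho)x^\rho - R - \frac{1}{2\pi i}\int_{-1/2-i\infty}^{-1/2+i\infty}\frac{L'(s, \chi)}{L(s, \chi)}G_\delta(s)x^s\, ds, \tag{9.7}$$
where
$$R = \operatorname{Res}_{s=0}\frac{L'(s, \chi)}{L(s, \chi)}G_\delta(s).$$
Of course,
$$G_\delta(1) = M(\eta(t)e(\delta t))(1) = \int_0^\infty \eta(t)e(\delta t)\, dt = \widehat{\eta}(-\delta).$$
Let us work out the Laurent expansion of $L'(s, \chi)/L(s, \chi)$ at $s = 0$. By the functional equation (as in, e.g., [IK04, Thm. 4.15]),
$$\frac{L'(s, \chi)}{L(s, \chi)} = \log\frac{\pi}{q} - \frac{1}{2}\psi\left(\frac{s+\kappa}{2}\right) - \frac{1}{2}\psi\left(\frac{1-s+\kappa}{2}\right) - \frac{L'(1-s, \chi)}{L(1-s, \chi)}, \tag{9.8}$$
where $\psi(s) = \Gamma'(s)/\Gamma(s)$ and
$$\kappa = \begin{cases} 0 & \text{if } \chi(-1) = 1,\\ 1 & \text{if } \chi(-1) = -1.\end{cases}$$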

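By $\psi(1-x) - \psi(x) = \pi\cot\pi x$ (immediate from $\Gamma(s)\Gamma(1-s) = \pi/\sin\pi s$) and
$\psi(s) + \psi(s + 1/2) = 2(\psi(2s) - \log 2)$ (Legendre; [AS64, (6.3.8)]),
$$-\frac{1}{2}\left(\psi\left(\frac{s+\kappa}{2}\right) + \psi\left(\frac{1-s+\kappa}{2}\right)\right) = -\psi(1-s) + \log 2 + \frac{\pi}{2}\cot\frac{\pi(s+\kappa)}{2}. \tag{9.9}$$
Hence, unless $q = 1$, the Laurent expansion of $L'(s, \chi)/L(s, \chi)$ at $s = 0$ is
$$\frac{1-\kappa}{s} + \left(\log\frac{2\pi}{q} - \psi(1) - \frac{L'(1, \chi)}{L(1, \chi)}\right) + a_1 s + a_2 s^2 + \ldots.$$
Here $\psi(1) = -\gamma$, where $\gamma$ is Euler's constant [AS64, (6.3.2)].
There is a special case for $q = 1$ due to the pole of $\zeta(s)$ at $s = 1$. We know that
$\zeta'(0)/\zeta(0) = \log 2\pi$ (see, e.g., [MV07, p. 331]).
From this and (9.6), we conclude that, if $\eta(0) = 0$, then
$$R = \begin{cases} c_0 & \text{if } q > 1 \text{ and } \chi(-1) = 1,\\ 0 & \text{otherwise,}\end{cases}$$
where $c_0 = O^*(|\eta'(t)\log t|_1 + 2\pi|\delta||\eta(t)\log t|_1)$. If $\eta(0) \neq 0$, then
$$R = \eta(0)\left(\log\frac{2\pi}{q} + \gamma - \frac{L'(1, \chi)}{L(1, \chi)}\right) + \begin{cases} c_0 & \text{if } \chi(-1) = 1,\\ 0 & \text{otherwise}\end{cases}$$
for $q > 1$, and
$$R = \eta(0)\log 2\pi$$
for $q = 1$.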
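It is time to estimate the integral on the right side of (9.7). For that, we will need to
estimate $L'(s, \chi)/L(s, \chi)$ for $\Re(s) = -1/2$ using (9.8) and (9.9).
If $\Re(z) = 3/2$, then $|t^2 + z^2| \geq 9/4$ for all real $t$. Hence, by [OLBC10, (5.9.15)]
and [GR94, (3.411.1)],
$$\begin{aligned}\psi(z) &= \log z - \frac{1}{2z} - 2\int_0^\infty\frac{t\, dt}{(t^2+z^2)(e^{2\pi t}-1)}\\ &= \log z - \frac{1}{2z} + 2\cdot O^*\left(\int_0^\infty\frac{t\, dt}{\frac{9}{4}(e^{2\pi t}-1)}\right)\\ &= \log z - \frac{1}{2z} + \frac{8}{9} O^*\left(\int_0^\infty\frac{t\, dt}{e^{2\pi t}-1}\right)\\ &= \log z - \frac{1}{2z} + \frac{8}{9}\cdot O^*\left(\frac{1}{(2\pi)^2}\Gamma(2)\zeta(2)\right)\\ &= \log z - \frac{1}{2z} + O^*\left(\frac{1}{27}\right) = \log z + O^*\left(\frac{10}{27}\right).\end{aligned} \tag{9.10}$$
Thus, in particular, $\psi(1-s) = \log(3/2 - i\tau) + O^*(10/27)$, where we write $s = -1/2 + i\tau$. Now
$$\left|\cot\frac{\pi(s+\kappa)}{2}\right| = \left|\frac{e^{\mp\frac{\pi}{4}i - \frac{\pi}{2}\tau} + e^{\pm\frac{\pi}{4}i + \frac{\pi}{2}\tau}}{e^{\mp\frac{\pi}{4}i - \frac{\pi}{2}\tau} - e^{\pm\frac{\pi}{4}i + \frac{\pi}{2}\tau}}\right| = 1.$$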

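Since $\Re(s) = -1/2$, a comparison of Dirichlet series gives
$$\left|\frac{L'(1-s, \chi)}{L(1-s, \chi)}\right| \leq \frac{|\zeta'(3/2)|}{|\zeta(3/2)|} \leq 1.50524, \tag{9.11}$$
where $\zeta'(3/2)$ and $\zeta(3/2)$ can be evaluated by Euler-Maclaurin. Therefore, (9.8) and
(9.9) give us that, for $s = -1/2 + i\tau$,
$$\left|\frac{L'(s, \chi)}{L(s, \chi)}\right| \leq \left|\log\frac{q}{\pi}\right| + \log\left|\frac{3}{2} + i\tau\right| + \frac{10}{27} + \log 2 + \frac{\pi}{2} + 1.50524 \leq \left|\log\frac{q}{\pi}\right| + \frac{1}{2}\log\left(\tau^2 + \frac{9}{4}\right) + 4.1396. \tag{9.12}$$
Recall that we must bound the integral on the right side of (9.7). The absolute value
of the integral is at most $x^{-1/2}$ times
$$\frac{1}{2\pi}\int_{-\frac{1}{2}-i\infty}^{-\frac{1}{2}+i\infty}\left|\frac{L'(s, \chi)}{L(s, \chi)}G_\delta(s)\right| |ds|. \tag{9.13}$$
By Cauchy-Schwarz, this is at most
$$\sqrt{\frac{1}{2\pi}\int_{-\frac{1}{2}-i\infty}^{-\frac{1}{2}+i\infty}\left|\frac{L'(s, \chi)}{L(s, \chi)}\cdot\frac{1}{s}\right|^2 |ds|}\cdot\sqrt{\frac{1}{2\pi}\int_{-\frac{1}{2}-i\infty}^{-\frac{1}{2}+i\infty}|G_\delta(s)s|^2\, |ds|}.$$
By (9.12),
$$\sqrt{\int_{-\frac{1}{2}-i\infty}^{-\frac{1}{2}+i\infty}\left|\frac{L'(s, \chi)}{L(s, \chi)}\cdot\frac{1}{s}\right|^2 |ds|} \leq \sqrt{\int_{-\frac{1}{2}-i\infty}^{-\frac{1}{2}+i\infty}\left|\frac{\log q}{s}\right|^2 |ds|} + \sqrt{\int_{-\infty}^{\infty}\frac{\left(\frac{1}{2}\log\left(\tau^2+\frac{9}{4}\right) + 4.1396 + \log\pi\right)^2}{\frac{1}{4}+\tau^2}\, d\tau} \leq \sqrt{2\pi}\log q + \sqrt{226.844},$$
where we compute the last integral numerically.¹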
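For orientation, the last integral can be approximated with ordinary floating-point quadrature. The sketch below is not a substitute for the rigorous computation cited in the footnote; it uses the substitution $\tau = \frac{1}{2}\sinh u$ (so that $d\tau/(\frac14 + \tau^2) = 2\,du/\cosh u$), a midpoint rule, and the truncation $|\tau| \leq 100000$ mentioned in the footnote. The function names and the number of nodes are choices made here.

```python
import math

def g(tau: float) -> float:
    """(1/2) log(tau^2 + 9/4) + 4.1396 + log(pi)."""
    return 0.5 * math.log(tau * tau + 2.25) + 4.1396 + math.log(math.pi)

def truncated_integral(T: float, n: int = 20000) -> float:
    """Midpoint-rule value of the integral of g(tau)^2 / (1/4 + tau^2) over |tau| <= T."""
    U = math.asinh(2.0 * T)          # tau = sinh(u)/2 reaches T at u = U
    h = U / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += g(0.5 * math.sinh(u)) ** 2 / math.cosh(u)
    # factor 2 from d tau/(1/4 + tau^2) = 2 du / cosh u, factor 2 from evenness
    return 4.0 * h * total

value = truncated_integral(1.0e5)
constant = math.sqrt(value / (2.0 * math.pi))
print(value, constant)
```

The second printed number corresponds to the constant $6.01$ in the error term of (9.1), since $\sqrt{226.844/(2\pi)} < 6.01$.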
Again, we use the fact that, by (2.10), $sG_\delta(s)$ is the Mellin transform of
$$-t\frac{d(e(\delta t)\eta(t))}{dt} = -2\pi i\delta t e(\delta t)\eta(t) - te(\delta t)\eta'(t). \tag{9.14}$$
Hence, by Plancherel (as in (2.6)),
$$\sqrt{\frac{1}{2\pi}\int_{-\frac{1}{2}-i\infty}^{-\frac{1}{2}+i\infty}|G_\delta(s)s|^2\, |ds|} = \sqrt{\int_0^\infty\left|-2\pi i\delta te(\delta t)\eta(t) - te(\delta t)\eta'(t)\right|^2 t^{-2}\, dt} \leq 2\pi|\delta|\sqrt{\int_0^\infty|\eta(t)|^2\, dt} + \sqrt{\int_0^\infty|\eta'(t)|^2\, dt}. \tag{9.15}$$
¹ By a rigorous integration from $\tau = -100000$ to $\tau = 100000$ using VNODE-LP [Ned06], which runs
on the PROFIL/BIAS interval arithmetic package [Knü99].

Thus, (9.13) is at most
$$\left(\log q + \sqrt{\frac{226.844}{2\pi}}\right)\cdot\left(|\eta'|_2 + 2\pi|\delta||\eta|_2\right).$$

Lemma 9.1.1 leaves us with three tasks: bounding the sum of $G_\delta(\rho)x^\rho$ over all
non-trivial zeroes $\rho$ with small imaginary part, bounding the sum of $G_\delta(\rho)x^\rho$ over all
non-trivial zeroes $\rho$ with large imaginary part, and bounding $L'(1, \chi)/L(1, \chi)$. Let
us start with the last task: while, in a narrow sense, it is optional – in that, in the
applications we actually need (Thm. 7.1.2, Cor. 7.1.3 and Thm. 7.1.4), we will have
$\eta(0) = 0$, thus making the term $L'(1, \chi)/L(1, \chi)$ disappear – it is also very easy and
can be dealt with quickly.
Since we will be using a finite GRH check in all later applications, we might as
well use it here.
Lemma 9.1.2. Let $\chi$ be a primitive character mod $q$, $q > 1$. Assume that all non-trivial
zeroes $\rho = \sigma + it$ of $L(s, \chi)$ with $|t| \leq 5/8$ satisfy $\Re(\rho) = 1/2$. Then
$$\left|\frac{L'(1, \chi)}{L(1, \chi)}\right| \leq \frac{5}{2}\log M(q) + c,$$
where $M(q) = \max_n\left|\sum_{m\leq n}\chi(m)\right|$ and
$$c = 5\log\frac{2\sqrt{3}}{\zeta(9/4)/\zeta(9/8)} = 15.07016\ldots.$$
Proof. By a lemma of Landau's (see, e.g., [MV07, Lemma 6.3], where the constants
are easily made explicit) based on the Borel-Carathéodory Lemma (as in [MV07,
Lemma 6.2]), any function $f$ analytic and zero-free on a disc $C_{s_0,R} = \{s : |s - s_0| \leq R\}$ of radius $R > 0$ around $s_0$ satisfies
$$\frac{f'(s)}{f(s)} = O^*\left(\frac{2R\log M/|f(s_0)|}{(R-r)^2}\right) \tag{9.16}$$
for all $s$ with $|s - s_0| \leq r$, where $0 < r < R$ and $M$ is the maximum of $|f(z)|$ on
$C_{s_0,R}$. Assuming $L(s, \chi)$ has no non-trivial zeros off the critical line with $|\Im(s)| \leq H$,
where $H > 1/2$, we set $s_0 = 1/2 + H$, $r = H - 1/2$, and let $R \to H^-$. We obtain
$$\frac{L'(1, \chi)}{L(1, \chi)} = O^*\left(8H\log\frac{\max_{s\in C_{s_0,H}}|L(s, \chi)|}{|L(s_0, \chi)|}\right). \tag{9.17}$$
Now
$$|L(s_0, \chi)| \geq \prod_p (1 + p^{-s_0})^{-1} = \prod_p\frac{(1 - p^{-2s_0})^{-1}}{(1 - p^{-s_0})^{-1}} = \frac{\zeta(2s_0)}{\zeta(s_0)}.$$
Since $s_0 = 1/2 + H$, $C_{s_0,H}$ is contained in $\{s \in \mathbb{C} : \Re(s) > 1/2\}$ for any value of $H$.
We choose (somewhat arbitrarily) $H = 5/8$.

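By partial summation, for $s = \sigma + it$ with $1/2 \leq \sigma < 1$ and any $N \in \mathbb{Z}^+$,
$$\begin{aligned} L(s, \chi) &= \sum_{n\leq N}\chi(n)n^{-s} - \left(\sum_{m\leq N}\chi(m)\right)(N+1)^{-s} + \sum_{n\geq N+1}\left(\sum_{m\leq n}\chi(m)\right)\left(n^{-s} - (n+1)^{-s}\right)\\ &= O^*\left(\frac{N^{1-1/2}}{1-1/2} + N^{1-\sigma} + M(q)N^{-\sigma}\right),\end{aligned} \tag{9.18}$$
where $M(q) = \max_n\left|\sum_{m\leq n}\chi(m)\right|$. We set $N = M(q)/3$, and obtain
$$|L(s, \chi)| \leq 2M(q)N^{-1/2} = 2\sqrt{3}\sqrt{M(q)}. \tag{9.19}$$
We put this into (9.17) and are done.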


Let $M(q)$ be as in the statement of Lem. 9.1.2. Since the sum of $\chi(n)$ ($\chi$ mod $q$,
$q > 1$) over any interval of length $q$ is 0, it is easy to see that $M(q) \leq q/2$. We also
have the following explicit version of the Pólya-Vinogradov inequality:
$$M(q) \leq \begin{cases}\frac{2}{\pi^2}\sqrt{q}\log q + \frac{4}{\pi^2}\sqrt{q}\log\log q + \frac{3}{2}\sqrt{q} & \text{if } \chi(-1) = 1,\\ \frac{1}{2\pi}\sqrt{q}\log q + \frac{1}{\pi}\sqrt{q}\log\log q + \sqrt{q} & \text{if } \chi(-1) = -1.\end{cases} \tag{9.20}$$
Taken together with $M(q) \leq q/2$, this implies that
$$M(q) \leq q^{4/5} \tag{9.21}$$
for all $q \geq 1$, and also that
$$M(q) \leq 2q^{3/5} \tag{9.22}$$
for all $q \geq 1$.
Notice, lastly, that
$$\left|\log\frac{2\pi}{q} + \gamma\right| \leq \log q + \log\frac{e^\gamma\cdot 2\pi}{3^2}$$
for all $q \geq 3$. (There are no primitive characters modulo 2, so we can omit $q = 2$.)
We conclude that, for $\chi$ primitive and non-trivial,
$$\left|\log\frac{2\pi}{q} + \gamma - \frac{L'(1, \chi)}{L(1, \chi)}\right| \leq \log\frac{e^\gamma\cdot 2\pi}{3^2} + \log q + \frac{5}{2}\log q^{\frac{4}{5}} + 15.07017 \leq 3\log q + 15.289.$$
Obviously, 15.289 is more than $\log 2\pi$, the bound for $\chi$ trivial. Hence, the absolute
value of the quantity $R$ in the statement of Lemma 9.1.1 is at most
$$|\eta(0)|(3\log q + 15.289) + |c_0| \tag{9.23}$$



for all primitive $\chi$.
It now remains to bound the sum $\sum_\rho G_\delta(\rho)x^\rho$ in (9.1). Clearly
$$\left|\sum_\rho G_\delta(\rho)x^\rho\right| \leq \sum_\rho |G_\delta(\rho)|\cdot x^{\Re(\rho)}.$$
Recall that these are sums over the non-trivial zeros $\rho$ of $L(s, \chi)$.
We first prove a general lemma on sums of values of functions on the non-trivial
zeros of $L(s, \chi)$. This is little more than partial summation, given a (classical) bound
for the number of zeroes $N(T, \chi)$ of $L(s, \chi)$ with $|\Im(s)| \leq T$. The error term becomes
particularly simple if $f$ is real-valued and decreasing; the statement is then practically
identical to that of [Leh66, Lemma 1] (for $\chi$ principal), except for the fact that the error
term is improved here.
Lemma 9.1.3. Let $f : \mathbb{R}^+ \to \mathbb{C}$ be piecewise $C^1$. Assume $\lim_{t\to\infty} f(t)t\log t = 0$.
Let $\chi$ be a primitive character mod $q$, $q \geq 1$; let $\rho$ denote the non-trivial zeros $\rho$ of
$L(s, \chi)$. Then, for any $y \geq 1$,
$$\sum_{\substack{\rho \text{ non-trivial}\\ \Im(\rho) > y}} f(\Im(\rho)) = \frac{1}{2\pi}\int_y^\infty f(T)\log\frac{qT}{2\pi}\, dT + \frac{1}{2} O^*\left(|f(y)|g_\chi(y) + \int_y^\infty |f'(T)|\cdot g_\chi(T)\, dT\right), \tag{9.24}$$
where
$$g_\chi(T) = 0.5\log qT + 17.7. \tag{9.25}$$
If $f$ is real-valued and decreasing on $[y, \infty)$, the second line of (9.24) equals
$$O^*\left(\frac{1}{4}\int_y^\infty\frac{f(T)}{T}\, dT\right).$$
Proof. Write $N(T, \chi)$ for the number of non-trivial zeros of $L(s, \chi)$ satisfying $|\Im(s)| \leq T$. Write $N^+(T, \chi)$ for the number of (necessarily non-trivial) zeros of $L(s, \chi)$ with
$0 < \Im(s) \leq T$. Then, for any $f : \mathbb{R}^+ \to \mathbb{C}$ with $f$ piecewise differentiable and
$\lim_{t\to\infty} f(t)N(t, \chi) = 0$,
$$\begin{aligned}\sum_{\rho : \Im(\rho) > y} f(\Im(\rho)) &= \int_y^\infty f(T)\, dN^+(T, \chi)\\ &= -\int_y^\infty f'(T)\left(N^+(T, \chi) - N^+(y, \chi)\right) dT\\ &= -\frac{1}{2}\int_y^\infty f'(T)\left(N(T, \chi) - N(y, \chi)\right) dT.\end{aligned}$$
Now, by [Ros41, Thms. 17–19] and [McC84a, Thm. 2.1] (see also [Tru, Thm. 1]),
$$N(T, \chi) = \frac{T}{\pi}\log\frac{qT}{2\pi e} + O^*(g_\chi(T)) \tag{9.26}$$

for $T \geq 1$, where $g_\chi(T)$ is as in (9.25). (This is a classical formula; the references
serve to prove the explicit form (9.25) for the error term $g_\chi(T)$.)
Thus, for $y \geq 1$,
$$\sum_{\rho : \Im(\rho) > y} f(\Im(\rho)) = -\frac{1}{2}\int_y^\infty f'(T)\left(\frac{T}{\pi}\log\frac{qT}{2\pi e} - \frac{y}{\pi}\log\frac{qy}{2\pi e}\right) dT + \frac{1}{2} O^*\left(|f(y)|g_\chi(y) + \int_y^\infty |f'(T)|\cdot g_\chi(T)\, dT\right). \tag{9.27}$$
Here
$$-\frac{1}{2}\int_y^\infty f'(T)\left(\frac{T}{\pi}\log\frac{qT}{2\pi e} - \frac{y}{\pi}\log\frac{qy}{2\pi e}\right) dT = \frac{1}{2\pi}\int_y^\infty f(T)\log\frac{qT}{2\pi}\, dT. \tag{9.28}$$
If $f$ is real-valued and decreasing (and so, by $\lim_{t\to\infty} f(t) = 0$, non-negative),
$$|f(y)|g_\chi(y) + \int_y^\infty |f'(T)|\cdot g_\chi(T)\, dT = f(y)g_\chi(y) - \int_y^\infty f'(T)g_\chi(T)\, dT = 0.5\int_y^\infty\frac{f(T)}{T}\, dT,$$
since $g_\chi'(T) \leq 0.5/T$ for all $T \geq T_0$.


Let us bound the part of the sum $\sum_\rho G_\delta(\rho)x^\rho$ corresponding to $\rho$ with bounded
$|\Im(\rho)|$. The bound we will give is proportional to $\sqrt{T_0}\log qT_0$, whereas a very naive
approach (based on the trivial bound $|G_\delta(\sigma + i\tau)| \leq |G_0(\sigma)|$) would give a bound
proportional to $T_0\log qT_0$.
We could obtain a bound proportional to $\sqrt{T_0}\log qT_0$ for $\eta(t) = t^k e^{-t^2/2}$ by using
Theorem 8.0.1. Instead, we will give a bound of that same quality valid for $\eta$ essentially
arbitrary simply by using the fact that the Mellin transform is an isometry (preceded by
an application of Cauchy-Schwarz).
Lemma 9.1.4. Let $\eta : \mathbb{R}_0^+ \to \mathbb{R}$ be such that both $\eta(t)$ and $(\log t)\eta(t)$ lie in $L_1 \cap L_2$
and $\eta(t)/\sqrt{t}$ lies in $L_1$ (with respect to $dt$). Let $\delta \in \mathbb{R}$. Let $G_\delta(s)$ be the Mellin
transform of $\eta(t)e(\delta t)$.
Let $\chi$ be a primitive character mod $q$, $q \geq 1$. Let $T_0 \geq 1$. Assume that all non-trivial zeros $\rho$ of $L(s, \chi)$ with $|\Im(\rho)| \leq T_0$ lie on the critical line. Then
$$\sum_{\substack{\rho \text{ non-trivial}\\ |\Im(\rho)| \leq T_0}} |G_\delta(\rho)|$$
is at most
$$(|\eta|_2 + |\eta\cdot\log|_2)\sqrt{T_0}\log qT_0 + \left(17.21|\eta\cdot\log|_2 - (\log 2\pi\sqrt{e})|\eta|_2\right)\sqrt{T_0} + \left|\eta(t)/\sqrt{t}\right|_1\cdot(1.32\log q + 34.5). \tag{9.29}$$
9.1. A GENERAL EXPLICIT FORMULA 173

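Proof. For $s = 1/2 + i\tau$, we have the trivial bound
\[
|G_\delta(s)| \le \int_0^\infty |\eta(t)|\, t^{1/2}\, \frac{dt}{t} = \bigl|\eta(t)/\sqrt{t}\bigr|_1, \tag{9.30}
\]
where $F_\delta$ is as in (9.47). We also have the trivial bound
\[
|G_\delta'(s)| = \left| \int_0^\infty (\log t)\,\eta(t)\, t^s\, \frac{dt}{t} \right| \le \int_0^\infty |(\log t)\,\eta(t)|\, t^\sigma\, \frac{dt}{t} = \bigl|(\log t)\,\eta(t)\, t^{\sigma-1}\bigr|_1 \tag{9.31}
\]
for $s = \sigma + i\tau$.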
Let us start by bounding the contribution of very low-lying zeros ($|\Im(\rho)| \le 1$). By (9.26) and (9.25),
\[
N(1,\chi) = \frac{1}{\pi} \log \frac{q}{2\pi e} + O^*(0.5 \log q + 17.7) = O^*(0.819 \log q + 16.8).
\]
Therefore,
\[
\sum_{\substack{\rho \text{ non-trivial} \\ |\Im(\rho)| \le 1}} |G_\delta(\rho)| \le \bigl|\eta(t)\, t^{-1/2}\bigr|_1 \cdot (0.819 \log q + 16.8).
\]
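The constants 0.819 and 16.8 come from adding the main term of (9.26) at $T = 1$ to the error term $0.5\log q + 17.7$. A small arithmetic sketch (variable names are ours, chosen for illustration):

```python
import math

# Coefficient of log q: 1/pi from the main term plus 0.5 from g_chi(1).
coeff_log_q = 1 / math.pi + 0.5
# Constant term: 17.7 from g_chi(1) plus (1/pi) * log(1/(2*pi*e)), which is negative.
const = 17.7 - (math.log(2 * math.pi) + 1) / math.pi
print(coeff_log_q, const)
```

Adding the later contribution $|f(1)| g_\chi(1) \le |\eta(t)/\sqrt{t}|_1 (0.5\log q + 17.7)$ to this is what produces the factor $1.32\log q + 34.5$ in (9.29).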

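Let us now consider zeros $\rho$ with $|\Im(\rho)| > 1$. Apply Lemma 9.1.3 with $y = 1$ and
\[
f(t) = \begin{cases} |G_\delta(1/2 + it)| & \text{if } t \le T_0, \\ 0 & \text{if } t > T_0. \end{cases}
\]
This gives us that
\[
\begin{aligned}
\sum_{\rho:\, 1 < |\Im(\rho)| \le T_0} f(\Im(\rho)) = {} & \frac{1}{\pi} \int_1^{T_0} f(T) \log \frac{qT}{2\pi}\, dT \\
& + O^*\!\left( |f(1)|\, g_\chi(1) + \int_1^\infty |f'(T)| \cdot g_\chi(T)\, dT \right),
\end{aligned} \tag{9.32}
\]
where we are using the fact that $f(\sigma + i\tau) = f(\sigma - i\tau)$ (because $\eta$ is real-valued). By Cauchy-Schwarz,
\[
\frac{1}{\pi} \int_1^{T_0} f(T) \log \frac{qT}{2\pi}\, dT \le \sqrt{\frac{1}{\pi} \int_1^{T_0} |f(T)|^2\, dT} \cdot \sqrt{\frac{1}{\pi} \int_1^{T_0} \left( \log \frac{qT}{2\pi} \right)^2 dT}.
\]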
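Now
\[
\frac{1}{\pi} \int_1^{T_0} |f(T)|^2\, dT \le \frac{1}{2\pi} \int_{-\infty}^\infty \left| G_\delta\!\left( \frac{1}{2} + iT \right) \right|^2 dT \le \int_0^\infty |e(\delta t)\eta(t)|^2\, dt = |\eta|_2^2
\]
by Plancherel (as in (2.6)). We also have
\[
\int_1^{T_0} \left( \log \frac{qT}{2\pi} \right)^2 dT \le \frac{2\pi}{q} \int_0^{qT_0/2\pi} (\log t)^2\, dt \le \left( \left( \log \frac{qT_0}{2\pi e} \right)^2 + 1 \right) \cdot T_0.
\]
Hence
\[
\frac{1}{\pi} \int_1^{T_0} f(T) \log \frac{qT}{2\pi}\, dT \le \sqrt{ \left( \log \frac{qT_0}{2\pi e} \right)^2 + 1 } \cdot |\eta|_2 \sqrt{T_0}.
\]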
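Again by Cauchy-Schwarz,
\[
\int_1^\infty |f'(T)| \cdot g_\chi(T)\, dT \le \sqrt{ \frac{1}{2\pi} \int_{-\infty}^\infty |f'(T)|^2\, dT } \cdot \sqrt{ \frac{1}{\pi} \int_1^{T_0} |g_\chi(T)|^2\, dT }.
\]
Since $|f'(T)| = |G_\delta'(1/2 + iT)|$ and $(M\eta)'(s)$ is the Mellin transform of $\log(t) \cdot e(\delta t)\eta(t)$ (by (2.10)), we have, again by Plancherel,
\[
\frac{1}{2\pi} \int_{-\infty}^\infty |f'(T)|^2\, dT = |\eta(t) \log(t)|_2^2.
\]
Much as before,
\[
\begin{aligned}
\int_1^{T_0} |g_\chi(T)|^2\, dT &\le \int_0^{T_0} (0.5 \log qT + 17.7)^2\, dT \\
&= \bigl(0.25 (\log qT_0)^2 + 17.2 (\log qT_0) + 296.09\bigr)\, T_0.
\end{aligned}
\]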
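The closed form for $\int_0^{T_0} (0.5\log qT + 17.7)^2\, dT$ follows from the antiderivatives of $\log u$ and $(\log u)^2$. As a rough cross-check (a sketch only; the sample values of `q` and `T0` are arbitrary and the midpoint rule is not rigorous near the logarithmic singularity at 0), one can compare it with a numerical integral:

```python
import math

def closed_form(q, T0):
    # (0.25 L^2 + 17.2 L + 296.09) * T0 with L = log(q*T0), as stated in the text
    L = math.log(q * T0)
    return (0.25 * L**2 + 17.2 * L + 296.09) * T0

def numeric(q, T0, n=400_000):
    # midpoint rule for the integral of (0.5*log(qT) + 17.7)^2 over [0, T0]
    h = T0 / n
    return h * sum((0.5 * math.log(q * (i + 0.5) * h) + 17.7) ** 2 for i in range(n))

q, T0 = 1.0, 50.0   # illustrative values
exact = closed_form(q, T0)
approx = numeric(q, T0)
print(exact, approx)
```

The coefficients arise as $0.25 = 0.5^2$, $17.2 = 2 \cdot 0.5 \cdot 17.7 - 2 \cdot 0.25$ and $296.09 = 17.7^2 - 2 \cdot 0.5 \cdot 17.7 + 2 \cdot 0.25$.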
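Summing, we obtain
\[
\begin{aligned}
& \frac{1}{\pi} \int_1^{T_0} f(T) \log \frac{qT}{2\pi}\, dT + \int_1^\infty |f'(T)| \cdot g_\chi(T)\, dT \\
& \quad \le \left( \left( \log \frac{qT_0}{2\pi e} + \frac{1}{2} \right) |\eta|_2 + \left( \frac{\log qT_0}{2} + 17.21 \right) |\eta(t)(\log t)|_2 \right) \sqrt{T_0}.
\end{aligned}
\]
Finally, by (9.30) and (9.25),
\[
|f(1)|\, g_\chi(1) \le \bigl|\eta(t)/\sqrt{t}\bigr|_1 \cdot (0.5 \log q + 17.7).
\]
By (9.32) and the assumption that all non-trivial zeros with $|\Im(\rho)| \le T_0$ lie on the line $\Re(s) = 1/2$, we conclude that
\[
\begin{aligned}
\sum_{\substack{\rho \text{ non-trivial} \\ 1 < |\Im(\rho)| \le T_0}} |G_\delta(\rho)| \le {} & (|\eta|_2 + |\eta \cdot \log|_2) \sqrt{T_0} \log qT_0 \\
& + \bigl(17.21\, |\eta \cdot \log|_2 - (\log 2\pi\sqrt{e})\, |\eta|_2\bigr) \sqrt{T_0} \\
& + \bigl|\eta(t)/\sqrt{t}\bigr|_1 \cdot (0.5 \log q + 17.7).
\end{aligned}
\]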

All that remains is to bound the contribution to $\sum_\rho G_\delta(\rho) x^\rho$ corresponding to all zeros $\rho$ with $|\Im(\rho)| > T_0$. This we will do by another application of Lemma 9.1.3, combined with bounds on $G_\delta(\rho)$ for $\Im(\rho)$ large. This is the only part that will require us to take a look at the actual smoothing function $\eta$ we are working with; it is at this point, not before, that we actually have to look at each of our options for $\eta$ one by one.

9.2 Sums and decay for the Gaussian


It is now time to derive our bounds for the Gaussian smoothing. As we were saying, there is really only one thing left to do, namely, an estimate for the sum $\sum_\rho |F_\delta(\rho)|$ over all zeros $\rho$ with $|\Im(\rho)| > T_0$.
Lemma 9.2.1. Let $\eta_\heartsuit(t) = e^{-t^2/2}$. Let $x \in \mathbb{R}^+$, $\delta \in \mathbb{R}$. Let $\chi$ be a primitive character mod $q$, $q \ge 1$. Assume that all non-trivial zeros $\rho$ of $L(s,\chi)$ with $|\Im(\rho)| \le T_0$ satisfy $\Re(s) = 1/2$. Assume that $T_0 \ge 50$.
Write $F_\delta(s)$ for the Mellin transform of $\eta_\heartsuit(t)e(\delta t)$. Then
\[
\sum_{\substack{\rho \\ |\Im(\rho)| > T_0}} |F_\delta(\rho)| \le \log \frac{qT_0}{2\pi} \cdot \left( 3.53\, e^{-0.1598 T_0} + 22.5\, \frac{\delta^2}{T_0}\, e^{-0.1065 \left( \frac{T_0}{\pi|\delta|} \right)^2} \right).
\]

Here we have preferred to give a bound with a simple form. It is probably feasible to derive from Theorem 8.0.1 a bound essentially proportional to $e^{-E(\rho) T_0}$, where $\rho = T_0/(\pi\delta)^2$ and $E(\rho)$ is as in (8.2). (As we discussed in §8.5, $e^{-E(\rho) T_0}$ behaves as $e^{-(\pi/4) T_0}$ for $\rho$ large and as $e^{-0.125 (T_0/(\pi\delta))^2}$ for $\rho$ small.)
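Proof. First of all,
\[
\sum_{\substack{\rho \\ |\Im(\rho)| > T_0}} |F_\delta(\rho)| = \sum_{\substack{\rho \\ \Im(\rho) > T_0}} \bigl( |F_\delta(\rho)| + |F_\delta(1 - \rho)| \bigr),
\]
by the functional equation (which implies that non-trivial zeros come in pairs $\rho$, $1 - \rho$). Hence, by a somewhat brutish application of Cor. 8.0.2,
\[
\sum_{\substack{\rho \\ |\Im(\rho)| > T_0}} |F_\delta(\rho)| \le \sum_{\substack{\rho \\ \Im(\rho) > T_0}} f(\Im(\rho)), \tag{9.33}
\]
where
\[
f(\tau) = 3.001\, e^{-0.1065 \left( \frac{\tau}{\pi\delta} \right)^2} + 3.286\, e^{-0.1598 |\tau|}. \tag{9.34}
\]
Obviously, $f(\tau)$ is a decreasing function of $\tau$ for $\tau \ge T_0$.
We now apply Lemma 9.1.3. We obtain that
\[
\sum_{\substack{\rho \\ \Im(\rho) > T_0}} f(\Im(\rho)) \le \int_{T_0}^\infty f(T) \left( \frac{1}{2\pi} \log \frac{qT}{2\pi} + \frac{1}{4T} \right) dT. \tag{9.35}
\]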

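We just need to estimate some integrals. For any $y \ge 1$, $c, c_1 > 0$,
\[
\begin{aligned}
\int_y^\infty \left( \log t + \frac{c_1}{t} \right) e^{-ct}\, dt &\le \int_y^\infty \left( \log t - \frac{1}{ct} \right) e^{-ct}\, dt + \left( \frac{1}{c} + c_1 \right) \int_y^\infty \frac{e^{-ct}}{t}\, dt \\
&= \frac{(\log y)\, e^{-cy}}{c} + \left( \frac{1}{c} + c_1 \right) E_1(cy),
\end{aligned}
\]
where $E_1(x) = \int_x^\infty e^{-t}\, dt/t$. Clearly, $E_1(x) \le \int_x^\infty e^{-t}\, dt/x = e^{-x}/x$. Hence
\[
\int_y^\infty \left( \log t + \frac{c_1}{t} \right) e^{-ct}\, dt \le \left( \log y + \left( \frac{1}{c} + c_1 \right) \frac{1}{y} \right) \frac{e^{-cy}}{c}.
\]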
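We conclude that
\[
\begin{aligned}
\int_{T_0}^\infty e^{-0.1598 t} & \left( \frac{1}{2\pi} \log \frac{qt}{2\pi} + \frac{1}{4t} \right) dt \\
& \le \frac{1}{2\pi} \int_{T_0}^\infty \left( \log t + \frac{\pi/2}{t} \right) e^{-ct}\, dt + \frac{\log \frac{q}{2\pi}}{2\pi} \int_{T_0}^\infty e^{-ct}\, dt \\
& \le \frac{1}{2\pi c} \left( \log T_0 + \log \frac{q}{2\pi} + \left( \frac{1}{c} + \frac{\pi}{2} \right) \frac{1}{T_0} \right) e^{-cT_0}
\end{aligned} \tag{9.36}
\]
with $c = 0.1598$. Since $T_0 \ge 50$ and $q \ge 1$, this is at most
\[
1.072\, \log \frac{qT_0}{2\pi}\, e^{-cT_0}. \tag{9.37}
\]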
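The constant 1.072 in (9.37) can be checked by dividing the last line of (9.36) by $\log(qT_0/2\pi)\, e^{-cT_0}$ and evaluating at the extreme case $q = 1$, $T_0 = 50$. The following sketch does this; the function name `ratio` is ours.

```python
import math

c = 0.1598

def ratio(T0, q=1.0):
    # last line of (9.36) divided by log(q*T0/(2*pi)) * exp(-c*T0)
    bracket = math.log(T0) + math.log(q / (2 * math.pi)) + (1 / c + math.pi / 2) / T0
    return bracket / (2 * math.pi * c) / math.log(q * T0 / (2 * math.pi))

worst = ratio(50.0)   # the ratio decreases as T0 or q grows
print(worst)
```

The ratio exceeds $1/(2\pi c)$ only through the term $(1/c + \pi/2)/T_0$, whose relative weight shrinks as $T_0$ or $q$ increases; this is why the extreme case suffices.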

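Now let us deal with the Gaussian term. (It appears only if $T_0 < (3/2)(\pi\delta)^2$, as otherwise $|\tau| \ge (3/2)(\pi\delta)^2$ holds whenever $|\tau| \ge T_0$.) For any $y \ge e$, $c \ge 0$,
\[
\int_y^\infty e^{-ct^2}\, dt = \frac{1}{\sqrt{c}} \int_{\sqrt{c}\, y}^\infty e^{-t^2}\, dt \le \frac{1}{cy} \int_{\sqrt{c}\, y}^\infty t e^{-t^2}\, dt \le \frac{e^{-cy^2}}{2cy}, \tag{9.38}
\]
\[
\int_y^\infty \frac{e^{-ct^2}}{t}\, dt = \int_{cy^2}^\infty \frac{e^{-t}}{2t}\, dt = \frac{E_1(cy^2)}{2} \le \frac{e^{-cy^2}}{2cy^2}, \tag{9.39}
\]
\[
\int_y^\infty (\log t)\, e^{-ct^2}\, dt \le \int_y^\infty \left( \log t + \frac{\log t - 1}{2ct^2} \right) e^{-ct^2}\, dt = \frac{\log y}{2cy}\, e^{-cy^2}. \tag{9.40}
\]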
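Hence
\[
\begin{aligned}
\int_{T_0}^\infty & e^{-0.1065 \left( \frac{T}{\pi\delta} \right)^2} \left( \frac{1}{2\pi} \log \frac{qT}{2\pi} + \frac{1}{4T} \right) dT \\
&= \int_{\frac{T_0}{\pi|\delta|}}^\infty e^{-0.1065 t^2} \left( \frac{|\delta|}{2} \log \frac{q|\delta| t}{2} + \frac{1}{4t} \right) dt \\
&\le \left( \frac{\frac{|\delta|}{2} \log \frac{T_0}{\pi|\delta|}}{2c_0 \frac{T_0}{\pi|\delta|}} + \frac{\frac{|\delta|}{2} \log \frac{q|\delta|}{2}}{2c_0 \frac{T_0}{\pi|\delta|}} + \frac{1}{8c_0 \left( \frac{T_0}{\pi|\delta|} \right)^2} \right) e^{-c_0 \left( \frac{T_0}{\pi|\delta|} \right)^2}
\end{aligned} \tag{9.41}
\]
with $c_0 = 0.1065$. Since $T_0 \ge 50$ and $q \ge 1$,
\[
\frac{2\pi}{8T_0} \le \frac{\pi}{200} \le 0.0152 \cdot \frac{1}{2} \log \frac{qT_0}{2\pi}.
\]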
Thus, the last line of (9.41) is less than
\[
1.0152\, \frac{\frac{|\delta|}{2} \log \frac{qT_0}{2\pi}}{2c_0 \frac{T_0}{\pi|\delta|}}\, e^{-c_0 \left( \frac{T_0}{\pi|\delta|} \right)^2} = 7.487 \cdot \frac{\delta^2}{T_0} \cdot \log \frac{qT_0}{2\pi} \cdot e^{-c_0 \left( \frac{T_0}{\pi|\delta|} \right)^2}. \tag{9.42}
\]
Again by $T_0 \ge 4\pi^2|\delta|$, we see that $1.0057\pi|\delta|/(4cT_0) \le 1.0057/(16c\pi) \le 0.18787$.
To obtain our final bound, we simply sum (9.37) and (9.42), after multiplying them by the constants 3.286 and 3.001 in (9.34). We conclude that the integral in (9.35) is at most
\[
\left( 3.53\, e^{-0.1598 T_0} + 22.5\, \frac{\delta^2}{T_0}\, e^{-0.1065 \left( \frac{T_0}{\pi|\delta|} \right)^2} \right) \log \frac{qT_0}{2\pi}.
\]

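We need to record a few norms related to the Gaussian $\eta_\heartsuit(t) = e^{-t^2/2}$ before we proceed. Recall we are working with the one-sided Gaussian, i.e., we set $\eta_\heartsuit(t) = 0$ for $t < 0$. Symbolic integration then gives
\[
\begin{aligned}
|\eta_\heartsuit|_2^2 &= \int_0^\infty e^{-t^2}\, dt = \frac{\sqrt{\pi}}{2}, \\
|\eta_\heartsuit'|_2^2 &= \int_0^\infty (t e^{-t^2/2})^2\, dt = \frac{\sqrt{\pi}}{4}, \\
|\eta_\heartsuit \cdot \log|_2^2 &= \int_0^\infty e^{-t^2} (\log t)^2\, dt \\
&= \frac{\sqrt{\pi}}{16} \left( \pi^2 + 2\gamma^2 + 8\gamma \log 2 + 8 (\log 2)^2 \right) \le 1.94753,
\end{aligned} \tag{9.43}
\]
\[
\begin{aligned}
\bigl|\eta_\heartsuit(t)/\sqrt{t}\bigr|_1 &= \int_0^\infty \frac{e^{-t^2/2}}{\sqrt{t}}\, dt = \frac{\Gamma(1/4)}{2^{3/4}} \le 2.15581, \\
\bigl|\eta_\heartsuit'(t)/\sqrt{t}\bigr|_1 &= \bigl|\eta_\heartsuit(t) \sqrt{t}\bigr|_1 = \int_0^\infty e^{-t^2/2} \sqrt{t}\, dt = \frac{\Gamma(3/4)}{2^{1/4}} \le 1.03045, \\
\bigl|\eta_\heartsuit'(t)\, t^{1/2}\bigr|_1 &= \bigl|\eta_\heartsuit(t)\, t^{3/2}\bigr|_1 = \int_0^\infty e^{-t^2/2}\, t^{3/2}\, dt = 1.07791.
\end{aligned} \tag{9.44}
\]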
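The closed forms in (9.43) and (9.44) can be evaluated in floating point with the standard library. This is only a sketch (floating-point evaluation, not interval arithmetic); the last value uses the identity $\int_0^\infty e^{-t^2/2} t^{3/2}\, dt = 2^{1/4}\Gamma(5/4)$, which is our own restatement via the substitution $u = t^2/2$.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler's constant
LOG2 = math.log(2)

norm2_sq = math.sqrt(math.pi) / 2                      # |eta|_2^2
dnorm2_sq = math.sqrt(math.pi) / 4                     # |eta'|_2^2
lognorm2_sq = math.sqrt(math.pi) / 16 * (
    math.pi**2 + 2 * EULER_GAMMA**2 + 8 * EULER_GAMMA * LOG2 + 8 * LOG2**2)
l1_over_sqrt = math.gamma(0.25) / 2**0.75              # |eta(t)/sqrt(t)|_1
l1_times_sqrt = math.gamma(0.75) / 2**0.25             # |eta(t) sqrt(t)|_1
l1_times_t32 = 2**0.25 * math.gamma(1.25)              # |eta(t) t^(3/2)|_1
print(lognorm2_sq, l1_over_sqrt, l1_times_sqrt, l1_times_t32)
```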
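We can now state what is really our main result for the Gaussian smoothing. (The version in §7.1 will, as we shall later see, follow from this, given numerical inputs.)
Proposition 9.2.2. Let $\eta(t) = e^{-t^2/2}$. Let $x \ge 1$, $\delta \in \mathbb{R}$. Let $\chi$ be a primitive character mod $q$, $q \ge 1$. Assume that all non-trivial zeros $\rho$ of $L(s,\chi)$ with $|\Im(\rho)| \le T_0$ lie on the critical line. Assume that $T_0 \ge 50$.
Then
\[
\sum_{n=1}^\infty \Lambda(n)\chi(n)\, e\!\left( \frac{\delta}{x} n \right) \eta\!\left( \frac{n}{x} \right) = \begin{cases} \widehat{\eta}(-\delta)\, x + O^*\bigl(\operatorname{err}_{\eta,\chi}(\delta,x)\bigr) \cdot x & \text{if } q = 1, \\ O^*\bigl(\operatorname{err}_{\eta,\chi}(\delta,x)\bigr) \cdot x & \text{if } q > 1, \end{cases} \tag{9.45}
\]
where
\[
\begin{aligned}
\operatorname{err}_{\eta,\chi}(\delta,x) = {} & \log \frac{qT_0}{2\pi} \cdot \left( 3.53\, e^{-0.1598 T_0} + 22.5\, \frac{\delta^2}{T_0}\, e^{-0.1065 \left( \frac{T_0}{\pi|\delta|} \right)^2} \right) \\
& + \bigl( 2.337 \sqrt{T_0} \log qT_0 + 21.817 \sqrt{T_0} + 2.85 \log q + 74.38 \bigr)\, x^{-1/2} \\
& + (3 \log q + 14|\delta| + 17)\, x^{-1} + (\log q + 6) \cdot (1 + 5|\delta|) \cdot x^{-3/2}.
\end{aligned}
\]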

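Proof. Let $F_\delta(s)$ be the Mellin transform of $\eta_\heartsuit(t)e(\delta t)$. By Lemma 9.1.4 (with $G_\delta = F_\delta$) and Lemma 9.2.1,
\[
\Bigl| \sum_{\rho \text{ non-trivial}} F_\delta(\rho)\, x^\rho \Bigr|
\]
is at most (9.29) (with $\eta = \eta_\heartsuit$) times $\sqrt{x}$, plus
\[
\log \frac{qT_0}{2\pi} \cdot \left( 3.53\, e^{-0.1598 T_0} + 22.5\, \frac{|\delta|^2}{T_0}\, e^{-0.1065 \left( \frac{T_0}{\pi|\delta|} \right)^2} \right) \cdot x.
\]
By the norm computations in (9.43) and (9.44), we see that (9.29) is at most
\[
2.337 \sqrt{T_0} \log qT_0 + 21.817 \sqrt{T_0} + 2.85 \log q + 74.38.
\]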
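The four constants just stated are what (9.29) gives once the norms of the Gaussian from (9.43) and (9.44) are substituted. A sketch of the arithmetic, using the stated upper bounds 1.94753 and 2.15581 as inputs (variable names are ours):

```python
import math

norm2 = math.sqrt(math.sqrt(math.pi) / 2)   # |eta|_2, exact value from (9.43)
lognorm2 = math.sqrt(1.94753)               # upper bound for |eta . log|_2 from (9.43)
l1 = 2.15581                                # upper bound for |eta(t)/sqrt(t)|_1 from (9.44)

coef_sqrtT_log = norm2 + lognorm2
coef_sqrtT = 17.21 * lognorm2 - math.log(2 * math.pi * math.sqrt(math.e)) * norm2
coef_log_q = 1.32 * l1
const = 34.5 * l1
print(coef_sqrtT_log, coef_sqrtT, coef_log_q, const)
```

Note that $|\eta|_2$ enters the second coefficient with a negative sign, so its exact value (rather than an upper bound) is used there.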

Let us now apply Lemma 9.1.1. We saw that the value of $R$ in Lemma 9.1.1 is bounded by (9.23). We know that $\eta_\heartsuit(0) = 1$. Again by (9.43) and (9.44), the quantity $c_0$ defined in (9.3) is at most $1.4056 + 13.3466|\delta|$. Hence
\[
|R| \le 3 \log q + 13.347|\delta| + 16.695.
\]
Lastly,
\[
|\eta_\heartsuit'|_2 + 2\pi|\delta|\, |\eta_\heartsuit|_2 \le 0.942 + 4.183|\delta| \le 1 + 5|\delta|.
\]
Clearly
\[
(6.01 - 6) \cdot (1 + 5|\delta|) + 13.347|\delta| + 16.695 < 14|\delta| + 17,
\]
and so we are done.

9.3 The case of η∗ (t)


We will now work with a weight based on the Gaussian:
\[
\eta(t) = \begin{cases} t^2 e^{-t^2/2} & \text{if } t \ge 0, \\ 0 & \text{if } t < 0. \end{cases} \tag{9.46}
\]
The fact that this vanishes at $t = 0$ actually makes it easier to work with at several levels.
Its Mellin transform is just a shift of that of the Gaussian. Write
\[
\begin{aligned}
F_\delta(s) &= \bigl( M(e^{-t^2/2} e(\delta t)) \bigr)(s), \\
G_\delta(s) &= \bigl( M(\eta(t) e(\delta t)) \bigr)(s).
\end{aligned} \tag{9.47}
\]
Then, by the definition of the Mellin transform,
\[
G_\delta(s) = F_\delta(s + 2).
\]
We start by bounding the contribution of zeros with large imaginary part, just as before.

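Lemma 9.3.1. Let $\eta(t) = t^2 e^{-t^2/2}$. Let $x \in \mathbb{R}^+$, $\delta \in \mathbb{R}$. Let $\chi$ be a primitive character mod $q$, $q \ge 1$. Assume that all non-trivial zeros $\rho$ of $L(s,\chi)$ with $|\Im(\rho)| \le T_0$ satisfy $\Re(s) = 1/2$. Assume that $T_0 \ge \max(10\pi|\delta|, 50)$.
Write $G_\delta(s)$ for the Mellin transform of $\eta(t)e(\delta t)$. Then
\[
\sum_{\substack{\rho \\ |\Im(\rho)| > T_0}} |G_\delta(\rho)| \le T_0 \log \frac{qT_0}{2\pi} \cdot \left( 6.11\, e^{-0.1598 T_0} + 1.578\, e^{-0.1065 \cdot \frac{T_0^2}{(\pi\delta)^2}} \right).
\]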

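Proof. We start by writing
\[
\sum_{\substack{\rho \\ |\Im(\rho)| > T_0}} |G_\delta(\rho)| = \sum_{\substack{\rho \\ \Im(\rho) > T_0}} \bigl( |F_\delta(\rho + 2)| + |F_\delta((1 - \rho) + 2)| \bigr),
\]
where we are using $G_\delta(\rho) = F_\delta(\rho + 2)$ and the fact that non-trivial zeros come in pairs $\rho$, $1 - \rho$.
By Cor. 8.0.2 with $k = 2$,
\[
\sum_{\substack{\rho \\ |\Im(\rho)| > T_0}} |G_\delta(\rho)| \le \sum_{\substack{\rho \\ \Im(\rho) > T_0}} f(\Im(\rho)),
\]
where
\[
f(\tau) = \begin{cases} \kappa_{2,1} |\tau| e^{-0.1598|\tau|} + \dfrac{\kappa_{2,0}}{4} \left( \dfrac{|\tau|}{\pi\delta} \right)^2 e^{-0.1065 \left( \frac{|\tau|}{\pi\delta} \right)^2} & \text{if } |\tau| < \frac{3}{2} (\pi\delta)^2, \\[2mm] \kappa_{2,1} |\tau| e^{-0.1598|\tau|} & \text{if } |\tau| \ge \frac{3}{2} (\pi\delta)^2, \end{cases} \tag{9.48}
\]
where $\kappa_{2,0} = 7.96$ and $\kappa_{2,1} = 5.13$. We are including the term $|\tau| e^{-0.1598|\tau|}$ in both cases in part because we cannot be bothered to take it out (just as we could not be bothered in the proof of Lem. 9.2.1) and in part to ensure that $f(\tau)$ is a decreasing function of $\tau$ for $\tau \ge T_0$.
We can now apply Lemma 9.1.3. We obtain, again,
\[
\sum_{\substack{\rho \\ \Im(\rho) > T_0}} f(\Im(\rho)) \le \int_{T_0}^\infty f(T) \left( \frac{1}{2\pi} \log \frac{qT}{2\pi} + \frac{1}{4T} \right) dT. \tag{9.49}
\]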
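Just as before, we will need to estimate some integrals.
For any $y \ge 1$, $c, c_1 > 0$ such that $\log y > 1/(cy)$,
\[
\begin{aligned}
\int_y^\infty t e^{-ct}\, dt &= \left( \frac{y}{c} + \frac{1}{c^2} \right) e^{-cy}, \\
\int_y^\infty \left( t \log t + \frac{c_1}{t} \right) e^{-ct}\, dt &\le \int_y^\infty \left( \left( t + \frac{a - 1}{c} \right) \log t - \frac{1}{c} - \frac{a}{c^2 t} \right) e^{-ct}\, dt \\
&= \left( \frac{y}{c} + \frac{a}{c^2} \right) e^{-cy} \log y,
\end{aligned} \tag{9.50}
\]
where
\[
a = \frac{\frac{\log y}{c} + \frac{1}{c} + \frac{c_1}{y}}{\frac{\log y}{c} - \frac{1}{c^2 y}}.
\]
Setting $c = 0.1598$, $c_1 = \pi/2$, $y = T_0 \ge 50$, we obtain that
\[
\begin{aligned}
\int_{T_0}^\infty & \left( \frac{1}{2\pi} \log \frac{qT}{2\pi} + \frac{1}{4T} \right) T e^{-0.1598 T}\, dT \\
& \le \frac{1}{2\pi} \left( \log \frac{q}{2\pi} \cdot \left( \frac{T_0}{c} + \frac{1}{c^2} \right) + \left( \frac{T_0}{c} + \frac{a}{c^2} \right) \log T_0 \right) e^{-0.1598 T_0}
\end{aligned} \tag{9.51}
\]
and
\[
a = \frac{\frac{\log T_0}{0.1598} + \frac{1}{0.1598} + \frac{\pi/2}{T_0}}{\frac{\log T_0}{0.1598} - \frac{1}{0.1598^2\, T_0}} \le 1.299.
\]
It is easy to see that the ratio of the expression within parentheses on the right side of (9.51) to $T_0 \log(qT_0/2\pi)$ increases as $q$ decreases and, if we hold $q$ fixed, decreases as $T_0 \ge 2\pi$ increases; thus, it is maximal for $q = 1$ and $T_0 = 50$. Multiplying (9.51) by $\kappa_{2,1} = 5.13$ and simplifying by the assumption $T_0 \ge 50$, we obtain that
\[
\int_{T_0}^\infty 5.13\, T e^{-0.1598 T} \left( \frac{1}{2\pi} \log \frac{qT}{2\pi} + \frac{1}{4T} \right) dT \le 6.11\, T_0 \log \frac{qT_0}{2\pi} \cdot e^{-0.1598 T_0}. \tag{9.52}
\]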
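The constants 1.299 and 6.11 can be reproduced by evaluating the right side of (9.51) at the extreme case $q = 1$, $T_0 = 50$. The sketch below does so in floating point; the function names `a_const` and `ratio` are ours.

```python
import math

c, kappa21 = 0.1598, 5.13

def a_const(T0):
    # the quantity a from (9.50) with c1 = pi/2 and y = T0
    num = math.log(T0) / c + 1 / c + (math.pi / 2) / T0
    den = math.log(T0) / c - 1 / (c**2 * T0)
    return num / den

def ratio(T0, q=1.0):
    # kappa_{2,1} times the right side of (9.51), divided by
    # T0 * log(q*T0/(2*pi)) * exp(-c*T0)
    a = a_const(T0)
    bracket = (math.log(q / (2 * math.pi)) * (T0 / c + 1 / c**2)
               + (T0 / c + a / c**2) * math.log(T0))
    return kappa21 * bracket / (2 * math.pi) / (T0 * math.log(q * T0 / (2 * math.pi)))

print(a_const(50.0), ratio(50.0))
```

Evaluating `ratio` at larger `T0` or larger `q` gives smaller values, in line with the monotonicity claimed in the text.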
Now let us examine the Gaussian term. First of all: when does it arise? If $T_0 \ge (3/2)(\pi\delta)^2$, then $|\tau| \ge (3/2)(\pi\delta)^2$ holds whenever $|\tau| \ge T_0$, and so (9.48) does not give us a Gaussian term. Recall that $T_0 \ge 10\pi|\delta|$, which means that $|\delta| \le 20/(3\pi)$ implies that $T_0 \ge (3/2)(\pi\delta)^2$. We can thus assume from now on that $|\delta| > 20/(3\pi)$, since otherwise there is no Gaussian term to treat.
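For any $y \ge 1$, $c, c_1 > 0$,
\[
\int_y^\infty t^2 e^{-ct^2}\, dt < \int_y^\infty \left( t^2 + \frac{1}{4c^2 t^2} \right) e^{-ct^2}\, dt = \left( \frac{y}{2c} + \frac{1}{4c^2 y} \right) \cdot e^{-cy^2},
\]
\[
\begin{aligned}
\int_y^\infty (t^2 \log t + c_1 t) \cdot e^{-ct^2}\, dt &\le \int_y^\infty \left( t^2 \log t + \frac{a t \log et}{2c} - \frac{\log et}{2c} - \frac{a}{4c^2 t} \right) e^{-ct^2}\, dt \\
&= \frac{(2cy + a) \log y + a}{4c^2} \cdot e^{-cy^2},
\end{aligned}
\]
where
\[
a = \frac{c_1 y + \frac{\log ey}{2c}}{\frac{y \log ey}{2c} - \frac{1}{4c^2 y}} = \frac{1}{y} + \frac{c_1 y + \frac{1}{4c^2 y^2}}{\frac{y \log ey}{2c} - \frac{1}{4c^2 y}} = \frac{1}{y} + \frac{2 c_1 c}{\log ey} + \frac{\frac{c_1}{2cy \log ey} + \frac{1}{4c^2 y^2}}{\frac{y \log ey}{2c} - \frac{1}{4c^2 y}}.
\]
(Note that $a$ decreases as $y \ge y_0$ increases, provided that $\log ey_0 > 1/(2cy_0^2)$.) Setting $c = 0.1065$, $c_1 = 1/(2|\delta|) \le 3/16$ and $y = T_0/(\pi|\delta|) \ge 4\pi$, we obtain
\[
\begin{aligned}
\int_{\frac{T_0}{\pi|\delta|}}^\infty & \left( \frac{1}{2\pi} \log \frac{q|\delta| t}{2} + \frac{1}{4\pi|\delta| t} \right) t^2 e^{-0.1065 t^2}\, dt \\
& \le \frac{1}{2\pi} \log \frac{q|\delta|}{2} \cdot \left( \frac{T_0}{2\pi c|\delta|} + \frac{1}{4c^2 \cdot 10} \right) \cdot e^{-0.1065 \left( \frac{T_0}{\pi|\delta|} \right)^2} \\
& \quad + \frac{1}{2\pi} \cdot \frac{\left( 2c \frac{T_0}{\pi|\delta|} + a \right) \log \frac{T_0}{\pi|\delta|} + a}{4c^2} \cdot e^{-0.1065 \left( \frac{T_0}{\pi|\delta|} \right)^2}
\end{aligned}
\]
and
\[
a \le \frac{1}{10} + \frac{\left( \frac{2 \cdot 20}{3\pi} \right)^{-1} \cdot 10 + \frac{1}{4 \cdot 0.1065^2 \cdot 10^2}}{\frac{10 \log 10e}{2 \cdot 0.1065} - \frac{1}{4 \cdot 0.1065^2 \cdot 10}} \le 0.117.
\]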

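Multiplying by $(\kappa_{2,0}/4)\pi|\delta|$, we get that
\[
\int_{T_0}^\infty \frac{\kappa_{2,0}}{4} \left( \frac{T}{\pi|\delta|} \right)^2 e^{-0.1065 \left( \frac{T}{\pi|\delta|} \right)^2} \left( \frac{1}{2\pi} \log \frac{qT}{2\pi} + \frac{1}{4T} \right) dT \tag{9.53}
\]
is at most $e^{-0.1065 (T_0/(\pi|\delta|))^2}$ times
\[
\begin{aligned}
& (1.487\, T_0 + 2.194|\delta|) \cdot \log \frac{q|\delta|}{2} + 1.487\, T_0 \log \frac{T_0}{\pi|\delta|} + 2.566|\delta| \log \frac{eT_0}{\pi|\delta|} \\
& \quad \le \left( 1.487 + 2.566 \cdot \frac{1 + \frac{1}{\log (T_0/\pi|\delta|)}}{T_0/|\delta|} \right) \cdot T_0 \log \frac{qT_0}{2\pi} \le 1.578 \cdot T_0 \log \frac{qT_0}{2\pi},
\end{aligned} \tag{9.54}
\]
where we are using several times the assumption that $T_0 \ge 4\pi^2|\delta|$ (and, on one occasion, the fact that $|\delta| > 20/(3\pi) > 2$).
We sum (9.52) and the estimate for (9.53) we have just obtained to reach our conclusion.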

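Again, we record some norms obtained by symbolic integration: for $\eta$ as in (9.46),
\[
\begin{aligned}
|\eta|_2^2 &= \frac{3}{8} \sqrt{\pi}, \qquad |\eta'|_2^2 = \frac{7}{16} \sqrt{\pi}, \\
|\eta \cdot \log|_2^2 &= \frac{\sqrt{\pi}}{64} \left( 8(3\gamma - 8) \log 2 + 3\pi^2 + 6\gamma^2 + 24 (\log 2)^2 + 16 - 32\gamma \right) \le 0.16364, \\
\bigl|\eta(t)/\sqrt{t}\bigr|_1 &= \frac{2^{1/4}\, \Gamma(1/4)}{4} \le 1.07791, \qquad \bigl|\eta(t) \sqrt{t}\bigr|_1 = \frac{3}{4}\, 2^{3/4}\, \Gamma(3/4) \le 1.54568, \\
\bigl|\eta'(t)/\sqrt{t}\bigr|_1 &= \int_0^{\sqrt{2}} t^{3/2} e^{-t^2/2}\, dt - \int_{\sqrt{2}}^\infty t^{3/2} e^{-t^2/2}\, dt \le 1.48469, \\
\bigl|\eta'(t) \sqrt{t}\bigr|_1 &\le 1.72169.
\end{aligned} \tag{9.55}
\]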

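Proposition 9.3.2. Let $\eta(t) = t^2 e^{-t^2/2}$. Let $x \ge 1$, $\delta \in \mathbb{R}$. Let $\chi$ be a primitive character mod $q$, $q \ge 1$. Assume that all non-trivial zeros $\rho$ of $L(s,\chi)$ with $|\Im(\rho)| \le T_0$ lie on the critical line. Assume that $T_0 \ge \max(10\pi|\delta|, 50)$.
Then
\[
\sum_{n=1}^\infty \Lambda(n)\chi(n)\, e\!\left( \frac{\delta}{x} n \right) \eta(n/x) = \begin{cases} \widehat{\eta}(-\delta)\, x + O^*\bigl(\operatorname{err}_{\eta,\chi}(\delta,x)\bigr) \cdot x & \text{if } q = 1, \\ O^*\bigl(\operatorname{err}_{\eta,\chi}(\delta,x)\bigr) \cdot x & \text{if } q > 1, \end{cases} \tag{9.56}
\]
where
\[
\begin{aligned}
\operatorname{err}_{\eta,\chi}(\delta,x) = {} & T_0 \log \frac{qT_0}{2\pi} \cdot \left( 6.11\, e^{-0.1598 T_0} + 1.578\, e^{-0.1065 \cdot \frac{T_0^2}{(\pi\delta)^2}} \right) \\
& + \left( 1.22 \sqrt{T_0} \log qT_0 + 5.056 \sqrt{T_0} + 1.423 \log q + 37.19 \right) \cdot x^{-1/2} \\
& + (3 + 11|\delta|)\, x^{-1} + (\log q + 6) \cdot (1 + 6|\delta|) \cdot x^{-3/2}.
\end{aligned} \tag{9.57}
\]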
Proof. We proceed as in the proof of Prop. 9.2.2. The contribution of Lemma 9.3.1 is
\[
T_0 \log \frac{qT_0}{2\pi} \cdot \left( 6.11\, e^{-0.1598 T_0} + 1.578\, e^{-0.1065 \cdot \frac{T_0^2}{(\pi\delta)^2}} \right) \cdot x,
\]
whereas the contribution of Lemma 9.1.4 is at most
\[
\left( 1.22 \sqrt{T_0} \log qT_0 + 5.056 \sqrt{T_0} + 1.423 \log q + 37.188 \right) \sqrt{x}.
\]
Let us now apply Lemma 9.1.1. Since $\eta(0) = 0$, we have
\[
R = O^*(c_0) = O^*(2.138 + 10.99|\delta|).
\]
Lastly,
\[
|\eta'|_2 + 2\pi|\delta|\, |\eta|_2 \le 0.881 + 5.123|\delta|.
\]
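The constants in this proof can be recomputed from the closed-form norms in (9.55) together with the shape of the bound (9.29). The following sketch evaluates them in floating point (not interval arithmetic); variable names are ours.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler's constant
LOG2 = math.log(2)
SQRT_PI = math.sqrt(math.pi)

norm2 = math.sqrt(3 * SQRT_PI / 8)        # |eta|_2
dnorm2 = math.sqrt(7 * SQRT_PI / 16)      # |eta'|_2
lognorm2_sq = SQRT_PI / 64 * (8 * (3 * EULER_GAMMA - 8) * LOG2 + 3 * math.pi**2
                              + 6 * EULER_GAMMA**2 + 24 * LOG2**2 + 16 - 32 * EULER_GAMMA)
lognorm2 = math.sqrt(lognorm2_sq)         # |eta . log|_2
l1 = 2**0.25 * math.gamma(0.25) / 4       # |eta(t)/sqrt(t)|_1

# Coefficients of (9.29) for this eta
coef_sqrtT_log = norm2 + lognorm2
coef_sqrtT = 17.21 * lognorm2 - math.log(2 * math.pi * math.sqrt(math.e)) * norm2
coef_log_q = 1.32 * l1
const = 34.5 * l1
print(coef_sqrtT_log, coef_sqrtT, coef_log_q, const, dnorm2, 2 * math.pi * norm2)
```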

Now that we have Prop. 9.3.2, we can derive from it similar bounds for a smoothing defined as the multiplicative convolution of $\eta$ with something else. In general, for $\varphi_1, \varphi_2 : [0, \infty) \to \mathbb{C}$, if we know how to bound sums of the form
\[
S_{f,\varphi_1}(x) = \sum_n f(n)\, \varphi_1(n/x), \tag{9.58}
\]
we can bound sums of the form $S_{f,\varphi_1 *_M \varphi_2}$, simply by changing the order of summation and integration:
\[
\begin{aligned}
S_{f,\varphi_1 *_M \varphi_2} &= \sum_n f(n) \cdot (\varphi_1 *_M \varphi_2)\!\left( \frac{n}{x} \right) \\
&= \int_0^\infty \sum_n f(n)\, \varphi_1\!\left( \frac{n}{wx} \right) \varphi_2(w)\, \frac{dw}{w} = \int_0^\infty S_{f,\varphi_1}(wx)\, \varphi_2(w)\, \frac{dw}{w}.
\end{aligned} \tag{9.59}
\]
This is particularly nice if $\varphi_2(t)$ vanishes in a neighbourhood of the origin, since then the argument $wx$ of $S_{f,\varphi_1}(wx)$ is always large.
We will use $\varphi_1(t) = t^2 e^{-t^2/2}$, $\varphi_2(t) = \eta_1 *_M \eta_1$, where $\eta_1$ is 2 times the characteristic function of the interval $[1/2, 1]$. The motivation for the choice of $\varphi_1$ and $\varphi_2$ is clear: we have just got bounds based on $\varphi_1(t)$ in the major arcs, and we obtained minor-arc bounds for the weight $\varphi_2(t)$ in Part I.
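Corollary 9.3.3. Let $\eta(t) = t^2 e^{-t^2/2}$, $\eta_1 = 2 \cdot I_{[1/2,1]}$, $\eta_2 = \eta_1 *_M \eta_1$. Let $\eta_* = \eta_2 *_M \eta$. Let $x \in \mathbb{R}^+$, $\delta \in \mathbb{R}$. Let $\chi$ be a primitive character mod $q$, $q \ge 1$. Assume that all non-trivial zeros $\rho$ of $L(s,\chi)$ with $|\Im(\rho)| \le T_0$ lie on the critical line. Assume that $T_0 \ge \max(10\pi|\delta|, 50)$.
Then
\[
\sum_{n=1}^\infty \Lambda(n)\chi(n)\, e\!\left( \frac{\delta}{x} n \right) \eta_*(n/x) = \begin{cases} \widehat{\eta_*}(-\delta)\, x + O^*\bigl(\operatorname{err}_{\eta_*,\chi}(\delta,x)\bigr) \cdot x & \text{if } q = 1, \\ O^*\bigl(\operatorname{err}_{\eta_*,\chi}(\delta,x)\bigr) \cdot x & \text{if } q > 1, \end{cases} \tag{9.60}
\]
where
\[
\begin{aligned}
\operatorname{err}_{\eta_*,\chi}(\delta,x) = {} & T_0 \log \frac{qT_0}{2\pi} \cdot \left( 6.11\, e^{-0.1598 T_0} + 0.0102 \cdot e^{-0.1065 \cdot \frac{T_0^2}{(\pi\delta)^2}} \right) \\
& + \left( 1.679 \sqrt{T_0} \log qT_0 + 6.957 \sqrt{T_0} + 1.958 \log q + 51.17 \right) \cdot x^{-1/2} \\
& + (6 + 22|\delta|)\, x^{-1} + (\log q + 6) \cdot (3 + 17|\delta|) \cdot x^{-3/2}.
\end{aligned} \tag{9.61}
\]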
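Proof. The left side of (9.60) equals
\[
\begin{aligned}
\int_0^\infty & \sum_{n=1}^\infty \Lambda(n)\chi(n)\, e\!\left( \frac{\delta n}{x} \right) \eta\!\left( \frac{n}{wx} \right) \eta_2(w)\, \frac{dw}{w} \\
&= \int_{1/4}^1 \sum_{n=1}^\infty \Lambda(n)\chi(n)\, e\!\left( \frac{\delta w n}{wx} \right) \eta\!\left( \frac{n}{wx} \right) \eta_2(w)\, \frac{dw}{w},
\end{aligned}
\]
since $\eta_2$ is supported on $[1/4, 1]$. By Prop. 9.3.2, the main term (if $q = 1$) contributes
\[
\begin{aligned}
\int_{1/4}^1 \widehat{\eta}(-\delta w)\, xw \cdot \eta_2(w)\, \frac{dw}{w} &= x \int_0^\infty \widehat{\eta}(-\delta w)\, \eta_2(w)\, dw \\
&= x \int_0^\infty \int_{-\infty}^\infty \eta(t)\, e(\delta w t)\, dt \cdot \eta_2(w)\, dw = x \int_0^\infty \int_{-\infty}^\infty \eta\!\left( \frac{r}{w} \right) e(\delta r)\, \frac{dr}{w}\, \eta_2(w)\, dw \\
&= x \int_{-\infty}^\infty \left( \int_0^\infty \eta\!\left( \frac{r}{w} \right) \eta_2(w)\, \frac{dw}{w} \right) e(\delta r)\, dr = \widehat{\eta_*}(-\delta) \cdot x.
\end{aligned}
\]
The error term is
\[
\int_{1/4}^1 \operatorname{err}_{\eta,\chi}(\delta w, wx) \cdot wx \cdot \eta_2(w)\, \frac{dw}{w} = x \cdot \int_{1/4}^1 \operatorname{err}_{\eta,\chi}(\delta w, wx)\, \eta_2(w)\, dw. \tag{9.62}
\]
Using the fact that
\[
\eta_2(w) = \begin{cases} 4 \log 4w & \text{if } w \in [1/4, 1/2], \\ 4 \log w^{-1} & \text{if } w \in [1/2, 1], \\ 0 & \text{otherwise,} \end{cases}
\]
we can easily check that
\[
\begin{aligned}
\int_0^\infty \eta_2(w)\, dw &= 1, & \int_0^\infty w^{-1/2} \eta_2(w)\, dw &\le 1.37259, \\
\int_0^\infty w^{-1} \eta_2(w)\, dw &= 4 (\log 2)^2 \le 1.92182, & \int_0^\infty w^{-3/2} \eta_2(w)\, dw &\le 2.74517
\end{aligned}
\]
and, by rigorous numerical integration from $1/4$ to $1/2$ and from $1/2$ to $1$ (using, e.g., VNODE-LP [Ned06]),
\[
\int_0^\infty e^{-0.1065 \cdot 10^2 \left( \frac{1}{w^2} - 1 \right)} \eta_2(w)\, dw \le 0.006446.
\]
We then see that (9.57) and (9.62) imply (9.61).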
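The moments of $\eta_2$ used above can be cross-checked numerically. The sketch below uses a plain midpoint rule (so it is not rigorous, unlike the VNODE-LP computation cited in the text) and compares the results with the closed forms $(M\eta_1(s))^2$, where $M\eta_1(s) = 2(1 - 2^{-s})/s$; that closed form is our own restatement of the multiplicativity of the Mellin transform under $*_M$.

```python
import math

def eta2(w):
    # eta_2 = eta_1 *_M eta_1 with eta_1 = 2 * 1_[1/2, 1]
    if 0.25 <= w <= 0.5:
        return 4 * math.log(4 * w)
    if 0.5 < w <= 1:
        return 4 * math.log(1 / w)
    return 0.0

def midpoint(g, a, b, n=50_000):
    # composite midpoint rule on [a, b]
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def moment(weight):
    # integrate weight(w) * eta2(w) over the two smooth pieces of the support
    g = lambda w: weight(w) * eta2(w)
    return midpoint(g, 0.25, 0.5) + midpoint(g, 0.5, 1.0)

m0 = moment(lambda w: 1.0)
m_half = moment(lambda w: w**-0.5)
m_one = moment(lambda w: 1 / w)
m_three_half = moment(lambda w: w**-1.5)
gauss = moment(lambda w: math.exp(-0.1065 * 100 * (1 / w**2 - 1)))
print(m0, m_half, m_one, m_three_half, gauss)
```

The last value, multiplied by the constant 1.578 from (9.57), accounts for the coefficient 0.0102 in (9.61).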

9.4 The case of η+ (t)


We will work with
\[
\eta(t) = \eta_+(t) = h_H(t) \cdot t\eta_\heartsuit(t) = h_H(t) \cdot t e^{-t^2/2}, \tag{9.63}
\]
where $h_H$ is as in (7.6). We recall that $h_H$ is a band-limited approximation to the function $h$ defined in (7.5); to be more precise, $Mh_H(it)$ is the truncation of $Mh(it)$ to the interval $[-H, H]$.
We are actually defining $h$, $h_H$ and $\eta$ in a slightly different way from what was done in the first version of [Hela]. The difference is instructive. There, $\eta(t)$ was defined as $h_H(t) e^{-t^2/2}$, and $h_H$ was a band-limited approximation to a function $h$ defined as in (7.5), but with $t^3 (2 - t)^3$ instead of $t^2 (2 - t)^3$. The reason for our new definitions is that now the truncation of $Mh(it)$ will not break the holomorphy of $M\eta$, and so we will be able to use the general results we proved in §9.1.
In essence, $M\eta$ will still be holomorphic because the Mellin transform of $t\eta_\heartsuit(t)$ is holomorphic in the domain we care about, unlike the Mellin transform of $\eta_\heartsuit(t)$, which does have a pole at $s = 0$.
As usual, we start by bounding the contribution of zeros with large imaginary part. The procedure is much as before: since $\eta_+(t) = h_H(t) \cdot t\eta_\heartsuit(t)$, the Mellin transform $M\eta_+$ is a convolution of $M(t e^{-t^2/2})$ and something of support in $[-H, H]i$, namely, $Mh_H$ restricted to the imaginary axis. This means that the decay of $M\eta_+$ is (at worst) like the decay of $M(t e^{-t^2/2})$, delayed by $H$.

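Lemma 9.4.1. Let $\eta = \eta_+$ be as in (9.63) for some $H \ge 25$. Let $x \in \mathbb{R}^+$, $\delta \in \mathbb{R}$. Let $\chi$ be a primitive character mod $q$, $q \ge 1$. Assume that all non-trivial zeros $\rho$ of $L(s,\chi)$ with $|\Im(\rho)| \le T_0$ satisfy $\Re(s) = 1/2$, where $T_0 \ge H + \max(10\pi|\delta|, 50)$.
Write $G_\delta(s)$ for the Mellin transform of $\eta(t)e(\delta t)$. Then
\[
\sum_{\substack{\rho \\ |\Im(\rho)| > T_0}} |G_\delta(\rho)| \le \left( 11.308 \sqrt{T_0'}\, e^{-0.1598 T_0'} + 16.147 |\delta|\, e^{-0.1065 \left( \frac{T_0'}{\pi\delta} \right)^2} \right) \log \frac{qT_0}{2\pi},
\]
where $T_0' = T_0 - H$.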

Proof. As usual,
\[
\sum_{\substack{\rho \\ |\Im(\rho)| > T_0}} |G_\delta(\rho)| = \sum_{\substack{\rho \\ \Im(\rho) > T_0}} \bigl( |G_\delta(\rho)| + |G_\delta(1 - \rho)| \bigr).
\]
Let $F_\delta$ be as in (9.47). Then, since $\eta_+(t)e(\delta t) = h_H(t)\, t e^{-t^2/2} e(\delta t)$, where $h_H$ is as in (7.6), we see by (2.9) that
\[
G_\delta(s) = \frac{1}{2\pi} \int_{-H}^H Mh(ir)\, F_\delta(s + 1 - ir)\, dr,
\]
and so, since $|Mh(ir)| = |Mh(-ir)|$,
\[
|G_\delta(\rho)| + |G_\delta(1 - \rho)| \le \frac{1}{2\pi} \int_{-H}^H |Mh(ir)| \bigl( |F_\delta(1 + \rho - ir)| + |F_\delta(2 - (\rho - ir))| \bigr)\, dr. \tag{9.64}
\]
We apply Cor. 8.0.2 with $k = 1$ and $T_0 - H$ instead of $T_0$, and obtain that $|F_\delta(\rho)| + |F_\delta(1 - \rho)| \le g(\tau)$, where
\[
g(\tau) = \kappa_{1,1} \sqrt{|\tau|}\, e^{-0.1598|\tau|} + \kappa_{1,0}\, \frac{|\tau|}{2\pi|\delta|}\, e^{-0.1065 \left( \frac{\tau}{\pi\delta} \right)^2}, \tag{9.65}
\]
where $\kappa_{1,0} = 4.903$ and $\kappa_{1,1} = 4.017$. (As in the proof of Lemmas 9.2.1 and 9.3.1, we are putting in extra terms so as to simplify our integrals.)
From (9.64), we conclude that
\[
|G_\delta(\rho)| + |G_\delta(1 - \rho)| \le f(\tau),
\]
for $\rho = \sigma + i\tau$, $\tau > 0$, where
\[
f(\tau) = \frac{|Mh(ir)|_1}{2\pi} \cdot g(\tau - H)
\]
is decreasing for $\tau \ge T_0$ (because $g(\tau)$ is decreasing for $\tau \ge T_0 - H$). By (A.17), $|Mh(ir)|_1 \le 16.193918$.

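We apply Lemma 9.1.3, and get that
\[
\begin{aligned}
\sum_{\substack{\rho \\ |\Im(\rho)| > T_0}} |G_\delta(\rho)| &\le \int_{T_0}^\infty f(T) \left( \frac{1}{2\pi} \log \frac{qT}{2\pi} + \frac{1}{4T} \right) dT \\
&= \frac{|Mh(ir)|_1}{2\pi} \int_{T_0}^\infty g(T - H) \left( \frac{1}{2\pi} \log \frac{qT}{2\pi} + \frac{1}{4T} \right) dT.
\end{aligned} \tag{9.66}
\]
Now we just need to estimate some integrals. For any $y \ge e^2$, $c > 0$ and $\kappa, \kappa_1 \ge 0$,
\[
\int_y^\infty \sqrt{t}\, e^{-ct}\, dt \le \left( \frac{\sqrt{y}}{c} + \frac{1}{2c^2 \sqrt{y}} \right) e^{-cy},
\]
\[
\int_y^\infty \left( \sqrt{t} \log(t + \kappa) + \frac{\kappa_1}{\sqrt{t}} \right) e^{-ct}\, dt \le \left( \frac{\sqrt{y}}{c} + \frac{a}{c^2 \sqrt{y}} \right) \log(y + \kappa)\, e^{-cy},
\]
where
\[
a = \frac{1}{2} + \frac{1 + c\kappa_1}{\log(y + \kappa)}.
\]
The contribution of the exponential term in (9.65) to (9.66) thus equals
\[
\begin{aligned}
& \frac{\kappa_{1,1} |Mh(ir)|_1}{2\pi} \int_{T_0}^\infty \left( \frac{1}{2\pi} \log \frac{qT}{2\pi} + \frac{1}{4T} \right) \sqrt{T - H} \cdot e^{-0.1598 (T - H)}\, dT \\
& \quad \le 10.3532 \int_{T_0 - H}^\infty \left( \frac{1}{2\pi} \log(T + H) + \frac{\log \frac{q}{2\pi}}{2\pi} + \frac{1}{4T} \right) \sqrt{T}\, e^{-0.1598 T}\, dT \\
& \quad \le \frac{10.3532}{2\pi} \left( \frac{\sqrt{T_0 - H}}{0.1598} + \frac{a}{0.1598^2 \sqrt{T_0 - H}} \right) \log \frac{qT_0}{2\pi} \cdot e^{-0.1598 (T_0 - H)},
\end{aligned} \tag{9.67}
\]
where $a = 1/2 + (1 + 0.1598\pi/2)/\log T_0$. Since $T_0 - H \ge 50$ and $T_0 \ge 50 + 25 = 75$, this is at most
\[
11.308 \sqrt{T_0 - H}\, \log \frac{qT_0}{2\pi} \cdot e^{-0.1598 (T_0 - H)}.
\]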

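We now estimate a few more integrals so that we can handle the Gaussian term in (9.65). For any $y > 1$, $c > 0$, $\kappa, \kappa_1 \ge 0$,
\[
\int_y^\infty t e^{-ct^2}\, dt = \frac{e^{-cy^2}}{2c},
\]
\[
\int_y^\infty (t \log(t + \kappa) + \kappa_1)\, e^{-ct^2}\, dt \le \left( 1 + \frac{\kappa_1 + \frac{1}{2cy}}{y \log(y + \kappa)} \right) \frac{\log(y + \kappa) \cdot e^{-cy^2}}{2c}.
\]
Proceeding just as before, we see that the contribution of the Gaussian term in (9.65) to (9.66) is at most
\[
\begin{aligned}
& \frac{\kappa_{1,0} |Mh(ir)|_1}{2\pi} \int_{T_0}^\infty \left( \frac{1}{2\pi} \log \frac{qT}{2\pi} + \frac{1}{4T} \right) \frac{T - H}{2\pi|\delta|} \cdot e^{-0.1065 \left( \frac{T - H}{\pi\delta} \right)^2} dT \\
& \quad \le 12.6368 \cdot \frac{|\delta|}{4} \int_{\frac{T_0 - H}{\pi|\delta|}}^\infty \left( \log \left( T + \frac{H}{\pi|\delta|} \right) + \log \frac{q|\delta|}{2} + \frac{\pi/2}{T} \right) T e^{-0.1065 T^2}\, dT \\
& \quad \le 12.6368 \cdot \frac{|\delta|}{8 \cdot 0.1065} \left( 1 + \frac{\frac{\pi}{2} + \frac{\pi|\delta|}{2 \cdot 0.1065 \cdot (T_0 - H)}}{\frac{T_0 - H}{\pi|\delta|} \log \frac{T_0}{\pi|\delta|}} \right) \log \frac{qT_0}{2\pi} \cdot e^{-0.1065 \left( \frac{T_0 - H}{\pi\delta} \right)^2}.
\end{aligned} \tag{9.68}
\]
Since $(T_0 - H)/(\pi|\delta|) \ge 10$, this is at most
\[
16.147 |\delta| \log \frac{qT_0}{2\pi} \cdot e^{-0.1065 \left( \frac{T_0 - H}{\pi\delta} \right)^2}.
\]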

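Proposition 9.4.2. Let $\eta = \eta_+$ be as in (9.63) for some $H \ge 25$. Let $x \ge 10^3$, $\delta \in \mathbb{R}$. Let $\chi$ be a primitive character mod $q$, $q \ge 1$. Assume that all non-trivial zeros $\rho$ of $L(s,\chi)$ with $|\Im(\rho)| \le T_0$ lie on the critical line, where $T_0 \ge H + \max(10\pi|\delta|, 50)$.
Then
\[
\sum_{n=1}^\infty \Lambda(n)\chi(n)\, e\!\left( \frac{\delta}{x} n \right) \eta_+(n/x) = \begin{cases} \widehat{\eta_+}(-\delta)\, x + O^*\bigl(\operatorname{err}_{\eta_+,\chi}(\delta,x)\bigr) \cdot x & \text{if } q = 1, \\ O^*\bigl(\operatorname{err}_{\eta_+,\chi}(\delta,x)\bigr) \cdot x & \text{if } q > 1, \end{cases} \tag{9.69}
\]
where
\[
\begin{aligned}
\operatorname{err}_{\eta_+,\chi}(\delta,x) = {} & \left( 11.308 \sqrt{T_0'} \cdot e^{-0.1598 T_0'} + 16.147 |\delta|\, e^{-0.1065 \left( \frac{T_0'}{\pi\delta} \right)^2} \right) \log \frac{qT_0}{2\pi} \\
& + \bigl( 1.634 \sqrt{T_0} \log qT_0 + 12.43 \sqrt{T_0} + 1.321 \log q + 34.51 \bigr)\, x^{-1/2} \\
& + (9 + 11|\delta|)\, x^{-1} + (\log q)(11 + 6|\delta|)\, x^{-3/2},
\end{aligned} \tag{9.70}
\]
where $T_0' = T_0 - H$.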
Proof. We can apply Lemma 9.1.1 and Lemma 9.1.4 because $\eta_+(t)$, $(\log t)\eta_+(t)$ and $\eta_+'(t)$ are in $\ell_2$ (by (A.25), (A.28) and (A.32)) and $\eta_+(t) t^{\sigma-1}$ and $\eta_+'(t) t^{\sigma-1}$ are in $\ell_1$ for $\sigma$ in an open interval containing $[1/2, 3/2]$ (by (A.30) and (A.33)). (Because of (9.5), the fact that $\eta_+(t) t^{-1/2}$ and $\eta_+(t) t^{1/2}$ are in $\ell_1$ implies that $\eta_+(t) \log t$ is also in $\ell_1$, as is required by Lemma 9.1.4.)
We apply Lemmas 9.1.1, 9.1.4 and 9.4.1. We bound the norms involving $\eta_+$ using the estimates in §A.3 and §A.4. Since $\eta_+(0) = 0$ (by the definition (A.3) of $\eta_+$), the term $R$ in (9.2) is at most $c_0$, where $c_0$ is as in (9.3). We bound
\[
\begin{aligned}
c_0 \le {} & \frac{2}{3} \left( 2.922875 \left( \sqrt{\Gamma(1/2)} + \sqrt{\Gamma(3/2)} \right) + 1.062319 \left( \sqrt{\Gamma(5/2)} + \sqrt{\Gamma(7/2)} \right) \right) \\
& + \frac{4\pi}{3} |\delta| \cdot 1.062319 \left( \sqrt{\Gamma(3/2)} + \sqrt{\Gamma(5/2)} \right) \le 6.536232 + 9.319578|\delta|
\end{aligned}
\]
using (A.30) and (A.33). By (A.25), (A.32) and the assumption $H \ge 25$,
\[
|\eta_+|_2 \le 0.80365, \qquad |\eta_+'|_2 \le 10.845789.
\]
Thus, the error terms in (9.1) total at most
\[
\begin{aligned}
6.536232 + 9.319578|\delta| & + (\log q + 6.01)(10.845789 + 2\pi \cdot 0.80365|\delta|)\, x^{-1/2} \\
& \le 9 + 11|\delta| + (\log q)(11 + 6|\delta|)\, x^{-1/2}.
\end{aligned} \tag{9.71}
\]
The part of the sum $\sum_\rho G_\delta(\rho) x^\rho$ in (9.1) corresponding to zeros $\rho$ with $|\Im(\rho)| > T_0$ gets estimated by Lem. 9.4.1. By Lemma 9.1.4, the part of the sum corresponding to zeros $\rho$ with $|\Im(\rho)| \le T_0$ is at most
\[
\bigl( 1.634 \sqrt{T_0} \log qT_0 + 12.43 \sqrt{T_0} + 1.321 \log q + 34.51 \bigr)\, x^{1/2},
\]
where we estimate the norms $|\eta_+|_2$, $|\eta \cdot \log|_2$ and $|\eta(t)/\sqrt{t}|_1$ by (A.25), (A.28) and (A.30).

9.5 A sum for $\eta_+(t)^2$

Using a smoothing function sometimes leads to considering sums involving the square of the smoothing function. In particular, in Part III, we will need a result involving $\eta_+^2$, something that could be slightly challenging to prove, given the way in which $\eta_+$ is defined. Fortunately, we have bounds on $|\eta_+|_\infty$ and other $\ell_\infty$-norms (see Appendix A.5). Our task will also be made easier by the fact that we do not have a phase $e(\delta n/x)$ this time. All in all, this will be yet another demonstration of the generality of the framework developed in §9.1.
Proposition 9.5.1. Let $\eta = \eta_+$ be as in (9.63), $H \ge 25$. Let $x \ge 10^8$. Assume that all non-trivial zeros $\rho$ of the Riemann zeta function $\zeta(s)$ with $|\Im(\rho)| \le T_0$ lie on the critical line, where $T_0 \ge \max(2H + 25, 200)$.
Then
\[
\sum_{n=1}^\infty \Lambda(n) (\log n)\, \eta_+^2(n/x) = x \cdot \int_0^\infty \eta_+^2(t) \log xt\, dt + O^*(\operatorname{err}_{\ell_2,\eta_+}) \cdot x \log x, \tag{9.72}
\]
where
\[
\begin{aligned}
\operatorname{err}_{\ell_2,\eta_+} = {} & \left( \left( 0.462\, \frac{(\log T_1)^2}{\log x} + 0.909 \log T_1 \right) T_1 + 1.71 \left( 1 + \frac{\log T_1}{\log x} \right) H \right) e^{-\frac{\pi}{4} T_1} \\
& + \bigl( 2.445 \sqrt{T_0} \log T_0 + 50.04 \bigr) \cdot x^{-1/2}
\end{aligned} \tag{9.73}
\]
and $T_1 = T_0 - 2H$.
The assumption $T_0 \ge 200$ is stronger than what we strictly need, but, as it happens, we could make much stronger assumptions still. Proposition 9.5.1 relies on a verification of zeros of the Riemann zeta function; such verifications have gone up to values of $T_0$ much higher than 200.

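Proof. We will need to consider two smoothing functions, namely, $\eta_{+,0}(t) = \eta_+(t)^2$ and $\eta_{+,1}(t) = \eta_+(t)^2 \log t$. Clearly,
\[
\sum_{n=1}^\infty \Lambda(n) (\log n)\, \eta_+^2(n/x) = (\log x) \sum_{n=1}^\infty \Lambda(n)\, \eta_{+,0}(n/x) + \sum_{n=1}^\infty \Lambda(n)\, \eta_{+,1}(n/x).
\]
Since $\eta_+(t) = h_H(t)\, t e^{-t^2/2}$,
\[
\eta_{+,0}(t) = h_H^2(t)\, t^2 e^{-t^2}, \qquad \eta_{+,1}(t) = h_H^2(t) (\log t)\, t^2 e^{-t^2}.
\]
Let $\eta_{+,2} = (\log x)\eta_{+,0} + \eta_{+,1} = \eta_+^2(t) \log xt$.
We wish to apply Lemma 9.1.1. For this, we must first check that some norms are finite. Clearly,
\[
\begin{aligned}
\eta_{+,2}(t) &= \eta_+^2(t) \log x + \eta_+^2(t) \log t, \\
\eta_{+,2}'(t) &= 2\eta_+(t)\eta_+'(t) \log x + 2\eta_+(t)\eta_+'(t) \log t + \eta_+^2(t)/t.
\end{aligned} \tag{9.74}
\]
Thus, we see that $\eta_{+,2}(t)$ is in $\ell_2$ because $\eta_+(t)$ is in $\ell_2$ and $\eta_+(t)$, $\eta_+(t) \log t$ are both in $\ell_\infty$ (see (A.25), (A.38), (A.40)):
\[
\begin{aligned}
|\eta_{+,2}(t)|_2 &\le \bigl|\eta_+^2(t)\bigr|_2 \log x + \bigl|\eta_+^2(t) \log t\bigr|_2 \\
&\le |\eta_+|_\infty |\eta_+|_2 \log x + |\eta_+(t) \log t|_\infty |\eta_+|_2.
\end{aligned} \tag{9.75}
\]
Similarly, $\eta_{+,2}'(t)$ is in $\ell_2$ because $\eta_+(t)$ is in $\ell_2$, $\eta_+'(t)$ is in $\ell_2$ (A.32), and $\eta_+(t)$, $\eta_+(t) \log t$ and $\eta_+(t)/t$ (see (A.41)) are all in $\ell_\infty$:
\[
\begin{aligned}
\bigl|\eta_{+,2}'(t)\bigr|_2 &\le \bigl|2\eta_+(t)\eta_+'(t)\bigr|_2 \log x + \bigl|2\eta_+(t)\eta_+'(t) \log t\bigr|_2 + \bigl|\eta_+^2(t)/t\bigr|_2 \\
&\le 2 |\eta_+|_\infty \bigl|\eta_+'\bigr|_2 \log x + 2 |\eta_+(t) \log t|_\infty \bigl|\eta_+'\bigr|_2 + |\eta_+(t)/t|_\infty |\eta_+|_2.
\end{aligned} \tag{9.76}
\]
In the same way, we see that $\eta_{+,2}(t) t^{\sigma-1}$ is in $\ell_1$ for all $\sigma$ in $(-1, \infty)$ (because the same is true of $\eta_+(t) t^{\sigma-1}$ (A.30), and $\eta_+(t)$, $\eta_+(t) \log t$ are both in $\ell_\infty$) and $\eta_{+,2}'(t) t^{\sigma-1}$ is in $\ell_1$ for all $\sigma$ in $(0, \infty)$ (because the same is true of $\eta_+(t) t^{\sigma-1}$ and $\eta_+'(t) t^{\sigma-1}$ (A.33), and $\eta_+(t)$, $\eta_+(t) \log t$, $\eta_+(t)/t$ are all in $\ell_\infty$).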
We now apply Lemma 9.1.1 with $q = 1$, $\delta = 0$. Since $\eta_{+,2}(0) = 0$, the residue term $R$ equals $c_0$, which, by (9.74), is at most $2/3$ times
\[
\begin{aligned}
& 2 \bigl( |\eta_+|_\infty \log x + |\eta_+(t) \log t|_\infty \bigr) \left( \bigl|\eta_+'(t)/\sqrt{t}\bigr|_1 + \bigl|\eta_+'(t) \sqrt{t}\bigr|_1 \right) \\
& \quad + |\eta_+(t)/t|_\infty \left( \bigl|\eta_+(t)/\sqrt{t}\bigr|_1 + \bigl|\eta_+(t) \sqrt{t}\bigr|_1 \right).
\end{aligned}
\]
Using the bounds (A.38), (A.40), (A.41) (with the assumption $H \ge 25$), (A.30) and (A.33), we get that this means that
\[
c_0 \le 18.57606 \log x + 8.63264.
\]
Since $q = 1$ and $\delta = 0$, we get from (9.76) (and (A.38), (A.40), (A.41), with the assumption $H \ge 25$, and also (A.25) and (A.32)) that
\[
\begin{aligned}
(\log q + 6.01) \cdot \left( \bigl|\eta_{+,2}'\bigr|_2 + 2\pi|\delta|\, |\eta_{+,2}|_2 \right) x^{-1/2} &= 6.01 \bigl|\eta_{+,2}'\bigr|_2\, x^{-1/2} \\
&\le (162.56 \log x + 59.325)\, x^{-1/2}.
\end{aligned}
\]
Using the assumption $x \ge 10^8$, we obtain
\[
c_0 + (185.26 \log x + 71.799)\, x^{-1/2} \le 19.064 \log x. \tag{9.77}
\]

We will now apply Lemma 9.1.4, as we may, because of the finiteness of the norms we have already checked, together with
\[
\begin{aligned}
|\eta_{+,2}(t) \log t|_2 &\le \bigl|\eta_+^2(t) \log t\bigr|_2 \log x + \bigl|\eta_+^2(t) (\log t)^2\bigr|_2 \\
&\le |\eta_+(t) \log t|_\infty \bigl( |\eta_+(t)|_2 \log x + |\eta_+(t) \log t|_2 \bigr) \\
&\le 0.4976 \cdot (0.80365 \log x + 0.82999) \le 0.3999 \log x + 0.41301
\end{aligned} \tag{9.78}
\]
(by (A.40), (A.25) and (A.28); use the assumption $H \ge 25$). We also need the bounds
\[
|\eta_{+,2}(t)|_2 \le 1.14199 \log x + 0.39989 \tag{9.79}
\]
(from (9.75), by the norm bounds (A.38), (A.40) and (A.25), all with $H \ge 25$) and
\[
\begin{aligned}
\bigl|\eta_{+,2}(t)/\sqrt{t}\bigr|_1 &\le \bigl( |\eta_+(t)|_\infty \log x + |\eta_+(t) \log t|_\infty \bigr) \bigl|\eta_+(t)/\sqrt{t}\bigr|_1 \\
&\le 1.4211 \log x + 0.49763
\end{aligned} \tag{9.80}
\]
(by (A.38), (A.40) (again with $H \ge 25$) and (A.30)).
Applying Lemma 9.1.4, we obtain that the sum $\sum_\rho |G_0(\rho)| x^\rho$ (where $G_0(\rho) = M\eta_{+,2}(\rho)$) over all non-trivial zeros $\rho$ with $|\Im(\rho)| \le T_0$ is at most $x^{1/2}$ times
\[
(1.54189 \log x + 0.8129) \sqrt{T_0} \log T_0 + (4.21245 \log x + 6.17301) \sqrt{T_0} + 49.1 \log x + 17.2, \tag{9.81}
\]
where we are bounding norms by (9.79), (9.78) and (9.80). (We are using the fact that $T_0 \ge 2\pi\sqrt{e}$ to ensure that the quantity $\sqrt{T_0} \log T_0 - (\log 2\pi\sqrt{e}) \sqrt{T_0}$ being multiplied by $|\eta_{+,2}|_2$ is positive; thus, an upper bound for $|\eta_{+,2}|_2$ suffices.) By the assumptions $x \ge 10^8$, $T_0 \ge 200$, (9.81) is at most
\[
\bigl( 2.445 \sqrt{T_0} \log T_0 + 50.034 \bigr) \log x.
\]
In comparison, $19.064\, x^{-1/2} \log x \le 0.002 \log x$, since $x \ge 10^8$.
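The coefficients in (9.81) and its simplification are obtained by inserting the bounds (9.78), (9.79) and (9.80) into (9.29) with $q = 1$. A sketch of the arithmetic follows; each bound is stored as a pair (coefficient of $\log x$, constant), and all variable names are ours.

```python
import math

# Upper bounds (9.79), (9.78), (9.80), each of the form A*log(x) + B.
norm2 = (1.14199, 0.39989)      # |eta_{+,2}|_2
lognorm2 = (0.3999, 0.41301)    # |eta_{+,2} . log|_2
l1 = (1.4211, 0.49763)          # |eta_{+,2}(t)/sqrt(t)|_1
K = math.log(2 * math.pi * math.sqrt(math.e))

coef_sqrtT_log = tuple(a + b for a, b in zip(norm2, lognorm2))
coef_sqrtT = tuple(17.21 * b - K * a for a, b in zip(norm2, lognorm2))
const = tuple(34.5 * a for a in l1)

# Simplification for x >= 10^8, T0 >= 200: divide by log x and absorb the
# sqrt(T0) term into sqrt(T0)*log(T0).
log_x_min, log_T0_min = math.log(1e8), math.log(200)
lead = (coef_sqrtT_log[0] + coef_sqrtT_log[1] / log_x_min
        + (coef_sqrtT[0] + coef_sqrtT[1] / log_x_min) / log_T0_min)
tail = 49.1 + 17.2 / log_x_min
print(coef_sqrtT_log, coef_sqrtT, const, lead, tail)
```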


It remains to bound the sum of $M\eta_{+,2}(\rho)$ over zeros with $|\Im(\rho)| > T_0$. This we will do, as usual, by Lemma 9.1.3. For that, we will need to bound $M\eta_{+,2}(\rho)$ for $\rho$ in the critical strip.

The Mellin transform of $e^{-t^2}$ is $\Gamma(s/2)/2$, and so the Mellin transform of $t^2 e^{-t^2}$ is $\Gamma(s/2 + 1)/2$. By (2.10), this implies that the Mellin transform of $(\log t)\, t^2 e^{-t^2}$ is $\Gamma'(s/2 + 1)/4$. Hence, by (2.9),
\[
M\eta_{+,2}(s) = \frac{1}{4\pi} \int_{-\infty}^\infty M(h_H^2)(ir) \cdot F_x(s - ir)\, dr, \tag{9.82}
\]
where
\[
F_x(s) = (\log x)\, \Gamma\!\left( \frac{s}{2} + 1 \right) + \frac{1}{2}\, \Gamma'\!\left( \frac{s}{2} + 1 \right). \tag{9.83}
\]
Moreover,
\[
M(h_H^2)(ir) = \frac{1}{2\pi} \int_{-\infty}^\infty Mh_H(iu)\, Mh_H(i(r - u))\, du, \tag{9.84}
\]
and so $M(h_H^2)(ir)$ is supported on $[-2H, 2H]$. We also see that $|Mh_H^2(ir)|_1 \le |Mh_H(ir)|_1^2/2\pi$. We know that $|Mh_H(ir)|_1^2/2\pi \le 41.73727$ by (A.17).
Hence
\[
\begin{aligned}
|M\eta_{+,2}(s)| &\le \frac{1}{4\pi} \int_{-\infty}^\infty |M(h_H^2)(ir)|\, dr \cdot \max_{|r| \le 2H} |F_x(s - ir)| \\
&\le \frac{41.73727}{4\pi} \cdot \max_{|r| \le 2H} |F_x(s - ir)| \le 3.32135 \cdot \max_{|r| \le 2H} |F_x(s - ir)|.
\end{aligned} \tag{9.85}
\]

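By (8.51) (Stirling with explicit constants),
\[
|\Gamma(s)| \le \sqrt{2\pi}\, |s|^{\sigma - \frac{1}{2}}\, e^{\frac{1}{12|s|} + \frac{\sqrt{2}}{180|s|^3}}\, e^{-\pi|\Im(s)|/2} \tag{9.86}
\]
when $\Re(s) \ge 0$, and so
\[
\begin{aligned}
|\Gamma(s)| &\le \sqrt{2\pi} \left( \frac{\sqrt{12.5^2 + 1.5^2}}{12.5} \right) e^{\frac{1}{12 \cdot 12.5} + \frac{\sqrt{2}}{180 \cdot 12.5^3}} \cdot |\Im(s)|\, e^{-\pi|\Im(s)|/2} \\
&\le 2.542\, |\Im(s)|\, e^{-\pi|\Im(s)|/2}
\end{aligned} \tag{9.87}
\]
for $s \in \mathbb{C}$ with $0 < \Re(s) \le 3/2$ and $|\Im(s)| \ge 25/2$. Moreover, by [OLBC10, 5.11.2] and the remarks at the beginning of [OLBC10, 5.11(ii)],
\[
\frac{\Gamma'(s)}{\Gamma(s)} = \log s - \frac{1}{2s} + O^*\!\left( \frac{1}{12|s|^2} \cdot \frac{1}{\cos^3 \theta/2} \right)
\]
for $|\arg(s)| < \theta$ ($\theta \in (-\pi, \pi)$). Again, for $s = \sigma + i\tau$ with $0 < \sigma \le 3/2$ and $|\tau| \ge 25/2$, this gives us
\[
\begin{aligned}
\frac{\Gamma'(s)}{\Gamma(s)} &= \log|\tau| + \log \frac{\sqrt{|\tau|^2 + 1.5^2}}{|\tau|} + O^*\!\left( \frac{1}{2|\tau|} \right) + O^*\!\left( \frac{1}{12|\tau|^2} \cdot \frac{1}{(1/\sqrt{2})^3} \right) \\
&= \log|\tau| + O^*\!\left( \frac{9}{8|\tau|^2} + \frac{1}{2|\tau|} \right) + \frac{O^*(0.236)}{|\tau|^2} \\
&= \log|\tau| + O^*\!\left( \frac{0.609}{|\tau|} \right).
\end{aligned}
\]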

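Hence, for $0 \le \Re(s) \le 1$ (or, in fact, $-2 \le \Re(s) \le 1$) and $|\Im(s)| \ge 25$,
\[
\begin{aligned}
|F_x(s)| &\le \left( (\log x) + \frac{1}{2} \log \left| \frac{\tau}{2} \right| + \frac{1}{2}\, O^*\!\left( \frac{0.609}{|\tau/2|} \right) \right) \left| \Gamma\!\left( \frac{s}{2} + 1 \right) \right| \\
&\le 2.542 \left( (\log x) + \frac{1}{2} \log|\tau| - 0.297 \right) \frac{|\tau|}{2}\, e^{-\pi|\tau|/4}.
\end{aligned} \tag{9.88}
\]
Thus, by (9.85), for $\rho = \sigma + i\tau$ with $|\tau| \ge T_0 \ge 2H + 25$ and $0 \le \sigma \le 1$,
\[
|M\eta_{+,2}(\rho)| \le f(\tau),
\]
where
\[
f(T) = 8.45 \left( \log x + \frac{1}{2} \log T \right) \left( \frac{T}{2} - H \right) \cdot e^{-\frac{\pi (T - 2H)}{4}}. \tag{9.89}
\]
The functions $t \mapsto t e^{-\pi t/2}$ and $t \mapsto (\log t)\, t e^{-\pi t/2}$ are decreasing for $t \ge e$ (or in fact for $t \ge 1.762$); setting $t = T/2 - H$, we see that the right side of (9.89) is a decreasing function of $T$ for $T \ge T_0$, since $T_0/2 - H \ge 25/2 > e$.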
We can now apply Lemma 9.1.3, and get that
Z ∞  
X 1 T 1
|M η+,2 (ρ)| ≤ f (T ) log + dT. (9.90)
ρ T0 2π 2π 4T
|=(ρ)|>T0

Since T ≥ T0 ≥ 75 > 2, we know that ((1/2π) log(T /2π) + 1/4T ) ≤ (1/2π) log T .
Hence, the right side of (9.90) is at most

8.39 ∞ (log T )2
Z  
π(T −2H)
(log x)(log T ) + (T − 2H)e− 4 dT
4π T0 2
Z ∞ (9.91)
(log t)2
   
2H log t − πt
≤ 0.668 (log x) log t + + + 2H te 4 dt,
T1 t 2 t

where T1 = T0 − 2H and t = T − 2H; we are using the facts that (log t)00 < 0 for
t > 0 and ((logRt)2 )00 < 0 for t > e. (Of course, T1 ≥ 25 > e.)

Of course, T1 e−(π/4)t = (4/π)e−(π/4)T1 . We recall (9.36) and (9.50):
∞  π
4/π e− 4 T1
Z 
−π
4t
log t · e dt ≤ log T1 +
T1 T1 π/4
Z ∞   − π T1
π 4a e 4 log T1
(log t)te− 4 t dt ≤ T1 +
T1 π π/4

for T1 ≥ 1 satisfying log T1 > 4/(πT1 ), where a = 1 + (1 + 4/(πT1 ))/(log T1 −


4/(πT1 )). It is easy to check that log T1 > 4/(πT1 ) and 4a/π ≤ 1.6957 for T1 ≥ 25;
of course, we also have (4/π)/25 ≤ 0.051. Lastly,
\[ \int_{T_1}^\infty (\log t)^2\,t\,e^{-\frac{\pi}{4}t}\,dt \le \left(T_1+\frac{4b}{\pi}\right)\frac{e^{-\frac{\pi}{4}T_1}(\log T_1)^2}{\pi/4} \]
for $T_1 \ge e$, where $b = 1 + (2 + 8/(\pi T_1))/(\log T_1 - 8/(\pi T_1))$, and we check that $4b/\pi \le 2.1319$ for $T_1 \ge 25$. We conclude that the integral on the second line of (9.91) is at most
\[
\frac{4}{\pi}\left(\frac{(\log T_1)^2}{2}(T_1+2.132) + (\log x)(\log T_1)(T_1+1.696)\right)e^{-\frac{\pi}{4}T_1}
+ \frac{4}{\pi}\cdot 2H(\log T_1 + 0.051+\log x)\,e^{-\frac{\pi}{4}T_1}.
\]
Multiplying this by 0.668 and simplifying further (using $T_1 \ge 25$), we conclude that $\sum_{\rho:|\Im(\rho)|>T_0}|M\eta_{+,2}(\rho)|$ is at most
\[ \left((0.462\log T_1 + 0.909\log x)(\log T_1)\,T_1 + 1.71(\log T_1+\log x)H\right)e^{-\frac{\pi}{4}T_1}. \]
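The numerical constants in this step can be re-derived at the extreme value $T_1 = 25$. The sketch below uses ordinary floating point (not a rigorous computation), and the variable names are hypothetical. The last coefficient, 1.71, absorbs the additive 0.051 and so also needs $\log T_1 + \log x$ not to be too small; the threshold `L_min` is an observation of this sketch, not a statement from the text.

```python
import math

T1 = 25.0
c = 4.0 / (math.pi * T1)                       # the quantity 4/(pi*T1)
a = 1.0 + (1.0 + c) / (math.log(T1) - c)
b = 1.0 + (2.0 + 2.0 * c) / (math.log(T1) - 2.0 * c)

four_a_over_pi = 4.0 * a / math.pi             # text: at most 1.6957
four_b_over_pi = 4.0 * b / math.pi             # text: at most 2.1319

base = 0.668 * 4.0 / math.pi
coef_log_sq = base * 0.5 * (1.0 + 2.132 / T1)  # text: 0.462
coef_log_x = base * (1.0 + 1.696 / T1)         # text: 0.909
coef_H_base = base * 2.0                       # text: 1.71 after absorbing 0.051

# Smallest L = log T1 + log x with coef_H_base*(L + 0.051) <= 1.71*L.
L_min = coef_H_base * 0.051 / (1.71 - coef_H_base)
print(four_a_over_pi, four_b_over_pi, coef_log_sq, coef_log_x, coef_H_base, L_min)
```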

9.6 A verification of zeros and its consequences


David Platt verified in his doctoral thesis [Pla11] that, for every primitive character $\chi$ of conductor $q \le 10^5$, all the non-trivial zeroes of $L(s,\chi)$ with imaginary part $\le 10^8/q$ lie on the critical line, i.e., have real part exactly 1/2. (We call this a GRH verification up to $10^8/q$.)
In work undertaken in coordination with the present work [Plab], Platt has extended these computations to

• all odd $q \le 3\cdot 10^5$, with $T_q = 10^8/q$,

• all even $q \le 4\cdot 10^5$, with $T_q = \max(10^8/q,\ 200 + 7.5\cdot 10^7/q)$.

The method used was rigorous; its implementation uses interval arithmetic.
Let us see what this verification gives us when used as an input to Prop. 9.2.2. We are interested in bounds on $|\mathrm{err}_{\eta,\chi^*}(\delta,x)|$ for $q \le r$ and $|\delta| \le 4r/q$. We set $r = 3\cdot 10^5$. (We will not be using the verification for $q$ even with $3\cdot 10^5 < q \le 4\cdot 10^5$, though we certainly could.)
We let $T_0 = 10^8/q$. Thus,

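\[ T_0 \ge \frac{10^8}{3\cdot 10^5} = \frac{1000}{3},\qquad \frac{T_0}{\pi|\delta|}\ge\frac{10^8/q}{\pi\cdot 4r/q} = \frac{1000}{12\pi} \tag{9.92} \]
and so, by $|\delta|\le 4r/q\le 1.2\cdot10^6/q\le 1.2\cdot 10^6$,
\[ 3.53\,e^{-0.1598T_0} \le 2.597\cdot 10^{-23}, \]
\[ 22.5\,\frac{\delta^2}{T_0}\,e^{-0.1065\frac{T_0^2}{(\pi\delta)^2}} \le |\delta|\cdot 7.715\cdot 10^{-34}\le 9.258\cdot10^{-28}. \]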

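Since $qT_0\le 10^8$, this gives us that
\[
\log\frac{qT_0}{2\pi}\cdot\left(3.53\,e^{-0.1598T_0} + 22.5\,\frac{\delta^2}{T_0}\,e^{-0.1065\frac{T_0^2}{(\pi\delta)^2}}\right)
\le 4.3054\cdot 10^{-22} + \frac{1.54\cdot10^{-26}}{q}\le 4.306\cdot10^{-22}.
\]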
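As a rough cross-check of the three numerical bounds just derived, one can evaluate the expressions at the extreme values allowed by (9.92). This is a floating-point sketch with hypothetical variable names; the text relies on rigorous computation, which this does not replace.

```python
import math

r = 3e5
T0_min = 1e8 / r                          # T0 >= 1000/3, from (9.92)
ratio_min = 1000.0 / (12.0 * math.pi)     # T0/(pi*|delta|) >= 1000/(12*pi)

first = 3.53 * math.exp(-0.1598 * T0_min)

# 22.5*delta^2/T0*exp(...) = |delta| * 22.5*(|delta|/T0)*exp(...),
# and |delta|/T0 <= 12/1000 by (9.92).
per_delta = 22.5 * (12.0 / 1000.0) * math.exp(-0.1065 * ratio_min ** 2)
second = 1.2e6 * per_delta                # using |delta| <= 1.2e6

log_factor = math.log(1e8 / (2.0 * math.pi))
total = log_factor * (first + second)
print(first, per_delta, second, total)
```

The printed values can be compared with $2.597\cdot 10^{-23}$, $7.715\cdot 10^{-34}$, $9.258\cdot 10^{-28}$ and $4.306\cdot 10^{-22}$ respectively.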
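Again by $T_0 = 10^8/q$,
\[ 2.337\sqrt{T_0}\log qT_0 + 21.817\sqrt{T_0} + 2.85\log q + 74.38 \]
is at most
\[ \frac{648662}{\sqrt q} + 111, \]
and
\[ 3\log q + 14|\delta| + 17 \le 55 + \frac{1.7\cdot10^7}{q},\qquad (\log q+6)\cdot(1+5|\delta|)\le 19+\frac{1.2\cdot10^8}{q}. \]
Hence, assuming $x \ge 10^8$ to simplify, we see that Prop. 9.2.2 gives us that
\[
\begin{aligned}
\mathrm{err}_{\eta,\chi}(\delta,x) &\le 4.306\cdot10^{-22} + \frac{\frac{648662}{\sqrt q}+111}{\sqrt x} + \frac{55+\frac{1.7\cdot10^7}{q}}{x} + \frac{19+\frac{1.2\cdot 10^8}{q}}{x^{3/2}}\\
&\le 4.306\cdot 10^{-22} + \frac{1}{\sqrt x}\left(\frac{650400}{\sqrt q}+112\right)
\end{aligned}
\]
for $\eta(t) = e^{-t^2/2}$. This proves Theorem 7.1.1.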
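The simplification above can be spot-checked for a few moduli $q \le 3\cdot 10^5$ and a few values $x \ge 10^8$. The sketch below is illustrative only: the function names and the sample values of $q$ and $x$ are assumptions of this example, and floating point is used throughout.

```python
import math

def main_coefficient(q):
    # 2.337*sqrt(T0)*log(q*T0) + 21.817*sqrt(T0) + 2.85*log q + 74.38, T0 = 1e8/q
    T0 = 1e8 / q
    return (2.337 * math.sqrt(T0) * math.log(q * T0)
            + 21.817 * math.sqrt(T0) + 2.85 * math.log(q) + 74.38)

def full_bound(q, x):
    # Middle expression of the display, without the 4.306e-22 term.
    return ((648662 / math.sqrt(q) + 111) / math.sqrt(x)
            + (55 + 1.7e7 / q) / x
            + (19 + 1.2e8 / q) / x ** 1.5)

def simplified_bound(q, x):
    # Final expression of the display, without the 4.306e-22 term.
    return (650400 / math.sqrt(q) + 112) / math.sqrt(x)

qs = [1, 2, 3, 10, 1000, 99991, 300000]
xs = [1e8, 1e12, 1e27]
for q in qs:
    print(q, main_coefficient(q), 648662 / math.sqrt(q) + 111)
```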
Let us now see what Platt’s calculations give us when used as an input to Prop. 9.3.2 and Cor. 9.3.3. Again, we set $r = 3\cdot10^5$, $\delta_0 = 8$, $|\delta|\le 4r/q$ and $T_0 = 10^8/q$, so (9.92) is still valid. We obtain
\[
\begin{aligned}
&T_0\log\frac{qT_0}{2\pi}\cdot\left(6.11\,e^{-0.1598T_0} + 1.578\,e^{-0.1065\cdot\frac{T_0^2}{(\pi\delta)^2}}\right)\\
&\quad\le \log\frac{10^8}{2\pi}\left(\frac{1000}{3}\cdot 6.11\cdot e^{-0.1598\cdot\frac{1000}{3}} + 10^8\cdot1.578\,e^{-0.1065\left(\frac{1000}{12\pi}\right)^2}\right)\\
&\quad\le 2.485\cdot 10^{-19},
\end{aligned}
\]
since $t\exp(-0.1598t)$ is decreasing on $t$ for $t \ge 1/0.1598$. We use the same bound when we have 0.0102 instead of 1.578 on the left side, as in (9.61). (The coefficient affects what is by far the smaller term, so we are wasting nothing.) Again by $T_0 = 10^8/q$ and $q \le r$,
\[
\begin{aligned}
1.22\sqrt{T_0}\log qT_0 + 5.053\sqrt{T_0} + 1.423\log q + 37.19 &\le \frac{279793}{\sqrt q}+55.2,\\
1.679\sqrt{T_0}\log qT_0 + 6.957\sqrt{T_0}+1.958\log q+51.17&\le\frac{378854}{\sqrt q}+75.9.
\end{aligned}
\]

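For $x\ge 10^8$, we use $|\delta|\le 4r/q\le1.2\cdot10^6/q$ to bound
\[
\begin{aligned}
(3+11|\delta|)x^{-1} + (\log q+6)\cdot(1+6|\delta|)\cdot x^{-3/2} &\le\left(0.0004+\frac{1322}{q}\right)x^{-1/2},\\
(6+22|\delta|)x^{-1}+(\log q+6)\cdot(3+17|\delta|)\cdot x^{-3/2}&\le\left(0.0007+\frac{2644}{q}\right)x^{-1/2}.
\end{aligned}
\]
Summing, we obtain
\[ \mathrm{err}_{\eta,\chi}\le 2.485\cdot10^{-19}+\frac{1}{\sqrt x}\left(\frac{281200}{\sqrt q}+56\right) \]
for $\eta(t) = t^2e^{-t^2/2}$ and
\[ \mathrm{err}_{\eta,\chi}\le 2.485\cdot10^{-19}+\frac1{\sqrt x}\left(\frac{381500}{\sqrt q}+76\right) \]
for $\eta(t) = t^2e^{-t^2/2} *_M \eta_2(t)$. This proves Theorem 7.1.2 and Corollary 7.1.3.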
Now let us work with the smoothing weight $\eta_+$. This time around, set $r = 150000$ if $q$ is odd, and $r = 300000$ if $q$ is even. As before, we assume
\[ q \le r,\qquad |\delta| \le 4r/q. \]
We can see that Platt’s verification [Plab], mentioned before, allows us to take
\[ T_0 = H + \frac{250r}{q},\qquad H = 200, \]
since $T_q$ is always at least this ($T_q = 10^8/q \ge 200 + 7\cdot 10^7/q > 200 + 3.75\cdot 10^7/q$ for $q \le 150000$ odd, $T_q \ge 200 + 7.5\cdot 10^7/q$ for $q \le 300000$ even).
Thus,
\[ T_0 - H = \frac{250r}{q} \ge \frac{250r}{r} = 250, \]
\[ \frac{T_0-H}{\pi\delta} \ge \frac{250r}{\pi\delta q} \ge \frac{250}{4\pi} = 19.89436\ldots \]
and also
\[ T_0 \le 200 + 250\cdot 150000 \le 3.751\cdot 10^7,\qquad qT_0 \le rH + 250r \le 1.35\cdot 10^8. \]



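Hence, since $\sqrt{t}\,e^{-0.1598t}$ is decreasing on $t$ for $t \ge 1/(2\cdot 0.1598)$,
\[
\begin{aligned}
&11.308\sqrt{T_0-H}\,e^{-0.1598(T_0-H)} + 16.147|\delta|\,e^{-0.1065\frac{(T_0-H)^2}{(\pi\delta)^2}}\\
&\quad\le 7.9854\cdot 10^{-16} + \frac{4r}{q}\cdot 7.9814\cdot 10^{-18}\\
&\quad\le 7.9854\cdot 10^{-16} + \frac{9.5777\cdot 10^{-12}}{q}.
\end{aligned}
\]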

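Examining (9.70), we get
\[
\begin{aligned}
\mathrm{err}_{\eta_+,\chi}(\delta,x) &\le \log\frac{1.35\cdot 10^8}{2\pi}\cdot\left(7.9854\cdot 10^{-16} + \frac{9.5777\cdot 10^{-12}}{q}\right)\\
&\quad + \left(\left(1.634\log(1.35\cdot 10^8) + 12.43\right)\frac{\sqrt{1.35\cdot 10^8}}{\sqrt q} + 1.321\log 300000 + 34.51\right)\frac{1}{\sqrt x}\\
&\quad + \left(9 + 11\cdot\frac{1.2\cdot 10^6}{q}\right)x^{-1} + (\log 300000)\left(11 + 6\cdot\frac{1.2\cdot 10^6}{q}\right)x^{-3/2}\\
&\le 1.3482\cdot 10^{-14} + \frac{1.617\cdot 10^{-10}}{q}\\
&\quad + \left(\frac{499845}{\sqrt q} + 51.17 + \frac{1.32\cdot 10^6}{q\sqrt x} + \frac{9}{\sqrt x} + \frac{9.1\cdot 10^7}{qx} + \frac{139}{x}\right)\frac{1}{\sqrt x}.
\end{aligned}
\]
Making the assumption $x \ge 10^{12}$, we obtain
\[ \mathrm{err}_{\eta_+,\chi}(\delta,x) \le 1.3482\cdot 10^{-14} + \frac{1.617\cdot 10^{-10}}{q} + \left(\frac{499900}{\sqrt q} + 52\right)\frac{1}{\sqrt x}. \]
This proves Theorem 7.1.4 for general $q$.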
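The constants entering this bound can be re-evaluated at the extreme values $T_0 - H = 250$ and $(T_0-H)/(\pi|\delta|) = 250/(4\pi)$. What follows is a floating-point sketch with hypothetical names, meant only as a plausibility check of the figures quoted above.

```python
import math

t_min = 250.0                               # T0 - H >= 250
ratio_min = 250.0 / (4.0 * math.pi)         # (T0 - H)/(pi*|delta|) >= 250/(4*pi)

term1 = 11.308 * math.sqrt(t_min) * math.exp(-0.1598 * t_min)
term2_per_delta = 16.147 * math.exp(-0.1065 * ratio_min ** 2)
log_factor = math.log(1.35e8 / (2.0 * math.pi))
sqrt_coeff = (1.634 * math.log(1.35e8) + 12.43) * math.sqrt(1.35e8)
const_coeff = 1.321 * math.log(300000) + 34.51

def detailed(q, x):
    # Bracket multiplying 1/sqrt(x) before the assumption x >= 1e12 is used.
    return (499845 / math.sqrt(q) + 51.17 + 1.32e6 / (q * math.sqrt(x))
            + 9 / math.sqrt(x) + 9.1e7 / (q * x) + 139 / x)

def simplified(q):
    # Bracket multiplying 1/sqrt(x) in the final bound.
    return 499900 / math.sqrt(q) + 52

print(term1, term2_per_delta, log_factor * term1, sqrt_coeff, const_coeff)
```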


Let us optimize things a little more carefully for the trivial character $\chi_T$. Again, we will make the assumption $x \ge 10^{12}$. We will also assume, as we did before, that $|\delta| \le 4r/q$; this now gives us $|\delta| \le 600000$, since $q = 1$ and $r = 150000$ for $q$ odd. We will go up to a height $T_0 = H + 600000\pi\cdot t$, where $H = 200$ and $t \ge 10$. Then
\[ \frac{T_0-H}{\pi\delta} \ge \frac{600000\pi t}{4\pi r} = t. \]
Hence
\[ 11.308\sqrt{T_0-H}\,e^{-0.1598(T_0-H)} + 16.147|\delta|\,e^{-0.1065\frac{(T_0-H)^2}{(\pi\delta)^2}} \le 10^{-1300000} + 9689000\,e^{-0.1065t^2}. \]
Looking at (9.70), we get
\[
\begin{aligned}
\mathrm{err}_{\eta_+,\chi_T}(\delta,x) &\le \log\frac{T_0}{2\pi}\cdot\left(10^{-1300000} + 9689000\,e^{-0.1065t^2}\right)\\
&\quad + \left((1.634\log T_0 + 12.43)\sqrt{T_0} + 34.51\right)x^{-1/2} + 6600009\,x^{-1}.
\end{aligned}
\]
The value $t = 20$ seems good enough; we choose it because it is not far from optimal for $x \sim 10^{27}$. We get that $T_0 = 12000000\pi + 200$; since $T_0 < 10^8$, we are within the range of the computations in [Plab] (or for that matter [Wed03] or [Plaa]). We obtain
\[ \mathrm{err}_{\eta_+,\chi_T}(\delta,x) \le 4.772\cdot 10^{-11} + \frac{251400}{\sqrt x}. \]
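The two constants in the final bound for the trivial character can be reproduced approximately for the choice $t = 20$. This is a floating-point sketch; the name `sqrt_x_coefficient` is hypothetical, and the term $10^{-1300000}$ is dropped because it underflows to zero in double precision.

```python
import math

t = 20.0
H = 200.0
T0 = H + 600000.0 * math.pi * t

exp_term = 9689000.0 * math.exp(-0.1065 * t * t)
# The term 10^{-1300000} is far below double-precision range and is omitted.
constant_part = math.log(T0 / (2.0 * math.pi)) * exp_term

def sqrt_x_coefficient(x):
    # Collects the x^{-1/2} and x^{-1} terms as sqrt_x_coefficient(x)/sqrt(x).
    return ((1.634 * math.log(T0) + 12.43) * math.sqrt(T0) + 34.51
            + 6600009.0 / math.sqrt(x))

print(T0, constant_part, sqrt_x_coefficient(1e12))
```

For $x \ge 10^{12}$ the second value stays below 251400, in line with the bound stated in the text.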

Lastly, let us look at the sum estimated in (9.72). Here it will be enough to go up to just $T_0 = 2H + \max(50, H/4) = 450$, where, as before, $H = 200$. Of course, the verification of the zeros of the Riemann zeta function does go that far; as we already said, it goes until $10^8$ (or rather more: see [Wed03] and [Plaa]). We make, again, the assumption $x \ge 10^{12}$. We look at (9.73) and obtain that $\mathrm{err}_{\ell_2,\eta_+}$ is at most
\[
\begin{aligned}
&\left(\left(0.462\,\frac{(\log 50)^2}{\log 10^{12}} + 0.909\log 50\right)\cdot 50 + 1.71\left(1 + \frac{\log 50}{\log 10^{12}}\right)\cdot 200\right)e^{-\frac{\pi}{4}50}\\
&\qquad + \left(2.445\sqrt{450}\,\log 450 + 50.04\right)\cdot x^{-1/2}\\
&\quad\le 5.123\cdot 10^{-15} + \frac{366.91}{\sqrt x}.
\end{aligned}
\tag{9.93}
\]
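The two constants in (9.93) follow from direct evaluation of the displayed expression. The sketch below, with hypothetical variable names and plain floating point, reproduces them approximately.

```python
import math

H = 200.0
T1 = 50.0                        # T0 - 2H with T0 = 2H + max(50, H/4) = 450
log_x_min = math.log(1e12)       # uses the assumption x >= 1e12

bracket = ((0.462 * math.log(T1) ** 2 / log_x_min + 0.909 * math.log(T1)) * T1
           + 1.71 * (1.0 + math.log(T1) / log_x_min) * H)
zero_tail = bracket * math.exp(-math.pi / 4.0 * T1)
sqrt_coeff = 2.445 * math.sqrt(450.0) * math.log(450.0) + 50.04
print(zero_tail, sqrt_coeff)
```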
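It remains only to estimate the integral in (9.72). First of all,
\[
\begin{aligned}
\int_0^\infty \eta_+^2(t)\log xt\,dt &= \int_0^\infty\eta_\circ^2(t)\log xt\,dt\\
&\quad + 2\int_0^\infty(\eta_+(t)-\eta_\circ(t))\eta_\circ(t)\log xt\,dt + \int_0^\infty(\eta_+(t)-\eta_\circ(t))^2\log xt\,dt.
\end{aligned}
\]
The main term will be given by
\[
\int_0^\infty\eta_\circ^2(t)\log xt\,dt = \left(0.64020599736635 + O\left(10^{-14}\right)\right)\log x - 0.021094778698867 + O\left(10^{-15}\right),
\]
where the integrals were computed rigorously using VNODE-LP [Ned06]. (The integral $\int_0^\infty\eta_\circ^2(t)\,dt$ can also be computed symbolically.) By Cauchy-Schwarz and the triangle inequality,
\[
\begin{aligned}
\int_0^\infty(\eta_+(t)-\eta_\circ(t))\eta_\circ(t)\log xt\,dt &\le |\eta_+-\eta_\circ|_2\,|\eta_\circ(t)\log xt|_2\\
&\le |\eta_+-\eta_\circ|_2\left(|\eta_\circ|_2\log x + |\eta_\circ\cdot\log|_2\right)\\
&\le \frac{274.86}{H^{7/2}}\,(0.80013\log x + 0.214)\\
&\le 1.944\cdot 10^{-6}\cdot\log x + 5.2\cdot 10^{-7},
\end{aligned}
\]
where we are using (A.23) and evaluate $|\eta_\circ\cdot\log|_2$ rigorously as above. By (A.23) and (A.24),
\[
\begin{aligned}
\int_0^\infty(\eta_+(t)-\eta_\circ(t))^2\log xt\,dt &\le \left(\frac{274.86}{H^{7/2}}\right)^2\log x + \frac{27428}{H^7}\\
&\le 5.903\cdot 10^{-12}\cdot\log x + 2.143\cdot 10^{-12}.
\end{aligned}
\]
We conclude that
\[
\int_0^\infty\eta_+^2(t)\log xt\,dt = \left(0.640206 + O^*(1.95\cdot 10^{-6})\right)\log x - 0.021095 + O^*(5.3\cdot 10^{-7}).
\tag{9.94}
\]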


We add to this the error term $5.123\cdot 10^{-15} + 366.91/\sqrt{x}$ from (9.93), and simplify using the assumption $x \ge 10^{12}$. We obtain:
\[
\sum_{n=1}^\infty \Lambda(n)(\log n)\,\eta_+^2(n/x) = 0.640206\,x\log x - 0.021095\,x + O^*\left(2\cdot 10^{-6}\,x\log x + 366.91\sqrt{x}\log x\right),
\tag{9.95}
\]
and so Prop. 9.5.1 gives us Proposition 7.1.5.
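The two leading constants 0.640206 and $-0.021095$ can be approximated without interval arithmetic. The sketch below uses composite Simpson quadrature (not VNODE-LP, which is what the text uses) together with the closed form $\eta_\circ(t) = t^3(2-t)^3e^{-(t-1)^2/2}$ on $[0,2]$ from (11.3); the function names and the number of subintervals are assumptions of this example. It also evaluates the small constants coming from $H = 200$.

```python
import math

def eta_circ(t):
    # eta_circ(t) = t^3 (2-t)^3 exp(-(t-1)^2/2) on [0, 2], zero elsewhere.
    if t < 0.0 or t > 2.0:
        return 0.0
    return t ** 3 * (2.0 - t) ** 3 * math.exp(-(t - 1.0) ** 2 / 2.0)

def simpson(f, a, b, n=4000):
    # Composite Simpson rule with an even number n of subintervals.
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def eta_sq(t):
    return eta_circ(t) ** 2

def eta_sq_log(t):
    # The integrand vanishes at t = 0 (t^6 log t -> 0); avoid evaluating log(0).
    return 0.0 if t <= 0.0 else eta_circ(t) ** 2 * math.log(t)

coeff_log_x = simpson(eta_sq, 0.0, 2.0)          # coefficient of log x
constant_term = simpson(eta_sq_log, 0.0, 2.0)    # constant term

H = 200.0
l2_gap = 274.86 / H ** 3.5                       # bound on |eta_+ - eta_circ|_2
print(coeff_log_x, constant_term, l2_gap * 0.80013, l2_gap * 0.214)
```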


As we can see, the relatively large error term $2\cdot 10^{-6}$ comes from the fact that we
have wanted to give the main term in (9.72) as an explicit constant, rather than as an
integral. This is satisfactory; Prop. 7.1.5 is an auxiliary result that will be needed for
one specific purpose in Part III, as opposed to Thms. 7.1.1–7.1.4, which, while crucial
for Part III, are also of general applicability and interest.
Part III

The integral over the circle

Chapter 10

The integral over the major arcs

Let
\[ S_\eta(\alpha,x) = \sum_n \Lambda(n)\,e(\alpha n)\,\eta(n/x), \tag{10.1} \]
where $\alpha \in \mathbb{R}/\mathbb{Z}$, $\Lambda$ is the von Mangoldt function and $\eta : \mathbb{R} \to \mathbb{C}$ is of fast enough decay for the sum to converge.
Our ultimate goal is to bound from below
\[ \sum_{n_1+n_2+n_3=N} \Lambda(n_1)\Lambda(n_2)\Lambda(n_3)\,\eta_1(n_1/x)\,\eta_2(n_2/x)\,\eta_3(n_3/x), \tag{10.2} \]
where $\eta_1, \eta_2, \eta_3 : \mathbb{R} \to \mathbb{C}$. Once we know that this is neither zero nor very close to zero, we will know that it is possible to write $N$ as the sum of three primes $n_1, n_2, n_3$ in at least one way; that is, we will have proven the ternary Goldbach conjecture.
As can be readily seen, (10.2) equals
\[ \int_{\mathbb{R}/\mathbb{Z}} S_{\eta_1}(\alpha,x)\,S_{\eta_2}(\alpha,x)\,S_{\eta_3}(\alpha,x)\,e(-N\alpha)\,d\alpha. \tag{10.3} \]

In the circle method, the set R/Z gets partitioned into the set of major arcs M and the
set of minor arcs m; the contribution of each of the two sets to the integral (10.3) is
evaluated separately.
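The identity between (10.2) and (10.3) can be checked on a toy example. In the sketch below, the weight `eta` (a compactly supported bump $t(1-t)$), the values $N = 31$, $x = 40$, and the number $M$ of sample points are all assumptions made for illustration; they are not the smoothing functions or parameters used in the text. Since the integrand is then a trigonometric polynomial, the integral over $\mathbb{R}/\mathbb{Z}$ equals an average over $M$ equally spaced points once $M$ exceeds the range of frequencies.

```python
import cmath
import math

def von_mangoldt(n):
    # Lambda(n) = log p if n is a power of a prime p, and 0 otherwise.
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # n itself is prime

def eta(t):
    # Toy smoothing weight with compact support (an assumption of this sketch).
    return t * (1.0 - t) if 0.0 < t < 1.0 else 0.0

def e(theta):
    return cmath.exp(2j * math.pi * theta)

def S(alpha, x):
    # S_eta(alpha, x) as in (10.1); finite because eta vanishes outside (0, 1).
    return sum(von_mangoldt(n) * e(alpha * n) * eta(n / x)
               for n in range(1, int(x) + 1))

def direct_sum(N, x):
    # The sum (10.2) with eta_1 = eta_2 = eta_3 = eta.
    w = [von_mangoldt(n) * eta(n / x) for n in range(0, N + 1)]
    return sum(w[n1] * w[n2] * w[N - n1 - n2]
               for n1 in range(1, N) for n2 in range(1, N - n1))

def circle_integral(N, x, M):
    # Averaging over the points k/M integrates exactly any trigonometric
    # polynomial whose nonzero frequencies are not multiples of M.
    return sum(S(k / M, x) ** 3 * e(-N * k / M) for k in range(M)).real / M

N, x = 31, 40.0
lhs = direct_sum(N, x)
rhs = circle_integral(N, x, M=128)
print(lhs, rhs)
```

The two printed numbers agree up to rounding error, and both are positive because 31 is a sum of three primes.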
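Our objective here is to treat the major arcs: we wish to estimate
\[ \int_{\mathfrak{M}} S_{\eta_1}(\alpha,x)\,S_{\eta_2}(\alpha,x)\,S_{\eta_3}(\alpha,x)\,e(-N\alpha)\,d\alpha \tag{10.4} \]
for $\mathfrak{M} = \mathfrak{M}_{\delta_0,r}$, where
\[
\mathfrak{M}_{\delta_0,r} = \bigcup_{\substack{q\le r\\ q\ \mathrm{odd}}}\ \bigcup_{\substack{a \bmod q\\ (a,q)=1}} \left(\frac{a}{q} - \frac{\delta_0 r}{2qx},\ \frac{a}{q} + \frac{\delta_0 r}{2qx}\right)
\ \cup \bigcup_{\substack{q\le 2r\\ q\ \mathrm{even}}}\ \bigcup_{\substack{a \bmod q\\ (a,q)=1}} \left(\frac{a}{q} - \frac{\delta_0 r}{qx},\ \frac{a}{q} + \frac{\delta_0 r}{qx}\right)
\tag{10.5}
\]
and $\delta_0 > 0$, $r \ge 1$ are given.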
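To make definition (10.5) concrete, the following sketch enumerates the arcs for small parameters using exact rational arithmetic. The function names `major_arcs` and `in_major_arcs`, and the toy values $\delta_0 = 8$, $r = 3$, $x = 10^6$, are assumptions of this example only.

```python
from fractions import Fraction
from math import gcd

def major_arcs(delta0, r, x):
    # Returns (center, half_width) pairs describing M_{delta0, r} in (10.5).
    arcs = []
    for q in range(1, 2 * r + 1):
        if q % 2 == 1 and q <= r:
            half_width = Fraction(delta0 * r, 2 * q * x)
        elif q % 2 == 0:
            half_width = Fraction(delta0 * r, q * x)
        else:
            continue  # odd q in (r, 2r] does not occur in (10.5)
        for a in range(q):
            if gcd(a, q) == 1:
                arcs.append((Fraction(a, q), half_width))
    return arcs

def in_major_arcs(alpha, arcs):
    # Distance is measured on the circle R/Z; alpha should be a Fraction.
    for center, half_width in arcs:
        d = abs((alpha - center) % 1)
        if min(d, 1 - d) < half_width:
            return True
    return False

arcs = major_arcs(delta0=8, r=3, x=10**6)
total_length = sum(2 * w for _, w in arcs)
print(len(arcs), total_length)
```

With these toy values there are 8 arcs (around fractions with denominators 1, 2, 3, 4 and 6), which illustrates that the number of arcs depends only on $r$ while their width shrinks with $x$.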


In other words, our major arcs will be few (that is, a constant number) and narrow.
While [LW02] used relatively narrow major arcs as well, their number, as in all pre-
vious proofs of Vinogradov’s result, was not bounded by a constant. (In his proof of
the five-primes theorem, [Tao14] is able to take a single major arc around 0; this is not
possible here.)
What we are about to see is the general major-arc setup. This is naturally the place
where the overlap with the existing literature is largest. Two important differences can
nevertheless be singled out.
• The most obvious one is the presence of smoothing. At this point, it improves
and simplifies error terms, but it also means that we will later need estimates for
exponential sums on major arcs, and not just at the middle of each major arc. (If
there is smoothing, we cannot use summation by parts to reduce the problem of
estimating sums to a problem of counting primes in arithmetic progressions, or
weighted by characters.)
• Since our L-function estimates for exponential sums will give bounds that are
better than the trivial one by only a constant – even if it is a rather large con-
stant – we need to be especially careful when estimating error terms, finding
cancellation when possible.

10.1 Decomposition of Sη by characters


What follows is largely classical; cf. [HL22] or, say, [Dav67, §26]. The only difference from the literature lies in the treatment of $n$ non-coprime to $q$, and the way in which we show that our exponential sum (10.8) is equal to a linear combination of twisted sums $S_{\eta,\chi^*}$ over primitive characters $\chi^*$. (Non-primitive characters would give us L-functions with some zeroes inconveniently placed on the line $\Re(s) = 0$.)
Write $\tau(\chi,b)$ for the Gauss sum
\[ \tau(\chi,b) = \sum_{a \bmod q}\chi(a)\,e(ab/q) \tag{10.6} \]
associated to a $b\in\mathbb{Z}/q\mathbb{Z}$ and a Dirichlet character $\chi$ with modulus $q$. We let $\tau(\chi) = \tau(\chi,1)$. If $(b,q) = 1$, then $\tau(\chi,b) = \chi(b^{-1})\tau(\chi)$.

Recall that $\chi^*$ denotes the primitive character inducing a given Dirichlet character $\chi$. Writing $\sum_{\chi \bmod q}$ for a sum over all characters $\chi$ of $(\mathbb{Z}/q\mathbb{Z})^*$, we see that, for any $a_0\in\mathbb{Z}/q\mathbb{Z}$,
\[
\begin{aligned}
\frac{1}{\phi(q)}\sum_{\chi \bmod q}\tau(\overline{\chi},b)\,\chi^*(a_0)
&= \frac{1}{\phi(q)}\sum_{\chi \bmod q}\ \sum_{\substack{a \bmod q\\ (a,q)=1}}\overline{\chi(a)}\,e(ab/q)\,\chi^*(a_0)\\
&= \sum_{\substack{a \bmod q\\ (a,q)=1}}\frac{e(ab/q)}{\phi(q)}\sum_{\chi \bmod q}\chi^*(a^{-1}a_0)
= \sum_{\substack{a \bmod q\\ (a,q)=1}}\frac{e(ab/q)}{\phi(q)}\sum_{\chi \bmod q'}\chi(a^{-1}a_0),
\end{aligned}
\tag{10.7}
\]

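where $q' = q/\gcd(q, a_0^\infty)$. Now, $\sum_{\chi \bmod q'}\chi(a^{-1}a_0) = 0$ unless $a = a_0$ (in which case $\sum_{\chi \bmod q'}\chi(a^{-1}a_0) = \phi(q')$). Thus, (10.7) equals
\[
\begin{aligned}
\frac{\phi(q')}{\phi(q)}\sum_{\substack{a \bmod q\\ (a,q)=1\\ a\equiv a_0 \bmod q'}} e(ab/q)
&= \frac{\phi(q')}{\phi(q)}\sum_{\substack{k \bmod q/q'\\ (k,q/q')=1}} e\!\left(\frac{(a_0+kq')b}{q}\right)\\
&= \frac{\phi(q')}{\phi(q)}\,e\!\left(\frac{a_0b}{q}\right)\sum_{\substack{k \bmod q/q'\\ (k,q/q')=1}} e\!\left(\frac{kb}{q/q'}\right)
= \frac{\phi(q')}{\phi(q)}\,e\!\left(\frac{a_0b}{q}\right)\mu(q/q')
\end{aligned}
\]
provided that $(b,q) = 1$. (We are evaluating a Ramanujan sum in the last step.) Hence, for $\alpha = a/q + \delta/x$, $q \le x$, $(a,q) = 1$,
\[ \frac{1}{\phi(q)}\sum_{\chi}\tau(\overline{\chi},a)\sum_n \chi^*(n)\Lambda(n)\,e(\delta n/x)\,\eta(n/x) \]
equals
\[ \sum_n \frac{\mu((q,n^\infty))}{\phi((q,n^\infty))}\,\Lambda(n)\,e(\alpha n)\,\eta(n/x). \]
Since $(a,q) = 1$, $\tau(\overline{\chi},a) = \chi(a)\tau(\overline{\chi})$. The factor $\mu((q,n^\infty))/\phi((q,n^\infty))$ equals 1 when $(n,q) = 1$; the absolute value of the factor is at most 1 for every $n$. Clearly
\[ \sum_{\substack{n\\ (n,q)\ne 1}} \Lambda(n)\,\eta\!\left(\frac{n}{x}\right) = \sum_{p|q}\log p\sum_{\alpha\ge 1}\eta\!\left(\frac{p^\alpha}{x}\right). \]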

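Recalling the definition (10.1) of $S_\eta(\alpha,x)$, we conclude that
\[
S_\eta(\alpha,x) = \frac{1}{\phi(q)}\sum_{\chi \bmod q}\chi(a)\,\tau(\overline{\chi})\,S_{\eta,\chi^*}\!\left(\frac{\delta}{x},x\right) + O^*\!\left(2\sum_{p|q}\log p\sum_{\alpha\ge 1}\eta\!\left(\frac{p^\alpha}{x}\right)\right),
\tag{10.8}
\]
where
\[ S_{\eta,\chi}(\beta,x) = \sum_n \Lambda(n)\chi(n)\,e(\beta n)\,\eta(n/x). \tag{10.9} \]
Hence $S_{\eta_1}(\alpha,x)S_{\eta_2}(\alpha,x)S_{\eta_3}(\alpha,x)e(-N\alpha)$ equals
\[
\begin{aligned}
&\frac{1}{\phi(q)^3}\sum_{\chi_1}\sum_{\chi_2}\sum_{\chi_3}\tau(\overline{\chi_1})\tau(\overline{\chi_2})\tau(\overline{\chi_3})\,\chi_1(a)\chi_2(a)\chi_3(a)\,e(-Na/q)\\
&\qquad\cdot S_{\eta_1,\chi_1^*}(\delta/x,x)\,S_{\eta_2,\chi_2^*}(\delta/x,x)\,S_{\eta_3,\chi_3^*}(\delta/x,x)\,e(-\delta N/x)
\end{aligned}
\tag{10.10}
\]
plus an error term of absolute value at most
\[ 2\sum_{j=1}^{3}\ \prod_{j'\ne j}|S_{\eta_{j'}}(\alpha,x)|\ \sum_{p|q}\log p\sum_{\alpha\ge 1}\eta_j\!\left(\frac{p^\alpha}{x}\right). \tag{10.11} \]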

We will later see that the integral of (10.11) over $S^1$ is negligible – for our choices of $\eta_j$, it will, in fact, be of size $O(x(\log x)^A)$, $A$ a constant. The error term $O(x(\log x)^A)$ should be compared to the main term, which will be of size about a constant times $x^2$.
In (10.10), we have reduced our problems to estimating $S_{\eta,\chi}(\delta/x,x)$ for $\chi$ primitive; a more obvious way of reaching the same goal would have made (10.11) worse by a factor of about $\sqrt{q}$.

10.2 The integral over the major arcs: the main term
We are to estimate the integral (10.4), where the major arcs $\mathfrak{M}_{\delta_0,r}$ are defined as in (10.5). We will use $\eta_1 = \eta_2 = \eta_+$, $\eta_3(t) = \eta_*(\kappa t)$, where $\eta_+$ and $\eta_*$ will be set later.
We can write
\[
\begin{aligned}
S_{\eta,\chi}(\delta/x,x) = S_\eta(\delta/x,x) &= \int_0^\infty \eta(t/x)\,e(\delta t/x)\,dt + O^*(\mathrm{err}_{\eta,\chi}(\delta,x))\cdot x\\
&= \widehat{\eta}(-\delta)\cdot x + O^*(\mathrm{err}_{\eta,\chi_T}(\delta,x))\cdot x
\end{aligned}
\tag{10.12}
\]
for $\chi = \chi_T$ the trivial character, and
\[ S_{\eta,\chi}(\delta/x) = O^*(\mathrm{err}_{\eta,\chi}(\delta,x))\cdot x \tag{10.13} \]
for $\chi$ primitive and non-trivial. The estimation of the error terms err will come later; let us focus on (a) obtaining the contribution of the main term, (b) using estimates on the error terms efficiently.
The main term: three principal characters. The main contribution will be given by the term in (10.10) with $\chi_1 = \chi_2 = \chi_3 = \chi_0$, where $\chi_0$ is the principal character mod $q$.
The sum $\tau(\chi_0,n)$ is a Ramanujan sum; as is well-known (see, e.g., [IK04, (3.2)]),
\[ \tau(\chi_0,n) = \sum_{d|(q,n)}\mu(q/d)\,d. \tag{10.14} \]
This simplifies to $\mu(q/(q,n))\,\phi((q,n))$ for $q$ square-free. The special case $n = 1$ gives us that $\tau(\chi_0) = \mu(q)$.
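Formula (10.14) is easy to test numerically against the definition (10.6) with $\chi = \chi_0$. The sketch below compares the exponential sum with the divisor sum for small moduli; the helper names `mobius`, `tau_principal` and `divisor_formula` are hypothetical.

```python
import cmath
import math

def mobius(n):
    # Moebius function by trial division.
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def tau_principal(q, n):
    # tau(chi_0, n): sum of e(a n / q) over a mod q with (a, q) = 1, as in (10.6).
    total = sum(cmath.exp(2j * math.pi * a * n / q)
                for a in range(q) if math.gcd(a, q) == 1)
    return total.real

def divisor_formula(q, n):
    # Right side of (10.14).
    g = math.gcd(q, n)
    return sum(mobius(q // d) * d for d in range(1, g + 1) if g % d == 0)

print(tau_principal(12, 4), divisor_formula(12, 4))
```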
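Thus, the term in (10.10) with $\chi_1 = \chi_2 = \chi_3 = \chi_0$ equals
\[ \frac{e(-Na/q)}{\phi(q)^3}\,\mu(q)^3\,S_{\eta_+,\chi_0^*}(\delta/x,x)^2\,S_{\eta_*,\chi_0^*}(\delta/x,x)\,e(-\delta N/x), \tag{10.15} \]
where, of course, $S_{\eta,\chi_0^*}(\alpha,x) = S_\eta(\alpha,x)$ (since $\chi_0^*$ is the trivial character). Summing (10.15) for $\alpha = a/q + \delta/x$ and $a$ going over all residues mod $q$ coprime to $q$, we obtain
\[ \frac{\mu\!\left(\frac{q}{(q,N)}\right)\phi((q,N))}{\phi(q)^3}\,\mu(q)^3\,S_{\eta_+,\chi_0^*}(\delta/x,x)^2\,S_{\eta_*,\chi_0^*}(\delta/x,x)\,e(-\delta N/x). \]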

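The integral of (10.15) over all of $\mathfrak{M} = \mathfrak{M}_{\delta_0,r}$ (see (10.5)) thus equals
\[
\begin{aligned}
&\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\phi((q,N))}{\phi(q)^3}\,\mu(q)^2\mu((q,N))\int_{-\frac{\delta_0r}{2qx}}^{\frac{\delta_0r}{2qx}} S^2_{\eta_+,\chi_0^*}(\alpha,x)\,S_{\eta_*,\chi_0^*}(\alpha,x)\,e(-\alpha N)\,d\alpha\\
&+\sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{\phi((q,N))}{\phi(q)^3}\,\mu(q)^2\mu((q,N))\int_{-\frac{\delta_0r}{qx}}^{\frac{\delta_0r}{qx}} S^2_{\eta_+,\chi_0^*}(\alpha,x)\,S_{\eta_*,\chi_0^*}(\alpha,x)\,e(-\alpha N)\,d\alpha.
\end{aligned}
\tag{10.16}
\]
The main term in (10.16) is
\[
\begin{aligned}
&x^3\cdot\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\phi((q,N))}{\phi(q)^3}\,\mu(q)^2\mu((q,N))\int_{-\frac{\delta_0r}{2qx}}^{\frac{\delta_0r}{2qx}} (\widehat{\eta_+}(-\alpha x))^2\,\widehat{\eta_*}(-\alpha x)\,e(-\alpha N)\,d\alpha\\
&+x^3\cdot\sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{\phi((q,N))}{\phi(q)^3}\,\mu(q)^2\mu((q,N))\int_{-\frac{\delta_0r}{qx}}^{\frac{\delta_0r}{qx}} (\widehat{\eta_+}(-\alpha x))^2\,\widehat{\eta_*}(-\alpha x)\,e(-\alpha N)\,d\alpha.
\end{aligned}
\tag{10.17}
\]
We would like to complete both the sum and the integral. Before, we should say that we will want to be able to use smoothing functions $\eta_+$ whose Fourier transforms are not easy to deal with directly. All we want to require is that there be a smoothing function $\eta_\circ$, easier to deal with, such that $\eta_\circ$ be close to $\eta_+$ in $\ell_2$ norm.
Assume, then, that
\[ |\eta_+ - \eta_\circ|_2 \le \epsilon_0|\eta_\circ|_2, \]
where $\eta_\circ$ is thrice differentiable outside finitely many points and satisfies $\eta_\circ^{(3)} \in L_1$. Then (10.17) equals
\[
\begin{aligned}
&x^3\cdot\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\phi((q,N))}{\phi(q)^3}\,\mu(q)^2\mu((q,N))\int_{-\frac{\delta_0r}{2qx}}^{\frac{\delta_0r}{2qx}} (\widehat{\eta_\circ}(-\alpha x))^2\,\widehat{\eta_*}(-\alpha x)\,e(-\alpha N)\,d\alpha\\
&+x^3\cdot\sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{\phi((q,N))}{\phi(q)^3}\,\mu(q)^2\mu((q,N))\int_{-\frac{\delta_0r}{qx}}^{\frac{\delta_0r}{qx}} (\widehat{\eta_\circ}(-\alpha x))^2\,\widehat{\eta_*}(-\alpha x)\,e(-\alpha N)\,d\alpha
\end{aligned}
\tag{10.18}
\]
plus
\[ O^*\!\left(x^2\cdot\sum_q\frac{\mu(q)^2}{\phi(q)^2}\int_{-\infty}^{\infty}\left|(\widehat{\eta_+}(-\alpha))^2 - (\widehat{\eta_\circ}(-\alpha))^2\right|\,|\widehat{\eta_*}(-\alpha)|\,d\alpha\right). \tag{10.19} \]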

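Here (10.19) is bounded by $2.82643x^2$ (by (C.9)) times
\[
\begin{aligned}
&|\widehat{\eta_*}(-\alpha)|_\infty\cdot\sqrt{\int_{-\infty}^{\infty}|\widehat{\eta_+}(-\alpha)-\widehat{\eta_\circ}(-\alpha)|^2\,d\alpha\cdot\int_{-\infty}^{\infty}|\widehat{\eta_+}(-\alpha)+\widehat{\eta_\circ}(-\alpha)|^2\,d\alpha}\\
&\quad\le |\eta_*|_1\cdot|\widehat{\eta_+}-\widehat{\eta_\circ}|_2\,|\widehat{\eta_+}+\widehat{\eta_\circ}|_2 = |\eta_*|_1\cdot|\eta_+-\eta_\circ|_2\,|\eta_++\eta_\circ|_2\\
&\quad\le |\eta_*|_1\cdot|\eta_+-\eta_\circ|_2\,(2|\eta_\circ|_2 + |\eta_+-\eta_\circ|_2) = |\eta_*|_1\,|\eta_\circ|_2^2\cdot(2+\epsilon_0)\,\epsilon_0.
\end{aligned}
\]
Now, (10.18) equals
\[
\begin{aligned}
&x^3\int_{-\infty}^{\infty}(\widehat{\eta_\circ}(-\alpha x))^2\,\widehat{\eta_*}(-\alpha x)\,e(-\alpha N)\sum_{\substack{\frac{q}{(q,2)}\le\min\left(\frac{\delta_0r}{2|\alpha|x},\,r\right)\\ \mu(q)^2=1}}\frac{\phi((q,N))}{\phi(q)^3}\,\mu((q,N))\,d\alpha\\
&= x^3\int_{-\infty}^{\infty}(\widehat{\eta_\circ}(-\alpha x))^2\,\widehat{\eta_*}(-\alpha x)\,e(-\alpha N)\,d\alpha\cdot\left(\sum_{q\ge 1}\frac{\phi((q,N))}{\phi(q)^3}\,\mu(q)^2\mu((q,N))\right)\\
&\quad - x^3\int_{-\infty}^{\infty}(\widehat{\eta_\circ}(-\alpha x))^2\,\widehat{\eta_*}(-\alpha x)\,e(-\alpha N)\sum_{\substack{\frac{q}{(q,2)}>\min\left(\frac{\delta_0r}{2|\alpha|x},\,r\right)\\ \mu(q)^2=1}}\frac{\phi((q,N))}{\phi(q)^3}\,\mu((q,N))\,d\alpha.
\end{aligned}
\tag{10.20}
\]
The last line in (10.20) is bounded¹ by
\[ x^2\,|\widehat{\eta_*}|_\infty\int_{-\infty}^{\infty}\ \sum_{\frac{q}{(q,2)}>\min\left(\frac{\delta_0r}{2|\alpha|},\,r\right)}\frac{\mu(q)^2}{\phi(q)^2}\,|\widehat{\eta_\circ}(-\alpha)|^2\,d\alpha. \tag{10.21} \]
By (2.1) (with $k = 3$), (C.16) and (C.17), this is at most
\[
\begin{aligned}
&x^2|\eta_*|_1\int_{-\delta_0/2}^{\delta_0/2}\frac{4.31004}{r}\,|\widehat{\eta_\circ}(-\alpha)|^2\,d\alpha
+ 2x^2|\eta_*|_1\int_{\delta_0/2}^{\infty}\left(\frac{|\eta_\circ^{(3)}|_1}{(2\pi\alpha)^3}\right)^2\frac{8.62008\,|\alpha|}{\delta_0r}\,d\alpha\\
&\quad\le |\eta_*|_1\left(4.31004\,|\eta_\circ|_2^2 + 0.00113\,\frac{|\eta_\circ^{(3)}|_1^2}{\delta_0^5}\right)\frac{x^2}{r}.
\end{aligned}
\]
It is easy to see that
\[ \sum_{q\ge 1}\frac{\phi((q,N))}{\phi(q)^3}\,\mu(q)^2\mu((q,N)) = \prod_{p|N}\left(1-\frac{1}{(p-1)^2}\right)\cdot\prod_{p\nmid N}\left(1+\frac{1}{(p-1)^3}\right). \]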
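The Euler product identity for the singular series can be checked numerically by truncating both sides. In the sketch below, the truncation limit 20000 and the sample values $N = 105$ and $N = 30$ are assumptions of this example; the tail of the sum beyond the limit is of size roughly $1/\text{limit}$, so agreement is only expected to about three decimal places.

```python
import math

def sieve_mu_phi(limit):
    # Simple sieve for the Moebius function and Euler's phi up to limit.
    mu = [1] * (limit + 1)
    phi = list(range(limit + 1))
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
                phi[m] -= phi[m] // p
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    primes = [p for p in range(2, limit + 1) if is_prime[p]]
    return mu, phi, primes

def singular_sum(N, limit, mu, phi):
    # Partial sum over square-free q <= limit of mu((q,N)) phi((q,N)) / phi(q)^3.
    total = 0.0
    for q in range(1, limit + 1):
        if mu[q] != 0:
            g = math.gcd(q, N)
            total += mu[g] * phi[g] / phi[q] ** 3
    return total

def singular_product(N, primes):
    # Truncated Euler product on the right side of the identity.
    prod = 1.0
    for p in primes:
        if N % p == 0:
            prod *= 1.0 - 1.0 / (p - 1) ** 2
        else:
            prod *= 1.0 + 1.0 / (p - 1) ** 3
    return prod

LIMIT = 20000
mu, phi, primes = sieve_mu_phi(LIMIT)
print(singular_sum(105, LIMIT, mu, phi), singular_product(105, primes))
print(singular_sum(30, LIMIT, mu, phi), singular_product(30, primes))
```

For even $N$ the factor at $p = 2$ vanishes, so both sides are (close to) zero, reflecting the fact that an even number is not a sum of three odd primes.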

1 This is obviously crude, in that we are bounding φ((q, N ))/φ(q) by 1. We are doing so in order to

avoid a potentially harmful dependence on N .



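Expanding the integral implicit in the definition of $\widehat{f}$,
\[
\int_{-\infty}^{\infty}(\widehat{\eta_\circ}(-\alpha x))^2\,\widehat{\eta_*}(-\alpha x)\,e(-\alpha N)\,d\alpha
= \frac{1}{x}\int_0^\infty\!\!\int_0^\infty \eta_\circ(t_1)\,\eta_\circ(t_2)\,\eta_*\!\left(\frac{N}{x}-(t_1+t_2)\right)dt_1\,dt_2.
\tag{10.22}
\]
(This is standard. One rigorous way to obtain (10.22) is to approximate the integral over $\alpha \in (-\infty,\infty)$ by an integral with a smooth weight, at different scales; as the scale becomes broader, the Fourier transform of the weight approximates (as a distribution) the $\delta$ function. Apply Plancherel.)
Hence, (10.17) equals
\[
x^2\cdot\int_0^\infty\!\!\int_0^\infty \eta_\circ(t_1)\,\eta_\circ(t_2)\,\eta_*\!\left(\frac{N}{x}-(t_1+t_2)\right)dt_1\,dt_2
\cdot\prod_{p|N}\left(1-\frac{1}{(p-1)^2}\right)\cdot\prod_{p\nmid N}\left(1+\frac{1}{(p-1)^3}\right)
\tag{10.23}
\]
(the main term) plus
\[
\left(2.82643\,|\eta_\circ|_2^2\,(2+\epsilon_0)\cdot\epsilon_0 + \frac{4.31004\,|\eta_\circ|_2^2 + 0.00113\,\frac{|\eta_\circ^{(3)}|_1^2}{\delta_0^5}}{r}\right)|\eta_*|_1\,x^2.
\tag{10.24}
\]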

Here (10.23) is just as in the classical case [IK04, (19.10)], except for the fact that
a factor of 1/2 has been replaced by a double integral. Later, in chapter 11, we will see
how to choose our smoothing functions (and x, in terms of N ) so as to make the double
integral as large as possible in comparison with the error terms. This is an important
optimization. (We already had a first discussion of this in the introduction; see (1.39)
and what follows.)
What remains to estimate is the contribution of all the terms of the form errη,χ (δ, x)
in (10.12) and (10.13). Let us first deal with another matter – bounding the $\ell_2$ norm of $|S_\eta(\alpha,x)|^2$ over the major arcs.

10.3 The $\ell_2$ norm over the major arcs


We can always bound the integral of |Sη (α, x)|2 on the whole circle by Plancherel. If
we only want the integral on certain arcs, we use the bound in Prop. 12.1.2 (based on
work by Ramaré). If these arcs are really the major arcs – that is, the arcs on which
we have useful analytic estimates – then we can hope to get better bounds using L-
functions. This will be useful both to estimate the error terms in this section and to
make the use of Ramaré’s bounds more efficient later.

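By (10.8),
\[
\begin{aligned}
\sum_{\substack{a \bmod q\\ \gcd(a,q)=1}}\left|S_\eta\!\left(\frac{a}{q}+\frac{\delta}{x},x\right)\right|^2
&= \frac{1}{\phi(q)^2}\sum_{\chi}\sum_{\chi'}\tau(\overline{\chi})\,\overline{\tau(\overline{\chi'})}\Bigg(\sum_{\substack{a \bmod q\\ \gcd(a,q)=1}}\chi(a)\,\overline{\chi'(a)}\Bigg)\cdot S_{\eta,\chi^*}(\delta/x,x)\,\overline{S_{\eta,\chi'^*}(\delta/x,x)}\\
&\quad + O^*\!\left(2(1+\sqrt{q})(\log x)^2|\eta|_\infty\max_\alpha|S_\eta(\alpha,x)| + \left((1+\sqrt{q})(\log x)^2|\eta|_\infty\right)^2\right)\\
&= \frac{1}{\phi(q)}\sum_{\chi}|\tau(\overline{\chi})|^2\,|S_{\eta,\chi^*}(\delta/x,x)|^2 + K_{q,1}\,(2|S_\eta(0,x)| + K_{q,1}),
\end{aligned}
\]
where
\[ K_{q,1} = (1+\sqrt{q})(\log x)^2|\eta|_\infty. \]
As is well-known (see, e.g., [IK04, Lem. 3.1])
\[ \tau(\chi) = \mu\!\left(\frac{q}{q^*}\right)\chi^*\!\left(\frac{q}{q^*}\right)\tau(\chi^*), \]
where $q^*$ is the modulus of $\chi^*$ (i.e., the conductor of $\chi$), and
\[ |\tau(\chi^*)| = \sqrt{q^*}. \]
Using the expressions (10.12) and (10.13), we obtain
\[
\begin{aligned}
\sum_{\substack{a \bmod q\\ (a,q)=1}}\left|S_\eta\!\left(\frac{a}{q}+\frac{\delta}{x},x\right)\right|^2
&= \frac{\mu^2(q)}{\phi(q)}\left|\widehat{\eta}(-\delta)\,x + O^*(\mathrm{err}_{\eta,\chi_T}(\delta,x)\cdot x)\right|^2\\
&\quad + \frac{1}{\phi(q)}\left(\sum_{\chi\ne\chi_T}\mu^2\!\left(\frac{q}{q^*}\right)q^*\cdot O^*\!\left(|\mathrm{err}_{\eta,\chi}(\delta,x)|^2x^2\right)\right) + K_{q,1}\,(2|S_\eta(0,x)| + K_{q,1})\\
&= \frac{\mu^2(q)\,x^2}{\phi(q)}\left(|\widehat{\eta}(-\delta)|^2 + O^*\!\left(\left|\mathrm{err}_{\eta,\chi_T}(\delta,x)\,(2|\eta|_1 + \mathrm{err}_{\eta,\chi_T}(\delta,x))\right|\right)\right)\\
&\quad + O^*\!\left(\max_{\chi\ne\chi_T} q^*\,|\mathrm{err}_{\eta,\chi^*}(\delta,x)|^2x^2 + K_{q,2}\,x\right),
\end{aligned}
\]
where $K_{q,2} = K_{q,1}\,(2|S_\eta(0,x)|/x + K_{q,1}/x)$.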



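Thus, the integral of $|S_\eta(\alpha,x)|^2$ over $\mathfrak{M}$ (see (10.5)) is
\[
\begin{aligned}
&\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\ \sum_{\substack{a \bmod q\\ (a,q)=1}}\int_{\frac{a}{q}-\frac{\delta_0r}{2qx}}^{\frac{a}{q}+\frac{\delta_0r}{2qx}}|S_\eta(\alpha,x)|^2\,d\alpha
+ \sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\ \sum_{\substack{a \bmod q\\ (a,q)=1}}\int_{\frac{a}{q}-\frac{\delta_0r}{qx}}^{\frac{a}{q}+\frac{\delta_0r}{qx}}|S_\eta(\alpha,x)|^2\,d\alpha\\
&= \sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)\,x^2}{\phi(q)}\int_{-\frac{\delta_0r}{2qx}}^{\frac{\delta_0r}{2qx}}|\widehat{\eta}(-\alpha x)|^2\,d\alpha
+ \sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{\mu^2(q)\,x^2}{\phi(q)}\int_{-\frac{\delta_0r}{qx}}^{\frac{\delta_0r}{qx}}|\widehat{\eta}(-\alpha x)|^2\,d\alpha\\
&\quad + O^*\!\left(\sum_q\frac{\mu^2(q)\,x^2}{\phi(q)}\cdot\frac{\gcd(q,2)\,\delta_0r}{qx}\cdot ET_{\eta,\frac{\delta_0r}{2}}\left(2|\eta|_1 + ET_{\eta,\frac{\delta_0r}{2}}\right)\right)\\
&\quad + \sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\delta_0rx}{q}\cdot O^*\!\Bigg(\max_{\substack{\chi \bmod q\\ \chi\ne\chi_T\\ |\delta|\le\delta_0r/2q}} q^*\,|\mathrm{err}_{\eta,\chi^*}(\delta,x)|^2 + \frac{K_{q,2}}{x}\Bigg)\\
&\quad + \sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{2\delta_0rx}{q}\cdot O^*\!\Bigg(\max_{\substack{\chi \bmod q\\ \chi\ne\chi_T\\ |\delta|\le\delta_0r/q}} q^*\,|\mathrm{err}_{\eta,\chi^*}(\delta,x)|^2 + \frac{K_{q,2}}{x}\Bigg),
\end{aligned}
\tag{10.25}
\]
where
\[ ET_{\eta,s} = \max_{|\delta|\le s}|\mathrm{err}_{\eta,\chi_T}(\delta,x)| \]
and $\chi_T$ is the trivial character. If all we want is an upper bound, we can simply remark that
\[
\begin{aligned}
&x\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}\int_{-\frac{\delta_0r}{2qx}}^{\frac{\delta_0r}{2qx}}|\widehat{\eta}(-\alpha x)|^2\,d\alpha
+ x\sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{\mu^2(q)}{\phi(q)}\int_{-\frac{\delta_0r}{qx}}^{\frac{\delta_0r}{qx}}|\widehat{\eta}(-\alpha x)|^2\,d\alpha\\
&\quad\le\Bigg(\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)} + \sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{\mu^2(q)}{\phi(q)}\Bigg)|\widehat{\eta}|_2^2 = 2|\eta|_2^2\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}.
\end{aligned}
\]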
q odd q even q odd

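If we also need a lower bound, we proceed as follows.

Again, we will work with an approximation $\eta_\circ$ such that (a) $|\eta-\eta_\circ|_2$ is small, (b) $\eta_\circ$ is thrice differentiable outside finitely many points, (c) $\eta_\circ^{(3)} \in L_1$. Clearly,
\[
\begin{aligned}
&x\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}\int_{-\frac{\delta_0r}{2qx}}^{\frac{\delta_0r}{2qx}}|\widehat{\eta}(-\alpha x)|^2\,d\alpha\\
&\quad\le\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}\left(\int_{-\frac{\delta_0r}{2q}}^{\frac{\delta_0r}{2q}}|\widehat{\eta_\circ}(-\alpha)|^2\,d\alpha + 2\langle|\widehat{\eta_\circ}|,|\widehat{\eta}-\widehat{\eta_\circ}|\rangle + |\widehat{\eta}-\widehat{\eta_\circ}|_2^2\right)\\
&\quad=\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}\int_{-\frac{\delta_0r}{2q}}^{\frac{\delta_0r}{2q}}|\widehat{\eta_\circ}(-\alpha)|^2\,d\alpha
+ O^*\!\left(\left(\frac12\log r + 0.85\right)\left(2|\eta_\circ|_2\,|\eta-\eta_\circ|_2 + |\eta_\circ-\eta|_2^2\right)\right),
\end{aligned}
\]
where we are using (C.11) and isometry. Also,
\[
\sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{\mu^2(q)}{\phi(q)}\int_{-\frac{\delta_0r}{qx}}^{\frac{\delta_0r}{qx}}|\widehat{\eta}(-\alpha x)|^2\,d\alpha
= \sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}\int_{-\frac{\delta_0r}{2qx}}^{\frac{\delta_0r}{2qx}}|\widehat{\eta}(-\alpha x)|^2\,d\alpha.
\]
By (2.1) and Plancherel,
\[
\begin{aligned}
\int_{-\frac{\delta_0r}{2q}}^{\frac{\delta_0r}{2q}}|\widehat{\eta_\circ}(-\alpha)|^2\,d\alpha
&= \int_{-\infty}^{\infty}|\widehat{\eta_\circ}(-\alpha)|^2\,d\alpha - O^*\!\left(2\int_{\frac{\delta_0r}{2q}}^{\infty}\frac{|\eta_\circ^{(3)}|_1^2}{(2\pi\alpha)^6}\,d\alpha\right)\\
&= |\eta_\circ|_2^2 + O^*\!\left(\frac{|\eta_\circ^{(3)}|_1^2\,q^5}{5\pi^6(\delta_0r)^5}\right).
\end{aligned}
\]
Hence
\[
\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}\int_{-\frac{\delta_0r}{2q}}^{\frac{\delta_0r}{2q}}|\widehat{\eta_\circ}(-\alpha)|^2\,d\alpha
= |\eta_\circ|_2^2\cdot\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}
+ O^*\!\Bigg(\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}\,\frac{|\eta_\circ^{(3)}|_1^2\,q^5}{5\pi^6(\delta_0r)^5}\Bigg).
\]
Using (C.18), we get that
\[
\begin{aligned}
\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)}\,\frac{|\eta_\circ^{(3)}|_1^2\,q^5}{5\pi^6(\delta_0r)^5}
&\le\frac{1}{r}\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)\,q}{\phi(q)}\cdot\frac{|\eta_\circ^{(3)}|_1^2}{5\pi^6\delta_0^5}\\
&\le\frac{|\eta_\circ^{(3)}|_1^2}{5\pi^6\delta_0^5}\cdot\left(0.64787 + \frac{\log r}{4r} + \frac{0.425}{r}\right).
\end{aligned}
\]
Going back to (10.25), we use (C.7) to bound
\[ \sum_q\frac{\mu^2(q)\,x^2}{\phi(q)}\,\frac{\gcd(q,2)\,\delta_0r}{qx}\le 2.59147\cdot\delta_0rx. \]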
φ(q) qx

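We also note that
\[
\begin{aligned}
\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{1}{q} + \sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{2}{q}
&= \sum_{q\le r}\frac{1}{q} - \sum_{q\le\frac{r}{2}}\frac{1}{2q} + \sum_{q\le r}\frac{1}{q}\\
&\le 2\log er - \log\frac{r}{2}\le\log 2e^2r.
\end{aligned}
\]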
We have proven the following result.
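Lemma 10.3.1. Let $\eta : [0,\infty) \to \mathbb{R}$ be in $L_1 \cap L_\infty$. Let $S_\eta(\alpha,x)$ be as in (10.1) and let $\mathfrak{M} = \mathfrak{M}_{\delta_0,r}$ be as in (10.5). Let $\eta_\circ : [0,\infty) \to \mathbb{R}$ be thrice differentiable outside finitely many points. Assume $\eta_\circ^{(3)} \in L_1$.
Assume $r \ge 182$. Then
\[
\begin{aligned}
\int_{\mathfrak{M}}|S_\eta(\alpha,x)|^2\,d\alpha &= L_{r,\delta_0}\,x + O^*\!\left(5.19\,\delta_0xr\left(ET_{\eta,\frac{\delta_0r}{2}}\cdot\left(|\eta|_1 + \frac{ET_{\eta,\delta_0r/2}}{2}\right)\right)\right)\\
&\quad + O^*\!\left(\delta_0r\,(\log 2e^2r)\left(x\cdot E^2_{\eta,r,\delta_0} + K_{r,2}\right)\right),
\end{aligned}
\tag{10.26}
\]
where
\[
\begin{aligned}
E_{\eta,r,\delta_0} &= \max_{\substack{\chi \bmod q\\ q\le r\cdot\gcd(q,2)\\ |\delta|\le\gcd(q,2)\delta_0r/2q}}\sqrt{q^*}\,|\mathrm{err}_{\eta,\chi^*}(\delta,x)|,\qquad ET_{\eta,s} = \max_{|\delta|\le s}|\mathrm{err}_{\eta,\chi_T}(\delta,x)|,\\
K_{r,2} &= (1+\sqrt{2r})(\log x)^2|\eta|_\infty\left(2|S_\eta(0,x)|/x + (1+\sqrt{2r})(\log x)^2|\eta|_\infty/x\right)
\end{aligned}
\tag{10.27}
\]
and $L_{r,\delta_0}$ satisfies both
\[ L_{r,\delta_0}\le 2|\eta|_2^2\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)} \tag{10.28} \]
and
\[
\begin{aligned}
L_{r,\delta_0} &= 2|\eta_\circ|_2^2\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)} + O^*\!\left((\log r + 1.7)\cdot\left(2|\eta_\circ|_2\,|\eta-\eta_\circ|_2 + |\eta_\circ-\eta|_2^2\right)\right)\\
&\quad + O^*\!\left(\frac{2|\eta_\circ^{(3)}|_1^2}{5\pi^6\delta_0^5}\cdot\left(0.64787 + \frac{\log r}{4r} + \frac{0.425}{r}\right)\right).
\end{aligned}
\tag{10.29}
\]
Here, as elsewhere, $\chi^*$ denotes the primitive character inducing $\chi$, whereas $q^*$ denotes the modulus of $\chi^*$.
The error term $xr\,ET_{\eta,\delta_0r}$ will be very small, since it will be estimated using the Riemann zeta function; the error term involving $K_{r,2}$ will be completely negligible. As for the term involving $xr(r+1)E^2_{\eta,r,\delta_0}$, we see that it constrains us to have $|\mathrm{err}_{\eta,\chi}(x,N)|$ less than a constant times $1/r$ if we do not want the main term in the bound (10.26) to be overwhelmed.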

10.4 The integral over the major arcs: conclusion

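There are at least two ways we can evaluate (10.4). One is to substitute (10.10) into (10.4). The disadvantages here are that (a) this can give rise to pages-long formulae, (b) this gives error terms proportional to $xr|\mathrm{err}_{\eta,\chi}(x,N)|$, meaning that, to win, we would have to show that $|\mathrm{err}_{\eta,\chi}(x,N)|$ is much smaller than $1/r$. What we will do instead is to use our $\ell_2$ estimate (10.26) in order to bound the contribution of non-principal terms. This will give us a gain of almost $\sqrt{r}$ on the error terms; in other words, to win, it will be enough to show later that $|\mathrm{err}_{\eta,\chi}(x,N)|$ is much smaller than $1/\sqrt{r}$.
The contribution of the error terms in $S_{\eta_3}(\alpha,x)$ (that is, all terms involving the quantities $\mathrm{err}_{\eta,\chi}$ in expressions (10.12) and (10.13)) to (10.4) is
\[
\begin{aligned}
&\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{1}{\phi(q)}\sum_{\chi_3 \bmod q}\tau(\overline{\chi_3})\sum_{\substack{a \bmod q\\ (a,q)=1}}\chi_3(a)\,e(-Na/q)\int_{-\frac{\delta_0r}{2qx}}^{\frac{\delta_0r}{2qx}} S_{\eta_+}(\alpha+a/q,x)^2\,\mathrm{err}_{\eta_*,\chi_3^*}(\alpha x,x)\,e(-N\alpha)\,d\alpha\\
&+\sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{1}{\phi(q)}\sum_{\chi_3 \bmod q}\tau(\overline{\chi_3})\sum_{\substack{a \bmod q\\ (a,q)=1}}\chi_3(a)\,e(-Na/q)\int_{-\frac{\delta_0r}{qx}}^{\frac{\delta_0r}{qx}} S_{\eta_+}(\alpha+a/q,x)^2\,\mathrm{err}_{\eta_*,\chi_3^*}(\alpha x,x)\,e(-N\alpha)\,d\alpha.
\end{aligned}
\tag{10.30}
\]
We should also remember the terms in (10.11); we can integrate them over all of $\mathbb{R}/\mathbb{Z}$, and obtain that they contribute at most
\[
\begin{aligned}
&\int_{\mathbb{R}/\mathbb{Z}} 2\sum_{j=1}^{3}\ \prod_{j'\ne j}|S_{\eta_{j'}}(\alpha,x)|\cdot\max_{q\le r}\sum_{p|q}\log p\sum_{\alpha\ge 1}\eta_j\!\left(\frac{p^\alpha}{x}\right)d\alpha\\
&\quad\le 2\sum_{j=1}^{3}\ \prod_{j'\ne j}|S_{\eta_{j'}}(\alpha,x)|_2\cdot\max_{q\le r}\sum_{p|q}\log p\sum_{\alpha\ge 1}\eta_j\!\left(\frac{p^\alpha}{x}\right)\\
&\quad= 2\sum_n\Lambda^2(n)\,\eta_+^2(n/x)\cdot\log r\cdot\max_{p\le r}\sum_{\alpha\ge 1}\eta_*\!\left(\frac{p^\alpha}{x}\right)\\
&\qquad + 4\sqrt{\sum_n\Lambda^2(n)\,\eta_+^2(n/x)\cdot\sum_n\Lambda^2(n)\,\eta_*^2(n/x)}\cdot\log r\cdot\max_{p\le r}\sum_{\alpha\ge 1}\eta_+\!\left(\frac{p^\alpha}{x}\right)
\end{aligned}
\]
by Cauchy-Schwarz and Plancherel.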



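The absolute value of (10.30) is at most
\[
\begin{aligned}
&\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\ \sum_{\substack{a \bmod q\\ (a,q)=1}}\int_{-\frac{\delta_0r}{2qx}}^{\frac{\delta_0r}{2qx}}\left|S_{\eta_+}(\alpha+a/q,x)\right|^2d\alpha\cdot\max_{\substack{\chi \bmod q\\ |\delta|\le\delta_0r/2q}}\sqrt{q^*}\,|\mathrm{err}_{\eta_*,\chi^*}(\delta,x)|\\
&+\sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\ \sum_{\substack{a \bmod q\\ (a,q)=1}}\int_{-\frac{\delta_0r}{qx}}^{\frac{\delta_0r}{qx}}\left|S_{\eta_+}(\alpha+a/q,x)\right|^2d\alpha\cdot\max_{\substack{\chi \bmod q\\ |\delta|\le\delta_0r/q}}\sqrt{q^*}\,|\mathrm{err}_{\eta_*,\chi^*}(\delta,x)|\\
&\quad\le\int_{\mathfrak{M}_{\delta_0,r}}\left|S_{\eta_+}(\alpha)\right|^2d\alpha\cdot\max_{\substack{\chi \bmod q\\ q\le r\cdot\gcd(q,2)\\ |\delta|\le\gcd(q,2)\delta_0r/q}}\sqrt{q^*}\,|\mathrm{err}_{\eta_*,\chi^*}(\delta,x)|.
\end{aligned}
\tag{10.31}
\]
We can bound the integral of $|S_{\eta_+}(\alpha)|^2$ by (10.26).
What about the contribution of the error part of $S_{\eta_2}(\alpha,x)$? We can obviously proceed in the same way, except that, to avoid double-counting, $S_{\eta_3}(\alpha,x)$ needs to be replaced by
\[ \frac{1}{\phi(q)}\,\tau(\overline{\chi_0})\,\widehat{\eta_3}(-\delta)\cdot x = \frac{\mu(q)}{\phi(q)}\,\widehat{\eta_3}(-\delta)\cdot x, \tag{10.32} \]
which is its main term (coming from (10.12)). Instead of having an $\ell_2$ norm as in (10.31), we have the square-root of a product of two squares of $\ell_2$ norms (by Cauchy-Schwarz), namely, $\int_{\mathfrak{M}}|S_{\eta_+}(\alpha)|^2\,d\alpha$ and
\[
\begin{aligned}
&\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)^2}\int_{-\frac{\delta_0r}{2qx}}^{\frac{\delta_0r}{2qx}}|\widehat{\eta_*}(-\alpha x)\,x|^2\,d\alpha
+ \sum_{\substack{q\le 2r\\ q\ \mathrm{even}}}\frac{\mu^2(q)}{\phi(q)^2}\int_{-\frac{\delta_0r}{qx}}^{\frac{\delta_0r}{qx}}|\widehat{\eta_*}(-\alpha x)\,x|^2\,d\alpha\\
&\quad\le x\,|\widehat{\eta_*}|_2^2\cdot\sum_q\frac{\mu^2(q)}{\phi(q)^2}.
\end{aligned}
\tag{10.33}
\]
By (C.9), the sum over $q$ is at most 2.82643.

As for the contribution of the error part of $S_{\eta_1}(\alpha,x)$, we bound it in the same way, using solely the $\ell_2$ norm in (10.33) (and replacing both $S_{\eta_2}(\alpha,x)$ and $S_{\eta_3}(\alpha,x)$ by expressions as in (10.32)).
The total of the error terms is thus
\[
\begin{aligned}
&x\cdot\max_{\substack{\chi \bmod q\\ q\le r\cdot\gcd(q,2)\\ |\delta|\le\gcd(q,2)\delta_0r/q}}\sqrt{q^*}\cdot|\mathrm{err}_{\eta_*,\chi^*}(\delta,x)|\cdot A\\
&+x\cdot\max_{\substack{\chi \bmod q\\ q\le r\cdot\gcd(q,2)\\ |\delta|\le\gcd(q,2)\delta_0r/q}}\sqrt{q^*}\cdot|\mathrm{err}_{\eta_+,\chi^*}(\delta,x)|\left(\sqrt{A}+\sqrt{B_+}\right)\sqrt{B_*},
\end{aligned}
\tag{10.34}
\]
where $A = (1/x)\int_{\mathfrak{M}}|S_{\eta_+}(\alpha,x)|^2\,d\alpha$ (bounded as in (10.26)) and
\[ B_* = 2.82643\,|\eta_*|_2^2,\qquad B_+ = 2.82643\,|\eta_+|_2^2. \tag{10.35} \]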



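In conclusion, we have proven

Proposition 10.4.1. Let $x \ge 1$. Let $\eta_+, \eta_* : [0,\infty) \to \mathbb{R}$. Assume $\eta_+ \in C^2$, $\eta_+'' \in L_2$ and $\eta_+, \eta_* \in L^1 \cap L^2$. Let $\eta_\circ : [0,\infty) \to \mathbb{R}$ be thrice differentiable outside finitely many points. Assume $\eta_\circ^{(3)} \in L_1$ and $|\eta_+ - \eta_\circ|_2 \le \epsilon_0|\eta_\circ|_2$, where $\epsilon_0 \ge 0$.
Let $S_\eta(\alpha,x) = \sum_n \Lambda(n)e(\alpha n)\eta(n/x)$. Let $\mathrm{err}_{\eta,\chi}$, $\chi$ primitive, be given as in (10.12) and (10.13). Let $\delta_0 > 0$, $r \ge 1$. Let $\mathfrak{M} = \mathfrak{M}_{\delta_0,r}$ be as in (10.5).
Then, for any $N \ge 0$,
\[ \int_{\mathfrak{M}} S_{\eta_+}(\alpha,x)^2\,S_{\eta_*}(\alpha,x)\,e(-N\alpha)\,d\alpha \]
equals
\[
\begin{aligned}
&C_0C_{\eta_\circ,\eta_*}x^2 + \left(2.82643\,|\eta_\circ|_2^2\,(2+\epsilon_0)\cdot\epsilon_0 + \frac{4.31004\,|\eta_\circ|_2^2 + 0.0012\,\frac{|\eta_\circ^{(3)}|_1^2}{\delta_0^5}}{r}\right)|\eta_*|_1\,x^2\\
&+ O^*\!\left(E_{\eta_*,r,\delta_0}\,A_{\eta_+} + E_{\eta_+,r,\delta_0}\cdot 1.6812\left(\sqrt{A_{\eta_+}} + 1.6812\,|\eta_+|_2\right)|\eta_*|_2\right)\cdot x^2\\
&+ O^*\!\left(2Z_{\eta_+^2,2}(x)\,LS_{\eta_*}(x,r)\cdot x + 4\sqrt{Z_{\eta_+^2,2}(x)\,Z_{\eta_*^2,2}(x)}\;LS_{\eta_+}(x,r)\cdot x\right),
\end{aligned}
\tag{10.36}
\]
where
\[
\begin{aligned}
C_0 &= \prod_{p|N}\left(1-\frac{1}{(p-1)^2}\right)\cdot\prod_{p\nmid N}\left(1+\frac{1}{(p-1)^3}\right),\\
C_{\eta_\circ,\eta_*} &= \int_0^\infty\!\!\int_0^\infty\eta_\circ(t_1)\,\eta_\circ(t_2)\,\eta_*\!\left(\frac{N}{x}-(t_1+t_2)\right)dt_1\,dt_2,
\end{aligned}
\tag{10.37}
\]
\[
\begin{aligned}
E_{\eta,r,\delta_0} &= \max_{\substack{\chi \bmod q\\ q\le\gcd(q,2)\cdot r\\ |\delta|\le\gcd(q,2)\delta_0r/2q}}\sqrt{q^*}\cdot|\mathrm{err}_{\eta,\chi^*}(\delta,x)|,\qquad ET_{\eta,s} = \max_{|\delta|\le s/q}|\mathrm{err}_{\eta,\chi_T}(\delta,x)|,\\
A_\eta &= \frac{1}{x}\int_{\mathfrak{M}}\left|S_{\eta_+}(\alpha,x)\right|^2d\alpha,\qquad L_{\eta,r,\delta_0}\le 2|\eta|_2^2\sum_{\substack{q\le r\\ q\ \mathrm{odd}}}\frac{\mu^2(q)}{\phi(q)},\\
K_{r,2} &= (1+\sqrt{2r})(\log x)^2|\eta|_\infty\left(2Z_{\eta,1}(x)/x + (1+\sqrt{2r})(\log x)^2|\eta|_\infty/x\right),\\
Z_{\eta,k}(x) &= \frac{1}{x}\sum_n\Lambda^k(n)\,\eta(n/x),\qquad LS_\eta(x,r) = \log r\cdot\max_{p\le r}\sum_{\alpha\ge 1}\eta\!\left(\frac{p^\alpha}{x}\right),
\end{aligned}
\tag{10.38}
\]
and $\mathrm{err}_{\eta,\chi}$ is as in (10.12) and (10.13).
Here is how to read these expressions. The error term in the first line of (10.36) will be small provided that $\epsilon_0$ is small and $r$ is large. The third line of (10.36) will be negligible, as will be the term $2\delta_0r(\log er)K_{r,2}$ in the definition of $A_\eta$. (Clearly, $Z_{\eta,k}(x) \ll_\eta (\log x)^{k-1}$ and $LS_\eta(x,q) \ll_\eta \tau(q)\log x$ for any $\eta$ of rapid decay.)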

It remains to estimate the second line of (10.36). This includes estimating $A_\eta$ – a task that was already accomplished in Lemma 10.3.1. We see that we will have to give very good bounds for $E_{\eta,r,\delta_0}$ when $\eta = \eta_+$ or $\eta = \eta_*$. We also see that we want to make $C_0C_{\eta_+,\eta_*}x^2$ as large as possible; it will be competing not just with the error terms here, but, more importantly, with the bounds from the minor arcs, which will be proportional to $|\eta_+|_2^2\,|\eta_*|_1$.
Chapter 11

Optimizing and adapting smoothing functions

One of our goals is to maximize the quantity $C_{\eta_\circ,\eta_*}$ in (10.37) relative to $|\eta_\circ|_2^2\,|\eta_*|_1$. One way to do this is to ensure that (a) $\eta_*$ is concentrated on a very short¹ interval $[0,\epsilon)$, (b) $\eta_\circ$ is supported on the interval $[0,2]$, and is symmetric around $t = 1$, meaning that $\eta_\circ(t) \sim \eta_\circ(2-t)$. Then, for $x \sim N/2$, the integral
\[ \int_0^\infty\!\!\int_0^\infty\eta_\circ(t_1)\,\eta_\circ(t_2)\,\eta_*\!\left(\frac{N}{x}-(t_1+t_2)\right)dt_1\,dt_2 \]
in (10.37) should be approximately equal to
\[ |\eta_*|_1\cdot\int_0^\infty\eta_\circ(t)\,\eta_\circ\!\left(\frac{N}{x}-t\right)dt = |\eta_*|_1\cdot\int_0^\infty\eta_\circ(t)^2\,dt = |\eta_*|_1\cdot|\eta_\circ|_2^2, \tag{11.1} \]
provided that $\eta_\circ(t) \ge 0$ for all $t$. It is easy to check (using Cauchy-Schwarz in the second step) that this is essentially optimal. (We will redo this rigorously in a little while.)
At the same time, the fact is that major-arc estimates are best for smoothing functions $\eta$ of a particular form, and we have minor-arc estimates from Part I for a different specific smoothing $\eta_2$. The issue, then, is how do we choose $\eta_\circ$ and $\eta_*$ as above so that

• $\eta_*$ is concentrated on $[0,\epsilon)$,

• $\eta_\circ$ is supported on $[0,2]$ and symmetric around $t = 1$,

• we can give minor-arc and major-arc estimates for $\eta_*$,

• we can give major-arc estimates for a function $\eta_+$ close to $\eta_\circ$ in $\ell_2$ norm?


1 This is an idea appearing in work by Bourgain in a related context [Bou99].


11.1 The symmetric smoothing function η◦


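We will later work with a smoothing function $\eta_\heartsuit$ whose Mellin transform decreases very rapidly. Because of this rapid decay, we will be able to give strong results based on an explicit formula for $\eta_\heartsuit$. The issue is how to define $\eta_\circ$, given $\eta_\heartsuit$, so that $\eta_\circ$ is symmetric around $t = 1$ (i.e., $\eta_\circ(2 - x) \sim \eta_\circ(x)$) and is very small for $x > 2$.
We will later set $\eta_\heartsuit(t) = e^{-t^2/2}$. Let
\[
h : t \mapsto \begin{cases} t^3 (2 - t)^3 e^{t - 1/2} & \text{if } t \in [0, 2], \\ 0 & \text{otherwise.} \end{cases} \tag{11.2}
\]
We define $\eta_\circ : \mathbb{R} \to \mathbb{R}$ by
\[
\eta_\circ(t) = h(t)\eta_\heartsuit(t) = \begin{cases} t^3 (2 - t)^3 e^{-(t-1)^2/2} & \text{if } t \in [0, 2], \\ 0 & \text{otherwise.} \end{cases} \tag{11.3}
\]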

It is clear that η◦ is symmetric around t = 1 for t ∈ [0, 2].
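As a quick sanity check of (11.2) and (11.3), the following Python sketch compares the product h(t)η♥(t) with the closed form on a grid and measures how far η◦ is from being symmetric around t = 1. The function names and the grid are illustrative choices only; nothing here is part of the argument.

```python
import math

def eta_heart(t):
    # eta_heart(t) = exp(-t^2/2), as set in the text
    return math.exp(-t * t / 2)

def h(t):
    # h(t) = t^3 (2-t)^3 e^{t-1/2} on [0, 2] and 0 elsewhere, as in (11.2)
    if 0 <= t <= 2:
        return t**3 * (2 - t)**3 * math.exp(t - 0.5)
    return 0.0

def eta_circ(t):
    # closed form on the right of (11.3)
    if 0 <= t <= 2:
        return t**3 * (2 - t)**3 * math.exp(-(t - 1)**2 / 2)
    return 0.0

# illustrative grid on [0, 2]
grid = [i / 1000 for i in range(0, 2001)]
max_product_gap = max(abs(h(t) * eta_heart(t) - eta_circ(t)) for t in grid)
max_symmetry_gap = max(abs(eta_circ(t) - eta_circ(2 - t)) for t in grid)
print(max_product_gap, max_symmetry_gap)
```

Both gaps should be at the level of floating-point rounding, since $t - 1/2 - t^2/2 = -(t-1)^2/2$ exactly.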

11.1.1 The product η◦ (t)η◦ (ρ − t).


We now should go back and redo rigorously what we discussed informally around (11.1). More precisely, we wish to estimate
\[
(\eta_\circ * \eta_\circ)(\rho) = \int_{-\infty}^{\infty} \eta_\circ(t)\eta_\circ(\rho - t)\, dt = \int_{-\infty}^{\infty} \eta_\circ(t)\eta_\circ(2 - \rho + t)\, dt \tag{11.4}
\]
for $\rho \leq 2$ close to 2. In this, it will be useful that the Cauchy-Schwarz inequality degrades slowly, in the following sense.
Lemma 11.1.1. Let $V$ be a real vector space with an inner product $\langle \cdot, \cdot \rangle$. Then, for any $v, w \in V$ with $|w - v|_2 \leq |v|_2/2$,
\[
\langle v, w \rangle = |v|_2 |w|_2 + O^*(2.71 |v - w|_2^2).
\]

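Proof. By a truncated Taylor expansion,
\[
\sqrt{1 + x} = 1 + \frac{x}{2} + \frac{x^2}{2} \max_{0 \leq t \leq 1} \frac{1}{4(1 - (tx)^2)^{3/2}} = 1 + \frac{x}{2} + O^*\left(\frac{x^2}{2^{3/2}}\right)
\]
for $|x| \leq 1/2$. Hence, for $\delta = |w - v|_2/|v|_2$,
\[
\begin{aligned}
\frac{|w|_2}{|v|_2} &= \sqrt{1 + \frac{2\langle w - v, v\rangle + |w - v|_2^2}{|v|_2^2}} = 1 + \frac{\frac{2\langle w - v, v\rangle}{|v|_2^2} + \delta^2}{2} + O^*\left(\frac{(2\delta + \delta^2)^2}{2^{3/2}}\right) \\
&= 1 + \frac{\langle w - v, v\rangle}{|v|_2^2} + O^*\left(\left(\frac{1}{2} + \frac{(5/2)^2}{2^{3/2}}\right)\delta^2\right) = 1 + \frac{\langle w - v, v\rangle}{|v|_2^2} + O^*\left(2.71 \frac{|w - v|_2^2}{|v|_2^2}\right).
\end{aligned}
\]
Multiplying by $|v|_2^2$, we obtain that
\[
|v|_2 |w|_2 = |v|_2^2 + \langle w - v, v\rangle + O^*\left(2.71 |w - v|_2^2\right) = \langle v, w\rangle + O^*\left(2.71 |w - v|_2^2\right).
\]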


 

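Applying Lemma 11.1.1 to (11.4), we obtain that
\[
\begin{aligned}
(\eta_\circ * \eta_\circ)(\rho) &= \int_{-\infty}^{\infty} \eta_\circ(t)\eta_\circ((2 - \rho) + t)\, dt \\
&= \sqrt{\int_{-\infty}^{\infty} |\eta_\circ(t)|^2 dt} \sqrt{\int_{-\infty}^{\infty} |\eta_\circ((2 - \rho) + t)|^2 dt} + O^*\left(2.71 \int_{-\infty}^{\infty} |\eta_\circ(t) - \eta_\circ((2 - \rho) + t)|^2 dt\right) \\
&= |\eta_\circ|_2^2 + O^*\left(2.71 \int_{-\infty}^{\infty} \left(\int_0^{2-\rho} |\eta_\circ'(r + t)|\, dr\right)^2 dt\right) \\
&= |\eta_\circ|_2^2 + O^*\left(2.71 (2 - \rho) \int_0^{2-\rho} \int_{-\infty}^{\infty} |\eta_\circ'(r + t)|^2\, dt\, dr\right) \\
&= |\eta_\circ|_2^2 + O^*\left(2.71 (2 - \rho)^2 |\eta_\circ'|_2^2\right).
\end{aligned} \tag{11.5}
\]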
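The estimate (11.5) can be checked numerically for a few values of ρ close to 2. The sketch below is only an illustration: it uses a composite Simpson rule (an arbitrary quadrature choice, not anything from the text), a hand-computed derivative of η◦, and hypothetical helper names.

```python
import math

def eta_circ(t):
    # eta_circ from (11.3)
    if 0 <= t <= 2:
        return t**3 * (2 - t)**3 * math.exp(-(t - 1)**2 / 2)
    return 0.0

def eta_circ_prime(t):
    # derivative of t^3 (2-t)^3 e^{-(t-1)^2/2} on [0, 2]
    if 0 <= t <= 2:
        poly = t**3 * (2 - t)**3
        dpoly = 3 * t**2 * (2 - t)**3 - 3 * t**3 * (2 - t)**2
        return (dpoly - (t - 1) * poly) * math.exp(-(t - 1)**2 / 2)
    return 0.0

def integrate(f, a, b, n=4000):
    # composite Simpson rule with n (even) subintervals
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3

norm_sq = integrate(lambda t: eta_circ(t)**2, 0, 2)              # |eta_circ|_2^2
deriv_norm_sq = integrate(lambda t: eta_circ_prime(t)**2, 0, 2)  # |eta_circ'|_2^2

def self_convolution(rho):
    # (eta_circ * eta_circ)(rho); the integrand vanishes outside [0, 2]
    return integrate(lambda t: eta_circ(t) * eta_circ(rho - t), 0, 2)

for rho in (2.0, 1.99, 1.95, 1.9):
    gap = abs(self_convolution(rho) - norm_sq)
    bound = 2.71 * (2 - rho)**2 * deriv_norm_sq
    print(rho, gap, bound)
```

For each ρ the printed gap should stay below the printed bound; the constant 2.71 comes from Lemma 11.1.1 and is not claimed to be sharp.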

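We will be working with $\eta_*$ supported on the non-negative reals; we recall that $\eta_\circ$ is supported on $[0, 2]$. Hence
\[
\begin{aligned}
\int_0^\infty \int_0^\infty &\eta_\circ(t_1)\eta_\circ(t_2)\eta_*\left(\frac{N}{x} - (t_1 + t_2)\right) dt_1\, dt_2 \\
&= \int_0^{N/x} (\eta_\circ * \eta_\circ)(\rho)\, \eta_*\left(\frac{N}{x} - \rho\right) d\rho \\
&= \int_0^{N/x} \left(|\eta_\circ|_2^2 + O^*\left(2.71 (2 - \rho)^2 |\eta_\circ'|_2^2\right)\right) \cdot \eta_*\left(\frac{N}{x} - \rho\right) d\rho \\
&= |\eta_\circ|_2^2 \int_0^{N/x} \eta_*(\rho)\, d\rho + 2.71 |\eta_\circ'|_2^2 \cdot O^*\left(\int_0^{N/x} ((2 - N/x) + \rho)^2 \eta_*(\rho)\, d\rho\right),
\end{aligned} \tag{11.6}
\]
provided that $N/x \geq 2$. We see that it will be wise to set $N/x$ very slightly larger than 2. As we said before, $\eta_*$ will be scaled so that it is concentrated on a small interval $[0, \epsilon)$.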

11.2 The smoothing function η∗ : adapting minor-arc bounds
Here the challenge is to define a smoothing function $\eta_*$ that is good both for minor-arc estimates and for major-arc estimates. The two regimes tend to favor different kinds of smoothing function. For minor-arc estimates, we use, as [Tao14] did,
\[
\eta_2(t) = 4 \max(\log 2 - |\log 2t|, 0) = ((2 I_{[1/2,1]}) *_M (2 I_{[1/2,1]}))(t), \tag{11.7}
\]
where $I_{[1/2,1]}(t)$ is 1 if $t \in [1/2, 1]$ and 0 otherwise. For major-arc estimates, we will use a function based on
\[
\eta_\heartsuit = e^{-t^2/2}.
\]
(We will actually use here the function $t^2 e^{-t^2/2}$, whose Mellin transform is $M\eta_\heartsuit(s + 2)$ (by, e.g., [BBO10, Table 11.1]).)
We will follow the simple expedient of convolving the two smoothing functions, one good for minor arcs, the other one for major arcs. In general, let $\varphi_1, \varphi_2 : [0, \infty) \to \mathbb{C}$. It is easy to use bounds on sums of the form
\[
S_{f,\varphi_1}(x) = \sum_n f(n)\varphi_1(n/x) \tag{11.8}
\]
to bound sums of the form $S_{f,\varphi_1 *_M \varphi_2}$:
\[
\begin{aligned}
S_{f,\varphi_1 *_M \varphi_2} &= \sum_n f(n)(\varphi_1 *_M \varphi_2)\left(\frac{n}{x}\right) \\
&= \int_0^\infty \sum_n f(n)\varphi_1\left(\frac{n}{wx}\right)\varphi_2(w) \frac{dw}{w} = \int_0^\infty S_{f,\varphi_1}(wx)\varphi_2(w) \frac{dw}{w}.
\end{aligned} \tag{11.9}
\]
The same holds, of course, if $\varphi_1$ and $\varphi_2$ are switched, since $\varphi_1 *_M \varphi_2 = \varphi_2 *_M \varphi_1$. The only objection is that the bounds on (11.8) that we input might not be valid, or non-trivial, when the argument $wx$ of $S_{f,\varphi_1}(wx)$ is very small. Because of this, it is important that the functions $\varphi_1$, $\varphi_2$ vanish at 0, and desirable that their first derivatives do so as well.
Let us see how this works out in practice for ϕ1 = η2 . Here η2 : [0, ∞) → R is
given by
η2 = η1 ∗M η1 = 4 max(log 2 − | log 2t|, 0), (11.10)

where η1 = 2 · I[1/2,1] .
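The identity η2 = η1 ∗M η1 in (11.10) can be checked numerically. The sketch below assumes the usual multiplicative convolution $(f *_M g)(t) = \int_0^\infty f(w) g(t/w)\, dw/w$ and evaluates it by a midpoint rule in the variable u = log w; the helper names and the sample points are illustrative only.

```python
import math

def eta1(t):
    # eta_1 = 2 * indicator function of [1/2, 1]
    return 2.0 if 0.5 <= t <= 1.0 else 0.0

def eta2_closed(t):
    # closed form from (11.10)
    if t <= 0:
        return 0.0
    return 4 * max(math.log(2) - abs(math.log(2 * t)), 0.0)

def eta1_mellin_convolved_with(g, t, n=20000):
    # (eta_1 *_M g)(t) = integral of eta_1(w) g(t/w) dw/w.
    # eta_1 vanishes outside [1/2, 1], so u = log w runs over [-log 2, 0].
    lo, hi = -math.log(2), 0.0
    step = (hi - lo) / n
    total = 0.0
    for i in range(n):
        w = math.exp(lo + (i + 0.5) * step)
        total += eta1(w) * g(t / w)
    return total * step

for t in (0.2, 0.25, 0.3, 0.4, 0.5, 0.7, 0.9, 1.0, 1.2):
    print(t, eta1_mellin_convolved_with(eta1, t), eta2_closed(t))
```

The two columns should agree up to the quadrature error, which is of the order of the step size because η1 is discontinuous.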
Let us restate the bounds from Theorem 3.1.1, the main result of Part I. We will use Lemma C.2.2 to bound terms of the form $q/\phi(q)$.
Let $x \geq x_0$, $x_0 = 2.16 \cdot 10^{20}$. Let $2\alpha = a/q + \delta/x$, $q \leq Q$, $\gcd(a, q) = 1$, $|\delta/x| \leq 1/qQ$, where $Q = (3/4)x^{2/3}$. Then, if $3 \leq q \leq x^{1/3}/6$, Theorem 3.1.1 gives us that
\[
|S_{\eta_2}(\alpha, x)| \leq g_x\left(\max\left(1, \frac{|\delta|}{8}\right) \cdot q\right) x, \tag{11.11}
\]
where
\[
g_x(r) = \frac{(R_{x,2r} \log 2r + 0.5)\sqrt{z(r)} + 2.5}{\sqrt{2r}} + \frac{L_{2r}}{r} + 3.36 x^{-1/6}, \tag{11.12}
\]
with
\[
\begin{aligned}
R_{x,t} &= 0.27125 \log\left(1 + \frac{\log 4t}{2 \log \frac{9 x^{1/3}}{2.004 t}}\right) + 0.41415, \\
L_t &= z(t/2)\left(\frac{13}{4} \log t + 7.82\right) + 13.66 \log t + 37.55.
\end{aligned} \tag{11.13}
\]
If $q > x^{1/3}/6$, then, again by Theorem 3.1.1,
\[
|S_{\eta_2}(\alpha, x)| \leq h(x) x, \tag{11.14}
\]
where
\[
h(x) = 0.276 x^{-1/6} (\log x)^{3/2} + 1234 x^{-1/3} \log x. \tag{11.15}
\]
We will work with x varying within a range, and so we must pay some attention
to the dependence of (11.11) and (11.14) on x. Let us prove two auxiliary lemmas on
this.
Lemma 11.2.1. Let $g_x(r)$ be as in (11.12) and $h(x)$ as in (11.15). Then
\[
x \mapsto \begin{cases} h(x) & \text{if } x < (6r)^3, \\ g_x(r) & \text{if } x \geq (6r)^3 \end{cases}
\]
is a decreasing function of $x$ for $r \geq 11$ fixed and $x \geq 21$.


Proof. It is clear from the definitions that $x \mapsto h(x)$ (for $x \geq 21$) and $x \mapsto g_x(r)$ are both decreasing. Thus, we simply have to show that $h(x_r) \geq g_{x_r}(r)$ for $x_r = (6r)^3$. Since $x_r \geq (6 \cdot 11)^3 > e^{12.5}$,
\[
\begin{aligned}
R_{x_r,2r} &\leq 0.27125 \log(0.065 \log x_r + 1.056) + 0.41415 \\
&\leq 0.27125 \log((0.065 + 0.0845) \log x_r) + 0.41415 \leq 0.27215 \log\log x_r.
\end{aligned}
\]
Hence
\[
\begin{aligned}
R_{x_r,2r} \log 2r + 0.5 &\leq 0.27215 \log\log x_r \log x_r^{1/3} - 0.27215 \log 12.5 \log 3 + 0.5 \\
&\leq 0.09072 \log\log x_r \log x_r - 0.255.
\end{aligned}
\]
At the same time,
\[
\begin{aligned}
z(r) &= e^\gamma \log\log \frac{x_r^{1/3}}{6} + \frac{2.50637}{\log\log r} \leq e^\gamma \log\log x_r - e^\gamma \log 3 + 1.9521 \\
&\leq e^\gamma \log\log x_r
\end{aligned} \tag{11.16}
\]
for $r \geq 37$, and we also get $z(r) \leq e^\gamma \log\log x_r$ for $r \in [11, 37]$ by the bisection method with 10 iterations. Hence
\[
\begin{aligned}
(R_{x_r,2r} \log 2r + 0.5)\sqrt{z(r)} + 2.5 &\leq (0.09072 \log\log x_r \log x_r - 0.255)\sqrt{e^\gamma \log\log x_r} + 2.5 \\
&\leq 0.1211 \log x_r (\log\log x_r)^{3/2} + 2,
\end{aligned}
\]
and so
\[
\frac{(R_{x_r,2r} \log 2r + 0.5)\sqrt{z(r)} + 2.5}{\sqrt{2r}} \leq \left(0.21 \log x_r (\log\log x_r)^{3/2} + 3.47\right) x_r^{-1/6}.
\]
Now, by (11.16),
\[
\begin{aligned}
L_{2r} &\leq e^\gamma \log\log x_r \cdot \left(\frac{13}{4} \log(x_r^{1/3}/3) + 7.82\right) + 13.66 \log(x_r^{1/3}/3) + 37.55 \\
&\leq e^\gamma \log\log x_r \cdot \left(\frac{13}{12} \log x_r + 4.25\right) + 4.56 \log x_r + 22.55.
\end{aligned}
\]
It is clear that
\[
\frac{4.25 e^\gamma \log\log x_r + 4.56 \log x_r + 22.55}{x_r^{1/3}/6} < 1234 x_r^{-1/3} \log x_r
\]
for $x_r \geq e$: we make the comparison for $x_r = e$ and take the derivative of the ratio of the left side by the right side.
It remains to show that
\[
0.21 \log x_r (\log\log x_r)^{3/2} + 3.47 + 3.36 + \frac{13}{2} e^\gamma x_r^{-1/3} \log x_r \log\log x_r \tag{11.17}
\]
is less than $0.276 (\log x_r)^{3/2}$ for $x_r$ large enough. Since $t \mapsto (\log t)^{3/2}/t^{1/2}$ is decreasing for $t > e^3$, we see that
\[
\frac{0.21 \log x_r (\log\log x_r)^{3/2} + 6.83 + \frac{13}{2} e^\gamma x_r^{-1/3} \log x_r \log\log x_r}{0.276 (\log x_r)^{3/2}} < 1
\]
for all $x_r \geq e^{33}$, simply because it is true for $x = e^{33}$, which is greater than $e^{e^3}$.
We conclude that $h(x_r) \geq g_{x_r}(r) = g_{x_r}(x_r^{1/3}/6)$ for $x_r \geq e^{33}$. We check that $h(x_r) \geq g_{x_r}(x_r^{1/3}/6)$ for $\log x_r \in [\log 66^3, 33]$ as well by the bisection method (applied with 30 iterations, with $\log x_r$ as the variable, on the intervals $[\log 66^3, 20]$, $[20, 25]$, $[25, 30]$ and $[30, 33]$). Since $r \geq 11$ implies $x_r \geq 66^3$, we are done.
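The comparison at the crossover point x_r = (6r)^3 can be spot-checked in floating point. The following Python sketch transcribes (11.12), (11.13) and (11.15), together with the expression for z(r) recalled in (11.16); it is an informal check at a few sample values of r, not a substitute for the bisection argument, and the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329

def z(r):
    # z(r) = e^gamma log log r + 2.50637 / log log r, as recalled in (11.16)
    ll = math.log(math.log(r))
    return math.exp(EULER_GAMMA) * ll + 2.50637 / ll

def R(x, t):
    # R_{x,t} from (11.13)
    ratio = math.log(4 * t) / (2 * math.log(9 * x ** (1 / 3) / (2.004 * t)))
    return 0.27125 * math.log(1 + ratio) + 0.41415

def L(t):
    # L_t from (11.13)
    return z(t / 2) * (13 / 4 * math.log(t) + 7.82) + 13.66 * math.log(t) + 37.55

def g(x, r):
    # g_x(r) from (11.12)
    main = ((R(x, 2 * r) * math.log(2 * r) + 0.5) * math.sqrt(z(r)) + 2.5) / math.sqrt(2 * r)
    return main + L(2 * r) / r + 3.36 * x ** (-1 / 6)

def h(x):
    # h(x) from (11.15)
    return 0.276 * x ** (-1 / 6) * math.log(x) ** 1.5 + 1234 * x ** (-1 / 3) * math.log(x)

# sample values of r (arbitrary choices with r >= 11)
for r in (11, 100, 1000, 10**5):
    x_r = (6 * r) ** 3
    print(r, h(x_r), g(x_r, r))
```

At each sampled r the value of h(x_r) should come out larger than g_{x_r}(r), in line with the lemma.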
Lemma 11.2.2. Let $R_{x,r}$ be as in (11.12). Then $t \to R_{e^t,r}(r)$ is convex-up for $t \geq 3 \log 6r$.
Proof. Since $t \to e^{-t/6}$ and $t \to t$ are clearly convex-up, all we have to do is to show that $t \to R_{e^t,r}$ is convex-up. In general, since
\[
(\log f)'' = \left(\frac{f'}{f}\right)' = \frac{f'' f - (f')^2}{f^2},
\]
a function of the form $(\log f)$ is convex-up exactly when $f'' f - (f')^2 \geq 0$. If $f(t) = 1 + a/(t - b)$, we have $f'' f - (f')^2 \geq 0$ whenever
\[
(t + a - b) \cdot (2a) \geq a^2,
\]
i.e., $a^2 + 2at \geq 2ab$, and that certainly happens when $t \geq b$. In our case, $b = 3 \log(2.004r/9)$, and so $t \geq 3 \log 6r$ implies $t \geq b$.

Now we come to the point where we prove bounds on exponential sums of the form
Sη∗ (α, x) (that is, sums based on the smoothing η∗ ) based on our bounds (11.11) and
(11.14) on the exponential sums Sη2 (α, x). This is straightforward, as promised.
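Proposition 11.2.3. Let $x \geq K x_0$, $x_0 = 2.16 \cdot 10^{20}$, $K \geq 1$. Let $S_\eta(\alpha, x)$ be as in (10.1). Let $\eta_* = \eta_2 *_M \varphi$, where $\eta_2$ is as in (11.10) and $\varphi : [0, \infty) \to [0, \infty)$ is continuous and in $L^1$.
Let $2\alpha = a/q + \delta/x$, $q \leq Q$, $\gcd(a, q) = 1$, $|\delta/x| \leq 1/qQ$, where $Q = (3/4)x^{2/3}$. If $q \leq (x/K)^{1/3}/6$, then
\[
S_{\eta_*}(\alpha, x) \leq g_{x,\varphi}\left(\max\left(1, \frac{|\delta|}{8}\right) q\right) \cdot |\varphi|_1 x, \tag{11.18}
\]
where
\[
\begin{aligned}
g_{x,\varphi}(r) &= \frac{(R_{x,K,\varphi,2r} \log 2r + 0.5)\sqrt{z(r)} + 2.5}{\sqrt{2r}} + \frac{L_{2r}}{r} + 3.36 K^{1/6} x^{-1/6}, \\
R_{x,K,\varphi,t} &= R_{x,t} + (R_{x/K,t} - R_{x,t}) \frac{C_{\varphi,2,K}/|\varphi|_1}{\log K},
\end{aligned} \tag{11.19}
\]
with $R_{x,t}$ and $L_t$ as in (11.13), and
\[
C_{\varphi,2,K} = -\int_{1/K}^1 \varphi(w) \log w\, dw. \tag{11.20}
\]
If $q > (x/K)^{1/3}/6$, then
\[
|S_{\eta_*}(\alpha, x)| \leq h_\varphi(x/K) \cdot |\varphi|_1 x,
\]
where
\[
\begin{aligned}
h_\varphi(x) &= h(x) + C_{\varphi,0,K}/|\varphi|_1, \\
C_{\varphi,0,K} &= 1.04488 \int_0^{1/K} |\varphi(w)|\, dw
\end{aligned} \tag{11.21}
\]
and $h(x)$ is as in (11.15).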


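Proof. By (11.9),
\[
S_{\eta_*}(\alpha, x) = \int_0^{1/K} S_{\eta_2}(\alpha, wx)\varphi(w) \frac{dw}{w} + \int_{1/K}^\infty S_{\eta_2}(\alpha, wx)\varphi(w) \frac{dw}{w}.
\]
We bound the first integral by the trivial estimate $|S_{\eta_2}(\alpha, wx)| \leq |S_{\eta_2}(0, wx)|$ and Cor. C.1.3:
\[
\int_0^{1/K} |S_{\eta_2}(0, wx)|\varphi(w) \frac{dw}{w} \leq 1.04488 \int_0^{1/K} wx\varphi(w) \frac{dw}{w} = 1.04488 x \cdot \int_0^{1/K} \varphi(w)\, dw.
\]
If $w \geq 1/K$, then $wx \geq x_0$, and we can use (11.11) or (11.14). If $q > (x/K)^{1/3}/6$, then $|S_{\eta_2}(\alpha, wx)| \leq h(x/K) wx$ by (11.14); moreover, $|S_{\eta_2}(\alpha, y)| \leq h(y) y$ for $x/K \leq y < (6q)^3$ (by (11.14)) and $|S_{\eta_2}(\alpha, y)| \leq g_{y,1}(r)$ for $y \geq (6q)^3$ (by (11.11)). Thus, Lemma 11.2.1 gives us that
\[
\begin{aligned}
\int_{1/K}^\infty |S_{\eta_2}(\alpha, wx)|\varphi(w) \frac{dw}{w} &\leq \int_{1/K}^\infty h(x/K) wx \cdot \varphi(w) \frac{dw}{w} \\
&= h(x/K) x \int_{1/K}^\infty \varphi(w)\, dw \leq h(x/K)|\varphi|_1 \cdot x.
\end{aligned}
\]
If $q \leq (x/K)^{1/3}/6$, we always use (11.11). We can use the coarse bound
\[
\int_{1/K}^\infty 3.36 x^{-1/6} \cdot wx \cdot \varphi(w) \frac{dw}{w} \leq 3.36 K^{1/6} |\varphi|_1 x^{5/6}.
\]
Since $L_r$ does not depend on $x$,
\[
\int_{1/K}^\infty \frac{L_r}{r} \cdot wx \cdot \varphi(w) \frac{dw}{w} \leq \frac{L_r}{r} |\varphi|_1 x.
\]
By Lemma 11.2.2 and $q \leq (x/K)^{1/3}/6$, $y \mapsto R_{e^y,t}$ is convex-up and decreasing for $y \in [\log(x/K), \infty)$. Hence
\[
R_{wx,t} \leq \begin{cases} \dfrac{\log w}{\log \frac{1}{K}} R_{x/K,t} + \left(1 - \dfrac{\log w}{\log \frac{1}{K}}\right) R_{x,t} & \text{if } w < 1, \\ R_{x,t} & \text{if } w \geq 1. \end{cases}
\]
Therefore
\[
\begin{aligned}
\int_{1/K}^\infty &R_{wx,t} \cdot wx \cdot \varphi(w) \frac{dw}{w} \\
&\leq \int_{1/K}^1 \left(\frac{\log w}{\log \frac{1}{K}} R_{x/K,t} + \left(1 - \frac{\log w}{\log \frac{1}{K}}\right) R_{x,t}\right) x\varphi(w)\, dw + \int_1^\infty R_{x,t}\varphi(w) x\, dw \\
&\leq R_{x,t} x \cdot \int_{1/K}^\infty \varphi(w)\, dw + (R_{x/K,t} - R_{x,t}) \frac{x}{\log K} \int_{1/K}^1 \varphi(w) \log \frac{1}{w}\, dw \\
&\leq \left(R_{x,t} |\varphi|_1 + \frac{C_{\varphi,2,K}}{\log K}(R_{x/K,t} - R_{x,t})\right) \cdot x,
\end{aligned}
\]
where
\[
C_{\varphi,2,K} = -\int_{1/K}^1 \varphi(w) \log w\, dw.
\]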
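To get a feel for the constants C_{ϕ,2,K} and C_{ϕ,0,K} of (11.20) and (11.21), here is a small Python sketch. The choice ϕ(w) = w² e^{−w²/2} and the value K = 49 are assumptions made only for illustration (the text leaves ϕ and K general here), and the midpoint-rule integrator is a hypothetical helper.

```python
import math

def phi(w):
    # sample smoothing function (an illustrative assumption): w^2 exp(-w^2/2)
    return w * w * math.exp(-w * w / 2)

def integrate(f, a, b, n=20000):
    # midpoint rule on [a, b]
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

K = 49.0  # arbitrary illustrative value with K >= 1

phi_l1 = integrate(phi, 0.0, 40.0)  # |phi|_1; the tail beyond 40 is negligible
C2 = -integrate(lambda w: phi(w) * math.log(w), 1 / K, 1.0)   # C_{phi,2,K}, (11.20)
C0 = 1.04488 * integrate(lambda w: abs(phi(w)), 0.0, 1 / K)   # C_{phi,0,K}, (11.21)

# weight of R_{x/K,t} - R_{x,t} in (11.19); it lies in [0, 1]
weight = (C2 / phi_l1) / math.log(K)
print(phi_l1, C2, C0, weight)
```

The weight is at most 1 because $-\log w \leq \log K$ on $[1/K, 1]$, which is the observation used in the proof of Lemma 11.2.4.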

We finish by proving a couple more lemmas.


Lemma 11.2.4. Let $x > K > 1$. Let $\eta_* = \eta_2 *_M \varphi$, where $\eta_2$ is as in (11.10) and $\varphi : [0, \infty) \to [0, \infty)$ is continuous and in $L^1$. Let $g_{x,\varphi}$ be as in (11.19).
Then $g_{x,\varphi}(r)$ is a decreasing function of $r$ for $670 \leq r \leq (x/K)^{1/3}/6$.

Proof. Taking derivatives, we can easily see that
\[
r \mapsto \frac{\log\log r}{r}, \quad r \mapsto \frac{\log r}{r}, \quad r \mapsto \frac{\log r \log\log r}{r}, \quad r \mapsto \frac{(\log r)^2 \log\log r}{r} \tag{11.22}
\]
are decreasing for $r \geq 20$. The same is true if $\log\log r$ is replaced by $z(r)$, since $z(r)/\log\log r$ is a decreasing function for $r \geq e$. Since $(C_{\varphi,2,K}/|\varphi|_1)/\log K \leq 1$ (by (11.20)), we see that it is enough to prove that $r \mapsto R_{y,2r} \log 2r \sqrt{\log\log r}/\sqrt{2r}$ is decreasing on $r$ for $y = x$ and $y = x/K$ (under the assumption that $r \geq 670$).
Looking at (11.13) and at (11.22), we see that it remains only to check that
\[
r \mapsto \log\left(1 + \frac{\log 8r}{2 \log \frac{9 y^{1/3}}{4.008 r}}\right) \log 2r \cdot \sqrt{\frac{\log\log r}{r}} \tag{11.23}
\]
is decreasing on $r$ for $r \geq 670$. Taking logarithms, and then derivatives, we see that we have to show that
\[
\frac{\frac{\frac{1}{r}\ell + \frac{\log 8r}{r}}{2\ell^2}}{\left(1 + \frac{\log 8r}{2\ell}\right) \log\left(1 + \frac{\log 8r}{2\ell}\right)} + \frac{1}{r \log 2r} + \frac{1}{2r \log r \log\log r} < \frac{1}{2r},
\]
where $\ell = \log \frac{9 y^{1/3}}{4.008 r}$. We multiply by $2r$, and see that this is equivalent to
\[
\frac{\frac{1}{\ell}\left(2 - \frac{1}{1 + \frac{\log 8r}{2\ell}}\right)}{\log\left(1 + \frac{\log 8r}{2\ell}\right)} + \frac{2}{\log 2r} + \frac{1}{\log r \log\log r} < 1. \tag{11.24}
\]
A derivative test is enough to show that $s/\log(1 + s)$ is an increasing function of $s$ for $s > 0$; hence, so is $s \cdot (2 - 1/(1 + s))/\log(1 + s)$. Setting $s = (\log 8r)/\ell$, we obtain that the left side of (11.24) is a decreasing function of $\ell$ for $r \geq 1$ fixed.
Since $r \leq y^{1/3}/6$, $\ell \geq \log(54/4.008) > 2.6$. Thus, for (11.24) to hold, it is enough to ensure that
\[
\frac{\frac{1}{2.6}\left(2 - \frac{1}{1 + \frac{\log 8r}{5.2}}\right)}{\log\left(1 + \frac{\log 8r}{5.2}\right)} + \frac{2}{\log 2r} + \frac{1}{\log r \log\log r} < 1. \tag{11.25}
\]
A derivative test shows that $(2 - 1/s)/\log(1 + s)$ is a decreasing function of $s$ for $s \geq 1.23$; since $\log(8 \cdot 75)/5.2 > 1.23$, this implies that the left side of (11.25) is a decreasing function of $r$ for $r \geq 75$.
We check that the left side of (11.25) is indeed less than 1 for $r = 670$; we conclude that it is less than 1 for all $r \geq 670$.
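The final numerical step, that the left side of (11.25) is below 1 at r = 670, is easy to reproduce. The sketch below evaluates that left side in double precision; the function name `lhs_11_25` and the sample values of r are illustrative, and ordinary floating point is used rather than interval arithmetic.

```python
import math

def lhs_11_25(r):
    # left side of (11.25), written with u = (log 8r) / 5.2
    u = math.log(8 * r) / 5.2
    first = (1 / 2.6) * (2 - 1 / (1 + u)) / math.log(1 + u)
    second = 2 / math.log(2 * r)
    third = 1 / (math.log(r) * math.log(math.log(r)))
    return first + second + third

for r in (75, 300, 670, 10**4, 10**6):
    print(r, lhs_11_25(r))
```

The value at r = 670 falls only slightly below 1, which suggests that the threshold 670 in Lemma 11.2.4 is close to the limit of this particular argument.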

Lemma 11.2.5. Let $x \geq 10^{25}$. Let $\phi : [0, \infty) \to [0, \infty)$ be continuous and in $L^1$. Let $g_{x,\phi}(r)$ and $h(x)$ be as in (11.19) and (11.15), respectively. Then
\[
g_{x,\phi}\left(\frac{3}{8} x^{4/15}\right) \geq h(2x/\log x).
\]

Proof. We can bound $g_{x,\phi}(r)$ from below by
\[
gm_x(r) = \frac{(R_{x,r} \log 2r + 0.5)\sqrt{z(r)} + 2.5}{\sqrt{2r}}.
\]
Let $r = (3/8)x^{4/15}$. Using the assumption that $x \geq 10^{25}$, we see that
\[
R_{x,r} = 0.27125 \log\left(1 + \frac{\log \frac{3 x^{4/15}}{2}}{2 \log\left(\frac{9}{2.004 \cdot \frac{3}{8}} \cdot x^{\frac{1}{3} - \frac{4}{15}}\right)}\right) + 0.41415 \geq 0.63368. \tag{11.26}
\]
(It is easy to see that the left side of (11.26) is increasing on $x$.) Using $x \geq 10^{25}$ again, we get that
\[
z(r) = e^\gamma \log\log r + \frac{2.50637}{\log\log r} \geq 5.68721.
\]
Since $\log 2r = (4/15) \log x + \log(3/4)$, we conclude that
\[
gm_x(r) \geq \frac{0.40298 \log x + 3.25765}{\sqrt{3/4} \cdot x^{2/15}}.
\]
Recall that
\[
h(x) = \frac{0.276 (\log x)^{3/2}}{x^{1/6}} + \frac{1234 \log x}{x^{1/3}}.
\]
We can see that
\[
x \mapsto \frac{(\log x + 3.3)/x^{2/15}}{(\log(2x/\log x))^{3/2}/(2x/\log x)^{1/6}} \tag{11.27}
\]
is increasing for $x \geq 10^{25}$ (and indeed for $x \geq e^{27}$) by taking the logarithm of the right side of (11.27) and then taking its derivative with respect to $t = \log x$. We can see in the same way that $(1/x^{2/15})/(\log(2x/\log x)/(2x/\log x)^{1/3})$ is increasing for $x \geq e^{22}$. Since
\[
\begin{aligned}
\frac{0.40298 (\log x + 3.3)}{\sqrt{3/4} \cdot x^{2/15}} &\geq \frac{0.276 (\log(2x/\log x))^{3/2}}{(2x/\log x)^{1/6}}, \\
\frac{3.25765 - 3.3 \cdot 0.40298}{\sqrt{3/4} \cdot x^{2/15}} &\geq \frac{1234 \log(2x/\log x)}{(2x/\log x)^{1/3}}
\end{aligned}
\]
for $x = 10^{25}$, we are done.
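The base case x = 10^25 of the argument above can be reproduced in a few lines. The Python sketch below transcribes R_{x,t} from (11.13), z(r) as recalled in the proof, h from (11.15) and the lower bound gm_x(r); it uses plain floating point, and the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329

def z(r):
    # z(r) = e^gamma log log r + 2.50637 / log log r
    ll = math.log(math.log(r))
    return math.exp(EULER_GAMMA) * ll + 2.50637 / ll

def R(x, t):
    # R_{x,t} from (11.13)
    ratio = math.log(4 * t) / (2 * math.log(9 * x ** (1 / 3) / (2.004 * t)))
    return 0.27125 * math.log(1 + ratio) + 0.41415

def h(x):
    # h(x) from (11.15)
    return 0.276 * x ** (-1 / 6) * math.log(x) ** 1.5 + 1234 * x ** (-1 / 3) * math.log(x)

def gm(x, r):
    # the lower bound gm_x(r) used in the proof, with R_{x,r} as in (11.26)
    return ((R(x, r) * math.log(2 * r) + 0.5) * math.sqrt(z(r)) + 2.5) / math.sqrt(2 * r)

x = 1e25
r = 3 / 8 * x ** (4 / 15)
print(R(x, r), z(r))                        # compare with (11.26) and the bound on z(r)
print(gm(x, r), h(2 * x / math.log(x)))     # the two sides being compared
```

The first printed line can be compared with the constants 0.63368 and 5.68721 in the proof; the second shows that the two sides are within about one percent of each other at x = 10^25.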


Chapter 12

The $\ell_2$ norm and the large sieve

Our aim here is to give a bound on the $\ell_2$ norm of an exponential sum over the minor arcs. While we care about an exponential sum in particular, we will prove a result valid for all exponential sums $S(\alpha, x) = \sum_n a_n e(\alpha n)$ with $a_n$ of prime support.
We start by adapting ideas from Ramaré’s version of the large sieve for primes to estimate $\ell_2$ norms over parts of the circle (§12.1). We are left with the task of giving an explicit bound on the factor in Ramaré’s work; this we do in §12.2. As a side effect, this finally gives a fully explicit large sieve for primes that is asymptotically optimal, meaning a sieve that does not have a spurious factor of $e^\gamma$ in front; this was an arguably important gap in the literature.

12.1 Variations on the large sieve for primes


We are trying to estimate an integral $\int_{\mathbb{R}/\mathbb{Z}} |S(\alpha)|^3 d\alpha$. Instead of bounding it trivially by $|S|_\infty |S|_2^2$, we can use the fact that large (“major”) values of $S(\alpha)$ have to be multiplied only by $\int_{\mathfrak{M}} |S(\alpha)|^2 d\alpha$, where $\mathfrak{M}$ is a union (small in measure) of major arcs. Now, can we give an upper bound for $\int_{\mathfrak{M}} |S(\alpha)|^2 d\alpha$ better than $|S|_2^2 = \int_{\mathbb{R}/\mathbb{Z}} |S(\alpha)|^2 d\alpha$?
The first version of [Helb] gave an estimate on that integral using a technique due to
Heath-Brown, which in turn rests on an inequality of Montgomery’s ([Mon71, (3.9)];
see also, e.g., [IK04, Lem. 7.15]). The technique was communicated by Heath-Brown
to the present author, who communicated it to Tao, who used it in his own notable work
on sums of five primes (see [Tao14, Lem. 4.6] and adjoining comments). We will be
able to do better than that estimate here.
The role played by Montgomery’s inequality in Heath-Brown’s method is played
here by a result of Ramaré’s ([Ram09, Thm. 2.1]; see also [Ram09, Thm. 5.2]). The
following proposition is based on Ramaré’s result, or rather on one possible proof of
it. Instead of using the result as stated in [Ram09], we will actually be using elements
of the proof of [Bom74, Thm. 7A], credited to Selberg. Simply integrating Ramaré’s
inequality would give a non-trivial if slightly worse bound.


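Proposition 12.1.1. Let $\{a_n\}_{n=1}^\infty$, $a_n \in \mathbb{C}$, be supported on the primes. Assume that $\{a_n\}$ is in $\ell_1 \cap \ell_2$ and that $a_n = 0$ for $n \leq \sqrt{x}$. Let $Q_0 \geq 1$, $\delta_0 \geq 1$ be such that $\delta_0 Q_0^2 \leq x/2$; set $Q = \sqrt{x/2\delta_0} \geq Q_0$. Let
\[
\mathfrak{M} = \bigcup_{q \leq Q_0} \bigcup_{\substack{a \bmod q \\ (a,q)=1}} \left(\frac{a}{q} - \frac{\delta_0 r}{qx}, \frac{a}{q} + \frac{\delta_0 r}{qx}\right). \tag{12.1}
\]
Let $S(\alpha) = \sum_n a_n e(\alpha n)$ for $\alpha \in \mathbb{R}/\mathbb{Z}$. Then
\[
\int_{\mathfrak{M}} |S(\alpha)|^2 d\alpha \leq \left(\max_{q \leq Q_0} \max_{s \leq Q_0/q} \frac{G_q(Q_0/sq)}{G_q(Q/sq)}\right) \sum_n |a_n|^2,
\]
where
\[
G_q(R) = \sum_{\substack{r \leq R \\ (r,q)=1}} \frac{\mu^2(r)}{\phi(r)}. \tag{12.2}
\]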

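Proof. By (12.1),
\[
\int_{\mathfrak{M}} |S(\alpha)|^2 d\alpha = \sum_{q \leq Q_0} \int_{-\frac{\delta_0 Q_0}{qx}}^{\frac{\delta_0 Q_0}{qx}} \sum_{\substack{a \bmod q \\ (a,q)=1}} \left|S\left(\frac{a}{q} + \alpha\right)\right|^2 d\alpha. \tag{12.3}
\]
Thanks to the last equations of [Bom74, p. 24] and [Bom74, p. 25],
\[
\sum_{\substack{a \bmod q \\ (a,q)=1}} \left|S\left(\frac{a}{q}\right)\right|^2 = \frac{1}{\phi(q)} \sum_{\substack{q^* | q \\ (q^*, q/q^*)=1 \\ \mu^2(q/q^*)=1}} q^* \cdot \sideset{}{^*}\sum_{\chi \bmod q^*} \left|\sum_n a_n \chi(n)\right|^2
\]
for every $q \leq \sqrt{x}$, where we use the assumption that $n$ is prime and $> \sqrt{x}$ (and thus coprime to $q$) when $a_n \neq 0$. Hence
\[
\begin{aligned}
\int_{\mathfrak{M}} |S(\alpha)|^2 d\alpha &= \sum_{q \leq Q_0} \sum_{\substack{q^* | q \\ (q^*, q/q^*)=1 \\ \mu^2(q/q^*)=1}} \int_{-\frac{\delta_0 Q_0}{qx}}^{\frac{\delta_0 Q_0}{qx}} \frac{q^*}{\phi(q)} \sideset{}{^*}\sum_{\chi \bmod q^*} \left|\sum_n a_n e(\alpha n)\chi(n)\right|^2 d\alpha \\
&= \sum_{q^* \leq Q_0} \frac{q^*}{\phi(q^*)} \sum_{\substack{r \leq Q_0/q^* \\ (r,q^*)=1}} \frac{\mu^2(r)}{\phi(r)} \int_{-\frac{\delta_0 Q_0}{q^* r x}}^{\frac{\delta_0 Q_0}{q^* r x}} \sideset{}{^*}\sum_{\chi \bmod q^*} \left|\sum_n a_n e(\alpha n)\chi(n)\right|^2 d\alpha \\
&= \sum_{q^* \leq Q_0} \frac{q^*}{\phi(q^*)} \int_{-\frac{\delta_0 Q_0}{q^* x}}^{\frac{\delta_0 Q_0}{q^* x}} \sum_{\substack{r \leq \frac{Q_0}{q^*} \min\left(1, \frac{\delta_0}{|\alpha| x}\right) \\ (r,q^*)=1}} \frac{\mu^2(r)}{\phi(r)} \sideset{}{^*}\sum_{\chi \bmod q^*} \left|\sum_n a_n e(\alpha n)\chi(n)\right|^2 d\alpha.
\end{aligned}
\]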

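Here $|\alpha| \leq \delta_0 Q_0/q^* x$ implies $(Q_0/q^*)\delta_0/|\alpha| x \geq 1$. Therefore,
\[
\int_{\mathfrak{M}} |S(\alpha)|^2 d\alpha \leq \left(\max_{q^* \leq Q_0} \max_{s \leq Q_0/q^*} \frac{G_{q^*}(Q_0/sq^*)}{G_{q^*}(Q/sq^*)}\right) \cdot \Sigma, \tag{12.4}
\]
where
\[
\begin{aligned}
\Sigma &= \sum_{q^* \leq Q_0} \frac{q^*}{\phi(q^*)} \int_{-\frac{\delta_0 Q_0}{q^* x}}^{\frac{\delta_0 Q_0}{q^* x}} \sum_{\substack{r \leq \frac{Q}{q^*} \min\left(1, \frac{\delta_0}{|\alpha| x}\right) \\ (r,q^*)=1}} \frac{\mu^2(r)}{\phi(r)} \sideset{}{^*}\sum_{\chi \bmod q^*} \left|\sum_n a_n e(\alpha n)\chi(n)\right|^2 d\alpha \\
&\leq \sum_{q \leq Q} \frac{q}{\phi(q)} \sum_{\substack{r \leq Q/q \\ (r,q)=1}} \frac{\mu^2(r)}{\phi(r)} \int_{-\frac{\delta_0 Q}{qrx}}^{\frac{\delta_0 Q}{qrx}} \sideset{}{^*}\sum_{\chi \bmod q} \left|\sum_n a_n e(\alpha n)\chi(n)\right|^2 d\alpha.
\end{aligned}
\]
As stated in the proof of [Bom74, Thm. 7A],
\[
\chi(r)\chi(n)\tau(\chi) c_r(n) = \sum_{\substack{b=1 \\ (b,qr)=1}}^{qr} \chi(b) e^{2\pi i n \frac{b}{qr}}
\]
for $\chi$ primitive of modulus $q$. Here $c_r(n)$ stands for the Ramanujan sum
\[
c_r(n) = \sum_{\substack{u \bmod r \\ (u,r)=1}} e^{2\pi i n u/r}.
\]
For $n$ coprime to $r$, $c_r(n) = \mu(r)$. Since $\chi$ is primitive, $|\tau(\chi)| = \sqrt{q}$. Hence, for $r \leq \sqrt{x}$ coprime to $q$,
\[
q \left|\sum_n a_n e(\alpha n)\chi(n)\right|^2 = \left|\sum_{\substack{b=1 \\ (b,qr)=1}}^{qr} \chi(b) S\left(\frac{b}{qr} + \alpha\right)\right|^2.
\]
Thus,
\[
\begin{aligned}
\Sigma &= \sum_{q \leq Q} \sum_{\substack{r \leq Q/q \\ (r,q)=1}} \frac{\mu^2(r)}{\phi(rq)} \int_{-\frac{\delta_0 Q}{qrx}}^{\frac{\delta_0 Q}{qrx}} \sideset{}{^*}\sum_{\chi \bmod q} \left|\sum_{\substack{b=1 \\ (b,qr)=1}}^{qr} \chi(b) S\left(\frac{b}{qr} + \alpha\right)\right|^2 d\alpha \\
&\leq \sum_{q \leq Q} \frac{1}{\phi(q)} \int_{-\frac{\delta_0 Q}{qx}}^{\frac{\delta_0 Q}{qx}} \sum_{\chi \bmod q} \left|\sum_{\substack{b=1 \\ (b,q)=1}}^{q} \chi(b) S\left(\frac{b}{q} + \alpha\right)\right|^2 d\alpha \\
&= \sum_{q \leq Q} \int_{-\frac{\delta_0 Q}{qx}}^{\frac{\delta_0 Q}{qx}} \sum_{\substack{b=1 \\ (b,q)=1}}^{q} \left|S\left(\frac{b}{q} + \alpha\right)\right|^2 d\alpha.
\end{aligned}
\]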

Let us now check that the intervals $(b/q - \delta_0 Q/qx, b/q + \delta_0 Q/qx)$ do not overlap. Since $Q = \sqrt{x/2\delta_0}$, we see that $\delta_0 Q/qx = 1/2qQ$. The difference between two distinct fractions $b/q$, $b'/q'$ is at least $1/qq'$. For $q, q' \leq Q$, $1/qq' \geq 1/2qQ + 1/2Qq'$. Hence the intervals around $b/q$ and $b'/q'$ do not overlap. We conclude that
\[
\Sigma \leq \int_{\mathbb{R}/\mathbb{Z}} |S(\alpha)|^2 = \sum_n |a_n|^2,
\]
and so, by (12.4), we are done.
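The non-overlap of the arcs of radius 1/(2qQ) around the fractions b/q, q ≤ Q, can be verified exactly for small Q. The Python sketch below is only an illustration: it takes Q to be a small integer (in the text Q is a real number), works with exact rational arithmetic, and measures distances on the circle R/Z; all names are hypothetical.

```python
from fractions import Fraction
from math import gcd

def arcs(Q):
    # centres b/q with 1 <= b <= q, gcd(b, q) = 1, q <= Q, and radii 1/(2qQ)
    return [(Fraction(b, q), Fraction(1, 2 * q * Q))
            for q in range(1, Q + 1)
            for b in range(1, q + 1) if gcd(b, q) == 1]

def circle_distance(u, v):
    # distance between u and v in R/Z
    d = abs(u - v) % 1
    return min(d, 1 - d)

def some_arcs_overlap(Q):
    # open arcs overlap exactly when the distance between their centres
    # is smaller than the sum of their radii
    items = arcs(Q)
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            (c1, r1), (c2, r2) = items[i], items[j]
            if circle_distance(c1, c2) < r1 + r2:
                return True
    return False

for Q in (1, 2, 5, 12):
    print(Q, len(arcs(Q)), some_arcs_overlap(Q))
```

Two arcs may share an endpoint (for instance those around 0/1 and 1/Q), which is harmless since the arcs are open.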


We will actually use Prop. 12.1.1 in the slightly modified form given by the follow-
ing statement.
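Proposition 12.1.2. Let $\{a_n\}_{n=1}^\infty$, $a_n \in \mathbb{C}$, be supported on the primes. Assume that $\{a_n\}$ is in $\ell_1 \cap \ell_2$ and that $a_n = 0$ for $n \leq \sqrt{x}$. Let $Q_0 \geq 1$, $\delta_0 \geq 1$ be such that $\delta_0 Q_0^2 \leq x/2$; set $Q = \sqrt{x/2\delta_0} \geq Q_0$. Let $\mathfrak{M} = \mathfrak{M}_{\delta_0,Q_0}$ be as in (10.5).
Let $S(\alpha) = \sum_n a_n e(\alpha n)$ for $\alpha \in \mathbb{R}/\mathbb{Z}$. Then
\[
\int_{\mathfrak{M}_{\delta_0,Q_0}} |S(\alpha)|^2 d\alpha \leq \left(\max_{\substack{q \leq 2Q_0 \\ q \text{ even}}} \max_{s \leq 2Q_0/q} \frac{G_q(2Q_0/sq)}{G_q(2Q/sq)}\right) \sum_n |a_n|^2,
\]
where
\[
G_q(R) = \sum_{\substack{r \leq R \\ (r,q)=1}} \frac{\mu^2(r)}{\phi(r)}. \tag{12.5}
\]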

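Proof. By (10.5),
\[
\begin{aligned}
\int_{\mathfrak{M}} |S(\alpha)|^2 d\alpha &= \sum_{\substack{q \leq Q_0 \\ q \text{ odd}}} \int_{-\frac{\delta_0 Q_0}{2qx}}^{\frac{\delta_0 Q_0}{2qx}} \sum_{\substack{a \bmod q \\ (a,q)=1}} \left|S\left(\frac{a}{q} + \alpha\right)\right|^2 d\alpha \\
&+ \sum_{\substack{q \leq Q_0 \\ q \text{ even}}} \int_{-\frac{\delta_0 Q_0}{qx}}^{\frac{\delta_0 Q_0}{qx}} \sum_{\substack{a \bmod q \\ (a,q)=1}} \left|S\left(\frac{a}{q} + \alpha\right)\right|^2 d\alpha.
\end{aligned}
\]
We proceed as in the proof of Prop. 12.1.1. We still have (12.3). Hence $\int_{\mathfrak{M}} |S(\alpha)|^2 d\alpha$ equals
\[
\begin{aligned}
&\sum_{\substack{q^* \leq Q_0 \\ q^* \text{ odd}}} \frac{q^*}{\phi(q^*)} \int_{-\frac{\delta_0 Q_0}{2q^* x}}^{\frac{\delta_0 Q_0}{2q^* x}} \sum_{\substack{r \leq \frac{Q_0}{q^*} \min\left(1, \frac{\delta_0}{2|\alpha| x}\right) \\ (r,2q^*)=1}} \frac{\mu^2(r)}{\phi(r)} \sideset{}{^*}\sum_{\chi \bmod q^*} \left|\sum_n a_n e(\alpha n)\chi(n)\right|^2 d\alpha \\
&+ \sum_{\substack{q^* \leq 2Q_0 \\ q^* \text{ even}}} \frac{q^*}{\phi(q^*)} \int_{-\frac{\delta_0 Q_0}{q^* x}}^{\frac{\delta_0 Q_0}{q^* x}} \sum_{\substack{r \leq \frac{2Q_0}{q^*} \min\left(1, \frac{\delta_0}{2|\alpha| x}\right) \\ (r,q^*)=1}} \frac{\mu^2(r)}{\phi(r)} \sideset{}{^*}\sum_{\chi \bmod q^*} \left|\sum_n a_n e(\alpha n)\chi(n)\right|^2 d\alpha.
\end{aligned}
\]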

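(The sum with $q$ odd and $r$ even is equal to the first sum; hence the factor of 2 in front.) Therefore,
\[
\begin{aligned}
\int_{\mathfrak{M}} |S(\alpha)|^2 d\alpha &\leq \left(\max_{\substack{q^* \leq Q_0 \\ q^* \text{ odd}}} \max_{s \leq Q_0/q^*} \frac{G_{2q^*}(Q_0/sq^*)}{G_{2q^*}(Q/sq^*)}\right) \cdot 2\Sigma_1 \\
&+ \left(\max_{\substack{q^* \leq 2Q_0 \\ q^* \text{ even}}} \max_{s \leq 2Q_0/q^*} \frac{G_{q^*}(2Q_0/sq^*)}{G_{q^*}(2Q/sq^*)}\right) \cdot \Sigma_2,
\end{aligned} \tag{12.6}
\]
where
\[
\begin{aligned}
\Sigma_1 &= \sum_{\substack{q \leq Q \\ q \text{ odd}}} \frac{q}{\phi(q)} \sum_{\substack{r \leq Q/q \\ (r,2q)=1}} \frac{\mu^2(r)}{\phi(r)} \int_{-\frac{\delta_0 Q}{2qrx}}^{\frac{\delta_0 Q}{2qrx}} \sideset{}{^*}\sum_{\chi \bmod q} \left|\sum_n a_n e(\alpha n)\chi(n)\right|^2 d\alpha \\
&= \sum_{\substack{q \leq Q \\ q \text{ odd}}} \frac{q}{\phi(q)} \sum_{\substack{r \leq 2Q/q \\ (r,q)=1 \\ r \text{ even}}} \frac{\mu^2(r)}{\phi(r)} \int_{-\frac{\delta_0 Q}{qrx}}^{\frac{\delta_0 Q}{qrx}} \sideset{}{^*}\sum_{\chi \bmod q} \left|\sum_n a_n e(\alpha n)\chi(n)\right|^2 d\alpha, \\
\Sigma_2 &= \sum_{\substack{q \leq 2Q \\ q \text{ even}}} \frac{q}{\phi(q)} \sum_{\substack{r \leq 2Q/q \\ (r,q)=1}} \frac{\mu^2(r)}{\phi(r)} \int_{-\frac{\delta_0 Q}{qrx}}^{\frac{\delta_0 Q}{qrx}} \sideset{}{^*}\sum_{\chi \bmod q} \left|\sum_n a_n e(\alpha n)\chi(n)\right|^2 d\alpha.
\end{aligned}
\]
The two expressions within parentheses in (12.6) are actually equal.
Much as before, using [Bom74, Thm. 7A], we obtain that
\[
\begin{aligned}
\Sigma_1 &\leq \sum_{\substack{q \leq Q \\ q \text{ odd}}} \frac{1}{\phi(q)} \int_{-\frac{\delta_0 Q}{2qx}}^{\frac{\delta_0 Q}{2qx}} \sum_{\substack{b=1 \\ (b,q)=1}}^{q} \left|S\left(\frac{b}{q} + \alpha\right)\right|^2 d\alpha, \\
\Sigma_1 + \Sigma_2 &\leq \sum_{\substack{q \leq 2Q \\ q \text{ even}}} \frac{1}{\phi(q)} \int_{-\frac{\delta_0 Q}{qx}}^{\frac{\delta_0 Q}{qx}} \sum_{\substack{b=1 \\ (b,q)=1}}^{q} \left|S\left(\frac{b}{q} + \alpha\right)\right|^2 d\alpha.
\end{aligned}
\]
Let us now check that the intervals of integration $(b/q - \delta_0 Q/2qx, b/q + \delta_0 Q/2qx)$ (for $q$ odd), $(b/q - \delta_0 Q/qx, b/q + \delta_0 Q/qx)$ (for $q$ even) do not overlap. Recall that $\delta_0 Q/qx = 1/2qQ$. The absolute value of the difference between two distinct fractions $b/q$, $b'/q'$ is at least $1/qq'$. For $q, q' \leq Q$ odd, this is larger than $1/4qQ + 1/4Qq'$, and so the intervals do not overlap. For $q \leq Q$ odd and $q' \leq 2Q$ even (or vice versa), $1/qq' \geq 1/4qQ + 1/2Qq'$, and so, again, the intervals do not overlap. If $q \leq Q$ and $q' \leq Q$ are both even, then $|b/q - b'/q'|$ is actually $\geq 2/qq'$. Clearly, $2/qq' \geq 1/2qQ + 1/2Qq'$, and so again there is no overlap. We conclude that
\[
2\Sigma_1 + \Sigma_2 \leq \int_{\mathbb{R}/\mathbb{Z}} |S(\alpha)|^2 = \sum_n |a_n|^2.
\]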

12.2 Bounding the quotient in the large sieve for primes


The estimate given by Proposition 12.1.1 involves the quotient
\[
\max_{q \leq Q_0} \max_{s \leq Q_0/q} \frac{G_q(Q_0/sq)}{G_q(Q/sq)}, \tag{12.7}
\]
where $G_q$ is as in (12.2). The appearance of such a quotient (at least for $s = 1$) is typical of Ramaré’s version of the large sieve for primes; see, e.g., [Ram09]. We will see how to bound such a quotient in a way that is essentially optimal, not just asymptotically, but also in the ranges that are most relevant to us. (This includes, for example, $Q_0 \sim 10^6$, $Q \sim 10^{15}$.)
As the present work shows, an approach based on Ramaré’s work gives bounds that are, in some contexts, better than those of other large sieves for primes by a constant factor (approaching $e^\gamma = 1.78107\ldots$). Thus, giving a fully explicit and nearly optimal bound for (12.7) is a task of clear general relevance, besides being needed for our main goal.
We will obtain bounds for $G_q(Q_0/sq)/G_q(Q/sq)$ when $Q_0 \leq 2 \cdot 10^{10}$, $Q \geq Q_0^2$. As we shall see, our bounds will be best when $s = q = 1$ or, sometimes, when $s = 1$ and $q = 2$ instead.
Write $G(R)$ for $G_1(R) = \sum_{r \leq R} \mu^2(r)/\phi(r)$. We will need several estimates for $G_q(R)$ and $G(R)$. As stated in [Ram95, Lemma 3.4],

G(R) ≤ log R + 1.4709 (12.8)

for R ≥ 1. By [MV73, Lem. 7],

G(R) ≥ log R + 1.07 (12.9)

for R ≥ 6. There is also the trivial bound


X µ2 (r) X µ2 (r) Y  −1
1
G(R) = = 1−
φ(r) r p
r≤R r≤R p|r
2
(12.10)
X µ (r) Y X 1 X1
= j
≥ > log R.
r p r
r≤R p|r j≥1 r≤R

The following bound, also well-known and easy,


q
G(R) ≤ Gq (R) ≤ G(Rq), (12.11)
φ(q)

can be obtained by multiplying Gq (R) = r≤R:(r,q)=1 µ2 (r)/φ(r) term-by-term by


P
Q
q/φ(q) = p|q (1 + 1/φ(p)).
We will also use Ramaré’s estimate from [Ram95, Lem. 3.4]:
\[
G_d(R) = \frac{\phi(d)}{d} \left(\log R + c_E + \sum_{p | d} \frac{\log p}{p}\right) + O^*\left(7.284 R^{-1/3} f_1(d)\right) \tag{12.12}
\]

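for all $d \in \mathbb{Z}^+$ and all $R \geq 1$, where
\[
f_1(d) = \prod_{p | d} (1 + p^{-2/3}) \left(1 + \frac{p^{1/3} + p^{2/3}}{p(p - 1)}\right)^{-1} \tag{12.13}
\]
and
\[
c_E = \gamma + \sum_{p \geq 2} \frac{\log p}{p(p - 1)} = 1.3325822\ldots \tag{12.14}
\]
by [RS62, (2.11)].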
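The numerical value of c_E in (12.14) can be approximated by truncating the sum over primes. The Python sketch below sums over primes up to 10^6, an arbitrary cutoff chosen for illustration; the omitted tail is positive and roughly of size 10^{-6}, so the partial sum falls slightly short of the quoted value.

```python
import math

EULER_GAMMA = 0.5772156649015329

def primes_up_to(n):
    # sieve of Eratosthenes
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(2, n + 1) if sieve[i]]

LIMIT = 10**6  # illustrative cutoff
partial_c_E = EULER_GAMMA + sum(math.log(p) / (p * (p - 1)) for p in primes_up_to(LIMIT))
print(partial_c_E)
```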
If $R \geq 182$, then
\[
\log R + 1.312 \leq G(R) \leq \log R + 1.354, \tag{12.15}
\]
where the upper bound is valid for $R \geq 120$. This is true by (12.12) for $R \geq 4 \cdot 10^7$; we check (12.15) for $120 \leq R \leq 4 \cdot 10^7$ by a numerical computation.¹ Similarly, for $R \geq 200$,
\[
\frac{\log R + 1.661}{2} \leq G_2(R) \leq \frac{\log R + 1.698}{2} \tag{12.16}
\]
by (12.12) for $R \geq 1.6 \cdot 10^8$, and by a numerical computation for $200 \leq R \leq 1.6 \cdot 10^8$.
Write $\rho = (\log Q_0)/(\log Q) \leq 1$. We obtain immediately from (12.15) and (12.16) that
\[
\begin{aligned}
\frac{G(Q_0)}{G(Q)} &\leq \frac{\log Q_0 + 1.354}{\log Q + 1.312}, \\
\frac{G_2(Q_0)}{G_2(Q)} &\leq \frac{\log Q_0 + 1.698}{\log Q + 1.661}
\end{aligned} \tag{12.17}
\]
for $Q, Q_0 \geq 200$. What is hard is to approximate $G_q(Q_0)/G_q(Q)$ for $q$ large and $Q_0$ small.
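The estimates (12.8), (12.10), (12.11), (12.15) and (12.16) are easy to test on a small range. The Python sketch below tabulates G(R) and G_2(R) at integers R up to a few thousand, in ordinary floating point rather than the interval arithmetic mentioned in the footnote; the cutoff `LIMIT` and all function names are illustrative.

```python
import math

LIMIT = 6000  # illustrative cutoff

def squarefree_phi_table(limit):
    # table[r] = phi(r) if r is squarefree, and 0 otherwise (table[0] = 0)
    phi = list(range(limit + 1))
    squarefree = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if phi[p] == p:  # p is prime: no smaller prime has modified phi[p]
            for m in range(p, limit + 1, p):
                phi[m] -= phi[m] // p
            for m in range(p * p, limit + 1, p * p):
                squarefree[m] = False
    return [phi[r] if squarefree[r] else 0 for r in range(limit + 1)]

TABLE = squarefree_phi_table(LIMIT)

def G_prefix(limit, q=1):
    # prefix[R] = G_q(R) for integers 0 <= R <= limit, with G_q as in (12.2)
    prefix = [0.0] * (limit + 1)
    for r in range(1, limit + 1):
        term = 1.0 / TABLE[r] if TABLE[r] and math.gcd(r, q) == 1 else 0.0
        prefix[r] = prefix[r - 1] + term
    return prefix

G1 = G_prefix(LIMIT)      # G(R)
G2 = G_prefix(LIMIT, 2)   # G_2(R)

for R in (7, 182, 200, 1000, 3000):
    print(R, G1[R] - math.log(R), 2 * G2[R] - math.log(R))
```

The printed differences G(R) − log R and 2G_2(R) − log R can be compared with the constants in (12.8), (12.15) and (12.16).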
Let us start by giving an easy bound, off from the truth by a factor of about eγ .
(Specialists will recognize this as a factor that appears often in first attempts at esti-
mates based on either large or small sieves.) First, we need a simple explicit lemma.
Lemma 12.2.1. Let $m \geq 1$, $q \geq 1$. Then
\[
\prod_{p | q \vee p \leq m} \frac{p}{p - 1} \leq e^\gamma (\log(m + \log q) + 0.65771). \tag{12.18}
\]

Proof. Let $\mathcal{P} = \prod_{p \leq m \vee p | q} p$. Then, by [RS75, (5.1)],
\[
\mathcal{P} \leq q \prod_{p \leq m} p = q e^{\sum_{p \leq m} \log p} \leq q e^{(1 + \epsilon_0) m},
\]
where $\epsilon_0 = 0.001102$. Now, by [RS62, (3.42)],
\[
\frac{n}{\phi(n)} \leq e^\gamma \log\log n + \frac{2.50637}{\log\log n} \leq e^\gamma \log\log x + \frac{2.50637}{\log\log x}
\]
for all $x \geq n \geq 27$ (since, given $a, b > 0$, the function $t \mapsto at + b/t$ is increasing on $t$ for $t \geq \sqrt{b/a}$). Hence, if $q e^m \geq 27$,
\[
\begin{aligned}
\frac{\mathcal{P}}{\phi(\mathcal{P})} &\leq e^\gamma \log((1 + \epsilon_0) m + \log q) + \frac{2.50637}{\log(m + \log q)} \\
&\leq e^\gamma \left(\log(m + \log q) + \epsilon_0 + \frac{2.50637/e^\gamma}{\log(m + \log q)}\right).
\end{aligned}
\]
Thus (12.18) holds when $m + \log q \geq 8.53$, since then $\epsilon_0 + (2.50637/e^\gamma)/\log(m + \log q) \leq 0.65771$. We verify all choices of $m, q \geq 1$ with $m + \log q \leq 8.53$ computationally; the worst case is that of $m = 1$, $q = 6$, which give the value 0.65771 in (12.18).

¹ Using D. Platt’s implementation [Pla11] of double-precision interval arithmetic based on Lambov’s [Lam08] ideas.
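The small cases of Lemma 12.2.1 can be re-examined directly. The Python sketch below evaluates both sides of (12.18) for integer m and q in a small box (the ranges are arbitrary illustrative choices, and only integer m is tried) and reports the pair with the least slack; plain floating point is used, and the helper names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def prime_divisors(n):
    # set of primes dividing n, by trial division
    found, p = set(), 2
    while p * p <= n:
        if n % p == 0:
            found.add(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        found.add(n)
    return found

def lhs(m, q):
    # product of p/(p-1) over primes p with p | q or p <= m
    primes = set(prime_divisors(q))
    for p in range(2, int(m) + 1):
        if len(prime_divisors(p)) == 1 and p in prime_divisors(p):
            primes.add(p)  # p is prime
    product = 1.0
    for p in sorted(primes):
        product *= p / (p - 1)
    return product

def rhs(m, q):
    # right side of (12.18)
    return math.exp(EULER_GAMMA) * (math.log(m + math.log(q)) + 0.65771)

slack, m_min, q_min = min(
    (rhs(m, q) - lhs(m, q), m, q) for m in range(1, 9) for q in range(1, 2001)
)
print(slack, m_min, q_min)
```

Within this box the least slack should occur at m = 1, q = 6, matching the worst case named in the proof.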
Here is the promised easy bound.
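Lemma 12.2.2. Let $Q_0 \geq 1$, $Q \geq 182 Q_0$. Let $q \leq Q_0$, $s \leq Q_0/q$, $q$ an integer. Then
\[
\frac{G_q(Q_0/sq)}{G_q(Q/sq)} \leq \frac{e^\gamma \log\left(\frac{Q_0}{sq} + \log q\right) + 1.172}{\log \frac{Q}{Q_0} + 1.312} \leq \frac{e^\gamma \log Q_0 + 1.172}{\log \frac{Q}{Q_0} + 1.312}.
\]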

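Proof. Let $\mathcal{P} = \prod_{p \leq Q_0/sq \vee p | q} p$. Then
\[
G_q(Q_0/sq) G_{\mathcal{P}}(Q/Q_0) \leq G_q(Q/sq)
\]
and so
\[
\frac{G_q(Q_0/sq)}{G_q(Q/sq)} \leq \frac{1}{G_{\mathcal{P}}(Q/Q_0)}. \tag{12.19}
\]
Now the lower bound in (12.11) gives us that, for $d = \mathcal{P}$, $R = Q/Q_0$,
\[
G_{\mathcal{P}}(Q/Q_0) \geq \frac{\phi(\mathcal{P})}{\mathcal{P}} G(Q/Q_0).
\]
By Lem. 12.2.1,
\[
\frac{\mathcal{P}}{\phi(\mathcal{P})} \leq e^\gamma \left(\log\left(\frac{Q_0}{sq} + \log q\right) + 0.658\right).
\]
Hence, using (12.15), we get that
\[
\frac{G_q(Q_0/sq)}{G_q(Q/sq)} \leq \frac{\mathcal{P}/\phi(\mathcal{P})}{G(Q/Q_0)} \leq \frac{e^\gamma \log\left(\frac{Q_0}{sq} + \log q\right) + 1.172}{\log \frac{Q}{Q_0} + 1.312}, \tag{12.20}
\]
since $Q/Q_0 \geq 184$. Since
\[
\left(\frac{Q_0}{sq} + \log q\right)' = -\frac{Q_0}{sq^2} + \frac{1}{q} = \frac{1}{q}\left(1 - \frac{Q_0}{sq}\right) \leq 0,
\]
the rightmost expression of (12.20) is maximal for $q = 1$.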

Lemma 12.2.2 will play a crucial role in reducing to a finite computation the prob-
lem of bounding Gq (Q0 /sq)/Gq (Q/sq). As we will now see, we can use Lemma
12.2.2 to obtain a bound that is useful when sq is large compared to Q0 – precisely the
case in which asymptotic estimates such as (12.12) are relatively weak.
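Lemma 12.2.3. Let $Q_0 \geq 1$, $Q \geq 200 Q_0$. Let $q \leq Q_0$, $s \leq Q_0/q$. Let $\rho = (\log Q_0)/\log Q \leq 2/3$. Then, for any $\sigma \geq 1.312\rho$,
\[
\frac{G_q(Q_0/sq)}{G_q(Q/sq)} \leq \frac{\log Q_0 + \sigma}{\log Q + 1.312} \tag{12.21}
\]
holds provided that
\[
\frac{Q_0}{sq} \leq c(\sigma) \cdot Q_0^{(1-\rho)e^{-\gamma}} - \log q,
\]
where $c(\sigma) = \exp(\exp(-\gamma) \cdot (\sigma - \sigma^2/5.248 - 1.172))$.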
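Proof. By Lemma 12.2.2, we see that (12.21) will hold provided that
\[
e^\gamma \log\left(\frac{Q_0}{sq} + \log q\right) + 1.172 \leq \frac{\log \frac{Q}{Q_0} + 1.312}{\log Q + 1.312} \cdot (\log Q_0 + \sigma). \tag{12.22}
\]
The expression on the right of (12.22) equals
\[
\begin{aligned}
\log Q_0 + \sigma &- \frac{(\log Q_0 + \sigma) \log Q_0}{\log Q + 1.312} \\
&= (1 - \rho)(\log Q_0 + \sigma) + \frac{1.312\rho (\log Q_0 + \sigma)}{\log Q + 1.312} \\
&\geq (1 - \rho)(\log Q_0 + \sigma) + 1.312\rho^2
\end{aligned}
\]
and so (12.22) will hold provided that
\[
e^\gamma \log\left(\frac{Q_0}{sq} + \log q\right) + 1.172 \leq (1 - \rho)(\log Q_0) + (1 - \rho)\sigma + 1.312\rho^2.
\]
Taking derivatives, we see that
\[
\begin{aligned}
(1 - \rho)\sigma + 1.312\rho^2 - 1.172 &\geq \left(1 - \frac{\sigma}{2.624}\right)\sigma + 1.312 \left(\frac{\sigma}{2.624}\right)^2 - 1.172 \\
&= \sigma - \frac{\sigma^2}{4 \cdot 1.312} - 1.172.
\end{aligned}
\]
Hence it is enough that
\[
\frac{Q_0}{sq} + \log q \leq e^{e^{-\gamma}\left((1-\rho) \log Q_0 + \sigma - \frac{\sigma^2}{4 \cdot 1.312} - 1.172\right)} = c(\sigma) \cdot Q_0^{(1-\rho)e^{-\gamma}},
\]
where $c(\sigma) = \exp(\exp(-\gamma) \cdot (\sigma - \sigma^2/5.248 - 1.172))$.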


We now pass to the main result of the section.

Proposition 12.2.4. Let $Q \geq 20000 Q_0$, $Q_0 \geq Q_{0,\min}$, where $Q_{0,\min} = 10^5$. Let $\rho = (\log Q_0)/\log Q$. Assume $\rho \leq 0.6$. Then, for every $1 \leq q \leq Q_0$ and every $s \in [1, Q_0/q]$,
\[
\frac{G_q(Q_0/sq)}{G_q(Q/sq)} \leq \frac{\log Q_0 + c_+}{\log Q + c_E}, \tag{12.23}
\]
where $c_E$ is as in (12.14) and $c_+ = 1.36$.
An ideal result would have c+ instead of cE , but this is not actually possible: error
terms do exist, even if they are in reality smaller than the bound given in (12.12); this
means that a bound such as (12.23) with c+ instead of cE would be false for q = 1,
s = 1.
There is nothing special about the assumptions
\[
Q \geq 20000 Q_0, \quad Q_0 \geq 10^5, \quad (\log Q_0)/(\log Q) \leq 0.6.
\]
They can all be relaxed at the cost of an increase in $c_+$.
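Proof. Define $\operatorname{err}_{q,R}$ so that
\[
G_q(R) = \frac{\phi(q)}{q} \left(\log R + c_E + \sum_{p | q} \frac{\log p}{p}\right) + \operatorname{err}_{q,R}. \tag{12.24}
\]
Then (12.23) will hold if
\[
\begin{aligned}
\log \frac{Q_0}{sq} &+ c_E + \sum_{p | q} \frac{\log p}{p} + \frac{q}{\phi(q)} \operatorname{err}_{q,\frac{Q_0}{sq}} \\
&\leq \left(\log \frac{Q}{sq} + c_E + \sum_{p | q} \frac{\log p}{p} + \frac{q}{\phi(q)} \operatorname{err}_{q,\frac{Q}{sq}}\right) \frac{\log Q_0 + c_+}{\log Q + c_E}.
\end{aligned} \tag{12.25}
\]
This, in turn, happens if
\[
\begin{aligned}
\left(\log sq - \sum_{p | q} \frac{\log p}{p}\right) &\left(1 - \frac{\log Q_0 + c_+}{\log Q + c_E}\right) + c_+ - c_E \\
&\geq \frac{q}{\phi(q)} \left(\operatorname{err}_{q,\frac{Q_0}{sq}} - \frac{\log Q_0 + c_+}{\log Q + c_E} \operatorname{err}_{q,\frac{Q}{sq}}\right).
\end{aligned}
\]
Define
\[
\omega(\rho) = \frac{\log Q_{0,\min} + c_+}{\frac{1}{\rho} \log Q_{0,\min} + c_E} = \rho + \frac{c_+ - \rho c_E}{\frac{1}{\rho} \log Q_{0,\min} + c_E}.
\]
Then $\rho \leq (\log Q_0 + c_+)/(\log Q + c_E) \leq \omega(\rho)$ (because $c_+ \geq \rho c_E$). We conclude that (12.25) (and hence (12.23)) holds provided that
\[
\begin{aligned}
(1 - \omega(\rho)) &\left(\log sq - \sum_{p | q} \frac{\log p}{p}\right) + c_\Delta \\
&\geq \frac{q}{\phi(q)} \left(\operatorname{err}_{q,\frac{Q_0}{sq}} + \omega(\rho) \max\left(0, -\operatorname{err}_{q,\frac{Q}{sq}}\right)\right),
\end{aligned} \tag{12.26}
\]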

where $c_\Delta = c_+ - c_E$. Note that $1 - \omega(\rho) > 0$.
First, let us give some easy bounds on the error terms; these bounds will yield upper bounds for $s$. By (12.8) and (12.11),
\[
\operatorname{err}_{q,R} \leq \frac{\phi(q)}{q} \left(\log q - \sum_{p | q} \frac{\log p}{p} + (1.4709 - c_E)\right)
\]
for $R \geq 1$; by (12.15) and (12.11),
\[
\operatorname{err}_{q,R} \geq -\frac{\phi(q)}{q} \left(\sum_{p | q} \frac{\log p}{p} + (c_E - 1.312)\right)
\]
for $R \geq 182$. Therefore, the right side of (12.26) is at most
\[
\log q - (1 - \omega(\rho)) \sum_{p | q} \frac{\log p}{p} + ((1.4709 - c_E) + \omega(\rho)(c_E - 1.312)),
\]
and so (12.26) holds provided that
\[
(1 - \omega(\rho)) \log sq \geq \log q + (1.4709 - c_E) + \omega(\rho)(c_E - 1.312) - c_\Delta. \tag{12.27}
\]
We will thus be able to assume from now on that (12.27) does not hold, or, what is the same, that
\[
sq < (c_{\rho,2}\, q)^{\frac{1}{1 - \omega(\rho)}} \tag{12.28}
\]
holds, where $c_{\rho,2} = \exp((1.4709 - c_E) + \omega(\rho)(c_E - 1.312) - c_\Delta)$.
What values of $R = Q_0/sq$ must we consider for $q$ given? First, by (12.28), we can assume $R > Q_{0,\min}/(c_{\rho,2}\, q)^{1/(1 - \omega(\rho))}$. We can also assume
\[
R > c(c_+) \cdot \max(Rq, Q_{0,\min})^{(1-\rho)e^{-\gamma}} - \log q \tag{12.29}
\]
for $c(c_+)$ as in Lemma 12.2.3, since all smaller $R$ are covered by that Lemma. Clearly, (12.29) implies that
\[
R^{1-\tau} > c(c_+) \cdot q^\tau - \frac{\log q}{R^\tau} > c(c_+) q^\tau - \log q,
\]
where $\tau = (1 - \rho)e^{-\gamma}$, and also that $R > c(c_+) Q_{0,\min}^{(1-\rho)e^{-\gamma}} - \log q$. Iterating, we obtain that we can assume that $R > \varpi(q)$, where
\[
\varpi(q) = \max\left(\varpi_0(q),\ c(c_+) Q_{0,\min}^\tau - \log q,\ \frac{Q_{0,\min}}{(c_{\rho,2}\, q)^{\frac{1}{1 - \omega(\rho)}}}\right) \tag{12.30}
\]
and
\[
\varpi_0(q) = \begin{cases} \left(c(c_+) q^\tau - \dfrac{\log q}{(c(c_+) q^\tau - \log q)^{\frac{\tau}{1-\tau}}}\right)^{\frac{1}{1-\tau}} & \text{if } c(c_+) q^\tau > \log q + 1, \\ 0 & \text{otherwise.} \end{cases}
\]

Looking at (12.26), we see that it will be enough to show that, for all R satisfying
R > $(q), we have
φ(q)
errq,R +ω(ρ) max (0, − errq,tR ) ≤ κ(q) (12.31)
q
for all t ≥ 20000, where
 
X log p
κ(q) = (1 − ω(ρ)) log q −  + c∆ .
p
p|q

Ramaré’s bound (12.12) implies that


| errq,R | ≤ 7.284R−1/3 f1 (q), (12.32)
with f1 (q) as in (12.13), and so
errq,R +ω(ρ) max (0, − errq,tR ) ≤ (1 + βρ ) · 7.284R−1/3 f1 (q),
where βρ = ω(ρ)/200001/3 . This is enough when
 3
q 7.284(1 + βρ )f1 (q)
R ≥ λ(q) = . (12.33)
φ(q) κ(q)
It remains to do two things. First, we have to compute how large $q$ has to be for $\varpi(q)$ to be guaranteed to be greater than $\lambda(q)$. (For such $q$, there is no checking to be done.) Then, we check the inequality (12.31) for all smaller $q$, letting $R$ range through the integers in $[\varpi(q), \lambda(q)]$. We bound $\mathrm{err}_{q,tR}$ using (12.32), but we compute $\mathrm{err}_{q,R}$ directly.
How large must $q$ be for $\varpi(q) > \lambda(q)$ to hold? We claim that $\varpi(q) > \lambda(q)$ whenever $q \geq 2.2 \cdot 10^{10}$. Let us show this.
It is easy to see that $(p/(p-1)) \cdot f_1(p)$ and $p \mapsto (\log p)/p$ are decreasing functions of $p$ for $p \geq 3$; moreover, for both functions, the value at $p \geq 7$ is smaller than for $p = 2$. Hence, we have that, for $q < \prod_{p \leq p_0} p$, $p_0$ a prime,
$$\kappa(q) \geq (1 - \omega(\rho)) \left(\log q - \sum_{p < p_0} \frac{\log p}{p}\right) + c_\Delta \tag{12.34}$$
and
$$\lambda(q) \leq \left(\prod_{p < p_0} \frac{p}{p-1} \cdot \frac{7.284 (1 + \beta_\rho) \prod_{p < p_0} f_1(p)}{(1 - \omega(\rho)) \left(\log q - \sum_{p < p_0} \frac{\log p}{p}\right) + c_\Delta}\right)^{3}. \tag{12.35}$$
If we also assume that $2 \cdot 3 \cdot 5 \cdot 7 \nmid q$, we obtain
$$\kappa(q) \geq (1 - \omega(\rho)) \left(\log q - \sum_{\substack{p < p_0 \\ p \neq 7}} \frac{\log p}{p}\right) + c_\Delta \tag{12.36}$$
and
$$\lambda(q) \leq \left(\prod_{\substack{p < p_0 \\ p \neq 7}} \frac{p}{p-1} \cdot \frac{7.284 (1 + \beta_\rho) \prod_{p < p_0,\, p \neq 7} f_1(p)}{(1 - \omega(\rho)) \left(\log q - \sum_{p < p_0,\, p \neq 7} \frac{\log p}{p}\right) + c_\Delta}\right)^{3} \tag{12.37}$$
for $q < \prod_{p \leq p_0} p$. (We are taking out 7 because it is the “least helpful” prime to omit among all primes from 2 to 7, again by the fact that $(p/(p-1)) \cdot f_1(p)$ and $p \mapsto (\log p)/p$ are decreasing functions for $p \geq 3$.)
We know how to give upper bounds for the expression on the right of (12.35).
The task is in essence simple: we can base our bounds on the classic explicit work in
[RS62], except that we also have to optimize matters so that they are close to tight for
p1 = 29, p1 = 31 and other low p1 .
By [RS62, (3.30)] and a numerical computation for $29 \leq p_1 \leq 43$,
$$\prod_{p \leq p_1} \frac{p}{p-1} < 1.90516 \log p_1$$
for $p_1 \geq 29$. Since $\omega(\rho)$ is increasing on $\rho$ and we are assuming $\rho \leq 0.6$, $Q_{0,\min} = 100000$,
$$\omega(\rho) \leq 0.627312, \qquad \beta_\rho \leq 0.023111.$$
For $x > a$, where $a > 1$ is any constant, we obviously have
$$\sum_{a < p \leq x} \log\left(1 + p^{-2/3}\right) \leq \sum_{a < p \leq x} (\log p)\, \frac{p^{-2/3}}{\log a}.$$
By Abel summation (13.3) and the estimate [RS62, (3.32)] for $\theta(x) = \sum_{p \leq x} \log p$,
$$\begin{aligned}
\sum_{a < p \leq x} (\log p)\, p^{-2/3} &= (\theta(x) - \theta(a))\, x^{-2/3} - \int_a^x (\theta(u) - \theta(a)) \left(-\frac{2}{3} u^{-5/3}\right) du \\
&\leq (1.01624 x - \theta(a))\, x^{-2/3} + \frac{2}{3} \int_a^x (1.01624 u - \theta(a))\, u^{-5/3}\, du \\
&= (1.01624 x - \theta(a))\, x^{-2/3} + 2 \cdot 1.01624\, (x^{1/3} - a^{1/3}) + \theta(a)\, (x^{-2/3} - a^{-2/3}) \\
&= 3 \cdot 1.01624 \cdot x^{1/3} - (2.03248\, a^{1/3} + \theta(a)\, a^{-2/3}).
\end{aligned}$$
We conclude that $\sum_{10^4 < p \leq x} \log(1 + p^{-2/3}) \leq 0.33102\, x^{1/3} - 7.06909$ for $x > 10^4$. Since $\sum_{p \leq 10^4} \log(1 + p^{-2/3}) \leq 10.09062$, this means that
$$\sum_{p \leq x} \log(1 + p^{-2/3}) \leq \left(0.33102 + \frac{10.09062 - 7.06909}{10^{4/3}}\right) x^{1/3} \leq 0.47126\, x^{1/3}$$
for $x > 10^4$; a direct computation for all $x$ prime between 29 and $10^4$ then confirms that
$$\sum_{p \leq x} \log(1 + p^{-2/3}) \leq 0.74914\, x^{1/3}$$
for all $x \geq 29$. Thus,


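$$\prod_{p \leq x} f_1(p) \leq \frac{e^{\sum_{p \leq x} \log(1 + p^{-2/3})}}{\prod_{p \leq 29} \left(1 + \frac{p^{1/3} + p^{2/3}}{p(p-1)}\right)} \leq \frac{e^{0.74914\, x^{1/3}}}{6.62365}$$
for $x \geq 29$. Finally, by [RS62, (3.24)], $\sum_{p \leq p_1} \frac{\log p}{p} < \log p_1$.
We conclude that, for $q < \prod_{p \leq p_0} p$, $p_0$ a prime, and $p_1$ the prime immediately preceding $p_0$,
$$\begin{aligned}
\lambda(q) &\leq \left(1.90516 \log p_1 \cdot \frac{7.45235 \cdot \frac{e^{0.74914\, p_1^{1/3}}}{6.62365}}{0.37268 (\log q - \log p_1) + 0.02741}\right)^{3} \\
&\leq \frac{190.272\, (\log p_1)^3\, e^{2.24742\, p_1^{1/3}}}{(\log q - \log p_1 + 0.07354)^3}.
\end{aligned} \tag{12.38}$$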
It is clear from (12.30) that $\varpi(q)$ is increasing as soon as
$$q \geq \max\left(Q_{0,\min},\; Q_{0,\min}^{1-\omega(\rho)}/c_{\rho,2}\right)$$
and $c(c_+)\, q^{\tau} > \log q + 1$, since then $\varpi_0(q)$ is increasing and $\varpi(q) = \varpi_0(q)$. Here it is useful to recall that $c_{\rho,2} \geq \exp(1.4709 - c_+)$, and to note that $c(c_+)\, q^{\tau} - (\log q + 1)$ is increasing for $q \geq 1/(\tau \cdot c(c_+))^{1/\tau}$; we see also that $1/(\tau \cdot c(c_+))^{1/\tau} \leq 1/((1 - 0.6)e^{-\gamma} c(c_+))^{1/((1-0.6)e^{-\gamma})}$ for $\rho \leq 0.6$. A quick computation for our value of $c_+$ makes us conclude that $q > 1.12\, Q_{0,\min} = 112000$ is a sufficient condition for $\varpi(q)$ to be equal to $\varpi_0(q)$ and for $\varpi_0(q)$ to be increasing.
Since (12.38) is decreasing on $q$ for $p_1$ fixed, and $\varpi_0(q)$ is decreasing on $\rho$ and increasing on $q$, we set $\rho = 0.6$ and check that then
$$\varpi_0\left(2.2 \cdot 10^{10}\right) \geq 846.765,$$
whereas, by (12.38),
$$\lambda(2.2 \cdot 10^{10}) \leq 838.227 < 846.765;$$
this is enough to ensure that $\lambda(q) < \varpi_0(q)$ for $2.2 \cdot 10^{10} \leq q < \prod_{p \leq 31} p$.
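The numerical value quoted for $\lambda(2.2 \cdot 10^{10})$ can be reproduced from the right-hand side of (12.38). The following is a minimal sketch, assuming ordinary double-precision floats instead of the interval arithmetic used in the text; the function name `lambda_bound` is illustrative and does not come from the original computation.

```python
import math

def lambda_bound(q: float, p1: int) -> float:
    """Right-hand side of (12.38): an upper bound for lambda(q),
    valid for q below the product of all primes up to the prime after p1."""
    numerator = 190.272 * math.log(p1) ** 3 * math.exp(2.24742 * p1 ** (1.0 / 3.0))
    denominator = (math.log(q) - math.log(p1) + 0.07354) ** 3
    return numerator / denominator

# 2.2e10 is smaller than the product of the primes up to p0 = 31,
# so the relevant prime p1 is 29.
value = lambda_bound(2.2e10, 29)
print(f"bound (12.38) at q = 2.2e10: {value:.3f}")
```

The printed value should sit just below the threshold 846.765 that the text gives as a lower bound for $\varpi_0(2.2 \cdot 10^{10})$; a rigorous check would need directed rounding.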

Let us now give some rough bounds that will be enough to cover the case $q \geq \prod_{p \leq 31} p$. First, as we already discussed, $\varpi(q) = \varpi_0(q)$ and, since $c(c_+)\, q^{\tau} > \log q + 1$,
$$\varpi_0(q) \geq (c(c_+)\, q^{\tau} - \log q)^{\frac{1}{1-\tau}} \geq (0.911\, q^{0.224} - \log q)^{1.289} \geq q^{0.2797} \tag{12.39}$$
by $q \geq \prod_{p \leq 31} p$. We are in the range $\prod_{p \leq p_1} p \leq q \leq \prod_{p \leq p_0} p$, where $p_1 < p_0$ are two consecutive primes with $p_1 \geq 31$. By [RS62, (3.16)] and a computation for $31 \leq q < 200$, we know that $\log q \geq \sum_{p \leq p_1} \log p \geq 0.8009\, p_1$. By (12.38) and (12.39), it follows that we just have to show that
$$e^{0.224 t} > \frac{190.272\, (\log t)^3\, e^{2.24742\, t^{1/3}}}{(0.8009 t - \log t + 0.07354)^3}$$

for $t \geq 31$. Now, $t \geq 31$ implies $0.8009 t - \log t + 0.07354 \geq 0.6924 t$, and so, taking logarithms, we see that we just have to verify
$$0.224 t - 2.24742\, t^{1/3} > 3 \log \log t - 3 \log t + 6.3513 \tag{12.40}$$
for $t \geq 31$, and, since the left side is increasing and the right side is decreasing for $t \geq 31$, this is trivial to check.
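The check of (12.40) can be sketched in a few lines. This is only an illustration in plain floating point, under the assumption that testing monotonicity on a finite grid is an acceptable stand-in for the derivative argument; the names `lhs` and `rhs` are ours.

```python
import math

def lhs(t: float) -> float:
    """Left side of (12.40)."""
    return 0.224 * t - 2.24742 * t ** (1.0 / 3.0)

def rhs(t: float) -> float:
    """Right side of (12.40)."""
    return 3 * math.log(math.log(t)) - 3 * math.log(t) + 6.3513

# The inequality at the left endpoint t = 31.
gap_at_31 = lhs(31) - rhs(31)

# Grid check that lhs increases and rhs decreases from t = 31 onwards,
# which propagates the inequality to all larger t on the grid.
grid = [31 + 0.5 * k for k in range(2000)]
lhs_increasing = all(lhs(u) < lhs(v) for u, v in zip(grid, grid[1:]))
rhs_decreasing = all(rhs(u) > rhs(v) for u, v in zip(grid, grid[1:]))
print(gap_at_31, lhs_increasing, rhs_decreasing)
```

The gap at $t = 31$ is small but positive, which is why the constants in (12.38) and (12.39) had to be kept fairly tight.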
We conclude that $\varpi(q) > \lambda(q)$ whenever $q \geq 2.2 \cdot 10^{10}$.
It remains to see how we can relax this assumption if we assume that $2 \cdot 3 \cdot 5 \cdot 7 \nmid q$. We repeat the same analysis as before, using (12.36) and (12.37) instead of (12.34) and (12.35). For $p_1 \geq 29$,
$$\prod_{\substack{p \leq p_1 \\ p \neq 7}} \frac{p}{p-1} < 1.633 \log p_1, \qquad \prod_{\substack{p \leq p_1 \\ p \neq 7}} f_1(p) \leq \frac{e^{0.74914\, p_1^{1/3} - \log(1 + 7^{-2/3})}}{5.8478} \leq \frac{e^{0.74914\, p_1^{1/3}}}{7.44586}$$
and $\sum_{p \leq p_1:\, p \neq 7} (\log p)/p < \log p_1 - (\log 7)/7$. So, for $q < \prod_{p \leq p_0:\, p \neq 7} p$, and $p_1 \geq 29$ the prime immediately preceding $p_0$,
$$\begin{aligned}
\lambda(q) &\leq \left(1.633 \log p_1 \cdot \frac{7.45235 \cdot \frac{e^{0.74914\, p_1^{1/3}}}{7.44586}}{0.37268 \left(\log q - \log p_1 + \frac{\log 7}{7}\right) + 0.02741}\right)^{3} \\
&\leq \frac{84.351\, (\log p_1)^3\, e^{2.24742\, p_1^{1/3}}}{(\log q - \log p_1 + 0.35152)^3}.
\end{aligned}$$
Thus we obtain, just like before, that
$$\varpi_0(3.3 \cdot 10^9) \geq 477.465, \qquad \lambda(3.3 \cdot 10^9) \leq 475.513 < 477.465.$$
We also check that $\varpi_0(q_0) \geq 916.322$ is greater than $\lambda(q_0) \leq 429.731$ for $q_0 = \prod_{p \leq 31:\, p \neq 7} p$. The analysis for $q \geq \prod_{p \leq 37:\, p \neq 7} p$ is also just like before: since $\log q \geq 0.8009\, p_1 - \log 7$, we have to show that
$$\frac{e^{0.224 t}}{7} > \frac{84.351\, (\log t)^3\, e^{2.24742\, t^{1/3}}}{(0.8009 t - \log t + 0.07354)^3}$$
for $t \geq 37$, and that, in turn, follows from
$$0.224 t - 2.24742\, t^{1/3} > 3 \log \log t - 3 \log t + 6.74849,$$
which we check for $t \geq 37$ just as we checked (12.40).
We conclude that $\varpi(q) > \lambda(q)$ if $q \geq 3.3 \cdot 10^9$ and $210 \nmid q$.
Computation. Now, for $q < 3.3 \cdot 10^9$ (and also for $3.3 \cdot 10^9 \leq q < 2.2 \cdot 10^{10}$, $210 | q$), we need to check that the maximum $m_{q,R,1}$ of $\mathrm{err}_{q,R}$ over all $\varpi(q) \leq R < \lambda(q)$ satisfies (12.31). Note that there is a term $\mathrm{err}_{q,tR}$ in (12.31); we bound it using (12.32).
Since $\log R$ is increasing on $R$ and $G_q(R)$ depends only on $\lfloor R \rfloor$, we can tell from (12.24) that, since we are taking the maximum of $\mathrm{err}_{q,R}$, it is enough to check integer values of $R$. We check all integers $R$ in $[\varpi(q), \lambda(q))$ for all $q < 3.3 \cdot 10^9$ (and all $3.3 \cdot 10^9 \leq q < 2.2 \cdot 10^{10}$, $210 | q$) by an explicit computation (see footnote 2).

Finally, we have the trivial bound
$$\frac{G_q(Q_0/sq)}{G_q(Q/sq)} \leq 1, \tag{12.41}$$
which we shall use for $Q_0$ close to $Q$.

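Corollary 12.2.5. Let $\{a_n\}_{n=1}^{\infty}$, $a_n \in \mathbb{C}$, be supported on the primes. Assume that $\{a_n\}$ is in $\ell_1 \cap \ell_2$ and that $a_n = 0$ for $n \leq \sqrt{x}$. Let $Q_0 \geq 10^5$, $\delta_0 \geq 1$ be such that $(20000\, Q_0)^2 \leq x/2\delta_0$; set $Q = \sqrt{x/2\delta_0}$.
Let $S(\alpha) = \sum_n a_n e(\alpha n)$ for $\alpha \in \mathbb{R}/\mathbb{Z}$. Let $\mathfrak{M}$ be as in (12.1). Then, if $Q_0 \leq Q^{0.6}$,
$$\int_{\mathfrak{M}} |S(\alpha)|^2\, d\alpha \leq \frac{\log Q_0 + c_+}{\log Q + c_E} \sum_n |a_n|^2,$$
where $c_+ = 1.36$ and $c_E = \gamma + \sum_{p \geq 2} (\log p)/(p(p-1)) = 1.3325822\ldots$.
Let $\mathfrak{M}_{\delta_0,Q_0}$ be as in (10.5). Then, if $(2Q_0) \leq (2Q)^{0.6}$,
$$\int_{\mathfrak{M}_{\delta_0,Q_0}} |S(\alpha)|^2\, d\alpha \leq \frac{\log 2Q_0 + c_+}{\log 2Q + c_E} \sum_n |a_n|^2. \tag{12.42}$$

Here, of course, $\int_{\mathbb{R}/\mathbb{Z}} |S(\alpha)|^2\, d\alpha = \sum_n |a_n|^2$ (Plancherel). If $Q_0 > Q^{0.6}$, we will use the trivial bound
$$\int_{\mathfrak{M}_{\delta_0,r}} |S(\alpha)|^2\, d\alpha \leq \int_{\mathbb{R}/\mathbb{Z}} |S(\alpha)|^2\, d\alpha = \sum_n |a_n|^2. \tag{12.43}$$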

Proof. Immediate from Prop. 12.1.1, Prop. 12.1.2 and Prop. 12.2.4.

Obviously, one can also give a statement derived from Prop. 12.1.1; the resulting bound is
$$\int_{\mathfrak{M}} |S(\alpha)|^2\, d\alpha \leq \frac{\log Q_0 + c_+}{\log Q + c_E} \sum_n |a_n|^2,$$
where $\mathfrak{M}$ is as in (12.1).
We also record the large-sieve form of the result.
Footnote 2. This is by far the heaviest computation in the present work, though it is still rather minor (about two weeks of computing on a single core of a fairly new (2010) desktop computer carrying out other tasks as well; this is next to nothing compared to the computations in [Plab], or even those in [HP13]). For the applications here, we could have assumed $\rho \leq 8/15$, and that would have reduced computation time drastically; the lighter assumption $\rho \leq 0.6$ was made with a view to general applicability in the future. As elsewhere in this section, numerical computations were carried out by the author in C; all floating-point operations used D. Platt’s interval arithmetic package.

Corollary 12.2.6. Let $N \geq 1$. Let $\{a_n\}_{n=1}^{\infty}$, $a_n \in \mathbb{C}$, be supported on the integers $n \leq N$. Let $Q_0 \geq 10^5$, $Q \geq 20000\, Q_0$. Assume that $a_n = 0$ for every $n$ for which there is a $p \leq Q$ dividing $n$.
Let $S(\alpha) = \sum_n a_n e(\alpha n)$ for $\alpha \in \mathbb{R}/\mathbb{Z}$. Then, if $Q_0 \leq Q^{0.6}$,
$$\sum_{q \leq Q_0} \sum_{\substack{a \bmod q \\ (a,q)=1}} |S(a/q)|^2 \leq \frac{\log Q_0 + c_+}{\log Q + c_E} \cdot (N + Q^2) \sum_n |a_n|^2,$$
where $c_+ = 1.36$ and $c_E = \gamma + \sum_{p \geq 2} (\log p)/(p(p-1)) = 1.3325822\ldots$.
Proof. Proceed as Ramaré does in the proof of [Ram09, Thm. 5.2] (with $K_q = \{a \in \mathbb{Z}/q\mathbb{Z} : (a,q) = 1\}$ and $u_n = a_n$); in particular, apply [Ram09, Thm. 2.1]. The proof of [Ram09, Thm. 5.2] shows that
$$\sum_{q \leq Q_0} \sum_{\substack{a \bmod q \\ (a,q)=1}} |S(a/q)|^2 \leq \max_{q \leq Q_0} \frac{G_q(Q_0)}{G_q(Q)} \cdot \sum_{q \leq Q_0} \sum_{\substack{a \bmod q \\ (a,q)=1}} |S(a/q)|^2.$$
Now, instead of using the easy inequality $G_q(Q_0)/G_q(Q) \leq G_1(Q_0)/G_1(Q/Q_0)$, use Prop. 12.2.4.

***

It would seem desirable to prove a result such as Prop. 12.2.4 (or Cor. 12.2.5, or Cor. 12.2.6) without computations and with conditions that are as weak as possible. Since, as we said, we cannot make $c_+$ equal to $c_E$, and since $c_+$ does have to increase when the conditions are weakened (as is shown by computations; this is not an artifact of our method of proof), the right goal might be to show that the maximum of $G_q(Q_0/sq)/G_q(Q/sq)$ is reached when $s = q = 1$.
However, this is also untrue without conditions. For instance, for $Q_0 = 2$ and $Q$ large, the value of $G_q(Q_0/q)/G_q(Q/q)$ at $q = 2$ is larger than at $q = 1$: by (12.12),
$$\frac{G_2\left(\frac{Q_0}{2}\right)}{G_2\left(\frac{Q}{2}\right)} \sim \frac{1}{\frac{1}{2}\left(\log \frac{Q}{2} + c_E + \frac{\log 2}{2}\right)} = \frac{2}{\log Q + c_E - \frac{\log 2}{2}} > \frac{2}{\log Q + c_E} \sim \frac{G(Q_0)}{G(Q)}.$$
Thus, at the very least, a lower bound on $Q_0$ is needed as a condition. This also dims the hopes somewhat for a combinatorial proof of $G_q(Q_0/q)\, G(Q) \leq G_q(Q/q)\, G(Q_0)$; at any rate, while such a proof would be welcome, it could not be extremely straightforward, since there are terms in $G_q(Q_0/q)\, G(Q)$ that do not appear in $G_q(Q/q)\, G(Q_0)$.
Chapter 13

The integral over the minor arcs

The time has come to bound the part of our triple-product integral (10.3) that comes from the minor arcs $\mathfrak{m} \subset \mathbb{R}/\mathbb{Z}$. We have an $\ell_\infty$ estimate (from Prop. 11.2.3, based on Theorem 3.1.1) and an $\ell_2$ estimate (from §12.2). Now we must put them together.
There are two ways in which we must be careful. A trivial bound of the form $\ell_3^3 = \int |S(\alpha)|^3\, d\alpha \leq \ell_2^2 \cdot \ell_\infty$ would introduce a fatal factor of $\log x$ coming from $\ell_2$. We avoid this by using the fact that we have $\ell_2$ estimates over $\mathfrak{M}_{\delta_0,Q_0}$ for varying $Q_0$.
We must also remember to subtract the major-arc contribution from our estimate for $\mathfrak{M}_{\delta_0,Q_0}$; this is why we were careful to give a lower bound in Lem. 10.3.1, as opposed to just the upper bound (10.28).

13.1 Putting together $\ell_2$ bounds over arcs and $\ell_\infty$ bounds


Let us start with a simple lemma – essentially a way to obtain upper bounds by means
of summation by parts.

Lemma 13.1.1. Let $f, g : \{a, a+1, \ldots, b\} \to \mathbb{R}_0^+$, where $a, b \in \mathbb{Z}^+$. Assume that, for all $x \in [a,b]$,
$$\sum_{a \leq n \leq x} f(n) \leq F(x), \tag{13.1}$$
where $F : [a,b] \to \mathbb{R}$ is continuous, piecewise differentiable and non-decreasing. Then
$$\sum_{n=a}^{b} f(n) \cdot g(n) \leq \left(\max_{n \geq a} g(n)\right) \cdot F(a) + \int_a^b \left(\max_{n \geq u} g(n)\right) \cdot F'(u)\, du.$$

Proof. Let $S(n) = \sum_{m=a}^{n} f(m)$. Then, by partial summation,
$$\sum_{n=a}^{b} f(n) \cdot g(n) \leq S(b)\, g(b) + \sum_{n=a}^{b-1} S(n) (g(n) - g(n+1)). \tag{13.2}$$
Let $h(x) = \max_{x \leq n \leq b} g(n)$. Then $h$ is non-increasing. Hence (13.1) and (13.2) imply that
$$\begin{aligned}
\sum_{n=a}^{b} f(n) g(n) &\leq \sum_{n=a}^{b} f(n) h(n) \\
&\leq S(b) h(b) + \sum_{n=a}^{b-1} S(n) (h(n) - h(n+1)) \\
&\leq F(b) h(b) + \sum_{n=a}^{b-1} F(n) (h(n) - h(n+1)).
\end{aligned}$$
In general, for $\alpha_n \in \mathbb{C}$, $A(x) = \sum_{a \leq n \leq x} \alpha_n$ and $F$ continuous and piecewise differentiable on $[a,x]$,
$$\sum_{a \leq n \leq x} \alpha_n F(n) = A(x) F(x) - \int_a^x A(u) F'(u)\, du. \qquad \text{(Abel summation)} \tag{13.3}$$
Applying this with $\alpha_n = h(n) - h(n+1)$ and $A(x) = \sum_{a \leq n \leq x} \alpha_n = h(a) - h(\lfloor x \rfloor + 1)$, we obtain
$$\begin{aligned}
\sum_{n=a}^{b-1} F(n) (h(n) - h(n+1)) &= (h(a) - h(b)) F(b-1) - \int_a^{b-1} (h(a) - h(\lfloor u \rfloor + 1)) F'(u)\, du \\
&= h(a) F(a) - h(b) F(b-1) + \int_a^{b-1} h(\lfloor u \rfloor + 1) F'(u)\, du \\
&= h(a) F(a) - h(b) F(b-1) + \int_a^{b-1} h(u) F'(u)\, du \\
&= h(a) F(a) - h(b) F(b) + \int_a^{b} h(u) F'(u)\, du,
\end{aligned}$$
since $h(\lfloor u \rfloor + 1) = h(u)$ for $u \notin \mathbb{Z}$. Hence
$$\sum_{n=a}^{b} f(n) g(n) \leq h(a) F(a) + \int_a^b h(u) F'(u)\, du.$$
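To see Lemma 13.1.1 at work on a concrete case, here is a small numerical sketch. The choices $f(n) = 1/n$, $F(x) = 1 + \log x$ and the oscillating $g$ are illustrative assumptions of ours, not data from the text; the helper `lemma_bound` evaluates the right-hand side exactly as a finite sum, using the fact that $h(u) = \max_{u \leq n \leq b} g(n)$ is constant on each interval $(k-1, k]$.

```python
import math

def lemma_bound(g, F, a, b):
    """h(a) F(a) + integral over [a, b] of h(u) F'(u) du,
    where h(u) is the maximum of g(n) over integers u <= n <= b."""
    # h[n] = max of g over n..b, built from the right.
    h = {}
    running = float("-inf")
    for n in range(b, a - 1, -1):
        running = max(running, g(n))
        h[n] = running
    # On (k-1, k] we have h(u) = h[k], so the integral is a finite sum.
    total = h[a] * F(a)
    for k in range(a + 1, b + 1):
        total += h[k] * (F(k) - F(k - 1))
    return total

a, b = 1, 50
f = lambda n: 1.0 / n                              # harmonic weights
g = lambda n: (2 + (-1) ** n) / math.sqrt(n)       # not monotone on purpose
F = lambda x: 1.0 + math.log(x)                    # majorant for partial sums of f

# Hypothesis (13.1): it suffices to test integer x, since F is non-decreasing.
partial, hypothesis_ok = 0.0, True
for n in range(a, b + 1):
    partial += f(n)
    hypothesis_ok = hypothesis_ok and partial <= F(n)

lhs = sum(f(n) * g(n) for n in range(a, b + 1))
rhs = lemma_bound(g, F, a, b)
print(hypothesis_ok, lhs, rhs)
```

Taking $g$ non-monotone shows why the statement is phrased with $\max_{n \geq u} g(n)$ rather than with $g$ itself.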

We will now see our main application of Lemma 13.1.1. We have to bound an integral of the form $\int_{\mathfrak{M}_{\delta_0,r}} |S_1(\alpha)|^2 |S_2(\alpha)|\, d\alpha$, where $\mathfrak{M}_{\delta_0,r}$ is a union of arcs defined as in (10.5). Our inputs are (a) a bound on integrals of the form $\int_{\mathfrak{M}_{\delta_0,r}} |S_1(\alpha)|^2\, d\alpha$, (b) a bound on $|S_2(\alpha)|$ for $\alpha \in (\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{\delta_0,r}$. The input of type (a) is what we derived in §12.1 and §12.2; the input of type (b) is a minor-arcs bound, and as such was the main subject of Part I.

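Proposition 13.1.2. Let $S_1(\alpha) = \sum_n a_n e(\alpha n)$, $a_n \in \mathbb{C}$, $\{a_n\}$ in $L^1$. Let $S_2 : \mathbb{R}/\mathbb{Z} \to \mathbb{C}$ be continuous. Define $\mathfrak{M}_{\delta_0,r}$ as in (10.5).
Let $r_0$ be a positive integer not greater than $r_1$. Let $H : [r_0, r_1] \to \mathbb{R}^+$ be a continuous, piecewise differentiable, non-decreasing function such that
$$\frac{1}{\sum |a_n|^2} \int_{\mathfrak{M}_{\delta_0,r+1}} |S_1(\alpha)|^2\, d\alpha \leq H(r) \tag{13.4}$$
for some $\delta_0 \leq x/2r_1^2$ and all $r \in [r_0, r_1]$. Assume, moreover, that $H(r_1) = 1$. Let $g : [r_0, r_1] \to \mathbb{R}^+$ be a non-increasing function such that
$$\max_{\alpha \in (\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{\delta_0,r}} |S_2(\alpha)| \leq g(r) \tag{13.5}$$
for all $r \in [r_0, r_1]$ and $\delta_0$ as above.
Then
$$\frac{1}{\sum_n |a_n|^2} \int_{(\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{\delta_0,r_0}} |S_1(\alpha)|^2 |S_2(\alpha)|\, d\alpha \leq g(r_0) \cdot (H(r_0) - I_0) + \int_{r_0}^{r_1} g(r) H'(r)\, dr, \tag{13.6}$$
where
$$I_0 = \frac{1}{\sum_n |a_n|^2} \int_{\mathfrak{M}_{\delta_0,r_0}} |S_1(\alpha)|^2\, d\alpha. \tag{13.7}$$
The condition $\delta_0 \leq x/2r_1^2$ is there just to ensure that the arcs in the definition of $\mathfrak{M}_{\delta_0,r}$ do not overlap for $r \leq r_1$.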

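Proof. For $r_0 \leq r < r_1$, let
$$f(r) = \frac{1}{\sum_n |a_n|^2} \int_{\mathfrak{M}_{\delta_0,r+1} \setminus \mathfrak{M}_{\delta_0,r}} |S_1(\alpha)|^2\, d\alpha.$$
Let
$$f(r_1) = \frac{1}{\sum_n |a_n|^2} \int_{(\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{\delta_0,r_1}} |S_1(\alpha)|^2\, d\alpha.$$
Then, by (13.5),
$$\frac{1}{\sum_n |a_n|^2} \int_{(\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{\delta_0,r_0}} |S_1(\alpha)|^2 |S_2(\alpha)|\, d\alpha \leq \sum_{r=r_0}^{r_1} f(r) g(r).$$
By (13.4),
$$\begin{aligned}
\sum_{r_0 \leq r \leq x} f(r) &= \frac{1}{\sum_n |a_n|^2} \int_{\mathfrak{M}_{\delta_0,x+1} \setminus \mathfrak{M}_{\delta_0,r_0}} |S_1(\alpha)|^2\, d\alpha \\
&= \left(\frac{1}{\sum_n |a_n|^2} \int_{\mathfrak{M}_{\delta_0,x+1}} |S_1(\alpha)|^2\, d\alpha\right) - I_0 \leq H(x) - I_0
\end{aligned} \tag{13.8}$$
for $x \in [r_0, r_1)$. Moreover,
$$\begin{aligned}
\sum_{r_0 \leq r \leq r_1} f(r) &= \frac{1}{\sum_n |a_n|^2} \int_{(\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{\delta_0,r_0}} |S_1(\alpha)|^2 \\
&= \left(\frac{1}{\sum_n |a_n|^2} \int_{\mathbb{R}/\mathbb{Z}} |S_1(\alpha)|^2\right) - I_0 = 1 - I_0 = H(r_1) - I_0.
\end{aligned}$$
We let $F(x) = H(x) - I_0$ and apply Lemma 13.1.1 with $a = r_0$, $b = r_1$. We obtain that
$$\begin{aligned}
\sum_{r=r_0}^{r_1} f(r) g(r) &\leq \left(\max_{r \geq r_0} g(r)\right) F(r_0) + \int_{r_0}^{r_1} \left(\max_{r \geq u} g(r)\right) F'(u)\, du \\
&\leq g(r_0) (H(r_0) - I_0) + \int_{r_0}^{r_1} g(u) H'(u)\, du.
\end{aligned}$$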

13.2 The minor-arc total


We now apply Prop. 13.1.2. Inevitably, the main statement involves some integrals that
will have to be evaluated at the end of the section.

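Theorem 13.2.1. Let $x \geq 10^{25} \cdot \kappa$, where $\kappa \geq 1$. Let
$$S_\eta(\alpha, x) = \sum_n \Lambda(n) e(\alpha n) \eta(n/x). \tag{13.9}$$
Let $\eta_*(t) = (\eta_2 *_M \varphi)(\kappa t)$, where $\eta_2$ is as in (11.10) and $\varphi : [0,\infty) \to [0,\infty)$ is continuous and in $\ell_1$. Let $\eta_+ : [0,\infty) \to [0,\infty)$ be a bounded, piecewise differentiable function with $\lim_{t \to \infty} \eta_+(t) = 0$. Let $\mathfrak{M}_{\delta_0,r}$ be as in (10.5) with $\delta_0 = 8$. Let $10^5 \leq r_0 < r_1$, where $r_1 = (3/8)(x/\kappa)^{4/15}$. Let $g(r) = g_{x/\kappa,\varphi}(r)$, where
$$g_{y,\varphi}(r) = \frac{(R_{y,K,\varphi,2r} \log 2r + 0.5) \sqrt{z(r)} + 2.5}{\sqrt{2r}} + \frac{L_{2r}}{r} + 3.36\, K^{1/6} y^{-1/6}, \tag{13.10}$$
just as in (11.19), and $K = \log(x/\kappa)/2$. Here $R_{y,K,\varphi,t}$ is as in (11.19), and $L_t$ is as in (11.13).
Denote
$$Z_{r_0} = \int_{(\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{8,r_0}} |S_{\eta_*}(\alpha, x)|\, |S_{\eta_+}(\alpha, x)|^2\, d\alpha.$$
Then
$$Z_{r_0} \leq \left(\sqrt{\frac{|\varphi|_1 x}{\kappa} (M + T)} + \sqrt{S_{\eta_*}(0, x) \cdot E}\right)^{2},$$
where
$$\begin{aligned}
S &= \sum_{p > \sqrt{x}} (\log p)^2\, \eta_+^2(p/x), \\
T &= C_{\varphi,3}\left(\frac{1}{2} \log \frac{x}{\kappa}\right) \cdot \left(S - (\sqrt{J} - \sqrt{E})^2\right), \\
J &= \int_{\mathfrak{M}_{8,r_0}} |S_{\eta_+}(\alpha, x)|^2\, d\alpha, \\
E &= \left((C_{\eta_+,0} + C_{\eta_+,2}) \log x + (2 C_{\eta_+,0} + C_{\eta_+,1})\right) \cdot x^{1/2},
\end{aligned} \tag{13.11}$$
$$\begin{aligned}
C_{\eta_+,0} &= 0.7131 \int_0^\infty \frac{1}{\sqrt{t}} \left(\sup_{r \geq t} \eta_+(r)\right)^2 dt, \\
C_{\eta_+,1} &= 0.7131 \int_1^\infty \frac{\log t}{\sqrt{t}} \left(\sup_{r \geq t} \eta_+(r)\right)^2 dt, \\
C_{\eta_+,2} &= 0.51942\, |\eta_+|_\infty^2, \\
C_{\varphi,3}(K) &= \frac{1.04488}{|\varphi|_1} \int_0^{1/K} |\varphi(w)|\, dw
\end{aligned} \tag{13.12}$$
and
$$\begin{aligned}
M &= g(r_0) \cdot \left(\frac{\log(r_0 + 1) + c_+}{\log \sqrt{x} + c_-} \cdot S - (\sqrt{J} - \sqrt{E})^2\right) \\
&\quad + \left(\frac{2}{\log x + 2c_-} \int_{r_0}^{r_1} \frac{g(r)}{r}\, dr + \left(\frac{7}{15} + \frac{-2.14938 + \frac{8}{15} \log \kappa}{\log x + 2c_-}\right) g(r_1)\right) \cdot S,
\end{aligned} \tag{13.13}$$
where $c_+ = 2.0532$ and $c_- = 0.6394$.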

Proof. Let $y = x/\kappa$. Let $Q = (3/4) y^{2/3}$, as in Thm. 3.1.1 (applied with $y$ instead of $x$). Let $\alpha \in (\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{8,r}$, where $r_0 \leq r \leq y^{1/3}/6$ and $y$ is used instead of $x$ to define $\mathfrak{M}_{8,r}$ (see (10.5)). There exists an approximation $2\alpha = a/q + \delta/y$ with $q \leq Q$, $|\delta|/y \leq 1/qQ$. Thus, $\alpha = a'/q' + \delta/2y$, where either $a'/q' = a/2q$ or $a'/q' = (a+q)/2q$ holds. (In particular, if $q'$ is odd, then $q' = q$; if $q'$ is even, then $q' = 2q$.)
There are three cases:

1. $q \leq r$. Then either (a) $q'$ is odd and $q' \leq r$ or (b) $q'$ is even and $q' \leq 2r$. Since $\alpha$ is not in $\mathfrak{M}_{8,r}$, then, by definition (10.5), $|\delta|/2y \geq \delta_0 r/2qy$, and so $|\delta| \geq \delta_0 r/q = 8r/q$. In particular, $|\delta| \geq 8$.
Thus, by Prop. 11.2.3,
$$|S_{\eta_*}(\alpha, x)| = |S_{\eta_2 *_M \varphi}(\alpha, y)| \leq g_{y,\varphi}\left(\frac{|\delta|}{8} q\right) \cdot |\varphi|_1 y \leq g_{y,\varphi}(r) \cdot |\varphi|_1 y, \tag{13.14}$$
where we use the fact that $g(r)$ is a non-increasing function (Lemma 11.2.4).

2. $r < q \leq y^{1/3}/6$. Then, by Prop. 11.2.3 and Lemma 11.2.4,
$$|S_{\eta_*}(\alpha, x)| = |S_{\eta_2 *_M \varphi}(\alpha, y)| \leq g_{y,\varphi}\left(\max\left(\frac{|\delta|}{8}, 1\right) q\right) \cdot |\varphi|_1 y \leq g_{y,\varphi}(r) \cdot |\varphi|_1 y. \tag{13.15}$$

3. $q > y^{1/3}/6$. Again by Prop. 11.2.3,
$$|S_{\eta_*}(\alpha, x)| = |S_{\eta_2 *_M \varphi}(\alpha, y)| \leq \left(h\left(\frac{y}{K}\right) + C_{\varphi,3}(K)\right) |\varphi|_1 y, \tag{13.16}$$
where $h(x)$ is as in (11.15). (Of course, $C_{\varphi,3}(K)$, as in (13.12), is equal to $C_{\varphi,0,K}/|\varphi|_1$, where $C_{\varphi,0,K}$ is as in (11.21).) We set $K = (\log y)/2$. Since $y = x/\kappa \geq 10^{25}$, it follows that $y/K = 2y/\log y > 3.47 \cdot 10^{23} > 2.16 \cdot 10^{20}$.

Let
$$r_1 = \frac{3}{8} y^{4/15}, \qquad g(r) = \begin{cases} g_{y,\varphi}(r) & \text{if } r \leq r_1, \\ g_{y,\varphi}(r_1) & \text{if } r > r_1. \end{cases}$$
By Lemma 11.2.4, for $r \geq 670$, $g(r)$ is a non-increasing function and $g(r) \geq g_{y,\varphi}(r)$. Moreover, by Lemma 11.2.5, $g_{y,\varphi}(r_1) \geq h(2y/\log y)$, where $h$ is as in (11.15), and so $g(r) \geq h(2y/\log y)$ for all $r \geq r_0 \geq 670$. Thus, we have shown that
$$|S_{\eta_*}(\alpha, x)| \leq \left(g(r) + C_{\varphi,3}\left(\frac{\log y}{2}\right)\right) \cdot |\varphi|_1 y \tag{13.17}$$
for all $\alpha \in (\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{8,r}$.


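We first need to undertake the fairly dull task of getting non-prime or small $n$ out of the sum defining $S_{\eta_+}(\alpha, x)$. Write
$$\begin{aligned}
S_{1,\eta_+}(\alpha, x) &= \sum_{p > \sqrt{x}} (\log p)\, e(\alpha p)\, \eta_+(p/x), \\
S_{2,\eta_+}(\alpha, x) &= \sum_{\substack{n \text{ non-prime} \\ n > \sqrt{x}}} \Lambda(n) e(\alpha n) \eta_+(n/x) + \sum_{n \leq \sqrt{x}} \Lambda(n) e(\alpha n) \eta_+(n/x).
\end{aligned}$$
By the triangle inequality (with weights $|S_{\eta_+}(\alpha, x)|$),
$$\sqrt{\int_{(\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{8,r_0}} |S_{\eta_*}(\alpha, x)|\, |S_{\eta_+}(\alpha, x)|^2\, d\alpha} \leq \sum_{j=1}^{2} \sqrt{\int_{(\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{8,r_0}} |S_{\eta_*}(\alpha, x)|\, |S_{j,\eta_+}(\alpha, x)|^2\, d\alpha}.$$
Clearly,
$$\begin{aligned}
\int_{(\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{8,r_0}} |S_{\eta_*}(\alpha, x)|\, |S_{2,\eta_+}(\alpha, x)|^2\, d\alpha &\leq \max_{\alpha \in \mathbb{R}/\mathbb{Z}} |S_{\eta_*}(\alpha, x)| \cdot \int_{\mathbb{R}/\mathbb{Z}} |S_{2,\eta_+}(\alpha, x)|^2\, d\alpha \\
&\leq \sum_{n=1}^{\infty} \Lambda(n) \eta_*(n/x) \cdot \left(\sum_{n \text{ non-prime}} \Lambda(n)^2 \eta_+(n/x)^2 + \sum_{n \leq \sqrt{x}} \Lambda(n)^2 \eta_+(n/x)^2\right).
\end{aligned}$$
Let $\overline{\eta_+}(z) = \sup_{t \geq z} \eta_+(t)$. Since $\eta_+(t)$ tends to 0 as $t \to \infty$, so does $\overline{\eta_+}$. By [RS62, Thm. 13], partial summation and integration by parts,
$$\begin{aligned}
\sum_{n \text{ non-prime}} \Lambda(n)^2 \eta_+(n/x)^2 &\leq \sum_{n \text{ non-prime}} \Lambda(n)^2\, \overline{\eta_+}(n/x)^2 \\
&\leq -\int_1^\infty \left(\sum_{\substack{n \leq t \\ n \text{ non-prime}}} \Lambda(n)^2\right) \left(\overline{\eta_+}^2(t/x)\right)' dt \\
&\leq -\int_1^\infty (\log t) \cdot 1.4262 \sqrt{t}\, \left(\overline{\eta_+}^2(t/x)\right)' dt \\
&\leq 0.7131 \int_1^\infty \frac{\log e^2 t}{\sqrt{t}} \cdot \overline{\eta_+}^2\left(\frac{t}{x}\right) dt \\
&= \left(0.7131 \int_{1/x}^\infty \frac{2 + \log tx}{\sqrt{t}}\, \overline{\eta_+}^2(t)\, dt\right) \sqrt{x},
\end{aligned}$$
while, by [RS62, Thm. 12],
$$\sum_{n \leq \sqrt{x}} \Lambda(n)^2 \eta_+(n/x)^2 \leq \frac{1}{2} |\eta_+|_\infty^2 (\log x) \sum_{n \leq \sqrt{x}} \Lambda(n) \leq 0.51942\, |\eta_+|_\infty^2 \cdot \sqrt{x} \log x.$$
This shows that
$$\int_{(\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{8,r_0}} |S_{\eta_*}(\alpha, x)|\, |S_{2,\eta_+}(\alpha, x)|^2\, d\alpha \leq \sum_{n=1}^{\infty} \Lambda(n) \eta_*(n/x) \cdot E = S_{\eta_*}(0, x) \cdot E,$$
where $E$ is as in (13.11).
It remains to bound
$$\int_{(\mathbb{R}/\mathbb{Z}) \setminus \mathfrak{M}_{8,r_0}} |S_{\eta_*}(\alpha, x)|\, |S_{1,\eta_+}(\alpha, x)|^2\, d\alpha. \tag{13.18}$$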

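We wish to apply Prop. 13.1.2. Corollary 12.2.5 gives us an input of type (13.4); we have just derived a bound (13.17) that provides an input of type (13.5). More precisely, by (12.42), (13.4) holds with
$$H(r) = \begin{cases} \dfrac{\log(r+1) + c_+}{\log \sqrt{x} + c_-} & \text{if } r < r_1, \\ 1 & \text{if } r \geq r_1, \end{cases}$$
where $c_+ = 2.0532 > \log 2 + 1.36$ and $c_- = 0.6394 < \log(1/\sqrt{2 \cdot 8}) + \log 2 + 1.3325822$. (We can apply Corollary 12.2.5 because $2(r_1 + 1) = (3/4) x^{4/15} + 2 \leq (2\sqrt{x/16})^{0.6}$ for $x \geq 10^{25}$ (or even for $x \geq 100000$).) Since $r_1 = (3/8) y^{4/15}$ and $x \geq 10^{25} \cdot \kappa$,
$$\begin{aligned}
\lim_{r \to r_1^+} H(r) - \lim_{r \to r_1^-} H(r) &= 1 - \frac{\log((3/8)(x/\kappa)^{4/15} + 1) + c_+}{\log \sqrt{x} + c_-} \\
&\leq 1 - \left(\frac{4/15}{1/2} + \frac{\log \frac{3}{8} + c_+ - \frac{4}{15} \log \kappa - \frac{8}{15} c_-}{\log \sqrt{x} + c_-}\right) \\
&\leq \frac{7}{15} + \frac{-2.14938 + \frac{8}{15} \log \kappa}{\log x + 2 c_-}.
\end{aligned}$$
We also have (13.5) with
$$\left(g(r) + C_{\varphi,3}\left(\frac{\log y}{2}\right)\right) \cdot |\varphi|_1 y \tag{13.19}$$
instead of $g(r)$ (by (13.17)). Here (13.19) is a non-increasing function of $r$ because $g(r)$ is, as we already checked. Hence, Prop. 13.1.2 gives us that (13.18) is at most
$$\begin{aligned}
&g(r_0) \cdot (H(r_0) - I_0) + (1 - I_0) \cdot C_{\varphi,3}\left(\frac{\log y}{2}\right) \\
&\quad + \frac{1}{\log \sqrt{x} + c_-} \int_{r_0}^{r_1} \frac{g(r)}{r+1}\, dr + \left(\frac{7}{15} + \frac{-2.14938 + \frac{8}{15} \log \kappa}{\log x + 2 c_-}\right) g(r_1)
\end{aligned} \tag{13.20}$$
times $|\varphi|_1 y \cdot \sum_{p > \sqrt{x}} (\log p)^2 \eta_+^2(p/x)$, where
$$I_0 = \frac{1}{\sum_{p > \sqrt{x}} (\log p)^2 \eta_+^2(p/x)} \int_{\mathfrak{M}_{8,r_0}} |S_{1,\eta_+}(\alpha, x)|^2\, d\alpha. \tag{13.21}$$
By the triangle inequality,
$$\begin{aligned}
\sqrt{\int_{\mathfrak{M}_{8,r_0}} |S_{1,\eta_+}(\alpha, x)|^2\, d\alpha} &= \sqrt{\int_{\mathfrak{M}_{8,r_0}} |S_{\eta_+}(\alpha, x) - S_{2,\eta_+}(\alpha, x)|^2\, d\alpha} \\
&\geq \sqrt{\int_{\mathfrak{M}_{8,r_0}} |S_{\eta_+}(\alpha, x)|^2\, d\alpha} - \sqrt{\int_{\mathfrak{M}_{8,r_0}} |S_{2,\eta_+}(\alpha, x)|^2\, d\alpha} \\
&\geq \sqrt{\int_{\mathfrak{M}_{8,r_0}} |S_{\eta_+}(\alpha, x)|^2\, d\alpha} - \sqrt{\int_{\mathbb{R}/\mathbb{Z}} |S_{2,\eta_+}(\alpha, x)|^2\, d\alpha}.
\end{aligned}$$
As we already showed,
$$\int_{\mathbb{R}/\mathbb{Z}} |S_{2,\eta_+}(\alpha, x)|^2\, d\alpha = \sum_{\substack{n \text{ non-prime} \\ \text{or } n \leq \sqrt{x}}} \Lambda(n)^2 \eta_+(n/x)^2 \leq E.$$
Thus,
$$I_0 \cdot S \geq (\sqrt{J} - \sqrt{E})^2,$$
and so we are done.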

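We now should estimate the integral $\int_{r_0}^{r_1} \frac{g(r)}{r}\, dr$ in (13.13). It is easy to see that
$$\begin{aligned}
&\int_{r_0}^\infty \frac{1}{r^{3/2}}\, dr = \frac{2}{r_0^{1/2}}, \qquad \int_{r_0}^\infty \frac{\log r}{r^2}\, dr = \frac{\log e r_0}{r_0}, \qquad \int_{r_0}^\infty \frac{1}{r^2}\, dr = \frac{1}{r_0}, \\
&\int_{r_0}^{r_1} \frac{1}{r}\, dr = \log \frac{r_1}{r_0}, \qquad \int_{r_0}^\infty \frac{\log r}{r^{3/2}}\, dr = \frac{2 \log e^2 r_0}{\sqrt{r_0}}, \qquad \int_{r_0}^\infty \frac{\log 2r}{r^{3/2}}\, dr = \frac{2 \log 2e^2 r_0}{\sqrt{r_0}}, \\
&\int_{r_0}^\infty \frac{(\log 2r)^2}{r^{3/2}}\, dr = \frac{2 P_2(\log 2r_0)}{\sqrt{r_0}}, \qquad \int_{r_0}^\infty \frac{(\log 2r)^3}{r^{3/2}}\, dr = \frac{2 P_3(\log 2r_0)}{r_0^{1/2}},
\end{aligned} \tag{13.22}$$
where
$$P_2(t) = t^2 + 4t + 8, \qquad P_3(t) = t^3 + 6t^2 + 24t + 48. \tag{13.23}$$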
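The closed forms in (13.22) and (13.23) are easy to confirm numerically. The sketch below is only a sanity check under our own assumptions: it substitutes $r = r_0 e^u$, truncates the range at $u = 150$ (where the integrands are negligible) and applies Simpson's rule; the helper `tail_integral` and the sample value $r_0 = 10^5$ are illustrative choices.

```python
import math

def tail_integral(f, r0, upper_u=150.0, steps=30000):
    """Approximate the integral of f over [r0, infinity) by Simpson's rule
    in the variable u, where r = r0 * exp(u) and u runs over [0, upper_u]."""
    h = upper_u / steps  # steps must be even for Simpson's rule

    def phi(u):
        r = r0 * math.exp(u)
        return f(r) * r  # dr = r du

    total = phi(0.0) + phi(upper_u)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * phi(k * h)
    return total * h / 3

def P2(t):
    return t * t + 4 * t + 8

def P3(t):
    return t ** 3 + 6 * t * t + 24 * t + 48

r0 = 1.0e5
t0 = math.log(2 * r0)
checks = [
    (tail_integral(lambda r: math.log(r) / r ** 2, r0), (math.log(r0) + 1) / r0),
    (tail_integral(lambda r: math.log(r) / r ** 1.5, r0), 2 * (math.log(r0) + 2) / math.sqrt(r0)),
    (tail_integral(lambda r: math.log(2 * r) ** 2 / r ** 1.5, r0), 2 * P2(t0) / math.sqrt(r0)),
    (tail_integral(lambda r: math.log(2 * r) ** 3 / r ** 1.5, r0), 2 * P3(t0) / math.sqrt(r0)),
]
for numeric, closed in checks:
    print(numeric, closed)
```

Each pair of printed numbers should agree to many digits; the polynomials $P_2$ and $P_3$ arise from repeated integration by parts against $e^{-u/2}$.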

We also have
$$\int_{r_0}^\infty \frac{dr}{r^2 \log r} = E_1(\log r_0) \tag{13.24}$$
where $E_1$ is the exponential integral
$$E_1(z) = \int_z^\infty \frac{e^{-t}}{t}\, dt.$$

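We must also estimate the integrals
$$\int_{r_0}^{r_1} \frac{\sqrt{z(r)}}{r^{3/2}}\, dr, \qquad \int_{r_0}^{r_1} \frac{z(r)}{r^2}\, dr, \qquad \int_{r_0}^{r_1} \frac{z(r) \log r}{r^2}\, dr, \qquad \int_{r_0}^{r_1} \frac{z(r)}{r^{3/2}}\, dr. \tag{13.25}$$
Clearly, $z(r) - e^\gamma \log \log r = 2.50637/\log \log r$ is decreasing on $r$. Hence, for $r \geq 10^5$,
$$z(r) \leq e^\gamma \log \log r + c_\gamma,$$
where $c_\gamma = 1.025742$. Let $F(t) = e^\gamma \log t + c_\gamma$. Then $F''(t) = -e^\gamma/t^2 < 0$. Hence
$$\frac{d^2 \sqrt{F(t)}}{dt^2} = \frac{F''(t)}{2\sqrt{F(t)}} - \frac{(F'(t))^2}{4 (F(t))^{3/2}} < 0$$
for all $t > 0$. In other words, $\sqrt{F(t)}$ is convex-down, and so we can bound $\sqrt{F(t)}$ from above by $\sqrt{F(t_0)} + (\sqrt{F})'(t_0) \cdot (t - t_0)$, for any $t \geq t_0 > 0$. Hence, for $r \geq r_0 \geq 10^5$,
$$\begin{aligned}
\sqrt{z(r)} \leq \sqrt{F(\log r)} &\leq \sqrt{F(\log r_0)} + \left.\frac{d\sqrt{F(t)}}{dt}\right|_{t = \log r_0} \cdot \log \frac{r}{r_0} \\
&= \sqrt{F(\log r_0)} + \frac{e^\gamma}{\sqrt{F(\log r_0)}} \cdot \frac{\log \frac{r}{r_0}}{2 \log r_0}.
\end{aligned}$$
Thus, by (13.22),
$$\begin{aligned}
\int_{r_0}^\infty \frac{\sqrt{z(r)}}{r^{3/2}}\, dr &\leq \sqrt{F(\log r_0)} \left(2 - \frac{e^\gamma}{F(\log r_0)}\right) \frac{1}{\sqrt{r_0}} + \frac{e^\gamma}{\sqrt{F(\log r_0)} \log r_0} \cdot \frac{\log e^2 r_0}{\sqrt{r_0}} \\
&= \frac{2 \sqrt{F(\log r_0)}}{\sqrt{r_0}} \left(1 + \frac{e^\gamma}{F(\log r_0) \log r_0}\right).
\end{aligned} \tag{13.26}$$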
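The other integrals in (13.25) are easier. Just as in (13.26), we extend the range of integration to $[r_0, \infty]$. Using (13.22) and (13.24), we obtain
$$\begin{aligned}
\int_{r_0}^\infty \frac{z(r)}{r^2}\, dr &\leq \int_{r_0}^\infty \frac{F(\log r)}{r^2}\, dr = e^\gamma \left(\frac{\log \log r_0}{r_0} + E_1(\log r_0)\right) + \frac{c_\gamma}{r_0}, \\
\int_{r_0}^\infty \frac{z(r) \log r}{r^2}\, dr &\leq e^\gamma \left(\frac{(1 + \log r_0) \log \log r_0 + 1}{r_0} + E_1(\log r_0)\right) + \frac{c_\gamma \log e r_0}{r_0}.
\end{aligned}$$
By [OLBC10, (6.8.2)],
$$\frac{1}{r(\log r + 1)} \leq E_1(\log r) \leq \frac{1}{r \log r}.$$
(The second inequality is obvious.) Hence
$$\begin{aligned}
\int_{r_0}^\infty \frac{z(r)}{r^2}\, dr &\leq \frac{e^\gamma (\log \log r_0 + 1/\log r_0) + c_\gamma}{r_0}, \\
\int_{r_0}^\infty \frac{z(r) \log r}{r^2}\, dr &\leq \frac{e^\gamma \left(\log \log r_0 + \frac{1}{\log r_0}\right) + c_\gamma}{r_0} \cdot \log e r_0.
\end{aligned}$$
Finally,
$$\begin{aligned}
\int_{r_0}^\infty \frac{z(r)}{r^{3/2}}\, dr &\leq e^\gamma \left(\frac{2 \log \log r_0}{\sqrt{r_0}} + 2 E_1\left(\frac{\log r_0}{2}\right)\right) + \frac{2 c_\gamma}{\sqrt{r_0}} \\
&\leq \frac{2}{\sqrt{r_0}} \left(F(\log r_0) + \frac{2 e^\gamma}{\log r_0}\right).
\end{aligned} \tag{13.27}$$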
It is time to estimate
$$\int_{r_0}^{r_1} \frac{R_{z,2r} \log 2r\, \sqrt{z(r)}}{r^{3/2}}\, dr, \tag{13.28}$$
where $z = y$ or $z = y/((\log y)/2)$ (and $y = x/\kappa$, as before), and where $R_{z,t}$ is as defined in (11.13). By Cauchy-Schwarz, (13.28) is at most
$$\sqrt{\int_{r_0}^{r_1} \frac{(R_{z,2r} \log 2r)^2}{r^{3/2}}\, dr} \cdot \sqrt{\int_{r_0}^{r_1} \frac{z(r)}{r^{3/2}}\, dr}. \tag{13.29}$$
We have already bounded the second integral. Let us look at the first one. We can write $R_{z,t} = 0.27125\, R^\circ_{z,t} + 0.41415$, where
$$R^\circ_{z,t} = \log\left(1 + \frac{\log 4t}{2 \log \frac{9 z^{1/3}}{2.004\, t}}\right). \tag{13.30}$$
Clearly,
$$R^\circ_{z,e^t/4} = \log\left(1 + \frac{t/2}{\log \frac{36 z^{1/3}}{2.004} - t}\right).$$
Now, for $f(t) = \log(c + at/(b-t))$ and $t \in [0,b)$,
$$f'(t) = \frac{ab}{\left(c + \frac{at}{b-t}\right) (b-t)^2}, \qquad f''(t) = \frac{-ab\left((a - 2c)(b - 2t) - 2ct\right)}{\left(c + \frac{at}{b-t}\right)^2 (b-t)^4}.$$
In our case, $a = 1/2$, $c = 1$ and $b = \log 36 z^{1/3} - \log(2.004) > 0$. Hence, for $t < b$,
$$-ab\left((a - 2c)(b - 2t) - 2ct\right) = \frac{b}{2} \left(\frac{3}{2}(b - 2t) + 2t\right) = \frac{b}{2} \left(\frac{3}{2} b - t\right) > 0,$$
and so $f''(t) > 0$. In other words, $t \mapsto R^\circ_{z,e^t/4}$ is convex-up for $t < b$, i.e., for $e^t/4 < 9 z^{1/3}/2.004$. It is easy to check that, since we are assuming $y \geq 10^{25}$,
$$2 r_1 = \frac{3}{4} y^{4/15} < \frac{9}{2.004} \left(\frac{2y}{\log y}\right)^{1/3} \leq \frac{9 z^{1/3}}{2.004}.$$
We conclude that $r \mapsto R^\circ_{z,2r}$ is convex-up on $\log 8r$ for $r \leq r_1$, and hence so is $r \mapsto R_{z,r}$, and so, in turn, is $r \mapsto R^2_{z,r}$. Thus, for $r \in [r_0, r_1]$,
$$R^2_{z,2r} \leq R^2_{z,2r_0} \cdot \frac{\log r_1/r}{\log r_1/r_0} + R^2_{z,2r_1} \cdot \frac{\log r/r_0}{\log r_1/r_0}. \tag{13.31}$$
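The formula for $f''(t)$ used in the convexity argument above can be cross-checked against a central finite difference. This is a hedged sketch: the sample value $z = 10^{25}$, the step size and the function names are our assumptions, and a finite-difference comparison is of course not a proof.

```python
import math

def f(t, a, b, c):
    """f(t) = log(c + a t / (b - t)) on [0, b)."""
    return math.log(c + a * t / (b - t))

def f2_closed(t, a, b, c):
    """Closed form for f''(t) as stated in the text."""
    g = c + a * t / (b - t)
    return -a * b * ((a - 2 * c) * (b - 2 * t) - 2 * c * t) / (g ** 2 * (b - t) ** 4)

def f2_numeric(t, a, b, c, h=1e-4):
    """Central second difference approximating f''(t)."""
    return (f(t + h, a, b, c) - 2 * f(t, a, b, c) + f(t - h, a, b, c)) / (h * h)

a, c = 0.5, 1.0
z = 1.0e25                                   # sample value; the text allows z = y or 2y/log y
b = math.log(36 * z ** (1.0 / 3.0)) - math.log(2.004)

samples = [1.0, 5.0, 10.0, 15.0, 20.0]
closed_values = [f2_closed(t, a, b, c) for t in samples]
numeric_values = [f2_numeric(t, a, b, c) for t in samples]
for t, cv, nv in zip(samples, closed_values, numeric_values):
    print(t, cv, nv)
```

With $a = 1/2$ and $c = 1$ the numerator reduces to $\frac{b}{2}(\frac{3}{2}b - t)$, so every closed-form value printed should be positive for $t < b$.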
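Therefore, by (13.22),
$$\begin{aligned}
&\int_{r_0}^{r_1} \frac{(R_{z,2r} \log 2r)^2}{r^{3/2}}\, dr \\
&\leq \int_{r_0}^{r_1} \left(R^2_{z,2r_0} \frac{\log r_1/r}{\log r_1/r_0} + R^2_{z,2r_1} \frac{\log r/r_0}{\log r_1/r_0}\right) (\log 2r)^2\, \frac{dr}{r^{3/2}} \\
&= \frac{2 R^2_{z,2r_0}}{\log \frac{r_1}{r_0}} \left(\left(\frac{P_2(\log 2r_0)}{\sqrt{r_0}} - \frac{P_2(\log 2r_1)}{\sqrt{r_1}}\right) \log 2r_1 - \frac{P_3(\log 2r_0)}{\sqrt{r_0}} + \frac{P_3(\log 2r_1)}{\sqrt{r_1}}\right) \\
&\quad + \frac{2 R^2_{z,2r_1}}{\log \frac{r_1}{r_0}} \left(\frac{P_3(\log 2r_0)}{\sqrt{r_0}} - \frac{P_3(\log 2r_1)}{\sqrt{r_1}} - \left(\frac{P_2(\log 2r_0)}{\sqrt{r_0}} - \frac{P_2(\log 2r_1)}{\sqrt{r_1}}\right) \log 2r_0\right) \\
&= 2 \left(R^2_{z,2r_0} - \frac{\log 2r_0}{\log \frac{r_1}{r_0}} \left(R^2_{z,2r_1} - R^2_{z,2r_0}\right)\right) \cdot \left(\frac{P_2(\log 2r_0)}{\sqrt{r_0}} - \frac{P_2(\log 2r_1)}{\sqrt{r_1}}\right) \\
&\quad + 2\, \frac{R^2_{z,2r_1} - R^2_{z,2r_0}}{\log \frac{r_1}{r_0}} \left(\frac{P_3(\log 2r_0)}{\sqrt{r_0}} - \frac{P_3(\log 2r_1)}{\sqrt{r_1}}\right) \\
&= 2 R^2_{z,2r_0} \cdot \left(\frac{P_2(\log 2r_0)}{\sqrt{r_0}} - \frac{P_2(\log 2r_1)}{\sqrt{r_1}}\right) \\
&\quad + 2\, \frac{R^2_{z,2r_1} - R^2_{z,2r_0}}{\log \frac{r_1}{r_0}} \left(\frac{P_2^-(\log 2r_0)}{\sqrt{r_0}} - \frac{P_3(\log 2r_1) - (\log 2r_0) P_2(\log 2r_1)}{\sqrt{r_1}}\right),
\end{aligned} \tag{13.32}$$
where $P_2(t)$ and $P_3(t)$ are as in (13.23), and $P_2^-(t) = P_3(t) - t P_2(t) = 2t^2 + 16t + 48$.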

Putting all terms together, we conclude that
$$\int_{r_0}^{r_1} \frac{g(r)}{r}\, dr \leq f_0(r_0, y) + f_1(r_0) + f_2(r_0, y), \tag{13.33}$$
where
$$\begin{aligned}
f_0(r_0, y) &= \left((1 - c_\varphi) \sqrt{I_{0,r_0,r_1,y}} + c_\varphi \sqrt{I_{0,r_0,r_1,\frac{2y}{\log y}}}\right) \sqrt{\frac{2}{\sqrt{r_0}}\, I_{1,r_0}}, \\
f_1(r_0) &= \frac{\sqrt{F(\log r_0)}}{\sqrt{2 r_0}} \left(1 + \frac{e^\gamma}{F(\log r_0) \log r_0}\right) + \frac{5}{\sqrt{2 r_0}} \\
&\quad + \frac{1}{r_0} \left(\left(\frac{13}{4} \log e r_0 + 11.07\right) J_{r_0} + 13.66 \log e r_0 + 37.55\right), \\
f_2(r_0, y) &= 3.36\, \frac{((\log y)/2)^{1/6}}{y^{1/6}} \log \frac{r_1}{r_0},
\end{aligned} \tag{13.34}$$
where $F(t) = e^\gamma \log t + c_\gamma$, $c_\gamma = 1.025742$, $y = x/\kappa$ (as usual),
$$\begin{aligned}
I_{0,r_0,r_1,z} &= R^2_{z,2r_0} \cdot \left(\frac{P_2(\log 2r_0)}{\sqrt{r_0}} - \frac{P_2(\log 2r_1)}{\sqrt{r_1}}\right) \\
&\quad + \frac{R^2_{z,2r_1} - R^2_{z,2r_0}}{\log \frac{r_1}{r_0}} \left(\frac{P_2^-(\log 2r_0)}{\sqrt{r_0}} - \frac{P_3(\log 2r_1) - (\log 2r_0) P_2(\log 2r_1)}{\sqrt{r_1}}\right), \\
J_r &= F(\log r) + \frac{e^\gamma}{\log r}, \qquad I_{1,r} = F(\log r) + \frac{2 e^\gamma}{\log r}, \qquad c_\varphi = \frac{C_{\varphi,2,\frac{\log y}{2}}/|\varphi|_1}{\log \frac{\log y}{2}}
\end{aligned} \tag{13.35}$$
and $C_{\varphi,2,K}$ is as in (11.20).
Let us recapitulate briefly. The term $f_2(r_0, y)$ in (13.34) comes from the term $3.36 x^{-1/6}$ in (11.12). The term $f_1(r_0)$ includes all other terms in (11.12), except for $R_{x,2r} \log 2r \sqrt{z(r)}/\sqrt{2r}$. The contribution of that last term is (13.28), divided by $\sqrt{2}$. That, in turn, is at most (13.29), divided by $\sqrt{2}$. The first integral in (13.29) was bounded in (13.32); the second integral was bounded in (13.27).
Chapter 14

Conclusion

We now need to gather all results, using the smoothing functions
$$\eta_* = (\eta_2 *_M \varphi)(\kappa t),$$
where $\varphi(t) = t^2 e^{-t^2/2}$, $\eta_2 = \eta_1 *_M \eta_1$ and $\eta_1 = 2 \cdot I_{[1/2,1]}$, and
$$\eta_+ = h_{200}(t)\, t e^{-t^2/2},$$
where
$$h_H(t) = \int_0^\infty h(t y^{-1}) F_H(y)\, \frac{dy}{y}, \qquad h(t) = \begin{cases} t^2 (2-t)^3 e^{t - 1/2} & \text{if } t \in [0,2], \\ 0 & \text{otherwise,} \end{cases} \qquad F_H(y) = \frac{\sin(H \log y)}{\pi \log y}.$$
We studied $\eta_*$ and $\eta_+$ in Part II. We saw $\eta_*$ in Thm. 13.2.1 (which actually works for general $\varphi : [0,\infty) \to [0,\infty)$, as its statement says). We will set $\kappa$ soon.
We fix a value for $r$, namely, $r = 150000$. Our results will have to be valid for any $x \geq x_+$, where $x_+$ is fixed. We set $x_+ = 4.9 \cdot 10^{26}$, since we want a result valid for $N \geq 10^{27}$, and, as was discussed in (11.1), we will work with $x_+$ slightly smaller than $N/2$.

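14.1 The $\ell_2$ norm over the major arcs: explicit version

We apply Lemma 10.3.1 with $\eta = \eta_+$ and $\eta_\circ$ as in (11.3). Let us first work out the error terms defined in (10.27). Recall that $\delta_0 = 8$. By Thm. 7.1.4,
$$ET_{\eta_+,\delta_0 r/2} = \max_{|\delta| \leq \delta_0 r/2} |\mathrm{err}_{\eta,\chi_T}(\delta, x)| = 4.772 \cdot 10^{-11} + \frac{251400}{\sqrt{x_+}} \leq 1.1405 \cdot 10^{-8}, \tag{14.1}$$
$$\begin{aligned}
E_{\eta_+,r,\delta_0} &= \max_{\substack{\chi \bmod q \\ q \leq r \cdot \gcd(q,2) \\ |\delta| \leq \gcd(q,2) \delta_0 r/2q}} \sqrt{q^*}\, |\mathrm{err}_{\eta_+,\chi^*}(\delta, x)| \\
&\leq 1.3482 \cdot 10^{-14} \sqrt{300000} + \frac{1.617 \cdot 10^{-10}}{\sqrt{2}} + \frac{1}{\sqrt{x_+}} \left(499900 + 52 \sqrt{300000}\right) \\
&\leq 2.3992 \cdot 10^{-8},
\end{aligned} \tag{14.2}$$
where, in the latter case, we are using the fact that a stronger bound for $q = 1$ (namely, (14.1)) allows us to assume $q \geq 2$.
We also need to bound a few norms: by the estimates in §A.3 and §A.5 (applied with $H = 200$),
$$\begin{aligned}
|\eta_+|_1 &\leq 1.062319, \qquad |\eta_+|_2 \leq 0.800129 + \frac{274.8569}{200^{7/2}} \leq 0.800132, \\
|\eta_+|_\infty &\leq 1 + 2.06440727 \cdot \frac{1 + \frac{4}{\pi} \log H}{H} \leq 1.079955.
\end{aligned} \tag{14.3}$$
By (10.12) and (14.1),
$$|S_{\eta_+}(0, x)| = \left|\widehat{\eta_+}(0) \cdot x + O^*\left(\mathrm{err}_{\eta_+,\chi_T}(0, x)\right) \cdot x\right| \leq (|\eta_+|_1 + ET_{\eta_+,\delta_0 r/2})\, x \leq 1.063 x.$$
This is far from optimal, but it will do, since all we wish to do with this is to bound the tiny error term $K_{r,2}$ in (10.27):
$$\begin{aligned}
K_{r,2} &= (1 + \sqrt{300000}) (\log x)^2 \cdot 1.079955 \cdot \left(2 \cdot 1.06232 + (1 + \sqrt{300000}) (\log x)^2\, 1.079955/x\right) \\
&\leq 1259.06 (\log x)^2 \leq 9.71 \cdot 10^{-21} x
\end{aligned}$$
for $x \geq x_+$. By (14.1), we also have
$$5.19\, \delta_0 r \left(ET_{\eta_+,\frac{\delta_0 r}{2}} \cdot \left(|\eta_+|_1 + \frac{ET_{\eta_+,\frac{\delta_0 r}{2}}}{2}\right)\right) \leq 0.075272$$
and
$$\delta_0 r (\log 2e^2 r) \left(E^2_{\eta_+,r,\delta_0} + K_{r,2}/x\right) \leq 1.00393 \cdot 10^{-8}.$$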
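The arithmetic behind (14.1), (14.2), (14.3) and the bound on $K_{r,2}$ can be replayed directly from the constants in the text. The following is a minimal sketch in ordinary floating point (the original used interval arithmetic, which this does not replace); the variable names are ours, and $K_{r,2}$ is evaluated only at the endpoint $x = x_+$.

```python
import math

x_plus = 4.9e26
sqrt_2r = math.sqrt(300000)      # sqrt(2r) with r = 150000
H = 200

# (14.1): error term for the trivial character.
ET = 4.772e-11 + 251400 / math.sqrt(x_plus)

# (14.2): error term for non-trivial characters.
E = (1.3482e-14 * sqrt_2r
     + 1.617e-10 / math.sqrt(2)
     + (499900 + 52 * sqrt_2r) / math.sqrt(x_plus))

# (14.3): norms of eta_+.
eta_l2 = 0.800129 + 274.8569 / H ** 3.5
eta_sup = 1 + 2.06440727 * (1 + (4 / math.pi) * math.log(H)) / H

# K_{r,2} = K_coeff * (log x)^2, evaluated at x = x_plus.
log_sq = math.log(x_plus) ** 2
K_coeff = (1 + sqrt_2r) * 1.079955 * (2 * 1.06232 + (1 + sqrt_2r) * log_sq * 1.079955 / x_plus)
K_over_x = K_coeff * log_sq / x_plus

print(ET, E, eta_l2, eta_sup, K_coeff, K_over_x)
```

Each printed value should come out just below the corresponding rounded bound quoted above.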

By (A.23) and (A.26),
$$0.8001287 \leq |\eta_\circ|_2 \leq 0.8001288 \tag{14.4}$$
and
$$|\eta_+ - \eta_\circ|_2 \leq \frac{274.856893}{H^{7/2}} \leq 2.42942 \cdot 10^{-6}. \tag{14.5}$$
We bound $|\eta_\circ^{(3)}|_1$ using the fact that (as we can tell by taking derivatives) $\eta_\circ^{(2)}(t)$ increases from 0 at $t = 0$ to a maximum within $[0, 1/2]$, and then decreases to $\eta_\circ^{(2)}(1) = -7$, only to increase to a maximum within $[3/2, 2]$ (equal to the maximum attained within $[0, 1/2]$) and then decrease to 0 at $t = 2$:
$$\begin{aligned}
|\eta_\circ^{(3)}|_1 &= 2 \max_{t \in [0,1/2]} \eta_\circ^{(2)}(t) - 2 \eta_\circ^{(2)}(1) + 2 \max_{t \in [3/2,2]} \eta_\circ^{(2)}(t) \\
&= 4 \max_{t \in [0,1/2]} \eta_\circ^{(2)}(t) + 14 \leq 4 \cdot 4.6255653 + 14 \leq 32.5023,
\end{aligned} \tag{14.6}$$
where we compute the maximum by the bisection method with 30 iterations (using interval arithmetic, as always).
We evaluate explicitly
$$\sum_{\substack{q \leq r \\ q \text{ odd}}} \frac{\mu^2(q)}{\phi(q)} = 6.798779\ldots,$$
using, yet again, interval arithmetic.
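The sum over odd squarefree $q \leq r$ is short enough to recompute directly. Here is a minimal sketch, assuming plain floating point with `math.fsum` rather than interval arithmetic; the function name `odd_squarefree_sum` is illustrative. It sieves Euler's $\phi$ and a squarefree flag in one pass.

```python
import math

def odd_squarefree_sum(r: int) -> float:
    """Sum of 1/phi(q) over odd squarefree q <= r."""
    phi = list(range(r + 1))
    squarefree = [True] * (r + 1)
    for p in range(2, r + 1):
        if phi[p] == p:  # untouched so far, hence p is prime
            for m in range(p, r + 1, p):
                phi[m] -= phi[m] // p          # multiply phi[m] by (1 - 1/p)
            for m in range(p * p, r + 1, p * p):
                squarefree[m] = False          # m is divisible by p^2
    return math.fsum(1.0 / phi[q] for q in range(1, r + 1, 2) if squarefree[q])

total = odd_squarefree_sum(150000)
print(f"{total:.6f}")
```

As a rough plausibility check, the main term $\frac{1}{2}(\log r + c_E + \frac{\log 2}{2})$ coming from (12.12) with $q = 2$ is about 6.7988 for $r = 150000$.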


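Looking at (10.29) and (10.28), we conclude that
$$\begin{aligned}
L_{r,\delta_0} &\leq 2 \cdot 6.798779 \cdot 0.800132^2 \leq 8.70531, \\
L_{r,\delta_0} &\geq 2 \cdot 6.798779 \cdot 0.8001287^2 - \left((\log r + 1.7) \cdot (3.888 \cdot 10^{-6} + 5.91 \cdot 10^{-12})\right) \\
&\quad - \left(1.342 \cdot 10^{-5} \cdot \left(0.64787 + \frac{\log r}{4r} + \frac{0.425}{r}\right)\right) \geq 8.70517.
\end{aligned}$$
Lemma 10.3.1 thus gives us that
$$\begin{aligned}
\int_{\mathfrak{M}_{8,r_0}} \left|S_{\eta_+}(\alpha, x)\right|^2 d\alpha &= (8.70524 + O^*(0.00007))\, x + O^*(0.075273)\, x \\
&= (8.7052 + O^*(0.0754))\, x \leq 8.7806 x.
\end{aligned} \tag{14.7}$$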

14.2 The total major-arc contribution


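First of all, we must bound from below
$$C_0 = \prod_{p | N} \left(1 - \frac{1}{(p-1)^2}\right) \cdot \prod_{p \nmid N} \left(1 + \frac{1}{(p-1)^3}\right). \tag{14.8}$$
The only prime that we know does not divide $N$ is 2. Thus, we use the bound
$$C_0 \geq 2 \prod_{p > 2} \left(1 - \frac{1}{(p-1)^2}\right) \geq 1.3203236. \tag{14.9}$$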
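The constant in (14.9) is twice the twin prime constant, and a truncated product already shows the quoted digits. This is a sketch under our own assumptions: the cutoff $10^6$ and the crude tail estimate $\prod_{p > L} (1 - (p-1)^{-2}) \geq 1 - \sum_{m \geq L} m^{-2} \geq 1 - 1/(L-1)$ are illustrative and much weaker than what a rigorous computation of all seven decimals would need.

```python
def truncated_c0(limit: int) -> float:
    """2 * product over odd primes p <= limit of (1 - 1/(p-1)^2)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting at p*p.
            sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    product = 2.0
    for p in range(3, limit + 1, 2):
        if sieve[p]:
            product *= 1.0 - 1.0 / (p - 1) ** 2
    return product

limit = 10 ** 6
partial = truncated_c0(limit)                     # overshoots the infinite product
crude_lower = partial * (1.0 - 1.0 / (limit - 1)) # valid but lossy lower bound
print(partial, crude_lower)
```

Since every omitted factor is below 1, the truncated product is an upper approximation; the crude lower bound only recovers about five decimals, which is why the text relies on a sharper evaluation.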

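The other main constant is $C_{\eta_\circ,\eta_*}$, which we defined in (10.37) and already started to estimate in (11.6):
$$C_{\eta_\circ,\eta_*} = |\eta_\circ|_2^2 \int_0^{\frac{N}{x}} \eta_*(\rho)\, d\rho + 2.71\, |\eta_\circ'|_2^2 \cdot O^*\left(\int_0^{\frac{N}{x}} ((2 - N/x) + \rho)^2\, \eta_*(\rho)\, d\rho\right) \tag{14.10}$$
provided that $N \geq 2x$. Recall that $\eta_* = (\eta_2 *_M \varphi)(\kappa t)$, where $\varphi(t) = t^2 e^{-t^2/2}$. Therefore,
$$\begin{aligned}
\int_0^{N/x} \eta_*(\rho)\, d\rho &= \int_0^{N/x} (\eta_2 * \varphi)(\kappa \rho)\, d\rho = \int_{1/4}^1 \eta_2(w) \int_0^{N/x} \varphi\left(\frac{\kappa \rho}{w}\right) d\rho\, \frac{dw}{w} \\
&= \frac{|\eta_2|_1 |\varphi|_1}{\kappa} - \frac{1}{\kappa} \int_{1/4}^1 \eta_2(w) \int_{\kappa N/xw}^\infty \varphi(\rho)\, d\rho\, dw.
\end{aligned}$$
By integration by parts and [AS64, (7.1.13)],
$$\int_y^\infty \varphi(\rho)\, d\rho = y e^{-y^2/2} + \sqrt{2} \int_{y/\sqrt{2}}^\infty e^{-t^2}\, dt < \left(y + \frac{1}{y}\right) e^{-y^2/2}.$$
Hence
$$\int_{\kappa N/xw}^\infty \varphi(\rho)\, d\rho \leq \int_{2\kappa}^\infty \varphi(\rho)\, d\rho < \left(2\kappa + \frac{1}{2\kappa}\right) e^{-2\kappa^2}$$
and so, since $|\eta_2|_1 = 1$,
$$\begin{aligned}
\int_0^{N/x} \eta_*(\rho)\, d\rho &\geq \frac{|\varphi|_1}{\kappa} - \int_{1/4}^1 \eta_2(w)\, dw \cdot \left(2 + \frac{1}{2\kappa^2}\right) e^{-2\kappa^2} \\
&\geq \frac{|\varphi|_1}{\kappa} - \left(2 + \frac{1}{2\kappa^2}\right) e^{-2\kappa^2}.
\end{aligned} \tag{14.11}$$
Let us now focus on the second integral in (14.10). Write $N/x = 2 + c_1/\kappa$. Then the integral equals
$$\begin{aligned}
\int_0^{2 + c_1/\kappa} (-c_1/\kappa + \rho)^2\, \eta_*(\rho)\, d\rho &\leq \frac{1}{\kappa^3} \int_0^\infty (u - c_1)^2 (\eta_2 *_M \varphi)(u)\, du \\
&= \frac{1}{\kappa^3} \int_{1/4}^1 \eta_2(w) \int_0^\infty (vw - c_1)^2 \varphi(v)\, dv\, dw \\
&= \frac{1}{\kappa^3} \int_{1/4}^1 \eta_2(w) \left(3 \sqrt{\frac{\pi}{2}}\, w^2 - 2 \cdot 2 c_1 w + c_1^2 \sqrt{\frac{\pi}{2}}\right) dw \\
&= \frac{1}{\kappa^3} \left(\frac{49}{48} \sqrt{\frac{\pi}{2}} - \frac{9}{4} c_1 + \sqrt{\frac{\pi}{2}}\, c_1^2\right).
\end{aligned}$$
It is thus best to choose $c_1 = (9/4)/\sqrt{2\pi} = 0.89762\ldots$.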
We must now estimate $|\eta_\circ'|_2^2$. We could do this directly by rigorous numerical
integration, but we might as well do it the hard way (which is actually rather easy). By
the definition (11.3) of $\eta_\circ$,
\[
|\eta_\circ'(x+1)|^2 = \left(x^{14} - 18x^{12} + 111x^{10} - 284x^8 + 351x^6 - 210x^4 + 49x^2\right)e^{-x^2} \tag{14.12}
\]
for $x \in [-1,1]$, and $\eta_\circ'(x+1) = 0$ for $x \notin [-1,1]$. Now, for any even integer $k > 0$,
\[
\int_{-1}^1 x^ke^{-x^2}\,dx = 2\int_0^1 x^ke^{-x^2}\,dx = \gamma\left(\frac{k+1}{2}, 1\right),
\]
where $\gamma(a,r) = \int_0^r e^{-t}t^{a-1}\,dt$ is the incomplete gamma function. (We substitute
$t = x^2$ in the integral.) By [AS64, (6.5.16), (6.5.22)], $\gamma(a+1,1) = a\gamma(a,1) - 1/e$ for
all $a > 0$, and $\gamma(1/2,1) = \sqrt{\pi}\,\mathrm{erf}(1)$, where
\[
\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}}\int_0^z e^{-t^2}\,dt.
\]

Thus, starting from (14.12), we see that


     
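\[
\begin{aligned}
|\eta_\circ'|_2^2 &= \gamma\left(\frac{15}{2},1\right) - 18\,\gamma\left(\frac{13}{2},1\right) + 111\,\gamma\left(\frac{11}{2},1\right)\\
&\quad - 284\,\gamma\left(\frac{9}{2},1\right) + 351\,\gamma\left(\frac{7}{2},1\right) - 210\,\gamma\left(\frac{5}{2},1\right) + 49\,\gamma\left(\frac{3}{2},1\right)\\
&= \frac{9151}{128}\sqrt{\pi}\,\mathrm{erf}(1) - \frac{18101}{64e} = 2.7375292\ldots
\end{aligned}
\tag{14.13}
\]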
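The closed form in (14.13) follows mechanically from the recurrence $\gamma(a+1,1) = a\gamma(a,1) - 1/e$. The sketch below tracks the exact rational coefficients of $\sqrt{\pi}\,\mathrm{erf}(1)$ and of $1/e$ with `fractions.Fraction`; the helper names are illustrative.

```python
from fractions import Fraction
import math

def gamma_half_integer_coeffs(n: int):
    """Write gamma(n + 1/2, 1) = A*sqrt(pi)*erf(1) - B/e and return (A, B) exactly."""
    a, b = Fraction(1), Fraction(0)          # gamma(1/2, 1) = sqrt(pi) erf(1)
    for k in range(n):
        half = Fraction(2 * k + 1, 2)        # gamma(a+1, 1) = a*gamma(a, 1) - 1/e with a = k + 1/2
        a, b = half * a, half * b + 1
    return a, b

# Coefficients of gamma(n + 1/2, 1) in (14.13), keyed by n.
COEFFS = {7: 1, 6: -18, 5: 111, 4: -284, 3: 351, 2: -210, 1: 49}

def eta_circ_prime_norm_sq():
    """Return (A, B, value) with |eta_circ'|_2^2 = A*sqrt(pi)*erf(1) - B/e."""
    A = sum(c * gamma_half_integer_coeffs(n)[0] for n, c in COEFFS.items())
    B = sum(c * gamma_half_integer_coeffs(n)[1] for n, c in COEFFS.items())
    value = float(A) * math.sqrt(math.pi) * math.erf(1) - float(B) / math.e
    return A, B, value
```

The exact coefficients come out as 9151/128 and 18101/64, matching the display.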
We thus obtain
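\[
2.71\,|\eta_\circ'|_2^2\cdot\int_0^{N/x}\bigl((2 - N/x) + \rho\bigr)^2\eta_*(\rho)\,d\rho \le 7.4188\cdot\frac{1}{\kappa^3}\left(\frac{49}{48}\sqrt{\frac{\pi}{2}} - \frac{(9/4)^2}{2\sqrt{2\pi}}\right) \le \frac{2.0002}{\kappa^3}.
\]
We conclude that
\[
C_{\eta_\circ,\eta_*} \ge \frac{1}{\kappa}|\varphi|_1|\eta_\circ|_2^2 - |\eta_\circ|_2^2\left(2 + \frac{1}{2\kappa^2}\right)e^{-2\kappa^2} - \frac{2.0002}{\kappa^3}.
\]
Setting
\[
\kappa = 49
\]
and using (14.4), we obtain
\[
C_{\eta_\circ,\eta_*} \ge \frac{1}{\kappa}\left(|\varphi|_1|\eta_\circ|_2^2 - 0.000834\right). \tag{14.14}
\]
Here it is useful to note that $|\varphi|_1 = \sqrt{\pi/2}$, and so, by (14.4), $|\varphi|_1|\eta_\circ|_2^2 = 0.80237\ldots$.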

We have finally chosen x in terms of N :

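\[
x = \frac{N}{2 + \frac{c_1}{\kappa}} = \frac{N}{2 + \frac{9/4}{\sqrt{2\pi}}\cdot\frac{1}{49}} = 0.495461\ldots\cdot N. \tag{14.15}
\]
Thus, we see that, since we are assuming $N \ge 10^{27}$, we in fact have $x \ge 4.95461\ldots\cdot 10^{26}$, and so, in particular,
\[
x \ge 4.9\cdot 10^{26}, \qquad \frac{x}{\kappa} \ge 10^{25}. \tag{14.16}
\]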
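The arithmetic behind (14.15) and (14.16) is easy to reproduce. The following sketch uses illustrative names for the constants and simply evaluates the ratio $x/N$ and the resulting lower bounds at $N = 10^{27}$.

```python
import math

KAPPA = 49
C1 = (9 / 4) / math.sqrt(2 * math.pi)   # the value of c_1 chosen in the text

def x_over_n() -> float:
    """Ratio x/N from (14.15): 1 / (2 + c_1/kappa)."""
    return 1.0 / (2.0 + C1 / KAPPA)

def x_lower_bound(n_min: float = 1e27) -> float:
    """Smallest x allowed under the assumption N >= n_min."""
    return x_over_n() * n_min
```

At the smallest admissible $N$ this gives $x$ slightly above $4.95\cdot 10^{26}$ and $x/\kappa$ slightly above $10^{25}$, as claimed.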
Let us continue with our determination of the major-arcs total. We should compute
the quantities in (10.38). We already have bounds for $E_{\eta_+,r,\delta_0}$, $A_{\eta_+}$ (see (14.7)),
$L_{\eta,r,\delta_0}$ and $K_{r,2}$. By Corollary 7.1.3, we have
\[
\begin{aligned}
E_{\eta_*,r,8} &\le \max_{\substack{\chi \bmod q\\ q\le r\cdot\gcd(q,2)\\ |\delta|\le\gcd(q,2)\delta_0r/2q}}\sqrt{q^*}\,|\mathrm{err}_{\eta_*,\chi^*}(\delta,x)|\\
&\le \frac{1}{\kappa}\left(2.485\cdot 10^{-19} + \frac{1}{\sqrt{10^{25}}}\left(381500 + 76\sqrt{300000}\right)\right)\\
&\le \frac{1.33805\cdot 10^{-8}}{\kappa},
\end{aligned}
\tag{14.17}
\]
where the factor of $\kappa$ comes from the scaling in $\eta_*(t) = (\eta_2 *_M\varphi)(\kappa t)$ (which in
effect divides $x$ by $\kappa$). It remains only to bound the more harmless terms of type $Z_{\eta,2}$
and $LS_\eta$.
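Clearly, $Z_{\eta_+^2,2} \le (1/x)\sum_n\Lambda(n)(\log n)\eta_+^2(n/x)$. Now, by Prop. 7.1.5,
\[
\begin{aligned}
\sum_{n=1}^\infty\Lambda(n)(\log n)\eta_+^2(n/x) &= \left(0.640206 + O^*\left(2\cdot 10^{-6} + \frac{366.91}{\sqrt{x}}\right)\right)x\log x - 0.021095x\\
&\le (0.640206 + O^*(3\cdot 10^{-6}))x\log x - 0.021095x.
\end{aligned}
\tag{14.18}
\]
Thus,
\[
Z_{\eta_+^2,2} \le 0.640209\log x. \tag{14.19}
\]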
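We will proceed a little more crudely for $Z_{\eta_*^2,2}$:
\[
\begin{aligned}
Z_{\eta_*^2,2} &= \frac{1}{x}\sum_n\Lambda^2(n)\eta_*^2(n/x) \le \frac{1}{x}\sum_n\Lambda(n)\eta_*(n/x)\cdot(\eta_*(n/x)\log n)\\
&\le (|\eta_*|_1 + |\mathrm{err}_{\eta_*,\chi_T}(0,x)|)\cdot(|\eta_*(t)\cdot\log^+(\kappa t)|_\infty + |\eta_*|_\infty\log(x/\kappa)),
\end{aligned}
\tag{14.20}
\]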
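where $\log^+(t) := \max(0,\log t)$. It is easy to see that
\[
|\eta_*|_\infty = |\eta_2 *_M\varphi|_\infty \le \left|\frac{\eta_2(t)}{t}\right|_1|\varphi|_\infty \le 4(\log 2)^2\cdot\frac{2}{e} \le 1.414, \tag{14.21}
\]
and, since $\log^+$ is non-decreasing and $\eta_2$ is supported on a subset of $[0,1]$,
\[
\begin{aligned}
|\eta_*(t)\cdot\log^+(\kappa t)|_\infty &= |(\eta_2 *_M\varphi)\cdot\log^+|_\infty \le |\eta_2 *_M(\varphi\cdot\log^+)|_\infty\\
&\le \left|\frac{\eta_2(t)}{t}\right|_1\cdot|\varphi\cdot\log^+|_\infty \le 1.921813\cdot 0.381157 \le 0.732513,
\end{aligned}
\]
where we bound $|\varphi\cdot\log^+|_\infty$ by the bisection method with 25 iterations. We already
know that
\[
|\eta_*|_1 = \frac{|\eta_2|_1|\varphi|_1}{\kappa} = \frac{|\varphi|_1}{\kappa} = \frac{\sqrt{\pi/2}}{\kappa}. \tag{14.22}
\]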
By Cor. 7.1.3,
\[
|\mathrm{err}_{\eta_*,\chi_T}(0,x)| \le 2.485\cdot 10^{-19} + \frac{1}{\sqrt{10^{25}}}(381500 + 76) \le 1.20665\cdot 10^{-7}.
\]
We conclude that
\[
Z_{\eta_*^2,2} \le \left(\frac{\sqrt{\pi/2}}{49} + 1.20665\cdot 10^{-7}\right)(0.732513 + 1.414\log(x/49)) \le 0.0362\log x. \tag{14.23}
\]
We have bounds for |η∗ |∞ and |η+ |∞ . We can also bound

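\[
|\eta_*\cdot t|_\infty = \frac{|(\eta_2 *_M\varphi)\cdot t|_\infty}{\kappa} \le \frac{|\eta_2|_1\cdot|\varphi\cdot t|_\infty}{\kappa} \le \frac{3^{3/2}e^{-3/2}}{\kappa}.
\]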
We quote the estimate

|η+ · t|∞ = 1.064735 + 3.25312 · (1 + (4/π) log 200)/200 ≤ 1.19073 (14.24)

from (A.42).
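We can now bound $LS_\eta(x,r)$ for $\eta = \eta_*, \eta_+$:
\[
\begin{aligned}
LS_\eta(x,r) &= \log r\cdot\max_{p\le r}\sum_{\alpha\ge 1}\eta\left(\frac{p^\alpha}{x}\right)\\
&\le (\log r)\cdot\max_{p\le r}\left(\frac{\log x}{\log p}|\eta|_\infty + \sum_{\substack{\alpha\ge 1\\ p^\alpha\ge x}}\frac{|\eta\cdot t|_\infty}{p^\alpha/x}\right)\\
&\le (\log r)\cdot\max_{p\le r}\left(\frac{\log x}{\log p}|\eta|_\infty + \frac{|\eta\cdot t|_\infty}{1 - 1/p}\right)\\
&\le \frac{(\log r)(\log x)}{\log 2}|\eta|_\infty + 2(\log r)|\eta\cdot t|_\infty,
\end{aligned}
\]
and so
\[
\begin{aligned}
LS_{\eta_*} &\le \left(\frac{1.414}{\log 2}\log x + 2\cdot\frac{(3/e)^{3/2}}{49}\right)\log r \le 24.32\log x + 0.57,\\
LS_{\eta_+} &\le \left(\frac{1.07996}{\log 2}\log x + 2\cdot 1.19073\right)\log r \le 18.57\log x + 28.39,
\end{aligned}
\tag{14.25}
\]
where we are using the bound on $|\eta_+|_\infty$ in (14.3).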
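The four constants in (14.25) come from plugging $r = 150000$ into the general bound. A short sketch (variable names are illustrative) that reproduces them:

```python
import math

R = 150_000
LOG_R = math.log(R)

# Coefficient of log x and constant term for eta_* (|eta_*|_inf <= 1.414, |eta_* t|_inf <= (3/e)^{3/2}/49).
LS_STAR_SLOPE = 1.414 / math.log(2) * LOG_R
LS_STAR_CONST = 2 * (3 / math.e) ** 1.5 / 49 * LOG_R

# Same for eta_+ (|eta_+|_inf <= 1.07996, |eta_+ t|_inf <= 1.19073).
LS_PLUS_SLOPE = 1.07996 / math.log(2) * LOG_R
LS_PLUS_CONST = 2 * 1.19073 * LOG_R
```

Each computed value sits just below the rounded constant quoted in (14.25).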
We can now start to put together all terms in (10.36). Let $\epsilon_0 = |\eta_+ - \eta_\circ|_2/|\eta_\circ|_2$.
Then, by (14.5),
\[
\epsilon_0|\eta_\circ|_2 = |\eta_+ - \eta_\circ|_2 \le 2.42942\cdot 10^{-6}.
\]
Thus,
\[
2.82643\,|\eta_\circ|_2^2(2 + \epsilon_0)\cdot\epsilon_0 + \frac{4.31004\,|\eta_\circ|_2^2 + 0.0012\,\frac{|\eta_\circ^{(3)}|_1^2}{\delta_0^5}}{r}
\]
is at most
\[
2.82643\cdot 2.42942\cdot 10^{-6}\cdot(2\cdot 0.80013 + 2.42942\cdot 10^{-6}) + \frac{4.3101\cdot 0.80013^2 + 0.0012\cdot\frac{32.503^2}{8^5}}{150000} \le 2.9387\cdot 10^{-5}
\]
by (14.4), (14.6), and (14.22).


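Since $\eta_*(t) = (\eta_2 *_M\varphi)(\kappa t)$ and $\eta_2$ is supported on $[1/4,1]$,
\[
\begin{aligned}
|\eta_*|_2^2 &= \frac{|\eta_2 *_M\varphi|_2^2}{\kappa} = \frac{1}{\kappa}\int_0^\infty\left(\int_0^\infty\eta_2(t)\varphi\left(\frac{w}{t}\right)\frac{dt}{t}\right)^2dw\\
&\le \frac{1}{\kappa}\int_0^\infty\left(1 - \frac{1}{4}\right)\int_0^\infty\eta_2^2(t)\varphi^2\left(\frac{w}{t}\right)\frac{dt}{t^2}\,dw\\
&= \frac{3}{4\kappa}\int_0^\infty\frac{\eta_2^2(t)}{t}\left(\int_0^\infty\varphi^2\left(\frac{w}{t}\right)\frac{dw}{t}\right)dt\\
&= \frac{3}{4\kappa}\,|\eta_2(t)/\sqrt{t}|_2^2\cdot|\varphi|_2^2 = \frac{3}{4\kappa}\cdot\frac{32}{3}(\log 2)^3\cdot\frac{3}{8}\sqrt{\pi} \le \frac{1.77082}{\kappa},
\end{aligned}
\]
where we go from the first to the second line by Cauchy-Schwarz.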


Recalling the bounds on $E_{\eta_*,r,\delta_0}$ and $E_{\eta_+,r,\delta_0}$ we obtained in (14.2) and (14.17),
we conclude that the second line of (10.36) is at most $x^2$ times
\[
\frac{1.33805\cdot 10^{-8}}{\kappa}\cdot 8.7806 + 2.3922\cdot 10^{-8}\cdot 1.6812\cdot\sqrt{\frac{1.77082}{\kappa}}\cdot\left(\sqrt{8.7806} + 1.6812\cdot 0.80014\right) \le \frac{1.7316\cdot 10^{-6}}{\kappa},
\]
where we are using the bound $A_{\eta_+} \le 8.7806$ we obtained in (14.7). (We are also using
the bounds on norms in (14.3) and the value $\kappa = 49$.)
By the bounds (14.19), (14.23) and (14.25), we see that the third line of (10.36) is
at most
\[
2\cdot(0.640209\log x)\cdot(24.32\log x + 0.57)\cdot x + 4\sqrt{0.640209\log x\cdot 0.0362\log x}\,(18.57\log x + 28.39)\,x \le 43(\log x)^2x,
\]
where we use the assumption $x \ge x_+ = 4.9\cdot 10^{26}$ (though a much weaker assumption
would suffice).
Using the assumption $x \ge x_+$ again, together with (14.22) and the bounds we have
just proven, we conclude that, for $r = 150000$, the integral over the major arcs
\[
\int_{\mathfrak{M}_{8,r}} S_{\eta_+}(\alpha,x)^2S_{\eta_*}(\alpha,x)e(-N\alpha)\,d\alpha
\]
is
\[
\begin{aligned}
&C_0\cdot C_{\eta_\circ,\eta_*}x^2 + O^*\left(2.9387\cdot 10^{-5}\cdot\frac{\sqrt{\pi/2}}{\kappa}x^2 + \frac{1.7316\cdot 10^{-6}}{\kappa}x^2 + 43(\log x)^2x\right)\\
&\quad= C_0\cdot C_{\eta_\circ,\eta_*}x^2 + O^*\left(\frac{3.85628\cdot 10^{-5}\cdot x^2}{\kappa}\right)\\
&\quad= C_0\cdot C_{\eta_\circ,\eta_*}x^2 + O^*(7.86996\cdot 10^{-7}x^2),
\end{aligned}
\tag{14.26}
\]
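where $C_0$ and $C_{\eta_\circ,\eta_*}$ are as in (10.37). Notice that $C_0C_{\eta_\circ,\eta_*}x^2$ is the expected
asymptotic for the integral over all of $\mathbb{R}/\mathbb{Z}$.
Moreover, by (14.9), (14.14) and (14.4), as well as $|\varphi|_1 = \sqrt{\pi/2}$,
\[
\begin{aligned}
C_0\cdot C_{\eta_\circ,\eta_*} &\ge 1.3203236\left(\frac{|\varphi|_1|\eta_\circ|_2^2}{\kappa} - \frac{0.000834}{\kappa}\right)\\
&\ge \frac{1.0594003}{\kappa} - \frac{0.001102}{\kappa} \ge \frac{1.058298}{49}.
\end{aligned}
\]
Hence
\[
\int_{\mathfrak{M}_{8,r}} S_{\eta_+}(\alpha,x)^2S_{\eta_*}(\alpha,x)e(-N\alpha)\,d\alpha \ge \frac{1.058259}{\kappa}x^2, \tag{14.27}
\]
where, as usual, $\kappa = 49$. This is our total major-arc bound.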
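The passage from (14.26) to (14.27) is pure arithmetic, and a sketch like the following can be used to re-derive the constant 1.058259. The names are illustrative, and the value 0.640206 for $|\eta_\circ|_2^2$ is taken from (A.26); everything is expressed in units of $x^2/\kappa$.

```python
import math

KAPPA = 49
PHI_L1 = math.sqrt(math.pi / 2)      # |phi|_1
ETA_CIRC_L2_SQ = 0.640206            # |eta_circ|_2^2, see (A.26)
C0_LOWER = 1.3203236                 # lower bound (14.9)

def main_term() -> float:
    """Lower bound for kappa * C_0 * C_{eta_circ, eta_*}."""
    return C0_LOWER * (PHI_L1 * ETA_CIRC_L2_SQ - 0.000834)

def error_term(x: float = 4.9e26) -> float:
    """Error term of (14.26) in units of x^2/kappa."""
    return 2.9387e-5 * PHI_L1 + 1.7316e-6 + 43 * math.log(x) ** 2 * KAPPA / x

def major_arc_lower_bound() -> float:
    return main_term() - error_term()
```

The last summand of the error term is of size about $10^{-20}$ and plays no role in the final constant.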

14.3 The minor-arc total: explicit version


We need to estimate the quantities E, S, T , J, M in Theorem 13.2.1. Let us start by
bounding the constants in (13.12). The constants Cη+ ,j , j = 0, 1, 2, will appear only
in the minor term E, and so crude bounds on them will do.
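By (14.3) and (14.24),
\[
\sup_{r\ge t}\eta_+(r) \le \min\left(1.07996, \frac{1.19073}{t}\right)
\]
for all $t \ge 0$. Thus,
\[
\begin{aligned}
C_{\eta_+,0} &= 0.7131\int_0^\infty\frac{1}{\sqrt{t}}\left(\sup_{r\ge t}\eta_+(r)\right)^2dt\\
&\le 0.7131\left(\int_0^1\frac{1.07996^2}{\sqrt{t}}\,dt + \int_1^\infty\frac{1.19073^2}{t^{5/2}}\,dt\right) \le 2.33744.
\end{aligned}
\]
Similarly,
\[
\begin{aligned}
C_{\eta_+,1} &= 0.7131\int_1^\infty\frac{\log t}{\sqrt{t}}\left(\sup_{r\ge t}\eta_+(r)\right)^2dt\\
&\le 0.7131\int_1^\infty\frac{1.19073^2\log t}{t^{5/2}}\,dt \le 0.44937.
\end{aligned}
\]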
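Both bounds reduce to elementary integrals: $\int_0^1 t^{-1/2}\,dt = 2$, $\int_1^\infty t^{-5/2}\,dt = 2/3$ and $\int_1^\infty t^{-5/2}\log t\,dt = 4/9$. A minimal sketch with illustrative function names:

```python
def c_eta_plus_0() -> float:
    """0.7131 * (1.07996^2 * int_0^1 t^{-1/2} dt + 1.19073^2 * int_1^inf t^{-5/2} dt)."""
    return 0.7131 * (2 * 1.07996 ** 2 + (2 / 3) * 1.19073 ** 2)

def c_eta_plus_1() -> float:
    """0.7131 * 1.19073^2 * int_1^inf (log t) t^{-5/2} dt, where the integral equals 4/9."""
    return 0.7131 * (4 / 9) * 1.19073 ** 2
```

Both values land just under the constants 2.33744 and 0.44937 stated above.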
Immediately from (14.3),
\[
C_{\eta_+,2} = 0.51942\,|\eta_+|_\infty^2 \le 0.60581.
\]
We get
\[
\begin{aligned}
E &\le ((2.33744 + 0.60581)\log x + (2\cdot 2.33744 + 0.44937))\cdot x^{1/2}\\
&\le (2.94325\log x + 5.12426)\cdot x^{1/2} \le 8.4029\cdot 10^{-12}\cdot x,
\end{aligned}
\tag{14.28}
\]
where $E$ is defined as in (13.11), and where we are using the assumption $x \ge x_+ = 4.9\cdot 10^{26}$. Using (14.17) and (14.22), we see that
\[
S_{\eta_*}(0,x) = (|\eta_*|_1 + O^*(ET_{\eta_*,0}))x = \left(\sqrt{\pi/2} + O^*(1.33805\cdot 10^{-8})\right)\frac{x}{\kappa}.
\]
Hence
\[
S_{\eta_*}(0,x)\cdot E \le 1.05315\cdot 10^{-11}\cdot\frac{x^2}{\kappa}. \tag{14.29}
\]
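We can bound
\[
S \le \sum_n\Lambda(n)(\log n)\eta_+^2(n/x) \le 0.640209x\log x - 0.021095x \tag{14.30}
\]
by (14.18). Let us now estimate $T$. Recall that $\varphi(t) = t^2e^{-t^2/2}$. Since
\[
\int_0^u\varphi(t)\,dt = \int_0^u t^2e^{-t^2/2}\,dt \le \int_0^u t^2\,dt = \frac{u^3}{3},
\]
we can bound
\[
C_{\varphi,3}\left(\frac{1}{2}\log\frac{x}{\kappa}\right) = \frac{1.04488}{\sqrt{\pi/2}}\int_0^{\frac{2}{\log x/\kappa}} t^2e^{-t^2/2}\,dt \le \frac{0.2779}{((\log x/\kappa)/2)^3}.
\]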

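By (14.7), we already know that $J = (8.7052 + O^*(0.0754))x$. Hence
\[
\begin{aligned}
(\sqrt{J} - \sqrt{E})^2 &= \left(\sqrt{(8.7052 + O^*(0.0754))x} - \sqrt{8.4029\cdot 10^{-12}\cdot x}\right)^2\\
&\ge 8.6297x,
\end{aligned}
\tag{14.31}
\]
and so
\[
\begin{aligned}
T &= C_{\varphi,3}\left(\frac{1}{2}\log\frac{x}{\kappa}\right)\cdot(S - (\sqrt{J} - \sqrt{E})^2)\\
&\le \frac{8\cdot 0.2779}{(\log x/\kappa)^3}\cdot(0.640209x\log x - 0.021095x - 8.6297x)\\
&\le 0.17792\,\frac{8x\log x}{(\log x/\kappa)^3} - 2.40405\,\frac{8x}{(\log x/\kappa)^3}\\
&\le 1.42336\,\frac{x}{(\log x/\kappa)^2} - 13.69293\,\frac{x}{(\log x/\kappa)^3}
\end{aligned}
\]
for $\kappa = 49$. Since $x/\kappa \ge 10^{25}$, this implies that
\[
T \le 3.5776\cdot 10^{-4}\cdot x. \tag{14.32}
\]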
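The final step to (14.32) evaluates the last bound at $y = x/\kappa = 10^{25}$, where it is largest on the range $y \ge 10^{25}$. A small sketch (the function name is illustrative):

```python
import math

def t_over_x(y: float) -> float:
    """Bound for T/x as a function of y = x/kappa: 1.42336/(log y)^2 - 13.69293/(log y)^3."""
    log_y = math.log(y)
    return 1.42336 / log_y ** 2 - 13.69293 / log_y ** 3
```

The expression is decreasing in $y$ once $\log y$ exceeds $3\cdot 13.69293/(2\cdot 1.42336) \approx 14.4$, so its value at $y = 10^{25}$ is the worst case.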

It remains to estimate $M$. Let us first look at $g(r_0)$; here $g = g_{x/\kappa,\varphi}$, where $g_{y,\varphi}$ is
defined as in (11.19) and $\varphi(t) = t^2e^{-t^2/2}$, as usual. Write $y = x/\kappa$. We must estimate
the constant $C_{\varphi,2,K}$ defined in (11.21):
\[
\begin{aligned}
C_{\varphi,2,K} &= -\int_{1/K}^1\varphi(w)\log w\,dw \le -\int_0^1\varphi(w)\log w\,dw\\
&\le -\int_0^1 w^2e^{-w^2/2}\log w\,dw \le 0.093426,
\end{aligned}
\]

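where again we use VNODE-LP for rigorous numerical integration. Since $|\varphi|_1 = \sqrt{\pi/2}$ and $K = (\log y)/2$, this implies that
\[
\frac{C_{\varphi,2,K}/|\varphi|_1}{\log K} \le \frac{0.07455}{\log\frac{\log y}{2}} \tag{14.33}
\]
and so
\[
R_{y,K,\varphi,t} = \frac{0.07455}{\log\frac{\log y}{2}}R_{y/K,t} + \left(1 - \frac{0.07455}{\log\frac{\log y}{2}}\right)R_{y,t}. \tag{14.34}
\]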
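The text obtains the constant 0.093426 by rigorous integration with VNODE-LP. As a non-rigorous cross-check, one can instead expand $e^{-w^2/2}$ as a power series and use $-\int_0^1 w^n\log w\,dw = 1/(n+1)^2$; this series approach is a substitute technique chosen here for illustration, not the method of the text.

```python
import math

def c_phi_2(terms: int = 30) -> float:
    """-int_0^1 w^2 exp(-w^2/2) log(w) dw = sum_k (-1/2)^k / (k! (2k+3)^2)."""
    return sum((-0.5) ** k / math.factorial(k) / (2 * k + 3) ** 2 for k in range(terms))

def c_phi_2_normalized() -> float:
    """The quotient C_{phi,2} / |phi|_1 with |phi|_1 = sqrt(pi/2)."""
    return c_phi_2() / math.sqrt(math.pi / 2)
```

The alternating series converges very quickly; a handful of terms already gives the value to seven decimal places.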

Let $t = 2r_0 = 300000$; we recall that $K = (\log y)/2$. Recall from (14.16) that
$y = x/\kappa \ge 10^{25}$; thus, $y/K \ge 3.47435\cdot 10^{23}$ and $\log((\log y)/2) \ge 3.35976$. Going
back to the definition of $R_{x,t}$ in (11.13), we see that
\[
R_{y,2r_0} \le 0.27125\log\left(1 + \frac{\log(8\cdot 150000)}{2\log\frac{9\cdot(10^{25})^{1/3}}{2.004\cdot 2\cdot 150000}}\right) + 0.41415 \le 0.58341, \tag{14.35}
\]
\[
R_{y/K,2r_0} \le 0.27125\log\left(1 + \frac{\log(8\cdot 150000)}{2\log\frac{9\cdot(3.47435\cdot 10^{23})^{1/3}}{2.004\cdot 2\cdot 150000}}\right) + 0.41415 \le 0.60295, \tag{14.36}
\]
and so
\[
R_{y,K,\varphi,2r_0} \le \frac{0.07455}{3.35976}\cdot 0.60295 + \left(1 - \frac{0.07455}{3.35976}\right)\cdot 0.58341 \le 0.58385.
\]
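The displays (14.35) and (14.36) are both instances of one expression in $(x,t)$. The sketch below encodes that expression as read off from the two displays; since the definition (11.13) is not reproduced here, treating this as the general form of $R_{x,t}$ is an assumption.

```python
import math

def r_bound(x: float, t: float) -> float:
    """0.27125 * log(1 + log(4t) / (2 log(9 x^{1/3} / (2.004 t)))) + 0.41415."""
    inner = math.log(4 * t) / (2 * math.log(9 * x ** (1 / 3) / (2.004 * t)))
    return 0.27125 * math.log(1 + inner) + 0.41415

def r_mixed(x_small: float, x_large: float, t: float, weight: float) -> float:
    """Convex combination used in (14.34): weight * R_{x_small,t} + (1 - weight) * R_{x_large,t}."""
    return weight * r_bound(x_small, t) + (1 - weight) * r_bound(x_large, t)
```

For example, `r_mixed(3.47435e23, 1e25, 300000, 0.07455 / 3.35976)` reproduces the combined bound for $R_{y,K,\varphi,2r_0}$.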

Using
\[
z(r) = e^\gamma\log\log r + \frac{2.50637}{\log\log r} \le 5.42506,
\]
we see from (11.13) that
\[
L_{2r_0} = 5.42506\cdot\left(\frac{13}{4}\log 300000 + 7.82\right) + 13.66\log 300000 + 37.55 \le 474.608.
\]
Going back to (11.19), we sum up and obtain that
\[
\begin{aligned}
g(r_0) &= \frac{(0.58385\cdot\log 300000 + 0.5)\sqrt{5.42506} + 2.5}{\sqrt{2\cdot 150000}} + \frac{474.608}{150000} + 3.36\left(\frac{\log y}{2y}\right)^{1/6}\\
&\le 0.041568.
\end{aligned}
\]
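The three summands of $g(r_0)$ can be evaluated directly; only the last one depends on $y$, and it is largest at $y = 10^{25}$. A sketch with illustrative names, using the constants exactly as they appear in the display:

```python
import math

R0 = 150_000

def g_r0(y: float) -> float:
    """Evaluate the displayed expression for g(r_0) at y = x/kappa."""
    main = ((0.58385 * math.log(2 * R0) + 0.5) * math.sqrt(5.42506) + 2.5) / math.sqrt(2 * R0)
    second = 474.608 / R0
    third = 3.36 * (math.log(y) / (2 * y)) ** (1 / 6)
    return main + second + third
```

Multiplying the result at $y = 10^{25}$ by 9.2598 gives the constant 0.38492 that appears in (14.37).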

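Using again the bound $x \ge 4.9\cdot 10^{26}$, we obtain
\[
\begin{aligned}
\frac{\log(150000 + 1) + c^+}{\log\sqrt{x} + c^-}\cdot S - (\sqrt{J} - \sqrt{E})^2 &\le \frac{13.9716}{\frac{1}{2}\log x + 0.6394}\cdot(0.640209x\log x - 0.021095x) - 8.6297x\\
&\le 17.8895x - \frac{11.7332x}{\frac{1}{2}\log x + 0.6394} - 8.6297x\\
&\le (17.8895 - 8.6297)x \le 9.2598x,
\end{aligned}
\]
where $c^+ = 2.0532$ and $c^- = 0.6394$. Therefore,
\[
\begin{aligned}
g(r_0)\cdot\left(\frac{\log(150000 + 1) + c^+}{\log\sqrt{x} + c^-}\cdot S - (\sqrt{J} - \sqrt{E})^2\right) &\le 0.041568\cdot 9.2598x\\
&\le 0.38492x.
\end{aligned}
\tag{14.37}
\]
This is one of the main terms.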

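Let $r_1 = (3/8)y^{4/15}$, where, as usual, $y = x/\kappa$ and $\kappa = 49$. Then
\[
\begin{aligned}
R_{y,2r_1} &= 0.27125\log\left(1 + \frac{\log\left(8\cdot\frac{3}{8}y^{4/15}\right)}{2\log\frac{9y^{1/3}}{2.004\cdot\frac{3}{4}y^{4/15}}}\right) + 0.41415\\
&= 0.27125\log\left(1 + \frac{\frac{4}{15}\log y + \log 3}{2\left(\frac{1}{3} - \frac{4}{15}\right)\log y + 2\log\frac{9}{2.004\cdot\frac{3}{4}}}\right) + 0.41415\\
&\le 0.27125\log\left(1 + \frac{\frac{4}{15}}{2\left(\frac{1}{3} - \frac{4}{15}\right)}\right) + 0.41415 \le 0.71215.
\end{aligned}
\tag{14.38}
\]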
Similarly, for $K = (\log y)/2$ (as usual),
\[
\begin{aligned}
R_{y/K,2r_1} &= 0.27125\log\left(1 + \frac{\log\left(8\cdot\frac{3}{8}y^{4/15}\right)}{2\log\frac{9(y/K)^{1/3}}{2.004\cdot\frac{3}{4}y^{4/15}}}\right) + 0.41415\\
&= 0.27125\log\left(1 + \frac{\frac{4}{15}\log y + \log 3}{\frac{2}{15}\log y - \frac{2}{3}\log\log y + 2\log\frac{9\cdot 2^{1/3}}{2.004\cdot\frac{3}{4}}}\right) + 0.41415\\
&= 0.27125\log\left(3 + \frac{\frac{4}{3}\log\log y - c}{\frac{2}{15}\log y - \frac{2}{3}\log\log y + 2\log\frac{12\cdot 2^{1/3}}{2.004}}\right) + 0.41415,
\end{aligned}
\tag{14.39}
\]
where $c = 4\log(12\cdot 2^{1/3}/2.004) - \log 3$. Let
\[
f(t) = \frac{\frac{4}{3}\log t - c}{\frac{2}{15}t - \frac{2}{3}\log t + 2\log\frac{12\cdot 2^{1/3}}{2.004}}.
\]

The bisection method with 32 iterations shows that

f (t) ≤ 0.019562618 (14.40)

for 180 ≤ t ≤ 30000; since f (t) < 0 for 0 < t < 180 (by (4/3) log t − c < 0) and
since, by c > 20/3, we have f (t) < (5/2)(log t)/t as soon as t > (log t)2 (and so, in
particular, for t > 30000), we see that (14.40) is valid for all t > 0. Therefore,

Ry/K,2r1 ≤ 0.71392, (14.41)

and so, by (14.34), we conclude that
\[
R_{y,K,\varphi,2r_1} \le \frac{0.07455}{3.35976}\cdot 0.71392 + \left(1 - \frac{0.07455}{3.35976}\right)\cdot 0.71215 \le 0.71219.
\]
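The text bounds $f$ on $[180, 30000]$ by a rigorous bisection method. The sketch below replaces that with a plain floating-point grid search, which is not rigorous but is enough to see where the maximum sits and how (14.41) follows; the grid size and function names are arbitrary choices.

```python
import math

LOG_TERM = math.log(12 * 2 ** (1 / 3) / 2.004)
C = 4 * LOG_TERM - math.log(3)

def f(t: float) -> float:
    """The function f from the text, with t playing the role of log y."""
    return ((4 / 3) * math.log(t) - C) / ((2 / 15) * t - (2 / 3) * math.log(t) + 2 * LOG_TERM)

def max_f_on_grid(lo: float = 180.0, hi: float = 30000.0, steps: int = 100_000) -> float:
    """Largest value of f on an evenly spaced grid over [lo, hi]."""
    return max(f(lo + (hi - lo) * i / steps) for i in range(steps + 1))

def r_from_f(f_max: float) -> float:
    """Bound for R_{y/K,2r_1} obtained by inserting an upper bound for f into (14.39)."""
    return 0.27125 * math.log(3 + f_max) + 0.41415
```

The maximum is attained near $t \approx 516$, and inserting the constant 0.019562618 into `r_from_f` gives a value below 0.71392.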

Since $r_1 = (3/8)y^{4/15}$ and $z(r)$ is increasing for $r \ge 27$, we know that
\[
\begin{aligned}
z(r_1) \le z(y^{4/15}) &= e^\gamma\log\log y^{4/15} + \frac{2.50637}{\log\log y^{4/15}}\\
&= e^\gamma\log\log y + \frac{2.50637}{\log\log y - \log\frac{15}{4}} - e^\gamma\log\frac{15}{4} \le e^\gamma\log\log y - 1.43644
\end{aligned}
\tag{14.42}
\]
for $y \ge 10^{25}$. Hence, (11.13) gives us that
\[
\begin{aligned}
L_{2r_1} &\le (e^\gamma\log\log y - 1.43644)\left(\frac{13}{4}\log\frac{3}{4}y^{4/15} + 7.82\right) + 13.66\log\frac{3}{4}y^{4/15} + 37.55\\
&\le \frac{13}{15}e^\gamma\log y\log\log y + 2.39776\log y + 12.2628\log\log y + 23.7304\\
&\le (2.13522\log y + 18.118)\log\log y.
\end{aligned}
\]
Moreover, again by (14.42),
\[
\sqrt{z(r_1)} \le \sqrt{e^\gamma\log\log y} - \frac{1.43644}{2\sqrt{e^\gamma\log\log y}}
\]
and so, by $y \ge 10^{25}$,
\[
\begin{aligned}
\left(0.71219\log\frac{3}{4}y^{4/15} + 0.5\right)\sqrt{z(r_1)} &\le (0.18992\log y + 0.29512)\left(\sqrt{e^\gamma\log\log y} - \frac{1.43644}{2\sqrt{e^\gamma\log\log y}}\right)\\
&\le 0.19505\log y\sqrt{e^\gamma\log\log y} - \frac{0.19505\cdot 1.43644\log y}{2\sqrt{e^\gamma\log\log y}}\\
&\le 0.26031\log y\sqrt{\log\log y} - 3.00147.
\end{aligned}
\]
Therefore, by (11.19),
\[
\begin{aligned}
g_{y,\varphi}(r_1) &\le \frac{0.26031\log y\sqrt{\log\log y} + 2.5 - 3.00147}{\sqrt{\frac{3}{4}y^{4/15}}} + \frac{(2.13522\log y + 18.118)\log\log y}{\frac{3}{8}y^{4/15}} + \frac{3.36((\log y)/2)^{1/6}}{y^{1/6}}\\
&\le \frac{0.30059\log y\sqrt{\log\log y}}{y^{2/15}} + \frac{5.69392\log y\log\log y}{y^{4/15}} - \frac{0.57904}{y^{2/15}} + \frac{48.3147\log\log y}{y^{4/15}} + \frac{2.994(\log y)^{1/6}}{y^{1/6}}\\
&\le \frac{0.30059\log y\sqrt{\log\log y}}{y^{2/15}} + \frac{5.69392\log y\log\log y}{y^{4/15}} + \frac{1.30151(\log y)^{1/6}}{y^{1/6}}\\
&\le \frac{0.30915\log y\sqrt{\log\log y}}{y^{2/15}},
\end{aligned}
\]
where we use $y \ge 10^{25}$ and verify that the functions $t \mapsto (\log t)^{1/6}/t^{1/6-2/15}$, $t \mapsto \sqrt{\log\log t}/t^{4/15-2/15}$ and $t \mapsto (\log\log t)/t^{4/15-2/15}$ are decreasing for $t \ge y$ (just by
taking derivatives).
Since $\kappa = 49$, one of the terms in (13.13) simplifies easily:
\[
\frac{7}{15} + \frac{-2.14938 + \frac{8}{15}\log\kappa}{\log x + 2c^-} \le \frac{7}{15}.
\]
By (14.30) and $y = x/\kappa = x/49$, we conclude that
\[
\begin{aligned}
\frac{7}{15}g(r_1)S &\le \frac{7}{15}\cdot\frac{0.30915\log y\sqrt{\log\log y}}{y^{2/15}}\cdot(0.640209\log x - 0.021095)x\\
&\le \frac{0.14427\log y\sqrt{\log\log y}}{y^{2/15}}(0.640209\log y + 2.4705)x \le 0.30517x,
\end{aligned}
\tag{14.43}
\]
where we are using the fact that $y \mapsto (\log y)^2\sqrt{\log\log y}/y^{2/15}$ is decreasing for $y \ge 10^{25}$ (because $y \mapsto (\log y)^{5/2}/y^{2/15}$ is decreasing for $y \ge e^{75/4}$ and $10^{25} > e^{75/4}$).
It remains only to bound
\[
\frac{2S}{\log x + 2c^-}\int_{r_0}^{r_1}\frac{g(r)}{r}\,dr
\]
in the expression (13.13) for M . We will use the bound on the integral given in (13.33).
The easiest term to bound there is f1 (r0 ), defined in (13.34), since it depends only on
r0 : for r0 = 150000,
f1 (r0 ) = 0.0169073 . . . .
It is also not hard to bound $f_2(r_0,x)$, also defined in (13.34):
\[
\begin{aligned}
f_2(r_0,y) &= 3.36\,\frac{((\log y)/2)^{1/6}}{x^{1/6}}\log\frac{\frac{3}{8}y^{4/15}}{r_0}\\
&\le 3.36\,\frac{(\log y)^{1/6}}{(2y)^{1/6}}\left(\frac{4}{15}\log y + 0.05699 - \log r_0\right),
\end{aligned}
\]
where we recall again that $x = \kappa y = 49y$. Thus, since $r_0 = 150000$ and $y \ge 10^{25}$,
\[
f_2(r_0,y) \le 0.001399.
\]

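Let us now look at the terms $I_{1,r}$, $c_\varphi$ in (13.35). We already saw in (14.33) that
\[
c_\varphi = \frac{C_{\varphi,2}/|\varphi|_1}{\log K} \le \frac{0.07455}{\log\frac{\log y}{2}} \le 0.02219.
\]
Since $F(t) = e^\gamma\log t + c_\gamma$ with $c_\gamma = 1.025742$,
\[
I_{1,r_0} = F(\log r_0) + \frac{2e^\gamma}{\log r_0} = 5.73826\ldots \tag{14.44}
\]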
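The value in (14.44) only involves $e^\gamma$ and $\log r_0$, so it is easy to recompute. In the sketch below the numerical value of Euler's constant is hard-coded, and the function name is illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329
C_GAMMA = 1.025742

def i1(r0: float) -> float:
    """I_{1,r0} = F(log r0) + 2 e^gamma / log r0 with F(t) = e^gamma log t + c_gamma."""
    e_gamma = math.exp(EULER_GAMMA)
    log_r0 = math.log(r0)
    return e_gamma * math.log(log_r0) + C_GAMMA + 2 * e_gamma / log_r0
```

For $r_0 = 150000$ this gives a value between 5.73826 and 5.73827, consistent with the rounded-up constant 5.73827 used later in the bound for $f_0(r_0,y)$.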
It thus remains only to estimate I0,r0 ,r1 ,z for z = y and z = y/K, where K =
(log y)/2.
We will first give estimates for $y$ large. Omitting negative terms from (13.35), we
easily get the following general bound, crude but useful enough:
\[
I_{0,r_0,r_1,z} \le R_{z,2r_0}^2\cdot\frac{P_2(\log 2r_0)}{\sqrt{r_0}} + \frac{R_{z,2r_1}^2 - 0.41415^2}{\log\frac{r_1}{r_0}}\cdot\frac{P_2^-(\log 2r_0)}{\sqrt{r_0}},
\]
where $P_2(t) = t^2 + 4t + 8$ and $P_2^-(t) = 2t^2 + 16t + 48$. By (14.38) and (14.41),
\[
R_{y,2r_1} \le 0.71215, \qquad R_{y/K,2r_1} \le 0.71392
\]
for $y \ge 10^{25}$. Assume now that $y \ge 10^{150}$. Then, since $r_0 = 150000$,
\[
R_{y,r_0} \le 0.27125\log\left(1 + \frac{\log 4r_0}{2\log\frac{9\cdot(10^{150})^{1/3}}{2.004r_0}}\right) + 0.41415 \le 0.43086,
\]
and, similarly, $R_{y/K,r_0} \le 0.43113$. Since
\[
0.43086^2\cdot\frac{P_2(\log 2r_0)}{\sqrt{r_0}} \le 0.10426, \qquad 0.43113^2\cdot\frac{P_2(\log 2r_0)}{\sqrt{r_0}} \le 0.10439,
\]
we obtain that
\[
\begin{aligned}
(1 - c_\varphi)\sqrt{I_{0,r_0,r_1,y}} + c_\varphi\sqrt{I_{0,r_0,r_1,\frac{2y}{\log y}}} &\le 0.97781\cdot\sqrt{0.10426 + \frac{0.49214}{\frac{4}{15}\log y - \log 400000}}\\
&\quad + 0.02219\sqrt{0.10439 + \frac{0.49584}{\frac{4}{15}\log y - \log 400000}} \le 0.33239
\end{aligned}
\tag{14.45}
\]
for $y \ge 10^{150}$.
For $y$ between $10^{25}$ and $10^{150}$, we evaluate the left side of (14.45) directly, using
the definition (13.35) of $I_{0,r_0,r_1,z}$ instead, as well as the bound
\[
c_\varphi \le \frac{0.07455}{\log\frac{\log y}{2}}
\]
from (14.33). (It is clear from the second and third lines of (13.32) that $I_{0,r_0,r_1,z}$ is
decreasing in $z$ for $r_0$, $r_1$ fixed, and so the upper bound for $c_\varphi$ does give the worst case.)
The bisection method (applied to the interval $[25,150]$ with 30 iterations, including 30
initial iterations) gives us that
\[
(1 - c_\varphi)\sqrt{I_{0,r_0,r_1,y}} + c_\varphi\sqrt{I_{0,r_0,r_1,\frac{2y}{\log y}}} \le 0.4153461 \tag{14.46}
\]
for $10^{25} \le y \le 10^{150}$. By (14.45), (14.46) is also true for $y > 10^{150}$. Hence
\[
f_0(r_0,y) \le 0.4153461\cdot\sqrt{\frac{2}{\sqrt{r_0}}\,5.73827} \le 0.071498.
\]

By (13.33), we conclude that
\[
\int_{r_0}^{r_1}\frac{g(r)}{r}\,dr \le 0.071498 + 0.016908 + 0.001399 \le 0.089805.
\]
By (14.30),
\[
\frac{2S}{\log x + 2c^-} \le \frac{2(0.640209x\log x - 0.021095x)}{\log x + 2c^-} \le 2\cdot 0.640209x = 1.280418x,
\]
where we recall that $c^- = 0.6394 > 0$. Hence
\[
\frac{2S}{\log x + 2c^-}\int_{r_0}^{r_1}\frac{g(r)}{r}\,dr \le 0.114988x. \tag{14.47}
\]
Putting (14.37), (14.43) and (14.47) together, we conclude that the quantity $M$
defined in (13.13) is bounded by
\[
M \le 0.38492x + 0.30517x + 0.114988x \le 0.80508x. \tag{14.48}
\]
Gathering the terms from (14.29), (14.32) and (14.48), we see that Theorem 13.2.1
states that the minor-arc total
\[
Z_{r_0} = \int_{(\mathbb{R}/\mathbb{Z})\setminus\mathfrak{M}_{8,r_0}} |S_{\eta_*}(\alpha,x)||S_{\eta_+}(\alpha,x)|^2\,d\alpha
\]
is bounded by
\[
\begin{aligned}
Z_{r_0} &\le \left(\sqrt{\frac{|\varphi|_1x}{\kappa}(M + T)} + \sqrt{S_{\eta_*}(0,x)\cdot E}\right)^2\\
&\le \left(\sqrt{|\varphi|_1(0.80508 + 3.5776\cdot 10^{-4})}\,\frac{x}{\sqrt{\kappa}} + \sqrt{1.0532\cdot 10^{-11}}\,\frac{x}{\sqrt{\kappa}}\right)^2\\
&\le 1.00948\,\frac{x^2}{\kappa}
\end{aligned}
\tag{14.49}
\]
for $r_0 = 150000$, $x \ge 4.9\cdot 10^{26}$, where we use yet again the fact that $|\varphi|_1 = \sqrt{\pi/2}$.
This is our total minor-arc bound.
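The constant in (14.49) can be recomputed from the three pieces of $M$, the bound for $T$ and the bound for $S_{\eta_*}(0,x)\cdot E$. A minimal sketch, working in units of $x^2/\kappa$ and using illustrative names:

```python
import math

PHI_L1 = math.sqrt(math.pi / 2)

M_PARTS = (0.38492, 0.30517, 0.114988)   # (14.37), (14.43), (14.47), in units of x
M_BOUND = 0.80508                        # rounded-up sum, (14.48)
T_BOUND = 3.5776e-4                      # (14.32), in units of x
SE_BOUND = 1.0532e-11                    # bound for S_{eta_*}(0,x) * E, in units of x^2/kappa

def minor_arc_constant() -> float:
    """(sqrt(|phi|_1 (M + T)) + sqrt(S*E))^2 in units of x^2/kappa."""
    return (math.sqrt(PHI_L1 * (M_BOUND + T_BOUND)) + math.sqrt(SE_BOUND)) ** 2
```

The result is about 1.00947, which rounds up to the constant 1.00948 stated in (14.49).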

14.4 Conclusion: proof of main theorem


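As we have known from the start,
\[
\sum_{n_1+n_2+n_3=N}\Lambda(n_1)\Lambda(n_2)\Lambda(n_3)\eta_+(n_1)\eta_+(n_2)\eta_*(n_3) = \int_{\mathbb{R}/\mathbb{Z}} S_{\eta_+}(\alpha,x)^2S_{\eta_*}(\alpha,x)e(-N\alpha)\,d\alpha. \tag{14.50}
\]
We have just shown that, assuming $N \ge 10^{27}$, $N$ odd,
\[
\begin{aligned}
\int_{\mathbb{R}/\mathbb{Z}} S_{\eta_+}(\alpha,x)^2S_{\eta_*}(\alpha,x)e(-N\alpha)\,d\alpha &= \int_{\mathfrak{M}_{8,r_0}} S_{\eta_+}(\alpha,x)^2S_{\eta_*}(\alpha,x)e(-N\alpha)\,d\alpha\\
&\quad + O^*\left(\int_{(\mathbb{R}/\mathbb{Z})\setminus\mathfrak{M}_{8,r_0}} |S_{\eta_+}(\alpha,x)|^2|S_{\eta_*}(\alpha,x)|\,d\alpha\right)\\
&\ge 1.058259\,\frac{x^2}{\kappa} + O^*\left(1.00948\,\frac{x^2}{\kappa}\right) \ge 0.04877\,\frac{x^2}{\kappa}
\end{aligned}
\]
for $r_0 = 150000$, where $x = N/(2 + 9/(196\sqrt{2\pi}))$, as in (14.15). (We are using
(14.27) and (14.49).) Recall that $\kappa = 49$ and $\eta_*(t) = (\eta_2 *_M\varphi)(\kappa t)$, where $\varphi(t) = t^2e^{-t^2/2}$.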
It only remains to show that the contribution of terms with $n_1$, $n_2$ or $n_3$ non-prime
to the sum in (14.50) is negligible. (Let us take out $n_1$, $n_2$, $n_3$ equal to 2 as well, since
some prefer to state the ternary Goldbach conjecture as follows: every odd number $\ge 9$
is the sum of three odd primes.) Clearly
\[
\begin{aligned}
&\sum_{\substack{n_1+n_2+n_3=N\\ n_1,\,n_2\text{ or }n_3\text{ even or non-prime}}}\Lambda(n_1)\Lambda(n_2)\Lambda(n_3)\eta_+(n_1)\eta_+(n_2)\eta_*(n_3)\\
&\quad\le 3|\eta_+|_\infty^2|\eta_*|_\infty\sum_{\substack{n_1+n_2+n_3=N\\ n_1\text{ even or non-prime}}}\Lambda(n_1)\Lambda(n_2)\Lambda(n_3)\\
&\quad\le 3|\eta_+|_\infty^2|\eta_*|_\infty\cdot(\log N)\sum_{\substack{n_1\le N\text{ non-prime}\\ \text{or }n_1=2}}\Lambda(n_1)\sum_{n_2\le N}\Lambda(n_2).
\end{aligned}
\tag{14.51}
\]
By (14.3) and (14.21), $|\eta_+|_\infty \le 1.079955$ and $|\eta_*|_\infty \le 1.414$. By [RS62, Thms. 12
and 13],
\[
\begin{aligned}
\sum_{\substack{n_1\le N\text{ non-prime}\\ \text{or }n_1=2}}\Lambda(n_1) &< 1.4262\sqrt{N} + \log 2 < 1.4263\sqrt{N},\\
\sum_{\substack{n_1\le N\text{ non-prime}\\ \text{or }n_1=2}}\Lambda(n_1)\sum_{n_2\le N}\Lambda(n_2) &\le 1.4263\sqrt{N}\cdot 1.03883N \le 1.48169N^{3/2}.
\end{aligned}
\]
Hence, the sum on the first line of (14.51) is at most
\[
7.3306N^{3/2}\log N.
\]

Thus, for $N \ge 10^{27}$ odd,
\[
\begin{aligned}
\sum_{\substack{n_1+n_2+n_3=N\\ n_1,\,n_2,\,n_3\text{ odd primes}}}\Lambda(n_1)\Lambda(n_2)\Lambda(n_3)\eta_+(n_1)\eta_+(n_2)\eta_*(n_3) &\ge 0.04877\,\frac{x^2}{\kappa} - 7.3306N^{3/2}\log N\\
&\ge 0.00024433N^2 - 1.4412\cdot 10^{-11}\cdot N^2 \ge 0.0002443N^2
\end{aligned}
\]
by $\kappa = 49$ and (14.15). Since $0.0002443N^2 > 0$, this shows that every odd number
$N \ge 10^{27}$ can be written as the sum of three odd primes.
Since the ternary Goldbach conjecture has already been checked for all $N \le 8.875\cdot 10^{30}$ [HP13], we conclude that every odd number $N > 7$ can be written as the sum
of three odd primes, and every odd number $N > 5$ can be written as the sum of three
primes. The main result is hereby proven: the ternary Goldbach conjecture is true.
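The closing inequalities are again plain arithmetic. The following sketch (illustrative names; constants copied from (14.15), (14.27), (14.49) and (14.51)) recomputes the main coefficient and the size of the non-prime contribution at $N = 10^{27}$, both as multiples of $N^2$.

```python
import math

KAPPA = 49
X_OVER_N = 1.0 / (2.0 + (9 / 4) / math.sqrt(2 * math.pi) / KAPPA)   # (14.15)

def main_coefficient() -> float:
    """Coefficient of N^2 in 0.04877 x^2 / kappa."""
    return 0.04877 * X_OVER_N ** 2 / KAPPA

def nonprime_constant() -> float:
    """3 |eta_+|_inf^2 |eta_*|_inf * 1.48169, the constant in front of N^{3/2} log N."""
    return 3 * 1.079955 ** 2 * 1.414 * 1.48169

def nonprime_coefficient(n: float) -> float:
    """7.3306 N^{3/2} log N expressed as a multiple of N^2."""
    return 7.3306 * math.log(n) / math.sqrt(n)
```

Since $\log N/\sqrt{N}$ is decreasing for $N \ge e^2$, the value at $N = 10^{27}$ is the worst case for the second term.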
Part IV

Appendices

Appendix A

Norms of smoothing functions

Our aim here is to give bounds on the norms of some smoothing functions and, in
particular, on several norms of a smoothing function $\eta_+ : [0,\infty) \to \mathbb{R}$ based on the
Gaussian $\eta_\heartsuit(t) = e^{-t^2/2}$.
As before, we write
\[
h : t \mapsto \begin{cases} t^2(2-t)^3e^{t-1/2} & \text{if } t \in [0,2],\\ 0 & \text{otherwise.}\end{cases} \tag{A.1}
\]
We recall that we will work with an approximation $\eta_+$ to the function $\eta_\circ : [0,\infty) \to \mathbb{R}$
defined by
\[
\eta_\circ(t) = h(t)\,t\,\eta_\heartsuit(t) = \begin{cases} t^3(2-t)^3e^{-(t-1)^2/2} & \text{for } t \in [0,2],\\ 0 & \text{otherwise.}\end{cases} \tag{A.2}
\]

The approximation $\eta_+$ is defined by
\[
\eta_+(t) = h_H(t)\,te^{-t^2/2}, \tag{A.3}
\]
where
\[
\begin{aligned}
F_H(y) &= \frac{\sin(H\log y)}{\pi\log y},\\
h_H(t) &= (h *_M F_H)(t) = \int_0^\infty h(ty^{-1})F_H(y)\,\frac{dy}{y}
\end{aligned}
\tag{A.4}
\]
and $H$ is a positive constant to be set later. By (2.8), $Mh_H = Mh\cdot MF_H$. Now $F_H$ is
just a Dirichlet kernel under a change of variables; using this, we get that, for $\tau$ real,
\[
MF_H(i\tau) = \begin{cases} 1 & \text{if } |\tau| < H,\\ 1/2 & \text{if } |\tau| = H,\\ 0 & \text{if } |\tau| > H.\end{cases} \tag{A.5}
\]

Thus,
\[
Mh_H(i\tau) = \begin{cases} Mh(i\tau) & \text{if } |\tau| < H,\\ \frac{1}{2}Mh(i\tau) & \text{if } |\tau| = H,\\ 0 & \text{if } |\tau| > H.\end{cases} \tag{A.6}
\]

As it turns out, h, η◦ and M h (and hence M hH ) are relatively easy to work with,
whereas we can already see that hH and η+ have more complicated definitions. Part
of our work will consist in expressing norms of hH and η+ in terms of norms of h, η◦
and M h.

A.1 The decay of a Mellin transform


Now, consider any $\phi : [0,\infty) \to \mathbb{C}$ that (a) has compact support (or fast decay), (b)
satisfies $\phi^{(k)}(t)t^{k-1} = O(1)$ for $t \to 0^+$ and $0 \le k \le 3$, and (c) is $C^2$ everywhere and
quadruply differentiable outside a finite set of points.
By definition,
\[
M\phi(s) = \int_0^\infty\phi(x)x^s\,\frac{dx}{x}.
\]
Thus, by integration by parts, for $\Re(s) > -1$ and $s \ne 0$,
\[
\begin{aligned}
M\phi(s) &= \int_0^\infty\phi(x)x^s\,\frac{dx}{x} = \lim_{t\to 0^+}\int_t^\infty\phi(x)x^s\,\frac{dx}{x} = -\lim_{t\to 0^+}\int_t^\infty\phi'(x)\frac{x^s}{s}\,dx\\
&= \lim_{t\to 0^+}\int_t^\infty\phi''(x)\frac{x^{s+1}}{s(s+1)}\,dx = \lim_{t\to 0^+} -\int_t^\infty\phi^{(3)}(x)\frac{x^{s+2}}{s(s+1)(s+2)}\,dx\\
&= \lim_{t\to 0^+}\int_t^\infty\phi^{(4)}(x)\frac{x^{s+3}}{s(s+1)(s+2)(s+3)}\,dx,
\end{aligned}
\tag{A.7}
\]
where $\phi^{(4)}(x)$ is understood in the sense of distributions at the finitely many points
where it is not well-defined as a function.

Let $s = it$, $\phi = h$. Let $C_k = \lim_{t\to 0^+}\int_t^\infty |h^{(k)}(x)|x^{k-1}\,dx$ for $0 \le k \le 4$. Then
(A.7) gives us that
\[
|Mh(it)| \le \min\left(C_0, \frac{C_1}{|t|}, \frac{C_2}{|t||t+i|}, \frac{C_3}{|t||t+i||t+2i|}, \frac{C_4}{|t||t+i||t+2i||t+3i|}\right). \tag{A.8}
\]
We must estimate the constants $C_j$, $0 \le j \le 4$.
Clearly, $h(t)t^{-1} = O(1)$ as $t \to 0^+$, $h^{(k)}(t) = O(1)$ as $t \to 0^+$ for all $k \ge 1$,
$h(2) = h'(2) = h''(2) = 0$, and $h(x)$, $h'(x)$ and $h''(x)$ are all continuous. The
function $h'''$ has a discontinuity at $t = 2$. As we said, we understand $h^{(4)}$ in the sense
of distributions at $t = 2$; for example, $\lim_{\epsilon\to 0}\int_{2-\epsilon}^{2+\epsilon}h^{(4)}(t)\,dt = \lim_{\epsilon\to 0}(h^{(3)}(2+\epsilon) - h^{(3)}(2-\epsilon))$.
Symbolic integration easily gives that
\[
C_0 = \int_0^2 t(2-t)^3e^{t-1/2}\,dt = 92e^{-1/2} - 12e^{3/2} = 2.02055184\ldots \tag{A.9}
\]
We will have to compute $C_k$, $1 \le k \le 4$, with some care, due to the absolute value
involved in the definition.
The function $(x^2(2-x)^3e^{x-1/2})' = ((x^2(2-x)^3)' + x^2(2-x)^3)e^{x-1/2}$ has the
same zeros as $H_1(x) = (x^2(2-x)^3)' + x^2(2-x)^3$, namely, $-4$, $0$, $1$ and $2$. The sign
of $H_1(x)$ (and hence of $h'(x)$) is $+$ within $(0,1)$ and $-$ within $(1,2)$. Hence
\[
C_1 = \int_0^\infty |h'(x)|\,dx = |h(1) - h(0)| + |h(2) - h(1)| = 2h(1) = 2\sqrt{e}. \tag{A.10}
\]

The situation with $(x^2(2-x)^3e^{x-1/2})''$ is similar: it has zeros at the roots of
$H_2(x) = 0$, where $H_2(x) = H_1(x) + H_1'(x)$ (and, in general, $H_{k+1}(x) = H_k(x) + H_k'(x)$). This time, we will prefer to find the roots numerically. It is enough to find
(candidates for) the roots using any available tool¹ and then check rigorously that the
sign does change around the purported roots. In this way, we check that $H_2(x) = 0$ has
two roots $\alpha_{2,1}$, $\alpha_{2,2}$ in the interval $(0,2)$, another root at 2, and two more roots outside
$[0,2]$; moreover,
\[
\begin{aligned}
\alpha_{2,1} &= 0.48756597185712\ldots,\\
\alpha_{2,2} &= 1.48777169309489\ldots,
\end{aligned}
\tag{A.11}
\]
where we verify the root using interval arithmetic. The sign of $H_2(x)$ (and hence of
$h''(x)$) is first $+$, then $-$, then $+$. Write $\alpha_{2,0} = 0$, $\alpha_{2,3} = 2$. By integration by parts,
\[
\begin{aligned}
C_2 &= \int_0^\infty |h''(x)|x\,dx = \int_0^{\alpha_{2,1}}h''(x)x\,dx - \int_{\alpha_{2,1}}^{\alpha_{2,2}}h''(x)x\,dx + \int_{\alpha_{2,2}}^2 h''(x)x\,dx\\
&= \sum_{j=1}^3(-1)^{j+1}\left(h'(x)x\Big|_{\alpha_{2,j-1}}^{\alpha_{2,j}} - \int_{\alpha_{2,j-1}}^{\alpha_{2,j}}h'(x)\,dx\right)\\
&= 2\sum_{j=1}^2(-1)^{j+1}\left(h'(\alpha_{2,j})\alpha_{2,j} - h(\alpha_{2,j})\right) = 10.79195821037\ldots
\end{aligned}
\tag{A.12}
\]
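The computation of $C_2$ can be imitated in ordinary floating-point arithmetic (the text uses interval arithmetic, which this sketch does not attempt). It represents $x^2(2-x)^3$ by its coefficient list, applies the rule $H_{k+1} = H_k + H_k'$, locates the two roots of $H_2$ in $(0,2)$ by bisection, and evaluates the last expression in (A.12). All helper names are illustrative.

```python
import math

def poly_eval(coeffs, x):
    return sum(a * x ** k for k, a in enumerate(coeffs))

def poly_deriv(coeffs):
    return [k * a for k, a in enumerate(coeffs)][1:]

def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

P0 = [0, 0, 8, -12, 6, -1]            # x^2 (2 - x)^3 = 8x^2 - 12x^3 + 6x^4 - x^5
H1 = poly_add(P0, poly_deriv(P0))     # h'(x)  = H1(x) e^{x - 1/2}
H2 = poly_add(H1, poly_deriv(H1))     # h''(x) = H2(x) e^{x - 1/2}

def h(x):
    return poly_eval(P0, x) * math.exp(x - 0.5)

def h_prime(x):
    return poly_eval(H1, x) * math.exp(x - 0.5)

def bisect(fn, lo, hi, iters=80):
    """Plain bisection; assumes fn changes sign on [lo, hi]."""
    f_lo = fn(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = fn(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def compute_c2():
    a1 = bisect(lambda x: poly_eval(H2, x), 0.1, 1.0)
    a2 = bisect(lambda x: poly_eval(H2, x), 1.0, 1.9)
    total = sum((-1) ** (j + 1) * (h_prime(a) * a - h(a)) for j, a in enumerate((a1, a2), start=1))
    return a1, a2, 2 * total
```

The same polynomial bookkeeping extends to $H_3$ and $H_4$ for the constants $C_3$ and $C_4$.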
To compute $C_3$, we proceed in the same way, finding two roots of $H_3(x) = 0$
(numerically) within the interval $(0,2)$, viz.,
\[
\begin{aligned}
\alpha_{3,1} &= 1.04294565694978\ldots\\
\alpha_{3,2} &= 1.80999654602916\ldots
\end{aligned}
\]
The sign of $H_3(x)$ on the interval $[0,2]$ is first $-$, then $+$, then $-$. Write $\alpha_{3,0} = 0$,
$\alpha_{3,3} = 2$. Proceeding as before, with the only difference that the integration by parts
is iterated once now, we obtain that
\[
\begin{aligned}
C_3 &= \int_0^\infty |h'''(x)|x^2\,dx = \sum_{j=1}^3(-1)^j\int_{\alpha_{3,j-1}}^{\alpha_{3,j}}h'''(x)x^2\,dx\\
&= \sum_{j=1}^3(-1)^j\left(h''(x)x^2\Big|_{\alpha_{3,j-1}}^{\alpha_{3,j}} - \int_{\alpha_{3,j-1}}^{\alpha_{3,j}}h''(x)\cdot 2x\,dx\right)\\
&= \sum_{j=1}^3(-1)^j\left(h''(x)x^2 - h'(x)\cdot 2x + 2h(x)\right)\Big|_{\alpha_{3,j-1}}^{\alpha_{3,j}}\\
&= 2\sum_{j=1}^2(-1)^j\left(h''(\alpha_{3,j})\alpha_{3,j}^2 - 2h'(\alpha_{3,j})\alpha_{3,j} + 2h(\alpha_{3,j})\right)
\end{aligned}
\tag{A.13}
\]
and so interval arithmetic gives us
\[
C_3 = 75.1295251672\ldots \tag{A.14}
\]

¹ Routine find_root in SAGE was used here.

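The treatment of the integral in $C_4$ is very similar, at least at first. There are two
roots of $H_4(x) = 0$ in the interval $(0,2)$, namely,
\[
\begin{aligned}
\alpha_{4,1} &= 0.45839599852663\ldots\\
\alpha_{4,2} &= 1.54626346975533\ldots
\end{aligned}
\]
The sign of $H_4(x)$ on the interval $[0,2]$ is first $-$, then $+$, then $-$. Using integration by parts
as before, we obtain
\[
\begin{aligned}
\int_{0^+}^{2^-}\left|h^{(4)}(x)\right|x^3\,dx &= -\int_{0^+}^{\alpha_{4,1}}h^{(4)}(x)x^3\,dx + \int_{\alpha_{4,1}}^{\alpha_{4,2}}h^{(4)}(x)x^3\,dx - \int_{\alpha_{4,2}}^{2^-}h^{(4)}(x)x^3\,dx\\
&= 2\sum_{j=1}^2(-1)^j\left(h^{(3)}(\alpha_{4,j})\alpha_{4,j}^3 - 3h^{(2)}(\alpha_{4,j})\alpha_{4,j}^2 + 6h'(\alpha_{4,j})\alpha_{4,j} - 6h(\alpha_{4,j})\right)\\
&\quad - \lim_{t\to 2^-}h^{(3)}(t)t^3 = 1152.69754862\ldots,
\end{aligned}
\]
since $\lim_{t\to 0^+}h^{(k)}(t)t^k = 0$ for $0 \le k \le 3$, $\lim_{t\to 2^-}h^{(k)}(t) = 0$ for $0 \le k \le 2$ and
$\lim_{t\to 2^-}h^{(3)}(t) = -24e^{3/2}$. Now
\[
\int_{2^-}^\infty |h^{(4)}(x)x^3|\,dx = \lim_{\epsilon\to 0^+}|h^{(3)}(2+\epsilon) - h^{(3)}(2-\epsilon)|\cdot 2^3 = 2^3\cdot 24e^{3/2}.
\]
Hence
\[
C_4 = \int_{0^+}^{2^-}\left|h^{(4)}(x)\right|x^3\,dx + 24e^{3/2}\cdot 2^3 = 2013.18185012\ldots \tag{A.15}
\]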
We finish by remarking that we can write down $Mh$ explicitly:
\[
Mh = -e^{-1/2}(-1)^{-s}\left(8\gamma(s+2,-2) + 12\gamma(s+3,-2) + 6\gamma(s+4,-2) + \gamma(s+5,-2)\right), \tag{A.16}
\]
where $\gamma(s,x)$ is the (lower) incomplete Gamma function
\[
\gamma(s,x) = \int_0^x e^{-t}t^{s-1}\,dt.
\]

We will, however, find it easier to deal with M h by means of the bound (A.8), in part
because (A.16) amounts to an invitation to numerical instability.
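For instance, it is easy to use (A.8) to give a bound for the $\ell_1$-norm of $Mh(it)$.
Since $C_4/C_3 > C_3/C_2 > C_2/C_1 > C_1/C_0$,
\[
\begin{aligned}
|Mh(it)|_1 &= 2\int_0^\infty |Mh(it)|\,dt\\
&\le 2\left(C_0\frac{C_1}{C_0} + C_1\int_{C_1/C_0}^{C_2/C_1}\frac{dt}{t} + C_2\int_{C_2/C_1}^{C_3/C_2}\frac{dt}{t^2} + C_3\int_{C_3/C_2}^{C_4/C_3}\frac{dt}{t^3} + C_4\int_{C_4/C_3}^\infty\frac{dt}{t^4}\right)\\
&= 2\left(C_1 + C_1\log\frac{C_2C_0}{C_1^2} + C_2\left(\frac{C_1}{C_2} - \frac{C_2}{C_3}\right) + \frac{C_3}{2}\left(\frac{C_2^2}{C_3^2} - \frac{C_3^2}{C_4^2}\right) + \frac{C_4}{3}\cdot\frac{C_3^3}{C_4^3}\right),
\end{aligned}
\]
and so
\[
|Mh(it)|_1 \le 16.1939176. \tag{A.17}
\]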

This bound is far from tight, but it will certainly be useful.
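The number in (A.17) can be reproduced from the constants $C_0,\ldots,C_4$ in (A.9) through (A.15). The sketch below hard-codes the truncated decimal values of $C_2$, $C_3$, $C_4$ quoted in the text, so agreement is only expected to several decimal places; the names are illustrative.

```python
import math

C0 = 92 * math.exp(-0.5) - 12 * math.exp(1.5)   # (A.9)
C1 = 2 * math.sqrt(math.e)                      # (A.10)
C2 = 10.79195821037                             # (A.12), truncated
C3 = 75.1295251672                              # (A.14), truncated
C4 = 2013.18185012                              # (A.15), truncated

def mellin_l1_bound() -> float:
    """Closed-form value of the piecewise integral bounding |Mh(it)|_1."""
    return 2 * (
        C1
        + C1 * math.log(C2 * C0 / C1 ** 2)
        + C2 * (C1 / C2 - C2 / C3)
        + (C3 / 2) * (C2 ** 2 / C3 ** 2 - C3 ** 2 / C4 ** 2)
        + (C4 / 3) * C3 ** 3 / C4 ** 3
    )
```

Dividing the result by $2\pi$ gives the factor 2.5773421 that appears later in the bound for $|\eta_+\cdot\log|_2$.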


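Similarly, $|(t+i)Mh(it)|_1$ is at most two times
\[
\begin{aligned}
&C_0\int_0^{\frac{C_1}{C_0}}|t+i|\,dt + C_1\int_{\frac{C_1}{C_0}}^{\frac{C_2}{C_1}}\left|1 + \frac{i}{t}\right|dt + C_2\int_{\frac{C_2}{C_1}}^{\frac{C_3}{C_2}}\frac{dt}{t} + C_3\int_{\frac{C_3}{C_2}}^{\frac{C_4}{C_3}}\frac{dt}{t^2} + C_4\int_{\frac{C_4}{C_3}}^\infty\frac{dt}{t^3}\\
&\quad= \frac{C_0}{2}\left(\sqrt{\frac{C_1^4}{C_0^4} + \frac{C_1^2}{C_0^2}} + \sinh^{-1}\frac{C_1}{C_0}\right) + C_1\left(\sqrt{t^2+1} + \log\left(\frac{\sqrt{t^2+1} - 1}{t}\right)\right)\Bigg|_{\frac{C_1}{C_0}}^{\frac{C_2}{C_1}}\\
&\qquad + C_2\log\frac{C_3C_1}{C_2^2} + C_3\left(\frac{C_2}{C_3} - \frac{C_3}{C_4}\right) + \frac{C_4}{2}\cdot\frac{C_3^2}{C_4^2},
\end{aligned}
\]
and so
\[
|(t+i)Mh(it)|_1 \le 27.8622803. \tag{A.18}
\]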

A.2 The difference η+ − η◦ in ℓ2 norm

We wish to estimate the distance in $\ell_2$ norm between $\eta_\circ$ and its approximation $\eta_+$. This
will be an easy affair, since, on the imaginary axis, the Mellin transform of $\eta_+$ is just a
truncation of the Mellin transform of $\eta_\circ$.
By (A.2) and (A.3),
\[
\begin{aligned}
|\eta_+ - \eta_\circ|_2^2 &= \int_0^\infty\left|h_H(t)te^{-t^2/2} - h(t)te^{-t^2/2}\right|^2dt\\
&\le \left(\max_{t\ge 0}e^{-t^2}t^3\right)\cdot\int_0^\infty |h_H(t) - h(t)|^2\,\frac{dt}{t}.
\end{aligned}
\tag{A.19}
\]
The maximum $\max_{t\ge 0}t^3e^{-t^2}$ is $(3/2)^{3/2}e^{-3/2}$. Since the Mellin transform is an
isometry (i.e., (2.6) holds),
\[
\int_0^\infty |h_H(t) - h(t)|^2\,\frac{dt}{t} = \frac{1}{2\pi}\int_{-\infty}^\infty |Mh_H(it) - Mh(it)|^2\,dt = \frac{1}{\pi}\int_H^\infty |Mh(it)|^2\,dt. \tag{A.20}
\]
By (A.8),
\[
\int_H^\infty |Mh(it)|^2\,dt \le \int_H^\infty\frac{C_4^2}{t^8}\,dt \le \frac{C_4^2}{7H^7}. \tag{A.21}
\]
Hence
\[
\int_0^\infty |h_H(t) - h(t)|^2\,\frac{dt}{t} \le \frac{C_4^2}{7\pi H^7}. \tag{A.22}
\]
Using the bound (A.15) for $C_4$, we conclude that
\[
|\eta_+ - \eta_\circ|_2 \le \frac{C_4}{\sqrt{7\pi}}\cdot\left(\frac{3}{2e}\right)^{3/4}\cdot\frac{1}{H^{7/2}} \le \frac{274.856893}{H^{7/2}}. \tag{A.23}
\]
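A quick numerical check of the constant in (A.23) is given below. The value $H = 200$ used in the example is an assumption: $H$ is only said to be "set later", and 200 is inferred here from the expression $\log 200/200$ in (14.24) and from the bound $2.42942\cdot 10^{-6}$ quoted from (14.5).

```python
import math

C4 = 2013.18185012   # (A.15), truncated

def l2_distance_constant() -> float:
    """C_4 / sqrt(7 pi) * (3/(2e))^{3/4}, the numerator in (A.23)."""
    return C4 / math.sqrt(7 * math.pi) * (3 / (2 * math.e)) ** 0.75

def l2_distance_bound(H: float) -> float:
    """Bound for |eta_+ - eta_circ|_2 as a function of the truncation height H."""
    return l2_distance_constant() / H ** 3.5
```

With the assumed value $H = 200$, `l2_distance_bound(200)` comes out just below $2.42942\cdot 10^{-6}$.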
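It will also be useful to bound
\[
\left|\int_0^\infty(\eta_+(t) - \eta_\circ(t))^2\log t\,dt\right|.
\]
This is at most
\[
\left(\max_{t\ge 0}e^{-t^2}t^3|\log t|\right)\cdot\int_0^\infty |h_H(t) - h(t)|^2\,\frac{dt}{t}.
\]
Now
\[
\begin{aligned}
\max_{t\ge 0}e^{-t^2}t^3|\log t| &= \max\left(\max_{t\in[0,1]}e^{-t^2}t^3(-\log t), \max_{t\in[1,5]}e^{-t^2}t^3\log t\right)\\
&= 0.14882234545\ldots
\end{aligned}
\]
where we find the maximum by the bisection method with 40 iterations (see 2.6).
Hence, by (A.22),
\[
\begin{aligned}
\int_0^\infty(\eta_+(t) - \eta_\circ(t))^2|\log t|\,dt &\le 0.148822346\,\frac{C_4^2}{7\pi H^7}\\
&\le \frac{27427.502}{H^7} \le \left(\frac{165.61251}{H^{7/2}}\right)^2.
\end{aligned}
\tag{A.24}
\]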

A.3 Norms involving η+


Let us now bound some $\ell_1$- and $\ell_2$-norms involving $\eta_+$. Relatively crude bounds will
suffice in most cases.
First, by (A.23),
\[
\begin{aligned}
|\eta_+|_2 &\le |\eta_\circ|_2 + |\eta_+ - \eta_\circ|_2 \le 0.800129 + \frac{274.8569}{H^{7/2}},\\
|\eta_+|_2 &\ge |\eta_\circ|_2 - |\eta_+ - \eta_\circ|_2 \ge 0.800128 - \frac{274.8569}{H^{7/2}},
\end{aligned}
\tag{A.25}
\]
where we obtain
\[
|\eta_\circ|_2 = \sqrt{0.640205997\ldots} = 0.8001287\ldots \tag{A.26}
\]
by symbolic integration.
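Let us now bound $|\eta_+\cdot\log|_2^2$. By isometry and (2.10),
\[
|\eta_+\cdot\log|_2^2 = \frac{1}{2\pi i}\int_{\frac{1}{2}-i\infty}^{\frac{1}{2}+i\infty}|M(\eta_+\cdot\log)(s)|^2\,ds = \frac{1}{2\pi i}\int_{\frac{1}{2}-i\infty}^{\frac{1}{2}+i\infty}|(M\eta_+)'(s)|^2\,ds.
\]
Now, $(M\eta_+)'(1/2+it)$ equals $1/2\pi$ times the additive convolution of $Mh_H(it)$ and
$(M\eta_\diamondsuit)'(1/2+it)$, where $\eta_\diamondsuit(t) = te^{-t^2/2}$. Hence, by Young's inequality,
\[
|(M\eta_+)'(1/2+it)|_2 \le \frac{1}{2\pi}|Mh_H(it)|_1|(M\eta_\diamondsuit)'(1/2+it)|_2.
\]
Again by isometry and (2.10),
\[
|(M\eta_\diamondsuit)'(1/2+it)|_2 = \sqrt{2\pi}\,|\eta_\diamondsuit\cdot\log|_2.
\]
Hence, by (A.17),
\[
|\eta_+\cdot\log|_2 \le \frac{1}{2\pi}|Mh_H(it)|_1|\eta_\diamondsuit\cdot\log|_2 \le 2.5773421\cdot|\eta_\diamondsuit\cdot\log|_2.
\]
Since, by symbolic integration,
\[
\begin{aligned}
|\eta_\diamondsuit\cdot\log|_2 &\le \sqrt{\frac{\sqrt{\pi}}{32}\left(8(\log 2)^2 + 2\gamma^2 + \pi^2 + 8(\gamma - 2)\log 2 - 8\gamma\right)}\\
&\le 0.3220301,
\end{aligned}
\tag{A.27}
\]
we get that
\[
|\eta_+\cdot\log|_2 \le 0.8299818. \tag{A.28}
\]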
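Let us bound $|\eta_+(t)t^\sigma|_1$ for $\sigma \in (-2,\infty)$. By Cauchy-Schwarz and Plancherel,
\[
\begin{aligned}
|\eta_+(t)t^\sigma|_1 &= \left|h_H(t)t^{1+\sigma}e^{-t^2/2}\right|_1 \le \left|t^{\sigma+3/2}e^{-t^2/2}\right|_2|h_H(t)/\sqrt{t}|_2\\
&= \left|t^{\sigma+3/2}e^{-t^2/2}\right|_2\sqrt{\int_0^\infty |h_H(t)|^2\,\frac{dt}{t}} = \left|t^{\sigma+3/2}e^{-t^2/2}\right|_2\cdot\sqrt{\frac{1}{2\pi}\int_{-H}^H |Mh(ir)|^2\,dr}\\
&\le \left|t^{\sigma+3/2}e^{-t^2/2}\right|_2\cdot\sqrt{\frac{1}{2\pi}\int_{-\infty}^\infty |Mh(ir)|^2\,dr} = \left|t^{\sigma+3/2}e^{-t^2/2}\right|_2\cdot|h(t)/\sqrt{t}|_2.
\end{aligned}
\tag{A.29}
\]
Since
\[
\left|t^{\sigma+3/2}e^{-t^2/2}\right|_2 = \sqrt{\int_0^\infty e^{-t^2}t^{2\sigma+3}\,dt} = \sqrt{\frac{\Gamma(\sigma+2)}{2}},
\qquad
|h(t)/\sqrt{t}|_2 = \sqrt{\frac{31989}{8e} - \frac{585e^3}{8}} \le 1.5023459,
\]
we conclude that
\[
|\eta_+(t)t^\sigma|_1 \le 1.062319\cdot\sqrt{\Gamma(\sigma+2)} \tag{A.30}
\]
for $\sigma > -2$.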
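The closed form for $|h(t)/\sqrt{t}|_2$ involves a difference of two numbers of size about 1470, so it is worth cross-checking against direct numerical integration of $t^3(2-t)^6e^{2t-1}$ over $[0,2]$. The sketch below uses composite Simpson's rule, chosen here only for illustration; it is not rigorous.

```python
import math

def simpson(fn, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    total = fn(a) + fn(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * fn(a + i * step)
    return total * step / 3

def h_over_sqrt_t_norm_numeric() -> float:
    """|h(t)/sqrt(t)|_2 via numerical integration of h(t)^2 / t on [0, 2]."""
    return math.sqrt(simpson(lambda t: t ** 3 * (2 - t) ** 6 * math.exp(2 * t - 1), 0.0, 2.0))

def h_over_sqrt_t_norm_closed() -> float:
    """|h(t)/sqrt(t)|_2 via the closed form sqrt(31989/(8e) - 585 e^3 / 8)."""
    return math.sqrt(31989 / (8 * math.e) - 585 * math.e ** 3 / 8)
```

Dividing either value by $\sqrt{2}$ gives the constant 1.062319 of (A.30).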

A.4 Norms involving η+′

By one of the standard transformation rules (see (2.10)), the Mellin transform of $\eta_+'$
equals $-(s-1)\cdot M\eta_+(s-1)$. Since the Mellin transform is an isometry in the sense
of (2.6),
\[
|\eta_+'|_2^2 = \frac{1}{2\pi i}\int_{\frac{1}{2}-i\infty}^{\frac{1}{2}+i\infty}\left|M(\eta_+')(s)\right|^2ds = \frac{1}{2\pi i}\int_{-\frac{1}{2}-i\infty}^{-\frac{1}{2}+i\infty}|s\cdot M\eta_+(s)|^2\,ds.
\]
Recall that $\eta_+(t) = h_H(t)\eta_\diamondsuit(t)$, where $\eta_\diamondsuit(t) = te^{-t^2/2}$. Thus, by (2.9), the function
$M\eta_+(-1/2+it)$ equals $1/2\pi$ times the (additive) convolution of $Mh_H(it)$ and
$M\eta_\diamondsuit(-1/2+it)$. Therefore, for $s = -1/2+it$,
\[
\begin{aligned}
|s||M\eta_+(s)| &= \frac{|s|}{2\pi}\left|\int_{-H}^H Mh(ir)M\eta_\diamondsuit(s - ir)\,dr\right|\\
&\le \frac{3}{2\pi}\int_{-H}^H |ir - 1||Mh(ir)|\cdot|s - ir||M\eta_\diamondsuit(s - ir)|\,dr\\
&= \frac{3}{2\pi}(f * g)(t),
\end{aligned}
\tag{A.31}
\]
where $f(t) = |it - 1||Mh(it)|$ and $g(t) = |-1/2+it||M\eta_\diamondsuit(-1/2+it)|$. (Since
$|(-1/2 + i(t-r)) + (1+ir)| = |1/2+it| = |s|$, either $|-1/2 + i(t-r)| \ge |s|/3$ or
$|1+ir| \ge 2|s|/3$; hence $|s - ir||ir - 1| = |-1/2 + i(t-r)||1+ir| \ge |s|/3$.) By Young's
inequality (in a special case that follows from Cauchy-Schwarz), $|f * g|_2 \le |f|_1|g|_2$.
By (A.18),
\[
|f|_1 = |(r+i)Mh(ir)|_1 \le 27.8622803.
\]
Yet again by Plancherel,
\[
\begin{aligned}
|g|_2^2 &= \int_{-\frac{1}{2}-i\infty}^{-\frac{1}{2}+i\infty}|s|^2|M\eta_\diamondsuit(s)|^2\,ds\\
&= \int_{\frac{1}{2}-i\infty}^{\frac{1}{2}+i\infty}|(M(\eta_\diamondsuit'))(s)|^2\,ds = 2\pi|\eta_\diamondsuit'|_2^2 = \frac{3\pi^{3/2}}{4}.
\end{aligned}
\]
Hence
\[
\begin{aligned}
|\eta_+'|_2 &\le \frac{1}{\sqrt{2\pi}}\cdot\frac{3}{2\pi}|f * g|_2\\
&\le \frac{1}{\sqrt{2\pi}}\cdot\frac{3}{2\pi}\cdot 27.8622803\sqrt{\frac{3\pi^{3/2}}{4}} \le 10.845789.
\end{aligned}
\tag{A.32}
\]

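Let us now bound $|\eta_+'(t)t^\sigma|_1$ for $\sigma \in (-1,\infty)$. First of all,
\[
\begin{aligned}
|\eta_+'(t)t^\sigma|_1 &= \left|\left(h_H(t)te^{-t^2/2}\right)'t^\sigma\right|_1\\
&\le \left|\left(h_H'(t)te^{-t^2/2} + h_H(t)(1 - t^2)e^{-t^2/2}\right)\cdot t^\sigma\right|_1\\
&\le \left|h_H'(t)t^{\sigma+1}e^{-t^2/2}\right|_1 + |\eta_+(t)t^{\sigma-1}|_1 + |\eta_+(t)t^{\sigma+1}|_1.
\end{aligned}
\]
We can bound the last two terms by (A.30). Much as in (A.29), we note that
\[
\left|h_H'(t)t^{\sigma+1}e^{-t^2/2}\right|_1 \le \left|t^{\sigma+1/2}e^{-t^2/2}\right|_2|h_H'(t)\sqrt{t}|_2,
\]
and then see that
\[
\begin{aligned}
|h_H'(t)\sqrt{t}|_2 &= \sqrt{\int_0^\infty |h_H'(t)|^2t\,dt} = \sqrt{\frac{1}{2\pi}\int_{-\infty}^\infty |M(h_H')(1+ir)|^2\,dr}\\
&= \sqrt{\frac{1}{2\pi}\int_{-\infty}^\infty |(-ir)Mh_H(ir)|^2\,dr} = \sqrt{\frac{1}{2\pi}\int_{-H}^H |(-ir)Mh(ir)|^2\,dr}\\
&= \sqrt{\frac{1}{2\pi}\int_{-H}^H |M(h')(1+ir)|^2\,dr} \le \sqrt{\frac{1}{2\pi}\int_{-\infty}^\infty |M(h')(1+ir)|^2\,dr} = |h'(t)\sqrt{t}|_2,
\end{aligned}
\]
where we use the first rule in (2.10) twice. Since
\[
\left|t^{\sigma+1/2}e^{-t^2/2}\right|_2 = \sqrt{\frac{\Gamma(\sigma+1)}{2}},
\qquad
|h'(t)\sqrt{t}|_2 = \sqrt{\frac{103983}{16e} - \frac{1899e^3}{16}} = 2.6312226,
\]
we conclude that
\[
\begin{aligned}
|\eta_+'(t)t^\sigma|_1 &\le 1.062319\cdot\left(\sqrt{\Gamma(\sigma+1)} + \sqrt{\Gamma(\sigma+3)}\right) + \sqrt{\frac{\Gamma(\sigma+1)}{2}}\cdot 2.6312226\\
&\le 2.922875\sqrt{\Gamma(\sigma+1)} + 1.062319\sqrt{\Gamma(\sigma+3)}
\end{aligned}
\tag{A.33}
\]
for $\sigma > -1$.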
288 APPENDIX A. NORMS OF SMOOTHING FUNCTIONS

A.5 The `∞ -norm of η+


Let us now get a bound for |η+ |∞ . Recall that η+ (t) = hH (t)η♦ (t), where η♦ (t) =
2
te−t /2 . Clearly

|η+ |∞ = |hH (t)η♦ (t)|∞ ≤ |η◦ |∞ + |(h(t) − hH (t))η♦ (t)|∞


h(t) − hH (t) (A.34)
≤ |η◦ |∞ + |η♦ (t)t|∞ .
t ∞

Taking derivatives, we easily see that

|η◦ |∞ = η◦ (1) = 1, |η♦ (t)t|∞ = 2/e.

It remains to bound $|(h(t)-h_H(t))/t|_\infty$. By (7.6),
\[
h_H(t) = \int_{t/2}^\infty h(ty^{-1})\frac{\sin(H\log y)}{\pi\log y}\frac{dy}{y}
= \int_{-H\log\frac2t}^\infty h\left(\frac{t}{e^{w/H}}\right)\frac{\sin w}{\pi w}\,dw. \tag{A.35}
\]

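The sine integral
\[ \mathrm{Si}(x) = \int_0^x\frac{\sin t}{t}\,dt \]
is defined for all $x$; it tends to $\pi/2$ as $x\to+\infty$ and to $-\pi/2$ as $x\to-\infty$ (see [AS64, (5.2.25)]). We apply integration by parts to the second integral in (A.35), and obtain
\[
\begin{aligned}
h_H(t)-h(t) &= -\frac1\pi\int_{-H\log\frac2t}^\infty\left(\frac{d}{dw}h\left(\frac{t}{e^{w/H}}\right)\right)\mathrm{Si}(w)\,dw - h(t)\\
&= -\frac1\pi\int_0^\infty\left(\frac{d}{dw}h\left(\frac{t}{e^{w/H}}\right)\right)\left(\mathrm{Si}(w)-\frac\pi2\right)dw\\
&\quad - \frac1\pi\int_{-H\log\frac2t}^0\left(\frac{d}{dw}h\left(\frac{t}{e^{w/H}}\right)\right)\left(\mathrm{Si}(w)+\frac\pi2\right)dw.
\end{aligned}
\]
Now
\[
\left|\frac{d}{dw}h\left(\frac{t}{e^{w/H}}\right)\right| = \frac{te^{-w/H}}{H}\left|h'\left(\frac{t}{e^{w/H}}\right)\right| \le \frac{t|h'|_\infty}{He^{w/H}}.
\]
Integration by parts easily yields the bounds $|\mathrm{Si}(x)-\pi/2| < 2/x$ for $x>0$ and $|\mathrm{Si}(x)+\pi/2| < 2/|x|$ for $x<0$; we also know that $0\le\mathrm{Si}(x)\le x<\pi/2$ for $x\in[0,1]$ and $-\pi/2 < x\le\mathrm{Si}(x)\le0$ for $x\in[-1,0]$. Hence
\[
\begin{aligned}
|h_H(t)-h(t)| &\le \frac{2t|h'|_\infty}{\pi H}\left(\int_0^1\frac\pi2e^{-w/H}\,dw + \int_1^\infty\frac{2e^{-w/H}}{w}\,dw\right)\\
&= t|h'|_\infty\cdot\left((1-e^{-1/H}) + \frac4\pi\frac{E_1(1/H)}{H}\right),
\end{aligned}
\]
where $E_1$ is the exponential integral
\[ E_1(z) = \int_z^\infty\frac{e^{-t}}{t}\,dt. \]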

By [AS64, (5.1.20)],
\[ 0 < E_1(1/H) < \frac{\log(H+1)}{e^{1/H}}, \]
and, since $\log(H+1) = \log H+\log(1+1/H) < \log H+1/H < (\log H)(1+1/H) < (\log H)e^{1/H}$ for $H\ge e$, we see that this gives us that $E_1(1/H) < \log H$ (again for $H\ge e$, as is the case). Hence
\[
\frac{|h_H(t)-h(t)|}{t} < |h'|_\infty\cdot\left(1-e^{-\frac1H}+\frac4\pi\frac{\log H}{H}\right) < |h'|_\infty\cdot\frac{1+\frac4\pi\log H}{H}, \tag{A.36}
\]
and so, by (A.34),
\[
|\eta_+|_\infty \le 1+\frac2e\left|\frac{h(t)-h_H(t)}{t}\right|_\infty < 1+\frac2e|h'|_\infty\cdot\frac{1+\frac4\pi\log H}{H}.
\]
By (A.11) and interval arithmetic, we determine that
\[ |h'|_\infty = |h'(\alpha_{2,2})| \le 2.805820379671, \tag{A.37} \]
where $\alpha_{2,2}$ is a root of $h''(x)=0$ as in (A.11). We have proven
\[
|\eta_+|_\infty < 1+\frac2e\cdot2.80582038\cdot\frac{1+\frac4\pi\log H}{H} < 1+2.06440727\cdot\frac{1+\frac4\pi\log H}{H}. \tag{A.38}
\]
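We will need three other bounds of this kind, namely, for $\eta_+(t)\log t$, $\eta_+(t)/t$ and $\eta_+(t)t$. We start as in (A.34):
\[
\begin{aligned}
|\eta_+\log t|_\infty &\le |\eta_\circ\log t|_\infty + |(h(t)-h_H(t))\eta_\diamondsuit(t)\log t|_\infty\\
&\le |\eta_\circ\log t|_\infty + |(h-h_H(t))/t|_\infty|\eta_\diamondsuit(t)t\log t|_\infty,\\
|\eta_+(t)/t|_\infty &\le |\eta_\circ(t)/t|_\infty + |(h-h_H(t))/t|_\infty|\eta_\diamondsuit(t)|_\infty,\\
|\eta_+(t)t|_\infty &\le |\eta_\circ(t)t|_\infty + |(h-h_H(t))/t|_\infty|\eta_\diamondsuit(t)t^2|_\infty.
\end{aligned}
\tag{A.39}
\]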

By the bisection method with 30 iterations, implemented with interval arithmetic,

|η◦ (t) log t|∞ ≤ 0.279491, |η♦ (t)t log t|∞ ≤ 0.3811561.

Hence, by (A.36) and (A.37),
\[
|\eta_+\log t|_\infty \le 0.279491 + 1.069456\cdot\frac{1+\frac4\pi\log H}{H}. \tag{A.40}
\]
By the bisection method with 32 iterations,

|η◦ (t)/t|∞ ≤ 1.08754396.


(We can also obtain this by solving $(\eta_\circ(t)/t)' = 0$ symbolically.) It is easy to show
that $|\eta_\diamondsuit|_\infty = 1/\sqrt e$. Hence, again by (A.36) and (A.37),
\[
|\eta_+(t)/t|_\infty \le 1.08754396 + 1.70181609\cdot\frac{1+\frac4\pi\log H}{H}. \tag{A.41}
\]

By the bisection method with 32 iterations,

|η◦ (t)t|∞ ≤ 1.06473476.

Taking derivatives, we see that $|\eta_\diamondsuit(t)t^2|_\infty = 3^{3/2}e^{-3/2}$. Hence, yet again by (A.36)
and (A.37),
\[
|\eta_+(t)t|_\infty \le 1.06473476 + 3.25312\cdot\frac{1+\frac4\pi\log H}{H}. \tag{A.42}
\]
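The four bounds (A.38), (A.40), (A.41) and (A.42) share the factor $(1+\frac4\pi\log H)/H$ and differ only in the coefficient, which is $|h'|_\infty$ times a sup norm of $\eta_\diamondsuit$. A small floating-point sketch that recomputes those coefficients is given below; the function names `tail` and `eta_plus_sup_bound` are hypothetical, and the value of `H` in the usage line is purely illustrative.

```python
import math

H_PRIME_SUP = 2.805820379671  # upper bound for |h'|_inf, from (A.37)


def tail(H):
    """The common factor (1 + (4/pi) log H) / H from (A.36); requires H >= e."""
    if H < math.e:
        raise ValueError("the derivation of (A.36) assumes H >= e")
    return (1 + (4 / math.pi) * math.log(H)) / H


# Coefficients multiplying tail(H) in each bound.
COEFFICIENTS = {
    "A.38": (2 / math.e) * H_PRIME_SUP,            # |eta_diamond(t) t|_inf = 2/e
    "A.40": 0.3811561 * H_PRIME_SUP,               # |eta_diamond(t) t log t|_inf
    "A.41": H_PRIME_SUP / math.sqrt(math.e),       # |eta_diamond|_inf = 1/sqrt(e)
    "A.42": 3**1.5 * math.exp(-1.5) * H_PRIME_SUP, # |eta_diamond(t) t^2|_inf
}


def eta_plus_sup_bound(H):
    """Right-hand side of (A.38)."""
    return 1 + 2.06440727 * tail(H)


if __name__ == "__main__":
    for label, value in COEFFICIENTS.items():
        print(label, value)
    print(eta_plus_sup_bound(200.0))  # illustrative choice of H
```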
Appendix B

Norms of Fourier transforms

B.1 The Fourier transform of $\eta_2''$

Our aim here is to give upper bounds on $|\widehat{\eta_2''}|_\infty$, where $\eta_2$ is as in (3.4). We will do
considerably better than the trivial bound $|\widehat{\eta''}|_\infty \le |\eta''|_1$.
Lemma B.1.1. For every t ∈ R,
|4e(−t/4) − 4e(−t/2) + e(−t)| ≤ 7.87052. (B.1)
We will describe an extremely simple, but rigorous, procedure to find the maximum.
Since $|g(t)|^2$ is $C^2$ (in fact smooth), there are several more efficient and equally
rigorous algorithms – for starters, the bisection method with error bounded in terms
of $|(|g|^2)''|_\infty$.
Proof. Let
g(t) = 4e(−t/4) − 4e(−t/2) + e(−t). (B.2)
For $a\le t\le b$,
\[
g(t) = g(a) + \frac{t-a}{b-a}(g(b)-g(a)) + \frac18(b-a)^2\cdot O^*\left(\max_{v\in[a,b]}|g''(v)|\right). \tag{B.3}
\]

(This formula, in all likelihood well-known, is easy to derive. First, we can assume
without loss of generality that $a=0$, $b=1$ and $g(a)=g(b)=0$. Dividing $g$
by $g(t)$, we see that we can also assume that $g(t)$ is real (and in fact 1). We can also
assume that $g$ is real-valued, in that it will be enough to prove (B.3) for the real-valued
function $\Re g$, as this will give us the bound $g(t) = \Re g(t) \le (1/8)\max_v|(\Re g)''(v)| \le (1/8)\max_v|g''(v)|$ that we wish for. Lastly, we can assume (by symmetry) that $0\le t\le1/2$,
and that $g$ has a local maximum or minimum at $t$. Writing $M = \max_{u\in[0,1]}|g''(u)|$,
we then have:
\[
\begin{aligned}
g(t) &= \int_0^tg'(v)\,dv = \int_0^t\int_t^vg''(u)\,du\,dv = O^*\left(\int_0^t\left|\int_t^vM\,du\right|dv\right)\\
&= O^*\left(\int_0^t(v-t)M\,dv\right) = O^*\left(\frac12t^2M\right) = O^*\left(\frac18M\right),
\end{aligned}
\]
as desired.)
We obtain immediately from (B.3) that
\[
\max_{t\in[a,b]}|g(t)| \le \max(|g(a)|,|g(b)|) + \frac18(b-a)^2\cdot\max_{v\in[a,b]}|g''(v)|. \tag{B.4}
\]
For any $v\in\mathbb{R}$,
\[
|g''(v)| \le \left(\frac\pi2\right)^2\cdot4 + \pi^2\cdot4 + (2\pi)^2 = 9\pi^2. \tag{B.5}
\]
Clearly $g(t)$ depends only on $t \bmod 4\pi$. Hence, by (B.4) and (B.5), to estimate
\[ \max_{t\in\mathbb{R}}|g(t)| \]
with an error of at most $\epsilon$, it is enough to subdivide $[0,4\pi]$ into intervals of length
$\le\sqrt{8\epsilon/9\pi^2}$ each. We set $\epsilon = 10^{-6}$ and compute.
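The procedure in this proof can be sketched in a few lines. The sketch below uses ordinary floating point rather than interval arithmetic, so it illustrates the method rather than reproducing a rigorous proof. It assumes the convention $e(x) = \exp(2\pi ix)$, under which $g$ has period 4, so the interval $[0,4\pi]$ used in the text contains a full period; the function name `sup_g_upper_bound` is hypothetical.

```python
import cmath
import math


def e(x):
    """e(x) = exp(2*pi*i*x); this convention is an assumption of the sketch."""
    return cmath.exp(2j * math.pi * x)


def g(t):
    """The function g of (B.2)."""
    return 4 * e(-t / 4) - 4 * e(-t / 2) + e(-t)


def sup_g_upper_bound(eps=1e-6):
    """Upper bound for max |g| via (B.4): grid maximum plus interpolation error."""
    second_derivative_bound = 9 * math.pi**2              # (B.5)
    step = math.sqrt(8 * eps / second_derivative_bound)    # so that step^2 * M / 8 <= eps
    length = 4 * math.pi                                   # interval used in the text
    n = math.ceil(length / step)                           # actual step length/n <= step
    grid_max = max(abs(g(length * k / n)) for k in range(n + 1))
    return grid_max + eps


if __name__ == "__main__":
    print(sup_g_upper_bound())  # to be compared with 7.87052 in Lemma B.1.1
```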
Lemma B.1.2. Let $\eta_2:\mathbb{R}^+\to\mathbb{R}$ be as in (3.4). Then
\[ |\widehat{\eta_2''}|_\infty \le 31.521. \tag{B.6} \]
This should be compared with $|\eta_2''|_1 = 48$.


Proof. We can write
\[ \eta_2''(x) = 4(4\delta_{1/4}(x) - 4\delta_{1/2}(x) + \delta_1(x)) + f(x), \tag{B.7} \]
where $\delta_{x_0}$ is the point measure at $x_0$ of mass 1 (Dirac delta function) and
\[
f(x) = \begin{cases}0 & \text{if $x<1/4$ or $x\ge1$,}\\ -4x^{-2} & \text{if $1/4\le x<1/2$,}\\ 4x^{-2} & \text{if $1/2\le x<1$.}\end{cases}
\]
Thus $\widehat{\eta_2''}(t) = 4g(t)+\hat f(t)$, where $g$ is as in (B.2). It is easy to see that $|f'|_1 = 2\max_xf(x) - 2\min_xf(x) = 160$. Therefore,
\[
|\hat f(t)| = \left|\widehat{f'}(t)/(2\pi it)\right| \le \frac{|f'|_1}{2\pi|t|} = \frac{80}{\pi|t|}. \tag{B.8}
\]
Since $31.521 - 4\cdot7.87052 = 0.03892$, we conclude that (B.6) follows from Lemma
B.1.1 and (B.8) for $|t|\ge655 > 80/(\pi\cdot0.03892)$.
It remains to check the range $t\in(-655,655)$; since $4g(-t)+\hat f(-t)$ is the complex
conjugate of $4g(t)+\hat f(t)$, it suffices to consider $t$ non-negative. We use (B.4) (with
$4g+\hat f$ instead of $g$) and obtain that, to estimate $\max_{t\in\mathbb{R}}|4g(t)+\hat f(t)|$ with an error of at
most $\epsilon$, it is enough to subdivide $[0,655)$ into intervals of length $\le\sqrt{2\epsilon/|(4g+\hat f)''|_\infty}$
each and check $|4g(t)+\hat f(t)|$ at the endpoints. Now, for every $t\in\mathbb{R}$,
\[
\left|(\hat f)''(t)\right| = \left|(-2\pi i)^2\,\widehat{x^2f}(t)\right| = (2\pi)^2\cdot O^*\left(|x^2f|_1\right) = 12\pi^2.
\]

By this and (B.5), $|(4g+\hat f)''|_\infty \le 48\pi^2$. Thus, intervals of length $\delta_1$ give an error
term of size at most $24\pi^2\delta_1^2$. We choose $\delta_1 = 0.001$ and obtain an error term less than
0.000237 for this stage.
To evaluate $\hat f(t)$ (and hence $4g(t)+\hat f(t)$) at a point, we integrate using Simpson's
rule on subdivisions of the intervals $[1/4,1/2]$, $[1/2,1]$ into $200\cdot\max(1,\lfloor\sqrt{|t|}\rfloor)$ subintervals each.$^1$ The largest value of $|4g(t)+\hat f(t)|$ we find is 31.52065..., with an error term
of at most $4.5\cdot10^{-5}$.
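The pointwise evaluation described above can be sketched as follows. This is a plain floating-point illustration, not the interval-arithmetic code referred to in the footnote; it assumes $e(x)=\exp(2\pi ix)$ and the Fourier transform convention $\hat f(t)=\int f(x)e(-xt)\,dx$, and the names `simpson`, `f_hat` and `eta2_dd_hat` are hypothetical.

```python
import cmath
import math


def e(x):
    """e(x) = exp(2*pi*i*x); this convention is an assumption of the sketch."""
    return cmath.exp(2j * math.pi * x)


def g(t):
    """The function g of (B.2)."""
    return 4 * e(-t / 4) - 4 * e(-t / 2) + e(-t)


def simpson(func, a, b, n):
    """Composite Simpson's rule on [a, b] with n subintervals (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


def f_hat(t):
    """Fourier transform of f from (B.7), integrating each smooth piece separately."""
    n = 200 * max(1, math.floor(math.sqrt(abs(t))))
    left = simpson(lambda x: -4 / x**2 * e(-x * t), 0.25, 0.5, n)
    right = simpson(lambda x: 4 / x**2 * e(-x * t), 0.5, 1.0, n)
    return left + right


def eta2_dd_hat(t):
    """Approximate value of the Fourier transform of eta_2'' at t."""
    return 4 * g(t) + f_hat(t)


if __name__ == "__main__":
    # Coarse scan on [0, 20]; the proof scans [0, 655) with step 0.001.
    print(max(abs(eta2_dd_hat(0.05 * k)) for k in range(401)))
```

The scan in the usage lines is much coarser and shorter than the one in the proof, so it can only be expected to stay below the bound 31.521 of (B.6), not to locate the maximum.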

B.2 Bounds involving a logarithmic factor

Our aim now is to give upper bounds on $|\widehat{\eta_{(y)}''}|_\infty$, where $\eta_{(y)}(t) = \log(yt)\eta_2(t)$ and
$y\ge4$.
Lemma B.2.1. Let $\eta_2:\mathbb{R}^+\to\mathbb{R}$ be as in (3.4). Let $\eta_{(y)}(t) = \log(yt)\eta_2(t)$, where
$y\ge4$. Then
\[ |\eta_{(y)}'|_1 < (\log y)|\eta_2'|_1. \tag{B.9} \]
Proof. Recall that $\mathrm{supp}(\eta_2) = (1/4,1)$. For $t\in(1/4,1/2)$,
\[
\eta_{(y)}'(t) = (4\log(yt)\log4t)' = \frac{4\log4t}{t}+\frac{4\log yt}{t} \ge \frac{8\log4t}{t} > 0,
\]
whereas, for $t\in(1/2,1)$,
\[
\eta_{(y)}'(t) = (-4\log(yt)\log t)' = -\frac{4\log yt}{t}-\frac{4\log t}{t} = -\frac{4\log yt^2}{t} < 0,
\]
where we are using the fact that $y\ge4$. Hence $\eta_{(y)}(t)$ is increasing on $(1/4,1/2)$ and
decreasing on $(1/2,1)$; it is also continuous at $t=1/2$. Hence $|\eta_{(y)}'|_1 = 2|\eta_{(y)}(1/2)|$.
We are done by
\[
2|\eta_{(y)}(1/2)| = 2\log\frac y2\cdot\eta_2(1/2) = \log\frac y2\cdot8\log2 < \log y\cdot8\log2 = (\log y)|\eta_2'|_1.
\]

Lemma B.2.2. Let $y\ge4$. Let $g(t) = 4e(-t/4)-4e(-t/2)+e(-t)$ and $k(t) = 2e(-t/4)-e(-t/2)$. Then, for every $t\in\mathbb{R}$,
\[ |g(t)\cdot\log y - k(t)\cdot4\log2| \le 7.87052\log y. \tag{B.10} \]
Proof. By Lemma B.1.1, $|g(t)|\le7.87052$. Since $y\ge4$, $|k(t)\cdot(4\log2)/\log y| \le 6$.
For any complex numbers $z_1$, $z_2$ with $|z_1|,|z_2|\le\ell$, we can have $|z_1-z_2|>\ell$ only if
$|\arg(z_1/z_2)| > \pi/3$. It is easy to check that, for all $t\in[-2,2]$,
\[
\left|\arg\left(\frac{g(t)\cdot\log y}{4\log2\cdot k(t)}\right)\right| = \left|\arg\left(\frac{g(t)}{k(t)}\right)\right| < 0.7 < \frac\pi3.
\]
(It is possible to bound maxima rigorously as in (B.4).) Hence (B.10) holds.
1 As usual, the code uses interval arithmetic (§2.6).

Lemma B.2.3. Let $\eta_2:\mathbb{R}^+\to\mathbb{R}$ be as in (3.4). Let $\eta_{(y)}(t) = (\log yt)\eta_2(t)$, where
$y\ge4$. Then
\[ |\widehat{\eta_{(y)}''}|_\infty < 31.521\cdot\log y. \tag{B.11} \]

Proof. Clearly
\[
\begin{aligned}
\eta_{(y)}''(x) &= \eta_2''(x)(\log y) + \left((\log x)\eta_2''(x) + \frac2x\eta_2'(x) - \frac1{x^2}\eta_2(x)\right)\\
&= \eta_2''(x)(\log y) + 4(\log x)(4\delta_{1/4}(x) - 4\delta_{1/2}(x) + \delta_1(x)) + h(x),
\end{aligned}
\]
where
\[
h(x) = \begin{cases}0 & \text{if $x<1/4$ or $x>1$,}\\ \frac4{x^2}(2-2\log2x) & \text{if $1/4\le x<1/2$,}\\ \frac4{x^2}(-2+2\log x) & \text{if $1/2\le x<1$.}\end{cases}
\]
(Here we are using the expression (B.7) for $\eta_2''(x)$.) Hence
\[
\widehat{\eta_{(y)}''}(t) = (4g(t)+\hat f(t))(\log y) + (-16\log2\cdot k(t)+\hat h(t)), \tag{B.12}
\]
where $k(t) = 2e(-t/4)-e(-t/2)$. Just as in the proof of Lemma B.1.2,
\[
|\hat f(t)| \le \frac{|f'|_1}{2\pi|t|} \le \frac{80}{\pi|t|},\qquad |\hat h(t)| \le \frac{160(1+\log2)}{\pi|t|}. \tag{B.13}
\]
Again as before, this implies that (B.11) holds for
\[
|t| \ge \frac1{\pi\cdot0.03892}\left(80+\frac{160(1+\log2)}{(\log4)}\right) = 2252.51.
\]

Note also that it is enough to check (B.11) for t ≥ 0, by symmetry. Our remaining task
is to prove (B.11) for 0 ≤ t ≤ 2252.21.
Let $I = [0.3,2252.21]\setminus[3.25,3.65]$. For $t\in I$, we will have
\[
\arg\left(\frac{4g(t)+\hat f(t)}{-16\log2\cdot k(t)+\hat h(t)}\right) \in \left(-\frac\pi3,\frac\pi3\right). \tag{B.14}
\]
(This is actually true for $0\le t\le0.3$ as well, but we will use a different strategy in
that range in order to better control error terms.) Consequently, by Lemma B.1.2 and
$\log y\ge\log4$,
\[
\begin{aligned}
|\widehat{\eta_{(y)}''}(t)| &< \max\left(|4g(t)+\hat f(t)|\cdot(\log y),\ |16\log2\cdot k(t)-\hat h(t)|\right)\\
&< \max(31.521(\log y),\ |48\log2+25|) = 31.521\log y,
\end{aligned}
\]
where we bound $\hat h(t)$ by (B.13) and by a numerical computation of the maximum of
$|\hat h(t)|$ for $0\le t\le4$ as in the proof of Lemma B.1.2.
It remains to check (B.14). Here, as in the proof of Lemma B.2.2, the allowable
error is relatively large (the expression on the left of (B.14) is actually contained in
$(-1,1)$ for $t\in I$). We decide to evaluate the argument in (B.14) at all $t\in0.005\mathbb{Z}\cap I$,
computing $\hat f(t)$ and $\hat h(t)$ by numerical integration (Simpson's rule) with a subdivision
of $[1/4,1]$ into 5000 intervals. Proceeding as in the proof of Lemma B.1.1, we see
that the sampling induces an error of at most
\[
\frac12\,0.005^2\max_{v\in I}\left(4|g''(v)|+|(\hat f)''(v)|\right) \le \frac{0.0001}8\,48\pi^2 < 0.00593 \tag{B.15}
\]
in the evaluation of $4g(t)+\hat f(t)$, and an error of at most
\[
\frac12\,0.005^2\max_{v\in I}\left(16\log2\cdot|k''(v)|+|(\hat h)''(v)|\right) \le \frac{0.0001}8\left(16\log2\cdot6\pi^2+24\pi^2\cdot(2-\log2)\right) < 0.0121 \tag{B.16}
\]
in the evaluation of $-16\log2\cdot k(t)+\hat h(t)$.
Running the numerical evaluation just described for $t\in I$, the estimates for the left
side of (B.14) at the sample points are at most 0.99134 in absolute value; the absolute
values of the estimates for $4g(t)+\hat f(t)$ are all at least 2.7783, and the absolute values
of the estimates for $-16\log2\cdot k(t)+\hat h(t)$ are all at least 2.1166. Numerical
integration by Simpson's rule gives errors bounded by 0.17575 percent. Hence the
absolute value of the left side of (B.14) is at most
\[
0.99134 + \arcsin\left(\frac{0.00593}{2.7783}+0.0017575\right) + \arcsin\left(\frac{0.0121}{2.1166}+0.0017575\right) \le 1.00271 < \frac\pi3
\]
for $t\in I$.
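The arithmetic behind (B.15), (B.16) and the final error budget is easy to re-check. The following sketch only re-evaluates the numbers quoted in the text in floating point; the variable names are hypothetical.

```python
import math

STEP = 0.005  # sampling step for t

# Sampling errors (B.15) and (B.16): (1/2) * STEP^2 * (bound on the second derivative).
sampling_error_numerator = 0.5 * STEP**2 * 48 * math.pi**2
sampling_error_denominator = 0.5 * STEP**2 * (
    16 * math.log(2) * 6 * math.pi**2 + 24 * math.pi**2 * (2 - math.log(2))
)

# Quantities reported from the numerical run in the text.
SAMPLE_ARG_MAX = 0.99134      # largest |arg| seen at the sample points
MIN_ABS_NUMERATOR = 2.7783    # smallest |4g + f_hat| seen
MIN_ABS_DENOMINATOR = 2.1166  # smallest |-16 log 2 * k + h_hat| seen
SIMPSON_RELATIVE_ERROR = 0.0017575

# Total bound for the absolute value of the left side of (B.14).
total_arg_bound = (
    SAMPLE_ARG_MAX
    + math.asin(0.00593 / MIN_ABS_NUMERATOR + SIMPSON_RELATIVE_ERROR)
    + math.asin(0.0121 / MIN_ABS_DENOMINATOR + SIMPSON_RELATIVE_ERROR)
)

if __name__ == "__main__":
    print(sampling_error_numerator, sampling_error_denominator, total_arg_bound)
```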
Lastly, for $t\in[0,0.3]\cup[3.25,3.65]$, a numerical computation (samples at $0.001\mathbb{Z}$;
interpolation as in Lemma B.1.2; integrals computed by Simpson's rule with a subdivision into 1000 intervals) gives
\[
\max_{t\in[0,0.3]\cup[3.25,3.65]}\left(|4g(t)+\hat f(t)| + \frac{|-16\log2\cdot k(t)+\hat h(t)|}{\log4}\right) < 29.08,
\]
and so $\max_{t\in[0,0.3]\cup[3.25,3.65]}|\widehat{\eta_{(y)}''}(t)| < 29.1\log y < 31.521\log y$.

An easy integral gives us that the function $\log\cdot\,\eta_2$ satisfies
\[ |\log\cdot\,\eta_2|_1 = 2-\log4. \tag{B.17} \]
The following function will appear only in a lower-order term; thus, an $\ell_1$ estimate will
do.
Lemma B.2.4. Let $\eta_2:\mathbb{R}^+\to\mathbb{R}$ be as in (3.4). Then
\[ |(\log\cdot\,\eta_2)''|_1 = 96\log2. \tag{B.18} \]



Proof. The function $(\log\cdot\,\eta_2)'(t)$ is 0 for $t\notin[1/4,1]$, is increasing and negative for $t\in(1/4,1/2)$ and is decreasing and positive for $t\in(1/2,1)$. Hence
\[
\begin{aligned}
|(\log\cdot\,\eta_2)''|_1 &= 2\left((\log\cdot\,\eta_2)'\left(\frac12\right) - (\log\cdot\,\eta_2)'\left(\frac14\right)\right)\\
&= 2(16\log2-(-32\log2)) = 96\log2.
\end{aligned}
\]
Appendix C

Sums involving Λ and φ

C.1 Sums over primes


Here we treat some sums of the type $\sum_n\Lambda(n)\varphi(n)$, where $\varphi$ has compact support.
Since the sums are over all integers (not just an arithmetic progression) and there is no
phase $e(\alpha n)$ involved, the treatment is relatively straightforward.
The following is standard.
Lemma C.1.1 (Explicit formula). Let $\varphi:[1,\infty)\to\mathbb{C}$ be continuous and piecewise
$C^1$ with $\varphi''\in\ell_1$; let it also be of compact support contained in $[1,\infty)$. Then
\[
\sum_n\Lambda(n)\varphi(n) = \int_1^\infty\left(1-\frac1{x(x^2-1)}\right)\varphi(x)\,dx - \sum_\rho(M\varphi)(\rho), \tag{C.1}
\]
where $\rho$ runs over the non-trivial zeros of $\zeta(s)$.


The non-trivial zeros of $\zeta(s)$ are, of course, those in the critical strip $0<\Re(s)<1$.
Remark. Lemma C.1.1 appears as exercise 5 in [IK04, §5.5]; the condition there
that $\varphi$ be smooth can be relaxed, since already the weaker assumption that $\varphi''$ be in $L^1$
implies that the Mellin transform $(M\varphi)(\sigma+it)$ decays quadratically on $t$ as $t\to\infty$,
thereby guaranteeing that the sum $\sum_\rho(M\varphi)(\rho)$ converges absolutely.
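Lemma C.1.2. Let $x\ge10$. Let $\eta_2$ be as in (11.7). Assume that all non-trivial zeros of
$\zeta(s)$ with $|\Im(s)|\le T_0$ lie on the critical line.
Then
\[
\sum_n\Lambda(n)\eta_2\left(\frac nx\right) = x + O^*\left(0.135x^{1/2}+\frac{9.7}{x^2}+\frac{\log\frac{eT_0}{2\pi}}{T_0}\left(\frac{9/4}{2\pi}+\frac{6.03}{T_0}\right)x\right). \tag{C.2}
\]
In particular, with $T_0 = 3.061\cdot10^{10}$ in the assumption, we have, for $x\ge2000$,
\[
\sum_n\Lambda(n)\eta_2\left(\frac nx\right) = (1+O^*(\epsilon))x + O^*(0.135x^{1/2}),
\]
where $\epsilon = 2.73\cdot10^{-10}$.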


The assumption that all non-trivial zeros up to $T_0 = 3.061\cdot10^{10}$ lie on the critical
line was proven rigorously in [Plaa]; higher values of $T_0$ have been reached elsewhere
([Wed03], [GD04]).
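Proof. By Lemma C.1.1,
\[
\sum_n\Lambda(n)\eta_2\left(\frac nx\right) = \int_1^\infty\eta_2\left(\frac tx\right)dt - \int_1^\infty\frac{\eta_2(t/x)}{t(t^2-1)}\,dt - \sum_\rho(M\varphi)(\rho),
\]
where $\varphi(u) = \eta_2(u/x)$ and $\rho$ runs over all non-trivial zeros of $\zeta(s)$. Since $\eta_2$ is non-negative, $\int_1^\infty\eta_2(t/x)\,dt = x|\eta_2|_1 = x$, while
\[
\int_1^\infty\frac{\eta_2(t/x)}{t(t^2-1)}\,dt = O^*\left(\int_{1/4}^1\frac{\eta_2(t)}{tx^2(t^2-1/100)}\,dt\right) = O^*\left(\frac{9.61114}{x^2}\right).
\]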

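By (2.11),
\[
\sum_\rho(M\varphi)(\rho) = \sum_\rho M\eta_2(\rho)\cdot x^\rho = \sum_\rho\left(\frac{1-2^{-\rho}}{\rho}\right)^2x^\rho = S_1(x)-2S_1(x/2)+S_1(x/4),
\]
where
\[ S_m(x) = \sum_\rho\frac{x^\rho}{\rho^{m+1}}. \tag{C.3} \]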

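Setting aside the contribution of all $\rho$ with $|\Im(\rho)|\le T_0$ and all $\rho$ with $|\Im(\rho)|>T_0$ and
$\Re(\rho)\le1/2$, and using the symmetry provided by the functional equation, we obtain
\[
\begin{aligned}
|S_m(x)| &\le x^{1/2}\cdot\sum_\rho\frac1{|\rho|^{m+1}} + x\cdot\sum_{\substack{\rho\\|\Im(\rho)|>T_0\\|\Re(\rho)|>1/2}}\frac1{|\rho|^{m+1}}\\
&\le x^{1/2}\cdot\sum_\rho\frac1{|\rho|^{m+1}} + \frac x2\cdot\sum_{\substack{\rho\\|\Im(\rho)|>T_0}}\frac1{|\rho|^{m+1}}.
\end{aligned}
\]
We bound the first sum by [Ros41, Lemma 17] and the second sum by [RS03, Lemma
2]. We obtain
\[
|S_m(x)| \le \left(\frac1{2m\pi T_0^m}+\frac{2.68}{T_0^{m+1}}\right)x\log\frac{eT_0}{2\pi} + \kappa_mx^{1/2}, \tag{C.4}
\]
where $\kappa_1 = 0.0463$, $\kappa_2 = 0.00167$ and $\kappa_3 = 0.0000744$.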


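Hence
\[
\left|\sum_\rho(M\eta_2)(\rho)\cdot x^\rho\right| \le \left(\frac1{2\pi T_0}+\frac{2.68}{T_0^2}\right)\frac{9x}4\log\frac{eT_0}{2\pi} + \left(\frac32+\sqrt2\right)\kappa_1x^{1/2}.
\]
For $T_0 = 3.061\cdot10^{10}$ and $x\ge2000$, we obtain
\[
\sum_n\Lambda(n)\eta_2\left(\frac nx\right) = (1+O^*(\epsilon))x + O^*(0.135x^{1/2}),
\]
where $\epsilon = 2.73\cdot10^{-10}$.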
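The two constants in the last display come directly from the bound just above it. A short floating-point sketch recomputing them is given here; the function names `linear_coefficient` and `sqrt_coefficient` are hypothetical.

```python
import math

T0 = 3.061e10     # height up to which the zeros are assumed to lie on the critical line
KAPPA_1 = 0.0463  # constant kappa_1 from (C.4)


def linear_coefficient(t0):
    """Coefficient of x in the bound for |sum_rho (M eta_2)(rho) x^rho|."""
    return ((1 / (2 * math.pi * t0) + 2.68 / t0**2)
            * (9 / 4) * math.log(math.e * t0 / (2 * math.pi)))


def sqrt_coefficient():
    """Coefficient of x^(1/2): (3/2 + sqrt(2)) * kappa_1."""
    return (1.5 + math.sqrt(2)) * KAPPA_1


def total_sqrt_term(x):
    """sqrt-type error plus the 9.7/x^2 term of (C.2), to be compared with 0.135*sqrt(x)."""
    return sqrt_coefficient() * math.sqrt(x) + 9.7 / x**2


if __name__ == "__main__":
    print(linear_coefficient(T0))   # to be compared with epsilon = 2.73e-10
    print(sqrt_coefficient())       # to be compared with 0.135
    print(total_sqrt_term(2000.0) <= 0.135 * math.sqrt(2000.0))
```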

Corollary C.1.3. Let $\eta_2$ be as in (11.7). Assume that all non-trivial zeros of $\zeta(s)$ with
$|\Im(s)|\le T_0$, $T_0 = 3.061\cdot10^{10}$, lie on the critical line. Then, for all $x\ge1$,
\[
\sum_n\Lambda(n)\eta_2\left(\frac nx\right) \le \min\left((1+\epsilon)x+0.2x^{1/2},\ 1.04488x\right), \tag{C.5}
\]
where $\epsilon = 2.73\cdot10^{-10}$.

Proof. Immediate from Lemma C.1.2 for $x\ge2000$. For $x<2000$, we use computation as follows. Since $|\eta_2'|_\infty = 16$ and $\sum_{x/4\le n\le x}\Lambda(n)\le x$ for all $x\ge0$, computing
$\sum_{n\le x}\Lambda(n)\eta_2(n/x)$ only for $x\in(1/1000)\mathbb{Z}\cap[0,2000]$ results in an inaccuracy of at
most $(16\cdot0.0005/0.9995)x\le0.00801x$. This resolves the matter at all points outside
$(205,207)$ (for the first estimate) or outside $(9.5,10.5)$ and $(13.5,14.5)$ (for the second
estimate). In those intervals, the prime powers $n$ involved do not change (since whether
$x/4<n\le x$ depends only on $n$ and $[x]$), and thus we can find the maximum of the
sum in (C.5) just by taking derivatives.
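The smoothed sum in (C.5) is straightforward to evaluate for small $x$. The sketch below is a plain floating-point illustration of that computation, not the rigorous check described in the proof. It takes $\eta_2(t) = 4\log4t$ on $[1/4,1/2]$ and $\eta_2(t) = -4\log t$ on $[1/2,1]$, which is the form used in the proof of Lemma B.2.1; the function names are hypothetical.

```python
import math


def von_mangoldt_table(limit):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= limit."""
    lam = [0.0] * (limit + 1)
    is_composite = bytearray(limit + 1)
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        for m in range(p * p, limit + 1, p):
            is_composite[m] = 1
        power = p
        while power <= limit:       # Lambda(p^k) = log p for every prime power
            lam[power] = math.log(p)
            power *= p
    return lam


def eta2(t):
    """eta_2, supported on [1/4, 1] (form assumed from the proof of Lemma B.2.1)."""
    if 0.25 <= t <= 0.5:
        return 4 * math.log(4 * t)
    if 0.5 < t <= 1:
        return -4 * math.log(t)
    return 0.0


def smoothed_sum(x, lam):
    """sum_n Lambda(n) eta_2(n/x); lam must cover all n <= x."""
    return sum(lam[n] * eta2(n / x) for n in range(1, math.floor(x) + 1) if lam[n])


if __name__ == "__main__":
    table = von_mangoldt_table(2000)
    for x in (10.0, 100.0, 2000.0):
        print(x, smoothed_sum(x, table) / x)  # ratios to compare with 1.04488
```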

C.2 Sums involving φ


We need estimates for several sums involving $\phi(q)$ in the denominator.
The easiest are convergent sums, such as $\sum_q\mu^2(q)/(\phi(q)q)$. We can express this
as $\prod_p(1+1/(p(p-1)))$. This is a convergent product, and the main task is to bound
a tail: for $r$ an integer,
\[
\log\prod_{p>r}\left(1+\frac1{p(p-1)}\right) \le \sum_{p>r}\frac1{p(p-1)} \le \sum_{n>r}\frac1{n(n-1)} = \frac1r. \tag{C.6}
\]
A quick computation$^1$ now suffices to give
\[
2.591461 \le \sum_q\frac{\gcd(q,2)\mu^2(q)}{\phi(q)q} < 2.591463 \tag{C.7}
\]
and so
\[
1.295730 \le \sum_{q\text{ odd}}\frac{\mu^2(q)}{\phi(q)q} < 1.295732, \tag{C.8}
\]
since the expression bounded in (C.8) is exactly half of that bounded in (C.7).
1 Using D. Platt’s integer arithmetic package.
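The "tail bound followed by a computation" pattern used for (C.7) and (C.8) can be sketched as follows. This is a floating-point illustration only (the text uses a dedicated arithmetic package); the function names and the choice of cutoff in the usage lines are assumptions made here.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes returning all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]


def odd_sum_bounds(r):
    """Lower and upper bounds for sum over odd q of mu^2(q)/(phi(q) q).

    The sum equals the product over odd primes p of 1 + 1/(p(p-1)).
    The partial product over p <= r is a lower bound; by (C.6) the log
    of the remaining factors is at most 1/r, which gives the upper bound.
    """
    log_partial = sum(math.log1p(1 / (p * (p - 1))) for p in primes_up_to(r) if p > 2)
    return math.exp(log_partial), math.exp(log_partial + 1 / r)


if __name__ == "__main__":
    lower, upper = odd_sum_bounds(2_000_000)
    print(lower, upper)          # to be compared with (C.8)
    print(2 * lower, 2 * upper)  # the sum in (C.7) is exactly twice the odd sum
```

With this cutoff the interval is narrow enough to confirm (C.8); confirming (C.7) to the stated precision needs a larger cutoff, since doubling the sum also doubles the width of the interval.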

Again using (C.6), we get that
\[
2.826419 \le \sum_q\frac{\mu^2(q)}{\phi(q)^2} < 2.826421. \tag{C.9}
\]

In what follows, we will use values for convergent sums obtained in much the same
way – an easy tail bound followed by a computation.
By [Ram95, Lemma 3.4],
\[
\begin{aligned}
\sum_{q\le r}\frac{\mu^2(q)}{\phi(q)} &= \log r + c_E + O^*(7.284r^{-1/3}),\\
\sum_{\substack{q\le r\\q\text{ odd}}}\frac{\mu^2(q)}{\phi(q)} &= \frac12\left(\log r + c_E + \frac{\log2}2\right) + O^*(4.899r^{-1/3}),
\end{aligned}
\tag{C.10}
\]
where
\[
c_E = \gamma + \sum_p\frac{\log p}{p(p-1)} = 1.332582275 + O^*(10^{-9}/3)
\]
by [RS62, (2.11)]. As we already said in (12.15), this, supplemented by a computation
for $r\le4\cdot10^7$, gives
\[
\log r + 1.312 \le \sum_{q\le r}\frac{\mu^2(q)}{\phi(q)} \le \log r + 1.354
\]
for $r\ge182$. In the same way, we get that
\[
\frac12\log r + 0.83 \le \sum_{\substack{q\le r\\q\text{ odd}}}\frac{\mu^2(q)}{\phi(q)} \le \frac12\log r + 0.85 \tag{C.11}
\]
for $r\ge195$. (The numerical verification here goes up to $1.38\cdot10^8$; for $r>1.38\cdot10^8$,
use (C.10).)
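The partial sums of $\mu^2(q)/\phi(q)$ are cheap to tabulate, which is what makes the supplementary computation possible. A small floating-point sketch follows; it covers a far shorter range than the one in the text, and the function name `squarefree_totient_sums` is hypothetical.

```python
import math

C_E = 1.332582275  # value of c_E quoted in the text


def squarefree_totient_sums(limit):
    """Return sums with sums[r] = sum over q <= r of mu^2(q)/phi(q)."""
    phi = list(range(limit + 1))
    squarefree = bytearray([1]) * (limit + 1)
    for p in range(2, limit + 1):
        if phi[p] == p:  # phi[p] untouched so far, hence p is prime
            for m in range(p, limit + 1, p):
                phi[m] -= phi[m] // p
            for m in range(p * p, limit + 1, p * p):
                squarefree[m] = 0
    sums = [0.0] * (limit + 1)
    running = 0.0
    for q in range(1, limit + 1):
        if squarefree[q]:
            running += 1 / phi[q]
        sums[q] = running
    return sums


if __name__ == "__main__":
    sums = squarefree_totient_sums(100_000)
    for r in (182, 1_000, 10_000, 100_000):
        # The difference should lie between 1.312 and 1.354 for r >= 182.
        print(r, sums[r] - math.log(r))
```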
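Clearly
\[
\sum_{\substack{q\le2r\\q\text{ even}}}\frac{\mu^2(q)}{\phi(q)} = \sum_{\substack{q\le r\\q\text{ odd}}}\frac{\mu^2(q)}{\phi(q)}. \tag{C.12}
\]
We wish to obtain bounds for the sums
\[
\sum_{q\ge r}\frac{\mu^2(q)}{\phi(q)^2},\qquad \sum_{\substack{q\ge r\\q\text{ odd}}}\frac{\mu^2(q)}{\phi(q)^2},\qquad \sum_{\substack{q\ge r\\q\text{ even}}}\frac{\mu^2(q)}{\phi(q)^2},
\]
where $N\in\mathbb{Z}^+$ and $r\ge1$. To do this, it will be helpful to express some of the
quantities within these sums as convolutions.$^2$ For $q$ squarefree and $j\ge1$,
\[
\frac{\mu^2(q)q^{j-1}}{\phi(q)^j} = \sum_{ab=q}\frac{f_j(b)}a, \tag{C.13}
\]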
2 The author would like to thank O. Ramaré for teaching him this technique.

where $f_j$ is the multiplicative function defined by
\[
f_j(p) = \frac{p^j-(p-1)^j}{(p-1)^jp},\qquad f_j(p^k) = 0\ \text{ for } k\ge2.
\]

We will also find the following estimate useful.

Lemma C.2.1. Let $j\ge2$ be an integer and $A$ a positive real. Let $m\ge1$ be an integer.
Then
\[
\sum_{\substack{a\ge A\\(a,m)=1}}\frac{\mu^2(a)}{a^j} \le \frac{\zeta(j)/\zeta(2j)}{A^{j-1}}\cdot\prod_{p|m}\left(1+\frac1{p^j}\right)^{-1}. \tag{C.14}
\]
It is useful to note that $\zeta(2)/\zeta(4) = 15/\pi^2 = 1.519817\ldots$ and $\zeta(3)/\zeta(6) = 1.181564\ldots$.

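Proof. The right side of (C.14) decreases as $A$ increases, while the left side depends
only on $\lceil A\rceil$. Hence, it is enough to prove (C.14) when $A$ is an integer.
For $A=1$, (C.14) is an equality. Let
\[
C = \frac{\zeta(j)}{\zeta(2j)}\cdot\prod_{p|m}\left(1+\frac1{p^j}\right)^{-1}.
\]
Let $A\ge2$. Since
\[
\sum_{\substack{a\ge A\\(a,m)=1}}\frac{\mu^2(a)}{a^j} = C - \sum_{\substack{a<A\\(a,m)=1}}\frac{\mu^2(a)}{a^j}
\]
and
\[
\begin{aligned}
C = \sum_{\substack{a\\(a,m)=1}}\frac{\mu^2(a)}{a^j} &< \sum_{\substack{a<A\\(a,m)=1}}\frac{\mu^2(a)}{a^j} + \frac1{A^j} + \int_A^\infty\frac1{t^j}\,dt\\
&= \sum_{\substack{a<A\\(a,m)=1}}\frac{\mu^2(a)}{a^j} + \frac1{A^j} + \frac1{(j-1)A^{j-1}},
\end{aligned}
\]
we obtain
\[
\begin{aligned}
\sum_{\substack{a\ge A\\(a,m)=1}}\frac{\mu^2(a)}{a^j} &= \frac1{A^{j-1}}\cdot C + \frac{A^{j-1}-1}{A^{j-1}}\cdot C - \sum_{\substack{a<A\\(a,m)=1}}\frac{\mu^2(a)}{a^j}\\
&< \frac C{A^{j-1}} + \frac{A^{j-1}-1}{A^{j-1}}\cdot\left(\frac1{A^j}+\frac1{(j-1)A^{j-1}}\right) - \frac1{A^{j-1}}\sum_{\substack{a<A\\(a,m)=1}}\frac{\mu^2(a)}{a^j}\\
&\le \frac C{A^{j-1}} + \frac1{A^{j-1}}\left(\left(1-\frac1{A^{j-1}}\right)\left(\frac1A+\frac1{j-1}\right)-1\right).
\end{aligned}
\]
Since $(1-1/A)(1/A+1)<1$ and $1/A+1/(j-1)\le1$ for $j\ge3$, we obtain that
\[
\left(1-\frac1{A^{j-1}}\right)\left(\frac1A+\frac1{j-1}\right) < 1
\]
for all integers $j\ge2$, and so the statement follows.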
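As a numerical illustration of Lemma C.2.1 in the case $j=2$, the tail can be computed as $C$ minus a finite partial sum and compared with $C/A$. The sketch below is a floating-point check over a finite range of integer $A$, not a proof; the function names are hypothetical.

```python
import math


def prime_divisors(m):
    """Distinct prime divisors of m >= 1, by trial division."""
    primes, p = [], 2
    while p * p <= m:
        if m % p == 0:
            primes.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        primes.append(m)
    return primes


def lemma_c21_holds(m, max_a):
    """Check (C.14) with j = 2 for every integer A in [1, max_a]."""
    c = 15 / math.pi**2                  # zeta(2)/zeta(4)
    for p in prime_divisors(m):
        c /= 1 + 1 / p**2
    squarefree = bytearray([1]) * (max_a + 1)
    for d in range(2, math.isqrt(max_a) + 1):
        for k in range(d * d, max_a + 1, d * d):
            squarefree[k] = 0
    partial = 0.0                        # sum over squarefree a < A coprime to m
    for a in range(1, max_a + 1):
        if c - partial > c / a:          # tail from A = a must be at most C / A
            return False
        if squarefree[a] and math.gcd(a, m) == 1:
            partial += 1 / a**2
    return True


if __name__ == "__main__":
    print(lemma_c21_holds(1, 2000), lemma_c21_holds(2, 2000))
```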
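We now obtain easily the estimates we want: by (C.13) and Lemma C.2.1 (with
$j=2$ and $m=1$),
\[
\begin{aligned}
\sum_{q\ge r}\frac{\mu^2(q)}{\phi(q)^2} &= \sum_{q\ge r}\sum_{ab=q}\frac{f_2(b)}a\frac{\mu^2(q)}q \le \sum_{b\ge1}\frac{f_2(b)}b\sum_{a\ge r/b}\frac{\mu^2(a)}{a^2}\\
&\le \frac{\zeta(2)/\zeta(4)}r\sum_{b\ge1}f_2(b) = \frac{15/\pi^2}r\prod_p\left(1+\frac{2p-1}{(p-1)^2p}\right) \le \frac{6.7345}r.
\end{aligned}
\tag{C.15}
\]
Similarly, by (C.13) and Lemma C.2.1 (with $j=2$ and $m=2$),
\[
\begin{aligned}
\sum_{\substack{q\ge r\\q\text{ odd}}}\frac{\mu^2(q)}{\phi(q)^2} &= \sum_{\substack{b\ge1\\b\text{ odd}}}\frac{f_2(b)}b\sum_{\substack{a\ge r/b\\a\text{ odd}}}\frac{\mu^2(a)}{a^2} \le \frac{\zeta(2)/\zeta(4)}{1+1/2^2}\,\frac1r\sum_{b\text{ odd}}f_2(b)\\
&= \frac{12}{\pi^2}\,\frac1r\prod_{p>2}\left(1+\frac{2p-1}{(p-1)^2p}\right) \le \frac{2.15502}r,
\end{aligned}
\tag{C.16}
\]
\[
\sum_{\substack{q\ge r\\q\text{ even}}}\frac{\mu^2(q)}{\phi(q)^2} = \sum_{\substack{q\ge r/2\\q\text{ odd}}}\frac{\mu^2(q)}{\phi(q)^2} \le \frac{4.31004}r. \tag{C.17}
\]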

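Lastly,
\[
\begin{aligned}
\sum_{\substack{q\le r\\q\text{ odd}}}\frac{\mu^2(q)q}{\phi(q)} &= \sum_{\substack{q\le r\\q\text{ odd}}}\mu^2(q)\sum_{d|q}\frac1{\phi(d)} = \sum_{\substack{d\le r\\d\text{ odd}}}\frac1{\phi(d)}\sum_{\substack{q\le r\\d|q\\q\text{ odd}}}\mu^2(q) \le \sum_{\substack{d\le r\\d\text{ odd}}}\frac1{2\phi(d)}\left(\frac rd+1\right)\\
&\le \frac r2\sum_{d\text{ odd}}\frac1{\phi(d)d} + \frac12\sum_{\substack{d\le r\\d\text{ odd}}}\frac1{\phi(d)} \le 0.64787r + \frac{\log r}4 + 0.425,
\end{aligned}
\tag{C.18}
\]
where we are using (C.8) and (C.11).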

***

Since we are on the subject of $\phi(q)$, let us also prove a simple lemma that we use
at various points in the text to bound $q/\phi(q)$.
Lemma C.2.2. For any $q\ge1$ and any $r\ge\max(3,q)$,
\[ \frac q{\phi(q)} < z(r), \]
where
\[ z(r) = e^\gamma\log\log r + \frac{2.50637}{\log\log r}. \tag{C.19} \]
Proof. Since $z(r)$ is increasing for $r\ge27$, the statement follows immediately for
$q\ge27$ by [RS62, Thm. 15]:
\[ \frac q{\phi(q)} < z(q) \le z(r). \]
For $q<27$, it is clear that $q/\phi(q) \le 2\cdot3/(1\cdot2) = 3$. By the arithmetic/geometric
mean inequality, $z(t) \ge 2\sqrt{e^\gamma\,2.50637} > 3$ for all $t>e$, and so the lemma holds for
$q<27$.
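Lemma C.2.2 can be spot-checked on an initial range of $q$. The sketch below follows the two cases of the proof: for $q\ge27$ it compares $q/\phi(q)$ with $z(q)$, and for $q<27$ with the arithmetic/geometric mean lower bound for $z$. It is a floating-point illustration only; the function names and the numerical value used for Euler's constant are supplied here.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant gamma


def z(r):
    """The function z(r) of (C.19); requires r > e so that log log r > 0."""
    loglog = math.log(math.log(r))
    return math.exp(EULER_GAMMA) * loglog + 2.50637 / loglog


# Lower bound for z(t), t > e, from the arithmetic/geometric mean inequality.
Z_LOWER_BOUND = 2 * math.sqrt(math.exp(EULER_GAMMA) * 2.50637)


def totients(limit):
    """Return phi with phi[n] = Euler's totient of n for 0 <= n <= limit."""
    phi = list(range(limit + 1))
    for p in range(2, limit + 1):
        if phi[p] == p:  # p is prime
            for m in range(p, limit + 1, p):
                phi[m] -= phi[m] // p
    return phi


def lemma_c22_holds(limit):
    """Check q/phi(q) < z(r) for 1 <= q <= limit, following the proof's two cases."""
    phi = totients(limit)
    for q in range(1, limit + 1):
        bound = z(q) if q >= 27 else Z_LOWER_BOUND
        if not q / phi[q] < bound:
            return False
    return True


if __name__ == "__main__":
    print(Z_LOWER_BOUND, lemma_c22_holds(100_000))
```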
Appendix D

Checking small n by checking zeros of ζ(s)

In order to show that every odd number n ≤ N is the sum of three primes, it is enough
to show for some M ≤ N that
1. every even integer 4 ≤ m ≤ M can be written as the sum of two primes,
2. the difference between any two consecutive primes ≤ N is at most M − 4.
(If we want to show that every odd number $n\le N$ is the sum of three odd primes,
we just replace $M-4$ by $M-6$ in (2).) The best known result of type (1) is that
of Oliveira e Silva, Herzog and Pardi ([OeSHP14], $M = 4\cdot10^{18}$). As for (2), it was
proven in [HP13] for $M = 4\cdot10^{18}$ and $N = 8.875694\cdot10^{30}$ by a direct computation
(valid even if we replace $M-4$ by $M-6$ in the statement of (2)).
Alternatively, one can establish results of type (2) by means of numerical verifica-
tions of the Riemann hypothesis up to a certain height. This is a classical approach,
followed in [RS75] and [Sch76], and later in [RS03]; we will use the version of (1)
kindly provided by Ramaré in [Ramd]. We carry out this approach in full here, not
because it is preferable to [HP13] – it is still based on computations, and it is slightly
more indirect than [HP13] – but simply to show that one can establish what we need
by a different route.
A numerical verification of the Riemann hypothesis up to a certain height consists
simply in checking that all (non-trivial) zeroes $z$ of the Riemann zeta function up to a
height $H$ (meaning: $\Im(z)\le H$) lie on the critical line $\Re(z) = 1/2$.
The height up to which the Riemann hypothesis has actually been fully verified is
not a matter on which there is unanimity. The strongest claim in the literature is in
[GD04], which states that the first $10^{13}$ zeroes of the Riemann zeta function lie on the
critical line $\Re(z) = 1/2$. This corresponds to checking the Riemann hypothesis up to
height $H = 2.44599\cdot10^{12}$. It is unclear whether this computation was or could be
easily made rigorous; as pointed out in [SD10, p. 2398], it has not been replicated yet.
Before [GD04], the strongest results were those of the ZetaGrid distributed computing project led by S. Wedeniwski [Wed03]; the method followed in it was more
traditional, and should allow rigorous verification involving interval arithmetic. Unfortunately, the results were never formally published. The statement that the ZetaGrid
project verified the first $9\cdot10^{11}$ zeroes (corresponding to $H = 2.419\cdot10^{11}$) is often
quoted (e.g., [Bom10, p. 29]); this is the point to which the project had got by the
time of Gourdon and Demichel's announcement. Wedeniwski asserts in private communication that the project verified the first $10^{12}$ zeroes, and that the computation was
double-checked (by the same method).
The strongest claim prior to ZetaGrid was that of van de Lune ($H = 3.293\cdot10^9$,
first $10^{10}$ zeroes; unpublished). Recently, Platt [Plaa] checked the first $1.1\cdot10^{11}$ zeroes ($H = 3.061\cdot10^{10}$) rigorously, following a method essentially based on that
in [Boo06a]. Note that [Plaa] uses interval arithmetic, which is highly desirable for
floating-point computations.
Proposition D.0.3. Every odd integer $5\le n\le n_0$ is the sum of three primes, where
\[
n_0 = \begin{cases}5.90698\cdot10^{29} & \text{if [GD04] is used ($H = 2.44\cdot10^{12}$),}\\ 6.15697\cdot10^{28} & \text{if ZetaGrid results are used ($H = 2.419\cdot10^{11}$),}\\ 1.23163\cdot10^{27} & \text{if [Plaa] is used ($H = 3.061\cdot10^{10}$).}\end{cases}
\]

Proof. For $n\le4\cdot10^{18}+3$, this is immediate from [OeSHP14]. Let $4\cdot10^{18}+3 < n\le n_0$. We need to show that there is a prime $p$ in $[n-4-(n-4)/\Delta,\ n-4]$,
where $\Delta$ is large enough for $(n-4)/\Delta\le4\cdot10^{18}-4$ to hold. We will then have that
$4\le n-p\le4+(n-4)/\Delta\le4\cdot10^{18}$. Since $n-p$ is even, [OeSHP14] will then
imply that $n-p$ is the sum of two primes $p'$, $p''$, and so
\[ n = p+p'+p''. \]
Since $n-4>10^{11}$, the interval $[n-4-(n-4)/\Delta,\ n-4]$ with $\Delta = 28314000$
must contain a prime [RS03]. This gives the solution for $(n-4)\le1.1325\cdot10^{26}$, since
then $(n-4)/\Delta\le4\cdot10^{18}-4$. Note $1.1325\cdot10^{26}>e^{59}$.
From here onwards, we use the tables in [Ramd] to find acceptable values of $\Delta$.
Since $n-4\ge e^{59}$, we can choose
\[
\Delta = \begin{cases}52211882224 & \text{if [GD04] is used (case (a)),}\\ 13861486834 & \text{if ZetaGrid is used (case (b)),}\\ 307779681 & \text{if [Plaa] is used (case (c)).}\end{cases}
\]
This gives us $(n-4)/\Delta\le4\cdot10^{18}-4$ for $n-4<e^{r_0}$, where $r_0 = 67$ in case (a),
$r_0 = 66$ in case (b) and $r_0 = 62$ in case (c).
If $n-4\ge e^{r_0}$, we can choose (again by [Ramd])
\[
\Delta = \begin{cases}146869130682 & \text{in case (a),}\\ 15392435100 & \text{in case (b),}\\ 307908668 & \text{in case (c).}\end{cases}
\]
This is enough for $n-4<e^{68}$ in case (a), and without further conditions for (b) or (c).
Finally, if $n-4\ge e^{68}$ and we are in case (a), [Ramd] assures us that the choice
\[ \Delta = 147674531294 \]
is valid; we verify as well that $(n_0-4)/\Delta\le4\cdot10^{18}-4$.
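The chain of ranges in this proof can be checked with exact integer arithmetic: each value of $\Delta$ covers all $n-4$ up to $\Delta\cdot(4\cdot10^{18}-4)$, and that reach must extend at least to the point where the next value of $\Delta$ takes over, or to $n_0-4$ for the last one. The sketch below checks only this bookkeeping; the validity of each $\Delta$ itself comes from [RS03] and [Ramd] and is taken for granted. The table layout and the function name are hypothetical.

```python
import math

MAX_GAP = 4 * 10**18 - 4  # largest admissible (n - 4) / Delta

# For each case: n_0, then (exponent r with Delta valid from e^r on, Delta).
# The first Delta is valid as soon as n - 4 > 10^11, hence the exponent None.
CASES = {
    "GD04": (590698 * 10**24,
             [(None, 28314000), (59, 52211882224), (67, 146869130682), (68, 147674531294)]),
    "ZetaGrid": (615697 * 10**23,
                 [(None, 28314000), (59, 13861486834), (66, 15392435100)]),
    "Plaa": (123163 * 10**22,
             [(None, 28314000), (59, 307779681), (62, 307908668)]),
}


def chain_is_consistent(n0, stages):
    """Check that consecutive ranges overlap and that the last one reaches n0 - 4."""
    for index, (_, delta) in enumerate(stages):
        reach = delta * MAX_GAP  # largest n - 4 with (n - 4)/delta <= MAX_GAP
        if index + 1 < len(stages):
            if reach < math.exp(stages[index + 1][0]):
                return False
        elif reach < n0 - 4:
            return False
    return True


if __name__ == "__main__":
    for name, (n0, stages) in CASES.items():
        print(name, chain_is_consistent(n0, stages))
```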

In other words, the rigorous results in [Plaa] are enough to show the result for all
odd $n\le10^{27}$. Of course, [HP13] is also more than enough, and gives stronger results
than Prop. D.0.3.
Bibliography

[AS64] M. Abramowitz and I. A. Stegun. Handbook of mathematical functions
with formulas, graphs, and mathematical tables, volume 55 of
National Bureau of Standards Applied Mathematics Series. For sale by
the Superintendent of Documents, U.S. Government Printing Office,
Washington, D.C., 1964.
[BBO10] J. Bertrand, P. Bertrand, and J.-P. Ovarlez. Mellin transform. In A. D.
Poularikas, editor, Transforms and applications handbook. CRC Press,
Boca Raton, FL, 2010.
[Bom74] E. Bombieri. Le grand crible dans la théorie analytique des nombres.
Société Mathématique de France, Paris, 1974. Avec une sommaire en
anglais, Astérisque, No. 18.
[Bom10] E. Bombieri. The classical theory of zeta and L-functions. Milan J.
Math., 78(1):11–59, 2010.
[Bom76] E. Bombieri. On twin almost primes. Acta Arith., 28(2):177–193,
1975/76.
[Boo06a] A. R. Booker. Artin's conjecture, Turing's method, and the Riemann
hypothesis. Experiment. Math., 15(4):385–407, 2006.
[Boo06b] A. R. Booker. Turing and the Riemann hypothesis. Notices Amer.
Math. Soc., 53(10):1208–1211, 2006.
[Bor56] K. G. Borodzkin. On the problem of I. M. Vinogradov's constant (in
Russian). In Proc. Third All-Union Math. Conf., volume 1, page 3.
Izdat. Akad. Nauk SSSR, Moscow, 1956.
[Bou99] J. Bourgain. On triples in arithmetic progression. Geom. Funct. Anal.,
9(5):968–984, 1999.
[BR02] G. Bastien and M. Rogalski. Convexité, complète monotonie et
inégalités sur les fonctions zêta et gamma, sur les fonctions des
opérateurs de Baskakov et sur des fonctions arithmétiques. Canad.
J. Math., 54(5):916–944, 2002.

[But11] Y. Buttkewitz. Exponential sums over primes and the prime twin prob-
lem. Acta Math. Hungar., 131(1-2):46–58, 2011.
[Che73] J. R. Chen. On the representation of a larger even integer as the sum of
a prime and the product of at most two primes. Sci. Sinica, 16:157–176,
1973.
[Che85] J. R. Chen. On the estimation of some trigonometrical sums and their
application. Sci. Sinica Ser. A, 28(5):449–458, 1985.
[Chu37] N.G. Chudakov. On the Goldbach problem. C. R. (Dokl.) Acad. Sci.
URSS, n. Ser., 17:335–338, 1937.
[Chu38] N.G. Chudakov. On the density of the set of even numbers which are
not representable as the sum of two odd primes. Izv. Akad. Nauk SSSR
Ser. Mat. 2, pages 25–40, 1938.
[Chu47] N. G. Chudakov. Introduction to the theory of Dirichlet L-functions.
OGIZ, Moscow-Leningrad, 1947. In Russian.
[CW89] J. R. Chen and T. Z. Wang. On the Goldbach problem. Acta Math.
Sinica, 32(5):702–718, 1989.
[CW96] J. R. Chen and T. Z. Wang. The Goldbach problem for odd numbers.
Acta Math. Sinica (Chin. Ser.), 39(2):169–174, 1996.
[Dab96] H. Daboussi. Effective estimates of exponential sums over primes.
In Analytic number theory, Vol. 1 (Allerton Park, IL, 1995), volume
138 of Progr. Math., pages 231–244. Birkhäuser Boston, Boston, MA,
1996.
[Dav67] H. Davenport. Multiplicative number theory. Markham Publishing
Co., Chicago, Ill., 1967. Lectures given at the University of Michigan,
Winter Term.
[dB81] N. G. de Bruijn. Asymptotic methods in analysis. Dover Publications
Inc., New York, third edition, 1981.
[Des08] R. Descartes. Œuvres de Descartes publiées par Charles Adam et
Paul Tannery sous les auspices du Ministère de l’Instruction publique.
Physico-mathematica. Compendium musicae. Regulae ad directionem
ingenii. Recherche de la vérité. Supplément à la correspondance. X.
Paris: Léopold Cerf. IV u. 691 S. 4◦ , 1908.
[Des77] J.-M. Deshouillers. Sur la constante de Šnirel'man. In Séminaire
Delange-Pisot-Poitou, 17e année: (1975/76), Théorie des nombres:
Fac. 2, Exp. No. G16, page 6. Secrétariat Math., Paris, 1977.
[DEtRZ97] J.-M. Deshouillers, G. Effinger, H. te Riele, and D. Zinoviev. A com-
plete Vinogradov 3-primes theorem under the Riemann hypothesis.
Electron. Res. Announc. Amer. Math. Soc., 3:99–104, 1997.
[Dic66] L. E. Dickson. History of the theory of numbers. Vol. I: Divisibility
and primality. Chelsea Publishing Co., New York, 1966.
[DLDDD+10] C. Daramy-Loirat, F. De Dinechin, D. Defour, M. Gallet, N. Gast, and
Ch. Lauter. Crlibm, March 2010. version 1.0beta4.
[DR01] H. Daboussi and J. Rivat. Explicit upper bounds for exponential sums
over primes. Math. Comp., 70(233):431–447 (electronic), 2001.
[Dre93] F. Dress. Fonction sommatoire de la fonction de Möbius. I. Majorations
expérimentales. Experiment. Math., 2(2):89–98, 1993.
[DS70] H. G. Diamond and J. Steinig. An elementary proof of the prime num-
ber theorem with a remainder term. Invent. Math., 11:199–258, 1970.
[Eff99] G. Effinger. Some numerical implications of the Hardy and Littlewood
analysis of the 3-primes problem. Ramanujan J., 3(3):239–280, 1999.
[EM95] M. El Marraki. Fonction sommatoire de la fonction de Möbius. III. Ma-
jorations asymptotiques effectives fortes. J. Théor. Nombres Bordeaux,
7(2):407–433, 1995.
[EM96] M. El Marraki. Majorations de la fonction sommatoire de la fonction
$\mu(n)/n$. Univ. Bordeaux 1, preprint (96-8), 1996.
[Est37] T. Estermann. On Goldbach's Problem: Proof that Almost all Even
Positive Integers are Sums of Two Primes. Proc. London Math. Soc.,
S2-44(4):307–314, 1937.
[FI98] J. Friedlander and H. Iwaniec. Asymptotic sieve for primes. Ann. of
Math. (2), 148(3):1041–1065, 1998.
[FI10] J. Friedlander and H. Iwaniec. Opera de cribro, volume 57 of American
Mathematical Society Colloquium Publications. American Mathemat-
ical Society, Providence, RI, 2010.
[For02] K. Ford. Vinogradov’s integral and bounds for the Riemann zeta func-
tion. Proc. London Math. Soc. (3), 85(3):565–633, 2002.
[GD04] X. Gourdon and P. Demichel. The first $10^{13}$ zeros of the Riemann
zeta function, and zeros computation at very large height.
http://numbers.computation.free.fr/Constants/Miscellaneous/zetazeros1e13-1e24.pdf, 2004.
[GR94] I. S. Gradshteyn and I. M. Ryzhik. Table of integrals, series, and prod-
ucts. Academic Press, Inc., Boston, MA, fifth edition, 1994. Transla-
tion edited and with a preface by Alan Jeffrey.
[GR96] A. Granville and O. Ramaré. Explicit bounds on exponential sums
and the scarcity of squarefree binomial coefficients. Mathematika,
43(1):73–107, 1996.
312 BIBLIOGRAPHY

[Har66] G. H. Hardy. Collected papers of G. H. Hardy (Including Joint pa-


pers with J. E. Littlewood and others). Vol. I. Edited by a committee
appointed by the London Mathematical Society. Clarendon Press, Ox-
ford, 1966.

[HB79] D. R. Heath-Brown. The fourth power moment of the Riemann zeta
function. Proc. London Math. Soc. (3), 38(3):385–422, 1979.

[HB85] D. R. Heath-Brown. The ternary Goldbach problem. Rev. Mat.
Iberoamericana, 1(1):45–59, 1985.

[HB11] H. Hong and Ch. W. Brown. QEPCAD B – Quantifier elimination by
partial cylindrical algebraic decomposition, May 2011. version 1.62.

[Hela] H. A. Helfgott. Major arcs for Goldbach’s problem. Preprint. Available
at arXiv:1305.2897.

[Helb] H. A. Helfgott. Minor arcs for Goldbach’s problem. Preprint. Available
as arXiv:1205.5252.

[Helc] H. A. Helfgott. The Ternary Goldbach Conjecture is true. Preprint.
Available as arXiv:1312.7748.

[Hel13a] H. Helfgott. La conjetura débil de Goldbach. Gac. R. Soc. Mat. Esp.,
16(4), 2013.

[Hel13b] H. A. Helfgott. The ternary Goldbach conjecture, 2013. Available at
http://valuevar.wordpress.com/2013/07/02/the-ternary-goldbach-conjecture/.

[Hel14a] H. A. Helfgott. La conjecture de Goldbach ternaire. Gaz. Math.,
(140):5–18, 2014. Translated by Margaret Bilu, revised by the author.

[Hel14b] H. A. Helfgott. The ternary Goldbach problem. To appear in Proceedings
of the International Congress of Mathematicians (Seoul, Korea,
2014), 2014.

[HL22] G. H. Hardy and J. E. Littlewood. Some problems of ‘Partitio numerorum’;
III: On the expression of a number as a sum of primes. Acta
Math., 44(1):1–70, 1922.

[HP13] H. A. Helfgott and David J. Platt. Numerical verification of the ternary
Goldbach conjecture up to 8.875 · 10^30. Exp. Math., 22(4):406–409,
2013.

[HR00] G. H. Hardy and S. Ramanujan. Asymptotic formulæ in combinatory
analysis [Proc. London Math. Soc. (2) 17 (1918), 75–115]. In Collected
papers of Srinivasa Ramanujan, pages 276–309. AMS Chelsea Publ.,
Providence, RI, 2000.
[Hux72] M. N. Huxley. Irregularity in sifted sequences. J. Number Theory,
4:437–454, 1972.

[IK04] H. Iwaniec and E. Kowalski. Analytic number theory, volume 53 of
American Mathematical Society Colloquium Publications. American
Mathematical Society, Providence, RI, 2004.

[Kad] H. Kadiri. An explicit zero-free region for the Dirichlet L-functions.
Preprint. Available as arXiv:0510570.

[Kad05] H. Kadiri. Une région explicite sans zéros pour la fonction ζ de Rie-
mann. Acta Arith., 117(4):303–339, 2005.

[Kar93] A. A. Karatsuba. Basic analytic number theory. Springer-Verlag,
Berlin, 1993. Translated from the second (1983) Russian edition and
with a preface by Melvyn B. Nathanson.

[Knü99] O. Knüppel. PROFIL/BIAS, February 1999. version 2.

[Kor58] N. M. Korobov. Estimates of trigonometric sums and their applications.
Uspehi Mat. Nauk, 13(4 (82)):185–192, 1958.

[Lam08] B. Lambov. Interval arithmetic using SSE-2. In Reliable Implementation
of Real Number Algorithms: Theory and Practice. International
Seminar Dagstuhl Castle, Germany, January 8-13, 2006, volume
5045 of Lecture Notes in Computer Science, pages 102–113. Springer,
Berlin, 2008.

[Leh66] R. Sherman Lehman. On the difference π(x) − li(x). Acta Arith.,
11:397–410, 1966.

[LW02] M.-Ch. Liu and T. Wang. On the Vinogradov bound in the three primes
Goldbach conjecture. Acta Arith., 105(2):133–175, 2002.

[Mar41] K. K. Mardzhanishvili. On the proof of the Goldbach-Vinogradov theorem
(in Russian). C. R. (Doklady) Acad. Sci. URSS (N.S.), 30(8):681–684, 1941.

[McC84a] K. S. McCurley. Explicit estimates for the error term in the prime num-
ber theorem for arithmetic progressions. Math. Comp., 42(165):265–
285, 1984.

[McC84b] K. S. McCurley. Explicit zero-free regions for Dirichlet L-functions.
J. Number Theory, 19(1):7–32, 1984.

[Mon68] H. L. Montgomery. A note on the large sieve. J. London Math. Soc.,
43:93–98, 1968.

[Mon71] H. L. Montgomery. Topics in multiplicative number theory. Lecture
Notes in Mathematics, Vol. 227. Springer-Verlag, Berlin, 1971.
[MV73] H. L. Montgomery and R. C. Vaughan. The large sieve. Mathematika,
20:119–134, 1973.

[MV74] H. L. Montgomery and R. C. Vaughan. Hilbert’s inequality. J. London
Math. Soc. (2), 8:73–82, 1974.

[MV07] H. L. Montgomery and R. C. Vaughan. Multiplicative number theory.
I. Classical theory, volume 97 of Cambridge Studies in Advanced
Mathematics. Cambridge University Press, Cambridge, 2007.

[Ned06] N. S. Nedialkov. VNODE-LP: a validated solver for initial value problems
in ordinary differential equations, July 2006. version 0.3.

[OeSHP14] T. Oliveira e Silva, S. Herzog, and S. Pardi. Empirical verification of
the even Goldbach conjecture, and computation of prime gaps, up to
4 · 10^18. Math. Comp., 83:2033–2060, 2014.

[OLBC10] F. W. J. Olver, D. W. Lozier, R. F. Boisvert, and Ch. W. Clark, editors.
NIST handbook of mathematical functions. U.S. Department of
Commerce National Institute of Standards and Technology, Washington,
DC, 2010. With 1 CD-ROM (Windows, Macintosh and UNIX).

[Olv58] F. W. J. Olver. Uniform asymptotic expansions of solutions of linear
second-order differential equations for large values of a parameter.
Philos. Trans. Roy. Soc. London. Ser. A, 250:479–517, 1958.

[Olv59] F. W. J. Olver. Uniform asymptotic expansions for Weber parabolic
cylinder functions of large orders. J. Res. Nat. Bur. Standards Sect. B,
63B:131–169, 1959.

[Olv61] F. W. J. Olver. Two inequalities for parabolic cylinder functions. Proc.
Cambridge Philos. Soc., 57:811–822, 1961.

[Olv65] F. W. J. Olver. On the asymptotic solution of second-order differential
equations having an irregular singularity of rank one, with an application
to Whittaker functions. J. Soc. Indust. Appl. Math. Ser. B Numer.
Anal., 2:225–243, 1965.

[Olv74] F. W. J. Olver. Asymptotics and special functions. Academic Press
[A subsidiary of Harcourt Brace Jovanovich, Publishers], New York-London,
1974. Computer Science and Applied Mathematics.

[Plaa] D. Platt. Computing π(x) analytically. To appear in Math. Comp.
Available as arXiv:1203.5712.

[Plab] D. Platt. Numerical computations concerning GRH. Preprint. Available
at arXiv:1305.3087.

[Pla11] D. Platt. Computing degree 1 L-functions rigorously. PhD thesis,
Bristol University, 2011.
[Rama] O. Ramaré. État des lieux. Preprint. Available as
http://math.univ-lille1.fr/~ramare/Maths/ExplicitJNTB.pdf.

[Ramb] O. Ramaré. Explicit estimates on several summatory functions involving
the Moebius function. To appear in Math. Comp.

[Ramc] O. Ramaré. A sharp bilinear form decomposition for primes and Moebius
function. Preprint. To appear in Acta Math. Sinica.

[Ramd] O. Ramaré. Short effective intervals containing primes. Preprint.

[Ram95] O. Ramaré. On Šnirel’man’s constant. Ann. Scuola Norm. Sup. Pisa
Cl. Sci. (4), 22(4):645–706, 1995.

[Ram09] O. Ramaré. Arithmetical aspects of the large sieve inequality, volume 1
of Harish-Chandra Research Institute Lecture Notes. Hindustan Book
Agency, New Delhi, 2009. With the collaboration of D. S. Ramana.

[Ram10] O. Ramaré. On Bombieri’s asymptotic sieve. J. Number Theory,
130(5):1155–1189, 2010.

[Ram13] O. Ramaré. From explicit estimates for primes to explicit estimates for
the Möbius function. Acta Arith., 157(4):365–379, 2013.

[Ram14] O. Ramaré. Explicit estimates on the summatory functions of the
Möbius function with coprimality restrictions. Acta Arith., 165(1):1–10,
2014.

[Ros41] B. Rosser. Explicit bounds for some functions of prime numbers. Amer.
J. Math., 63:211–232, 1941.

[RR96] O. Ramaré and R. Rumely. Primes in arithmetic progressions. Math.
Comp., 65(213):397–425, 1996.

[RS62] J. B. Rosser and L. Schoenfeld. Approximate formulas for some functions
of prime numbers. Illinois J. Math., 6:64–94, 1962.

[RS75] J. B. Rosser and L. Schoenfeld. Sharper bounds for the Chebyshev
functions θ(x) and ψ(x). Math. Comp., 29:243–269, 1975. Collection
of articles dedicated to Derrick Henry Lehmer on the occasion of his
seventieth birthday.

[RS03] O. Ramaré and Y. Saouter. Short effective intervals containing primes.
J. Number Theory, 98(1):10–33, 2003.

[RV83] H. Riesel and R. C. Vaughan. On sums of primes. Ark. Mat., 21(1):46–74,
1983.

[Sao98] Y. Saouter. Checking the odd Goldbach conjecture up to 10^20. Math.
Comp., 67(222):863–866, 1998.
[Sch33] L. Schnirelmann. Über additive Eigenschaften von Zahlen. Math. Ann.,
107(1):649–690, 1933.
[Sch76] L. Schoenfeld. Sharper bounds for the Chebyshev functions θ(x) and
ψ(x). II. Math. Comp., 30(134):337–360, 1976.
[SD10] Y. Saouter and P. Demichel. A sharp region where π(x) − li(x) is
positive. Math. Comp., 79(272):2395–2405, 2010.
[Sel91] A. Selberg. Lectures on sieves. In Collected papers, vol. II, pages
66–247. Springer Berlin, 1991.
[Sha14] X. Shao. A density version of the Vinogradov three primes theorem.
Duke Math. J., 163(3):489–512, 2014.
[Shu92] F. H. Shu. The Cosmos. In Encyclopaedia Britannica, Macropaedia,
volume 16, pages 762–795. Encyclopaedia Britannica, Inc., 15 edition,
1992.
[Tao14] T. Tao. Every odd number greater than 1 is the sum of at most five
primes. Math. Comp., 83(286):997–1038, 2014.
[Tem10] N. M. Temme. Parabolic cylinder functions. In NIST Handbook of
mathematical functions, pages 303–319. U.S. Dept. Commerce, Wash-
ington, DC, 2010.
[Tru] T. S. Trudgian. An improved upper bound for the error in the
zero-counting formulae for Dirichlet L-functions and Dedekind zeta-
functions. Preprint.
[Tuc11] W. Tucker. Validated numerics: A short introduction to rigorous com-
putations. Princeton University Press, Princeton, NJ, 2011.
[Tur53] A. M. Turing. Some calculations of the Riemann zeta-function. Proc.
London Math. Soc. (3), 3:99–117, 1953.
[TV03] N. M. Temme and R. Vidunas. Parabolic cylinder functions: exam-
ples of error bounds for asymptotic expansions. Anal. Appl. (Singap.),
1(3):265–288, 2003.
[van37] J. G. van der Corput. Sur l’hypothèse de Goldbach pour presque tous
les nombres pairs. Acta Arith., 2:266–290, 1937.
[Vau77a] R. C. Vaughan. On the estimation of Schnirelman’s constant. J. Reine
Angew. Math., 290:93–108, 1977.
[Vau77b] R.-C. Vaughan. Sommes trigonométriques sur les nombres premiers.
C. R. Acad. Sci. Paris Sér. A-B, 285(16):A981–A983, 1977.
[Vau80] R. C. Vaughan. Recent work in additive prime number theory. In Pro-
ceedings of the International Congress of Mathematicians (Helsinki,
1978), pages 389–394. Acad. Sci. Fennica, Helsinki, 1980.
[Vau97] R. C. Vaughan. The Hardy-Littlewood method, volume 125 of Cambridge
Tracts in Mathematics. Cambridge University Press, Cambridge,
second edition, 1997.
[Vin37] I. M. Vinogradov. A new method in analytic number theory (Russian).
Tr. Mat. Inst. Steklova, 10:5–122, 1937.

[Vin47] I. M. Vinogradov. The method of trigonometrical sums in the theory of
numbers (Russian). Tr. Mat. Inst. Steklova, 23:3–109, 1947.
[Vin54] I. M. Vinogradov. The method of trigonometrical sums in the theory
of numbers. Interscience Publishers, London and New York, 1954.
Translated, revised and annotated by K. F. Roth and Anne Davenport.
[Vin58] I. M. Vinogradov. A new estimate of the function ζ(1 + it). Izv. Akad.
Nauk SSSR. Ser. Mat., 22:161–164, 1958.
[Vin04] I. M. Vinogradov. The method of trigonometrical sums in the theory of
numbers. Dover Publications Inc., Mineola, NY, 2004. Translated from
the Russian, revised and annotated by K. F. Roth and Anne Davenport,
Reprint of the 1954 translation.
[Wed03] S. Wedeniwski. ZetaGrid - Computational verification of the Riemann
hypothesis. Conference in Number Theory in honour of Professor H.
C. Williams, Banff, Alberta, Canada, May 2003.

[Wei84] A. Weil. Number theory: An approach through history. From Hammurapi
to Legendre. Birkhäuser Boston, Inc., Boston, MA, 1984.
[Whi03] E. T. Whittaker. On the functions associated with the parabolic cylinder
in harmonic analysis. Proc. London Math. Soc., 35:417–427, 1903.

[Wig20] S. Wigert. Sur la théorie de la fonction ζ(s) de Riemann. Ark. Mat.,
14:1–17, 1920.
[Won01] R. Wong. Asymptotic approximations of integrals, volume 34 of Clas-
sics in Applied Mathematics. Society for Industrial and Applied Math-
ematics (SIAM), Philadelphia, PA, 2001. Corrected reprint of the 1989
original.
[Zin97] D. Zinoviev. On Vinogradov’s constant in Goldbach’s ternary problem.
J. Number Theory, 65(2):334–358, 1997.
