
Prime Numbers Gaurish

This document is a summer internship project report on prime numbers by Gaurish Korpal, a 3rd year master's student at the National Institute of Science Education and Research in India. The report provides an overview of key concepts in analytic number theory related to the distribution of prime numbers, including the prime number theorem, Riemann zeta function, Dirichlet L-functions, and proofs that the number of primes below x is asymptotically x/log(x). It also examines primes in arithmetic progressions using Dirichlet characters and L-functions. The report contains acknowledgments, an abstract, contents, introduction, and two main chapters on the topics.

Uploaded by

rickyjames
Copyright
© All Rights Reserved

Prime Numbers

Gaurish Korpal¹
[email protected]

Summer Internship Project Report

¹ 3rd year Int. MSc. student, National Institute of Science Education and Research, Jatni (Bhubaneswar, Odisha)
Certificate

Certified that the summer internship project report “Prime Numbers” is the bona fide work of
“Gaurish Korpal”, 3rd Year Int. MSc. student at National Institute of Science Education and
Research, Jatni (Bhubaneswar, Odisha), carried out under my supervision during June 05, 2017
to July 15, 2017.

Place: Chennai
Date: July 15, 2017

Prof. Kotyada Srinivas


Supervisor
Professor H,
The Institute of Mathematical Sciences,
Tharamani, Chennai 600113
Abstract

Being prime (or composite) is a property of the number itself, regardless of the way we write it. In general, divisibility properties are independent of the base (decimal or binary) and of the writing system we choose (Roman or Hindu-Arabic numerals). In this report I will scratch the surface of the vast topic known as the distribution of primes.
Acknowledgements

Foremost, I would like to express my sincere gratitude to my advisor Prof. Kotyada Srinivas for
his motivation and immense knowledge. I am also thankful to my fellow interns Sayan Kundu¹
and Aditya Kumar Shukla² for the enlightening discussions.
Last but not least, I would like to thank
– Donald Knuth for TeX
– Michael Spivak for AMS-TeX
– Sebastian Rahtz for TeX Live
– Leslie Lamport for LaTeX
– American Mathematical Society for AMS-LaTeX
– Hàn Thế Thành for pdfTeX
– Christian Feuersänger & Till Tantau for PGF/TikZ interpreter
– Heiko Oberdiek for hyperref package
– Steven B. Segletes for stackengine package
– Alan Jeffrey & Frank Mittelbach for inputenc package
– Axel Sommerfeldt for subcaption package
– David Carlisle for graphicx package
– Javier Bezos for enumitem package
– Hideo Umeki for geometry package
– Philipp Lehman & Joseph Wright for csquotes package
– Peter R. Wilson & Will Robertson for epigraph package
– Sebastian Rahtz for textcomp package
– Walter Schmidt for gensymb package
– Philipp Kühl & Daniel Kirsch for Detexify (a tool for searching LaTeX symbols)
– TeX.StackExchange community for helping me out with LaTeX-related problems
GeoGebra was used to generate all the PGF/TikZ code for this document.

¹ 2nd year B.Sc. (Hons) Mathematics and Computing, IMA-Bhubaneswar
² 3rd year B.Sc. (Hons) Mathematics, BHU-Varanasi
Contents

Abstract
Introduction

1 Prime Number Theorem
1.1 Entire Functions
1.1.1 Finite Order without Zeros
1.1.2 Finite Order with Zeros
1.1.3 Order 1 with Zeros
1.2 Gamma Function
1.2.1 Analytic Continuation
1.2.2 Stirling’s Formula
1.3 Zeta Function
1.3.1 Euler Zeta Function
1.3.2 Euler Product
1.3.3 Analytic Properties of Euler Zeta Function
1.3.4 Riemann Zeta Function
1.3.5 Zeros of Riemann Zeta Function
1.4 von Mangoldt Function
1.5 Prime Counting Function
1.6 Chebyshev Functions
1.7 The Proof
1.7.1 Zero-Free Region for ζ(s)
1.7.2 Counting the Zeros of ζ(s)
1.7.3 The Explicit Formula for ψ(x)
1.7.4 Completing the Proof
1.8 Some Remarks
1.8.1 Lindelöf Hypothesis
1.8.2 Elementary Proof
1.8.3 Heuristic Approach

2 Primes in Arithmetic Progression
2.1 Dirichlet Density
2.2 Dirichlet Characters
2.2.1 Number of Dirichlet Characters
2.2.2 Orthogonality Relations
2.3 L-function
2.3.1 Dirichlet L-function
2.3.2 Product Formula
2.3.3 Logarithm Formula
2.3.4 Analytic Continuation
2.3.5 Product of L-functions
2.4 The Proof
2.4.1 Trivial Dirichlet Character
2.4.2 Non-trivial Dirichlet Character
2.4.3 Completing the Proof
2.5 Some Remarks
2.5.1 Chebotarëv Density Theorem
2.5.2 Prime Number Theorem for Arithmetic Progressions
2.5.3 Tchébychev Bias

Conclusion
A Abel’s Summation Formula
Bibliography
Introduction

“Prime numbers are what is left when you have taken all the patterns away. I think
prime numbers are like life. They are very logical but you could never work out the
rules, even if you spent all your time thinking about them.”
— Christopher Boone, narrator in Mark Haddon’s The Curious Incident of the Dog
in the Night Time

Prime numbers are the integers greater than 1 whose only positive divisors are 1 and themselves. We encounter these numbers as soon as we start analysing the integers, in the form of the fundamental theorem of arithmetic, which says that prime numbers are the “building blocks” of the integers [21]. Moreover, in my past reports we saw that these numbers play an important role in solving Diophantine equations [24, §1.5, 1.6], in generalising the concept of factorisation [27] and in extending the concept of number system [28]. An exposition of further extensions of these three ideas can be found in Richard Taylor’s lecture [21].
In this report I intend to introduce the basic tools involved in the study of the distribution of prime numbers. These tools will be developed with the help of complex function theory. Effectively, we will be scratching the surface of the multiplicative aspects of analytic number theory. In analytic number theory one looks for good approximations: for the sort of quantity that one estimates there, one does not expect an exact formula to exist, except perhaps one of a rather artificial and unilluminating kind [9].
The first good guess about the asymptotic behaviour of the primes emerged at the end of the eighteenth century, when Carl Friedrich Gauss, at about 16 years of age, predicted that “the density of primes around x is about 1/log(x)”. This statement was proved at the end of the nineteenth century, by showing that the Riemann zeta function ζ(s) has no zeros near the line Re(s) = 1. I will discuss this proof in the first chapter of this report. Later, in the 1930s, Harald Cramér gave a probabilistic way of interpreting Gauss’ prediction. The Gauss-Cramér model provides a nice way to think about distribution questions concerning the prime numbers, but it does not give proofs. For a discussion of the distribution of primes based on this model, refer to Andrew Granville’s article [9].
We can obtain two types of results about the distribution of primes: results of a qualitative kind and results of a quantitative kind [2]. The second chapter is devoted to a result of a qualitative nature. It had long been asserted that every arithmetic progression a, a + m, a + 2m, ..., in which a and m have no common factor, contains infinitely many primes. The first proof was given by Peter Gustav Lejeune Dirichlet and published in 1837, and to do this he introduced the L-functions which bear his name. I will, however, discuss that proof with a slight modification introduced by Charles Jean de la Vallée Poussin in 1896. One should note that the use of Dirichlet L-functions extends beyond the proof of this theorem [5, §16.6].
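A quick numerical illustration of Dirichlet's theorem may help, though of course it proves nothing. The sketch below uses only the Python standard library. It sieves the primes up to x and tallies them by reduced residue class modulo m. The function names and the choice m = 4, x = 100 are illustrative and are not taken from the report.

```python
from math import gcd


def primes_up_to(n):
    """Primes p <= n by the sieve of Eratosthenes (assumes n >= 2)."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [p for p, flag in enumerate(is_prime) if flag]


def count_in_progressions(m, x):
    """Count primes p <= x in each class a (mod m) with gcd(a, m) = 1 (assumes m >= 2)."""
    counts = {a: 0 for a in range(1, m) if gcd(a, m) == 1}
    for p in primes_up_to(x):
        if p % m in counts:  # primes dividing m fall outside the reduced classes
            counts[p % m] += 1
    return counts


print(count_in_progressions(4, 100))  # {1: 11, 3: 13}
```

Only the prime 2 is missed here, because it divides m = 4. Increasing x keeps both classes growing, which is what Dirichlet's theorem guarantees.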
Unlike the lemma-theorem-proof style followed in my past reports, in this report I have tried to follow Harold Davenport’s prose style [3]. My aim is to let the reader appreciate how random-looking tricks and ideas come together to build this beautiful theory of prime numbers, much as the reader of a well-written story appreciates the ingenuity of its ideas. Some steps in the proofs may appear to be presented without any proper logical motivation; that is precisely why it took great imagination even to conjecture the statements whose proofs we try to understand in this report.

Chapter 1

Prime Number Theorem

“It should be pointed out that, among the most celebrated and cultivated of the
Greeks, Eratosthenes was considered an extraordinary man who hurled the javelin,
wrote poems, defeated the great runners, and solved astronomical problems. Various
works of his have come down to posterity. He presented King Ptolemy III of Egypt
with a table on which the prime numbers were etched on a metal plate, with those
numbers with multiples marked with a small hole. And so they give the name of
Eratosthenes’ sieve to the process the wise astronomer used to draw up his table.”
— Malba Tahan, The Man who counted

It is easy to prove that there are infinitely many prime numbers [7, §1.4]. A glance at a list of primes shows that the sequence is rather irregular, yet the list seems to thin out at a regular rate, even though it never ends [13]. Despite not having a good exact formula for the sequence of primes, we do have a fairly good inexact formula [20], in the form of the prime number theorem.
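You can try out the sieve from the epigraph, together with Gauss’ guess that the density of primes near x is about 1/log(x), in a few lines. The sketch below is minimal pure Python and needs Python 3.8 or later for `math.isqrt`. The helper names are illustrative. It prints π(x), the approximation x/log(x), and their ratio.

```python
import math


def sieve(limit):
    """Sieve of Eratosthenes: is_prime[n] is True exactly when 0 <= n <= limit is prime (limit >= 1)."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # any composite below p*p has a smaller prime factor and is already struck out
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return is_prime


def prime_pi(x, is_prime):
    """Prime counting function pi(x), read off a precomputed sieve."""
    return sum(is_prime[: x + 1])


table = sieve(10 ** 6)
for x in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    approx = x / math.log(x)
    print(x, prime_pi(x, table), round(approx), round(prime_pi(x, table) / approx, 4))
```

For these values the ratio π(x)/(x/log x) is still noticeably above 1 and decreases only slowly. The prime number theorem says that it tends to 1.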

1.1 Entire Functions


An entire function (or integral function) is a complex-valued function that is holomorphic at all finite points of the complex plane. We say that an entire function f(z) is of finite order if there exists a number α such that
$$f(z) = O\left(e^{|z|^{\alpha}}\right) \quad \text{as } |z| \to \infty.$$
We must have α > 0, excluding the trivial case when f(z) is constant. The lower bound (infimum) of the numbers α with this property is called the order of f(z) [3, §11].
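A few standard examples may help fix the definition of order. Each upper bound below is elementary. The matching lower bound, which pins down the order exactly, comes from letting z run along a suitable ray.

```latex
\begin{aligned}
e^{z}&:\quad |e^{z}| \le e^{|z|},\ \text{with equality for } z > 0 &&\Longrightarrow\ \text{order } 1,\\
e^{z^{2}}&:\quad |e^{z^{2}}| \le e^{|z|^{2}},\ \text{with equality for } z > 0 &&\Longrightarrow\ \text{order } 2,\\
\cos\sqrt{z} = \sum_{k\ge 0}\frac{(-z)^{k}}{(2k)!}&:\quad |\cos\sqrt{z}| \le e^{|z|^{1/2}},\ \ \cos\sqrt{-x} = \cosh\sqrt{x}\ (x > 0) &&\Longrightarrow\ \text{order } \tfrac{1}{2},\\
p(z)\ \text{non-constant polynomial}&:\quad p(z) = O\big(e^{|z|^{\alpha}}\big)\ \text{for every } \alpha > 0 &&\Longrightarrow\ \text{order } 0.
\end{aligned}
```

The last line shows that the order, being an infimum, can be 0 even though every admissible α is positive.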

1.1.1 Finite Order without Zeros


Consider an entire function f(z) of finite order with no zeros. Then the function g(z) = log(f(z)) can be defined so as to be single-valued, and it is itself an entire function. It satisfies
$$\operatorname{Re}(g(z)) = \log|f(z)| < 2r^{\alpha} \tag{1.1}$$
on any large circle |z| = r. Since g(z) is an entire function, we can write
$$g(z) = \sum_{n=0}^{\infty}(a_n + ib_n)z^{n}$$
and then, for z = re^{iθ},
$$\operatorname{Re}(g(z)) = \sum_{n=0}^{\infty}\left(a_n\cos(n\theta) - b_n\sin(n\theta)\right)r^{n}.$$
Now we wish to obtain an inequality similar to Cauchy’s inequality, but involving only an upper bound for Re(g(z)) [1, §2.53]. Since the series for Re(g(z)) converges uniformly with respect to θ, we may multiply by cos(nθ) or sin(nθ) and integrate term by term to obtain
$$\int_0^{2\pi}\operatorname{Re}(g(z))\cos(n\theta)\,d\theta = \pi a_n r^{n}, \qquad \int_0^{2\pi}\operatorname{Re}(g(z))\sin(n\theta)\,d\theta = -\pi b_n r^{n}$$
for n > 0, while
$$\int_0^{2\pi}\operatorname{Re}(g(z))\,d\theta = 2\pi a_0. \tag{1.2}$$
Hence for n > 0
$$(a_n + ib_n)r^{n} = \frac{1}{\pi}\int_0^{2\pi}\operatorname{Re}(g(z))\,e^{-in\theta}\,d\theta.$$
But we know that [1, §2.31]
$$\left|(a_n + ib_n)r^{n}\right| = |a_n + ib_n|\,r^{n} \le \frac{1}{\pi}\int_0^{2\pi}\left|\operatorname{Re}(g(z))\right|d\theta$$
because |e^{ik}| = 1 for all k ∈ ℝ. Also, $|a_n| \le |a_n + ib_n| = \sqrt{a_n^{2} + b_n^{2}}$, therefore
$$|a_n|\,r^{n} \le \frac{1}{\pi}\int_0^{2\pi}\left|\operatorname{Re}(g(z))\right|d\theta.$$
Adding (1.2), divided by π, to this, we get
$$|a_n|\,r^{n} + 2a_0 \le \frac{1}{\pi}\int_0^{2\pi}\left(\left|\operatorname{Re}(g(z))\right| + \operatorname{Re}(g(z))\right)d\theta.$$
Without loss of generality we can take g(0) = 0, i.e. a_0 = 0. Since |x| + x = 2 max(x, 0), (1.1) gives |Re(g(z))| + Re(g(z)) < 4r^α, and so
$$|a_n|\,r^{n} \le \frac{1}{\pi}\int_0^{2\pi}\left(2r^{\alpha} + 2r^{\alpha}\right)d\theta = 8r^{\alpha}.$$
It follows, on making r → ∞, that a_n = 0 if n > α, and similarly for b_n. This proves that g(z) is a polynomial, and f(z) = e^{g(z)}.

An entire function of finite order with no zeros is necessarily of the form e^{g(z)}, where g(z) is a polynomial, and its order is simply the degree of g(z), so it is an integer.
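The key identity above recovers the Taylor coefficients of g from Re(g) alone, and it is easy to test numerically. The sketch below uses only the standard library. The test polynomial g and the function name are hypothetical. It replaces the integral over θ by an equally spaced sum over the circle |z| = r, which is exact for trigonometric polynomials of low degree.

```python
import cmath
import math


def coefficient_from_real_part(g, n, r, samples=1024):
    """Approximate a_n + i b_n (for n > 0) using only Re g on the circle |z| = r, via
    (a_n + i b_n) r^n = (1/pi) * integral_0^{2 pi} Re g(r e^{i theta}) e^{-i n theta} d theta.
    The equally spaced sum is exact for trigonometric polynomials of degree below `samples`."""
    total = 0j
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        total += g(r * cmath.exp(1j * theta)).real * cmath.exp(-1j * n * theta)
    integral = total * (2 * math.pi / samples)
    return integral / (math.pi * r ** n)


def g(z):
    """Hypothetical test polynomial: a_0 = 5, a_1 + i b_1 = 1 + 2i, a_2 + i b_2 = 3 - i."""
    return 5 + (1 + 2j) * z + (3 - 1j) * z ** 2


for n in (1, 2, 3):
    print(n, coefficient_from_real_part(g, n, r=1.5))
```

The constant term a_0 = 5 plays no role for n > 0, just as in (1.2), where only a_0 survives integration against 1.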

1.1.2 Finite Order with Zeros


Consider an entire function f(z) of order ρ with zeros z_1, z_2, ..., z_n in |z| < R (multiple zeros being repeated as appropriate) and no zeros on |z| = R. For α > ρ, we have
$$\log\left|f\left(Re^{i\theta}\right)\right| < R^{\alpha}$$
for all sufficiently large R. Then we can write
$$\frac{1}{2\pi}\int_0^{2\pi}\log\left|f\left(Re^{i\theta}\right)\right|d\theta < \frac{1}{2\pi}\int_0^{2\pi}R^{\alpha}\,d\theta = R^{\alpha}. \tag{1.3}$$
Let n(r) be the number of zeros of f(z) in |z| ≤ r, and suppose f(0) ≠ 0. Then, using Jensen’s formula for analytic functions [1, §3.61],
$$\frac{1}{2\pi}\int_0^{2\pi}\log\left|f\left(Re^{i\theta}\right)\right|d\theta - \log|f(0)| = \log\frac{R^{n}}{|z_1|\cdots|z_n|} = \int_0^{R}\frac{n(r)}{r}\,dr,$$
we can rewrite (1.3) as
$$\int_0^{R}\frac{n(r)}{r}\,dr + \log|f(0)| < R^{\alpha}$$
$$\Rightarrow \int_0^{R}\frac{n(r)}{r}\,dr < R^{\alpha} - \log|f(0)| < 2R^{\alpha} \tag{1.4}$$
for large R. Moreover, since n(r) is non-decreasing,
$$\int_R^{2R}\frac{n(r)}{r}\,dr \ge n(R)\int_R^{2R}\frac{dr}{r} = n(R)\log 2.$$
Hence, applying (1.4) with 2R in place of R, we conclude that
$$n(R) = O\left(R^{\alpha}\right).$$
Now let r_k = |z_k| for k = 1, 2, ..., where z_k is a zero¹ of f(z), ordered so that r_1 ≤ r_2 ≤ .... Taking R = r_k we have k ≤ n(r_k) < A r_k^{α} for some constant A, and so for β > α > ρ
$$\frac{1}{r_k^{\beta}} < \frac{A'}{k^{\beta/\alpha}}.$$
Therefore,
$$\sum_{k=1}^{\infty}\frac{1}{r_k^{\beta}} < \sum_{k=1}^{\infty}\frac{A'}{k^{\beta/\alpha}} < \infty,$$
since $\sum n^{-\ell}$ converges for ℓ > 1 (here ℓ = β/α).

If r_1, r_2, ... are the moduli of the zeros of an entire function f(z) of order ρ, with f(0) ≠ 0, then the series $\sum r_n^{-\beta}$ converges if β > ρ.

The lower bound of the positive numbers β for which $\sum r_n^{-\beta}$ is convergent is called the exponent of convergence of the zeros, and is denoted by ρ_1. What we proved above is that ρ_1 ≤ ρ. But we may have ρ_1 < ρ; for example, if f(z) = e^z then ρ = 1 but ρ_1 = 0, since there are no zeros. Moreover, any function with a finite number of zeros has ρ_1 = 0. Therefore, ρ_1 > 0 implies that there are infinitely many zeros [1, §8.22].
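Jensen's formula drives this subsection, so it may be worth checking its three expressions against each other numerically. The sketch below assumes a hypothetical example f(z) = (z − 0.5)(z + 0.3i)(z − 2)e^z, which has f(0) ≠ 0, and uses the standard library only. It approximates the circle average by an equally spaced sum. That sum converges very quickly here because log|f| is smooth and periodic on a circle free of zeros.

```python
import cmath
import math

ZEROS = [0.5, -0.3j, 2.0]  # hypothetical zeros of the example f below


def f(z):
    """Example entire function f(z) = (z - 0.5)(z + 0.3i)(z - 2) e^z, with f(0) != 0."""
    value = cmath.exp(z)
    for a in ZEROS:
        value *= z - a
    return value


def jensen_circle_average(R, samples=4096):
    """(1/2 pi) * integral of log|f(R e^{i theta})| over [0, 2 pi], minus log|f(0)|."""
    total = sum(math.log(abs(f(R * cmath.exp(2j * math.pi * k / samples))))
                for k in range(samples))
    return total / samples - math.log(abs(f(0)))


def jensen_product(R):
    """log(R^n / (|z_1| ... |z_n|)) over the zeros with |z_k| < R."""
    inside = [abs(a) for a in ZEROS if abs(a) < R]
    return len(inside) * math.log(R) - sum(math.log(m) for m in inside)


def jensen_counting_integral(R):
    """integral_0^R n(r)/r dr, evaluated exactly because n(r) is a step function."""
    radii = sorted(abs(a) for a in ZEROS if abs(a) < R)
    total = 0.0
    for k, r_k in enumerate(radii):
        upper = radii[k + 1] if k + 1 < len(radii) else R
        total += (k + 1) * math.log(upper / r_k)  # n(r) = k + 1 on [r_k, upper)
    return total


for R in (1.0, 3.0):
    print(R, jensen_circle_average(R), jensen_product(R), jensen_counting_integral(R))
```

For R = 1 only the zeros 0.5 and −0.3i count, and all three columns equal log(20/3). For R = 3 all three zeros count, and the common value is log 90.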

1.1.3 Order 1 with Zeros


Consider an entire function f(z) of order 1 with zeros z_1, z_2, ... (multiple zeros being repeated as appropriate) and f(0) ≠ 0. By subsection 1.1.2 we can then assert that $\sum r_n^{-1-\varepsilon}$ converges for any ε > 0, and in particular $\sum r_n^{-2}$ converges. Hence the product
$$P(z) = \prod_{n=1}^{\infty}\left(1-\frac{z}{z_n}\right)e^{z/z_n}$$
converges absolutely for all z, and converges uniformly in any bounded domain not containing the points z_n [1, §1.44]. So P(z) is an entire function with zeros (of the appropriate multiplicities) at z_1, z_2, .... If we put
$$f(z) = P(z)F(z) \tag{1.5}$$
then F(z) is an entire function without zeros.
¹ An entire function which is not a polynomial may have infinitely many zeros z_n.

Since $\sum r_n^{-2}$ converges, the total length of all the intervals $\left(r_n - r_n^{-2},\ r_n + r_n^{-2}\right)$ on the real line is finite. Consequently there exist arbitrarily large values of R with the property that
$$|R - r_n| > \frac{1}{r_n^{2}} \quad \text{for all } n. \tag{1.6}$$
Put P(z) = P_1(z)P_2(z)P_3(z), where these are the sub-products extended over the following sets of n:
$$P_1:\ |z_n| < \frac{R}{2}, \qquad P_2:\ \frac{R}{2} \le |z_n| \le 2R, \qquad P_3:\ |z_n| > 2R.$$

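For the factors of P_1 we have, on |z| = R,
$$\left|\left(1-\frac{z}{z_n}\right)e^{z/z_n}\right| \ge \left(\frac{|z|}{|z_n|}-1\right)e^{-|z|/|z_n|} > \left(\frac{R}{R/2}-1\right)e^{-R/r_n} = e^{-R/r_n},$$
and since
$$\sum_{r_n < R/2}\frac{1}{r_n} < \left(\frac{R}{2}\right)^{\varepsilon}\sum_{n=1}^{\infty}\frac{1}{r_n^{1+\varepsilon}},$$
it follows that
$$|P_1(z)| = \prod_{|z_n|<R/2}\left|\left(1-\frac{z}{z_n}\right)e^{z/z_n}\right| > \prod_{|z_n|<R/2}e^{-R/r_n} = \exp\left(-R\sum_{r_n<R/2}\frac{1}{r_n}\right) > \exp\left(-\frac{R^{1+\varepsilon}}{2^{\varepsilon}}\sum_{n=1}^{\infty}\frac{1}{r_n^{1+\varepsilon}}\right)$$
$$\Rightarrow |P_1(z)| > \exp\left(-R^{1+2\varepsilon}\right) \tag{1.7}$$
for all sufficiently large R.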
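For the factors of P_2 we have, on |z| = R,
$$\left|\left(1-\frac{z}{z_n}\right)e^{z/z_n}\right| \ge \frac{|z-z_n|}{|z_n|}\,e^{-|z|/|z_n|} \ge \frac{\big||z|-|z_n|\big|}{2R}\,e^{-2} > \frac{e^{-2}}{2R\,r_n^{2}} \ge \frac{C}{R^{3}},$$
where C is a positive constant; here we used (1.6) and r_n ≤ 2R. By the result proved in subsection 1.1.2, n(R) = O(R^{1+ε}), so the number of factors of P_2 (at most n(2R)) is less than R^{1+ε} for large R. Hence
$$|P_2(z)| > \left(\frac{C}{R^{3}}\right)^{R^{1+\varepsilon}} > \exp\left(-R^{1+2\varepsilon}\right). \tag{1.8}$$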
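For the factors of P_3 we have |z/z_n| = R/r_n < 1/2 on |z| = R. Writing w = z/z_n, for |w| ≤ 1/2 we have $\log(1-w) + w = -\sum_{k\ge 2}w^{k}/k$, whose modulus is at most $\frac{|w|^{2}}{2(1-|w|)} \le |w|^{2}$. Hence
$$\left|\left(1-\frac{z}{z_n}\right)e^{z/z_n}\right| = \exp\left(\operatorname{Re}\left(\log\left(1-\frac{z}{z_n}\right)+\frac{z}{z_n}\right)\right) \ge e^{-c(R/r_n)^{2}}$$
for some positive constant c (c = 1 works), and since
$$\sum_{r_n>2R}\frac{1}{r_n^{2}} < \frac{1}{(2R)^{1-\varepsilon}}\sum_{n=1}^{\infty}\frac{1}{r_n^{1+\varepsilon}},$$
it follows that
$$|P_3(z)| = \prod_{|z_n|>2R}\left|\left(1-\frac{z}{z_n}\right)e^{z/z_n}\right| \ge \prod_{|z_n|>2R}e^{-c(R/r_n)^{2}} = \exp\left(-cR^{2}\sum_{r_n>2R}\frac{1}{r_n^{2}}\right) > \exp\left(-\frac{cR^{1+\varepsilon}}{2^{1-\varepsilon}}\sum_{n=1}^{\infty}\frac{1}{r_n^{1+\varepsilon}}\right)$$
$$\Rightarrow |P_3(z)| > \exp\left(-R^{1+2\varepsilon}\right). \tag{1.9}$$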
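From (1.7), (1.8) and (1.9) it follows that, on |z| = R,
$$|P(z)| > \exp\left(-R^{1+3\varepsilon}\right).$$
Hence from (1.5) and $f(z) = O\left(\exp\left(|z|^{1+\varepsilon}\right)\right)$, it follows that
$$|F(z)| < \exp\left(R^{1+4\varepsilon}\right)$$
on these circles |z| = R, for arbitrarily large R. As seen in subsection 1.1.1, where the argument only needs r → ∞ through a sequence of radii, this implies that F(z) = e^{g(z)} where g(z) is a polynomial of degree at most 1. Finally we have
$$f(z) = e^{A+Bz}\prod_{n=1}^{\infty}\left(1-\frac{z}{z_n}\right)e^{z/z_n},$$
where A and B are constants.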
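We know that the series $\sum r_n^{-1-\varepsilon}$ converges for any ε > 0. The series $\sum r_n^{-1}$ may or may not converge, but if it does then f(z) satisfies the inequality
$$|f(z)| < e^{C|z|}$$
for some constant C, because for all w ∈ ℂ we have
$$|1-w| \le 1+|w| \le e^{|w|} \quad\text{and}\quad |e^{w}| = e^{\operatorname{Re}(w)} \le e^{\sqrt{\operatorname{Re}(w)^{2}+\operatorname{Im}(w)^{2}}} = e^{|w|}$$
$$\Rightarrow \left|(1-w)e^{w}\right| \le e^{2|w|},$$
so that, taking w = z/z_n, $|P(z)| \le \exp\left(2|z|\sum_{n} r_n^{-1}\right)$, while $|e^{A+Bz}| \le e^{|A|+|B||z|}$.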

An entire function f(z) of order 1 with f(0) ≠ 0 necessarily has the form
$$f(z) = e^{A+Bz}\prod_{n=1}^{\infty}\left(1-\frac{z}{z_n}\right)e^{z/z_n}.$$
If r_n = |z_n|, where z_n are the zeros of f(z), then $\sum r_n^{-1-\varepsilon}$ converges for any ε > 0. If $\sum r_n^{-1}$ converges, then f(z) satisfies $|f(z)| < e^{C|z|}$.
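As an illustration of the product formula, which is not part of the argument above, take f(z) = sin(πz)/(πz). It is an entire function of order 1 with f(0) = 1 and zeros at ±1, ±2, .... Here A = B = 0: pairing the factors for n and −n cancels the exponentials and leaves the classical product ∏(1 − z²/n²). The sketch truncates the product at |z_n| ≤ N. N and the helper names are illustrative choices, and only the standard library is used.

```python
import cmath
import math


def sinc_hadamard(z, N):
    """Truncated order-1 Hadamard product for f(z) = sin(pi z)/(pi z), whose zeros are
    z_n = +-1, +-2, ...; here f(0) = 1 and A = B = 0. Only zeros with |z_n| <= N are used."""
    product = 1 + 0j
    for n in range(1, N + 1):
        for zn in (n, -n):
            product *= (1 - z / zn) * cmath.exp(z / zn)
    return product


def sinc(z):
    """sin(pi z)/(pi z) for z != 0."""
    return cmath.sin(math.pi * z) / (math.pi * z)


for z in (0.5, 0.3 + 0.4j):
    print(z, sinc_hadamard(z, 20000), sinc(z))
```

For this f, $\sum r_n^{-1}$ is the harmonic series, which diverges. The convergence factors e^{z/z_n} are what make the product converge.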

1.2 Gamma Function


The gamma-function
$$\Gamma(z) = \int_0^{\infty}e^{-t}t^{z-1}\,dt$$
is a uniformly convergent integral over any finite region throughout which Re(z) > 0 [1, §1.51], and so it is a continuous function for Re(z) > 0 [1, §1.52]. It is an analytic function, regular for Re(z) > 0 [1, §2.85].

1.2.1 Analytic Continuation


Consider the function
$$f(z) = \int_C e^{-w}(-w)^{z-1}\,dw,$$
where C consists of the real axis from ∞ to δ, the circle |w| = δ described in the positive direction, and the real axis from δ to ∞ again. This integral converges uniformly in any bounded region of the z-plane, so f(z) is regular² for all finite values of z, and the function
$$g(z) = \frac{i}{2}f(z)\operatorname{cosec}(\pi z) = \frac{i}{2\sin(\pi z)}\int_C e^{-w}(-w)^{z-1}\,dw$$
is regular for all values of z except at the poles z = 0, −1, −2, ... [1, §4.41]. Since g(z) = Γ(z) for Re(z) > 0, we can take g(z) as the analytic continuation of Γ(z) to ℂ \ {0, −1, −2, ...}. All the gamma-function formulae for real z [1, §1.86] can now be extended to general complex values of z. For example, we get the functional equations:

Integration by Parts: Γ(z + 1) = zΓ(z), ∀z ∈ ℂ \ −ℕ
Euler’s Reflection Formula: Γ(z)Γ(1 − z) = π cosec(πz), ∀z ∈ ℂ \ ℤ
Legendre’s Duplication Formula: $\Gamma(z)\Gamma\left(z+\frac{1}{2}\right) = 2^{1-2z}\sqrt{\pi}\,\Gamma(2z)$, ∀z ∈ ℂ \ (−ℕ/2)
Weierstrass’ Formula: $\Gamma(z) = \frac{e^{-\gamma z}}{z}\prod_{n=1}^{\infty}\left(1+\frac{z}{n}\right)^{-1}e^{z/n}$, ∀z ∈ ℂ \ −ℕ

where ℕ = {0, 1, 2, 3, ...} and γ is the Euler-Mascheroni constant given by
$$\gamma = -\log\prod_{n=1}^{\infty}\left(1+\frac{1}{n}\right)e^{-1/n} = \lim_{N\to\infty}\left(1 + \frac{1}{2} + \dots + \frac{1}{N} - \log N\right) \approx 0.57721.$$
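The functional equations and the Weierstrass product can be spot-checked numerically for real arguments with Python's `math.gamma`. This is a sanity check under stated assumptions, not a proof: it covers real z only, truncates the product after N factors, and hard-codes γ to double precision.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant to double precision


def weierstrass_gamma(x, N=200000):
    """Gamma(x) for real x > 0 from the Weierstrass product truncated after N factors,
    e^(-gamma x) / x * prod_{n <= N} (1 + x/n)^(-1) e^(x/n), accumulated in logarithmic form."""
    log_value = -EULER_GAMMA * x - math.log(x)
    for n in range(1, N + 1):
        log_value += x / n - math.log1p(x / n)
    return math.exp(log_value)


def harmonic_minus_log(N):
    """1 + 1/2 + ... + 1/N - log N, which tends to the Euler-Mascheroni constant."""
    return sum(1 / n for n in range(1, N + 1)) - math.log(N)


z = 0.3
print(math.gamma(z + 1), z * math.gamma(z))                                # integration by parts
print(math.gamma(z) * math.gamma(1 - z), math.pi / math.sin(math.pi * z))  # reflection
print(math.gamma(z) * math.gamma(z + 0.5),
      2 ** (1 - 2 * z) * math.sqrt(math.pi) * math.gamma(2 * z))           # duplication
print(weierstrass_gamma(2.5), math.gamma(2.5))                             # Weierstrass product
print(harmonic_minus_log(10 ** 6), EULER_GAMMA)                            # limit formula for gamma
```

The truncated product converges slowly, with a relative error of roughly x²/(2N). Summing logarithms with `math.log1p` avoids overflow in the partial products.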

1.2.2 Stirling’s Formula


When we study the asymptotic behaviour of Γ(x) as x → ∞ through real values, we get Stirling’s theorem [1, §1.87]
$$\Gamma(x) = x^{x-\frac{1}{2}}e^{-x}\sqrt{2\pi}\,\big(1 + o(1)\big),$$
where f(x) = o(g(x)) means that $\lim_{x\to\infty} f(x)/g(x) = 0$. Taking the logarithm of Weierstrass’ formula gives
$$\log\Gamma(z) = \sum_{n=1}^{\infty}\left(\frac{z}{n} - \log\left(1+\frac{z}{n}\right)\right) - \gamma z - \log z,$$
with each logarithm having its principal value. After appropriate manipulations we then get [1, §4.42]
$$\log\Gamma(z) = \left(z-\frac{1}{2}\right)\log z - z + \frac{1}{2}\log(2\pi) + O\left(\frac{1}{|z|}\right)$$
for −π + δ ≤ arg(z) ≤ π − δ (with δ > 0 fixed), which is the extension of Stirling’s formula to complex values of z. Therefore, for any constant a, we can write
$$\log\Gamma(z+a) = \left(z+a-\frac{1}{2}\right)\log z - z + \frac{1}{2}\log(2\pi) + O\left(\frac{1}{|z|}\right)$$
as |z| → ∞, uniformly for −π + δ ≤ arg(z) ≤ π − δ. And for any fixed value of x, as y → ±∞,
$$|\Gamma(x+iy)| \sim e^{-\frac{\pi|y|}{2}}\,|y|^{x-\frac{1}{2}}\sqrt{2\pi}.$$
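For real x, `math.lgamma` shows the size of the O(1/|z|) term. The error of the leading Stirling terms behaves like 1/(12x), which is the next term of the full Stirling series (a standard fact not derived in this report). A minimal standard-library sketch:

```python
import math


def stirling_log_gamma(x):
    """Leading terms of Stirling's formula: (x - 1/2) log x - x + (1/2) log(2 pi)."""
    return (x - 0.5) * math.log(x) - x + 0.5 * math.log(2 * math.pi)


for x in (10.0, 100.0, 1000.0):
    error = math.lgamma(x) - stirling_log_gamma(x)
    print(x, error, 12 * x * error)  # the last column approaches 1 as x grows
```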
² The many-valued function (−w)^{z−1} = e^{(z−1)log(−w)} is made definite by taking log(−w) to be real at w = −δ.
1.3 Zeta Function
1.3.1 Euler Zeta Function
For a given real number k > 1, consider the series
$$\sum_{n=1}^{\infty}\frac{1}{n^{k}} = 1 + \frac{1}{2^{k}} + \frac{1}{3^{k}} + \dots$$
Note that n^{−k} decreases steadily as n increases but is always positive, and
$$\sum_{n=1}^{\infty}\frac{1}{n^{k}} < 1 + \lim_{\xi\to\infty}\int_1^{\xi}\frac{dx}{x^{k}}.$$
So the series $\sum_{n=1}^{\infty}n^{-k}$ converges, because
$$\lim_{\xi\to\infty}\int_1^{\xi}\frac{dx}{x^{k}} = \frac{1}{k-1}$$
is finite. If 1 < a < b, then n^{−k} ≤ n^{−a} for a ≤ k ≤ b, so by the Weierstrass M-test the series $\sum_{n=1}^{\infty}n^{-k}$ is uniformly convergent for a ≤ k ≤ b.

Now let s = σ + it be a complex number. Then
$$\zeta(s) = \sum_{n=1}^{\infty}\frac{1}{n^{s}}$$
converges absolutely for σ > 1, since
$$\sum_{n=1}^{\infty}\left|\frac{1}{n^{s}}\right| = \sum_{n=1}^{\infty}\left|e^{-s\log n}\right| = \sum_{n=1}^{\infty}\frac{1}{n^{\sigma}},$$
and it is called the zeta function. Also, the series $\sum_{n=1}^{\infty}n^{-s}$ is uniformly convergent throughout any finite region in which σ ≥ a > 1. Therefore, the function ζ(s) is continuous at all points of the region σ > 1.
Since the series for ζ(s) is absolutely convergent for σ > 1, by Dirichlet multiplication [1, §1.66(xi)] we conclude that
$$(\zeta(s))^{2} = \sum_{a=1}^{\infty}\frac{1}{a^{s}}\sum_{b=1}^{\infty}\frac{1}{b^{s}} = \sum_{c=1}^{\infty}\frac{1}{c^{s}}\sum_{ab=c}1 = \sum_{n=1}^{\infty}\frac{d(n)}{n^{s}}, \qquad \sigma > 1,$$
where d(n) denotes the number of divisors of n. More generally,
$$(\zeta(s))^{k} = \sum_{n=1}^{\infty}\frac{d_k(n)}{n^{s}}, \qquad \sigma > 1,$$
where k = 1, 2, ... and d_k(n) is the number of ways of expressing n as an ordered product of k factors.
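Dirichlet multiplication translates directly into a computation: d_k is obtained from d_{k−1} by convolving with the constant function 1, just as (ζ(s))^k = ζ(s)·(ζ(s))^{k−1}. The sketch below uses only the standard library, and the cut-off N is an arbitrary choice. It builds d_k(n) this way and compares the partial sum of d(n)/n² with ζ(2)² = (π²/6)².

```python
import math


def divisor_functions(N, k):
    """Return a list d with d[n] = d_k(n) for 1 <= n <= N (d[0] is unused).

    d_k(n) counts ordered k-tuples of positive integers whose product is n. It is built by
    repeated Dirichlet convolution with the constant function 1, mirroring
    (zeta(s))^k = zeta(s) * (zeta(s))^(k-1)."""
    d = [0] + [1] * N  # d_1(n) = 1 for every n >= 1
    for _ in range(k - 1):
        convolved = [0] * (N + 1)
        for a in range(1, N + 1):
            for n in range(a, N + 1, a):  # a runs over the divisors of n
                convolved[n] += d[a]
        d = convolved
    return d


N = 20000
d2 = divisor_functions(N, 2)
partial = sum(d2[n] / n ** 2 for n in range(1, N + 1))
print(d2[12], divisor_functions(12, 3)[12])  # 6 18
print(partial, (math.pi ** 2 / 6) ** 2)      # partial sum vs zeta(2)^2
```

The partial sum approaches ζ(2)² from below. Its tail is roughly (log N)/N, because d(n) is on average about log n.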

1.3.2 Euler Product


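The product
$$\prod_{p}\left(1-\frac{1}{p^{s}}\right),$$
where p runs through the primes 2, 3, 5, ..., is uniformly convergent in any finite region throughout which σ > 1. This is because the same is true of the series $\sum_p |p^{-s}|$, which consists of some of the terms of the series $\sum_{n=1}^{\infty}|n^{-s}|$ studied in subsection 1.3.1 [1, §1.44].
The value of the product is 1/ζ(s). Indeed,
$$\left(1-\frac{1}{2^{s}}\right)\zeta(s) = 1 + \frac{1}{3^{s}} + \frac{1}{5^{s}} + \dots,$$
all terms containing the factor 2 being omitted on the right³. Next,
$$\left(1-\frac{1}{2^{s}}\right)\left(1-\frac{1}{3^{s}}\right)\zeta(s) = 1 + \frac{1}{5^{s}} + \frac{1}{7^{s}} + \dots,$$
all terms containing the factors 2 or 3 being omitted. So generally, if p_n is the nth prime,
$$\left(1-\frac{1}{2^{s}}\right)\left(1-\frac{1}{3^{s}}\right)\cdots\left(1-\frac{1}{p_n^{s}}\right)\zeta(s) = 1 + \frac{1}{p_{n+1}^{s}} + \dots,$$
where all numbers containing any of the factors 2, 3, ..., p_n are omitted on the right. Every number from 2 up to p_n contains one of these factors, so all such numbers are omitted, and therefore
$$\left|\left(1-\frac{1}{2^{s}}\right)\left(1-\frac{1}{3^{s}}\right)\cdots\left(1-\frac{1}{p_n^{s}}\right)\zeta(s) - 1\right| \le \frac{1}{(p_n+1)^{\sigma}} + \frac{1}{(p_n+2)^{\sigma}} + \dots,$$
which tends to 0 as p_n → ∞. Hence
$$\lim_{n\to\infty}\left(1-\frac{1}{2^{s}}\right)\left(1-\frac{1}{3^{s}}\right)\cdots\left(1-\frac{1}{p_n^{s}}\right)\zeta(s) = 1,$$
hence proving the result⁴ that
$$\zeta(s) = \prod_{p}\left(1-\frac{1}{p^{s}}\right)^{-1}, \qquad \sigma > 1.$$
This was proved by Leonhard Euler⁵ in 1737. Since a convergent infinite product of non-zero factors is non-zero [4, §1.1], we conclude that:

ζ(s) has no zeros for σ > 1.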
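The Euler product converges, but slowly, since its tail behaves roughly like the sum of p^{−σ} over large primes. Below is a minimal standard-library sketch for s = 2, where ζ(2) = π²/6, truncating at primes p ≤ P. The cut-offs and helper names are illustrative. Every n ≤ P has all its prime factors ≤ P, so the truncated product always dominates the partial sum of n^{−s} up to P.

```python
import math


def primes_up_to(limit):
    """Primes p <= limit via the sieve of Eratosthenes (assumes limit >= 2)."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [p for p, flag in enumerate(is_prime) if flag]


def euler_product(s, P):
    """Truncated Euler product prod_{p <= P} (1 - p^(-s))^(-1) for real s > 1."""
    product = 1.0
    for p in primes_up_to(P):
        product /= 1 - p ** -s
    return product


def partial_zeta(s, P):
    """Partial sum sum_{n <= P} n^(-s)."""
    return sum(n ** -s for n in range(1, P + 1))


for P in (10, 100, 10000):
    print(P, partial_zeta(2, P), euler_product(2, P), math.pi ** 2 / 6)
```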

1.3.3 Analytic Properties of Euler Zeta Function


Since u_n(s) = n^{−s} is analytic inside any region D ⊂ ℂ with σ > 1, and the series $\sum_{n=1}^{\infty}u_n(s) = \sum_{n=1}^{\infty}n^{-s}$ is uniformly convergent throughout every closed region D′ ⊂ D, the function $\zeta(s) = \sum_{n=1}^{\infty}n^{-s}$ is an analytic function, regular⁶ for σ > 1. All its derivatives can be calculated by term-by-term differentiation [1, §2.8]. Therefore, in general,
$$\zeta^{(k)}(s) = (-1)^{k}\sum_{n=2}^{\infty}\frac{(\log n)^{k}}{n^{s}}, \qquad \sigma > 1,\quad k \ge 1.$$
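Term-by-term differentiation can be checked against a finite-difference derivative of the same truncated series. In this sketch both sides use the partial sum up to N, so they should agree within the finite-difference error. It uses only the standard library, and N, the step sizes and the function names are arbitrary choices.

```python
import math


def zeta_partial(s, N):
    """Partial sum of the zeta series, sum_{n=1}^{N} n^(-s), for real s."""
    return sum(n ** -s for n in range(1, N + 1))


def zeta_derivative_partial(s, k, N):
    """Term-by-term k-th derivative of the partial sum: (-1)^k sum_{n=2}^{N} (log n)^k / n^s."""
    return (-1) ** k * sum(math.log(n) ** k / n ** s for n in range(2, N + 1))


s, N = 2.5, 2000
h1, h2 = 1e-5, 1e-3
first_difference = (zeta_partial(s + h1, N) - zeta_partial(s - h1, N)) / (2 * h1)
second_difference = (zeta_partial(s + h2, N) - 2 * zeta_partial(s, N)
                     + zeta_partial(s - h2, N)) / h2 ** 2
print(zeta_derivative_partial(s, 1, N), first_difference)
print(zeta_derivative_partial(s, 2, N), second_difference)
```

The signs alternate with k, as in the formula: on the real axis ζ′(s) < 0 and ζ″(s) > 0 for s > 1.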

³ Notice the resemblance to the sieve of Eratosthenes.
⁴ This identity is an analytic equivalent of the fundamental theorem of arithmetic [3, p. 2].
⁵ “Variae observationes circa series infinitas.” St. Petersburg Acad., 1737. http://eulerarchive.maa.org/pages/E072.html
⁶ A one-valued analytic function is regular at any point which is interior to one of the circles used in continuation from the original element [1, §4.2].
1.3.4 Riemann Zeta Function
Consider the following term-by-term integration over an infinite range:
$$\int_0^{\infty}\frac{x^{s-1}}{e^{x}-1}\,dx = \int_0^{\infty}x^{s-1}\left(\sum_{n=1}^{\infty}e^{-nx}\right)dx = \sum_{n=1}^{\infty}\int_0^{\infty}x^{s-1}e^{-nx}\,dx = \sum_{n=1}^{\infty}n^{-s}\int_0^{\infty}y^{s-1}e^{-y}\,dy = \sum_{n=1}^{\infty}n^{-s}\,\Gamma(s).$$
Then, since $\sum_{n=1}^{\infty}n^{-s}$ is absolutely convergent for σ > 1 (which justifies interchanging the sum and the integral), we have [1, §1.78]
$$\zeta(s) = \frac{1}{\Gamma(s)}\int_0^{\infty}\frac{x^{s-1}}{e^{x}-1}\,dx, \qquad \sigma > 1.$$
We can use this formula to continue ζ(s) across σ = 1, in the same way that we continued Γ(s) across σ = 0; in fact the argument is precisely the same as for the gamma-function [1, §4.43]. Consider the complex integral
$$I(s) = \int_C\frac{w^{s-1}}{e^{w}-1}\,dw,$$
where the contour C starts at infinity on the positive real axis, encircles the origin once in the positive direction, excluding the points ±2iπ, ±4iπ, ..., and returns to positive infinity⁷. Here the many-valued function w^{s−1} = e^{(s−1)log(w)} is made definite by taking log(w) to be real at the beginning of the contour; thus Im(log(w)) varies from 0 to 2π round the contour [4, §2.4]. If s is an integer, the integrand in I(s) is one-valued, and I(s) can be evaluated by the theorem of residues [1, §3.11].
We can take C to consist of the real axis from ∞ to δ (0 < δ < 2π), the circle |w| = δ, and the real axis from δ to ∞.

⁷ The only difference between this contour and the one used for the gamma-function is that C must now exclude all the poles of 1/(e^w − 1) other than w = 0, i.e. the points w = ±2iπ, ±4iπ, ....
We note that
$$|w^{s-1}| = e^{(\sigma-1)\log|w| - t\arg(w)} \le |w|^{\sigma-1}e^{2\pi|t|}, \qquad |e^{w}-1| = \left|w + \frac{w^{2}}{2} + \frac{w^{3}}{6} + \dots\right| > A|w|$$
for some A > 0 when |w| is small. Hence on the circle |w| = δ, for fixed s, we have
$$\left|\int_{|w|=\delta}\frac{w^{s-1}}{e^{w}-1}\,dw\right| \le \frac{\delta^{\sigma-1}e^{2\pi|t|}}{A\delta}\cdot 2\pi\delta \to 0$$
as δ → 0, provided σ > 1. Thus on letting δ → 0, if σ > 1, we get
$$\begin{aligned}
I(s) &= -\int_0^{\infty}\frac{x^{s-1}}{e^{x}-1}\,dx + \int_0^{\infty}\frac{\left(xe^{2\pi i}\right)^{s-1}}{e^{x}-1}\,dx\\
&= -\zeta(s)\Gamma(s) + e^{2\pi i(s-1)}\zeta(s)\Gamma(s) && \text{(using the integral formula for } \zeta(s) \text{ above)}\\
&= \left(e^{2\pi i(s-1)} - 1\right)\zeta(s)\Gamma(s)\\
&= \left(e^{2\pi i(s-1)} - 1\right)\zeta(s)\,\frac{\pi}{\sin(\pi s)\,\Gamma(1-s)} && \text{(using Euler's reflection formula)}\\
&= \frac{e^{2\pi is} - e^{2\pi i}}{e^{2\pi i}}\cdot\frac{\zeta(s)}{\Gamma(1-s)}\cdot\frac{2\pi i}{e^{i\pi s} - e^{-i\pi s}}\\
&= \frac{2\pi i\,e^{i\pi s}}{\Gamma(1-s)}\,\zeta(s) && \text{(using } e^{2\pi i} = 1\text{)}.
\end{aligned}$$
Hence we get
$$\zeta(s) = \frac{e^{-i\pi s}\,\Gamma(1-s)}{2\pi i}\int_C\frac{w^{s-1}}{e^{w}-1}\,dw.$$
This formula has been proved for σ > 1. The integral I(s), however, is uniformly convergent in any finite region of the complex plane, and so defines an integral function of s. Hence the formula provides the analytic continuation of ζ(s) over the whole complex plane. The only possible singularities are the poles of Γ(1 − s), i.e. s = 1, 2, 3, 4, .... From subsection 1.3.3 we know that ζ(s) is regular at s = 2, 3, ..., and in fact it follows at once from Cauchy’s theorem [1, §2.33] that I(s) vanishes at these points. Hence the only possible singularity is a simple pole at s = 1. For s = 1 the contour integral is
$$I(1) = \int_C\frac{dw}{e^{w}-1} = 2\pi i$$
by the residue theorem [1, §3.11], and Γ(1 − s) has a pole at s = 1 (as seen in section 1.2). Hence s = 1 is a pole, and it is the only one; the above formula is therefore the analytic continuation of ζ(s) to ℂ \ {1}.
To deduce the functional equation from the analytic continuation formula, take the integral of $\frac{w^{s-1}}{e^{w}-1}$ along the contour C_n, n = 1, 2, 3, .... This contour consists of the positive real axis from infinity to (2n + 1)π, then the square with corners (2n + 1)π(±1 ± i), and then the positive real axis back to infinity. Between the contours C and C_n the integrand has poles at the points ±2iπ, ..., ±2inπ.
The residues at 2mπi and −2mπi, taken together⁸, are
$$\lim_{w\to 2m\pi i}\frac{(w-2m\pi i)\,w^{s-1}}{e^{w}-1} + \lim_{w\to -2m\pi i}\frac{(w+2m\pi i)\,w^{s-1}}{e^{w}-1} = \left(2m\pi e^{i\pi/2}\right)^{s-1} + \left(2m\pi e^{3i\pi/2}\right)^{s-1}$$
$$= (2m\pi)^{s-1}e^{i\pi(s-1)}\cdot 2\cos\left(\frac{\pi(s-1)}{2}\right) = -2\,(2m\pi)^{s-1}e^{i\pi s}\sin\left(\frac{\pi s}{2}\right).$$
⁸ One can use L’Hôpital’s rule to evaluate these limits. This rule is a local statement, i.e. it concerns the behaviour of a function (single-valued or multi-valued) near a particular point, and not global issues (like branch cuts).

[Figure: the contour C_n, running along the positive real axis to (2n + 1)π and round the square with corners (2n + 1)π(±1 ± i), which encloses the poles ±2πi, ..., ±2nπi.]

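Hence by the theorem of residues [1, §3.11]
$$\int_{C_n}\frac{w^{s-1}}{e^{w}-1}\,dw = \int_{C}\frac{w^{s-1}}{e^{w}-1}\,dw + 2\pi i\sum_{m=1}^{n}\left(-2e^{i\pi s}\sin\left(\frac{\pi s}{2}\right)(2m\pi)^{s-1}\right).$$
Now let σ < 0 and make n → ∞. The function 1/(e^w − 1) is bounded on the contours C_n, and w^{s−1} = O(|w|^{σ−1}), while the length of the square is O(n). Hence the integral round C_n tends to zero, and we obtain
$$I(s) = 4\pi i\,e^{i\pi s}\sin\left(\frac{\pi s}{2}\right)\sum_{m=1}^{\infty}(2m\pi)^{s-1} = 4\pi i\,e^{i\pi s}\sin\left(\frac{\pi s}{2}\right)(2\pi)^{s-1}\zeta(1-s).$$
Now using the analytic continuation formula, we deduce the desired functional equation, first for σ < 0 and then, by analytic continuation, as an identity between meromorphic functions on the whole plane:
$$\zeta(s) = 2^{s}\pi^{s-1}\sin\left(\frac{\pi s}{2}\right)\Gamma(1-s)\,\zeta(1-s).$$
This was proved by Bernhard Riemann⁹ in 1859 [4, §2.4].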
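The functional equation gives ζ(s) for negative real s in terms of values with Re(1 − s) > 1, where the series converges. To evaluate ζ(1 − s), the sketch below assumes a simple Euler-Maclaurin tail correction, a standard technique not discussed in this report. It uses only the standard library. It recovers ζ(−1) = −1/12, ζ(−3) = 1/120, ζ(−5) = −1/252, and the trivial zeros at s = −2 and s = −4.

```python
import math


def zeta_real(s, N=100):
    """zeta(s) for real s > 1: sum_{n<N} n^(-s) plus Euler-Maclaurin tail terms for sum_{n>=N}."""
    head = sum(n ** -s for n in range(1, N))
    tail = N ** (1 - s) / (s - 1) + N ** -s / 2 + s * N ** (-s - 1) / 12
    return head + tail


def zeta_negative(s):
    """zeta(s) for real s < 0 via zeta(s) = 2^s pi^(s-1) sin(pi s / 2) Gamma(1 - s) zeta(1 - s)."""
    return (2 ** s * math.pi ** (s - 1) * math.sin(math.pi * s / 2)
            * math.gamma(1 - s) * zeta_real(1 - s))


for s in (-1, -2, -3, -4, -5):
    print(s, zeta_negative(s))
```

At the even negative integers the factor sin(πs/2) vanishes. This is how the trivial zeros appear in the functional equation.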

1.3.5 Zeros of Riemann Zeta Function


Using Euler’s reflection formula and Legendre’s duplication formula we can rewrite the functional equation as
$$\zeta(s) = \pi^{s-\frac{1}{2}}\,\frac{\Gamma\left(\frac{1-s}{2}\right)}{\Gamma\left(\frac{s}{2}\right)}\,\zeta(1-s).$$
We can rewrite this as
$$\pi^{-\frac{s}{2}}\,\Gamma\left(\frac{s}{2}\right)\zeta(s) = \pi^{-\frac{1-s}{2}}\,\Gamma\left(\frac{1-s}{2}\right)\zeta(1-s),$$
which says that the function on the left is unchanged when s is replaced by 1 − s, i.e. it is an even function of s − 1/2 [3, §8]. The functional equation allows the properties of ζ(s) for σ < 0 to be inferred from its properties for σ > 1. In particular, for σ < 0 the right-hand side is finite and non-zero: ζ(1 − s) has no zeros because Re(1 − s) > 1, and the gamma-function has no zeros. So the only zeros of ζ(s) with σ < 0 are at the poles of Γ(s/2). (The pole of Γ(s/2) at s = 0 does not give a zero of ζ(s), since there it is matched by the simple pole of ζ(1 − s).) So the points s = −2, −4, −6, ... are the zeros of ζ(s) in σ < 0; they are called the trivial zeros.
⁹ “Ueber die Anzahl der Primzahlen unter einer gegebenen Grösse.” Monatsberichte der Berliner Akademie, 1859. http://www.maths.tcd.ie/pub/HistMath/People/Riemann/Zeta/

Figure 1.1: Plot of the Riemann zeta function for real values of s, i.e. when t = 0, illustrating the trivial zeros. Plotted using plot(zeta(x), (x,-13,-1), rgbcolor=(0,0,0), legend_label='$\zeta(x)$', thickness=2) in SageMath 7.5.1.

The remainder of the plane, where 0 ≤ σ ≤ 1, is called the critical strip. To analyse the zeta function in this strip, we will consider the function ξ(s) defined by
$$\xi(s) = \frac{1}{2}s(s-1)\pi^{-\frac{s}{2}}\,\Gamma\left(\frac{s}{2}\right)\zeta(s). \tag{1.10}$$
This is an entire function. It has no pole for σ ≥ 1/2, because the pole of ζ(s) at s = 1 is cancelled by the factor s − 1 and Γ(s/2) has no poles there. It is also an even function of s − 1/2. Note that ξ(s) = ξ(1 − s), hence it will suffice to prove statements about ξ(s) just for σ ≥ 1/2. Observe that
$$\left|\frac{1}{2}s(s-1)\pi^{-\frac{s}{2}}\right| < \exp(C_1|s|) \tag{1.11}$$
for some constant C_1, and by Stirling’s formula (subsection 1.2.2)
$$\left|\Gamma\left(\frac{s}{2}\right)\right| < \exp(C_2|s|\log|s|) \tag{1.12}$$
for some constant C_2, where −π/2 < arg(s) < π/2. And using Abel’s partial summation (Appendix A) we can write, for σ > 1,
$$\sum_{n\le x}\frac{1}{n^{s}} = \frac{\lfloor x\rfloor}{x^{s}} + s\int_1^{x}\frac{\lfloor t\rfloor}{t^{s+1}}\,dt,$$
and by taking x → ∞ we get
$$\sum_{n=1}^{\infty}\frac{1}{n^{s}} = s\int_1^{\infty}\frac{\lfloor t\rfloor}{t^{s+1}}\,dt.$$
Therefore we can rewrite ζ(s) for σ > 1 as
$$\zeta(s) = s\int_1^{\infty}\frac{\lfloor x\rfloor}{x^{s+1}}\,dx = s\int_1^{\infty}\frac{x-\{x\}}{x^{s+1}}\,dx = \frac{s}{s-1} - s\int_1^{\infty}\frac{\{x\}}{x^{s+1}}\,dx. \tag{1.13}$$
Since the integral on the right is absolutely convergent for σ > 0, we get the analytic continuation¹⁰ of ζ(s) to σ > 0. Moreover, since 0 ≤ {x} < 1, the last integral is at most 1/σ in modulus. Hence for σ ≥ 1/2 we have
$$|\zeta(s)| < C_3|s| \tag{1.14}$$
when |s| is large. Combining (1.11), (1.12) and (1.14) we get
$$|\xi(s)| < \exp\left(C|s|\log|s|\right)$$
as |s| → ∞, for some constant C. Hence ξ(s) is of order at most 1. Moreover, as s → +∞ through real values, the above inequality is best possible (apart from the value of C), since log Γ(s) ∼ s log s and ζ(s) → 1. Hence, in fact, ξ(s) is of order 1.
Consequently, ξ(s) does not satisfyP the more precise inequality indicated in subsection 1.1.3,
for entire functions of order 1. Hence rn−1 diverges, i.e. ξ(s) has infinitely many zeros.
The zeros of ξ(s) are the non-trivial zeros of ζ(s), for in (1.10) the trivial zeros of ζ(s) are
cancelled by the poles of Γ( 2s ) (subsection 1.2.1), and 2s Γ( 2s ) has no zeros, and the zero of s − 1
is cancelled by the pole of ζ(s) (subsection 1.3.4). Then, recalling that ζ(s) has no zero for
σ > 1 (subsection 1.3.2), we get

ζ(s) has infinitely many non-trivial zeros in the critical strip 0 ≤ σ ≤ 1.

This was proved by Jacques Hadamard11 in 1893 [3, §12]. We can analyse the function ξ(s) to
write its product formula explicitly as shown in subsection 1.1.3:
Y s

ξ(s) = eA+Bs 1− es/ρ (1.15)
ρ
ρ

where ρ is a zero of ξ(s). Eliminating ξ 0 (s)/ξ(s) from the logarithmic differentiation of (1.10)
and (1.15) we get:

log(π) 1 Γ0 s

ζ 0 (s) X 1 
1 2 + 1 1
=B− + − s + +
ζ(s) s−1 2 2Γ 2 +1 ρ
s−ρ ρ

This exhibits the pole of ζ(s) at s = 1 and the non-trivial zeros at s = ρ.


Moreover, Riemann conjectured that any non-trivial zero of ζ(s) has σ = 1/2 (known as
Riemann Hypothesis). This can be illustrated by the following figure [11][6, Chapter 8]
10
So unlike previous proof of analytic continuation for all s ∈ C, it’s really easy to define ζ(s) as a meromorphic
function for σ > 0 with only pole at s = 1.
11
“Etude sur les propriétés des fonctions entières et en particulier d’une fonction considérée par Riemann.”
Journal de Mathématiques Pures et Appliquées (1893): 171-216. http://eudml.org/doc/234668

16
Figure 1.2: A plot of the real and imaginary parts of the Riemann zeta function ζ(1/2 + it) for
0 < t < 30. Plotted using i = CDF.gen(); v = [zeta(0.5 + n/10 * i) for n in range(300)]; L = [(z.real(), z.imag()) for z in v]; line(L,
rgbcolor=(0,0,0), thickness=2) in SageMath 7.5.1.

1.4 von Mangoldt Function


We define the von Mangoldt function, Λ(n), as:
(
log(p) if n = pk for some prime p and integer k ≥ 1,
Λ(n) =
0 otherwise
P
Hence we have the identity: m|n Λ(m) = log(n).
Furthermore, from the Euler product we obtain that
 
X 1
log(ζ(s)) = − log 1 − s , σ > 1 (1.16)
p
p

And for the principal branch of the complex logarithm we have



X zn
− log(1 − z) =
n
n=1

where |z| ≤ 1, z 6= 1. Hence we can re-write (1.16) as


XX 1 XX 1
log(ζ(s)) = ms
= e−ms log(p) , σ>1
p m
mp p m
m

where p runs through all primes, and m through all positive integers. On differentiating both
sides (since analytic), we get
ζ 0 (s) X X log(p)
=− , σ>1
ζ(s) p m
pms

Which can be re-written as



ζ 0 (s) X Λ(n)
− = , σ>1
ζ(s) ns
n=1

and gives us the generating function for Λ(n) [7, §17.7].

17
1.5 Prime Counting Function
We define π(x) to be the number of primes which do note exceed x, i.e.
X
π(x) = 1
p≤x

so that π(1) = 0, π(2) = 1, and so on. If pn is the nth prime, then π(pn ) = n. Therefore, π(x)
as a function of x and pn as a function of n, are inverse functions [7, §1.5]. Therefore, to ask
for an exact formula for π(x) is same as to ask for an exact formula for pn . No such formula is
known.
The prime number theorem, states that
x
π(x) ∼
log(x)
which is equivalent to
pn ∼ n log(n)
because log(log(x)) = o(log(x)) [7, §1.8].
Since there are infinitely many primes, π(x) → ∞ as x → ∞. Moreover, we can use the
arguments used for proving existence of infinitely many primes to get a lower bound for π(x)
and upper bound for pn . Following are the two bounds [7, §2.2, 2.6]:
n
Bound 1. pn < 22 for any positive integer n and π(x) ≥ log(log(x)) for all x ≥ 2
n
Proof. Using induction, we can prove that pn < 22 for any positive integer n. Also,
n−1 n n
we observe that for n ≥ 4, en−1 > 2n i.e. ee > e2 > 22 (since ex is an increasing
function).
h 3
i  n−1 n i
Now we divide the interval x ≥ 2 into smaller disjoint intervals 2, ee and ee , ee
 n−1 n i h 3
i
for all n ≥ 4. It’s enough to prove for x ∈ ee , ee for all n ≥ 4, since for x ∈ 2, ee
the inequality follows from the fact that log(log(x)) ∈ (−0.4, 3] and log(x) is an increasing
function. i
n−1 n n
Let x ∈ ee , ee where n ≥ 4. Since π(x) is a non-decreasing function, pn < 22 <
n−1
ee < x implies that π(pn ) ≤ π(x) i.e. n ≤ π(x) for all n ≥ 4. Since log(x) is
n
an increasing function, x < ee implies that log(log(x)) ≤ n. Hence we conclude that
log(log(x)) ≤ n ≤ π(x) for all n ≥ 4. Thus completing the proof.

log(x)
Bound 2. π(x) ≥ for all x ≥ 1 and pn ≤ 4n for any positive integer n.
2 log(2)

Proof. Suppose that 2, 3, 5, . . . , pj are the first j primes and let N (x) be the number of
positive integers, n, not exceeding x which are not divisible by any prime p > pj . If we
express such an n in the form n = n21 m, where m is a square free integer i.e. not divisible
b
by the square of any prime. Hence m = 2b1 3b2 · · · pjj , with bi is either 0 or 1. There are
just 2j possible choices of the exponents and so not more than 2j different values of m.
√ √ √
Again, n1 ≤ n ≤ x and so there are not more than x different values of n1 . Hence

N (x) ≤ 2j x
Now we take, j = π(x), so that pj+1 > x and N (x) = x. Hence we have
√ log(x)
N (x) = x ≤ 2π(x) x =⇒ ≤ π(x)
2 log(2)
If we put x = pn , so that π(x) = n, we get pn ≤ 4n .

18
The importance of ζ(s) in the theory of prime numbers lies in the fact that it combines two
expressions, one of which contains the primes explicitly, while the other does not. The theory
of primes is largely concerned with the function π(x), the number of primes not exceeding x.
We can transform the Euler product into a relation between ζ(s) and π(x) [4, §1.1]. For σ > 1,

 
X 1
log (ζ(s)) = − log 1 − s
p
p
∞  
X 1
=− (π(n) − π(n − 1)) log 1 − s
n
n=2
∞     
X 1 1
=− π(n) log 1 − s − log 1 −
n (n + 1)s
n=2
∞ Z n+1   
X d 1
= π(n) log 1 − s dx
n dx x
n=2
∞ Z n+1
X s
= π(n) s
dx
n x(x − 1)
n=2
Z ∞
π(x)
=s s − 1)
dx
2 x(x

The rearrangement of the series is justified since log 1 − n1s = O(n−σ ) and π(n) ≤ n. Therefore


we have Z ∞
π(x)
log (ζ(s)) = s s − 1)
dx
2 x(x

1.6 Chebyshev Functions


The first Chebyshev function ϑ(x) is given by
 
X Y
ϑ(x) = log(p) = log  p
p≤x p≤x

and the second Chebyshev function ψ(x) is given by


X X
ψ(x) = log(p) = Λ(n)
pk ≤x n≤x

Therefore, if pm is the highest power of p not exceeding x log(p) occurs m times in ψ(x). Also
pm is the highest power of p which divides any number upto x, so that

ψ(x) = log (U (x))

where U (x) is the least common multiple of all numbers upto x. We can also express φ(x) in
the form
X  log(x) 
ψ(x) = log(p)
log(p)
p≤x

The definition of ϑ(x) and ψ(x) are more complicated than that of π(x), but they are in
reality more ‘natural’ functions [7, §22.1]. The function ψ(x) can be thought of as a novel way
of counting primes. Instead of adding 1 every time a prime occurs, it gives a greater weight to
the primes and their powers, namely the weight log(p). With such weights the count of primes

19
becomes almost equal to the upper bound of the interval in which they are counted. Hence the
graph of ψ(x) is almost a straight line (with jump discontinuities) that makes a 45° angle with
horizontal axis, expecially for large numbers x [13].
1 1
Since p2 ≤ x, p3 ≤ x, . . . are equivalent to p ≤ x 2 , p ≤ x 3 , . . ., we have
 1  1
ψ(x) = ϑ(x) + ϑ x 2 + ϑ x 3 + . . .

1
The series breaks off when x m < 2, i.e. when

log(x)
m>
log(2)

And from the definition of ϑ(x), we can conclude that ϑ(x) < x log(x) for all x ≥ 2, and in fact
 1 1 1
ϑ x m < x m log(x) ≤ x 2 log(x)

for m ≥ 2. Therefore X  1  1 
ϑ x m = O x 2 (log(x))2
m≥2

since there are only O(log(x)) terms in the series. Hence proving that
 1 
ψ(x) = ϑ(x) + O x 2 (log(x))2 (1.17)

Next we note, without proof, that ϑ(x) is of the order x [7, §22.2],

ϑ(x)  x and ψ(x)  x for x ≥ 2 (1.18)

where f (x)  φ(x) means Aφ(x) < f (x) < Bφ(x) for some positive constants A and B inde-
pendent of x, i.e. f is of the same order of magnitude as φ [7, §1.6]. Hence using (1.17) we can
conclude that ψ(x) is about the same as ϑ(x) when x is large.
Now we can write
X X
ϑ(x) = log(p) ≤ log(x) 1 = π(x) log(x)
p≤x p≤x

and so by (1.18)
ϑ(x) Ax
π(x) ≥ > (1.19)
log(x) log(x)
for some positive constant A independent of x. On the other hand, if 0 < δ < 1,
X
ϑ(x) ≥ log(p)
x1−δ <p≤x
X
≥ (1 − δ) log(x) 1
x1−δ<p≤x
  
= (1 − δ) log(x) π(x) − π x1−δ
 
≥ (1 − δ) log(x) π(x) − x1−δ

and so
ϑ(x) Bx
π(x) ≤ x1−δ + < (1.20)
(1 − δ) log(x) log(x)

20
for some positive constant B independent of x. By (1.19) and (1.20) we conclude that [7, §1.8]
x
π(x) 
log(x)

Also, from (1.19) and (1.20) it follows that


π(x) log(x) x1−δ log(x) 1
1≤ ≤ +
ϑ(x) ϑ(x) 1−δ
For any ε > 0, we can choose δ = δ(ε) so that
1 ε
<1+
1−δ 2
and then choose x0 = x0 (δ, ε) = x0 (ε) so that
x1−δ log(x) C log(x) ε
< δ
<
ϑ(x) x 2
for all x > x0 and some positive constant C independent of x. Hence
π(x) log(x)
1≤ <1+ε
ϑ(x)
for all x > x0 . Since ε is arbitrary, it follows that
ϑ(x)
π(x) ∼
log(x)
Then by (1.17) and (1.18) it follows that

ϑ(x) ψ(x)
π(x) ∼ ∼
log(x) log(x)

1.7 The Proof


We will prove the following version of prime number theorem12
 √ 
π(x) = Li(x) + O xe−c log(x)
R x dt
where Li(x) = 2 log(t) and c is some positive constant, following the method used by Charles
Jean de la Vallée Poussin (1896, 1899) [3, §18].
Following is Terence Tao’s very informal sketch of the proof [20]:
1. Create a “sound wave” (von Mangoldt function) which is noisy at prime number times,
and quiet at other times.
2. “Listen” (Mellin transform) to this wave and record the notes that you hear (the zeros of
the Riemann zeta function, or the music of the primes). Each such note corresponds to a
hidden pattern in the distribution of the primes.
3. Show that certain types of notes do not appear in this music.
4. From this (and tools such as Fourier analysis) one can prove the prime number theorem.
We can illustrate the statement of prime number theorem by the following graph:
 1/10

12
A weaker result, where the error term in O xe−c(log(x)) , can be proved without much machinery from
complex analysis (like the theory of entire functions of finite order, Hadamard products, etc.). That proof was
given by Edmund Landau (1903, 1912) and proceeds directly to the function π(x) but involves a nonabsolutely
convergent integral. For that proof one may refer to the book by Ayoub [2, pp. 47–72]

21
x
Figure 1.3: The dashed, solid and dotted curves represent Li(x), π(x) and log(x) respectively for 2 ≤
x ≤ 10000. Plotted using P = plot(Li(x), (x,2,10000), linestyle="--", thickness=2, rgbcolor=(0,0,0)); Q = plot(prime pi(x), (x,2,10000),
thickness=2, rgbcolor=(0,0,0)); R =plot(x/log(x), (x,2,10000), linestyle=":", thickness=3, rgbcolor=(0,0,0)); (P+Q+R) in SageMath 7.5.1.

1.7.1 Zero-Free Region for ζ(s)


The vital step in the proof of prime number theorem is to show the existence of a thin zero-free
region for ζ(s) to the left of σ = 1. We will work with the function ζ 0 (s)/ζ(s), since its analytic
continuation is easy and has poles only at the zeros of ζ(s) for σ > 0. From section 1.4 we know
that

ζ 0 (s) X Λ(n)
− = , σ>1
ζ(s) ns
n=1

Therefore, we have

ζ 0 (s)
  X Λ(n)
− Re = cos(t log(n)), σ>1 (1.21)
ζ(s) nσ
n=1

since Re (n−s ) = Re e −(σ+it) log(n)



.
Now to proceed, we need to use following observation by Franz Mertens13 for all θ ∈ R,

2(1 + cos(θ))2 ≥ 0
⇒2 + 2 cos2 (θ) + 4 cos(θ) ≥ 0
⇒2 + cos(2θ) + 1 + 4 cos(θ) ≥ 0
⇒3 + 4 cos(θ) + cos(2θ) ≥ 0

Applied to (1.21) with t replaced14 by 0, t, 2t in succession, it gives


 0    0    0 
ζ (σ) ζ (σ + it) ζ (σ + 2it)
3 − + 4 − Re + − Re ≥0 (1.22)
ζ(σ) ζ(σ + it) ζ(σ + 2it)
13
According to Davenport [3, pp. 84], the usage of this trick was discovered in 1898, but I couldn’t find that
article. A less known fact is that the symbol µ for Möbius function was introduce by Mertens in 1874 [6, pp.
382].
14
Note that the real parts are functions of cos(t) etc.

22
The behaviour of −ζ 0 (σ)/ζ(σ) as σ → 1, in view of the simple pole of ζ(s) at s = 1, we have
ζ 0 (σ) 1
− < + A1 (1.23)
ζ(σ) σ−1
for 1 < σ ≤ 2, where A1 denotes a positive absolute constant.
The behaviour of the other two functions near σ = 1 is influenced by any zero that ζ(s)
has to the left of σ = 1, at a height near to t or 2t. Considered the following partial fraction
formula shown at the end of subsection 1.3.5
log(π) 1 Γ0 2s + 1

ζ 0 (s) X 1 
1 1
− = −B− +  − + (1.24)
ζ(s) s−1 2 2 Γ 2s + 1 ρ
s−ρ ρ

Using Stirling’s formula (subsection 1.2.2) we know that


!
 s   s + 1  s  s  1 1
log Γ +1 = log +1 − + 1 + log(2π) + O s
2 2 2 2 2 + 1
2

Therefore, on differentiating it we get


!
Γ0 2s + 1

1 s  s+1 1 1
s
 = log + 1 + − +O s < C log(s)
Γ 2 +1 2 2 2(s + 2) 2 + 1
2

for some positive constant C. Now if we bound σ such that 1 ≤ σ ≤ 2 but keep t unbounded
such that 2 ≤ t, then we can replace C log(s) by some C 0 log(t). Hence for some positive
constant A2 we have
!
1 log(π) 1 Γ0 2s + 1
Re −B− +  < A2 log(t)
s−1 2 2 Γ 2s + 1

if 1 ≤ σ ≤ 2 and 2 ≤ t. Hence, in this region using (1.24) we get


 0   
ζ (s) X 1 1
− Re < A2 log(t) − Re + (1.25)
ζ(s) ρ
s−ρ ρ

The sum over ρ is positive since15


   
1 σ−β 1 β
Re = and Re = 2 (1.26)
s−ρ |s − ρ|2 ρ |ρ|
where s = σ + it and ρ = β + i`.
We obtain a valid inequality when s = σ + 2it by just omitting the sum:
 0 
ζ (σ + 2it)
− Re < A2 log(t) (1.27)
ζ(σ + 2it)
For the case when s = σ + it, we choose t to coincide with the ordinate ` of the zero ρ = β + i`,
with ` ≥ 2 and just the one term 1/(s − ρ) in the sum which corresponds to this zero:
 0 
ζ (σ + it) 1
− Re < A2 log(t) − (1.28)
ζ(σ + it) σ−β
Substituting (1.23), (1.27) and (1.28) in (1.22) we obtain:
   
1 1
3 + A1 + 4 A2 log(t) − + A2 log(t) > 0
σ−1 σ−β
15
Note that 2 Re(z) = z + z̄

23
4 3
⇒ < + A3 log(t)
σ−β σ−1
Take σ = 1 + δ/ log(t), where δ is a positive constant. Then

δ 4δ
β <1+ −
log(t) (3 + A3 δ) log(t)

and if δ is suitably chosen in relation to A3 , this gives


c
β <1−
log(t)

where c is a positive constant to which a numerical value could be assigned. Thus we have:

There exists a positive constant numerical constant c such that ζ(s) has no zero in
the region
c
σ ≥1− , t≥2
log(t)

Hence, ζ(s) 6= 0 on σ = 1. This was proved independently by Hadamard16 and de la Vallée


Poussin17 in 1896 [3, §13].

1.7.2 Counting the Zeros of ζ(s)


Consider a rectangle 0 < σ < 1, 0 < t < T , and we want to find an approximate formula for
N (T ), the number of zeros of ζ(s) in this rectangle. But, it is convenient to work initially with
ξ(s) rather than with ζ(s) because it has a simple functional equation and is an entire function
(subsection 1.3.5), hence it’s an analyticEfunction.
K F

1
−1 + iT iT 2 + iT 2 + iT

−1 O 1 2

16
“Sur la distribution des zéros de la fonction ζ(s) et ses conséquences arithmétiques.” Bulletin de la Société
H J G
Mathématique de France 24 (1896): 199–220. http://eudml.org/doc/85858
17
“Recherches analytiques la théorie des nombres premiers.” Ann. Soc. scient. Bruxelles 20 (1896): 183–256.
https://archive.org/details/recherchesanaly00pousgoog

24
If we assume (for simplicity) that T (which we suppose to be large) does not coincide with
the ordinate of a zero of ξ(s) and −1 < σ < 2, then using Cauchy’s argument principle [1, §3.41]
we can say that
1
N (T ) = ∆R arg (ξ(s)) (1.29)

where ∆R denotes the variation of arg (ξ(s)) round the contour R, which is a rectangle with
vertices 2, 2 + iT , −1 + iT , −1, described in the positive sense.
There is no change in arg(ξ(s)) as s describes the base of the base of the rectangle, since
ξ(s) is then real and nowhere 0. Further, the change in arg(ξ(s)) as s moves from 21 + iT to
−1 + iT and then to −1 is equal to the change as s moves from 2 to 2 + iT and then to 21 + iT ,
since
ξ(σ + it) = ξ(1 − σ − it) = ξ(1 − σ + it)
Hence, we can rewrite (1.29) as
 
1 1
N (T ) = 2 ∆L arg (ξ(s)) = ∆L arg (ξ(s)) (1.30)
2π π
1
where L consists of the edges of rectangle from 2 to 2 + iT and then to 2 + iT .
The definition of ξ(x), from (1.10) can be written as
s
s 
ξ(s) = (s − 1)π − 2 Γ + 1 ζ(s)
2
Hence we have
 s  s 
∆L arg(ξ(s)) = ∆L arg(s − 1) + ∆L arg π − 2 + ∆L arg Γ + 1 + ∆L arg (ζ(s)) (1.31)
2
We note that by using the facts that
 
1 π
arctan(x) + arctan = , x>0
x 2

and

X (−1)n 2n+1
arctan(x) = − x , |x| ≤ 1
2n + 1
n=0
we get  
π 1 π 1 1 1
arctan(x) = − arctan = − + 3 − 5 + ...
2 x 2 x 3x 5x
for large values of x > 0. Hence we have
 
1
∆L arg(s − 1) = arg + iT − 1 − arg(2 − 1)
2
 
1
= arg − + iT
2
= π − arctan(2T )
(1.32)
 
π 1
= + arctan
2 2T
π 1 1 1
= + − 3
+ − ...
2 2T 3(2T ) 5(2T )5
 
π 1
= +O
2 T

25
We know that ab = eb log(a) hence using the definition of argument we get
 s   s 
∆L arg π − 2 = ∆L arg exp − log(π)
  2 
1 + i2T
= arg exp − log(π) − arg (exp (− log(π))) (1.33)
4
T
= − log(π)
2
Using the fact that log(f (z)) = log |f (z)| + i arg(f (z)) and Stirling’s formula (subsection 1.2.2),
we get
 s    s 
∆L arg Γ + 1 = ∆L Im log Γ +1
2    2 
5 T
= Im log Γ +i − Im (log (Γ (2)))
4 2
     
3 T 5 T 5 T 1 1
= Im +i log +i − − i + log(2π) + O
4 2 4 2 4 2 2 T
   
3 5 T T 5 T T 1
= arg +i + log + i − + O
4 4 2 2 4 2 2 T

2
   
3 2T T 25 + 4T T
1
= arctan + log − +O
4 5 2 4 2 T
      
3 π 5 T T T 1
= − arctan + log − +O
4 2 2T 2 2 2 T
   
3π T T T 1
= + log − +O
8 2 2 2 T
(1.34)

Now to find ∆L arg(ζ(s)), we will make use of (1.25) from the previous section, for 1 ≤ σ ≤ 2
and 2 ≤ t:  0   
ζ (s) X 1 1
− Re < A log(t) − Re +
ζ(s) ρ
s−ρ ρ

where A is some positive constant. In this formula we take s = 2 + iT . Since |ζ 0 (s)/ζ(s)| is


bounded for such s, we obtain
 
X 1 1
Re + < A log(T )
ρ
s−ρ ρ

As seen earlier in (1.26), all the terms in both series are positive, and since
 
1 2−β 1
Re = ≥
s−ρ (2 − β)2 + (T − `)2 4 + (T − `)2
we conclude that if ρ = β + i` runs through the non-trivial zeros of ζ(s), then for large T
X 1
= O(log(T ))
ρ
1 + (T − `)2

From this it follows that


(a) the number of zeros with T − 1 < ` < T + 1 is O(log(T ))

(b) the sum (T − `)−2 extended over the zeros with ` outside the interval T − 1 < ` < T + 1
P
is also O(log(T ))

26
Also by (1.24), applied at s and 2 + it, and subtracted, we get

ζ 0 (s) X 1 
1
− = O(log(t)) + −
ζ(s) ρ
s − ρ 2 + it − ρ

We can split the summation over ρ in two parts, namely when |t − `| < 1 and when |t − `| ≥ 1.
For the terms with |t − `| ≥ 1, we have

1 1 2−σ 3
s − ρ 2 + it − ρ = |(s − ρ)(2 + it − rho)| ≤ |` − t|2

and sum of these is of O(log(t)) by the conclusion (b) above. As for the terms with |` − t| < 1,
we have |2 + it − ρ| ≥ 1, and the number of terms is O(log(t)) by the conclusion (a) above.
Hence we deduce that for large t (not coinciding with the ordinate of a zero) and −1 ≤ σ ≤ 2

ζ 0 (s) X 1
= + O(log(t)) (1.35)
ζ(s) 0
s − ρ0
ρ

where ρ0 = β + i` are those non-trivial zeros of ζ(s) for which |t − `| < 1.


From this fact we can find ∆L arg(ζ(s)), as
  
1
∆L arg(ζ(s)) = arg ζ + iT − arg(ζ(2))
2
   
1
= Im log ζ + iT
2
Z 2+iT  0 
ζ (s)
= O(1) − Im ds
1
+iT ζ(s)
2
 
Z 2+iT X 1
= O(1) − Im  + O(log(t)) ds
1
+iT s − ρ0
2 ρ0
X Z 2+iT 
1

= O(1) − Im ds + O(log(T ))
1
+iT s − ρ0
ρ0 2
X
= O(1) − ∆L0 arg(s − ρ) + O(log(T ))
ρ0

where O(1) term is from the variation along σ = 2 and L0 is the line joining 1
P 2 + iT to 2 + iT .
Since |∆L0 arg(s − ρ)| is at most π and the number of terms in the sum ρ0 ∆L0 arg(s − ρ) is
O(log(T )) by the conclusion (a) above, we get

∆L arg(ζ(s)) = O(1) + O(log(T )) + O(log(T ))


(1.36)
= O(log(T ))

Now using (1.32), (1.33), (1.34) and (1.36) in (1.31) we get


 
T T T 7π
∆L arg(ξ(s)) = log − + + O(log(T ))
2 2π 2 8

And using this in (1.30) we finally get


 
T T T 7
N (T ) = log − + + O(log(T ))
2π 2π 2π 8

27
This was proved by von Mangoldt, first in 1895 with a slightly less good error term and then
fully in 1905. It would seem at first sight that we might as well omit the term 7/8; but as we
shall see later, it has a certain significance [3, §15].
From the following plot for T = 50 we can see the number of zeros of the Riemann zeta
function along the critical line18 is 10, and N (50) ≈ 9.4.

Figure 1.4: The dashed and solid curves represent the plot of arg(ζ(s)) and |ζ(s)| respectively, for
σ = 1/2 and 1 ≤ t ≤ 50. Plotted using i = CDF.0; p1 = plot(lambda t: arg(zeta(0.5+t*i)), 1, 50,linestyle="--", thickness=2,
rgbcolor=(0,0,0)); p2 = plot(lambda t: abs(zeta(0.5+t*i)), 1, 50,thickness=2, rgbcolor=(0,0,0)); p1+p2 in SageMath 7.5.1.

1.7.3 The Explicit Formula for ψ(x)


Riemann’s great accomplishment was transforming the problem of describing prime numbers
into the problem of describing the zeros of the Riemann zeta function - which can be attacked
directly [13]. There is an explicit formula for ψ(x), valid for x > 1, which consists of a sum over
the complex (non-trivial) zeros ρ of ζ(s). It is astonishing that there can be such a formula, an
exact expression for the number of primes upto x in terms of the zeros of a complicated function
[9, §3]. To avoid some minor complications we shall suppose that x ≥ 2 , though the formula
will be valid for x > 1.
Next we will use the inverse Mellin transform called Perron’s Formula [1, §9.42], which states
that if x is not an integer, c is any positive number, and σ > σ0 − c, then
Z c+i∞
X an 1 xs
= f (s + w) ds
n<x
nw 2πi c−i∞ s
P∞ an
where f (s) = n=1 ns is a Dirichlet series. The particular case w = 0 is
c+i∞
xs
Z
X 1
an = f (s) ds (1.37)
n<x
2πi c−i∞ s
18
Assuming Riemann hypothesis, all zeros in region 0 < σ < 1 lie on σ = 1/2.

28
and applying this to the result obtained in section 1.4 we get
Z c+i∞  0  s
X 1 ζ (s) x
Λ(n) = − ds, c>1
n<x
2πi c−i∞ ζ(s) s

and since x is not an integer, we can write


Z c+i∞  0  s
1 ζ (s) x
ψ(x) = − ds, x 6∈ Z, c > 1 (1.38)
2πi c−i∞ ζ(s) s

If we can move the vertical line of integration away to negative infinity on the left we shall
express ψ(x) as the sum of the residues [1, §3.11] of the function (−ζ 0 (s)/ζ(s)) xs /s at its poles.
The pole of ζ(s) at s = 1 contributes x, the pole of 1/s at s = 0 contributes −ζ 0 (0)/ζ(0); and
each zero ρ of ζ(s), whether trivial or not, contributes −xρ /ρ.
The basic idea is to make use of following discontinuous integral [2, Theorem 3.2]:

1
Z c+i∞ s
y 0 if 0 < y < 1

ds = 12 if y = 1 (1.39)
2πi c−i∞ s 
1 if y > 1

where c > 0. We can obtain (1.37) from (1.39) by taking y = x/n where n < x such that x 6∈ Z,
as
Z c+i∞ Z c+i∞ X ∞
1 xs 1 an xs
f (s) ds = ds
2πi c−i∞ s 2πi c−i∞ ns s
n=1
∞ Z c+i∞  
1 X x s1
= an ds
2πi c−i∞ n s
n=1
X
= an
n<x

Moreover, the method used to prove19 (1.39) can be extended to prove [3, §17] that if

0 if 0 < y < 1

1
Z c+iT s
y
1
δ(y) = 2 if y = 1 and I(y, T ) = ds
 2πi c−iT s
1 if y > 1

for y > 0, c > 0, T > 0, then


(  
1
y c min 1, T | log(y)| if y 6= 1
|δ(y) − I(y, T )| < (1.40)
c
T if y = 1

To be able to use this result, we will have to rewrite ψ(x). Note that, just like π(x), ψ(x) is
also a discontinuous function with jump discontinuities at the points where x is a prime power.
So we modify the definition by taking the mean of the values as
 
1 X X
ψ0 (x) = Λ(n) + Λ(n)
2 n<x
n≤x
( (1.41)
ψ(x) − Λ(x)
2 if x = p k for some prime p and k > 0
=
ψ(x) otherwise
19
We will have to use ML inequality [1, §2.31] after selecting the appropriate contour.

29
By replacing y by x/n, δ(y) by ψ0 (x) and I(y, T ) by
Z c+iT  0  s
1 ζ (s) x
J(x, T ) = − ds (1.42)
2πi c−iT ζ(s) s

we get from (1.40) that



!
X  x c 1 c
|ψ0 (x) − J(x, T )| < Λ(n) min 1, x
 + Λ(x) (1.43)
n T log n
T
n=1
n6=x

where c > 1 and the term containing Λ(x) is present only if x is a prime power.
Now we make a choice of c as per our convenience as
1
c=1+
log(x)

which is equivalent to xc = ex .
We want to estimate the series on the right of (1.43), and we will achieve that by considering
various cases20 (so as to avoid n = x):

Case 1. If n ≤ 34 x or n ≥ 45 x.
Then | log(x/n)| has a positive lower bound, so we get

!
X  x c 1 x X Λ(n)
Λ(n) min 1, 
T log nx

n
n T nc
n=1
 0 
x ζ (c)
= −
T ζ(c)
x
 log(x)
T

Case 2. If 34 x < n < x.


Let x1 be the largest prime power less than x, we can suppose that 43 x < x1 < x, since
otherwise the terms under consideration vanish.

(a) For the term n = x1 , using the series expansion of log function (section 1.4) we have
 
x x − x1 x − x1
log = − log 1 − ≥
n x x

and therefore the contribution of this term is


!  
X  x c 1 x
Λ(n) min 1,  Λ(x 1 ) min 1,
T log nx

n
n T (x − x1 )
 
x
 log(x) min 1,
T (x − x1 )

(b) For other terms, we can put n = x1 − ν, where 0 < ν < 14 x, and then again using
the series expansion of log function (section 1.4) we have
x x   
1 ν ν
log ≥ log = − log 1 − ≥
n n x1 x1
20
We will be using Vinogradov’s symbolism [3, pp. 107] where f (x)  g(x) is equivalent to f (x) = O(g(x)).

30
Hence the contribution of these terms is
!
X  x c 1 X Λ(x1 − ν) x1
Λ(n) min 1, x
  ·
n
n T log n
T ν
0<ν< 41 x
x
 (log(x))2
T

Case 3. If x < n < 45 x.


Let x2 be the least prime power greater than x, we can suppose that x < x2 < 45 x, since
otherwise the terms under consideration vanish.

(a) For the term n = x2 , using the series expansion of log function (section 1.4) we have
 
 x  x x2 − x x2 − x
log = − log = − log 1 − ≥

n n x2 x2

and therefore the contribution of this term is


!  
X  x c 1 x2
Λ(n) min 1,   Λ(x2 ) min 1,
n
n T log nx T (x2 − x)
 
x
 log(x) min 1,
T (x2 − x)

(b) For other terms, we can put n = x2 + ν, where 0 < ν < 14 x, and then again using
the series expansion of log function (section 1.4) we have
 x  x  
n  ν ν
log = − log ≥ log = − log 1 − ≥

n n x2 n n

Hence the contribution of these terms is


!
X  x c 1 X Λ(x2 + ν) x2 + ν
Λ(n) min 1, x
  ·
n
n T log n
T ν
0<ν< 14 x
x
 (log(x))2
T

If we define hxi to be the distance from x to the nearest prime power, other than x itself in
case x is a prime power and replace21 ψ0 (x) by ψ(x), we can rewrite the above estimates for
(1.43) as
x(log(x))2
 
x
|ψ(x) − J(x, T )|  + log(x) min 1, (1.44)
T T hxi
The next step is to replace the vertical line of integration in (1.42) by the other three sides
of the rectangle with vertices at c − iT, c + iT, −U + iT, −U − iT where U is a large odd integer.
(in order to achieve what we earlier commented for (1.38)). Thus the left vertical side passes
halfway between two of the trivial zeros of ζ(s).
Hence we can write
Z  0  s Z c+iT  0  s
1 ζ (s) x 1 ζ (s) x
J(x, T ) = − ds + − ds
2πi C ζ(s) s 2πi −U +iT ζ(s) s
21
Note that ψ0 (x) is the same as ψ(x) except that at its jump discontinuities (the prime powers) it takes the
value halfway between the values to the left and the right. So while analysing the asymptotic behaviour, we can
safely make this replacement.

31
Z −U −iT  0  s Z −U +iT  0  s
1 ζ (s) x 1 ζ (s) x
+ − ds + − ds
2πi c−iT ζ(s) s 2πi −U −iT ζ(s) s

The sum of the residues of the integrand at its poles inside the rectangle is (as earlier
commented for (1.38))
Z  0  s
1 ζ (s) x ζ 0 (0) X xρ X x−2m
− ds = x − − − (1.45)
2πi C ζ(s) s ζ(0) ρ −2m
|`|<T 0<2m<U

where ρ = β + i` is the non-trivial zero of ζ(s). E F

−U + iT c + iT

-4 -2 0 1

−U − iT c − iT
H G
Now we need to carefully choose the T . By the conclusion (a) of subsection 1.7.2, we know
that for any large T , the number of zeros with |` − T | < 1 is O(log(T )), i.e. N (T )  log(T ).
1
Among the ordinates of these zeros there must be a gap of length22  log(T ) . Hence by varying
T by a bounded amount, we can ensure that
1
 |` − T |
log(T )

for all zeros ρ = β + i`.


Now we will determine the contribution made by horizontal integrals depending upon the
choice of σ

Case 1. If −1 ≤ σ ≤ 2
Recall (1.35) from subsection 1.7.2 that

ζ 0 (s) X 1
= + O (log(T ))
ζ(s) s−ρ
|`−T |<1

22
Recall that f (x) = O(g(x)) ⇐⇒ ∃M, x0 ∈ R, M > 0 such that |f (x)| < M |g(x)|∀x > x0 and hence
1 1 1 1
g(x)
= O f (x) . In Vinogradov symbolism, f (x)  g(x) ⇐⇒ g(x)  f (x) .

32
for s = σ + iT and −1 ≤ σ ≤ 2. With the present choice of T , each term is  log(T ), and
the number of terms is also  log(T ). Hence on the new horizontal lines of integration
we have
ζ 0 (s)
= O (log(T ))2

ζ(s)
The contribution made to the horizontal integrals by this range of σ, −1 ≤ σ ≤ 2 is
therefore

Z c+iT  0  s Z −U −iT  0  s Z c s
1 ζ (s) x 1 ζ (s) x 2
x
− ds + − ds  (log(T )) dσ
2πi −U +iT ζ(s) s 2πi c−iT ζ(s) s −1 s

(log(T ))2 c σ
Z
 x dσ
T −∞
x (log(T ))2

T log(x)
(1.46)

Case 2. If −U ≤ σ ≤ −1.
We need to estimate for |ζ 0 (s)/ζ(s)| for σ ≤ −1. From the functional equation stated at
the end of subsection 1.3.4
 πs 
ζ(1 − s) = 21−s π −s cos Γ(s)ζ(s)
2
we note that if 1 − σ ≤ −1 the functions on the right have to be considered only for σ ≥ 2.
The logarithmic derivative is

ζ 0 (1 − s) π  πs  Γ0 (s) ζ 0 (s)
− = − log(2) − log(π) − tan + +
ζ(1 − s) 2 2 Γ(s) ζ(s)

Observe that tan πs is bounded if |s − (2m + 1))| ≥ 21 , that is, if |(1 − s) + 2m| ≥ 12 .

2
0 (s) 0 (s)
Also, ΓΓ(s)  log |s| and therefore  log (2|1 − s|) for σ ≥ 2. And ζζ(s) is bounded. Hence
it follows that 0
ζ (s)
ζ(s)  log(2|s|) (1.47)

in the half-plane σ ≤ −1, provided that the circles of radius 1/2 (say) around all the
trivial zeros at s = −2, −4, . . . are excluded. Hence the contribution to the remainder of
the horizontal integrals for −U ≤ σ ≤ −1 is
Z −U −iT  0  s
log(2T ) −1 σ
Z c+iT  0  s Z
1 ζ (s) x 1 ζ (s) x
− ds + − ds  x dσ
2πi −U +iT ζ(s) s 2πi c−iT ζ(s) s T −U
log(T )

T x log(x)

which is negligible compared with (1.46).

Also by using (1.47) we get the contribution made by the vertical integral at σ = −U as
−U +iT  0  s
log(2U ) T −U
Z Z
1 ζ (s) x
− ds  x dt
2πi −U −iT ζ(s) s U −T
T log(U )

U xU

33
which vanishes as U → ∞.
Making U → ∞ and adding the estimate (1.46) to (1.45), then using (1.38) and (1.44) we
obtain
X xρ ζ 0 (0) 1 
1

ψ(x) = x − − − log 1 − 2 + R(x, T ) (1.48)
ρ ζ(0) 2 x
|`|<T

where
x(log(xT ))2
 
x
|R(x, T )|  + log(x) min 1, (1.49)
T T hxi
As T → ∞ for any given x ≥ 2, we have R(x, T ) → 0, and therefore we get the desired explicit
formula
X xρ ζ 0 (0) 1 
1

ψ(x) = x − − − log 1 − 2
ρ
ρ ζ(0) 2 x

where thePsum over the non-trivial zeros ρ of ζ(s) is to be understood in the symmetric sense as

limT →∞ |`|<T ρ . The convergence is uniform in any closed interval of x which doesn’t contain
a prime power, but not otherwise, since ψ(x) is discontinuous at each prime power value of x.
This was proved by von Mangoldt in 1895 [3, §17].
The result (1.48) and (1.49) constitute the more precise form of the explicit formula. We
proved them subject to a restriction on T , but this can now be removed. The effect of varying
T by a bounded amount is to change the sum over ρ by O(log(T )) terms, and each term is
O(x/T ). Hence the variation in the’ sum is O(x(log(T ))/T ), and this is covered by the estimate
on the right of (1.49).
We note for future reference that, if x is an integer, then hxi ≥ 1, and (1.49) takes the
simpler form
x
|R(x, T )|  (log(xT ))2 (1.50)
T
The results (1.48) and (1.49) continue to hold for 1 < x < 2, with a slight modification in the
form of the estimate for R(x, T ).

1.7.4 Completing the Proof


ρ
We have to estimate the sum |`|<T xρ in (1.48) of subsection 1.7.3. Firstly we will estimate
P
xρ using the fact that the real part β of ρ is not too near 1 (subsection 1.7.1). It follows from
the conclusion of subsection 1.7.1 that if ` < T , where T is large, then
c1
β <1−
log(T )
where c1 is a positive absolute constant. Hence
 
ρ β log(x)
|x | = x < x exp −c1 (1.51)
log(T )
Secondly, we estimate 1/ρ. Since |ρ| ≥ |ℓ| and the zeros are symmetric about the real axis, it suffices to estimate $\sum_{0<\ell<T} 1/\ell$. If N(t) denotes, as in subsection 1.7.2, the number of zeros in the critical strip with ordinates between 0 and t, this sum is

$$\sum_{0<\ell<T}\frac{1}{\ell} = \int_{0}^{T}\frac{1}{t}\,dN(t) = \frac{N(T)}{T} + \int_{0}^{T}\frac{N(t)}{t^{2}}\,dt$$

From the conclusion of subsection 1.7.2 we know that N(t) ≪ t log(t) for large t; hence

$$\sum_{0<\ell<T}\frac{1}{|\rho|} < \sum_{0<\ell<T}\frac{1}{\ell} \ll (\log(T))^{2} \qquad (1.52)$$
Hence, combining (1.51) and (1.52), we conclude that

$$\sum_{|\ell|<T}\frac{x^{\rho}}{\rho} \ll x(\log(T))^{2}\exp\left(-\frac{c_1\log(x)}{\log(T)}\right) \qquad (1.53)$$

Now, without loss of generality, we can take x to be an integer. Then, using (1.53) and (1.50) in (1.48) of subsection 1.7.3, it follows that

$$|\psi(x) - x| \ll \frac{x(\log(xT))^{2}}{T} + x(\log(T))^{2}\exp\left(-\frac{c_1\log(x)}{\log(T)}\right) \qquad (1.54)$$

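for large x. Next we choose T as a function of x by equating (log(T))² to log(x), so that

$$\frac{1}{T} = \exp\left(-\sqrt{\log(x)}\right)$$

and using this in (1.54) (note that log(xT) ≪ log(x) for this choice of T) we get

$$|\psi(x) - x| \ll x(\log(x))^{2}\exp\left(-\sqrt{\log(x)}\right) + x\log(x)\exp\left(-c_1\sqrt{\log(x)}\right) \ll x\exp\left(-c_2\sqrt{\log(x)}\right)$$

provided c2 is a constant that is less than both 1 and c1. This proves

$$\psi(x) = x + O\left(x\exp\left(-c_2\sqrt{\log(x)}\right)\right)$$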
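As a numerical sanity check of this bound (not part of the proof), the sketch below computes ψ(x) with a sieve of Eratosthenes and prints ψ(x)/x and |ψ(x) − x| next to x exp(−√log(x)). The helper name `chebyshev_psi` is our own, and the last column takes c2 = 1 purely for scale: the theorem only asserts that some constant c2 > 0 works, so at these small values of x the comparison is illustrative rather than a test of the asymptotics.

```python
import math


def chebyshev_psi(x: int) -> float:
    """psi(x): the sum of log(p) over all prime powers p^k <= x."""
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(x) + 1):
        if is_prime[p]:
            # sieve of Eratosthenes: mark p*p, p*p + p, ... as composite
            is_prime[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    total = 0.0
    for p in range(2, x + 1):
        if is_prime[p]:
            pk = p
            while pk <= x:  # one log(p) for each of p, p^2, ..., p^k <= x
                total += math.log(p)
                pk *= p
    return total


for k in range(2, 7):
    x = 10**k
    psi = chebyshev_psi(x)
    scale = x * math.exp(-math.sqrt(math.log(x)))  # c2 = 1, for scale only
    print(f"x = 10^{k}: psi(x)/x = {psi / x:.6f}, |psi(x) - x| = {abs(psi - x):10.2f}, "
          f"x*exp(-sqrt(log x)) = {scale:12.2f}")
```

Since ψ(n) = log lcm(1, 2, ..., n), the sieve can be cross-checked against `math.lcm` for small n.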

Now from this we will derive the desired relation for π(x). First we consider the function

$$\pi_1(x) = \sum_{2\le n\le x}\frac{\Lambda(n)}{\log(n)} \qquad (1.55)$$

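This can be expressed in terms of the function ψ(x), since $\frac{1}{\log(n)} = \frac{1}{\log(x)} + \int_{n}^{x}\frac{dt}{t(\log(t))^{2}}$ for 2 ≤ n ≤ x:

$$\pi_1(x) = \sum_{n\le x}\Lambda(n)\int_{n}^{x}\frac{dt}{t(\log(t))^{2}} + \frac{1}{\log(x)}\sum_{n\le x}\Lambda(n) = \int_{2}^{x}\frac{\psi(t)}{t(\log(t))^{2}}\,dt + \frac{\psi(x)}{\log(x)}$$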

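We can replace ψ(t) by t to get

$$\begin{aligned}\int_{2}^{x}\frac{dt}{(\log(t))^{2}} + \frac{x}{\log(x)} &= \int_{2}^{x}t\,\frac{d}{dt}\left(-\frac{1}{\log(t)}\right)dt + \frac{x}{\log(x)}\\ &= \left[\frac{-t}{\log(t)}\right]_{2}^{x} + \int_{2}^{x}\frac{dt}{\log(t)} + \frac{x}{\log(x)}\\ &= \mathrm{Li}(x) + \frac{2}{\log(2)}\end{aligned}$$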
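The integration by parts above can be checked numerically. The following sketch, assuming Li(x) = ∫₂ˣ dt/log(t) as in the text, evaluates both sides with a composite Simpson rule; `simpson`, `main_term` and `li_offset` are illustrative names, and the two printed columns should agree up to quadrature error.

```python
import math


def simpson(f, a: float, b: float, n: int = 100_000) -> float:
    """Composite Simpson rule for the integral of f over [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def main_term(x: float) -> float:
    """integral_2^x dt/(log t)^2 + x/log(x): pi_1(x) after replacing psi(t) by t."""
    return simpson(lambda t: 1.0 / math.log(t) ** 2, 2.0, x) + x / math.log(x)


def li_offset(x: float) -> float:
    """Li(x) = integral_2^x dt / log(t)."""
    return simpson(lambda t: 1.0 / math.log(t), 2.0, x)


for x in (10.0, 100.0, 1000.0):
    lhs = main_term(x)
    rhs = li_offset(x) + 2 / math.log(2)
    print(f"x = {x:6.0f}: main term = {lhs:.10f}, Li(x) + 2/log(2) = {rhs:.10f}")
```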

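Thus we have

$$\pi_1(x) = \mathrm{Li}(x) + \frac{2}{\log(2)} + E(x)$$

where the error E(x), which comes from replacing ψ(t) by t, satisfies

$$|E(x)| \ll \int_{2}^{x}\exp\left(-c_2\sqrt{\log(t)}\right)dt + x\exp\left(-c_2\sqrt{\log(x)}\right)$$

The contribution of the range t < x^{1/4} to the integral is trivially less than x^{1/4}, and in the rest of the range we have $\sqrt{\log(t)} > \frac{1}{2}\sqrt{\log(x)}$. Hence

$$\pi_1(x) = \mathrm{Li}(x) + O\left(x\exp\left(-c_3\sqrt{\log(x)}\right)\right) \qquad (1.56)$$

where c3 = c2/2.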
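Using the definition of Λ(n) from section 1.4, we can rewrite (1.55) as

$$\pi_1(x) = \sum_{p^{m}\le x}\frac{\log(p)}{m\log(p)} = \pi(x) + \frac{1}{2}\pi\left(x^{1/2}\right) + \frac{1}{3}\pi\left(x^{1/3}\right) + \cdots$$

since π(x) = Σ_{p≤x} 1 from section 1.5. Also, since π(x^{1/m}) ≤ x^{1/m}, and the terms vanish once 2^m > x (that is, for m > log(x)/log(2)), the difference between π1(x) and π(x) is at most x^{1/2} + x^{1/3} log(x)/log(2) = O(x^{1/2}). Thus, using this in (1.56), we get

$$\pi(x) = \mathrm{Li}(x) + O\left(x\exp\left(-c_3\sqrt{\log(x)}\right)\right)$$

This was proved by de la Vallée Poussin in 1899 [3, §18].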
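To see the sizes involved, the sketch below computes π(x), π1(x) and Li(x) at x = 10^5. It evaluates li(x) through the classical series li(x) = γ + log(log(x)) + Σ_{k≥1} (log x)^k/(k·k!) and uses Li(x) = li(x) − li(2); the helper names and the hard-coded value of γ are our own choices. The difference π1(x) − π(x) comes out far below √x, as the argument above predicts.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (hard-coded)


def li(x: float) -> float:
    """li(x) for x > 1 via li(x) = gamma + log(log x) + sum_{k>=1} u^k / (k * k!), u = log x."""
    u = math.log(x)
    total = EULER_GAMMA + math.log(u)
    term = 1.0
    for k in range(1, 500):
        term *= u / k  # now term = u^k / k!
        total += term / k
        if term / k < 1e-17 * abs(total):
            break
    return total


def Li(x: float) -> float:
    """Li(x) = integral from 2 to x of dt / log(t) = li(x) - li(2)."""
    return li(x) - li(2.0)


def primes_upto(n: int) -> list[int]:
    """All primes p <= n (sieve of Eratosthenes)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def pi_1(x: int, primes: list[int]) -> float:
    """pi_1(x) = sum over prime powers p^m <= x of 1/m, as in (1.55)."""
    total = 0.0
    for p in primes:
        m, pm = 1, p
        while pm <= x:
            total += 1.0 / m
            m, pm = m + 1, pm * p
    return total


x = 10**5
primes = primes_upto(x)
pi_x = len(primes)
print(f"pi(x)   = {pi_x}")
print(f"pi_1(x) = {pi_1(x, primes):.3f}  (pi_1(x) - pi(x) = {pi_1(x, primes) - pi_x:.3f}, sqrt(x) = {math.sqrt(x):.1f})")
print(f"Li(x)   = {Li(x):.3f}  (x / log(x) = {x / math.log(x):.3f})")
```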

1.8 Some Remarks


1.8.1 Lindelöf Hypothesis
Besides the Riemann hypothesis, another important conjecture involving the Riemann zeta function is the Lindelöf hypothesis, proposed by Ernst Leonard Lindelöf in 1908. The conjecture states that

$$\zeta\left(\frac{1}{2} + it\right) = O(t^{\varepsilon})$$

for every positive ε. In other words, it is concerned with the growth of the Riemann zeta function on the line σ = 1/2, and conjectures that the modulus of ζ(1/2 + it) grows more slowly than any positive power of t as t tends to infinity.
The Riemann hypothesis implies the Lindelöf hypothesis, but the Lindelöf hypothesis is not known to imply the Riemann hypothesis. It was the Lindelöf hypothesis that led to the study of moments of the Riemann zeta function. The 2k-th moment of the modulus of the Riemann zeta function is defined as

$$I_k(T) = \frac{1}{T}\int_{0}^{T}\left|\zeta\left(\frac{1}{2} + it\right)\right|^{2k}dt$$

The Lindelöf hypothesis is equivalent to the statement that I_k(T) = O(T^ε) for every positive integer k and every positive ε.
However, moments are now appreciated in their own right, because they represent mean values of the Riemann zeta function on the critical line over a finite interval, and estimates of these average values can provide information about the zeros of ζ(s). A nice exposition of the moments of the Riemann zeta function is given in Jennifer Beineke and Chris Hughes’s article [11].

1.8.2 Elementary Proof


A simple question like “How many primes are there up to x?” deserves a simple answer, one that uses elementary methods rather than all the machinery of complex analysis, which may seem far from the question at hand [9, §3]. In 1948, using a fundamental inequality of Atle Selberg, Selberg and Paul Erdős succeeded in giving an elementary proof of the prime number theorem. The proof is based on Selberg’s formula

$$\sum_{p<x}(\log(p))^{2} + \sum_{pq<x}\log(p)\log(q) = 2x\log(x) + O(x)$$

(where the second sum runs over ordered pairs of primes p, q), which is completely combinatorial in nature. For that proof one may refer to [7, Ch. XXII]. Much about this elementary proof can be summarised by quoting the following lines by Carl Pomerance [17]:

Thus, far from being an isolated intellectual challenge, the elementary proof of the
prime number theorem was a signal that good ideas and strong tools are close at
hand. We already had an inkling of this in Riemann’s era when Chebyshev used
combinatorial methods to show that there is a prime in [n, 2n] for every natural
number n. And a century ago, the elementary proof of Brun, stating that most
primes are not part of twin-prime pairs, opened the door for combinatorial sieve
methods and their many glorious consequences.
Since the elementary proof, some of the most profound and exciting results in the
field have had strong elementary and combinatorial leanings. After Roth used the
(analytic) circle method to show that dense sets of integers must have 3-term arithmetic
progressions, Szemerédi used an elementary (and very complicated) proof to
generalize this to k-term arithmetic progressions. This result became an intrinsic
tool in the recent Green–Tao proof that the set of primes contains arbitrarily long
arithmetic progressions.

The Green-Tao theorem was proved by Ben Green and Terence Tao23 in 2004 [20][9].
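Selberg’s formula quoted above can be probed numerically. The sketch below (our own illustration, with the second sum taken over ordered pairs of primes) evaluates the left-hand side using a sieve and prefix sums of log(q), and prints it relative to 2x log(x). Because the error term is only O(x), the ratio approaches 1 slowly, so the output is a plausibility check rather than evidence about the constant in O(x).

```python
import bisect
import math


def primes_upto(n: int) -> list[int]:
    """All primes p <= n (sieve of Eratosthenes)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def selberg_lhs(x: int) -> float:
    """sum_{p<x} (log p)^2 + sum_{pq<x} log(p) log(q), the second sum over ordered pairs (p, q)."""
    primes = [p for p in primes_upto(x) if p < x]
    logs = [math.log(p) for p in primes]
    prefix = [0.0]  # prefix[i] = log(p_1) + ... + log(p_i)
    for lp in logs:
        prefix.append(prefix[-1] + lp)
    squares = sum(lp * lp for lp in logs)
    pairs = 0.0
    for p, lp in zip(primes, logs):
        i = bisect.bisect_left(primes, x / p)  # primes[:i] are exactly the q with q < x/p
        pairs += lp * prefix[i]
    return squares + pairs


for k in range(3, 7):
    x = 10**k
    lhs = selberg_lhs(x)
    main = 2 * x * math.log(x)
    print(f"x = 10^{k}: LHS / (2x log x) = {lhs / main:.4f}, (LHS - 2x log x) / x = {(lhs - main) / x:.3f}")
```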

1.8.3 Heuristic Approach


A heuristic technique is any approach to problem-solving that employs a practical method not guaranteed to be optimal or perfect, but sufficient for the immediate goals. Common heuristic techniques are drawing a picture, working backward by assuming a solution, examining a concrete example if the problem is too abstract, and examining a more general problem if the problem is too specific.
But for the prime number theorem, there is a problem with any attempt at a heuristic explanation, because the sieve of Eratosthenes does not behave as one might guess it would from pure probabilistic considerations [9, §2]. Sieving out the composites up to x using the primes up to √x would lead (as seen in subsection 1.3.2) to the estimate

$$x\prod_{p<\sqrt{x}}\left(1 - \frac{1}{p}\right)$$

which turns out to be asymptotic to $2e^{-\gamma}\frac{x}{\log(x)}$ instead of $\frac{x}{\log(x)}$ (a consequence of a theorem proved by Franz Mertens in 1874). Since 2e^{−γ} ≈ 1.12, the sieve is about 11% more efficient at eliminating composites than one might expect. In 2006, Hugh L. Montgomery and Stan Wagon transformed an old heuristic approach into a proof of the following result [16]:

If x/π(x) is asymptotic to an increasing function, then π(x) ∼ x/log(x).

The proof uses elementary calculus and a Tauberian result instead of complex function theory. Hence, there is a possibility of finding another proof of the prime number theorem by proving that x/π(x) is asymptotic to an increasing function.
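Mertens’ theorem, in the form Π_{p≤y}(1 − 1/p) ∼ e^{−γ}/log(y), produces the constant 2e^{−γ} ≈ 1.1229 once y = √x. The sketch below compares x Π_{p<√x}(1 − 1/p) with x/log(x) for a few values of x and, for contrast, prints π(x)/(x/log(x)) at x = 10^6. The function names and the hard-coded γ are assumptions of this illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (hard-coded)


def primes_upto(n: int) -> list[int]:
    """All primes p <= n (sieve of Eratosthenes)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def sieve_ratio(x: int) -> float:
    """(x * prod_{p < sqrt(x)} (1 - 1/p)) / (x / log(x))."""
    product = 1.0
    for p in primes_upto(math.isqrt(x)):
        if p * p < x:  # only primes p < sqrt(x)
            product *= 1 - 1 / p
    return math.log(x) * product


target = 2 * math.exp(-EULER_GAMMA)  # about 1.1229
for k in range(2, 9):
    print(f"x = 10^{k}: sieve estimate / (x/log x) = {sieve_ratio(10**k):.5f}  (2e^-gamma = {target:.5f})")

x = 10**6
pi_x = len(primes_upto(x))
print(f"pi(x) / (x/log x) at x = 10^6: {pi_x * math.log(x) / x:.5f}")
```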
23
“The primes contain arbitrarily long arithmetic progressions.” Annals of Mathematics 167, no. 2 (2008), 481–547. http://arxiv.org/abs/math/0404188

Chapter 2

Primes in Arithmetic Progression

“The great fusion between arithmetic and analysis–between counting and measuring,
between numbers staccato and numbers legato–came about as the result of an inquiry
into prime numbers, conducted by Lejeune Dirichlet in 1830s”
— John Derbyshire, Prime Obsession

It is an interesting question whether it is possible to generate infinitely many prime numbers using only addition, subtraction and multiplication. Let’s consider polynomials in one variable whose coefficients are natural numbers. Peter Gustav Lejeune Dirichlet proved that the linear polynomial ax + b, where a, b, and x are positive integers with gcd(a, b) = 1, takes infinitely many prime values. It is easy to prove that no non-constant polynomial can take only prime values, but it is still unknown whether there exists a polynomial of degree 2 or more that takes infinitely many prime values [8][23].
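The easy argument mentioned above goes as follows: if f has integer coefficients and q = f(n0) satisfies |q| > 1, then f(n0 + kq) ≡ f(n0) ≡ 0 (mod q) for every integer k, and since a non-constant f takes the values ±q only finitely often, some f(n0 + kq) is composite. The sketch below illustrates this with Euler’s polynomial n² + n + 41 (our choice of example), which is prime for n = 0, 1, ..., 39 but not at n = 40.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small values used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def euler_poly(n: int) -> int:
    """Euler's polynomial n^2 + n + 41."""
    return n * n + n + 41


first_failure = next(n for n in range(1000) if not is_prime(euler_poly(n)))
print(first_failure, euler_poly(first_failure))  # 40 1681  (1681 = 41 * 41)

# The general argument with n0 = 0 and q = f(0) = 41: every f(k * q) is divisible by q.
n0 = 0
q = euler_poly(n0)
for k in range(1, 6):
    value = euler_poly(n0 + k * q)
    print(k, value, value % q == 0)
```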

2.1 Dirichlet Density


The Dirichlet density D(P) of a set P of prime numbers, if it exists, is given by

$$D(\mathcal{P}) = \lim_{\sigma\to1^{+}}\frac{-1}{\log(\sigma-1)}\sum_{p\in\mathcal{P}}\frac{1}{p^{\sigma}}$$

We can draw the following conclusions from this definition [5, §16.1]:

(a) If P has finitely many elements, then D(P) = 0

(b) If P consists of all but finitely many positive primes, then D(P) = 1

(c) If P = P1 ∪ P2 where P1 and P2 are disjoint and D(P1 ) and D(P2 ) both exist, then
D(P) = D(P1 ) + D(P2 ).

2.2 Dirichlet Characters


Let m be a fixed positive integer. Let χ′ : (Z/mZ)× → C× be a homomorphism. Given χ′, define χ : Z → C as follows:

$$\chi(a) = \begin{cases}\chi'(a \bmod m) & \text{if } \gcd(a,m) = 1\\ 0 & \text{if } \gcd(a,m) \ne 1\end{cases}$$

The functions χ defined in this manner are called Dirichlet characters modulo m.

2.2.1 Number of Dirichlet Characters
If G is a multiplicatively written finite abelian group, then a character on G is a homomorphism from G to C× [28, Chapter 2]. We will denote the set of all such characters by Ĝ. If χ, χ′ ∈ Ĝ, define χχ′ to be the function which takes g ∈ G to χ(g)χ′(g). Then χχ′ is also a character. Let χ0 be the trivial character, i.e. χ0(g) = 1 for all g ∈ G. If χ ∈ Ĝ, define χ^{−1} by χ^{−1}(g) = χ(g)^{−1} for all g ∈ G. With these definitions Ĝ becomes an abelian group with χ0 as the identity element.
If n is the order of G, then g^n = e for every g ∈ G, where e is the identity element of G. So if χ ∈ Ĝ then (χ(g))^n = 1, i.e. the values of χ are n-th roots of unity. Therefore $\frac{1}{\chi(g)} = \overline{\chi(g)} = \chi^{-1}(g)$, where the bar denotes complex conjugation. Hence χ^{−1} can be written as $\overline{\chi}$, and is called the conjugate character of χ.
By the structure theorem for finite abelian groups, G is a direct product of cyclic groups, i.e. there are elements g1, g2, . . . , gt ∈ G such that the order of gk is nk, with n = n1n2 · · · nt, and every element g ∈ G can be uniquely written in the form $g = g_1^{m_1}g_2^{m_2}\cdots g_t^{m_t}$ where 0 ≤ mk < nk for all k. Hence we can conclude that Ĝ is generated by the characters χk, 1 ≤ k ≤ t, defined by $\chi_k(g_k) = e^{2\pi i/n_k}$ and χk(gℓ) = 1 for k ≠ ℓ. And hence we have G ≅ Ĝ, where Ĝ is the direct product of the cyclic subgroups generated by the χk, the order of χk being nk [5, §16.3]. Therefore, |Ĝ| = |G| = n. If we take G = (Z/mZ)×, and note that although Dirichlet characters are defined on Z, they are induced by the elements of the character group of (Z/mZ)×, we can conclude that:

There are exactly φ(m) Dirichlet characters modulo m, where φ(m) = |(Z/mZ)×|.
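The counting argument can be made concrete. The sketch below (an illustration, not the construction used in [5]) builds every character of (Z/mZ)× by brute force: it picks generators greedily, tries every assignment of exponents k to them, where k stands for the value e^{2πik/φ(m)}, and keeps the assignments that extend consistently to a homomorphism. It relies only on the fact, shown above, that character values are φ(m)-th roots of unity; all function names are hypothetical.

```python
import cmath
import itertools
import math


def unit_group(m: int) -> list[int]:
    """The residues 1 <= a < m with gcd(a, m) = 1, i.e. (Z/mZ)^x for m >= 2."""
    return [a for a in range(1, m) if math.gcd(a, m) == 1]


def greedy_generators(m: int, group: list[int]) -> list[int]:
    """Add elements g1, g2, ... until the subgroup they generate is the whole group."""
    subgroup, gens = {1}, []
    for g in group:
        if g not in subgroup:
            gens.append(g)
            cyclic, power = [1], g
            while power != 1:  # collect the cyclic subgroup <g>
                cyclic.append(power)
                power = power * g % m
            subgroup = {h * c % m for h in subgroup for c in cyclic}
    return gens


def dirichlet_characters(m: int) -> list[dict[int, int]]:
    """All characters mod m; chi maps a -> k, meaning chi(a) = exp(2*pi*i*k/phi(m)) for gcd(a, m) = 1."""
    group = unit_group(m)
    n = len(group)
    gens = greedy_generators(m, group)
    characters = []
    for ks in itertools.product(range(n), repeat=len(gens)):
        # propagate chi(h*g) = chi(h) + k_g starting from chi(1) = 0; reject on any conflict
        exps, stack, ok = {1: 0}, [1], True
        while stack and ok:
            h = stack.pop()
            for g, k in zip(gens, ks):
                hg, value = h * g % m, (exps[h] + k) % n
                if hg not in exps:
                    exps[hg] = value
                    stack.append(hg)
                elif exps[hg] != value:
                    ok = False
                    break
        if ok:
            characters.append(exps)
    return characters


def chi_value(chi: dict[int, int], a: int, m: int) -> complex:
    """The Dirichlet character on Z: 0 if gcd(a, m) != 1, otherwise a root of unity."""
    if math.gcd(a, m) != 1:
        return 0j
    return cmath.exp(2j * cmath.pi * chi[a % m] / len(chi))


for m in (5, 8, 12, 15, 21):
    print(f"m = {m}: phi(m) = {len(unit_group(m))}, characters found = {len(dirichlet_characters(m))}")
```

The brute force is exponential in the number of generators, so it is only meant for small moduli.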

2.2.2 Orthogonality Relations


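Let χ ∈ Ĝ. Then $\sum_{g\in G}\chi(g) = n$ for χ = χ0. But if χ ≠ χ0 then there is a g′ ∈ G such that χ(g′) ≠ 1, and hence, since gg′ runs over G as g does,

$$\sum_{g\in G}\chi(g) = \sum_{g\in G}\chi(gg') = \chi(g')\sum_{g\in G}\chi(g)$$

So $\sum_{g\in G}\chi(g) = 0$ if χ ≠ χ0. Therefore, for χ, χ′ ∈ Ĝ, we have

$$\sum_{g\in G}\chi\chi'^{-1}(g) = \sum_{g\in G}\chi(g)\overline{\chi'(g)} = \begin{cases} n & \text{if } \chi = \chi'\\ 0 & \text{if } \chi \ne \chi'\end{cases} \qquad (2.1)$$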

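Also, for g ∈ G, we have $\sum_{\chi\in\widehat{G}}\chi(g) = n$ if g = e. But if g ≠ e then $g = g_1^{m_1}g_2^{m_2}\cdots g_t^{m_t}$ with 0 ≤ mk < nk for all k and at least one mk ≠ 0. Every χ ∈ Ĝ can be written as $\chi = \chi_1^{e_1}\chi_2^{e_2}\cdots\chi_t^{e_t}$ with 0 ≤ ek < nk, so χ ↦ χkχ is a bijection of Ĝ onto itself. For an index k with mk ≠ 0 we have $\chi_k(g) = \chi_k(g_k^{m_k}) = \chi_k(g_k)^{m_k} = e^{2\pi i m_k/n_k} \ne 1$, and hence

$$\sum_{\chi\in\widehat{G}}\chi(g) = \sum_{\chi\in\widehat{G}}\chi_k\chi(g) = \chi_k(g)\sum_{\chi\in\widehat{G}}\chi(g)$$

So $\sum_{\chi\in\widehat{G}}\chi(g) = 0$ for g ≠ e. Therefore, for g, g′ ∈ G, we have

$$\sum_{\chi\in\widehat{G}}\chi(gg'^{-1}) = \sum_{\chi\in\widehat{G}}\chi(g)\overline{\chi(g')} = \begin{cases} n & \text{if } g = g'\\ 0 & \text{if } g \ne g'\end{cases} \qquad (2.2)$$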

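From (2.1) and (2.2) we conclude that:

Let χ and χ′ be Dirichlet characters modulo m, and a, b ∈ Z. Then

(a)
$$\sum_{a=0}^{m-1}\chi(a)\overline{\chi'(a)} = \begin{cases}\varphi(m) & \text{if } \chi = \chi'\\ 0 & \text{if } \chi \ne \chi'\end{cases}$$

(b) if gcd(b, m) = 1,
$$\sum_{\chi}\chi(a)\overline{\chi(b)} = \begin{cases}\varphi(m) & \text{if } a \equiv b \pmod{m}\\ 0 & \text{if } a \not\equiv b \pmod{m}\end{cases}$$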
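For a prime modulus p the character group can be written down explicitly, because (Z/pZ)× is cyclic: if g is a primitive root, the characters are χ_r(g^k) = e^{2πirk/(p−1)} for r = 0, ..., p − 2. The sketch below uses this (assuming p is an odd prime; the helper names are ours) to check relations (a) and (b) numerically. It reports the largest deviation from the predicted values, which should be at the level of floating-point rounding.

```python
import cmath
import math


def primitive_root(p: int) -> int:
    """Smallest primitive root modulo an odd prime p (brute force)."""
    n = p - 1
    prime_factors = [q for q in range(2, n + 1)
                     if n % q == 0 and all(q % r for r in range(2, math.isqrt(q) + 1))]
    for g in range(2, p):
        if all(pow(g, n // q, p) != 1 for q in prime_factors):
            return g
    raise ValueError("p must be an odd prime")


def characters_mod_prime(p: int):
    """The p - 1 characters chi_r(g^k) = exp(2*pi*i*r*k/(p-1)), with chi_r(a) = 0 when p | a."""
    g, n = primitive_root(p), p - 1
    index, power = {}, 1  # discrete-log table: index[g^k mod p] = k
    for k in range(n):
        index[power] = k
        power = power * g % p

    def make(r: int):
        def chi(a: int) -> complex:
            if a % p == 0:
                return 0j
            return cmath.exp(2j * cmath.pi * r * index[a % p] / n)
        return chi

    return [make(r) for r in range(n)]


def orthogonality_error(p: int) -> float:
    """Largest deviation from relations (a) and (b) for the modulus p."""
    chars, n = characters_mod_prime(p), p - 1
    worst = 0.0
    for s, chi in enumerate(chars):  # relation (a): sum over a = 0, ..., p - 1
        for t, psi in enumerate(chars):
            total = sum(chi(a) * psi(a).conjugate() for a in range(p))
            worst = max(worst, abs(total - (n if s == t else 0)))
    for a in range(p):  # relation (b): sum over characters, with gcd(b, p) = 1
        for b in range(1, p):
            total = sum(chi(a) * chi(b).conjugate() for chi in chars)
            worst = max(worst, abs(total - (n if a == b else 0)))
    return worst


for p in (3, 5, 7, 11, 13):
    print(p, orthogonality_error(p))
```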

2.3 L-function
2.3.1 Dirichlet L-function
Let χ be a Dirichlet character modulo m. We define the Dirichlet L-function associated to χ by the formula

$$L(s,\chi) = \sum_{n=1}^{\infty}\frac{\chi(n)}{n^{s}}$$

where s = σ + it is a complex number. Since |χ(n)| ≤ 1 (subsection 2.2.1), we have

$$\left|\frac{\chi(n)}{n^{s}}\right| \le \frac{1}{n^{\sigma}}$$

and hence the terms of L(s, χ) are dominated in absolute value by the corresponding terms of the Euler zeta function ζ(σ) (subsection 1.3.1). Thus L(s, χ) converges absolutely and is analytic for σ > 1.

2.3.2 Product Formula


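Since χ is completely multiplicative, we have a product formula for L(s, χ) in exactly the same way as for ζ(s) (subsection 1.3.2):

$$L(s,\chi) = \prod_{p}\left(1 - \frac{\chi(p)}{p^{s}}\right)^{-1}, \qquad \sigma > 1$$

Since χ(p) = 0 for p | m, the above product is effectively over positive primes not dividing m. There is a close connection between L(s, χ0) and ζ(s). In fact,

$$\begin{aligned}L(s,\chi_0) &= \prod_{p\nmid m}\left(1 - \frac{1}{p^{s}}\right)^{-1}\\ &= \prod_{p\mid m}\left(1 - \frac{1}{p^{s}}\right)\prod_{p}\left(1 - \frac{1}{p^{s}}\right)^{-1} \qquad (2.3)\\ &= \prod_{p\mid m}\left(1 - \frac{1}{p^{s}}\right)\zeta(s)\end{aligned}$$

From this we conclude that:

L(s, χ0) has a simple pole at s = 1 (the finite product over p | m does not vanish there); in particular, L(s, χ0) does not tend to 0 as s → 1.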
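For m = 4 everything can be checked against known constants: L(2, χ) for the non-trivial character χ modulo 4 is Catalan’s constant G ≈ 0.9159655942, and (2.3) gives L(2, χ0) = (1 − 2^{−2})ζ(2) = π²/8. The sketch below compares truncated Dirichlet series with truncated Euler products; the truncation points are arbitrary choices of ours.

```python
import math

CATALAN = 0.915965594177219  # Catalan's constant G = L(2, chi_4)


def chi4(n: int) -> int:
    """The non-trivial character modulo 4."""
    return (0, 1, 0, -1)[n % 4]


def chi0_mod4(n: int) -> int:
    """The trivial character modulo 4: 1 on odd n, 0 on even n."""
    return n % 2


def primes_upto(n: int) -> list[int]:
    """All primes p <= n (sieve of Eratosthenes)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def l_series(chi, s: float, terms: int = 200_000) -> float:
    """Partial sum of the Dirichlet series sum_n chi(n) / n^s."""
    return sum(chi(n) / n**s for n in range(1, terms + 1))


def l_euler_product(chi, s: float, bound: int = 200_000) -> float:
    """Truncated Euler product prod_{p <= bound} (1 - chi(p)/p^s)^(-1)."""
    product = 1.0
    for p in primes_upto(bound):
        product /= 1 - chi(p) / p**s
    return product


s = 2.0
print(l_series(chi4, s), l_euler_product(chi4, s), CATALAN)
# (2.3) with m = 4: L(s, chi_0) = (1 - 2^(-s)) zeta(s), which is pi^2/8 at s = 2
print(l_series(chi0_mod4, s), l_euler_product(chi0_mod4, s), (1 - 2**-s) * math.pi**2 / 6)
```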

2.3.3 Logarithm Formula
The values of L(s, χ) are in general complex (even if we restrict s to be real). So we take the principal branch of the logarithm of each factor in the product formula for L(s, χ) and expand it in a series (section 1.4) to get

$$G(s,\chi) = \log(L(s,\chi)) = \sum_{p}\sum_{k}\frac{\chi(p^{k})}{kp^{ks}}, \qquad \sigma > 1$$

where p runs through all primes, and k through all positive integers. Moreover, we have

$$G(s,\chi) = \sum_{p\nmid m}\frac{\chi(p)}{p^{s}} + \sum_{p}\sum_{k=2}^{\infty}\frac{\chi(p^{k})}{kp^{ks}}$$

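and using the triangle inequality and $(1-x)^{-1} = 1 + x + x^{2} + \cdots$ for |x| < 1, we get

$$\begin{aligned}\left|\sum_{p}\sum_{k=2}^{\infty}\frac{\chi(p^{k})}{kp^{ks}}\right| &\le \sum_{p}\sum_{k=2}^{\infty}\frac{1}{kp^{k\sigma}} \le \sum_{p}\sum_{k=2}^{\infty}\frac{1}{p^{k\sigma}}\\ &= \sum_{p}\frac{1}{p^{2\sigma}}\left(1 - \frac{1}{p^{\sigma}}\right)^{-1} \le \left(1 - \frac{1}{2^{\sigma}}\right)^{-1}\sum_{p}\frac{1}{p^{2\sigma}} \le 2\zeta(2)\end{aligned}$$

Therefore, we have

$$G(s,\chi) = \sum_{p\nmid m}\frac{\chi(p)}{p^{s}} + R_{\chi}(s), \qquad \sigma > 1 \qquad (2.4)$$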

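where |Rχ(s)| ≤ 2ζ(2) for σ > 1, so Rχ(s) remains bounded as s → 1. Now let’s multiply both sides of (2.4) by $\overline{\chi(a)}$, where a ∈ Z with gcd(a, m) = 1, and then sum over all Dirichlet characters modulo m, to get

$$\sum_{\chi}\overline{\chi(a)}G(s,\chi) = \sum_{p\nmid m}\frac{1}{p^{s}}\sum_{\chi}\overline{\chi(a)}\chi(p) + \sum_{\chi}\overline{\chi(a)}R_{\chi}(s)$$

Using the result (b) from subsection 2.2.2, we see

$$\sum_{\chi}\overline{\chi(a)}G(s,\chi) = \varphi(m)\sum_{p\equiv a\ (\mathrm{mod}\ m)}\frac{1}{p^{s}} + R_{\chi,a}(s), \qquad \sigma > 1 \qquad (2.5)$$

where $R_{\chi,a}(s) = \sum_{\chi}\overline{\chi(a)}R_{\chi}(s)$ remains bounded as s → 1.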


Now, restricting s to be real, i.e. t = 0 and s = σ, in (2.4) and (2.5), we get:

Let σ > 1 be a real number. Then

(a) $G(\sigma,\chi) = \sum_{p\nmid m}\frac{\chi(p)}{p^{\sigma}} + R_{\chi}(\sigma)$, where Rχ(σ) remains bounded as σ → 1.

(b) $\sum_{\chi}\overline{\chi(a)}G(\sigma,\chi) = \varphi(m)\sum_{p\equiv a\ (\mathrm{mod}\ m)}\frac{1}{p^{\sigma}} + R_{\chi,a}(\sigma)$, where Rχ,a(σ) remains bounded as σ → 1.

2.3.4 Analytic Continuation


We want to analytically continue L(s, χ) from σ > 1 to σ > 0 when χ is a non-trivial Dirichlet character modulo m. We have already seen such an analytic continuation of ζ(s) in subsection 1.3.5, as (1.13). We can use the same technique to extend L(s, χ). Let $A(x) = \sum_{a\le x}\chi(a)$; then by taking x → ∞ in Abel’s partial summation formula (Appendix A), we get

$$L(s,\chi) = s\int_{1}^{\infty}\frac{A(x)}{x^{s+1}}\,dx$$

Since χ(a + m) = χ(a) for all a ∈ Z, if N = ⌊x⌋ = qm + r for some q, r ∈ Z with 0 ≤ r < m, then

$$|A(x)| = \left|\sum_{a=0}^{N}\chi(a)\right| = \left|q\sum_{a=0}^{m-1}\chi(a) + \sum_{a=0}^{r}\chi(a)\right| = \left|\sum_{a=0}^{r}\chi(a)\right| \le \sum_{a=0}^{m-1}|\chi(a)| = \varphi(m)$$

Here we used the result (a) of subsection 2.2.2 (with χ′ = χ0), which says that $\sum_{a=0}^{m-1}\chi(a) = 0$ because χ ≠ χ0, and the fact that |χ(a)| = 1 if gcd(a, m) = 1 and χ(a) = 0 otherwise. From this we conclude that |A(x)| ≤ φ(m) for all x, and hence the above integral converges for σ > 0, uniformly on compact subsets. Therefore, if χ is a non-trivial Dirichlet character modulo m, then L(s, χ) can be continued to an analytic function in the region {s ∈ C | σ > 0}.
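The key estimate |A(x)| ≤ φ(m) is easy to observe numerically. The sketch below uses the real character modulo an odd prime p given by the Legendre symbol (computed with Euler’s criterion, a choice of this illustration rather than something used in the text), together with the non-trivial character modulo 4, whose series at s = 1 is Leibniz’s series for π/4.

```python
import math


def legendre_symbol(a: int, p: int) -> int:
    """Real character (a/p) modulo an odd prime p, via Euler's criterion a^((p-1)/2) mod p."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is 0, 1 or p - 1


def chi4(n: int) -> int:
    """The non-trivial character modulo 4."""
    return (0, 1, 0, -1)[n % 4]


def max_partial_sum(chi, limit: int) -> int:
    """max over x <= limit of |A(x)|, where A(x) = sum_{a <= x} chi(a)."""
    A, worst = 0, 0
    for a in range(1, limit + 1):
        A += chi(a)
        worst = max(worst, abs(A))
    return worst


for p in (5, 7, 11, 13):
    print(p, max_partial_sum(lambda a: legendre_symbol(a, p), 10**4), "<=", p - 1)

# the (conditionally convergent) series for L(1, chi_4) approaches pi/4
partial = sum(chi4(n) / n for n in range(1, 10**5 + 1))
print(partial, math.pi / 4)
```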

2.3.5 Product of L-functions


Let $F(s) = \prod_{\chi}L(s,\chi)$, where the product is over all Dirichlet characters modulo m. Assume s is real and s > 1, i.e. s = σ > 1. From subsection 2.3.3 we know that

$$G(\sigma,\chi) = \sum_{p}\sum_{k=1}^{\infty}\frac{\chi(p^{k})}{kp^{k\sigma}}$$

Summing over χ and using the conclusion (b) of subsection 2.2.2 with b = 1, we get

$$\sum_{\chi}G(\sigma,\chi) = \varphi(m)\sum_{p}\sum_{k}\frac{1}{kp^{k\sigma}} \qquad (2.6)$$

where the sum on the right is over all primes p and positive integers k such that p^k ≡ 1 (mod m). The right-hand side of (2.6) is non-negative; hence, taking the exponential of both sides, we get:

For s real and s > 1, i.e. s = σ > 1, we have F(s) = F(σ) ≥ 1.
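For m = 3 and m = 4 there are only φ(m) = 2 characters, the trivial one and one real character, so F(σ) = L(σ, χ0)L(σ, χ) can be evaluated directly. The sketch below does this with truncated series, which slightly underestimate L(σ, χ0), so it is a plausibility check of F(σ) ≥ 1 rather than a verification; the function names are ours.

```python
import math


def principal_char(m: int):
    """The trivial character chi_0 modulo m."""
    return lambda n: 1 if math.gcd(n, m) == 1 else 0


def quadratic_char(m: int):
    """The unique non-trivial character modulo m, for m = 3 or m = 4."""
    table = {3: (0, 1, -1), 4: (0, 1, 0, -1)}[m]
    return lambda n: table[n % m]


def l_value(chi, sigma: float, terms: int = 50_000) -> float:
    """Partial sum approximating L(sigma, chi) for real sigma > 1."""
    return sum(chi(n) * n ** -sigma for n in range(1, terms + 1))


def F(m: int, sigma: float) -> float:
    """Product of L(sigma, chi) over both characters modulo m (phi(m) = 2 for m = 3, 4)."""
    return l_value(principal_char(m), sigma) * l_value(quadratic_char(m), sigma)


for m in (3, 4):
    for sigma in (1.2, 1.5, 2.0, 3.0):
        print(f"m = {m}, sigma = {sigma}: F(sigma) is approximately {F(m, sigma):.6f}")
```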

2.4 The Proof
Let P(a; m) be the set of prime numbers p such that p ≡ a (mod m), where gcd(a, m) = 1. We wish to prove that P(a; m) has an infinite number of elements. We will divide the proof into two parts; throughout, we restrict s to be real, i.e. t = 0, and write s = σ [5, §16.4].

2.4.1 Trivial Dirichlet Character


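Let χ0 denote the trivial character modulo m; then L(σ, χ0) is a real-valued function of the real variable σ > 1. From (2.3) it follows that

$$G(\sigma,\chi_0) = \sum_{p\mid m}\log\left(1 - \frac{1}{p^{\sigma}}\right) + \log(\zeta(\sigma)) \qquad (2.7)$$

As seen in subsection 1.3.1, for s = σ,

$$\lim_{\sigma\to1}(\sigma - 1)\zeta(\sigma) = 1$$

which implies that [5, §16.1]

$$\lim_{\sigma\to1}\frac{-\log(\zeta(\sigma))}{\log(\sigma - 1)} = 1 \qquad (2.8)$$

Since the finite sum over p | m in (2.7) remains bounded as σ → 1, using (2.8) in (2.7) we get

$$\lim_{\sigma\to1}\frac{-G(\sigma,\chi_0)}{\log(\sigma - 1)} = 1$$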

2.4.2 Non-trivial Dirichlet Character


Let χ be a non-trivial Dirichlet character modulo m. We need to ensure that L(1, χ) ≠ 0, and to prove this we will consider the following two cases.

Complex Character
Let χ be a complex character modulo m, i.e. a character which takes at least one non-real value, so that $\overline{\chi} \ne \chi$. From the series defining L(s, χ) we see that for real s = σ > 1,

$$L(\sigma,\overline{\chi}) = \overline{L(\sigma,\chi)}$$

and, by continuity, this also holds at σ = 1. Assume L(1, χ) = 0; then $L(1,\overline{\chi}) = 0$. Hence L(σ, χ) and $L(\sigma,\overline{\chi})$ are two distinct factors of the product $F(\sigma) = \prod_{\chi}L(\sigma,\chi)$, and both have a zero at σ = 1. In this product, L(σ, χ0) has a simple pole at σ = 1 (subsection 2.3.2) and all other factors are analytic about σ = 1 (subsection 2.3.4). Since the two zeros outweigh the simple pole, it follows that F(σ) → 0 as σ → 1. But from subsection 2.3.5 we know that F(σ) ≥ 1 for all σ > 1, a contradiction. Therefore L(1, χ) ≠ 0 when χ is a non-trivial complex character modulo m.

Real Character
Let χ be a non-trivial real character, i.e. χ(a) = 0, 1 or −1 for all a ∈ Z. We will make use of the following result, whose proof is analogous to the proof of Euler’s product formula (subsection 1.3.2) [5, §16.5]:

Suppose f is a non-negative, multiplicative function on Z+, i.e. f(ab) = f(a)f(b) for all a, b > 0 with gcd(a, b) = 1. Assume there is a constant c such that f(p^k) < c for all prime powers p^k. Then $\sum_{n=1}^{\infty}f(n)n^{-s}$ converges for all σ > 1. Moreover,

$$\sum_{n=1}^{\infty}\frac{f(n)}{n^{s}} = \prod_{p}\left(1 + \sum_{k=1}^{\infty}\frac{f(p^{k})}{p^{ks}}\right) \qquad (2.9)$$

Assume L(1, χ) = 0 and consider the function

$$\omega(s) = \frac{L(s,\chi)L(s,\chi_0)}{L(2s,\chi_0)} \qquad (2.10)$$

The zero of L(s, χ) at s = 1 cancels the simple pole of L(s, χ0), so the numerator is analytic on σ > 0 (subsection 2.3.4). The denominator is non-zero and analytic for σ > 1/2. Thus ω(s) is analytic on σ > 1/2 (compare with subsection 1.3.5). Moreover, since L(2s, χ0) has a pole at s = 1/2 (subsection 2.3.2), we have, for real s = σ, ω(σ) → 0 as σ → 1/2.
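We can rewrite (2.10) to get the infinite product expansion (subsection 2.3.2), valid for σ > 1:

$$\begin{aligned}\omega(s) &= \prod_{p}\left(1 - \frac{\chi(p)}{p^{s}}\right)^{-1}\left(1 - \frac{\chi_0(p)}{p^{s}}\right)^{-1}\left(1 - \frac{\chi_0(p)}{p^{2s}}\right)\\ &= \prod_{p\nmid m}\left(1 - \frac{\chi(p)}{p^{s}}\right)^{-1}\left(1 - \frac{1}{p^{s}}\right)^{-1}\left(1 - \frac{1}{p^{2s}}\right)\\ &= \prod_{p\nmid m}\frac{1 + p^{-s}}{1 - \chi(p)p^{-s}}\end{aligned}$$

If χ(p) = −1, the p-factor is equal to 1. Thus

$$\omega(s) = \prod_{\chi(p)=1}\frac{1 + p^{-s}}{1 - p^{-s}} \qquad (2.11)$$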

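where the product is over all primes p such that χ(p) = 1. Moreover,

$$\frac{1 + p^{-s}}{1 - p^{-s}} = \left(1 + \frac{1}{p^{s}}\right)\sum_{k=0}^{\infty}\frac{1}{p^{ks}} = 1 + \frac{2}{p^{s}} + \frac{2}{p^{2s}} + \cdots$$

Using this and applying (2.9) in (2.11), with f(p^k) = 2 when χ(p) = 1 and f(p^k) = 0 otherwise, we find that

$$\omega(s) = \sum_{k=1}^{\infty}\frac{a_k}{k^{s}} \qquad (2.12)$$

where a_k ≥ 0 and the series converges for σ > 1. Note that a_1 = 1.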
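The coefficients a_k can also be computed directly by Dirichlet convolution: L(s, χ)L(s, χ0) has coefficients Σ_{d|n} χ(d)χ0(n/d), and 1/L(2s, χ0) = Π_p(1 − χ0(p)p^{−2s}) has coefficient μ(k)χ0(k) at n = k². The sketch below (our own check, for the non-trivial character modulo 4 and the Legendre symbol modulo 5) confirms a_1 = 1 and a_n ≥ 0, and compares with what (2.9) and (2.11) predict: a_n = 2^r if n has r distinct prime factors, all with χ(p) = 1, and a_n = 0 otherwise.

```python
def mobius(k: int) -> int:
    """Moebius function by trial division."""
    result, d = 1, 2
    while d * d <= k:
        if k % d == 0:
            k //= d
            if k % d == 0:
                return 0  # d^2 divides k
            result = -result
        d += 1
    return -result if k > 1 else result


def omega_coefficients(chi, chi0, N: int) -> list[int]:
    """a_1..a_N of omega(s) = L(s, chi) L(s, chi0) / L(2s, chi0) (index 0 unused)."""
    conv = [0] * (N + 1)  # coefficients of L(s, chi) L(s, chi0)
    for d in range(1, N + 1):
        if chi(d):
            for e in range(1, N // d + 1):
                conv[d * e] += chi(d) * chi0(e)
    a = [0] * (N + 1)
    k = 1
    while k * k <= N:  # 1/L(2s, chi0) = sum_k mu(k) chi0(k) k^(-2s)
        c = mobius(k) * chi0(k)
        if c:
            for n in range(k * k, N + 1, k * k):
                a[n] += c * conv[n // (k * k)]
        k += 1
    return a


def predicted(n: int, chi) -> int:
    """2^(number of distinct prime factors) if chi(p) = 1 for every prime p | n, else 0."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            if chi(p) != 1:
                return 0
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        if chi(n) != 1:
            return 0
        count += 1
    return 2**count


def chi4(n: int) -> int:
    """Non-trivial character modulo 4."""
    return (0, 1, 0, -1)[n % 4]


def chi0_mod4(n: int) -> int:
    """Trivial character modulo 4."""
    return n % 2


def legendre5(n: int) -> int:
    """Legendre symbol (n/5), the non-trivial real character modulo 5."""
    return (0, 1, -1, -1, 1)[n % 5]


def chi0_mod5(n: int) -> int:
    """Trivial character modulo 5."""
    return 0 if n % 5 == 0 else 1


for name, chi, chi0 in (("mod 4", chi4, chi0_mod4), ("(n/5)", legendre5, chi0_mod5)):
    a = omega_coefficients(chi, chi0, 2000)
    print(name, "a_1..a_20 =", a[1:21], "min =", min(a[1:]))
```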


Since ω(s) is analytic for σ > 1/2, expanding ω(s) in a power series about s = 2 using Taylor’s theorem [1, §2.43], we get

$$\omega(s) = \sum_{j=0}^{\infty}\frac{\omega^{(j)}(2)}{j!}(s-2)^{j} \qquad (2.13)$$

where ω^{(j)}(s) is the j-th derivative of ω(s); the radius of convergence is at least 3/2 [1, §7.1]. But from (2.12), differentiating term by term, we get

$$\omega^{(j)}(2) = \sum_{k=1}^{\infty}\frac{a_k(-\log(k))^{j}}{k^{2}} = (-1)^{j}c_j \qquad (2.14)$$

where $c_j = \sum_{k=1}^{\infty}a_k(\log(k))^{j}/k^{2} \ge 0$. Now using (2.14) in (2.13) we get

$$\omega(s) = \sum_{j=0}^{\infty}\frac{c_j}{j!}(2 - s)^{j}$$

where every c_j is non-negative and

$$c_0 = \omega(2) = \sum_{k=1}^{\infty}\frac{a_k}{k^{2}} \ge a_1 = 1$$

It follows that for real s = σ in the interval 1/2 < σ < 2 we have ω(σ) ≥ 1. This contradicts the fact that ω(σ) → 0 as σ → 1/2, which we derived from our assumption; hence L(1, χ) ≠ 0.
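Combining the above two cases, and recalling that L(s, χ) is analytic near s = 1 (subsection 2.3.4), we can say that:

If χ is a non-trivial Dirichlet character modulo m, then L(1, χ) ≠ 0, and hence G(σ, χ) = log(L(σ, χ)) remains bounded as σ → 1.

This was proved by de la Vallée Poussin1 in 1896 [3, §4].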

2.4.3 Completing the Proof


Now we simply divide all terms on both sides of the conclusion (b) of subsection 2.3.3 by −log(σ − 1) and take the limit σ → 1:

$$\lim_{\sigma\to1}\sum_{\chi}\overline{\chi(a)}\frac{-G(\sigma,\chi)}{\log(\sigma-1)} = \varphi(m)\lim_{\sigma\to1}\frac{-1}{\log(\sigma-1)}\sum_{p\in\mathcal{P}(a;m)}\frac{1}{p^{\sigma}} + \lim_{\sigma\to1}\frac{-R_{\chi,a}(\sigma)}{\log(\sigma-1)}$$

From the results of subsections 2.4.1 and 2.4.2, the limit on the left-hand side is 1: the term of the trivial character tends to $\overline{\chi_0(a)} = 1$, and all the other terms vanish. On the right-hand side, the last limit is 0 because Rχ,a(σ) is bounded, and the first is φ(m)D(P(a; m)) (section 2.1). Thus D(P(a; m)) = 1/φ(m) and we are done.

If a, m ∈ Z with gcd(a, m) = 1, then

$$D(\mathcal{P}(a;m)) = \frac{1}{\varphi(m)}$$

where φ(m) is the number of positive integers n ≤ m with gcd(n, m) = 1.

Since D(P(a; m)) ≠ 0, conclusion (a) of section 2.1 implies that P(a; m) has an infinite number of elements.
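The theorem concerns Dirichlet density, but the same proportion 1/φ(m) also shows up if one simply counts primes, which is the content of the prime number theorem for arithmetic progressions mentioned in subsection 2.5.2. The sketch below counts, for a few moduli m, the share of primes p ≤ 10^6 (coprime to m) in each class a mod m; it is a numerical illustration only, and the helper names are ours.

```python
import math
from collections import Counter


def primes_upto(n: int) -> list[int]:
    """All primes p <= n (sieve of Eratosthenes)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def residue_shares(m: int, primes: list[int]) -> dict[int, float]:
    """Share of the given primes (those coprime to m) lying in each class a mod m."""
    counts = Counter(p % m for p in primes if math.gcd(p, m) == 1)
    total = sum(counts.values())
    return {a: counts[a] / total for a in sorted(counts)}


primes = primes_upto(10**6)
for m in (3, 4, 5, 8, 10, 12):
    phi = sum(1 for a in range(1, m + 1) if math.gcd(a, m) == 1)
    shares = residue_shares(m, primes)
    print(f"m = {m:2d}, 1/phi(m) = {1 / phi:.4f}:", {a: round(s, 4) for a, s in shares.items()})
```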

2.5 Some Remarks


2.5.1 Chebotarëv Density Theorem
Chebotarëv’s density theorem may be regarded as the least common generalization of Dirichlet’s theorem on primes in arithmetic progressions (1837) and a theorem of Frobenius (1880; published 1896) [28, §1.1]. Frobenius’s theorem for f(X) = X^m − 1 is implied by Dirichlet’s theorem for the same m, but not conversely. One can formulate a sharper version of Frobenius’s theorem that, for f(X) = X^m − 1, does come down to Dirichlet’s theorem. To do this we need the Frobenius substitution σp of a prime p. Once the Frobenius substitution has been defined, one can ask about the density of the set of primes p for which σp is equal to a given element of the Galois group G of f(X), that is, of the field K generated by the zeros of f(X). Note that the Frobenius map Frobp : x ↦ x^p is an automorphism of any finite field of characteristic p, whereas the Frobenius substitution σp is an automorphism of the field K of characteristic zero.
Theorem (Reformulation of Dirichlet’s Theorem). If f(X) = X^m − 1 for some positive integer m, then the set of prime numbers p for which σp is equal to a given element of the Galois group G of f(X) has a density, and this density equals 1/|G|.
1
“Recherches analytiques sur la théorie des nombres premiers.” Deuxième partie. Ann. Soc. Sci. Bruxelles 20, 281–362 (1896).

Thus the Frobenius substitution is equidistributed over the Galois group if p varies over all
primes not dividing m. This leads to the desired generalization of the theorems of Dirichlet and
Frobenius. It was formulated as a conjecture by Ferdinand Georg Frobenius, and was ultimately proved by Nikolaĭ Chebotarëv in 1922. Chebotarëv’s theorem extends this to all f(X).

Theorem (Chebotarëv’s Density Theorem). Let f (X) be a polynomial with integer coefficients
and with leading coefficient 1. Assume that the discriminant ∆(f ) of f (X) does not vanish. Let
C be a conjugacy class of the Galois group G of f (X). Then the set of primes p not dividing
∆(f ) for which σp belongs to C has a density, and this density equals |C|/|G|.

For details refer to the article by P. Stevenhagen and H. W. Lenstra Jr. [14].

2.5.2 Prime Number Theorem for Arithmetic Progressions


We can combine the results of the two chapters to discuss the quantitative aspects of the distribution of primes in arithmetic progressions. Writing π(x; m, a) for the number of primes p ≤ x with p ≡ a (mod m), we have, for gcd(a, m) = 1,

$$\pi(x;m,a) \sim \frac{\pi(x)}{\varphi(m)}$$

The analytic proof of this statement is analogous to that of the prime number theorem, with ζ(s) replaced by L(s, χ). For that proof one can refer to Davenport’s book [3, §20]. For a discussion regarding an elementary proof one may refer to the MathOverflow discussion initiated by Qiaochu Yuan: Is a “non-analytic” proof of Dirichlet’s theorem on primes known or possible?, URL (version: 2011-09-29): https://mathoverflow.net/q/16735.
One can make enormous improvements in the error term of this estimate, provided there are no Siegel zeros. Siegel zeros are rare as a consequence of the Deuring–Heilbronn phenomenon, which states that the zeros of L-functions repel each other, just as in Roth’s theorem [26] different algebraic numbers repel each other. There is a close relation between Siegel zeros and class numbers [27, §4.1]. For details refer to Andrew Granville’s article [9].

2.5.3 Tchébychev Bias


This phenomenon was first observed in a letter written by Pafnuty Tchébychev to M. Fuss on
23 March 1853:

There is a notable difference in the splitting of the prime numbers between the two
forms 4k + 3, 4k + 1: the first form contains a lot more than the second.

This bias is perhaps unexpected because, by the prime number theorem for arithmetic progressions discussed above, the primes tend to be split equally amongst the various classes p ≡ a (mod m) with gcd(a, m) = 1 for any given modulus m. Hence we have

$$\pi(x;4,1) \sim \pi(x;4,3) \sim \frac{x}{2\log(x)}$$

i.e. half the primes are of the form 4k + 1, and half of the form 4k + 3. This asymptotic result does not inform us about any of the fine details of these prime number counts, so it neither verifies nor contradicts the observation that π(x; 4, 3) > π(x; 4, 1).
But as we count the number of primes for various values of x, we observe that from time to time there are more primes of the form 4k + 1 than of the form 4k + 3, but this lead is held only very briefly and then relinquished for a long stretch. Now one might guess that 4k + 1 will occasionally take the lead as we continue to watch for bigger x. Indeed this is the case, as John Edensor Littlewood discovered in 1914:

Theorem (Littlewood, 1914). There are arbitrarily large values of x for which there are more primes of the form 4k + 1 up to x than primes of the form 4k + 3. In fact, there are arbitrarily large values of x for which

$$\pi(x;4,1) - \pi(x;4,3) \ge \frac{\sqrt{x}\,\log(\log(\log(x)))}{2\log(x)}$$

At first sight, this seems to be the end of the story. But in 1962, Stanislaw Knapowski and Pál Turán made a conjecture that is consistent with Littlewood’s result but also bears out Tchébychev’s observation:

As X → ∞, the percentage of integers x ≤ X for which there are more primes of the form 4k + 3 up to x than of the form 4k + 1 goes to 100%.

This conjecture may be paraphrased as “Tchébychev was correct almost all of the time.” For details and recent developments refer to the article by Andrew Granville and Greg Martin [15].
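The race between the two classes is easy to follow numerically. The sketch below tracks π(x; 4, 3) − π(x; 4, 1) for x ≤ 10^5 and reports the first x at which primes of the form 4k + 1 take the lead (26861, first observed by John Leech in 1957), together with the share of x for which 4k + 3 is strictly ahead. The function names are ours, and a finite computation of course says nothing about the Knapowski–Turán conjecture itself.

```python
import math


def prime_flags(n: int) -> bytearray:
    """flags[k] == 1 exactly when k is prime, for 0 <= k <= n."""
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return flags


def race_mod_4(X: int):
    """(first x with pi(x;4,1) > pi(x;4,3), share of 2 <= x <= X with pi(x;4,3) > pi(x;4,1))."""
    flags = prime_flags(X)
    diff = 0  # pi(x;4,3) - pi(x;4,1)
    first_lead, ahead = None, 0
    for x in range(2, X + 1):
        if flags[x]:
            if x % 4 == 1:
                diff -= 1
            elif x % 4 == 3:
                diff += 1
        if diff < 0 and first_lead is None:
            first_lead = x
        if diff > 0:
            ahead += 1
    return first_lead, ahead / (X - 1)


first, share = race_mod_4(10**5)
print("first x with more primes 4k+1 than 4k+3:", first)
print(f"share of x <= 10^5 where 4k+3 is strictly ahead: {share:.4f}")
```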

Conclusion

“Riemann discovered that prime numbers too could be studied by harmonic analysis,
albeit of a slightly different kind. He realized that the psi function can be thought of
as a sum of elementary waveforms;. . . . . . To me, that the distribution of prime
numbers can be so accurately represented in a harmonic analysis is absolutely
amazing and incredibly beautiful. It tells of an arcane music and a secret harmony
composed by the prime numbers.”
— Enrico Bombieri, Prime Territory

One very important tool for our understanding of primes that I couldn’t discuss in this report is computation, which enables us to build confidence in the conjectures that we are trying to prove. For example, Alan Turing2 (1912–1954) was very much interested in disproving the Riemann hypothesis by finding a counter-example using an automatic computer. For details about Turing’s attempts in this direction refer to the lecture by Yuri Matiyasevich [22].
As pointed out at the beginning of the second chapter, there can’t be any non-constant polynomial in one variable that takes only prime values. But we have the following remarkable result by Julia Robinson and Hilary Putnam:

There exists an exponential polynomial R(x0 , . . . , xk ) such that all of its positive
values for positive integer variables are prime numbers, and every prime number is
so presented.

One can find a good discussion of this theorem by Yuri Matiyasevich [8], who in 1971 extended this result [23]. This result played an important role in solving Hilbert’s Tenth Problem, whose statement I have discussed earlier [24, Conclusion]. But this result is of no practical use, since it would be computationally difficult to calculate prime numbers by solving this system of equations.
Luckily we have better ways of determining whether a given number is prime. In the 1970s, Gary Miller3 showed how a generalization of the Riemann Hypothesis allows a deterministic, polynomial-time procedure for recognizing primes; a few decades later, in 2002, Manindra Agrawal, Neeraj Kayal, and Nitin Saxena4 showed the same with completely elementary (and rigorous) methods [17]. But, so far, this new polynomial-time primality test is not good in practice. There are other ways of testing primality, such as using the arithmetic of elliptic curves, which are conjectured to run in polynomial time, but we have not even proved that they always terminate [10].
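Miller’s idea can be sketched as follows. Write n − 1 = 2^s·d with d odd; a prime n passes the strong test “a^d ≡ 1, or a^{2^r·d} ≡ −1 (mod n) for some 0 ≤ r < s” for every base a, while, assuming the generalized Riemann hypothesis, every odd composite n fails it for some a ≤ 2(log n)². The explicit constant 2 is Eric Bach’s later bound rather than Miller’s original one, and the function names are ours; for large n the correctness of this sketch rests on that unproved hypothesis.

```python
import math


def is_strong_probable_prime(n: int, a: int) -> bool:
    """Strong (Miller) test of an odd n > 2 to the base a."""
    d, s = n - 1, 0
    while d % 2 == 0:  # n - 1 = 2^s * d with d odd
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False


def is_prime_assuming_grh(n: int) -> bool:
    """Deterministic Miller test over all bases a <= 2 (log n)^2 (Bach's bound, valid under GRH)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    limit = min(n - 2, math.floor(2 * math.log(n) ** 2))
    return all(is_strong_probable_prime(n, a) for a in range(2, limit + 1))


print(is_prime_assuming_grh(561))        # False (Carmichael number 3 * 11 * 17)
print(is_prime_assuming_grh(2**61 - 1))  # True (a Mersenne prime)
```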
The practical importance of prime numbers lies in modern public-key encryption [25], which is based on the belief that there is no polynomial-time algorithm for factorising composite numbers (unlike the efficient Euclidean algorithm for finding the greatest common divisor). Most practical factoring algorithms are based on unproved but reasonable-seeming hypotheses about
2
He is widely considered to be the father of theoretical computer science and artificial intelligence.
3
“Riemann’s Hypothesis and tests for primality.” Proceedings of the seventh annual ACM symposium on Theory of Computing 1975, pp. 234–239. doi:10.1145/800116.803773
4
“PRIMES is in P.” Annals of Mathematics 160, no. 2 (2004), 781–793. doi: 10.4007/annals.2004.160.781.

the natural numbers. Although we may not know how to prove rigorously that these methods
will always produce a factorisation, or do so quickly, in practice they do. Broadly speaking, there
are two types of factorisation methods: sieve-based methods [12] and the elliptic curve method. A
nice exposition about computational aspects of analytic number theory (the distribution of
primes and the Riemann hypothesis), Diophantine equations (Fermat’s last theorem and the
abc conjecture) and elementary number theory (primality and factorisation) can be found in
Carl Pomerance’s article [10].
I will end my report by stating two recent developments in our understanding of prime numbers:

• In 2013, Yitang Zhang5 proved that

$$\liminf_{n\to\infty}(p_{n+1} - p_n) < 7\times10^{7}$$

where p_n denotes the n-th prime. This was a result of Zhang’s attempt to prove the Twin Primes conjecture. The proof was inspired by ideas of Bombieri–Friedlander–Iwaniec6 and the Goldston–Pintz–Yıldırım7 sieve (which makes the connection between prime gaps and primes in arithmetic progressions). A few months later, James Maynard and Terence Tao (independently) gave a different and simpler approach than Zhang’s, which reduced the bound on gaps between primes to 246 [23]. One interesting aspect of the improved bounds is that the best bounds were obtained by using important theorems in Algebraic Geometry, known as the ‘Weil Conjectures’, which at first glance appear to be far removed from Twin Primes [18].

• In 2016, Kannan Soundararajan and Robert Lemke Oliver8 discovered that primes seem to avoid being followed by another prime with the same final digit. They presented both numerical and theoretical evidence that prime numbers repel other would-be primes that end in the same digit, and have varied predilections for being followed by primes ending in the other possible final digits [19].

5
“Bounded gaps between primes.” Annals of Mathematics 179, no. 3 (2014), 1121–1174. doi:10.4007/annals.2014.179.3.7
6
Bombieri, E., Friedlander, J. B. and Iwaniec, H. “Primes in arithmetic progressions to large moduli.” Acta Mathematica 156 (1986), 203–251. doi:10.1007/BF02399204. http://projecteuclid.org/euclid.acta/1485890416.
7
Goldston, D. A., Pintz, J. and Yıldırım, C. Y. “Primes in Tuples I.” Annals of Mathematics 170, no. 2 (2009): 819–862. doi:10.4007/annals.2009.170.819. https://arxiv.org/abs/math/0508185
8
“Unexpected Biases in the Distribution of Consecutive Primes.” Proceedings of the National Academy of Sciences 113, no. 31 (2016): E4446–E4454. doi:10.1073/pnas.1605366113. https://arxiv.org/abs/1603.03720

Appendix A

Abel’s Summation Formula

This formula is based on our understanding of the Riemann–Stieltjes integral. We will follow [7, §22.5].

Theorem. Suppose that a1, a2, . . . is a sequence of numbers, $A(t) = \sum_{n\le t}a_n$, and f(t) is any function of t. Then

$$\sum_{n\le x}a_nf(n) = \sum_{n\le x-1}A(n)\bigl(f(n) - f(n+1)\bigr) + A(x)f(\lfloor x\rfloor) \qquad (*)$$

If, in addition, a_j = 0 for all j less than some positive integer m, and f(t) has a continuous derivative for t ≥ m, then

$$\sum_{n\le x}a_nf(n) = A(x)f(x) - \int_{m}^{x}A(t)f'(t)\,dt \qquad (**)$$

Proof. We rewrite the left-hand-side of (*) as


X    
an f (n) = A(1)f (1) + A(2) − A(1) f (2) + . . . + A(bxc) − A(bxc − 1) f (bxc)
n≤x
   
= A(1) f (1) − f (2) + . . . + A(bxc − 1) f (bxc − 1) − f (bxc) + A(bxc)f (bxc)

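Since $A(\lfloor x \rfloor) = A(x)$, this proves (*). To deduce (**) we observe that $A(t) = A(n)$, a constant, when $n \le t < n + 1$, and so
$$A(n)\bigl(f(n) - f(n+1)\bigr) = -\int_n^{n+1} A(t) f'(t)\,dt.$$

Also $A(t) = 0$ when $t < m$, and $A(t) = A(x)$ when $\lfloor x \rfloor \le t \le x$, so that $A(x) f(\lfloor x \rfloor) = A(x) f(x) - \int_{\lfloor x \rfloor}^{x} A(t) f'(t)\,dt$. Substituting these expressions into (*) proves (**).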
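As a sanity check on (*), both sides can be evaluated in exact rational arithmetic for a concrete choice of $a_n$ and $f$. The sketch below is only an illustration. The helper names and the test data ($a_n = (-1)^n n$, $f(t) = 1/t^2$, $x = 47/2$) are hypothetical choices of ours.

```python
import math
from fractions import Fraction


def partial_sum(a, t):
    """A(t): the sum of a(n) over 1 <= n <= t."""
    return sum(a(n) for n in range(1, math.floor(t) + 1))


def abel_lhs(a, f, x):
    """Left-hand side of (*): the sum of a(n) f(n) over 1 <= n <= x."""
    return sum(a(n) * f(n) for n in range(1, math.floor(x) + 1))


def abel_rhs(a, f, x):
    """Right-hand side of (*): sum over n <= x - 1 of A(n)(f(n) - f(n+1)), plus A(x) f(floor(x))."""
    N = math.floor(x)
    if N < 1:
        return 0  # both sides are empty sums when x < 1
    total = sum(partial_sum(a, n) * (f(n) - f(n + 1)) for n in range(1, N))
    return total + partial_sum(a, x) * f(N)


def a_example(n):
    """Hypothetical test sequence a_n = (-1)^n n."""
    return (-1) ** n * n


def f_example(t):
    """Hypothetical test function f(t) = 1/t^2, evaluated exactly at integers."""
    return Fraction(1, t * t)


if __name__ == "__main__":
    x = Fraction(47, 2)
    print(abel_lhs(a_example, f_example, x), abel_rhs(a_example, f_example, x))
```

Because `Fraction` arithmetic is exact, the two printed values should agree exactly rather than only up to rounding.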

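If we put $a_n = 1$ and $f(t) = 1/t$, we have $A(x) = \lfloor x \rfloor$ and $m = 1$, and (**) becomes (writing $\{t\} = t - \lfloor t \rfloor$ for the fractional part of $t$)
$$\begin{aligned}
\sum_{n \le x} \frac{1}{n} &= \frac{\lfloor x \rfloor}{x} + \int_1^x \frac{\lfloor t \rfloor}{t^2}\,dt\\
&= \frac{x - \{x\}}{x} + \int_1^x \bigl(t - \{t\}\bigr)\frac{1}{t^2}\,dt\\
&= 1 - \frac{\{x\}}{x} + \log(x) - \int_1^x \frac{\{t\}}{t^2}\,dt
\end{aligned}$$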
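The exact identity for $\sum_{n \le x} 1/n$ can be checked numerically. On an interval $[n, b]$ with $b \le n + 1$ we have $\{t\} = t - n$, so $\int_n^b (t - n)t^{-2}\,dt = \log(b/n) + n/b - 1$. This allows $\int_1^x \{t\}t^{-2}\,dt$ to be summed piece by piece in closed form. A minimal Python sketch, with hypothetical helper names:

```python
import math


def frac_integral(x):
    """Integral of {t}/t^2 over [1, x] for x >= 1, summed in closed form piece by piece.

    On [n, b] with b <= n + 1 the fractional part is {t} = t - n, and
    the integral of (t - n)/t^2 from n to b equals log(b/n) + n/b - 1.
    """
    total = 0.0
    n = 1
    while n < x:
        b = min(n + 1, x)
        total += math.log(b / n) + n / b - 1
        n += 1
    return total


def harmonic(x):
    """Sum of 1/n over 1 <= n <= x."""
    return math.fsum(1 / n for n in range(1, math.floor(x) + 1))


def abel_formula(x):
    """1 - {x}/x + log(x) - (integral of {t}/t^2 over [1, x]), for x >= 1."""
    frac_x = x - math.floor(x)
    return 1 - frac_x / x + math.log(x) - frac_integral(x)


if __name__ == "__main__":
    for x in (1, 2.5, 10, 37.3, 1000.7):
        print(x, harmonic(x), abel_formula(x))
```

The two columns should agree up to floating-point rounding for every $x \ge 1$.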

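Observe that
$$\left|\int_1^x \frac{\{t\}}{t^2}\,dt\right| \le \int_1^x |\{t\}|\,\frac{1}{t^2}\,dt \le \int_1^x \frac{1}{t^2}\,dt = 1 - \frac{1}{x}.$$

Since the integrand $\{t\}/t^2$ is non-negative, $\int_1^x \{t\}t^{-2}\,dt$ is an increasing function of $x$, and by the bound above it never exceeds 1. Hence it converges to a finite limit as $x \to \infty$; that is, $\int_1^\infty \{t\}t^{-2}\,dt$ converges. Thus we can write
$$\int_1^x \frac{\{t\}}{t^2}\,dt = \int_1^\infty \frac{\{t\}}{t^2}\,dt - \int_x^\infty \frac{\{t\}}{t^2}\,dt.$$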

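Therefore, we have
$$\begin{aligned}
\sum_{n \le x} \frac{1}{n} &= \log(x) + \left(1 - \int_1^\infty \frac{\{t\}}{t^2}\,dt\right) + \int_x^\infty \frac{\{t\}}{t^2}\,dt - \frac{\{x\}}{x}\\
&= \log(x) + \gamma + \int_x^\infty \frac{O(1)}{t^2}\,dt + O\!\left(\frac{1}{x}\right)
\end{aligned}$$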

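Since $\int_x^\infty t^{-2}\,dt = 1/x$, we conclude that
$$\sum_{n \le x} \frac{1}{n} = \log(x) + \gamma + O\!\left(\frac{1}{x}\right)$$

where $\gamma$ is the Euler–Mascheroni constant given by
$$\gamma = \lim_{N \to \infty}\left(1 + \frac{1}{2} + \dots + \frac{1}{N} - \log N\right) \approx 0.57721.$$

Indeed, for a positive integer $N$ we have $\{N\} = 0$, so the exact formula for $\sum_{n \le x} 1/n$ obtained above gives $1 + \frac{1}{2} + \dots + \frac{1}{N} - \log N = 1 - \int_1^N \{t\}t^{-2}\,dt$. As $N \to \infty$ this tends to $1 - \int_1^\infty \{t\}t^{-2}\,dt$, which is therefore the constant denoted $\gamma$ in the expansion.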
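The derivation shows that the error term equals $\int_x^\infty \{t\}t^{-2}\,dt - \{x\}/x$, a difference of two numbers in $[0, 1/x)$. Hence $x\left(\sum_{n \le x} 1/n - \log(x) - \gamma\right)$ should lie strictly between $-1$ and $1$ for $x \ge 1$. The Python sketch below checks this numerically. It assumes the double-precision value $\gamma \approx 0.5772156649015329$, and the function names are our own.

```python
import math

# Assumed reference value of the Euler-Mascheroni constant (double precision).
EULER_GAMMA = 0.5772156649015329


def harmonic(x):
    """Sum of 1/n over 1 <= n <= x; math.fsum keeps the rounding error small."""
    return math.fsum(1 / n for n in range(1, math.floor(x) + 1))


def scaled_error(x):
    """x * (H(x) - log(x) - gamma), which the derivation bounds by 1 in absolute value."""
    return x * (harmonic(x) - math.log(x) - EULER_GAMMA)


if __name__ == "__main__":
    for x in (10, 100.5, 1000, 10_000.5, 100_000):
        print(f"x = {x:>9}: x * error = {scaled_error(x):+.6f}")
```

At integer values of $x$ the printed quantity should be close to $1/2$, in line with the classical refinement $\sum_{n \le N} 1/n = \log N + \gamma + 1/(2N) + O(N^{-2})$.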

Bibliography

[1] Titchmarsh, E. C. The Theory of Functions. London: Oxford University Press, 1939. https://archive.org/details/TheTheoryOfFunctions

[2] Ayoub, R. An Introduction to the Analytic Theory of Numbers (Mathematical Surveys and Monographs 10). Providence: American Mathematical Society, 1963. http://doi.org/10.1090/surv/010

[3] Davenport, H. Multiplicative Number Theory. New York: Springer-Verlag, 1980. http://doi.org/10.1007/978-1-4757-5927-3

[4] Titchmarsh, E. C. The Theory of the Riemann Zeta-Function. Oxford: Oxford University Press, 1986.

[5] Ireland, K. and Rosen, M. A Classical Introduction to Modern Number Theory. New York: Springer-Verlag, 1990. http://doi.org/10.1007/978-1-4757-2103-4

[6] Derbyshire, J. Prime Obsession: Bernhard Riemann and the Greatest Unsolved Problem in Mathematics. Washington, D.C.: Joseph Henry Press, 2003. http://www.johnderbyshire.com/Books/Prime/page.html

[7] Hardy, G. H. and Wright, E. M. An Introduction to the Theory of Numbers. Oxford: Oxford University Press, 2008.

[8] Matiyasevich, Y. “Formulas for Prime Numbers.” In Kvant Selecta: Algebra and Analysis, II, edited by S. Tabachnikov, pp. 13–24. Providence: American Mathematical Society, 1999. (The Russian original was published in Kvant 1975, no. 5, pp. 5–13, http://kvant.mccme.ru/1975/05/formuly_dlya_prostyh_chisel.htm; translated by N. K. Kulman.)

[9] Granville, A. “Analytic Number Theory.” In The Princeton Companion to Mathematics, edited by T. Gowers, J. Barrow-Green and I. Leader, pp. 332–348. Princeton and Oxford: Princeton University Press, 2008. Preprint available at http://www.dms.umontreal.ca/~andrew/PDF/PrinceComp.pdf

[10] Pomerance, C. “Computational Number Theory.” In The Princeton Companion to Mathematics, edited by T. Gowers, J. Barrow-Green and I. Leader, pp. 348–362. Princeton and Oxford: Princeton University Press, 2008. Preprint available at https://math.dartmouth.edu/~carlp/PDF/pcm0049.pdf

[11] Beineke, J. and Hughes, C. “Great Moments of the Riemann zeta Function.” In Biscuits of Number Theory, edited by A. T. Benjamin and E. Brown, pp. 199–215. Providence: Mathematical Association of America, 2009.

[12] Pomerance, C. “The Search for Prime Numbers.” Scientific American 247, no. 6 (1982), 136–147. doi:10.1038/scientificamerican1282-136. https://math.dartmouth.edu/~carlp/PDF/search.pdf

[13] Bombieri, E. “Prime Territory.” The Sciences 32, no. 5 (1992), 30–36. http://doi.org/10.1002/j.2326-1951.1992.tb02416.x

[14] Stevenhagen, P. and Lenstra, H. W. “Chebotarëv and His Density Theorem.” The Mathematical Intelligencer 18, no. 2 (1996), 26–37. http://dx.doi.org/10.1007/bf03027290

[15] Granville, A. and Martin, G. “Prime Number Races.” The American Mathematical Monthly 113, no. 1 (2006), 1–33. http://doi.org/10.2307/27641834

[16] Montgomery, H. L. and Wagon, S. “A Heuristic for the Prime Number Theorem.” The Mathematical Intelligencer 28, no. 3 (2006), 6–9. http://doi.org/10.1007/BF02986877

[17] Spencer, J. and Graham, R. “The Elementary Proof of the Prime Number Theorem.” The Mathematical Intelligencer 31, no. 3 (2009), 18–23. http://doi.org/10.1007/s00283-009-9063-9

[18] Sreekantan, R. “Yitang Zhang and the Twin Primes Conjecture.” At Right Angles 2, no. 3 (2013), 14–17. http://teachersofindia.org/en/article/yitang-zhang-and-twin-primes-conjecture

[19] Klarreich, E. “Mathematicians Discover Prime Conspiracy.” Quanta Magazine, March 13, 2016. https://www.quantamagazine.org/mathematicians-discover-prime-conspiracy-20160313/

[20] Tao, T. (22 January 2009) Structure and Randomness in the Prime Numbers [Video file]. University of California, Los Angeles. Retrieved from https://www.youtube.com/watch?v=PtsrAw1LR3E. Lecture slides available at https://www.math.ucla.edu/~tao/preprints/Slides/primes.pdf

[21] Taylor, R. (27 April 2012) Primes and Equations [Video file]. Institute for Advanced Study, Princeton. Retrieved from https://www.youtube.com/watch?v=5pTaZu3C--s

[22] Matiyasevich, Y. (10 December 2012) Alan Turing and Number Theory [Video file]. Alan Turing Centenary Conference, Manchester, 2012. Retrieved from https://www.youtube.com/watch?v=5ADOT72v_9U. Lecture slides available at http://videolectures.net/turing100_matiyasevich_number_theory/

[23] Granville, A. (20 August 2015) The Patterns in the Primes [Video file]. Yale University, New Haven. Retrieved from https://www.youtube.com/watch?v=pO7Egc5Dtqs. Lecture slides available at http://www.dms.umontreal.ca/~andrew/PrimeLect.html

[24] Korpal, G. “Diophantine Equations.” Summer Internship Project Report, guided by Prof. S. A. Katre (18 May 2015 – 16 June 2015)

[25] Korpal, G. “Enigma Cryptanalysis.” Summer Internship Project Report, guided by Prof. Geetha Venkataraman (6 July 2015 – 26 July 2015)

[26] Korpal, G. “Diophantine Approximations.” Winter Internship Project Report, guided by Prof. R. Thangadurai (13 December 2015 – 8 January 2016)

[27] Korpal, G. “Number Fields.” Summer Internship Project Report, guided by Prof. Ramesh Sreekantan (1 June 2016 – 31 July 2016)

[28] Korpal, G. “Reciprocity Laws.” Winter Internship Project Report, guided by Prof. Chandan Dalawat (9 December 2016 – 7 January 2017)

Prepared in LaTeX 2ε by Gaurish Korpal

