Parameswaran 2023
S.A. Parameswaran
The Rudolf Peierls Centre for Theoretical Physics
Oxford University, Oxford OX1 3PU, UK
Contents
6.5 Example: α decay
9 Radiative transitions
10 Selection rules
10.1 Selection rules from conservation of angular momentum
10.2 Selection rules from matrix elements
12 Atomic Structure
12.1 Electronic states of helium
12.1.1 Helium ground state
12.1.2 First excited state of helium
12.1.3 Higher excited states
12.1.4 Variational Approximation for the Ground State of Helium
12.2 Many-electron atoms
12.3 Ionisation energy as a function of nuclear charge Z
What these lectures are about
These are notes for 20 hours of lectures on Quantum Mechanics delivered over the last three weeks of Hilary
Term and the first two weeks of Trinity Term. They follow on from Fabian Essler’s lectures on Quantum
Mechanics; together the two courses are intended to provide an introduction to quantum mechanics, with
no particular significance to how topics are split between the lecturers — except for the obvious fact that
the present set of lectures will build on the earlier material.
Broadly speaking, the Quantum Mechanics course up to this point has focused on developing the pos-
tulates and formalism of quantum theory, and using them to solve the Schrödinger equation for a variety
of physically-meaningful potentials: square wells, harmonic oscillators, and hydrogen-like atoms. This more
or less exhausts the set of simple examples that can be exactly solved. However, problems “close to” these
canonical examples often arise, roughly for two different reasons. The first is that we are often interested
in how weak external fields can be used to extract information about the quantum-mechanical eigenstates
and eigenenergies of one of these standard problems. The second is that we sometimes want to understand
the new eigenstates and eigenenergies for a problem where the Hamiltonian is a small deformation of one of
our standard examples. Often, such problems can be understood by systematically building on our results
for the original solvable example via a technique known as “perturbation theory” — usually organized as a
series expansion in the “small” parameter governing the deviation from the solvable point. [Those of you
who have heard of Feynman diagrams may be interested to know that these are just an ingenious way of
organizing a very complicated perturbation series.] Perturbation theory can be time-independent — where
we examine how energies and wavefunctions change on introducing a small correction — or time-dependent,
where the focus is on asking how the state of the system changes over time. We will discuss both of these,
but also other approximation methods, such as the variational and WKB methods for time-independent
problems, and the sudden and adiabatic approximations for time-dependent ones, which provide alternatives
to perturbation theory in cases where it is difficult to implement.
In tandem, we will apply these ideas to develop a deeper understanding of atomic physics, addressing
important phenomena such as the Stark and Zeeman effects (the perturbative effects of electric and magnetic
fields on atoms) as well as understanding how electromagnetic radiation can drive transitions between energy
levels, and how this can be used to understand atomic structure.
We will also touch upon three deep fundamental aspects of quantum mechanics, each of which can be
viewed as a “trailer” for more advanced topics covered later on in the physics curriculum: the motion of
charged particles in magnetic fields, the properties of identical particles, and the physics of density matrices.
Throughout, we will use the ideas of symmetry and the techniques for solving the Schrödinger equation
developed over the year so far in order to organize and simplify our calculations.
The following textbooks are useful references for this material:

• Principles of Quantum Mechanics, R. Shankar, denoted RS. This is an all-time classic text by one of
the master pedagogues of the subject. While occasionally wordy, this probably has some of the most
detailed and insightful physical explanations and analogies. It has a very clear introduction to the
mathematics of quantum mechanics, and also contains excellent introductions to some more advanced
topics such as Berry’s phase and path integrals for the interested reader.
• The Physics of Quantum Mechanics, J. Binney and D. Skinner (Oxford University Press) [B&S].
Probably the book that most closely matches the progression through the course, as it was written by
one of the previous lecturers.
• Introduction to Quantum Mechanics, D.J. Griffiths, [DG]. This is a well-known undergraduate text-
book and is very pedagogical with many good problems and worked examples. Beware early editions,
which used a non-standard definition of creation and annihilation operators that ought to have at-
tracted criminal sanctions against the author.
• Modern Quantum Mechanics, J.J. Sakurai and J. Napolitano [SN]. Another classic textbook, with
often somewhat more concise explanations than many of the others; yet particularly good for clear
exposition of the mathematical aspects of groups and representations.
• Quantum Mechanics, Vol. III. of A Course of Theoretical Physics by L.D. Landau and E.M. Lifshitz
[LL]. A more advanced textbook, that covers far more material than possible in an undergraduate
course. Nevertheless an invaluable resource with many unusual and ‘one of a kind’ discussions and
tricks. Very useful to refer to if under interrogation by a Russian colleague, and heavy enough to serve
as a defensive weapon of last resort.
Hilary Term
1. f Motion of a charged particle in a magnetic field [B&S Chapter 9 and Section 10.1.3.]
2. a Time-independent perturbation theory [B&S Section 10.1, RS Chapter 17, DG Chapter 6, SN Sec-
tions 5.1 and 5.2.]
5. a The variational principle [B&S Section 10.2, RS Section 16.1, DG Section 8.1, SN Section 5.4]
6. a WKB Approximation [B&S Section 12.6, RS Section 16.2, DG Chapter 9.] Note: WKB is not
examinable, but may be conceptually helpful for exam questions involving potential wells, and also in
later years of the course.
7. a Time-dependent problems: sudden and adiabatic approximations [B&S Sections 10.3 and 12.1,
RS Chapter 18, DG Chapter 7.]
8. a Time-dependent perturbation theory [B&S Sections 10.3 and 12.1, RS Chapter 18, DG Chapter 7.]
9. a Fermi’s Golden Rule [B&S Sections 10.3 and 12.1, RS Chapter 18, DG Chapter 7.]
Trinity Term
10. p Radiative transitions [B&S Sections 10.3 and 12.1, RS Chapter 18, DG Chapter 7.]
11. p Selection rules [B&S Sections 10.3 and 12.1, RS Chapter 18, DG Chapter 7.]
12. f Identical particles [B&S Sec 11.1, RS Section 10.3, DG Chapter 5, SN Chapter 7.]
13. p Atomic Structure [B&S Section 11.2, SN Section 7.4] Note: the 'term' notation (writing atomic
levels in the form ^{2S+1}L_J) is not strictly speaking examinable but rears its head again on the B3 paper
in the third year.
14. f Density operators [B&S Section 6.3, RS Sec 4.2.] Note: density matrices are not examinable, but
are very useful to clear up confusions with the differences between classical and quantum probability,
and for anyone contemplating further study in fields such as atomic physics or quantum information.
1 Quantum Mechanics of Charged Particles in Magnetic Fields
In this section of the course, we discuss how to write down the Hamiltonian for a charged particle in
a magnetic field. This will naturally lead us to some fundamental questions on the connection between
quantum mechanics and electromagnetism. It is also a necessary first step in order for us to understand the
energy levels and eigenstates for quantum systems of charged particles in magnetic fields.
There are two cases that will be of interest to us. The first is when the magnetic field acts on an otherwise
free particle, i.e. there are no forces besides the Lorentz force. In this situation, we must solve the problem
of the magnetic field at “step zero” to determine the energy levels and eigenstates, and this immediately
raises the various fundamental issues alluded to earlier. (We will restrict ourselves to a full solution only
in the simplest case of a uniform magnetic field). In the second case, the magnetic field is applied on top
of an existing potential — usually the Coulomb potential of an atomic nucleus — and so our goal will be
to understand how the field changes the energy levels that we produced by solving the original, field-free
problem at “step zero”. We will defer discussion of this case until we introduce perturbation theory, and
cover the Zeeman effect in Section 3.
The Hamiltonian for a particle of charge q and mass m in an electromagnetic field is

H = (1/2m)[p − qA(x)]² + qϕ(x),   (1)
where ϕ and A are the electrostatic (scalar) potential and magnetic vector potential that produce the
electromagnetic fields in which the particles move, which are in turn given by
B = ∇×A (2)
E = −∇ϕ − ∂t A. (3)
This principle of modifying the momentum to include the vector potential is more generally applicable and
is sometimes called “minimal coupling”.
As promised, we will now check that (1) reproduces the equations of motions expected for a classical
particle in an electromagnetic field, i.e. that
m (d²/dt²)⟨x⟩ = q⟨v × B + E⟩,   (4)
although, as we will see, the magnetic field part of the Lorentz force on the RHS will require some modifi-
cation in quantum mechanics.
Recall that for any observable Q (which possibly depends explicitly on time), we have
(d/dt)⟨Q⟩ = −(1/iℏ)⟨[H, Q]⟩ + ⟨∂Q/∂t⟩.   (5)
Let us set Q = x_k, i.e. the k-th Cartesian component of x. Using the standard commutator product rule,
i.e. that for any two operators P, Q we have [P², Q] = P[P, Q] + [P, Q]P, we find
[H, x_k] = (1/2m)[(p − qA)², x_k] = −(iℏ/m)(p_k − qA_k),   (6)
where we have used the fact that xk has no explicit time dependence, and that the vector potential is only
a function of the position and hence commutes with xk . This means that the particle velocity is simply
v_k ≡ (d/dt)⟨x_k⟩ = −(1/iℏ)⟨[H, x_k]⟩ = (1/m)⟨p_k − qA_k⟩.   (7)
To obtain the acceleration, we next compute

[H, v_k] = (1/m)[H, p_k − qA_k]
         = (1/m){ (1/2m) Σ_l [(p_l − qA_l)², p_k − qA_k] + [qϕ, p_k − qA_k] }
         = (1/m){ (1/2m) Σ_l ( (p_l − qA_l)[p_l − qA_l, p_k − qA_k] + [p_l − qA_l, p_k − qA_k](p_l − qA_l) ) + iℏq ∂ϕ/∂x_k }
         = (−iℏ/m){ (q/2m) Σ_l [ (p_l − qA_l)(∂A_l/∂x_k − ∂A_k/∂x_l) + (∂A_l/∂x_k − ∂A_k/∂x_l)(p_l − qA_l) ] − q ∂ϕ/∂x_k }
         = (−iℏ/m){ (q/2) Σ_l [ v_l(∂A_l/∂x_k − ∂A_k/∂x_l) + (∂A_l/∂x_k − ∂A_k/∂x_l)v_l ] − q ∂ϕ/∂x_k }.   (8)
We now observe that

(v × B)_k = Σ_{l,m} ϵ_{klm} v_l B_m = Σ_{l,m,p,q} ϵ_{klm} ϵ_{mpq} v_l ∂_p A_q = Σ_{l,p,q} (δ_{kp}δ_{lq} − δ_{kq}δ_{lp}) v_l ∂_p A_q = Σ_l v_l(∂_k A_l − ∂_l A_k),

while a similar argument shows that Σ_l (∂_k A_l − ∂_l A_k)v_l = −(B × v)_k. Using (5) with
Q = v, simplifying according to (8) and being careful to remember the possible explicit time-dependence in
A, we find that

d⟨v⟩/dt = (1/m)(∂/∂t)⟨p − qA⟩ + (q/m)⟨−∇ϕ + (v × B − B × v)/2⟩ = (q/m)⟨−∇ϕ − ∂_t A + (v × B − B × v)/2⟩,   (9)

so that

m d²⟨x⟩/dt² = m d⟨v⟩/dt = q⟨E + (v × B − B × v)/2⟩,   (10)
where the RHS is the correct quantum version of the Lorentz force. The peculiar ‘antisymmetrized’ rewriting
of the cross product is necessary to account for the fact that B and v depend on p and x and hence could
be non-commuting, so we should be careful about their ordering; clearly it reduces to the usual v × B in
the classical limit. You can verify that it also does so in cases where ∇ × B = 0, which includes the uniform
field considered in John Chalker’s notes. Eq. (10) here is more general and includes cases where ∇ × B ̸= 0,
which requires either non-zero currents or time-varying electrical fields à la Maxwell’s equations.
1.2 Aharonov-Bohm Effect
If you step back and think about the minimal coupling prescription in light of what you have learned in the
electromagnetism course, something should bother you. The Hamiltonian (1) involves the vector potential
A, and not just the magnetic field B. This is surprising because in a standard classical presentation of
electromagnetism, we are often told — and indeed, learn from experience — that B is “physically real”
whereas A is just a mathematical convenience. However, in quantum mechanics, Aharonov and Bohm
devised an experimental set-up to show that A is just as “physically real” as B. This can be done by
arranging for a charged particle to move in a region where A ̸= 0 but B = 0, and showing that the
nonvanishing of A has physical consequences — in the Aharonov-Bohm example, manifest in an interference
pattern. The proposed arrangement is shown in Fig. 1.
FIG. 1: Arrangement demonstrating the Aharonov-Bohm effect, using a version of the double slit experiment. A
coherent beam of charged particles is incident on double slits in the standard way, and forms an interference pattern
on a screen behind. There is a tube perpendicular to the plane of this drawing in the region between the double slits
and the screen, carrying magnetic flux. The magnetic field outside the tube is zero everywhere. The charged particles
cannot reach the interior of the tube, and therefore would not experience any Lorentz force in a classical version of
the experiment. Outside the tube, there is necessarily non-zero vector potential (see main text) and this affects the
interference pattern.
In the set-up shown, non-zero flux Φ within the tube implies a non-zero vector potential somewhere in
the region outside the tube: this follows from Stokes’ theorem, which gives
Φ = ∫_S (∇ × A) · dS = ∮_C A · dl   (11)
for the flux Φ through a loop C = ∂S bounding a surface S. The result of the experiment is that the
position of the interference fringes on the screen depends on the value of Φ, demonstrating that A is just
as ‘physically real’ as B. The experiment was done by R.G. Chambers at Bristol University, as described
in Phys. Rev. Lett. 5, 3 (1960).
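
To get a feel for the size of the effect, the short numerical sketch below uses the standard result (quoted here without derivation) that the two interfering paths acquire a relative phase qΦ/ℏ, so that one flux quantum h/q shifts the pattern by exactly one fringe. The solenoid parameters are illustrative, not taken from the text.

```python
import numpy as np

hbar = 1.054571817e-34   # reduced Planck constant, J s
q = 1.602176634e-19      # elementary charge magnitude, C

def fringes_shifted(B_tube, radius):
    """Number of fringes the pattern shifts, for a thin solenoid of field
    B_tube (T) and cross-sectional radius (m); values are illustrative."""
    flux = B_tube * np.pi * radius**2      # enclosed flux Phi
    return q * flux / (2 * np.pi * hbar)   # phase q*Phi/hbar, in units of 2*pi

# one flux quantum h/q corresponds to exactly one fringe:
print(fringes_shifted(0.1, 1e-6))   # ~76 fringes for these parameters
```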
A gauge transformation

A → A′ = A + ∇χ,  ϕ → ϕ′ = ϕ − ∂_t χ,   (12)

where χ(x, t) is an arbitrary smooth function, leaves the fields E and B, defined by (2) and (3), unchanged (verify this, if you don't recall how it works.)
The problem we face is that since the Hamiltonian (1) involves A it depends on the gauge — which seems
to be very unpleasant, since we wouldn’t want any physical consequences derived using it to depend on our
choice of gauge. Remarkably, it turns out that although the Hamiltonian is indeed gauge-dependent,
physical quantities such as the energy levels and the probability density — though not the probability
amplitude, a.k.a. the wavefunction — are gauge independent. This is best understood by considering how the
gauge transformation affects the time-dependent Schrödinger equation (TDSE): suppose the wavefunction
ψ satisfies
iℏ∂_t ψ = Hψ  with  H = (1/2m)(p − qA(x))² + qϕ(x).   (13)
Under a gauge transformation, we have
H → H′ = (1/2m)(p − qA′(x))² + qϕ′(x),   (14)
where the primes indicate the gauge-transformed potentials defined in (12). Now, if we simultaneously
transform the wavefunction according to
ψ → ψ′ = e^{iqχ/ℏ} ψ   (15)
then, as we can check by substitution, the Schrödinger equation is once again satisfied, iℏ∂t ψ ′ = H ′ ψ ′ .
Observe that, since the gauge transformation multiplies ψ by a phase of unit modulus, it does not change the
probability density |ψ|² = |ψ′|², as promised; each eigenfunction picks up the phase factor, but the eigenvalues
(energy levels) remain unchanged since the form of the TDSE (and hence the TISE) is invariant.
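
The substitution check is easy to automate. Below is a minimal symbolic sketch (using sympy, in one dimension, starting from the free-particle solution with A = ϕ = 0 and an arbitrarily chosen gauge function χ) verifying that ψ′ = e^{iqχ/ℏ}ψ satisfies the gauge-transformed TDSE:

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)
hbar, m, q, k = sp.symbols('hbar m q k', positive=True)

# free-particle solution of i*hbar dpsi/dt = -(hbar^2/2m) d^2 psi/dx^2
psi = sp.exp(sp.I * (k * x - hbar * k**2 * t / (2 * m)))
chi = x**2 * t                                  # arbitrary gauge function chi(x, t)
A1, phi1 = sp.diff(chi, x), -sp.diff(chi, t)    # A' = dchi/dx, phi' = -dchi/dt
psi1 = sp.exp(sp.I * q * chi / hbar) * psi      # transformed wavefunction, eq. (15)

def cov_p(f):        # (p - qA') acting on f, with p = -i hbar d/dx
    return -sp.I * hbar * sp.diff(f, x) - q * A1 * f

residual = sp.I * hbar * sp.diff(psi1, t) - (cov_p(cov_p(psi1)) / (2 * m)
                                             + q * phi1 * psi1)
print(sp.simplify(sp.expand(residual)))   # prints 0: the primed TDSE holds
```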
Note also that while the probability amplitude itself is gauge-dependent, any physical consequences will
still depend only on A in a gauge-invariant manner, which means ultimately the results of the experiment
described above can be framed in terms of a dependence of the interference fringes on the flux enclosed.
However, since the particles always move in a field-free region, any local picture of quantum mechanics
must require that A itself, and not just B, have physical reality, even if the answers only depend on the
latter.
In a similar way, one can check that the gauge-invariant probability current density in the presence of a vector potential is

J = (ℏ/2im)(ψ*∇ψ − ψ∇ψ*) − (q/m)|ψ|² A.   (18)
Consider first the classical problem: a particle of charge q and mass m, confined to a plane perpendicular
to a magnetic field of strength B. We know that the particle moves in a circle with a speed v that can be
determined by balancing the centripetal acceleration and the Lorentz force:
mv²/r = qvB,   (19)
which yields an angular frequency for the motion of
ω_c ≡ v/r = qB/m.   (20)
This is known as the cyclotron frequency. Note that there is an analogy with the harmonic oscillator, in the
sense that in both cases the frequency is independent of the amplitude (or speed) of the motion. As we will
see, this analogy carries over to the quantum treatment. Generalizing the problem to allow the particle to
move in three dimensions, we see that the momentum parallel to the field will be conserved, and therefore the
particle trajectory is a helix whose projection onto the xy plane describes uniform circular motion at the
cyclotron frequency.
To discuss the quantum case, we need to first pick a gauge for the vector potential. Let us fix the
magnetic field in the z direction, and choose the so-called 'Landau gauge'

A = (0, Bx, 0).   (21)

Since the Hamiltonian then commutes with p_y, we can seek eigenfunctions of the form

ψ(x, y) = e^{iky} φ(x),   (22)

and substituting into the TISE gives

[p_x²/2m + ((qB)²/2m)(x − x₀)²] φ(x) = E φ(x),  x₀ ≡ ℏk/qB = kℓ_B²,  ℓ_B ≡ √(ℏ/qB).   (23)

Here ℓ_B is a lengthscale known as the magnetic length, and is the size of the smallest possible circular orbit
in the quantum problem.
Observe that (23) is the TISE for a displaced harmonic oscillator with spring constant κ = (qB)²/m and
hence frequency √(κ/m) = qB/m = ω_c. The energy levels are therefore

E_n = (n + 1/2)ℏω_c.   (24)
The probability density associated with the eigenfunctions is |ψ(x, y)|2 = |φ(x)|2 and φ(x) is the harmonic
oscillator eigenfunction associated with the nth level, displaced in x by a distance x0 . Clearly this density
is independent of y: it forms a strip in the x-y plane, parallel to the y-axis. Such a behaviour is rather
puzzling when we think back to the classical motion, since it seems nothing like what we might expect from
quantising motion around a circle. The explanation is that, while we can find eigenfunctions that are of
the expected circular form1 , their energy is independent of the location of the centre of the circle. This
means that the Hamiltonian has many degenerate eigenstates. Any linear superposition of such degenerate
FIG. 2: Illustration of the connection between circular orbits of a charged particle in a magnetic field and the Landau
level wavefunction given in (23). This wavefunction is a superposition of states that are degenerate in energy and each
represent motion around a circle, with centres on the line x = x0 . Taken individually, these states have a probability
current running around their circumference, as illustrated for the lowest circle in the figure. When superimposed,
the current densities in the y-direction add to produce net current densities parallel to the y-axis, while the current
densities in the x-direction cancel, as shown in the upper half of the figure and as found in (25).
states will also be an eigenstate. We can therefore build up probability density in a strip by superposing
many eigenstates each describing circular motion, with their centres laid out on the line x = x0 , as shown
in Fig. 2.
To test this interpretation, we can evaluate the probability current density for this state. Using (18) with
the wavefunction (22) and taking φ(x) real, we have current components
J_x = 0  and  J_y = (ℏk/m − qBx/m)|φ(x)|² = −(qB/m)(x − x₀)|φ(x)|²,   (25)
which is consistent with the picture of superimposed circular orbits as in Fig. 2, since in this picture
contributions to Jx from different orbits cancel, and contributions to Jy add with a sign that depends on
the sign of x − x0 .
We have argued that the energy levels in this system (which are known as Landau levels) are degenerate,
and it is interesting to ask how many states there are in total at a given energy. Consider a system of size
Lx × Ly and take periodic boundary conditions in the y-direction. Then in the usual way k = (2π/Ly )m
with m an integer (not the mass of the particle!). This implies that x₀ = 2π(ℓ_B²/L_y)m. Now the value of
x0 should lie in the range 0 < x0 < Lx , otherwise the state would have its centre outside the system, so the
integer m must lie in the range
0 < m < L_x L_y/(2πℓ_B²).   (26)
1
You can confirm this yourself by reproducing the calculation of energy levels and wavefunctions but working in 'symmetric
gauge' where A = ½ B × x, which preserves rotational invariance around z so that states can be labeled by their L_z quantum
numbers. Note that matching the spectra between the two gauges is a little tricky: you will find that the symmetric gauge
energy levels have the form E_{n,m} = ℏω_c(n + 1/2 + (|m| − m)/2), where n = 0, 1, 2, . . . and m = 0, ±1, ±2, . . . label the radial and L_z
quantum numbers. Observe the asymmetry between positive and negative values of m — only the former lead to no change in
the energy, which is important when you set about computing the degeneracy of a Landau level. (Ignore O(1) corrections when
you do so, as this allows you to drop the apparent extra contribution of the m < 0 states to the degeneracy of higher Landau
levels.) It is important to always remember that both choices of gauge are equally legitimate and there is nothing inherently
physical about either choice — they are more a matter of computational convenience. Similarly the preservation of translational
versus rotational symmetry is really an artefact of the gauge choice rather than an inherent property of the system.
This means that the number of states in a Landau level is proportional to the area L_x L_y of the system, and
that the area per state is 2πℓ_B². This extensive degeneracy of a Landau level is very unusual and arises in a
handful of quantum mechanical problems, nearly all of which are extremely interesting. The reason for this
is that when we consider perturbations — e.g. due to an external electrostatic potential or incorporating
interactions between particles — the methods of degenerate perturbation theory that we will discuss later
on in this course are very challenging to apply to the extensively-degenerate Landau level. For the specific
case of electrons in two dimensions, adding potentials or interactions or both to Landau levels leads to a
remarkably rich set of phenomena linked to the integer and fractional quantum Hall effect. Developing an
understanding of these phenomena has led to three Nobel Prizes to date, and remains an important area
of research. At Oxford, these and related problems are studied by several faculty in theoretical physics
including Profs. Chalker, Sondhi, Simon, and myself.
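
To get a feel for the numbers in (26), here is a quick numerical sketch evaluating the magnetic length and the Landau-level degeneracy for illustrative laboratory values; note that the count N = L_x L_y/2πℓ_B² is equivalent to one state per flux quantum h/q threading the sample.

```python
import numpy as np

hbar = 1.054571817e-34  # J s
q = 1.602176634e-19     # C

B, area = 10.0, 1e-4    # a 10 T field and a 1 cm^2 sample (illustrative)
l_B = np.sqrt(hbar / (q * B))                 # magnetic length
N = area / (2 * np.pi * l_B**2)               # states per Landau level, eq. (26)
N_flux = B * area / (2 * np.pi * hbar / q)    # flux quanta through the sample
print(f"l_B = {l_B*1e9:.1f} nm, N = {N:.2e} (= {N_flux:.2e} flux quanta)")
# ~8 nm and ~2.4e11 states: a macroscopic degeneracy.
```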
If we now reintroduce the motion parallel to the magnetic field, i.e. along z, it should be clear that
translational symmetry is preserved, so that pz is conserved. So the spectrum of the problem is a mix of the
highly-degenerate behaviour of a Landau level in the plane perpendicular to the field, and a free-particle
spectrum for the motion parallel to the field. Referring back to the statistical physics course where you
should have learned about the ‘density of states’, we see that a magnetic field dramatically reshapes the
density of states of charged particles. In two dimensions, instead of the usual constant density of states
g(E) = A m/(2πℏ²) per spin species for a free electron gas, we have a series of δ-function peaks at the
cyclotron energies E_n = (n + 1/2)ℏω_c. In three dimensions, we need to include the one-dimensional motion
parallel to the field. Recall that a one-dimensional dispersion of the form E(p) = p²/2m leads to a density of states
that diverges as 1/√E; generalizing this to include an energy offset E_n, we see that the density of states
now looks something like g(E) ∝ Σ_n (E − E_n)^{−1/2} Θ(E − E_n). Observe that the cyclotron frequency —
which sets the spacing between Landau levels — depends inversely on the electronic mass. You should
also recall from your statistical mechanics lectures on quantum gases (likely just a week or so ago!) that
the thermodynamics of a Fermi gas depends sensitively on the density of states. Thus if we can measure
thermodynamic quantities in a magnetic field, we can extract the cyclotron frequency and hence the mass
of the electron.
This may seem like an overly-complicated way to measure the electron mass and indeed is not used for
free electrons2 . However, in crystalline solids, one often has complicated dispersions that can be understood
in terms of an “effective mass”, as you will learn in the condensed matter course next year. This arises due
to the interaction between the electron and the periodic array of positive charges in the crystal and can be
very different from the electron mass: for instance, it is often anisotropic, i.e. it varies with direction.
Combining the ideas of this section with semiclassical techniques akin to the WKB approximation that
we develop later in the course, it turns out that one can show that measuring electrical conductivity or
thermodynamic quantities such as the specific heat or the magnetization in the presence of a strong magnetic
field can reveal details of the effective mass of electrons. This remains one of the most powerful routes to
understanding the properties of metals. At Oxford, Prof. Amalia Coldea’s group has pioneered applications
of this method to new and interesting materials such as the iron-based superconductors and topological
semimetals.
2
Although it’s worth remembering that classical cyclotron motion was one of the original routes to measuring the electron
charge-to-mass ratio. This could then be combined with Millikan’s measurements of the electron charge to extract the mass.
2 Time-Independent Perturbation Theory
The problems you studied in the Quantum Mechanics lectures (for example: the harmonic oscillator or a
particle in an infinite square well or Coulomb potential) are atypical in being solvable exactly using pen
and paper. As noted in the introduction, often we are interested in problems that are not exactly solvable,
or in the response of an exactly solvable system to external fields. To treat such situations, we will
need approximation methods. In these lectures we will discuss different types of approximation method,
which are useful in various different settings. In one very common setting, we want to solve approximately
the TISE for a system with a Hamiltonian that differs only by a small term from a simpler Hamiltonian
for which we know the eigenvalues and eigenvectors exactly. Time-independent perturbation theory gives a
way of doing this, based on the idea of a Taylor expansion.
Generically we can write a Hamiltonian H of this type as
H = H0 + λV . (27)
Here λ is a dimensionless small parameter, H0 is the simpler Hamiltonian and V , the difference between H
and H0 , is known as the perturbation. We write the solutions to the TISE for the unperturbed problem,
which we know at the outset, in the form

H₀|n⟩ = E_n|n⟩,  ⟨m|n⟩ = δ_{mn}.   (28)

For the perturbed problem, we write the TISE as

(H₀ + λV)|ψ_n⟩ = Ẽ_n|ψ_n⟩,   (29)

and expand both the eigenstate and the eigenvalue as a Taylor series in λ:

|ψ_n⟩ = |n⟩ + λ|δψ_n^{(1)}⟩ + λ²|δψ_n^{(2)}⟩ + ⋯,  Ẽ_n = E_n + λ δE_n^{(1)} + λ² δE_n^{(2)} + ⋯   (30)
A crucial point is that all the dependence on λ is shown explicitly, so that the coefficients in the Taylor
expansion (|δψ_n^{(1)}⟩ for example) are λ-independent. It is worth commenting on the notation: for example,
in the quantity δE_n^{(2)} the subscript n is a label for the state we are considering, while the superscript (2)
indicates the order in λ of the term in the Taylor expansion. Although it looks messy to have both the
subscript and the superscript, they are each things we need to keep track of.
The key idea is simply to substitute (30) into (29). This gives

(H₀ + λV)(|n⟩ + λ|δψ_n^{(1)}⟩ + ⋯) = (E_n + λ δE_n^{(1)} + λ² δE_n^{(2)} + ⋯)(|n⟩ + λ|δψ_n^{(1)}⟩ + ⋯).   (31)
To make sense of this result, we should collect terms at each order in λ on the left and right sides of the
equation. Since the equation is intended to hold for a range of values of λ, we expect equality order-by-order
in λ. In this way we get, at O(λ) and O(λ²) respectively (the O(1) terms simply reproduce (28)),

H₀|δψ_n^{(1)}⟩ + V|n⟩ = E_n|δψ_n^{(1)}⟩ + δE_n^{(1)}|n⟩,   (32)

H₀|δψ_n^{(2)}⟩ + V|δψ_n^{(1)}⟩ = E_n|δψ_n^{(2)}⟩ + δE_n^{(1)}|δψ_n^{(1)}⟩ + δE_n^{(2)}|n⟩.   (33)
Now we want to extract from these equations expressions for the unknown quantities — the δE and the
|δψ⟩. We do this step-by-step.
First consider ⟨n| acting on (32). This gives

⟨n|H₀|δψ_n^{(1)}⟩ + ⟨n|V|n⟩ = E_n⟨n|δψ_n^{(1)}⟩ + δE_n^{(1)}⟨n|n⟩.   (34)
Using ⟨n|(E_n − H₀) = 0 and ⟨n|n⟩ = 1 we get from this a very useful result: the energy change
at first order is given by

δE_n^{(1)} = ⟨n|V|n⟩.   (35)
To get |δψ_n^{(1)}⟩ we take a similar approach, considering ⟨m| acting on (32) but now with m ≠ n. This
gives the wavefunction change at first order:

E_m⟨m|δψ_n^{(1)}⟩ + ⟨m|V|n⟩ = E_n⟨m|δψ_n^{(1)}⟩,

where we have used ⟨m|n⟩ = 0 for m ≠ n, and the fact that ⟨m|H₀ = E_m⟨m|. This implies
⟨m|δψ_n^{(1)}⟩ = ⟨m|V|n⟩/(E_n − E_m)   (36)
and hence, summing the contributions from different |m⟩s, that

|δψ_n^{(1)}⟩ = Σ_{m≠n} |m⟩ ⟨m|V|n⟩/(E_n − E_m)   (37)
Note that this requires the state n to be non-degenerate, so that for m ̸= n the denominator En − Em is
non-zero. We will treat the degenerate case later.
As an aside, we can ask whether |ψ_n⟩ given in this way is normalised. Since the first-order correction (37)
has no component along |n⟩, we have ⟨ψ_n|ψ_n⟩ = 1 + O(λ²), so to the order we work at the state remains
normalised. Finally, taking ⟨n| with the O(λ²) equation (33) and using (35) and (36), we obtain the
second-order energy shift

δE_n^{(2)} = Σ_{m≠n} |⟨n|V|m⟩|²/(E_n − E_m).   (38)
Note that again we assume energy levels are non-degenerate. Note also that the second-order shift of the
ground-state energy is necessarily negative, regardless of the perturbation, since for n = 0 every
denominator E₀ − E_m is negative.
Suppose now that some of the levels of H₀ are degenerate. Within a degenerate subspace we are free to form
new linear combinations of the unperturbed states,

|ℓ′⟩ = Σ_m U_{ℓ′m}|m⟩,   (39)

with U a unitary matrix. The new set of states is as good a choice for perturbation theory as the old set, since if H₀|ℓ⟩ = E|ℓ⟩ then
H0 |ℓ′ ⟩ = E|ℓ′ ⟩ as well. In addition, we have
⟨k′|ℓ′⟩ = Σ_{p,m} U*_{k′p} U_{ℓ′m} ⟨p|m⟩ = Σ_m U_{ℓ′m} U†_{mk′} = (UU†)_{ℓ′k′} = δ_{ℓ′k′},
so the new states are orthonormal. The key idea is that we can use an appropriate choice of basis to ensure
there is no divergence in |δψ_n^{(1)}⟩. We simply need to pick U so that ⟨ℓ′|V|n′⟩ is diagonal in the degenerate
subspace. This takes some effort but is much easier than attempting to diagonalise a much bigger matrix
that represents H in the full Hilbert space. If we make this choice, then in (37) we can replace Σ_{m≠n} with
Σ_{E_m≠E_n} and previous results hold. In particular, we have
δE_{ℓ′}^{(1)} = ⟨ℓ′|V|ℓ′⟩.   (40)
Note that the process of diagonalizing V within the degenerate subspace automatically generates the ‘first
order’ change in the wavefunctions within this space. I have put ‘first order’ in quotes since this is not,
strictly speaking, a small correction to the wavefunction; as we will see in the examples, the unitary rotation
needed to diagonalize V can substantially mix states within the degenerate subspace. In a loose sense, the
'infinite' result of blindly using the non-degenerate formula (37) indicates that within the degenerate
subspace the correction is not a simple perturbative change in λ.
As a first example, consider a two-state system with

H₀ = (ε₁ 0; 0 ε₂),  V = (V₁ 0; 0 V₂),   (41)

with ε₁ ≠ ε₂. Then

|1⟩ ≡ (1, 0)ᵀ,  |2⟩ ≡ (0, 1)ᵀ,  E₁ = ε₁  and  E₂ = ε₂.
Furthermore, since

⟨1|V|2⟩ = (1, 0) (V₁ 0; 0 V₂) (0, 1)ᵀ = 0,
we have |δψ₁^{(1)}⟩ = 0. We can conclude that H₀ + λV has an eigenvalue ε₁ + λV₁ + O(λ²) and a corresponding
eigenvector (1, 0)ᵀ + O(λ²), which of course we know is correct.
As our second example, we take H0 as before, but now
V = (0 v; v 0)  with v real.   (42)
Then

δE₁^{(1)} = ⟨1|V|1⟩ = (1, 0)(0 v; v 0)(1, 0)ᵀ = 0  and  ⟨2|V|1⟩ = v.
So

δE₁^{(2)} = |⟨1|V|2⟩|²/(E₁ − E₂) = v²/(ε₁ − ε₂)  and  |δψ₁^{(1)}⟩ = (v/(ε₁ − ε₂))|2⟩.   (43)
We can check these results by doing the exact calculation and expanding the result for small v. The
eigenvalue equation is given by

0 = det(H − E·1) = (ε₁ − E)(ε₂ − E) − λ²v²;

expanding, we get

E² − (ε₁ + ε₂)E + ε₁ε₂ − λ²v² = 0

and hence

E = (ε₁ + ε₂)/2 ± ((ε₁ − ε₂)/2)√(1 + 4λ²v²/(ε₁ − ε₂)²) = ε₁ + λ²v²/(ε₁ − ε₂) + O(λ⁴),   (44)
where we picked the positive sign in going from the middle to the right-hand expression. Hence we see that
our result for δE₁^{(2)} from perturbation theory is correct.
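
The comparison in (44) is also easy to do numerically. The following minimal sketch (with arbitrary illustrative values) diagonalises H = H₀ + λV directly and compares with the perturbative estimate (43):

```python
import numpy as np

eps1, eps2, v, lam = 1.0, 3.0, 0.5, 0.1   # illustrative values
H0 = np.diag([eps1, eps2])
V = np.array([[0.0, v], [v, 0.0]])

exact = np.linalg.eigvalsh(H0 + lam * V)[0]    # lower exact eigenvalue
pt2 = eps1 + lam**2 * v**2 / (eps1 - eps2)     # eps1 + lam^2 v^2/(eps1 - eps2)
print(exact, pt2)   # 0.9987508 vs 0.99875: they agree to O(lam^4)
```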
As our third example, we consider a degenerate problem which we construct from the second example by
setting ε1 = ε2 . As discussed, in this case we should take a linear combination of |1⟩ and |2⟩ that diagonalises
V. We set

|1′⟩ = (|1⟩ + |2⟩)/√2 ≡ (1/√2)(1, 1)ᵀ  and  |2′⟩ = (|1⟩ − |2⟩)/√2 ≡ (1/√2)(1, −1)ᵀ.
Then

⟨1′|V|1′⟩ = v,  ⟨2′|V|2′⟩ = −v,  ⟨1′|V|2′⟩ = 0,

so that

δE_{1′}^{(1)} = v  and  δE_{2′}^{(1)} = −v.   (45)
Again we can check this against the exact result, by setting ε1 = ε2 in (44), which gives E = ε1 ± v in
agreement with the results of perturbation theory.
Two aspects of this final example are noteworthy. First, observe that |1′⟩ and |2′⟩ are not 'small' changes
to the original eigenstates |1⟩ and |2⟩; indeed, we have |1′⟩ = (|1⟩ + |2⟩)/√2 and |2′⟩ = (|1⟩ − |2⟩)/√2.
Moreover, v does not appear anywhere in the expressions for |1′⟩ and |2′⟩, indicating that they are not
perturbative-in-v corrections to |1⟩ and |2⟩. This is consistent with our general discussion about the change
in the wavefunction within a
degenerate subspace. The second point is special to the 2 × 2 problem we considered: it appears as though
the degenerate perturbation theory gives the exact answer. This is in fact an artefact of restricting attention
to the degenerate subspace. You should be able to convince yourself that for |m⟩, |n⟩ in the degenerate
subspace with energy E_n = E_m = ε,

⟨m|H|n⟩ = ε δ_{mn} + λ⟨m|V|n⟩.   (46)
Since the first term is diagonal in any basis, the step of diagonalizing V within the degenerate subspace
exactly diagonalizes the part of H restricted to that subspace. However, since a generic H will involve more
than just a single degenerate subspace, this step does not exactly diagonalize all of H.
As a final example (motivated by an excellent question after a lecture), let us consider the interplay
between degenerate and non-degenerate perturbation theory. A natural question that arises is that, since
first-order degenerate perturbation theory involves a change of basis in the degenerate subspace, does it
change the answers obtained at first order in the non-degenerate case? To get some intuition for this, let
us consider the simplest non-trivial example: a H0 with a single nondegenerate level |0⟩ and a twofold
degenerate pair of levels |1⟩, |2⟩, coupled via a weak perturbation; choosing the zero of energy to correspond
to the energy of the nondegenerate level, we have the most general form possible,
H = (0 w₁ w₂; w₁ ε v; w₂ v ε),   (47)
where for simplicity we have taken the perturbation to be real. Applying perturbation theory to the non-
degenerate level |0⟩ does not pose any problem: we find that

δE₀^{(1)} = 0,  δE₀^{(2)} = −(w₁² + w₂²)/ε,  |δψ₀^{(1)}⟩ = −(w₁/ε)|1⟩ − (w₂/ε)|2⟩.   (48)

Diagonalising V within the degenerate subspace, on the other hand, picks out the combinations

|1′⟩ = (|1⟩ + |2⟩)/√2,  |2′⟩ = (|1⟩ − |2⟩)/√2,   (49)

whose energies are shifted at first order to

E_{1′} = ε + v,  E_{2′} = ε − v.   (50)
If we reverse the order of our calculations, and apply first-order perturbation theory to |0⟩ after resolving
the degeneracy, it seems as though our answers have changed: while δE₀^{(1)} = 0 as before, instead of the
second and third equations in (48), we find
δE₀^{(2)} = −((w₁ + w₂)/√2)²/(ε + v) − ((w₁ − w₂)/√2)²/(ε − v),  |δψ₀^{(1)}⟩ = −((w₁ + w₂)/(√2(ε + v)))|1′⟩ − ((w₁ − w₂)/(√2(ε − v)))|2′⟩.   (51)
However, notice that if we are working to first order for the states and second order for the energies, we can
ignore the energy shifts in the denominators ε ± v as these only produce corrections at higher order. Then, it
is straightforward to demonstrate, using (49) that (48) and (51) yield identical results. The message seems
to be that for the nondegenerate level, the second-order perturbation theory result for the new energy and
the first-order result for the corrected wavefunction do not depend on whether we first diagonalized the
perturbation for the degenerate level. However, the answers will change when we consider the correction
to the wavefunction at second order, but we will not discuss second-order perturbation theory for the
wavefunction in this course.
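
The bookkeeping in this argument is easy to check explicitly. Here is a minimal numerical sketch (with arbitrary illustrative values, and with the shifted denominators ε ± v replaced by ε, which is legitimate to this order as argued above):

```python
import numpy as np

eps, v, w1, w2 = 2.0, 0.3, 0.1, 0.2   # illustrative values (lambda set to 1)

# (48): second-order shift of |0> computed directly in the original basis
direct = -(w1**2 + w2**2) / eps

# (51): same quantity after first diagonalising the degenerate block,
# with the denominators eps +- v approximated by eps (higher order)
rotated = -(((w1 + w2) / np.sqrt(2))**2 + ((w1 - w2) / np.sqrt(2))**2) / eps

print(direct, rotated)   # identical: the cross terms reassemble the projector
```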
This simple 3 × 3 example illustrates a special case of a more general result. To see this, it's helpful
to first prove a small lemma: namely, that the projector onto a subspace is invariant under unitary
transformations that only mix states within that subspace. Intuitively, this is because one can view the projector
onto a subspace as a matrix that acts as the identity on the subspace and vanishes everywhere else, which
is clearly invariant under a unitary rotation that only acts within the subspace (there is only one way to
write the identity!) More formally, consider a unitary transformation (39) between a set of states |ℓ′ ⟩ and
|m⟩. We then have
P′ ≡ Σ_{ℓ′} |ℓ′⟩⟨ℓ′| = Σ_{ℓ′,m,m̃} U_{ℓ′m}|m⟩⟨m̃|U*_{ℓ′m̃} = Σ_{m,m̃} (U†U)_{m̃m}|m⟩⟨m̃| = Σ_m |m⟩⟨m| ≡ P,   (52)
where we have used the fact that (U † U )m̃m = δm̃m . Let us now consider the second-order correction
to the energy of a single non-degenerate level n with energy En due to a set of degenerate levels with
energy Em = E. It is helpful to again be explicit about powers of λ. Suppose we first perform degenerate
perturbation theory; this means that the degenerate levels split, so we now have levels ℓ′ with energies
E_{ℓ′} = E + λδE_{ℓ′}^{(1)}, where we have explicitly shown the λ dependence of the shifted energies. The second-order
change in the energy due to coupling between the nondegenerate level and this new set of states is
then
[δE_n^{(2)}]′ = λ² Σ_{ℓ′} |⟨ℓ′|V|n⟩|²/(E_n − E_{ℓ′})
             = λ² Σ_{ℓ′} ⟨n|V|ℓ′⟩⟨ℓ′|V|n⟩/((E_n − E) − λδE_{ℓ′}^{(1)})
             = λ² ⟨n|V P′ V|n⟩/(E_n − E) + O(λ³)
             = λ² ⟨n|V P V|n⟩/(E_n − E) + O(λ³)
             = λ² Σ_m ⟨n|V|m⟩⟨m|V|n⟩/(E_n − E_m) + O(λ³) = δE_n^{(2)} + O(λ³),   (53)
where we have used (52). Note that it is important to this argument that we have separated out the piece of
V that operates within the degenerate space, so that we can group all the terms within the degenerate space
as a projector. We see that the second-order perturbative correction answer is unchanged as promised; the
differences only appear at higher order. A similar calculation shows that the first-order correction to the
wavefunction remains unchanged to first-order in λ. It is clear that this argument generalizes to the case
when there are more degenerate subspaces, so the result holds in general.
FIG. 3: Illustration of how symmetry can lead to degeneracy: the first excited states for a particle in a square box.
Dashed lines indicate nodes of the wavefunction. The effect of taking linear combinations of the two states in the top
row is depicted in the bottom row of the figure.
Recall that Poisson’s equation
∇²ϕ(r) = −ρ(r)/ϵ₀   (56)
has a solution that is just given by Coulomb’s law,
ϕ(r) = (1/4πϵ₀) ∫ dr′ ρ(r′)/|r − r′|.   (57)
Unfortunately, the pesky factor of k 2 ψ (1) (r) on the LHS of (55) means we cannot directly apply this result.
However, it is straightforward to check3 that
ϕ̃(r) = (1/4πϵ₀) ∫ dr′ ρ(r′) e^{ik|r−r′|}/|r − r′|,   (58)
satisfies4
(∇² + k²)ϕ̃(r) = −ρ(r)/ϵ₀.   (59)
Applying this result to (55) and making the identifications ϕ̃ = ψ^{(1)}(r), 1/ϵ₀ = 2m/ℏ², ρ = V(r)ψ^{(0)}(r), we see
that the first correction to the wavefunction is

ψ^{(1)}(r) = −(m/2πℏ²) ∫ dr′ V(r′) ψ^{(0)}(r′) e^{ik|r−r′|}/|r − r′|.   (60)
In order for this result to make sense, we should ask what conditions must be imposed on V in order for the
perturbative approach to make sense. Clearly, this requires that the correction ψ^{(1)}(r) ≪ ψ^{(0)}(r) (really this
should be stated formally in terms of norms, etc., but we will be a little sloppy here for the sake of getting
quickly to the intuition). Of course, the result will work if V ≪ E, but it turns out that this approach
can be applied even for low energies if we are careful enough. One thing to note is that even if V were
very small, if it extended over all of space things would look very strange. For example, if we just took V
to be a uniform shift, then the RHS of (60) would diverge; but in this case we know the correct answer is
just to shift the wavenumber k. So it probably makes sense to require that V is nonzero only over some
finite region of space, characterized by linear dimension a. By a 'low' energy, what we mean is that ak ≲ 1. Then
we can ignore the factor e^{ik|r−r′|} in the RHS of (60) when making an order-of-magnitude estimate, which is
ψ^{(1)}(r) ∼ (m|V|a²/ℏ²) ψ^{(0)}(r), and so we see that we need
|V| ≪ ℏ²/(ma²)   (61)
in order for the perturbative calculation to make sense, and for ψ^{(1)}(r) to indeed be a 'small' correction.
This makes intuitive sense: the RHS of (61) is simply the order of magnitude of the kinetic energy the particle
would have had if confined to a region of linear dimension a. Note however that this does not require E to
be small.
This calculation can be used to derive a nice result: the fact that an arbitrarily weak negative potential
cannot bind a particle in three dimensions. Consider a well so shallow that (61) holds. We want to show
that there is no way we can get a negative energy level. Let us consider the limit E → 0. In this case,
the unperturbed wavefunction ψ^{(0)} → 1. However, since we have just shown that when (61) is satisfied
ψ^{(1)} ≪ ψ^{(0)}, it follows that the wavefunction 1 + ψ^{(1)} cannot vanish anywhere in the region where
3
This is easiest done by working in Cartesian coordinates, and showing that terms where both derivatives act on the factor e^{ik|r−r′|} combine with ones where one derivative acts on e^{ik|r−r′|} and the other on 1/|r − r′| to give an overall factor of k²ϕ̃(r).
4
Properly speaking we can add to the solution any solution of the homogeneous equation (∇² + k²)ϕ = 0, but this ambiguity
can be fixed by boundary conditions and the answer is the result we have quoted.
V ≠ 0, and so the wavefunction has no nodes. But being nodeless is a special property of the ground state (one can
prove this using the variational method) and so we conclude that there cannot be any state with energy
E < 0, such that the wavefunction decays outside the well (i.e. V cannot create a bound state). This result
is special to quantum mechanics: classically, one can always find an energy such that the particle is trapped
in the well. It is also special to three dimensions; the perturbative calculation we gave breaks down in one
and two dimensions5.
The expression (60) also has a nice interpretation for scattering problems. To see this it is convenient
to write the full first-order wavefunction as follows, using the fact that the unperturbed wavefunctions are
plane waves; being cavalier with normalization, we have
ψ(r) = e^{ik·r} − (m/2πℏ²) ∫ dr′ V(r′) e^{ik·r′} e^{ik|r−r′|}/|r − r′|.   (62)
We can view this as the superposition of an incident wave e^{ik·r} and a sum of scattered spherical waves
e^{ik|r−r′|}/|r − r′| emanating from each point r′ of the potential, with a complicated 'kernel' that explains
how one translates into the other. This is quite similar in
spirit to how we formulated scattering from a potential barrier of finite extent in d = 1 using the TISE: an
spirit to how we formulated scattering from a potential barrier of finite extent in d = 1 using the TISE: an
incident wave, usually moving to the right, was ‘scattered’ into transmitted and reflected waves. Crucially
the ‘scattered’ waves had the same magnitude of the wavevector far away from the potential, but took both
of the two possible directions. The natural generalization of this to three dimensions is for the scattered
wave to propagate radially outward from the potential at the origin, with an amplitude that depends on the
angular coordinates θ and ϕ, i.e. we would like to write (for r ≫ a) that
ψ(r) = e^{ik·r} + f(θ, ϕ) e^{ikr}/r,   (63)
where f(θ, ϕ) is the scattering amplitude, and θ, ϕ refer to the angles of the outgoing wavevector relative to the
incident k; by choosing this to lie along the z axis these match the usual polar coordinate convention. Eq. (62)
is almost of this form; it is tempting to simply drop the r′ dependence and approximate e^{ik|r−r′|}/|r − r′| ≈
e^{ikr}/r, but this is too severe: we would then lose all dependence on the direction of the outgoing wave. A more careful
approximation, valid for r ≫ a, is to realize that since the integration variable r′ in (62) is bounded in magnitude by a (since
V ≠ 0 only in a region no bigger than a), we can expand |r − r′| ≈ r(1 − r·r′/r²). We then have k|r − r′| ≈
kr − kr̂·r′ = kr − k_f·r′, where we have recognized that the outgoing momentum k_f has
magnitude k and points along r̂. Using these expansions, we have
e^{ik|r−r′|}/|r − r′| ≈ (e^{ikr}/r) e^{−ik_f·r′}.   (64)
If we now label the incident momentum as k_i (= kẑ), we see that for r ≫ a, we have

ψ(r) = e^{ik_i·r} − (e^{ikr}/r)(m/2πℏ²) ∫ dr′ e^{−ik_f·r′} V(r′) e^{ik_i·r′},   (65)
which gives us for the scattering amplitude
f(θ, ϕ) = −(m/2πℏ²) ∫ dr′ e^{−ik_f·r′} V(r′) e^{ik_i·r′} = −(m/2πℏ²) Ṽ(k_f − k_i),   (66)
where as noted the angles are those between k_f and k_i. This result is known as the first Born approximation,
and can be systematically extended using perturbation theory to generate successively better estimates
of the scattering amplitude in powers of the scattering potential. Note the rather vivid physical picture:
at this order, measuring the scattering amplitude 'takes a Fourier transform' of the scattering potential.
This is used extensively in the B6 Condensed Matter course.
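
As a concrete illustration of (66), the sketch below evaluates the Born amplitude for a screened Coulomb (Yukawa) potential V(r) = g e^{−μr}/r, whose Fourier transform is known in closed form, and checks the numerical radial integral against it. The potential and parameter values are illustrative, not taken from the text.

```python
import numpy as np
from scipy.integrate import quad

hbar = m = g = 1.0      # natural units; g, mu, k are illustrative
mu, k = 2.0, 1.5        # screening scale and incident wavenumber

def f_born(theta):
    q = 2 * k * np.sin(theta / 2)   # momentum transfer |k_f - k_i| (elastic)
    # radial form of the 3d Fourier transform: (4 pi / q) \int r V(r) sin(qr) dr
    Vq = (4 * np.pi / q) * quad(lambda r: r * g * np.exp(-mu * r) / r
                                * np.sin(q * r), 0, np.inf)[0]
    return -m * Vq / (2 * np.pi * hbar**2)   # eq. (66)

theta = 0.7
analytic = -2 * m * g / (hbar**2 * ((2 * k * np.sin(theta / 2))**2 + mu**2))
print(f_born(theta), analytic)   # agree: Vtilde(q) = 4 pi g / (q^2 + mu^2)
```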
5
We can see this from the different form of the solution of the Poisson equation in these dimensions: Coulomb's law in two
dimensions gives a logarithmic potential, and in one dimension a linearly growing one. In these dimensions, arbitrarily weak
potentials can form bound states.
3 Magnetic fields in atomic physics: the Zeeman effect
If we want to understand the effect that magnetic fields have on atoms, perturbation theory is a very good
tool. Even in the strongest magnetic fields that it is possible to produce in the laboratory, the Lorentz force
acting on an electron in an atom is small compared to the Coulomb force, and so the magnetic field is a
weak perturbation. Note that this is to be contrasted with the case considered earlier which involved free
particles moving in an external magnetic field.6
The central physical idea is that an orbiting charge is equivalent to a current loop, and so generates a
magnetic moment µ. In turn, the energy of this moment in a magnetic field B depends on its orientation:
∆E = −µ · B . (67)
It is simple to see classically that we should expect a connection between µ and angular momentum L.
Specifically, consider a particle with mass me and charge q moving around a circle of radius r with speed v.
It has angular momentum of magnitude |L| = me vr and its motion is equivalent to a current I = qv/2πr
flowing around the loop, which generates a moment of magnitude |µ| = πr2 I = qvr/2. Moreover, the two
vectors have the same orientation, and so µ = (q/2me )L.
Two possibilities now arise.
(1) The moment exists before the field is applied. In this case, the energy splitting ∆E is known as
the Zeeman effect, and the alignment by the field of the moments of a collection of atoms is called
paramagnetism.
(2) The moment is induced by the presence of the field: µ = χB. Since the induced moment is in
the opposite direction to the field (think of Lenz’s law), χ < 0. This phenomenon is known as
diamagnetism. In this case
∆E = −χ ∫₀^B B′ dB′ = −(1/2)χB²   (68)

so that χ = −d²(∆E)/dB².
The next step is to introduce the magnetic field into the atomic Hamiltonian. For this we need a form
for the vector potential. A convenient choice to represent a uniform field B is A = (1/2)B × r. With the field
orientation B = Bẑ we have in Cartesian coordinates A = (B/2)(−y, x, 0). Splitting the full atomic Hamiltonian
into an unperturbed part H0 and a perturbation V we have

V = −(q/m_e) p·A + (q²/2m_e)|A|²   (69)
  = −(q/2m_e) B·(r × p) + (q²B²/8m_e)(x² + y²)
  = −(q/2m_e) B·L + (q²B²/8m_e)(x² + y²),
where we have used the cyclic permutation symmetry of the scalar triple product A · (B × C). Note in the
last line that the first term has the form expected from a classical calculation of the magnetic moment µ
associated with the angular momentum L of a particle with charge q and mass me .
6
Some of the rich physics of the quantum Hall regime mentioned earlier is linked to Coulomb interactions, but this arises in
a different situation where the electrons are in a conducting solid. Instead of being tightly bound to an individual atom, they
instead behave almost as free electrons with an “effective mass” that is often small compared to the free-space mass. You will
learn how to explain this in the Condensed Matter lectures in the third year. Such effectively free electrons also form Landau
levels. Often the effective mass is low enough and the background dielectric constant is large enough that the Coulomb energy
and the cyclotron energy become comparable, in contrast to the discussion in this section.
What is the relative size of the two terms? For a rough order-of-magnitude estimate we can simply
replace L by ℏ in the first term, and use the fact that in an atomic state we can approximate ⟨r²⟩ ≈ a_B²,
where a_B is the Bohr radius. Both these estimates come with O(1) prefactors that depend on the specific
angular momentum and radial state and, as we will see below, the first term can even vanish in an ℓ = 0 state, but
let us ignore such details. The ratio of the two terms is then (again ignoring O(1) factors)
(q²B²a_B²/m_e) / (ℏqB/m_e) = qBa_B²/ℏ = (a_B/ℓ_B)²,   (70)
i.e. the square of the ratio between the Bohr radius and the magnetic length. Given that the Bohr radius is around an
angstrom (10⁻¹⁰ m) and the magnetic length is around 10 nm even for a very large magnetic field B ∼ 10 T,
we see that the second term is much smaller than the first.
Furthermore, the consequences of this perturbation depend on the atomic state we consider: as we shall
see, we really only worry about the second term in the case when L|ψ0 ⟩ = 0, i.e. in an ℓ = 0 state, and even
here only when we ignore spin. We can neglect it when discussing the Zeeman effect, where the first term
completely dominates.
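
A quick numerical sketch of the estimate (70), for a few illustrative field strengths:

```python
import numpy as np

hbar = 1.054571817e-34   # J s
q = 1.602176634e-19      # C
a_B = 5.29177210903e-11  # Bohr radius, m

for B in (1.0, 10.0, 100.0):             # illustrative field strengths, T
    l_B = np.sqrt(hbar / (q * B))        # magnetic length
    print(f"B = {B:5.0f} T: (a_B/l_B)^2 = {(a_B / l_B)**2:.1e}")
# ~4e-6 at 1 T and still only ~4e-4 at 100 T: the quadratic term is tiny.
```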
Applying first-order perturbation theory to a state |n, ℓ, m⟩, the dominant (first) term of (69) with B = Bẑ gives
⟨n, ℓ, m|V|n, ℓ, m⟩ = −(qB/2m_e)⟨n, ℓ, m|L_z|n, ℓ, m⟩, and hence, using L_z|n, ℓ, m⟩ = ℏm|n, ℓ, m⟩,

δE_n^{(1)} = −(ℏqB/2m_e) m.   (75)
The resulting dependence of energy on magnetic field is shown in Fig. 4 for a set of states with ℓ = 1.
FIG. 4: Zeeman effect: dependence of energy levels with quantum numbers ℓ = 1 and m = −1, 0, 1 on magnetic field
B, omitting spin effects.
Note that there is a similar relationship between spin angular momentum and magnetic moment as for
orbital angular momentum, but with a different proportionality constant. This is parameterised using what
is called the g-factor, writing
µ = −g (q/2m_e) ŝ   (76)
with ŝ the spin angular momentum operator.
4 Electric fields in atomic physics: the Stark effect
We now turn to the effect of an electric field on an atom. As in the case of the magnetic field, we must
consider two possibilities:
(1) An atom may have an electric dipole moment p in the absence of an electric field. In this case
its energy depends on its orientation relative to the field via
∆E = −p · E . (77)
(2) The moment may be induced by the field: p = χE, with χ the polarizability. In analogy with (68), the
energy change is then

∆E = −(1/2)χE².   (78)

To see how these arise microscopically, consider the ground state of hydrogen in a uniform electric field
E = Eẑ. We write H = H0 + V with

H0 = −(ℏ²/2m_e)∇² − e²/(4πϵ₀r)  and  V = eEz = eEr cos θ.   (79)
The eigenstates of H0 are the hydrogenic wavefunctions, i.e. the states that we have denoted |n, ℓ, m⟩, and
the unperturbed energies are E_n = −R/n² with R the Rydberg. Recall the form of the hydrogenic wavefunctions,

⟨r|n, ℓ, m⟩ = R_{nℓ}(r) Y_ℓ^m(θ, ϕ),   (80)

with

Y₀⁰(θ, ϕ) = 1/√(4π),  Y₁^{±1}(θ, ϕ) = √(3/8π) sin θ e^{±iϕ},  and  Y₁⁰ = √(3/4π) cos θ.   (81)

At first order, the ground-state energy shift vanishes by parity,

δE₁^{(1)} = ⟨1, 0, 0|V|1, 0, 0⟩ = 0,   (82)

since V = eEz is odd under z → −z while the ground-state probability density is even.
At second order we need ⟨1, 0, 0|V |n, ℓ, m⟩. It is non-zero only for ℓ = 1 and m = 0, and so
δE₁^{(2)} = −Σ_{n=2}^{∞} |⟨1, 0, 0|V|n, 1, 0⟩|² / [R(1/1² − 1/n²)] = −(e²E²a₀²/R) × C   (83)
where a0 is the Bohr radius and C is a dimensionless constant. In fact, evaluation of the radial integrals
and sum gives C = 9/4.
The physical interpretation of these results is that the atom in its ground state is unpolarised since it
is spherically symmetric. An electric field induces a dipole moment, and this moment is aligned along the
field, lowering the energy of the atom. Using (78), we see that the ground-state polarizability is given by
χ = (9/2) e²a₀²/R.
FIG. 5: Sketches of wavefunctions relevant for n = 2 Stark effect. Shading and ± signs indicate signs of wavefunctions
in different regions. (a) ⟨r|2, 0, 0⟩, (b) ⟨r|2, 1, 0⟩, (c) simplified version of [⟨r|2, 0, 0⟩ + ⟨r|2, 1, 0⟩], and (d) simplified
version of [⟨r|2, 0, 0⟩ − ⟨r|2, 1, 0⟩]. Consider the probability densities associated with these wavefunctions: the linear
combination (c) has its probability density centred at negative z while the linear combination (d) is centred at positive
z. Taking into account the fact that the electron charge is negative and allowing for both electron and nuclear charges,
this means that (c) represents a dipole orientated towards positive z and (d) represents one orientated towards negative
z. Since the electric field is in the positive z direction, we expect classically that (c) is lower in energy and (d) is
higher. This is indeed the result we have from perturbation theory (remember that v is negative).
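
The matrix element controlling this splitting can be checked numerically. The standard result is ⟨2, 0, 0|z|2, 1, 0⟩ = −3a₀ (so that v = −3eEa₀ is indeed negative, consistent with the caption above); the sketch below evaluates the radial and angular integrals in units a₀ = 1, using the familiar hydrogen radial functions:

```python
import numpy as np
from scipy.integrate import quad

# hydrogen radial functions for n = 2, in units a_0 = 1
R20 = lambda r: (1 / (2 * np.sqrt(2))) * (2 - r) * np.exp(-r / 2)
R21 = lambda r: (1 / (2 * np.sqrt(6))) * r * np.exp(-r / 2)

radial = quad(lambda r: R20(r) * R21(r) * r**3, 0, np.inf)[0]  # = -3 sqrt(3)
angular = 1 / np.sqrt(3)    # \int Y_0^0 cos(theta) Y_1^0 dOmega
print(radial * angular)     # -> -3.0, i.e. <2,0,0| z |2,1,0> = -3 a_0
```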
5 The Variational Method
In Section 2 we discussed how to use perturbation theory to solve the TISE for a Hamiltonian that is close to
one we can treat exactly. Clearly, we may also be interested in problems that are far from any solvable case,
and it would be useful to have techniques that we can apply in this situation. An important example is the
variational method, which we set out here. As will become clear, it is most useful for finding approximations
to ground state energies and ground state wavefunctions.
The overall idea has two components. First, we guess a form for the ground state wavefunction. This
guess is called a trial wavefunction or a variational ansatz. It should include one (or perhaps more than one7 )
adjustable parameter. It is easy to show (see below) that whatever our guess, it gives an approximation for
the ground state energy that is always an overestimate. The second step is therefore to adjust the parameter
(or parameters) in the trial wavefunction in order to minimise the estimate for the ground state energy that
is derived from it. The resulting energy and wavefunction are our ground state approximations.
5.1 Proof that a trial wavefunction gives an upper bound to the ground state
energy
Let the Hamiltonian of interest be H with
H|n⟩ = En |n⟩ and E0 ≤ E1 ≤ E2 . . . for n = 0, 1, 2 . . . with ⟨m|n⟩ = δmn . (88)
Let |ψ⟩ denote the trial wavefunction, which we take to be normalised so that ⟨ψ|ψ⟩ = 1. We can always
imagine expanding the trial wavefunction in the basis of exact eigenstates, even though we don’t have an
explicit form for these states. Doing so, we write
|ψ⟩ = Σ_n c_n|n⟩  with  Σ_n |c_n|² = 1.   (89)
We estimate the ground state energy by calculating the expectation value of the Hamiltonian in the trial
state. This gives
⟨ψ|H|ψ⟩ = Σ_{mn} c_m* c_n ⟨m|H|n⟩
        = Σ_n |c_n|² E_n
        = E₀ Σ_n |c_n|² + Σ_n |c_n|² (E_n − E₀).
In the last line of this equation the first term is simply E0 and the second term is positive (or zero if cn = 0
for n > 0). Hence we can conclude
⟨ψ|H|ψ⟩ ≥ E0 (90)
for any normalised trial state |ψ⟩, with equality only if |ψ⟩ is the exact ground state.
5.2 Example: the quantum bouncing ball
Consider a ball of mass m bouncing in a uniform gravitational field above an impenetrable surface, i.e.
moving in the potential V(x) = mgx for x > 0, with an infinite wall at x = 0 (see Fig. 6).
A reasonable trial wavefunction should satisfy the following conditions: it should approach zero as x ap-
proaches zero from positive values and be zero for x < 0 (think of behaviour for a particle in an infinite
potential well); and it should approach zero as x → ∞ (think of behaviour in the harmonic oscillator). A
simple form that meets these conditions is
ψ(x) = N x e^{−ax} for x > 0,  ψ(x) = 0 for x < 0.   (92)
Both the potential and this variational form for a wavefunction are illustrated in Fig. 6.
FIG. 6: In black: the potential energy V (x) of a ball bouncing in a uniform gravitational field above an impenetrable
surface, as a function of its height x. In red: the trial form we take for the ground-state wavefunction.
We would like to use this trial wavefunction to evaluate ⟨ψ|H|ψ⟩. In doing so we will need integrals of
the form
∫₀^∞ dx xⁿ e^{−bx} = b^{−(n+1)} ∫₀^∞ ds sⁿ e^{−s} = n! b^{−(n+1)}.   (93)
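
Carrying out these integrals for the trial state (92) gives ⟨ψ|H|ψ⟩ = ℏ²a²/2m + 3mg/2a (you should check this using (93)), which the sketch below minimises numerically in units ℏ = m = g = 1 and compares against the exact ground-state energy, set by the first zero of the Airy function:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import ai_zeros

E = lambda a: a**2 / 2 + 3 / (2 * a)    # <H> in the trial state, hbar = m = g = 1
res = minimize_scalar(E, bracket=(0.5, 2.0))
E_exact = -ai_zeros(1)[0][0] * 0.5**(1 / 3)   # first Airy zero, scaled to these units
print(f"variational: E = {res.fun:.4f} at a = {res.x:.4f}")   # 1.9656 at a = 1.1447
print(f"exact:       E = {E_exact:.4f}")                      # 1.8558, ~6% below
```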
5.3 Example: derivation of the virial theorem
As a further application of the variational method, we will use it to derive the virial theorem, which is a
connection between kinetic and potential energies for a particle moving in a potential V (r) that is given by
a power law. That is
V (r) = V0 rn . (96)
Examples of such potentials occur in the harmonic oscillator (n = 2) and the hydrogen atom (n = −1).
Power law potentials are special because they are not characterised by a definite length scale, as would be
the case for a potential well of any other shape.
To derive the virial theorem, let’s suppose that we know the exact form of the ground state wavefunction,
which we denote as ψ(r), with r a position vector in d dimensions. Next, we consider a variational wavefunc-
tion of the form ψ(ar), where a is a variational parameter. We are going to apply the variational method
and make use of the fact that we know in advance that the optimal value of the variational parameter must
be a = 1.
We can relate the variational potential energy to its exact value via
⟨PE⟩_variational = [∫ dᵈr |ψ(ar)|² V₀rⁿ] / [∫ dᵈr |ψ(ar)|²] = a^{−n} [∫ dᵈR |ψ(R)|² V₀Rⁿ] / [∫ dᵈR |ψ(R)|²] = a^{−n} ⟨PE⟩_exact,
where we changed variables using R = ar. Similarly for the kinetic energy we find
⟨KE⟩variational = a2 ⟨KE⟩exact .
Combining both contributions to the energy and minimising with respect to a gives at the minimum

(d/da)[a²⟨KE⟩_exact + a^{−n}⟨PE⟩_exact] = 2a⟨KE⟩_exact − n a^{−n−1}⟨PE⟩_exact = 0.   (97)
In this case, however, we are in a different position to the one we were in for the discussion of the quantum
bouncing ball. For the case of the power-law potential our starting point was to imagine we have an exact
ground state wavefunction, implying that the energy is minimised by the choice a = 1. This means that
Eq. 97 is not an equation to determine a; instead, we substitute into it a = 1 and obtain the relation
⟨KE⟩exact = (n/2) ⟨PE⟩exact (98)
between kinetic and potential energies. Thus for the harmonic oscillator (n = 2) the two contributions to
the energy are equal, while for the hydrogen atom (n = −1) the potential energy (which is negative) is −2×
the kinetic energy (which is positive). Although our derivation has been for ground states of a quantum
Hamiltonian, the virial theorem also applies to excited eigenstates and to classical systems.
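As a quick check of Eq. (98) (an illustrative aside in units of our own choosing, ℏ = m = e²/4πϵ0 = 1, so that the hydrogen 1s orbital is ψ ∝ e^{−r}), one can evaluate both sides symbolically for n = −1, where the theorem predicts ⟨KE⟩ = −⟨PE⟩/2:

import sympy as sp

r = sp.symbols('r', positive=True)
psi = sp.exp(-r)                                   # unnormalised 1s orbital

# radial measure r^2 dr; angular factors cancel between numerator and norm.
# The kinetic energy is written via |dpsi/dr|^2 (integration by parts; valid
# for an s state, which has no angular dependence).
norm = sp.integrate(psi**2 * r**2, (r, 0, sp.oo))
ke = sp.integrate(sp.Rational(1, 2) * sp.diff(psi, r)**2 * r**2, (r, 0, sp.oo)) / norm
pe = sp.integrate(-(1/r) * psi**2 * r**2, (r, 0, sp.oo)) / norm

print(ke, pe)    # 1/2 and -1: <KE> = (n/2)<PE> with n = -1, and E = -1/2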
5.4 Example: The No-Node Theorem and the Uniqueness of the Ground State
We can also use the variational approach to prove two useful theorems about ground states — that they
have no nodes, and that they are generically unique. We first consider the no-node case, and for simplicity
work in d = 1, although the answer can be generalized⁸ to d > 1. The discussion here follows that of the
1956 PhD thesis of Michael Cohen, one of Feynman’s few graduate students.
⁸ This is usually credited to Feynman in his lectures on Statistical Mechanics, although the argument there is intuitive but does not provide careful estimates of the energies. It turns out that the no-node theorem can be related to a theorem of Courant's that says that the nth eigenstate of a Sturm–Liouville problem has at most n nodes, with one of the "nodes" corresponding to the point at infinity (since we require ψ → 0 at spatial infinity). So this bound corresponds to n − 1 "true" nodes. It turns out that the bound is only saturated in d = 1, but the ground state always satisfies this bound in any d.
FIG. 7: Different steps of the variational proof of the no-node theorem. See the discussion in the main text.
Let us first write down an expression for the energy of a state ψ, without necessarily assuming that the state is normalized:

E[ψ] = ⟨ψ|H|ψ⟩ / ⟨ψ|ψ⟩ = [∫_{−∞}^{∞} dx ( (ℏ²/2m)|dψ/dx|² + V(x)|ψ(x)|² )] / [∫_{−∞}^{∞} dx |ψ(x)|²] , (99)
where we have used integration by parts to write the kinetic energy as a manifestly nonnegative contribution.
We will further assume that the wavefunction is real⁹ and that the potential is everywhere finite. We will prove by contradiction: we assume that the ground state wavefunction ψ(x) has a node, and construct from it a variational state with demonstrably lower energy. Without loss of generality let us take the node in ψ to be at the origin, as in Fig. 7(a). Now, we can replace ψ with a function that is everywhere non-negative, |ψ(x)| (Fig. 7(b)), without changing the energy (this follows since we only ever take the first derivative¹⁰, as computed from (99)). Now |ψ(x)| still has a node at the origin, but imagine smoothing it out over a distance of O(ϵ) around the origin, while also pushing it away from zero by an amount O(ϵ), producing our variational guess, the "smoothed state" φ(x), as shown in Fig. 7(c).
We now estimate the energy of φ(x), relative to that of our putative ground state ψ(x), using (99).
Evidently, there are three places where the energy can change, relative to ψ: the normalization, the kinetic
energy, and the potential energy. Now, pinning down the numerical prefactors would require us to be precise about exactly how we smooth things out near the node, but we will show that there is an unambiguous lowering of the energy that is independent of such details. To see this, let us first ask how the
normalization changes. We can assume that the original wavefunction was normalized to begin with; relative
to that, it’s clear that the “extra area” under the curve is O(ϵ3 ) (because it comes from integrating a function
of magnitude φ2 in a region of size ϵ, within which φ ∼ ϵ). So, we conclude that ⟨φ|φ⟩ = 1 + c1 ϵ3 , where c1
is some positive constant whose (unimportant) value depends on details. Using similar order-of-magnitude
reasoning we see that the potential energy goes up by O(ϵ3 ) relative to that of ψ, since
⟨φ|V|φ⟩ − ⟨ψ|V|ψ⟩ = ∫_{−ϵ/2}^{ϵ/2} dx V(x)(|φ(x)|² − |ψ(x)|²) ∼ c2 V(0) ϵ³ , (100)
where we have assumed that V (x) is well-behaved — and in particular does not blow up — near the node.
Finally, we turn to the kinetic energy. Approximating the curved region as flat (the errors are at higher
orders in ϵ), we see that for φ the slope, and hence the contribution to the kinetic energy, vanishes in
⁹ It turns out one can also demonstrate this explicitly in the absence of a magnetic field or other velocity-dependent forces, using time-reversal symmetry, but that is outside the scope of the course.
¹⁰ If you're worried that, as originally written, H involved a second derivative that appears to jump on making this switch, you can with some work convince yourself that this single-point discontinuity does not, in fact, contribute to the energy evaluated in ψ.
(−ϵ/2, ϵ/2), whereas the original wavefunction ψ(x) has a nonzero contribution to the kinetic energy owing
to its slope ψ ′ (0) ̸= 0. Thus, we see that we have
⟨φ|T|φ⟩ − ⟨ψ|T|ψ⟩ = −c3 ϵ (ℏ²/m) |ψ′(0)|² , (101)
where c3 > 0 is another constant that depends on details.
Putting it all together, we find that
E[φ] − E[ψ] = ⟨φ|H|φ⟩/⟨φ|φ⟩ − ⟨ψ|H|ψ⟩/⟨ψ|ψ⟩ = −c3 ϵ (ℏ²/m) |ψ′(0)|² + O(ϵ³) . (102)
Observe that it does not matter if the O(ϵ3 ) correction is positive or negative; for small ϵ it can never beat
the linear gain obtained by our smoothing. Thus, we see that given any wavefunction ψ(x) with a node, we
can always construct a smoothed wavefunction φ with a lower energy than ψ. Thus, ψ cannot be a ground
state.
An immediate corollary of this is that the ground state must be nondegenerate. Again, we prove this by contradiction. Suppose the ground state were degenerate. This means that there are two wavefunctions ψ1(x) and ψ2(x) both of which have the same energy and are ground states. But now, consider the wavefunction ψ(x) = ψ1(0)ψ2(x) − ψ2(0)ψ1(x). Evidently, this has the same energy as ψ1,2 (and hence is just as valid a choice for a ground state), but has a node at the origin. But we have shown that this is impossible for a ground state, so our original assumption of degeneracy must be false.
6 The WKB approximation
Note that WKB is not examinable material but can be conceptually very useful, especially when thinking
qualitatively about motion in a one dimensional potential.
One aspect of quantum mechanics that we should be particularly concerned to understand well is the
semiclassical limit in which some aspects of quantum behaviour are close to the corresponding classical
behaviour. This limit is the one in which the quantum wavelength of a particle is short compared to the
lengthscale over which potential energy in a system changes. The WKB approximation (named after the
physicists Wentzel, Kramers and Brillouin) is designed to treat exactly this limit. For reasons there isn’t
space to discuss, the applicability of the WKB approximation is restricted to one-dimensional systems, or to
systems in higher dimensions where one can use separation of variables, such as for a spherically symmetric
potential in three dimensions.
As a starting point, recall some results for a quantum particle with mass m and total energy E moving
in a potential V that we take initially to be independent of position x, and consider E > V . We can write
an eigenstate of the TISE in the form ψ(x) = Aeikx with the wavevector k given by
ℏ²k²/(2m) = E − V , implying k = √(2m(E − V)) / ℏ . (103)
For this wavefunction the probability current is
j = (ℏ/2im) [ψ*(x) dψ(x)/dx − ψ(x) dψ*(x)/dx] = (ℏk/m) |A|² .
Now we ask what changes if we replace the constant potential V with a (slowly varying) potential V (x). A
natural guess is that we should make the replacements
k → k(x) = √(2m(E − V(x))) / ℏ and A → A(x) ∝ 1/√(k(x)) .
The second of these deserves an explanation: it is chosen so that j remains independent of x, as required by current conservation in a stationary state, even though we have an x-dependent wavevector.
Equivalently, it ensures that the probability density is inversely proportional to the local speed (ℏk(x)/m)
of the particle, as we should expect from classical considerations.
6.1 Derivation
So far, we have simply written down a motivated guess. Now let’s attempt a derivation. We write
ψ(x) = A(x) e^{i ∫₀ˣ k(x′)dx′} (104)
and then substitute into the TISE to find the functions A(x) and k(x). Note that the form e^{i ∫₀ˣ k(x′)dx′} is one of the possible ways to generalise e^{ikx}, written for constant k, to a situation in which k(x) depends on x; there are alternative choices, but this turns out to be the most convenient one. The derivatives we need for the TISE are
dψ(x)/dx = [A′(x) + ik(x)A(x)] e^{i ∫₀ˣ k(x′)dx′}
and
d²ψ(x)/dx² = [A″(x) + 2ik(x)A′(x) + ik′(x)A(x) − k²(x)A(x)] e^{i ∫₀ˣ k(x′)dx′} .
Then from the TISE we have
(ℏ²/2m) [k²(x)A(x) − i{2k(x)A′(x) + k′(x)A(x)} − A″(x)] = [E − V(x)] A(x) . (105)
Here the terms on the left in square brackets have been arranged into three groups, according to the
number of derivatives. The idea is that if V (x) is slowly varying, each derivative will be small. The first
of these groups is k 2 (x)A(x) and would be present even if V (x) were constant. The terms in the second
group, {2k(x)A′ (x) + k ′ (x)A(x)} involve a single derivative and so should be ‘quite’ small, while the third
one, A″(x), contains two derivatives and so should be 'very' small. We will find a solution by considering these groups successively. Neglecting the small terms completely, we satisfy the TISE by taking k(x) as in Eq. 103.
In the next approximation we neglect the very small term but require the quite small terms to cancel each
other.
2k(x)A′(x) + k′(x)A(x) = 0 , which implies (1/A(x)) dA(x)/dx = −(1/2k(x)) dk(x)/dx ,
as anticipated. The physical picture that this gives is as follows. In a smoothly varying potential, the wavelength of a quantum particle changes, but without scattering of the forward-going wave (described by e^{i ∫₀ˣ k(x′)dx′}) into a backward-going one (which would be described by e^{−i ∫₀ˣ k(x′)dx′}), so long as neglect of the very small term A″(x) is justified. A feature of the result that is worth emphasising is that |ψ(x)|² is large where E − V(x) is small, so that (viewed classically) the particle is moving slowly.
In essence, the WKB quantisation condition fixes the number of half-wavelengths between the classical turning points. Specifically, for a potential with classical
turning points at x1 and x2 , the eigenenergies are given by the condition
∫_{x1}^{x2} dx k(x) = (n + ½)π for integer n . (108)
Reassuringly, this is the answer we expected. From this treatment we also gain an understanding of the
wavefunctions of high-lying states of the harmonic oscillator that is much more illuminating than if we had simply stared at the exact solution in the form of Hermite polynomials. To see this, consider the probability
density of the 30th excited state, shown in Fig. 8. Two aspects are worth noting. First, the probability
density is highest near the classical turning points and has a minimum in the middle of the well. This is as
expected from the form of the prefactor A(x) ∝ [k(x)]−1/2 in Eq. 104, since the particle moves slowly near
the classical turning points and rapidly near the middle of the well. Second, the wavelength of oscillations
in the probability density is shortest in the middle of the well and longest near the classical turning points,
for the same reasons.
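To make the quantisation condition (108) concrete, the following numerical sketch (units ℏ = m = ω = 1 are an assumption of the example, and the function names are ours) solves (108) for the harmonic oscillator, where the WKB result happens to coincide with the exact spectrum En = n + 1/2:

import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def phase_integral(E):
    # integral of k(x) = sqrt(2(E - x^2/2)) between the classical turning points
    xt = np.sqrt(2 * E)
    k = lambda x: np.sqrt(np.maximum(2 * (E - 0.5 * x**2), 0.0))
    val, _ = quad(k, -xt, xt)
    return val

for n in range(4):
    # solve phase_integral(E) = (n + 1/2) * pi for E
    En = brentq(lambda E: phase_integral(E) - (n + 0.5) * np.pi, 1e-6, 50)
    print(n, En)        # 0.5, 1.5, 2.5, 3.5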
6.5 Example: α decay
To get started we need a model for the decay process. We represent the α-particle (which consists of two
protons and two neutrons) as a well-defined quantum particle and consider its potential energy as a function
FIG. 8: The probability density as a function of position for the 30th excited state of the harmonic oscillator. See
main text for discussion.
of radial distance from the centre of the nucleus, as shown in Fig. 9. When it is within the nucleus, it sees
an attractive potential well arising from interactions with the other neutrons and protons in the nucleus.
When it is outside the nucleus, it sees a Coulomb barrier arising from repulsion between the α-particle with
positive charge 2e and the nucleus with positive charge Ze (since the relevant distances are on the scale of
the nuclear radius, the electronic charge of the atom lies entirely at much larger radii). The escape of an
α particle with positive energy E from the nucleus therefore involves tunnelling through a barrier between
the radius r1 (the edge of the nuclear well) and the radius r2 where the total energy matches the Coulomb
energy: E = V(r2) = 2Ze²/(4πϵ0 r2).
FIG. 9: Potential energy for an α-particle as a function of radius from the centre of a nucleus that is unstable to
α-decay. The total energy of the α particle is E, which is less than the Coulomb energy V(r) for r1 < r < r2.
A detailed calculation should strictly speaking take account of the difference between a one-dimensional problem and the radial part of the Schrödinger equation [specifically, the differences between a kinetic energy involving d²/dx² as compared with r⁻²(d/dr)r²(d/dr)]. For brevity, we will omit this distinction and work with the form of the kinetic energy that we have already considered in one dimension, writing the wavefunction as ψ(r). Then the wavefunction decays in the barrier at a spatial rate λ(r), given by Eq. 107.
Moreover E and r2 are related by E = V(r2). Hence ψ(r2) = ψ(r1) e^{−∫_{r1}^{r2} λ(r)dr} with
∫_{r1}^{r2} λ(r)dr ∝ ∫_{r1}^{r2} (1/r − 1/r2)^{1/2} dr ≈ ∫_{r1}^{r2} (1/r)^{1/2} dr ≈ 2 r2^{1/2} ∝ E^{−1/2} . (112)
Here the approximations are that 1/r1 ≫ 1/r2 and r2^{1/2} ≫ r1^{1/2}, both justified if r2 ≫ r1. The crucial outcome of the calculation is the energy dependence ψ(r2) = ψ(r1) e^{−const × E^{−1/2}}. We can convert this into an energy
dependence of the half-life by arguing that the decay rate is given by the product of an attempt rate and an
escape probability. The former is of order the α-particle velocity divided by the nuclear size, and is much
more weakly dependent on energy than the escape probability, given by |ψ(r2)/ψ(r1)|². Hence we arrive at the result

half-life ∝ e^{const × E^{−1/2}} ,

up to prefactors that depend only weakly on E.
Because this energy dependence appears in the argument of the exponential, it can produce very large
variations in half-life. The results match data strikingly well, as shown in Fig. 10.
FIG. 10: Half-lives for α decay for isotopes of the elements uranium and thorium, plotted on a logarithmic scale as a function of C·E^{−1/2}, with C a constant and values of E labelled in MeV.
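The E^{−1/2} scaling of the barrier integral (112) is easy to check numerically. In the sketch below everything is in arbitrary illustrative units: A sets the barrier strength (so V(r) = A/r), r1 stands for the nuclear radius, and r2 = A/E is the outer turning point; these labels are ours, not the notes':

import numpy as np
from scipy.integrate import quad

A, r1 = 1.0, 0.05

def barrier_integral(E):
    r2 = A / E                                   # outer turning point, V(r2) = E
    val, _ = quad(lambda r: np.sqrt(A / r - E), r1, r2)
    return val

for E in [1.0, 2.0, 4.0]:
    print(E, barrier_integral(E) * np.sqrt(E))   # roughly constant product,
                                                 # i.e. the integral scales as E**-0.5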
7 Approximations for quantum problems with time-dependent Hamiltonians
We are often concerned with problems in quantum mechanics that are described by time-dependent Hamilto-
nians. An important example is an atom illuminated by light, with the electromagnetic radiation represented
by oscillating electromagnetic potentials. For these problems we want to solve the TDSE

iℏ ∂t |ψ(t)⟩ = H(t) |ψ(t)⟩ .
In the case where H is constant in time, we proceed using separation of variables to write the general solution
in the form
|ψ(t)⟩ = ∑n cn e^{−iEn t/ℏ} |n⟩ .
When H(t) depends on time, this simple form no longer applies. One approach is to divide the interval [0, t] into N slices of width ∆t = t/N and evolve with the (approximately constant) instantaneous Hamiltonian in each slice,

|ψ(t)⟩ ≈ U(tN) · · · U(t2) U(t1) |ψ(0)⟩ with U(ti) = e^{−(i/ℏ)H(ti)∆t} . (116)

Clearly, we can make this approximation more and more accurate by letting N → ∞, ∆t → 0, with N∆t = t held fixed. Now, if we could re-express the product of exponentials as the exponential of the sum of their arguments, we would arrive at an expression like (117) below, but without the time-ordering symbol, and with the integral replaced by its Riemann-sum approximation (which tends to the integral in the limit described above). The problem is, of course, that e^A e^B ≠ e^{A+B} unless [A, B] = 0, which is not in general true.
Expressions of the form (116), in the limit where ∆t → 0, are often called ‘time-ordered exponentials’,
and are given a shorthand notation reflecting that they are the closest analog of (7) that correctly describes
the unitary time evolution according to the TDSE with a time-dependent H(t):
|ψ(t)⟩ = T e^{−(i/ℏ) ∫₀ᵗ H(t′)dt′} |ψ(0)⟩ , [correct] (117)
where the time-ordering symbol T denotes the limiting procedure described above.
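The limiting procedure in (116) and (117) can be carried out numerically. The sketch below (with an arbitrarily chosen 2×2 Hamiltonian H(t) = t σz + 0.5 σx and ℏ = 1; all choices illustrative) multiplies together short-time propagators and shows the product converging as the slices shrink:

import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
H = lambda t: t * sz + 0.5 * sx

def time_ordered(t, N):
    dt = t / N
    U = np.eye(2, dtype=complex)
    for i in range(N):
        ti = (i + 0.5) * dt                 # midpoint of slice i
        U = expm(-1j * H(ti) * dt) @ U      # later times act on the left
    return U

psi0 = np.array([1, 0], dtype=complex)
for N in [10, 100, 1000]:
    print(N, time_ordered(2.0, N) @ psi0)   # converges as N grows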
Computing time-ordered exponentials is challenging, though sometimes one can approximate the limiting procedure numerically. Feynman had a beautiful insight, which applies just as well to time-independent H: one can split the H(ti) in each U(ti) into kinetic and potential terms, e^{−(i/ℏ)H(ti)∆t} = e^{−(i/ℏ)(T+V(ti))∆t} (where for simplicity we have assumed that the time-dependence of H is in the potential energy term, which is the usual situation). Now, [T, V] ≠ 0, but we can write e^{−(i/ℏ)(T+V(ti))∆t} ≈ e^{−(i/ℏ)T∆t} e^{−(i/ℏ)V(ti)∆t} and incur an error only of order (∆t)². We can then insert resolutions of the identity in position and in momentum in between these two: T, V are respectively simple in these bases, and the overlap is also simple, ⟨x|p⟩ ∝ e^{ixp/ℏ}.
Following this through for each individual U (ti ), re-exponentiating, and performing the Gaussian integrals
over momenta, one can rewrite the time-ordered exponential as a path integral, where one sums over all
classical paths between the initial and final states, weighted by the classical action for each path. In this
picture, the emergence of classical mechanics as the ℏ → 0 limit of quantum mechanics is especially clear.
Path integrals, although central to the modern understanding of quantum mechanics and quantum field
theory, are well beyond the scope of this course, and indeed are overkill for the sorts of problems we will
consider. Instead, we will think of three different simplifications of time-dependent problems that arise in a variety of physically motivated settings.
7.1 Overview
There are three alternative simplifications that are each useful in different situations.
(1) The sudden limit: H(t) changes rapidly from an initial to a final form. This is sometimes called a 'quantum quench', and is often performed experimentally in cold atomic gases, where usually the interesting questions are linked to many-particle physics.
(2) The adiabatic limit: H(t) changes slowly in time. This is often used to understand how perturbations are switched on and off, but also arises in approximate treatments of atoms or solids where there are "slow" and "fast" degrees of freedom, e.g. ions or nuclei versus electrons.
(3) The perturbative limit: H(t) = H0 + V (t), with H0 independent of time and V (t) small. This is a
workhorse tool, most often used to study radiative transition rates and light-matter interactions.
We will consider each of these in the following. Note that fast and slow here is in comparison with the
natural timescale in the quantum problem, given by ℏ/∆E, where ∆E is the energy difference between two
relevant levels.
7.2 The sudden approximation
Suppose that at t = 0 the Hamiltonian changes abruptly from an initial form Hi to a final form Hf, with the system starting in some state |ψ(0)⟩. This is simple to treat because the TDSE is first-order in time. For t > 0 we have
|ψ(t)⟩ = ∑n cn e^{−iEn t/ℏ} |n⟩ (119)
where cn = ⟨n|ψ(0)⟩, with |n⟩ and En the eigenstates and eigenenergies of the final Hamiltonian Hf.
7.3 Example: particle in box that is suddenly expanded
Consider a particle in an infinite square well, initially in its ground state. We double the size of the well
and ask for the probability that the particle is still in its ground state at the end of the expansion. We have
Hi = −(ℏ²/2m) d²/dx² + Vi(x) and Hf = −(ℏ²/2m) d²/dx² + Vf(x)

with

Vi(x) = 0 for 0 < x < a and ∞ otherwise, and Vf(x) = 0 for 0 < x < 2a and ∞ otherwise.
The initial wavefunction is ⟨x|ψ(t = 0⁻)⟩ = (2/a)^{1/2} sin(πx/a) for 0 < x < a, and the ground-state wavefunction in a box of size 2a is ⟨x|n=1⟩ = (1/a)^{1/2} sin(πx/2a), now for 0 < x < 2a. The amplitude to remain in the ground state after expansion is therefore
c1 = (√2/a) ∫₀ᵃ sin(πx/a) sin(πx/2a) dx = (1/(a√2)) ∫₀ᵃ [cos(πx/2a) − cos(3πx/2a)] dx = 4√2/(3π) .
This gives a probability of |c1|² = 32/(9π²) ≈ 0.360, which implies that |cn|² > 0 for some excited states n > 0:
there is a chance that the system gets excited into one of its higher energy levels by this time-dependence
of the Hamiltonian.
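The overlaps are straightforward to reproduce numerically; the sketch below (with a = 1, an illustrative choice) computes |cn|² for the first few eigenstates of the expanded box:

import numpy as np
from scipy.integrate import quad

a = 1.0
psi_i = lambda x: np.sqrt(2/a) * np.sin(np.pi * x / a)            # initial ground state
phi = lambda x, n: np.sqrt(1/a) * np.sin(n * np.pi * x / (2*a))   # eigenstates of the 2a box

for n in range(1, 6):
    cn, _ = quad(lambda x: psi_i(x) * phi(x, n), 0, a)  # psi_i vanishes for x > a
    print(n, cn**2)
# n = 1 gives 32/(9 pi^2) ~ 0.360; n = 2 gives exactly 1/2, since phi_2 is
# proportional to psi_i on 0 < x < a; the remaining weight spreads over higher n.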
7.4 The adiabatic theorem
The adiabatic theorem says that a system which starts at some initial time t = ti in an eigenstate |n(ti)⟩
of H(t) will remain in that eigenstate under evolution with the TDSE, provided H(t) changes slowly. That
is to say: as H(t) changes, so does |n(t)⟩, but the probability to find the system in this state does not change,
meaning |⟨n(t)|ψ(t)⟩|2 = 1 for all t. In the limit of arbitrarily slow change, this is an exact statement, and
if change is slow but at a finite rate, then it is a good approximation. To illustrate the point, consider the
example treated above, of a box with a size that increases as a function of time. We have seen that if the
increase is sudden, the chance for the system to remain in the ground state is |c1 |2 ≈ 0.360; by contrast, if
the increase is adiabatic, |c1 |2 = 1.
Proof of adiabatic theorem. To prove the theorem we use the instantaneous eigenstates as a basis to
write the wavefunction of the system in the form
|ψ(t)⟩ = ∑n cn(t) exp[−(i/ℏ) ∫₀ᵗ En(t′)dt′] |n(t)⟩ . (121)
Note that the phase exp[−(i/ℏ) ∫₀ᵗ En(t′)dt′] in this expression is a generalisation of the form exp(−iEn t/ℏ) that is familiar for a time-independent Hamiltonian. Substituting into the TDSE we find
0 = (iℏ∂t − H(t)) |ψ(t)⟩ = iℏ ∑n {ċn(t)|n(t)⟩ + cn(t)[∂t|n(t)⟩]} e^{−(i/ℏ) ∫₀ᵗ En(t′)dt′} . (122)
It is useful to split off the piece of the RHS that is “off-diagonal”, i.e. mixes the state m with states n ̸= m:
∂t cm(t) + ⟨m(t)|∂t|m(t)⟩ cm(t) = − ∑_{n≠m} e^{(i/ℏ) ∫₀ᵗ [Em(t′)−En(t′)]dt′} cn(t) ⟨m(t)|∂t|n(t)⟩ . (124)
As the next step we need an expression for the quantity ⟨m(t)|∂t|n(t)⟩ when m ≠ n. We can obtain this in two ways. The first is to simply differentiate the instantaneous Schrödinger equation H(t)|n(t)⟩ = En(t)|n(t)⟩ with respect to time: we see that

[∂t H(t)]|n(t)⟩ + H(t)[∂t|n(t)⟩] = [∂t En(t)]|n(t)⟩ + En(t)[∂t|n(t)⟩] . (125)
Taking the inner product with ⟨m(t)| on both sides and using the fact that ⟨m(t)|H(t) = Em (t)⟨m(t)|, we
find that
⟨m(t)|∂t|n(t)⟩ = ⟨m(t)|[∂t H(t)]|n(t)⟩ / (En(t) − Em(t)) . (126)
Another approach that gives more physical intuition is by using time-independent perturbation theory
as follows. Consider a small time interval ∆t. In this interval the Hamiltonian changes from H(t) to
H(t) + ∆t · ∂t H(t). Writing this change as a perturbation V ≡ ∆t · ∂t H(t), the resulting change in |n(t)⟩ is
from (37)
|δψn⁽¹⁾⟩ ≡ ∆t ∂t|n(t)⟩ = ∑_{ℓ≠n} [⟨ℓ(t)|V|n(t)⟩ / (En(t) − Eℓ(t))] |ℓ(t)⟩ = ∆t ∑_{ℓ≠n} [⟨ℓ(t)|[∂t H(t)]|n(t)⟩ / (En(t) − Eℓ(t))] |ℓ(t)⟩ .
We would like to integrate this equation in time. In order to make the consequences of a slow change in
H(t) clear, it is convenient to write the overall timescale for this change as T and to change variables from
t to s ≡ t/T , so that s = 0 at the start of the process and s = 1 at the end. Changing variables in this way,
we see that the RHS becomes
− ∑_{n≠m} cn(s) e^{(i/ℏ) T ∫₀ˢ [Em(s′)−En(s′)]ds′} × ⟨m(s)|[∂s H(s)]|n(s)⟩ / (En(s) − Em(s)) . (128)
The crucial points to note here are that T appears only in the phase factor e^{(i/ℏ) T ∫₀ˢ [Em(s′)−En(s′)]ds′} and that
this phase varies very rapidly with s if T is large. Ultimately we want to compute the amplitude cm(s=1) for the system to be in the state |m(s=1)⟩ at the end of the time evolution. The contributions of the "off-diagonal" terms to this amplitude are obtained by integrating the above expression from s = 0 to s = 1. For large T, since the phase e^{(i/ℏ) T ∫₀ˢ [Em(s′)−En(s′)]ds′} varies rapidly as a function of s, this integral will be small. Therefore, we can ignore the RHS for slowly varying (adiabatic) changes. Thus, what we are left with in the adiabatic limit is

∂t cm(t) + ⟨m(t)|∂t|m(t)⟩ cm(t) = 0 , (129)
where we have gone back to the original expression in terms of t rather than s. The remaining “diagonal”
term deserves some explanation. First, one can argue that it is purely imaginary¹¹. Let us therefore write
⟨m(t)|∂t |m(t)⟩ = −iχm (t); we see then that
cm(t) = cm(0) e^{i ∫₀ᵗ χm(t′)dt′} ≡ cm(0) e^{iγB(t)} , (130)
¹¹ To see this, differentiate both sides of 1 = ⟨m(t)|m(t)⟩ w.r.t. time, and use this to show that Re[⟨m(t)|∂t|m(t)⟩] = 0.
which defines the Berry phase acquired during the time evolution. We see that indeed, |cm (t)|2 = |cm (0)|2 =
1, since the phase drops out. Normally the phase is not observable, and one can make a suitable gauge
transformation to remove it (such a transformation was implicit in the older version of these notes by Prof.
Chalker). However, there are situations where one makes an adiabatic cyclic change in the Hamiltonian,
i.e. that the Hamiltonian at the end of the time evolution is the same as that in the beginning. (For
example, imagine slowly adjusting the “knobs” on an experiment, but in such a fashion that after some
time t they return to their original settings.) If such a cyclic evolution acquires a nonzero Berry phase, one
cannot remove it by a gauge transformation. This is essentially the same piece of mathematics that says
that vector potentials that integrate to a nonzero value — the magnetic flux — around a closed loop are
physical and cannot be eliminated by any choice of gauge; the Berry phase of a closed loop in the space of
Hamiltonians can be viewed as a generalized "flux", but its meaning can be very subtle. Berry phases are of great importance in thinking about the motion of electrons in crystalline solids, for as you will learn in the B6 course, such motion always involves a 'cyclic' parameter known as the crystal momentum. Clarifying the
role of Berry phases and their generalizations in the motion of electrons in crystals led to the identification
of a new family of materials known as “topological insulators”.
Applications of the adiabatic theorem. The adiabatic theorem has some very important and interesting
applications. One is to cosmic background radiation, black-body radiation filling our current universe with
a temperature of 2.7 kelvin. It originated from the recombination of electrons and protons to form neutral
hydrogen atoms, about 300,000 years after the big bang when the temperature of the universe was around 3000 kelvin. We can think of the "box" occupied by these photons as expanding adiabatically with the
expansion of the universe to its current size. In the process the distribution of photons over modes of
the box has remained the same, and therefore the radiation spectrum has remained that of a black body.
At the same time, as the size of the box has increased, the wavelength of the radiation and therefore its
characteristic temperature has decreased to its current value.
A second application (on a smaller scale!) is to vibrations of molecules or solids. Here we can think
of electrons as the glue that holds together the nuclei of the atoms that make up a molecule or a solid.
Vibrational motion of the nuclei is much slower than electronic motion, and so as nuclei move the quantum
states of electrons evolve adiabatically in time.
Rate of change of energy. It is also interesting to ask how the energy of the system,

E(t) = ⟨ψ(t)|H(t)|ψ(t)⟩ ,

changes under evolution with a time-dependent H(t). Differentiating,

∂t E(t) = [∂t⟨ψ(t)|] H(t) |ψ(t)⟩ + ⟨ψ(t)| H(t) [∂t|ψ(t)⟩] + ⟨ψ(t)| [∂t H(t)] |ψ(t)⟩ . (131)

From the TDSE we have ∂t|ψ(t)⟩ = −(i/ℏ)H(t)|ψ(t)⟩ and therefore ∂t⟨ψ(t)| = +(i/ℏ)⟨ψ(t)|H(t). This means that the first two terms on the right of Eq. 131 cancel, leaving

∂t E(t) = ⟨ψ(t)| [∂t H(t)] |ψ(t)⟩ . (132)
As an example, we can consider a particle in a box that expands slowly over time. When the length of the box is L the instantaneous ground state energy is E = π²ℏ²/(2mL²). Hence ∂t E(t) = −(2E/L) ∂t L.
In a classical description we would attribute this change in energy to a force exerted by the particle on the
walls of the box: E(t) decreases as the box size increases because the force does work. It is interesting to
compare this result with what comes from a classical calculation. Classically we identify a force with a rate
of change of momentum. For a particle of mass m moving with speed v in a box of length L, the change
of momentum each time the particle bounces off a wall is 2mv and the time between bounces from a given
end is 2L/v. Hence the rate of change of momentum is (2mv)/(2L/v) = mv 2 /L = 2E/L with E = 21 mv 2 .
We see in this way that the quantum and classical pictures fit together.
FIG. 11: Eigenvalues E for the Landau-Zener problem as a function of time t. Eigenvectors for t → ±∞ are shown
next to the corresponding eigenvalues.
The so-called Landau-Zener problem is the simplest time-dependent problem in quantum mechanics. It
is simple because the Hamiltonian is a 2 × 2 matrix and because the time dependence of the matrix elements
is linear. The Hamiltonian is
H(t) = | αt    v  |
       | v   −αt | .   (133)
The instantaneous energy levels of this Hamiltonian are ±√(α²t² + v²) and vary with t as shown in Fig. 11.
As t varies, the eigenvectors rotate between their limiting forms for t → ±∞, also shown in Fig. 11. Suppose
this system starts in the distant past (t → −∞) in the state (1, 0)T . We ask what the final (t → ∞) state
will be. If α is small (more precisely αℏ/v² ≪ 1) then we can apply the adiabatic approximation. The
system follows the lower curve in Fig. 11 and the final state is (0, 1)T . In the opposite case of large α, we can
use the sudden approximation and so the final state is the same as the initial state: (1, 0)T . For intermediate
values of α the system evolves into a superposition of these two states, with relative amplitudes that depend
on α.
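This crossover is easy to see by integrating the TDSE numerically. In the sketch below (v = ℏ = 1; the sweep window and tolerances are illustrative choices) the system starts in (1, 0)ᵀ and is swept through the crossing at several rates; the results can be compared against the standard Landau–Zener formula, which for this parametrisation reads P_stay = e^{−πv²/(ℏα)}:

import numpy as np
from scipy.integrate import solve_ivp

v = 1.0

def rhs(t, psi, alpha):
    H = np.array([[alpha * t, v], [v, -alpha * t]], dtype=complex)
    return -1j * (H @ psi)

for alpha in [100.0, 1.0, 0.1]:
    T = 30.0 / np.sqrt(alpha)                  # start and end far from the crossing
    sol = solve_ivp(rhs, [-T, T], np.array([1, 0], dtype=complex),
                    args=(alpha,), rtol=1e-8, atol=1e-10)
    print(alpha, abs(sol.y[0, -1])**2)   # near 1 for fast sweeps (sudden),
                                         # near 0 for slow sweeps (adiabatic)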
8 Time-dependent perturbation theory
Suppose that H(t) = H0 + V (t) where H0 is time-independent and V (t) is small. Then we can solve the
TDSE using a form of perturbation theory. First we expand |ψ(t)⟩ in eigenstates of H0 , which satisfy
H0 |n⟩ = En |n⟩. We write
|ψ(t)⟩ = ∑n cn(t) e^{−iEn t/ℏ} |n⟩ , (134)
which is always possible, since whatever time dependence |ψ(t)⟩ may have, it can be represented by suitable
forms for cn (t). The TDSE becomes
∑n e^{−iEn t/ℏ} {iℏ[∂t cn(t)] + En cn(t)} |n⟩ = ∑n e^{−iEn t/ℏ} cn(t) {H0|n⟩ + V(t)|n⟩} .
As in previous calculations, we now want to single out a particular term from the sums in this equation,
which we can do by taking the scalar product with ⟨m|. We also multiply by −(i/ℏ) e^{+iEm t/ℏ} to simplify the result, getting
∂t cm(t) = −(i/ℏ) ∑n e^{i(Em−En)t/ℏ} ⟨m|V(t)|n⟩ cn(t) . (135)
So far we have made no approximations and have simply re-written the TDSE in the basis of eigenstates of
H0 . To make progress, we focus on a system that we know is initially in a particular initial state |i⟩ and ask
for the amplitude of a transition to a final state |f ⟩. That is, we take cn (t=0) = δn,i and ask for the value
of cf (t). If V (t) were zero, we would have cn (t) = δn,i for all t and this is the zeroth-order solution. To get
a first order result, we can substitute this zeroth-order form into the right-hand side of Eq. 135, since that
side of the equation anyway includes a factor of V (t). In this way we obtain (for |f ⟩ different from |i⟩ so
that cf (0) = 0)
cf(t) = −(i/ℏ) ∫₀ᵗ dt′ ⟨f|V(t′)|i⟩ e^{i(Ef−Ei)t′/ℏ} + O(V²) . (136)
Perturbation periodic in time. Often we are interested in a perturbation that varies periodically in time,
so that we can write V (t) = V e−iωt where V is a time-independent operator. Writing ωf i = (Ef − Ei )/ℏ we
can then do the integral on t′ and obtain
cf(t) = −(1/ℏ) ⟨f|V|i⟩ (e^{i(ωfi−ω)t} − 1) / (ωfi − ω)
or
|cf(t)|² = (1/ℏ²) |⟨f|V|i⟩|² sin²([ωfi − ω]t/2) / ([(ωfi − ω)/2]²) . (137)
To make sense of this result, it is useful to consider the right-hand side as a function of ω. The factor sin²([ωfi − ω]t/2) / ([(ωfi − ω)/2]²) is a sharply peaked function with its maximum at ω = ωfi. Its value at the maximum is t² and its width is proportional to t⁻¹, so its integral with respect to ω is proportional to t. In detail, since
∫_{−∞}^{∞} dω sin²([ωfi − ω]t/2) / ([(ωfi − ω)/2]²) = 2πt
(see the complex analysis short option for details of how to do the integral) we can write for large t

|cf(t)|² ≈ (2πt/ℏ²) |⟨f|V|i⟩|² δ(ωfi − ω) .
Omitting the factor of t, we conclude that the transition rate is
(2π/ℏ²) |⟨f|V|i⟩|² δ(ωfi − ω) . (138)
Note that with ω > 0 the time dependence V (t) = V e−iωt has induced transitions up in energy, since
Ef − Ei = ℏω > 0. The time dependence V (t) = V e+iωt would produce transitions down in energy.
When the perturbation contains a spread of frequencies with spectral density ρ(ω), integrating over the δ-function gives

transition rate = (2π/ℏ²) |⟨f|V|i⟩|² ρ(ωfi) . (139)

Similarly, if the final states form a continuum with density of states g(Ef), summing over final states gives

transition rate = (2π/ℏ) |⟨f|V|i⟩|² g(Ef = Ei + ℏω) . (140)

Note that ℏ rather than ℏ² appears here because g(Ef) dEf = ℏ g(Ef) dωfi.
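As a closing check on Eq. (137), the following sketch integrates Eq. (135) exactly for a two-level system and compares with the first-order expression; the parameters (eps, w, wfi, with ℏ = 1) are illustrative choices of ours:

import numpy as np
from scipy.integrate import solve_ivp

eps, w, wfi = 0.01, 0.8, 1.0       # weak drive, slightly off resonance

def rhs(t, c):
    # Eq. (135) for two levels i, f with <f|V(t)|i> = eps * exp(-i w t)
    ci, cf = c
    dci = -1j * eps * np.exp(1j * (w - wfi) * t) * cf
    dcf = -1j * eps * np.exp(1j * (wfi - w) * t) * ci
    return [dci, dcf]

t = 50.0
sol = solve_ivp(rhs, [0, t], np.array([1, 0], dtype=complex), rtol=1e-10, atol=1e-12)
exact = abs(sol.y[1, -1])**2
first_order = eps**2 * np.sin((wfi - w) * t / 2)**2 / ((wfi - w) / 2)**2
print(exact, first_order)          # agree up to corrections of order eps^4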
9 Radiative transitions
In this section we discuss an important application of results from time-dependent perturbation theory,
using the approximation to understand radiative transitions between atomic energy levels that are induced
by incident light.
The first step is to identify the form of the perturbation operator denoted above by V . In general,
the interaction between light and electrons in atoms is due to both the electric and the magnetic fields
associated with electromagnetic waves. However, because electron speeds in atoms are generally much less
than the speed of light, the coupling to electric fields dominates over the one to magnetic fields. In addition,
the size of an atom is much smaller than the wavelength of light at the relevant frequencies, and so the
electric field of the wave is nearly constant over the atom. These two simplifications together constitute the
dipole approximation. Although we should in principle describe an electromagnetic wave in terms of both
vector and scalar potentials, working within the dipole approximation we can simply use a scalar potential
appropriate for a uniform oscillating electric field. Taking this field to have strength E and to be along ẑ,
we then have
V = eEz . (141)
10 Selection rules
For a transition between initial and final states to occur, we require the frequency of the incident radiation
to match the energy level difference: ω = ωf i . However, this alone is not sufficient. In addition, we require
certain selection rules to be satisfied. There are two complementary ways of thinking about these rules. One
is in terms of conservation of angular momentum. The other is in terms of the matrix element ⟨f |V |i⟩. We
will discuss both.
In the dipole approximation the photon carries one unit of angular momentum, with component mph = ±1 along the direction of travel (the two values of mph correspond to the two senses of circular polarisation; the case mph = 0 is absent because the photon is a massless relativistic particle).
From this we can conclude that: (a) |ℓi − 1| ≤ ℓf ≤ ℓi + 1, and (b) mf = mi ± 1 if the photon travels along the ẑ-axis (i.e. the axis used for quantisation of the atomic angular momentum), or mf = mi, mi ± 1 for other propagation directions of the photon. In fact, although ℓf = ℓi is a possibility allowed by conservation of angular momentum, it is excluded within the dipole approximation because the matrix element vanishes.
To see this consider the effect on the integral of parity inversion r → −r. In spherical polar coordinates
this amounts to the transformation r → r, θ → π − θ and φ → φ + π. The effect of inversion on the spherical
harmonics that make up the angular part of the atomic wavefunction is Yℓ,m(π − θ, φ + π) = (−1)ℓ Yℓ,m(θ, φ). In
addition, the operator V = eEz changes sign, since cos(π − θ) = − cos(θ). Taken together, this means that
if we compare an evaluation of the matrix element using either the original coordinates r or new coordinates
−r, we find
⟨f|V|i⟩ = (−1)^{ℓf + ℓi + 1} ⟨f|V|i⟩ ,

and hence that ⟨f|V|i⟩ = 0 unless ℓf + ℓi + 1 is even, excluding ℓf = ℓi.
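These rules can be verified by direct integration of the angular part of the matrix element, which for V ∝ z is proportional to ∫ Y*_{ℓf,mf}(θ, φ) cos θ Y_{ℓi,mi}(θ, φ) sin θ dθ dφ. A small symbolic sketch (the function names are ours):

import sympy as sp
from sympy import Ynm

th, ph = sp.symbols('theta phi', real=True)

def angular_me(lf, mf, li, mi):
    # angular part of <f|z|i>, up to the radial integral
    Yf = Ynm(lf, mf, th, ph).expand(func=True)
    Yi = Ynm(li, mi, th, ph).expand(func=True)
    integrand = sp.conjugate(Yf) * sp.cos(th) * Yi * sp.sin(th)
    return sp.simplify(sp.integrate(integrand, (th, 0, sp.pi), (ph, 0, 2*sp.pi)))

print(angular_me(1, 0, 0, 0))   # nonzero: l changing by 1 is allowed
print(angular_me(2, 0, 0, 0))   # zero: l changing by 2 is forbidden by parity
print(angular_me(1, 1, 0, 0))   # zero: z-polarised light requires mf = mi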
The rule for m follows from the form of the matrix element itself, which contains a factor x, y or z according to the polarisation of the incident light. Bearing in mind
the conversion to spherical polar coordinates (x, y, z) = r(sin θ cos φ, sin θ sin φ, cos θ) we see that terms e±iφ
arise from x and y but not from z. This means that if we take the factors x or y, we require mf = mi ± 1
to get a non-zero matrix element, but if we take the factor z, we require mf = mi . A photon propagating
along the ẑ axis necessarily has an electric field in the x − y plane, and so mf = mi ± 1 in this case.
Since within the approximations we have used there is no coupling between the radiation and the electron
spin, initial and final spin states are identical for electric dipole transitions.
11 Identical Particles in Quantum Mechanics
The term identical has a much stronger meaning in quantum mechanics than it does in classical mechanics.
Consider the evolution of a set of particles from some initial configuration to a final configuration. For a
classical system we can always follow the trajectories of each particle and know which particle in the initial
configuration corresponds to which particle in the final configuration. So within classical mechanics, if we
say particles are identical, we mean that they obey the same law of motion (they have the same mass and
interact with the same forces) but we can nevertheless tell them apart. By contrast, for quantum particles
we cannot follow trajectories without affecting the dynamics (recall the two-slit experiment), and this turns
out to have spectacular consequences. These are embodied in the terms boson and fermion which you
already know from your statistical mechanics course.
To describe a system of many particles, identical or distinct, we should extend the idea of a quantum
wavefunction to include the coordinates of all the particles, so that ψ(r) → ψ(r1 , r2 . . . rN ). The interpreta-
tion of the wavefunction generalises in the obvious way: |ψ(r1 , r2 . . . rN )|2 d3 r1 . . . d3 rN gives the probability
to find particle 1 in a volume d³r1 near the point r1, particle 2 in a volume d³r2 near the point r2, and
so on.
Consider the case N = 2. If the particles are identical we expect

|ψ(r1, r2)|² = |ψ(r2, r1)|² , (144)

since no observable can depend on which particle we call 1 and which we call 2.
To examine the consequences of this, it is useful to introduce the exchange operator P12, which swaps particles 1 and 2. We have

P12 ψ(r1, r2) = ψ(r2, r1) = e^{iα} ψ(r1, r2) , (145)

where the second equality with α real follows from Eq. 144. We also have

(P12)² = 1 , so that e^{2iα} = 1 and e^{iα} = ±1 : (146)

the wavefunction of two identical particles must be either symmetric (bosons, e^{iα} = +1) or antisymmetric (fermions, e^{iα} = −1) under exchange.
Building wavefunctions of the correct symmetry. Take a two-particle system and consider eigenstates
of the Hamiltonian, which in general has the form H = H1 +H2 +H12 , where H1 acts only on the coordinates
of particle 1, H2 acts only on the coordinates of particle 2, and H12 involves the coordinates of both particles.
For example, we might take (with k standing for k = 1 or k = 2)
Hk = −(ℏ²/2m)(∂²/∂xk² + ∂²/∂yk² + ∂²/∂zk²) + V(rk) and H12 = U(|r1 − r2|) .
We want to solve the TISE
Hψ(r1 , r2 ) = Eψ(r1 , r2 ) .
If H12 = 0 it makes sense to try separation of variables, writing

ψ(r1, r2) = φ1(r1) φ2(r2) . (147)
This gives us a solution to the partial differential equation but it is not the end of the story if the parti-
cles are identical because we must also consider symmetry on interchanging the particles, and in general
φ1(r1)φ2(r2) ≠ e^{iα} φ1(r2)φ2(r1) [with an exception for bosons (e^{iα} = +1) in the case that φ1(r) = φ2(r)].
At this point we should recall how one uses separation of variables to solve partial differential equations:
the general solution is a linear superposition of individual solutions with the product form. In the present
case, that means we should symmetrise or antisymmetrise the solutions we have found.
The Pauli exclusion principle: if the states φ1 (r) and φ2 (r) are the same, then the antisymmetrised
wavefunction ψ(r1 , r2 ) vanishes. From this we conclude that two identical fermions cannot occupy the same
quantum state.
Counting states in statistical physics. Consider two different orbitals φ1(r) and φ2(r) that can be occupied by two particles. Let's count the number of allowed arrangements, depending on the types of particles. For distinguishable particles there are four product states:

φ1(r1)φ1(r2), φ2(r1)φ2(r2), φ1(r1)φ2(r2), φ2(r1)φ1(r2).

If the particles are bosons, there are only three states allowed, since the last two given above must be combined and symmetrised.
If the particles are fermions, there is only a single allowed state: the first two given above are excluded, and the last two must be combined and antisymmetrised.
More generally, the correct way to specify distinct states in a system of many identical non-interacting
particles is by giving the numbers of particles occupying each orbital. One then constructs the wavefunc-
tion by symmetrising or antisymmetrising, according to whether the particles are fermions or bosons. Let’s
illustrate this with two examples.
Three identical bosons, two in orbital φ1 (r) and one in orbital φ2 (r). Then
ψ(r1, r2, r3) = (1/√3) {φ1(r1)φ1(r2)φ2(r3) + φ1(r2)φ1(r3)φ2(r1) + φ1(r1)φ1(r3)φ2(r2)} .
For N identical fermions in orbitals φ1 (r) . . . φN (r) we can use convenient properties of determinants to
write the wavefunction as
ψ(r1, r2 . . . rN) = (1/√N!) | φ1(r1)  φ2(r1)  . . .  φN(r1) |
                             | φ1(r2)  φ2(r2)  . . .  φN(r2) |
                             |  . . .                        |
                             | φ1(rN)  φ2(rN)  . . .  φN(rN) | .
This is known as a Slater determinant. This has the noteworthy feature that it can be computed efficiently, meaning that as we make N bigger and bigger, the time required to compute the determinant (its 'time complexity') scales no worse than polynomially in N. (Note that the familiar co-factor expansion as a weighted sum of minors is a terrible way to compute the determinant of a large matrix, as its complexity is factorial in N, which is worse than exponential. There are various algorithms that scale like N⁴ or N³, and the fastest, based on fast matrix multiplication, scales like N^2.373.) This means that noninteracting and certain types of interacting fermionic systems, which can be exactly or approximately captured by Slater determinants, are efficiently simulable on classical computers. However, in many interesting problems the wavefunction is a superposition of Slater determinants, requiring essentially new ideas.
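As an illustration (the orbitals and sample positions below are arbitrary choices of ours), one can evaluate a small Slater determinant numerically and check antisymmetry and the exclusion principle directly:

import math
import numpy as np
from numpy.polynomial.hermite import hermval

def orbital(j, x):
    # j-th 1D harmonic-oscillator eigenfunction, up to normalisation
    c = np.zeros(j + 1); c[j] = 1.0
    return hermval(x, c) * np.exp(-x**2 / 2)

def slater(xs):
    # psi(x1..xN) = det[ phi_j(x_i) ] / sqrt(N!)
    N = len(xs)
    M = np.array([[orbital(j, xi) for j in range(N)] for xi in xs])
    return np.linalg.det(M) / math.sqrt(math.factorial(N))

xs = [0.3, -1.2, 0.8]
print(slater(xs))                        # some nonzero amplitude
print(slater([-1.2, 0.3, 0.8]))          # swapping two particles flips the sign
print(slater([0.3, 0.3, 0.8]))           # two particles at one point: zero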
The analogous symmetrized object for placing N bosons in N different states is known as a permanent.
Unlike the determinant, there are believed to be no efficient classical algorithms for computing permanents.
In fact, things are even worse than that: permanents are a type of problem for which the worst case (i.e.,
the problem for which the classical algorithm does maximally badly) scales similarly to the average case.
This means that accessing the sorts of correlations between particles that we will discuss in the next section
in a system of many bosons is a task that is computationally extremely difficult for a classical computer.
However, if you had at hand a quantum computer, you could just prepare a many-boson wavefunction and
measure it to extract those same correlations, providing an instance of “quantum speedup”. The formaliza-
tion of this observation into the computational task of “Boson Sampling” [S. Aaronson and A. Arkhipov,
The Complexity of Linear Optics, Theory of Computing, 9, 143 (2013)] is one of the reasons why we believe
quantum computing will offer access to new regimes of computation beyond classical systems. However, Boson Sampling seems difficult to implement experimentally, and the experiment one would build to access it wouldn't be very useful for other things. The famous 'Quantum Supremacy' experiment by Google instead used a related task known as "Random Circuit Sampling", since the random circuits they study can be made less random and used to do a variety of other interesting things, such as simulating time crystals.
Correlations between particles arising from symmetry of wavefunctions under exchange. The mere act of symmetrising (or antisymmetrising) the multiparticle wavefunction introduces correlations in the
positions of particles. As we will see, these have some profound and surprisingly far-reaching consequences.
To introduce the idea, we start with two particles moving in a one-dimensional box of length L. We denote
the coordinates of the particles by x and y. The single-particle eigenstates are
φn(x) = √(2/L) sin(nπx/L) .
Consider a state for two bosons, with (n1 , n2 ) = (1, 2). It has the wavefunction
ψ(x, y) = (√2/L) [sin(πx/L) sin(2πy/L) + sin(2πx/L) sin(πy/L)] . (148)
To understand the probability density associated with this wavefunction, consider Fig. 12. We see that the
probability density vanishes if the two particles are at opposite ends of the box and is largest when both
particles are near the same end of the box. In other words: bosons bunch together – the opposite of the
exclusion principle that applies to fermions. Note that this correlation in the positions of the particles is
purely a consequence of the symmetry required in the wavefunction. It occurs despite there being no forces
that act between the particles.
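These correlations are easy to see numerically. The sketch below (with L = 1 and a few sample points, all illustrative) evaluates |ψ(x, y)|² for the symmetric combination (148) and for its antisymmetric counterpart, which will appear as Eq. (149) below:

import numpy as np

L = 1.0
f1 = lambda x: np.sin(np.pi * x / L)
f2 = lambda x: np.sin(2 * np.pi * x / L)

def prob(x, y, sign):                     # sign = +1 bosons, -1 fermions
    psi = (np.sqrt(2) / L) * (f1(x) * f2(y) + sign * f2(x) * f1(y))
    return abs(psi)**2

for (x, y) in [(0.25, 0.25), (0.25, 0.75)]:
    print((x, y), "symmetric:", prob(x, y, +1), "antisymmetric:", prob(x, y, -1))
# symmetric: large when both particles sit at the same end, zero at opposite ends;
# antisymmetric: exactly the other way around.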
What happens for fermions? Since fermions necessarily have spin, we will need to discuss both the spa-
tial and spin parts of the wavefunction. To do so clearly, it is convenient to introduce some mathematical
notation.
Product spaces. Suppose we have basis states {|x1⟩} for particle 1 and {|x2⟩} for particle 2. We can write
as basis states for the product space
|x1 , x2 ⟩ = |x1 ⟩ ⊗ |x2 ⟩
FIG. 12: Illustration of wavefunctions for a system of two particles moving in a one-dimensional box with positions
x and y. (a) The significance of the coordinate system. (b) First term from Eq 148. (c) Second term from Eq. 148.
(d) Full wavefunction ψ(x, y) as in Eq. 148. (e) The spatial part of the wavefunction for two fermions as in Eq. 149,
when their spin wavefunction is symmetric.
where the symbol ⊗ denotes what is called a direct product. To be clear about its meaning, we should define
scalar products for these states and we should specify how operators act on them. We have ⟨y1 , y2 |x1 , x2 ⟩ =
⟨y1 |x1 ⟩ · ⟨y2 |x2 ⟩. Also, if for example H1 involves only the coordinates of the first particle, and H1 |x1 ⟩ = |z1 ⟩,
then the operator H1 ⊗ 1 (where here the 1 indicates that this operator acts as the identity on the coordinates of particle 2) has the effect (H1 ⊗ 1)|x1⟩ ⊗ |x2⟩ = |z1⟩ ⊗ |x2⟩. A general state in the product space is a linear superposition of these basis states, with the form ∑_{x1,x2} A_{x1,x2} |x1⟩ ⊗ |x2⟩.
Orbital and spin wavefunctions. The notation we have just introduced is useful not only for systems
with more than one particle, but also for systems consisting of a single particle, in the event that the
wavefunction depends on both the position and the spin state of the particle. For example, if we say in
words that an electron in a hydrogen atom is in the n = 1 state with spin up, then by the n = 1 state we
mean that the orbital wavefunction has the form φ(r, θ, ϕ) = N e−r/a0 , and by spin up we mean that the
spin wavefunction is the column vector (1, 0)T in a basis where the spin operator sz is a diagonal matrix.
We can combine these two factors by writing the full wavefunction as
φ(r, θ, ϕ) ⊗ (1, 0)ᵀ .
Then examples of an operator on the particle coordinates and an operator on the spin are p̂z ⊗ 1 and 1 ⊗ σ z
respectively.
System of two spin-half fermions. Choose basis states of form |ψ⟩ = |space part⟩ ⊗ |spin part⟩. We can
make |ψ⟩ antisymmetric under exchange of particles if one of these factors is symmetric and the other one
is antisymmetric. So if |spin part⟩ is symmetric, then |space part⟩ should be antisymmetric. In our example
of two particles in a box with quantum numbers (n1 , n2 ) = (1, 2) we have
ψ(x, y) = (√2/L) [sin(πx/L) sin(2πy/L) − sin(2πx/L) sin(πy/L)] . (149)
In this case, and in contrast to the situation with bosons, the probability density is largest when the two
fermions are at opposite ends of the box. More generally, the relation ψ(x, y) = −ψ(y, x) implies that
ψ(x, x) = 0. This means that fermions avoid each other in space if their spin wavefunction is symmetric.
Spin wavefunctions for two spin-half particles. We take as single-particle basis states the eigenstates of sz with eigenvalues ±ℏ/2, written as |↑⟩ and |↓⟩. For the two-particle system we have 2 × 2 = 4 states
in total and it is convenient to write these in linear combinations with definite symmetry under exchange.
This gives us three symmetric states
|↑1⟩ ⊗ |↑2⟩ ,  (1/√2){|↑1⟩ ⊗ |↓2⟩ + |↓1⟩ ⊗ |↑2⟩} ,  |↓1⟩ ⊗ |↓2⟩ ,
and one antisymmetric state
(1/√2){|↑1⟩ ⊗ |↓2⟩ − |↓1⟩ ⊗ |↑2⟩} .
What is the physical significance of these different combinations? To understand this, recall addition
of angular momentum. Combining two spin-half particles, we can get states with total spin Stot = 0 and S^z_tot = 0, or Stot = 1 and S^z_tot = 0, ±1. It is easy to see that the first symmetric state we have listed has S^z_tot = 1 and so must have Stot = 1. The other two symmetric states have S^z_tot = 0 and S^z_tot = −1, and in fact they too have Stot = 1 (as can be checked by direct calculation). The antisymmetric state obviously has S^z_tot = 0 and (less obviously) also has Stot = 0.
Summarising the story so far, if Stot = 1 then the spin wavefunction is symmetric. This means that the
spatial wavefunction must be antisymmetric. Conversely, if Stot = 0 the spin wavefunction is antisymmetric,
which forces the spatial wavefunction to be symmetric.
Exchange interactions. If there is an interaction potential between particles, the energies of different spin states are split, even though there are no spin-dependent forces. We can calculate the energy shift ∆E approximately, using first-order perturbation theory in the interaction potential V(r1 − r2). We have
∆E = ∫ d³r1 ∫ d³r2 |ψ(r1, r2)|² V(r1 − r2) with ψ(r1, r2) = (1/√2){φ1(r1)φ2(r2) ± φ2(r1)φ1(r2)} , (150)
where we now know that the choice of sign in the second expression depends on Stot . Expanding |ψ(r1 , r2 )|2
in the first expression, we have
∆E = ∆Edirect ± ∆Eexchange
with
∆Edirect = ∫ d³r1 ∫ d³r2 |φ1(r1)φ2(r2)|² V(r1 − r2) (151)
and
∆Eexchange = ∫ d³r1 ∫ d³r2 φ1*(r1) φ2*(r2) φ2(r1) φ1(r2) V(r1 − r2) . (152)
If V (r1 − r2 ) is repulsive with a maximum at r1 = r2 , then typically ∆Eexchange > 0. This means that the
lowest energy state has Stot = 1, or in words, it has the two spins parallel. This is the origin of magnetism.
Fridge magnets stick to fridges by virtue of the Pauli exclusion principle – a remarkable macroscopic conse-
quence of the laws of quantum mechanics!
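A minimal numerical illustration of Eqs. (151) and (152), reduced to one dimension with harmonic-oscillator orbitals and a Gaussian repulsion (all of these are illustrative choices of ours, not the notes' model), is:

import numpy as np
from scipy.integrate import dblquad

phi1 = lambda x: np.pi**-0.25 * np.exp(-x**2 / 2)                    # n = 0 orbital
phi2 = lambda x: np.pi**-0.25 * np.sqrt(2) * x * np.exp(-x**2 / 2)   # n = 1 orbital
V = lambda u: np.exp(-u**2)                                          # repulsive, peaked at u = 0

direct, _ = dblquad(lambda x1, x2: phi1(x1)**2 * phi2(x2)**2 * V(x1 - x2),
                    -8, 8, -8, 8)
exchange, _ = dblquad(lambda x1, x2: phi1(x1) * phi2(x1) * phi1(x2) * phi2(x2) * V(x1 - x2),
                      -8, 8, -8, 8)
print(direct, exchange)   # exchange > 0 here, so the Stot = 1 state lies lower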
Composite systems. Very often we want to consider particles that are made up of a number of individual
particles: an obvious example is an atom, made up of electrons and the nucleus, which itself is made up of
neutrons and protons, themselves made up of quarks. Suppose we have two identical particles, each made
up of NF fermions and NB bosons. What are the statistics of these composite particles? Consider the
relation between ψ(r1 , r2 ) and ψ(r2 , r1 ), where r1 and r2 are the coordinates of the composite particles.
When we swap the composite particles, we swap NF fermions and each fermion swap brings a factor of −1. We therefore have ψ(r2, r1) = (−1)^{NF} ψ(r1, r2). Hence if NF is odd, the composite particles are fermions, and if it is even they are bosons.
Example: superfluidity in liquid helium. A spectacular example of the importance of quantum statis-
tics is provided by the different behaviours of the two isotopes of helium. 3 He consists of two electrons, two
protons and one neutron. Each of these particles is a fermion so NF = 5. Since NF is odd, 3 He is a fermion.
By contrast 4 He consists of two electrons, two protons and two neutrons, so NF = 6 and it is a boson. Both
these isotopes become superfluid liquids at low temperature (meaning that they flow without any friction)
but the temperatures at which this happens are very different in the two cases. At ordinary pressure the
isotope that is a boson is superfluid below 2.1 kelvin, but to observe superfluidity in the isotope that is a
fermion, we must cool our experiment to the very low temperature of 2.5 millikelvin! This is another striking
macroscopic consequence of the Pauli exclusion principle.
Non-examinable aside: Anyons. There is a subtlety with our ‘proof’ that two exchanges equal the
identity, to do with topology, when we are in low dimensions (as can be realized in many solid-state systems,
e.g. 2D graphene or 1D carbon nanotubes). This is because, while our conclusions above were correct, we
were a bit sloppy in defining 'exchange'. To see this, let us think about the process by which two particles get exchanged, in light of our discussion of trajectories above. Let us first consider particles with a hard-core
condition, i.e. no two of them can be present at the same site (The reason we do this is that to talk about
‘exchanging’ particles, they need to be separated in space, because otherwise for indistinguishable particles
there’s nothing to do. Indeed, we will see a posteriori that the only case where the hard-core constraint
could be violated is for bosons, where it is trivial.)
Now, when we say that ‘two [hard-core] particles are indistinguishable’, what we really mean is that
their configuration space (i.e., the space which labels the different configurations of the system) is invariant
under swapping the particles. This means that, if we think of the two particle coordinates as r1 , r2 , we
should always identify configurations where these are swapped as the same configuration. In other words,
we have
configuration space = {(r1, r2) | r1 ≠ r2, (r1, r2) ∼ (r2, r1)}
                    = {(r+, r−) | r− ≠ 0, (r+, r−) ∼ (r+, −r−)} = V+ ⊗ V− , (153)
where '∼' denotes that two configurations should be identified as equivalent, and r+ = (r1 + r2)/2 and r− = r1 − r2 are centre-of-mass and relative coordinates, which respectively take values in the usual n-dimensional space V+ and the 'punctured half-plane' V−, which consists of the plane with r− and −r− treated as the same point, and the origin removed. Any nontrivial properties under exchange must be encoded in V−: to exchange two particles, we can rotate them until they swap positions, and then simply shift the centre of mass until the final configuration coincides with the initial one.
Consider performing a pair of exchanges. We see that the square of the exchange operation corresponds
to a loop that encircles the punctured point. Now, in three dimensions, nothing stops us from shrinking this
loop down to a point: this is a topological argument, in that we are using the fact that a global property
(like exchange) should not depend on local deformations. But since we can shrink the double-exchange to
a point, it must be that it is trivial — which is what we used above.
In spatial dimensions lower than three, the problem is more subtle, the most interesting case being 2D. Here, the loop corresponding to two successive exchanges encircles the point of the puncture, and so
cannot be shrunk to nothing. In this case, one can show that a new group of transformations, known as
the braid group, characterizes the particle statistics, and one can have arbitrary phases, and even matrix
operations, that characterize the winding (in the latter case, implying that there is some sort of discrete
structure underlying the multiparticle configuration space). Such particles are called anyons, and emerge
as excitations of strongly-interacting phases of electrons in Landau levels.
In one dimension, one can see that we need to think even more differently, since there is no way to define statistics in the way we have above: particles perforce must pass through each other under exchange, and indeed in one dimension one can exactly map between theories of bosons and theories of fermions, so the statistics are somewhat fungible.
A more formal version of this argument may be found on pp. 278–281 of X.-G. Wen's book Quantum Field Theory of Many-Body Systems (OUP, 2004). However, technically speaking, in order to fully define particle statistics we should go through a similar set of arguments for the case of an arbitrary number of particles N. In this case, it turns out that proving that bosons and fermions are the only possibilities in three dimensions is quite a bit more subtle. (For the mathematically inclined, this has to do with ruling out higher-dimensional representations of the permutation group; one way to do this is to use Lorentz invariance.) The resulting proof is sufficiently involved that even Prof. Simon's excellent MMathPhys course on Topological Quantum Matter (which discusses much of the theory of anyons in 2D, and is highly recommended to the interested student) only briefly alludes to it.
12 Atomic Structure
Atoms (other than hydrogen) provide important examples of quantum systems made up of many identical
particles — the electrons surrounding the nucleus. To understand atomic properties, we need to take account
of two aspects of the physics: the fact that these identical particles are fermions, and the fact that they
interact with each other via Coulomb repulsion. It is not possible to develop an exact theory, and instead
we will consider various levels of approximation. We will discuss helium in detail, and trends across the
periodic table in a more schematic way.
12.1 Electronic states of helium
The Hamiltonian for the two electrons of a helium atom (treating the nucleus, of charge 2e, as fixed) is

H = −(ℏ²/2m)(∇1² + ∇2²) − (2e²/4πϵ0)(1/r1 + 1/r2) + (e²/4πϵ0) · 1/|r1 − r2| . (154)
Here the first term is the kinetic energy (with ∇21 acting on the coordinate r1 of the first electron, and
similarly for ∇22 ), the middle term is the potential energy of the two electrons in the potential of the nucleus,
and the final term is the interaction energy of the two electrons. The obstacle to an exact solution is that
there is no coordinate system in which we can use separation of variables.
For orientation, recall the solution for a single particle moving in a Coulomb potential. The Hamiltonian
H = −(ℏ²/2m)∇² − Ze²/(4πϵ0 r)
has eigenstates labelled by the quantum numbers n, ℓ, m and ms (where ms is the quantum number for
the z-component of spin and the other quantum numbers specify the energy, total angular momentum and
z-component of angular momentum, respectively). These states have energy −Z 2 R/n2 , where R is the
Rydberg energy, taking the value R ≃ 13.6eV.
The simplest approximation is to ignore altogether the repulsion between electrons, leaving out from the Hamiltonian the term \frac{e^2}{4\pi\epsilon_0|r_1 - r_2|}. In this approximation we can use separation of variables, and we can
describe states of the atom by specifying the quantum numbers of the single-particle states occupied by
each electron. Following these ideas, the ground state has n1 = n2 = 1, which requires ℓ1 = ℓ2 = 0 and
m1 = m2 = 0. In addition, the Pauli exclusion principle requires that the two electrons are in different
states. Given these earlier choices, this can only be achieved by taking ms1 ̸= ms2 , so that ms1 = 1/2 and
ms2 = −1/2 (since we will anyway write a properly antisymmetrised wavefunction, nothing new is obtained
by swapping ms1 and ms2 ).
It is also useful to think about addition of the angular momentum associated with each of the electrons,
to get a total angular momentum for the atom. Starting with spin, since n1 = n2 , ℓ1 = ℓ2 and m1 = m2 ,
the orbital wavefunction must be symmetric, and this requires the spin wavefunction to be antisymmetric.
As we’ve discussed, this implies that the electrons are in a singlet state, with total spin quantum number
S = 0. Turning to the total orbital angular momentum L = ℓ1 + ℓ2 , with eigenvalues for |L|2 of L(L + 1)ℏ2 ,
the fact that ℓ1 = ℓ2 = 0 implies L = 0. Finally, combining spin and orbital angular momentum into a total
angular momentum for the atom J = L + S, it is clear that J = 0 in this state.
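It is worth writing this state out explicitly; a sketch of the full (orbital × spin) ground-state wavefunction in this independent-particle approximation is

\Psi(1, 2) = \varphi_{1s}(r_1)\,\varphi_{1s}(r_2) \times \frac{1}{\sqrt{2}}\big(|\!\uparrow\downarrow\rangle - |\!\downarrow\uparrow\rangle\big),

which is symmetric in the orbital labels, antisymmetric in the spin labels, and hence antisymmetric overall under exchange of the two electrons, as required for identical fermions.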
Notation. There’s a standard notation to keep track of this information about the atomic state, which we
now introduce using the ground state as an example. It is denoted by
1s2 1 S0 .
The different factors have meanings as follows. The combination 1s2 indicates that, in the independent-
particle approximation, the 1s orbital is occupied by two electrons. More specifically, ‘1’ gives the value
of n, ‘s’ gives the angular momentum quantum number ℓ, with ℓ = 0, 1, 2, 3 . . . ⇔ s, p, d, f . . ., and the
superscript in 1s2 indicates that two electrons occupy this orbital. The combination 1 S0 specifies the values
of the combined angular momentum quantum numbers S, L and J. Specifically, the superscript is the value
of 2S + 1 (since in this example the superscript is 1, this implies S = 0 as already discussed), the capital
letter gives the value of L using the same code as before (so L = 0, 1, 2, 3 . . . are denoted by S, P, D, F . . .),
and finally the subscript (here 0) gives the value of J.
Note that this ‘term symbol’ notation is not examinable material but will be useful for those of you considering
taking the B3 paper next year.
Ground state energy. Ignoring electron-electron interactions, E = −Z 2 R = −4R for a single electron
with n = 1, and so the ground state energy is twice this, or −8R = −108.8eV. To do better, we should
treat the electron-electron repulsion approximately instead of ignoring it altogether. We can use first-order
time-independent perturbation theory to write
\Delta E = \langle 0 |\, \frac{e^2}{4\pi\epsilon_0 |r_1 - r_2|}\, | 0 \rangle
where |0⟩ denotes the state φ1s (r1 )φ1s (r2 ). Clearly, ∆E is positive and of order R. Evaluating the integral
(see for example the book by Binney and Skinner) one finds ∆E = (5/2)R. So our improved estimate of
the ground state energy of helium is
E = -8R + \frac{5}{2}R = -\frac{11}{2}R = -74.8\,\mathrm{eV} . \quad (155)
This compares quite well with the experimental value of −79.0 eV.
We can get more insight by thinking about the process one would use to measure this energy. It represents
the cost of removing both electrons from the nucleus, and we can do this in two steps. The first ionisation
energy is the cost of removing one electron to give the ion He+ and a single distant electron e− . The second
ionisation energy is the cost of removing the second electron to give the ion He2+ and two distant electrons
2e− . Considering this two-step process prompts us to ask what the ground state energy is of He+ . This is
a question we can answer with precision, since the problem is accurately hydrogenic. This implies that the ground state energy of He+ is −Z²R = −4R = −54.4 eV.
Put differently, the second ionisation energy is +54.4 eV, which means that the first ionisation energy is
approximately 11R/2 − 4R = 3R/2 = 20.4eV. Why is the first electron easier to remove than the second
one? The answer is that while the first electron is removed, the remaining electron shields some of the
nuclear charge so that the net charge seen by the electron that is removed lies somewhere between 2e (the
bare nuclear charge) and e (the net value of the nuclear charge and the charge of the remaining electron).
This idea of shielding of nuclear charge by electrons will be useful in discussions below.
First excited state. In the independent-particle approximation the energy is determined by the principal quantum numbers, and so to obtain the first excited state we should set n1 = 1 and n2 = 2. Choosing ℓ2 = 0 (ℓ1 = 0 is automatic since 0 ≤ ℓ1 < n1), in the notation introduced above this gives the configuration 1s1 2s1. Moreover
ℓ1 = ℓ2 = 0 implies that the total orbital angular momentum quantum number is L = 0. For total spin,
however, we have two possibilities, the singlet S = 0 or the triplet S = 1 (since n1 ≠ n2 we can satisfy the Pauli exclusion principle by antisymmetrising either the spatial wavefunction or the spin wavefunction). The
superscript takes the values 2S + 1 = 1 and 2S + 1 = 3 in the singlet and triplet states respectively, with
values for the combined (spin + orbital) quantum number J = 0 and J = 1 in the two cases. These states
are therefore denoted by
1s1 2s1 1 S0 and 1s1 2s1 3 S1 .
What are the effects of electron-electron interactions on the energies of these states? From first-order perturbation theory we can write the interaction contribution to the energy as

\Delta E = \Delta E_{\rm direct} \pm \Delta E_{\rm exchange} , \quad (156)

where the positive sign is for the spin singlet state and the negative sign is for the triplet state (see the discussion of exchange interactions above). We have
\Delta E_{\rm direct} = \int d^3r_1 \int d^3r_2\, |\varphi_{1s}(r_1)\varphi_{2s}(r_2)|^2\, \frac{e^2}{4\pi\epsilon_0 |r_1 - r_2|} \quad (157)
and
\Delta E_{\rm exchange} = \int d^3r_1 \int d^3r_2\, \varphi_{1s}(r_1)\varphi_{2s}(r_2)\, \frac{e^2}{4\pi\epsilon_0 |r_1 - r_2|}\, \varphi_{2s}(r_1)\varphi_{1s}(r_2) . \quad (158)
The value of ∆Eexchange is positive because |r1 − r2 |−1 is largest when r1 and r2 are close, and in this case
φ1s (r1 )φ2s (r2 )φ2s (r1 )φ1s (r2 ) is positive. Crucially, this means that the triplet state is lower in energy than
the singlet state – a consequence of exchange interactions. Summarising this result in simple physical terms,
we say that electrons with their spins parallel (i.e. in a triplet state) are required by the Pauli exclusion
principle to avoid each other in real space, which means their Coulomb energy is lower than if their spins
were antiparallel (i.e. in a singlet state).
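To make the origin of the ± sign in Eq. (156) concrete, here is a short sketch of the standard construction: the properly (anti)symmetrised spatial wavefunctions and their first-order energies are

\psi_\pm(r_1, r_2) = \frac{1}{\sqrt{2}}\big[\varphi_{1s}(r_1)\varphi_{2s}(r_2) \pm \varphi_{2s}(r_1)\varphi_{1s}(r_2)\big] , \qquad E_\pm = E_0 + \Delta E_{\rm direct} \pm \Delta E_{\rm exchange} ,

where E_0 is the unperturbed (non-interacting) energy of the 1s2s configuration, and ψ₊ (symmetric in space) pairs with the spin singlet while ψ₋ (antisymmetric) pairs with the triplet. Note that ψ₋ vanishes at r₁ = r₂: this is precisely the real-space ‘avoidance’ just described.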
Going further, we can ask how accurate first-order perturbation theory is. In fact, it gives a poor esti-
mate: evaluating the integrals, Eq. 158 gives ∆Eexchange = 1.2eV while experiment gives a value of 0.4eV.
This means that electron-electron repulsion affects the shape of the wavefunction as well as the energies of
the states, and the true value of the splitting between the singlet and triplet states has contributions from
both potential and kinetic energies.
It is also interesting to ask about the value of ∆E_direct in these excited states, which turns out to be about
0.8R, and so significantly smaller than its value of (5/2)R in the ground state. The reason for this is that
in the excited states the two electrons are on average at different radii and so have larger separation and
smaller repulsive energy.
So far we have discussed the configuration 1s1 2s1 . The second possible configuration with n1 = 1 and
n2 = 2 is 1s1 2p1 and it is interesting to compare the energies of these two configurations. Before we allow
for electron-electron repulsion, their energies are the same, since in that approximation the energy is deter-
mined by the principal quantum numbers of the electrons. After allowing for electron-electron repulsion,
one finds that the energy of 1s1 2p1 is higher than of 1s1 2s1 . We can understand this by considering the
potential seen by the ‘second’ electron as a function of radius. Close to the nucleus it is simply given by
the Coulomb potential −Ze²/4πϵ₀r of the bare nuclear charge (therefore with Z = 2), while far from the
nucleus on the scale of the Bohr radius, the potential is due to combined charges of the nucleus and the
other electron (therefore with Z = 1). For a given n, electrons with larger ℓ are kept away from the nucleus by the angular momentum barrier, and so a 2p electron penetrates the screening cloud less than a 2s electron and sees a more completely screened nuclear charge. This in turn means that 1s1 2p1 is higher in energy than 1s1 2s1.
There are in fact four distinct energies for states belonging to the configuration 1s1 2p1 , depending on
how the spin and orbital angular momenta of the electrons are combined. Considering the orbital angular
momentum, since we have ℓ1 = 0 and ℓ2 = 1 we must have L = 1. Considering spin, both S = 0 and S = 1
are possible. Combining spin and orbital angular momenta, if S = 0 then we must have J = 1 but if S = 1,
we have the three possibilities J = 0, J = 1 and J = 2. Of these the singlet state (1s1 2p1 1 P1 ) is higher
in energy and the triplet states (1s1 2p1 3 P0 , 1s1 2p1 3 P1 and 1s1 2p1 3 P2 ) are lower, because of exchange
splitting.
Our first-order perturbation theory approach of incorporating the electron-electron interaction greatly improved our estimate of the ground state energy of helium: from an initial error of ∼ 38%, we now have the far more modest error of ∼ 5%. It turns out that doing better by going to higher and higher orders in perturbation theory is quite difficult. Instead, a simple variational ansatz can give us a significant improvement.
To see this, let us think qualitatively about how the interactions between electrons modify the problem. Each electron sees an effectively reduced nuclear charge, since it is partially shielded by the other electron. The hydrogenic wavefunctions know about the nuclear charge through their length scale: recall that the 1s state in a hydrogen-like atom with nuclear charge Ze has the wavefunction

\varphi^{Z}_{1s}(r) = \frac{Z^{3/2}}{\sqrt{\pi}\, a_B^{3/2}}\, e^{-Zr/a_B} . \quad (159)
One way to incorporate the shielding effect is to consider a variational ansatz where the original orbitals obtained by solving H1,2 with Z = 2 are replaced by those with some ‘effective charge’ Z̃, which we can treat as a variational parameter. In other words, we use the variational wavefunction

\psi^{\tilde{Z}}(r_1, r_2) = \varphi^{\tilde{Z}}_{1s}(r_1)\, \varphi^{\tilde{Z}}_{1s}(r_2) , \quad (160)
which, in contrast to our earlier proposed ground state wavefunction, is not built from eigenstates of the
interaction-free single-electron Hamiltonians H1,2 and hence is not an eigenstate of H1 + H2 . We can
nevertheless evaluate the integrals to compute the variational energy of this trial wavefunction. For the
terms not involving the electron-electron interaction, we have
\langle \psi^{\tilde{Z}} | H_1 | \psi^{\tilde{Z}} \rangle = \langle \psi^{\tilde{Z}} | H_2 | \psi^{\tilde{Z}} \rangle = {}_{\tilde{Z}}\langle 100 | \left( \frac{p^2}{2m} - \frac{Ze^2}{4\pi\epsilon_0 r} \right) | 100 \rangle_{\tilde{Z}} \quad (161)
where |100⟩Z̃ denotes the ground-state wavefunction of a fictitious hydrogen-like atom with nuclear charge
Z̃. Instead of doing any integrals explicitly, we can rewrite H1 in terms of the Hamiltonian H_{\tilde{Z}} and the potential energy V_{\tilde{Z}} = -\tilde{Z}e^2/4\pi\epsilon_0 r of this fictitious atom:

\frac{p^2}{2m} - \frac{Ze^2}{4\pi\epsilon_0 r} = \left( \frac{p^2}{2m} - \frac{\tilde{Z}e^2}{4\pi\epsilon_0 r} \right) - \frac{(Z - \tilde{Z})}{\tilde{Z}}\, \frac{\tilde{Z}e^2}{4\pi\epsilon_0 r} = H_{\tilde{Z}} + \frac{(Z - \tilde{Z})}{\tilde{Z}}\, V_{\tilde{Z}} . \quad (162)
We can now use the virial theorem: for a central 1/r potential, we know that ⟨KE⟩ = −(1/2)⟨PE⟩, so that the total energy ⟨H⟩ = ⟨KE⟩ + ⟨PE⟩ = (1/2)⟨PE⟩, i.e. ⟨V_Z̃⟩ = 2⟨H_Z̃⟩. Using this, we have

\langle \psi^{\tilde{Z}} | H_1 | \psi^{\tilde{Z}} \rangle = \langle \psi^{\tilde{Z}} | H_2 | \psi^{\tilde{Z}} \rangle = {}_{\tilde{Z}}\langle 100 | H_{\tilde{Z}} | 100 \rangle_{\tilde{Z}} + \frac{(Z - \tilde{Z})}{\tilde{Z}}\, {}_{\tilde{Z}}\langle 100 | V_{\tilde{Z}} | 100 \rangle_{\tilde{Z}}
 = -\left( \frac{\tilde{Z} - 2Z}{\tilde{Z}} \right) {}_{\tilde{Z}}\langle 100 | H_{\tilde{Z}} | 100 \rangle_{\tilde{Z}}
 = \left( \frac{\tilde{Z} - 2Z}{\tilde{Z}} \right) R\tilde{Z}^2 = R(\tilde{Z}^2 - 2Z\tilde{Z}) . \quad (163)
For the interaction term, we have by dimensional analysis (since everything was normalized) that our answer should just be a factor of Z̃/Z times the value quoted from Binney and Skinner above Eq. (155) (if you like, you can verify this explicitly by writing out the integrals and changing variables from r to r̃ = Z̃r/Z): we find that¹²

\langle \psi^{\tilde{Z}} | H_{12} | \psi^{\tilde{Z}} \rangle = \frac{\tilde{Z}}{Z} \times \frac{5}{2}R = \frac{5}{4}R\tilde{Z} . \quad (164)
Combining these results, we see that the variational energy is
E(\tilde{Z}) = \langle \psi^{\tilde{Z}} | (H_1 + H_2 + H_{12}) | \psi^{\tilde{Z}} \rangle = 2R\left( \tilde{Z}^2 - 2Z\tilde{Z} + \frac{5}{8}\tilde{Z} \right) . \quad (165)
Optimizing over Z̃, we find a minimum at an effective nuclear charge Z̃ < 2, giving a ground state energy E ≈ −77.5 eV, corresponding to an error of under 2%. You will reproduce this calculation using slightly different methods in problem sheet 3, and in doing so will compute Z̃ explicitly.
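As a quick numerical illustration (not a substitute for the problem-sheet calculation), one can simply minimise Eq. (165) on a grid; a minimal Python sketch, using the values of R and the experimental energy quoted in the text:

    import numpy as np

    R = 13.6   # Rydberg energy in eV, as quoted in the text
    Z = 2      # helium nuclear charge

    def E(Zt):
        """Variational energy of Eq. (165), in eV."""
        return 2 * R * (Zt**2 - 2 * Z * Zt + 5 * Zt / 8)

    # crude numerical minimisation over a grid of effective charges
    Zt = np.linspace(1.0, 2.0, 100001)
    i = np.argmin(E(Zt))
    print(f"Z_tilde ~ {Zt[i]:.4f}, E ~ {E(Zt[i]):.2f} eV")       # charge below 2, E ~ -77.5 eV
    print(f"error vs -79.0 eV: {abs(E(Zt[i]) + 79.0) / 79.0:.1%}")  # under 2%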
Many-electron atoms. Filling the first two states, we cross the first row of the periodic table (H and He), and filling the next eight states we cross the second row (Li to Ne). Beyond this point discrepancies from the simple hydrogenic ordering arise because we have not allowed for the effects of electron-electron interactions: for example, the 4s orbitals are occupied before the 3d orbitals. In addition,
to understand ionisation energies (see below) we must take some account of electron-electron interactions.
¹² Observe that we should be careful not to leave the factor of 1/Z unspecified in this expression – if you like, the factor of 5/2 has already included it, so we shouldn't double count – alternatively, we can see that the power of Z̃ that appears must be fixed to +1 by dimensional analysis.
Hund’s rules. As we have seen, electron-electron interactions contribute to the energies of many-electron
atoms in a way that depends on the atom’s spin and orbital angular momentum. Hund’s rules summarise
what combinations give the ground states by minimising the Coulomb repulsion energy.
The first rule is to take the highest spin state allowed by the Pauli exclusion principle.
The second rule is to take the highest orbital angular momentum state consistent with this spin.
Electronic structure of atoms in the first two rows of the periodic table:

H    1s1              2S1/2
He   1s2              1S0
Li   1s2 2s1          2S1/2
Be   1s2 2s2          1S0
B    1s2 2s2 2p1      2PJ     (J = 1/2, 3/2)
C    1s2 2s2 2p2      3PJ     (J = 0, 1, 2)
N    1s2 2s2 2p3      4S3/2
O    1s2 2s2 2p4      3PJ     (J = 0, 1, 2)
F    1s2 2s2 2p5      2PJ     (J = 1/2, 3/2)
Ne   1s2 2s2 2p6      1S0
To understand how Hund's rules lead to these states, let's consider in detail the cases of carbon, C, and nitrogen, N.
For carbon, starting from the configuration 1s2 2s2 2p2, we have freedom in arranging the two electrons among the three different 2p orbitals, which are labelled by the quantum number m = 0, ±1. Hund's
first rule says that we should maximise the total electron spin, and we can do this with S = 1 and S z = 1
by taking both electrons to have spin quantum numbers ms = +1/2. Note that we could have chosen a
different Sz state, but the present choice is convenient because Pauli exclusion then requires each electron to
be in a different spatial orbital. By looking at the wavefunctions obtained by combining two ℓ = 1 angular momenta, we see that the only possibility allowed is total orbital angular momentum L = 1 (we could also have obtained this by noting that the L = 0, 2 combinations are symmetric under exchange of the two particles).
For nitrogen, starting from the configuration 1s2 2s2 2p3, we have some freedom in how we arrange the three electrons in the three different 2p orbitals, which are labelled by the quantum number m = 0, ±1.
Hund’s first rule says that we should maximise the total electron spin, and we can do this with S = 3/2
and S z = 3/2 by taking all three electrons to have spin quantum numbers ms = +1/2. The Pauli exclusion principle then requires these electrons each to be in a different spatial orbital, so we put one electron in each of the orbitals m = 1, m = 0 and m = −1. This has the consequence that Lz = 0 and implies also that L = 0. (The second point actually requires a rather tedious analysis of the symmetries of the states obtained by combining three ℓ = 1 angular momenta. It turns out that while there are many possibilities, the only one consistent with the maximal spin is L = 0.) Combining this information we have the state 1s2 2s2 2p3 4 S3/2 as
given in the table.
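The ‘highest-weight’ counting used above for carbon and nitrogen is easy to automate. The following sketch (in Python; the function name and approach are ours, not part of the notes) enumerates all Pauli-allowed ways of placing n electrons in a shell of orbital angular momentum ℓ, and picks out the maximal total S_z and then the maximal total L_z, which for equivalent electrons identifies the Hund's-rules values of S and L:

    from itertools import combinations

    def hund_SL(l, n):
        """Return (S, L) for n equivalent electrons in a shell of orbital
        angular momentum l, by maximising total S_z and then total L_z
        over all Pauli-allowed configurations (the highest-weight state)."""
        # one spin-orbital for each (m, ms) pair
        spin_orbitals = [(m, ms) for m in range(-l, l + 1) for ms in (+0.5, -0.5)]
        # Pauli exclusion: electrons occupy distinct spin-orbitals
        weights = [(sum(ms for _, ms in occ), sum(m for m, _ in occ))
                   for occ in combinations(spin_orbitals, n)]
        return max(weights)  # lexicographic: max S_z first, then max L_z

    print(hund_SL(1, 2))  # carbon 2p^2:   (1.0, 1) -> S = 1, L = 1, i.e. 3P
    print(hund_SL(1, 3))  # nitrogen 2p^3: (1.5, 0) -> S = 3/2, L = 0, i.e. 4S

Running it for the 2p4 configuration of oxygen gives (1.0, 1) again, consistent with the 3P entry in the table.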
FIG. 13: Left: periodic table. Right: variation of first ionisation energy across the rows of the periodic table.
A rough hydrogenic estimate of the first ionisation energy is Z_eff²R/n², where n is the principal quantum number of the electron that is removed in ionisation and Z_eff is the effective nuclear charge after allowing for screening by the other electrons in the atom. As we cross a row of the periodic table, n stays the same, the bare nuclear charge increases by a large amount, and the screened nuclear charge increases by a small amount. This last increase explains the rise in ionisation energy across a row. As we go from one row to the next, the extra electrons in the atom occupy orbitals with higher n, and this explains the step down in ionisation energy at the end of each row.
13 Density Matrices and Entanglement
Note that density matrices, the difference between classical and quantum uncertainty, and quantum entan-
glement are not examinable material but are conceptually very important, especially if you are considering
further study in atomic physics, quantum information, or many aspects of theoretical physics.
In your courses this year, you’ve been introduced to two quite different types of statistical average that
apply to physical systems. On one hand, in quantum mechanics we discuss expectation values of observables
for a system in a given quantum state. On the other hand, in statistical physics we discuss thermal averages
that arise because a system may be in one of many different possible states. There are in fact important
situations in which we need to combine both types of average, and this requires a new formalism involving
mathematical objects known as density matrices. Examples of situations in which we use this formalism
include the theory of quantum computers and the theory of Magnetic Resonance Imaging body scanners.
In more detail, by a quantum average for a system in the state |ψ⟩, of an observable represented by the
operator Â, we mean
⟨A⟩ = ⟨ψ|Â|ψ⟩ (167)
and by a statistical physics average for a system that can be in states labelled n with probability pn , of an
observable that takes values An in these states, we mean
[A]_{\rm av} = \sum_n p_n A_n . \quad (168)
An example of a situation where we need to combine both types of average is given by the set of experi-
ments involving the Stern-Gerlach effect, shown in Fig. 14. The question raised by these examples is how
to represent the difference between polarised and unpolarised incident beams. Note that we cannot make
what might at first seem an obvious choice, and represent the unpolarised beam as a linear combination of
polarised beams, of the form c₊|↑⟩ + c₋|↓⟩, because any choice of c₊ and c₋ will represent a beam that is polarised in some direction. For example, with c₊ = c₋ = 1/\sqrt{2} we have the state |→⟩, which is polarised in the +x̂ direction and not unpolarised.
[Figure 14 shows four set-ups: (a) an unpolarised beam incident on an SGz apparatus, producing both |↑⟩ and |↓⟩ output beams; (b) an unpolarised beam incident on an SGx apparatus, producing both output beams; (c) a +z polarised beam incident on an SGz apparatus, producing only the |↑⟩ beam and no |↓⟩ beam; (d) a +z polarised beam incident on an SGx apparatus, producing both output beams.]
FIG. 14: Examples for discussion of density matrices. In each case, a beam of s = 1/2 particles is incident on a
Stern-Gerlach apparatus, which separates particles according to one component of their spin – either the z-component
or the x-component, according to the labels SGz and SGx . The outcome of the experiment (the nature of the outgoing
beams) depends both on the set-up (the choice of SGx or SGz ) and on the nature of the incident beam (unpolarised
or polarised in the +ẑ direction).
In order to see how to make progress, let's consider the statistical physics average in more detail. Writing the states labelled by n as |n⟩ we have

[A]_{\rm av} = \sum_n p_n A_n = \sum_n p_n \langle n | \hat{A} | n \rangle . \quad (169)
Suppose we want to use a different set of basis states, e.g. {|α⟩} in place of {|n⟩}. Then we can employ the resolution of the identity 1 = \sum_\alpha |\alpha\rangle\langle\alpha| to write

[A]_{\rm av} = \sum_{\alpha,\beta} \sum_n p_n \langle n|\alpha\rangle \langle\alpha|\hat{A}|\beta\rangle \langle\beta|n\rangle = \sum_{\alpha,\beta} \sum_n \langle\beta|n\rangle\, p_n\, \langle n|\alpha\rangle \langle\alpha|\hat{A}|\beta\rangle = {\rm Tr}[\hat{\rho}\hat{A}] , \quad (170)

where we have defined the density matrix ρ̂ in terms of its matrix elements, via \hat{\rho}_{\beta\alpha} = \sum_n \langle\beta|n\rangle\, p_n\, \langle n|\alpha\rangle, implying that

\hat{\rho} = \sum_n p_n |n\rangle\langle n| . \quad (171)
As a basic check, note that this approach encompasses the familiar quantum-mechanical expectation value, since if a system is definitely in the state m, then p_n = \delta_{nm}, \hat{\rho} = |m\rangle\langle m| and {\rm Tr}[\hat{\rho}\hat{A}] = \sum_\ell \langle\ell|m\rangle\langle m|\hat{A}|\ell\rangle = \langle m|\hat{A}|m\rangle, as expected.
The density matrix has two important general properties. Normalisation: since the p_n are probabilities and the |n⟩ are normalised,

{\rm Tr}\,\hat{\rho} = 1 . \quad (172)

Time evolution: if the p_n are constant in time, then the TDSE iℏ∂_t|n⟩ = Ĥ|n⟩ implies

i\hbar\, \partial_t \hat{\rho} = [\hat{H}, \hat{\rho}] . \quad (173)

Note that the sign here is opposite to the one for the time-dependence of observables, which is iℏ∂_t⟨ψ|Â|ψ⟩ = ⟨ψ|[Â, Ĥ]|ψ⟩ provided ∂_t Â = 0.
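The one-line derivation of Eq. (173), sketched here for completeness: differentiating Eq. (171) with constant p_n and using the TDSE for both kets and bras (iℏ∂_t⟨n| = −⟨n|Ĥ),

i\hbar\,\partial_t \hat{\rho} = \sum_n p_n \Big[ \big(i\hbar\,\partial_t |n\rangle\big)\langle n| + |n\rangle \big(i\hbar\,\partial_t \langle n|\big) \Big] = \sum_n p_n \Big( \hat{H}|n\rangle\langle n| - |n\rangle\langle n|\hat{H} \Big) = [\hat{H}, \hat{\rho}] .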
To apply this formalism to the Stern-Gerlach experiments of Fig. 14, first let's recall some basics and set up some notation. Spin component operators for S = 1/2 can be represented by 2 × 2 matrices as follows:

\hat{s}_z = \frac{\hbar}{2}\sigma_z = \frac{\hbar}{2} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad \text{and} \quad \hat{s}_x = \frac{\hbar}{2}\sigma_x = \frac{\hbar}{2} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} . \quad (175)
In the basis of ŝ_z eigenstates, the density matrix for an unpolarised beam is

\hat{\rho}_{\rm unpol} = \frac{1}{2}\big( |\!\uparrow\rangle\langle\uparrow\!| + |\!\downarrow\rangle\langle\downarrow\!| \big) \equiv \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} ,

and the density matrix for a +ẑ polarised beam is

\hat{\rho}_{+z} = |\!\uparrow\rangle\langle\uparrow\!| \equiv \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} , \quad (179)
while the density matrix for a +x̂ polarised beam, in the same basis, is

\hat{\rho}_{+x} = |\!\rightarrow\rangle\langle\rightarrow\!| \equiv \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} . \quad (180)
Observables in the Stern-Gerlach experiment can be represented as follows. First consider SGz. To get the intensity in the |↑⟩ beam we can use

\hat{A}_\uparrow \equiv \frac{1}{2}(1 + \sigma_z) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} , \quad (181)

and similarly \hat{A}_\downarrow \equiv \frac{1}{2}(1 - \sigma_z); for SGx the corresponding projectors are \hat{A}_\rightarrow \equiv \frac{1}{2}(1 + \sigma_x) and \hat{A}_\leftarrow \equiv \frac{1}{2}(1 - \sigma_x).
Now we are in a position to calculate what will be observed in experiments of the type shown in Fig. 14.
The prescription is that by evaluating Tr[ρ̂Â∗ ] for ρ̂ representing the incident beam, and with ∗ ≡ ↑, ↓, →
or ←, we find the relative intensity of particles in the outgoing beam labelled with ∗. For the case of an
unpolarised incident beam, this calculation is very simple: we have
{\rm Tr}[\hat{\rho}_{\rm unpol} \hat{A}_*] = \frac{1}{2} {\rm Tr}[\hat{A}_*] = \frac{1}{2} \quad (182)
for all choices of ∗.
It's interesting also to check how things work for an x-polarised beam using the same basis. We find

{\rm Tr}[\hat{\rho}_{+x} \hat{A}_\uparrow] = {\rm Tr}\left[ \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \right] = \frac{1}{2} , \quad (183)

{\rm Tr}[\hat{\rho}_{+x} \hat{A}_\rightarrow] = \frac{1}{4} {\rm Tr}\left[ \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \right] = 1 ,

\text{and} \quad {\rm Tr}[\hat{\rho}_{+x} \hat{A}_\leftarrow] = \frac{1}{4} {\rm Tr}\left[ \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \right] = 0 . \quad (184)
The results in Eqns. 182, 183 and 184 are as they should be (compare with Fig. 14). They illustrate the
fact that the density matrix provides a way to represent both quantum and statistical uncertainty within
the same framework.
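These traces are easy to verify numerically. A minimal Python sketch (variable names are ours), using the matrices of Eqs. (175)-(181):

    import numpy as np

    # Pauli matrices in the s_z basis, Eq. (175)
    sz = np.array([[1, 0], [0, -1]])
    sx = np.array([[0, 1], [1, 0]])
    I = np.eye(2)

    # density matrices for the three incident beams
    rho = {"unpol": I / 2,                          # unpolarised
           "+z": np.array([[1, 0], [0, 0]]),        # Eq. (179)
           "+x": np.array([[1, 1], [1, 1]]) / 2}    # Eq. (180)

    # projectors onto the four outgoing beams, cf. Eq. (181)
    A = {"up": (I + sz) / 2, "down": (I - sz) / 2,
         "right": (I + sx) / 2, "left": (I - sx) / 2}

    for beam, r in rho.items():
        print(beam, {star: np.trace(r @ a).real for star, a in A.items()})
    # unpol: 1/2 in every channel; +x: 1/2 and 1/2 through SGz, but 1 and 0 through SGx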
13.1 “Pure” and “Mixed” states and von Neumann Entropy of a density matrix
Since a density matrix captures both statistical and quantum uncertainty, one might ask whether there is a way to distinguish these. It turns out that a nice basis-independent measure of whether there is any statistical (as opposed to quantum) uncertainty is the von Neumann entropy of the density matrix, defined as

S_{\rm vN}(\hat{\rho}) = -{\rm Tr}[\hat{\rho} \ln \hat{\rho}] . \quad (185)

If ρ̂ describes a single quantum state, ρ̂ = |ψ⟩⟨ψ| (a ‘pure’ state), then S_vN = 0; otherwise ρ̂ is said to be ‘mixed’, and S_vN > 0.
Now consider a system made up of two subsystems A and B, with the system as a whole in the pure state |Ψ⟩ = c₁|ψ₁⟩_A|ϕ₁⟩_B + c₂|ψ₂⟩_A|ϕ₂⟩_B. For an observable O_A acting only on subsystem A,

\langle O_A \rangle_\Psi = |c_1|^2\, {}_A\langle\psi_1|O_A|\psi_1\rangle_A + |c_2|^2\, {}_A\langle\psi_2|O_A|\psi_2\rangle_A , \quad (186)
where we have assumed that B ⟨ϕm |ϕn ⟩B = δmn , and will also take A ⟨ψm |ψn ⟩A = δmn . We can rewrite
⟨OA ⟩Ψ in a suggestive form:
\langle O_A \rangle_\Psi = {\rm Tr}(\hat{\rho}_A O_A) , \quad \text{with} \quad \hat{\rho}_A = \sum_n p_n |\psi_n\rangle_A\, {}_A\langle\psi_n| \quad \text{and} \quad p_n = |c_n|^2 . \quad (187)
Unless |c₁|² = 1 or |c₂|² = 1 (corresponding to the cases when |Ψ⟩ is a product state), ρ̂_A is a mixed state,
and SvN (ρ̂A ) > 0. This quantity, the von Neumann entropy of a reduced density matrix of a subsystem
computed when the system as a whole is in a pure state, is called the [von Neumann] entanglement entropy
of subsystem A (note that for this sort of bipartitioning of the system, it is just as valid to call this quantity
the entanglement entropy of the complement of A, i.e. B.) The entanglement entropy quantifies the extent
to which the lack of knowledge of the state of the complement of A (here, subsystem B) leads to statistical
uncertainties in observables on A. This “lack of knowledge” can be captured by the procedure of “tracing
out” B: we can define
\hat{\rho}_A = {\rm Tr}_B\, \hat{\rho} = \sum_n {}_B\langle\phi_n| \hat{\rho} |\phi_n\rangle_B , \quad (188)
where the sum is over a complete set of basis states for subsystem B. You should check that this prescription
reproduces ρ̂A in the example above.
An Example. Consider the singlet state of two spins-1/2,

|\psi\rangle = \frac{1}{\sqrt{2}} \big( |\!\uparrow\rangle_A |\!\downarrow\rangle_B - |\!\downarrow\rangle_A |\!\uparrow\rangle_B \big) . \quad (189)
To compute the entanglement between the two spins, consider the density matrix of the full system:

\hat{\rho} = |\psi\rangle\langle\psi| = \frac{1}{2} \big( |\!\uparrow\rangle_A |\!\downarrow\rangle_B - |\!\downarrow\rangle_A |\!\uparrow\rangle_B \big) \big( {}_A\langle\uparrow\!|\, {}_B\langle\downarrow\!| - {}_A\langle\downarrow\!|\, {}_B\langle\uparrow\!| \big) . \quad (190)
Let us now ‘trace over’ B to obtain ρ̂_A:

\hat{\rho}_A = {\rm Tr}_B\, \hat{\rho} = {}_B\langle\uparrow\!|\hat{\rho}|\!\uparrow\rangle_B + {}_B\langle\downarrow\!|\hat{\rho}|\!\downarrow\rangle_B = \frac{1}{2} \big( |\!\uparrow\rangle_A\, {}_A\langle\uparrow\!| + |\!\downarrow\rangle_A\, {}_A\langle\downarrow\!| \big) , \quad (191)
which is just the same as the density matrix of an unpolarized beam! If we have no knowledge of B in the
state |ψ⟩, then in doing experiments on A we may as well have thrown all our careful efforts to preserve a
fragile superposition out of the window, and replaced it with an unpolarized beam.
Now the entanglement entropy is just S_{\rm vN}(\hat{\rho}_A) = -{\rm Tr}[\hat{\rho}_A \ln \hat{\rho}_A] = -\frac{1}{2}\ln\frac{1}{2} - \frac{1}{2}\ln\frac{1}{2} = \ln 2, which turns out to be the maximal possible entanglement of a spin-1/2 with anything else. This is sometimes equivalently stated by saying that the density matrix we obtained on giving up knowledge of B is ‘maximally mixed’.
We should note that this result is not special to the singlet state: the relative sign between the two terms was unimportant, and so each spin-1/2 in the S = 1, S^z = 0 triplet state is maximally entangled with the other. Both these states |\psi\rangle = \frac{1}{\sqrt{2}}(|\!\uparrow\rangle_A |\!\downarrow\rangle_B \pm |\!\downarrow\rangle_A |\!\uparrow\rangle_B) are examples of Bell pair states.
Schmidt Decomposition. It turns out that the bipartite entanglement entropy of a pure state is closely
related to a ‘natural’ operation in linear algebra known as the Schmidt decomposition. In Dirac notation,
this states that for a state |Ψ⟩ which lives in a Hilbert space H that can be decomposed into two subsystems
A and B, i.e. as the tensor product H = HA ⊗ HB , one can always write
|\Psi\rangle = \sum_{n=1}^{D} \lambda_n |\psi_n\rangle_A |\psi_n\rangle_B , \quad (192)

where d_A = dim(H_A) and d_B = dim(H_B) are the dimensions of the two components, D = min(d_A, d_B), and the sets {|ψ_n⟩_A} and {|ψ_n⟩_B} are orthonormal, i.e. {}_A⟨ψ_m|ψ_n⟩_A = δ_mn and {}_B⟨ψ_m|ψ_n⟩_B = δ_mn. On the face of it,
this decomposition seems a bit surprising, since a general state in the Hilbert space can be written as
|\Psi\rangle = \sum_{i=1}^{d_A} \sum_{j=1}^{d_B} C_{ij} |e_i\rangle_A |f_j\rangle_B , \quad (193)
where {|e_i⟩_A}, {|f_j⟩_B} are orthonormal bases for H_A and H_B. However, the (possibly rectangular) d_A × d_B matrix C_{ij} can always be written as C = UΛV† using a theorem in linear algebra known as the singular value decomposition. Here, Λ = diag(λ₁, λ₂, . . . , λ_D) is a diagonal matrix of ‘singular values’, and U, V† are (possibly rectangular) d_A × D and D × d_B matrices that implement the transformation from the original bases to the basis of ‘Schmidt states’: we see that
|\Psi\rangle = \sum_{i=1}^{d_A} \sum_{j=1}^{d_B} \sum_{n,n'=1}^{D} U_{in} \Lambda_{nn'} V^\dagger_{n'j} |e_i\rangle_A |f_j\rangle_B
 = \sum_{n,n'=1}^{D} \Lambda_{nn'} \left[ \sum_{i=1}^{d_A} U_{in} |e_i\rangle_A \right] \left[ \sum_{j=1}^{d_B} V^*_{jn'} |f_j\rangle_B \right] \equiv \sum_n \lambda_n |\psi_n\rangle_A |\psi_n\rangle_B , \quad (194)
where in the last step we have used the fact that Λ is diagonal and identified the terms in the square brackets as the Schmidt states. One can then see that the entanglement entropy is simply given by

S_{\rm vN}(\hat{\rho}_A) = S_{\rm vN}(\hat{\rho}_B) = -\sum_n |\lambda_n|^2 \ln |\lambda_n|^2 .

It is not too difficult to show that the entanglement entropy of a bipartition is bounded from above by the logarithm of the dimension of the smaller Hilbert space, S_vN ≤ ln D. (This explains why we used the term ‘maximally entangled’ in our example where A and B are each a single spin-1/2.)
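The singular-value route to the Schmidt coefficients is exactly what one does in practice. A short Python sketch (names ours) that reproduces the singlet example of Eq. (189):

    import numpy as np

    # singlet state, Eq. (189), as amplitudes in the basis {uu, ud, du, dd}
    psi = np.array([0, 1, -1, 0]) / np.sqrt(2)

    # reshape the amplitudes into the coefficient matrix C_ij of Eq. (193)
    C = psi.reshape(2, 2)

    # Schmidt coefficients are the singular values of C
    lam = np.linalg.svd(C, compute_uv=False)
    p = lam**2                                # eigenvalues of rho_A

    S_vN = -np.sum(p * np.log(p))
    print(lam)              # [0.7071, 0.7071]: both Schmidt weights equal 1/2
    print(S_vN, np.log(2))  # entanglement entropy ln 2 = maximal for D = 2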
Understanding the entanglement structure of many-body states is an important step in classifying distinct
phases of matter in quantum systems, and in simulating them using computers. The structure of entangled
states also (perhaps unsurprisingly) has connections with quantum computing, and is one of the main
subjects of quantum information theory.